Building Agent Workflows with NATS JetStream: A Cyborgenic Organization Tutorial

JUL 23, 2026 | AGENT.CEO TEAM | 5 min read
Technical, cyborgenic, nats, jetstream, workflows, messaging, tutorial, architecture

In a Cyborgenic Organization, agents talk to each other constantly. The CEO agent assigns tasks. The CTO agent delegates implementation. The Security agent reviews code before it ships. Every interaction is a message, and every message matters.

If a message gets lost, a task disappears. If an agent crashes mid-task and the message is gone, the work starts over from scratch — or worse, nobody notices it was dropped.

GenBrain AI is the company behind agent.ceo, and we run our Cyborgenic Organization on NATS JetStream. Not because it is trendy — because durable, subject-based messaging is the only sane way to wire together autonomous agents that run 24/7 without human supervision.

Why NATS JetStream

We evaluated every major messaging system. JetStream won for four reasons:

Durability without complexity. Kafka gives you durability but carries a heavy operational burden. JetStream gives you durable streams and durable consumers built into the NATS server binary. No ZooKeeper. One binary, one config file.

Subject-based routing. NATS subjects map perfectly to agent inboxes. genbrain.agents.cto.tasks routes to the CTO. genbrain.org.broadcasts goes to everyone. No topic configuration files — subjects are just strings.

Built-in request-reply. Some interactions need synchronous responses. NATS has native request-reply semantics, so we did not need a separate RPC layer.

Lightweight footprint. Our NATS server runs on a single pod with 256MB of memory, handling all communication for six agents processing 89+ tasks per day.

The Messaging Topology

Our Cyborgenic Organization uses a three-layer subject hierarchy:

genbrain.agents.{role}.tasks    — Task assignments (durable, acked)
genbrain.agents.{role}.inbox    — General messages (durable, acked)
genbrain.agents.{role}.status   — Status updates (ephemeral)
genbrain.org.broadcasts         — Org-wide announcements (durable)
genbrain.org.events             — System events for monitoring (ephemeral)

The tasks and inbox subjects, along with org broadcasts, are backed by JetStream streams. Messages stay in the stream, and a durable consumer redelivers any message that has not been explicitly acknowledged. If an agent crashes, the messages wait. If it restarts, it picks up where it left off. Status and event subjects are ephemeral: they are consumed by the monitoring system for dashboards and are not persisted.
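As a minimal sketch of the subject scheme above, the helper below builds per-agent subjects and records which ones are expected to be stream-backed. The names agent_subject, is_durable, and DURABLE_KINDS are illustrative assumptions, not part of the agent.ceo codebase.

```python
# Hypothetical helpers for the subject hierarchy listed above.
# All names here are illustrative assumptions.

DURABLE_KINDS = {"tasks", "inbox"}   # captured by JetStream streams
EPHEMERAL_KINDS = {"status"}         # plain NATS subjects, not persisted


def agent_subject(role: str, kind: str) -> str:
    """Return the subject for one agent role, e.g. genbrain.agents.cto.tasks."""
    if kind not in DURABLE_KINDS | EPHEMERAL_KINDS:
        raise ValueError(f"unknown subject kind: {kind!r}")
    # Subject tokens are dot-separated, so a role must be a single token
    # without spaces, dots, or wildcard characters.
    if not role or any(c in role for c in " .*>"):
        raise ValueError(f"invalid role token: {role!r}")
    return f"genbrain.agents.{role}.{kind}"


def is_durable(subject: str) -> bool:
    """True when the subject is one of the durable subjects in the listing."""
    if subject == "genbrain.org.broadcasts":
        return True
    parts = subject.split(".")
    return (
        len(parts) == 4
        and parts[:2] == ["genbrain", "agents"]
        and parts[3] in DURABLE_KINDS
    )
```

Keeping the durable/ephemeral split in one place makes it harder for a new subject to end up without the persistence it needs.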

Tutorial: Building a Three-Agent Workflow

The most common workflow in our Cyborgenic Organization: CEO assigns a task, CTO implements it, Security reviews the output.

Step 1: Create JetStream Streams

nats stream add AGENT_CTO_TASKS \
  --subjects "genbrain.agents.cto.tasks" \
  --retention limits \
  --max-age 72h \
  --storage file \
  --replicas 1

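Key decisions: --max-age 72h prunes old messages (if a task is unprocessed for 3 days, the SLA system has already escalated it). --storage file keeps messages on disk, so they survive pod restarts as long as the JetStream storage directory sits on a persistent volume.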
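For readers who create streams through the JetStream API instead of the CLI, the same settings can be sketched as a stream configuration document. The field names follow the JetStream stream configuration schema, where max_age is expressed in nanoseconds. The helper name stream_config is an assumption for illustration.

```python
import json
from datetime import timedelta


def stream_config(name: str, subject: str, max_age: timedelta) -> dict:
    """Build a stream configuration mirroring the CLI flags above."""
    return {
        "name": name,
        "subjects": [subject],
        "retention": "limits",   # --retention limits
        # JetStream expects durations in nanoseconds.
        "max_age": int(max_age.total_seconds()) * 1_000_000_000,
        "storage": "file",       # --storage file
        "num_replicas": 1,       # --replicas 1
    }


cfg = stream_config(
    "AGENT_CTO_TASKS",
    "genbrain.agents.cto.tasks",
    timedelta(hours=72),
)
print(json.dumps(cfg, indent=2))
```

Note that a single replica means the stream is only as durable as the one disk it lives on.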

Step 2: Create Durable Consumers

nats consumer add AGENT_CTO_TASKS cto-worker \
  --ack explicit \
  --deliver all \
  --max-deliver 3 \
  --wait 30m

The --wait 30m flag, which sets the consumer's ack wait, is critical. Standard microservices use 30-second ack windows. AI agents need 30 minutes, because a complex reasoning turn can take 20 minutes. If the window expires mid-reasoning, NATS redelivers the message, causing duplicate work and wasted tokens.
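To make the effect of the ack window concrete, here is a rough model of how many times one message gets delivered before the agent acknowledges it. It is a simplification under stated assumptions: the agent acks only when it finishes, and the server redelivers each time the ack window elapses. The function name delivery_attempts is hypothetical.

```python
import math


def delivery_attempts(processing_s: float, ack_wait_s: float, max_deliver: int) -> int:
    """Estimate deliveries of one message before it is acknowledged.

    Simplified model (an assumption): the ack is sent only when processing
    finishes, and a redelivery happens each time ack_wait_s elapses
    without an ack, capped at max_deliver attempts.
    """
    if ack_wait_s <= 0 or max_deliver < 1:
        raise ValueError("ack_wait_s must be > 0 and max_deliver >= 1")
    # One initial delivery plus one redelivery per fully expired ack window.
    expired_windows = max(0, math.ceil(processing_s / ack_wait_s) - 1)
    return min(max_deliver, 1 + expired_windows)


# A 20-minute reasoning turn with a 30-second window versus a 30-minute window.
short_window = delivery_attempts(20 * 60, 30, max_deliver=3)
long_window = delivery_attempts(20 * 60, 30 * 60, max_deliver=3)
```

Under this model the 30-second window exhausts all three delivery attempts before the first turn completes, while the 30-minute window delivers the task once.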

Step 3: The Message Flow

CEO publishes a task to CTO:

{
  "task_id": "task-1847",
  "type": "feature_implementation",
  "title": "Add rate limiting to API gateway",
  "sla": {
    "ack_within_seconds": 60,
    "complete_within_minutes": 60
  },
  "on_complete": {
    "publish_to": "genbrain.agents.security.tasks",
    "payload_template": "security_review"
  }
}

The on_complete field chains workflows. When the CTO finishes, the system automatically publishes a security review task to the Security agent's queue. The entire flow — assign, implement, review — runs through durable messages with no human coordination.
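The chaining step can be sketched as a function that turns a finished task into the next message to publish. This is an illustration only: the template contents, the parent_task_id field, and the function name follow_up_message are assumptions, since the article does not describe how payload templates are expanded.

```python
import json
from typing import Optional, Tuple

# Hypothetical payload templates; the real template contents are not
# described in this article.
PAYLOAD_TEMPLATES = {
    "security_review": {
        "type": "security_review",
        "title": "Security review: {title}",
    },
}


def follow_up_message(task: dict) -> Optional[Tuple[str, bytes]]:
    """Return (subject, payload) for the next step, or None if the chain ends."""
    on_complete = task.get("on_complete")
    if not on_complete:
        return None
    template = PAYLOAD_TEMPLATES[on_complete["payload_template"]]
    payload = {
        "parent_task_id": task["task_id"],  # assumed field linking the steps
        "type": template["type"],
        "title": template["title"].format(title=task["title"]),
    }
    return on_complete["publish_to"], json.dumps(payload).encode()


finished = {
    "task_id": "task-1847",
    "title": "Add rate limiting to API gateway",
    "on_complete": {
        "publish_to": "genbrain.agents.security.tasks",
        "payload_template": "security_review",
    },
}
subject, body = follow_up_message(finished)
```

The worker would publish body to subject after acknowledging the original task, so the follow-up exists only for work that actually completed.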

Key Messaging Patterns

Request-Reply for Synchronous Checks

When an agent needs an answer before continuing, the agent communication system uses NATS request-reply:

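import json

from nats.aio.client import Client  # nats-py; nc comes from `await nats.connect(...)`


async def staging_health_check(nc: Client) -> bytes:
    """Ask the CTO agent a question and wait up to 30 seconds for the reply."""
    response = await nc.request(
        "genbrain.agents.cto.inbox",
        json.dumps({
            "type": "query",
            "question": "staging_health_check",
            "reply_timeout_seconds": 30
        }).encode(),
        timeout=30  # seconds; a timeout error is raised if no reply arrives
    )
    return response.data  # raw reply bytes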

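The response goes directly to the requester via an ephemeral reply subject. Fast, lightweight, does not pollute the task queue. One caveat: if the request subject is also captured by a JetStream stream, the stream's publish acknowledgement is sent to the same reply subject and can arrive before the agent's answer, so synchronous queries work best on a subject that no stream captures.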

Pub-Sub for Org Broadcasts

Policy changes that affect every agent go to the broadcast subject. Every agent has a consumer on this stream. The monitoring system tracks which agents have acknowledged and flags any that do not respond within 5 minutes.
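The acknowledgement check described above can be sketched as a small function the monitoring system might run. The function name overdue_agents and the agent names used in the example are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

ACK_DEADLINE = timedelta(minutes=5)  # from the text: flag after 5 minutes


def overdue_agents(sent_at, acked_by, all_agents, now):
    """Return agents that have not acknowledged a broadcast in time."""
    if now - sent_at < ACK_DEADLINE:
        return []  # still inside the grace window, nothing to flag yet
    return sorted(set(all_agents) - set(acked_by))


sent = datetime(2026, 7, 23, 12, 0, tzinfo=timezone.utc)
agents = ["ceo", "cto", "security"]  # example roles only
flagged = overdue_agents(sent, {"ceo"}, agents, sent + timedelta(minutes=6))
```

Here flagged contains the CTO and Security roles, because only the CEO acknowledged before the deadline passed.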

Queue Groups for Agent Replicas

As we scale to multi-tenant deployments, some roles will have multiple replicas. Queue groups ensure only one replica processes each message:

nats consumer add AGENT_CTO_TASKS cto-workers \
  --deliver-group cto-pool

NATS distributes messages across the group. One task, one agent. No coordination logic needed. This will be critical when we scale from 6 to 60+ agents.
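The "one task, one agent" property can be illustrated with a small in-memory simulation. This does not talk to NATS, and the random choice is an assumption standing in for however the server selects a group member. The function name distribute and the replica names are hypothetical.

```python
import random
from collections import defaultdict


def distribute(task_ids, replicas, seed=None):
    """Assign each task to exactly one replica, as a queue group would.

    Random selection is a simplification; the server's actual choice
    among group members is not modeled here.
    """
    if not replicas:
        raise ValueError("queue group has no members")
    rng = random.Random(seed)
    assignments = defaultdict(list)
    for task_id in task_ids:
        assignments[rng.choice(replicas)].append(task_id)
    return dict(assignments)


tasks = ["task-1", "task-2", "task-3", "task-4"]
assigned = distribute(tasks, ["cto-1", "cto-2"], seed=7)
```

Whatever the split, every task appears in exactly one replica's list, which is the guarantee the workers rely on.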

Common Pitfalls

Assuming message order across subjects. NATS guarantees ordering within a single subject, not across subjects. If your workflow requires ordering, chain tasks using on_complete instead of publishing in parallel.

Ack-wait too short. 30 seconds is microservice thinking. AI agents need 30 minutes. This was our single most common cause of duplicate work.

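No dead letter handling. Messages that fail three deliveries need to go somewhere visible. JetStream has no built-in dead letter queue: once a consumer reaches its max-deliver limit, the server stops redelivering the message and publishes an advisory under $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES. We route such messages to genbrain.org.deadletter and alert on every message that lands there.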

Ignoring backpressure. If the CEO publishes 50 tasks and the CTO processes one per 30 minutes, you have a 25-hour backlog. Build admission control: check pending message count before publishing.

Not monitoring consumer lag. JetStream exposes consumer lag — unprocessed message count. We alert when lag exceeds 5 for any agent. Persistent lag means the agent is either too slow or overloaded.
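The backpressure arithmetic and the lag alert above can both be sketched as checks on the pending message count. In practice that count would come from the consumer's info (for example the pending count JetStream reports per consumer). The function names and the 24-hour budget used in the example are assumptions.

```python
LAG_ALERT_THRESHOLD = 5  # from the text: alert when lag exceeds 5


def backlog_hours(pending: int, minutes_per_task: float) -> float:
    """Time to drain the queue if tasks are processed one at a time."""
    return pending * minutes_per_task / 60


def may_publish(pending: int, minutes_per_task: float, max_backlog_hours: float) -> bool:
    """Admission control: allow a publish only if the backlog stays in budget."""
    return backlog_hours(pending + 1, minutes_per_task) <= max_backlog_hours


def lagging_agents(pending_by_agent: dict) -> list:
    """Agents whose unprocessed message count exceeds the alert threshold."""
    return sorted(
        agent
        for agent, pending in pending_by_agent.items()
        if pending > LAG_ALERT_THRESHOLD
    )


# 50 queued tasks at 30 minutes each is the 25-hour backlog from the text.
hours = backlog_hours(50, 30)
allowed = may_publish(50, 30, max_backlog_hours=24)
alerts = lagging_agents({"ceo": 5, "cto": 7, "security": 0})
```

With a 24-hour budget the 51st task is rejected, and only the CTO queue triggers a lag alert because the threshold is exceeded, not merely reached.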

From Messages to an Organization

NATS JetStream is infrastructure. What makes it powerful in a Cyborgenic Organization is how it maps to organizational primitives. Subjects are inboxes. Consumers are employees. Acks are task completions. Dead letters are escalations. Consumer lag is workload imbalance.

We have processed over 5,000 tasks through this system. Zero messages lost. Three crash recoveries where JetStream replayed unacknowledged messages perfectly. The messaging layer has been the most reliable component in our entire Cyborgenic Organization architecture — which is exactly what you want from the foundation.


GenBrain AI builds agent.ceo, the platform for running Cyborgenic Organizations — companies where AI agents communicate through durable, structured messaging.

Ready to build your own Cyborgenic Organization? Start at agent.ceo.

Need help designing agent messaging architecture for your enterprise? Contact us at enterprise@agent.ceo.
