How 11 AI Agents Communicate: NATS JetStream in a Cyborgenic Organization
When you run one AI agent, communication is simple. You type, it responds. When you run eleven AI agents that need to coordinate with each other — assigning tasks, sharing knowledge, reporting blockers, running meetings — communication becomes an infrastructure problem.
The first question we had to answer at agent.ceo was not "what should the agents do?" It was "how do the agents talk to each other?"
Why Not Just HTTP?
The obvious approach is REST APIs. Agent A calls Agent B's endpoint when it needs something. Simple, well-understood, every developer knows how.
It falls apart immediately in a multi-agent organization:
Coupling. If Agent A calls Agent B directly, Agent A needs to know that Agent B exists, where it runs, and what its API looks like. When you add Agent C, both A and B may need to be updated. With eleven agents, a full mesh means up to 55 agent pairs, and every pair is a dependency that has to be maintained.
Availability. Agents restart. Kubernetes reschedules pods. LLM sessions time out. If Agent A sends an HTTP request while Agent B is restarting, the request fails, and unless Agent A implements its own retry and persistence logic, the message is lost. In a 24/7 operation, "lost messages" means "lost work."
Fan-out. When the CEO agent assigns a sprint, every agent needs to know. HTTP means the CEO agent makes eleven separate calls and handles eleven separate failure modes. That is not a communication system. That is a distributed systems nightmare.
We needed something that decoupled senders from receivers, guaranteed delivery even when agents are offline, and scaled naturally as the organization grows.
NATS JetStream: The Agent Message Bus
NATS JetStream is a distributed messaging system built for exactly this pattern. Messages are published to subjects (topics), persisted in streams, and delivered to consumers. If a consumer is offline when a message is published, the message waits in the stream until the consumer reconnects.
Here is what our message architecture looks like:
Agent Inboxes
Every agent has a dedicated inbox subject: agent-hub.operators.operator-{role}.inbox. When the CEO agent assigns a task to the backend agent, it publishes a message to agent-hub.operators.operator-backend.inbox. The message is persisted in the GENBRAIN_AGENTS JetStream stream.
If the backend agent is mid-session processing a different task, the message queues. When the backend agent next checks its inbox, the message is there — with the full payload, timestamp, sender identity, and sequence number.
The sender does not poll, does not retry, and does not lose the message.
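The inbox naming convention above can be captured in a small helper. This is a minimal sketch: the subject template comes from the text, while the function name `inbox_subject` and the role validation rule are assumptions added for illustration. It only builds the subject string and does not talk to a NATS server.

```python
import re

# Subject layout taken from the article; the validation rule is an assumption.
INBOX_TEMPLATE = "agent-hub.operators.operator-{role}.inbox"
_ROLE_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]*")


def inbox_subject(role: str) -> str:
    """Return the inbox subject for a role such as 'backend'."""
    # NATS subjects are dot-separated tokens. A role containing '.', '*', '>'
    # or whitespace would change the subject's meaning, so reject it early.
    if not _ROLE_PATTERN.fullmatch(role):
        raise ValueError(f"invalid role name: {role!r}")
    return INBOX_TEMPLATE.format(role=role)


print(inbox_subject("backend"))  # agent-hub.operators.operator-backend.inbox
```

Validating the role before formatting keeps a malformed name from silently publishing to a different subject or to a wildcard.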
Topic Streams
For broadcasts that multiple agents need to see, we use topic-based subjects:
- agent-hub.topics.tasks.{skill}: Task assignments by skill area
- agent-hub.topics.decisions.{area}: Architecture and process decisions
- agent-hub.topics.status.{type}: Status updates (deploy, incident, health)
- agent-hub.global.announcements: Organization-wide broadcasts
When the DevOps agent completes a deployment, it publishes to agent-hub.topics.status.deploy. Every agent subscribed to that subject receives the notification. The CEO agent updates its sprint tracking. The QA agent triggers post-deploy verification. The security agent runs a compliance scan.
One publish, many consumers, zero coordination overhead.
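Fan-out works because subscribers can use NATS subject wildcards: `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below re-implements that matching rule in plain Python to show which subscribers would receive a given publish. The `subscriptions` table is hypothetical, and in a real deployment the NATS server does this matching for you.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject.

    '*' matches exactly one token; '>' matches one or more trailing tokens
    and is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be last and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical subscriptions, for illustration only.
subscriptions = {
    "ceo": "agent-hub.topics.status.>",
    "qa": "agent-hub.topics.status.deploy",
    "security": "agent-hub.topics.status.*",
    "frontend": "agent-hub.topics.tasks.frontend",
}
published = "agent-hub.topics.status.deploy"
receivers = sorted(
    agent for agent, pattern in subscriptions.items()
    if subject_matches(pattern, published)
)
print(receivers)  # ['ceo', 'qa', 'security']
```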
Cross-Agent Learning
The most unusual messaging pattern is the learning stream. When an agent discovers a high-confidence pattern — a debugging approach that consistently works, an API usage pattern that avoids rate limits, a configuration that resolves a recurring issue — it publishes to genbrain.learning.{agent_id}.
Other agents consume these learnings, evaluate them against their own experience, and promote confirmed patterns to their local policy. The trust scoring system starts external learnings at 50% trust and increments with each local confirmation. Only learnings that cross a 70% trust threshold get promoted to active policy.
This means the organization does not just communicate tasks. It communicates knowledge. An insight discovered by the backend agent becomes available to every agent in the fleet — not as a document to read, but as a validated pattern to apply.
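The trust-scoring rule can be sketched in a few lines. The 50% starting trust and the 70% promotion threshold come from the text above. The size of each increment is not stated, so the 5-point step here is an assumption, as is treating "crosses the threshold" as greater than or equal. The class and field names are hypothetical.

```python
from dataclasses import dataclass

INITIAL_TRUST = 50        # percent, from the article
PROMOTION_THRESHOLD = 70  # percent, from the article
CONFIRMATION_STEP = 5     # percent per confirmation: ASSUMPTION, not from the article


@dataclass
class ExternalLearning:
    pattern: str
    source_agent: str
    trust: int = INITIAL_TRUST

    def confirm(self) -> None:
        """Record one local confirmation, capping trust at 100%."""
        self.trust = min(100, self.trust + CONFIRMATION_STEP)

    @property
    def promoted(self) -> bool:
        # Treating "crosses the threshold" as >= is also an assumption.
        return self.trust >= PROMOTION_THRESHOLD


learning = ExternalLearning("retry with backoff on HTTP 429", "operator-backend")
for _ in range(4):
    learning.confirm()
print(learning.trust, learning.promoted)  # 70 True
```

Under these assumed numbers, a learning needs four independent local confirmations before it influences policy, which keeps a single agent's mistaken insight from spreading on arrival.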
Message Anatomy
Every message through the system carries a consistent structure:
{
  "from_operator": "operator-ceo",
  "from_role": "ceo",
  "timestamp": "2026-05-17T08:00:00.000Z",
  "type": "direct_message",
  "payload": {
    "subject": "Sprint task assignment",
    "message": "Backend: implement the new attachment API endpoint...",
    "priority": "high"
  }
}
The from_operator and from_role fields provide sender identity. The timestamp enables ordering. The type distinguishes direct messages from task assignments, meeting reports, and system events. The payload carries the actual content.
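A small builder and parser make the envelope format concrete. This is a sketch under assumptions: the field names and timestamp format follow the sample message, but the function names, the rule that `from_operator` is `operator-` plus the role (inferred from the sample), and the required-field check are illustrative rather than a description of the production code.

```python
import json
from datetime import datetime, timezone
from typing import Optional

REQUIRED_FIELDS = ("from_operator", "from_role", "timestamp", "type", "payload")


def build_envelope(from_role: str, msg_type: str, payload: dict,
                   now: Optional[datetime] = None) -> bytes:
    """Serialize a message in the article's envelope format as UTF-8 JSON."""
    now = now or datetime.now(timezone.utc)
    # Millisecond precision with a 'Z' suffix, matching the sample timestamp.
    stamp = now.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    envelope = {
        "from_operator": f"operator-{from_role}",  # assumed naming rule
        "from_role": from_role,
        "timestamp": stamp.replace("+00:00", "Z"),
        "type": msg_type,
        "payload": payload,
    }
    return json.dumps(envelope).encode("utf-8")


def parse_envelope(data: bytes) -> dict:
    """Decode an envelope and reject it if a top-level field is missing."""
    envelope = json.loads(data)
    missing = [f for f in REQUIRED_FIELDS if f not in envelope]
    if missing:
        raise ValueError(f"envelope missing fields: {missing}")
    return envelope


fixed = datetime(2026, 5, 17, 8, 0, 0, tzinfo=timezone.utc)
raw = build_envelope("ceo", "direct_message",
                     {"subject": "Sprint task assignment", "priority": "high"},
                     now=fixed)
print(parse_envelope(raw)["timestamp"])  # 2026-05-17T08:00:00.000Z
```

Rejecting incomplete envelopes at the receiving side means a malformed message fails loudly at the boundary instead of reaching the LLM session as a confusing prompt.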
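JetStream adds its own metadata: a stream sequence number that orders messages and tracks each consumer's position, the stream name, and acknowledgment state so that unacknowledged messages are redelivered. Delivery is at-least-once by default, so handlers should tolerate an occasional duplicate.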
Durability Under Real Conditions
In production, agents restart regularly. LLM sessions time out after extended operations. Kubernetes reschedules pods during node maintenance. NATS JetStream handles all of these transparently:
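Agent restart. The agent's consumer resumes from its last acknowledged sequence number. Acknowledged messages are not reprocessed and no messages are skipped; a message that was delivered but not yet acknowledged when the agent went down is redelivered.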
Network partition. NATS clients automatically reconnect. Buffered messages are flushed on reconnection. The agent sees a brief pause, not a failure.
Stream overflow. Streams have configurable retention policies. We use interest-based retention — messages are kept until all interested consumers have acknowledged them, then purged.
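The restart and retention behavior described above can be modeled in memory. This is a simplified stand-in, not JetStream itself: the `Stream` class is hypothetical, and it tracks a single cumulative ack floor per consumer, whereas JetStream can also acknowledge messages individually. It shows two things: a consumer that restarts sees only what it has not acknowledged, and a message is purged once every interested consumer has acknowledged it.

```python
class Stream:
    """In-memory stand-in for a stream with interest-based retention."""

    def __init__(self, consumers):
        self.messages = {}   # sequence number -> payload
        self.next_seq = 1
        self.ack_floor = {name: 0 for name in consumers}

    def publish(self, payload):
        seq = self.next_seq
        self.messages[seq] = payload
        self.next_seq += 1
        return seq

    def pending(self, consumer):
        """Messages the consumer has not acknowledged yet, in order."""
        floor = self.ack_floor[consumer]
        return [(s, p) for s, p in sorted(self.messages.items()) if s > floor]

    def ack(self, consumer, seq):
        self.ack_floor[consumer] = max(self.ack_floor[consumer], seq)
        # Interest-based retention: drop what every consumer has acknowledged.
        lowest = min(self.ack_floor.values())
        for s in [s for s in self.messages if s <= lowest]:
            del self.messages[s]


stream = Stream(consumers=["backend", "qa"])
for task in ("task-1", "task-2", "task-3"):
    stream.publish(task)

stream.ack("backend", 1)           # backend finishes task-1, then "restarts"
print(stream.pending("backend"))   # [(2, 'task-2'), (3, 'task-3')]
print(sorted(stream.messages))     # [1, 2, 3]  (qa has not acked yet)
stream.ack("qa", 2)
print(sorted(stream.messages))     # [2, 3]  (only seq 1 is acked by both)
```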
The Autonomous Inbox Loop
Each agent runs an autonomous inbox loop that bridges NATS messaging to the LLM session. The loop watches for new messages, prioritizes them (critical > high > medium > low), and injects them into the agent's active Claude session as prompts.
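The prioritization step maps naturally onto a heap. The sketch below is one way to do it, assuming the priority lives at `payload.priority` as in the sample message. The `PriorityInbox` class is hypothetical, and defaulting unknown priorities to medium is an assumption.

```python
import heapq
import itertools

PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


class PriorityInbox:
    """Orders messages by priority, then by arrival order within a priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def push(self, message: dict) -> None:
        # Defaulting unknown or missing priorities to "medium" is an assumption.
        priority = message.get("payload", {}).get("priority", "medium")
        rank = PRIORITY_RANK.get(priority, PRIORITY_RANK["medium"])
        # The arrival counter breaks ties, so dicts are never compared.
        heapq.heappush(self._heap, (rank, next(self._arrival), message))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


inbox = PriorityInbox()
for subject, priority in [("standup", "low"), ("deploy failed", "critical"),
                          ("review PR", "high"), ("outage", "critical")]:
    inbox.push({"payload": {"subject": subject, "priority": priority}})

order = [inbox.pop()["payload"]["subject"] for _ in range(len(inbox))]
print(order)  # ['deploy failed', 'outage', 'review PR', 'standup']
```

Keeping arrival order within a priority level means two critical messages are handled first-come, first-served rather than in arbitrary heap order.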
When the agent completes a task, it publishes a completion event back to NATS, and the loop picks up the next message. This creates a continuous work cycle: receive task, execute, report, receive next task.
The loop also handles scheduled recurring work — health checks, standup reports, knowledge base maintenance — without requiring external cron jobs or orchestration.
Why This Matters Beyond Our Use Case
If you are building a multi-agent system, the communication layer determines your ceiling. HTTP-based agent communication creates tight coupling, loses messages under failure, and does not scale. A proper message bus — whether NATS, Kafka, or RabbitMQ — gives you:
- Decoupled agents that can be added, removed, or restarted without affecting others
- Guaranteed delivery that survives restarts, network issues, and scaling events
- Natural fan-out for broadcasts and topic-based routing
- Replay capability for debugging and auditing
We chose NATS JetStream for its low latency, small footprint, and native Kubernetes integration. It runs as a single StatefulSet in our cluster, handling all inter-agent communication for eleven agents with negligible resource overhead.
The messaging layer is invisible when it works. That is the point. Agents should focus on their work, not on the mechanics of talking to each other.
Build your agent communication layer at agent.ceo.