The Architecture of agent.ceo: A Technical Deep-Dive
Building a platform where autonomous AI agents collaborate, scale, and persist state across sessions requires more than a single monolithic application. agent.ceo is a distributed system designed from the ground up for multi-agent orchestration. This post walks through every layer of the stack, from user authentication to agent execution and back.
The Full Request Path
When a user interacts with agent.ceo, the request traverses a carefully designed pipeline:
User Request
|
v
+------------------+
| Firebase Auth | (JWT validation, org membership)
+------------------+
|
v
+------------------+
| API Gateway | (rate limiting, routing, request shaping)
+------------------+
|
v
+------------------+
| Firestore | (agent config lookup, task creation)
+------------------+
|
v
+------------------+
| NATS JetStream | (message dispatch to agent inbox)
+------------------+
|
v
+------------------+
| GKE Pod | (agent container spins up or receives msg)
+------------------+
|
v
+------------------+
| Claude Code CLI | (LLM reasoning + tool execution)
+------------------+
|
v
+------------------+
| MCP Tools | (bash, git, web, agent-hub, etc.)
+------------------+
|
v
+------------------+
| Results -> NATS | (publish completion events)
+------------------+
|
v
+------------------+
| Firestore Update | (persist state, notify subscribers)
+------------------+
Each layer is independently scalable, observable, and replaceable. This separation of concerns is what lets agent.ceo scale from a single agent to 100 concurrent workers without architectural changes.
Layer 1: Authentication and Authorization
Firebase Auth handles identity. Every request carries a JWT that encodes the user's organization, role, and billing tier. The API gateway validates this token before any downstream processing occurs.
# Firebase Auth custom claims structure
customClaims:
  orgId: "org_abc123"
  role: "admin"
  tier: "growth"
  agentLimit: 10
  features:
    - "multi-agent"
    - "custom-tools"
    - "priority-scheduling"
Organization-level isolation ensures that agents from different tenants never share NATS subjects, Firestore collections, or compute resources. This is enforced at every layer, not just the gateway.
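To make the claims structure concrete, here is a minimal sketch of the gateway-side authorization check after the JWT has been decoded. The helper names are illustrative, not the platform's actual API; in production the Firebase Admin SDK verifies the token signature before any of this runs.

```typescript
// Typed view of the custom claims shown above.
interface CustomClaims {
  orgId: string;
  role: string;
  tier: string;
  agentLimit: number;
  features: string[];
}

// Reject requests whose decoded claims do not cover the requested capability.
// Every request must be org-scoped; feature gates come from the billing tier.
function authorize(claims: CustomClaims, requiredFeature: string): boolean {
  if (!claims.orgId) return false;
  return claims.features.includes(requiredFeature);
}

const claims: CustomClaims = {
  orgId: "org_abc123",
  role: "admin",
  tier: "growth",
  agentLimit: 10,
  features: ["multi-agent", "custom-tools", "priority-scheduling"],
};

console.log(authorize(claims, "multi-agent")); // true
console.log(authorize(claims, "sso"));         // false
```

Keeping the check to a pure function over decoded claims makes it easy to enforce the same policy at every layer, not just the gateway.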
Layer 2: State Management with Firestore
Firestore serves as the system of record for all mutable state. Agent configurations, task records, session metadata, and user preferences all live here. Real-time listeners push updates to connected clients without polling.
// Agent configuration document structure
// Collection: organizations/{orgId}/agents/{agentId}
{
  "role": "marketing",
  "status": "active",
  "model": "claude-opus-4-6",
  "mcpConfig": {
    "tools": ["bash", "git", "web-search", "agent-hub"],
    "permissions": ["read-repo", "write-files", "publish"]
  },
  "scheduling": {
    "scaleToZero": true,
    "maxConcurrency": 3,
    "priorityClass": "standard"
  },
  "memory": {
    "compactionThreshold": 80000,
    "persistAcrossSessions": true
  }
}
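A typed sketch of that document and its collection path makes the org-scoped layout explicit. The interface and helper below are illustrative (the real schema may carry additional fields):

```typescript
// Typed view of the agent configuration document shown above.
interface AgentConfig {
  role: string;
  status: "active" | "paused";
  model: string;
  mcpConfig: { tools: string[]; permissions: string[] };
  scheduling: { scaleToZero: boolean; maxConcurrency: number; priorityClass: string };
  memory: { compactionThreshold: number; persistAcrossSessions: boolean };
}

// Build the Firestore document path for an agent. Nesting agents under the
// organization document is what enforces tenant isolation at the data layer.
function agentDocPath(orgId: string, agentId: string): string {
  return `organizations/${orgId}/agents/${agentId}`;
}

console.log(agentDocPath("org_abc123", "marketing"));
// organizations/org_abc123/agents/marketing
```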
Read more about our Firestore patterns in Firestore as State Store for AI Agents.
Layer 3: Event-Driven Communication via NATS
NATS JetStream is the nervous system of agent.ceo. Every inter-agent message, task assignment, status update, and coordination signal flows through NATS subjects. JetStream provides persistence, replay, and at-least-once delivery, with exactly-once processing semantics available through message deduplication and double acknowledgments.
Subject hierarchy:
genbrain.agents.{role}.inbox - Direct messages to an agent
genbrain.agents.{role}.tasks - Task assignments and updates
genbrain.agents.{role}.meetings - Meeting coordination signals
genbrain.org.{orgId}.events - Organization-wide broadcasts
genbrain.system.health - Health check heartbeats
The event-driven model means agents are decoupled. A CEO agent can delegate to a CTO agent without knowing where it runs, what model it uses, or whether it is currently active. NATS handles routing, buffering, and delivery. See Event-Driven Architecture with NATS for AI Systems for the full messaging design.
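The subject hierarchy above can be captured in a pair of builder functions, so that no agent ever hand-assembles a subject string. This is a sketch with illustrative helper names, not the platform's actual API:

```typescript
// Channels available under genbrain.agents.{role}.*
type AgentChannel = "inbox" | "tasks" | "meetings";

// Direct-to-agent subjects: messages, task assignments, meeting signals.
function agentSubject(role: string, channel: AgentChannel): string {
  return `genbrain.agents.${role}.${channel}`;
}

// Organization-wide broadcast subject.
function orgSubject(orgId: string): string {
  return `genbrain.org.${orgId}.events`;
}

console.log(agentSubject("cto", "inbox")); // genbrain.agents.cto.inbox
console.log(orgSubject("org_abc123"));     // genbrain.org.org_abc123.events
```

Centralizing subject construction also makes it easy to enforce tenant isolation: an agent only ever receives subjects built from its own role and organization.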
Layer 4: Compute on GKE
Each agent runs as a Kubernetes pod on Google Kubernetes Engine. The pod contains the Claude Code CLI, MCP server configurations, and a sidecar for NATS connectivity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-marketing
  labels:
    app: agent-ceo
    role: marketing
spec:
  replicas: 1
  selector:
    matchLabels:
      role: marketing
  template:
    metadata:
      labels:
        role: marketing
    spec:
      containers:
        - name: agent
          image: gcr.io/genbrain/agent-runtime:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          env:
            - name: AGENT_ROLE
              value: "marketing"
            - name: NATS_URL
              valueFrom:
                secretKeyRef:
                  name: nats-credentials
                  key: url
        - name: nats-sidecar
          image: gcr.io/genbrain/nats-bridge:latest
Horizontal Pod Autoscaling adjusts replica count based on queue depth and active task count. Scale-to-zero ensures idle agents consume no compute. Learn more in Scaling AI Agents: From 1 to 100 Concurrent Workers.
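The core of the queue-depth scaling policy can be expressed as a small function. This is an illustrative sketch (the production setup uses HPA with custom metrics; the numbers here are assumptions):

```typescript
// Compute the desired replica count from the current queue depth.
// Returns 0 when the queue is empty, implementing scale-to-zero for idle agents.
function desiredReplicas(
  queueDepth: number,
  tasksPerReplica: number,
  maxReplicas: number,
): number {
  if (queueDepth === 0) return 0;
  return Math.min(Math.ceil(queueDepth / tasksPerReplica), maxReplicas);
}

console.log(desiredReplicas(0, 5, 10));   // 0  (idle: scale to zero)
console.log(desiredReplicas(12, 5, 10));  // 3  (ceil(12 / 5))
console.log(desiredReplicas(100, 5, 10)); // 10 (capped at maxReplicas)
```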
Layer 5: Tool Access via MCP
The Model Context Protocol gives agents structured access to external tools. Each agent's MCP configuration defines which tools it can use, with what permissions, and under what constraints.
{
  "mcpServers": {
    "agent-hub": {
      "command": "npx",
      "args": ["@genbrain/mcp-agent-hub"],
      "env": {
        "AGENT_ID": "${AGENT_ROLE}",
        "NATS_URL": "${NATS_URL}"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/workspace"]
    },
    "git": {
      "command": "npx",
      "args": ["@genbrain/mcp-git"],
      "env": {
        "REPO_PATH": "/workspace/repo"
      }
    }
  }
}
MCP is what transforms a language model from a text generator into an autonomous agent. See MCP (Model Context Protocol) for Tool Integration for implementation details.
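The `${VAR}` placeholders in the env blocks above are substituted with the pod's environment before each MCP server launches. A sketch of that substitution step (the helper is hypothetical; the actual resolution happens inside the agent runtime):

```typescript
// Expand ${VAR} placeholders in an MCP server env block using a set of
// resolved environment variables. Unknown variables become empty strings.
function expandEnv(
  template: Record<string, string>,
  env: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(template)) {
    out[key] = value.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? "");
  }
  return out;
}

const resolved = expandEnv(
  { AGENT_ID: "${AGENT_ROLE}", NATS_URL: "${NATS_URL}" },
  { AGENT_ROLE: "marketing", NATS_URL: "nats://nats:4222" },
);
console.log(resolved.AGENT_ID); // marketing
console.log(resolved.NATS_URL); // nats://nats:4222
```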
Layer 6: Task Management
Tasks flow through a hierarchical system with defined lifecycle states: created, assigned, accepted, in_progress, completed, blocked, or delegated. Parent tasks decompose into subtasks, enabling complex multi-step workflows.
Task Lifecycle:
created -> assigned -> accepted -> in_progress -> completed
\-> blocked (with blocker reason)
\-> delegated (to another agent)
The task system integrates with NATS for real-time dispatch and Firestore for persistence. Agents pull tasks from their queue, report progress, and publish completion events. For the complete task management design, see Task Management Systems for Autonomous AI.
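The lifecycle above is naturally a state machine. The transition table below is inferred from the diagram (the branch points for blocked and delegated are assumptions; the platform's internal representation may differ):

```typescript
type TaskState =
  | "created" | "assigned" | "accepted" | "in_progress"
  | "completed" | "blocked" | "delegated";

// Allowed transitions, inferred from the lifecycle diagram.
const transitions: Record<TaskState, TaskState[]> = {
  created: ["assigned"],
  assigned: ["accepted", "delegated"],
  accepted: ["in_progress"],
  in_progress: ["completed", "blocked"],
  blocked: ["in_progress"],  // resumes once the blocker clears
  delegated: ["assigned"],   // re-enters the flow under another agent
  completed: [],             // terminal
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return transitions[from].includes(to);
}

console.log(canTransition("in_progress", "completed")); // true
console.log(canTransition("completed", "assigned"));    // false
```

Validating transitions centrally means a bad status update from any agent is rejected before it reaches Firestore.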
Layer 7: Context and Memory
AI agents have finite context windows. agent.ceo manages this constraint through compaction (summarizing old context while preserving key information) and a cross-session memory system that persists learnings.
Context compaction triggers automatically when token usage exceeds a configurable threshold. The memory system stores patterns, decisions, and outcomes in a structured format that loads at session start. Details in Agent Context Management: Compaction and Memory.
Design Principles
Three principles guided every architectural decision:
- Loose coupling via events. Agents communicate through messages, never direct calls. This enables independent scaling, deployment, and failure isolation.
- State externalized to Firestore. Agents are stateless processes. All meaningful state lives in Firestore, making agents replaceable and recoverable.
- Tools over training. Rather than fine-tuning models for specific capabilities, we give agents tools via MCP. This makes capabilities composable and updatable without retraining.
These principles produce a system that scales horizontally, recovers from failures gracefully, and evolves without downtime. The architecture handles everything from a solo founder running one agent to an enterprise fleet of 100 concurrent workers processing thousands of tasks daily.
For hands-on setup instructions, see Getting Started with agent.ceo or Deploying AI Agents on Kubernetes.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo