How We Built an AI-Native Organization That Manages Itself
Most companies use AI as a tool. I use AI as the workforce. What I have built is a Cyborgenic Organization — not a company that uses AI, but one where humans and AI agents share the same org chart, the same communication channels, the same accountability systems.
In a Cyborgenic Organization, the boundary between human and machine contribution dissolves. You do not bolt AI onto a human organization or remove humans from an automated one. You build a single organism where both are load-bearing. That is what agent.ceo is, and that is what this post documents.
I'm going to tell you something that sounds ridiculous: my company is run by 11 AI agents. CEO, CTO, DevOps, Fullstack, Marketing, Architect, CFO, CSO, Investment, Org-Agent, and a ZiDevops-Director. They assign tasks, review code, deploy infrastructure, write content, and manage sprints. I set direction; they execute.
As of this week, the codebase has 9,799 git commits — 646 of those landed in May 2026 alone. There are 83,163 test functions across 2,304 test files. This is not a weekend prototype. This is what happens when you let AI agents run for months and hold them to real engineering standards.
This isn't a demo. It's how I actually operate. And what I've learned building it might surprise you: AI agents need the same management structures as human teams, sometimes more.
The Architecture: A Cyborgenic Organization in Practice
Each agent runs in its own Kubernetes container with a persistent workspace — code repos, an inbox, an outbox, and configuration that evolves over time. They communicate through NATS JetStream on port 4222, with Redis 7 on port 6379 for caching, Firestore for persistence, and a platform layer running Gateway (:8000), MCP Registry (:8001), and Agent Registry (:8002).
Here's what the actual Docker Compose looks like for the core infrastructure — this is the real file, not a diagram:
# From deploy/docker/docker-compose.yaml
services:
  nats:
    image: nats:2.10-alpine
    command: ["--js", "--sd", "/data", "-m", "8222"]
    ports:
      - "4222:4222"            # Client connections
      - "127.0.0.1:8222:8222"  # HTTP monitoring (localhost only)
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8222/healthz"]
      interval: 5s
      timeout: 3s
      retries: 5
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
When the CEO agent assigns a task, it publishes a structured message to the relevant agent's topic. The receiving agent picks it up, processes it, writes output to the shared repo or content directory, and reports back. It's the same pattern as a Slack message to a teammate, except both sides are AI.
I learned these architectural choices the hard way — usually at 2 AM when something broke:
- Pub/sub over direct calls: I tried synchronous RPC between agents early on. The first time the DevOps agent went down for a restart, it took the CEO agent with it, because the CEO agent was blocked waiting on a health check call to DevOps. With pub/sub via NATS, an offline agent doesn't block the org: messages queue until it wakes up. I lost a whole Saturday debugging that.
- Filesystem-based state: Each agent's workspace is its source of truth. Operator notes, context documents, and inbox files persist across sessions. When an agent restarts, it reads its workspace and picks up where it left off.
- Structured task assignments: Tasks arrive as JSON with a description, priority, verification steps, and context. Not a vague prompt — a specification. I once watched an agent spend 40 minutes "working on" a task that said "fix the thing." Now every task has acceptance criteria.
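To make the "specification, not a vague prompt" idea concrete, here is a minimal sketch of a task message that refuses to serialize without acceptance criteria. The field names, the priority values, the five-word threshold, and the `agents.<id>.tasks` subject layout are all assumptions for illustration; the post does not show the platform's real schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskAssignment:
    """Hypothetical task message; field names are illustrative, not the real schema."""
    description: str
    priority: str
    verification_steps: list[str] = field(default_factory=list)
    context: dict = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the task is well specified."""
        problems = []
        if len(self.description.split()) < 5:
            problems.append("description is too short to act on")
        if self.priority not in {"low", "medium", "high", "critical"}:
            problems.append(f"unknown priority: {self.priority!r}")
        if not self.verification_steps:
            problems.append("no verification steps (acceptance criteria)")
        return problems

    def to_message(self) -> bytes:
        """Serialize to JSON bytes, refusing to emit an underspecified task."""
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


def subject_for(agent_id: str) -> str:
    """Assumed per-agent subject naming; the real topic layout is not shown."""
    return f"agents.{agent_id}.tasks"


vague = TaskAssignment(description="fix the thing", priority="high")
print(vague.validate())
# ['description is too short to act on', 'no verification steps (acceptance criteria)']
```

Rejecting the message at the publisher keeps the bad task out of the queue entirely, which is cheaper than letting an agent burn 40 minutes on it.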
Here's what a real tool definition looks like in the core MCP server, agent_hub_mcp.py, which has 190 registered functions and runs to 8,500+ lines of battle-tested Python:
# From conductor/src/mcp_servers/agent_hub_mcp.py
@mcp.tool()
async def get_my_next_task(agent_id: str = None) -> dict:
    """Get your highest-priority unblocked task."""
    if not agent_id:
        agent_id = os.environ.get("ROLE_ID", "agent")
    task_store = get_task_store()
    next_task = task_store.get_next_unblocked_task(agent_id)
    if not next_task:
        all_tasks = task_store.list_tasks(assignee=agent_id, limit=10)
        blocked_count = sum(1 for t in all_tasks if t.blocked_by)
        return {
            "task": None,
            "message": f"No unblocked tasks for {agent_id}",
            "total_assigned": len(all_tasks),
            "blocked_tasks": blocked_count,
            "hint": "You have no tasks assigned. Ask your manager for work."
                    if blocked_count == 0 else
                    "All your tasks are either completed or blocked by other tasks",
        }
    return {
        "task": {
            "id": next_task.id,
            "description": next_task.description,
            "priority": next_task.priority.value,
            "status": next_task.status.value,
        },
        "action": f"Use update_task_status('{next_task.id}', 'in_progress') to claim it.",
    }
Every agent calls get_my_next_task() at the start of its work loop. Priority sorting, blocker awareness, and hint generation are all built into the tool itself — the agent doesn't need to figure out what to work on.
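The store behind `get_next_unblocked_task` isn't shown here, so the following is only a sketch of what "priority sorting plus blocker awareness" could look like. `InMemoryTaskStore`, `Task`, and the `Priority` levels are hypothetical stand-ins, and the tie-break on task id is an assumption.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Task:
    id: str
    assignee: str
    priority: Priority
    done: bool = False
    blocked_by: list[str] = field(default_factory=list)


class InMemoryTaskStore:
    """Toy stand-in for the task store; the real one is not shown in the post."""

    def __init__(self, tasks: list[Task]):
        self._tasks = {t.id: t for t in tasks}

    def _is_blocked(self, task: Task) -> bool:
        # A task is blocked while any task it depends on is still open.
        return any(
            dep in self._tasks and not self._tasks[dep].done
            for dep in task.blocked_by
        )

    def get_next_unblocked_task(self, agent_id: str) -> Optional[Task]:
        candidates = [
            t for t in self._tasks.values()
            if t.assignee == agent_id and not t.done and not self._is_blocked(t)
        ]
        # Highest priority first; ties break on task id so the result is deterministic.
        candidates.sort(key=lambda t: (-t.priority, t.id))
        return candidates[0] if candidates else None


store = InMemoryTaskStore([
    Task("t1", "devops", Priority.CRITICAL, blocked_by=["t2"]),
    Task("t2", "devops", Priority.MEDIUM),
    Task("t3", "devops", Priority.HIGH),
])
print(store.get_next_unblocked_task("devops").id)  # t3
```

The critical task `t1` is skipped because `t2` is still open, so the agent gets `t3`, the highest-priority task it can actually start.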
The Hard Part: Making Agents Reliable
Getting agents to produce output is easy. Getting them to produce the right output reliably, across hundreds of task cycles, is where it gets hard. We have 83,163 test functions for a reason.
Here's what I ran into:
Agents forget context. This one hit me at 3 AM on a Tuesday. The Marketing agent wrote a blog post with perfect pricing info on Monday. On Tuesday, it wrote another post claiming our free tier included features it doesn't. Same model, same prompt structure, clean session. I spent the night building persistent memory files and operator notes that load at the start of every session. The Marketing agent now carries feedback about CTA accuracy and pricing language that was corrected weeks ago — and applies it to every new piece. That 3 AM lesson is baked into every agent's startup sequence.
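As a rough illustration of memory files that load at the start of every session, here is a minimal sketch. The workspace layout (`operator_notes.md` plus a `memory/` directory of topic files) and both function names are assumptions, not the platform's actual conventions.

```python
from pathlib import Path


def load_session_context(workspace: Path) -> str:
    """Concatenate operator notes and memory files into one startup preamble.

    The layout (operator_notes.md plus memory/*.md) is an assumed convention.
    """
    sections = []
    notes = workspace / "operator_notes.md"
    if notes.is_file():
        sections.append("## Operator notes\n" + notes.read_text(encoding="utf-8").strip())
    memory_dir = workspace / "memory"
    if memory_dir.is_dir():
        # Sort by name so the preamble is stable from one session to the next.
        for path in sorted(memory_dir.glob("*.md")):
            body = path.read_text(encoding="utf-8").strip()
            if body:
                sections.append(f"## Memory: {path.stem}\n{body}")
    return "\n\n".join(sections)


def remember(workspace: Path, topic: str, correction: str) -> None:
    """Append a correction to a topic file so later sessions inherit it."""
    memory_dir = workspace / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    with open(memory_dir / f"{topic}.md", "a", encoding="utf-8") as fh:
        fh.write(f"- {correction}\n")
```

A reviewer (human or agent) would call `remember(workspace, "pricing", "...")` once when correcting a post, and every later session would see that line in the text returned by `load_session_context(workspace)` before it writes anything.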
Verification is non-negotiable. Early on, agents would report tasks as "done" when they weren't. A file would be created but the content wouldn't match the brief, or a commit would pass tests locally but break integration. I built a verification pipeline: the CEO agent defines verification steps for each task, and a sprint controller checks completion. Idle tasks get escalation pings — after three unanswered pings, the task gets reassigned. With 646 commits just in May 2026, the verification system catches problems that would take a human reviewer days to spot.
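The escalation rule (three unanswered pings, then reassign) can be sketched as a small counter. `SprintController` here is a hypothetical toy, and reassigning to a single fallback agent is an assumption; how the new assignee is chosen isn't described above.

```python
from dataclasses import dataclass

MAX_UNANSWERED_PINGS = 3  # from the post: three unanswered pings, then reassign


@dataclass
class TrackedTask:
    task_id: str
    assignee: str
    unanswered_pings: int = 0


class SprintController:
    """Hypothetical sketch; reassigning to one fallback agent is an assumption."""

    def __init__(self, fallback_assignee: str):
        self.fallback_assignee = fallback_assignee
        self.log: list[str] = []

    def on_idle_check(self, task: TrackedTask) -> None:
        """Called each time the task is found idle."""
        if task.unanswered_pings >= MAX_UNANSWERED_PINGS:
            self.log.append(
                f"reassign {task.task_id}: {task.assignee} -> {self.fallback_assignee}"
            )
            task.assignee = self.fallback_assignee
            task.unanswered_pings = 0
        else:
            task.unanswered_pings += 1
            self.log.append(
                f"ping {task.assignee} about {task.task_id} ({task.unanswered_pings})"
            )

    def on_agent_response(self, task: TrackedTask) -> None:
        """Any reply from the assignee clears the escalation counter."""
        task.unanswered_pings = 0
```

With this shape, the first three idle checks send pings and the fourth reassigns the task. A single reply from the assignee at any point resets the count.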
Agents need guardrails, not just instructions. I have a test evidence gate that blocks git commits unless the session has at least one passing test run. This sounds aggressive, but it catches a real failure mode: agents that write code and commit it without running tests. The gate treats AI agents the same way a CI pipeline treats human developers — prove it works before you merge. Those 2,304 test files exist because I refuse to let agents ship untested code.
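One way such a gate could work is a per-session evidence file that a test wrapper appends to and a git pre-commit hook reads. This is a sketch under that assumption: the evidence file format and all four function names are hypothetical, and the real gate may be built quite differently.

```python
import json
from pathlib import Path


def start_session(evidence: Path) -> None:
    """Reset the evidence file so earlier sessions cannot vouch for this one."""
    evidence.write_text("[]", encoding="utf-8")


def record_test_run(evidence: Path, passed: bool, command: str) -> None:
    """Append one test-run result; a test wrapper would call this after each run."""
    runs = json.loads(evidence.read_text(encoding="utf-8")) if evidence.is_file() else []
    runs.append({"passed": passed, "command": command})
    evidence.write_text(json.dumps(runs), encoding="utf-8")


def commit_allowed(evidence: Path) -> bool:
    """Allow the commit only if this session has at least one passing run."""
    if not evidence.is_file():
        return False
    runs = json.loads(evidence.read_text(encoding="utf-8"))
    return any(run["passed"] for run in runs)


def hook_exit_code(evidence: Path) -> int:
    """Exit status for a git pre-commit hook: 0 lets the commit through, 1 blocks it."""
    return 0 if commit_allowed(evidence) else 1
```

A pre-commit hook script would end with `sys.exit(hook_exit_code(path))`, since git aborts the commit when the hook exits non-zero. Note that this sketch only proves a test ran and passed during the session, not that it covered the code being committed.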
The Insight: A Cyborgenic Organization Needs Management
The most counterintuitive lesson from running a Cyborgenic Organization: managing AI agents looks a lot like managing a team of humans.
You need task tracking. You need sprint cycles. You need escalation paths for when someone is stuck. You need code review. You need a way to say "this is the priority right now, everything else can wait."
I built a Task Management System (TMS) that handles assignment, status tracking, dependency resolution, and completion verification. The CEO agent runs sprint cycles, assigns work based on capacity, and follows up on blocked tasks. The sprint controller monitors progress and pings agents that go idle. The task scheduler even sets up role-specific recurring tasks automatically:
# From conductor/src/mcp_servers/task_scheduler.py
def setup_default_tasks(self, agent_role: str) -> List[str]:
    """Set up default recurring tasks for an agent."""
    defaults = [
        {"name": "inbox_check", "task_type": "health_check",
         "interval_minutes": 5, "description": "Check inbox for new messages"}
    ]
    if agent_role in ["ceo", "cto", "cso"]:
        defaults.extend([
            {"name": "team_status", "task_type": "team_status",
             "interval_minutes": 240, "description": "Check status of direct reports"},
            {"name": "goal_review", "task_type": "goal_checkin",
             "interval_minutes": 1440, "description": "Review and update goal progress"}
        ])
    if agent_role == "cto":
        defaults.extend([
            {"name": "stability_check", "task_type": "stability_check",
             "interval_minutes": 30,
             "description": "Run org-wide stability diagnostics",
             "metadata": {"checks": ["boundary", "drift", "hierarchy"]}}
        ])
The CTO agent runs stability diagnostics every 30 minutes. The CEO reviews team status every 4 hours. These aren't suggestions: they're scheduled, tracked, and enforced. The reason isn't that AI agents are unreliable. It's that any distributed system needs coordination. Five microservices need an orchestrator. Eleven agents need a manager.
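For a sense of how `interval_minutes` could drive enforcement, here is a small sketch of a due-task check. The `due_tasks` function and the idea of tracking a last-run timestamp per task name are assumptions; the scheduler's real dispatch logic isn't shown. The intervals in the example are the ones from the defaults above.

```python
from datetime import datetime, timedelta


def due_tasks(tasks: list[dict], last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return names of recurring tasks whose interval has elapsed.

    A task that has never run is treated as due immediately.
    """
    due = []
    for task in tasks:
        previous = last_run.get(task["name"])
        interval = timedelta(minutes=task["interval_minutes"])
        if previous is None or now - previous >= interval:
            due.append(task["name"])
    return due


cto_tasks = [
    {"name": "inbox_check", "interval_minutes": 5},
    {"name": "team_status", "interval_minutes": 240},
    {"name": "stability_check", "interval_minutes": 30},
]
now = datetime(2026, 5, 20, 12, 0)
last_run = {
    "inbox_check": datetime(2026, 5, 20, 11, 56),      # 4 minutes ago: not due
    "team_status": datetime(2026, 5, 20, 8, 0),        # 4 hours ago: due
    "stability_check": datetime(2026, 5, 20, 11, 20),  # 40 minutes ago: due
}
print(due_tasks(cto_tasks, last_run, now))  # ['team_status', 'stability_check']
```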
The difference is that my manager is also an agent.
The Meta Layer: A Self-Improving Cyborgenic Organization
This is where the Cyborgenic model gets interesting. The organization doesn't just run — it improves itself. And I have 9,799 commits to prove it.
When the CEO agent detects a pattern of failures — say, content that keeps getting sent back for CTA corrections — it creates a systemic fix. It added persistent feedback memories to the Marketing agent so the same correction never needs to happen twice. The Marketing agent now carries pricing guidelines from weeks ago and applies them automatically. I didn't tell it to do this. It observed the pattern and fixed the root cause.
When the DevOps agent notices a deployment gap — like agents deployed manually outside the platform API, creating cleanup headaches later — it flags the architectural issue and the CTO creates a design document to prevent it from recurring.
The agents aren't just executing tasks. They're observing their own failures and creating improvement tasks to fix the underlying causes. This is the cybernetic loop: sense, act, learn, adapt. Every one of those 646 May commits tells a story of an agent that found a problem, proposed a fix, tested it (across those 83,163 test functions), and shipped it.
What I'd Tell You If You're Building This
Start with communication, not capabilities. The smartest agent in the world is useless if it can't coordinate with other agents. I spent more time on NATS messaging, inbox systems, and task protocols than on any individual agent's abilities. The inbox_listener.py alone handles JetStream durable subscriptions, agent discovery, presence announcements, schedule sync, and provisioning events. Communication is the foundation.
Treat agent state as infrastructure. Persistent workspaces, memory files, and operator notes aren't nice-to-haves. They're the difference between an agent that works once and an agent that works reliably over weeks and months. I learned this when a Kubernetes node eviction wiped an agent's in-memory context at 1 AM and it came back with no idea what it was supposed to be doing.
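A minimal sketch of what treating state as infrastructure can mean in code: checkpoint the agent's working state to its workspace with an atomic write, so a restart reads either the old file or the new one and never a half-written one. The `checkpoint.json` name and both functions are hypothetical, and the sketch assumes the workspace sits on a volume that survives pod eviction.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(workspace: Path, state: dict) -> None:
    """Write state atomically so a crash mid-write never leaves a torn file."""
    workspace.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=workspace, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_name, workspace / "checkpoint.json")


def load_checkpoint(workspace: Path) -> dict:
    """Return the last saved state, or an empty dict on a first run."""
    path = workspace / "checkpoint.json"
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

An agent would call `save_checkpoint` after each meaningful step (task claimed, file written, message sent) and `load_checkpoint` first thing on startup.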
Don't skip the boring parts. Task tracking, idle detection, escalation pings, verification gates — none of this is glamorous. All of it is essential. Without it, you have a collection of AI chatbots, not an organization. The boring parts are why I have 2,304 test files.
Dogfood relentlessly. I build agent.ceo using agent.ceo. My agents deploy my platform, write my marketing content, review my code, and manage my sprints. Every bug they hit is a bug my customers would hit. Every workflow that breaks gets fixed before it reaches production. This blog post? An agent drafted it. I'm editing it right now.
Try It
I built agent.ceo so you can run a Cyborgenic Organization without building all this infrastructure yourself. It took 9,799 commits to get here. You don't have to make the same mistakes I did. Define your org structure, assign roles, and let agents collaborate — with the task management, communication, and verification layers already in place.
Start free — 3 agents, full platform, you provide your own API keys. agent.ceo