The Agent Operations Stack: Everything You Need to Run AI Agents as a Real Organization
TL;DR
- Six operational capabilities — meetings, email, SLAs, discovery, memory management, and tool filtering — now work together as a single four-layer stack.
- A concrete scenario shows all six layers firing in sequence: transcript ingestion, SLA enforcement, knowledge graph queries, email drafting, memory compaction, and context optimization.
- This is what a cyborgenic organization looks like in production: not a chatbot with plugins, but a complete operational infrastructure.
Four weeks ago, we had agents that could execute tasks. Today those agents run inside a complete operational infrastructure — meetings, SLAs, organizational discovery, email, memory management, and role-based tool access. Not one feature at a time. All of it, working together.
This post is not a feature list. It is a map of the operational stack that turns a collection of AI agents into a cyborgenic organization — one where humans and agents share the same communication channels, accountability systems, institutional knowledge, and resource management. We have now built all four layers.
Layer 1: Communication — Meetings API and Email Pipeline
Organizations communicate through two primary channels: scheduled meetings and asynchronous messages. We shipped both.
Meetings API
The Meetings API handles the full lifecycle of meeting-driven work. Transcripts go in. Tracked tasks come out.
Four endpoints handle the flow: ingest a transcript, extract action items, create tasks in the TMS, and query meeting history. The API authenticates through Firebase, persists everything to Firestore, and connects directly to the task management system so extracted action items become real, SLA-tracked tasks — not notes in a doc that someone might read next week.
The extraction is not a simple keyword scan. The system parses conversational transcripts to identify commitments, owners, and deadlines embedded in natural language. "I'll have the API spec ready by Thursday" becomes a task assigned to the speaker with a Thursday deadline. "Can someone look into the latency spike?" becomes a task flagged for triage.
We shipped twenty-eight tests covering transcript parsing, action extraction, task creation, auth flows, and edge cases like meetings with no actionable items. This is production infrastructure, not a demo.
Why it matters: meetings are where decisions happen and work gets assigned. If your agents cannot process meeting output, they are disconnected from a primary source of organizational direction. The Meetings API closes that gap. Decisions made in a meeting become tracked work within minutes, not days.
Email Pipeline
The email pipeline handles the other half of organizational communication — the asynchronous, unstructured, high-volume stream of inbound messages that every business runs on.
The architecture is seven steps from inbox to reply: Gmail poller pulls new messages, the intent classifier categorizes them (sales inquiry, support request, partnership pitch, internal ask), NATS routes each message to the right agent, the agent drafts a response, and the draft enters a human approval queue. A human reviews, optionally edits, and approves. The reply goes out.
Fifty-seven tests cover the full pipeline. The approval layer is what makes this production-ready. Agents that send email autonomously are a liability. Agents that draft email and wait for a thumbs-up are a force multiplier.
The intent classifier is the hinge. It reads raw email and decides which agent should handle it — without anyone maintaining routing rules, shared inboxes, or forwarding chains. The classifier adapts to your organization's communication patterns, not the other way around.
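The classify-then-route step can be sketched as a lookup from intent to NATS subject. This is a deliberately naive keyword version — the real classifier is model-based — and the intent names, keywords, and subject strings are all hypothetical:

```python
# Hypothetical intent -> NATS subject routing table.
ROUTES = {
    "sales":       "agents.sales.inbox",
    "support":     "agents.support.inbox",
    "partnership": "agents.bizdev.inbox",
    "internal":    "agents.ops.inbox",
}

# Toy keyword lists standing in for the real model-based classifier.
KEYWORDS = {
    "sales":       ("pricing", "quote", "demo"),
    "support":     ("error", "broken", "bug", "latency"),
    "partnership": ("partner", "integrate", "collaboration"),
}

def classify(subject: str, body: str) -> str:
    text = f"{subject} {body}".lower()
    for intent, words in KEYWORDS.items():
        if any(w in text for w in words):
            return intent
    return "internal"   # default bucket for unmatched mail

def route(subject: str, body: str) -> str:
    """Return the NATS subject this message should be published to."""
    return ROUTES[classify(subject, body)]
```

The useful property is that the routing table is the only per-organization configuration; everything upstream of it is learned from traffic rather than maintained by hand.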
Layer 2: Accountability — Sprint SLA Enforcement
Communication without accountability is just noise. If an agent accepts a task and goes silent, you need to know — fast.
Four weeks ago, a dropped task could sit unnoticed for seven hours. Through two iterations of SLA tightening, we brought that number down to twenty-five minutes. The key insight was architectural: unaccepted tasks and stuck tasks are different failure modes requiring different detection.
Here is the current enforcement timeline:
- T+0: Task assigned to agent.
- T+5m: No acceptance detected. First ping sent. The ping reminds the agent it can run up to 3 tasks in parallel via sub-agents.
- T+15m: Still no acceptance. Second ping sent.
- T+25m: Max pings exceeded. Task reassigned to another agent.
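The timeline above reduces to a small decision function. This is a sketch of the escalation logic only — the function name and return values are illustrative, and the real enforcer runs as a scheduled job against the task registry:

```python
def sla_action(minutes_since_assign: int, accepted: bool,
               pings_sent: int, max_pings: int = 2) -> str:
    """Decide the next SLA step for a task.

    Thresholds mirror the timeline: ping at T+5m and T+15m,
    reassign at T+25m once max pings are exhausted.
    """
    if accepted:
        return "none"
    if minutes_since_assign >= 25 and pings_sent >= max_pings:
        return "reassign"          # hand the task to another agent
    if minutes_since_assign >= 15 and pings_sent == 1:
        return "ping"              # second ping
    if minutes_since_assign >= 5 and pings_sent == 0:
        return "ping"              # first ping
    return "wait"
```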
But the SLA system is more than a timer. Three hooks run inside every agent session:
Auto-acknowledge receipt. The agent's harness sends an acceptance signal before the reasoning loop even starts. This separates "the agent received the message" from "the agent decided to work on it" — a distinction that matters for diagnostics.
Progress tracking. Thirty minutes without a progress update triggers an internal warning. This catches zombie tasks: the agent that accepted work, made one API call, and got stuck in a retry loop.
Completion gates. Before an agent session stops — whether from a command, a context limit, or a crash — the harness checks for incomplete tasks. No more silent drops when an agent hits its context window and exits cleanly while its second task vanishes.
The whole system runs on pull-based task discovery. Tasks live in a shared registry, not in message queues. They survive NATS message loss, pod restarts, and node evictions. When an agent boots up after a crash, it queries the registry and picks up where it left off. Tasks are durable because the registry is durable — not because we hope the message bus never drops a packet.
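The recovery path can be sketched with an in-memory stand-in for the durable registry (the real one is backed by persistent storage; these function names are illustrative):

```python
# Toy in-memory stand-in for the durable task registry.
REGISTRY: dict[str, dict] = {}

def register(task_id: str, agent: str, status: str = "assigned") -> None:
    REGISTRY[task_id] = {"agent": agent, "status": status}

def recover(agent: str) -> list[str]:
    """On boot, an agent queries the registry for its unfinished work
    instead of waiting for messages that may have been lost in transit."""
    return [tid for tid, task in REGISTRY.items()
            if task["agent"] == agent and task["status"] != "done"]
```

Because recovery is a query rather than a replay, a restarted agent needs no message history: whatever is still open in the registry is, by definition, its work.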
Layer 3: Knowledge — Discovery Engine
Accountability tells you whether work is getting done. Knowledge tells agents what work to do and who to involve. The Discovery Engine builds a machine-readable map of your organization that agents can query in real time.
Three connectors write to a single Neo4j graph:
Slack maps communication topology. Who talks to whom, in which channels, about what. Engagement scoring over 7-day windows, channel purpose inference from conversation content, user group discovery that reflects how teams actually cluster — not what the org chart says. Twenty-seven tests.
CI/CD maps deployment topology. Six pipeline formats parsed out of the box — GitHub Actions, GitLab CI, Jenkins, Azure Pipelines, CircleCI, and ArgoCD. For each pipeline: stages, environments, deploy targets, secret references, test frameworks. The full path from commit to production, extracted from config files with no credentials required.
Cloud maps infrastructure topology. VMs, databases, networks, and storage buckets across GCP, AWS, and Azure. Four MCP tools and six API endpoints let agents query cloud state directly. Forty tests.
The value is in the connections. One graph, one query language. An agent can traverse from a Slack channel discussing payment failures to the VM running the payment service to the pipeline that last deployed it. No three separate API calls. No asking a human to connect the dots. The graph links people, processes, and infrastructure into a single traversable topology.
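The payment-failure traversal above might look like the following. The node labels, relationship types, and names here are assumptions for illustration — the actual graph schema may differ — and the edge list is a toy stand-in for Neo4j that shows the traversal shape:

```python
# Hypothetical Cypher for the payments example; labels and relationships
# are illustrative, not the production schema.
CYPHER = """
MATCH (c:SlackChannel {name: $channel})-[:DISCUSSES]->(s:Service),
      (s)-[:RUNS_ON]->(vm:VM),
      (p:Pipeline)-[:DEPLOYS]->(s)
RETURN s.name, vm.name, p.name
"""

# Toy in-memory edge list standing in for the graph.
EDGES = {
    ("SlackChannel", "payments-alerts"): [("DISCUSSES", "Service", "payment-svc")],
    ("Service", "payment-svc"): [("RUNS_ON", "VM", "vm-prod-7"),
                                 ("DEPLOYED_BY", "Pipeline", "deploy-payments")],
}

def traverse(start: tuple[str, str], rel: str) -> list[tuple[str, str]]:
    """Follow one relationship type out of a (label, name) node."""
    return [(label, name) for r, label, name in EDGES.get(start, []) if r == rel]
```

Two hops take an agent from a Slack channel to the VM running the service it discusses — the "one graph, one query language" point in miniature.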
This is institutional knowledge that stays current because the connectors keep running. Not a wiki that was accurate six months ago.
Layer 4: Resource Management — Memory Governor and Tool Filtering
The first three layers give agents communication, accountability, and knowledge. The fourth keeps them running long enough to use all of it.
Memory Governor
AI agents are not stateless services. They accumulate context as they work — task history, code understanding, conversation state, in-flight tool calls. When a Kubernetes pod's memory fills up, the Linux OOM-killer sends SIGKILL. No callback. No graceful shutdown. Ninety minutes of accumulated context, gone.
The memory governor is a continuous cgroup monitor that escalates through three tiers before the kernel gets involved:
- Tier 1 (70% memory): Compact. The agent summarizes its own context window, preserving essential information while shedding raw bulk. Lossless. The agent keeps working.
- Tier 2 (85% memory): Clear. Drops the live context entirely but preserves the session. The agent reloads from its last checkpoint. Only fires if Tier 1 was already attempted.
- Tier 3 (90% memory): Archive and terminate. Writes session state to persistent storage, then sends SIGTERM — not SIGKILL. The agent gets a clean shutdown. Only fires if compaction was attempted first.
Adaptive polling drops from 60-second intervals to 15-second intervals when memory crosses 80%. This matters because we have observed the kernel OOM-kill a container in under 60 seconds during fast memory growth. The governor needs to be faster than the kernel, and now it is.
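The tier selection and adaptive polling reduce to two small functions. This is a sketch of the decision logic only (names are illustrative; the real governor reads cgroup stats and drives the agent harness):

```python
def governor_action(mem_pct: float, compact_attempted: bool) -> str:
    """Map cgroup memory utilization to a tier action.

    Thresholds mirror the tiers above; Tier 2 and Tier 3 only fire
    after a Tier 1 compaction was already attempted.
    """
    if mem_pct >= 90 and compact_attempted:
        return "archive_and_sigterm"   # Tier 3: clean shutdown, not SIGKILL
    if mem_pct >= 85 and compact_attempted:
        return "clear_context"         # Tier 2: reload from last checkpoint
    if mem_pct >= 70:
        return "compact"               # Tier 1: summarize context in place
    return "none"

def poll_interval(mem_pct: float) -> int:
    """Adaptive polling: tighten from 60s to 15s above 80% utilization."""
    return 15 if mem_pct >= 80 else 60
```

Note that at 92% with no prior compaction the function still returns `"compact"` — escalation is gated on Tier 1 having been tried, exactly as described above.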
The result: agents that run indefinitely. Memory pressure events resolve at Tier 1 in the common case. The agent keeps working. The user never notices. When Tier 2 or 3 does fire, you get a clean restart with archived state instead of a cold boot with nothing.
Role-Based Tool Filtering
The second resource management layer is subtler but just as important. Our platform exposes roughly 90 MCP tools — everything from task management and email to cloud discovery and wiki editing. An IC (individual contributor) agent working on a code task does not need email tools, meeting tools, or fleet management tools. Loading all 90 into its context window wastes tokens and degrades reasoning quality.
Role-based tool filtering trims the set. An IC agent gets 70 tools instead of 90. The tools it loses are ones it would never use — and the tokens freed up go toward better reasoning on the tools it actually needs.
This is not a security measure (though it has security benefits). It is a cognitive load optimization. Fewer irrelevant options means better decisions on relevant ones. The same principle applies to human developers — nobody works better with a toolbar they do not understand.
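Mechanically, the filter is a trim applied before the tool list enters the context window. The role names and tool-name prefixes below are hypothetical — actual tool names and role policies vary:

```python
# Hypothetical role -> excluded tool-prefix policy.
EXCLUDED_PREFIXES: dict[str, tuple[str, ...]] = {
    "ic":      ("email.", "meetings.", "fleet."),
    "manager": ("fleet.",),
    "admin":   (),   # admins see everything
}

def tools_for_role(role: str, all_tools: list[str]) -> list[str]:
    """Trim the MCP tool list before it enters the agent's context window."""
    excluded = EXCLUDED_PREFIXES.get(role, ())
    return [t for t in all_tools if not t.startswith(excluded)]
```

An IC agent loading this filtered list spends its tokens on the tools it will actually call, which is the whole point.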
The Stack, Not the Features
The temptation with a product roundup is to list what shipped and move on. But the point of this post is that these are not six independent features. They are six capabilities organized into a four-layer operational stack, and the value comes from how they connect.
Here is a concrete scenario that touches every layer:
- A meeting happens. The Meetings API ingests the transcript and extracts an action item: "Research competitor pricing and draft a comparison doc."
- The action item becomes a TMS task. Sprint SLA enforcement starts the clock — the assigned agent has 5 minutes to acknowledge it.
- The agent accepts the task and starts researching. It queries the Discovery Engine to find which Slack channels discuss competitors, which team members have context, and what existing docs are in the wiki.
- While researching, the agent receives an inbound email from a partner asking about pricing. The email pipeline classifies the intent, routes it to the agent, and the agent drafts a reply that references its in-progress competitor analysis. The draft enters the approval queue.
- The research takes two hours. The agent's context window grows. The memory governor fires a Tier 1 compaction at the 90-minute mark, freeing memory without losing the research context. The agent keeps working.
- Throughout all of this, role-based tool filtering ensures the agent's context window is not cluttered with fleet management tools it will never use. More tokens go to reasoning about the actual task.
One scenario. Four layers, six capabilities. Each one makes the others more valuable.
That is what we mean by a cyborgenic organization. Not a chatbot with plugins, but a complete operational infrastructure for running AI agents alongside humans — with communication channels, accountability systems, institutional knowledge, and resource management that work together.
What Is Next
The stack is in place. The next phase is about depth within each layer. Richer meeting analysis with follow-up tracking. SLA policies that adapt per-agent based on historical performance. Discovery connectors for git history and calendar data. Email classification that improves from approval-queue feedback. Memory governor tuning based on per-agent workload profiles.
Every layer gets better with data, and now we are generating that data in production.
Try It
If you are building with AI agents and hitting the gap between "agent can do a task" and "agents can run an organization," that gap is exactly what we have been filling for the past four weeks.
The operational stack is live. The infrastructure is tested. The agents are running.
Build your own cyborgenic organization at agent.ceo.