DEEP_DIVE_LOG.txt

[10:13:08] SYSTEM: INITIATING_PLAYBACK...

The In-Pod Memory Governor: Graceful Degradation Before the Kernel Kills Your Agent

MAY 19, 2026 | AGENT.CEO TEAM | 7 MIN_READ
Technical · agents · kubernetes · memory · oom · cgroups · reliability


TL;DR

  • The Linux OOM-killer destroys AI agent state without warning -- we built a cgroup-aware memory governor that intervenes first.
  • A three-tier escalation ladder (compact, clear, archive-and-terminate) preserves progressively more state than a kernel kill ever could.
  • Adaptive polling drops from 60s to 15s as pressure rises, giving the governor four chances to act where it previously got one.

The Linux OOM-killer does not negotiate. It picks a process, sends SIGKILL, and moves on -- no callback, no graceful shutdown hook, no chance to save state. For a stateless web server that is fine; Kubernetes restarts the pod and traffic flows again in seconds. For an AI agent mid-session with 90 minutes of accumulated context, it is a catastrophe.

In a cyborgenic organization -- where AI agents operate as autonomous teammates with real responsibilities -- losing an agent's working memory is not just a technical hiccup. It is lost productivity, lost context, and lost trust in the system.

We run AI agents (Claude Code CLI processes) inside Kubernetes pods at agent.ceo. Each agent maintains a rich context window: task history, code understanding, conversation state, in-flight tool calls. When the kernel OOM-kills one of these processes, all of that vanishes. Crash counters reset. The session archival system never fires. The agent restarts cold, with no memory of what it was doing or why.

We got tired of losing work to a mechanism designed for 2005-era batch jobs. So we built mem_governor.sh -- a continuous cgroup memory monitor that escalates through progressively destructive recovery actions before the kernel gets involved.

Why AI Agents Have a Unique Memory Profile

Traditional containerized services have relatively predictable memory footprints. A web server allocates its connection pools at startup, maybe grows a bit under load, and settles into a steady state. You set your K8s memory limit with some headroom and forget about it.

AI agents are different. Their memory grows monotonically with context. Every tool call, every file read, every conversation turn adds to the context window the underlying model is processing. The CLI process holding that context grows with it. A fresh agent might sit at 400Mi. After two hours of active work — reading codebases, running builds, iterating on problems — it can push past 2Gi.

This growth is not a leak. It is the agent doing its job. But it creates a pattern where memory pressure can spike rapidly during certain operations. An agent booting up and loading its organizational context, reading a large codebase, or processing a long tool output can jump hundreds of megabytes in seconds. The kernel can go from "fine" to "OOM" in under 60 seconds during these bursts.

We recently bumped our CEO agent's memory limit from 256Mi to 3Gi because the old limit was simply too small for the work it needed to do. But even 3Gi is a ceiling the agent can hit during intensive sessions. The question was never "how do we prevent memory growth" — it was "how do we respond to it intelligently."

The Escalation Ladder

The memory governor implements a three-tier escalation ladder, ordered from least to most destructive:

Tier 1 — Compact (>=70% memory used). The governor sends a /compact command to the AI process. This is lossless. The AI summarizes its own context window, preserving the essential information while shedding the raw bulk. Think of it as the agent taking notes and then closing the reference books. Context quality is preserved. Memory drops. The agent keeps working without interruption.

Tier 2 — Clear (>=85% memory used). If compaction was already attempted in the last five minutes and memory is still climbing, the governor triggers /clear. This drops the live context entirely. It is more destructive — the agent loses its current working state — but the session itself survives. The agent can reload from its last checkpoint. This only fires if Tier 1 was already tried recently, preventing the governor from skipping straight to a harder reset.

Tier 3 — Archive and Terminate (>=90% memory used). The last resort before the kernel acts. The governor archives the current session — writing state to persistent storage where the session archival system can pick it up — then sends SIGTERM to the AI process. SIGTERM, not SIGKILL. The process gets a chance to clean up. Again, this only fires if compaction was attempted, ensuring we never jump to termination without trying the gentler options first.
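
Sketched in shell, the tier selection looks roughly like the function below. This is an illustrative reconstruction, not the actual mem_governor.sh: the helper functions (send_compact, send_clear, archive_and_terminate), the state-file format, and applying the five-minute window to Tier 3 as well as Tier 2 are assumptions.

    # Illustrative tier selection; helpers, state file, and the five-minute
    # window on Tiers 2 and 3 are assumptions, not the real script.
    decide_tier() {
        local pct=$1                                 # memory usage as a percentage of the limit
        local now last_compact
        now=$(date +%s)
        last_compact=$(cat "$STATE_FILE" 2>/dev/null || echo 0)

        if [ "$pct" -ge 90 ] && [ $(( now - last_compact )) -le 300 ]; then
            archive_and_terminate                    # Tier 3: archive the session, then SIGTERM
        elif [ "$pct" -ge 85 ] && [ $(( now - last_compact )) -le 300 ]; then
            send_clear                               # Tier 2: drop live context, keep the session
        elif [ "$pct" -ge 70 ]; then
            send_compact                             # Tier 1: lossless context compaction
            echo "$now" > "$STATE_FILE"              # remember when we last compacted
        fi
    }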

The key insight is that every tier preserves more state than a kernel OOM-kill would. Even the most destructive tier — archive and terminate — saves the session and allows a clean restart. An OOM-kill saves nothing.

Adaptive Polling: 60 Seconds Is Too Slow

The governor's default polling interval is 60 seconds. That is fine when memory is at 50% — there is plenty of runway. But when memory crosses 80%, the polling interval drops to 15 seconds.

Why? Because we observed that the kernel can OOM-kill a container in under 60 seconds during fast memory growth. An agent booting up, loading its profile, restoring from a state snapshot, and initializing its tool servers can allocate memory faster than a 60-second poll can catch it. At 80% utilization, every second of delay is risk. Fifteen-second polls give the governor four chances to act in the window where a 60-second poll would get one.

This adaptive approach keeps CPU overhead negligible during normal operation — reading a cgroup memory counter every 60 seconds costs effectively nothing — while providing the responsiveness we need when it matters.
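
Reading those numbers is cheap because they live in two files. Here is a minimal sketch of one poll iteration, assuming cgroup v2 mounted at /sys/fs/cgroup inside the container (cgroup v1 uses different file names, and memory.max reads the literal string "max" when no hard limit is set):

    # Illustrative cgroup v2 read plus the adaptive sleep.
    mem_current=$(cat /sys/fs/cgroup/memory.current)
    mem_max=$(cat /sys/fs/cgroup/memory.max)
    if [ "$mem_max" = "max" ]; then
        pct=0                                        # no hard limit set -- treat as no pressure
    else
        pct=$(( mem_current * 100 / mem_max ))
    fi

    if [ "$pct" -ge 80 ]; then
        sleep 15                                     # high pressure: four checks per minute
    else
        sleep 60                                     # normal operation: one check per minute
    fi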

Two Wiring Paths for Resilience

A governor that is not running is worse than no governor at all, because you assume it is protecting you. We wire mem_governor.sh into the pod lifecycle in two independent ways:

Path 1: Entrypoint background loop. Our unified entrypoint script (entrypoint_unified.sh) launches the governor as a background process with auto-restart. If the governor crashes, the entrypoint brings it back. This is the primary path — it starts early in the container lifecycle and runs for the duration.
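
A hypothetical fragment of such an entrypoint, with the script path and backoff interval as assumptions rather than a quote from the actual entrypoint_unified.sh:

    # Keep the governor alive for the life of the container.
    (
      while true; do
          /opt/agent/mem_governor.sh || true         # illustrative path; run the governor, tolerate crashes
          sleep 5                                    # brief backoff before restarting it
      done
    ) &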

Path 2: Kubernetes postStart hook. The agent_factory adds a lifecycle.postStart hook to the container spec that backgrounds the governor. This is a belt-and-suspenders measure. If the entrypoint path fails for any reason — a script error, a race condition during startup — the postStart hook provides a second chance.

Both paths respect a single-instance lock. If the governor is already running, a second copy detects the lock and exits immediately. No racing, no duplicate escalations.
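
One common way to implement that guard is flock on a well-known path; a sketch, with the lock file location as an assumption:

    # Single-instance guard: a second copy fails to take the lock and exits.
    LOCK_FILE="${AGENT_DATA:-/tmp}/mem_governor.lock"
    exec 9>"$LOCK_FILE"
    if ! flock -n 9; then
        echo "mem_governor already running, exiting" >&2
        exit 0
    fi
    # ... monitoring loop runs here while file descriptor 9 holds the lock ...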

State persists in $AGENT_DATA/config/mem_governor_state across wrapper restarts. If the AI process is restarted by Tier 3 or by the wrapper's own restart logic, the governor remembers what it has already tried. This prevents loops where the governor compacts, the process restarts, memory climbs again, and the governor compacts again without ever escalating.

The Packaging

The governor ships as part of our conductor package at mcp_servers/data/mem_governor.sh and also lives in deploy/docker/wrappers/ for base image builds. It is a single shell script with no dependencies beyond standard Linux utilities and access to the cgroup filesystem. No Python runtime, no node modules, no binary to compile. It runs anywhere a container runs.

Results

Since deploying the memory governor across our agent fleet, the picture has changed meaningfully. OOM-kills still happen — no userspace monitor can guarantee it wins a race against the kernel — but they are now the exception rather than the default failure mode. The majority of memory pressure events resolve at Tier 1 with a compaction. The agent keeps working. The user never notices.

When Tier 2 or Tier 3 does fire, we get clean restarts with archived sessions instead of cold boots with no history. Crash investigation is simpler because the archival system captured the state. And the agents themselves are more productive because they spend less time rebuilding context they already had.

The broader lesson is that running AI agents in containers requires rethinking failure modes. These are not stateless services. They accumulate valuable state as they work, and losing that state has a real cost — in time, in compute, and in the quality of the agent's output. The infrastructure needs to respect that.


Build your own cyborgenic organization at agent.ceo.

[10:13:08] SYSTEM: PLAYBACK_COMPLETE // END_OF_LOG
