Kubernetes Orchestration for AI Agent Workloads
AI agents are not traditional web services. They don't handle HTTP requests in a request-response cycle. They think, plan, execute multi-step tasks, and communicate with other agents. This makes their operational profile fundamentally different from a typical microservice — and Kubernetes needs to be configured accordingly.
At agent.ceo, every agent runs as a Kubernetes pod. This isn't just convenience — it's architectural. Kubernetes provides the scheduling, scaling, health checking, and resource isolation that autonomous agents need to operate safely in production.
Why Kubernetes for AI Agents
The properties that make Kubernetes ideal for AI agent workloads:
- Isolation: Each agent runs in its own pod with defined resource limits
- Scheduling: Kubernetes places agents on appropriate nodes based on resource needs
- Health checking: Liveness and readiness probes ensure agents are functioning
- Scaling: Horizontal pod autoscaling adapts to workload demands
- RBAC: Fine-grained permissions control what each agent can access
- Networking: Network policies restrict agent-to-agent communication
- Observability: Standard metrics, logging, and tracing integration
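The RBAC point is worth making concrete. A minimal sketch of a Role and RoleBinding for a DevOps agent's ServiceAccount (names are illustrative, chosen to match the `agent-devops` ServiceAccount used in the Deployment below; the verbs are an assumption about what a read-mostly agent needs):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-devops
  namespace: agent-system
rules:
  # Read-only access to the workloads the agent observes
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-devops
  namespace: agent-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: agent-devops
subjects:
  - kind: ServiceAccount
    name: agent-devops
    namespace: agent-system
```

Scoping a Role per agent role means a compromised or misbehaving agent can only touch the resources its job requires.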
The Agent Pod Architecture
Here's how a typical agent.ceo agent is deployed:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-devops
  namespace: agent-system
  labels:
    agent.ceo/role: devops
    agent.ceo/tier: operations
spec:
  replicas: 1
  selector:
    matchLabels:
      app: agent-devops
  template:
    metadata:
      labels:
        app: agent-devops
        agent.ceo/role: devops
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: agent-devops
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: agent
          image: gcr.io/agent-ceo/agent-runtime:latest
          args: ["--role=devops", "--config=/etc/agent/config.yaml"]
          ports:
            - name: metrics
              containerPort: 9090
            - name: health
              containerPort: 8080
          env:
            - name: NATS_URL
              value: "nats://nats.agent-system.svc:4222"
            - name: AGENT_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: AGENT_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: health
            initialDelaySeconds: 30
            periodSeconds: 60
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: health
            initialDelaySeconds: 10
            periodSeconds: 30
          volumeMounts:
            - name: agent-config
              mountPath: /etc/agent
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: agent-config
          configMap:
            name: agent-devops-config
        - name: workspace
          emptyDir:
            sizeLimit: 10Gi
```
Resource Management for AI Workloads
AI agents have bursty resource profiles. During analysis and planning, they consume significant CPU and memory. During execution, they might be I/O bound waiting for API responses. This requires careful resource configuration:
```yaml
# Resource profiles for different agent types
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-resource-profiles
  namespace: agent-system
data:
  profiles.yaml: |
    profiles:
      # Agents that analyze code and make decisions
      cognitive:
        requests:
          memory: "2Gi"
          cpu: "1000m"
        limits:
          memory: "8Gi"
          cpu: "4000m"
        nodeSelector:
          agent.ceo/workload-type: cognitive
      # Agents that execute commands and API calls
      executor:
        requests:
          memory: "512Mi"
          cpu: "250m"
        limits:
          memory: "2Gi"
          cpu: "1000m"
      # Agents that monitor and observe
      observer:
        requests:
          memory: "256Mi"
          cpu: "100m"
        limits:
          memory: "1Gi"
          cpu: "500m"
```
Scaling AI Agents
Unlike web services that scale on request count, AI agents scale based on task queue depth. Here's the HPA configuration:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-devops-hpa
  namespace: agent-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent-devops
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # Scale on pending tasks in the NATS queue
    - type: External
      external:
        metric:
          name: nats_consumer_pending_messages
          selector:
            matchLabels:
              consumer: devops-agent
              stream: agent-tasks
        target:
          type: AverageValue
          averageValue: "3"  # Scale up when >3 pending tasks per replica
    # Also consider memory pressure
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 120
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300
```
The custom metric comes from a NATS metrics exporter that exposes queue depths to Prometheus; the Prometheus adapter then surfaces it to the HPA through the external metrics API.
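The adapter side of that pipeline is a single rule. A sketch of a prometheus-adapter external-metric rule for the queue-depth metric (the series name and labels are assumptions carried over from the HPA above; the exact config schema depends on the adapter version deployed):

```yaml
# prometheus-adapter configuration fragment (sketch)
externalRules:
  - seriesQuery: 'nats_consumer_pending_messages{stream!="",consumer!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (consumer, stream)'
```

With this in place, `kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"` should list the metric, which is a quick way to debug an HPA that refuses to scale.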
Inter-Agent Communication via NATS
Agents communicate through NATS JetStream, which provides durable, at-least-once messaging between pods (with publish deduplication available when effectively-once delivery is needed):
```python
# Agent communication patterns in Kubernetes
import json
import uuid
from datetime import datetime, timezone

import nats  # nats-py client


class AgentCommunication:
    def __init__(self, agent_id, nats_url):
        self.agent_id = agent_id
        self.nats_url = nats_url
        self.nc = None
        self.js = None

    async def connect(self):
        self.nc = await nats.connect(
            self.nats_url,
            name=self.agent_id,
            max_reconnect_attempts=-1,  # Always reconnect
            reconnect_time_wait=2,
        )
        self.js = self.nc.jetstream()

    async def delegate_task(self, target_role, task):
        """Send a task to another agent by role."""
        await self.js.publish(
            f"tasks.{target_role}.new",
            json.dumps({
                "task_id": str(uuid.uuid4()),
                "from_agent": self.agent_id,
                "payload": task,
                "created_at": datetime.now(timezone.utc).isoformat(),
            }).encode(),
        )

    async def request_info(self, target_role, query, timeout=30):
        """Request-reply pattern for synchronous info exchange."""
        response = await self.nc.request(
            f"agents.{target_role}.query",
            json.dumps(query).encode(),
            timeout=timeout,
        )
        return json.loads(response.data)
```
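The task envelope above is plain JSON, so its construction and validation can be pulled into helpers that are testable without a broker. A sketch (field names match the `delegate_task` payload; the validation rule is an illustrative assumption):

```python
import json
import uuid
from datetime import datetime, timezone

# Fields every task envelope must carry, per the delegate_task payload.
REQUIRED_FIELDS = {"task_id", "from_agent", "payload", "created_at"}


def build_task_envelope(from_agent: str, payload: dict) -> bytes:
    """Serialize a task message in the format delegate_task publishes."""
    return json.dumps({
        "task_id": str(uuid.uuid4()),
        "from_agent": from_agent,
        "payload": payload,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }).encode()


def parse_task_envelope(data: bytes) -> dict:
    """Decode an incoming task message, rejecting malformed envelopes."""
    envelope = json.loads(data)
    missing = REQUIRED_FIELDS - envelope.keys()
    if missing:
        raise ValueError(f"task envelope missing fields: {sorted(missing)}")
    return envelope
```

Rejecting malformed envelopes at the subscriber boundary keeps a single bad publisher from wedging every consumer downstream.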
Network Policies for Agent Isolation
Not all agents should communicate with all other agents. Network policies enforce communication boundaries:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-devops-network
  namespace: agent-system
spec:
  podSelector:
    matchLabels:
      agent.ceo/role: devops
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow Prometheus to scrape agent metrics
    - from:
        - namespaceSelector:
            matchLabels:
              name: agent-system
      ports:
        - protocol: TCP
          port: 9090
  egress:
    # Allow connection to NATS
    - to:
        - podSelector:
            matchLabels:
              app: nats
      ports:
        - protocol: TCP
          port: 4222
    # Allow connection to the Kubernetes API
    - to:
        - ipBlock:
            cidr: 10.0.0.1/32  # K8s API server
      ports:
        - protocol: TCP
          port: 443
    # Allow connections to monitored services
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80
```

Note that agents are NATS clients: inter-agent messages flow over the agent's outbound connection to the NATS server, so no ingress rule for NATS traffic is needed on the agent pods themselves.
Health Checking AI Agents
Traditional liveness probes check if a process is running. AI agents need more sophisticated health checks — is the agent actually making progress on tasks, or is it stuck?
```python
# Agent health endpoint
import time

import psutil
from fastapi import FastAPI, HTTPException

app = FastAPI()
agent = None  # set to the running agent instance at startup


@app.get("/healthz")
async def liveness():
    """Is the agent process alive and responsive?"""
    return {"status": "ok", "uptime": agent.uptime_seconds}


@app.get("/readyz")
async def readiness():
    """Is the agent ready to accept new tasks?"""
    checks = {
        "nats_connected": agent.nats.is_connected,
        "task_loop_running": agent.task_loop.is_alive(),
        "last_heartbeat_age": time.time() - agent.last_heartbeat,
        "memory_available": psutil.virtual_memory().available > 256_000_000,
    }
    healthy = all([
        checks["nats_connected"],
        checks["task_loop_running"],
        checks["last_heartbeat_age"] < 300,  # Less than 5 min since last heartbeat
        checks["memory_available"],
    ])
    if healthy:
        return {"status": "ready", "checks": checks}
    raise HTTPException(status_code=503, detail=checks)
```
Node Affinity and Agent Placement
Different agents have different resource profiles. Use node affinity to place cognitive-heavy agents on appropriate nodes:
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: agent.ceo/workload-type
                operator: In
                values:
                  - cognitive
  # Spread agents across zones for resilience
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          agent.ceo/tier: operations
```
Observability Stack
Every agent exposes Prometheus metrics for real-time monitoring:
```yaml
# ServiceMonitor for agent metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: agent-metrics
  namespace: agent-system
spec:
  selector:
    matchLabels:
      agent.ceo/monitored: "true"
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
Key metrics exposed by each agent:
```text
# Agent task metrics
agent_tasks_completed_total{role="devops", status="success"} 142
agent_tasks_completed_total{role="devops", status="failed"} 3
agent_task_duration_seconds_bucket{role="devops", le="60"} 89
agent_task_duration_seconds_bucket{role="devops", le="300"} 130
agent_task_duration_seconds_bucket{role="devops", le="3600"} 142

# Agent resource usage
agent_context_tokens_used{role="devops"} 45000
agent_decisions_made_total{role="devops", type="deploy"} 23
agent_decisions_made_total{role="devops", type="scale"} 8
agent_escalations_total{role="devops"} 2
```
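Dashboards and alerts build directly on these series. Two illustrative PromQL queries (sketches; the metric names match those above, but thresholds and windows are assumptions):

```text
# Task failure rate per role over the last hour
sum by (role) (rate(agent_tasks_completed_total{status="failed"}[1h]))
  /
sum by (role) (rate(agent_tasks_completed_total[1h]))

# 95th-percentile task duration per role
histogram_quantile(0.95,
  sum by (role, le) (rate(agent_task_duration_seconds_bucket[5m])))
```

A rising failure rate paired with a flat `agent_escalations_total` is often the most useful alert: it means agents are failing without asking for help.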
Building Resilient Agent Fleets
For production reliability, agents need to handle pod restarts, node failures, and network partitions gracefully. The combination of Kubernetes pod management, NATS durable subscriptions, and agent state persistence creates resilient agent fleets that recover automatically from infrastructure failures.
Learn more about the overall architecture and how to scale AI agents for growing workloads.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo