Agent Lifecycle Management: Create, Deploy, Scale, Pause

Moshe Beeri, Founder

Tags: lifecycle, kubernetes, scaling, deployment, operations

AI agents are not static processes. They need to be created with specific configurations, deployed to infrastructure, scaled up when demand spikes, scaled down when idle, paused without losing state, and eventually terminated cleanly. Getting lifecycle management right is the difference between a reliable agent fleet and an expensive, unpredictable mess.

This guide covers the full agent lifecycle as implemented on agent.ceo's Kubernetes-based platform.

The Agent Lifecycle

Each state transition has specific preconditions, actions, and postconditions that the platform enforces.
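
As a rough illustration of enforced transitions, the sketch below models the lifecycle as a small state machine. The state names follow the stages and status strings used in this guide; the transition table and the `ready` action are assumptions made for the example, not the platform's actual rules.

```python
from enum import Enum


class State(Enum):
    CREATED = "Created"
    DEPLOYING = "Deploying"
    RUNNING = "Running"
    PAUSED = "Paused"
    TERMINATED = "Terminated"


# Allowed transitions, inferred from the stages in this guide (an assumption,
# not the platform's actual table). "ready" is a hypothetical internal event.
TRANSITIONS = {
    (State.CREATED, "deploy"): State.DEPLOYING,
    (State.DEPLOYING, "ready"): State.RUNNING,
    (State.RUNNING, "scale"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.CREATED, "terminate"): State.TERMINATED,
    (State.RUNNING, "terminate"): State.TERMINATED,
    (State.PAUSED, "terminate"): State.TERMINATED,
}


def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} an agent in state {state.value}") from None
```

Because `Terminated` has no outgoing entries, any action on a terminated agent raises, which matches the guide's description of termination as permanent.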

Stage 1: Create

Creating an agent defines its identity, capabilities, and configuration — without consuming compute resources.

Agent Definition

apiVersion: agent.ceo/v1
kind: Agent
metadata:
  name: backend-agent
  namespace: genbrain
  labels:
    role: backend
    team: engineering
    tier: specialist
spec:
  # Identity
  role: backend
  description: "Implements backend services, APIs, and data models"
  
  # Capabilities
  skills:
    - name: "code-implementation"
      languages: [go, python, typescript]
    - name: "api-design"
      standards: [openapi, grpc]
    - name: "database"
      systems: [postgres, redis, clickhouse]
  
  # Communication
  subscriptions:
    - genbrain.agents.backend.inbox
    - genbrain.agents.backend.tasks
    - genbrain.events.ci.failure
    - genbrain.events.pr.review-requested
  
  # Resources
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  
  # Behavior
  autonomy_level: 3
  escalation_policy:
    blocked_timeout: 30m
    escalate_to: cto
  
  # Scaling
  scaling:
    min_replicas: 0
    max_replicas: 5
    target_queue_depth: 3

Validation

Before deployment, the platform validates:

  • Role uniqueness within the namespace
  • NATS subject permissions don't conflict with other agents
  • Resource requests are within quota
  • Escalation targets exist
  • Referenced skills are available

# Validate agent definition
agentctl validate -f backend-agent.yaml

# Create the agent (does not deploy)
agentctl create -f backend-agent.yaml
# Output: Agent 'backend-agent' created. Status: Created. Use 'agentctl deploy backend-agent' to start.
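
To make the checks concrete, here is a minimal sketch of three of them (role uniqueness, CPU quota, escalation target). The function names, the shape of the inputs, and the error messages are hypothetical; how `agentctl validate` is actually implemented is not shown in this guide.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def validate(agent: dict, existing_roles: set, known_agents: set,
             cpu_quota_cores: float) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    spec = agent["spec"]

    # Role uniqueness within the namespace.
    if spec["role"] in existing_roles:
        errors.append(f"role {spec['role']!r} already exists in namespace")

    # Resource requests within quota (CPU only, for brevity).
    if parse_cpu(spec["resources"]["requests"]["cpu"]) > cpu_quota_cores:
        errors.append("CPU request exceeds quota")

    # Escalation target must exist.
    target = spec["escalation_policy"]["escalate_to"]
    if target not in known_agents:
        errors.append(f"escalation target {target!r} does not exist")

    return errors
```

Collecting every error before returning, rather than failing on the first one, lets an author fix a definition in a single pass.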

Stage 2: Deploy

Deployment provisions infrastructure and starts the agent process.

What Happens During Deployment

  1. Container image pull: The agent runtime image is pulled to the node
  2. Configuration injection: Agent definition, secrets, and CLAUDE.md are mounted
  3. NATS consumer creation: Durable consumers are created for the agent's subscriptions
  4. Health check registration: Liveness and readiness probes are configured
  5. Agent initialization: The agent loads its knowledge base and enters ready state

# Kubernetes Deployment generated from Agent definition
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-agent
  namespace: genbrain
spec:
  replicas: 1
  selector:
    matchLabels:
      agent: backend-agent
  template:
    metadata:
      labels:
        agent: backend-agent
        role: backend
    spec:
      containers:
        - name: agent
          image: gcr.io/genbrain/agent-runtime:v2.4.0
          env:
            - name: AGENT_ROLE
              value: "backend"
            - name: NATS_URL
              value: "nats://nats.genbrain.svc.cluster.local:4222"
            - name: NATS_CREDS
              valueFrom:
                secretKeyRef:
                  name: backend-agent-nats
                  key: credentials
          volumeMounts:
            - name: agent-config
              mountPath: /etc/agent
            - name: workspace
              mountPath: /agent-data/workspace
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 10
      volumes:
        - name: agent-config
          configMap:
            name: backend-agent-config
        - name: workspace
          persistentVolumeClaim:
            claimName: backend-agent-workspace
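
The mapping from Agent definition to Deployment can be pictured as a plain function over dictionaries. The sketch below is an assumption about how such a generator might look; `build_deployment` is a hypothetical name, and it covers only the fields that carry over directly (name, namespace, role, resources), leaving out probes, secrets, and volumes.

```python
def build_deployment(agent: dict, image: str, replicas: int = 1) -> dict:
    """Build a minimal Deployment manifest from an Agent definition."""
    name = agent["metadata"]["name"]
    spec = agent["spec"]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": agent["metadata"]["namespace"]},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": {"agent": name}},
            "template": {
                "metadata": {"labels": {"agent": name, "role": spec["role"]}},
                "spec": {
                    "containers": [{
                        "name": "agent",
                        "image": image,
                        "env": [{"name": "AGENT_ROLE", "value": spec["role"]}],
                        # Requests and limits are copied from the Agent spec.
                        "resources": spec["resources"],
                    }],
                },
            },
        },
    }
```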

Deploy Command

# Deploy with default settings (1 replica)
agentctl deploy backend-agent

# Deploy with specific replica count
agentctl deploy backend-agent --replicas 3

# Watch deployment progress
agentctl status backend-agent --watch
# Output:
# backend-agent: Deploying... [pulling image]
# backend-agent: Deploying... [creating consumers]
# backend-agent: Running (1/1 ready) - 23s

Stage 3: Scale

Scaling adjusts the number of agent replicas based on workload. Kubernetes autoscaling fits agents well because each agent's backlog is directly measurable as queue depth.

Horizontal Pod Autoscaling

Agent scaling is driven by queue depth — the number of unprocessed messages in the agent's task consumer:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-agent-hpa
  namespace: genbrain
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-agent
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: nats_consumer_pending_messages
          selector:
            matchLabels:
              stream: AGENT_COMMS
              consumer: BACKEND_TASKS
        target:
          type: AverageValue
          averageValue: "3"  # Scale up when > 3 pending tasks per replica
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 120
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300
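
For an External metric with an `AverageValue` target, the HPA's desired replica count is roughly the total metric value divided by the per-replica target, rounded up. The sketch below approximates one evaluation under the configuration above; it deliberately ignores the HPA's tolerance band and stabilization windows, and the function name and parameters are illustrative.

```python
import math


def desired_replicas(pending: int, target_per_replica: int, current: int,
                     min_replicas: int = 1, max_replicas: int = 5,
                     max_step_up: int = 2, max_step_down: int = 1) -> int:
    """Approximate one HPA evaluation for an AverageValue external metric."""
    # AverageValue: total pending messages / target per replica, rounded up.
    wanted = math.ceil(pending / target_per_replica)
    # Scaling policies: add at most 2 pods, remove at most 1 pod per period.
    wanted = min(wanted, current + max_step_up)
    wanted = max(wanted, current - max_step_down)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(pending=7, target_per_replica=3, current=1))  # 3
```

With 7 pending tasks and a target of 3 per replica, the result is 3 replicas. A burst of 30 pending tasks from a single replica would still yield 3 on the first evaluation, because the scale-up policy adds at most 2 pods per period.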

Scale-to-Zero

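For cost optimization, agents can scale to zero replicas when idle. A standard HPA keeps at least one replica by default, so KEDA (Kubernetes Event-driven Autoscaling) handles this. KEDA creates and manages its own HPA for the target Deployment, so use the ScaledObject instead of a hand-written HPA for the same Deployment, not alongside it: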

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-agent-scaler
  namespace: genbrain
spec:
  scaleTargetRef:
    name: backend-agent
  minReplicaCount: 0
  maxReplicaCount: 5
  cooldownPeriod: 600  # Wait 10 minutes before scaling to zero
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.genbrain.svc.cluster.local:8222"
        account: "$G"
        stream: "AGENT_COMMS"
        consumer: "BACKEND_TASKS"
        lagThreshold: "1"  # Scale up from zero on first message
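
A simplified model of the behaviour this ScaledObject aims for is sketched below. It assumes that any pending message activates the workload, that `lagThreshold` acts as the per-replica target while active, and that the workload holds at one replica during the cooldown. Real KEDA delegates the one-to-many scaling to an HPA, so treat this as an approximation rather than KEDA's algorithm.

```python
import math


def keda_replicas(pending: int, lag_threshold: int, max_replicas: int,
                  seconds_since_last_active: float,
                  cooldown: float = 600) -> int:
    """Approximate the replica count for a queue-driven, scale-to-zero agent."""
    if pending > 0:
        # Active: at least one replica, one per `lag_threshold` pending messages.
        return min(max_replicas, max(1, math.ceil(pending / lag_threshold)))
    if seconds_since_last_active >= cooldown:
        # Idle for the whole cooldown period: scale to zero.
        return 0
    # Idle but still inside the cooldown: hold a single replica.
    return 1
```

Note that with `lagThreshold: "1"` a backlog of five or more messages already requests the maximum of five replicas, which may be more aggressive than the target of three used elsewhere in this guide.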

Manual Scaling

# Scale to specific replica count
agentctl scale backend-agent --replicas 3

# Check current scale
agentctl status backend-agent
# Output: backend-agent: Running (3/3 ready), Queue depth: 7, Processing: 3

Scaling Considerations for AI Agents

Unlike traditional microservices, AI agents raise several scaling considerations of their own:

  1. Context isolation: Each replica maintains independent context — no shared state between replicas
  2. Task affinity: Some tasks benefit from routing to the same replica (context reuse)
  3. Cost proportionality: Each replica consumes LLM API credits proportional to activity
  4. Cold start time: New replicas need time to load knowledge base and warm up
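
Task affinity can be approximated by hashing a stable task key to a replica index, so related tasks land on the replica that already holds their context. The sketch below is one hypothetical way to do this; the guide does not say how the platform routes tasks, and `pick_replica` and the key format are illustrative.

```python
import hashlib


def pick_replica(task_key: str, replica_count: int) -> int:
    """Map a task key (e.g. a PR or conversation id) to a stable replica index."""
    if replica_count < 1:
        raise ValueError("no replicas available")
    # Use a cryptographic hash so the mapping is stable across processes
    # (Python's built-in hash() is randomized per interpreter run).
    digest = hashlib.sha256(task_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replica_count
```

Plain modulo hashing remaps most keys whenever the replica count changes, so affinity is lost on every scale event; a consistent-hashing ring would reduce that churn at the cost of more code.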

Stage 4: Pause

Pausing suspends an agent without losing its state. This is crucial for:

  • Cost management: Stop paying for idle agents during off-hours
  • Maintenance windows: Pause agents while updating infrastructure
  • Incident response: Pause a misbehaving agent while investigating

# Pause agent (scales to 0 replicas, preserves state)
agentctl pause backend-agent
# Output: backend-agent: Pausing... consumers paused, scaling to 0
# Output: backend-agent: Paused. Messages will queue until resumed.

# Resume agent
agentctl resume backend-agent
# Output: backend-agent: Resuming... scaling to 1, consumers resumed
# Output: backend-agent: Running (1/1 ready). Processing 4 queued messages.

What Happens During Pause

  1. Graceful drain: Current task completes (or checkpoints)
  2. Consumer pause: NATS consumers stop delivering messages (messages queue up)
  3. Scale to zero: All replicas are terminated
  4. State preservation: PersistentVolumeClaims and ConfigMaps are retained
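
The four steps above can be sketched as a small in-memory simulation. `AgentSim` is a toy stand-in invented for this example (no NATS or Kubernetes involved); it only shows that messages published during a pause are retained and processed after resume.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentSim:
    replicas: int = 1
    paused: bool = False
    queue: deque = field(default_factory=deque)
    processed: list = field(default_factory=list)

    def publish(self, msg: str) -> None:
        """Messages are always stored; they are only delivered while running."""
        self.queue.append(msg)
        if not self.paused:
            self.drain()

    def drain(self) -> None:
        """Process everything currently queued."""
        while self.queue:
            self.processed.append(self.queue.popleft())

    def pause(self) -> None:
        self.drain()         # 1. graceful drain of in-flight work
        self.paused = True   # 2. consumer pause: new messages queue up
        self.replicas = 0    # 3. scale to zero (4. queue and state are kept)

    def resume(self) -> int:
        """Scale back up and return how many queued messages were waiting."""
        self.replicas = 1
        self.paused = False
        backlog = len(self.queue)
        self.drain()
        return backlog
```

Publishing four messages while paused and then calling `resume()` returns 4, mirroring the "Processing 4 queued messages" line in the CLI output above.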

Scheduled Pause/Resume

For predictable usage patterns, pause and resume agents on a schedule:

apiVersion: agent.ceo/v1
kind: AgentSchedule
metadata:
  name: backend-agent-schedule
spec:
  agent: backend-agent
  schedules:
    - action: pause
      cron: "0 22 * * 1-5"  # Pause at 10 PM weekdays
    - action: resume
      cron: "0 7 * * 1-5"   # Resume at 7 AM weekdays
    - action: pause
      cron: "0 18 * * 6,0"  # Pause at 6 PM weekends
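
To see what this schedule means in running hours, the sketch below replays the three cron entries hour by hour over a week. It is a hand-rolled simulation rather than a cron parser: the entries are transcribed manually using cron's day numbering (0 = Sunday), and it assumes actions fire on the hour.

```python
# (action, hour, cron days-of-week), with cron numbering 0=Sunday .. 6=Saturday.
SCHEDULE = [
    ("pause", 22, {1, 2, 3, 4, 5}),
    ("resume", 7, {1, 2, 3, 4, 5}),
    ("pause", 18, {6, 0}),
]


def weekly_running_hours(schedule, weeks_to_settle: int = 1) -> int:
    """Count hours per week the agent runs once the schedule has settled."""
    running = False
    hours = 0
    # Replay extra weeks first so the starting state does not affect the result.
    for _ in range(weeks_to_settle + 1):
        hours = 0
        for day in range(7):
            for hour in range(24):
                for action, at_hour, days in schedule:
                    if hour == at_hour and day in days:
                        running = action == "resume"
                hours += running  # True counts as 1, False as 0
    return hours


print(weekly_running_hours(SCHEDULE))  # 75
```

The agent runs 15 hours on each weekday, or 75 of 168 hours per week. Since there is no weekend resume entry, the weekend pause only takes effect if someone resumed the agent manually.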

Stage 5: Terminate

Termination is permanent — the agent is stopped and its resources are released.

# Graceful termination (waits for current task)
agentctl terminate backend-agent --grace-period 5m

# Force termination (immediate)
agentctl terminate backend-agent --force

# Terminate with archive (preserves history for audit)
agentctl terminate backend-agent --archive

Termination Sequence

  1. Stop accepting new tasks: Remove from consumer group
  2. Complete or checkpoint current task: Wait up to grace period
  3. Publish termination event: Notify dependent agents
  4. Delete NATS consumers: Clean up messaging resources
  5. Delete Kubernetes resources: Deployment, HPA, services
  6. Archive or delete state: Based on retention policy
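
The sequence above, combined with the three CLI variants, can be sketched as a function that returns the ordered steps for a given situation. The step wording and the decision rules (for example, that `--force` abandons the current task) are assumptions for illustration; `termination_plan` is a hypothetical helper.

```python
def termination_plan(task_seconds_left: float, grace_seconds: float,
                     force: bool = False, archive: bool = False) -> list:
    """Return the ordered termination steps for one agent."""
    steps = ["stop accepting new tasks"]
    if force:
        # Assumed meaning of --force: do not wait for in-flight work.
        steps.append("abandon current task")
    elif task_seconds_left <= grace_seconds:
        steps.append("wait for current task to complete")
    else:
        steps.append("checkpoint current task at grace-period expiry")
    steps += [
        "publish termination event",
        "delete NATS consumers",
        "delete Kubernetes resources",
        "archive state" if archive else "apply retention policy to state",
    ]
    return steps
```

Ordering matters here: consumers are deleted only after the task is finished or checkpointed, so no message is acknowledged by a process that is about to disappear.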

Monitoring the Lifecycle

The agent.ceo dashboard provides real-time visibility into agent states, and the same fleet overview is available from the CLI:

# Fleet overview
agentctl fleet status
# Output:
# AGENT            STATUS    REPLICAS  QUEUE  PROCESSING  COST/HR
# ceo-agent        Running   1/1       0      1           $1.00
# cto-agent        Running   1/1       2      1           $1.00
# backend-agent    Running   3/3       7      3           $3.00
# frontend-agent   Running   1/1       1      1           $1.00
# devops-agent     Running   1/1       0      0           $1.00
# cso-agent        Paused    0/0       4      0           $0.00
# marketing-agent  Running   1/1       0      1           $1.00

For a deeper dive into operating agent fleets at scale, including monitoring, alerting, and incident response, see Building Resilient AI Agent Fleets.

Cost Optimization Strategies

Agent lifecycle management directly impacts cost. At $1/agent-hour on the pay-as-you-go plan:

  1. Scale to zero: Idle agents cost nothing
  2. Right-size replicas: Match replica count to actual queue depth
  3. Pause overnight: Stop paying for idle hours; in the cost report below, 40% of agent-hours are idle
  4. Batch tasks: Queue up work and burst with multiple replicas

# Cost report
agentctl costs --period 7d
# Output:
# Total agent-hours: 312
# Active agent-hours: 187 (60%)
# Idle agent-hours: 125 (40%) <- opportunity for pause scheduling
# Estimated monthly: $1,337 (could be $801 with scheduling)

The Standard plan at $200/agent/month becomes cost-effective when agents run more than 200 hours/month (roughly business hours plus some overflow).
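
The break-even point follows directly from the two prices quoted in this section: $200 per month divided by $1 per hour is 200 hours. A minimal sketch of the comparison, with hypothetical function names:

```python
def break_even_hours(hourly_rate: float = 1.0, flat_monthly: float = 200.0) -> float:
    """Hours per month at which pay-as-you-go costs the same as the flat plan."""
    return flat_monthly / hourly_rate


def cheaper_plan(hours_per_month: float, hourly_rate: float = 1.0,
                 flat_monthly: float = 200.0) -> str:
    """Name the cheaper plan for one agent at the given monthly usage."""
    if hours_per_month * hourly_rate > flat_monthly:
        return "standard"
    return "pay-as-you-go"


print(break_even_hours())  # 200.0
```

An agent that runs around the clock (about 720 hours in a 30-day month) is well past the break-even, while one running 8 hours on 22 working days (176 hours) stays just under it.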

Best Practices

  1. Always define scaling policies: Never run fixed replicas without autoscaling
  2. Set appropriate grace periods: Tasks need time to complete or checkpoint
  3. Use durable consumers: Messages must survive agent restarts
  4. Monitor queue depth: Growing queues indicate scaling or performance issues
  5. Archive before terminating: Agent history is valuable for debugging and audit
  6. Start with one replica: Scale up based on observed load, not predictions

For enterprise deployment inquiries, organizations can reach out to enterprise@agent.ceo.

Try agent.ceo

SaaS — Get started with 1 free agent-week at agent.ceo.

Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.


agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo
