AI agents are not static processes. They need to be created with specific configurations, deployed to infrastructure, scaled up when demand spikes, scaled down when idle, paused without losing state, and eventually terminated cleanly. Getting lifecycle management right is the difference between a reliable agent fleet and an expensive, unpredictable mess.
This guide covers the full agent lifecycle as implemented on agent.ceo's Kubernetes-based platform.
The Agent Lifecycle
An agent moves through five stages: Create, Deploy, Scale, Pause, and Terminate. Each state transition has specific preconditions, actions, and postconditions that the platform enforces.
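To make the preconditions concrete, here is a minimal Python sketch of the lifecycle as a state machine. The states and the transition table are inferred from the stages in this guide. Transient states such as Deploying and Pausing are folded away, and allowing termination straight from Created is an assumption. This is not agent.ceo's implementation.

```python
from enum import Enum


class State(Enum):
    CREATED = "Created"
    RUNNING = "Running"
    PAUSED = "Paused"
    TERMINATED = "Terminated"


# Hypothetical transition table: action -> (allowed source states, target state).
# A target of None means the state does not change (scaling a running agent).
TRANSITIONS = {
    "deploy": ({State.CREATED}, State.RUNNING),
    "scale": ({State.RUNNING}, None),
    "pause": ({State.RUNNING}, State.PAUSED),
    "resume": ({State.PAUSED}, State.RUNNING),
    "terminate": ({State.CREATED, State.RUNNING, State.PAUSED}, State.TERMINATED),
}


class InvalidTransition(Exception):
    pass


def transition(state: State, action: str) -> State:
    """Return the next state, rejecting actions whose precondition fails."""
    allowed, target = TRANSITIONS[action]
    if state not in allowed:
        raise InvalidTransition(f"cannot {action} an agent in state {state.value}")
    return state if target is None else target


state = State.CREATED
for action in ["deploy", "scale", "pause", "resume", "terminate"]:
    state = transition(state, action)
```

Terminated has no outgoing transitions, which encodes the rule that termination is permanent.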
Stage 1: Create
Creating an agent defines its identity, capabilities, and configuration — without consuming compute resources.
Agent Definition
```yaml
apiVersion: agent.ceo/v1
kind: Agent
metadata:
  name: backend-agent
  namespace: genbrain
  labels:
    role: backend
    team: engineering
    tier: specialist
spec:
  # Identity
  role: backend
  description: "Implements backend services, APIs, and data models"

  # Capabilities
  skills:
    - name: "code-implementation"
      languages: [go, python, typescript]
    - name: "api-design"
      standards: [openapi, grpc]
    - name: "database"
      systems: [postgres, redis, clickhouse]

  # Communication
  subscriptions:
    - genbrain.agents.backend.inbox
    - genbrain.agents.backend.tasks
    - genbrain.events.ci.failure
    - genbrain.events.pr.review-requested

  # Resources
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"

  # Behavior
  autonomy_level: 3
  escalation_policy:
    blocked_timeout: 30m
    escalate_to: cto

  # Scaling
  scaling:
    min_replicas: 0
    max_replicas: 5
    target_queue_depth: 3
```
Validation
Before deployment, the platform validates:
- Role uniqueness within the namespace
- NATS subject permissions don't conflict with other agents
- Resource requests are within quota
- Escalation targets exist
- Referenced skills are available
```bash
# Validate agent definition
agentctl validate -f backend-agent.yaml

# Create the agent (does not deploy)
agentctl create -f backend-agent.yaml
# Output: Agent 'backend-agent' created. Status: Created. Use 'agentctl deploy backend-agent' to start.
```
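Three of these checks can be sketched in a few lines of Python. This is an illustration, not agentctl's validator, and it rests on several assumptions:

- The function and helper names are hypothetical.
- Escalation targets are assumed to name roles in the same namespace.
- The quota is assumed to be the remaining CPU (in cores) and memory (in bytes) for the namespace.

The NATS permission and skill checks are omitted.

```python
# Hypothetical validator; field names follow the Agent definition above.

def parse_cpu(quantity: str) -> float:
    """Kubernetes CPU quantity to cores: '500m' -> 0.5, '2' -> 2.0."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Binary-suffixed memory quantity to bytes: '1Gi' -> 1073741824."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes; decimal suffixes (k, M, G) are not handled


def validate(agent: dict, existing: list, quota: dict) -> list:
    """Return a list of problems; an empty list means the definition passed."""
    errors = []
    spec, ns = agent["spec"], agent["metadata"]["namespace"]
    roles = {a["spec"]["role"] for a in existing if a["metadata"]["namespace"] == ns}
    if spec["role"] in roles:
        errors.append(f"role '{spec['role']}' already exists in namespace '{ns}'")
    target = spec.get("escalation_policy", {}).get("escalate_to")
    if target is not None and target not in roles:
        errors.append(f"escalation target '{target}' does not exist")
    requests = spec["resources"]["requests"]
    if parse_cpu(requests["cpu"]) > quota["cpu"]:
        errors.append("CPU request exceeds remaining quota")
    if parse_memory(requests["memory"]) > quota["memory"]:
        errors.append("memory request exceeds remaining quota")
    return errors
```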
Stage 2: Deploy
Deployment provisions infrastructure and starts the agent process.
What Happens During Deployment
- Container image pull: The agent runtime image is pulled to the node
- Configuration injection: Agent definition, secrets, and CLAUDE.md are mounted
- NATS consumer creation: Durable consumers are created for the agent's subscriptions
- Health check registration: Liveness and readiness probes are configured
- Agent initialization: The agent loads its knowledge base and enters ready state
```yaml
# Kubernetes Deployment generated from Agent definition
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-agent
  namespace: genbrain
spec:
  replicas: 1
  selector:
    matchLabels:
      agent: backend-agent
  template:
    metadata:
      labels:
        agent: backend-agent
        role: backend
    spec:
      containers:
        - name: agent
          image: gcr.io/genbrain/agent-runtime:v2.4.0
          env:
            - name: AGENT_ROLE
              value: "backend"
            - name: NATS_URL
              value: "nats://nats.genbrain.svc.cluster.local:4222"
            - name: NATS_CREDS
              valueFrom:
                secretKeyRef:
                  name: backend-agent-nats
                  key: credentials
          volumeMounts:
            - name: agent-config
              mountPath: /etc/agent
            - name: workspace
              mountPath: /agent-data/workspace
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 10
      volumes:
        - name: agent-config
          configMap:
            name: backend-agent-config
        - name: workspace
          persistentVolumeClaim:
            claimName: backend-agent-workspace
```
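The mapping from Agent definition to Deployment can be pictured as a pure function. This Python sketch covers only some fields and leaves out NATS settings, volume mounts, and probes. The -config and -workspace naming convention is inferred from this one example, and the function name is hypothetical.

```python
def to_deployment(agent: dict, image: str, replicas: int = 1) -> dict:
    """Build a partial Deployment manifest (as a dict) from an Agent definition."""
    name = agent["metadata"]["name"]
    spec = agent["spec"]
    labels = {"agent": name, "role": spec["role"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": agent["metadata"]["namespace"]},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"agent": name}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "agent",
                        "image": image,
                        "env": [{"name": "AGENT_ROLE", "value": spec["role"]}],
                        # requests/limits are copied verbatim from the Agent spec
                        "resources": spec["resources"],
                    }],
                    "volumes": [
                        {"name": "agent-config", "configMap": {"name": f"{name}-config"}},
                        {"name": "workspace",
                         "persistentVolumeClaim": {"claimName": f"{name}-workspace"}},
                    ],
                },
            },
        },
    }
```

Kubernetes requires the selector's matchLabels to match the pod template's labels. That is why agent: backend-agent appears in both places.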
Deploy Command
```bash
# Deploy with default settings (1 replica)
agentctl deploy backend-agent

# Deploy with specific replica count
agentctl deploy backend-agent --replicas 3

# Watch deployment progress
agentctl status backend-agent --watch
# Output:
# backend-agent: Deploying... [pulling image]
# backend-agent: Deploying... [creating consumers]
# backend-agent: Running (1/1 ready) - 23s
```
Stage 3: Scale
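Scaling adjusts the number of agent replicas to match workload. Because an agent's work arrives as messages on its NATS subjects, the backlog of pending messages gives Kubernetes a direct signal for how many replicas are needed.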
Horizontal Pod Autoscaling
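Agent scaling is driven by queue depth, the number of unprocessed messages pending on the agent's task consumer. The HPA reads it as an External metric, so an external metrics adapter must expose nats_consumer_pending_messages through the Kubernetes external metrics API: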
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-agent-hpa
  namespace: genbrain
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-agent
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: nats_consumer_pending_messages
          selector:
            matchLabels:
              stream: AGENT_COMMS
              consumer: BACKEND_TASKS
        target:
          type: AverageValue
          averageValue: "3"  # Scale up when > 3 pending tasks per replica
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 120
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300
```
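For an External metric with an AverageValue target, the HPA compares the total pending count against target × current replicas. Outside a 10% tolerance band (the controller default), it asks for ceil(pending / target) replicas. The Python sketch below applies that rule together with the Pods policies and min/max bounds from the manifest. Stabilization windows and policy periods are left out, so it models a single evaluation only.

```python
import math


def desired_replicas(pending: int, current: int, target_per_replica: float = 3,
                     min_replicas: int = 1, max_replicas: int = 5,
                     max_scale_up: int = 2, max_scale_down: int = 1,
                     tolerance: float = 0.1) -> int:
    """One HPA evaluation for an AverageValue External metric (sketch).

    pending: total pending messages on the consumer; current: replicas now (>= 1).
    """
    ratio = pending / (target_per_replica * current)
    if abs(1.0 - ratio) <= tolerance:
        return current  # close enough to target: leave replicas alone
    desired = math.ceil(pending / target_per_replica)
    # Behavior policies: add at most max_scale_up pods, remove at most max_scale_down
    desired = min(desired, current + max_scale_up)
    desired = max(desired, current - max_scale_down)
    return max(min_replicas, min(max_replicas, desired))
```

With one replica and a burst of 20 pending tasks, the rule asks for 7 replicas, but the scale-up policy caps the first step at 3. Later evaluations can climb to the maximum of 5 once the 120-second period allows.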
Scale-to-Zero
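For cost optimization, agents can scale to zero replicas when idle. KEDA (Kubernetes Event-Driven Autoscaler) handles this. KEDA creates and manages its own HPA for the target Deployment, so use the ScaledObject in place of the standalone HPA above rather than alongside it: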
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-agent-scaler
  namespace: genbrain
spec:
  scaleTargetRef:
    name: backend-agent
  minReplicaCount: 0
  maxReplicaCount: 5
  cooldownPeriod: 600  # Wait 10 minutes of inactivity before scaling to zero
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.genbrain.svc.cluster.local:8222"
        account: "$G"
        stream: "AGENT_COMMS"
        consumer: "BACKEND_TASKS"
        lagThreshold: "3"            # Target pending tasks per replica (matches target_queue_depth)
        activationLagThreshold: "0"  # Scale up from zero on the first pending message
```
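The scale-to-zero decision can be approximated as below. This is a rough model under stated assumptions, not KEDA's code:

- The trigger counts as active while pending messages exceed the activation threshold (KEDA's activationLagThreshold, default 0).
- An active trigger means at least one replica, plus roughly one replica per lag_threshold pending messages.
- The time since the trigger was last active (idle_seconds) is tracked by the caller.

Gradual HPA scale-down between N replicas and one is not modeled.

```python
import math


def keda_replicas(pending: int, current: int, idle_seconds: float,
                  lag_threshold: int, activation_lag_threshold: int = 0,
                  cooldown_period: int = 600, max_replicas: int = 5) -> int:
    """Rough model of the ScaledObject's replica target (not KEDA's code).

    idle_seconds: time since the trigger was last active, tracked by the caller.
    lag_threshold: corresponds to lagThreshold in the trigger metadata.
    """
    if pending > activation_lag_threshold:
        # Active: wake from zero, then aim for about lag_threshold messages per replica
        return min(max_replicas, max(1, math.ceil(pending / lag_threshold)))
    if idle_seconds >= cooldown_period:
        return 0  # inactive for a full cooldown period: scale to zero
    # Inactive but still cooling down: KEDA's HPA keeps at least one replica
    # while any are running (gradual HPA scale-down is not modeled)
    return min(current, 1)
```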
Manual Scaling
```bash
# Scale to specific replica count
agentctl scale backend-agent --replicas 3

# Check current scale
agentctl status backend-agent
# Output: backend-agent: Running (3/3 ready), Queue depth: 7, Processing: 3
```
Scaling Considerations for AI Agents
Unlike traditional microservices, AI agent scaling has unique considerations:
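- Context isolation: Each replica maintains independent context, with no shared in-memory state between replicas. Storage needs the same care: the Deployment mounts its single workspace PersistentVolumeClaim in every replica, so per-replica workspaces call for a StatefulSet with volumeClaimTemplates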
- Task affinity: Some tasks benefit from routing to the same replica (context reuse)
- Cost proportionality: Each replica consumes LLM API credits proportional to activity
- Cold start time: New replicas need time to load knowledge base and warm up
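NATS queue groups and shared consumers distribute messages across replicas without regard to which one handled related work, so task affinity has to come from elsewhere. One option, assumed here rather than taken from agent.ceo, is a dispatcher that picks a replica per task key and forwards the task to that replica's own subject. The selection uses rendezvous (highest-random-weight) hashing, and the replica names are hypothetical.

```python
import hashlib


def pick_replica(task_key: str, replicas: list) -> str:
    """Rendezvous hashing: every dispatcher picks the same replica for a key,
    and removing a replica only remaps the keys that replica owned."""
    def score(replica: str) -> int:
        digest = hashlib.sha256(f"{replica}:{task_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=score)


pods = ["backend-agent-0", "backend-agent-1", "backend-agent-2"]
owner = pick_replica("PR-42", pods)  # all follow-up work on PR-42 goes to this replica
```

Compared with hash(key) mod N, rendezvous hashing avoids reshuffling most keys whenever the autoscaler changes the replica count.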
Stage 4: Pause
Pausing suspends an agent without losing its state. This is crucial for:
- Cost management: Stop paying for idle agents during off-hours
- Maintenance windows: Pause agents while updating infrastructure
- Incident response: Pause a misbehaving agent while investigating
```bash
# Pause agent (scales to 0 replicas, preserves state)
agentctl pause backend-agent
# Output: backend-agent: Pausing... consumers paused, scaling to 0
# Output: backend-agent: Paused. Messages will queue until resumed.

# Resume agent
agentctl resume backend-agent
# Output: backend-agent: Resuming... scaling to 1, consumers resumed
# Output: backend-agent: Running (1/1 ready). Processing 4 queued messages.
```
What Happens During Pause
- Graceful drain: Current task completes (or checkpoints)
- Consumer pause: NATS consumers stop delivering messages (messages queue up)
- Scale to zero: All replicas are terminated
- State preservation: PersistentVolumeClaims and ConfigMaps are retained
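The queueing behavior during a pause can be modeled with a toy in-memory consumer in Python. It is not the NATS API. In JetStream the messages stay in the stream and the durable consumer records how far it has read, which is what lets delivery pick up where it stopped. The class and method names are hypothetical.

```python
from collections import deque


class PausableConsumer:
    """Toy model: while paused, published messages accumulate instead of being delivered."""

    def __init__(self):
        self.pending = deque()
        self.processed = []
        self.paused = False
        self.replicas = 1

    def publish(self, msg: str) -> None:
        self.pending.append(msg)
        if not self.paused:
            self._deliver()

    def pause(self) -> None:
        self.paused = True  # stop delivery first...
        self.replicas = 0   # ...then scale to zero; the queue itself is kept

    def resume(self) -> int:
        self.replicas = 1
        self.paused = False
        return self._deliver()  # number of queued messages processed on resume

    def _deliver(self) -> int:
        count = 0
        while self.pending:
            self.processed.append(self.pending.popleft())
            count += 1
        return count
```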
Scheduled Pause/Resume
For predictable usage patterns, schedule agents:
```yaml
apiVersion: agent.ceo/v1
kind: AgentSchedule
metadata:
  name: backend-agent-schedule
spec:
  agent: backend-agent
  schedules:
    - action: pause
      cron: "0 22 * * 1-5"  # Pause at 10 PM weekdays
    - action: resume
      cron: "0 7 * * 1-5"   # Resume at 7 AM weekdays
    - action: pause
      cron: "0 18 * * 6,0"  # Pause at 6 PM weekends
```
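The action in effect at any moment is the most recent matching cron entry. The Python sketch below supports only the cron subset used here (*, single values, ranges, and comma lists) and ignores time zones. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# The AgentSchedule entries above as (action, cron expression) pairs.
SCHEDULES = [
    ("pause", "0 22 * * 1-5"),
    ("resume", "0 7 * * 1-5"),
    ("pause", "0 18 * * 6,0"),
]


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', single values, 'a-b' ranges, comma lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, t: datetime) -> bool:
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (_field_matches(minute, t.minute) and _field_matches(hour, t.hour)
            and _field_matches(day, t.day) and _field_matches(month, t.month)
            and _field_matches(weekday, cron_weekday))


def last_action(at: datetime, schedules=SCHEDULES, lookback=timedelta(days=7)):
    """Most recent scheduled action at or before `at` (minute resolution), or None."""
    t = at.replace(second=0, microsecond=0)
    earliest = t - lookback
    while t > earliest:
        for action, expr in schedules:
            if cron_matches(expr, t):
                return action
        t -= timedelta(minutes=1)
    return None
```

Running it reveals a quirk of this schedule: nothing resumes the agent on weekends. Saturday at noon still reflects Friday's 10 PM pause, and the weekend 6 PM pause only takes effect if someone resumes the agent manually.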
Stage 5: Terminate
Termination is permanent — the agent is stopped and its resources are released.
```bash
# Graceful termination (waits for current task)
agentctl terminate backend-agent --grace-period 5m

# Force termination (immediate)
agentctl terminate backend-agent --force

# Terminate with archive (preserves history for audit)
agentctl terminate backend-agent --archive
```
Termination Sequence
- Stop accepting new tasks: Remove from consumer group
- Complete or checkpoint current task: Wait up to grace period
- Publish termination event: Notify dependent agents
- Delete NATS consumers: Clean up messaging resources
- Delete Kubernetes resources: Deployment, HPA, services
- Archive or delete state: Based on retention policy
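The termination sequence can be sketched with asyncio, where the grace period is a timeout on the in-flight task. The function records each step as a string instead of calling real NATS or Kubernetes APIs. The checkpoint callback is a hypothetical hook for saving progress. This is a sketch, not the platform's implementation.

```python
import asyncio
from typing import Callable, Optional


async def terminate(current_task: Optional[asyncio.Task], grace_period: float,
                    checkpoint: Callable[[], None], archive: bool) -> list:
    """Hypothetical termination sequence; steps are recorded, not performed."""
    steps = ["stop accepting new tasks"]
    if current_task is not None:
        try:
            # shield() stops wait_for's timeout from cancelling the task itself
            await asyncio.wait_for(asyncio.shield(current_task), timeout=grace_period)
            steps.append("current task completed")
        except asyncio.TimeoutError:
            checkpoint()  # save progress before stopping the task
            current_task.cancel()
            # Wait for the cancellation to finish without re-raising it here
            await asyncio.gather(current_task, return_exceptions=True)
            steps.append("current task checkpointed")
    steps += [
        "publish termination event",
        "delete NATS consumers",
        "delete Kubernetes resources",
        "archive state" if archive else "delete state",
    ]
    return steps
```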
Monitoring the Lifecycle
The agent.ceo dashboard provides real-time visibility into agent states:
```bash
# Fleet overview
agentctl fleet status
# Output:
# AGENT            STATUS   REPLICAS  QUEUE  PROCESSING  COST/HR
# ceo-agent        Running  1/1       0      1           $1.00
# cto-agent        Running  1/1       2      1           $1.00
# backend-agent    Running  3/3       7      3           $3.00
# frontend-agent   Running  1/1       1      1           $1.00
# devops-agent     Running  1/1       0      0           $1.00
# cso-agent        Paused   0/0       4      0           $0.00
# marketing-agent  Running  1/1       0      1           $1.00
```
For a deeper dive into operating agent fleets at scale, including monitoring, alerting, and incident response, see Building Resilient AI Agent Fleets.
Cost Optimization Strategies
Agent lifecycle management directly impacts cost. At $1/agent-hour on the pay-as-you-go plan:
- Scale to zero: Idle agents cost nothing
- Right-size replicas: Match replica count to actual queue depth
- Pause overnight: Stop paying for idle off-hours on business-hours-only workloads; in the sample report below, 40% of agent-hours were idle
- Batch tasks: Queue up work and burst with multiple replicas
```bash
# Cost report
agentctl costs --period 7d
# Output:
# Total agent-hours: 312
# Active agent-hours: 187 (60%)
# Idle agent-hours: 125 (40%) <- opportunity for pause scheduling
# Estimated monthly: $1,337 (could be $801 with scheduling)
```
The Standard plan at $200/agent/month becomes cost-effective when agents run more than 200 hours/month (roughly business hours plus some overflow).
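The plan comparison is simple arithmetic: at $1 per agent-hour, the $200 Standard plan breaks even at 200 agent-hours a month. Here is a short Python sketch with the rates from this guide hard-coded as constants. The helper names are illustrative.

```python
PAYG_RATE = 1.0           # $ per agent-hour, pay-as-you-go plan
STANDARD_MONTHLY = 200.0  # $ per agent per month, Standard plan


def break_even_hours() -> float:
    """Monthly agent-hours at which both plans cost the same."""
    return STANDARD_MONTHLY / PAYG_RATE


def cheaper_plan(hours_per_month: float) -> str:
    if hours_per_month * PAYG_RATE > STANDARD_MONTHLY:
        return "standard"
    return "pay-as-you-go"


def idle_share(total_hours: float, active_hours: float) -> float:
    """Fraction of billed agent-hours spent idle."""
    return (total_hours - active_hours) / total_hours
```

Eight hours a day on 22 working days is 176 agent-hours, just under break-even. An always-on agent in a 30-day month runs 720 hours. The same helper confirms the 40% idle share in the cost report above.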
Best Practices
- Always define scaling policies: Never run fixed replicas without autoscaling
- Set appropriate grace periods: Tasks need time to complete or checkpoint
- Use durable consumers: Messages must survive agent restarts
- Monitor queue depth: Growing queues indicate scaling or performance issues
- Archive before terminating: Agent history is valuable for debugging and audit
- Start with one replica: Scale up based on observed load, not predictions
For enterprise deployment inquiries, organizations can reach out to enterprise@agent.ceo.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo