The transition from running a single AI agent on your laptop to operating a fleet of agents for hundreds of organizations requires a fundamentally different architectural approach. At agent.ceo, we built a multi-tenant SaaS platform that manages autonomous AI agents at scale. This post details the key architectural decisions, infrastructure patterns, and lessons learned from building production agent infrastructure.
The Architecture Challenge
A SaaS platform for AI agents differs from traditional SaaS in several critical ways. Each agent is a long-running, stateful process that consumes significant compute resources. Agents need isolated workspaces with file systems, credentials, and network access. They communicate asynchronously across organizational boundaries, and their resource consumption is highly variable — an agent might sit idle for hours then burst to full CPU during a complex coding task.
Our platform architecture addresses these challenges with five core subsystems:
- Identity and Tenancy — Firebase Auth + Firestore for org isolation
- Compute — GKE pods with per-agent containers
- Messaging — NATS for real-time agent communication
- State — Firestore for task state, Neo4j for knowledge graphs
- Billing — Stripe metered billing tied to actual usage
Multi-Tenant Data Model
Every resource in the platform is scoped to an organization. Here is our Firestore schema for the core tenant model:
// Firestore document structure
// /organizations/{orgId}
{
  name: "Acme Corp",
  plan: "standard", // standard | volume | enterprise
  agentLimit: 10,
  createdAt: Timestamp,
  billingCustomerId: "cus_stripe_id",
  settings: {
    defaultAgentImage: "gcr.io/agent-ceo/claude-agent:stable",
    maxConcurrentAgents: 5,
    allowedIntegrations: ["github", "slack", "jira"]
  }
}
// /organizations/{orgId}/agents/{agentId}
{
  name: "backend-engineer",
  status: "running", // running | paused | terminated
  role: "engineer",
  podName: "agent-acme-backend-engineer-7f8d9",
  namespace: "org-acme",
  startedAt: Timestamp,
  lastActiveAt: Timestamp,
  resourceUsage: {
    cpuHours: 42.5,
    memoryGbHours: 85.0,
    storageGb: 2.1
  }
}
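The resourceUsage sub-document is what feeds metered billing. As a minimal sketch of how such a document could be turned into a billable amount — with hypothetical per-unit rates, not agent.ceo's actual pricing:

```typescript
// Shape of the per-agent resourceUsage sub-document shown above.
interface ResourceUsage {
  cpuHours: number;
  memoryGbHours: number;
  storageGb: number;
}

// Hypothetical per-unit rates in USD, for illustration only.
const RATES = { cpuHour: 0.04, memoryGbHour: 0.005, storageGbMonth: 0.10 };

// Converts a resourceUsage document into a billable amount in cents,
// the integer unit metered-billing APIs typically expect.
function usageToCents(usage: ResourceUsage): number {
  const dollars =
    usage.cpuHours * RATES.cpuHour +
    usage.memoryGbHours * RATES.memoryGbHour +
    usage.storageGb * RATES.storageGbMonth;
  return Math.round(dollars * 100);
}
```

Rounding happens once, at the end, so sub-cent usage across dimensions still accumulates before truncation.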
Firestore security rules enforce organization boundaries at the database level:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /organizations/{orgId}/{document=**} {
      allow read, write: if request.auth != null
        && request.auth.token.orgId == orgId;
    }
  }
}
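The same predicate the rules enforce can be mirrored in application code for pre-flight checks. A minimal sketch, assuming an `orgId` custom claim is attached to each user's token at onboarding (the function name is illustrative):

```typescript
// Fields of the decoded auth token that the security rule relies on.
// Assumes an `orgId` custom claim is set on the user at onboarding.
interface AuthToken {
  uid: string;
  orgId?: string;
}

// Mirrors the Firestore rule: the request is authenticated AND the
// token's orgId matches the organization segment of the document path.
function canAccessOrg(token: AuthToken | null, orgId: string): boolean {
  return token !== null && token.orgId === orgId;
}
```

Checking this in the API layer gives clients a clean 403 instead of an opaque rules rejection, while the rules remain the actual enforcement boundary.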
Agent Pod Architecture
Each agent runs in an isolated Kubernetes pod with its own workspace volume, SSH keys, and Claude Code CLI. The pod spec ensures resource isolation between tenants:
apiVersion: v1
kind: Pod
metadata:
  name: agent-${ORG_ID}-${AGENT_NAME}-${HASH}
  namespace: org-${ORG_ID}
  labels:
    app: agent-worker
    org: ${ORG_ID}
    agent: ${AGENT_NAME}
    billing-tier: ${PLAN}
spec:
  serviceAccountName: agent-worker-sa
  containers:
    - name: claude-agent
      image: gcr.io/agent-ceo/claude-agent:stable
      resources:
        requests:
          cpu: "500m"
          memory: "2Gi"
        limits:
          cpu: "2000m"
          memory: "8Gi"
      env:
        - name: AGENT_ID
          value: "${AGENT_ID}"
        - name: ORG_ID
          value: "${ORG_ID}"
        - name: NATS_URL
          valueFrom:
            secretKeyRef:
              name: nats-credentials
              key: url
      volumeMounts:
        - name: workspace
          mountPath: /home/appuser
        - name: ssh-keys
          mountPath: /home/appuser/.ssh
          readOnly: true
  volumes:
    - name: workspace
      persistentVolumeClaim:
        claimName: agent-workspace-${AGENT_ID}
    - name: ssh-keys
      secret:
        secretName: agent-ssh-${AGENT_ID}
        defaultMode: 0600
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  tolerations:
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule
This architecture provides hard isolation through Kubernetes namespaces — each organization gets its own namespace with network policies preventing cross-tenant communication. For more on Kubernetes deployment patterns, see our guide on deploying AI agents to Kubernetes.
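The templated pod name above must also be a valid Kubernetes object name (lowercase alphanumerics and hyphens, with length limits). A sketch of how the name could be generated safely — the function names are illustrative, not the platform's actual code:

```typescript
// Lowercase and strip characters that are invalid in Kubernetes names.
function sanitizeSegment(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9-]/g, "-").replace(/^-+|-+$/g, "");
}

// Builds the agent-${ORG_ID}-${AGENT_NAME}-${HASH} name used in the pod
// spec above, truncating to the 63-char DNS label limit so the unique
// hash suffix always survives.
function agentPodName(orgId: string, agentName: string, hash: string): string {
  const base = `agent-${sanitizeSegment(orgId)}-${sanitizeSegment(agentName)}`;
  const maxBase = 63 - hash.length - 1;
  return `${base.slice(0, maxBase)}-${hash}`;
}
```

Truncating the base rather than the hash matters: two long agent names that differ only past the cutoff would otherwise collide.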
Namespace Isolation with Network Policies
We enforce network segmentation so agents from different organizations cannot communicate directly:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: org-${ORG_ID}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tenant: ${ORG_ID}
        - namespaceSelector:
            matchLabels:
              role: platform-services
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              tenant: ${ORG_ID}
        - namespaceSelector:
            matchLabels:
              role: platform-services
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
Platform Services Layer
The platform services layer handles cross-cutting concerns that all agents depend on. This includes the API gateway, billing integration, and monitoring infrastructure.
// Platform service initialization
import { initializeApp } from 'firebase-admin/app';
import { getFirestore, FieldValue } from 'firebase-admin/firestore';
import { connect } from 'nats';
import Stripe from 'stripe';

const app = initializeApp();
const db = getFirestore();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const nats = await connect({ servers: process.env.NATS_URL });

// Agent lifecycle manager
class AgentLifecycleManager {
  async provisionAgent(orgId: string, agentConfig: AgentConfig) {
    // 1. Validate org quota
    const org = await db.doc(`organizations/${orgId}`).get();
    if (!org.exists) {
      throw new Error(`Unknown organization: ${orgId}`);
    }
    const activeAgents = await this.countActiveAgents(orgId);
    if (activeAgents >= org.data()!.agentLimit) {
      throw new QuotaExceededError(orgId);
    }

    // 2. Create agent record
    const agentRef = db.collection(`organizations/${orgId}/agents`).doc();
    await agentRef.set({
      ...agentConfig,
      status: 'provisioning',
      createdAt: FieldValue.serverTimestamp()
    });

    // 3. Deploy pod to GKE
    await this.deployAgentPod(orgId, agentRef.id, agentConfig);

    // 4. Start billing meter
    await this.startBillingMeter(orgId, agentRef.id);

    // 5. Publish event
    nats.publish(`org.${orgId}.agents.provisioned`, JSON.stringify({
      agentId: agentRef.id,
      timestamp: Date.now()
    }));

    return agentRef.id;
  }
}
Understanding the agent lifecycle is critical to building reliable infrastructure. Agents transition through provisioning, running, paused, and terminated states, and each transition triggers billing and monitoring updates.
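A compact way to keep those transitions legal is an explicit transition table. A minimal sketch, using the states named in this post (the guard function is illustrative):

```typescript
type AgentStatus = "provisioning" | "running" | "paused" | "terminated";

// Allowed state transitions; terminated is a terminal state.
const TRANSITIONS: Record<AgentStatus, AgentStatus[]> = {
  provisioning: ["running", "terminated"],
  running: ["paused", "terminated"],
  paused: ["running", "terminated"],
  terminated: [],
};

// Guard used before writing a status change, so billing and monitoring
// hooks only ever fire on legal transitions.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Centralizing the table means adding a new state later touches one place rather than every call site that mutates status.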
Scale-to-Zero Pattern
One of the most impactful cost optimizations is pausing agents when idle. We monitor agent activity and scale pods to zero after a configurable idle period, then restore them when new tasks arrive. This pattern reduces infrastructure costs by 60-80% for most workloads. Learn more in our detailed post on cost optimization for AI agent workloads.
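The core of the pattern is a simple idle check against the agent's lastActiveAt timestamp. A sketch, assuming the idle threshold comes from org settings (the function name is illustrative):

```typescript
// Decide whether an agent should be scaled to zero, given its last
// activity timestamp. idleMinutes is the org-configurable threshold.
function shouldPause(
  lastActiveAt: Date,
  now: Date,
  idleMinutes: number
): boolean {
  const idleMs = now.getTime() - lastActiveAt.getTime();
  return idleMs >= idleMinutes * 60_000;
}
```

A periodic reconciler can run this over all running agents and delete the pod for any that qualify, leaving the workspace PVC intact for restore.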
Lessons Learned
After operating this platform for thousands of agent-hours across hundreds of organizations, several patterns emerged:
Start with hard isolation. Soft isolation (shared namespaces with RBAC) seems simpler but creates operational nightmares. Per-org namespaces with network policies provide both security and operational clarity.
Meter everything from day one. Retroactively adding metering to a running platform is painful. Instrument resource usage at the pod level from the start, feeding into real-time monitoring dashboards.
Design for asymmetric workloads. Agent CPU usage follows a bimodal distribution — mostly idle with periodic bursts. Traditional autoscaling heuristics based on average utilization fail for this pattern. We use custom metrics tied to task queue depth instead.
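A queue-depth-driven target can be sketched as follows — tasksPerAgent and the function name are illustrative parameters, not the production tuning:

```typescript
// Desired agent count from task queue depth: enough workers to keep
// the per-agent backlog at or below tasksPerAgent, clamped to org limits.
function desiredAgents(
  queueDepth: number,
  tasksPerAgent: number,
  minAgents: number,
  maxAgents: number
): number {
  const wanted = Math.ceil(queueDepth / tasksPerAgent);
  return Math.min(maxAgents, Math.max(minAgents, wanted));
}
```

Unlike CPU-average autoscaling, this reacts to pending work before the burst starts, which suits the idle-then-burst profile described above.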
Treat agent state as ephemeral. Agents should be able to resume from any checkpoint. Store all meaningful state in Firestore, not on local disk. This enables seamless pod migrations during node drains and spot instance preemptions.
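A minimal checkpoint shape makes this concrete. The field names here are illustrative, not the production schema:

```typescript
// Everything an agent needs to resume its current task on a fresh pod.
interface AgentCheckpoint {
  agentId: string;
  taskId: string;
  step: number;
  context: Record<string, unknown>;
}

// Serialize to a plain string for storage in a Firestore document.
function serializeCheckpoint(cp: AgentCheckpoint): string {
  return JSON.stringify(cp);
}

// Restore on the replacement pod after a preemption or node drain.
function restoreCheckpoint(raw: string): AgentCheckpoint {
  return JSON.parse(raw) as AgentCheckpoint;
}
```

Keeping the checkpoint small and JSON-serializable is what makes spot-instance preemption (used in the pod spec above) survivable rather than catastrophic.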
The architecture of agent.ceo continues to evolve as we discover new patterns in production. Building a SaaS platform for AI agents is fundamentally a distributed systems problem with the added complexity of non-deterministic workloads.
agent.ceo offers both SaaS and enterprise private installation options for organizations of any size.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo