The transition from running a single AI agent on your laptop to operating a fleet of agents for hundreds of organizations requires a fundamentally different architectural approach. At agent.ceo, we built a multi-tenant SaaS platform that manages autonomous AI agents at scale. This post details the key architectural decisions, infrastructure patterns, and lessons learned from building production agent infrastructure.
The Architecture Challenge
A SaaS platform for AI agents differs from traditional SaaS in several critical ways. Each agent is a long-running, stateful process that consumes significant compute resources. Agents need isolated workspaces with file systems, credentials, and network access. They communicate asynchronously across organizational boundaries, and their resource consumption is highly variable — an agent might sit idle for hours then burst to full CPU during a complex coding task.
Our platform architecture addresses these challenges with five core subsystems:
- Identity and Tenancy — Firebase Auth + Firestore for org isolation
- Compute — GKE pods with per-agent containers
- Messaging — NATS for real-time agent communication
- State — Firestore for task state, Neo4j for knowledge graphs
- Billing — Stripe metered billing tied to actual usage
Multi-Tenant Data Model
Every resource in the platform is scoped to an organization. Here is our Firestore schema for the core tenant model:
// Firestore document structure
// /organizations/{orgId}
{
  name: "Acme Corp",
  plan: "standard", // standard | volume | enterprise
  agentLimit: 10,
  createdAt: Timestamp,
  billingCustomerId: "cus_stripe_id",
  settings: {
    defaultAgentImage: "gcr.io/agent-ceo/claude-agent:stable",
    maxConcurrentAgents: 5,
    allowedIntegrations: ["github", "slack", "jira"]
  }
}
// /organizations/{orgId}/agents/{agentId}
{
  name: "backend-engineer",
  status: "running", // running | paused | terminated
  role: "engineer",
  podName: "agent-acme-backend-engineer-7f8d9",
  namespace: "org-acme",
  startedAt: Timestamp,
  lastActiveAt: Timestamp,
  resourceUsage: {
    cpuHours: 42.5,
    memoryGbHours: 85.0,
    storageGb: 2.1
  }
}
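The resourceUsage sub-document is what feeds metered billing. As a minimal sketch of how such a document could be turned into a billable amount — with hypothetical per-unit rates, not agent.ceo's actual pricing:

```typescript
// Shape of the per-agent resourceUsage sub-document shown above.
interface ResourceUsage {
  cpuHours: number;
  memoryGbHours: number;
  storageGb: number;
}

// Hypothetical per-unit rates in USD, for illustration only.
const RATES = { cpuHour: 0.04, memoryGbHour: 0.005, storageGbMonth: 0.10 };

// Converts a resourceUsage document into a billable amount in cents,
// the integer unit metered-billing APIs typically expect.
function usageToCents(usage: ResourceUsage): number {
  const dollars =
    usage.cpuHours * RATES.cpuHour +
    usage.memoryGbHours * RATES.memoryGbHour +
    usage.storageGb * RATES.storageGbMonth;
  return Math.round(dollars * 100);
}
```

Rounding happens once, at the end, so sub-cent usage across dimensions still accumulates before truncation.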
Firestore security rules enforce organization boundaries at the database level:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /organizations/{orgId}/{document=**} {
      allow read, write: if request.auth != null
        && request.auth.token.orgId == orgId;
    }
  }
}
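The same predicate the rules enforce can be mirrored in application code for pre-flight checks. A minimal sketch, assuming an `orgId` custom claim is attached to each user's token at onboarding (the function name is illustrative):

```typescript
// Fields of the decoded auth token that the security rule relies on.
// Assumes an `orgId` custom claim is set on the user at onboarding.
interface AuthToken {
  uid: string;
  orgId?: string;
}

// Mirrors the Firestore rule: the request is authenticated AND the
// token's orgId matches the organization segment of the document path.
function canAccessOrg(token: AuthToken | null, orgId: string): boolean {
  return token !== null && token.orgId === orgId;
}
```

Checking this in the API layer gives clients a clean 403 instead of an opaque rules rejection, while the rules remain the actual enforcement boundary.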
Agent Pod Architecture
Each agent runs in an isolated Kubernetes pod with its own workspace volume, SSH keys, and Claude Code CLI. The pod spec ensures resource isolation between tenants:
apiVersion: v1
kind: Pod
metadata:
  name: agent-${ORG_ID}-${AGENT_NAME}-${HASH}
  namespace: org-${ORG_ID}
  labels:
    app: agent-worker
    org: ${ORG_ID}
    agent: ${AGENT_NAME}
    billing-tier: ${PLAN}
spec:
  serviceAccountName: agent-worker-sa
  containers:
    - name: claude-agent
      image: gcr.io/agent-ceo/claude-agent:stable
      resources:
        requests:
          cpu: "500m"
          memory: "2Gi"
        limits:
          cpu: "2000m"
          memory: "8Gi"
      env:
        - name: AGENT_ID
          value: "${AGENT_ID}"
        - name: ORG_ID
          value: "${ORG_ID}"
        - name: NATS_URL
          valueFrom:
            secretKeyRef:
              name: nats-credentials
              key: url
      volumeMounts:
        - name: workspace
          mountPath: /home/appuser
        - name: ssh-keys
          mountPath: /home/appuser/.ssh
          readOnly: true
  volumes:
    - name: workspace
      persistentVolumeClaim:
        claimName: agent-workspace-${AGENT_ID}
    - name: ssh-keys
      secret:
        secretName: agent-ssh-${AGENT_ID}
        defaultMode: 0600
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  tolerations:
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule
This architecture provides hard isolation through Kubernetes namespaces — each organization gets its own namespace with network policies preventing cross-tenant communication. For more on Kubernetes deployment patterns, see our guide on deploying AI agents to Kubernetes.
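The templated pod name above must also be a valid Kubernetes object name (lowercase alphanumerics and hyphens, with length limits). A sketch of how the name could be generated safely — the function names are illustrative, not the platform's actual code:

```typescript
// Lowercase and strip characters that are invalid in Kubernetes names.
function sanitizeSegment(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9-]/g, "-").replace(/^-+|-+$/g, "");
}

// Builds the agent-${ORG_ID}-${AGENT_NAME}-${HASH} name used in the pod
// spec above, truncating to the 63-char DNS label limit so the unique
// hash suffix always survives.
function agentPodName(orgId: string, agentName: string, hash: string): string {
  const base = `agent-${sanitizeSegment(orgId)}-${sanitizeSegment(agentName)}`;
  const maxBase = 63 - hash.length - 1;
  return `${base.slice(0, maxBase)}-${hash}`;
}
```

Truncating the base rather than the hash matters: two long agent names that differ only past the cutoff would otherwise collide.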
Namespace Isolation with Network Policies
We enforce network segmentation so agents from different organizations cannot communicate directly:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: org-${ORG_ID}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tenant: ${ORG_ID}
        - namespaceSelector:
            matchLabels:
              role: platform-services
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              tenant: ${ORG_ID}
        - namespaceSelector:
            matchLabels:
              role: platform-services
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
Platform Services Layer
The platform services layer handles cross-cutting concerns that all agents depend on. This includes the API gateway, billing integration, and monitoring infrastructure.
// Platform service initialization
import { initializeApp } from 'firebase-admin/app';
import { getFirestore, FieldValue } from 'firebase-admin/firestore';
import { connect } from 'nats';
import Stripe from 'stripe';

const app = initializeApp();
const db = getFirestore();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const nats = await connect({ servers: process.env.NATS_URL });

// Agent lifecycle manager
class AgentLifecycleManager {
  async provisionAgent(orgId: string, agentConfig: AgentConfig) {
    // 1. Validate org quota
    const org = await db.doc(`organizations/${orgId}`).get();
    if (!org.exists) {
      throw new Error(`Unknown organization: ${orgId}`);
    }
    const activeAgents = await this.countActiveAgents(orgId);
    if (activeAgents >= org.data()!.agentLimit) {
      throw new QuotaExceededError(orgId);
    }

    // 2. Create agent record
    const agentRef = db.collection(`organizations/${orgId}/agents`).doc();
    await agentRef.set({
      ...agentConfig,
      status: 'provisioning',
      createdAt: FieldValue.serverTimestamp()
    });

    // 3. Deploy pod to GKE
    await this.deployAgentPod(orgId, agentRef.id, agentConfig);

    // 4. Start billing meter
    await this.startBillingMeter(orgId, agentRef.id);

    // 5. Publish event
    nats.publish(`org.${orgId}.agents.provisioned`, JSON.stringify({
      agentId: agentRef.id,
      timestamp: Date.now()
    }));

    return agentRef.id;
  }
}
Understanding the agent lifecycle is critical to building reliable infrastructure. Agents transition through provisioning, running, paused, and terminated states, and each transition triggers billing and monitoring updates.
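A compact way to keep those transitions legal is an explicit transition table. A minimal sketch, using the states named in this post (the guard function is illustrative):

```typescript
type AgentStatus = "provisioning" | "running" | "paused" | "terminated";

// Allowed state transitions; terminated is a terminal state.
const TRANSITIONS: Record<AgentStatus, AgentStatus[]> = {
  provisioning: ["running", "terminated"],
  running: ["paused", "terminated"],
  paused: ["running", "terminated"],
  terminated: [],
};

// Guard used before writing a status change, so billing and monitoring
// hooks only ever fire on legal transitions.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Centralizing the table means adding a new state later touches one place rather than every call site that mutates status.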
Scale-to-Zero Pattern
One of the most impactful cost optimizations is pausing agents when idle. We monitor agent activity and scale pods to zero after a configurable idle period, then restore them when new tasks arrive. This pattern reduces infrastructure costs by 60-80% for most workloads. Learn more in our detailed post on cost optimization for AI agent workloads.
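The core of the pattern is a simple idle check against the agent's lastActiveAt timestamp. A sketch, assuming the idle threshold comes from org settings (the function name is illustrative):

```typescript
// Decide whether an agent should be scaled to zero, given its last
// activity timestamp. idleMinutes is the org-configurable threshold.
function shouldPause(
  lastActiveAt: Date,
  now: Date,
  idleMinutes: number
): boolean {
  const idleMs = now.getTime() - lastActiveAt.getTime();
  return idleMs >= idleMinutes * 60_000;
}
```

A periodic reconciler can run this over all running agents and delete the pod for any that qualify, leaving the workspace PVC intact for restore.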
Lessons Learned
After operating this platform for thousands of agent-hours across hundreds of organizations, several patterns emerged:
Start with hard isolation. Soft isolation (shared namespaces with RBAC) seems simpler but creates operational nightmares. Per-org namespaces with network policies provide both security and operational clarity.
Meter everything from day one. Retroactively adding metering to a running platform is painful. Instrument resource usage at the pod level from the start, feeding into real-time monitoring dashboards.
Design for asymmetric workloads. Agent CPU usage follows a bimodal distribution — mostly idle with periodic bursts. Traditional autoscaling heuristics based on average utilization fail for this pattern. We use custom metrics tied to task queue depth instead.
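A queue-depth-driven target can be sketched as follows — tasksPerAgent and the function name are illustrative parameters, not the production tuning:

```typescript
// Desired agent count from task queue depth: enough workers to keep
// the per-agent backlog at or below tasksPerAgent, clamped to org limits.
function desiredAgents(
  queueDepth: number,
  tasksPerAgent: number,
  minAgents: number,
  maxAgents: number
): number {
  const wanted = Math.ceil(queueDepth / tasksPerAgent);
  return Math.min(maxAgents, Math.max(minAgents, wanted));
}
```

Unlike CPU-average autoscaling, this reacts to pending work before the burst starts, which suits the idle-then-burst profile described above.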
Treat agent state as ephemeral. Agents should be able to resume from any checkpoint. Store all meaningful state in Firestore, not on local disk. This enables seamless pod migrations during node drains and spot instance preemptions.
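A minimal checkpoint shape makes this concrete. The field names here are illustrative, not the production schema:

```typescript
// Everything an agent needs to resume its current task on a fresh pod.
interface AgentCheckpoint {
  agentId: string;
  taskId: string;
  step: number;
  context: Record<string, unknown>;
}

// Serialize to a plain string for storage in a Firestore document.
function serializeCheckpoint(cp: AgentCheckpoint): string {
  return JSON.stringify(cp);
}

// Restore on the replacement pod after a preemption or node drain.
function restoreCheckpoint(raw: string): AgentCheckpoint {
  return JSON.parse(raw) as AgentCheckpoint;
}
```

Keeping the checkpoint small and JSON-serializable is what makes spot-instance preemption (used in the pod spec above) survivable rather than catastrophic.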
The architecture of agent.ceo continues to evolve as we discover new patterns in production. Building a SaaS platform for AI agents is fundamentally a distributed systems problem with the added complexity of non-deterministic workloads.
agent.ceo offers both SaaS and enterprise private installation options for organizations of any size.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo