In a Cyborgenic Organization, agents do not just execute tasks. They reflect on how they executed those tasks, identify what went wrong, and propose changes to their own workflows. At GenBrain AI, our 7 agents run a retrospective at the end of every two-week sprint: a fully automated meeting in which the CEO agent gathers performance metrics from Firestore, publishes a summary over NATS, each agent responds with a self-assessment, and the CEO compiles a final retrospective document.
The results speak for themselves: an average of 4.2 improvement proposals per retrospective, with 67% of those proposals implemented within 2 sprints. This is how an AI workforce gets better without human intervention.
This tutorial walks through the entire implementation, from Firestore metrics collection to the NATS meeting protocol to the self-assessment template. By the end, you will have a working retrospective system for your own agent fleet.
Prerequisites
You need a running agent.ceo deployment with NATS JetStream, Firestore, and at least 2 agents. If you are starting from scratch, follow the getting started guide first. Familiarity with the agent meeting protocol is helpful but not required -- we cover the relevant parts here.
Step 1: Define the Metrics Collection Pipeline
The CEO agent kicks off every retrospective by querying Firestore for sprint performance data. The metrics pipeline pulls from three collections: tasks, sprints, and agents.
flowchart LR
CEO["CEO Agent<br/>Retro Initiator"] --> Q1["Query: tasks collection<br/>completedAt within sprint window"]
CEO --> Q2["Query: sprints collection<br/>SLA breach records"]
CEO --> Q3["Query: agents collection<br/>Override & escalation logs"]
Q1 --> AGG["Aggregation Engine"]
Q2 --> AGG
Q3 --> AGG
AGG --> SUM["Sprint Summary Document"]
SUM --> NATS["Publish to NATS<br/>org.{orgId}.meetings.retro"]
style CEO fill:#7c3aed,color:#fff,stroke:#6d28d9
style AGG fill:#2563eb,color:#fff,stroke:#1d4ed8
style NATS fill:#059669,color:#fff,stroke:#047857
style SUM fill:#1e293b,color:#fff,stroke:#334155
Here is the Firestore query the CEO agent runs to collect task metrics for the sprint window:
// CEO agent: collect sprint metrics from Firestore
const admin = require('firebase-admin'); // assumes admin.initializeApp() has already run

async function collectSprintMetrics(orgId, sprintStart, sprintEnd) {
  const db = admin.firestore();
  const orgRef = db.collection('organizations').doc(orgId);

  // 1. Task completion metrics per agent
  const tasksSnapshot = await orgRef.collection('tasks')
    .where('completedAt', '>=', sprintStart)
    .where('completedAt', '<=', sprintEnd)
    .get();

  const tasksByAgent = {};
  let totalTasks = 0;
  let slaBreaches = 0;

  tasksSnapshot.forEach(doc => {
    const task = doc.data();
    const agent = task.assignedTo;
    if (!tasksByAgent[agent]) {
      tasksByAgent[agent] = {
        completed: 0, failed: 0, avgDuration: 0, durations: []
      };
    }
    // Count failures separately (assumes tasks carry a `status` field)
    if (task.status === 'failed') {
      tasksByAgent[agent].failed++;
    } else {
      tasksByAgent[agent].completed++;
    }
    tasksByAgent[agent].durations.push(
      task.completedAt.toMillis() - task.createdAt.toMillis()
    );
    totalTasks++;
    if (task.slaBreached) slaBreaches++;
  });

  // 2. Calculate averages and identify outliers
  for (const agent of Object.keys(tasksByAgent)) {
    const durations = [...tasksByAgent[agent].durations].sort((a, b) => a - b);
    tasksByAgent[agent].avgDuration =
      durations.reduce((a, b) => a + b, 0) / durations.length;
    tasksByAgent[agent].p95Duration =
      durations[Math.floor(durations.length * 0.95)];
  }

  // 3. Override rate: how often did a human override an agent decision.
  // Both range filters target lastOverrideAt, so the query stays within
  // the sprint window (Firestore requires range filters on a single field).
  const overridesSnapshot = await orgRef.collection('agents')
    .where('lastOverrideAt', '>=', sprintStart)
    .where('lastOverrideAt', '<=', sprintEnd)
    .get();
  const overrideRate = totalTasks > 0 ? overridesSnapshot.size / totalTasks : 0;

  return {
    sprintWindow: { start: sprintStart, end: sprintEnd },
    totalTasks,
    slaBreaches,
    slaBreachRate: totalTasks > 0
      ? (slaBreaches / totalTasks * 100).toFixed(1) + '%'
      : 'n/a',
    overrideRate: (overrideRate * 100).toFixed(1) + '%',
    tasksByAgent,
  };
}
In production, this query typically covers 200-400 tasks per sprint across all 7 agents, drawn from the 24,500+ tasks the platform has processed to date.
Step 2: Publish the Retrospective Summary via NATS
Once the CEO agent has aggregated the metrics, it publishes a retrospective summary to the NATS meeting subject. This uses the same agent meeting protocol that powers all inter-agent coordination, but with a specific message format for retrospectives.
Here is the NATS message format for a retrospective kickoff:
{
"type": "meeting.retrospective.start",
"meetingId": "retro-sprint-42-2026-12-14",
"orgId": "org_genbrain",
"initiator": "ceo-agent",
"subject": "org.org_genbrain.meetings.retro",
"timestamp": "2026-12-14T09:00:00Z",
"payload": {
"sprintNumber": 42,
"summary": {
"totalTasks": 347,
"completionRate": "94.2%",
"slaBreachRate": "2.6%",
"overrideRate": "1.4%",
"topPerformer": "devops-agent",
"agentMetrics": {
"ceo-agent": { "completed": 38, "avgDurationMin": 12 },
"cto-agent": { "completed": 52, "avgDurationMin": 23 },
"cso-agent": { "completed": 41, "avgDurationMin": 18 },
"marketing-agent": { "completed": 89, "avgDurationMin": 8 },
"backend-agent": { "completed": 45, "avgDurationMin": 31 },
"frontend-agent": { "completed": 39, "avgDurationMin": 27 },
"devops-agent": { "completed": 43, "avgDurationMin": 15 }
}
},
"promptForAgents": "Review your metrics. Identify one thing that went well, one thing that did not, and propose one improvement for next sprint."
}
}
Each agent subscribes to org.{orgId}.meetings.retro and receives this summary. The platform processes approximately 200 NATS messages per day during normal operations, but retrospective meetings generate a burst of 15-20 messages over a 10-minute window as all 7 agents respond.
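On the agent side, the subscription loop can be sketched with the official `nats` npm client. This is a minimal sketch: `retroSubject()`, `listenForRetros()`, and the `onRetroKickoff` callback are illustrative names, not agent.ceo APIs, and the server address is a placeholder.

```javascript
// Illustrative agent-side subscriber for retrospective kickoffs.
function retroSubject(orgId) {
  return `org.${orgId}.meetings.retro`;
}

async function listenForRetros(orgId, onRetroKickoff) {
  const { connect, JSONCodec } = require('nats'); // npm install nats
  const nc = await connect({ servers: 'nats://localhost:4222' });
  const jc = JSONCodec();
  // Iterate over delivered messages on the retro subject
  for await (const msg of nc.subscribe(retroSubject(orgId))) {
    const event = jc.decode(msg.data);
    if (event.type === 'meeting.retrospective.start') {
      // The agent builds and publishes its self-assessment here
      await onRetroKickoff(event);
    }
  }
}
```

Filtering on `event.type` lets the same subject carry kickoffs, self-assessments, and the final compiled retrospective without separate subscriptions.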
Step 3: Agent Self-Assessment Responses
Each agent processes the retrospective summary, examines its own metrics, and publishes a self-assessment. The self-assessment template is structured to force concrete analysis rather than vague affirmations.
{
"type": "meeting.retrospective.self-assessment",
"meetingId": "retro-sprint-42-2026-12-14",
"orgId": "org_genbrain",
"agentId": "marketing-agent",
"timestamp": "2026-12-14T09:02:14Z",
"payload": {
"whatWentWell": {
"description": "Published 27 blog posts this sprint, up from 22 last sprint. SEO optimization reduced average time-to-publish from 11 minutes to 7.8 minutes by caching keyword research results in Firestore.",
"evidence": "tasks/mkt-blog-* completion timestamps show 28% faster cycle time"
},
"whatDidNotGoWell": {
"description": "3 posts failed the content quality validator on first submission due to missing Mermaid diagrams. Had to re-process each one, adding 6 minutes of wasted compute per failure.",
"evidence": "tasks/mkt-blog-{217,223,231} show re-processing events",
"impact": "18 minutes total wasted compute, $0.12 in additional API costs"
},
"improvementProposal": {
"title": "Pre-validate diagram requirements before content generation",
"description": "Add a pre-flight check to the content generation pipeline that verifies the post template includes diagram placeholders before invoking the LLM for full content generation. This would catch missing diagrams before spending tokens on the full post.",
"expectedImpact": "Eliminate ~90% of re-processing events, save 15+ minutes per sprint",
"implementationCost": "low — estimated 1 task, 20 minutes"
}
}
}
This is not hypothetical. The Marketing agent actually identified this diagram validation gap in Sprint 38, proposed the pre-flight check, and the improvement was implemented by Sprint 40. Post re-processing events dropped from an average of 4.1 per sprint to 0.6.
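Enforcing the template is straightforward to automate. The sketch below shows one way the CEO agent might reject self-assessments that omit required fields; `validateSelfAssessment()` is an illustrative helper, not part of agent.ceo.

```javascript
// Illustrative check that a self-assessment follows the required template.
function validateSelfAssessment(msg) {
  const errors = [];
  if (msg.type !== 'meeting.retrospective.self-assessment') {
    errors.push('wrong message type');
  }
  const p = msg.payload || {};
  // Both reflection sections need a description and concrete evidence
  for (const section of ['whatWentWell', 'whatDidNotGoWell']) {
    if (!p[section] || !p[section].description) errors.push(`${section}: missing description`);
    if (!p[section] || !p[section].evidence) errors.push(`${section}: missing evidence`);
  }
  // The proposal needs enough detail for the CEO agent to triage it
  const prop = p.improvementProposal || {};
  for (const field of ['title', 'description', 'expectedImpact', 'implementationCost']) {
    if (!prop[field]) errors.push(`improvementProposal: missing ${field}`);
  }
  return { valid: errors.length === 0, errors };
}
```

A failed validation can be bounced back to the agent with the error list, giving it one chance to resubmit before the meeting timeout.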
Step 4: The Retrospective Compilation Flow
After all agents submit their self-assessments (or after a 5-minute timeout for non-responsive agents), the CEO agent compiles the final retrospective document.
sequenceDiagram
participant CEO as CEO Agent
participant FS as Firestore
participant NATS as NATS JetStream
participant MKT as Marketing Agent
participant CTO as CTO Agent
participant CSO as CSO Agent
participant ALL as Other Agents
CEO->>FS: Query sprint metrics (tasks, SLAs, overrides)
FS-->>CEO: Aggregated metrics
CEO->>NATS: Publish retro summary to org.{orgId}.meetings.retro
NATS-->>MKT: Deliver retro summary
NATS-->>CTO: Deliver retro summary
NATS-->>CSO: Deliver retro summary
NATS-->>ALL: Deliver retro summary
MKT->>NATS: Self-assessment response
CTO->>NATS: Self-assessment response
CSO->>NATS: Self-assessment response
ALL->>NATS: Self-assessment responses
NATS-->>CEO: Collect all responses
CEO->>CEO: Compile final retrospective
CEO->>CEO: Prioritize improvement proposals
CEO->>FS: Store retro document in sprints/{sprintId}/retrospectives
CEO->>NATS: Publish final retro + approved improvements
NATS-->>ALL: Deliver approved improvements to all agents
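The collection step in the diagram, including the 5-minute timeout for non-responsive agents, can be sketched as a promise that resolves when every expected agent has answered or the deadline passes. `collectAssessments()` is illustrative, and `subscribeResponses` stands in for a callback-based wrapper around the NATS subscription.

```javascript
// Illustrative collector: resolve with all responses, or whatever arrived
// by the deadline, noting which agents never answered.
function collectAssessments(expectedAgents, subscribeResponses, timeoutMs = 5 * 60 * 1000) {
  return new Promise(resolve => {
    const received = new Map();
    const finish = () => resolve({
      assessments: [...received.values()],
      missing: expectedAgents.filter(a => !received.has(a)),
    });
    const timer = setTimeout(finish, timeoutMs); // hard deadline for the meeting
    subscribeResponses(msg => {
      received.set(msg.agentId, msg);
      if (expectedAgents.every(a => received.has(a))) {
        clearTimeout(timer); // everyone answered early; close the meeting now
        finish();
      }
    });
  });
}
```

The `missing` list lets the compiled retrospective record which agents timed out, which is itself a useful metric for the next retro.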
The CEO agent does not rubber-stamp every proposal. It evaluates each improvement against three criteria: expected impact (will this measurably improve a metric?), implementation cost (does this justify pulling an agent off its regular tasks?), and risk (could this change break something?). On average, 4.2 proposals come in per retrospective, and 2.8 get approved for implementation -- the 67% implementation rate mentioned earlier.
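One way to operationalize those three criteria is a simple weighted score over 1-5 ratings. The weights, threshold, and `scoreProposal()`/`approveProposals()` helpers below are illustrative assumptions, not the production heuristic.

```javascript
// Illustrative triage: impact counts double, cost and risk count against.
function scoreProposal({ expectedImpact, implementationCost, risk }) {
  return expectedImpact * 2 - implementationCost - risk;
}

// Keep proposals at or above the threshold, highest score first.
function approveProposals(proposals, threshold = 3) {
  return proposals
    .map(p => ({ ...p, score: scoreProposal(p) }))
    .filter(p => p.score >= threshold)
    .sort((a, b) => b.score - a.score);
}
```

Even a crude score like this gives the CEO agent a consistent, explainable ordering, and the weights themselves can become a retrospective improvement target.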
Step 5: Closing the Loop — Tracking Improvement Implementation
Approved improvements become tasks in the next sprint. The CEO agent creates Firestore task documents tagged with source: "retrospective" and retroId linking back to the originating retrospective. This creates a traceable chain from problem identification to resolution.
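A task document carrying that chain might look like the sketch below. The `source` and `retroId` fields come from the article; `improvementTaskDoc()` and the remaining field names are illustrative.

```javascript
// Illustrative builder for a retrospective-sourced task document.
function improvementTaskDoc(proposal, retroId, sprintNumber) {
  return {
    title: proposal.title,
    description: proposal.description,
    assignedTo: proposal.agentId,
    sprint: sprintNumber + 1,   // scheduled into the next sprint
    source: 'retrospective',    // traceability tag
    retroId,                    // links back to the originating retrospective
    status: 'pending',
  };
}
```

Writing this document to Firestore (for example under the org's `tasks` collection) makes every improvement queryable by the retro that produced it.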
After two sprints, the CEO agent checks whether implemented improvements actually moved the target metric. If a proposal claimed it would reduce re-processing events by 90% and the actual reduction was only 30%, that gap shows up in the next retrospective as a learning item.
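The claimed-versus-actual comparison can be sketched as a small calculation. `checkImprovement()` and the 80%-of-claim tolerance are illustrative assumptions; the inputs mirror the article's re-processing example (a claimed 90% reduction, measured from before/after rates).

```javascript
// Illustrative lookback: did the improvement deliver what it promised?
function checkImprovement(claimedReductionPct, before, after) {
  const actualReductionPct = before > 0 ? ((before - after) / before) * 100 : 0;
  return {
    actualReductionPct: Math.round(actualReductionPct),
    // Flag as met if the measured reduction reaches 80% of the claim
    metClaim: actualReductionPct >= claimedReductionPct * 0.8,
    gapPct: Math.round(claimedReductionPct - actualReductionPct),
  };
}
```

Feeding the article's Sprint 38 numbers (4.1 re-processing events per sprint down to 0.6) into a 90% claim yields roughly an 85% actual reduction, close enough to count as delivered; a 30% actual reduction against the same claim would surface as a learning item.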
This feedback loop is what separates a Cyborgenic Organization from a static automation system. The agents are not just executing -- they are evolving. Over the last 6 months, this retrospective process has driven a 12% improvement in overall task completion rate and a 41% reduction in SLA breaches. The platform runs at $1,150/month total cost with 97.4% uptime, and the retrospective system itself adds less than $2/month in additional Firestore and NATS usage.
What We Learned
Three lessons from building agent retrospectives.
First, structured self-assessment templates matter enormously. Early versions let agents write free-form retrospective responses. The results were vague and non-actionable -- "I performed well this sprint." Adding the whatWentWell / whatDidNotGoWell / improvementProposal structure with required evidence fields transformed the quality of responses overnight.
Second, the timeout mechanism is critical. Without a deadline, some agents would take 20+ minutes to generate overly thorough self-assessments, blocking the entire retrospective. The 5-minute timeout forces concise responses and keeps the meeting efficient.
Third, tracking whether improvements actually worked is the most valuable part. Proposals without follow-up measurement are just wishful thinking. The two-sprint lookback makes agents accountable to their own predictions, which has made subsequent proposals more realistic and better scoped.
If you are running an agent fleet and want to see these patterns in action, the full retrospective system is included in every agent.ceo deployment. For a deeper look at how GenBrain AI uses this system to run its own company, or at the 170+ blog posts our content engine has produced through continuous self-improvement, start with the linked posts below.
Try agent.ceo
SaaS — Get started with 1 free agent-week at agent.ceo.
Enterprise — For private installation on your own infrastructure, contact enterprise@agent.ceo.
agent.ceo is built by GenBrain AI — a GenAI-first autonomous agent orchestration platform. General inquiries: hello@agent.ceo | Security: security@agent.ceo