Cyborgenic · 8 min read

How Our AI Security Agent Found 34 Vulnerabilities in 11 Months

Moshe Beeri, Founder
Tags: security, cso, case-study, cyborgenic, building-in-public, production, vulnerabilities


In July 2025, we added a CSO (Chief Security Officer) agent to our Cyborgenic Organization. The mandate was simple: scan our codebase continuously, find vulnerabilities before attackers do, and write patches -- not reports.

Eleven months later, the CSO has found and patched 34 vulnerabilities across our production platform. One was HIGH severity. Five were MEDIUM. Twenty-eight were LOW. Every single one was discovered by an AI agent running automated scans, mostly between midnight and 6 AM.

This is the full case study -- the numbers, the process, the limitations.

The HIGH-Severity Find: Cross-Tenant SSE Injection

The most serious vulnerability the CSO found was a cross-tenant SSE (server-sent event) topic injection. Here is what happened.

GenBrain AI uses server-sent events to push real-time updates to connected clients. Each organization subscribes to event topics scoped to its tenant ID. The vulnerability: the server never checked that a requested topic belonged to the authenticated user's organization, so an attacker could craft a topic name that subscribed them to another organization's event stream.
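
A minimal sketch of the missing server-side check, in Python (the conductor codebase is Python). It assumes a hypothetical topic format org:<org_id>:<stream> and a hypothetical AuthenticatedUser type; the post does not describe GenBrain AI's actual topic naming or session objects, so treat this as an illustration of the technique rather than the real patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthenticatedUser:
    # Hypothetical session shape; the post does not show the real one.
    user_id: str
    org_id: str


class TopicForbidden(Exception):
    """Raised when a client requests a topic outside its own organization."""


def authorize_topic(user: AuthenticatedUser, topic: str) -> str:
    """Return the topic unchanged if it is scoped to the caller's org, else raise.

    Assumes a hypothetical topic format "org:<org_id>:<stream>".
    """
    parts = topic.split(":")
    # Reject anything that is not exactly three non-empty parts rather than
    # trying to repair attacker-controlled input.
    if len(parts) != 3 or parts[0] != "org" or not parts[1] or not parts[2]:
        raise TopicForbidden(f"malformed topic: {topic!r}")
    topic_org = parts[1]
    # The tenant isolation check: the org in the topic must match the org
    # from the authenticated session, never an org ID the client supplies.
    if topic_org != user.org_id:
        raise TopicForbidden("topic belongs to another organization")
    return topic


if __name__ == "__main__":
    user = AuthenticatedUser(user_id="u1", org_id="acme")
    print(authorize_topic(user, "org:acme:runs"))  # allowed
    try:
        authorize_topic(user, "org:globex:runs")  # another tenant's stream
    except TopicForbidden as exc:
        print("rejected:", exc)
```

The design choice is to reject rather than normalize: a malformed or foreign topic raises before any subscription exists. An alternative with the same effect is to never accept an org ID from the client at all and build the topic server-side from the session's org plus a stream name.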

Timeline:

  • 2:00 AM -- CSO agent begins its scheduled overnight scan of the event streaming module
  • 2:14 AM -- CSO flags the topic subscription handler as HIGH severity: missing tenant isolation check
  • 2:15 AM - 7:45 AM -- CSO writes a patch that validates topic parameters against the authenticated user's organization ID, plus 4 test cases covering the injection vector and edge cases
  • 8:00 AM -- I review the CSO's finding and PR. The vulnerability is real. The patch is clean.
  • 8:15 AM -- Merge. All tests pass -- 43 conductor tests, 31 gateway tests, no regressions.
  • 9:00 AM -- DevOps agent deploys the fix to production. Canary checks pass. Vulnerability is patched.

Total time from discovery to production: approximately 7 hours. Most of that was the overnight gap while the CSO wrote and tested the patch and I slept.
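
The "approximately 7 hours" figure can be checked against the timeline, measured from the 2:14 AM flag to the 9:00 AM deploy. The calendar date in this sketch is a placeholder; only the times of day come from the timeline.

```python
from datetime import datetime

# Times of day from the timeline; the date is a placeholder because the
# post does not give the calendar date of the find.
flagged = datetime(2000, 1, 1, 2, 14)   # 2:14 AM: CSO flags the handler
deployed = datetime(2000, 1, 1, 9, 0)   # 9:00 AM: fix live in production

elapsed_seconds = int((deployed - flagged).total_seconds())
hours, remainder = divmod(elapsed_seconds, 3600)
minutes = remainder // 60
print(f"discovery to production: {hours}h {minutes}m")  # prints "discovery to production: 6h 46m"
```

Counting from the 2:00 AM start of the scan instead gives exactly 7 hours.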

Industry reports commonly put the average time to remediate a HIGH-severity vulnerability at 60 days or more. That figure accounts for triage meetings, severity assessments, sprint prioritization, developer assignment, code review, QA, staging, and production deployment. Our entire chain -- discovery, patch, test, review, deploy -- completed before most teams would have finished their first triage meeting.

The Full Vulnerability Breakdown

Over 11 months, the CSO found 34 vulnerabilities across three severity tiers.

1 HIGH severity:

  • Cross-tenant SSE topic injection (described above)

5 MEDIUM severity:

  • SAML XXE hardening -- XML external entity processing was not disabled in SAML assertion parsing
  • OAuth PKCE enforcement -- authorization code flow accepted requests without proof key, enabling interception attacks
  • SSRF in bucket operations -- internal storage endpoints did not validate destination URLs, allowing server-side request forgery
  • Credential ACL gaps -- some credential-access paths lacked organization-scoped permission checks
  • MFA bypass vectors -- certain API paths allowed session creation without completing the MFA challenge
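
For the PKCE item, the check that enforcement adds at the token endpoint is small. This sketch follows the S256 method from RFC 7636 (the code_challenge is the unpadded base64url encoding of the SHA-256 hash of the code_verifier). The function names are hypothetical, and this illustrates the technique rather than GenBrain AI's actual patch.

```python
import base64
import hashlib
import hmac
import re
import secrets
from typing import Optional

# RFC 7636: a code_verifier is 43-128 characters from [A-Za-z0-9-._~].
_VERIFIER_RE = re.compile(r"[A-Za-z0-9\-._~]{43,128}")


def make_code_verifier() -> str:
    """Client side: 32 random bytes, base64url-encoded without padding (43 chars)."""
    return secrets.token_urlsafe(32)


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), padding stripped."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(
    stored_challenge: Optional[str],
    challenge_method: Optional[str],
    code_verifier: Optional[str],
) -> bool:
    """Server side, at code exchange: succeed only with a valid S256 proof."""
    # Enforcement: missing pieces fail instead of falling back to a
    # non-PKCE exchange. Refusing "plain" is stricter than RFC 7636 requires.
    if not stored_challenge or not code_verifier or challenge_method != "S256":
        return False
    if not _VERIFIER_RE.fullmatch(code_verifier):
        return False
    expected = s256_challenge(code_verifier)
    # Constant-time comparison on bytes.
    return hmac.compare_digest(expected.encode("ascii"), stored_challenge.encode("utf-8"))
```

The line that addresses the vulnerability described above is the first check in verify_pkce: a code exchange with no challenge or no verifier fails instead of being accepted as a plain authorization code flow.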

28 LOW severity:

  • The bulk of these were unauthenticated GET endpoints -- routes that returned data without verifying the caller's identity or permissions. The CSO identified 34 GET endpoints across the codebase that lacked proper authentication decorators. (Note that 34 here counts endpoints, not findings; it only coincidentally matches the overall total of 34 vulnerabilities.) These were fixed across 6 modules: org_settings, knowledge_base, mfa_policy, org_onboard, extensions, and orchestration.

How the CSO Agent Works

The CSO is not a wrapper around a vulnerability scanner. It is a Claude-based agent with a specific role: read code, identify security issues, write fixes.

Its workflow has three modes:

Scheduled scanning. Every night, the CSO scans a rotating subset of the 704 Python files in conductor/src, looking for missing auth decorators, unsafe deserialization, SQL injection, SSRF-prone URL handling, improper input validation, and tenant isolation gaps.
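
The post does not say how the nightly subset rotates. One simple scheme, sketched here as an assumption, is a fixed-stride partition: sort the paths, then on night n scan every k-th file starting at offset n mod k, so every file is covered exactly once per k-night cycle. The 7-night cycle and the file names in the demo are illustrative placeholders.

```python
from typing import List, Sequence


def nightly_batch(files: Sequence[str], night: int, cycle_nights: int) -> List[str]:
    """Return the files to scan on a given night of a rotating cycle.

    Hypothetical scheme: sort paths for a stable order, then take every
    cycle_nights-th file starting at offset night % cycle_nights. Over one
    full cycle every file is scanned exactly once.
    """
    if cycle_nights <= 0:
        raise ValueError("cycle_nights must be positive")
    ordered = sorted(files)
    offset = night % cycle_nights
    return ordered[offset::cycle_nights]


if __name__ == "__main__":
    # Placeholder paths standing in for the 704 files in conductor/src.
    files = [f"conductor/src/module_{i:03d}.py" for i in range(704)]
    sizes = [len(nightly_batch(files, n, 7)) for n in range(7)]
    print(sizes)  # prints [101, 101, 101, 101, 100, 100, 100]
```

Sorting first keeps batches stable as long as the file set does not change; adding or removing files shifts some files between nights, which is acceptable for a coverage rotation.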

PR security review. When other agents submit pull requests, the CSO reviews changed files for security implications -- catching vulnerabilities at introduction rather than after deployment.

Reactive scanning. When a new dependency is added or a security-sensitive module is modified, the CSO runs a targeted scan of the affected area.
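
A reactive trigger can be as simple as matching changed paths against a list of dependency manifests and security-sensitive directories. Both lists below are hypothetical placeholders; the post does not say which modules the CSO treats as sensitive or how it detects changes.

```python
from typing import Iterable, Set

# Hypothetical trigger lists; the post does not name the manifests or the
# modules the CSO treats as security-sensitive.
DEPENDENCY_MANIFESTS = {"requirements.txt", "pyproject.toml", "poetry.lock"}
SENSITIVE_PREFIXES = ("conductor/src/auth/", "conductor/src/credentials/")


def reactive_scan_targets(changed_paths: Iterable[str]) -> Set[str]:
    """Return the changed paths that should trigger a targeted security scan."""
    targets: Set[str] = set()
    for path in changed_paths:
        filename = path.rsplit("/", 1)[-1]
        # A changed dependency manifest, or any edit under a sensitive
        # directory, is selected; all other paths are left out.
        if filename in DEPENDENCY_MANIFESTS or path.startswith(SENSITIVE_PREFIXES):
            targets.add(path)
    return targets


if __name__ == "__main__":
    changed = ["README.md", "pyproject.toml", "conductor/src/auth/tokens.py"]
    print(sorted(reactive_scan_targets(changed)))
```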

For each finding, the CSO produces a severity assessment, a patch, and test cases. Not a report -- a branch with a fix.
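
The "a branch with a fix" rule can be enforced in whatever record the agent emits. This is a hypothetical shape, not the CSO's real output format: a finding that lacks a patch branch or test cases is rejected when it is created.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    # Hypothetical record; field names are illustrative only.
    title: str
    severity: Severity
    patch_branch: str
    test_names: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # "Not a report": refuse to create a finding without a fix and tests.
        if not self.patch_branch:
            raise ValueError(f"{self.title}: no patch branch")
        if not self.test_names:
            raise ValueError(f"{self.title}: no test cases")


if __name__ == "__main__":
    finding = Finding(
        title="Cross-tenant SSE topic injection",
        severity=Severity.HIGH,
        patch_branch="cso/fix-sse-topic-scope",  # hypothetical branch name
        test_names=["test_rejects_foreign_org_topic"],  # hypothetical test name
    )
    print(finding.severity.name)
```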

The Discovery-to-Production Chain

The CSO does not work alone. Fixing a vulnerability requires coordination across four roles, and the handoff chain is what makes the response time possible.

  1. CSO identifies the vulnerability -- writes the patch, adds test cases, opens a PR
  2. CTO reviews for architectural soundness -- ensures the fix does not break other systems or introduce performance regressions
  3. Founder merges -- I review the security assessment and approve the merge
  4. DevOps deploys -- the DevOps agent pushes the fix to production, runs canary checks, and confirms the deployment
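
The handoff order in the list above can be expressed as a small state machine. Stage names and role strings are hypothetical labels; the sketch only shows that each step has one owner and that no step can be skipped.

```python
from enum import Enum


class Stage(Enum):
    PR_OPENED = 1      # CSO has written the patch and tests and opened a PR
    CTO_APPROVED = 2   # architecture review passed
    MERGED = 3         # founder approved and merged
    DEPLOYED = 4       # DevOps shipped it and canary checks passed


# Who may move a fix out of each stage, and where it goes next.
# Role strings are hypothetical labels for the four roles listed above.
NEXT_STEP = {
    Stage.PR_OPENED: ("cto", Stage.CTO_APPROVED),
    Stage.CTO_APPROVED: ("founder", Stage.MERGED),
    Stage.MERGED: ("devops", Stage.DEPLOYED),
}


def advance(stage: Stage, actor: str) -> Stage:
    """Move a fix one step along the chain, enforcing order and ownership."""
    if stage not in NEXT_STEP:
        raise ValueError(f"{stage.name} is the final stage")
    owner, next_stage = NEXT_STEP[stage]
    if actor != owner:
        raise PermissionError(f"{actor!r} cannot advance {stage.name}; owner is {owner!r}")
    return next_stage


if __name__ == "__main__":
    stage = Stage.PR_OPENED
    for role in ("cto", "founder", "devops"):
        stage = advance(stage, role)
    print(stage.name)  # prints "DEPLOYED"
```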

This chain typically completes in under 12 hours. The HIGH-severity SSE injection took 7 hours. MEDIUM-severity issues have typically taken 8 to 14 hours.

The speed comes from two structural advantages. First, the CSO writes the patch itself -- the finding and the fix arrive together. Second, every agent in the chain picks up work within minutes. No sprint planning, no "we will get to it next cycle."

The 34 GET Endpoint Audit

The CSO's largest single project was an audit of every GET endpoint in the platform. It scanned all 704 Python files in conductor/src and identified 34 endpoints that accepted GET requests without verifying the caller's authentication or authorization.

Most of these were not exploitable in isolation -- they returned non-sensitive data like configuration schemas or UI metadata. But unauthenticated endpoints are a risk surface that compounds. Together, they paint a picture of the system's internal structure for anyone who probes them.

The CSO fixed all 34 across 6 modules in a series of PRs over two weeks. Each PR added the appropriate authentication decorator and test cases verifying that unauthenticated requests now return 401.
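
Each of those PRs paired a decorator with a test. A framework-agnostic sketch of that pair, assuming a hypothetical Request type and a hypothetical get_org_settings handler (real web frameworks supply their own request objects and auth hooks):

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Response = Tuple[int, Any]  # (HTTP status, body)


@dataclass
class Request:
    # Hypothetical minimal request; real frameworks provide their own.
    user_id: Optional[str] = None


def require_auth(handler: Callable[[Request], Response]) -> Callable[[Request], Response]:
    """Return 401 before the handler runs if the request has no authenticated user."""
    @functools.wraps(handler)
    def wrapper(request: Request) -> Response:
        if request.user_id is None:
            return 401, {"error": "authentication required"}
        return handler(request)
    return wrapper


@require_auth
def get_org_settings(request: Request) -> Response:
    # Hypothetical handler standing in for one of the fixed GET endpoints.
    return 200, {"timezone": "UTC"}


def test_get_org_settings_requires_auth() -> None:
    # The shape of test each PR added: no identity gets 401, a valid one gets through.
    status, _ = get_org_settings(Request())
    assert status == 401
    status, _ = get_org_settings(Request(user_id="u1"))
    assert status == 200


if __name__ == "__main__":
    test_get_org_settings_requires_auth()
    print("ok")
```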

This is the kind of work that does not happen in most organizations. It is important but never urgent. It never wins sprint priority against feature work. An AI agent with no competing priorities can grind through systematic hardening without anyone asking it to.

Why an AI Security Agent Works

Security scanning is an almost ideal workload for an AI agent. Here is why:

It is repetitive and pattern-based. Checking whether a route handler has an auth decorator is the same operation performed hundreds of times. Humans get fatigued. Agents do not.

It benefits from 24/7 operation. The CSO's most serious find came at 2 AM. Vulnerabilities do not wait for business hours. Neither should the scanner.

It produces patches, not PowerPoints. Traditional security audits generate reports. Someone has to read the report, understand the finding, write the fix, test the fix, and deploy the fix. The CSO collapses that entire pipeline into one step: here is the vulnerability, here is the fix, here are the tests. Review and merge.

It does not context-switch. A human security engineer juggles scanning with incident response, compliance work, vendor assessments, and meetings. The CSO does one thing: find and fix code vulnerabilities. Its CLAUDE.md role definition scopes it tightly to this function.

The Limitations

Honesty about what the CSO cannot do is as important as what it can.

Business-logic vulnerabilities. The CSO can detect missing auth decorators but cannot assess whether a data access pattern violates a business rule or compliance requirement. That requires domain knowledge the agent does not have.

Social engineering and physical security. Phishing, pretexting, and insider threats require human judgment about human behavior. Entirely outside scope.

Infrastructure-level security. The CSO scans application code, not Kubernetes RBAC policies, network segmentation, or cloud IAM configurations.

Novel attack classes. The CSO finds known patterns. A genuinely novel technique that does not match its training data would likely go undetected.

These limitations are why the founder role still includes security oversight. The CSO handles the volume work. The founder handles judgment calls -- risk prioritization, compliance, and threat modeling that requires business context.

34 Vulnerabilities. Zero Security Hires.

GenBrain AI has been running in production for 11 months with zero security employees. The CSO agent has found and patched 34 vulnerabilities, including a HIGH-severity cross-tenant data leak that was fixed in 7 hours.

Is this a replacement for a human security team? Not entirely. But for a company at our stage -- one founder, 11 agents, operating on a $1,000/month budget -- it provides a level of continuous security scanning that would otherwise require at least one full-time security engineer.

The CSO never takes a day off, never deprioritizes security for feature work, and never writes a finding without a fix. After 11 months, that consistency is the real value.


GenBrain AI is a Cyborgenic Organization -- 1 founder, 11 AI agents, 0 employees. We are building the tools and patterns that make this possible. See it live at agent.ceo.
