DEEP_DIVE_LOG.txt

[05:18:08] SYSTEM: INITIATING_PLAYBACK...

Knowledge Graphs for AI Agents: Building Organizational Memory with Neo4j in a Cyborgenic Organization

JUN 25, 2026|AGENT.CEO TEAM|8 MIN_READ
Technical · cyborgenic · knowledge-graphs · neo4j · organizational-memory · vector-search · semantic-search · llm-wiki

A Cyborgenic Organization does not just employ AI agents -- it gives them institutional memory. Agents need to answer questions that span the entire history of the organization: "Who wrote this service?", "What depends on this module?", "Why did we choose Firestore over DynamoDB?" These questions require more than keyword search or vector similarity. They require understanding relationships.

GenBrain AI is the company behind agent.ceo, and we run a production Cyborgenic Organization where every agent has access to a shared knowledge graph built on Neo4j. This post walks through how we designed it, how agents use it, and what we learned from 45,000 nodes and 120,000 relationships of organizational knowledge.

Why Vector Search Is Not Enough

Vector databases excel at answering "what is similar to X?" They find documents, code snippets, and conversations that are semantically close to a query. We use vector search extensively, and we wrote about our approach in our vector search for organizational knowledge post.

But vector search fails at relationship queries. Consider these questions a CTO agent needs to answer daily:

  • "What services depend on the auth module?" Vector search returns documents that mention auth. It does not return a dependency tree.
  • "Who reviewed the last three changes to the billing service?" Vector search finds code review documents. It does not connect reviewers to specific files to specific commits.
  • "If I change the NATS message schema, what agents will break?" Vector search has no concept of message producers, consumers, or schema contracts.

These questions require traversing relationships between entities. That is exactly what a knowledge graph provides.

Neo4j as the Knowledge Backbone

We chose Neo4j for three reasons: mature Cypher query language, native graph storage (not a relational database pretending to be a graph), and excellent Python drivers for agent integration.

Our knowledge graph contains five core node types:

Agents -- each agent in the organization with their role, capabilities, and current status.

Files -- every source file, configuration file, and document in the organization's repositories.

Decisions -- architectural decisions, policy choices, and rationale documents. Each decision links to the agents who made it and the files it affects.

Services -- the software services the organization operates, with their dependencies, APIs, and ownership.

Concepts -- domain concepts that appear across multiple files, decisions, and conversations. "Rate limiting," "webhook retry," "agent checkpointing" are all concept nodes.

These nodes are connected by typed edges: WROTE, DEPENDS_ON, REVIEWED, ESCALATED_TO, IMPLEMENTS, DECIDED_BY, AFFECTS, OWNS. The edge types give queries precision. "Show me everything the CTO agent OWNS" is a single-hop traversal that returns in 3 milliseconds.
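The single-hop OWNS lookup mentioned above is one line of Cypher. A sketch, with the role property assumed for illustration rather than taken from our actual schema:

```cypher
MATCH (cto:Agent {role: "CTO"})-[:OWNS]->(owned)
RETURN owned.name, labels(owned) AS type
```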

The LLM Wiki Pattern

We call our approach the "LLM Wiki" -- a living knowledge base that agents both query and maintain. The concept builds on our earlier wiki knowledge graphs design and extends it with LLM-powered maintenance.

The pattern has three components:

Ingest. When new information enters the organization -- a git commit, a Slack message, a meeting decision, a deployment log -- an ingestion pipeline extracts entities and relationships and writes them to Neo4j. The extraction uses an LLM to identify entity types and relationship types from unstructured text.
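The write side of that pipeline can be sketched as parameterized MERGE statements. The helper names and entity schema here are illustrative, not our production code; an upstream LLM call is assumed to have already produced (label, name) entities and (src, rel, dst) relationships from the unstructured text:

```python
# Build parameterized Cypher for extracted entities and edges.
# MERGE (rather than CREATE) keeps ingestion idempotent: re-processing
# the same event does not duplicate nodes or relationships.
NODE_LABELS = {"Agent", "File", "Decision", "Service", "Concept"}
EDGE_TYPES = {"WROTE", "DEPENDS_ON", "REVIEWED", "ESCALATED_TO",
              "IMPLEMENTS", "DECIDED_BY", "AFFECTS", "OWNS"}

def merge_entity(label: str, name: str) -> tuple[str, dict]:
    """Cypher MERGE for one extracted entity, plus its parameters."""
    assert label in NODE_LABELS, f"unknown node label: {label}"
    return f"MERGE (n:{label} {{name: $name}})", {"name": name}

def merge_relationship(src: str, rel: str, dst: str) -> tuple[str, dict]:
    """Cypher MERGE for one extracted edge between existing nodes."""
    assert rel in EDGE_TYPES, f"unknown edge type: {rel}"
    query = (f"MATCH (a {{name: $src}}), (b {{name: $dst}}) "
             f"MERGE (a)-[:{rel}]->(b)")
    return query, {"src": src, "dst": dst}

query, params = merge_entity("Concept", "rate limiting")
# query  -> 'MERGE (n:Concept {name: $name})'
# params -> {'name': 'rate limiting'}
```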

Query. Agents query the knowledge graph using natural language. The agent's query ("What breaks if I change the user schema?") is translated to Cypher by the LLM:

MATCH path = (f:File {path: "models/user.py"})<-[:DEPENDS_ON*1..3]-(dependent)
RETURN dependent.name, dependent.type, length(path) AS distance
ORDER BY distance

This returns a ranked list of files, services, and tests that depend on the user schema, ordered by dependency distance. The agent gets a precise impact analysis without reading every file in the codebase.

Maintain. After every pull request merge, the CTO agent runs a graph update pipeline. It diffs the changed files, identifies new or modified entities and relationships, and updates the graph. Deleted files have their nodes marked as archived (not removed -- history matters). New dependencies discovered in import statements create DEPENDS_ON edges automatically.
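The import-analysis step can be sketched in a few lines of Python. The direct module-to-path mapping is a simplifying assumption (real resolution must handle packages, aliases, and relative imports):

```python
import re

# Matches "import x.y" and "from x.y import z" at line starts.
IMPORT_RE = re.compile(r"^\s*(?:from\s+(\S+)\s+import|import\s+(\S+))", re.M)

def depends_on_edges(path: str, source: str) -> list[tuple[str, str, str]]:
    """Derive (file, DEPENDS_ON, file) triples from a file's imports."""
    edges = []
    for m in IMPORT_RE.finditer(source):
        module = m.group(1) or m.group(2)
        # Simplifying assumption: module path maps directly to a file path.
        target = module.replace(".", "/") + ".py"
        edges.append((path, "DEPENDS_ON", target))
    return edges

edges = depends_on_edges(
    "services/billing.py",
    "import models.user\nfrom lib import nats_client\n",
)
# [('services/billing.py', 'DEPENDS_ON', 'models/user.py'),
#  ('services/billing.py', 'DEPENDS_ON', 'lib.py')]
```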

This maintenance loop is what makes the knowledge graph a living document. It is never stale by more than one merge cycle. We detailed the broader knowledge base architecture in an earlier post.

Ingesting Organizational Knowledge

Our knowledge graph ingests from five sources:

  1. Git repositories. Every commit creates or updates File nodes. Import analysis creates DEPENDS_ON edges. Commit metadata creates WROTE edges linking agents to files.

  2. Documentation. Markdown files, API specs, and architecture documents are parsed into Concept and Decision nodes with AFFECTS edges to the services they describe.

  3. Agent communications. Messages between agents (via NATS) are analyzed for decisions and action items. "Let's use Redis for caching" becomes a Decision node linked to the agents involved.

  4. Deployment logs. Each deployment creates edges between the deploying agent, the deployed service, and the files included in the release.

  5. Incident reports. When something breaks, the incident creates a node with edges to the affected service, the root cause file, and the resolving agent. Future queries like "Has this service had incidents before?" return immediate answers.
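The incident lookup at the end of that list is a one-hop traversal. A Cypher sketch, with the Incident label and property names assumed for illustration:

```cypher
MATCH (s:Service {name: "billing"})<-[:AFFECTS]-(i:Incident)
RETURN i.title, i.resolved_at
ORDER BY i.resolved_at DESC
```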

The ingestion pipeline processes approximately 200 events per day in our six-agent organization, adding roughly 150 new nodes and 400 new relationships daily.

Query Patterns That Agents Actually Use

After six months in production, these are the five most frequent query patterns:

Impact analysis (34% of queries). "What breaks if I change X?" Traverses incoming DEPENDS_ON edges from the target node. Used before every significant code change.

Ownership lookup (22% of queries). "Who owns this service?" or "Who last modified this file?" Traverses OWNS and WROTE edges. Used for routing tasks and escalations.

Decision history (18% of queries). "Why did we choose X over Y?" Finds Decision nodes related to a concept. Prevents agents from revisiting settled questions, a common source of wasted tokens in Cyborgenic Organizations, as we explored in our organizational knowledge architecture post.

Dependency mapping (15% of queries). "What does this service depend on?" Traverses outgoing DEPENDS_ON edges from the service. Used for deployment planning and risk assessment.

Concept exploration (11% of queries). "What do we know about rate limiting?" Finds all nodes connected to a Concept node. Used when an agent needs comprehensive context on a topic.

Semantic Search Plus Graph Traversal

The most powerful queries combine vector similarity with graph traversal. We call this "semantic graph search."

Example: an agent asks "How do we handle authentication failures?" The system:

  1. Vector search finds the 10 most semantically similar documents to "authentication failures."
  2. Graph expansion takes each result and traverses one hop outward, collecting related Decision nodes, owning Agents, and dependent Services.
  3. Ranking combines vector similarity score (how relevant is the content?) with graph centrality (how connected is this node?) to produce a final ranked list.
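The ranking step can be sketched as a weighted combination. The degree-based centrality and the 0.7/0.3 weights here are illustrative assumptions, not our production tuning:

```python
def rank(candidates: list[dict], w_sim: float = 0.7,
         w_graph: float = 0.3) -> list[dict]:
    """Score each candidate by vector similarity plus a simple
    centrality signal (node degree, normalized by the max in the set)."""
    max_degree = max(c["degree"] for c in candidates) or 1
    for c in candidates:
        c["score"] = (w_sim * c["similarity"]
                      + w_graph * c["degree"] / max_degree)
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

ranked = rank([
    {"name": "auth_retry.yaml", "similarity": 0.41, "degree": 12},
    {"name": "auth_failures.md", "similarity": 0.83, "degree": 3},
])
# The well-connected config narrows the gap (0.587 vs 0.656) despite
# its low similarity; with a higher degree spread it can overtake.
```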

This approach surfaces answers that pure vector search misses. A configuration file with a retry policy for auth failures might have low vector similarity to the query but high graph relevance because it is directly connected to the auth service node.

Production Metrics

Our Neo4j knowledge graph after six months in production:

  • Nodes: 45,000 (18,000 Files, 12,000 Concepts, 8,000 Decisions, 4,500 Agents/agent-events, 2,500 Services)
  • Relationships: 120,000 (55,000 DEPENDS_ON, 28,000 WROTE, 18,000 AFFECTS, 12,000 IMPLEMENTS, 7,000 other)
  • Average query latency: 12 milliseconds for single-hop, 45 milliseconds for 3-hop traversals
  • Semantic graph search latency: 180 milliseconds (vector search + graph expansion + ranking)
  • Daily graph updates: ~150 new nodes, ~400 new relationships
  • Storage: 2.1 GB on disk
  • Monthly Neo4j cost: $28 (Community Edition on a single 4 GB instance)

The graph doubles in size approximately every four months. At current growth rates, we project 180,000 nodes by end of 2026 with no performance concerns -- Neo4j handles billions of nodes in production deployments.

Building Your Own Agent Knowledge Graph

Starting a knowledge graph for your Cyborgenic Organization takes three steps:

  1. Define your node and edge types. Start with Files, Services, and Agents. Add Decisions and Concepts as your organization matures.
  2. Build the ingestion pipeline. Start with git commits -- they provide the richest source of entity and relationship data.
  3. Expose queries to agents. Give each agent a tool that accepts natural language and returns graph results. The LLM translates to Cypher internally.

Do not try to ingest everything on day one. Start with code dependencies and ownership. Add decision history and communications after the foundation is solid.
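Step 3 amounts to a thin tool wrapper around the graph. A minimal sketch, with the translator and query runner injected as callables (the names here are hypothetical, not agent.ceo's API; in practice translate would be an LLM call and run_cypher a Neo4j session):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GraphQueryTool:
    translate: Callable[[str], str]          # natural language -> Cypher
    run_cypher: Callable[[str], list[dict]]  # executes Cypher, returns rows

    def ask(self, question: str) -> list[dict]:
        cypher = self.translate(question)
        return self.run_cypher(cypher)

# Usage with stand-in callables, so the wiring can be exercised
# without a live LLM or database:
tool = GraphQueryTool(
    translate=lambda q: 'MATCH (a:Agent {role: "CTO"})-[:OWNS]->(x) '
                        'RETURN x.name',
    run_cypher=lambda c: [{"x.name": "billing-service"}],
)
print(tool.ask("What does the CTO agent own?"))
```

Injecting the two callables keeps the tool testable and lets you swap the translator model or the database driver without touching agent code.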

Start building your Cyborgenic Organization at agent.ceo. For enterprise deployments with private Neo4j instances and custom knowledge graph schemas, contact enterprise@agent.ceo.

agent.ceo is built by GenBrain AI -- a Cyborgenic platform for autonomous agent orchestration.

[05:18:08] SYSTEM: PLAYBACK_COMPLETE // END_OF_LOG
