Build an AI Agent Knowledge Base with Wiki MCP Tools
TL;DR
- 26 MCP tools let your agents build, search, and maintain a Neo4j-backed knowledge base with semantic search.
- Git ingestion pulls your repos into the graph automatically; typed relations connect services, runbooks, and decisions.
- Freshness scoring surfaces stale docs before they mislead your agents.
Your AI agent can write code, draft emails, and summarize documents. Ask it what your team decided last Thursday, how your authentication service works, or where the deployment runbook lives -- and it draws a blank.
In a Cyborgenic Organization, shared memory is the foundation that makes everything else work. Without a structured, searchable knowledge base that every agent can read and write to, each agent operates in isolation -- repeating mistakes, rediscovering context, and failing to build on what the organization already knows.
I run 11 agents in production. When I first deployed them, every one of them was brilliant and amnesiac. The CTO agent would make the same architecture recommendation twice because it had no memory of the first time. The DevOps agent would re-discover the same deployment gotcha every sprint. Expensive autocomplete.
In a cyborgenic organization, where AI agents share responsibilities with humans, that gap is fatal. Every agent needs access to the same institutional knowledge that makes a seasoned employee effective on day one.
So I built 26 MCP tools to close this gap. They let your agents build, maintain, and search a knowledge base backed by a Neo4j graph -- pulling in your GitHub repos, linking concepts with typed relations, detecting stale docs, and running semantic search across everything. Across our 9,799 git commits and 83,163 test functions, these tools are some of the most heavily exercised code in the codebase. Here is how to set it up.
The Toolkit
The Wiki MCP tools break into four groups:
Core CRUD — wiki_update_page, wiki_delete_page, wiki_search_pages for creating and managing knowledge pages.
Relations — wiki_create_relation, wiki_delete_relation for building typed, directed links between pages. Think "ServiceA DEPENDS_ON ServiceB" or "Runbook DOCUMENTS Deployment."
Git Ingestion — wiki_ingest_repo, wiki_refresh_repo, wiki_list_repos for pulling your codebase documentation straight into the graph.
Freshness — wiki_staleness_check for finding pages that have drifted out of date, with decay scoring so you know what to fix first.
Everything is org-scoped and multi-tenant. Your agents only see your organization's knowledge. Let's build something.
Step 1: Ingest Your Repos
The fastest way to seed a knowledge base is to point it at your existing GitHub repos. The wiki_ingest_repo tool clones a repo (shallow clone, so it's fast), filters files by pattern, and creates page nodes in Neo4j with embeddings for semantic search.
{
  "tool": "wiki_ingest_repo",
  "params": {
    "repoUrl": "https://github.com/your-org/backend-api",
    "branch": "main",
    "filePatterns": ["*.md", "*.rst", "docs/**/*.txt"],
    "space": "engineering"
  }
}
This does several things under the hood:
- Shallow clones the repo (no full history, no bloat)
- Filters files matching your patterns — grab just the markdown, just the docs folder, whatever you need
- Sanitizes file paths and creates a Page node for each file
- Generates embeddings for semantic search
- Creates a Repository node tracking the URL, branch, SHA, and ingestion timestamp
- Links every page back to its repo via a FROM_REPO relationship
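The filtering and path-sanitizing steps above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the helper names and sanitizing rules are hypothetical, and it assumes fnmatch-style matching, where `*` also crosses directory separators (a real glob engine may treat `**` differently).

```python
import re
from fnmatch import fnmatch
from pathlib import PurePosixPath


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the repo-relative path matches at least one pattern.

    Assumption: fnmatch semantics, where `*` also matches `/`.
    """
    return any(fnmatch(path, pattern) for pattern in patterns)


def sanitize_page_path(path: str) -> str:
    """Normalize a repo-relative path into a safe page path (hypothetical rules)."""
    # Drop root and parent-directory components so a path cannot escape the repo.
    parts = [p for p in PurePosixPath(path).parts if p not in ("/", ".", "..")]
    # Replace anything outside a conservative character set with "_".
    return re.sub(r"[^A-Za-z0-9._/-]", "_", "/".join(parts))


def select_pages(paths: list[str], patterns: list[str]) -> list[str]:
    """Keep only matching files and return their sanitized page paths."""
    return [sanitize_page_path(p) for p in paths if matches_any(p, patterns)]
```

Each path returned by `select_pages` would then become one Page node, with embedding generation and the FROM_REPO link happening afterwards.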
Got private repos? It pulls credentials from a Kubernetes secret, so your tokens never touch the tool call.
Ingest a few repos to start:
// Your API docs
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/backend-api", "branch": "main", "filePatterns": ["docs/**/*.md"], "space": "engineering" } }
// Your runbooks
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/runbooks", "branch": "main", "filePatterns": ["*.md"], "space": "operations" } }
// Your product specs
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/product-specs", "branch": "main", "filePatterns": ["**/*.md"], "space": "product" } }
Check what you've ingested with wiki_list_repos:
{
  "tool": "wiki_list_repos",
  "params": { "space": "engineering" }
}
This returns every ingested repo with its branch, latest SHA, and last ingestion time — useful for auditing what's in the graph.
Step 2: Create and Link Knowledge Pages
Ingested docs are a good start. But the real power of a graph-backed knowledge base is relations. Your agents can create typed, directed links between any two pages.
Say you have a page about your Auth Service and another about your User API. Connect them:
{
  "tool": "wiki_create_relation",
  "params": {
    "fromPageId": "page-user-api",
    "toPageId": "page-auth-service",
    "relationType": "DEPENDS_ON",
    "metadata": { "note": "User API calls Auth Service for token validation" }
  }
}
Relations are typed and directed. Some patterns we use constantly:
- DEPENDS_ON: service A requires service B
- DOCUMENTS: a runbook documents a deployment process
- SUPERSEDES: a new architecture doc replaces an old one
- RELATED_TO: loose conceptual link
- OWNED_BY: a service is owned by a team
Under the hood, here is the real Cypher query that creates those relations. This is from conductor/src/mcp_servers/kb_tools.py -- the actual production code our agents run:
# From conductor/src/mcp_servers/kb_tools.py (real production code, excerpted)
from datetime import datetime, timezone
from typing import Optional

_VALID_RELATION_TYPES = {
    "LINKS_TO", "RELATED_TO", "CONTRADICTS", "DERIVED_FROM",
    "DEPENDS_ON", "SUPERSEDES", "PART_OF", "MENTIONS",
}

async def _kb_create_relation(
    from_path: str, to_path: str, relation_type: str, metadata: Optional[dict] = None,
) -> dict:
    """Create a typed directed relation between two pages."""
    relation_type = relation_type.upper()
    # The type is interpolated into the query text below, so it must pass
    # this allowlist check first.
    if relation_type not in _VALID_RELATION_TYPES:
        return {"error": f"Invalid relation_type: {relation_type}"}
    # `now` is not defined in the excerpt; a UTC ISO-8601 timestamp is assumed here.
    now = datetime.now(timezone.utc).isoformat()
    cypher = f"""
        MATCH (a:Page {{path: $from_path}}), (b:Page {{path: $to_path}})
        MERGE (a)-[r:{relation_type}]->(b)
        SET r.created_at = datetime($now)
        RETURN type(r) AS relation_type
    """
    # _run_write is the module's Neo4j write helper, defined elsewhere in kb_tools.py.
    await _run_write(cypher, {"from_path": from_path, "to_path": to_path, "now": now})
    return {"status": "created", "from": from_path, "to": to_path, "relation_type": relation_type}
Notice the MERGE instead of CREATE -- idempotent by design. An agent can create the same relation ten times and the graph stays clean. I learned this the hard way after a CTO agent created 47 duplicate DEPENDS_ON edges during one particularly enthusiastic code review.
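To see why that matters, here is a toy in-memory model of the two behaviors. It is only an illustration of create-versus-merge semantics in plain Python; the function names and the edge key are hypothetical, and nothing here talks to Neo4j.

```python
from datetime import datetime, timezone


def create_edge(edges: list, from_path: str, relation_type: str, to_path: str) -> None:
    """CREATE-like behavior: every call appends a new edge, duplicates included."""
    edges.append((from_path, relation_type, to_path))


def merge_edge(edges: dict, from_path: str, relation_type: str, to_path: str) -> bool:
    """MERGE-like behavior: one edge per (from, type, to) key.

    Returns True only when the edge did not exist before. The timestamp is
    refreshed on every call, mirroring the unconditional SET in the query.
    """
    key = (from_path, relation_type, to_path)
    created = key not in edges
    edges.setdefault(key, {})["created_at"] = datetime.now(timezone.utc).isoformat()
    return created


created_edges: list = []
merged_edges: dict = {}
for _ in range(47):
    create_edge(created_edges, "page-user-api", "DEPENDS_ON", "page-auth-service")
    merge_edge(merged_edges, "page-user-api", "DEPENDS_ON", "page-auth-service")
```

After the loop, the CREATE-style list holds 47 copies of the same edge while the MERGE-style store holds exactly one.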
You can also create pages that do not come from git. Use wiki_update_page to write a page directly -- useful for meeting notes, architecture decisions, or tribal knowledge that lives nowhere else:
{
  "tool": "wiki_update_page",
  "params": {
    "pageId": "page-auth-migration-plan",
    "title": "Auth Service Migration to OAuth2.1",
    "content": "We're migrating from our custom token system to OAuth 2.1 by Q3...",
    "tags": ["auth", "migration", "oauth"],
    "type": "decision",
    "space": "engineering"
  }
}
Then link it to the services it affects:
{
  "tool": "wiki_create_relation",
  "params": {
    "fromPageId": "page-auth-migration-plan",
    "toPageId": "page-auth-service",
    "relationType": "DOCUMENTS"
  }
}
Now when an agent encounters an auth issue, it can traverse the graph: find the service, find the migration plan, understand the context. No more asking humans "wait, are we still using the old token system?"
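The traversal itself is an ordinary breadth-first walk over typed edges. The sketch below runs against a small hypothetical edge list rather than Neo4j, and `page-deploy-runbook` is an invented page ID, but it shows the shape of the lookup: start at a service page and collect everything within a couple of hops, following relations in either direction.

```python
from collections import deque

# Hypothetical edges in the style of the examples above: (from, relation type, to).
EDGES = [
    ("page-user-api", "DEPENDS_ON", "page-auth-service"),
    ("page-auth-migration-plan", "DOCUMENTS", "page-auth-service"),
    ("page-deploy-runbook", "DOCUMENTS", "page-user-api"),
]


def related_pages(start: str, edges: list, max_hops: int = 2) -> dict:
    """Return {page_id: hop_distance} for pages within max_hops of start."""
    # Build an undirected adjacency map so context is found in both directions.
    neighbors: dict = {}
    for src, _rel, dst in edges:
        neighbors.setdefault(src, []).append(dst)
        neighbors.setdefault(dst, []).append(src)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if distance[page] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in neighbors.get(page, []):
            if other not in distance:
                distance[other] = distance[page] + 1
                queue.append(other)

    del distance[start]
    return distance
```

Calling `related_pages("page-auth-service", EDGES)` reaches the migration plan and the User API in one hop and the runbook in two. In the real graph the equivalent would be a variable-length Cypher path match.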
Step 3: Keep It Fresh
Documentation rots. I know because I have watched agents confidently execute outdated runbooks at 2 AM. The wiki_staleness_check tool uses a decay scoring algorithm to surface pages that are likely outdated. Here is the real Cypher query under the hood:
# From conductor/src/mcp_servers/kb_tools.py (real production staleness query)
cypher = """
    MATCH (p:Page)
    WHERE p.updated_at IS NOT NULL
      AND p.updated_at < datetime() - duration({days: $max_age_days})
    // duration.inDays returns the whole gap in days; duration.between splits it
    // into months and days, so its .days component alone undercounts.
    WITH p, duration.inDays(p.updated_at, datetime()).days AS days_stale
    RETURN p.path AS path, p.title AS title, p.type AS type,
           toString(p.updated_at) AS updated_at,
           days_stale,
           CASE
             WHEN days_stale > 180 THEN 'critical'
             WHEN days_stale > 90 THEN 'high'
             WHEN days_stale > 60 THEN 'medium'
             ELSE 'low'
           END AS staleness_level
    ORDER BY days_stale DESC
    LIMIT $limit
"""
The CASE statement gives you severity tiers -- critical for anything over 180 days, high for 90+, medium for 60+. When your fleet runs 646 commits in a single month, documentation drifts fast. This query catches it before agents act on stale information.
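If an agent needs the same tiers outside the database, for example to label results it has already fetched, the CASE expression translates directly to Python. The function name is hypothetical; the thresholds are the ones in the query.

```python
def staleness_level(days_stale: int) -> str:
    """Mirror the CASE tiers from the staleness query (strictly-greater-than bounds)."""
    if days_stale > 180:
        return "critical"
    if days_stale > 90:
        return "high"
    if days_stale > 60:
        return "medium"
    return "low"
```

Note that the bounds are exclusive: a page exactly 180 days old is still "high", and "low" only appears in results when the query's `max_age_days` is set below 60.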
Call it through the MCP tool:
{
  "tool": "wiki_staleness_check",
  "params": {
    "space": "engineering",
    "threshold": 0.6
  }
}
This returns pages ranked by staleness score. A page that was updated yesterday scores low. A page untouched for six months that documents a service with recent commits scores high.
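The scoring formula is not shown here, so the following is only one plausible shape for a 0-to-1 decay score, offered as an assumption rather than the tool's actual algorithm: a half-life curve where a page reaches 0.5 after `half_life_days` without an update. The function names and the 90-day default are hypothetical, and this sketch ignores commit activity on the documented service.

```python
def staleness_score(days_since_update: float, half_life_days: float = 90.0) -> float:
    """Hypothetical decay score in [0, 1): 0 when just updated, 0.5 at one half-life."""
    if days_since_update <= 0:
        return 0.0
    return 1.0 - 0.5 ** (days_since_update / half_life_days)


def flag_stale(page_ages: dict, threshold: float = 0.6) -> list:
    """Return page paths scoring at or above the threshold, stalest first.

    page_ages maps page path -> days since last update.
    """
    scores = {path: staleness_score(days) for path, days in page_ages.items()}
    flagged = [path for path, score in scores.items() if score >= threshold]
    return sorted(flagged, key=lambda path: scores[path], reverse=True)
```

With a 90-day half-life, a page untouched for 180 days scores 0.75 and clears a 0.6 threshold, while one updated yesterday scores close to zero.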
For ingested repos, use wiki_refresh_repo to re-sync:
{
  "tool": "wiki_refresh_repo",
  "params": {
    "repoUrl": "https://github.com/your-org/backend-api",
    "space": "engineering"
  }
}
This pulls the latest SHA, diffs against what is in the graph, and updates changed pages. It also fires a NATS event through our event emitter, so other agents in the fleet can react to knowledge updates. Here is the real event code:
# From services/wiki-graph-builder/nats_emitter.py (real production code, excerpted)
from datetime import datetime, timezone

async def emit_graph_changed(
    pages_affected: list[str],
    nodes_created: int = 0,
    nodes_updated: int = 0,
    edges_created: int = 0,
) -> bool:
    """Emit a wiki.graph.changed event after graph nodes/edges are modified."""
    # emit() is the module's NATS publish helper, defined elsewhere in this file.
    return await emit("wiki.graph.changed", {
        "pages_affected": pages_affected,
        "nodes_created": nodes_created,
        "nodes_updated": nodes_updated,
        "edges_created": edges_created,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
When the graph changes, the event hits NATS JetStream on port 4222, and any agent subscribed to wiki.graph.changed gets notified. Our QA agent uses this to re-validate runbooks when they change. The DevOps agent watches for infrastructure doc updates.
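On the receiving side, a subscriber might look roughly like this. It is a sketch under several assumptions: it uses the nats-py client with a plain core subscription (a durable JetStream consumer would be set up differently), it assumes the event body is the JSON-encoded dict shown above, and the "runbook" path check is a hypothetical stand-in for however the QA agent actually picks pages to re-validate.

```python
import asyncio
import json


def runbooks_to_revalidate(raw: bytes) -> list:
    """Pick runbook pages out of a wiki.graph.changed payload (assumed JSON)."""
    event = json.loads(raw)
    return [p for p in event.get("pages_affected", []) if "runbook" in p.lower()]


async def listen(url: str = "nats://localhost:4222") -> None:
    """Subscribe to graph-change events and report runbooks needing a re-check."""
    import nats  # nats-py; imported here so the helper above works without it

    nc = await nats.connect(url)

    async def on_message(msg) -> None:
        for path in runbooks_to_revalidate(msg.data):
            print(f"re-validate {path}")

    await nc.subscribe("wiki.graph.changed", cb=on_message)
    await asyncio.Event().wait()  # block forever so the subscription stays alive
```

Run it with `asyncio.run(listen())` against a reachable NATS server. Keeping the payload parsing in its own function makes the agent's reaction logic testable without a broker.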
Set up a scheduled agent loop to run staleness checks weekly. Flag anything above 0.7 for human review. Let agents auto-refresh repos daily. Your knowledge base maintains itself.
Step 4: Search It All
With your repos ingested, pages linked, and freshness maintained, your agents can search across everything:
{
  "tool": "wiki_search_pages",
  "params": {
    "query": "how does authentication work",
    "space": "engineering",
    "type": "documentation",
    "tags": ["auth"]
  }
}
Search combines structured filters (type, tags, space) with semantic search via embeddings. An agent asking "how does authentication work" will find your Auth Service docs, the OAuth migration plan, related runbooks, and any linked decision records — even if none of them use the exact word "authentication."
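Conceptually, that is a filter-then-rank pipeline: narrow the candidates by metadata, then order what remains by embedding similarity. The sketch below shows the idea on toy data. The three-dimensional vectors, page records, and function names are all made up for illustration; the real tool uses model-generated embeddings stored in the graph.

```python
import math

# Toy pages with hand-written 3-d "embeddings" (hypothetical data).
PAGES = [
    {"path": "auth-service.md", "space": "engineering", "type": "documentation",
     "tags": ["auth"], "embedding": [0.9, 0.1, 0.0]},
    {"path": "oauth-migration.md", "space": "engineering", "type": "decision",
     "tags": ["auth", "oauth"], "embedding": [0.8, 0.3, 0.1]},
    {"path": "deploy-runbook.md", "space": "operations", "type": "runbook",
     "tags": ["deploy"], "embedding": [0.1, 0.9, 0.2]},
]


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(pages, query_vec, space=None, page_type=None, tags=None, top_k=5):
    """Apply structured filters first, then rank survivors by similarity."""
    def keep(page) -> bool:
        if space is not None and page["space"] != space:
            return False
        if page_type is not None and page["type"] != page_type:
            return False
        if tags and not set(tags) & set(page["tags"]):
            return False
        return True

    candidates = [p for p in pages if keep(p)]
    ranked = sorted(candidates, key=lambda p: cosine(p["embedding"], query_vec), reverse=True)
    return [p["path"] for p in ranked[:top_k]]
```

Filtering before ranking keeps the similarity computation small and guarantees that a semantically close page from the wrong space never leaks into the results.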
You can also filter by type to narrow results:
{
  "tool": "wiki_search_pages",
  "params": {
    "query": "deployment process",
    "type": "runbook",
    "space": "operations"
  }
}
Tips and Gotchas
Start with file patterns. Don't ingest entire repos. Start with docs/**/*.md and expand. You'll get cleaner results and faster ingestion.
Use consistent relation types. Pick five or six relation types and stick with them. DEPENDS_ON, DOCUMENTS, OWNED_BY, RELATED_TO, SUPERSEDES cover most cases. Agents that create relations ad hoc will make the graph noisy.
Scope your spaces. Multi-tenancy is built in. Use spaces to separate engineering, product, operations, and support knowledge. Agents searching for deployment info don't need to wade through product specs.
Clean up dead relations. When you delete a page with wiki_delete_page, relations are cleaned up automatically. But if you reorganize knowledge, use wiki_delete_relation to remove links that no longer make sense.
Refresh on a schedule. Stale ingested repos are worse than no repos. Set wiki_refresh_repo to run daily for active repos.
Let agents write back. The most powerful pattern is agents that learn and write back to the knowledge base. An agent that resolves an incident can create a page documenting the fix, link it to the affected service, and tag it for the on-call team. Next time, a different agent finds it instantly.
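As a rough sketch of that write-back pattern, an agent could assemble the two tool calls like this. The helper name, the page ID scheme, the "incident" page type, and the tag values are all hypothetical; the parameter names follow the wiki_update_page and wiki_create_relation examples earlier in this post.

```python
def incident_writeback(
    incident_id: str,
    title: str,
    fix_summary: str,
    service_page_id: str,
    space: str = "engineering",
) -> list:
    """Build the tool calls that record an incident fix and link it to a service."""
    page_id = f"page-incident-{incident_id}"  # hypothetical ID scheme
    return [
        {
            "tool": "wiki_update_page",
            "params": {
                "pageId": page_id,
                "title": title,
                "content": fix_summary,
                "tags": ["incident", "on-call"],  # illustrative tags
                "type": "incident",  # assumed page type
                "space": space,
            },
        },
        {
            "tool": "wiki_create_relation",
            "params": {
                "fromPageId": page_id,
                "toPageId": service_page_id,
                "relationType": "RELATED_TO",
            },
        },
    ]
```

Ordering matters: the page has to exist before the relation is created, so the agent should send the calls in the order returned.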
Build Agents That Actually Remember
Twenty-six tools. A Neo4j graph. Embeddings for semantic search. Git ingestion so your codebase documentation is always in the graph. Typed relations with 8 validated types (LINKS_TO, RELATED_TO, CONTRADICTS, DERIVED_FROM, DEPENDS_ON, SUPERSEDES, PART_OF, MENTIONS) so agents understand how things connect. Freshness scoring so nothing goes stale quietly.
This is how you go from "AI agent that sounds smart" to "AI agent that knows your organization."
I am running this in production with 83,163 test functions across 2,304 test files, 11 agents that cross-reference code, docs, wiki pages, and meeting transcripts in a single query, and 9,799 commits of battle-tested infrastructure behind it.
I'm Moshe Beeri. I build agent.ceo -- a cyborgenic organization where AI agents and humans ship software together. 9,799 commits and counting.