Build an AI Agent Knowledge Base with Wiki MCP Tools
TL;DR
- 19 MCP tools let your agents build, search, and maintain a Neo4j-backed knowledge base with semantic search.
- Git ingestion pulls your repos into the graph automatically; typed relations connect services, runbooks, and decisions.
- Freshness scoring surfaces stale docs before they mislead your agents.
Your AI agent can write code, draft emails, and summarize documents. Ask it what your team decided last Thursday, how your authentication service works, or where the deployment runbook lives -- and it draws a blank.
Agents without organizational memory are expensive autocomplete. In a cyborgenic organization, where AI agents share responsibilities with humans, that gap is fatal. Every agent needs access to the same institutional knowledge that makes a seasoned employee effective on day one.
We built 19 MCP tools to close this gap. They let your agents build, maintain, and search a knowledge base backed by a Neo4j graph -- pulling in your GitHub repos, linking concepts with typed relations, detecting stale docs, and running semantic search across everything. Here is how to set it up.
The Toolkit
The Wiki MCP tools break into four groups:
- Core CRUD — wiki_update_page, wiki_delete_page, wiki_search_pages for creating and managing knowledge pages.
- Relations — wiki_create_relation, wiki_delete_relation for building typed, directed links between pages. Think "ServiceA DEPENDS_ON ServiceB" or "Runbook DOCUMENTS Deployment."
- Git Ingestion — wiki_ingest_repo, wiki_refresh_repo, wiki_list_repos for pulling your codebase documentation straight into the graph.
- Freshness — wiki_staleness_check for finding pages that have drifted out of date, with decay scoring so you know what to fix first.
Everything is org-scoped and multi-tenant. Your agents only see your organization's knowledge. Let's build something.
Step 1: Ingest Your Repos
The fastest way to seed a knowledge base is to point it at your existing GitHub repos. The wiki_ingest_repo tool clones a repo (shallow clone, so it's fast), filters files by pattern, and creates page nodes in Neo4j with embeddings for semantic search.
{
  "tool": "wiki_ingest_repo",
  "params": {
    "repoUrl": "https://github.com/your-org/backend-api",
    "branch": "main",
    "filePatterns": ["*.md", "*.rst", "docs/**/*.txt"],
    "space": "engineering"
  }
}
This does several things under the hood:
- Shallow clones the repo (no full history, no bloat)
- Filters files matching your patterns — grab just the markdown, just the docs folder, whatever you need
- Sanitizes file paths and creates a Page node for each file
- Generates embeddings for semantic search
- Creates a Repository node tracking the URL, branch, SHA, and ingestion timestamp
- Links every page back to its repo via a FROM_REPO relationship
Got private repos? It pulls credentials from a Kubernetes secret, so your tokens never touch the tool call.
Ingest a few repos to start:
// Your API docs
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/backend-api", "branch": "main", "filePatterns": ["docs/**/*.md"], "space": "engineering" } }
// Your runbooks
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/runbooks", "branch": "main", "filePatterns": ["*.md"], "space": "operations" } }
// Your product specs
{ "tool": "wiki_ingest_repo", "params": { "repoUrl": "https://github.com/your-org/product-specs", "branch": "main", "filePatterns": ["**/*.md"], "space": "product" } }
Check what you've ingested with wiki_list_repos:
{
  "tool": "wiki_list_repos",
  "params": { "space": "engineering" }
}
This returns every ingested repo with its branch, latest SHA, and last ingestion time — useful for auditing what's in the graph.
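The exact response shape depends on the server version, but expect something like this per repo (the field names here are illustrative):

{
  "repos": [
    {
      "repoUrl": "https://github.com/your-org/backend-api",
      "branch": "main",
      "sha": "a1b2c3d",
      "lastIngestedAt": "2025-01-14T09:30:00Z"
    }
  ]
}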
Step 2: Create and Link Knowledge Pages
Ingested docs are a good start. But the real power of a graph-backed knowledge base is relations. Your agents can create typed, directed links between any two pages.
Say you have a page about your Auth Service and another about your User API. Connect them:
{
  "tool": "wiki_create_relation",
  "params": {
    "fromPageId": "page-user-api",
    "toPageId": "page-auth-service",
    "relationType": "DEPENDS_ON",
    "metadata": { "note": "User API calls Auth Service for token validation" }
  }
}
Relations are typed and directed. Some patterns we use constantly:
- DEPENDS_ON — service A requires service B
- DOCUMENTS — a runbook documents a deployment process
- SUPERSEDES — a new architecture doc replaces an old one
- RELATED_TO — loose conceptual link
- OWNED_BY — a service is owned by a team
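For example, when a new architecture doc replaces an old one, make the succession explicit with SUPERSEDES (the page IDs here are hypothetical):

{
  "tool": "wiki_create_relation",
  "params": {
    "fromPageId": "page-architecture-v2",
    "toPageId": "page-architecture-v1",
    "relationType": "SUPERSEDES"
  }
}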
You can also create pages that don't come from git. Use wiki_update_page to write a page directly — useful for meeting notes, architecture decisions, or tribal knowledge that lives nowhere else:
{
  "tool": "wiki_update_page",
  "params": {
    "pageId": "page-auth-migration-plan",
    "title": "Auth Service Migration to OAuth 2.1",
    "content": "We're migrating from our custom token system to OAuth 2.1 by Q3...",
    "tags": ["auth", "migration", "oauth"],
    "type": "decision",
    "space": "engineering"
  }
}
Then link it to the services it affects:
{
  "tool": "wiki_create_relation",
  "params": {
    "fromPageId": "page-auth-migration-plan",
    "toPageId": "page-auth-service",
    "relationType": "DOCUMENTS"
  }
}
Now when an agent encounters an auth issue, it can traverse the graph: find the service, find the migration plan, understand the context. No more asking humans "wait, are we still using the old token system?"
Step 3: Keep It Fresh
Documentation rots. The wiki_staleness_check tool uses a decay scoring algorithm to surface pages that are likely outdated:
{
  "tool": "wiki_staleness_check",
  "params": {
    "space": "engineering",
    "threshold": 0.6
  }
}
This returns pages ranked by staleness score. A page that was updated yesterday scores low. A page untouched for six months that documents a service with recent commits scores high.
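What comes back is a ranked list. The field names below are illustrative, not a documented contract:

{
  "stalePages": [
    { "pageId": "page-auth-service", "stalenessScore": 0.82, "lastUpdated": "2024-07-02" },
    { "pageId": "page-user-api", "stalenessScore": 0.71, "lastUpdated": "2024-10-19" }
  ]
}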
For ingested repos, use wiki_refresh_repo to re-sync:
{
  "tool": "wiki_refresh_repo",
  "params": {
    "repoUrl": "https://github.com/your-org/backend-api",
    "space": "engineering"
  }
}
This pulls the latest SHA, diffs against what's in the graph, and updates changed pages. It also fires a NATS event, so other agents in your fleet can react to knowledge updates — for example, a QA agent that re-validates runbooks when they change.
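The exact subject and payload are deployment-specific, but a subscriber might receive something like this (both the subject name and the fields are assumptions, not the documented schema):

// Hypothetical NATS subject: wiki.repo.refreshed
{
  "repoUrl": "https://github.com/your-org/backend-api",
  "space": "engineering",
  "changedPages": ["page-deploy-guide"],
  "sha": "d4e5f6a"
}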
Set up a scheduled agent loop to run staleness checks weekly. Flag anything above 0.7 for human review. Let agents auto-refresh repos daily. Your knowledge base maintains itself.
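In practice that maintenance loop is just two recurring tool calls; the scheduling itself lives in your agent framework or cron, not in these tools:

// Weekly: surface pages above the human-review threshold
{ "tool": "wiki_staleness_check", "params": { "space": "engineering", "threshold": 0.7 } }
// Daily: re-sync repos that change often
{ "tool": "wiki_refresh_repo", "params": { "repoUrl": "https://github.com/your-org/backend-api", "space": "engineering" } }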
Step 4: Search It All
With your repos ingested, pages linked, and freshness maintained, your agents can search across everything:
{
  "tool": "wiki_search_pages",
  "params": {
    "query": "how does authentication work",
    "space": "engineering",
    "type": "documentation",
    "tags": ["auth"]
  }
}
Search combines structured filters (type, tags, space) with semantic search via embeddings. An agent asking "how does authentication work" will find your Auth Service docs, the OAuth migration plan, related runbooks, and any linked decision records — even if none of them use the exact word "authentication."
You can also filter by type to narrow results:
{
  "tool": "wiki_search_pages",
  "params": {
    "query": "deployment process",
    "type": "runbook",
    "space": "operations"
  }
}
Tips and Gotchas
- Start with narrow file patterns. Don't ingest entire repos; begin with docs/**/*.md and expand. You'll get cleaner results and faster ingestion.
- Use consistent relation types. Pick five or six relation types and stick with them. DEPENDS_ON, DOCUMENTS, OWNED_BY, RELATED_TO, and SUPERSEDES cover most cases. Agents that create relations ad hoc will make the graph noisy.
- Scope your spaces. Multi-tenancy is built in. Use spaces to separate engineering, product, operations, and support knowledge. Agents searching for deployment info don't need to wade through product specs.
- Clean up dead relations. When you delete a page with wiki_delete_page, its relations are cleaned up automatically. But if you reorganize knowledge, use wiki_delete_relation to remove links that no longer make sense.
- Refresh on a schedule. Stale ingested repos are worse than no repos. Run wiki_refresh_repo daily for active repos.
- Let agents write back. The most powerful pattern is agents that learn and write back to the knowledge base. An agent that resolves an incident can create a page documenting the fix, link it to the affected service, and tag it for the on-call team. Next time, a different agent finds it instantly; see the sketch below.
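A minimal write-back sequence after an incident might look like this (the page IDs, tags, and content are hypothetical):

// Document the fix
{ "tool": "wiki_update_page", "params": { "pageId": "page-postgres-failover-fix", "title": "Postgres Failover: Root Cause and Fix", "content": "Connection pool exhaustion during failover. Fixed by...", "tags": ["incident", "postgres", "on-call"], "type": "documentation", "space": "operations" } }
// Link it to the affected service
{ "tool": "wiki_create_relation", "params": { "fromPageId": "page-postgres-failover-fix", "toPageId": "page-postgres-cluster", "relationType": "DOCUMENTS" } }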
Build Agents That Actually Remember
Nineteen tools. A Neo4j graph. Embeddings for semantic search. Git ingestion so your codebase documentation is always in the graph. Typed relations so agents understand how things connect. Freshness scoring so nothing goes stale quietly.
This is how you go from "AI agent that sounds smart" to "AI agent that knows your organization."
We're running this in production with 272 tests passing and agents that cross-reference code, docs, wiki pages, and Slack history in a single query.
Build your own cyborgenic organization at agent.ceo.