Your AI agent just gave a perfect answer. It understood the codebase, remembered that you prefer Terraform over Pulumi, and knew not to touch the legacy billing service without running the integration tests first. Then the context window hit its limit, the session reset, and all of that knowledge vanished.
This is the single biggest problem with running AI agents in production. Context windows are finite. Sessions end. Agents restart. And every restart is amnesia.
We run a multi-agent organization at agent.ceo where AI agents hold real roles — marketing, engineering, operations — 24/7. Early on, our agents repeated the same mistakes, asked the same clarifying questions, and lost every correction we gave them. The fix was not a bigger context window. It was persistent memory written to disk.
Here is exactly how we built it.
Why Context Windows Are Not Memory
A context window is working memory. It holds what the agent is thinking about right now. But it has hard limits, commonly 100K to 200K tokens depending on the model. When a long-running agent accumulates enough conversation history, the context gets compacted (summarized and truncated) or the session restarts entirely.
Everything the agent learned during that session disappears:
- User preferences — "deploy to staging first, always"
- Past corrections — "don't use the v1 API, it's deprecated"
- Project context — "the auth migration is blocked on the database team"
- Institutional knowledge — "the CEO reviews all customer-facing copy before publish"
In a multi-agent organization, this is catastrophic. Agent A learns that a particular deployment pattern causes downtime. Agent A restarts. Agent A tries the same pattern again. The organization has no long-term memory — just a collection of amnesiac workers repeating each other's mistakes.
The solution is a structured file-based memory system that persists across sessions.
The Architecture
The system is simple. Each agent gets a memory/ directory containing individual memory files and an index. The index file — MEMORY.md — gets loaded into every new conversation automatically, giving the agent immediate access to everything it has learned.
.claude/
  projects/
    my-project/
      memory/
        MEMORY.md              # Index, loaded every session
        user-preferences.md    # Who the user is, how they work
        deploy-feedback.md     # Corrections about deployment
        auth-migration.md      # Ongoing project context
        staging-endpoints.md   # Reference to external systems
The key constraint: MEMORY.md must stay under 200 lines. It is loaded into every conversation, so it consumes context window space. Keep it lean — one-line pointers to the detailed memory files, not the details themselves.
Each memory file uses frontmatter to declare its type and purpose:
---
name: deploy-feedback
description: "Corrections and confirmed patterns for deployment workflow"
metadata:
  type: feedback
---
Four memory types cover everything an agent needs to remember:
| Type | Purpose | Example |
|---|---|---|
| user | Who the agent works with | Role, preferences, communication style |
| feedback | Corrections and confirmed approaches | "Always run lint before commit" |
| project | Ongoing work context | Current sprint goals, blockers, decisions |
| reference | Pointers to external systems | API endpoints, config file locations |
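To make the frontmatter contract concrete, here is a minimal sketch of reading a memory file's type and checking it against the four types in the table. It assumes `type` is nested under `metadata` with indentation, and it parses only the small subset of frontmatter shown in this article rather than full YAML. The function names `parse_frontmatter` and `memory_type` are hypothetical, not part of any tool.

```python
# The four memory types named in the table above.
MEMORY_TYPES = {"user", "feedback", "project", "reference"}


def parse_frontmatter(text: str) -> dict:
    """Parse `key: value` lines plus one level of nesting between '---' lines.

    This is deliberately not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("memory file must start with a '---' line")

    result: dict = {}
    current = result
    for line in lines[1:]:
        if line.strip() == "---":
            return result
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip().strip('"')
        if not line.startswith((" ", "\t")):
            current = result  # an unindented line is back at the top level
        if value:
            current[key] = value
        else:
            # A bare `key:` opens a nested mapping (for example `metadata:`).
            result[key] = {}
            current = result[key]
    raise ValueError("frontmatter is missing its closing '---' line")


def memory_type(text: str) -> str:
    """Return the declared memory type, or raise if it is missing or unknown."""
    metadata = parse_frontmatter(text).get("metadata")
    kind = metadata.get("type") if isinstance(metadata, dict) else None
    if kind not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {kind!r}")
    return kind
```

Rejecting unknown types early keeps a typo such as `type: feedbak` from silently creating a fifth category.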
Tutorial: Setting Up Persistent Memory
Step 1: Create the Memory Directory
mkdir -p .claude/projects/my-project/memory
This directory lives alongside your project configuration. If you are running multiple agents, each agent gets its own memory directory.
Step 2: Define the MEMORY.md Index
Create .claude/projects/my-project/memory/MEMORY.md with this structure:
# Agent Memory
_Last compacted: 2026-06-25 | Outcomes: 3 | Patterns: 2_
## User Context
- moshe: Founder, prefers direct communication, reviews all deploys (user-preferences.md)
## Active Feedback
- deploy: Always run integration tests before staging push (deploy-feedback.md)
- api: Use v2 endpoints only, v1 deprecated since March (api-feedback.md)
## Project State
- auth-migration: Blocked on DB team, ETA July 1 (auth-migration.md)
## Reference
- staging: Endpoints and credentials location (staging-endpoints.md)
Each line is a one-line pointer — under 150 characters. The parenthetical references point to the detailed memory file. The agent reads the index every session and loads specific files only when relevant.
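The size limits on the index are easy to let slip, so a small check can help. The sketch below is one possible linter, assuming pointers are lines starting with `- ` that end in a `(file.md)` reference, as in the example index. The name `lint_index` and the exact messages are illustrative assumptions.

```python
import re
from pathlib import Path

MAX_INDEX_LINES = 200    # index limit stated in this article
MAX_POINTER_CHARS = 150  # per-pointer limit stated in this article

# Matches a trailing "(some-file.md)" reference on a pointer line.
POINTER_REF = re.compile(r"\(([\w.-]+\.md)\)\s*$")


def lint_index(memory_dir: Path) -> list[str]:
    """Return human-readable problems found in MEMORY.md."""
    problems = []
    lines = (memory_dir / "MEMORY.md").read_text(encoding="utf-8").splitlines()

    if len(lines) >= MAX_INDEX_LINES:
        problems.append(f"index has {len(lines)} lines; keep it under {MAX_INDEX_LINES}")

    for number, line in enumerate(lines, start=1):
        if not line.startswith("- "):
            continue  # headings and the status line are not pointers
        if len(line) >= MAX_POINTER_CHARS:
            problems.append(f"line {number}: pointer is {len(line)} characters")
        match = POINTER_REF.search(line)
        if match is None:
            problems.append(f"line {number}: pointer does not reference a memory file")
        elif not (memory_dir / match.group(1)).is_file():
            problems.append(f"line {number}: {match.group(1)} does not exist")
    return problems
```

Running this before each commit of the memory directory would catch both a bloated index and pointers to files that were renamed or deleted.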
Step 3: Write Your First Memory File
Create a feedback memory for something the agent got wrong and you corrected:
---
name: deploy-feedback
description: "Deployment workflow corrections — verified patterns and anti-patterns"
metadata:
  type: feedback
---
## Confirmed: Run Integration Tests Before Staging
**Why:** Staging push on 2026-06-10 broke the billing webhook because
unit tests passed but integration tests would have caught the
schema mismatch.
**How to apply:** Before any `git push` to a staging branch, run
`make test-integration`. If it fails, fix before pushing. No exceptions.
**Source:** Founder correction, 2026-06-10.
Notice the structure: a clear title, a **Why:** explaining the reasoning, a **How to apply:** with the concrete action, and a **Source:** for provenance. This is not a note; it is an instruction the agent can act on.
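If memories are written by a script or by the agent itself, a small template function keeps that structure consistent. This is a sketch only: `render_feedback_memory` and its parameters are hypothetical names, and it assumes the description contains no double quotes.

```python
from datetime import date


def render_feedback_memory(
    name: str,
    description: str,
    title: str,
    why: str,
    how_to_apply: str,
    source: str,
    recorded_on: date,
) -> str:
    """Render a feedback memory with title, Why, How to apply, and Source."""
    return "\n".join([
        "---",
        f"name: {name}",
        f'description: "{description}"',
        "metadata:",
        "  type: feedback",
        "---",
        f"## {title}",
        f"**Why:** {why}",
        f"**How to apply:** {how_to_apply}",
        # isoformat() yields an absolute date such as 2026-06-10.
        f"**Source:** {source}, {recorded_on.isoformat()}.",
        "",
    ])


text = render_feedback_memory(
    name="deploy-feedback",
    description="Deployment workflow corrections",
    title="Confirmed: Run Integration Tests Before Staging",
    why="Unit tests passed but integration tests would have caught the schema mismatch.",
    how_to_apply="Run `make test-integration` before any push to a staging branch.",
    source="Founder correction",
    recorded_on=date(2026, 6, 10),
)
```

Making every field a required argument means a memory without a reason or a source cannot be written by accident.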
Step 4: Configure the Agent to Load Memory at Session Start
In your agent's system instructions (CLAUDE.md or equivalent), add:
## Session Start
1. Read `memory/MEMORY.md` for persistent context
2. Check for relevant memory files based on the current task
3. Verify any file paths or endpoints mentioned in memories still exist
The critical detail: the agent must verify memories before acting on them. A memory that says "config lives at /etc/app/config.yaml" is only useful if that file still exists. Memories are hypotheses about the world, not ground truth.
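One way to mechanize part of that verification is to pull absolute file paths out of a memory and check that they still exist. The sketch below assumes POSIX-style paths and uses a hypothetical helper named `stale_paths`. Its regular expression is a rough heuristic: it skips URLs, but it will also pick up API routes such as `/v2/health`, so treat the output as a list of things to re-check rather than a verdict.

```python
import re
from pathlib import Path

# Absolute POSIX-style paths such as /etc/app/config.yaml.
# The lookbehind skips slashes that sit inside URLs or relative paths.
ABSOLUTE_PATH = re.compile(r"(?<![\w:/.])/(?:[\w.-]+/)*[\w.-]+")


def stale_paths(memory_text: str) -> list[str]:
    """Return absolute paths mentioned in a memory that no longer exist."""
    # Strip a trailing period so "at /etc/app/config.yaml." still resolves.
    found = {match.rstrip(".") for match in ABSOLUTE_PATH.findall(memory_text)}
    return sorted(path for path in found if not Path(path).exists())
```

A non-empty result is a signal to update or remove the memory before the agent acts on it.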
Step 5: Add Memory-Writing Triggers
The agent needs to know when to create or update memories. Add these triggers to your instructions:
## When to Write Memory
- User corrects you: feedback memory (what you did wrong, what to do instead)
- User states a preference: user memory (how they want things done)
- A project decision is made with reasoning: project memory (the decision and why)
- You discover a useful system reference: reference memory (endpoint, path, credential location)
## When NOT to Write Memory
- Code patterns (read the code directly)
- Git history (use git log)
- Debugging solutions (the fix is already in the code)
- Ephemeral task state (use task management, not memory)
The filter rule is simple: if removing the memory would not change the agent's future behavior, do not save it.
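The same triggers can be expressed as a lookup table if your agent framework routes events in code rather than in prose instructions. This is a sketch under that assumption; the event names and the function `memory_type_for` are invented for illustration.

```python
from typing import Optional

# Triggers from "When to Write Memory", mapped to the memory type to write.
WRITE_TRIGGERS = {
    "user_correction": "feedback",
    "user_preference": "user",
    "project_decision": "project",
    "system_reference": "reference",
}

# Items from "When NOT to Write Memory", mapped to where that knowledge lives.
SKIP_TRIGGERS = {
    "code_pattern": "read the code directly",
    "git_history": "use git log",
    "debugging_solution": "the fix is already in the code",
    "ephemeral_task_state": "use task management",
}


def memory_type_for(event: str) -> Optional[str]:
    """Return the memory type to write for an event, or None to skip it."""
    if event in WRITE_TRIGGERS:
        return WRITE_TRIGGERS[event]
    if event in SKIP_TRIGGERS:
        return None
    # Unknown events are surfaced instead of being saved by default.
    raise ValueError(f"unclassified event: {event!r}")
```

Raising on unknown events keeps the default biased toward not saving, which matches the filter rule.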
What to Store vs. What Not to Store
The most common mistake is storing too much. Bloated memory files waste context window space and mislead future sessions.
Store: user role and preferences, corrections that prevent recurring mistakes, project decisions with reasoning, external system locations.
Do not store: code patterns (read the code), git history (use git log), debugging solutions (the fix is in the commit), ephemeral task state (use your task manager).
The filter rule: if removing the memory would not change the agent's future behavior, do not save it.
Memory Hygiene
Four rules keep your memory system healthy over time:
- Cross-reference related memories with `See also: [[name]]` links
- Use absolute dates ("2026-06-19", not "last Thursday")
- Verify before acting — a memory referencing a file path is only useful if the file still exists
- Update or remove stale memories — a completed migration is noise, not knowledge
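Two of these rules can be checked mechanically. The sketch below flags `[[name]]` links with no matching file and a few common relative date phrases. It assumes a link target equals the file name without `.md`, which may not hold if your links use the frontmatter `name` instead. `hygiene_problems` is a hypothetical helper, and the date pattern is far from exhaustive.

```python
import re
from pathlib import Path

SEE_ALSO = re.compile(r"\[\[([\w.-]+)\]\]")
RELATIVE_DATE = re.compile(
    r"\b(?:yesterday|today|tomorrow|(?:last|next|this) "
    r"(?:week|month|year|monday|tuesday|wednesday|thursday|friday|saturday|sunday))\b",
    re.IGNORECASE,
)


def hygiene_problems(memory_dir: Path) -> list[str]:
    """Report dangling [[links]] and relative dates in memory files."""
    problems = []
    files = sorted(p for p in memory_dir.glob("*.md") if p.name != "MEMORY.md")
    names = {p.stem for p in files}  # assumed link targets: file name minus .md
    for path in files:
        text = path.read_text(encoding="utf-8")
        for target in SEE_ALSO.findall(text):
            if target not in names:
                problems.append(f"{path.name}: [[{target}]] has no matching file")
        for phrase in RELATIVE_DATE.findall(text):
            problems.append(f"{path.name}: relative date {phrase!r}")
    return problems
```

Staleness itself, such as a completed migration, still needs a human or agent judgment call; this only catches the structural slips.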
Real-World Memory Examples
Here are four memory files from our production agent organization:
User memory:
---
name: user-moshe
description: "Founder profile — communication style and review requirements"
metadata:
  type: user
---
Moshe Beeri, Founder. Prefers direct, concise updates. Wants verification
evidence (curl output, test results) not prose summaries. Reviews all
customer-facing content before publish. Timezone: UTC+3.
Feedback memory:
---
name: api-versioning
description: "API version policy — confirmed after v1 incident"
metadata:
  type: feedback
---
## Always Use v2 API Endpoints
**Why:** v1 endpoints deprecated 2026-03-15. The /v1/users endpoint
returns stale cached data since the migration. Agent used v1 on
2026-04-02 and generated a report with wrong user counts.
**How to apply:** All API calls use /v2/ prefix. If a v1 URL appears
in existing code, flag it for migration. See also: [[staging-endpoints]]
Project memory:
---
name: auth-migration
description: "Auth service migration — status and blockers"
metadata:
  type: project
---
## Auth Migration to OAuth2
Status: In progress, blocked on database schema changes.
Blocker: DB team ETA is 2026-07-01.
Decision: Use Authorization Code flow, not Client Credentials
(founder decision, 2026-06-05). Reason: need user-level scoping
for multi-tenant support.
**How to apply:** Do not merge auth-related PRs until DB migration
lands. All new auth code targets OAuth2 flow.
Reference memory:
---
name: staging-endpoints
description: "Staging environment endpoints and access details"
metadata:
  type: reference
---
- API: https://staging-api.agent.ceo/v2
- Dashboard: https://staging.agent.ceo
- Credentials: stored in 1Password vault "Engineering — Staging"
- Health check: GET /v2/health returns `{"status": "ok"}`
- Deploy trigger: push to branch with `stg-` tag prefix
From Amnesia to Institutional Knowledge
File-based memory turns individual agent sessions into a continuous learning process. Each correction, each preference, each decision accumulates into an institutional knowledge base that survives restarts, compactions, and even agent replacements.
The system is deliberately simple — markdown files on disk, no database, no vector store, no retrieval pipeline. Simplicity is the point. A memory system that requires infrastructure to run is a memory system that will break at 3 AM when nobody is watching.
Start with one memory file. Write down the correction your agent keeps forgetting. Load it at session start. Watch the agent stop making that mistake. Then add another. In a week, your agent will know more about your project than a new hire in their first month.
We build AI agent teams that run production workloads 24/7. Persistent memory is one of the patterns that makes it work. See the full architecture at agent.ceo.