Files
claude-mem/docs/architecture-evolution.mdx
2025-10-26 22:29:43 -04:00

802 lines
20 KiB
Plaintext

# Architecture Evolution: The Journey from v3 to v4
## The Problem We Solved
**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
---
## v1-v2: The Naive Approach
### The First Attempt: Dump Everything
**Architecture:**
```
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
```
**What we learned:**
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
**Example of what went wrong:**
```
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
```
---
## v3: Smart Compression, Wrong Architecture
### The Breakthrough: AI-Powered Compression
**New idea:** Use Claude itself to compress observations
**Architecture:**
```
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
```
**What we added:**
1. **Claude Agent SDK integration** - Use AI to compress observations
2. **Background worker** - Don't block main session
3. **Structured observations** - Extract facts, decisions, insights
4. **Session summaries** - Generate comprehensive summaries
**What worked:**
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
**What didn't work:**
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
---
## The Key Realizations
### Realization 1: Progressive Disclosure
**Problem:** Even compressed observations can pollute context if you load them all.
**Insight:** Humans don't read everything before starting work. Why should AI?
**Solution:** Show an index first, fetch details on-demand.
```
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
Agent fetches 2-3 relevant ones (300 tokens)
Total: 1,100 tokens vs 8,500 tokens
```
**Impact:**
- 87% reduction in context usage
- 100% relevance (only fetch what's needed)
- Agent autonomy (decides what's relevant)
### Realization 2: Session ID Chaos
**Problem:** SDK session IDs change on every turn.
**What we thought:**
```typescript
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
```
**Reality:**
```typescript
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
```
**Why this matters:**
- Can't resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
**Solution:**
```typescript
// Capture from system init message
for await (const msg of response) {
if (msg.type === 'system' && msg.subtype === 'init') {
sdkSessionId = msg.session_id;
await updateSessionId(sessionId, sdkSessionId);
}
}
```
### Realization 3: Graceful vs Aggressive Cleanup
**v3 approach:**
```typescript
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
```
**Problems:**
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
**v4 approach:**
```typescript
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
```
**Benefits:**
- Summaries complete successfully
- No lost observations
- Clean state transitions
**Code:**
```typescript
// v3: Aggressive
async function sessionEnd(sessionId: string) {
await fetch(`http://localhost:37777/sessions/${sessionId}`, {
method: 'DELETE'
});
}
// v4: Graceful
async function sessionEnd(sessionId: string) {
await db.run(
'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
[Date.now(), sessionId]
);
}
```
### Realization 4: One Session, Not Many
**Problem:** We were creating multiple SDK sessions per Claude Code session.
**What we thought:**
```
Claude Code session → Create SDK session per observation → 100+ SDK sessions
```
**Reality should be:**
```
Claude Code session → ONE long-running SDK session → Streaming input
```
**Why this matters:**
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
**Implementation:**
```typescript
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
// Initial prompt
yield {
role: "user",
content: "You are a memory assistant..."
};
// Then continuously yield observations
while (session.status === 'active') {
const observations = await pollQueue();
for (const obs of observations) {
yield {
role: "user",
content: formatObservation(obs)
};
}
await sleep(1000);
}
}
const response = query({
prompt: messageGenerator(),
options: { maxTurns: 1000 }
});
```
---
## v4: The Architecture That Works
### The Core Design
```
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ User → Claude → Tools (Read, Edit, Write, Bash) │
│ ↓ │
│ PostToolUse Hook │
│ (queues observation) │
└─────────────────────────────────────────────────────────┘
↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│ SDK WORKER PROCESS │
│ ONE streaming session per Claude Code session │
│ │
│ AsyncIterable<UserMessage> │
│ → Yields observations from queue │
│ → SDK compresses via AI │
│ → Parses XML responses │
│ → Stores in database │
└─────────────────────────────────────────────────────────┘
↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│ NEXT SESSION │
│ SessionStart Hook │
│ → Queries database │
│ → Returns progressive disclosure index │
│ → Agent fetches details via MCP │
└─────────────────────────────────────────────────────────┘
```
### The Five Hook Architecture
<Tabs>
<Tab title="SessionStart">
**Purpose:** Inject context from previous sessions
**Timing:** When Claude Code starts
**What it does:**
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
**Key change from v3:**
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
</Tab>
<Tab title="UserPromptSubmit">
**Purpose:** Initialize session tracking
**Timing:** Before Claude processes prompt
**What it does:**
- Creates session record
- Saves raw user prompt (v4.2.0+)
- Starts worker if needed
**Key change from v3:**
- ✅ Stores raw prompts for search
- ✅ Auto-starts PM2 worker
</Tab>
<Tab title="PostToolUse">
**Purpose:** Capture tool observations
**Timing:** After every tool execution
**What it does:**
- Enqueues observation in database
- Returns immediately
**Key change from v3:**
- ✅ Just enqueues (doesn't process)
- ✅ Worker handles all AI calls
</Tab>
<Tab title="Summary">
**Purpose:** Generate session summaries
**Timing:** Worker-triggered (mid-session)
**What it does:**
- Gathers observations
- Sends to Claude for summarization
- Stores structured summary
**Key change from v3:**
- ✅ Multiple summaries per session
- ✅ Summaries are checkpoints, not endings
</Tab>
<Tab title="SessionEnd">
**Purpose:** Graceful cleanup
**Timing:** When session ends
**What it does:**
- Marks session complete
- Lets worker finish processing
**Key change from v3:**
- ✅ Graceful (not aggressive)
- ✅ No DELETE requests
- ✅ Worker finishes naturally
</Tab>
</Tabs>
### Database Schema Evolution
**v3 schema:**
```sql
-- Simple, flat structure
CREATE TABLE observations (
id INTEGER PRIMARY KEY,
session_id TEXT,
text TEXT,
created_at INTEGER
);
```
**v4 schema:**
```sql
-- Rich, structured schema
CREATE TABLE observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
project TEXT NOT NULL,
-- Progressive disclosure metadata
title TEXT NOT NULL,
subtitle TEXT,
type TEXT NOT NULL, -- decision, bugfix, feature, etc.
-- Content
narrative TEXT NOT NULL,
facts TEXT, -- JSON array
-- Searchability
concepts TEXT, -- JSON array of tags
files_read TEXT, -- JSON array
files_modified TEXT, -- JSON array
-- Timestamps
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);
-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
title, subtitle, narrative, facts, concepts,
content=observations
);
-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
```
**What changed:**
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
### Worker Service Redesign
**v3 worker:**
```typescript
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
const response = await query({
prompt: buildPrompt(req.body),
options: { maxTurns: 1 }
});
for await (const msg of response) {
// Process single observation
}
res.json({ success: true });
});
```
**v4 worker:**
```typescript
// ONE long-running SDK session
async function runWorker(sessionId: string) {
const response = query({
prompt: messageGenerator(), // AsyncIterable
options: { maxTurns: 1000 }
});
for await (const msg of response) {
if (msg.type === 'text') {
parseObservations(msg.content);
parseSummaries(msg.content);
}
}
}
```
**Benefits:**
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
---
## Critical Fixes Along the Way
### Fix 1: Context Injection Pollution (v4.3.1)
**Problem:** SessionStart hook output polluted with npm install logs
```bash
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
```
**Why it broke:**
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn't inject properly
**Solution:**
```json
{
"command": "npm install --loglevel=silent && node context-hook.js"
}
```
**Result:** Clean JSON output, context injection works
### Fix 2: Double Shebang Issue (v4.3.1)
**Problem:** Hook executables had duplicate shebangs
```javascript
#!/usr/bin/env node
#!/usr/bin/env node // ← Duplicate!
// Rest of code...
```
**Why it happened:**
- Source files had shebang
- esbuild added another shebang during build
**Solution:**
```typescript
// Remove shebangs from source files
// Let esbuild add them during build
```
**Result:** Clean executables, no parsing errors
### Fix 3: FTS5 Injection Vulnerability (v4.2.3)
**Problem:** User input passed directly to FTS5 query
```typescript
// ❌ Vulnerable
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
```
**Attack:**
```typescript
userQuery = "'; DROP TABLE observations; --"
```
**Solution:**
```typescript
// ✅ Safe: Use parameterized queries
const results = db.query(
'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
[userQuery]
);
```
### Fix 4: NOT NULL Constraint Violation (v4.2.8)
**Problem:** Session creation failed when prompt was empty
```sql
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
```
**Solution:**
```typescript
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
```
**Schema change:**
```sql
-- Before
user_prompt TEXT NOT NULL
-- After
user_prompt TEXT -- Nullable
```
---
## Performance Improvements
### Optimization 1: Prepared Statements
**Before:**
```typescript
for (const obs of observations) {
db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
```
**After:**
```typescript
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
```
**Impact:** 5x faster bulk inserts
### Optimization 2: FTS5 Indexing
**Before:**
```typescript
// Manual full-text search
const results = db.query(
`SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
```
**After:**
```typescript
// FTS5 virtual table
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
[query]
);
```
**Impact:** 100x faster searches on large datasets
### Optimization 3: Index Format Default
**Before:**
```typescript
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
```
**After:**
```typescript
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens
// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
```
**Impact:** 25x reduction in average search result size
---
## What We Learned
### Lesson 1: Context is Precious
**Principle:** Every token you put in context window costs attention.
**Application:**
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
### Lesson 2: Session State is Complicated
**Principle:** Distributed state is hard. SDK handles it better than we can.
**Application:**
- Use SDK's built-in session resumption
- Don't try to manually reconstruct state
- Track session IDs from init messages
### Lesson 3: Graceful Beats Aggressive
**Principle:** Let processes finish their work before terminating.
**Application:**
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
### Lesson 4: AI is the Compressor
**Principle:** Don't compress manually. Let AI do semantic compression.
**Application:**
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
### Lesson 5: Progressive Everything
**Principle:** Show metadata first, fetch details on-demand.
**Application:**
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
---
## The Road Ahead
### Planned: Adaptive Index Size
```typescript
SessionStart({ source: "startup" }):
→ Show last 10 sessions (normal)
SessionStart({ source: "resume" }):
→ Show only current session (minimal)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (comprehensive)
```
### Planned: Relevance Scoring
```typescript
// Use embeddings to pre-sort index by semantic relevance
search_observations({
query: "authentication bug",
sort: "relevance" // Based on embeddings
});
```
### Planned: Multi-Project Context
```typescript
// Cross-project pattern recognition
search_observations({
query: "API rate limiting",
projects: ["api-gateway", "user-service", "billing-service"]
});
```
### Planned: Collaborative Memory
```typescript
// Team-shared observations (optional)
createObservation({
title: "Rate limit: 100 req/min",
scope: "team" // vs "user"
});
```
---
## Migration Guide: v3 → v4
### Step 1: Backup Database
```bash
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
```
### Step 2: Update Plugin
```bash
cd ~/.claude/plugins/marketplaces/thedotmack
git pull
```
### Step 3: Run Migration
```bash
npx tsx src/services/sqlite/migrations/v3-to-v4.ts
```
**What the migration does:**
- Adds new columns to observations table
- Creates FTS5 virtual tables
- Sets up auto-sync triggers
- Migrates existing observations to new schema
### Step 4: Restart Worker
```bash
pm2 restart claude-mem-worker
pm2 logs claude-mem-worker
```
### Step 5: Test
```bash
# Start Claude Code
claude
# Check that context is injected
# (Should see progressive disclosure index)
# Submit a prompt and check observations
pm2 logs claude-mem-worker --nostream
```
---
## Key Metrics
### v3 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
### v4 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
**Improvements:**
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
---
## Conclusion
The journey from v3 to v4 was about understanding these fundamental truths:
1. **Context is finite** - Progressive disclosure respects attention budget
2. **AI is the compressor** - Semantic understanding beats keyword extraction
3. **Agents are smart** - Let them decide what to fetch
4. **State is hard** - Use SDK's built-in mechanisms
5. **Graceful wins** - Let processes finish cleanly
The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
---
## Further Reading
- [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4
- [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system
- [Context Engineering](/docs/context-engineering) - Foundational principles
- [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog
---
*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.*