feat(docs): Add analysis reports for issues #514, #517, #520, #527, and #532

- Issue #514: Documented analysis of orphaned observer session files, including root cause, evidence, and recommended fixes.
- Issue #517: Analyzed PowerShell escaping issues in cleanupOrphanedProcesses() on Windows, with recommended fixes using WMIC.
- Issue #520: Confirmed resolution of stuck messages issue through architectural changes to a claim-and-delete pattern.
- Issue #527: Identified detection failure of uv on Apple Silicon Macs with Homebrew installation, proposed path updates for detection.
- Issue #532: Analyzed memory leak issues in SessionManager, detailing session cleanup and conversationHistory growth concerns, with recommended fixes.
Author: Alex Newman
Date: 2026-01-04 00:21:22 -05:00
Parent commit: a833cd2099
Commit: bb033b95f1
7 changed files with 1027 additions and 2 deletions

# Issue #514: Orphaned Observer Session Files Analysis
**Date:** January 4, 2026
**Status:** PARTIALLY RESOLVED - Root cause understood; a fix was landed but later reverted
**Original Issue:** 13,000+ orphaned .jsonl session files created over 2 days
---
## Executive Summary
Issue #514 reported that the plugin created 13,000+ orphaned session .jsonl files in `~/.claude/projects/<project>/`. Each file contained only an initialization message with no actual observations. The hypothesis was that `startSessionProcessor()` in startup-recovery created new observer sessions in a loop.
**Current State:** The issue was **fixed in commit 9a7f662** with a deterministic `mem-${contentSessionId}` prefix approach, but this fix was **reverted in commit f9197b5** due to the SDK not accepting custom session IDs. The current code uses a NULL-based initialization pattern that can still create orphaned sessions under certain conditions.
---
## Evidence: Current File Analysis
Filesystem analysis of `~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/`:
| Line Count | Number of Files |
|------------|-----------------|
| 0 lines (empty) | 407 |
| 1 line | **12,562** |
| 2 lines | 3,199 |
| 3+ lines | 3,546 |
| **Total** | **~19,714** |
The 12,562 single-line files are consistent with the issue description: sessions that initialized but never received observations.
Sample single-line file content:
```json
{"type":"queue-operation","operation":"dequeue","timestamp":"2025-12-28T20:41:25.484Z","sessionId":"00081a3b-9485-48a4-89f0-fd4dfccd3ac9"}
```
---
## Root Cause Analysis
### The Problem Chain
1. **Worker startup calls `processPendingQueues()`** (line 281 in worker-service.ts)
2. For each session with pending messages, it calls `initializeSession()` then `startSessionProcessor()`
3. `startSessionProcessor()` invokes `sdkAgent.startSession()` which calls the Claude Agent SDK `query()` function
4. **If `memorySessionId` is NULL**, no `resume` parameter is passed to `query()`
5. **The SDK creates a NEW .jsonl file** for each query call without a resume parameter
6. **If the query aborts before receiving a response** (timeout, crash, abort signal), the `memorySessionId` is never captured
7. On next startup, the cycle repeats - creating yet another orphaned file
### Why Sessions Abort Before Capturing memorySessionId
Looking at `startSessionProcessor()` flow:
```typescript
// worker-service.ts lines 301-321
private startSessionProcessor(session, source) {
session.generatorPromise = this.sdkAgent.startSession(session, this)
.catch(error => { /* error handling */ })
.finally(() => {
session.generatorPromise = null;
this.broadcastProcessingStatus();
});
}
```
And `processPendingQueues()`:
```typescript
// worker-service.ts lines 347-371
for (const sessionDbId of orphanedSessionIds) {
const session = this.sessionManager.initializeSession(sessionDbId);
this.startSessionProcessor(session, 'startup-recovery');
await new Promise(resolve => setTimeout(resolve, 100)); // 100ms delay between sessions
}
```
The problem: Starting 50 sessions rapidly (100ms delay) with pending messages means:
- All 50 SDK queries start nearly simultaneously
- The SDK creates 50 new .jsonl files (since none have memorySessionId yet)
- If any query fails/aborts before the first response, its memorySessionId is never captured
- On next startup, those sessions get new files again
---
## Code Flow: Where .jsonl Files Are Created
The .jsonl files are created by the **Claude Agent SDK** (`@anthropic-ai/claude-agent-sdk`), not by claude-mem directly.
When `query()` is called in SDKAgent.ts:
```typescript
// SDKAgent.ts lines 89-99
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
// Resume with captured memorySessionId (null on first prompt, real ID on subsequent)
...(hasRealMemorySessionId && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
**Key insight:** If `hasRealMemorySessionId` is false (memorySessionId is null), no `resume` parameter is passed. The SDK then generates a new UUID and creates a new file at:
`~/.claude/projects/<dashed-cwd>/<new-uuid>.jsonl`
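The naming convention can be sketched as a small hypothetical helper (the SDK's real implementation is not shown here; the dashed-directory behavior is inferred from the filesystem evidence above):

```typescript
// Hypothetical sketch of the SDK's observed naming convention: the project
// cwd is flattened into a dashed directory name, and the session UUID
// becomes the .jsonl filename. A query() call without `resume` gets a fresh
// UUID, hence a fresh file.
function sessionFilePath(cwd: string, sessionId: string): string {
  const dashedCwd = cwd.replace(/[/\\]/g, "-");
  return `~/.claude/projects/${dashedCwd}/${sessionId}.jsonl`;
}

// Reproduces the directory analyzed in the evidence section:
// "/Users/alexnewman/Scripts/claude-mem" -> "-Users-alexnewman-Scripts-claude-mem"
```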
---
## Fix History
### Commit 9a7f662: The Original Fix (Reverted)
```
fix(sdk): always pass deterministic session ID to prevent orphaned files
Fixes #514 - Excessive observer sessions created during startup-recovery
Root cause: When memorySessionId was null, no `resume` parameter was passed
to the SDK's query(). This caused the SDK to create a NEW session file on
every call. If queries aborted before capturing the SDK's session_id, the
placeholder remained, leading to cascading creation of 13,000+ orphaned files.
Fix:
- Generate deterministic ID `mem-${contentSessionId}` upfront
- Always pass it to `resume` parameter
- Persist immediately to database before query starts
- If SDK returns different ID, capture and use that going forward
```
**This fix was correct in approach** - always passing a resume parameter prevents new file creation.
### Commit f9197b5: The Revert
```
fix(sdk): restore session continuity via robust capture-and-resume strategy
Replaces the deterministic 'mem-' ID approach with a capture-based strategy:
1. Passes 'resume' parameter ONLY when a verified memory session ID exists
2. Captures SDK-generated session ID when it differs from current ID
3. Ensures subsequent prompts resume the correctly captured session ID
This resolves the issue where new sessions were created for every message
due to failure to capture/resume the initial session ID, without introducing
potentially invalid deterministic IDs.
```
**The revert explanation suggests the SDK rejected the `mem-` prefix IDs.**
### Commit 005b0f8: Current NULL-based Pattern
Changed `memory_session_id` initialization from `contentSessionId` (placeholder) to `NULL`:
- Simpler logic: `!!session.memorySessionId` instead of `memorySessionId !== contentSessionId`
- But still creates new files on first prompt of each session
---
## Relationship with Issue #520 (Stuck Messages)
**Issue #520 is related but distinct:**
| Aspect | Issue #514 (Orphaned Files) | Issue #520 (Stuck Messages) |
|--------|-----------------------------|-----------------------------|
| Problem | Too many .jsonl files | Messages never processed |
| Root Cause | SDK creates new file per query without resume | Old claim-process-mark pattern left messages in 'processing' state |
| Status | Partially resolved | **Fully resolved** |
| Fix | Need deterministic resume IDs | Changed to claim-and-delete pattern |
**Connection:** Both issues relate to startup-recovery. Issue #520's fix (claim-and-delete pattern) doesn't create the loop that #514 describes, but #514 can still occur when:
1. Sessions have pending messages
2. Recovery starts the generator
3. Generator aborts before capturing memorySessionId
4. Next startup repeats the cycle
---
## v8.5.7 Status
**v8.5.7 did NOT fully address Issue #514.** The major changes were:
- Modular architecture refactor
- NULL-based initialization pattern
- Comprehensive test coverage
The deterministic `mem-` prefix fix (9a7f662) was reverted before v8.5.7.
---
## Recommended Fix
### Option 1: Reintroduce Deterministic IDs with SDK Validation
```typescript
// SDKAgent.ts - In startSession()
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
// Generate deterministic ID based on database session ID (not UUID-based contentSessionId)
// Format: "mem-<sessionDbId>" is short and unlikely to conflict
const deterministicMemoryId = session.memorySessionId || `mem-${session.sessionDbId}`;
// Always pass resume to prevent orphaned sessions
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
resume: deterministicMemoryId, // ALWAYS pass, even if SDK might reject
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
// Capture whatever ID the SDK actually uses
for await (const message of queryResult) {
if (message.session_id && message.session_id !== session.memorySessionId) {
session.memorySessionId = message.session_id;
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
}
// ... rest of processing
}
}
```
### Option 2: Limit Recovery Scope
Prevent the recovery loop by limiting how many times a session can be recovered:
```typescript
// In processPendingQueues()
for (const sessionDbId of orphanedSessionIds) {
// Check if this session was already recovered recently
const dbSession = this.dbManager.getSessionById(sessionDbId);
const recoveryAttempts = dbSession.recovery_attempts || 0;
if (recoveryAttempts >= 3) {
logger.warn('SYSTEM', 'Session exceeded max recovery attempts, skipping', {
sessionDbId,
recoveryAttempts
});
continue;
}
// Increment recovery counter
this.dbManager.getSessionStore().incrementRecoveryAttempts(sessionDbId);
// ... rest of recovery
}
```
### Option 3: Cleanup Old Files (Mitigation, Not Fix)
Add a cleanup script that removes orphaned .jsonl files:
```bash
# Find files with only 1 line older than 7 days
find ~/.claude/projects/ -name "*.jsonl" -mtime +7 \
-exec sh -c '[ $(wc -l < "$1") -le 1 ] && rm "$1"' _ {} \;
```
---
## Files Involved
| File | Role |
|------|------|
| `src/services/worker-service.ts` | `startSessionProcessor()`, `processPendingQueues()` |
| `src/services/worker/SDKAgent.ts` | `startSession()`, `query()` call with `resume` parameter |
| `src/services/worker/SessionManager.ts` | `initializeSession()`, session lifecycle |
| `src/services/sqlite/sessions/create.ts` | `createSDKSession()`, NULL-based initialization |
| `src/services/sqlite/PendingMessageStore.ts` | `getSessionsWithPendingMessages()` |
---
## Conclusion
Issue #514 was correctly diagnosed. The fix in commit 9a7f662 was the right approach but was reverted because the SDK may not accept arbitrary custom IDs. The current NULL-based pattern (005b0f8) is cleaner but doesn't prevent orphaned files when queries abort before capturing the SDK's session ID.
**Recommendation:** Reintroduce the deterministic ID approach with proper handling of SDK rejections (Option 1). If the SDK rejects the ID and returns a different one, capture and persist that ID immediately. This ensures at most one .jsonl file per database session, even across crashes and restarts.
---
## Appendix: Git Commit References
| Commit | Description |
|--------|-------------|
| 9a7f662 | Original fix: deterministic `mem-` prefix IDs (REVERTED) |
| f9197b5 | Revert: capture-based strategy without deterministic IDs |
| 005b0f8 | NULL-based initialization pattern (current) |
| d72a81e | Queue refactoring (related to #520) |
| eb1a78b | Claim-and-delete pattern (fixes #520) |

# Issue #517 Analysis: Windows PowerShell Escaping in cleanupOrphanedProcesses()
**Date:** 2026-01-04
**Version Analyzed:** 8.5.7
**Status:** NOT FIXED - Issue still present
## Summary
The reported issue involves PowerShell's `$_` variable being interpreted by Bash before PowerShell receives it when running in Git Bash or WSL environments on Windows. This causes `cleanupOrphanedProcesses()` to fail during worker initialization.
## Current State
The `cleanupOrphanedProcesses()` function is located in:
- **File:** `/Users/alexnewman/Scripts/claude-mem/src/services/infrastructure/ProcessManager.ts`
- **Lines:** 164-251
### Problematic Code (Lines 170-172)
```typescript
if (isWindows) {
// Windows: Use PowerShell Get-CimInstance to find chroma-mcp processes
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
const { stdout } = await execAsync(cmd, { timeout: 60000 });
```
The `$_.Name` and `$_.CommandLine` expressions contain `$_`, which is a special variable in both PowerShell and Bash. When this command string is executed via Node.js `child_process.exec()` in a Git Bash or WSL environment, Bash may expand `$_` (its own special variable holding the last argument of the previous command) before the string ever reaches PowerShell, mangling the filter.
### Additional Occurrence (Lines 91-92)
A similar issue exists in `getChildProcesses()`:
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```
## Error Handling Analysis
Both functions have try-catch blocks with non-blocking error handling:
- Line 208-212: `cleanupOrphanedProcesses()` catches errors and logs a warning, then returns
- Line 98-102: `getChildProcesses()` catches errors and logs a warning, returning empty array
While this prevents worker initialization from crashing, it means orphaned process cleanup silently fails on affected Windows environments.
## Recommended Fix
Replace PowerShell commands with WMIC (Windows Management Instrumentation Command-line), which does not use `$_` syntax:
### For cleanupOrphanedProcesses() (Line 171):
**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
```
**Recommended:**
```typescript
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
```
### For getChildProcesses() (Line 91):
**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```
**Recommended:**
```typescript
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
```
### Implementation Notes
1. WMIC output format differs from PowerShell - parse `ProcessId=12345` format
2. WMIC is deprecated in newer Windows versions but still widely available
3. Alternative: Use PowerShell with proper escaping (`$$_` or `\$_` depending on context)
4. Consider using `powershell -NoProfile -NonInteractive` flags for faster execution
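Since `/format:list` emits key-value lines rather than PowerShell's bare numbers, the caller needs a small parsing step. A hedged sketch (the helper name is hypothetical, not existing code):

```typescript
// Parses WMIC /format:list output, e.g.:
//   ProcessId=12345
//   ProcessId=67890
// Blank lines and unrelated keys are ignored.
function parseWmicProcessIds(stdout: string): number[] {
  return stdout
    .split(/\r?\n/)
    .map(line => line.trim().match(/^ProcessId=(\d+)$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map(m => parseInt(m[1], 10));
}
```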
## Impact Assessment
- **Severity:** Medium - orphaned process cleanup fails silently
- **Scope:** Windows users running in Git Bash, WSL, or mixed shell environments
- **Workaround:** None currently - users must manually kill orphaned chroma-mcp processes
## Files to Modify
1. `/src/services/infrastructure/ProcessManager.ts` (lines 91-92, 171-172)

# Issue #520: Stuck Messages Analysis
**Date:** January 4, 2026
**Status:** RESOLVED - Issue no longer exists in current codebase
**Original Issue:** Messages in 'processing' status never recovered after worker crash
---
## Executive Summary
The issue described in GitHub #520 has been **fully resolved** in the current codebase through a fundamental architectural change. The system now uses a **claim-and-delete** pattern instead of the old **claim-process-mark** pattern, which eliminates the stuck 'processing' state problem entirely.
---
## Original Issue Description
The issue claimed that after a worker crash:
1. `getSessionsWithPendingMessages()` returns sessions with `status IN ('pending', 'processing')`
2. But `claimNextMessage()` only looks for `status = 'pending'`
3. So 'processing' messages are orphaned
**Proposed Fix:** Add `resetStuckMessages(0)` at start of `processPendingQueues()`
---
## Current Code Analysis
### 1. Queue Processing Pattern: Claim-and-Delete
The current architecture uses `claimAndDelete()` instead of `claimNextMessage()`:
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`
```typescript
// Lines 85-104
claimAndDelete(sessionDbId: number): PersistentPendingMessage | null {
const claimTx = this.db.transaction((sessionId: number) => {
const peekStmt = this.db.prepare(`
SELECT * FROM pending_messages
WHERE session_db_id = ? AND status = 'pending'
ORDER BY id ASC
LIMIT 1
`);
const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;
if (msg) {
// Delete immediately - no "processing" state needed
const deleteStmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
deleteStmt.run(msg.id);
}
return msg;
});
return claimTx(sessionDbId) as PersistentPendingMessage | null;
}
```
**Key insight:** Messages are atomically selected and deleted in a single transaction. There is no 'processing' state for messages being actively worked on - they simply don't exist in the database anymore.
### 2. Iterator Uses claimAndDelete
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/queue/SessionQueueProcessor.ts`
```typescript
// Lines 18-38
async *createIterator(sessionDbId: number, signal: AbortSignal): AsyncIterableIterator<PendingMessageWithId> {
while (!signal.aborted) {
try {
// Atomically claim AND DELETE next message from DB
// Message is now in memory only - no "processing" state tracking needed
const persistentMessage = this.store.claimAndDelete(sessionDbId);
if (persistentMessage) {
// Yield the message for processing (it's already deleted from queue)
yield this.toPendingMessageWithId(persistentMessage);
} else {
// Queue empty - wait for wake-up event
await this.waitForMessage(signal);
}
} catch (error) {
// ... error handling
}
}
}
```
### 3. getSessionsWithPendingMessages Still Checks Both States
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`
```typescript
// Lines 319-326
getSessionsWithPendingMessages(): number[] {
const stmt = this.db.prepare(`
SELECT DISTINCT session_db_id FROM pending_messages
WHERE status IN ('pending', 'processing')
`);
const results = stmt.all() as { session_db_id: number }[];
return results.map(r => r.session_db_id);
}
```
**This is technically vestigial code** - with the claim-and-delete pattern, messages should never be in 'processing' state. However, it provides backward compatibility and defense-in-depth.
### 4. Startup Recovery Still Exists
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/worker-service.ts`
```typescript
// Lines 236-242
// Recover stuck messages from previous crashes
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
const STUCK_THRESHOLD_MS = 5 * 60 * 1000;
const resetCount = pendingStore.resetStuckMessages(STUCK_THRESHOLD_MS);
if (resetCount > 0) {
logger.info('SYSTEM', `Recovered ${resetCount} stuck messages from previous session`, { thresholdMinutes: 5 });
}
```
This runs BEFORE `processPendingQueues()` is called (line 281), which addresses the original fix request.
---
## Verification of Issue Status
### Does the Issue Exist?
**NO** - The issue as described no longer exists because:
1. **No 'processing' state during normal operation**: With claim-and-delete, messages go directly from 'pending' to 'deleted'. They never enter a 'processing' state.
2. **Startup recovery handles legacy stuck messages**: Even if 'processing' messages exist (from old code or edge cases), `resetStuckMessages()` is called BEFORE `processPendingQueues()` in `initializeBackground()` (lines 236-241 run before line 281).
3. **Architecture fundamentally changed**: The old `claimNextMessage()` function that only looked for `status = 'pending'` no longer exists. It was replaced with `claimAndDelete()`.
### GeminiAgent and OpenRouterAgent Behavior
Both agents use the same `SessionManager.getMessageIterator()` which calls `SessionQueueProcessor.createIterator()` which uses `claimAndDelete()`. All three agents (SDKAgent, GeminiAgent, OpenRouterAgent) use identical queue processing:
```typescript
// GeminiAgent.ts:174, OpenRouterAgent.ts:134
for await (const message of this.sessionManager.getMessageIterator(session.sessionDbId)) {
// ...
}
```
They do NOT handle recovery differently - they all rely on the shared infrastructure.
### What v8.5.7 Changed
Looking at the git history:
```
v8.5.7 (ac03901):
- Minor ESM/CommonJS compatibility fix for isMainModule detection
- No queue-related changes
v8.5.6 -> v8.5.7:
- f21ea97 refactor: decompose monolith into modular architecture with comprehensive test suite (#538)
```
The major refactor happened before v8.5.7. The claim-and-delete pattern was already in place.
---
## Timeline of Resolution
Based on git history, the issue was likely resolved through these commits:
1. **b8ce27b** - `feat(queue): Simplify queue processing and enhance reliability`
2. **eb1a78b** - `fix: eliminate duplicate observations by simplifying message queue`
3. **d72a81e** - `Refactor session queue processing and database interactions`
These commits appear to have introduced the claim-and-delete pattern that eliminates the original bug.
---
## Conclusion
**Issue #520 should be closed as resolved.**
The described bug (`claimNextMessage()` only checking `status = 'pending'`) no longer exists because:
1. `claimNextMessage()` was replaced with `claimAndDelete()` which atomically removes messages
2. `resetStuckMessages()` is already called at startup BEFORE `processPendingQueues()`
3. The 'processing' status is now only used for legacy compatibility and edge cases
### No Fix Needed
The proposed fix ("Add `resetStuckMessages(0)` at the start of `processPendingQueues()`") is:
1. **Unnecessary** - recovery already happens in `initializeBackground()` before `processPendingQueues()` is called
2. **Too aggressive** - `resetStuckMessages(0)` would immediately reset ALL 'processing' messages, which could cause problems if invoked during normal operation rather than only at startup
The current implementation with a 5-minute threshold is more robust: it recovers only truly stuck messages, not messages that are actively being processed.
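The threshold decision can be illustrated in isolation (a minimal sketch; the `claimedAt` field name is an assumption, not the actual schema):

```typescript
interface ProcessingMessage { id: number; claimedAt: number; } // claimedAt: epoch ms (assumed field)

// Only messages claimed longer ago than thresholdMs count as stuck.
// With thresholdMs = 0, every in-flight message would be reset -- the
// over-aggressive behavior the proposed fix would introduce.
function findStuckIds(messages: ProcessingMessage[], thresholdMs: number, now: number): number[] {
  return messages.filter(m => now - m.claimedAt > thresholdMs).map(m => m.id);
}
```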
---
## Appendix: File References
| Component | File | Key Lines |
|-----------|------|-----------|
| claimAndDelete | `src/services/sqlite/PendingMessageStore.ts` | 85-104 |
| Queue Iterator | `src/services/queue/SessionQueueProcessor.ts` | 18-38 |
| Startup Recovery | `src/services/worker-service.ts` | 236-242 |
| processPendingQueues | `src/services/worker-service.ts` | 326-375 |
| getSessionsWithPendingMessages | `src/services/sqlite/PendingMessageStore.ts` | 319-326 |
| resetStuckMessages | `src/services/sqlite/PendingMessageStore.ts` | 279-290 |

# Issue #527: uv Detection Fails on Apple Silicon Macs with Homebrew Installation
**Date**: 2026-01-04
**Issue**: GitHub Issue #527
**Status**: Confirmed - Fix Required
## Summary
The `isUvInstalled()` function fails to detect uv when installed via Homebrew on Apple Silicon Macs because it does not check the `/opt/homebrew/bin/uv` path.
## Analysis
### Files Affected
Two copies of `smart-install.js` exist in the codebase:
1. **Source file**: `/Users/alexnewman/Scripts/claude-mem/scripts/smart-install.js`
2. **Built/deployed file**: `/Users/alexnewman/Scripts/claude-mem/plugin/scripts/smart-install.js`
### Current uv Path Detection
**Source file (`scripts/smart-install.js`)** - Lines 22-24:
```javascript
const UV_COMMON_PATHS = IS_WINDOWS
? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
**Plugin file (`plugin/scripts/smart-install.js`)** - Lines 103-105:
```javascript
const uvPaths = IS_WINDOWS
? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
### Paths Currently Checked (Unix/macOS)
| Path | Installer | Architecture |
|------|-----------|--------------|
| `~/.local/bin/uv` | Official installer | Any |
| `~/.cargo/bin/uv` | Cargo/Rust install | Any |
| `/usr/local/bin/uv` | Homebrew (Intel) | x86_64 |
### Missing Path
| Path | Installer | Architecture |
|------|-----------|--------------|
| `/opt/homebrew/bin/uv` | Homebrew (Apple Silicon) | arm64 |
## Root Cause
Homebrew installs to different prefixes depending on architecture:
- **Intel Macs (x86_64)**: `/usr/local/bin/`
- **Apple Silicon Macs (arm64)**: `/opt/homebrew/bin/`
The current implementation only includes the Intel Homebrew path, causing detection to fail on Apple Silicon when:
1. uv is installed via `brew install uv`
2. The user's shell PATH is not available during script execution (common in non-interactive contexts)
## Impact
Users on Apple Silicon Macs who installed uv via Homebrew will:
1. See "uv not found" errors
2. Have uv unnecessarily reinstalled via the official installer
3. End up with duplicate installations
## Recommended Fix
Add `/opt/homebrew/bin/uv` to the Unix paths array.
### Source file (`scripts/smart-install.js`) - Line 24
**Before:**
```javascript
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
**After:**
```javascript
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv', '/opt/homebrew/bin/uv'];
```
### Plugin file (`plugin/scripts/smart-install.js`) - Lines 103-105 and 222-224
The same fix should be applied in both locations where `uvPaths` is defined:
- Line 105 in `isUvInstalled()`
- Line 224 in `installUv()`
### Note: Bun Has the Same Issue
The Bun detection has the same gap:
**Current (`scripts/smart-install.js` line 20):**
```javascript
: [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun'];
```
**Should also add:**
```javascript
: [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun', '/opt/homebrew/bin/bun'];
```
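The probe order can be sketched with the existence check injected, so the fix is testable without a real filesystem (helper name and literal home directory are illustrative; the real code uses `join(homedir(), ...)`):

```typescript
// Candidate uv locations on Unix/macOS, with the Apple Silicon Homebrew
// path appended as recommended above.
const UNIX_UV_PATHS = [
  "/Users/example/.local/bin/uv",  // official installer (illustrative home dir)
  "/Users/example/.cargo/bin/uv",  // cargo install
  "/usr/local/bin/uv",             // Homebrew on Intel
  "/opt/homebrew/bin/uv",          // Homebrew on Apple Silicon (the missing entry)
];

// Returns the first path that exists, or null if none do.
function findFirstExisting(paths: string[], exists: (p: string) => boolean): string | null {
  for (const p of paths) {
    if (exists(p)) return p;
  }
  return null;
}
```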
## Verification
After the fix, verify by:
1. Installing uv via Homebrew on an Apple Silicon Mac
2. Running the smart-install script
3. Confirming uv is detected without attempting reinstallation
## Conclusion
**Fix is required.** The `/opt/homebrew/bin/uv` path is missing from both files. This is a simple one-line addition to the path arrays. The same fix should also be applied to Bun detection paths for consistency.

# Issue #532: Memory Leak in SessionManager - Analysis Report
**Date**: 2026-01-04
**Issue**: Memory leak causing 54GB+ VS Code memory consumption after several days of use
**Reported Root Causes**:
1. Sessions never auto-cleanup after SDK agent completes
2. `conversationHistory` array grows unbounded (never trimmed)
---
## Executive Summary
This analysis confirms **both issues exist in the current codebase** (v8.5.7). While v8.5.7 included a major modular refactor, it did **not address either memory leak issue**. The `SessionManager` holds sessions indefinitely in memory with no TTL/cleanup mechanism, and `conversationHistory` arrays grow unbounded within each session (with only OpenRouter implementing partial mitigation).
---
## 1. SessionManager Session Storage Analysis
### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`
### Current Implementation
```typescript
export class SessionManager {
private sessions: Map<number, ActiveSession> = new Map();
private sessionQueues: Map<number, EventEmitter> = new Map();
// ...
}
```
Sessions are stored in an in-memory `Map<number, ActiveSession>` with the session database ID as the key.
### Session Lifecycle
| Event | Method | Behavior |
|-------|--------|----------|
| Session created | `initializeSession()` | Added to `this.sessions` Map (line 152) |
| Session deleted | `deleteSession()` | Removed from `this.sessions` Map (line 293) |
| Worker shutdown | `shutdownAll()` | Calls `deleteSession()` on all sessions |
### The Problem: No Automatic Cleanup
Looking at `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts` (lines 213-216), the session completion handling has this comment:
```typescript
// NOTE: We do NOT delete the session here anymore.
// The generator waits for events, so if it exited, it's either aborted or crashed.
// Idle sessions stay in memory (ActiveSession is small) to listen for future events.
```
**Critical Finding**: Sessions are **intentionally never deleted** after the SDK agent completes. They persist indefinitely "to listen for future events."
### When Sessions ARE Deleted
Sessions are only deleted when:
1. Explicit `DELETE /sessions/:sessionDbId` HTTP request (manual cleanup)
2. `POST /sessions/:sessionDbId/complete` HTTP request (cleanup-hook callback)
3. Worker service shutdown (`shutdownAll()`)
There is **NO automatic cleanup mechanism** based on:
- Session age/TTL
- Session inactivity timeout
- Memory pressure
- Completed/failed status
---
## 2. conversationHistory Analysis
### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker-types.ts` (line 34)
### Type Definition
```typescript
export interface ActiveSession {
// ...
conversationHistory: ConversationMessage[]; // Shared conversation history for provider switching
// ...
}
```
### Usage Pattern
The `conversationHistory` array is populated by three agent implementations:
1. **SDKAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`)
- Adds user messages at lines 247, 280, 302
- Assistant responses added via `ResponseProcessor`
2. **GeminiAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`)
- Adds user messages at lines 143, 196, 232
- Adds assistant responses at lines 148, 202, 238
3. **OpenRouterAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/OpenRouterAgent.ts`)
- Adds user messages at lines 103, 155, 191
- Adds assistant responses at lines 108, 161, 197
- **Implements truncation**: See `truncateHistory()` at lines 262-301
4. **ResponseProcessor** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`)
- Adds assistant responses at line 57
### The Problem: Unbounded Growth
**For Claude SDK and Gemini agents**, there is **no limit or trimming** of `conversationHistory`. Every message is `push()`ed without checking array size.
**OpenRouter ONLY** has mitigation via `truncateHistory()`:
```typescript
private truncateHistory(history: ConversationMessage[]): ConversationMessage[] {
const MAX_CONTEXT_MESSAGES = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES) || 20;
const MAX_ESTIMATED_TOKENS = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_TOKENS) || 100000;
// Sliding window: keep most recent messages within limits
// ...
}
```
However, this only truncates the copy sent to the OpenRouter API - it does NOT truncate the actual `session.conversationHistory` array, which still grows unbounded.
### Memory Impact Calculation
Each `ConversationMessage` contains:
- `role`: 'user' | 'assistant' (small string)
- `content`: string (can be very large - full prompts/responses)
A typical session with 100 tool uses could have:
- 1 init prompt (~2KB)
- 100 observation prompts (~5KB each = 500KB)
- 100 responses (~1KB each = 100KB)
- 1 summary prompt + response (~5KB)
**Per session**: ~600KB in `conversationHistory` alone
After several days with many sessions, this adds up to gigabytes.
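As a sanity check on those figures:

```typescript
// Per-session conversationHistory estimate, in KB, from the figures above.
const initPromptKB = 2;
const observationPromptsKB = 100 * 5; // 100 observation prompts at ~5 KB each
const responsesKB = 100 * 1;          // 100 responses at ~1 KB each
const summaryKB = 5;                  // summary prompt + response
const perSessionKB = initPromptKB + observationPromptsKB + responsesKB + summaryKB; // ~607 KB
// 100 such sessions retained in memory would already hold ~60 MB of history.
```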
---
## 3. v8.5.7 Refactor Assessment
The v8.5.7 release (2026-01-04) focused on modular architecture refactoring:
### What v8.5.7 DID:
- Extracted SQLite repositories into `/src/services/sqlite/`
- Extracted worker agents into `/src/services/worker/agents/`
- Extracted search strategies into `/src/services/worker/search/`
- Extracted context generation into `/src/services/context/`
- Extracted infrastructure into `/src/services/infrastructure/`
- Added 595 tests across 36 test files
### What v8.5.7 DID NOT address:
- No session TTL or automatic cleanup mechanism
- No `conversationHistory` size limits for Claude SDK or Gemini
- No memory pressure monitoring for sessions
- The "sessions stay in memory" design comment was already present
**Relevant v8.5.2 Note**: There was a related fix for SDK Agent child process memory leak (orphaned Claude processes), but that addressed process cleanup, not in-memory session state.
---
## 4. Specific Code Locations Requiring Fixes
### Fix Location 1: SessionManager needs cleanup mechanism
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`
Add automatic session cleanup based on:
- Session completion (when generator finishes and no pending work)
- Session age TTL (e.g., 1 hour after last activity)
- Memory pressure (configurable max sessions)
### Fix Location 2: conversationHistory needs bounds
**Files**:
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`
Apply sliding-window truncation similar to OpenRouterAgent's approach, but mutate the original array in place.
### Fix Location 3: Session cleanup on completion
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts`
Remove the design decision to keep idle sessions in memory. Add cleanup timer after generator completes.
---
## 5. Recommended Fixes
### Fix 1: Add Session TTL and Cleanup Timer
```typescript
// In SessionManager.ts
private readonly SESSION_TTL_MS = 60 * 60 * 1000; // 1 hour
private cleanupTimers: Map<number, NodeJS.Timeout> = new Map();
/**
* Schedule automatic cleanup for idle sessions
*/
scheduleSessionCleanup(sessionDbId: number): void {
// Clear existing timer if any
const existingTimer = this.cleanupTimers.get(sessionDbId);
if (existingTimer) {
clearTimeout(existingTimer);
}
// Schedule cleanup after TTL
const timer = setTimeout(() => {
const session = this.sessions.get(sessionDbId);
if (session && !session.generatorPromise) {
// Only delete if no active generator
this.deleteSession(sessionDbId);
logger.info('SESSION', 'Session auto-cleaned due to TTL', { sessionDbId });
}
}, this.SESSION_TTL_MS);
this.cleanupTimers.set(sessionDbId, timer);
}
/**
* Cancel cleanup timer (call when session receives new work)
*/
cancelSessionCleanup(sessionDbId: number): void {
const timer = this.cleanupTimers.get(sessionDbId);
if (timer) {
clearTimeout(timer);
this.cleanupTimers.delete(sessionDbId);
}
}
```
### Fix 2: Add conversationHistory Bounds
```typescript
// In src/services/worker/SessionManager.ts or new utility file
const MAX_CONVERSATION_HISTORY_LENGTH = 50; // Configurable
/**
* Trim conversation history to prevent unbounded growth
* Keeps the most recent messages
*/
export function trimConversationHistory(session: ActiveSession): void {
if (session.conversationHistory.length > MAX_CONVERSATION_HISTORY_LENGTH) {
const toRemove = session.conversationHistory.length - MAX_CONVERSATION_HISTORY_LENGTH;
session.conversationHistory.splice(0, toRemove);
logger.debug('SESSION', 'Trimmed conversation history', {
sessionDbId: session.sessionDbId,
removed: toRemove,
remaining: session.conversationHistory.length
});
}
}
```
Then call this after each message is added in SDKAgent, GeminiAgent, and ResponseProcessor.
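A hypothetical call site in an agent (the `addAssistantResponse` helper and the minimal types are assumptions for illustration, not actual agent code):

```typescript
interface ConversationMessage { role: 'user' | 'assistant'; content: string; }
interface ActiveSession { sessionDbId: number; conversationHistory: ConversationMessage[]; }

const MAX_CONVERSATION_HISTORY_LENGTH = 50;

// In-place trim, as in Fix 2: drops the oldest messages beyond the cap.
function trimConversationHistory(session: ActiveSession): void {
  const excess = session.conversationHistory.length - MAX_CONVERSATION_HISTORY_LENGTH;
  if (excess > 0) session.conversationHistory.splice(0, excess);
}

// At each point where an agent appends a message:
function addAssistantResponse(session: ActiveSession, content: string): void {
  session.conversationHistory.push({ role: 'assistant', content });
  trimConversationHistory(session); // bound growth immediately after every append
}

const session: ActiveSession = { sessionDbId: 1, conversationHistory: [] };
for (let i = 0; i < 200; i++) addAssistantResponse(session, `response ${i}`);
console.log(session.conversationHistory.length); // 50
```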
### Fix 3: Update SessionRoutes Generator Completion
```typescript
// In SessionRoutes.ts, update the finally block (around line 164)
.finally(() => {
const sessionDbId = session.sessionDbId;
const wasAborted = session.abortController.signal.aborted;
if (wasAborted) {
logger.info('SESSION', `Generator aborted`, { sessionId: sessionDbId });
} else {
logger.info('SESSION', `Generator completed naturally`, { sessionId: sessionDbId });
}
session.generatorPromise = null;
session.currentProvider = null;
this.workerService.broadcastProcessingStatus();
// Check for pending work
const pendingStore = this.sessionManager.getPendingMessageStore();
const pendingCount = pendingStore.getPendingCount(sessionDbId);
if (pendingCount > 0 && !wasAborted) {
// Restart for pending work
// ... existing restart logic ...
} else {
// No pending work - schedule cleanup instead of keeping forever
this.sessionManager.scheduleSessionCleanup(sessionDbId);
}
});
```
---
## 6. Configuration Recommendations
Add these to `settings.json` defaults:
```json
{
"CLAUDE_MEM_SESSION_TTL_MINUTES": 60,
"CLAUDE_MEM_MAX_CONVERSATION_HISTORY": 50,
"CLAUDE_MEM_MAX_ACTIVE_SESSIONS": 100
}
```
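Wiring these settings into the fixes could mirror the parseInt-with-fallback pattern OpenRouterAgent already uses (the `settings` object here is a stand-in for the real settings access):

```typescript
// Hypothetical settings lookup; the real settings object lives elsewhere in the codebase.
const settings: Record<string, string | undefined> = {
  CLAUDE_MEM_SESSION_TTL_MINUTES: '60',
  // CLAUDE_MEM_MAX_CONVERSATION_HISTORY intentionally unset: falls back to default
};

const SESSION_TTL_MS =
  (parseInt(settings.CLAUDE_MEM_SESSION_TTL_MINUTES ?? '') || 60) * 60 * 1000;
const MAX_CONVERSATION_HISTORY =
  parseInt(settings.CLAUDE_MEM_MAX_CONVERSATION_HISTORY ?? '') || 50;
```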
---
## 7. Testing Recommendations
Add tests for:
1. Session cleanup after TTL expires
2. `conversationHistory` trimming at various sizes
3. Memory monitoring under sustained load
4. Cleanup timer cancellation on new work
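Test 2 could look like the following sketch, which re-states Fix 2's helper so the test is self-contained and checks the boundary sizes:

```typescript
const MAX = 50;

// Same in-place sliding-window trim as Fix 2's trimConversationHistory.
function trim(history: { role: string; content: string }[]): void {
  const excess = history.length - MAX;
  if (excess > 0) history.splice(0, excess);
}

let passed = 0;
for (const size of [0, 10, 50, 51, 500]) {
  const history = Array.from({ length: size }, (_, i) => ({ role: 'user', content: `m${i}` }));
  trim(history);
  const expected = Math.min(size, MAX);
  if (history.length !== expected) {
    throw new Error(`size ${size}: got ${history.length}, want ${expected}`);
  }
  passed++;
}
console.log(`${passed} cases passed`); // 5 cases passed
```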
---
## Summary
| Issue | Status in v8.5.7 | Fix Required |
|-------|------------------|--------------|
| Sessions never auto-cleanup | NOT FIXED | Yes - add TTL/cleanup mechanism |
| conversationHistory unbounded | NOT FIXED (except partial OpenRouter mitigation) | Yes - add trimming to all agents |
Both memory leaks are confirmed to exist in the current codebase and require the fixes outlined above.
