feat(docs): Add analysis reports for issues #514, #517, #520, #527, and #532

- Issue #514: Documented analysis of orphaned observer session files, including root cause, evidence, and recommended fixes.
- Issue #517: Analyzed PowerShell escaping issues in cleanupOrphanedProcesses() on Windows, with recommended fixes using WMIC.
- Issue #520: Confirmed resolution of stuck messages issue through architectural changes to a claim-and-delete pattern.
- Issue #527: Identified detection failure of uv on Apple Silicon Macs with Homebrew installation, proposed path updates for detection.
- Issue #532: Analyzed memory leak issues in SessionManager, detailing session cleanup and conversationHistory growth concerns, with recommended fixes.
Author: Alex Newman
Date: 2026-01-04 00:21:22 -05:00
Parent commit: a833cd2099
Commit: bb033b95f1
7 changed files with 1027 additions and 2 deletions

# Issue #514: Orphaned Observer Session Files Analysis
**Date:** January 4, 2026
**Status:** PARTIALLY RESOLVED - Root cause understood; a fix was landed but later reverted
**Original Issue:** 13,000+ orphaned .jsonl session files created over 2 days
---
## Executive Summary
Issue #514 reported that the plugin created 13,000+ orphaned session .jsonl files in `~/.claude/projects/<project>/`. Each file contained only an initialization message with no actual observations. The hypothesis was that `startSessionProcessor()` in startup-recovery created new observer sessions in a loop.
**Current State:** The issue was **fixed in commit 9a7f662** with a deterministic `mem-${contentSessionId}` prefix approach, but this fix was **reverted in commit f9197b5** due to the SDK not accepting custom session IDs. The current code uses a NULL-based initialization pattern that can still create orphaned sessions under certain conditions.
---
## Evidence: Current File Analysis
Filesystem analysis of `~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/`:
| Line Count | Number of Files |
|------------|-----------------|
| 0 lines (empty) | 407 |
| 1 line | **12,562** |
| 2 lines | 3,199 |
| 3+ lines | 3,546 |
| **Total** | **~19,714** |
The 12,562 single-line files are consistent with the issue description: sessions that initialized but never received observations.
Sample single-line file content:
```json
{"type":"queue-operation","operation":"dequeue","timestamp":"2025-12-28T20:41:25.484Z","sessionId":"00081a3b-9485-48a4-89f0-fd4dfccd3ac9"}
```
---
## Root Cause Analysis
### The Problem Chain
1. **Worker startup calls `processPendingQueues()`** (line 281 in worker-service.ts)
2. For each session with pending messages, it calls `initializeSession()` then `startSessionProcessor()`
3. `startSessionProcessor()` invokes `sdkAgent.startSession()` which calls the Claude Agent SDK `query()` function
4. **If `memorySessionId` is NULL**, no `resume` parameter is passed to `query()`
5. **The SDK creates a NEW .jsonl file** for each query call without a resume parameter
6. **If the query aborts before receiving a response** (timeout, crash, abort signal), the `memorySessionId` is never captured
7. On next startup, the cycle repeats - creating yet another orphaned file
### Why Sessions Abort Before Capturing memorySessionId
Looking at `startSessionProcessor()` flow:
```typescript
// worker-service.ts lines 301-321
private startSessionProcessor(session, source) {
session.generatorPromise = this.sdkAgent.startSession(session, this)
.catch(error => { /* error handling */ })
.finally(() => {
session.generatorPromise = null;
this.broadcastProcessingStatus();
});
}
```
And `processPendingQueues()`:
```typescript
// worker-service.ts lines 347-371
for (const sessionDbId of orphanedSessionIds) {
const session = this.sessionManager.initializeSession(sessionDbId);
this.startSessionProcessor(session, 'startup-recovery');
await new Promise(resolve => setTimeout(resolve, 100)); // 100ms delay between sessions
}
```
The problem: Starting 50 sessions rapidly (100ms delay) with pending messages means:
- All 50 SDK queries start nearly simultaneously
- The SDK creates 50 new .jsonl files (since none have memorySessionId yet)
- If any query fails/aborts before the first response, its memorySessionId is never captured
- On next startup, those sessions get new files again
---
## Code Flow: Where .jsonl Files Are Created
The .jsonl files are created by the **Claude Agent SDK** (`@anthropic-ai/claude-agent-sdk`), not by claude-mem directly.
When `query()` is called in SDKAgent.ts:
```typescript
// SDKAgent.ts lines 89-99
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
// Resume with captured memorySessionId (null on first prompt, real ID on subsequent)
...(hasRealMemorySessionId && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
**Key insight:** If `hasRealMemorySessionId` is false (memorySessionId is null), no `resume` parameter is passed. The SDK then generates a new UUID and creates a new file at:
`~/.claude/projects/<dashed-cwd>/<new-uuid>.jsonl`
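The naming convention can be sketched as a small hypothetical helper (the SDK's real implementation is not shown here; the dashed-directory behavior is inferred from the filesystem evidence above):

```typescript
// Hypothetical sketch of the SDK's observed naming convention: the project
// cwd is flattened into a dashed directory name, and the session UUID
// becomes the .jsonl filename. A query() call without `resume` gets a fresh
// UUID, hence a fresh file.
function sessionFilePath(cwd: string, sessionId: string): string {
  const dashedCwd = cwd.replace(/[/\\]/g, "-");
  return `~/.claude/projects/${dashedCwd}/${sessionId}.jsonl`;
}

// Reproduces the directory analyzed in the evidence section:
// "/Users/alexnewman/Scripts/claude-mem" -> "-Users-alexnewman-Scripts-claude-mem"
```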
---
## Fix History
### Commit 9a7f662: The Original Fix (Reverted)
```
fix(sdk): always pass deterministic session ID to prevent orphaned files
Fixes #514 - Excessive observer sessions created during startup-recovery
Root cause: When memorySessionId was null, no `resume` parameter was passed
to the SDK's query(). This caused the SDK to create a NEW session file on
every call. If queries aborted before capturing the SDK's session_id, the
placeholder remained, leading to cascading creation of 13,000+ orphaned files.
Fix:
- Generate deterministic ID `mem-${contentSessionId}` upfront
- Always pass it to `resume` parameter
- Persist immediately to database before query starts
- If SDK returns different ID, capture and use that going forward
```
**This fix was correct in approach** - always passing a resume parameter prevents new file creation.
### Commit f9197b5: The Revert
```
fix(sdk): restore session continuity via robust capture-and-resume strategy
Replaces the deterministic 'mem-' ID approach with a capture-based strategy:
1. Passes 'resume' parameter ONLY when a verified memory session ID exists
2. Captures SDK-generated session ID when it differs from current ID
3. Ensures subsequent prompts resume the correctly captured session ID
This resolves the issue where new sessions were created for every message
due to failure to capture/resume the initial session ID, without introducing
potentially invalid deterministic IDs.
```
**The revert explanation suggests the SDK rejected the `mem-` prefix IDs.**
### Commit 005b0f8: Current NULL-based Pattern
Changed `memory_session_id` initialization from `contentSessionId` (placeholder) to `NULL`:
- Simpler logic: `!!session.memorySessionId` instead of `memorySessionId !== contentSessionId`
- But still creates new files on first prompt of each session
---
## Relationship with Issue #520 (Stuck Messages)
**Issue #520 is related but distinct:**
| Aspect | Issue #514 (Orphaned Files) | Issue #520 (Stuck Messages) |
|--------|-----------------------------|-----------------------------|
| Problem | Too many .jsonl files | Messages never processed |
| Root Cause | SDK creates new file per query without resume | Old claim-process-mark pattern left messages in 'processing' state |
| Status | Partially resolved | **Fully resolved** |
| Fix | Need deterministic resume IDs | Changed to claim-and-delete pattern |
**Connection:** Both issues relate to startup-recovery. Issue #520's fix (claim-and-delete pattern) doesn't create the loop that #514 describes, but #514 can still occur when:
1. Sessions have pending messages
2. Recovery starts the generator
3. Generator aborts before capturing memorySessionId
4. Next startup repeats the cycle
---
## v8.5.7 Status
**v8.5.7 did NOT fully address Issue #514.** The major changes were:
- Modular architecture refactor
- NULL-based initialization pattern
- Comprehensive test coverage
The deterministic `mem-` prefix fix (9a7f662) was reverted before v8.5.7.
---
## Recommended Fix
### Option 1: Reintroduce Deterministic IDs with SDK Validation
```typescript
// SDKAgent.ts - In startSession()
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
// Generate deterministic ID based on database session ID (not UUID-based contentSessionId)
// Format: "mem-<sessionDbId>" is short and unlikely to conflict
const deterministicMemoryId = session.memorySessionId || `mem-${session.sessionDbId}`;
// Always pass resume to prevent orphaned sessions
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
resume: deterministicMemoryId, // ALWAYS pass, even if SDK might reject
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
// Capture whatever ID the SDK actually uses
for await (const message of queryResult) {
if (message.session_id && message.session_id !== session.memorySessionId) {
session.memorySessionId = message.session_id;
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
}
// ... rest of processing
}
}
```
### Option 2: Limit Recovery Scope
Prevent the recovery loop by limiting how many times a session can be recovered:
```typescript
// In processPendingQueues()
for (const sessionDbId of orphanedSessionIds) {
// Check if this session was already recovered recently
const dbSession = this.dbManager.getSessionById(sessionDbId);
const recoveryAttempts = dbSession.recovery_attempts || 0;
if (recoveryAttempts >= 3) {
logger.warn('SYSTEM', 'Session exceeded max recovery attempts, skipping', {
sessionDbId,
recoveryAttempts
});
continue;
}
// Increment recovery counter
this.dbManager.getSessionStore().incrementRecoveryAttempts(sessionDbId);
// ... rest of recovery
}
```
### Option 3: Cleanup Old Files (Mitigation, Not Fix)
Add a cleanup script that removes orphaned .jsonl files:
```bash
# Find files with only 1 line older than 7 days
find ~/.claude/projects/ -name "*.jsonl" -mtime +7 \
-exec sh -c '[ $(wc -l < "$1") -le 1 ] && rm "$1"' _ {} \;
```
---
## Files Involved
| File | Role |
|------|------|
| `src/services/worker-service.ts` | `startSessionProcessor()`, `processPendingQueues()` |
| `src/services/worker/SDKAgent.ts` | `startSession()`, `query()` call with `resume` parameter |
| `src/services/worker/SessionManager.ts` | `initializeSession()`, session lifecycle |
| `src/services/sqlite/sessions/create.ts` | `createSDKSession()`, NULL-based initialization |
| `src/services/sqlite/PendingMessageStore.ts` | `getSessionsWithPendingMessages()` |
---
## Conclusion
Issue #514 was correctly diagnosed. The fix in commit 9a7f662 was the right approach but was reverted because the SDK may not accept arbitrary custom IDs. The current NULL-based pattern (005b0f8) is cleaner but doesn't prevent orphaned files when queries abort before capturing the SDK's session ID.
**Recommendation:** Reintroduce the deterministic ID approach with proper handling of SDK rejections (Option 1). If the SDK rejects the ID and returns a different one, capture and persist that ID immediately. This ensures at most one .jsonl file per database session, even across crashes and restarts.
---
## Appendix: Git Commit References
| Commit | Description |
|--------|-------------|
| 9a7f662 | Original fix: deterministic `mem-` prefix IDs (REVERTED) |
| f9197b5 | Revert: capture-based strategy without deterministic IDs |
| 005b0f8 | NULL-based initialization pattern (current) |
| d72a81e | Queue refactoring (related to #520) |
| eb1a78b | Claim-and-delete pattern (fixes #520) |

# Issue #517 Analysis: Windows PowerShell Escaping in cleanupOrphanedProcesses()
**Date:** 2026-01-04
**Version Analyzed:** 8.5.7
**Status:** NOT FIXED - Issue still present
## Summary
The reported issue involves PowerShell's `$_` variable being interpreted by Bash before PowerShell receives it when running in Git Bash or WSL environments on Windows. This causes `cleanupOrphanedProcesses()` to fail during worker initialization.
## Current State
The `cleanupOrphanedProcesses()` function is located in:
- **File:** `/Users/alexnewman/Scripts/claude-mem/src/services/infrastructure/ProcessManager.ts`
- **Lines:** 164-251
### Problematic Code (Lines 170-172)
```typescript
if (isWindows) {
// Windows: Use PowerShell Get-CimInstance to find chroma-mcp processes
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
const { stdout } = await execAsync(cmd, { timeout: 60000 });
```
The `$_.Name` and `$_.CommandLine` expressions contain `$_`, which is a special variable in both PowerShell and Bash. When this command string is executed via Node.js `child_process.exec()` in a Git Bash or WSL environment, Bash may expand `$_` (its own special variable holding the last argument of the previous command) before the string ever reaches PowerShell, mangling the filter.
### Additional Occurrence (Lines 91-92)
A similar issue exists in `getChildProcesses()`:
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```
## Error Handling Analysis
Both functions have try-catch blocks with non-blocking error handling:
- Line 208-212: `cleanupOrphanedProcesses()` catches errors and logs a warning, then returns
- Line 98-102: `getChildProcesses()` catches errors and logs a warning, returning empty array
While this prevents worker initialization from crashing, it means orphaned process cleanup silently fails on affected Windows environments.
## Recommended Fix
Replace PowerShell commands with WMIC (Windows Management Instrumentation Command-line), which does not use `$_` syntax:
### For cleanupOrphanedProcesses() (Line 171):
**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
```
**Recommended:**
```typescript
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
```
### For getChildProcesses() (Line 91):
**Current:**
```typescript
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```
**Recommended:**
```typescript
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
```
### Implementation Notes
1. WMIC output format differs from PowerShell - parse `ProcessId=12345` format
2. WMIC is deprecated in newer Windows versions but still widely available
3. Alternative: Use PowerShell with proper escaping (`$$_` or `\$_` depending on context)
4. Consider using `powershell -NoProfile -NonInteractive` flags for faster execution
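Since `/format:list` emits key-value lines rather than PowerShell's bare numbers, the caller needs a small parsing step. A hedged sketch (the helper name is hypothetical, not existing code):

```typescript
// Parses WMIC /format:list output, e.g.:
//   ProcessId=12345
//   ProcessId=67890
// Blank lines and unrelated keys are ignored.
function parseWmicProcessIds(stdout: string): number[] {
  return stdout
    .split(/\r?\n/)
    .map(line => line.trim().match(/^ProcessId=(\d+)$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map(m => parseInt(m[1], 10));
}
```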
## Impact Assessment
- **Severity:** Medium - orphaned process cleanup fails silently
- **Scope:** Windows users running in Git Bash, WSL, or mixed shell environments
- **Workaround:** None currently - users must manually kill orphaned chroma-mcp processes
## Files to Modify
1. `/src/services/infrastructure/ProcessManager.ts` (lines 91-92, 171-172)

# Issue #520: Stuck Messages Analysis
**Date:** January 4, 2026
**Status:** RESOLVED - Issue no longer exists in current codebase
**Original Issue:** Messages in 'processing' status never recovered after worker crash
---
## Executive Summary
The issue described in GitHub #520 has been **fully resolved** in the current codebase through a fundamental architectural change. The system now uses a **claim-and-delete** pattern instead of the old **claim-process-mark** pattern, which eliminates the stuck 'processing' state problem entirely.
---
## Original Issue Description
The issue claimed that after a worker crash:
1. `getSessionsWithPendingMessages()` returns sessions with `status IN ('pending', 'processing')`
2. But `claimNextMessage()` only looks for `status = 'pending'`
3. So 'processing' messages are orphaned
**Proposed Fix:** Add `resetStuckMessages(0)` at start of `processPendingQueues()`
---
## Current Code Analysis
### 1. Queue Processing Pattern: Claim-and-Delete
The current architecture uses `claimAndDelete()` instead of `claimNextMessage()`:
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`
```typescript
// Lines 85-104
claimAndDelete(sessionDbId: number): PersistentPendingMessage | null {
const claimTx = this.db.transaction((sessionId: number) => {
const peekStmt = this.db.prepare(`
SELECT * FROM pending_messages
WHERE session_db_id = ? AND status = 'pending'
ORDER BY id ASC
LIMIT 1
`);
const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;
if (msg) {
// Delete immediately - no "processing" state needed
const deleteStmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
deleteStmt.run(msg.id);
}
return msg;
});
return claimTx(sessionDbId) as PersistentPendingMessage | null;
}
```
**Key insight:** Messages are atomically selected and deleted in a single transaction. There is no 'processing' state for messages being actively worked on - they simply don't exist in the database anymore.
### 2. Iterator Uses claimAndDelete
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/queue/SessionQueueProcessor.ts`
```typescript
// Lines 18-38
async *createIterator(sessionDbId: number, signal: AbortSignal): AsyncIterableIterator<PendingMessageWithId> {
while (!signal.aborted) {
try {
// Atomically claim AND DELETE next message from DB
// Message is now in memory only - no "processing" state tracking needed
const persistentMessage = this.store.claimAndDelete(sessionDbId);
if (persistentMessage) {
// Yield the message for processing (it's already deleted from queue)
yield this.toPendingMessageWithId(persistentMessage);
} else {
// Queue empty - wait for wake-up event
await this.waitForMessage(signal);
}
} catch (error) {
// ... error handling
}
}
}
```
### 3. getSessionsWithPendingMessages Still Checks Both States
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/sqlite/PendingMessageStore.ts`
```typescript
// Lines 319-326
getSessionsWithPendingMessages(): number[] {
const stmt = this.db.prepare(`
SELECT DISTINCT session_db_id FROM pending_messages
WHERE status IN ('pending', 'processing')
`);
const results = stmt.all() as { session_db_id: number }[];
return results.map(r => r.session_db_id);
}
```
**This is technically vestigial code** - with the claim-and-delete pattern, messages should never be in 'processing' state. However, it provides backward compatibility and defense-in-depth.
### 4. Startup Recovery Still Exists
**File:** `/Users/alexnewman/Scripts/claude-mem/src/services/worker-service.ts`
```typescript
// Lines 236-242
// Recover stuck messages from previous crashes
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
const STUCK_THRESHOLD_MS = 5 * 60 * 1000;
const resetCount = pendingStore.resetStuckMessages(STUCK_THRESHOLD_MS);
if (resetCount > 0) {
logger.info('SYSTEM', `Recovered ${resetCount} stuck messages from previous session`, { thresholdMinutes: 5 });
}
```
This runs BEFORE `processPendingQueues()` is called (line 281), which addresses the original fix request.
---
## Verification of Issue Status
### Does the Issue Exist?
**NO** - The issue as described no longer exists because:
1. **No 'processing' state during normal operation**: With claim-and-delete, messages go directly from 'pending' to 'deleted'. They never enter a 'processing' state.
2. **Startup recovery handles legacy stuck messages**: Even if 'processing' messages exist (from old code or edge cases), `resetStuckMessages()` is called BEFORE `processPendingQueues()` in `initializeBackground()` (lines 236-241 run before line 281).
3. **Architecture fundamentally changed**: The old `claimNextMessage()` function that only looked for `status = 'pending'` no longer exists. It was replaced with `claimAndDelete()`.
### GeminiAgent and OpenRouterAgent Behavior
Both agents use the same `SessionManager.getMessageIterator()` which calls `SessionQueueProcessor.createIterator()` which uses `claimAndDelete()`. All three agents (SDKAgent, GeminiAgent, OpenRouterAgent) use identical queue processing:
```typescript
// GeminiAgent.ts:174, OpenRouterAgent.ts:134
for await (const message of this.sessionManager.getMessageIterator(session.sessionDbId)) {
// ...
}
```
They do NOT handle recovery differently - they all rely on the shared infrastructure.
### What v8.5.7 Changed
Looking at the git history:
```
v8.5.7 (ac03901):
- Minor ESM/CommonJS compatibility fix for isMainModule detection
- No queue-related changes
v8.5.6 -> v8.5.7:
- f21ea97 refactor: decompose monolith into modular architecture with comprehensive test suite (#538)
```
The major refactor happened before v8.5.7. The claim-and-delete pattern was already in place.
---
## Timeline of Resolution
Based on git history, the issue was likely resolved through these commits:
1. **b8ce27b** - `feat(queue): Simplify queue processing and enhance reliability`
2. **eb1a78b** - `fix: eliminate duplicate observations by simplifying message queue`
3. **d72a81e** - `Refactor session queue processing and database interactions`
These commits appear to have introduced the claim-and-delete pattern that eliminates the original bug.
---
## Conclusion
**Issue #520 should be closed as resolved.**
The described bug (`claimNextMessage()` only checking `status = 'pending'`) no longer exists because:
1. `claimNextMessage()` was replaced with `claimAndDelete()` which atomically removes messages
2. `resetStuckMessages()` is already called at startup BEFORE `processPendingQueues()`
3. The 'processing' status is now only used for legacy compatibility and edge cases
### No Fix Needed
The proposed fix ("Add `resetStuckMessages(0)` at the start of `processPendingQueues()`") is:
1. **Unnecessary** - recovery already happens in `initializeBackground()` before `processPendingQueues()` is called
2. **Too aggressive** - `resetStuckMessages(0)` would immediately reset ALL 'processing' messages, which could cause problems if invoked during normal operation rather than only at startup
The current implementation with a 5-minute threshold is more robust: it recovers only truly stuck messages, not messages that are actively being processed.
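The threshold decision can be illustrated in isolation (a minimal sketch; the `claimedAt` field name is an assumption, not the actual schema):

```typescript
interface ProcessingMessage { id: number; claimedAt: number; } // claimedAt: epoch ms (assumed field)

// Only messages claimed longer ago than thresholdMs count as stuck.
// With thresholdMs = 0, every in-flight message would be reset -- the
// over-aggressive behavior the proposed fix would introduce.
function findStuckIds(messages: ProcessingMessage[], thresholdMs: number, now: number): number[] {
  return messages.filter(m => now - m.claimedAt > thresholdMs).map(m => m.id);
}
```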
---
## Appendix: File References
| Component | File | Key Lines |
|-----------|------|-----------|
| claimAndDelete | `src/services/sqlite/PendingMessageStore.ts` | 85-104 |
| Queue Iterator | `src/services/queue/SessionQueueProcessor.ts` | 18-38 |
| Startup Recovery | `src/services/worker-service.ts` | 236-242 |
| processPendingQueues | `src/services/worker-service.ts` | 326-375 |
| getSessionsWithPendingMessages | `src/services/sqlite/PendingMessageStore.ts` | 319-326 |
| resetStuckMessages | `src/services/sqlite/PendingMessageStore.ts` | 279-290 |

# Issue #527: uv Detection Fails on Apple Silicon Macs with Homebrew Installation
**Date**: 2026-01-04
**Issue**: GitHub Issue #527
**Status**: Confirmed - Fix Required
## Summary
The `isUvInstalled()` function fails to detect uv when installed via Homebrew on Apple Silicon Macs because it does not check the `/opt/homebrew/bin/uv` path.
## Analysis
### Files Affected
Two copies of `smart-install.js` exist in the codebase:
1. **Source file**: `/Users/alexnewman/Scripts/claude-mem/scripts/smart-install.js`
2. **Built/deployed file**: `/Users/alexnewman/Scripts/claude-mem/plugin/scripts/smart-install.js`
### Current uv Path Detection
**Source file (`scripts/smart-install.js`)** - Lines 22-24:
```javascript
const UV_COMMON_PATHS = IS_WINDOWS
? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
**Plugin file (`plugin/scripts/smart-install.js`)** - Lines 103-105:
```javascript
const uvPaths = IS_WINDOWS
? [join(homedir(), '.local', 'bin', 'uv.exe'), join(homedir(), '.cargo', 'bin', 'uv.exe')]
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
### Paths Currently Checked (Unix/macOS)
| Path | Installer | Architecture |
|------|-----------|--------------|
| `~/.local/bin/uv` | Official installer | Any |
| `~/.cargo/bin/uv` | Cargo/Rust install | Any |
| `/usr/local/bin/uv` | Homebrew (Intel) | x86_64 |
### Missing Path
| Path | Installer | Architecture |
|------|-----------|--------------|
| `/opt/homebrew/bin/uv` | Homebrew (Apple Silicon) | arm64 |
## Root Cause
Homebrew installs to different prefixes depending on architecture:
- **Intel Macs (x86_64)**: `/usr/local/bin/`
- **Apple Silicon Macs (arm64)**: `/opt/homebrew/bin/`
The current implementation only includes the Intel Homebrew path, causing detection to fail on Apple Silicon when:
1. uv is installed via `brew install uv`
2. The user's shell PATH is not available during script execution (common in non-interactive contexts)
## Impact
Users on Apple Silicon Macs who installed uv via Homebrew will:
1. See "uv not found" errors
2. Have uv unnecessarily reinstalled via the official installer
3. End up with duplicate installations
## Recommended Fix
Add `/opt/homebrew/bin/uv` to the Unix paths array.
### Source file (`scripts/smart-install.js`) - Line 24
**Before:**
```javascript
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv'];
```
**After:**
```javascript
: [join(homedir(), '.local', 'bin', 'uv'), join(homedir(), '.cargo', 'bin', 'uv'), '/usr/local/bin/uv', '/opt/homebrew/bin/uv'];
```
### Plugin file (`plugin/scripts/smart-install.js`) - Lines 103-105 and 222-224
The same fix should be applied in both locations where `uvPaths` is defined:
- Line 105 in `isUvInstalled()`
- Line 224 in `installUv()`
### Note: Bun Has the Same Issue
The Bun detection has the same gap:
**Current (`scripts/smart-install.js` line 20):**
```javascript
: [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun'];
```
**Should also add:**
```javascript
: [join(homedir(), '.bun', 'bin', 'bun'), '/usr/local/bin/bun', '/opt/homebrew/bin/bun'];
```
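The probe order can be sketched with the existence check injected, so the fix is testable without a real filesystem (helper name and literal home directory are illustrative; the real code uses `join(homedir(), ...)`):

```typescript
// Candidate uv locations on Unix/macOS, with the Apple Silicon Homebrew
// path appended as recommended above.
const UNIX_UV_PATHS = [
  "/Users/example/.local/bin/uv",  // official installer (illustrative home dir)
  "/Users/example/.cargo/bin/uv",  // cargo install
  "/usr/local/bin/uv",             // Homebrew on Intel
  "/opt/homebrew/bin/uv",          // Homebrew on Apple Silicon (the missing entry)
];

// Returns the first path that exists, or null if none do.
function findFirstExisting(paths: string[], exists: (p: string) => boolean): string | null {
  for (const p of paths) {
    if (exists(p)) return p;
  }
  return null;
}
```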
## Verification
After the fix, verify by:
1. Installing uv via Homebrew on an Apple Silicon Mac
2. Running the smart-install script
3. Confirming uv is detected without attempting reinstallation
## Conclusion
**Fix is required.** The `/opt/homebrew/bin/uv` path is missing from both files. This is a simple one-line addition to the path arrays. The same fix should also be applied to Bun detection paths for consistency.

# Issue #532: Memory Leak in SessionManager - Analysis Report
**Date**: 2026-01-04
**Issue**: Memory leak causing 54GB+ VS Code memory consumption after several days of use
**Reported Root Causes**:
1. Sessions never auto-cleanup after SDK agent completes
2. `conversationHistory` array grows unbounded (never trimmed)
---
## Executive Summary
This analysis confirms **both issues exist in the current codebase** (v8.5.7). While v8.5.7 included a major modular refactor, it did **not address either memory leak issue**. The `SessionManager` holds sessions indefinitely in memory with no TTL/cleanup mechanism, and `conversationHistory` arrays grow unbounded within each session (with only OpenRouter implementing partial mitigation).
---
## 1. SessionManager Session Storage Analysis
### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`
### Current Implementation
```typescript
export class SessionManager {
private sessions: Map<number, ActiveSession> = new Map();
private sessionQueues: Map<number, EventEmitter> = new Map();
// ...
}
```
Sessions are stored in an in-memory `Map<number, ActiveSession>` with the session database ID as the key.
### Session Lifecycle
| Event | Method | Behavior |
|-------|--------|----------|
| Session created | `initializeSession()` | Added to `this.sessions` Map (line 152) |
| Session deleted | `deleteSession()` | Removed from `this.sessions` Map (line 293) |
| Worker shutdown | `shutdownAll()` | Calls `deleteSession()` on all sessions |
### The Problem: No Automatic Cleanup
Looking at `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts` (lines 213-216), the session completion handling has this comment:
```typescript
// NOTE: We do NOT delete the session here anymore.
// The generator waits for events, so if it exited, it's either aborted or crashed.
// Idle sessions stay in memory (ActiveSession is small) to listen for future events.
```
**Critical Finding**: Sessions are **intentionally never deleted** after the SDK agent completes. They persist indefinitely "to listen for future events."
### When Sessions ARE Deleted
Sessions are only deleted when:
1. Explicit `DELETE /sessions/:sessionDbId` HTTP request (manual cleanup)
2. `POST /sessions/:sessionDbId/complete` HTTP request (cleanup-hook callback)
3. Worker service shutdown (`shutdownAll()`)
There is **NO automatic cleanup mechanism** based on:
- Session age/TTL
- Session inactivity timeout
- Memory pressure
- Completed/failed status
---
## 2. conversationHistory Analysis
### Location
`/Users/alexnewman/Scripts/claude-mem/src/services/worker-types.ts` (line 34)
### Type Definition
```typescript
export interface ActiveSession {
// ...
conversationHistory: ConversationMessage[]; // Shared conversation history for provider switching
// ...
}
```
### Usage Pattern
The `conversationHistory` array is populated by three agent implementations:
1. **SDKAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`)
- Adds user messages at lines 247, 280, 302
- Assistant responses added via `ResponseProcessor`
2. **GeminiAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`)
- Adds user messages at lines 143, 196, 232
- Adds assistant responses at lines 148, 202, 238
3. **OpenRouterAgent** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/OpenRouterAgent.ts`)
- Adds user messages at lines 103, 155, 191
- Adds assistant responses at lines 108, 161, 197
- **Implements truncation**: See `truncateHistory()` at lines 262-301
4. **ResponseProcessor** (`/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`)
- Adds assistant responses at line 57
### The Problem: Unbounded Growth
**For Claude SDK and Gemini agents**, there is **no limit or trimming** of `conversationHistory`. Every message is `push()`ed without checking array size.
**OpenRouter ONLY** has mitigation via `truncateHistory()`:
```typescript
private truncateHistory(history: ConversationMessage[]): ConversationMessage[] {
const MAX_CONTEXT_MESSAGES = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES) || 20;
const MAX_ESTIMATED_TOKENS = parseInt(settings.CLAUDE_MEM_OPENROUTER_MAX_TOKENS) || 100000;
// Sliding window: keep most recent messages within limits
// ...
}
```
However, this only truncates the copy sent to the OpenRouter API - it does NOT truncate the actual `session.conversationHistory` array, which still grows unbounded.
### Memory Impact Calculation
Each `ConversationMessage` contains:
- `role`: 'user' | 'assistant' (small string)
- `content`: string (can be very large - full prompts/responses)
A typical session with 100 tool uses could have:
- 1 init prompt (~2KB)
- 100 observation prompts (~5KB each = 500KB)
- 100 responses (~1KB each = 100KB)
- 1 summary prompt + response (~5KB)
**Per session**: ~600KB in `conversationHistory` alone
After several days with many sessions, this adds up to gigabytes.
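As a sanity check on those figures:

```typescript
// Per-session conversationHistory estimate, in KB, from the figures above.
const initPromptKB = 2;
const observationPromptsKB = 100 * 5; // 100 observation prompts at ~5 KB each
const responsesKB = 100 * 1;          // 100 responses at ~1 KB each
const summaryKB = 5;                  // summary prompt + response
const perSessionKB = initPromptKB + observationPromptsKB + responsesKB + summaryKB; // ~607 KB
// 100 such sessions retained in memory would already hold ~60 MB of history.
```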
---
## 3. v8.5.7 Refactor Assessment
The v8.5.7 release (2026-01-04) focused on modular architecture refactoring:
### What v8.5.7 DID:
- Extracted SQLite repositories into `/src/services/sqlite/`
- Extracted worker agents into `/src/services/worker/agents/`
- Extracted search strategies into `/src/services/worker/search/`
- Extracted context generation into `/src/services/context/`
- Extracted infrastructure into `/src/services/infrastructure/`
- Added 595 tests across 36 test files
### What v8.5.7 DID NOT address:
- No session TTL or automatic cleanup mechanism
- No `conversationHistory` size limits for Claude SDK or Gemini
- No memory pressure monitoring for sessions
- The "sessions stay in memory" design comment was already present
**Relevant v8.5.2 Note**: There was a related fix for SDK Agent child process memory leak (orphaned Claude processes), but that addressed process cleanup, not in-memory session state.
---
## 4. Specific Code Locations Requiring Fixes
### Fix Location 1: SessionManager needs cleanup mechanism
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SessionManager.ts`
Add automatic session cleanup based on:
- Session completion (when generator finishes and no pending work)
- Session age TTL (e.g., 1 hour after last activity)
- Memory pressure (configurable max sessions)
### Fix Location 2: conversationHistory needs bounds
**Files**:
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/SDKAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/GeminiAgent.ts`
- `/Users/alexnewman/Scripts/claude-mem/src/services/worker/agents/ResponseProcessor.ts`
Apply sliding-window truncation similar to OpenRouterAgent's approach, but mutate the original array in place.
### Fix Location 3: Session cleanup on completion
**File**: `/Users/alexnewman/Scripts/claude-mem/src/services/worker/http/routes/SessionRoutes.ts`
Remove the design decision to keep idle sessions in memory. Add cleanup timer after generator completes.
---
## 5. Recommended Fixes
### Fix 1: Add Session TTL and Cleanup Timer
```typescript
// In SessionManager.ts
private readonly SESSION_TTL_MS = 60 * 60 * 1000; // 1 hour
private cleanupTimers: Map<number, NodeJS.Timeout> = new Map();
/**
* Schedule automatic cleanup for idle sessions
*/
scheduleSessionCleanup(sessionDbId: number): void {
// Clear existing timer if any
const existingTimer = this.cleanupTimers.get(sessionDbId);
if (existingTimer) {
clearTimeout(existingTimer);
}
// Schedule cleanup after TTL
const timer = setTimeout(() => {
const session = this.sessions.get(sessionDbId);
if (session && !session.generatorPromise) {
// Only delete if no active generator
this.deleteSession(sessionDbId);
logger.info('SESSION', 'Session auto-cleaned due to TTL', { sessionDbId });
}
}, this.SESSION_TTL_MS);
this.cleanupTimers.set(sessionDbId, timer);
}
/**
* Cancel cleanup timer (call when session receives new work)
*/
cancelSessionCleanup(sessionDbId: number): void {
const timer = this.cleanupTimers.get(sessionDbId);
if (timer) {
clearTimeout(timer);
this.cleanupTimers.delete(sessionDbId);
}
}
```
### Fix 2: Add conversationHistory Bounds
```typescript
// In src/services/worker/SessionManager.ts or new utility file
const MAX_CONVERSATION_HISTORY_LENGTH = 50; // Configurable
/**
* Trim conversation history to prevent unbounded growth
* Keeps the most recent messages
*/
export function trimConversationHistory(session: ActiveSession): void {
if (session.conversationHistory.length > MAX_CONVERSATION_HISTORY_LENGTH) {
const toRemove = session.conversationHistory.length - MAX_CONVERSATION_HISTORY_LENGTH;
session.conversationHistory.splice(0, toRemove);
logger.debug('SESSION', 'Trimmed conversation history', {
sessionDbId: session.sessionDbId,
removed: toRemove,
remaining: session.conversationHistory.length
});
}
}
```
Then call this after each message is added in SDKAgent, GeminiAgent, and ResponseProcessor.
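A hypothetical call site in an agent (the `addAssistantResponse` helper and the minimal types are assumptions for illustration, not actual agent code):

```typescript
interface ConversationMessage { role: 'user' | 'assistant'; content: string; }
interface ActiveSession { sessionDbId: number; conversationHistory: ConversationMessage[]; }

const MAX_CONVERSATION_HISTORY_LENGTH = 50;

// In-place trim, as in Fix 2: drops the oldest messages beyond the cap.
function trimConversationHistory(session: ActiveSession): void {
  const excess = session.conversationHistory.length - MAX_CONVERSATION_HISTORY_LENGTH;
  if (excess > 0) session.conversationHistory.splice(0, excess);
}

// At each point where an agent appends a message:
function addAssistantResponse(session: ActiveSession, content: string): void {
  session.conversationHistory.push({ role: 'assistant', content });
  trimConversationHistory(session); // bound growth immediately after every append
}

const session: ActiveSession = { sessionDbId: 1, conversationHistory: [] };
for (let i = 0; i < 200; i++) addAssistantResponse(session, `response ${i}`);
console.log(session.conversationHistory.length); // 50
```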
### Fix 3: Update SessionRoutes Generator Completion
```typescript
// In SessionRoutes.ts, update the finally block (around line 164)
.finally(() => {
const sessionDbId = session.sessionDbId;
const wasAborted = session.abortController.signal.aborted;
if (wasAborted) {
logger.info('SESSION', `Generator aborted`, { sessionId: sessionDbId });
} else {
logger.info('SESSION', `Generator completed naturally`, { sessionId: sessionDbId });
}
session.generatorPromise = null;
session.currentProvider = null;
this.workerService.broadcastProcessingStatus();
// Check for pending work
const pendingStore = this.sessionManager.getPendingMessageStore();
const pendingCount = pendingStore.getPendingCount(sessionDbId);
if (pendingCount > 0 && !wasAborted) {
// Restart for pending work
// ... existing restart logic ...
} else {
// No pending work - schedule cleanup instead of keeping forever
this.sessionManager.scheduleSessionCleanup(sessionDbId);
}
});
```
---
## 6. Configuration Recommendations
Add these to `settings.json` defaults:
```json
{
"CLAUDE_MEM_SESSION_TTL_MINUTES": 60,
"CLAUDE_MEM_MAX_CONVERSATION_HISTORY": 50,
"CLAUDE_MEM_MAX_ACTIVE_SESSIONS": 100
}
```
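Wiring these settings into the fixes could mirror the parseInt-with-fallback pattern OpenRouterAgent already uses (the `settings` object here is a stand-in for the real settings access):

```typescript
// Hypothetical settings lookup; the real settings object lives elsewhere in the codebase.
const settings: Record<string, string | undefined> = {
  CLAUDE_MEM_SESSION_TTL_MINUTES: '60',
  // CLAUDE_MEM_MAX_CONVERSATION_HISTORY intentionally unset: falls back to default
};

const SESSION_TTL_MS =
  (parseInt(settings.CLAUDE_MEM_SESSION_TTL_MINUTES ?? '') || 60) * 60 * 1000;
const MAX_CONVERSATION_HISTORY =
  parseInt(settings.CLAUDE_MEM_MAX_CONVERSATION_HISTORY ?? '') || 50;
```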
---
## 7. Testing Recommendations
Add tests for:
1. Session cleanup after TTL expires
2. `conversationHistory` trimming at various sizes
3. Memory monitoring under sustained load
4. Cleanup timer cancellation on new work
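Test 2 could look like the following sketch, which re-states Fix 2's helper so the test is self-contained and checks the boundary sizes:

```typescript
const MAX = 50;

// Same in-place sliding-window trim as Fix 2's trimConversationHistory.
function trim(history: { role: string; content: string }[]): void {
  const excess = history.length - MAX;
  if (excess > 0) history.splice(0, excess);
}

let passed = 0;
for (const size of [0, 10, 50, 51, 500]) {
  const history = Array.from({ length: size }, (_, i) => ({ role: 'user', content: `m${i}` }));
  trim(history);
  const expected = Math.min(size, MAX);
  if (history.length !== expected) {
    throw new Error(`size ${size}: got ${history.length}, want ${expected}`);
  }
  passed++;
}
console.log(`${passed} cases passed`); // 5 cases passed
```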
---
## Summary
| Issue | Status in v8.5.7 | Fix Required |
|-------|------------------|--------------|
| Sessions never auto-cleanup | NOT FIXED | Yes - add TTL/cleanup mechanism |
| conversationHistory unbounded | NOT FIXED (except partial OpenRouter mitigation) | Yes - add trimming to all agents |
Both memory leaks are confirmed to exist in the current codebase and require the fixes outlined above.
