fix: Claude Code 2.1.1 compatibility + log-level audit + path validation fixes (#614)

* Refactor CLAUDE.md and related files for December 2025 updates

- Updated CLAUDE.md in src/services/worker with new entries for December 2025, including changes to Search.ts, GeminiAgent.ts, SDKAgent.ts, and SessionManager.ts.
- Revised CLAUDE.md in src/shared to reflect updates and new entries for December 2025, including paths.ts and worker-utils.ts.
- Modified hook-constants.ts to clarify exit codes and their behaviors.
- Added comprehensive hooks reference documentation for Claude Code, detailing usage, events, and examples.
- Created initial CLAUDE.md files in various directories to track recent activity.

* fix: Merge user-message-hook output into context-hook hookSpecificOutput

- Add footer message to additionalContext in context-hook.ts
- Remove user-message-hook from SessionStart hooks array
- Fixes issue where stderr+exit(1) approach was silently discarded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update logs and documentation for recent plugin and worker service changes

- Added detailed logs for worker service activities from Dec 10, 2025 to Jan 7, 2026, including initialization patterns, cleanup confirmations, and diagnostic logging.
- Updated plugin documentation with recent activities, including plugin synchronization and configuration changes from Dec 3, 2025 to Jan 7, 2026.
- Enhanced the context hook and worker service logs to reflect improvements and fixes in the plugin architecture.
- Documented the migration and verification processes for the Claude memory system and its integration with the marketplace.

* Refactor hooks architecture and remove deprecated user-message-hook

- Updated hook configurations in CLAUDE.md and hooks.json to reflect changes in session start behavior.
- Removed user-message-hook functionality as it is no longer utilized in Claude Code 2.1.0; context is now injected silently.
- Enhanced context-hook to handle session context injection without user-visible messages.
- Cleaned up documentation across multiple files to align with the new hook structure and removed references to obsolete hooks.
- Adjusted timing and command execution for hooks to improve performance and reliability.

* fix: Address PR #610 review issues

- Replace USER_MESSAGE_ONLY test with BLOCKING_ERROR test in hook-constants.test.ts
- Standardize Claude Code 2.1.0 note wording across all three documentation files
- Exclude deprecated user-message-hook.ts from logger-usage-standards test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Remove hardcoded fake token counts from context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Address PR #610 review issues by fixing test files, standardizing documentation notes, and verifying code quality improvements.

* fix: Add path validation to CLAUDE.md distribution to prevent invalid directory creation

- Add isValidPathForClaudeMd() function to reject invalid paths:
  - Tilde paths (~) that Node.js doesn't expand
  - URLs (http://, https://)
  - Paths with spaces (likely command text or PR references)
  - Paths with # (GitHub issue/PR references)
  - Relative paths that escape project boundary

- Integrate validation in updateFolderClaudeMdFiles loop
- Add 6 unit tests for path validation
- Update .gitignore to prevent accidental commit of malformed directories
- Clean up existing invalid directories (~/, PR #610..., git diff..., https:)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: Implement path validation in CLAUDE.md generation to prevent invalid directory creation

- Added `isValidPathForClaudeMd()` function to validate file paths in `src/utils/claude-md-utils.ts`.
- Integrated path validation in `updateFolderClaudeMdFiles` to skip invalid paths.
- Added 6 new unit tests in `tests/utils/claude-md-utils.test.ts` to cover various rejection cases.
- Updated `.gitignore` to prevent tracking of invalid directories.
- Cleaned up existing invalid directories in the repository.

* feat: Promote critical WARN logs to ERROR level across codebase

Comprehensive log-level audit promoting 38+ WARN messages to ERROR for
improved debugging and incident response:

- Parser: observation type errors, data contamination
- SDK/Agents: empty init responses (Gemini, OpenRouter)
- Worker/Queue: session recovery, auto-recovery failures
- Chroma: sync failures, search failures (now treated as critical)
- SQLite: search failures (primary data store)
- Session/Generator: failures, missing context
- Infrastructure: shutdown, process management failures
- File Operations: CLAUDE.md updates, config reads
- Branch Management: recovery checkout failures

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: Address PR #614 review issues

- Remove incorrectly tracked tilde-prefixed files from git
- Fix absolute path validation to check projectRoot boundaries
- Add test coverage for absolute path validation edge cases

Closes review issues:
- Issue 1: ~/ prefixed files removed from tracking
- Issue 3: Absolute paths now validated against projectRoot
- Issue 4: Added 3 new test cases for absolute path scenarios

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* build assets and context

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-01-07 23:34:20 -05:00
committed by GitHub
parent 687146ce53
commit 2659ec3231
98 changed files with 8927 additions and 3554 deletions

View File

@@ -3,32 +3,5 @@
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
### Dec 29, 2025
**docs**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34316 | 11:02 PM | 🔵 | Mintlify Documentation Structure Identified | ~242 |
### Jan 1, 2026
**SESSION_ID_ARCHITECTURE.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35668 | 11:40 PM | ✅ | Main branch updated with major feature additions | ~377 |
| #35565 | 9:55 PM | ✅ | Session ID architecture documentation created | ~510 |
### Jan 3, 2026
**SESSION_ID_ARCHITECTURE.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36689 | 11:56 PM | 🔵 | PR #538 Review Findings - Modular Architecture Refactor | ~590 |
| #36632 | 10:45 PM | 🔵 | SESSION_ID_ARCHITECTURE.md Documents Placeholder Pattern With ContentSessionId | ~584 |
| #36625 | 10:44 PM | 🔵 | Documentation and Code Reveal Placeholder Detection Pattern | ~583 |
**anti-pattern-cleanup-plan.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36390 | 8:50 PM | 🔄 | Comprehensive Monolith Refactor with Modular Architecture | ~724 |
*No recent activity*
</claude-mem-context>

View File

@@ -3,111 +3,75 @@
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
### Nov 13, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #7806 | 4:54 PM | 🔵 | PR #101 Enhancement: Continuation Prompt Token Reduction | ~634 |
### Nov 16, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #9976 | 11:35 PM | 🔵 | Endless Mode Architecture Plan Documented | ~661 |
| #9967 | 11:18 PM | ⚖️ | Endless Mode Architecture: Immutable Storage with Ephemeral Transform | ~217 |
### Nov 17, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #10131 | 1:22 AM | 🔵 | Endless Mode Token Economics Analysis Output: Complete Infrastructure Impact | ~542 |
| #10130 | " | ✅ | Integration of Actual Compute Savings Analysis into Main Execution Flow | ~258 |
| #10129 | " | 🔵 | Prompt Caching Economics: User Cost vs. Anthropic Compute Cost Divergence | ~451 |
| #10126 | 1:19 AM | 🔴 | Fix Return Statement Variable Names in playTheTapeThrough Function | ~313 |
| #10125 | " | ✅ | Redesign Timeline Display to Show Fresh/Cached Token Breakdown and Real Dollar Costs | ~501 |
| #10124 | " | ✅ | Replace Estimated Cost Model with Actual Caching-Based Costs in Anthropic Scale Analysis | ~516 |
| #10123 | " | ✅ | Pivot Session Length Comparison Table from Token to Cost Metrics | ~413 |
| #10122 | " | ✅ | Add Dual Reporting: Token Count vs Actual Cost in Comparison Output | ~410 |
| #10121 | 1:18 AM | ✅ | Apply Prompt Caching Cost Model to Endless Mode Calculation Function | ~501 |
| #10120 | " | ✅ | Integrate Prompt Caching Cost Calculations into Without-Endless-Mode Function | ~426 |
| #10119 | " | ✅ | Display Prompt Caching Pricing in Initial Calculator Output | ~297 |
| #10118 | " | ✅ | Add Prompt Caching Pricing Model to Token Economics Calculator | ~316 |
| #10115 | 1:15 AM | 🟣 | Token Economics Calculator for Endless Mode Sessions | ~465 |
| #10013 | 12:13 AM | 🔵 | Duplicate Agent SDK TypeScript Reference Documentation | ~340 |
| #10012 | " | 🔵 | Agent SDK TypeScript API Reference Complete | ~349 |
### Nov 18, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #11738 | 11:51 PM | ⚖️ | Comprehensive Architecture Document Created for Phase 1 | ~868 |
| #11711 | 11:44 PM | 🔵 | Language Model Tool Documentation Index | ~282 |
| #11710 | " | 🔵 | Language Model Tool API Implementation Guide | ~718 |
| #11709 | 11:43 PM | 🔵 | Comprehensive Copilot Extension Implementation Plan | ~624 |
| #11708 | " | 🔵 | VS Code Chat Sample Documentation Unavailable | ~327 |
| #11707 | " | 🔵 | VS Code Language Model API Structure and Capabilities | ~515 |
| #11705 | " | ⚖️ | VS Code Extension Development Planning Phase Initiated | ~327 |
| #11206 | 3:01 PM | 🔵 | mem-search skill architecture and migration details retrieved in full format | ~538 |
### Nov 25, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #15538 | 8:36 PM | 🔵 | Context Document for Landing Page Refinements | ~381 |
| #15314 | 5:04 PM | 🔵 | Endless Mode Documentation Post Retrieved with 156 Lines | ~671 |
### Dec 20, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #31257 | 8:58 PM | ⚖️ | Eight Conflict Detection Hypotheses Evaluated with Simulation Results | ~525 |
| #31256 | " | 🔵 | Supersession vs Conflict Detection Feature Analysis | ~515 |
### Dec 30, 2025
**agent-sdk-v2-docs.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34477 | 2:25 PM | | V2 Upgrade Branch Modifies Four Files with Net Code Reduction | ~328 |
| #34466 | 2:24 PM | 🔵 | Agent SDK V2 Documentation Reveals Correct API Surface | ~399 |
| #34425 | 2:04 PM | 🔵 | Agent SDK V2 API Documentation and Migration Patterns | ~698 |
| #34422 | 2:03 PM | ✅ | Added Agent SDK V2 Documentation Files | ~240 |
| #34419 | 2:02 PM | ✅ | Committed Agent SDK V2 upgrade preparation | ~275 |
| #34520 | 2:34 PM | 🔵 | V2 Example Code Demonstrates All Key Patterns | ~537 |
### Jan 7, 2026
**agent-sdk-v2-example.ts**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34431 | 2:06 PM | 🔵 | V2 SDK Executable Example Code Patterns | ~710 |
| #34428 | 2:05 PM | 🔵 | Agent SDK v2 Example File Reading Requested | ~204 |
| #34411 | 1:55 PM | ⚖️ | Agent SDK V2 Migration Plan Created | ~519 |
| #34410 | " | ⚖️ | Agent SDK V2 Migration Strategy Analysis | ~499 |
| #34406 | 1:54 PM | 🔵 | Comprehensive V2 Migration Analysis Shows Architectural Incompatibility | ~556 |
| #34401 | " | 🔵 | Agent SDK V2 API Design and Patterns | ~435 |
### Dec 31, 2025
**agent-sdk-v2-preview.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34635 | 3:04 PM | 🟣 | Documentation-First Workflow Design Document Created | ~769 |
| #34616 | 2:59 PM | ⚖️ | Documentation-First Workflow Agent Analyzing V2 SDK Examples and Patterns | ~515 |
| #34581 | 2:44 PM | 🔵 | V2 Session API Official Documentation Confirms Clear Separation Pattern | ~446 |
**agent-sdk-v2-examples.ts**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34605 | 2:57 PM | 🔵 | V2 SDK Migration Failure Root Cause: Context Rot and Knowledge Transfer Gaps | ~550 |
| #34583 | 2:44 PM | 🔵 | Executable V2 Examples Demonstrate All Core Patterns With Crystal Clear Code | ~471 |
| #34580 | 2:43 PM | 🔵 | V2 Session API Documentation Shows Clear Send/Receive Pattern Despite Naming Confusion | ~364 |
### Jan 1, 2026
**try-catch-audit-report.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35635 | 11:14 PM | ✅ | Removed Temporary Try-Catch Audit Report | ~224 |
| #35463 | 8:59 PM | 🟣 | Enforceable Anti-Pattern Detection System for Try-Catch Abuse | ~485 |
| #35462 | " | 🟣 | Error handling audit tooling and documentation added | ~271 |
| #35435 | 8:46 PM | ✅ | Try-Catch Audit Report Documented in Markdown | ~781 |
**agent-sdk-v2-examples.ts**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35429 | 6:32 PM | ⚖️ | KISS Principle Applied to SDKAgent.ts Defensive Code | ~322 |
| #35412 | 5:25 PM | 🔵 | Canonical Example Shows Cost Tracking Without Token Usage Checks | ~289 |
| #35406 | 5:24 PM | ⚖️ | Categorized Conditionals as Required Business Logic vs Defensive Code | ~377 |
| #35404 | " | 🔵 | Canonical V2 SDK Patterns Require Message Type Checking | ~313 |
| #35398 | 5:23 PM | 🟣 | Added comprehensive Claude Agent SDK V2 examples | ~231 |
| #35383 | 3:11 PM | 🔵 | Phase 2 code quality review identified 5 critical bugs | ~604 |
| #35382 | 3:08 PM | ✅ | Phase 2 Anti-Pattern Recheck Passed | ~315 |
| #35379 | 3:07 PM | 🔵 | Phase 2 Anti-Pattern Check Found Content Extraction Bug | ~373 |
| #35353 | 3:01 PM | 🔵 | Phase 2 preparation analysis completed | ~502 |
| #35340 | 2:58 PM | 🔵 | Phase 1 Anti-Pattern Check Reveals V1/V2 API Mismatch | ~352 |
| #35330 | 2:54 PM | 🔵 | Claude Agent SDK V2 API Pattern Documentation | ~305 |
| #35329 | 2:53 PM | 🔵 | Agent SDK V2 API patterns and capabilities | ~453 |
| #35292 | 1:23 PM | 🔵 | Agent SDK query() resume parameter: SDK-generated session IDs cannot be predetermined | ~543 |
**sdkagent-removal-list.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35425 | 5:27 PM | ✅ | Reorganized Removal List with DELETE/SIMPLIFY Section Header | ~240 |
| #35424 | " | ✅ | Revised Token Tracking from Delete to Simplify with Hardcoded Zero | ~365 |
| #35416 | 5:25 PM | 🟣 | Created Actionable Removal Checklist for SDKAgent.ts Cleanup | ~431 |
**agent-sdk-v2-preview.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35414 | 5:25 PM | 🔵 | V2 Preview Documentation Shows Direct Result Access Pattern | ~281 |
| #35289 | 1:23 PM | 🔵 | Agent SDK V2 preview documentation: Explicit session lifecycle with createSession and resumeSession | ~523 |
**sdkagent-conditional-logic-CORRECTED.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35409 | 5:24 PM | 🟣 | Created Comprehensive Conditional Logic Removal Report | ~488 |
**dont-be-an-idiot.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35400 | 5:23 PM | 🔄 | Phase 1 SDK V2 migration - Updated imports and added API documentation | ~294 |
| #35347 | 3:01 PM | ✅ | Phase 1 commit completed via Task agent | ~374 |
| #35346 | " | ✅ | Committed Phase 1 of V2 API migration | ~333 |
| #35345 | 3:00 PM | ✅ | Phase 1 Changes Committed to Git | ~338 |
| #35343 | " | ✅ | Phase 1 Git Status Shows Modified Files | ~315 |
| #35334 | 2:55 PM | 🔵 | V2 SDK Migration Preparation Complete | ~368 |
| #35332 | 2:54 PM | 🟣 | SDK V2 API Documentation Created | ~305 |
| #35331 | " | ✅ | Created documentation affirming V2 API stability | ~358 |
### Jan 2, 2026
**try-catch-audit-report.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35703 | 1:01 PM | ⚖️ | Try-Catch as Anti-Pattern: Root Cause Analysis and Documentation | ~363 |
### Jan 6, 2026
**windows-code-evaluation.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38104 | 12:14 AM | 🔵 | Windows Compatibility Issues Documented Across 56 Memory Entries | ~509 |
| #38209 | 7:39 PM | 🔵 | Claude Code Hooks System Architecture and Usage | ~491 |
</claude-mem-context>

View File

@@ -0,0 +1,338 @@
# Get started with Claude Code hooks
> Learn how to customize and extend Claude Code's behavior by registering shell commands
Claude Code hooks are user-defined shell commands that execute at various points
in Claude Code's lifecycle. Hooks provide deterministic control over Claude
Code's behavior, ensuring certain actions always happen rather than relying on
the LLM to choose to run them.
<Tip>
For reference documentation on hooks, see [Hooks reference](/en/hooks).
</Tip>
Example use cases for hooks include:
* **Notifications**: Customize how you get notified when Claude Code is awaiting
your input or permission to run something.
* **Automatic formatting**: Run `prettier` on .ts files, `gofmt` on .go files,
etc. after every file edit.
* **Logging**: Track and count all executed commands for compliance or
debugging.
* **Feedback**: Provide automated feedback when Claude Code produces code that
does not follow your codebase conventions.
* **Custom permissions**: Block modifications to production files or sensitive
directories.
By encoding these rules as hooks rather than prompting instructions, you turn
suggestions into app-level code that executes every time it is expected to run.
<Warning>
You must consider the security implication of hooks as you add them, because hooks run automatically during the agent loop with your current environment's credentials.
For example, malicious hooks code can exfiltrate your data. Always review your hooks implementation before registering them.
For full security best practices, see [Security Considerations](/en/hooks#security-considerations) in the hooks reference documentation.
</Warning>
## Hook Events Overview
Claude Code provides several hook events that run at different points in the
workflow:
* **PreToolUse**: Runs before tool calls (can block them)
* **PermissionRequest**: Runs when a permission dialog is shown (can allow or deny)
* **PostToolUse**: Runs after tool calls complete
* **UserPromptSubmit**: Runs when the user submits a prompt, before Claude processes it
* **Notification**: Runs when Claude Code sends notifications
* **Stop**: Runs when Claude Code finishes responding
* **SubagentStop**: Runs when subagent tasks complete
* **PreCompact**: Runs before Claude Code is about to run a compact operation
* **SessionStart**: Runs when Claude Code starts a new session or resumes an existing session
* **SessionEnd**: Runs when Claude Code session ends
Each event receives different data and can control Claude's behavior in
different ways.
## Quickstart
In this quickstart, you'll add a hook that logs the shell commands that Claude
Code runs.
### Prerequisites
Install `jq` for JSON processing in the command line.
### Step 1: Open hooks configuration
Run the `/hooks` [slash command](/en/slash-commands) and select
the `PreToolUse` hook event.
`PreToolUse` hooks run before tool calls and can block them while providing
Claude feedback on what to do differently.
### Step 2: Add a matcher
Select `+ Add new matcher…` to run your hook only on Bash tool calls.
Type `Bash` for the matcher.
<Note>You can use `*` to match all tools.</Note>
### Step 3: Add the hook
Select `+ Add new hook…` and enter this command:
```bash theme={null}
jq -r '"\(.tool_input.command) - \(.tool_input.description // "No description")"' >> ~/.claude/bash-command-log.txt
```
### Step 4: Save your configuration
For storage location, select `User settings` since you're logging to your home
directory. This hook will then apply to all projects, not just your current
project.
Then press `Esc` until you return to the REPL. Your hook is now registered.
### Step 5: Verify your hook
Run `/hooks` again or check `~/.claude/settings.json` to see your configuration:
```json theme={null}
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "jq -r '\"\\(.tool_input.command) - \\(.tool_input.description // \"No description\")\"' >> ~/.claude/bash-command-log.txt"
}
]
}
]
}
}
```
### Step 6: Test your hook
Ask Claude to run a simple command like `ls` and check your log file:
```bash theme={null}
cat ~/.claude/bash-command-log.txt
```
You should see entries like:
```
ls - Lists files and directories
```
## More Examples
<Note>
For a complete example implementation, see the [bash command validator example](https://github.com/anthropics/claude-code/blob/main/examples/hooks/bash_command_validator_example.py) in our public codebase.
</Note>
### Code Formatting Hook
Automatically format TypeScript files after editing:
```json theme={null}
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "jq -r '.tool_input.file_path' | { read file_path; if echo \"$file_path\" | grep -q '\\.ts$'; then npx prettier --write \"$file_path\"; fi; }"
}
]
}
]
}
}
```
### Markdown Formatting Hook
Automatically fix missing language tags and formatting issues in markdown files:
```json theme={null}
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/markdown_formatter.py"
}
]
}
]
}
}
```
Create `.claude/hooks/markdown_formatter.py` with this content:
````python theme={null}
#!/usr/bin/env python3
"""
Markdown formatter for Claude Code output.
Fixes missing language tags and spacing issues while preserving code content.
"""
import json
import sys
import re
import os
def detect_language(code):
"""Best-effort language detection from code content."""
s = code.strip()
# JSON detection
if re.search(r'^\s*[{\[]', s):
try:
json.loads(s)
return 'json'
except:
pass
# Python detection
if re.search(r'^\s*def\s+\w+\s*\(', s, re.M) or \
re.search(r'^\s*(import|from)\s+\w+', s, re.M):
return 'python'
# JavaScript detection
if re.search(r'\b(function\s+\w+\s*\(|const\s+\w+\s*=)', s) or \
re.search(r'=>|console\.(log|error)', s):
return 'javascript'
# Bash detection
if re.search(r'^#!.*\b(bash|sh)\b', s, re.M) or \
re.search(r'\b(if|then|fi|for|in|do|done)\b', s):
return 'bash'
# SQL detection
if re.search(r'\b(SELECT|INSERT|UPDATE|DELETE|CREATE)\s+', s, re.I):
return 'sql'
return 'text'
def format_markdown(content):
"""Format markdown content with language detection."""
# Fix unlabeled code fences
def add_lang_to_fence(match):
indent, info, body, closing = match.groups()
if not info.strip():
lang = detect_language(body)
return f"{indent}```{lang}\n{body}{closing}\n"
return match.group(0)
fence_pattern = r'(?ms)^([ \t]{0,3})```([^\n]*)\n(.*?)(\n\1```)\s*$'
content = re.sub(fence_pattern, add_lang_to_fence, content)
# Fix excessive blank lines (only outside code fences)
content = re.sub(r'\n{3,}', '\n\n', content)
return content.rstrip() + '\n'
# Main execution
try:
input_data = json.load(sys.stdin)
file_path = input_data.get('tool_input', {}).get('file_path', '')
if not file_path.endswith(('.md', '.mdx')):
sys.exit(0) # Not a markdown file
if os.path.exists(file_path):
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
formatted = format_markdown(content)
if formatted != content:
with open(file_path, 'w', encoding='utf-8') as f:
f.write(formatted)
print(f"✓ Fixed markdown formatting in {file_path}")
except Exception as e:
print(f"Error formatting markdown: {e}", file=sys.stderr)
sys.exit(1)
````
Make the script executable:
```bash theme={null}
chmod +x .claude/hooks/markdown_formatter.py
```
This hook automatically:
* Detects programming languages in unlabeled code blocks
* Adds appropriate language tags for syntax highlighting
* Fixes excessive blank lines while preserving code content
* Only processes markdown files (`.md`, `.mdx`)
### Custom Notification Hook
Get desktop notifications when Claude needs input:
```json theme={null}
{
"hooks": {
"Notification": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "notify-send 'Claude Code' 'Awaiting your input'"
}
]
}
]
}
}
```
### File Protection Hook
Block edits to sensitive files:
```json theme={null}
{
"hooks": {
"PreToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "python3 -c \"import json, sys; data=json.load(sys.stdin); path=data.get('tool_input',{}).get('file_path',''); sys.exit(2 if any(p in path for p in ['.env', 'package-lock.json', '.git/']) else 0)\""
}
]
}
]
}
}
```
## Learn more
* For reference documentation on hooks, see [Hooks reference](/en/hooks).
* For comprehensive security best practices and safety guidelines, see [Security Considerations](/en/hooks#security-considerations) in the hooks reference documentation.
* For troubleshooting steps and debugging techniques, see [Debugging](/en/hooks#debugging) in the hooks reference
documentation.
---
> To find navigation and other pages in this documentation, fetch the llms.txt file at: https://code.claude.com/docs/llms.txt

View File

@@ -93,165 +93,84 @@ npx mintlify dev
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
### Dec 28, 2025
### Nov 18, 2025
**platform-integration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33540 | 10:55 PM | 🔵 | Grep search found mem-search references in internationalized documentation | ~577 |
| #33522 | 10:46 PM | 🔵 | Platform integration documentation describes Worker API architecture | ~351 |
| #11206 | 3:01 PM | 🔵 | mem-search skill architecture and migration details retrieved in full format | ~538 |
### Nov 21, 2025
**docs.json**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33312 | 3:09 PM | | OpenRouter Provider Documentation | ~497 |
| #13221 | 2:01 AM | 🔴 | Fixed broken markdown link to Viewer UI documentation | ~316 |
| #13220 | 2:00 AM | 🔴 | Escaped HTML less-than symbol in universal architecture timeout documentation | ~316 |
| #13216 | 1:54 AM | ✅ | Universal Architecture Added to Navigation | ~330 |
| #13215 | " | 🟣 | Universal AI Memory Architecture Documentation Created | ~732 |
| #13213 | 1:50 AM | 🔵 | Introduction Page Content and Recent v6.0.0 Release | ~495 |
| #13212 | " | 🔵 | Architecture Evolution Documentation Structure | ~408 |
| #13211 | " | 🔵 | Mintlify Documentation Site Configuration | ~430 |
| #13209 | 1:48 AM | 🔵 | Public Documentation Structure and Guidelines | ~383 |
### Nov 25, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #14994 | 2:22 PM | ✅ | Version Channel Section Added to Configuration Documentation | ~301 |
| #14993 | " | ✅ | Beta Features Added to Documentation Navigation | ~188 |
| #14992 | 2:21 PM | 🟣 | Beta Features Documentation Page Created | ~488 |
| #14991 | " | 🔵 | Mintlify Navigation Structure and Documentation Groups | ~394 |
| #14989 | " | 🔵 | Installation Documentation with Quick Start and Verification Steps | ~383 |
| #14988 | " | 🔵 | Configuration Documentation Structure and Environment Variables | ~338 |
### Nov 26, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #16190 | 10:22 PM | 🔵 | RAGTIME Search Retrieved Five Observations About Claude-Mem vs RAG Architecture | ~637 |
### Dec 3, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #19884 | 9:42 PM | 🔵 | Configuration system and environment variables | ~701 |
| #19878 | 9:40 PM | 🔵 | Installation process and system architecture | ~486 |
### Dec 8, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #22335 | 10:26 PM | 🔵 | Mintlify documentation configuration analyzed | ~534 |
| #22311 | 9:47 PM | 🔵 | Comprehensive Hooks Architecture Documentation Review | ~263 |
| #22297 | 9:43 PM | 🔵 | Mintlify Documentation Framework Configuration | ~446 |
| #22294 | " | 🔵 | Documentation Site Structure Located | ~359 |
### Dec 9, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #23179 | 10:44 PM | ✅ | Removed explanatory reasons from tool exclusion documentation | ~297 |
### Dec 15, 2025
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #27038 | 6:02 PM | 🔵 | 95% token reduction claims found only in private experimental documents, not in main public docs | ~513 |
| #27037 | " | 🔵 | Branch switching functionality exists in SettingsRoutes with UI switcher removal intent | ~463 |
| #26986 | 5:24 PM | ✅ | Updated Endless Mode latency warning in beta features documentation | ~299 |
### Dec 29, 2025
**docs.json**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34321 | 11:03 PM | 🔵 | Mintlify Navigation Configuration Defines Expected File Paths | ~330 |
| #34215 | 10:08 PM | 🔵 | Retrieved Detailed Cursor Integration Implementation History | ~676 |
| #34148 | 9:28 PM | 🟣 | Cursor IDE Integration with Cross-Platform Hooks and Documentation | ~514 |
| #34112 | 9:07 PM | 🟣 | Committed Cursor Public Documentation to Repository | ~427 |
| #34108 | 9:06 PM | 🟣 | Added Cursor Integration Section to Documentation Navigation | ~441 |
| #34101 | 9:04 PM | 🔵 | Documentation Navigation Configuration Using Mintlify | ~445 |
| #33938 | 6:27 PM | 🔵 | Relevant CLAUDE.md Context Identified for PR #492 | ~435 |
| #33750 | 12:25 AM | | Documentation Update: Removed Version Number from Architecture Evolution | ~281 |
### Jan 7, 2026
**public**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34318 | 11:03 PM | 🔵 | Mintlify Documentation Files and Configuration Located | ~294 |
**installation.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34102 | 9:04 PM | 🔵 | Current Installation Documentation Targets Claude Code Plugin Users | ~485 |
**progressive-disclosure.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33773 | 12:38 AM | ✅ | Replaced two-tier with three-layer workflow documentation | ~431 |
| #33772 | " | ✅ | Updated progressive disclosure docs with 3-layer MCP workflow | ~399 |
| #33771 | 12:37 AM | 🔵 | Progressive disclosure design rationale documented | ~508 |
| #33770 | " | 🔵 | Progressive disclosure documentation reviewed | ~479 |
| #33715 | 12:18 AM | ✅ | Future enhancements section updated for current API structure | ~308 |
| #33712 | " | ✅ | Progressive disclosure docs updated to reflect 3-layer workflow | ~363 |
| #33702 | 12:09 AM | ⚖️ | Documentation Update Strategy Finalized for MCP Architecture Transition | ~845 |
| #33696 | 12:07 AM | 🔵 | Progressive Disclosure Philosophy Requires Tool Name Updates | ~687 |
| #33685 | 12:04 AM | 🔵 | Progressive Disclosure Philosophy Document References Deprecated Tools | ~543 |
**architecture-evolution.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33763 | 12:27 AM | ✅ | Pull request #480 created for MCP architecture documentation updates | ~423 |
| #33760 | 12:26 AM | ✅ | Major documentation overhaul across 6 files with 908 additions | ~367 |
| #33749 | 12:25 AM | ✅ | Documentation Version Reference Removed | ~282 |
| #33747 | " | ✅ | Removed specific version reference from MCP architecture section | ~279 |
| #33745 | " | ✅ | Documentation Version Reference Simplified | ~224 |
| #33744 | " | ✅ | Removed fabricated version number from architecture documentation | ~319 |
| #33726 | 12:20 AM | 🟣 | v6.5.0 architecture evolution documentation added | ~599 |
| #33700 | 12:08 AM | 🔵 | Architecture Evolution Document Contains Historical MCP Tool References | ~625 |
**troubleshooting.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33722 | 12:19 AM | ✅ | Token limit troubleshooting completely rewritten for 3-layer workflow | ~426 |
| #33720 | " | ✅ | Troubleshooting search query examples updated to current API | ~329 |
| #33717 | " | 🔵 | Troubleshooting docs contain outdated search_observations references | ~333 |
| #33693 | 12:06 AM | 🔵 | Troubleshooting Documentation Contains Deprecated Search Tool Syntax | ~576 |
**introduction.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33687 | 12:05 AM | 🔵 | Introduction Documentation References mem-search Skill | ~426 |
### Dec 31, 2025
**troubleshooting.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34688 | 3:40 PM | 🔵 | Worker Logs Command Usage Across Codebase | ~320 |
### Jan 1, 2026
**context-engineering.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #35459 | 8:57 PM | 🔵 | Existing Coding Standards and Anti-Pattern References in Codebase | ~600 |
### Jan 2, 2026
**architecture-evolution.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36096 | 10:23 PM | 🔵 | Observation API Function Names Located | ~227 |
### Jan 3, 2026
**architecture-evolution.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36353 | 8:42 PM | 🔵 | Multiple observation table definitions found across codebase | ~280 |
### Jan 4, 2026
**hooks-architecture.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36941 | 2:43 AM | 🔵 | Context Injection Header Format | ~220 |
### Jan 5, 2026
**CLAUDE.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38082 | 10:13 PM | ✅ | Merge Conflict Resolution - Kept Feature Branch Versions | ~431 |
**progressive-disclosure.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38069 | 9:51 PM | 🔵 | Progressive Disclosure Philosophy Documentation | ~546 |
**docs.json**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38066 | 9:50 PM | ✅ | v9.0 Documentation Audit Completed with 14 Files Updated | ~547 |
| #38064 | " | ⚖️ | 9.0 Release Documentation Audit Complete - Major Gaps Identified | ~997 |
| #38035 | 9:42 PM | 🔵 | Documentation Navigation Structure for 9.0 Release | ~422 |
**modes.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38060 | 9:49 PM | 🔵 | Modes System Documentation for Workflow and Language Configuration | ~514 |
**configuration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38054 | 9:47 PM | 🔵 | Configuration Documentation Review - Missing Live Context Settings | ~530 |
**platform-integration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38052 | 9:46 PM | 🔵 | Platform Integration Documentation Review | ~525 |
**hooks-architecture.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38045 | 9:45 PM | 🔵 | Hooks Architecture Documentation Review | ~520 |
**introduction.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38044 | 9:44 PM | 🔵 | Introduction Documentation Review for 9.0 Release | ~462 |
**context-engineering.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38043 | 9:44 PM | 🔵 | Context Engineering Documentation Review | ~455 |
**troubleshooting.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37548 | 4:48 PM | ✅ | Issue #543 Analysis Report Created for Slash Command Availability | ~540 |
| #38233 | 7:42 PM | | Renumbered SessionEnd Hook from 6 to 5 | ~315 |
| #38229 | 7:41 PM | ✅ | Renumbered PostToolUse Hook from 4 to 3 | ~278 |
| #38225 | " | ✅ | Updated Hook Count Description in Hooks Architecture Documentation | ~352 |
</claude-mem-context>

View File

@@ -3,147 +3,36 @@
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
### Dec 16, 2025
### Nov 18, 2025
**worker-service.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #28299 | 9:57 PM | | Documentation Updated for Renamed MCP Tools | ~305 |
| #28293 | 9:56 PM | 🔵 | get_batch_observations Usage Across Codebase | ~226 |
| #28292 | " | 🔵 | get_batch_observations Referenced in 4 Files | ~246 |
| #28242 | 9:38 PM | 🔵 | Progressive Description and Batch Observations Usage Sites | ~241 |
| #11206 | 3:01 PM | 🔵 | mem-search skill architecture and migration details retrieved in full format | ~538 |
### Nov 21, 2025
**search-architecture.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #28074 | 8:09 PM | 🔵 | Progressive Disclosure Pattern and Search Implementation | ~558 |
| #28067 | " | 🔵 | Progressive Disclosure and Security Architecture | ~528 |
| #28066 | 8:08 PM | 🔵 | Search Architecture Evolution from MCP to Skill-Based | ~530 |
| #28058 | " | 🔵 | Search Architecture Evolution from MCP Tools to Skill-Based HTTP API | ~528 |
| #13218 | 1:58 AM | 🔴 | Escaped HTML special character in MDX documentation | ~261 |
### Dec 3, 2025
**pm2-to-bun-migration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #27716 | 5:41 PM | | Better-sqlite3 References Removed from Codebase | ~319 |
| #27715 | " | 🔵 | Git Branch Already Created | ~208 |
| #27712 | 5:40 PM | 🟣 | Merged PR #332: API-Based Import/Export Architecture | ~372 |
| #27699 | 5:37 PM | 🔵 | Comprehensive PM2 to Bun migration documentation exists | ~360 |
| #27696 | 5:36 PM | 🔵 | Documentation already reflects better-sqlite3 to bun:sqlite migration | ~390 |
| #27695 | " | 🔵 | Better-sqlite3 references found in documentation | ~201 |
| #27687 | 5:32 PM | 🔴 | Corrected Migration Date in PM2 to Bun Documentation | ~270 |
| #27656 | 5:24 PM | ⚖️ | PM2 to Bun Documentation Migration Plan Created | ~551 |
| #27655 | " | 🟣 | PM2 to Bun Documentation Migration Plan Created | ~455 |
| #27654 | 5:22 PM | 🔵 | Complete PM2 Documentation Audit | ~458 |
| #19891 | 9:43 PM | 🔵 | Seven hook scripts across five lifecycle events | ~713 |
### Dec 17, 2025
### Dec 15, 2025
**pm2-to-bun-migration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #28930 | 7:30 PM | 🔵 | Worker CLI Distribution and Build System | ~275 |
| #27040 | 6:03 PM | 🔵 | Comprehensive search confirms no 95% claims exist in main branch public documentation | ~508 |
| #27037 | 6:02 PM | 🔵 | Branch switching functionality exists in SettingsRoutes with UI switcher removal intent | ~463 |
### Jan 7, 2026
**worker-service.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #28929 | 7:30 PM | 🔵 | ProcessManager Usage Across Codebase | ~319 |
### Dec 18, 2025
**database.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #29815 | 7:33 PM | 🔵 | Database contains multiple session table schemas | ~311 |
### Dec 20, 2025
**worker-service.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #30675 | 5:08 PM | 🔵 | Platform Documentation Across 18 Files | ~335 |
**overview.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #30253 | 3:17 PM | 🔵 | Agent SDK Integration Throughout Codebase | ~402 |
### Dec 24, 2025
**hooks.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #32193 | 7:42 PM | 🔵 | Session completion endpoint usage across codebase | ~278 |
### Dec 25, 2025
**worker-service.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #32654 | 8:51 PM | 🔵 | Identified multiple files related to queue recovery | ~375 |
| #32456 | 5:41 PM | ✅ | Completed merge of main branch into feature/titans-phase1-3 | ~354 |
| #32432 | 3:41 PM | 🟣 | Manual Queue Recovery System with CLI and API | ~531 |
| #32425 | 3:26 PM | ✅ | API Documentation for Manual Recovery Endpoints Added | ~563 |
### Dec 27, 2025
**hooks.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33101 | 7:11 PM | 🔵 | Context Injection API Endpoint Usage Across Hooks | ~358 |
### Dec 28, 2025
**search-architecture.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33540 | 10:55 PM | 🔵 | Grep search found mem-search references in internationalized documentation | ~577 |
### Dec 29, 2025
**search-architecture.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #33769 | 12:37 AM | ✅ | Documentation updated for MCP-based search architecture | ~528 |
| #33767 | 12:34 AM | ✅ | Committed MDX Syntax Fix to updates/docs Branch | ~301 |
| #33766 | " | 🔴 | Fixed MDX Syntax Error in Performance Section | ~293 |
| #33765 | 12:33 AM | 🔵 | Search Architecture Documentation Structure Analysis | ~524 |
| #33763 | 12:27 AM | ✅ | Pull request #480 created for MCP architecture documentation updates | ~423 |
| #33762 | " | ✅ | Architecture shift from skill-based to MCP-based search with 3-layer workflow | ~418 |
| #33760 | 12:26 AM | ✅ | Major documentation overhaul across 6 files with 908 additions | ~367 |
| #33756 | 12:25 AM | ✅ | Documentation Version Reference Removed from Search Architecture | ~257 |
| #33754 | " | ✅ | Removed fabricated version range from skill-based approach comparison | ~318 |
| #33753 | " | ✅ | Documentation Version Number Removed from Architecture Evolution Section | ~233 |
| #33751 | " | 🔵 | Architecture Evolution Documentation Records v6.5.0 Migration | ~227 |
| #33702 | 12:09 AM | ⚖️ | Documentation Update Strategy Finalized for MCP Architecture Transition | ~845 |
| #33698 | 12:07 AM | 🔵 | Search Architecture Documentation Comprehensively Describes Deleted Skill System | ~663 |
| #33680 | 12:03 AM | 🔵 | Search Architecture Documentation Describes Deleted Skill System | ~576 |
### Dec 31, 2025
**pm2-to-bun-migration.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #34688 | 3:40 PM | 🔵 | Worker Logs Command Usage Across Codebase | ~320 |
### Jan 2, 2026
**database.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36096 | 10:23 PM | 🔵 | Observation API Function Names Located | ~227 |
### Jan 3, 2026
**database.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36353 | 8:42 PM | 🔵 | Multiple observation table definitions found across codebase | ~280 |
### Jan 5, 2026
**overview.mdx**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38066 | 9:50 PM | ✅ | v9.0 Documentation Audit Completed with 14 Files Updated | ~547 |
| #38064 | " | ⚖️ | 9.0 Release Documentation Audit Complete - Major Gaps Identified | ~997 |
| #38059 | 9:49 PM | 🔵 | Architecture Overview Documentation Review | ~554 |
| #38221 | 7:41 PM | | Removed User Message Hook Documentation Section | ~339 |
| #38218 | 7:40 PM | ✅ | Updated Hook Configuration Documentation to Match Implementation | ~382 |
| #38212 | " | 🔵 | 5-Stage Hook Lifecycle Architecture for Memory Agent | ~668 |
</claude-mem-context>

View File

@@ -177,7 +177,7 @@ graph TB
| Stage | Hook | Trigger | Purpose |
|-------|------|---------|---------|
| **1. SessionStart** | `context-hook.js` + `user-message-hook.js` | User opens Claude Code | Inject prior context, show UI messages |
| **1. SessionStart** | `context-hook.js` | User opens Claude Code | Inject prior context silently |
| **2. UserPromptSubmit** | `new-hook.js` | User submits a prompt | Create/get session, save prompt, init worker |
| **3. PostToolUse** | `save-hook.js` | Claude uses any tool | Queue observation for AI compression |
| **4. Stop** | `summary-hook.js` | User stops asking questions | Generate session summary |
@@ -194,12 +194,16 @@ Hooks are configured in `plugin/hooks/hooks.json`:
"matcher": "startup|clear|compact",
"hooks": [{
"type": "command",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/smart-install.js",
"timeout": 300
}, {
"type": "command",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/user-message-hook.js",
"timeout": 10
"command": "bun ${CLAUDE_PLUGIN_ROOT}/scripts/worker-service.cjs start",
"timeout": 60
}, {
"type": "command",
"command": "bun ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"timeout": 60
}]
}],
"UserPromptSubmit": [{
@@ -242,8 +246,13 @@ Hooks are configured in `plugin/hooks/hooks.json`:
**Timing**: When user opens Claude Code or resumes session
**Hooks Triggered** (in order):
1. `context-hook.js` - Fetches and injects prior session context
2. `user-message-hook.js` - Displays context info to user via stderr
1. `smart-install.js` - Ensures dependencies are installed
2. `worker-service.cjs start` - Starts the worker service
3. `context-hook.js` - Fetches and silently injects prior session context
<Note>
As of Claude Code 2.1.0 (ultrathink update), SessionStart hooks no longer display user-visible messages. Context is silently injected via `hookSpecificOutput.additionalContext`.
</Note>
### Sequence Diagram
@@ -306,19 +315,6 @@ sequenceDiagram
**Implementation**: `src/hooks/context-hook.ts`
### User Message Hook (`user-message-hook.js`)
**Purpose**: Display helpful user messages during first-time setup or when viewing context.
**Behavior**:
- Shows first-time setup message when `node_modules` is missing
- Displays formatted context information with colors
- Provides tips for using claude-mem effectively
- Shows link to viewer UI (`http://localhost:37777`)
- Uses stderr as communication channel (only output available in Claude Code UI)
**Implementation**: `src/hooks/user-message-hook.ts`
---
## Stage 2: UserPromptSubmit

View File

@@ -22,11 +22,15 @@ Claude-Mem is fundamentally a **hook-driven system**. Every piece of functionali
┌─────────────────────────────────────────────────────────┐
│ CLAUDE-MEM SYSTEM │
│ │
│ Smart Context User New Obs │
│ Install Inject Message Session Capture │
│ Smart Worker Context New Obs │
│ Install Start Inject Session Capture │
└─────────────────────────────────────────────────────────┘
```
<Note>
As of Claude Code 2.1.0 (ultrathink update), SessionStart hooks no longer display user-visible messages. Context is silently injected via `hookSpecificOutput.additionalContext`.
</Note>
**Key insight:** Claude-Mem doesn't interrupt or modify Claude Code's behavior. It observes from the outside and provides value through lifecycle hooks.
---
@@ -68,9 +72,9 @@ Claude Code's hook system provides exactly what we need:
---
## The Six Hook Scripts + Pre-Hook
## The Hook Scripts
Claude-Mem uses 6 lifecycle hook scripts across 5 lifecycle events, plus 1 pre-hook script for dependency management. SessionStart runs 2 hooks in sequence (after the pre-hook script).
Claude-Mem uses lifecycle hook scripts across 5 lifecycle events. SessionStart runs 3 hooks in sequence: smart-install, worker-service start, and context-hook.
### Pre-Hook: Smart Install (Before SessionStart)
@@ -155,56 +159,7 @@ Claude-Mem uses 6 lifecycle hook scripts across 5 lifecycle events, plus 1 pre-h
---
### Hook 2: SessionStart - User Message
**Purpose:** Display helpful user messages during first-time setup
**When:** Claude Code starts (runs after context-hook)
**What it does:**
1. Checks if dependencies are installed
2. Shows first-time setup message if needed
3. Displays formatted context information with colors
4. Shows link to viewer UI (http://localhost:37777)
5. Exits with code 3 (informational, not error)
**Configuration:**
```json
{
"hooks": {
"SessionStart": [{
"matcher": "startup|clear|compact",
"hooks": [{
"type": "command",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/user-message-hook.js",
"timeout": 10
}]
}]
}
}
```
**Output Example:**
```
📝 Claude-Mem Context Loaded
Note: This appears as stderr but is informational only
[Context details with colors...]
📺 Watch live in browser http://localhost:37777/ (New! v5.1)
```
**Key Features:**
- ✅ User-friendly first-time experience
- ✅ Visual context display
- ✅ Links to viewer UI
- ✅ Non-intrusive (exit code 3)
**Source:** `plugin/scripts/user-message-hook.js` (minified)
---
### Hook 3: UserPromptSubmit (New Session Hook)
### Hook 2: UserPromptSubmit (New Session Hook)
**Purpose:** Initialize session tracking when user submits a prompt
@@ -251,7 +206,7 @@ VALUES (?, ?, ?, ...)
---
### Hook 4: PostToolUse (Save Observation Hook)
### Hook 3: PostToolUse (Save Observation Hook)
**Purpose:** Capture tool execution observations for later processing
@@ -312,7 +267,7 @@ VALUES (?, ?, ?, ?, ...)
---
### Hook 5: Stop Hook (Summary Generation)
### Hook 4: Stop Hook (Summary Generation)
**Purpose:** Generate AI-powered session summaries during the session
@@ -367,7 +322,7 @@ VALUES (?, ?, ?, ?, ...)
---
### Hook 6: SessionEnd (Cleanup Hook)
### Hook 5: SessionEnd (Cleanup Hook)
**Purpose:** Mark sessions as completed when they end
@@ -474,14 +429,18 @@ sequenceDiagram
| Event | Timing | Blocking | Timeout | Output Handling |
|-------|--------|----------|---------|-----------------|
| **SessionStart (smart-install)** | Before session | No | 300s | stderr (info) |
| **SessionStart (context)** | Before session | No | 300s | stdout → context |
| **SessionStart (user-message)** | Before session | No | 10s | stderr (info) |
| **UserPromptSubmit** | Before processing | No | 120s | stdout → context |
| **SessionStart (smart-install)** | Before session | No | 300s | stderr (log only) |
| **SessionStart (worker-start)** | Before session | No | 60s | stderr (log only) |
| **SessionStart (context)** | Before session | No | 60s | JSON → additionalContext (silent) |
| **UserPromptSubmit** | Before processing | No | 60s | stdout → context |
| **PostToolUse** | After tool | No | 120s | Transcript only |
| **Summary** | Worker triggered | No | 120s | Database |
| **SessionEnd** | On exit | No | 120s | Log only |
<Note>
As of Claude Code 2.1.0 (ultrathink update), SessionStart hooks no longer display user-visible messages. Context is silently injected via `hookSpecificOutput.additionalContext`.
</Note>
---
## The Worker Service Architecture

View File

@@ -0,0 +1,144 @@
# All Open Issues Explained
*Generated: January 7, 2026*
This report provides plain English explanations of all 12 open GitHub issues, their root causes, and proposed solutions.
---
## Critical Priority (P0)
### #603 - Memory Leak from Child Processes
When you use claude-mem on Linux/Mac, it spawns helper processes to analyze your work. These processes never get cleaned up when they're done - they just sit there eating RAM. One user had 121 zombie processes using 44GB of memory after 6 hours.
**Root cause:** The `getChildProcesses()` function in ProcessManager.ts only works on Windows (using WMIC). On Linux/Mac, it returns an empty array, so child processes are never tracked or killed during cleanup.
**Proposed solution:** Add Unix child process enumeration using `pgrep -P <pid>` to find and kill child processes when the worker shuts down or restarts.
---
### #596 - SDK Crashes on Startup
Sometimes when the plugin tries to start its AI helper, it crashes immediately with "ProcessTransport not ready." It's a timing issue - the plugin tries to send data before the helper process is fully started up.
**Root cause:** The Claude Agent SDK spawns a subprocess, but the plugin immediately tries to write to stdin before the process has finished initializing. There's no retry mechanism.
**Proposed solution:** Add a retry wrapper with exponential backoff (100ms → 200ms → 400ms) around the SDK query call. If it fails with "ProcessTransport not ready," wait and try again up to 3 times.
---
### #587 - Observations Stop Being Saved
After you restart the worker (or it crashes), the plugin thinks it can resume an old session that doesn't exist anymore. The AI helper just sits there waiting instead of processing your work, so nothing gets saved.
**Root cause:** The `memorySessionId` persists in the database across worker restarts, but the actual SDK session is gone. The plugin tries to resume a non-existent session, and the SDK responds with "awaiting data" instead of processing.
**Proposed solution:** Track whether the `memorySessionId` was captured during the current worker run with a `memorySessionIdCapturedThisRun` flag. Only attempt to resume if this flag is true.
---
## High Priority (P1)
### #602 - Windows Won't Start
The plugin uses an old Windows command called `wmic` that Microsoft removed from Windows 11. So Windows users get errors and the plugin won't start properly.
**Root cause:** ProcessManager.ts uses `wmic process where "parentprocessid=X"` to enumerate child processes, but WMIC is deprecated and removed from modern Windows 11 builds.
**Proposed solution:** Replace WMIC with `tasklist /FI "PID eq X" /FO CSV` as the primary method, with PowerShell `Get-CimInstance Win32_Process` as a fallback.
---
### #588 - Unexpected API Charges
If you have an Anthropic API key in your project's `.env` file, the plugin silently uses it and charges your account. Users with Claude Max subscriptions were surprised by extra bills because the plugin found their API key and used it without asking.
**Root cause:** The default provider is set to `'claude'` in SettingsDefaultsManager.ts, and the Claude Agent SDK automatically discovers `ANTHROPIC_API_KEY` from environment variables. The plugin inherits the parent process environment, exposing any API keys.
**Proposed solution:** Either change the default provider to `'gemini'` (which has a free tier), or add a first-run warning that clearly states API costs will be incurred. Consider requiring explicit opt-in for Anthropic API usage.
---
### #591 - OpenRouter Provider Broken
When using OpenRouter as your AI provider, the plugin can't save observations because it's missing an internal ID that normally comes from Claude's API. OpenRouter doesn't provide this ID, and the plugin doesn't handle that.
**Root cause:** OpenRouterAgent.ts has no mechanism to capture or generate a `memorySessionId`. Unlike the Claude SDK which returns a `session_id` in responses, OpenRouter's API is stateless and doesn't provide session identifiers.
**Proposed solution:** Generate a UUID for `memorySessionId` at the start of `OpenRouterAgent.startSession()` before calling `processAgentResponse()`. The same fix is needed for GeminiAgent.ts.
---
### #598 - Plugin Messages in Your History
When you use `/resume` in Claude Code, you see a bunch of "Hello memory agent" messages that the plugin sent internally. These should be hidden from your conversation history but they're leaking through.
**Root cause:** The plugin yields messages with `session_id: session.contentSessionId` (the user's session) instead of `session.memorySessionId` (the plugin's internal session). This causes the SDK to associate plugin messages with the user's conversation.
**Proposed solution:** Change SDKAgent.ts line 289 to use `memorySessionId` instead of `contentSessionId`. Also consider removing or minimizing the `continuation_greeting` in code.json.
---
### #586 - Race Condition Loses Data
There's a timing bug where the plugin tries to save your observations before it has the session ID it needs. Instead of waiting, it just throws an error and your observations are lost.
**Root cause:** The async message generator yields messages concurrently with session ID capture. If `processAgentResponse()` runs before the first SDK message with `session_id` is processed, `memorySessionId` is still null and the hard error at ResponseProcessor.ts:73-75 throws.
**Proposed solution:** Replace the hard error with a wait/retry loop that polls for up to 5 seconds for `memorySessionId` to be captured. If still missing, generate a fallback UUID.
---
## Medium Priority (P2)
### #590 - Annoying Popup Window on Windows
When the plugin starts its vector database (Chroma) on Windows, a blank terminal window pops up and stays open. You have to manually close it every time.
**Root cause:** ChromaSync.ts attempts to set `windowsHide: true` in the transport options, but the MCP SDK's StdioClientTransport doesn't pass this option through to `child_process.spawn()`.
**Proposed solution:** Wrap the `uvx` command in a PowerShell call: `powershell -NoProfile -WindowStyle Hidden -Command "uvx ..."`. This pattern already works elsewhere in the codebase (ProcessManager.ts:271).
---
### #600 - Documentation Lies
The docs describe features that don't actually exist in the released version - they're only in beta branches. Users try to use documented features and they don't work.
**Root cause:** Documentation was written for features in beta branches that were never merged to main. The MCP migration removed the skills directory but docs still reference it. Several settings are documented but not in the validated settings list.
**Proposed solution:** Audit all docs and either add "Beta Only" badges to unimplemented features, or remove references entirely. Update architecture docs to reflect MCP-based search instead of skill-based.
---
### #597 - General Bug Report
A user posted 4 screenshots saying "too many bugs" after 2 days of frustration. It's basically a meta-issue confirming the other problems are real and affecting users.
**Root cause:** The user encountered multiple v9.0.0 regressions including ProcessTransport failures, worker startup issues, and session problems. The screenshots show error states but lack specific details.
**Proposed solution:** This is resolved by fixing the other issues. Consider adding a `/troubleshoot` command or better error reporting to help users provide actionable bug reports.
---
## Low Priority (P3)
### #599 - Windows Drive Root Error
If you run Claude Code from `C:\` (the drive root), the plugin crashes because it can't figure out what to call your "project." It's an edge case but easy to fix.
**Root cause:** user-message-hook.ts uses `path.basename(process.cwd())` directly, which returns an empty string for drive roots like `C:\`. The API rejects empty project names with a 400 error.
**Proposed solution:** Use the existing `getProjectName()` utility from `src/utils/project-name.ts` which already handles drive roots by returning `"drive-C"` style names.
---
## Summary by Release
| Release | Issues | Focus |
|---------|--------|-------|
| v9.0.1 | #603, #596, #587 | Critical stability fixes |
| v9.0.2 | #602, #588, #591, #598, #586 | Windows + provider fixes |
| v9.1.0 | #590, #600, #597 | Polish + documentation |
| v9.1.x | #599 | Edge case fix |

View File

@@ -0,0 +1,84 @@
# Open Issues Summary - January 7, 2026
This document provides an index of all open GitHub issues analyzed on 2026-01-07.
## Critical Priority (P0)
| Issue | Title | Severity | Report |
|-------|-------|----------|--------|
| #603 | Worker daemon leaks child claude processes | Critical | [Report](./issue-603-worker-daemon-leaks-child-processes.md) |
| #596 | ProcessTransport not ready for writing | Critical | [Report](./issue-596-processtransport-not-ready.md) |
| #587 | Observations not stored - SDK awaiting data | Critical | [Report](./issue-587-observations-not-stored.md) |
## High Priority (P1)
| Issue | Title | Severity | Report |
|-------|-------|----------|--------|
| #602 | PostToolUse worker-service failed (Windows) | Critical | [Report](./issue-602-posttooluse-worker-service-failed.md) |
| #588 | API key usage warning - unexpected charges | High | [Report](./issue-588-api-key-usage-warning.md) |
| #591 | OpenRouter memorySessionId capture failure | Critical | [Report](./issue-591-openrouter-memorysessionid-capture.md) |
| #598 | Conversation history pollution | High | [Report](./issue-598-conversation-history-pollution.md) |
| #586 | Race condition in memory_session_id capture | High | [Report](./issue-586-feature-request-unknown.md) |
| #597 | Multiple bugs reported (image-only) | High | [Report](./issue-597-too-many-bugs.md) |
## Medium Priority (P2)
| Issue | Title | Severity | Report |
|-------|-------|----------|--------|
| #590 | Windows Chroma terminal popup | Medium | [Report](./issue-590-windows-chroma-terminal-popup.md) |
| #600 | Documentation audit - features not implemented | Medium | [Report](./issue-600-documentation-audit-features-not-implemented.md) |
## Low Priority (P3)
| Issue | Title | Severity | Report |
|-------|-------|----------|--------|
| #599 | Windows drive root 400 error | Low | [Report](./issue-599-windows-drive-root-400-error.md) |
---
## Key Themes
### 1. v9.0.0 Regressions
Multiple issues (#596, #587, #586) relate to observation storage failures introduced in v9.0.0, primarily around:
- ProcessTransport race conditions
- Session ID capture timing
- Worker restart loops
### 2. Windows Platform Issues
Several Windows-specific bugs (#602, #590, #599):
- WMIC deprecated command usage
- Console window popups
- Path handling for drive roots
### 3. Session Management
Issues with session lifecycle (#603, #591, #598):
- Child process leaks
- Provider-specific session ID handling
- Message pollution in user history
### 4. Documentation Drift
Issue #600 identifies significant gap between documented and implemented features.
---
## Recommended Fix Order
1. **v9.0.1 Hotfix** (48 hours):
- #588 - Add API key usage warning (financial impact)
- #596 - ProcessTransport retry mechanism
- #587 - Stale session invalidation
2. **v9.0.2 Patch** (1 week):
- #603 - Orphan process reaper
- #602 - Windows WMIC replacement
- #591 - OpenRouter memorySessionId generation
3. **v9.1.0 Minor** (2 weeks):
- #598 - Session isolation improvements
- #590 - Windows console hiding
- #599 - Drive root path handling
- #600 - Documentation updates
---
*Generated: 2026-01-07 19:45 EST*

View File

@@ -3,165 +3,35 @@
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
### Jan 2, 2026
**2026-01-02--generator-failure-investigation.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36091 | 10:20 PM | 🔵 | Generator Failure Investigation - Chroma Vector Search Silent Failures | ~436 |
| #36079 | 10:10 PM | 🔴 | Fixed Generator Crashes from Silent Chroma Vector Search Failures | ~531 |
**2026-01-02--stuck-observations.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36004 | 8:14 PM | 🔵 | Comprehensive Investigation Report on Stuck Observations Problem | ~527 |
### Jan 3, 2026
**2026-01-04--session-id-refactor-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36643 | 10:56 PM | 🔵 | Session ID Refactor Test Failures Root Cause | ~513 |
| #36636 | 10:46 PM | 🟣 | Session ID Refactor Analysis Agent Completed Comprehensive Report | ~637 |
| #36626 | 10:44 PM | 🟣 | Session ID Refactor Failures Report Generated | ~569 |
**2026-01-04--gemini-agent-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36638 | 10:47 PM | ✅ | GeminiAgent Failures Report Manually Created After Agent Timeout | ~604 |
**2026-01-04--session-store-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36635 | 10:46 PM | 🟣 | SessionStore Analysis Agent Completed Report Generation | ~545 |
| #36634 | " | ✅ | SessionStore Failures Report Generated With Test Fix Recommendations | ~595 |
**2026-01-04--logger-coverage-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36633 | 10:46 PM | ✅ | Logger Coverage Failures Report Generated | ~559 |
| #36623 | 10:44 PM | 🟣 | Logger Coverage Failures Report Generated | ~249 |
**2026-01-04--session-id-validation-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36628 | 10:44 PM | 🟣 | Session ID Validation Failures Report Generated | ~690 |
**2026-01-04--test-suite-report.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36609 | 10:39 PM | 🟣 | Comprehensive Test Suite Report Generated | ~563 |
**reports**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36599 | 10:37 PM | 🔵 | Reports Directory Structure Confirmed | ~203 |
**2026-01-02--generator-failure-investigation.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36390 | 8:50 PM | 🔄 | Comprehensive Monolith Refactor with Modular Architecture | ~724 |
**2026-01-03--observation-saving-failure.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36113 | 3:58 PM | 🔴 | Fixed FOREIGN KEY Constraint Failure in Observation Storage | ~448 |
### Jan 4, 2026
**2026-01-04--issue-511-gemini-model-missing.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36924 | 2:25 AM | ✅ | Merged fix/pr-538-followups branch into main with comprehensive updates | ~481 |
| #36827 | 1:03 AM | ✅ | Branch diff shows 1,293 insertions and 98 deletions across 15 files | ~464 |
| #36781 | 12:45 AM | 🔵 | Complete GeminiAgent Model Configuration Gap Analysis | ~552 |
| #36776 | 12:43 AM | 🔵 | Issue #511 Analysis Document Located | ~459 |
| #36759 | 12:34 AM | ✅ | Created Issue #511 Analysis Report | ~304 |
**2026-01-04--issue-517-windows-powershell-analysis.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36779 | 12:44 AM | 🔵 | ProcessManager Windows PowerShell Functions Complete Analysis | ~550 |
| #36720 | 12:15 AM | 🔵 | Issue #517 Windows PowerShell Analysis Completed | ~631 |
| #36718 | " | 🔵 | Issue #517 Analysis - Windows PowerShell Variable Escaping Bug | ~482 |
**2026-01-04--issue-531-export-type-duplication.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36760 | 12:34 AM | ✅ | Created Issue #531 Report: Export Script Type Duplication | ~430 |
**2026-01-04--gemini-agent-failures.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36751 | 12:32 AM | 🔵 | Gemini-Related Files Located Across Project | ~242 |
**reports**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36748 | 12:31 AM | 🔵 | Existing GitHub Issue Reports Located | ~271 |
**2026-01-04--issue-527-uv-homebrew-analysis.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36721 | 12:15 AM | 🔵 | Issue #527 UV Homebrew Path Missing on Apple Silicon | ~492 |
| #36719 | " | 🔵 | Issue #527 uv Homebrew Detection Missing on Apple Silicon Macs | ~526 |
**2026-01-04--issue-514-orphaned-sessions-analysis.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36717 | 12:14 AM | 🔵 | Issue #514 Orphaned Sessions Analysis Completed | ~723 |
| #36716 | 12:13 AM | 🔵 | Issue #514 Orphaned .jsonl Session Files Analysis | ~616 |
**2026-01-04--issue-532-memory-leak-analysis.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36714 | 12:12 AM | 🔵 | Memory Leak Analysis Report for Issue #532 Generated | ~531 |
| #36712 | 12:11 AM | 🔵 | Memory Leak Analysis for Issue #532 Documented | ~646 |
**2026-01-04--issue-520-stuck-messages-analysis.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36713 | 12:11 AM | 🔵 | Issue #520 Stuck Messages Already Resolved | ~569 |
| #36711 | " | 🔵 | Issue #520 Stuck Messages Analysis - Already Resolved | ~526 |
**2026-01-02--stuck-observations.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #36710 | 12:07 AM | 🔵 | Stuck Observations Analysis - Six Critical Lifecycle Gaps | ~677 |
### Jan 5, 2026
**2026-01-05--issue-544-mem-search-hint-claude-code.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37613 | 5:31 PM | 🔵 | PR #558 Review Feedback Analysis | ~544 |
| #37555 | 4:49 PM | 🔵 | Issue #544 Message Locations and Fix Pattern Documented | ~463 |
| #37545 | 4:47 PM | ✅ | Issue #544 Analysis Report Created for mem-search Skill Messaging Problem | ~480 |
| #37962 | 8:18 PM | 🔴 | Fixed SessionStart hook crash when stdin is undefined | ~440 |
### Jan 7, 2026
**2026-01-05--issue-555-windows-hooks-ipc-false.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37558 | 4:49 PM | 🔵 | Issue #555 Windows Hook Execution Patterns and Fix Strategy Documented | ~510 |
**2026-01-05--issue-545-formattool-json-parse-crash.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37557 | 4:49 PM | 🔵 | Issue #545 Bug Location and Fix Pattern Documented | ~462 |
**2026-01-05--issue-543-slash-command-unavailable.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37548 | 4:48 PM | | Issue #543 Analysis Report Created for Slash Command Availability | ~540 |
**2026-01-05--issue-557-settings-module-loader-error.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #37547 | 4:47 PM | | Issue #557 Analysis Report Created for Plugin Startup Failure | ~491 |
### Jan 6, 2026
**2026-01-06--windows-woes-comprehensive-report.md**
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #38109 | 12:16 AM | ✅ | Comprehensive Windows Woes Report Generated from Memory Search | ~826 |
| #38475 | 10:31 PM | ⚖️ | Log Level Philosophy: Error-Adjacent Messages Promoted to ERROR | ~412 |
| #38467 | 10:29 PM | ⚖️ | Log Level Audit Strategy: Tighten ERROR Messages for Runtime Issue Discovery | ~464 |
| #38445 | 10:26 PM | 🔵 | DEBUG Level Logging: SessionRoutes Line 211 Error-During-Recovery Pattern | ~500 |
| #38442 | 10:25 PM | 🔵 | Log Audit Contains 409 Source File Log Entries | ~293 |
| #38441 | " | 🔵 | DEBUG-level logging patterns for diagnostics and non-critical operations | ~595 |
| #38439 | " | 🔵 | Log Audit Shows SessionRoutes.ts Has Two WARN Messages for Generator Failures | ~493 |
| #38438 | " | 🔵 | WARN Level Log Patterns: Graceful Degradation and Fallback Behavior | ~539 |
| #38437 | 10:24 PM | 🔵 | Claude-mem core functionality and logging patterns identified | ~710 |
| #38428 | " | 🔵 | Log level audit report structure and content examined | ~559 |
| #38425 | " | ⚖️ | Log Level Architecture: Fail-Critical Over Fail-Fast for Chroma | ~467 |
| #38416 | 10:22 PM | 🔵 | ChromaDB Is Critical Not Optional - Log Audit Findings Challenged | ~405 |
| #38405 | 10:07 PM | ⚖️ | DEBUG Log Level Analysis - One Message Requires WARN Promotion | ~819 |
| #38404 | 10:06 PM | ⚖️ | Log Level Audit Analysis - WARN to ERROR Promotion Criteria Established | ~769 |
| #38403 | 10:04 PM | 🔵 | Log Level Audit - INFO and DEBUG Level Messages Catalogued | ~688 |
| #38402 | 10:03 PM | 🔵 | Log Level Audit Report Analysis - Critical Error Messages Identified | ~642 |
| #38401 | 10:00 PM | 🔵 | Enhanced Audit Report Reveals Error Logging Patterns and Message Extraction Issues | ~498 |
| #38394 | 9:58 PM | ✅ | Created Log Level Audit Report Documentation | ~319 |
| #38393 | " | ✅ | Enhanced Log Audit Report Format with Component Tags and Full Logger Calls | ~393 |
| #38386 | 9:56 PM | ✅ | Log Audit Report Generated and Saved to Documentation | ~442 |
| #38385 | " | ✅ | Log Level Audit Report Saved to Documentation | ~379 |
| #38251 | 7:46 PM | 🔵 | Comprehensive Windows Platform Issues Report | ~982 |
</claude-mem-context>

View File

@@ -0,0 +1,507 @@
# Issue #586: Race Condition in memory_session_id Capture
**Report Date:** 2026-01-07
**Issue:** [#586](https://github.com/thedotmack/claude-mem/issues/586)
**Reporter:** rocky2431
**Environment:** claude-mem 9.0.0, macOS Darwin 24.6.0, Node v22.x / Bun 1.x
---
## 1. Executive Summary
This issue describes a critical race condition where new sessions frequently have an empty (NULL) `memory_session_id` in the `sdk_sessions` table. This prevents observations from being stored, as the `ResponseProcessor` requires a valid `memorySessionId` before processing agent responses.
**Key Finding:** The race condition occurs because session initialization via `handleSessionInitByClaudeId()` creates the session with a NULL `memory_session_id`, but the SDK agent may not have responded yet to provide its session ID when subsequent `PostToolUse` hooks attempt to store observations.
**Error Message:**
```
Cannot store observations: memorySessionId not yet captured
```
**Severity:** Critical
**Priority:** P1
**Impact:** Sessions with NULL `memory_session_id` cannot store any observations, leading to data loss and incomplete session history.
---
## 2. Problem Analysis
### 2.1 Error Manifestation
The error originates from `ResponseProcessor.ts` (line 73-75):
```typescript
// CRITICAL: Must use memorySessionId (not contentSessionId) for FK constraint
if (!session.memorySessionId) {
throw new Error('Cannot store observations: memorySessionId not yet captured');
}
```
### 2.2 Observed Symptoms
1. **Log Evidence:**
```log
[2026-01-07 04:02:39.872] [INFO ] [SESSION] [session-14379] Session initialized
{project=claude-task-master, contentSessionId=a48d7f90-27e4-4a1d-b379-bf2195ee333e,
queueDepth=0, hasGenerator=false}
```
Note: `contentSessionId` is present but `memorySessionId` is missing.
2. **Database State:**
```sql
SELECT id, memory_session_id, project FROM sdk_sessions ORDER BY id DESC LIMIT 5;
14379 | (NULL) | claude-task-master -- Missing!
14293 | 090b5397-... | .claude -- OK
14285 | (NULL) | .claude -- Missing!
```
3. **Queue Accumulation:**
- Observations are enqueued to `pending_messages` table
- Hundreds of unprocessed items accumulate
- Only user prompts are recorded, no AI analysis
### 2.3 Race Condition Timeline
```
Time T0: SessionStart hook triggers
└─> new-hook.ts calls /api/sessions/init
└─> createSDKSession() creates row with memory_session_id = NULL
Time T1: PostToolUse hook triggers (user action)
└─> save-hook.ts calls /api/sessions/observations
└─> Observation queued to pending_messages
Time T2: SDK Agent generator starts
└─> Waiting for first message from Claude SDK
Time T3: First SDK message arrives (RACE CONDITION WINDOW)
└─> updateMemorySessionId() called with captured ID
└─> Database updated: memory_session_id = "sdk-gen-abc123"
Time T4: SDK Agent attempts to process queued observations
└─> processAgentResponse() checks session.memorySessionId
└─> If NULL (not yet updated): ERROR thrown
```
**The Problem:** If `PostToolUse` events arrive during the window between session creation (T0) and SDK session ID capture (T3), the `ResponseProcessor` will fail because `memorySessionId` is still NULL.
---
## 3. Technical Details
### 3.1 Session ID Architecture
Claude-mem uses a dual session ID system (documented in `docs/SESSION_ID_ARCHITECTURE.md`):
| ID | Purpose | Source | Initial Value |
|----|---------|--------|---------------|
| `contentSessionId` | User's Claude Code conversation ID | Hook system | Set immediately |
| `memorySessionId` | Memory agent's internal session ID | SDK response | NULL (captured later) |
### 3.2 Session Creation Flow
**File:** `src/services/sqlite/sessions/create.ts` (lines 24-47)
```typescript
export function createSDKSession(
db: Database,
contentSessionId: string,
project: string,
userPrompt: string
): number {
// Pure INSERT OR IGNORE - no updates, no complexity
// NOTE: memory_session_id starts as NULL. It is captured by SDKAgent from the first SDK
// response and stored via updateMemorySessionId(). CRITICAL: memory_session_id must NEVER
// equal contentSessionId - that would inject memory messages into the user's transcript!
db.prepare(`
INSERT OR IGNORE INTO sdk_sessions
(content_session_id, memory_session_id, project, user_prompt, started_at, started_at_epoch, status)
VALUES (?, NULL, ?, ?, ?, ?, 'active')
`).run(contentSessionId, project, userPrompt, now.toISOString(), nowEpoch);
// ...
}
```
### 3.3 Memory Session ID Capture
**File:** `src/services/worker/SDKAgent.ts` (lines 117-141)
```typescript
// Process SDK messages
for await (const message of queryResult) {
// Capture memory session ID from first SDK message (any type has session_id)
if (!session.memorySessionId && message.session_id) {
session.memorySessionId = message.session_id;
// Persist to database for cross-restart recovery
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
// ... verification logging ...
}
// ...
}
```
### 3.4 Response Processor Validation
**File:** `src/services/worker/agents/ResponseProcessor.ts` (lines 72-75)
```typescript
// CRITICAL: Must use memorySessionId (not contentSessionId) for FK constraint
if (!session.memorySessionId) {
throw new Error('Cannot store observations: memorySessionId not yet captured');
}
```
### 3.5 Session Manager Initialization
**File:** `src/services/worker/SessionManager.ts` (lines 127-143)
```typescript
// Create active session
// Load memorySessionId from database if previously captured (enables resume across restarts)
session = {
sessionDbId,
contentSessionId: dbSession.content_session_id,
memorySessionId: dbSession.memory_session_id || null, // NULL initially!
// ...
};
```
---
## 4. Impact Assessment
### 4.1 Direct Impact
| Impact Area | Description |
|------------|-------------|
| **Data Loss** | Observations queued during race window are never stored |
| **Queue Growth** | `pending_messages` table grows unbounded |
| **User Experience** | Session history incomplete - only prompts, no analysis |
| **System Load** | Repeated retry attempts consume resources |
### 4.2 Frequency
The issue appears **intermittent** - some sessions initialize correctly while others fail. The race condition depends on:
- System load
- Claude SDK response latency
- Hook timing relative to SDK startup
### 4.3 Related Issues
- **Issue #520** (CLOSED): Stuck messages in 'processing' status - similar queue recovery problem
- **Issue #591**: OpenRouter Agent fails to capture memorySessionId - architectural gap for stateless providers
---
## 5. Root Cause Analysis
### 5.1 Primary Root Cause
**Architectural Timing Gap:** The session initialization API (`/api/sessions/init`) creates sessions with a NULL `memory_session_id`, expecting the SDK agent to capture it from the first response. However, there is no synchronization mechanism to prevent observation processing before this capture occurs.
### 5.2 Contributing Factors
1. **Asynchronous SDK Agent Startup:** The generator starts asynchronously without blocking the hook response
2. **No Capture Wait Mechanism:** Observations are queued immediately without waiting for memorySessionId capture
3. **Strict Validation in ResponseProcessor:** The processor throws an error rather than handling the NULL case gracefully
4. **No Retry Logic:** Failed observations due to missing memorySessionId are not retried after capture
### 5.3 Timing Window Analysis
```
Hook Execution Timeline:
├─ new-hook.ts (UserPromptSubmit)
│ ├─ POST /api/sessions/init → createSDKSession(memory_session_id=NULL)
│ └─ POST /sessions/{id}/init → startSession() [async, non-blocking]
├─ [RACE CONDITION WINDOW OPENS]
│ └─ SDK agent waiting for Claude response
├─ save-hook.ts (PostToolUse) ← CAN TRIGGER DURING WINDOW
│ └─ POST /api/sessions/observations
│ └─ Queued, will fail when processed
├─ [SDK FIRST MESSAGE ARRIVES]
│ └─ updateMemorySessionId(captured_id)
│ └─ Database updated, session.memorySessionId set
├─ [RACE CONDITION WINDOW CLOSES]
└─ Subsequent observations process successfully
```
---
## 6. Recommended Solutions
### 6.1 Solution A: Retry Mechanism in ResponseProcessor (Recommended)
If `memorySessionId` is not available, wait briefly with exponential backoff:
```typescript
// In processAgentResponse():
async function waitForMemorySessionId(
session: ActiveSession,
dbManager: DatabaseManager,
maxRetries: number = 5,
baseDelayMs: number = 100
): Promise<boolean> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
if (session.memorySessionId) return true;
// Check database for updates
const dbSession = dbManager.getSessionById(session.sessionDbId);
if (dbSession?.memory_session_id) {
session.memorySessionId = dbSession.memory_session_id;
return true;
}
await new Promise(resolve => setTimeout(resolve, baseDelayMs * Math.pow(2, attempt)));
}
return false;
}
// Usage:
const captured = await waitForMemorySessionId(session, dbManager);
if (!captured) {
throw new Error('Cannot store observations: memorySessionId not yet captured after retries');
}
```
**Pros:**
- Non-breaking change
- Handles timing variations gracefully
- Minimal code modification
**Cons:**
- Adds latency in worst case
- Polling-based solution
### 6.2 Solution B: Lazy Capture on First PostToolUse
Capture `memorySessionId` on the first `PostToolUse` if not already set:
```typescript
// In handleObservationsByClaudeId():
if (!session.memorySessionId && session.contentSessionId) {
// Generate a placeholder that will be updated when SDK responds
const tempId = `pending-${session.contentSessionId}`;
session.memorySessionId = tempId;
store.updateMemorySessionId(sessionDbId, tempId);
logger.warn('SESSION', 'Generated temporary memorySessionId', { tempId });
}
```
**Pros:**
- Immediate resolution
- No retry delays
**Cons:**
- Temporary IDs may cause confusion
- Requires updating when real ID is captured
### 6.3 Solution C: Use contentSessionId as Fallback
For initial observations before SDK capture, use `contentSessionId`:
```typescript
// In processAgentResponse():
const effectiveMemorySessionId = session.memorySessionId || session.contentSessionId;
```
**Pros:**
- Simple implementation
- No timing issues
**Cons:**
- **Violates architectural principle** that memorySessionId should differ from contentSessionId
- Risk of FK constraint issues
- May cause resume problems
### 6.4 Solution D: Block Until memorySessionId is Captured
Modify `handleObservationsByClaudeId` to wait for SDK capture:
```typescript
// In handleObservationsByClaudeId():
const session = this.sessionManager.getSession(sessionDbId);
if (!session?.memorySessionId) {
// Return a "pending" response, client should retry
res.status(202).json({
status: 'pending',
reason: 'awaiting_memory_session_id',
retryAfterMs: 500
});
return;
}
```
**Pros:**
- Explicit handling
- Client-controlled retry
**Cons:**
- Requires hook changes
- May cause hook timeout
### 6.5 Recommended Approach
**Solution A** is recommended because:
1. Handles the race condition transparently
2. Minimal impact on existing code
3. Self-healing behavior (retries until successful)
4. Maintains architectural integrity
5. Low regression risk
---
## 7. Priority/Severity Assessment
### 7.1 Severity Matrix
| Factor | Assessment |
|--------|------------|
| **Data Loss** | High - Observations lost during race window |
| **Functionality** | Partial - Some sessions work, some don't |
| **Frequency** | Intermittent - Depends on system timing |
| **Workaround** | Manual SQL fix available |
| **Affected Users** | All users under specific timing conditions |
### 7.2 Priority Assignment
**Priority: P1 (High)**
Rationale:
- Silent data loss is occurring
- Affects core functionality (observation storage)
- Unpredictable - users may not know data is being lost
- Fix is straightforward with low regression risk
### 7.3 Recommended Timeline
| Action | Timeline |
|--------|----------|
| Implement Solution A | 2-4 hours |
| Unit tests | 1 hour |
| Integration tests | 1 hour |
| Code review | 30 minutes |
| Release | Same day |
---
## 8. Workaround
Users experiencing this issue can manually fix affected sessions:
```sql
-- Find sessions with missing memory_session_id
SELECT id, content_session_id, project
FROM sdk_sessions
WHERE memory_session_id IS NULL;
-- Option 1: Use content_session_id as memory_session_id (not recommended)
-- WARNING: May cause issues with session resume
UPDATE sdk_sessions
SET memory_session_id = content_session_id
WHERE id = <sessionDbId> AND memory_session_id IS NULL;
-- Option 2: Generate a unique ID
UPDATE sdk_sessions
SET memory_session_id = 'manual-' || content_session_id
WHERE id = <sessionDbId> AND memory_session_id IS NULL;
```
**Important:** After applying the workaround, the worker must be restarted to pick up the new `memory_session_id` values.
---
## 9. Testing Recommendations
### 9.1 Unit Tests
```typescript
describe('ResponseProcessor memorySessionId handling', () => {
it('should wait for memorySessionId capture with retry', async () => {
const session = createMockSession({ memorySessionId: null });
// Simulate delayed capture
setTimeout(() => {
session.memorySessionId = 'captured-id';
}, 200);
await expect(
processAgentResponse(text, session, dbManager, sessionManager, worker, 0, null, 'Test')
).resolves.not.toThrow();
});
it('should throw after max retries if memorySessionId never captured', async () => {
const session = createMockSession({ memorySessionId: null });
await expect(
processAgentResponse(text, session, dbManager, sessionManager, worker, 0, null, 'Test')
).rejects.toThrow('memorySessionId not yet captured after retries');
});
});
```
### 9.2 Integration Tests
```typescript
describe('Session initialization race condition', () => {
it('should handle rapid PostToolUse events during SDK startup', async () => {
// Create session
const sessionDbId = store.createSDKSession(contentSessionId, project, prompt);
// Immediately queue observations (before SDK responds)
for (let i = 0; i < 5; i++) {
sessionManager.queueObservation(sessionDbId, {
tool_name: 'Read',
tool_input: { file_path: '/test.txt' },
tool_response: { content: 'test' },
prompt_number: 1,
cwd: '/test'
});
}
// Start SDK agent (will capture memorySessionId)
await sdkAgent.startSession(session, worker);
// Verify all observations were stored
const stored = db.prepare('SELECT COUNT(*) as count FROM observations WHERE memory_session_id = ?')
.get(session.memorySessionId);
expect(stored.count).toBeGreaterThanOrEqual(5);
});
});
```
---
## 10. Related Files
| File | Relevance |
|------|-----------|
| `src/services/worker/agents/ResponseProcessor.ts` | Error origin (line 73-75), primary fix location |
| `src/services/worker/SessionManager.ts` | Session initialization with NULL memorySessionId |
| `src/services/worker/SDKAgent.ts` | memorySessionId capture logic |
| `src/services/sqlite/sessions/create.ts` | Session creation with NULL memory_session_id |
| `src/hooks/new-hook.ts` | Session initialization hook |
| `src/hooks/save-hook.ts` | PostToolUse observation queueing |
| `docs/SESSION_ID_ARCHITECTURE.md` | Architecture documentation |
---
## 11. Conclusion
Issue #586 describes a critical race condition in the session initialization process where `memory_session_id` is not captured before observations are processed. This results in silent data loss as observations fail to store with the error "Cannot store observations: memorySessionId not yet captured".
The recommended fix is to implement a retry mechanism in `ResponseProcessor.processAgentResponse()` that waits for the `memorySessionId` to be captured, with exponential backoff. This approach:
- Maintains the existing architectural integrity
- Handles timing variations gracefully
- Has low regression risk
- Is straightforward to implement and test
**Immediate Action Required:** Implement Solution A (Retry Mechanism) and release a hotfix to prevent ongoing data loss.

View File

@@ -0,0 +1,337 @@
# Technical Report: Issue #587 - Observations Not Being Stored
**Issue:** v9.0.0: Observations not being stored - SDK agent stuck on 'Awaiting tool execution data'
**Author:** chuck-boudreau
**Created:** 2026-01-07
**Report Date:** 2026-01-07
**Status:** Open
**Affected Version:** 9.0.0
**Environment:** macOS (Darwin 25.1.0)
---
## 1. Executive Summary
After upgrading to claude-mem v9.0.0, users report that observations are not being stored in the database. The SDK agent responds with "Ready to observe. Awaiting tool execution data from the primary session" instead of processing tool calls and generating observations. Investigation reveals a **two-part failure mode**:
1. **Primary Issue:** The SDK agent receives tool execution data but fails to process it into observations, returning a generic "awaiting data" message despite receiving valid input.
2. **Secondary Issue (Resolved):** A version mismatch between plugin (9.0.0) and worker (8.5.9) was causing an infinite restart loop, which was fixed in commit `e22e2bfc`. However, **even after resolving the restart loop, the observation storage issue persists**.
This report analyzes both issues, identifies potential root causes, and proposes solutions.
---
## 2. Problem Analysis
### 2.1 Symptom Description
The user reports the following behavior after upgrading to v9.0.0:
```
[INFO ] [SDK ] [session-1] <- Response received (72 chars) {promptNumber=57} Ready to observe. Awaiting tool execution data from the primary session.
[INFO ] [DB ] [session-1] STORED | sessionDbId=1 | memorySessionId=xxx | obsCount=0 | obsIds=[] | summaryId=none
```
Key observations:
- The SDK agent is starting correctly (`Generator auto-starting`)
- Tool executions are being received (`PostToolUse: Bash(cat ~/.claude-mem/settings.json)`)
- Messages are being queued (`ENQUEUED | messageId=596 | type=observation`)
- Messages are being claimed by the agent (`CLAIMED | messageId=596`)
- **BUT:** The agent returns "Ready to observe. Awaiting tool execution data" instead of actual observations
- Result: `obsCount=0` persists across all tool calls
### 2.2 Version Mismatch Issue (Resolved)
The user also encountered a version mismatch causing infinite restarts:
```
[INFO ] [SYSTEM] Worker version mismatch detected - auto-restarting {pluginVersion=9.0.0, workerVersion=8.5.9}
```
**Resolution:** This issue was fixed in commit `e22e2bfc` (PR #567) by:
1. Updating `plugin/package.json` from 8.5.10 to 9.0.0
2. Rebuilding all hooks and worker service with correct version injection
3. Adding version consistency tests
However, the user reports that **even after resolving the restart loop, observations still weren't being created**.
---
## 3. Technical Details
### 3.1 Architecture Overview
The claude-mem observation pipeline works as follows:
```
User Session -> PostToolUse Hook -> Worker HTTP API -> Session Queue -> SDK Agent -> Database
(save-hook.ts) (/api/sessions/ (SessionManager) (SDKAgent.ts)
observations)
```
### 3.2 SDK Agent Prompt System
The SDK agent uses a mode-based prompt system loaded from `/plugin/modes/code.json`:
1. **Initial Prompt (`buildInitPrompt`)**: Full initialization with system identity, observer role, recording focus
2. **Continuation Prompt (`buildContinuationPrompt`)**: For subsequent tool observations in the same session
3. **Observation Prompt (`buildObservationPrompt`)**: Wraps tool execution data in XML format
**Key files:**
- `/src/services/worker/SDKAgent.ts` - Agent implementation (lines 100-213)
- `/src/sdk/prompts.ts` - Prompt building functions (lines 29-235)
- `/plugin/modes/code.json` - Mode configuration with prompt templates
### 3.3 Message Flow Analysis
From the logs, the flow appears correct up to SDK query:
```
1. PostToolUse hook fires -> /api/sessions/observations
2. SessionManager.queueObservation() persists to PendingMessageStore
3. EventEmitter notifies SDK agent
4. SDK agent yields observation prompt to Claude SDK
5. Claude SDK returns response -> "Ready to observe. Awaiting tool execution data"
6. No observations parsed -> obsCount=0
```
### 3.4 Suspicious Log Entry
```
promptType=CONTINUATION
lastPromptNumber=57
```
The `promptNumber=57` suggests this is a continuation of an existing session, not a fresh start. The `CONTINUATION` prompt type is used when `session.lastPromptNumber > 1`.
**Potential Issue:** If the SDK session context was lost (e.g., due to the restart loop), the `memorySessionId` may be stale, but the system is attempting to resume a session that no longer exists in the Claude SDK's context.
### 3.5 Code Analysis: Resume Logic
From `SDKAgent.ts` (lines 71-114):
```typescript
// CRITICAL: Only resume if:
// 1. memorySessionId exists (was captured from a previous SDK response)
// 2. lastPromptNumber > 1 (this is a continuation within the same SDK session)
// On worker restart or crash recovery, memorySessionId may exist from a previous
// SDK session but we must NOT resume because the SDK context was lost.
const hasRealMemorySessionId = !!session.memorySessionId;
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
// Only resume if BOTH: (1) we have a memorySessionId AND (2) this isn't the first prompt
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
// ...
}
});
```
**Critical Finding:** The code attempts to resume the SDK session if `memorySessionId` exists AND `lastPromptNumber > 1`. However, if the worker restarted (due to version mismatch), the SDK context is lost but the `memorySessionId` may still exist in the database from a previous session.
The code at lines 92-98 attempts to detect this:
```typescript
// INIT prompt - never resume even if memorySessionId exists (stale from previous session)
if (hasStaleMemoryId) {
logger.warn('SDK', `Skipping resume for INIT prompt despite existing memorySessionId=${session.memorySessionId} - SDK context was lost (worker restart or crash recovery)`);
}
```
But this only applies when `lastPromptNumber === 1`. If `lastPromptNumber > 1`, the code still attempts to resume with a potentially stale `memorySessionId`.
---
## 4. Impact Assessment
### 4.1 Severity: **Critical**
- **Data Loss:** Observations are not being persisted, resulting in complete loss of session memory
- **Core Functionality Broken:** The primary purpose of claude-mem (persistent memory) is non-functional
- **User Experience:** Users see no value from the plugin after upgrade
### 4.2 Scope
- **Affected Users:** All users who upgraded to v9.0.0 and had existing sessions
- **Trigger Condition:** Appears to occur when:
1. Worker restarts (due to version mismatch or other reasons)
2. Session has existing `memorySessionId` in database
3. Session has `lastPromptNumber > 1`
### 4.3 Workaround
Users can work around by:
1. Clearing the database: `rm ~/.claude-mem/claude-mem.db`
2. Starting fresh sessions
However, this results in loss of all historical observations.
---
## 5. Root Cause Analysis
### 5.1 Primary Hypothesis: Stale Session Resume
**Root Cause:** The SDK agent attempts to resume a session using a `memorySessionId` that no longer exists in the Claude SDK's context (because the SDK process was terminated during the restart loop).
**Evidence:**
1. `promptNumber=57` suggests continuation of existing session
2. `promptType=CONTINUATION` indicates resume path is being taken
3. The response "Ready to observe. Awaiting tool execution data" suggests the SDK received a continuation prompt without the necessary context
**Code Path:**
1. Worker restarts due to version mismatch
2. Session is reloaded from database with `memory_session_id` and `lastPromptNumber=57`
3. `SDKAgent.startSession()` evaluates `hasRealMemorySessionId=true` and `lastPromptNumber > 1`
4. Adds `resume: memorySessionId` to query options
5. Claude SDK attempts to resume non-existent session
6. Claude SDK responds with generic "awaiting data" message instead of processing observations
### 5.2 Secondary Hypothesis: Prompt Format Issue
The SDK agent might not be receiving the observation data in the expected format. The `buildObservationPrompt` function formats tool data as:
```xml
<observed_from_primary_session>
<what_happened>Bash</what_happened>
<occurred_at>2026-01-07T...</occurred_at>
<parameters>...</parameters>
<outcome>...</outcome>
</observed_from_primary_session>
```
If the Claude model doesn't recognize this as actionable tool data (expecting a different format), it might respond with the generic message.
### 5.3 Tertiary Hypothesis: Mode Configuration Issue
The mode system loads configuration from `/plugin/modes/code.json`. If the mode fails to load or loads incorrectly, the prompts may be malformed.
From `ModeManager.ts`:
```typescript
loadMode(modeId: string): ModeConfig {
// Falls back to 'code' if mode not found
// Throws only if 'code.json' is missing
}
```
---
## 6. Recommended Solutions
### 6.1 Immediate Fix: Invalidate Stale Session IDs on Worker Restart
**Priority:** Critical
**Effort:** Low
**File:** `src/services/worker/SDKAgent.ts`
Add detection for worker restart scenarios and invalidate stale `memorySessionId`:
```typescript
// Before starting SDK query, check if this is a recovery scenario
// If worker restarted but session was mid-flight, the SDK context is lost
// We should start fresh instead of attempting to resume
if (session.memorySessionId && !isWorkerSameProcess(session.memorySessionId)) {
logger.warn('SDK', 'Invalidating stale memorySessionId due to worker restart', {
sessionDbId: session.sessionDbId,
staleMemorySessionId: session.memorySessionId
});
session.memorySessionId = null;
this.dbManager.getSessionStore().updateMemorySessionId(session.sessionDbId, null);
}
```
### 6.2 Short-Term Fix: Add Resume Validation
**Priority:** High
**Effort:** Medium
**File:** `src/services/worker/SDKAgent.ts`
Before attempting resume, validate that the session exists in the SDK:
```typescript
// Validate memorySessionId before attempting resume
if (hasRealMemorySessionId && session.lastPromptNumber > 1) {
const isValidSession = await this.validateSDKSession(session.memorySessionId);
if (!isValidSession) {
logger.warn('SDK', 'memorySessionId no longer valid, starting fresh', {
sessionDbId: session.sessionDbId,
invalidMemorySessionId: session.memorySessionId
});
session.memorySessionId = null;
session.lastPromptNumber = 1; // Reset to trigger INIT prompt
}
}
```
### 6.3 Long-Term Fix: Add Worker Instance Tracking
**Priority:** Medium
**Effort:** High
**Files:** Multiple
Track worker instance ID in the database to detect restart scenarios:
1. Generate unique worker instance ID on startup
2. Store with each session's `memorySessionId`
3. On session load, compare worker instance ID
4. If mismatch, invalidate `memorySessionId` and restart fresh
### 6.4 Additional Recommendations
1. **Add diagnostic logging:** Log the full prompt being sent to SDK for debugging
2. **Add retry logic:** If SDK returns generic response, retry with INIT prompt
3. **Add health check:** Validate SDK session state before processing observations
4. **Update VERSION_FIX.md:** Document the observation storage issue as a related symptom
---
## 7. Priority/Severity Assessment
| Aspect | Rating | Justification |
|--------|--------|---------------|
| **Severity** | Critical | Core functionality completely broken |
| **Impact** | High | All v9.0.0 users with existing sessions affected |
| **Urgency** | High | Users currently losing all observation data |
| **Complexity** | Medium | Root cause identified, fix is localized |
| **Risk** | Low | Fix is additive, doesn't change happy path |
### Recommended Priority: **P0 - Critical**
This should be addressed immediately with a patch release (v9.0.1).
---
## 8. References
### Relevant Files
- `/src/services/worker/SDKAgent.ts` - SDK agent implementation
- `/src/sdk/prompts.ts` - Prompt building functions
- `/src/services/worker/SessionManager.ts` - Session lifecycle management
- `/src/services/infrastructure/HealthMonitor.ts` - Version checking
- `/docs/VERSION_FIX.md` - Documentation of version mismatch fix
### Related Issues
- PR #567 - Fix version mismatch causing infinite worker restart loop
- Commit `e22e2bfc` - Version mismatch fix
### Test Files
- `/tests/infrastructure/version-consistency.test.ts` - Version consistency tests
---
## 9. Appendix: Full Log Excerpt
```
[INFO ] [HOOK ] -> PostToolUse: Bash(cat ~/.claude-mem/settings.json) {workerPort=37777}
[INFO ] [HTTP ] -> POST /api/sessions/observations {requestId=POST-xxx}
[INFO ] [QUEUE ] [session-1] ENQUEUED | sessionDbId=1 | messageId=596 | type=observation | tool=Bash(...) | depth=1
[INFO ] [SESSION] [session-1] Generator auto-starting (observation) using Claude SDK {queueDepth=0, historyLength=0}
[INFO ] [SDK ] Starting SDK query {sessionDbId=1, ..., lastPromptNumber=57, isInitPrompt=false, promptType=CONTINUATION}
[INFO ] [SDK ] Creating message generator {..., promptType=CONTINUATION}
[INFO ] [QUEUE ] [session-1] CLAIMED | sessionDbId=1 | messageId=596 | type=observation
[INFO ] [SDK ] [session-1] <- Response received (72 chars) {promptNumber=57} Ready to observe. Awaiting tool execution data from the primary session.
[INFO ] [DB ] [session-1] STORED | sessionDbId=1 | ... | obsCount=0 | obsIds=[] | summaryId=none
```

View File

@@ -0,0 +1,434 @@
# Issue #588: Unexpected API Charges from ANTHROPIC_API_KEY Discovery
**Date:** January 7, 2026
**Status:** INVESTIGATION COMPLETE - Critical UX/Financial Issue
**Priority:** HIGH
**Labels:** bug, financial-impact, ux
**Author:** imkane
**Version Affected:** 9.0.0 and earlier
---
## Executive Summary
A user with a Claude Max subscription ($100/month) began receiving unexpected "Auto-recharge credits" invoice emails from Anthropic after installing the claude-mem plugin. The plugin discovered an `ANTHROPIC_API_KEY` in a `.env` file in the project root and used it for AI operations (observation compression, summary generation), causing direct API charges that were not anticipated by the user.
**Financial Impact:** The user expected all AI costs to be covered by their Claude Max subscription. Instead, the plugin consumed their Anthropic API credits separately, triggering auto-recharge billing.
**Root Cause:** The Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) automatically discovers and uses `ANTHROPIC_API_KEY` from environment variables or `.env` files. Claude-mem's worker service runs AI operations (observation compression, summaries) through this SDK, which consumes API credits independently of the user's Claude Max subscription.
---
## Problem Analysis
### User Expectations vs. Reality
| Expectation | Reality |
|-------------|---------|
| Claude Max ($100/mo) covers all Claude usage | Claude Max covers Claude Code IDE usage only |
| Plugin enhances Claude Code without extra cost | Plugin uses separate API calls via SDK |
| No API key needed since using Claude Max | SDK auto-discovers `.env` API keys |
| Billing would be transparent | Silent API key discovery leads to surprise charges |
### The Discovery Flow
1. User installs claude-mem plugin via marketplace
2. User has an `ANTHROPIC_API_KEY` in project `.env` file (for other purposes)
3. Plugin worker starts on first Claude Code session
4. Worker spawns Claude Agent SDK for observation processing
5. SDK auto-discovers `ANTHROPIC_API_KEY` from environment
6. Every observation compression and session summary uses API credits
7. User receives unexpected invoice for API usage
---
## Technical Details
### How claude-mem Uses the Claude Agent SDK
Claude-mem uses `@anthropic-ai/claude-agent-sdk` (version ^0.1.76) for AI-powered operations:
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker/SDKAgent.ts`
```typescript
// Line 26
import { query } from '@anthropic-ai/claude-agent-sdk';
// Lines 100-114 - SDK query execution
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
### API Key Discovery Chain
The Claude Agent SDK uses a standard discovery mechanism:
1. **Environment Variable:** `process.env.ANTHROPIC_API_KEY`
2. **File Discovery:** `~/.anthropic/api_key` or project `.env` files
3. **Inherited Environment:** Claude Code passes its environment to spawned processes
**From hooks architecture documentation (line 826-828):**
```markdown
### API Key Protection
**Configuration:**
- Anthropic API key in `~/.anthropic/api_key` or `ANTHROPIC_API_KEY` env var
- Worker inherits environment from Claude Code
- Never logged or stored in database
```
### Worker Service Environment Inheritance
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker-service.ts`
```typescript
// Line 263 - Worker spawns with full environment
env: process.env
```
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/infrastructure/ProcessManager.ts`
```typescript
// Line 273 - Process inherits environment
...process.env,
```
This means any `ANTHROPIC_API_KEY` present in the parent process environment or discovered from `.env` files will be used by the worker.
### What Operations Consume API Credits
| Operation | Trigger | API Usage |
|-----------|---------|-----------|
| Observation Compression | PostToolUse hook | ~0.5-2K tokens per observation |
| Session Summary | Summary hook | ~2-5K tokens per session |
| Follow-up Queries | Multi-turn processing | Variable |
**Estimated Usage Per Session:**
- Active coding session: 20-50 tool uses
- At ~1.5K tokens per observation: 30-75K tokens
- Session summary: ~3K tokens
- **Total per session: 33-78K tokens**
### Alternative Providers (Not Using Anthropic API)
Claude-mem supports alternative AI providers that DO NOT use the Anthropic API:
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker/GeminiAgent.ts`
```typescript
// Line 376
const apiKey = settings.CLAUDE_MEM_GEMINI_API_KEY || process.env.GEMINI_API_KEY || '';
```
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker/OpenRouterAgent.ts`
```typescript
// Line 418
const apiKey = settings.CLAUDE_MEM_OPENROUTER_API_KEY || process.env.OPENROUTER_API_KEY || '';
```
These providers require explicit configuration and do not auto-discover.
---
## Impact Assessment
### Financial Impact
| Scenario | Estimated Monthly Cost |
|----------|------------------------|
| Light usage (5 sessions/day) | $10-30 |
| Moderate usage (15 sessions/day) | $30-90 |
| Heavy usage (30+ sessions/day) | $90-200+ |
**Compounding Factors:**
- Auto-recharge enabled by default on Anthropic accounts
- No notification before charges occur
- User may not realize plugin is source of usage
### User Experience Impact
1. **Trust Violation:** Users expect plugins to be transparent about costs
2. **Subscription Confusion:** Claude Max subscription doesn't cover SDK API usage
3. **No Consent:** API key used without explicit opt-in
4. **Discovery Difficulty:** Source of charges not immediately obvious
### Affected User Base
- Users with `ANTHROPIC_API_KEY` in project `.env` files
- Users with API key in `~/.anthropic/api_key`
- Users who exported `ANTHROPIC_API_KEY` for other tools
- Users who don't know they have an API key configured
---
## Root Cause Analysis
### Primary Root Cause
**Silent API key auto-discovery by the Claude Agent SDK without user consent or notification.**
The SDK is designed for developer use cases where explicit API key configuration is expected. When used within a plugin context, the automatic discovery behavior creates a mismatch between user expectations and system behavior.
### Contributing Factors
1. **No Pre-Flight Check:** Plugin doesn't warn users that it will use API credits
2. **No Opt-In Flow:** API key usage happens automatically without consent
3. **No Usage Visibility:** No way to see API consumption before it happens
4. **Documentation Gap:** Not clearly documented that separate API credits are used
5. **Provider Default:** Default provider is 'claude' which uses Anthropic API
### Why This Wasn't Caught Earlier
- Developer testing uses API keys intentionally
- Claude Max subscription model is newer
- Auto-discovery is a feature for SDK users, not plugin users
- No telemetry on API key discovery
---
## Recommended Solutions
### Immediate Fixes (v9.0.1)
#### 1. Add First-Run Warning
Display a prominent warning on first plugin activation:
```
[claude-mem] IMPORTANT: This plugin uses the Claude Agent SDK for AI operations.
If you have an ANTHROPIC_API_KEY configured, it will be used for:
- Observation compression
- Session summaries
This may incur separate API charges beyond your Claude Max subscription.
To avoid charges, configure an alternative provider in ~/.claude-mem/settings.json:
- Set CLAUDE_MEM_PROVIDER to "gemini" or "openrouter"
- Or ensure no ANTHROPIC_API_KEY is accessible to the plugin
Continue? [Y/n]
```
#### 2. Detect and Warn About API Key
Add a check during worker initialization:
```typescript
// Pseudo-code for worker-service.ts
const hasAnthropicKey = !!(
process.env.ANTHROPIC_API_KEY ||
existsSync(join(homedir(), '.anthropic', 'api_key'))
);
if (hasAnthropicKey && provider === 'claude') {
logger.warn('SYSTEM',
'ANTHROPIC_API_KEY detected. Plugin AI operations will consume API credits. ' +
'Configure CLAUDE_MEM_PROVIDER in settings.json to use a free alternative.'
);
}
```
#### 3. Default to Free Provider
Change default provider from 'claude' to 'gemini' (free tier available):
**File:** `src/shared/SettingsDefaultsManager.ts`
```typescript
// Line 66 - Change default
CLAUDE_MEM_PROVIDER: 'gemini', // Changed from 'claude' - free tier by default
```
### Medium-Term Solutions (v9.1.0)
#### 4. Opt-In API Key Usage
Require explicit configuration to use Anthropic API:
```json
// ~/.claude-mem/settings.json
{
"CLAUDE_MEM_PROVIDER": "claude",
"CLAUDE_MEM_ANTHROPIC_API_KEY_CONSENT": true // New required field
}
```
#### 5. Usage Estimation Before Processing
Show estimated token usage before processing:
```
[claude-mem] Processing 25 observations
Estimated API usage: ~37,500 tokens (~$0.15)
Provider: claude (ANTHROPIC_API_KEY)
```
#### 6. Environment Isolation
Prevent automatic API key inheritance:
```typescript
// In worker spawn
env: {
...process.env,
ANTHROPIC_API_KEY: undefined, // Explicitly unset unless opted-in
}
```
### Long-Term Solutions (v10.0.0)
#### 7. Built-In Usage Dashboard
Add API usage tracking to the viewer UI at http://localhost:37777:
- Total tokens consumed this session/day/month
- Estimated costs by provider
- Warning thresholds
#### 8. Provider Configuration Wizard
First-run wizard in viewer UI:
1. "Choose your AI provider for memory operations"
2. Options: Free (Gemini), Pay-per-use (Claude/OpenRouter), Self-hosted
3. Configure API keys through UI, not discovery
---
## Priority/Severity Assessment
### Severity: HIGH
**Rationale:**
- Direct financial impact on users
- Trust violation in plugin ecosystem
- No user consent for charges
- Difficult to discover source of charges
- Affects users who believed Claude Max covered all costs
### Priority: P1 - Critical
**Rationale:**
- Active financial harm to users
- Reputation risk for plugin
- Simple fixes available
- User trust requires immediate action
### Recommended Timeline
| Milestone | Target | Description |
|-----------|--------|-------------|
| Hotfix | 48 hours | Add warning message, update docs |
| v9.0.1 | 1 week | Detection, warning, default provider change |
| v9.1.0 | 2 weeks | Opt-in flow, usage estimation |
| v10.0.0 | 1 month | Full usage dashboard, configuration wizard |
---
## Files to Modify
| File | Change |
|------|--------|
| `src/services/worker-service.ts` | Add API key detection and warning |
| `src/shared/SettingsDefaultsManager.ts` | Change default provider to 'gemini' |
| `plugin/scripts/context-hook.js` | Add first-run warning |
| `docs/public/installation.mdx` | Document API key usage clearly |
| `docs/public/configuration.mdx` | Add provider selection guidance |
| `CHANGELOG.md` | Document the change |
---
## Testing Recommendations
### Test Cases to Add
1. **API Key Detection Test:** Verify warning appears when ANTHROPIC_API_KEY present
2. **Default Provider Test:** Ensure new installs default to gemini
3. **Opt-In Test:** Verify claude provider requires explicit consent
4. **Environment Isolation Test:** Confirm API key not inherited without consent
### Manual Testing
```bash
# Test 1: Clean environment (should default to gemini)
unset ANTHROPIC_API_KEY
claude # Start Claude Code with plugin
# Test 2: With API key (should show warning)
export ANTHROPIC_API_KEY="sk-test-key"
claude # Should display warning
# Test 3: Explicit opt-in
# Configure settings.json with consent flag
claude # Should use claude provider without warning
```
---
## Conclusion
Issue #588 represents a critical UX and financial issue where the plugin's use of the Claude Agent SDK results in unexpected API charges for users who have an `ANTHROPIC_API_KEY` configured. The auto-discovery behavior, while useful for developers, creates a poor user experience for plugin users who expect their Claude Max subscription to cover all costs.
**Immediate Action Required:**
1. Release hotfix with warning message
2. Update documentation to clearly state API usage
3. Change default provider to free tier (gemini)
4. Implement opt-in consent for Anthropic API usage
**The fix is straightforward, but the impact on user trust requires prompt action.**
---
## Appendix: Related Issues and Documentation
| Resource | Description |
|----------|-------------|
| [Claude Agent SDK Docs](https://docs.anthropic.com/claude/docs/agent-sdk) | SDK documentation |
| `docs/public/hooks-architecture.mdx` | Hooks and API key documentation |
| `docs/public/configuration.mdx` | Settings configuration reference |
| Issue #511 | Related: Gemini model support |
| Issue #527 | Related: Provider detection issues |
---
## Appendix: User Communication Template
**Suggested Announcement/Changelog Entry:**
```markdown
## Important Notice for v9.0.1
### API Key Usage Disclosure
Claude-mem uses AI for observation compression and session summaries.
If you have an `ANTHROPIC_API_KEY` configured (in ~/.anthropic/api_key,
environment variables, or project .env files), the plugin will use
Anthropic API credits for these operations.
**This is separate from your Claude Max subscription.**
### Changes in v9.0.1
- **Default provider changed to Gemini** (free tier available)
- **Warning displayed** when ANTHROPIC_API_KEY is detected
- **Opt-in required** to use Anthropic API for plugin operations
### For existing users
If you experienced unexpected charges:
1. Check your provider setting: `~/.claude-mem/settings.json`
2. Set `CLAUDE_MEM_PROVIDER` to `"gemini"` for free operation
3. Or remove/unset `ANTHROPIC_API_KEY` if not needed for other tools
We apologize for any confusion or unexpected charges caused by this behavior.
```

View File

@@ -0,0 +1,411 @@
# Issue #590: Blank Terminal Window Pops Up on Windows When Chroma MCP Server Starts
**Date:** 2026-01-07
**Issue Author:** dwd898
**Severity:** Medium (UX disruption, not a functional failure)
**Status:** OPEN - Root cause confirmed, multiple solutions proposed
---
## 1. Executive Summary
On Windows 11, when claude-mem starts the Chroma MCP server via `uvx`, a blank terminal window (Windows Terminal / PowerShell) appears and does not close automatically. Users must manually close this window each time, which disrupts the workflow.
The root cause is that the MCP SDK's `StdioClientTransport` class does not pass the `windowsHide: true` option to the underlying `child_process.spawn()` call. While the claude-mem codebase attempts to set this option, it has no effect because the MCP SDK ignores it.
This issue affects all Windows users who have ChromaDB vector search enabled (the default configuration).
---
## 2. Problem Analysis
### 2.1 User-Reported Symptoms
- A blank terminal window appears when any action triggers Chroma initialization
- The window shows the `uvx.exe` path but contains no output
- The window remains open until manually closed by the user
- This occurs every time ChromaDB is initialized (typically once per Claude session)
### 2.2 Environment Details
| Component | Value |
|-----------|-------|
| OS | Windows 11 64-bit |
| Terminal | PowerShell 7.6.0-preview.6 |
| claude-mem version | 9.0.0 |
| uvx location | `C:\Users\Dell\AppData\Local\Microsoft\WinGet\Links\uvx.exe` |
| MCP SDK version | ^1.25.1 |
### 2.3 Trigger Conditions
The terminal popup occurs when:
1. Claude Code starts a new session with claude-mem enabled
2. A search query is executed with semantic search enabled
3. The ChromaSync service initializes for the first time in a session
4. Any backfill operation triggers Chroma connection
---
## 3. Technical Details
### 3.1 Affected Code Location
**File:** `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/sync/ChromaSync.ts`
**Lines:** 106-124
```typescript
const transportOptions: any = {
command: 'uvx',
args: [
'--python', pythonVersion,
'chroma-mcp',
'--client-type', 'persistent',
'--data-dir', this.VECTOR_DB_DIR
],
stderr: 'ignore'
};
// CRITICAL: On Windows, try to hide console window to prevent PowerShell popups
// Note: windowsHide may not be supported by MCP SDK's StdioClientTransport
if (isWindows) {
transportOptions.windowsHide = true;
logger.debug('CHROMA_SYNC', 'Windows detected, attempting to hide console window', { project: this.project });
}
this.transport = new StdioClientTransport(transportOptions);
```
### 3.2 Why windowsHide Fails
The `StdioClientTransport` class from `@modelcontextprotocol/sdk` accepts configuration options but does **not** forward `windowsHide` to `child_process.spawn()`. The SDK's transport implementation only uses a subset of spawn options:
- `command` - The executable to run
- `args` - Command line arguments
- `env` - Environment variables (optional)
- `stderr` - Stderr handling mode
The `windowsHide` option is silently ignored because it's not part of the SDK's expected interface.
### 3.3 MCP SDK Transport Architecture
```
ChromaSync.ts
|
v
StdioClientTransport (MCP SDK)
|
v
child_process.spawn() [internal to SDK]
|
v
uvx.exe subprocess
|
v
chroma-mcp Python process
```
The SDK controls the spawn call, so claude-mem cannot directly influence the spawn options.
### 3.4 Comparison with Other Subprocess Calls
Other parts of claude-mem successfully hide Windows console windows because they use `child_process.spawn()` directly:
| Component | File | Uses windowsHide | Works on Windows |
|-----------|------|------------------|------------------|
| ProcessManager | `ProcessManager.ts:271` | Yes (direct spawn) | Yes |
| SDKAgent | `SDKAgent.ts:379` | Yes (direct spawn) | Yes |
| BranchManager | `BranchManager.ts:61,88` | Yes (direct spawn) | Yes |
| shared/paths | `paths.ts:103` | Yes (direct spawn) | Yes |
| ChromaSync | `ChromaSync.ts:120` | Yes (via SDK - ignored) | **No** |
---
## 4. Impact Assessment
### 4.1 Affected Users
- All Windows users with ChromaDB enabled (default)
- Approximately 100% of Windows user base
### 4.2 Severity Breakdown
| Aspect | Impact |
|--------|--------|
| Functionality | No impact - Chroma works correctly |
| UX Disruption | Medium - Requires manual window close |
| Workflow Impact | Low - One-time per session |
| Data Integrity | None |
| Security | None |
### 4.3 Workaround Availability
**Current Workaround:** Users can manually close the terminal window. The Chroma process continues running in the background even after the window is closed.
---
## 5. Root Cause Analysis
### 5.1 Primary Cause
The MCP SDK's `StdioClientTransport` class does not implement support for the `windowsHide` spawn option. This is a limitation in the SDK, not a bug in claude-mem.
### 5.2 SDK Gap Analysis
The MCP SDK (version 1.25.1) provides a transport abstraction layer but does not expose all Node.js spawn options. The `StdioClientTransport` constructor signature accepts:
```typescript
interface StdioClientTransportOptions {
command: string;
args?: string[];
env?: NodeJS.ProcessEnv;
stderr?: 'inherit' | 'pipe' | 'ignore';
}
```
Notable missing options:
- `windowsHide`
- `detached`
- `cwd`
- `shell`
### 5.3 Historical Context
The claude-mem codebase has extensively addressed Windows console popup issues in other areas:
- **December 4, 2025:** Added `windowsHide` parameter to ProcessManager
- **December 17, 2025:** PR #378 standardized `windowsHide: true` across all direct spawn calls
- **Known Issue:** The comment in ChromaSync.ts (line 118) explicitly acknowledges this limitation
---
## 6. Recommended Solutions
### 6.1 Solution 1: PowerShell Wrapper (Recommended Short-Term)
**Approach:** Wrap the `uvx` command in a PowerShell invocation that hides the window.
**Implementation:**
```typescript
const transportOptions: any = {
command: 'powershell',
args: [
'-WindowStyle', 'Hidden',
'-Command',
`uvx --python ${pythonVersion} chroma-mcp --client-type persistent --data-dir '${this.VECTOR_DB_DIR}'`
],
stderr: 'ignore'
};
```
**Pros:**
- No SDK changes required
- Immediate fix possible
- Pattern already used in worker-cli.js (lines 1-19)
**Cons:**
- Adds PowerShell dependency (already required for Windows)
- Slightly more complex command construction
- PATH escaping considerations
**Estimated Effort:** 2-4 hours
### 6.2 Solution 2: Custom Transport Layer
**Approach:** Bypass `StdioClientTransport` and implement a custom transport using `child_process.spawn()` directly.
**Implementation:**
```typescript
import { Transport } from '@modelcontextprotocol/sdk/shared/transport.js';
import { spawn, ChildProcess } from 'child_process';
class WindowsHiddenStdioTransport implements Transport {
private process: ChildProcess;
constructor(options: TransportOptions) {
this.process = spawn(options.command, options.args, {
windowsHide: true,
stdio: ['pipe', 'pipe', options.stderr === 'ignore' ? 'ignore' : 'pipe']
});
}
// ... implement Transport interface
}
```
**Pros:**
- Full control over spawn options
- Clean, maintainable solution
- Reusable for other MCP clients
**Cons:**
- Requires implementing Transport interface
- Must handle stdin/stdout piping manually
- More complex error handling
**Estimated Effort:** 8-16 hours
### 6.3 Solution 3: Upstream SDK Enhancement
**Approach:** Request the MCP SDK team to add `windowsHide` support to `StdioClientTransport`.
**Implementation:**
1. Open issue on MCP SDK repository
2. Propose API extension: `spawnOptions?: Partial<SpawnOptions>`
3. Provide PR if accepted
**Pros:**
- Fixes the root cause
- Benefits all MCP SDK users on Windows
- No workarounds needed
**Cons:**
- Depends on external team
- Uncertain timeline
- May require SDK version bump
**Estimated Effort:** Variable (depends on upstream response)
### 6.4 Solution 4: VBS Wrapper Script
**Approach:** Use a Windows Script Host (VBS) file to launch the process silently.
**Implementation:**
Create `launch-chroma.vbs`:
```vbs
Set WshShell = CreateObject("WScript.Shell")
WshShell.Run "uvx --python 3.13 chroma-mcp --client-type persistent --data-dir " & DataDir, 0, False
```
**Pros:**
- Guaranteed hidden window
- Works on all Windows versions
**Cons:**
- Requires additional script file
- Complex path handling
- VBS is deprecated technology
**Estimated Effort:** 4-6 hours
---
## 7. Priority/Severity Assessment
### 7.1 Severity Matrix
| Factor | Rating | Justification |
|--------|--------|---------------|
| User Impact | Medium | Annoying but not blocking |
| Frequency | Low | Once per session |
| Workaround | Yes | Close window manually |
| Data Risk | None | No data loss or corruption |
| Security Risk | None | No security implications |
### 7.2 Recommended Priority
**Priority: P2 (Medium)**
This issue should be addressed in the next minor release but is not urgent enough to warrant an immediate patch release.
### 7.3 Recommendation
Implement **Solution 1 (PowerShell Wrapper)** as an immediate fix for the next release. Simultaneously, open an upstream issue for **Solution 3** to address the root cause in the MCP SDK.
---
## 8. Related Issues and Context
### 8.1 Related GitHub Issues
| Issue | Title | Relationship |
|-------|-------|--------------|
| #367 | Console windows appearing during hook execution | Similar root cause |
| #517 | PowerShell `$_` escaping in Git Bash | Windows shell escaping |
| #555 | Windows hooks IPC issues | Windows platform challenges |
### 8.2 Related PRs
| PR | Title | Relevance |
|----|-------|-----------|
| #378 | Windows stabilization | Added windowsHide to other spawn calls |
| #372 | Worker wrapper architecture | Similar Windows console hiding approach |
### 8.3 Documentation
- Windows Woes Report: `/docs/reports/2026-01-06--windows-woes-comprehensive-report.md`
- Windows Troubleshooting: https://docs.claude-mem.ai/troubleshooting/windows-issues
---
## 9. Testing Recommendations
### 9.1 Test Cases
1. **Basic functionality:** Verify Chroma starts correctly with proposed fix
2. **Window visibility:** Confirm no terminal window appears
3. **Process lifecycle:** Ensure Chroma process terminates on worker shutdown
4. **Error handling:** Verify errors are properly captured despite hidden window
5. **PATH variations:** Test with uvx in different PATH locations
### 9.2 Test Environments
- Windows 11 with PowerShell 7.x
- Windows 11 with PowerShell 5.1
- Windows 10 with PowerShell 5.1
- Windows with Git Bash as default shell
---
## 10. Appendix
### 10.1 Current ChromaSync Connection Flow
```
1. ChromaSync.ensureConnection() called
2. Check if already connected
3. Load Python version from settings
4. Detect Windows platform
5. Set windowsHide: true (ineffective)
6. Create StdioClientTransport with uvx command
7. Connect MCP client to transport
-> POPUP APPEARS HERE
8. Mark as connected
```
### 10.2 PowerShell Command Pattern (from worker-cli.js)
The existing pattern for hidden PowerShell execution:
```typescript
const cmd = `Start-Process -FilePath '${escapedPath}' -ArgumentList '${args}' -WindowStyle Hidden`;
spawnSync("powershell", ["-Command", cmd], {
stdio: "pipe",
timeout: 10000,
windowsHide: true
});
```
### 10.3 MCP SDK Source Reference
The StdioClientTransport implementation in `@modelcontextprotocol/sdk` uses:
```typescript
this._process = spawn(command, args, {
env: this._env,
stdio: ['pipe', 'pipe', stderr]
});
```
Note the absence of `windowsHide` in the spawn options.
---
## 11. Revision History
| Date | Author | Changes |
|------|--------|---------|
| 2026-01-07 | Claude Opus 4.5 | Initial report |

View File

@@ -0,0 +1,423 @@
# Issue #591: OpenRouter Agent Fails to Capture memorySessionId for Empty Prompt History Sessions
**Report Date:** 2026-01-07
**Issue:** [#591](https://github.com/thedotmack/claude-mem/issues/591)
**Reporter:** cjdrilke
**Environment:** claude-mem 9.0.0, Provider: openrouter, Model: xiaomi/mimo-v2-flash:free, Platform: linux
---
## 1. Executive Summary
This issue describes a critical failure in the OpenRouter agent where it cannot store observations for sessions that have an empty prompt history (`prompt_counter = 0`). The error message "Cannot store observations: memorySessionId not yet captured" indicates that the `memorySessionId` is `null` when `processAgentResponse()` attempts to store observations.
**Key Finding:** Unlike the Claude SDK Agent which captures `memorySessionId` from SDK response messages, the OpenRouter Agent has **no mechanism to capture or generate a memorySessionId**. This is a fundamental architectural gap that causes all OpenRouter sessions to fail on their first observation.
**Severity:** Critical
**Priority:** P1
**Impact:** All new OpenRouter sessions fail to store observations
---
## 2. Problem Analysis
### 2.1 Error Manifestation
```
Error: Cannot store observations: memorySessionId not yet captured
```
This error originates from `ResponseProcessor.ts` line 73-75:
```typescript
// CRITICAL: Must use memorySessionId (not contentSessionId) for FK constraint
if (!session.memorySessionId) {
throw new Error('Cannot store observations: memorySessionId not yet captured');
}
```
### 2.2 Affected Code Path
1. OpenRouter session starts via `OpenRouterAgent.startSession()`
2. Session is initialized with `memorySessionId: null`
3. OpenRouter API is queried and returns a response
4. `processAgentResponse()` is called with the response
5. **memorySessionId is still null** - no capture mechanism exists
6. Error thrown, observations not stored
### 2.3 Comparison with SDK Agent
The Claude SDK Agent successfully captures `memorySessionId` at `SDKAgent.ts` lines 120-141:
```typescript
// Capture memory session ID from first SDK message (any type has session_id)
if (!session.memorySessionId && message.session_id) {
session.memorySessionId = message.session_id;
// Persist to database for cross-restart recovery
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
// ... verification logging ...
}
```
**The OpenRouter Agent has no equivalent capture mechanism.**
---
## 3. Technical Details
### 3.1 Session ID Architecture
Claude-mem uses a dual session ID system (documented in `docs/SESSION_ID_ARCHITECTURE.md`):
| ID | Purpose | Source |
|----|---------|--------|
| `contentSessionId` | User's Claude Code conversation ID | Hook system |
| `memorySessionId` | Memory agent's internal session for resume | SDK response |
### 3.2 Session Initialization Flow
```
1. Hook creates session
createSDKSession(contentSessionId, project, prompt)
Database state:
├─ content_session_id: "user-session-123"
└─ memory_session_id: NULL (not yet captured)
2. SessionManager.initializeSession() creates ActiveSession:
session = {
sessionDbId: number,
contentSessionId: "user-session-123",
memorySessionId: null, // ← Critical: starts as null
...
}
```
### 3.3 OpenRouter Response Format
OpenRouter uses an OpenAI-compatible API response format:
```typescript
interface OpenRouterResponse {
choices?: Array<{
message?: {
role?: string;
content?: string;
};
finish_reason?: string;
}>;
usage?: {
prompt_tokens?: number;
completion_tokens?: number;
total_tokens?: number;
};
error?: {
message?: string;
code?: string;
};
}
```
**Critical Gap:** This response format does NOT include a `session_id` field. OpenRouter is a stateless API that does not maintain server-side session state.
### 3.4 Root Cause in OpenRouterAgent.ts
In `OpenRouterAgent.startSession()` (lines 85-133), the init response is processed:
```typescript
const initResponse = await this.queryOpenRouterMultiTurn(session.conversationHistory, apiKey, model, siteUrl, appName);
if (initResponse.content) {
// Add response to conversation history
session.conversationHistory.push({ role: 'assistant', content: initResponse.content });
// ... token tracking ...
// Process response using shared ResponseProcessor (no original timestamp for init - not from queue)
await processAgentResponse(
initResponse.content,
session, // ← memorySessionId is still null here
this.dbManager,
this.sessionManager,
worker,
tokensUsed,
null,
'OpenRouter',
undefined
);
}
```
**No memorySessionId capture occurs between session initialization and calling `processAgentResponse()`.**
---
## 4. Impact Assessment
### 4.1 Direct Impact
- **All OpenRouter sessions fail** when `prompt_counter = 0` (new sessions)
- No observations are stored for OpenRouter-based memory extraction
- Error prevents any memory from being captured via OpenRouter
### 4.2 Scope of Impact
| Affected | Not Affected |
|----------|--------------|
| All OpenRouter providers | Claude SDK Agent |
| All OpenRouter models | Gemini Agent (if implemented differently) |
| New sessions (prompt_counter = 0) | Potentially resumed sessions* |
*Note: Resumed sessions may work if they were previously processed by Claude SDK and have a captured `memorySessionId` from a fallback.
### 4.3 User Experience
Users configuring OpenRouter as their provider will:
1. See successful API calls to OpenRouter
2. Receive no stored observations
3. See error messages in logs about memorySessionId not captured
4. Have an empty memory database despite apparent processing
---
## 5. Root Cause Analysis
### 5.1 Primary Root Cause
**The OpenRouter Agent was implemented without a mechanism to generate or capture `memorySessionId`.**
Unlike the Claude SDK which returns a `session_id` in its response messages, OpenRouter's OpenAI-compatible API is stateless and does not provide session identifiers.
### 5.2 Contributing Factors
1. **Architectural Mismatch**: The `memorySessionId` concept was designed around the Claude SDK's session management, which OpenRouter does not have.
2. **Missing Initialization Logic**: Neither the OpenRouter agent nor the ResponseProcessor generates a `memorySessionId` when one is not provided by the API.
3. **Shared ResponseProcessor Assumption**: `ResponseProcessor.ts` assumes `memorySessionId` is always captured before it is called, which is true for Claude SDK but not for OpenRouter.
### 5.3 Why It Worked Before (Speculation)
This may have been masked if:
- OpenRouter fallback to Claude SDK triggered before the bug manifested
- Initial testing used existing sessions with previously captured `memorySessionId`
- The feature was added without comprehensive test coverage for new sessions
---
## 6. Recommended Solutions
### 6.1 Solution A: Generate memorySessionId for OpenRouter (Recommended)
Since OpenRouter is stateless, generate a unique `memorySessionId` when starting an OpenRouter session:
**Location:** `OpenRouterAgent.ts` in `startSession()` method, after session initialization
```typescript
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
try {
// Generate memorySessionId for stateless providers (OpenRouter doesn't have session tracking)
if (!session.memorySessionId) {
const generatedMemorySessionId = `openrouter-${session.contentSessionId}-${Date.now()}`;
session.memorySessionId = generatedMemorySessionId;
// Persist to database
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
generatedMemorySessionId
);
logger.info('SESSION', `MEMORY_ID_GENERATED | sessionDbId=${session.sessionDbId} | memorySessionId=${generatedMemorySessionId} | provider=OpenRouter`, {
sessionId: session.sessionDbId,
memorySessionId: generatedMemorySessionId
});
}
// ... rest of existing code ...
}
}
```
**Pros:**
- Minimal code changes
- Follows existing patterns
- Works with stateless APIs
- Maintains FK integrity
**Cons:**
- Memory session ID format differs from Claude SDK
- No resume capability (OpenRouter is stateless anyway)
### 6.2 Solution B: Use contentSessionId as memorySessionId for Stateless Providers
For stateless providers, use the `contentSessionId` directly as the `memorySessionId`:
```typescript
if (!session.memorySessionId) {
session.memorySessionId = session.contentSessionId;
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
session.contentSessionId
);
}
```
**Pros:**
- Simpler approach
- No additional ID generation
**Cons:**
- Violates the architectural principle that memorySessionId should differ from contentSessionId
- Could cause issues with session isolation (see SESSION_ID_ARCHITECTURE.md warnings)
### 6.3 Solution C: Allow null memorySessionId with Auto-Generation in ResponseProcessor
Modify `ResponseProcessor.ts` to generate a `memorySessionId` if one is not present:
```typescript
// In processAgentResponse():
if (!session.memorySessionId) {
const generatedId = `auto-${session.contentSessionId}-${Date.now()}`;
session.memorySessionId = generatedId;
// Persist to database
dbManager.getSessionStore().updateMemorySessionId(session.sessionDbId, generatedId);
logger.info('DB', `AUTO_GENERATED_MEMORY_ID | sessionDbId=${session.sessionDbId} | memorySessionId=${generatedId}`);
}
```
**Pros:**
- Works for any agent type
- Single point of fix
**Cons:**
- ResponseProcessor takes on responsibilities it shouldn't have
- Less explicit about provider behavior
### 6.4 Recommended Approach
**Solution A** is recommended because:
1. It explicitly handles the stateless nature of OpenRouter
2. It follows the existing pattern established by Claude SDK Agent
3. It keeps the memorySessionId generation in the agent where provider-specific logic belongs
4. It maintains clear separation of concerns
---
## 7. Priority/Severity Assessment
### 7.1 Severity Matrix
| Factor | Assessment |
|--------|------------|
| **Data Loss** | High - All observations lost for OpenRouter sessions |
| **Functionality** | Complete - OpenRouter provider is non-functional |
| **Workaround** | Exists - Use Claude SDK or Gemini providers |
| **Affected Users** | Subset - Only OpenRouter users |
| **Regression** | Unknown - May be present since OpenRouter was added |
### 7.2 Priority Assignment
**Priority: P1 (High)**
Rationale:
- Complete feature failure for affected configuration
- Users who choose OpenRouter are completely blocked
- Fix is straightforward with low regression risk
### 7.3 Recommended Timeline
| Action | Timeline |
|--------|----------|
| Hotfix development | 1-2 hours |
| Testing | 1 hour |
| Code review | 30 minutes |
| Release | Same day |
---
## 8. Testing Recommendations
### 8.1 Unit Tests to Add
```typescript
// tests/worker/openrouter-agent.test.ts
describe('OpenRouterAgent memorySessionId handling', () => {
it('should generate memorySessionId when session has none', async () => {
const session = createMockSession({
memorySessionId: null,
contentSessionId: 'test-content-123'
});
await openRouterAgent.startSession(session, mockWorker);
expect(session.memorySessionId).not.toBeNull();
expect(session.memorySessionId).toContain('openrouter-');
});
it('should persist generated memorySessionId to database', async () => {
const session = createMockSession({ memorySessionId: null });
await openRouterAgent.startSession(session, mockWorker);
expect(mockDbManager.getSessionStore().updateMemorySessionId)
.toHaveBeenCalledWith(session.sessionDbId, expect.any(String));
});
it('should not regenerate memorySessionId if already present', async () => {
const existingId = 'existing-memory-id';
const session = createMockSession({ memorySessionId: existingId });
await openRouterAgent.startSession(session, mockWorker);
expect(session.memorySessionId).toBe(existingId);
});
});
```
### 8.2 Integration Tests to Add
```typescript
describe('OpenRouter end-to-end observation storage', () => {
it('should successfully store observations for new OpenRouter sessions', async () => {
// Create new session via hook
const sessionDbId = createSDKSession(db, 'content-123', 'test-project', 'test prompt');
// Initialize and start OpenRouter agent
const session = sessionManager.initializeSession(sessionDbId);
await openRouterAgent.startSession(session, mockWorker);
// Verify observations were stored
const observations = db.prepare('SELECT * FROM observations WHERE memory_session_id = ?')
.all(session.memorySessionId);
expect(observations.length).toBeGreaterThan(0);
});
});
```
---
## 9. Related Files
| File | Relevance |
|------|-----------|
| `src/services/worker/OpenRouterAgent.ts` | Primary fix location |
| `src/services/worker/agents/ResponseProcessor.ts` | Error origin (line 73-75) |
| `src/services/worker/SessionManager.ts` | Session initialization |
| `src/services/worker/SDKAgent.ts` | Reference implementation for memorySessionId capture |
| `src/services/sqlite/sessions/create.ts` | Database session creation |
| `docs/SESSION_ID_ARCHITECTURE.md` | Architecture documentation |
| `tests/worker/agents/response-processor.test.ts` | Existing test coverage |
---
## 10. Conclusion
Issue #591 is a critical bug that renders the OpenRouter provider non-functional for new sessions. The root cause is a missing `memorySessionId` capture mechanism specific to stateless providers like OpenRouter.
The recommended fix is to generate a unique `memorySessionId` in `OpenRouterAgent.startSession()` before calling `processAgentResponse()`. This fix is straightforward, follows existing patterns, and carries low regression risk.
**Immediate Action Required:** Implement Solution A and release a hotfix.

View File

@@ -0,0 +1,417 @@
# Issue #596: ProcessTransport is not ready for writing - Generator aborted on every observation
**Date:** 2026-01-07
**Issue:** [#596](https://github.com/thedotmack/claude-mem/issues/596)
**Reported by:** soho-dev-account
**Severity:** Critical
**Status:** Under Investigation
**Labels:** bug
---
## 1. Executive Summary
After a clean install of claude-mem v9.0.0, the SDK agent aborts every observation with a "ProcessTransport is not ready for writing" error. The worker starts successfully and the HTTP API responds, but no observations are stored to the database. The error originates from the Claude Agent SDK's internal transport layer, specifically in the bundled `worker-service.cjs` at line 1119.
**Key Finding:** This is a race condition or timing issue in the Claude Agent SDK's ProcessTransport initialization. The SDK attempts to write messages to its subprocess transport before the transport's ready state is established.
**Impact:** Complete loss of memory functionality. The system appears operational but silently fails to capture any development context.
---
## 2. Problem Analysis
### 2.1 Symptoms
1. **Worker starts successfully** - No startup errors, HTTP endpoints respond
2. **Observations are queued** - HTTP 200 responses from `/api/sessions/observations`
3. **Generator aborts immediately** - Every queued message triggers generator abort
4. **No observations stored** - Database remains empty despite active usage
### 2.2 Error Signature
```
error: ProcessTransport is not ready for writing at write (/Users/.../worker-service.cjs:1119:5337)
```
### 2.3 Worker Logs Pattern
```
[INFO ] [SDK ] Starting SDK query...
[INFO ] [SDK ] Creating message generator...
[INFO ] [SESSION] [session-3458] Generator aborted
```
The log shows:
- SDK query starts (line 78-85 in SDKAgent.ts)
- Message generator created (line 266-272 in SDKAgent.ts)
- Generator aborts immediately (line 169 in SessionRoutes.ts)
The gap between "Creating message generator" and "Generator aborted" indicates the SDK's `query()` function throws before yielding any messages.
### 2.4 Environment Context
- **OS:** macOS 26.3, Apple Silicon
- **Bun:** 1.3.5
- **Node:** v22.21.1
- **Claude Code:** 2.0.75
- **claude-mem:** v9.0.0 (clean install)
---
## 3. Technical Details
### 3.1 ProcessTransport in the Agent SDK
The `ProcessTransport` class is part of the Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`), bundled into `worker-service.cjs` during the build process. This transport manages bidirectional IPC communication between:
1. **Parent process:** The claude-mem worker service
2. **Child process:** Claude Code subprocess spawned for SDK queries
The transport uses stdin/stdout pipes to exchange JSON messages with the Claude Code process.
### 3.2 The Ready State Problem
ProcessTransport maintains a `ready` state that gates write operations:
```javascript
// Approximate structure from bundled code
class ProcessTransport {
ready = false;
write(data) {
if (!this.ready) {
throw new Error("ProcessTransport is not ready for writing");
}
// ... actual write to subprocess stdin
}
async start() {
// Spawn subprocess
// Set up pipes
this.ready = true;
}
}
```
The error occurs when `write()` is called before `start()` completes, or when the transport initialization fails silently.
### 3.3 Code Flow Analysis
1. **Session initialization** (`SessionRoutes.ts:237-299`)
- HTTP request creates/fetches session
- Calls `startGeneratorWithProvider()`
2. **Generator startup** (`SessionRoutes.ts:118-217`)
- Sets `session.currentProvider`
- Calls `agent.startSession(session, worker)`
- Wraps in Promise with error/finally handlers
3. **SDK query invocation** (`SDKAgent.ts:102-114`)
```typescript
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
4. **SDK internal flow** (inside `query()`)
- Creates ProcessTransport
- Spawns Claude subprocess
- **RACE:** Attempts to write before ready
### 3.4 Abort Controller Signal Path
When ProcessTransport throws, the error propagates through:
1. `query()` async iterator throws
2. `for await` loop in `startSession()` exits
3. Generator promise rejects
4. SessionRoutes `.catch()` handler executes
5. Checks `session.abortController.signal.aborted`
6. Since not manually aborted, logs "Generator failed" at ERROR level
7. `.finally()` handler executes
8. Logs "Generator aborted" (misleading - it wasn't aborted, it crashed)
---
## 4. Impact Assessment
### 4.1 Functional Impact
| Component | Status | Notes |
|-----------|--------|-------|
| Worker startup | Working | HTTP server binds correctly |
| HTTP API | Working | Endpoints respond with 200 |
| Session creation | Working | Database rows created |
| Observation queueing | Working | Messages added to pending queue |
| SDK query | **Failing** | ProcessTransport error |
| Observation storage | **Failing** | No observations saved |
| Summary generation | **Failing** | Depends on working SDK |
| CLAUDE.md generation | **Partial** | No recent activity to show |
### 4.2 User Impact
- **100% loss of memory functionality** - No observations captured
- **Silent failure mode** - Worker appears healthy
- **Queue grows indefinitely** - Messages stuck in "processing"
- **No error visible to user** - Requires checking worker logs
### 4.3 System Recovery
After this failure:
1. Pending messages remain in database (crash-safe design)
2. On worker restart, messages are recoverable
3. If SDK issue is resolved, backlog will process
---
## 5. Root Cause Analysis
### 5.1 Primary Hypothesis: SDK Version Incompatibility
**Confidence: 85%**
The Claude Agent SDK version (`^0.1.76`) may have introduced changes to ProcessTransport initialization timing that conflict with how claude-mem invokes `query()`.
Evidence:
- v9.0.0 works for some users but fails for others
- Error occurs in SDK internals, not claude-mem code
- Similar timing issues seen in previous SDK versions
### 5.2 Alternative Hypothesis: Subprocess Spawn Race
**Confidence: 70%**
The Claude Code subprocess may fail to start or respond in time, causing the transport to remain in non-ready state.
Evidence:
- `pathToClaudeCodeExecutable` is auto-detected
- Different Claude Code versions may have different startup times
- Apple Silicon Bun may spawn processes differently
### 5.3 Alternative Hypothesis: Bun-Specific IPC Issue
**Confidence: 50%**
Bun's process spawning may handle stdin/stdout pipes differently than Node.js, causing transport initialization to fail.
Evidence:
- claude-mem runs under Bun
- Agent SDK may not be tested extensively with Bun runtime
- Bun 1.3.5 is relatively new
### 5.4 Related: Recent Version Mismatch Fix (#567)
Commit `e22e2bfc` fixed a version mismatch causing infinite worker restart loops. This touched:
- `plugin/package.json`
- `plugin/scripts/worker-service.cjs`
- Hook scripts
While this fix addressed restart loops, it may have introduced timing changes that expose this race condition.
---
## 6. Recommended Solutions
### 6.1 Immediate Workarounds
#### Option A: Retry with Backoff (Quick Fix)
Add retry logic around `query()` invocation:
```typescript
// SDKAgent.ts - wrap query() with retry
async function queryWithRetry(options: QueryOptions, maxRetries = 3): Promise<QueryResult> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return query(options);
} catch (error) {
if (error.message?.includes('ProcessTransport is not ready') && attempt < maxRetries - 1) {
await new Promise(resolve => setTimeout(resolve, 100 * (attempt + 1)));
continue;
}
throw error;
}
}
}
```
**Pros:** Quick to implement, may resolve timing-sensitive cases
**Cons:** Masks underlying issue, adds latency
#### Option B: Verify Claude Executable Before Query
Add explicit verification that Claude is responsive:
```typescript
// Before calling query()
const testResult = execSync(`${claudePath} --version`, { timeout: 5000 });
if (!testResult) {
throw new Error('Claude executable not responding');
}
```
**Pros:** Catches subprocess spawn failures early
**Cons:** Adds startup latency, doesn't address transport race
### 6.2 Medium-Term Fixes
#### Option C: Pin SDK Version
Lock to a known-working SDK version:
```json
{
"dependencies": {
"@anthropic-ai/claude-agent-sdk": "0.1.75"
}
}
```
**Pros:** Immediate resolution if regression confirmed
**Cons:** Misses security updates, may not match Claude Code version
#### Option D: Add Transport Ready Callback
Request SDK feature to expose transport ready state:
```typescript
// Hypothetical API
const queryResult = query({
prompt: messageGenerator,
options: { ... },
onTransportReady: () => logger.info('SDK', 'Transport ready')
});
```
**Pros:** Proper fix at SDK level
**Cons:** Requires SDK changes
### 6.3 Long-Term Solutions
#### Option E: V2 SDK Migration
The V2 SDK (`unstable_v2_createSession`) uses a different session-based architecture that may not have this race condition:
```typescript
await using session = unstable_v2_createSession({
model: 'claude-sonnet-4-5-20250929'
});
await session.send(initPrompt); // Explicit send/receive
for await (const msg of session.receive()) { ... }
```
**Pros:** Modern API, explicit lifecycle control
**Cons:** V2 is "unstable preview", requires significant refactor
#### Option F: Alternative Agent Provider
Use Gemini or OpenRouter as default when SDK fails:
```typescript
// SessionRoutes.ts - fallback logic
try {
await sdkAgent.startSession(session, worker);
} catch (error) {
if (error.message?.includes('ProcessTransport')) {
logger.warn('SESSION', 'SDK transport failed, falling back to Gemini');
await geminiAgent.startSession(session, worker);
}
}
```
**Pros:** System remains functional
**Cons:** Different model behavior, requires API key
---
## 7. Priority/Severity Assessment
### 7.1 Severity: Critical
| Criterion | Rating | Justification |
|-----------|--------|---------------|
| Functional Impact | Critical | Core feature completely broken |
| User Count | Unknown | Appears on clean installs |
| Data Loss | Low | No data corrupted, queue preserved |
| Recoverability | Medium | Worker restart may help |
| Workaround Available | Limited | Use alternative provider |
### 7.2 Priority: P0
This should be treated as a P0 (highest priority) issue because:
1. **Core functionality broken** - Memory capture is the primary feature
2. **Silent failure** - Users may not realize observations aren't being saved
3. **Clean install affected** - New users cannot use the product
4. **No easy workaround** - Requires code changes or provider switching
### 7.3 Recommended Action Plan
1. **Immediate (Day 1)**
- [ ] Reproduce issue in controlled environment
- [ ] Test with pinned SDK version 0.1.75
- [ ] Test with Node.js instead of Bun
- [ ] Add explicit error message to SessionRoutes for this failure mode
2. **Short-term (Week 1)**
- [ ] Implement retry logic (Option A)
- [ ] Add transport failure telemetry
- [ ] Document workaround in issue comments
- [ ] File SDK issue with Anthropic if confirmed regression
3. **Medium-term (Week 2-4)**
- [ ] Evaluate V2 SDK migration timeline
- [ ] Add graceful fallback to alternative providers
- [ ] Improve generator error visibility in viewer UI
---
## 8. Appendix
### 8.1 Related Files
| File | Relevance |
|------|-----------|
| `src/services/worker/SDKAgent.ts` | SDK query invocation |
| `src/services/worker/http/routes/SessionRoutes.ts` | Generator lifecycle management |
| `src/services/worker/SessionManager.ts` | Session state and queue management |
| `src/services/worker-types.ts` | ActiveSession type definition |
| `plugin/scripts/worker-service.cjs` | Bundled worker with SDK code |
### 8.2 Related Issues
- **#567** - Version mismatch causing infinite worker restart loop (may be related)
- **#520** - Stuck messages analysis (similar symptom pattern)
- **#532** - Memory leak analysis (generator lifecycle issues)
### 8.3 Related Documentation
- `docs/context/agent-sdk-v2-preview.md` - V2 SDK documentation
- `docs/context/agent-sdk-v2-examples.ts` - V2 SDK code examples
- `docs/reports/2026-01-02--generator-failure-investigation.md` - Previous generator failure analysis
### 8.4 Test Commands
```bash
# Check worker logs
tail -f ~/.claude-mem/logs/worker-$(date +%Y-%m-%d).log
# Check pending queue
npm run queue
# Restart worker
npm run worker:restart
# Test with specific SDK version
npm install @anthropic-ai/claude-agent-sdk@0.1.75
npm run build-and-sync
```
---
**Report prepared by:** Claude Code
**Analysis date:** 2026-01-07
**Next review:** After reproduction attempt

View File

@@ -0,0 +1,271 @@
# Issue #597: Too Many Bugs - Technical Analysis Report
**Date:** 2026-01-07
**Issue:** [#597](https://github.com/thedotmack/claude-mem/issues/597)
**Author:** TullyMonster
**Labels:** bug
**Status:** Open
---
## 1. Executive Summary
Issue #597 is a bug report from user TullyMonster containing four screenshots documenting various issues encountered over a two-day period. The report lacks textual description of the specific bugs, relying entirely on visual evidence. The user apologizes for the delay in reporting, stating the bugs significantly hampered their productivity.
**Key Limitation:** This analysis is constrained by the image-only nature of the report. The four screenshots (with dimensions 2560x1239, 1511x628, 2560x3585, and 1907x1109 pixels) cannot be programmatically analyzed to extract specific error messages or UI states.
**Community Confirmation:** Another user (ham-zax) commented agreeing with the severity: "yeah atleast make beta testing, it hampers productivity"
---
## 2. Problem Analysis
### 2.1 Contextual Analysis
Based on the timing (January 7, 2026), the user was running claude-mem v9.0.0, which was released on January 5, 2026. This version introduced the "Live Context System with Distributed CLAUDE.md Generation" (PR #556), a significant architectural change.
The same user (TullyMonster) also commented on Issue #596 with "same as you," indicating they experienced the ProcessTransport error that causes all observations to fail silently.
### 2.2 Related Issues Filed Same Day
| Issue | Title | Relevance |
|-------|-------|-----------|
| #596 | ProcessTransport is not ready for writing - Generator aborted | User confirmed experiencing this |
| #598 | Too many messages, polluting conversation history | UX issue with plugin messages |
| #602 | PostToolUse Error: worker-service.cjs failed to start | Worker startup failures on Windows |
### 2.3 Screenshot Analysis (Inferred)
Based on screenshot dimensions and the pattern of issues being reported on v9.0.0:
| Screenshot | Dimensions | Likely Content |
|------------|------------|----------------|
| Image 1 | 2560x1239 | Full-screen terminal/IDE showing errors |
| Image 2 | 1511x628 | Error message dialog or log output |
| Image 3 | 2560x3585 | Very tall - likely scrolling log output or multiple stacked errors |
| Image 4 | 1907x1109 | Terminal or UI showing bug manifestation |
---
## 3. Technical Details
### 3.1 Probable Bug Categories (v9.0.0)
Based on the cluster of issues around this time, the bugs likely fall into these categories:
#### A. ProcessTransport Failures (High Probability)
```
error: ProcessTransport is not ready for writing
at write (/Users/.../worker-service.cjs:1119:5337)
at streamInput (/Users/.../worker-service.cjs:1122:1041)
```
- Every observation fails with "Generator aborted"
- Queue depth accumulates (87+ unprocessed items)
- Worker UI works, but no observations are stored
#### B. Worker Startup Failures (Moderate Probability)
```
[ERROR] [HOOK] save-hook failed Worker did not become ready within 15 seconds. (port 37777)
[ERROR] [SYSTEM] Worker failed to start (health check timeout)
[ERROR] [SYSTEM] Failed to start server. Is port 37777 in use?
```
#### C. Session/Memory Issues (Moderate Probability)
```
[ERROR] Cannot store observations: memorySessionId not yet captured
[WARN] Generator exited unexpectedly
```
#### D. Conversation Pollution (Possible)
Multiple "Hello memory agent" messages appearing in conversation history, disrupting workflow.
### 3.2 Environment Assumptions
Based on the user's participation in Issue #596 (macOS focus) and screenshot dimensions:
- **OS:** Likely macOS (high-resolution display)
- **Version:** v9.0.0 (released Jan 5, 2026)
- **Runtime:** Bun 1.3.x
---
## 4. Impact Assessment
### 4.1 User Impact
| Impact Area | Severity | Description |
|-------------|----------|-------------|
| Productivity | **Critical** | User spent 2 days dealing with bugs instead of coding |
| Data Loss | **High** | Observations not being stored (ProcessTransport issue) |
| Workflow Disruption | **High** | Multiple bugs compounding the problem |
| User Trust | **Medium** | User apologizes for delay, showing frustration |
### 4.2 Broader Impact
The community response indicates this is not an isolated incident:
- ham-zax: "yeah atleast make beta testing, it hampers productivity"
- Multiple users on #596: "same as you", "same 3"
This suggests v9.0.0 has significant stability issues affecting multiple users.
---
## 5. Root Cause Analysis
### 5.1 Likely Root Causes
#### Primary: ProcessTransport Race Condition
The Claude Agent SDK's ProcessTransport class attempts to write to stdin before the spawned process is ready. This is a timing/race condition that manifests inconsistently.
**Evidence:**
- Clean installs affected
- Both Bun 1.3.4 and 1.3.5 affected
- Prompts ARE recorded correctly, only SDK agent fails
#### Secondary: Version 9.0.0 Regression
PR #556 introduced significant changes to the Live Context System, which may have:
1. Introduced new race conditions
2. Affected worker lifecycle management
3. Changed timing of critical initialization steps
#### Tertiary: Platform-Specific Issues
Windows users experiencing additional problems:
- `wmic` command not recognized (newer Windows versions)
- Port binding conflicts
- PowerShell variable escaping in Git Bash
### 5.2 Contributing Factors
| Factor | Description |
|--------|-------------|
| Rapid releases | v8.5.10 to v9.0.0 in 2 days |
| Complex architecture | 5 lifecycle hooks, async worker, SDK integration |
| Limited beta testing | Community comment suggests need for beta channel |
| Platform diversity | macOS, Windows, Linux all have different issues |
---
## 6. Recommended Solutions
### 6.1 Immediate Actions (For User)
1. **Request Clarification** - Post a comment asking:
```
@TullyMonster Thank you for the detailed screenshots! To help us
investigate these issues more effectively, could you please provide:
1. Which specific errors/behaviors are shown in each screenshot?
2. Your environment (OS, claude-mem version, Bun version)?
3. Relevant log entries from ~/.claude-mem/logs/worker-YYYY-MM-DD.log?
4. Steps to reproduce any of these issues?
We've identified several related issues (#596, #598, #602) and want to
ensure we're addressing your specific problems.
```
2. **Verify Version** - Confirm user is on v9.0.0
3. **Link Related Issues** - Cross-reference with:
- #596 (ProcessTransport)
- #598 (message pollution)
- #602 (worker startup)
### 6.2 Technical Fixes (For Maintainers)
| Priority | Fix | Issue |
|----------|-----|-------|
| P0 | Fix ProcessTransport race condition | #596 |
| P1 | Improve worker startup reliability | #602 |
| P2 | Reduce conversation pollution | #598 |
| P3 | Add better error recovery | General |
### 6.3 Process Improvements
1. **Beta Channel** - Consider a beta release channel for major versions
2. **Automated Testing** - Expand CI to catch lifecycle issues
3. **Error Reporting** - Add structured error logging that's easier to share
4. **Bug Report Template** - Update template to encourage log submission
---
## 7. Priority/Severity Assessment
### 7.1 Individual Issue Severity
| Aspect | Rating | Justification |
|--------|--------|---------------|
| Frequency | High | Multiple users affected |
| Impact | Critical | Complete workflow disruption |
| Urgency | High | Blocking user productivity |
| Complexity | Medium | Root causes identified in related issues |
### 7.2 Overall Priority
**Priority: P1 - High**
**Rationale:**
- User lost 2 days of productivity
- Multiple corroborating reports from community
- v9.0.0 appears to have introduced regressions
- Plugin is actively harming user experience rather than helping
### 7.3 Recommended Triage
1. **Consolidate** - This issue likely duplicates #596, #602, and/or #598
2. **Request Details** - Ask user to specify which screenshots map to which issues
3. **Consider Rollback** - If issues persist, consider advising users to downgrade to v8.5.10
4. **Hotfix** - Prioritize a v9.0.1 release addressing ProcessTransport issue
---
## 8. Appendix
### 8.1 Related Issues Timeline
| Date | Issue | Event |
|------|-------|-------|
| Jan 5 | - | v9.0.0 released |
| Jan 6 | #571 | "Cannot store observations" |
| Jan 6 | #573 | "bun does not auto install" |
| Jan 7 01:10 | #588 | API key cost warning |
| Jan 7 10:17 | #596 | ProcessTransport failures |
| Jan 7 13:09 | #597 | This issue |
| Jan 7 14:08 | #598 | Conversation pollution |
| Jan 7 18:13 | #602 | Worker startup failures |
### 8.2 Screenshot Metadata
| # | Dimensions | Aspect Ratio | Notes |
|---|------------|--------------|-------|
| 1 | 2560x1239 | 2.07:1 | Wide monitor screenshot |
| 2 | 1511x628 | 2.41:1 | Cropped dialog/window |
| 3 | 2560x3585 | 0.71:1 | Tall scrolling capture |
| 4 | 1907x1109 | 1.72:1 | Standard window capture |
### 8.3 Version History
| Version | Date | Notable Changes |
|---------|------|-----------------|
| v8.5.10 | Jan 5 | Pre-v9 stable |
| v9.0.0 | Jan 5 | Live Context System (PR #556) |
| v9.0.0+ | Jan 7 | Version mismatch fix (PR #567) |
---
## 9. Conclusion
Issue #597 represents user frustration with multiple bugs encountered in claude-mem v9.0.0. While the image-only report makes specific diagnosis difficult, contextual analysis strongly suggests the user experienced:
1. ProcessTransport failures causing observation loss (#596)
2. Possibly worker startup issues (#602)
3. Possibly conversation pollution (#598)
**Recommended Next Steps:**
1. Request additional details from the user
2. Link this issue to #596 as likely duplicate/related
3. Prioritize v9.0.1 hotfix for ProcessTransport issue
4. Consider implementing a beta testing channel for major releases
---
*Report generated: 2026-01-07*
*Analysis based on: GitHub issue data, related issues, commit history, and contextual inference*

View File

@@ -0,0 +1,365 @@
# Issue #598: Conversation History Pollution - Technical Analysis Report
**Issue:** Too many messages, polluting my conversation history
**Author:** abhijit8ganguly-afk
**Created:** 2026-01-07
**Labels:** bug
**Report Date:** 2026-01-07
---
## 1. Executive Summary
Users are experiencing conversation history pollution when using `/resume` in Claude Code. Plugin-generated messages starting with "Hello memory agent" appear in the user's conversation history, making it difficult to resume sessions. This issue stems from a fundamental architectural concern: the Claude Agent SDK's resume mechanism appears to inject messages into the user's transcript when the `resume` parameter is passed with the user's `contentSessionId` instead of the plugin's separate `memorySessionId`.
**Key Finding:** The plugin maintains two separate session IDs for isolation:
- `contentSessionId`: The user's Claude Code session (what appears in `/resume`)
- `memorySessionId`: The plugin's internal SDK session (should NEVER appear in user's history)
When these become conflated, plugin messages pollute the user's conversation history.
---
## 2. Problem Analysis
### 2.1 User-Reported Symptoms
When using `/resume`, users see multiple messages starting with "Hello memory agent" appearing in their conversation history. These messages are internal to the claude-mem plugin and should be invisible to users.
### 2.2 Source of "Hello memory agent" Messages
The message originates from the mode configuration file at `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/plugin/modes/code.json`:
```json
{
"prompts": {
"continuation_greeting": "Hello memory agent, you are continuing to observe the primary Claude session.",
"continuation_instruction": "IMPORTANT: Continue generating observations from tool use messages using the XML structure below."
}
}
```
This greeting is injected via `buildContinuationPrompt()` in `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/sdk/prompts.ts`:
```typescript
export function buildContinuationPrompt(userPrompt: string, promptNumber: number, contentSessionId: string, mode: ModeConfig): string {
return `${mode.prompts.continuation_greeting}
...
```
### 2.3 When These Messages Appear
The continuation prompt is used when `session.lastPromptNumber > 1` (any prompt after the initial session start). This is controlled in `SDKAgent.ts`:
```typescript
const initPrompt = isInitPrompt
? buildInitPrompt(session.project, session.contentSessionId, session.userPrompt, mode)
: buildContinuationPrompt(session.userPrompt, session.lastPromptNumber, session.contentSessionId, mode);
```
---
## 3. Technical Details
### 3.1 Session ID Architecture
The plugin uses a dual session ID system to maintain isolation between user conversations and plugin operations:
| Session ID | Purpose | Storage | Should Appear in /resume |
|------------|---------|---------|--------------------------|
| `contentSessionId` | User's Claude Code session | From hook context | Yes - this IS the user's session |
| `memorySessionId` | Plugin's internal SDK session | Captured from SDK responses | **NEVER** |
**Critical Code Comments from `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/sqlite/SessionStore.ts`:**
```typescript
// NOTE: memory_session_id starts as NULL. It is captured by SDKAgent from the first SDK
// response and stored via updateMemorySessionId(). CRITICAL: memory_session_id must NEVER
// equal contentSessionId - that would inject memory messages into the user's transcript!
```
### 3.2 SDK Query Flow
The `SDKAgent.startSession()` method at `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker/SDKAgent.ts` controls how the plugin interacts with Claude:
```typescript
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
// Only resume if BOTH: (1) we have a memorySessionId AND (2) this isn't the first prompt
// On worker restart, memorySessionId may exist from a previous SDK session but we
// need to start fresh since the SDK context was lost
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
**Key Point:** The `resume` parameter specifies which session to continue. If this accidentally uses `contentSessionId`, messages appear in the user's history.
### 3.3 Message Generator Architecture
The `createMessageGenerator()` method yields synthetic user messages to the SDK:
```typescript
yield {
type: 'user',
message: {
role: 'user',
content: initPrompt // Contains "Hello memory agent" for continuation prompts
},
session_id: session.contentSessionId, // References user's session for context
parent_tool_use_id: null,
isSynthetic: true // Marked as synthetic - should not appear in real history
};
```
### 3.4 Transcript Storage Locations
Claude Code stores conversation transcripts at:
```
~/.claude/projects/{dashed-cwd}/{session_id}.jsonl
```
If the wrong session ID is used, plugin messages get written to the user's transcript file.
---
## 4. Impact Assessment
### 4.1 User Experience Impact
| Severity | Description |
|----------|-------------|
| High | Users cannot effectively use `/resume` due to pollution |
| Medium | Confusion about what messages are "real" vs plugin-generated |
| Low | Increased scrolling/navigation effort |
### 4.2 Functional Impact
- **Session Resume Degraded:** The core `/resume` functionality becomes less useful
- **Context Window Pollution:** Plugin messages consume valuable context window tokens
- **Trust Erosion:** Users may question if the plugin is behaving correctly
### 4.3 Affected Users
All users who:
1. Have claude-mem plugin installed
2. Use `/resume` to continue sessions
3. Have multi-turn conversations where continuation prompts are generated
---
## 5. Root Cause Analysis
### 5.1 Primary Root Cause
The issue likely occurs when the session ID passed to the SDK's `resume` parameter conflates with the user's session. This could happen in several scenarios:
**Scenario A: Stale Session ID Resume (Previously Identified)**
When the worker restarts with a stale `memorySessionId` from a previous session, it may attempt to resume into a non-existent session. The fix at `SDKAgent.ts:109` prevents this:
```typescript
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
```
However, if this logic was not working correctly, or if there was a race condition, the wrong ID could be used.
**Scenario B: Session ID Capture Timing**
The `memorySessionId` is captured from the first SDK response:
```typescript
if (!session.memorySessionId && message.session_id) {
session.memorySessionId = message.session_id;
// Persist to database for cross-restart recovery
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
}
```
If this capture fails or is delayed, subsequent messages might use the wrong session context.
**Scenario C: Message Yielding with User Session Context**
The message generator yields messages with `session_id: session.contentSessionId`:
```typescript
yield {
type: 'user',
message: { role: 'user', content: initPrompt },
session_id: session.contentSessionId, // <-- This is the user's session ID!
...
};
```
This field may be used by the SDK to determine where to persist messages. If so, this is a design issue where the plugin's internal messages reference the user's session.
### 5.2 Contributing Factors
1. **Shared Conversation History:** The plugin maintains a `conversationHistory` array that includes plugin messages, used for provider switching (Claude/Gemini/OpenRouter). This history may leak into user-visible contexts.
2. **Continuation Prompt Content:** The "Hello memory agent" greeting is explicitly designed to be internal but has no technical mechanism preventing it from appearing in user transcripts.
3. **Synthetic Message Flag:** Messages are marked `isSynthetic: true` but this flag may not be respected by all downstream components.
---
## 6. Recommended Solutions
### 6.1 Immediate Mitigations
#### Option A: Remove or Minimize Continuation Greeting (Low Effort)
Modify the mode configuration to use a less intrusive greeting:
```json
{
"continuation_greeting": "", // Empty or very minimal
}
```
**Pros:** Quick fix, no code changes
**Cons:** Doesn't fix the underlying session ID issue
#### Option B: Verify Session ID Isolation (Medium Effort)
Add runtime validation to ensure `memorySessionId` never equals `contentSessionId`:
```typescript
if (session.memorySessionId === session.contentSessionId) {
logger.error('SESSION', 'CRITICAL: memorySessionId matches contentSessionId - messages will pollute user history!', {
contentSessionId: session.contentSessionId,
memorySessionId: session.memorySessionId
});
// Reset memorySessionId to force fresh SDK session
session.memorySessionId = null;
}
```
### 6.2 Structural Fixes
#### Option C: Remove session_id from Yielded Messages (High Effort)
Investigate if the `session_id` field in yielded messages can be omitted or changed:
```typescript
yield {
type: 'user',
message: { role: 'user', content: initPrompt },
// session_id: session.contentSessionId, // REMOVE or use memorySessionId
parent_tool_use_id: null,
isSynthetic: true
};
```
**Requires:** Understanding of SDK internals and testing
#### Option D: Separate Transcript Storage (High Effort)
Ensure plugin messages are stored in a completely separate transcript path:
- User transcript: `~/.claude/projects/{cwd}/{contentSessionId}.jsonl`
- Plugin transcript: `~/.claude-mem/transcripts/{memorySessionId}.jsonl`
### 6.3 Long-Term Architecture
#### Option E: Agent SDK Isolation Mode
Request or implement an SDK feature that marks certain messages as "agent-internal" and prevents them from appearing in user-facing `/resume` history.
---
## 7. Priority/Severity Assessment
| Dimension | Rating | Justification |
|-----------|--------|---------------|
| **User Impact** | High | Directly affects core user workflow (`/resume`) |
| **Frequency** | High | Affects all users with continuation prompts |
| **Workaround Available** | Partial | Users can ignore messages, but UX degraded |
| **Fix Complexity** | Medium-High | Requires understanding SDK session mechanics |
### Recommended Priority: P1 (High)
This issue should be addressed promptly as it:
1. Degrades a core Claude Code feature (`/resume`)
2. Affects all plugin users
3. May indicate a deeper session isolation problem
4. Could lead to users disabling the plugin
---
## 8. Related Issues and Documentation
### Related Issues
- Previous fix for stale session resume: `.claude/plans/fix-stale-session-resume-crash.md`
- Session ID architecture: `SESSION_ID_ARCHITECTURE.md` (referenced in plans)
### Key Files for Investigation
| File | Relevance |
|------|-----------|
| `src/services/worker/SDKAgent.ts` | SDK query loop and session handling |
| `src/sdk/prompts.ts` | Prompt generation including "Hello memory agent" |
| `plugin/modes/code.json` | Mode configuration with greeting text |
| `src/services/sqlite/SessionStore.ts` | Session ID storage and validation |
| `tests/sdk-agent-resume.test.ts` | Test file for resume logic |
### Test Coverage
The resume parameter logic has unit tests at `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/tests/sdk-agent-resume.test.ts` covering:
- INIT prompt scenarios (should NOT resume)
- Continuation prompt scenarios (should resume with memorySessionId)
- Edge cases (empty/undefined memorySessionId)
- Stale session crash prevention
---
## 9. Appendix: Code References
### A. Continuation Greeting in Mode Config
**File:** `plugin/modes/code.json`
```json
"continuation_greeting": "Hello memory agent, you are continuing to observe the primary Claude session."
```
### B. Prompt Building
**File:** `src/sdk/prompts.ts:174-177`
```typescript
export function buildContinuationPrompt(...): string {
return `${mode.prompts.continuation_greeting}
...
```
### C. Message Yielding
**File:** `src/services/worker/SDKAgent.ts:283-292`
```typescript
yield {
type: 'user',
message: { role: 'user', content: initPrompt },
session_id: session.contentSessionId,
parent_tool_use_id: null,
isSynthetic: true
};
```
### D. Session ID Capture
**File:** `src/services/worker/SDKAgent.ts:120-140`
```typescript
if (!session.memorySessionId && message.session_id) {
session.memorySessionId = message.session_id;
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
...
}
```
---
*Report generated: 2026-01-07*
*Analysis based on codebase at commit: 687146ce (merge main)*

View File

@@ -0,0 +1,323 @@
# Technical Report: Issue #599 - Windows Drive Root 400 Error
**Issue:** [#599](https://github.com/thedotmack/claude-mem/issues/599)
**Title:** user-message-hook.js fails with 400 error when running from Windows drive root (C:\)
**Author:** PakAbhishek
**Created:** 2026-01-07
**Severity:** Low
**Priority:** Medium
**Component:** Hooks / Session Initialization
---
## 1. Executive Summary
When running Claude Code from a Windows drive root directory (e.g., `C:\`), the `user-message-hook.js` script fails with a 400 HTTP error during session startup. The root cause is that `path.basename('C:\')` returns an empty string on Windows, which causes the API call to `/api/context/inject?project=` to fail with the error "Project(s) parameter is required".
**Key Findings:**
- The bug is **cosmetic only** - all core memory functionality continues to work correctly
- A robust fix already exists in `src/utils/project-name.ts` (`getProjectName()` function) but is not used by `user-message-hook.ts`
- The fix requires updating `user-message-hook.ts` to use the existing `getProjectName()` utility instead of raw `path.basename()`
- The `context-hook.ts` is already immune to this bug because it uses `getProjectContext()` which wraps `getProjectName()`
**Affected Files:**
- `src/hooks/user-message-hook.ts` (needs fix)
- `plugin/scripts/user-message-hook.js` (built artifact, auto-fixed by rebuild)
---
## 2. Problem Analysis
### 2.1 User-Reported Symptoms
1. Error message on Claude Code startup when cwd is `C:\`:
```
error: Failed to fetch context: 400
at C:\Users\achau\.claude\plugins\cache\thedotmack\claude-mem\9.0.0\scripts\user-message-hook.js:19:1339
```
2. The error appears during the SessionStart hook phase
3. Despite the error, all other functionality works correctly:
- Worker health check: passing
- MCP tools: connected and functional
- Memory search: working
- Session observations: saved correctly
### 2.2 Reproduction Steps
1. Open terminal on Windows
2. Navigate to drive root: `cd C:\`
3. Start Claude Code: `claude`
4. Observe the startup error
### 2.3 Environment
- **OS:** Windows 11
- **claude-mem version:** 9.0.0
- **Bun version:** 1.3.5
- **Claude Code:** Latest
---
## 3. Technical Details
### 3.1 Code Flow Analysis
The `user-message-hook.ts` extracts the project name using:
```typescript
// File: src/hooks/user-message-hook.ts (lines 18-23)
const project = basename(process.cwd());
const response = await fetch(
`http://127.0.0.1:${port}/api/context/inject?project=${encodeURIComponent(project)}&colors=true`,
{ method: 'GET' }
);
```
When `process.cwd()` returns `C:\`, the `path.basename()` function returns an empty string:
```javascript
> require('path').basename('C:\\')
''
```
This results in an API call to:
```
/api/context/inject?project=&colors=true
```
### 3.2 Server-Side Validation
The `/api/context/inject` endpoint in `SearchRoutes.ts` performs strict validation:
```typescript
// File: src/services/worker/http/routes/SearchRoutes.ts (lines 207-223)
private handleContextInject = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
const projectsParam = (req.query.projects as string) || (req.query.project as string);
const useColors = req.query.colors === 'true';
if (!projectsParam) {
this.badRequest(res, 'Project(s) parameter is required');
return;
}
const projects = projectsParam.split(',').map(p => p.trim()).filter(Boolean);
if (projects.length === 0) {
this.badRequest(res, 'At least one project is required');
return;
}
// ...
});
```
The validation correctly rejects empty project names, returning HTTP 400.
### 3.3 Existing Solution
A robust solution already exists in `src/utils/project-name.ts`:
```typescript
// File: src/utils/project-name.ts (lines 12-40)
export function getProjectName(cwd: string | null | undefined): string {
if (!cwd || cwd.trim() === '') {
logger.warn('PROJECT_NAME', 'Empty cwd provided, using fallback', { cwd });
return 'unknown-project';
}
const basename = path.basename(cwd);
// Edge case: Drive roots on Windows (C:\, J:\) or Unix root (/)
// path.basename('C:\') returns '' (empty string)
if (basename === '') {
const isWindows = process.platform === 'win32';
if (isWindows) {
const driveMatch = cwd.match(/^([A-Z]):\\/i);
if (driveMatch) {
const driveLetter = driveMatch[1].toUpperCase();
const projectName = `drive-${driveLetter}`;
logger.info('PROJECT_NAME', 'Drive root detected', { cwd, projectName });
return projectName;
}
}
logger.warn('PROJECT_NAME', 'Root directory detected, using fallback', { cwd });
return 'unknown-project';
}
return basename;
}
```
This function:
- Handles null/undefined cwd
- Handles empty basename (drive roots)
- Returns meaningful names like `drive-C`, `drive-D` for Windows drive roots
- Returns `unknown-project` for Unix root or other edge cases
### 3.4 Comparison: Fixed vs. Unfixed Hooks
| Hook | Implementation | Status |
|------|---------------|--------|
| `context-hook.ts` | Uses `getProjectContext()` which calls `getProjectName()` | Immune to bug |
| `user-message-hook.ts` | Uses raw `basename(process.cwd())` | **Vulnerable** |
| `new-hook.ts` | Receives `cwd` from stdin, uses `getProjectName()` | Immune to bug |
| `save-hook.ts` | Uses basename but receives cwd from API context | Context-dependent |
---
## 4. Impact Assessment
### 4.1 Severity: Low
- **Functional Impact:** Cosmetic only - the error message is displayed but does not affect core functionality
- **Data Integrity:** No data loss or corruption
- **Workaround Available:** Yes - run Claude from a project directory instead of drive root
### 4.2 Affected Users
- Users running Claude Code from Windows drive roots (C:\, D:\, etc.)
- Estimated as a small percentage of users based on typical usage patterns
- More likely to affect users doing quick tests or troubleshooting
### 4.3 User Experience Impact
- Confusing error message on startup
- Users may incorrectly believe the plugin is broken
- Error appears in stderr alongside legitimate context information
---
## 5. Root Cause Analysis
### 5.1 Primary Cause
The `user-message-hook.ts` was implemented using a direct `path.basename()` call instead of the standardized `getProjectName()` utility function that handles edge cases.
### 5.2 Contributing Factors
1. **Inconsistent Pattern Usage:** Different hooks use different approaches to extract project names
2. **Missing Validation:** No client-side validation of project name before making API call
3. **Edge Case Not Tested:** Windows drive root is an unusual but valid working directory
### 5.3 Historical Context
The `getProjectName()` utility was added to handle this exact edge case (see `src/utils/project-name.ts`), but not all hooks were updated to use it. The `context-hook.ts` uses the newer `getProjectContext()` function, while `user-message-hook.ts` still uses the older pattern.
---
## 6. Recommended Solutions
### 6.1 Primary Fix (Recommended)
Update `user-message-hook.ts` to use the existing `getProjectName()` utility:
```typescript
// Current (vulnerable):
import { basename } from "path";
const project = basename(process.cwd());
// Fixed:
import { getProjectName } from "../utils/project-name.js";
const project = getProjectName(process.cwd());
```
**Benefits:**
- Uses battle-tested utility
- Consistent with other hooks
- Handles all edge cases (drive roots, Unix root, empty cwd)
- Provides meaningful project names (`drive-C`) instead of fallbacks
### 6.2 Alternative: Inline Fix (User-Suggested)
The user suggested an inline fix in the issue:
```javascript
let projectName = basename(process.cwd());
if (!projectName || projectName === '') {
const cwd = process.cwd();
projectName = cwd.match(/^([A-Za-z]:)[\\/]?$/)
? `drive-${cwd[0].toUpperCase()}`
: 'unknown-project';
}
```
**Evaluation:**
- Functionally correct
- Duplicates existing logic in `getProjectName()`
- Does not address the pattern inconsistency
- Acceptable if import constraints prevent using the utility
### 6.3 Additional Improvements (Optional)
1. **Add Client-Side Validation:**
```typescript
if (!project || project.trim() === '') {
throw new Error('Unable to determine project name from working directory');
}
```
2. **Standardize All Hooks:** Audit other hooks using `basename(process.cwd())` and update to use `getProjectName()`
3. **Add Unit Tests:** Create tests for `user-message-hook.ts` covering:
- Normal project directories
- Windows drive roots (C:\, D:\)
- Unix root (/)
- Trailing slashes
---
## 7. Priority and Severity Assessment
### 7.1 Classification
| Metric | Value | Justification |
|--------|-------|---------------|
| **Severity** | Low | Cosmetic error only, no functional impact |
| **Priority** | Medium | User-facing error, easy fix, affects Windows users |
| **Effort** | Trivial | Single line change + rebuild |
| **Risk** | Very Low | Using existing, tested utility function |
### 7.2 Recommendation
**Recommended Action:** Fix in next patch release (9.0.1)
**Rationale:**
- Simple fix with minimal risk
- Improves Windows user experience
- Demonstrates responsiveness to community feedback
- Pattern already exists in codebase
### 7.3 Testing Requirements
1. Verify fix on Windows with `C:\` as cwd
2. Verify existing behavior unchanged for normal project directories
3. Verify worktree detection still works correctly
4. Run full hook test suite
---
## 8. Appendix
### 8.1 Related Files
| File | Purpose | Fix Required |
|------|---------|--------------|
| `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/hooks/user-message-hook.ts` | Source hook (needs fix) | Yes |
| `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/plugin/scripts/user-message-hook.js` | Built hook | Auto-rebuilds |
| `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/utils/project-name.ts` | Utility (has fix) | No |
| `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/hooks/context-hook.ts` | Reference implementation | No |
| `/Users/alexnewman/conductor/workspaces/claude-mem/budapest/src/services/worker/http/routes/SearchRoutes.ts` | API validation | No |
### 8.2 Related Issues
- Windows compatibility has been a focus area, with 56+ memory entries documenting Windows-specific fixes
- This issue follows the pattern of other Windows edge case bugs
### 8.3 References
- [Node.js path.basename documentation](https://nodejs.org/api/path.html#pathbasenamepath-suffix)
- [Windows file system path formats](https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file)

View File

@@ -0,0 +1,432 @@
# Issue #600: Documentation Audit - Features Documented But Not Implemented
**Report Date:** 2026-01-07
**Issue Author:** @bguidolim
**Issue Created:** 2026-01-07
**Status:** Open
**Priority:** Medium-High
---
## 1. Executive Summary
A comprehensive audit by @bguidolim has identified **8 discrepancies** between the claude-mem documentation (`docs/public/`) and the actual implementation in the main branch. The core issue is that documentation describes beta-branch features as if they exist in the production release, leading to user confusion and failed feature expectations.
### Key Findings
| Category | Issue | Severity |
|----------|-------|----------|
| **Critical** | Version Channel UI missing from frontend | High |
| **Critical** | Endless Mode settings not validated/functional | High |
| **Moderate** | Troubleshoot Skill referenced but doesn't exist | Medium |
| **Moderate** | Folder CLAUDE.md setting documented but always enabled | Medium |
| **Moderate** | Skills directory documented but replaced by MCP | Medium |
| **Minor** | Allowed branches list incomplete | Low |
| **Minor** | Hook count inconsistency (5 vs 6) | Low |
| **Minor** | MCP tool count clarification needed | Low |
### Recommendation
Implement **Option B** (documentation update) for most items, with selective **Option A** (feature completion) for Version Channel UI given its near-complete backend implementation.
---
## 2. Problem Analysis
### 2.1 Documentation-Reality Gap
The documentation at `docs/public/` describes several features that:
1. Exist only in beta branches (`beta/endless-mode`, `beta/7.0`)
2. Have partial implementations (backend only, no frontend)
3. Were removed during architecture migrations (MCP transition)
4. Have non-functional settings (documented but ignored in code)
### 2.2 Impact on Users
Users following the documentation will:
- Look for UI elements that don't exist (Version Channel switcher)
- Configure settings that have no effect (Endless Mode, Folder CLAUDE.md)
- Invoke skills that don't exist (troubleshoot skill)
- Expect directory structures that don't match reality
---
## 3. Technical Details
### 3.1 Version Channel UI (High Severity)
**Documentation Claims** (`docs/public/beta-features.mdx`):
- Lines 14-24 describe a Version Channel switcher in the Settings modal
- Users should see "Settings gear icon" > "Version Channel" section
- Options include "Try Beta (Endless Mode)" and "Switch to Stable"
**Actual Implementation**:
| Component | Status | Location |
|-----------|--------|----------|
| `BranchManager.ts` | Implemented | `src/services/worker/BranchManager.ts` |
| `getBranchInfo()` | Implemented | Backend API |
| `switchBranch()` | Implemented | Backend API |
| `pullUpdates()` | Implemented | Backend API |
| `/api/branch/status` | Implemented | `SettingsRoutes.ts:169-172` |
| `/api/branch/switch` | Implemented | `SettingsRoutes.ts:178-209` |
| `/api/branch/update` | Implemented | `SettingsRoutes.ts:214-228` |
| **UI Component** | **NOT IMPLEMENTED** | `ContextSettingsModal.tsx` has no Version Channel section |
**Verification** (from `ContextSettingsModal.tsx`):
The component contains sections for:
- Loading settings (observations, sessions)
- Filters (types, concepts)
- Display settings
- Advanced settings (provider, model, port)
There is **no Version Channel section**. A grep for "Version Channel", "version channel", or "channel" in `src/ui/` returns no results.
**Related Issues**: #333, #436, #461 (all closed without merging UI)
---
### 3.2 Endless Mode Settings (High Severity)
**Documentation Claims** (`docs/public/endless-mode.mdx`):
```json
{
"CLAUDE_MEM_ENDLESS_MODE": "false",
"CLAUDE_MEM_ENDLESS_WAIT_TIMEOUT_MS": "90000"
}
```
**Actual Implementation**:
The `SettingsRoutes.ts` file (lines 87-124) defines the validated `settingKeys` array:
```typescript
const settingKeys = [
'CLAUDE_MEM_MODEL',
'CLAUDE_MEM_CONTEXT_OBSERVATIONS',
'CLAUDE_MEM_WORKER_PORT',
'CLAUDE_MEM_WORKER_HOST',
'CLAUDE_MEM_PROVIDER',
'CLAUDE_MEM_GEMINI_API_KEY',
// ... 20+ other settings
'CLAUDE_MEM_CONTEXT_SHOW_LAST_MESSAGE',
];
```
**Neither `CLAUDE_MEM_ENDLESS_MODE` nor `CLAUDE_MEM_ENDLESS_WAIT_TIMEOUT_MS` are present in this array.**
A grep for `ENDLESS_MODE` in `src/` returns only CLAUDE.md context files (auto-generated), not any TypeScript implementation.
**Current Location**: Implementation exists only in `upstream/beta/endless-mode` branch.
**Related Issues**: #366, #403, #416 (all closed, feature still in beta only)
---
### 3.3 Troubleshoot Skill (Medium Severity)
**Documentation Claims**:
`docs/public/troubleshooting.mdx` (lines 8-20):
```markdown
## Quick Diagnostic Tool
Describe any issues you're experiencing to Claude, and the troubleshoot skill
will automatically activate to provide diagnosis and fixes.
The troubleshoot skill will:
- Check worker status and health
- Verify database existence and integrity
- ...
```
`docs/public/architecture/overview.mdx` (lines 165-175):
```
plugin/skills/
├── mem-search/
├── troubleshoot/ ← Documented but doesn't exist
│ ├── SKILL.md
│ └── operations/
└── version-bump/
```
**Actual Implementation**:
```bash
$ ls plugin/skills/
ls: plugin/skills/: No such file or directory
```
The `plugin/skills/` directory **does not exist** in the main branch.
**Historical Context**: Skills were merged in PR #72 (v5.2) but later removed during the MCP migration. The documentation was not updated to reflect this architectural change.
---
### 3.4 Folder CLAUDE.md Setting (Medium Severity)
**Documentation Claims** (`docs/public/configuration.mdx`, lines 232-238):
| Setting | Default | Description |
|---------|---------|-------------|
| `CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED` | `false` | Enable auto-generation of folder CLAUDE.md files |
**Actual Implementation**:
In `ResponseProcessor.ts` (lines 216-233), folder CLAUDE.md updates are triggered unconditionally:
```typescript
// Update folder CLAUDE.md files for touched folders (fire-and-forget)
const allFilePaths: string[] = [];
for (const obs of observations) {
allFilePaths.push(...(obs.files_modified || []));
allFilePaths.push(...(obs.files_read || []));
}
if (allFilePaths.length > 0) {
updateFolderClaudeMdFiles(
allFilePaths,
session.project,
getWorkerPort(),
projectRoot
).catch(error => {
logger.warn('FOLDER_INDEX', 'CLAUDE.md update failed (non-critical)', ...);
});
}
```
**The setting `CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED` is never read.** The feature runs unconditionally when files are touched.
Additionally, the setting is not in the `SettingsRoutes.ts` settingKeys array, so it cannot be configured through the API.
**Fix in Progress**: PR #589
---
### 3.5 Skills Directory (Medium Severity)
**Documentation Claims** (`docs/public/architecture/overview.mdx`, lines 165-175):
```
plugin/skills/
├── mem-search/
│ ├── SKILL.md
│ ├── operations/
│ └── principles/
├── troubleshoot/
└── version-bump/
```
**Actual Implementation**:
The `plugin/skills/` directory does not exist. Search functionality is now provided by MCP tools defined in `src/servers/mcp-server.ts`:
```typescript
const tools = [
{ name: '__IMPORTANT', ... },
{ name: 'search', ... },
{ name: 'timeline', ... },
{ name: 'get_observations', ... }
];
```
The skill-based architecture was replaced by MCP tools during the v6.x architecture evolution. The documentation still describes the old skill-based system.
---
### 3.6 Allowed Branches List (Low Severity)
**Location**: `SettingsRoutes.ts:187`
```typescript
const allowedBranches = ['main', 'beta/7.0', 'feature/bun-executable'];
```
**Issue**: Missing `beta/endless-mode` which exists in upstream and is documented.
---
### 3.7 Hook Count Inconsistency (Low Severity)
| Source | Stated Count |
|--------|--------------|
| `docs/public/architecture/overview.mdx` | "6 lifecycle hooks" |
| Root `CLAUDE.md` | "5 Lifecycle Hooks" |
| Actual `hooks.json` | 4 hook types (SessionStart, UserPromptSubmit, PostToolUse, Stop) |
**Actual Hooks** (from `plugin/hooks/hooks.json`):
1. SessionStart (with smart-install, worker-service, context-hook, user-message-hook)
2. UserPromptSubmit (with worker-service, new-hook)
3. PostToolUse (with worker-service, save-hook)
4. Stop (with worker-service, summary-hook)
Note: The documentation may be counting individual script invocations rather than hook types.
---
### 3.8 MCP Tool Count (Low Severity)
**Documentation Claims**: "4 MCP tools"
**Actual Tools**:
1. `__IMPORTANT` - Instructional/workflow guidance (not a functional tool)
2. `search` - Search memory index
3. `timeline` - Get chronological context
4. `get_observations` - Fetch full observation details
The claim is technically correct, but `__IMPORTANT` is workflow documentation rather than a functional tool.
---
## 4. Impact Assessment
### 4.1 User Experience Impact
| Issue | User Impact | Frequency |
|-------|-------------|-----------|
| Version Channel UI | Users cannot switch branches via UI | High - Documented prominently |
| Endless Mode | Config has no effect | Medium - Beta feature |
| Troubleshoot Skill | Skill invocation fails | High - Troubleshooting entry point |
| Folder CLAUDE.md | Setting ignored | Low - Niche feature |
| Skills Directory | Structure doesn't match | Low - Developer documentation |
### 4.2 Developer Experience Impact
| Issue | Developer Impact |
|-------|------------------|
| Architecture docs outdated | New contributors confused by skill references |
| Hook count mismatch | Onboarding confusion |
| API endpoint gaps | Integration developers encounter missing features |
---
## 5. Root Cause Analysis
### 5.1 Primary Causes
1. **Branch Divergence**: Beta branches contain features that were documented but never merged to main
2. **Architecture Migration**: The MCP transition removed the skill system but docs weren't updated
3. **Documentation-First Development**: Features were documented during planning but implementation was incomplete
4. **Missing Sync Process**: No automated check between docs and code
### 5.2 Contributing Factors
1. **Multiple Authors**: Documentation and code written by different contributors
2. **Long-Running Branches**: Beta branches existed for extended periods
3. **Incomplete PRs**: Related issues (#333, #436, #461, #366, #403, #416) were closed without merging
---
## 6. Recommended Solutions
### 6.1 Immediate Actions (This Week)
| Item | Action | Owner | Effort |
|------|--------|-------|--------|
| Troubleshoot Skill | Remove references from `troubleshooting.mdx` | Docs | 1 hour |
| Skills Directory | Update `overview.mdx` to show current MCP architecture | Docs | 2 hours |
| Hook Count | Align all sources to "5 hooks" | Docs | 30 min |
| MCP Tool Clarification | Note that `__IMPORTANT` is workflow guidance | Docs | 15 min |
### 6.2 Short-Term Actions (This Sprint)
| Item | Action | Owner | Effort |
|------|--------|-------|--------|
| Endless Mode | Add "Beta Only" badge to `endless-mode.mdx` and `beta-features.mdx` | Docs | 1 hour |
| Version Channel | Add "Beta Only" badge OR complete UI implementation | Eng/Docs | 2-8 hours |
| Folder CLAUDE.md | Merge PR #589 to respect setting | Eng | Code review |
| Allowed Branches | Add `beta/endless-mode` to allowed list | Eng | 15 min |
### 6.3 Long-Term Actions (Next Release)
| Item | Action | Owner | Effort |
|------|--------|-------|--------|
| Documentation Sync | Implement CI check for doc/code alignment | DevOps | 1 day |
| Beta Badge System | Create Mintlify component for beta feature marking | Docs | 2 hours |
| Feature Flags | Consider feature flag system for documented-but-beta features | Eng | 1 week |
---
## 7. Priority/Severity Assessment
### Severity Matrix
| Issue | Severity | Priority | Rationale |
|-------|----------|----------|-----------|
| Version Channel UI | High | P1 | Backend complete, users actively confused |
| Endless Mode | High | P2 | Documented prominently, users try to configure |
| Troubleshoot Skill | Medium | P1 | Entry point for support, must work |
| Folder CLAUDE.md | Medium | P2 | Settings should work as documented |
| Skills Directory | Medium | P3 | Developer-facing, less user impact |
| Allowed Branches | Low | P3 | Edge case |
| Hook Count | Low | P4 | Cosmetic inconsistency |
| MCP Tool Count | Low | P4 | Minor clarification |
### Recommended Resolution Order
1. **P1 - Immediate**: Fix troubleshoot skill reference (remove or explain)
2. **P1 - Immediate**: Version Channel UI decision (badge or implement)
3. **P2 - This Week**: Endless Mode documentation badges
4. **P2 - This Week**: Folder CLAUDE.md PR #589 merge
5. **P3 - This Sprint**: Architecture documentation update
6. **P4 - Eventually**: Minor inconsistencies
---
## 8. Files Requiring Updates
### Documentation Files
| File | Changes Needed |
|------|---------------|
| `docs/public/troubleshooting.mdx` | Remove troubleshoot skill reference |
| `docs/public/architecture/overview.mdx` | Update to MCP architecture, fix hook count |
| `docs/public/beta-features.mdx` | Add "Beta Only" badges, clarify UI availability |
| `docs/public/endless-mode.mdx` | Add "Beta Only" badge prominently |
| `docs/public/configuration.mdx` | Mark `FOLDER_CLAUDEMD_ENABLED` as coming soon or remove |
| `CLAUDE.md` (root) | Verify hook count |
### Code Files
| File | Changes Needed |
|------|---------------|
| `src/services/worker/http/routes/SettingsRoutes.ts` | Add `beta/endless-mode` to allowed branches |
| `src/services/worker/agents/ResponseProcessor.ts` | Check `FOLDER_CLAUDEMD_ENABLED` setting (via PR #589) |
| `src/shared/SettingsDefaultsManager.ts` | Add `CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED` setting |
---
## 9. Appendix
### Related Issues and PRs
| Reference | Description | Status |
|-----------|-------------|--------|
| #333 | Version Channel UI | Closed |
| #436 | Version Channel UI | Closed |
| #461 | Version Channel UI | Closed |
| #366 | Endless Mode | Closed |
| #403 | Endless Mode | Closed |
| #416 | Endless Mode | Closed |
| #589 | Folder CLAUDE.md setting fix | Open |
| #600 | This documentation audit | Open |
### Verification Commands
```bash
# Check for Version Channel UI
grep -r "Version Channel\|version.*channel" src/ui/
# Check for Endless Mode settings
grep -r "ENDLESS_MODE" src/
# Check skills directory
ls -la plugin/skills/
# Check settings validation
grep -A 50 "settingKeys" src/services/worker/http/routes/SettingsRoutes.ts
```
---
*Report generated from analysis of Issue #600 and codebase inspection on 2026-01-07.*

View File

@@ -0,0 +1,430 @@
# Issue #602: PostToolUse Error - Worker Service Failed to Start (Windows)
**Report Date:** 2026-01-07
**Issue Author:** onurtirpan
**Issue Created:** 2026-01-07
**Labels:** bug
**Severity:** HIGH
**Priority:** P1 - Critical
---
## 1. Executive Summary
A Windows 11 user running Claude Code 0.2.76 with claude-mem v9.0.0 is experiencing complete plugin failure. The worker service cannot start during PostToolUse hook execution, resulting in long delays and multiple cascading errors. This is a systemic Windows platform compatibility issue that prevents the entire memory system from functioning.
### Key Symptoms
- "Plugin hook bun worker-service.cjs start failed to start: The operation was aborted"
- Multiple "Worker failed to start (health check timeout)" errors
- "Failed to start server. Is port 37777 in use?"
- "wmic is not recognized" - Windows command compatibility issue
- Database not initialized errors
### Impact
- **Complete loss of memory functionality** on Windows
- Long delays during Claude Code operations
- User workflow disruption
---
## 2. Problem Analysis
### 2.1 Error Chain Analysis
The reported errors form a cascade failure pattern:
```
1. PostToolUse hook triggered
└── 2. worker-service.cjs start command executed
└── 3. Bun spawns worker process
└── 4. Worker startup timeout (operation aborted)
└── 5. Health check fails repeatedly
└── 6. "Is port 37777 in use?" check fails
└── 7. "wmic is not recognized" - WMIC unavailable
└── 8. Database cannot initialize
└── 9. Plugin hook failure
```
### 2.2 Error Categories
| Error Type | Root Cause | Severity |
|------------|-----------|----------|
| "operation was aborted" | Hook timeout exceeded before worker ready | High |
| "health check timeout" | Worker startup takes too long on Windows | High |
| "Is port 37777 in use?" | Previous zombie process holding port | Medium |
| "wmic is not recognized" | WMIC deprecated/removed in Windows 11 | Critical |
| "Database not initialized" | Worker never reached ready state | Consequential |
---
## 3. Technical Details
### 3.1 Affected Components
| Component | File Path | Role |
|-----------|-----------|------|
| Hook Configuration | `plugin/hooks/hooks.json` | Defines PostToolUse command chain |
| Worker Service | `src/services/worker-service.ts` | Main worker orchestrator |
| Process Manager | `src/services/infrastructure/ProcessManager.ts` | Windows process enumeration via WMIC |
| Health Monitor | `src/services/infrastructure/HealthMonitor.ts` | Port and health check logic |
| Server | `src/services/server/Server.ts` | HTTP server on port 37777 |
### 3.2 Hook Configuration (hooks.json)
```json
{
"PostToolUse": [{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "bun \"${CLAUDE_PLUGIN_ROOT}/scripts/worker-service.cjs\" start",
"timeout": 60
},
{
"type": "command",
"command": "bun \"${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js\"",
"timeout": 120
}
]
}]
}
```
The 60-second timeout for worker startup may be insufficient on Windows systems, especially during first-time initialization when database creation, Chroma vector store setup, and MCP server connection all must complete.
### 3.3 Platform Timeouts
Current timeout configuration in `src/shared/hook-constants.ts`:
```typescript
export const HOOK_TIMEOUTS = {
DEFAULT: 300000, // 5 minutes
HEALTH_CHECK: 30000, // 30 seconds
WORKER_STARTUP_WAIT: 1000,
WORKER_STARTUP_RETRIES: 300,
PRE_RESTART_SETTLE_DELAY: 2000,
WINDOWS_MULTIPLIER: 1.5 // Only 1.5x for Windows
} as const;
```
### 3.4 WMIC Dependency
The `ProcessManager.ts` uses WMIC for Windows process enumeration:
```typescript
// Line 91-92: getChildProcesses()
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
// Line 174: cleanupOrphanedProcesses()
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
```
**Critical Issue:** WMIC (Windows Management Instrumentation Command-line) has been deprecated since Windows 10 version 21H1 and is being removed in Windows 11. Users with clean Windows 11 installations may not have WMIC available.
---
## 4. Impact Assessment
### 4.1 User Impact
| Impact Category | Description |
|-----------------|-------------|
| Functionality | Complete loss of memory features on Windows |
| Performance | Long delays during Claude Code operations (60s+ timeouts) |
| User Experience | Error messages displayed, interrupted workflows |
| Data | No observations being saved, no context injection |
### 4.2 Scope
- **Affected Platform:** Windows 11 (Build 26100+)
- **Affected Shell:** PowerShell 7
- **Affected Version:** claude-mem 9.0.0
- **Claude Code Version:** 0.2.76
- **Estimated User Base:** All Windows 11 users with modern builds
### 4.3 Related Issues
| Issue | Title | Status | Relationship |
|-------|-------|--------|--------------|
| #517 | PowerShell `$_` escaping in Git Bash | Fixed (v9.0.0) | Same component (ProcessManager) |
| #555 | Windows hooks IPC issues | Open | Related Windows hook execution |
| #324 | Windows 11 64-bit system issues | Open | Same platform |
---
## 5. Root Cause Analysis
### 5.1 Primary Root Cause: WMIC Deprecation
WMIC is no longer available by default on Windows 11. When `cleanupOrphanedProcesses()` runs during worker initialization, it fails with "wmic is not recognized", causing the error to be swallowed but subsequent operations to fail.
**Evidence from ProcessManager.ts lines 167-218:**
```typescript
export async function cleanupOrphanedProcesses(): Promise<void> {
const isWindows = process.platform === 'win32';
// ...
if (isWindows) {
// Windows: Use WMIC to find chroma-mcp processes
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
const { stdout } = await execAsync(cmd, { timeout: 60000 });
// ...
}
}
```
### 5.2 Secondary Root Cause: Insufficient Timeouts
The hooks.json defines a 60-second timeout for worker startup, but on Windows:
1. WMIC command execution adds latency
2. Database initialization is slower on Windows file systems
3. MCP server initialization has a 5-minute timeout but the hook only waits 60 seconds
4. The WINDOWS_MULTIPLIER of 1.5x is applied inconsistently
### 5.3 Tertiary Root Cause: Zombie Port Issue
The "Is port 37777 in use?" error indicates previous worker processes may not have exited cleanly. This is a known issue (documented in `docs/reports/2026-01-06--windows-woes-comprehensive-report.md`) where Bun's socket cleanup bug on Windows leaves zombie ports.
### 5.4 Quaternary Root Cause: Error Cascade
When `cleanupOrphanedProcesses()` fails silently, the worker attempts to start but:
1. Previous zombie processes may still hold port 37777
2. Health checks fail because the new worker cannot bind
3. The "operation was aborted" error triggers when the 60s hook timeout expires
4. Database initialization never completes
---
## 6. Recommended Solutions
### 6.1 Immediate Fix: Replace WMIC with PowerShell CIM Cmdlets (P0)
**Replace WMIC commands with PowerShell Get-CimInstance:**
```typescript
// Before (ProcessManager.ts line 91-92)
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
// After
const cmd = `powershell -NoProfile -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
```
```typescript
// Before (ProcessManager.ts line 174)
const cmd = `wmic process where "name like '%python%' and commandline like '%chroma-mcp%'" get processid /format:list`;
// After
const cmd = `powershell -NoProfile -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
```
**Note:** This reintroduces Issue #517 concerns about `$_` in Git Bash. Use proper escaping or run via Node.js `child_process.spawn` with `shell: false` and explicit `powershell.exe` path.
### 6.2 Alternative Fix: Use tasklist Command (P0)
A WMIC-free alternative using built-in Windows commands:
```typescript
// For process enumeration
const cmd = `tasklist /FI "IMAGENAME eq python*" /FO CSV /NH`;
// Parse CSV output to get PIDs
```
### 6.3 Increase Windows Timeouts (P1)
Update `plugin/hooks/hooks.json` to use longer Windows-appropriate timeouts:
```json
{
"PostToolUse": [{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "bun \"${CLAUDE_PLUGIN_ROOT}/scripts/worker-service.cjs\" start",
"timeout": 120
}
]
}]
}
```
Update `src/shared/hook-constants.ts`:
```typescript
export const HOOK_TIMEOUTS = {
// ...
WINDOWS_MULTIPLIER: 2.5 // Increase from 1.5 to 2.5
} as const;
```
### 6.4 Add WMIC Availability Detection (P1)
Add graceful fallback when WMIC is unavailable:
```typescript
async function isWmicAvailable(): Promise<boolean> {
try {
await execAsync('wmic os get caption', { timeout: 5000 });
return true;
} catch {
return false;
}
}
export async function cleanupOrphanedProcesses(): Promise<void> {
if (process.platform !== 'win32') {
// Unix implementation
return;
}
const useWmic = await isWmicAvailable();
const cmd = useWmic
? `wmic process where "name like '%python%' ..." get processid /format:list`
: `powershell -NoProfile -Command "Get-CimInstance Win32_Process | ..."`;
// Continue with appropriate parser
}
```
### 6.5 Improve Port Cleanup on Windows (P2)
Ensure proper cleanup before worker restart:
```typescript
// Add to ProcessManager.ts
export async function forceReleasePort(port: number): Promise<void> {
if (process.platform !== 'win32') return;
try {
// Find process using the port
const { stdout } = await execAsync(
`powershell -NoProfile -Command "(Get-NetTCPConnection -LocalPort ${port} -ErrorAction SilentlyContinue).OwningProcess | Sort-Object -Unique"`
);
const pids = stdout.trim().split('\n').filter(p => p.trim());
for (const pid of pids) {
await forceKillProcess(parseInt(pid, 10));
}
} catch {
// Port not in use or access denied
}
}
```
### 6.6 Improve Error Messaging (P2)
Add user-friendly error messages with actionable guidance:
```typescript
// In HealthMonitor.ts
export async function waitForHealth(port: number, timeoutMs: number): Promise<boolean> {
// ... existing logic ...
if (!ready && process.platform === 'win32') {
logger.warn('SYSTEM', 'Windows worker startup slow. Check:');
logger.warn('SYSTEM', ' 1. Is antivirus scanning the plugin folder?');
logger.warn('SYSTEM', ' 2. Is port 37777 blocked by firewall?');
logger.warn('SYSTEM', ' 3. Try: netstat -ano | findstr 37777');
}
return ready;
}
```
---
## 7. Priority/Severity Assessment
### 7.1 Severity Matrix
| Factor | Assessment | Score |
|--------|-----------|-------|
| User Impact | Complete feature loss | 5/5 |
| Frequency | Every operation on affected systems | 5/5 |
| Workaround Available | None | 5/5 |
| Data Loss Risk | No data saved | 4/5 |
| Affected Users | All Windows 11 users | 4/5 |
**Overall Severity: CRITICAL (23/25)**
### 7.2 Priority Recommendation
| Priority | Action | Timeline |
|----------|--------|----------|
| P0 | Replace WMIC with PowerShell CIM cmdlets | Immediate (v9.0.1) |
| P1 | Increase Windows timeouts | Same release |
| P1 | Add WMIC availability detection | Same release |
| P2 | Improve port cleanup | Next minor release |
| P2 | Better error messaging | Next minor release |
### 7.3 Testing Requirements
1. **Unit Tests:**
- Test `cleanupOrphanedProcesses()` with mock WMIC failure
- Test `getChildProcesses()` with PowerShell fallback
- Test timeout multiplier application
2. **Integration Tests:**
- Windows 11 clean install (no WMIC)
- Windows 10 with WMIC available
- Git Bash environment with PowerShell commands
3. **Manual Verification:**
- Confirm worker starts successfully on Windows 11
- Confirm health checks pass within timeout
- Confirm orphaned process cleanup works
---
## 8. Files to Modify
| File | Change Required |
|------|-----------------|
| `src/services/infrastructure/ProcessManager.ts` | Replace WMIC with PowerShell or tasklist |
| `src/shared/hook-constants.ts` | Increase WINDOWS_MULTIPLIER |
| `plugin/hooks/hooks.json` | Increase worker start timeout |
| `src/services/infrastructure/HealthMonitor.ts` | Add Windows-specific error messages |
| `docs/public/troubleshooting.mdx` | Document Windows 11 requirements |
---
## 9. Appendix
### 9.1 Related Documentation
- `docs/reports/2026-01-06--windows-woes-comprehensive-report.md`
- `docs/reports/2026-01-04--issue-517-windows-powershell-analysis.md`
- `docs/reports/2026-01-05--issue-555-windows-hooks-ipc-false.md`
### 9.2 WMIC Deprecation Timeline
| Windows Version | WMIC Status |
|-----------------|-------------|
| Windows 10 (pre-21H1) | Available by default |
| Windows 10 21H1+ | Deprecated, feature on demand |
| Windows 11 (initial) | Available but deprecated |
| Windows 11 22H2+ | Being removed progressively |
| Windows 11 23H2+ | Not installed by default |
### 9.3 PowerShell Equivalent Commands
| WMIC Command | PowerShell Equivalent |
|--------------|----------------------|
| `wmic process list` | `Get-CimInstance Win32_Process` |
| `wmic process where "name='x'"` | `Get-CimInstance Win32_Process \| Where-Object { $_.Name -eq 'x' }` |
| `wmic process get processid` | `(Get-CimInstance Win32_Process).ProcessId` |
### 9.4 User Workaround (Temporary)
Until a fix is released, users can manually install WMIC:
1. Open Settings > Apps > Optional Features
2. Click "Add a feature"
3. Search for "WMIC (Windows Management Instrumentation Command-line)"
4. Install and restart terminal
**Note:** This is not a recommended long-term solution as WMIC will eventually be fully removed.
---
*Report generated by Claude Opus 4.5 for issue #602*

View File

@@ -0,0 +1,427 @@
# Technical Report: Worker Daemon Child Process Leak
**Issue:** #603 - Bug: worker-service daemon leaks child claude processes
**Author:** raulk
**Created:** 2026-01-07
**Report Version:** 1.0
**Severity:** Critical
**Priority:** P0 - Immediate attention required
---
## 1. Executive Summary
The `worker-service.cjs --daemon` process spawns Claude subagent processes via the Claude Agent SDK that are not being properly terminated when their tasks complete. Over the course of normal usage (6+ hours), this results in the accumulation of orphaned child processes that consume significant system memory.
**Key Findings:**
- 121 orphaned `claude` processes accumulated over ~6 hours
- Total memory consumption: ~44GB RSS
- Average memory per process: ~372MB
- Root cause: Missing child process cleanup after SDK query completion
- The issue affects Linux systems and potentially all platforms
**Recommendation:** Implement explicit child process tracking and cleanup in the SDK agent lifecycle, and add process reaping on generator completion.
---
## 2. Problem Analysis
### 2.1 Observed Behavior
The reporter documented the following scenario:
**Parent daemon process (running 7+ hours):**
```
PID PPID RSS(KB) ELAPSED COMMAND
4118969 1 161656 07:28:16 bun ~/.claude/plugins/cache/thedotmack/claude-mem/9.0.0/scripts/worker-service.cjs --daemon
```
**Sample of leaked children (121 total, all parented to daemon):**
```
PID PPID RSS(KB) ELAPSED COMMAND
1927 4118969 377308 06:21:16 claude --output-format stream-json --verbose --input-format stream-json --model claude-sonnet-4-5 --disallowedTools Bash,Read,Write,Edit,Grep,Glob,WebFetch,WebSearch,Task,NotebookEdit,AskUserQuestion,TodoWrite --setting-sources --permission-mode default
2834 4118969 384716 06:20:44 claude --output-format stream-json [...]
3988 4118969 381844 06:20:15 claude --output-format stream-json --resume <session-id> [...]
5938 4118969 382816 06:19:37 claude --output-format stream-json --resume <session-id> [...]
11503 4118969 381276 06:16:12 claude --output-format stream-json --resume <session-id> [...]
```
### 2.2 Reproduction Steps
1. Use claude-mem normally throughout a work session
2. Run: `ps -o pid,ppid,rss,etime --no-headers | awk '$2 == '$(pgrep -f worker-service.cjs)`
3. Count grows over time without bound
### 2.3 Expected Behavior
Child claude processes should terminate when their task completes, or the daemon should reap them.
---
## 3. Technical Details
### 3.1 Architecture Overview
The claude-mem worker service uses a modular architecture:
```
WorkerService (worker-service.ts)
|
+-- SDKAgent (SDKAgent.ts)
| |
| +-- query() from @anthropic-ai/claude-agent-sdk
| |
| +-- Spawns `claude` CLI subprocess
|
+-- SessionManager (SessionManager.ts)
| |
| +-- Manages active sessions
| +-- Event-driven message queues
|
+-- ProcessManager (ProcessManager.ts)
|
+-- Child process enumeration
+-- Graceful shutdown cleanup
```
### 3.2 SDK Agent Child Process Spawning
The `SDKAgent.startSession()` method invokes the Claude Agent SDK's `query()` function:
```typescript
// src/services/worker/SDKAgent.ts (lines 100-114)
const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
});
```
The `query()` function internally spawns a `claude` CLI subprocess with the parameters visible in the leaked process list:
- `--output-format stream-json`
- `--verbose`
- `--input-format stream-json`
- `--model claude-sonnet-4-5`
- `--disallowedTools ...`
- `--setting-sources`
- `--permission-mode default`
### 3.3 Session Lifecycle
Sessions are managed through the following flow:
1. **Initialization:** `SessionRoutes.handleSessionInit()` creates a session and starts a generator
2. **Processing:** `SDKAgent.startSession()` runs the query loop, processing messages from the queue
3. **Completion:** Generator promise resolves, triggering cleanup in `finally` block
The relevant generator lifecycle code in `SessionRoutes.ts` (lines 137-216):
```typescript
session.generatorPromise = agent.startSession(session, this.workerService)
.catch(error => { /* error handling */ })
.finally(() => {
session.generatorPromise = null;
session.currentProvider = null;
this.workerService.broadcastProcessingStatus();
// Crash recovery logic...
if (!wasAborted) {
// Check for pending work and potentially restart
}
});
```
### 3.4 Graceful Shutdown Implementation
The existing shutdown mechanism in `GracefulShutdown.ts` (lines 49-90) does handle child processes, but **only during daemon shutdown**:
```typescript
export async function performGracefulShutdown(config: GracefulShutdownConfig): Promise<void> {
// STEP 1: Enumerate all child processes BEFORE we start closing things
const childPids = await getChildProcesses(process.pid);
// ... other cleanup steps ...
// STEP 6: Force kill any remaining child processes (Windows zombie port fix)
if (childPids.length > 0) {
for (const pid of childPids) {
await forceKillProcess(pid);
}
await waitForProcessesExit(childPids, 5000);
}
}
```
**Critical Gap:** This cleanup only runs when the daemon itself shuts down, not when individual SDK sessions complete.
---
## 4. Impact Assessment
### 4.1 Resource Consumption
| Metric | Value |
|--------|-------|
| Leaked processes | 121 |
| Total RSS | ~44GB |
| Average per process | ~372MB |
| Accumulation rate | ~20 processes/hour |
| Time to exhaustion (64GB system) | ~3 hours |
### 4.2 System Effects
1. **Memory Exhaustion:** Systems with limited RAM will experience OOM conditions
2. **Performance Degradation:** Swap thrashing as memory fills
3. **Process Table Pollution:** Maximum PID limits may be approached
4. **User Experience:** System becomes unresponsive during extended sessions
### 4.3 Affected Platforms
- **Linux (confirmed):** Ubuntu reported by issue author
- **macOS (likely):** Same process spawning mechanism
- **Windows (potentially different):** Uses different child process tracking
---
## 5. Root Cause Analysis
### 5.1 Primary Root Cause
**The SDK's `query()` function spawns a child `claude` process that is not being explicitly terminated when the async iterator completes.**
The `SDKAgent.startSession()` method:
1. Creates an async generator via `query()`
2. Iterates over messages via `for await (const message of queryResult)`
3. When iteration completes (naturally or via abort), the generator resolves
4. **No explicit cleanup of the underlying child process occurs**
### 5.2 Contributing Factors
1. **No Child Process Tracking:** The codebase does not maintain a registry of spawned child processes during normal operation - only during shutdown enumeration.
2. **AbortController Not Triggering Process Kill:** While sessions have an `abortController`, signaling abort to the SDK iterator does not guarantee the underlying `claude` process terminates.
3. **Generator Finally Block Missing Process Cleanup:** The `finally` block in `SessionRoutes.startGeneratorWithProvider()` handles state cleanup but does not explicitly kill child processes.
4. **SDK Abstraction Hiding Process Details:** The `@anthropic-ai/claude-agent-sdk` abstracts the subprocess management, making it difficult to access and terminate the child process directly.
### 5.3 Code Path Analysis
```
User Session Complete
|
v
SDKAgent.startSession() completes for-await loop
|
v
Generator promise resolves
|
v
SessionRoutes finally block executes
|
+-- session.generatorPromise = null
+-- session.currentProvider = null
+-- broadcastProcessingStatus()
+-- Check pending work
|
v
[MISSING] Child process termination
|
v
Claude subprocess continues running (LEAKED)
```
---
## 6. Recommended Solutions
### 6.1 Solution A: SDK-Level Child Process Tracking (Preferred)
Add explicit child process tracking to the SDKAgent class:
```typescript
// src/services/worker/SDKAgent.ts
export class SDKAgent {
private activeChildProcesses: Map<number, { pid: number, sessionDbId: number }> = new Map();
async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
// Before query(), track that we're about to spawn
const queryResult = query({...});
// After first message, capture the PID if available
// Note: May require SDK modification to expose PID
try {
for await (const message of queryResult) {
// ... existing message handling
}
} finally {
// Cleanup: Kill any child process for this session
this.cleanupSessionProcess(session.sessionDbId);
}
}
private cleanupSessionProcess(sessionDbId: number): void {
// Find and terminate process for this session
// Requires either SDK enhancement or platform-specific process enumeration
}
}
```
**Challenges:** The SDK does not currently expose the child process PID.
### 6.2 Solution B: Session-Level Process Enumeration and Cleanup
Add process cleanup to the session completion flow:
```typescript
// src/services/worker/http/routes/SessionRoutes.ts
private startGeneratorWithProvider(session, provider, source): void {
const parentPid = process.pid;
const preExistingPids = new Set(await getChildProcessesForSession(parentPid, 'claude'));
session.generatorPromise = agent.startSession(session, this.workerService)
.finally(async () => {
// Find new child processes that appeared during this session
const currentPids = await getChildProcessesForSession(parentPid, 'claude');
const newPids = currentPids.filter(pid => !preExistingPids.has(pid));
// Terminate orphaned processes
for (const pid of newPids) {
await forceKillProcess(pid);
}
// ... existing cleanup
});
}
```
### 6.3 Solution C: Periodic Orphan Reaper (Mitigation)
Add a background task that periodically identifies and terminates leaked processes:
```typescript
// src/services/worker/OrphanReaper.ts
export class OrphanReaper {
private interval: NodeJS.Timer | null = null;
start(intervalMs: number = 60000): void {
this.interval = setInterval(async () => {
const orphans = await this.findOrphanedClaudeProcesses();
for (const pid of orphans) {
await forceKillProcess(pid);
}
}, intervalMs);
}
private async findOrphanedClaudeProcesses(): Promise<number[]> {
// Find claude processes parented to the worker daemon
// that have been running longer than expected (e.g., > 30 minutes)
}
}
```
**Pros:** Works without SDK modifications
**Cons:** Reactive rather than proactive; processes leak for up to interval duration
### 6.4 Solution D: Request SDK Enhancement
File an issue with the Claude Agent SDK requesting:
1. Exposure of child process PID in query result
2. Built-in cleanup on iterator completion
3. Explicit `close()` or `terminate()` method
### 6.5 Recommended Implementation Order
1. **Immediate (P0):** Implement Solution C (Orphan Reaper) as a mitigation
2. **Short-term (P1):** Implement Solution B (Session-Level Cleanup)
3. **Medium-term (P2):** Pursue Solution D (SDK Enhancement) with Anthropic
4. **Long-term (P3):** Implement Solution A once SDK provides PID access
---
## 7. Priority/Severity Assessment
### 7.1 Severity: Critical
- **Data Loss:** No
- **System Instability:** Yes - memory exhaustion
- **User Impact:** High - system becomes unusable
- **Scope:** All users with extended sessions
### 7.2 Priority: P0 - Immediate
- **Frequency:** Every session creates leaked processes
- **Accumulation:** Unbounded growth
- **Workaround:** Manual daemon restart (disruptive)
- **Business Impact:** Renders product unusable for long sessions
### 7.3 Effort Estimate
| Solution | Effort | Risk |
|----------|--------|------|
| Orphan Reaper (C) | 2-4 hours | Low |
| Session Cleanup (B) | 4-8 hours | Medium |
| SDK Enhancement (D) | External dependency | - |
| Full Tracking (A) | 8-16 hours | Medium |
---
## 8. References
- **Issue:** https://github.com/thedotmack/claude-mem/issues/603
- **Source Files:**
- `/src/services/worker/SDKAgent.ts` - SDK query invocation
- `/src/services/worker/SessionManager.ts` - Session lifecycle
- `/src/services/worker/http/routes/SessionRoutes.ts` - Generator management
- `/src/services/infrastructure/ProcessManager.ts` - Process utilities
- `/src/services/infrastructure/GracefulShutdown.ts` - Shutdown cleanup
- **Related Code:**
- `@anthropic-ai/claude-agent-sdk` - External SDK spawning processes
---
## 9. Appendix: Process Enumeration Reference
### Current getChildProcesses Implementation
```typescript
// src/services/infrastructure/ProcessManager.ts
export async function getChildProcesses(parentPid: number): Promise<number[]> {
if (process.platform !== 'win32') {
return []; // NOTE: Only implemented for Windows!
}
// Windows implementation using wmic
const cmd = `wmic process where "parentprocessid=${parentPid}" get processid /format:list`;
// ...
}
```
**Critical Finding:** The `getChildProcesses` function is currently **Windows-only** and returns an empty array on Linux/macOS. This means the Linux user reporting the issue has no built-in cleanup mechanism.
### Required Fix for Linux/macOS
```typescript
export async function getChildProcesses(parentPid: number): Promise<number[]> {
if (process.platform === 'win32') {
// Existing Windows implementation
} else {
// Unix implementation
const { stdout } = await execAsync(`pgrep -P ${parentPid}`);
return stdout.trim().split('\n').map(Number).filter(n => !isNaN(n));
}
}
```
---
*Report prepared by Claude Code analysis of codebase and issue #603*

File diff suppressed because it is too large Load Diff