137 Commits

Author SHA1 Message Date
Alex Newman
03748acd6a refactor: remove bearer auth and platform_source context filter (#2081)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
  only counts restarts within 60s window (cap=10), decays after 5min of
  success. Prevents stranding pending messages on long-running sessions. (#2053)

- Add idle session eviction to pool slot allocation: when all slots are full,
  evict the idlest session (no pending work, oldest activity) to free a slot
  for new requests, preventing 60s timeout deadlock. (#1868)

- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
  which fails on non-interactive PATH. Fix crash misclassification by removing
  false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
  code 2 (blocking error) on network failures or malformed responses.
  Session exit no longer blocks on worker errors. (#1901)

- Add health-check wait loop to UserPromptSubmit session-init command in
  hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
  before SessionStart, session-init now waits up to 10s for worker health
  before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)

- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267
  bypasses truncation when file is newer than latest observation.

- Close #1903 as no-repro: hooks.json correctly declares all hook events.
  Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte
  token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
  MCP, viewer, and OpenCode requests include Authorization header.
  Health/readiness endpoints exempt for polling. (#1932, #1933)

- Add path traversal protection: watch.context.path validated against
  project root and ~/.claude-mem/ before write. Rejects ../../../etc
  style attacks. (#1934)

- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
  (300 req/min/IP) to prevent abuse. (#1935)

- Derive default worker port from UID (37700 + uid%100) to prevent
  cross-user data leakage on multi-user macOS. Windows falls back to
  37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries
  and SQLite hydration. searchObservations/Sessions/UserPrompts now use
  $or clause matching project + merged_into_project. (#1912)

- Fix timeline/search methods to pass project to Chroma anchor queries.
  Prevents cross-project result leakage when project param omitted. (#1911)

- Sync imported observations to ChromaDB after FTS rebuild. Import
  endpoint now calls chromaSync.syncObservation() for each imported
  row, making them visible to MCP search(). (#1914)

- Fix session-init cwd fallback to match context.ts (process.cwd()).
  Prevents project key mismatch that caused "no previous sessions"
  on fresh sessions. (#1918)

- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove bearer auth and platform_source from context inject

Bearer token auth (#1932/#1933) added friction for all localhost API
clients with no benefit — the worker already binds localhost-only (CORS
restriction + host binding). Removed auth-token module, requireAuth
middleware, and Authorization headers from all internal callers.

platform_source filtering from the /api/context/inject path was never
used by any caller and silently filtered out observations. The underlying
platform_source column stays; only the query-time filter and its plumbing
through ContextBuilder, ObservationCompiler, SearchRoutes, context.ts,
and transcripts/processor.ts are removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit + Greptile + claude-review comments on PR #2081

- middleware.ts: drop 'Authorization' from CORS allowedHeaders (Greptile)
- middleware.ts: rate limiter falls back to req.socket.remoteAddress; add Retry-After on 429 (claude-review)
- SearchRoutes.ts: drop leftover platformSource read+pass in handleContextPreview (Greptile)
- .docker-blowout-data/: stop tracking the empty SQLite placeholder and gitignore the dir (claude-review)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: tighten rate limiter — correct boundary + drop dead cleanup branch

- `entry.count >= RATE_LIMIT_MAX_REQUESTS` so the 300th request is the
  first rejected (was 301).
- Removed the `requestCounts.size > 100` lazy-cleanup block — on a
  localhost-only server the map tops out at 1–2 entries, so the branch
  was dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: rate limiter correctly allows exactly 300 req/min; doc localhost scope

- Check `entry.count >= max` BEFORE incrementing so the cap matches the
  comment: 300 requests pass, the 301st gets 429.
- Added a comment noting the limiter is effectively a global cap on a
  localhost-only worker (all callers share the 127.0.0.1/::1 bucket).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: normalise IPv4-mapped IPv6 in rate limiter client IP

Strip the `::ffff:` prefix so a localhost caller routed as
`::ffff:127.0.0.1` shares a bucket with `127.0.0.1`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: size-guarded prune of rate limiter map for non-localhost deploys

Prune expired entries only when the map exceeds 1000 keys and we're
already doing a window reset, so the cost is zero on the localhost hot
path (1–2 keys) and the map can't grow unbounded if the worker is ever
bound on a non-loopback interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 13:31:13 -07:00
Alex Newman
8d166b47c1 Revert "revert: roll back v12.3.3 (Issue Blowout 2026)"
This reverts commit bfc7de377a.
2026-04-20 12:18:55 -07:00
Alex Newman
bfc7de377a revert: roll back v12.3.3 (Issue Blowout 2026)
SessionStart context injection regressed in v12.3.3 — no memory
context is being delivered to new sessions. Rolling back to the
v12.3.2 tree state while the regression is investigated.

Reverts #2080.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:59:15 -07:00
Alex Newman
ba1ef6c42c fix: Issue Blowout 2026 — 25 bugs across worker, hooks, security, and search (#2080)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
  only counts restarts within 60s window (cap=10), decays after 5min of
  success. Prevents stranding pending messages on long-running sessions. (#2053)

- Add idle session eviction to pool slot allocation: when all slots are full,
  evict the idlest session (no pending work, oldest activity) to free a slot
  for new requests, preventing 60s timeout deadlock. (#1868)

- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
  which fails on non-interactive PATH. Fix crash misclassification by removing
  false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
  code 2 (blocking error) on network failures or malformed responses.
  Session exit no longer blocks on worker errors. (#1901)

- Add health-check wait loop to UserPromptSubmit session-init command in
  hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
  before SessionStart, session-init now waits up to 10s for worker health
  before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)

- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267
  bypasses truncation when file is newer than latest observation.

- Close #1903 as no-repro: hooks.json correctly declares all hook events.
  Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte
  token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
  MCP, viewer, and OpenCode requests include Authorization header.
  Health/readiness endpoints exempt for polling. (#1932, #1933)

- Add path traversal protection: watch.context.path validated against
  project root and ~/.claude-mem/ before write. Rejects ../../../etc
  style attacks. (#1934)

- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
  (300 req/min/IP) to prevent abuse. (#1935)

- Derive default worker port from UID (37700 + uid%100) to prevent
  cross-user data leakage on multi-user macOS. Windows falls back to
  37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries
  and SQLite hydration. searchObservations/Sessions/UserPrompts now use
  $or clause matching project + merged_into_project. (#1912)

- Fix timeline/search methods to pass project to Chroma anchor queries.
  Prevents cross-project result leakage when project param omitted. (#1911)

- Sync imported observations to ChromaDB after FTS rebuild. Import
  endpoint now calls chromaSync.syncObservation() for each imported
  row, making them visible to MCP search(). (#1914)

- Fix session-init cwd fallback to match context.ts (process.cwd()).
  Prevents project key mismatch that caused "no previous sessions"
  on fresh sessions. (#1918)

- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:42:09 -07:00
Alex Newman
a0dd516cd5 fix: resolve all 301 error handling anti-patterns across codebase
Systematic cleanup of every error handling anti-pattern detected by the
automated scanner. 289 issues fixed via code changes, 12 approved with
specific technical justifications.

Changes across 90 files:
- GENERIC_CATCH (141): Added instanceof Error type discrimination
- LARGE_TRY_BLOCK (82): Extracted helper methods to narrow try scope to ≤10 lines
- NO_LOGGING_IN_CATCH (65): Added logger/console calls for error visibility
- CATCH_AND_CONTINUE_CRITICAL_PATH (10): Added throw/return or approved overrides
- ERROR_STRING_MATCHING (2): Approved with rationale (no typed error classes)
- ERROR_MESSAGE_GUESSING (1): Replaced chained .includes() with documented pattern array
- PROMISE_CATCH_NO_LOGGING (1): Added logging to .catch() handler

Also fixes a detector bug where nested try/catch inside a catch block
corrupted brace-depth tracking, causing false positives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 19:57:00 -07:00
Alex Newman
a1741f4322 docs: update CHANGELOG.md for v12.2.0 + make generator incremental
Script now reads existing CHANGELOG.md, skips releases already documented,
only fetches bodies for new releases, and prepends them. Pass --full to
force complete regeneration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 20:21:51 -07:00
Alex Newman
9d695f53ed chore: remove auto-generated per-directory CLAUDE.md files
Leftover artifacts from an abandoned context-injection feature. The
project-level CLAUDE.md stays; the directory-level ones were generated
timeline scaffolding that never panned out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:51:24 -07:00
Alex Newman
d589bc5f25 fix(cwd-remap): address PR review feedback
- Handle bare repo common-dir (strip trailing .git) instead of an
  identical-branch ternary
- Surface unexpected git stderr while keeping "not a git repository"
  silent
- Explicitly close the sqlite handle in both dry-run and apply paths
  so WAL checkpoints complete

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:21:49 -07:00
Alex Newman
148e1892df chore(scripts): replace worktree-remap with cwd-based remap using pending_messages.cwd
The old worktree-remap.ts tried to reconstruct per-session cwd by regex-
matching absolute paths that incidentally leaked into observation free-text
(files_read, source_input_summary, metadata, user_prompt). That source is
derived and lossy: it only hit 1/3498 plain-project sessions in practice.

pending_messages.cwd is the structured, authoritative cwd captured from
every hook payload — 7,935 of 8,473 rows are populated. cwd-remap.ts uses
that column as the source of truth:

  1. Pull every distinct cwd from pending_messages.cwd
  2. For each cwd, classify with git:
       - rev-parse --absolute-git-dir vs --git-common-dir → main vs worktree
       - rev-parse --show-toplevel for the correct leaf (handles cwds that
         are subdirs of the worktree root)
     Parent project name = basename(dirname(common-dir)); composite is
     parent/worktree for worktrees, basename(toplevel) for main repos.
  3. For each session, take the EARLIEST pending_messages.cwd (not the
     dominant one — claude-mem's own hooks run from nested .context/
     claude-mem/ directories and would otherwise poison the count).
  4. Apply UPDATEs in a single transaction across sdk_sessions,
     observations, and session_summaries. Auto-backs-up the DB first.

Result on a real DB: 41 sessions remapped (vs 1 previously),
1,694 observations and 3,091 session_summaries updated to match.
43 cwds skipped (deleted worktrees / non-repos) are left untouched —
no inference when the data isn't there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-16 16:40:37 -07:00
Alex Newman
040729beef fix(project-name): use parent/worktree composite so observations don't cross worktrees
Revert of #1820 behavior. Each worktree now gets its own bucket:
- In a worktree, primary = `parent/worktree` (e.g. `claude-mem/dar-es-salaam`)
- In a main repo, primary = basename (unchanged)
- allProjects is always `[primary]` — strict isolation at query time

Includes a one-off maintenance script (scripts/worktree-remap.ts) that
retroactively reattributes past sessions to their worktree using path
signals in observations and user prompts. Two-rule inference keeps the
remap high-confidence:
  1. The worktree basename in the path matches the session's current
     plain project name (pre-#1820 era; trusted).
  2. Or all worktree path signals converge on a single (parent, worktree)
     across the session.
Ambiguous sessions are skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-16 15:40:44 -07:00
Alex Newman
c648d5d8d2 feat: Knowledge Agents — queryable corpora from claude-mem (#1653)
* feat: add knowledge agent types, store, builder, and renderer

Phase 1 of Knowledge Agents feature. Introduces corpus compilation
pipeline that filters observations from the database into portable
corpus files stored at ~/.claude-mem/corpora/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add corpus CRUD HTTP endpoints and wire into worker service

Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints
(build, list, get, delete, rebuild) and registers them during
worker background initialization alongside SearchRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add KnowledgeAgent with V1 SDK prime/query/reprime

Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with
resume and disallowedTools for Q&A-only knowledge sessions.
Auto-reprimes on session expiry. Adds prime, query, and reprime
HTTP endpoints to CorpusRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP tools and skill for knowledge agents

Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora,
prime_corpus, and query_corpus MCP tools delegating to worker
HTTP endpoints. Includes /knowledge-agent skill with workflow docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle SDK process exit in KnowledgeAgent, add e2e test

The Agent SDK may throw after yielding all messages when the
Claude process exits with a non-zero code. Now tolerates this
if session_id/answer were already captured. Adds comprehensive
e2e test script (31 assertions) orchestrated via tmux-cli.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use settings model ID instead of hardcoded model in KnowledgeAgent

Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching
the existing SDKAgent pattern. No more hardcoded model assumptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: improve knowledge agents developer experience

Add public documentation page, rebuild/reprime MCP tools, and actionable
error messages. DX review scored knowledge agents 4/10 — core engineering
works (31/31 e2e) but the feature was invisible. This addresses
discoverability (docs, cross-links), API completeness (missing MCP tools),
and error quality (fix/example fields in error responses).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add quick start guide to knowledge agents page

Covers the three main use cases upfront: creating an agent, asking a
single question, and starting a fresh conversation with reprime. Includes
keeping-it-current section for rebuild + reprime workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address code review issues — path traversal, session safety, prompt injection

- Block path traversal in CorpusStore with alphanumeric name validation and resolved path check
- Harden system prompt against instruction injection from untrusted corpus content
- Validate question field as non-empty string in query endpoint
- Only persist session_id after successful prime (not null on failure)
- Persist refreshed session_id after query execution
- Only auto-reprime on session resume errors, not all query failures
- Add fenced code block language tags to SKILL.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining code review issues — e2e robustness, MCP validation, docs

- Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure
- Use curl_post wrapper consistently for all long-running POST calls
- Add runtime name validation to all corpus MCP tool handlers
- Fix docs: soften hallucination guarantee to probabilistic claim
- Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce string[] type in safeParseJsonArray for corpus data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add blank line before fenced code blocks in SKILL.md maintenance section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:30:20 -07:00
Alex Newman
abd55977ca fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node (#1645)
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node

The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit 2b60dd29 added an import of
`ensureWorkerStarted` from worker-service.ts. That import transitively pulls
in DatabaseManager → bun:sqlite, blowing up at top-level require under Node.

The bundle ballooned from ~358KB (v11.0.1) to ~1.96MB (v12.0.0) and crashed
on every spawn, breaking the MCP server entirely for Codex/MCP-only clients
and any flow that boots the MCP tool surface.

Fix:

1. Extract `ensureWorkerStarted` and the Windows spawn-cooldown helpers
   into a new lightweight module `src/services/worker-spawner.ts` that
   only imports from infrastructure/ProcessManager, infrastructure/HealthMonitor,
   shared/*, and utils/logger — no SQLite, no ChromaSync, no DatabaseManager.

2. The new helper takes the worker script path explicitly so callers
   running under Node (mcp-server) can pass `worker-service.cjs` while
   callers already inside the worker (worker-service self-spawn) pass
   `__filename`. worker-service.ts keeps a thin wrapper for back-compat.

3. mcp-server.ts now imports from worker-spawner.js and resolves
   WORKER_SCRIPT_PATH via __dirname so the daemon can be auto-started
   for MCP-only clients without dragging in the entire worker bundle.

4. resolveWorkerRuntimePath() now searches for Bun on every platform
   (not just Windows). worker-service.cjs requires Bun at runtime, so
   when the spawner is invoked from a Node process the Unix branch can
   no longer fall through to process.execPath (= node).

5. spawnDaemon's Unix branch now calls resolveWorkerRuntimePath() instead
   of hardcoding process.execPath, fixing the same Node-spawning-Node bug
   for the actual subprocess launch on Linux/macOS.

After:
- mcp-server.cjs is 384KB again with zero `bun:sqlite` references
- node mcp-server.cjs initializes and serves tools/list + tools/call
  (verified via JSON-RPC against the running worker)
- ProcessManager test suite updated for the new cross-platform Bun
  resolution behavior; full suite has the same pre-existing failures
  as main, no regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 1)

Per Claude Code Review on PR #1645:

1. mcp-server.ts: log a warning when both __dirname and import.meta.url
   resolution fail. The cwd() fallback is essentially dead code for the
   CJS bundle but if it ever fires it gives the user a breadcrumb instead
   of a silently-wrong WORKER_SCRIPT_PATH.

2. mcp-server.ts: existsSync check on WORKER_SCRIPT_PATH at module load.
   Surfaces a clear "worker-service.cjs not found at expected path" log
   line for partial installs / dev environments instead of letting the
   failure surface as a generic spawnDaemon error later.

3. ProcessManager.ts: explanatory comment on the Windows `return 0`
   sentinel in spawnDaemon. Documents that PowerShell Start-Process
   doesn't return a PID and that callers MUST use `pid === undefined`
   for failure detection — never falsy checks like `if (!pid)`.

Items 4 (no direct unit tests for the worker-spawner Windows cooldown
helpers) and 5 (process-manager.test.ts uses real ~/.claude-mem path)
are deferred — the reviewer flagged the latter as out of scope, and
the former needs an injectable-I/O refactor that isn't appropriate
for a hotfix bugfix PR.

Verified: build clean, mcp-server.cjs still 384KB / zero bun:sqlite,
JSON-RPC tools/list still returns the 7-tool surface, ProcessManager
test suite still 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): mkdir CLAUDE_MEM_DATA_DIR before writing Windows cooldown marker

Per CodeRabbit on PR #1645: on a fresh user profile, the data dir may not
exist yet when markWorkerSpawnAttempted() runs. writeFileSync would throw
ENOENT, the catch would swallow it, and the marker would never be created
— defeating the popup-loop protection this helper exists to provide.

mkdirSync(dir, { recursive: true }) is a no-op when the directory already
exists, so it's safe to call on every spawn attempt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(spawner): add APPROVED OVERRIDE annotations for cooldown marker catches

Per CodeRabbit on PR #1645: silent catch blocks at spawn-cooldown sites
should carry the APPROVED OVERRIDE annotation that the rest of the
codebase uses (see ProcessManager.ts:689, BaseRouteHandler.ts:82,
ChromaSync.ts:288).

Both catches are intentional best-effort:
- markWorkerSpawnAttempted: if mkdir/writeFileSync fails, the worker
  spawn itself will almost certainly fail too. Surfacing that downstream
  is far more useful than a noisy log line about a lock file.
- clearWorkerSpawnAttempted: a stale marker is harmless. Worst case is
  one suppressed retry within the cooldown window, then self-heals.

No behaviour change. Resolves the second half of CodeRabbit's lines
38-65 comment on worker-spawner.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 2)

Round 2 of Claude Code Review feedback on PR #1645:

Build guardrail (most important — protects the regression this PR fixes):

- scripts/build-hooks.js: post-build check that fails the build if
  mcp-server.cjs ever contains a `bun:sqlite` reference. This is the
  exact regression PR #1645 fixed; future contributors will get an
  immediate, actionable error if a transitive import re-introduces it.
  Verified the check trips when violated.

Code clarity:

- src/servers/mcp-server.ts: drop dead `_originalLog` capture — it was
  never restored. Less code is fewer bugs.

- src/servers/mcp-server.ts: elevate `cwd()` fallback log from WARN to
  ERROR. Per reviewer: a wrong WORKER_SCRIPT_PATH means worker auto-start
  silently fails, so the breadcrumb should be loud and searchable.

- src/services/worker-service.ts: extended doc comment on the
  `ensureWorkerStartedShared(port, __filename)` wrapper explaining why
  `__filename` is the correct script path here (CJS bundle = compiled
  worker-service.cjs) and why mcp-server.ts can't use the same trick.

- src/services/infrastructure/ProcessManager.ts: inline comment on the
  `env.BUN === 'bun'` bare-command guard explaining why it's reachable
  even though `isBunExecutablePath('bun')` is true (pathExists returns
  false for relative names, so the second branch is what fires).

Coverage:

- src/services/infrastructure/ProcessManager.ts: add `/usr/bin/bun` to
  the Linux candidate paths so apt-installed Bun on Debian/Ubuntu is
  found without falling through to the PATH lookup.

Out-of-scope items (deferred with rationale in PR replies):

- Unit tests for ensureWorkerStarted / Windows cooldown helpers — needs
  injectable-I/O refactor unsuitable for a hotfix.
- Sentinel object for Windows spawnDaemon `0` — broader API change.
- Windows Scoop install path — follow-up for a future PR.
- runOneTimeChromaMigration placement, aggressiveStartupCleanup,
  console.log redirect timing, platform timeout multiplier — all
  pre-existing and unrelated to this regression.

Verified: build clean, guardrail trips on simulated violation,
mcp-server.cjs still 0 bun:sqlite refs, ProcessManager tests 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 3)

Round 3 of Claude Code Review feedback on PR #1645:

ProcessManager.ts: improve actionability of "Bun not found" errors

Both Windows and Unix branches of spawnDaemon previously logged a vague
"Failed to locate Bun runtime" message when resolveWorkerRuntimePath()
returned null. Replaced with an actionable message that names the install
URL and explains *why* Bun is required (worker uses bun:sqlite). The
existing null-guard at the call sites already prevents passing null to
child_process.spawn — only the error text changed.

scripts/build-hooks.js: refine bun:sqlite guardrail to match actual
require() calls only

The previous coarse `includes('bun:sqlite')` check tripped on its own
improved error message, which legitimately mentions "bun:sqlite" by name.
Switched to a regex that matches `require("bun:sqlite")` /
`require('bun:sqlite')` (with optional whitespace, handles both quote
styles, handles minified output) so error messages and inline comments
can reference the module name without false positives. Verified the
regex still trips on real violations (both spaced and minified forms)
and correctly ignores string-literal mentions.

Other round-3 items (verified, not changed):

- TOOL_ENDPOINT_MAP: reviewer flagged as dead code, but it IS used at
  lines 250 and 263 by the search and timeline tool handlers. False
  positive — kept as-is.
- if (!pid) callsites: grepped src/, zero offenders. The Windows `0`
  PID sentinel contract is safe; only the in-line documentation comment
  in ProcessManager.ts mentions the anti-pattern.
- callWorkerAPIPost double-wrapping: pre-existing intentional behavior
  (only used by /api/observations/batch which returns raw data, not
  the MCP {content:[...]} shape). Unrelated to this regression.
- Snap path / startParentHeartbeat / main().catch / test for non-
  existent workerScriptPath / etc — pre-existing or out of scope for
  this hotfix, deferred per established disposition.

Verified: build clean, guardrail still trips on real violations,
mcp-server.cjs has 0 require("bun:sqlite") calls, JSON-RPC tools/list
returns the 7-tool surface, ProcessManager tests 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(spawnDaemon): contract test for Windows 0 PID success sentinel

Per CodeRabbit nitpick on PR #1645 commit 7a96b3b9: add a focused test
that documents the spawnDaemon return contract so any future contributor
who introduces `if (!pid)` against a spawnDaemon return value (or its
wrapper) sees a failing assertion explaining why the falsy check is
incorrect.

The test deliberately exercises the JS-level semantics rather than
mocking PowerShell — a true mocked Windows test would require
refactoring spawnDaemon to take an injectable execSync, which is a
larger change than this hotfix should carry. The contract assertions
here catch the same regression class (treating Windows success as
failure) without that refactor.

Verified: bun test tests/infrastructure/process-manager.test.ts now
passes 44/44 (was 43/43).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 4)

Round 4 of Claude Code Review feedback on PR #1645 (review of round-3
commit 193286f9):

tests/infrastructure/process-manager.test.ts: replace require('fs')
with the already-imported statSync. Reviewer correctly flagged that
the file uses ESM-style named imports everywhere else and the inline
require() calls would break under strict ESM. Two callsites updated
in the touchPidFile test.

src/services/infrastructure/ProcessManager.ts: hoist
resolveWorkerRuntimePath() and the `Bun runtime not found` error
handling out of both branches in spawnDaemon. Both Windows and Unix
branches need the same Bun lookup, and resolving once before the OS
branch split avoids a duplicate execSync('which bun')/where bun in the
no-well-known-path fallback. The error message is also DRY now —
single source of truth instead of two near-identical strings.

CodeRabbit confirmed in its previous reply that "All actionable items
across all four review rounds are fully resolved" — these two minor
items from claude-review of round 3 are the only remaining cleanup.

Verified: build clean, ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 5)

Round 5 of Claude Code Review feedback on PR #1645:

src/services/worker-spawner.ts: drop `export` from internal helpers

`shouldSkipSpawnOnWindows`, `markWorkerSpawnAttempted`, and
`clearWorkerSpawnAttempted` were exported even though they were
private in worker-service.ts and nothing outside this module needs
them. Removing the `export` keyword keeps the public surface to just
`ensureWorkerStarted` and prevents future callers from bypassing the
spawn lifecycle.

scripts/build-hooks.js: broaden guardrail to all bun:* modules

Previously the regex only caught `require("bun:sqlite")`, but every
module in the `bun:` namespace (bun:ffi, bun:test, etc.) is Bun-only
and would crash mcp-server.cjs the same way under Node. Generalized
the regex to `require("bun:[a-z][a-z0-9_-]*")` so a transitive import
of any Bun-only module fails the build instead of shipping a broken
bundle. Verified the new regex still trips on bun:sqlite, bun:ffi,
bun:test, and correctly ignores string-literal mentions in error
messages.

src/servers/mcp-server.ts: attribute root cause when dirname resolution fails

Previously, if `__dirname`/`import.meta.url` resolution failed and we
fell back to `process.cwd()`, the user would see two warnings: an
error about the dirname fallback AND a separate warning about the
missing worker bundle. The second warning hides the root cause —
someone debugging would assume the install is broken when really it's
a dirname-resolution failure. Track the failure with a flag and emit
a single root-cause-attributing log line in the existence-check
branch instead. The dirname fallback paths are still functionally
unreachable in CJS deployment; this just makes the failure mode
unmistakable if it ever does fire.

Out of scope (consistent with prior rounds):
- darwin/linux split for non-Windows candidate paths (benign today)
- Integration test for non-existent workerScriptPath (test coverage
  gap deferred since rounds 1-2)
- Defer existsSync check to first ensureWorkerStarted call (current
  module-init check is the loud signal we want)

Already addressed in earlier rounds:
- resolveWorkerRuntimePath() called twice in spawnDaemon → hoisted in
  round 4 (b2c114b4)
- _originalLog dead code → removed in round 2 (7a96b3b9)

Verified: build clean, broadened guardrail trips on bun:sqlite,
bun:ffi, and bun:test (and ignores string literals), MCP server
serves the 7-tool surface, ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 6)

Round 6 of Claude Code Review feedback on PR #1645:

src/services/worker-spawner.ts: validate workerScriptPath at entry

Add an empty-string + existsSync guard at the top of ensureWorkerStarted.
Without this, a partial install or upstream path-resolution regression
just surfaces as a low-signal child_process error from spawnDaemon. The
explicit log line at the entry point makes that class of bug much
easier to diagnose. The mcp-server.ts module-init existsSync check
already covers this for the MCP-server caller, but defending at the
spawner level reinforces the contract for any future caller.

src/services/worker-spawner.ts: document SettingsDefaultsManager
dependency boundary in the module header

The spawner imports from SettingsDefaultsManager, ProcessManager, and
HealthMonitor. None of those currently touch bun:sqlite, but if any
of them ever does, the spawner's SQLite-free contract silently breaks.
The build guardrail in build-hooks.js is the only thing that catches
it. Header comment now flags this so future contributors audit
transitive imports when adding helpers from the shared/infrastructure
layers.

src/services/infrastructure/ProcessManager.ts: add /snap/bin/bun

Ubuntu Snap install path. Now alongside the existing apt path
(/usr/bin/bun) and Homebrew/Linuxbrew paths. The PATH lookup catches
it as fallback, but listing it explicitly avoids paying for an
execSync('which bun') in the common case.

src/servers/mcp-server.ts: elevate missing-bundle log warn → error

A missing worker-service.cjs means EVERY MCP tool call that needs the
worker silently fails. That's a broken-install state, not a transient
condition — match the severity of the dirname-fallback branch above
(which is already ERROR).

Out of scope (consistent with prior rounds, reviewer agrees these are
appropriately deferred):
- Streaming bundle read in build-hooks.js (nit at current 384KB size)
- Unit tests for ensureWorkerStarted / cooldown helpers
- Integration test for non-existent workerScriptPath

Verified: build clean, broadened guardrail still trips on bun:* imports
and ignores string literals, MCP server serves the 7-tool surface,
ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): defer WORKER_SCRIPT_PATH check to first call (round 7)

Round 7 of Claude Code Review feedback on PR #1645:

src/servers/mcp-server.ts: extract module-level existsSync check into
checkWorkerScriptPath() and call it lazily from ensureWorkerConnection()
instead of at module load.

The early-warning intent is preserved (the check still fires before any
actual spawn attempt), but tests/tools that import this module without
booting the MCP server no longer see noisy ERROR-level log lines for a
worker bundle they never intended to start. The check is cheap and
idempotent, so calling it on every auto-start attempt is fine.

The two failure-mode branches (dirname-resolution failure vs simple
missing-bundle) remain unchanged — the function body is identical to
the previous module-level if-block, just hoisted into a function and
called from ensureWorkerConnection().

False positive (no change needed):
- Reviewer flagged `mkdirSync` as a dead import in worker-spawner.ts,
  but it IS used at line 71 in markWorkerSpawnAttempted (the round-1
  ENOENT fix CodeRabbit explicitly asked for).

Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; nit per
  reviewer
- worker-spawner.ts unit tests — needs injectable I/O, deferred
  consistently since round 1

Verified: build clean, tests 44/44, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 8)

Round 8 of Claude Code Review feedback on PR #1645:

tests/services/worker-spawner.test.ts: NEW FILE — unit tests for the
ensureWorkerStarted entry-point validation guards added in round 6.
Covers the empty-string and non-existent-path cases without requiring
the broader injectable-I/O refactor that the deeper spawn lifecycle
tests would need. 2 new passing tests.

src/services/infrastructure/ProcessManager.ts: memoize
resolveWorkerRuntimePath() for the no-options call site (which is what
spawnDaemon uses). Caches both successful resolutions and the
not-found result so repeated spawn attempts (crash loops, health
thrashing) don't repeatedly hit statSync on candidate paths. Tests
that pass options bypass the cache entirely so existing test cases
remain deterministic. Added resetWorkerRuntimePathCache() exported
for test isolation only.

src/servers/mcp-server.ts: rename checkWorkerScriptPath() →
warnIfWorkerScriptMissing(). Per reviewer: the old name implied a
boolean check but the function returns void and has side effects. New
name is more accurate.

DEFENDED (no change made):
- Reviewer asked to elevate process.cwd() fallback to a synchronous
  throw at module load. This conflicts with round 7 feedback which
  asked to defer the existsSync check to first call to avoid noisy
  test logs. The current lazy approach is the right compromise: it
  fires before any actual spawn attempt, attributes the root cause,
  and doesn't pollute test imports. Throwing at module load would
  crash before stdio is wired up, which is much harder to debug than
  the lazy log line.
- Reviewer asked to grep for `if (!pid)` callsites — already verified
  in round 3, zero offenders in src/.

Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; reviewer
  marked as nit
- Deeper unit tests for ensureWorkerStarted spawn lifecycle (PID file
  cleanup, health checks, etc.) — needs injectable I/O, deferred
  consistently since round 1

Verified: build clean, ProcessManager tests still 44/44, new
worker-spawner tests 2/2, smoke test serves 7 tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): clear Windows cooldown marker on all healthy paths (round 9)

Round 9 of PR #1645 review feedback.

src/services/worker-spawner.ts: clear stale Windows cooldown marker
on every healthy-return path

Per CodeRabbit (genuine bug):

The .worker-start-attempted marker was previously only cleared after
a spawn initiated by ensureWorkerStarted itself succeeded. If a
previous auto-start failed, then the worker became healthy via
another session or a manual start, the early-return success branches
(existing live PID, fast-path health check, port-in-use waitForHealth)
would leave the stale marker behind. A subsequent genuine outage
inside the 2-minute cooldown window would then be incorrectly
suppressed on Windows.

Now calls clearWorkerSpawnAttempted() on all three healthy success
paths in addition to the existing post-spawn path. The function is
already a no-op on non-Windows, so the change is risk-free for Linux
and macOS callers.

src/servers/mcp-server.ts: more actionable error when auto-start fails

Per claude-review: when ensureWorkerStarted returns false (or throws),
the caller currently logs a generic "Worker auto-start failed" line.
Updated both error sites to explicitly call out which MCP tools will
fail (search/timeline/get_observations) and to point at earlier log
lines for the specific cause. Helps users distinguish "worker is just
not running" from "tools are broken".

DEFENDED (no change):
- Sentinel object for Windows spawnDaemon 0 PID — broader API change,
  out of scope, deferred consistently since round 1
- Spawner lifecycle tests beyond input validation — needs injectable
  I/O, deferred consistently
- Concurrent cooldown marker race on Windows — pre-existing,
  out of scope
- stripHardcodedDirname() regex fragility assertion — pre-existing,
  out of scope

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): don't cache null Bun-not-found result (round 10)

Round 10 of PR #1645 review feedback.

src/services/infrastructure/ProcessManager.ts: only cache successful
resolveWorkerRuntimePath() results

Genuine bug from claude-review: the round-8 memoization cached BOTH
successful resolutions AND the not-found `null` result. If Bun isn't
on PATH at the moment the MCP server first tries to spawn the worker
— e.g., on a fresh install where the user installs Bun in another
terminal and retries — every subsequent ensureWorkerConnection call
would return the cached `null` and fail with a misleading "Bun not
found" error even though Bun is now available.

The fix is the one-line change the reviewer suggested: only cache
when `result !== null`. Crash loops still get the fast-path memoized
success; recovery from a fresh-install Bun install still works.

src/servers/mcp-server.ts: rename warnIfWorkerScriptMissing →
errorIfWorkerScriptMissing

Per claude-review: the function uses logger.error but the name says
"warn" — name/level mismatch. Renamed to match. The function still
serves the same purpose (defensive lazy check), just with an accurate
name.

DEFENDED (no change):
- Discriminated union for mcpServerDirResolutionFailed flag — current
  approach works, the noise is minimal, and the alternative would
  add type complexity for a path that's functionally unreachable in
  CJS deployment
- macOS /usr/local/bin/bun "missing" — already in the Linux/macOS
  candidate list at line 137 (false positive from reviewer)
- nix store path — out of scope, PATH fallback handles it
- Long build-hooks.js error message — verbosity is intentional, this
  message only fires on a real regression and the diagnostic value is
  worth the line wrap
- Spawner lifecycle test coverage gap — needs injectable I/O,
  deferred consistently

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): bundle size budget guardrail (round 11)

Round 11 of PR #1645 review feedback.

scripts/build-hooks.js: secondary bundle-size budget guardrail

Per claude-review: the existing `require("bun:*")` regex catches the
specific regression class we already know about, but if esbuild ever
changes how it emits external module specifiers, the regex could
silently miss the regression. A bundle-size budget catches the
structural symptom (worker-service.ts dragged into the bundle blew
the size from ~358KB to ~1.96MB) regardless of how the imports look.

Set the ceiling at 600KB. Current size is ~384KB; the broken v12.0.0
bundle was ~1920KB. Plenty of headroom for legitimate growth without
incentivizing bundle bloat or false positives. Both guardrails fire
independently — one is regex-based, one is size-based — so a
regression has to defeat both to ship.

tests/services/worker-spawner.test.ts: comment about port irrelevance

Per claude-review: the hardcoded port values in the validation-guard
tests are arbitrary because the path validation short-circuits before
any network I/O. Added a comment explaining this so future readers
don't waste time wondering why specific ports were picked.

DEFENDED (no change):

- clearWorkerSpawnAttempted on the unhealthy-live-PID return path:
  reviewer asked to clear the marker here too, but the current
  behavior is correct. The marker tracks "recently attempted a spawn"
  and exists to prevent rapid PowerShell-popup loops. If a wedged
  process is currently using the port, the spawn isn't actually
  happening on this code path (the helper returns false without
  reaching the spawn step). When the wedged process eventually dies
  and a subsequent call hits the spawn path, the marker correctly
  suppresses repeated retry attempts within the 2-minute cooldown.
  Clearing the marker on the unhealthy-return path would defeat
  exactly the popup-loop protection the marker exists to provide.

- execSync in lookupBinaryInPath blocks event loop: pre-existing
  concern, not introduced by this PR. Reviewer notes "fires once,
  result cached". Not in scope for a hotfix.

- Tracking issue for spawner lifecycle test gap: out of scope for
  this PR; the gap is documented in the test file's header comment
  with a back-reference to PR #1645.

Verified: build clean, both guardrails functional (size budget is
under the new ceiling), ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): eliminate double error log when worker bundle is missing (round 12)

Round 12 of PR #1645 review feedback.

src/servers/mcp-server.ts: errorIfWorkerScriptMissing() now only logs
when the dirname-fallback attribution path is needed

Previously a missing worker-service.cjs would produce two ERROR log
lines on the same code path:
1. errorIfWorkerScriptMissing() in ensureWorkerConnection()
2. The existsSync guard inside ensureWorkerStarted()

The simple "missing bundle" case is fully covered by the spawner's
own existsSync guard. The mcp-server.ts function now ONLY logs when
mcpServerDirResolutionFailed is true — that's the mcp-server-specific
root-cause attribution that the spawner cannot provide on its own.

Net effect: same single error log per bug class, cleaner triage.

DEFENDED (no change):

- mkdirSync error propagation in markWorkerSpawnAttempted: reviewer
  worried that mkdirSync/writeFileSync exceptions could escape, but
  the entire body is already wrapped in try/catch with an APPROVED
  OVERRIDE annotation. False positive.
- clearWorkerSpawnAttempted on healthy paths: reviewer asked a
  clarifying question, not a change request. The behavior is
  intentional — the cooldown marker exists to prevent rapid
  PowerShell-popup loops from a series of failed spawns; a healthy
  worker means the marker has served its purpose and a future
  outage should NOT be suppressed. Will explain in PR reply.
- __filename ESM concern in worker-service.ts wrapper: already
  documented in round 4 with an extended comment about the CJS
  bundle context and why mcp-server.ts can't use the same trick.
- Spawn lifecycle integration tests: deferred consistently since
  round 1; gap is documented in worker-spawner.test.ts header.

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(spawner): add bare-command BUN env override coverage

Final round of PR #1645 review feedback: while preparing to merge, I
noticed CodeRabbit's round-5 CHANGES_REQUESTED review on commit
3570d2f0 included an unaddressed nitpick — the env-driven bare-command
branch in resolveWorkerRuntimePath() (returning a bare 'bun' unchanged
when BUN or BUN_PATH is set that way) had no test coverage and could
regress without any failing assertion.

Added a focused test that exercises the env: { BUN: 'bun' } branch
specifically. 47/47 tests pass (was 46/46).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:08:36 -07:00
Alex Newman
7996dfd5cd Merge branch 'thedotmack/add-lang-parsers' into integration/validation-batch
Adds 24-language support for smart-explore: Kotlin, Swift, Elixir,
Lua, Scala, Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL, Markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:50:46 -07:00
Alex Newman
95889c7b4e feat: expand smart-explore to 24 languages with markdown support and user-installable grammars
Add 15 new tree-sitter language grammars (Kotlin, Swift, PHP, Elixir, Lua, Scala,
Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL, Markdown) with verified SCM queries.
Add markdown-specific formatting with heading hierarchy, code block detection, and
section-aware unfold. Add user-installable grammar system via .claude-mem.json config
with custom query file support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:24:56 -07:00
Alex Newman
8d02271321 Merge branch 'pr-1411' into integration/validation-batch 2026-04-06 14:19:03 -07:00
Alex Newman
a4115d055a Merge branch 'pr-1585' into integration/validation-batch 2026-04-06 14:18:28 -07:00
Alex Newman
1f808c0be7 Merge branch 'pr-1579' into integration/validation-batch 2026-04-06 14:18:27 -07:00
Alessandro Costa
5a27420809 feat: add claude-mem-sync for multi-machine observation synchronization (#1570)
Bidirectional sync of observations and session summaries between
machines via SSH/SCP. Exports to JSON, transfers, imports with
deduplication by (created_at, title).

Commands:
  claude-mem-sync push <remote-host>    # local → remote
  claude-mem-sync pull <remote-host>    # remote → local
  claude-mem-sync sync <remote-host>    # bidirectional
  claude-mem-sync status <remote-host>  # compare counts

Features:
- Deduplication prevents duplicates on repeated runs
- Configurable paths via CLAUDE_MEM_DB / CLAUDE_MEM_REMOTE_DB
- Automatic temp file cleanup
- Requires only Python 3 + SSH on both machines

Tested syncing 3,400+ observations between two physical servers.
After sync, a session on the remote server used the transferred
memory to deliver a real feature PR — proving productive
cross-machine workflows.

Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:14:31 -07:00
Alex Newman
190c74492f fix: address second PR review — clean replace, IDE failure bubbling, bun validation
- cpSync now does rmSync before copy to avoid stale file merges
- setupIDEs() returns failed IDE list; install reports partial success
- runSmartInstall() returns boolean status instead of void
- Worker port in next-steps URL reads CLAUDE_MEM_WORKER_PORT env var
- Goose YAML regex stops at column-0 keys (prevents eating sibling sections)
- AGENTS.md uninstall removes header-only stub files
- findBunPath() validated before use in WindsurfHooksInstaller
- Cursor marked unsupported in ide-detection until installer is wired

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:17:18 -07:00
Alex Newman
ae6915b88e fix: address PR review — shebang, double-escaping, data loss, uninstall scope
- Add shebang banner to NPX CLI esbuild config so npx claude-mem works
- Remove manual backslash pre-escaping in WindsurfHooksInstaller (JSON.stringify handles it)
- Scope cache deletion to claude-mem only, not entire vendor namespace
- Use getWorkerPort() in OpenCodeInstaller instead of hard-coded 37777
- Throw on corrupt JSON in readJsonSafe/readGeminiSettings/Windsurf to prevent data loss
- Fix Cursor install stub to warn instead of silently succeeding
- Fix Gemini uninstall to remove individual hooks within groups, not whole groups
- Update tests for new corrupt-file-throws behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:49:14 -07:00
Alex Newman
8265fc7aa1 Merge remote-tracking branch 'origin/thedotmack/npx-gemini-cli' into thedotmack/npx-gemini-cli
Resolve merge conflicts in adapter index, gemini-cli adapter, and rebuilt CJS artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:47:49 -07:00
Octopus
c74101b7f7 fix: handle bare filenames in regenerate-claude-md.ts (fixes #1514) 2026-04-03 10:07:11 +08:00
ming
eea9c100ba fix: harden plugin manifest sync script 2026-04-02 20:15:26 +08:00
ming
16f79d6f71 chore: sync plugin manifest metadata from package 2026-04-02 20:07:00 +08:00
Alex Newman
0321f4266d fix: remove import.meta.url banner from CJS files run by Node.js
The MCP server (#!/usr/bin/env node) and context generator run under
Node.js, where import.meta.url throws SyntaxError in CJS mode. Only
the worker-service needs the banner since it runs under Bun.

CJS files under Node.js already have __dirname/__filename natively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:32:43 -07:00
Alex Newman
07ab7000a8 fix: patch 7 critical bugs affecting all non-dev-machine users and Windows
1. Fix esbuild inlining build-machine __dirname as string literal — use
   CJS-compatible runtime banner with require("node:url").fileURLToPath
   across worker-service, mcp-server, and context-generator builds.

2. Fix isMainModule check missing .cjs extension and Windows backslash
   path normalization.

3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
   feedback loop on malformed transcripts (exit 0 instead of exit 2).

4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
   lightweight inline node -e one-liner (~200ms vs >1s).

5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
   circuit breaker to prevent 77K+ retry loops on expired API keys.

6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
   sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.

7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
   when Windows usernames contain spaces.

Closes #1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:20:29 -07:00
Alex Newman
f2cc33b494 feat: add Gemini CLI, OpenCode, and Windsurf IDE integrations
Gemini CLI: platform adapter mapping 6 of 11 hooks, settings.json
deep-merge installer, GEMINI.md context injection.

OpenCode: plugin with tool.execute.after interceptor, bus events for
session lifecycle, claude_mem_search custom tool, AGENTS.md context.

Windsurf: platform adapter for tool_info envelope format, hooks.json
installer for 5 post-action hooks, .windsurf/rules context injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 23:02:18 -07:00
Alex Newman
3a09c1bb1a feat: add NPX CLI and OpenClaw build pipeline, optimize npm package size
Adds esbuild steps for npx-cli (57KB, Node.js ESM) and openclaw plugin
(12KB). Creates .npmignore to exclude node_modules and Bun binary from
npm package, reducing pack size from 146MB to 2MB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 23:02:18 -07:00
Atharva Deopujari
abb5940788 fix: handle single-quoted paths and dangling var; edge case
Address review feedback:
- Match both double-quoted and single-quoted string literals defensively
- Clean up dangling `var ;` when __dirname is the sole declarator
- Refactor into a loop over both identifiers to reduce duplication
2026-03-20 12:39:42 +05:30
Atharva Deopujari
d88ea71590 fix: strip hardcoded __dirname/__filename from bundled CJS output
esbuild inlines __dirname and __filename as static strings when converting
ESM TypeScript source to CJS format. These build-time paths shadow the
runtime's native __dirname (provided by Bun/Node CJS module wrapper),
breaking path resolution for viewer UI, mode loading, and database
initialization on end-user machines.

Add a post-build step that removes the hardcoded var declarations from
all bundled CJS outputs, allowing the runtime globals to work correctly.

Fixes #1410
2026-03-20 11:25:19 +05:30
Alex Newman
0e502dbd21 feat: add smart-explore AST-based code navigation (#1244)
* feat: add smart-file-read module for token-optimized semantic code search

- Created package.json for the smart-file-read module with dependencies and scripts.
- Implemented parser.ts for code structure parsing using tree-sitter, supporting multiple languages.
- Developed search.ts for searching code files and symbols with grep-style and structural matching.
- Added test-run.mjs for testing search and outline functionalities.
- Configured TypeScript with tsconfig.json for strict type checking and module resolution.

* fix: update .gitignore to include _tree-sitter and remove unused subproject

* feat: add preliminary results and skill recommendation for smart-explore module

* chore: remove outdated plan.md file detailing session start hook issues

* feat: update Smart File Read integration plan and skill documentation for smart-explore

* feat: migrate Smart File Read to web-tree-sitter WASM for cross-platform compatibility

* refactor: switch to tree-sitter CLI for parsing and enhance search functionality

- Updated `parser.ts` to utilize the tree-sitter CLI for AST extraction instead of native bindings, improving compatibility and performance.
- Removed grammar loading logic and replaced it with a path resolution for grammar packages.
- Implemented batch parsing in `parseFilesBatch` to handle multiple files in a single CLI call, enhancing search speed.
- Refactored `searchCodebase` to collect files and parse them in batches, streamlining the search process.
- Adjusted symbol extraction logic to accommodate the new parsing method and ensure accurate symbol matching.

* feat: update Smart File Read integration plan to utilize tree-sitter CLI for improved performance and cross-platform compatibility

* feat: add smart-file-read parser and search to src/services

Copy validated tree-sitter CLI-based parser and search modules from
smart-file-read prototype into the claude-mem source tree for MCP
tool integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: register smart_search, smart_unfold, smart_outline MCP tools

Add 3 tree-sitter AST-based code exploration tools to the MCP server.
Direct execution (no HTTP delegation) — they call parser/search
functions directly for sub-second response times.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tree-sitter CLI deps to build system and plugin runtime

Externalize tree-sitter packages in esbuild MCP server build. Add
10 grammar packages + CLI to plugin package.json for runtime install.
Remove unused @chroma-core/default-embed from plugin deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: create smart-explore skill with 3-layer workflow docs

Progressive disclosure workflow: search -> outline -> unfold.
Documents all 3 MCP tools with parameters and token economics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add comprehensive documentation for the smart-explore feature

- Introduced a detailed technical reference covering the architecture, parser, search engine, and tool registration for the smart-explore feature in claude-mem.
- Documented the three-layer workflow: search, outline, and unfold, along with their respective MCP tools.
- Explained the parsing process using tree-sitter, including language support, query patterns, and symbol extraction.
- Outlined the search module's functionality, including file discovery, batch parsing, and relevance scoring.
- Provided insights into build system integration and token economics for efficient code exploration.

* chore: remove experiment artifacts, prototypes, and plan files

Remove A/B test docs, prototype smart-file-read directory, and
implementation plans. Keep only production code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: simplify hooks configuration and remove setup script

* fix: use execFileSync to prevent command injection in tree-sitter parser

Replaces execSync shell string with execFileSync + argument array,
eliminating shell interpretation of file paths. Also corrects
file_pattern description from "Glob pattern" to "Substring filter".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:00:26 -05:00
Alex Newman
23f30d35b9 chore: bump version to 10.4.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:39:06 -05:00
Alex Newman
c6f932988a Fix 30+ root-cause bugs across 10 triage phases (#1214)
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors

- Add --python version pinning to uvx args in both local and remote mode (fixes #1196, #1206, #1208)
- Convert backslash paths to forward slashes for --data-dir on Windows (fixes #1199)
- Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes #707)
- Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes #1183, #1188)
- Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes #1162)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing

- Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts)
- Add content_hash column via migration 22 with backfill and index
- Fix project name collision: getCurrentProjectName() now returns parent/basename
- Guard against empty project string with cwd-derived fallback
- Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes
- Add 12 new tests covering all four fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention

- Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic
  output as error UI (#1181). Restores stderr in finally block for worker-continues case.
- Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts
  so all diagnostics route to log file instead of stderr.
- Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784).
- Verified session-complete is a recognized event type (fixes #984).
- Verified unknown event types return no-op handler with exit 0 (graceful degradation).
- Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults,
  stderr suppression, and standard response constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race

- Add PID file mtime guard to prevent concurrent restart storms (#1145):
  isPidFileRecent() + touchPidFile() coordinate across sessions
- Add transparent retry in ChromaMcpManager.callTool() on transport
  error — reconnects and retries once instead of failing (#1131)
- Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042)
- Verified ChromaMcpManager.stop() already called on all shutdown paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback

- Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199)
- Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062)
- Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048)
- Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791)
- Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187)

Add post-build verification in build-hooks.js that fails if critical
distribution files (skills, hooks, plugin manifest) are missing. Add
10 regression tests covering skill file presence, YAML frontmatter,
hooks.json integrity, and package.json files field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems

Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with
MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column,
6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE).  initializeSchema()
was gated behind maxApplied===0, so core tables were never created when old versions
were present.

Fixes:
- initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS
- Migrations 5-7 check actual DB state (columns/constraints) not just version tracking
- Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE)
- Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner
- Added ON UPDATE CASCADE to FK definitions in initializeSchema()
- All changes applied to both runner.ts and SessionStore.ts

Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency,
version conflicts, crash recovery, FK constraints, and data integrity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards

Server tests (12): Added missing workerPath and getAiStatus to ServerOptions
mocks after interface expansion. ChromaSync tests (3): Updated to verify
transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2):
Added memory_ tool skipping and response truncation to prevent recursive loops
and oversized payloads. MarkdownFormatter (2): Updated assertions to match
current output. SettingsDefaultsManager (1): Used correct default key for
getBool test. Logger standards (1): Excluded CLI transcript command from
background service check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync

Replace readFileSync (loads entire file into memory) with readLastLines()
that reads only from the end of the file in expanding chunks (64KB → 10MB cap).
Prevents OOM on large log files while preserving the same API response shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165)

Add .git path guard to all 4 write sites to prevent ref corruption when
paths resolve inside .git internals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout

Three-layer fix:
1. Added lastGeneratorActivity timestamp to ActiveSession, updated by
   processAgentResponse (all agents), getMessageIterator (queue yields),
   and startGeneratorWithProvider (generator launch)
2. Added stale generator detection in ensureGeneratorRunning — if no
   activity for >30s, aborts stale controller, resets state, restarts
3. Added AbortSignal.timeout(30000) in deleteSession to prevent
   indefinite hang when awaiting a stuck generator promise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:34:35 -05:00
Alex Newman
50eeed97e7 MAESTRO: fix cache-based install — resolveRoot, post-install verification, CLI path
Replace hardcoded marketplace path in plugin/scripts/smart-install.js with
resolveRoot() that uses CLAUDE_PLUGIN_ROOT env var (set by Claude Code for
all hooks), with fallback to script location and legacy paths. Fixes #1128,
#1166 where cache installs couldn't find or install node_modules.

Also fixes installCLI() path (ROOT/plugin/scripts/ → ROOT/scripts/) and adds
verifyCriticalModules() post-install check with npm fallback on failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:28:51 -05:00
Alex Newman
224567f980 fix: prevent ONNX model cache corruption from bun cache clears
Remove nuclear `bun pm cache rm` from smart-install.js and
sync-marketplace.cjs (only needed for removed sharp dependency).
Add `bun install` in cache version directory after sync so worker
can resolve dependencies. Move HuggingFace model cache to
~/.claude-mem/models/ so reinstalls don't corrupt it. Add self-healing
retry for Protobuf parsing failures.

Fixes recurring issues #1104, #1105, #1110.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 19:54:17 -05:00
Alex Newman
f1162ed4a4 fix: remove sharp dependency that was never needed (#1141)
Sharp was an explicit dependency but nothing in the codebase imports it.
Chroma embeddings use ONNX Runtime via @chroma-core/default-embed, not sharp.
Sharp's native binary has a persistent Bun node_modules layout bug where
@img/sharp-libvips-* isn't placed alongside @img/sharp-darwin-* causing
ERR_DLOPEN_FAILED on every install.

- Remove sharp, @img/sharp-libvips-darwin-arm64, node-gyp from deps
- Remove node-addon-api from devDeps
- Remove @img cache clearing hacks from smart-install.js and sync-marketplace.cjs
- Replace with simple `bun pm cache rm` before install as general cache hygiene

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 19:20:34 -05:00
Alex Newman
f24251118e fix: bun install, node-addon-api for sharp, consolidate PendingMessageStore (#1140)
* fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore

- Switch sync-marketplace from npm to bun install
- Add node-addon-api as dev dep so sharp builds under bun
- Consolidate duplicate PendingMessageStore instantiation in worker-service finally block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* build assets

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:05:42 -05:00
Alex Newman
d2e926fbf7 fix: post-merge breakage (Gemini, idle timeout, sharp cache) (#1138)
* fix: add gemini-3-flash to validModels array

The model was defined in the type union and RPM limits but missing from
the runtime validModels array, causing silent fallback to gemini-2.5-flash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip processing when Gemini returns empty observation response

Empty responses were silently consuming messages from the queue via
processAgentResponse. Now skips processing on empty content, leaving
the message in processing status for stale recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent idle timeout from triggering infinite restart loop

When a session hits the 3-minute idle timeout, the finally block was
seeing stale processing messages and restarting the generator endlessly.
Now tracks idle timeout as a distinct exit reason via session flag,
resets stale messages, and skips restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clear stale Bun native module cache on update

Bun's global cache retains sharp/libvips native binaries with broken
dylib references after version upgrades. Clear ~/.bun/install/cache/@img/
before install in both the end-user (smart-install) and dev (sync-marketplace)
paths to prevent ERR_DLOPEN_FAILED errors in Chroma sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback (empty summary response, session-scoped reset, shell injection)

- Apply same empty-response guard to summary path as observation path in GeminiAgent
- Add optional sessionDbId param to resetStaleProcessingMessages for session-scoped resets
- Use JSON.stringify for gitignore pattern escaping, filter negation patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 17:46:30 -05:00
Shintaro Okamura
28f35f3ec7 fix: resolve marketplace root path dynamically for XDG-compliant environments (#1031)
The marketplace root was hardcoded to `~/.claude/plugins/marketplaces/thedotmack`,
which does not exist on XDG-compliant setups (e.g. Nix-managed Claude Code) where
plugins are stored under `~/.config/claude/plugins/`.

This caused an ENOENT error on every SessionStart:

    ENOENT: no such file or directory, open
    '~/.claude/plugins/marketplaces/thedotmack/package.json'

Now the script:
1. Derives the base path from CLAUDE_PLUGIN_ROOT if available
2. Probes the XDG path (~/.config/claude/plugins/...)
3. Falls back to the legacy path (~/.claude/plugins/...)

Fixes thedotmack/claude-mem#1030
2026-02-16 00:30:36 -05:00
alfraido86-jpg
209db9f11a fix: implement table counts in bug-report collector, resolving TODO (#1000)
Add getTableCounts() function that queries the SQLite database via the
sqlite3 CLI to retrieve observation, session, and summary counts. Wire
the counts into collectDiagnostics and display them in formatDiagnostics.

https://claude.ai/code/session_0118BNqxCCWebC4Rpo2ypkh9

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-16 00:26:51 -05:00
Peter Dave Hello
be6437c46f Allow README translation to reference existing translations (#1028)
Allow reuse of prior translation files as a style and terminology
guide when regenerating README i18n files.
2026-02-16 00:26:44 -05:00
Alex Newman
1b68c55763 fix: resolve SDK spawn failures and sharp native binary crashes
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
  launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
  sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
  before npm install to prevent native dep version mismatches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:47:27 -05:00
Alex Newman
ed313db742 Merge main into feat/chroma-http-server
Resolve conflicts between Chroma HTTP server PR and main branch changes
(folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:02:54 -05:00
Yasir Ali
583a76b150 feat: Add Urdu language support 2026-02-06 02:40:14 -05:00
Alex Newman
4df9f61347 refactor: implement in-process worker architecture for hooks (#722)
* fix: stop generating empty CLAUDE.md files

- Return empty string instead of "No recent activity" when no observations exist
- Skip writing CLAUDE.md files when formatted content is empty
- Remove redundant "auto-generated by claude-mem" HTML comment
- Clean up 98 existing empty CLAUDE.md files across the codebase
- Update tests to expect empty string for empty input

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* build assets

* refactor: implement in-process worker architecture for hooks

Replaces spawn-based worker startup with in-process architecture:
- Hook processes now become the worker when port 37777 is free
- Eliminates Windows spawn issues (NO SPAWN rule)
- SessionStart chains: smart-install && stop && context

Key changes:
- worker-service.ts: hook case starts WorkerService in-process
- hook-command.ts: skipExit option prevents process.exit() when hosting worker
- hooks.json: single chained command replaces separate start/hook commands
- worker-utils.ts: ensureWorkerRunning() returns boolean, doesn't block
- handlers: graceful fallback when worker unavailable

All 761 tests pass. Manual verification confirms hook stays alive as worker.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* context

* a

* MAESTRO: Mark PR #722 test verification task complete

All 797 tests passed (3 skipped, 0 failed) after merge conflict resolution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Mark PR #722 build verification task complete

* MAESTRO: Mark PR #722 code review task complete

Code review verified:
- worker-service.ts hook case starts WorkerService in-process
- hook-command.ts has skipExit option
- hooks.json uses single chained command
- worker-utils.ts ensureWorkerRunning() returns boolean

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Mark PR #722 conflict resolution push task complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 19:49:15 -05:00
bigphoot
2c304eafad feat: Add DefaultEmbeddingFunction for local vector embeddings
- Added @chroma-core/default-embed dependency for local embeddings
- Updated ChromaSync to use DefaultEmbeddingFunction with collections
- Added isServerReachable() async method for reliable server detection
- Fixed start() to detect and reuse existing Chroma servers
- Updated build script to externalize native ONNX binaries
- Added runtime dependency to plugin/package.json

The embedding function uses all-MiniLM-L6-v2 model locally via ONNX,
eliminating need for external embedding API calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 13:02:04 -08:00
bigphoot
5b3804ac08 feat: Switch to persistent Chroma HTTP server
Replace MCP subprocess approach with persistent Chroma HTTP server for
improved performance and reliability. This re-enables Chroma on Windows
by eliminating the subprocess spawning that caused console popups.

Changes:
- NEW: ChromaServerManager.ts - Manages local Chroma server lifecycle
  via `npx chroma run`
- REFACTOR: ChromaSync.ts - Uses chromadb npm package's ChromaClient
  instead of MCP subprocess (removes Windows disabling)
- UPDATE: worker-service.ts - Starts Chroma server on initialization
- UPDATE: GracefulShutdown.ts - Stops Chroma server on shutdown
- UPDATE: SettingsDefaultsManager.ts - New Chroma configuration options
- UPDATE: build-hooks.js - Mark optional chromadb deps as external

Benefits:
- Eliminates subprocess spawn latency on first query
- Single server process instead of per-operation subprocesses
- No Python/uvx dependency for local mode
- Re-enables Chroma vector search on Windows
- Future-ready for cloud-hosted Chroma (claude-mem pro)
- Cross-platform: Linux, macOS, Windows

Configuration:
  CLAUDE_MEM_CHROMA_MODE=local|remote
  CLAUDE_MEM_CHROMA_HOST=127.0.0.1
  CLAUDE_MEM_CHROMA_PORT=8000
  CLAUDE_MEM_CHROMA_SSL=false

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 13:02:04 -08:00
Alexander Knigge
182097ef1c fix: resolve path format mismatch in folder CLAUDE.md generation (#794) (#813)
The isDirectChild() function failed to match files when the API used
absolute paths (/Users/x/project/app/api) but the database stored
relative paths (app/api/router.py). This caused all folder CLAUDE.md
files to incorrectly show "No recent activity".

Changes:
- Create shared path-utils module with proper path normalization
- Implement suffix matching strategy for mixed path formats
- Update SessionSearch.ts to use shared utilities
- Update regenerate-claude-md.ts to use shared utilities (was using
  outdated broken logic)
- Prevent spurious directory creation from malformed paths
- Add comprehensive test coverage for path matching edge cases

This is the proper fix for #794, replacing PR #809 which only masked
the bug by skipping file creation when "no activity" was shown.

Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 15:48:31 -05:00
Alex Newman
05323c9db5 Cleanup worker-service.ts: remove dead code and fallback concept (#706)
* refactor(worker): remove dead code from worker-service.ts

Remove ~216 lines of unreachable code:
- Delete `runInteractiveSetup` function (defined but never called)
- Remove unused imports: fs namespace, spawn, homedir, readline,
  existsSync/writeFileSync/readFileSync/mkdirSync
- Clean up CursorHooksInstaller imports (keep only used exports)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(worker): only enable SDK fallback when Claude is configured

Add isConfigured() method to SDKAgent that checks for ANTHROPIC_API_KEY
or claude CLI availability. Worker now only sets SDK agent as fallback
for third-party providers when credentials exist, preventing cascading
failures for users who intentionally use Gemini/OpenRouter without Claude.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(worker): remove misleading re-export indirection

Remove unnecessary re-export of updateCursorContextForProject from
worker-service.ts. ResponseProcessor now imports directly from
CursorHooksInstaller.ts where the function is defined. This eliminates
misleading indirection that suggested a circular dependency existed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(mcp): use build-time injected version instead of hardcoded strings

Replace hardcoded '1.0.0' version strings with __DEFAULT_PACKAGE_VERSION__
constant that esbuild replaces at build time. This ensures MCP server and
client versions stay synchronized with package.json.

- worker-service.ts: MCP client version now uses packageVersion
- ChromaSync.ts: MCP client version now uses packageVersion
- mcp-server.ts: MCP server version now uses packageVersion
- Added clarifying comments for empty MCP capabilities objects

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: Implement cleanup and validation plans for worker-service.ts

- Added a comprehensive cleanup plan addressing 23 identified issues in worker-service.ts, focusing on safe deletions, low-risk simplifications, and medium-risk improvements.
- Created an execution plan for validating intentional patterns in worker-service.ts, detailing necessary actions and priorities.
- Generated a report on unjustified logic in worker-service.ts, categorizing issues by severity and providing recommendations for immediate and short-term actions.
- Introduced documentation for recent activity in the mem-search plugin, enhancing traceability and context for changes.

* fix(sdk): remove dangerous ANTHROPIC_API_KEY check from isConfigured

Claude Code uses CLI authentication, not direct API calls. Checking for
ANTHROPIC_API_KEY could accidentally use a user's API key (from other
projects) which costs 20x more than Claude Code's pricing.

Now only checks for claude CLI availability.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(worker): remove fallback agent concept entirely

Users who choose Gemini/OpenRouter want those providers, not secret
fallback behavior. Removed setFallbackAgent calls and the unused
isConfigured() method.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 23:30:13 -05:00
Alex Newman
601596f5cb Fix: Windows Terminal tab accumulation and Windows 11 compatibility (#625) (#628)
* docs: add folder index generator plan

RFC for auto-generating folder-level CLAUDE.md files with observation
timelines. Includes IDE symlink support and root CLAUDE.md integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: implement folder index generator (Phase 1)

Add automatic CLAUDE.md generation for folders containing observed files.
This enables IDE context providers to access relevant memory observations.

Core modules:
- FolderDiscovery: Extract folders from observation file paths
- FolderTimelineCompiler: Compile chronological timeline per folder
- ClaudeMdGenerator: Write CLAUDE.md with tag-based content replacement
- FolderIndexOrchestrator: Coordinate regeneration on observation save

Integration:
- Event-driven regeneration after observation save in ResponseProcessor
- HTTP endpoints for folder discovery, timeline, and manual generation
- Settings for enabling/configuring folder index behavior

The <claude-mem-context> tag wrapping ensures:
- Manual CLAUDE.md content is preserved
- Auto-generated content won't be recursively observed
- Clean separation between user and system content

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add updateFolderClaudeMd function to CursorHooksInstaller

Adds function to update CLAUDE.md files for folders touched by observations.
Uses existing /api/search/by-file endpoint, preserves content outside
<claude-mem-context> tags, and writes atomically via temp file + rename.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: hook updateFolderClaudeMd into ResponseProcessor

Calls updateFolderClaudeMd after observation save to update folder-level
CLAUDE.md files. Uses fire-and-forget pattern with error logging.
Extracts file paths from saved observations and workspace path from registry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add timeline formatting for folder CLAUDE.md files

Implements formatTimelineForClaudeMd function that transforms API response
into compact markdown table format. Converts emojis to text labels,
handles ditto marks for timestamps, and groups under "Recent" header.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: remove old folder-index implementation

Deletes redundant folder-index services that were replaced by the simpler
updateFolderClaudeMd approach in CursorHooksInstaller.ts.

Removed:
- src/services/folder-index/ directory (5 files)
- FolderIndexRoutes.ts
- folder-index settings from SettingsDefaultsManager
- folder-index route registration from worker-service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add worktree-aware project filtering for unified timelines

Detect git worktrees and show both parent repo and worktree observations
in the session start timeline. When running in a worktree, the context
now includes observations from both projects, interleaved chronologically.

- Add detectWorktree() utility to identify worktree directories
- Add getProjectContext() to return parent + worktree projects
- Update context hook to pass multi-project queries
- Add queryObservationsMulti() and querySummariesMulti() for IN clauses
- Maintain backward compatibility with single-project queries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: restructure logging to prove session correctness and reduce noise

Add critical logging at each stage of the session lifecycle to prove the session ID chain (contentSessionId → sessionDbId → memorySessionId) stays aligned. New logs include CREATED, ENQUEUED, CLAIMED, MEMORY_ID_CAPTURED, STORING, and STORED. Move intermediate migration and backfill progress logs to DEBUG level to reduce noise, keeping only essential initialization and completion logs at INFO level.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: extract folder CLAUDE.md utils to shared location

Moves folder CLAUDE.md utilities from CursorHooksInstaller to a new
shared utils file. Removes Cursor registry dependency - file paths
from observations are already absolute, no workspace lookup needed.

New file: src/utils/claude-md-utils.ts
- replaceTaggedContent() - preserves user content outside tags
- writeClaudeMdToFolder() - atomic writes with tag preservation
- formatTimelineForClaudeMd() - API response to compact markdown
- updateFolderClaudeMdFiles() - orchestrates folder updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: trigger folder CLAUDE.md updates when observations are saved

The folder CLAUDE.md update was previously only triggered in
syncAndBroadcastSummary, but summaries run with observationCount=0
(observations are saved separately). Moved the update logic to
syncAndBroadcastObservations where file paths are available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* all the claudes

* test: add unit tests for claude-md-utils pure functions

Add 11 tests covering replaceTaggedContent and formatTimelineForClaudeMd:
- replaceTaggedContent: empty content, tag replacement, appending, partial tags
- formatTimelineForClaudeMd: empty input, parsing, ditto marks, session IDs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: add integration tests for file operation functions

Add 9 tests for writeClaudeMdToFolder and updateFolderClaudeMdFiles:
- writeClaudeMdToFolder: folder creation, content preservation, nested dirs, atomic writes
- updateFolderClaudeMdFiles: empty skip, fetch/write, deduplication, error handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: add unit tests for timeline-formatting utilities

Add 14 tests for extractFirstFile and groupByDate functions:
- extractFirstFile: relative paths, fallback to files_read, null handling, invalid JSON
- groupByDate: empty arrays, date grouping, chronological sorting, item preservation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: rebuild plugin scripts with merged features

* docs: add project-specific CLAUDE.md with architecture and development notes

* fix: exclude project root from auto-generated CLAUDE.md updates

Skip folders containing .git directory when auto-updating subfolder
CLAUDE.md files. This ensures:

1. Root CLAUDE.md remains user-managed and untouched by the system
2. SessionStart context injection stays pristine throughout the session
3. Subfolder CLAUDE.md files continue to receive live context updates
4. Cleaner separation between user-authored root docs and auto-generated folder indexes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: prevent crash from resuming stale SDK sessions on worker restart

When the worker restarts, it was incorrectly passing the `resume` parameter
to INIT prompts (lastPromptNumber=1) when a memorySessionId existed from a
previous SDK session. This caused "Claude Code process exited with code 1"
crashes because the SDK tried to resume into a session that no longer exists.

Root cause: The resume condition only checked `hasRealMemorySessionId` but
did not verify that this was a CONTINUATION prompt (lastPromptNumber > 1).

Fix: Add `session.lastPromptNumber > 1` check to the resume condition:
- Before: `...(hasRealMemorySessionId && { resume: session.memorySessionId })`
- After: `...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: ... })`

Also added:
- Enhanced debug logging that warns when skipping resume for INIT prompts
- Unit tests in tests/sdk-agent-resume.test.ts (9 test cases)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: properly handle Chroma MCP connection errors

Previously, ensureCollection() caught ALL errors from chroma_get_collection_info
and assumed they meant "collection doesn't exist", triggering unnecessary
collection creation attempts. Connection errors like "Not connected" or
"MCP error -32000: Connection closed" would cascade into failed creation attempts.

Similarly, queryChroma() would silently return empty results when the MCP call
failed, masking the underlying connection problem.

Changes:
- ensureCollection(): Detect connection errors and re-throw immediately instead
  of attempting collection creation
- queryChroma(): Wrap MCP call in try-catch and throw connection errors instead
  of returning empty results
- Both methods reset connection state (connected=false, client=null) on
  connection errors so subsequent operations can attempt to reconnect

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* pushed

* fix: scope regenerate-claude-md.ts to current working directory

Critical bug fix: The script was querying ALL observations from the database
across ALL projects ever recorded (1396+ folders), then attempting to write
CLAUDE.md files everywhere including other projects, non-existent paths, and
ignored directories.

Changes:
- Use git ls-files to discover folders (respects .gitignore automatically)
- Filter database query to current project only (by folder name)
- Use relative paths for database queries (matches storage format)
- Add --clean flag to remove auto-generated CLAUDE.md files
- Add fallback directory walker for non-git repos

Now correctly scopes to 26 folders with observations instead of 1396+.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs and adjustments

* fix: cleanup mode strips tags instead of deleting files blindly

The cleanup mode was incorrectly deleting entire files that contained
<claude-mem-context> tags. The correct behavior (per original design):

1. Strip the <claude-mem-context>...</claude-mem-context> section
2. If empty after stripping → delete the file
3. If has remaining content → save the stripped version

Now properly preserves user content in CLAUDE.md files while removing
only the auto-generated sections.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* deleted some files

* chore: regenerate folder CLAUDE.md files with fixed script

Regenerated 23 folder CLAUDE.md files using the corrected script that:
- Scopes to current working directory only
- Uses git ls-files to respect .gitignore
- Filters by project name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update CLAUDE.md files for January 5, 2026

- Regenerated and staged 23 CLAUDE.md files with a mix of new and modified content.
- Fixed cleanup mode to properly strip tags instead of deleting files blindly.
- Cleaned up empty CLAUDE.md files from various directories, including ~/.claude and ~/Scripts.
- Conducted dry-run cleanup that identified a significant reduction in auto-generated CLAUDE.md files.
- Removed the isAutoGeneratedClaudeMd function due to incorrect file deletion behavior.

* feat: use settings for observation limit in batch regeneration script

Replace hard-coded limit of 10 with configurable CLAUDE_MEM_CONTEXT_OBSERVATIONS
setting (default: 50). This allows users to control how many observations appear
in folder CLAUDE.md files.

Changes:
- Import SettingsDefaultsManager and load settings at script startup
- Use OBSERVATION_LIMIT constant derived from settings at both call sites
- Remove stale default parameter from findObservationsByFolder function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: use settings for observation limit in event-driven updates

Replace hard-coded limit of 10 in updateFolderClaudeMdFiles with
configurable CLAUDE_MEM_CONTEXT_OBSERVATIONS setting (default: 50).

Changes:
- Import SettingsDefaultsManager and os module
- Load settings at function start (once, not in loop)
- Use limit from settings in API call

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: Implement configurable observation limits and enhance search functionality

- Added configurable observation limits to batch regeneration scripts.
- Enhanced SearchManager to handle folder queries and normalize parameters.
- Introduced methods to check for direct child files in observations and sessions.
- Updated SearchOptions interface to include isFolder flag for filtering.
- Improved code quality with comprehensive reviews and anti-pattern checks.
- Cleaned up auto-generated CLAUDE.md files across various directories.
- Documented recent changes and improvements in CLAUDE.md files.

* build asset

* Project Context from Claude-Mem auto-added (can be auto removed at any time)

* CLAUDE.md updates

* fix: resolve CLAUDE.md files to correct directory in worktree setups

When using git worktrees, CLAUDE.md files were being written relative to
the worker's process.cwd() instead of the actual project directory. This
fix threads the project's cwd from message processing through to the file
writing utilities, ensuring CLAUDE.md files are created in the correct
project directory regardless of where the worker was started.

Changes:
- Add projectRoot parameter to updateFolderClaudeMdFiles for path resolution
- Thread projectRoot through ResponseProcessor call chain
- Track lastCwd from messages in SDKAgent, GeminiAgent, OpenRouterAgent
- Add tests for relative/absolute path handling with projectRoot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more project context updates

* context updates

* planning context

* feat: add CLI infrastructure for unified hook architecture (Phase 1)

- Add src/cli/types.ts with NormalizedHookInput, HookResult, PlatformAdapter, EventHandler interfaces
- Add src/cli/stdin-reader.ts with readJsonFromStdin() extracted from save-hook.ts pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add platform adapters for unified hook CLI (Phase 2)

- Add claude-code adapter mapping session_id, tool_name, etc.
- Add cursor adapter mapping conversation_id, workspace_roots, result_json
- Add raw adapter for testing/passthrough
- Add getPlatformAdapter() factory function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add event handlers for unified hook CLI (Phase 3)

- Add context handler (GET /api/context/inject)
- Add session-init handler (POST /api/sessions/init)
- Add observation handler (POST /api/sessions/observations)
- Add summarize handler (POST /api/sessions/summarize)
- Add user-message handler (stderr output, exit code 3)
- Add file-edit handler for Cursor afterFileEdit events
- Add getEventHandler() factory function

All handlers copy exact HTTP calls from original hooks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add hook command dispatcher (Phase 4)

- Add src/cli/hook-command.ts dispatching stdin to adapters and handlers
- Add 'hook' case to worker-service.ts CLI switch
- Usage: bun worker-service.cjs hook <platform> <event>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: update hooks.json to use unified CLI (Phase 5)

- Change context-hook.js to: hook claude-code context
- Change new-hook.js to: hook claude-code session-init
- Change save-hook.js to: hook claude-code observation
- Change summary-hook.js to: hook claude-code summarize
- Change user-message-hook.js to: hook claude-code user-message

All hooks now route through unified CLI: bun worker-service.cjs hook <platform> <event>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: update Cursor integration to use unified CLI (Phase 6)

- Update CursorHooksInstaller to generate unified CLI commands
- Use node instead of shell scripts for Cursor hooks
- Add platform field to NormalizedHookInput for handler decisions
- Skip SDK agent init for Cursor platform (not supported)
- Fix file-edit handler to use 'write_file' tool name

Cursor event mapping:
- beforeSubmitPrompt → session-init, context
- afterMCPExecution/afterShellExecution → observation
- afterFileEdit → file-edit
- stop → summarize

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: remove old hook files, complete unified CLI migration (Phase 7)

Deleted files:
- src/hooks/context-hook.ts, new-hook.ts, save-hook.ts, summary-hook.ts, user-message-hook.ts
- cursor-hooks/*.sh and *.ps1 shell scripts
- plugin/scripts/*-hook.js built files

Modified:
- scripts/build-hooks.js: removed hook build loop

Build now produces only: worker-service.cjs, mcp-server.cjs, context-generator.cjs
All hooks route through: bun worker-service.cjs hook <platform> <event>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: handle undefined stdin in platform adapters for SessionStart hooks

SessionStart hooks don't receive stdin data from Claude Code, causing the
adapters to crash when trying to access properties on undefined. Added
null coalescing to handle empty input gracefully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* context update

* fix: replace deprecated WMIC with PowerShell for Windows 11 compatibility

Fixes #625

Changes:
- Replace WMIC process queries with PowerShell Get-Process and Get-CimInstance
- WMIC is deprecated in Windows 11 and causes terminal tab accumulation
- PowerShell provides simpler output format (just PIDs, not "ProcessId=1234")
- Update tests to match new PowerShell output parsing logic

Benefits:
- Windows 11 compatibility (WMIC removal planned)
- Fixes terminal window accumulation issue on Windows
- Cleaner, more maintainable parsing logic
- Same security validation (PID > 0, integer checks)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: graceful exit strategy to prevent Windows Terminal tab accumulation (#625)

Problem:
Windows Terminal keeps tabs open when processes exit with code 1, leading
to tab accumulation during worker lifecycle operations (start, stop, restart,
version mismatches, port conflicts). This created a poor user experience with
dozens of orphaned terminal tabs.

Solution:
Implemented graceful exit strategy using exit code 0 for all expected failure
scenarios. The wrapper and plugin handle restart logic, so child processes
don't need to signal failure with non-zero exit codes.

Key Changes:
- worker-service.ts: Changed all process.exit(1) to process.exit(0) in:
  - Port conflict scenarios
  - Version mismatch recovery
  - Daemon spawn failures
  - Health check timeouts
  - Restart failures
  - Worker startup errors
- mcp-server.ts: Changed fatal error exit from 1 to 0
- ProcessManager.ts: Changed signal handler error exit from 1 to 0
- hook-command.ts: Changed hook error exit code from 1 to 2 (BLOCKING_ERROR)
  to ensure users see error messages (per Claude Code docs, exit 1 only shows
  in verbose mode)

All exits include explanatory comments documenting the graceful exit strategy.

Fixes #625

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: update CLAUDE.md activity logs

Auto-generated context updates from work on issue #625:
- Windows 11 WMIC migration
- PowerShell process enumeration
- Graceful exit strategy implementation
- PR creation for Windows Terminal tab fix

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: update activity logs in CLAUDE.md files across multiple directories

* chore: update CLAUDE.md files with recent activity and documentation improvements

- Adjusted dates and entries in CLAUDE.md for various components including reports and services.
- Added detailed activity logs for worker services, CLI commands, and server interactions.
- Documented exit code strategy in CLAUDE.md to clarify graceful exit philosophy.
- Extracted PowerShell timeout constant and updated related documentation and tests.
- Enhanced log level audit strategy and clarified logging patterns across services.

* polish: extract PowerShell timeout constant and document exit code strategy

- Extract magic number 60000ms to HOOK_TIMEOUTS.POWERSHELL_COMMAND (10000ms)
- Reduce PowerShell timeout from 60s to 10s per review feedback
- Document exit code strategy in CLAUDE.md
- Add test coverage for new constant

Addresses review feedback from PR #628

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* build assets

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:13:31 -05:00