Addresses unresolved CodeRabbit finding on WorktreeAdoption.ts:296.
Previously, Chroma patch failures stranded rows permanently: adoptedSqliteIds
was built only from rows where merged_into_project IS NULL, so once SQL
committed, reruns couldn't rediscover them for retry.
The Chroma id set is now built from ALL observations whose project matches a
merged worktree — including rows already stamped to this parent. Combined
with the idempotent updateMergedIntoProject, transient Chroma failures
self-heal on the next adoption pass.
SQL writes remain idempotent (UPDATE still guards on merged_into_project IS
NULL), so adoptedObservations / adoptedSummaries continue to count only
newly-adopted rows. chromaUpdates now counts total Chroma writes per pass
(may exceed adoptedObservations when retrying).
Addresses six CodeRabbit/Greptile findings on PR #2052:
- Schema guard in adoptMergedWorktrees probes for merged_into_project
columns before preparing statements; returns early when absent so first
boot after upgrade (pre-migration) doesn't silently fail.
- Startup adoption now iterates distinct cwds from pending_messages and
dedupes via resolveMainRepoPath — the worker daemon runs with
cwd=plugin scripts dir, so process.cwd() fallback was a no-op.
- ObservationCompiler single-project queries (queryObservations /
querySummaries) OR merged_into_project into WHERE so injected context
surfaces adopted worktree rows, matching the Multi variants.
- SessionStore constructor now calls ensureMergedIntoProjectColumns so
bundled artifacts (context-generator.cjs) that embed SessionStore get
the merged_into_project column on DBs that only went through the
bundled migration chain.
- OBSERVER_SESSIONS_PROJECT constant is now derived from
basename(OBSERVER_SESSIONS_DIR) and used across PaginationHelper,
SessionStore, and timeline queries instead of hardcoded strings.
- Corrected misleading Chroma retry docstring in WorktreeAdoption to
match actual behavior (no auto-retry once SQL commits).
Observer sessions (internal SDK-driven worker queries) run under a
synthetic project name 'observer-sessions' to keep them out of
claude --resume. They were still surfacing in the viewer project
picker and unfiltered observation/summary/prompt feeds.
Filter them out at every UI-facing query:
- SessionStore.getAllProjects and getProjectCatalog
- timeline/queries.ts getAllProjects
- PaginationHelper observations/summaries/prompts when no project is selected
When a caller explicitly requests project='observer-sessions',
results are still returned (not a hard ban, just hidden by default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Document --branch override in npx-cli help text
- Guard ContextBuilder against empty projects[] override; fall back to cwd-derived primary
- Ensure merged_into_project indexes are created even if ALTER ran in a prior partial migration
- Reject adopt --branch/--cwd flags with missing or flag-like values
- Use defined --color-border-primary token for merged badge border
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Update allProjects test expectation to match [parent, composite] (matches JSDoc + callers in ContextBuilder/context handlers).
- Replace string-matched __DRY_RUN_ROLLBACK__ sentinel with dedicated DryRunRollback class to avoid swallowing unrelated errors.
- Add 5000ms timeout to spawnSync git calls in WorktreeAdoption and ProcessManager so worker startup can't hang on a stuck git process.
- Drop unreachable break after process.exit(0) in adopt case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regenerated worker-service.cjs, context-generator.cjs, viewer.html, and
viewer-bundle.js to reflect all six implementation phases of the merged-
worktree adoption feature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worktrees are branches off main; the parent holds the architecture,
decisions, and long-tail history the worktree inherits. Scoping reads
to the worktree alone meant every new worktree started cold on any
question that required prior context.
Expand `allProjects` in a worktree to `[parent, composite]` so reads
pull both. Writes still go through `.primary` (the composite), so
sibling worktrees don't leak into each other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports scripts/cwd-remap.ts into ProcessManager.runOneTimeCwdRemap() and
invokes it in initializeBackground() alongside the existing chroma
migration. Uses pending_messages.cwd as the source of truth to rewrite
pre-worktree bare project names into the parent/worktree composite
format so search and context are consistent.
- Backs up the DB to .bak-cwd-remap-<ts> before any writes.
- Idempotent: marker file .cwd-remap-applied-v1 short-circuits reruns.
- No-ops on fresh installs (no DB, or no pending_messages table).
- On failure, logs and skips the marker so the next restart retries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Leftover artifacts from an abandoned context-injection feature. The
project-level CLAUDE.md stays; the directory-level ones were generated
timeline scaffolding that never panned out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three sites didn't account for parent/worktree composite naming:
- PaginationHelper.stripProjectPath: marker used full composite, breaking
path sanitization for worktrees checked out outside a parent/leaf layout.
Now extracts the leaf segment.
- observations/store.ts: fallback imported getCurrentProjectName from
shared/paths.ts (a duplicate impl without worktree detection). Switched
to getProjectContext().primary so writes key into the same project as
reads.
- SearchManager.getRecentContext: fallback used basename(cwd) and lost
the parent prefix, making the MCP tool find nothing in worktrees.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a caller (e.g. worker context-inject route) passes a `projects`
array without a matching cwd, the cwd-derived `context.primary` drifted
from the projects being queried — producing an empty-state header for
one project while querying another. Use the last entry of `projects` so
header and query target stay in sync.
Revert of #1820 behavior. Each worktree now gets its own bucket:
- In a worktree, primary = `parent/worktree` (e.g. `claude-mem/dar-es-salaam`)
- In a main repo, primary = basename (unchanged)
- allProjects is always `[primary]` — strict isolation at query time
Includes a one-off maintenance script (scripts/worktree-remap.ts) that
retroactively reattributes past sessions to their worktree using path
signals in observations and user prompts. Two-rule inference keeps the
remap high-confidence:
1. The worktree basename in the path matches the session's current
plain project name (pre-#1820 era; trusted).
2. Or all worktree path signals converge on a single (parent, worktree)
across the session.
Ambiguous sessions are skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observations were 100% failing on Claude Code 2.1.109+ because the Agent
SDK emits ["--setting-sources", ""] when settingSources defaults to [].
The existing Bun-workaround filter stripped the empty string but left
the orphan --setting-sources flag, which then consumed --permission-mode
as its value, crashing the subprocess with:
Error processing --setting-sources:
Invalid setting source: --permission-mode.
Make the filter pair-aware: when an empty arg follows a --flag, drop
both so the SDK default (no setting sources) is preserved by omission.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synthetic summary salvage feature created fake summaries from observation
data when the AI returned <observation> instead of <summary> tags. This was
overengineered — missing a summary is preferable to fabricating one from
observation fields that don't map cleanly to summary semantics.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve Setup hook broken reference and warn on macOS-only binary (#1547)
On Linux ARM64, the plugin silently failed because:
1. The Setup hook called setup.sh which was removed; the hook exited 127
(file not found), causing the plugin to appear uninstalled.
2. The committed plugin/scripts/claude-mem binary is macOS arm64 only;
no warning was shown when it could not execute on other platforms.
Fix the Setup hook to call smart-install.js (the current setup mechanism)
and add checkBinaryPlatformCompatibility() to smart-install.js, which reads
the Mach-O magic bytes from the bundled binary and warns users on non-macOS
platforms that the JS fallback (bun-runner.js + worker-service.cjs) is active.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: close fd in finally block, strengthen smart-install tests to use production function
- Wrap openSync/readSync in checkBinaryPlatformCompatibility with a finally block so the file descriptor is always closed even if readSync throws
- Export checkBinaryPlatformCompatibility with an optional binaryPath param for testability
- Refactor Mach-O detection tests to call the production function directly, mocking process.platform and passing controlled binary paths, eliminating duplicated inline logic
- Strengthen plugin-distribution test to assert at least one command hook exists before checking for smart-install.js, preventing vacuous pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: replace hardcoded nvm/homebrew PATH with universal login shell resolution
Hook commands previously hardcoded PATH entries for nvm and homebrew,
causing `node: command not found` for users with other Node version
managers (mise, asdf, volta, fnm, Nix, etc.).
Replace with `$($SHELL -lc 'echo $PATH')` which inherits the user's
login shell PATH regardless of how Node was installed. Also adds the
missing PATH export to the PreToolUse hook (#1702).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add cache-path fallback to PreToolUse hook
Aligns PreToolUse _R resolution with all other hooks by adding the
cache directory lookup before falling back to the marketplace path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: exclude primary key index from unique constraint check in migration 7
PRAGMA index_list returns all indexes including those backing PRIMARY KEY
columns (origin='pk'), which always have unique=1. The check was therefore
always true, causing migration 7 to run the full DROP/CREATE/RENAME table
sequence on every worker startup instead of short-circuiting once the
UNIQUE constraint had already been removed.
Fix: filter to non-PK indexes by requiring idx.origin !== 'pk'. The
origin field is already present on the existing IndexInfo interface.
Fixes#1749
* fix: apply pk-origin guard to all three migration code paths
CodeRabbit correctly identified that the origin !== 'pk' fix was only
applied to MigrationRunner.ts but not to the two other active code paths
that run the same removeSessionSummariesUniqueConstraint logic:
- SessionStore.ts:220 — used by DatabaseManager and worker-service
- plugin/scripts/context-generator.cjs — bundled artifact (minified)
All three paths now consistently exclude primary-key indexes when
detecting unique constraints on session_summaries.
syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged)
instead of summaryForStore for the SSE broadcast, causing a crash when the
LLM returns <observation> without <summary> tags. Also removes misplaced
tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three SessionStart hooks poll http://localhost:37777/health with an
8-second retry budget and then exit 1 silently (via `curl -sf || exit 1`)
when the worker has not stabilized in time. Claude Code surfaces this as
two `SessionStart:startup hook error - Failed with non-blocking status
code: No stderr output` messages on every first session of the day.
This happens because of a race between the MCP server auto-starting the
worker and the hook's own `worker-service.cjs start` path, which on Linux
respects a live PID file written by an earlier session and waits for the
existing worker to become healthy. 8s is not always enough.
Mitigate in the hook layer without touching worker-service.cjs:
- bump the inner retry loop from 8 to 20 attempts (up to ~20s)
- replace `|| exit 1` with `|| true` so the hook emits its continue JSON
regardless (Claude Code no longer logs a phantom error)
- guard the context-injection hook with a health check - only run
`worker-service.cjs hook claude-code context` if the worker is actually
responding, otherwise skip silently
Trade-off: on cold starts where the worker takes longer than ~20s to
come up, the recent-context preamble is skipped for that first session
instead of surfacing an error. Subsequent sessions in the same day work
normally because the worker is already healthy.
Refs #1447
- Add Custom Grammars (.claude-mem.json) section explaining how to register
additional tree-sitter parsers for unsupported file extensions
- Add Markdown Special Support section documenting heading-based outline,
code-fence search, section unfold, and frontmatter extraction behaviors
- Expand bundled language test to cover all 10 documented languages plus
the plain-text fallback sentence to prevent partial doc regressions
Co-Authored-By: Claude <noreply@anthropic.com>
Node 22+ emits DEP0190 when spawnSync is called with a separate args
array and shell:true, because the args are only concatenated (not
escaped). Split the findBun() PATH check into platform-specific calls:
Windows uses spawnSync('where bun', { shell: true }) as a single string,
Unix uses spawnSync('which', ['bun']) with no shell option.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
tree-sitter language docs belonged in smart-explore but were absent;
this adds the Bundled Languages table (10 languages) with correct placement.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
The mcp-server.cjs script requires bun:sqlite, a Bun-specific built-in
that is unavailable in Node.js. When Claude Code spawns the script using
the shebang (#!/usr/bin/env node), the import fails with:
Error: Cannot find module 'bun:sqlite'
Fix: explicitly invoke bun as the command and pass the script as an arg,
so the correct runtime is used regardless of the shebang line.
* feat: add knowledge agent types, store, builder, and renderer
Phase 1 of Knowledge Agents feature. Introduces corpus compilation
pipeline that filters observations from the database into portable
corpus files stored at ~/.claude-mem/corpora/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add corpus CRUD HTTP endpoints and wire into worker service
Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints
(build, list, get, delete, rebuild) and registers them during
worker background initialization alongside SearchRoutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add KnowledgeAgent with V1 SDK prime/query/reprime
Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with
resume and disallowedTools for Q&A-only knowledge sessions.
Auto-reprimes on session expiry. Adds prime, query, and reprime
HTTP endpoints to CorpusRoutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add MCP tools and skill for knowledge agents
Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora,
prime_corpus, and query_corpus MCP tools delegating to worker
HTTP endpoints. Includes /knowledge-agent skill with workflow docs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle SDK process exit in KnowledgeAgent, add e2e test
The Agent SDK may throw after yielding all messages when the
Claude process exits with a non-zero code. Now tolerates this
if session_id/answer were already captured. Adds comprehensive
e2e test script (31 assertions) orchestrated via tmux-cli.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use settings model ID instead of hardcoded model in KnowledgeAgent
Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching
the existing SDKAgent pattern. No more hardcoded model assumptions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: improve knowledge agents developer experience
Add public documentation page, rebuild/reprime MCP tools, and actionable
error messages. DX review scored knowledge agents 4/10 — core engineering
works (31/31 e2e) but the feature was invisible. This addresses
discoverability (docs, cross-links), API completeness (missing MCP tools),
and error quality (fix/example fields in error responses).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add quick start guide to knowledge agents page
Covers the three main use cases upfront: creating an agent, asking a
single question, and starting a fresh conversation with reprime. Includes
keeping-it-current section for rebuild + reprime workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address code review issues — path traversal, session safety, prompt injection
- Block path traversal in CorpusStore with alphanumeric name validation and resolved path check
- Harden system prompt against instruction injection from untrusted corpus content
- Validate question field as non-empty string in query endpoint
- Only persist session_id after successful prime (not null on failure)
- Persist refreshed session_id after query execution
- Only auto-reprime on session resume errors, not all query failures
- Add fenced code block language tags to SKILL.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address remaining code review issues — e2e robustness, MCP validation, docs
- Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure
- Use curl_post wrapper consistently for all long-running POST calls
- Add runtime name validation to all corpus MCP tool handlers
- Fix docs: soften hallucination guarantee to probabilistic claim
- Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: enforce string[] type in safeParseJsonArray for corpus data integrity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add blank line before fenced code blocks in SKILL.md maintenance section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Patch release for the MCP server bun:sqlite crash fix landed in
PR #1645 (commit abd55977).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node
The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit 2b60dd29 added an import of
`ensureWorkerStarted` from worker-service.ts. That import transitively pulls
in DatabaseManager → bun:sqlite, blowing up at top-level require under Node.
The bundle ballooned from ~358KB (v11.0.1) to ~1.96MB (v12.0.0) and crashed
on every spawn, breaking the MCP server entirely for Codex/MCP-only clients
and any flow that boots the MCP tool surface.
Fix:
1. Extract `ensureWorkerStarted` and the Windows spawn-cooldown helpers
into a new lightweight module `src/services/worker-spawner.ts` that
only imports from infrastructure/ProcessManager, infrastructure/HealthMonitor,
shared/*, and utils/logger — no SQLite, no ChromaSync, no DatabaseManager.
2. The new helper takes the worker script path explicitly so callers
running under Node (mcp-server) can pass `worker-service.cjs` while
callers already inside the worker (worker-service self-spawn) pass
`__filename`. worker-service.ts keeps a thin wrapper for back-compat.
3. mcp-server.ts now imports from worker-spawner.js and resolves
WORKER_SCRIPT_PATH via __dirname so the daemon can be auto-started
for MCP-only clients without dragging in the entire worker bundle.
4. resolveWorkerRuntimePath() now searches for Bun on every platform
(not just Windows). worker-service.cjs requires Bun at runtime, so
when the spawner is invoked from a Node process the Unix branch can
no longer fall through to process.execPath (= node).
5. spawnDaemon's Unix branch now calls resolveWorkerRuntimePath() instead
of hardcoding process.execPath, fixing the same Node-spawning-Node bug
for the actual subprocess launch on Linux/macOS.
After:
- mcp-server.cjs is 384KB again with zero `bun:sqlite` references
- node mcp-server.cjs initializes and serves tools/list + tools/call
(verified via JSON-RPC against the running worker)
- ProcessManager test suite updated for the new cross-platform Bun
resolution behavior; full suite has the same pre-existing failures
as main, no regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 1)
Per Claude Code Review on PR #1645:
1. mcp-server.ts: log a warning when both __dirname and import.meta.url
resolution fail. The cwd() fallback is essentially dead code for the
CJS bundle but if it ever fires it gives the user a breadcrumb instead
of a silently-wrong WORKER_SCRIPT_PATH.
2. mcp-server.ts: existsSync check on WORKER_SCRIPT_PATH at module load.
Surfaces a clear "worker-service.cjs not found at expected path" log
line for partial installs / dev environments instead of letting the
failure surface as a generic spawnDaemon error later.
3. ProcessManager.ts: explanatory comment on the Windows `return 0`
sentinel in spawnDaemon. Documents that PowerShell Start-Process
doesn't return a PID and that callers MUST use `pid === undefined`
for failure detection — never falsy checks like `if (!pid)`.
Items 4 (no direct unit tests for the worker-spawner Windows cooldown
helpers) and 5 (process-manager.test.ts uses real ~/.claude-mem path)
are deferred — the reviewer flagged the latter as out of scope, and
the former needs an injectable-I/O refactor that isn't appropriate
for a hotfix bugfix PR.
Verified: build clean, mcp-server.cjs still 384KB / zero bun:sqlite,
JSON-RPC tools/list still returns the 7-tool surface, ProcessManager
test suite still 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): mkdir CLAUDE_MEM_DATA_DIR before writing Windows cooldown marker
Per CodeRabbit on PR #1645: on a fresh user profile, the data dir may not
exist yet when markWorkerSpawnAttempted() runs. writeFileSync would throw
ENOENT, the catch would swallow it, and the marker would never be created
— defeating the popup-loop protection this helper exists to provide.
mkdirSync(dir, { recursive: true }) is a no-op when the directory already
exists, so it's safe to call on every spawn attempt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(spawner): add APPROVED OVERRIDE annotations for cooldown marker catches
Per CodeRabbit on PR #1645: silent catch blocks at spawn-cooldown sites
should carry the APPROVED OVERRIDE annotation that the rest of the
codebase uses (see ProcessManager.ts:689, BaseRouteHandler.ts:82,
ChromaSync.ts:288).
Both catches are intentional best-effort:
- markWorkerSpawnAttempted: if mkdir/writeFileSync fails, the worker
spawn itself will almost certainly fail too. Surfacing that downstream
is far more useful than a noisy log line about a lock file.
- clearWorkerSpawnAttempted: a stale marker is harmless. Worst case is
one suppressed retry within the cooldown window, then self-heals.
No behaviour change. Resolves the second half of CodeRabbit's lines
38-65 comment on worker-spawner.ts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 2)
Round 2 of Claude Code Review feedback on PR #1645:
Build guardrail (most important — protects the regression this PR fixes):
- scripts/build-hooks.js: post-build check that fails the build if
mcp-server.cjs ever contains a `bun:sqlite` reference. This is the
exact regression PR #1645 fixed; future contributors will get an
immediate, actionable error if a transitive import re-introduces it.
Verified the check trips when violated.
Code clarity:
- src/servers/mcp-server.ts: drop dead `_originalLog` capture — it was
never restored. Less code is fewer bugs.
- src/servers/mcp-server.ts: elevate `cwd()` fallback log from WARN to
ERROR. Per reviewer: a wrong WORKER_SCRIPT_PATH means worker auto-start
silently fails, so the breadcrumb should be loud and searchable.
- src/services/worker-service.ts: extended doc comment on the
`ensureWorkerStartedShared(port, __filename)` wrapper explaining why
`__filename` is the correct script path here (CJS bundle = compiled
worker-service.cjs) and why mcp-server.ts can't use the same trick.
- src/services/infrastructure/ProcessManager.ts: inline comment on the
`env.BUN === 'bun'` bare-command guard explaining why it's reachable
even though `isBunExecutablePath('bun')` is true (pathExists returns
false for relative names, so the second branch is what fires).
Coverage:
- src/services/infrastructure/ProcessManager.ts: add `/usr/bin/bun` to
the Linux candidate paths so apt-installed Bun on Debian/Ubuntu is
found without falling through to the PATH lookup.
Out-of-scope items (deferred with rationale in PR replies):
- Unit tests for ensureWorkerStarted / Windows cooldown helpers — needs
injectable-I/O refactor unsuitable for a hotfix.
- Sentinel object for Windows spawnDaemon `0` — broader API change.
- Windows Scoop install path — follow-up for a future PR.
- runOneTimeChromaMigration placement, aggressiveStartupCleanup,
console.log redirect timing, platform timeout multiplier — all
pre-existing and unrelated to this regression.
Verified: build clean, guardrail trips on simulated violation,
mcp-server.cjs still 0 bun:sqlite refs, ProcessManager tests 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 3)
Round 3 of Claude Code Review feedback on PR #1645:
ProcessManager.ts: improve actionability of "Bun not found" errors
Both Windows and Unix branches of spawnDaemon previously logged a vague
"Failed to locate Bun runtime" message when resolveWorkerRuntimePath()
returned null. Replaced with an actionable message that names the install
URL and explains *why* Bun is required (worker uses bun:sqlite). The
existing null-guard at the call sites already prevents passing null to
child_process.spawn — only the error text changed.
scripts/build-hooks.js: refine bun:sqlite guardrail to match actual
require() calls only
The previous coarse `includes('bun:sqlite')` check tripped on its own
improved error message, which legitimately mentions "bun:sqlite" by name.
Switched to a regex that matches `require("bun:sqlite")` /
`require('bun:sqlite')` (with optional whitespace, handles both quote
styles, handles minified output) so error messages and inline comments
can reference the module name without false positives. Verified the
regex still trips on real violations (both spaced and minified forms)
and correctly ignores string-literal mentions.
Other round-3 items (verified, not changed):
- TOOL_ENDPOINT_MAP: reviewer flagged as dead code, but it IS used at
lines 250 and 263 by the search and timeline tool handlers. False
positive — kept as-is.
- if (!pid) callsites: grepped src/, zero offenders. The Windows `0`
PID sentinel contract is safe; only the in-line documentation comment
in ProcessManager.ts mentions the anti-pattern.
- callWorkerAPIPost double-wrapping: pre-existing intentional behavior
(only used by /api/observations/batch which returns raw data, not
the MCP {content:[...]} shape). Unrelated to this regression.
- Snap path / startParentHeartbeat / main().catch / test for non-
existent workerScriptPath / etc — pre-existing or out of scope for
this hotfix, deferred per established disposition.
Verified: build clean, guardrail still trips on real violations,
mcp-server.cjs has 0 require("bun:sqlite") calls, JSON-RPC tools/list
returns the 7-tool surface, ProcessManager tests 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(spawnDaemon): contract test for Windows 0 PID success sentinel
Per CodeRabbit nitpick on PR #1645 commit 7a96b3b9: add a focused test
that documents the spawnDaemon return contract so any future contributor
who introduces `if (!pid)` against a spawnDaemon return value (or its
wrapper) sees a failing assertion explaining why the falsy check is
incorrect.
The test deliberately exercises the JS-level semantics rather than
mocking PowerShell — a true mocked Windows test would require
refactoring spawnDaemon to take an injectable execSync, which is a
larger change than this hotfix should carry. The contract assertions
here catch the same regression class (treating Windows success as
failure) without that refactor.
Verified: bun test tests/infrastructure/process-manager.test.ts now
passes 44/44 (was 43/43).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 4)
Round 4 of Claude Code Review feedback on PR #1645 (review of round-3
commit 193286f9):
tests/infrastructure/process-manager.test.ts: replace require('fs')
with the already-imported statSync. Reviewer correctly flagged that
the file uses ESM-style named imports everywhere else and the inline
require() calls would break under strict ESM. Two callsites updated
in the touchPidFile test.
src/services/infrastructure/ProcessManager.ts: hoist
resolveWorkerRuntimePath() and the `Bun runtime not found` error
handling out of both branches in spawnDaemon. Both Windows and Unix
branches need the same Bun lookup, and resolving once before the OS
branch split avoids a duplicate execSync('which bun')/where bun in the
no-well-known-path fallback. The error message is also DRY now —
single source of truth instead of two near-identical strings.
CodeRabbit confirmed in its previous reply that "All actionable items
across all four review rounds are fully resolved" — these two minor
items from claude-review of round 3 are the only remaining cleanup.
Verified: build clean, ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 5)
Round 5 of Claude Code Review feedback on PR #1645:
src/services/worker-spawner.ts: drop `export` from internal helpers
`shouldSkipSpawnOnWindows`, `markWorkerSpawnAttempted`, and
`clearWorkerSpawnAttempted` were exported even though they were
private in worker-service.ts and nothing outside this module needs
them. Removing the `export` keyword keeps the public surface to just
`ensureWorkerStarted` and prevents future callers from bypassing the
spawn lifecycle.
scripts/build-hooks.js: broaden guardrail to all bun:* modules
Previously the regex only caught `require("bun:sqlite")`, but every
module in the `bun:` namespace (bun:ffi, bun:test, etc.) is Bun-only
and would crash mcp-server.cjs the same way under Node. Generalized
the regex to `require("bun:[a-z][a-z0-9_-]*")` so a transitive import
of any Bun-only module fails the build instead of shipping a broken
bundle. Verified the new regex still trips on bun:sqlite, bun:ffi,
bun:test, and correctly ignores string-literal mentions in error
messages.
src/servers/mcp-server.ts: attribute root cause when dirname resolution fails
Previously, if `__dirname`/`import.meta.url` resolution failed and we
fell back to `process.cwd()`, the user would see two warnings: an
error about the dirname fallback AND a separate warning about the
missing worker bundle. The second warning hides the root cause —
someone debugging would assume the install is broken when really it's
a dirname-resolution failure. Track the failure with a flag and emit
a single root-cause-attributing log line in the existence-check
branch instead. The dirname fallback paths are still functionally
unreachable in CJS deployment; this just makes the failure mode
unmistakable if it ever does fire.
Out of scope (consistent with prior rounds):
- darwin/linux split for non-Windows candidate paths (benign today)
- Integration test for non-existent workerScriptPath (test coverage
gap deferred since rounds 1-2)
- Defer existsSync check to first ensureWorkerStarted call (current
module-init check is the loud signal we want)
Already addressed in earlier rounds:
- resolveWorkerRuntimePath() called twice in spawnDaemon → hoisted in
round 4 (b2c114b4)
- _originalLog dead code → removed in round 2 (7a96b3b9)
Verified: build clean, broadened guardrail trips on bun:sqlite,
bun:ffi, and bun:test (and ignores string literals), MCP server
serves the 7-tool surface, ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 6)
Round 6 of Claude Code Review feedback on PR #1645:
src/services/worker-spawner.ts: validate workerScriptPath at entry
Add an empty-string + existsSync guard at the top of ensureWorkerStarted.
Without this, a partial install or upstream path-resolution regression
just surfaces as a low-signal child_process error from spawnDaemon. The
explicit log line at the entry point makes that class of bug much
easier to diagnose. The mcp-server.ts module-init existsSync check
already covers this for the MCP-server caller, but defending at the
spawner level reinforces the contract for any future caller.
src/services/worker-spawner.ts: document SettingsDefaultsManager
dependency boundary in the module header
The spawner imports from SettingsDefaultsManager, ProcessManager, and
HealthMonitor. None of those currently touch bun:sqlite, but if any
of them ever does, the spawner's SQLite-free contract silently breaks.
The build guardrail in build-hooks.js is the only thing that catches
it. Header comment now flags this so future contributors audit
transitive imports when adding helpers from the shared/infrastructure
layers.
src/services/infrastructure/ProcessManager.ts: add /snap/bin/bun
Ubuntu Snap install path. Now alongside the existing apt path
(/usr/bin/bun) and Homebrew/Linuxbrew paths. The PATH lookup catches
it as fallback, but listing it explicitly avoids paying for an
execSync('which bun') in the common case.
src/servers/mcp-server.ts: elevate missing-bundle log warn → error
A missing worker-service.cjs means EVERY MCP tool call that needs the
worker silently fails. That's a broken-install state, not a transient
condition — match the severity of the dirname-fallback branch above
(which is already ERROR).
Out of scope (consistent with prior rounds, reviewer agrees these are
appropriately deferred):
- Streaming bundle read in build-hooks.js (nit at current 384KB size)
- Unit tests for ensureWorkerStarted / cooldown helpers
- Integration test for non-existent workerScriptPath
Verified: build clean, broadened guardrail still trips on bun:* imports
and ignores string literals, MCP server serves the 7-tool surface,
ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): defer WORKER_SCRIPT_PATH check to first call (round 7)
Round 7 of Claude Code Review feedback on PR #1645:
src/servers/mcp-server.ts: extract module-level existsSync check into
checkWorkerScriptPath() and call it lazily from ensureWorkerConnection()
instead of at module load.
The early-warning intent is preserved (the check still fires before any
actual spawn attempt), but tests/tools that import this module without
booting the MCP server no longer see noisy ERROR-level log lines for a
worker bundle they never intended to start. The check is cheap and
idempotent, so calling it on every auto-start attempt is fine.
The two failure-mode branches (dirname-resolution failure vs simple
missing-bundle) remain unchanged — the function body is identical to
the previous module-level if-block, just hoisted into a function and
called from ensureWorkerConnection().
False positive (no change needed):
- Reviewer flagged `mkdirSync` as a dead import in worker-spawner.ts,
but it IS used at line 71 in markWorkerSpawnAttempted (the round-1
ENOENT fix CodeRabbit explicitly asked for).
Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; nit per
reviewer
- worker-spawner.ts unit tests — needs injectable I/O, deferred
consistently since round 1
Verified: build clean, tests 44/44, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 8)
Round 8 of Claude Code Review feedback on PR #1645:
tests/services/worker-spawner.test.ts: NEW FILE — unit tests for the
ensureWorkerStarted entry-point validation guards added in round 6.
Covers the empty-string and non-existent-path cases without requiring
the broader injectable-I/O refactor that the deeper spawn lifecycle
tests would need. 2 new passing tests.
src/services/infrastructure/ProcessManager.ts: memoize
resolveWorkerRuntimePath() for the no-options call site (which is what
spawnDaemon uses). Caches both successful resolutions and the
not-found result so repeated spawn attempts (crash loops, health
thrashing) don't repeatedly hit statSync on candidate paths. Tests
that pass options bypass the cache entirely so existing test cases
remain deterministic. Added resetWorkerRuntimePathCache() exported
for test isolation only.
src/servers/mcp-server.ts: rename checkWorkerScriptPath() →
warnIfWorkerScriptMissing(). Per reviewer: the old name implied a
boolean check but the function returns void and has side effects. New
name is more accurate.
DEFENDED (no change made):
- Reviewer asked to elevate process.cwd() fallback to a synchronous
throw at module load. This conflicts with round 7 feedback which
asked to defer the existsSync check to first call to avoid noisy
test logs. The current lazy approach is the right compromise: it
fires before any actual spawn attempt, attributes the root cause,
and doesn't pollute test imports. Throwing at module load would
crash before stdio is wired up, which is much harder to debug than
the lazy log line.
- Reviewer asked to grep for `if (!pid)` callsites — already verified
in round 3, zero offenders in src/.
Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; reviewer
marked as nit
- Deeper unit tests for ensureWorkerStarted spawn lifecycle (PID file
cleanup, health checks, etc.) — needs injectable I/O, deferred
consistently since round 1
Verified: build clean, ProcessManager tests still 44/44, new
worker-spawner tests 2/2, smoke test serves 7 tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): clear Windows cooldown marker on all healthy paths (round 9)
Round 9 of PR #1645 review feedback.
src/services/worker-spawner.ts: clear stale Windows cooldown marker
on every healthy-return path
Per CodeRabbit (genuine bug):
The .worker-start-attempted marker was previously only cleared after
a spawn initiated by ensureWorkerStarted itself succeeded. If a
previous auto-start failed, then the worker became healthy via
another session or a manual start, the early-return success branches
(existing live PID, fast-path health check, port-in-use waitForHealth)
would leave the stale marker behind. A subsequent genuine outage
inside the 2-minute cooldown window would then be incorrectly
suppressed on Windows.
Now calls clearWorkerSpawnAttempted() on all three healthy success
paths in addition to the existing post-spawn path. The function is
already a no-op on non-Windows, so the change is risk-free for Linux
and macOS callers.
src/servers/mcp-server.ts: more actionable error when auto-start fails
Per claude-review: when ensureWorkerStarted returns false (or throws),
the caller currently logs a generic "Worker auto-start failed" line.
Updated both error sites to explicitly call out which MCP tools will
fail (search/timeline/get_observations) and to point at earlier log
lines for the specific cause. Helps users distinguish "worker is just
not running" from "tools are broken".
DEFENDED (no change):
- Sentinel object for Windows spawnDaemon 0 PID — broader API change,
out of scope, deferred consistently since round 1
- Spawner lifecycle tests beyond input validation — needs injectable
I/O, deferred consistently
- Concurrent cooldown marker race on Windows — pre-existing,
out of scope
- stripHardcodedDirname() regex fragility assertion — pre-existing,
out of scope
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): don't cache null Bun-not-found result (round 10)
Round 10 of PR #1645 review feedback.
src/services/infrastructure/ProcessManager.ts: only cache successful
resolveWorkerRuntimePath() results
Genuine bug from claude-review: the round-8 memoization cached BOTH
successful resolutions AND the not-found `null` result. If Bun isn't
on PATH at the moment the MCP server first tries to spawn the worker
— e.g., on a fresh install where the user installs Bun in another
terminal and retries — every subsequent ensureWorkerConnection call
would return the cached `null` and fail with a misleading "Bun not
found" error even though Bun is now available.
The fix is the one-line change the reviewer suggested: only cache
when `result !== null`. Crash loops still get the fast-path memoized
success; recovery from a fresh-install Bun install still works.
src/servers/mcp-server.ts: rename warnIfWorkerScriptMissing →
errorIfWorkerScriptMissing
Per claude-review: the function uses logger.error but the name says
"warn" — name/level mismatch. Renamed to match. The function still
serves the same purpose (defensive lazy check), just with an accurate
name.
DEFENDED (no change):
- Discriminated union for mcpServerDirResolutionFailed flag — current
approach works, the noise is minimal, and the alternative would
add type complexity for a path that's functionally unreachable in
CJS deployment
- macOS /usr/local/bin/bun "missing" — already in the Linux/macOS
candidate list at line 137 (false positive from reviewer)
- nix store path — out of scope, PATH fallback handles it
- Long build-hooks.js error message — verbosity is intentional, this
message only fires on a real regression and the diagnostic value is
worth the line wrap
- Spawner lifecycle test coverage gap — needs injectable I/O,
deferred consistently
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): bundle size budget guardrail (round 11)
Round 11 of PR #1645 review feedback.
scripts/build-hooks.js: secondary bundle-size budget guardrail
Per claude-review: the existing `require("bun:*")` regex catches the
specific regression class we already know about, but if esbuild ever
changes how it emits external module specifiers, the regex could
silently miss the regression. A bundle-size budget catches the
structural symptom (worker-service.ts dragged into the bundle blew
the size from ~358KB to ~1.96MB) regardless of how the imports look.
Set the ceiling at 600KB. Current size is ~384KB; the broken v12.0.0
bundle was ~1920KB. Plenty of headroom for legitimate growth without
incentivizing bundle bloat or false positives. Both guardrails fire
independently — one is regex-based, one is size-based — so a
regression has to defeat both to ship.
tests/services/worker-spawner.test.ts: comment about port irrelevance
Per claude-review: the hardcoded port values in the validation-guard
tests are arbitrary because the path validation short-circuits before
any network I/O. Added a comment explaining this so future readers
don't waste time wondering why specific ports were picked.
DEFENDED (no change):
- clearWorkerSpawnAttempted on the unhealthy-live-PID return path:
reviewer asked to clear the marker here too, but the current
behavior is correct. The marker tracks "recently attempted a spawn"
and exists to prevent rapid PowerShell-popup loops. If a wedged
process is currently using the port, the spawn isn't actually
happening on this code path (the helper returns false without
reaching the spawn step). When the wedged process eventually dies
and a subsequent call hits the spawn path, the marker correctly
suppresses repeated retry attempts within the 2-minute cooldown.
Clearing the marker on the unhealthy-return path would defeat
exactly the popup-loop protection the marker exists to provide.
- execSync in lookupBinaryInPath blocks event loop: pre-existing
concern, not introduced by this PR. Reviewer notes "fires once,
result cached". Not in scope for a hotfix.
- Tracking issue for spawner lifecycle test gap: out of scope for
this PR; the gap is documented in the test file's header comment
with a back-reference to PR #1645.
Verified: build clean, both guardrails functional (size budget is
under the new ceiling), ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): eliminate double error log when worker bundle is missing (round 12)
Round 12 of PR #1645 review feedback.
src/servers/mcp-server.ts: errorIfWorkerScriptMissing() now only logs
when the dirname-fallback attribution path is needed
Previously a missing worker-service.cjs would produce two ERROR log
lines on the same code path:
1. errorIfWorkerScriptMissing() in ensureWorkerConnection()
2. The existsSync guard inside ensureWorkerStarted()
The simple "missing bundle" case is fully covered by the spawner's
own existsSync guard. The mcp-server.ts function now ONLY logs when
mcpServerDirResolutionFailed is true — that's the mcp-server-specific
root-cause attribution that the spawner cannot provide on its own.
Net effect: same single error log per bug class, cleaner triage.
DEFENDED (no change):
- mkdirSync error propagation in markWorkerSpawnAttempted: reviewer
worried that mkdirSync/writeFileSync exceptions could escape, but
the entire body is already wrapped in try/catch with an APPROVED
OVERRIDE annotation. False positive.
- clearWorkerSpawnAttempted on healthy paths: reviewer asked a
clarifying question, not a change request. The behavior is
intentional — the cooldown marker exists to prevent rapid
PowerShell-popup loops from a series of failed spawns; a healthy
worker means the marker has served its purpose and a future
outage should NOT be suppressed. Will explain in PR reply.
- __filename ESM concern in worker-service.ts wrapper: already
documented in round 4 with an extended comment about the CJS
bundle context and why mcp-server.ts can't use the same trick.
- Spawn lifecycle integration tests: deferred consistently since
round 1; gap is documented in worker-spawner.test.ts header.
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(spawner): add bare-command BUN env override coverage
Final round of PR #1645 review feedback: while preparing to merge, I
noticed CodeRabbit's round-5 CHANGES_REQUESTED review on commit
3570d2f0 included an unaddressed nitpick — the env-driven bare-command
branch in resolveWorkerRuntimePath() (returning a bare 'bun' unchanged
when BUN or BUN_PATH is set that way) had no test coverage and could
regress without any failing assertion.
Added a focused test that exercises the env: { BUN: 'bun' } branch
specifically. 47/47 tests pass (was 46/46).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse multiple whitespace, trim, and increase max length to 160 chars
for observation titles in file-context deny reason.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix migration version conflict: addSessionPlatformSourceColumn now uses v25
- Sanitize observation titles in file-context deny reason (strip newlines, limit length)
- Guard json_each() with LIKE '[%' check for legacy bare-path rows
- Guard /stream SSE endpoint with 503 before DB initialization
- Scope bun-runner signal exit handling to start subcommand only
- Normalize platformSource at route boundary in DataRoutes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change file-read gate from deny to allow with limit:1, injecting the
observation timeline as additionalContext. Edit now works on gated files
since the file registers as "read" with near-zero token cost.
- Add updatedInput to HookResult type for PreToolUse hooks.
- Add .npmrc with legacy-peer-deps=true for tree-sitter peer dep conflicts.
- Add --legacy-peer-deps to npm fallback paths in smart-install.js so end
users without bun can install the 24 grammar packages.
- Rebuild plugin artifacts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove duplicate TranscriptWatcher/config imports in worker-service.ts
- Use normalizePlatformSource in handleSessionInitByClaudeId for consistency
- Don't skip DB completion when session not in memory (completeByClaudeId)
- Add try-catch around fetch in useContextPreview refresh callback
- Deduplicate store.getAllProjects() call in DataRoutes
- Fix malformed comment separators in migration runner
- Fix missing closing brace and JSDoc opener (merge artifact) in migration runner
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix indentation bugs flagged in PR review (SettingsDefaultsManager,
MigrationRunner), add current date/time to file read gate timeline
so the model can judge observation recency, and add documentation
for the file read gate feature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs fixed:
1. SessionCompletionHandler called dbManager.getSessionStore() during
WorkerService construction, before DB initialization. Changed to
accept DatabaseManager and defer the call to runtime.
2. migration009 (generated_by_model, relevance_count columns) only ran
via the deprecated MigrationRunner path, never through SessionStore's
migration chain. Added addObservationModelColumns() to SessionStore
constructor. Checks column existence directly since schema_versions
may have been marked applied without the ALTER TABLE succeeding.
Also removed duplicate transcriptWatcher declaration and shutdown block
(merge artifact).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sort within-day observations chronologically (was specificity-ordered)
- Canonicalize relative paths to POSIX format before DB lookup
- Skip projects param when allProjects is empty (prevents cross-project leaks)
- Remove dead stderrMessage field and hook-command block (unused after permissionDecision switch)
- Type permissionDecision as 'allow' | 'deny' union instead of string
- Remove redundant non-null assertions in getObservationsByFilePath
- Add edit guidance to deny message (use sed via Bash with smart tools)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>