claude-mem

mirror of https://github.com/thedotmack/claude-mem synced 2026-04-25 17:15:04 +02:00

Author	SHA1	Message	Date
Alex Newman	99060bac1a	fix: detect PID reuse in worker start-guard (container restarts) (#2082) * fix: detect PID reuse in worker start-guard to survive container restarts The 'Worker already running' guard checked PID liveness with kill(0), which false-positives when a persistent PID file outlives the PID namespace (docker stop / docker start, pm2 graceful reloads). The new worker comes up with the same low PID (e.g. 11) as the old one, kill(0) says 'alive', and the worker refuses to start against its own prior incarnation. Capture a process-start token alongside the PID and verify identity, not just liveness: - Linux: /proc/<pid>/stat field 22 (starttime, jiffies since boot) - macOS/POSIX: `ps -p <pid> -o lstart=` - Windows: unchanged (returns null, falls back to liveness) PID files written by older versions are token-less, so verifyPidFileOwnership falls back to the current liveness-only behavior for backwards compatibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: apply review feedback to PID identity helpers - Collapse ProcessManager re-export down to a single import/export statement. - Make verifyPidFileOwnership a type predicate (info is PidInfo) so callers don't need non-null assertions on the narrowed value. - Drop the `!` assertions at the worker-service GUARD 1 call site now that the predicate narrows. - Tighten the captureProcessStartToken platform doc comment to enumerate process.platform values explicitly. No behavior change — esbuild output is byte-identical (type-only edits). Addresses items 1-3 of the claude-review comment on PR #2082. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pin LC_ALL=C for `ps lstart=` in captureProcessStartToken Without a locale pin, `ps -o lstart=` emits month/weekday names in the system locale.
A bind-mounted PID file written under one locale and read under another would hash to different tokens and the live worker would incorrectly appear stale — reintroducing the very bug this helper exists to prevent. Flagged by Greptile on PR #2082. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: address second-round review on PID identity helpers - verifyPidFileOwnership: log a DEBUG diagnostic when the PID is alive but the start-token mismatches. Without it, callers can't distinguish the "process dead" path from the "PID reused" path in production logs — the exact case this helper exists to catch. - writePidFile: drop the redundant `?? undefined` coercion. `null` and `undefined` are both falsy for the subsequent ternary, so the coercion was purely cosmetic noise that suggested an important distinction. - Add a unit test for the win32 fallback path in captureProcessStartToken (mocks process.platform) — previously uncovered in CI. Addresses items 1, 2, and 5 of the second claude-review on PR #2082. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 19:49:03 -07:00
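The identity check in this commit can be sketched as two pure functions. This is an illustration under stated assumptions, not the project's code: `parseLinuxStartTime`, `isSameProcess`, and the `PidInfo` shape are hypothetical names, and reading `/proc/<pid>/stat` (or running `LC_ALL=C ps -p <pid> -o lstart=` on macOS) is left to the caller.

```typescript
// Hypothetical PID-file shape: startToken is missing in files written by
// older versions, which never recorded one.
interface PidInfo {
  pid: number;
  startToken?: string | null;
}

// Extract field 22 (starttime, clock ticks since boot) from the text of
// /proc/<pid>/stat. Field 2 (comm) is wrapped in parentheses and may contain
// spaces or ')' itself, so parsing resumes after the LAST ')'.
export function parseLinuxStartTime(statContent: string): string | null {
  const close = statContent.lastIndexOf(")");
  if (close === -1) return null;
  // The whitespace-separated fields after ')' start at field 3 (state).
  const rest = statContent.slice(close + 1).trim().split(/\s+/);
  const value = rest[22 - 3];
  return value !== undefined && /^\d+$/.test(value) ? value : null;
}

// Identity, not just liveness: a live PID only counts as the same worker
// when its current start token matches the one recorded in the PID file.
export function isSameProcess(
  recorded: PidInfo,
  alive: boolean,
  currentToken: string | null,
): boolean {
  if (!alive) return false;
  // Token-less legacy file, or a platform without a token (e.g. win32):
  // fall back to the old liveness-only answer.
  if (!recorded.startToken || currentToken === null) return true;
  return recorded.startToken === currentToken;
}
```

After a `docker stop` / `docker start`, a leftover file `{ pid: 11, startToken: "98765" }` meets a new PID 11 with a different token, so `isSameProcess` returns `false` and the new worker is free to start.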
Alex Newman	789efe4234	feat: disable subagent summaries, label subagent observations (#2073) * feat: disable subagent summaries and label subagent observations Detect Claude Code subagent hook context via `agent_id`/`agent_type` on stdin, short-circuit the Stop-hook summary path when present, and thread the subagent identity end-to-end onto observation rows (new `agent_type` and `agent_id` columns, migration 010 at version 27). Main-session rows remain NULL; content-hash dedup is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address PR #2073 review feedback - Narrow summarize subagent guard to agentId only so --agent-started main sessions still own their summary (agentType alone is main-session). - Remove now-dead agentId/agentType spreads from the summarize POST body. - Always overwrite pendingAgentId/pendingAgentType in SDK/Gemini/OpenRouter agents (clears stale subagent identity on main-session messages after a subagent message in the same batch). - Add idx_observations_agent_id index in migration 010 + the mirror migration in SessionStore + the runner. - Replace console.log in migration010 with logger.debug. - Update summarize test: agentType alone no longer short-circuits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address CodeRabbit + claude-review iteration 4 feedback - SessionRoutes.handleSummarizeByClaudeId: narrow worker-side guard to agentId only (matches hook-side). agentType alone = --agent main session, which still owns its summary. - ResponseProcessor: wrap storeObservations in try/finally so pendingAgentId/Type clear even if storage throws. Prevents stale subagent identity from leaking into the next batch on error. - SessionStore.importObservation + bulk.importObservation: persist agent_type/agent_id so backup/import round-trips preserve subagent attribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * polish: claude-review iteration 5 cleanup - Use ?? not || for nullable subagent fields in PendingMessageStore (prevents treating empty string as null). - Simplify observation.ts body spread — include fields unconditionally; JSON.stringify drops undefined anyway. - Narrow any[] to Array<{ name: string }> in migration010 column checks. - Add trailing newline to migrations.ts. - Document in observations/store.ts why the dedup hash intentionally excludes agent fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * polish: claude-review iteration 7 feedback - claude-code adapter: add 128-char safety cap on agent_id/agent_type so a malformed Claude Code payload cannot balloon DB rows. Empty strings now also treated as absent. - migration010: state-aware debug log lists only columns actually added; idempotent re-runs log "already present; ensured indexes". - Add 3 adapter tests covering the length cap boundary and empty-string rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf: skip subagent summary before worker bootstrap Move the agentId short-circuit above ensureWorkerRunning() so a Stop hook fired inside a subagent does not trigger worker startup just to return early. Addresses CodeRabbit nit on summarize.ts:36-47. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 14:58:01 -07:00
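A minimal sketch of the hook-side handling described above, assuming the hook's stdin has already been parsed into a plain object. `extractSubagentIdentity` and `shouldSkipSummary` are hypothetical names, and the commit does not say whether a value over 128 characters is truncated or dropped; this sketch drops it (treats it as absent), the same way it treats an empty string.

```typescript
// Cap taken from the commit; what happens above it is an assumption here.
const MAX_AGENT_FIELD_LENGTH = 128;

interface SubagentIdentity {
  agentId?: string;
  agentType?: string;
}

// Non-strings, empty strings, and over-long strings are all treated as absent.
function cleanAgentField(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  if (value.length === 0 || value.length > MAX_AGENT_FIELD_LENGTH) return undefined;
  return value;
}

// Hypothetical adapter step over the parsed hook stdin.
export function extractSubagentIdentity(hookInput: Record<string, unknown>): SubagentIdentity {
  return {
    agentId: cleanAgentField(hookInput["agent_id"]),
    agentType: cleanAgentField(hookInput["agent_type"]),
  };
}

// Only agentId marks a subagent. agentType alone means a main session started
// with --agent, which still owns its summary.
export function shouldSkipSummary(identity: SubagentIdentity): boolean {
  return identity.agentId !== undefined;
}
```

Keeping `agentType` out of the skip decision is the point of the review fix: a main session launched with `--agent` carries an agent type but no agent ID.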
Copilot	8ec91e7ffa	fix: break infinite summary-retry loop (#1633) (#2072) * Initial plan * fix: break infinite summary-retry loop (#1633) Three-part fix: 1. Parser coercion: When LLM returns <observation> tags instead of <summary>, coerce observation content into summary fields (root cause fix) 2. Stronger summary prompt: Add clearer tag requirements with warnings 3. Circuit breaker: Track consecutive summary failures per session, skip further attempts after 3 failures to prevent unbounded prompt growth Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77 Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> * refactor: extract shared constants for summary mode marker and failure threshold Addresses code review feedback: SUMMARY_MODE_MARKER and MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts and imported by ResponseProcessor and SessionManager. Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77 Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> * fix: guard summary failure counter on summaryExpected (Greptile P1) The circuit breaker counter previously incremented on any response containing <observation> or <summary> tags — which matches virtually every normal observation response. After 3 observations the breaker would open and permanently block summarization, reproducing the data-loss scenario #1633 was meant to prevent. Gate the increment block on summaryExpected (already computed for parseSummary coercion) so the counter only tracks actual summary attempts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: cover circuit-breaker + apply review polish - Use findLast / at(-1) for last-user-message lookup instead of filter + index (O(1) common case). - Drop redundant `|| 0` fallback — field is required and initialized. - Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering: * counter NOT incrementing on normal observation responses (regression guard for the Greptile P1) * counter incrementing when a summary was expected but missing * counter resetting to 0 on successful summary storage Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: iterate all observation blocks; don't count skip_summary as failure Addresses CodeRabbit review on #2072: - coerceObservationToSummary now iterates all <observation> blocks with a global regex and returns the first block that has title, narrative, or facts. Previously, an empty leading observation would short-circuit and discard populated follow-ups. - Circuit-breaker counter now treats explicit <skip_summary/> as neutral — neither a failure nor a success — so a run that happens to end on a skip doesn't punish the session or mask a prior bad streak. Real failures (no summary, no skip) still increment. - Tests added for both cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string Addresses CodeRabbit nitpick: tests should pull the marker from the canonical source so they don't silently drift when the constant is renamed or edited. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: also coerce observations when <summary> has empty sub-tags When the LLM wraps an empty <summary></summary> around real observation content, the #1360 empty-subtag guard rejects the summary and returns null — which would lose the observation content and resurrect the #1633 retry loop. Fall back to coerceObservationToSummary in that branch too, mirroring the unmatched-<summary> path. Adds a test covering the empty-summary-wraps-observation case and a guard test for empty summary with no observation content. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com> Co-authored-by: Alex Newman <thedotmack@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:00:38 -07:00
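The two main mechanisms of this fix, finding the first populated `<observation>` block and a failure counter gated on `summaryExpected`, might look roughly like this. A sketch only: `firstPopulatedObservation`, `nextFailureCount`, `breakerOpen`, and `SummaryOutcome` are hypothetical names, the regex extraction ignores tag attributes and self-closing tags, and the mapping of observation fields onto summary fields (done by the real `coerceObservationToSummary`) is not shown. The threshold of 3 and the counting rules come from the commit message.

```typescript
// Mirrors MAX_CONSECUTIVE_SUMMARY_FAILURES from the commit (value 3).
const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;

interface ObservationFields {
  title: string | null;
  narrative: string | null;
  facts: string | null;
}

// Inner text of the first <tag>...</tag> in `block`, or null if missing/blank.
function innerTag(block: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(block);
  const text = match ? match[1].trim() : "";
  return text === "" ? null : text;
}

// Scan EVERY <observation> block with a global regex and return the first one
// carrying a title, narrative, or facts, so an empty leading block cannot hide
// a populated one that follows it.
export function firstPopulatedObservation(response: string): ObservationFields | null {
  const blockRe = /<observation>([\s\S]*?)<\/observation>/g;
  let match: RegExpExecArray | null;
  while ((match = blockRe.exec(response)) !== null) {
    const body = match[1];
    const fields: ObservationFields = {
      title: innerTag(body, "title"),
      narrative: innerTag(body, "narrative"),
      facts: innerTag(body, "facts"),
    };
    if (fields.title || fields.narrative || fields.facts) return fields;
  }
  return null;
}

type SummaryOutcome = "stored" | "skipped" | "missing";

// Circuit-breaker bookkeeping: responses where no summary was expected leave
// the count alone, <skip_summary/> is neutral, a stored summary resets it,
// and only a genuinely missing summary increments it.
export function nextFailureCount(count: number, summaryExpected: boolean, outcome: SummaryOutcome): number {
  if (!summaryExpected || outcome === "skipped") return count;
  return outcome === "stored" ? 0 : count + 1;
}

export function breakerOpen(count: number): boolean {
  return count >= MAX_CONSECUTIVE_SUMMARY_FAILURES;
}
```

Gating on `summaryExpected` is what keeps ordinary observation responses from tripping the breaker, the regression the Greptile P1 fix addressed.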
Alex Newman	d24f3a7019	fix(worktree): address PR review — test assertion, dry-run sentinel, git timeouts - Update allProjects test expectation to match [parent, composite] (matches JSDoc + callers in ContextBuilder/context handlers). - Replace string-matched __DRY_RUN_ROLLBACK__ sentinel with dedicated DryRunRollback class to avoid swallowing unrelated errors. - Add 5000ms timeout to spawnSync git calls in WorktreeAdoption and ProcessManager so worker startup can't hang on a stuck git process. - Drop unreachable break after process.exit(0) in adopt case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:50:01 -07:00
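A dedicated error class is safer than a string-matched sentinel because a rollback handler that checks `instanceof DryRunRollback` cannot swallow an unrelated error whose message happens to contain the marker text. The sketch below assumes a minimal hypothetical `Tx` interface with `commit`/`rollback` and a hypothetical `runWithOptionalDryRun` helper; only the class name `DryRunRollback` comes from the commit.

```typescript
// Dedicated sentinel: only this class means "roll back on purpose".
class DryRunRollback extends Error {
  constructor() {
    super("dry run: rolling back");
    this.name = "DryRunRollback";
    // Keeps `instanceof` working even when compiled to ES5.
    Object.setPrototypeOf(this, DryRunRollback.prototype);
  }
}

// Hypothetical minimal transaction handle.
interface Tx {
  commit(): void;
  rollback(): void;
}

// Run `work`; in dry-run mode roll back afterwards and still return its result.
export function runWithOptionalDryRun<T>(tx: Tx, dryRun: boolean, work: () => T): T | undefined {
  let result: T | undefined;
  try {
    result = work();
    if (dryRun) throw new DryRunRollback();
    tx.commit();
    return result;
  } catch (err) {
    tx.rollback();
    // Swallow only our own sentinel; any other error, whatever its message, propagates.
    if (err instanceof DryRunRollback) return result;
    throw err;
  }
}
```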
Alex Newman	9d695f53ed	chore: remove auto-generated per-directory CLAUDE.md files Leftover artifacts from an abandoned context-injection feature. The project-level CLAUDE.md stays; the directory-level ones were generated timeline scaffolding that never panned out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 17:51:24 -07:00
Alex Newman	040729beef	fix(project-name): use parent/worktree composite so observations don't cross worktrees Revert of #1820 behavior. Each worktree now gets its own bucket: - In a worktree, primary = `parent/worktree` (e.g. `claude-mem/dar-es-salaam`) - In a main repo, primary = basename (unchanged) - allProjects is always `[primary]` — strict isolation at query time Includes a one-off maintenance script (scripts/worktree-remap.ts) that retroactively reattributes past sessions to their worktree using path signals in observations and user prompts. Two-rule inference keeps the remap high-confidence: 1. The worktree basename in the path matches the session's current plain project name (pre-#1820 era; trusted). 2. Or all worktree path signals converge on a single (parent, worktree) across the session. Ambiguous sessions are skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-16 15:40:44 -07:00
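A sketch of the naming rule and of the remap script's two-rule inference, assuming path signals have already been extracted from observations and prompts into `(parent, worktree)` pairs. `primaryProjectName`, `inferWorktree`, and `WorktreeSignal` are hypothetical names; the real `getProjectContext()` and `scripts/worktree-remap.ts` may be structured differently.

```typescript
// Primary bucket: `parent/worktree` inside a worktree, plain basename otherwise.
export function primaryProjectName(cwdBasename: string, parentRepoName: string | null): string {
  return parentRepoName ? `${parentRepoName}/${cwdBasename}` : cwdBasename;
}

// Hypothetical path signal found in a session's observations or prompts.
interface WorktreeSignal {
  parent: string;
  worktree: string;
}

export function inferWorktree(sessionProject: string, signals: WorktreeSignal[]): WorktreeSignal | null {
  // Rule 1: trust a signal whose worktree basename equals the session's plain
  // project name (sessions recorded before #1820).
  const direct = signals.find((s) => s.worktree === sessionProject);
  if (direct) return direct;
  // Rule 2: otherwise require every signal to agree on one (parent, worktree).
  if (signals.length === 0) return null;
  const first = signals[0];
  const unanimous = signals.every((s) => s.parent === first.parent && s.worktree === first.worktree);
  return unanimous ? first : null; // ambiguous: skip the session
}
```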
Alex Newman	aa7cdb6d9f	fix: revert unauthorized $CMEM branding in context header A prior Claude instance snuck in a `$CMEM` token branding header during a context compression refactor (`7e072106`). Reverts back to the original descriptive format: `# [project] recent context, datetime` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 12:04:27 -07:00
Alex Newman	d0fc68c630	revert: remove overengineered summary salvage logic (#1718) (#1850) The synthetic summary salvage feature created fake summaries from observation data when the AI returned <observation> instead of <summary> tags. This was overengineered — missing a summary is preferable to fabricating one from observation fields that don't map cleanly to summary semantics. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 04:22:41 -07:00
Ben Younes	05232ff091	fix: reap stuck generators in reapStaleSessions (fixes #1652) (#1698) * fix: reap stuck generators in reapStaleSessions (fixes #1652) Sessions whose SDK subprocess hung would stay in the active sessions map forever because `reapStaleSessions()` unconditionally skipped any session with a non-null `generatorPromise`. The generator was blocked on `for await (const msg of queryResult)` inside SDKAgent and could never unblock itself — the idle-timeout only fires when the generator is in `waitForMessage()`, and the orphan reaper skips processes whose session is still in the map. Add `MAX_GENERATOR_IDLE_MS` (5 min). When `reapStaleSessions()` sees a session whose `generatorPromise` is set but `lastGeneratorActivity` has not advanced in over 5 minutes, it now: 1. SIGKILLs the tracked subprocess to unblock the stuck `for await` 2. Calls `session.abortController.abort()` so the generator loop exits 3. Calls `deleteSession()` which waits up to 30 s for the generator to finish, then cleans up supervisor-tracked children Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: freeze time in stale-generator test and import constants from production source - Export MAX_GENERATOR_IDLE_MS, MAX_SESSION_IDLE_MS, StaleGeneratorCandidate, StaleGeneratorProcess, and detectStaleGenerator from SessionManager.ts so tests no longer duplicate production constants or detection logic. - Use setSystemTime() from bun:test to freeze Date.now() in the "exactly at threshold" test, eliminating the flaky double-Date.now() race. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 00:58:35 -07:00
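A sketch of the stuck-generator test and the three-step reap order, with hypothetical names (`isStuckGenerator`, `reapIfStuck`, `GeneratorState`, `ReapActions`) rather than the exported `detectStaleGenerator`. Only the 5-minute limit and the kill, abort, delete sequence come from the commit; the strict `>` at the threshold is an assumption, and passing `now` explicitly gives the same determinism the follow-up commit got from `setSystemTime()`.

```typescript
export const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000;

// Hypothetical slice of an active session.
interface GeneratorState {
  generatorPromise: Promise<void> | null;
  lastGeneratorActivity: number; // epoch ms of the generator's last progress
}

// "Stuck" means a generator is running but has made no progress for longer
// than the limit. `now` is a parameter so tests need no clock mocking.
export function isStuckGenerator(s: GeneratorState, now: number): boolean {
  if (s.generatorPromise === null) return false;
  return now - s.lastGeneratorActivity > MAX_GENERATOR_IDLE_MS;
}

interface ReapActions {
  killSubprocess(): void; // e.g. SIGKILL the tracked SDK subprocess
  abort(): void; // e.g. session.abortController.abort()
  deleteSession(): Promise<void>; // waits for the generator, then cleans up
}

// Order matters: killing the subprocess unblocks `for await`, aborting lets
// the generator loop exit, and only then is the session removed.
export async function reapIfStuck(s: GeneratorState, now: number, act: ReapActions): Promise<boolean> {
  if (!isStuckGenerator(s, now)) return false;
  act.killSubprocess();
  act.abort();
  await act.deleteSession();
  return true;
}
```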
Ben Younes	4538e686ad	fix: resolve Setup hook broken reference and warn on macOS-only binary (#1547) (#1696) * fix: resolve Setup hook broken reference and warn on macOS-only binary (#1547) On Linux ARM64, the plugin silently failed because: 1. The Setup hook called setup.sh which was removed; the hook exited 127 (file not found), causing the plugin to appear uninstalled. 2. The committed plugin/scripts/claude-mem binary is macOS arm64 only; no warning was shown when it could not execute on other platforms. Fix the Setup hook to call smart-install.js (the current setup mechanism) and add checkBinaryPlatformCompatibility() to smart-install.js, which reads the Mach-O magic bytes from the bundled binary and warns users on non-macOS platforms that the JS fallback (bun-runner.js + worker-service.cjs) is active. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com> * fix: close fd in finally block, strengthen smart-install tests to use production function - Wrap openSync/readSync in checkBinaryPlatformCompatibility with a finally block so the file descriptor is always closed even if readSync throws - Export checkBinaryPlatformCompatibility with an optional binaryPath param for testability - Refactor Mach-O detection tests to call the production function directly, mocking process.platform and passing controlled binary paths, eliminating duplicated inline logic - Strengthen plugin-distribution test to assert at least one command hook exists before checking for smart-install.js, preventing vacuous pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-04-15 00:58:29 -07:00
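The platform check reduces to inspecting the first four bytes of the bundled binary. This sketch takes the header as a `Uint8Array` instead of opening the file, which sidesteps the file-descriptor handling fixed in the follow-up commit; `isMachOHeader` and `binaryWarning` are hypothetical names and the warning text is made up. The magic values are the standard Mach-O and universal-binary ones.

```typescript
// Standard magic values, read big-endian from the first four bytes of the file.
const MACHO_MAGICS = new Set<number>([
  0xfeedface, // bytes FE ED FA CE: 32-bit Mach-O, big-endian
  0xcefaedfe, // bytes CE FA ED FE: 32-bit Mach-O, little-endian
  0xfeedfacf, // bytes FE ED FA CF: 64-bit Mach-O, big-endian
  0xcffaedfe, // bytes CF FA ED FE: 64-bit Mach-O, little-endian (x86_64, arm64)
  0xcafebabe, // bytes CA FE BA BE: universal ("fat") binary; Java .class files share it
]);

export function isMachOHeader(header: Uint8Array): boolean {
  if (header.length < 4) return false;
  const view = new DataView(header.buffer, header.byteOffset, 4);
  return MACHO_MAGICS.has(view.getUint32(0, false)); // false = big-endian read
}

// Hypothetical warning: only when the bundled binary is Mach-O and we are not on macOS.
export function binaryWarning(platform: string, header: Uint8Array): string | null {
  if (platform === "darwin" || !isMachOHeader(header)) return null;
  return "Bundled binary is macOS-only; falling back to bun-runner.js + worker-service.cjs.";
}
```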
Ben Younes	f97c50bfb9	fix: session lifecycle guards to prevent runaway API spend (#1590) (#1693) * fix: add session lifecycle guards to prevent runaway API spend (#1590) Three root causes allowed 30+ subprocess accumulation over 36 hours: 1. SIGTERM-killed processes (code 143) triggered crash recovery and immediately respawned — now detected and treated as intentional termination (aborts controller so wasAborted=true in .finally). 2. No wall-clock limit: sessions ran for 13+ hours continuously spending tokens — now refuses new generators after 4 hours and drains the pending queue to prevent further spawning. 3. Duplicate --resume processes for the same session UUID — now killed and unregistered before a new spawn is registered. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com> * fix: use normalized errorMsg in logger.error payload and annotate SIGTERM override Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: use persisted createdAt for wall-clock guard and bind abortController locally to prevent stale abort Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: re-trigger CodeRabbit review after rate limit reset * fix: defer process unregistration until exit and align boundary test with strict > (#1693) - ProcessRegistry: don't unregister PID immediately after SIGTERM — let the existing 'exit' handler clean up when the process actually exits, preventing tracking loss for still-live processes. - Test: align wall-clock boundary test with production's strict `>` operator (exactly 4h is NOT terminated, only >4h is). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-04-15 00:58:23 -07:00
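The two numeric guards from this commit, sketched as pure functions. `MAX_SESSION_WALL_CLOCK_MS`, `isIntentionalTermination`, `exceedsWallClock`, and `shouldRespawn` are hypothetical names; the 4-hour limit, exit code 143, and the strict `>` boundary come from the commit message, while the rule that a clean exit (code 0) is not respawned is an assumption.

```typescript
export const MAX_SESSION_WALL_CLOCK_MS = 4 * 60 * 60 * 1000;

// 143 = 128 + 15, the usual exit status reported for a process killed by
// SIGTERM. Node's 'exit' event reports code null plus signal "SIGTERM"
// instead, so both forms are accepted.
export function isIntentionalTermination(code: number | null, signal: string | null): boolean {
  return signal === "SIGTERM" || code === 143;
}

// Strict `>`: a session exactly 4h old is still allowed; only >4h is refused.
export function exceedsWallClock(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > MAX_SESSION_WALL_CLOCK_MS;
}

// Hypothetical crash-recovery gate combining both guards.
export function shouldRespawn(code: number | null, signal: string | null, createdAtMs: number, nowMs: number): boolean {
  if (isIntentionalTermination(code, signal)) return false;
  if (exceedsWallClock(createdAtMs, nowMs)) return false;
  return code !== 0;
}
```

`createdAtMs` should be the persisted session creation time, as the follow-up commit notes, so a worker restart does not reset the clock.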
Ben Younes	983be42998	fix: resolve Gemini CLI 0.37.0 session capture failures (#1664) (#1692) Three root causes prevented Gemini sessions from persisting prompts, observations, and summaries: 1. BeforeAgent was mapped to user-message (display-only) instead of session-init (which initialises the session and starts the SDK agent). 2. The transcript parser expected Claude Code JSONL (type: "assistant") but Gemini CLI 0.37.0 writes a JSON document with a messages array where assistant entries carry type: "gemini". extractLastMessage now detects the format and routes to the correct parser, preserving full backward compatibility with Claude Code JSONL transcripts. 3. The summarize handler omitted platformSource from the /api/sessions/summarize request body, causing sessions to be recorded without the gemini-cli source tag. Co-authored-by: Claude <noreply@anthropic.com>	2026-04-15 00:58:20 -07:00
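Format detection for the two transcript layouts can be done by trying the whole file as one JSON document first and falling back to line-by-line JSONL. A sketch under assumptions: `lastAssistantEntry` is a hypothetical name, and apart from `type` and `messages` the entry fields are not specified in the commit, so entries are returned as untyped objects.

```typescript
type TranscriptEntry = { type?: unknown; [key: string]: unknown };

function tryParse(text: string): unknown {
  if (text === "") return null;
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

function isEntry(value: unknown): value is TranscriptEntry {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function findLastOfType(entries: TranscriptEntry[], type: string): TranscriptEntry | null {
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].type === type) return entries[i];
  }
  return null;
}

// Gemini CLI 0.37.0: one JSON document whose `messages` array holds assistant
// entries with type "gemini". Claude Code: JSONL, one entry per line, with
// assistant entries of type "assistant". A single-line JSONL file parses as
// an object without `messages`, so it still takes the JSONL path.
export function lastAssistantEntry(raw: string): TranscriptEntry | null {
  const doc = tryParse(raw.trim());
  if (isEntry(doc) && Array.isArray(doc.messages)) {
    return findLastOfType(doc.messages.filter(isEntry), "gemini");
  }
  const lines = raw.split("\n").map((line) => tryParse(line.trim()));
  return findLastOfType(lines.filter(isEntry), "assistant");
}
```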
Ethan	16a0737dfc	fix: use parent project name for worktree observation writes (#1820) * fix: use parent project name for worktree observation writes (#1819) Observations and sessions from git worktrees were stored under basename(cwd) instead of the parent repo name because write paths called getProjectName() (not worktree-aware) instead of getProjectContext() (worktree-aware). This is the same bug as #1081, #1317, and #1500 — it regressed because the two functions coexist and new code reached for the simpler one. Fix: getProjectContext() now returns parentProjectName as primary when in a worktree, and all four write-path call sites now use getProjectContext().primary instead of getProjectName(). Includes regression test that creates a real worktree directory structure and asserts primary === parentProjectName. * fix: address review nitpicks — allProjects fallback, JSDoc, write-path test - ContextBuilder: default projects to context.allProjects for legacy worktree-labeled record compatibility - ProjectContext: clarify JSDoc that primary is canonical (parent repo in worktrees) - Tests: add write-path regression test mirroring session-init/SessionRoutes pattern; refactor worktree fixture into beforeAll/afterAll * refactor(project-name): rename local to cwdProjectName and dedupe allProjects Addresses final CodeRabbit nitpick: disambiguates the local variable from the returned `primary` field, and dedupes allProjects via Set in case parent and cwd resolve to the same name. --------- Co-authored-by: Ethan Hurst <ethan.hurst@outlook.com.au>	2026-04-15 00:58:14 -07:00
enma998	471e1f62f9	Fix npx search and default Codex context to workspace-local AGENTS (#1780) * Fix npx search query parameter mismatch * Use workspace-local Codex AGENTS context by default --------- Co-authored-by: bnb <bnb>	2026-04-15 00:58:08 -07:00
suyua9	eeb6841033	fix: coerce corpus route filters (#1776) * fix: coerce corpus route filters * test: cover unsupported corpus type filters	2026-04-15 00:58:01 -07:00
Tran Quang	2a2008bac2	fix(file-context): preserve targeted reads + invalidate on mtime (#1719) (#1729) * fix(file-context): preserve targeted reads + invalidate on mtime (#1719) The PreToolUse:Read hook unconditionally rewrote tool input to {file_path, limit:1}, which interacted with two failure modes: 1. Subagent edits a file → parent's next Read still gets truncated because the observation snapshot predates the change. 2. Claude requests a different section with offset/limit → the hook strips them, so the Claude Code harness's read-dedup cache returns "File unchanged" against the prior 1-line read. The file becomes unreadable for the rest of the conversation, even though the hook's own recovery hint says "Read again with offset/limit for the section you need." Two complementary fixes: - mtime invalidation: stat the file (we already stat for the size gate) and compare mtimeMs to the newest observation's created_at_epoch. If the file is newer, pass the read through unchanged so fresh content reaches Claude. - Targeted-read pass-through: when toolInput already specifies offset and/or limit, preserve them in updatedInput instead of collapsing to {limit:1}. The harness's dedup cache then sees a distinct input and lets the read proceed. The unconstrained-read path (no offset, no limit) is unchanged: still truncated to 1 line plus the observation timeline, so token economics are preserved for the common case. Tests cover all three branches: existing truncation, targeted-read pass-through (offset+limit, limit-only), and mtime-driven bypass.
Fixes #1719 * refactor(file-context): address review findings on #1719 fix - Add offset-only test case for full targeted-read branch coverage - Use >= for mtime comparison to handle same-millisecond edge case - Add Number.isFinite() + bounds guards on offset/limit pass-through - Trim over-verbose comments to concise single-line summaries - Remove redundant `as number` casts after typeof narrowing - Add comment explaining fileMtimeMs=0 sentinel invariant	2026-04-15 00:57:57 -07:00
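The three branches of the PreToolUse:Read decision, sketched as a pure function. `decideRead` and its return shape are hypothetical; the `>=` mtime comparison, the `Number.isFinite` guards, and the `{ limit: 1 }` truncation come from the commit, while the exact bounds (offset at least 0, limit at least 1), millisecond units for both timestamps, and treating an mtime of 0 as "unknown" are assumptions.

```typescript
interface ReadToolInput {
  file_path: string;
  offset?: unknown;
  limit?: unknown;
}

interface RewrittenInput {
  file_path: string;
  offset?: number;
  limit?: number;
}

type ReadDecision =
  | { action: "pass-through" }
  | { action: "rewrite"; updatedInput: RewrittenInput };

// Accept only finite numbers at or above `min` (assumed bounds).
function boundedNumber(value: unknown, min: number): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value >= min ? value : undefined;
}

export function decideRead(input: ReadToolInput, fileMtimeMs: number, newestObservationMs: number): ReadDecision {
  // 1. File touched at or after the newest observation (>= covers the
  //    same-millisecond case): let fresh content through untouched.
  //    fileMtimeMs = 0 is treated as "unknown" and never bypasses.
  if (fileMtimeMs > 0 && fileMtimeMs >= newestObservationMs) {
    return { action: "pass-through" };
  }
  const offset = boundedNumber(input.offset, 0);
  const limit = boundedNumber(input.limit, 1);
  // 2. Targeted read: keep the caller's window so the harness's dedup cache
  //    sees an input distinct from an earlier 1-line read.
  if (offset !== undefined || limit !== undefined) {
    const updatedInput: RewrittenInput = { file_path: input.file_path };
    if (offset !== undefined) updatedInput.offset = offset;
    if (limit !== undefined) updatedInput.limit = limit;
    return { action: "rewrite", updatedInput };
  }
  // 3. Unconstrained read: truncate to one line; the observation timeline
  //    is supplied alongside it.
  return { action: "rewrite", updatedInput: { file_path: input.file_path, limit: 1 } };
}
```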
Alex Newman	a390a537c9	fix: broadcast uses summaryForStore to support salvaged summaries (#1718) syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged) instead of summaryForStore for the SSE broadcast, causing a crash when the LLM returns <observation> without <summary> tags. Also removes misplaced tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:11:48 -07:00
Alex Newman	2357835942	Merge pull request #1686 from ousamabenyounes/fix/issue-1633 fix: expose summaryStored in session status to detect silent summary loss (#1633)	2026-04-14 18:41:58 -07:00
Alex Newman	77a22d30b2	Merge pull request #1555 from ousamabenyounes/fix/issue-1384-mcp-inputschema fix: declare inputSchema properties for search and timeline MCP tools (#1384 #1413)	2026-04-14 18:41:54 -07:00
Alex Newman	40a25e0225	Merge pull request #1676 from ousamabenyounes/fix/issue-1625 fix: filter ghost observations with no content fields (#1625)	2026-04-14 18:41:51 -07:00
Alex Newman	4c2ab98d90	Merge pull request #1679 from ousamabenyounes/fix/issue-1297 fix: set cwd to homedir when spawning chroma-mcp to prevent pydantic .env.local crash (#1297)	2026-04-14 18:41:48 -07:00
Alex Newman	7bcfd73985	Merge pull request #1677 from ousamabenyounes/fix/issue-1503 fix: avoid DEP0190 deprecation on Windows by using single-string spawnSync for where bun (#1503)	2026-04-14 18:41:34 -07:00
Alex Newman	7dd321f869	Merge pull request #1678 from ousamabenyounes/fix/issue-1342 fix: add .gitattributes to enforce LF endings on plugin scripts (#1342)	2026-04-14 18:41:31 -07:00
Alex Newman	153ddb814b	Merge pull request #1670 from ousamabenyounes/fix/issue-1651 docs: add Language Support section to smart-explore/SKILL.md (#1651)	2026-04-14 18:41:28 -07:00
Alex Newman	216d17879d	Merge pull request #1680 from ousamabenyounes/fix/issue-1447 fix: suppress false ERROR when duplicate daemon loses port bind race (#1447)	2026-04-14 18:41:25 -07:00
Alex Newman	fa73dd483c	Merge pull request #1666 from ousamabenyounes/fix/issue-1299 fix: remove leaky mock.module() for project-name that polluted parallel workers (#1299)	2026-04-14 18:41:22 -07:00
Ousama Ben Younes	2f19eab9c2	fix: expose summaryStored in session status to detect silent summary loss (#1633) Stop hook polled queueLength===0 as a proxy for summary success, but the queue empties regardless of whether the LLM produced valid <summary> tags. Added lastSummaryStored tracking on ActiveSession, surfaced via the /api/sessions/status endpoint, and emit a logger.warn in the Stop hook when summaryStored===false. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 15:06:18 +00:00
Ousama Ben Younes	e7bf2ac65a	docs: add custom grammar and markdown special support sections to smart-explore/SKILL.md - Add Custom Grammars (.claude-mem.json) section explaining how to register additional tree-sitter parsers for unsupported file extensions - Add Markdown Special Support section documenting heading-based outline, code-fence search, section unfold, and frontmatter extraction behaviors - Expand bundled language test to cover all 10 documented languages plus the plain-text fallback sentence to prevent partial doc regressions Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 10:52:31 +00:00
Ousama Ben Younes	5ac54239d8	fix: add context-generator.cjs to SHEBANG_SCRIPTS and assert file existence - Add missing context-generator.cjs to the SHEBANG_SCRIPTS list so CRLF regressions in that script are caught by the test suite - Replace silent early-returns with expect(existsSync(filePath)).toBe(true) so the suite fails loudly when expected build artifacts are absent Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 10:51:34 +00:00
Ousama Ben Younes	08cf2ba3bd	fix: suppress false ERROR when duplicate daemon loses port bind race (#1447) When the MCP server and SessionStart hook both spawn a worker daemon concurrently, one loses the bind race (EADDRINUSE / Bun's port-in-use error). The loser now checks if the winner is healthy; if so, it logs INFO and exits cleanly instead of logging a misleading ERROR on every first session start. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 10:01:08 +00:00
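The loser's decision reduces to two questions: was this a port-in-use failure, and is the winner healthy? A sketch with hypothetical names (`isPortInUse`, `classifyBindFailure`); the health check is assumed to have run beforehand, and the pattern for Bun's error text is a guess, since the commit does not quote the message.

```typescript
type BindOutcome = "yield-to-winner" | "error";

export function isPortInUse(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  if (e.code === "EADDRINUSE") return true;
  // Assumption: fallback match on the message text for runtimes that do not
  // set `code`; the exact Bun wording is not given in the commit.
  return typeof e.message === "string" && /address already in use|port \d+ in use/i.test(e.message);
}

// Yield quietly only when the port is taken AND the daemon holding it is healthy.
export function classifyBindFailure(err: unknown, winnerHealthy: boolean): BindOutcome {
  return isPortInUse(err) && winnerHealthy ? "yield-to-winner" : "error";
}
```

`yield-to-winner` corresponds to the INFO log and clean exit; every other case keeps the ERROR path, so a genuinely broken daemon is still reported.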
Ousama Ben Younes	c7c4fd54d6	fix: set cwd to homedir when spawning chroma-mcp to prevent pydantic .env.local crash (#1297) chroma-mcp uses pydantic-settings which auto-reads .env/.env.local from the CWD. When the project directory contains non-chroma variables (e.g. CELERY_TASK_ALWAYS_EAGER), pydantic rejects them with "Extra inputs are not permitted", crashing the subprocess and triggering a permanent backoff loop. Passing cwd: os.homedir() to StdioClientTransport ensures pydantic never reads project env files. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 09:55:02 +00:00
Ousama Ben Younes	f61eb2d162	fix: add .gitattributes to enforce LF endings on plugin scripts (#1342) Without .gitattributes, building on Windows produces plugin scripts with CRLF line endings. The CRLF on the shebang line causes "env: node\r: No such file or directory" on macOS/Linux, breaking the MCP server and all hook scripts. Add text=auto eol=lf as the global default plus explicit eol=lf rules for plugin/scripts/*.cjs and *.js. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 09:51:22 +00:00
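A check in the spirit of the SHEBANG_SCRIPTS test can flag this failure mode directly: if the shebang line ends in a carriage return, `env` is asked to run a program literally named `node\r`. `hasCrlfShebang` is a hypothetical helper, not the project's test code.

```typescript
// True when a script starts with a shebang line that ends in "\r" (CRLF file).
export function hasCrlfShebang(contents: string): boolean {
  if (!contents.startsWith("#!")) return false;
  const newline = contents.indexOf("\n");
  const firstLine = newline === -1 ? contents : contents.slice(0, newline);
  return firstLine.endsWith("\r");
}
```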
Ousama Ben Younes	e9a234308a	fix: avoid DEP0190 deprecation on Windows by using single-string spawnSync for where bun (#1503) Node 22+ emits DEP0190 when spawnSync is called with a separate args array and shell:true, because the args are only concatenated (not escaped). Split the findBun() PATH check into platform-specific calls: Windows uses spawnSync('where bun', { shell: true }) as a single string, Unix uses spawnSync('which', ['bun']) with no shell option. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 09:49:23 +00:00
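The split can be expressed as a small discriminated union so that no code path ever combines an args array with `shell: true`. `bunLookup` and `BunLookup` are hypothetical names; the two command shapes are the ones the commit describes.

```typescript
// Two shapes, so no call ever pairs an args array with shell: true.
type BunLookup =
  | { kind: "shell-string"; command: string } // run as spawnSync(command, { shell: true })
  | { kind: "argv"; file: string; args: string[] }; // run as spawnSync(file, args)

export function bunLookup(platform: string): BunLookup {
  if (platform === "win32") {
    // As in the commit: Windows keeps shell: true but passes one pre-joined string.
    return { kind: "shell-string", command: "where bun" };
  }
  // Unix: exec `which` directly with an argv array and no shell.
  return { kind: "argv", file: "which", args: ["bun"] };
}
```

A caller would then run `spawnSync(lookup.command, { shell: true })` for the Windows shape and `spawnSync(lookup.file, lookup.args)` otherwise.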
Ousama Ben Younes	e398212983	fix: filter ghost observations with no content fields (#1625) When the LLM is overwhelmed by large context it can emit bare <observation/> blocks (or ones containing only <type>). These are stored as rows where title, narrative, facts and concepts are all null/empty, appearing as meaningless "Untitled" entries in the context window. Add a guard in parseObservations() that skips any observation where every content field is null/empty before pushing it to the result array. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-10 09:40:40 +00:00
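The guard amounts to a predicate applied before an observation is pushed to the result array. This sketch assumes a hypothetical `ParsedObservation` shape with `facts` and `concepts` as string arrays; treating whitespace-only strings as empty may be stricter than the actual guard.

```typescript
interface ParsedObservation {
  type: string | null;
  title: string | null;
  narrative: string | null;
  facts: string[];
  concepts: string[];
}

const isBlank = (s: string | null): boolean => s === null || s.trim() === "";

// A "ghost" has no title, narrative, facts, or concepts; `type` alone does not count.
export function isGhostObservation(o: ParsedObservation): boolean {
  return (
    isBlank(o.title) &&
    isBlank(o.narrative) &&
    o.facts.every((f) => f.trim() === "") &&
    o.concepts.every((c) => c.trim() === "")
  );
}

export function dropGhosts(observations: ParsedObservation[]): ParsedObservation[] {
  return observations.filter((o) => !isGhostObservation(o));
}
```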
Ousama Ben Younes	36a03f75b2	docs: add Language Support section to smart-explore/SKILL.md (#1651) tree-sitter language docs belonged in smart-explore but were absent; this adds the Bundled Languages table (10 languages) with correct placement. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-09 23:24:14 +00:00
Ousama Ben Younes	5676cab83f	fix: remove leaky mock.module() for project-name that polluted parallel workers (#1299) Top-level mock.module() in context-reinjection-guard.test.ts permanently stubbed getProjectName() to 'test-project' for the entire Bun worker process, causing tests in other files to receive the wrong value. Removed the unnecessary mock (session-init tests don't assert on project name), added bunfig.toml smol=true for worker isolation, and added a regression test. Generated by Claude Code Vibe coded by ousamabenyounes Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-09 22:55:54 +00:00
Alex Newman	abd55977ca	fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node (#1645) * fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so it must run under Node, but commit `2b60dd29` added an import of `ensureWorkerStarted` from worker-service.ts. That import transitively pulls in DatabaseManager → bun:sqlite, blowing up at top-level require under Node. The bundle ballooned from ~358KB (v11.0.1) to ~1.96MB (v12.0.0) and crashed on every spawn, breaking the MCP server entirely for Codex/MCP-only clients and any flow that boots the MCP tool surface. Fix: 1. Extract `ensureWorkerStarted` and the Windows spawn-cooldown helpers into a new lightweight module `src/services/worker-spawner.ts` that only imports from infrastructure/ProcessManager, infrastructure/HealthMonitor, shared/, and utils/logger — no SQLite, no ChromaSync, no DatabaseManager. 2. The new helper takes the worker script path explicitly so callers running under Node (mcp-server) can pass `worker-service.cjs` while callers already inside the worker (worker-service self-spawn) pass `__filename`. worker-service.ts keeps a thin wrapper for back-compat. 3. mcp-server.ts now imports from worker-spawner.js and resolves WORKER_SCRIPT_PATH via __dirname so the daemon can be auto-started for MCP-only clients without dragging in the entire worker bundle. 4. resolveWorkerRuntimePath() now searches for Bun on every platform (not just Windows). worker-service.cjs requires Bun at runtime, so when the spawner is invoked from a Node process the Unix branch can no longer fall through to process.execPath (= node). 5. spawnDaemon's Unix branch now calls resolveWorkerRuntimePath() instead of hardcoding process.execPath, fixing the same Node-spawning-Node bug for the actual subprocess launch on Linux/macOS.
After: - mcp-server.cjs is 384KB again with zero `bun:sqlite` references - node mcp-server.cjs initializes and serves tools/list + tools/call (verified via JSON-RPC against the running worker) - ProcessManager test suite updated for the new cross-platform Bun resolution behavior; full suite has the same pre-existing failures as main, no regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> fix(mcp): address PR #1645 review feedback (round 1) Per Claude Code Review on PR #1645: 1. mcp-server.ts: log a warning when both __dirname and import.meta.url resolution fail. The cwd() fallback is essentially dead code for the CJS bundle but if it ever fires it gives the user a breadcrumb instead of a silently-wrong WORKER_SCRIPT_PATH. 2. mcp-server.ts: existsSync check on WORKER_SCRIPT_PATH at module load. Surfaces a clear "worker-service.cjs not found at expected path" log line for partial installs / dev environments instead of letting the failure surface as a generic spawnDaemon error later. 3. ProcessManager.ts: explanatory comment on the Windows `return 0` sentinel in spawnDaemon. Documents that PowerShell Start-Process doesn't return a PID and that callers MUST use `pid === undefined` for failure detection — never falsy checks like `if (!pid)`. Items 4 (no direct unit tests for the worker-spawner Windows cooldown helpers) and 5 (process-manager.test.ts uses real ~/.claude-mem path) are deferred — the reviewer flagged the latter as out of scope, and the former needs an injectable-I/O refactor that isn't appropriate for a hotfix bugfix PR. Verified: build clean, mcp-server.cjs still 384KB / zero bun:sqlite, JSON-RPC tools/list still returns the 7-tool surface, ProcessManager test suite still 43/43. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(spawner): mkdir CLAUDE_MEM_DATA_DIR before writing Windows cooldown marker Per CodeRabbit on PR #1645: on a fresh user profile, the data dir may not exist yet when markWorkerSpawnAttempted() runs. writeFileSync would throw ENOENT, the catch would swallow it, and the marker would never be created — defeating the popup-loop protection this helper exists to provide. mkdirSync(dir, { recursive: true }) is a no-op when the directory already exists, so it's safe to call on every spawn attempt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(spawner): add APPROVED OVERRIDE annotations for cooldown marker catches Per CodeRabbit on PR #1645: silent catch blocks at spawn-cooldown sites should carry the APPROVED OVERRIDE annotation that the rest of the codebase uses (see ProcessManager.ts:689, BaseRouteHandler.ts:82, ChromaSync.ts:288). Both catches are intentional best-effort: - markWorkerSpawnAttempted: if mkdir/writeFileSync fails, the worker spawn itself will almost certainly fail too. Surfacing that downstream is far more useful than a noisy log line about a lock file. - clearWorkerSpawnAttempted: a stale marker is harmless. Worst case is one suppressed retry within the cooldown window, then self-heals. No behaviour change. Resolves the second half of CodeRabbit's lines 38-65 comment on worker-spawner.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): address PR #1645 review feedback (round 2) Round 2 of Claude Code Review feedback on PR #1645: Build guardrail (most important — protects the regression this PR fixes): - scripts/build-hooks.js: post-build check that fails the build if mcp-server.cjs ever contains a `bun:sqlite` reference. This is the exact regression PR #1645 fixed; future contributors will get an immediate, actionable error if a transitive import re-introduces it. Verified the check trips when violated. 
Code clarity: - src/servers/mcp-server.ts: drop dead `_originalLog` capture — it was never restored. Less code is fewer bugs. - src/servers/mcp-server.ts: elevate `cwd()` fallback log from WARN to ERROR. Per reviewer: a wrong WORKER_SCRIPT_PATH means worker auto-start silently fails, so the breadcrumb should be loud and searchable. - src/services/worker-service.ts: extended doc comment on the `ensureWorkerStartedShared(port, __filename)` wrapper explaining why `__filename` is the correct script path here (CJS bundle = compiled worker-service.cjs) and why mcp-server.ts can't use the same trick. - src/services/infrastructure/ProcessManager.ts: inline comment on the `env.BUN === 'bun'` bare-command guard explaining why it's reachable even though `isBunExecutablePath('bun')` is true (pathExists returns false for relative names, so the second branch is what fires). Coverage: - src/services/infrastructure/ProcessManager.ts: add `/usr/bin/bun` to the Linux candidate paths so apt-installed Bun on Debian/Ubuntu is found without falling through to the PATH lookup. Out-of-scope items (deferred with rationale in PR replies): - Unit tests for ensureWorkerStarted / Windows cooldown helpers — needs injectable-I/O refactor unsuitable for a hotfix. - Sentinel object for Windows spawnDaemon `0` — broader API change. - Windows Scoop install path — follow-up for a future PR. - runOneTimeChromaMigration placement, aggressiveStartupCleanup, console.log redirect timing, platform timeout multiplier — all pre-existing and unrelated to this regression. Verified: build clean, guardrail trips on simulated violation, mcp-server.cjs still 0 bun:sqlite refs, ProcessManager tests 43/43. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): address PR #1645 review feedback (round 3) Round 3 of Claude Code Review feedback on PR #1645: ProcessManager.ts: improve actionability of "Bun not found" errors Both Windows and Unix branches of spawnDaemon previously logged a vague "Failed to locate Bun runtime" message when resolveWorkerRuntimePath() returned null. Replaced with an actionable message that names the install URL and explains why Bun is required (worker uses bun:sqlite). The existing null-guard at the call sites already prevents passing null to child_process.spawn — only the error text changed. scripts/build-hooks.js: refine bun:sqlite guardrail to match actual require() calls only The previous coarse `includes('bun:sqlite')` check tripped on its own improved error message, which legitimately mentions "bun:sqlite" by name. Switched to a regex that matches `require("bun:sqlite")` / `require('bun:sqlite')` (with optional whitespace, handles both quote styles, handles minified output) so error messages and inline comments can reference the module name without false positives. Verified the regex still trips on real violations (both spaced and minified forms) and correctly ignores string-literal mentions. Other round-3 items (verified, not changed): - TOOL_ENDPOINT_MAP: reviewer flagged as dead code, but it IS used at lines 250 and 263 by the search and timeline tool handlers. False positive — kept as-is. - if (!pid) callsites: grepped src/, zero offenders. The Windows `0` PID sentinel contract is safe; only the in-line documentation comment in ProcessManager.ts mentions the anti-pattern. - callWorkerAPIPost double-wrapping: pre-existing intentional behavior (only used by /api/observations/batch which returns raw data, not the MCP {content:[...]} shape). Unrelated to this regression. 
- Snap path / startParentHeartbeat / main().catch / test for non-existent workerScriptPath / etc. — pre-existing or out of scope for this hotfix, deferred per established disposition. Verified: build clean, guardrail still trips on real violations, mcp-server.cjs has 0 require("bun:sqlite") calls, JSON-RPC tools/list returns the 7-tool surface, ProcessManager tests 43/43. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(spawnDaemon): contract test for Windows 0 PID success sentinel Per CodeRabbit nitpick on PR #1645 commit `7a96b3b9`: add a focused test that documents the spawnDaemon return contract so any future contributor who introduces `if (!pid)` against a spawnDaemon return value (or its wrapper) sees a failing assertion explaining why the falsy check is incorrect. The test deliberately exercises the JS-level semantics rather than mocking PowerShell — a true mocked Windows test would require refactoring spawnDaemon to take an injectable execSync, which is a larger change than this hotfix should carry. The contract assertions here catch the same regression class (treating Windows success as failure) without that refactor. Verified: bun test tests/infrastructure/process-manager.test.ts now passes 44/44 (was 43/43). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): address PR #1645 review feedback (round 4) Round 4 of Claude Code Review feedback on PR #1645 (review of round-3 commit `193286f9`): tests/infrastructure/process-manager.test.ts: replace require('fs') with the already-imported statSync. Reviewer correctly flagged that the file uses ESM-style named imports everywhere else and the inline require() calls would break under strict ESM. Two callsites updated in the touchPidFile test. src/services/infrastructure/ProcessManager.ts: hoist resolveWorkerRuntimePath() and the `Bun runtime not found` error handling out of both branches in spawnDaemon. 
Both Windows and Unix branches need the same Bun lookup, and resolving once before the OS branch split avoids a duplicate execSync('which bun')/where bun in the no-well-known-path fallback. The error message is also DRY now — single source of truth instead of two near-identical strings. CodeRabbit confirmed in its previous reply that "All actionable items across all four review rounds are fully resolved" — these two minor items from claude-review of round 3 are the only remaining cleanup. Verified: build clean, ProcessManager tests still 44/44. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): address PR #1645 review feedback (round 5) Round 5 of Claude Code Review feedback on PR #1645: src/services/worker-spawner.ts: drop `export` from internal helpers `shouldSkipSpawnOnWindows`, `markWorkerSpawnAttempted`, and `clearWorkerSpawnAttempted` were exported even though they were private in worker-service.ts and nothing outside this module needs them. Removing the `export` keyword keeps the public surface to just `ensureWorkerStarted` and prevents future callers from bypassing the spawn lifecycle. scripts/build-hooks.js: broaden guardrail to all bun:* modules Previously the regex only caught `require("bun:sqlite")`, but every module in the `bun:` namespace (bun:ffi, bun:test, etc.) is Bun-only and would crash mcp-server.cjs the same way under Node. Generalized the regex to `require("bun:[a-z][a-z0-9_-]")` so a transitive import of any Bun-only module fails the build instead of shipping a broken bundle. Verified the new regex still trips on bun:sqlite, bun:ffi, bun:test, and correctly ignores string-literal mentions in error messages. src/servers/mcp-server.ts: attribute root cause when dirname resolution fails Previously, if `__dirname`/`import.meta.url` resolution failed and we fell back to `process.cwd()`, the user would see two warnings: an error about the dirname fallback AND a separate warning about the missing worker bundle. 
The second warning hides the root cause — someone debugging would assume the install is broken when really it's a dirname-resolution failure. Track the failure with a flag and emit a single root-cause-attributing log line in the existence-check branch instead. The dirname fallback paths are still functionally unreachable in CJS deployment; this just makes the failure mode unmistakable if it ever does fire. Out of scope (consistent with prior rounds): - darwin/linux split for non-Windows candidate paths (benign today) - Integration test for non-existent workerScriptPath (test coverage gap deferred since rounds 1-2) - Defer existsSync check to first ensureWorkerStarted call (current module-init check is the loud signal we want) Already addressed in earlier rounds: - resolveWorkerRuntimePath() called twice in spawnDaemon → hoisted in round 4 (`b2c114b4`) - _originalLog dead code → removed in round 2 (`7a96b3b9`) Verified: build clean, broadened guardrail trips on bun:sqlite, bun:ffi, and bun:test (and ignores string literals), MCP server serves the 7-tool surface, ProcessManager tests still 44/44. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> fix(mcp): address PR #1645 review feedback (round 6) Round 6 of Claude Code Review feedback on PR #1645: src/services/worker-spawner.ts: validate workerScriptPath at entry Add an empty-string + existsSync guard at the top of ensureWorkerStarted. Without this, a partial install or upstream path-resolution regression just surfaces as a low-signal child_process error from spawnDaemon. The explicit log line at the entry point makes that class of bug much easier to diagnose. The mcp-server.ts module-init existsSync check already covers this for the MCP-server caller, but defending at the spawner level reinforces the contract for any future caller. 
src/services/worker-spawner.ts: document SettingsDefaultsManager dependency boundary in the module header The spawner imports from SettingsDefaultsManager, ProcessManager, and HealthMonitor. None of those currently touch bun:sqlite, but if any of them ever does, the spawner's SQLite-free contract silently breaks. The build guardrail in build-hooks.js is the only thing that catches it. Header comment now flags this so future contributors audit transitive imports when adding helpers from the shared/infrastructure layers. src/services/infrastructure/ProcessManager.ts: add /snap/bin/bun Ubuntu Snap install path. Now alongside the existing apt path (/usr/bin/bun) and Homebrew/Linuxbrew paths. The PATH lookup catches it as fallback, but listing it explicitly avoids paying for an execSync('which bun') in the common case. src/servers/mcp-server.ts: elevate missing-bundle log warn → error A missing worker-service.cjs means EVERY MCP tool call that needs the worker silently fails. That's a broken-install state, not a transient condition — match the severity of the dirname-fallback branch above (which is already ERROR). Out of scope (consistent with prior rounds, reviewer agrees these are appropriately deferred): - Streaming bundle read in build-hooks.js (nit at current 384KB size) - Unit tests for ensureWorkerStarted / cooldown helpers - Integration test for non-existent workerScriptPath Verified: build clean, broadened guardrail still trips on bun:* imports and ignores string literals, MCP server serves the 7-tool surface, ProcessManager tests still 44/44. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): defer WORKER_SCRIPT_PATH check to first call (round 7) Round 7 of Claude Code Review feedback on PR #1645: src/servers/mcp-server.ts: extract module-level existsSync check into checkWorkerScriptPath() and call it lazily from ensureWorkerConnection() instead of at module load. 
The early-warning intent is preserved (the check still fires before any actual spawn attempt), but tests/tools that import this module without booting the MCP server no longer see noisy ERROR-level log lines for a worker bundle they never intended to start. The check is cheap and idempotent, so calling it on every auto-start attempt is fine. The two failure-mode branches (dirname-resolution failure vs simple missing-bundle) remain unchanged — the function body is identical to the previous module-level if-block, just hoisted into a function and called from ensureWorkerConnection(). False positive (no change needed): - Reviewer flagged `mkdirSync` as a dead import in worker-spawner.ts, but it IS used at line 71 in markWorkerSpawnAttempted (the round-1 ENOENT fix CodeRabbit explicitly asked for). Out of scope: - Volta path (~/.volta/bin/bun) — PATH fallback handles it; nit per reviewer - worker-spawner.ts unit tests — needs injectable I/O, deferred consistently since round 1 Verified: build clean, tests 44/44, smoke test 7-tool surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): address PR #1645 review feedback (round 8) Round 8 of Claude Code Review feedback on PR #1645: tests/services/worker-spawner.test.ts: NEW FILE — unit tests for the ensureWorkerStarted entry-point validation guards added in round 6. Covers the empty-string and non-existent-path cases without requiring the broader injectable-I/O refactor that the deeper spawn lifecycle tests would need. 2 new passing tests. src/services/infrastructure/ProcessManager.ts: memoize resolveWorkerRuntimePath() for the no-options call site (which is what spawnDaemon uses). Caches both successful resolutions and the not-found result so repeated spawn attempts (crash loops, health thrashing) don't repeatedly hit statSync on candidate paths. Tests that pass options bypass the cache entirely so existing test cases remain deterministic. 
Added resetWorkerRuntimePathCache() exported for test isolation only. src/servers/mcp-server.ts: rename checkWorkerScriptPath() → warnIfWorkerScriptMissing(). Per reviewer: the old name implied a boolean check but the function returns void and has side effects. New name is more accurate. DEFENDED (no change made): - Reviewer asked to elevate process.cwd() fallback to a synchronous throw at module load. This conflicts with round 7 feedback which asked to defer the existsSync check to first call to avoid noisy test logs. The current lazy approach is the right compromise: it fires before any actual spawn attempt, attributes the root cause, and doesn't pollute test imports. Throwing at module load would crash before stdio is wired up, which is much harder to debug than the lazy log line. - Reviewer asked to grep for `if (!pid)` callsites — already verified in round 3, zero offenders in src/. Out of scope: - Volta path (~/.volta/bin/bun) — PATH fallback handles it; reviewer marked as nit - Deeper unit tests for ensureWorkerStarted spawn lifecycle (PID file cleanup, health checks, etc.) — needs injectable I/O, deferred consistently since round 1 Verified: build clean, ProcessManager tests still 44/44, new worker-spawner tests 2/2, smoke test serves 7 tools. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(spawner): clear Windows cooldown marker on all healthy paths (round 9) Round 9 of PR #1645 review feedback. src/services/worker-spawner.ts: clear stale Windows cooldown marker on every healthy-return path Per CodeRabbit (genuine bug): The .worker-start-attempted marker was previously only cleared after a spawn initiated by ensureWorkerStarted itself succeeded. If a previous auto-start failed, then the worker became healthy via another session or a manual start, the early-return success branches (existing live PID, fast-path health check, port-in-use waitForHealth) would leave the stale marker behind. 
A subsequent genuine outage inside the 2-minute cooldown window would then be incorrectly suppressed on Windows. Now calls clearWorkerSpawnAttempted() on all three healthy success paths in addition to the existing post-spawn path. The function is already a no-op on non-Windows, so the change is risk-free for Linux and macOS callers. src/servers/mcp-server.ts: more actionable error when auto-start fails Per claude-review: when ensureWorkerStarted returns false (or throws), the caller currently logs a generic "Worker auto-start failed" line. Updated both error sites to explicitly call out which MCP tools will fail (search/timeline/get_observations) and to point at earlier log lines for the specific cause. Helps users distinguish "worker is just not running" from "tools are broken". DEFENDED (no change): - Sentinel object for Windows spawnDaemon 0 PID — broader API change, out of scope, deferred consistently since round 1 - Spawner lifecycle tests beyond input validation — needs injectable I/O, deferred consistently - Concurrent cooldown marker race on Windows — pre-existing, out of scope - stripHardcodedDirname() regex fragility assertion — pre-existing, out of scope Verified: build clean, ProcessManager tests 44/44, worker-spawner tests 2/2, smoke test 7-tool surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(spawner): don't cache null Bun-not-found result (round 10) Round 10 of PR #1645 review feedback. src/services/infrastructure/ProcessManager.ts: only cache successful resolveWorkerRuntimePath() results Genuine bug from claude-review: the round-8 memoization cached BOTH successful resolutions AND the not-found `null` result. 
If Bun isn't on PATH at the moment the MCP server first tries to spawn the worker — e.g., on a fresh install where the user installs Bun in another terminal and retries — every subsequent ensureWorkerConnection call would return the cached `null` and fail with a misleading "Bun not found" error even though Bun is now available. The fix is the one-line change the reviewer suggested: only cache when `result !== null`. Crash loops still get the fast-path memoized success; recovery from a fresh-install Bun install still works. src/servers/mcp-server.ts: rename warnIfWorkerScriptMissing → errorIfWorkerScriptMissing Per claude-review: the function uses logger.error but the name says "warn" — name/level mismatch. Renamed to match. The function still serves the same purpose (defensive lazy check), just with an accurate name. DEFENDED (no change): - Discriminated union for mcpServerDirResolutionFailed flag — current approach works, the noise is minimal, and the alternative would add type complexity for a path that's functionally unreachable in CJS deployment - macOS /usr/local/bin/bun "missing" — already in the Linux/macOS candidate list at line 137 (false positive from reviewer) - nix store path — out of scope, PATH fallback handles it - Long build-hooks.js error message — verbosity is intentional, this message only fires on a real regression and the diagnostic value is worth the line wrap - Spawner lifecycle test coverage gap — needs injectable I/O, deferred consistently Verified: build clean, ProcessManager tests 44/44, worker-spawner tests 2/2, smoke test 7-tool surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): bundle size budget guardrail (round 11) Round 11 of PR #1645 review feedback. 
scripts/build-hooks.js: secondary bundle-size budget guardrail Per claude-review: the existing `require("bun:")` regex catches the specific regression class we already know about, but if esbuild ever changes how it emits external module specifiers, the regex could silently miss the regression. A bundle-size budget catches the structural symptom (worker-service.ts dragged into the bundle blew the size from ~358KB to ~1.96MB) regardless of how the imports look. Set the ceiling at 600KB. Current size is ~384KB; the broken v12.0.0 bundle was ~1920KB. Plenty of headroom for legitimate growth without incentivizing bundle bloat or false positives. Both guardrails fire independently — one is regex-based, one is size-based — so a regression has to defeat both to ship. tests/services/worker-spawner.test.ts: comment about port irrelevance Per claude-review: the hardcoded port values in the validation-guard tests are arbitrary because the path validation short-circuits before any network I/O. Added a comment explaining this so future readers don't waste time wondering why specific ports were picked. DEFENDED (no change): - clearWorkerSpawnAttempted on the unhealthy-live-PID return path: reviewer asked to clear the marker here too, but the current behavior is correct. The marker tracks "recently attempted a spawn" and exists to prevent rapid PowerShell-popup loops. If a wedged process is currently using the port, the spawn isn't actually happening on this code path (the helper returns false without reaching the spawn step). When the wedged process eventually dies and a subsequent call hits the spawn path, the marker correctly suppresses repeated retry attempts within the 2-minute cooldown. Clearing the marker on the unhealthy-return path would defeat exactly the popup-loop protection the marker exists to provide. - execSync in lookupBinaryInPath blocks event loop: pre-existing concern, not introduced by this PR. Reviewer notes "fires once, result cached". 
Not in scope for a hotfix. - Tracking issue for spawner lifecycle test gap: out of scope for this PR; the gap is documented in the test file's header comment with a back-reference to PR #1645. Verified: build clean, both guardrails functional (size budget is under the new ceiling), ProcessManager tests 44/44, worker-spawner tests 2/2, smoke test 7-tool surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> fix(mcp): eliminate double error log when worker bundle is missing (round 12) Round 12 of PR #1645 review feedback. src/servers/mcp-server.ts: errorIfWorkerScriptMissing() now only logs when the dirname-fallback attribution path is needed Previously a missing worker-service.cjs would produce two ERROR log lines on the same code path: 1. errorIfWorkerScriptMissing() in ensureWorkerConnection() 2. The existsSync guard inside ensureWorkerStarted() The simple "missing bundle" case is fully covered by the spawner's own existsSync guard. The mcp-server.ts function now ONLY logs when mcpServerDirResolutionFailed is true — that's the mcp-server-specific root-cause attribution that the spawner cannot provide on its own. Net effect: same single error log per bug class, cleaner triage. DEFENDED (no change): - mkdirSync error propagation in markWorkerSpawnAttempted: reviewer worried that mkdirSync/writeFileSync exceptions could escape, but the entire body is already wrapped in try/catch with an APPROVED OVERRIDE annotation. False positive. - clearWorkerSpawnAttempted on healthy paths: reviewer asked a clarifying question, not a change request. The behavior is intentional — the cooldown marker exists to prevent rapid PowerShell-popup loops from a series of failed spawns; a healthy worker means the marker has served its purpose and a future outage should NOT be suppressed. Will explain in PR reply. 
- __filename ESM concern in worker-service.ts wrapper: already documented in round 4 with an extended comment about the CJS bundle context and why mcp-server.ts can't use the same trick. - Spawn lifecycle integration tests: deferred consistently since round 1; gap is documented in worker-spawner.test.ts header. Verified: build clean, ProcessManager tests 44/44, worker-spawner tests 2/2, smoke test 7-tool surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(spawner): add bare-command BUN env override coverage Final round of PR #1645 review feedback: while preparing to merge, I noticed CodeRabbit's round-5 CHANGES_REQUESTED review on commit `3570d2f0` included an unaddressed nitpick — the env-driven bare-command branch in resolveWorkerRuntimePath() (returning a bare 'bun' unchanged when BUN or BUN_PATH is set that way) had no test coverage and could regress without any failing assertion. Added a focused test that exercises the env: { BUN: 'bun' } branch specifically. 47/47 tests pass (was 46/46). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:08:36 -07:00
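The round-8 and round-10 caching rules for resolveWorkerRuntimePath() described in the commit above (memoize the no-options call, let option-passing calls bypass the cache, and never cache a not-found `null`) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ProcessManager.ts code: `createRuntimeResolver`, `RuntimeLookupOptions`, and the injected `lookup` function are hypothetical stand-ins for the real well-known-path checks and PATH fallback.

```typescript
// Hypothetical sketch, not the real ProcessManager.ts API. `lookup` stands in
// for the well-known install-path checks plus the PATH fallback.
type RuntimeLookup = () => string | null;

// Any options object bypasses the cache (mirrors "tests that pass options
// bypass the cache entirely"). The field is illustrative only.
interface RuntimeLookupOptions {
  reason?: string;
}

interface RuntimeResolver {
  resolve(options?: RuntimeLookupOptions): string | null;
  resetCache(): void; // test-isolation hook, analogous to resetWorkerRuntimePathCache()
}

function createRuntimeResolver(lookup: RuntimeLookup): RuntimeResolver {
  let cached: string | null = null;
  return {
    resolve(options?: RuntimeLookupOptions): string | null {
      if (options !== undefined) {
        // Option-driven calls neither read nor write the cache.
        return lookup();
      }
      if (cached !== null) {
        return cached; // cheap repeat answer during crash loops
      }
      const result = lookup();
      // Cache successes only: a cached null would keep reporting
      // "Bun not found" after the user installs Bun and retries.
      if (result !== null) {
        cached = result;
      }
      return result;
    },
    resetCache(): void {
      cached = null;
    },
  };
}

// Usage: Bun is missing on the first call and installed before the second.
let bunInstalled = false;
let lookups = 0;
const resolver = createRuntimeResolver(() => {
  lookups++;
  return bunInstalled ? "/usr/local/bin/bun" : null;
});
console.log(resolver.resolve()); // null, and nothing is cached
bunInstalled = true;
console.log(resolver.resolve()); // "/usr/local/bin/bun" (re-probed)
console.log(resolver.resolve(), lookups); // served from cache; lookups is 2
```

Keeping the "no options means cached" rule on a single entry point lets production callers get the fast path while tests stay deterministic by always passing options.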
Ousama Ben Younes	64062ac761	fix: address CodeRabbit review — remove side-effectful test import and normalize timeline depth params - tests/servers/mcp-tool-schemas.test.ts: remove `import '../../src/servers/mcp-server.js'` which triggered server startup side effects; test only needs to read the TS source as text - src/services/worker/SearchManager.ts: add Number() coercion for depth_before/depth_after in timeline(), getContextTimeline(), getTimelineByQuery() — HTTP query strings deliver these as strings, coercion ensures they are always numbers before being passed to filterByDepth() and getTimelineAround*() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 17:30:59 +00:00
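The depth-parameter fix in the commit above exists because HTTP query strings deliver every value as a string. The sketch below shows the failure mode (`+` concatenates a string instead of adding it) and a Number()-based normalization. `normalizeDepth`, its fallback argument, and the `anchorIndex` arithmetic are hypothetical illustrations; the commit itself only describes applying Number() to depth_before/depth_after before they reach filterByDepth() and getTimelineAround*().

```typescript
// URLSearchParams is a global in Node.js and Bun; every value it yields is a string.
const params = new URLSearchParams("anchor=120&depth_before=3&depth_after=5");
const rawAfter = params.get("depth_after") ?? ""; // "5", a string

// Hypothetical position of the anchor in an ordered list of observations.
const anchorIndex = 10;

// Without coercion, `+` concatenates instead of adding: 10 + "5" is "105".
const wrongEnd = anchorIndex + rawAfter;

// Hypothetical helper (not from the codebase): coerce with Number() and
// fall back when the value is missing or not a finite number.
function normalizeDepth(value: string | number | null | undefined, fallback: number): number {
  if (value === null || value === undefined || value === "") return fallback;
  const n = Number(value);
  return Number.isFinite(n) ? n : fallback;
}

const depthBefore = normalizeDepth(params.get("depth_before"), 0); // 3
const depthAfter = normalizeDepth(rawAfter, 0); // 5
const rightEnd = anchorIndex + depthAfter; // 15
console.log(wrongEnd, depthBefore, rightEnd);
```

Subtraction (`anchorIndex - "3"`) would happen to coerce correctly, which is why this class of bug tends to surface only on the addition side of a window calculation.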
Ousama Ben Younes	8cdabe6315	fix: declare inputSchema properties for search and timeline MCP tools (#1384 #1413) Both tools had properties:{} which prevents MCP clients from exposing params to the LLM, causing every call to send {} and get a 500 error ("Either query or filters required for search"). - search: declare query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy - timeline: declare anchor, query, depth_before, depth_after, project - Add 3 schema regression tests (static source validation) Closes #1384 Closes #1413 Co-Authored-By: Claude <noreply@anthropic.com>	2026-04-07 17:21:19 +00:00
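With `properties: {}`, an MCP client has no parameters to offer the model, so every call in the commit above arrived as `{}`. A minimal sketch of the declared schemas follows, using the MCP tool-definition shape (`name`, optional `description`, and `inputSchema` as a JSON Schema object). The property names come from the commit; the `ToolDefinition` interface, the JSON Schema types chosen for each property, and the `hasDeclaredParams` check are assumptions for illustration, not the project's actual definitions or its static-source tests.

```typescript
// Hypothetical typing of an MCP tool definition; inputSchema is plain JSON Schema.
interface JsonSchemaProperty {
  type: "string" | "number" | "integer";
  description?: string;
}

interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties: Record<string, JsonSchemaProperty>;
    required?: string[];
  };
}

// Property names are taken from the commit; the types are assumptions.
const searchTool: ToolDefinition = {
  name: "search",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string" },
      limit: { type: "integer" },
      project: { type: "string" },
      type: { type: "string" },
      obs_type: { type: "string" },
      dateStart: { type: "string", description: "assumed date string" },
      dateEnd: { type: "string", description: "assumed date string" },
      offset: { type: "integer" },
      orderBy: { type: "string" },
    },
  },
};

const timelineTool: ToolDefinition = {
  name: "timeline",
  inputSchema: {
    type: "object",
    properties: {
      anchor: { type: "string" }, // anchor type is an assumption
      query: { type: "string" },
      depth_before: { type: "integer" },
      depth_after: { type: "integer" },
      project: { type: "string" },
    },
  },
};

// Runtime check in the spirit of the commit's regression tests: an empty
// properties object is exactly the bug being guarded against.
function hasDeclaredParams(tool: ToolDefinition, expected: string[]): boolean {
  const declared = Object.keys(tool.inputSchema.properties);
  return declared.length > 0 && expected.every((key) => declared.includes(key));
}
```

Leaving `required` unset matches the commit's error message, which suggests the server accepts either a query or filters rather than one mandatory field.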
Alex Newman	6250a194dd	Merge branch 'pr-1472' into integration/validation-batch # Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js # src/cli/handlers/context.ts # src/services/sqlite/SessionStore.ts # src/services/sqlite/migrations/runner.ts # src/services/worker-service.ts # src/shared/SettingsDefaultsManager.ts	2026-04-06 14:23:18 -07:00
Alex Newman	58fcd85724	Merge branch 'pr-1578' into integration/validation-batch # Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/worker-service.cjs # src/utils/tag-stripping.ts	2026-04-06 14:21:45 -07:00
Alex Newman	c1a3fc27ec	Merge branch 'pr-1557' into integration/validation-batch # Conflicts: # plugin/hooks/hooks.json # tests/infrastructure/plugin-distribution.test.ts	2026-04-06 14:20:49 -07:00
Alex Newman	d570909bf1	Merge branch 'pr-1491' into integration/validation-batch # Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/shared/hook-constants.ts	2026-04-06 14:20:05 -07:00
Alex Newman	5dd2a6f758	Merge branch 'pr-1553' into integration/validation-batch # Conflicts: # src/services/worker/session/SessionCompletionHandler.ts	2026-04-06 14:19:50 -07:00
Alex Newman	c3cb8f81ed	Merge branch 'pr-1368' into integration/validation-batch # Conflicts: # plugin/scripts/context-generator.cjs # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js	2026-04-06 14:19:23 -07:00
Alex Newman	5cffff7d40	Merge branch 'pr-1620' into integration/validation-batch	2026-04-06 14:19:02 -07:00
Alex Newman	d63d73acc2	Merge branch 'pr-1524' into integration/validation-batch	2026-04-06 14:19:01 -07:00
Alex Newman	842d614adb	Merge branch 'pr-1550' into integration/validation-batch	2026-04-06 14:18:28 -07:00
Alex Newman	4d2bb1f13e	Merge branch 'pr-1441' into integration/validation-batch	2026-04-06 14:18:28 -07:00
Alex Newman	a9de029c02	Merge branch 'pr-1549' into integration/validation-batch	2026-04-06 14:18:28 -07:00

152 Commits