* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)
- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)
- ResponseProcessor: mark messages as failed (with retry) instead of confirming
when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: cache isFts5Available() at construction time
Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)
- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
only counts restarts within 60s window (cap=10), decays after 5min of
success. Prevents stranding pending messages on long-running sessions. (#2053)
- Add idle session eviction to pool slot allocation: when all slots are full,
evict the idlest session (no pending work, oldest activity) to free a slot
for new requests, preventing 60s timeout deadlock. (#1868)
- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
which fails on non-interactive PATH. Fix crash misclassification by removing
false "Generator exited unexpectedly" error log on normal completion. (#1876)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)
- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
code 2 (blocking error) on network failures or malformed responses.
Session exit no longer blocks on worker errors. (#1901)
- Add health-check wait loop to UserPromptSubmit session-init command in
hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
before SessionStart, session-init now waits up to 10s for worker health
before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)
- Close#1896 as already-fixed: mtime comparison at file-context.ts:255-267
bypasses truncation when file is newer than latest observation.
- Close#1903 as no-repro: hooks.json correctly declares all hook events.
Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)
- Add bearer token auth to all API endpoints: auto-generated 32-byte
token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
MCP, viewer, and OpenCode requests include Authorization header.
Health/readiness endpoints exempt for polling. (#1932, #1933)
- Add path traversal protection: watch.context.path validated against
project root and ~/.claude-mem/ before write. Rejects ../../../etc
style attacks. (#1934)
- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
(300 req/min/IP) to prevent abuse. (#1935)
- Derive default worker port from UID (37700 + uid%100) to prevent
cross-user data leakage on multi-user macOS. Windows falls back to
37777. Shell hooks use same formula via id -u. (#1936)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)
- Fix per-type search endpoints to pass project filter to Chroma queries
and SQLite hydration. searchObservations/Sessions/UserPrompts now use
$or clause matching project + merged_into_project. (#1912)
- Fix timeline/search methods to pass project to Chroma anchor queries.
Prevents cross-project result leakage when project param omitted. (#1911)
- Sync imported observations to ChromaDB after FTS rebuild. Import
endpoint now calls chromaSync.syncObservation() for each imported
row, making them visible to MCP search(). (#1914)
- Fix session-init cwd fallback to match context.ts (process.cwd()).
Prevents project key mismatch that caused "no previous sessions"
on fresh sessions. (#1918)
- Fix sync-marketplace restart to include auth token and per-user port.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve all CodeRabbit and Greptile review comments on PR #2080
- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve CodeRabbit round-2 review comments
- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve CodeRabbit round-3 review comments
- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: remove bearer auth and platform_source from context inject
Bearer token auth (#1932/#1933) added friction for all localhost API
clients with no benefit — the worker already binds localhost-only (CORS
restriction + host binding). Removed auth-token module, requireAuth
middleware, and Authorization headers from all internal callers.
platform_source filtering from the /api/context/inject path was never
used by any caller and silently filtered out observations. The underlying
platform_source column stays; only the query-time filter and its plumbing
through ContextBuilder, ObservationCompiler, SearchRoutes, context.ts,
and transcripts/processor.ts are removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: resolve CodeRabbit + Greptile + claude-review comments on PR #2081
- middleware.ts: drop 'Authorization' from CORS allowedHeaders (Greptile)
- middleware.ts: rate limiter falls back to req.socket.remoteAddress; add Retry-After on 429 (claude-review)
- SearchRoutes.ts: drop leftover platformSource read+pass in handleContextPreview (Greptile)
- .docker-blowout-data/: stop tracking the empty SQLite placeholder and gitignore the dir (claude-review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: tighten rate limiter — correct boundary + drop dead cleanup branch
- `entry.count >= RATE_LIMIT_MAX_REQUESTS` so the 300th request is the
first rejected (was 301).
- Removed the `requestCounts.size > 100` lazy-cleanup block — on a
localhost-only server the map tops out at 1–2 entries, so the branch
was dead code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: rate limiter correctly allows exactly 300 req/min; doc localhost scope
- Check `entry.count >= max` BEFORE incrementing so the cap matches the
comment: 300 requests pass, the 301st gets 429.
- Added a comment noting the limiter is effectively a global cap on a
localhost-only worker (all callers share the 127.0.0.1/::1 bucket).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: normalise IPv4-mapped IPv6 in rate limiter client IP
Strip the `::ffff:` prefix so a localhost caller routed as
`::ffff:127.0.0.1` shares a bucket with `127.0.0.1`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: size-guarded prune of rate limiter map for non-localhost deploys
Prune expired entries only when the map exceeds 1000 keys and we're
already doing a window reset, so the cost is zero on the localhost hot
path (1–2 keys) and the map can't grow unbounded if the worker is ever
bound on a non-loopback interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the 300 req/min rate limiter from the worker's HTTP middleware.
The worker is localhost-only (enforced via CORS), so rate limiting was
pointless security theater — but it broke the viewer, which polls logs
and stats frequently enough to trip the limit within seconds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SessionStart context injection regressed in v12.3.3 — no memory
context is being delivered to new sessions. Rolling back to the
v12.3.2 tree state while the regression is investigated.
Reverts #2080.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)
- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)
- ResponseProcessor: mark messages as failed (with retry) instead of confirming
when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: cache isFts5Available() at construction time
Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)
- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
only counts restarts within 60s window (cap=10), decays after 5min of
success. Prevents stranding pending messages on long-running sessions. (#2053)
- Add idle session eviction to pool slot allocation: when all slots are full,
evict the idlest session (no pending work, oldest activity) to free a slot
for new requests, preventing 60s timeout deadlock. (#1868)
- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
which fails on non-interactive PATH. Fix crash misclassification by removing
false "Generator exited unexpectedly" error log on normal completion. (#1876)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)
- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
code 2 (blocking error) on network failures or malformed responses.
Session exit no longer blocks on worker errors. (#1901)
- Add health-check wait loop to UserPromptSubmit session-init command in
hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
before SessionStart, session-init now waits up to 10s for worker health
before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)
- Close#1896 as already-fixed: mtime comparison at file-context.ts:255-267
bypasses truncation when file is newer than latest observation.
- Close#1903 as no-repro: hooks.json correctly declares all hook events.
Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)
- Add bearer token auth to all API endpoints: auto-generated 32-byte
token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
MCP, viewer, and OpenCode requests include Authorization header.
Health/readiness endpoints exempt for polling. (#1932, #1933)
- Add path traversal protection: watch.context.path validated against
project root and ~/.claude-mem/ before write. Rejects ../../../etc
style attacks. (#1934)
- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
(300 req/min/IP) to prevent abuse. (#1935)
- Derive default worker port from UID (37700 + uid%100) to prevent
cross-user data leakage on multi-user macOS. Windows falls back to
37777. Shell hooks use same formula via id -u. (#1936)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)
- Fix per-type search endpoints to pass project filter to Chroma queries
and SQLite hydration. searchObservations/Sessions/UserPrompts now use
$or clause matching project + merged_into_project. (#1912)
- Fix timeline/search methods to pass project to Chroma anchor queries.
Prevents cross-project result leakage when project param omitted. (#1911)
- Sync imported observations to ChromaDB after FTS rebuild. Import
endpoint now calls chromaSync.syncObservation() for each imported
row, making them visible to MCP search(). (#1914)
- Fix session-init cwd fallback to match context.ts (process.cwd()).
Prevents project key mismatch that caused "no previous sessions"
on fresh sessions. (#1918)
- Fix sync-marketplace restart to include auth token and per-user port.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve all CodeRabbit and Greptile review comments on PR #2080
- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve CodeRabbit round-2 review comments
- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve CodeRabbit round-3 review comments
- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)
- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)
- ResponseProcessor: mark messages as failed (with retry) instead of confirming
when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: cache isFts5Available() at construction time
Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The extracted helper methods (handleInitResponse, processObservationMessage,
processSummaryMessage) lost the conversationHistory.push calls for assistant
replies, breaking multi-turn context for queryOpenRouterMultiTurn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spurious console.error in logger JSON.parse catch (expected control flow)
- Remove debug logging from hot PID cleanup loop (approved override)
- Replace unsafe `error as Error` casts with instanceof checks in ChromaSync, GeminiAgent, OpenRouterAgent
- Wrap non-Error FTS failures with new Error(String()) instead of dropping details
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(evals): SWE-bench Docker scaffolding for claude-mem resolve-rate measurement
Adds evals/swebench/ scaffolding per .claude/plans/swebench-claude-mem-docker.md.
Agent image builds Claude Code 2.1.114 + locally-built claude-mem plugin;
run-instance.sh executes the two-turn ingest/fix protocol per instance;
run-batch.py orchestrates parallel Docker runs with per-instance isolation;
eval.sh wraps the upstream SWE-bench harness; summarize.py aggregates reports.
Orchestrator owns JSONL writes under a lock to avoid racy concurrent appends;
agent writes its authoritative diff to CLAUDE_MEM_OUTPUT_DIR (/scratch in
container mode) and the orchestrator reads it back. Scaffolding only — no
Docker build or smoke test run yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(evals): OAuth credential mounting for Claude Max/Pro subscriptions
Skips per-call API billing by extracting OAuth creds from host Keychain
(macOS) or ~/.claude/.credentials.json (Linux) and bind-mounting them
read-only into each agent container. Creds are copied into HOME=$SCRATCH/.claude
at container start so the per-instance isolation model still holds.
Adds run-batch.py --auth {oauth,api-key,auto} (auto prefers OAuth, falls
back to API key). run-instance.sh accepts either ANTHROPIC_API_KEY or
CLAUDE_MEM_CREDENTIALS_FILE. smoke-test.sh runs one instance end-to-end
using OAuth for quick verification before batch runs.
Caveat surfaced in docstrings: Max/Pro has per-window usage limits and is
framed for individual developer use — batch evaluation may exhaust the
quota or raise compliance questions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(docker): basic claude-mem container for ad-hoc testing
Adds docker/claude-mem/ with a fresh spin-up image:
- Dockerfile: FROM node:20 (reproduces anthropics/claude-code .devcontainer
pattern — Anthropic ships the Dockerfile, not a pullable image); layers
Bun + uv + locally-built plugin/; runs as non-root node user
- entrypoint.sh: seeds OAuth creds from CLAUDE_MEM_CREDENTIALS_FILE into
$HOME/.claude/.credentials.json, then exec's the command (default: bash)
- build.sh: npm run build + docker build
- run.sh: interactive launcher; auto-extracts OAuth from macOS Keychain
(security find-generic-password) or ~/.claude/.credentials.json on Linux,
mounts host .docker-claude-mem-data/ at /home/node/.claude-mem so the
observations DB survives container exit
Validated end-to-end: PostToolUse hook fires, queue enqueues, worker's SDK
compression runs under subscription OAuth, observations row lands with
populated facts/concepts/files_read, Chroma sync triggers.
Also updates .gitignore/.dockerignore for the new runtime-output paths.
Built plugin artifacts refreshed by the build step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(evals/swebench): non-root user, OAuth mount, Lite dataset default
- Dockerfile.agent: switch to non-root \`node\` user (uid 1000); Claude Code
refuses --permission-mode bypassPermissions when euid==0, which made every
agent run exit 1 before producing a diff. Also move Bun + uv installs to
system paths so the non-root user can exec them.
- run-batch.py: add extract_oauth_credentials() that pulls from macOS
Keychain / Linux ~/.claude/.credentials.json into a temp file and bind-
mounts it at /auth/.credentials.json:ro with CLAUDE_MEM_CREDENTIALS_FILE.
New --auth {oauth,api-key,auto} flag. New --dataset flag so the batch can
target SWE-bench_Lite without editing the script.
- smoke-test.sh: default DATASET to princeton-nlp/SWE-bench_Lite (Lite
contains sympy__sympy-24152, Verified does not); accept DATASET env
override.
Caveat surfaced during testing: Max/Pro subscriptions have per-window usage
limits; running 5 instances in parallel with the "read every source file"
ingest prompt exhausted the 5h window within ~25 minutes (3/5 hit HTTP 429).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address PR #2076 review comments
- docker/claude-mem/run.sh: chmod 600 (not 644) on extracted OAuth creds
to match what `claude login` writes; avoids exposing tokens to other
host users. Verified readable inside the container under Docker
Desktop's UID translation.
- docker/claude-mem/Dockerfile: pin Bun + uv via --build-arg BUN_VERSION
/ UV_VERSION (defaults: 1.3.12, 0.11.7). Bun via `bash -s "bun-v<V>"`;
uv via versioned installer URL `https://astral.sh/uv/<V>/install.sh`.
- evals/swebench/smoke-test.sh: pipe JSON through stdin to `python3 -c`
so paths with spaces/special chars can't break shell interpolation.
- evals/swebench/run-batch.py: add --overwrite flag; abort by default
when predictions.jsonl for the run-id already exists, preventing
accidental silent discard of partial results.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address coderabbit review on PR #2076
Actionable (4):
- Dockerfile uv install: wrap `chmod ... || true` in braces so the trailing
`|| true` no longer masks failures from `curl|sh` via bash operator
precedence (&& binds tighter than ||). Applied to both docker/claude-mem/
and evals/swebench/Dockerfile.agent. Added `set -eux` to the RUN lines.
- docker/claude-mem/Dockerfile: drop unused `sudo` apt package (~2 MB).
- run-batch.py: name each agent container (`swebench-agent-<id>-<pid>-<tid>`)
and force-remove via `docker rm -f <name>` in the TimeoutExpired handler
so timed-out runs don't leave orphan containers.
Nitpicks (2):
- smoke-test.sh: collapse 3 python3 invocations into 1 — parse the instance
JSON once, print `repo base_commit`, and write problem.txt in the same
call.
- run-instance.sh: shallow clone via `--depth 1 --no-single-branch` +
`fetch --depth 1 origin $BASE_COMMIT`. Falls back to a full clone if the
server rejects the by-commit fetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address second coderabbit review on PR #2076
Actionable (3):
- docker/claude-mem/run.sh: on macOS, fall back to ~/.claude/.credentials.json
when the Keychain lookup misses (some setups still have file-only creds).
Unified into a single creds_obtained gate so the error surface lists both
sources tried.
- docker/claude-mem/run.sh: drop `exec docker run` — `exec` replaces the shell
so the EXIT trap (`rm -f "$CREDS_FILE"`) never fires and the extracted
OAuth JSON leaks to disk until tmpfs cleanup. Run as a child instead so
the trap runs on exit.
- evals/swebench/smoke-test.sh: actually enforce the TIMEOUT env var. Pick
`timeout` or `gtimeout` (coreutils on macOS), fall back to uncapped with
a warning. Name the container so exit-124 from timeout can `docker rm -f`
it deterministically.
Nitpick from the same review (consolidated python3 calls in smoke-test.sh)
was already addressed in the prior commit ef621e00.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address third coderabbit review on PR #2076
Actionable (1):
- evals/swebench/smoke-test.sh: the consolidated python heredoc had competing
stdin redirections — `<<'PY'` (script body) AND `< "$INSTANCE_JSON"` (data).
The heredoc won, so `json.load(sys.stdin)` saw an empty stream and the parse
would have failed at runtime. Pass INSTANCE_JSON as argv[2] and `open()` it
inside the script instead; the heredoc is now only the script body, which
is what `python3 -` needs.
Nitpicks (2):
- evals/swebench/smoke-test.sh: macOS Keychain lookup now falls through to
~/.claude/.credentials.json on miss (matches docker/claude-mem/run.sh).
- evals/swebench/run-batch.py: extract_oauth_credentials() no longer
early-returns on Darwin keychain miss; falls through to the on-disk creds
file so macOS setups with file-only credentials work in batch mode too.
Functional spot-check of the parse fix confirmed: REPO/BASE_COMMIT populated
and problem.txt written from a synthetic INSTANCE_JSON.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: gitignore runtime state files
.claude/scheduled_tasks.lock is a PID+sessionId lock written by Claude
Code's cron scheduler every session. It got accidentally checked in during
the v12.0.0 bump and has been churning phantom diffs in every PR since.
Untrack it and ignore.
plugin/.cli-installed is a timestamp marker the claude-mem installer drops
to record when the plugin was installed. Never belonged in version control.
Ignore it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: add trailing newline to .gitignore
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parseSummary runs on every agent response, not just summary turns. When the
turn is a normal observation, the LLM correctly emits <observation> and no
<summary> — but the fallthrough branch from #1345 treated this as prompt
misbehavior and logged "prompt conditioning may need strengthening" every
time. That assumption stopped holding after #1633 refactored the caller to
always invoke parseSummary with a coerceFromObservation flag.
Gate the whole observation-on-summary path on coerceFromObservation. On a
real summary turn, coercion still runs and logs the legitimate "coercion
failed" warning when the response has no usable content. On an observation
turn, parseSummary returns null silently, which is the correct behavior.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: disable subagent summaries and label subagent observations
Detect Claude Code subagent hook context via `agent_id`/`agent_type` on
stdin, short-circuit the Stop-hook summary path when present, and thread
the subagent identity end-to-end onto observation rows (new `agent_type`
and `agent_id` columns, migration 010 at version 27). Main-session rows
remain NULL; content-hash dedup is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address PR #2073 review feedback
- Narrow summarize subagent guard to agentId only so --agent-started
main sessions still own their summary (agentType alone is main-session).
- Remove now-dead agentId/agentType spreads from the summarize POST body.
- Always overwrite pendingAgentId/pendingAgentType in SDK/Gemini/OpenRouter
agents (clears stale subagent identity on main-session messages after
a subagent message in the same batch).
- Add idx_observations_agent_id index in migration 010 + the mirror
migration in SessionStore + the runner.
- Replace console.log in migration010 with logger.debug.
- Update summarize test: agentType alone no longer short-circuits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address CodeRabbit + claude-review iteration 4 feedback
- SessionRoutes.handleSummarizeByClaudeId: narrow worker-side guard to
agentId only (matches hook-side). agentType alone = --agent main
session, which still owns its summary.
- ResponseProcessor: wrap storeObservations in try/finally so
pendingAgentId/Type clear even if storage throws. Prevents stale
subagent identity from leaking into the next batch on error.
- SessionStore.importObservation + bulk.importObservation: persist
agent_type/agent_id so backup/import round-trips preserve subagent
attribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* polish: claude-review iteration 5 cleanup
- Use ?? not || for nullable subagent fields in PendingMessageStore
(prevents treating empty string as null).
- Simplify observation.ts body spread — include fields unconditionally;
JSON.stringify drops undefined anyway.
- Narrow any[] to Array<{ name: string }> in migration010 column checks.
- Add trailing newline to migrations.ts.
- Document in observations/store.ts why the dedup hash intentionally
excludes agent fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* polish: claude-review iteration 7 feedback
- claude-code adapter: add 128-char safety cap on agent_id/agent_type
so a malformed Claude Code payload cannot balloon DB rows. Empty
strings now also treated as absent.
- migration010: state-aware debug log lists only columns actually
added; idempotent re-runs log "already present; ensured indexes".
- Add 3 adapter tests covering the length cap boundary and empty-string
rejection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf: skip subagent summary before worker bootstrap
Move the agentId short-circuit above ensureWorkerRunning() so a Stop
hook fired inside a subagent does not trigger worker startup just to
return early. Addresses CodeRabbit nit on summarize.ts:36-47.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Initial plan
* fix: break infinite summary-retry loop (#1633)
Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of <summary>,
coerce observation content into summary fields (root cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session,
skip further attempts after 3 failures to prevent unbounded prompt growth
Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
* refactor: extract shared constants for summary mode marker and failure threshold
Addresses code review feedback: SUMMARY_MODE_MARKER and
MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts
and imported by ResponseProcessor and SessionManager.
Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
* fix: guard summary failure counter on summaryExpected (Greptile P1)
The circuit breaker counter previously incremented on any response
containing <observation> or <summary> tags — which matches virtually
every normal observation response. After 3 observations the breaker
would open and permanently block summarization, reproducing the
data-loss scenario #1633 was meant to prevent.
Gate the increment block on summaryExpected (already computed for
parseSummary coercion) so the counter only tracks actual summary
attempts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: cover circuit-breaker + apply review polish
- Use findLast / at(-1) for last-user-message lookup instead of
filter + index (O(1) common case).
- Drop redundant `|| 0` fallback — field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
* counter NOT incrementing on normal observation responses
(regression guard for the Greptile P1)
* counter incrementing when a summary was expected but missing
* counter resetting to 0 on successful summary storage
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: iterate all observation blocks; don't count skip_summary as failure
Addresses CodeRabbit review on #2072:
- coerceObservationToSummary now iterates all <observation> blocks
with a global regex and returns the first block that has title,
narrative, or facts. Previously, an empty leading observation
would short-circuit and discard populated follow-ups.
- Circuit-breaker counter now treats explicit <skip_summary/> as
neutral — neither a failure nor a success — so a run that happens
to end on a skip doesn't punish the session or mask a prior bad
streak. Real failures (no summary, no skip) still increment.
- Tests added for both cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string
Addresses CodeRabbit nitpick: tests should pull the marker from the
canonical source so they don't silently drift when the constant is
renamed or edited.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: also coerce observations when <summary> has empty sub-tags
When the LLM wraps an empty <summary></summary> around real observation
content, the #1360 empty-subtag guard rejects the summary and returns
null — which would lose the observation content and resurrect the
#1633 retry loop. Fall back to coerceObservationToSummary in that
branch too, mirroring the unmatched-<summary> path.
Adds a test covering the empty-summary-wraps-observation case and
a guard test for empty summary with no observation content.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conductor workspace setup is no longer needed - plugins handle hook
registration directly via plugin/hooks/hooks.json. The shim was copying
a stale settings.local.json into every worktree, registering dead hook
paths (save-hook.js, new-hook.js, summary-hook.js) that no longer exist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Script now reads existing CHANGELOG.md, skips releases already documented,
only fetches bodies for new releases, and prepends them. Pass --full to
force complete regeneration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses unresolved CodeRabbit finding on WorktreeAdoption.ts:296.
Previously, Chroma patch failures stranded rows permanently: adoptedSqliteIds
was built only from rows where merged_into_project IS NULL, so once SQL
committed, reruns couldn't rediscover them for retry.
The Chroma id set is now built from ALL observations whose project matches a
merged worktree — including rows already stamped to this parent. Combined
with the idempotent updateMergedIntoProject, transient Chroma failures
self-heal on the next adoption pass.
SQL writes remain idempotent (UPDATE still guards on merged_into_project IS
NULL), so adoptedObservations / adoptedSummaries continue to count only
newly-adopted rows. chromaUpdates now counts total Chroma writes per pass
(may exceed adoptedObservations when retrying).
Addresses six CodeRabbit/Greptile findings on PR #2052:
- Schema guard in adoptMergedWorktrees probes for merged_into_project
columns before preparing statements; returns early when absent so first
boot after upgrade (pre-migration) doesn't silently fail.
- Startup adoption now iterates distinct cwds from pending_messages and
dedupes via resolveMainRepoPath — the worker daemon runs with
cwd=plugin scripts dir, so process.cwd() fallback was a no-op.
- ObservationCompiler single-project queries (queryObservations /
querySummaries) OR merged_into_project into WHERE so injected context
surfaces adopted worktree rows, matching the Multi variants.
- SessionStore constructor now calls ensureMergedIntoProjectColumns so
bundled artifacts (context-generator.cjs) that embed SessionStore get
the merged_into_project column on DBs that only went through the
bundled migration chain.
- OBSERVER_SESSIONS_PROJECT constant is now derived from
basename(OBSERVER_SESSIONS_DIR) and used across PaginationHelper,
SessionStore, and timeline queries instead of hardcoded strings.
- Corrected misleading Chroma retry docstring in WorktreeAdoption to
match actual behavior (no auto-retry once SQL commits).
Observer sessions (internal SDK-driven worker queries) run under a
synthetic project name 'observer-sessions' to keep them out of
claude --resume. They were still surfacing in the viewer project
picker and unfiltered observation/summary/prompt feeds.
Filter them out at every UI-facing query:
- SessionStore.getAllProjects and getProjectCatalog
- timeline/queries.ts getAllProjects
- PaginationHelper observations/summaries/prompts when no project is selected
When a caller explicitly requests project='observer-sessions',
results are still returned (not a hard ban, just hidden by default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Document --branch override in npx-cli help text
- Guard ContextBuilder against empty projects[] override; fall back to cwd-derived primary
- Ensure merged_into_project indexes are created even if ALTER ran in a prior partial migration
- Reject adopt --branch/--cwd flags with missing or flag-like values
- Use defined --color-border-primary token for merged badge border
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Update allProjects test expectation to match [parent, composite] (matches JSDoc + callers in ContextBuilder/context handlers).
- Replace string-matched __DRY_RUN_ROLLBACK__ sentinel with dedicated DryRunRollback class to avoid swallowing unrelated errors.
- Add 5000ms timeout to spawnSync git calls in WorktreeAdoption and ProcessManager so worker startup can't hang on a stuck git process.
- Drop unreachable break after process.exit(0) in adopt case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regenerated worker-service.cjs, context-generator.cjs, viewer.html, and
viewer-bundle.js to reflect all six implementation phases of the merged-
worktree adoption feature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ObservationCard renders a secondary "merged → <parent>" chip when
merged_into_project is set, next to the existing project label.
Both are meaningful: project is origin provenance, merged_into_project
is the current home.
Extends PaginationHelper's observations and summaries queries with
OR merged_into_project = ? so the single-project viewer fetch pulls
in adopted rows — the plan's Phase 3 covered multi-project context
injection; this is the single-project UI read path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a manual escape hatch for the worktree adoption engine. Covers
squash-merges where git branch --merged HEAD returns nothing, and
lets users re-run adoption on demand.
Wired through worker-service.cjs (same pattern as generate/clean)
so the command runs under Bun with bun:sqlite, keeping npx-cli/
pure Node. --cwd flag passes the user's working directory through
the spawn so the engine resolves the correct parent repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>