claude-mem/docs/PR-SHIPPING-REPORT.md
Alex Newman 7566b8c650 fix: add idle timeout to prevent zombie observer processes (#856)
* fix: add idle timeout to prevent zombie observer processes

Root cause fix for zombie observer accumulation. The SessionQueueProcessor
iterator now exits gracefully after 3 minutes of inactivity instead of
waiting forever for messages.

Changes:
- Add IDLE_TIMEOUT_MS constant (3 minutes)
- waitForMessage() now returns boolean and accepts timeout parameter
- createIterator() tracks lastActivityTime and exits on idle timeout
- Graceful exit via return (not throw) allows SDK to complete cleanly

This addresses the root cause that PR #848 worked around with pattern
matching. Observer processes now self-terminate, preventing accumulation
when session-complete hooks don't fire.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: trigger abort on idle timeout to actually kill subprocess

The previous implementation only returned from the iterator on idle timeout,
but this doesn't terminate the Claude subprocess - it just stops yielding
messages. The subprocess stays alive as a zombie because:

1. Returning from createIterator() ends the generator
2. The SDK closes stdin via transport.endInput()
3. But the subprocess may not exit on stdin EOF
4. No abort signal is sent to kill it

Fix: Add onIdleTimeout callback that SessionManager uses to call
session.abortController.abort(). This sends SIGTERM to the subprocess
via the SDK's ProcessTransport abort handler.

Verified by Codex analysis of the SDK internals:
- abort() triggers ProcessTransport abort handler → SIGTERM
- transport.close() sends SIGTERM → escalates to SIGKILL after 5s
- Just closing stdin is NOT sufficient to guarantee subprocess exit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add idle timeout to prevent zombie observer processes

Also cleaned up hooks.json to remove redundant start commands.
The hook command handler now auto-starts the worker if not running,
which is how it should have been since we changed to auto-start.

This maintenance change was done manually.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve race condition in session queue idle timeout detection

- Reset timer on spurious wakeup when queue is empty but duration check fails
- Use optional chaining for onIdleTimeout callback
- Include threshold value in idle timeout log message for better diagnostics
- Add comprehensive unit tests for SessionQueueProcessor

Fixes PR #856 review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: migrate installer to Setup hook

- Add plugin/scripts/setup.sh for one-time dependency setup
- Add Setup hook to hooks.json (triggers via claude --init)
- Remove smart-install.js from SessionStart hook
- Keep smart-install.js as manual fallback for Windows/auto-install

Setup hook handles:
- Bun detection with fallback locations
- uv detection (optional, for Chroma)
- Version marker to skip redundant installs
- Clear error messages with install instructions

* feat: add np for one-command npm releases

- Add np as dev dependency
- Add release, release:patch, release:minor, release:major scripts
- Add prepublishOnly hook to run build before publish
- Configure np (no yarn, include all contents, run tests)

* fix: reduce PostToolUse hook timeout to 30s

PostToolUse runs on every tool call, 120s was excessive and could cause
hangs. Reduced to 30s for responsive behavior.

* docs: add PR shipping report

Analyzed 6 PRs for shipping readiness:
- #856: Ready to merge (idle timeout fix)
- #700, #722, #657: Have conflicts, need rebase
- #464: Contributor PR, too large (15K+ lines)
- #863: Needs manual review

Includes shipping strategy and conflict resolution order.

* MAESTRO: Verify PR #856 test suite passes

All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor
idle timeout tests all pass with 20 expect() assertions verified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Verify PR #856 build passes

- Ran npm run build successfully with no TypeScript errors
- All artifacts generated (worker-service, mcp-server, context-generator, viewer UI)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Code review PR #856 implementation verified

Verified all requirements in SessionQueueProcessor.ts:
- IDLE_TIMEOUT_MS = 180000ms (3 minutes)
- waitForMessage() accepts timeout parameter
- lastActivityTime reset on spurious wakeup (race condition fix)
- Graceful exit logs include thresholdMs parameter
- 11 comprehensive test cases in SessionQueueProcessor.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com>
Co-authored-by: root <root@srv1317155.hstgr.cloud>
2026-02-04 19:31:24 -05:00


Claude-Mem PR Shipping Report

Generated: 2026-02-04

Executive Summary

6 PRs analyzed for shipping readiness: 1 is ready to merge, 3 have conflicts, 1 needs manual review, and 1 is too large for easy review.

| PR | Title | Status | Recommendation |
|----|-------|--------|----------------|
| #856 | Idle timeout for zombie processes | MERGEABLE | Ship it |
| #700 | Windows Terminal popup fix | ⚠️ Conflicts | Rebase, then ship |
| #722 | In-process worker architecture | ⚠️ Conflicts | Rebase, high impact |
| #657 | generate/clean CLI commands | ⚠️ Conflicts | Rebase, then ship |
| #863 | Ragtime email investigation | 🔍 Needs review | Research pending |
| #464 | Sleep Agent Pipeline (contributor) | 🔴 Too large | Request split or dedicated review |

Ready to Ship

PR #856: Idle Timeout for Zombie Observer Processes

Status: MERGEABLE (no conflicts)

| Metric | Value |
|--------|-------|
| Additions | 928 |
| Deletions | 171 |
| Files | 8 |
| Risk | Low-Medium |

What it does:

  • Adds 3-minute idle timeout to SessionQueueProcessor
  • Prevents zombie observer processes that were causing 13.4GB swap usage
  • Processes exit gracefully after inactivity instead of waiting forever

Why ship it:

  • Fixes real user-reported issue (79 zombie processes)
  • Well-tested (11 new tests, 440 lines of test coverage)
  • Clean implementation, preventive approach
  • Supersedes PR #848's reactive cleanup
  • No conflicts, ready to merge

Review notes:

  • 1 Greptile bot comment (addressed)
  • Race condition fix included
  • Enhanced logging added
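
As a rough illustration of the idle-timeout behavior in #856, the TypeScript sketch below shows one way an async iterator could exit after a period of inactivity. It is not the actual SessionQueueProcessor code: the class `SessionQueueSketch`, its in-memory queue, and all method bodies are assumptions. Only the names `IDLE_TIMEOUT_MS`, `waitForMessage`, `createIterator`, `lastActivityTime`, and `onIdleTimeout` are taken from the commit message.

```typescript
// Hypothetical sketch: names mirror the commit message, bodies are assumptions.
const IDLE_TIMEOUT_MS = 3 * 60 * 1000; // 180000 ms (3 minutes), as stated in the report

class SessionQueueSketch<T> {
  private queue: T[] = [];
  private wake: (() => void) | null = null;

  enqueue(message: T): void {
    this.queue.push(message);
    this.wake?.(); // wake a pending waitForMessage(), if any
  }

  // Resolves true if woken by enqueue(), false if the timeout elapsed first.
  private waitForMessage(timeoutMs: number): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => {
        this.wake = null;
        resolve(false);
      }, timeoutMs);
      this.wake = () => {
        clearTimeout(timer);
        this.wake = null;
        resolve(true);
      };
    });
  }

  async *createIterator(
    onIdleTimeout?: () => void,
    idleTimeoutMs: number = IDLE_TIMEOUT_MS,
  ): AsyncGenerator<T, void, void> {
    let lastActivityTime = Date.now();
    while (true) {
      if (this.queue.length > 0) {
        lastActivityTime = Date.now();
        yield this.queue.shift() as T;
        continue;
      }
      const received = await this.waitForMessage(idleTimeoutMs);
      if (received) continue;

      const idleForMs = Date.now() - lastActivityTime;
      if (idleForMs >= idleTimeoutMs) {
        console.log("idle timeout reached", { idleForMs, thresholdMs: idleTimeoutMs });
        onIdleTimeout?.(); // lets the owner abort the subprocess
        return; // graceful exit via return, not throw
      }
      // Woke up with an empty queue before the threshold elapsed: reset and keep waiting.
      lastActivityTime = Date.now();
    }
  }
}
```

Per the commit message, SessionManager supplies a callback that calls `session.abortController.abort()`, because returning from the iterator alone does not guarantee that the subprocess exits. The timeout is a parameter here only so the sketch can be exercised with a short threshold.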

Needs Rebase (Have Conflicts)

PR #700: Windows Terminal Popup Fix

Status: ⚠️ CONFLICTING

| Metric | Value |
|--------|-------|
| Additions | 187 |
| Deletions | 399 |
| Files | 8 |
| Risk | Medium |

What it does:

  • Eliminates Windows Terminal popup by removing spawn-based daemon
  • Worker start command becomes daemon directly (no child spawn)
  • Removes restart command (users do stop then start)
  • Net simplification: -212 lines

Breaking changes:

  • restart command removed

Review status:

  • 1 APPROVAL from @volkanfirat (Jan 15, 2026)

Action needed: Resolve conflicts, then ready to ship.
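
The net figure quoted for #700 is additions minus deletions. A minimal TypeScript sketch of the same arithmetic for every PR with metrics in this report follows; the `PrMetrics` type and `netChange` helper are illustrative names, and #863 is omitted because the report lists no metrics for it.

```typescript
// PR metrics copied from this report's tables; the type and helper names are hypothetical.
interface PrMetrics {
  pr: number;
  additions: number;
  deletions: number;
}

const metrics: PrMetrics[] = [
  { pr: 856, additions: 928, deletions: 171 },
  { pr: 700, additions: 187, deletions: 399 },
  { pr: 722, additions: 869, deletions: 4658 },
  { pr: 657, additions: 1184, deletions: 5057 },
  { pr: 464, additions: 15430, deletions: 469 },
];

// Net change = additions - deletions; a negative value means the PR shrinks the codebase.
function netChange(m: PrMetrics): number {
  return m.additions - m.deletions;
}

for (const m of metrics) {
  const net = netChange(m);
  const sign = net > 0 ? "+" : "";
  console.log(`#${m.pr}: ${sign}${net} lines`);
}
```

By this measure #700, #722, and #657 are net deletions, while #464 adds roughly 15K lines.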


PR #722: In-Process Worker Architecture

Status: ⚠️ CONFLICTING

| Metric | Value |
|--------|-------|
| Additions | 869 |
| Deletions | 4,658 |
| Files | 112 |
| Risk | High |

What it does:

  • Hook processes become the worker (no separate daemon spawning)
  • First hook that needs worker becomes the worker
  • Eliminates Windows spawn issues ("NO SPAWN" rule)
  • 761 tests pass

Architectural impact: HIGH

  • Fundamentally changes worker lifecycle
  • Hook processes stay alive (they ARE the worker)
  • First hook wins port 37777, others use HTTP
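
The "first hook wins the port" idea can be sketched as a bind attempt that falls back to client mode when the port is already taken. This is only an illustration under assumptions, not the code in #722: `claimWorkerRole`, `WorkerRole`, and the placeholder request handler are hypothetical, and the real PR may elect the worker differently.

```typescript
import * as http from "node:http";

// Hypothetical result type: this process either owns the port or defers to the owner.
type WorkerRole =
  | { kind: "worker"; server: http.Server }
  | { kind: "client" };

// Try to bind the worker port. Success means this process becomes the worker;
// EADDRINUSE means another process got there first, so act as an HTTP client instead.
function claimWorkerRole(port: number, host = "127.0.0.1"): Promise<WorkerRole> {
  return new Promise<WorkerRole>((resolve, reject) => {
    const server = http.createServer((_req, res) => {
      res.end("ok"); // placeholder handler; a real worker would route requests here
    });
    server.once("error", (err: Error) => {
      const code = (err as NodeJS.ErrnoException).code;
      if (code === "EADDRINUSE") {
        resolve({ kind: "client" });
      } else {
        reject(err); // unexpected failure, e.g. permission denied
      }
    });
    server.once("listening", () => resolve({ kind: "worker", server }));
    server.listen(port, host);
  });
}
```

A hook would call `claimWorkerRole(37777)` on startup and, if the result is `client`, send its requests over HTTP to the process that already holds the port.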

Action needed: Resolve conflicts. Consider relationship with PR #700 (both touch worker architecture).


PR #657: Generate/Clean CLI Commands

Status: ⚠️ CONFLICTING

| Metric | Value |
|--------|-------|
| Additions | 1,184 |
| Deletions | 5,057 |
| Files | 104 |
| Risk | Medium |

What it does:

  • Adds claude-mem generate and claude-mem clean CLI commands
  • Fixes validation bugs (deleted folders recreated from stale DB)
  • Fixes Windows path handling
  • Adds automatic shell alias installation
  • Disables subdirectory CLAUDE.md files by default

Breaking changes:

  • Default behavior change: folder CLAUDE.md now disabled by default

Action needed: Resolve conflicts, complete Windows testing.


Needs Attention

PR #863: Ragtime Email Investigation

Status: 🔍 Research pending

Research agent did not return results. Manual review needed.


PR #464: Sleep Agent Pipeline (Contributor: @laihenyi)

Status: 🔴 Too large for effective review

| Metric | Value |
|--------|-------|
| Additions | 15,430 |
| Deletions | 469 |
| Files | 73 |
| Wait time | 37+ days |
| Risk | High |

What it does:

  • Sleep Agent Pipeline with memory tiering
  • Supersession detection
  • Session Statistics API (/api/session/:id/stats)
  • StatusLine + PreCompact hooks
  • Context Generator improvements
  • Self-healing CI workflow

Concerns:

| Issue | Details |
|-------|---------|
| 🔴 Size | 15K+ lines is too large for effective review |
| 🔴 SupersessionDetector | Single file with 1,282 additions |
| 🟡 No tests visible | Test plan checkboxes unchecked |
| 🟡 Self-healing CI | Auto-fix workflow could cause infinite commit loops |
| 🟡 Serena config | Adds .serena/ tooling |

Recommendation:

  1. Option A: Request contributor split into 4-5 smaller PRs
  2. Option B: Allocate dedicated review time (several hours)
  3. Option C: Cherry-pick specific features (hooks, stats API)

Note: The contributor has been waiting 37+ days and deserves a response either way.


Shipping Strategy

Phase 1: Quick Wins (This Week)

  1. Merge #856 — Ready now, fixes real user issue
  2. Rebase #700 — Has approval, Windows fix needed
  3. Rebase #657 — Useful CLI commands

Phase 2: Architecture (Careful Review)

  1. Review #722 — High impact, conflicts with #700 approach?
    • Both PRs eliminate spawning but in different ways
    • May need to pick one approach

Phase 3: Contributor PR

  1. Respond to #464 — Options:
    • Ask for split
    • Schedule dedicated review
    • Cherry-pick subset

Phase 4: Investigation

  1. Manual review #863 — Ragtime email feature

Conflict Resolution Order

Because multiple PRs have conflicts, the suggested merge and rebase order is:

  1. #856 (merge first — no conflicts)
  2. #700 (rebase onto main after #856)
  3. #657 (rebase onto main after #700)
  4. #722 (rebase last — may conflict with #700 architecturally)

Summary

| Ready | Conflicts | Needs Work |
|-------|-----------|------------|
| 1 PR (#856) | 3 PRs (#700, #722, #657) | 2 PRs (#464, #863) |

Immediate action: Merge #856, then rebase the conflict PRs in order.