claude-mem/Auto Run Docs/Initiation/Phase-01-Merge-PR-856-Zombie-Observer-Fix.md
Alex Newman 7566b8c650 fix: add idle timeout to prevent zombie observer processes (#856)
* fix: add idle timeout to prevent zombie observer processes

Root cause fix for zombie observer accumulation. The SessionQueueProcessor
iterator now exits gracefully after 3 minutes of inactivity instead of
waiting forever for messages.

Changes:
- Add IDLE_TIMEOUT_MS constant (3 minutes)
- waitForMessage() now returns boolean and accepts timeout parameter
- createIterator() tracks lastActivityTime and exits on idle timeout
- Graceful exit via return (not throw) allows SDK to complete cleanly

This addresses the root cause that PR #848 worked around with pattern
matching. Observer processes now self-terminate, preventing accumulation
when session-complete hooks don't fire.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: trigger abort on idle timeout to actually kill subprocess

The previous implementation only returned from the iterator on idle timeout,
but this doesn't terminate the Claude subprocess - it just stops yielding
messages. The subprocess stays alive as a zombie because:

1. Returning from createIterator() ends the generator
2. The SDK closes stdin via transport.endInput()
3. But the subprocess may not exit on stdin EOF
4. No abort signal is sent to kill it

Fix: Add onIdleTimeout callback that SessionManager uses to call
session.abortController.abort(). This sends SIGTERM to the subprocess
via the SDK's ProcessTransport abort handler.

Verified by Codex analysis of the SDK internals:
- abort() triggers ProcessTransport abort handler → SIGTERM
- transport.close() sends SIGTERM → escalates to SIGKILL after 5s
- Just closing stdin is NOT sufficient to guarantee subprocess exit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add idle timeout to prevent zombie observer processes

Also cleaned up hooks.json to remove redundant start commands.
The hook command handler now auto-starts the worker if not running,
which is how it should have been since we changed to auto-start.

This maintenance change was done manually.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve race condition in session queue idle timeout detection

- Reset timer on spurious wakeup when queue is empty but duration check fails
- Use optional chaining for onIdleTimeout callback
- Include threshold value in idle timeout log message for better diagnostics
- Add comprehensive unit tests for SessionQueueProcessor

Fixes PR #856 review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: migrate installer to Setup hook

- Add plugin/scripts/setup.sh for one-time dependency setup
- Add Setup hook to hooks.json (triggers via claude --init)
- Remove smart-install.js from SessionStart hook
- Keep smart-install.js as manual fallback for Windows/auto-install

Setup hook handles:
- Bun detection with fallback locations
- uv detection (optional, for Chroma)
- Version marker to skip redundant installs
- Clear error messages with install instructions

* feat: add np for one-command npm releases

- Add np as dev dependency
- Add release, release:patch, release:minor, release:major scripts
- Add prepublishOnly hook to run build before publish
- Configure np (no yarn, include all contents, run tests)

* fix: reduce PostToolUse hook timeout to 30s

PostToolUse runs on every tool call, so 120s was excessive and could cause
hangs. Reduced to 30s for responsive behavior.

* docs: add PR shipping report

Analyzed 6 PRs for shipping readiness:
- #856: Ready to merge (idle timeout fix)
- #700, #722, #657: Have conflicts, need rebase
- #464: Contributor PR, too large (15K+ lines)
- #863: Needs manual review

Includes shipping strategy and conflict resolution order.

* MAESTRO: Verify PR #856 test suite passes

All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor
idle timeout tests all pass with 20 expect() assertions verified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Verify PR #856 build passes

- Ran npm run build successfully with no TypeScript errors
- All artifacts generated (worker-service, mcp-server, context-generator, viewer UI)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Code review PR #856 implementation verified

Verified all requirements in SessionQueueProcessor.ts:
- IDLE_TIMEOUT_MS = 180000ms (3 minutes)
- waitForMessage() accepts timeout parameter
- lastActivityTime reset on spurious wakeup (race condition fix)
- Graceful exit logs include thresholdMs parameter
- 11 comprehensive test cases in SessionQueueProcessor.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com>
Co-authored-by: root <root@srv1317155.hstgr.cloud>
2026-02-04 19:31:24 -05:00


Phase 01: Test and Merge PR #856 - Zombie Observer Fix

PR #856 adds an idle timeout to SessionQueueProcessor to prevent zombie observer processes: after 3 minutes without messages, the queue iterator exits and SessionManager aborts the observer's Claude subprocess. This is the most mature PR, with existing test coverage, passing CI, and no merge conflicts. By the end of this phase, the fix will be merged to main and live.

Tasks

  • Checkout and verify PR #856:

    • git fetch origin fix/observer-idle-timeout
    • git checkout fix/observer-idle-timeout
    • Verify the branch is up to date with origin
    • Branch verified up to date with origin after pulling 4 changes: new PR-SHIPPING-REPORT.md and setup.sh, plus updates to package.json and hooks.json
  • Run the full test suite to confirm all tests pass:

    • npm test
    • Specifically verify the 11 SessionQueueProcessor tests pass
    • Report any failures
    • Full test suite passes: 797 pass, 3 skip (pre-existing), 0 fail
    • All 11 SessionQueueProcessor tests pass: 11 pass, 0 fail, 20 expect() calls
  • Run the build to confirm compilation succeeds:

    • npm run build
    • Verify no TypeScript errors
    • Verify all artifacts are generated
    • Build completed successfully with no TypeScript errors
    • All artifacts generated:
      • worker-service.cjs (1786.80 KB)
      • mcp-server.cjs (332.41 KB)
      • context-generator.cjs (61.57 KB)
      • viewer-bundle.js and viewer.html
  • Code review the changes for correctness:

    • Read src/services/queue/SessionQueueProcessor.ts and verify:
      • IDLE_TIMEOUT_MS is set to 3 minutes (180000ms)
      • waitForMessage() accepts timeout parameter
      • lastActivityTime is reset on spurious wakeup (race condition fix)
      • Graceful-exit log includes the thresholdMs parameter
    • Read tests/services/queue/SessionQueueProcessor.test.ts and verify test coverage
    • Code review complete - all requirements verified:
      • Line 6: IDLE_TIMEOUT_MS = 3 * 60 * 1000 (180000ms)
      • Line 90: waitForMessage(signal: AbortSignal, timeoutMs: number = IDLE_TIMEOUT_MS)
      • Line 63: lastActivityTime = Date.now() on spurious wakeup, with a code comment
      • Lines 54-58: Logger includes thresholdMs: IDLE_TIMEOUT_MS parameter
      • 11 test cases covering idle timeout, abort signal, message events, cleanup, errors, and conversion
  • Merge PR #856 to main:

    • git checkout main
    • git pull origin main
    • gh pr merge 856 --squash --delete-branch
    • Verify merge succeeded
  • Run post-merge verification:

    • git pull origin main
    • npm test to confirm tests still pass on main
    • npm run build to confirm build still works
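For the checkout task, the "up to date with origin" check can be scripted. This is a minimal sketch, not something in the repo: `classify_sync` and `branch_sync_status` are hypothetical helper names, and it assumes the remote is named `origin` and that `git fetch` has just run. `git rev-list --left-right --count A...B` prints two tab-separated counts: commits reachable only from `A`, and commits reachable only from `B`.

```shell
# Sketch (hypothetical helpers, not part of the repo): compare a local branch
# with its counterpart on the remote named "origin". Run `git fetch` first.

# Map ahead/behind commit counts to a readable status. Pure: no git needed.
classify_sync() {
  ahead="$1"
  behind="$2"
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "up-to-date"
  elif [ "$ahead" -gt 0 ] && [ "$behind" -gt 0 ]; then
    echo "diverged: $ahead ahead, $behind behind"
  elif [ "$behind" -gt 0 ]; then
    echo "behind by $behind; run git pull"
  else
    echo "ahead by $ahead; local commits not pushed"
  fi
}

# Print the status for one branch; return 0 only when it is up to date.
branch_sync_status() {
  branch="$1"
  # Left count: commits only on the local branch. Right: only on origin.
  counts=$(git rev-list --left-right --count "$branch...origin/$branch") || return 2
  # rev-list separates the two counts with a tab, which is cut's default.
  ahead=$(printf '%s\n' "$counts" | cut -f1)
  behind=$(printf '%s\n' "$counts" | cut -f2)
  result=$(classify_sync "$ahead" "$behind")
  echo "$result"
  [ "$result" = "up-to-date" ]
}
```

Running `branch_sync_status fix/observer-idle-timeout` prints `up-to-date` and returns 0 when both sides match. Keeping the classification in a separate pure function makes it testable without a repository.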
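For the test task, the pass/fail counts can be checked from the runner's output rather than by eye. This sketch assumes the runner is Bun (the `expect() calls` figure reported in the test task matches Bun's summary format), that `npm test` delegates to `bun test`, and that output is uncolored; `check_test_summary` is a hypothetical helper name.

```shell
# Sketch (hypothetical helper): read a Bun-style test summary on stdin and
# succeed only when "<N> fail" is 0 and "<N> pass" equals the expected count.
# Assumes uncolored output with summary lines such as " 11 pass" / " 0 fail".
check_test_summary() {
  expected_pass="$1"
  summary=$(cat)
  # Keep only the last match, in case a count line appears more than once.
  pass=$(printf '%s\n' "$summary" | sed -n 's/^ *\([0-9][0-9]*\) pass$/\1/p' | tail -n 1)
  fail=$(printf '%s\n' "$summary" | sed -n 's/^ *\([0-9][0-9]*\) fail$/\1/p' | tail -n 1)
  if [ -z "$pass" ] || [ -z "$fail" ]; then
    echo "no pass/fail counts found in test output" >&2
    return 2
  fi
  echo "pass=$pass fail=$fail"
  [ "$fail" -eq 0 ] && [ "$pass" -eq "$expected_pass" ]
}

# Example usage (assumes `npm test` delegates to `bun test`):
#   bun test tests/services/queue/SessionQueueProcessor.test.ts 2>&1 | check_test_summary 11
#   npm test 2>&1 | check_test_summary 797
```

Piping hides the runner's own exit code, which is why the helper re-checks the fail count itself; `2>&1` is there in case the summary is written to stderr.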
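For the build task, the artifact list can be checked mechanically. The document does not say where `npm run build` writes its output, so `BUILD_DIR` in the usage comment is a placeholder, and `check_artifacts` is a hypothetical helper. The `-s` test requires each file to exist and be non-empty, which also catches a bundle step that wrote an empty file.

```shell
# Sketch (hypothetical helper): confirm that each named build artifact exists
# in a directory and is non-empty (-s), printing one line per file.
check_artifacts() {
  dir="$1"
  shift
  missing=0
  for name in "$@"; do
    if [ -s "$dir/$name" ]; then
      echo "ok       $name"
    else
      echo "MISSING  $name"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example usage. BUILD_DIR is a placeholder for wherever `npm run build`
# writes its output; call once per directory if the viewer files live elsewhere.
#   check_artifacts "$BUILD_DIR" worker-service.cjs mcp-server.cjs \
#     context-generator.cjs viewer-bundle.js viewer.html
```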
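For the code review task, the checklist items can be re-run as greps after later edits. This is a presence check only: it looks for the exact snippets quoted in the review notes and says nothing about whether the surrounding logic is correct (for example, `lastActivityTime = Date.now()` would also match an initial assignment, not only the spurious-wakeup reset). `review_check` and `review_idle_timeout` are hypothetical names.

```shell
# Sketch (hypothetical helpers): grep SessionQueueProcessor.ts for the exact
# snippets quoted in the review notes. This confirms presence only; it does
# not prove the surrounding logic is correct.
review_check() {
  # $1 = description, $2 = fixed-string pattern, $3 = file to search
  if grep -qsF -- "$2" "$3"; then
    echo "ok    $1"
  else
    echo "FAIL  $1"
    review_failed=1
  fi
}

review_idle_timeout() {
  src="${1:-src/services/queue/SessionQueueProcessor.ts}"
  review_failed=0
  review_check "IDLE_TIMEOUT_MS is 3 minutes" "IDLE_TIMEOUT_MS = 3 * 60 * 1000" "$src"
  review_check "waitForMessage has a timeoutMs default" "timeoutMs: number = IDLE_TIMEOUT_MS" "$src"
  review_check "lastActivityTime is reset" "lastActivityTime = Date.now()" "$src"
  review_check "idle log includes thresholdMs" "thresholdMs: IDLE_TIMEOUT_MS" "$src"
  [ "$review_failed" -eq 0 ]
}
```

Run `review_idle_timeout` from the repository root, or pass a different path as the first argument.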
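For "Verify merge succeeded" in the merge task, the GitHub CLI can confirm the result: `gh pr view --json state` reports `MERGED` once the PR is merged, and the `mergeCommit` field gives the merge commit's SHA, which `git merge-base --is-ancestor` can check against `origin/main`. `verify_merged` is a hypothetical helper, and the sketch assumes `gh` is authenticated for this repository.

```shell
# Sketch (hypothetical helper): confirm a PR is merged on GitHub and that its
# merge commit is reachable from origin/main. Requires an authenticated `gh`.
verify_merged() {
  pr="$1"
  state=$(gh pr view "$pr" --json state --jq .state) || return 2
  if [ "$state" != "MERGED" ]; then
    echo "PR #$pr is $state, not MERGED"
    return 1
  fi
  # For a squash merge this is the single squashed commit on main.
  sha=$(gh pr view "$pr" --json mergeCommit --jq .mergeCommit.oid) || return 2
  git fetch -q origin main || return 2
  if git merge-base --is-ancestor "$sha" origin/main; then
    echo "PR #$pr merged as $sha, present on origin/main"
  else
    echo "PR #$pr is MERGED but $sha is not on origin/main"
    return 1
  fi
}

# Example usage: verify_merged 856
```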
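For the post-merge task, the three commands can run as one sequence that stops at the first failure and names the failing step. A minimal sketch: `post_merge_verify` is a hypothetical name, and `eval` is applied only to the fixed command strings in the loop.

```shell
# Sketch (hypothetical helper): run the post-merge checks in order and stop
# at the first failing step, naming it on stderr.
post_merge_verify() {
  for step in "git pull origin main" "npm test" "npm run build"; do
    echo "==> $step"
    # eval runs only the fixed command strings listed above.
    if ! eval "$step"; then
      echo "post-merge verification failed at: $step" >&2
      return 1
    fi
  done
  echo "post-merge verification passed"
}
```

Run `post_merge_verify` from the repository root on main after the merge.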