mirror of https://github.com/thedotmack/claude-mem synced 2026-04-25 17:15:04 +02:00

Alex Newman 7566b8c650 fix: add idle timeout to prevent zombie observer processes (#856)

* fix: add idle timeout to prevent zombie observer processes

Root cause fix for zombie observer accumulation. The SessionQueueProcessor
iterator now exits gracefully after 3 minutes of inactivity instead of
waiting forever for messages.

Changes:
- Add IDLE_TIMEOUT_MS constant (3 minutes)
- waitForMessage() now returns boolean and accepts timeout parameter
- createIterator() tracks lastActivityTime and exits on idle timeout
- Graceful exit via return (not throw) allows SDK to complete cleanly

This addresses the root cause that PR #848 worked around with pattern
matching. Observer processes now self-terminate, preventing accumulation
when session-complete hooks don't fire.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: trigger abort on idle timeout to actually kill subprocess

The previous implementation only returned from the iterator on idle timeout,
but this doesn't terminate the Claude subprocess - it just stops yielding
messages. The subprocess stays alive as a zombie because:

1. Returning from createIterator() ends the generator
2. The SDK closes stdin via transport.endInput()
3. But the subprocess may not exit on stdin EOF
4. No abort signal is sent to kill it

Fix: Add onIdleTimeout callback that SessionManager uses to call
session.abortController.abort(). This sends SIGTERM to the subprocess
via the SDK's ProcessTransport abort handler.

Verified by Codex analysis of the SDK internals:
- abort() triggers ProcessTransport abort handler → SIGTERM
- transport.close() sends SIGTERM → escalates to SIGKILL after 5s
- Just closing stdin is NOT sufficient to guarantee subprocess exit
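
The abort wiring described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `Session`, `makeIdleHandler`, and `attachKillOnAbort` are hypothetical names, and the `kill` callback only stands in for whatever the SDK's transport does when its abort handler fires.

```typescript
// Hypothetical stand-in for a session record that owns an AbortController.
interface Session {
  id: string;
  abortController: AbortController;
}

// Builds the callback that would be passed as onIdleTimeout.
// Calling it aborts the session's controller instead of only ending the iterator.
function makeIdleHandler(
  session: Session,
  log: (msg: string) => void = () => {},
): () => void {
  return () => {
    log(`session ${session.id} idle, aborting`);
    session.abortController.abort();
  };
}

// Hypothetical transport-side hook: when the signal aborts, ask the
// subprocess to terminate. `kill` is an assumption standing in for the
// real process handle.
function attachKillOnAbort(
  signal: AbortSignal,
  kill: (sig: "SIGTERM") => void,
): void {
  if (signal.aborted) {
    kill("SIGTERM");
    return;
  }
  signal.addEventListener("abort", () => kill("SIGTERM"), { once: true });
}
```

The point of the split is that the queue only reports idleness; the decision to abort stays with whoever owns the session.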

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: add idle timeout to prevent zombie observer processes

Also cleaned up hooks.json to remove redundant start commands.
The hook command handler now auto-starts the worker if not running,
which is how it should have been since we changed to auto-start.

This maintenance change was done manually.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve race condition in session queue idle timeout detection

- Reset timer on spurious wakeup when queue is empty but duration check fails
- Use optional chaining for onIdleTimeout callback
- Include threshold value in idle timeout log message for better diagnostics
- Add comprehensive unit tests for SessionQueueProcessor

Fixes PR #856 review feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: migrate installer to Setup hook

- Add plugin/scripts/setup.sh for one-time dependency setup
- Add Setup hook to hooks.json (triggers via claude --init)
- Remove smart-install.js from SessionStart hook
- Keep smart-install.js as manual fallback for Windows/auto-install

Setup hook handles:
- Bun detection with fallback locations
- uv detection (optional, for Chroma)
- Version marker to skip redundant installs
- Clear error messages with install instructions

* feat: add np for one-command npm releases

- Add np as dev dependency
- Add release, release:patch, release:minor, release:major scripts
- Add prepublishOnly hook to run build before publish
- Configure np (no yarn, include all contents, run tests)

* fix: reduce PostToolUse hook timeout to 30s

PostToolUse runs on every tool call, 120s was excessive and could cause
hangs. Reduced to 30s for responsive behavior.

* docs: add PR shipping report

Analyzed 6 PRs for shipping readiness:
- #856: Ready to merge (idle timeout fix)
- #700, #722, #657: Have conflicts, need rebase
- #464: Contributor PR, too large (15K+ lines)
- #863: Needs manual review

Includes shipping strategy and conflict resolution order.

* MAESTRO: Verify PR #856 test suite passes

All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor
idle timeout tests all pass with 20 expect() assertions verified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Verify PR #856 build passes

- Ran npm run build successfully with no TypeScript errors
- All artifacts generated (worker-service, mcp-server, context-generator, viewer UI)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* MAESTRO: Code review PR #856 implementation verified

Verified all requirements in SessionQueueProcessor.ts:
- IDLE_TIMEOUT_MS = 180000ms (3 minutes)
- waitForMessage() accepts timeout parameter
- lastActivityTime reset on spurious wakeup (race condition fix)
- Graceful exit logs include thresholdMs parameter
- 11 comprehensive test cases in SessionQueueProcessor.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com>
Co-authored-by: root <root@srv1317155.hstgr.cloud>

2026-02-04 19:31:24 -05:00

5.9 KiB

Claude-Mem PR Shipping Report

Generated: 2026-02-04

Executive Summary

6 PRs analyzed for shipping readiness. 1 is ready to merge, 3 have conflicts, 1 needs manual review, and 1 is too large for easy review.

PR	Title	Status	Recommendation
#856	Idle timeout for zombie processes	✅ MERGEABLE	Ship it
#700	Windows Terminal popup fix	⚠️ Conflicts	Rebase, then ship
#722	In-process worker architecture	⚠️ Conflicts	Rebase, high impact
#657	generate/clean CLI commands	⚠️ Conflicts	Rebase, then ship
#863	Ragtime email investigation	🔍 Needs review	Research pending
#464	Sleep Agent Pipeline (contributor)	🔴 Too large	Request split or dedicated review

Ready to Ship

PR #856: Idle Timeout for Zombie Observer Processes

Status: ✅ MERGEABLE (no conflicts)

Metric	Value
Additions	928
Deletions	171
Files	8
Risk	Low-Medium

What it does:

Adds 3-minute idle timeout to SessionQueueProcessor
Prevents zombie observer processes that were causing 13.4 GB of swap usage
Processes exit gracefully after inactivity instead of waiting forever

Why ship it:

Fixes real user-reported issue (79 zombie processes)
Well-tested (11 new tests, 440 lines of test code)
Clean implementation, preventive approach
Supersedes PR #848's reactive cleanup
No conflicts, ready to merge

Review notes:

1 Greptile bot comment (addressed)
Race condition fix included
Enhanced logging added
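
For readers unfamiliar with the change, the idle-timeout loop in PR #856 might look roughly like the sketch below. Only `IDLE_TIMEOUT_MS`, `waitForMessage`, `createIterator`, `lastActivityTime`, and `onIdleTimeout` are names taken from the PR description; the `IdleQueue` class and everything else here are assumptions made for illustration, not the actual SessionQueueProcessor.

```typescript
const IDLE_TIMEOUT_MS = 3 * 60 * 1000; // 180000 ms, as stated in the PR

// Hypothetical queue; the real SessionQueueProcessor may differ substantially.
class IdleQueue<T> {
  private items: T[] = [];
  private wake: (() => void) | null = null;

  push(item: T): void {
    this.items.push(item);
    this.wake?.(); // wake a pending waitForMessage(), if any
  }

  // Resolves true when woken by push(), false when the timeout elapses first.
  private waitForMessage(timeoutMs: number): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => {
        this.wake = null;
        resolve(false);
      }, timeoutMs);
      this.wake = () => {
        clearTimeout(timer);
        this.wake = null;
        resolve(true);
      };
    });
  }

  async *createIterator(
    idleTimeoutMs: number = IDLE_TIMEOUT_MS,
    onIdleTimeout?: () => void,
  ): AsyncGenerator<T, void, void> {
    let lastActivityTime = Date.now();
    while (true) {
      const next = this.items.shift();
      if (next !== undefined) {
        lastActivityTime = Date.now();
        yield next;
        continue;
      }
      const gotMessage = await this.waitForMessage(idleTimeoutMs);
      if (gotMessage) continue; // loop back and drain the queue
      if (Date.now() - lastActivityTime >= idleTimeoutMs) {
        onIdleTimeout?.(); // optional chaining: callback may be absent
        return; // graceful exit via return, not throw
      }
      // Woke without a message and without enough elapsed time: reset the timer.
      lastActivityTime = Date.now();
    }
  }
}
```

Returning from the generator lets a `for await` consumer finish normally, while the optional callback gives the owner a chance to abort the subprocess.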

Needs Rebase (Have Conflicts)

PR #700: Windows Terminal Popup Fix

Status: ⚠️ CONFLICTING

Metric	Value
Additions	187
Deletions	399
Files	8
Risk	Medium

What it does:

Eliminates Windows Terminal popup by removing spawn-based daemon
Worker start command becomes daemon directly (no child spawn)
Removes restart command (users do stop then start)
Net simplification: -212 lines

Breaking changes:

restart command removed

Review status:

✅ 1 APPROVAL from @volkanfirat (Jan 15, 2026)

Action needed: Resolve conflicts, then ready to ship.

PR #722: In-Process Worker Architecture

Status: ⚠️ CONFLICTING

Metric	Value
Additions	869
Deletions	4,658
Files	112
Risk	High

What it does:

Hook processes become the worker (no separate daemon spawning)
First hook that needs worker becomes the worker
Eliminates Windows spawn issues ("NO SPAWN" rule)
761 tests pass

Architectural impact: HIGH

Fundamentally changes worker lifecycle
Hook processes stay alive (they ARE the worker)
First hook wins port 37777, others use HTTP
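
The "first hook wins the port" idea can be illustrated with a small sketch. This assumes a plain TCP listener from Node's `net` module; `claimWorkerPort` and the `Role` type are hypothetical names, and PR #722's actual mechanism may differ.

```typescript
import * as net from "node:net";

// Hypothetical result type: either this process became the worker,
// or another process already owns the port and we should act as a client.
type Role =
  | { kind: "worker"; server: net.Server }
  | { kind: "client"; port: number };

// Try to bind the port. Success means this process is the worker;
// EADDRINUSE means another process got there first.
function claimWorkerPort(port: number, host: string = "127.0.0.1"): Promise<Role> {
  return new Promise<Role>((resolve, reject) => {
    const server = net.createServer();
    server.once("error", (err: NodeJS.ErrnoException) => {
      if (err.code === "EADDRINUSE") {
        resolve({ kind: "client", port });
      } else {
        reject(err);
      }
    });
    server.listen(port, host, () => resolve({ kind: "worker", server }));
  });
}
```

In this sketch a caller would pass 37777 as the port; a process that receives `kind: "client"` would then talk to the existing worker over HTTP rather than starting its own.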

Action needed: Resolve conflicts. Consider relationship with PR #700 (both touch worker architecture).

PR #657: Generate/Clean CLI Commands

Status: ⚠️ CONFLICTING

Metric	Value
Additions	1,184
Deletions	5,057
Files	104
Risk	Medium

What it does:

Adds claude-mem generate and claude-mem clean CLI commands
Fixes validation bugs (deleted folders recreated from stale DB)
Fixes Windows path handling
Adds automatic shell alias installation
Disables subdirectory CLAUDE.md files by default

Breaking changes:

Default behavior change: folder CLAUDE.md now disabled by default

Action needed: Resolve conflicts, complete Windows testing.

Needs Attention

PR #863: Ragtime Email Investigation

Status: 🔍 Research pending

Research agent did not return results. Manual review needed.

PR #464: Sleep Agent Pipeline (Contributor: @laihenyi)

Status: 🔴 Too large for effective review

Metric	Value
Additions	15,430
Deletions	469
Files	73
Wait time	37+ days
Risk	High

What it does:

Sleep Agent Pipeline with memory tiering
Supersession detection
Session Statistics API (/api/session/:id/stats)
StatusLine + PreCompact hooks
Context Generator improvements
Self-healing CI workflow

Concerns:

Issue	Details
🔴 Size	15K+ lines is too large for effective review
🔴 SupersessionDetector	Single file with 1,282 additions
🟡 No tests visible	Test plan checkboxes unchecked
🟡 Self-healing CI	Auto-fix workflow could cause infinite commit loops
🟡 Serena config	Adds `.serena/` tooling

Recommendation:

Option A: Request contributor split into 4-5 smaller PRs
Option B: Allocate dedicated review time (several hours)
Option C: Cherry-pick specific features (hooks, stats API)

Note: The contributor has been waiting 37+ days and deserves a response either way.

Shipping Strategy

Phase 1: Quick Wins (This Week)

Merge #856 — Ready now, fixes real user issue
Rebase #700 — Has approval, Windows fix needed
Rebase #657 — Useful CLI commands

Phase 2: Architecture (Careful Review)

Review #722 — High impact, conflicts with #700 approach?
- Both PRs eliminate spawning but in different ways
- May need to pick one approach

Phase 3: Contributor PR

Respond to #464 — Options:
- Ask for split
- Schedule dedicated review
- Cherry-pick subset

Phase 4: Investigation

Manual review #863 — Ragtime email feature

Conflict Resolution Order

Since multiple PRs have conflicts, suggested rebase order:

#856 (merge first — no conflicts)
#700 (rebase onto main after #856)
#657 (rebase onto main after #700)
#722 (rebase last — may conflict with #700 architecturally)

Summary

Ready	Conflicts	Needs Work
1 PR (#856)	3 PRs (#700, #722, #657)	2 PRs (#464, #863)

Immediate action: Merge #856, then rebase the conflict PRs in order.
