mirror of
https://github.com/thedotmack/claude-mem
synced 2026-04-25 17:15:04 +02:00
fix: add idle timeout to prevent zombie observer processes (#856)
* fix: add idle timeout to prevent zombie observer processes Root cause fix for zombie observer accumulation. The SessionQueueProcessor iterator now exits gracefully after 3 minutes of inactivity instead of waiting forever for messages. Changes: - Add IDLE_TIMEOUT_MS constant (3 minutes) - waitForMessage() now returns boolean and accepts timeout parameter - createIterator() tracks lastActivityTime and exits on idle timeout - Graceful exit via return (not throw) allows SDK to complete cleanly This addresses the root cause that PR #848 worked around with pattern matching. Observer processes now self-terminate, preventing accumulation when session-complete hooks don't fire. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: trigger abort on idle timeout to actually kill subprocess The previous implementation only returned from the iterator on idle timeout, but this doesn't terminate the Claude subprocess - it just stops yielding messages. The subprocess stays alive as a zombie because: 1. Returning from createIterator() ends the generator 2. The SDK closes stdin via transport.endInput() 3. But the subprocess may not exit on stdin EOF 4. No abort signal is sent to kill it Fix: Add onIdleTimeout callback that SessionManager uses to call session.abortController.abort(). This sends SIGTERM to the subprocess via the SDK's ProcessTransport abort handler. Verified by Codex analysis of the SDK internals: - abort() triggers ProcessTransport abort handler → SIGTERM - transport.close() sends SIGTERM → escalates to SIGKILL after 5s - Just closing stdin is NOT sufficient to guarantee subprocess exit Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add idle timeout to prevent zombie observer processes Also cleaned up hooks.json to remove redundant start commands. The hook command handler now auto-starts the worker if not running, which is how it should have been since we changed to auto-start. This maintenance change was done manually. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: resolve race condition in session queue idle timeout detection - Reset timer on spurious wakeup when queue is empty but duration check fails - Use optional chaining for onIdleTimeout callback - Include threshold value in idle timeout log message for better diagnostics - Add comprehensive unit tests for SessionQueueProcessor Fixes PR #856 review feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: migrate installer to Setup hook - Add plugin/scripts/setup.sh for one-time dependency setup - Add Setup hook to hooks.json (triggers via claude --init) - Remove smart-install.js from SessionStart hook - Keep smart-install.js as manual fallback for Windows/auto-install Setup hook handles: - Bun detection with fallback locations - uv detection (optional, for Chroma) - Version marker to skip redundant installs - Clear error messages with install instructions * feat: add np for one-command npm releases - Add np as dev dependency - Add release, release:patch, release:minor, release:major scripts - Add prepublishOnly hook to run build before publish - Configure np (no yarn, include all contents, run tests) * fix: reduce PostToolUse hook timeout to 30s PostToolUse runs on every tool call, 120s was excessive and could cause hangs. Reduced to 30s for responsive behavior. * docs: add PR shipping report Analyzed 6 PRs for shipping readiness: - #856: Ready to merge (idle timeout fix) - #700, #722, #657: Have conflicts, need rebase - #464: Contributor PR, too large (15K+ lines) - #863: Needs manual review Includes shipping strategy and conflict resolution order. * MAESTRO: Verify PR #856 test suite passes All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor idle timeout tests all pass with 20 expect() assertions verified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Verify PR #856 build passes - Ran npm run build successfully with no TypeScript errors - All artifacts generated (worker-service, mcp-server, context-generator, viewer UI) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * MAESTRO: Code review PR #856 implementation verified Verified all requirements in SessionQueueProcessor.ts: - IDLE_TIMEOUT_MS = 180000ms (3 minutes) - waitForMessage() accepts timeout parameter - lastActivityTime reset on spurious wakeup (race condition fix) - Graceful exit logs include thresholdMs parameter - 11 comprehensive test cases in SessionQueueProcessor.test.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com> Co-authored-by: root <root@srv1317155.hstgr.cloud>
This commit is contained in:
213
docs/PR-SHIPPING-REPORT.md
Normal file
213
docs/PR-SHIPPING-REPORT.md
Normal file
@@ -0,0 +1,213 @@
|
||||
# Claude-Mem PR Shipping Report
|
||||
*Generated: 2026-02-04*
|
||||
|
||||
## Executive Summary
|
||||
|
||||
6 PRs analyzed for shipping readiness. **1 is ready to merge**, 4 have conflicts, 1 is too large for easy review.
|
||||
|
||||
| PR | Title | Status | Recommendation |
|
||||
|----|-------|--------|----------------|
|
||||
| **#856** | Idle timeout for zombie processes | ✅ **MERGEABLE** | **Ship it** |
|
||||
| #700 | Windows Terminal popup fix | ⚠️ Conflicts | Rebase, then ship |
|
||||
| #722 | In-process worker architecture | ⚠️ Conflicts | Rebase, high impact |
|
||||
| #657 | generate/clean CLI commands | ⚠️ Conflicts | Rebase, then ship |
|
||||
| #863 | Ragtime email investigation | 🔍 Needs review | Research pending |
|
||||
| #464 | Sleep Agent Pipeline (contributor) | 🔴 Too large | Request split or dedicated review |
|
||||
|
||||
---
|
||||
|
||||
## Ready to Ship
|
||||
|
||||
### PR #856: Idle Timeout for Zombie Observer Processes
|
||||
**Status:** ✅ MERGEABLE (no conflicts)
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Additions | 928 |
|
||||
| Deletions | 171 |
|
||||
| Files | 8 |
|
||||
| Risk | Low-Medium |
|
||||
|
||||
**What it does:**
|
||||
- Adds 3-minute idle timeout to `SessionQueueProcessor`
|
||||
- Prevents zombie observer processes that were causing 13.4GB swap usage
|
||||
- Processes exit gracefully after inactivity instead of waiting forever
|
||||
|
||||
**Why ship it:**
|
||||
- Fixes real user-reported issue (79 zombie processes)
|
||||
- Well-tested (11 new tests, 440 lines of test coverage)
|
||||
- Clean implementation, preventive approach
|
||||
- Supersedes PR #848's reactive cleanup
|
||||
- No conflicts, ready to merge
|
||||
|
||||
**Review notes:**
|
||||
- 1 Greptile bot comment (addressed)
|
||||
- Race condition fix included
|
||||
- Enhanced logging added
|
||||
|
||||
---
|
||||
|
||||
## Needs Rebase (Have Conflicts)
|
||||
|
||||
### PR #700: Windows Terminal Popup Fix
|
||||
**Status:** ⚠️ CONFLICTING
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Additions | 187 |
|
||||
| Deletions | 399 |
|
||||
| Files | 8 |
|
||||
| Risk | Medium |
|
||||
|
||||
**What it does:**
|
||||
- Eliminates Windows Terminal popup by removing spawn-based daemon
|
||||
- Worker `start` command becomes daemon directly (no child spawn)
|
||||
- Removes `restart` command (users do `stop` then `start`)
|
||||
- Net simplification: -212 lines
|
||||
|
||||
**Breaking changes:**
|
||||
- `restart` command removed
|
||||
|
||||
**Review status:**
|
||||
- ✅ 1 APPROVAL from @volkanfirat (Jan 15, 2026)
|
||||
|
||||
**Action needed:** Resolve conflicts, then ready to ship.
|
||||
|
||||
---
|
||||
|
||||
### PR #722: In-Process Worker Architecture
|
||||
**Status:** ⚠️ CONFLICTING
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Additions | 869 |
|
||||
| Deletions | 4,658 |
|
||||
| Files | 112 |
|
||||
| Risk | High |
|
||||
|
||||
**What it does:**
|
||||
- Hook processes become the worker (no separate daemon spawning)
|
||||
- First hook that needs worker becomes the worker
|
||||
- Eliminates Windows spawn issues ("NO SPAWN" rule)
|
||||
- 761 tests pass
|
||||
|
||||
**Architectural impact:** HIGH
|
||||
- Fundamentally changes worker lifecycle
|
||||
- Hook processes stay alive (they ARE the worker)
|
||||
- First hook wins port 37777, others use HTTP
|
||||
|
||||
**Action needed:** Resolve conflicts. Consider relationship with PR #700 (both touch worker architecture).
|
||||
|
||||
---
|
||||
|
||||
### PR #657: Generate/Clean CLI Commands
|
||||
**Status:** ⚠️ CONFLICTING
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Additions | 1,184 |
|
||||
| Deletions | 5,057 |
|
||||
| Files | 104 |
|
||||
| Risk | Medium |
|
||||
|
||||
**What it does:**
|
||||
- Adds `claude-mem generate` and `claude-mem clean` CLI commands
|
||||
- Fixes validation bugs (deleted folders recreated from stale DB)
|
||||
- Fixes Windows path handling
|
||||
- Adds automatic shell alias installation
|
||||
- Disables subdirectory CLAUDE.md files by default
|
||||
|
||||
**Breaking changes:**
|
||||
- Default behavior change: folder CLAUDE.md now disabled by default
|
||||
|
||||
**Action needed:** Resolve conflicts, complete Windows testing.
|
||||
|
||||
---
|
||||
|
||||
## Needs Attention
|
||||
|
||||
### PR #863: Ragtime Email Investigation
|
||||
**Status:** 🔍 Research pending
|
||||
|
||||
Research agent did not return results. Manual review needed.
|
||||
|
||||
---
|
||||
|
||||
### PR #464: Sleep Agent Pipeline (Contributor: @laihenyi)
|
||||
**Status:** 🔴 Too large for effective review
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Additions | 15,430 |
|
||||
| Deletions | 469 |
|
||||
| Files | 73 |
|
||||
| Wait time | 37+ days |
|
||||
| Risk | High |
|
||||
|
||||
**What it does:**
|
||||
- Sleep Agent Pipeline with memory tiering
|
||||
- Supersession detection
|
||||
- Session Statistics API (`/api/session/:id/stats`)
|
||||
- StatusLine + PreCompact hooks
|
||||
- Context Generator improvements
|
||||
- Self-healing CI workflow
|
||||
|
||||
**Concerns:**
|
||||
| Issue | Details |
|
||||
|-------|---------|
|
||||
| 🔴 Size | 15K+ lines is too large for effective review |
|
||||
| 🔴 SupersessionDetector | Single file with 1,282 additions |
|
||||
| 🟡 No tests visible | Test plan checkboxes unchecked |
|
||||
| 🟡 Self-healing CI | Auto-fix workflow could cause infinite commit loops |
|
||||
| 🟡 Serena config | Adds `.serena/` tooling |
|
||||
|
||||
**Recommendation:**
|
||||
1. **Option A:** Request contributor split into 4-5 smaller PRs
|
||||
2. **Option B:** Allocate dedicated review time (several hours)
|
||||
3. **Option C:** Cherry-pick specific features (hooks, stats API)
|
||||
|
||||
**Note:** Contributor has been waiting 37+ days. Deserves response either way.
|
||||
|
||||
---
|
||||
|
||||
## Shipping Strategy
|
||||
|
||||
### Phase 1: Quick Wins (This Week)
|
||||
1. **Merge #856** — Ready now, fixes real user issue
|
||||
2. **Rebase #700** — Has approval, Windows fix needed
|
||||
3. **Rebase #657** — Useful CLI commands
|
||||
|
||||
### Phase 2: Architecture (Careful Review)
|
||||
4. **Review #722** — High impact, conflicts with #700 approach?
|
||||
- Both PRs eliminate spawning but in different ways
|
||||
- May need to pick one approach
|
||||
|
||||
### Phase 3: Contributor PR
|
||||
5. **Respond to #464** — Options:
|
||||
- Ask for split
|
||||
- Schedule dedicated review
|
||||
- Cherry-pick subset
|
||||
|
||||
### Phase 4: Investigation
|
||||
6. **Manual review #863** — Ragtime email feature
|
||||
|
||||
---
|
||||
|
||||
## Conflict Resolution Order
|
||||
|
||||
Since multiple PRs have conflicts, suggested rebase order:
|
||||
|
||||
1. **#856** (merge first — no conflicts)
|
||||
2. **#700** (rebase onto main after #856)
|
||||
3. **#657** (rebase onto main after #700)
|
||||
4. **#722** (rebase last — may conflict with #700 architecturally)
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Ready | Conflicts | Needs Work |
|
||||
|-------|-----------|------------|
|
||||
| 1 PR (#856) | 3 PRs (#700, #722, #657) | 2 PRs (#464, #863) |
|
||||
|
||||
**Immediate action:** Merge #856, then rebase the conflict PRs in order.
|
||||
Reference in New Issue
Block a user