Files
Alex Newman 99060bac1a fix: detect PID reuse in worker start-guard (container restarts) (#2082)
* fix: detect PID reuse in worker start-guard to survive container restarts

The 'Worker already running' guard checked PID liveness with kill(0), which
false-positives when a persistent PID file outlives the PID namespace (docker
stop / docker start, pm2 graceful reloads). The new worker comes up with the
same low PID (e.g. 11) as the old one, kill(0) says 'alive', and the worker
refuses to start against its own prior incarnation.

Capture a process-start token alongside the PID and verify identity, not just
liveness:
  - Linux: /proc/<pid>/stat field 22 (starttime, jiffies since boot)
  - macOS/POSIX: `ps -p <pid> -o lstart=`
  - Windows: unchanged (returns null, falls back to liveness)

PID files written by older versions are token-less, so verifyPidFileOwnership
falls back to the current liveness-only behavior for backwards compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: apply review feedback to PID identity helpers

- Collapse ProcessManager re-export down to a single import/export statement.
- Make verifyPidFileOwnership a type predicate (info is PidInfo) so callers
  don't need non-null assertions on the narrowed value.
- Drop the `!` assertions at the worker-service GUARD 1 call site now that
  the predicate narrows.
- Tighten the captureProcessStartToken platform doc comment to enumerate
  process.platform values explicitly.

No behavior change — esbuild output is byte-identical (type-only edits).
Addresses items 1-3 of the claude-review comment on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin LC_ALL=C for `ps lstart=` in captureProcessStartToken

Without a locale pin, `ps -o lstart=` emits month/weekday names in the
system locale. A bind-mounted PID file written under one locale and read
under another would hash to different tokens and the live worker would
incorrectly appear stale — reintroducing the very bug this helper exists
to prevent.

Flagged by Greptile on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: address second-round review on PID identity helpers

- verifyPidFileOwnership: log a DEBUG diagnostic when the PID is alive but
  the start-token mismatches. Without it, callers can't distinguish the
  "process dead" path from the "PID reused" path in production logs — the
  exact case this helper exists to catch.
- writePidFile: drop the redundant `?? undefined` coercion. `null` and
  `undefined` are both falsy for the subsequent ternary, so the coercion
  was purely cosmetic noise that suggested an important distinction.
- Add a unit test for the win32 fallback path in captureProcessStartToken
  (mocks process.platform) — previously uncovered in CI.

Addresses items 1, 2, and 5 of the second claude-review on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 19:49:03 -07:00
..