[codex] Harden heartbeat scheduling and runtime controls (#4223)

## Thinking Path

> - Paperclip orchestrates AI agents through issue checkout, heartbeat
runs, routines, and auditable control-plane state
> - The runtime path has to recover from lost local processes, transient
adapter failures, blocked dependencies, and routine coalescing without
stranding work
> - The existing branch carried several reliability fixes across
heartbeat scheduling, issue runtime controls, routine dispatch, and
operator-facing run state
> - These changes belong together because they share backend contracts,
migrations, and runtime status semantics
> - This pull request groups the control-plane/runtime slice so it can
merge independently from board UI polish and adapter sandbox work
> - The benefit is safer heartbeat recovery, clearer runtime controls,
and more predictable recurring execution behavior

## What Changed

- Adds bounded heartbeat retry scheduling, scheduled retry state, and
Codex transient failure recovery handling.
- Tightens heartbeat process recovery, blocker wake behavior, issue
comment wake handling, routine dispatch coalescing, and
activity/dashboard bounds.
- Adds runtime-control MCP tools and Paperclip skill docs for issue
workspace runtime management.
- Adds migrations `0061_lively_thor_girl.sql` and
`0062_routine_run_dispatch_fingerprint.sql`.
- Surfaces retry state in run ledger/agent UI and keeps related shared
types synchronized.

## Verification

- `pnpm exec vitest run
server/src/__tests__/heartbeat-retry-scheduling.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
server/src/__tests__/routines-service.test.ts`
- `pnpm exec vitest run src/tools.test.ts` from `packages/mcp-server`

## Risks

- Medium risk: this touches heartbeat recovery and routine dispatch,
which are central execution paths.
- Migration order matters if split branches land out of order: merge
this PR before branches that assume the new runtime/routine fields.
- Runtime retry behavior should be watched in CI and in local operator
smoke tests because it changes how transient failures are resumed.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent runtime, shell/git tool use
enabled. Exact hosted model build and context window are not exposed in
this Paperclip heartbeat environment.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
This commit is contained in:
Dotta
2026-04-21 12:24:11 -05:00
committed by GitHub
parent ab9051b595
commit 09d0678840
61 changed files with 17622 additions and 456 deletions

View File

@@ -24,7 +24,6 @@ COMMITS=$(git log --since="${DATE}T00:00:00" --until="${NEXT_DATE}T00:00:00" mas
json_escape() {
python3 -c 'import json, sys; print(json.dumps(sys.stdin.read().rstrip("\n"))[1:-1])'
}
if [[ -z "$COMMITS" ]]; then
PAYLOAD=$(cat <<ENDJSON
{

65
scripts/kill-vitest.sh Executable file
View File

@@ -0,0 +1,65 @@
#!/usr/bin/env bash
#
# Kill all running vitest processes.
#
# Usage:
# scripts/kill-vitest.sh # kill all
# scripts/kill-vitest.sh --dry # preview what would be killed
#
set -euo pipefail
DRY_RUN=false
if [[ "${1:-}" == "--dry" || "${1:-}" == "--dry-run" || "${1:-}" == "-n" ]]; then
DRY_RUN=true
fi
pids=()
lines=()
while IFS= read -r line; do
[[ -z "$line" ]] && continue
pid=$(echo "$line" | awk '{print $2}')
pids+=("$pid")
lines+=("$line")
done < <(ps aux | grep -E '(^|/)(vitest|node .*/vitest)( |$)|/\.bin/vitest|vitest/dist|vitest\.mjs' | grep -v grep || true)
if [[ ${#pids[@]} -eq 0 ]]; then
echo "No vitest processes found."
exit 0
fi
echo "Found ${#pids[@]} vitest process(es):"
echo ""
for i in "${!pids[@]}"; do
line="${lines[$i]}"
pid=$(echo "$line" | awk '{print $2}')
start=$(echo "$line" | awk '{print $9}')
cmd=$(echo "$line" | awk '{for(i=11;i<=NF;i++) printf "%s ", $i; print ""}')
cmd=$(echo "$cmd" | sed "s|$HOME/||g")
printf " PID %-7s started %-10s %s\n" "$pid" "$start" "$cmd"
done
echo ""
if [[ "$DRY_RUN" == true ]]; then
echo "Dry run — re-run without --dry to kill these processes."
exit 0
fi
echo "Sending SIGTERM..."
for pid in "${pids[@]}"; do
kill -TERM "$pid" 2>/dev/null && echo " signaled $pid" || echo " $pid already gone"
done
sleep 2
for pid in "${pids[@]}"; do
if kill -0 "$pid" 2>/dev/null; then
echo " $pid still alive, sending SIGKILL..."
kill -KILL "$pid" 2>/dev/null || true
fi
done
echo "Done."