mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-26 01:24:59 +02:00
* feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup
Prereq for the re-export-share Comtrade seeder (plan 2026-04-24-003),
usable by any cohort seeder whose consumer needs bundle-level freshness.
Two coupled changes:
1. `_bundle-runner.mjs` injects `BUNDLE_RUN_STARTED_AT_MS` into every
spawned child. All siblings in a single bundle run share one value
(captured at `runBundle` start, not spawn time). Consumers use this
to detect stale peer keys — if a peer's seed-meta predates the
current bundle run, fall back to a hard default rather than read
a cohort-peer's last-week output.
2. `_seed-utils.mjs::runSeed` registers a `process.once('SIGTERM')`
handler that releases the acquired lock and extends existing-data
TTL before exiting 143. `_bundle-runner.mjs` sends SIGTERM on
section timeout, then SIGKILL after KILL_GRACE_MS (5s). Without
this handler the `finally` path never runs on SIGKILL, leaving
the 30-min acquireLock reservation in place until its own TTL
expires — the next cron tick silently skips the resource.
Regression guard memory: `bundle-runner-sigkill-leaks-child-lock` (PR
#3128 root cause).
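
A minimal sketch of the cleanup path from item 2, under stated assumptions: `installSigtermCleanup` is a hypothetical name, and `releaseLock` / `extendExistingTtl` are injected as plain async callbacks because their real signatures in `_seed-utils.mjs` are not shown here. Running the two ops concurrently is a choice made for this sketch, to stay inside the runner's SIGKILL grace window.

```javascript
// Sketch only: the helper name and the injected callbacks are assumptions,
// not the actual _seed-utils.mjs API.
function installSigtermCleanup({
  releaseLock,
  extendExistingTtl,
  proc = process,
  exit = (code) => process.exit(code),
}) {
  const handler = async () => {
    // Promise.resolve().then(fn) turns a synchronous throw into a rejection,
    // and allSettled never rejects, so the exit below always runs.
    const results = await Promise.allSettled([
      Promise.resolve().then(releaseLock),
      Promise.resolve().then(extendExistingTtl),
    ]);
    const failed = results.filter((r) => r.status === 'rejected').length;
    console.error(`SIGTERM cleanup finished (${failed} op(s) failed)`);
    exit(143); // 128 + 15 (SIGTERM)
  };
  // `once` means a second SIGTERM does not re-enter the handler.
  proc.once('SIGTERM', handler);
  // Return a disposer so callers can detach the handler when it no longer applies.
  return () => proc.removeListener('SIGTERM', handler);
}
```

The `proc` and `exit` parameters exist only so the sketch can be exercised without signalling or terminating the real process.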
Tests added:
- bundle-runner env injection (value within run bounds)
- sibling sections share the same timestamp (critical for the
consumer freshness guard)
- runSeed SIGTERM path: exit 143 + cleanup log
- process.once contract: second SIGTERM does not re-enter handler
* fix(seed): address P1/P2 review findings on SIGTERM + bundle contracts
Addresses PR #3384 review findings (todos 256, 257, 259, 260):
#256 (P1) — SIGTERM handler narrowed to fetch phase only. Was installed
at runSeed entry and armed through every `process.exit` path; could
race `emptyDataIsFailure: true` strict-floor exits (IMF-External,
WB-bulk) and extend seed-meta TTL when the contract forbids it —
silently re-masking 30-day outages. Now the handler is attached
immediately before `withRetry(fetchFn)` and removed in a try/finally
that covers all fetch-phase exit branches.
#257 (P1) — `BUNDLE_RUN_STARTED_AT_MS` now has a first-class helper.
Exported `getBundleRunStartedAtMs()` from `_seed-utils.mjs` with JSDoc
describing the bundle-freshness contract. Fleet-wide helper so the
next consumer seeder imports instead of rediscovering the idiom.
#259 (P2) — SIGTERM cleanup runs `Promise.allSettled` on disjoint-key
ops (`releaseLock` + `extendExistingTtl`). Serialising compounded
Upstash latency during the exact failure mode (Redis degraded) this
handler exists to handle, risking breach of the 5s SIGKILL grace.
#260 (P2) — `_bundle-runner.mjs` asserts topological order on
optional `dependsOn` section field. Throws on unknown-label refs and
on deps appearing at a later index. Fleet-wide contract replacing
the previous prose-comment ordering guarantee.
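
The `dependsOn` contract in #260 can be sketched as a single pass over the sections array. This is an illustration, not the runner's code: `assertTopologicalOrder` is a hypothetical name, and the error wording is modelled only on the fragments the tests match against.

```javascript
// Sketch only: function name and full error wording are assumptions.
function assertTopologicalOrder(sections) {
  // Map each label to its position so deps can be checked in O(1).
  const indexByLabel = new Map(sections.map((s, i) => [s.label, i]));
  sections.forEach((section, i) => {
    for (const dep of section.dependsOn ?? []) {
      if (!indexByLabel.has(dep)) {
        throw new Error(`[${section.label}] dependsOn unknown label '${dep}'`);
      }
      const depIndex = indexByLabel.get(dep);
      // A dependency must sit at a strictly earlier index. This sketch also
      // rejects a section that depends on itself (depIndex === i).
      if (depIndex >= i) {
        throw new Error(
          `[${section.label}] dependsOn '${dep}' but '${dep}' is at index ${depIndex}`,
        );
      }
    }
  });
}
```

Because sections run in array order, validating index order up front is enough; no graph traversal is needed.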
Tests added/updated:
- New: SIGTERM handler removed after fetchFn completes (narrowed-scope
contract — post-fetch SIGTERM must NOT trigger TTL extension)
- New: dependsOn unknown-label + out-of-order + happy-path (3 tests)
Full test suite: 6,866 tests pass (+4 net).
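
The narrowed-scope contract from #256 might look roughly like the following. `fetchWithSigtermCleanup` is a hypothetical wrapper, and `fetchFn` stands in for the `withRetry(fetchFn)` call; the point is only that the handler is armed for the fetch phase and removed on every exit branch.

```javascript
// Sketch only: the wrapper name and the injectable `proc` are assumptions.
async function fetchWithSigtermCleanup(fetchFn, onSigterm, proc = process) {
  // Arm the handler immediately before the fetch phase begins.
  proc.once('SIGTERM', onSigterm);
  try {
    return await fetchFn();
  } finally {
    // Disarm on success and on failure, so a later SIGTERM (for example
    // during a strict-floor exit) cannot trigger a TTL extension.
    proc.removeListener('SIGTERM', onSigterm);
  }
}
```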
* fix(seed): getBundleRunStartedAtMs returns null outside a bundle run
Review follow-up: the earlier `Math.floor(Date.now()/1000)*1000` fallback
regressed standalone (non-bundle) runs. A consumer seeder invoked
manually just after its peer wrote `fetchedAt = (now - 5s)` would see
`bundleStartMs = Date.now()`, reject the perfectly-fresh peer envelope
as "stale", and fall back to defaults — defeating the point of the
peer-read path outside the bundle.
Returning null when `BUNDLE_RUN_STARTED_AT_MS` is unset/invalid keeps
the freshness gate scoped to its real purpose (across-bundle-tick
staleness) and lets standalone runs skip the gate entirely. Consumers
check `bundleStartMs != null` before applying the comparison; see the
companion `seed-sovereign-wealth.mjs` change on the stacked PR.
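
A sketch of the null-returning helper and the consumer-side check described above. The body of `getBundleRunStartedAtMs` and its `env` parameter are assumptions (only the name and the null contract come from this change), and `isPeerFresh` is a hypothetical consumer helper.

```javascript
// Sketch only: the `env` parameter and the validation rules are assumptions.
function getBundleRunStartedAtMs(env = process.env) {
  const raw = env.BUNDLE_RUN_STARTED_AT_MS;
  if (raw === undefined || raw === '') return null;
  const ms = Number(raw);
  // Unset or invalid means "not inside a bundle run", never "now".
  return Number.isSafeInteger(ms) && ms > 0 ? ms : null;
}

// Hypothetical consumer-side gate: a peer envelope counts as fresh when it
// was written during the current bundle run.
function isPeerFresh(peerFetchedAtMs, bundleStartMs) {
  if (bundleStartMs == null) return true; // standalone run: skip the gate
  return peerFetchedAtMs >= bundleStartMs;
}
```

With this shape, a manual run just after a peer wrote its envelope accepts the peer data, while a bundle run rejects anything written before the bundle started.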
* test(seed): SIGTERM cleanup test now verifies Redis DEL + EXPIRE calls
Greptile review P2 on PR #3384: the existing test only asserted exit
code + log line, not that the Redis ops were actually issued. The
log claim was ahead of the test.
Fixture now logs every Upstash fetch call's shape (EVAL / pipeline-
EXPIRE / other) to stderr. Test asserts:
- >=1 EVAL op was issued during SIGTERM cleanup (releaseLock Lua
script on the lock key)
- >=1 pipeline-EXPIRE op was issued (extendExistingTtl on canonical
+ seed-meta keys)
- The EVAL body carries the runSeed-generated runId (proves it's
THIS run's release, not a phantom op)
- The EXPIRE pipeline touches both the canonicalKey AND the
seed-meta key (proves the keys[] array was built correctly
including the extraKeys merge path)
Full test suite: 6,866 tests pass, typecheck clean.
230 lines
9.3 KiB
JavaScript
// Verifies _bundle-runner.mjs streams child stdio live, reports timeout with
// a clear reason, and escalates SIGTERM → SIGKILL when a child ignores SIGTERM.
//
// Uses a real spawn of a small bundle against ephemeral scripts under scripts/
// because the runner joins __dirname with section.script.

import { test } from 'node:test';
import assert from 'node:assert/strict';
import { spawn } from 'node:child_process';
import { writeFileSync, unlinkSync } from 'node:fs';
import { join } from 'node:path';
import { fileURLToPath } from 'node:url';

// fileURLToPath decodes percent-escapes and handles Windows drive letters,
// which URL#pathname does not.
const SCRIPTS_DIR = fileURLToPath(new URL('../scripts/', import.meta.url));

// Writes a throwaway entry script that calls runBundle with the given
// sections, spawns it, and resolves with its exit code and captured output.
function runBundleWith(sections, opts = {}) {
  const runPath = join(SCRIPTS_DIR, '_bundle-runner-test-run.mjs');
  writeFileSync(
    runPath,
    `import { runBundle } from './_bundle-runner.mjs';\nawait runBundle('test', ${JSON.stringify(
      sections,
    )}, ${JSON.stringify(opts)});\n`,
  );
  return new Promise((resolve) => {
    const child = spawn(process.execPath, [runPath], { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (c) => { stdout += c; });
    child.stderr.on('data', (c) => { stderr += c; });
    child.on('close', (code) => {
      try { unlinkSync(runPath); } catch {}
      resolve({ code, stdout, stderr });
    });
  });
}

// Writes a fixture script under scripts/ and returns a best-effort cleanup function.
function writeFixture(name, body) {
  const path = join(SCRIPTS_DIR, name);
  writeFileSync(path, body);
  return () => { try { unlinkSync(path); } catch {} };
}

test('streams child stdout live and reports Done on success', async () => {
  const cleanup = writeFixture(
    '_bundle-fixture-fast.mjs',
    `console.log('line-one'); console.log('line-two');\n`,
  );
  try {
    const { code, stdout } = await runBundleWith([
      { label: 'FAST', script: '_bundle-fixture-fast.mjs', intervalMs: 1, timeoutMs: 5000 },
    ]);
    assert.equal(code, 0);
    assert.match(stdout, /\[FAST\] line-one/);
    assert.match(stdout, /\[FAST\] line-two/);
    assert.match(stdout, /\[FAST\] Done \(/);
    assert.match(stdout, /\[Bundle:test\] Finished .* ran:1/);
  } finally {
    cleanup();
  }
});

test('timeout emits terminal reason BEFORE SIGTERM/SIGKILL grace (survives container kill)', async () => {
  const cleanup = writeFixture(
    '_bundle-fixture-hang.mjs',
    // Ignore SIGTERM so the runner must SIGKILL.
    `process.on('SIGTERM', () => {}); console.log('hung'); setInterval(() => {}, 1000);\n`,
  );
  try {
    const t0 = Date.now();
    const { code, stdout, stderr } = await runBundleWith([
      { label: 'HANG', script: '_bundle-fixture-hang.mjs', intervalMs: 1, timeoutMs: 1000 },
    ]);
    const elapsedMs = Date.now() - t0;
    assert.equal(code, 1, 'bundle must exit non-zero on failure');
    const combined = stdout + stderr;
    assert.match(combined, /\[HANG\] hung/, 'child stdout should stream before kill');
    // Critical: terminal "Failed ... timeout" line must appear in-line with the
    // SIGTERM send, not after SIGKILL; this is what survives a container kill
    // landing inside the 10s grace window.
    const failIdx = combined.indexOf('Failed after');
    const sigkillIdx = combined.indexOf('SIGKILL');
    assert.ok(failIdx >= 0, 'must emit Failed line');
    assert.ok(sigkillIdx > failIdx, 'Failed line must precede SIGKILL escalation');
    assert.match(combined, /Failed after .*s: timeout after 1s — sending SIGTERM/);
    assert.match(combined, /Did not exit on SIGTERM.*SIGKILL/);
    // 1s timeout + 10s SIGTERM grace + overhead; cap well above that to avoid flake.
    assert.ok(elapsedMs < 20_000, `timeout escalation took ${elapsedMs}ms — too slow`);
  } finally {
    cleanup();
  }
});

test('budget check accounts for SIGKILL grace when deferring', async () => {
  const cleanup = writeFixture(
    '_bundle-fixture-sleep.mjs',
    `console.log('ok');\n`,
  );
  try {
    // timeoutMs (15s) + grace (10s) = 25s worst-case. Budget 20s must defer.
    const { code, stdout } = await runBundleWith(
      [{ label: 'GATED', script: '_bundle-fixture-sleep.mjs', intervalMs: 1, timeoutMs: 15_000 }],
      { maxBundleMs: 20_000 },
    );
    assert.equal(code, 0, 'deferred sections are not failures');
    assert.match(stdout, /\[GATED\] Deferred, needs 25s \(timeout\+grace\)/);
    assert.match(stdout, /deferred:1/);
  } finally {
    cleanup();
  }
});

test('non-zero exit without timeout reports exit code', async () => {
  const cleanup = writeFixture(
    '_bundle-fixture-fail.mjs',
    `console.error('boom'); process.exit(2);\n`,
  );
  try {
    const { code, stdout, stderr } = await runBundleWith([
      { label: 'FAIL', script: '_bundle-fixture-fail.mjs', intervalMs: 1, timeoutMs: 5000 },
    ]);
    assert.equal(code, 1);
    const combined = stdout + stderr;
    assert.match(combined, /\[FAIL\] boom/);
    assert.match(combined, /Failed after .*s: exit 2/);
  } finally {
    cleanup();
  }
});

test('injects BUNDLE_RUN_STARTED_AT_MS env into child; value is within run bounds', async () => {
  const cleanup = writeFixture(
    '_bundle-fixture-env.mjs',
    `console.log('BUNDLE_RUN_STARTED_AT_MS=' + process.env.BUNDLE_RUN_STARTED_AT_MS);\n`,
  );
  const before = Date.now();
  try {
    const { code, stdout } = await runBundleWith([
      { label: 'ENV', script: '_bundle-fixture-env.mjs', intervalMs: 1, timeoutMs: 5000 },
    ]);
    const after = Date.now();
    assert.equal(code, 0);
    const match = stdout.match(/BUNDLE_RUN_STARTED_AT_MS=(\d+)/);
    assert.ok(match, `expected env var in child stdout; got:\n${stdout}`);
    const injected = Number(match[1]);
    assert.ok(Number.isInteger(injected), 'injected value must be an integer');
    // The runner captures t0 at bundle start, which happens after this test
    // reads `before`, and the child ran before `after`. The 5s tolerance is
    // extra slack, so: before - tolerance <= injected <= after.
    assert.ok(injected >= before - 5000 && injected <= after,
      `injected=${injected} out of bounds [${before - 5000}, ${after}]`);
  } finally {
    cleanup();
  }
});

test('sibling sections share the same BUNDLE_RUN_STARTED_AT_MS (one-shot per bundle)', async () => {
  const cleanupA = writeFixture(
    '_bundle-fixture-env-a.mjs',
    `console.log('TS_A=' + process.env.BUNDLE_RUN_STARTED_AT_MS);\n`,
  );
  const cleanupB = writeFixture(
    '_bundle-fixture-env-b.mjs',
    `await new Promise((r) => setTimeout(r, 200));\nconsole.log('TS_B=' + process.env.BUNDLE_RUN_STARTED_AT_MS);\n`,
  );
  try {
    const { code, stdout } = await runBundleWith([
      { label: 'A', script: '_bundle-fixture-env-a.mjs', intervalMs: 1, timeoutMs: 5000 },
      { label: 'B', script: '_bundle-fixture-env-b.mjs', intervalMs: 1, timeoutMs: 5000 },
    ]);
    assert.equal(code, 0);
    const tsA = Number(stdout.match(/TS_A=(\d+)/)?.[1]);
    const tsB = Number(stdout.match(/TS_B=(\d+)/)?.[1]);
    assert.ok(tsA && tsB, `both timestamps present; stdout:\n${stdout}`);
    // Both children read the same bundle-level t0, so the injected value is
    // identical across siblings (NOT spawn time). This is the critical
    // property Phase 2's bundle-freshness guard relies on.
    assert.equal(tsA, tsB, 'siblings must share one bundle-level timestamp');
  } finally {
    cleanupA();
    cleanupB();
  }
});

test('dependsOn: throws when a dep appears later in the sections array', async () => {
  // Consumer (depends on Producer) is at index 0, which violates the contract.
  const cleanupC = writeFixture('_bundle-fixture-dep-consumer.mjs', `console.log('consumer');\n`);
  const cleanupP = writeFixture('_bundle-fixture-dep-producer.mjs', `console.log('producer');\n`);
  try {
    const { code, stderr } = await runBundleWith([
      { label: 'Consumer', script: '_bundle-fixture-dep-consumer.mjs', intervalMs: 1, timeoutMs: 5000, dependsOn: ['Producer'] },
      { label: 'Producer', script: '_bundle-fixture-dep-producer.mjs', intervalMs: 1, timeoutMs: 5000 },
    ]);
    assert.notEqual(code, 0, 'out-of-order dependsOn must cause non-zero exit');
    assert.match(stderr, /dependsOn 'Producer' but 'Producer' is at index 1/,
      `expected topological violation error; stderr:\n${stderr}`);
  } finally {
    cleanupC();
    cleanupP();
  }
});

test('dependsOn: passes when deps appear earlier in the sections array', async () => {
  const cleanupP = writeFixture('_bundle-fixture-dep-producer-ok.mjs', `console.log('producer');\n`);
  const cleanupC = writeFixture('_bundle-fixture-dep-consumer-ok.mjs', `console.log('consumer');\n`);
  try {
    const { code, stdout } = await runBundleWith([
      { label: 'Producer', script: '_bundle-fixture-dep-producer-ok.mjs', intervalMs: 1, timeoutMs: 5000 },
      { label: 'Consumer', script: '_bundle-fixture-dep-consumer-ok.mjs', intervalMs: 1, timeoutMs: 5000, dependsOn: ['Producer'] },
    ]);
    assert.equal(code, 0);
    assert.match(stdout, /\[Producer\] producer/);
    assert.match(stdout, /\[Consumer\] consumer/);
  } finally {
    cleanupP();
    cleanupC();
  }
});

test('dependsOn: throws on unknown label reference', async () => {
  const cleanup = writeFixture('_bundle-fixture-dep-orphan.mjs', `console.log('orphan');\n`);
  try {
    const { code, stderr } = await runBundleWith([
      { label: 'Orphan', script: '_bundle-fixture-dep-orphan.mjs', intervalMs: 1, timeoutMs: 5000, dependsOn: ['DoesNotExist'] },
    ]);
    assert.notEqual(code, 0);
    assert.match(stderr, /dependsOn unknown label 'DoesNotExist'/,
      `expected unknown-label error; stderr:\n${stderr}`);
  } finally {
    cleanup();
  }
});