Files
worldmonitor/tests/resilience-cache-keys-health-sync.test.mts
Elie Habib 8cca8d19e3 feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read (#3385)
* feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup

Prereq for the re-export-share Comtrade seeder (plan 2026-04-24-003),
usable by any cohort seeder whose consumer needs bundle-level freshness.

Two coupled changes:

1. `_bundle-runner.mjs` injects `BUNDLE_RUN_STARTED_AT_MS` into every
   spawned child. All siblings in a single bundle run share one value
   (captured at `runBundle` start, not spawn time). Consumers use this
   to detect stale peer keys — if a peer's seed-meta predates the
   current bundle run, fall back to a hard default rather than read
   a cohort-peer's last-week output.

2. `_seed-utils.mjs::runSeed` registers a `process.once('SIGTERM')`
   handler that releases the acquired lock and extends existing-data
   TTL before exiting 143. `_bundle-runner.mjs` sends SIGTERM on
   section timeout, then SIGKILL after KILL_GRACE_MS (5s). Without
   this handler the `finally` path never runs on SIGKILL, leaving
   the 30-min acquireLock reservation in place until its own TTL
   expires — the next cron tick silently skips the resource.
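The env-injection idiom from change 1 can be sketched as follows. This is a minimal sketch under assumed names (`buildBundleEnvs` is hypothetical; the real `_bundle-runner.mjs` internals may differ): the key property is that the timestamp is captured once, at bundle start, so every sibling child sees the same value.

```javascript
// Hypothetical sketch: capture the bundle-run timestamp ONCE and inject it
// into every sibling section's child env. Real _bundle-runner.mjs may differ.
function buildBundleEnvs(sectionLabels, baseEnv = process.env) {
  // Captured here, at bundle start, not at each spawn: siblings must agree.
  const startedAtMs = String(Date.now());
  return sectionLabels.map((label) => ({
    label,
    env: { ...baseEnv, BUNDLE_RUN_STARTED_AT_MS: startedAtMs },
  }));
}
```

Consumers can then compare a peer's seed-meta `fetchedAt` against this shared value to detect stale cohort output.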

Regression guard memory: `bundle-runner-sigkill-leaks-child-lock` (PR
#3128 root cause).

Tests added:
- bundle-runner env injection (value within run bounds)
- sibling sections share the same timestamp (critical for the
  consumer freshness guard)
- runSeed SIGTERM path: exit 143 + cleanup log
- process.once contract: second SIGTERM does not re-enter handler

* fix(seed): address P1/P2 review findings on SIGTERM + bundle contracts

Addresses PR #3384 review findings (todos 256, 257, 259, 260):

#256 (P1) — SIGTERM handler narrowed to fetch phase only. Was installed
at runSeed entry and armed through every `process.exit` path; could
race `emptyDataIsFailure: true` strict-floor exits (IMF-External,
WB-bulk) and extend seed-meta TTL when the contract forbids it —
silently re-masking 30-day outages. Now the handler is attached
immediately before `withRetry(fetchFn)` and removed in a try/finally
that covers all fetch-phase exit branches.
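The narrowed attach/remove pattern can be sketched as below. Names are illustrative (`fetchWithSigtermCleanup` and the `cleanup` callback are hypothetical stand-ins for the real `runSeed` wiring):

```javascript
// Hypothetical sketch of the narrowed SIGTERM scope from #256.
async function fetchWithSigtermCleanup(fetchFn, cleanup) {
  const onSigterm = () => {
    // Release the lock and extend existing-data TTLs, then exit with the
    // conventional SIGTERM exit code.
    Promise.resolve(cleanup()).finally(() => process.exit(143));
  };
  process.once('SIGTERM', onSigterm);
  try {
    // The handler is live ONLY for the duration of the fetch phase.
    return await fetchFn();
  } finally {
    // Covers every fetch-phase exit branch: a post-fetch SIGTERM must
    // NOT trigger TTL extension.
    process.removeListener('SIGTERM', onSigterm);
  }
}
```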

#257 (P1) — `BUNDLE_RUN_STARTED_AT_MS` now has a first-class helper.
Exported `getBundleRunStartedAtMs()` from `_seed-utils.mjs` with JSDoc
describing the bundle-freshness contract. Fleet-wide helper so the
next consumer seeder imports instead of rediscovering the idiom.

#259 (P2) — SIGTERM cleanup runs `Promise.allSettled` on disjoint-key
ops (`releaseLock` + `extendExistingTtl`). Serialising them compounded
Upstash latency during the exact failure mode (Redis degraded) that
this handler exists to handle, risking a breach of the 5s SIGKILL grace.
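A minimal sketch of the parallel cleanup, assuming `releaseLock` and `extendExistingTtl` stand in for the real `_seed-utils.mjs` helpers. The two ops touch disjoint key sets, so running them concurrently is safe, and `allSettled` never rejects, so one failure cannot abort the other:

```javascript
// Hypothetical sketch: both cleanup ops in parallel, tolerating either failing.
async function sigtermCleanup(releaseLock, extendExistingTtl) {
  const [lockResult, ttlResult] = await Promise.allSettled([
    releaseLock(),
    extendExistingTtl(),
  ]);
  return {
    lockOk: lockResult.status === 'fulfilled',
    ttlOk: ttlResult.status === 'fulfilled',
  };
}
```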

#260 (P2) — `_bundle-runner.mjs` asserts topological order on
optional `dependsOn` section field. Throws on unknown-label refs and
on deps appearing at a later index. Fleet-wide contract replacing
the previous prose-comment ordering guarantee.
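The assertion can be sketched as below, assuming a section shape of `{ label, dependsOn? }` (the real `_bundle-runner.mjs` checks may carry more context in their error messages):

```javascript
// Hypothetical sketch of the #260 topological-order assertion.
function assertTopologicalOrder(sections) {
  const indexByLabel = new Map(sections.map((s, i) => [s.label, i]));
  sections.forEach((section, i) => {
    for (const dep of section.dependsOn ?? []) {
      if (!indexByLabel.has(dep)) {
        throw new Error(`${section.label}: unknown dependsOn label '${dep}'`);
      }
      if (indexByLabel.get(dep) >= i) {
        // A dependency must already have run when this section starts.
        throw new Error(`${section.label}: dependency '${dep}' must appear at an earlier index`);
      }
    }
  });
}
```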

Tests added/updated:
- New: SIGTERM handler removed after fetchFn completes (narrowed-scope
  contract — post-fetch SIGTERM must NOT trigger TTL extension)
- New: dependsOn unknown-label + out-of-order + happy-path (3 tests)

Full test suite: 6,866 tests pass (+4 net).

* fix(seed): getBundleRunStartedAtMs returns null outside a bundle run

Review follow-up: the earlier `Math.floor(Date.now()/1000)*1000` fallback
regressed standalone (non-bundle) runs. A consumer seeder invoked
manually just after its peer wrote `fetchedAt = (now - 5s)` would see
`bundleStartMs = Date.now()`, reject the perfectly-fresh peer envelope
as "stale", and fall back to defaults — defeating the point of the
peer-read path outside the bundle.

Returning null when `BUNDLE_RUN_STARTED_AT_MS` is unset/invalid keeps
the freshness gate scoped to its real purpose (across-bundle-tick
staleness) and lets standalone runs skip the gate entirely. Consumers
check `bundleStartMs != null` before applying the comparison; see the
companion `seed-sovereign-wealth.mjs` change on the stacked PR.
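The null-outside-a-bundle contract can be sketched as below (the real helper in `_seed-utils.mjs` carries JSDoc describing the bundle-freshness contract; this version takes an env object for testability):

```javascript
// Hypothetical sketch: null when not in a bundle run, so consumers can
// skip the freshness gate entirely on standalone invocations.
function getBundleRunStartedAtMs(env = process.env) {
  const raw = env.BUNDLE_RUN_STARTED_AT_MS;
  if (raw === undefined || raw === '') return null; // standalone run: no gate
  const ms = Number(raw);
  // Unparseable or non-positive values are treated as "not in a bundle"
  // rather than poisoning the freshness comparison.
  return Number.isFinite(ms) && ms > 0 ? ms : null;
}
```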

* test(seed): SIGTERM cleanup test now verifies Redis DEL + EXPIRE calls

Greptile review P2 on PR #3384: the existing test only asserted exit
code + log line, not that the Redis ops were actually issued. The
log claim was ahead of the test.

Fixture now logs every Upstash fetch call's shape (EVAL / pipeline-
EXPIRE / other) to stderr. Test asserts:

- >=1 EVAL op was issued during SIGTERM cleanup (releaseLock Lua
  script on the lock key)
- >=1 pipeline-EXPIRE op was issued (extendExistingTtl on canonical
  + seed-meta keys)
- The EVAL body carries the runSeed-generated runId (proves it's
  THIS run's release, not a phantom op)
- The EXPIRE pipeline touches both the canonicalKey AND the
  seed-meta key (proves the keys[] array was built correctly
  including the extraKeys merge path)

Full test suite: 6,866 tests pass, typecheck clean.

* feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read

Plan ref: docs/plans/2026-04-24-003-feat-reexport-share-comtrade-seeder-plan.md

Motivating case. Before this PR, the SWF `rawMonths` denominator for
the `sovereignFiscalBuffer` dimension used GROSS annual imports for
every country. For re-export hubs (goods transiting without domestic
settlement), this structurally under-reports resilience: UAE's 2023
$941B of imports include $334B of transit flow that never represents
domestic consumption. Net imports = gross × (1 − reexport_share).

The previous (PR 3A) design flattened a hand-curated YAML into Redis;
the YAML shipped empty and never populated, so the correction never
applied and the cohort audit showed no movement.

Gap #2 (this PR). Two coupled changes to make the correction actually
apply:

1. Comtrade-backed seeder (`scripts/seed-recovery-reexport-share.mjs`).
   Rewritten to fetch UN Comtrade `flowCode=RX` (re-exports) and
   `flowCode=M` (imports) per cohort member, compute share = RX/M at
   the latest co-populated year, clamp to [0.05, 0.95], publish the
   envelope. Header auth (`Ocp-Apim-Subscription-Key`) — subscription
   key never reaches URL/logs/Redis. `maxRecords=250000` cap with
   truncation detection. Sequential + retry-on-429 with backoff.

   Hub cohort resolved by Phase 0 empirical probe (plan §Phase 0):
   ['AE', 'PA']. Six candidates (SG/HK/NL/BE/MY/LT) return HTTP 200
   with zero RX rows — Comtrade doesn't expose RX for those reporters.

2. SWF seeder reads from Redis (`scripts/seed-sovereign-wealth.mjs`).
   Swaps `loadReexportShareByCountry()` (YAML) for
   `loadReexportShareFromRedis()` (Redis key written by #1). Guarded
   by bundle-run freshness: if the sibling Reexport-Share seeder's
   `seed-meta` predates `BUNDLE_RUN_STARTED_AT_MS` (set by the
   prereq PR's `_bundle-runner.mjs` env-injection), HARD fallback
   to gross imports rather than apply last-month's stale share.
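The share computation in change 1 can be sketched as follows, assuming per-year flow totals have already been summed from the Comtrade responses (`computeShareFromFlows` is named after the real helper, but its exact signature here is an assumption):

```javascript
// Hypothetical sketch: share = RX / M at the latest co-populated year,
// clamped to [0.05, 0.95].
function computeShareFromFlows(rxByYear, importsByYear, { min = 0.05, max = 0.95 } = {}) {
  // Latest year where BOTH flows are populated and the denominator is non-zero.
  const years = Object.keys(rxByYear)
    .filter((y) => importsByYear[y] > 0)
    .map(Number)
    .sort((a, b) => b - a);
  if (years.length === 0) return null;
  const year = years[0];
  const share = rxByYear[year] / importsByYear[year]; // share = RX / M
  return { year, share: Math.min(max, Math.max(min, share)) }; // clamp
}
```

With the UAE figures above (RX ≈ $334B against M ≈ $941B in 2023), the derived share is roughly 0.35, well inside the clamp bounds.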

Health registries. Both new keys registered in BOTH `api/health.js`
SEED_META (60-day alert threshold) and `api/seed-health.js`
SEED_DOMAINS (43200min interval). feedback_two_health_endpoints_must_match.

Bundle wiring. `seed-bundle-resilience-recovery` Reexport-Share
timeout bumped 60s → 300s (Comtrade + retry can take 2-3 min
worst-case). Ordering preserved: Reexport-Share before Sovereign-
Wealth so the SWF seeder reads a freshly-written key in the same
cron tick.

Deletions. YAML + loader + 7 obsolete loader tests removed; single
source of truth is now Comtrade → Redis.

Prereq. Stacks on PR #3384 (feat/bundle-runner-env-sigterm)
which adds BUNDLE_RUN_STARTED_AT_MS env injection + runSeed
SIGTERM cleanup. This PR's bundle-freshness guard depends on
that env variable.

Tests (19 new, 7 deleted, +12 net):
- Pure math: parseComtradeFlowResponse, computeShareFromFlows,
  clampShare, declareRecords + credential-leak source scan (15)
- Integration (Gap #2 regression guards): SWF seeder
  loadReexportShareFromRedis — fresh/absent/malformed/stale-meta/missing-meta (5)
- Health registry dual-registry drift guard — scoped to this PR's
  keys, respecting pre-existing asymmetry (4)
- Bundle-ordering + timeout assertions (2)

Phase 0 cohort validation committed to plan. Full test suite
passes: 6,881 tests.

* fix(resilience): address P1/P2 review findings — adopt shared helpers, pin freshness boundary

Addresses PR #3385 review findings:

#257 (P1) consumer — `seed-sovereign-wealth.mjs` imports the shared
`getBundleRunStartedAtMs` helper from `_seed-utils.mjs` (added in the
prereq commit) instead of its own `getBundleStartMs`. Single source of
truth for the bundle-freshness contract.

#258 (P2) — `seed-recovery-reexport-share.mjs` isMain guard uses the
canonical `pathToFileURL(process.argv[1]).href === import.meta.url`
form instead of basename-suffix matching. Handles symlinks, case-
different paths on macOS HFS+, and Windows path separators without
string munging.
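A sketch of the canonical check, wrapped so the caller passes its own `import.meta.url` (the wrapper name `isMain` and its signature are illustrative):

```javascript
// Hypothetical sketch of the #258 isMain guard. Call as
// isMain(import.meta.url) from the script itself.
import { pathToFileURL } from 'node:url';

function isMain(metaUrl, argv1 = process.argv[1]) {
  // pathToFileURL normalises separators and percent-encoding, so the
  // comparison avoids fragile basename/suffix string munging.
  return argv1 !== undefined && pathToFileURL(argv1).href === metaUrl;
}
```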

#260 (P2) consumer — Sovereign-Wealth declares `dependsOn:
['Reexport-Share']` in the bundle spec. `_bundle-runner.mjs` (prereq
commit) now enforces topological order on load and throws on
violation — replaces the previous prose-comment ordering contract.

#261 (P2) — added a test to `tests/seed-sovereign-wealth-reads-redis-
reexport-share.test.mts` pinning the inclusive-boundary semantic:
`fetchedAtMs === bundleStartMs` must be treated as FRESH. Guards
against a future refactor to `<=` that would silently reject peers
writing at the very first millisecond of the bundle run.

Rebased onto updated prereq. Full test suite: 6,886 tests pass (+5 net).

* fix(resilience): freshness gate skipped in standalone mode; meta still required

Review catch: the previous `bundleStartMs = Date.now()` fallback made
standalone/manual `seed-sovereign-wealth.mjs` runs ALWAYS reject any
previously-seeded re-export-share meta as "stale" — even when the
operator ran the Reexport seeder milliseconds beforehand. Defeated
the point of the peer-read path outside the bundle.

With `getBundleRunStartedAtMs()` now returning null outside a bundle
(companion commit on the prereq branch), the consumer only applies
the freshness gate when `bundleStartMs != null`. Standalone runs
accept any `fetchedAt` — the operator is responsible for ordering.

Two guards survive the change:
- Meta MUST exist (absence = peer-outage fail-safe, both modes)
- In-bundle: meta MUST be at or after `BUNDLE_RUN_STARTED_AT_MS`
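Both guards, plus the #261 inclusive boundary, can be sketched as one predicate. The name `isPeerMetaUsable` and the `{ fetchedAtMs }` meta shape are illustrative; the real `seed-sovereign-wealth.mjs` logic is wired through its Redis loader:

```javascript
// Hypothetical sketch of the consumer's freshness gate.
function isPeerMetaUsable(meta, bundleStartMs) {
  if (meta == null) return false; // peer-outage fail-safe, both modes
  if (bundleStartMs == null) return true; // standalone run: gate skipped
  // Inclusive boundary: a peer writing at the very first millisecond of
  // the bundle run is FRESH.
  return meta.fetchedAtMs >= bundleStartMs;
}
```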

Two new tests pin both modes:
- standalone: accepts meta written 10 min before this process started
- standalone: still rejects missing meta (peer-outage fail-safe
  survives gate bypass)

Rebased onto updated prereq. Full test suite: 6,888 tests (+2 net).

* fix(resilience): filter world-aggregate Comtrade rows + skip final-retry sleep

Greptile review of PR #3385 flagged two P2s in the Comtrade seeder.

Finding #3 (parseComtradeFlowResponse double-count risk):
`cmdCode=TOTAL` without a partner filter currently returns only
world-aggregate rows in practice — but `parseComtradeFlowResponse`
summed every row unconditionally. A future refactor adding per-
partner querying would silently double-count (world-aggregate row +
partner-level rows for the same year), cutting the derived share in
half with no test signal.

Fix: explicit `partnerCode ∈ {'0', 0, null/undefined}` filter. Matches
current empirical behavior (aggregate-only responses) and makes the
construct robust to a future partner-level query.

Finding #4 (wasted backoff on final retry):
429 and 5xx branches slept `backoffMs` before `continue`, but on
`attempt === RETRY_MAX_ATTEMPTS` the loop condition fails immediately
after — the sleep was pure waste. Added early-return (parallel to the
existing pattern in the network-error catch branch) so the final
attempt exits the retry loop at the first non-success response
without extra latency.
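The retry shape with the fix can be sketched as below; `withRetryOn429` and its options are illustrative, not the seeder's real API:

```javascript
// Hypothetical sketch of Finding #4: never sleep before exiting the loop.
async function withRetryOn429(fetchFn, { maxAttempts = 3, backoffMs = 1000, sleep } = {}) {
  const doSleep = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let last;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    last = await fetchFn(attempt);
    if (last.status === 200) return last;
    // Final attempt: return immediately; a backoff sleep here would be
    // pure wasted latency, since the loop is about to exit anyway.
    if (attempt === maxAttempts) break;
    if (last.status === 429 || last.status >= 500) await doSleep(backoffMs * attempt);
  }
  return last;
}
```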

Tests:
- 3 new `parseComtradeFlowResponse` variants: world-only filter,
  numeric-0 partnerCode shape, rows without partnerCode field
- Existing tests updated: the double-count assertion replaced with
  a "per-partner rows must NOT sum into the world-aggregate total"
  assertion that pins the new contract

Rebased onto updated prereq. Full test suite: 6,890 tests (+2 net).
2026-04-25 00:14:17 +04:00


import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import { readFileSync } from 'node:fs';
import { dirname, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import {
  RESILIENCE_SCORE_CACHE_PREFIX,
  RESILIENCE_RANKING_CACHE_KEY,
  RESILIENCE_HISTORY_KEY_PREFIX,
} from '../server/worldmonitor/resilience/v1/_shared.ts';
const __dirname = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(__dirname, '..');
// Phase 1 T1.9 cache-key / health-registry sync guard.
//
// If a future PR bumps any of the resilience cache key constants in
// server/worldmonitor/resilience/v1/_shared.ts (e.g. resilience:score:v7
// becomes v8), the api/health.js SEED_META / KEY_TO_DOMAIN registry MUST
// be updated in the same PR or health probes will silently watch the
// wrong key and stop paging on real staleness.
//
// This test reads api/health.js as text and asserts the ranking cache
// key string (the only resilience key currently tracked in health) is
// literally present. When new resilience keys are added to health, add
// their assertions here too.
//
// Rationale: api/health.js is a plain .js file with hand-maintained
// string literals for the KEY_TO_DOMAIN mapping and the SEED_META
// registry. Those string literals are the single source of truth for
// what the health probe watches, and they are copy-pasted (not
// imported) from the server-side TypeScript constants. A literal text
// match is the cheapest possible drift guard.
describe('resilience cache-key health-registry sync (T1.9)', () => {
  const healthText = readFileSync(join(repoRoot, 'api/health.js'), 'utf-8');

  it('RESILIENCE_RANKING_CACHE_KEY literal appears in api/health.js', () => {
    assert.ok(
      healthText.includes(`'${RESILIENCE_RANKING_CACHE_KEY}'`) ||
        healthText.includes(`"${RESILIENCE_RANKING_CACHE_KEY}"`),
      `api/health.js must reference ${RESILIENCE_RANKING_CACHE_KEY} in KEY_TO_DOMAIN or SEED_META. Did you bump the key in _shared.ts without updating health?`,
    );
  });

  it('RESILIENCE_SCORE_CACHE_PREFIX matches expected resilience:score:v<n>: shape', () => {
    // The score key is per-country (prefix + ISO2), so we do not expect
    // the full key literal in health.js. Guard: the prefix string
    // matches the declared resilience:score:v<n>: shape so a typo or an
    // accidental rename is caught at test time.
    const versionMatch = /^resilience:score:v(\d+):$/.exec(RESILIENCE_SCORE_CACHE_PREFIX);
    assert.ok(
      versionMatch,
      `RESILIENCE_SCORE_CACHE_PREFIX must match resilience:score:v<n>: shape, got ${RESILIENCE_SCORE_CACHE_PREFIX}`,
    );
  });

  it('RESILIENCE_HISTORY_KEY_PREFIX matches expected resilience:history:v<n>: shape', () => {
    const versionMatch = /^resilience:history:v(\d+):$/.exec(RESILIENCE_HISTORY_KEY_PREFIX);
    assert.ok(
      versionMatch,
      `RESILIENCE_HISTORY_KEY_PREFIX must match resilience:history:v<n>: shape, got ${RESILIENCE_HISTORY_KEY_PREFIX}`,
    );
  });
  // PR 3A §net-imports adds this block. The cache-prefix-bump-propagation-
  // scope skill documents that "one prefix, many mirrored sites" is the
  // bug class: scorer and seed-resilience-scores agree, but an offline
  // analysis script or a benchmark mirror still reads the old prefix.
  // This test reads every known mirror file and asserts each contains
  // the current version literal (not v_old). If a future cache bump
  // misses a site, the test names it explicitly.
  describe('cache-prefix mirror parity — every declared literal site', () => {
    const SCORE_MIRROR_FILES = [
      'scripts/seed-resilience-scores.mjs',
      'scripts/validate-resilience-correlation.mjs',
      'scripts/backtest-resilience-outcomes.mjs',
      'scripts/validate-resilience-backtest.mjs',
    ] as const;
    const RANKING_MIRROR_FILES = [
      'scripts/seed-resilience-scores.mjs',
      'scripts/benchmark-resilience-external.mjs',
      'api/health.js',
    ] as const;

    it('every score-prefix mirror uses the canonical RESILIENCE_SCORE_CACHE_PREFIX', () => {
      for (const rel of SCORE_MIRROR_FILES) {
        const text = readFileSync(join(repoRoot, rel), 'utf-8');
        // A mirror file's single-source-of-truth invariant: it must
        // contain the canonical prefix literal. A bump that misses the
        // mirror leaves the mirror reading an abandoned Redis key.
        assert.ok(
          text.includes(RESILIENCE_SCORE_CACHE_PREFIX),
          `${rel} must contain RESILIENCE_SCORE_CACHE_PREFIX=${RESILIENCE_SCORE_CACHE_PREFIX}. Did the cache-prefix bump miss this file?`,
        );
        // Also assert the OLD prefix is NOT present — catches the
        // bump-the-constant-but-forget-the-literal pattern.
        const oldPrefixPattern = /resilience:score:v(\d+):/g;
        const matches = [...text.matchAll(oldPrefixPattern)]
          .map((m) => m[0])
          .filter((m) => m !== RESILIENCE_SCORE_CACHE_PREFIX);
        assert.equal(
          matches.length,
          0,
          `${rel} has stale score-prefix literal(s): ${matches.join(', ')} — must match ${RESILIENCE_SCORE_CACHE_PREFIX}`,
        );
      }
    });

    it('every ranking-key mirror uses the canonical RESILIENCE_RANKING_CACHE_KEY', () => {
      for (const rel of RANKING_MIRROR_FILES) {
        const text = readFileSync(join(repoRoot, rel), 'utf-8');
        assert.ok(
          text.includes(RESILIENCE_RANKING_CACHE_KEY),
          `${rel} must contain RESILIENCE_RANKING_CACHE_KEY=${RESILIENCE_RANKING_CACHE_KEY}. Did the cache-prefix bump miss this file?`,
        );
        // Loose match: some files reference older versions in comments
        // (seed-resilience-scores.mjs has historical notes about
        // v9/v10). Only flag non-comment lines.
        const oldRankingPattern = /resilience:ranking:v(\d+)\b/g;
        const stalePositions = [...text.matchAll(oldRankingPattern)]
          .filter((m) => m[0] !== RESILIENCE_RANKING_CACHE_KEY)
          .filter((m) => {
            // Inspect the line surrounding this match: skip if it is a
            // comment line (starts with //, *, or #).
            const lineStart = text.lastIndexOf('\n', m.index ?? 0) + 1;
            const lineEnd = text.indexOf('\n', m.index ?? 0);
            const line = text.slice(lineStart, lineEnd === -1 ? text.length : lineEnd);
            return !/^\s*(\/\/|\*|#)/.test(line);
          });
        assert.equal(
          stalePositions.length,
          0,
          `${rel} has stale ranking-key literal(s) in non-comment code: ${stalePositions.map((m) => m[0]).join(', ')} — must match ${RESILIENCE_RANKING_CACHE_KEY}`,
        );
      }
    });
  });
  // Plan 2026-04-24-003 dual-registry drift guard. `api/health.js` and
  // `api/seed-health.js` maintain INDEPENDENT registries (see
  // `feedback_two_health_endpoints_must_match`). They are NOT globally
  // identical — health.js watches keys seed-health.js doesn't, and vice
  // versa. Only the keys explicitly added by this PR are required in
  // BOTH registries; pre-existing recovery entries (fiscal-space,
  // reserve-adequacy, external-debt, import-hhi, fuel-stocks) live only
  // in api/health.js by design and are NOT asserted here.
  describe('resilience-recovery dual-registry parity (this PR only)', () => {
    const SHARED_RESILIENCE_KEYS = [
      'resilience:recovery:reexport-share',
      'resilience:recovery:sovereign-wealth',
    ] as const;
    const healthJsText = readFileSync(join(repoRoot, 'api/health.js'), 'utf-8');
    const seedHealthJsText = readFileSync(join(repoRoot, 'api/seed-health.js'), 'utf-8');

    for (const key of SHARED_RESILIENCE_KEYS) {
      it(`'${key}' is registered in api/health.js SEED_META`, () => {
        const metaKey = `seed-meta:${key}`;
        assert.ok(
          healthJsText.includes(`'${metaKey}'`) || healthJsText.includes(`"${metaKey}"`),
          `api/health.js must register '${metaKey}' in SEED_META`,
        );
      });

      it(`'${key}' is registered in api/seed-health.js SEED_DOMAINS`, () => {
        assert.ok(
          seedHealthJsText.includes(`'${key}'`) || seedHealthJsText.includes(`"${key}"`),
          `api/seed-health.js must register '${key}' in SEED_DOMAINS`,
        );
      });
    }
  });
});