worldmonitor/tests/resilience-construct-invariants.test.mts at main

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00
Files
Elie Habib df392b0514 feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369 )
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking
structural audit (plan 2026-04-24-002). Release gate, not merge gate:
the audit tells release review what to look at before publishing a
ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator.
  Fetches the live ranking + per-country scores (or reads a fixture
  in offline mode), emits per-cohort per-dimension tables, contribution
  decomposition, saturated / outlier / identical-score flags, and a
  top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level
  anchor-value assertions with synthetic inputs. Covers HHI, external
  debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer
  (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that
  mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can
  be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc
  explaining when to run, how to read the report, and the explicit
  anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` —
  12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json
   OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs`
  runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot
  (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json)
  is deferred to a manual release-prep step: run
  `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app
   node scripts/freeze-resilience-ranking.mjs` before the first
  methodology PR (PR 1 HHI period widening) so its movers table has
  something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop.
Per-country score fetches that failed were logged to stderr, silently
stored as null, and then filtered out of cohort tables via
`codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the
very country carrying the ranking anomaly could produce a Markdown
report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and
  does NOT insert null placeholders; missing cohort members are
  computed against the requested cohort code set.
- The report has a ⛔ blocker banner at top AND an always-rendered
  "Fetch failures / missing members" section (shown even when empty,
  so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch
  failure or missing cohort member, code 4 on formula-mode drift,
  code 0 otherwise. Automation can differentiate the two.

P2 — pillar-combine formula mode invalidates contribution rows.
`docs/methodology/cohort-sanity-release-gate.md:63` tells operators
to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`,
but the contribution decomposition is a domain-weighted roll-up that
is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)`
(non-linear in dim scores); decomposition rows become materially
misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against
  `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible
  countries drift, pillar combine is flagged.
- Report emits a ⛔ blocker banner at top, a "Formula mode" line in
  the header, and a "Formula-mode diagnostic" section with the first
  three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section,
  "Formula mode" operator guide, ENV table entries for STRICT +
  CONTRIB_TOLERANCE.

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process
  smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0,
  pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts
   tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files
  (refactor split buildReport complexity from 51 → under 50 by
  extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and
  GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets
   `API_BASE=''`, so the report header showed a bare "/api/..." path that
   never resolved — making a fixture run visually indistinguishable from
   a live run. Now surfaces `Source: fixture://<path>` in fixture mode.

2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing
   or unparseable `shared/country-names.json` produced a raw unhandled
   rejection. Now the read and the parse are each wrapped in their own
   try/catch; on either failure the script logs a developer-friendly
   warning and falls back to ISO-2 codes (report shows "AE" instead of
   "Uae"). Keeps the audit operable in CI-offline scenarios.

3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.**
   Downstream operators diff / grep / CSV-pipe the report; the Unicode
   minus breaks byte-level text tooling. Replaced with ASCII hyphen-
   minus. Left the U+2212 in the formula-mode diagnostic prose
   (`|Σ contributions − overallScore|`) where it's mathematical notation,
   not data.

Verified

- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
2026-04-24 18:13:22 +04:00
6.6 KiB

Raw Permalink Blame History

View Raw
6.6 KiB Raw Permalink Blame History

6.6 KiB

Raw Permalink Blame History