mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369)
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking structural audit (plan 2026-04-24-002). Release gate, not merge gate: the audit tells release review what to look at before publishing a ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator. Fetches the live ranking + per-country scores (or reads a fixture in offline mode), emits per-cohort per-dimension tables, contribution decomposition, saturated / outlier / identical-score flags, and a top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level anchor-value assertions with synthetic inputs. Covers HHI, external debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc explaining when to run, how to read the report, and the explicit anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` — 12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs` runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json) is deferred to a manual release-prep step: run `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app node scripts/freeze-resilience-ranking.mjs` before the first methodology PR (PR 1 HHI period widening) so its movers table has something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop. Per-country score fetches that failed were logged to stderr, silently stored as null, and then filtered out of cohort tables via `codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the very country carrying the ranking anomaly could produce a Markdown report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and does NOT insert null placeholders; missing cohort members are computed against the requested cohort code set.
- The report has a ⛔ blocker banner at top AND an always-rendered "Fetch failures / missing members" section (shown even when empty, so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch failure or missing cohort member, code 4 on formula-mode drift, code 0 otherwise. Automation can differentiate the two.

P2 — pillar-combine formula mode invalidates contribution rows. `docs/methodology/cohort-sanity-release-gate.md:63` tells operators to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`, but the contribution decomposition is a domain-weighted roll-up that is ONLY valid when `overallScore = sum(domain.score * domain.weight)`. Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)` (non-linear in dim scores); decomposition rows become materially misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible countries drift, pillar combine is flagged.
- Report emits a ⛔ blocker banner at top, a "Formula mode" line in the header, and a "Formula-mode diagnostic" section with the first three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section, "Formula mode" operator guide, ENV table entries for STRICT + CONTRIB_TOLERANCE.

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0, pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files (refactor split buildReport complexity from 51 → under 50 by extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets `API_BASE=''`, so the report header showed a bare "/api/..." path that never resolved — making a fixture run visually indistinguishable from a live run. Now surfaces `Source: fixture://<path>` in fixture mode.
2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing or unparseable `shared/country-names.json` produced a raw unhandled rejection. Now the read and the parse are each wrapped in their own try/catch; on either failure the script logs a developer-friendly warning and falls back to ISO-2 codes (report shows "AE" instead of "Uae"). Keeps the audit operable in CI-offline scenarios.
3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.** Downstream operators diff / grep / CSV-pipe the report; the Unicode minus breaks byte-level text tooling. Replaced with ASCII hyphen-minus. Left the U+2212 in the formula-mode diagnostic prose (`|Σ contributions − overallScore|`) where it's mathematical notation, not data.

Verified
- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
240
docs/methodology/cohort-sanity-release-gate.md
Normal file
@@ -0,0 +1,240 @@
# Cohort-sanity release gate

Operational procedure for the resilience cohort-sanity audit. This is a
**release gate**, not a merge gate. The audit tells release review what
to look at before publishing a ranking; it does not block a PR from
merging.

## What this exists to catch

A composite resilience score can be mathematically correct yet produce
rankings that contradict first-principles domain judgment — usually
because ONE input has a coverage gap, a saturated goalpost, or a
denominator that's structurally wrong for one sub-class of entities
(re-export hubs, single-sector states, SWF-parked-reserve designs).

Cohort-sanity is the test the codebase can't run on its own. It says:
"given these cohorts, does the ranking match the construct each
cohort is defined to probe?" Not "does country A rank above country
B" — see the anti-pattern section below.

Relevant background in the repository:

- `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md` —
  the audit plan that motivates this gate.
- Skill `cohort-ranking-sanity-surfaces-hidden-data-gaps` — the general
  diagnostic protocol (data bug / methodology bug / construct limitation
  / value judgment), including the anti-pattern note on rank-targeted
  acceptance criteria.
- `tests/resilience-construct-invariants.test.mts` — formula-level
  invariants with synthetic inputs. These test the SCORING MATH; they
  don't flip to fail on a live-ranking change.

## Artifacts

1. **`scripts/audit-resilience-cohorts.mjs`** — emits a structured
   Markdown report with:
   - Full top-N ranking table
   - Per-cohort per-dimension breakdown (GCC, OECD-nuclear, ASEAN trade
     hubs, LatAm-petro, African-fragile, post-Soviet, stressed-debt,
     re-export hubs, SWF-heavy exporters, fragile-floor)
   - Contribution decomposition: for each country, each dim's
     `score × coverage × dimWeight × domainWeight` contribution to
     overall
   - Flagged patterns: saturated dims, low-coverage outliers, identical
     scores across cohort members
   - Top-N movers vs a baseline snapshot

2. **`tests/resilience-construct-invariants.test.mts`** — formula-level
   anchor-value assertions. Part of `npm run test:data`. Failing means
   the scorer formula drifted; investigate before editing the test.

3. **`docs/snapshots/resilience-ranking-live-pre-cohort-audit-YYYY-MM-DD.json`** —
   the baseline snapshot for movers comparison. Refresh before each
   methodology change.

## When to run

- **Pre-publication**: any time the published ranking is about to
  change externally (site, API consumers, newsletter, partner feed).
- **Every merge touching a scorer file** in `server/worldmonitor/resilience/v1/_dimension-scorers.ts`,
  `server/worldmonitor/resilience/v1/_shared.ts`, or a scorer-feeding
  seeder in `scripts/seed-recovery-*.mjs`, `scripts/seed-bundle-resilience-*.mjs`.
- **Before activating a feature flag** that alters the scorer
  (`RESILIENCE_ENERGY_V2_ENABLED`, `RESILIENCE_PILLAR_COMBINE_ENABLED`,
  `RESILIENCE_SCHEMA_V2_ENABLED`).
- **After a cache-prefix bump** (`resilience:score:vN`,
  `resilience:ranking:vN`, `resilience:history:vN`) — once the new
  prefix has warmed up, rerun the audit so the movers table reflects
  the new values and nothing else.

## How to run

```bash
# Online (hits the live API; requires WORLDMONITOR_API_KEY)
WORLDMONITOR_API_KEY=wm_xxx \
  API_BASE=https://api.worldmonitor.app \
  BASELINE=docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json \
  OUT=/tmp/cohort-audit-$(date +%Y-%m-%d).md \
  node scripts/audit-resilience-cohorts.mjs

# Offline (fixture mode — for CI / dry-run / regression comparison)
FIXTURE=tests/fixtures/resilience-audit-fixture.json \
  OUT=/tmp/cohort-audit-fixture.md \
  node scripts/audit-resilience-cohorts.mjs
```

Recommended environment variables:

| Var | Default | Notes |
|---|---|---|
| `API_BASE` | (required unless FIXTURE set) | e.g. `https://api.worldmonitor.app` |
| `WORLDMONITOR_API_KEY` | (required unless FIXTURE set) | resilience RPCs are in `PREMIUM_RPC_PATHS` |
| `FIXTURE` | (empty) | JSON fixture with `{ ranking, scores }` shape — skips all network calls |
| `BASELINE` | (empty) | Path to a frozen ranking JSON for movers comparison |
| `OUT` | (stdout) | Path for the Markdown report |
| `TOP_N` | 60 | Rows to render in the full-ranking table |
| `MOVERS_N` | 30 | Rows to render in the movers table |
| `CONCURRENCY` | 6 | Parallel score-endpoint fetches |
| `STRICT` | unset | `1` = fail-closed. Report still writes, then exit 3 on fetch failures/missing members, exit 4 on formula-mode drift, exit 0 otherwise. Recommended for release-gate automation. |
| `CONTRIB_TOLERANCE` | 1.5 | Points of drift tolerated between `Σ contributions` and `overallScore` before formula-mode drift is declared. |

### Fail-closed semantics

The audit is fail-closed on two axes. Both are implemented in
`scripts/audit-resilience-cohorts.mjs` and documented here so that a
release-gate operator cannot shortcut them by reading only the
rendered tables.

1. **Fetch failures / missing cohort members.** When a per-country score
   fetch fails (HTTP 4xx/5xx, timeout, DNS), the country is NOT silently
   dropped. The failure is recorded in the run's `failures` map, banner'd
   as a ⛔ block at the top of the report, and rendered in a dedicated
   "Fetch failures / missing members" section that is ALWAYS present
   (even when empty, so an operator learns to look for it). Fixture mode
   uses the same mechanism for cohort members absent from the fixture.

2. **Formula-mode mismatch (`RESILIENCE_PILLAR_COMBINE_ENABLED`).** The
   contribution decomposition is a domain-weighted roll-up that is ONLY
   mathematically valid when `overallScore` is computed via the legacy
   `sum(domain.score * domain.weight)` path. Once pillar combine is on,
   `overallScore = penalizedPillarScore(pillars)` — a non-linear
   function of the dim scores — and the decomposition rows no longer
   sum to overall. The harness detects this by taking any country with:

   - `sum(domain.weight)` within 0.05 of 1.0 (complete response)
   - every dim at `coverage ≥ 0.9` (stable share math)

   and checking `|Σ contributions - overallScore| ≤ CONTRIB_TOLERANCE`.
   If more than 50% of ≥ 3 eligible countries drift beyond the
   tolerance, a ⛔ blocker banner fires at report top AND a
   "Formula-mode diagnostic" section prints the first three offenders
   with their Σ vs overall numbers. Until the harness grows a
   pillar-aware decomposition, the contribution tables under pillar
   mode must be treated as *"legacy-formula reference only"*.
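
The eligibility filter and drift vote in item 2 can be sketched as follows. This is an illustration under stated assumptions, not the harness's `detectFormulaMode`: the function name `detectFormulaDrift`, the `cc` and `contributionSum` fields, and the reading of "more than 50% of ≥ 3 eligible countries" as "at least three eligible countries, and more than half of them drifting" are all assumptions made for this example.

```javascript
// Hypothetical sketch. Each country object is assumed to look like:
//   { cc, overallScore, contributionSum, domains: [{ weight, dimensions: [{ coverage }] }] }
// where contributionSum is the precomputed sum of that country's contribution rows.
function detectFormulaDrift(countries, tolerance = 1.5) {
  // Eligible: domain weights sum to ~1.0 AND every dimension has coverage >= 0.9.
  const eligible = countries.filter((c) => {
    const weightSum = c.domains.reduce((sum, d) => sum + d.weight, 0);
    const stableShares = c.domains.every((d) =>
      d.dimensions.every((dim) => dim.coverage >= 0.9));
    return Math.abs(weightSum - 1.0) <= 0.05 && stableShares;
  });

  // Offender: contribution rows do not add up to the reported overall score.
  const offenders = eligible.filter(
    (c) => Math.abs(c.contributionSum - c.overallScore) > tolerance);

  // Assumed vote rule: need >= 3 eligible countries and a strict majority drifting.
  const drift = eligible.length >= 3 && offenders.length / eligible.length > 0.5;

  return {
    drift,
    eligible: eligible.length,
    offenders: offenders.slice(0, 3).map((c) => c.cc), // first three, as in the report
  };
}
```

Requiring a minimum number of eligible countries keeps one noisy response from tripping the banner on its own; the majority vote reflects that a formula switch affects every country at once, whereas a data glitch usually affects a few.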

### Formula mode

The operator guide for what to do when the formula-mode banner fires:

- **If the banner is a false positive** (e.g. scorer changed a dim
  weight and the audit mirror in `scripts/audit-resilience-cohorts.mjs`
  `DIM_WEIGHTS` is stale): update the mirror, re-run. This is the
  `production-logic-mirror-silent-divergence` pattern — the mirror
  must move with the scorer.
- **If pillar combine actually activated:** stop using the
  contribution-decomposition tables for this release gate. Fall back
  to the per-dimension score table + the construct invariants test +
  movers review. File a follow-up to grow the harness a pillar-aware
  decomposition before the next methodology PR under pillar mode.
- **Exit codes under `STRICT=1`:** `3` = fetch/missing, `4` = formula
  mode, `0` = all clear. These are distinct so automation can
  differentiate "the infra is broken" from "the code path is no
  longer decomposable."
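
A minimal sketch of how the exit codes could be derived, for readers wiring the audit into automation. The helper names `missingMembers` and `strictExitCode` are hypothetical, and the precedence (code 3 wins when both conditions hold) is an assumption; this document only states which condition maps to which code.

```javascript
// Hypothetical helpers; names and precedence are assumptions for illustration.

// Cohort members that were requested but have no score document.
function missingMembers(cohortCodes, scoreMap) {
  return cohortCodes.filter((cc) => !scoreMap.has(cc));
}

// Map the audit outcome to a process exit code under STRICT semantics.
function strictExitCode({ strict, failureCount, missingCount, formulaDrift }) {
  if (!strict) return 0;                                    // default: report-only
  if (failureCount > 0 || missingCount > 0) return 3;       // "the infra is broken"
  if (formulaDrift) return 4;                               // "no longer decomposable"
  return 0;                                                 // all clear
}
```

Checking membership with `Map.prototype.has` rather than a truthiness test on `get` keeps an absent country distinct from a present-but-falsy entry.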

## How to read the report

The report surfaces five categories of signal. **Treat each as a
prompt for investigation, not a merge gate.**

### 1. Per-cohort per-dimension table

Read across rows. If one country has `IMPUTED` / `unmonitored` /
`coverage < 0.5` where peers have full coverage, that's a seed-level
gap — probably a late-reporter window or a missing manifest entry.
Fix the seed, not the score.

### 2. Contribution decomposition

Each cell shows how many overall-score points that dimension
contributes to that country. If the row sum doesn't match overall
score (not within ~0.5 points), the scorer is using a composition
formula the audit script doesn't understand — investigate
`_shared.ts`'s `coverageWeightedMean` + `penalizedPillarScore`
branches and update the decomposition accordingly.
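
To make the cell values concrete, here is a small worked example for a single domain. It follows the effective-share formula stated in the audit script's own comments (`withinDomainShare = coverage × dimWeight / Σ(coverage × dimWeight)`, then `score × withinDomainShare × domainWeight`). The function name `domainContributions` and the input numbers are illustrative assumptions, not values from the live ranking.

```javascript
// Illustrative sketch; the function name and sample numbers are made up.
function domainContributions(domain, dimWeights = {}) {
  // "Mass" of each dimension inside its domain: coverage times dim weight.
  const weighted = domain.dimensions.map((d) => ({
    id: d.id,
    score: d.score,
    mass: d.coverage * (dimWeights[d.id] ?? 1.0),
  }));
  const denom = weighted.reduce((sum, d) => sum + d.mass, 0);
  return weighted.map((d) => ({
    id: d.id,
    // Points this dimension adds to the overall score.
    contribution: denom > 0 ? d.score * (d.mass / denom) * domain.weight : 0,
  }));
}

const exampleRows = domainContributions({
  weight: 0.25,
  dimensions: [
    { id: 'reserveAdequacy', score: 80, coverage: 1.0 },
    { id: 'importConcentration', score: 40, coverage: 0.5 },
  ],
});
```

With these inputs the shares are 1.0/1.5 and 0.5/1.5, so the two dimensions contribute roughly 13.3 and 3.3 points. Their sum (about 16.7) equals the domain's coverage-weighted mean (about 66.7) times the domain weight of 0.25, which is the identity the row-sum check relies on.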

### 3. Flagged patterns

- **Saturated-high**: every cohort member scores > 95 on a dim. The
  dim contributes zero discrimination within that cohort — either the
  construct genuinely doesn't apply (acceptable; document in
  `known-limitations.md`), or the goalpost is too generous (re-anchor).
- **Saturated-low**: every member scores < 5. Same question in reverse;
  often a seed failure rather than a construct issue.
- **Identical scores**: all ≥ 3 cohort members hit the same non-trivial
  value. Usually a regional-default leak or a missing-data imputation
  class returning the same number.
- **Coverage outlier**: one country is `coverage < 0.5` while peers
  are ≥ 0.9. This is almost always the ranking-inversion smoking gun.
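
The four flag rules above can be expressed as one pass over a dimension's per-country entries. This is a sketch under assumptions, not the harness's `flagDimensionPatterns`: the function name `flagDimension`, the flag strings, and the reading of "non-trivial" as "neither 0 nor 100" are choices made for this example.

```javascript
// Hypothetical sketch. entries: [{ cc, score, coverage }] for ONE dimension
// across ONE cohort.
function flagDimension(entries) {
  const flags = [];
  if (entries.length === 0) return flags;
  const scores = entries.map((e) => e.score);

  if (scores.every((s) => s > 95)) flags.push('saturated-high');
  if (scores.every((s) => s < 5)) flags.push('saturated-low');

  // Assumption: "non-trivial" means the shared value is not pinned at 0 or 100.
  const allSame = scores.every((s) => s === scores[0]);
  if (entries.length >= 3 && allSame && scores[0] > 0 && scores[0] < 100) {
    flags.push('identical-scores');
  }

  // Exactly one low-coverage member while every peer is well covered.
  const low = entries.filter((e) => e.coverage < 0.5);
  const peers = entries.filter((e) => e.coverage >= 0.5);
  if (low.length === 1 && peers.length > 0 && peers.every((e) => e.coverage >= 0.9)) {
    flags.push(`coverage-outlier:${low[0].cc}`);
  }
  return flags;
}
```

Fed inputs shaped like the two fixture findings named in the commit message (one GCC member at coverage 0.3 with peers at 1.0, and all six members at 100 on one dimension), this sketch returns a coverage-outlier flag and a saturated-high flag respectively.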

### 4. Top-N movers vs baseline

Expected movers post-methodology-PR are construct-consistent: a
re-export-hub PR should move re-export hubs, not SWF-heavy exporters.
Surprise movers trigger investigation before publication.
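
A movers table is essentially a join between the baseline snapshot and the current ranking, sorted by absolute score change. The sketch below assumes each ranking row has `cc`, `score`, and `rank` fields; those field names and the function name `topMovers` are hypothetical and may not match the snapshot format the harness actually reads.

```javascript
// Hypothetical sketch; row shape { cc, score, rank } is an assumption.
function topMovers(baseline, current, n = 30) {
  const before = new Map(baseline.map((row) => [row.cc, row]));
  return current
    .filter((row) => before.has(row.cc)) // only countries present in both snapshots
    .map((row) => {
      const prev = before.get(row.cc);
      return {
        cc: row.cc,
        scoreDelta: row.score - prev.score,
        rankDelta: prev.rank - row.rank, // positive = moved up the ranking
      };
    })
    .sort((a, b) => Math.abs(b.scoreDelta) - Math.abs(a.scoreDelta))
    .slice(0, n);
}
```

Sorting by absolute score delta rather than rank delta keeps crowded mid-table reshuffles, where a small score change can shift many positions, from dominating the list.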

### 5. Anchor invariants

Run `npx tsx --test tests/resilience-construct-invariants.test.mts`.
An anchor drift > 1 point on `score(ratio=1.0)=50` or
`score(em=12)≈63` means someone silently re-goalposted or rewrote a
saturating transform. This is a bug until proven otherwise.

## Anti-pattern: rank-targeted acceptance criteria

**Never put "ENTITY A > ENTITY B" as a merge gate in this workflow.**
Once a review commits to producing a specific ranking, every construct
/ manifest / goalpost knob becomes a lever to tune toward that
outcome — even subconsciously — and the methodology loses its
construct integrity.

Use instead:

- **Construct monotonicity tests** — synthetic inputs, not country
  identity: `score(HHI=0.05) > score(HHI=0.20)`,
  `score(ratio=1.0) = 50 ± 1`. These fail when the MATH breaks, not
  when the RANKING changes.
- **Out-of-sample cohort behaviour** — define a cohort the fix is
  SUPPOSED to move proportionally (re-export hubs, SWF-heavy
  exporters, stressed states). Acceptance: cohort behaviour matches
  the construct change, not a target position.
- **Top-N movers review** — movers should be cohort members the
  construct predicts; surprises trigger investigation.
- **Honest "outcome may not resolve"** — if the original sanity-
  failure (the ranking inversion that triggered the audit) is not
  guaranteed to resolve under the in-scope fixes, say so explicitly.
  A plan that acknowledges "the inversion may persist after all
  fixes, because the dominant driver is out of scope" is stronger
  than one that over-promises.
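
As an illustration of the first bullet, a construct monotonicity check can be written against any scoring function without naming a country. The helper `isStrictlyDecreasing` and the stand-in scorer `standInHhiScore` below are hypothetical; the stand-in is a simple clamped linear map chosen only to exercise the check and is not the scorer's real HHI transform.

```javascript
// Hypothetical sketch: checks that a scorer strictly decreases as its input grows.
function isStrictlyDecreasing(scoreFn, inputs) {
  const sorted = [...inputs].sort((a, b) => a - b);
  for (let i = 1; i < sorted.length; i++) {
    // A larger input (e.g. higher concentration) must yield a strictly lower score.
    if (!(scoreFn(sorted[i]) < scoreFn(sorted[i - 1]))) return false;
  }
  return true;
}

// Stand-in scorer for demonstration only; NOT the production formula.
const standInHhiScore = (hhi) => Math.max(0, Math.min(100, 100 * (1 - hhi)));

const monotoneOk = isStrictlyDecreasing(standInHhiScore, [0.05, 0.2, 0.5]);
```

A check of this shape fails only when the transform's direction or shape changes, so it stays green across ordinary data refreshes that reorder the ranking.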

If a release reviewer asks "will this make A rank above B", the
correct answer is: *"A will move by the amount the construct
predicts. Where it ends up relative to B is an outcome."*

## Follow-ups

- Every novel gap identified by the audit should land as a section in
  `docs/methodology/known-limitations.md` so future reviewers see the
  diagnosis trail.
- If a gap is fixed in a PR, the audit report from that PR's
  post-merge run should be attached to the PR as an artifact.
680
scripts/audit-resilience-cohorts.mjs
Normal file
@@ -0,0 +1,680 @@
#!/usr/bin/env node
// Release-gate audit harness for the resilience scorer. Emits a Markdown
// report that surfaces cohort-level ranking sanity issues BEFORE they reach
// publication. Designed as a release gate, not a commit gate — see
// docs/methodology/cohort-sanity-release-gate.md for the interpretation
// contract and the explicit anti-pattern note on rank-targeted acceptance
// criteria.
//
// What this does:
//   1. Fetch the live ranking via GET /api/resilience/v1/get-resilience-ranking.
//   2. For every country in the named cohorts (GCC, OECD-nuclear, ASEAN-
//      trade-hub, LatAm-petro, African-fragile, post-Soviet, stressed-debt),
//      fetch the full per-dimension score via GET
//      /api/resilience/v1/get-resilience-score?countryCode=XX.
//   3. Emit a Markdown report with:
//        - Full ranking table (top N + grey-outs summary)
//        - Per-cohort per-dimension breakdown (score / coverage / imputation)
//        - Contribution decomposition: per country, per dim,
//          (score × coverage × dimWeight × domainWeight) toward overall
//        - Flagged patterns: saturated dimensions (>95 across cohort),
//          low-coverage outliers (coverage < 0.5 where peers are 1.0),
//          identical-score clusters (same score across all cohort members)
//        - Top-N movers vs a baseline snapshot (optional)
//
// What this does NOT do:
//   - Assert country rank orderings ("AE > KW"). That would couple the gate
//     to outcome-seeking; the audit is intentionally descriptive.
//   - Fail the build. It's a report generator. Release review reads the
//     report and decides whether to hold publication.
//
// Usage:
//   WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app \
//     node scripts/audit-resilience-cohorts.mjs
//   WORLDMONITOR_API_KEY=wm_xxx API_BASE=... \
//     BASELINE=docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json \
//     OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs
//   FIXTURE=tests/fixtures/resilience-audit-fixture.json node scripts/audit-resilience-cohorts.mjs
//
// Auth: the resilience ranking + score endpoints are in PREMIUM_RPC_PATHS
// (see src/shared/premium-paths.ts). A valid WORLDMONITOR_API_KEY is
// required whether running from a trusted browser origin or not — the
// premium gate forces the key.
//
// Fixture mode (FIXTURE env): reads a JSON file with shape
//   { ranking: GetResilienceRankingResponse, scores: { [cc]: GetResilienceScoreResponse } }
// and builds the report without any network calls. Useful for offline runs
// and for regression-comparing the audit output itself across scorer
// changes (diff the Markdown).
//
// Failure modes the script explicitly surfaces (NOT silent-drops):
//   1. Per-country fetch failure (HTTP 4xx/5xx, timeout). Tracked in a
//      `failures` map, rendered as a top-of-report blocker banner and a
//      dedicated "Fetch failures / missing members" section, so a
//      reviewer skimming the artifact cannot miss that the cohort was
//      only partially audited.
//   2. Formula-mode mismatch. When `RESILIENCE_PILLAR_COMBINE_ENABLED`
//      is active, `overallScore = penalizedPillarScore(pillars)` — a
//      non-linear function of the dim scores — and the contribution
//      decomposition (domain-weighted) no longer sums to overall. The
//      harness detects this via Σ-contribution vs overall drift and
//      flags it at report top so the operator knows the decomposition
//      rows are reference-only.
// STRICT=1 exits non-zero (code 3 for fetch failures, 4 for formula
// mismatch) AFTER writing the report, so release-gate automation can't
// treat a partial/stale audit as green.

import fs from 'node:fs/promises';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { execSync } from 'node:child_process';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '..');

const FIXTURE_PATH = process.env.FIXTURE || '';
const API_BASE = (process.env.API_BASE || '').replace(/\/$/, '');
if (!FIXTURE_PATH) {
  if (!API_BASE) {
    console.error('[audit-resilience-cohorts] API_BASE env var required (e.g. https://api.worldmonitor.app), or FIXTURE=path.json for offline mode');
    process.exit(2);
  }
  if (!process.env.WORLDMONITOR_API_KEY) {
    console.error('[audit-resilience-cohorts] WORLDMONITOR_API_KEY env var required; resilience RPC paths are in PREMIUM_RPC_PATHS.');
    process.exit(2);
  }
}

const RANKING_URL = `${API_BASE}/api/resilience/v1/get-resilience-ranking`;
const SCORE_URL = (cc) => `${API_BASE}/api/resilience/v1/get-resilience-score?countryCode=${encodeURIComponent(cc)}`;
const BASELINE_PATH = process.env.BASELINE || '';
const OUT_PATH = process.env.OUT || '';
const TOP_N_FULL_RANKING = Number(process.env.TOP_N || 60);
const MOVERS_N = Number(process.env.MOVERS_N || 30);
const CONCURRENCY = Number(process.env.CONCURRENCY || 6);
// STRICT=1 makes the audit fail-closed: any per-country fetch failure OR any
// detected formula-mode change (pillar-combine on, contribution rows
// invalid) exits non-zero so the release-gate operator cannot accidentally
// ship a partial / misleading report. Default (STRICT unset) still renders
// but banners the issue prominently at report top.
const STRICT = process.env.STRICT === '1' || process.env.STRICT === 'true';
// Tolerance for "sum(contributions) vs overallScore" equality check used
// to detect pillar-combine formula mode (see decomposeContributions).
const CONTRIBUTION_SUM_TOLERANCE = Number(process.env.CONTRIB_TOLERANCE || 1.5);

// Named cohorts. Membership reflects the construct question each cohort
// answers — not "who should rank where." See release-gate doc for rationale.
const COHORTS = {
  GCC: ['AE', 'SA', 'KW', 'QA', 'OM', 'BH'],
  'OECD-nuclear': ['FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE'],
  'ASEAN-trade-hub': ['SG', 'MY', 'TH', 'VN', 'ID', 'PH'],
  'LatAm-petro': ['BR', 'MX', 'CO', 'VE', 'AR', 'EC'],
  'African-fragile': ['NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD'],
  'Post-Soviet': ['RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM'],
  'Stressed-debt': ['LK', 'PK', 'AR', 'LB', 'TR', 'EG', 'TN'],
  'Re-export-hub': ['SG', 'HK', 'NL', 'BE', 'PA', 'AE', 'MY', 'LT'],
  'SWF-heavy-exporter': ['NO', 'QA', 'KW', 'SA', 'KZ', 'AZ'],
  'Fragile-floor': ['YE', 'SY', 'SO', 'AF'],
};

// Coarse domain weights mirrored from _dimension-scorers.ts for contribution
// decomposition. The live API already returns domain.weight per country,
// so we READ that from the API rather than hardcoding — this table is only
// used for sanity-cross-check in the header.
const EXPECTED_DOMAIN_WEIGHTS = {
  economic: 0.17,
  infrastructure: 0.15,
  energy: 0.11,
  'social-governance': 0.19,
  'health-food': 0.13,
  recovery: 0.25,
};

function commitSha() {
  try {
    return execSync('git rev-parse HEAD', { cwd: REPO_ROOT, stdio: ['ignore', 'pipe', 'ignore'] })
      .toString()
      .trim();
  } catch {
    return 'unknown';
  }
}

async function loadCountryNameMap() {
  const filePath = path.join(REPO_ROOT, 'shared', 'country-names.json');
  let raw;
  try {
    raw = await fs.readFile(filePath, 'utf8');
  } catch (err) {
    console.error(`[audit] shared/country-names.json read failed (${err.code || err.name}): ${err.message}. Falling back to ISO-2 codes in the report (country names will appear as CC).`);
    return {};
  }
  let forward;
  try {
    forward = JSON.parse(raw);
  } catch (err) {
    console.error(`[audit] shared/country-names.json parse failed: ${err.message}. Falling back to ISO-2 codes.`);
    return {};
  }
  const reverse = {};
  for (const [name, iso2] of Object.entries(forward)) {
    const code = String(iso2 || '').toUpperCase();
    if (!/^[A-Z]{2}$/.test(code)) continue;
    if (reverse[code]) continue;
    reverse[code] = name.replace(/\b([a-z])/g, (_, c) => c.toUpperCase());
  }
  return reverse;
}

function apiHeaders() {
  const h = {
    accept: 'application/json',
    // Full UA (not the 10-char Node default) avoids middleware.ts's short-UA
    // bot guard that 403s bare `node` fetches on the edge path.
    'user-agent': 'audit-resilience-cohorts/1.0 (+scripts/audit-resilience-cohorts.mjs)',
  };
  if (process.env.WORLDMONITOR_API_KEY) {
    h['X-WorldMonitor-Key'] = process.env.WORLDMONITOR_API_KEY;
  }
  return h;
}

async function fetchRanking() {
  const response = await fetch(RANKING_URL, { headers: apiHeaders() });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from ${RANKING_URL}: ${await response.text().catch(() => '')}`);
  }
  return response.json();
}

async function fetchScore(countryCode) {
  const response = await fetch(SCORE_URL(countryCode), { headers: apiHeaders() });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${countryCode}`);
  }
  return response.json();
}

async function fetchScoresConcurrent(countryCodes) {
  const scores = new Map();
  const failures = new Map(); // cc → error message
  const queue = [...countryCodes];
  async function worker() {
    while (queue.length) {
      const cc = queue.shift();
      if (!cc) return;
      try {
        const data = await fetchScore(cc);
        scores.set(cc, data);
      } catch (err) {
        console.error(`[audit] ${cc} failed: ${err.message}`);
        failures.set(cc, err.message || 'unknown fetch error');
        // Do NOT insert null into scores — silent-drop was the P1 bug.
        // Failures are tracked distinctly so the report can banner them
        // and STRICT mode can exit non-zero.
      }
    }
  }
  const workers = Array.from({ length: Math.min(CONCURRENCY, queue.length) }, worker);
  await Promise.all(workers);
  return { scores, failures };
}

function round1(n) {
  return Math.round(n * 10) / 10;
}

function round2(n) {
  return Math.round(n * 100) / 100;
}
|
||||
|
||||
// Given a score document, compute the contribution of every dimension to the
// overall score. The overall is (by construct) a domain-weighted roll-up of
// coverage-weighted dimension means. For contribution reporting we use the
// "effective share" each dim has toward overall:
//   domainShare         = domainWeight
//   withinDomainShare   = (dim.coverage × dimWeight) / Σ(coverage × dimWeight) for that domain
//   overallContribution = dim.score × withinDomainShare × domainShare
// The sum of overallContribution across all dims ≈ overallScore (modulo
// pillar-combine path when enabled, which isn't contribution-decomposable
// by a clean formula).
function decomposeContributions(scoreDoc, dimWeights) {
  const rows = [];
  for (const domain of scoreDoc.domains ?? []) {
    const dims = domain.dimensions ?? [];
    let denom = 0;
    for (const d of dims) {
      const w = dimWeights[d.id] ?? 1.0;
      denom += (d.coverage ?? 0) * w;
    }
    for (const d of dims) {
      const w = dimWeights[d.id] ?? 1.0;
      const withinDomainShare = denom > 0 ? ((d.coverage ?? 0) * w) / denom : 0;
      const contribution = (d.score ?? 0) * withinDomainShare * (domain.weight ?? 0);
      rows.push({
        domainId: domain.id,
        domainWeight: domain.weight,
        dimensionId: d.id,
        score: d.score,
        coverage: d.coverage,
        imputationClass: d.imputationClass || '',
        dimWeight: w,
        withinDomainShare,
        contribution,
      });
    }
  }
  return rows;
}

// Weight multipliers mirrored from _dimension-scorers.ts. Mirror is acceptable
// here because the audit script is a diagnostic — if dim weights drift we'll
// see contribution rows that don't sum to overallScore and investigate.
const DIM_WEIGHTS = {
  macroFiscal: 1.0,
  currencyExternal: 1.0,
  tradeSanctions: 1.0,
  cyberDigital: 1.0,
  logisticsSupply: 1.0,
  infrastructure: 1.0,
  energy: 1.0,
  governanceInstitutional: 1.0,
  socialCohesion: 1.0,
  borderSecurity: 1.0,
  informationCognitive: 1.0,
  healthPublicService: 1.0,
  foodWater: 1.0,
  fiscalSpace: 1.0,
  reserveAdequacy: 1.0,
  externalDebtCoverage: 1.0,
  importConcentration: 1.0,
  stateContinuity: 1.0,
  fuelStockDays: 1.0,
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
};

function flagDimensionPatterns(cohortName, cohortCodes, scoreMap) {
  const flags = [];
  // Collect per-dimension values across the cohort.
  const byDim = new Map();
  for (const cc of cohortCodes) {
    const doc = scoreMap.get(cc);
    if (!doc) continue;
    for (const domain of doc.domains ?? []) {
      for (const dim of domain.dimensions ?? []) {
        if (!byDim.has(dim.id)) byDim.set(dim.id, []);
        byDim.get(dim.id).push({ cc, score: dim.score, coverage: dim.coverage, imputationClass: dim.imputationClass });
      }
    }
  }
  for (const [dimId, entries] of byDim.entries()) {
    // Saturated high: at least 3 members and every one scores > 95
    if (entries.length >= 3 && entries.every((e) => e.score > 95)) {
      flags.push({
        cohort: cohortName,
        kind: 'saturated-high',
        dimension: dimId,
        message: `Every cohort member scores > 95 on ${dimId}; dim contributes zero discrimination within the cohort.`,
      });
    }
    // Saturated low: at least 3 members and every one scores < 5
    if (entries.length >= 3 && entries.every((e) => e.score < 5)) {
      flags.push({
        cohort: cohortName,
        kind: 'saturated-low',
        dimension: dimId,
        message: `Every cohort member scores < 5 on ${dimId}; construct may not apply or seed is missing.`,
      });
    }
    // Identical score across cohort (variance = 0 and ≥ 3 entries); exact 0
    // and 100 are excluded because the saturation checks above cover them.
    if (entries.length >= 3) {
      const first = entries[0].score;
      if (entries.every((e) => e.score === first) && first > 0 && first < 100) {
        flags.push({
          cohort: cohortName,
          kind: 'identical-scores',
          dimension: dimId,
          message: `All ${entries.length} cohort members have identical ${dimId} = ${first}; possible imputed-default or region-default leak.`,
        });
      }
    }
    // Low-coverage outlier: some entries have coverage < 0.5 while at least
    // twice as many peers have coverage ≥ 0.9
    const lowCov = entries.filter((e) => (e.coverage ?? 0) < 0.5);
    const highCov = entries.filter((e) => (e.coverage ?? 0) >= 0.9);
    if (lowCov.length && highCov.length >= lowCov.length * 2) {
      flags.push({
        cohort: cohortName,
        kind: 'coverage-outlier',
        dimension: dimId,
        message: `Low coverage on ${dimId}: ${lowCov.map((e) => `${e.cc}(${round2(e.coverage)})`).join(', ')}; peers have full coverage.`,
      });
    }
  }
  return flags;
}

function computeMovers(currentItems, baselineItems, n) {
  if (!baselineItems) return [];
  const baselineByCc = new Map(baselineItems.map((x) => [x.countryCode, x]));
  const currentByCc = new Map(currentItems.map((x) => [x.countryCode, x]));
  const deltas = [];
  for (const [cc, cur] of currentByCc.entries()) {
    const prev = baselineByCc.get(cc);
    if (!prev) continue;
    const curScore = typeof cur.overallScore === 'number' ? cur.overallScore : null;
    const prevScore = typeof prev.overallScoreRaw === 'number' ? prev.overallScoreRaw : (typeof prev.overallScore === 'number' ? prev.overallScore : null);
    if (curScore == null || prevScore == null) continue;
    deltas.push({
      countryCode: cc,
      scoreDelta: curScore - prevScore,
      curScore,
      prevScore,
      curRank: cur.__rank,
      prevRank: prev.rank ?? null,
    });
  }
  deltas.sort((a, b) => Math.abs(b.scoreDelta) - Math.abs(a.scoreDelta));
  return deltas.slice(0, n);
}

function fmtDelta(delta) {
  if (delta === 0) return '·';
  // ASCII hyphen-minus, not U+2212 MINUS. Downstream operators diff
  // audit reports with `grep`/`awk`/CSV pipelines that treat the two
  // characters differently; keeping ASCII preserves byte-level
  // greppability of negative deltas.
  const sign = delta > 0 ? '+' : '-';
  return `${sign}${Math.abs(delta).toFixed(2)}`;
}

function section(label, body) {
  return `\n## ${label}\n\n${body}\n`;
}

// Detect whether overall is computed via the legacy domain-weighted
// formula (contribution decomposition is valid) or the pillar-combine
// formula (penalizedPillarScore — decomposition is NOT valid and the
// operator MUST know). Signal: |Σ contributions - overallScore| across
// countries with COMPLETE domain coverage exceeds
// CONTRIBUTION_SUM_TOLERANCE. "Complete" requires:
//   (a) sum(domain.weight) within 0.05 of 1.0 (all 6 domains present)
//   (b) every dim has coverage ≥ 0.9 (so the dim-share math is stable)
// Both gates prevent false positives from small/partial fixtures or
// live-API responses where the call happened to land mid-backfill.
function detectFormulaMode(scoreMap) {
  let diffsExceeded = 0;
  let checked = 0;
  const examples = [];
  for (const [cc, doc] of scoreMap.entries()) {
    if (!doc) continue;
    const domains = doc.domains ?? [];
    const domainWeightSum = domains.reduce((a, d) => a + (d.weight ?? 0), 0);
    if (Math.abs(domainWeightSum - 1.0) > 0.05) continue; // incomplete response
    const hasFullCoverage = domains.every((dom) =>
      (dom.dimensions ?? []).every((dim) => (dim.coverage ?? 0) >= 0.9),
    );
    if (!hasFullCoverage) continue;
    const rows = decomposeContributions(doc, DIM_WEIGHTS);
    const sum = rows.reduce((a, r) => a + r.contribution, 0);
    const overall = doc.overallScore ?? 0;
    const diff = Math.abs(sum - overall);
    checked += 1;
    if (diff > CONTRIBUTION_SUM_TOLERANCE) {
      diffsExceeded += 1;
      if (examples.length < 3) examples.push({ cc, sum, overall, diff });
    }
  }
  // Heuristic: if > 50% of eligible countries drift AND at least 3 were
  // checked, pillar-combine is probably active. Below 3 checked we skip
  // the flag entirely — the signal is too noisy to banner-block on.
  const pillarModeLikely = checked >= 3 && diffsExceeded / checked > 0.5;
  return { pillarModeLikely, checked, diffsExceeded, examples };
}

function renderCohortSection(cohortName, codes, scoreMap, nameMap) {
  const present = codes.filter((cc) => scoreMap.get(cc));
  if (!present.length) return '';

  // Collect all dims seen in this cohort.
  const dimIds = new Set();
  for (const cc of present) {
    const doc = scoreMap.get(cc);
    for (const dom of doc.domains ?? []) for (const dim of dom.dimensions ?? []) dimIds.add(dim.id);
  }
  const orderedDims = [...dimIds].sort();

  let body = `Members: ${present.join(', ')}\n\n`;

  // Overall table
  body += `**Overall**\n\n| CC | Country | Overall | Baseline | Stress | Level |\n|---|---|---:|---:|---:|---|\n`;
  for (const cc of present) {
    const doc = scoreMap.get(cc);
    body += `| ${cc} | ${nameMap[cc] ?? cc} | ${round1(doc.overallScore)} | ${round1(doc.baselineScore)} | ${round1(doc.stressScore)} | ${doc.level} |\n`;
  }

  // Per-dim scores
  body += `\n**Per-dimension score** (score · coverage · imputationClass if set)\n\n`;
  body += `| Dim | ${present.join(' | ')} |\n|---| ${present.map(() => '---:').join(' | ')} |\n`;
  for (const dimId of orderedDims) {
    const cells = present.map((cc) => renderDimCell(scoreMap.get(cc), dimId));
    body += `| ${dimId} | ${cells.join(' | ')} |\n`;
  }

  // Contribution decomposition (sums to overall per country under legacy formula).
  body += `\n**Contribution decomposition** (points toward overall score)\n\n`;
  body += `| Dim | ${present.join(' | ')} |\n|---| ${present.map(() => '---:').join(' | ')} |\n`;
  const contribByCc = new Map(
    present.map((cc) => [cc, decomposeContributions(scoreMap.get(cc), DIM_WEIGHTS)]),
  );
  for (const dimId of orderedDims) {
    const cells = present.map((cc) => {
      const row = (contribByCc.get(cc) ?? []).find((r) => r.dimensionId === dimId);
      return row ? row.contribution.toFixed(2) : '—';
    });
    body += `| ${dimId} | ${cells.join(' | ')} |\n`;
  }
  const sums = present.map((cc) => (contribByCc.get(cc) ?? []).reduce((a, r) => a + r.contribution, 0));
  body += `| **sum contrib** | ${sums.map((s) => s.toFixed(2)).join(' | ')} |\n`;
  const overalls = present.map((cc) => scoreMap.get(cc).overallScore);
  body += `| **overallScore** | ${overalls.map((s) => round1(s)).join(' | ')} |\n`;

  return section(`Cohort: ${cohortName}`, body);
}

function renderDimCell(doc, dimId) {
  for (const dom of doc.domains ?? []) {
    for (const dim of dom.dimensions ?? []) {
      if (dim.id === dimId) {
        const cov = round2(dim.coverage ?? 0);
        const imp = dim.imputationClass ? ` · *${dim.imputationClass}*` : '';
        return `${Math.round(dim.score ?? 0)} · ${cov}${imp}`;
      }
    }
  }
  return '—';
}

function buildReport({ ranking, scoreMap, nameMap, movers, capturedAt, sha, failures, requestedCohortCodes }) {
  const items = ranking.items ?? [];
  const greyedOut = ranking.greyedOut ?? [];
  const failureList = [...(failures?.entries?.() ?? [])];
  const missingCohortMembers = (requestedCohortCodes ?? []).filter((cc) => !scoreMap.get(cc));
  const formulaMode = detectFormulaMode(scoreMap);

  let md = `# Resilience cohort-sanity audit report\n\n`;

  // Blocking banners at the very top. Operator MUST see these before the
  // tables below. STRICT mode will exit non-zero after writing the report
  // so an operator can inspect the diagnostics and then re-run.
  if (failureList.length || missingCohortMembers.length) {
    md += `> ⛔ **Fetch failures / missing cohort members.** ${failureList.length} per-country fetch(es) failed; `;
    md += `${missingCohortMembers.length} cohort member(s) are missing from the score map. `;
    md += `Tables below only reflect the members that DID load. `;
    md += `Re-run the audit (STRICT=1 recommended) before treating this report as release-gate evidence.\n\n`;
  }
  if (formulaMode.pillarModeLikely) {
    md += `> ⛔ **Formula mode not supported.** ${formulaMode.diffsExceeded}/${formulaMode.checked} full-coverage countries show `;
    md += `|Σ contributions − overallScore| > ${CONTRIBUTION_SUM_TOLERANCE}. This almost certainly means \`RESILIENCE_PILLAR_COMBINE_ENABLED\` `;
    md += `is active (penalizedPillarScore), and the **contribution decomposition tables below are NOT valid**. `;
    md += `Treat them as "legacy-formula reference only." `;
    md += `See \`docs/methodology/cohort-sanity-release-gate.md#formula-mode\`.\n\n`;
  }

  // In FIXTURE mode `API_BASE` is empty → `RANKING_URL` would render as
  // a bare "/api/resilience/v1/get-resilience-ranking" path that never
  // resolved. Surface "fixture://<path>" instead so a diff against a
  // live-run report is visibly distinguishable.
  const sourceLabel = FIXTURE_PATH ? `fixture://${FIXTURE_PATH}` : RANKING_URL;
  md += `- Captured: ${capturedAt}\n- Commit: ${sha}\n- Source: ${sourceLabel}\n- Ranked: ${items.length} · Grey-out: ${greyedOut.length}\n`;
  md += `- Generated by: \`scripts/audit-resilience-cohorts.mjs\`\n`;
  md += `- Expected domain weights: ${Object.entries(EXPECTED_DOMAIN_WEIGHTS).map(([k, v]) => `${k}=${v}`).join(', ')}\n`;
  md += `- Formula mode: ${formulaMode.pillarModeLikely ? '**PILLAR-COMBINE (decomposition invalid)**' : 'legacy domain-weighted (decomposition valid)'}\n`;
  md += `- Fetch failures: ${failureList.length} · Missing cohort members: ${missingCohortMembers.length}\n`;
  if (BASELINE_PATH) md += `- Baseline snapshot: \`${BASELINE_PATH}\`\n`;

  // Dedicated "what failed" section, rendered even when empty so operators
  // always know to check for it.
  {
    let failBody = '';
    if (failureList.length) {
      failBody += `| CC | Country | Error |\n|---|---|---|\n`;
      for (const [cc, msg] of failureList) {
        failBody += `| ${cc} | ${nameMap[cc] ?? cc} | ${String(msg).replace(/\|/g, '\\|').slice(0, 200)} |\n`;
      }
    }
    if (missingCohortMembers.length) {
      failBody += `\n**Cohort members with no score data:** ${missingCohortMembers.join(', ')}\n`;
      failBody += `\nThe cohorts below were rendered using only members that loaded successfully. `;
      failBody += `An operator comparing to a prior audit should assume the missing members may carry the very anomaly under review.\n`;
    }
    if (!failBody) failBody = '_No fetch failures and all cohort members present._';
    md += section('Fetch failures / missing members', failBody);
  }

  if (formulaMode.pillarModeLikely && formulaMode.examples.length) {
    // The pipes in "|diff|" are escaped: unescaped they would add extra header
    // cells, the header would no longer match the 4-column delimiter row, and
    // the Markdown table would not render as a table.
    let fmBody = `| CC | Σ contrib | overallScore | \\|diff\\| |\n|---|---:|---:|---:|\n`;
    for (const ex of formulaMode.examples) {
      fmBody += `| ${ex.cc} | ${ex.sum.toFixed(2)} | ${ex.overall.toFixed(2)} | ${ex.diff.toFixed(2)} |\n`;
    }
    fmBody += `\n**Diagnosis.** Under the legacy domain-weighted formula, Σ contributions ≈ overallScore (within ~${CONTRIBUTION_SUM_TOLERANCE} pts of drift for rounding). When \`RESILIENCE_PILLAR_COMBINE_ENABLED\` is active, \`overallScore\` is computed by \`penalizedPillarScore(pillars)\` which is non-linear in the dimension scores; contribution decomposition by domain-weight no longer sums to overall. The audit script does not yet implement a pillar-aware decomposition — fix that before relying on this report under pillar-combine mode.\n`;
    md += section('Formula-mode diagnostic', fmBody);
  }

  // Ranking table
  let body = '| # | CC | Country | Overall | Coverage | Level | Low-conf |\n|---:|---|---|---:|---:|---|---|\n';
  items.slice(0, TOP_N_FULL_RANKING).forEach((x, i) => {
    body += `| ${i + 1} | ${x.countryCode} | ${nameMap[x.countryCode] ?? x.countryCode} | ${round1(x.overallScore)} | ${round2(x.overallCoverage)} | ${x.level} | ${x.lowConfidence ? '⚠' : ''} |\n`;
  });
  md += section(`Top ${TOP_N_FULL_RANKING} ranking`, body);

  // Per-cohort per-dimension breakdown
  for (const [cohortName, codes] of Object.entries(COHORTS)) {
    md += renderCohortSection(cohortName, codes, scoreMap, nameMap);
  }

  // Flagged patterns
  const allFlags = [];
  for (const [cohortName, codes] of Object.entries(COHORTS)) {
    allFlags.push(...flagDimensionPatterns(cohortName, codes, scoreMap));
  }
  if (allFlags.length) {
    let flagBody = `| Cohort | Kind | Dimension | Message |\n|---|---|---|---|\n`;
    for (const f of allFlags) {
      flagBody += `| ${f.cohort} | ${f.kind} | ${f.dimension} | ${f.message} |\n`;
    }
    md += section('Flagged patterns', flagBody);
  } else {
    md += section('Flagged patterns', '_No cohort-sanity patterns tripped heuristic thresholds._');
  }

  // Movers
  if (movers?.length) {
    let mvBody = `Baseline: \`${BASELINE_PATH}\`\n\n`;
    mvBody += `| CC | Country | Prev | Current | Δ | Prev rank | Current rank |\n|---|---|---:|---:|---:|---:|---:|\n`;
    for (const m of movers) {
      mvBody += `| ${m.countryCode} | ${nameMap[m.countryCode] ?? m.countryCode} | ${round1(m.prevScore)} | ${round1(m.curScore)} | ${fmtDelta(round2(m.scoreDelta))} | ${m.prevRank ?? '—'} | ${m.curRank ?? '—'} |\n`;
    }
    md += section(`Top-${MOVERS_N} movers vs baseline`, mvBody);
  }

  md += `\n---\n\n*This audit is a release-gate diagnostic, not a merge-blocker. Rank-targeted acceptance criteria are an explicit anti-pattern — see \`docs/methodology/cohort-sanity-release-gate.md\`.*\n`;
  return { md, failureList, missingCohortMembers, formulaMode };
}

async function main() {
  const nameMap = await loadCountryNameMap();
  const cohortCodeSet = new Set();
  for (const codes of Object.values(COHORTS)) for (const cc of codes) cohortCodeSet.add(cc);
  const requestedCohortCodes = [...cohortCodeSet].sort();

  let ranking;
  let scoreMap;
  let failures = new Map();
  if (FIXTURE_PATH) {
    const raw = await fs.readFile(path.resolve(REPO_ROOT, FIXTURE_PATH), 'utf8');
    const fixture = JSON.parse(raw);
    ranking = fixture.ranking ?? { items: [], greyedOut: [] };
    scoreMap = new Map(Object.entries(fixture.scores ?? {}));
    // Fixture mode has no network calls, but a fixture may legitimately
    // omit cohort members (for small smoke-test fixtures). They are not
    // silently dropped: buildReport() computes the missing set against
    // requestedCohortCodes, so the report banners them identically to
    // live-mode fetch failures.
    console.error(`[audit] FIXTURE mode: ${path.resolve(REPO_ROOT, FIXTURE_PATH)} (ranked=${(ranking.items || []).length}, scores=${scoreMap.size})`);
  } else {
    ranking = await fetchRanking();
    console.error(`[audit] fetching per-country scores for ${requestedCohortCodes.length} cohort members at concurrency=${CONCURRENCY}`);
    const result = await fetchScoresConcurrent(requestedCohortCodes);
    scoreMap = result.scores;
    failures = result.failures;
  }
  const items = ranking.items ?? [];
  items.forEach((x, i) => { x.__rank = i + 1; });

  let movers = [];
  if (BASELINE_PATH) {
    try {
      const raw = await fs.readFile(path.resolve(REPO_ROOT, BASELINE_PATH), 'utf8');
      const baseline = JSON.parse(raw);
      movers = computeMovers(items, baseline.items, MOVERS_N);
    } catch (err) {
      console.error(`[audit] baseline read failed: ${err.message}`);
    }
  }

  const capturedAt = new Date().toISOString();
  const sha = commitSha();
  const { md, failureList, missingCohortMembers, formulaMode } = buildReport({
    ranking, scoreMap, nameMap, movers, capturedAt, sha, failures, requestedCohortCodes,
  });

  if (OUT_PATH) {
    await fs.mkdir(path.dirname(path.resolve(REPO_ROOT, OUT_PATH)), { recursive: true });
    await fs.writeFile(path.resolve(REPO_ROOT, OUT_PATH), md, 'utf8');
    console.error(`[audit] wrote ${OUT_PATH}`);
  } else {
    process.stdout.write(md);
  }

  // STRICT mode fails the run AFTER writing the report so operators still
  // have the diagnostic artifact on disk. Exit codes:
  //   3 — fetch failures or missing cohort members
  //   4 — formula-mode change detected (pillar-combine active, decomposition invalid)
  //   0 — all clear
  if (STRICT) {
    if (failureList.length || missingCohortMembers.length) {
      console.error(`[audit] STRICT: ${failureList.length} fetch failure(s), ${missingCohortMembers.length} missing cohort member(s); exiting 3`);
      process.exit(3);
    }
    if (formulaMode.pillarModeLikely) {
      console.error(`[audit] STRICT: formula-mode mismatch detected (pillar-combine likely); contribution decomposition invalid; exiting 4`);
      process.exit(4);
    }
  }
}

main().catch((err) => {
  console.error('[audit-resilience-cohorts] failed:', err);
  process.exit(1);
});
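
To make the share formula documented above `decomposeContributions` concrete, here is a minimal sketch (not part of the commit) that applies it to the AE `recovery` domain values from `tests/fixtures/resilience-audit-fixture.json`. The helper name `contributionRows` and the trimmed `weights` map are illustrative assumptions; only the two 0.5 multipliers are copied from `DIM_WEIGHTS`.

```javascript
// Hypothetical helper for illustration; it restates, for a single domain,
//   contribution = score × (coverage × dimWeight / Σ(coverage × dimWeight)) × domainWeight
// It is not the script's own function.
function contributionRows(domain, dimWeights) {
  const dims = domain.dimensions ?? [];
  const effective = (d) => (d.coverage ?? 0) * (dimWeights[d.id] ?? 1.0);
  const denom = dims.reduce((acc, d) => acc + effective(d), 0);
  return dims.map((d) => {
    const share = denom > 0 ? effective(d) / denom : 0;
    return { id: d.id, share, contribution: (d.score ?? 0) * share * (domain.weight ?? 0) };
  });
}

// Scores and coverage copied from the AE "recovery" domain in the offline fixture.
const recovery = {
  id: 'recovery',
  weight: 0.25,
  dimensions: [
    { id: 'sovereignFiscalBuffer', score: 27, coverage: 1.0 },
    { id: 'liquidReserveAdequacy', score: 38, coverage: 1.0 },
    { id: 'importConcentration', score: 50, coverage: 0.3 },
    { id: 'externalDebtCoverage', score: 100, coverage: 1.0 },
    { id: 'fiscalSpace', score: 76, coverage: 1.0 },
    { id: 'stateContinuity', score: 85, coverage: 1.0 },
  ],
};

// Only the non-default multipliers are listed; every other dim falls back to 1.0.
const weights = { liquidReserveAdequacy: 0.5, sovereignFiscalBuffer: 0.5 };

const rows = contributionRows(recovery, weights);
const shareSum = rows.reduce((acc, r) => acc + r.share, 0);
const domainPoints = rows.reduce((acc, r) => acc + r.contribution, 0);

for (const r of rows) {
  console.log(`${r.id}: share=${r.share.toFixed(3)} contribution=${r.contribution.toFixed(2)}`);
}
console.log(`shares sum to ${shareSum.toFixed(3)}; recovery adds ${domainPoints.toFixed(2)} points`);
```

In this sketch the low-coverage `importConcentration` row receives a share of 0.3/4.3 (about 7%), against 1/4.3 (about 23%) for a fully covered, full-weight dimension. That is why a coverage outlier moves the overall score less than its raw dimension score suggests.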
166
tests/audit-cohort-formula-detection.test.mts
Normal file
@@ -0,0 +1,166 @@
// Smoke-tests for the fail-closed behaviour of
// `scripts/audit-resilience-cohorts.mjs`. Verifies:
//   (1) Missing cohort members produce a ⛔ banner at report top
//       and a dedicated "Fetch failures / missing members" section.
//   (2) STRICT=1 exits non-zero (code 3) when members are missing.
//   (3) Formula-mode detection correctly banners when pillar-combine
//       is active (Σ contributions ≠ overallScore for complete responses)
//       and correctly does NOT banner when contributions sum.
//
// The tests drive the script as a child process against synthetic
// fixtures so they exercise the full `main()` flow (report shape,
// exit codes, stderr logging) rather than just the pure helpers.

import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { spawnSync } from 'node:child_process';
import path from 'node:path';
import fs from 'node:fs';
import os from 'node:os';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
const SCRIPT = path.join(REPO_ROOT, 'scripts', 'audit-resilience-cohorts.mjs');

function writeFixture(name: string, fixture: unknown): string {
  const tmpFile = path.join(os.tmpdir(), `audit-fixture-${name}-${process.pid}.json`);
  fs.writeFileSync(tmpFile, JSON.stringify(fixture));
  return tmpFile;
}

function runAudit(env: Record<string, string>): { status: number | null; stdout: string; stderr: string; report: string } {
  const outFile = path.join(os.tmpdir(), `audit-out-${Date.now()}-${Math.random().toString(36).slice(2)}.md`);
  const result = spawnSync('node', [SCRIPT], {
    env: { ...process.env, OUT: outFile, ...env },
    encoding: 'utf8',
  });
  let report = '';
  try {
    report = fs.readFileSync(outFile, 'utf8');
  } catch {
    /* no report written */
  } finally {
    // Remove the temp report so repeated runs do not accumulate files in tmpdir.
    fs.rmSync(outFile, { force: true });
  }
  return {
    status: result.status,
    stdout: result.stdout ?? '',
    stderr: result.stderr ?? '',
    report,
  };
}

// Complete fixture: 57 cohort members so missing-member banner does NOT fire.
// Domain weights sum to 1.0 and coverage is 1.0 throughout.
// Σ contributions per country should land within CONTRIBUTION_SUM_TOLERANCE of overall.
function buildCompleteFixture(options: { pillarMode?: boolean } = {}): unknown {
  const allCohortCodes = Array.from(new Set([
    'AE', 'SA', 'KW', 'QA', 'OM', 'BH',
    'FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE',
    'SG', 'MY', 'TH', 'VN', 'ID', 'PH',
    'BR', 'MX', 'CO', 'VE', 'AR', 'EC',
    'NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD',
    'RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM',
    'LK', 'PK', 'LB', 'TR', 'EG', 'TN',
    'HK', 'NL', 'PA', 'LT',
    'NO',
    'YE', 'SY', 'SO', 'AF',
  ]));

  const buildDoc = (overallScore: number) => {
    const dimScore = overallScore;
    return {
      countryCode: 'XX',
      overallScore: options.pillarMode ? 10 : overallScore,
      // When pillarMode=true we deliberately set overallScore to a value
      // that won't match Σ contributions (penalizedPillarScore semantics)
      // so the detector fires. coverage=1.0 across all dims keeps the
      // eligibility gate satisfied.
      level: 'moderate',
      baselineScore: overallScore,
      stressScore: overallScore,
      stressFactor: 0.2,
      domains: [
        { id: 'economic', weight: 0.17, score: dimScore, dimensions: [
          { id: 'macroFiscal', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'infrastructure', weight: 0.15, score: dimScore, dimensions: [
          { id: 'infrastructure', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'energy', weight: 0.11, score: dimScore, dimensions: [
          { id: 'energy', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'social-governance', weight: 0.19, score: dimScore, dimensions: [
          { id: 'governanceInstitutional', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'health-food', weight: 0.13, score: dimScore, dimensions: [
          { id: 'healthPublicService', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'recovery', weight: 0.25, score: dimScore, dimensions: [
          { id: 'externalDebtCoverage', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
      ],
    };
  };

  const scores: Record<string, unknown> = {};
  for (const cc of allCohortCodes) {
    scores[cc] = { ...(buildDoc(70) as Record<string, unknown>), countryCode: cc };
  }
  const items = allCohortCodes.slice(0, 6).map((cc) => ({
    countryCode: cc, overallScore: 70, level: 'moderate', lowConfidence: false, overallCoverage: 1.0, rankStable: true,
  }));
  return { ranking: { items, greyedOut: [] }, scores };
}

describe('audit-resilience-cohorts fail-closed — missing cohort members', () => {
  it('banners the report when fixture omits cohort members AND exits 3 under STRICT=1', () => {
    // Minimal fixture intentionally omits almost every cohort member.
    const fixture = {
      ranking: { items: [
        { countryCode: 'AE', overallScore: 72.72, level: 'high', lowConfidence: false, overallCoverage: 0.88, rankStable: true },
      ], greyedOut: [] },
      scores: {
        AE: { countryCode: 'AE', overallScore: 72.72, level: 'high', baselineScore: 72, stressScore: 70, stressFactor: 0.15, domains: [
          { id: 'recovery', weight: 0.25, score: 50, dimensions: [
            { id: 'externalDebtCoverage', score: 100, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
          ]},
        ]},
      },
    };
    const fixturePath = writeFixture('missing-members', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 3, `expected STRICT exit code 3 for missing members; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Fetch failures \/ missing cohort members/, 'expected missing-members banner at report top');
      assert.match(result.report, /## Fetch failures \/ missing members/, 'expected dedicated Fetch-failures section');
      assert.match(result.report, /Cohort members with no score data:/, 'expected missing-members list');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });

  it('exits 0 under STRICT=1 when all cohort members present + formula matches', () => {
    const fixture = buildCompleteFixture({ pillarMode: false });
    const fixturePath = writeFixture('complete', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 0, `expected STRICT exit 0; got ${result.status}; stderr=${result.stderr}`);
      assert.doesNotMatch(result.report, /⛔ \*\*Fetch failures/, 'missing-members banner should NOT fire');
      assert.doesNotMatch(result.report, /⛔ \*\*Formula mode not supported/, 'formula-mode banner should NOT fire on legacy-formula response');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});

describe('audit-resilience-cohorts fail-closed — formula mode', () => {
  it('banners the report when Σ contributions diverges from overallScore AND exits 4 under STRICT=1', () => {
    const fixture = buildCompleteFixture({ pillarMode: true });
    const fixturePath = writeFixture('pillar-mode', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 4, `expected STRICT exit code 4 for formula mismatch; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Formula mode not supported/, 'expected formula-mode banner at report top');
      assert.match(result.report, /PILLAR-COMBINE \(decomposition invalid\)/, 'expected formula-mode line in header');
      assert.match(result.report, /## Formula-mode diagnostic/, 'expected dedicated formula-mode diagnostic section');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});
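
The STRICT-mode contract these tests pin down (exit 3 for fetch failures or missing members, exit 4 for likely pillar-combine drift, exit 0 otherwise) can be summarised as a pair of pure functions. The sketch below is illustrative only: `pillarModeLikely` and `strictExitCode` as free functions, and the shape of the argument object, are hypothetical names that restate the checks in `main()` and `detectFormulaMode`; they are not code from the script.

```javascript
// Hypothetical restatement of the detection heuristic: pillar-combine is
// flagged only when at least 3 eligible countries were checked and more
// than half of them exceed the contribution-sum tolerance.
function pillarModeLikely(checked, diffsExceeded) {
  return checked >= 3 && diffsExceeded / checked > 0.5;
}

// Hypothetical restatement of the STRICT precedence: missing data (3) is
// reported before formula drift (4); without STRICT the run exits 0.
function strictExitCode({ strict, failureCount, missingCount, checked, diffsExceeded }) {
  if (!strict) return 0;
  if (failureCount > 0 || missingCount > 0) return 3;
  if (pillarModeLikely(checked, diffsExceeded)) return 4;
  return 0;
}

// Missing members win even when every checked country also drifts.
const missingAndDrift = strictExitCode({
  strict: true, failureCount: 0, missingCount: 2, checked: 5, diffsExceeded: 5,
});

// All members present, every checked country drifts: formula-mode exit.
const driftOnly = strictExitCode({
  strict: true, failureCount: 0, missingCount: 0, checked: 57, diffsExceeded: 57,
});

// Two drifting countries out of two checked is below the 3-country floor.
const tooFewChecked = strictExitCode({
  strict: true, failureCount: 0, missingCount: 0, checked: 2, diffsExceeded: 2,
});

console.log({ missingAndDrift, driftOnly, tooFewChecked });
```

One consequence of this ordering, visible in the sketch, is that a run with both missing members and formula drift reports only code 3, so automation that branches on the exit code should still read the report banners for the full picture.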
125
tests/fixtures/resilience-audit-fixture.json
vendored
Normal file
@@ -0,0 +1,125 @@
|
||||
{
|
||||
"_comment": "Minimal synthetic fixture for scripts/audit-resilience-cohorts.mjs end-to-end dry-run. Mirrors the 2026-04-24 GCC snapshot (KW>QA>AE) so the fixture-mode run produces the observed-real-world deltas the audit is designed to surface. Values are approximate; used only for structural verification.",
|
||||
"ranking": {
|
||||
"items": [
|
||||
{ "countryCode": "KW", "overallScore": 79.08, "level": "high", "lowConfidence": false, "overallCoverage": 0.92, "rankStable": true },
|
||||
{ "countryCode": "QA", "overallScore": 77.06, "level": "high", "lowConfidence": false, "overallCoverage": 0.95, "rankStable": true },
|
||||
{ "countryCode": "AE", "overallScore": 72.72, "level": "high", "lowConfidence": false, "overallCoverage": 0.88, "rankStable": true },
|
||||
{ "countryCode": "SA", "overallScore": 68.04, "level": "high", "lowConfidence": false, "overallCoverage": 0.90, "rankStable": true },
|
||||
{ "countryCode": "OM", "overallScore": 65.74, "level": "moderate", "lowConfidence": false, "overallCoverage": 0.82, "rankStable": true },
|
||||
{ "countryCode": "BH", "overallScore": 61.69, "level": "moderate", "lowConfidence": false, "overallCoverage": 0.85, "rankStable": true }
|
||||
],
|
||||
"greyedOut": []
|
||||
},
|
||||
"scores": {
    "AE": {
      "countryCode": "AE",
      "overallScore": 72.72,
      "level": "high",
      "baselineScore": 72.0,
      "stressScore": 70.0,
      "stressFactor": 0.15,
      "domains": [
        { "id": "recovery", "score": 62, "weight": 0.25, "dimensions": [
          { "id": "sovereignFiscalBuffer", "score": 27, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "liquidReserveAdequacy", "score": 38, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "importConcentration", "score": 50, "coverage": 0.3, "observedWeight": 0, "imputedWeight": 1, "imputationClass": "unmonitored" },
          { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "fiscalSpace", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "stateContinuity", "score": 85, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "economic", "score": 74, "weight": 0.17, "dimensions": [
          { "id": "tradeSanctions", "score": 54, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "currencyExternal", "score": 73, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "macroFiscal", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "energy", "score": 79, "weight": 0.11, "dimensions": [
          { "id": "energy", "score": 79, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "health-food", "score": 62, "weight": 0.13, "dimensions": [
          { "id": "foodWater", "score": 53, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "healthPublicService", "score": 75, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "social-governance", "score": 74, "weight": 0.19, "dimensions": [
          { "id": "socialCohesion", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "governanceInstitutional", "score": 78, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "infrastructure", "score": 78, "weight": 0.15, "dimensions": [
          { "id": "infrastructure", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]}
      ]
    },
    "KW": {
      "countryCode": "KW",
      "overallScore": 79.08,
      "level": "high",
      "baselineScore": 79.0,
      "stressScore": 78.0,
      "stressFactor": 0.11,
      "domains": [
        { "id": "recovery", "score": 90, "weight": 0.25, "dimensions": [
          { "id": "sovereignFiscalBuffer", "score": 98, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "liquidReserveAdequacy", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "importConcentration", "score": 85, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "fiscalSpace", "score": 98, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "stateContinuity", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "economic", "score": 78, "weight": 0.17, "dimensions": [
          { "id": "tradeSanctions", "score": 82, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "currencyExternal", "score": 86, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "macroFiscal", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "energy", "score": 55, "weight": 0.11, "dimensions": [
          { "id": "energy", "score": 55, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "health-food", "score": 60, "weight": 0.13, "dimensions": [
          { "id": "foodWater", "score": 53, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "healthPublicService", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "social-governance", "score": 72, "weight": 0.19, "dimensions": [
          { "id": "socialCohesion", "score": 68, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "governanceInstitutional", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "infrastructure", "score": 76, "weight": 0.15, "dimensions": [
          { "id": "infrastructure", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]}
      ]
    },
    "QA": {
      "countryCode": "QA",
      "overallScore": 77.06,
      "level": "high",
      "baselineScore": 77.0,
      "stressScore": 76.0,
      "stressFactor": 0.12,
      "domains": [
        { "id": "recovery", "score": 85, "weight": 0.25, "dimensions": [
          { "id": "sovereignFiscalBuffer", "score": 95, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "liquidReserveAdequacy", "score": 68, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "importConcentration", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
          { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]},
        { "id": "economic", "score": 78, "weight": 0.17, "dimensions": [
          { "id": "tradeSanctions", "score": 82, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
        ]}
      ]
    },
    "SA": { "countryCode": "SA", "overallScore": 68.04, "level": "high", "baselineScore": 68.0, "stressScore": 67.0, "stressFactor": 0.14, "domains": [
      { "id": "recovery", "score": 70, "weight": 0.25, "dimensions": [
        { "id": "sovereignFiscalBuffer", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
        { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
      ]}
    ]},
    "OM": { "countryCode": "OM", "overallScore": 65.74, "level": "moderate", "baselineScore": 66.0, "stressScore": 64.0, "stressFactor": 0.16, "domains": [
      { "id": "recovery", "score": 60, "weight": 0.25, "dimensions": [
        { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
      ]}
    ]},
    "BH": { "countryCode": "BH", "overallScore": 61.69, "level": "moderate", "baselineScore": 62.0, "stressScore": 60.0, "stressFactor": 0.18, "domains": [
      { "id": "recovery", "score": 55, "weight": 0.25, "dimensions": [
        { "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
      ]}
    ]}
  }
}
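The commit message says a fixture-mode run flags a coverage outlier on AE.importConcentration (0.3 vs peers 1.0) and a saturated-high externalDebtCoverage across the cohort. The sketch below shows one way such flags could be derived from the dimension rows of a fixture shaped like the one above. It is not the logic of `scripts/audit-resilience-cohorts.mjs`, which this diff section does not show: the names `findFlags`, `SATURATED_HIGH` and `COVERAGE_GAP`, the threshold values, and the median-of-peers rule are all assumptions made for illustration.

```typescript
// Hypothetical sketch; names and thresholds are assumptions, not the audit script's.
type Dim = { id: string; score: number; coverage: number };
type CohortDims = Record<string, Dim[]>; // countryCode -> flattened dimension rows

const SATURATED_HIGH = 99; // assumed: every member at or above this counts as saturated
const COVERAGE_GAP = 0.5;  // assumed: this far below the peer median counts as an outlier

function median(xs: number[]): number {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  const hi = s[mid] ?? 0;
  const lo = s[mid - 1] ?? hi;
  return s.length % 2 === 1 ? hi : (lo + hi) / 2;
}

function findFlags(cohort: CohortDims): string[] {
  // Collect dimension ids in first-seen order.
  const ids: string[] = [];
  for (const cc of Object.keys(cohort)) {
    for (const d of cohort[cc] ?? []) {
      if (ids.indexOf(d.id) === -1) ids.push(d.id);
    }
  }

  const flags: string[] = [];
  for (const id of ids) {
    const rows: { cc: string; dim: Dim }[] = [];
    for (const cc of Object.keys(cohort)) {
      for (const d of cohort[cc] ?? []) {
        if (d.id === id) rows.push({ cc, dim: d });
      }
    }
    if (rows.length < 2) continue; // nothing to compare against

    // Saturated-high: the dimension cannot separate any cohort member.
    if (rows.every((r) => r.dim.score >= SATURATED_HIGH)) {
      flags.push(`saturated-high:${id}`);
    }
    // Coverage outlier: one member's coverage sits well below its peers' median.
    for (const r of rows) {
      const peers = rows.filter((p) => p.cc !== r.cc).map((p) => p.dim.coverage);
      if (median(peers) - r.dim.coverage >= COVERAGE_GAP) {
        flags.push(`coverage-outlier:${r.cc}.${id}`);
      }
    }
  }
  return flags;
}

// Subset of the fixture values above (AE, KW, QA; two recovery dimensions).
const gcc: CohortDims = {
  AE: [
    { id: 'importConcentration', score: 50, coverage: 0.3 },
    { id: 'externalDebtCoverage', score: 100, coverage: 1.0 },
  ],
  KW: [
    { id: 'importConcentration', score: 85, coverage: 1.0 },
    { id: 'externalDebtCoverage', score: 100, coverage: 1.0 },
  ],
  QA: [
    { id: 'importConcentration', score: 70, coverage: 1.0 },
    { id: 'externalDebtCoverage', score: 100, coverage: 1.0 },
  ],
};

const flags = findFlags(gcc);
console.log(flags);
```

Under these assumed thresholds the run reports `coverage-outlier:AE.importConcentration` and `saturated-high:externalDebtCoverage`, matching the two findings named in the commit message. Comparing each member against the median of its peers rather than the cohort mean keeps a single low-coverage row from dragging down its own reference point.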
161
tests/resilience-construct-invariants.test.mts
Normal file
@@ -0,0 +1,161 @@
// Construct invariants — formula-level assertions with synthetic inputs.
//
// Purpose. Complement `resilience-dimension-monotonicity.test.mts` (which
// pins direction) with precise ANCHOR-VALUE checks. These tests fail when
// the scoring FORMULA breaks, not when a country's RANK changes. They are
// deliberately country-identity-free so the audit gate (see
// `docs/methodology/cohort-sanity-release-gate.md`) does not collapse into
// an outcome-seeking "ENTITY A must > ENTITY B" assertion — that is the
// anti-pattern the cohort-sanity skill explicitly warns against.
//
// Plan reference. PR 0 from
// `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
// (§"PR 0 — Release-gate audit harness"):
//   > `score(HHI=0.05) > score(HHI=0.20)`
//   > `score(debtToReservesRatio=0) > score(ratio=1) > score(ratio=2)`
//   > `score(effMo=12) > score(effMo=3)`
//   > `score(lowCarbonShare=80, fossilImportDep=0) > score(lowCarbonShare=0, fossilImportDep=100)`
//
// The tests are organised by scorer and include both the monotonicity
// claim and the precise anchor value where the construct fixes one
// (Greenspan-Guidotti = 50; saturating transform at effMo=12 = ~63).
// An anchor drift > 1 point is an invariant break: investigate before
// editing the test.

import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  scoreImportConcentration,
  scoreExternalDebtCoverage,
  scoreSovereignFiscalBuffer,
  type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';

const TEST_ISO2 = 'XX';

function makeReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
  return async (key: string) => keyValueMap[key] ?? null;
}

describe('construct invariants — importConcentration', () => {
  async function scoreWith(hhi: number) {
    return scoreImportConcentration(TEST_ISO2, makeReader({
      'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
    }));
  }

  it('score(HHI=0.05) > score(HHI=0.20)', async () => {
    const diversified = await scoreWith(0.05);
    const concentrated = await scoreWith(0.20);
    assert.ok(
      diversified.score > concentrated.score,
      `HHI 0.05→0.20 should lower score; got ${diversified.score} → ${concentrated.score}`,
    );
  });

  it('HHI=0 anchors at score 100 (no-concentration pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at HHI=0, got ${r.score}`);
  });

  it('HHI=0.5 (fully concentrated under current 0..5000 goalpost) anchors at score 0', async () => {
    // Current scorer: hhi×10000 normalised against (0, 5000). 0.5×10000 = 5000 → 0.
    const r = await scoreWith(0.5);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at HHI=0.5 under current goalpost, got ${r.score}`);
  });
});

describe('construct invariants — externalDebtCoverage (Greenspan-Guidotti anchor)', () => {
  async function scoreWith(debtToReservesRatio: number) {
    return scoreExternalDebtCoverage(TEST_ISO2, makeReader({
      'resilience:recovery:external-debt:v1': {
        countries: { [TEST_ISO2]: { debtToReservesRatio } },
      },
    }));
  }

  it('ratio=0 → score 100 (zero-rollover-exposure pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at ratio=0, got ${r.score}`);
  });

  it('ratio=1.0 → score 50 (Greenspan-Guidotti threshold)', async () => {
    const r = await scoreWith(1.0);
    assert.ok(
      Math.abs(r.score - 50) < 1,
      `expected ~50 at ratio=1.0 under Greenspan-Guidotti anchor (worst=2), got ${r.score}`,
    );
  });

  it('ratio=2.0 → score 0 (acute rollover-shock pole)', async () => {
    const r = await scoreWith(2.0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at ratio=2.0, got ${r.score}`);
  });

  it('monotonic: score(ratio=0) > score(ratio=1) > score(ratio=2)', async () => {
    const [r0, r1, r2] = await Promise.all([scoreWith(0), scoreWith(1), scoreWith(2)]);
    assert.ok(r0.score > r1.score && r1.score > r2.score,
      `expected strictly decreasing; got ${r0.score}, ${r1.score}, ${r2.score}`);
  });
});

describe('construct invariants — sovereignFiscalBuffer (saturating transform)', () => {
  // Saturating transform per scorer (line ~1687):
  //   score = 100 * (1 - exp(-em / 12))
  // Reference values (not tuning points — these are what the formula SHOULD
  // produce if no one has silently redefined it):
  //   em=0  → 0
  //   em=3  → 100*(1-e^-0.25) ≈ 22.1
  //   em=12 → 100*(1-e^-1) ≈ 63.2
  //   em=24 → 100*(1-e^-2) ≈ 86.5
  //   em→∞  → 100

  async function scoreWithEm(em: number) {
    return scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': {
        countries: { [TEST_ISO2]: { totalEffectiveMonths: em, completeness: 1.0 } },
      },
    }));
  }

  it('em=0 → score 0 (no SWF buffer)', async () => {
    const r = await scoreWithEm(0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at em=0, got ${r.score}`);
  });

  it('em=12 → score ≈ 63 (one-year saturating anchor)', async () => {
    const r = await scoreWithEm(12);
    const expected = 100 * (1 - Math.exp(-1));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=12, got ${r.score}`,
    );
  });

  it('em=24 → score ≈ 86 (two-year saturating anchor)', async () => {
    const r = await scoreWithEm(24);
    const expected = 100 * (1 - Math.exp(-2));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=24, got ${r.score}`,
    );
  });

  it('monotonic: score(em=3) < score(em=12) < score(em=24)', async () => {
    const [r3, r12, r24] = await Promise.all([scoreWithEm(3), scoreWithEm(12), scoreWithEm(24)]);
    assert.ok(r3.score < r12.score && r12.score < r24.score,
      `expected strictly increasing; got em=3:${r3.score}, em=12:${r12.score}, em=24:${r24.score}`);
  });

  it('country not in manifest → score 0, coverage 1.0 (legitimate zero, not imputed)', async () => {
    // Seed present but country absent = "no SWF" (legitimate structural zero).
    // This is distinct from "seed missing entirely" which returns IMPUTE.
    const r = await scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': { countries: {} },
    }));
    assert.equal(r.score, 0, `expected 0 when country has no manifest entry, got ${r.score}`);
    assert.equal(r.coverage, 1.0, `expected coverage=1.0 (legitimate observation), got ${r.coverage}`);
    assert.equal(r.imputationClass, null, `expected null imputation (not imputed), got ${r.imputationClass}`);
  });
});
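The anchor values asserted in the test file above can be reproduced by hand. The sketch below does this with two standalone helpers, `lowerIsBetter` and `saturating`. Both are hypothetical and are not the real scorers in `_dimension-scorers.ts`. The linear shape of the goalpost is an assumption that is merely consistent with the tested anchors (HHI 0 → 100 and 0.5 → 0; debt ratio 0 → 100, 1 → 50, 2 → 0). The exponential form is taken from the test file's own comment, `score = 100 * (1 - exp(-em / 12))`.

```typescript
// Hypothetical helpers for illustration; not the project's scorer implementations.

// Linear "lower is better" goalpost: best maps to 100, worst maps to 0, clamped.
function lowerIsBetter(value: number, best: number, worst: number): number {
  const t = (worst - value) / (worst - best);
  return 100 * Math.min(1, Math.max(0, t));
}

// Saturating transform from the test comment: 100 * (1 - exp(-em / 12)).
function saturating(effectiveMonths: number, scaleMonths: number = 12): number {
  return 100 * (1 - Math.exp(-Math.max(0, effectiveMonths) / scaleMonths));
}

// HHI is scaled by 10000 and normalised against the (0, 5000) goalpost.
const hhiScore = (hhi: number): number => lowerIsBetter(hhi * 10000, 0, 5000);

// Debt-to-reserves ratio with worst = 2, so ratio = 1 lands on the midpoint.
const debtScore = (ratio: number): number => lowerIsBetter(ratio, 0, 2);

console.log(hhiScore(0), hhiScore(0.5));               // 100 0
console.log(debtScore(0), debtScore(1), debtScore(2)); // 100 50 0
console.log(
  saturating(3).toFixed(1),
  saturating(12).toFixed(1),
  saturating(24).toFixed(1),
); // 22.1 63.2 86.5
```

The midpoint at ratio 1 follows directly from placing the worst goalpost at 2, which is why the test treats 50 as a fixed anchor. The saturating curve never reaches 100 for finite input, so very large buffers are compressed toward the ceiling instead of being clipped at it.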