feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369)

* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking
structural audit (plan 2026-04-24-002). Release gate, not merge gate:
the audit tells release review what to look at before publishing a
ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator.
  Fetches the live ranking + per-country scores (or reads a fixture
  in offline mode), emits per-cohort per-dimension tables, contribution
  decomposition, saturated / outlier / identical-score flags, and a
  top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level
  anchor-value assertions with synthetic inputs. Covers HHI, external
  debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer
  (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that
  mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can
  be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc
  explaining when to run, how to read the report, and the explicit
  anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` —
  12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json
   OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs`
  runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot
  (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json)
  is deferred to a manual release-prep step: run
  `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app
   node scripts/freeze-resilience-ranking.mjs` before the first
  methodology PR (PR 1 HHI period widening) so its movers table has
  something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop.
Per-country score fetches that failed were logged to stderr, silently
stored as null, and then filtered out of cohort tables via
`codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the
very country carrying the ranking anomaly could produce a Markdown
report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and
  does NOT insert null placeholders; missing cohort members are
  computed against the requested cohort code set.
- The report has a ⛔ blocker banner at top AND an always-rendered
  "Fetch failures / missing members" section (shown even when empty,
  so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch
  failure or missing cohort member, code 4 on formula-mode drift,
  code 0 otherwise. Automation can differentiate the two.

P2 — pillar-combine formula mode invalidates contribution rows.
`docs/methodology/cohort-sanity-release-gate.md:63` tells operators
to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`,
but the contribution decomposition is a domain-weighted roll-up that
is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)`
(non-linear in dim scores); decomposition rows become materially
misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against
  `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible
  countries drift, pillar combine is flagged.
- Report emits a ⛔ blocker banner at top, a "Formula mode" line in
  the header, and a "Formula-mode diagnostic" section with the first
  three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section,
  "Formula mode" operator guide, ENV table entries for STRICT +
  CONTRIB_TOLERANCE.

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process
  smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0,
  pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts
   tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files
  (refactor split buildReport complexity from 51 → under 50 by
  extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and
  GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets
   `API_BASE=''`, so the report header showed a bare "/api/..." path that
   never resolved — making a fixture run visually indistinguishable from
   a live run. Now surfaces `Source: fixture://<path>` in fixture mode.

2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing
   or unparseable `shared/country-names.json` produced a raw unhandled
   rejection. Now the read and the parse are each wrapped in their own
   try/catch; on either failure the script logs a developer-friendly
   warning and falls back to ISO-2 codes (report shows "AE" instead of
   "Uae"). Keeps the audit operable in CI-offline scenarios.

3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.**
   Downstream operators diff / grep / CSV-pipe the report; the Unicode
   minus breaks byte-level text tooling. Replaced with ASCII hyphen-
   minus. Left the U+2212 in the formula-mode diagnostic prose
   (`|Σ contributions − overallScore|`) where it's mathematical notation,
   not data.

Verified

- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
Elie Habib
2026-04-24 18:13:22 +04:00
committed by GitHub
parent 34dfc9a451
commit df392b0514
5 changed files with 1372 additions and 0 deletions


@@ -0,0 +1,240 @@
# Cohort-sanity release gate
Operational procedure for the resilience cohort-sanity audit. This is a
**release gate**, not a merge gate. The audit tells release review what
to look at before publishing a ranking; it does not block a PR from
merging.
## What this exists to catch
A composite resilience score can be mathematically correct yet produce
rankings that contradict first-principles domain judgment — usually
because ONE input has a coverage gap, a saturated goalpost, or a
denominator that's structurally wrong for one sub-class of entities
(re-export hubs, single-sector states, SWF-parked-reserve designs).
Cohort-sanity is the test the codebase can't run on its own. It says:
"given these cohorts, does the ranking match the construct each
cohort is defined to probe?" Not "does country A rank above country
B" — see the anti-pattern section below.
Relevant background in the repository:
- `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`:
  the audit plan that motivates this gate.
- Skill `cohort-ranking-sanity-surfaces-hidden-data-gaps`: the general
  diagnostic protocol (data bug / methodology bug / construct limitation
  / value judgment), including the anti-pattern note on rank-targeted
  acceptance criteria.
- `tests/resilience-construct-invariants.test.mts`: formula-level
  invariants with synthetic inputs. These test the SCORING MATH; they
  don't flip to fail on a live-ranking change.
## Artifacts
1. **`scripts/audit-resilience-cohorts.mjs`**: emits a structured
   Markdown report with:
   - Full top-N ranking table
   - Per-cohort per-dimension breakdown (GCC, OECD-nuclear, ASEAN trade
     hubs, LatAm-petro, African-fragile, post-Soviet, stressed-debt,
     re-export hubs, SWF-heavy exporters, fragile-floor)
   - Contribution decomposition: for each country, each dim's
     `score × coverage × dimWeight × domainWeight` contribution to
     overall
   - Flagged patterns: saturated dims, low-coverage outliers, identical
     scores across cohort members
   - Top-N movers vs a baseline snapshot
2. **`tests/resilience-construct-invariants.test.mts`**: formula-level
   anchor-value assertions. Part of `npm run test:data`. Failing means
   the scorer formula drifted; investigate before editing the test.
3. **`docs/snapshots/resilience-ranking-live-pre-cohort-audit-YYYY-MM-DD.json`**:
   the baseline snapshot for movers comparison. Refresh before each
   methodology change.
## When to run
- **Pre-publication**: any time the published ranking is about to
change externally (site, API consumers, newsletter, partner feed).
- **Every merge touching a scorer file** in `server/worldmonitor/resilience/v1/_dimension-scorers.ts`,
`server/worldmonitor/resilience/v1/_shared.ts`, or a scorer-feeding
seeder in `scripts/seed-recovery-*.mjs`, `scripts/seed-bundle-resilience-*.mjs`.
- **Before activating a feature flag** that alters the scorer
(`RESILIENCE_ENERGY_V2_ENABLED`, `RESILIENCE_PILLAR_COMBINE_ENABLED`,
`RESILIENCE_SCHEMA_V2_ENABLED`).
- **After a cache-prefix bump** (`resilience:score:vN`,
`resilience:ranking:vN`, `resilience:history:vN`) — once the new
prefix has warmed up, rerun the audit so the movers table reflects
the new values and nothing else.
## How to run
```bash
# Online (hits the live API; requires WORLDMONITOR_API_KEY)
WORLDMONITOR_API_KEY=wm_xxx \
API_BASE=https://api.worldmonitor.app \
BASELINE=docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json \
OUT=/tmp/cohort-audit-$(date +%Y-%m-%d).md \
node scripts/audit-resilience-cohorts.mjs
# Offline (fixture mode — for CI / dry-run / regression comparison)
FIXTURE=tests/fixtures/resilience-audit-fixture.json \
OUT=/tmp/cohort-audit-fixture.md \
node scripts/audit-resilience-cohorts.mjs
```
Environment variables:
| Var | Default | Notes |
|---|---|---|
| `API_BASE` | (required unless FIXTURE set) | e.g. `https://api.worldmonitor.app` |
| `WORLDMONITOR_API_KEY` | (required unless FIXTURE set) | resilience RPCs are in `PREMIUM_RPC_PATHS` |
| `FIXTURE` | (empty) | JSON fixture with `{ ranking, scores }` shape — skips all network calls |
| `BASELINE` | (empty) | Path to a frozen ranking JSON for movers comparison |
| `OUT` | (stdout) | Path for the Markdown report |
| `TOP_N` | 60 | Rows to render in the full-ranking table |
| `MOVERS_N` | 30 | Rows to render in the movers table |
| `CONCURRENCY` | 6 | Parallel score-endpoint fetches |
| `STRICT` | unset | `1` = fail-closed. Report still writes, then exit 3 on fetch failures/missing members, exit 4 on formula-mode drift, exit 0 otherwise. Recommended for release-gate automation. |
| `CONTRIB_TOLERANCE` | 1.5 | Points of drift tolerated between `Σ contributions` and `overallScore` before formula-mode drift is declared. |
### Fail-closed semantics
The audit is fail-closed on two axes. Both are implemented in
`scripts/audit-resilience-cohorts.mjs` and documented here so that a
release-gate operator cannot shortcut them by reading only the
rendered tables.
1. **Fetch failures / missing cohort members.** When a per-country score
   fetch fails (HTTP 4xx/5xx, timeout, DNS), the country is NOT silently
   dropped. The failure is recorded in the run's `failures` map, shown
   in a ⛔ blocker banner at the top of the report, and rendered in a
   dedicated "Fetch failures / missing members" section that is ALWAYS
   present (even when empty, so an operator learns to look for it).
   Fixture mode uses the same mechanism for cohort members absent from
   the fixture.
2. **Formula-mode mismatch (`RESILIENCE_PILLAR_COMBINE_ENABLED`).** The
   contribution decomposition is a domain-weighted roll-up that is ONLY
   mathematically valid when `overallScore` is computed via the legacy
   `sum(domain.score * domain.weight)` path. Once pillar combine is on,
   `overallScore = penalizedPillarScore(pillars)`, a non-linear
   function of the dim scores, and the decomposition rows no longer
   sum to overall. The harness detects this by taking any country with:
   - `sum(domain.weight)` within 0.05 of 1.0 (complete response)
   - every dim at `coverage ≥ 0.9` (stable share math)

   and checking `|Σ contributions - overallScore| ≤ CONTRIB_TOLERANCE`.
   If more than 50% of the eligible countries (at least 3 are required)
   drift beyond the tolerance, a ⛔ blocker banner fires at report top
   AND a "Formula-mode diagnostic" section prints the first three
   offenders with their Σ vs overall numbers. Until the harness grows a
   pillar-aware decomposition, the contribution tables under pillar
   mode must be treated as *"legacy-formula reference only"*.
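
The detection rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness's actual `detectFormulaMode`: the helper names `isEligible` and `looksLikePillarCombine`, the `sigma` field, and all numbers in the demo are hypothetical, and the Σ-contributions calculation is passed in as a callback so the sketch stays independent of the decomposition itself.

```javascript
// Sketch only: gates and thresholds follow the prose above; function and
// field names are illustrative, not the harness's exports.
const CONTRIB_TOLERANCE = 1.5;

// A country is eligible when its domain weights sum to ~1.0 and every
// dimension has coverage >= 0.9.
function isEligible(doc) {
  const domains = doc.domains ?? [];
  const weightSum = domains.reduce((acc, d) => acc + (d.weight ?? 0), 0);
  const completeResponse = Math.abs(weightSum - 1.0) <= 0.05;
  const stableCoverage = domains.every((d) =>
    (d.dimensions ?? []).every((dim) => (dim.coverage ?? 0) >= 0.9));
  return completeResponse && stableCoverage;
}

// Flags pillar-combine mode when more than half of the eligible countries
// (at least 3 required) drift beyond the tolerance.
function looksLikePillarCombine(docs, sumContributions, tolerance = CONTRIB_TOLERANCE) {
  const eligible = docs.filter(isEligible);
  if (eligible.length < 3) return false; // too few countries to judge
  const drifted = eligible.filter(
    (doc) => Math.abs(sumContributions(doc) - doc.overallScore) > tolerance,
  );
  return drifted.length / eligible.length > 0.5;
}

// Demo with made-up numbers; `sigma` is a hypothetical precomputed sum.
const makeDoc = (overallScore, sigma) => ({
  overallScore,
  sigma,
  domains: [{ weight: 1.0, dimensions: [{ coverage: 1.0 }] }],
});
const sigmaOf = (doc) => doc.sigma;
const legacy = [makeDoc(70, 70.2), makeDoc(55, 54.6), makeDoc(40, 40.9)];
const pillar = [makeDoc(70, 76), makeDoc(55, 61), makeDoc(40, 40.5)];

console.log(looksLikePillarCombine(legacy, sigmaOf)); // false
console.log(looksLikePillarCombine(pillar, sigmaOf)); // true
```

Requiring a minimum number of eligible countries keeps a small or partially backfilled response from tripping the banner on a single noisy row.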
### Formula mode
The operator guide for what to do when the formula-mode banner fires:
- **If the banner is a false positive** (e.g. scorer changed a dim
weight and the audit mirror in `scripts/audit-resilience-cohorts.mjs`
`DIM_WEIGHTS` is stale): update the mirror, re-run. This is the
`production-logic-mirror-silent-divergence` pattern — the mirror
must move with the scorer.
- **If pillar combine actually activated:** stop using the
contribution-decomposition tables for this release gate. Fall back
to the per-dimension score table + the construct invariants test +
movers review. File a follow-up to grow the harness a pillar-aware
decomposition before the next methodology PR under pillar mode.
- **Exit codes under `STRICT=1`:** `3` = fetch/missing, `4` = formula
mode, `0` = all clear. These are distinct so automation can
differentiate "the infra is broken" from "the code path is no
longer decomposable."
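
A minimal sketch of how the `STRICT=1` exit-code contract could be derived from a run's state. The helper names `missingMembers` and `strictExitCode` are hypothetical, and checking code `3` before code `4` is an assumption of this sketch; the script itself may order the checks differently.

```javascript
// Sketch only: names are illustrative. Exit codes follow the STRICT=1
// contract described above; the 3-before-4 precedence is an assumption.

// Cohort members that were requested but have no score document.
function missingMembers(requestedCodes, scores) {
  return requestedCodes.filter((cc) => !scores.has(cc));
}

function strictExitCode({ failures, missing, formulaDrift }) {
  if (failures.size > 0 || missing.length > 0) return 3; // fetch/missing
  if (formulaDrift) return 4; // formula mode
  return 0; // all clear
}

// Demo: BH failed to fetch, so it is absent from `scores`.
const requested = ['AE', 'SA', 'KW', 'QA', 'OM', 'BH'];
const scores = new Map([['AE', {}], ['SA', {}], ['KW', {}], ['QA', {}], ['OM', {}]]);
const failures = new Map([['BH', 'HTTP 500 for BH']]);
const missing = missingMembers(requested, scores);

console.log(missing); // [ 'BH' ]
console.log(strictExitCode({ failures, missing, formulaDrift: false })); // 3
```

Computing missing members against the requested cohort list, rather than against whatever came back, is what prevents a failed fetch from silently shrinking the cohort table.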
## How to read the report
The report surfaces five categories of signal. **Treat each as a
prompt for investigation, not a merge gate.**
### 1. Per-cohort per-dimension table
Read across rows. If one country has `IMPUTED` / `unmonitored` /
`coverage < 0.5` where peers have full coverage, that's a seed-level
gap — probably a late-reporter window or a missing manifest entry.
Fix the seed, not the score.
### 2. Contribution decomposition
Each cell shows how many overall-score points that dimension
contributes to that country. If the row sum differs from the overall
score by more than ~0.5 points, the scorer is using a composition
formula the audit script doesn't understand. Investigate the
`coverageWeightedMean` and `penalizedPillarScore` branches in
`_shared.ts` and update the decomposition accordingly.
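
To make the row-sum check concrete, here is a small worked sketch of the share-based decomposition for a single domain, following the formula in the audit script's comments (a dim's within-domain share is `coverage × dimWeight` normalised over the domain, then multiplied by `score` and `domainWeight`). The `decompose` helper is a simplified stand-in, and the scores and coverages are made up.

```javascript
// Sketch of the share-based decomposition for ONE domain.
// Input numbers are invented for illustration.
function decompose(domain, dimWeights) {
  const dims = domain.dimensions ?? [];
  // Denominator: total coverage-weighted dim weight within the domain.
  const denom = dims.reduce(
    (acc, d) => acc + (d.coverage ?? 0) * (dimWeights[d.id] ?? 1.0), 0);
  return dims.map((d) => {
    const weighted = (d.coverage ?? 0) * (dimWeights[d.id] ?? 1.0);
    const share = denom > 0 ? weighted / denom : 0;
    return { id: d.id, contribution: (d.score ?? 0) * share * (domain.weight ?? 0) };
  });
}

const rows = decompose(
  {
    id: 'recovery',
    weight: 0.25,
    dimensions: [
      { id: 'fiscalSpace', score: 80, coverage: 1.0 },
      { id: 'sovereignFiscalBuffer', score: 40, coverage: 0.5 },
    ],
  },
  { fiscalSpace: 1.0, sovereignFiscalBuffer: 0.5 },
);
const total = rows.reduce((acc, r) => acc + r.contribution, 0);

console.log(rows.map((r) => `${r.id}=${r.contribution.toFixed(2)}`).join(', '));
// fiscalSpace=16.00, sovereignFiscalBuffer=2.00
console.log(total.toFixed(2)); // 18.00
```

Here the shares are 0.8 and 0.2, so the domain's coverage-weighted mean is 72 and its contribution to overall is 72 × 0.25 = 18 points, which matches the sum of the two rows.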
### 3. Flagged patterns
- **Saturated-high**: every cohort member scores > 95 on a dim. The
dim contributes zero discrimination within that cohort — either the
construct genuinely doesn't apply (acceptable; document in
`known-limitations.md`), or the goalpost is too generous (re-anchor).
- **Saturated-low**: every member scores < 5. Same question in reverse;
often a seed failure rather than a construct issue.
- **Identical scores**: all cohort members (≥ 3) hit the same non-trivial
value. Usually a regional-default leak or a missing-data imputation
class returning the same number.
- **Coverage outlier**: one country is `coverage < 0.5` while peers
are ≥ 0.9. This is almost always the ranking-inversion smoking gun.
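
The four rules above can be sketched for a single dimension across one cohort. `flagsForDimension` is a hypothetical, condensed helper; treating "non-trivial" as strictly between 0 and 100 and requiring at least twice as many high-coverage peers as low-coverage members follow the audit script. The coverage values mirror the fixture findings quoted in the commit message, while the `importConcentration` scores are invented.

```javascript
// Sketch of the four flag rules for ONE dimension across one cohort.
// Entries are { cc, score, coverage }; thresholds follow the list above.
function flagsForDimension(entries) {
  const kinds = [];
  const enough = entries.length >= 3;
  if (enough && entries.every((e) => e.score > 95)) kinds.push('saturated-high');
  if (enough && entries.every((e) => e.score < 5)) kinds.push('saturated-low');
  const first = entries[0]?.score;
  if (enough && first > 0 && first < 100 && entries.every((e) => e.score === first)) {
    kinds.push('identical-scores');
  }
  const low = entries.filter((e) => (e.coverage ?? 0) < 0.5);
  const high = entries.filter((e) => (e.coverage ?? 0) >= 0.9);
  if (low.length > 0 && high.length >= low.length * 2) kinds.push('coverage-outlier');
  return kinds;
}

const gcc = ['AE', 'SA', 'KW', 'QA', 'OM', 'BH'];

// externalDebtCoverage: all six members at 100 with full coverage.
const debt = gcc.map((cc) => ({ cc, score: 100, coverage: 1.0 }));

// importConcentration: AE at coverage 0.3, peers at 1.0 (scores invented).
const inventedScores = { AE: 35, SA: 60, KW: 72, QA: 68, OM: 55, BH: 58 };
const imports = gcc.map((cc) => ({
  cc,
  score: inventedScores[cc],
  coverage: cc === 'AE' ? 0.3 : 1.0,
}));

console.log(flagsForDimension(debt)); // [ 'saturated-high' ]
console.log(flagsForDimension(imports)); // [ 'coverage-outlier' ]
```

Note that a cohort saturated at exactly 100 is reported as saturated-high only; the identical-scores rule skips 0 and 100 so the same finding is not reported twice.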
### 4. Top-N movers vs baseline
Expected movers post-methodology-PR are construct-consistent: a
re-export-hub PR should move re-export hubs, not SWF-heavy exporters.
Surprise movers trigger investigation before publication.
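
A minimal sketch of how a movers table can be built: pair each country with its baseline score, sort by absolute score delta, and keep the top N. `topMovers` is a simplified stand-in for the script's `computeMovers` (it ignores ranks and the `overallScoreRaw` fallback), and the scores are made up.

```javascript
// Sketch only: simplified movers computation with invented scores.
function topMovers(current, baseline, n) {
  const prevByCc = new Map(baseline.map((x) => [x.countryCode, x.overallScore]));
  return current
    .filter((x) => prevByCc.has(x.countryCode)) // skip countries with no baseline
    .map((x) => ({
      countryCode: x.countryCode,
      delta: x.overallScore - prevByCc.get(x.countryCode),
    }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, n);
}

// ASCII sign characters keep the report greppable.
function fmtDelta(delta) {
  if (delta === 0) return '·';
  return `${delta > 0 ? '+' : '-'}${Math.abs(delta).toFixed(2)}`;
}

const baseline = [
  { countryCode: 'SG', overallScore: 80 },
  { countryCode: 'NL', overallScore: 78 },
  { countryCode: 'NO', overallScore: 85 },
];
const current = [
  { countryCode: 'SG', overallScore: 84.5 },
  { countryCode: 'NL', overallScore: 78 },
  { countryCode: 'NO', overallScore: 83 },
];

const movers = topMovers(current, baseline, 2);
console.log(movers.map((m) => `${m.countryCode} ${fmtDelta(m.delta)}`).join(', '));
// SG +4.50, NO -2.00
```

In this invented example, a re-export-hub change that moved SG would be construct-consistent, while the NO move would be the surprise worth investigating before publication.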
### 5. Anchor invariants
Run `npx tsx --test tests/resilience-construct-invariants.test.mts`.
An anchor drift > 1 point on `score(ratio=1.0)=50` or
`score(em=12)≈63` means someone silently re-goalposted or rewrote a
saturating transform. This is a bug until proven otherwise.
## Anti-pattern: rank-targeted acceptance criteria
**Never put "ENTITY A > ENTITY B" as a merge gate in this workflow.**
Once a review commits to producing a specific ranking, every construct
/ manifest / goalpost knob becomes a lever to tune toward that
outcome — even subconsciously — and the methodology loses its
construct integrity.
Use instead:
- **Construct monotonicity tests** — synthetic inputs, not country
identity: `score(HHI=0.05) > score(HHI=0.20)`,
`score(ratio=1.0) = 50 ± 1`. These fail when the MATH breaks, not
when the RANKING changes.
- **Out-of-sample cohort behaviour** — define a cohort the fix is
SUPPOSED to move proportionally (re-export hubs, SWF-heavy
exporters, stressed states). Acceptance: cohort behaviour matches
the construct change, not a target position.
- **Top-N movers review** — movers should be cohort members the
construct predicts; surprises trigger investigation.
- **Honest "outcome may not resolve"** — if the original sanity-
failure (the ranking inversion that triggered the audit) is not
guaranteed to resolve under the in-scope fixes, say so explicitly.
A plan that acknowledges "the inversion may persist after all
fixes, because the dominant driver is out of scope" is stronger
than one that over-promises.
If a release reviewer asks "will this make A rank above B", the
correct answer is: *"A will move by the amount the construct
predicts. Where it ends up relative to B is an outcome."*
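
The construct-monotonicity and anchor checks recommended above can be sketched with two small helpers. `toyHhiScore` is a hypothetical stand-in scorer written only for this illustration; it is NOT the production formula, and the helper names are invented.

```javascript
// Hypothetical stand-in scorer, NOT the production formula: lower HHI
// (less import concentration) maps linearly to a higher score.
const toyHhiScore = (hhi) => Math.max(0, Math.min(100, 100 * (1 - hhi)));

// True when each successive input yields a strictly lower score.
function isStrictlyDecreasing(scorer, inputs) {
  const scores = inputs.map(scorer);
  return scores.every((s, i) => i === 0 || s < scores[i - 1]);
}

// True when an anchor value sits within the allowed tolerance.
function withinAnchor(actual, expected, tolerance) {
  return Math.abs(actual - expected) <= tolerance;
}

console.log(isStrictlyDecreasing(toyHhiScore, [0.05, 0.20, 0.60])); // true
console.log(withinAnchor(50.4, 50, 1)); // true
console.log(withinAnchor(52, 50, 1)); // false
```

Both checks take synthetic inputs and never mention a country, so they fail when the math breaks and stay silent when only the ranking changes.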
## Follow-ups
- Every novel gap identified by the audit should land as a section in
`docs/methodology/known-limitations.md` so future reviewers see the
diagnosis trail.
- If a gap is fixed in a PR, the audit report from that PR's
post-merge run should be attached to the PR as an artifact.


@@ -0,0 +1,680 @@
#!/usr/bin/env node
// Release-gate audit harness for the resilience scorer. Emits a Markdown
// report that surfaces cohort-level ranking sanity issues BEFORE they reach
// publication. Designed as a release gate, not a commit gate — see
// docs/methodology/cohort-sanity-release-gate.md for the interpretation
// contract and the explicit anti-pattern note on rank-targeted acceptance
// criteria.
//
// What this does:
// 1. Fetch the live ranking via GET /api/resilience/v1/get-resilience-ranking.
// 2. For every country in the named cohorts (GCC, OECD-nuclear,
// ASEAN-trade-hub, LatAm-petro, African-fragile, post-Soviet,
// stressed-debt, re-export-hub, SWF-heavy-exporter, fragile-floor),
// fetch the full per-dimension score via GET
// /api/resilience/v1/get-resilience-score?countryCode=XX.
// 3. Emit a Markdown report with:
// - Full ranking table (top N + grey-outs summary)
// - Per-cohort per-dimension breakdown (score / coverage / imputation)
// - Contribution decomposition: per country, per dim,
// (score × coverage × dimWeight × domainWeight) toward overall
// - Flagged patterns: saturated dimensions (>95 across cohort),
// low-coverage outliers (coverage < 0.5 where peers are ≥ 0.9),
// identical-score clusters (same score across all cohort members)
// - Top-N movers vs a baseline snapshot (optional)
//
// What this does NOT do:
// - Assert country rank orderings ("AE > KW"). That would couple the gate
// to outcome-seeking; the audit is intentionally descriptive.
// - Fail the build by default. It's a report generator. Release review
// reads the report and decides whether to hold publication. STRICT=1
// opts into non-zero exit codes (see the failure modes below).
//
// Usage:
// WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app \
// node scripts/audit-resilience-cohorts.mjs
// WORLDMONITOR_API_KEY=wm_xxx API_BASE=... \
// BASELINE=docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json \
// OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs
// FIXTURE=tests/fixtures/resilience-audit-fixture.json node scripts/audit-resilience-cohorts.mjs
//
// Auth: the resilience ranking + score endpoints are in PREMIUM_RPC_PATHS
// (see src/shared/premium-paths.ts). A valid WORLDMONITOR_API_KEY is
// required whether running from a trusted browser origin or not — the
// premium gate forces the key.
//
// Fixture mode (FIXTURE env): reads a JSON file with shape
// { ranking: GetResilienceRankingResponse, scores: { [cc]: GetResilienceScoreResponse } }
// and builds the report without any network calls. Useful for offline runs
// and for regression-comparing the audit output itself across scorer
// changes (diff the Markdown).
//
// Failure modes the script explicitly surfaces (NOT silent-drops):
// 1. Per-country fetch failure (HTTP 4xx/5xx, timeout). Tracked in a
// `failures` map, rendered as a top-of-report blocker banner and a
// dedicated "Fetch failures / missing members" section, so a
// reviewer skimming the artifact cannot miss that the cohort was
// only partially audited.
// 2. Formula-mode mismatch. When `RESILIENCE_PILLAR_COMBINE_ENABLED`
// is active, `overallScore = penalizedPillarScore(pillars)` — a
// non-linear function of the dim scores — and the contribution
// decomposition (domain-weighted) no longer sums to overall. The
// harness detects this via Σ-contribution vs overall drift and
// flags it at report top so the operator knows the decomposition
// rows are reference-only.
// STRICT=1 exits non-zero (code 3 for fetch failures, 4 for formula
// mismatch) AFTER writing the report, so release-gate automation can't
// treat a partial/stale audit as green.
import fs from 'node:fs/promises';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { execSync } from 'node:child_process';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '..');
const FIXTURE_PATH = process.env.FIXTURE || '';
const API_BASE = (process.env.API_BASE || '').replace(/\/$/, '');
if (!FIXTURE_PATH) {
if (!API_BASE) {
console.error('[audit-resilience-cohorts] API_BASE env var required (e.g. https://api.worldmonitor.app), or FIXTURE=path.json for offline mode');
process.exit(2);
}
if (!process.env.WORLDMONITOR_API_KEY) {
console.error('[audit-resilience-cohorts] WORLDMONITOR_API_KEY env var required; resilience RPC paths are in PREMIUM_RPC_PATHS.');
process.exit(2);
}
}
const RANKING_URL = `${API_BASE}/api/resilience/v1/get-resilience-ranking`;
const SCORE_URL = (cc) => `${API_BASE}/api/resilience/v1/get-resilience-score?countryCode=${encodeURIComponent(cc)}`;
const BASELINE_PATH = process.env.BASELINE || '';
const OUT_PATH = process.env.OUT || '';
const TOP_N_FULL_RANKING = Number(process.env.TOP_N || 60);
const MOVERS_N = Number(process.env.MOVERS_N || 30);
const CONCURRENCY = Number(process.env.CONCURRENCY || 6);
// STRICT=1 makes the audit fail-closed: any per-country fetch failure OR any
// detected formula-mode change (pillar-combine on, contribution rows
// invalid) exits non-zero so the release-gate operator cannot accidentally
// ship a partial / misleading report. Default (STRICT unset) still renders
// but banners the issue prominently at report top.
const STRICT = process.env.STRICT === '1' || process.env.STRICT === 'true';
// Tolerance for "sum(contributions) vs overallScore" equality check used
// to detect pillar-combine formula mode (see detectFormulaMode).
const CONTRIBUTION_SUM_TOLERANCE = Number(process.env.CONTRIB_TOLERANCE || 1.5);
// Named cohorts. Membership reflects the construct question each cohort
// answers — not "who should rank where." See release-gate doc for rationale.
const COHORTS = {
GCC: ['AE', 'SA', 'KW', 'QA', 'OM', 'BH'],
'OECD-nuclear': ['FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE'],
'ASEAN-trade-hub': ['SG', 'MY', 'TH', 'VN', 'ID', 'PH'],
'LatAm-petro': ['BR', 'MX', 'CO', 'VE', 'AR', 'EC'],
'African-fragile': ['NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD'],
'Post-Soviet': ['RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM'],
'Stressed-debt': ['LK', 'PK', 'AR', 'LB', 'TR', 'EG', 'TN'],
'Re-export-hub': ['SG', 'HK', 'NL', 'BE', 'PA', 'AE', 'MY', 'LT'],
'SWF-heavy-exporter': ['NO', 'QA', 'KW', 'SA', 'KZ', 'AZ'],
'Fragile-floor': ['YE', 'SY', 'SO', 'AF'],
};
// Coarse domain weights mirrored from _dimension-scorers.ts for contribution
// decomposition. The live API already returns domain.weight per country,
// so we READ that from the API rather than hardcoding — this table is only
// used for sanity-cross-check in the header.
const EXPECTED_DOMAIN_WEIGHTS = {
economic: 0.17,
infrastructure: 0.15,
energy: 0.11,
'social-governance': 0.19,
'health-food': 0.13,
recovery: 0.25,
};
function commitSha() {
try {
return execSync('git rev-parse HEAD', { cwd: REPO_ROOT, stdio: ['ignore', 'pipe', 'ignore'] })
.toString()
.trim();
} catch {
return 'unknown';
}
}
async function loadCountryNameMap() {
const filePath = path.join(REPO_ROOT, 'shared', 'country-names.json');
let raw;
try {
raw = await fs.readFile(filePath, 'utf8');
} catch (err) {
console.error(`[audit] shared/country-names.json read failed (${err.code || err.name}): ${err.message}. Falling back to ISO-2 codes in the report (country names will appear as CC).`);
return {};
}
let forward;
try {
forward = JSON.parse(raw);
} catch (err) {
console.error(`[audit] shared/country-names.json parse failed: ${err.message}. Falling back to ISO-2 codes.`);
return {};
}
const reverse = {};
for (const [name, iso2] of Object.entries(forward)) {
const code = String(iso2 || '').toUpperCase();
if (!/^[A-Z]{2}$/.test(code)) continue;
if (reverse[code]) continue;
reverse[code] = name.replace(/\b([a-z])/g, (_, c) => c.toUpperCase());
}
return reverse;
}
function apiHeaders() {
const h = {
accept: 'application/json',
// Full UA (not the 10-char Node default) avoids middleware.ts's short-UA
// bot guard that 403s bare `node` fetches on the edge path.
'user-agent': 'audit-resilience-cohorts/1.0 (+scripts/audit-resilience-cohorts.mjs)',
};
if (process.env.WORLDMONITOR_API_KEY) {
h['X-WorldMonitor-Key'] = process.env.WORLDMONITOR_API_KEY;
}
return h;
}
async function fetchRanking() {
const response = await fetch(RANKING_URL, { headers: apiHeaders() });
if (!response.ok) {
throw new Error(`HTTP ${response.status} from ${RANKING_URL}: ${await response.text().catch(() => '')}`);
}
return response.json();
}
async function fetchScore(countryCode) {
const response = await fetch(SCORE_URL(countryCode), { headers: apiHeaders() });
if (!response.ok) {
throw new Error(`HTTP ${response.status} for ${countryCode}`);
}
return response.json();
}
async function fetchScoresConcurrent(countryCodes) {
const scores = new Map();
const failures = new Map(); // cc → error message
const queue = [...countryCodes];
async function worker() {
while (queue.length) {
const cc = queue.shift();
if (!cc) return;
try {
const data = await fetchScore(cc);
scores.set(cc, data);
} catch (err) {
console.error(`[audit] ${cc} failed: ${err.message}`);
failures.set(cc, err.message || 'unknown fetch error');
// Do NOT insert null into scores — silent-drop was the P1 bug.
// Failures are tracked distinctly so the report can banner them
// and STRICT mode can exit non-zero.
}
}
}
const workers = Array.from({ length: Math.min(CONCURRENCY, queue.length) }, worker);
await Promise.all(workers);
return { scores, failures };
}
function round1(n) {
return Math.round(n * 10) / 10;
}
function round2(n) {
return Math.round(n * 100) / 100;
}
// Given a score document, compute the contribution of every dimension to the
// overall score. The overall is (by construct) a domain-weighted roll-up of
// coverage-weighted dimension means. For contribution reporting we use the
// "effective share" each dim has toward overall:
// domainShare = domainWeight
// withinDomainShare = (dim.coverage × dimWeight) / Σ(coverage × dimWeight) for that domain
// overallContribution = dim.score × withinDomainShare × domainShare
// The sum of overallContribution across all dims ≈ overallScore (modulo
// pillar-combine path when enabled, which isn't contribution-decomposable
// by a clean formula).
function decomposeContributions(scoreDoc, dimWeights) {
const rows = [];
for (const domain of scoreDoc.domains ?? []) {
const dims = domain.dimensions ?? [];
let denom = 0;
for (const d of dims) {
const w = dimWeights[d.id] ?? 1.0;
denom += (d.coverage ?? 0) * w;
}
for (const d of dims) {
const w = dimWeights[d.id] ?? 1.0;
const withinDomainShare = denom > 0 ? ((d.coverage ?? 0) * w) / denom : 0;
const contribution = (d.score ?? 0) * withinDomainShare * (domain.weight ?? 0);
rows.push({
domainId: domain.id,
domainWeight: domain.weight,
dimensionId: d.id,
score: d.score,
coverage: d.coverage,
imputationClass: d.imputationClass || '',
dimWeight: w,
withinDomainShare,
contribution,
});
}
}
return rows;
}
// Weight multipliers mirrored from _dimension-scorers.ts. Mirror is acceptable
// here because the audit script is a diagnostic — if dim weights drift we'll
// see contribution rows that don't sum to overallScore and investigate.
const DIM_WEIGHTS = {
macroFiscal: 1.0,
currencyExternal: 1.0,
tradeSanctions: 1.0,
cyberDigital: 1.0,
logisticsSupply: 1.0,
infrastructure: 1.0,
energy: 1.0,
governanceInstitutional: 1.0,
socialCohesion: 1.0,
borderSecurity: 1.0,
informationCognitive: 1.0,
healthPublicService: 1.0,
foodWater: 1.0,
fiscalSpace: 1.0,
reserveAdequacy: 1.0,
externalDebtCoverage: 1.0,
importConcentration: 1.0,
stateContinuity: 1.0,
fuelStockDays: 1.0,
liquidReserveAdequacy: 0.5,
sovereignFiscalBuffer: 0.5,
};
function flagDimensionPatterns(cohortName, cohortCodes, scoreMap) {
const flags = [];
// Collect per-dimension values across the cohort.
const byDim = new Map();
for (const cc of cohortCodes) {
const doc = scoreMap.get(cc);
if (!doc) continue;
for (const domain of doc.domains ?? []) {
for (const dim of domain.dimensions ?? []) {
if (!byDim.has(dim.id)) byDim.set(dim.id, []);
byDim.get(dim.id).push({ cc, score: dim.score, coverage: dim.coverage, imputationClass: dim.imputationClass });
}
}
}
for (const [dimId, entries] of byDim.entries()) {
// Saturated dim: every member scores > 95
if (entries.length >= 3 && entries.every((e) => e.score > 95)) {
flags.push({
cohort: cohortName,
kind: 'saturated-high',
dimension: dimId,
message: `Every cohort member scores > 95 on ${dimId}; dim contributes zero discrimination within the cohort.`,
});
}
// Saturated low: every member scores < 5
if (entries.length >= 3 && entries.every((e) => e.score < 5)) {
flags.push({
cohort: cohortName,
kind: 'saturated-low',
dimension: dimId,
message: `Every cohort member scores < 5 on ${dimId}; construct may not apply or seed is missing.`,
});
}
// Identical score across cohort (variance = 0 and ≥ 3 entries)
if (entries.length >= 3) {
const first = entries[0].score;
if (entries.every((e) => e.score === first) && first > 0 && first < 100) {
flags.push({
cohort: cohortName,
kind: 'identical-scores',
dimension: dimId,
message: `All ${entries.length} cohort members have identical ${dimId} = ${first}; possible imputed-default or region-default leak.`,
});
}
}
// Low-coverage outlier: one entry has coverage < 0.5 while peers ≥ 0.9
const lowCov = entries.filter((e) => (e.coverage ?? 0) < 0.5);
const highCov = entries.filter((e) => (e.coverage ?? 0) >= 0.9);
if (lowCov.length && highCov.length >= lowCov.length * 2) {
flags.push({
cohort: cohortName,
kind: 'coverage-outlier',
dimension: dimId,
message: `Low coverage on ${dimId}: ${lowCov.map((e) => `${e.cc}(${round2(e.coverage ?? 0)})`).join(', ')}; peers have full coverage.`,
});
}
}
return flags;
}
function computeMovers(currentItems, baselineItems, n) {
if (!baselineItems) return [];
const baselineByCc = new Map(baselineItems.map((x) => [x.countryCode, x]));
const currentByCc = new Map(currentItems.map((x) => [x.countryCode, x]));
const deltas = [];
for (const [cc, cur] of currentByCc.entries()) {
const prev = baselineByCc.get(cc);
if (!prev) continue;
const curScore = typeof cur.overallScore === 'number' ? cur.overallScore : null;
const prevScore = typeof prev.overallScoreRaw === 'number' ? prev.overallScoreRaw : (typeof prev.overallScore === 'number' ? prev.overallScore : null);
if (curScore == null || prevScore == null) continue;
deltas.push({
countryCode: cc,
scoreDelta: curScore - prevScore,
curScore,
prevScore,
curRank: cur.__rank,
prevRank: prev.rank ?? null,
});
}
deltas.sort((a, b) => Math.abs(b.scoreDelta) - Math.abs(a.scoreDelta));
return deltas.slice(0, n);
}
function fmtDelta(delta) {
if (delta === 0) return '·';
// ASCII hyphen-minus, not U+2212 MINUS. Downstream operators diff
// audit reports with `grep`/`awk`/CSV pipelines that treat the two
// characters differently; keeping ASCII preserves byte-level
// greppability of negative deltas.
const sign = delta > 0 ? '+' : '-';
return `${sign}${Math.abs(delta).toFixed(2)}`;
}
function section(label, body) {
return `\n## ${label}\n\n${body}\n`;
}
// Detect whether overall is computed via the legacy domain-weighted
// formula (contribution decomposition is valid) or the pillar-combine
// formula (penalizedPillarScore — decomposition is NOT valid and the
// operator MUST know). Signal: |Σ contributions - overallScore| across
// countries with COMPLETE domain coverage exceeds
// CONTRIBUTION_SUM_TOLERANCE. "Complete" requires:
// (a) sum(domain.weight) within 0.05 of 1.0 (all 6 domains present)
// (b) every dim has coverage ≥ 0.9 (so the dim-share math is stable)
// Both gates prevent false positives from small/partial fixtures or
// live-API responses where the call happened to land mid-backfill.
function detectFormulaMode(scoreMap) {
let diffsExceeded = 0;
let checked = 0;
const examples = [];
for (const [cc, doc] of scoreMap.entries()) {
if (!doc) continue;
const domains = doc.domains ?? [];
const domainWeightSum = domains.reduce((a, d) => a + (d.weight ?? 0), 0);
if (Math.abs(domainWeightSum - 1.0) > 0.05) continue; // incomplete response
const hasFullCoverage = domains.every((dom) =>
(dom.dimensions ?? []).every((dim) => (dim.coverage ?? 0) >= 0.9),
);
if (!hasFullCoverage) continue;
const rows = decomposeContributions(doc, DIM_WEIGHTS);
const sum = rows.reduce((a, r) => a + r.contribution, 0);
const overall = doc.overallScore ?? 0;
const diff = Math.abs(sum - overall);
checked += 1;
if (diff > CONTRIBUTION_SUM_TOLERANCE) {
diffsExceeded += 1;
if (examples.length < 3) examples.push({ cc, sum, overall, diff });
}
}
// Heuristic: if > 50% of eligible countries drift AND at least 3 were
// checked, pillar-combine is probably active. Below 3 checked we skip
// the flag entirely — the signal is too noisy to banner-block on.
const pillarModeLikely = checked >= 3 && diffsExceeded / checked > 0.5;
return { pillarModeLikely, checked, diffsExceeded, examples };
}
function renderCohortSection(cohortName, codes, scoreMap, nameMap) {
const present = codes.filter((cc) => scoreMap.get(cc));
if (!present.length) return '';
// Collect all dims seen in this cohort.
const dimIds = new Set();
for (const cc of present) {
const doc = scoreMap.get(cc);
for (const dom of doc.domains ?? []) for (const dim of dom.dimensions ?? []) dimIds.add(dim.id);
}
const orderedDims = [...dimIds].sort();
let body = `Members: ${present.join(', ')}\n\n`;
// Overall table
body += `**Overall**\n\n| CC | Country | Overall | Baseline | Stress | Level |\n|---|---|---:|---:|---:|---|\n`;
for (const cc of present) {
const doc = scoreMap.get(cc);
body += `| ${cc} | ${nameMap[cc] ?? cc} | ${round1(doc.overallScore)} | ${round1(doc.baselineScore)} | ${round1(doc.stressScore)} | ${doc.level} |\n`;
}
// Per-dim scores
body += `\n**Per-dimension score** (score · coverage · imputationClass if set)\n\n`;
body += `| Dim | ${present.join(' | ')} |\n|---| ${present.map(() => '---:').join(' | ')} |\n`;
for (const dimId of orderedDims) {
const cells = present.map((cc) => renderDimCell(scoreMap.get(cc), dimId));
body += `| ${dimId} | ${cells.join(' | ')} |\n`;
}
// Contribution decomposition (sums to overall per country under legacy formula).
body += `\n**Contribution decomposition** (points toward overall score)\n\n`;
body += `| Dim | ${present.join(' | ')} |\n|---| ${present.map(() => '---:').join(' | ')} |\n`;
const contribByCc = new Map(
present.map((cc) => [cc, decomposeContributions(scoreMap.get(cc), DIM_WEIGHTS)]),
);
for (const dimId of orderedDims) {
const cells = present.map((cc) => {
const row = (contribByCc.get(cc) ?? []).find((r) => r.dimensionId === dimId);
return row ? row.contribution.toFixed(2) : '—';
});
body += `| ${dimId} | ${cells.join(' | ')} |\n`;
}
const sums = present.map((cc) => (contribByCc.get(cc) ?? []).reduce((a, r) => a + r.contribution, 0));
body += `| **sum contrib** | ${sums.map((s) => s.toFixed(2)).join(' | ')} |\n`;
const overalls = present.map((cc) => scoreMap.get(cc).overallScore);
body += `| **overallScore** | ${overalls.map((s) => round1(s)).join(' | ')} |\n`;
return section(`Cohort: ${cohortName}`, body);
}
function renderDimCell(doc, dimId) {
for (const dom of doc.domains ?? []) {
for (const dim of dom.dimensions ?? []) {
if (dim.id === dimId) {
const cov = round2(dim.coverage ?? 0);
const imp = dim.imputationClass ? ` · *${dim.imputationClass}*` : '';
return `${Math.round(dim.score ?? 0)} · ${cov}${imp}`;
}
}
}
return '—';
}
function buildReport({ ranking, scoreMap, nameMap, movers, capturedAt, sha, failures, requestedCohortCodes }) {
const items = ranking.items ?? [];
const greyedOut = ranking.greyedOut ?? [];
const failureList = [...(failures?.entries?.() ?? [])];
const missingCohortMembers = (requestedCohortCodes ?? []).filter((cc) => !scoreMap.get(cc));
const formulaMode = detectFormulaMode(scoreMap);
let md = `# Resilience cohort-sanity audit report\n\n`;
// Blocking banners at the very top. Operator MUST see these before the
// tables below. STRICT mode will exit non-zero after writing the report
// so an operator can inspect the diagnostics and then re-run.
if (failureList.length || missingCohortMembers.length) {
md += `> ⛔ **Fetch failures / missing cohort members.** ${failureList.length} per-country fetch(es) failed; `;
md += `${missingCohortMembers.length} cohort member(s) are missing from the score map. `;
md += `Tables below only reflect the members that DID load. `;
md += `Re-run the audit (STRICT=1 recommended) before treating this report as release-gate evidence.\n\n`;
}
if (formulaMode.pillarModeLikely) {
md += `> ⛔ **Formula mode not supported.** ${formulaMode.diffsExceeded}/${formulaMode.checked} full-coverage countries show `;
md += `|Σ contributions - overallScore| > ${CONTRIBUTION_SUM_TOLERANCE}. This almost certainly means \`RESILIENCE_PILLAR_COMBINE_ENABLED\` `;
md += `is active (penalizedPillarScore), and the **contribution decomposition tables below are NOT valid**. `;
md += `Treat them as "legacy-formula reference only." `;
md += `See \`docs/methodology/cohort-sanity-release-gate.md#formula-mode\`.\n\n`;
}
// In FIXTURE mode `API_BASE` is empty → `RANKING_URL` would render as
// a bare "/api/resilience/v1/get-resilience-ranking" path that never
// resolved. Surface "fixture://<path>" instead so a diff against a
// live-run report is visibly distinguishable.
const sourceLabel = FIXTURE_PATH ? `fixture://${FIXTURE_PATH}` : RANKING_URL;
md += `- Captured: ${capturedAt}\n- Commit: ${sha}\n- Source: ${sourceLabel}\n- Ranked: ${items.length} · Grey-out: ${greyedOut.length}\n`;
md += `- Generated by: \`scripts/audit-resilience-cohorts.mjs\`\n`;
md += `- Expected domain weights: ${Object.entries(EXPECTED_DOMAIN_WEIGHTS).map(([k, v]) => `${k}=${v}`).join(', ')}\n`;
md += `- Formula mode: ${formulaMode.pillarModeLikely ? '**PILLAR-COMBINE (decomposition invalid)**' : 'legacy domain-weighted (decomposition valid)'}\n`;
md += `- Fetch failures: ${failureList.length} · Missing cohort members: ${missingCohortMembers.length}\n`;
if (BASELINE_PATH) md += `- Baseline snapshot: \`${BASELINE_PATH}\`\n`;
// Dedicated "what failed" section, rendered even when empty so operators
// always know to check for it.
{
let failBody = '';
if (failureList.length) {
failBody += `| CC | Country | Error |\n|---|---|---|\n`;
for (const [cc, msg] of failureList) {
failBody += `| ${cc} | ${nameMap[cc] ?? cc} | ${String(msg).replace(/\|/g, '\\|').slice(0, 200)} |\n`;
}
}
if (missingCohortMembers.length) {
failBody += `\n**Cohort members with no score data:** ${missingCohortMembers.join(', ')}\n`;
failBody += `\nThe cohorts below were rendered using only members that loaded successfully. `;
failBody += `An operator comparing to a prior audit should assume the missing members may carry the very anomaly under review.\n`;
}
if (!failBody) failBody = '_No fetch failures and all cohort members present._';
md += section('Fetch failures / missing members', failBody);
}
if (formulaMode.pillarModeLikely && formulaMode.examples.length) {
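// Escape the pipes in "|diff|" so the header keeps four cells and the
// Markdown table still matches its four-column delimiter row.
let fmBody = `| CC | Σ contrib | overallScore | \\|diff\\| |\n|---|---:|---:|---:|\n`;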
for (const ex of formulaMode.examples) {
fmBody += `| ${ex.cc} | ${ex.sum.toFixed(2)} | ${ex.overall.toFixed(2)} | ${ex.diff.toFixed(2)} |\n`;
}
fmBody += `\n**Diagnosis.** Under the legacy domain-weighted formula, Σ contributions ≈ overallScore (within ~${CONTRIBUTION_SUM_TOLERANCE} pts of drift for rounding). When \`RESILIENCE_PILLAR_COMBINE_ENABLED\` is active, \`overallScore\` is computed by \`penalizedPillarScore(pillars)\` which is non-linear in the dimension scores; contribution decomposition by domain-weight no longer sums to overall. The audit script does not yet implement a pillar-aware decomposition — fix that before relying on this report under pillar-combine mode.\n`;
md += section('Formula-mode diagnostic', fmBody);
}
// Ranking table
let body = '| # | CC | Country | Overall | Coverage | Level | Low-conf |\n|---:|---|---|---:|---:|---|---|\n';
items.slice(0, TOP_N_FULL_RANKING).forEach((x, i) => {
body += `| ${i + 1} | ${x.countryCode} | ${nameMap[x.countryCode] ?? x.countryCode} | ${round1(x.overallScore)} | ${round2(x.overallCoverage)} | ${x.level} | ${x.lowConfidence ? '⚠' : ''} |\n`;
});
md += section(`Top ${TOP_N_FULL_RANKING} ranking`, body);
// Per-cohort per-dimension breakdown
for (const [cohortName, codes] of Object.entries(COHORTS)) {
md += renderCohortSection(cohortName, codes, scoreMap, nameMap);
}
// Flagged patterns
const allFlags = [];
for (const [cohortName, codes] of Object.entries(COHORTS)) {
allFlags.push(...flagDimensionPatterns(cohortName, codes, scoreMap));
}
if (allFlags.length) {
let flagBody = `| Cohort | Kind | Dimension | Message |\n|---|---|---|---|\n`;
for (const f of allFlags) {
flagBody += `| ${f.cohort} | ${f.kind} | ${f.dimension} | ${f.message} |\n`;
}
md += section('Flagged patterns', flagBody);
} else {
md += section('Flagged patterns', '_No cohort-sanity patterns tripped heuristic thresholds._');
}
// Movers
if (movers?.length) {
let mvBody = `Baseline: \`${BASELINE_PATH}\`\n\n`;
mvBody += `| CC | Country | Prev | Current | Δ | Prev rank | Current rank |\n|---|---|---:|---:|---:|---:|---:|\n`;
for (const m of movers) {
mvBody += `| ${m.countryCode} | ${nameMap[m.countryCode] ?? m.countryCode} | ${round1(m.prevScore)} | ${round1(m.curScore)} | ${fmtDelta(round2(m.scoreDelta))} | ${m.prevRank ?? '—'} | ${m.curRank ?? '—'} |\n`;
}
md += section(`Top-${MOVERS_N} movers vs baseline`, mvBody);
}
md += `\n---\n\n*This audit is a release-gate diagnostic, not a merge-blocker. Rank-targeted acceptance criteria are an explicit anti-pattern — see \`docs/methodology/cohort-sanity-release-gate.md\`.*\n`;
return { md, failureList, missingCohortMembers, formulaMode };
}
async function main() {
const nameMap = await loadCountryNameMap();
const cohortCodeSet = new Set();
for (const codes of Object.values(COHORTS)) for (const cc of codes) cohortCodeSet.add(cc);
const requestedCohortCodes = [...cohortCodeSet].sort();
let ranking;
let scoreMap;
let failures = new Map();
if (FIXTURE_PATH) {
const raw = await fs.readFile(path.resolve(REPO_ROOT, FIXTURE_PATH), 'utf8');
const fixture = JSON.parse(raw);
ranking = fixture.ranking ?? { items: [], greyedOut: [] };
scoreMap = new Map(Object.entries(fixture.scores ?? {}));
// Fixture mode has no network calls, but a fixture may legitimately
// omit cohort members (for small smoke-test fixtures). Rather than
// silently dropping them, `buildReport` derives the missing set from
// `requestedCohortCodes`, so the report banners them identically to
// live-mode fetch failures.
console.error(`[audit] FIXTURE mode: ${path.resolve(REPO_ROOT, FIXTURE_PATH)} (ranked=${(ranking.items || []).length}, scores=${scoreMap.size})`);
} else {
ranking = await fetchRanking();
console.error(`[audit] fetching per-country scores for ${requestedCohortCodes.length} cohort members at concurrency=${CONCURRENCY}`);
const result = await fetchScoresConcurrent(requestedCohortCodes);
scoreMap = result.scores;
failures = result.failures;
}
const items = ranking.items ?? [];
items.forEach((x, i) => { x.__rank = i + 1; });
let movers = [];
if (BASELINE_PATH) {
try {
const raw = await fs.readFile(path.resolve(REPO_ROOT, BASELINE_PATH), 'utf8');
const baseline = JSON.parse(raw);
movers = computeMovers(items, baseline.items, MOVERS_N);
} catch (err) {
console.error(`[audit] baseline read failed: ${err.message}`);
}
}
const capturedAt = new Date().toISOString();
const sha = commitSha();
const { md, failureList, missingCohortMembers, formulaMode } = buildReport({
ranking, scoreMap, nameMap, movers, capturedAt, sha, failures, requestedCohortCodes,
});
if (OUT_PATH) {
await fs.mkdir(path.dirname(path.resolve(REPO_ROOT, OUT_PATH)), { recursive: true });
await fs.writeFile(path.resolve(REPO_ROOT, OUT_PATH), md, 'utf8');
console.error(`[audit] wrote ${OUT_PATH}`);
} else {
process.stdout.write(md);
}
// STRICT mode fails the run AFTER writing the report so operators still
// have the diagnostic artifact on disk. Exit codes:
// 3 — fetch failures or missing cohort members
// 4 — formula-mode change detected (pillar-combine active, decomposition invalid)
// 0 — all clear
if (STRICT) {
if (failureList.length || missingCohortMembers.length) {
console.error(`[audit] STRICT: ${failureList.length} fetch failure(s), ${missingCohortMembers.length} missing cohort member(s); exiting 3`);
process.exit(3);
}
if (formulaMode.pillarModeLikely) {
console.error(`[audit] STRICT: formula-mode mismatch detected (pillar-combine likely); contribution decomposition invalid; exiting 4`);
process.exit(4);
}
}
}
main().catch((err) => {
console.error('[audit-resilience-cohorts] failed:', err);
process.exit(1);
});
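
Note on the flag heuristics: the block below is a minimal sketch that restates the per-dimension checks as a pure function, so the thresholds can be tried without running the whole script. The name `classifyDimension` is hypothetical and not part of this PR. The thresholds are copied from the script above, and the sample entries use the GCC values from `tests/fixtures/resilience-audit-fixture.json`.

```javascript
// Hypothetical helper (not in the PR): returns the flag kinds that one
// dimension's cohort entries would trip, using the script's thresholds.
function classifyDimension(entries) {
  const kinds = [];
  if (entries.length >= 3) {
    if (entries.every((e) => e.score > 95)) kinds.push('saturated-high');
    if (entries.every((e) => e.score < 5)) kinds.push('saturated-low');
    const first = entries[0].score;
    // Identical scores at exactly 0 or 100 are left to the saturation checks.
    if (first > 0 && first < 100 && entries.every((e) => e.score === first)) {
      kinds.push('identical-scores');
    }
  }
  const lowCov = entries.filter((e) => (e.coverage ?? 0) < 0.5);
  const highCov = entries.filter((e) => (e.coverage ?? 0) >= 0.9);
  if (lowCov.length && highCov.length >= lowCov.length * 2) kinds.push('coverage-outlier');
  return kinds;
}

// importConcentration as present in the fixture (AE, KW, QA only).
const importConcentration = [
  { cc: 'AE', score: 50, coverage: 0.3 },
  { cc: 'KW', score: 85, coverage: 1.0 },
  { cc: 'QA', score: 70, coverage: 1.0 },
];
// externalDebtCoverage: all six GCC members score 100 with full coverage.
const externalDebtCoverage = ['AE', 'KW', 'QA', 'SA', 'OM', 'BH'].map((cc) => ({
  cc,
  score: 100,
  coverage: 1.0,
}));

console.log(classifyDimension(importConcentration)); // [ 'coverage-outlier' ]
console.log(classifyDimension(externalDebtCoverage)); // [ 'saturated-high' ]
```

Because the identical-scores check excludes 0 and 100, a cohort pinned at 100 is reported once as saturated-high rather than twice.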

@@ -0,0 +1,166 @@
// Smoke-tests for the fail-closed behaviour of
// `scripts/audit-resilience-cohorts.mjs`. Verifies:
// (1) Missing cohort members produce a ⛔ banner at report top
// and a dedicated "Fetch failures / missing members" section.
// (2) STRICT=1 exits non-zero (code 3) when members are missing.
// (3) Formula-mode detection correctly banners when pillar-combine
// is active (Σ contributions ≠ overallScore for complete responses)
// and correctly does NOT banner when contributions sum.
//
// The tests drive the script as a child process against synthetic
// fixtures so they exercise the full `main()` flow (report shape,
// exit codes, stderr logging) rather than just the pure helpers.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { spawnSync } from 'node:child_process';
import path from 'node:path';
import fs from 'node:fs';
import os from 'node:os';
import { fileURLToPath } from 'node:url';
const __filename = fileURLToPath(import.meta.url);
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
const SCRIPT = path.join(REPO_ROOT, 'scripts', 'audit-resilience-cohorts.mjs');
function writeFixture(name: string, fixture: unknown): string {
const tmpFile = path.join(os.tmpdir(), `audit-fixture-${name}-${process.pid}.json`);
fs.writeFileSync(tmpFile, JSON.stringify(fixture));
return tmpFile;
}
function runAudit(env: Record<string, string>): { status: number | null; stdout: string; stderr: string; report: string } {
const outFile = path.join(os.tmpdir(), `audit-out-${Date.now()}-${Math.random().toString(36).slice(2)}.md`);
const result = spawnSync('node', [SCRIPT], {
env: { ...process.env, OUT: outFile, ...env },
encoding: 'utf8',
});
let report = '';
try { report = fs.readFileSync(outFile, 'utf8'); } catch { /* no report written */ }
return {
status: result.status,
stdout: result.stdout ?? '',
stderr: result.stderr ?? '',
report,
};
}
// Complete fixture: 57 cohort members so missing-member banner does NOT fire.
// Domain weights sum to 1.0 and coverage is 1.0 throughout.
// Σ contributions per country should land within CONTRIBUTION_SUM_TOLERANCE of overall.
function buildCompleteFixture(options: { pillarMode?: boolean } = {}): unknown {
const allCohortCodes = Array.from(new Set([
'AE', 'SA', 'KW', 'QA', 'OM', 'BH',
'FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE',
'SG', 'MY', 'TH', 'VN', 'ID', 'PH',
'BR', 'MX', 'CO', 'VE', 'AR', 'EC',
'NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD',
'RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM',
'LK', 'PK', 'LB', 'TR', 'EG', 'TN',
'HK', 'NL', 'PA', 'LT',
'NO',
'YE', 'SY', 'SO', 'AF',
]));
const buildDoc = (overallScore: number) => {
const dimScore = overallScore;
return {
countryCode: 'XX',
overallScore: options.pillarMode ? 10 : overallScore,
// When pillarMode=true we deliberately set overallScore to a value
// that won't match Σ contributions (penalizedPillarScore semantics)
// so the detector fires. coverage=1.0 across all dims keeps the
// eligibility gate satisfied.
level: 'moderate',
baselineScore: overallScore,
stressScore: overallScore,
stressFactor: 0.2,
domains: [
{ id: 'economic', weight: 0.17, score: dimScore, dimensions: [
{ id: 'macroFiscal', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
{ id: 'infrastructure', weight: 0.15, score: dimScore, dimensions: [
{ id: 'infrastructure', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
{ id: 'energy', weight: 0.11, score: dimScore, dimensions: [
{ id: 'energy', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
{ id: 'social-governance', weight: 0.19, score: dimScore, dimensions: [
{ id: 'governanceInstitutional', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
{ id: 'health-food', weight: 0.13, score: dimScore, dimensions: [
{ id: 'healthPublicService', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
{ id: 'recovery', weight: 0.25, score: dimScore, dimensions: [
{ id: 'externalDebtCoverage', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
],
};
};
const scores: Record<string, unknown> = {};
for (const cc of allCohortCodes) {
scores[cc] = { ...(buildDoc(70) as Record<string, unknown>), countryCode: cc };
}
const items = allCohortCodes.slice(0, 6).map((cc) => ({
countryCode: cc, overallScore: 70, level: 'moderate', lowConfidence: false, overallCoverage: 1.0, rankStable: true,
}));
return { ranking: { items, greyedOut: [] }, scores };
}
describe('audit-resilience-cohorts fail-closed — missing cohort members', () => {
it('banners the report when fixture omits cohort members AND exits 3 under STRICT=1', () => {
// Minimal fixture intentionally omits almost every cohort member.
const fixture = {
ranking: { items: [
{ countryCode: 'AE', overallScore: 72.72, level: 'high', lowConfidence: false, overallCoverage: 0.88, rankStable: true },
], greyedOut: [] },
scores: {
AE: { countryCode: 'AE', overallScore: 72.72, level: 'high', baselineScore: 72, stressScore: 70, stressFactor: 0.15, domains: [
{ id: 'recovery', weight: 0.25, score: 50, dimensions: [
{ id: 'externalDebtCoverage', score: 100, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
]},
]},
},
};
const fixturePath = writeFixture('missing-members', fixture);
try {
const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
assert.equal(result.status, 3, `expected STRICT exit code 3 for missing members; got ${result.status}; stderr=${result.stderr}`);
assert.match(result.report, /⛔ \*\*Fetch failures \/ missing cohort members/, 'expected missing-members banner at report top');
assert.match(result.report, /## Fetch failures \/ missing members/, 'expected dedicated Fetch-failures section');
assert.match(result.report, /Cohort members with no score data:/, 'expected missing-members list');
} finally {
fs.unlinkSync(fixturePath);
}
});
it('exits 0 under STRICT=1 when all cohort members present + formula matches', () => {
const fixture = buildCompleteFixture({ pillarMode: false });
const fixturePath = writeFixture('complete', fixture);
try {
const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
assert.equal(result.status, 0, `expected STRICT exit 0; got ${result.status}; stderr=${result.stderr}`);
assert.doesNotMatch(result.report, /⛔ \*\*Fetch failures/, 'missing-members banner should NOT fire');
assert.doesNotMatch(result.report, /⛔ \*\*Formula mode not supported/, 'formula-mode banner should NOT fire on legacy-formula response');
} finally {
fs.unlinkSync(fixturePath);
}
});
});
describe('audit-resilience-cohorts fail-closed — formula mode', () => {
it('banners the report when Σ contributions diverges from overallScore AND exits 4 under STRICT=1', () => {
const fixture = buildCompleteFixture({ pillarMode: true });
const fixturePath = writeFixture('pillar-mode', fixture);
try {
const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
assert.equal(result.status, 4, `expected STRICT exit code 4 for formula mismatch; got ${result.status}; stderr=${result.stderr}`);
assert.match(result.report, /⛔ \*\*Formula mode not supported/, 'expected formula-mode banner at report top');
assert.match(result.report, /PILLAR-COMBINE \(decomposition invalid\)/, 'expected formula-mode line in header');
assert.match(result.report, /## Formula-mode diagnostic/, 'expected dedicated formula-mode diagnostic section');
} finally {
fs.unlinkSync(fixturePath);
}
});
});
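
Note on exit codes: a release wrapper that shells out to the audit script needs to tell the STRICT outcomes apart. The sketch below maps the exit statuses documented in the script (0, 3, 4, plus 1 from the top-level `catch`) to operator-facing messages. `describeAuditExit` is a hypothetical helper and not part of this PR.

```javascript
// Hypothetical helper (not in the PR): translates the audit script's exit
// status into a short message. Statuses 3 and 4 are only emitted under STRICT;
// without STRICT the script exits 0 even when the report carries banners.
function describeAuditExit(status) {
  switch (status) {
    case 0:
      return 'all clear';
    case 1:
      return 'script failed before completing (see stderr)';
    case 3:
      return 'fetch failures or missing cohort members';
    case 4:
      return 'formula-mode mismatch: contribution decomposition invalid';
    default:
      // Covers null (process killed by a signal) and any undocumented code.
      return `unexpected exit status ${status}`;
  }
}

for (const status of [0, 3, 4]) {
  console.log(status, describeAuditExit(status));
}
```

The report is written before the script exits, so a wrapper can still attach it to the release review regardless of status.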

@@ -0,0 +1,125 @@
{
"_comment": "Minimal synthetic fixture for scripts/audit-resilience-cohorts.mjs end-to-end dry-run. Mirrors the 2026-04-24 GCC snapshot (KW>QA>AE) so the fixture-mode run produces the observed-real-world deltas the audit is designed to surface. Values are approximate; used only for structural verification.",
"ranking": {
"items": [
{ "countryCode": "KW", "overallScore": 79.08, "level": "high", "lowConfidence": false, "overallCoverage": 0.92, "rankStable": true },
{ "countryCode": "QA", "overallScore": 77.06, "level": "high", "lowConfidence": false, "overallCoverage": 0.95, "rankStable": true },
{ "countryCode": "AE", "overallScore": 72.72, "level": "high", "lowConfidence": false, "overallCoverage": 0.88, "rankStable": true },
{ "countryCode": "SA", "overallScore": 68.04, "level": "high", "lowConfidence": false, "overallCoverage": 0.90, "rankStable": true },
{ "countryCode": "OM", "overallScore": 65.74, "level": "moderate", "lowConfidence": false, "overallCoverage": 0.82, "rankStable": true },
{ "countryCode": "BH", "overallScore": 61.69, "level": "moderate", "lowConfidence": false, "overallCoverage": 0.85, "rankStable": true }
],
"greyedOut": []
},
"scores": {
"AE": {
"countryCode": "AE",
"overallScore": 72.72,
"level": "high",
"baselineScore": 72.0,
"stressScore": 70.0,
"stressFactor": 0.15,
"domains": [
{ "id": "recovery", "score": 62, "weight": 0.25, "dimensions": [
{ "id": "sovereignFiscalBuffer", "score": 27, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "liquidReserveAdequacy", "score": 38, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "importConcentration", "score": 50, "coverage": 0.3, "observedWeight": 0, "imputedWeight": 1, "imputationClass": "unmonitored" },
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "fiscalSpace", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "stateContinuity", "score": 85, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "economic", "score": 74, "weight": 0.17, "dimensions": [
{ "id": "tradeSanctions", "score": 54, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "currencyExternal", "score": 73, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "macroFiscal", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "energy", "score": 79, "weight": 0.11, "dimensions": [
{ "id": "energy", "score": 79, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "health-food", "score": 62, "weight": 0.13, "dimensions": [
{ "id": "foodWater", "score": 53, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "healthPublicService", "score": 75, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "social-governance", "score": 74, "weight": 0.19, "dimensions": [
{ "id": "socialCohesion", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "governanceInstitutional", "score": 78, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "infrastructure", "score": 78, "weight": 0.15, "dimensions": [
{ "id": "infrastructure", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]
},
"KW": {
"countryCode": "KW",
"overallScore": 79.08,
"level": "high",
"baselineScore": 79.0,
"stressScore": 78.0,
"stressFactor": 0.11,
"domains": [
{ "id": "recovery", "score": 90, "weight": 0.25, "dimensions": [
{ "id": "sovereignFiscalBuffer", "score": 98, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "liquidReserveAdequacy", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "importConcentration", "score": 85, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "fiscalSpace", "score": 98, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "stateContinuity", "score": 80, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "economic", "score": 78, "weight": 0.17, "dimensions": [
{ "id": "tradeSanctions", "score": 82, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "currencyExternal", "score": 86, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "macroFiscal", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "energy", "score": 55, "weight": 0.11, "dimensions": [
{ "id": "energy", "score": 55, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "health-food", "score": 60, "weight": 0.13, "dimensions": [
{ "id": "foodWater", "score": 53, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "healthPublicService", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "social-governance", "score": 72, "weight": 0.19, "dimensions": [
{ "id": "socialCohesion", "score": 68, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "governanceInstitutional", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "infrastructure", "score": 76, "weight": 0.15, "dimensions": [
{ "id": "infrastructure", "score": 76, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]
},
"QA": {
"countryCode": "QA",
"overallScore": 77.06,
"level": "high",
"baselineScore": 77.0,
"stressScore": 76.0,
"stressFactor": 0.12,
"domains": [
{ "id": "recovery", "score": 85, "weight": 0.25, "dimensions": [
{ "id": "sovereignFiscalBuffer", "score": 95, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "liquidReserveAdequacy", "score": 68, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "importConcentration", "score": 70, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]},
{ "id": "economic", "score": 78, "weight": 0.17, "dimensions": [
{ "id": "tradeSanctions", "score": 82, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]
},
"SA": { "countryCode": "SA", "overallScore": 68.04, "level": "high", "baselineScore": 68.0, "stressScore": 67.0, "stressFactor": 0.14, "domains": [
{ "id": "recovery", "score": 70, "weight": 0.25, "dimensions": [
{ "id": "sovereignFiscalBuffer", "score": 72, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" },
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]},
"OM": { "countryCode": "OM", "overallScore": 65.74, "level": "moderate", "baselineScore": 66.0, "stressScore": 64.0, "stressFactor": 0.16, "domains": [
{ "id": "recovery", "score": 60, "weight": 0.25, "dimensions": [
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]},
"BH": { "countryCode": "BH", "overallScore": 61.69, "level": "moderate", "baselineScore": 62.0, "stressScore": 60.0, "stressFactor": 0.18, "domains": [
{ "id": "recovery", "score": 55, "weight": 0.25, "dimensions": [
{ "id": "externalDebtCoverage", "score": 100, "coverage": 1.0, "observedWeight": 1, "imputedWeight": 0, "imputationClass": "" }
]}
]}
}
}
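
Note on this fixture and formula-mode detection: the sketch below restates the eligibility gate from `detectFormulaMode` as a pure predicate and applies it to the shape of the fixture above (domain weights and per-dimension coverage only). `isEligibleForFormulaCheck` and the `dims` shorthand are hypothetical names and not part of this PR.

```javascript
// Hypothetical predicate (not in the PR): a country is eligible for the
// Σ-contributions check only if its domain weights sum to about 1.0 and every
// dimension has coverage >= 0.9.
function isEligibleForFormulaCheck(doc) {
  const domains = doc.domains ?? [];
  const weightSum = domains.reduce((acc, d) => acc + (d.weight ?? 0), 0);
  if (Math.abs(weightSum - 1.0) > 0.05) return false; // incomplete response
  return domains.every((dom) =>
    (dom.dimensions ?? []).every((dim) => (dim.coverage ?? 0) >= 0.9),
  );
}

// Shorthand for building dimension lists from coverage values.
const dims = (...coverages) => coverages.map((coverage) => ({ coverage }));

// Weights and coverages mirror the fixture; scores are omitted.
const fixtureShape = {
  AE: { domains: [
    { weight: 0.25, dimensions: dims(1, 1, 0.3, 1, 1, 1) }, // importConcentration at 0.3
    { weight: 0.17, dimensions: dims(1, 1, 1) },
    { weight: 0.11, dimensions: dims(1) },
    { weight: 0.13, dimensions: dims(1, 1) },
    { weight: 0.19, dimensions: dims(1, 1) },
    { weight: 0.15, dimensions: dims(1) },
  ] },
  KW: { domains: [
    { weight: 0.25, dimensions: dims(1, 1, 1, 1, 1, 1) },
    { weight: 0.17, dimensions: dims(1, 1, 1) },
    { weight: 0.11, dimensions: dims(1) },
    { weight: 0.13, dimensions: dims(1, 1) },
    { weight: 0.19, dimensions: dims(1, 1) },
    { weight: 0.15, dimensions: dims(1) },
  ] },
  QA: { domains: [
    { weight: 0.25, dimensions: dims(1, 1, 1, 1) },
    { weight: 0.17, dimensions: dims(1) },
  ] },
};

const eligible = Object.keys(fixtureShape).filter((cc) =>
  isEligibleForFormulaCheck(fixtureShape[cc]),
);
console.log(eligible); // [ 'KW' ]
```

SA, OM, and BH carry a single domain with weight 0.25, so they fail the weight-sum gate as well. With only one eligible country, the detector's `checked >= 3` condition is not met, so this fixture should not raise the formula-mode banner.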

@@ -0,0 +1,161 @@
// Construct invariants — formula-level assertions with synthetic inputs.
//
// Purpose. Complement `resilience-dimension-monotonicity.test.mts` (which
// pins direction) with precise ANCHOR-VALUE checks. These tests fail when
// the scoring FORMULA breaks, not when a country's RANK changes. They are
// deliberately country-identity-free so the audit gate (see
// `docs/methodology/cohort-sanity-release-gate.md`) does not collapse into
// an outcome-seeking "ENTITY A must > ENTITY B" assertion — that is the
// anti-pattern the cohort-sanity skill explicitly warns against.
//
// Plan reference. PR 0 from
// `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
// (§"PR 0 — Release-gate audit harness"):
// > `score(HHI=0.05) > score(HHI=0.20)`
// > `score(debtToReservesRatio=0) > score(ratio=1) > score(ratio=2)`
// > `score(effMo=12) > score(effMo=3)`
// > `score(lowCarbonShare=80, fossilImportDep=0) > score(lowCarbonShare=0, fossilImportDep=100)`
//
// The tests are organised by scorer and include both the monotonicity
// claim and the precise anchor value where the construct fixes one
// (Greenspan-Guidotti = 50; saturating transform at effMo=12 = ~63).
// An anchor drift > 1 point is an invariant break: investigate before
// editing the test.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import {
  scoreImportConcentration,
  scoreExternalDebtCoverage,
  scoreSovereignFiscalBuffer,
  type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';

const TEST_ISO2 = 'XX';

function makeReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
  return async (key: string) => keyValueMap[key] ?? null;
}

describe('construct invariants — importConcentration', () => {
  async function scoreWith(hhi: number) {
    return scoreImportConcentration(TEST_ISO2, makeReader({
      'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
    }));
  }

  it('score(HHI=0.05) > score(HHI=0.20)', async () => {
    const diversified = await scoreWith(0.05);
    const concentrated = await scoreWith(0.20);
    assert.ok(
      diversified.score > concentrated.score,
      `HHI 0.05→0.20 should lower score; got ${diversified.score} vs ${concentrated.score}`,
    );
  });

  it('HHI=0 anchors at score 100 (no-concentration pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at HHI=0, got ${r.score}`);
  });

  it('HHI=0.5 (fully concentrated under current 0..5000 goalpost) anchors at score 0', async () => {
    // Current scorer: hhi×10000 normalised against (0, 5000). 0.5×10000 = 5000 → 0.
    const r = await scoreWith(0.5);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at HHI=0.5 under current goalpost, got ${r.score}`);
  });
});

describe('construct invariants — externalDebtCoverage (Greenspan-Guidotti anchor)', () => {
  async function scoreWith(debtToReservesRatio: number) {
    return scoreExternalDebtCoverage(TEST_ISO2, makeReader({
      'resilience:recovery:external-debt:v1': {
        countries: { [TEST_ISO2]: { debtToReservesRatio } },
      },
    }));
  }

  it('ratio=0 → score 100 (zero-rollover-exposure pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at ratio=0, got ${r.score}`);
  });

  it('ratio=1.0 → score 50 (Greenspan-Guidotti threshold)', async () => {
    const r = await scoreWith(1.0);
    assert.ok(
      Math.abs(r.score - 50) < 1,
      `expected ~50 at ratio=1.0 under Greenspan-Guidotti anchor (worst=2), got ${r.score}`,
    );
  });

  it('ratio=2.0 → score 0 (acute rollover-shock pole)', async () => {
    const r = await scoreWith(2.0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at ratio=2.0, got ${r.score}`);
  });

  it('monotonic: score(ratio=0) > score(ratio=1) > score(ratio=2)', async () => {
    const [r0, r1, r2] = await Promise.all([scoreWith(0), scoreWith(1), scoreWith(2)]);
    assert.ok(
      r0.score > r1.score && r1.score > r2.score,
      `expected strictly decreasing; got ${r0.score}, ${r1.score}, ${r2.score}`,
    );
  });
});

describe('construct invariants — sovereignFiscalBuffer (saturating transform)', () => {
  // Saturating transform per scorer (line ~1687):
  //   score = 100 * (1 - exp(-em / 12))
  // Reference values (not tuning points; these are what the formula SHOULD
  // produce if no one has silently redefined it):
  //   em=0   → 0
  //   em=3   → 100*(1-e^-0.25) ≈ 22.1
  //   em=12  → 100*(1-e^-1)    ≈ 63.2
  //   em=24  → 100*(1-e^-2)    ≈ 86.5
  //   em→∞   → 100
  async function scoreWithEm(em: number) {
    return scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': {
        countries: { [TEST_ISO2]: { totalEffectiveMonths: em, completeness: 1.0 } },
      },
    }));
  }

  it('em=0 → score 0 (no SWF buffer)', async () => {
    const r = await scoreWithEm(0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at em=0, got ${r.score}`);
  });

  it('em=12 → score ≈ 63 (one-year saturating anchor)', async () => {
    const r = await scoreWithEm(12);
    const expected = 100 * (1 - Math.exp(-1));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=12, got ${r.score}`,
    );
  });

  it('em=24 → score ≈ 86 (two-year saturating anchor)', async () => {
    const r = await scoreWithEm(24);
    const expected = 100 * (1 - Math.exp(-2));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=24, got ${r.score}`,
    );
  });

  it('monotonic: score(em=3) < score(em=12) < score(em=24)', async () => {
    const [r3, r12, r24] = await Promise.all([scoreWithEm(3), scoreWithEm(12), scoreWithEm(24)]);
    assert.ok(
      r3.score < r12.score && r12.score < r24.score,
      `expected strictly increasing; got em=3:${r3.score}, em=12:${r12.score}, em=24:${r24.score}`,
    );
  });

  it('country not in manifest → score 0, coverage 1.0 (legitimate zero, not imputed)', async () => {
    // Seed present but country absent = "no SWF" (legitimate structural zero).
    // This is distinct from "seed missing entirely", which returns IMPUTE.
    const r = await scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': { countries: {} },
    }));
    assert.equal(r.score, 0, `expected 0 when country has no manifest entry, got ${r.score}`);
    assert.equal(r.coverage, 1.0, `expected coverage=1.0 (legitimate observation), got ${r.coverage}`);
    assert.equal(r.imputationClass, null, `expected null imputation (not imputed), got ${r.imputationClass}`);
  });
});
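
Review note: the anchor values asserted in this file can be reproduced by hand from the formulas quoted in its comments. The sketch below is standalone and does not import the real scorers. The helper names (`lowerIsBetter`, `sketchImportConcentration`, `sketchExternalDebtCoverage`, `sketchSovereignFiscalBuffer`) are hypothetical, and the linear interpolation between goalposts is an assumption that happens to match the 100 / 50 / 0 anchors; only the saturating form `100 * (1 - exp(-em / 12))` and the goalposts (0, 5000) and worst=2 come from the test file.

```typescript
// Hypothetical helper: linear "lower is better" goalpost, clamped to 0..100.
// `best` maps to 100 and `worst` maps to 0 (linearity is an assumption).
function lowerIsBetter(value: number, best: number, worst: number): number {
  const t = (worst - value) / (worst - best);
  return 100 * Math.min(1, Math.max(0, t));
}

// HHI given on a 0..1 scale, rescaled by 10000 and scored against (0, 5000).
function sketchImportConcentration(hhi: number): number {
  return lowerIsBetter(hhi * 10000, 0, 5000);
}

// Debt-to-reserves ratio scored against best=0, worst=2, so ratio=1 lands on 50.
function sketchExternalDebtCoverage(debtToReservesRatio: number): number {
  return lowerIsBetter(debtToReservesRatio, 0, 2);
}

// Saturating transform quoted in the test comments; negative inputs are
// floored at 0 here as a defensive assumption.
function sketchSovereignFiscalBuffer(effectiveMonths: number): number {
  return 100 * (1 - Math.exp(-Math.max(0, effectiveMonths) / 12));
}

// Anchor table matching the cases asserted in the test file.
const anchors = {
  hhiZero: sketchImportConcentration(0),
  hhiHalf: sketchImportConcentration(0.5),
  debtRatioOne: sketchExternalDebtCoverage(1),
  debtRatioTwo: sketchExternalDebtCoverage(2),
  swfTwelveMonths: sketchSovereignFiscalBuffer(12),
  swfTwentyFourMonths: sketchSovereignFiscalBuffer(24),
};
console.log(anchors);
```

If the real scorer drifts from these values by more than one point, the tests above fail, which is the intended signal that the formula, not a country's rank, has changed.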