feat(resilience): PR 0 diagnostic freeze + fairness-audit harness (no scoring changes) (#3284)

* feat(resilience): PR 0 diagnostic freeze + fairness-audit harness

Lands the before-state and measurement apparatus every subsequent
resilience-scorer PR validates against. Zero scoring changes. Per the
v3 plan at docs/plans/2026-04-22-001-fix-resilience-scorer-structural-
bias-plan.md this is tranche 0 of five.

What lands:
- Construct contract published in the methodology doc: absolute
  resilience (not development-adjusted), a mechanism test for every
  indicator, and peer-relative views published separately from the core.
- Known construct limitations section: six construct errors scheduled
  for PR 1-3 repair with explicit mapping to plan tranches.
- Indicator-source manifest at docs/methodology/indicator-sources.yaml
  with source, seriesId, seriesUrl, coveragePct, lastObservedYear,
  license, mechanismTestRationale, and a constructStatus classification.
- Pre-repair ranking snapshot at
  docs/snapshots/resilience-ranking-live-pre-repair-2026-04-22.json
  (217 items + 5 greyedOut, captured 2026-04-22 08:38 UTC at commit
  425507d15).
- Cohort configuration at tests/helpers/resilience-cohorts.mts: six
  cohorts covering 87 countries (net-fuel-exporters, net-energy-
  importers-oecd, nuclear-heavy-generation, coal-heavy-domestic,
  small-island-importers, fragile-states).
- Matched-pair sanity panel at tests/helpers/resilience-matched-pairs.mts:
  six pairs (FR/DE, NO/CA, UAE/BH, JP/KR, IN/ZA, SG/CH) with expected-
  direction rationale and minGap for acceptance gate 7.
- scripts/compare-resilience-current-vs-proposed.mjs extended to emit
  cohortSummary and matchedPairSummary alongside the existing output
  shape (backward compatible).
- tests/resilience-cohort-config.test.mts: 11 validations ensuring the
  cohort + matched-pair configs stay well-formed.

Deferred to PR 0.5 (before PR 1 lands):
- Monotonicity test harness for all 19 dimension scorers pinning the
  sign of every indicator.
- Pearson-derivative variable-influence baseline inside the sensitivity
  script producing the nominal-weight-vs-effective-influence table that
  plan acceptance gate 8 requires.

Verification: typecheck:all clean, 430/430 resilience tests pass,
11/11 new cohort-config tests pass, snapshot auto-discovered and
validated by the existing snapshot-test harness.

* feat(resilience): PR 0 follow-ups — monotonicity harness, variable-influence baseline, cross-consumer formula gate

Completes the PR 0 scope per the v3 plan §5 deliverables. Three adds:

1. Monotonicity test harness
   tests/resilience-dimension-monotonicity.test.mts pins the direction
   of movement for 14 indicators across 7 dimensions (reserve adequacy,
   fiscal space 3x, external debt coverage, import concentration,
   governance WGI, food/water 2x, energy 5x). Each test builds two
   synthetic ResilienceSeedReader fixtures differing only in the target
   indicator and asserts the dimension score moves in the documented
   direction. The scoreEnergy tests explicitly flag three indicators
   (gasShare, coalShare, electricityConsumption) that PR 1 §3.1-3.2
   overturns so future readers understand which directional claims the
   plan intentionally replaces.

2. Variable-influence baseline
   scripts/compare-resilience-current-vs-proposed.mjs now computes
   per-dimension Pearson correlation against the current overallScore
   scaled by the dimension's nominal domain weight (a Pearson-derivative
   approximation of Sobol indices). The output carries a
   variableInfluence[] array sorted by abs(effectiveInfluence) desc.
   Acceptance gate 8 from the plan compares post-change effective
   influence against assigned nominal weight; divergences flag a
   wealth-proxy or saturated-signal construct problem.

3. Cross-consumer formula gate
   Five external consumers of resilience:score:v10:* now filter stale-
   formula entries so a flag flip does not serve mixed-formula data
   downstream:
     - server/worldmonitor/supply-chain/v1/get-route-impact.ts —
       readResilienceScore() checks _formula via the new
       getCurrentCacheFormula export and returns 0 on mismatch.
     - scripts/validate-resilience-correlation.mjs,
       scripts/validate-resilience-backtest.mjs,
       scripts/backtest-resilience-outcomes.mjs,
       scripts/benchmark-resilience-external.mjs — each inlines a
       currentCacheFormulaLocal() helper that mirrors the server's
       formula derivation from env, skips parsed entries whose
       _formula disagrees, and logs the skip count so operators can
       notice a mismatch during the flip window.

A mixed-formula cohort (some countries d6-tagged, others pc-tagged)
would confound every correlation, AUC, and Spearman this repair plan
depends on for its acceptance gates. These guards close that gap.
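
The guard pattern described above can be sketched as follows. This is a
minimal illustration, not the code in the listed files: `filterByFormula`
is a hypothetical name, the scores are made-up values, and treating
untagged or unparseable entries as mismatches is an assumption. The `d6`
and `pc` tags are the ones named in this message.

```javascript
// Hypothetical helper: keep only cache entries whose _formula tag matches
// the formula the current process expects, and count what was skipped.
function filterByFormula(rawEntries, currentFormula) {
  const kept = [];
  let skipped = 0;
  for (const raw of rawEntries) {
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch {
      skipped += 1; // assumption: unparseable entries count as stale
      continue;
    }
    const isObject = parsed !== null && typeof parsed === 'object';
    if (!isObject || parsed._formula !== currentFormula) {
      skipped += 1; // assumption: untagged entries count as mismatches
      continue;
    }
    kept.push(parsed);
  }
  return { kept, skipped };
}

// Illustrative (made-up) cache payloads.
const entries = [
  JSON.stringify({ countryCode: 'FR', overallScore: 71.2, _formula: 'd6' }),
  JSON.stringify({ countryCode: 'DE', overallScore: 69.8, _formula: 'pc' }),
  JSON.stringify({ countryCode: 'NO', overallScore: 80.1 }),
];
const result = filterByFormula(entries, 'd6');
console.log(`kept=${result.kept.length} skipped=${result.skipped}`);
```

Logging the skip count, rather than dropping entries silently, is what
lets an operator notice a mixed-formula cache during the flip window.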

Verification: typecheck:all clean, 444/444 resilience tests pass
(+14 from the new monotonicity harness).

* fix(resilience): PR 0 review follow-ups — sample-union + doc tense

Two review-driven fixes on top of PR 0.

1. scripts/compare-resilience-current-vs-proposed.mjs — the cohort and
   matched-pair summaries were computed against the historical
   52-country sensitivity seed, which silently excluded the
   small-island-importers cohort (zero members in the seed) and the
   sg-vs-ch matched pair (Singapore not in the seed). Before this fix,
   those acceptance gates were partially measured at best.

   SAMPLE now = union(historical 52 seed, every cohort member, every
   matched-pair endpoint). The imports for RESILIENCE_COHORTS and
   MATCHED_PAIRS moved from inside main() to module scope so the union
   can be computed before the script runs.

   Net sample size grows from 52 to ~95 countries. Still fast enough
   for an interactive pass; makes the acceptance gates honest.

2. docs/methodology/country-resilience-index.mdx — the construct
   contract wording read as present-tense compliance ("Every indicator
   in the scorer passes a single mechanism test"), which contradicted
   the immediately-following passage about indicators that currently
   fail the test. Reworded to "is being evaluated against" and added
   an explicit PR-0-does-not-change-scoring paragraph that names the
   known-failing indicators (electricityConsumption, gas/coal flat
   penalties, WHO per-capita health spend) and points at the repair
   plan for the replacement schedule.

Verification: typecheck:all clean, 444/444 resilience tests pass.

* fix(resilience): compare-script loads frozen baseline + emits per-indicator influence

Addresses two P1 review findings on PR #3284:

1. Script previously compared current-6d vs proposed-pillar-combined
   from the SAME checkout; never loaded the frozen pre-PR-0 baseline,
   so acceptance gates 2/6/7 ("no country moved >15pts vs baseline",
   cohort median shift vs baseline, matched-pair gap change vs
   baseline) could not be enforced for later scorer PRs.

   Now auto-discovers the most recent
   resilience-ranking-live-pre-repair-<date>.json (or post-<pr>-<date>)
   in docs/snapshots/ and emits a baselineComparison block with:
   spearmanVsBaseline, maxCountryAbsDelta, biggestDriftsVsBaseline,
   cohortShiftVsBaseline, matchedPairGapChange. If no baseline is
   found, the block is emitted with status 'unavailable' so callers
   can distinguish missing-baseline from passed-baseline.

2. variableInfluence was emitted only at the dimension level, which
   hid the exact sub-indicators the repair plan targets
   (electricityConsumption, gasShare, coalShare, etc.) inside their
   parent dimension. Added extractIndicatorValues() which pulls twelve
   construct-risk indicators per country from the shared memoized
   reader, then computes per-indicator Pearson correlation against
   the current overall score. Emitted as perIndicatorInfluence[],
   sorted by absolute effective influence.

Acceptance gate 8 ("effective influence agrees in sign and rank-order
with assigned nominal weights") is now computable at the indicator
level, not only at the dimension level.
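
A minimal sketch of the influence calculation described above, assuming
"effective influence" is the Pearson correlation against the overall
score multiplied by the nominal weight. The names `pearson` and
`rankInfluence`, the row shape, and the sample numbers are hypothetical
and are not taken from the script.

```javascript
// Pearson correlation of two equal-length numeric arrays.
// Returns NaN when either series has zero variance.
function pearson(xs, ys) {
  const n = xs.length;
  if (n === 0 || n !== ys.length) return NaN;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  const denom = Math.sqrt(varX * varY);
  return denom === 0 ? NaN : cov / denom;
}

// Sort key: absolute influence, with non-finite values pushed to the end.
const sortKey = (v) => (Number.isFinite(v) ? Math.abs(v) : -1);

// rows: [{ id, nominalWeight, values }], values aligned with overallScores.
function rankInfluence(rows, overallScores) {
  return rows
    .map(({ id, nominalWeight, values }) => {
      const r = pearson(values, overallScores);
      return { id, nominalWeight, pearson: r, effectiveInfluence: r * nominalWeight };
    })
    .sort((a, b) => sortKey(b.effectiveInfluence) - sortKey(a.effectiveInfluence));
}

// Made-up data for four countries.
const overall = [10, 20, 30, 40];
const ranked = rankInfluence(
  [
    { id: 'a', nominalWeight: 0.1, values: [1, 2, 3, 4] },
    { id: 'b', nominalWeight: 0.3, values: [4, 3, 2, 1] },
    { id: 'c', nominalWeight: 0.5, values: [5, 5, 5, 5] }, // no variance
  ],
  overall,
);
console.log(ranked.map((row) => `${row.id}:${row.effectiveInfluence}`));
```

A series that is constant across countries (row `c`) has no defined
correlation, so it cannot be ranked against its nominal weight however
large that weight is.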

No production code touched; diagnostic-harness only.

* fix(resilience): baseline-snapshot selection by structured parse, not filename sort

Addresses P1 review on compare-resilience-current-vs-proposed.mjs:118-130.

Plain filename sort breaks the "immediate-prior state" contract two ways:

1. Lexical ordering: `pre-repair` sorts after `post-*`
   (`pr...` vs `po...`: 'r' > 'o'), so the PR-0 freeze would keep winning even
   after post-PR snapshots exist. Later scorer PRs would then report
   acceptance-gate deltas against the original pre-repair freeze
   instead of the immediately-prior post-PR-(N-1) snapshot — the gate
   would appear valid while measuring against the wrong baseline.

2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so PR-10 would
   lose the selection to PR-9.

Fix: parseBaselineSnapshotMeta() extracts (kind, prNumber, date) from
the filename. Sort keys are (kindRank desc, prNumber desc, date desc):
  - post always beats pre-repair (kindRank 1 vs 0)
  - among posts, prNumber compared numerically (10 beats 9)
  - date breaks ties (same-PR re-snapshots, later capture wins)
  - unlabeled post tags get prNumber 0 so they sort between
    pre-repair and any numbered PR snapshot
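
A minimal sketch of this ordering, under stated assumptions: the
filename pattern (`resilience-ranking-live-post-pr<N>-<date>.json`) is
inferred from the description above, and `parseSnapshotName`,
`compareSnapshots`, and `pickBaseline` are hypothetical names, not the
harness's own parseBaselineSnapshotMeta.

```javascript
// Assumed filename shapes:
//   resilience-ranking-live-pre-repair-YYYY-MM-DD.json
//   resilience-ranking-live-post-<tag>-YYYY-MM-DD.json   (tag e.g. "pr10")
const SNAPSHOT_RE =
  /^resilience-ranking-live-(pre-repair|post-(.+))-(\d{4}-\d{2}-\d{2})\.json$/;

function parseSnapshotName(file) {
  const m = SNAPSHOT_RE.exec(file);
  if (!m) return null;
  const isPost = m[1] !== 'pre-repair';
  const prMatch = isPost ? /^pr(\d+)$/.exec(m[2]) : null;
  return {
    file,
    kindRank: isPost ? 1 : 0,
    prNumber: prMatch ? Number(prMatch[1]) : 0, // unlabeled post tags get 0
    date: m[3],
  };
}

// Descending by kindRank, then prNumber (numeric), then ISO date.
function compareSnapshots(a, b) {
  if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
  if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
  if (a.date === b.date) return 0;
  return a.date < b.date ? 1 : -1;
}

function pickBaseline(files) {
  const parsed = files.map(parseSnapshotName).filter((meta) => meta !== null);
  parsed.sort(compareSnapshots);
  return parsed.length > 0 ? parsed[0] : null;
}

const files = [
  'resilience-ranking-live-pre-repair-2026-04-22.json',
  'resilience-ranking-live-post-pr9-2026-05-10.json',
  'resilience-ranking-live-post-pr10-2026-05-20.json',
  'README.md',
];
const naivePick = [...files].sort().reverse()[0]; // plain filename sort
const structuredPick = pickBaseline(files);
console.log({ naivePick, structuredPick: structuredPick.file });
```

With these sample names the plain filename sort still selects the
pre-repair freeze, while the structured comparison selects the pr10
snapshot.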

Surfaced in output: baselineKind / baselinePrNumber / baselineDate
alongside baselineFile so the operator can verify which snapshot was
selected without having to reopen the file.

Module now isMain-guarded per feedback_seed_isMain_guard memory so
tests can import parseBaselineSnapshotMeta without firing the
scoring run.

Added tests/resilience-baseline-snapshot-ordering.test.mjs (9 tests)
pinning the ordering contract for every known failure mode.

Diagnostic-harness change only. No production code touched.

* fix(resilience): full scorable universe + registry-driven per-indicator influence

Addresses two fresh P1 review findings on the PR 0 compare harness.

Finding 1 — acceptance math ran on a curated ~95-country sample,
so plan gate 2 could miss large regressions in excluded countries.

  - Main scoring loop now iterates the FULL scorable universe
    (listScorableCountries()), not the 52-country seed + cohort union.
  - Removed SAMPLE / HISTORICAL_SENSITIVITY_SEED constants.
  - Added scorableUniverseSize + cohortMissingFromScorable to output
    so operators see universe size and any cohort/pair endpoint that
    listScorable refuses to score (fail-loud, not silent drop).

Finding 3 — per-indicator influence was a hand-picked 12-indicator
subset, hiding most registry indicators from the baseline that
later scorer PRs need.

  - Extraction is now driven by INDICATOR_REGISTRY. Every Core +
    Enrichment indicator gets a row with explicit extractionStatus:
      implemented | not-implemented (with reason) | unregistered-in-harness
  - EXTRACTION_RULES covers 40/59 indicators across 11 shape families
    (static-path, static-wb-infrastructure, static-wgi, static-wgi-mean,
    static-who, energy-mix-field, gas-storage-field, recovery-country-
    field, imf-macro/labor-country-field, national-debt, sanctions-count).
  - Remaining 19 indicators need either a scorer trace hook (PR 0.5)
    or a safe aggregation duplicate; each carries a reason string.
  - extractionCoverage summary (totalIndicators / implemented /
    notImplemented / unregisteredInHarness / coreImplemented / coreTotal)
    exposed in output so PR 0.5 progress is measurable.

Added tests/resilience-indicator-extraction-plan.test.mjs (11 tests)
pinning: every registry entry has an extraction row; not-implemented
rows carry a reason; all 12 plan-named construct-risk indicators stay
extractable; Core-tier coverage floor of 45%; shape-family unit tests.

Diagnostic-harness change only. No production code touched.

* fix(resilience): wire event-aggregate per-indicator influence via exported scorer helpers

Addresses P1 review on PR 0 compare harness. Previous commit marked 16
Core-tier indicators as 'not-implemented' because they needed scorer
event-window/severity-weighting math; that left the gate-9 acceptance
apparatus incomplete for a large part of the shipped score.

Fix: export the scorer-internal aggregation helpers so the harness
calls them directly. Zero aggregation math is duplicated in the harness,
so harness and scorer cannot drift.

Exported from _dimension-scorers.ts (purely additive):
  summarizeCyber, summarizeOutages, summarizeGps,
  summarizeUcdp, summarizeUnrest, summarizeSocialVelocity,
  getCountryDisplacement, getThreatSummaryScore,
  countTradeRestrictions, countTradeBarriers.

13 extraction rules moved from not-implemented to implemented:
  cyberThreats, internetOutages, infraOutages, gpsJamming,
  ucdpConflict, unrestEvents, socialVelocity, newsThreatScore,
  displacementTotal, displacementHosted, tradeRestrictions,
  tradeBarriers, recoveryConflictPressure, recoveryDisplacementVelocity.

Coverage:
  52/59 total (88%), 46/50 Core-tier (92%).

Four Core indicators remain not-implemented for STRUCTURAL reasons,
NOT missing code. Scorer inputs are genuinely global scalars with
zero per-country variance, so Pearson(indicator, overall) is 0 or
NaN by construction:
  shippingStress, transitDisruption, energyPriceStress — scorer
  reads a global scalar applied to every country; a per-country
  effective signal would need re-expression as (global x per-country
  exposure), which is a derived signal in a different entry.
  aquastatWaterAvailability — needs a distinct sub-indicator path
  resolver; enrichment follow-up.

New test asserts the three no-per-country-variance indicators STAY
not-implemented with a matching reason, so any future extraction
that appears to cover them without fixing the underlying construct
fails.

Dispatcher split into STATIC / SIMPLE / AGGREGATE extractor tables
to stay under biome complexity limit. Core-tier floor test raised
from 45% to 80%.

89 resilience tests pass, typecheck clean, biome clean. No production
behaviour changes.

* fix(resilience): tag-gated AQUASTAT extractor closes the last fixable Core gap

Reviewer flagged aquastatWaterAvailability as the only remaining Core
indicator where the not-implemented status was structurally fixable
rather than conceptually impossible.

Both aquastatWaterStress and aquastatWaterAvailability share a single
.aquastat.value field; the scorer's scoreAquastatValue splits them
by keyword in the sibling .aquastat.indicator tag (stress/withdrawal/
dependency map to the stress family; availability/renewable/access map
to the availability family). The harness now mirrors this branching:

  - classifyAquastatFamily implements the scorer's priority order
    (stress-family match wins even if the tag also contains an
    availability keyword, matching the sequential if-check at
    _dimension-scorers.ts L770-776).
  - static-aquastat-stress / static-aquastat-availability extractors
    return the value only when the family matches, so stress-family
    readings never corrupt the availability Pearson and vice versa.

Core-tier coverage rises from 46/50 to 47/50 (94%). The 3 remaining Core
not-implemented indicators (shippingStress, transitDisruption,
energyPriceStress) are all structural impossibilities: scorer inputs
are global scalars with zero per-country variance.

New contract test pins both directions of the tag gate plus the
priority-order edge case (a tag containing both families' keywords
routes to stress).
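
The tag gate can be sketched as below. This is an illustration under
assumptions, not the harness code: `classifyFamily` and
`extractAquastat` are hypothetical names, the case-insensitive substring
match is assumed, and the sample tags and values are made up. Only the
keyword lists and the stress-first priority come from the description
above.

```javascript
const STRESS_KEYWORDS = ['stress', 'withdrawal', 'dependency'];
const AVAILABILITY_KEYWORDS = ['availability', 'renewable', 'access'];

// Stress keywords are checked first, so a tag carrying keywords from
// both families is classified as stress.
function classifyFamily(tag) {
  const t = String(tag ?? '').toLowerCase();
  if (STRESS_KEYWORDS.some((k) => t.includes(k))) return 'stress';
  if (AVAILABILITY_KEYWORDS.some((k) => t.includes(k))) return 'availability';
  return 'unknown';
}

// Return the shared .aquastat.value only when the tag belongs to the
// requested family; otherwise return null so the reading is excluded.
function extractAquastat(record, family) {
  const aq = record?.aquastat;
  if (!aq || typeof aq.value !== 'number') return null;
  return classifyFamily(aq.indicator) === family ? aq.value : null;
}

const stressRecord = { aquastat: { indicator: 'Water stress', value: 42 } };
const mixedRecord = {
  aquastat: { indicator: 'Renewable water availability under stress', value: 7 },
};
console.log(extractAquastat(stressRecord, 'stress'));
console.log(extractAquastat(stressRecord, 'availability'));
console.log(classifyFamily(mixedRecord.aquastat.indicator));
```

Returning null for the non-matching family, rather than the raw value,
keeps each family's correlation input free of the other family's
readings.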

90 resilience tests pass, typecheck clean, biome clean.
Elie Habib
2026-04-22 16:44:12 +04:00
committed by GitHub
parent 2765b46dad
commit da0f26a3cf
16 changed files with 5207 additions and 34 deletions

@@ -12,6 +12,29 @@ This document describes the **currently shipping** behavior of the index. The ve
Everything documented below describes the **currently shipping** state: schemaVersion `"2.0"` shape, 6 domains × 19 dimensions × 3 pillars, and the 6-domain weighted `overall_score`. When an operator flips the pillar-combined flag on, the subsection on [Pillar-combined score activation](#pillar-combined-score-activation-flag-gated-default-off) documents what changes.
## Construct contract
Country Resilience measures **absolute national shock-absorption and recovery capacity at a point in time**. It does not adjust for income level. Development-adjacent indicators enter only when they measure a direct resilience mechanism. Those indicators use threshold or saturating transforms so the score rewards functional capacity, not affluence itself. Peer-relative over- and under-performance will be published separately as an analytical overlay, not inside the core score.
The scorer will treat development as relevant only where it creates a direct and measurable shock-absorption mechanism. Pure level-of-affluence proxies are excluded. Development-relative overperformance will be reported separately and will not alter the ordinal country ranking.
Every indicator in the scorer is being evaluated against a single **mechanism test**: *what direct shock channel does this measure?* An indicator whose only answer is "this country is rich" is excluded from the core score regardless of its historical correlation with resilience outcomes. An indicator whose answer is "capacity X absorbs shock Y" can enter but must use a threshold or saturating transform so it rewards the mechanism rather than the level of resource that drives it.
This PR (the diagnostic freeze) does not change any scoring behaviour. It ships the mechanism-test framework and the apparatus to measure compliance; it does not claim compliance. Several indicators in the current scorer fail the test (notably `electricityConsumption`, `gasShare` / `coalShare` as flat domestic-fossil penalties, and `WHO per-capita health spend`). They are tagged `wealth-proxy` or equivalent in `docs/methodology/indicator-sources.yaml` and scheduled for replacement in PR 1 / PR 4 of the repair plan. Published rankings today reflect the pre-repair scorer; the mechanism-test contract applies fully only after PR 4.
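
To make "threshold or saturating transform" concrete, here is a minimal sketch of two such transforms. The functional forms, the 0-100 scale, and the names `saturatingScore` and `thresholdScore` are assumptions chosen for illustration; this document does not specify which transform any given indicator uses.

```javascript
// Saturating transform (assumed form): each additional unit of capacity
// adds less score, so very high levels earn little extra credit.
// halfSaturation is the value at which the score reaches 50.
function saturatingScore(value, halfSaturation) {
  if (!(value > 0) || !(halfSaturation > 0)) return 0;
  return 100 * (1 - Math.pow(0.5, value / halfSaturation));
}

// Threshold transform (assumed form): linear credit up to the threshold,
// full credit at or above it, nothing extra beyond it.
function thresholdScore(value, threshold) {
  if (!(threshold > 0)) return 0;
  return 100 * Math.min(1, Math.max(0, value / threshold));
}

console.log(saturatingScore(10, 10), saturatingScore(20, 10));
console.log(thresholdScore(50, 100), thresholdScore(200, 100));
```

Under either form, a country far above the functional level scores about the same as one just above it, which is how the score rewards the mechanism rather than the level of resource behind it.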
## Known construct limitations (in repair)
The first-publication repair is sequenced as PR 0 → PR 1 → PR 3 → PR 2 → PR 4 under the plan above. At the time of writing (PR 0 shipping), the following six construct errors are known and scheduled:
1. **`electricityConsumption` is a wealth proxy, not a resilience signal.** Weight 0.30 on the `energy` dimension; rewards per-capita load rather than grid-integrity capacity. Replaced in PR 1 with `powerLossesPct`, `reserveMarginPct`, and `accessToElectricityPct` (the last moved to the `infrastructure` domain).
2. **Gas and coal penalized as vulnerability even when domestic.** Current `gasShare` / `coalShare` penalties conflate fossil-dominance with fossil-import-dependence. Replaced in PR 1 with a single `importedFossilDependence` composite using World Bank `EG.IMP.CONS.ZS`.
3. **No nuclear credit in `scoreEnergy`.** Nuclear-heavy generation scores no points despite firm low-carbon characteristics. Fixed in PR 1 by collapsing `renewShare` + new `nuclearShare` into a single `lowCarbonGenerationShare` indicator.
4. **Sovereign-wealth buffers invisible to `reserveAdequacy`.** Current dimension only sees central-bank reserves; SWF assets are not counted. Fixed in PR 2 by splitting the dimension into `liquidReserveAdequacy` + `sovereignFiscalBuffer` with a three-component haircut (access × liquidity × transparency) and a saturating transform.
5. **Dead and regional-only signals in the global core score.** `fuelStockDays` (100% imputed globally), `euGasStorageStress` (EU-only), and `currencyExternal` (BIS 64-economy coverage) currently carry material weight despite insufficient coverage for a world ranking. Retired or scoped regional-only in PR 3.
6. **No coverage-based weight cap.** A dimension at 30% observed coverage carries the same weight as one at 95%. Fixed in PR 3 with a CI-enforced rule: no indicator with observed coverage below 70% may exceed 5% nominal weight or 5% effective influence.
Each item maps to an acceptance gate and a spec in the repair plan. Until PR 1 through PR 3 land, published rankings reflect the current construct and should be read in that context.
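
The coverage-based cap in item 6 can be expressed as a simple check. The sketch below is illustrative only: `findCoverageCapViolations`, the row fields, and the sample rows are hypothetical, and it assumes coverage, nominal weight, and effective influence are all expressed as fractions of 1.

```javascript
const MIN_COVERAGE = 0.7; // 70% observed coverage
const MAX_SHARE = 0.05; // 5% nominal weight or effective influence

// Return indicators that have low coverage and still exceed the cap on
// either nominal weight or absolute effective influence.
function findCoverageCapViolations(indicators) {
  return indicators.filter(
    ({ coverage, nominalWeight, effectiveInfluence }) =>
      coverage < MIN_COVERAGE &&
      (nominalWeight > MAX_SHARE || Math.abs(effectiveInfluence) > MAX_SHARE),
  );
}

// Made-up rows for illustration.
const rows = [
  { id: 'a', coverage: 0.3, nominalWeight: 0.08, effectiveInfluence: 0.02 },
  { id: 'b', coverage: 0.95, nominalWeight: 0.2, effectiveInfluence: 0.18 },
  { id: 'c', coverage: 0.6, nominalWeight: 0.03, effectiveInfluence: -0.07 },
  { id: 'd', coverage: 0.65, nominalWeight: 0.05, effectiveInfluence: 0.05 },
];
const violations = findCoverageCapViolations(rows).map((row) => row.id);
console.log(violations);
```

A CI job could fail whenever the returned list is non-empty; rows `a` and `c` violate the rule here, while `d` sits exactly at the cap and passes.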
## In the dashboard
CRI is surfaced across three places in the product, all driven from the same currently-shipping score:

@@ -0,0 +1,686 @@
# Resilience scorer indicator-source manifest (PR 0 scaffold, 2026-04-22).
#
# One entry per sub-indicator used inside a dimension scorer. Each entry
# answers the mechanism test from docs/plans/2026-04-22-001-fix-resilience-
# scorer-structural-bias-plan.md §1.1: what direct shock channel does this
# measure?
#
# Fields:
# indicator — scorer variable name (matches the weighted-blend entry)
# dimension — parent dimension id (matches RESILIENCE_DIMENSION_ORDER)
# domain — parent domain id
# weight — current nominal weight inside the dimension blend
# direction — higher-better | lower-better | composite
# source — authority that publishes the series
# seriesId — canonical series id where applicable (e.g. EG.IMP.CONS.ZS)
# seriesUrl — direct link to source documentation
# coveragePct — observed-data coverage across the 222-country static index (first-pass estimate; authoritative value lives in the matching seeder's seed-meta.coverage field)
# lastObservedYear — most-recent year with global data in the source
# license — reuse license (CC-BY, CC0, OGL, Proprietary-with-fair-use, etc.)
# mechanismTestRationale — one-sentence answer to "what direct shock channel does this measure?"
# constructStatus — observed-mechanism | wealth-proxy | imputed-floor | regional-only | dead-signal
#
# constructStatus is the v3-plan classification:
# observed-mechanism — passes the mechanism test; kept as-is pending goalpost review
# wealth-proxy — fails the mechanism test; slated for removal or threshold-transform
# imputed-floor — data source not wired; producing only the imputed midpoint
# regional-only — data source covers <50% of scorable countries
# dead-signal — saturated or compressed; signal collapsed across the ranking
# ECONOMIC DOMAIN (weight 0.17) -------------------------------------------
- indicator: govRevenuePct
dimension: macroFiscal
domain: economic
weight: 0.50
direction: higher-better
source: IMF
seriesId: GGR_G01_GDP_PT
seriesUrl: https://www.imf.org/external/datamapper/GGR_G01_GDP_PT@FM/
coveragePct: 0.90
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: Government revenue % GDP is the policy-response headroom a state can deploy during a fiscal shock; higher = more ability to absorb.
constructStatus: observed-mechanism
reviewNotes: Goalpost 5-45 is probably too wide at the top; Nordic revenue/GDP ≥ 50% saturates. Review in PR 4 goalpost pass.
- indicator: debtGrowthRate
dimension: macroFiscal
domain: economic
weight: 0.20
direction: lower-better
source: National debt databases (IMF + WB cross-source)
seriesId: TBD
seriesUrl: TBD
coveragePct: 0.80
lastObservedYear: 2024
license: Mixed (per-source)
mechanismTestRationale: Rising debt growth indicates deteriorating fiscal trajectory and reduced capacity to finance a shock response.
constructStatus: observed-mechanism
- indicator: currentAccountPct
dimension: macroFiscal
domain: economic
weight: 0.30
direction: higher-better
source: IMF
seriesId: BCA_NGDPD
seriesUrl: https://www.imf.org/external/datamapper/BCA_NGDPD@WEO/
coveragePct: 0.85
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: Current account surplus indicates external-payments resilience; deficit indicates vulnerability to external-finance shocks.
constructStatus: observed-mechanism
- indicator: fxVolatility
dimension: currencyExternal
domain: economic
weight: 0.60
direction: lower-better
source: BIS Data Portal
seriesId: Broad Effective Exchange Rate
seriesUrl: https://data.bis.org/topics/EER
coveragePct: 0.30 # BIS covers 64 economies
lastObservedYear: 2024
license: CC-BY
mechanismTestRationale: FX volatility measures monetary-shock transmission risk.
constructStatus: regional-only
reviewNotes: BIS EER covers only 64 economies. Replace with FR.INR.RINR (real interest rate) or IMF inflation volatility in PR 3.
- indicator: fxDeviation
dimension: currencyExternal
domain: economic
weight: 0.25
direction: lower-better
source: BIS Data Portal
seriesId: EER deviation from equilibrium
seriesUrl: https://data.bis.org/topics/EER
coveragePct: 0.30
lastObservedYear: 2024
license: CC-BY
mechanismTestRationale: EER deviation from equilibrium proxies mis-aligned exchange rates that create abrupt-correction risk.
constructStatus: regional-only
reviewNotes: Same 64-economy limitation. Retire in PR 3.
- indicator: fxReservesAdequacy
dimension: currencyExternal
domain: economic
weight: 0.15
direction: higher-better
source: World Bank
seriesId: FI.RES.TOTL.MO
seriesUrl: https://data.worldbank.org/indicator/FI.RES.TOTL.MO
coveragePct: 0.85
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Reserves in months of imports directly measures immediate external-finance cushion.
constructStatus: observed-mechanism
reviewNotes: Also enters reserveAdequacy at 1.0 weight. Double-counting risk across dimensions; review in PR 2.
- indicator: sanctionCount
dimension: tradeSanctions
domain: economic
weight: 0.45
direction: lower-better
source: OFAC
seriesId: Consolidated Sanctions List (count per country)
seriesUrl: https://sanctionslist.ofac.treas.gov/
coveragePct: 1.00
lastObservedYear: 2026
license: US Government public domain
mechanismTestRationale: Active sanctions restrict trade/finance channels; higher count = more channels restricted.
constructStatus: observed-mechanism
reviewNotes: OFAC-only. PR 4 adds EU/UK/CN sanctions for directional completeness.
- indicator: tradeRestrictions
dimension: tradeSanctions
domain: economic
weight: 0.15
direction: lower-better
source: WTO
seriesId: Trade Monitoring Database
seriesUrl: https://www.wto.org/english/tratop_e/tpr_e/trade_monitoring_e.htm
coveragePct: 0.75
lastObservedYear: 2025
license: Open
mechanismTestRationale: Active trade restrictions (in-force, weighted 3×) directly measure market-access loss.
constructStatus: observed-mechanism
- indicator: tradeBarriers
dimension: tradeSanctions
domain: economic
weight: 0.15
direction: lower-better
source: WTO
seriesId: Trade Barriers Notifications
seriesUrl: https://tradebarriers.wto.org/
coveragePct: 0.70
lastObservedYear: 2025
license: Open
mechanismTestRationale: Notified trade barriers (not yet in force) indicate near-term market-access risk.
constructStatus: observed-mechanism
- indicator: appliedTariffRate
dimension: tradeSanctions
domain: economic
weight: 0.25
direction: lower-better
source: World Bank / WITS
seriesId: TM.TAX.MRCH.WM.AR.ZS
seriesUrl: https://data.worldbank.org/indicator/TM.TAX.MRCH.WM.AR.ZS
coveragePct: 0.90
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Applied tariff rates measure cost of trade restriction on imports.
constructStatus: observed-mechanism
# INFRASTRUCTURE DOMAIN (weight 0.15) -------------------------------------
- indicator: cyberThreats
dimension: cyberDigital
domain: infrastructure
weight: 0.45
direction: lower-better
source: Cyber threat feeds (mixed Western-origin)
seriesId: severity-weighted count (critical 3×, high 2×, medium 1×, low 0.5×)
seriesUrl: Internal seed
coveragePct: 0.70
lastObservedYear: 2026
license: Proprietary feeds (aggregated)
mechanismTestRationale: Severity-weighted cyber threat count directly measures ongoing cyber-attack pressure on national digital infrastructure.
constructStatus: observed-mechanism
reviewNotes: Western-feed bias; non-English cyber activity under-represented. PR 4 §4.8 tracks this.
- indicator: internetOutages
dimension: cyberDigital
domain: infrastructure
weight: 0.35
direction: lower-better
source: Cloudflare Radar + internal monitoring
seriesId: Outage penalty (total 4×, major 2×, partial 1×)
seriesUrl: https://radar.cloudflare.com/
coveragePct: 0.95
lastObservedYear: 2026
license: CC-BY-4.0
mechanismTestRationale: Internet outages directly measure digital-infrastructure availability under current stress.
constructStatus: observed-mechanism
- indicator: gpsJamming
dimension: cyberDigital
domain: infrastructure
weight: 0.20
direction: lower-better
source: GPSJam
seriesId: Hex penalty (high 3×, medium 1×)
seriesUrl: https://gpsjam.org/
coveragePct: 0.95
lastObservedYear: 2026
license: Open data
mechanismTestRationale: GPS jamming intensity measures electronic-warfare / navigation-disruption exposure.
constructStatus: observed-mechanism
- indicator: logisticsPerformanceIndex
dimension: logisticsSupply
domain: infrastructure
weight: TBD
direction: higher-better
source: World Bank LPI
seriesId: LP.LPI.OVRL.XQ
seriesUrl: https://lpi.worldbank.org/
coveragePct: 0.85
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Logistics Performance Index measures functional capacity to move goods, which is directly stressed during supply-chain disruptions.
constructStatus: observed-mechanism
reviewNotes: Goalpost anchors OECD-centric; PR 4 review.
- indicator: infrastructureSubcomponents
dimension: infrastructure
domain: infrastructure
weight: TBD
direction: higher-better
source: World Bank + WEF Global Competitiveness
seriesId: Composite
seriesUrl: TBD
coveragePct: 0.80
lastObservedYear: 2024
license: Mixed
mechanismTestRationale: Physical infrastructure quality is the baseline capacity for delivering services during normal and crisis periods.
constructStatus: observed-mechanism
# ENERGY DOMAIN (weight 0.11) --------------------------------------------
- indicator: dependency
dimension: energy
domain: energy
weight: 0.25
direction: lower-better
source: IEA (via static seed)
seriesId: Energy import dependency (%)
seriesUrl: https://www.iea.org/data-and-statistics/data-browser
coveragePct: 0.50 # IEA detail covers OECD + major non-OECD
lastObservedYear: 2023
license: Proprietary-with-fair-use
mechanismTestRationale: Share of energy consumption that is imported; direct supply-shock exposure.
constructStatus: observed-mechanism
reviewNotes: PR 1 §3.2 replaces with World Bank EG.IMP.CONS.ZS (better coverage) as part of importedFossilDependence composite.
- indicator: gasShare
dimension: energy
domain: energy
weight: 0.12
direction: lower-better
source: IEA World Energy Balances via static seed
seriesId: Natural gas share of primary energy
seriesUrl: https://www.iea.org/data-and-statistics/data-browser
coveragePct: 0.85
lastObservedYear: 2023
license: Proprietary-with-fair-use
mechanismTestRationale: CURRENT SCORER applies this as a vulnerability (lower-better) but it CONFLATES fossil-dominance with fossil-import-dependence. Domestic gas is a resilience asset, not a vulnerability.
constructStatus: wealth-proxy
reviewNotes: PR 1 §3.2 removes as standalone input; folds into importedFossilDependence under power-system framing.
- indicator: coalShare
dimension: energy
domain: energy
weight: 0.08
direction: lower-better
source: IEA World Energy Balances via static seed
seriesId: Coal share of primary energy
seriesUrl: https://www.iea.org/data-and-statistics/data-browser
coveragePct: 0.85
lastObservedYear: 2023
license: Proprietary-with-fair-use
mechanismTestRationale: Same concern as gasShare — penalty is climate-frame, not resilience-frame. Fails mechanism test under absolute-resilience contract.
constructStatus: wealth-proxy
reviewNotes: PR 1 §3.2 removes.
- indicator: renewShare
dimension: energy
domain: energy
weight: 0.05
direction: higher-better
source: IEA / World Bank
seriesId: EG.ELC.RNEW.ZS
seriesUrl: https://data.worldbank.org/indicator/EG.ELC.RNEW.ZS
coveragePct: 0.85
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Share of electricity from renewables; proxies low-carbon-firm-generation capacity (for hydro/geothermal) and diversity-of-supply (for wind/solar).
constructStatus: observed-mechanism
reviewNotes: PR 1 §3.3 collapses with nuclearShare (currently missing) into one lowCarbonGenerationShare indicator.
- indicator: storageStress
dimension: energy
domain: energy
weight: 0.10
direction: lower-better
source: GIE AGSI+
seriesId: EU gas storage fill % (per country)
seriesUrl: https://agsi.gie.eu/
coveragePct: 0.15 # EU + UK + a handful
lastObservedYear: 2026
license: Open
mechanismTestRationale: EU gas storage fill directly measures winter-heating-shock buffer. European-only platform.
constructStatus: regional-only
reviewNotes: PR 1 §3.5 renames to euGasStorageStress and scopes to EU-only (weight 0 for non-EU).
- indicator: exposedEnergyStress
dimension: energy
domain: energy
weight: 0.10
direction: composite
source: Internal composite (energy-price-stress × import-exposure)
seriesId: Derived
seriesUrl: Internal seed
coveragePct: 0.70
lastObservedYear: 2026
license: Internal
mechanismTestRationale: Combines energy-price shocks with import-exposure to measure price-shock transmission.
constructStatus: observed-mechanism
reviewNotes: PR 1 may simplify given importedFossilDependence covers import-exposure directly.
- indicator: electricityConsumption
dimension: energy
domain: energy
weight: 0.30
direction: higher-better
source: World Bank
seriesId: EG.USE.ELEC.KH.PC
seriesUrl: https://data.worldbank.org/indicator/EG.USE.ELEC.KH.PC
coveragePct: 0.90
lastObservedYear: 2022
license: CC-BY-4.0
mechanismTestRationale: FAILS the mechanism test. Per-capita electricity consumption tracks GDP per capita; it is a level-of-load measure, not a resilience mechanism. IEA energy-security framing treats EFFICIENCY (lower load for same output) as resilience, so this indicator rewards the opposite of what that framing values.
constructStatus: wealth-proxy
reviewNotes: PR 1 §3.1 removes. Replaced with powerLossesPct (EG.ELC.LOSS.ZS), reserveMarginPct (IEA), and accessToElectricityPct (EG.ELC.ACCS.ZS) moved to infrastructure domain.
# SOCIAL-GOVERNANCE DOMAIN (weight 0.19) ---------------------------------
- indicator: wgiComposite
dimension: governanceInstitutional
domain: social-governance
weight: 1.0
direction: higher-better
source: World Bank WGI
seriesId: Voice/Accountability, Political Stability, Government Effectiveness, Regulatory Quality, Rule of Law, Control of Corruption
seriesUrl: https://info.worldbank.org/governance/wgi/
coveragePct: 0.98
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: WGI subscores measure state capacity to design and enforce policy response to shocks. Passes the mechanism test conditionally — the composite is a direct policy-response-capacity signal.
constructStatus: observed-mechanism
reviewNotes: Weights review in PR 4. Individual WGI subscores may need separate weighting vs equal-blend.
- indicator: gpiScore
dimension: socialCohesion
domain: social-governance
weight: 0.40 # approximate
direction: lower-better
source: Institute for Economics and Peace
seriesId: Global Peace Index
seriesUrl: https://www.visionofhumanity.org/
coveragePct: 0.75
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: GPI measures internal conflict, militarization, and external conflict intensity — direct social-cohesion-shock exposure.
constructStatus: observed-mechanism
reviewNotes: Known Western-democracy bias in GPI methodology; PR 4 review.
- indicator: displacementMetric
dimension: socialCohesion
domain: social-governance
weight: 0.30 # approximate
direction: lower-better
source: UNHCR
seriesId: totalDisplaced
seriesUrl: https://data.unhcr.org/
coveragePct: 0.95
lastObservedYear: 2025
license: Open
mechanismTestRationale: Total displaced persons directly measures ongoing forced-migration pressure. BIAS: currently blends origin + host; penalizes Jordan/Turkey/Germany for HOSTING.
constructStatus: wealth-proxy # used here as the generic bias label; the bias is the origin + host blend, not a wealth correlation
reviewNotes: PR 4 §4.2 splits origin (negative signal) from host (mixed signal).
- indicator: unrestMetric
dimension: socialCohesion
domain: social-governance
weight: 0.30 # approximate
direction: lower-better
source: Internal unrest seed (cross-source signals + UCDP)
seriesId: unrestCount + sqrt(fatalities)
seriesUrl: Internal seed
coveragePct: 0.85
lastObservedYear: 2026
license: Mixed
mechanismTestRationale: Active unrest events measure current social-cohesion stress.
constructStatus: observed-mechanism
- indicator: borderSecuritySubs
dimension: borderSecurity
domain: social-governance
weight: TBD
direction: composite
source: Composite (UNHCR displacement + UCDP conflict + governance)
seriesId: Derived
seriesUrl: Internal seed
coveragePct: 0.80
lastObservedYear: 2026
license: Mixed
mechanismTestRationale: Border-security composite captures cross-border shock transmission exposure.
constructStatus: observed-mechanism
reviewNotes: Inherits displacement host-vs-sending bias from socialCohesion. PR 4 fix.
- indicator: rsfPressFreedom
dimension: informationCognitive
domain: social-governance
weight: TBD
direction: higher-better
source: Reporters Sans Frontières
seriesId: Press Freedom Index
seriesUrl: https://rsf.org/en/index
coveragePct: 0.85
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: Press freedom proxies quality of information-shock response and independent verification capacity.
constructStatus: observed-mechanism
- indicator: languageNormalizedSocialVelocity
dimension: informationCognitive
domain: social-governance
weight: TBD
direction: composite
source: Reddit + cross-source + internal language-coverage-weighting
seriesId: Internal
seriesUrl: Internal seed
coveragePct: 0.95
lastObservedYear: 2026
license: Mixed
mechanismTestRationale: Language-normalized social-information velocity measures information-shock propagation speed adjusted for source-density bias.
constructStatus: observed-mechanism
# HEALTH-FOOD DOMAIN (weight 0.13) ---------------------------------------
- indicator: whoHealthExpenditure
dimension: healthPublicService
domain: health-food
weight: TBD
direction: higher-better
source: WHO Global Health Observatory
seriesId: Current health expenditure per capita, PPP
seriesUrl: https://www.who.int/data/gho
coveragePct: 0.95
lastObservedYear: 2022
license: CC-BY-4.0
mechanismTestRationale: Health expenditure per capita proxies health-system capacity. FAILS the strict mechanism test — it measures SPEND, not CAPACITY. Should be replaced with surge-capacity / bed-density / ICU-density threshold signal.
constructStatus: wealth-proxy
reviewNotes: PR 4 §4.9 replacement.
- indicator: ipcPhase
dimension: foodWater
domain: health-food
weight: 0.15
direction: lower-better
source: FAO IPC (Integrated Food Security Phase Classification)
seriesId: IPC Phase (1-5)
seriesUrl: https://www.ipcinfo.org/
coveragePct: 0.40 # IPC covers acutely-affected countries
lastObservedYear: 2025
license: Open
mechanismTestRationale: IPC phase directly measures current food-security-crisis severity.
constructStatus: observed-mechanism
reviewNotes: Coverage is inherently partial — IPC only tracks countries with current/imminent food crises. Imputed to a resilient-default for non-tracked countries.
- indicator: aquastatWaterStress
dimension: foodWater
domain: health-food
weight: 0.25
direction: lower-better
source: FAO AQUASTAT
seriesId: Water stress (withdrawal / renewable resources)
seriesUrl: https://www.fao.org/aquastat/
coveragePct: 0.85
lastObservedYear: 2020
license: Open
mechanismTestRationale: Water stress directly measures water-supply-shock exposure.
constructStatus: observed-mechanism
- indicator: aquastatWaterAvailability
dimension: foodWater
domain: health-food
weight: 0.15
direction: higher-better
source: FAO AQUASTAT
seriesId: Water availability (m³/capita)
seriesUrl: https://www.fao.org/aquastat/
coveragePct: 0.85
lastObservedYear: 2020
license: Open
mechanismTestRationale: Water availability per capita proxies baseline water-security-shock buffer.
constructStatus: observed-mechanism
# RECOVERY DOMAIN (weight 0.25) ------------------------------------------
- indicator: recoveryGovRevenue
dimension: fiscalSpace
domain: recovery
weight: 0.40
direction: higher-better
source: IMF
seriesId: GGR_G01_GDP_PT
seriesUrl: https://www.imf.org/external/datamapper/GGR_G01_GDP_PT@FM/
coveragePct: 0.90
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: Government revenue % GDP for recovery scenarios — policy-response fiscal headroom.
constructStatus: observed-mechanism
reviewNotes: Duplicates macroFiscal.govRevenuePct. PR 4 may de-duplicate.
- indicator: recoveryFiscalBalance
dimension: fiscalSpace
domain: recovery
weight: 0.30
direction: higher-better
source: IMF
seriesId: GGXCNL_G01_GDP_PT
seriesUrl: https://www.imf.org/external/datamapper/GGXCNL_G01_GDP_PT@FM/
coveragePct: 0.85
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: General government net lending/borrowing as % of GDP — direct fiscal-response-capacity signal.
constructStatus: observed-mechanism
- indicator: recoveryDebtToGdp
dimension: fiscalSpace
domain: recovery
weight: 0.30
direction: lower-better
source: IMF
seriesId: GGXWDG_NGDP_PT
seriesUrl: https://www.imf.org/external/datamapper/GGXWDG_NGDP_PT@FM/
coveragePct: 0.90
lastObservedYear: 2024
license: Proprietary-with-fair-use
mechanismTestRationale: General government gross debt to GDP — fiscal-stress cushion.
constructStatus: observed-mechanism
reviewNotes: The linear 0-150 goalpost ignores who holds the debt; Japan at 260% (mostly domestic, yen-denominated) scores 0 despite low real fiscal-stress risk. PR 4 §4.4 adds a holder-composition modifier.
- indicator: recoveryReserveMonths
dimension: reserveAdequacy
domain: recovery
weight: 1.00
direction: higher-better
source: World Bank
seriesId: FI.RES.TOTL.MO
seriesUrl: https://data.worldbank.org/indicator/FI.RES.TOTL.MO
coveragePct: 0.85
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Central-bank reserves in months of imports — immediate external-liquidity cushion.
constructStatus: observed-mechanism
reviewNotes: PR 2 §3.4 renames to liquidReserveAdequacy; new dimension sovereignFiscalBuffer added.
- indicator: recoveryDebtToReserves
dimension: externalDebtCoverage
domain: recovery
weight: 1.00
direction: lower-better
source: World Bank
seriesId: DT.DOD.DSTC.CD / FI.RES.TOTL.CD
seriesUrl: https://data.worldbank.org/indicator/DT.DOD.DSTC.CD
coveragePct: 0.75
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: Short-term external debt to reserves ratio — rollover-shock exposure.
constructStatus: dead-signal
reviewNotes: Saturates at 100 for every country in the 9-country probe (goalpost 0-5 is too generous). PR 3 re-goalpost.
- indicator: recoveryImportHhi
dimension: importConcentration
domain: recovery
weight: 1.00
direction: lower-better
source: UN Comtrade
seriesId: HS2 bilateral Herfindahl-Hirschman Index
seriesUrl: https://comtrade.un.org/
coveragePct: 0.70
lastObservedYear: 2023
license: Open
mechanismTestRationale: Import-partner concentration (HHI) — supplier-shock exposure.
constructStatus: observed-mechanism
reviewNotes: Coverage gap for UAE and small-island states; PR 1+ audit.
- indicator: recoveryWgiContinuity
dimension: stateContinuity
domain: recovery
weight: 0.50
direction: higher-better
source: World Bank WGI
seriesId: Mean of WGI subscores
seriesUrl: https://info.worldbank.org/governance/wgi/
coveragePct: 0.98
lastObservedYear: 2023
license: CC-BY-4.0
mechanismTestRationale: WGI composite as state-continuity proxy — institutional durability through shocks.
constructStatus: observed-mechanism
reviewNotes: Duplicates governanceInstitutional.wgiComposite. PR 4 de-duplicates.
- indicator: recoveryConflictPressure
dimension: stateContinuity
domain: recovery
weight: 0.30
direction: lower-better
source: UCDP
seriesId: Armed conflict events / fatalities
seriesUrl: https://ucdp.uu.se/
coveragePct: 0.95
lastObservedYear: 2026
license: Open
mechanismTestRationale: UCDP conflict intensity — direct state-continuity-shock metric.
constructStatus: observed-mechanism
- indicator: recoveryDisplacementVelocity
dimension: stateContinuity
domain: recovery
weight: 0.20
direction: lower-better
source: UNHCR
seriesId: Displacement as share of population
seriesUrl: https://data.unhcr.org/
coveragePct: 0.95
lastObservedYear: 2025
license: Open
mechanismTestRationale: Displacement velocity — population-scale state-continuity stress.
constructStatus: observed-mechanism
reviewNotes: Inherits host-vs-sending bias. PR 4 §4.2 fix.
- indicator: recoveryFuelStockDays
dimension: fuelStockDays
domain: recovery
weight: 1.00
direction: higher-better
source: IEA / EIA
seriesId: Days of fuel stock cover
seriesUrl: https://www.iea.org/data-and-statistics/data-tools/oil-stocks-of-iea-countries
coveragePct: 0.30 # imputed for every country
lastObservedYear: null
license: Proprietary
mechanismTestRationale: Days of fuel stock for import-shock coverage. IEA rules bind only net importers; net exporters get no observed value.
constructStatus: imputed-floor
reviewNotes: PR 3 §3.5 retires from core score (permanent). Enrichment-only if an IEA/EIA connector is ever wired up.
# PENDING ADDITIONS FOR PR 1+ --------------------------------------------
# PR 1 additions (not yet in the scorer):
# - powerLossesPct → EG.ELC.LOSS.ZS (transmission+distribution losses, lower-better)
# - reserveMarginPct → IEA electricity balance (generation reserve margin, higher-better)
# - accessToElectricityPct → EG.ELC.ACCS.ZS (threshold/saturating, moved to infrastructure domain)
# - importedFossilDependence → EG.IMP.CONS.ZS × fossil-generation-share (Option B power-system framing)
# - lowCarbonGenerationShare → EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS (higher-better)
# PR 2 additions (not yet in the scorer):
# - liquidReserveAdequacy → FI.RES.TOTL.MO (rename of current reserveMonths)
# - sovereignFiscalBuffer → IFSWF + official disclosures × access × liquidity × transparency
# PR 3 replacements (not yet in the scorer):
# - realInterestRate → FR.INR.RINR (replaces currencyExternal for non-BIS countries)
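
The reviewNotes for recoveryDebtToGdp and recoveryDebtToReserves describe two goalpost failure modes: values beyond the worst-case goalpost collapse to 0 (Japan at 260% against 0-150), and values far inside a generous goalpost all land at the top of the scale (0-5 for debt-to-reserves). The sketch below illustrates both, assuming the scorer applies a linear clamp between goalposts; `normalizeLowerBetter` is a hypothetical helper name, not a function taken from the scorer, and the debt-to-reserves ratio is a made-up input.

```typescript
// Hypothetical linear goalpost normalizer for lower-better indicators.
// Assumption: the score is a linear interpolation between `best` and
// `worst`, clamped to the 0-100 range.
function normalizeLowerBetter(value: number, best: number, worst: number): number {
  const raw = ((worst - value) / (worst - best)) * 100;
  return Math.min(100, Math.max(0, raw));
}

// Debt-to-GDP with goalposts 0-150: everything at or above 150 scores 0,
// so 260% is indistinguishable from 150%.
const japanDebtScore = normalizeLowerBetter(260, 0, 150); // 0
const midpointScore = normalizeLowerBetter(75, 0, 150); // 50

// Debt-to-reserves with goalposts 0-5: an illustrative ratio far below 5
// lands close to the top of the scale, leaving little room to rank countries.
const illustrativeRatioScore = normalizeLowerBetter(0.2, 0, 5);

console.log({ japanDebtScore, midpointScore, illustrativeRatioScore });
```

Under this assumption, a holder-composition modifier or a tighter goalpost changes the inputs or bounds while the clamp itself stays the same.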

File diff suppressed because it is too large

@@ -28,6 +28,15 @@ const __dirname = dirname(fileURLToPath(import.meta.url));
const VALIDATION_DIR = join(__dirname, '..', 'docs', 'methodology', 'country-resilience-index', 'validation');
const RESILIENCE_SCORE_CACHE_PREFIX = 'resilience:score:v10:';
// Mirror of _shared.ts#currentCacheFormula. Must stay in lockstep; see
// the same comment in scripts/validate-resilience-correlation.mjs for
// the rationale.
function currentCacheFormulaLocal() {
const combine = (process.env.RESILIENCE_PILLAR_COMBINE_ENABLED ?? 'false').toLowerCase() === 'true';
const v2 = (process.env.RESILIENCE_SCHEMA_V2_ENABLED ?? 'true').toLowerCase() === 'true';
return combine && v2 ? 'pc' : 'd6';
}
const BACKTEST_RESULT_KEY = 'resilience:backtest:outcomes:v1';
const BACKTEST_TTL_SECONDS = 7 * 24 * 60 * 60;
@@ -214,6 +223,8 @@ async function fetchAllResilienceScores(url, token) {
const commands = ALL_COUNTRIES.map((cc) => ['GET', `${RESILIENCE_SCORE_CACHE_PREFIX}${cc}`]);
const batchSize = 50;
const scores = new Map();
const current = currentCacheFormulaLocal();
let staleFormulaSkipped = 0;
for (let i = 0; i < commands.length; i += batchSize) {
const batch = commands.slice(i, i + batchSize);
@@ -225,6 +236,12 @@ async function fetchAllResilienceScores(url, token) {
if (typeof raw !== 'string') continue;
try {
const parsed = JSON.parse(raw);
// Cross-formula gate: mixed-formula cohorts would confound
// the AUC for recovery-prediction models.
if (parsed?._formula !== current) {
staleFormulaSkipped++;
continue;
}
if (parsed?.overallScore != null) {
scores.set(batchCodes[j], parsed.overallScore);
}
@@ -232,6 +249,9 @@ async function fetchAllResilienceScores(url, token) {
}
}
if (staleFormulaSkipped > 0) {
console.warn(`[backtest-resilience-outcomes] skipped ${staleFormulaSkipped} stale-formula score entries (current=${current})`);
}
return scores;
}
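
The cross-formula gate above can be exercised in isolation. The sketch below is a minimal standalone version, assuming the environment flags are passed in as a plain object and the raw cache payloads as a map of JSON strings. The flag names and the `pc` / `d6` tags come from the diff; `formulaFromEnv`, `keepSameFormula`, and the sample payloads are hypothetical and only serve the illustration.

```typescript
type Env = Record<string, string | undefined>;

// Same flag logic as currentCacheFormulaLocal, but reading from an
// injected object so it can be tested without touching process.env.
function formulaFromEnv(env: Env): 'pc' | 'd6' {
  const combine = (env.RESILIENCE_PILLAR_COMBINE_ENABLED ?? 'false').toLowerCase() === 'true';
  const v2 = (env.RESILIENCE_SCHEMA_V2_ENABLED ?? 'true').toLowerCase() === 'true';
  return combine && v2 ? 'pc' : 'd6';
}

interface CacheEntry {
  _formula?: unknown;
  overallScore?: unknown;
}

interface GateResult {
  scores: Map<string, number>;
  staleFormulaSkipped: number;
}

// Keeps only entries whose `_formula` tag matches `current`; counts the rest.
function keepSameFormula(rawByCountry: Record<string, string>, current: string): GateResult {
  const scores = new Map<string, number>();
  let staleFormulaSkipped = 0;
  for (const [countryCode, raw] of Object.entries(rawByCountry)) {
    let parsed: CacheEntry | null = null;
    try {
      parsed = JSON.parse(raw) as CacheEntry | null;
    } catch {
      continue; // malformed payloads are skipped, as in the scripts
    }
    if (parsed?._formula !== current) {
      staleFormulaSkipped++;
      continue;
    }
    const score = parsed?.overallScore;
    if (typeof score === 'number') scores.set(countryCode, score);
  }
  return { scores, staleFormulaSkipped };
}

// Hypothetical payloads: one current-formula entry, one stale, one malformed.
const result = keepSameFormula(
  {
    NO: '{"_formula":"d6","overallScore":80}',
    JP: '{"_formula":"pc","overallScore":70}',
    XX: 'not json',
  },
  formulaFromEnv({}),
);
console.log(result.scores.size, result.staleFormulaSkipped);
```

Injecting the environment keeps the mirror easy to compare against the server-side definition in a unit test, which is one way to catch the two drifting out of lockstep.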

@@ -363,6 +363,15 @@ function median(arr) {
return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}
// Mirror of _shared.ts#currentCacheFormula. Must stay in lockstep; a
// mixed-formula benchmark would produce a meaningless Spearman / Pearson
// against INFORM / HDI / WRI reference indices.
function currentCacheFormulaLocal() {
const combine = (process.env.RESILIENCE_PILLAR_COMBINE_ENABLED ?? 'false').toLowerCase() === 'true';
const v2 = (process.env.RESILIENCE_SCHEMA_V2_ENABLED ?? 'true').toLowerCase() === 'true';
return combine && v2 ? 'pc' : 'd6';
}
async function readWmScoresFromRedis() {
const { url, token } = getRedisCredentials();
const rankingResp = await fetch(`${url}/get/${encodeURIComponent('resilience:ranking:v10')}`, {
@@ -379,8 +388,18 @@ async function readWmScoresFromRedis() {
return new Map();
}
const parsed = JSON.parse(rankingData.result);
+// Cross-formula gate: the ranking payload carries a `_formula` tag
+// written by get-resilience-ranking.ts#stampRankingCacheTag. If the
+// tag disagrees with the current formula (because the flag just
+// flipped and the ranking cron hasn't rebuilt yet), reject the
+// ranking rather than benchmarking against a stale-formula cohort.
+const current = currentCacheFormulaLocal();
+if (parsed && typeof parsed === 'object' && parsed._formula !== current) {
+console.warn(`[benchmark] Ranking _formula=${parsed._formula ?? 'undefined'} does not match current=${current} — skipping (stale-formula cache entry)`);
+return new Map();
+}
// The ranking cache stores a GetResilienceRankingResponse object
-// with { items, greyedOut }, not a bare array.
+// with { items, greyedOut, _formula }, not a bare array.
const ranking = Array.isArray(parsed) ? parsed : (parsed?.items ?? []);
const scores = new Map();
for (const item of ranking) {
@@ -388,7 +407,7 @@ async function readWmScoresFromRedis() {
scores.set(item.countryCode, item.overallScore);
}
}
-console.log(`[benchmark] Read ${scores.size} WM resilience scores from Redis`);
+console.log(`[benchmark] Read ${scores.size} WM resilience scores from Redis (formula=${current})`);
return scores;
}

File diff suppressed because it is too large

@@ -29,6 +29,15 @@ loadEnvFile(import.meta.url);
// Source of truth: server/worldmonitor/resilience/v1/_shared.ts
const RESILIENCE_SCORE_CACHE_PREFIX = 'resilience:score:v10:';
// Mirror of _shared.ts#currentCacheFormula — must stay in lockstep so
// the backtest only ingests same-formula cache entries. A mixed-formula
// cohort would confound the recovery-prediction correlations.
function currentCacheFormulaLocal() {
const combine = (process.env.RESILIENCE_PILLAR_COMBINE_ENABLED ?? 'false').toLowerCase() === 'true';
const v2 = (process.env.RESILIENCE_SCHEMA_V2_ENABLED ?? 'true').toLowerCase() === 'true';
return combine && v2 ? 'pc' : 'd6';
}
const MIN_SCORED_COUNTRIES = 5;
let _scoreAllDimensions = null;
@@ -188,12 +197,27 @@ function pearsonCorrelation(xs, ys) {
async function fetchScoresForCountries(url, token, countryCodes) {
const commands = countryCodes.map((cc) => ['GET', `${RESILIENCE_SCORE_CACHE_PREFIX}${cc}`]);
const results = await redisPipeline(url, token, commands);
+const current = currentCacheFormulaLocal();
+let staleFormulaSkipped = 0;
const scores = new Map();
for (let i = 0; i < countryCodes.length; i++) {
const raw = results[i]?.result;
if (typeof raw !== 'string') continue;
-try { scores.set(countryCodes[i], JSON.parse(raw)); } catch { /* skip */ }
+try {
+const parsed = JSON.parse(raw);
+// Cross-formula gate: only ingest same-formula entries. A
+// mixed-formula cohort would produce a meaningless correlation
+// between baseline resilience and post-shock recovery.
+if (parsed?._formula !== current) {
+staleFormulaSkipped++;
+continue;
+}
+scores.set(countryCodes[i], parsed);
+} catch { /* skip */ }
}
+if (staleFormulaSkipped > 0) {
+console.warn(`[validate-resilience-backtest] skipped ${staleFormulaSkipped} stale-formula entries (current=${current})`);
+}
return scores;
}

@@ -5,6 +5,17 @@ import { loadEnvFile, getRedisCredentials } from './_seed-utils.mjs';
// Source of truth: server/worldmonitor/resilience/v1/_shared.ts → RESILIENCE_SCORE_CACHE_PREFIX
const RESILIENCE_SCORE_CACHE_PREFIX = 'resilience:score:v10:';
// Mirror of server/worldmonitor/resilience/v1/_shared.ts#currentCacheFormula.
// Must stay in lockstep with the server-side definition so this script
// skips cross-formula cache entries for the same reasons the server
// does — correlations benchmarked against a mixed-formula cohort of
// d6 + pc entries would be meaningless.
function currentCacheFormulaLocal() {
const combine = (process.env.RESILIENCE_PILLAR_COMBINE_ENABLED ?? 'false').toLowerCase() === 'true';
const v2 = (process.env.RESILIENCE_SCHEMA_V2_ENABLED ?? 'true').toLowerCase() === 'true';
return combine && v2 ? 'pc' : 'd6';
}
const REFERENCE_INDICES = {
ndgain: {
NO: 0.76, IS: 0.72, NZ: 0.71, DK: 0.74, SE: 0.73, FI: 0.72, CH: 0.73, AU: 0.70,
@@ -79,6 +90,8 @@ function spearmanRho(x, y) {
async function fetchWorldMonitorScores(url, token, countryCodes) {
const commands = countryCodes.map((c) => ['GET', `${RESILIENCE_SCORE_CACHE_PREFIX}${c}`]);
const results = await redisPipeline(url, token, commands);
const current = currentCacheFormulaLocal();
const skipped = { staleFormula: 0, noOverallScore: 0, malformed: 0 };
const scores = new Map();
for (let i = 0; i < countryCodes.length; i++) {
@@ -86,10 +99,27 @@ async function fetchWorldMonitorScores(url, token, countryCodes) {
if (typeof raw !== 'string') continue;
try {
const parsed = JSON.parse(raw);
+// Cross-formula gate: the benchmark/validation scripts run off
+// live cache entries. A mixed-formula cohort (some countries
+// scored under d6, others under pc because their cache entries
+// landed on either side of a flag flip) would produce a
+// meaningless Spearman. Skip stale-formula entries so the
+// correlation runs only against same-formula peers.
+if (parsed?._formula !== current) {
+skipped.staleFormula++;
+continue;
+}
if (typeof parsed?.overallScore === 'number' && parsed.overallScore > 0) {
scores.set(countryCodes[i], parsed.overallScore);
+} else {
+skipped.noOverallScore++;
}
-} catch { /* skip */ }
+} catch {
+skipped.malformed++;
+}
}
+if (skipped.staleFormula > 0) {
+console.warn(`[validate-resilience-correlation] skipped ${skipped.staleFormula} stale-formula entries (current=${current})`);
+}
return scores;
}

@@ -605,7 +605,7 @@ function getLatestDebtEntry(raw: unknown, countryCode: string): NationalDebtEntr
return null;
}
-function countTradeRestrictions(raw: unknown, countryCode: string): number {
+export function countTradeRestrictions(raw: unknown, countryCode: string): number {
const restrictions: TradeRestriction[] = Array.isArray((raw as { restrictions?: unknown[] } | null)?.restrictions)
? ((raw as { restrictions?: TradeRestriction[] }).restrictions ?? [])
: [];
@@ -617,7 +617,7 @@ function countTradeRestrictions(raw: unknown, countryCode: string): number {
}, 0);
}
-function countTradeBarriers(raw: unknown, countryCode: string): number {
+export function countTradeBarriers(raw: unknown, countryCode: string): number {
const barriers: TradeBarrier[] = Array.isArray((raw as { barriers?: unknown[] } | null)?.barriers)
? ((raw as { barriers?: TradeBarrier[] }).barriers ?? [])
: [];
@@ -630,7 +630,7 @@ function isInWtoReporterSet(raw: unknown, countryCode: string): boolean {
return reporters.includes(countryCode);
}
-function summarizeOutages(raw: unknown, countryCode: string): { total: number; major: number; partial: number } {
+export function summarizeOutages(raw: unknown, countryCode: string): { total: number; major: number; partial: number } {
const outages: InternetOutage[] = Array.isArray((raw as { outages?: unknown[] } | null)?.outages)
? ((raw as { outages?: InternetOutage[] }).outages ?? [])
: [];
@@ -648,7 +648,7 @@ function summarizeOutages(raw: unknown, countryCode: string): { total: number; m
}, { total: 0, major: 0, partial: 0 });
}
-function summarizeGps(raw: unknown, countryCode: string): { high: number; medium: number } {
+export function summarizeGps(raw: unknown, countryCode: string): { high: number; medium: number } {
const hexes: GpsJamHex[] = Array.isArray((raw as { hexes?: unknown[] } | null)?.hexes)
? ((raw as { hexes?: GpsJamHex[] }).hexes ?? [])
: [];
@@ -664,7 +664,7 @@ function summarizeGps(raw: unknown, countryCode: string): { high: number; medium
}, { high: 0, medium: 0 });
}
-function summarizeCyber(raw: unknown, countryCode: string): { weightedCount: number } {
+export function summarizeCyber(raw: unknown, countryCode: string): { weightedCount: number } {
const threats: CyberThreat[] = Array.isArray((raw as { threats?: unknown[] } | null)?.threats)
? ((raw as { threats?: CyberThreat[] }).threats ?? [])
: [];
@@ -683,7 +683,7 @@ function summarizeCyber(raw: unknown, countryCode: string): { weightedCount: num
};
}
-function summarizeUnrest(raw: unknown, countryCode: string): { unrestCount: number; fatalities: number } {
+export function summarizeUnrest(raw: unknown, countryCode: string): { unrestCount: number; fatalities: number } {
const events: UnrestEvent[] = Array.isArray((raw as { events?: unknown[] } | null)?.events)
? ((raw as { events?: UnrestEvent[] }).events ?? [])
: [];
@@ -697,7 +697,7 @@ function summarizeUnrest(raw: unknown, countryCode: string): { unrestCount: numb
}, { unrestCount: 0, fatalities: 0 });
}
-function summarizeUcdp(raw: unknown, countryCode: string): { eventCount: number; deaths: number; typeWeight: number } {
+export function summarizeUcdp(raw: unknown, countryCode: string): { eventCount: number; deaths: number; typeWeight: number } {
const events: UcdpEvent[] = Array.isArray((raw as { events?: unknown[] } | null)?.events)
? ((raw as { events?: UcdpEvent[] }).events ?? [])
: [];
@@ -711,20 +711,20 @@ function summarizeUcdp(raw: unknown, countryCode: string): { eventCount: number;
}, { eventCount: 0, deaths: 0, typeWeight: 0 });
}
-function getCountryDisplacement(raw: unknown, countryCode: string): CountryDisplacement | null {
+export function getCountryDisplacement(raw: unknown, countryCode: string): CountryDisplacement | null {
const summary = (raw as { summary?: { countries?: CountryDisplacement[] } } | null)?.summary;
const countries = Array.isArray(summary?.countries) ? summary.countries : [];
return countries.find((entry) => matchesCountryIdentifier(entry.code, countryCode)) ?? null;
}
-function summarizeSocialVelocity(raw: unknown, countryCode: string): number {
+export function summarizeSocialVelocity(raw: unknown, countryCode: string): number {
const posts: SocialVelocityPost[] = Array.isArray((raw as { posts?: unknown[] } | null)?.posts)
? ((raw as { posts?: SocialVelocityPost[] }).posts ?? [])
: [];
return posts.reduce((sum, post) => sum + (matchesCountryText(post.title, countryCode) ? (safeNum(post.velocityScore) ?? 0) : 0), 0);
}
-function getThreatSummaryScore(raw: unknown, countryCode: string): number | null {
+export function getThreatSummaryScore(raw: unknown, countryCode: string): number | null {
if (!raw || typeof raw !== 'object') return null;
const byCountry = (raw as Record<string, unknown>).byCountry ?? raw; // backward-compat: old payload was a flat ISO2 map
const counts = (byCountry as Record<string, Record<string, number>>)?.[countryCode.toUpperCase()];

@@ -24,7 +24,7 @@ import { lazyFetchBilateralHs4 } from './_bilateral-hs4-lazy';
import { ROUTE_IMPACT_KEY } from '../../../_shared/cache-keys';
import { CHOKEPOINT_REGISTRY } from '../../../_shared/chokepoint-registry';
import { BYPASS_CORRIDORS_BY_CHOKEPOINT } from '../../../_shared/bypass-corridors';
-import { RESILIENCE_SCORE_CACHE_PREFIX } from '../../resilience/v1/_shared';
+import { RESILIENCE_SCORE_CACHE_PREFIX, getCurrentCacheFormula } from '../../resilience/v1/_shared';
import COUNTRY_PORT_CLUSTERS from '../../../../scripts/shared/country-port-clusters.json';
const CACHE_TTL_SECONDS = 86400; // 24h
@@ -129,10 +129,20 @@ function emptyResponse(_req: GetRouteImpactRequest, comtradeSource: string): Get
async function readResilienceScore(iso2: string): Promise<number> {
try {
const raw = await getCachedJson(`${RESILIENCE_SCORE_CACHE_PREFIX}${iso2}`, true);
-if (raw && typeof raw === 'object' && 'overallScore' in (raw as object)) {
-return (raw as { overallScore: number }).overallScore;
+if (!raw || typeof raw !== 'object' || !('overallScore' in (raw as object))) {
+return 0;
}
-return 0;
+// Cross-formula gate: score cache entries written under a different
+// formula than the current one are stale and must not be served
+// downstream. Returning 0 here mirrors the not-found case — the
+// caller (computeImpact) treats 0 as "no resilience signal" and
+// renders the lane without a resilience modifier. A fresh
+// per-country rescoring is triggered naturally on the next call
+// to the resilience handler, so the staleness is self-healing.
+const tag = (raw as { _formula?: unknown })._formula;
+const current = getCurrentCacheFormula();
+if (tag !== current) return 0;
+return (raw as { overallScore: number }).overallScore;
} catch {
return 0;
}

@@ -0,0 +1,133 @@
// Cohort definitions for the resilience-scorer fairness audit.
// Referenced by scripts/compare-resilience-current-vs-proposed.mjs and
// tests/resilience-cohort-config.test.mts. See
// docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md
// §5 (PR 0) and §7 for the role these cohorts play in the acceptance gates.
//
// Membership is curated, not live-derived. Each cohort lists every country
// that clearly falls into the category under widely-accepted definitions
// (IEA + WB for exporters/importers, IAEA for nuclear-heavy, etc.). The
// median-shift gate per cohort (§6 gate 6) is computed from these lists.
//
// Borderline cases are deliberately excluded: if a country only fits a
// cohort in some years, we leave it out so the cohort median stays a
// stable reference across PRs.
export interface ResilienceCohort {
/** Unique id used in reports and commit messages. */
id: string;
/** Human-readable cohort name. */
label: string;
/** One-sentence definition citing the objective criterion used. */
definition: string;
/** Source or authority that grounds the definition. */
source: string;
/** ISO-3166 alpha-2 country codes in the cohort. */
countryCodes: readonly string[];
}
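
The header comment says the per-cohort median-shift gate is computed from these lists. A minimal sketch of that computation follows, assuming current and proposed scores arrive as `Map<countryCode, score>` and that countries missing from either map are left out of the median. `CohortLike`, `medianOf`, and `cohortMedianShift` are hypothetical names; the comparison script's actual output shape may differ.

```typescript
// Structural subset of ResilienceCohort, redeclared so the sketch is self-contained.
interface CohortLike {
  id: string;
  countryCodes: readonly string[];
}

// Median of a list, or null when the list is empty.
function medianOf(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Proposed-minus-current median over cohort members scored in BOTH maps.
// Assumption: partial coverage is handled by dropping unscored members.
function cohortMedianShift(
  cohort: CohortLike,
  current: Map<string, number>,
  proposed: Map<string, number>,
): number | null {
  const scored = cohort.countryCodes.filter((cc) => current.has(cc) && proposed.has(cc));
  const before = medianOf(scored.map((cc) => current.get(cc) as number));
  const after = medianOf(scored.map((cc) => proposed.get(cc) as number));
  return before === null || after === null ? null : after - before;
}

// Illustrative scores only; AA/BB/CC are placeholder codes, not real data.
const demoCohort: CohortLike = { id: 'demo', countryCodes: ['AA', 'BB', 'CC', 'DD'] };
const demoCurrent = new Map([['AA', 50], ['BB', 60], ['CC', 70]]);
const demoProposed = new Map([['AA', 55], ['BB', 66], ['CC', 70]]);
console.log(cohortMedianShift(demoCohort, demoCurrent, demoProposed)); // 6
```

Returning `null` for a cohort with no scored members keeps a thin-coverage cohort, such as the small-island list, from silently reporting a shift of 0.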
export const RESILIENCE_COHORTS: readonly ResilienceCohort[] = [
{
id: 'net-fuel-exporters',
label: 'Net fuel exporters',
definition:
'Countries whose net petroleum + gas exports exceed domestic consumption on a 5-year rolling average. These countries are the archetype the current scorer under-scores via gas/coal penalties and the net-import-biased fuel-stock metric.',
source: 'IEA World Energy Balances + UN Comtrade HS27 cross-check. List curated 2026-04-22.',
countryCodes: [
'AE', 'SA', 'QA', 'KW', 'OM', 'BH', // Gulf
'NO', 'CA', // Wealthy democracies
'RU', 'IR', 'IQ', // Major non-aligned
'KZ', 'AZ', 'TM', // Post-Soviet
'VE', 'CO', 'EC', // South America
'NG', 'DZ', 'LY', 'AO', // Africa
'BN', // Southeast Asia
],
},
{
id: 'net-energy-importers-oecd',
label: 'Net energy importers (OECD core)',
definition:
'OECD countries with EG.IMP.CONS.ZS > 20% (net energy imports as share of primary energy use). Validates that exporter-aimed fixes do not accidentally uplift these as a side effect.',
source: 'World Bank WDI EG.IMP.CONS.ZS, 2022 values. Curated 2026-04-22.',
countryCodes: [
'DE', 'FR', 'IT', 'ES', 'PT', // EU core + periphery
'BE', 'NL', 'AT', 'CH', // Continental Europe (CH is non-EU)
'JP', 'KR', // East Asia
'GB', 'IE', // UK + Ireland
'GR', 'HU', 'CZ', 'SK', // Southern + Central EU
'TR', // Bridge economy
],
},
{
id: 'nuclear-heavy-generation',
label: 'Nuclear-heavy generation mix',
definition:
'Countries where nuclear supplied ≥ 15% of electricity generation in the most recent reporting year. Validates that the new lowCarbonGenerationShare indicator correctly rewards firm low-carbon generation (PR 1 §3.3).',
source: 'IAEA PRIS (Power Reactor Information System) + World Bank EG.ELC.NUCL.ZS. Curated 2026-04-22.',
countryCodes: [
'FR', 'SK', 'UA', 'HU', 'BE', 'BG', 'SI', // Central/Eastern Europe heavy adopters
'CZ', 'FI', 'SE', 'CH', // Western/Northern EU adopters
'KR', 'US', // East Asia + North America
'AE', // UAE (Barakah)
'RU', // Russia
'AR', // Argentina (small but material share)
],
},
{
id: 'coal-heavy-domestic',
label: 'Coal-heavy domestic producers',
definition:
'Countries where coal supplied ≥ 30% of electricity generation AND the coal is predominantly domestic (not imported). Validates that the new importedFossilDependence composite correctly distinguishes domestic from imported coal exposure.',
source: 'World Bank EG.ELC.COAL.ZS + WITS/Comtrade domestic-vs-imports cross-check. Curated 2026-04-22.',
countryCodes: [
'IN', 'CN', 'ID', // Asia heavyweights
'ZA', 'BW', // Southern Africa
'AU', 'US', // OECD domestic producers
'PL', 'RS', 'BA', 'KZ', // Central/Eastern Europe + post-Soviet
'MN', // Mongolia
],
},
{
id: 'small-island-importers',
label: 'Small-island fuel importers',
definition:
'Small-island developing states that import essentially all fossil fuels. Data coverage is thin for this cohort; catches fixes that require data they structurally lack.',
source: 'UN SIDS list, subset with > 100k population. Curated 2026-04-22.',
countryCodes: [
'FJ', 'WS', 'TO', 'VU', 'SB', 'PG', 'KI', 'TV', // Pacific
'MV', // Indian Ocean
'MU', 'SC', 'CV', // Africa-adjacent
'BB', 'TT', 'JM', 'LC', 'VC', 'GD', // Caribbean
],
},
{
id: 'fragile-states',
label: 'Fragile states (low-band anchors)',
definition:
'Countries consistently in the bottom band of multiple composite indices (Fund for Peace FSI top-10 fragile 2019-2023, UN LDC, UCDP conflict-affected). Release-gate anchors must continue to score these at or below the LOW_BAND_CEILING through every PR.',
source: 'Intersection of Fund for Peace FSI, UN LDC list, and UCDP conflict-event database. Curated 2026-04-22.',
countryCodes: [
'YE', 'SO', 'SD', 'SS', // Horn + NE Africa
'CF', 'TD', 'NE', 'ML', 'BF', 'BI', // Sahel + Great Lakes
'CD', 'ET', // Central/East Africa
'HT', // Caribbean
'SY', 'IQ', 'AF', // MENA
'MM', 'LB', // Asia + Levant
],
},
] as const;
export function cohortMembershipFor(countryCode: string): readonly string[] {
const cc = countryCode.trim().toUpperCase();
return RESILIENCE_COHORTS
.filter((cohort) => cohort.countryCodes.includes(cc))
.map((cohort) => cohort.id);
}
export function unionMembership(): readonly string[] {
const seen = new Set<string>();
for (const cohort of RESILIENCE_COHORTS) {
for (const cc of cohort.countryCodes) seen.add(cc);
}
return [...seen];
}
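A minimal sketch of how a per-cohort median summary could be derived from this configuration, assuming scores arrive as a plain ISO-2 to score map. `CohortLike`, `medianOf`, `cohortMedians`, and the sample scores are hypothetical and are not taken from the comparison script.

```typescript
// Hypothetical minimal shape: mirrors only the two fields this sketch reads.
interface CohortLike {
  id: string;
  countryCodes: readonly string[];
}

// Median of a numeric list; returns null for an empty list.
function medianOf(values: readonly number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Per-cohort median over the members that have a score. Members without a
// score are skipped and listed in `missing` so thin coverage stays visible.
function cohortMedians(
  cohorts: readonly CohortLike[],
  scores: Readonly<Record<string, number>>,
): { id: string; median: number | null; scored: number; missing: string[] }[] {
  return cohorts.map((cohort) => {
    const present = cohort.countryCodes.filter((cc) => cc in scores);
    const missing = cohort.countryCodes.filter((cc) => !(cc in scores));
    return {
      id: cohort.id,
      median: medianOf(present.map((cc) => scores[cc])),
      scored: present.length,
      missing,
    };
  });
}

// Illustrative inputs only: these are not real resilience scores or cohorts.
const sampleCohorts: CohortLike[] = [
  { id: 'demo-a', countryCodes: ['NO', 'CA', 'AE'] },
  { id: 'demo-b', countryCodes: ['FJ', 'TV'] },
];
const sampleScores: Record<string, number> = { NO: 80, CA: 70, AE: 60, FJ: 40 };
const sampleSummary = cohortMedians(sampleCohorts, sampleScores);
```

Reporting `missing` alongside the median is a design choice of this sketch: it lets a reader see when a cohort median rests on very few scored members, which matters for the thinly covered small-island cohort.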

@@ -0,0 +1,92 @@
// Matched-pair sanity panel for the resilience-scorer fairness audit.
// Referenced by scripts/compare-resilience-current-vs-proposed.mjs and
// tests/resilience-cohort-config.test.mts. See
// docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md
// §7 for the role these pairs play in the acceptance gates.
//
// Each pair tests a specific scorer-behavior axis under pre-chosen,
// publicly-defensible directional expectations. Acceptance gate #7
// enforces that each pair's within-pair-gap sign stays as documented
// across every scorer-changing PR. A pair flipping direction stops the
// PR and forces the construct change to be re-examined.
export interface MatchedPair {
/** Unique id used in reports. */
id: string;
/** ISO-3166 alpha-2 for the country expected to score higher. */
higherExpected: string;
/** ISO-3166 alpha-2 for the country expected to score lower. */
lowerExpected: string;
/** Scorer-behavior axis the pair tests. */
axis: string;
/**
* One-paragraph rationale. Documents both why the direction is
* defensible today AND the conditions under which it could flip.
* The rationale should be neutral — not a score target, but a
* statement about the underlying resilience mechanism.
*/
rationale: string;
/**
* Minimum gap (higher - lower) required. If the gap shrinks below this
* after a PR's change, the sanity gate flags it as a near-flip even
* though the sign hasn't changed. Default 3 points.
*/
minGap?: number;
}
export const MATCHED_PAIRS: readonly MatchedPair[] = [
{
id: 'fr-vs-de',
higherExpected: 'FR',
lowerExpected: 'DE',
axis: 'Nuclear-heavy vs non-nuclear OECD importers',
rationale:
'France (~65% nuclear) has firm low-carbon electricity generation that Germany lacks post-phase-out; both are net energy importers but France\'s shock-absorption capacity via generation-mix independence is materially higher. A scorer that loses this gap under PR 1 has mis-weighted generation-mix vs other infrastructure signals. Germany\'s stronger fiscal/export sector does not close the gap in the current scorer; it shouldn\'t close it under PR 1 either.',
minGap: 3,
},
{
id: 'no-vs-ca',
higherExpected: 'NO',
lowerExpected: 'CA',
axis: 'SWF-fueled fossil exporter vs non-SWF fossil exporter',
rationale:
'Norway and Canada share the net-fuel-exporter + OECD + good-governance profile. Norway\'s sovereign-wealth buffer (GPFG, $1.6T) produces a materially larger shock-absorption cushion that Canada does not have. A scorer that loses this gap under PR 2 indicates the sovereignFiscalBuffer dimension is under-weighted OR the transparency/access/liquidity haircuts are over-penalizing Norway\'s fiscal-rule-bound withdrawals.',
minGap: 3,
},
{
id: 'uae-vs-bh',
higherExpected: 'AE',
lowerExpected: 'BH',
axis: 'Gulf with large SWF scale vs small-scale Gulf',
rationale:
'UAE\'s SWF scale (ADIA + Mubadala + ICD ≈ $1.7T for a population of ~10M) is roughly an order of magnitude higher per capita (~$170k vs ~$13k) than Bahrain\'s (Mumtalakat ≈ $20B for ~1.5M). UAE infrastructure and recovery-domain indicators dominate. A scorer that shows AE ≈ BH after PR 1+PR 2 is mis-scaling the SWF haircut transform.',
minGap: 5,
},
{
id: 'jp-vs-kr',
higherExpected: 'JP',
lowerExpected: 'KR',
axis: 'Nuclear-adopters with different post-Fukushima trajectories',
rationale:
'Japan is a more established, more governance-tested nuclear adopter with deeper bureaucratic institutions and slightly stronger liquid-reserve cushion; South Korea is more dynamic but has higher concentration in semiconductor exports and lower SWF adequacy. The pair is intentionally narrow — within ~5 points expected — because both are strong OECD Asian economies. A wide gap or a direction flip under any PR indicates the scorer is over-reacting to governance-style differences or geopolitical-volatility proxies.',
minGap: 1,
},
{
id: 'in-vs-za',
higherExpected: 'IN',
lowerExpected: 'ZA',
axis: 'Coal-heavy domestic producers',
rationale:
'India and South Africa are both coal-heavy domestic producers with weak governance relative to OECD peers. India has materially higher macro-fiscal resilience (larger reserves, larger economy, more diversified export base, growing nuclear share) than South Africa (load-shedding crisis, weaker fiscal space). A scorer that loses this gap after PR 1 indicates the importedFossilDependence composite is over-crediting South Africa for its domestic coal without weighting its power-system-reliability collapse.',
minGap: 3,
},
{
id: 'sg-vs-ch',
higherExpected: 'SG',
lowerExpected: 'CH',
axis: 'Small high-infrastructure economies (SWF scale vs neutrality premium)',
rationale:
'Both are small, wealthy, governance-strong, high-infrastructure economies. Singapore\'s combined SWF (GIC + Temasek ≈ $1T) is materially larger per capita than Switzerland\'s SNB-held reserves despite similar GDP per capita. Singapore also has more explicit reserve-for-crisis access rules. Expect SG > CH by a small but real margin after PR 2. A wide gap would indicate over-crediting the SWF transform; a flipped direction would indicate the liquidReserveAdequacy dimension is picking up Switzerland\'s SNB strength disproportionately.',
minGap: 1,
},
] as const;
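A minimal sketch of how the sign and `minGap` checks described in the interface comments could be evaluated for one pair. `PairLike`, `PairVerdict`, and `evaluatePair` are hypothetical names, and treating an exact tie as a flip is an assumption of this sketch; the comparison script may decide that differently.

```typescript
// Hypothetical minimal shape: only the fields the check reads.
interface PairLike {
  id: string;
  higherExpected: string;
  lowerExpected: string;
  minGap?: number;
}

type PairVerdict = 'ok' | 'near-flip' | 'flipped' | 'missing-score';

// Default taken from the MatchedPair.minGap doc comment (3 points).
const DEFAULT_MIN_GAP = 3;

function evaluatePair(
  pair: PairLike,
  scores: Readonly<Record<string, number | undefined>>,
): { id: string; gap: number | null; verdict: PairVerdict } {
  const hi = scores[pair.higherExpected];
  const lo = scores[pair.lowerExpected];
  if (hi === undefined || lo === undefined) {
    return { id: pair.id, gap: null, verdict: 'missing-score' };
  }
  const gap = hi - lo;
  const minGap = pair.minGap ?? DEFAULT_MIN_GAP;
  // Sign check first. ASSUMPTION: a tie (gap === 0) counts as a flip.
  if (gap <= 0) return { id: pair.id, gap, verdict: 'flipped' };
  // Sign held, but the cushion shrank below the documented minimum.
  if (gap < minGap) return { id: pair.id, gap, verdict: 'near-flip' };
  return { id: pair.id, gap, verdict: 'ok' };
}

// Illustrative scores only: these are not real resilience scores.
const demoPair: PairLike = { id: 'fr-vs-de', higherExpected: 'FR', lowerExpected: 'DE', minGap: 3 };
const demoVerdict = evaluatePair(demoPair, { FR: 72, DE: 70 });
```

With the illustrative scores above the gap is 2, which is positive but below `minGap`, so the sketch reports a near-flip instead of a pass.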

@@ -0,0 +1,124 @@
// Contract test for the baseline-snapshot selection logic used by
// scripts/compare-resilience-current-vs-proposed.mjs. The selector is
// what drives acceptance gates 2 / 6 / 7 (matched-pair, cohort, max
// country drift) for every scorer PR in the resilience repair plan.
// A plain filename sort breaks on two axes:
// 1. `pre-repair` sorts after `post-*` lexically (`pr...` → 'r' > 'o'),
// so the pre-repair freeze would keep winning forever.
// 2. `post-pr9` sorts after `post-pr10` lexically, so PR-10 would
// lose to PR-9.
// These tests pin the parsed ordering so neither failure mode silently
// regresses.
import test from 'node:test';
import assert from 'node:assert/strict';
const mod = await import('../scripts/compare-resilience-current-vs-proposed.mjs');
const { parseBaselineSnapshotMeta } = mod;
function orderedFilenames(filenames) {
return filenames
.map(parseBaselineSnapshotMeta)
.filter((m) => m != null)
.sort((a, b) => {
if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
return b.date.localeCompare(a.date);
})
.map((m) => m.filename);
}
test('parseBaselineSnapshotMeta: pre-repair filename is recognised', () => {
const meta = parseBaselineSnapshotMeta('resilience-ranking-live-pre-repair-2026-04-22.json');
assert.ok(meta);
assert.equal(meta.kind, 'pre-repair');
assert.equal(meta.kindRank, 0);
assert.equal(meta.prNumber, -1);
assert.equal(meta.date, '2026-04-22');
});
test('parseBaselineSnapshotMeta: post-pr<N> filename parses prNumber numerically', () => {
const meta = parseBaselineSnapshotMeta('resilience-ranking-live-post-pr10-2026-05-01.json');
assert.ok(meta);
assert.equal(meta.kind, 'post');
assert.equal(meta.kindRank, 1);
assert.equal(meta.prNumber, 10);
assert.equal(meta.date, '2026-05-01');
assert.equal(meta.tag, 'pr10');
});
test('parseBaselineSnapshotMeta: post-<freeform-tag> falls back to prNumber 0', () => {
const meta = parseBaselineSnapshotMeta('resilience-ranking-live-post-handcal-2026-06-01.json');
assert.ok(meta);
assert.equal(meta.kind, 'post');
assert.equal(meta.prNumber, 0);
assert.equal(meta.tag, 'handcal');
});
test('parseBaselineSnapshotMeta: unrelated filenames return null', () => {
assert.equal(parseBaselineSnapshotMeta('resilience-ranking-2026-04-21.json'), null);
assert.equal(parseBaselineSnapshotMeta('resilience-ranking-pillar-combined-projected-2026-04-21.json'), null);
assert.equal(parseBaselineSnapshotMeta('README.md'), null);
});
test('selection ordering: post always beats pre-repair regardless of date', () => {
const ordered = orderedFilenames([
'resilience-ranking-live-pre-repair-2026-06-01.json',
'resilience-ranking-live-post-pr1-2026-05-01.json',
]);
assert.deepEqual(ordered, [
'resilience-ranking-live-post-pr1-2026-05-01.json',
'resilience-ranking-live-pre-repair-2026-06-01.json',
]);
});
test('selection ordering: pr10 beats pr9 (numeric, not lexical)', () => {
const ordered = orderedFilenames([
'resilience-ranking-live-post-pr9-2026-05-15.json',
'resilience-ranking-live-post-pr10-2026-06-01.json',
'resilience-ranking-live-post-pr2-2026-05-01.json',
]);
assert.deepEqual(ordered, [
'resilience-ranking-live-post-pr10-2026-06-01.json',
'resilience-ranking-live-post-pr9-2026-05-15.json',
'resilience-ranking-live-post-pr2-2026-05-01.json',
]);
});
test('selection ordering: realistic PR-0..PR-4 ladder picks the latest PR', () => {
const ordered = orderedFilenames([
'resilience-ranking-live-pre-repair-2026-04-22.json',
'resilience-ranking-live-post-pr1-2026-05-10.json',
'resilience-ranking-live-post-pr3-2026-06-02.json',
'resilience-ranking-live-post-pr2-2026-05-25.json',
'resilience-ranking-live-post-pr4-2026-06-18.json',
]);
assert.equal(ordered[0], 'resilience-ranking-live-post-pr4-2026-06-18.json');
assert.equal(ordered.at(-1), 'resilience-ranking-live-pre-repair-2026-04-22.json');
});
test('selection ordering: same pr number, later date wins', () => {
// Edge case: a PR re-snapshotted after a hotfix. The later capture
// should win so "immediate prior" remains the most recent observation
// of that PR's landed state.
const ordered = orderedFilenames([
'resilience-ranking-live-post-pr2-2026-05-25.json',
'resilience-ranking-live-post-pr2-2026-05-27.json',
]);
assert.equal(ordered[0], 'resilience-ranking-live-post-pr2-2026-05-27.json');
});
test('selection ordering: unlabeled post tag sorts between pre-repair and pr1', () => {
// Guards against a future misnamed snapshot sneaking in and either
// beating a numbered PR or losing to the original pre-repair.
const ordered = orderedFilenames([
'resilience-ranking-live-pre-repair-2026-04-22.json',
'resilience-ranking-live-post-handcal-2026-05-02.json',
'resilience-ranking-live-post-pr1-2026-05-10.json',
]);
assert.deepEqual(ordered, [
'resilience-ranking-live-post-pr1-2026-05-10.json',
'resilience-ranking-live-post-handcal-2026-05-02.json',
'resilience-ranking-live-pre-repair-2026-04-22.json',
]);
});
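A minimal sketch of one parser that would satisfy the expectations pinned above. It is inferred from the test fixtures only and is not the script's actual implementation; `SnapshotMeta`, `SNAPSHOT_RE`, `parseSnapshotMetaSketch`, `newestFirst`, and `pickBaseline` are hypothetical names, and the filename grammar in the regex is an assumption.

```typescript
interface SnapshotMeta {
  filename: string;
  kind: 'pre-repair' | 'post';
  kindRank: number;
  prNumber: number;
  date: string;
  tag?: string;
}

// ASSUMED filename grammar, inferred from the fixtures in the tests:
//   resilience-ranking-live-pre-repair-YYYY-MM-DD.json
//   resilience-ranking-live-post-<tag>-YYYY-MM-DD.json
const SNAPSHOT_RE =
  /^resilience-ranking-live-(pre-repair|post-([a-z0-9]+))-(\d{4}-\d{2}-\d{2})\.json$/;

function parseSnapshotMetaSketch(filename: string): SnapshotMeta | null {
  const m = SNAPSHOT_RE.exec(filename);
  if (!m) return null;
  const kindPart = m[1];
  const tag = m[2] ?? '';
  const date = m[3] ?? '';
  if (kindPart === 'pre-repair') {
    return { filename, kind: 'pre-repair', kindRank: 0, prNumber: -1, date };
  }
  // `pr<N>` tags are parsed numerically so pr10 outranks pr9;
  // any other tag falls back to prNumber 0.
  const pr = /^pr(\d+)$/.exec(tag);
  return { filename, kind: 'post', kindRank: 1, prNumber: pr ? Number(pr[1]) : 0, date, tag };
}

// Same ordering as orderedFilenames in the tests: kind, then PR number, then date.
function newestFirst(a: SnapshotMeta, b: SnapshotMeta): number {
  if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
  if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
  return b.date.localeCompare(a.date);
}

// Returns the filename that would be selected as the baseline, or null.
function pickBaseline(filenames: readonly string[]): string | null {
  const metas = filenames
    .map((f) => parseSnapshotMetaSketch(f))
    .filter((m): m is SnapshotMeta => m !== null)
    .sort(newestFirst);
  return metas.length > 0 ? metas[0].filename : null;
}
```

Comparing dates with `localeCompare` works here only because the assumed date format is zero-padded ISO `YYYY-MM-DD`, where lexical and chronological order agree.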

@@ -0,0 +1,124 @@
// Validates the cohort and matched-pair configuration used by the PR 0
// diagnostic-freeze harness. These configs are load-bearing for the
// fairness audit in docs/plans/2026-04-22-001-fix-resilience-scorer-
// structural-bias-plan.md §7 — a silent regression in them would
// corrupt the acceptance-gate evidence for every subsequent scorer PR.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { RESILIENCE_COHORTS, unionMembership } from './helpers/resilience-cohorts.mts';
import { MATCHED_PAIRS } from './helpers/resilience-matched-pairs.mts';
const ISO2_RE = /^[A-Z]{2}$/;
describe('resilience cohort configuration', () => {
it('every cohort has at least 3 members', () => {
for (const cohort of RESILIENCE_COHORTS) {
assert.ok(
cohort.countryCodes.length >= 3,
`cohort ${cohort.id} has ${cohort.countryCodes.length} members; medians are unreliable below 3`,
);
}
});
it('every cohort country code is a valid ISO-3166 alpha-2', () => {
for (const cohort of RESILIENCE_COHORTS) {
for (const cc of cohort.countryCodes) {
assert.match(cc, ISO2_RE, `cohort ${cohort.id} has non-ISO2 code "${cc}"`);
}
}
});
it('no cohort has duplicate members within itself', () => {
for (const cohort of RESILIENCE_COHORTS) {
const unique = new Set(cohort.countryCodes);
assert.equal(
unique.size,
cohort.countryCodes.length,
`cohort ${cohort.id} has duplicate members: ${cohort.countryCodes.length - unique.size} duplicates`,
);
}
});
it('every cohort has a documented definition and source', () => {
for (const cohort of RESILIENCE_COHORTS) {
assert.ok(cohort.definition.length > 20, `cohort ${cohort.id} definition too short`);
assert.ok(cohort.source.length > 10, `cohort ${cohort.id} source citation too short`);
assert.ok(cohort.label.length > 3, `cohort ${cohort.id} label too short`);
}
});
it('cohort union covers at least 70 unique countries', () => {
// PR 0 §7: the union of cohort membership must span a meaningful
// slice of the ranking. 70 countries is roughly a third of the
// scorable set — sufficient for cohort-median gates to distinguish
// construct-change effects from noise.
const union = unionMembership();
assert.ok(
union.length >= 70,
`cohort union has ${union.length} unique countries; expected ≥ 70 for meaningful fairness audit`,
);
});
it('cohort ids are unique', () => {
const ids = RESILIENCE_COHORTS.map((c) => c.id);
const unique = new Set(ids);
assert.equal(unique.size, ids.length, 'cohort ids must be unique');
});
});
describe('resilience matched-pair configuration', () => {
it('every matched pair references two distinct valid ISO-2 codes', () => {
for (const pair of MATCHED_PAIRS) {
assert.match(pair.higherExpected, ISO2_RE, `pair ${pair.id} higherExpected`);
assert.match(pair.lowerExpected, ISO2_RE, `pair ${pair.id} lowerExpected`);
assert.notEqual(
pair.higherExpected,
pair.lowerExpected,
`pair ${pair.id} has higher === lower (${pair.higherExpected})`,
);
}
});
it('every matched pair has a documented axis + rationale', () => {
for (const pair of MATCHED_PAIRS) {
assert.ok(pair.axis.length > 10, `pair ${pair.id} axis too short`);
// Rationale must be substantive — pins the expected-direction
// defensibility so a reviewer can challenge the pair on its
// merits rather than guessing at intent.
assert.ok(pair.rationale.length > 100, `pair ${pair.id} rationale too short (${pair.rationale.length} chars)`);
}
});
it('every matched pair has a non-negative minimum gap', () => {
for (const pair of MATCHED_PAIRS) {
const minGap = pair.minGap ?? 3;
assert.ok(
minGap >= 0,
`pair ${pair.id} minGap=${minGap} must be ≥ 0`,
);
// Guard against an accidentally-enormous minGap that would make
// the gate trivially fail — no pair should need more than a
// 10-point cushion.
assert.ok(
minGap <= 10,
`pair ${pair.id} minGap=${minGap} suspiciously large; pairs with gaps > 10 are probably not valid sanity-check peers`,
);
}
});
it('pair ids are unique', () => {
const ids = MATCHED_PAIRS.map((p) => p.id);
const unique = new Set(ids);
assert.equal(unique.size, ids.length, 'matched-pair ids must be unique');
});
it('at least 4 pairs are defined to exercise the fairness audit', () => {
// Acceptance gate #7 in the plan requires the matched-pair sanity
// panel to be exercised on every scorer-changing PR. With too few
// pairs, the panel provides insufficient coverage across scorer
// behavior axes.
assert.ok(MATCHED_PAIRS.length >= 4, `expected ≥ 4 matched pairs, got ${MATCHED_PAIRS.length}`);
});
});

@@ -0,0 +1,259 @@
// Monotonicity-test harness. Pins the direction of movement for the
// highest-leverage indicators so PR 1 + PR 2 cannot accidentally flip
// a sign silently. See
// docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md
// §5 (PR 0 deliverable) and §6 (acceptance gate 8).
//
// Each test builds two synthetic `ResilienceSeedReader` fixtures that
// differ only in the target indicator's value and asserts the dimension
// score moves in the documented direction.
//
// Scope (minimum viable, expanded in PR 0.5 follow-ups):
// - scoreEnergy: dependency, gasShare, coalShare, renewShare, electricityConsumption
// (all five direction claims the current scorer makes — PR 1 overturns three of them)
// - scoreReserveAdequacy: reserveMonths
// - scoreFiscalSpace: govRevenuePct, fiscalBalancePct, debtToGdpPct
// - scoreExternalDebtCoverage: debtToReservesRatio
// - scoreImportConcentration: hhi
// - scoreFoodWater: peopleInCrisis, phase
// - scoreGovernanceInstitutional: WGI mean
//
// 14 indicators × 1 direction check each = 14 assertions. The harness
// is written as a table so PR 1 can add/remove rows without touching
// test logic.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import {
scoreEnergy,
scoreReserveAdequacy,
scoreFiscalSpace,
scoreExternalDebtCoverage,
scoreImportConcentration,
scoreFoodWater,
scoreGovernanceInstitutional,
type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';
const TEST_ISO2 = 'XX';
function makeStaticReader(staticRecord: unknown, overrides: Record<string, unknown> = {}): ResilienceSeedReader {
return async (key: string) => {
if (key === `resilience:static:${TEST_ISO2}`) return staticRecord;
if (key in overrides) return overrides[key];
return null;
};
}
function makeRecoveryReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
return async (key: string) => keyValueMap[key] ?? null;
}
describe('resilience dimension monotonicity — scoreReserveAdequacy', () => {
it('higher reserveMonths → higher score', async () => {
const low = await scoreReserveAdequacy(TEST_ISO2, makeRecoveryReader({
'resilience:recovery:reserve-adequacy:v1': { countries: { [TEST_ISO2]: { reserveMonths: 2 } } },
}));
const high = await scoreReserveAdequacy(TEST_ISO2, makeRecoveryReader({
'resilience:recovery:reserve-adequacy:v1': { countries: { [TEST_ISO2]: { reserveMonths: 12 } } },
}));
assert.ok(high.score > low.score, `reserveMonths 2→12 should raise score; got ${low.score} → ${high.score}`);
});
});
describe('resilience dimension monotonicity — scoreFiscalSpace', () => {
const baseEntry = { govRevenuePct: 25, fiscalBalancePct: 0, debtToGdpPct: 60 };
async function scoreWith(override: Partial<typeof baseEntry>) {
return scoreFiscalSpace(TEST_ISO2, makeRecoveryReader({
'resilience:recovery:fiscal-space:v1': { countries: { [TEST_ISO2]: { ...baseEntry, ...override } } },
}));
}
it('higher govRevenuePct → higher score', async () => {
const low = await scoreWith({ govRevenuePct: 10 });
const high = await scoreWith({ govRevenuePct: 40 });
assert.ok(high.score > low.score, `govRevenuePct 10→40 should raise score; got ${low.score} → ${high.score}`);
});
it('higher fiscalBalancePct → higher score', async () => {
const low = await scoreWith({ fiscalBalancePct: -10 });
const high = await scoreWith({ fiscalBalancePct: 3 });
assert.ok(high.score > low.score, `fiscalBalancePct -10→3 should raise score; got ${low.score} → ${high.score}`);
});
it('higher debtToGdpPct → lower score', async () => {
const low = await scoreWith({ debtToGdpPct: 40 });
const high = await scoreWith({ debtToGdpPct: 140 });
assert.ok(low.score > high.score, `debtToGdpPct 40→140 should lower score; got ${low.score} → ${high.score}`);
});
});
describe('resilience dimension monotonicity — scoreExternalDebtCoverage', () => {
async function scoreWith(ratio: number) {
return scoreExternalDebtCoverage(TEST_ISO2, makeRecoveryReader({
'resilience:recovery:external-debt:v1': { countries: { [TEST_ISO2]: { debtToReservesRatio: ratio } } },
}));
}
it('higher debtToReservesRatio → lower score', async () => {
// NOTE: the current scorer saturates at 100 for ratio ≤ 0 (goalpost
// lower-better, worst=5 best=0). Picking values inside the 0-5 band
// to get a meaningful gradient. PR 3 §3.6 re-goalposts this.
const good = await scoreWith(1);
const bad = await scoreWith(4);
assert.ok(good.score > bad.score, `debtToReservesRatio 1→4 should lower score; got ${good.score} → ${bad.score}`);
});
});
describe('resilience dimension monotonicity — scoreImportConcentration', () => {
async function scoreWith(hhi: number) {
return scoreImportConcentration(TEST_ISO2, makeRecoveryReader({
'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
}));
}
it('higher hhi → lower score (more concentration = more exposure)', async () => {
// HHI payload is on a 0..1 scale (normalised before storage).
// 0.15 = diversified supplier base; 0.45 = concentrated.
const diversified = await scoreWith(0.15);
const concentrated = await scoreWith(0.45);
assert.ok(diversified.score > concentrated.score, `hhi 0.15→0.45 should lower score; got ${diversified.score} → ${concentrated.score}`);
});
});
describe('resilience dimension monotonicity — scoreGovernanceInstitutional', () => {
async function scoreWith(wgiMeanValue: number) {
// Static-record shape per `getStaticWgiValues`: `wgi.indicators.<name>.value`.
const staticRecord = {
wgi: {
indicators: {
voiceAccountability: { value: wgiMeanValue },
politicalStability: { value: wgiMeanValue },
governmentEffectiveness: { value: wgiMeanValue },
regulatoryQuality: { value: wgiMeanValue },
ruleOfLaw: { value: wgiMeanValue },
controlOfCorruption: { value: wgiMeanValue },
},
},
};
return scoreGovernanceInstitutional(TEST_ISO2, makeStaticReader(staticRecord));
}
it('higher WGI mean → higher score', async () => {
const weak = await scoreWith(-1.5);
const strong = await scoreWith(1.5);
assert.ok(strong.score > weak.score, `WGI -1.5→1.5 should raise score; got ${weak.score} → ${strong.score}`);
});
});
describe('resilience dimension monotonicity — scoreFoodWater', () => {
async function scoreWith(override: Record<string, unknown>) {
const fao = { peopleInCrisis: 100, phase: 'Phase 1', ...override };
const staticRecord = { fao, aquastat: { waterStress: { value: 40 }, waterAvailability: { value: 2000 } } };
return scoreFoodWater(TEST_ISO2, makeStaticReader(staticRecord));
}
it('higher peopleInCrisis → lower score', async () => {
const healthy = await scoreWith({ peopleInCrisis: 1000 });
const crisis = await scoreWith({ peopleInCrisis: 5_000_000 });
assert.ok(healthy.score > crisis.score, `peopleInCrisis 1k→5M should lower score; got ${healthy.score} → ${crisis.score}`);
});
it('higher IPC phase → lower score', async () => {
const phase2 = await scoreWith({ phase: 'Phase 2' });
const phase5 = await scoreWith({ phase: 'Phase 5' });
assert.ok(phase2.score > phase5.score, `phase 2→5 should lower score; got ${phase2.score} → ${phase5.score}`);
});
});
describe('resilience dimension monotonicity — scoreEnergy (current construct)', () => {
// NOTE: these tests pin the CURRENT scorer direction for each indicator.
// PR 1 §3.1-3.3 overturns three of them (electricityConsumption, gasShare,
// coalShare). When PR 1 ships, the gasShare and coalShare tests are REPLACED
// by tests for the new indicators (importedFossilDependence,
// lowCarbonGenerationShare) and the electricityConsumption test is DELETED.
// Until then, a failure in any of these tests signals that a PR has
// accidentally altered the construct; PR 1 should update this file in the
// same commit that changes scoreEnergy.
function makeEnergyReader(overrides: {
staticRecord?: unknown;
mix?: unknown;
prices?: unknown;
storage?: unknown;
} = {}): ResilienceSeedReader {
const defaultStatic = {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
};
const defaultMix = { gasShare: 30, coalShare: 20, renewShare: 30 };
return async (key: string) => {
if (key === `resilience:static:${TEST_ISO2}`) return overrides.staticRecord ?? defaultStatic;
if (key === 'economic:energy:v1:all') return overrides.prices ?? null;
if (key === `energy:mix:v1:${TEST_ISO2}`) return overrides.mix ?? defaultMix;
if (key === `energy:gas-storage:v1:${TEST_ISO2}`) return overrides.storage ?? null;
return null;
};
}
it('higher import dependency → lower score', async () => {
const selfSufficient = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 10 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
},
}));
const dependent = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 90 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
},
}));
assert.ok(selfSufficient.score > dependent.score, `import dep 10→90 should lower score; got ${selfSufficient.score} → ${dependent.score}`);
});
it('higher renewShare → higher score', async () => {
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 20, renewShare: 5 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 20, renewShare: 70 } }));
assert.ok(high.score > low.score, `renewShare 5→70 should raise score; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher gasShare → lower score (THIS CHANGES IN PR 1 — see plan §3.2)', async () => {
// Pins the current (v3-plan-condemned) behavior so PR 1 knows what
// it is replacing. When PR 1 ships the new importedFossilDependence
// composite, this test is REPLACED, not deleted — the replacement
// pins the new construct's direction.
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 10, coalShare: 20, renewShare: 30 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 70, coalShare: 20, renewShare: 30 } }));
assert.ok(low.score > high.score, `gasShare 10→70 should lower score under current construct; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher coalShare → lower score (THIS CHANGES IN PR 1 — see plan §3.2)', async () => {
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 10, renewShare: 30 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 70, renewShare: 30 } }));
assert.ok(low.score > high.score, `coalShare 10→70 should lower score under current construct; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher electricityConsumption → higher score (THIS FAILS THE MECHANISM TEST — see plan §3.1)', async () => {
// This test PASSES today because the current scorer rewards
// per-capita electricity consumption. The v3 plan classifies
// electricityConsumption as a wealth-proxy that fails the mechanism
// test; PR 1 removes it. When PR 1 ships, this test is DELETED (not
// replaced), because the indicator no longer exists. The delete is
// the signal that the wealth-proxy concern is resolved.
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 500 } } },
},
}));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 7500 } } },
},
}));
assert.ok(high.score > low.score, `electricityConsumption 500→7500 kWh/cap should raise score under current construct; got ${low.score} → ${high.score}`);
});
});
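The header comment for this file describes the harness as a table of rows. A minimal sketch of what a table-driven direction check might look like follows, assuming each row can score a single synthetic value. `MonotonicityRow`, `checkRow`, `checkAll`, and the toy scorers are hypothetical stand-ins; they do not call the real dimension scorers or reproduce their goalposts.

```typescript
type Direction = 'higher-is-better' | 'lower-is-better';

interface MonotonicityRow {
  indicator: string;
  direction: Direction;
  lowInput: number;
  highInput: number;
  // Scores one synthetic fixture that differs only in the target value.
  score: (value: number) => Promise<number>;
}

// Returns null when the documented direction holds, else a readable message.
async function checkRow(row: MonotonicityRow): Promise<string | null> {
  const [atLow, atHigh] = await Promise.all([row.score(row.lowInput), row.score(row.highInput)]);
  const holds = row.direction === 'higher-is-better' ? atHigh > atLow : atLow > atHigh;
  return holds ? null : `${row.indicator}: expected ${row.direction}, got ${atLow} → ${atHigh}`;
}

// Runs every row and collects only the failures.
async function checkAll(rows: readonly MonotonicityRow[]): Promise<string[]> {
  const results = await Promise.all(rows.map((row) => checkRow(row)));
  return results.filter((r): r is string => r !== null);
}

// Toy scorers for illustration only: simple linear maps clamped to 0..100.
const clamp = (x: number): number => Math.max(0, Math.min(100, x));
const demoRows: MonotonicityRow[] = [
  {
    indicator: 'reserveMonths',
    direction: 'higher-is-better',
    lowInput: 2,
    highInput: 12,
    score: async (v) => clamp((v / 12) * 100),
  },
  {
    indicator: 'debtToGdpPct',
    direction: 'lower-is-better',
    lowInput: 40,
    highInput: 140,
    score: async (v) => clamp(100 - v / 2),
  },
];
```

In this shape, adding or retiring an indicator is a one-row change, and a direction that PR 1 overturns is expressed by editing the row's `direction` field without touching `checkRow`.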

@@ -0,0 +1,235 @@
// Contract test for the registry-driven per-indicator extraction plan
// used by scripts/compare-resilience-current-vs-proposed.mjs. Pins two
// acceptance-apparatus invariants:
//
// 1. Every indicator in INDICATOR_REGISTRY has a corresponding
// EXTRACTION_RULES row (implemented OR not-implemented with a
// reason). No silent omissions.
// 2. All repair-plan construct-risk indicators (energy mix +
// electricity consumption + energy import dependency + WGI
// sub-pillars + recovery fiscal indicators) are 'implemented'
// in the harness, so PR 1 / PR 3 / PR 4 can measure
// pre-vs-post effective-influence against their baselines.
import test from 'node:test';
import assert from 'node:assert/strict';
const scriptMod = await import('../scripts/compare-resilience-current-vs-proposed.mjs');
const registryMod = await import('../server/worldmonitor/resilience/v1/_indicator-registry.ts');
const { buildIndicatorExtractionPlan, applyExtractionRule, EXTRACTION_RULES } = scriptMod;
const { INDICATOR_REGISTRY } = registryMod;
test('every INDICATOR_REGISTRY entry has an EXTRACTION_RULES row', () => {
const missing = INDICATOR_REGISTRY.filter((spec) => !(spec.id in EXTRACTION_RULES));
assert.deepEqual(
missing.map((s) => s.id),
[],
'new indicator(s) added to INDICATOR_REGISTRY without adding an EXTRACTION_RULES entry; ' +
'add an extractor or an explicit { type: "not-implemented", reason }',
);
});
test('extraction plan row exists for every registry entry', () => {
const plan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
assert.equal(plan.length, INDICATOR_REGISTRY.length);
for (const entry of plan) {
assert.ok(['implemented', 'not-implemented', 'unregistered-in-harness'].includes(entry.extractionStatus));
}
});
test('"not-implemented" rows carry a reason string', () => {
const plan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
for (const entry of plan) {
if (entry.extractionStatus === 'not-implemented') {
assert.ok(
typeof entry.reason === 'string' && entry.reason.length > 0,
`indicator ${entry.indicator} marked not-implemented but has no reason`,
);
}
}
});
test('all construct-risk indicators flagged by the repair plan are implemented', () => {
// The repair plan §3.1-§3.2, §4.3, §4.4 specifically names these
// indicators as the ones whose effective influence must be
// measurable pre- and post-change. If any becomes 'not-implemented',
// the acceptance apparatus for that PR evaporates. IDs match
// INDICATOR_REGISTRY exactly — the registry renames macroFiscal
// fiscal-space sub-indicators with a `recovery*` prefix when they
// live in the fiscalSpace dimension.
const mustBeImplemented = [
'gasShare',
'coalShare',
'renewShare',
'electricityConsumption',
'energyImportDependency',
'govRevenuePct',
'recoveryGovRevenue',
'recoveryFiscalBalance',
'recoveryDebtToGdp',
'recoveryReserveMonths',
'recoveryDebtToReserves',
'recoveryImportHhi',
];
const plan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
const byId = Object.fromEntries(plan.map((p) => [p.indicator, p]));
for (const id of mustBeImplemented) {
assert.ok(byId[id], `construct-risk indicator ${id} is not in the extraction plan`);
assert.equal(
byId[id].extractionStatus,
'implemented',
`construct-risk indicator ${id} must be extractable; got "${byId[id].extractionStatus}": ${byId[id].reason ?? ''}`,
);
}
});
test('core-tier indicator coverage meets a minimum floor', () => {
// Drives the extractionCoverage summary in the output. Floor raised
// after wiring the exported scorer-aggregate helpers (summarizeCyber,
// summarizeOutages, summarizeGps, summarizeUcdp, summarizeUnrest,
// getThreatSummaryScore, getCountryDisplacement, countTradeRestrictions,
// countTradeBarriers). The only Core-tier indicators still unextracted
// are those whose scorer inputs are genuinely global scalars
// (shippingStress, transitDisruption, energyPriceStress) or require
// unexported time-series helpers (fxVolatility, fxDeviation,
// aquastatWaterAvailability, householdDebtService, etc.).
const plan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
const coreTotal = plan.filter((p) => p.tier === 'core').length;
const coreImplemented = plan.filter((p) => p.tier === 'core' && p.extractionStatus === 'implemented').length;
assert.ok(
coreImplemented / coreTotal >= 0.80,
`core-tier extraction coverage fell below 80%: ${coreImplemented}/${coreTotal}`,
);
});
test('the three "no per-country variance" indicators stay not-implemented with correct reason', () => {
// shippingStress / transitDisruption / energyPriceStress are
// scorer-level GLOBAL scalars — Pearson(global, overall) is 0 or
// NaN by construction. They must NOT be marked implemented: any
// future implementation that appears to extract them is wrong
// unless it re-expresses them as per-country effective contribution.
const plan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
const byId = Object.fromEntries(plan.map((p) => [p.indicator, p]));
for (const id of ['shippingStress', 'transitDisruption', 'energyPriceStress']) {
assert.equal(byId[id]?.extractionStatus, 'not-implemented', `${id} should stay not-implemented (no per-country variance)`);
assert.match(byId[id].reason, /no per-country variance|global/i);
}
});
test('applyExtractionRule — static-path navigates nested object fields', () => {
const rule = { type: 'static-path', path: ['iea', 'energyImportDependency', 'value'] };
const sources = { staticRecord: { iea: { energyImportDependency: { value: 42 } } } };
assert.equal(applyExtractionRule(rule, sources, 'AE'), 42);
});
test('applyExtractionRule — recovery-country-field uses .countries[iso2].<field>', () => {
const rule = { type: 'recovery-country-field', key: 'resilience:recovery:fiscal-space:v1', field: 'govRevenuePct' };
const sources = { fiscalSpace: { countries: { AE: { govRevenuePct: 30 } } } };
assert.equal(applyExtractionRule(rule, sources, 'AE'), 30);
});
test('applyExtractionRule — static-wgi reads .wgi.indicators[code].value', () => {
// WGI keys are World-Bank standard codes (VA.EST, PV.EST, etc.)
const rule = { type: 'static-wgi', code: 'RL.EST' };
const sources = { staticRecord: { wgi: { indicators: { 'RL.EST': { value: 1.2 } } } } };
assert.equal(applyExtractionRule(rule, sources, 'DE'), 1.2);
});
test('applyExtractionRule — static-wgi-mean averages all six WGI sub-pillars', () => {
const rule = { type: 'static-wgi-mean' };
const sources = { staticRecord: { wgi: { indicators: {
'VA.EST': { value: 1.0 },
'PV.EST': { value: -1.0 },
'GE.EST': { value: 0.5 },
'RQ.EST': { value: -0.5 },
'RL.EST': { value: 2.0 },
'CC.EST': { value: 0.0 },
} } } };
assert.equal(applyExtractionRule(rule, sources, 'DE'), (1.0 + -1.0 + 0.5 + -0.5 + 2.0 + 0.0) / 6);
});
test('applyExtractionRule — missing values return null (pairwise-drop contract)', () => {
const rule = { type: 'static-path', path: ['iea', 'energyImportDependency', 'value'] };
assert.equal(applyExtractionRule(rule, {}, 'AE'), null);
assert.equal(applyExtractionRule(rule, { staticRecord: null }, 'AE'), null);
assert.equal(applyExtractionRule(rule, { staticRecord: { iea: null } }, 'AE'), null);
});
test('applyExtractionRule — not-implemented rules short-circuit to null', () => {
const rule = { type: 'not-implemented', reason: 'test' };
assert.equal(applyExtractionRule(rule, {}, 'AE'), null);
});
test('applyExtractionRule — summarize-cyber wires through exported scorer helper', () => {
const rule = { type: 'summarize-cyber' };
const cyber = { threats: [{ country: 'AE', severity: 'CRITICALITY_LEVEL_CRITICAL' }] };
// Pass a stub helper to prove the rule dispatches through it.
const helpers = {
summarizeCyber: (raw, cc) => ({
weightedCount: raw.threats.filter((t) => t.country === cc).length * 3,
}),
};
assert.equal(applyExtractionRule(rule, { cyber }, 'AE', helpers), 3);
// Without the helper available, the rule falls back to null.
assert.equal(applyExtractionRule(rule, { cyber }, 'AE', {}), null);
});
test('applyExtractionRule — summarize-outages-penalty computes 4/2/1 weighting', () => {
const rule = { type: 'summarize-outages-penalty' };
const outages = { outages: [] };
const helpers = {
summarizeOutages: () => ({ total: 1, major: 2, partial: 3 }),
};
// penalty = total*4 + major*2 + partial*1 = 1*4 + 2*2 + 3*1 = 11
assert.equal(applyExtractionRule(rule, { outages }, 'AE', helpers), 11);
});
test('applyExtractionRule — displacement-field reads per-country entry by field name', () => {
const rule = { type: 'displacement-field', field: 'totalDisplaced' };
const displacement = {};
const helpers = {
getCountryDisplacement: () => ({ totalDisplaced: 12345, hostTotal: 678 }),
};
assert.equal(applyExtractionRule(rule, { displacement }, 'SY', helpers), 12345);
});
test('applyExtractionRule — count-trade-restrictions uses scorer-exported counter', () => {
const rule = { type: 'count-trade-restrictions' };
const tradeRestrictions = { restrictions: [] };
const helpers = { countTradeRestrictions: () => 5 };
assert.equal(applyExtractionRule(rule, { tradeRestrictions }, 'AE', helpers), 5);
// Zero coerces to null (pairwise-drop contract for empty signals).
assert.equal(applyExtractionRule(rule, { tradeRestrictions }, 'AE', { countTradeRestrictions: () => 0 }), null);
});
test('applyExtractionRule — aquastat stress vs availability gated by indicator tag', () => {
// Mirror scoreAquastatValue in _dimension-scorers.ts: both indicators
// share .aquastat.value, but the .aquastat.indicator tag classifies
// which family the reading belongs to. A stress-family country must
// NOT contribute a reading to the availability extractor, and vice
// versa, otherwise the Pearson correlation mixes two different
// construct scales.
const stressRule = { type: 'static-aquastat-stress' };
const availabilityRule = { type: 'static-aquastat-availability' };
const stressCountry = { staticRecord: { aquastat: { value: 42, indicator: 'Water stress (withdrawal/availability)' } } };
const availabilityCountry = { staticRecord: { aquastat: { value: 1500, indicator: 'Renewable water availability per capita' } } };
const unknownCountry = { staticRecord: { aquastat: { value: 99, indicator: 'Some unrecognised tag' } } };
const missingCountry = { staticRecord: { aquastat: { value: null, indicator: 'stress' } } };
// Stress-tagged country: only the stress extractor returns the value.
assert.equal(applyExtractionRule(stressRule, stressCountry, 'AE'), 42);
assert.equal(applyExtractionRule(availabilityRule, stressCountry, 'AE'), null);
// Availability-tagged country: only the availability extractor returns the value.
assert.equal(applyExtractionRule(availabilityRule, availabilityCountry, 'DE'), 1500);
assert.equal(applyExtractionRule(stressRule, availabilityCountry, 'DE'), null);
// Unknown tag: neither extractor returns a value (pairwise-drop).
assert.equal(applyExtractionRule(stressRule, unknownCountry, 'XX'), null);
assert.equal(applyExtractionRule(availabilityRule, unknownCountry, 'XX'), null);
// Missing value: null regardless of tag.
assert.equal(applyExtractionRule(stressRule, missingCountry, 'XX'), null);
});