worldmonitor/docs/methodology/country-resilience-index.mdx
Elie Habib fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops, penalty is
  always at most 1)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00

---
title: "Country Resilience Index"
description: "Real-time resilience scoring for ~220 countries across 6 domains and 19 dimensions, combining structural baseline indicators with live stress signals into a 0-100 resilience score updated every 6 hours. Published at OECD/JRC methodological parity with transparent goalposts, coverage tracking, and a formal four-class imputation taxonomy."
---
The WorldMonitor Country Resilience Index (CRI) scores every country in the world on a 0-100 scale, combining long-run structural capacity with current operational stress to produce an actionable resilience metric. Rather than relying on static country risk ratings, the CRI updates every 6 hours from official and authoritative sources and exposes full provenance, coverage, and imputation context so analysts can see exactly *why* a score moved and how much of it is real data versus imputed.
This document describes the **currently shipping** behavior of the index. The versioning has two independent axes:
- **Response shape**: `schemaVersion: "2.0"` is the current default. Every response carries a real coverage-weighted `pillars[]` array regrouping the six domains into structural readiness / live shock exposure / recovery capacity. The legacy `schemaVersion: "1.0"` shape (pillars empty) remains available via the `RESILIENCE_SCHEMA_V2_ENABLED=false` env flag for one release cycle.
- **Scoring formula**: the top-level `overall_score` is the six-domain weighted aggregate (the v1 compensatory formula). The v2 non-compensatory pillar-combined formula with a min-pillar penalty is defined, validated (see Pillar-combined score activation below), and wired behind the `RESILIENCE_PILLAR_COMBINE_ENABLED` flag, but its default is `false` — activation is an explicit operator action rather than a code deploy. The annual Reference Edition at citation quality is a separate Phase 3 deliverable and is not yet shipped.
Everything documented below describes the **currently shipping** state: schemaVersion `"2.0"` shape, 6 domains × 19 dimensions × 3 pillars, and the 6-domain weighted `overall_score`. When an operator flips the pillar-combined flag on, the subsection on [Pillar-combined score activation](#pillar-combined-score-activation-flag-gated-default-off) documents what changes.
## In the dashboard
CRI is surfaced across three places in the product, all driven from the same currently-shipping score:
- **Resilience widget** — a standalone panel (component: `src/components/ResilienceWidget.ts`) that ranks countries by resilience score with filter and search affordances. Reach it from Cmd+K by typing *resilience*.
- **Country Deep-Dive** — inside the per-country drill-down panel, CRI appears alongside CII (Country Instability Index) as a structural complement to the short-horizon stress signal. CII and CRI are intentionally **not interchangeable**: CII answers "how much stress is on this country right now?"; CRI answers "how well-positioned is this country to absorb and recover from shocks?"
- **Map choropleth** — the resilience score drives a country-level choropleth layer on the main map. Toggle it from the map's layer panel or via Cmd+K.
All three surfaces are free to view. The underlying data served at `/api/resilience/v1/*` is public; see [Resilience service](/api/ResilienceService.openapi.yaml) for the HTTP contract.
## Overview
The WorldMonitor Country Resilience Index scores ~220 countries on a 0-100 scale across 6 domains and 19 dimensions (a static index of 222 is built every cron tick; countries whose mean dimension coverage falls below 0.40 are greyed out of the public ranking). It combines structural baseline indicators (governance quality, health infrastructure, fiscal capacity) with real-time stress signals (cyber threats, conflict events, shipping disruption) and recovery-capacity indicators (fiscal space, reserves, import concentration) to produce a single resilience score updated every 6 hours.
Data is sourced from official and authoritative providers: World Bank, IMF, WHO, WTO, OFAC, UNHCR, UCDP, BIS, IEA, FAO, Reporters Sans Frontieres, and the Institute for Economics and Peace, among others.
## Domains and Weights
The index is organized into 6 domains. Each domain weight reflects its relative contribution to overall national resilience. Recovery carries the largest single-domain weight (0.25) because the ability to absorb and recover from a shock is the single best structural predictor of post-shock outcomes; this is why fiscally strong smaller states cluster at the top of the ranking and fragile states separate cleanly at the bottom.
| Domain | ID | Weight | Dimensions |
|---|---|---|---|
| Economic | `economic` | 0.17 | Macro-Fiscal, Currency & External, Trade & Sanctions |
| Infrastructure | `infrastructure` | 0.15 | Cyber & Digital, Logistics & Supply, Infrastructure |
| Energy | `energy` | 0.11 | Energy |
| Social & Governance | `social-governance` | 0.19 | Governance, Social Cohesion, Border Security, Information & Cognitive |
| Health & Food | `health-food` | 0.13 | Health & Public Service, Food & Water |
| Recovery | `recovery` | 0.25 | Fiscal Space, Reserve Adequacy, External Debt Coverage, Import Concentration, State Continuity, Fuel Stock Days |
Weights sum to 1.00. The authoritative values live in `RESILIENCE_DOMAIN_WEIGHTS` in `server/worldmonitor/resilience/v1/_dimension-scorers.ts`; if this table and the code disagree, the code wins.
The 6 domains are regrouped into 3 pillars (structural-readiness, live-shock-exposure, recovery-capacity) with weights 0.40 / 0.35 / 0.25 for the Phase 2 pillar-combined score. The pillar shape is emitted today on every response (`schemaVersion="2.0"`, `pillars[]` populated with real coverage-weighted scores). By default the top-level `overallScore` is still the 6-domain weighted aggregate above; the pillar-combined score with a min-pillar penalty is implemented in `_shared.ts#penalizedPillarScore` and gated behind the `RESILIENCE_PILLAR_COMBINE_ENABLED` flag (default `false`), so activation is an operator env-var flip rather than a code deploy.
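The flag gating on the top-level score can be sketched as follows. This is a minimal illustration, not the code in `_shared.ts`: the `Env` and `Pillar` types, the `selectOverallScore` name, and the string parsing of both flags are assumptions, and the penalized formula is passed in as a function rather than reproduced here.

```typescript
// Hypothetical types for illustration; the real response types are richer.
type Env = Record<string, string | undefined>;
interface Pillar {
  id: string;
  score: number;
}

// ASSUMPTION: only the literal string "true" switches the flag on.
function isPillarCombineEnabled(env: Env): boolean {
  return env["RESILIENCE_PILLAR_COMBINE_ENABLED"] === "true";
}

// ASSUMPTION: schema v2 is on unless explicitly set to "false".
function isSchemaV2Enabled(env: Env): boolean {
  return env["RESILIENCE_SCHEMA_V2_ENABLED"] !== "false";
}

// Picks the pillar-combined score only when both flags allow it and pillars
// are present; otherwise falls through to the 6-domain aggregate.
function selectOverallScore(
  env: Env,
  domainAggregate: number,
  pillars: Pillar[],
  penalizedPillarScore: (pillars: Pillar[]) => number,
): number {
  if (isPillarCombineEnabled(env) && isSchemaV2Enabled(env) && pillars.length > 0) {
    return penalizedPillarScore(pillars);
  }
  return domainAggregate;
}
```

Reading the environment on every call (for example by passing `process.env` as `env`) rather than caching it at module load lets tests flip the flag without reloading the module.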
## Dimensions and Indicators
Each dimension is scored from 0-100 using a weighted blend of its sub-metrics. Below is the complete indicator registry.
### Economic Domain (weight 0.17)
#### Macro-Fiscal
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| govRevenuePct | Government revenue as % of GDP (IMF GGR_G01_GDP_PT) | Higher is better | 5 - 45 | 0.50 | IMF | Annual |
| debtGrowthRate | Annual debt growth rate | Lower is better | 20 - 0 | 0.20 | National debt data | Annual |
| currentAccountPct | Current account balance as % of GDP (IMF) | Higher is better | -20 - 20 | 0.30 | IMF | Annual |
#### Currency & External
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| fxVolatility | Annualized BIS real effective exchange rate volatility | Lower is better | 50 - 0 | 0.60 | BIS | Monthly |
| fxDeviation | Absolute deviation of BIS real EER from equilibrium (100) | Lower is better | 35 - 0 | 0.25 | BIS | Monthly |
| fxReservesAdequacy | Total reserves in months of imports (World Bank FI.RES.TOTL.MO) | Higher is better | 1 - 12 | 0.15 | World Bank | Annual |
For non-BIS countries (~160 countries), a fallback chain applies: (1) IMF inflation + World Bank reserves proxy, (2) IMF inflation alone, (3) reserves alone, (4) conservative imputation (score 50, certainty 0.3).
#### Trade & Sanctions
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| sanctionCount | OFAC sanctions entity count; piecewise normalization | Lower is better | 200 - 0 | 0.45 | OFAC | Daily |
| tradeRestrictions | WTO trade restrictions count (IN_FORCE weighted 3x) | Lower is better | 30 - 0 | 0.15 | WTO | Weekly |
| tradeBarriers | WTO trade barrier notifications count | Lower is better | 40 - 0 | 0.15 | WTO | Weekly |
| appliedTariffRate | Applied tariff rate, weighted mean, all products (World Bank TM.TAX.MRCH.WM.AR.ZS) | Lower is better | 20 - 0 | 0.25 | World Bank | Annual |
Sanctions use piecewise normalization: 0 entities = score 100, 1-10 = 90-75, 11-50 = 75-50, 51-200 = 50-25, 201+ tapers toward 0.
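A rough sketch of the piecewise mapping, assuming linear interpolation between the anchor points (1, 90), (10, 75), (50, 50), and (200, 25). The shape of the taper above 200 is not specified here, so the `25 * 200 / count` decay below is a hypothetical placeholder, as are the function names.

```typescript
// Linear interpolation between two anchor points (x0, y0) and (x1, y1).
function lerp(x: number, x0: number, x1: number, y0: number, y1: number): number {
  return y0 + ((x - x0) / (x1 - x0)) * (y1 - y0);
}

// Maps an OFAC entity count to a 0-100 score using the documented bands.
function sanctionScore(entityCount: number): number {
  if (entityCount <= 0) return 100; // no sanctioned entities
  if (entityCount <= 10) return lerp(entityCount, 1, 10, 90, 75);
  if (entityCount <= 50) return lerp(entityCount, 10, 50, 75, 50);
  if (entityCount <= 200) return lerp(entityCount, 50, 200, 50, 25);
  // ASSUMPTION: hypothetical taper that starts at 25 and decays toward 0.
  return (25 * 200) / entityCount;
}

const midBand = sanctionScore(30); // halfway through the 10-50 band: 62.5
```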
### Infrastructure Domain (weight 0.15)
#### Cyber & Digital
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| cyberThreats | Severity-weighted cyber threat count (critical 3x, high 2x, medium 1x, low 0.5x) | Lower is better | 25 - 0 | 0.45 | Cyber threat feeds | Daily |
| internetOutages | Internet outage penalty (total 4x, major 2x, partial 1x) | Lower is better | 20 - 0 | 0.35 | Outage monitoring | Realtime |
| gpsJamming | GPS jamming hex penalty (high 3x, medium 1x) | Lower is better | 20 - 0 | 0.20 | GPSJam | Daily |
#### Logistics & Supply
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| roadsPavedLogistics | Paved roads as % of total road network (World Bank IS.ROD.PAVE.ZS) | Higher is better | 0 - 100 | 0.50 | World Bank | Annual |
| shippingStress | Global shipping stress score | Lower is better | 100 - 0 | 0.25 | Supply-chain monitor | Daily |
| transitDisruption | Mean transit corridor disruption | Lower is better | 30 - 0 | 0.25 | Transit summaries | Daily |
#### Infrastructure
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| electricityAccess | Access to electricity, % of population (World Bank EG.ELC.ACCS.ZS) | Higher is better | 40 - 100 | 0.40 | World Bank | Annual |
| roadsPavedInfra | Paved roads as % of total road network (World Bank IS.ROD.PAVE.ZS) | Higher is better | 0 - 100 | 0.35 | World Bank | Annual |
| infraOutages | Internet outage penalty (shared source with Cyber & Digital) | Lower is better | 20 - 0 | 0.25 | Outage monitoring | Realtime |
**Note on the paved-roads indicator.** The same World Bank series (`IS.ROD.PAVE.ZS`) feeds two dimensions inside the Infrastructure domain: `roadsPavedLogistics` under Logistics & Supply (weight 0.50 within the dimension) and `roadsPavedInfra` here under Infrastructure (weight 0.35 within the dimension). This is deliberate source reuse, not accidental double counting: Logistics & Supply uses paved-road coverage as a proxy for transit viability, while Infrastructure uses it as a proxy for baseline public capital stock. The two dimensions legitimately care about the same signal for different reasons, and each dimension's contribution to the domain is further mediated by its coverage in the coverage-weighted mean aggregation (see the Scoring Formula section). The v2.0 reference-grade upgrade plan is expected to consolidate shared upstream signals into a single indicator registry so that this kind of reuse is documented at the source level rather than per dimension; for v1.0 the two separate metric rows are preserved for backward compatibility.
### Energy Domain (weight 0.11)
#### Energy
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| energyImportDependency | IEA energy import dependency (% of supply from imports) | Lower is better | 100 - 0 | 0.25 | IEA | Annual |
| gasShare | Natural gas share of energy mix | Lower is better | 100 - 0 | 0.12 | Energy mix data | Annual |
| coalShare | Coal share of energy mix | Lower is better | 100 - 0 | 0.08 | Energy mix data | Annual |
| renewShare | Renewable energy share of energy mix | Higher is better | 0 - 100 | 0.05 | Energy mix data | Annual |
| gasStorageStress | Gas storage fill stress: (80 - fillPct) / 80, clamped [0,1] | Lower is better | 100 - 0 | 0.10 | GIE AGSI+ | Daily |
| energyPriceStress | Mean absolute energy price change across commodities | Lower is better | 25 - 0 | 0.10 | Energy prices | Daily |
| electricityConsumption | Per-capita electricity consumption (kWh/year, World Bank EG.USE.ELEC.KH.PC) | Higher is better | 200 - 8000 | 0.30 | World Bank | Annual |
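The `gasStorageStress` expression in the table can be written out directly. The function name is illustrative, and how the resulting 0-1 ratio is mapped onto the indicator's 100 - 0 goalposts is not specified here, so the sketch stops at the ratio.

```typescript
// Stress rises linearly as storage fill drops below 80%, clamped to [0, 1].
function gasStorageStress(fillPct: number): number {
  return Math.min(1, Math.max(0, (80 - fillPct) / 80));
}

const halfEmpty = gasStorageStress(40); // 0.5
const comfortable = gasStorageStress(95); // 0 (fill above 80% carries no stress)
```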
### Social & Governance Domain (weight 0.19)
#### Governance
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| wgiVoiceAccountability | World Bank WGI: Voice and Accountability | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
| wgiPoliticalStability | World Bank WGI: Political Stability | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
| wgiGovernmentEffectiveness | World Bank WGI: Government Effectiveness | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
| wgiRegulatoryQuality | World Bank WGI: Regulatory Quality | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
| wgiRuleOfLaw | World Bank WGI: Rule of Law | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
| wgiControlOfCorruption | World Bank WGI: Control of Corruption | Higher is better | -2.5 - 2.5 | 1/6 | World Bank WGI | Annual |
All six WGI indicators are equally weighted.
#### Social Cohesion
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| gpiScore | Global Peace Index score | Lower is better | 3.6 - 1.0 | 0.55 | IEP | Annual |
| displacementTotal | UNHCR total displaced persons (log10 scale) | Lower is better | 7 - 0 | 0.25 | UNHCR | Annual |
| unrestEvents | Severity-weighted unrest events + sqrt(fatalities) | Lower is better | 20 - 0 | 0.20 | Unrest monitoring | Realtime |
#### Border Security
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| ucdpConflict | UCDP armed conflict: eventCount*2 + typeWeight + sqrt(deaths) | Lower is better | 30 - 0 | 0.65 | UCDP | Realtime |
| displacementHosted | UNHCR hosted displaced persons (log10 scale) | Lower is better | 7 - 0 | 0.35 | UNHCR | Annual |
#### Information & Cognitive
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| rsfPressFreedom | RSF press freedom score | Higher is better | 0 - 100 | 0.55 | RSF | Annual |
| socialVelocity | Reddit social velocity (log10(velocity+1)) | Lower is better | 3 - 0 | 0.15 | Reddit intelligence | Realtime |
| newsThreatScore | AI news threat severity (critical 4x, high 2x, medium 1x, low 0.5x) | Lower is better | 20 - 0 | 0.30 | News threat analysis | Daily |
### Health & Food Domain (weight 0.13)
#### Health & Public Service
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| uhcIndex | WHO Universal Health Coverage service coverage index | Higher is better | 40 - 90 | 0.45 | WHO | Annual |
| measlesCoverage | Measles immunization coverage among 1-year-olds (%) | Higher is better | 50 - 99 | 0.35 | WHO | Annual |
| hospitalBeds | Hospital beds per 1,000 people | Higher is better | 0 - 8 | 0.20 | WHO | Annual |
#### Food & Water
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| ipcPeopleInCrisis | IPC/FAO people in food crisis (log10 scale) | Lower is better | 7 - 0 | 0.45 | FAO/IPC | Annual |
| ipcPhase | IPC food crisis phase (1-5) | Lower is better | 5 - 1 | 0.15 | FAO/IPC | Annual |
| aquastatWaterStress | FAO AQUASTAT water stress/withdrawal/dependency (%) | Lower is better | 100 - 0 | 0.25 | FAO AQUASTAT | Annual |
| aquastatWaterAvailability | FAO AQUASTAT water availability (m3/capita) | Higher is better | 0 - 5000 | 0.15 | FAO AQUASTAT | Annual |
### Recovery Domain (weight 0.25)
This domain forms the recovery-capacity pillar. It measures a country's ability to bounce back from an acute shock along fiscal, monetary, trade, institutional, and energy dimensions.
#### Fiscal Space
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryGovRevenue | Government revenue as % of GDP (IMF GGR_G01_GDP_PT) | Higher is better | 5 - 45 | 0.40 | IMF | Annual |
| recoveryFiscalBalance | General government net lending/borrowing as % of GDP (IMF GGXCNL_G01_GDP_PT) | Higher is better | -15 - 5 | 0.30 | IMF | Annual |
| recoveryDebtToGdp | General government gross debt as % of GDP (IMF GGXWDG_NGDP_PT) | Lower is better | 150 - 0 | 0.30 | IMF | Annual |
#### Reserve Adequacy
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryReserveMonths | Total reserves in months of imports (World Bank FI.RES.TOTL.MO) | Higher is better | 1 - 18 | 1.00 | World Bank | Annual |
#### External Debt Coverage
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryDebtToReserves | Short-term external debt to reserves ratio (World Bank DT.DOD.DSTC.CD / FI.RES.TOTL.CD) | Lower is better | 5 - 0 | 1.00 | World Bank | Annual |
#### Import Concentration
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryImportHhi | Herfindahl-Hirschman Index of import partner concentration (UN Comtrade HS2 bilateral) | Lower is better | 5000 - 0 | 1.00 | UN Comtrade | Annual |
#### State Continuity
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryWgiContinuity | Mean WGI score as institutional durability proxy | Higher is better | -2.5 - 2.5 | 0.50 | World Bank | Annual |
| recoveryConflictPressure | UCDP conflict metric inverted to state continuity | Lower is better | 30 - 0 | 0.30 | UCDP | Realtime |
| recoveryDisplacementVelocity | UNHCR displacement as state continuity signal | Lower is better | 7 - 0 | 0.20 | UNHCR | Annual |
State continuity is a derived dimension: it reads from existing WGI, UCDP, and displacement keys rather than a dedicated seeder.
#### Fuel Stock Days
| Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
|---|---|---|---|---|---|---|
| recoveryFuelStockDays | Days of fuel stock cover (IEA Oil Stocks / EIA Weekly Petroleum Status) | Higher is better | 0 - 120 | 1.00 | IEA/EIA | Monthly |
Fuel stock days is an Enrichment-tier signal (coverage ~45 countries, IEA/OECD members). Countries without fuel stock data are imputed with the `unmonitored` class.
## Normalization
All indicators are normalized to a 0-100 scale using **goalpost scaling** (also called min-max normalization with domain-specific anchors).
For "higher is better" indicators:
```
score = clamp((value - worst) / (best - worst) * 100, 0, 100)
```
For "lower is better" indicators:
```
score = clamp((worst - value) / (worst - best) * 100, 0, 100)
```
Goalposts are hand-picked based on empirical data ranges (not percentile-derived). A score of 100 means the country meets or exceeds the "best" goalpost; a score of 0 means it sits at or beyond the "worst" goalpost.
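A minimal sketch of goalpost scaling. The two formulas above are algebraically the same expression (for "lower is better" indicators `worst > best`, so numerator and denominator both change sign), which lets one function cover both directions. The function names are illustrative, and the sketch assumes `worst !== best`.

```typescript
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Maps a raw value onto 0-100 between the "worst" and "best" goalposts.
function goalpostScore(value: number, worst: number, best: number): number {
  return clamp(((value - worst) / (best - worst)) * 100, 0, 100);
}

// uhcIndex (higher is better, goalposts 40 - 90): a value of 65 is halfway.
const uhc = goalpostScore(65, 40, 90); // 50
// debtGrowthRate (lower is better, goalposts 20 - 0): 5% growth.
const debt = goalpostScore(5, 20, 0); // 75
// Values beyond the worst goalpost clamp to 0.
const runaway = goalpostScore(30, 20, 0); // 0
```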
**Exception:** Sanctions use piecewise normalization to capture the non-linear impact of sanctions counts (the first few sanctions matter more than additional ones in already-sanctioned countries).
## Scoring Formula
### Dimension Score
Each dimension score is the **weighted blend** of its sub-metric scores:
```
dimensionScore = sum(metricScore_i * metricWeight_i) / sum(metricWeight_i)
```
Only metrics with available data participate in the blend. Missing metrics are excluded from both the numerator and denominator, so the score reflects what is known rather than penalizing for absent data.
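As an illustration of the blend, the sketch below skips metrics without data in both sums. The `MetricInput` shape and the `null` return for a dimension with no available metrics are assumptions made for the example.

```typescript
interface MetricInput {
  score: number | null; // normalized 0-100, or null when no data is available
  weight: number;
}

// Weighted blend over the metrics that have data.
function dimensionScore(metrics: MetricInput[]): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const m of metrics) {
    if (m.score === null) continue; // excluded from numerator and denominator
    weighted += m.score * m.weight;
    totalWeight += m.weight;
  }
  // ASSUMPTION: a dimension with no data at all yields null in this sketch.
  return totalWeight > 0 ? weighted / totalWeight : null;
}

// Logistics & Supply weights (0.50 / 0.25 / 0.25) with one metric missing:
const logistics = dimensionScore([
  { score: 80, weight: 0.5 },
  { score: null, weight: 0.25 },
  { score: 20, weight: 0.25 },
]); // (80 * 0.5 + 20 * 0.25) / 0.75 = 60
```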
### Domain Score
Each domain score is the **coverage-weighted mean** of its dimensions:
```
domainScore = sum(dimensionScore_i * dimensionCoverage_i) / sum(dimensionCoverage_i)
```
Coverage weighting ensures that dimensions with sparse data (low coverage) contribute proportionally less, so a thinly observed dimension cannot pull the domain average far in either direction.
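A small sketch of the coverage-weighted mean, using a hypothetical `DimensionResult` shape. Returning 0 when total coverage is zero is an assumption of the example, not documented behavior.

```typescript
interface DimensionResult {
  score: number; // 0-100
  coverage: number; // 0.0-1.0
}

// Coverage-weighted mean of dimension scores.
function domainScore(dimensions: DimensionResult[]): number {
  const totalCoverage = dimensions.reduce((sum, d) => sum + d.coverage, 0);
  if (totalCoverage === 0) return 0; // ASSUMPTION: no usable data
  const weighted = dimensions.reduce((sum, d) => sum + d.score * d.coverage, 0);
  return weighted / totalCoverage;
}

// A fully observed dimension at 80 and a sparse one at 40 (coverage 0.25):
const healthFood = domainScore([
  { score: 80, coverage: 1.0 },
  { score: 40, coverage: 0.25 },
]); // (80 + 10) / 1.25 = 72, versus 60 for an unweighted mean
```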
### Overall Score
The overall score is a **domain-weighted sum**:
```
overallScore = sum(domainScore_i * domainWeight_i)
```
Each domain's weight is defined in the configuration. The weights sum to 1.0, so the overall score is a straightforward weighted average of domain scores. This is the post-PR #2847 formula; an earlier multiplicative form (`baseline * (1 - stressFactor)`) over-penalized every country and was reverted. See the [Changelog](#changelog) for the full version history.
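The weighted sum can be sketched with the weights copied from the Domains and Weights table. The `DOMAIN_WEIGHTS` constant here is a local copy for illustration; the authoritative values are in `RESILIENCE_DOMAIN_WEIGHTS`.

```typescript
// Local copy of the documented domain weights (illustrative only).
const DOMAIN_WEIGHTS = {
  economic: 0.17,
  infrastructure: 0.15,
  energy: 0.11,
  "social-governance": 0.19,
  "health-food": 0.13,
  recovery: 0.25,
} as const;

type DomainId = keyof typeof DOMAIN_WEIGHTS;

// Domain-weighted sum; because the weights sum to 1.0 this is a weighted average.
function overallScore(domainScores: Record<DomainId, number>): number {
  let total = 0;
  for (const id of Object.keys(DOMAIN_WEIGHTS) as DomainId[]) {
    total += domainScores[id] * DOMAIN_WEIGHTS[id];
  }
  return total;
}

// Every domain at 50 except Recovery at 80: roughly 50 + 0.25 * 30 = 57.5.
const example = overallScore({
  economic: 50,
  infrastructure: 50,
  energy: 50,
  "social-governance": 50,
  "health-food": 50,
  recovery: 80,
});
```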
### Resilience Level Classification
| Score Range | Level |
|---|---|
| 70-100 | High |
| 40-69 | Medium |
| 0-39 | Low |
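The bands translate into a simple threshold check. The table lists integer ranges only, so treating a fractional score such as 69.5 as Medium is an assumption of this sketch, as are the type and function names.

```typescript
type ResilienceLevel = "High" | "Medium" | "Low";

// ASSUMPTION: thresholds apply to the unrounded score.
function classifyResilience(score: number): ResilienceLevel {
  if (score >= 70) return "High";
  if (score >= 40) return "Medium";
  return "Low";
}
```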
## Missing Data Handling
### Coverage Tracking
Each dimension carries a `coverage` value (0.0-1.0) representing the weighted certainty of its data. Real observed data contributes certainty 1.0. Imputed data contributes partial certainty. Absent data contributes 0.
```
coverage = sum(metricWeight_i * certainty_i) / sum(metricWeight_i)
```
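A short sketch of the coverage calculation, with a hypothetical `MetricCertainty` shape. Certainty is 1.0 for observed data, a partial value for imputed data, and 0 for absent data.

```typescript
interface MetricCertainty {
  weight: number;
  certainty: number; // 1.0 observed, partial when imputed, 0 when absent
}

// Weight-averaged certainty across a dimension's metrics.
function dimensionCoverage(metrics: MetricCertainty[]): number {
  const totalWeight = metrics.reduce((sum, m) => sum + m.weight, 0);
  if (totalWeight === 0) return 0; // nothing to measure
  return metrics.reduce((sum, m) => sum + m.weight * m.certainty, 0) / totalWeight;
}

// One observed metric, one imputed at certainty 0.5, one absent:
const partial = dimensionCoverage([
  { weight: 0.5, certainty: 1.0 },
  { weight: 0.25, certainty: 0.5 },
  { weight: 0.25, certainty: 0 },
]); // 0.5 + 0.125 + 0 = 0.625
```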
### Imputation Taxonomy
When data is absent, the system tags it with one of four classes so downstream consumers can distinguish "nothing is happening" from "we do not know" from "the upstream is down" from "the dimension does not apply to this country." The taxonomy is defined in `server/worldmonitor/resilience/v1/_dimension-scorers.ts` as an exported `ImputationClass` type.
| Class | Meaning | Typical score | Certainty | Example sources |
|---|---|---:|---:|---|
| `stable-absence` | The source publishes globally. Country is not listed, which means the tracked phenomenon is not happening. Strong positive signal. | 85 to 88 | 0.6 to 0.7 | IPC food crisis, UNHCR displacement, UCDP conflict events |
| `unmonitored` | The source is a curated list that may not cover every country. Absence is ambiguous; penalized conservatively. | 50 to 60 | 0.3 to 0.4 | BIS exchange rates and credit, WTO trade data, OECD ICU capacity |
| `source-failure` | The upstream API was unavailable at seed time. Detected from `seed-meta` `failedDatasets`. Should be rare and transient. | inherits from the source being substituted | 0.3 to 0.5 | any source listed in `failedDatasets` during a seed run |
| `not-applicable` | The dimension is structurally N/A for this country. For example, a landlocked country has no maritime exposure. | neutral, by definition | 1.0 (by definition) | reserved for future dimensions that need structural N/A handling |
The generic imputation entries are declared in the `IMPUTATION` table and shared across dimensions. Per-metric overrides live in the `IMPUTE` table with their own score and certainty values, and inherit or override the class tag. Every entry is regression-tested in `tests/resilience-dimension-scorers.test.mts` to prevent silent drift.
| Concrete imputation entry | Class | Score | Certainty | Notes |
|---|---|---:|---:|---|
| `crisis_monitoring_absent` (IPC, UCDP, UNHCR general) | `stable-absence` | 85 | 0.7 | Used when the global crisis feed has no entry for the country |
| `curated_list_absent` (BIS, WTO general) | `unmonitored` | 50 | 0.3 | Used when a curated list does not cover the country |
| `ipcFood` (food-specific crisis monitoring) | `stable-absence` | 88 | 0.7 | Slightly higher score because no IPC data strongly implies food security |
| `wtoData` (trade-specific curated list) | `unmonitored` | 60 | 0.4 | Slightly higher than the generic curated list default |
| `unhcrDisplacement` (displacement-specific crisis monitoring) | `stable-absence` | 85 | 0.6 | Lower certainty than IPC because displacement is noisier |
| `bisEer` and `bisCredit` | `unmonitored` | 50 | 0.3 | Shared reference to `curated_list_absent`; same tag |
The `source-failure` class is reserved for the runtime path that consults `seed-meta.failedDatasets` and re-tags affected imputations; that wiring lands with a later Phase 1 task and is not yet represented in the table above. The `not-applicable` class is reserved for future dimensions and has no current call site.
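To make the taxonomy concrete, the entries from the table above can be expressed as typed data. The four class names come from this document; the `ImputationEntry` field names and the `EXAMPLE_IMPUTATIONS` constant are assumptions and may not match the shape of the real `IMPUTATION` and `IMPUTE` tables.

```typescript
type ImputationClass = "stable-absence" | "unmonitored" | "source-failure" | "not-applicable";

// Hypothetical entry shape for illustration.
interface ImputationEntry {
  imputationClass: ImputationClass;
  score: number; // 0-100 score assigned when data is absent
  certainty: number; // contribution to coverage, 0.0-1.0
}

type ExampleKey =
  | "crisis_monitoring_absent"
  | "curated_list_absent"
  | "ipcFood"
  | "wtoData"
  | "unhcrDisplacement";

// Values copied from the concrete imputation table above.
const EXAMPLE_IMPUTATIONS: Record<ExampleKey, ImputationEntry> = {
  crisis_monitoring_absent: { imputationClass: "stable-absence", score: 85, certainty: 0.7 },
  curated_list_absent: { imputationClass: "unmonitored", score: 50, certainty: 0.3 },
  ipcFood: { imputationClass: "stable-absence", score: 88, certainty: 0.7 },
  wtoData: { imputationClass: "unmonitored", score: 60, certainty: 0.4 },
  unhcrDisplacement: { imputationClass: "stable-absence", score: 85, certainty: 0.6 },
};
```

Keeping the class tag next to the score and certainty lets a consumer tell "nothing is happening" (`stable-absence`) apart from "we do not know" (`unmonitored`) even when the numeric scores are close.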
### Low Confidence Flag
A score is flagged as `lowConfidence` when either:
- Average dimension coverage falls below **0.55**, or
- Imputation share (imputed weight / total weight) exceeds **0.40**.
### Grey-Out Threshold
Countries with overall coverage below **0.40** are greyed out in the UI and excluded from rankings. Their scores are too data-sparse to be meaningful.
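The two thresholds can be sketched as simple predicates. The constant and function names are illustrative; only the numeric thresholds come from this document.

```typescript
const LOW_CONFIDENCE_COVERAGE = 0.55;
const LOW_CONFIDENCE_IMPUTATION_SHARE = 0.4;
const GREY_OUT_COVERAGE = 0.4;

// Flagged when coverage is too thin OR too much weight is imputed.
function isLowConfidence(averageCoverage: number, imputationShare: number): boolean {
  return (
    averageCoverage < LOW_CONFIDENCE_COVERAGE ||
    imputationShare > LOW_CONFIDENCE_IMPUTATION_SHARE
  );
}

// Greyed-out countries are shown in the UI but excluded from rankings.
function isGreyedOut(overallCoverage: number): boolean {
  return overallCoverage < GREY_OUT_COVERAGE;
}
```

Because the grey-out threshold (0.40) is below the low-confidence coverage threshold (0.55), every greyed-out country is also low confidence, while the reverse does not hold.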
### Imputation Share
The API response includes `imputationShare` (0.0-1.0), representing the fraction of total indicator weight that came from imputed (synthetic) data rather than observed data. This allows consumers to assess data provenance.
## Data Sources
| Source | Indicators | Cadence | Scope |
|---|---|---|---|
| IMF (WEO/IFS) | Government revenue, current account, inflation | Annual | Global |
| World Bank (WDI) | Electricity access, paved roads, reserves, tariffs, electricity consumption | Annual | Global |
| World Bank (WGI) | 6 governance indicators | Annual | Global |
| BIS | Real effective exchange rates | Monthly | ~60 countries |
| OFAC | Sanctions entity counts | Daily | Global |
| WTO | Trade restrictions, trade barriers | Weekly | ~50 reporters |
| WHO | UHC index, measles coverage, hospital beds | Annual | Global |
| FAO (IPC) | People in food crisis, crisis phase | Annual | Affected countries |
| FAO (AQUASTAT) | Water stress, water availability | Annual | Global |
| IEA | Energy import dependency | Annual | Global |
| IEP | Global Peace Index | Annual | Global |
| RSF | Press freedom score | Annual | Global |
| UNHCR | Displaced persons, hosted refugees | Annual | Affected countries |
| UCDP | Armed conflict events, fatalities | Realtime | Global |
| Cyber threat feeds | Severity-weighted cyber threats | Daily | Global |
| Outage monitoring | Internet outages | Realtime | Global |
| GPSJam | GPS jamming incidents | Daily | Global |
| Supply-chain monitor | Shipping stress, transit disruption | Daily | Global |
| Unrest monitoring | Severity-weighted civil unrest events | Realtime | Global |
| Reddit intelligence | Social velocity scores | Realtime | Global |
| News threat analysis | AI-scored news threat severity | Daily | Global |
| Energy mix data | Gas, coal, renewable shares | Annual | Global |
| GIE AGSI+ | Gas storage fill levels | Daily | European countries |
| Energy prices | Commodity price changes | Daily | Global |
| National debt data | Debt-to-GDP growth rate | Annual | Global |
## Supplementary Fields
The API response includes additional context fields that are informational and not part of the primary ranking:
- **baselineScore**: Coverage-weighted mean of baseline and mixed dimensions. Reflects structural capacity (governance, health, infrastructure, fiscal strength). Informational only, not used in `overallScore`.
- **stressScore**: Coverage-weighted mean of stress and mixed dimensions. Reflects current threat environment (cyber, conflict, sanctions, supply disruption). Informational only, not used in `overallScore`.
- **trend**: Direction of score movement over the last 30 days (`improving`, `stable`, or `declining`), based on daily score history.
- **change30d**: Numeric score change over 30 days.
- **imputationShare**: Fraction of indicator weight from imputed (synthetic) data.
- **lowConfidence**: Boolean flag when data coverage or imputation thresholds are breached.
## Versioning
Cache keys include a versioned suffix that is bumped on formula changes. This invalidates stale caches and ensures all scores reflect the updated methodology. Score cache TTL is 6 hours.
## Reproducibility Appendix
The CRI is designed to be auditable end-to-end: given the Redis snapshot at any point in time, a reader should be able to reproduce any published country score from the documented formulas without running the live service.
### Redis keys used by the scorer
| Key | Type | TTL | Written by | Read by |
|---|---|---|---|---|
| `resilience:score:v10:{countryCode}` | JSON | 6 hours | `buildResilienceScore` in `server/worldmonitor/resilience/v1/_shared.ts` | `getResilienceScore` handler |
| `resilience:ranking:v10` | JSON | 6 hours | `buildResilienceRanking`, only when all countries are scored | `getResilienceRanking` handler |
| `resilience:history:v5:{countryCode}` | sorted set | indefinite, trimmed to 30 days | `appendHistory` during scoring | trend and `change30d` computation |
| `resilience:intervals:v1:{countryCode}` | JSON | 6 hours | `scripts/seed-resilience-intervals.mjs` | `getResilienceScore` (optional `scoreInterval` field) |
| `seed-meta:resilience:static` | JSON | 2 hours | `scripts/seed-resilience-static.mjs` at the end of each successful seed run | scorer for `dataVersion` population, health checks |
| `resilience:static:{countryCode}` | JSON | 400 days | `scripts/seed-resilience-static.mjs` | scorer for all baseline signals (WGI, WHO, FAO, GPI, RSF, and so on) |
| `resilience:static:index:v1` | JSON | 400 days | `scripts/seed-resilience-static.mjs` | warmup path to enumerate countries |
### dataVersion semantics
The `dataVersion` field on every `GetResilienceScoreResponse` is the ISO date of the `fetchedAt` timestamp stored in `seed-meta:resilience:static`. It reflects the most recent successful run of the Railway static-seed job; the widget renders it in the footer as `Seed date YYYY-MM-DD`. The label is narrower than "Data" because live inputs (conflict events, sanctions, prices) can refresh at their own cadence after the static bundle runs — per-dimension freshness is surfaced separately via the freshness badge in the confidence grid.
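The mapping from `fetchedAt` to the footer label can be sketched as follows. The `SeedMeta` shape and the assumption that `fetchedAt` is stored as epoch milliseconds are illustrative only; the helper names are hypothetical.

```typescript
// Assumed shape of the seed-meta payload; only `fetchedAt` is used here.
interface SeedMeta {
  fetchedAt: number; // assumed to be epoch milliseconds of the last successful static seed
}

// ISO 8601 calendar date in UTC, in the form YYYY-MM-DD.
function toDataVersion(meta: SeedMeta): string {
  return new Date(meta.fetchedAt).toISOString().slice(0, 10);
}

// Label as rendered in the widget footer.
function footerLabel(meta: SeedMeta): string {
  return `Seed date ${toDataVersion(meta)}`;
}
```

Using the UTC date keeps the label stable regardless of the viewer's time zone.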
### Reproducing a score by hand
Given a Redis snapshot at time T:
1. Read `seed-meta:resilience:static` for the `dataVersion`.
2. Read `resilience:static:{cc}` for the country's baseline record (WGI, WHO, GPI, RSF, FAO, IEA, and so on).
3. Read the live-signal keys (UCDP, UNHCR, OFAC, outages, cyber threats, prices, shipping stress, and so on) for the country's slice.
4. For each of the 19 dimensions, apply the formulas in the Scoring Formula section with the goalposts from the Dimensions and Indicators tables. For missing signals, consult the Imputation Taxonomy table in this document.
5. Aggregate dimension scores into domain scores via coverage-weighted mean.
6. Aggregate domain scores into the overall score via domain-weighted sum.
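Steps 5 and 6 can be sketched as two small functions. The type names, function names, and the example weights in the usage note are hypothetical; use the domain weights from the weighting section of this document when reproducing a real score.

```typescript
interface DimensionScore {
  score: number;    // 0-100
  coverage: number; // 0-1, share of the dimension's indicator weight that was observed
}

interface DomainScore {
  score: number;  // 0-100
  weight: number; // domain weight; weights are expected to sum to 1
}

// Step 5: coverage-weighted mean of the dimension scores within one domain.
function coverageWeightedMean(dims: DimensionScore[]): number {
  const totalCoverage = dims.reduce((sum, d) => sum + d.coverage, 0);
  if (totalCoverage === 0) return 0; // no observed data in this domain
  return dims.reduce((sum, d) => sum + d.score * d.coverage, 0) / totalCoverage;
}

// Step 6: domain-weighted sum producing the overall score.
function domainWeightedSum(domains: DomainScore[]): number {
  return domains.reduce((sum, d) => sum + d.score * d.weight, 0);
}
```

For example, a domain with one dimension at 80 (coverage 1.0) and one at 40 (coverage 0.5) scores (80 × 1.0 + 40 × 0.5) / 1.5, roughly 66.7, so the poorly covered dimension pulls the domain down less than a plain average would.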
A reference Python notebook under `docs/methodology/country-resilience-index/reference-edition/` is tracked as a future deliverable and will regenerate every published score from the snapshot manifest.
## Changelog
### v1.0 (April 2026)
**Baseline.** Scored on domain-weighted average of 5 domains and 13 dimensions (pre-Recovery domain).
- PR #2821: added the baseline-vs-stress engine and the `dataVersion` field on the response.
- PR #2847: reverted the overall-score formula from `baseline * (1 - stressFactor)` (which over-penalized every country) to a domain-weighted sum; fixed the RSF press-freedom direction (0 means free, scored higher is better).
- PR #2858: seed script now computes missing country scores directly via the scorer import path instead of relying on a separate ranking writer.
### v1.1 (April 2026) — Phase 1 reference-grade upgrade
**Previous published version.** Phase 1 of the reference-grade upgrade plan (`docs/internal/country-resilience-upgrade-plan.md`). Methodology surface reorganized for full reproducibility without changing the top-line domain weights or scoring formula.
- **T1.1** (#2941): regression test pins the Norway/US top-of-ranking ordering after an origin-document claim of a 100-point ceiling did not reproduce. Failing-then-passing test guards the invariant.
- **T1.2** (#2847, #2858): pre-existing fixes from the 2026-04-07 and 2026-04-09 origin-doc reviews that were already in main at the start of Phase 1. Re-verified no additional action needed.
- **T1.3** (#2945): methodology page promoted to `.mdx` at CII parity with the required sections (Framework / Domains / Dimensions / Normalization / Weighting / Missing-data / Confidence / Ranking / Reproducibility appendix).
- **T1.4** (#2943): `dataVersion` field wired end-to-end from `seed-resilience-static:v7.dataVersion` through the scorer to the widget footer so analysts see the exact ISO date of the underlying source data.
- **T1.5** (#2947 foundation, #2961 propagation): three-level staleness classifier (`fresh`, `aging`, `stale`) driven by the per-indicator cadence in the registry. Propagated through `scoreAllDimensions` and exposed as `ResilienceDimension.freshness.{lastObservedAtMs, staleness}` on the response.
- **T1.6** (#2949 scaffold, #2962 full grid): per-dimension confidence grid in the widget. The full grid adds an imputation-class icon column (consuming T1.7 schema) and a freshness-badge column (consuming T1.5 propagation). 5-column layout with mobile responsive breakpoint.
- **T1.7** (#2944 foundation, #2959 schema, #2964 source-failure wiring): four-class imputation taxonomy `stable-absence` / `unmonitored` / `source-failure` / `not-applicable` exposed on `ResilienceDimension.imputationClass`. The scorer aggregation pass consults `seed-meta:resilience:static.failedDatasets` and re-tags imputed dimensions as `source-failure` when the underlying adapter fetch failed. Deleted the last absence-based return branch in `scoreCurrencyExternal` so the taxonomy is the single source of truth for every imputed path.
- **T1.8** (#2946): methodology doc linter enforces dimension parity between this document and `_indicator-registry.ts`. CI fails if any dimension drifts.
- **T1.9** (this PR): cache-key / health-registry sync regression test so future version bumps in `_shared.ts` cannot silently break health probes. No cache keys were bumped in Phase 1 because every schema addition was additive with default fallbacks on the existing `resilience:score:v7` and `resilience:ranking:v9` keys.
**What did not change in v1.1**: the domain-weighted aggregation formula, the 5-domain / 13-dimension structure as of v1.1, the goalpost ranges, the per-dimension weights. (Phase 2 below added the Recovery domain + 6 new recovery dimensions for the current 6/19 shape and rewired domain weights; the aggregation formula itself was unchanged.) Phase 2 owns the structural three-pillar rebuild; v1.1 is the methodology-surface and observability lift only.
### Scorecard (v1.1 self-assessment)
Self-assessed against the standard composite-indicator review axes on a 0-10 scale. This is the Phase 1 acceptance gate defined in the upgrade plan (`Methodology ≥7.5`, `Explainability ≥7.5`). An external expert review (Phase 3 T3.8b) will supersede these self-ratings once it completes.
| Axis | Score | Rationale |
|---|---|---|
| **Methodology** | 7.5 | Every dimension has a named source, direction, goalpost range, weight, cadence, and imputation class. Missing-data rules are explicit and tagged with a 4-class taxonomy. The aggregation formula is a simple domain-weighted average, auditable from first principles. Gap: the overall-score formula is still single-axis compensatory (a strong institutional score can wash out a weak exposure score), which Phase 2 replaces with a partly non-compensatory three-pillar form. |
| **Explainability** | 7.5 | Per-dimension confidence grid in the widget shows coverage %, imputation class, and freshness for every dimension on every country. Tooltip text is generated from the taxonomy so analysts can click through to the meaning without reading this document. Gap: no waterfall chart of individual signal contributions yet; that lands in Phase 3 T3.3. |
| **Reproducibility** | 8.0 | Every dimension's sourceKey, cadence, and goalpost lives in `_indicator-registry.ts` and is linted against this doc. Cache keys are versioned (`resilience:score:v7`, `ranking:v8`, `history:v4`). `dataVersion` is written by the seed and plumbed to the widget footer. Gap: the benchmark and backtest scripts do not yet run on a CI cron; those land in Phase 2 T2.7. |
| **Source quality** | 7.0 | World Bank, IMF, WHO, IEA, UNHCR, UCDP, IPC, BIS, FAO, RSF, GPI: all authoritative. Gap: curated-list sources (BIS ~40 economies, WTO) do not cover the full WorldMonitor country set, which is why the `unmonitored` imputation class exists. Phase 2 T2.9 adds language-normalized information signal to reduce English-press bias. |
| **Timeliness** | 6.5 | Structural sources are annual (WGI, GPI, RSF, WHO, IMF macro) and dominate the total weight of the index. BIS EER is monthly. The Freshness classifier (T1.5) surfaces this at the dimension level so users can see which parts of a country score are 12 months old. Thirteen stress-side indicators already run at realtime or daily cadence via the cross-source stack (`ucdpConflict`, `internetOutages`, `infraOutages`, `unrestEvents`, `socialVelocity` at realtime; `sanctionCount`, `cyberThreats`, `gpsJamming`, `shippingStress`, `transitDisruption`, `gasStorageStress`, `energyPriceStress`, `newsThreatScore` at daily). Gap: the live-shock pillar relies on those signals but the structural pillar is still capped by annual sources; Phase 2 T2.2 adds FX volatility at daily cadence to narrow the cadence gap on the currency-external dimension and the Phase 3 reference-edition split will formalize annual vs rolling cadences per pillar. |
| **Sensitivity** | 7.0 | Weight-perturbation Monte Carlo sensitivity (#2823) exists in the backtesting layer. Phase 1 did not add new sensitivity work. Gap: per-dimension p5/p95 intervals are computed and exposed (#2877, #2885) but the widget does not render them yet (tracked under the Phase 3 T3.3 waterfall chart). |
**Phase 1 acceptance gate status: met.** Both required thresholds (Methodology ≥7.5, Explainability ≥7.5) are satisfied with honest rationales. The gaps flagged in each axis are tracked against Phase 2 and Phase 3 tasks in the upgrade plan.
### v2.0 (April 2026) — Phase 2 structural rebuild
**Current published version** (shape). Phase 2 of the reference-grade upgrade plan (`docs/internal/country-resilience-upgrade-plan.md`). The response-shape rebuild is live: every response now carries a real coverage-weighted `pillars[]` array regrouping the six domains into structural readiness, live shock exposure, and recovery capacity. The recovery domain adds six new dimensions, and a full validation suite (cross-index benchmark, outcome backtest, sensitivity analysis) gates the activation. The top-level `overall_score` is still computed by the six-domain weighted aggregate (v1 formula); the partly non-compensatory pillar-combined `overall_score` is defined, tested, and flag-gated (see [Pillar-combined score activation](#pillar-combined-score-activation-flag-gated-default-off)), but `RESILIENCE_PILLAR_COMBINE_ENABLED` defaults to `false` so operators can schedule the flip with a proper migration message.
- **T2.1** (#2977): Three-pillar schema added to proto and OpenAPI. `schemaVersion: "2.0"` feature flag introduced with backward-compatible `"1.0"` fallback path for one release cycle. Response now carries a `pillars` array alongside existing `domains`.
- **T2.2a** (#2979): Signal tiering registry committed. Every indicator tagged Core, Enrichment, or Experimental with per-signal coverage percentage and license audit status. Registry enforced by CI linter.
- **T2.2b** (#2987): Recovery capacity pillar with 6 new dimensions across a new `recovery` domain: fiscal space (debt service ratio), reserve adequacy (months of imports), short-term external debt coverage, import concentration (HHI), hospital surge capacity, and state continuity composite (WGI subset). Five new seeders following Railway gold-standard pattern (3 real data sources, 2 stubs pending source configuration). Cache key bumped to the current version.
- **T2.3** (#2990): Three-pillar aggregation shape shipped. Every response now carries real coverage-weighted pillar scores and pillar coverage at `pillars[]`. Pillar weights: structural readiness 0.40, live shock exposure 0.35, recovery capacity 0.25. A penalty factor `(1 - α × (1 - min_pillar / 100))` with α = 0.5 is defined as `penalizedPillarScore` in `server/worldmonitor/resilience/v1/_shared.ts` and is exercised by the sensitivity suite. The **top-level `overall_score` is still the 6-domain weighted aggregate** for this release cycle; the switch to the penalized pillar-combined form is staged behind the `feat/activate-score-gate` branch and is pending the Pillar-combined score activation section below.
- **T2.4** (#2985): Cross-index benchmark script validates each pillar against four established indices (INFORM Risk Index, ND-GAIN, WorldRiskIndex, Fragile States Index) via Spearman and Pearson correlation with per-pillar directional hypotheses. Results stored in `resilience:benchmark:external:v1` and committed as validation artifacts.
- **T2.5** (#2986): Outcome backtest framework covering 7 event families (FX stress, sovereign stress, power outages, food-crisis escalation, refugee surges, sanctions shocks, conflict spillover). Each family has a binary event definition, a 2024-2025 hold-out window, and an AUC release gate of 0.75 or higher.
- **T2.6/T2.8** (#2991): Sensitivity suite v2 with 4-pass perturbation (weight, goalpost, imputation, alpha), alpha-curve analysis, and ceiling-effect detection. Release gate: no single-axis perturbation moves a top-50 country by more than 5 rank positions; overall dimension failure rate must be 20% or lower.
- **T2.7** (#2988): Railway cron service wired for weekly benchmark, backtest, and sensitivity runs. Results published to Redis with health monitoring integration.
- **T2.9** (#2992): Language and source-density normalization for the informationCognitive dimension. RSF press freedom and social velocity scores are weighted by language coverage of the source set to correct for English-press bias. The dimension is promoted back to Core tier after normalization.
**What changed from v1.1**: The five-domain flat structure was extended into a six-domain structure by adding the Recovery domain with six new dimensions, and a three-pillar outer layer groups the six domains into structural readiness (0.40), live shock exposure (0.35), and recovery capacity (0.25). Every response now carries real pillar scores at `pillars[]`. The `schemaVersion` field is `"2.0"` by default (env var `RESILIENCE_SCHEMA_V2_ENABLED=false` provides a rollback path). **The top-level `overall_score` is still the 6-domain weighted aggregate** — the pillar-combined penalized formula is fully defined and validated (see the Pillar-combined score activation section below) but the flip is staged behind a separate PR so the visible score change can ship with a proper migration message. The cache key is bumped to the current version.
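A minimal sketch of the penalized pillar combine, assuming the penalty factor `1 - α × (1 - min_pillar / 100)` multiplies the weighted pillar mean. The `Pillar` shape, the pillar ids, and the function name are hypothetical; this is not the production `penalizedPillarScore`.

```typescript
interface Pillar {
  id: string;     // hypothetical identifier
  score: number;  // 0-100
  weight: number; // 0.40 / 0.35 / 0.25 in this document
}

const DEFAULT_ALPHA = 0.5;

function penalizedPillarScoreSketch(
  pillars: Pillar[],
  alpha: number = DEFAULT_ALPHA,
): number {
  if (pillars.length === 0) return 0;
  const totalWeight = pillars.reduce((sum, p) => sum + p.weight, 0);
  const weightedMean =
    pillars.reduce((sum, p) => sum + p.score * p.weight, 0) / totalWeight;
  // The weakest pillar drives the penalty; the factor is always <= 1.
  const minPillar = Math.min(...pillars.map((p) => p.score));
  const penalty = 1 - alpha * (1 - minPillar / 100);
  return weightedMean * penalty;
}

const imbalanced: Pillar[] = [
  { id: 'structural-readiness', score: 80, weight: 0.4 },
  { id: 'live-shock-exposure', score: 60, weight: 0.35 },
  { id: 'recovery-capacity', score: 40, weight: 0.25 },
];
console.log(penalizedPillarScoreSketch(imbalanced));
```

Under these assumptions the imbalanced example has a weighted mean of 63 and a penalty factor of 0.7, giving about 44.1, while a balanced 70/70/70 country gets a factor of 0.85 and scores about 59.5. Setting α to 0 recovers the plain weighted mean.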
### Pillar-combined score activation (flag-gated, default off)
The plan's non-compensatory pillar combine is the methodologically stronger form: it prevents a strong institutional score from fully washing out a severe live-shock exposure. Before flipping the default we measured the actual impact on the live ranking.
**Sensitivity and comparison artifact** (2026-04-21, commit `048bb8b`, 52-country sample, regenerated after the comparison script was corrected to use the production `buildPillarList` aggregation): [`docs/snapshots/resilience-pillar-sensitivity-2026-04-21.json`](../snapshots/resilience-pillar-sensitivity-2026-04-21.json).
| Metric | Value |
|---|---|
| Spearman rank correlation (current vs proposed) | **0.9863** |
| Mean absolute score delta | **11.30 points** (every country drops) |
| Max top-50 rank swing | **9 positions** (Syria) |
| Ceiling / floor effects under ±20% weight perturbation | **None detected** |
| Release gate result (≤20% dimensions exceeding 3-rank swing) | **PASS** (0/19 failures) |
**Top 5 movers by absolute rank change:**
| Country | Current rank | Proposed rank | Rank Δ | Current score | Proposed score | Score Δ |
|---|---:|---:|---:|---:|---:|---:|
| Syria | 40 | 49 | ↓9 | 49.64 | 30.55 | 19.09 |
| Central African Republic | 46 | 39 | ↑7 | 46.46 | 34.55 | 11.91 |
| Venezuela | 42 | 48 | ↓6 | 47.70 | 31.18 | 16.52 |
| Afghanistan | 33 | 37 | ↓4 | 54.55 | 37.97 | 16.58 |
| Russia | 23 | 27 | ↓4 | 61.08 | 46.28 | 14.80 |
**Interpretation**: Rank order is strongly preserved on the 52-country sample (Spearman 0.9863 clears the ≥0.90 bar typically required for a rank-stable methodology change). The ranking *shape* — who is top-10, who is bottom-10, Lebanon below South Africa, Norway above the US — does not materially change. However, every country's absolute score drops on average ~11 points because the penalty factor is always ≤ 1, and imbalanced countries with one very weak pillar (Syria, Afghanistan, Venezuela, Russia) drop the most (15-19 points). Balanced top-tier countries (Switzerland, Sweden, Denmark, Iceland, Norway) drop the least (5-7 points). This is the intended behavior: the penalty punishes pillar imbalance, and pillar imbalance is strongly correlated with state fragility.
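For readers who want to recompute the rank correlation from the current and proposed rank columns, the sketch below uses the standard no-ties Spearman formula. Whether the comparison script uses this closed form or a tie-aware variant is an assumption; the function name is hypothetical.

```typescript
// Spearman rank correlation for two rankings of the same items, assuming no ties:
// rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the rank difference.
function spearmanFromRanks(ranksA: number[], ranksB: number[]): number {
  if (ranksA.length !== ranksB.length) {
    throw new Error('rank arrays must have equal length');
  }
  const n = ranksA.length;
  if (n < 2) throw new Error('need at least two items');

  let sumSquaredDiff = 0;
  for (let i = 0; i < n; i++) {
    const d = ranksA[i] - ranksB[i];
    sumSquaredDiff += d * d;
  }
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}
```

Identical rankings give 1 and fully reversed rankings give -1, so a value near 0.99 means only a handful of countries trade places by a few positions.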
**Activation sequence**: the rank-stability evidence supports flipping the default — there is no statistical reason to keep the legacy compensatory form. The blocker is messaging: publishing "US = 54.50" the day after publishing "US = 68.26" without a methodology note would look like a regression instead of a rigor upgrade. The pillar-combine activation PR wires the following so the flip is a single env-var change with no code deploy required:
1. **Feature flag**: `RESILIENCE_PILLAR_COMBINE_ENABLED`, read dynamically from `process.env` per call. Default `false`. Set to `true` in Vercel env + Railway env to activate.
2. **Cache invalidation**: per-country score cache bumped from `resilience:score:v9:` to `resilience:score:v10:`, ranking cache bumped from `resilience:ranking:v9` to `resilience:ranking:v10`, and score-history bumped from `resilience:history:v4:` to `resilience:history:v5:`. The version bumps are a clean-slate guard; the actual cross-formula isolation is the `_formula` tag written into every cached score / ranking payload and the `:d6` / `:pc` suffix on every history sorted-set member, checked at read time so a flag flip forces a rebuild without waiting for TTLs.
3. **Methodology-aware level thresholds**: `classifyResilienceLevel` reads `isPillarCombineEnabled()` and switches the high/medium cutoffs from 70/40 (6-domain) to 60/30 (pillar-combined). Without this, scale compression alone would demote FI (75.64 → 68.60) and NZ (76.26 → 67.93) from "high" to "medium" purely because the formula changed, not because anything about the country changed. The re-anchored cutoffs preserve the qualitative label for every country whose old label was correct.
4. **Re-anchored release-gate bands**: `tests/resilience-pillar-combine-activation.test.mts` pins high-band anchors (NO, CH, DK) at ≥ 60 (vs the 6-domain formula's ≥ 70 floor) and low-band anchors (YE, SO) at ≤ 40 (vs ≤ 45). The snapshot test reads `methodologyFormula` from each snapshot and applies the matching bands. The live sample numbers confirm the bands hold with margin: NO proposed ≈ 71.59 (≥ 60 by 11 points), YE ≈ 27.36 (≤ 40 by 13 points).
5. **Projected snapshot**: `docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json` carries the top/bottom/major-economies tables at the proposed formula so reviewers can preview the post-activation ranking before flipping the flag. Once the flag is on in production, run `scripts/freeze-resilience-ranking.mjs` to capture the authoritative full-universe snapshot.
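Items 1 and 3 above can be sketched together: the flag is read on every call, and the level cutoffs follow the active formula. The function names, the `Env` type, the `low` label, the inclusive boundaries, and the rule that only the literal string `true` enables the flag are all assumptions for this example.

```typescript
type Env = Record<string, string | undefined>;
type Level = 'high' | 'medium' | 'low';

// Read the flag from the passed-in environment on every call, so flipping it
// takes effect without reloading the module.
// Assumption: only the literal string "true" enables the flag.
function isPillarCombineEnabledSketch(env: Env): boolean {
  return env.RESILIENCE_PILLAR_COMBINE_ENABLED === 'true';
}

// Cutoffs: 70/40 for the 6-domain aggregate, 60/30 for the pillar-combined form.
function classifyLevelSketch(score: number, env: Env): Level {
  const cutoffs = isPillarCombineEnabledSketch(env)
    ? { high: 60, medium: 30 }
    : { high: 70, medium: 40 };
  if (score >= cutoffs.high) return 'high';
  if (score >= cutoffs.medium) return 'medium';
  return 'low';
}
```

With the flag on, the FI score of 68.60 cited above classifies as `high`; the same number measured against the 6-domain cutoffs would be `medium`, which is the demotion the re-anchoring avoids. In a Node.js service the `env` argument would typically be `process.env`.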
Rollback: set `RESILIENCE_PILLAR_COMBINE_ENABLED=false`, flush the `resilience:score:v10:*`, `resilience:ranking:v10`, and `resilience:history:v5:*` keys (or wait for TTLs to expire). The 6-domain formula lives alongside the pillar combine in `_shared.ts` and needs no code change to come back.
Until operators set the flag, `overall_score` remains the 6-domain weighted aggregate documented above.
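The read-time formula check described under cache invalidation can be sketched as follows. The `CachedScore` shape, the function names, and the use of `d6` / `pc` as the `_formula` values (mirroring the history member suffixes) are assumptions, not the production schema.

```typescript
type FormulaTag = 'd6' | 'pc'; // 6-domain aggregate vs pillar-combined

// Hypothetical cached payload; only the fields needed for the check are shown.
interface CachedScore {
  overallScore: number;
  _formula?: FormulaTag;
}

function activeFormula(pillarCombineEnabled: boolean): FormulaTag {
  return pillarCombineEnabled ? 'pc' : 'd6';
}

// Returns the cached payload only if it was computed under the active formula.
// A null result tells the caller to rebuild instead of waiting for the TTL.
function readIfFormulaMatches(
  cached: CachedScore | null,
  pillarCombineEnabled: boolean,
): CachedScore | null {
  if (cached === null) return null;
  return cached._formula === activeFormula(pillarCombineEnabled) ? cached : null;
}

// History members carry a `:d6` or `:pc` suffix; keep only the active formula's points.
function historyMemberMatches(member: string, pillarCombineEnabled: boolean): boolean {
  return member.endsWith(`:${activeFormula(pillarCombineEnabled)}`);
}
```

Treating an untagged payload as a mismatch is a deliberate choice in this sketch: entries written before tagging existed are rebuilt rather than trusted.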
### Scorecard (v2.0 self-assessment)
Self-assessed against the standard composite-indicator review axes on a 0-10 scale. This is the Phase 2 acceptance gate defined in the upgrade plan (`Validation >= 8.0`, `Data >= 9.0`, `Architecture >= 9.0`). An external expert review (Phase 3 T3.8b) will supersede these self-ratings once it completes.
| Axis | Score | Rationale |
|---|---|---|
| **Validation** | 8.0 | Cross-index benchmark against 4 established indices with per-pillar hypotheses. Outcome backtest across 7 event families with AUC release gates. Sensitivity suite with 4-pass perturbation and ceiling detection. Gap: external expert review (Phase 3 T3.8b) not yet complete. |
| **Data** | 9.0 | 19 dimensions across 6 domains, 47+ indicators. Recovery capacity pillar adds 6 new dimensions with global Core-tier coverage (3 real seeders, 2 stubs pending source configuration). Signal tiering registry tags every indicator Core/Enrichment/Experimental with coverage + license audit. Gap: 2 stub seeders (import HHI, fuel stocks) need real data source integration. |
| **Architecture** | 9.0 | Three-pillar schema with schemaVersion feature flag for backward compat. Penalized weighted mean aggregation with documented alpha. Domain-weighted pillar scores. Cache-key versioning (bumped per schema change). Language normalization corrects English-press bias. Gap: alpha tuning is initial (0.5), needs backtest-driven refinement after live data accumulates. |
| **Methodology** | 8.5 | Every dimension has a named source, direction, goalpost, weight, cadence, imputation class, AND tier. Four-class imputation taxonomy live end-to-end. Freshness classifier surfaces staleness at the dimension level. Methodology doc linter enforces parity. Gap: three-pillar weight rationale is defensible but not yet empirically optimized. |
| **Explainability** | 8.0 | Per-dimension confidence grid with imputation icon + freshness badge. Pillar structure makes the index decomposable (structural vs live-shock vs recovery). Gap: no waterfall chart yet (Phase 3 T3.3), no change attribution (Phase 3 T3.5). |
| **Timeliness** | 7.0 | 13 stress-side indicators at realtime/daily cadence. Language normalization corrects for source-density bias. Recovery capacity adds monthly reserve + debt signals. Gap: structural sources still annual (WGI/GPI/RSF/WHO). Phase 3 reference-edition split formalizes annual vs rolling cadences per pillar. |
**Phase 2 acceptance gate status: met.** All three required thresholds (Validation >= 8.0, Data >= 9.0, Architecture >= 9.0) are satisfied. The gaps flagged in each axis are tracked against Phase 3 tasks in the upgrade plan.
### Editorial notes
- This document is maintained at parity with OECD/JRC composite-indicator standards: every dimension has a named source, direction, goalpost range, weight rationale, cadence, and imputation class. A methodology doc linter (Phase 1 T1.8) validates that the list of dimensions in the indicator registry matches the list documented here and fails CI if they drift.
- For questions about an individual country's score, the widget footer shows the `dataVersion`, the confidence label, and the 30-day delta; the deep-dive panel exposes per-dimension breakdowns so an analyst can see which component moved. The full proto schema lives in `docs/api/ResilienceService.openapi.yaml`.