Commit Graph

6 Commits

Author SHA1 Message Date
Elie Habib
fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops, penalty is
  always at most 1)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00
Elie Habib
676331607a feat(resilience): three-pillar aggregation with penalized weighted mean (T2.3) (#2990)
* feat(resilience): three-pillar aggregation with penalized weighted mean (Phase 2 T2.3)

Wire real three-pillar scoring: structural-readiness (0.40), live-shock-exposure
(0.35), recovery-capacity (0.25). Add penalizedPillarScore formula with alpha=0.50
penalty factor for backtest tuning. Set recovery domain weight to 0.25 and
redistribute existing domain weights proportionally to sum to 1.0. Bump cache
keys v8 to v9. The penalized formula is exported and tested but overallScore
stays as the v1 domain-weighted sum until the flag flips in PR 10.

* fix(resilience): update test description v8 to v9 (#2990 review)

Test descriptions said "(v8)" but assertions check v9 cache keys.
2026-04-12 10:18:42 +04:00
Elie Habib
dca2e1ca3c feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass) (#2959)
* feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass)

Ships the Phase 1 T1.7 schema pass of the country-resilience reference
grade upgrade plan. PR #2944 shipped the classifier table foundation
(ImputationClass type, ImputationEntry interface, IMPUTATION/IMPUTE
tagged with four semantic classes) and explicitly deferred the schema
propagation. This PR lands that propagation so downstream consumers can
distinguish "country is stable" from "country is unmonitored" from
"upstream is down" from "structurally not applicable" on a per-dimension
basis.

What this PR commits

- Proto: new imputation_class string field on ResilienceDimension
  (empty string = dimension has any observed data; otherwise one of
  stable-absence, unmonitored, source-failure, not-applicable).
- Generated TS types: regenerated service_server.ts and service_client.ts
  via make generate.
- Scorer: ResilienceDimensionScore carries ImputationClass | null.
  WeightedMetric carries an optional imputationClass that imputation
  paths populate. weightedBlend aggregates the dominant class by
  weight when the dimension is fully imputed, returns null otherwise.
- All IMPUTE.* early-return paths propagate the class from the table
  (IMPUTE.bisEer, IMPUTE.wtoData, IMPUTE.ipcFood, IMPUTE.unhcrDisplacement).
- Response builder: _shared.ts buildDimensionList passes the class
  through to the ResilienceDimension proto field.
- Tests: weightedBlend aggregation semantics (5 cases), dimension-level
  propagation from IMPUTE tables, serialized response includes the field.

What is deliberately NOT in this PR

- No widget icon rendering (T1.6 full grid, PR 3 of 5)
- No source-failure seed-meta consultation (PR 4 of 5)
- No freshness field (T1.5 propagation, PR 2 of 5)
- No cache key bump: the new field is empty-string default, existing
  cached responses continue to deserialize cleanly

Verified

- make generate clean
- npm run typecheck + typecheck:api clean
- tests/resilience-dimension-scorers.test.mts all passing (existing + new)
- tests/resilience-*.test.mts + test:data suite passing (4361 tests)
- npm run lint exits 0

* fix(resilience): normalize cached score responses on read (#2959 P2)

Greptile P2 finding on PR #2959: cachedFetchJson and
getCachedResilienceScores return pre-change payloads verbatim, so a
resilience:score:v7 entry written before this PR lands lacks the
imputationClass field. Downstream consumers that read
dim.imputationClass get undefined for up to 6 hours until the cache
TTL expires.

Fix: add normalizeResilienceScoreResponse helper that defaults
missing optional fields in place and apply it at both read sites.
Defaults imputationClass to empty string, matching the proto3 default
for the new imputation_class field.

- ensureResilienceScoreCached applies the normalizer after
  cachedFetchJson returns.
- getCachedResilienceScores applies it after each successful
  JSON.parse on the pipeline result.
- Two new test cases: stale payload without imputationClass gets
  defaulted, present values are preserved.
- Not bumping the cache key: stale-read defaults are safe, the key
  bump would invalidate every cached score for a 6-hour cold-start
  cycle. The normalizer is extensible when PR #2961 adds freshness
  to the same payload.

P3 finding (broken docs reference) verified invalid: the proto
comment points to docs/methodology/country-resilience-index.mdx,
which IS the current file. The .md predecessor was renamed in
PR #2945 (T1.3 methodology doc promotion to CII parity). No change
needed to the comment.

* fix(resilience): bump score cache key v7 to v8, drop normalizer (#2959 P2)

Second fixup for the Greptile P2 finding on #2959. The previous fixup
(40ea22009) added normalizeResilienceScoreResponse to default missing
imputationClass fields on cached payloads to empty string. The
reviewer correctly pushed back: defaulting to empty string is the
proto3 default for "dimension has observed data", which silently
misreports pre-rollout imputed dimensions as observed until the 6h
TTL expires.

Correct fix: bump RESILIENCE_SCORE_CACHE_PREFIX from resilience:score:v7:
to resilience:score:v8:. Invalidates every pre-change cache entry, so
the next request per country repopulates with the correct
imputationClass written by the scorer. Cost: a 6h warmup cycle where
first-request-per-country recomputes the score, ~100ms per country
across hundreds of requests.

Also deletes the normalizeResilienceScoreResponse helper and its two
call sites. It was misleading defense-in-depth that can hide future
schema drift bugs. Future additive field additions should bump the
key, not silently default fields.

- server/worldmonitor/resilience/v1/_shared.ts: prefix v7 to v8,
  delete normalizer function and both call sites.
- scripts/seed-resilience-scores.mjs, validate-resilience-correlation.mjs,
  validate-resilience-backtest.mjs: mirror constants bumped.
- tests/resilience-scores-seed.test.mjs: pin literal v7 to v8.
- tests/resilience-ranking.test.mts: 7 hardcoded cache keys bumped.
- tests/resilience-handlers.test.mts: stray v7 cache key bumped.
- tests/resilience-release-gate.test.mts: the two normalizer test
  cases from 40ea22009 deleted along with the helper.
- docs/methodology/country-resilience-index.mdx: Redis keys table
  updated from v7 to v8 to match the canonical constant.

P3 (broken docs reference) confirmed invalid a second time.
docs/methodology/country-resilience-index.mdx exists on origin/main
AND on the PR branch with the same blob hash
d2ab1ebad3. docs/methodology/resilience-index.md
does not exist on either. No proto comment change.
2026-04-11 23:50:27 +04:00
Elie Habib
75e9c22dd3 feat(resilience): populate dataVersion field from seed-meta timestamp (#2865)
* feat(resilience): populate dataVersion field from seed-meta timestamp

Sets dataVersion to the ISO date of the most recent static bundle
seed, making the data vintage visible to API consumers.

* fix(resilience): bump score cache to v7 for dataVersion field addition
2026-04-09 12:22:46 +04:00
Elie Habib
09ed68db09 fix(resilience): revert overall score to domain-weighted average + fix RSF direction (#2847)
* fix(resilience): revert overall score to domain-weighted average + fix RSF direction

1. overallScore reverted from baseline*(1-stressFactor) to
   sum(domainScore * domainWeight) — the multiplicative formula
   crushed all scores by 30-50%
2. RSF press freedom: normalizeHigherBetter → normalizeLowerBetter
   (RSF 0=best, 100=worst; Norway 6.52 was scoring 7 instead of 93)
3. Seed script ranking write removed (handler owns greyedOut split)
4. Widget Impact row removed (stressFactor no longer drives headline)
5. Cache keys bumped: score v6, ranking v6, history v3

* fix(resilience): update validation scripts to v6 + remove lock from read-only seed

1. Validation scripts (backtest, correlation, sensitivity) updated from
   v5 to v6 cache keys. Sensitivity formula updated to domain-weighted.
2. Seed script lock removed — read-only health check needs no lock.

* chore: add clarifying comment on orphaned ranking TTL export
2026-04-09 08:49:54 +04:00
Elie Habib
5b0e11262e feat(resilience): external index correlation validation (ND-GAIN, INFORM) (#2824)
* feat(resilience): external index correlation validation (ND-GAIN, INFORM)

Batch validation script computing Spearman rho between WorldMonitor
resilience scores and ND-GAIN readiness / INFORM risk indices for 50
representative countries. Identifies top divergences.

Phase 4 gate: rho > 0.6 with at least 2 benchmark indices.

* fix(resilience): use correct score cache key v5 in correlation script

Was hardcoded to v4, but production is v5 after PR #2821 (baseline/stress
engine). Script would miss all cached scores and fail with "Too few
scores available".

* chore(resilience): remove unused redisGetJson from correlation script
2026-04-08 16:23:10 +04:00