eliott/worldmonitor - worldmonitor - lab48

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	fbaf07e106	feat(resilience): flag-gated pillar-combined score activation (default off) (#3267) Wires the non-compensatory 3-pillar combined overall_score behind a RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR ships zero behavior change in production. When flipped true the top-level overall_score switches from the 6-domain weighted aggregate to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights 0.40 / 0.35 / 0.25. Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21: - Spearman rank correlation current vs proposed 0.9935 - Mean score delta -13.44 points (every country drops, penalty is always at most 1) - Max top-50 rank swing 6 positions (Russia) - No ceiling or floor effects under plus/minus 20pct perturbation - Release gate PASS 0/19 Code change in server/worldmonitor/resilience/v1/_shared.ts: - New isPillarCombineEnabled() reads env dynamically so tests can flip state without reloading the module - overallScore branches on (isPillarCombineEnabled() AND RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls through to the 6-domain aggregate (unchanged default path) - RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10 - RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10 Cache invalidation: the version bump forces both per-country score cache and ranking cache to recompute from the current code path on first read after a flag flip. Without the bump, 6-domain values cached under the flag-off path would continue to serve for up to 6-12 hours after the flip, producing a ragged mix of formulas.
Ripple of v9 to v10: - api/health.js registry entry - scripts/seed-resilience-scores.mjs (both keys) - scripts/validate-resilience-correlation.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/validate-resilience-backtest.mjs, scripts/benchmark-resilience-external.mjs - tests/resilience-ranking.test.mts 24 fixture usages - tests/resilience-handlers.test.mts - tests/resilience-scores-seed.test.mjs explicit pin - tests/resilience-pillar-aggregation.test.mts explicit pin - docs/methodology/country-resilience-index.mdx New tests/resilience-pillar-combine-activation.test.mts: 7 assertions exercising the flag-on path against the release fixtures with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater than US preserved, elite greater than fragile). Regression guard verifies flipping the flag back off restores the 6-domain aggregate. tests/resilience-ranking-snapshot.test.mts: band thresholds now resolve from a METHODOLOGY_BANDS table keyed on snapshot.methodologyFormula. Backward compatible (missing formula defaults to domain-weighted-6d bands). Snapshots: - docs/snapshots/resilience-ranking-2026-04-21.json tagged methodologyFormula domain-weighted-6d - docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json new: top/bottom/major-economies tables projected from the 52-country sensitivity sample. Explicitly tagged projected (NOT a full-universe live capture). When the flag is flipped in production, run scripts/freeze-resilience-ranking.mjs to capture the authoritative full-universe snapshot. Methodology doc: Pillar-combined score activation section rewritten to describe the flag-gated mechanism (activation is an env-var flip, no code deploy) and the rollback path. Verification: npm run typecheck:all clean, 397/397 resilience tests pass (up from 390, +7 activation tests). Activation plan: 1. Merge this PR with flag default false (zero behavior change) 2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env 3. 
Redeploy or wait for next cold start; v9 to v10 bump forces every country to be rescored on first read 4. Run scripts/freeze-resilience-ranking.mjs against the flag-on deployment and commit the resulting snapshot 5. Ship a v2.0 methodology-change note explaining the re-anchored scale so analysts understand the universal ~13 point score drop is a scale rebase, not a country-level regression Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush resilience:score:v10:* and resilience:ranking:v10 keys (or wait for TTLs). The 6-domain formula stays alongside the pillar combine in _shared.ts and needs no code change to come back.	2026-04-22 06:52:07 +04:00
Elie Habib	676331607a	feat(resilience): three-pillar aggregation with penalized weighted mean (T2.3) (#2990) * feat(resilience): three-pillar aggregation with penalized weighted mean (Phase 2 T2.3) Wire real three-pillar scoring: structural-readiness (0.40), live-shock-exposure (0.35), recovery-capacity (0.25). Add penalizedPillarScore formula with alpha=0.50 penalty factor for backtest tuning. Set recovery domain weight to 0.25 and redistribute existing domain weights proportionally to sum to 1.0. Bump cache keys v8 to v9. The penalized formula is exported and tested but overallScore stays as the v1 domain-weighted sum until the flag flips in PR 10. * fix(resilience): update test description v8 to v9 (#2990 review) Test descriptions said "(v8)" but assertions check v9 cache keys.	2026-04-12 10:18:42 +04:00
Elie Habib	dca2e1ca3c	feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass) (#2959) * feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass) Ships the Phase 1 T1.7 schema pass of the country-resilience reference grade upgrade plan. PR #2944 shipped the classifier table foundation (ImputationClass type, ImputationEntry interface, IMPUTATION/IMPUTE tagged with four semantic classes) and explicitly deferred the schema propagation. This PR lands that propagation so downstream consumers can distinguish "country is stable" from "country is unmonitored" from "upstream is down" from "structurally not applicable" on a per-dimension basis. What this PR commits - Proto: new imputation_class string field on ResilienceDimension (empty string = dimension has any observed data; otherwise one of stable-absence, unmonitored, source-failure, not-applicable). - Generated TS types: regenerated service_server.ts and service_client.ts via make generate. - Scorer: ResilienceDimensionScore carries ImputationClass | null. WeightedMetric carries an optional imputationClass that imputation paths populate. weightedBlend aggregates the dominant class by weight when the dimension is fully imputed, returns null otherwise. - All IMPUTE.* early-return paths propagate the class from the table (IMPUTE.bisEer, IMPUTE.wtoData, IMPUTE.ipcFood, IMPUTE.unhcrDisplacement). - Response builder: _shared.ts buildDimensionList passes the class through to the ResilienceDimension proto field. - Tests: weightedBlend aggregation semantics (5 cases), dimension-level propagation from IMPUTE tables, serialized response includes the field.
What is deliberately NOT in this PR - No widget icon rendering (T1.6 full grid, PR 3 of 5) - No source-failure seed-meta consultation (PR 4 of 5) - No freshness field (T1.5 propagation, PR 2 of 5) - No cache key bump: the new field is empty-string default, existing cached responses continue to deserialize cleanly Verified - make generate clean - npm run typecheck + typecheck:api clean - tests/resilience-dimension-scorers.test.mts all passing (existing + new) - tests/resilience-*.test.mts + test:data suite passing (4361 tests) - npm run lint exits 0 * fix(resilience): normalize cached score responses on read (#2959 P2) Greptile P2 finding on PR #2959: cachedFetchJson and getCachedResilienceScores return pre-change payloads verbatim, so a resilience:score:v7 entry written before this PR lands lacks the imputationClass field. Downstream consumers that read dim.imputationClass get undefined for up to 6 hours until the cache TTL expires. Fix: add normalizeResilienceScoreResponse helper that defaults missing optional fields in place and apply it at both read sites. Defaults imputationClass to empty string, matching the proto3 default for the new imputation_class field. - ensureResilienceScoreCached applies the normalizer after cachedFetchJson returns. - getCachedResilienceScores applies it after each successful JSON.parse on the pipeline result. - Two new test cases: stale payload without imputationClass gets defaulted, present values are preserved. - Not bumping the cache key: stale-read defaults are safe, the key bump would invalidate every cached score for a 6-hour cold-start cycle. The normalizer is extensible when PR #2961 adds freshness to the same payload. P3 finding (broken docs reference) verified invalid: the proto comment points to docs/methodology/country-resilience-index.mdx, which IS the current file. The .md predecessor was renamed in PR #2945 (T1.3 methodology doc promotion to CII parity). No change needed to the comment.
* fix(resilience): bump score cache key v7 to v8, drop normalizer (#2959 P2) Second fixup for the Greptile P2 finding on #2959. The previous fixup (`40ea22009`) added normalizeResilienceScoreResponse to default missing imputationClass fields on cached payloads to empty string. The reviewer correctly pushed back: defaulting to empty string is the proto3 default for "dimension has observed data", which silently misreports pre-rollout imputed dimensions as observed until the 6h TTL expires. Correct fix: bump RESILIENCE_SCORE_CACHE_PREFIX from resilience:score:v7: to resilience:score:v8:. Invalidates every pre-change cache entry, so the next request per country repopulates with the correct imputationClass written by the scorer. Cost: a 6h warmup cycle where first-request-per-country recomputes the score, ~100ms per country across hundreds of requests. Also deletes the normalizeResilienceScoreResponse helper and its two call sites. It was misleading defense-in-depth that can hide future schema drift bugs. Future additive field additions should bump the key, not silently default fields. - server/worldmonitor/resilience/v1/_shared.ts: prefix v7 to v8, delete normalizer function and both call sites. - scripts/seed-resilience-scores.mjs, validate-resilience-correlation.mjs, validate-resilience-backtest.mjs: mirror constants bumped. - tests/resilience-scores-seed.test.mjs: pin literal v7 to v8. - tests/resilience-ranking.test.mts: 7 hardcoded cache keys bumped. - tests/resilience-handlers.test.mts: stray v7 cache key bumped. - tests/resilience-release-gate.test.mts: the two normalizer test cases from `40ea22009` deleted along with the helper. - docs/methodology/country-resilience-index.mdx: Redis keys table updated from v7 to v8 to match the canonical constant. P3 (broken docs reference) confirmed invalid a second time. docs/methodology/country-resilience-index.mdx exists on origin/main AND on the PR branch with the same blob hash `d2ab1ebad3`. 
docs/methodology/resilience-index.md does not exist on either. No proto comment change.	2026-04-11 23:50:27 +04:00
Elie Habib	75e9c22dd3	feat(resilience): populate dataVersion field from seed-meta timestamp (#2865) * feat(resilience): populate dataVersion field from seed-meta timestamp Sets dataVersion to the ISO date of the most recent static bundle seed, making the data vintage visible to API consumers. * fix(resilience): bump score cache to v7 for dataVersion field addition	2026-04-09 12:22:46 +04:00
Elie Habib	09ed68db09	fix(resilience): revert overall score to domain-weighted average + fix RSF direction (#2847) * fix(resilience): revert overall score to domain-weighted average + fix RSF direction 1. overallScore reverted from baseline*(1-stressFactor) to sum(domainScore*domainWeight); the multiplicative formula crushed all scores by 30-50% 2. RSF press freedom: normalizeHigherBetter → normalizeLowerBetter (RSF 0=best, 100=worst; Norway 6.52 was scoring 7 instead of 93) 3. Seed script ranking write removed (handler owns greyedOut split) 4. Widget Impact row removed (stressFactor no longer drives headline) 5. Cache keys bumped: score v6, ranking v6, history v3 * fix(resilience): update validation scripts to v6 + remove lock from read-only seed 1. Validation scripts (backtest, correlation, sensitivity) updated from v5 to v6 cache keys. Sensitivity formula updated to domain-weighted. 2. Seed script lock removed; read-only health check needs no lock. * chore: add clarifying comment on orphaned ranking TTL export	2026-04-09 08:49:54 +04:00
Elie Habib	5b0e11262e	feat(resilience): external index correlation validation (ND-GAIN, INFORM) (#2824) * feat(resilience): external index correlation validation (ND-GAIN, INFORM) Batch validation script computing Spearman rho between WorldMonitor resilience scores and ND-GAIN readiness / INFORM risk indices for 50 representative countries. Identifies top divergences. Phase 4 gate: rho > 0.6 with at least 2 benchmark indices. * fix(resilience): use correct score cache key v5 in correlation script Was hardcoded to v4, but production is v5 after PR #2821 (baseline/stress engine). Script would miss all cached scores and fail with "Too few scores available". * chore(resilience): remove unused redisGetJson from correlation script	2026-04-08 16:23:10 +04:00
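
Commit fbaf07e106 describes a flag that is read at call time and a three-way branch in overallScore, plus a cache-key bump from v9 to v10. The sketch below illustrates that shape only. The function and constant names come from the commit message; the `ScoreInputs` type, the `readEnv` helper, the `"true"` string comparison, and the country-code key suffix are assumptions made for illustration, not the contents of `_shared.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: look up the process environment without needing Node typings.
function readEnv(): Env {
  const g = globalThis as unknown as { process?: { env?: Env } };
  return g.process?.env ?? {};
}

// The default argument is evaluated on every call, so a test can flip the
// flag without reloading the module (the property the commit calls out).
// Assumption: only the literal string "true" enables the flag.
function isPillarCombineEnabled(env: Env = readEnv()): boolean {
  return env.RESILIENCE_PILLAR_COMBINE_ENABLED === "true";
}

// Hypothetical input shape; the real scorer works on richer objects.
interface ScoreInputs {
  domainAggregate: number; // 6-domain weighted aggregate (default path)
  pillarCombined: number;  // result of penalizedPillarScore(pillars)
  pillarCount: number;     // pillars.length
  schemaV2Enabled: boolean; // stands in for RESILIENCE_SCHEMA_V2_ENABLED
}

function overallScore(inputs: ScoreInputs, env: Env = readEnv()): number {
  const usePillars =
    isPillarCombineEnabled(env) && inputs.schemaV2Enabled && inputs.pillarCount > 0;
  // Any failed condition falls through to the unchanged 6-domain aggregate.
  return usePillars ? inputs.pillarCombined : inputs.domainAggregate;
}

// Key strings taken from the rollback note in the commit message.
const RESILIENCE_SCORE_CACHE_PREFIX = "resilience:score:v10:";
const RESILIENCE_RANKING_CACHE_KEY = "resilience:ranking:v10";

// Assumption: per-country keys append a country code to the prefix.
function scoreCacheKey(countryCode: string): string {
  return `${RESILIENCE_SCORE_CACHE_PREFIX}${countryCode}`;
}

const sample: ScoreInputs = {
  domainAggregate: 70,
  pillarCombined: 56.4,
  pillarCount: 3,
  schemaV2Enabled: true,
};
const flagOff = overallScore(sample, {});
const flagOn = overallScore(sample, { RESILIENCE_PILLAR_COMBINE_ENABLED: "true" });
```

Because the version is part of the key, values written under the old formula are never read after the bump; they simply expire with their TTL.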
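
Commit 676331607a names a penalizedPillarScore with alpha 0.50 and pillar weights 0.40 / 0.35 / 0.25, but the log does not spell out the formula. The sketch below assumes one common non-compensatory form: the weighted mean multiplied by a penalty that shrinks as the weakest pillar falls, `1 - alpha * (1 - min / 100)`. That form is an assumption chosen to match the log's remark that the penalty is always at most 1; the repository's actual formula may differ.

```typescript
interface Pillar {
  id: string;
  score: number;  // 0..100
  weight: number; // weights are expected to sum to 1
}

const PENALTY_ALPHA = 0.5; // alpha from the commit message

// Assumed formula (not confirmed by the log):
//   weightedMean * (1 - alpha * (1 - minPillarScore / 100))
function penalizedPillarScore(pillars: Pillar[], alpha: number = PENALTY_ALPHA): number {
  if (pillars.length === 0) return 0;
  const weightedMean = pillars.reduce((acc, p) => acc + p.score * p.weight, 0);
  const weakest = Math.min(...pillars.map((p) => p.score));
  const penalty = 1 - alpha * (1 - weakest / 100); // in [1 - alpha, 1]
  return weightedMean * penalty;
}

// Pillar ids and weights from the commit message; the scores are made up.
const examplePillars: Pillar[] = [
  { id: "structural-readiness", score: 80, weight: 0.4 },
  { id: "live-shock-exposure", score: 60, weight: 0.35 },
  { id: "recovery-capacity", score: 70, weight: 0.25 },
];

// Weighted mean 70.5, weakest pillar 60, penalty 0.8, so roughly 56.4.
const exampleScore = penalizedPillarScore(examplePillars);
```

Under this assumed form a country scores its plain weighted mean only when every pillar is at 100, which would be consistent with the universal score drop reported in fbaf07e106.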
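
Commit dca2e1ca3c says weightedBlend reports the dominant imputation class by weight when a dimension is fully imputed and null otherwise. A minimal sketch of that rule follows. The four class names come from the commit message; the `WeightedMetric` shape shown here, the helper name `dominantImputationClass`, and the first-seen tie-break are assumptions.

```typescript
type ImputationClass =
  | "stable-absence"
  | "unmonitored"
  | "source-failure"
  | "not-applicable";

// Assumed minimal shape: a metric without imputationClass counts as observed.
interface WeightedMetric {
  weight: number;
  imputationClass?: ImputationClass;
}

function dominantImputationClass(metrics: WeightedMetric[]): ImputationClass | null {
  if (metrics.length === 0) return null;
  const totals = new Map<ImputationClass, number>();
  for (const metric of metrics) {
    // Any observed metric means the dimension is not fully imputed.
    if (metric.imputationClass === undefined) return null;
    const current = totals.get(metric.imputationClass) ?? 0;
    totals.set(metric.imputationClass, current + metric.weight);
  }
  let best: ImputationClass | null = null;
  let bestWeight = -Infinity;
  // Strict comparison keeps the first-seen class on ties (an assumption).
  totals.forEach((weight, cls) => {
    if (weight > bestWeight) {
      best = cls;
      bestWeight = weight;
    }
  });
  return best;
}

const fullyImputed = dominantImputationClass([
  { weight: 0.5, imputationClass: "unmonitored" },
  { weight: 0.3, imputationClass: "stable-absence" },
  { weight: 0.3, imputationClass: "stable-absence" },
]);
const partlyObserved = dominantImputationClass([
  { weight: 0.9, imputationClass: "unmonitored" },
  { weight: 0.1 },
]);
```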
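
The RSF direction fix in commit 09ed68db09 can be checked with a small worked example. The normalizer names are taken from the commit message, but the signatures and the min-max implementation below are assumptions; only the inputs (RSF 0 = best, 100 = worst, Norway at 6.52) and the expected 7 versus 93 come from the log.

```typescript
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Assumed min-max scaling onto 0..100 where a larger raw value is better.
function normalizeHigherBetter(value: number, min: number, max: number): number {
  return clamp(((value - min) / (max - min)) * 100, 0, 100);
}

// Mirror image: a smaller raw value is better.
function normalizeLowerBetter(value: number, min: number, max: number): number {
  return 100 - normalizeHigherBetter(value, min, max);
}

const norwayRsf = 6.52; // raw RSF value quoted in the commit message

// Before the fix: a near-best raw value was treated as near-worst.
const beforeFix = Math.round(normalizeHigherBetter(norwayRsf, 0, 100));
// After the fix: the same raw value lands near the top of the scale.
const afterFix = Math.round(normalizeLowerBetter(norwayRsf, 0, 100));
```

Here `beforeFix` is 7 and `afterFix` is 93, matching the figures in the commit message.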
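
Commit 5b0e11262e gates on Spearman rho between resilience scores and external indices. For readers unfamiliar with the statistic, here is a self-contained sketch that computes rho as the Pearson correlation of average ranks. It is not the repository's validation script; the function names are hypothetical, and how the script handles the sign for a risk index such as INFORM (where higher means worse) is not stated in the log, so it is left out.

```typescript
// Assign 1-based ranks, giving tied values the mean of their positions.
function averageRanks(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const rank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) ranks[order[k].i] = rank;
    start = end + 1;
  }
  return ranks;
}

function pearson(a: number[], b: number[]): number {
  const n = a.length;
  const meanA = a.reduce((s, v) => s + v, 0) / n;
  const meanB = b.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varA = 0;
  let varB = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  }
  return cov / Math.sqrt(varA * varB);
}

function spearmanRho(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("spearmanRho needs two equal-length series with at least 2 points");
  }
  return pearson(averageRanks(x), averageRanks(y));
}

// Made-up series: two adjacent pairs swapped out of five items.
const rhoExample = spearmanRho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]);
// Threshold from the commit message (Phase 4 gate: rho > 0.6).
const passesGate = rhoExample > 0.6;
```

For the made-up series the sum of squared rank differences is 4, so rho is 1 - 6*4/(5*24) = 0.8.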