worldmonitor

eliott/worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Commit Graph

Author	SHA1	Message	Date
Elie Habib	52659ce192	feat(resilience): PR 1 — energy construct repair (flag-gated) (#3289) * docs(resilience): PR 1 foundation — Option B framing + v2 energy construct spec First commit in PR 1 of the resilience repair plan. Zero scoring-behaviour change; sets up the construct contract that the code changes will implement. Declares the framing decision required by plan section 3.2 before any scorer code lands: Option B (power-system security) is adopted. Electricity grids are the dominant short-horizon shock-transmission channel, and the choice lets the v2 energy indicator set share one denominator (percent of electricity generation) instead of mixing primary-energy and power-system measures in a composite. Methodology doc changes: - Energy Domain section now documents both the legacy indicator set (still the default) and the v2 indicator set (flag-gated), under a single #### Energy H4 heading so the methodology-doc linter still asserts dimension-id parity with the registry. - v2 indicators: importedFossilDependence (EG.ELC.FOSL.ZS x max(EG.IMP.CONS.ZS, 0)), lowCarbonGenerationShare (EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS), powerLossesPct (EG.ELC.LOSS.ZS), reserveMarginPct (IEA), euGasStorageStress (renamed + scoped to EU), energyPriceStress (retained at 0.15 weight). - Retired under v2: electricityConsumption, gasShare, coalShare, dependency (all into importedFossilDependence), renewShare. - electricityAccess moves from energy to infrastructure under v2. - Added a v2.1 changelog section documenting the flag-gated rollout, acceptance gates (per plan section 6), and snapshot filenames for the post-flag-flip captures. - Known-limitations items 1-3 updated to note PR 1 lands the v2 construct behind RESILIENCE_ENERGY_V2_ENABLED (default off). Methodology-doc linter + mdx-lint + typecheck all clean. Indicator registry, seeders, and scorer rewrite land in subsequent commits on this same branch. 
* feat(resilience): PR 1 — RESILIENCE_ENERGY_V2_ENABLED flag + scoreEnergy v2 + registry entries Second commit in PR 1 of the resilience repair plan. Lands the flag, the v2 scorer code path, and the registry entries the methodology doc referenced. Default is flag off; published rankings are unchanged until the flag flips in a later commit (after seeders land and the acceptance-gate rerun produces a fresh post-flip snapshot). Changes: - _shared.ts: isEnergyV2Enabled() function reader on the canonical RESILIENCE_ENERGY_V2_ENABLED env var. Dynamic read (like isPillarCombineEnabled) so tests can flip per-case. - _dimension-scorers.ts: - New Redis key constants for the three v2 seed keys plus the reserved reserveMargin key (seeder deferred per plan §3.1 open-question). - EU_GAS_STORAGE_COUNTRIES set (EU + EFTA + UK) for the renamed euGasStorageStress signal per plan §3.5 point 2. - isEnergyV2EnabledLocal() — private duplicate of the flag reader to avoid a circular import (_shared.ts already imports from this module). Same env-var contract. - scoreEnergy split into scoreEnergyLegacy() + scoreEnergyV2(). Public scoreEnergy() branches on the flag. Legacy path is byte-identical to the pre-commit behaviour. - scoreEnergyV2() reads four new bulk payloads, composes importedFossilDependence = fossilElectricityShare × max(netImports, 0)/100 per plan §3.2, collapses net exporters to 0, and gates euGasStorageStress on EU membership so non-EU countries re-normalise rather than getting penalised for a regional signal. - _indicator-registry.ts: four new entries under `dimension: 'energy'` with `tier: 'experimental'` — importedFossilDependence (0.35), lowCarbonGenerationShare (0.20), powerLossesPct (0.10), reserveMarginPct (0.10). Experimental tier keeps them out of the Core coverage gate until seed coverage is confirmed. - compare-resilience-current-vs-proposed.mjs: new 'bulk-v1-country-value' shape family in the extraction dispatcher. 
EXTRACTION_RULES now covers the four v2 registry indicators so the per-indicator influence harness tracks them from day one. When the seeders are absent, pairedSampleSize = 0 and Pearson = 0 — the harness output surfaces the "no influence yet" state rather than silently dropping the indicators. - tests/resilience-energy-v2.test.mts: 11 new tests pinning: - flag-off = legacy behaviour preserved (v2 seed keys have no effect when flag is off — catches accidental cross-path reads) - flag-on = v2 composite behaves correctly: - lower fossilElectricityShare raises score - net exporter with 90% fossil > net importer with 90% fossil (max(·, 0) collapse verified) - higher lowCarbonGenerationShare raises score (nuclear credit) - higher powerLossesPct lowers score - euGasStorageStress is invariant for non-EU, responds for DE - all v2 inputs absent = graceful degradation, coverage < 1.0 106 resilience tests pass (existing + 11 new). Typecheck clean. Biome clean. No production behaviour change with flag off (default). Next commits on this branch: three World Bank seeders for the v2 keys, health.js + SEED_META registration (gated ON_DEMAND_KEYS until Railway cron provisions), acceptance-gate rerun at flag-flip time. * feat(resilience): PR 1 — three WB seeders + health registration for v2 energy construct Third commit in PR 1. Lands the seed scripts for the three v2 energy indicator source keys, registered in api/health.js with ON_DEMAND_KEYS gating until Railway cron provisions. New seeders (weekly cron cadence, 8d maxStaleMin = 2x interval): - scripts/seed-low-carbon-generation.mjs Pulls EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS from World Bank, sums per country into `resilience:low-carbon-generation:v1`. Partial coverage (one series missing) still emits a value using the observed half — the scorer's 0-80 saturating goalpost tolerates it and the underlying construct is "firm low-carbon share". 
- scripts/seed-fossil-electricity-share.mjs Pulls EG.ELC.FOSL.ZS into `resilience:fossil-electricity-share:v1`. Feeds the importedFossilDependence composite at score time (composite = fossilShare × max(netImports, 0) / 100 per plan §3.2). - scripts/seed-power-reliability.mjs Pulls EG.ELC.LOSS.ZS into `resilience:power-losses:v1`. Direct grid-integrity signal replacing the retired electricityConsumption wealth proxy. All three follow the existing seed-recovery-.mjs template: - Shape: { countries: { [ISO2]: { value, year } }, seededAt } - runSeed() from _seed-utils.mjs with schemaVersion=1, ttl=35d - validateFn floor of 150 countries (WB coverage is 150-180 for the three indicators; below 150 = transient fetch failure) - ISO3 → ISO2 mapping via scripts/shared/iso3-to-iso2.json No reserveMargin seeder is shipped in this commit per plan §3.1 open question: IEA electricity-balance coverage is sparse outside OECD+G20, and the indicator will likely ship as 'unmonitored' with weight 0.05 if it lands at all. The Redis key (`resilience:reserve-margin:v1`) is reserved in _dimension-scorers.ts so the v2 scorer shape is stable. api/health.js: - SEED_DOMAINS: add `lowCarbonGeneration`, `fossilElectricityShare`, `powerLosses` → their Redis keys. - SEED_META: same three, pointing at `seed-meta:resilience:` meta keys with maxStaleMin=11520 (8d, per the worldmonitor health-maxstalemin-write-cadence pattern: 2x weekly cron). - ON_DEMAND_KEYS: three new entries gated as TRANSITIONAL until Railway cron provisions and the first clean run completes. Remove from this set after ~7 days of green production runs. Typecheck clean; existing 106 resilience tests pass (seeders have no in-repo callers yet, so nothing depends on them executing). Real-API integration tests land when Railway cron is provisioned. Next commit: Railway cron configuration + bundle-runner wiring. * feat(resilience): PR 1 — bundle-runner + acceptance-gate verdict + flag-flip runbook Final commit in the PR 1 tranche. 
Lands the three remaining pieces so the flag-flip is fully operable once Railway cron provisions. - scripts/seed-bundle-resilience-energy-v2.mjs: Railway cron bundle wrapping the three v2 energy seeders (low-carbon-generation, fossil-electricity-share, power-losses). Weekly cadence (7-day intervalMs); the underlying data is annual at source so polling more frequently just hammers the World Bank API. 5-minute per-script timeout. Mirrors the existing seed-bundle-resilience-recovery.mjs pattern. - scripts/compare-resilience-current-vs-proposed.mjs: acceptanceGates block. Programmatic evaluation of plan §6 gates using the inputs the harness already computes: gate-1-spearman: Spearman vs baseline >= 0.85; gate-2-country-drift: Max country drift vs baseline <= 15; gate-6-cohort-median: Cohort median shift vs baseline <= 10; gate-7-matched-pair: Every pair holds expected direction; gate-9-effective-influence: >= 80% Core indicators measurable; gate-universe-integrity: No cohort/pair endpoint missing from scorable. Thresholds are encoded in a const so they can't silently soften. Output verdict is PASS / CONDITIONAL / BLOCK. Emitted in summary.acceptanceVerdict for at-a-glance PR comment pasting, with full per-gate detail in acceptanceGates.results. - docs/methodology/energy-v2-flag-flip-runbook.md: Operator runbook for the flag flip. Pre-flip checklist (seeders green, health endpoint green, ON_DEMAND_KEYS graduation, Spearman verification), flip procedure (pre-flip snapshot, dry-run, cache prefix bump, Vercel env flip, post-flip snapshot, methodology doc reclassification), rollback procedure, and a reference table for the three possible verdict states. PR 1 is now code-complete pending: 1. Railway cron provisioning (ops, not code) 2. Flag flip + acceptance-gate rerun (follows runbook, not code) 3. Reserve-margin seeder (deferred per plan §3.1 open-question) Zero scoring-behaviour change in this commit. 121 resilience tests pass, typecheck clean. 
* fix(resilience): PR 1 — drop unseeded reserveMargin from scorer + fix composite extractor Addresses two P1 review findings on PR #3289. Finding 1: scoreEnergyV2 read resilience:reserve-margin:v1 at weight 0.10 but no seeder ships in this PR (indicator deferred per plan §3.1 open-question). On flag flip that slot would be permanently null, silently renormalizing the remaining 90% of weight and producing a construct different from what the methodology doc describes. Fix: remove reserve-margin from the v2 reader + blend entirely. Redistribute its 0.10 weight to powerLossesPct (now 0.20); both are grid-integrity signals per plan §3.1, and the original plan split electricityConsumption's 0.30 weight across powerLossesPct + reserveMarginPct + importedFossilDependence — without reserveMarginPct, powerLossesPct carries the shared grid-integrity load until the IEA seeder ships. v2 weights now: 0.35 + 0.20 + 0.20 + 0.10 + 0.15 = 1.00 (importedFossilDependence + lowCarbonGenerationShare + powerLossesPct + euGasStorageStress + energyPriceStress) Reserve-margin Redis key constant stays reserved so the v2 scorer shape is stable when a future commit lands the seeder; split 0.10 back out of powerLossesPct at that point. Methodology doc, _shared.ts flag comment, and v2 test suite all updated to the 5-indicator shape. New regression test asserts that changing reserve-margin Redis content has zero effect on the v2 score — guards against a future commit accidentally wiring the reader back in without its seeder. Finding 2: scripts/compare-resilience-current-vs-proposed.mjs measured importedFossilDependence by reading fossilElectricityShare alone. The scorer defines it as fossilShare × max(netImports, 0) / 100, so the extractor zeroed out net exporters and under-reported net importers — making gate-9 effective-influence wrong for the centrepiece construct change of PR 1. 
Fix: new 'imported-fossil-dependence-composite' extractor type in applyExtractionRule that recomputes the same composite from both inputs (fossilShare bulk payload + staticRecord.iea.energyImportDependency.value). Stays in lockstep with the scorer — drift between the two would break gate-9's interpretation. New unit tests pin: - net importer: 80% × max(60, 0) / 100 = 48 ✓ - net exporter: 80% × max(-40, 0) / 100 = 0 ✓ - missing either input → null. 64 resilience tests pass; typecheck clean. Flag-off path is still byte-identical to pre-PR behaviour. * docs(resilience): PR 1 — align methodology doc with actual shipped indicators and seeders Addresses P1 review on docs/methodology/country-resilience-index.mdx lines 29 and 574-575. The doc still described reserveMarginPct as a shipped v2 indicator and listed seed-net-energy-imports.mjs in the new-seeders list, neither of which the branch actually ships. Doc changes to match the code in this branch: Known-limitations item 1: restated to describe the actual v2 replacement footprint — powerLossesPct at 0.20 (temporarily absorbing reserveMarginPct's 0.10) plus accessToElectricityPct moved to infrastructure. reserveMarginPct is named as a deferred companion with the split-out instructions for when its seeder lands. v2.1 changelog (Indicators added): split into "live in PR 1" and "deferred in PR 1" so the reader can distinguish which entries match real code. importedFossilDependence's composite formula now written out and the net-imports source attributed to the existing resilience:static.iea path (not a new seeder). v2.1 changelog (New seeders): lists the three actual files that ship in this branch (seed-low-carbon-generation, seed-fossil-electricity-share, seed-power-reliability) and explicitly notes seed-net-energy-imports.mjs is NOT a new seeder — the EG.IMP.CONS.ZS series is already fetched by seed-resilience-static.mjs. Adds the bundle-runner reference. Methodology-doc linter + mdx-lint both pass (125/125). 
Typecheck clean. Doc is now the source of truth for what PR 1 actually ships. * fix(resilience): PR 1 — sync powerLossesPct registry weight with scorer (0.10 → 0.20) Reviewer-caught mismatch between INDICATOR_REGISTRY and scoreEnergyV2. The previous commit redistributed the deferred reserveMarginPct's 0.10 weight into powerLossesPct in the SCORER but left the REGISTRY entry unchanged at 0.10. Two downstream effects: 1. scripts/compare-resilience-current-vs-proposed.mjs copies `spec.weight` into `nominalWeight` for gate-9 reporting, so powerLossesPct's nominal influence would be under-reported by half in every post-flip acceptance run — exactly the harness PR 1 relies on for merge evidence. 2. Methodology doc vs registry vs scorer drift is the pattern the methodology-doc linter is supposed to catch; it passes here because the linter only checks dimension-id parity, not weights. Registry is now the only remaining source of truth to keep in lockstep with the scorer. Change: - `_indicator-registry.ts` powerLossesPct.weight: 0.1 → 0.2 - Inline comment names the deferral and instructs: "when the IEA electricity-balance seeder lands, split 0.10 back out and restore reserveMarginPct at 0.10. Keep this field in lockstep with scoreEnergyV2 ... because the PR 0 compare harness copies spec.weight into nominalWeight for gate-9 reporting." Experimental weights per dimension invariant still holds (0.35 + 0.20 + 0.20 = 0.75 for energy, well under the 1.0 ceiling). 64 resilience tests pass, typecheck clean.	2026-04-22 17:10:38 +04:00
Elie Habib	da0f26a3cf	feat(resilience): PR 0 diagnostic freeze + fairness-audit harness (no scoring changes) (#3284) * feat(resilience): PR 0 diagnostic freeze + fairness-audit harness Lands the before-state and measurement apparatus every subsequent resilience-scorer PR validates against. Zero scoring changes. Per the v3 plan at docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md this is tranche 0 of five. What lands: - Construct contract published in the methodology doc: absolute resilience not development-adjusted, mechanism test for every indicator, peer-relative views published separately from the core. - Known construct limitations section: six construct errors scheduled for PR 1-3 repair with explicit mapping to plan tranches. - Indicator-source manifest at docs/methodology/indicator-sources.yaml with source, seriesId, seriesUrl, coveragePct, lastObservedYear, license, mechanismTestRationale, and a constructStatus classification. - Pre-repair ranking snapshot at docs/snapshots/resilience-ranking-live-pre-repair-2026-04-22.json (217 items + 5 greyedOut, captured 2026-04-22 08:38 UTC at commit `425507d15`). - Cohort configuration at tests/helpers/resilience-cohorts.mts: six cohorts covering 87 countries (net-fuel-exporters, net-energy-importers-oecd, nuclear-heavy-generation, coal-heavy-domestic, small-island-importers, fragile-states). - Matched-pair sanity panel at tests/helpers/resilience-matched-pairs.mts: six pairs (FR/DE, NO/CA, UAE/BH, JP/KR, IN/ZA, SG/CH) with expected-direction rationale and minGap for acceptance gate 7. - scripts/compare-resilience-current-vs-proposed.mjs extended to emit cohortSummary and matchedPairSummary alongside the existing output shape (backward compatible). - tests/resilience-cohort-config.test.mts: 11 validations ensuring the cohort + matched-pair configs stay well-formed. 
Deferred to PR 0.5 (before PR 1 lands): - Monotonicity test harness for all 19 dimension scorers pinning the sign of every indicator. - Pearson-derivative variable-influence baseline inside the sensitivity script producing the nominal-weight-vs-effective-influence table that plan acceptance gate 8 requires. Verification: typecheck:all clean, 430/430 resilience tests pass, 11/11 new cohort-config tests pass, snapshot auto-discovered and validated by the existing snapshot-test harness. * feat(resilience): PR 0 follow-ups — monotonicity harness, variable-influence baseline, cross-consumer formula gate Completes the PR 0 scope per the v3 plan §5 deliverables. Three adds: 1. Monotonicity test harness tests/resilience-dimension-monotonicity.test.mts pins the direction of movement for 14 indicators across 7 dimensions (reserve adequacy, fiscal space 3x, external debt coverage, import concentration, governance WGI, food/water 2x, energy 5x). Each test builds two synthetic ResilienceSeedReader fixtures differing only in the target indicator and asserts the dimension score moves in the documented direction. The scoreEnergy tests explicitly flag three indicators (gasShare, coalShare, electricityConsumption) that PR 1 §3.1-3.2 overturns so future readers understand which directional claims the plan intentionally replaces. 2. Variable-influence baseline scripts/compare-resilience-current-vs-proposed.mjs now computes per-dimension Pearson correlation against the current overallScore scaled by the dimension's nominal domain weight (a Pearson-derivative approximation of Sobol indices). The output carries a variableInfluence[] array sorted by abs(effectiveInfluence) desc. Acceptance gate 8 from the plan compares post-change effective influence against assigned nominal weight; divergences flag a wealth-proxy or saturated-signal construct problem. 3. 
Cross-consumer formula gate Five external consumers of resilience:score:v10:* now filter stale-formula entries so a flag flip does not serve mixed-formula data downstream: - server/worldmonitor/supply-chain/v1/get-route-impact.ts — readResilienceScore() checks _formula via the new getCurrentCacheFormula export and returns 0 on mismatch. - scripts/validate-resilience-correlation.mjs, scripts/validate-resilience-backtest.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/benchmark-resilience-external.mjs — each inlines a currentCacheFormulaLocal() helper that mirrors the server's formula derivation from env, skips parsed entries whose _formula disagrees, and logs the skip count so operators can notice a mismatch during the flip window. A mixed-formula cohort (some countries d6-tagged, others pc-tagged) would confound every correlation, AUC, and Spearman this repair plan depends on for its acceptance gates. These guards close that gap. Verification: typecheck:all clean, 444/444 resilience tests pass (+14 from the new monotonicity harness). * fix(resilience): PR 0 review follow-ups — sample-union + doc tense Two review-driven fixes on top of PR 0. 1. scripts/compare-resilience-current-vs-proposed.mjs — the cohort and matched-pair summaries were computed against the historical 52-country sensitivity seed, which silently excluded the small-island-importers cohort (zero members in the seed) and the sg-vs-ch matched pair (Singapore not in the seed). With the current script those acceptance gates are partially measured at best. SAMPLE now = union(historical 52 seed, every cohort member, every matched-pair endpoint). The imports for RESILIENCE_COHORTS and MATCHED_PAIRS moved from inside main() to module scope so the union can be computed before the script runs. Net sample size grows from 52 to ~95 countries. Still fast enough for an interactive pass; makes the acceptance gates honest. 2. 
docs/methodology/country-resilience-index.mdx — the construct contract wording read as present-tense compliance ("Every indicator in the scorer passes a single mechanism test"), which contradicted the immediately-following passage about indicators that currently fail the test. Reworded to "is being evaluated against" and added an explicit PR-0-does-not-change-scoring paragraph that names the known-failing indicators (electricityConsumption, gas/coal flat penalties, WHO per-capita health spend) and points at the repair plan for the replacement schedule. Verification: typecheck:all clean, 444/444 resilience tests pass. * fix(resilience): compare-script loads frozen baseline + emits per-indicator influence Addresses two P1 review findings on PR #3284: 1. Script previously compared current-6d vs proposed-pillar-combined from the SAME checkout; never loaded the frozen pre-PR-0 baseline, so acceptance gates 2/6/7 ("no country moved >15pts vs baseline", cohort median shift vs baseline, matched-pair gap change vs baseline) could not be enforced for later scorer PRs. Now auto-discovers the most recent resilience-ranking-live-pre-repair-<date>.json (or post-<pr>-<date>) in docs/snapshots/ and emits a baselineComparison block with: spearmanVsBaseline, maxCountryAbsDelta, biggestDriftsVsBaseline, cohortShiftVsBaseline, matchedPairGapChange. If no baseline is found, the block is emitted with status 'unavailable' so callers distinguish missing-baseline from passed-baseline. 2. variableInfluence was emitted only at the dimension level, which hid the exact sub-indicators the repair plan targets (electricityConsumption, gasShare, coalShare, etc.) inside their parent dimension. Added extractIndicatorValues() which pulls twelve construct-risk indicators per country from the shared memoized reader, then computes per-indicator Pearson correlation against the current overall score. Emitted as perIndicatorInfluence[], sorted by absolute effective influence. 
Acceptance gate 8 ("effective influence agrees in sign and rank-order with assigned nominal weights") is now computable at the indicator level, not only at the dimension level. No production code touched; diagnostic-harness only. * fix(resilience): baseline-snapshot selection by structured parse, not filename sort Addresses P1 review on compare-resilience-current-vs-proposed.mjs:118-130. Plain filename sort breaks the "immediate-prior state" contract two ways: 1. Lexical ordering: `pre-repair` sorts after `post-` (`pr...` to 'r' > 'o'), so the PR-0 freeze would keep winning even after post-PR snapshots exist. Later scorer PRs would then report acceptance-gate deltas against the original pre-repair freeze instead of the immediately-prior post-PR-(N-1) snapshot — the gate would appear valid while measuring against the wrong baseline. 2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so PR-10 would lose the selection to PR-9. Fix: parseBaselineSnapshotMeta() extracts (kind, prNumber, date) from the filename. Sort keys are (kindRank desc, prNumber desc, date desc): - post always beats pre-repair (kindRank 1 vs 0) - among posts, prNumber compared numerically (10 beats 9) - date breaks ties (same-PR re-snapshots, later capture wins) - unlabeled post tags get prNumber 0 so they sort between pre-repair and any numbered PR snapshot Surfaced in output: baselineKind / baselinePrNumber / baselineDate alongside baselineFile so the operator can verify which snapshot was selected without having to reopen the file. Module now isMain-guarded per feedback_seed_isMain_guard memory so tests can import parseBaselineSnapshotMeta without firing the scoring run. Added tests/resilience-baseline-snapshot-ordering.test.mjs (9 tests) pinning the ordering contract for every known failure mode. Diagnostic-harness change only. No production code touched. 
* fix(resilience): full scorable universe + registry-driven per-indicator influence Addresses two fresh P1 review findings on the PR 0 compare harness. Finding 1 — acceptance math ran on a curated ~95-country sample, so plan gate 2 could miss large regressions in excluded countries. - Main scoring loop now iterates the FULL scorable universe (listScorableCountries()), not the 52-country seed + cohort union. - Removed SAMPLE / HISTORICAL_SENSITIVITY_SEED constants. - Added scorableUniverseSize + cohortMissingFromScorable to output so operators see universe size and any cohort/pair endpoint that listScorable refuses to score (fail-loud, not silent drop). Finding 3 — per-indicator influence was a hand-picked 12-indicator subset, hiding most registry indicators from the baseline that later scorer PRs need. - Extraction is now driven by INDICATOR_REGISTRY. Every Core + Enrichment indicator gets a row with explicit extractionStatus: implemented | not-implemented (with reason) | unregistered-in-harness - EXTRACTION_RULES covers 40/59 indicators across 11 shape families (static-path, static-wb-infrastructure, static-wgi, static-wgi-mean, static-who, energy-mix-field, gas-storage-field, recovery-country-field, imf-macro/labor-country-field, national-debt, sanctions-count). - Remaining 19 indicators need either a scorer trace hook (PR 0.5) or a safe aggregation duplicate; each carries a reason string. - extractionCoverage summary (totalIndicators / implemented / notImplemented / unregisteredInHarness / coreImplemented / coreTotal) exposed in output so PR 0.5 progress is measurable. Added tests/resilience-indicator-extraction-plan.test.mjs (11 tests) pinning: every registry entry has an extraction row; not-implemented rows carry a reason; all 12 plan-named construct-risk indicators stay extractable; Core-tier coverage floor of 45%; shape-family unit tests. Diagnostic-harness change only. No production code touched. 
* fix(resilience): wire event-aggregate per-indicator influence via exported scorer helpers Addresses P1 review on PR 0 compare harness. Previous commit marked 16 Core-tier indicators as 'not-implemented' because they needed scorer event-window/severity-weighting math; that left the gate-9 acceptance apparatus incomplete for a large part of the shipped score. Fix: export the scorer-internal aggregation helpers so the harness calls them directly. Zero aggregation math duplicated in the harness, harness and scorer cannot drift. Exported from _dimension-scorers.ts (purely additive): summarizeCyber, summarizeOutages, summarizeGps, summarizeUcdp, summarizeUnrest, summarizeSocialVelocity, getCountryDisplacement, getThreatSummaryScore, countTradeRestrictions, countTradeBarriers. 13 extraction rules moved from not-implemented to implemented: cyberThreats, internetOutages, infraOutages, gpsJamming, ucdpConflict, unrestEvents, socialVelocity, newsThreatScore, displacementTotal, displacementHosted, tradeRestrictions, tradeBarriers, recoveryConflictPressure, recoveryDisplacementVelocity. Coverage: 52/59 total (88%), 46/50 Core-tier (92%). Four Core indicators remain not-implemented for STRUCTURAL reasons, NOT missing code. Scorer inputs are genuinely global scalars with zero per-country variance, so Pearson(indicator, overall) is 0 or NaN by construction: shippingStress, transitDisruption, energyPriceStress — scorer reads a global scalar applied to every country; a per-country effective signal would need re-expression as (global x per-country exposure), which is a derived signal in a different entry. aquastatWaterAvailability — needs a distinct sub-indicator path resolver; enrichment follow-up. New test asserts the three no-per-country-variance indicators STAY not-implemented with a matching reason, so any future extraction that appears to cover them without fixing the underlying construct fails. 
Dispatcher split into STATIC / SIMPLE / AGGREGATE extractor tables to stay under biome complexity limit. Core-tier floor test raised from 45% to 80%. 89 resilience tests pass, typecheck clean, biome clean. No production behaviour changes. * fix(resilience): tag-gated AQUASTAT extractor closes the last fixable Core gap Reviewer flagged aquastatWaterAvailability as the only remaining Core indicator where the not-implemented status was structurally fixable rather than conceptually impossible. Both aquastatWaterStress and aquastatWaterAvailability share a single .aquastat.value field; the scorer's scoreAquastatValue splits them by the sibling .aquastat.indicator tag keyword (stress/withdrawal/ dependency to stress family; availability/renewable/access to availability family). The harness now mirrors this branching: - classifyAquastatFamily implements the scorer's priority order (stress-family match wins even if the tag also contains an availability keyword, matching the sequential if-check at _dimension-scorers.ts L770-776). - static-aquastat-stress / static-aquastat-availability extractors return the value only when the family matches, so stress-family readings never corrupt the availability Pearson and vice versa. Core-tier coverage: 46/50 to 47/50 (94%). The 3 remaining Core not-implemented indicators (shippingStress, transitDisruption, energyPriceStress) are all structural impossibilities: scorer inputs are global scalars with zero per-country variance. New contract test pins both directions of the tag gate plus the priority-order edge case (a tag containing both families' keywords routes to stress). 90 resilience tests pass, typecheck clean, biome clean.	2026-04-22 16:44:12 +04:00
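
The baseline-snapshot ordering described in the second commit row (post beats pre-repair, PR numbers compared numerically, date as the final tiebreak) can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the repository's parseBaselineSnapshotMeta(): the `SnapshotMeta` shape, the `parseSnapshotMeta` and `pickBaseline` names, and especially the `post-pr<N>-<date>` filename layout are assumptions made for the example.

```typescript
// Hypothetical metadata shape for one snapshot filename.
interface SnapshotMeta {
  file: string;
  kindRank: number; // post = 1, pre-repair = 0
  prNumber: number; // 0 for pre-repair and for unlabeled post tags
  date: string; // ISO yyyy-mm-dd, so plain string comparison orders by time
}

// Assumed filename layouts (the post-snapshot layout is a guess):
//   resilience-ranking-live-pre-repair-2026-04-22.json
//   resilience-ranking-live-post-pr10-2026-05-03.json
const PRE = /^resilience-ranking-live-pre-repair-(\d{4}-\d{2}-\d{2})\.json$/;
const POST = /^resilience-ranking-live-post-(.+)-(\d{4}-\d{2}-\d{2})\.json$/;

function parseSnapshotMeta(file: string): SnapshotMeta | null {
  const pre = PRE.exec(file);
  if (pre) return { file, kindRank: 0, prNumber: 0, date: pre[1] };
  const post = POST.exec(file);
  if (!post) return null;
  // Tags that are not of the form "pr<N>" count as unlabeled (prNumber 0).
  const pr = /^pr(\d+)$/i.exec(post[1]);
  return { file, kindRank: 1, prNumber: pr ? Number(pr[1]) : 0, date: post[2] };
}

// Sort keys: kindRank desc, then prNumber desc (numeric), then date desc.
function compareDesc(a: SnapshotMeta, b: SnapshotMeta): number {
  if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
  if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
  if (a.date === b.date) return 0;
  return a.date < b.date ? 1 : -1;
}

function pickBaseline(files: string[]): SnapshotMeta | null {
  const metas = files
    .map((f) => parseSnapshotMeta(f))
    .filter((m): m is SnapshotMeta => m !== null);
  metas.sort(compareDesc);
  return metas[0] ?? null;
}
```

Comparing parsed fields rather than raw filenames is what avoids the two lexical traps the commit message names: `pre-repair` outranking every `post-` name, and `pr9` outranking `pr10`.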

Elie Habib

52659ce192

feat(resilience): PR 1 — energy construct repair (flag-gated) (#3289)

* docs(resilience): PR 1 foundation — Option B framing + v2 energy construct spec

First commit in PR 1 of the resilience repair plan. Zero scoring-behaviour
change; sets up the construct contract that the code changes will implement.

Declares the framing decision required by plan section 3.2 before any
scorer code lands: Option B (power-system security) is adopted. Electricity
grids are the dominant short-horizon shock-transmission channel, and the
choice lets the v2 energy indicator set share one denominator (percent of
electricity generation) instead of mixing primary-energy and power-system
measures in a composite.

Methodology doc changes:
- Energy Domain section now documents both the legacy indicator set
(still the default) and the v2 indicator set (flag-gated), under a
single #### Energy H4 heading so the methodology-doc linter still
asserts dimension-id parity with the registry.
- v2 indicators: importedFossilDependence (EG.ELC.FOSL.ZS x
max(EG.IMP.CONS.ZS, 0)), lowCarbonGenerationShare (EG.ELC.NUCL.ZS +
EG.ELC.RNEW.ZS), powerLossesPct (EG.ELC.LOSS.ZS), reserveMarginPct
(IEA), euGasStorageStress (renamed + scoped to EU), energyPriceStress
(retained at 0.15 weight).
- Retired under v2: electricityConsumption, gasShare, coalShare,
dependency (all into importedFossilDependence), renewShare.
- electricityAccess moves from energy to infrastructure under v2.
- Added a v2.1 changelog section documenting the flag-gated rollout,
acceptance gates (per plan section 6), and snapshot filenames for
the post-flag-flip captures.
- Known-limitations items 1-3 updated to note PR 1 lands the v2
construct behind RESILIENCE_ENERGY_V2_ENABLED (default off).

Methodology-doc linter + mdx-lint + typecheck all clean. Indicator
registry, seeders, and scorer rewrite land in subsequent commits on
this same branch.

* feat(resilience): PR 1 — RESILIENCE_ENERGY_V2_ENABLED flag + scoreEnergy v2 + registry entries

Second commit in PR 1 of the resilience repair plan. Lands the flag,
the v2 scorer code path, and the registry entries the methodology
doc referenced. Default is flag off; published rankings are unchanged
until the flag flips in a later commit (after seeders land and the
acceptance-gate rerun produces a fresh post-flip snapshot).

Changes:

- _shared.ts: isEnergyV2Enabled() reads the canonical
RESILIENCE_ENERGY_V2_ENABLED env var. The read is dynamic (like
isPillarCombineEnabled) so tests can flip the flag per case.

- _dimension-scorers.ts:
- New Redis key constants for the three v2 seed keys plus the
reserved reserveMargin key (seeder deferred per plan §3.1
open-question).
- EU_GAS_STORAGE_COUNTRIES set (EU + EFTA + UK) for the renamed
euGasStorageStress signal per plan §3.5 point 2.
- isEnergyV2EnabledLocal() — private duplicate of the flag reader
to avoid a circular import (_shared.ts already imports from
this module). Same env-var contract.
- scoreEnergy split into scoreEnergyLegacy() + scoreEnergyV2().
Public scoreEnergy() branches on the flag. Legacy path is
byte-identical to the pre-commit behaviour.
- scoreEnergyV2() reads four new bulk payloads, composes
importedFossilDependence = fossilElectricityShare × max(netImports, 0)/100
per plan §3.2, collapses net exporters to 0, and gates
euGasStorageStress on EU membership so non-EU countries
re-normalise rather than getting penalised for a regional
signal.

- _indicator-registry.ts: four new entries under `dimension: 'energy'`
with `tier: 'experimental'` — importedFossilDependence (0.35),
lowCarbonGenerationShare (0.20), powerLossesPct (0.10),
reserveMarginPct (0.10). Experimental tier keeps them out of the
Core coverage gate until seed coverage is confirmed.

- compare-resilience-current-vs-proposed.mjs: new
'bulk-v1-country-value' shape family in the extraction dispatcher.
EXTRACTION_RULES now covers the four v2 registry indicators so
the per-indicator influence harness tracks them from day one.
When the seeders are absent, pairedSampleSize = 0 and Pearson = 0
— the harness output surfaces the "no influence yet" state rather
than silently dropping the indicators.

- tests/resilience-energy-v2.test.mts: 11 new tests pinning:
- flag-off = legacy behaviour preserved (v2 seed keys have no
effect when flag is off — catches accidental cross-path reads)
- flag-on = v2 composite behaves correctly:
- lower fossilElectricityShare raises score
- net exporter with 90% fossil > net importer with 90% fossil
(max(·, 0) collapse verified)
- higher lowCarbonGenerationShare raises score (nuclear credit)
- higher powerLossesPct lowers score
- euGasStorageStress is invariant for non-EU, responds for DE
- all v2 inputs absent = graceful degradation, coverage < 1.0

106 resilience tests pass (existing + 11 new). Typecheck clean. Biome
clean. No production behaviour change with flag off (default).
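
The flag contract described in this commit can be sketched as follows. This is a minimal illustration, not the code in _shared.ts: the accepted truthy value ('true'), the EnvLike type, the explicit env parameter, and the stub scorer bodies are all assumptions. Only the env-var name and the function names come from the commit message.

```typescript
// Hypothetical sketch of a dynamic flag reader; the real reader lives in _shared.ts.
type EnvLike = Record<string, string | undefined>;

// Read the env var on every call (no module-level caching), so a test can
// flip the flag between cases. Accepting only 'true' is an assumption.
function isEnergyV2Enabled(env: EnvLike): boolean {
  return env["RESILIENCE_ENERGY_V2_ENABLED"] === "true";
}

// Stub scorers standing in for scoreEnergyLegacy() and scoreEnergyV2().
const scoreEnergyLegacy = (): string => "legacy";
const scoreEnergyV2 = (): string => "v2";

// The public entry point branches on the flag; an unset flag takes the legacy path.
function scoreEnergy(env: EnvLike): string {
  return isEnergyV2Enabled(env) ? scoreEnergyV2() : scoreEnergyLegacy();
}
```

Passing the environment in explicitly keeps the sketch testable without touching global state; production code would presumably read process.env directly.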

Next commits on this branch: three World Bank seeders for the v2 keys,
health.js + SEED_META registration (gated ON_DEMAND_KEYS until Railway
cron provisions), acceptance-gate rerun at flag-flip time.

* feat(resilience): PR 1 — three WB seeders + health registration for v2 energy construct

Third commit in PR 1. Lands the seed scripts for the three v2 energy
indicator source keys, registered in api/health.js with ON_DEMAND_KEYS
gating until Railway cron provisions.

New seeders (weekly cron cadence, 8d maxStaleMin = 2x interval):
- scripts/seed-low-carbon-generation.mjs
Pulls EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS from World Bank, sums per
country into `resilience:low-carbon-generation:v1`. Partial
coverage (one series missing) still emits a value using the
observed half — the scorer's 0-80 saturating goalpost tolerates
it and the underlying construct is "firm low-carbon share".

- scripts/seed-fossil-electricity-share.mjs
Pulls EG.ELC.FOSL.ZS into `resilience:fossil-electricity-share:v1`.
Feeds the importedFossilDependence composite at score time
(composite = fossilShare × max(netImports, 0) / 100 per plan §3.2).

- scripts/seed-power-reliability.mjs
Pulls EG.ELC.LOSS.ZS into `resilience:power-losses:v1`. Direct
grid-integrity signal replacing the retired electricityConsumption
wealth proxy.

All three follow the existing seed-recovery-*.mjs template:
- Shape: { countries: { [ISO2]: { value, year } }, seededAt }
- runSeed() from _seed-utils.mjs with schemaVersion=1, ttl=35d
- validateFn floor of 150 countries (WB coverage is 150-180 for
the three indicators; below 150 = transient fetch failure)
- ISO3 → ISO2 mapping via scripts/shared/iso3-to-iso2.json
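
A minimal sketch of the payload shape, the partial-coverage sum used for the low-carbon series, and the 150-country validation floor described above. The helper names (lowCarbonShare, validatePayload) and the MIN_COUNTRIES constant are hypothetical; the real seeders go through runSeed() in _seed-utils.mjs, whose signature is not shown here.

```typescript
// Payload shape quoted in the commit message:
// { countries: { [ISO2]: { value, year } }, seededAt }
interface SeedPayload {
  countries: Record<string, { value: number; year: number }>;
  seededAt: string;
}

// Hypothetical helper: sum the nuclear and renewable shares for one country.
// When one series is missing, emit the observed half instead of dropping the country.
function lowCarbonShare(nuclear: number | null, renewables: number | null): number | null {
  if (nuclear === null && renewables === null) return null;
  return (nuclear ?? 0) + (renewables ?? 0);
}

// Assumed validator: fewer than 150 countries is treated as a transient fetch failure.
const MIN_COUNTRIES = 150;
function validatePayload(payload: SeedPayload): boolean {
  return Object.keys(payload.countries).length >= MIN_COUNTRIES;
}
```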

No reserveMargin seeder is shipped in this commit per plan §3.1 open
question: IEA electricity-balance coverage is sparse outside OECD+G20,
and the indicator will likely ship as 'unmonitored' with weight 0.05
if it lands at all. The Redis key (`resilience:reserve-margin:v1`) is
reserved in _dimension-scorers.ts so the v2 scorer shape is stable.

api/health.js:
- SEED_DOMAINS: add `lowCarbonGeneration`, `fossilElectricityShare`,
`powerLosses` → their Redis keys.
- SEED_META: same three, pointing at `seed-meta:resilience:*` meta
keys with maxStaleMin=11520 (8d, per the worldmonitor
health-maxstalemin-write-cadence pattern: 2x weekly cron).
- ON_DEMAND_KEYS: three new entries gated as TRANSITIONAL until
Railway cron provisions and the first clean run completes. Remove
from this set after ~7 days of green production runs.

Typecheck clean; existing 106 resilience tests pass (seeders have no
in-repo callers yet, so nothing depends on them executing). Real-API
integration tests land when Railway cron is provisioned.

Next commit: Railway cron configuration + bundle-runner wiring.

* feat(resilience): PR 1 — bundle-runner + acceptance-gate verdict + flag-flip runbook

Final commit in the PR 1 tranche. Lands the three remaining pieces so
the flag-flip is fully operable once Railway cron provisions.

- scripts/seed-bundle-resilience-energy-v2.mjs
Railway cron bundle wrapping the three v2 energy seeders
(low-carbon-generation, fossil-electricity-share, power-losses).
Weekly cadence (7-day intervalMs); the underlying data is annual
at source so polling more frequently just hammers the World Bank
API. 5-minute per-script timeout. Mirrors the existing
seed-bundle-resilience-recovery.mjs pattern.

- scripts/compare-resilience-current-vs-proposed.mjs: acceptanceGates
block. Programmatic evaluation of plan §6 gates using the inputs
the harness already computes:
gate-1-spearman Spearman vs baseline >= 0.85
gate-2-country-drift Max country drift vs baseline <= 15
gate-6-cohort-median Cohort median shift vs baseline <= 10
gate-7-matched-pair Every pair holds expected direction
gate-9-effective-influence >= 80% Core indicators measurable
gate-universe-integrity No cohort/pair endpoint missing from scorable
Thresholds are encoded in a const so they can't silently soften.
Output verdict is PASS / CONDITIONAL / BLOCK. Emitted in
summary.acceptanceVerdict for at-a-glance PR comment pasting, with
full per-gate detail in acceptanceGates.results.

- docs/methodology/energy-v2-flag-flip-runbook.md
Operator runbook for the flag flip. Pre-flip checklist (seeders
green, health endpoint green, ON_DEMAND_KEYS graduation, Spearman
verification), flip procedure (pre-flip snapshot, dry-run, cache
prefix bump, Vercel env flip, post-flip snapshot, methodology
doc reclassification), rollback procedure, and a reference table
for the three possible verdict states.
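
The acceptanceGates evaluation in the compare script can be pictured with the sketch below, restricted to the three numeric gates for brevity. The thresholds are the ones listed above. Everything else is an assumption: the input field names, the 'skipped' status, and in particular the mapping from gate results to PASS / CONDITIONAL / BLOCK, which the commit message does not spell out.

```typescript
// Thresholds from the gate list above, frozen so they cannot be mutated at runtime.
const THRESHOLDS = Object.freeze({
  minSpearman: 0.85,
  maxCountryDrift: 15,
  maxCohortMedianShift: 10,
});

type GateStatus = "pass" | "fail" | "skipped";
type Verdict = "PASS" | "CONDITIONAL" | "BLOCK";

// Hypothetical harness inputs; null means the input could not be computed
// (for example, no baseline snapshot was found).
interface HarnessInputs {
  spearmanVsBaseline: number | null;
  maxCountryAbsDelta: number | null;
  maxCohortMedianShift: number | null;
}

// A gate with a missing input is reported as 'skipped' rather than silently passed.
function checkGate(value: number | null, ok: (v: number) => boolean): GateStatus {
  if (value === null) return "skipped";
  return ok(value) ? "pass" : "fail";
}

function evaluateGates(input: HarnessInputs): { verdict: Verdict; results: Record<string, GateStatus> } {
  const results: Record<string, GateStatus> = {
    "gate-1-spearman": checkGate(input.spearmanVsBaseline, (v) => v >= THRESHOLDS.minSpearman),
    "gate-2-country-drift": checkGate(input.maxCountryAbsDelta, (v) => v <= THRESHOLDS.maxCountryDrift),
    "gate-6-cohort-median": checkGate(input.maxCohortMedianShift, (v) => v <= THRESHOLDS.maxCohortMedianShift),
  };
  const statuses = Object.keys(results).map((k) => results[k]);
  // ASSUMED mapping: any failure blocks; any skipped gate downgrades to CONDITIONAL.
  const verdict: Verdict = statuses.some((s) => s === "fail")
    ? "BLOCK"
    : statuses.some((s) => s === "skipped")
      ? "CONDITIONAL"
      : "PASS";
  return { verdict, results };
}
```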

PR 1 is now code-complete pending:
1. Railway cron provisioning (ops, not code)
2. Flag flip + acceptance-gate rerun (follows runbook, not code)
3. Reserve-margin seeder (deferred per plan §3.1 open-question)

Zero scoring-behaviour change in this commit. 121 resilience tests
pass, typecheck clean.

* fix(resilience): PR 1 — drop unseeded reserveMargin from scorer + fix composite extractor

Addresses two P1 review findings on PR #3289.

Finding 1: scoreEnergyV2 read resilience:reserve-margin:v1 at weight
0.10 but no seeder ships in this PR (indicator deferred per plan
§3.1 open-question). On flag flip that slot would be permanently
null, silently renormalizing the remaining 90% of weight and
producing a construct different from what the methodology doc
describes. Fix: remove reserve-margin from the v2 reader +
blend entirely. Redistribute its 0.10 weight to powerLossesPct
(now 0.20); both are grid-integrity signals per plan §3.1, and
the original plan split electricityConsumption's 0.30 weight
across powerLossesPct + reserveMarginPct + importedFossilDependence
— without reserveMarginPct, powerLossesPct carries the shared
grid-integrity load until the IEA seeder ships.

v2 weights now: 0.35 + 0.20 + 0.20 + 0.10 + 0.15 = 1.00
(importedFossilDependence + lowCarbonGenerationShare +
powerLossesPct + euGasStorageStress + energyPriceStress)

Reserve-margin Redis key constant stays reserved so the v2
scorer shape is stable when a future commit lands the seeder;
split 0.10 back out of powerLossesPct at that point.
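
To illustrate why a permanently null slot matters, here is a minimal sketch of a weighted blend that skips missing indicators and re-normalises over the observed weight. The weights are the v2 weights listed above; the blend() helper, its return shape, and the coverage definition are assumptions rather than the scorer's actual code.

```typescript
// v2 weights as listed above; they sum to 1.00.
const V2_WEIGHTS: Record<string, number> = {
  importedFossilDependence: 0.35,
  lowCarbonGenerationShare: 0.2,
  powerLossesPct: 0.2,
  euGasStorageStress: 0.1,
  energyPriceStress: 0.15,
};

// Hypothetical blend: indicators with a null score are skipped and the remaining
// weights are re-normalised. coverage = share of nominal weight actually observed.
function blend(scores: Record<string, number | null>): { score: number | null; coverage: number } {
  let weighted = 0;
  let observedWeight = 0;
  let totalWeight = 0;
  for (const key of Object.keys(V2_WEIGHTS)) {
    const w = V2_WEIGHTS[key] ?? 0;
    totalWeight += w;
    const s = scores[key];
    if (s === null || s === undefined) continue;
    weighted += w * s;
    observedWeight += w;
  }
  if (observedWeight === 0) return { score: null, coverage: 0 };
  return { score: weighted / observedWeight, coverage: observedWeight / totalWeight };
}
```

Under this scheme a non-EU country with no euGasStorageStress reading is scored on the other four indicators at roughly 0.9 coverage, which is the intended behaviour for a regional signal. The same mechanism applied to an indicator that is null for every country would quietly change the construct, which is the problem Finding 1 describes.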

Methodology doc, _shared.ts flag comment, and v2 test suite all
updated to the 5-indicator shape. New regression test asserts
that changing reserve-margin Redis content has zero effect on
the v2 score — guards against a future commit accidentally
wiring the reader back in without its seeder.

Finding 2: scripts/compare-resilience-current-vs-proposed.mjs
measured importedFossilDependence by reading fossilElectricityShare
alone. The scorer defines it as fossilShare × max(netImports, 0)
/ 100, so the extractor zeroed out net exporters and
under-reported net importers — making gate-9 effective-influence
wrong for the centrepiece construct change of PR 1.

Fix: new 'imported-fossil-dependence-composite' extractor type
in applyExtractionRule that recomputes the same composite from
both inputs (fossilShare bulk payload + staticRecord.iea.
energyImportDependency.value). Stays in lockstep with the
scorer — drift between the two would break gate-9's
interpretation.

New unit tests pin:
- net importer: 80% × max(60, 0) / 100 = 48 ✓
- net exporter: 80% × max(-40, 0) / 100 = 0 ✓
- missing either input → null

64 resilience tests pass; typecheck clean. Flag-off path is
still byte-identical to pre-PR behaviour.

* docs(resilience): PR 1 — align methodology doc with actual shipped indicators and seeders

Addresses P1 review on docs/methodology/country-resilience-index.mdx
lines 29 and 574-575. The doc still described reserveMarginPct as a
shipped v2 indicator and listed seed-net-energy-imports.mjs in the
new-seeders list, neither of which the branch actually ships.

Doc changes to match the code in this branch:

Known-limitations item 1: restated to describe the actual v2
replacement footprint — powerLossesPct at 0.20 (temporarily
absorbing reserveMarginPct's 0.10) plus accessToElectricityPct
moved to infrastructure. reserveMarginPct is named as a deferred
companion with the split-out instructions for when its seeder
lands.

v2.1 changelog (Indicators added): split into "live in PR 1" and
"deferred in PR 1" so the reader can distinguish which entries
match real code. importedFossilDependence's composite formula
now written out and the net-imports source attributed to the
existing resilience:static.iea path (not a new seeder).

v2.1 changelog (New seeders): lists the three actual files that
ship in this branch (seed-low-carbon-generation, seed-fossil-
electricity-share, seed-power-reliability) and explicitly notes
seed-net-energy-imports.mjs is NOT a new seeder — the
EG.IMP.CONS.ZS series is already fetched by seed-resilience-
static.mjs. Adds the bundle-runner reference.

Methodology-doc linter + mdx-lint both pass (125/125). Typecheck
clean. Doc is now the source of truth for what PR 1 actually ships.

* fix(resilience): PR 1 — sync powerLossesPct registry weight with scorer (0.10 → 0.20)

Reviewer-caught mismatch between INDICATOR_REGISTRY and scoreEnergyV2.
The previous commit redistributed the deferred reserveMarginPct's 0.10
weight into powerLossesPct in the SCORER but left the REGISTRY entry
unchanged at 0.10. Two downstream effects:

1. scripts/compare-resilience-current-vs-proposed.mjs copies
`spec.weight` into `nominalWeight` for gate-9 reporting, so
powerLossesPct's nominal influence would be under-reported by
half in every post-flip acceptance run — exactly the harness PR 1
relies on for merge evidence.
2. Methodology doc vs registry vs scorer drift is the pattern the
methodology-doc linter is supposed to catch; it passes here
because the linter only checks dimension-id parity, not weights.
Registry is now the only remaining source of truth to keep in
lockstep with the scorer.

Change:
- `_indicator-registry.ts` powerLossesPct.weight: 0.1 → 0.2
- Inline comment names the deferral and instructs: "when the IEA
electricity-balance seeder lands, split 0.10 back out and restore
reserveMarginPct at 0.10. Keep this field in lockstep with
scoreEnergyV2 ... because the PR 0 compare harness copies
spec.weight into nominalWeight for gate-9 reporting."

Experimental weights per dimension invariant still holds (0.35 + 0.20
+ 0.20 = 0.75 for energy, well under the 1.0 ceiling). 64 resilience
tests pass, typecheck clean.

2026-04-22 17:10:38 +04:00

Elie Habib

da0f26a3cf

feat(resilience): PR 0 diagnostic freeze + fairness-audit harness (no scoring changes) (#3284)

* feat(resilience): PR 0 diagnostic freeze + fairness-audit harness

Lands the before-state and measurement apparatus every subsequent
resilience-scorer PR validates against. Zero scoring changes. Per the
v3 plan at docs/plans/2026-04-22-001-fix-resilience-scorer-structural-
bias-plan.md this is tranche 0 of five.

What lands:
- Construct contract published in the methodology doc: absolute
  resilience not development-adjusted, mechanism test for every
  indicator, peer-relative views published separately from the core.
- Known construct limitations section: six construct errors scheduled
  for PR 1-3 repair with explicit mapping to plan tranches.
- Indicator-source manifest at docs/methodology/indicator-sources.yaml
  with source, seriesId, seriesUrl, coveragePct, lastObservedYear,
  license, mechanismTestRationale, and a constructStatus classification.
- Pre-repair ranking snapshot at
  docs/snapshots/resilience-ranking-live-pre-repair-2026-04-22.json
  (217 items + 5 greyedOut, captured 2026-04-22 08:38 UTC at commit
  425507d15).
- Cohort configuration at tests/helpers/resilience-cohorts.mts: six
  cohorts covering 87 countries (net-fuel-exporters, net-energy-
  importers-oecd, nuclear-heavy-generation, coal-heavy-domestic,
  small-island-importers, fragile-states).
- Matched-pair sanity panel at tests/helpers/resilience-matched-pairs.mts:
  six pairs (FR/DE, NO/CA, UAE/BH, JP/KR, IN/ZA, SG/CH) with expected-
  direction rationale and minGap for acceptance gate 7.
- scripts/compare-resilience-current-vs-proposed.mjs extended to emit
  cohortSummary and matchedPairSummary alongside the existing output
  shape (backward compatible).
- tests/resilience-cohort-config.test.mts: 11 validations ensuring the
  cohort + matched-pair configs stay well-formed.

Deferred to PR 0.5 (before PR 1 lands):
- Monotonicity test harness for all 19 dimension scorers pinning the
  sign of every indicator.
- Pearson-derivative variable-influence baseline inside the sensitivity
  script producing the nominal-weight-vs-effective-influence table that
  plan acceptance gate 8 requires.

Verification: typecheck:all clean, 430/430 resilience tests pass,
11/11 new cohort-config tests pass, snapshot auto-discovered and
validated by the existing snapshot-test harness.

* feat(resilience): PR 0 follow-ups — monotonicity harness, variable-influence baseline, cross-consumer formula gate

Completes the PR 0 scope per the v3 plan §5 deliverables. Three adds:

1. Monotonicity test harness
   tests/resilience-dimension-monotonicity.test.mts pins the direction
   of movement for 14 indicators across 7 dimensions (reserve adequacy,
   fiscal space 3x, external debt coverage, import concentration,
   governance WGI, food/water 2x, energy 5x). Each test builds two
   synthetic ResilienceSeedReader fixtures differing only in the target
   indicator and asserts the dimension score moves in the documented
   direction. The scoreEnergy tests explicitly flag three indicators
   (gasShare, coalShare, electricityConsumption) that PR 1 §3.1-3.2
   overturns so future readers understand which directional claims the
   plan intentionally replaces.

2. Variable-influence baseline
   scripts/compare-resilience-current-vs-proposed.mjs now computes
   per-dimension Pearson correlation against the current overallScore
   scaled by the dimension's nominal domain weight (a Pearson-derivative
   approximation of Sobol indices). The output carries a
   variableInfluence[] array sorted by abs(effectiveInfluence) desc.
   Acceptance gate 8 from the plan compares post-change effective
   influence against assigned nominal weight; divergences flag a
   wealth-proxy or saturated-signal construct problem.

3. Cross-consumer formula gate
   Five external consumers of resilience:score:v10:* now filter stale-
   formula entries so a flag flip does not serve mixed-formula data
   downstream:
     - server/worldmonitor/supply-chain/v1/get-route-impact.ts —
       readResilienceScore() checks _formula via the new
       getCurrentCacheFormula export and returns 0 on mismatch.
     - scripts/validate-resilience-correlation.mjs,
       scripts/validate-resilience-backtest.mjs,
       scripts/backtest-resilience-outcomes.mjs,
       scripts/benchmark-resilience-external.mjs — each inlines a
       currentCacheFormulaLocal() helper that mirrors the server's
       formula derivation from env, skips parsed entries whose
       _formula disagrees, and logs the skip count so operators can
       notice a mismatch during the flip window.

A mixed-formula cohort (some countries d6-tagged, others pc-tagged)
would confound every correlation, AUC, and Spearman this repair plan
depends on for its acceptance gates. These guards close that gap.
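
A minimal sketch of the stale-formula guard, assuming each cached entry carries a _formula tag as described above. The CachedScore shape and the filterByFormula name are hypothetical, and the current formula is passed in as a parameter because the env-based derivation is not shown in this commit message.

```typescript
// Hypothetical cached entry; `_formula` tags which scoring formula produced it.
interface CachedScore {
  countryCode: string;
  overallScore: number;
  _formula?: string;
}

// Keep only entries whose tag matches the current formula and count the rest,
// so the caller can log how many were skipped during a flip window.
function filterByFormula(
  entries: CachedScore[],
  currentFormula: string,
): { kept: CachedScore[]; skipped: number } {
  const kept = entries.filter((e) => e._formula === currentFormula);
  return { kept, skipped: entries.length - kept.length };
}
```

Untagged entries are treated as mismatches in this sketch, which is the conservative choice; whether the real guards do the same is not stated.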

Verification: typecheck:all clean, 444/444 resilience tests pass
(+14 from the new monotonicity harness).

* fix(resilience): PR 0 review follow-ups — sample-union + doc tense

Two review-driven fixes on top of PR 0.

1. scripts/compare-resilience-current-vs-proposed.mjs — the cohort and
   matched-pair summaries were computed against the historical
   52-country sensitivity seed, which silently excluded the
   small-island-importers cohort (zero members in the seed) and the
   sg-vs-ch matched pair (Singapore not in the seed). With the current
   script those acceptance gates are partially measured at best.

   SAMPLE now = union(historical 52 seed, every cohort member, every
   matched-pair endpoint). The imports for RESILIENCE_COHORTS and
   MATCHED_PAIRS moved from inside main() to module scope so the union
   can be computed before the script runs.

   Net sample size grows from 52 to ~95 countries. Still fast enough
   for an interactive pass; makes the acceptance gates honest.

2. docs/methodology/country-resilience-index.mdx — the construct
   contract wording read as present-tense compliance ("Every indicator
   in the scorer passes a single mechanism test"), which contradicted
   the immediately-following passage about indicators that currently
   fail the test. Reworded to "is being evaluated against" and added
   an explicit PR-0-does-not-change-scoring paragraph that names the
   known-failing indicators (electricityConsumption, gas/coal flat
   penalties, WHO per-capita health spend) and points at the repair
   plan for the replacement schedule.

Verification: typecheck:all clean, 444/444 resilience tests pass.

* fix(resilience): compare-script loads frozen baseline + emits per-indicator influence

Addresses two P1 review findings on PR #3284:

1. Script previously compared current-6d vs proposed-pillar-combined
   from the SAME checkout; never loaded the frozen pre-PR-0 baseline,
   so acceptance gates 2/6/7 ("no country moved >15pts vs baseline",
   cohort median shift vs baseline, matched-pair gap change vs
   baseline) could not be enforced for later scorer PRs.

   Now auto-discovers the most recent
   resilience-ranking-live-pre-repair-<date>.json (or post-<pr>-<date>)
   in docs/snapshots/ and emits a baselineComparison block with:
   spearmanVsBaseline, maxCountryAbsDelta, biggestDriftsVsBaseline,
   cohortShiftVsBaseline, matchedPairGapChange. If no baseline is
   found, the block is emitted with status 'unavailable' so callers
   distinguish missing-baseline from passed-baseline.

2. variableInfluence was emitted only at the dimension level, which
   hid the exact sub-indicators the repair plan targets
   (electricityConsumption, gasShare, coalShare, etc.) inside their
   parent dimension. Added extractIndicatorValues() which pulls twelve
   construct-risk indicators per country from the shared memoized
   reader, then computes per-indicator Pearson correlation against
   the current overall score. Emitted as perIndicatorInfluence[],
   sorted by absolute effective influence.

Acceptance gate 8 ("effective influence agrees in sign and rank-order
with assigned nominal weights") is now computable at the indicator
level, not only at the dimension level.
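
The influence measure can be sketched as a Pearson correlation scaled by nominal weight. The scaling follows the description in the earlier variable-influence commit; treating the product as "effective influence" and returning NaN for constant series are assumptions about how the harness reports it.

```typescript
// Pearson correlation between two equal-length series.
// Returns NaN when a series is constant (zero variance), e.g. a global
// scalar applied identically to every country.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  if (n < 2 || ys.length !== n) return NaN;
  const isConstant = (v: number[]): boolean => v.every((x) => x === v[0]);
  if (isConstant(xs) || isConstant(ys)) return NaN;

  const mean = (v: number[]): number => v.reduce((a, b) => a + b, 0) / v.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i]! - mx;
    const dy = ys[i]! - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// ASSUMED definition: effective influence = Pearson(indicator, overall) × nominal weight.
function effectiveInfluence(indicator: number[], overall: number[], nominalWeight: number): number {
  return pearson(indicator, overall) * nominalWeight;
}
```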

No production code touched; diagnostic-harness only.

* fix(resilience): baseline-snapshot selection by structured parse, not filename sort

Addresses P1 review on compare-resilience-current-vs-proposed.mjs:118-130.

Plain filename sort breaks the "immediate-prior state" contract two ways:

1. Lexical ordering: `pre-repair` sorts after `post-*`
   (`pr...` vs `po...`: 'r' > 'o'), so the PR-0 freeze would keep winning even
   after post-PR snapshots exist. Later scorer PRs would then report
   acceptance-gate deltas against the original pre-repair freeze
   instead of the immediately-prior post-PR-(N-1) snapshot — the gate
   would appear valid while measuring against the wrong baseline.

2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so PR-10 would
   lose the selection to PR-9.

Fix: parseBaselineSnapshotMeta() extracts (kind, prNumber, date) from
the filename. Sort keys are (kindRank desc, prNumber desc, date desc):
  - post always beats pre-repair (kindRank 1 vs 0)
  - among posts, prNumber compared numerically (10 beats 9)
  - date breaks ties (same-PR re-snapshots, later capture wins)
  - unlabeled post tags get prNumber 0 so they sort between
    pre-repair and any numbered PR snapshot
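
The ordering contract can be sketched as below. This is an illustration of the sort keys, not the shipped parseBaselineSnapshotMeta: the exact post-snapshot filename layout (post-<tag>-<date>, with tags such as pr10), the SnapshotMeta shape, and the pickBaseline helper are assumptions.

```typescript
interface SnapshotMeta {
  file: string;
  kind: "pre-repair" | "post";
  kindRank: number; // post = 1 beats pre-repair = 0
  prNumber: number; // 0 for pre-repair and for unlabeled post tags
  date: string; // ISO yyyy-mm-dd, so string comparison orders correctly
}

// Sketch of a structured filename parse (filename layout is assumed).
function parseSnapshotMeta(file: string): SnapshotMeta | null {
  const pre = /^resilience-ranking-live-pre-repair-(\d{4}-\d{2}-\d{2})\.json$/.exec(file);
  if (pre) return { file, kind: "pre-repair", kindRank: 0, prNumber: 0, date: pre[1] ?? "" };
  const post = /^resilience-ranking-live-post-(.+)-(\d{4}-\d{2}-\d{2})\.json$/.exec(file);
  if (!post) return null;
  const pr = /^pr(\d+)$/.exec(post[1] ?? "");
  return { file, kind: "post", kindRank: 1, prNumber: pr ? Number(pr[1]) : 0, date: post[2] ?? "" };
}

// Sort keys: kindRank desc, then prNumber desc (numeric), then date desc.
function pickBaseline(files: string[]): SnapshotMeta | null {
  const metas = files
    .map(parseSnapshotMeta)
    .filter((m): m is SnapshotMeta => m !== null);
  metas.sort(
    (a, b) =>
      b.kindRank - a.kindRank ||
      b.prNumber - a.prNumber ||
      (b.date > a.date ? 1 : b.date < a.date ? -1 : 0),
  );
  return metas[0] ?? null;
}
```

Comparing prNumber numerically is what lets PR 10 beat PR 9, and ranking by kind first is what stops the pre-repair freeze from winning once any post snapshot exists.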

Surfaced in output: baselineKind / baselinePrNumber / baselineDate
alongside baselineFile so the operator can verify which snapshot was
selected without having to reopen the file.

Module now isMain-guarded per feedback_seed_isMain_guard memory so
tests can import parseBaselineSnapshotMeta without firing the
scoring run.

Added tests/resilience-baseline-snapshot-ordering.test.mjs (9 tests)
pinning the ordering contract for every known failure mode.

Diagnostic-harness change only. No production code touched.

* fix(resilience): full scorable universe + registry-driven per-indicator influence

Addresses two fresh P1 review findings on the PR 0 compare harness.

Finding 1 — acceptance math ran on a curated ~95-country sample,
so plan gate 2 could miss large regressions in excluded countries.

  - Main scoring loop now iterates the FULL scorable universe
    (listScorableCountries()), not the 52-country seed + cohort union.
  - Removed SAMPLE / HISTORICAL_SENSITIVITY_SEED constants.
  - Added scorableUniverseSize + cohortMissingFromScorable to output
    so operators see universe size and any cohort/pair endpoint that
    listScorable refuses to score (fail-loud, not silent drop).

Finding 3 — per-indicator influence was a hand-picked 12-indicator
subset, hiding most registry indicators from the baseline that
later scorer PRs need.

  - Extraction is now driven by INDICATOR_REGISTRY. Every Core +
    Enrichment indicator gets a row with explicit extractionStatus:
      implemented | not-implemented (with reason) | unregistered-in-harness
  - EXTRACTION_RULES covers 40/59 indicators across 11 shape families
    (static-path, static-wb-infrastructure, static-wgi, static-wgi-mean,
    static-who, energy-mix-field, gas-storage-field, recovery-country-
    field, imf-macro/labor-country-field, national-debt, sanctions-count).
  - Remaining 19 indicators need either a scorer trace hook (PR 0.5)
    or a safe aggregation duplicate; each carries a reason string.
  - extractionCoverage summary (totalIndicators / implemented /
    notImplemented / unregisteredInHarness / coreImplemented / coreTotal)
    exposed in output so PR 0.5 progress is measurable.
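
A registry-driven extraction plan of this kind might look like the sketch below. The status values and the coverage field names come from the list above; the RegistryEntry and ExtractionRule shapes, the lowercase tier strings, and buildExtractionPlan itself are hypothetical.

```typescript
type Tier = "core" | "enrichment" | "experimental";
interface RegistryEntry {
  id: string;
  tier: Tier;
}
type ExtractionRule =
  | { status: "implemented" }
  | { status: "not-implemented"; reason: string };

// Every registry entry gets a row; entries without a rule are surfaced as
// 'unregistered-in-harness' instead of being dropped.
function buildExtractionPlan(
  registry: RegistryEntry[],
  rules: Partial<Record<string, ExtractionRule>>,
) {
  const rows = registry.map((entry) => {
    const rule = rules[entry.id];
    return {
      id: entry.id,
      tier: entry.tier,
      extractionStatus: rule ? rule.status : ("unregistered-in-harness" as const),
      reason: rule && rule.status === "not-implemented" ? rule.reason : undefined,
    };
  });
  const core = rows.filter((r) => r.tier === "core");
  const coverage = {
    totalIndicators: rows.length,
    implemented: rows.filter((r) => r.extractionStatus === "implemented").length,
    notImplemented: rows.filter((r) => r.extractionStatus === "not-implemented").length,
    unregisteredInHarness: rows.filter((r) => r.extractionStatus === "unregistered-in-harness").length,
    coreImplemented: core.filter((r) => r.extractionStatus === "implemented").length,
    coreTotal: core.length,
  };
  return { rows, coverage };
}
```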

Added tests/resilience-indicator-extraction-plan.test.mjs (11 tests)
pinning: every registry entry has an extraction row; not-implemented
rows carry a reason; all 12 plan-named construct-risk indicators stay
extractable; Core-tier coverage floor of 45%; shape-family unit tests.

Diagnostic-harness change only. No production code touched.

* fix(resilience): wire event-aggregate per-indicator influence via exported scorer helpers

Addresses P1 review on PR 0 compare harness. Previous commit marked 16
Core-tier indicators as 'not-implemented' because they needed scorer
event-window/severity-weighting math; that left the gate-9 acceptance
apparatus incomplete for a large part of the shipped score.

Fix: export the scorer-internal aggregation helpers so the harness
calls them directly. No aggregation math is duplicated in the harness,
so harness and scorer cannot drift.

Exported from _dimension-scorers.ts (purely additive):
  summarizeCyber, summarizeOutages, summarizeGps,
  summarizeUcdp, summarizeUnrest, summarizeSocialVelocity,
  getCountryDisplacement, getThreatSummaryScore,
  countTradeRestrictions, countTradeBarriers.

13 extraction rules moved from not-implemented to implemented:
  cyberThreats, internetOutages, infraOutages, gpsJamming,
  ucdpConflict, unrestEvents, socialVelocity, newsThreatScore,
  displacementTotal, displacementHosted, tradeRestrictions,
  tradeBarriers, recoveryConflictPressure, recoveryDisplacementVelocity.

Coverage:
  52/59 total (88%), 46/50 Core-tier (92%).

Four Core indicators remain not-implemented for STRUCTURAL reasons,
NOT missing code. Scorer inputs are genuinely global scalars with
zero per-country variance, so Pearson(indicator, overall) is 0 or
NaN by construction:
  shippingStress, transitDisruption, energyPriceStress — scorer
  reads a global scalar applied to every country; a per-country
  effective signal would need re-expression as (global x per-country
  exposure), which is a derived signal in a different entry.
  aquastatWaterAvailability — needs a distinct sub-indicator path
  resolver; enrichment follow-up.

New test asserts the three no-per-country-variance indicators STAY
not-implemented with a matching reason, so any future extraction
that appears to cover them without fixing the underlying construct
fails.

Dispatcher split into STATIC / SIMPLE / AGGREGATE extractor tables
to stay under biome complexity limit. Core-tier floor test raised
from 45% to 80%.

89 resilience tests pass, typecheck clean, biome clean. No production
behaviour changes.

* fix(resilience): tag-gated AQUASTAT extractor closes the last fixable Core gap

Reviewer flagged aquastatWaterAvailability as the only remaining Core
indicator where the not-implemented status was structurally fixable
rather than conceptually impossible.

Both aquastatWaterStress and aquastatWaterAvailability share a single
.aquastat.value field; the scorer's scoreAquastatValue splits them
by keyword in the sibling .aquastat.indicator tag (stress/withdrawal/
dependency map to the stress family; availability/renewable/access map
to the availability family). The harness now mirrors this branching:

  - classifyAquastatFamily implements the scorer's priority order
    (stress-family match wins even if the tag also contains an
    availability keyword, matching the sequential if-check at
    _dimension-scorers.ts L770-776).
  - static-aquastat-stress / static-aquastat-availability extractors
    return the value only when the family matches, so stress-family
    readings never corrupt the availability Pearson and vice versa.
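
The tag gate and its priority order can be sketched as follows. The keyword lists are the ones named above. The case-insensitive match, the 'unknown' fallback, the AquastatRecord shape, and the extractAquastat helper are assumptions, not the harness code.

```typescript
type AquastatFamily = "stress" | "availability" | "unknown";

const STRESS_KEYWORDS = ["stress", "withdrawal", "dependency"];
const AVAILABILITY_KEYWORDS = ["availability", "renewable", "access"];

// Stress is checked first, so a tag containing keywords from both
// families routes to stress (mirrors a sequential if-check).
function classifyAquastatFamily(indicatorTag: string | null | undefined): AquastatFamily {
  const tag = (indicatorTag ?? "").toLowerCase();
  if (STRESS_KEYWORDS.some((k) => tag.indexOf(k) !== -1)) return "stress";
  if (AVAILABILITY_KEYWORDS.some((k) => tag.indexOf(k) !== -1)) return "availability";
  return "unknown";
}

// Hypothetical record: one shared value plus the tag that says what it measures.
interface AquastatRecord {
  value: number | null;
  indicator?: string;
}

// Return the value only when the tag's family matches the requested family,
// so a stress reading never leaks into the availability series and vice versa.
function extractAquastat(
  record: AquastatRecord | null,
  family: "stress" | "availability",
): number | null {
  if (record === null || record.value === null) return null;
  return classifyAquastatFamily(record.indicator) === family ? record.value : null;
}
```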

Core-tier coverage: 46/50 to 47/50 (94%). The 3 remaining Core
not-implemented indicators (shippingStress, transitDisruption,
energyPriceStress) are all structural impossibilities: scorer inputs
are global scalars with zero per-country variance.

New contract test pins both directions of the tag gate plus the
priority-order edge case (a tag containing both families' keywords
routes to stress).

90 resilience tests pass, typecheck clean, biome clean.

2026-04-22 16:44:12 +04:00

2 Commits