mirror of https://github.com/koala73/worldmonitor.git (synced 2026-04-25 17:14:57 +02:00)
* feat(resilience): PR 3 §3.5 — retire fuelStockDays from core score permanently
First commit in PR 3 of the resilience repair plan. Retires
`fuelStockDays` from the core score with no replacement.
Why permanent, not replaced:
IEA emergency-stockholding rules are defined in days of NET IMPORTS
and do not bind net exporters by design. Norway/Canada/US measured
in days-of-imports are incomparable to Germany/Japan measured the
same way — the construct is fundamentally different across the two
country classes. No globally-comparable recovery-fuel signal can
be built from this source; the pre-repair probe showed 100% imputed
at 50 for every country in the April 2026 freeze.
scoreFuelStockDays:
- Rewritten to return coverage=0 + observedWeight=0 +
imputationClass='source-failure' for every country regardless
of seed content.
- Drops the dimension from the `recovery` domain's coverage-
weighted mean automatically; remaining recovery dimensions
pick up the share via re-normalisation in
`_shared.ts#coverageWeightedMean`.
- No explicit weight transfer needed — the coverage-weighted
blend handles redistribution.
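The redistribution described above can be sketched with a minimal coverage-weighted mean. This is an illustrative model only: the `Dim` shape is hypothetical, not the production `ResilienceDimensionScore` type, and the helper simply mirrors the behavior attributed to `_shared.ts#coverageWeightedMean`.

```typescript
// Hypothetical simplified dimension shape (not the production type).
interface Dim {
  id: string;
  score: number;    // 0..100
  coverage: number; // 0..1; coverage=0 contributes nothing
}

// Mirrors the described coverage-weighted mean: a dimension at
// coverage=0 drops out of the denominator, so the surviving
// dimensions absorb its share with no explicit weight transfer.
function coverageWeightedMean(dims: Dim[]): number {
  const totalCoverage = dims.reduce((s, d) => s + d.coverage, 0);
  if (totalCoverage === 0) return 0;
  return dims.reduce((s, d) => s + d.score * d.coverage, 0) / totalCoverage;
}

const recovery: Dim[] = [
  { id: 'fuelStockDays', score: 50, coverage: 0 }, // retired: ignored
  { id: 'reserveMonths', score: 80, coverage: 1 },
  { id: 'importHhi', score: 60, coverage: 1 },
];
console.log(coverageWeightedMean(recovery)); // → 70, the mean of the two live dims
```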
Registry:
- recoveryFuelStockDays re-tagged from tier='enrichment' to
tier='experimental' so the Core coverage gate treats it as
out-of-score.
- Description updated to make the retirement explicit; entry
stays in the registry for structural continuity (the
dimension `fuelStockDays` remains in RESILIENCE_DIMENSION_ORDER
for the 19-dimension tests; removing the dimension entirely is
a PR 4 structural-audit concern).
Housekeeping:
- Removed `RESILIENCE_RECOVERY_FUEL_STOCKS_KEY` constant (no
longer read; noUnusedLocals would reject it).
- Removed `RecoveryFuelStocksCountry` interface for the same
reason. Comment at the removed declaration instructs future
maintainers not to re-add the type as a reservation; when a
new recovery-fuel concept lands, introduce a fresh interface.
Plan reference: §3.5 point 1 of
`docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md`.
All 51 resilience tests pass; typecheck + biome clean. The
`recovery` domain's published score will shift slightly for every
country, because the 0.10 weight share that fuelStockDays previously
filled with imputed values now redistributes across the surviving
dimensions; the compare-harness acceptance-gate rerun at merge time
will quantify the shift per the plan's §6 gates.
* feat(resilience): PR 3 §3.5 — retire BIS-backed currencyExternal; rebuild on IMF inflation + WB reserves
BIS REER/DSR feeds were load-bearing in currencyExternal (weights 0.35
fxVolatility + 0.35 fxDeviation, ~70% of dimension). They cover ~60
countries max — so every non-BIS country fell through to
curated_list_absent (coverage 0.3) or a thin IMF proxy (coverage 0.45).
Combined with reserveMarginPct already removed in PR 1, currencyExternal
was the clearest "construct absent for most of the world" carrier left
in the scorer.
Changes:
_dimension-scorers.ts
- scoreCurrencyExternal now reads IMF macro (inflationPct) + WB FX
reserves only. Coverage ladder:
inflation + reserves → 0.85 (observed primary + secondary)
inflation only → 0.55
reserves only → 0.40
neither → 0.30 (IMPUTE.bisEer retained for snapshot
continuity; semantics read as
"no IMF + no WB reserves" now)
- Removed dead symbols: RESILIENCE_BIS_EXCHANGE_KEY constant (reserved
via comment only, flagged by noUnusedLocals), stddev() helper,
getCountryBisExchangeRates() loader, BisExchangeRate interface,
dateToSortableNumber() — all were exclusive callers of the retired
BIS path.
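The coverage ladder above reduces to a simple presence check. The sketch below is illustrative only: the input field names and the standalone function are assumptions, not the scorer's real signature.

```typescript
// Hypothetical input shape for the sketch (not the production type).
interface CurrencyInputs {
  inflationPct?: number;   // IMF macro
  reservesMonths?: number; // WB FX reserves
}

// Mirrors the ladder described in the commit message.
function currencyExternalCoverage(input: CurrencyInputs): number {
  const hasInflation = input.inflationPct !== undefined;
  const hasReserves = input.reservesMonths !== undefined;
  if (hasInflation && hasReserves) return 0.85; // observed primary + secondary
  if (hasInflation) return 0.55;                // primary only
  if (hasReserves) return 0.40;                 // secondary only
  return 0.30;                                  // neither observed
}

console.log(currencyExternalCoverage({ inflationPct: 3.1, reservesMonths: 6 })); // 0.85
console.log(currencyExternalCoverage({}));                                       // 0.3
```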
_indicator-registry.ts
- New core entry inflationStability (weight 0.60, tier=core,
sourceKey=economic:imf:macro:v2).
- fxReservesAdequacy weight 0.15 → 0.40 (secondary reliability
anchor).
- fxVolatility + fxDeviation demoted tier=enrichment → tier=experimental
(BIS ~60-country coverage; off the core weight sum).
- Non-experimental weights now sum to 1.0 (0.60 + 0.40).
scripts/compare-resilience-current-vs-proposed.mjs
- EXTRACTION_RULES: added inflationStability →
imf-macro-country-field field=inflationPct so the registry-parity
test passes and the correlation harness sees the new construct.
tests/resilience-dimension-scorers.test.mts
- Dropped BIS-era wording ("non-BIS country") and test 266
(BIS-outage coverage 0.35 branch) which collapsed to the inflation-
only path post-retirement.
- Updated coverage assertions: inflation-only 0.45 → 0.55; inflation+
reserves 0.55 → 0.85.
tests/resilience-scorers.test.mts
- domainAverages.economic 68.33 → 66.33 (US currencyExternal score
shifts slightly under IMF+reserves vs old BIS composite).
- stressScore 67.85 → 67.21; stressFactor 0.3215 → 0.3279.
- overallScore 65.82 → 65.52.
- baselineScore unchanged (currencyExternal is stress-only).
All 6324 data-tier tests pass. typecheck:api clean. No change to
seeders or Redis keys; this is a pure scorer + registry rebuild.
* feat(resilience): PR 3 §3.5 point 3 — re-goalpost externalDebtCoverage (0..5 → 0..2)
Plan §2.1 diagnosis table showed externalDebtCoverage saturating at
score=100 across all 9 probe countries — including stressed states.
Signal was collapsed. Root cause: (worst=5, best=0) gave every country
with ratio < 0.5 a score above 90, and mapped Greenspan-Guidotti's
reserve-adequacy threshold (ratio=1.0) to score 80 — well into "no
worry" territory instead of the "mild warning" it should be.
Re-anchored on Greenspan-Guidotti directly: ratio=1.0 now maps to score
50 (mild warning), ratio=2.0 to score 0 (acute rollover-shock exposure).
Ratios above 2.0 clamp to 0, consistent with "beyond this point the
country is already in crisis; exact value stops mattering."
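The before/after anchoring can be checked with a generic linear goalpost normalizer. This is a sketch, not the repo's actual normalizer API; only the (worst, best) values come from the commit message.

```typescript
// Generic goalpost normalizer: maps `worst` → 0 and `best` → 100,
// clamping outside the band. Function name is illustrative.
function linearScore(value: number, worst: number, best: number): number {
  const t = (value - worst) / (best - worst);
  return Math.min(100, Math.max(0, t * 100));
}

// Old goalposts (worst=5, best=0): the Greenspan-Guidotti threshold
// (ratio=1.0) lands at 80 and every ratio below 0.5 scores above 90 —
// the saturation the plan's diagnosis table flagged.
console.log(linearScore(1.0, 5, 0)); // 80
console.log(linearScore(0.5, 5, 0)); // 90

// New goalposts (worst=2, best=0): ratio=1.0 → 50 (mild warning),
// ratio=2.0 → 0, and anything beyond 2.0 clamps to 0.
console.log(linearScore(1.0, 2, 0)); // 50
console.log(linearScore(2.5, 2, 0)); // 0
```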
Files changed:
- _indicator-registry.ts: recoveryDebtToReserves goalposts
{worst: 5, best: 0} → {worst: 2, best: 0}. Description updated to
cite Greenspan-Guidotti; inline comment documents anchor + rationale.
- _dimension-scorers.ts: scoreExternalDebtCoverage normalizer bound
changed from (0..5) to (0..2), with inline comment.
- docs/methodology/country-resilience-index.mdx: goalpost table row
5-0 → 2-0, description cites Greenspan-Guidotti.
- docs/methodology/indicator-sources.yaml:
* constructStatus: dead-signal → observed-mechanism (signal is now
discriminating).
* reviewNotes updated to describe the new anchor.
* mechanismTestRationale names the Greenspan-Guidotti rule.
- tests/resilience-dimension-monotonicity.test.mts: updated the
comment + picked values inside the (0..2) discriminating band (0.3
and 1.5). Old values (1 vs 4) had 4 clamping to 0.
- tests/resilience-dimension-scorers.test.mts: NO score threshold
relaxed >90 → >=85 (NO ratio=0.2 now scores 90, was 96).
- tests/resilience-scorers.test.mts: fixture drift:
* domainAverages.recovery 54.83 → 47.33 (US extDebt 70 → 25).
* baselineScore 63.63 → 60.12 (extDebt is baseline type).
* overallScore 65.52 → 63.27.
* stressScore / stressFactor unchanged (extDebt is baseline-only).
All 6324 data-tier tests pass. typecheck:api clean.
* feat(resilience): PR 3 §3.6 — CI gate on indicator coverage and nominal weight
Plan §3.6 adds a new acceptance criterion (also §5 item 5):
> No indicator with observed coverage below 70% may exceed 5% nominal
> weight OR 5% effective influence in the post-change sensitivity run.
This commit enforces the NOMINAL-WEIGHT half as a unit test that runs
on every CI build. The EFFECTIVE-INFLUENCE half is produced by
scripts/validate-resilience-sensitivity.mjs as a committed artifact;
the gate file only asserts that script still exists so a refactor that
removes it breaks the build loudly.
Why the gate exists (plan §3.6):
"A dimension at 30% observed coverage carries the same effective
weight as one at 95%. This contradicts the OECD/JRC handbook on
uncertainty analysis."
Implementation:
tests/resilience-coverage-influence-gate.test.mts — three tests:
1. Nominal-weight gate: for every core indicator with coverage < 137
countries (70% of the ~195-country universe), computes its nominal
overall weight as
indicator.weight × (1/dimensions-in-domain) × domain-weight
and asserts it does not exceed 5%. Equal-share-per-dimension is
the *upper bound* on runtime weight (coverage-weighted mean gives
a lower share when a dimension drops out), so this is a strict
bound: if the nominal number passes, the runtime number also
passes for every country.
2. Effective-influence contract: asserts the sensitivity script
exists at its expected path. Removing it (intentionally or by
refactor) breaks the build.
3. Audit visibility: prints the top 10 core indicators by nominal
overall weight. No assertion beyond "ran" — the list lets
reviewers spot outliers that pass the gate but are near the cap.
Current state (observed from audit output):
recoveryReserveMonths: nominal=4.17% coverage=188
recoveryDebtToReserves: nominal=4.17% coverage=185
recoveryImportHhi: nominal=4.17% coverage=190
inflationStability: nominal=3.40% coverage=185
electricityConsumption: nominal=3.30% coverage=217
ucdpConflict: nominal=3.09% coverage=193
Every core indicator has coverage ≥ 180 (already enforced by the
pre-existing indicator-tiering test), so the nominal-weight gate has
no current violators — its purpose is catching future drift, not
flagging today's state.
All 6327 data-tier tests pass. typecheck:api clean.
* docs(resilience): PR 3 methodology doc — document §3.5 dead-signal retirements + §3.6 coverage gate
Methodology-doc update capturing the three §3.5 landings and the §3.6 CI
gate. Five edits:
1. **Known construct limitations section (#5 and #6):** strike through the
   original "dead signals" and "no coverage-based weight cap" items and
   annotate them with "Landed in PR 3 §3.5"/"Landed in PR 3 §3.6" plus
   specifics of what shipped.
2. **Currency & External H4 section:** completely rewritten. Old table
(fxVolatility / fxDeviation / fxReservesAdequacy on BIS primary) is
replaced by the two-indicator post-PR-3 table (inflationStability at
0.60 + fxReservesAdequacy at 0.40). Coverage ladder spelled out
(0.85 / 0.55 / 0.40 / 0.30). Legacy BIS indicators named as
experimental-tier drill-downs only.
3. **Fuel Stock Days H4 section:** H4 heading text kept verbatim so the
methodology-lint H4-to-dimension mapping does not break; body
rewritten to explain that the dimension is retired from core but the
seeder still runs for IEA-member drill-downs.
4. **External Debt Coverage table row:** goalpost 5-0 → 2-0, description
cites Greenspan-Guidotti reserve-adequacy rule.
5. **New v2.2 changelog entry** — PR 3 dead-signal cleanup, covering
§3.5 points 1/2/3 + §3.6 + acceptance gates + construct-audit
updates.
No scoring or code changes in this commit. Methodology-lint test passes
(H4 mapping intact). All 6327 data-tier tests pass.
* fix(resilience): PR 3 §3.6 gate — correct share-denominator for coverage-weighted aggregation
Reviewer catch (thanks). The previous gate computed each indicator's
nominal overall weight as
indicator.weight × (1 / N_total_dimensions_in_domain) × domain_weight
and claimed this was an upper bound ("actual runtime weight is ≤ this
when some dimensions drop out on coverage"). That is BACKWARDS for
this scorer.
The domain aggregation is coverage-weighted
(server/worldmonitor/resilience/v1/_shared.ts coverageWeightedMean),
so when a dimension pins at coverage=0 it is EXCLUDED from the
denominator and the surviving dimensions' shares go UP, not down.
PR 3 commit 1 retires fuelStockDays by hard-coding its scorer to
coverage=0 for every country — so in the current live state the
recovery domain has 5 contributing dimensions (not 6), and each core
recovery indicator's nominal share is
1.0 × 1/5 × 0.25 = 5.00% (was mis-reported as 4.17%)
The old gate therefore under-estimated nominal influence and could
silently pass exactly the kind of low-coverage overweight regression
it is meant to block.
Fix:
- Added `coreBearingDimensions(domainId)` helper that counts only
dimensions that have ≥1 core indicator in the registry. A dimension
with only experimental/enrichment entries (post-retirement
fuelStockDays) has no core contribution → does not dilute shares.
- Updated `nominalOverallWeight` to divide by the core-bearing count,
not the raw dimension count.
- Rewrote the helper's doc comment to stop claiming this is a strict
upper bound — explicitly calls out the dynamic case (source failure
raising surviving dim shares further) as the sensitivity script's
responsibility.
- Added a new regression test: asserts (a) at least one recovery
  dimension is all-non-core (fuelStockDays post-retirement),
  (b) fuelStockDays has zero core indicators, and
  (c) recoveryDebtToReserves nominal = 0.05 exactly (not 0.0417) —
  any reversion of the retirement or regression to the
  N_total denominator will fail loudly.
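The corrected denominator can be sketched as follows. The registry shape and the non-retired dimension names beyond the three real recovery indicators are hypothetical; only the 0.25 domain weight and the 5-of-6 core-bearing count come from the commit message.

```typescript
// Hypothetical registry-entry shape for the sketch.
interface IndicatorEntry {
  dimension: string;
  tier: 'core' | 'enrichment' | 'experimental';
  weight: number; // within-dimension weight
}

// Count only dimensions carrying ≥1 core indicator: a dimension left
// with only experimental/enrichment entries (post-retirement
// fuelStockDays) must not dilute the surviving shares.
function coreBearingDimensions(entries: IndicatorEntry[]): number {
  return new Set(
    entries.filter((e) => e.tier === 'core').map((e) => e.dimension),
  ).size;
}

function nominalOverallWeight(
  entry: IndicatorEntry,
  entries: IndicatorEntry[],
  domainWeight: number,
): number {
  return entry.weight * (1 / coreBearingDimensions(entries)) * domainWeight;
}

// Recovery domain after the retirement: 6 registry dimensions, but
// only 5 core-bearing ones (dimD/dimE are placeholder names).
const recoveryEntries: IndicatorEntry[] = [
  { dimension: 'reserveMonths', tier: 'core', weight: 1 },
  { dimension: 'debtToReserves', tier: 'core', weight: 1 },
  { dimension: 'importHhi', tier: 'core', weight: 1 },
  { dimension: 'dimD', tier: 'core', weight: 1 },
  { dimension: 'dimE', tier: 'core', weight: 1 },
  { dimension: 'fuelStockDays', tier: 'experimental', weight: 1 }, // retired
];
console.log(nominalOverallWeight(recoveryEntries[1], recoveryEntries, 0.25)); // 0.05, not 0.0417
```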
Top-10 audit output now correctly shows:
recoveryReserveMonths: nominal=5% coverage=188
recoveryDebtToReserves: nominal=5% coverage=185
recoveryImportHhi: nominal=5% coverage=190
(was 4.17% each under the old math)
All 486 resilience tests pass. typecheck:api clean.
Note: the 5% figure is exactly AT the cap, not over it. "exceed" means
strictly > 5%, so it still passes. But now the reviewer / audit log
reflects reality.
* fix(resilience): PR 3 review — retired-dim confidence drag + false source-failure label
Addresses the Codex review P1 + P2 on PR #3297.
P1 — retired-dim drag on confidence averages
--------------------------------------------
scoreFuelStockDays returns coverage=0 by design (retired construct),
but computeLowConfidence, computeOverallCoverage, and the widget's
formatResilienceConfidence averaged across all 19 dimensions. That
dragged every country's reported averageCoverage down — US went from
0.8556 (active dims only) to 0.8105 (all dims) — enough drift to
misclassify edge countries as lowConfidence and to shift the ranking
widget's overallCoverage pill for every country.
Fix: introduce an authoritative RESILIENCE_RETIRED_DIMENSIONS set in
_dimension-scorers.ts and filter it out of all three averages. The
filter is keyed on the retired-dim REGISTRY, not on coverage === 0,
because a non-retired dim can legitimately emit coverage=0 on a
genuinely sparse-data country via weightedBlend fall-through — those
entries MUST keep dragging confidence down (that is the sparse-data
signal lowConfidence exists to surface). Verified: sparse-country
release-gate test (marks sparse WHO/FAO countries as low confidence)
still passes with the registry-keyed filter; would have failed with
a naive coverage=0 filter.
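The registry-keyed filter semantic can be sketched as follows. The set name mirrors the commit message; the score shape, averaging helper, and example coverage values are illustrative.

```typescript
// Authoritative retirement registry (name from the commit message).
const RESILIENCE_RETIRED_DIMENSIONS = new Set(['fuelStockDays']);

interface DimScore {
  id: string;
  coverage: number; // 0..1
}

function computeOverallCoverage(dims: DimScore[]): number {
  // Filter on the retirement REGISTRY, not on coverage === 0: a live
  // dimension at coverage=0 is the sparse-data signal and must keep
  // dragging the average down.
  const active = dims.filter((d) => !RESILIENCE_RETIRED_DIMENSIONS.has(d.id));
  if (active.length === 0) return 0;
  return active.reduce((s, d) => s + d.coverage, 0) / active.length;
}

const wellCovered: DimScore[] = [
  { id: 'fuelStockDays', coverage: 0 }, // retired: excluded from the average
  { id: 'energy', coverage: 1 },
  { id: 'macroFiscal', coverage: 0.5 },
];
console.log(computeOverallCoverage(wellCovered)); // 0.75, not dragged down by the retired dim

const sparse: DimScore[] = [
  { id: 'energy', coverage: 0 }, // live dim, genuinely sparse: still counts
  { id: 'macroFiscal', coverage: 0.5 },
];
console.log(computeOverallCoverage(sparse)); // 0.25 — sparse-data drag preserved
```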
Server-client parity: widget-utils cannot import server code, so
RESILIENCE_RETIRED_DIMENSION_IDS is a hand-mirrored constant, kept
in lockstep by tests/resilience-retired-dimensions-parity.test.mts
(parses the widget file as text, same pattern as existing widget-util
tests that can't import the widget module directly).
P2 — false "Source down" label on retired dim
---------------------------------------------
scoreFuelStockDays hard-coded imputationClass: 'source-failure',
which the widget maps to "Source down: upstream seeder failed" with
a `!` icon for every country. That is semantically wrong for an
intentional retirement. Flipped to null so the widget's absent-path
renders a neutral cell without a false outage label. null is already
a legal value of ResilienceDimensionScore.imputationClass; no type
change needed.
Tests
-----
- tests/resilience-confidence-averaging.test.mts (new): pins the
registry-keyed filter semantic for computeOverallCoverage +
computeLowConfidence. Includes a negative-control test proving
non-retired coverage=0 dims still flip lowConfidence.
- tests/resilience-retired-dimensions-parity.test.mts (new):
lockstep gate between server and client retired-dim lists.
- Widget test adds a registry-keyed exclusion test with a non-retired
coverage=0 dim in the fixture to lock in the correct semantic.
- Existing tests asserting imputationClass: 'source-failure' for
fuelStockDays flipped to null.
All 494 resilience tests + full 6336/6336 data-tier suite pass.
Typecheck clean for both tsconfig.json and tsconfig.api.json.
* docs(resilience): align methodology + registry metadata with shipped imputationClass=null
Follow-up to the previous PR 3 review commit that flipped
scoreFuelStockDays's imputationClass from 'source-failure' to null to
avoid a false "Source down" widget label on every country. The code
changed; the doc and registry metadata did not, leaving three sites
in the methodology mdx and two comment/description sites in the
registry still claiming imputationClass='source-failure'. Any future
reviewer (or tooling that treats the registry description as
authoritative) would be misled.
This commit rewrites those sites to describe the shipped behavior:
- imputationClass=null (not 'source-failure'), with the rationale
- exclusion from confidence/coverage averages via the
RESILIENCE_RETIRED_DIMENSIONS registry filter
- the distinction between structural retirement (filtered) and
runtime coverage=0 (kept so sparse-data countries still flag
lowConfidence)
Touched:
- docs/methodology/country-resilience-index.mdx (lines ~33, ~268, ~590)
- server/worldmonitor/resilience/v1/_indicator-registry.ts
(recoveryFuelStockDays comment block + description field)
No code-behavior change. Docs-only.
Tests: 157 targeted resilience tests pass (incl. methodology-lint +
widget + release-gate + confidence-averaging). Typecheck clean on
both tsconfig.json and tsconfig.api.json.
1390 lines
65 KiB
JavaScript
#!/usr/bin/env node
|
||
// Compare current production overall_score (6-domain weighted aggregate)
|
||
// against the proposed pillar-combined score with penalty term (α=0.5).
|
||
// Produces a JSON artifact with the Spearman correlation, the top-N
|
||
// absolute-rank movers, and per-country score deltas so the activation
|
||
// decision (flip or keep pending?) has a concrete data point.
|
||
//
|
||
// Usage: node --import tsx/esm scripts/compare-resilience-current-vs-proposed.mjs > out.json
|
||
//
|
||
// IMPORTANT: this script must use the SAME pillar aggregation path the
|
||
// production API exposes, not a local re-implementation with different
|
||
// weighting semantics. We therefore import `buildPillarList` directly
|
||
// from `server/worldmonitor/resilience/v1/_pillar-membership.ts` (which
|
||
// weights member domains by their average dimension coverage, not by
|
||
// their static domain weights) and replicate `_shared.ts#buildDomainList`
|
||
// inline so domain scores are produced by the same coverage-weighted
|
||
// mean the production scorer uses. Any drift from production here
|
||
// invalidates the Spearman / rank-delta conclusions downstream, so if
|
||
// production ever changes its aggregation path this script must be
|
||
// updated in lockstep.
|
||
|
||
import { loadEnvFile } from './_seed-utils.mjs';
|
||
import { readFileSync, readdirSync, existsSync } from 'node:fs';
|
||
import path from 'node:path';
|
||
import { fileURLToPath } from 'node:url';
|
||
import { RESILIENCE_COHORTS } from '../tests/helpers/resilience-cohorts.mts';
|
||
import { MATCHED_PAIRS } from '../tests/helpers/resilience-matched-pairs.mts';
|
||
|
||
const __filename = fileURLToPath(import.meta.url);
|
||
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
|
||
const SNAPSHOT_DIR = path.join(REPO_ROOT, 'docs', 'snapshots');
|
||
|
||
loadEnvFile(import.meta.url);
|
||
|
||
// Scoring and acceptance gates run over the FULL scorable universe
|
||
// (listScorableCountries() from _shared.ts) — no curated SAMPLE is
|
||
// used. Earlier revisions computed drift / Spearman / cohort / pair
|
||
// checks on a 52-country sensitivity seed (+ cohort union); that
|
||
// missed regressions in any country outside the seed. RESILIENCE_COHORTS
|
||
// and MATCHED_PAIRS are still imported because the cohort/pair
|
||
// diagnostic blocks below are naturally scoped to their defined
|
||
// memberships, and we use them to report cohortMissingFromScorable
|
||
// (any cohort/pair endpoint that listScorableCountries refuses to
|
||
// score — fail-loud instead of silent drop).
|
||
|
||
// Mirrors `_shared.ts#coverageWeightedMean`. Kept local because the
|
||
// production helper is not exported.
|
||
function coverageWeightedMean(dims) {
|
||
const totalCoverage = dims.reduce((s, d) => s + d.coverage, 0);
|
||
if (!totalCoverage) return 0;
|
||
return dims.reduce((s, d) => s + d.score * d.coverage, 0) / totalCoverage;
|
||
}
|
||
|
||
// Mirrors `_shared.ts#buildDomainList` exactly so the ResilienceDomain
|
||
// objects fed to buildPillarList are byte-identical to what production
|
||
// emits. The production helper is not exported, so we re-implement it
|
||
// here; the implementation MUST stay in lockstep with _shared.ts.
|
||
function buildDomainList(dimensions, dimensionDomains, domainOrder, getDomainWeight) {
|
||
const grouped = new Map();
|
||
for (const domainId of domainOrder) grouped.set(domainId, []);
|
||
for (const dim of dimensions) {
|
||
const domainId = dimensionDomains[dim.id];
|
||
grouped.get(domainId)?.push(dim);
|
||
}
|
||
return domainOrder.map((domainId) => {
|
||
const domainDims = grouped.get(domainId) ?? [];
|
||
const domainScore = coverageWeightedMean(domainDims);
|
||
return {
|
||
id: domainId,
|
||
score: Math.round(domainScore * 100) / 100,
|
||
weight: getDomainWeight(domainId),
|
||
dimensions: domainDims,
|
||
};
|
||
});
|
||
}
|
||
|
||
function rankCountries(scores) {
|
||
const sorted = Object.entries(scores)
|
||
.sort(([a, scoreA], [b, scoreB]) => scoreB - scoreA || a.localeCompare(b));
|
||
const ranks = {};
|
||
for (let i = 0; i < sorted.length; i++) {
|
||
ranks[sorted[i][0]] = i + 1;
|
||
}
|
||
return ranks;
|
||
}
|
||
|
||
// Auto-discover the immediate-prior baseline snapshot so scorer PRs
|
||
// can compare against a LOCKED BEFORE-STATE (acceptance gates 2 + 6 + 7)
|
||
// rather than against the in-process proposed formula.
|
||
//
|
||
// Filename conventions:
|
||
// resilience-ranking-live-pre-repair-<YYYY-MM-DD>.json (PR 0 freeze)
|
||
// resilience-ranking-live-post-pr<N>-<YYYY-MM-DD>.json (each scorer PR's landing snapshot)
|
||
//
|
||
// Ordering MUST parse out both the PR number and the date, NOT plain
|
||
// filename sort. Plain sort breaks in two ways:
|
||
// 1. Lexical ordering: 'pre' > 'post' alphabetically (`pr...` → 'r' > 'o'),
|
||
// so `live-pre-repair-2026-04-22` sorts AFTER `live-post-pr1-2026-05-01`,
|
||
// which means the pre-repair freeze would keep winning even after
|
||
// post-PR snapshots land.
|
||
// 2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so the PR-10
|
||
// snapshot would lose to the PR-9 snapshot.
|
||
//
|
||
// Fix: sort keys are (kind rank desc, prNumber desc, date desc), where
|
||
// kind is `post` (newer than any pre-repair) over `pre-repair`. Among
|
||
// posts, higher PR number wins on numeric comparison; ties broken by
|
||
// date. Returns null if no baseline is present.
|
||
function parseBaselineSnapshotMeta(filename) {
|
||
const preMatch = /^resilience-ranking-live-pre-repair-(\d{4}-\d{2}-\d{2})\.json$/.exec(filename);
|
||
if (preMatch) {
|
||
// kindRank 0 ensures any `post-*` snapshot supersedes every
|
||
// `pre-repair-*` freeze regardless of date.
|
||
return { filename, kind: 'pre-repair', kindRank: 0, prNumber: -1, date: preMatch[1] };
|
||
}
|
||
const postMatch = /^resilience-ranking-live-post-(.+?)-(\d{4}-\d{2}-\d{2})\.json$/.exec(filename);
|
||
if (postMatch) {
|
||
const [, tag, date] = postMatch;
|
||
const prMatch = /^pr(\d+)$/i.exec(tag);
|
||
// Unrecognised `post-<tag>` → prNumber 0 so it ranks between
|
||
// pre-repair and any numbered post-PR snapshot. Better than
|
||
// silently winning or silently losing; the tag is still printed
|
||
// back in `baselineFile` so the operator can spot it.
|
||
return { filename, kind: 'post', kindRank: 1, prNumber: prMatch ? Number(prMatch[1]) : 0, date, tag };
|
||
}
|
||
return null;
|
||
}
|
||
|
||
function loadMostRecentBaselineSnapshot() {
|
||
if (!existsSync(SNAPSHOT_DIR)) return null;
|
||
let entries;
|
||
try {
|
||
entries = readdirSync(SNAPSHOT_DIR);
|
||
} catch {
|
||
return null;
|
||
}
|
||
const candidates = entries
|
||
.map(parseBaselineSnapshotMeta)
|
||
.filter((m) => m != null)
|
||
.sort((a, b) => {
|
||
if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
|
||
if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
|
||
return b.date.localeCompare(a.date);
|
||
});
|
||
if (candidates.length === 0) return null;
|
||
const latest = candidates[0];
|
||
const raw = readFileSync(path.join(SNAPSHOT_DIR, latest.filename), 'utf8');
|
||
const parsed = JSON.parse(raw);
|
||
if (!Array.isArray(parsed.items)) return null;
|
||
return {
|
||
filename: latest.filename,
|
||
kind: latest.kind,
|
||
prNumber: latest.prNumber,
|
||
date: latest.date,
|
||
capturedAt: parsed.capturedAt,
|
||
commitSha: parsed.commitSha,
|
||
scoresByCountry: Object.fromEntries(
|
||
parsed.items.map((item) => [item.countryCode, item.overallScore]),
|
||
),
|
||
greyedOutCountries: new Set((parsed.greyedOut ?? []).map((g) => g.countryCode)),
|
||
};
|
||
}
|
||
|
||
function spearmanCorrelation(ranksA, ranksB) {
|
||
const keys = Object.keys(ranksA).filter((k) => k in ranksB);
|
||
const n = keys.length;
|
||
if (n < 2) return 1;
|
||
const dSqSum = keys.reduce((s, k) => s + (ranksA[k] - ranksB[k]) ** 2, 0);
|
||
return 1 - (6 * dSqSum) / (n * (n * n - 1));
|
||
}
|
||
|
||
// Per-indicator extraction registry. Acceptance gate 8 in the plan
|
||
// requires effective-influence-by-INDICATOR (not by dimension) across
|
||
// the scorer. The registry below is built from INDICATOR_REGISTRY at
|
||
// runtime: every entry in INDICATOR_REGISTRY gets a row here with an
|
||
// explicit extractionStatus, so indicators that cannot be deterministi-
|
||
// cally extracted from raw Redis (event-window aggregates, Monte-Carlo
|
||
// style summaries, etc.) are NOT silently omitted — they appear in
|
||
// `perIndicatorInfluence[]` with `extractionStatus: 'not-implemented'`
|
||
// and a reason string. This keeps the acceptance apparatus honest:
|
||
// later PRs can see exactly which indicators are covered, which are
|
||
// gaps, and which ones they need to instrument in scorer trace hooks.
|
||
//
|
||
// Shape families covered deterministically (extractionStatus:
|
||
// 'implemented'):
|
||
//
|
||
// A) resilience:static:{ISO2} + dotted sub-path (WB code / WGI /
|
||
// WHO / FAO / GPI / RSF / IEA / tradeToGdp / fxReservesMonths /
|
||
// appliedTariffRate)
|
||
// B) energy:mix:v1:{ISO2} scalar field
|
||
// C) energy:gas-storage:v1:{ISO2} scalar field
|
||
// D) resilience:recovery:<name>:v1 bulk key, .countries[ISO2].<field>
|
||
// E) economic:imf:macro:v2 bulk key, .countries[ISO2].<field>
|
||
// F) economic:imf:labor:v1 bulk key, .countries[ISO2].<field>
|
||
// G) economic:national-debt:v1 bulk key, .countries[ISO2].<field>
|
||
//
|
||
// Indicators whose source key is an aggregate-event stream (UCDP
|
||
// events, unrest events, cyber threats, GPS jamming hexes, internet
|
||
// outages, displacement summary, supply-chain shipping / transit
|
||
// stress, trade restrictions / barriers, sanctions counts, energy
|
||
// price stress, social Reddit, BIS DSR / EER, news threat summary)
|
||
// cannot be deterministically reduced to a single per-country scalar
|
||
// without re-running the scorer's own windowing / severity-weighting
|
||
// math, which would duplicate production logic and drift. These are
|
||
// marked `extractionStatus: 'not-implemented'` with a reason; later
|
||
// PRs can either expose a scorer trace hook, or add dedicated
|
||
// extractors here if the aggregation is simple enough to safely
|
||
// duplicate.
|
||
//
|
||
// EXTRACTION_RULES is keyed by the registry's indicator `id` field, so
|
||
// adding a new indicator to INDICATOR_REGISTRY flags this table via
|
||
// the "unregistered indicator" branch in buildIndicatorExtractionPlan.
|
||
|
||
// The rules below use exported scorer helpers wherever the indicator
|
||
// is an event-window aggregate or needs per-country name matching.
|
||
// This avoids duplicating scorer math in the harness — any drift
|
||
// between harness and scorer is impossible by construction.
|
||
//
|
||
// Three Core indicators remain `not-implemented` for structural
|
||
// reasons (NOT missing code — the scorer inputs are genuinely global
|
||
// scalars with no per-country variance to correlate):
|
||
// - shippingStress: scorer reads a global stressScore and combines
|
||
// it with each country's tradeExposure. The raw indicator has
|
||
// zero per-country variance; Pearson(indicator, overall) is 0/NaN.
|
||
// - transitDisruption: scorer takes `mean(...)` across all transit
|
||
// corridor summaries → single global scalar with the same
|
||
// no-variance problem.
|
||
// - energyPriceStress: scorer reads a global mean absolute price
|
||
// change across commodities → same no-variance problem.
|
||
// These three are per-country ONLY via trade/energy exposure ratios,
|
||
// which is a derived signal (in a different indicator entirely).
|
||
//
|
||
// Two more (fxVolatility, fxDeviation) remain unimplemented because
|
||
// they need monthly time-series math on BIS REER series that the
|
||
// harness shouldn't duplicate without a helper export.
|
||
|
||
const EXTRACTION_RULES = {
|
||
// ── macroFiscal ─────────────────────────────────────────────────────
|
||
govRevenuePct: { type: 'imf-macro-country-field', field: 'govRevenuePct' },
|
||
debtGrowthRate: { type: 'national-debt', field: 'annualGrowth' },
|
||
currentAccountPct: { type: 'imf-macro-country-field', field: 'currentAccountPct' },
|
||
unemploymentPct: { type: 'imf-labor-country-field', field: 'unemploymentPct' },
|
||
householdDebtService: { type: 'not-implemented', reason: 'BIS DSR curated series needs per-country quarterly DSR selection matching the scorer window' },
|
||
|
||
// ── currencyExternal ────────────────────────────────────────────────
|
||
  // PR 3 §3.5: BIS retired from core; inflationStability (IMF macro) is
  // the new primary with reserves secondary. fxVolatility/fxDeviation
  // stay experimental-only (BIS monthly-change math not exported).
  inflationStability: { type: 'imf-macro-country-field', field: 'inflationPct' },
  fxReservesAdequacy: { type: 'static-path', path: ['fxReservesMonths', 'months'] },
  fxVolatility: { type: 'not-implemented', reason: 'BIS REER annualized volatility needs scorer monthly-change std-dev; helper not exported' },
  fxDeviation: { type: 'not-implemented', reason: 'BIS REER absolute deviation from 100 needs scorer latest-value selection; helper not exported' },

  // ── tradeSanctions ──────────────────────────────────────────────────
  sanctionCount: { type: 'sanctions-count' },
  tradeRestrictions: { type: 'count-trade-restrictions' },
  tradeBarriers: { type: 'count-trade-barriers' },
  appliedTariffRate: { type: 'static-path', path: ['appliedTariffRate', 'value'] },

  // ── cyberDigital (scorer-aggregated event streams) ──────────────────
  cyberThreats: { type: 'summarize-cyber' },
  internetOutages: { type: 'summarize-outages-penalty' },
  gpsJamming: { type: 'summarize-gps-penalty' },

  // ── logisticsSupply ─────────────────────────────────────────────────
  roadsPavedLogistics: { type: 'static-wb-infrastructure', code: 'IS.ROD.PAVE.ZS' },
  shippingStress: { type: 'not-implemented', reason: 'Scorer input is a global stressScore applied to every country; no per-country variance to correlate' },
  transitDisruption: { type: 'not-implemented', reason: 'Scorer input is a global mean across transit corridor summaries; no per-country variance' },

  // ── infrastructure ──────────────────────────────────────────────────
  electricityAccess: { type: 'static-wb-infrastructure', code: 'EG.ELC.ACCS.ZS' },
  roadsPavedInfra: { type: 'static-wb-infrastructure', code: 'IS.ROD.PAVE.ZS' },
  infraOutages: { type: 'summarize-outages-penalty' },

  // ── energy ──────────────────────────────────────────────────────────
  energyImportDependency: { type: 'static-path', path: ['iea', 'energyImportDependency', 'value'] },
  gasShare: { type: 'energy-mix-field', field: 'gasShare' },
  coalShare: { type: 'energy-mix-field', field: 'coalShare' },
  renewShare: { type: 'energy-mix-field', field: 'renewShare' },
  gasStorageStress: { type: 'gas-storage-field', field: 'fillPct' },
  energyPriceStress: { type: 'not-implemented', reason: 'Scorer input is a global mean across commodity price changes; no per-country variance' },
  electricityConsumption: { type: 'static-wb-infrastructure', code: 'EG.USE.ELEC.KH.PC' },
  // PR 1 v2 energy indicators — `tier: 'experimental'` until seeders
  // land. The extractor reads the same bulk-payload shape the scorer
  // reads: { countries: { [ISO2]: { value, year } } }. When seed is
  // absent the pairedSampleSize drops to 0 and Pearson returns 0,
  // surfacing the "no influence yet" state in the harness output.
  // importedFossilDependence is a SCORER-LEVEL COMPOSITE, not a direct
  // seed-key read: scoreEnergyV2 computes
  //   fossilElectricityShare × max(netImports, 0) / 100
  // where netImports is staticRecord.iea.energyImportDependency.value.
  // Measuring only fossilShare underreports effective influence for
  // net importers (whose composite is modulated by netImports) and
  // zeros out the signal entirely for net exporters. The extractor
  // therefore has to recompute the same composite; the shape family
  // below reads BOTH inputs per country and applies the same math.
  importedFossilDependence: { type: 'imported-fossil-dependence-composite' },
  lowCarbonGenerationShare: { type: 'bulk-v1-country-value', key: 'resilience:low-carbon-generation:v1' },
  powerLossesPct: { type: 'bulk-v1-country-value', key: 'resilience:power-losses:v1' },
  // reserveMarginPct deferred per plan §3.1 — no seeder, no registry
  // entry. Add here when the IEA electricity-balance seeder lands.

  // ── governanceInstitutional (all 6 WGI sub-pillars) ─────────────────
  wgiVoiceAccountability: { type: 'static-wgi', code: 'VA.EST' },
  wgiPoliticalStability: { type: 'static-wgi', code: 'PV.EST' },
  wgiGovernmentEffectiveness: { type: 'static-wgi', code: 'GE.EST' },
  wgiRegulatoryQuality: { type: 'static-wgi', code: 'RQ.EST' },
  wgiRuleOfLaw: { type: 'static-wgi', code: 'RL.EST' },
  wgiControlOfCorruption: { type: 'static-wgi', code: 'CC.EST' },

  // ── socialCohesion ──────────────────────────────────────────────────
  gpiScore: { type: 'static-path', path: ['gpi', 'score'] },
  displacementTotal: { type: 'displacement-field', field: 'totalDisplaced' },
  displacementHosted: { type: 'displacement-field', field: 'hostTotal' },
  unrestEvents: { type: 'summarize-unrest' },

  // ── borderSecurity / stateContinuity conflict-events (event-window) ─
  ucdpConflict: { type: 'summarize-ucdp' },

  // ── informationCognitive ────────────────────────────────────────────
  rsfPressFreedom: { type: 'static-path', path: ['rsf', 'score'] },
  socialVelocity: { type: 'summarize-social-velocity' },
  newsThreatScore: { type: 'news-threat-score' },

  // ── healthPublicService ─────────────────────────────────────────────
  hospitalBeds: { type: 'static-who', code: 'hospitalBeds' },
  uhcIndex: { type: 'static-who', code: 'uhcIndex' },
  measlesCoverage: { type: 'static-who', code: 'measlesCoverage' },

  // ── foodWater ───────────────────────────────────────────────────────
  ipcPeopleInCrisis: { type: 'static-path', path: ['fao', 'peopleInCrisis'] },
  ipcPhase: { type: 'static-path', path: ['fao', 'phase'] },
  // AQUASTAT: both indicators share `.aquastat.value` but the scorer
  // splits them by the `.aquastat.indicator` tag keyword. The harness
  // matches the same branching so each row correlates only against
  // countries whose AQUASTAT entry is in the matching family —
  // otherwise availability-country readings would corrupt the stress
  // Pearson (and vice versa).
  aquastatWaterStress: { type: 'static-aquastat-stress' },
  aquastatWaterAvailability: { type: 'static-aquastat-availability' },

  // ── recovery* (seeded bulk keys, deterministic per-country fields) ──
  recoveryGovRevenue: { type: 'recovery-country-field', key: 'resilience:recovery:fiscal-space:v1', field: 'govRevenuePct' },
  recoveryFiscalBalance: { type: 'recovery-country-field', key: 'resilience:recovery:fiscal-space:v1', field: 'fiscalBalancePct' },
  recoveryDebtToGdp: { type: 'recovery-country-field', key: 'resilience:recovery:fiscal-space:v1', field: 'debtToGdpPct' },
  recoveryReserveMonths: { type: 'recovery-country-field', key: 'resilience:recovery:reserve-adequacy:v1', field: 'reserveMonths' },
  recoveryDebtToReserves: { type: 'recovery-country-field', key: 'resilience:recovery:external-debt:v1', field: 'debtToReservesRatio' },
  recoveryImportHhi: { type: 'recovery-country-field', key: 'resilience:recovery:import-hhi:v1', field: 'hhi' },
  recoveryFuelStockDays: { type: 'recovery-country-field', key: 'resilience:recovery:fuel-stocks:v1', field: 'stockDays' },

  // ── stateContinuity derived signals ─────────────────────────────────
  recoveryWgiContinuity: { type: 'static-wgi-mean' },
  recoveryConflictPressure: { type: 'summarize-ucdp' },
  recoveryDisplacementVelocity: { type: 'displacement-field', field: 'totalDisplaced' },
};

// Shape-family dispatch tables. Each extractor takes (rule, sources,
// countryCode, scorerHelpers) and returns a number or null. Splitting
// the dispatcher this way keeps each function's cyclomatic complexity
// below the Biome ceiling (the original monolithic switch exceeded it).

// AQUASTAT `.aquastat.value` is a single field whose MEANING is carried
// by the sibling `.aquastat.indicator` tag. scoreAquastatValue() in
// _dimension-scorers.ts branches the interpretation: stress-family
// keywords → lowerBetter, availability-family keywords → higherBetter.
// To match the scorer's classification exactly, the harness gates
// extraction on the same keyword set, lowercased to match the scorer's
// normalizeCountryToken path (which lowercases + strips punctuation
// before the includes() calls at L770-776).
const AQUASTAT_STRESS_KEYWORDS = ['stress', 'withdrawal', 'dependency'];
const AQUASTAT_AVAILABILITY_KEYWORDS = ['availability', 'renewable', 'access'];

// Classify the AQUASTAT entry by the scorer's EXACT priority order:
// stress-family first, then availability-family, then 'unknown'. This
// mirrors the sequential `if` checks in scoreAquastatValue() so a tag
// like "stress (withdrawal/availability)" routes to stress, not to
// availability (even though the tag string contains both keywords).
function classifyAquastatFamily(staticRecord) {
  const raw = staticRecord?.aquastat?.indicator;
  if (typeof raw !== 'string') return 'unknown';
  const normalized = raw.toLowerCase();
  if (AQUASTAT_STRESS_KEYWORDS.some((kw) => normalized.includes(kw))) return 'stress';
  if (AQUASTAT_AVAILABILITY_KEYWORDS.some((kw) => normalized.includes(kw))) return 'availability';
  return 'unknown';
}

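As a standalone illustration of the priority order above, the sketch below copies the keyword lists and the sequential checks into a self-contained function (all names here are local to the sketch, not part of the harness):

```javascript
// Mirrors classifyAquastatFamily: the stress family is checked first, so a
// tag containing keywords from BOTH families still routes to 'stress'.
const STRESS_KEYWORDS = ['stress', 'withdrawal', 'dependency'];
const AVAILABILITY_KEYWORDS = ['availability', 'renewable', 'access'];

function classify(indicator) {
  if (typeof indicator !== 'string') return 'unknown';
  const normalized = indicator.toLowerCase();
  if (STRESS_KEYWORDS.some((kw) => normalized.includes(kw))) return 'stress';
  if (AVAILABILITY_KEYWORDS.some((kw) => normalized.includes(kw))) return 'availability';
  return 'unknown';
}

console.log(classify('Water Stress (withdrawal/availability)')); // 'stress': both families match, stress wins
console.log(classify('Total renewable water resources'));        // 'availability'
console.log(classify(42));                                       // 'unknown': non-string tag
```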
const STATIC_EXTRACTORS = {
  'static-path': (rule, { staticRecord }) => {
    let cursor = staticRecord;
    for (const k of rule.path) cursor = cursor?.[k];
    return typeof cursor === 'number' ? cursor : null;
  },
  'static-wb-infrastructure': (rule, { staticRecord }) =>
    staticRecord?.infrastructure?.indicators?.[rule.code]?.value ?? null,
  'static-wgi': (rule, { staticRecord }) =>
    staticRecord?.wgi?.indicators?.[rule.code]?.value ?? null,
  'static-wgi-mean': (_rule, { staticRecord }) => {
    const entries = Object.values(staticRecord?.wgi?.indicators ?? {})
      .map((e) => (typeof e?.value === 'number' ? e.value : null))
      .filter((v) => v != null);
    if (entries.length === 0) return null;
    return entries.reduce((s, v) => s + v, 0) / entries.length;
  },
  'static-who': (rule, { staticRecord }) =>
    staticRecord?.who?.indicators?.[rule.code]?.value ?? null,
  'static-aquastat-stress': (_rule, { staticRecord }) => {
    const value = staticRecord?.aquastat?.value;
    if (typeof value !== 'number') return null;
    return classifyAquastatFamily(staticRecord) === 'stress' ? value : null;
  },
  'static-aquastat-availability': (_rule, { staticRecord }) => {
    const value = staticRecord?.aquastat?.value;
    if (typeof value !== 'number') return null;
    return classifyAquastatFamily(staticRecord) === 'availability' ? value : null;
  },
};

const SIMPLE_EXTRACTORS = {
  'energy-mix-field': (rule, { energyMix }) =>
    typeof energyMix?.[rule.field] === 'number' ? energyMix[rule.field] : null,
  'gas-storage-field': (rule, { gasStorage }) =>
    typeof gasStorage?.[rule.field] === 'number' ? gasStorage[rule.field] : null,
  'recovery-country-field': (rule, sources, countryCode) => {
    const bulkByKey = {
      'resilience:recovery:fiscal-space:v1': sources.fiscalSpace,
      'resilience:recovery:reserve-adequacy:v1': sources.reserveAdequacy,
      'resilience:recovery:external-debt:v1': sources.externalDebt,
      'resilience:recovery:import-hhi:v1': sources.importHhi,
      'resilience:recovery:fuel-stocks:v1': sources.fuelStocks,
    };
    const entry = bulkByKey[rule.key]?.countries?.[countryCode];
    return typeof entry?.[rule.field] === 'number' ? entry[rule.field] : null;
  },
  'imf-macro-country-field': (rule, { imfMacro }, countryCode) => {
    const entry = imfMacro?.countries?.[countryCode];
    return typeof entry?.[rule.field] === 'number' ? entry[rule.field] : null;
  },
  'imf-labor-country-field': (rule, { imfLabor }, countryCode) => {
    const entry = imfLabor?.countries?.[countryCode];
    return typeof entry?.[rule.field] === 'number' ? entry[rule.field] : null;
  },
  'national-debt': (rule, { nationalDebt }, countryCode) => {
    if (!Array.isArray(nationalDebt)) return null;
    const found = nationalDebt.find(
      (e) => e?.iso2 === countryCode || e?.countryCode === countryCode,
    );
    return typeof found?.[rule.field] === 'number' ? found[rule.field] : null;
  },
  'sanctions-count': (_rule, { sanctionsCounts }, countryCode) => {
    const direct = sanctionsCounts?.[countryCode];
    return typeof direct === 'number' ? direct : null;
  },
  // Shape: { countries: { [ISO2]: { value, year } } }. Used by the
  // PR 1 v2 energy seeders. The key is specified per-rule so the
  // dispatcher can route multiple bulk-v1 payloads through one
  // extractor.
  'bulk-v1-country-value': (rule, { bulkV1 }, countryCode) => {
    const payload = bulkV1?.[rule.key];
    const entry = payload?.countries?.[countryCode];
    return typeof entry?.value === 'number' ? entry.value : null;
  },
  // Mirrors scoreEnergyV2's `importedFossilDependence` composite:
  //   fossilElectricityShare × max(netImports, 0) / 100
  // fossilElectricityShare lives in the PR 1 bulk key; netImports
  // reuses the legacy resilience:static.iea.energyImportDependency.value
  // (EG.IMP.CONS.ZS) that the static seeder already publishes. This
  // extractor MUST stay in lockstep with the scorer — drift between
  // the two breaks gate-9's effective-influence interpretation.
  'imported-fossil-dependence-composite': (_rule, { staticRecord, bulkV1 }, countryCode) => {
    const fossilPayload = bulkV1?.['resilience:fossil-electricity-share:v1'];
    const fossilEntry = fossilPayload?.countries?.[countryCode];
    const fossilShare = typeof fossilEntry?.value === 'number' ? fossilEntry.value : null;
    const netImports = typeof staticRecord?.iea?.energyImportDependency?.value === 'number'
      ? staticRecord.iea.energyImportDependency.value
      : null;
    if (fossilShare == null || netImports == null) return null;
    return fossilShare * Math.max(netImports, 0) / 100;
  },
};

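The composite above boils down to one line of arithmetic; a minimal sketch with hypothetical inputs (a 60% fossil electricity share and 40% net imports, not real seed payloads) shows the three behaviours the comment calls out:

```javascript
// Same formula as the composite extractor: fossilShare * max(netImports, 0) / 100.
// The inputs below are illustrative numbers only.
function importedFossilDependence(fossilShare, netImports) {
  if (fossilShare == null || netImports == null) return null;
  return fossilShare * Math.max(netImports, 0) / 100;
}

console.log(importedFossilDependence(60, 40));   // 24: net importer, share modulated by imports
console.log(importedFossilDependence(60, -15));  // 0: net exporter, signal zeroed out
console.log(importedFossilDependence(null, 40)); // null: missing input short-circuits
```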
// Aggregator extractors wire through exported scorer helpers so the
// per-country aggregation math never drifts between harness + scorer.
function extractSummarizeCyber(_rule, { cyber }, countryCode, { summarizeCyber }) {
  if (!summarizeCyber || cyber == null) return null;
  const { weightedCount } = summarizeCyber(cyber, countryCode);
  return weightedCount > 0 ? weightedCount : null;
}
function extractOutagesPenalty(_rule, { outages }, countryCode, { summarizeOutages }) {
  if (!summarizeOutages || outages == null) return null;
  const { total, major, partial } = summarizeOutages(outages, countryCode);
  const penalty = total * 4 + major * 2 + partial;
  return penalty > 0 ? penalty : null;
}
function extractGpsPenalty(_rule, { gps }, countryCode, { summarizeGps }) {
  if (!summarizeGps || gps == null) return null;
  const { high, medium } = summarizeGps(gps, countryCode);
  const penalty = high * 3 + medium;
  return penalty > 0 ? penalty : null;
}
function extractSummarizeUcdp(_rule, { ucdp }, countryCode, { summarizeUcdp }) {
  if (!summarizeUcdp || ucdp == null) return null;
  const { eventCount } = summarizeUcdp(ucdp, countryCode);
  return eventCount > 0 ? eventCount : null;
}
function extractSummarizeUnrest(_rule, { unrest }, countryCode, { summarizeUnrest }) {
  if (!summarizeUnrest || unrest == null) return null;
  const { unrestCount } = summarizeUnrest(unrest, countryCode);
  return unrestCount > 0 ? unrestCount : null;
}
function extractSocialVelocity(_rule, { socialVelocity }, countryCode, { summarizeSocialVelocity }) {
  if (!summarizeSocialVelocity || socialVelocity == null) return null;
  const v = summarizeSocialVelocity(socialVelocity, countryCode);
  return v > 0 ? v : null;
}
function extractDisplacementField(rule, { displacement }, countryCode, { getCountryDisplacement }) {
  if (!getCountryDisplacement || displacement == null) return null;
  const entry = getCountryDisplacement(displacement, countryCode);
  return typeof entry?.[rule.field] === 'number' ? entry[rule.field] : null;
}
function extractNewsThreat(_rule, { newsThreat }, countryCode, { getThreatSummaryScore }) {
  if (!getThreatSummaryScore || newsThreat == null) return null;
  return getThreatSummaryScore(newsThreat, countryCode);
}
function extractTradeRestrictions(_rule, { tradeRestrictions }, countryCode, { countTradeRestrictions }) {
  if (!countTradeRestrictions || tradeRestrictions == null) return null;
  const count = countTradeRestrictions(tradeRestrictions, countryCode);
  return count > 0 ? count : null;
}
function extractTradeBarriers(_rule, { tradeBarriers }, countryCode, { countTradeBarriers }) {
  if (!countTradeBarriers || tradeBarriers == null) return null;
  const count = countTradeBarriers(tradeBarriers, countryCode);
  return count > 0 ? count : null;
}

const AGGREGATE_EXTRACTORS = {
  'summarize-cyber': extractSummarizeCyber,
  'summarize-outages-penalty': extractOutagesPenalty,
  'summarize-gps-penalty': extractGpsPenalty,
  'summarize-ucdp': extractSummarizeUcdp,
  'summarize-unrest': extractSummarizeUnrest,
  'summarize-social-velocity': extractSocialVelocity,
  'displacement-field': extractDisplacementField,
  'news-threat-score': extractNewsThreat,
  'count-trade-restrictions': extractTradeRestrictions,
  'count-trade-barriers': extractTradeBarriers,
};

function applyExtractionRule(rule, sources, countryCode, scorerHelpers = {}) {
  if (!rule || rule.type === 'not-implemented') return null;
  const staticFn = STATIC_EXTRACTORS[rule.type];
  if (staticFn) return staticFn(rule, sources, countryCode);
  const simpleFn = SIMPLE_EXTRACTORS[rule.type];
  if (simpleFn) return simpleFn(rule, sources, countryCode);
  const aggFn = AGGREGATE_EXTRACTORS[rule.type];
  if (aggFn) return aggFn(rule, sources, countryCode, scorerHelpers);
  return null;
}

async function readExtractionSources(countryCode, reader) {
  // Displacement summary is year-scoped — the scorer reads the current
  // calendar year (see _dimension-scorers#scoreSocialCohesion). We use
  // the same resolver so the harness pulls the same payload the scorer
  // would at the moment of execution.
  const currentYear = new Date().getFullYear();
  // PR 1 v2 energy bulk keys. Fetched once per country (the memoized
  // reader de-dupes; these bulk payloads aren't country-scoped in the
  // key, so all 220 country iterations share one fetch per key).
  const BULK_V1_KEYS = [
    'resilience:fossil-electricity-share:v1',
    'resilience:low-carbon-generation:v1',
    'resilience:power-losses:v1',
    // resilience:reserve-margin:v1 intentionally omitted — no seeder,
    // no registry entry, per plan §3.1 deferral. Add when the IEA
    // electricity-balance seeder lands.
  ];
  const [
    staticRecord, energyMix, gasStorage, fiscalSpace, reserveAdequacy,
    externalDebt, importHhi, fuelStocks, imfMacro, imfLabor,
    nationalDebt, sanctionsCounts,
    cyber, outages, gps, ucdp, unrest, newsThreat, displacement,
    socialVelocity, tradeRestrictions, tradeBarriers,
    ...bulkV1Payloads
  ] = await Promise.all([
    reader(`resilience:static:${countryCode}`),
    reader(`energy:mix:v1:${countryCode}`),
    reader(`energy:gas-storage:v1:${countryCode}`),
    reader('resilience:recovery:fiscal-space:v1'),
    reader('resilience:recovery:reserve-adequacy:v1'),
    reader('resilience:recovery:external-debt:v1'),
    reader('resilience:recovery:import-hhi:v1'),
    reader('resilience:recovery:fuel-stocks:v1'),
    reader('economic:imf:macro:v2'),
    reader('economic:imf:labor:v1'),
    reader('economic:national-debt:v1'),
    reader('sanctions:country-counts:v1'),
    reader('cyber:threats:v2'),
    reader('infra:outages:v1'),
    reader('intelligence:gpsjam:v2'),
    reader('conflict:ucdp-events:v1'),
    reader('unrest:events:v1'),
    reader('news:threat:summary:v1'),
    reader(`displacement:summary:v1:${currentYear}`),
    reader('intelligence:social:reddit:v1'),
    reader('trade:restrictions:v1:tariff-overview:50'),
    reader('trade:barriers:v1:tariff-gap:50'),
    ...BULK_V1_KEYS.map((k) => reader(k)),
  ]);
  const bulkV1 = Object.fromEntries(BULK_V1_KEYS.map((k, i) => [k, bulkV1Payloads[i]]));
  return {
    staticRecord, energyMix, gasStorage, fiscalSpace, reserveAdequacy,
    externalDebt, importHhi, fuelStocks, imfMacro, imfLabor,
    nationalDebt, sanctionsCounts,
    cyber, outages, gps, ucdp, unrest, newsThreat, displacement,
    socialVelocity, tradeRestrictions, tradeBarriers,
    bulkV1,
  };
}

// Build the full extraction plan at startup: every entry in
// INDICATOR_REGISTRY becomes a row in the plan, with status derived
// from EXTRACTION_RULES. Any indicator present in the registry but
// missing from EXTRACTION_RULES is flagged as `unregistered-in-harness`
// so future registry additions can't silently skip influence reporting.
function buildIndicatorExtractionPlan(indicatorRegistry) {
  return indicatorRegistry.map((spec) => {
    const rule = EXTRACTION_RULES[spec.id];
    if (!rule) {
      return {
        indicator: spec.id,
        dimension: spec.dimension,
        tier: spec.tier,
        nominalWeight: spec.weight,
        extractionStatus: 'unregistered-in-harness',
        reason: 'Indicator exists in INDICATOR_REGISTRY but has no EXTRACTION_RULES entry; add one or explicitly mark not-implemented',
      };
    }
    if (rule.type === 'not-implemented') {
      return {
        indicator: spec.id,
        dimension: spec.dimension,
        tier: spec.tier,
        nominalWeight: spec.weight,
        extractionStatus: 'not-implemented',
        reason: rule.reason,
      };
    }
    return {
      indicator: spec.id,
      dimension: spec.dimension,
      tier: spec.tier,
      nominalWeight: spec.weight,
      extractionStatus: 'implemented',
      rule,
    };
  });
}

// Pearson correlation across two equal-length arrays. Used for
// variable-influence baseline per acceptance gate 8 in the v3 plan.
function pearsonCorrelation(xs, ys) {
  const n = xs.length;
  if (n < 2) return 0;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let num = 0, denomX = 0, denomY = 0;
  for (let i = 0; i < n; i += 1) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    num += dx * dy;
    denomX += dx * dx;
    denomY += dy * dy;
  }
  const denom = Math.sqrt(denomX * denomY);
  return denom > 0 ? num / denom : 0;
}

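A quick self-contained check of the correlation math above (same formula copied into the sketch, illustrative arrays):

```javascript
// Copy of the harness pearsonCorrelation so the sketch runs on its own.
function pearson(xs, ys) {
  const n = xs.length;
  if (n < 2) return 0;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let num = 0, denomX = 0, denomY = 0;
  for (let i = 0; i < n; i += 1) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    num += dx * dy;
    denomX += dx * dx;
    denomY += dy * dy;
  }
  const denom = Math.sqrt(denomX * denomY);
  return denom > 0 ? num / denom : 0;
}

console.log(pearson([1, 2, 3], [2, 4, 6])); // 1: perfect positive correlation
console.log(pearson([1, 2, 3], [6, 4, 2])); // -1: perfect negative correlation
console.log(pearson([1, 1, 1], [2, 4, 6])); // 0: zero-variance guard returns 0, not NaN
```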
async function main() {
  const scorerMod = await import('../server/worldmonitor/resilience/v1/_dimension-scorers.ts');
  const {
    scoreAllDimensions,
    RESILIENCE_DIMENSION_ORDER,
    RESILIENCE_DIMENSION_DOMAINS,
    getResilienceDomainWeight,
    RESILIENCE_DOMAIN_ORDER,
    createMemoizedSeedReader,
    // Scorer helpers passed through to applyExtractionRule so per-
    // indicator aggregation uses the scorer's own math (zero drift).
    summarizeCyber,
    summarizeOutages,
    summarizeGps,
    summarizeUcdp,
    summarizeUnrest,
    summarizeSocialVelocity,
    getCountryDisplacement,
    getThreatSummaryScore,
    countTradeRestrictions,
    countTradeBarriers,
  } = scorerMod;
  const scorerHelpers = {
    summarizeCyber,
    summarizeOutages,
    summarizeGps,
    summarizeUcdp,
    summarizeUnrest,
    summarizeSocialVelocity,
    getCountryDisplacement,
    getThreatSummaryScore,
    countTradeRestrictions,
    countTradeBarriers,
  };

  const {
    listScorableCountries,
    PENALTY_ALPHA,
    penalizedPillarScore,
  } = await import('../server/worldmonitor/resilience/v1/_shared.ts');

  const {
    buildPillarList,
    PILLAR_ORDER,
    PILLAR_WEIGHTS,
  } = await import('../server/worldmonitor/resilience/v1/_pillar-membership.ts');

  const { INDICATOR_REGISTRY } = await import(
    '../server/worldmonitor/resilience/v1/_indicator-registry.ts'
  );

  const domainWeights = {};
  for (const domainId of RESILIENCE_DOMAIN_ORDER) {
    domainWeights[domainId] = getResilienceDomainWeight(domainId);
  }

  // Run the acceptance math over the FULL scorable universe, not a
  // curated subset. Plan gate 2 ("no country's overallScore changes
  // by more than 15 points") and the baseline-Spearman check must see
  // every country in the ranking universe; otherwise a large regression
  // inside an excluded country passes silently. RESILIENCE_COHORTS and
  // MATCHED_PAIRS are still used by the cohort/pair diagnostic blocks
  // (naturally scoped to their memberships); any endpoint those
  // definitions reference but listScorableCountries refuses to score
  // is reported in `cohortMissingFromScorable` (fail-loud, not drop).
  const scorableCountries = await listScorableCountries();
  const scorableUniverse = scorableCountries.slice(); // full universe
  const cohortOrPairMembers = new Set([
    ...RESILIENCE_COHORTS.flatMap((c) => c.countryCodes),
    ...MATCHED_PAIRS.flatMap((p) => [p.higherExpected, p.lowerExpected]),
  ]);
  const cohortMissingFromScorable = [...cohortOrPairMembers].filter(
    (cc) => !scorableCountries.includes(cc),
  );

  // Load the frozen pre-PR-0 baseline before scoring so we can compute
  // baseline-delta gates (acceptance gates 2, 6, 7). If no baseline
  // exists yet (first run under PR 0), we still emit the comparison
  // output but mark the baselineComparison block `unavailable` so the
  // caller can detect missing-baseline vs passed-baseline.
  const baseline = loadMostRecentBaselineSnapshot();

  // Finding 3 — per-indicator extraction plan is driven by
  // INDICATOR_REGISTRY (every Core + Enrichment indicator gets a row)
  // rather than a hand-picked subset of 12. Indicators whose source
  // key cannot be reduced to a per-country scalar without duplicating
  // scorer math get extractionStatus 'not-implemented' with a reason
  // — so the gap is visible in output, not hidden.
  const extractionPlan = buildIndicatorExtractionPlan(INDICATOR_REGISTRY);
  const implementedRules = extractionPlan.filter((p) => p.extractionStatus === 'implemented');

  const sharedReader = createMemoizedSeedReader();
  const rows = [];
  const perIndicatorValues = {};
  for (const plan of implementedRules) {
    perIndicatorValues[plan.indicator] = [];
  }

  for (const countryCode of scorableUniverse) {
    const scoreMap = await scoreAllDimensions(countryCode, sharedReader);

    const sources = await readExtractionSources(countryCode, sharedReader);
    for (const plan of implementedRules) {
      const value = applyExtractionRule(plan.rule, sources, countryCode, scorerHelpers);
      if (value == null || !Number.isFinite(value)) continue;
      perIndicatorValues[plan.indicator].push({ countryCode, value });
    }

    // Build the same ResilienceDimension shape production uses. Only
    // `id`, `score`, and `coverage` are read by buildDomainList /
    // buildPillarList, but pass the other fields too for fidelity with
    // the production payload (empty strings / zeros are fine here
    // because the pillar aggregation does not touch them).
    const dimensions = RESILIENCE_DIMENSION_ORDER.map((dimId) => ({
      id: dimId,
      score: scoreMap[dimId].score,
      coverage: scoreMap[dimId].coverage,
      observedWeight: scoreMap[dimId].observedWeight ?? 0,
      imputedWeight: scoreMap[dimId].imputedWeight ?? 0,
      imputationClass: scoreMap[dimId].imputationClass ?? '',
      freshness: { lastObservedAtMs: '0', staleness: '' },
    }));

    // Build domains and pillars with the EXACT production aggregation.
    const domains = buildDomainList(
      dimensions,
      RESILIENCE_DIMENSION_DOMAINS,
      RESILIENCE_DOMAIN_ORDER,
      getResilienceDomainWeight,
    );

    // Current production overallScore: Σ domain.score * domain.weight
    // (pre-round `domains[*].score` matches the value used inside
    // production's `buildResilienceScore` where the reduce operates on
    // the rounded domain-list scores).
    const currentOverall = domains.reduce(
      (sum, d) => sum + d.score * d.weight,
      0,
    );

    // Production pillar shape: coverage-weighted by average dimension
    // coverage per member domain, not by the static domain weights.
    // This is the material correction vs the earlier comparison script.
    const pillars = buildPillarList(domains, true);

    // Proposed overallScore: Σ pillar.score * pillar.weight × (1 − α(1 − min/100))
    const proposedOverall = penalizedPillarScore(
      pillars.map((p) => ({ score: p.score, weight: p.weight })),
    );

    const pillarById = Object.fromEntries(pillars.map((p) => [p.id, p.score]));

    // Retain per-dimension scores on the row so the variable-influence
    // pass below can correlate each dimension's cross-country variance
    // against overall score (acceptance gate 8 baseline).
    const dimensionScores = Object.fromEntries(
      dimensions.map((d) => [d.id, d.score]),
    );

    rows.push({
      countryCode,
      currentOverallScore: Math.round(currentOverall * 100) / 100,
      proposedOverallScore: Math.round(proposedOverall * 100) / 100,
      scoreDelta: Math.round((proposedOverall - currentOverall) * 100) / 100,
      dimensionScores,
      pillars: {
        structuralReadiness: Math.round((pillarById['structural-readiness'] ?? 0) * 100) / 100,
        liveShockExposure: Math.round((pillarById['live-shock-exposure'] ?? 0) * 100) / 100,
        recoveryCapacity: Math.round((pillarById['recovery-capacity'] ?? 0) * 100) / 100,
        minPillar: Math.round(Math.min(...pillars.map((p) => p.score)) * 100) / 100,
      },
    });
  }

  const currentScoresMap = Object.fromEntries(rows.map((r) => [r.countryCode, r.currentOverallScore]));
  const proposedScoresMap = Object.fromEntries(rows.map((r) => [r.countryCode, r.proposedOverallScore]));

  const currentRanks = rankCountries(currentScoresMap);
  const proposedRanks = rankCountries(proposedScoresMap);

  for (const row of rows) {
    row.currentRank = currentRanks[row.countryCode];
    row.proposedRank = proposedRanks[row.countryCode];
    row.rankDelta = row.proposedRank - row.currentRank; // + means dropped, − means climbed
    row.rankAbsDelta = Math.abs(row.rankDelta);
  }

  const spearman = spearmanCorrelation(currentRanks, proposedRanks);

  // Top movers by absolute rank change, breaking ties by absolute score delta.
  const topMovers = [...rows]
    .sort((a, b) =>
      b.rankAbsDelta - a.rankAbsDelta ||
      Math.abs(b.scoreDelta) - Math.abs(a.scoreDelta),
    )
    .slice(0, 10);

  const biggestScoreDrops = [...rows].sort((a, b) => a.scoreDelta - b.scoreDelta).slice(0, 5);
  const biggestScoreClimbs = [...rows].sort((a, b) => b.scoreDelta - a.scoreDelta).slice(0, 5);

  const meanScoreDelta = rows.reduce((s, r) => s + r.scoreDelta, 0) / rows.length;
  const meanAbsScoreDelta = rows.reduce((s, r) => s + Math.abs(r.scoreDelta), 0) / rows.length;
  const maxRankAbsDelta = Math.max(...rows.map((r) => r.rankAbsDelta));

  // Cohort + matched-pair summaries (PR 0 fairness-audit harness).
  // Scoped to the cohort/pair memberships defined in the helpers;
  // scoring ran over the full scorable universe so every member that
  // listScorableCountries recognised is already in `rows`.
  const rowsByCc = new Map(rows.map((r) => [r.countryCode, r]));

  function median(values) {
    if (values.length === 0) return null;
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  }

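The even/odd branch in the median helper can be exercised in isolation (same body copied into the sketch, illustrative inputs):

```javascript
// Copy of the harness median so the sketch is self-contained.
function median(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

console.log(median([3, 1, 2]));    // 2: odd length takes the middle element
console.log(median([4, 1, 3, 2])); // 2.5: even length averages the middle pair
console.log(median([]));           // null: empty-input guard
```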
  const cohortSummary = RESILIENCE_COHORTS.map((cohort) => {
    const members = cohort.countryCodes
      .map((cc) => rowsByCc.get(cc))
      .filter((r) => r != null);
    if (members.length === 0) {
      return { cohortId: cohort.id, inSample: 0, skipped: true };
    }
    const deltas = members.map((m) => m.scoreDelta);
    const rankDeltas = members.map((m) => m.rankDelta);
    const sortedByDelta = [...members].sort((a, b) => b.scoreDelta - a.scoreDelta);
    return {
      cohortId: cohort.id,
      label: cohort.label,
      inSample: members.length,
      medianScoreDelta: Math.round(median(deltas) * 100) / 100,
      medianAbsScoreDelta: Math.round(median(deltas.map((d) => Math.abs(d))) * 100) / 100,
      maxRankAbsDelta: Math.max(...rankDeltas.map((d) => Math.abs(d))),
      biggestClimber: sortedByDelta[0] != null
        ? { countryCode: sortedByDelta[0].countryCode, scoreDelta: sortedByDelta[0].scoreDelta, rankDelta: sortedByDelta[0].rankDelta }
        : null,
      biggestDrop: sortedByDelta.at(-1) != null
        ? { countryCode: sortedByDelta.at(-1).countryCode, scoreDelta: sortedByDelta.at(-1).scoreDelta, rankDelta: sortedByDelta.at(-1).rankDelta }
        : null,
      middleMover: sortedByDelta[Math.floor(sortedByDelta.length / 2)] != null
        ? {
            countryCode: sortedByDelta[Math.floor(sortedByDelta.length / 2)].countryCode,
            scoreDelta: sortedByDelta[Math.floor(sortedByDelta.length / 2)].scoreDelta,
            rankDelta: sortedByDelta[Math.floor(sortedByDelta.length / 2)].rankDelta,
          }
        : null,
    };
  });

  const matchedPairSummary = MATCHED_PAIRS.map((pair) => {
    const higher = rowsByCc.get(pair.higherExpected);
    const lower = rowsByCc.get(pair.lowerExpected);
    if (!higher || !lower) {
      return { pairId: pair.id, skipped: true, reason: `pair member missing from scorable universe: ${!higher ? pair.higherExpected : pair.lowerExpected}` };
    }
    const minGap = pair.minGap ?? 3;
    const currentGap = higher.currentOverallScore - lower.currentOverallScore;
    const proposedGap = higher.proposedOverallScore - lower.proposedOverallScore;
    const expectedDirectionHeld = proposedGap > 0;
    const gapAtLeastMin = proposedGap >= minGap;
    return {
      pairId: pair.id,
      axis: pair.axis,
      higherExpected: pair.higherExpected,
      lowerExpected: pair.lowerExpected,
      minGap,
      currentGap: Math.round(currentGap * 100) / 100,
      proposedGap: Math.round(proposedGap * 100) / 100,
      expectedDirectionHeld,
      gapAtLeastMin,
      // Gate: if either flag is false, this pair fails the matched-pair
      // acceptance check and the PR stops.
      passes: expectedDirectionHeld && gapAtLeastMin,
    };
  });

  const matchedPairFailures = matchedPairSummary.filter((p) => !p.skipped && !p.passes);

  // Variable-influence baseline (Pearson-derivative approximation of
  // Sobol indices). For every dimension, measures the cross-country
  // Pearson correlation between that dimension's score and the current
  // overall score, scaled by the dimension's nominal domain weight.
  // The scaled correlation is a proxy for "effective influence" —
  // acceptance gate 8 requires that after any scorer change the
  // measured effective influence agrees in sign and rank order with
  // the assigned nominal weights. Indicators whose nominal weight is
  // material but whose measured effective influence is near zero flag
  // a construct problem (the indicator carries weight but drives no
  // variance — classic wealth-proxy or saturated-signal behaviour).
  //
  // A full Sobol implementation is a PR 0.5 follow-up; this Pearson
  // derivative is sufficient to produce the per-indicator baseline
  // the plan's acceptance gates require.
  const currentOverallArr = rows.map((r) => r.currentOverallScore);
  const variableInfluence = RESILIENCE_DIMENSION_ORDER.map((dimId) => {
    const domainId = RESILIENCE_DIMENSION_DOMAINS[dimId];
    const domainWeight = domainWeights[domainId] ?? 0;
    const dimScoresArr = rows.map((r) => r.dimensionScores[dimId] ?? 0);
    const correlation = pearsonCorrelation(dimScoresArr, currentOverallArr);
    // Scale: the influence proxy is the correlation × domain weight.
    // We don't know the intra-domain weight here without re-threading
    // the full indicator registry, so this is a domain-level proxy —
    // sufficient for the construct-problem detector described above.
    const influence = correlation * domainWeight;
    const dimScoreMean = dimScoresArr.reduce((s, v) => s + v, 0) / dimScoresArr.length;
    const dimScoreVariance = dimScoresArr.reduce((s, v) => s + (v - dimScoreMean) ** 2, 0) / dimScoresArr.length;
    return {
      dimensionId: dimId,
      domainId,
      nominalDomainWeight: domainWeight,
      pearsonVsOverall: Math.round(correlation * 10000) / 10000,
      effectiveInfluence: Math.round(influence * 10000) / 10000,
      dimScoreMean: Math.round(dimScoreMean * 100) / 100,
      dimScoreVariance: Math.round(dimScoreVariance * 100) / 100,
    };
  });
  // Sort by effective influence desc so the report shows the biggest
  // drivers first.
  variableInfluence.sort((a, b) => Math.abs(b.effectiveInfluence) - Math.abs(a.effectiveInfluence));

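  // For intuition (illustrative numbers, not measured values): a
  // dimension with domain weight 0.25 and Pearson 0.9 gets an
  // effectiveInfluence of 0.225, while the same weight with Pearson
  // 0.02 gets 0.005 — the near-zero-influence signature of the
  // construct problem described above.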
  // Per-indicator effective influence, driven by INDICATOR_REGISTRY
  // via extractionPlan. Every registered indicator gets a row:
  //
  // - extractionStatus='implemented': Pearson(indicatorValue, overallScore)
  //   across countries with non-null readings; pairedSampleSize
  //   reports coverage.
  // - extractionStatus='not-implemented': correlation omitted, reason
  //   surfaced so callers can see why (event-window aggregate,
  //   global-only scalar, curated sub-series, etc.).
  // - extractionStatus='unregistered-in-harness': indicator exists in
  //   INDICATOR_REGISTRY but EXTRACTION_RULES has no entry, signalling
  //   a registry addition that skipped this harness.
  //
  // The output is sorted by absolute effective influence within the
  // implemented group, then by dimension id for the other groups so
  // gaps are legible.
  const scoreByCc = new Map(rows.map((r) => [r.countryCode, r.currentOverallScore]));
  const perIndicatorInfluence = extractionPlan.map((plan) => {
    if (plan.extractionStatus !== 'implemented') {
      return {
        indicator: plan.indicator,
        dimension: plan.dimension,
        tier: plan.tier,
        nominalWeight: plan.nominalWeight,
        extractionStatus: plan.extractionStatus,
        reason: plan.reason,
      };
    }
    const observations = perIndicatorValues[plan.indicator] ?? [];
    const xs = [];
    const ys = [];
    for (const { countryCode, value } of observations) {
      const overall = scoreByCc.get(countryCode);
      if (overall == null) continue;
      xs.push(value);
      ys.push(overall);
    }
    const correlation = pearsonCorrelation(xs, ys);
    return {
      indicator: plan.indicator,
      dimension: plan.dimension,
      tier: plan.tier,
      nominalWeight: plan.nominalWeight,
      extractionStatus: 'implemented',
      pairedSampleSize: xs.length,
      pearsonVsOverall: Math.round(correlation * 10000) / 10000,
      effectiveInfluence: Math.round(correlation * 10000) / 10000,
    };
  });
  perIndicatorInfluence.sort((a, b) => {
    // Implemented entries first (sorted by |influence| desc),
    // not-implemented/unregistered after (sorted by dimension/id)
    // so the acceptance-apparatus gap is easy to read at the bottom.
    const aImpl = a.extractionStatus === 'implemented';
    const bImpl = b.extractionStatus === 'implemented';
    if (aImpl !== bImpl) return aImpl ? -1 : 1;
    if (aImpl) {
      return Math.abs(b.effectiveInfluence) - Math.abs(a.effectiveInfluence);
    }
    const byDim = (a.dimension ?? '').localeCompare(b.dimension ?? '');
    return byDim !== 0 ? byDim : a.indicator.localeCompare(b.indicator);
  });

  // Coverage summary for the extraction apparatus itself. PR 0.5 can
  // track the "not-implemented" and "unregistered-in-harness" lists to
  // measure progress toward full per-indicator influence coverage.
  const extractionCoverage = {
    totalIndicators: extractionPlan.length,
    implemented: perIndicatorInfluence.filter((p) => p.extractionStatus === 'implemented').length,
    notImplemented: perIndicatorInfluence.filter((p) => p.extractionStatus === 'not-implemented').length,
    unregisteredInHarness: perIndicatorInfluence.filter((p) => p.extractionStatus === 'unregistered-in-harness').length,
    coreImplemented: perIndicatorInfluence.filter((p) => p.extractionStatus === 'implemented' && p.tier === 'core').length,
    coreTotal: extractionPlan.filter((p) => p.tier === 'core').length,
  };

  // Baseline comparison. Compares today's currentOverallScore against
  // the locked baseline snapshot the plan pins for acceptance gates 2,
  // 6, and 7. If no baseline exists (first PR 0 run), emit an explicit
  // `unavailable` marker so downstream acceptance tooling can detect
  // the state difference rather than treating it as a pass.
  let baselineComparison;
  if (!baseline) {
    baselineComparison = {
      status: 'unavailable',
      reason:
        'No baseline snapshot found in docs/snapshots/. Expected resilience-ranking-live-pre-repair-<date>.json from PR 0 freeze.',
    };
  } else {
    const baselineScores = baseline.scoresByCountry;
    const overlapping = rows
      .map((r) => ({
        countryCode: r.countryCode,
        currentOverallScore: r.currentOverallScore,
        baselineOverallScore: baselineScores[r.countryCode],
      }))
      .filter((r) => typeof r.baselineOverallScore === 'number');

    const scoreDrifts = overlapping.map((r) => ({
      countryCode: r.countryCode,
      currentOverallScore: r.currentOverallScore,
      baselineOverallScore: Math.round(r.baselineOverallScore * 100) / 100,
      scoreDelta: Math.round((r.currentOverallScore - r.baselineOverallScore) * 100) / 100,
      scoreAbsDelta: Math.abs(Math.round((r.currentOverallScore - r.baselineOverallScore) * 100) / 100),
    }));

    const maxCountryAbsDelta = scoreDrifts.reduce((max, d) => Math.max(max, d.scoreAbsDelta), 0);
    const biggestDrifts = [...scoreDrifts]
      .sort((a, b) => b.scoreAbsDelta - a.scoreAbsDelta)
      .slice(0, 10);

    // Spearman vs baseline over the overlap (both ranking universes
    // restricted to the shared country set so newly-added or newly-
    // removed countries can't skew the correlation).
    const currentOverlap = Object.fromEntries(
      overlapping.map((r) => [r.countryCode, r.currentOverallScore]),
    );
    const baselineOverlap = Object.fromEntries(
      overlapping.map((r) => [r.countryCode, r.baselineOverallScore]),
    );
    const spearmanVsBaseline = spearmanCorrelation(
      rankCountries(currentOverlap),
      rankCountries(baselineOverlap),
    );

    // Cohort median shift vs baseline (the plan's effective cohort
    // gate). A cohort whose median score has drifted by more than the
    // plan's +/-5 tolerance is flagged for audit even if Spearman
    // looks fine.
    const cohortShiftVsBaseline = RESILIENCE_COHORTS.map((cohort) => {
      const members = cohort.countryCodes
        .map((cc) => {
          const row = rowsByCc.get(cc);
          const base = baselineScores[cc];
          if (!row || typeof base !== 'number') return null;
          return { countryCode: cc, delta: row.currentOverallScore - base };
        })
        .filter((m) => m != null);
      if (members.length === 0) {
        return { cohortId: cohort.id, inSample: 0, skipped: true };
      }
      return {
        cohortId: cohort.id,
        label: cohort.label,
        inSample: members.length,
        medianScoreDeltaVsBaseline: Math.round(median(members.map((m) => m.delta)) * 100) / 100,
      };
    });

    // Matched-pair gap change vs baseline. For each pair, compare the
    // higher-minus-lower gap today against the same gap in the frozen
    // baseline so construct changes that reverse a pair can be flagged
    // explicitly (the matched-pair table above is current-vs-proposed;
    // this block is current-vs-baseline).
    const matchedPairGapChange = MATCHED_PAIRS.map((pair) => {
      const higherBase = baselineScores[pair.higherExpected];
      const lowerBase = baselineScores[pair.lowerExpected];
      const higher = rowsByCc.get(pair.higherExpected);
      const lower = rowsByCc.get(pair.lowerExpected);
      if (
        typeof higherBase !== 'number' ||
        typeof lowerBase !== 'number' ||
        !higher ||
        !lower
      ) {
        return { pairId: pair.id, skipped: true };
      }
      const baselineGap = higherBase - lowerBase;
      const currentGap = higher.currentOverallScore - lower.currentOverallScore;
      return {
        pairId: pair.id,
        axis: pair.axis,
        baselineGap: Math.round(baselineGap * 100) / 100,
        currentGap: Math.round(currentGap * 100) / 100,
        gapChange: Math.round((currentGap - baselineGap) * 100) / 100,
      };
    });

    baselineComparison = {
      status: 'ok',
      baselineFile: baseline.filename,
      baselineKind: baseline.kind,
      baselinePrNumber: baseline.prNumber,
      baselineDate: baseline.date,
      baselineCapturedAt: baseline.capturedAt,
      baselineCommitSha: baseline.commitSha,
      overlapSize: overlapping.length,
      spearmanVsBaseline: Math.round(spearmanVsBaseline * 10000) / 10000,
      maxCountryAbsDelta: Math.round(maxCountryAbsDelta * 100) / 100,
      biggestDriftsVsBaseline: biggestDrifts,
      cohortShiftVsBaseline,
      matchedPairGapChange,
    };
  }

  // Acceptance-gate verdict per plan §6. Computed programmatically
  // from the inputs above so every scorer-changing PR has a
  // machine-readable pass/fail on every gate. Gate numbering matches
  // the plan sections literally — do NOT reorder without updating the
  // plan.
  //
  // Thresholds are encoded here (not tunable per-PR) so gate criteria
  // can't silently soften. Any adjustment requires a PR touching this
  // file + the plan doc in the same commit.
  const GATE_THRESHOLDS = {
    SPEARMAN_VS_BASELINE_MIN: 0.85,
    MAX_COUNTRY_ABS_DELTA_MAX: 15,
    COHORT_MEDIAN_SHIFT_MAX: 10,
  };
  const gates = [];
  const addGate = (id, name, status, detail) => {
    gates.push({ id, name, status, detail });
  };

  // Gate 1: Spearman vs immediate-prior baseline >= 0.85.
  if (baselineComparison.status === 'ok') {
    const s = baselineComparison.spearmanVsBaseline;
    addGate('gate-1-spearman', 'Spearman vs baseline >= 0.85',
      s >= GATE_THRESHOLDS.SPEARMAN_VS_BASELINE_MIN ? 'pass' : 'fail',
      `${s} (floor ${GATE_THRESHOLDS.SPEARMAN_VS_BASELINE_MIN})`);
  } else {
    addGate('gate-1-spearman', 'Spearman vs baseline >= 0.85', 'skipped',
      'baseline unavailable; re-run after PR 0 freeze ships');
  }

  // Gate 2: No country's overallScore changes by more than 15 points
  // from the immediate-prior baseline.
  if (baselineComparison.status === 'ok') {
    const drift = baselineComparison.maxCountryAbsDelta;
    addGate('gate-2-country-drift', 'Max country drift vs baseline <= 15 points',
      drift <= GATE_THRESHOLDS.MAX_COUNTRY_ABS_DELTA_MAX ? 'pass' : 'fail',
      `${drift}pt (ceiling ${GATE_THRESHOLDS.MAX_COUNTRY_ABS_DELTA_MAX})`);
  } else {
    addGate('gate-2-country-drift', 'Max country drift vs baseline <= 15 points', 'skipped',
      'baseline unavailable');
  }

  // Gate 6: Cohort median shift vs baseline capped at 10 points.
  if (baselineComparison.status === 'ok') {
    const worstCohort = (baselineComparison.cohortShiftVsBaseline ?? [])
      .filter((c) => !c.skipped && typeof c.medianScoreDeltaVsBaseline === 'number')
      .reduce((worst, c) => {
        const abs = Math.abs(c.medianScoreDeltaVsBaseline);
        return abs > Math.abs(worst?.medianScoreDeltaVsBaseline ?? 0) ? c : worst;
      }, null);
    if (worstCohort) {
      const shift = Math.abs(worstCohort.medianScoreDeltaVsBaseline);
      addGate('gate-6-cohort-median', 'Cohort median shift vs baseline <= 10 points',
        shift <= GATE_THRESHOLDS.COHORT_MEDIAN_SHIFT_MAX ? 'pass' : 'fail',
        `worst: ${worstCohort.cohortId} ${worstCohort.medianScoreDeltaVsBaseline}pt (ceiling ${GATE_THRESHOLDS.COHORT_MEDIAN_SHIFT_MAX})`);
    } else {
      addGate('gate-6-cohort-median', 'Cohort median shift vs baseline <= 10 points', 'skipped',
        'no cohort has baseline overlap');
    }
  } else {
    addGate('gate-6-cohort-median', 'Cohort median shift vs baseline <= 10 points', 'skipped',
      'baseline unavailable');
  }

  // Gate 7: Matched-pair within-pair gap signs verified. Any pair
  // flipping direction or falling below minGap stops the PR.
  const activePairCount = matchedPairSummary.filter((p) => !p.skipped).length;
  addGate('gate-7-matched-pair', 'Matched-pair within-pair gaps hold expected direction',
    matchedPairFailures.length === 0 ? 'pass' : 'fail',
    matchedPairFailures.length === 0
      ? `${activePairCount}/${activePairCount} pairs pass`
      : `${matchedPairFailures.length} pair(s) failed: ${matchedPairFailures.map((p) => p.pairId).join(', ')}`);

  // Gate 9: Per-indicator effective-influence baseline present. Sign-
  // and rank-order correctness against nominal weights is a post-hoc
  // human-review check; this gate asserts the MEASUREMENT exists,
  // which is the diagnostic-apparatus pre-requisite from PR 0.
  addGate('gate-9-effective-influence-baseline',
    'Per-indicator effective-influence baseline exists (>= 80% of Core implemented)',
    extractionCoverage.coreTotal > 0 && (extractionCoverage.coreImplemented / extractionCoverage.coreTotal) >= 0.80
      ? 'pass' : 'fail',
    `${extractionCoverage.coreImplemented}/${extractionCoverage.coreTotal} Core indicators measurable`);

  // Gate: cohort/pair membership present in scorable universe (not
  // numbered in plan §6 but is the PR 0 fail-loud addition — if any
  // cohort/pair endpoint falls out of listScorableCountries, every
  // other gate is being computed over a silently-partial universe).
  addGate('gate-universe-integrity', 'All cohort/pair endpoints are in the scorable universe',
    cohortMissingFromScorable.length === 0 ? 'pass' : 'fail',
    cohortMissingFromScorable.length === 0
      ? `${cohortOrPairMembers.size} endpoints verified`
      : `missing from scorable: ${cohortMissingFromScorable.join(', ')}`);

  const acceptanceGates = {
    thresholds: GATE_THRESHOLDS,
    results: gates,
    summary: {
      total: gates.length,
      pass: gates.filter((g) => g.status === 'pass').length,
      fail: gates.filter((g) => g.status === 'fail').length,
      skipped: gates.filter((g) => g.status === 'skipped').length,
    },
    verdict: gates.some((g) => g.status === 'fail')
      ? 'BLOCK' // any fail halts the PR per plan §6
      : gates.some((g) => g.status === 'skipped')
        ? 'CONDITIONAL' // skipped gates need the missing inputs before final merge
        : 'PASS',
  };

  const output = {
    comparison: 'currentDomainAggregate_vs_proposedPillarCombined',
    penaltyAlpha: PENALTY_ALPHA,
    pillarWeights: PILLAR_WEIGHTS,
    domainWeights,
    // Finding 1 acceptance-apparatus metadata: scoring + acceptance
    // gates ran over the FULL scorable universe, not a curated sample.
    // cohortMissingFromScorable surfaces any cohort/pair endpoint that
    // the scoring registry cannot actually score (e.g. new cohort
    // addition that slipped past listScorableCountries): fail-loud
    // instead of silently dropping.
    scorableUniverseSize: scorableCountries.length,
    sampleSize: rows.length,
    sampleCountries: rows.map((r) => r.countryCode),
    cohortMissingFromScorable,
    summary: {
      spearmanRankCorrelation: Math.round(spearman * 10000) / 10000,
      meanScoreDelta: Math.round(meanScoreDelta * 100) / 100,
      meanAbsScoreDelta: Math.round(meanAbsScoreDelta * 100) / 100,
      maxRankAbsDelta,
      matchedPairFailures: matchedPairFailures.length,
      acceptanceVerdict: acceptanceGates.verdict,
    },
    acceptanceGates,
    baselineComparison,
    cohortSummary,
    matchedPairSummary,
    variableInfluence,
    extractionCoverage,
    perIndicatorInfluence,
    topMoversByRank: topMovers.map((r) => ({
      countryCode: r.countryCode,
      currentRank: r.currentRank,
      proposedRank: r.proposedRank,
      rankDelta: r.rankDelta,
      currentOverallScore: r.currentOverallScore,
      proposedOverallScore: r.proposedOverallScore,
      scoreDelta: r.scoreDelta,
      pillars: r.pillars,
    })),
    biggestScoreDrops: biggestScoreDrops.map((r) => ({
      countryCode: r.countryCode,
      scoreDelta: r.scoreDelta,
      currentOverallScore: r.currentOverallScore,
      proposedOverallScore: r.proposedOverallScore,
      rankDelta: r.rankDelta,
    })),
    biggestScoreClimbs: biggestScoreClimbs.map((r) => ({
      countryCode: r.countryCode,
      scoreDelta: r.scoreDelta,
      currentOverallScore: r.currentOverallScore,
      proposedOverallScore: r.proposedOverallScore,
      rankDelta: r.rankDelta,
    })),
    fullSample: rows,
  };

  process.stdout.write(`${JSON.stringify(output, null, 2)}\n`);
}

// Export the baseline-snapshot selection helpers so unit tests can
// verify the ordering contract (pre-repair < post-pr1 < post-pr10, etc.)
// without having to spin up the full scoring pipeline.
export {
  parseBaselineSnapshotMeta,
  loadMostRecentBaselineSnapshot,
  EXTRACTION_RULES,
  buildIndicatorExtractionPlan,
  applyExtractionRule,
};

// isMain guard so importing the helpers from a test file does not
// accidentally trigger the full scoring run. Per the project's
// feedback_seed_isMain_guard memory: any script that exports functions
// AND runs work at top level MUST guard the work behind an explicit
// entrypoint check.
const invokedAsScript = (() => {
  const entry = process.argv[1];
  if (!entry) return false;
  try {
    return path.resolve(entry) === path.resolve(fileURLToPath(import.meta.url));
  } catch {
    return false;
  }
})();

if (invokedAsScript) {
  main().catch((err) => {
    console.error('[compare-resilience-current-vs-proposed] failed:', err);
    process.exit(1);
  });
}