worldmonitor/tests/resilience-dimension-monotonicity.test.mts
Elie Habib 7cf37c604c feat(resilience): PR 3 — dead-signal cleanup (plan §3.5, §3.6) (#3297)
* feat(resilience): PR 3 §3.5 — retire fuelStockDays from core score permanently

First commit in PR 3 of the resilience repair plan. Retires
`fuelStockDays` from the core score with no replacement.

Why permanent, not replaced:
IEA emergency-stockholding rules are defined in days of NET IMPORTS
and do not bind net exporters by design. Norway/Canada/US measured
in days-of-imports are incomparable to Germany/Japan measured the
same way — the construct is fundamentally different across the two
country classes. No globally-comparable recovery-fuel signal can
be built from this source; the pre-repair probe showed 100% imputed
at 50 for every country in the April 2026 freeze.

  scoreFuelStockDays:
    - Rewritten to return coverage=0 + observedWeight=0 +
      imputationClass='source-failure' for every country regardless
      of seed content.
    - Drops the dimension from the `recovery` domain's coverage-
      weighted mean automatically; remaining recovery dimensions
      pick up the share via re-normalisation in
      `_shared.ts#coverageWeightedMean`.
    - No explicit weight transfer needed — the coverage-weighted
      blend handles redistribution.
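
The redistribution above can be illustrated with a minimal sketch. It assumes the coverage-weighted mean is a plain sum(score × coverage) / sum(coverage); the real `_shared.ts#coverageWeightedMean` is not shown in this commit and may differ. The `DimensionScore` shape and the fixture values are hypothetical.

```typescript
// Hypothetical shape; the real dimension-score type carries more fields.
interface DimensionScore {
  id: string;
  score: number;    // 0..100
  coverage: number; // 0..1
}

// Assumed formula: each dimension contributes in proportion to its coverage,
// so a dimension pinned at coverage=0 adds nothing to either sum.
function coverageWeightedMean(dims: DimensionScore[]): number {
  const totalCoverage = dims.reduce((sum, d) => sum + d.coverage, 0);
  if (totalCoverage === 0) return 0;
  const weighted = dims.reduce((sum, d) => sum + d.score * d.coverage, 0);
  return weighted / totalCoverage;
}

const activeDims: DimensionScore[] = [
  { id: 'reserveAdequacy', score: 80, coverage: 1 },
  { id: 'fiscalSpace', score: 60, coverage: 1 },
];
const withRetiredDim: DimensionScore[] = [
  ...activeDims,
  { id: 'fuelStockDays', score: 50, coverage: 0 }, // retired: coverage pinned at 0
];

const meanActive = coverageWeightedMean(activeDims);
const meanWithRetired = coverageWeightedMean(withRetiredDim);
```

Under this assumption both means are identical: the retired dimension's score never enters the blend, which is why no explicit weight transfer is required.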

  Registry:
    - recoveryFuelStockDays re-tagged from tier='enrichment' to
      tier='experimental' so the Core coverage gate treats it as
      out-of-score.
    - Description updated to make the retirement explicit; entry
      stays in the registry for structural continuity (the
      dimension `fuelStockDays` remains in RESILIENCE_DIMENSION_ORDER
      for the 19-dimension tests; removing the dimension entirely is
      a PR 4 structural-audit concern).

  Housekeeping:
    - Removed `RESILIENCE_RECOVERY_FUEL_STOCKS_KEY` constant (no
      longer read; noUnusedLocals would reject it).
    - Removed `RecoveryFuelStocksCountry` interface for the same
      reason. Comment at the removed declaration instructs future
      maintainers not to re-add the type as a reservation; when a
      new recovery-fuel concept lands, introduce a fresh interface.

Plan reference: §3.5 point 1 of
`docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md`.

51 resilience tests pass, typecheck + biome clean. The
`recovery` domain's published score will shift slightly for every
country because the 0.10 slot that fuelStockDays was imputing to
now redistributes; the compare-harness acceptance-gate rerun at
merge time will quantify the shift per plan §6 gates.

* feat(resilience): PR 3 §3.5 — retire BIS-backed currencyExternal; rebuild on IMF inflation + WB reserves

BIS REER/DSR feeds were load-bearing in currencyExternal (weights 0.35
fxVolatility + 0.35 fxDeviation, ~70% of dimension). They cover ~60
countries max — so every non-BIS country fell through to
curated_list_absent (coverage 0.3) or a thin IMF proxy (coverage 0.45).
Combined with reserveMarginPct already removed in PR 1, currencyExternal
was the clearest "construct absent for most of the world" carrier left
in the scorer.

Changes:

_dimension-scorers.ts
- scoreCurrencyExternal now reads IMF macro (inflationPct) + WB FX
  reserves only. Coverage ladder:
    inflation + reserves → 0.85 (observed primary + secondary)
    inflation only       → 0.55
    reserves only        → 0.40
    neither              → 0.30 (IMPUTE.bisEer retained for snapshot
                                 continuity; semantics read as
                                 "no IMF + no WB reserves" now)
- Removed dead symbols: RESILIENCE_BIS_EXCHANGE_KEY constant (reserved
  via comment only, flagged by noUnusedLocals), stddev() helper,
  getCountryBisExchangeRates() loader, BisExchangeRate interface,
  dateToSortableNumber(). All were used exclusively by the retired
  BIS path.
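
The coverage ladder above can be sketched as a small pure function. The helper name and signature are hypothetical; only the four coverage values come from this commit.

```typescript
// Hypothetical helper: maps which inputs were observed to the coverage value
// listed in the ladder. Not the actual scoreCurrencyExternal implementation.
function currencyExternalCoverage(hasInflation: boolean, hasReserves: boolean): number {
  if (hasInflation && hasReserves) return 0.85; // observed primary + secondary
  if (hasInflation) return 0.55;                // inflation only
  if (hasReserves) return 0.40;                 // reserves only
  return 0.30;                                  // neither: imputed fallback
}

const ladder = {
  both: currencyExternalCoverage(true, true),
  inflationOnly: currencyExternalCoverage(true, false),
  reservesOnly: currencyExternalCoverage(false, true),
  neither: currencyExternalCoverage(false, false),
};
```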

_indicator-registry.ts
- New core entry inflationStability (weight 0.60, tier=core,
  sourceKey=economic:imf:macro:v2).
- fxReservesAdequacy weight 0.15 → 0.40 (secondary reliability
  anchor).
- fxVolatility + fxDeviation demoted tier=enrichment → tier=experimental
  (BIS ~60-country coverage; off the core weight sum).
- Non-experimental weights now sum to 1.0 (0.60 + 0.40).

scripts/compare-resilience-current-vs-proposed.mjs
- EXTRACTION_RULES: added inflationStability →
  imf-macro-country-field field=inflationPct so the registry-parity
  test passes and the correlation harness sees the new construct.

tests/resilience-dimension-scorers.test.mts
- Dropped BIS-era wording ("non-BIS country") and test 266
  (BIS-outage coverage 0.35 branch) which collapsed to the inflation-
  only path post-retirement.
- Updated coverage assertions: inflation-only 0.45 → 0.55; inflation+
  reserves 0.55 → 0.85.

tests/resilience-scorers.test.mts
- domainAverages.economic 68.33 → 66.33 (US currencyExternal score
  shifts slightly under IMF+reserves vs old BIS composite).
- stressScore 67.85 → 67.21; stressFactor 0.3215 → 0.3279.
- overallScore 65.82 → 65.52.
- baselineScore unchanged (currencyExternal is stress-only).

All 6324 data-tier tests pass. typecheck:api clean. No change to
seeders or Redis keys; this is a pure scorer + registry rebuild.

* feat(resilience): PR 3 §3.5 point 3 — re-goalpost externalDebtCoverage (0..5 → 0..2)

Plan §2.1 diagnosis table showed externalDebtCoverage saturating at
score=100 across all 9 probe countries — including stressed states.
Signal was collapsed. Root cause: (worst=5, best=0) gave every country
with ratio < 0.5 a score above 90, and mapped Greenspan-Guidotti's
reserve-adequacy threshold (ratio=1.0) to score 80 — well into "no
worry" territory instead of the "mild warning" it should be.

Re-anchored on Greenspan-Guidotti directly: ratio=1.0 now maps to score
50 (mild warning), ratio=2.0 to score 0 (acute rollover-shock exposure).
Ratios above 2.0 clamp to 0, consistent with "beyond this point the
country is already in crisis; exact value stops mattering."
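
The effect of the re-anchoring can be checked with a minimal sketch of a linear lower-is-better goalpost. `normalizeLowerBetter` is a hypothetical stand-in for the scorer's normalizer, assumed to be linear with clamping to 0..100.

```typescript
// Hypothetical lower-is-better normalizer: `best` maps to 100, `worst` to 0,
// values outside the goalposts clamp to the nearest bound.
function normalizeLowerBetter(value: number, worst: number, best: number): number {
  const raw = ((worst - value) / (worst - best)) * 100;
  return Math.min(100, Math.max(0, raw));
}

const ratios = [0.2, 1.0, 1.5, 2.0, 4.0];

// Old goalposts (worst=5, best=0): roughly 96, 80, 70, 60, 20.
const oldScores = ratios.map((r) => normalizeLowerBetter(r, 5, 0));

// New goalposts (worst=2, best=0): roughly 90, 50, 25, 0, 0.
const newScores = ratios.map((r) => normalizeLowerBetter(r, 2, 0));
```

Under the old bounds the Greenspan-Guidotti threshold (ratio=1.0) lands near 80; under the new bounds it lands at 50, and anything at or above 2.0 clamps to 0.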

Files changed:

- _indicator-registry.ts: recoveryDebtToReserves goalposts
  {worst: 5, best: 0} → {worst: 2, best: 0}. Description updated to
  cite Greenspan-Guidotti; inline comment documents anchor + rationale.

- _dimension-scorers.ts: scoreExternalDebtCoverage normalizer bound
  changed from (0..5) to (0..2), with inline comment.

- docs/methodology/country-resilience-index.mdx: goalpost table row
  5-0 → 2-0, description cites Greenspan-Guidotti.

- docs/methodology/indicator-sources.yaml:
  * constructStatus: dead-signal → observed-mechanism (signal is now
    discriminating).
  * reviewNotes updated to describe the new anchor.
  * mechanismTestRationale names the Greenspan-Guidotti rule.

- tests/resilience-dimension-monotonicity.test.mts: updated the
  comment + picked values inside the (0..2) discriminating band (0.3
  and 1.5). Old values (1 vs 4) had 4 clamping to 0.

- tests/resilience-dimension-scorers.test.mts: Norway (NO) score
  threshold relaxed >90 → >=85 (NO ratio=0.2 now scores 90, was 96).

- tests/resilience-scorers.test.mts: fixture drift:
  * domainAverages.recovery 54.83 → 47.33 (US extDebt 70 → 25).
  * baselineScore 63.63 → 60.12 (extDebt is baseline type).
  * overallScore 65.52 → 63.27.
  * stressScore / stressFactor unchanged (extDebt is baseline-only).

All 6324 data-tier tests pass. typecheck:api clean.

* feat(resilience): PR 3 §3.6 — CI gate on indicator coverage and nominal weight

Plan §3.6 adds a new acceptance criterion (also §5 item 5):

> No indicator with observed coverage below 70% may exceed 5% nominal
> weight OR 5% effective influence in the post-change sensitivity run.

This commit enforces the NOMINAL-WEIGHT half as a unit test that runs
on every CI build. The EFFECTIVE-INFLUENCE half is produced by
scripts/validate-resilience-sensitivity.mjs as a committed artifact;
the gate file only asserts that script still exists so a refactor that
removes it breaks the build loudly.

Why the gate exists (plan §3.6):

  "A dimension at 30% observed coverage carries the same effective
   weight as one at 95%. This contradicts the OECD/JRC handbook on
   uncertainty analysis."

Implementation:

tests/resilience-coverage-influence-gate.test.mts — three tests:
  1. Nominal-weight gate: for every core indicator with coverage < 137
     countries (70% of the ~195-country universe), computes its nominal
     overall weight as
       indicator.weight × (1/dimensions-in-domain) × domain-weight
     and asserts it does not exceed 5%. Equal-share-per-dimension is
     the *upper bound* on runtime weight (coverage-weighted mean gives
     a lower share when a dimension drops out), so this is a strict
     bound: if the nominal number passes, the runtime number also
     passes for every country.
  2. Effective-influence contract: asserts the sensitivity script
     exists at its expected path. Removing it (intentionally or by
     refactor) breaks the build.
  3. Audit visibility: prints the top 10 core indicators by nominal
     overall weight. No assertion beyond "ran" — the list lets
     reviewers spot outliers that pass the gate but are near the cap.

Current state (observed from audit output):

  recoveryReserveMonths:   nominal=4.17%  coverage=188
  recoveryDebtToReserves:  nominal=4.17%  coverage=185
  recoveryImportHhi:       nominal=4.17%  coverage=190
  inflationStability:      nominal=3.40%  coverage=185
  electricityConsumption:  nominal=3.30%  coverage=217
  ucdpConflict:            nominal=3.09%  coverage=193

Every core indicator has coverage ≥ 180 (already enforced by the
pre-existing indicator-tiering test), so the nominal-weight gate has
no current violators — its purpose is catching future drift, not
flagging today's state.

All 6327 data-tier tests pass. typecheck:api clean.

* docs(resilience): PR 3 methodology doc — document §3.5 dead-signal retirements + §3.6 coverage gate

Methodology-doc update capturing the three §3.5 landings and the §3.6 CI
gate. Five edits:

1. **Known construct limitations section (#5 and #6):** strike through the
   original "dead signals" and "no coverage-based weight cap" items and
   annotate them with "Landed in PR 3 §3.5"/"Landed in PR 3 §3.6" +
   specifics of what shipped.

2. **Currency & External H4 section:** completely rewritten. Old table
   (fxVolatility / fxDeviation / fxReservesAdequacy on BIS primary) is
   replaced by the two-indicator post-PR-3 table (inflationStability at
   0.60 + fxReservesAdequacy at 0.40). Coverage ladder spelled out
   (0.85 / 0.55 / 0.40 / 0.30). Legacy BIS indicators named as
   experimental-tier drill-downs only.

3. **Fuel Stock Days H4 section:** H4 heading text kept verbatim so the
   methodology-lint H4-to-dimension mapping does not break; body
   rewritten to explain that the dimension is retired from core but the
   seeder still runs for IEA-member drill-downs.

4. **External Debt Coverage table row:** goalpost 5-0 → 2-0, description
   cites Greenspan-Guidotti reserve-adequacy rule.

5. **New v2.2 changelog entry** — PR 3 dead-signal cleanup, covering
   §3.5 points 1/2/3 + §3.6 + acceptance gates + construct-audit
   updates.

No scoring or code changes in this commit. Methodology-lint test passes
(H4 mapping intact). All 6327 data-tier tests pass.

* fix(resilience): PR 3 §3.6 gate — correct share-denominator for coverage-weighted aggregation

Reviewer catch (thanks). The previous gate computed each indicator's
nominal overall weight as

  indicator.weight × (1 / N_total_dimensions_in_domain) × domain_weight

and claimed this was an upper bound ("actual runtime weight is ≤ this
when some dimensions drop out on coverage"). That is BACKWARDS for
this scorer.

The domain aggregation is coverage-weighted
(server/worldmonitor/resilience/v1/_shared.ts coverageWeightedMean),
so when a dimension pins at coverage=0 it is EXCLUDED from the
denominator and the surviving dimensions' shares go UP, not down.

PR 3 commit 1 retires fuelStockDays by hard-coding its scorer to
coverage=0 for every country — so in the current live state the
recovery domain has 5 contributing dimensions (not 6), and each core
recovery indicator's nominal share is

  1.0 × 1/5 × 0.25 = 5.00% (was mis-reported as 4.17%)

The old gate therefore under-estimated nominal influence and could
silently pass exactly the kind of low-coverage overweight regression
it is meant to block.

Fix:

- Added `coreBearingDimensions(domainId)` helper that counts only
  dimensions that have ≥1 core indicator in the registry. A dimension
  with only experimental/enrichment entries (post-retirement
  fuelStockDays) has no core contribution → does not dilute shares.
- Updated `nominalOverallWeight` to divide by the core-bearing count,
  not the raw dimension count.
- Rewrote the helper's doc comment to stop claiming this is a strict
  upper bound — explicitly calls out the dynamic case (source failure
  raising surviving dim shares further) as the sensitivity script's
  responsibility.
- Added a new regression test: asserts (a) at least one recovery
  dimension is all-non-core (fuelStockDays post-retirement),
  (b) fuelStockDays has zero core indicators, and (c)
  recoveryDebtToReserves nominal = 0.05 exactly (not 0.0417). Any
  reversion of the retirement or regression to the N_total
  denominator will fail loudly.
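
A minimal sketch of the corrected denominator, under stated assumptions: the `IndicatorEntry` shape, the function signatures, and the two placeholder rows are hypothetical, and the fixture is simplified to one weight-1.0 core indicator per dimension. Only the helper names, the 0.25 domain weight, and the 5% cap come from this commit.

```typescript
type Tier = 'core' | 'enrichment' | 'experimental';

// Hypothetical registry row; the real registry carries more metadata.
interface IndicatorEntry {
  id: string;
  dimension: string;
  domain: string;
  tier: Tier;
  weight: number; // share within its dimension
}

// Count only dimensions that still have at least one core indicator.
function coreBearingDimensions(registry: IndicatorEntry[], domain: string): number {
  const dims = new Set<string>();
  for (const entry of registry) {
    if (entry.domain === domain && entry.tier === 'core') dims.add(entry.dimension);
  }
  return dims.size;
}

function nominalOverallWeight(registry: IndicatorEntry[], entry: IndicatorEntry, domainWeight: number): number {
  return entry.weight * (1 / coreBearingDimensions(registry, entry.domain)) * domainWeight;
}

const debtToReserves: IndicatorEntry = {
  id: 'recoveryDebtToReserves', dimension: 'externalDebtCoverage', domain: 'recovery', tier: 'core', weight: 1.0,
};
const registry: IndicatorEntry[] = [
  { id: 'recoveryReserveMonths', dimension: 'reserveAdequacy', domain: 'recovery', tier: 'core', weight: 1.0 },
  debtToReserves,
  { id: 'recoveryImportHhi', dimension: 'importConcentration', domain: 'recovery', tier: 'core', weight: 1.0 },
  { id: 'placeholderA', dimension: 'placeholderDimA', domain: 'recovery', tier: 'core', weight: 1.0 },
  { id: 'placeholderB', dimension: 'placeholderDimB', domain: 'recovery', tier: 'core', weight: 1.0 },
  // Retired: no core indicator left, so this dimension does not dilute shares.
  { id: 'recoveryFuelStockDays', dimension: 'fuelStockDays', domain: 'recovery', tier: 'experimental', weight: 1.0 },
];

const RECOVERY_DOMAIN_WEIGHT = 0.25;
const NOMINAL_CAP = 0.05;

const coreBearing = coreBearingDimensions(registry, 'recovery');                          // 5, not 6
const nominal = nominalOverallWeight(registry, debtToReserves, RECOVERY_DOMAIN_WEIGHT);   // 1.0 × 1/5 × 0.25
const oldNominal = debtToReserves.weight * (1 / 6) * RECOVERY_DOMAIN_WEIGHT;              // about 0.0417
const exceedsCap = nominal > NOMINAL_CAP; // strict comparison: sitting at the cap still passes
```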

Top-10 audit output now correctly shows:

  recoveryReserveMonths:   nominal=5%     coverage=188
  recoveryDebtToReserves:  nominal=5%     coverage=185
  recoveryImportHhi:       nominal=5%     coverage=190
  (was 4.17% each under the old math)

All 486 resilience tests pass. typecheck:api clean.

Note: the 5% figure is exactly AT the cap, not over it. "exceed" means
strictly > 5%, so it still passes. But now the reviewer / audit log
reflects reality.

* fix(resilience): PR 3 review — retired-dim confidence drag + false source-failure label

Addresses the Codex review P1 + P2 on PR #3297.

P1 — retired-dim drag on confidence averages
--------------------------------------------
scoreFuelStockDays returns coverage=0 by design (retired construct),
but computeLowConfidence, computeOverallCoverage, and the widget's
formatResilienceConfidence averaged across all 19 dimensions. That
dragged every country's reported averageCoverage down — US went from
0.8556 (active dims only) to 0.8105 (all dims) — enough drift to
misclassify edge countries as lowConfidence and to shift the ranking
widget's overallCoverage pill for every country.

Fix: introduce an authoritative RESILIENCE_RETIRED_DIMENSIONS set in
_dimension-scorers.ts and filter it out of all three averages. The
filter is keyed on the retired-dim REGISTRY, not on coverage === 0,
because a non-retired dim can legitimately emit coverage=0 on a
genuinely sparse-data country via weightedBlend fall-through — those
entries MUST keep dragging confidence down (that is the sparse-data
signal lowConfidence exists to surface). Verified: sparse-country
release-gate test (marks sparse WHO/FAO countries as low confidence)
still passes with the registry-keyed filter; would have failed with
a naive coverage=0 filter.
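
The difference between the two filters can be shown with a minimal sketch. `averageCoverage`, `naiveAverageCoverage`, and the fixture are hypothetical; only the set name and the retired `fuelStockDays` entry come from this commit.

```typescript
interface DimensionCoverage {
  id: string;
  coverage: number; // 0..1
}

// Set name as in the commit; its declaration here is an assumed sketch.
const RESILIENCE_RETIRED_DIMENSIONS: ReadonlySet<string> = new Set(['fuelStockDays']);

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Registry-keyed filter: drops only structurally retired dimensions.
function averageCoverage(dims: DimensionCoverage[]): number {
  return mean(dims.filter((d) => !RESILIENCE_RETIRED_DIMENSIONS.has(d.id)).map((d) => d.coverage));
}

// Naive filter: also drops genuinely sparse dimensions, hiding the sparse-data signal.
function naiveAverageCoverage(dims: DimensionCoverage[]): number {
  return mean(dims.filter((d) => d.coverage !== 0).map((d) => d.coverage));
}

const sparseCountry: DimensionCoverage[] = [
  { id: 'fuelStockDays', coverage: 0 }, // retired by design
  { id: 'foodWater', coverage: 0 },     // genuinely missing data
  { id: 'energy', coverage: 0.9 },
];

const registryKeyed = averageCoverage(sparseCountry); // (0 + 0.9) / 2
const naive = naiveAverageCoverage(sparseCountry);    // 0.9 alone
```

In this fixture the registry-keyed average stays low because the sparse `foodWater` entry still counts, while the naive filter would report the country as well covered.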

Server-client parity: widget-utils cannot import server code, so
RESILIENCE_RETIRED_DIMENSION_IDS is a hand-mirrored constant, kept
in lockstep by tests/resilience-retired-dimensions-parity.test.mts
(parses the widget file as text, same pattern as existing widget-util
tests that can't import the widget module directly).
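
A text-based parity check of this kind might look like the sketch below. The declaration syntax of both constants is an assumption (the real files are not shown here), `extractSetLiteral` is a hypothetical helper, and the two source strings stand in for file contents that the real test would read from disk.

```typescript
// Extract the string literals from `<constName> ... = new Set([...])` in source text.
function extractSetLiteral(source: string, constName: string): string[] {
  const pattern = new RegExp(`${constName}[^=]*=\\s*new Set(?:<[^>]*>)?\\(\\[([^\\]]*)\\]`);
  const match = pattern.exec(source);
  if (!match) throw new Error(`${constName} not found`);

  const literal = /['"]([^'"]+)['"]/g;
  const ids: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = literal.exec(match[1])) !== null) ids.push(m[1]);
  return ids.sort();
}

// Stand-ins for the server and widget files (assumed declaration shapes).
const serverSource =
  "export const RESILIENCE_RETIRED_DIMENSIONS: ReadonlySet<string> = new Set(['fuelStockDays']);";
const widgetSource =
  "const RESILIENCE_RETIRED_DIMENSION_IDS = new Set<string>(['fuelStockDays']);";

const serverIds = extractSetLiteral(serverSource, 'RESILIENCE_RETIRED_DIMENSIONS');
const widgetIds = extractSetLiteral(widgetSource, 'RESILIENCE_RETIRED_DIMENSION_IDS');
const inLockstep = JSON.stringify(serverIds) === JSON.stringify(widgetIds);
```

Parsing as text keeps the check independent of whether the widget module can be imported in the test environment, at the cost of being sensitive to how the constant is declared.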

P2 — false "Source down" label on retired dim
---------------------------------------------
scoreFuelStockDays hard-coded imputationClass: 'source-failure',
which the widget maps to "Source down: upstream seeder failed" with
a `!` icon for every country. That is semantically wrong for an
intentional retirement. Flipped to null so the widget's absent-path
renders a neutral cell without a false outage label. null is already
a legal value of ResilienceDimensionScore.imputationClass; no type
change needed.

Tests
-----
- tests/resilience-confidence-averaging.test.mts (new): pins the
  registry-keyed filter semantic for computeOverallCoverage +
  computeLowConfidence. Includes a negative-control test proving
  non-retired coverage=0 dims still flip lowConfidence.
- tests/resilience-retired-dimensions-parity.test.mts (new):
  lockstep gate between server and client retired-dim lists.
- Widget test adds a registry-keyed exclusion test with a non-retired
  coverage=0 dim in the fixture to lock in the correct semantic.
- Existing tests asserting imputationClass: 'source-failure' for
  fuelStockDays flipped to null.

All 494 resilience tests + full 6336/6336 data-tier suite pass.
Typecheck clean for both tsconfig.json and tsconfig.api.json.

* docs(resilience): align methodology + registry metadata with shipped imputationClass=null

Follow-up to the previous PR 3 review commit that flipped
scoreFuelStockDays's imputationClass from 'source-failure' to null to
avoid a false "Source down" widget label on every country. The code
changed; the doc and registry metadata did not, leaving three sites
in the methodology mdx and two comment/description sites in the
registry still claiming imputationClass='source-failure'. Any future
reviewer (or tooling that treats the registry description as
authoritative) would be misled.

This commit rewrites those sites to describe the shipped behavior:
 - imputationClass=null (not 'source-failure'), with the rationale
 - exclusion from confidence/coverage averages via the
   RESILIENCE_RETIRED_DIMENSIONS registry filter
 - the distinction between structural retirement (filtered) and
   runtime coverage=0 (kept so sparse-data countries still flag
   lowConfidence)

Touched:
 - docs/methodology/country-resilience-index.mdx (lines ~33, ~268, ~590)
 - server/worldmonitor/resilience/v1/_indicator-registry.ts
   (recoveryFuelStockDays comment block + description field)

No code-behavior change. Docs-only.

Tests: 157 targeted resilience tests pass (incl. methodology-lint +
widget + release-gate + confidence-averaging). Typecheck clean on
both tsconfig.json and tsconfig.api.json.
2026-04-22 23:57:28 +04:00

260 lines
12 KiB
TypeScript
// Monotonicity-test harness. Pins the direction of movement for the
// highest-leverage indicators so PR 1 + PR 2 cannot accidentally flip
// a sign silently. See
// docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md
// §5 (PR 0 deliverable) and §6 (acceptance gate 8).
//
// Each test builds two synthetic `ResilienceSeedReader` fixtures that
// differ only in the target indicator's value and asserts the dimension
// score moves in the documented direction.
//
// Scope (minimum viable, expanded in PR 0.5 follow-ups):
// - scoreEnergy: dependency, gasShare, coalShare, renewShare, electricityConsumption
// (all five direction claims the current scorer makes — PR 1 overturns three of them)
// - scoreReserveAdequacy: reserveMonths
// - scoreFiscalSpace: govRevenuePct, fiscalBalancePct, debtToGdpPct
// - scoreExternalDebtCoverage: debtToReservesRatio
// - scoreImportConcentration: hhi
// - scoreFoodWater: peopleInCrisis, phase
// - scoreGovernanceInstitutional: WGI mean
//
// 14 indicators × 1 direction check each = 14 assertions. The harness
// is written as a table so PR 1 can add/remove rows without touching
// test logic.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import {
  scoreEnergy,
  scoreReserveAdequacy,
  scoreFiscalSpace,
  scoreExternalDebtCoverage,
  scoreImportConcentration,
  scoreFoodWater,
  scoreGovernanceInstitutional,
  type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';
const TEST_ISO2 = 'XX';
function makeStaticReader(staticRecord: unknown, overrides: Record<string, unknown> = {}): ResilienceSeedReader {
  return async (key: string) => {
    if (key === `resilience:static:${TEST_ISO2}`) return staticRecord;
    if (key in overrides) return overrides[key];
    return null;
  };
}

function makeRecoveryReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
  return async (key: string) => keyValueMap[key] ?? null;
}
describe('resilience dimension monotonicity — scoreReserveAdequacy', () => {
  it('higher reserveMonths → higher score', async () => {
    const low = await scoreReserveAdequacy(TEST_ISO2, makeRecoveryReader({
      'resilience:recovery:reserve-adequacy:v1': { countries: { [TEST_ISO2]: { reserveMonths: 2 } } },
    }));
    const high = await scoreReserveAdequacy(TEST_ISO2, makeRecoveryReader({
      'resilience:recovery:reserve-adequacy:v1': { countries: { [TEST_ISO2]: { reserveMonths: 12 } } },
    }));
    assert.ok(high.score > low.score, `reserveMonths 2→12 should raise score; got ${low.score} → ${high.score}`);
  });
});
describe('resilience dimension monotonicity — scoreFiscalSpace', () => {
  const baseEntry = { govRevenuePct: 25, fiscalBalancePct: 0, debtToGdpPct: 60 };

  async function scoreWith(override: Partial<typeof baseEntry>) {
    return scoreFiscalSpace(TEST_ISO2, makeRecoveryReader({
      'resilience:recovery:fiscal-space:v1': { countries: { [TEST_ISO2]: { ...baseEntry, ...override } } },
    }));
  }

  it('higher govRevenuePct → higher score', async () => {
    const low = await scoreWith({ govRevenuePct: 10 });
    const high = await scoreWith({ govRevenuePct: 40 });
    assert.ok(high.score > low.score, `govRevenuePct 10→40 should raise score; got ${low.score} → ${high.score}`);
  });

  it('higher fiscalBalancePct → higher score', async () => {
    const low = await scoreWith({ fiscalBalancePct: -10 });
    const high = await scoreWith({ fiscalBalancePct: 3 });
    assert.ok(high.score > low.score, `fiscalBalancePct -10→3 should raise score; got ${low.score} → ${high.score}`);
  });

  it('higher debtToGdpPct → lower score', async () => {
    const low = await scoreWith({ debtToGdpPct: 40 });
    const high = await scoreWith({ debtToGdpPct: 140 });
    assert.ok(low.score > high.score, `debtToGdpPct 40→140 should lower score; got ${low.score} → ${high.score}`);
  });
});
describe('resilience dimension monotonicity — scoreExternalDebtCoverage', () => {
  async function scoreWith(ratio: number) {
    return scoreExternalDebtCoverage(TEST_ISO2, makeRecoveryReader({
      'resilience:recovery:external-debt:v1': { countries: { [TEST_ISO2]: { debtToReservesRatio: ratio } } },
    }));
  }

  it('higher debtToReservesRatio → lower score', async () => {
    // PR 3 §3.5 point 3: goalpost is now lower-better worst=2 best=0
    // (Greenspan-Guidotti anchor). Any ratio ≥ 2 clamps to 0, so pick
    // values inside the discriminating band to get a meaningful gradient.
    const good = await scoreWith(0.3);
    const bad = await scoreWith(1.5);
    assert.ok(good.score > bad.score, `debtToReservesRatio 0.3→1.5 should lower score; got ${good.score} → ${bad.score}`);
  });
});
describe('resilience dimension monotonicity — scoreImportConcentration', () => {
  async function scoreWith(hhi: number) {
    return scoreImportConcentration(TEST_ISO2, makeRecoveryReader({
      'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
    }));
  }

  it('higher hhi → lower score (more concentration = more exposure)', async () => {
    // HHI payload is on a 0..1 scale (normalised before storage).
    // 0.15 = diversified supplier base; 0.45 = concentrated.
    const diversified = await scoreWith(0.15);
    const concentrated = await scoreWith(0.45);
    assert.ok(diversified.score > concentrated.score, `hhi 0.15→0.45 should lower score; got ${diversified.score} → ${concentrated.score}`);
  });
});
describe('resilience dimension monotonicity — scoreGovernanceInstitutional', () => {
  async function scoreWith(wgiMeanValue: number) {
    // Static-record shape per `getStaticWgiValues`: `wgi.indicators.<name>.value`.
    const staticRecord = {
      wgi: {
        indicators: {
          voiceAccountability: { value: wgiMeanValue },
          politicalStability: { value: wgiMeanValue },
          governmentEffectiveness: { value: wgiMeanValue },
          regulatoryQuality: { value: wgiMeanValue },
          ruleOfLaw: { value: wgiMeanValue },
          controlOfCorruption: { value: wgiMeanValue },
        },
      },
    };
    return scoreGovernanceInstitutional(TEST_ISO2, makeStaticReader(staticRecord));
  }

  it('higher WGI mean → higher score', async () => {
    const weak = await scoreWith(-1.5);
    const strong = await scoreWith(1.5);
    assert.ok(strong.score > weak.score, `WGI -1.5→1.5 should raise score; got ${weak.score} → ${strong.score}`);
  });
});
describe('resilience dimension monotonicity — scoreFoodWater', () => {
  async function scoreWith(override: Record<string, unknown>) {
    const fao = { peopleInCrisis: 100, phase: 'Phase 1', ...override };
    const staticRecord = { fao, aquastat: { waterStress: { value: 40 }, waterAvailability: { value: 2000 } } };
    return scoreFoodWater(TEST_ISO2, makeStaticReader(staticRecord));
  }

  it('higher peopleInCrisis → lower score', async () => {
    const healthy = await scoreWith({ peopleInCrisis: 1000 });
    const crisis = await scoreWith({ peopleInCrisis: 5_000_000 });
    assert.ok(healthy.score > crisis.score, `peopleInCrisis 1k→5M should lower score; got ${healthy.score} → ${crisis.score}`);
  });

  it('higher IPC phase → lower score', async () => {
    const phase2 = await scoreWith({ phase: 'Phase 2' });
    const phase5 = await scoreWith({ phase: 'Phase 5' });
    assert.ok(phase2.score > phase5.score, `phase 2→5 should lower score; got ${phase2.score} → ${phase5.score}`);
  });
});
describe('resilience dimension monotonicity — scoreEnergy (current construct)', () => {
// NOTE: these tests pin the CURRENT scorer direction for each indicator.
// PR 1 §3.1-3.3 overturns three of them (electricityConsumption, gasShare,
// coalShare) — when PR 1 ships, those tests are REPLACED by tests for
// the new indicators (importedFossilDependence, lowCarbonGenerationShare).
// The failure of one of these tests in the meantime is a signal that a
// PR has accidentally altered the construct; PR 1 should update this
// file in the same commit that changes scoreEnergy.
function makeEnergyReader(overrides: {
staticRecord?: unknown;
mix?: unknown;
prices?: unknown;
storage?: unknown;
} = {}): ResilienceSeedReader {
const defaultStatic = {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
};
const defaultMix = { gasShare: 30, coalShare: 20, renewShare: 30 };
return async (key: string) => {
if (key === `resilience:static:${TEST_ISO2}`) return overrides.staticRecord ?? defaultStatic;
if (key === 'economic:energy:v1:all') return overrides.prices ?? null;
if (key === `energy:mix:v1:${TEST_ISO2}`) return overrides.mix ?? defaultMix;
if (key === `energy:gas-storage:v1:${TEST_ISO2}`) return overrides.storage ?? null;
return null;
};
}
it('higher import dependency → lower score', async () => {
const selfSufficient = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 10 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
},
}));
const dependent = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 90 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 3000 } } },
},
}));
assert.ok(selfSufficient.score > dependent.score, `import dep 10→90 should lower score; got ${selfSufficient.score} → ${dependent.score}`);
});
it('higher renewShare → higher score', async () => {
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 20, renewShare: 5 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 20, renewShare: 70 } }));
assert.ok(high.score > low.score, `renewShare 5→70 should raise score; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher gasShare → lower score (THIS CHANGES IN PR 1 — see plan §3.2)', async () => {
// Pins the current (v3-plan-condemned) behavior so PR 1 knows what
// it is replacing. When PR 1 ships the new importedFossilDependence
// composite, this test is REPLACED, not deleted — the replacement
// pins the new construct's direction.
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 10, coalShare: 20, renewShare: 30 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 70, coalShare: 20, renewShare: 30 } }));
assert.ok(low.score > high.score, `gasShare 10→70 should lower score under current construct; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher coalShare → lower score (THIS CHANGES IN PR 1 — see plan §3.2)', async () => {
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 10, renewShare: 30 } }));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({ mix: { gasShare: 30, coalShare: 70, renewShare: 30 } }));
assert.ok(low.score > high.score, `coalShare 10→70 should lower score under current construct; got ${low.score} → ${high.score}`);
});
it('CURRENT: higher electricityConsumption → higher score (THIS FAILS THE MECHANISM TEST — see plan §3.1)', async () => {
// This test PASSES today because the current scorer rewards
// per-capita electricity consumption. The v3 plan classifies
// electricityConsumption as a wealth-proxy that fails the mechanism
// test; PR 1 removes it. When PR 1 ships, this test is DELETED (not
// replaced), because the indicator no longer exists. The delete is
// the signal that the wealth-proxy concern is resolved.
const low = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 500 } } },
},
}));
const high = await scoreEnergy(TEST_ISO2, makeEnergyReader({
staticRecord: {
iea: { energyImportDependency: { value: 30 } },
infrastructure: { indicators: { 'EG.USE.ELEC.KH.PC': { value: 7500 } } },
},
}));
assert.ok(high.score > low.score, `electricityConsumption 500→7500 kWh/cap should raise score under current construct; got ${low.score} → ${high.score}`);
});
});