diff --git a/docs/methodology/known-limitations.md b/docs/methodology/known-limitations.md index 08f7a823b..3b19a2b81 100644 --- a/docs/methodology/known-limitations.md +++ b/docs/methodology/known-limitations.md @@ -146,3 +146,124 @@ spot-check runs. - Plan reference: `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md` §PR 5.2 + +--- + +## foodWater scorer — construct-deterministic cohort identity (scoreFoodWater) + +**Dimension.** `foodWater` (weight 1.0 in the `health-food` domain +aggregate). Reads from `resilience:static:` via +`readStaticCountry`. Three weighted slots: + +| Slot | Source | Weight | Mapping | +|---|---|---|---| +| People in food crisis (log10) | `fao.peopleInCrisis` (HDX IPC/FSIN) | 0.45 | `normalizeLowerBetter(log10(max(1, n)), 0, 7)` | +| IPC phase number | `fao.phase` → digit extracted | 0.15 | `normalizeLowerBetter(phase, 1, 5)` | +| AQUASTAT water indicator | `aquastat.value` + `aquastat.indicator` (WB `ER.H2O.FWST.ZS`, labelled `'water stress'`) | 0.40 | `normalizeLowerBetter(value, 0, 100)` when indicator contains `stress`/`withdrawal`/`dependency`; `normalizeHigherBetter` when `availability`/`renewable`/`access` | + +**What the plan's predecessor concern was.** The cohort-audit plan +observed that GCC countries all score ~53 on `foodWater` and +asked whether this was a "mystery regional default" or genuine +construct output. + +**Finding — it is genuine construct output.** + +1. IPC/HDX doesn't publish active food-crisis data for food-secure + states like the GCC. `scripts/seed-resilience-static.mjs` writes + `fao: null` (or omits the block) for those countries. +2. The scorer's `fao == null` branch imputes `IMPUTE.ipcFood` = + `{ score: 88, certaintyCoverage: 0.7, imputationClass: + 'stable-absence' }` (see `_dimension-scorers.ts` line 135) at + weight 0.6 for the combined peopleInCrisis+phase slot. +3. AQUASTAT for the GCC is EXTREME. WB indicator `ER.H2O.FWST.ZS` + measures freshwater withdrawal as a % of internal renewable + resources. Desert economies with desalination routinely exceed + 100% (Kuwait ~3200%, Bahrain ~3400%, UAE ~2080%, Qatar ~770%). + Values > 100 clamp the sub-score to 0 under the lower-better + normaliser against (0, 100). +4. Under the `fao: null` branch (which is what the static seeder + emits for GCC in production) plus clamped AQUASTAT=0 at weight + 0.4, the weighted blend is: + + ``` + weightedScore = (IMPUTE.ipcFood × 0.6 + 0 × 0.4) / (0.6 + 0.4) + = (88 × 0.6) / 1.0 + = 52.8 → 53 + ``` + + Pinned as an anchor test in + `tests/resilience-foodwater-field-mapping.test.mts`. Note that + an alternative scenario — `fao` present with `peopleInCrisis: 0` + and `phase: null` — converges on a near-identical 52.94 via the + else branch formula `(100×0.45 + 0×0.4) / 0.85`. That convergence + is a coincidence of the specific zero-peopleInCrisis input, NOT + the construct's intent — the test fixture is intentionally shaped + to exercise the IMPUTE path that matches production. + +**Why GCC scores are identical across the cohort.** GCC +countries share: + +- Same IPC status (not monitored → same impute constant) +- Same AQUASTAT indicator (`'water stress'`, WB's standard label) +- Extreme and similarly-clamped withdrawal ratios (all > 100 → + all clamp to 0 on the AQUASTAT sub-score) + +Identical inputs → identical outputs. That is construct +determinism, not a regional-default lookup. Pinned with a +synthetic two-country test: identical input shapes produce +identical scores; different water profiles produce different +scores. + +**Regression-guard tests** in +`tests/resilience-foodwater-field-mapping.test.mts`: + +- Indicator routing: `'water stress'` → lower-better; + `'renewable water availability'` → higher-better. +- GCC extreme-withdrawal anchor: AQUASTAT value=2000 + + `fao: null` (IMPUTE branch, matching production) blends to + exactly 53 via `(88×0.6 + 0×0.4) / 1.0 = 52.8 → 53`. +- IPC-absent with static record present: imputes + `ipcFood=88`; observed AQUASTAT wins → + `imputationClass=null` per weightedBlend's T1.7 rule. +- Fully-imputed (FAO missing AND AQUASTAT missing): surfaces + `imputationClass='stable-absence'`. +- Fully-absent static record (seeder never ran): returns + coverage=0, NOT an impute. +- Cohort determinism: identical inputs → identical scores; + different water-profile inputs → different scores. + +**Implication — no fix required.** The scorer is producing the +construct it's specified to produce. The observed GCC identity +is a correct summary statement: "non-crisis food security + +severe water-withdrawal stress." A future construct decision +might split `foodWater` into food and water sub-dims so the +water-stress signal doesn't saturate the combined dim across +desert economies — but that is a construct redesign, not a +bug fix. + +**Follow-up data-side spot-check (requires API key / Redis +access; not in scope of this PR).** Pull raw AQUASTAT + FAO +inputs for GCC + IL + JO (similar water-stressed region) and +verify the seeder-written values against WB's live API +response. If a GCC country's WB value differs substantially +from the figures above, the seeder may have a stale-year +picker bug — unlikely given `seed-resilience-static.mjs` uses +`mrv=15` + `selectLatestWorldBankByCountry`, but worth +verifying. + +**References.** + +- Seeder: `scripts/seed-resilience-static.mjs` lines 658-680 + (`WB_WATER_STRESS_INDICATOR`, `fetchAquastatDataset`, + `buildAquastatWbMap`) +- Scorer reads: + `server/worldmonitor/resilience/v1/_dimension-scorers.ts` + lines 895 (`scoreAquastatValue`), 1471 (`scoreFoodWater`), + 135 (`IMPUTE.ipcFood` constant) +- WB indicator docs: + https://data.worldbank.org/indicator/ER.H2O.FWST.ZS +- Plan reference: + `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md` + §PR 5.3 +- Test regression guards: + `tests/resilience-foodwater-field-mapping.test.mts` diff --git a/tests/resilience-foodwater-field-mapping.test.mts b/tests/resilience-foodwater-field-mapping.test.mts new file mode 100644 index 000000000..86e0378cd --- /dev/null +++ b/tests/resilience-foodwater-field-mapping.test.mts @@ -0,0 +1,200 @@ +// Regression guard for scoreFoodWater — inputs, branching, and the +// "identical cohort scores are construct-deterministic, not regional- +// default leaks" invariant that PR 5.3 of plan 2026-04-24-002 sets +// out to establish. +// +// Context. The plan flagged that all GCC countries score ~53 on +// `foodWater` and asked: is that a mystery regional default or is it +// a genuine construct output? This test suite pins the answer: +// identical inputs produce identical outputs, and the inputs are +// themselves explicable — IPC does not monitor rich food-secure +// states (impute 88) and AQUASTAT/WB water-stress values for the +// GCC are EXTREME (freshwater withdrawal > 100% of renewable +// resources), which clamps the AQUASTAT sub-score to 0. The blend +// of IMPUTE.ipcFood=88 with AQUASTAT=0 under the documented weights +// produces a near-identical score across countries with the same +// water-stress profile. That's the construct working — not a bug. +// +// See docs/methodology/known-limitations.md for the full write-up. + +import assert from 'node:assert/strict'; +import { describe, it } from 'node:test'; + +import { + scoreFoodWater, + type ResilienceSeedReader, +} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts'; + +const TEST_ISO2 = 'XX'; + +function makeStaticReader(staticRecord: unknown): ResilienceSeedReader { + return async (key: string) => (key === `resilience:static:${TEST_ISO2}` ? staticRecord : null); +} + +describe('scoreAquastatValue — indicator-keyword routing contract', () => { + it('indicator="water stress" routes to lower-better (higher withdrawal = worse score)', async () => { + // WB indicator ER.H2O.FWST.ZS: "Level of water stress: freshwater + // withdrawal as a proportion of available freshwater resources." + // Seeded as `indicator: 'water stress'`. Containing "stress" → + // `normalizeLowerBetter(value, 0, 100)` per _dimension-scorers.ts:899. + const low = makeStaticReader({ + aquastat: { value: 10, indicator: 'water stress' }, + fao: { peopleInCrisis: 0, phase: null }, + }); + const high = makeStaticReader({ + aquastat: { value: 80, indicator: 'water stress' }, + fao: { peopleInCrisis: 0, phase: null }, + }); + const [lowScore, highScore] = await Promise.all([ + scoreFoodWater(TEST_ISO2, low), + scoreFoodWater(TEST_ISO2, high), + ]); + assert.ok(lowScore.score > highScore.score, + `higher water-stress must LOWER the score; got low=${lowScore.score}, high=${highScore.score}`); + }); + + it('AQUASTAT value > 100 clamps to 0 (the GCC extreme-withdrawal case)', async () => { + // GCC freshwater withdrawal % of renewable resources is well over + // 100% (KW ~3200, BH ~3400, AE ~2080, QA ~770). The normaliser + // clamps anything past the "worst" anchor to 0. Test with 2000, + // which comfortably exceeds the 100 anchor. + // + // IMPORTANT: drive the IMPUTE branch (`fao: null`), not the else + // branch. In production the static seeder writes `fao: null` for + // GCC (IPC/HDX does not monitor food-secure states), so the live + // blend uses the impute path with weights 0.6 (IPC impute=88) + + // 0.4 (AQUASTAT). The else branch (fao-present, peopleInCrisis=0) + // happens to converge on a near-identical number at these inputs + // by coincidence, but testing the wrong branch would let a future + // impute-branch regression slip through. + const reader = makeStaticReader({ + aquastat: { value: 2000, indicator: 'water stress' }, + fao: null, // fao==null branch — matches the GCC production shape + }); + const result = await scoreFoodWater(TEST_ISO2, reader); + // Blend under the IMPUTE branch: + // { score: 88, weight: 0.6, cov: 0.7, imputed } // IMPUTE.ipcFood + // { score: 0, weight: 0.4, cov: 1.0, observed } // AQUASTAT clamped + // weightedScore = (88*0.6 + 0*0.4) / (0.6+0.4) = 52.8 → 53. + // This is EXACTLY the observed GCC cohort score. Pinning it. + assert.equal(Math.round(result.score), 53, + `GCC water-stress profile must yield ~53 on the IMPUTE branch; got ${result.score}`); + }); + + it('indicator="renewable water availability" routes to higher-better (more = better)', async () => { + const scarce = makeStaticReader({ + aquastat: { value: 500, indicator: 'renewable water availability' }, + fao: { peopleInCrisis: 0, phase: null }, + }); + const abundant = makeStaticReader({ + aquastat: { value: 4500, indicator: 'renewable water availability' }, + fao: { peopleInCrisis: 0, phase: null }, + }); + const [scarceScore, abundantScore] = await Promise.all([ + scoreFoodWater(TEST_ISO2, scarce), + scoreFoodWater(TEST_ISO2, abundant), + ]); + assert.ok(abundantScore.score > scarceScore.score, + `more renewable water = higher score; got scarce=${scarceScore.score}, abundant=${abundantScore.score}`); + }); +}); + +describe('scoreFoodWater — IPC absence path (stable-absence imputation)', () => { + it('country not in IPC/HDX (fao=null) imputes ipcFood=88 when static record present', async () => { + const reader = makeStaticReader({ + aquastat: { value: 20, indicator: 'water stress' }, + fao: null, // crisis_monitoring_absent — food-secure country, not a monitored crisis + }); + const result = await scoreFoodWater(TEST_ISO2, reader); + // Blend: {score:88, weight:0.6, cov:0.7, imputed} + {score:80, weight:0.4, cov:1.0, observed} + // weightedScore = (88*0.6 + 80*0.4) / (0.6+0.4) = 52.8 + 32.0 = 84.8 → 85 + // Pinning the blended range to catch a formula regression. + assert.ok(result.score >= 80 && result.score <= 90, + `IPC-absent + moderate aquastat must blend to 80-90; got ${result.score}`); + // Per weightedBlend's T1.7 rule (line 601 of _dimension-scorers.ts): + // `imputationClass` surfaces ONLY when observedWeight === 0. Here + // AQUASTAT is observed data (score=80), so it "wins" and the final + // imputationClass is null. The IPC-impute is still reflected in + // `imputedWeight` and the lower coverage (70% for IPC * 0.6 + 100% + // for AQUASTAT * 0.4 = 82% weighted). + assert.equal(result.imputationClass, null, + 'mixed observed+imputed → imputationClass=null (observed wins); IPC impute reflected in imputedWeight'); + assert.ok(result.imputedWeight > 0, 'IPC slot must be counted as imputed'); + assert.ok(result.observedWeight > 0, 'AQUASTAT slot must be counted as observed'); + }); + + it('country fully imputed (fao=null AND aquastat absent) surfaces imputationClass=stable-absence', async () => { + // This is the scenario where `imputationClass` actually surfaces: + // AQUASTAT missing (null score, no impute) → contributes no weight + // AT ALL to observedWeight or imputedWeight. The remaining IPC + // slot is fully imputed, so the dimension is fully imputed and + // weightedBlend picks the dominant (only) class. + const reader = makeStaticReader({ + aquastat: null, // AQUASTAT data missing entirely + fao: null, // IPC not monitoring + }); + const result = await scoreFoodWater(TEST_ISO2, reader); + assert.equal(result.imputationClass, 'stable-absence', + 'fully-imputed dim must surface stable-absence class'); + // Single imputed slot {score:88, weight:0.6, cov:0.7}: + // weightedScore = 88 * 0.6 / 0.6 = 88 (only this slot has a score) + // weightedCertainty = 0.7 * 0.6 / (0.6+0.4) = 0.42 total weighted / total + assert.ok(result.score >= 85 && result.score <= 92, + `fully-imputed must score ~88; got ${result.score}`); + }); + + it('static-record absent entirely (seeder never ran) does NOT impute — returns null branch', async () => { + // Per the scorer comment at line 1482 of _dimension-scorers.ts: + // "A missing resilience:static:{ISO2} key means the seeder never + // ran — not crisis-free." So this path returns weight-null for + // the IPC slot, not an IMPUTE. The AQUASTAT slot is also null + // because scoreAquastatValue(null)=null. Result: zero-signal. + const reader = makeStaticReader(null); + const result = await scoreFoodWater(TEST_ISO2, reader); + // weightedBlend with two null-score slots returns coverage=0. + assert.equal(result.coverage, 0, + 'fully-absent static record must produce coverage=0 (no impute-as-if-safe)'); + }); +}); + +describe('scoreFoodWater — cohort determinism (not a regional-default leak)', () => { + it('two countries with identical inputs produce identical scores (construct-deterministic)', async () => { + // Two synthetic "GCC-shaped" countries: extreme water stress + // (WB value > 100 clamps AQUASTAT to 0) + IPC-absent. Same + // inputs → same output is the CORRECT behavior. An identical + // score across a cohort is a construct signal, NOT evidence of + // a hardcoded regional default. + const gccShape = { + aquastat: { value: 2500, indicator: 'water stress' }, + fao: null, + }; + const [a, b] = await Promise.all([ + scoreFoodWater('AE', async (k) => (k === 'resilience:static:AE' ? gccShape : null)), + scoreFoodWater('KW', async (k) => (k === 'resilience:static:KW' ? gccShape : null)), + ]); + assert.equal(a.score, b.score, + 'identical inputs must produce identical scores — the observed GCC cohort identity is construct-deterministic'); + assert.equal(a.coverage, b.coverage, 'and identical coverage'); + }); + + it('a different water-profile input produces a different score (rules out the regional-default hypothesis)', async () => { + // Same IPC-absent status, but different AQUASTAT indicator / + // value: a high-renewable country. If foodWater were using a + // hardcoded regional default, this country would score the + // same as the water-stress case above. It must not. + const waterStressShape = { + aquastat: { value: 2500, indicator: 'water stress' }, + fao: null, + }; + const waterAbundantShape = { + aquastat: { value: 8000, indicator: 'renewable water availability' }, + fao: null, + }; + const [stressed, abundant] = await Promise.all([ + scoreFoodWater('AE', async (k) => (k === 'resilience:static:AE' ? waterStressShape : null)), + scoreFoodWater('IS', async (k) => (k === 'resilience:static:IS' ? waterAbundantShape : null)), + ]); + assert.ok(abundant.score > stressed.score, + `water-abundant country must outscore water-stressed; got stressed=${stressed.score}, abundant=${abundant.score} — a regional default would have tied these`); + }); +});