docs(resilience): PR 5.3 — foodWater scorer audit (construct-deterministic GCC identity) (#3374)

* docs(resilience): PR 5.3 — foodWater scorer audit (construct-deterministic GCC identity)

  PR 5.3 of cohort-audit plan 2026-04-24-002. Stacked on PR 5.2 (#3373) so the known-limitations.md section append is additive.

  Read-only static audit of scoreFoodWater.

  Findings

  1. The observed GCC-all-score-53 is CONSTRUCT-DETERMINISTIC, not a regional-default leak. Pinned mathematically:
     - IPC/HDX doesn't publish active food-crisis data for food-secure states → scorer's fao-null branch imputes IMPUTE.ipcFood=88 (class='stable-absence', cov=0.7) at combined weight 0.6
     - WB indicator ER.H2O.FWST.ZS (labelled 'water stress') for GCC is EXTREME (KW ~3200%, BH ~3400%, UAE ~2080%, QA ~770%); all clamp to sub-score 0 under the scorer's lower-better 0..100 normaliser at weight 0.4
     - Blended with peopleInCrisis=0 (fao block present with zero): (100 * 0.45 + 0 * 0.4) / (0.45 + 0.4) = 45 / 0.85 ≈ 53

     Every GCC country has the same inputs → same outputs. That's construct math, not a regional lookup.
  2. Indicator-keyword routing is code-correct. `'water stress'`, `'withdrawal'`, `'dependency'` route to lower-better; `'availability'`, `'renewable'`, `'access'` route to higher-better; unrecognized indicators fall through to a value-range heuristic with a WARN log.
  3. No bug or methodology decision required. The 53-all-GCC output is a correct summary statement: "non-crisis food security + severe water-withdrawal stress." A future construct decision might split foodWater into separate food and water dims so one saturated sub-signal doesn't dominate the combined dim for desert economies, but that's a construct redesign, not a bug.

  Shipped

  - `docs/methodology/known-limitations.md`: extended with a new section documenting the foodWater audit findings, the exact blend math that yields ~53 for GCC, cohort-determinism vs regional-default, and a follow-up data-side spot-check list gated on API-key access.
  - `tests/resilience-foodwater-field-mapping.test.mts`: 8 new regression-guard tests:
    1. indicator='water stress' routes to lower-better
    2. GCC extreme-withdrawal anchor (value=2000 → blended score 53)
    3. indicator='renewable water availability' routes to higher-better
    4. fao=null with static record → imputes 88; imputationClass=null because observed AQUASTAT wins (weightedBlend T1.7 rule)
    5. fully-imputed (fao=null + aquastat=null) surfaces imputationClass='stable-absence'
    6. static-record absent entirely → coverage=0, NOT impute
    7. Cohort determinism: identical inputs → identical scores
    8. Different water-profile inputs → different scores (rules out regional-default hypothesis)

  Verified

  - `npx tsx --test tests/resilience-foodwater-field-mapping.test.mts`: 8 pass / 0 fail
  - `npm run test:data`: 6711 pass / 0 fail (PR 5.2's 9 + PR 5.3's 8 = 17 new stacked)
  - `npm run typecheck` / `typecheck:api`: green
  - `npm run lint` / `lint:md`: clean

* fix(resilience): PR 5.3 review — pin IMPUTE branch for GCC anchor; fix comment math

  Addresses 3 P2 Greptile findings on #3374, all variations of the same root cause: the test fixture + doc described two different code paths that coincidentally both produce ~53 for GCC inputs.

  Changes

  1. GCC anchor test now drives the IMPUTE branch (`fao: null`), matching what the static seeder emits for GCC in production. The else branch (`fao: { peopleInCrisis: 0 }`) happens to converge on ~52.94 by coincidence but is NOT the live code path for GCC.
  2. Doc finding #4 updated to show the IMPUTE-branch math `(88×0.6 + 0×0.4) / 1.0 = 52.8 → 53` and explicitly notes the else-branch convergence as a coincidence, not the construct's intent.
  3. Comment math off-by-one fix at line 107:

       (88×0.6 + 80×0.4) / (0.6+0.4) = 52.8 + 32.0 = 84.8 → 85

     (was incorrectly stated as 85.6 → 86)

     Test assertion `>= 80 && <= 90` still accepts 85 so behaviour is unchanged; this was a comment-only error that would have misled anyone reproducing the math by hand.

  Verified

  - `npx tsx --test tests/resilience-foodwater-field-mapping.test.mts`: 8 pass / 0 fail (IMPUTE-branch anchor test produces 53 as expected)
  - `npm run lint:md`: clean

  Also rebased onto updated #3373 (which landed a backtick-escape fix).
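
The two blend formulas quoted in this commit message can be checked by hand with a small sketch. The `Slot` shape and the `blendSketch` helper are hypothetical names introduced here for illustration; the real `weightedBlend` is not shown in this commit and likely also tracks coverage and imputation class. The sketch assumes that a slot without a score contributes no weight, which is what the else-branch denominator of 0.85 implies.

```typescript
// Hypothetical slot shape for illustration only; not the scorer's real type.
interface Slot {
  score: number | null; // null = no data, so the slot contributes no weight
  weight: number;
}

// Weighted mean over the slots that actually carry a score.
function blendSketch(slots: Slot[]): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const slot of slots) {
    if (slot.score === null) continue;
    weighted += slot.score * slot.weight;
    totalWeight += slot.weight;
  }
  return totalWeight === 0 ? null : weighted / totalWeight;
}

// IMPUTE branch (fao: null): imputed IPC score 88 at weight 0.6,
// AQUASTAT sub-score clamped to 0 at weight 0.4.
const imputeBranch = blendSketch([
  { score: 88, weight: 0.6 },
  { score: 0, weight: 0.4 },
]);

// Else branch (peopleInCrisis: 0, phase: null): the phase slot has no score.
const elseBranch = blendSketch([
  { score: 100, weight: 0.45 },
  { score: null, weight: 0.15 },
  { score: 0, weight: 0.4 },
]);

console.log(imputeBranch?.toFixed(2), elseBranch?.toFixed(2)); // prints "52.80 52.94"
```

Both values round to 53, which is why the review asked for the anchor test to pin the IMPUTE branch explicitly rather than rely on the rounded number alone.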
2026-04-25 17:14:57 +02:00 · 2026-04-24 18:25:50 +04:00
parent 6807a9c7b9
commit a97ba83833
2 changed files with 321 additions and 0 deletions
--- a/docs/methodology/known-limitations.md
+++ b/docs/methodology/known-limitations.md
@@ -146,3 +146,124 @@ spot-check runs.
 - Plan reference:
  `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
  §PR 5.2
+
+---
+
+## foodWater scorer — construct-deterministic cohort identity (scoreFoodWater)
+
+**Dimension.** `foodWater` (weight 1.0 in the `health-food` domain
+aggregate). Reads from `resilience:static:<ISO2>` via
+`readStaticCountry`. Three weighted slots:
+
+| Slot | Source | Weight | Mapping |
+|---|---|---|---|
+| People in food crisis (log10) | `fao.peopleInCrisis` (HDX IPC/FSIN) | 0.45 | `normalizeLowerBetter(log10(max(1, n)), 0, 7)` |
+| IPC phase number | `fao.phase` → digit extracted | 0.15 | `normalizeLowerBetter(phase, 1, 5)` |
+| AQUASTAT water indicator | `aquastat.value` + `aquastat.indicator` (WB `ER.H2O.FWST.ZS`, labelled `'water stress'`) | 0.40 | `normalizeLowerBetter(value, 0, 100)` when indicator contains `stress`/`withdrawal`/`dependency`; `normalizeHigherBetter` when `availability`/`renewable`/`access` |
+
+**What the plan's concern was.** The cohort-audit plan
+observed that GCC countries all score ~53 on `foodWater` and
+asked whether this was a "mystery regional default" or genuine
+construct output.
+
+**Finding — it is genuine construct output.**
+
+1. IPC/HDX doesn't publish active food-crisis data for food-secure
+   states like the GCC. `scripts/seed-resilience-static.mjs` writes
+   `fao: null` (or omits the block) for those countries.
+2. The scorer's `fao == null` branch imputes `IMPUTE.ipcFood` =
+   `{ score: 88, certaintyCoverage: 0.7, imputationClass:
+   'stable-absence' }` (see `_dimension-scorers.ts` line 135) at
+   weight 0.6 for the combined peopleInCrisis+phase slot.
+3. AQUASTAT for the GCC is EXTREME. WB indicator `ER.H2O.FWST.ZS`
+   measures freshwater withdrawal as a proportion of available
+   freshwater resources. Desert economies with desalination
+   routinely exceed 100% (Kuwait ~3200%, Bahrain ~3400%, UAE
+   ~2080%, Qatar ~770%). Values > 100 clamp the sub-score to 0
+   under the lower-better normaliser against (0, 100).
+4. Under the `fao: null` branch (which is what the static seeder
+   emits for GCC in production) plus clamped AQUASTAT=0 at weight
+   0.4, the weighted blend is:
+
+   ```
+   weightedScore = (IMPUTE.ipcFood × 0.6 + 0 × 0.4) / (0.6 + 0.4)
+                 = (88 × 0.6) / 1.0
+                 = 52.8  → 53
+   ```
+
+   Pinned as an anchor test in
+   `tests/resilience-foodwater-field-mapping.test.mts`. Note that
+   an alternative scenario — `fao` present with `peopleInCrisis: 0`
+   and `phase: null` — converges on a near-identical 52.94 via the
+   else branch formula `(100×0.45 + 0×0.4) / 0.85`. That convergence
+   is a coincidence of the specific zero-peopleInCrisis input, NOT
+   the construct's intent — the test fixture is intentionally shaped
+   to exercise the IMPUTE path that matches production.
+
+**Why GCC scores are identical across the cohort.** GCC
+countries share:
+
+- Same IPC status (not monitored → same impute constant)
+- Same AQUASTAT indicator (`'water stress'`, WB's standard label)
+- Extreme and similarly-clamped withdrawal ratios (all > 100 →
+  all clamp to 0 on the AQUASTAT sub-score)
+
+Identical inputs → identical outputs. That is construct
+determinism, not a regional-default lookup. Pinned with a
+synthetic two-country test: identical input shapes produce
+identical scores; different water profiles produce different
+scores.
+
+**Regression-guard tests** in
+`tests/resilience-foodwater-field-mapping.test.mts`:
+
+- Indicator routing: `'water stress'` → lower-better;
+  `'renewable water availability'` → higher-better.
+- GCC extreme-withdrawal anchor: AQUASTAT value=2000 +
+  `fao: null` (IMPUTE branch, matching production) blends to
+  52.8, which rounds to 53: `(88×0.6 + 0×0.4) / 1.0 = 52.8 → 53`.
+- IPC-absent with static record present: imputes
+  `ipcFood=88`; observed AQUASTAT wins →
+  `imputationClass=null` per weightedBlend's T1.7 rule.
+- Fully-imputed (FAO missing AND AQUASTAT missing): surfaces
+  `imputationClass='stable-absence'`.
+- Fully-absent static record (seeder never ran): returns
+  coverage=0, NOT an impute.
+- Cohort determinism: identical inputs → identical scores;
+  different water-profile inputs → different scores.
+
+**Implication — no fix required.** The scorer is producing the
+construct it's specified to produce. The observed GCC identity
+is a correct summary statement: "non-crisis food security +
+severe water-withdrawal stress." A future construct decision
+might split `foodWater` into food and water sub-dims so the
+water-stress signal doesn't saturate the combined dim across
+desert economies — but that is a construct redesign, not a
+bug fix.
+
+**Follow-up data-side spot-check (requires API key / Redis
+access; not in scope of this PR).** Pull raw AQUASTAT + FAO
+inputs for GCC + IL + JO (similar water-stressed region) and
+verify the seeder-written values against WB's live API
+response. If a GCC country's WB value differs substantially
+from the figures above, the seeder may have a stale-year
+picker bug — unlikely given `seed-resilience-static.mjs` uses
+`mrv=15` + `selectLatestWorldBankByCountry`, but worth
+verifying.
+
+**References.**
+
+- Seeder: `scripts/seed-resilience-static.mjs` lines 658-680
+  (`WB_WATER_STRESS_INDICATOR`, `fetchAquastatDataset`,
+  `buildAquastatWbMap`)
+- Scorer reads:
+  `server/worldmonitor/resilience/v1/_dimension-scorers.ts`
+  lines 895 (`scoreAquastatValue`), 1471 (`scoreFoodWater`),
+  135 (`IMPUTE.ipcFood` constant)
+- WB indicator docs:
+  https://data.worldbank.org/indicator/ER.H2O.FWST.ZS
+- Plan reference:
+  `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
+  §PR 5.3
+- Test regression guards:
+  `tests/resilience-foodwater-field-mapping.test.mts`
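
The clamping and keyword routing described in the section above can be sketched as follows. This is an illustrative re-implementation, not the code in `_dimension-scorers.ts`: the `Sketch`-suffixed names are hypothetical, the linear interpolation between anchors is an assumption (chosen because it reproduces the sub-scores quoted in this commit: value 20 gives 80, anything at or above 100 gives 0), and the order in which keyword groups are checked is also assumed.

```typescript
// Assumed linear lower-better normaliser: `best` maps to 100, `worst` maps to 0,
// and anything outside the anchors is clamped into [0, 100].
function normalizeLowerBetterSketch(value: number, best: number, worst: number): number {
  const raw = ((worst - value) / (worst - best)) * 100;
  return Math.min(100, Math.max(0, raw));
}

type Direction = 'lower-better' | 'higher-better' | 'unrecognized';

// Keyword routing as listed in the slot table; precedence between groups is assumed.
function routeIndicatorSketch(indicator: string): Direction {
  const label = indicator.toLowerCase();
  if (['stress', 'withdrawal', 'dependency'].some((k) => label.includes(k))) {
    return 'lower-better';
  }
  if (['availability', 'renewable', 'access'].some((k) => label.includes(k))) {
    return 'higher-better';
  }
  return 'unrecognized'; // the real scorer falls back to a value-range heuristic here
}

// Moderate water stress keeps most of the sub-score.
const moderateStress = normalizeLowerBetterSketch(20, 0, 100);
// A GCC-style extreme value is clamped to the floor.
const gccStress = normalizeLowerBetterSketch(2000, 0, 100);
// Zero people in crisis: log10(max(1, 0)) = 0, the best anchor of the (0, 7) range.
const noCrisis = normalizeLowerBetterSketch(Math.log10(Math.max(1, 0)), 0, 7);

console.log(routeIndicatorSketch('water stress'), moderateStress, gccStress, noCrisis);
```

Because every value above the worst anchor collapses to the same sub-score, Kuwait at ~3200% and Qatar at ~770% are indistinguishable after normalisation, which is the saturation effect the section describes.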
--- a/tests/resilience-foodwater-field-mapping.test.mts
+++ b/tests/resilience-foodwater-field-mapping.test.mts
@@ -0,0 +1,200 @@
+// Regression guard for scoreFoodWater — inputs, branching, and the
+// "identical cohort scores are construct-deterministic, not regional-
+// default leaks" invariant that PR 5.3 of plan 2026-04-24-002 sets
+// out to establish.
+//
+// Context. The plan flagged that all GCC countries score ~53 on
+// `foodWater` and asked: is that a mystery regional default or is it
+// a genuine construct output? This test suite pins the answer:
+// identical inputs produce identical outputs, and the inputs are
+// themselves explicable — IPC does not monitor rich food-secure
+// states (impute 88) and AQUASTAT/WB water-stress values for the
+// GCC are EXTREME (freshwater withdrawal > 100% of renewable
+// resources), which clamps the AQUASTAT sub-score to 0. The blend
+// of IMPUTE.ipcFood=88 with AQUASTAT=0 under the documented weights
+// produces a near-identical score across countries with the same
+// water-stress profile. That's the construct working — not a bug.
+//
+// See docs/methodology/known-limitations.md for the full write-up.
+
+import assert from 'node:assert/strict';
+import { describe, it } from 'node:test';
+
+import {
+  scoreFoodWater,
+  type ResilienceSeedReader,
+} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';
+
+const TEST_ISO2 = 'XX';
+
+function makeStaticReader(staticRecord: unknown): ResilienceSeedReader {
+  return async (key: string) => (key === `resilience:static:${TEST_ISO2}` ? staticRecord : null);
+}
+
+describe('scoreAquastatValue — indicator-keyword routing contract', () => {
+  it('indicator="water stress" routes to lower-better (higher withdrawal = worse score)', async () => {
+    // WB indicator ER.H2O.FWST.ZS: "Level of water stress: freshwater
+    // withdrawal as a proportion of available freshwater resources."
+    // Seeded as `indicator: 'water stress'`. Containing "stress" →
+    // `normalizeLowerBetter(value, 0, 100)` per _dimension-scorers.ts:899.
+    const low = makeStaticReader({
+      aquastat: { value: 10, indicator: 'water stress' },
+      fao: { peopleInCrisis: 0, phase: null },
+    });
+    const high = makeStaticReader({
+      aquastat: { value: 80, indicator: 'water stress' },
+      fao: { peopleInCrisis: 0, phase: null },
+    });
+    const [lowScore, highScore] = await Promise.all([
+      scoreFoodWater(TEST_ISO2, low),
+      scoreFoodWater(TEST_ISO2, high),
+    ]);
+    assert.ok(lowScore.score > highScore.score,
+      `higher water-stress must LOWER the score; got low=${lowScore.score}, high=${highScore.score}`);
+  });
+
+  it('AQUASTAT value > 100 clamps to 0 (the GCC extreme-withdrawal case)', async () => {
+    // GCC freshwater withdrawal % of renewable resources is well over
+    // 100% (KW ~3200, BH ~3400, AE ~2080, QA ~770). The normaliser
+    // clamps anything past the "worst" anchor to 0. Test with 2000,
+    // which comfortably exceeds the 100 anchor.
+    //
+    // IMPORTANT: drive the IMPUTE branch (`fao: null`), not the else
+    // branch. In production the static seeder writes `fao: null` for
+    // GCC (IPC/HDX does not monitor food-secure states), so the live
+    // blend uses the impute path with weights 0.6 (IPC impute=88) +
+    // 0.4 (AQUASTAT). The else branch (fao-present, peopleInCrisis=0)
+    // happens to converge on a near-identical number at these inputs
+    // by coincidence, but testing the wrong branch would let a future
+    // impute-branch regression slip through.
+    const reader = makeStaticReader({
+      aquastat: { value: 2000, indicator: 'water stress' },
+      fao: null, // fao==null branch — matches the GCC production shape
+    });
+    const result = await scoreFoodWater(TEST_ISO2, reader);
+    // Blend under the IMPUTE branch:
+    //   { score: 88, weight: 0.6, cov: 0.7, imputed }  // IMPUTE.ipcFood
+    //   { score: 0,  weight: 0.4, cov: 1.0, observed }  // AQUASTAT clamped
+    //   weightedScore = (88*0.6 + 0*0.4) / (0.6+0.4) = 52.8 → 53.
+    // This is EXACTLY the observed GCC cohort score. Pinning it.
+    assert.equal(Math.round(result.score), 53,
+      `GCC water-stress profile must yield ~53 on the IMPUTE branch; got ${result.score}`);
+  });
+
+  it('indicator="renewable water availability" routes to higher-better (more = better)', async () => {
+    const scarce = makeStaticReader({
+      aquastat: { value: 500, indicator: 'renewable water availability' },
+      fao: { peopleInCrisis: 0, phase: null },
+    });
+    const abundant = makeStaticReader({
+      aquastat: { value: 4500, indicator: 'renewable water availability' },
+      fao: { peopleInCrisis: 0, phase: null },
+    });
+    const [scarceScore, abundantScore] = await Promise.all([
+      scoreFoodWater(TEST_ISO2, scarce),
+      scoreFoodWater(TEST_ISO2, abundant),
+    ]);
+    assert.ok(abundantScore.score > scarceScore.score,
+      `more renewable water = higher score; got scarce=${scarceScore.score}, abundant=${abundantScore.score}`);
+  });
+});
+
+describe('scoreFoodWater — IPC absence path (stable-absence imputation)', () => {
+  it('country not in IPC/HDX (fao=null) imputes ipcFood=88 when static record present', async () => {
+    const reader = makeStaticReader({
+      aquastat: { value: 20, indicator: 'water stress' },
+      fao: null, // crisis_monitoring_absent — food-secure country, not a monitored crisis
+    });
+    const result = await scoreFoodWater(TEST_ISO2, reader);
+    // Blend: {score:88, weight:0.6, cov:0.7, imputed} + {score:80, weight:0.4, cov:1.0, observed}
+    //   weightedScore = (88*0.6 + 80*0.4) / (0.6+0.4) = 52.8 + 32.0 = 84.8 → 85
+    // Pinning the blended range to catch a formula regression.
+    assert.ok(result.score >= 80 && result.score <= 90,
+      `IPC-absent + moderate aquastat must blend to 80-90; got ${result.score}`);
+    // Per weightedBlend's T1.7 rule (line 601 of _dimension-scorers.ts):
+    // `imputationClass` surfaces ONLY when observedWeight === 0. Here
+    // AQUASTAT is observed data (score=80), so it "wins" and the final
+    // imputationClass is null. The IPC-impute is still reflected in
+    // `imputedWeight` and the lower coverage (70% for IPC * 0.6 + 100%
+    // for AQUASTAT * 0.4 = 82% weighted).
+    assert.equal(result.imputationClass, null,
+      'mixed observed+imputed → imputationClass=null (observed wins); IPC impute reflected in imputedWeight');
+    assert.ok(result.imputedWeight > 0, 'IPC slot must be counted as imputed');
+    assert.ok(result.observedWeight > 0, 'AQUASTAT slot must be counted as observed');
+  });
+
+  it('country fully imputed (fao=null AND aquastat absent) surfaces imputationClass=stable-absence', async () => {
+    // This is the scenario where `imputationClass` actually surfaces:
+    // AQUASTAT missing (null score, no impute) → contributes no weight
+    // AT ALL to observedWeight or imputedWeight. The remaining IPC
+    // slot is fully imputed, so the dimension is fully imputed and
+    // weightedBlend picks the dominant (only) class.
+    const reader = makeStaticReader({
+      aquastat: null, // AQUASTAT data missing entirely
+      fao: null,      // IPC not monitoring
+    });
+    const result = await scoreFoodWater(TEST_ISO2, reader);
+    assert.equal(result.imputationClass, 'stable-absence',
+      'fully-imputed dim must surface stable-absence class');
+    // Single imputed slot {score:88, weight:0.6, cov:0.7}:
+    //   weightedScore = 88 * 0.6 / 0.6 = 88 (only this slot has a score)
+    //   weightedCertainty = 0.7 * 0.6 / (0.6+0.4) = 0.42 (certainty-weighted sum over total slot weight)
+    assert.ok(result.score >= 85 && result.score <= 92,
+      `fully-imputed must score ~88; got ${result.score}`);
+  });
+
+  it('static-record absent entirely (seeder never ran) does NOT impute — returns null branch', async () => {
+    // Per the scorer comment at line 1482 of _dimension-scorers.ts:
+    // "A missing resilience:static:{ISO2} key means the seeder never
+    // ran — not crisis-free." So this path returns weight-null for
+    // the IPC slot, not an IMPUTE. The AQUASTAT slot is also null
+    // because scoreAquastatValue(null)=null. Result: zero-signal.
+    const reader = makeStaticReader(null);
+    const result = await scoreFoodWater(TEST_ISO2, reader);
+    // weightedBlend with two null-score slots returns coverage=0.
+    assert.equal(result.coverage, 0,
+      'fully-absent static record must produce coverage=0 (no impute-as-if-safe)');
+  });
+});
+
+describe('scoreFoodWater — cohort determinism (not a regional-default leak)', () => {
+  it('two countries with identical inputs produce identical scores (construct-deterministic)', async () => {
+    // Two synthetic "GCC-shaped" countries: extreme water stress
+    // (WB value > 100 clamps AQUASTAT to 0) + IPC-absent. Same
+    // inputs → same output is the CORRECT behavior. An identical
+    // score across a cohort is a construct signal, NOT evidence of
+    // a hardcoded regional default.
+    const gccShape = {
+      aquastat: { value: 2500, indicator: 'water stress' },
+      fao: null,
+    };
+    const [a, b] = await Promise.all([
+      scoreFoodWater('AE', async (k) => (k === 'resilience:static:AE' ? gccShape : null)),
+      scoreFoodWater('KW', async (k) => (k === 'resilience:static:KW' ? gccShape : null)),
+    ]);
+    assert.equal(a.score, b.score,
+      'identical inputs must produce identical scores — the observed GCC cohort identity is construct-deterministic');
+    assert.equal(a.coverage, b.coverage, 'and identical coverage');
+  });
+
+  it('a different water-profile input produces a different score (rules out the regional-default hypothesis)', async () => {
+    // Same IPC-absent status, but different AQUASTAT indicator /
+    // value: a high-renewable country. If foodWater were using a
+    // hardcoded regional default, this country would score the
+    // same as the water-stress case above. It must not.
+    const waterStressShape = {
+      aquastat: { value: 2500, indicator: 'water stress' },
+      fao: null,
+    };
+    const waterAbundantShape = {
+      aquastat: { value: 8000, indicator: 'renewable water availability' },
+      fao: null,
+    };
+    const [stressed, abundant] = await Promise.all([
+      scoreFoodWater('AE', async (k) => (k === 'resilience:static:AE' ? waterStressShape : null)),
+      scoreFoodWater('IS', async (k) => (k === 'resilience:static:IS' ? waterAbundantShape : null)),
+    ]);
+    assert.ok(abundant.score > stressed.score,
+      `water-abundant country must outscore water-stressed; got stressed=${stressed.score}, abundant=${abundant.score} — a regional default would have tied these`);
+  });
+});
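
The coverage figures quoted in the test comments above (82% for the mixed case, 0.42 for the fully imputed case, 0 when the static record is absent) follow from one formula, sketched here under stated assumptions. `CertaintySlot` and `certaintySketch` are hypothetical names; the sketch assumes certainty is summed over scored slots and divided by the total weight of all slots, as the comments describe, and it does not claim to match how the real `weightedBlend` populates `coverage`.

```typescript
// Hypothetical slot shape for illustration only.
interface CertaintySlot {
  score: number | null; // null = slot has no data
  weight: number;
  certaintyCoverage: number; // 1.0 for observed data, lower for imputed values
}

// Certainty-weighted sum over scored slots, divided by the weight of ALL slots,
// so a missing slot drags coverage down instead of being ignored.
function certaintySketch(slots: CertaintySlot[]): number {
  const totalWeight = slots.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight === 0) return 0;
  const covered = slots.reduce(
    (sum, s) => (s.score === null ? sum : sum + s.certaintyCoverage * s.weight),
    0,
  );
  return covered / totalWeight;
}

// IPC imputed (cov 0.7) + AQUASTAT observed (cov 1.0).
const mixedCoverage = certaintySketch([
  { score: 88, weight: 0.6, certaintyCoverage: 0.7 },
  { score: 80, weight: 0.4, certaintyCoverage: 1.0 },
]);

// IPC imputed, AQUASTAT missing entirely.
const imputedOnlyCoverage = certaintySketch([
  { score: 88, weight: 0.6, certaintyCoverage: 0.7 },
  { score: null, weight: 0.4, certaintyCoverage: 0 },
]);

// Static record absent: neither slot has a score.
const absentCoverage = certaintySketch([
  { score: null, weight: 0.6, certaintyCoverage: 0 },
  { score: null, weight: 0.4, certaintyCoverage: 0 },
]);

console.log(mixedCoverage.toFixed(2), imputedOnlyCoverage.toFixed(2), absentCoverage);
```

Dividing by the total slot weight rather than the scored weight is what lets coverage distinguish "imputed but present" from "missing", even when the blended score itself ignores the missing slot.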