worldmonitor

eliott/worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Commit Graph

Author SHA1 Message Date


Elie Habib

abdcdb581f

feat(resilience): SWF manifest expansion + KIA split + new schema fields (#3391)

* feat(resilience): SWF manifest expansion + KIA split + new schema fields

Phase 1 of plan 2026-04-25-001 (Codex-approved round 5). Manifest-only
data correction; no construct change, no cache prefix bump.

Schema additions (loader-validated, misplacement-rejected):
- top-level: aum_usd, aum_year, aum_verified (primary-source AUM)
- under classification: aum_pct_of_audited (fraction multiplier),
  excluded_overlaps_with_reserves (boolean; documentation-only)

Manifest expansion (13 → 21 funds, 6 → 13 countries):
- UAE: +ICD ($320B verified), +ADQ ($199B verified), +EIA (unverified —
  loaded for documentation, excluded from scoring per data-integrity rule)
- KW: kia split into kia-grf (5%, access=0.9) + kia-fgf (95%,
  access=0.20). Corrects ~18× over-statement of crisis-deployable
  Kuwait sovereign wealth (audit found combined-AUM × 0.7 access
  applied $750B as "deployable" against ~$15B actual GRF stabilization
  capacity).
- CN: +CIC ($1.35T), +NSSF ($400B, statutorily-gated 0.20 tier),
  +SAFE-IC ($417B, excluded — overlaps SAFE FX reserves)
- HK: +HKMA-EF ($498B, excluded — overlaps HKMA reserves)
- KR: +KIC ($182B, IFSWF full member)
- AU: +Future Fund ($192B, pension-locked)
- OM: +OIA ($50B, IFSWF member)
- BH: +Mumtalakat ($19B)
- TL: +Petroleum Fund ($22B, GPFG-style high-transparency)

Re-audits (Phase 1E):
- ADIA access 0.3 → 0.4 (rubric flagged; ruler-discretionary deployment
  empirically demonstrated)
- Mubadala access 0.4 → 0.5 (rubric flagged); transparency 0.6 → 0.7
  (LM=10 + IFSWF full member alignment)

Rubric (docs/methodology/swf-classification-rubric.md):
- New "Statutorily-gated long-horizon" 0.20 access tier added between
  0.1 (sanctions/frozen) and 0.3 (intergenerational/ruler-discretionary).
  Anchored by KIA-FGF (Decree 106 of 1976; Council-of-Ministers + Emir
  decree gate; crossed once in extremis during COVID).

Seeder:
- Two new pure helpers: shouldSkipFundForBuffer (excluded/unverified
  decision) and applyAumPctOfAudited (sleeve fraction multiplier)
- Manifest-AUM bypass: if aum_verified=true AND aum_usd present,
  use that value directly (skip Wikipedia)
- Skip funds with excluded_overlaps_with_reserves=true (no
  double-counting against reserveAdequacy / liquidReserveAdequacy)
- Skip funds with aum_verified=false (load for documentation only)
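
A minimal sketch of how the two helpers named above might look. The function names come from this commit message; the signatures, the fund-object shape, and the treatment of an absent `aum_verified` are assumptions for illustration, not the seeder's actual code.

```javascript
// Assumed fund shape (hypothetical):
// { aum_usd, aum_verified, classification: { aum_pct_of_audited, excluded_overlaps_with_reserves } }

// Returns true when a manifest fund is documentation-only and must not
// contribute to the buffer score.
function shouldSkipFundForBuffer(fund) {
  // Funds overlapping FX reserves would double-count against the reserve dimensions.
  if (fund.classification?.excluded_overlaps_with_reserves === true) return true;
  // Explicitly unverified AUM is loaded for documentation only.
  // Assumption: an absent aum_verified (pre-PR entries) does not trigger a skip.
  if (fund.aum_verified === false) return true;
  return false;
}

// Scales an audited AUM by the sleeve fraction; identity when the field is absent.
function applyAumPctOfAudited(aumUsd, fund) {
  const pct = fund.classification?.aum_pct_of_audited;
  return typeof pct === 'number' ? aumUsd * pct : aumUsd;
}

// Illustrative figures only: a combined fund of 1000 (USD B) split 5% / 95%.
const combined = 1000;
const grf = { classification: { aum_pct_of_audited: 0.05 } };
const fgf = { classification: { aum_pct_of_audited: 0.95 } };
const grfAum = applyAumPctOfAudited(combined, grf);
const fgfAum = applyAumPctOfAudited(combined, fgf);
```

The two sleeves summing back to the combined figure is the same invariant the helper tests listed below pin for KIA-GRF + KIA-FGF.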

Tests (+25 net):
- 15 schema-extension tests (misplacement rejection, value-range gates,
  rationale-pairing coherence, backward-compat with pre-PR entries)
- 10 helper tests (shouldSkipFundForBuffer + applyAumPctOfAudited
  predicates and arithmetic; KIA-GRF + KIA-FGF sum equals combined AUM)
- Existing manifest test updated for the kia → kia-grf+kia-fgf split

Full suite: 6,940 tests pass (+50 net), typecheck clean, no new lint.

Predicted ranking deltas (informational, NOT acceptance criteria per
plan §"Hard non-goals"):
- AE sovFiscBuf likely 39 → 47-49 (Phase 1A + 1E)
- KW sovFiscBuf likely 98 → 53-57 (Phase 1B)
- CN, HK (excluded), KR, AU acquire newly-defined sovFiscBuf scores
- GCC ordering shifts toward QA > KW > AE; AE-KW gap likely 6 → ~3-4

Real outcome will be measured post-deploy via cohort audit per plan
§Phase 4.

* fix(resilience): completeness denominator excludes documentation-only funds

PR-3391 review (P1 catch): the per-country `expectedFunds` denominator
counted ALL manifest entries (`funds.length`) including those skipped
from buffer scoring by design — `excluded_overlaps_with_reserves: true`
(SAFE-IC, HKMA-EF) and `aum_verified: false` (EIA). Result: countries
with mixed scorable + non-scorable rosters showed `completeness < 1.0`
even when every scorable fund matched. UAE (4 scorable + EIA) would
show 0.8; CN (CIC + NSSF + SAFE-IC excluded) would show 0.67. The
downstream scorer then derated those countries' coverage based on a
fake-partial signal.

Three call sites all carried the same bug:
- per-country `expectedFunds` in fetchSovereignWealth main loop
- `expectedFundsTotal` + `expectedCountries` in buildCoverageSummary
- `countManifestFundsForCountry` (missing-country path)

All three now filter via `shouldSkipFundForBuffer` to count only
scorable manifest entries. Documentation-only funds are neither expected
nor matched, so they do not appear in the ratio at all.
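
The denominator change can be illustrated with a small sketch. The roster mirrors the UAE case described above (4 scorable funds plus EIA); the `completeness` helper, the fund-object shape, and the lowercase fund IDs are assumptions, and the skip predicate is re-declared locally so the block stands alone.

```javascript
// Local stand-in for the seeder's skip predicate (assumed shape).
const isDocumentationOnly = (fund) =>
  fund.excluded_overlaps_with_reserves === true || fund.aum_verified === false;

// completeness = matched scorable funds / expected scorable funds.
// Returns null when a country has no scorable funds at all.
function completeness(funds, matchedIds, skip = () => false) {
  const scorable = funds.filter((f) => !skip(f));
  if (scorable.length === 0) return null;
  const hits = scorable.filter((f) => matchedIds.has(f.id)).length;
  return hits / scorable.length;
}

const uae = [
  { id: 'adia' }, { id: 'mubadala' }, { id: 'icd' }, { id: 'adq' },
  { id: 'eia', aum_verified: false }, // documentation-only
];
const uaeMatched = new Set(['adia', 'mubadala', 'icd', 'adq']);

// Raw funds.length denominator (the bug) vs scorable-only denominator (the fix).
const before = completeness(uae, uaeMatched);
const after = completeness(uae, uaeMatched, isDocumentationOnly);
console.log(before, after);
```

With the raw denominator the roster reads 0.8 (4 of 5), the fake-partial figure quoted above; with the filter it reads 1.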

Tests added (+4):
- AE complete with all 4 scorable matched (EIA documented but excluded)
- CN complete with CIC + NSSF matched (SAFE-IC documented but excluded)
- Missing-country path returns scorable count not raw manifest count
- Country with ONLY documentation-only entries excluded from expectedCountries

Full suite: 6,944 tests pass (+4 net), typecheck clean.

* fix(resilience): address Greptile P2s on PR #3391 manifest

Three review findings, all in the manifest YAML:

1. **KIA-GRF access 0.9 → 0.7** (rubric alignment): GRF deployment
   requires active Council-of-Ministers authorization (2020 COVID
   precedent demonstrates this), not rule-triggered automatic
   deployment. The rubric's 0.9 tier ("Pure automatic stabilization")
   is reserved for funds where political authorization is post-hoc /
   symbolic (Chile ESSF candidate). KIA-GRF correctly fits 0.7
   ("Explicit stabilization with rule") — the same tier the
   pre-split combined-KIA was assigned. Updated rationale clarifies
   the tier choice. Rubric's 0.7 precedent column already lists
   "KIA General Reserve Fund" — now consistent with the manifest.

2. **Duplicate `# ── Australia ──` header before Oman** (copy-paste
   artifact): removed the orphaned header at the Oman section;
   added proper `# ── Australia ──` header above the Future Fund
   entry where it actually belongs (after Timor-Leste).

3. **NSSF `aum_pct_of_audited: 1.0` removed** (no-op): a multiplier
   of 1.0 is identity. The schema field is OPTIONAL and only meant
   for fund-of-funds split entries (e.g. KIA-GRF/FGF). Setting it
   to 1.0 forced the loader to require an `aum_pct_of_audited`
   rationale paragraph with no computational benefit. Both the
   field and the paragraph are now removed; NSSF remains a single-
   sleeve entry that scores its full audited AUM.
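
To illustrate why a multiplier of 1.0 is a no-op, here is a minimal sketch of how a sleeve's AUM might be derived from `aum_usd` and the optional `classification.aum_pct_of_audited` field. The function name and the 1,000B audited figure are placeholders chosen for the example, not values from the manifest.

```javascript
// Sketch: AUM attributed to one manifest entry, in the same unit as aum_usd.
// Assumption: an absent aum_pct_of_audited means the entry scores its full AUM.
function sleeveAumUsd(fund) {
  const fraction = fund.classification?.aum_pct_of_audited ?? 1;
  return fund.aum_usd * fraction;
}

// Placeholder audited total for a split fund (illustrative only).
const PLACEHOLDER_AUDITED_USD = 1000e9;

const splitSleeves = [
  { id: 'kia-grf', aum_usd: PLACEHOLDER_AUDITED_USD, classification: { aum_pct_of_audited: 0.05 } },
  { id: 'kia-fgf', aum_usd: PLACEHOLDER_AUDITED_USD, classification: { aum_pct_of_audited: 0.95 } },
];

// Single-sleeve entry: no multiplier, so the full audited AUM is used.
const singleSleeve = { id: 'nssf', aum_usd: 400e9, classification: {} };
```

Omitting the field and setting it to 1.0 give the same number, which is why dropping it from NSSF changes nothing computationally.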

Full suite: 6,944 tests pass, typecheck clean.

2026-04-25 12:02:48 +04:00

Elie Habib

8032dc3a04

feat(resilience): PR 2 pre-scorer — SWF manifest + seeder (8/8 funds) (#3305)

* feat(resilience): PR 2 scaffolding — SWF classification manifest + seeder skeleton

Plan §3.4. First of multiple commits for PR 2 (fiscal-buffer split
and sovereign-wealth integration). This commit is SCAFFOLDING ONLY:
no dimension wiring, no scorer, no cache-keys entry yet. The goal is
to land the reviewer-facing metadata and the seeder's three-tier
source shape so an external SWF practitioner can critique before we
wire the scorer.

What is in:

1. docs/methodology/swf-classification-manifest.yaml — authoritative
   per-fund classification for the `sovereignFiscalBuffer` dimension.
   First-pass estimates for the 8 funds named in plan §3.4 table:
   Norway GPFG, UAE ADIA + Mubadala, Saudi PIF, Kuwait KIA,
   Qatar QIA, Singapore GIC + Temasek. Each fund carries:
     - three-component classification (access, liquidity, transparency)
       each on [0, 1], with rationale text citing the mandate / fiscal
       rule / asset-mix / transparency-index evidence
     - source URLs for audit
   Fund-candidates deferred for external-reviewer decision are listed
   in a trailing comment block (CIC, NWF, SOFAZ, NSIA, Future Fund,
   NZ Super, ESSF, etc.).

   external_review_status: PENDING — flip to REVIEWED on sign-off.

2. scripts/shared/swf-manifest-loader.mjs — YAML parser + strict schema
   validator. Fails loudly on any deviation (out-of-range scores,
   non-ISO2 countries, missing rationale, duplicate fund IDs, wrong
   manifest version). Single source of truth for the seeder, future
   scorer, and methodology-doc linter.

3. scripts/seed-sovereign-wealth.mjs — seeder shell with the three-tier
   source priority from plan §3.4:
     1. Official fund disclosures (MoF, central-bank, annual reports)
     2. IFSWF member filings
     3. SWFI public fund-rankings page (license-free fallback, scraped)
   Tiers 1-3 are all stubbed (return null) in this commit — the
   seeder publishes a well-formed empty payload so the scorer IMPUTE
   fallback can be exercised end-to-end without live data.
   emptyDataIsFailure: false is set deliberately so pre-wiring cron
   runs do not poison seed-meta (see
   feedback_strict_floor_validate_fail_poisons_seed_meta.md).

   SWFI scrape target is documented in the file header with the
   exact URL and a 2.5s inter-request interval. The scraper itself
   lands in the next commit after the external reviewer signs off
   on the manifest.

4. tests/swf-classification-manifest.test.mjs — 14 tests exercising
   both the shipped YAML (plan §3.4 required-fund presence, [0,1]
   bounds, rationale length, source citations, multi-fund country
   handling) and the validator's schema enforcement (rejects out-
   of-range scores, non-ISO2 codes, missing rationale, empty sources,
   duplicates, wrong version, invalid review status).
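
As a rough picture of what "fails loudly" means for the loader, the sketch below validates a list of already-parsed fund entries. It is not the shipped validator: the entry shape (`id`, `country`, `classification`, `rationale` keyed by component, `sources`) is an assumption, and YAML parsing and the manifest-version check are left out.

```javascript
const CLASSIFICATION_KEYS = ['access', 'liquidity', 'transparency'];

// Sketch of a strict per-fund validator. Throws on the first deviation.
// Assumed entry shape: { id, country, classification, rationale, sources }.
function validateFundsSketch(funds) {
  const seenIds = new Set();
  funds.forEach((fund, idx) => {
    const where = `funds[${idx}]`;
    if (typeof fund.id !== 'string' || fund.id === '') {
      throw new Error(`${where}: missing id`);
    }
    if (seenIds.has(fund.id)) {
      throw new Error(`${where}: duplicate fund id "${fund.id}"`);
    }
    seenIds.add(fund.id);
    if (!/^[A-Z]{2}$/.test(fund.country ?? '')) {
      throw new Error(`${where}: country must be an ISO-2 code`);
    }
    for (const key of CLASSIFICATION_KEYS) {
      const score = fund.classification?.[key];
      if (typeof score !== 'number' || !Number.isFinite(score) || score < 0 || score > 1) {
        throw new Error(`${where}: classification.${key} must be in [0, 1]`);
      }
      const text = fund.rationale?.[key];
      if (typeof text !== 'string' || text.trim() === '') {
        throw new Error(`${where}: missing rationale for ${key}`);
      }
    }
    if (!Array.isArray(fund.sources) || fund.sources.length === 0) {
      throw new Error(`${where}: sources must be a non-empty list`);
    }
  });
  return funds;
}
```

Throwing instead of collecting warnings keeps a malformed manifest from reaching the seeder at all.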

Out of scope for this commit (follow-ups, in order):
 - Implement SWFI scrape + IFSWF parse + per-fund official endpoints
 - Add `liquidReserveAdequacy` and `sovereignFiscalBuffer` dimensions
   to RESILIENCE_DIMENSION_ORDER, registry, and scorers
 - Retire `reserveAdequacy` via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys.ts + api/bootstrap.js + api/health.js wiring (new
   seed key needs ON_DEMAND_KEYS gating per Railway-cron bake-in rule)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun
 - Methodology doc: rewrite the reserveAdequacy section

Tests: 508/508 pass (resilience suite + new manifest tests).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

No external-facing behavior change — all files are new + isolated.

* feat(resilience): PR 2 commit 2 — Wikipedia SWF scraper + SWFI pivot

Implements Tier 3 of the sovereignFiscalBuffer seeder. Tier 1 (official
disclosures) and Tier 2 (IFSWF filings) remain stubbed — they require
per-fund bespoke adapters and will land incrementally.

SWFI pivot
----------
The plan's original Tier 3 target was
https://www.swfinstitute.org/fund-rankings/sovereign-wealth-fund. Live
check on 2026-04-23: the page's <tbody> is empty and AUM is gated
behind a lead-capture form (name + company + job title). SWFI per-fund
/profile/<id> pages are similarly barren. The "public fund rankings"
page is effectively no longer public; scraping the lead-gated surface would
require submitting fabricated contact info (TOS violation, legally
questionable), so Tier 3 pivots to Wikipedia.

Wikipedia is legally clean (CC-BY-SA 4.0, attribution required — see
WIKIPEDIA_SOURCE_ATTRIBUTION in the seeder) and structurally scrapable.
The SWFI Linaburg-Maduell Transparency Index mentioned in manifest
rationale text is a SEPARATE SWFI publication (public index scores),
not the fund-rankings paywall — those citations stay valid.

What is in
----------

1. scripts/seed-sovereign-wealth.mjs — Wikipedia scraper implementation:
   - parseWikipediaRankingsTable(html) — exported pure function so
     the parser is unit-testable without a live fetch. Extracts the
     wikitable, parses per-fund rows (Country, Abbrev, Fund name,
     Assets USD B, Inception, Origin).
   - Strip-HTML helper strips <sup> tags to SPACES (not empty) so
     `302.0<sup>41</sup>` stays `302.0 41` — otherwise the decimal
     value and its trailing footnote ref get welded into `302.041`,
     which the Assets regex mis-parses.
   - matchWikipediaRecord(fund, cache) — abbrev + fund-name lookup
     with country disambiguation: lookup maps are now
     Map<key, Record[]> (list) rather than Map<key, Record>, and the
     matcher filters the list by manifest country before returning.
     This is the exact fix for the PIF collision:
     "PIF" resolves to BOTH Saudi Arabia's Public Investment Fund
     (~USD 925B) and Palestine's Palestine Investment Fund (~USD 900M)
     on the live article. Without country-filtering, Map.set silently
     overwrites one with the other, so Saudi PIF would return
     Palestine's AUM — three orders of magnitude wrong.
   - When the country disambiguator cannot pick, returns null rather
     than a best-guess. Seeder logs the unmatched fund; the IMPUTE
     path handles it gracefully.

2. docs/methodology/swf-classification-manifest.yaml — added
   `wikipedia` hints block to each of the 8 funds (abbrev and/or
   fund_name, matching Wikipedia's canonical naming).

3. scripts/shared/swf-manifest-loader.mjs — optional `wikipedia` field
   in the schema: `abbrev` and `fund_name` both optional strings, but
   at least one must be present if the block is provided.

4. tests/seed-sovereign-wealth.test.mjs — 12 tests exercising:
   - fixture-based parser: abbrev/name indexing, HTML + footnote
     stripping, decimal AUM, malformed rows skipped, missing-table error
   - abbrev-collision handling: both candidates retained in the list
   - country-disambiguation matcher: Saudi PIF correctly picked from
     a Saudi-vs-Palestine collision fixture (the exact live bug)
   - ambiguous lookup with unknown country returns null, not wrong record
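
The PIF collision and its fix can be illustrated with a small sketch. The record shape (`abbrev`, `country`, `assetsUsdB`) and the `countryName` field on the manifest fund are assumptions made for the example; the real matcher also looks up by fund name, which is omitted here.

```javascript
// Build an abbrev index that keeps every record per key (Map<key, Record[]>),
// so a second "PIF" is appended instead of overwriting the first.
function buildAbbrevIndex(records) {
  const index = new Map();
  for (const record of records) {
    const key = record.abbrev.toLowerCase();
    const bucket = index.get(key) ?? [];
    bucket.push(record);
    index.set(key, bucket);
  }
  return index;
}

// Filter candidates by the manifest fund's country. Return null unless exactly
// one candidate remains, so an ambiguous lookup never yields a best guess.
function matchByAbbrevSketch(fund, index) {
  const candidates = index.get(fund.abbrev.toLowerCase()) ?? [];
  const sameCountry = candidates.filter((r) => r.country === fund.countryName);
  return sameCountry.length === 1 ? sameCountry[0] : null;
}

// Fixture modelled on the collision described above (AUM in USD billions).
const collisionRecords = [
  { abbrev: 'PIF', country: 'Saudi Arabia', assetsUsdB: 925 },
  { abbrev: 'PIF', country: 'Palestine', assetsUsdB: 0.9 },
];
const collisionIndex = buildAbbrevIndex(collisionRecords);
```

Looking up `{ abbrev: 'PIF', countryName: 'Saudi Arabia' }` against `collisionIndex` picks the Saudi record, while a fund whose country matches neither row gets null.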

Live verification against the shipped Wikipedia article: 7/8 funds
matched with the correct country; Saudi PIF now correctly returns
USD 925B (not Palestine's USD 0.9B) because of the country-
disambiguation fix. Temasek is the one miss — Wikipedia does not
classify it as an SWF (practitioner debate; it lists under "state
holding companies" instead). Falls through to IMPUTE in the scorer
until Tier 1/2 adapters land with an official-disclosure source.

Tests: 522/522 pass (resilience + manifest + scraper).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

Still stubbed for later commits:
 - Tier 1 per-fund official-disclosure adapters (incl. Temasek)
 - Tier 2 IFSWF secretariat parser
 - Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
 - reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS until bake-in)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun

* feat(resilience): PR 2 commit 3 — Wikipedia infobox fallback + FX → 8/8 match

Closes the Temasek gap. The Wikipedia list article excludes Temasek on
editorial grounds (classified as a "state holding company" rather than
an SWF), so the Tier-3 list-only path topped out at 7/8 funds matched.
This commit adds Tier 3b — per-fund Wikipedia article infobox scrape
— and a baked-in FX table to handle non-USD infobox currencies.

Live verification on the shipped Wikipedia articles: 8/8 funds matched.
Temasek: S$ 434B → US$ 321B via infobox + SGD→USD FX.

Implementation

1. scripts/seed-sovereign-wealth.mjs
   - FX_TO_USD table (USD, SGD, NOK, EUR, GBP, AED, SAR, KWD, QAR)
     with FX_RATES_REVIEWED_AT='2026-04-23' committed into the seed
     payload so stale rates are visible at audit time.
   - CURRENCY_SYMBOL_TO_ISO ordered list — US$ tested before S$ before
     bare $, and $ / kr require a space + digit neighbor to avoid
     false-matches in rich prose.
   - detectCurrency(text) exported pure for unit testing.
   - parseWikipediaArticleInfobox(html) exported pure — scans rows
     for "Total assets" / "Assets under management" / "AUM" / "Net
     assets" / "Net portfolio value" labels, extracts "NUMBER (trillion
     | billion | million) (YEAR)" values, applies FX conversion.
   - fetchWikipediaInfobox(fund) — per-fund article fetch, gated on
     the manifest's wikipedia.article_url hint.
   - sourceMix split into {official, ifswf, wikipedia_list,
     wikipedia_infobox} counters so the seed payload shows which tier
     delivered each fund.
   - Source priority chain: official → ifswf → wikipedia_list →
     wikipedia_infobox. Infobox last because it is N network round-
     trips; amortizing over the list article cache first minimizes
     live traffic.

2. docs/methodology/swf-classification-manifest.yaml
   - Temasek entry gains wikipedia.article_url:
     https://en.wikipedia.org/wiki/Temasek_Holdings with an inline
     comment explaining why the list-article path misses.

3. scripts/shared/swf-manifest-loader.mjs
   - article_url optional field; validator rejects anything that is
     not a https://<lang>.wikipedia.org/... URL so a typo cannot
     silently wire the seeder to an off-site fetch.

4. tests/seed-sovereign-wealth.test.mjs (10 new tests, 38/38 pass)
   - detectCurrency distinguishes US$ vs S$ vs bare $.
   - parseWikipediaArticleInfobox extracts Temasek S$ 434B → US$ 321B
     with year tag from "(2025)".
   - USD-native row pass-through with fxRate=1.0.
   - NOK trillion conversion (NOK 18.7T → USD 1.74T).
   - Returns null when no AUM row / no infobox at all.
   - Documents the unknown-currency → USD fallback contract.
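
The ordering constraint on currency symbols is easier to see in code. The sketch below is one reading of the description above, not the shipped `detectCurrency` or `parseWikipediaArticleInfobox`: the regexes are assumptions, it works on a single already-stripped infobox cell, and it stops at the native value without FX conversion or the unknown-currency fallback.

```javascript
// Ordered list: "US$" must be tested before "S$" (which it contains),
// and both before a bare "$". Bare "$" and "kr" need a digit right after
// optional whitespace, to avoid matching symbols in ordinary prose.
const CURRENCY_SYMBOL_ORDER = [
  ['USD', /US\$/],
  ['SGD', /S\$/],
  ['NOK', /\bkr\s*\d/i],
  ['USD', /\$\s*\d/],
];

function detectCurrencySketch(text) {
  for (const [iso, pattern] of CURRENCY_SYMBOL_ORDER) {
    if (pattern.test(text)) return iso;
  }
  return null; // unknown currency; fallback handling is out of scope here
}

const SCALE_WORDS = { million: 1e6, billion: 1e9, trillion: 1e12 };

// Parse "NUMBER (trillion|billion|million) (YEAR)" from one cell of text.
function parseAumCellSketch(text) {
  const amount = /(\d[\d.,]*)\s*(trillion|billion|million)/i.exec(text);
  if (!amount) return null;
  const year = /\((\d{4})\)/.exec(text);
  const number = parseFloat(amount[1].replace(/,/g, ''));
  return {
    valueNative: number * SCALE_WORDS[amount[2].toLowerCase()],
    currencyNative: detectCurrencySketch(text),
    aumYear: year ? Number(year[1]) : null,
  };
}
```

For a cell such as `S$ 434 billion (2025)` this yields SGD, a native value of 434e9, and year 2025; conversion to USD would then be a separate step.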

Tests: 532/532 pass (full resilience + manifest + scraper suite).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

Still stubbed for later commits:
 - Tier 1 per-fund official-disclosure adapters
 - Tier 2 IFSWF secretariat parser
 - Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
 - reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun

* refactor(resilience): reuse project-shared FX infrastructure for SWF seeder

Self-caught duplication from the previous commit (699ba832a introduced
a local FX_TO_USD table and FX_RATES_REVIEWED_AT constant). The
codebase already has the canonical path:

  scripts/_seed-utils.mjs
    SHARED_FX_FALLBACKS      (USD/SGD/NOK/EUR/GBP/AED/SAR/QAR/KWD/...)
    getSharedFxRates()       (Redis shared:fx-rates:v1 4h cache + Yahoo)
    fetchYahooFxRates()

Used by seed-grocery-basket, seed-fuel-prices, seed-bigmac. Two FX
tables would drift and the live-rate layer (Yahoo via Redis cache)
would be orphaned on the SWF path.

What changed

- Deleted local FX_TO_USD / FX_RATES_REVIEWED_AT constants.
- parseWikipediaArticleInfobox() no longer performs FX conversion.
  Returns { valueNative, currencyNative, aumYear } so the seeder
  orchestrator applies project-shared rates at call time. Parser is
  now currency-agnostic and thinner.
- Added lookupUsdRate(currency, fxRates) helper:
  * USD → 1.0 short-circuit
  * prefer the live map (getSharedFxRates output) over static fallback
  * fall back to SHARED_FX_FALLBACKS
  * return null on unknown currency (caller skips the fund — no silent
    wrong-currency misreading).
- fetchWikipediaInfobox() accepts fxRates map, converts via
  lookupUsdRate, returns enriched { aum, currencyNative, fxRate }.
- fetchSovereignWealth() fetches fxRates once at the top via
  getSharedFxRates(buildFxSymbolsForSwf(), SHARED_FX_FALLBACKS), in
  parallel with World Bank imports + Wikipedia list. Warms the shared
  Redis FX cache for other seeders at the same time.
- Seed payload drops the fxRatesReviewedAt field; the shared cache
  carries that metadata at the Redis level for all seeders.
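
The lookup order described for `lookupUsdRate` can be sketched as below. The commit gives the signature as `lookupUsdRate(currency, fxRates)`; this version takes the static fallback table as an explicit third argument so the example is self-contained, and the plain-object shape of both rate maps is an assumption.

```javascript
// Sketch of the rate lookup order: USD short-circuit, then the live map,
// then the static fallback, then null for an unknown currency.
function lookupUsdRateSketch(currency, liveRates, fallbackRates) {
  if (currency === 'USD') return 1.0;
  const live = liveRates?.[currency];
  if (typeof live === 'number' && live > 0) return live;
  const fallback = fallbackRates?.[currency];
  if (typeof fallback === 'number' && fallback > 0) return fallback;
  return null; // caller skips the fund instead of guessing a currency
}

// Illustrative fallback table; only the SGD value (0.74) comes from the log.
const SAMPLE_FX_FALLBACKS = { SGD: 0.74 };

// Convert a native amount, or return null when no rate is known.
function toUsdSketch(valueNative, currency, liveRates, fallbackRates) {
  const rate = lookupUsdRateSketch(currency, liveRates, fallbackRates);
  return rate === null ? null : valueNative * rate;
}
```

With an empty live map, `toUsdSketch(434, 'SGD', {}, SAMPLE_FX_FALLBACKS)` comes out at roughly 321, matching the Temasek figure quoted in this log.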

Tests updated

- parseWikipediaArticleInfobox tests assert the native value + ISO
  code, no longer the USD-converted amount.
- New `lookupUsdRate` suite pins the project-shared FX integration:
  USD short-circuit, live-rate preference, static fallback, unknown-
  currency null, and a Temasek S$ 434B → US$ 321B end-to-end case
  via the shared fallback table.

Live re-verification still 8/8; SGD comes through SHARED_FX_FALLBACKS
at 0.74 (same number as the deleted local table), so behavior is
identical but the dedupe is real.

Tests: 536/536 pass. Typecheck clean on both tsconfig configs.

* refactor(resilience): split SWF manifest validator into sub-helpers

Biome reported validateManifest at complexity 55 vs max 50. Extracted
the per-fund validation into validateFundEntry(raw, idx, seen) and
pulled out validateClassification, validateRationale, validateSources,
validateWikipediaHints as separate helpers. Behavior and tests are
unchanged; each helper is now well under the complexity cap and the
main validator reads linearly.

Tests: 42/42 manifest + scraper tests pass. Typecheck clean.

* fix(resilience): PR 2 review — partial-seed guard + manifest REVIEWED status

Addresses two P1 findings on PR #3305.

P1.1 — partial-seed silent corruption on multi-fund countries
-------------------------------------------------------------
For multi-fund countries (AE = ADIA + Mubadala, SG = GIC + Temasek)
the previous aggregation silently published a partial
totalEffectiveMonths if a secondary fund's scraper drifted on
Wikipedia — recordCount would still look green because we counted
"any fund matched" as a successful country-seed. Downstream scorer
would under-rank those countries with no missingness signal.

Fix:
- Each country entry now carries { expectedFunds, matchedFunds,
  completeness } alongside the existing totalEffectiveMonths. The
  scorer can use completeness < 1.0 to derate (treat as degraded
  coverage) rather than accept the partial number at face value.
- declareRecords counts ONLY countries with completeness === 1.0,
  so a secondary-fund drift drops the seed-meta record_count and
  triggers the operational alarm. recordCount in runSeed opts now
  delegates to declareRecords for parity.
- A warn log fires per partial country so the Railway cron log is
  loud on drift without poisoning seed-meta.
- 4 new tests pin: all-matched counts, partial drops, empty/malformed
  payloads, and defensive handling of pre-completeness payload shape.
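
A minimal sketch of the completeness guard follows. The payload shape (`countries` keyed by ISO-2) and the choice to treat entries without a `completeness` field as not complete are assumptions; the shipped `declareRecords` may handle older payload shapes differently.

```javascript
// Per-country coverage summary carried alongside the aggregated value.
function summarizeCoverage(expectedFunds, matchedFunds) {
  const completeness = expectedFunds > 0 ? matchedFunds / expectedFunds : 0;
  return { expectedFunds, matchedFunds, completeness };
}

// Count only fully matched countries, so a drifted secondary fund lowers
// the record count instead of hiding behind "any fund matched".
function declareRecordsSketch(payload) {
  const countries = Object.values(payload?.countries ?? {});
  return countries.filter((entry) => entry?.completeness === 1).length;
}

// Illustrative payload: AE fully matched, SG missing one of two funds.
const samplePayload = {
  countries: {
    AE: summarizeCoverage(2, 2),
    SG: summarizeCoverage(2, 1),
  },
};
```

Here `declareRecordsSketch(samplePayload)` counts one country, because SG sits at 0.5 completeness.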

P1.2 — manifest external-reviewer language contradicted shipped workflow
------------------------------------------------------------------------
The YAML header said "External sovereign-wealth-practitioner review
is REQUIRED before PR 2 merges" and external_review_status=PENDING.
WorldMonitor's operating mode is fully automated (see memory
`feedback_no_external_reviewer_assumption.md`) — there is no external
practitioner gate. Reviewer correctly flagged the inconsistency
between the document and the shipped behaviour.

Fix:
- Rewrote the header to describe the actual audit discipline:
  coefficients derive from the committed rationale + cited sources
  for each fund; revisions require the same discipline in a follow-
  up PR. No external-gate language.
- Flipped external_review_status to REVIEWED, with a clarifying
  comment: REVIEWED = coefficients derive from the committed
  rationale + seeder end-to-end matches the live surfaces. PENDING
  remains reserved for future PRs that ship unresolved TBD
  coefficients.
- Rewrote the "candidates deferred from v1" trailing block. Each
  fund listed now has a concrete rationale for deferral (sanctions /
  access coefficient would pin at 0 / classification contested /
  AUM disclosure unstable) so a future PR author can argue the case
  on record. No "reviewer advice needed" placeholders.
- Tweaked two inline fund comments (UAE ADQ/ICD, Singapore Temasek)
  that still said "external reviewer" — now describe the substantive
  reason for inclusion/deferral.

Tests
-----
- 540/540 resilience + manifest + scraper tests pass.
- Typecheck clean on both tsconfig configs.
- Biome clean on all touched files.

* fix(resilience): PR 2 review — 4 P2 fixes (WB imports, null validate, nested table, aumYear)

Greptile P2 findings on PR #3305, all addressed.

(1) Silent country drop on missing WB imports
---------------------------------------------
If fetchAnnualImportsUsd() has no entry for a manifest ISO-2
(transient WB outage, new country with spotty coverage), the country
was silently skipped. Downstream scorer would then read "absent from
payload" as "no SWF" and score 0 with full coverage — substantively
wrong. Now logs a warn and adds each affected fund to the unmatched
list with a `(no WB imports)` suffix so the seed-meta observer sees
the degradation.

(2) typeof null === 'object' bypassed validate()
------------------------------------------------
Bare `typeof data?.countries === 'object'` returned true for
{ countries: null } and { countries: [] }. Downstream property
access would then crash. Strict check added: non-null plain object
only; also rejects arrays. Test pins all 5 edge cases.
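
For reference, the `typeof null` pitfall and a stricter check look roughly like this. The function names are hypothetical, and the sketch only guards against null and arrays; the shipped `validate()` may check more.

```javascript
// Loose check: typeof null is 'object', and arrays are objects too,
// so both { countries: null } and { countries: [] } pass.
function looseCountriesCheck(data) {
  return typeof data?.countries === 'object';
}

// Stricter check: reject null and arrays explicitly.
function strictCountriesCheck(data) {
  const countries = data?.countries;
  return countries !== null &&
    typeof countries === 'object' &&
    !Array.isArray(countries);
}
```

Only the strict version rejects `{ countries: null }` and `{ countries: [] }`, while both accept a real keyed object.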

(3) Nested </table> / </td> truncated wikitable parse
-----------------------------------------------------
Lazy [\s\S]*? in the outer table regex AND the inner row/cell regexes
could silently drop every row after any cell that contained a nested
mini-table (Wikipedia footnote boxes, sort helpers). Two-step fix:
  - extractFirstWikitable: depth-aware walk counts <table>/</table>
    opens and closes, returns content at balanced depth
  - stripNestedTables: iteratively removes complete inner
    <table>…</table> blocks BEFORE row parsing, so the lazy row / cell
    regexes never see a nested </tr> or </td>
Test: 5-row fixture with a nested table inside row 1's cell. Both GPFG
(row 1, which holds the nested table) and ADIA (row 2) must still parse.
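
The two-step fix can be sketched as follows. These are illustrative re-implementations under the names used above, not the shipped code: the tag regexes are assumptions, and the sketch expects reasonably well-formed HTML with double-quoted class attributes.

```javascript
// Step 1: find the first wikitable and walk <table> / </table> tags,
// tracking depth, so the match ends at the balanced closing tag.
function extractFirstWikitable(html) {
  const start = html.search(/<table\b[^>]*\bclass="[^"]*\bwikitable\b[^"]*"[^>]*>/i);
  if (start === -1) return null;
  const tagPattern = /<table\b[^>]*>|<\/table\s*>/gi;
  tagPattern.lastIndex = start;
  let depth = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    depth += match[0][1] === '/' ? -1 : 1;
    if (depth === 0) return html.slice(start, match.index + match[0].length);
  }
  return null; // unbalanced markup
}

// Step 2: drop the outer tags, then repeatedly remove innermost
// <table>...</table> blocks until none remain, before any row parsing.
function stripNestedTables(tableHtml) {
  const openEnd = tableHtml.indexOf('>') + 1;
  const closeStart = tableHtml.toLowerCase().lastIndexOf('</table');
  let inner = tableHtml.slice(openEnd, closeStart);
  const innermost = /<table\b[^>]*>(?:(?!<table\b)[\s\S])*?<\/table\s*>/gi;
  let previous;
  do {
    previous = inner;
    inner = inner.replace(innermost, ' ');
  } while (inner !== previous);
  return inner;
}
```

After both steps, a lazy row regex such as `/<tr[\s\S]*?<\/tr>/g` only ever sees rows of the outer table.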

(4) aumYear reflected scrape year, not data year
------------------------------------------------
List-article entries were stamped with `new Date().getFullYear()`
even though the Wikipedia list publishes no per-row data-year
annotation (figures are typically prior-period). Consumers using
aumYear for freshness audit would see "2026" for 2024/2025 data.
Now set to null for list entries; infobox tier 3b retains year
extraction from the "(YYYY)" tag on the individual fund article.

P1 bootstrap deferral: intentional per project memory
-----------------------------------------------------
AGENTS.md says new data sources MUST wire api/bootstrap.js. Not done
in this PR by design:
  - No RPC consumer exists yet for
    `resilience:recovery:sovereign-wealth:v1` (scorer lands in a
    follow-up PR; wiring bootstrap without a consumer would be dead
    code).
  - Local memory `feedback_health_required_key_needs_railway_cron_
    first.md` requires new seed keys to sit in ON_DEMAND_KEYS for
    ~7 days of clean Railway cron before promoting to
    BOOTSTRAP_KEYS — adding bootstrap wiring now would pre-empt
    that window and risk CRIT alarms on the health surface.
The scorer PR that follows will land the bootstrap wiring + the
dimension at the same time, which is the cohesive unit.

Tests: 547/547 resilience + manifest + scraper tests pass.
Typecheck clean on both tsconfig configs. Biome clean on touched
files.

2026-04-23 07:58:40 +04:00

2 Commits