mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): SWF manifest expansion + KIA split + new schema fields Phase 1 of plan 2026-04-25-001 (Codex-approved round 5). Manifest-only data correction; no construct change, no cache prefix bump. Schema additions (loader-validated, misplacement-rejected): - top-level: aum_usd, aum_year, aum_verified (primary-source AUM) - under classification: aum_pct_of_audited (fraction multiplier), excluded_overlaps_with_reserves (boolean; documentation-only) Manifest expansion (13 → 21 funds, 6 → 13 countries): - UAE: +ICD ($320B verified), +ADQ ($199B verified), +EIA (unverified — loaded for documentation, excluded from scoring per data-integrity rule) - KW: kia split into kia-grf (5%, access=0.9) + kia-fgf (95%, access=0.20). Corrects ~18× over-statement of crisis-deployable Kuwait sovereign wealth (audit found combined-AUM × 0.7 access applied $750B as "deployable" against ~$15B actual GRF stabilization capacity). - CN: +CIC ($1.35T), +NSSF ($400B, statutorily-gated 0.20 tier), +SAFE-IC ($417B, excluded — overlaps SAFE FX reserves) - HK: +HKMA-EF ($498B, excluded — overlaps HKMA reserves) - KR: +KIC ($182B, IFSWF full member) - AU: +Future Fund ($192B, pension-locked) - OM: +OIA ($50B, IFSWF member) - BH: +Mumtalakat ($19B) - TL: +Petroleum Fund ($22B, GPFG-style high-transparency) Re-audits (Phase 1E): - ADIA access 0.3 → 0.4 (rubric flagged; ruler-discretionary deployment empirically demonstrated) - Mubadala access 0.4 → 0.5 (rubric flagged); transparency 0.6 → 0.7 (LM=10 + IFSWF full member alignment) Rubric (docs/methodology/swf-classification-rubric.md): - New "Statutorily-gated long-horizon" 0.20 access tier added between 0.1 (sanctions/frozen) and 0.3 (intergenerational/ruler-discretionary). Anchored by KIA-FGF (Decree 106 of 1976; Council-of-Ministers + Emir decree gate; crossed once in extremis during COVID). Seeder: - Two new pure helpers: shouldSkipFundForBuffer (excluded/unverified decision) and applyAumPctOfAudited (sleeve fraction multiplier) - Manifest-AUM bypass: if aum_verified=true AND aum_usd present, use that value directly (skip Wikipedia) - Skip funds with excluded_overlaps_with_reserves=true (no double-counting against reserveAdequacy / liquidReserveAdequacy) - Skip funds with aum_verified=false (load for documentation only) Tests (+25 net): - 15 schema-extension tests (misplacement rejection, value-range gates, rationale-pairing coherence, backward-compat with pre-PR entries) - 10 helper tests (shouldSkipFundForBuffer + applyAumPctOfAudited predicates and arithmetic; KIA-GRF + KIA-FGF sum equals combined AUM) - Existing manifest test updated for the kia → kia-grf+kia-fgf split Full suite: 6,940 tests pass (+50 net), typecheck clean, no new lint. Predicted ranking deltas (informational, NOT acceptance criteria per plan §"Hard non-goals"): - AE sovFiscBuf likely 39 → 47-49 (Phase 1A + 1E) - KW sovFiscBuf likely 98 → 53-57 (Phase 1B) - CN, HK (excluded), KR, AU acquire newly-defined sovFiscBuf scores - GCC ordering shifts toward QA > KW > AE; AE-KW gap likely 6 → ~3-4 Real outcome will be measured post-deploy via cohort audit per plan §Phase 4. * fix(resilience): completeness denominator excludes documentation-only funds PR-3391 review (P1 catch): the per-country `expectedFunds` denominator counted ALL manifest entries (`funds.length`) including those skipped from buffer scoring by design — `excluded_overlaps_with_reserves: true` (SAFE-IC, HKMA-EF) and `aum_verified: false` (EIA). Result: countries with mixed scorable + non-scorable rosters showed `completeness < 1.0` even when every scorable fund matched. UAE (4 scorable + EIA) would show 0.8; CN (CIC + NSSF + SAFE-IC excluded) would show 0.67. The downstream scorer then derated those countries' coverage based on a fake-partial signal. Three call sites all carried the same bug: - per-country `expectedFunds` in fetchSovereignWealth main loop - `expectedFundsTotal` + `expectedCountries` in buildCoverageSummary - `countManifestFundsForCountry` (missing-country path) All three now filter via `shouldSkipFundForBuffer` to count only scorable manifest entries. Documentation-only funds neither expected nor matched — they don't appear in the ratio at all. Tests added (+4): - AE complete with all 4 scorable matched (EIA documented but excluded) - CN complete with CIC + NSSF matched (SAFE-IC documented but excluded) - Missing-country path returns scorable count not raw manifest count - Country with ONLY documentation-only entries excluded from expectedCountries Full suite: 6,944 tests pass (+4 net), typecheck clean. * fix(resilience): address Greptile P2s on PR #3391 manifest Three review findings, all in the manifest YAML: 1. **KIA-GRF access 0.9 → 0.7** (rubric alignment): GRF deployment requires active Council-of-Ministers authorization (2020 COVID precedent demonstrates this), not rule-triggered automatic deployment. The rubric's 0.9 tier ("Pure automatic stabilization") reserved for funds where political authorization is post-hoc / symbolic (Chile ESSF candidate). KIA-GRF correctly fits 0.7 ("Explicit stabilization with rule") — the same tier the pre-split combined-KIA was assigned. Updated rationale clarifies the tier choice. Rubric's 0.7 precedent column already lists "KIA General Reserve Fund" — now consistent with the manifest. 2. **Duplicate `# ── Australia ──` header before Oman** (copy-paste artifact): removed the orphaned header at the Oman section; added proper `# ── Australia ──` header above the Future Fund entry where it actually belongs (after Timor-Leste). 3. **NSSF `aum_pct_of_audited: 1.0` removed** (no-op): a multiplier of 1.0 is identity. The schema field is OPTIONAL and only meant for fund-of-funds split entries (e.g. KIA-GRF/FGF). Setting it to 1.0 forced the loader to require an `aum_pct_of_audited` rationale paragraph with no computational benefit. Both the field and the paragraph are now removed; NSSF remains a single- sleeve entry that scores its full audited AUM. Full suite: 6,944 tests pass, typecheck clean.
1088 lines
46 KiB
JavaScript
1088 lines
46 KiB
JavaScript
#!/usr/bin/env node
|
||
//
|
||
// Seeder — Sovereign Wealth Fund AUM (for the `sovereignFiscalBuffer`
|
||
// resilience dimension, PR 2 §3.4).
|
||
//
|
||
// Source priority (per plan §3.4, amended 2026-04-23 — see
|
||
// "SWFI availability note" below):
|
||
// 1. Official fund disclosures (MoF, central bank, fund annual reports).
|
||
// Hand-curated endpoint map; highest confidence. STUBBED in this
|
||
// commit (per-fund scrape adapters added incrementally).
|
||
// 2. IFSWF member-fund filings. Santiago-principle compliant funds
|
||
// publish audited AUM via the IFSWF secretariat. STUBBED.
|
||
// 3. WIKIPEDIA `List_of_sovereign_wealth_funds` — license-free public
|
||
// fallback (CC-BY-SA, attribution required; see `SOURCE_ATTRIBUTION`
|
||
// below). IMPLEMENTED. Wikipedia per-fund AUM is community-curated
|
||
// with primary-source citations on the article; lower confidence than
|
||
// tier 1 / 2 but sufficient for the `sovereignFiscalBuffer` score's
|
||
// saturating transform (large relative errors in AUM get compressed
|
||
// by the exponential in `score = 100 × (1 − exp(−effectiveMonths /
|
||
// 12))`, so tier-3 noise does not dominate ranking outcomes).
|
||
//
|
||
// SWFI availability note. The plan's original fallback target was the
|
||
// SWFI public fund-rankings page at
|
||
// https://www.swfinstitute.org/fund-rankings/sovereign-wealth-fund.
|
||
// Empirical check on 2026-04-23: the page's <tbody> is empty and AUM is
|
||
// gated behind a lead-capture form (name + company + job title). SWFI
|
||
// individual `/profile/<id>` pages are similarly barren. The "public
|
||
// fund-rankings" source is effectively no longer public. Scraping the
|
||
// lead-gated surface would require submitting fabricated contact info
|
||
// — a TOS violation and legally questionable — so we pivot tier 3 to
|
||
// Wikipedia, which is both legally clean (CC-BY-SA) and structurally
|
||
// scrapable. The SWFI Linaburg-Maduell transparency index mentioned in
|
||
// the manifest's `transparency` rationale text is a SEPARATE SWFI
|
||
// publication (public index scores), not the fund-rankings paywall —
|
||
// those citations stay valid.
|
||
//
|
||
// Cadence: quarterly (plan §3.4). Railway cron cadence: weekly refresh
|
||
// with ~35-day TTL (mirrors other recovery-domain seeders so stale data
|
||
// is caught by the seed-meta gate before it leaks into rankings).
|
||
//
|
||
// Output shape (Redis key `resilience:recovery:sovereign-wealth:v1`,
|
||
// enveloped through `_seed-utils.mjs`):
|
||
//
|
||
// {
|
||
// countries: {
|
||
// [iso2]: {
|
||
// funds: [
|
||
// {
|
||
// fund: 'gpfg',
|
||
// aum: <number, USD>,
|
||
// aumYear: <number>,
|
||
// source: 'official' | 'ifswf' | 'wikipedia_list' | 'wikipedia_infobox',
|
||
// access: <number 0..1>,
|
||
// liquidity: <number 0..1>,
|
||
// transparency: <number 0..1>,
|
||
// rawMonths: <number, = aum / annualImports × 12>,
|
||
// effectiveMonths: <number, = rawMonths × access × liquidity × transparency>,
|
||
// },
|
||
// ...
|
||
// ],
|
||
// totalEffectiveMonths: <number>, // Σ per-fund effectiveMonths
|
||
// annualImports: <number, USD>, // WB NE.IMP.GNFS.CD, for audit
|
||
// expectedFunds: <number>, // manifest count for this country
|
||
// matchedFunds: <number>, // funds whose AUM resolved
|
||
// completeness: <number 0..1>, // matchedFunds / expectedFunds
|
||
// }
|
||
// },
|
||
// seededAt: <ISO8601>,
|
||
// manifestVersion: <number>,
|
||
// sourceMix: {
|
||
// official: <count>, ifswf: <count>,
|
||
// wikipedia_list: <count>, wikipedia_infobox: <count>,
|
||
// },
|
||
// }
|
||
//
|
||
// Countries WITHOUT an entry in the manifest are absent from this
|
||
// payload. The scorer is expected to treat "no entry in payload" as
|
||
// "no sovereign wealth fund" and score 0 with full coverage (plan
|
||
// §3.4 "What happens to no-SWF countries"). This is substantively
|
||
// different from IMPUTE fallback (which is "data-source-failed").
|
||
|
||
import { loadEnvFile, CHROME_UA, runSeed, readSeedSnapshot, SHARED_FX_FALLBACKS, getSharedFxRates, getBundleRunStartedAtMs } from './_seed-utils.mjs';
|
||
import iso3ToIso2 from './shared/iso3-to-iso2.json' with { type: 'json' };
|
||
import { groupFundsByCountry, loadSwfManifest } from './shared/swf-manifest-loader.mjs';
|
||
|
||
const REEXPORT_SHARE_CANONICAL_KEY = 'resilience:recovery:reexport-share:v1';
|
||
const REEXPORT_SHARE_META_KEY = 'seed-meta:resilience:recovery:reexport-share';
|
||
|
||
/**
|
||
* Read the Comtrade-seeded re-export-share map from Redis, guarded by
|
||
* bundle-run freshness. Returns an empty Map on any failure signal —
|
||
* missing key, malformed payload, or seed-meta older than this bundle
|
||
* run. The caller treats an empty map as "use gross imports for all
|
||
* countries" (status-quo fallback).
|
||
*
|
||
* Why bundle-run freshness matters: the Reexport-Share seeder runs
|
||
* immediately before this SWF seeder inside the resilience-recovery
|
||
* bundle. If that seeder fails (Comtrade outage, 429 storm, timeout),
|
||
* its Redis key still holds LAST MONTH's envelope — reading that
|
||
* would silently apply stale shares to the current month's SWF data.
|
||
* The bundle-freshness guard rejects any meta predating the current
|
||
* bundle run, forcing a hard fallback to gross imports.
|
||
*
|
||
* @returns {Promise<Map<string, { reexportShareOfImports: number, year: number | null, sources: string[] }>>}
|
||
*/
|
||
export async function loadReexportShareFromRedis() {
|
||
const map = new Map();
|
||
const raw = await readSeedSnapshot(REEXPORT_SHARE_CANONICAL_KEY);
|
||
if (!raw || typeof raw !== 'object') {
|
||
console.warn('[seed-sovereign-wealth] reexport-share Redis key empty/malformed; falling back to gross-imports denominator for all countries');
|
||
return map;
|
||
}
|
||
|
||
const metaRaw = await readSeedSnapshot(REEXPORT_SHARE_META_KEY);
|
||
const fetchedAtMs = Number(metaRaw?.fetchedAt ?? 0);
|
||
if (!fetchedAtMs) {
|
||
// Meta absent or malformed — can't tell whether the peer seeder ran.
|
||
// Safer to treat as outage than to trust the data key alone.
|
||
console.warn('[seed-sovereign-wealth] reexport-share seed-meta absent/malformed; falling back to gross-imports denominator for all countries');
|
||
return map;
|
||
}
|
||
const bundleStartMs = getBundleRunStartedAtMs();
|
||
// Freshness gate applies ONLY when spawned by _bundle-runner.mjs (i.e.
|
||
// `getBundleRunStartedAtMs()` returns a timestamp). Standalone runs
|
||
// (manual invocation, operator debugging) return null and skip the
|
||
// gate: the operator is responsible for running the peer seeder
|
||
// first, and we trust any `fetchedAt` in that context. The gate's
|
||
// purpose is protecting against across-bundle-tick staleness inside
|
||
// a cron run, which has no analog outside a bundle.
|
||
if (bundleStartMs != null && fetchedAtMs < bundleStartMs) {
|
||
const ageMin = ((Date.now() - fetchedAtMs) / 60_000).toFixed(0);
|
||
console.warn(`[seed-sovereign-wealth] reexport-share seed-meta NOT from this bundle run (age=${ageMin}min, bundleStart=${new Date(bundleStartMs).toISOString()}). Falling back to gross imports for all countries.`);
|
||
return map;
|
||
}
|
||
|
||
const countries = raw.countries ?? {};
|
||
for (const [iso2, entry] of Object.entries(countries)) {
|
||
const share = entry?.reexportShareOfImports;
|
||
// Numeric bounds check — NaN / Infinity / negative / ≥ 1 all pass
|
||
// `typeof === 'number'`. computeNetImports requires share ∈ [0, 1).
|
||
// The Comtrade seeder caps at 0.95 but this guard protects against
|
||
// a rogue payload (e.g. a manual redis-cli write mid-migration).
|
||
if (!Number.isFinite(share) || share < 0 || share > 0.95) {
|
||
console.warn(`[seed-sovereign-wealth] ${iso2} share ${share} fails bounds check [0, 0.95]; skipping`);
|
||
continue;
|
||
}
|
||
map.set(iso2, {
|
||
reexportShareOfImports: share,
|
||
year: entry?.year ?? null,
|
||
sources: Array.isArray(entry?.sources) ? entry.sources : [],
|
||
});
|
||
}
|
||
return map;
|
||
}
|
||
|
||
loadEnvFile(import.meta.url);
|
||
|
||
const CANONICAL_KEY = 'resilience:recovery:sovereign-wealth:v1';
|
||
const CACHE_TTL_SECONDS = 35 * 24 * 3600;
|
||
const WB_BASE = 'https://api.worldbank.org/v2';
|
||
const IMPORTS_INDICATOR = 'NE.IMP.GNFS.CD';
|
||
|
||
const WIKIPEDIA_URL = 'https://en.wikipedia.org/wiki/List_of_sovereign_wealth_funds';
|
||
export const WIKIPEDIA_SOURCE_ATTRIBUTION =
|
||
'Wikipedia — List of sovereign wealth funds + per-fund articles (CC-BY-SA 4.0)';
|
||
|
||
// FX conversion uses the project-shared rate cache — Redis
|
||
// `shared:fx-rates:v1` (4h TTL, live Yahoo Finance source) with a static
|
||
// fallback table (`SHARED_FX_FALLBACKS`) that already carries every
|
||
// currency we can plausibly see in an SWF infobox (USD, SGD, NOK, EUR,
|
||
// GBP, AED, SAR, QAR, KWD, …). See scripts/_seed-utils.mjs and
|
||
// scripts/seed-grocery-basket.mjs / scripts/seed-fuel-prices.mjs for
|
||
// the consumer pattern. Small FX drift is absorbed by the saturating
|
||
// transform in the scorer (100 × (1 − exp(−effectiveMonths / 12))), so
|
||
// the shared cache's cadence suffices.
|
||
//
|
||
// Yahoo symbol convention: `<CCY>USD=X` returns the per-1-local-unit
|
||
// value in USD. We build the symbol map dynamically from any currency
|
||
// the infobox parser surfaces.
|
||
|
||
// Canonical currency code lookup keyed on the symbol / short-code that
|
||
// appears in Wikipedia infoboxes. Each entry maps to an ISO-4217 code
|
||
// used in FX_TO_USD above. Order matters — "US$" must be tested before
|
||
// "S$" and "$" so a "US$ 100B" row doesn't match the SGD / USD-fallback
|
||
// paths; `detectCurrency` below handles this by scanning longest-first.
|
||
const CURRENCY_SYMBOL_TO_ISO = [
|
||
['US$', 'USD'],
|
||
['USD', 'USD'],
|
||
['S$', 'SGD'],
|
||
['SGD', 'SGD'],
|
||
['NOK', 'NOK'],
|
||
['kr', 'NOK'], // Norwegian krone — weak signal, only used when
|
||
// preceded by a space and no other symbol matches
|
||
['€', 'EUR'],
|
||
['EUR', 'EUR'],
|
||
['£', 'GBP'],
|
||
['GBP', 'GBP'],
|
||
['AED', 'AED'],
|
||
['SAR', 'SAR'],
|
||
['KWD', 'KWD'],
|
||
['QAR', 'QAR'],
|
||
['$', 'USD'], // Bare `$` defaults to USD — last to avoid shadowing
|
||
// `US$` / `S$` / etc.
|
||
];
|
||
|
||
// ── World Bank: per-country annual imports (denominator for rawMonths) ──
|
||
|
||
// MRV lookback used in the bulk fetch. WB's `country/all?mrv=1` returns the
|
||
// SAME year across every country (the most recent year that any country
|
||
// reports) with `value: null` for countries that haven't published yet.
|
||
// KW/QA/AE report NE.IMP.GNFS.CD a year or two behind NO/SA/SG, so mrv=1
|
||
// returned null for them in the 2026-04-23 prod run (PR #3352 root cause).
|
||
// mrv=5 gives 5 years and lets us pick the most recent non-null per
|
||
// country, matching what the per-country endpoint returns naturally.
|
||
// Five years is deliberate — one is clearly insufficient, ten is overkill
|
||
// for a denominator that evolves on a yearly cadence (we also report back
|
||
// the year we picked, so the scorer can flag stale ones if it wants).
|
||
const IMPORTS_LOOKBACK_YEARS = 5;
|
||
|
||
/**
|
||
* Collapse a WB multi-year bulk response into a per-country map keyed on
|
||
* most-recent-non-null value. Exported so the mrv=5 + pick-latest logic
|
||
* is unit-testable without mocking fetch.
|
||
*
|
||
* @param {Array<{ countryiso3code?: string, country?: { id?: string }, value: unknown, date: unknown }>} records
|
||
* @returns {Record<string, { importsUsd: number, year: number }>}
|
||
*/
|
||
export function pickLatestPerCountry(records) {
|
||
const imports = {};
|
||
for (const record of records) {
|
||
const rawCode = record?.countryiso3code ?? record?.country?.id ?? '';
|
||
const iso2 = rawCode.length === 3 ? (iso3ToIso2[rawCode] ?? null) : (rawCode.length === 2 ? rawCode : null);
|
||
if (!iso2) continue;
|
||
const value = Number(record?.value);
|
||
if (!Number.isFinite(value) || value <= 0) continue;
|
||
const year = Number(record?.date);
|
||
if (!Number.isFinite(year)) continue;
|
||
const existing = imports[iso2];
|
||
if (!existing || year > existing.year) {
|
||
imports[iso2] = { importsUsd: value, year };
|
||
}
|
||
}
|
||
return imports;
|
||
}
|
||
|
||
async function fetchAnnualImportsUsd() {
|
||
const pages = [];
|
||
let page = 1;
|
||
let totalPages = 1;
|
||
while (page <= totalPages) {
|
||
const url = `${WB_BASE}/country/all/indicator/${IMPORTS_INDICATOR}?format=json&per_page=2000&page=${page}&mrv=${IMPORTS_LOOKBACK_YEARS}`;
|
||
const resp = await fetch(url, {
|
||
headers: { 'User-Agent': CHROME_UA },
|
||
signal: AbortSignal.timeout(30_000),
|
||
});
|
||
if (!resp.ok) throw new Error(`World Bank ${IMPORTS_INDICATOR}: HTTP ${resp.status}`);
|
||
const json = await resp.json();
|
||
const meta = json[0];
|
||
const records = json[1] ?? [];
|
||
totalPages = meta?.pages ?? 1;
|
||
pages.push(...records);
|
||
page++;
|
||
}
|
||
return pickLatestPerCountry(pages);
|
||
}
|
||
|
||
// ── Tier 1: official disclosure endpoints (per-fund hand-curated) ──
|
||
//
|
||
// STUBBED. Each fund's annual-report / press-release page has a
|
||
// different structure; the scrape logic must be bespoke per fund.
|
||
// Added incrementally in follow-up commits.
|
||
//
|
||
// Returns { aum: number, aumYear: number, source: 'official' } or null.
|
||
async function fetchOfficialDisclosure(_fund) {
|
||
return null;
|
||
}
|
||
|
||
// ── Tier 2: IFSWF secretariat filings ──
|
||
//
|
||
// STUBBED. IFSWF publishes member-fund AUM at
|
||
// https://www.ifswf.org/member-profiles/<slug> but layout varies per
|
||
// fund. Deferred to a follow-up commit.
|
||
//
|
||
// Returns { aum: number, aumYear: number, source: 'ifswf' } or null.
|
||
async function fetchIfswfFiling(_fund) {
|
||
return null;
|
||
}
|
||
|
||
// ── Tier 3: Wikipedia fallback ──
|
||
|
||
// Wikipedia's country-name spelling for each manifest ISO-2. Used by the
|
||
// disambiguator to break abbrev collisions (e.g. "PIF" resolves to both
|
||
// Saudi Arabia's Public Investment Fund and Palestine's Palestine
|
||
// Investment Fund — without a country filter, the latter would silently
|
||
// shadow the former). Extend this map when adding a manifest entry
|
||
// whose country is new.
|
||
const ISO2_TO_WIKIPEDIA_COUNTRY_NAME = new Map([
|
||
['NO', 'norway'],
|
||
['AE', 'united arab emirates'],
|
||
['SA', 'saudi arabia'],
|
||
['KW', 'kuwait'],
|
||
['QA', 'qatar'],
|
||
['SG', 'singapore'],
|
||
]);
|
||
|
||
function normalizeAbbrev(value) {
|
||
return String(value || '').toUpperCase().replace(/[-\s.]/g, '');
|
||
}
|
||
|
||
function normalizeFundName(value) {
|
||
return String(value || '').toLowerCase().trim().replace(/\s+/g, ' ');
|
||
}
|
||
|
||
function normalizeCountryName(value) {
|
||
return String(value || '').toLowerCase().trim().replace(/\s+/g, ' ');
|
||
}
|
||
|
||
function pushIndexed(map, key, record) {
|
||
if (!key) return;
|
||
const list = map.get(key) ?? [];
|
||
list.push(record);
|
||
map.set(key, list);
|
||
}
|
||
|
||
function stripHtmlInline(value) {
|
||
// HTML tags replace with a space (not empty) so inline markup like
|
||
// `302.0<sup>41</sup>` becomes `302.0 41` — otherwise the decimal
|
||
// value and its trailing footnote ref get welded into `302.041`,
|
||
// which the Assets regex then mis-parses as a single number.
|
||
return String(value || '')
|
||
.replace(/<[^>]+>/g, ' ')
|
||
.replace(/ /g, ' ')
|
||
.replace(/&/g, '&')
|
||
.replace(/&[#\w]+;/g, ' ')
|
||
.replace(/\s+/g, ' ')
|
||
.trim();
|
||
}
|
||
|
||
// Depth-aware extraction of the first `<table class="wikitable...">`
|
||
// content. A simple lazy `[\s\S]*?</table>` would stop at the FIRST
|
||
// `</table>` encountered — but Wikipedia occasionally embeds mini-
|
||
// tables inside a row (sort helpers, footnote boxes). With a lazy
|
||
// match, any nested `</table>` before the real close silently drops
|
||
// all trailing rows. Walk the tag stream and close at matched depth.
|
||
function extractFirstWikitable(html) {
|
||
const openRe = /<table[^>]*class="[^"]*wikitable[^"]*"[^>]*>/g;
|
||
const openMatch = openRe.exec(html);
|
||
if (!openMatch) return null;
|
||
const innerStart = openMatch.index + openMatch[0].length;
|
||
|
||
const tagRe = /<(\/?)table\b[^>]*>/g;
|
||
tagRe.lastIndex = innerStart;
|
||
let depth = 1;
|
||
let m;
|
||
while ((m = tagRe.exec(html)) !== null) {
|
||
depth += m[1] === '/' ? -1 : 1;
|
||
if (depth === 0) return html.slice(innerStart, m.index);
|
||
}
|
||
return null; // unclosed table — treat as malformed
|
||
}
|
||
|
||
// Recursively remove complete nested `<table>…</table>` blocks from the
|
||
// extracted wikitable content before row parsing. Without this pass,
|
||
// the lazy row / cell regexes below bind across nested `</tr>` and
|
||
// `</td>` tags embedded in a cell's inner table, silently dropping the
|
||
// enclosing row. Uses depth tracking so a nested-inside-nested block
|
||
// is still removed as one unit.
|
||
function stripNestedTables(tableInner) {
|
||
let out = tableInner;
|
||
// Loop because stripping outer nested may reveal deeper ones; each
|
||
// iteration strips the outermost complete <table>…</table>.
|
||
// eslint-disable-next-line no-constant-condition
|
||
while (true) {
|
||
const openRe = /<table\b[^>]*>/g;
|
||
const openMatch = openRe.exec(out);
|
||
if (!openMatch) return out;
|
||
const innerStart = openMatch.index + openMatch[0].length;
|
||
const tagRe = /<(\/?)table\b[^>]*>/g;
|
||
tagRe.lastIndex = innerStart;
|
||
let depth = 1;
|
||
let closeEnd = -1;
|
||
let m;
|
||
while ((m = tagRe.exec(out)) !== null) {
|
||
depth += m[1] === '/' ? -1 : 1;
|
||
if (depth === 0) { closeEnd = m.index + m[0].length; break; }
|
||
}
|
||
if (closeEnd === -1) return out; // unclosed nested — stop
|
||
out = out.slice(0, openMatch.index) + out.slice(closeEnd);
|
||
}
|
||
}
|
||
|
||
/**
|
||
* Parse the Wikipedia wikitable HTML into lookup-by-abbrev / lookup-
|
||
* by-fund-name caches. Exported so it can be unit-tested against a
|
||
* committed fixture without a live fetch.
|
||
*
|
||
* Assumed columns (verified 2026-04-23 on the shipping article):
|
||
* [0] Country or region
|
||
* [1] Abbrev.
|
||
* [2] Fund name
|
||
* [3] Assets (in USD billions, optionally followed by a footnote
|
||
* reference like "2,117 37" — strip the trailing integer).
|
||
* [4] Inception year
|
||
* [5] Origin (Oil Gas / Non-commodity / etc.)
|
||
*
|
||
* Returns Maps keyed by normalized value → LIST of records. Multiple
|
||
* records under one key is a real case: "PIF" resolves to both Saudi
|
||
* Arabia's Public Investment Fund and Palestine's Palestine Investment
|
||
* Fund. The matcher disambiguates via manifest country at lookup time
|
||
* rather than letting Map.set silently overwrite.
|
||
*
|
||
* Record: { aum, aumYear, fundName, countryName, inceptionYear }.
|
||
* aumYear is null for list-article rows because the article does not
|
||
* publish a per-row data-year annotation; consumers treating aumYear
|
||
* as authoritative freshness must fall back to the infobox path.
|
||
*
|
||
* @param {string} html full article HTML
|
||
* @returns {{ byAbbrev: Map<string, object[]>, byFundName: Map<string, object[]> }}
|
||
*/
|
||
export function parseWikipediaRankingsTable(html) {
|
||
const rawTbl = extractFirstWikitable(html);
|
||
if (rawTbl == null) throw new Error('Wikipedia article: wikitable not found');
|
||
const tbl = stripNestedTables(rawTbl);
|
||
|
||
const byAbbrev = new Map();
|
||
const byFundName = new Map();
|
||
|
||
const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/g;
|
||
let rowMatch;
|
||
while ((rowMatch = rowRe.exec(tbl)) !== null) {
|
||
const cellRe = /<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/g;
|
||
const cells = [];
|
||
let cellMatch;
|
||
while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) cells.push(cellMatch[1]);
|
||
if (cells.length < 5) continue;
|
||
|
||
const countryName = stripHtmlInline(cells[0]);
|
||
const abbrev = stripHtmlInline(cells[1]);
|
||
const fundName = stripHtmlInline(cells[2]);
|
||
const assetsCell = stripHtmlInline(cells[3]);
|
||
const inceptionCell = stripHtmlInline(cells[4]);
|
||
|
||
// "2,117 37" → 2117 billion (strip optional trailing footnote int)
|
||
const assetsMatch = assetsCell.match(/^([\d,]+(?:\.\d+)?)(?:\s+\d+)?\s*$/);
|
||
if (!assetsMatch) continue;
|
||
const aumBillions = parseFloat(assetsMatch[1].replace(/,/g, ''));
|
||
if (!Number.isFinite(aumBillions) || aumBillions <= 0) continue;
|
||
const aum = aumBillions * 1_000_000_000;
|
||
|
||
const inceptionYearMatch = inceptionCell.match(/(\d{4})/);
|
||
const inceptionYear = inceptionYearMatch ? parseInt(inceptionYearMatch[1], 10) : null;
|
||
|
||
// aumYear: null — the list article has no per-row data-year
|
||
// annotation. Reporting the scrape year would mislead freshness
|
||
// auditors (figures are usually prior-period).
|
||
const record = { aum, aumYear: null, fundName, countryName, inceptionYear };
|
||
|
||
pushIndexed(byAbbrev, normalizeAbbrev(abbrev), record);
|
||
pushIndexed(byFundName, normalizeFundName(fundName), record);
|
||
}
|
||
|
||
return { byAbbrev, byFundName };
|
||
}
|
||
|
||
async function loadWikipediaRankingsCache() {
|
||
const resp = await fetch(WIKIPEDIA_URL, {
|
||
headers: {
|
||
'User-Agent': CHROME_UA,
|
||
'Accept': 'text/html,application/xhtml+xml',
|
||
},
|
||
signal: AbortSignal.timeout(30_000),
|
||
});
|
||
if (!resp.ok) throw new Error(`Wikipedia SWF list: HTTP ${resp.status}`);
|
||
const html = await resp.text();
|
||
return parseWikipediaRankingsTable(html);
|
||
}
|
||
|
||
function pickByCountry(candidates, fundCountryIso2) {
|
||
if (!candidates || candidates.length === 0) return null;
|
||
// Single candidate → return it (country clash is not possible).
|
||
if (candidates.length === 1) return candidates[0];
|
||
// Multiple candidates → require a country-name match to pick one.
|
||
// Returning null here is the safe choice: it means "ambiguous match",
|
||
// which the seeder surfaces as an unmatched fund (logged), rather
|
||
// than silently returning the wrong fund's AUM.
|
||
const expectedCountryName = ISO2_TO_WIKIPEDIA_COUNTRY_NAME.get(fundCountryIso2);
|
||
if (!expectedCountryName) return null;
|
||
for (const record of candidates) {
|
||
if (normalizeCountryName(record.countryName) === expectedCountryName) return record;
|
||
}
|
||
return null;
|
||
}
|
||
|
||
export function matchWikipediaRecord(fund, cache) {
|
||
const hints = fund.wikipedia;
|
||
if (!hints) return null;
|
||
if (hints.abbrev) {
|
||
const hit = pickByCountry(cache.byAbbrev.get(normalizeAbbrev(hints.abbrev)), fund.country);
|
||
if (hit) return hit;
|
||
}
|
||
if (hints.fundName) {
|
||
const hit = pickByCountry(cache.byFundName.get(normalizeFundName(hints.fundName)), fund.country);
|
||
if (hit) return hit;
|
||
}
|
||
return null;
|
||
}
|
||
|
||
async function fetchWikipediaRanking(fund, cache) {
|
||
const hit = matchWikipediaRecord(fund, cache);
|
||
if (!hit) return null;
|
||
return { aum: hit.aum, aumYear: hit.aumYear, source: 'wikipedia_list' };
|
||
}
|
||
|
||
// ── Tier 3b: per-fund Wikipedia article infobox fallback ──
|
||
//
|
||
// Some manifest funds (Temasek is the canonical case) are editorially
|
||
// excluded from Wikipedia's list article. For those, the fund's own
|
||
// Wikipedia article's infobox carries AUM. Infobox layout is relatively
|
||
// stable: a `<table class="infobox ...">` with rows of
|
||
// `<th>Label</th><td>Value</td>`. We look for rows labelled "Total
|
||
// assets" / "Assets under management" / "AUM" / "Net assets" and parse
|
||
// the value.
|
||
|
||
const INFOBOX_AUM_LABELS = [
|
||
/^total\s+assets$/i,
|
||
/^assets\s+under\s+management$/i,
|
||
/^aum$/i,
|
||
/^net\s+assets$/i,
|
||
/^net\s+portfolio\s+value$/i,
|
||
];
|
||
|
||
/**
|
||
* Detect the currency in a Wikipedia infobox value string.
|
||
* Returns an ISO-4217 code (e.g. "SGD") or null if unrecognized.
|
||
* Scans CURRENCY_SYMBOL_TO_ISO in order so longer/more-specific
|
||
* prefixes (US$, S$) match before bare `$` / `kr`.
|
||
*/
|
||
export function detectCurrency(text) {
|
||
const haystack = String(text || '');
|
||
for (const [symbol, iso] of CURRENCY_SYMBOL_TO_ISO) {
|
||
// `$` / `kr` are short + could false-match in rich text; require
|
||
// either a space before or start-of-string immediately before the
|
||
// token, and a digit (optional space) after.
|
||
if (symbol === '$' || symbol === 'kr') {
|
||
const re = new RegExp(`(^|\\s)${symbol.replace(/[$]/g, '\\$')}\\s*\\d`);
|
||
if (re.test(haystack)) return iso;
|
||
continue;
|
||
}
|
||
if (haystack.includes(symbol)) return iso;
|
||
}
|
||
return null;
|
||
}
|
||
|
||
/**
|
||
* Parse a Wikipedia infobox HTML fragment for an AUM value. Returns
|
||
* the NATIVE-currency value plus its ISO-4217 code so the caller can
|
||
* apply the project-shared FX rates (`getSharedFxRates`) at orchestration
|
||
* time. Returning raw-native avoids duplicating the FX conversion layer
|
||
* already maintained in `scripts/_seed-utils.mjs` for seed-grocery-basket,
|
||
* seed-fuel-prices, seed-bigmac, etc.
|
||
*
|
||
* Returns { valueNative: number, currencyNative: string, aumYear: number }
|
||
* or null if no usable row.
|
||
*
|
||
* Exported pure so a committed fixture can exercise the parsing + currency
|
||
* detection without a live fetch.
|
||
*/
|
||
export function parseWikipediaArticleInfobox(html) {
|
||
const infoboxMatch = html.match(/<table[^>]*class="[^"]*infobox[^"]*"[^>]*>([\s\S]*?)<\/table>/);
|
||
if (!infoboxMatch) return null;
|
||
const box = infoboxMatch[1];
|
||
|
||
const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/g;
|
||
let rowMatch;
|
||
while ((rowMatch = rowRe.exec(box)) !== null) {
|
||
// Split the row into th (label) + td (value). Either can be missing
|
||
// or out-of-order in edge cases, so use a two-pass extraction.
|
||
const label = (rowMatch[1].match(/<th[^>]*>([\s\S]*?)<\/th>/)?.[1] ?? '');
|
||
const value = (rowMatch[1].match(/<td[^>]*>([\s\S]*?)<\/td>/)?.[1] ?? '');
|
||
const labelText = stripHtmlInline(label);
|
||
if (!INFOBOX_AUM_LABELS.some((re) => re.test(labelText))) continue;
|
||
|
||
const valueText = stripHtmlInline(value);
|
||
// Example values:
|
||
// "S$ 434 billion (2025) 2"
|
||
// "US$ 1,128 billion"
|
||
// "€ 500 million"
|
||
// "NOK 18.7 trillion (2025)"
|
||
const numMatch = valueText.match(/([\d,]+(?:\.\d+)?)\s*(trillion|billion|million)/i);
|
||
if (!numMatch) continue;
|
||
const rawNum = parseFloat(numMatch[1].replace(/,/g, ''));
|
||
if (!Number.isFinite(rawNum) || rawNum <= 0) continue;
|
||
const unit = numMatch[2].toLowerCase();
|
||
const unitMultiplier = unit === 'trillion'
|
||
? 1_000_000_000_000
|
||
: unit === 'billion'
|
||
? 1_000_000_000
|
||
: 1_000_000;
|
||
const valueNative = rawNum * unitMultiplier;
|
||
|
||
const currencyNative = detectCurrency(valueText) ?? 'USD';
|
||
|
||
const yearMatch = valueText.match(/\((\d{4})\)/);
|
||
const aumYear = yearMatch ? parseInt(yearMatch[1], 10) : new Date().getFullYear();
|
||
|
||
return { valueNative, currencyNative, aumYear };
|
||
}
|
||
return null;
|
||
}
|
||
|
||
/**
|
||
* Look up the USD-per-unit rate for a currency from the shared FX map.
|
||
* `fxRates` is the object returned by `getSharedFxRates()` (keys are
|
||
* ISO-4217 codes). Falls back to SHARED_FX_FALLBACKS for any currency
|
||
* not in the live map. Returns null if the currency is unknown — the
|
||
* caller should treat that as "cannot convert, skip this fund" rather
|
||
* than silently pretending the value is USD.
|
||
*/
|
||
export function lookupUsdRate(currency, fxRates) {
|
||
if (currency === 'USD') return 1.0;
|
||
const rate = fxRates?.[currency] ?? SHARED_FX_FALLBACKS[currency];
|
||
return (rate != null && rate > 0) ? rate : null;
|
||
}
|
||
|
||
async function fetchWikipediaInfobox(fund, fxRates) {
|
||
const articleUrl = fund.wikipedia?.articleUrl;
|
||
if (!articleUrl) return null;
|
||
const resp = await fetch(articleUrl, {
|
||
headers: {
|
||
'User-Agent': CHROME_UA,
|
||
'Accept': 'text/html,application/xhtml+xml',
|
||
},
|
||
signal: AbortSignal.timeout(30_000),
|
||
});
|
||
if (!resp.ok) {
|
||
console.warn(`[seed-sovereign-wealth] ${fund.country}:${fund.fund} infobox fetch HTTP ${resp.status}`);
|
||
return null;
|
||
}
|
||
const html = await resp.text();
|
||
const hit = parseWikipediaArticleInfobox(html);
|
||
if (!hit) return null;
|
||
const usdRate = lookupUsdRate(hit.currencyNative, fxRates);
|
||
if (usdRate == null) {
|
||
console.warn(`[seed-sovereign-wealth] ${fund.country}:${fund.fund} infobox currency ${hit.currencyNative} has no FX rate; skipping`);
|
||
return null;
|
||
}
|
||
return {
|
||
aum: hit.valueNative * usdRate,
|
||
aumYear: hit.aumYear,
|
||
source: 'wikipedia_infobox',
|
||
currencyNative: hit.currencyNative,
|
||
fxRate: usdRate,
|
||
};
|
||
}
|
||
|
||
// ── Aggregation ──
|
||
|
||
/**
|
||
* Pure predicate: should this manifest fund be SKIPPED from the
|
||
* SWF buffer calculation? Returns the skip reason string or null.
|
||
*
|
||
* Two skip conditions (Phase 1 §schema):
|
||
* - `excluded_overlaps_with_reserves: true` — AUM already counted
|
||
* in central-bank FX reserves (SAFE-IC, HKMA-EF). Excluding
|
||
* prevents double-counting against reserveAdequacy /
|
||
* liquidReserveAdequacy.
|
||
* - `aum_verified: false` — fund AUM not primary-source-confirmed.
|
||
* Loaded for documentation; excluded from scoring per the
|
||
* data-integrity rule (Codex Round 1 #7).
|
||
*
|
||
* Pure function — exported for tests.
|
||
*
|
||
* @param {{ classification?: { excludedOverlapsWithReserves?: boolean }, aumVerified?: boolean }} fund
|
||
* @returns {'excluded_overlaps_with_reserves' | 'aum_unverified' | null}
|
||
*/
|
||
export function shouldSkipFundForBuffer(fund) {
|
||
if (fund?.classification?.excludedOverlapsWithReserves === true) {
|
||
return 'excluded_overlaps_with_reserves';
|
||
}
|
||
if (fund?.aumVerified === false) {
|
||
return 'aum_unverified';
|
||
}
|
||
return null;
|
||
}
|
||
|
||
/**
|
||
* Pure helper: apply the `aum_pct_of_audited` multiplier to a
|
||
* resolved AUM value. When the fund's classification has no
|
||
* `aum_pct_of_audited`, returns the AUM unchanged.
|
||
*
|
||
* Used for fund-of-funds split entries (e.g. KIA-GRF is ~5% of the
|
||
* audited KIA total; KIA-FGF is ~95%).
|
||
*
|
||
* Pure function — exported for tests.
|
||
*
|
||
* @param {number} resolvedAumUsd
|
||
* @param {{ classification?: { aumPctOfAudited?: number } }} fund
|
||
* @returns {number}
|
||
*/
|
||
export function applyAumPctOfAudited(resolvedAumUsd, fund) {
|
||
const pct = fund?.classification?.aumPctOfAudited;
|
||
if (typeof pct === 'number' && pct > 0 && pct <= 1) {
|
||
return resolvedAumUsd * pct;
|
||
}
|
||
return resolvedAumUsd;
|
||
}
|
||
|
||
async function fetchFundAum(fund, wikipediaCache, fxRates) {
|
||
// Source priority: official → IFSWF → Wikipedia list → Wikipedia
|
||
// per-fund infobox. Short-circuit on first non-null return so the
|
||
// highest-confidence source wins. The infobox sub-tier is last
|
||
// because it is per-fund fetch (N network round-trips, one per fund
|
||
// that misses the list article) — amortizing over the list article
|
||
// cache first minimizes live traffic.
|
||
const official = await fetchOfficialDisclosure(fund);
|
||
if (official) return official;
|
||
const ifswf = await fetchIfswfFiling(fund);
|
||
if (ifswf) return ifswf;
|
||
const wikipediaList = await fetchWikipediaRanking(fund, wikipediaCache);
|
||
if (wikipediaList) return wikipediaList;
|
||
const wikipediaInfobox = await fetchWikipediaInfobox(fund, fxRates);
|
||
if (wikipediaInfobox) return wikipediaInfobox;
|
||
return null;
|
||
}
|
||
|
||
// Build the fxSymbols map getSharedFxRates expects. We request every
|
||
// currency the infobox parser can reasonably surface — this is a
|
||
// superset of what any single seed run will need, but it keeps the
|
||
// shared Redis FX cache warm for other seeders and costs one Yahoo
|
||
// fetch per uncached ccy. The set matches CURRENCY_SYMBOL_TO_ISO.
|
||
function buildFxSymbolsForSwf() {
|
||
const ccys = new Set(CURRENCY_SYMBOL_TO_ISO.map(([, iso]) => iso));
|
||
const symbols = {};
|
||
for (const ccy of ccys) {
|
||
if (ccy === 'USD') continue;
|
||
symbols[ccy] = `${ccy}USD=X`;
|
||
}
|
||
return symbols;
|
||
}
|
||
|
||
/**
|
||
* Net-imports denominator transformation for the SWF rawMonths
|
||
* calculation.
|
||
*
|
||
* netImports = grossImports × (1 − reexportShareOfImports)
|
||
*
|
||
* For countries without a re-export adjustment (reexportShareOfImports = 0),
|
||
* netImports === grossImports — status-quo behaviour.
|
||
*
|
||
* For re-export hubs, the fraction of gross imports that flows through
|
||
* as re-exports does not represent domestic consumption, so the SWF's
|
||
* "months of imports covered" should be measured against the RESIDUAL
|
||
* import stream that actually settles.
|
||
*
|
||
* Exported for unit tests that pin the denominator math independently
|
||
* of live-API fixtures.
|
||
*
|
||
* @param {number} grossImportsUsd Total annual imports in USD (WB NE.IMP.GNFS.CD)
|
||
* @param {number} reexportShareOfImports 0..1 inclusive; 0 = no adjustment
|
||
* @returns {number} Net annual imports in USD
|
||
*/
|
||
export function computeNetImports(grossImportsUsd, reexportShareOfImports) {
|
||
if (!Number.isFinite(grossImportsUsd) || grossImportsUsd <= 0) {
|
||
throw new Error(`computeNetImports: grossImportsUsd must be positive finite, got ${grossImportsUsd}`);
|
||
}
|
||
const share = Number.isFinite(reexportShareOfImports) ? reexportShareOfImports : 0;
|
||
if (share < 0 || share >= 1) {
|
||
throw new Error(`computeNetImports: reexportShareOfImports must be in [0, 1), got ${share}`);
|
||
}
|
||
return grossImportsUsd * (1 - share);
|
||
}
|
||
|
||
export async function fetchSovereignWealth() {
|
||
const manifest = loadSwfManifest();
|
||
// Re-export share: per-country fraction of gross imports that flow
|
||
// through as re-exports without settling as domestic consumption.
|
||
// Sourced from Comtrade via the sibling Reexport-Share seeder that
|
||
// runs immediately before this one inside the resilience-recovery
|
||
// bundle. loadReexportShareFromRedis() enforces bundle-run freshness
|
||
// — if the sibling's seed-meta predates this bundle's start, all
|
||
// countries fall back to gross imports (hard fail-safe). Countries
|
||
// not in the returned map get netImports = grossImports (status-quo
|
||
// behaviour). Absence MUST NOT throw or zero the denominator.
|
||
const reexportShareByCountry = await loadReexportShareFromRedis();
|
||
const [imports, wikipediaCache, fxRates] = await Promise.all([
|
||
fetchAnnualImportsUsd(),
|
||
loadWikipediaRankingsCache(),
|
||
getSharedFxRates(buildFxSymbolsForSwf(), SHARED_FX_FALLBACKS),
|
||
]);
|
||
|
||
const countries = {};
|
||
const sourceMix = { official: 0, ifswf: 0, wikipedia_list: 0, wikipedia_infobox: 0 };
|
||
const unmatched = [];
|
||
// Provenance audit for the cohort-sanity report: which countries had a
|
||
// net-imports adjustment applied, and by how much. Keeps the scorer
|
||
// transparent about where denominators diverge from gross imports.
|
||
const reexportAdjustments = [];
|
||
|
||
for (const [iso2, funds] of groupFundsByCountry(manifest)) {
|
||
const importsEntry = imports[iso2];
|
||
if (!importsEntry) {
|
||
// WB `NE.IMP.GNFS.CD` missing for this country (transient outage
|
||
// or a country with spotty WB coverage). Silently dropping would
|
||
// let the downstream scorer interpret the absence as "no SWF" and
|
||
// score 0 with full coverage — substantively wrong. Log it
|
||
// loudly and surface via the unmatched list so the seed-meta
|
||
// observer can alert.
|
||
console.warn(`[seed-sovereign-wealth] ${iso2} skipped: World Bank imports (${IMPORTS_INDICATOR}) missing — cannot compute rawMonths denominator`);
|
||
for (const fund of funds) unmatched.push(`${fund.country}:${fund.fund} (no WB imports)`);
|
||
continue;
|
||
}
|
||
|
||
// PR 3A net-imports denominator. For re-export hubs (UNCTAD-cited
|
||
// entries in the manifest), replace the gross-imports denominator
|
||
// with net imports via `computeNetImports`. Countries without a
|
||
// manifest entry get grossImports unchanged (share=0 → identity).
|
||
const reexportEntry = reexportShareByCountry.get(iso2);
|
||
const reexportShare = reexportEntry?.reexportShareOfImports ?? 0;
|
||
const denominatorImports = computeNetImports(importsEntry.importsUsd, reexportShare);
|
||
if (reexportShare > 0) {
|
||
reexportAdjustments.push({
|
||
country: iso2,
|
||
grossImportsUsd: importsEntry.importsUsd,
|
||
reexportShareOfImports: reexportShare,
|
||
netImportsUsd: denominatorImports,
|
||
sourceYear: reexportEntry?.year ?? null,
|
||
});
|
||
}
|
||
|
||
const fundRecords = [];
|
||
for (const fund of funds) {
|
||
const skipReason = shouldSkipFundForBuffer(fund);
|
||
if (skipReason) {
|
||
console.log(`[seed-sovereign-wealth] ${fund.country}:${fund.fund} skipped — ${skipReason}`);
|
||
continue;
|
||
}
|
||
|
||
// AUM resolution: prefer manifest-provided primary-source AUM
|
||
// when verified; fall back to the existing Wikipedia/IFSWF
|
||
// resolution chain otherwise (existing entries that pre-date
|
||
// the schema extension still work unchanged).
|
||
let aum = null;
|
||
if (fund.aumVerified === true && typeof fund.aumUsd === 'number') {
|
||
aum = { aum: fund.aumUsd, aumYear: fund.aumYear ?? null, source: 'manifest_primary' };
|
||
} else {
|
||
aum = await fetchFundAum(fund, wikipediaCache, fxRates);
|
||
}
|
||
if (!aum) {
|
||
unmatched.push(`${fund.country}:${fund.fund}`);
|
||
continue;
|
||
}
|
||
|
||
const adjustedAum = applyAumPctOfAudited(aum.aum, fund);
|
||
const aumPct = fund.classification?.aumPctOfAudited;
|
||
sourceMix[aum.source] = (sourceMix[aum.source] ?? 0) + 1;
|
||
|
||
const { access, liquidity, transparency } = fund.classification;
|
||
const rawMonths = (adjustedAum / denominatorImports) * 12;
|
||
const effectiveMonths = rawMonths * access * liquidity * transparency;
|
||
|
||
fundRecords.push({
|
||
fund: fund.fund,
|
||
aum: adjustedAum,
|
||
aumYear: aum.aumYear,
|
||
source: aum.source,
|
||
...(aumPct != null ? { aumPctOfAudited: aumPct } : {}),
|
||
access,
|
||
liquidity,
|
||
transparency,
|
||
rawMonths,
|
||
effectiveMonths,
|
||
});
|
||
}
|
||
|
||
if (fundRecords.length === 0) continue;
|
||
const totalEffectiveMonths = fundRecords.reduce((s, f) => s + f.effectiveMonths, 0);
|
||
// Completeness denominator excludes funds that were INTENTIONALLY
|
||
// skipped from buffer scoring (excluded_overlaps_with_reserves OR
|
||
// aum_verified=false). Without this, manifest entries that exist
|
||
// for documentation only would artificially depress completeness
|
||
// for countries with mixed scorable + non-scorable funds — e.g.
|
||
// UAE (4 scorable + EIA unverified) would show completeness=0.8
|
||
// even when every scorable fund matched, and CN (CIC + NSSF
|
||
// scorable + SAFE-IC excluded) would show 0.67.
|
||
//
|
||
// The right denominator is "scorable funds for this country":
|
||
// funds where shouldSkipFundForBuffer returns null. Documentation-
|
||
// only entries are neither matched nor expected; they don't appear
|
||
// in the ratio at all.
|
||
const scorableFunds = funds.filter((f) => shouldSkipFundForBuffer(f) === null);
|
||
const expectedFunds = scorableFunds.length;
|
||
const matchedFunds = fundRecords.length;
|
||
const completeness = expectedFunds > 0 ? matchedFunds / expectedFunds : 0;
|
||
// `completeness` signals partial-seed on multi-fund countries (AE,
|
||
// SG). Downstream scorer must derate the country when completeness
|
||
// < 1.0 — silently emitting partial totalEffectiveMonths would
|
||
// under-rank countries whose secondary fund transiently drifted on
|
||
// Wikipedia. The country stays in the payload (so the scorer can
|
||
// use the partial number for IMPUTE-level coverage), but only
|
||
// completeness=1.0 countries count toward recordCount / health.
|
||
if (completeness < 1.0) {
|
||
console.warn(`[seed-sovereign-wealth] ${iso2} partial: ${matchedFunds}/${expectedFunds} scorable funds matched — completeness=${completeness.toFixed(2)}`);
|
||
}
|
||
countries[iso2] = {
|
||
funds: fundRecords,
|
||
totalEffectiveMonths,
|
||
// `annualImports` preserved for backwards compatibility + audit.
|
||
// `denominatorImports` (post-PR-3A) is the value ACTUALLY used in
|
||
// rawMonths math. For countries without a re-export adjustment
|
||
// the two are identical; for UNCTAD-cited re-export hubs the
|
||
// latter is smaller.
|
||
annualImports: importsEntry.importsUsd,
|
||
denominatorImports,
|
||
reexportShareOfImports: reexportShare,
|
||
expectedFunds,
|
||
matchedFunds,
|
||
completeness,
|
||
};
|
||
}
|
||
|
||
if (unmatched.length > 0) {
|
||
console.warn(`[seed-sovereign-wealth] ${unmatched.length} fund(s) unmatched across all tiers: ${unmatched.join(', ')}`);
|
||
}
|
||
|
||
const summary = buildCoverageSummary(manifest, imports, countries);
|
||
console.log(`[seed-sovereign-wealth] manifest coverage: ${summary.matchedFunds}/${summary.expectedFunds} funds across ${summary.expectedCountries} countries`);
|
||
for (const row of summary.countryStatuses) {
|
||
const tag = row.status === 'complete' ? 'OK ' : row.status === 'partial' ? 'PART' : 'MISS';
|
||
const extra = row.reason ? ` — ${row.reason}` : '';
|
||
console.log(`[seed-sovereign-wealth] ${tag} ${row.country} ${row.matched}/${row.expected}${extra}`);
|
||
}
|
||
|
||
if (reexportAdjustments.length > 0) {
|
||
console.log(`[seed-sovereign-wealth] re-export adjustment applied to ${reexportAdjustments.length} country/countries:`);
|
||
for (const adj of reexportAdjustments) {
|
||
console.log(`[seed-sovereign-wealth] ${adj.country} share=${adj.reexportShareOfImports.toFixed(2)} gross=$${(adj.grossImportsUsd / 1e9).toFixed(1)}B net=$${(adj.netImportsUsd / 1e9).toFixed(1)}B (source year ${adj.sourceYear ?? 'n/a'})`);
|
||
}
|
||
} else {
|
||
console.log(`[seed-sovereign-wealth] re-export manifest is empty; all countries use gross imports as the rawMonths denominator (status-quo behaviour)`);
|
||
}
|
||
|
||
const usedWikipedia = sourceMix.wikipedia_list + sourceMix.wikipedia_infobox > 0;
|
||
return {
|
||
countries,
|
||
seededAt: new Date().toISOString(),
|
||
manifestVersion: manifest.manifestVersion,
|
||
sourceMix,
|
||
sourceAttribution: {
|
||
wikipedia: usedWikipedia ? WIKIPEDIA_SOURCE_ATTRIBUTION : undefined,
|
||
},
|
||
summary,
|
||
// PR 3A §net-imports. Published for downstream audit (cohort-
|
||
// sanity release-gate + operator verification). Empty array means
|
||
// the re-export manifest has no entries yet; follow-up PRs populate
|
||
// it with UNCTAD-cited shares per country.
|
||
reexportAdjustments,
|
||
};
|
||
}
|
||
|
||
/**
|
||
* Manifest-vs-seeded coverage summary. Exported so the enumeration logic
|
||
* is unit-testable — previously, a country that failed (no WB imports +
|
||
* no Wikipedia match) disappeared silently unless a log line happened to
|
||
* emit on the specific code path. This function guarantees every
|
||
* manifest country appears with an explicit status and reason.
|
||
*
|
||
* @param {{ funds: Array<{ country: string, fund: string }> }} manifest
|
||
* @param {Record<string, unknown>} imports Per-country import entries from pickLatestPerCountry
|
||
* @param {Record<string, { matchedFunds: number, expectedFunds: number, completeness: number }>} countries Seeded country payload
|
||
*/
|
||
export function buildCoverageSummary(manifest, imports, countries) {
|
||
// Coverage denominator excludes manifest entries that are
|
||
// documentation-only by design — funds with
|
||
// `excluded_overlaps_with_reserves: true` (SAFE-IC, HKMA-EF) or
|
||
// `aum_verified: false` (EIA). Counting them as "expected" would
|
||
// depress the headline coverage ratio for countries with mixed
|
||
// scorable + non-scorable fund rosters. Same fix as the per-country
|
||
// completeness denominator above; see comment there.
|
||
const scorableManifestFunds = manifest.funds.filter((f) => shouldSkipFundForBuffer(f) === null);
|
||
const expectedFundsTotal = scorableManifestFunds.length;
|
||
const expectedCountries = new Set(scorableManifestFunds.map((f) => f.country));
|
||
let matchedFundsTotal = 0;
|
||
for (const entry of Object.values(countries)) matchedFundsTotal += entry.matchedFunds;
|
||
// Every status carries a `reason` field so downstream consumers that
|
||
// iterate the persisted countryStatuses can safely dereference `.reason`
|
||
// without defensive checks. `complete` and `partial` use `null` to make
|
||
// the shape uniform; `missing` carries a human-readable string naming
|
||
// which upstream the operator should investigate (WB imports vs
|
||
// Wikipedia fund match).
|
||
const countryStatuses = [];
|
||
for (const iso2 of expectedCountries) {
|
||
const entry = countries[iso2];
|
||
if (entry && entry.completeness === 1.0) {
|
||
countryStatuses.push({ country: iso2, status: 'complete', matched: entry.matchedFunds, expected: entry.expectedFunds, reason: null });
|
||
} else if (entry) {
|
||
countryStatuses.push({ country: iso2, status: 'partial', matched: entry.matchedFunds, expected: entry.expectedFunds, reason: null });
|
||
} else {
|
||
const reason = imports[iso2] ? 'no fund AUM matched' : 'missing WB imports';
|
||
countryStatuses.push({
|
||
country: iso2,
|
||
status: 'missing',
|
||
matched: 0,
|
||
expected: countManifestFundsForCountry(manifest, iso2),
|
||
reason,
|
||
});
|
||
}
|
||
}
|
||
countryStatuses.sort((a, b) => a.country.localeCompare(b.country));
|
||
return {
|
||
expectedCountries: expectedCountries.size,
|
||
expectedFunds: expectedFundsTotal,
|
||
matchedCountries: Object.keys(countries).length,
|
||
matchedFunds: matchedFundsTotal,
|
||
countryStatuses,
|
||
};
|
||
}
|
||
|
||
function countManifestFundsForCountry(manifest, iso2) {
|
||
// Counts SCORABLE funds for the given country (excludes documentation-
|
||
// only entries: `excluded_overlaps_with_reserves: true` and
|
||
// `aum_verified: false`). Used by buildCoverageSummary's missing-
|
||
// country path so the "expected" figure on a missing country reflects
|
||
// what the seeder would actually try to score, not all manifest
|
||
// entries.
|
||
let n = 0;
|
||
for (const f of manifest.funds) {
|
||
if (f.country !== iso2) continue;
|
||
if (shouldSkipFundForBuffer(f) !== null) continue;
|
||
n++;
|
||
}
|
||
return n;
|
||
}
|
||
|
||
export function validate(data) {
|
||
// Tier 3 (Wikipedia) is now live; expected floor = 1 country once any
|
||
// manifest fund matches. We keep the floor lenient (>=0) during the
|
||
// first Railway-cron bake-in window so a transient Wikipedia fetch
|
||
// failure does not poison seed-meta for 30 days (see
|
||
// feedback_strict_floor_validate_fail_poisons_seed_meta.md). Once
|
||
// the seeder has ~7 days of clean runs, tighten to `>= 1`.
|
||
//
|
||
// Strict null check: `typeof null === 'object'` is true in JS, so a
|
||
// bare `typeof x === 'object'` would let `{ countries: null }` through
|
||
// and downstream consumers would crash on property access. Accept
|
||
// only a non-null plain object.
|
||
const c = data?.countries;
|
||
return c != null && typeof c === 'object' && !Array.isArray(c);
|
||
}
|
||
|
||
// Health-facing record count. Counts ONLY fully-matched countries
|
||
// (completeness === 1.0), so a scraper drift on a secondary fund (e.g.
|
||
// Mubadala while ADIA still matches, or Temasek while GIC still matches)
|
||
// drops the recordCount seed-health signal — catching the partial-seed
|
||
// silent-corruption class that an "any country that has any fund"
|
||
// count would miss. Per-country completeness stays in the payload for
|
||
// the scorer to derate; recordCount is the operational alarm.
|
||
export function declareRecords(data) {
|
||
const countries = data?.countries ?? {};
|
||
let fully = 0;
|
||
for (const entry of Object.values(countries)) {
|
||
if (entry?.completeness === 1.0) fully++;
|
||
}
|
||
return fully;
|
||
}
|
||
|
||
if (process.argv[1]?.endsWith('seed-sovereign-wealth.mjs')) {
|
||
runSeed('resilience', 'recovery:sovereign-wealth', CANONICAL_KEY, fetchSovereignWealth, {
|
||
validateFn: validate,
|
||
ttlSeconds: CACHE_TTL_SECONDS,
|
||
sourceVersion: `swf-manifest-v1-${new Date().getFullYear()}`,
|
||
// Health-facing recordCount delegates to declareRecords so the
|
||
// seed-meta record_count stays consistent with the operational
|
||
// alarm (only countries whose manifest funds all matched count).
|
||
recordCount: declareRecords,
|
||
declareRecords,
|
||
schemaVersion: 1,
|
||
maxStaleMin: 86400,
|
||
// Empty payload is still acceptable while tiers 1/2 are stubbed
|
||
// and any transient Wikipedia outage occurs; downstream IMPUTE
|
||
// path handles it.
|
||
emptyDataIsFailure: false,
|
||
}).catch((err) => {
|
||
const _cause = err.cause ? ` (cause: ${err.cause.message || err.cause.code || err.cause})` : '';
|
||
console.error('FATAL:', (err.message || err) + _cause);
|
||
process.exit(1);
|
||
});
|
||
}
|