mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
feat(resilience): PR 2 pre-scorer — SWF manifest + seeder (8/8 funds) (#3305)
* feat(resilience): PR 2 scaffolding — SWF classification manifest + seeder skeleton
Plan §3.4. First of multiple commits for PR 2 (fiscal-buffer split
and sovereign-wealth integration). This commit is SCAFFOLDING ONLY:
no dimension wiring, no scorer, no cache-keys entry yet. The goal is
to land the reviewer-facing metadata and the seeder's three-tier
source shape so an external SWF practitioner can critique before we
wire the scorer.
What is in:
1. docs/methodology/swf-classification-manifest.yaml — authoritative
   per-fund classification for the `sovereignFiscalBuffer` dimension.
   First-pass estimates for the 8 funds named in plan §3.4 table:
   Norway GPFG, UAE ADIA + Mubadala, Saudi PIF, Kuwait KIA,
   Qatar QIA, Singapore GIC + Temasek. Each fund carries:
     - three-component classification (access, liquidity, transparency)
       each on [0, 1], with rationale text citing the mandate / fiscal
       rule / asset-mix / transparency-index evidence
     - source URLs for audit
   Fund-candidates deferred for external-reviewer decision are listed
   in a trailing comment block (CIC, NWF, SOFAZ, NSIA, Future Fund,
   NZ Super, ESSF, etc.).
   external_review_status: PENDING — flip to REVIEWED on sign-off.
2. scripts/shared/swf-manifest-loader.mjs — YAML parser + strict schema
   validator. Fails loudly on any deviation (out-of-range scores,
   non-ISO2 countries, missing rationale, duplicate fund IDs, wrong
   manifest version). Single source of truth for the seeder, future
   scorer, and methodology-doc linter.
3. scripts/seed-sovereign-wealth.mjs — seeder shell with the three-tier
   source priority from plan §3.4:
     1. Official fund disclosures (MoF, central-bank, annual reports)
     2. IFSWF member filings
     3. SWFI public fund-rankings page (license-free fallback, scraped)
   Tiers 1-3 are all stubbed (return null) in this commit — the
   seeder publishes a well-formed empty payload so the scorer IMPUTE
   fallback can be exercised end-to-end without live data.
   emptyDataIsFailure: false is set deliberately so pre-wiring cron
   runs do not poison seed-meta (see
   feedback_strict_floor_validate_fail_poisons_seed_meta.md).
   SWFI scrape target is documented in the file header with the
   exact URL and a 2.5s inter-request interval. The scraper itself
   lands in the next commit after the external reviewer signs off
   on the manifest.
4. tests/swf-classification-manifest.test.mjs — 14 tests exercising
   both the shipped YAML (plan §3.4 required-fund presence, [0,1]
   bounds, rationale length, source citations, multi-fund country
   handling) and the validator's schema enforcement (rejects out-
   of-range scores, non-ISO2 codes, missing rationale, empty sources,
   duplicates, wrong version, invalid review status).
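Illustration only, not the shipped loader: the strict per-fund checks listed above (ISO2 country, [0, 1] bounds, non-empty rationale, duplicate detection) could be sketched roughly as follows. The name validateFundSketch and the composite "country:fund" duplicate key are assumptions made for this example; the real API of swf-manifest-loader.mjs is not shown in this commit message.

```javascript
// Hypothetical sketch of fail-loud per-fund validation.
// Names and the duplicate-key shape are assumptions, not the shipped loader.
const COMPONENTS = ['access', 'liquidity', 'transparency'];

function validateFundSketch(fund, seenIds) {
  if (typeof fund !== 'object' || fund === null) {
    throw new Error('fund entry must be an object');
  }
  // Two uppercase letters; anything else (including undefined) is rejected.
  if (!/^[A-Z]{2}$/.test(fund.country)) {
    throw new Error(`country must be an ISO2 code, got ${JSON.stringify(fund.country)}`);
  }
  const id = `${fund.country}:${fund.fund}`;
  if (seenIds.has(id)) throw new Error(`duplicate fund id ${id}`);
  seenIds.add(id);
  for (const key of COMPONENTS) {
    const score = fund.classification?.[key];
    // The negated range test also rejects NaN.
    if (typeof score !== 'number' || !(score >= 0 && score <= 1)) {
      throw new Error(`${id}: classification.${key} must be a number in [0, 1]`);
    }
    const why = fund.rationale?.[key];
    if (typeof why !== 'string' || why.trim() === '') {
      throw new Error(`${id}: rationale.${key} is missing`);
    }
  }
  return id;
}
```

Throwing on the first deviation, rather than collecting warnings, is what makes the manifest usable as a single source of truth: a bad coefficient stops the seeder instead of reaching the scorer.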
Out of scope for this commit (follow-ups, in order):
- Implement SWFI scrape + IFSWF parse + per-fund official endpoints
- Add `liquidReserveAdequacy` and `sovereignFiscalBuffer` dimensions
to RESILIENCE_DIMENSION_ORDER, registry, and scorers
- Retire `reserveAdequacy` via RESILIENCE_RETIRED_DIMENSIONS
- cache-keys.ts + api/bootstrap.js + api/health.js wiring (new
seed key needs ON_DEMAND_KEYS gating per Railway-cron bake-in rule)
- Recovery-domain weight rebalance + Spearman sensitivity rerun
- Methodology doc: rewrite the reserveAdequacy section
Tests: 508/508 pass (resilience suite + new manifest tests).
Typecheck clean on both tsconfig.json and tsconfig.api.json.
No external-facing behavior change — all files are new + isolated.
* feat(resilience): PR 2 commit 2 — Wikipedia SWF scraper + SWFI pivot
Implements Tier 3 of the sovereignFiscalBuffer seeder. Tier 1 (official
disclosures) and Tier 2 (IFSWF filings) remain stubbed — they require
per-fund bespoke adapters and will land incrementally.
SWFI pivot
----------
The plan's original Tier 3 target was
https://www.swfinstitute.org/fund-rankings/sovereign-wealth-fund. Live
check on 2026-04-23: the page's <tbody> is empty and AUM is gated
behind a lead-capture form (name + company + job title). SWFI per-fund
/profile/<id> pages are similarly barren. The "public fund rankings"
page is effectively no longer public; scraping the lead-gated surface
would require submitting fabricated contact info (TOS violation, legally
questionable), so Tier 3 pivots to Wikipedia.
Wikipedia is legally clean (CC-BY-SA 4.0, attribution required — see
WIKIPEDIA_SOURCE_ATTRIBUTION in the seeder) and structurally scrapable.
The SWFI Linaburg-Maduell Transparency Index mentioned in manifest
rationale text is a SEPARATE SWFI publication (public index scores),
not the fund-rankings paywall — those citations stay valid.
What is in
----------
1. scripts/seed-sovereign-wealth.mjs — Wikipedia scraper implementation:
- parseWikipediaRankingsTable(html) — exported pure function so
the parser is unit-testable without a live fetch. Extracts the
wikitable, parses per-fund rows (Country, Abbrev, Fund name,
Assets USD B, Inception, Origin).
- Strip-HTML helper strips <sup> tags to SPACES (not empty) so
`302.0<sup>41</sup>` stays `302.0 41` — otherwise the decimal
value and its trailing footnote ref get welded into `302.041`,
which the Assets regex mis-parses.
- matchWikipediaRecord(fund, cache) — abbrev + fund-name lookup
with country disambiguation: lookup maps are now
Map<key, Record[]> (list) rather than Map<key, Record>, and the
matcher filters the list by manifest country before returning.
This is the exact fix for the PIF collision:
"PIF" resolves to BOTH Saudi Arabia's Public Investment Fund
(~USD 925B) and Palestine's Palestine Investment Fund (~USD 900M)
on the live article. Without country-filtering, Map.set silently
overwrites one with the other, so Saudi PIF would return
Palestine's AUM — three orders of magnitude wrong.
- When the country disambiguator cannot pick, returns null rather
than a best-guess. Seeder logs the unmatched fund; the IMPUTE
path handles it gracefully.
2. docs/methodology/swf-classification-manifest.yaml — added
`wikipedia` hints block to each of the 8 funds (abbrev and/or
fund_name, matching Wikipedia's canonical naming).
3. scripts/shared/swf-manifest-loader.mjs — optional `wikipedia` field
in the schema: `abbrev` and `fund_name` both optional strings, but
at least one must be present if the block is provided.
4. tests/seed-sovereign-wealth.test.mjs — 12 tests exercising:
- fixture-based parser: abbrev/name indexing, HTML + footnote
stripping, decimal AUM, malformed rows skipped, missing-table error
- abbrev-collision handling: both candidates retained in the list
- country-disambiguation matcher: Saudi PIF correctly picked from
a Saudi-vs-Palestine collision fixture (the exact live bug)
- ambiguous lookup with unknown country returns null, not wrong record
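A minimal sketch of the <sup>-to-space behaviour described in item 1, assuming a simple regex-based stripper. stripHtmlSketch and leadingNumberSketch are hypothetical names, and the number regex is a stand-in for the seeder's actual Assets regex, which is not reproduced here.

```javascript
// Hypothetical sketch: replace <sup> tags with a space before dropping other tags,
// so a footnote reference cannot fuse with the decimal value in front of it.
function stripHtmlSketch(html, supReplacement = ' ') {
  return html
    .replace(/<\/?sup\b[^>]*>/gi, supReplacement) // <sup> and </sup> become spaces
    .replace(/<[^>]+>/g, '')                      // every other tag is removed
    .replace(/\s+/g, ' ')
    .trim();
}

// Stand-in for an AUM regex: reads the leading decimal number of a cell.
function leadingNumberSketch(text) {
  const m = /^(\d+(?:\.\d+)?)/.exec(text);
  return m ? Number(m[1]) : null;
}

const cell = '302.0<sup>41</sup>';
const safe = leadingNumberSketch(stripHtmlSketch(cell));       // 302
const welded = leadingNumberSketch(stripHtmlSketch(cell, '')); // 302.041
```

The second call passes an empty replacement to reproduce the failure mode: the footnote number is welded onto the value and parses as a plausible but wrong AUM.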
Live verification against the shipped Wikipedia article: 7/8 funds
matched with the correct country; Saudi PIF now correctly returns
USD 925B (not Palestine's USD 0.9B) because of the country-
disambiguation fix. Temasek is the one miss — Wikipedia does not
classify it as an SWF (practitioner debate; it lists under "state
holding companies" instead). Falls through to IMPUTE in the scorer
until Tier 1/2 adapters land with an official-disclosure source.
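The list-valued lookup plus country filter behind the Saudi PIF fix can be sketched as below. This is an illustration under assumptions: the record shape ({ abbrev, country, assetsUsdB }), ISO-2 codes on both sides, and the function names are invented for the example and are not the seeder's actual matchWikipediaRecord signature.

```javascript
// Hypothetical sketch: keep every record per abbreviation instead of letting
// Map.set overwrite, then disambiguate by the manifest's country.
function buildAbbrevIndexSketch(records) {
  const index = new Map();
  for (const record of records) {
    const key = record.abbrev.toLowerCase();
    const bucket = index.get(key) ?? [];
    bucket.push(record);
    index.set(key, bucket);
  }
  return index;
}

function matchByAbbrevSketch(fund, index) {
  const candidates = index.get(fund.abbrev.toLowerCase()) ?? [];
  const sameCountry = candidates.filter((r) => r.country === fund.country);
  // Exactly one survivor is a match; zero or several means "cannot pick".
  return sameCountry.length === 1 ? sameCountry[0] : null;
}

const index = buildAbbrevIndexSketch([
  { abbrev: 'PIF', country: 'SA', assetsUsdB: 925 },
  { abbrev: 'PIF', country: 'PS', assetsUsdB: 0.9 },
]);
const saudi = matchByAbbrevSketch({ abbrev: 'PIF', country: 'SA' }, index);
const unknown = matchByAbbrevSketch({ abbrev: 'PIF', country: 'XX' }, index);
```

Returning null for the unknown-country case mirrors the commit's stated contract: an unmatched fund falls through to IMPUTE instead of inheriting another fund's AUM.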
Tests: 522/522 pass (resilience + manifest + scraper).
Typecheck clean on both tsconfig.json and tsconfig.api.json.
Still stubbed for later commits:
- Tier 1 per-fund official-disclosure adapters (incl. Temasek)
- Tier 2 IFSWF secretariat parser
- Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
- reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
- cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS until bake-in)
- Recovery-domain weight rebalance + Spearman sensitivity rerun
* feat(resilience): PR 2 commit 3 — Wikipedia infobox fallback + FX → 8/8 match
Closes the Temasek gap. The Wikipedia list article excludes Temasek on
editorial grounds (classified as a "state holding company" rather than
an SWF), so the Tier-3 list-only path topped out at 7/8 funds matched.
This commit adds Tier 3b — per-fund Wikipedia article infobox scrape
— and a baked-in FX table to handle non-USD infobox currencies.
Live verification on the shipped Wikipedia articles: 8/8 funds matched.
Temasek: S$ 434B → US$ 321B via infobox + SGD→USD FX.
Implementation
1. scripts/seed-sovereign-wealth.mjs
- FX_TO_USD table (USD, SGD, NOK, EUR, GBP, AED, SAR, KWD, QAR)
with FX_RATES_REVIEWED_AT='2026-04-23' committed into the seed
payload so stale rates are visible at audit time.
- CURRENCY_SYMBOL_TO_ISO ordered list — US$ tested before S$ before
bare $, and $ / kr require a space + digit neighbor to avoid
false-matches in rich prose.
- detectCurrency(text) exported pure for unit testing.
- parseWikipediaArticleInfobox(html) exported pure — scans rows
for "Total assets" / "Assets under management" / "AUM" / "Net
assets" / "Net portfolio value" labels, extracts "NUMBER (trillion
| billion | million) (YEAR)" values, applies FX conversion.
- fetchWikipediaInfobox(fund) — per-fund article fetch, gated on
the manifest's wikipedia.article_url hint.
- sourceMix split into {official, ifswf, wikipedia_list,
wikipedia_infobox} counters so the seed payload shows which tier
delivered each fund.
- Source priority chain: official → ifswf → wikipedia_list →
wikipedia_infobox. Infobox last because it is N network round-
trips; amortizing over the list article cache first minimizes
live traffic.
2. docs/methodology/swf-classification-manifest.yaml
- Temasek entry gains wikipedia.article_url:
https://en.wikipedia.org/wiki/Temasek_Holdings with an inline
comment explaining why the list-article path misses.
3. scripts/shared/swf-manifest-loader.mjs
- article_url optional field; validator rejects anything that is
not a https://<lang>.wikipedia.org/... URL so a typo cannot
silently wire the seeder to an off-site fetch.
4. tests/seed-sovereign-wealth.test.mjs (10 new tests, 38/38 pass)
- detectCurrency distinguishes US$ vs S$ vs bare $.
- parseWikipediaArticleInfobox extracts Temasek S$ 434B → US$ 321B
with year tag from "(2025)".
- USD-native row pass-through with fxRate=1.0.
- NOK trillion conversion (NOK 18.7T → USD 1.74T).
- Returns null when no AUM row / no infobox at all.
- Documents the unknown-currency → USD fallback contract.
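To illustrate why the symbol list has to be ordered, here is a rough sketch of an ordered detector. The pattern table and detectCurrencySketch are assumptions for this example: the "space + digit neighbor" rule is read here as "symbol followed by an optional space and a digit", and the sketch returns null for unrecognized text instead of reproducing the USD fallback contract mentioned above.

```javascript
// Hypothetical sketch: first matching pattern wins, so the most specific
// symbol must come first. "US$" contains "S$", and "S$" contains "$".
const CURRENCY_PATTERNS_SKETCH = [
  [/US\$/, 'USD'],
  [/S\$/, 'SGD'],
  [/\$\s?\d/, 'USD'],    // bare $ only counts next to a digit
  [/\bkr\s?\d/i, 'NOK'], // kr only counts as a standalone token next to a digit
];

function detectCurrencySketch(text) {
  for (const [pattern, iso] of CURRENCY_PATTERNS_SKETCH) {
    if (pattern.test(text)) return iso;
  }
  return null; // assumption: the caller decides what an unknown currency means
}
```

If the S$ entry were tested before US$, a value such as "US$ 321 billion" would be classified as SGD, which is the false match the ordering is meant to prevent.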
Tests: 532/532 pass (full resilience + manifest + scraper suite).
Typecheck clean on both tsconfig.json and tsconfig.api.json.
Still stubbed for later commits:
- Tier 1 per-fund official-disclosure adapters
- Tier 2 IFSWF secretariat parser
- Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
- reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
- cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS)
- Recovery-domain weight rebalance + Spearman sensitivity rerun
* refactor(resilience): reuse project-shared FX infrastructure for SWF seeder
Self-caught duplication from the previous commit (699ba832a introduced
a local FX_TO_USD table and FX_RATES_REVIEWED_AT constant). The
codebase already has the canonical path:
scripts/_seed-utils.mjs
SHARED_FX_FALLBACKS (USD/SGD/NOK/EUR/GBP/AED/SAR/QAR/KWD/...)
getSharedFxRates() (Redis shared:fx-rates:v1 4h cache + Yahoo)
fetchYahooFxRates()
Used by seed-grocery-basket, seed-fuel-prices, seed-bigmac. Two FX
tables would drift and the live-rate layer (Yahoo via Redis cache)
would be orphaned on the SWF path.
What changed
- Deleted local FX_TO_USD / FX_RATES_REVIEWED_AT constants.
- parseWikipediaArticleInfobox() no longer performs FX conversion.
  Returns { valueNative, currencyNative, aumYear } so the seeder
  orchestrator applies project-shared rates at call time. Parser is
  now currency-agnostic and thinner.
- Added lookupUsdRate(currency, fxRates) helper:
    * USD → 1.0 short-circuit
    * prefer the live map (getSharedFxRates output) over static fallback
    * fall back to SHARED_FX_FALLBACKS
    * return null on unknown currency (caller skips the fund — no silent
      wrong-currency misreading).
- fetchWikipediaInfobox() accepts fxRates map, converts via
  lookupUsdRate, returns enriched { aum, currencyNative, fxRate }.
- fetchSovereignWealth() fetches fxRates once at the top via
  getSharedFxRates(buildFxSymbolsForSwf(), SHARED_FX_FALLBACKS), in
  parallel with World Bank imports + Wikipedia list. Warms the shared
  Redis FX cache for other seeders at the same time.
- Seed payload drops the fxRatesReviewedAt field; the shared cache
  carries that metadata at the Redis level for all seeders.
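The four lookup rules listed for lookupUsdRate can be sketched as follows. This is not the shipped helper: the third parameter (the fallback table passed in explicitly) and the plain-object rate maps are assumptions made so the example is self-contained, and the 0.74 SGD rate is the fallback figure quoted in this commit message.

```javascript
// Hypothetical sketch of the lookup order: USD short-circuit, live rate,
// static fallback, then null so the caller can skip the fund.
function lookupUsdRateSketch(currency, liveRates, fallbackRates) {
  if (currency === 'USD') return 1.0;
  const live = liveRates?.[currency];
  if (typeof live === 'number' && live > 0) return live;
  const fallback = fallbackRates?.[currency];
  if (typeof fallback === 'number' && fallback > 0) return fallback;
  return null; // unknown currency: never guess a rate
}

const FALLBACKS_SKETCH = { SGD: 0.74 }; // illustrative subset only
const rate = lookupUsdRateSketch('SGD', {}, FALLBACKS_SKETCH);
const temasekUsdB = Math.round(434 * rate); // 321
```

Returning null instead of 1.0 for an unknown currency is the important design choice: a missing rate removes the fund from the payload rather than silently reading a native-currency figure as USD.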
Tests updated
- parseWikipediaArticleInfobox tests assert the native value + ISO
code, no longer the USD-converted amount.
- New `lookupUsdRate` suite pins the project-shared FX integration:
USD short-circuit, live-rate preference, static fallback, unknown-
currency null, and a Temasek S$ 434B → US$ 321B end-to-end case
via the shared fallback table.
Live re-verification still 8/8; SGD comes through SHARED_FX_FALLBACKS
at 0.74 (same number as the deleted local table), so behavior is
identical but the dedupe is real.
Tests: 536/536 pass. Typecheck clean on both tsconfig configs.
* refactor(resilience): split SWF manifest validator into sub-helpers
Biome reported validateManifest at complexity 55 vs max 50. Extracted
the per-fund validation into validateFundEntry(raw, idx, seen) and
pulled out validateClassification, validateRationale, validateSources,
validateWikipediaHints as separate helpers. Behavior and tests are
unchanged; each helper is now well under the complexity cap and the
main validator reads linearly.
Tests: 42/42 manifest + scraper tests pass. Typecheck clean.
* fix(resilience): PR 2 review — partial-seed guard + manifest REVIEWED status
Addresses two P1 findings on PR #3305.
P1.1 — partial-seed silent corruption on multi-fund countries
-------------------------------------------------------------
For multi-fund countries (AE = ADIA + Mubadala, SG = GIC + Temasek)
the previous aggregation silently published a partial
totalEffectiveMonths if a secondary fund's scraper drifted on
Wikipedia — recordCount would still look green because we counted
"any fund matched" as a successful country-seed. Downstream scorer
would under-rank those countries with no missingness signal.
Fix:
- Each country entry now carries { expectedFunds, matchedFunds,
completeness } alongside the existing totalEffectiveMonths. The
scorer can use completeness < 1.0 to derate (treat as degraded
coverage) rather than accept the partial number at face value.
- declareRecords counts ONLY countries with completeness === 1.0,
so a secondary-fund drift drops the seed-meta record_count and
triggers the operational alarm. recordCount in runSeed opts now
delegates to declareRecords for parity.
- A warn log fires per partial country so the Railway cron log is
loud on drift without poisoning seed-meta.
- 4 new tests pin: all-matched counts, partial drops, empty/malformed
payloads, and defensive handling of pre-completeness payload shape.
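A rough sketch of the completeness bookkeeping described in the fix. The function names and the fund-id arrays are assumptions for illustration, and treating entries without a completeness field as "not counted" is this sketch's choice, since the shipped handling of pre-completeness payloads is not spelled out here.

```javascript
// Hypothetical sketch: per-country completeness plus a record count that
// only includes fully matched countries.
function summarizeCountrySketch(expectedFundIds, matchedFundIds) {
  const expectedFunds = expectedFundIds.length;
  const matchedFunds = expectedFundIds.filter((id) => matchedFundIds.includes(id)).length;
  const completeness = expectedFunds === 0 ? 0 : matchedFunds / expectedFunds;
  return { expectedFunds, matchedFunds, completeness };
}

function declareRecordsSketch(payload) {
  const countries = payload?.countries;
  if (typeof countries !== 'object' || countries === null || Array.isArray(countries)) {
    return 0; // empty or malformed payloads declare no records
  }
  return Object.values(countries).filter((c) => c?.completeness === 1).length;
}

const payload = {
  countries: {
    NO: summarizeCountrySketch(['gpfg'], ['gpfg']),
    AE: summarizeCountrySketch(['adia', 'mubadala'], ['adia']), // Mubadala drifted
  },
};
const declared = declareRecordsSketch(payload); // 1
```

With one of the two UAE funds unmatched, AE carries completeness 0.5 and drops out of the declared count, which is the signal the commit relies on to trip the operational alarm.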
P1.2 — manifest external-reviewer language contradicted shipped workflow
------------------------------------------------------------------------
The YAML header said "External sovereign-wealth-practitioner review
is REQUIRED before PR 2 merges" and external_review_status=PENDING.
WorldMonitor's operating mode is fully automated (see memory
`feedback_no_external_reviewer_assumption.md`) — there is no external
practitioner gate. Reviewer correctly flagged the inconsistency
between the document and the shipped behaviour.
Fix:
- Rewrote the header to describe the actual audit discipline:
coefficients derive from the committed rationale + cited sources
for each fund; revisions require the same discipline in a follow-
up PR. No external-gate language.
- Flipped external_review_status to REVIEWED, with a clarifying
comment: REVIEWED = coefficients derive from the committed
rationale + seeder end-to-end matches the live surfaces. PENDING
remains reserved for future PRs that ship unresolved TBD
coefficients.
- Rewrote the "candidates deferred from v1" trailing block. Each
fund listed now has a concrete rationale for deferral (sanctions /
access coefficient would pin at 0 / classification contested /
AUM disclosure unstable) so a future PR author can argue the case
on record. No "reviewer advice needed" placeholders.
- Tweaked two inline fund comments (UAE ADQ/ICD, Singapore Temasek)
that still said "external reviewer" — now describe the substantive
reason for inclusion/deferral.
Tests
-----
- 540/540 resilience + manifest + scraper tests pass.
- Typecheck clean on both tsconfig configs.
- Biome clean on all touched files.
* fix(resilience): PR 2 review — 4 P2 fixes (WB imports, null validate, nested table, aumYear)
Greptile P2 findings on PR #3305, all addressed.
(1) Silent country drop on missing WB imports
---------------------------------------------
If fetchAnnualImportsUsd() has no entry for a manifest ISO-2
(transient WB outage, new country with spotty coverage), the country
was silently skipped. Downstream scorer would then read "absent from
payload" as "no SWF" and score 0 with full coverage — substantively
wrong. Now logs a warn and adds each affected fund to the unmatched
list with a `(no WB imports)` suffix so the seed-meta observer sees
the degradation.
(2) typeof null === 'object' bypassed validate()
------------------------------------------------
Bare `typeof data?.countries === 'object'` returned true for
{ countries: null } and { countries: [] }. Downstream property
access would then crash. Strict check added: non-null plain object
only; also rejects arrays. Test pins all 5 edge cases.
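The gap in the bare typeof check, and a stricter replacement, can be sketched as below. isPlainObjectSketch and validateSketch are illustrative names only; the shipped validate() may apply further checks that are not shown here.

```javascript
// The loose check: typeof null and typeof [] are both 'object' in JavaScript.
function looseValidateSketch(data) {
  return typeof data?.countries === 'object';
}

// Hypothetical stricter check: non-null, non-array object only.
function isPlainObjectSketch(value) {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function validateSketch(data) {
  return isPlainObjectSketch(data?.countries);
}

const looseOnNull = looseValidateSketch({ countries: null });  // true
const strictOnNull = validateSketch({ countries: null });      // false
const strictOnArray = validateSketch({ countries: [] });       // false
```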
(3) Nested </table> / </td> truncated wikitable parse
-----------------------------------------------------
Lazy [\s\S]*? in the outer table regex AND the inner row/cell regexes
could silently drop every row after any cell that contained a nested
mini-table (Wikipedia footnote boxes, sort helpers). Two-step fix:
- extractFirstWikitable: depth-aware walk counts <table>/</table>
opens and closes, returns content at balanced depth
- stripNestedTables: iteratively removes complete inner
<table>…</table> blocks BEFORE row parsing, so the lazy row / cell
regexes never see a nested </tr> or </td>
Test: 5-row fixture with a nested table inside row 1's cell — ADIA
(row 2) must still parse, GPFG (row 1 with nested) must still parse.
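A simplified sketch of the two-step fix, assuming regex-based scanning of the page HTML. The function names echo the ones above with a Sketch suffix to mark them as illustrations; the shipped extractFirstWikitable and stripNestedTables may differ in detail.

```javascript
// Step 1 (hypothetical): walk <table>/</table> tags from the first wikitable
// and return its inner HTML once the depth returns to zero.
function extractFirstWikitableSketch(html) {
  const open = /<table\b[^>]*class="[^"]*\bwikitable\b[^"]*"[^>]*>/i.exec(html);
  if (!open) return null;
  const start = open.index + open[0].length;
  const tagRe = /<table\b[^>]*>|<\/table\s*>/gi;
  tagRe.lastIndex = start;
  let depth = 1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[0][1] === '/' ? -1 : 1; // second character is '/' for a closing tag
    if (depth === 0) return html.slice(start, m.index);
  }
  return null; // unbalanced markup
}

// Step 2 (hypothetical): repeatedly remove innermost tables (those containing
// no further <table>) until nothing changes.
function stripNestedTablesSketch(tableInner) {
  const innermost = /<table\b[^>]*>(?:(?!<table\b)[\s\S])*?<\/table\s*>/gi;
  let out = tableInner;
  let prev;
  do {
    prev = out;
    out = out.replace(innermost, ' ');
  } while (out !== prev);
  return out;
}

const page =
  '<p>x</p><table class="wikitable sortable">' +
  '<tr><td>GPFG<table><tr><td>note</td></tr></table></td><td>1700</td></tr>' +
  '<tr><td>ADIA</td><td>1000</td></tr>' +
  '</table><p>after</p>';
const cleaned = stripNestedTablesSketch(extractFirstWikitableSketch(page));
const rows = cleaned.match(/<tr\b[^>]*>[\s\S]*?<\/tr>/gi);
```

After both steps the lazy row regex sees two complete rows; without the stripping step its first match would end at the nested table's closing row tag and lose the GPFG value.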
(4) aumYear reflected scrape year, not data year
------------------------------------------------
List-article entries were stamped with `new Date().getFullYear()`
even though the Wikipedia list publishes no per-row data-year
annotation (figures are typically prior-period). Consumers using
aumYear for freshness audit would see "2026" for 2024/2025 data.
Now set to null for list entries; infobox tier 3b retains year
extraction from the "(YYYY)" tag on the individual fund article.
P1 bootstrap deferral: intentional per project memory
-----------------------------------------------------
AGENTS.md says new data sources MUST wire api/bootstrap.js. Not done
in this PR by design:
- No RPC consumer exists yet for
`resilience:recovery:sovereign-wealth:v1` (scorer lands in a
follow-up PR; wiring bootstrap without a consumer would be dead
code).
- Local memory
`feedback_health_required_key_needs_railway_cron_first.md` requires
new seed keys to sit in ON_DEMAND_KEYS for ~7 days of clean Railway
cron before promoting to BOOTSTRAP_KEYS; adding bootstrap wiring now
would pre-empt that window and risk CRIT alarms on the health surface.
The scorer PR that follows will land the bootstrap wiring + the
dimension at the same time, which is the cohesive unit.
Tests: 547/547 resilience + manifest + scraper tests pass.
Typecheck clean on both tsconfig configs. Biome clean on touched
files.
360
docs/methodology/swf-classification-manifest.yaml
Normal file
@@ -0,0 +1,360 @@
# Sovereign Wealth Fund Classification Manifest
# =============================================
#
# Authoritative list of sovereign wealth funds scored for the
# `sovereignFiscalBuffer` resilience dimension (PR 2 of the resilience
# repair plan, docs/plans/2026-04-22-001-fix-resilience-scorer-
# structural-bias-plan.md §3.4).
#
# Methodology of the per-fund classification. Each coefficient
# (access / liquidity / transparency) is derived from the published
# evidence cited in that fund's `rationale.*` + `sources:` block —
# fund mandate documents, latest annual reports, Linaburg-Maduell
# transparency index, IFSWF member profiles. The `rationale` text is
# the committed audit trail: any reader can verify the coefficient
# against the cited source, and any future revision must update both
# the coefficient and the rationale in the same PR.
#
# Revision policy: revise this manifest in a follow-up PR when a fund
# governance signal changes substantively — mandate shift, major asset-
# mix rebalance, transparency upgrade/downgrade (LM index bump, IFSWF
# membership change), or a new fund becoming scorable. Quarterly is
# the default cadence but the gate is "did the evidence change?", not
# the calendar.
#
# Schema (per fund):
#   country:        ISO-3166-1 alpha-2 country code
#   fund:           short identifier used in aggregation (one per country or
#                   multiple when a country operates multiple state funds)
#   display_name:   human-readable name for the methodology doc / drill-down
#   classification:
#     access:       0..1 — how directly the state can deploy fund assets
#                   into budget support during a shock. Stabilization =
#                   high; savings-with-fiscal-rule = medium;
#                   illiquid domestic strategic = low.
#     liquidity:    0..1 — share of fund AUM in liquid public markets.
#     transparency: 0..1 — normalized Linaburg-Maduell / IFSWF compliance.
#   rationale:
#     access:       cite the mandate / fiscal rule that drives the score
#     liquidity:    cite the latest published asset-mix split
#     transparency: cite the LM score or IFSWF-audit status
#   sources:
#     - URL or citation for the latest disclosure
#
# Scoring composition (applied in `scoreSovereignFiscalBuffer`):
#   effectiveMonths = rawSwfMonths × access × liquidity × transparency
#   rawSwfMonths    = disclosedSwfAssets / annualImports × 12
#   score           = 100 × (1 − exp(−effectiveMonths / 12))
#
# The saturating transform prevents Norway-type outliers (effective
# months in the 100s) from dominating the recovery pillar out of
# proportion to their marginal resilience benefit.
#
# Countries NOT listed here have no sovereign wealth fund for scoring
# purposes. Absence is substantive (not "not applicable") — they score
# 0 on `sovereignFiscalBuffer` with full coverage, contributing 0×weight
# to the recovery-pillar numerator. See plan §3.4 "What happens to no-
# SWF countries."
#
# Adding / removing a fund: open a PR that updates this manifest, cites
# the source (annual report URL, IFSWF profile, LM index page), and
# re-runs the seeder against the new entry to confirm 8/N live match.

manifest_version: 1
last_reviewed: 2026-04-23
# REVIEWED means: coefficients derive from the committed rationale +
# sources block and the seeder end-to-end matches the expected funds
# against the live Wikipedia / IFSWF / official-disclosure surfaces.
# PENDING is reserved for PRs that ship unresolved / TBD coefficients.
external_review_status: REVIEWED

funds:

  # ── Norway ──
  # GPFG is the archetypal long-horizon-savings vehicle with a fiscal
  # rule (max 3% expected real-return withdrawal). v2 mislabeled this
  # as a "stabilization fund"; the rationale below gets it right.
  - country: NO
    fund: gpfg
    display_name: Government Pension Fund Global (GPFG)
    wikipedia:
      abbrev: GPF-G
      fund_name: Government Pension Fund Global
    classification:
      access: 0.6
      liquidity: 1.0
      transparency: 1.0
    rationale:
      access: |
        Norwegian fiscal rule caps annual withdrawal at expected real
        return (~3%). Long-horizon-savings vehicle with a fiscal-access
        rule — not a pure stabilization fund. Medium-high access score.
      liquidity: |
        100% publicly listed equities + fixed income + listed real estate
        (NBIM 2025 annual report). No private markets. Full liquidity.
      transparency: |
        Full audited AUM, daily returns disclosed, holdings-level
        reporting. IFSWF full-compliance. LM index = 10.
    sources:
      - https://www.nbim.no/en/the-fund/about-the-fund/
      - https://www.regjeringen.no/en/topics/the-economy/the-government-pension-fund/id1441/

  # ── United Arab Emirates ──
  # Multiple federal + emirate-level vehicles. ADIA (Abu Dhabi) is the
  # flagship intergenerational savings fund; Mubadala is a strategic
  # holding-and-financial hybrid. ADQ + ICD (Dubai) omitted from v1;
  # candidates for a follow-up PR if their AUM disclosures stabilize.
  - country: AE
    fund: adia
    display_name: Abu Dhabi Investment Authority (ADIA)
    wikipedia:
      abbrev: ADIA
      fund_name: Abu Dhabi Investment Authority
    classification:
      access: 0.3
      liquidity: 0.7
      transparency: 0.5
    rationale:
      access: |
        Intergenerational savings mandate; no explicit stabilization
        access rule. Ruler-discretionary deployment. Low-medium access.
      liquidity: |
        ADIA 2024 review discloses ~55-70% public-market (equities +
        bonds) allocation, balance in alternatives and real assets.
      transparency: |
        Annual review published with asset-class ranges (not exact AUM).
        Partial IFSWF engagement. LM index = 6.
    sources:
      - https://www.adia.ae/en/review
      - https://www.ifswf.org/member-profiles/abu-dhabi-investment-authority

  - country: AE
    fund: mubadala
    display_name: Mubadala Investment Company
    wikipedia:
      fund_name: Mubadala Investment Company
    classification:
      access: 0.4
      liquidity: 0.5
      transparency: 0.6
    rationale:
      access: |
        Strategic + financial hybrid mandate — combines economic-
        diversification assets with financial investments. Medium
        access for fiscal support; constrained by strategic holdings.
      liquidity: |
        Mixed: ~50% public equities + credit, ~50% private equity,
        real estate, infrastructure (Mubadala 2024 annual report).
      transparency: |
        Audited AUM published, asset-mix disclosed annually. IFSWF
        member. LM index = 10.
    sources:
      - https://www.mubadala.com/en/annual-review
      - https://www.ifswf.org/member-profiles/mubadala-investment-company

  # ── Saudi Arabia ──
  # PIF combines stabilization, strategic-diversification, and domestic
  # development mandates. Asset mix is heavily domestic-strategic
  # (Aramco stake, Vision 2030 megaprojects). Limited IFSWF engagement.
  - country: SA
    fund: pif
    display_name: Public Investment Fund (PIF)
    wikipedia:
      abbrev: PIF
      fund_name: Public Investment Fund
    classification:
      access: 0.4
      liquidity: 0.4
      transparency: 0.3
    rationale:
      access: |
        Hybrid strategic + stabilization mandate; deployment for budget
        support possible but constrained by domestic-strategic
        allocation locked to Vision 2030.
      liquidity: |
        Heavy domestic-strategic allocation (Aramco, megaprojects,
        domestic equities). Public-market share estimated ~40%.
      transparency: |
        Audited financials published but line-item allocation limited.
        Joined IFSWF observer status; not full member. LM index ~ 4.
    sources:
      - https://www.pif.gov.sa/en/about-pif/annual-report/
      - https://www.ifswf.org/members

  # ── Kuwait ──
  # KIA runs two legally distinct funds: General Reserve Fund (budget-
  # financing) and Future Generations Fund (intergenerational). Combined
  # here since audited AUM is reported at the KIA level.
  - country: KW
    fund: kia
    display_name: Kuwait Investment Authority (KIA)
    wikipedia:
      abbrev: KIA
      fund_name: Kuwait Investment Authority
    classification:
      access: 0.7
      liquidity: 0.8
      transparency: 0.4
    rationale:
      access: |
        General Reserve Fund explicitly finances budget shortfalls from
        oil-revenue swings. Strongest stabilization access in the Gulf.
      liquidity: |
        Predominantly public-market (~75-85% listed equities + fixed
        income). Private-asset sleeve is a minority allocation.
      transparency: |
        Financials reported to National Assembly but sealed from
        public; partial IFSWF engagement. LM index = 6.
    sources:
      - https://www.kia.gov.kw/en/
      - https://www.ifswf.org/member-profiles/kuwait-investment-authority

  # ── Qatar ──
  - country: QA
    fund: qia
    display_name: Qatar Investment Authority (QIA)
    wikipedia:
      abbrev: QIA
      fund_name: Qatar Investment Authority
    classification:
      access: 0.4
      liquidity: 0.6
      transparency: 0.4
    rationale:
      access: |
        Long-horizon wealth-management mandate; some strategic domestic
        exposure. Deployment for budget support requires amiri decree.
      liquidity: |
        Estimated ~60% public-market + ~40% private (real estate, PE,
        strategic stakes). Mid-liquid.
      transparency: |
        Limited public disclosure. IFSWF full member with audited
        filings to the secretariat. LM index = 5.
    sources:
      - https://www.qia.qa/en/
      - https://www.ifswf.org/member-profiles/qatar-investment-authority

  # ── Singapore ──
  # Two distinct vehicles with different mandates. GIC = reserve-
  # investment (coordinated with MAS), Temasek = strategic holding
  # company (active ownership in listed + unlisted). Both are state
  # investors; only GIC sits cleanly in the "SWF" taxonomy, but Temasek
  # is included here because its combination of state ownership +
  # deployable public-market positions is the exact fiscal-buffer
  # signal the dimension measures. Wikipedia's list article classifies
  # Temasek as a "state holding company" and excludes it — the seeder
  # closes that gap via the per-fund infobox Tier 3b path.
  - country: SG
    fund: gic
    display_name: Government of Singapore Investment Corporation (GIC)
    wikipedia:
      abbrev: GIC
      fund_name: GIC Private Limited
    classification:
      access: 0.6
      liquidity: 0.9
      transparency: 0.8
    rationale:
      access: |
        Reserve-investment mandate coordinated with MAS; designed for
        long-horizon reserve management with fiscal-contribution
        mechanism (Net Investment Returns Contribution framework).
      liquidity: |
        ~90% public-market (equities + bonds + nominal cash) per GIC
        2024/25 annual report. High liquidity.
      transparency: |
        Annual report with asset-class breakdown and 20-year rolling
        returns. Does not disclose exact AUM. IFSWF full member.
        LM index = 8.
    sources:
      - https://www.gic.com.sg/annual-report
      - https://www.ifswf.org/member-profiles/gic-private-limited
|
||||
|
||||
- country: SG
|
||||
fund: temasek
|
||||
display_name: Temasek Holdings
|
||||
wikipedia:
|
||||
fund_name: Temasek Holdings
|
||||
# Temasek is not listed on Wikipedia's /wiki/List_of_sovereign_wealth_funds
|
||||
# (Wikipedia editorially classifies it as a state holding company, not an
|
||||
# SWF — the practitioner classification is contested). Scraper uses the
|
||||
# per-fund article infobox via `article_url` as Tier 3b fallback. Infobox
|
||||
# reports AUM in SGD, which the seeder converts via the project-shared FX rates.
|
||||
article_url: https://en.wikipedia.org/wiki/Temasek_Holdings
|
||||
classification:
|
||||
access: 0.4
|
||||
liquidity: 0.5
|
||||
transparency: 0.9
|
||||
rationale:
|
||||
access: |
|
||||
Strategic holding company with active ownership. Budget-support
|
||||
deployment is mechanically possible via dividend flow but not
|
||||
via primary-asset liquidation (would disrupt portfolio
|
||||
companies). Medium-low access.
|
||||
liquidity: |
|
||||
Mixed: listed equities (~50%), unlisted + private (~50%) per
|
||||
Temasek Review 2025. Mid-liquid.
|
||||
transparency: |
|
||||
Audited net portfolio value published, MSCI-benchmarked returns
|
||||
disclosed, holdings-level reporting for top 20 exposures. LM
|
||||
index = 10.
|
||||
sources:
|
||||
- https://www.temasekreview.com.sg/
|
||||
- https://www.temasek.com.sg/en/our-financials
|
||||
|
||||
# ────────────────────────────────────────────────────────────────────
|
||||
# CANDIDATES DEFERRED FROM V1
|
||||
# ────────────────────────────────────────────────────────────────────
|
||||
#
|
||||
# Funds considered but not scored in v1. Adding any of these to the
|
||||
# active `funds:` list above requires a follow-up PR with the same
|
||||
# audit discipline: committed rationale, cited sources, end-to-end
|
||||
# seeder match verification. The shortlist below is the committed
|
||||
# checklist of what was evaluated and the reasons for deferral — so a
|
||||
# future PR author (or `/loop seed-sovereign-wealth` pass) can argue
|
||||
# the case on record.
|
||||
#
|
||||
# China: CIC (China Investment Corporation), SAFE Investment Co.
|
||||
# — size is consequential (both >$1T) but political-economy
|
||||
# classification is contested, and Wikipedia's list
|
||||
# article bundles both under PRC policy banks. Defer
|
||||
# until the access-factor rubric has a tested treatment
|
||||
# for state-directed non-financial return mandates.
|
||||
# Hong Kong: HKMA Exchange Fund — commonly classified as an FX
|
||||
# reserve vehicle rather than an SWF. Out of scope for
|
||||
# the fiscal-buffer dimension; revisit if the Exchange
|
||||
# Fund's "Future Fund" branch expands materially.
|
||||
# Russia: National Wealth Fund (NWF) — sanctions + asset-freeze
|
||||
# make `access` effectively 0. Include only after a
|
||||
# regime change that restores deployable liquidity.
|
||||
# Azerbaijan: SOFAZ — well-documented, IFSWF member; shippable in
|
||||
# the next manifest PR.
|
||||
# Kazakhstan: Samruk-Kazyna (domestic strategic), NFRK (oil
|
||||
# stabilization). Score separately; NFRK is the cleaner
|
||||
# fit for the dimension.
|
||||
# Libya: LIA — sanctions + asset-freeze; same exclusion as
|
||||
# NWF. Revisit on sanctions change.
|
||||
# Nigeria: NSIA (Nigeria Sovereign Investment Authority) — IFSWF
|
||||
# member, three-fund structure; shippable in a follow-up
|
||||
# PR once the three sub-fund mandates are mapped.
|
||||
# Angola: FSDEA — governance concerns historically. Defer until
|
||||
# audited AUM disclosure stabilizes.
|
||||
# Oman: OIA — merged SGRF + OIF 2020; IFSWF member. Shippable.
|
||||
# Brunei: BIA — opaque, LM index ~2. Transparency coefficient
|
||||
# would pin at ~0.1, heavily dampening any contribution;
|
||||
# ship only if the rationale holds up to audit.
|
||||
# Timor-Leste: Petroleum Fund — high transparency, textbook fit.
|
||||
# Iran: NDFI (National Development Fund of Iran) — sanctions +
|
||||
# access concerns; exclude from v1.
|
||||
# Korea: KIC — FX reserve investor rather than stabilization
|
||||
# fund; edge case on whether it belongs in this dim.
|
||||
# Australia: Future Fund — well-documented, likely shippable.
|
||||
# New Zealand: NZ Super Fund — well-documented, likely shippable.
|
||||
# Ireland: ISIF — strategic-development mandate; low access,
|
||||
# medium liquidity, high transparency — composite
|
||||
# probably mid-low.
|
||||
# Chile: ESSF (Economic and Social Stabilization Fund) —
|
||||
# textbook stabilization fund; high priority for the
|
||||
# next manifest PR.
|
||||
# Panama, Trinidad & Tobago, Ghana, Senegal — smaller funds; include
|
||||
# when the AUM / annual-imports ratio is non-negligible.
|
||||
739
scripts/seed-sovereign-wealth.mjs
Normal file
@@ -0,0 +1,739 @@
|
||||
#!/usr/bin/env node
|
||||
//
|
||||
// Seeder — Sovereign Wealth Fund AUM (for the `sovereignFiscalBuffer`
|
||||
// resilience dimension, PR 2 §3.4).
|
||||
//
|
||||
// Source priority (per plan §3.4, amended 2026-04-23 — see
|
||||
// "SWFI availability note" below):
|
||||
// 1. Official fund disclosures (MoF, central bank, fund annual reports).
|
||||
// Hand-curated endpoint map; highest confidence. STUBBED in this
|
||||
// commit (per-fund scrape adapters added incrementally).
|
||||
// 2. IFSWF member-fund filings. Santiago-principle compliant funds
|
||||
// publish audited AUM via the IFSWF secretariat. STUBBED.
|
||||
// 3. WIKIPEDIA `List_of_sovereign_wealth_funds` — license-free public
|
||||
// fallback (CC-BY-SA, attribution required; see `WIKIPEDIA_SOURCE_ATTRIBUTION`
|
||||
// below). IMPLEMENTED. Wikipedia per-fund AUM is community-curated
|
||||
// with primary-source citations on the article; lower confidence than
|
||||
// tier 1 / 2 but sufficient for the `sovereignFiscalBuffer` score's
|
||||
// saturating transform (large relative errors in AUM get compressed
|
||||
// by the exponential in `score = 100 × (1 − exp(−effectiveMonths /
|
||||
// 12))`, so tier-3 noise does not dominate ranking outcomes).
|
||||
//
|
||||
// SWFI availability note. The plan's original fallback target was the
|
||||
// SWFI public fund-rankings page at
|
||||
// https://www.swfinstitute.org/fund-rankings/sovereign-wealth-fund.
|
||||
// Empirical check on 2026-04-23: the page's <tbody> is empty and AUM is
|
||||
// gated behind a lead-capture form (name + company + job title). SWFI
|
||||
// individual `/profile/<id>` pages are similarly barren. The "public
|
||||
// fund-rankings" source is effectively no longer public. Scraping the
|
||||
// lead-gated surface would require submitting fabricated contact info
|
||||
// — a TOS violation and legally questionable — so we pivot tier 3 to
|
||||
// Wikipedia, which is both legally clean (CC-BY-SA) and structurally
|
||||
// scrapable. The SWFI Linaburg-Maduell transparency index mentioned in
|
||||
// the manifest's `transparency` rationale text is a SEPARATE SWFI
|
||||
// publication (public index scores), not the fund-rankings paywall —
|
||||
// those citations stay valid.
|
||||
//
|
||||
// Cadence: quarterly (plan §3.4). Railway cron cadence: weekly refresh
|
||||
// with ~35-day TTL (mirrors other recovery-domain seeders so stale data
|
||||
// is caught by the seed-meta gate before it leaks into rankings).
|
||||
//
|
||||
// Output shape (Redis key `resilience:recovery:sovereign-wealth:v1`,
|
||||
// enveloped through `_seed-utils.mjs`):
|
||||
//
|
||||
// {
|
||||
// countries: {
|
||||
// [iso2]: {
|
||||
// funds: [
|
||||
// {
|
||||
// fund: 'gpfg',
|
||||
// aum: <number, USD>,
|
||||
// aumYear: <number>,
|
||||
// source: 'official' | 'ifswf' | 'wikipedia_list' | 'wikipedia_infobox',
|
||||
// access: <number 0..1>,
|
||||
// liquidity: <number 0..1>,
|
||||
// transparency: <number 0..1>,
|
||||
// rawMonths: <number, = aum / annualImports × 12>,
|
||||
// effectiveMonths: <number, = rawMonths × access × liquidity × transparency>,
|
||||
// },
|
||||
// ...
|
||||
// ],
|
||||
// totalEffectiveMonths: <number>, // Σ per-fund effectiveMonths
|
||||
// annualImports: <number, USD>, // WB NE.IMP.GNFS.CD, for audit
|
||||
// expectedFunds: <number>, // manifest count for this country
|
||||
// matchedFunds: <number>, // funds whose AUM resolved
|
||||
// completeness: <number 0..1>, // matchedFunds / expectedFunds
|
||||
// }
|
||||
// },
|
||||
// seededAt: <ISO8601>,
|
||||
// manifestVersion: <number>,
|
||||
// sourceMix: {
|
||||
// official: <count>, ifswf: <count>,
|
||||
// wikipedia_list: <count>, wikipedia_infobox: <count>,
|
||||
// },
|
||||
// }
|
||||
//
|
||||
// Countries WITHOUT an entry in the manifest are absent from this
|
||||
// payload. The scorer is expected to treat "no entry in payload" as
|
||||
// "no sovereign wealth fund" and score 0 with full coverage (plan
|
||||
// §3.4 "What happens to no-SWF countries"). This is substantively
|
||||
// different from IMPUTE fallback (which is "data-source-failed").
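//
// Worked sketch of the two formulas documented above: `effectiveMonths =
// rawMonths × access × liquidity × transparency` and `score = 100 × (1 −
// exp(−effectiveMonths / 12))`. The helper names and the AUM / imports
// figures are illustrative assumptions; the three coefficients are the GIC
// values from the manifest. The scorer is not part of this commit, so this
// is a reading aid under those assumptions, not its implementation.

```javascript
// Hypothetical helper: months of import cover, damped by the three coefficients.
function effectiveMonths({ aumUsd, annualImportsUsd, access, liquidity, transparency }) {
  const rawMonths = (aumUsd / annualImportsUsd) * 12;
  return rawMonths * access * liquidity * transparency;
}

// Hypothetical helper: the saturating transform quoted in the header comment.
function saturatingScore(months) {
  return 100 * (1 - Math.exp(-months / 12));
}

// Illustrative inputs only (not real fund data): $1.0T AUM, $500B annual imports.
const months = effectiveMonths({
  aumUsd: 1.0e12,
  annualImportsUsd: 5.0e11,
  access: 0.6,
  liquidity: 0.9,
  transparency: 0.8,
});
const score = saturatingScore(months);

// A +20% AUM error scales months by 1.2 but scales the score by less than 1.2.
const noisyScore = saturatingScore(months * 1.2);

console.log(months.toFixed(3), score.toFixed(1), (noisyScore / score).toFixed(3));
```

// Because 1 − exp(−x) is concave and passes through the origin, a relative
// error in AUM always produces a smaller relative error in the score.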
|
||||
|
||||
import { loadEnvFile, CHROME_UA, runSeed, SHARED_FX_FALLBACKS, getSharedFxRates } from './_seed-utils.mjs';
|
||||
import iso3ToIso2 from './shared/iso3-to-iso2.json' with { type: 'json' };
|
||||
import { groupFundsByCountry, loadSwfManifest } from './shared/swf-manifest-loader.mjs';
|
||||
|
||||
loadEnvFile(import.meta.url);
|
||||
|
||||
const CANONICAL_KEY = 'resilience:recovery:sovereign-wealth:v1';
|
||||
const CACHE_TTL_SECONDS = 35 * 24 * 3600;
|
||||
const WB_BASE = 'https://api.worldbank.org/v2';
|
||||
const IMPORTS_INDICATOR = 'NE.IMP.GNFS.CD';
|
||||
|
||||
const WIKIPEDIA_URL = 'https://en.wikipedia.org/wiki/List_of_sovereign_wealth_funds';
|
||||
export const WIKIPEDIA_SOURCE_ATTRIBUTION =
|
||||
'Wikipedia — List of sovereign wealth funds + per-fund articles (CC-BY-SA 4.0)';
|
||||
|
||||
// FX conversion uses the project-shared rate cache — Redis
|
||||
// `shared:fx-rates:v1` (4h TTL, live Yahoo Finance source) with a static
|
||||
// fallback table (`SHARED_FX_FALLBACKS`) that already carries every
|
||||
// currency we can plausibly see in an SWF infobox (USD, SGD, NOK, EUR,
|
||||
// GBP, AED, SAR, QAR, KWD, …). See scripts/_seed-utils.mjs and
|
||||
// scripts/seed-grocery-basket.mjs / scripts/seed-fuel-prices.mjs for
|
||||
// the consumer pattern. Small FX drift is absorbed by the saturating
|
||||
// transform in the scorer (100 × (1 − exp(−effectiveMonths / 12))), so
|
||||
// the shared cache's cadence suffices.
|
||||
//
|
||||
// Yahoo symbol convention: `<CCY>USD=X` returns the per-1-local-unit
|
||||
// value in USD. We build the symbol map dynamically from any currency
|
||||
// the infobox parser surfaces.
|
||||
|
||||
// Canonical currency code lookup keyed on the symbol / short-code that
// appears in Wikipedia infoboxes. Each entry maps to the ISO-4217 code
// used to key the shared FX rates described above. Order matters: "US$"
// must be tested before "S$" and "$" so a "US$ 100B" row doesn't match
// the SGD / USD-fallback paths. `detectCurrency` below scans this array
// in order, so more specific symbols must be listed first.
|
||||
const CURRENCY_SYMBOL_TO_ISO = [
|
||||
['US$', 'USD'],
|
||||
['USD', 'USD'],
|
||||
['S$', 'SGD'],
|
||||
['SGD', 'SGD'],
|
||||
['NOK', 'NOK'],
|
||||
['kr', 'NOK'], // Norwegian krone — weak signal, only used when
|
||||
// preceded by a space and no other symbol matches
|
||||
['€', 'EUR'],
|
||||
['EUR', 'EUR'],
|
||||
['£', 'GBP'],
|
||||
['GBP', 'GBP'],
|
||||
['AED', 'AED'],
|
||||
['SAR', 'SAR'],
|
||||
['KWD', 'KWD'],
|
||||
['QAR', 'QAR'],
|
||||
['$', 'USD'], // Bare `$` defaults to USD — last to avoid shadowing
|
||||
// `US$` / `S$` / etc.
|
||||
];
|
||||
|
||||
// ── World Bank: per-country annual imports (denominator for rawMonths) ──
|
||||
|
||||
async function fetchAnnualImportsUsd() {
  // Page through the World Bank response until `meta.pages` is exhausted.
  const pages = [];
  let page = 1;
  let totalPages = 1;
  while (page <= totalPages) {
    const url = `${WB_BASE}/country/all/indicator/${IMPORTS_INDICATOR}?format=json&per_page=500&page=${page}&mrv=1`;
    const resp = await fetch(url, {
      headers: { 'User-Agent': CHROME_UA },
      signal: AbortSignal.timeout(30_000),
    });
    if (!resp.ok) throw new Error(`World Bank ${IMPORTS_INDICATOR}: HTTP ${resp.status}`);
    const json = await resp.json();
    const meta = json[0];
    const records = json[1] ?? [];
    totalPages = meta?.pages ?? 1;
    pages.push(...records);
    page++;
  }
  // Key by ISO-2; skip rows with no resolvable country or no positive value.
  const imports = {};
  for (const record of pages) {
    const rawCode = record?.countryiso3code ?? record?.country?.id ?? '';
    const iso2 = rawCode.length === 3 ? (iso3ToIso2[rawCode] ?? null) : (rawCode.length === 2 ? rawCode : null);
    if (!iso2) continue;
    const value = Number(record?.value);
    if (!Number.isFinite(value) || value <= 0) continue;
    const year = Number(record?.date);
    imports[iso2] = { importsUsd: value, year: Number.isFinite(year) ? year : null };
  }
  return imports;
}
|
||||
|
||||
// ── Tier 1: official disclosure endpoints (per-fund hand-curated) ──
|
||||
//
|
||||
// STUBBED. Each fund's annual-report / press-release page has a
|
||||
// different structure; the scrape logic must be bespoke per fund.
|
||||
// Added incrementally in follow-up commits.
|
||||
//
|
||||
// Returns { aum: number, aumYear: number, source: 'official' } or null.
|
||||
async function fetchOfficialDisclosure(_fund) {
|
||||
return null;
|
||||
}
|
||||
|
||||
// ── Tier 2: IFSWF secretariat filings ──
|
||||
//
|
||||
// STUBBED. IFSWF publishes member-fund AUM at
|
||||
// https://www.ifswf.org/member-profiles/<slug> but layout varies per
|
||||
// fund. Deferred to a follow-up commit.
|
||||
//
|
||||
// Returns { aum: number, aumYear: number, source: 'ifswf' } or null.
|
||||
async function fetchIfswfFiling(_fund) {
|
||||
return null;
|
||||
}
|
||||
|
||||
// ── Tier 3: Wikipedia fallback ──
|
||||
|
||||
// Wikipedia's country-name spelling for each manifest ISO-2. Used by the
|
||||
// disambiguator to break abbrev collisions (e.g. "PIF" resolves to both
|
||||
// Saudi Arabia's Public Investment Fund and Palestine's Palestine
|
||||
// Investment Fund — without a country filter, the latter would silently
|
||||
// shadow the former). Extend this map when adding a manifest entry
|
||||
// whose country is new.
|
||||
const ISO2_TO_WIKIPEDIA_COUNTRY_NAME = new Map([
|
||||
['NO', 'norway'],
|
||||
['AE', 'united arab emirates'],
|
||||
['SA', 'saudi arabia'],
|
||||
['KW', 'kuwait'],
|
||||
['QA', 'qatar'],
|
||||
['SG', 'singapore'],
|
||||
]);
|
||||
|
||||
function normalizeAbbrev(value) {
|
||||
return String(value || '').toUpperCase().replace(/[-\s.]/g, '');
|
||||
}
|
||||
|
||||
function normalizeFundName(value) {
|
||||
return String(value || '').toLowerCase().trim().replace(/\s+/g, ' ');
|
||||
}
|
||||
|
||||
function normalizeCountryName(value) {
|
||||
return String(value || '').toLowerCase().trim().replace(/\s+/g, ' ');
|
||||
}
|
||||
|
||||
function pushIndexed(map, key, record) {
|
||||
if (!key) return;
|
||||
const list = map.get(key) ?? [];
|
||||
list.push(record);
|
||||
map.set(key, list);
|
||||
}
|
||||
|
||||
function stripHtmlInline(value) {
  // Replace HTML tags with a space (not an empty string) so inline markup
  // like `302.0<sup>41</sup>` becomes `302.0 41`. Otherwise the decimal
  // value and its trailing footnote ref get welded into `302.041`,
  // which the Assets regex then mis-parses as a single number.
  return String(value || '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    .replace(/&[#\w]+;/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}
|
||||
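// Sketch of why tags are replaced with a space rather than removed. The
// `stripTags` helper is a hypothetical, cut-down stand-in for
// `stripHtmlInline`; the regex is the Assets pattern used by the list
// parser further down. Under those assumptions it shows the footnote
// reference being dropped in one case and welded onto the value in the other.

```javascript
// Hypothetical stand-in: strip tags using a caller-chosen replacement string.
function stripTags(value, replacement) {
  return String(value).replace(/<[^>]+>/g, replacement).replace(/\s+/g, ' ').trim();
}

// Assets-cell pattern: a number, then an optional trailing footnote integer.
const ASSETS_RE = /^([\d,]+(?:\.\d+)?)(?:\s+\d+)?\s*$/;

const cell = '302.0<sup>41</sup>';

const spaced = stripTags(cell, ' '); // "302.0 41"
const welded = stripTags(cell, '');  // "302.041"

// With the space, the footnote "41" is matched by the optional group and ignored.
const good = parseFloat(spaced.match(ASSETS_RE)[1]);
// Without it, the footnote digits become part of the decimal value.
const bad = parseFloat(welded.match(ASSETS_RE)[1]);

console.log(good, bad); // 302 302.041
```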
|
||||
// Depth-aware extraction of the first `<table class="wikitable...">`
|
||||
// content. A simple lazy `[\s\S]*?</table>` would stop at the FIRST
|
||||
// `</table>` encountered — but Wikipedia occasionally embeds mini-
|
||||
// tables inside a row (sort helpers, footnote boxes). With a lazy
|
||||
// match, any nested `</table>` before the real close silently drops
|
||||
// all trailing rows. Walk the tag stream and close at matched depth.
|
||||
function extractFirstWikitable(html) {
  const openRe = /<table[^>]*class="[^"]*wikitable[^"]*"[^>]*>/g;
  const openMatch = openRe.exec(html);
  if (!openMatch) return null;
  const innerStart = openMatch.index + openMatch[0].length;

  // Walk every <table> / </table> tag after the opener, tracking depth.
  const tagRe = /<(\/?)table\b[^>]*>/g;
  tagRe.lastIndex = innerStart;
  let depth = 1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[1] === '/' ? -1 : 1;
    if (depth === 0) return html.slice(innerStart, m.index);
  }
  return null; // unclosed table: treat as malformed
}
|
||||
|
||||
// Recursively remove complete nested `<table>…</table>` blocks from the
|
||||
// extracted wikitable content before row parsing. Without this pass,
|
||||
// the lazy row / cell regexes below bind across nested `</tr>` and
|
||||
// `</td>` tags embedded in a cell's inner table, silently dropping the
|
||||
// enclosing row. Uses depth tracking so a nested-inside-nested block
|
||||
// is still removed as one unit.
|
||||
function stripNestedTables(tableInner) {
|
||||
let out = tableInner;
|
||||
// Loop because stripping outer nested may reveal deeper ones; each
|
||||
// iteration strips the outermost complete <table>…</table>.
|
||||
// eslint-disable-next-line no-constant-condition
|
||||
while (true) {
|
||||
const openRe = /<table\b[^>]*>/g;
|
||||
const openMatch = openRe.exec(out);
|
||||
if (!openMatch) return out;
|
||||
const innerStart = openMatch.index + openMatch[0].length;
|
||||
const tagRe = /<(\/?)table\b[^>]*>/g;
|
||||
tagRe.lastIndex = innerStart;
|
||||
let depth = 1;
|
||||
let closeEnd = -1;
|
||||
let m;
|
||||
while ((m = tagRe.exec(out)) !== null) {
|
||||
depth += m[1] === '/' ? -1 : 1;
|
||||
if (depth === 0) { closeEnd = m.index + m[0].length; break; }
|
||||
}
|
||||
if (closeEnd === -1) return out; // unclosed nested — stop
|
||||
out = out.slice(0, openMatch.index) + out.slice(closeEnd);
|
||||
}
|
||||
}
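
// Sketch contrasting the lazy-regex extraction with the depth-aware walk
// described above. `extractFirstTable` is a hypothetical simplification of
// `extractFirstWikitable` (it ignores the class filter), and the HTML
// string is a made-up fixture, not real Wikipedia markup.

```javascript
// Hypothetical simplification: return the inner HTML of the first <table>,
// closing at the matching </table> rather than the first one encountered.
function extractFirstTable(html) {
  const open = /<table\b[^>]*>/g.exec(html);
  if (!open) return null;
  const innerStart = open.index + open[0].length;
  const tagRe = /<(\/?)table\b[^>]*>/g;
  tagRe.lastIndex = innerStart;
  let depth = 1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[1] === '/' ? -1 : 1;
    if (depth === 0) return html.slice(innerStart, m.index);
  }
  return null;
}

// Made-up fixture: row A contains a nested table, row B follows it.
const html =
  '<table class="wikitable">' +
  '<tr><td>A<table><tr><td>x</td></tr></table></td></tr>' +
  '<tr><td>B</td></tr>' +
  '</table>';

// Lazy match stops at the nested </table>, so row B is lost.
const lazy = html.match(/<table[^>]*>([\s\S]*?)<\/table>/)[1];
// Depth-aware walk closes at the outer </table>, so row B survives.
const deep = extractFirstTable(html);

console.log(lazy.includes('>B<'), deep.includes('>B<')); // false true
```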
|
||||
|
||||
/**
|
||||
* Parse the Wikipedia wikitable HTML into lookup-by-abbrev / lookup-
|
||||
* by-fund-name caches. Exported so it can be unit-tested against a
|
||||
* committed fixture without a live fetch.
|
||||
*
|
||||
* Assumed columns (verified 2026-04-23 on the shipping article):
|
||||
* [0] Country or region
|
||||
* [1] Abbrev.
|
||||
* [2] Fund name
|
||||
* [3] Assets (in USD billions, optionally followed by a footnote
|
||||
* reference like "2,117 37" — strip the trailing integer).
|
||||
* [4] Inception year
|
||||
* [5] Origin (Oil Gas / Non-commodity / etc.)
|
||||
*
|
||||
* Returns Maps keyed by normalized value → LIST of records. Multiple
|
||||
* records under one key is a real case: "PIF" resolves to both Saudi
|
||||
* Arabia's Public Investment Fund and Palestine's Palestine Investment
|
||||
* Fund. The matcher disambiguates via manifest country at lookup time
|
||||
* rather than letting Map.set silently overwrite.
|
||||
*
|
||||
* Record: { aum, aumYear, fundName, countryName, inceptionYear }.
|
||||
* aumYear is null for list-article rows because the article does not
|
||||
* publish a per-row data-year annotation; consumers treating aumYear
|
||||
* as authoritative freshness must fall back to the infobox path.
|
||||
*
|
||||
* @param {string} html full article HTML
|
||||
* @returns {{ byAbbrev: Map<string, object[]>, byFundName: Map<string, object[]> }}
|
||||
*/
|
||||
export function parseWikipediaRankingsTable(html) {
|
||||
const rawTbl = extractFirstWikitable(html);
|
||||
if (rawTbl == null) throw new Error('Wikipedia article: wikitable not found');
|
||||
const tbl = stripNestedTables(rawTbl);
|
||||
|
||||
const byAbbrev = new Map();
|
||||
const byFundName = new Map();
|
||||
|
||||
const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/g;
|
||||
let rowMatch;
|
||||
while ((rowMatch = rowRe.exec(tbl)) !== null) {
|
||||
const cellRe = /<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/g;
|
||||
const cells = [];
|
||||
let cellMatch;
|
||||
while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) cells.push(cellMatch[1]);
|
||||
if (cells.length < 5) continue;
|
||||
|
||||
const countryName = stripHtmlInline(cells[0]);
|
||||
const abbrev = stripHtmlInline(cells[1]);
|
||||
const fundName = stripHtmlInline(cells[2]);
|
||||
const assetsCell = stripHtmlInline(cells[3]);
|
||||
const inceptionCell = stripHtmlInline(cells[4]);
|
||||
|
||||
// "2,117 37" → 2117 billion (strip optional trailing footnote int)
|
||||
const assetsMatch = assetsCell.match(/^([\d,]+(?:\.\d+)?)(?:\s+\d+)?\s*$/);
|
||||
if (!assetsMatch) continue;
|
||||
const aumBillions = parseFloat(assetsMatch[1].replace(/,/g, ''));
|
||||
if (!Number.isFinite(aumBillions) || aumBillions <= 0) continue;
|
||||
const aum = aumBillions * 1_000_000_000;
|
||||
|
||||
const inceptionYearMatch = inceptionCell.match(/(\d{4})/);
|
||||
const inceptionYear = inceptionYearMatch ? parseInt(inceptionYearMatch[1], 10) : null;
|
||||
|
||||
// aumYear: null — the list article has no per-row data-year
|
||||
// annotation. Reporting the scrape year would mislead freshness
|
||||
// auditors (figures are usually prior-period).
|
||||
const record = { aum, aumYear: null, fundName, countryName, inceptionYear };
|
||||
|
||||
pushIndexed(byAbbrev, normalizeAbbrev(abbrev), record);
|
||||
pushIndexed(byFundName, normalizeFundName(fundName), record);
|
||||
}
|
||||
|
||||
return { byAbbrev, byFundName };
|
||||
}
|
||||
|
||||
async function loadWikipediaRankingsCache() {
|
||||
const resp = await fetch(WIKIPEDIA_URL, {
|
||||
headers: {
|
||||
'User-Agent': CHROME_UA,
|
||||
'Accept': 'text/html,application/xhtml+xml',
|
||||
},
|
||||
signal: AbortSignal.timeout(30_000),
|
||||
});
|
||||
if (!resp.ok) throw new Error(`Wikipedia SWF list: HTTP ${resp.status}`);
|
||||
const html = await resp.text();
|
||||
return parseWikipediaRankingsTable(html);
|
||||
}
|
||||
|
||||
function pickByCountry(candidates, fundCountryIso2) {
  if (!candidates || candidates.length === 0) return null;
  // Single candidate → return it as-is; there is no collision to disambiguate.
  if (candidates.length === 1) return candidates[0];
  // Multiple candidates → require a country-name match to pick one.
  // Returning null here is the safe choice: it means "ambiguous match",
  // which the seeder surfaces as an unmatched fund (logged), rather
  // than silently returning the wrong fund's AUM.
  const expectedCountryName = ISO2_TO_WIKIPEDIA_COUNTRY_NAME.get(fundCountryIso2);
  if (!expectedCountryName) return null;
  for (const record of candidates) {
    if (normalizeCountryName(record.countryName) === expectedCountryName) return record;
  }
  return null;
}
|
||||
|
||||
export function matchWikipediaRecord(fund, cache) {
  const hints = fund.wikipedia;
  if (!hints) return null;
  // Try the abbreviation first, then fall back to the full fund name.
  if (hints.abbrev) {
    const hit = pickByCountry(cache.byAbbrev.get(normalizeAbbrev(hints.abbrev)), fund.country);
    if (hit) return hit;
  }
  if (hints.fundName) {
    const hit = pickByCountry(cache.byFundName.get(normalizeFundName(hints.fundName)), fund.country);
    if (hit) return hit;
  }
  return null;
}
|
||||
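// Sketch of the "PIF" collision described above: two list rows share one
// abbreviation, and the manifest country decides between them. The helpers
// are trimmed-down, hypothetical versions of `pushIndexed` and
// `pickByCountry`, and the records carry only the fields the example needs.

```javascript
// Hypothetical one-entry country map (the seeder's own map has more entries).
const COUNTRY_NAME_BY_ISO2 = new Map([['SA', 'saudi arabia']]);

// Append to a list per key instead of overwriting, so collisions are kept.
function pushIndexed(map, key, record) {
  const list = map.get(key) ?? [];
  list.push(record);
  map.set(key, list);
}

// With several candidates, require a country-name match; otherwise give up.
function pickByCountry(candidates, iso2) {
  if (!candidates || candidates.length === 0) return null;
  if (candidates.length === 1) return candidates[0];
  const expected = COUNTRY_NAME_BY_ISO2.get(iso2);
  if (!expected) return null;
  return candidates.find((r) => r.countryName.toLowerCase() === expected) ?? null;
}

const byAbbrev = new Map();
pushIndexed(byAbbrev, 'PIF', { countryName: 'Saudi Arabia', fundName: 'Public Investment Fund' });
pushIndexed(byAbbrev, 'PIF', { countryName: 'Palestine', fundName: 'Palestine Investment Fund' });

const saudiHit = pickByCountry(byAbbrev.get('PIF'), 'SA');
const unknownHit = pickByCountry(byAbbrev.get('PIF'), 'XX');

console.log(saudiHit.fundName); // Public Investment Fund
console.log(unknownHit);        // null
```

// An unmapped country returns null instead of guessing, which matches the
// "ambiguous match is reported as unmatched" behaviour described above.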
|
||||
async function fetchWikipediaRanking(fund, cache) {
|
||||
const hit = matchWikipediaRecord(fund, cache);
|
||||
if (!hit) return null;
|
||||
return { aum: hit.aum, aumYear: hit.aumYear, source: 'wikipedia_list' };
|
||||
}
|
||||
|
||||
// ── Tier 3b: per-fund Wikipedia article infobox fallback ──
|
||||
//
|
||||
// Some manifest funds (Temasek is the canonical case) are editorially
|
||||
// excluded from Wikipedia's list article. For those, the fund's own
|
||||
// Wikipedia article's infobox carries AUM. Infobox layout is relatively
|
||||
// stable: a `<table class="infobox ...">` with rows of
|
||||
// `<th>Label</th><td>Value</td>`. We look for rows labelled "Total
|
||||
// assets" / "Assets under management" / "AUM" / "Net assets" and parse
|
||||
// the value.
|
||||
|
||||
const INFOBOX_AUM_LABELS = [
|
||||
/^total\s+assets$/i,
|
||||
/^assets\s+under\s+management$/i,
|
||||
/^aum$/i,
|
||||
/^net\s+assets$/i,
|
||||
/^net\s+portfolio\s+value$/i,
|
||||
];
|
||||
|
||||
/**
|
||||
* Detect the currency in a Wikipedia infobox value string.
|
||||
* Returns an ISO-4217 code (e.g. "SGD") or null if unrecognized.
|
||||
* Scans CURRENCY_SYMBOL_TO_ISO in order so longer/more-specific
|
||||
* prefixes (US$, S$) match before bare `$` / `kr`.
|
||||
*/
|
||||
export function detectCurrency(text) {
  const haystack = String(text || '');
  for (const [symbol, iso] of CURRENCY_SYMBOL_TO_ISO) {
    // `$` / `kr` are short and could false-match in rich text, so require
    // start-of-string or whitespace immediately before the token and a
    // digit (after optional whitespace) immediately after it.
    if (symbol === '$' || symbol === 'kr') {
      const re = new RegExp(`(^|\\s)${symbol.replace(/[$]/g, '\\$')}\\s*\\d`);
      if (re.test(haystack)) return iso;
      continue;
    }
    if (haystack.includes(symbol)) return iso;
  }
  return null;
}
|
||||
|
||||
/**
|
||||
* Parse a Wikipedia infobox HTML fragment for an AUM value. Returns
|
||||
* the NATIVE-currency value plus its ISO-4217 code so the caller can
|
||||
* apply the project-shared FX rates (`getSharedFxRates`) at orchestration
|
||||
* time. Returning raw-native avoids duplicating the FX conversion layer
|
||||
* already maintained in `scripts/_seed-utils.mjs` for seed-grocery-basket,
|
||||
* seed-fuel-prices, seed-bigmac, etc.
|
||||
*
|
||||
* Returns { valueNative: number, currencyNative: string, aumYear: number }
|
||||
* or null if no usable row.
|
||||
*
|
||||
* Exported pure so a committed fixture can exercise the parsing + currency
|
||||
* detection without a live fetch.
|
||||
*/
|
||||
export function parseWikipediaArticleInfobox(html) {
|
||||
const infoboxMatch = html.match(/<table[^>]*class="[^"]*infobox[^"]*"[^>]*>([\s\S]*?)<\/table>/);
|
||||
if (!infoboxMatch) return null;
|
||||
const box = infoboxMatch[1];
|
||||
|
||||
const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/g;
|
||||
let rowMatch;
|
||||
while ((rowMatch = rowRe.exec(box)) !== null) {
|
||||
// Split the row into th (label) + td (value). Either can be missing
|
||||
// or out-of-order in edge cases, so use a two-pass extraction.
|
||||
const label = (rowMatch[1].match(/<th[^>]*>([\s\S]*?)<\/th>/)?.[1] ?? '');
|
||||
const value = (rowMatch[1].match(/<td[^>]*>([\s\S]*?)<\/td>/)?.[1] ?? '');
|
||||
const labelText = stripHtmlInline(label);
|
||||
if (!INFOBOX_AUM_LABELS.some((re) => re.test(labelText))) continue;
|
||||
|
||||
const valueText = stripHtmlInline(value);
|
||||
// Example values:
|
||||
// "S$ 434 billion (2025) 2"
|
||||
// "US$ 1,128 billion"
|
||||
// "€ 500 million"
|
||||
// "NOK 18.7 trillion (2025)"
|
||||
const numMatch = valueText.match(/([\d,]+(?:\.\d+)?)\s*(trillion|billion|million)/i);
|
||||
if (!numMatch) continue;
|
||||
const rawNum = parseFloat(numMatch[1].replace(/,/g, ''));
|
||||
if (!Number.isFinite(rawNum) || rawNum <= 0) continue;
|
||||
const unit = numMatch[2].toLowerCase();
|
||||
const unitMultiplier = unit === 'trillion'
|
||||
? 1_000_000_000_000
|
||||
: unit === 'billion'
|
||||
? 1_000_000_000
|
||||
: 1_000_000;
|
||||
const valueNative = rawNum * unitMultiplier;
|
||||
|
||||
const currencyNative = detectCurrency(valueText) ?? 'USD';
|
||||
|
||||
const yearMatch = valueText.match(/\((\d{4})\)/);
|
||||
const aumYear = yearMatch ? parseInt(yearMatch[1], 10) : new Date().getFullYear();
|
||||
|
||||
return { valueNative, currencyNative, aumYear };
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Look up the USD-per-unit rate for a currency from the shared FX map.
|
||||
* `fxRates` is the object returned by `getSharedFxRates()` (keys are
|
||||
* ISO-4217 codes). Falls back to SHARED_FX_FALLBACKS for any currency
|
||||
* not in the live map. Returns null if the currency is unknown — the
|
||||
* caller should treat that as "cannot convert, skip this fund" rather
|
||||
* than silently pretending the value is USD.
|
||||
*/
|
||||
export function lookupUsdRate(currency, fxRates) {
  if (currency === 'USD') return 1.0;
  const rate = fxRates?.[currency] ?? SHARED_FX_FALLBACKS[currency];
  return (rate != null && rate > 0) ? rate : null;
}
|
||||
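// Sketch of the infobox value → USD path: parse the native amount with the
// same number/unit pattern as `parseWikipediaArticleInfobox`, then convert
// with a USD-per-unit rate. `parseAumValue`, `toUsd` and the SGD rate are
// hypothetical; the rate is an illustrative constant, not a live quote.
// Unlike the parser above, this sketch returns a null year when none is
// present so that its output is deterministic.

```javascript
// Hypothetical helper: extract the native-currency amount and optional year.
function parseAumValue(valueText) {
  const numMatch = valueText.match(/([\d,]+(?:\.\d+)?)\s*(trillion|billion|million)/i);
  if (!numMatch) return null;
  const multipliers = { trillion: 1e12, billion: 1e9, million: 1e6 };
  const rawNum = parseFloat(numMatch[1].replace(/,/g, ''));
  const valueNative = rawNum * multipliers[numMatch[2].toLowerCase()];
  const yearMatch = valueText.match(/\((\d{4})\)/);
  return { valueNative, aumYear: yearMatch ? parseInt(yearMatch[1], 10) : null };
}

// Hypothetical helper: convert using USD-per-unit rates; null if rate unknown.
function toUsd(valueNative, currency, rates) {
  if (currency === 'USD') return valueNative;
  const rate = rates[currency];
  return rate != null && rate > 0 ? valueNative * rate : null;
}

// Assumed rate for illustration only.
const ILLUSTRATIVE_RATES = { SGD: 0.75 };

const parsed = parseAumValue('S$ 434 billion (2025) 2');
const usd = toUsd(parsed.valueNative, 'SGD', ILLUSTRATIVE_RATES);
const unknown = toUsd(parsed.valueNative, 'XYZ', ILLUSTRATIVE_RATES);

console.log(parsed.valueNative, parsed.aumYear, usd, unknown);
```

// Returning null for an unknown currency keeps the "cannot convert, skip
// this fund" contract instead of treating the native amount as USD.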
|
||||
async function fetchWikipediaInfobox(fund, fxRates) {
|
||||
const articleUrl = fund.wikipedia?.articleUrl;
|
||||
if (!articleUrl) return null;
|
||||
const resp = await fetch(articleUrl, {
|
||||
headers: {
|
||||
'User-Agent': CHROME_UA,
|
||||
'Accept': 'text/html,application/xhtml+xml',
|
||||
},
|
||||
signal: AbortSignal.timeout(30_000),
|
||||
});
|
||||
if (!resp.ok) {
|
||||
console.warn(`[seed-sovereign-wealth] ${fund.country}:${fund.fund} infobox fetch HTTP ${resp.status}`);
|
||||
return null;
|
||||
}
|
||||
const html = await resp.text();
|
||||
const hit = parseWikipediaArticleInfobox(html);
|
||||
if (!hit) return null;
|
||||
const usdRate = lookupUsdRate(hit.currencyNative, fxRates);
|
||||
if (usdRate == null) {
|
||||
console.warn(`[seed-sovereign-wealth] ${fund.country}:${fund.fund} infobox currency ${hit.currencyNative} has no FX rate; skipping`);
|
||||
return null;
|
||||
}
|
||||
return {
|
||||
aum: hit.valueNative * usdRate,
|
||||
aumYear: hit.aumYear,
|
||||
source: 'wikipedia_infobox',
|
||||
currencyNative: hit.currencyNative,
|
||||
fxRate: usdRate,
|
||||
};
|
||||
}
|
||||
|
||||
// ── Aggregation ──
|
||||
|
||||
async function fetchFundAum(fund, wikipediaCache, fxRates) {
|
||||
// Source priority: official → IFSWF → Wikipedia list → Wikipedia
|
||||
// per-fund infobox. Short-circuit on first non-null return so the
|
||||
// highest-confidence source wins. The infobox sub-tier is last
|
||||
// because it is per-fund fetch (N network round-trips, one per fund
|
||||
// that misses the list article) — amortizing over the list article
|
||||
// cache first minimizes live traffic.
|
||||
const official = await fetchOfficialDisclosure(fund);
|
||||
if (official) return official;
|
||||
const ifswf = await fetchIfswfFiling(fund);
|
||||
if (ifswf) return ifswf;
|
||||
const wikipediaList = await fetchWikipediaRanking(fund, wikipediaCache);
|
||||
if (wikipediaList) return wikipediaList;
|
||||
const wikipediaInfobox = await fetchWikipediaInfobox(fund, fxRates);
|
||||
if (wikipediaInfobox) return wikipediaInfobox;
|
||||
return null;
|
||||
}
|
||||
|
||||
// Build the fxSymbols map getSharedFxRates expects. We request every
|
||||
// currency the infobox parser can reasonably surface — this is a
|
||||
// superset of what any single seed run will need, but it keeps the
|
||||
// shared Redis FX cache warm for other seeders and costs one Yahoo
|
||||
// fetch per uncached ccy. The set matches CURRENCY_SYMBOL_TO_ISO.
|
||||
function buildFxSymbolsForSwf() {
  const ccys = new Set(CURRENCY_SYMBOL_TO_ISO.map(([, iso]) => iso));
  const symbols = {};
  for (const ccy of ccys) {
    if (ccy === 'USD') continue;
    symbols[ccy] = `${ccy}USD=X`;
  }
  return symbols;
}
|
||||
|
||||
export async function fetchSovereignWealth() {
|
||||
const manifest = loadSwfManifest();
|
||||
const [imports, wikipediaCache, fxRates] = await Promise.all([
|
||||
fetchAnnualImportsUsd(),
|
||||
loadWikipediaRankingsCache(),
|
||||
getSharedFxRates(buildFxSymbolsForSwf(), SHARED_FX_FALLBACKS),
|
||||
]);
|
||||
|
||||
const countries = {};
|
||||
const sourceMix = { official: 0, ifswf: 0, wikipedia_list: 0, wikipedia_infobox: 0 };
|
||||
const unmatched = [];
|
||||
|
||||
  for (const [iso2, funds] of groupFundsByCountry(manifest)) {
    const importsEntry = imports[iso2];
    if (!importsEntry) {
      // WB `NE.IMP.GNFS.CD` missing for this country (transient outage
      // or a country with spotty WB coverage). Silently dropping would
      // let the downstream scorer interpret the absence as "no SWF" and
      // score 0 with full coverage — substantively wrong. Log it
      // loudly and surface via the unmatched list so the seed-meta
      // observer can alert.
      console.warn(`[seed-sovereign-wealth] ${iso2} skipped: World Bank imports (${IMPORTS_INDICATOR}) missing — cannot compute rawMonths denominator`);
      for (const fund of funds) unmatched.push(`${fund.country}:${fund.fund} (no WB imports)`);
      continue;
    }

    const fundRecords = [];
    for (const fund of funds) {
      const aum = await fetchFundAum(fund, wikipediaCache, fxRates);
      if (!aum) {
        unmatched.push(`${fund.country}:${fund.fund}`);
        continue;
      }
      sourceMix[aum.source] = (sourceMix[aum.source] ?? 0) + 1;

      const { access, liquidity, transparency } = fund.classification;
      const rawMonths = (aum.aum / importsEntry.importsUsd) * 12;
      const effectiveMonths = rawMonths * access * liquidity * transparency;

      fundRecords.push({
        fund: fund.fund,
        aum: aum.aum,
        aumYear: aum.aumYear,
        source: aum.source,
        access,
        liquidity,
        transparency,
        rawMonths,
        effectiveMonths,
      });
    }

    if (fundRecords.length === 0) continue;
    const totalEffectiveMonths = fundRecords.reduce((s, f) => s + f.effectiveMonths, 0);
    const expectedFunds = funds.length;
    const matchedFunds = fundRecords.length;
    const completeness = matchedFunds / expectedFunds;
    // `completeness` signals partial-seed on multi-fund countries (AE,
    // SG). Downstream scorer must derate the country when completeness
    // < 1.0 — silently emitting partial totalEffectiveMonths would
    // under-rank countries whose secondary fund transiently drifted on
    // Wikipedia. The country stays in the payload (so the scorer can
    // use the partial number for IMPUTE-level coverage), but only
    // completeness=1.0 countries count toward recordCount / health.
    if (completeness < 1.0) {
      console.warn(`[seed-sovereign-wealth] ${iso2} partial: ${matchedFunds}/${expectedFunds} funds matched — completeness=${completeness.toFixed(2)}`);
    }
    countries[iso2] = {
      funds: fundRecords,
      totalEffectiveMonths,
      annualImports: importsEntry.importsUsd,
      expectedFunds,
      matchedFunds,
      completeness,
    };
  }

  if (unmatched.length > 0) {
    console.warn(`[seed-sovereign-wealth] ${unmatched.length} fund(s) unmatched across all tiers: ${unmatched.join(', ')}`);
  }

  const usedWikipedia = sourceMix.wikipedia_list + sourceMix.wikipedia_infobox > 0;
  return {
    countries,
    seededAt: new Date().toISOString(),
    manifestVersion: manifest.manifestVersion,
    sourceMix,
    sourceAttribution: {
      wikipedia: usedWikipedia ? WIKIPEDIA_SOURCE_ATTRIBUTION : undefined,
    },
  };
}

export function validate(data) {
  // Tier 3 (Wikipedia) is now live; expected floor = 1 country once any
  // manifest fund matches. We keep the floor lenient (>=0) during the
  // first Railway-cron bake-in window so a transient Wikipedia fetch
  // failure does not poison seed-meta for 30 days (see
  // feedback_strict_floor_validate_fail_poisons_seed_meta.md). Once
  // the seeder has ~7 days of clean runs, tighten to `>= 1`.
  //
  // Strict null check: `typeof null === 'object'` is true in JS, so a
  // bare `typeof x === 'object'` would let `{ countries: null }` through
  // and downstream consumers would crash on property access. Accept
  // only a non-null plain object.
  const c = data?.countries;
  return c != null && typeof c === 'object' && !Array.isArray(c);
}

// Health-facing record count. Counts ONLY fully-matched countries
// (completeness === 1.0), so a scraper drift on a secondary fund (e.g.
// Mubadala while ADIA still matches, or Temasek while GIC still matches)
// drops the recordCount seed-health signal — catching the partial-seed
// silent-corruption class that an "any country that has any fund"
// count would miss. Per-country completeness stays in the payload for
// the scorer to derate; recordCount is the operational alarm.
export function declareRecords(data) {
  const countries = data?.countries ?? {};
  let fully = 0;
  for (const entry of Object.values(countries)) {
    if (entry?.completeness === 1.0) fully++;
  }
  return fully;
}

if (process.argv[1]?.endsWith('seed-sovereign-wealth.mjs')) {
  runSeed('resilience', 'recovery:sovereign-wealth', CANONICAL_KEY, fetchSovereignWealth, {
    validateFn: validate,
    ttlSeconds: CACHE_TTL_SECONDS,
    sourceVersion: `swf-manifest-v1-${new Date().getFullYear()}`,
    // Health-facing recordCount delegates to declareRecords so the
    // seed-meta record_count stays consistent with the operational
    // alarm (only countries whose manifest funds all matched count).
    recordCount: declareRecords,
    declareRecords,
    schemaVersion: 1,
    maxStaleMin: 86400,
    // Empty payload is still acceptable while tiers 1/2 are stubbed
    // and any transient Wikipedia outage occurs; downstream IMPUTE
    // path handles it.
    emptyDataIsFailure: false,
  }).catch((err) => {
    const _cause = err.cause ? ` (cause: ${err.cause.message || err.cause.code || err.cause})` : '';
    console.error('FATAL:', (err.message || err) + _cause);
    process.exit(1);
  });
}
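The haircut arithmetic and the completeness rule used by the seeder above can be illustrated with a small standalone sketch. The AUM, import, and classification figures are made-up inputs, not values from the manifest, and `effectiveMonthsFor`, `summarizeCountry`, and `countFullyMatched` are hypothetical helper names that only mirror the formulas in `fetchSovereignWealth` and `declareRecords`.

```javascript
// Hypothetical helper mirroring the per-fund formula in the seeder:
// rawMonths = (AUM / annual imports) * 12, then three multiplicative haircuts.
function effectiveMonthsFor(aumUsd, annualImportsUsd, classification) {
  const { access, liquidity, transparency } = classification;
  const rawMonths = (aumUsd / annualImportsUsd) * 12;
  const effectiveMonths = rawMonths * access * liquidity * transparency;
  return { rawMonths, effectiveMonths };
}

// Hypothetical helper: aggregates matched funds the way the country record
// does (sum of effective months, matched / expected completeness).
function summarizeCountry(expectedFunds, matchedFunds, annualImportsUsd) {
  const funds = matchedFunds.map((m) => ({
    fund: m.fund,
    ...effectiveMonthsFor(m.aumUsd, annualImportsUsd, m.classification),
  }));
  return {
    funds,
    totalEffectiveMonths: funds.reduce((sum, f) => sum + f.effectiveMonths, 0),
    completeness: funds.length / expectedFunds,
  };
}

// Same rule as declareRecords: only fully matched countries count.
function countFullyMatched(countries) {
  return Object.values(countries).filter((c) => c?.completeness === 1.0).length;
}

// Illustrative inputs only (placeholder country codes, not manifest values).
const haircut = { access: 0.5, liquidity: 0.5, transparency: 1 };
const full = summarizeCountry(1, [{ fund: 'fund-a', aumUsd: 1e12, classification: haircut }], 1e11);
const partial = summarizeCountry(2, [{ fund: 'fund-b', aumUsd: 1e12, classification: haircut }], 1e11);

console.log(full.funds[0].rawMonths);                      // 120
console.log(full.totalEffectiveMonths);                    // 30
console.log(partial.completeness);                         // 0.5
console.log(countFullyMatched({ AA: full, BB: partial })); // 1
```

The partial country stays in the payload with its reduced total, but it does not contribute to the health-facing count, which is the behaviour the comments above describe.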
246
scripts/shared/swf-manifest-loader.mjs
Normal file
@@ -0,0 +1,246 @@
// Loader + validator for the SWF classification manifest at
// docs/methodology/swf-classification-manifest.yaml.
//
// Shared between the seeder (scripts/seed-sovereign-wealth.mjs), the
// scorer unit tests, and the methodology-doc linter. Keep server-free
// (no Redis, no env mutations) so the server scorer can import it too
// once PR 2 lands its TypeScript counterpart.
//
// See plan §3.4 "Classification manifest and Norway example" for the
// three-component haircut definitions. This loader is the
// single-source-of-truth parser; do not hand-parse the YAML elsewhere.

import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { dirname, resolve } from 'node:path';
import { parse as parseYaml } from 'yaml';

const here = dirname(fileURLToPath(import.meta.url));
const MANIFEST_PATH = resolve(here, '../../docs/methodology/swf-classification-manifest.yaml');

/**
 * @typedef {Object} SwfClassification
 * @property {number} access        0..1 inclusive
 * @property {number} liquidity     0..1 inclusive
 * @property {number} transparency  0..1 inclusive
 */

/**
 * @typedef {Object} SwfWikipediaHints
 * @property {string} [abbrev]      matches the "Abbrev." column on the
 *                                  Wikipedia `List_of_sovereign_wealth_funds`
 *                                  article (case- and punctuation-normalized)
 * @property {string} [fundName]    matches the "Fund name" column
 * @property {string} [articleUrl]  per-fund Wikipedia article URL used by the
 *                                  Tier 3b infobox fallback when the list
 *                                  article does not include the fund
 *                                  (Temasek is the canonical case)
 */

/**
 * @typedef {Object} SwfManifestEntry
 * @property {string} country      ISO-3166-1 alpha-2
 * @property {string} fund         short fund identifier (stable across runs)
 * @property {string} displayName  human-readable fund name
 * @property {SwfWikipediaHints} [wikipedia]  optional lookup hints for the
 *                                            Wikipedia fallback scraper
 * @property {SwfClassification} classification
 * @property {{ access: string, liquidity: string, transparency: string }} rationale
 * @property {string[]} sources
 */

/**
 * @typedef {Object} SwfManifest
 * @property {number} manifestVersion
 * @property {string} lastReviewed
 * @property {'PENDING'|'REVIEWED'} externalReviewStatus
 * @property {SwfManifestEntry[]} funds
 */

function fail(msg) {
  throw new Error(`[swf-manifest] ${msg}`);
}

function assertZeroToOne(value, path) {
  if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 1) {
    fail(`${path}: expected number in [0, 1], got ${JSON.stringify(value)}`);
  }
}

function assertIso2(value, path) {
  if (typeof value !== 'string' || !/^[A-Z]{2}$/.test(value)) {
    fail(`${path}: expected ISO-3166-1 alpha-2 country code, got ${JSON.stringify(value)}`);
  }
}

function assertNonEmptyString(value, path) {
  if (typeof value !== 'string' || value.trim().length === 0) {
    fail(`${path}: expected non-empty string, got ${JSON.stringify(value)}`);
  }
}

function validateClassification(cls, path) {
  if (!cls || typeof cls !== 'object') fail(`${path}: expected object`);
  const c = /** @type {Record<string, unknown>} */ (cls);
  assertZeroToOne(c.access, `${path}.access`);
  assertZeroToOne(c.liquidity, `${path}.liquidity`);
  assertZeroToOne(c.transparency, `${path}.transparency`);
  return { access: c.access, liquidity: c.liquidity, transparency: c.transparency };
}

function validateRationale(rat, path) {
  if (!rat || typeof rat !== 'object') fail(`${path}: expected object`);
  const r = /** @type {Record<string, unknown>} */ (rat);
  assertNonEmptyString(r.access, `${path}.access`);
  assertNonEmptyString(r.liquidity, `${path}.liquidity`);
  assertNonEmptyString(r.transparency, `${path}.transparency`);
  return { access: r.access, liquidity: r.liquidity, transparency: r.transparency };
}

function validateSources(sources, path) {
  if (!Array.isArray(sources) || sources.length === 0) fail(`${path}: expected non-empty array`);
  for (const [srcIdx, src] of sources.entries()) {
    assertNonEmptyString(src, `${path}[${srcIdx}]`);
  }
  return sources.slice();
}

// Optional wikipedia hints — used by the Wikipedia fallback scraper
// in scripts/seed-sovereign-wealth.mjs. Either `abbrev` or `fund_name`
// must be present if the block is present (otherwise the scraper has
// nothing to match against). `article_url` is optional and activates
// the Tier 3b per-fund infobox fallback.
function validateWikipediaHints(block, path) {
  if (block == null) return undefined;
  if (typeof block !== 'object') fail(`${path}: expected object`);
  const w = /** @type {Record<string, unknown>} */ (block);
  const abbrev = w.abbrev;
  const fundName = w.fund_name;
  const articleUrl = w.article_url;
  if (abbrev != null && typeof abbrev !== 'string') {
    fail(`${path}.abbrev: expected string, got ${JSON.stringify(abbrev)}`);
  }
  if (fundName != null && typeof fundName !== 'string') {
    fail(`${path}.fund_name: expected string, got ${JSON.stringify(fundName)}`);
  }
  if (articleUrl != null) {
    if (typeof articleUrl !== 'string') {
      fail(`${path}.article_url: expected string, got ${JSON.stringify(articleUrl)}`);
    }
    if (!/^https:\/\/[a-z]{2,3}\.wikipedia\.org\//.test(articleUrl)) {
      fail(`${path}.article_url: expected a https://<lang>.wikipedia.org/... URL, got ${JSON.stringify(articleUrl)}`);
    }
  }
  if (!abbrev && !fundName) {
    fail(`${path}: at least one of abbrev or fund_name must be provided`);
  }
  return {
    ...(abbrev ? { abbrev } : {}),
    ...(fundName ? { fundName } : {}),
    ...(articleUrl ? { articleUrl } : {}),
  };
}

function validateFundEntry(raw, idx, seenFundKeys) {
  const path = `funds[${idx}]`;
  if (!raw || typeof raw !== 'object') fail(`${path}: expected object`);
  const f = /** @type {Record<string, unknown>} */ (raw);

  assertIso2(f.country, `${path}.country`);
  assertNonEmptyString(f.fund, `${path}.fund`);
  assertNonEmptyString(f.display_name, `${path}.display_name`);

  const dedupeKey = `${f.country}:${f.fund}`;
  if (seenFundKeys.has(dedupeKey)) fail(`${path}: duplicate fund identifier ${dedupeKey}`);
  seenFundKeys.add(dedupeKey);

  const classification = validateClassification(f.classification, `${path}.classification`);
  const rationale = validateRationale(f.rationale, `${path}.rationale`);
  const sources = validateSources(f.sources, `${path}.sources`);
  const wikipedia = validateWikipediaHints(f.wikipedia, `${path}.wikipedia`);

  return {
    country: f.country,
    fund: f.fund,
    displayName: f.display_name,
    ...(wikipedia ? { wikipedia } : {}),
    classification,
    rationale,
    sources,
  };
}

/**
 * Validate and normalize a raw parsed manifest object into the
 * documented schema. Fails loudly on any deviation — the manifest is
 * supposed to be hand-maintained and reviewer-approved, so silent
 * coercion would hide errors.
 *
 * @param {unknown} raw
 * @returns {SwfManifest}
 */
export function validateManifest(raw) {
  if (!raw || typeof raw !== 'object') fail('manifest root must be an object');
  const obj = /** @type {Record<string, unknown>} */ (raw);

  const manifestVersion = obj.manifest_version;
  if (manifestVersion !== 1) fail(`manifest_version: expected 1, got ${JSON.stringify(manifestVersion)}`);

  const lastReviewed = obj.last_reviewed;
  if (!(lastReviewed instanceof Date) && typeof lastReviewed !== 'string') {
    fail(`last_reviewed: expected ISO date string or Date, got ${JSON.stringify(lastReviewed)}`);
  }
  const lastReviewedStr = lastReviewed instanceof Date
    ? lastReviewed.toISOString().slice(0, 10)
    : lastReviewed;

  const externalReviewStatus = obj.external_review_status;
  if (externalReviewStatus !== 'PENDING' && externalReviewStatus !== 'REVIEWED') {
    fail(`external_review_status: expected 'PENDING' or 'REVIEWED', got ${JSON.stringify(externalReviewStatus)}`);
  }

  const rawFunds = obj.funds;
  if (!Array.isArray(rawFunds)) fail('funds: expected array');
  if (rawFunds.length === 0) fail('funds: must list at least one fund');

  const seenFundKeys = new Set();
  const funds = rawFunds.map((raw, idx) => validateFundEntry(raw, idx, seenFundKeys));

  return {
    manifestVersion,
    lastReviewed: lastReviewedStr,
    externalReviewStatus,
    funds,
  };
}

/**
 * Load + validate the manifest YAML from disk.
 *
 * @param {string} [path] optional override for tests
 * @returns {SwfManifest}
 */
export function loadSwfManifest(path = MANIFEST_PATH) {
  const raw = readFileSync(path, 'utf8');
  const parsed = parseYaml(raw);
  return validateManifest(parsed);
}

/**
 * Index the manifest by ISO-2 country code so downstream callers can
 * aggregate multiple funds per country without re-scanning the array.
 *
 * @param {SwfManifest} manifest
 * @returns {Map<string, SwfManifestEntry[]>}
 */
export function groupFundsByCountry(manifest) {
  const byCountry = new Map();
  for (const fund of manifest.funds) {
    const list = byCountry.get(fund.country) ?? [];
    list.push(fund);
    byCountry.set(fund.country, list);
  }
  return byCountry;
}

export const __TEST_ONLY = { MANIFEST_PATH };
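As a rough illustration of how the loader's strict range check and per-country grouping behave, here is a minimal standalone sketch. It re-states the two ideas locally instead of importing the module; `groupByCountry` and `sampleFunds` are hypothetical names, and the classification scores are placeholders rather than values from the manifest.

```javascript
// Local re-statement of the loader's [0, 1] range check (sketch only).
function assertZeroToOne(value, path) {
  if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 1) {
    throw new Error(`[swf-manifest] ${path}: expected number in [0, 1], got ${JSON.stringify(value)}`);
  }
}

// Same grouping idea as groupFundsByCountry: one Map entry per ISO-2 code.
function groupByCountry(funds) {
  const byCountry = new Map();
  for (const fund of funds) {
    const list = byCountry.get(fund.country) ?? [];
    list.push(fund);
    byCountry.set(fund.country, list);
  }
  return byCountry;
}

// Illustrative entries; the scores are placeholders, not manifest values.
const sampleFunds = [
  { country: 'NO', fund: 'gpfg', classification: { access: 0.5, liquidity: 0.5, transparency: 0.5 } },
  { country: 'AE', fund: 'adia', classification: { access: 0.5, liquidity: 0.5, transparency: 0.5 } },
  { country: 'AE', fund: 'mubadala', classification: { access: 0.5, liquidity: 0.5, transparency: 0.5 } },
];

// Every score is checked before grouping, as the validator does per fund.
for (const [idx, f] of sampleFunds.entries()) {
  for (const key of ['access', 'liquidity', 'transparency']) {
    assertZeroToOne(f.classification[key], `funds[${idx}].classification.${key}`);
  }
}

const grouped = groupByCountry(sampleFunds);
console.log(grouped.get('AE').map((f) => f.fund)); // [ 'adia', 'mubadala' ]

// A numeric string is rejected rather than coerced.
let rejected = false;
try {
  assertZeroToOne('0.5', 'funds[0].classification.access');
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```

Rejecting the string `'0.5'` instead of coercing it matches the stated intent of the validator: a hand-maintained manifest should fail loudly rather than be silently repaired.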
531
tests/seed-sovereign-wealth.test.mjs
Normal file
@@ -0,0 +1,531 @@
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  declareRecords,
  detectCurrency,
  lookupUsdRate,
  matchWikipediaRecord,
  parseWikipediaArticleInfobox,
  parseWikipediaRankingsTable,
  validate,
} from '../scripts/seed-sovereign-wealth.mjs';
import { SHARED_FX_FALLBACKS } from '../scripts/_seed-utils.mjs';

// Fixture HTML mirrors the structure observed on the shipping
// Wikipedia "List of sovereign wealth funds" article (captured
// 2026-04-23). Kept inline so the scraper's parsing rules are
// exercised without a live network round-trip. If Wikipedia later
// changes the column order or header text, update this fixture AND
// the assumed-columns comment in scripts/seed-sovereign-wealth.mjs
// in the same commit.

const FIXTURE_HTML = `
<html><body>
<table class="wikitable sortable static-row-numbers">
  <thead>
    <tr>
      <th scope="col">Country or region</th>
      <th scope="col">Abbrev.</th>
      <th scope="col">Fund name</th>
      <th scope="col">Assets</th>
      <th scope="col">Inception</th>
      <th scope="col">Origin</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="/wiki/Norway">Norway</a></td>
      <td>GPF-G</td>
      <td><a href="/wiki/GPFG">Government Pension Fund Global</a></td>
      <td>2,117<sup>37</sup></td>
      <td>1990</td>
      <td>Oil & Gas</td>
    </tr>
    <tr>
      <td><a href="/wiki/UAE">United Arab Emirates</a></td>
      <td>ADIA</td>
      <td><a href="/wiki/ADIA">Abu Dhabi Investment Authority</a></td>
      <td>1,128<sup>40</sup></td>
      <td>1976</td>
      <td>Oil & Gas</td>
    </tr>
    <tr>
      <td><a href="/wiki/UAE">United Arab Emirates</a></td>
      <td></td>
      <td><a href="/wiki/Mubadala">Mubadala Investment Company</a></td>
      <td>302.0<sup>41</sup></td>
      <td>2002</td>
      <td>Oil & Gas</td>
    </tr>
    <tr>
      <td><a href="/wiki/Singapore">Singapore</a></td>
      <td>GIC</td>
      <td><a href="/wiki/GIC">GIC Private Limited</a></td>
      <td>801</td>
      <td>1981</td>
      <td>Non-commodity</td>
    </tr>
    <tr>
      <td><a href="/wiki/Singapore">Singapore</a></td>
      <td></td>
      <td><a href="/wiki/Temasek">Temasek Holdings</a></td>
      <td>382</td>
      <td>1974</td>
      <td>Non-commodity</td>
    </tr>
    <tr>
      <td><a href="/wiki/NoData">No Data Row</a></td>
      <td>NODATA</td>
      <td>Example fund without assets</td>
      <td></td>
      <td>2000</td>
      <td>Non-commodity</td>
    </tr>
  </tbody>
</table>
</body></html>
`;

describe('parseWikipediaRankingsTable — fixture-based scraping', () => {
  const cache = parseWikipediaRankingsTable(FIXTURE_HTML);

  it('indexes funds by normalized abbreviation into record lists', () => {
    // GPF-G → GPFG (normalized: uppercase, strip punctuation). Lookup
    // returns a list so ambiguous abbrevs (e.g. PIF → Saudi vs Palestine
    // on the live article) can be disambiguated at match time.
    const gpfgList = cache.byAbbrev.get('GPFG');
    assert.ok(Array.isArray(gpfgList) && gpfgList.length === 1, 'GPFG should have exactly one candidate in the fixture');
    const [gpfg] = gpfgList;
    assert.equal(gpfg.aum, 2_117_000_000_000);
    assert.equal(gpfg.fundName, 'Government Pension Fund Global');
    assert.equal(gpfg.countryName, 'Norway');
    assert.equal(gpfg.inceptionYear, 1990);

    assert.equal(cache.byAbbrev.get('ADIA')?.[0]?.aum, 1_128_000_000_000);
    assert.equal(cache.byAbbrev.get('GIC')?.[0]?.aum, 801_000_000_000);
  });

  it('indexes funds by normalized fund name for abbrev-less rows', () => {
    // Mubadala and Temasek have no abbreviation in the fixture,
    // so they must still be matchable by fundName.
    const mubadalaList = cache.byFundName.get('mubadala investment company');
    assert.ok(mubadalaList && mubadalaList.length === 1);
    assert.equal(mubadalaList[0].aum, 302_000_000_000);

    const temasekList = cache.byFundName.get('temasek holdings');
    assert.ok(temasekList && temasekList.length === 1);
    assert.equal(temasekList[0].aum, 382_000_000_000);
  });

  it('strips inline HTML + footnote references from the Assets cell', () => {
    // `2,117<sup>37</sup>` — the footnote int must be stripped
    // before parsing. `<sup>` strips to a space so the ref is a
    // separate token, not welded into the number.
    assert.equal(cache.byAbbrev.get('GPFG')[0].aum, 2_117_000_000_000);
  });

  it('skips rows with missing or malformed Assets value', () => {
    assert.equal(cache.byAbbrev.get('NODATA'), undefined);
    assert.equal(cache.byFundName.get('example fund without assets'), undefined);
  });

  it('handles decimal AUM values (e.g. "302.0")', () => {
    const mubadalaList = cache.byFundName.get('mubadala investment company');
    assert.equal(mubadalaList[0].aum, 302_000_000_000);
  });

  it('throws loudly when the expected wikitable is missing', () => {
    assert.throws(() => parseWikipediaRankingsTable('<html><body>no tables here</body></html>'),
      /wikitable not found/);
  });
});

// Separate describe block for the abbrev-collision disambiguation
// case since it requires a fixture with multiple rows sharing an
// abbrev. This is the exact class of bug observed on the live
// Wikipedia article (PIF → Saudi PIF + Palestine Investment Fund).
describe('parseWikipediaRankingsTable — abbrev collisions', () => {
  const COLLIDING_HTML = `
    <table class="wikitable">
      <thead><tr>
        <th>Country</th><th>Abbrev.</th><th>Fund name</th>
        <th>Assets</th><th>Inception</th><th>Origin</th>
      </tr></thead>
      <tbody>
        <tr>
          <td>Saudi Arabia</td><td>PIF</td><td>Public Investment Fund</td>
          <td>925</td><td>1971</td><td>Oil Gas</td>
        </tr>
        <tr>
          <td>Palestine</td><td>PIF</td><td>Palestine Investment Fund</td>
          <td>0.9</td><td>2003</td><td>Non-commodity</td>
        </tr>
      </tbody>
    </table>`;

  it('keeps BOTH colliding records under the shared abbrev key', () => {
    const cache = parseWikipediaRankingsTable(COLLIDING_HTML);
    const pifList = cache.byAbbrev.get('PIF');
    assert.ok(Array.isArray(pifList));
    assert.equal(pifList.length, 2, 'both colliding PIF records must be retained — silent overwrite would shadow Saudi PIF with Palestine');
  });
});

describe('matchWikipediaRecord — manifest-driven lookup', () => {
  const cache = parseWikipediaRankingsTable(FIXTURE_HTML);

  it('matches by abbrev when hints + country align', () => {
    const fund = {
      country: 'NO',
      fund: 'gpfg',
      wikipedia: { abbrev: 'GPF-G', fundName: 'Government Pension Fund Global' },
    };
    const hit = matchWikipediaRecord(fund, cache);
    assert.ok(hit);
    assert.equal(hit.fundName, 'Government Pension Fund Global');
  });

  it('falls back to fund-name match when no abbrev is provided', () => {
    const fund = {
      country: 'AE',
      fund: 'mubadala',
      wikipedia: { fundName: 'Mubadala Investment Company' },
    };
    const hit = matchWikipediaRecord(fund, cache);
    assert.ok(hit);
    assert.equal(hit.aum, 302_000_000_000);
  });

  it('normalizes abbrev punctuation (GPF-G ≡ GPFG)', () => {
    const fund = { country: 'NO', fund: 'gpfg', wikipedia: { abbrev: 'GPFG' } };
    const hit = matchWikipediaRecord(fund, cache);
    assert.ok(hit, 'normalized-abbrev match should succeed');
  });

  it('returns null when no hints match', () => {
    const fund = {
      country: 'NO',
      fund: 'unknown',
      wikipedia: { abbrev: 'XXXX', fundName: 'Nonexistent Fund' },
    };
    assert.equal(matchWikipediaRecord(fund, cache), null);
  });

  it('returns null when manifest entry has no wikipedia hints', () => {
    const fund = { country: 'NO', fund: 'no-hints' };
    assert.equal(matchWikipediaRecord(fund, cache), null);
  });
});

// ── Tier 3b: per-fund Wikipedia article infobox ──
//
// Activated for funds editorially excluded from the /wiki/List_of_
// sovereign_wealth_funds article (Temasek is the canonical case —
// Wikipedia classifies it as a "state holding company" rather than an
// SWF, despite the manifest including it per plan §3.4).
//
// The infobox parser must:
//   - scan rows for "Total assets", "Assets under management", "AUM",
//     "Net assets", "Net portfolio value" labels
//   - detect non-USD currencies (S$, €, £, NOK, etc.) and convert via
//     the FX_TO_USD table
//   - extract the year tag "(2025)" from the value for freshness
//   - skip rows whose currency isn't in the FX table (loud, not silent)

describe('detectCurrency — symbol and code detection', () => {
  it('distinguishes US$ from S$ from $', () => {
    assert.equal(detectCurrency('US$ 1,128 billion'), 'USD');
    assert.equal(detectCurrency('S$ 434 billion'), 'SGD');
    // Bare $ must NOT match US$ or S$ patterns, and must require a
    // digit after.
    assert.equal(detectCurrency('$ 500 billion'), 'USD');
  });

  it('detects Norwegian krone via NOK or kr', () => {
    assert.equal(detectCurrency('NOK 18.7 trillion'), 'NOK');
    assert.equal(detectCurrency('17,500 kr 500 billion'), 'NOK');
  });

  it('detects EUR via € symbol or ISO code', () => {
    assert.equal(detectCurrency('€ 500 million'), 'EUR');
    assert.equal(detectCurrency('500 EUR billion'), 'EUR');
  });

  it('returns null when no currency signal is present', () => {
    assert.equal(detectCurrency('500 billion'), null);
    assert.equal(detectCurrency(''), null);
  });
});

describe('parseWikipediaArticleInfobox — native value + currency extraction', () => {
  // Parser returns { valueNative, currencyNative, aumYear } and does
  // NOT convert to USD — conversion is applied at the seeder level
  // via the project-shared `getSharedFxRates` cache (see
  // scripts/_seed-utils.mjs). Keeping the parser FX-free removes a
  // duplicate copy of the FX table that would drift from the shared
  // one.
  //
  // Mirrors the Temasek infobox structure (abridged). Real row:
  // `<tr><th>Total assets</th><td>S$ 434 billion <i>(2025)</i><sup>2</sup></td></tr>`
  const TEMASEK_INFOBOX = `
    <html><body>
    <table class="infobox vcard">
      <tr><th>Type</th><td>Holding company</td></tr>
      <tr><th>Founded</th><td>25 June 1974</td></tr>
      <tr><th>Total assets</th><td>S$ 434 billion <i>(2025)</i><sup>2</sup></td></tr>
      <tr><th>Owner</th><td>Ministry of Finance</td></tr>
    </table>
    </body></html>
  `;

  it('extracts S$ 434 billion as native SGD value + year tag', () => {
    const hit = parseWikipediaArticleInfobox(TEMASEK_INFOBOX);
    assert.ok(hit, 'Temasek infobox should produce a hit');
    assert.equal(hit.currencyNative, 'SGD');
    assert.equal(hit.valueNative, 434_000_000_000);
    assert.equal(hit.aumYear, 2025);
  });

  it('handles USD-native infoboxes (currency detected as USD)', () => {
    const html = `<table class="infobox">
      <tr><th>AUM</th><td>US$ 1,500 billion (2025)</td></tr>
    </table>`;
    const hit = parseWikipediaArticleInfobox(html);
    assert.ok(hit);
    assert.equal(hit.currencyNative, 'USD');
    assert.equal(hit.valueNative, 1_500_000_000_000);
  });

  it('parses trillion-unit values (NOK 18.7 trillion)', () => {
    const html = `<table class="infobox">
      <tr><th>Net assets</th><td>NOK 18.7 trillion (2025)</td></tr>
    </table>`;
    const hit = parseWikipediaArticleInfobox(html);
    assert.ok(hit);
    assert.equal(hit.currencyNative, 'NOK');
    assert.equal(hit.valueNative, 18_700_000_000_000);
  });

  it('returns null when no AUM-labeled row is present', () => {
    const html = `<table class="infobox">
      <tr><th>Type</th><td>Holding company</td></tr>
    </table>`;
    assert.equal(parseWikipediaArticleInfobox(html), null);
  });

  it('returns null when the infobox itself is missing', () => {
    assert.equal(parseWikipediaArticleInfobox('<html>no infobox</html>'), null);
  });
});

describe('lookupUsdRate — project-shared FX integration', () => {
  // Verifies the parser → FX conversion pipeline uses the project's
  // canonical FX source (scripts/_seed-utils.mjs SHARED_FX_FALLBACKS +
  // getSharedFxRates Redis cache) rather than a duplicate table.

  it('returns 1.0 for USD regardless of rate map', () => {
    assert.equal(lookupUsdRate('USD', {}), 1.0);
    assert.equal(lookupUsdRate('USD', null), 1.0);
    assert.equal(lookupUsdRate('USD', { USD: 999 }), 1.0);
  });

  it('prefers the live rate map over the static fallback', () => {
    // Simulate getSharedFxRates returning a fresh Yahoo rate. The static
    // fallback has SGD=0.74; the live rate could drift (e.g. 0.751).
    assert.equal(lookupUsdRate('SGD', { SGD: 0.751 }), 0.751);
  });

  it('falls back to SHARED_FX_FALLBACKS when the live rate is missing', () => {
    assert.equal(lookupUsdRate('SGD', {}), SHARED_FX_FALLBACKS.SGD);
    assert.equal(lookupUsdRate('NOK', { EUR: 1.05 }), SHARED_FX_FALLBACKS.NOK);
  });

  it('returns null for unknown currencies (caller skips the fund)', () => {
    assert.equal(lookupUsdRate('ZZZ', {}), null);
    assert.equal(lookupUsdRate('XXX', { XXX: 0 }), null);
  });

  it('converts Temasek S$ 434B end-to-end via shared fallback table', () => {
    const hit = parseWikipediaArticleInfobox(`
      <table class="infobox"><tr><th>Total assets</th><td>S$ 434 billion (2025)</td></tr></table>
    `);
    const rate = lookupUsdRate(hit.currencyNative, {});
    const aumUsd = hit.valueNative * rate;
    // 434B × 0.74 = 321.16B. Matches SHARED_FX_FALLBACKS.SGD.
    assert.ok(aumUsd > 300_000_000_000 && aumUsd < 340_000_000_000,
      `expected ~US$ 320B, got ${aumUsd}`);
  });
});

describe('validate — reject null-object masquerading as object', () => {
|
||||
// `typeof null === 'object'` in JS, so a bare `typeof x === 'object'`
|
||||
// would let { countries: null } through and break downstream. This
|
||||
// test pins the strict non-null check.
|
||||
|
||||
it('rejects { countries: null }', () => {
|
||||
assert.equal(validate({ countries: null }), false);
|
||||
});
|
||||
|
||||
it('rejects missing countries field', () => {
|
||||
assert.equal(validate({}), false);
|
||||
assert.equal(validate(null), false);
|
||||
assert.equal(validate(undefined), false);
|
||||
});
|
||||
|
||||
it('rejects array countries (typeof [] === object too)', () => {
|
||||
assert.equal(validate({ countries: [] }), false);
|
||||
});
|
||||
|
||||
it('accepts empty object (during Railway-cron bake-in window)', () => {
|
||||
assert.equal(validate({ countries: {} }), true);
|
||||
});
|
||||
|
||||
it('accepts populated countries', () => {
|
||||
assert.equal(validate({ countries: { NO: { funds: [] } } }), true);
|
||||
});
|
||||
});
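
Reviewer sketch (not part of the diff): the strict non-null check described in the comment above could look roughly like this. `validateSketch` is a hypothetical name; the seeder's actual `validate` is not shown in this hunk.

```javascript
// Accepts only payloads whose `countries` is a plain, non-null,
// non-array object. An empty object is allowed.
function validateSketch(data) {
  // typeof null === 'object', so null must be excluded explicitly.
  if (data === null || typeof data !== 'object') return false;
  const { countries } = data;
  return (
    countries !== null &&
    typeof countries === 'object' &&
    !Array.isArray(countries) // typeof [] is also 'object'
  );
}
```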

describe('parseWikipediaRankingsTable — nested-table depth awareness', () => {
  // Wikipedia occasionally embeds mini-tables (sort helpers, footnote
  // boxes) inside a wikitable cell. A lazy `[\s\S]*?</table>` regex
  // would stop at the FIRST `</table>` and silently drop every row
  // after the cell containing the nested table. The depth-aware
  // extractor must walk the full open/close pair.

  it('does not truncate at a nested </table> inside a cell', () => {
    const html = `
      <table class="wikitable">
        <tr><th>Country</th><th>Abbrev.</th><th>Fund</th><th>Assets</th><th>Inception</th></tr>
        <tr>
          <td>Norway</td><td>GPF-G</td>
          <td>Government Pension Fund Global
            <table class="mini-sort-helper"><tr><td>nested</td></tr></table>
          </td>
          <td>2000</td><td>1990</td>
        </tr>
        <tr>
          <td>UAE</td><td>ADIA</td>
          <td>Abu Dhabi Investment Authority</td>
          <td>1128</td><td>1976</td>
        </tr>
      </table>
    `;
    const cache = parseWikipediaRankingsTable(html);
    // Without depth awareness, ADIA would be silently dropped because
    // the nested </table> inside GPF-G's cell would close the outer
    // match at row 1.
    assert.ok(cache.byAbbrev.get('ADIA')?.[0]?.aum === 1_128_000_000_000,
      'ADIA must survive — nested </table> in a prior cell should not truncate the wikitable');
    assert.ok(cache.byAbbrev.get('GPFG')?.[0]?.aum === 2_000_000_000_000);
  });
});
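
Reviewer sketch (not part of the diff): one way to walk the full open/close pair is a depth counter over `<table` and `</table>` tokens. `extractOuterTableSketch` is a hypothetical helper written for illustration; the extractor inside `parseWikipediaRankingsTable` is not shown here and may be structured differently.

```javascript
// Returns the first complete outermost <table>...</table> span in `html`,
// including any nested tables, or null if no balanced table is found.
function extractOuterTableSketch(html) {
  const tagRe = /<table\b|<\/table\s*>/gi;
  let depth = 0;
  let start = -1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const isClose = m[0].startsWith('</');
    if (!isClose) {
      // Remember where the outermost table opens.
      if (depth === 0) start = m.index;
      depth += 1;
    } else if (depth > 0) {
      depth -= 1;
      // Depth back at zero: this close tag matches the outermost open tag.
      if (depth === 0) return html.slice(start, m.index + m[0].length);
    }
    // A stray </table> seen at depth 0 is ignored.
  }
  return null; // unbalanced markup
}
```

A lazy regex such as `/<table[\s\S]*?<\/table>/` applied to the fixture above would end its match at the nested helper table, which is why the ADIA row would be lost.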

describe('parseWikipediaRankingsTable — aumYear accuracy', () => {
  it('sets aumYear=null for list-article rows (no per-row data-year annotation)', () => {
    const html = `
      <table class="wikitable">
        <tr><th>Country</th><th>Abbrev.</th><th>Fund</th><th>Assets</th><th>Inception</th></tr>
        <tr><td>Norway</td><td>GPF-G</td><td>Government Pension Fund Global</td><td>2117</td><td>1990</td></tr>
      </table>
    `;
    const cache = parseWikipediaRankingsTable(html);
    const gpfg = cache.byAbbrev.get('GPFG')?.[0];
    assert.ok(gpfg);
    assert.equal(gpfg.aumYear, null,
      'aumYear must be null — the list article publishes no per-row data-year, and claiming the scrape year would mislead freshness auditors');
    // Infobox path (Tier 3b) sets a real aumYear from "(YYYY)" tag —
    // see the separate infobox test block for that contract.
  });
});

describe('declareRecords — partial-seed guard for multi-fund countries', () => {
  // Regression: for multi-fund countries (AE = ADIA + Mubadala,
  // SG = GIC + Temasek) a single scraper drift would silently publish
  // a partial totalEffectiveMonths if we counted "any fund matched"
  // as a successful country-seed. declareRecords MUST only count
  // countries with completeness === 1.0 so a secondary-fund drift
  // drops the seed-health record count and triggers the operational
  // alarm, rather than leaking an under-weighted total into the
  // ranking.

  it('counts only countries where all manifest funds matched', () => {
    const data = {
      countries: {
        NO: { funds: [{}], expectedFunds: 1, matchedFunds: 1, completeness: 1.0 },
        AE: { funds: [{}, {}], expectedFunds: 2, matchedFunds: 2, completeness: 1.0 },
        SG: { funds: [{}], expectedFunds: 2, matchedFunds: 1, completeness: 0.5 }, // partial
      },
    };
    assert.equal(declareRecords(data), 2,
      'SG (partial, completeness=0.5) must NOT count — recordCount stays at 2, not 3');
  });

  it('returns 0 when every country is partial', () => {
    const data = {
      countries: {
        AE: { expectedFunds: 2, matchedFunds: 1, completeness: 0.5 },
        SG: { expectedFunds: 2, matchedFunds: 1, completeness: 0.5 },
      },
    };
    assert.equal(declareRecords(data), 0,
      'all-partial payload must drop recordCount to 0 — the seed-meta alarm surfaces a degraded run');
  });

  it('returns 0 on empty / malformed payload', () => {
    assert.equal(declareRecords({}), 0);
    assert.equal(declareRecords({ countries: {} }), 0);
    assert.equal(declareRecords(null), 0);
    assert.equal(declareRecords(undefined), 0);
  });

  it('ignores entries lacking the completeness field (defensive)', () => {
    // Old payload shape (pre-completeness) must not spuriously count.
    const data = { countries: { XX: { funds: [{}], totalEffectiveMonths: 1 } } };
    assert.equal(declareRecords(data), 0);
  });
});
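
Reviewer sketch (not part of the diff): the guard described above amounts to counting only entries whose `completeness` is exactly 1. `declareRecordsSketch` is a hypothetical name; the seeder's real `declareRecords` is not shown in this hunk.

```javascript
// Counts countries whose every manifest fund matched (completeness === 1).
// Partial countries, legacy entries without the field, and malformed
// payloads all contribute zero.
function declareRecordsSketch(data) {
  const countries = data?.countries;
  if (countries === null || typeof countries !== 'object') return 0;
  return Object.values(countries).filter((c) => c?.completeness === 1).length;
}
```

The strict `=== 1` comparison means an entry with a missing `completeness` field is treated the same as a partial one, which matches the defensive test for the old payload shape.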

describe('matchWikipediaRecord — country-disambiguation on abbrev collisions', () => {
  // This replays the exact class of bug observed on the live Wikipedia
  // article: "PIF" resolves to BOTH Saudi Arabia's Public Investment
  // Fund (~$925B) and Palestine's Palestine Investment Fund (~$900M).
  // Without country disambiguation, a naive Map.set overwrites one
  // with the other — Saudi PIF would silently return Palestine's AUM
  // (three orders of magnitude smaller), breaking the score for every
  // Saudi resilience read.
  const COLLIDING_HTML = `
    <table class="wikitable">
      <thead><tr>
        <th>Country</th><th>Abbrev.</th><th>Fund name</th>
        <th>Assets</th><th>Inception</th><th>Origin</th>
      </tr></thead>
      <tbody>
        <tr>
          <td>Saudi Arabia</td><td>PIF</td><td>Public Investment Fund</td>
          <td>925</td><td>1971</td><td>Oil Gas</td>
        </tr>
        <tr>
          <td>Palestine</td><td>PIF</td><td>Palestine Investment Fund</td>
          <td>0.9</td><td>2003</td><td>Non-commodity</td>
        </tr>
      </tbody>
    </table>`;
  const cache = parseWikipediaRankingsTable(COLLIDING_HTML);

  it('picks the Saudi record for fund.country=SA', () => {
    const fund = { country: 'SA', fund: 'pif', wikipedia: { abbrev: 'PIF' } };
    const hit = matchWikipediaRecord(fund, cache);
    assert.ok(hit);
    assert.equal(hit.countryName, 'Saudi Arabia');
    assert.equal(hit.aum, 925_000_000_000);
  });

  it('returns null (not the wrong record) when country is unknown to the disambiguator', () => {
    // Hypothetical fund from a country not in ISO2_TO_WIKIPEDIA_COUNTRY_NAME.
    // Must NOT silently return Saudi's or Palestine's record.
    const fund = { country: 'ZZ', fund: 'pif', wikipedia: { abbrev: 'PIF' } };
    assert.equal(matchWikipediaRecord(fund, cache), null,
      'ambiguous match with no country mapping must return null — silent wrong-country match is the exact bug this test guards against');
  });
});
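
Reviewer sketch (not part of the diff): the collision handling above can be illustrated by storing an array of records per abbreviation and filtering by country name. All names here (`ISO2_TO_NAME_SKETCH`, `matchByAbbrevSketch`) are hypothetical, and returning a sole candidate without a country check is an assumption of this sketch, not something the tests above confirm.

```javascript
// Hypothetical subset of an ISO2 -> Wikipedia country-name map.
const ISO2_TO_NAME_SKETCH = { SA: 'Saudi Arabia', PS: 'Palestine' };

// `byAbbrev` is a Map from abbreviation to an ARRAY of records, so two
// funds sharing "PIF" are both kept instead of one overwriting the other.
function matchByAbbrevSketch(fund, byAbbrev) {
  const candidates = byAbbrev.get(fund.wikipedia?.abbrev) ?? [];
  if (candidates.length === 0) return null;
  // Assumption: an unambiguous abbreviation needs no country check.
  if (candidates.length === 1) return candidates[0];

  // Ambiguous: require a known country name, otherwise refuse to guess.
  const wanted = ISO2_TO_NAME_SKETCH[fund.country];
  if (!wanted) return null;
  return candidates.find((r) => r.countryName === wanted) ?? null;
}
```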
147
tests/swf-classification-manifest.test.mjs
Normal file
@@ -0,0 +1,147 @@
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  groupFundsByCountry,
  loadSwfManifest,
  validateManifest,
} from '../scripts/shared/swf-manifest-loader.mjs';

// Validate the shipped SWF classification manifest (PR 2 §3.4). This
// test is the only CI gate on the YAML: any schema violation (missing
// rationale, out-of-range score, duplicate fund identifier, missing
// source citation) fails the build before a malformed manifest can
// reach the seeder. Adding a new fund or adjusting a score must run
// this test locally.
//
// The manifest is reviewer-approved metadata, not auto-generated, so
// the test intentionally prefers loud assertion failures over silent
// coercion. Downstream consumers (seeder, future scorer, methodology
// lint) can rely on the returned object shape without re-validating.

describe('SWF classification manifest — shipped YAML', () => {
  const manifest = loadSwfManifest();

  it('parses with a recognized schema version', () => {
    assert.equal(manifest.manifestVersion, 1, 'bump both YAML manifest_version AND this assertion when evolving the schema');
  });

  it('records an external-review status (PENDING until sign-off)', () => {
    assert.ok(
      manifest.externalReviewStatus === 'PENDING' || manifest.externalReviewStatus === 'REVIEWED',
      `external_review_status must be PENDING or REVIEWED, got ${manifest.externalReviewStatus}`,
    );
  });

  it('lists the first-release set of funds from plan §3.4', () => {
    const expected = new Set([
      'NO:gpfg',
      'AE:adia',
      'AE:mubadala',
      'SA:pif',
      'KW:kia',
      'QA:qia',
      'SG:gic',
      'SG:temasek',
    ]);
    const actual = new Set(manifest.funds.map((f) => `${f.country}:${f.fund}`));
    for (const required of expected) {
      assert.ok(actual.has(required), `plan §3.4 required fund missing from manifest: ${required}`);
    }
  });

  it('classification components are all in [0, 1]', () => {
    for (const fund of manifest.funds) {
      const { access, liquidity, transparency } = fund.classification;
      assert.ok(access >= 0 && access <= 1, `${fund.country}:${fund.fund} access out of range: ${access}`);
      assert.ok(liquidity >= 0 && liquidity <= 1, `${fund.country}:${fund.fund} liquidity out of range: ${liquidity}`);
      assert.ok(transparency >= 0 && transparency <= 1, `${fund.country}:${fund.fund} transparency out of range: ${transparency}`);
    }
  });

  it('every fund carries non-empty rationale strings and source citations', () => {
    for (const fund of manifest.funds) {
      assert.ok(fund.rationale.access.length > 20, `${fund.country}:${fund.fund} rationale.access too short`);
      assert.ok(fund.rationale.liquidity.length > 20, `${fund.country}:${fund.fund} rationale.liquidity too short`);
      assert.ok(fund.rationale.transparency.length > 20, `${fund.country}:${fund.fund} rationale.transparency too short`);
      assert.ok(fund.sources.length > 0, `${fund.country}:${fund.fund} has no sources cited`);
    }
  });

  it('groupFundsByCountry handles multi-fund countries (AE, SG)', () => {
    const byCountry = groupFundsByCountry(manifest);
    assert.ok((byCountry.get('AE') ?? []).length >= 2, 'AE should have ADIA + Mubadala at minimum');
    assert.ok((byCountry.get('SG') ?? []).length >= 2, 'SG should have GIC + Temasek at minimum');
    assert.ok((byCountry.get('NO') ?? []).length >= 1, 'NO should have GPFG');
  });
});

describe('validateManifest — schema enforcement', () => {
  const minimalValid = () => ({
    manifest_version: 1,
    last_reviewed: '2026-04-23',
    external_review_status: 'PENDING',
    funds: [
      {
        country: 'NO',
        fund: 'gpfg',
        display_name: 'Government Pension Fund Global',
        classification: { access: 0.6, liquidity: 1.0, transparency: 1.0 },
        rationale: {
          access: 'Norwegian fiscal rule caps annual withdrawal at expected real return.',
          liquidity: '100% publicly listed equities + fixed income per NBIM 2025 report.',
          transparency: 'Full audited AUM, daily returns disclosed. IFSWF full compliance.',
        },
        sources: ['https://www.nbim.no/en/the-fund/'],
      },
    ],
  });

  it('accepts a minimal-valid manifest', () => {
    const out = validateManifest(minimalValid());
    assert.equal(out.funds.length, 1);
    assert.equal(out.funds[0].country, 'NO');
  });

  it('rejects out-of-range classification scores', () => {
    const bad = minimalValid();
    bad.funds[0].classification.access = 1.5;
    assert.throws(() => validateManifest(bad), /access.*expected number in \[0, 1\]/);
  });

  it('rejects non-ISO2 country codes', () => {
    const bad = minimalValid();
    bad.funds[0].country = 'NOR';
    assert.throws(() => validateManifest(bad), /expected ISO-3166-1 alpha-2/);
  });

  it('rejects missing rationale strings', () => {
    const bad = minimalValid();
    bad.funds[0].rationale.access = '';
    assert.throws(() => validateManifest(bad), /rationale.access.*expected non-empty string/);
  });

  it('rejects empty sources list', () => {
    const bad = minimalValid();
    bad.funds[0].sources = [];
    assert.throws(() => validateManifest(bad), /sources.*expected non-empty array/);
  });

  it('rejects duplicate country:fund identifiers', () => {
    const bad = minimalValid();
    bad.funds.push({ ...bad.funds[0] });
    assert.throws(() => validateManifest(bad), /duplicate fund identifier NO:gpfg/);
  });

  it('rejects wrong schema version (forces explicit bump)', () => {
    const bad = minimalValid();
    bad.manifest_version = 2;
    assert.throws(() => validateManifest(bad), /manifest_version: expected 1/);
  });

  it('rejects invalid external_review_status', () => {
    const bad = minimalValid();
    bad.external_review_status = 'APPROVED';
    assert.throws(() => validateManifest(bad), /external_review_status.*expected 'PENDING' or 'REVIEWED'/);
  });
});