887 Commits

Author SHA1 Message Date
Elie Habib
8655bd81bc feat(energy-atlas): GEM pipeline data import — gas 297, oil 334 (#3406)
* feat(energy-atlas): GEM pipeline data import — gas 75→297, oil 75→334 (parity-push closure)

Closes the ~3.6× pipeline-scale gap that PR #3397's import infrastructure
was built for. Per docs/methodology/pipelines.mdx operator runbook.

Source releases (CC-BY 4.0, attribution preserved in registry envelope):
  - GEM-GGIT-Gas-Pipelines-2025-11.xlsx
    SHA256: f56d8b14400e558f06e53a4205034d3d506fc38c5ae6bf58000252f87b1845e6
    URL:    https://globalenergymonitor.org/wp-content/uploads/2025/11/GEM-GGIT-Gas-Pipelines-2025-11.xlsx
  - GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx
    SHA256: d1648d28aed99cfd2264047f1e944ddfccf50ce9feeac7de5db233c601dc3bb2
    URL:    https://globalenergymonitor.org/wp-content/uploads/2025/03/GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx

Pre-conversion: GeoJSON (geometry endpoints) + XLSX (column properties) →
canonical operator-shape JSON via /tmp/gem-import/convert.py. Filter knobs:
  - status ∈ {operating, construction}
  - length ≥ 750 km (gas) / 400 km (oil): asymmetric per-fuel trunk-class thresholds
  - capacity unit conversions: bcm/y native; MMcf/d, MMSCMD, mtpa, m3/day,
    bpd, Mb/d, kbd → bcm/y (gas) or bbl/d (oil) at canonical conversion factors.
  - Country names → ISO 3166-1 alpha-2 via pycountry + alias table.
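
A minimal sketch of the status and length knobs, written in JavaScript for illustration only (the converter itself is a Python script). The row shape `{ fuel, status, lengthKm }` and the helper name `passesFilter` are assumptions, not the converter's actual API.

```javascript
// Thresholds mirror the filter knobs listed above.
const MIN_LENGTH_KM = { gas: 750, oil: 400 };
const KEPT_STATUSES = new Set(['operating', 'construction']);

// Returns true when a row survives both the status and the length filter.
// The row shape is hypothetical; real GEM column names differ.
function passesFilter(row) {
  const status = String(row.status ?? '').trim().toLowerCase();
  if (!KEPT_STATUSES.has(status)) return false;
  const floor = MIN_LENGTH_KM[row.fuel];
  return floor !== undefined && Number.isFinite(row.lengthKm) && row.lengthKm >= floor;
}
```

The asymmetric floors mean a 500 km oil line is kept while a 500 km gas line is dropped.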

Merge results (via scripts/import-gem-pipelines.mjs --merge):
  gas: +222 added, 15 duplicates skipped (haversine ≤ 5km AND token Jaccard ≥ 0.6)
  oil: +259 added, 16 duplicates skipped
  Final: 297 gas / 334 oil. Hand-curated 75+75 preserved with full evidence;
  GEM rows ship physicalStateSource='gem', classifierConfidence=0.4,
  operatorStatement=null, sanctionRefs=[].

Floor bump:
  scripts/_pipeline-registry.mjs MIN_PIPELINES_PER_REGISTRY 8 → 200.
  Live counts (297/334) leave ~100 rows of jitter headroom so a partial
  re-import or coverage-narrowing release fails loud rather than halving
  the registry silently.

Tests:
  - tests/pipelines-registry.test.mts: bumped synthetic-registry
    Array.from({length:8}) → length:210 to clear new floor; added 'gem' to
    the evidence-source whitelist for non-flowing badges (parity with the
    derivePipelinePublicBadge audit done in PR #3397 U1).
  - tests/import-gem-pipelines.test.mjs: bumped registry-conformance loop
    3 → 70 to clear new floor.
  - 51/51 pipeline tests pass; tsc --noEmit clean.

vs peer reference site (281 gas + 265 oil): we now match (gas 297) and
exceed (oil 334). Functional + visual + data parity for the energy variant
is closed; remaining gaps are editorial-cadence (weekly briefing) which
is intentionally out of scope per the parity-push plan.

* docs(energy-atlas): land GEM converter + expand methodology runbook for quarterly refresh

PR #3406 imported the data but didn't land the conversion script that
produced it. This commit lands the converter at scripts/_gem-geojson-to-canonical.py
so future operators can reproduce the import deterministically, and rewrites
the docs/methodology/pipelines.mdx runbook to match what actually works:

- Use GeoJSON (not XLSX) — the XLSX has properties but no lat/lon columns;
  only the GIS .zip's GeoJSON has both. The original runbook said to download
  XLSX which would fail at the lat/lon validation step.
- Cadence: quarterly refresh, with concrete signals (peer-site comparison,
  90-day calendar reminder).
- Source datasets: explicit GGIT (gas) + GOIT (oil/NGL) tracker names so
  future operators don't re-request the wrong dataset (the Extraction
  Tracker = wells/fields, NOT pipelines — ours requires the Infrastructure
  Trackers).
- Last-known-good URLs documented + URL pattern explained as fallback when
  GEM rotates per release.
- Filter knob defaults documented inline (gas ≥ 750km, oil ≥ 400km, status
  ∈ {operating, construction}, capacity unit conversion table).
- Failure-mode table mapping common errors to fixes.

Converter takes paths via env vars (GEM_GAS_GEOJSON, GEM_OIL_GEOJSON,
GEM_DOWNLOADED_AT, GEM_SOURCE_VERSION) instead of hardcoded paths so it
works for any release without code edits.

* fix(energy-atlas): close PR #3406 review findings — dedup + zero-length + test

Three Greptile findings on PR #3406:

P1 — Dedup miss (Dampier-Bunbury):
  Same physical pipeline existed in both registries — curated `dampier-bunbury`
  and GEM-imported `dampier-to-bunbury-natural-gas-pipeline-au` — because GEM
  digitized only the southern 60% of the line. The shared Bunbury terminus
  matched at 13.7 km but the average-endpoint distance was 287 km, far over
  the 5 km gate.
  Fix: scripts/_pipeline-dedup.mjs adds a name-set-identity short-circuit —
  if Jaccard == 1.0 (after stopword removal) AND any of the 4 endpoint
  pairings is ≤ 25 km, treat as duplicate. The 25 km anchor preserves the
  existing "name collision in different ocean → still added" contract.
  Added regression test: identical Dampier-Bunbury inputs → 0 added, 1
  skipped, matched against `dampier-bunbury`.
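
The short-circuit described above could look roughly like the sketch below. The stopword list, the row shape (`name`, `start`, `end`) and the helper names are assumptions for illustration; the real helper in scripts/_pipeline-dedup.mjs may differ.

```javascript
const EARTH_RADIUS_KM = 6371;
// Assumed stopword list; the real list is not shown in this log.
const STOPWORDS = new Set(['pipeline', 'natural', 'gas', 'oil', 'to', 'the']);

// Great-circle distance between two { lat, lon } points, in km.
function haversineKm(a, b) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

function nameTokens(name) {
  return new Set(name.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && !STOPWORDS.has(t)));
}

// Identical non-empty token sets are equivalent to Jaccard == 1.0.
function sameTokenSet(a, b) {
  const ta = nameTokens(a);
  const tb = nameTokens(b);
  return ta.size > 0 && ta.size === tb.size && [...ta].every((t) => tb.has(t));
}

// Duplicate when names are set-identical AND any of the 4 endpoint pairings is close.
function isNameIdentityDuplicate(a, b, anchorKm = 25) {
  if (!sameTokenSet(a.name, b.name)) return false;
  const pairings = [[a.start, b.start], [a.start, b.end], [a.end, b.start], [a.end, b.end]];
  return pairings.some(([p, q]) => haversineKm(p, q) <= anchorKm);
}
```

The distance anchor is what keeps two same-named lines in different regions from collapsing into one row.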

P1 — Zero-length geometry (9 rows: Trans-Alaska, Enbridge Line 3, Ichthys, etc.):
  GEM source GeoJSON occasionally has a Point geometry or single-coord
  LineString, producing pipelines where startPoint == endPoint. They render
  as map-point artifacts and skew aggregate-length stats.
  Fix (defense in depth):
    - scripts/_gem-geojson-to-canonical.py drops at conversion time
      (`zero_length` reason in drop log).
    - scripts/_pipeline-registry.mjs validateRegistry rejects defensively
      so even a hand-curated row with degenerate geometry fails loud.
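
A hedged sketch of the degenerate-geometry check. `startPoint` and `endPoint` come from the description above; the `lat`/`lon` sub-fields, the epsilon and the function names are assumptions.

```javascript
// True when both endpoints coincide, i.e. the row has no drawable length.
function isZeroLength(row, epsilonDeg = 1e-9) {
  return Math.abs(row.startPoint.lat - row.endPoint.lat) <= epsilonDeg &&
    Math.abs(row.startPoint.lon - row.endPoint.lon) <= epsilonDeg;
}

// Splits rows into kept rows and a drop log tagged with the `zero_length` reason.
function partitionByGeometry(rows) {
  const kept = [];
  const dropped = [];
  for (const row of rows) {
    if (isZeroLength(row)) dropped.push({ id: row.id, reason: 'zero_length' });
    else kept.push(row);
  }
  return { kept, dropped };
}
```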

P2 — Test repetition coupled to fixture row count:
  Hardcoded `for (let i = 0; i < 70; i++)` × 3 fixture rows = 210 silently
  breaks if fixture is trimmed below 3.
  Fix: `Math.ceil(REGISTRY_FLOOR / fixture.length) + 5` derives reps from
  the floor and current fixture length.
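
The arithmetic behind the fix, as a small sketch. `repsToClearFloor` is a hypothetical name; only the `Math.ceil(REGISTRY_FLOOR / fixture.length) + 5` expression comes from the commit.

```javascript
// Number of fixture repetitions needed so reps * fixtureLength clears the floor.
function repsToClearFloor(floor, fixtureLength, margin = 5) {
  if (fixtureLength <= 0) throw new RangeError('fixture must contain at least one row');
  return Math.ceil(floor / fixtureLength) + margin;
}
```

With a floor of 200 and 3 fixture rows this gives 72 repetitions (216 rows). If the fixture shrinks to 2 rows it gives 105 repetitions (210 rows), where the old hardcoded 70 would have produced only 140.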

Re-run --merge with all fixes applied:
  gas: 75 → 293 (+218 added, 17 deduped — was 222/15 before; +2 catches via
       name-set-identity short-circuit; -2 zero-length never imported)
  oil: 75 → 325 (+250 added, 18 deduped — was 259/16; +2 catches; -7 zero-length)

Tests: 74/74 pipeline tests pass; tsc --noEmit clean.
2026-04-25 18:59:46 +04:00
Elie Habib
5c955691a9 feat(energy-atlas): live tanker map layer + contract (parity PR 3, plan U7-U8) (#3402)
* feat(energy-atlas): live tanker map layer + contract (PR 3, plan U7-U8)

Lands the third and final parity-push surface — per-vessel tanker positions
inside chokepoint bounding boxes, refreshed every 60s. Closes the visual
gap with peer reference energy-intel sites for the live AIS tanker view.

Per docs/plans/2026-04-25-003-feat-energy-parity-pushup-plan.md PR 3.
Codex-approved through 8 review rounds against origin/main @ 050073354.

U7 — Contract changes (relay + handler + proto + gateway + rate-limit + test):

- scripts/ais-relay.cjs: parallel `tankerReports` Map populated for AIS
  ship type 80-89 (tanker class) per ITU-R M.1371. SEPARATE from the
  existing `candidateReports` Map (military-only) so the existing
  military-detection consumer's contract stays unchanged. Snapshot
  endpoint extended to accept `bbox=swLat,swLon,neLat,neLon` + `tankers=true`
  query params, with bbox-filtering applied server-side. Tanker reports
  cleaned up on the same retention window as candidate reports; capped
  at 200 per response (10× headroom for global storage).
- proto/worldmonitor/maritime/v1/{get_,}vessel_snapshot.proto:
  - new `bool include_tankers = 6` request field
  - new `repeated SnapshotCandidateReport tanker_reports = 7` response
    field (reuses existing message shape; parallel to candidate_reports)
- server/worldmonitor/maritime/v1/get-vessel-snapshot.ts: REPLACES the
  prior 5-minute `with|without` cache with a request-keyed cache —
  (includeCandidates, includeTankers, quantizedBbox) — at 60s TTL for
  the live-tanker path and 5min TTL for the existing density/disruption
  consumers. Also adds 1° bbox quantization for cache-key reuse and a
  10° max-bbox guard (BboxTooLargeError) to prevent malicious clients
  from pulling all tankers through one query.
- server/gateway.ts: NEW `'live'` cache tier. CacheTier union extended;
  TIER_HEADERS + TIER_CDN_CACHE both gain entries with `s-maxage=60,
  stale-while-revalidate=60`. RPC_CACHE_TIER maps the maritime endpoint
  from `'no-store'` to `'live'` so the CDN absorbs concurrent identical
  requests across all viewers (without this, N viewers × 6 chokepoints
  hit AISStream upstream linearly).
- server/_shared/rate-limit.ts: ENDPOINT_RATE_POLICIES entry for the
  maritime endpoint at 60 req/min/IP — enough headroom for one user's
  6-chokepoint tab plus refreshes; flags only true scrape-class traffic.
- tests/route-cache-tier.test.mjs: regex extended to include `live` so
  the every-route-has-an-explicit-tier check still recognises the new
  mapping. Without this, the new tier would silently drop the maritime
  route from the validator's route map.
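
A rough sketch of how 1° quantization and a request-keyed cache key might fit together. Snapping outward (floor on the SW corner, ceil on the NE corner), the key layout and all function names are assumptions; the handler's actual rounding rule is not stated in this log.

```javascript
const MAX_BBOX_DEGREES = 10;

// Snap a bbox outward to whole degrees so nearby requests share one cache key.
function quantizeBbox({ swLat, swLon, neLat, neLon }) {
  return {
    swLat: Math.floor(swLat),
    swLon: Math.floor(swLon),
    neLat: Math.ceil(neLat),
    neLon: Math.ceil(neLon),
  };
}

// Size guard: neither span may exceed the maximum.
function isWithinSizeGuard(bbox, maxDegrees = MAX_BBOX_DEGREES) {
  return bbox.neLat - bbox.swLat <= maxDegrees && bbox.neLon - bbox.swLon <= maxDegrees;
}

// Cache key over (includeCandidates, includeTankers, quantizedBbox).
function cacheKey(includeCandidates, includeTankers, bbox) {
  const q = bbox ? quantizeBbox(bbox) : null;
  const bboxPart = q ? `${q.swLat},${q.swLon},${q.neLat},${q.neLon}` : 'global';
  return `${includeCandidates ? 1 : 0}|${includeTankers ? 1 : 0}|${bboxPart}`;
}
```

Two viewers panning slightly differently around the same chokepoint end up on the same key, which is what lets the 60s TTL absorb them.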

U8 — LiveTankersLayer consumer:

- src/services/live-tankers.ts: per-chokepoint fetcher with 60s in-memory
  cache. Promise.allSettled — never .all — so one chokepoint failing
  doesn't blank the whole layer (failed zones serve last-known data).
  Sources bbox centroids from src/config/chokepoint-registry.ts
  (CORRECT location; server/.../_chokepoint-ids.ts strips lat/lon).
  Default chokepoint set: hormuz_strait, suez, bab_el_mandeb,
  malacca_strait, panama, bosphorus.
- src/components/DeckGLMap.ts: new `createLiveTankersLayer()` ScatterplotLayer
  styled by speed (anchored amber when speed < 0.5 kn, underway cyan,
  unknown gray); new `loadLiveTankers()` async loader with abort-controller
  cancellation. Layer instantiated when `mapLayers.liveTankers && this.liveTankers.length > 0`.
- src/config/map-layer-definitions.ts: `LayerDefinition` for `liveTankers`
  with `renderers: ['flat'], deckGLOnly: true` (matches existing
  storageFacilities/fuelShortages pattern). Added to `VARIANT_LAYER_ORDER.energy`
  near `ais` so getLayersForVariant() and sanitizeLayersForVariant()
  include it on the energy variant — without this addition the layer
  would be silently stripped even when toggled on.
- src/types/index.ts: `liveTankers?: boolean` on the MapLayers union.
- src/config/panels.ts: ENERGY_MAP_LAYERS + ENERGY_MOBILE_MAP_LAYERS
  both gain `liveTankers: true`. Default `false` everywhere else.
- src/services/maritime/index.ts: existing snapshot consumer pinned to
  `includeTankers: false` to satisfy the proto's new required field;
  preserves identical behavior for the AIS-density / military-detection
  surfaces.
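
The per-chokepoint fetch with last-known fallback can be sketched as below. `fetchZone`, `lastKnown` and both function names are hypothetical; the real service in src/services/live-tankers.ts also applies a 60s cache, which is left out here.

```javascript
// Merge Promise.allSettled results: fresh data wins, failed zones reuse last-known data.
function mergeSettled(zoneIds, settled, lastKnown) {
  const out = new Map();
  settled.forEach((result, i) => {
    const id = zoneIds[i];
    if (result.status === 'fulfilled') {
      lastKnown.set(id, result.value);
      out.set(id, result.value);
    } else if (lastKnown.has(id)) {
      out.set(id, lastKnown.get(id));
    }
  });
  return out;
}

// allSettled (not all) so one rejected zone cannot blank the others.
async function fetchAllZones(zoneIds, fetchZone, lastKnown) {
  const settled = await Promise.allSettled(zoneIds.map(async (id) => fetchZone(id)));
  return mergeSettled(zoneIds, settled, lastKnown);
}
```

A zone that has never succeeded is simply absent from the result rather than represented by an empty list, so a caller can tell "no data yet" from "no tankers".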

Tests:
- npm run typecheck clean.
- 5 unit tests in tests/live-tankers-service.test.mjs cover the default
  chokepoint set (rejects ids that aren't in CHOKEPOINT_REGISTRY), the
  60s cache TTL pin (must match gateway 'live' tier s-maxage), and bbox
  derivation (±2° padding, total span under the 10° handler guard).
- tests/route-cache-tier.test.mjs continues to pass after the regex
  extension; the new maritime tier is correctly extracted.

Defense in depth:
- THREE-layer cache (CDN 'live' tier → handler bbox-keyed 60s → service
  in-memory 60s) means concurrent users hit the relay sub-linearly.
- Server-side 200-vessel cap on tanker_reports + client-side cap;
  protects layer render perf even on a runaway relay payload.
- Bbox-size guard (10° max) prevents a single global-bbox query from
  exfiltrating every tanker.
- Per-IP rate limit at 60/min covers normal use; flags scrape-class only.
- Existing military-detection contract preserved: `candidate_reports`
  field semantics unchanged; consumers self-select via include_tankers
  vs include_candidates rather than the response field changing meaning.

* fix(energy-atlas): wire LiveTankers loop + 400 bbox-range guard (PR3 review)

Three findings from review of #3402:

P1 — loadLiveTankers() was never called (DeckGLMap.ts:2999):
- Add ensureLiveTankersLoop() / stopLiveTankersLoop() helpers paired with
  the layer-enabled / layer-disabled branches in updateLayers(). The
  ensure helper kicks an immediate load + a 60s setInterval; idempotent
  so calling it on every layers update is safe.
- Wire stopLiveTankersLoop() into destroy() and into the layer-disabled
  branch so we don't hammer the relay when the layer is off.
- Layer factory now runs only when liveTankers.length > 0; ensureLoop
  fires on every observed-enabled tick so first-paint kicks the load
  even before the first tanker arrives.
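
A minimal sketch of an idempotent ensure/stop pair, assuming a hypothetical `RefreshLoop` class; the actual helpers live on DeckGLMap and are not shown in this log. Timers are injectable purely so the behaviour can be checked without waiting 60s.

```javascript
class RefreshLoop {
  constructor(load, intervalMs = 60_000, timers = {
    set: (fn, ms) => setInterval(fn, ms),
    clear: (handle) => clearInterval(handle),
  }) {
    this.load = load;
    this.intervalMs = intervalMs;
    this.timers = timers;
    this.handle = null;
  }

  // Safe to call on every layers update: only the first call starts work.
  ensure() {
    if (this.handle !== null) return;
    this.load(); // immediate first load so first paint does not wait a full interval
    this.handle = this.timers.set(() => this.load(), this.intervalMs);
  }

  // Called when the layer is disabled or the map is destroyed.
  stop() {
    if (this.handle === null) return;
    this.timers.clear(this.handle);
    this.handle = null;
  }
}
```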

P1 — bbox lat/lon range guard (get-vessel-snapshot.ts:253):
- Out-of-range bboxes (e.g. ne_lat=200) previously passed the size
  guard (200-195=5° < 10°) but failed at the relay, which silently
  drops the bbox param and returns a global capped subset — making
  the layer appear to "work" with stale phantom data.
- Add isValidLatLon() check inside extractAndValidateBbox(): every
  corner must satisfy [-90, 90] / [-180, 180] before the size guard
  runs. Failure throws BboxValidationError.

P2 — BboxTooLargeError surfaced as 500 instead of 400:
- server/error-mapper.ts maps errors to HTTP status by checking
  `'statusCode' in error`. The previous BboxTooLargeError extended
  Error without that property, so the mapper fell through to
  "unhandled error" → 500.
- Rename to BboxValidationError, add `readonly statusCode = 400`.
  Mapper now surfaces it as HTTP 400 with a descriptive reason.
- Keep BboxTooLargeError as a backwards-compat alias so existing
  imports / tests don't break.
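
The two bbox fixes combine roughly as in this sketch. The class name and `statusCode = 400` come from the commit; `validateBbox`, `statusFor` and the bbox field names are assumptions standing in for extractAndValidateBbox() and the mapper in server/error-mapper.ts.

```javascript
class BboxValidationError extends Error {
  constructor(reason) {
    super(reason);
    this.name = 'BboxValidationError';
    this.statusCode = 400; // lets a `'statusCode' in error` mapper pick the HTTP status
  }
}
// Backwards-compatible alias so older imports keep working.
const BboxTooLargeError = BboxValidationError;

function isValidLatLon(lat, lon) {
  return Number.isFinite(lat) && Number.isFinite(lon) &&
    lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180;
}

// Range check runs BEFORE the size guard, so ne_lat=200 cannot slip through as a 5 degree box.
function validateBbox(bbox, maxDegrees = 10) {
  if (!isValidLatLon(bbox.swLat, bbox.swLon) || !isValidLatLon(bbox.neLat, bbox.neLon)) {
    throw new BboxValidationError('bbox corner out of lat/lon range');
  }
  if (bbox.neLat - bbox.swLat > maxDegrees || bbox.neLon - bbox.swLon > maxDegrees) {
    throw new BboxValidationError(`bbox exceeds ${maxDegrees} degree maximum`);
  }
  return bbox;
}

// Simplified stand-in for the error mapper's status lookup.
function statusFor(error) {
  return error !== null && typeof error === 'object' && 'statusCode' in error
    ? error.statusCode
    : 500;
}
```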

Tests:
- Updated tests/server-handlers.test.mjs structural test to pin the
  new class name + statusCode + lat/lon range checks. 24 tests pass.
- typecheck (src + api) clean.

* fix(energy-atlas): thread AbortSignal through fetchLiveTankers (PR3 review #2)

P2 — AbortController was created + aborted but signal was never passed
into the actual fetch path (DeckGLMap.ts:3048 / live-tankers.ts:100):
- Toggling the layer off, destroying the map, or starting a new refresh
  did not actually cancel in-flight network work. A slow older refresh
  could complete after a newer one and overwrite this.liveTankers with
  stale data.

Threading:
- fetchLiveTankers() now accepts `options.signal: AbortSignal`. Signal
  is passed through to client.getVesselSnapshot() per chokepoint via
  the Connect-RPC client's standard `{ signal }` option.
- Per-zone abort handling: bail early if signal is already aborted
  before the fetch starts (saves a wasted RPC + cache write); re-check
  after the fetch resolves so a slow resolver can't clobber cache
  after the caller cancelled.

Stale-result race guard in DeckGLMap.loadLiveTankers:
- Capture controller in a local before storing on this.liveTankersAbort.
- After fetchLiveTankers resolves, drop the result if EITHER:
  - controller.signal is now aborted (newer load cancelled this one)
  - this.liveTankersAbort points to a different controller (a newer
    load already started + replaced us in the field)
- Without these guards, an older fetch that completed despite
  signal.aborted could still write to this.liveTankers and call
  updateLayers, racing with the newer load.
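
The stale-result guard can be illustrated with a small helper. `LatestLoadGuard` is a hypothetical name; in the commit the same checks are done inline against this.liveTankersAbort.

```javascript
class LatestLoadGuard {
  constructor() {
    this.current = null;
  }

  // Start a new load: abort the previous controller and remember the new one.
  begin() {
    if (this.current) this.current.abort();
    const controller = new AbortController();
    this.current = controller;
    return controller;
  }

  // A result may be applied only if its load was not aborted AND is still the latest.
  shouldApply(controller) {
    return !controller.signal.aborted && this.current === controller;
  }

  // Layer toggled off or map destroyed.
  cancel() {
    if (this.current) this.current.abort();
    this.current = null;
  }
}
```

A loader would capture `const controller = guard.begin()`, pass `controller.signal` to the fetch, and write results only when `guard.shouldApply(controller)` is true after the await.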

Tests: 1 new signature-pin test in tests/live-tankers-service.test.mts
verifies fetchLiveTankers accepts options.signal — guards against future
edits silently dropping the parameter and re-introducing the race.
6 tests pass. typecheck clean.

* fix(energy-atlas): bound vessel-snapshot cache via LRU eviction (PR3 review)

Greptile P2 finding: the in-process cache Map grows unbounded across the
serverless instance lifetime. Each distinct (includeCandidates,
includeTankers, quantizedBbox) triple creates a slot that's never evicted.
With 1° quantization and a misbehaving client the keyspace is ~64,000
entries — realistic load is ~12, so a 128-slot cap leaves 10x headroom
while making OOM impossible.

Implementation:
- SNAPSHOT_CACHE_MAX_SLOTS = 128.
- evictIfNeeded() walks insertion order and evicts the first slot whose
  inFlight is null. Slots with active fetches are skipped to avoid
  orphaning awaiting callers; we accept brief over-cap growth until
  in-flight settles.
- touchSlot() re-inserts a slot at the end of Map insertion order on
  hit / in-flight join / fresh write so it counts as most-recently-used.
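
A sketch of the eviction and touch helpers over a plain Map, relying on the fact that Map iteration follows insertion order. The slot shape `{ inFlight }` and the helper signatures are assumptions; only the names and the 128-slot cap come from the commit.

```javascript
const SNAPSHOT_CACHE_MAX_SLOTS = 128;

// Re-insert at the end of insertion order so the slot counts as most recently used.
function touchSlot(cache, key, slot) {
  cache.delete(key);
  cache.set(key, slot);
}

// Call before inserting a new slot. Evicts the oldest slot that has no active fetch.
// If every slot is in flight, nothing is evicted and the cache briefly exceeds the cap.
function evictIfNeeded(cache, maxSlots = SNAPSHOT_CACHE_MAX_SLOTS) {
  if (cache.size < maxSlots) return;
  for (const [key, slot] of cache) {
    if (slot.inFlight === null) {
      cache.delete(key);
      return;
    }
  }
}
```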
2026-04-25 17:56:23 +04:00
Elie Habib
d9a1f6a0f8 feat(energy-atlas): GEM pipeline import infrastructure (parity PR 1, plan U1-U4) (#3397)
* feat(energy-atlas): GEM pipeline import infrastructure (PR 1, plan U1-U4)

Lands the parser, dedup helper, validator extensions, and operator runbook
for the Global Energy Monitor (CC-BY 4.0) pipeline-data refresh — closing
~3.6× of the Energy Atlas pipeline-scale gap once the operator runs the
import.

Per docs/plans/2026-04-25-003-feat-energy-parity-pushup-plan.md PR 1.

U1 — Validator + schema extensions:
- Add `'gem'` to VALID_SOURCES in scripts/_pipeline-registry.mjs and to the
  evidence-bearing-source whitelist in derivePipelinePublicBadge so GEM-
  sourced offline rows derive a `disputed` badge via the external-signal
  rule (parity with `press`/`satellite`/`ais-relay`).
- Export VALID_SOURCES so tests assert against the same source-of-truth
  the validator uses (matches the VALID_OIL_PRODUCT_CLASSES pattern from
  PR #3383).
- Floor bump (MIN_PIPELINES_PER_REGISTRY 8→200) intentionally DEFERRED
  to the follow-up data PR — bumping it now would gate the existing 75+75
  hand-curated rows below the new floor and break seeder publishes
  before the GEM data lands.

U2 — GEM parser (test-first):
- scripts/import-gem-pipelines.mjs reads a local JSON file (operator pre-
  converts GEM Excel externally — no `xlsx` dependency added). Schema-
  drift sentinel throws on missing columns. Status mapping covers
  Operating/Construction/Cancelled/Mothballed/Idle/Shut-in. ProductClass
  mapping covers Crude Oil / Refined Products / mixed-flow notes.
  Capacity-unit conversion handles bcm/y, bbl/d, Mbd, kbd.
- 22 tests in tests/import-gem-pipelines.test.mjs cover schema sentinel,
  fuel split, status mapping, productClass mapping, capacity conversion,
  minimum-viable-evidence shape, registry-shape conformance, and bad-
  coordinate rejection.

U3 — Deduplication (pure deterministic):
- scripts/_pipeline-dedup.mjs: dedupePipelines(existing, candidates) →
  { toAdd, skippedDuplicates }. Match rule: haversine ≤5km AND name
  Jaccard ≥0.6 (BOTH required). Reverse-direction-pair-aware.
- 19 tests cover internal helpers, match logic, id collision, determinism,
  and empty inputs.
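
The match rule above can be sketched as follows. Averaging the two endpoint distances and taking the better of the forward and reversed pairings is an assumption about how "reverse-direction-pair-aware" is realised; the tokenizer and row shape are also illustrative.

```javascript
const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two { lat, lon } points, in km.
function haversineKm(a, b) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

function tokens(name) {
  return new Set(name.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// |A ∩ B| / |A ∪ B| over name tokens; 0 for two empty names.
function jaccard(a, b) {
  const intersection = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Average endpoint distance, taking whichever direction pairing is closer.
function endpointDistanceKm(a, b) {
  const forward = (haversineKm(a.start, b.start) + haversineKm(a.end, b.end)) / 2;
  const reverse = (haversineKm(a.start, b.end) + haversineKm(a.end, b.start)) / 2;
  return Math.min(forward, reverse);
}

// BOTH conditions are required.
function isDuplicate(a, b, maxKm = 5, minJaccard = 0.6) {
  return endpointDistanceKm(a, b) <= maxKm &&
    jaccard(tokens(a.name), tokens(b.name)) >= minJaccard;
}
```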

U4 — Operator runbook (data import deferred):
- docs/methodology/pipelines.mdx: 7-step runbook for the operator to
  download GEM, pre-convert Excel→JSON, dry-run with --print-candidates,
  merge with --merge, bump the registry floor, and commit with
  provenance metadata.
- The actual data import is intentionally OUT OF SCOPE for this agent-
  authored PR because GEM downloads are registration-gated. A follow-up
  PR will commit the imported scripts/data/pipelines-{gas,oil}.json +
  bump MIN_PIPELINES_PER_REGISTRY → 200 + record the GEM release SHA256.

Tests: typecheck clean; 67 tests pass across the three test files.

Codex-approved through 8 review rounds against origin/main @ 050073354.

* fix(energy-atlas): wire --merge to dedupePipelines + within-batch dedup (PR1 review)

P1 — --merge was a TODO no-op (import-gem-pipelines.mjs:291):
- Previously exited with code 2 + a "TODO: wire dedup once U3 lands"
  message. The PR body and the methodology runbook both advertised
  --merge as the operator path.
- Add mergeIntoRegistry(filename, candidates) helper that loads the
  existing envelope, runs dedupePipelines() against the candidate
  list, sorts new entries alphabetically by id (stable diff on rerun),
  validates the merged registry via validateRegistry(), and writes
  to disk only after validation passes. CLI --merge now invokes it
  for both gas and oil + prints a per-fuel summary.
- Source attribution: the registry envelope's `source` field is
  upgraded to mention GEM (CC-BY 4.0) on first merge so the data file
  itself documents provenance.

P2 — dedup transitive-match bug (_pipeline-dedup.mjs:120):
- Pre-fix loop checked each candidate ONLY against the original
  `existing` array. Two GEM rows that match each other but not anything
  in `existing` would BOTH be added, defeating the dedup contract for
  same-batch duplicates (real example: a primary GEM entry plus a
  duplicate row from a regional supplemental sheet).
- Now compares against existing FIRST (existing wins on cross-set
  match — preserves richer hand-curated evidence), then falls back to
  the already-accepted toAdd set. Within-batch matches retain the FIRST
  accepted candidate (deterministic by candidate-list order).
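
The ordering described above (existing first, then already-accepted candidates) can be sketched like this. `dedupeSketch` and the injected `isMatch` predicate are hypothetical; the real dedupePipelines(existing, candidates) uses its own built-in match rule.

```javascript
// Returns { toAdd, skippedDuplicates } with the same shape the commit describes.
function dedupeSketch(existing, candidates, isMatch) {
  const toAdd = [];
  const skippedDuplicates = [];
  for (const candidate of candidates) {
    // Existing rows win on a cross-set match; otherwise fall back to the accepted batch.
    const matched =
      existing.find((row) => isMatch(row, candidate)) ??
      toAdd.find((row) => isMatch(row, candidate));
    if (matched) skippedDuplicates.push({ candidate, matchedId: matched.id });
    else toAdd.push(candidate);
  }
  return { toAdd, skippedDuplicates };
}
```

Because candidates are processed in list order, the first of two same-batch duplicates is the one that survives, which keeps reruns deterministic.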

Tests: 22 in tests/pipeline-dedup.test.mjs (3 new) cover the
within-batch dedup, transitive collapse, and existing-wins-over-
already-accepted scenarios. typecheck clean.

* fix(energy-atlas): cross-file-atomic --merge (PR1 review #2)

P1 — partial-import on disk if oil validation fails after gas writes
(import-gem-pipelines.mjs:329 / :350):
- Previous flow ran `mergeIntoRegistry('pipelines-gas.json', gas)` which
  wrote to disk, then `mergeIntoRegistry('pipelines-oil.json', oil)`. If
  oil validation failed, the operator was left with a half-imported
  state: gas had GEM rows committed to disk but oil didn't.
- Refactor into a two-phase API:
  1. prepareMerge(filename, candidates) — pure, no disk I/O. Builds the
     merged envelope, validates it, throws on validation failure.
  2. mergeBothRegistries(gasCandidates, oilCandidates) — calls
     prepareMerge for BOTH fuels first; only writes to disk after BOTH
     pass validation. If oil's prepareMerge throws, gas was never
     touched on disk.
- CLI --merge now invokes mergeBothRegistries. The atomicity guarantee
  is documented inline in the helper.
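
A simplified sketch of the two-phase flow. The function names echo the commit, but the signatures here (injected `validate` and `write`, in-memory registries) are assumptions made so the sketch is self-contained. Note that this protects against validation failures only, not against a disk write failing midway.

```javascript
// Phase 1: pure. Builds and validates the merged registry, throws on failure.
function prepareMerge(filename, registry, candidates, validate) {
  const merged = { ...registry, pipelines: [...registry.pipelines, ...candidates] };
  const errors = validate(merged);
  if (errors.length > 0) throw new Error(`${filename}: ${errors.join('; ')}`);
  return { filename, merged };
}

// Phase 2: write only after BOTH fuels passed validation.
function mergeBothRegistries(gas, oil, validate, write) {
  const prepared = [
    prepareMerge('pipelines-gas.json', gas.registry, gas.candidates, validate),
    prepareMerge('pipelines-oil.json', oil.registry, oil.candidates, validate),
  ];
  for (const { filename, merged } of prepared) write(filename, merged);
}
```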

typecheck clean. No new tests because the existing dedup + validate
suites cover the underlying logic; the change is purely about call
ordering for atomicity.

* fix(energy-atlas): deterministic lastEvidenceUpdate + clarify test comment (PR1 review #3)

P2 — lastEvidenceUpdate was non-deterministic (Greptile P2):
- Previous code used new Date().toISOString() per parser run, so two runs
  of parseGemPipelines on the same input on different days produced
  byte-different output. Quarterly re-imports would produce noisy
  full-row diffs even when the upstream GEM data hadn't changed.
- New: resolveEvidenceTimestamp(envelope) derives the timestamp from
  envelope.downloadedAt (the operator-recorded date) or sourceVersion
  if it parses as ISO. Falls back to 1970-01-01 sentinel when neither
  is set — deliberately ugly so reviewers spot the missing field in
  the data file diff rather than getting silent today's date.
- Computed once per parse run so every emitted candidate gets the
  same timestamp.
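
One way the timestamp resolution could work, as a sketch. The field names `downloadedAt` and `sourceVersion` come from the commit; the exact "parses as ISO" test used here (a leading YYYY-MM-DD plus Date.parse) and the `toIsoOrNull` helper are assumptions.

```javascript
const EPOCH_SENTINEL = '1970-01-01T00:00:00.000Z';

// Returns a normalised ISO timestamp, or null if the value is not an ISO-style date.
function toIsoOrNull(value) {
  if (typeof value !== 'string' || !/^\d{4}-\d{2}-\d{2}/.test(value)) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : new Date(ms).toISOString();
}

// Deterministic: depends only on the envelope, never on the current clock.
function resolveEvidenceTimestamp(envelope) {
  return toIsoOrNull(envelope.downloadedAt) ??
    toIsoOrNull(envelope.sourceVersion) ??
    EPOCH_SENTINEL;
}
```

Calling it once per parse run and reusing the value keeps every emitted candidate on the same timestamp.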

P2 — misleading test comment (Greptile P2):
- Comment in tests/import-gem-pipelines.test.mjs:136 said "400_000 bbl/d
  ÷ 1000 = 400 Mbd" while the assertion correctly expects 0.4 (because
  the convention is millions, not thousands). Rewrote the comment to
  state the actual rule + arithmetic clearly.

3 new tests for determinism: (a) two parser runs produce identical
output, (b) timestamp derives from downloadedAt, (c) missing date
yields the epoch sentinel (loud failure mode).
2026-04-25 17:55:45 +04:00
Elie Habib
eeffac31bf fix(vercel): drop fragile VERCEL_GIT_PULL_REQUEST_ID guard in ignore step (#3404)
scripts/vercel-ignore.sh skipped any preview deploy where
VERCEL_GIT_PULL_REQUEST_ID was empty. Vercel only populates that var
on fresh PR-aware webhook events; manual "Redeploy" / "Redeploy
without cache" from the dashboard, and some integration edge cases,
leave it empty even on commits attached to an open PR. The merge-base
diff against origin/main below it is already the authoritative
"touched anything web-relevant" check and is strictly stronger.

Repro: PR #3403 commit 24d511e29 on feat/usage-telemetry — five api/
+ server/ files clearly modified, build canceled at line 18 before
the path diff ran. Local replay with PR_ID unset now exits 1 (build).
2026-04-25 17:53:47 +04:00
Elie Habib
92dd046820 fix(brief): address Greptile P1 + P4 review on merged PR #3396 (#3401)
P1 — False-positive PARITY REGRESSION for AI-digest opt-out users
  (scripts/seed-digest-notifications.mjs)

When rule.aiDigestEnabled === false, briefLead is intentionally
null (no summary in channel bodies), but envLead still reads the
envelope's stub lead. The string comparison null !== '<stub lead>'
fired channels_equal=false on every tick for every opted-out user
— flooding the parity log with noise and risking the PARITY
REGRESSION alert becoming useless.

The WARN was already gated by `briefLead && envLead` (so no Sentry
flood), but the LOG line still misled operators counting
channels_equal=false. Gate the entire parity-log block on the same
condition that governs briefLead population:

  if (AI_DIGEST_ENABLED && rule.aiDigestEnabled !== false) {
    // … parity log + warn …
  }

Opt-out users now produce no parity-log line at all (correct —
there's no canonical synthesis to compare against).

P4 — greetingBucket '' fallback semantics
  (scripts/lib/brief-llm.mjs)

Doc-only — Greptile flagged that unrecognised greetings collapse
to '' (a single bucket). Added a comment clarifying this is
intentional: '' is a stable fourth bucket, not a sentinel for
"missing data". A user whose greeting flips between recognised
and unrecognised values gets different cache keys, which is
correct (those produce visibly different leads).

Other Greptile findings (no code change — replied via PR comment):
- P2 (double-fetch in [...sortedDue, ...sortedAll]): already
  addressed in helper extraction commit df3563080 of PR #3396 —
  see `seen` Set dedupe at scripts/lib/digest-orchestration-helpers.mjs:103.
- P2 (parity check no-op for opted-in): outdated as written —
  after 5d10cee86's per-rule synthesis, briefLead is per-rule and
  envLead is winner-rule's envelope.lead. They diverge for
  non-winner rules (legitimate); agree for winner rules (cache-
  shared via generateDigestProse). The check still serves its
  documented purpose for cache-drift detection.

Stacked on the merged PR #3396; opens as a follow-up since the
parent branch is now closed.

Test results: 7012/7012 (was 7006 pre-rebase onto post-merge main).
2026-04-25 16:43:50 +04:00
Elie Habib
2f5445284b fix(brief): single canonical synthesis brain — eliminate email/brief lead divergence (#3396)
* feat(brief-llm): canonical synthesis prompt + v3 cache key

Extends generateDigestProse to be the single source of truth for
brief executive-summary synthesis (canonicalises what was previously
split between brief-llm's generateDigestProse and seed-digest-
notifications.mjs's generateAISummary). Ports Brain B's prompt
features into buildDigestPrompt:

- ctx={profile, greeting, isPublic} parameter (back-compat: 4-arg
  callers behave like today)
- per-story severity uppercased + short-hash prefix [h:XXXX] so the
  model can emit rankedStoryHashes for stable re-ranking
- profile lines + greeting opener appear only when ctx.isPublic !== true

validateDigestProseShape gains optional rankedStoryHashes (≥4-char
strings, capped to MAX_STORIES_PER_USER × 2). v2-shaped rows still
pass — field defaults to [].

hashDigestInput v3:
- material includes profile-SHA, greeting bucket, isPublic flag,
  per-story hash
- isPublic=true substitutes literal 'public' for userId in the cache
  key so all share-URL readers of the same (date, sensitivity, pool)
  hit ONE cache row (no PII in public cache key)

Adds generateDigestProsePublic(stories, sensitivity, deps) wrapper —
no userId param by design — for the share-URL surface.

Cache prefix bumped brief:llm:digest:v2 → v3. v2 rows expire on TTL.
Per the v1→v2 precedent (see hashDigestInput comment), one-tick cost
on rollout is acceptable for cache-key correctness.

Tests: 72/72 passing in tests/brief-llm.test.mjs (8 new for the v3
behaviors), full data suite 6952/6952.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 1, Codex-approved (5 rounds).

* feat(brief): envelope v3 — adds digest.publicLead for share-URL surface

Bumps BRIEF_ENVELOPE_VERSION 2 → 3. Adds optional
BriefDigest.publicLead — non-personalised executive lead generated
by generateDigestProsePublic (already in this branch from the
previous commit) for the public share-URL surface. Personalised
`lead` is the canonical synthesis for authenticated channels;
publicLead is its profile-stripped sibling so api/brief/public/*
never serves user-specific content (watched assets/regions).

SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3] keeps v1 + v2 envelopes
in the 7-day TTL window readable through the rollout — the
composer only ever writes the current version, but readers must
tolerate older shapes that haven't expired yet. Same rollout
pattern used at the v1 → v2 bump.

Renderer changes (server/_shared/brief-render.js):
- ALLOWED_DIGEST_KEYS gains 'publicLead' (closed-key-set still
  enforced; v2 envelopes pass because publicLead === undefined is
  the v2 shape).
- assertBriefEnvelope: new isNonEmptyString check on publicLead
  when present. Type contract enforced; absence is OK.

Tests (tests/brief-magazine-render.test.mjs):
- New describe block "v3 publicLead field": v3 envelope renders;
  malformed publicLead rejected; v2 envelope still passes; ad-hoc
  digest keys (e.g. synthesisLevel) still rejected — confirming
  the closed-key-set defense holds for the cron-local-only fields
  the orchestrator must NOT persist.
- BRIEF_ENVELOPE_VERSION pin updated 2 → 3 with rollout-rationale
  comment.

Test results: 182 brief-related tests pass; full data suite
6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 2, Codex Round-3 Medium #2.

* feat(brief): synthesis splice + rankedStoryHashes pre-cap re-order

Plumbs the canonical synthesis output (lead, threads, signals,
publicLead, rankedStoryHashes from generateDigestProse) through the
pure composer so the orchestration layer can hand pre-resolved data
into envelope.digest. Composer stays sync / no I/O — Codex Round-2
High #2 honored.

Changes:

scripts/lib/brief-compose.mjs:
- digestStoryToUpstreamTopStory now emits `hash` (the digest story's
  stable identifier, falls back to titleHash when absent). Without
  this, rankedStoryHashes from the LLM has nothing to match against.
- composeBriefFromDigestStories accepts opts.synthesis = {lead,
  threads, signals, rankedStoryHashes?, publicLead?}. When passed,
  splices into envelope.digest after the stub is built. Partial
  synthesis (e.g. only `lead` populated) keeps stub defaults for the
  other fields — graceful degradation when L2 fallback fires.

shared/brief-filter.js:
- filterTopStories accepts optional rankedStoryHashes. New helper
  applyRankedOrder re-orders stories by short-hash prefix match
  BEFORE the cap is applied, so the model's editorial judgment of
  importance survives MAX_STORIES_PER_USER. Stable for ties; stories
  not in the ranking come after in original order. Empty/missing
  ranking is a no-op (legacy callers unchanged).
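
A sketch of the pre-cap re-ordering, assuming stories carry a string `hash` and the ranking holds short prefixes. The body below is illustrative rather than the shipped applyRankedOrder, and `rankThenCap` is a hypothetical wrapper.

```javascript
// Re-order stories by ranked short-hash prefix; unranked stories follow in original order.
function applyRankedOrder(stories, rankedStoryHashes) {
  if (!Array.isArray(rankedStoryHashes) || rankedStoryHashes.length === 0) return stories;
  const rankOf = (story) => {
    const hash = typeof story.hash === 'string' ? story.hash : '';
    const idx = rankedStoryHashes.findIndex((prefix) => prefix.length > 0 && hash.startsWith(prefix));
    return idx === -1 ? Infinity : idx;
  };
  return stories
    .map((story, index) => ({ story, index, rank: rankOf(story) }))
    // Infinity - Infinity is NaN (falsy), so unranked pairs fall back to original index.
    .sort((a, b) => (a.rank - b.rank) || (a.index - b.index))
    .map((entry) => entry.story);
}

// Ranking is applied BEFORE the cap so ranked stories are not cut off by position.
function rankThenCap(stories, rankedStoryHashes, cap) {
  return applyRankedOrder(stories, rankedStoryHashes).slice(0, cap);
}
```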

shared/brief-filter.d.ts:
- filterTopStories signature gains rankedStoryHashes?: string[].
- UpstreamTopStory gains hash?: unknown (carried through from
  digestStoryToUpstreamTopStory).

Tests added (tests/brief-from-digest-stories.test.mjs):
- synthesis substitutes lead/threads/signals/publicLead.
- legacy 4-arg callers (no synthesis) keep stub lead.
- partial synthesis (only lead) keeps stub threads/signals.
- rankedStoryHashes re-orders pool before cap.
- short-hash prefix match (model emits 8 chars; story carries full).
- unranked stories go after in original order.

Test results: 33/33 in brief-from-digest-stories; 182/182 across all
brief tests; full data suite 6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 3, Codex Round-2 Low + Round-2 High #2.

* feat(brief): single canonical synthesis per user; rewire all channels

Restructures the digest cron's per-user compose + send loops to
produce ONE canonical synthesis per user per issueSlot, so the lead
text shown by every channel (email HTML, plain-text, Telegram, Slack,
Discord, webhook) and by the magazine is byte-identical. This
eliminates the "two-brain" divergence that was producing different
exec summaries on different surfaces (observed 2026-04-25 0802).

Architecture:

composeBriefsForRun (orchestration):
- Pre-annotates every eligible rule with lastSentAt + isDue once,
  before the per-user pass. Same getLastSentAt helper the send loop
  uses so compose + send agree on lastSentAt for every rule.

composeAndStoreBriefForUser (per-user):
- Two-pass winner walk: try DUE rules first (sortedDue), fall back
  to ALL eligible rules (sortedAll) for compose-only ticks.
  Preserves today's dashboard refresh contract for weekly /
  twice_daily users on non-due ticks (Codex Round-4 High #1).
- Within each pass, walk by compareRules priority and pick the
  FIRST candidate with a non-empty pool — mirrors today's behavior
  at scripts/seed-digest-notifications.mjs:1044 and prevents the
  "highest-priority but empty pool" edge case (Codex Round-4
  Medium #2).
- Three-level synthesis fallback chain:
    L1: generateDigestProse(fullPool, ctx={profile,greeting,!public})
    L2: generateDigestProse(envelope-sized slice, ctx={})
    L3: stub from assembleStubbedBriefEnvelope
  Distinct log lines per fallback level so ops can quantify
  failure-mode distribution.
- Generates publicLead in parallel via generateDigestProsePublic
  (no userId param; cache-shared across all share-URL readers).
- Splices synthesis into envelope via composer's optional
  `synthesis` arg (Step 3); rankedStoryHashes re-orders the pool
  BEFORE the cap so editorial importance survives MAX_STORIES.
- synthesisLevel stored in the cron-local briefByUser entry — NOT
  persisted in the envelope (renderer's assertNoExtraKeys would
  reject; Codex Round-2 Medium #5).
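
The three-level fallback chain above can be sketched roughly as follows. `generateProse`, `buildStub`, and `log` are injected stand-ins with hypothetical signatures, and the log strings are illustrative rather than the cron's real log lines:

```javascript
// Sketch only: not the real generateDigestProse API.
async function attempt(label, fn, log) {
  try {
    const prose = await fn();
    if (prose) return prose;
    log(`[digest] synthesis ${label} returned nothing`);
  } catch (err) {
    log(`[digest] synthesis ${label} failed: ${err.message}`);
  }
  return null;
}

async function runSynthesisWithFallback({ fullPool, envelopeSlice, ctx, generateProse, buildStub, log }) {
  // L1: full pool with the personalised context.
  const l1 = await attempt('L1', () => generateProse(fullPool, ctx), log);
  if (l1) return { level: 1, prose: l1 };
  // L2: envelope-sized slice with an empty context.
  const l2 = await attempt('L2', () => generateProse(envelopeSlice, {}), log);
  if (l2) return { level: 2, prose: l2 };
  // L3: deterministic stub, no LLM involved.
  log('[digest] synthesis fell back to L3 stub');
  return { level: 3, prose: buildStub() };
}
```

Returning the level alongside the prose is what lets the caller keep synthesisLevel in cron-local state without writing it into the envelope.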

Send loop:
- Reads lastSentAt via shared getLastSentAt helper (single source
  of truth with compose flow).
- briefLead = brief?.envelope?.data?.digest?.lead — the canonical
  lead. Passed to buildChannelBodies (text/Telegram/Slack/Discord),
  injectEmailSummary (HTML email), and sendWebhook (webhook
  payload's `summary` field). All-channel parity (Codex Round-1
  Medium #6).
- Subject ternary reads cron-local synthesisLevel: 1 or 2 →
  "Intelligence Brief", 3 → "Digest" (preserves today's UX for
  fallback paths; Codex Round-1 Missing #5).

Removed:
- generateAISummary() — the second LLM call that produced the
  divergent email lead. ~85 lines.
- AI_SUMMARY_CACHE_TTL constant — no longer referenced. The
  digest:ai-summary:v1:* cache rows expire on their existing 1h
  TTL (no cleanup pass).

Helpers added:
- getLastSentAt(rule) — extracted Upstash GET for digest:last-sent
  so compose + send both call one source of truth.
- buildSynthesisCtx(rule, nowMs) — formats profile + greeting for
  the canonical synthesis call. Preserves all today's prefs-fetch
  failure-mode behavior.

Composer:
- compareRules now exported from scripts/lib/brief-compose.mjs so
  the cron can sort each pass identically to groupEligibleRulesByUser.

Test results: full data suite 6962/6962 (was 6956 pre-Step 4; +6
new compose-synthesis tests from Step 3).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 4 + 4b. Codex-approved (5 rounds).

* fix(brief-render): public-share lead fail-safe — never leak personalised lead

Public-share render path (api/brief/public/[hash].ts → renderer
publicMode=true) MUST NEVER serve the personalised digest.lead
because that string can carry profile context — watched assets,
saved-region names, etc. — written by generateDigestProse with
ctx.profile populated.

Previously: redactForPublic redacted user.name and stories.whyMatters
but passed digest.lead through unchanged. Codex Round-2 High
(security finding).

Now (v3 envelope contract):
- redactForPublic substitutes digest.lead = digest.publicLead when
  the v3 envelope carries one (generated by generateDigestProsePublic
  with profile=null, cache-shared across all public readers).
- When publicLead is absent (v2 envelope still in TTL window OR v3
  envelope where publicLead generation failed), redactForPublic sets
  digest.lead to empty string.
- renderDigestGreeting: when lead is empty, OMIT the <blockquote>
  pull-quote entirely. Page still renders complete (greeting +
  horizontal rule), just without the italic lead block.
- NEVER falls back to the original personalised lead.

assertBriefEnvelope still validates publicLead's contract (when
present, must be a non-empty string) BEFORE redactForPublic runs,
so a malformed publicLead throws before any leak risk.
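
A minimal sketch of the lead fail-safe, assuming the digest object exposes `lead` and an optional `publicLead` as described. The helper name `redactLeadForPublic` is hypothetical; the real redactForPublic also redacts user.name and stories.whyMatters, which is omitted here:

```javascript
// Sketch only: covers the lead substitution, nothing else.
function redactLeadForPublic(digest) {
  const hasPublicLead =
    typeof digest.publicLead === 'string' && digest.publicLead.length > 0;
  return {
    ...digest,
    // Never fall back to the personalised lead: use publicLead or nothing.
    lead: hasPublicLead ? digest.publicLead : '',
  };
}

const v3 = redactLeadForPublic({ lead: 'Your watched assets moved', publicLead: 'Markets moved' });
const v2 = redactLeadForPublic({ lead: 'Your watched assets moved' });
console.log(JSON.stringify([v3.lead, v2.lead])); // prints ["Markets moved",""]
```

An empty lead is the signal for the renderer to omit the pull-quote rather than show anything personalised.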

Tests added (tests/brief-magazine-render.test.mjs):
- v3 envelope renders publicLead in pull-quote, personalised lead
  text never appears.
- v2 envelope (no publicLead) omits pull-quote; rest of page
  intact.
- empty-string publicLead rejected by validator (defensive).
- private render still uses personalised lead.

Test results: 68 brief-magazine-render tests pass; full data suite
remains green from prior commit.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 5, Codex Round-2 High (security).

* feat(digest): brief lead parity log + extra acceptance tests

Adds the parity-contract observability line and supplementary
acceptance tests for the canonical synthesis path.

Parity log (per send, after successful delivery):
  [digest] brief lead parity user=<id> rule=<v>:<s>:<lang>
    synthesis_level=<1|2|3> exec_len=<n> brief_lead_len=<n>
    channels_equal=<bool> public_lead_len=<n>

When channels_equal=false an extra WARN line fires —
"PARITY REGRESSION user=… — email lead != envelope lead." Sentry's
existing console-breadcrumb hook lifts this without an explicit
captureMessage call. Plan acceptance criterion A5.

Tests added (tests/brief-llm.test.mjs, +9):
- generateDigestProsePublic: two distinct callers with identical
  (sensitivity, story-pool) hit the SAME cache row (per Codex
  Round-2 Medium #4 — "no PII in public cache key").
- public + private writes never collide on cache key (defensive).
- greeting bucket change re-keys the personalised cache (Brain B
  parity).
- profile change re-keys the personalised cache.
- v3 cache prefix used (no v2 writes).

Test results: 77/77 in brief-llm; full data suite 6971/6971
(was 6962 pre-Step-7; +9 new public-cache tests).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 6 (partial) + 7. Acceptance A5, A6.g, A6.f.

* test(digest): backfill A6.h/i/l/m acceptance tests via helper extraction

* fix(brief): close two correctness regressions on multi-rule + public surface

Two findings from human review of the canonical-synthesis PR:

1. Public-share redaction leaked personalised signals + threads.
   The new prompt explicitly personalises both `lead` and `signals`
   ("personalise lead and signals"), but redactForPublic only
   substituted `lead` — leaving `signals` and `threads` intact.
   Public renderer's hasSignals gate would emit the signals page
   whenever `digest.signals.length > 0`, exposing watched-asset /
   region phrasing to anonymous readers. Same privacy bug class
   the original PR was meant to close, just on different fields.

2. Multi-rule users got cross-pool lead/storyList mismatch.
   composeAndStoreBriefForUser picks ONE winning rule for the
   canonical envelope. The send loop then injected that ONE
   `briefLead` into every due rule's channel body — even though
   each rule's storyList came from its own (per-rule) digest pool.
   Multi-rule users (e.g. `full` + `finance`) ended up with email
   bodies leading on geopolitics while listing finance stories.
   Cross-rule editorial mismatch reintroduced after the cross-
   surface fix.

Fix 1 — public signals + threads:
- Envelope shape: BriefDigest gains `publicSignals?: string[]` +
  `publicThreads?: BriefThread[]` (sibling fields to publicLead).
  Renderer's ALLOWED_DIGEST_KEYS extended; assertBriefEnvelope
  validates them when present.
- generateDigestProsePublic already returned a full prose object
  (lead + signals + threads) — orchestration now captures all
  three instead of just `.lead`. Composer splices each into its
  envelope slot.
- redactForPublic substitutes:
    digest.lead    ← publicLead (or empty → omits pull-quote)
    digest.signals ← publicSignals (or empty → omits signals page)
    digest.threads ← publicThreads (or category-derived stub via
                     new derivePublicThreadsStub helper — never
                     falls back to the personalised threads)
- New tests cover all three substitutions + their fail-safes.

Fix 2 — per-rule synthesis in send loop:
- Each due rule independently calls runSynthesisWithFallback over
  ITS OWN pool + ctx. Channel body lead is internally consistent
  with the storyList (both from the same pool).
- Cache absorbs the cost: when this is the winner rule, the
  synthesis hits the cache row written during the compose pass
  (same userId/sensitivity/pool/ctx) — no extra LLM call. Only
  multi-rule users with non-overlapping pools incur additional
  LLM calls.
- magazineUrl still points at the winner's envelope (single brief
  per user per slot — `(userId, issueSlot)` URL contract). Channel
  lead vs magazine lead may differ for non-winner rule sends;
  documented as acceptable trade-off (URL/key shape change to
  support per-rule magazines is out of scope for this PR).
- Parity log refined: adds `winner_match=<bool>` field. The
  PARITY REGRESSION warning now fires only when winner_match=true
  AND the channel lead differs from the envelope lead (the actual
  contract regression). Non-winner sends with legitimately
  different leads no longer spam the alert.
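
The refined alert condition can be expressed as a small predicate. This is a sketch under the assumption that the send loop has the three values at hand; the function and parameter names are hypothetical:

```javascript
// Sketch only: warn when the winner rule's channel lead drifts from the envelope lead.
function shouldWarnParityRegression({ winnerMatch, channelLead, envelopeLead }) {
  return winnerMatch === true && channelLead !== envelopeLead;
}

console.log(shouldWarnParityRegression({ winnerMatch: true, channelLead: 'a', envelopeLead: 'b' }));  // prints true
console.log(shouldWarnParityRegression({ winnerMatch: false, channelLead: 'a', envelopeLead: 'b' })); // prints false
```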

Test results:
- tests/brief-magazine-render.test.mjs: 75/75 (+7 new for public
  signals/threads + validator + private-mode-ignores-public-fields)
- Full data suite: 6995/6995 (was 6988; +7 net)
- typecheck + typecheck:api: clean

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Addresses 2 review findings on PR #3396 not anticipated in the
5-round Codex review.

* fix(brief): unify compose+send window, fall through filter-rejection

Address two residual risks in PR #3396 (single-canonical-brain refactor):

Risk 1 — canonical lead synthesized from a fixed 24h pool while the
send loop ships stories from `lastSentAt ?? 24h`. For weekly users
that meant a 24h-pool lead bolted onto a 7d email body — the same
cross-surface divergence the refactor was meant to eliminate, just in
a different shape. Twice-daily users hit a 12h-vs-24h variant.

Fix: extract the window formula to `digestWindowStartMs(lastSentAt,
nowMs, defaultLookbackMs)` in digest-orchestration-helpers.mjs and
call it from BOTH the compose path's digestFor closure AND the send
loop. The compose path now derives windowStart per-candidate from
`cand.lastSentAt`, identical to what the send loop will use for that
rule. Removed the now-unused BRIEF_STORY_WINDOW_MS constant.
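
A minimal sketch of the shared window formula, assuming it is the `lastSentAt ?? default lookback` rule quoted above. The real helper in digest-orchestration-helpers.mjs may carry extra validation:

```javascript
// Sketch only: same inputs as the helper named in this commit.
function digestWindowStartMs(lastSentAt, nowMs, defaultLookbackMs) {
  // ?? (not ||) so a stored lastSentAt of 0 is still honoured.
  return lastSentAt ?? (nowMs - defaultLookbackMs);
}

const DAY_MS = 24 * 60 * 60 * 1000;
const nowMs = Date.UTC(2026, 3, 25);
// Weekly user last sent 7 days ago: the window starts at that send.
console.log(digestWindowStartMs(nowMs - 7 * DAY_MS, nowMs, DAY_MS) === nowMs - 7 * DAY_MS); // prints true
// Never-sent user: the window falls back to the default lookback.
console.log(digestWindowStartMs(null, nowMs, DAY_MS) === nowMs - DAY_MS); // prints true
```

Because compose and send both call the same function with the same `lastSentAt`, the lead and the story list are drawn from the same window.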

Side-effect: digestFor now receives the full annotated candidate
(`cand`) instead of just the rule, so it can reach `cand.lastSentAt`.
Backwards-compatible at the helper level — pickWinningCandidateWithPool
forwards `cand` instead of `cand.rule`.

Cache memo hit rate drops since lastSentAt varies per-rule, but
correctness > a few extra Upstash GETs.

Risk 2 — pickWinningCandidateWithPool returned the first candidate
with a non-empty raw pool as winner. If composeBriefFromDigestStories
then dropped every story (URL/headline/shape filters), the caller
bailed without trying lower-priority candidates. Pre-PR behaviour was
to keep walking. This regressed multi-rule users whose top-priority
rule's pool happens to be entirely filter-rejected.

Fix: optional `tryCompose(cand, stories)` callback on
pickWinningCandidateWithPool. When provided, the helper calls it after
the non-empty pool check; falsy return → log filter-rejected and walk
to the next candidate; truthy → returns `{winner, stories,
composeResult}` so the caller can reuse the result. Without the
callback, legacy semantics preserved (existing tests + callers
unaffected).
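
The fall-through behaviour might look like the sketch below. It is synchronous for brevity, and the candidate shape (`cand.rule.id`), the `digestFor` stand-in, and the log text are assumptions rather than the real helper's contract:

```javascript
// Sketch only: walks candidates in priority order and skips filter-rejected ones.
function pickWinningCandidateWithPool(candidates, digestFor, log, tryCompose) {
  for (const cand of candidates) {
    const stories = digestFor(cand); // receives the full annotated candidate
    if (!stories || stories.length === 0) continue;
    if (!tryCompose) return { winner: cand, stories }; // legacy semantics
    const composeResult = tryCompose(cand, stories);
    if (!composeResult) {
      log(`filter-rejected rule=${cand.rule.id}`);
      continue; // walk to the next, lower-priority candidate
    }
    return { winner: cand, stories, composeResult };
  }
  return null; // every candidate was empty or filter-rejected
}

const cands = [{ rule: { id: 'full' } }, { rule: { id: 'finance' } }];
const pools = { full: [{ title: 'x' }], finance: [{ title: 'y' }] };
const rejections = [];
const picked = pickWinningCandidateWithPool(
  cands,
  (cand) => pools[cand.rule.id],
  (line) => rejections.push(line),
  (cand) => (cand.rule.id === 'finance' ? { envelope: 'ok' } : null),
);
console.log(picked.winner.rule.id, rejections.length); // prints "finance 1"
```

Returning `composeResult` lets the caller reuse the compose that already succeeded instead of running it a second time.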

Caller composeAndStoreBriefForUser passes a no-synthesis compose call
as tryCompose — cheap pure-JS, no I/O. Synthesis only runs once after
the winner is locked in, so the perf cost is one extra compose per
filter-rejected candidate, no extra LLM round-trips.

Tests:
- 10 new cases in tests/digest-orchestration-helpers.test.mjs
  covering: digestFor receiving full candidate; tryCompose
  fall-through to lower-priority; all-rejected returns null;
  composeResult forwarded; legacy semantics without tryCompose;
  digestWindowStartMs lastSentAt-vs-default branches; weekly +
  twice-daily window parity assertions; epoch-zero ?? guard.
- Updated tests/digest-cache-key-sensitivity.test.mjs static-shape
  regex to match the new `cand.rule.sensitivity` cache-key shape
  (intent unchanged: cache key MUST include sensitivity).

Stacked on PR #3396 — targets feat/brief-two-brain-divergence.
2026-04-25 16:22:31 +04:00
Elie Habib
dec7b64b17 fix(unrest): proxy-only fetch + 3-attempt retry for GDELT (#3395)
* fix(unrest): proxy-only fetch + 3-attempt retry for GDELT

Production logs showed PR #3362's 45s proxy timeout solved one failure mode
(CONNECT-tunnel timeouts) but ~80% of ticks now fail in 3-14 seconds with
either "Proxy CONNECT: HTTP/1.1 522 Server Error" (Cloudflare can't reach
GDELT origin) or "Client network socket disconnected before secure TLS
connection" (Decodo RSTs the handshake). These are fast-fails, not timeouts —
no amount of timeout bumping helps.

Two changes:

1. Drop the direct fetch entirely. Every direct attempt in 14h of logs
   errored with UND_ERR_CONNECT_TIMEOUT or ECONNRESET — 0% success since
   PR #3256 added the proxy fallback. The direct call costs ~8-30s per tick
   for nothing.

2. Wrap the proxy call in a 3-attempt retry with 1.5-3s jitter. Single-attempt
   per-tick success rate measured at ~18%; with 3 attempts that lifts to ~75%+
   under the same Decodo↔Cloudflare flake rate, comfortably keeping seedAge
   under the 120m STALE_SEED threshold.
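
A rough sketch of the 3-attempt retry with jitter. The option names echo the DI seams listed in the test notes further down, but the signatures here are assumptions, and the SyntaxError short-circuit follows the behaviour those notes describe:

```javascript
// Sketch only: proxyFetcher is any async function that resolves parsed JSON.
async function fetchWithRetry(proxyFetcher, { maxAttempts = 3, sleep, jitter } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const delayMs = jitter ?? (() => 1500 + Math.random() * 1500); // 1.5-3s
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await proxyFetcher();
    } catch (err) {
      // A malformed body fails the same way every time, so don't burn retries.
      if (err instanceof SyntaxError) throw err;
      lastError = err;
      if (attempt < maxAttempts) await wait(delayMs());
    }
  }
  throw lastError; // surface the LAST error after the final attempt
}
```

Injecting `sleep` and `jitter` keeps tests instant and deterministic, since no real timers need to fire.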

Deeper structural fix (out of scope here): wire ACLED credentials on the
Railway unrest service so GDELT isn't the single upstream.

* test(unrest): cover GDELT proxy retry path + no-proxy hard-fail

Address PR #3395 reviewer concerns:

(1) "no automated coverage for the new retry path or the no-proxy path"

Add scripts/seed-unrest-events.mjs DI seams (_proxyFetcher, _sleep,
_jitter, _maxAttempts, _resolveProxyForConnect) and a 6-test suite at
tests/seed-unrest-gdelt-fetch.test.mjs covering:

  1. Single-attempt success — no retries fire.
  2. 2 transient failures + 3rd-attempt success — recovers, returns JSON.
  3. All attempts fail — throws LAST error, exact attempt count.
  4. Malformed proxy body — SyntaxError short-circuits retry (deterministic
     parse failures shouldn't burn attempts).
  5. Missing CONNECT proxy creds — fetchGdeltEvents throws clear
     "PROXY_URL env var is not set" pointer for ops, asserts NO proxy
     fetcher invocation (no wasted network).
  6. End-to-end with retry — fetchGdeltEvents with one transient 522
     recovers and aggregates events normally.

Gate runSeed() entry-point with `import.meta.url === file://argv[1]` so
tests can `import` the module without triggering a real seed run.

(2) "review assumes Railway has Decodo creds; without them, fails immediately"

Yes — that's intentional. Direct fetch had 0% success in production for
weeks (every Railway tick errored UND_ERR_CONNECT_TIMEOUT or ECONNRESET)
since PR #3256 added the proxy fallback. Reintroducing it as "soft"
fallback would just add ~30s of latency + log noise per tick.

What's improved here: the no-proxy error message now names the missing
env var (PROXY_URL) so an operator who hits this in Railway logs has a
direct pointer instead of a generic "GDELT requires proxy" string.
2026-04-25 15:27:43 +04:00
Elie Habib
8f8213605f docs(brief-quality): correct help text — cap_truncation has no fallback estimate (#3393)
Greptile P3 follow-up on PR #3390 (already merged): the help comment
described cap_truncation_rate as "computed from production drop logs
if supplied via stdin, else estimated as max(0, in - 16)/in from
replay record counts" — but:

1. The "16" was stale (post PR #3389, cap default is 12).
2. The fallback estimate was never implemented. cap_truncation only
   appears when --drop-lines-stdin is passed.

Updated the comment to match what the code actually does: cap
metric is omitted entirely without stdin input. No fallback estimate
because replay records don't capture the post-cap output count, so
any derived value would be misleading.
2026-04-25 13:47:53 +04:00
Elie Habib
621ac8d300 feat(brief): topic-threshold sweep + quality dashboard + labeled pairs (#3390)
* feat(brief): topic-threshold sweep + daily quality dashboard + labeled pairs

Adds the "are we getting better" measurement infrastructure for the
brief topic-grouping pipeline. Three artifacts:

1. scripts/data/brief-adjacency-pairs.json — labeled "should-cluster"
   and "should-separate" pairs from real production briefs (12 pairs,
   7 cluster + 5 separate). Append-only labeled corpus.

2. scripts/sweep-topic-thresholds.mjs — pulls the per-tick replay log
   captured by writeReplayLog, reconstructs each tick's reps + cached
   embeddings, re-runs single-link clustering at multiple cosine
   thresholds, and outputs a markdown table with pair_recall,
   false_adjacency, topic_count, multi-member share, and a composite
   quality_score per threshold. Picks the highest-scoring as the
   recommendation.

3. scripts/brief-quality-report.mjs — daily quality dashboard. Pulls
   the latest tick, computes metrics at the active threshold, prints
   which labeled pairs were violated. Run before each config change;
   compare deltas; revert if quality_score drops.

Both scripts mirror the production slice (score floor + top-N) before
clustering so metrics reflect what users actually receive.
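
For readers unfamiliar with single-link clustering at a cosine threshold, here is a small self-contained sketch using union-find. It illustrates the technique only; the production singleLinkCluster may have a different signature, and the vectors are made up:

```javascript
// Sketch only: toy 2-d vectors stand in for real story embeddings.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function singleLinkCluster(vectors, threshold) {
  const parent = vectors.map((_, i) => i);
  const find = (i) => {
    let root = i;
    while (parent[root] !== root) root = parent[root];
    return root;
  };
  // Any pair at or above the threshold links their two clusters.
  for (let i = 0; i < vectors.length; i += 1) {
    for (let j = i + 1; j < vectors.length; j += 1) {
      if (cosine(vectors[i], vectors[j]) >= threshold) parent[find(i)] = find(j);
    }
  }
  const groups = new Map();
  vectors.forEach((_, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(i);
  });
  return [...groups.values()];
}

const reps = [[1, 0], [0.9, 0.436], [0, 1]];
console.log(singleLinkCluster(reps, 0.45).length); // prints 2
console.log(singleLinkCluster(reps, 0.42).length); // prints 1
```

The toy data shows why the threshold matters: the middle vector is about 0.436 similar to the third, so lowering the threshold from 0.45 to 0.42 chains all three into one topic.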

First sweep result against 2026-04-24 production replay records:

  threshold | quality | recall | false-adj
     0.30   |  0.649  | 100.0% | 100.0%
     0.32   |  0.705  | 100.0% |  75.0%
     0.35   |  0.825  | 100.0% |  33.3%
     0.38   |  0.815  | 100.0% |  33.3%
     0.40   |  0.815  | 100.0% |  33.3%
     0.42   |  0.895  | 100.0% |   8.3%  ← recommended
     0.45   |  0.535  |  36.4% |   0.0%  ← current production

Recommended env flip: DIGEST_DEDUP_TOPIC_THRESHOLD=0.42 — lifts
pair_recall from 36% to 100% while introducing only one false-adjacency
case (1 of 12 separate pairs).

* fix(brief-quality): reviewer feedback — cap-aware metrics + env-readable + missing-embed survival

Addresses 6 of 8 review comments on PR #3390:

B. Drop redundant groupTopicsPostDedup call. singleLinkCluster IS the
   partition algorithm production uses internally; the second pass was
   paying cosine work per threshold per tick to read only .error.

C. Score floor + topN + cap now read from production env
   (DIGEST_SCORE_MIN, DIGEST_MAX_ITEMS, DIGEST_MAX_STORIES_PER_USER)
   with documented defaults. CLI flags --score-floor / --top-n /
   --cap (--max-stories) override.

D. Filter reps with missing embeddings instead of returning null on
   the whole tick. Skip only if fewer than 5 survive. Drop count
   reported in Coverage.

E. Removed dead local cosine() in both files.

F. JSON metadata moved from underscore-prefixed top-level keys into a
   nested `meta: {}` object.

H. Recommendation output now names the Railway service explicitly
   so copy-paste can't go to the wrong service.

Adds visible-window pair-recall: scores cluster correctness on what
the user actually sees post-MAX_STORIES_PER_USER truncation, in
addition to partition correctness on the full 30-rep sliced set.

Visible-window finding (against 2026-04-24 production replay):

  threshold=0.45 cap=12 → visible_quality 0.916
  threshold=0.45 cap=16 → visible_quality 0.716  ← cap bump HURTS
  threshold=0.42 cap=12 → visible_quality 0.845
  threshold=0.42 cap=16 → visible_quality 0.845

PR #3389's cap bump 12 → 16 is NOT evidence-justified at the current
0.45 threshold. Positions 13-16 dilute without helping adjacency.
PR #3389 will be revised separately to keep cap=12 default but add
env-tunability.

Skipping G (helper extraction) per reviewer guidance — defer until a
third tool justifies the abstraction.

* fix(brief-quality): reviewer round 2 — single-star, cap=12 default, error path surfaced

Three Greptile review comments on PR #3390:

P1 — sweep star marker tagged every running-best row instead of only
the global best. Compute the global best in a first pass, render
in a second; only the single best row is starred.
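
The two-pass fix can be sketched as below. The row shape and the ` *` marker are assumptions for illustration; the quality values are the ones from the first sweep table:

```javascript
// Sketch only: pass 1 finds the global best, pass 2 renders and stars one row.
function renderSweepRows(rows) {
  const best = rows.reduce((max, row) => Math.max(max, row.quality), -Infinity);
  let starred = false;
  return rows.map((row) => {
    const mark = !starred && row.quality === best;
    if (mark) starred = true;
    return `${row.threshold.toFixed(2)} | ${row.quality.toFixed(3)}${mark ? ' *' : ''}`;
  });
}

const sweepRows = [
  { threshold: 0.3, quality: 0.649 },
  { threshold: 0.35, quality: 0.825 },
  { threshold: 0.42, quality: 0.895 },
  { threshold: 0.45, quality: 0.535 },
];
console.log(renderSweepRows(sweepRows).filter((line) => line.endsWith('*')).length); // prints 1
```

A running-best marker would have starred 0.30, 0.35, and 0.42 on this input; the two-pass version stars only 0.42.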

P2 — sweep MAX_STORIES_DEFAULT was 16 (assumed PR #3389 would land
the bump). PR #3389 was revised after evidence to keep cap at 12;
default reverted here too. Local runs without DIGEST_MAX_STORIES_PER_USER
now evaluate the correct production-equivalent visible window.

P2 — brief-quality-report's main() gated `scoreReplay` on
`embeddingByHash.size === reps.length`, defeating the missing-embed
survival logic inside scoreReplay (which already filters and falls
back to MIN_SURVIVING_REPS). Removed the outer gate; renderReport's
existing ⚠️ error path now surfaces the diagnostic when too few
embeddings survive instead of silently omitting the section.

Re-running the sweep with the corrected cap=12 default produces a
substantially different recommendation than the original commit
message claimed:

  threshold | visible_quality (cap=12)
     0.30   |   0.649
     0.35   |   0.625
     0.40   |   0.615
     0.42   |   0.845
     0.45   |   0.916    ← current production IS the local optimum

The original commit's "lower threshold to 0.42" recommendation was
an artifact of the cap=16 default. At the actual production cap (12),
the labeled corpus says the current 0.45 threshold is best. PR
description will be updated separately.

The 'shadowed `items`' Greptile mention refers to two `items`
declarations in DIFFERENT function scopes (`redisLrangeAll` and
`scoreOneTick`); not a real shadowing — skipped.
2026-04-25 12:08:15 +04:00
Elie Habib
3373b542e9 feat(brief): make MAX_STORIES_PER_USER env-tunable (default 12, evidence kept it at 12) (#3389)
* fix(brief): bump MAX_STORIES_PER_USER 12 → 16

Production telemetry from PR #3387 surfaced cap-truncation as the
dominant filter loss: 73% of `sensitivity=all` users had `dropped_cap=18`
per tick (30 qualified stories truncated to 12). Multi-member topics
straddling the position-12 boundary lost members.

Bumping the cap to 16 lets larger leading topics fit fully without
affecting `sensitivity=critical` users (their pools cap at 7-10 stories
— well below either threshold). Reduces dropped_cap from ~18 to ~14
per tick.

Validation signal: watch the `[digest] brief filter drops` log line on
Railway after deploy — `dropped_cap=` should drop by ~4 per tick.

Side effect: this addresses the dominant production signal that
Solution 3 (post-filter regroup, originally planned in
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md)
was supposed to handle. Production evidence killed Sol-3's premise
(0 non-cap drops in 70 samples), so this is a simpler, evidence-backed
alternative.

* revise(brief): keep MAX_STORIES_PER_USER default at 12, add env-tunability

Reviewer asked "why 16?" and the honest answer turned out to be: the
data doesn't support it. After landing PR #3390's sweep harness with
visible-window metrics, re-ran against 2026-04-24 production replay:

  threshold=0.45 cap=12 -> visible_quality 0.916 (best at this cap)
  threshold=0.45 cap=16 -> visible_quality 0.716 (cap bump HURTS)
  threshold=0.42 cap=12 -> visible_quality 0.845
  threshold=0.42 cap=16 -> visible_quality 0.845 (neutral)

At the current 0.45 threshold, positions 13-16 are mostly singletons
or members of "should-separate" clusters — they dilute the brief
without helping topic adjacency. Bumping the cap default to 16 was a
wrong inference from the dropped_cap=18 signal alone.

Revised approach:

- Default MAX_STORIES_PER_USER stays at 12 (matches historical prod).
- Constant becomes env-tunable via DIGEST_MAX_STORIES_PER_USER so any
  future sweep result can be acted on with a Railway env flip without
  a redeploy.
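
One way to read an env-tunable cap with a safe default is sketched below. The helper name `readMaxStoriesPerUser` is hypothetical, and the validation rules (positive integer, else fall back to 12) are an assumption about reasonable behaviour, not a description of the actual code:

```javascript
// Sketch only: takes the env object as a parameter so it is easy to test.
function readMaxStoriesPerUser(env, fallback = 12) {
  const parsed = Number.parseInt(env.DIGEST_MAX_STORIES_PER_USER ?? '', 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(readMaxStoriesPerUser({}));                                      // prints 12
console.log(readMaxStoriesPerUser({ DIGEST_MAX_STORIES_PER_USER: '16' }));   // prints 16
console.log(readMaxStoriesPerUser({ DIGEST_MAX_STORIES_PER_USER: 'oops' })); // prints 12
```

In the cron this would be called once with `process.env`, so an unset or malformed variable leaves the historical default in place.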

The actual evidence-backed adjacency fix from the sweep is to lower
DIGEST_DEDUP_TOPIC_THRESHOLD from 0.45 -> 0.42 (env flip; see PR #3390).

* fix(brief-llm): tie buildDigestPrompt + hashDigestInput slice to MAX_STORIES_PER_USER

Greptile P1 on PR #3389: with MAX_STORIES_PER_USER now env-tunable,
hard-coded stories.slice(0, 12) in buildDigestPrompt and hashDigestInput
would mean the LLM prose only references the first 12 stories when
the brief carries more. Stories 13+ would appear as visible cards
but be invisible to the AI summary — a quiet mismatch between reader
narrative and brief content.

Cache key MUST stay aligned with the prompt slice or it drifts from
the prompt content; same constant fixes both sites.

Exports MAX_STORIES_PER_USER from brief-compose.mjs (single source
of truth) and imports it in brief-llm.mjs. No behaviour change at
the default cap of 12.
2026-04-25 12:07:48 +04:00
Elie Habib
abdcdb581f feat(resilience): SWF manifest expansion + KIA split + new schema fields (#3391)
* feat(resilience): SWF manifest expansion + KIA split + new schema fields

Phase 1 of plan 2026-04-25-001 (Codex-approved round 5). Manifest-only
data correction; no construct change, no cache prefix bump.

Schema additions (loader-validated, misplacement-rejected):
- top-level: aum_usd, aum_year, aum_verified (primary-source AUM)
- under classification: aum_pct_of_audited (fraction multiplier),
  excluded_overlaps_with_reserves (boolean; documentation-only)

Manifest expansion (13 → 21 funds, 6 → 13 countries):
- UAE: +ICD ($320B verified), +ADQ ($199B verified), +EIA (unverified —
  loaded for documentation, excluded from scoring per data-integrity rule)
- KW: kia split into kia-grf (5%, access=0.9) + kia-fgf (95%,
  access=0.20). Corrects ~18× over-statement of crisis-deployable
  Kuwait sovereign wealth (audit found combined-AUM × 0.7 access
  applied $750B as "deployable" against ~$15B actual GRF stabilization
  capacity).
- CN: +CIC ($1.35T), +NSSF ($400B, statutorily-gated 0.20 tier),
  +SAFE-IC ($417B, excluded — overlaps SAFE FX reserves)
- HK: +HKMA-EF ($498B, excluded — overlaps HKMA reserves)
- KR: +KIC ($182B, IFSWF full member)
- AU: +Future Fund ($192B, pension-locked)
- OM: +OIA ($50B, IFSWF member)
- BH: +Mumtalakat ($19B)
- TL: +Petroleum Fund ($22B, GPFG-style high-transparency)

Re-audits (Phase 1E):
- ADIA access 0.3 → 0.4 (rubric flagged; ruler-discretionary deployment
  empirically demonstrated)
- Mubadala access 0.4 → 0.5 (rubric flagged); transparency 0.6 → 0.7
  (LM=10 + IFSWF full member alignment)

Rubric (docs/methodology/swf-classification-rubric.md):
- New "Statutorily-gated long-horizon" 0.20 access tier added between
  0.1 (sanctions/frozen) and 0.3 (intergenerational/ruler-discretionary).
  Anchored by KIA-FGF (Decree 106 of 1976; Council-of-Ministers + Emir
  decree gate; crossed once in extremis during COVID).

Seeder:
- Two new pure helpers: shouldSkipFundForBuffer (excluded/unverified
  decision) and applyAumPctOfAudited (sleeve fraction multiplier)
- Manifest-AUM bypass: if aum_verified=true AND aum_usd present,
  use that value directly (skip Wikipedia)
- Skip funds with excluded_overlaps_with_reserves=true (no
  double-counting against reserveAdequacy / liquidReserveAdequacy)
- Skip funds with aum_verified=false (load for documentation only)
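
A minimal sketch of the two pure helpers, assuming the field placement described under "Schema additions" (aum_verified at top level, the other two under classification). The real signatures may differ, and the AUM figure is illustrative only:

```javascript
// Sketch only: field placement follows the schema notes in this commit.
function shouldSkipFundForBuffer(fund) {
  if (fund.classification?.excluded_overlaps_with_reserves === true) return true;
  if (fund.aum_verified === false) return true; // loaded for documentation only
  return false;
}

function applyAumPctOfAudited(aumUsd, fund) {
  const pct = fund.classification?.aum_pct_of_audited;
  // The field is optional: absent means the fund scores its full audited AUM.
  return typeof pct === 'number' ? aumUsd * pct : aumUsd;
}

const combinedAum = 1000; // illustrative units, not a real AUM figure
const grf = applyAumPctOfAudited(combinedAum, { classification: { aum_pct_of_audited: 0.05 } });
const fgf = applyAumPctOfAudited(combinedAum, { classification: { aum_pct_of_audited: 0.95 } });
console.log(Math.abs(grf + fgf - combinedAum) < 1e-9); // prints true
```

The final check mirrors the helper test mentioned below: the two sleeves of a split fund should sum back to the combined AUM.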

Tests (+25 net):
- 15 schema-extension tests (misplacement rejection, value-range gates,
  rationale-pairing coherence, backward-compat with pre-PR entries)
- 10 helper tests (shouldSkipFundForBuffer + applyAumPctOfAudited
  predicates and arithmetic; KIA-GRF + KIA-FGF sum equals combined AUM)
- Existing manifest test updated for the kia → kia-grf+kia-fgf split

Full suite: 6,940 tests pass (+50 net), typecheck clean, no new lint.

Predicted ranking deltas (informational, NOT acceptance criteria per
plan §"Hard non-goals"):
- AE sovFiscBuf likely 39 → 47-49 (Phase 1A + 1E)
- KW sovFiscBuf likely 98 → 53-57 (Phase 1B)
- CN, HK (excluded), KR, AU acquire newly-defined sovFiscBuf scores
- GCC ordering shifts toward QA > KW > AE; AE-KW gap likely 6 → ~3-4

Real outcome will be measured post-deploy via cohort audit per plan
§Phase 4.

* fix(resilience): completeness denominator excludes documentation-only funds

PR-3391 review (P1 catch): the per-country `expectedFunds` denominator
counted ALL manifest entries (`funds.length`) including those skipped
from buffer scoring by design — `excluded_overlaps_with_reserves: true`
(SAFE-IC, HKMA-EF) and `aum_verified: false` (EIA). Result: countries
with mixed scorable + non-scorable rosters showed `completeness < 1.0`
even when every scorable fund matched. UAE (4 scorable + EIA) would
show 0.8; CN (CIC + NSSF + SAFE-IC excluded) would show 0.67. The
downstream scorer then derated those countries' coverage based on a
fake-partial signal.

Three call sites all carried the same bug:
- per-country `expectedFunds` in fetchSovereignWealth main loop
- `expectedFundsTotal` + `expectedCountries` in buildCoverageSummary
- `countManifestFundsForCountry` (missing-country path)

All three now filter via `shouldSkipFundForBuffer` to count only
scorable manifest entries. Documentation-only funds are neither expected
nor matched; they don't appear in the ratio at all.
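
The denominator change can be illustrated with the CN roster from this PR. The `isScorable` predicate stands in for the negation of shouldSkipFundForBuffer, and the fund `id` values and the `completeness` helper are hypothetical:

```javascript
// Sketch only: compares the old raw-manifest denominator with the scorable-only one.
const isScorable = (fund) =>
  fund.aum_verified !== false &&
  fund.classification?.excluded_overlaps_with_reserves !== true;

function completeness(funds, matchedIds, { scorableOnly }) {
  const expected = scorableOnly ? funds.filter(isScorable) : funds;
  if (expected.length === 0) return null; // nothing to expect for this country
  const matched = expected.filter((fund) => matchedIds.has(fund.id)).length;
  return matched / expected.length;
}

const cnFunds = [
  { id: 'cic', aum_verified: true, classification: {} },
  { id: 'nssf', aum_verified: true, classification: {} },
  { id: 'safe-ic', aum_verified: true, classification: { excluded_overlaps_with_reserves: true } },
];
const cnMatched = new Set(['cic', 'nssf']);
console.log(completeness(cnFunds, cnMatched, { scorableOnly: false }).toFixed(2)); // prints 0.67
console.log(completeness(cnFunds, cnMatched, { scorableOnly: true }));             // prints 1
```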

Tests added (+4):
- AE complete with all 4 scorable matched (EIA documented but excluded)
- CN complete with CIC + NSSF matched (SAFE-IC documented but excluded)
- Missing-country path returns scorable count not raw manifest count
- Country with ONLY documentation-only entries excluded from expectedCountries

Full suite: 6,944 tests pass (+4 net), typecheck clean.

* fix(resilience): address Greptile P2s on PR #3391 manifest

Three review findings, all in the manifest YAML:

1. **KIA-GRF access 0.9 → 0.7** (rubric alignment): GRF deployment
   requires active Council-of-Ministers authorization (2020 COVID
   precedent demonstrates this), not rule-triggered automatic
   deployment. The rubric's 0.9 tier ("Pure automatic stabilization")
   is reserved for funds where political authorization is post-hoc /
   symbolic (Chile ESSF candidate). KIA-GRF correctly fits 0.7
   ("Explicit stabilization with rule") — the same tier the
   pre-split combined-KIA was assigned. Updated rationale clarifies
   the tier choice. Rubric's 0.7 precedent column already lists
   "KIA General Reserve Fund" — now consistent with the manifest.

2. **Duplicate `# ── Australia ──` header before Oman** (copy-paste
   artifact): removed the orphaned header at the Oman section;
   added proper `# ── Australia ──` header above the Future Fund
   entry where it actually belongs (after Timor-Leste).

3. **NSSF `aum_pct_of_audited: 1.0` removed** (no-op): a multiplier
   of 1.0 is identity. The schema field is OPTIONAL and only meant
   for fund-of-funds split entries (e.g. KIA-GRF/FGF). Setting it
   to 1.0 forced the loader to require an `aum_pct_of_audited`
   rationale paragraph with no computational benefit. Both the
   field and the paragraph are now removed; NSSF remains a single-
   sleeve entry that scores its full audited AUM.

Full suite: 6,944 tests pass, typecheck clean.
2026-04-25 12:02:48 +04:00
Elie Habib
9c14820c69 fix(digest): brief filter-drop instrumentation + cache-key correctness (#3387)
* fix(digest): include sensitivity in digestFor cache key

buildDigest filters by rule.sensitivity BEFORE dedup, but digestFor
memoized only on (variant, lang, windowStart). Stricter-sensitivity
users in a shared bucket inherited the looser populator's pool,
producing the wrong story set and defeating downstream topic-grouping
adjacency once filterTopStories re-applied sensitivity.

Solution 1 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
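
Sketch of the key change, assuming a string-keyed memo. `digestCacheKey`
and the key layout are hypothetical; only the (variant, lang, windowStart)
tuple and the `?? 'high'` default come from this PR.

```javascript
// Hypothetical helper: the real key template inside digestFor may differ.
function digestCacheKey(rule, windowStart) {
  // A missing sensitivity falls back to 'high', matching buildDigest.
  const sensitivity = rule.sensitivity ?? 'high';
  return `${rule.variant}:${rule.lang}:${sensitivity}:${windowStart}`;
}
```

With sensitivity in the key, an 'all' user and a 'high' user in the same
(variant, lang, windowStart) bucket no longer share a memoized pool.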

* feat(digest): instrument per-user filterTopStories drops

Adds an optional onDrop metrics callback to filterTopStories and threads
it through composeBriefFromDigestStories. The seeder aggregates counts
per composed brief and emits one structured log line per user per tick:

  [digest] brief filter drops user=<id> sensitivity=<s> in=<count>
    dropped_severity=<n> dropped_url=<n> dropped_headline=<n>
    dropped_shape=<n> out=<count>

The counts will decide whether the conditional Solution 3 (post-filter
regroup) is warranted, by quantifying how often post-group filter drops
puncture multi-member topics in production. No behaviour change for callers
that omit onDrop.

Solution 0 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.

* fix(digest): close two Sol-0 instrumentation gaps from code review

Review surfaced two P2 gaps in the filter-drop telemetry that weakened
its diagnostic purpose for Sol-3 gating:

1. Cap-truncation silent drop: filterTopStories broke on
   `out.length >= maxStories` BEFORE the onDrop emit sites, so up to
   (DIGEST_MAX_ITEMS - MAX_STORIES_PER_USER) stories per user were
   invisible. Added a 'cap' reason to DropMetricsFn and emit one event
   per skipped story so `in - out - sum(dropped_*) == 0` reconciles.

2. Wipeout invisibility: composeAndStoreBriefForUser only logged drop
   stats for the WINNING candidate. When every candidate composed to
   null, the log line never fired — exactly the wipeout case Sol-0
   was meant to surface. Now tracks per-candidate drops and emits an
   aggregate `outcome=wipeout` line covering all attempts.
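
Sketch of the cap fix in item 1, reduced to the severity check and the
cap. The `ADMITTED` table, the option names and the default cap are
assumptions for illustration, not the real filterTopStories signature.

```javascript
// Assumed mapping from sensitivity to admitted severities.
const ADMITTED = {
  all: ['critical', 'high', 'medium', 'low'],
  high: ['critical', 'high'],
  critical: ['critical'],
};

function filterTopStories(stories, { sensitivity = 'high', maxStories = 12, onDrop } = {}) {
  const out = [];
  for (const story of stories) {
    if (out.length >= maxStories) {
      // Emit one event per skipped story instead of breaking silently.
      onDrop?.({ reason: 'cap' });
      continue;
    }
    if (!ADMITTED[sensitivity].includes(story.severity)) {
      onDrop?.({ reason: 'severity' });
      continue;
    }
    out.push(story);
  }
  return out;
}
```

Because every skipped story emits exactly one event, summing the
callback counts and `out.length` gives back the input length.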

Also tightens the digest-cache-key sensitivity regex test to anchor
inside the cache-key template literal (it would otherwise match the
unrelated `chosenCandidate.sensitivity ?? 'high'` in the new log line).

PR review residuals from
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md
ce-code-review run 20260424-232911-37a2d5df.

* chore: ignore .context/ ce-code-review run artifacts

The ce-code-review skill writes per-run artifacts (reviewer JSON,
synthesis.md, metadata.json) under .context/compound-engineering/.
These are local-only — neither tracked nor linted.

* fix(digest): emit per-attempt filter-drop rows, not per-user

Addresses two PR #3387 review findings:

- P2: Earlier candidates that composed to null (wiped out by post-group
  filtering) had their dropStats silently discarded when a later
  candidate shipped — exactly the signal Sol-0 was meant to surface.
- P3: outcome=wipeout row was labeled with allCandidateDrops[0]
  .sensitivity, misleading when candidates within one user have
  different sensitivities.

Fix: emit one structured row per attempted candidate, tagged with that
candidate's own sensitivity and variant. Outcome is shipped|rejected.
A wipeout is now detectable as "all rows for this user are rejected
within the tick" — no aggregate-row ambiguity. Removes the
allCandidateDrops accumulator entirely.
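
Sketch of how a log consumer could recover wipeouts from the per-attempt
rows. The row shape `{ user, outcome }` and the `findWipeouts` name are
hypothetical; only the shipped|rejected outcome values come from this
commit.

```javascript
// rows: all per-attempt rows emitted within one tick.
function findWipeouts(rows) {
  const shippedByUser = new Map();
  for (const row of rows) {
    const shipped = shippedByUser.get(row.user) ?? false;
    shippedByUser.set(row.user, shipped || row.outcome === 'shipped');
  }
  // A user is wiped out when none of their attempts shipped.
  return [...shippedByUser]
    .filter(([, shipped]) => !shipped)
    .map(([user]) => user);
}
```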

* fix(digest): align composeBriefFromDigestStories sensitivity default to 'high'

Addresses PR #3387 review (P2): composeBriefFromDigestStories defaulted
to `?? 'all'` while buildDigest, the digestFor cache key, and the new
per-attempt log line all default to `?? 'high'`. The mismatch is
harmless in production (the live cron path pre-filters the pool) but:

- A non-prefiltered caller with undefined sensitivity would silently
  ship medium/low stories.
- Per-attempt telemetry labels the attempt as `sensitivity=high` while
  compose actually applied 'all' — operators are misled.

Aligning compose to 'high' makes the four sites agree and the telemetry
honest. Production output is byte-identical (input pool was already
'high'-filtered upstream).

Adds 3 regression tests asserting the new default: critical/high admitted,
medium/low dropped, and onDrop fires reason=severity for the dropped
levels (locks in alignment with per-attempt telemetry).

* fix(digest): align remaining sensitivity defaults to 'high'

Addresses PR #3387 review (P2 + P3): three more sites still defaulted
missing sensitivity to 'all' while compose/buildDigest/cache/log now
treat it as 'high'.

P2 — compareRules (scripts/lib/brief-compose.mjs:35-36): the rank
function used to default to 'all', placing legacy undefined-sensitivity
rules FIRST in the candidate order. Compose then applied a 'high'
filter to them, shipping a narrow brief while an explicit 'all' rule
for the same user was never tried. Aligned to 'high' so the rank
matches what compose actually applies.

P3 — enrichBriefEnvelopeWithLLM (scripts/lib/brief-llm.mjs:526):
the digest prompt and cache key still used 'all' for legacy rules,
misleading personalization ("Reader sensitivity level: all" while the
brief contains only critical/high stories) and busting the cache for
legacy vs explicit-'all' rows that should share entries.

Also aligns the @deprecated composeBriefForRule (line 164) for
consistency, since tests still import it.

3 new regression tests in tests/brief-composer-rule-dedup.test.mjs
lock in the new ranking: explicit 'all' beats undefined-sensitivity,
undefined-sensitivity ties with explicit 'high' (decided by updatedAt),
and groupEligibleRulesByUser candidate order respects the rank.
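
Sketch of the aligned ranking. The rank table and the tie-break
direction (newer `updatedAt` first) are assumptions; the commit only
fixes that 'all' sorts ahead of 'high' and that undefined is treated as
'high'.

```javascript
// Assumed rank table: lower rank is tried first.
const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };

function compareRules(a, b) {
  const rankA = SENSITIVITY_RANK[a.sensitivity ?? 'high'];
  const rankB = SENSITIVITY_RANK[b.sensitivity ?? 'high'];
  if (rankA !== rankB) return rankA - rankB;
  // Assumption: the more recently updated rule wins a tie.
  return b.updatedAt - a.updatedAt;
}
```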

6853/6853 tests pass (was 6850 → +3).
2026-04-25 00:23:29 +04:00
Elie Habib
8cca8d19e3 feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read (#3385)
* feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup

Prereq for the re-export-share Comtrade seeder (plan 2026-04-24-003),
usable by any cohort seeder whose consumer needs bundle-level freshness.

Two coupled changes:

1. `_bundle-runner.mjs` injects `BUNDLE_RUN_STARTED_AT_MS` into every
   spawned child. All siblings in a single bundle run share one value
   (captured at `runBundle` start, not spawn time). Consumers use this
   to detect stale peer keys — if a peer's seed-meta predates the
   current bundle run, fall back to a hard default rather than read
   a cohort-peer's last-week output.

2. `_seed-utils.mjs::runSeed` registers a `process.once('SIGTERM')`
   handler that releases the acquired lock and extends existing-data
   TTL before exiting 143. `_bundle-runner.mjs` sends SIGTERM on
   section timeout, then SIGKILL after KILL_GRACE_MS (5s). Without
   this handler the `finally` path never runs on SIGKILL, leaving
   the 30-min acquireLock reservation in place until its own TTL
   expires — the next cron tick silently skips the resource.

Regression guard memory: `bundle-runner-sigkill-leaks-child-lock` (PR
#3128 root cause).

Tests added:
- bundle-runner env injection (value within run bounds)
- sibling sections share the same timestamp (critical for the
  consumer freshness guard)
- runSeed SIGTERM path: exit 143 + cleanup log
- process.once contract: second SIGTERM does not re-enter handler

* fix(seed): address P1/P2 review findings on SIGTERM + bundle contracts

Addresses PR #3384 review findings (todos 256, 257, 259, 260):

#256 (P1) — SIGTERM handler narrowed to fetch phase only. Was installed
at runSeed entry and armed through every `process.exit` path; could
race `emptyDataIsFailure: true` strict-floor exits (IMF-External,
WB-bulk) and extend seed-meta TTL when the contract forbids it —
silently re-masking 30-day outages. Now the handler is attached
immediately before `withRetry(fetchFn)` and removed in a try/finally
that covers all fetch-phase exit branches.

#257 (P1) — `BUNDLE_RUN_STARTED_AT_MS` now has a first-class helper.
Exported `getBundleRunStartedAtMs()` from `_seed-utils.mjs` with JSDoc
describing the bundle-freshness contract. Fleet-wide helper so the
next consumer seeder imports instead of rediscovering the idiom.

#259 (P2) — SIGTERM cleanup runs `Promise.allSettled` on disjoint-key
ops (`releaseLock` + `extendExistingTtl`). Serialising compounded
Upstash latency during the exact failure mode (Redis degraded) this
handler exists to handle, risking breach of the 5s SIGKILL grace.

#260 (P2) — `_bundle-runner.mjs` asserts topological order on
optional `dependsOn` section field. Throws on unknown-label refs and
on deps appearing at a later index. Fleet-wide contract replacing
the previous prose-comment ordering guarantee.
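
Sketch of the #260 check, assuming sections are `{ label, dependsOn? }`
objects in run order. `assertTopologicalOrder` is a hypothetical name,
not necessarily what `_bundle-runner.mjs` calls it.

```javascript
function assertTopologicalOrder(sections) {
  const indexByLabel = new Map(sections.map((s, i) => [s.label, i]));
  sections.forEach((section, i) => {
    for (const dep of section.dependsOn ?? []) {
      if (!indexByLabel.has(dep)) {
        throw new Error(`${section.label}: unknown dependsOn label "${dep}"`);
      }
      if (indexByLabel.get(dep) >= i) {
        throw new Error(`${section.label}: dependency "${dep}" must appear earlier`);
      }
    }
  });
}
```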

Tests added/updated:
- New: SIGTERM handler removed after fetchFn completes (narrowed-scope
  contract — post-fetch SIGTERM must NOT trigger TTL extension)
- New: dependsOn unknown-label + out-of-order + happy-path (3 tests)

Full test suite: 6,866 tests pass (+4 net).

* fix(seed): getBundleRunStartedAtMs returns null outside a bundle run

Review follow-up: the earlier `Math.floor(Date.now()/1000)*1000` fallback
regressed standalone (non-bundle) runs. A consumer seeder invoked
manually just after its peer wrote `fetchedAt = (now - 5s)` would see
`bundleStartMs = Date.now()`, reject the perfectly-fresh peer envelope
as "stale", and fall back to defaults — defeating the point of the
peer-read path outside the bundle.

Returning null when `BUNDLE_RUN_STARTED_AT_MS` is unset/invalid keeps
the freshness gate scoped to its real purpose (across-bundle-tick
staleness) and lets standalone runs skip the gate entirely. Consumers
check `bundleStartMs != null` before applying the comparison; see the
companion `seed-sovereign-wealth.mjs` change on the stacked PR.

* test(seed): SIGTERM cleanup test now verifies Redis DEL + EXPIRE calls

Greptile review P2 on PR #3384: the existing test only asserted exit
code + log line, not that the Redis ops were actually issued. The
log claim was ahead of the test.

Fixture now logs every Upstash fetch call's shape (EVAL / pipeline-
EXPIRE / other) to stderr. Test asserts:

- >=1 EVAL op was issued during SIGTERM cleanup (releaseLock Lua
  script on the lock key)
- >=1 pipeline-EXPIRE op was issued (extendExistingTtl on canonical
  + seed-meta keys)
- The EVAL body carries the runSeed-generated runId (proves it's
  THIS run's release, not a phantom op)
- The EXPIRE pipeline touches both the canonicalKey AND the
  seed-meta key (proves the keys[] array was built correctly
  including the extraKeys merge path)

Full test suite: 6,866 tests pass, typecheck clean.

* feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read

Plan ref: docs/plans/2026-04-24-003-feat-reexport-share-comtrade-seeder-plan.md

Motivating case. Before this PR, the SWF `rawMonths` denominator for
the `sovereignFiscalBuffer` dimension used GROSS annual imports for
every country. For re-export hubs (goods transiting without domestic
settlement), this structurally under-reports resilience: UAE's 2023
$941B of imports include $334B of transit flow that never represents
domestic consumption. Net imports = gross × (1 − reexport_share).
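
Worked through with the UAE figures above (billions of USD), as a
minimal sketch; `netImports` is a hypothetical helper name.

```javascript
function netImports(grossImports, reexportShare) {
  return grossImports * (1 - reexportShare);
}

const gross = 941;            // UAE 2023 gross imports
const share = 334 / gross;    // about 0.355
const net = netImports(gross, share); // about 607
```

The denominator for `rawMonths` drops by roughly a third, so the same
fund covers proportionally more months of imports.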

The previous (PR 3A) design flattened a hand-curated YAML into Redis;
the YAML shipped empty and never populated, so the correction never
applied and the cohort audit showed no movement.

Gap #2 (this PR). Two coupled changes to make the correction actually
apply:

1. Comtrade-backed seeder (`scripts/seed-recovery-reexport-share.mjs`).
   Rewritten to fetch UN Comtrade `flowCode=RX` (re-exports) and
   `flowCode=M` (imports) per cohort member, compute share = RX/M at
   the latest co-populated year, clamp to [0.05, 0.95], publish the
   envelope. Header auth (`Ocp-Apim-Subscription-Key`) — subscription
   key never reaches URL/logs/Redis. `maxRecords=250000` cap with
   truncation detection. Sequential + retry-on-429 with backoff.

   Hub cohort resolved by Phase 0 empirical probe (plan §Phase 0):
   ['AE', 'PA']. Six candidates (SG/HK/NL/BE/MY/LT) return HTTP 200
   with zero RX rows — Comtrade doesn't expose RX for those reporters.

2. SWF seeder reads from Redis (`scripts/seed-sovereign-wealth.mjs`).
   Swaps `loadReexportShareByCountry()` (YAML) for
   `loadReexportShareFromRedis()` (Redis key written by #1). Guarded
   by bundle-run freshness: if the sibling Reexport-Share seeder's
   `seed-meta` predates `BUNDLE_RUN_STARTED_AT_MS` (set by the
   prereq PR's `_bundle-runner.mjs` env-injection), HARD fallback
   to gross imports rather than apply last-month's stale share.
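
Sketch of the share computation in item 1. The function names appear in
this PR's test list, but the signatures and the by-year input shape
here are assumptions.

```javascript
function clampShare(share, min = 0.05, max = 0.95) {
  return Math.min(max, Math.max(min, share));
}

// rxByYear / mByYear: { [year]: total value } for flowCode RX and M.
function computeShareFromFlows(rxByYear, mByYear) {
  const years = Object.keys(rxByYear)
    .filter((y) => rxByYear[y] > 0 && mByYear[y] > 0)
    .map(Number)
    .sort((a, b) => b - a);
  if (years.length === 0) return null; // no co-populated year
  const year = years[0];
  return { year, share: clampShare(rxByYear[year] / mByYear[year]) };
}
```

Picking the latest year where both flows are populated avoids dividing
a fresh RX total by a missing or older M total.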

Health registries. Both new keys are registered in BOTH `api/health.js`
SEED_META (60-day alert threshold) and `api/seed-health.js`
SEED_DOMAINS (43200min interval), per feedback_two_health_endpoints_must_match.

Bundle wiring. `seed-bundle-resilience-recovery` Reexport-Share
timeout bumped 60s → 300s (Comtrade + retry can take 2-3 min
worst-case). Ordering preserved: Reexport-Share before Sovereign-
Wealth so the SWF seeder reads a freshly-written key in the same
cron tick.

Deletions. YAML + loader + 7 obsolete loader tests removed; single
source of truth is now Comtrade → Redis.

Prereq. Stacks on PR #3384 (feat/bundle-runner-env-sigterm)
which adds BUNDLE_RUN_STARTED_AT_MS env injection + runSeed
SIGTERM cleanup. This PR's bundle-freshness guard depends on
that env variable.

Tests (19 new, 7 deleted, +12 net):
- Pure math: parseComtradeFlowResponse, computeShareFromFlows,
  clampShare, declareRecords + credential-leak source scan (15)
- Integration (Gap #2 regression guards): SWF seeder
  loadReexportShareFromRedis — fresh/absent/malformed/stale-meta/missing-meta (5)
- Health registry dual-registry drift guard — scoped to this PR's
  keys, respecting pre-existing asymmetry (4)
- Bundle-ordering + timeout assertions (2)

Phase 0 cohort validation committed to plan. Full test suite
passes: 6,881 tests.

* fix(resilience): address P1/P2 review findings — adopt shared helpers, pin freshness boundary

Addresses PR #3385 review findings:

#257 (P1) consumer — `seed-sovereign-wealth.mjs` imports the shared
`getBundleRunStartedAtMs` helper from `_seed-utils.mjs` (added in the
prereq commit) instead of its own `getBundleStartMs`. Single source of
truth for the bundle-freshness contract.

#258 (P2) — `seed-recovery-reexport-share.mjs` isMain guard uses the
canonical `pathToFileURL(process.argv[1]).href === import.meta.url`
form instead of basename-suffix matching. Handles symlinks, case-
different paths on macOS HFS+, and Windows path separators without
string munging.

#260 (P2) consumer — Sovereign-Wealth declares `dependsOn:
['Reexport-Share']` in the bundle spec. `_bundle-runner.mjs` (prereq
commit) now enforces topological order on load and throws on
violation — replaces the previous prose-comment ordering contract.

#261 (P2) — added a test to
`tests/seed-sovereign-wealth-reads-redis-reexport-share.test.mts`
pinning the inclusive-boundary semantic:
`fetchedAtMs === bundleStartMs` must be treated as FRESH. Guards
against a future refactor to `<=` that would silently reject peers
writing at the very first millisecond of the bundle run.

Rebased onto updated prereq. Full test suite: 6,886 tests pass (+5 net).

* fix(resilience): freshness gate skipped in standalone mode; meta still required

Review catch: the previous `bundleStartMs = Date.now()` fallback made
standalone/manual `seed-sovereign-wealth.mjs` runs ALWAYS reject any
previously-seeded re-export-share meta as "stale" — even when the
operator ran the Reexport seeder milliseconds beforehand. Defeated
the point of the peer-read path outside the bundle.

With `getBundleRunStartedAtMs()` now returning null outside a bundle
(companion commit on the prereq branch), the consumer only applies
the freshness gate when `bundleStartMs != null`. Standalone runs
accept any `fetchedAt` — the operator is responsible for ordering.

Two guards survive the change:
- Meta MUST exist (absence = peer-outage fail-safe, both modes)
- In-bundle: meta MUST be at or after `BUNDLE_RUN_STARTED_AT_MS`
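
Sketch of the two guards as one predicate. `isPeerMetaUsable` and the
`{ fetchedAt }` meta shape are hypothetical; `bundleStartMs` is null
outside a bundle run.

```javascript
function isPeerMetaUsable(meta, bundleStartMs) {
  // Missing or malformed meta: peer-outage fail-safe, both modes.
  if (meta == null || !Number.isFinite(meta.fetchedAt)) return false;
  // Standalone run: the freshness gate is skipped.
  if (bundleStartMs == null) return true;
  // In-bundle: inclusive, so equality counts as fresh.
  return meta.fetchedAt >= bundleStartMs;
}
```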

Two new tests pin both modes:
- standalone: accepts meta written 10 min before this process started
- standalone: still rejects missing meta (peer-outage fail-safe
  survives gate bypass)

Rebased onto updated prereq. Full test suite: 6,888 tests (+2 net).

* fix(resilience): filter world-aggregate Comtrade rows + skip final-retry sleep

Greptile review of PR #3385 flagged two P2s in the Comtrade seeder.

Finding #3 (parseComtradeFlowResponse double-count risk):
`cmdCode=TOTAL` without a partner filter currently returns only
world-aggregate rows in practice — but `parseComtradeFlowResponse`
summed every row unconditionally. A future refactor adding per-
partner querying would silently double-count (world-aggregate row +
partner-level rows for the same year), cutting the derived share in
half with no test signal.

Fix: explicit `partnerCode ∈ {'0', 0, null/undefined}` filter. Matches
current empirical behavior (aggregate-only responses) and makes the
construct robust to a future partner-level query.
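
Sketch of the filter, assuming rows shaped `{ period, partnerCode,
primaryValue }`. Those field names and both helper names are
assumptions about the payload, not taken from the seeder.

```javascript
function isWorldAggregate(row) {
  return row.partnerCode == null || row.partnerCode === 0 || row.partnerCode === '0';
}

function sumWorldRowsByYear(rows) {
  const totals = {};
  for (const row of rows.filter(isWorldAggregate)) {
    totals[row.period] = (totals[row.period] ?? 0) + row.primaryValue;
  }
  return totals;
}
```

Partner-level rows are ignored, so mixing them into a response leaves
the yearly total unchanged.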

Finding #4 (wasted backoff on final retry):
429 and 5xx branches slept `backoffMs` before `continue`, but on
`attempt === RETRY_MAX_ATTEMPTS` the loop condition fails immediately
after — the sleep was pure waste. Added early-return (parallel to the
existing pattern in the network-error catch branch) so the final
attempt exits the retry loop at the first non-success response
without extra latency.

Tests:
- 3 new `parseComtradeFlowResponse` variants: world-only filter,
  numeric-0 partnerCode shape, rows without partnerCode field
- Existing tests updated: the double-count assertion replaced with
  a "per-partner rows must NOT sum into the world-aggregate total"
  assertion that pins the new contract

Rebased onto updated prereq. Full test suite: 6,890 tests (+2 net).
2026-04-25 00:14:17 +04:00
Elie Habib
5f40f8a13a feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup (#3384)
* feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup

Prereq for the re-export-share Comtrade seeder (plan 2026-04-24-003),
usable by any cohort seeder whose consumer needs bundle-level freshness.

Two coupled changes:

1. `_bundle-runner.mjs` injects `BUNDLE_RUN_STARTED_AT_MS` into every
   spawned child. All siblings in a single bundle run share one value
   (captured at `runBundle` start, not spawn time). Consumers use this
   to detect stale peer keys — if a peer's seed-meta predates the
   current bundle run, fall back to a hard default rather than read
   a cohort-peer's last-week output.

2. `_seed-utils.mjs::runSeed` registers a `process.once('SIGTERM')`
   handler that releases the acquired lock and extends existing-data
   TTL before exiting 143. `_bundle-runner.mjs` sends SIGTERM on
   section timeout, then SIGKILL after KILL_GRACE_MS (5s). Without
   this handler the `finally` path never runs on SIGKILL, leaving
   the 30-min acquireLock reservation in place until its own TTL
   expires — the next cron tick silently skips the resource.

Regression guard memory: `bundle-runner-sigkill-leaks-child-lock` (PR
#3128 root cause).

Tests added:
- bundle-runner env injection (value within run bounds)
- sibling sections share the same timestamp (critical for the
  consumer freshness guard)
- runSeed SIGTERM path: exit 143 + cleanup log
- process.once contract: second SIGTERM does not re-enter handler

* fix(seed): address P1/P2 review findings on SIGTERM + bundle contracts

Addresses PR #3384 review findings (todos 256, 257, 259, 260):

#256 (P1) — SIGTERM handler narrowed to fetch phase only. Was installed
at runSeed entry and armed through every `process.exit` path; could
race `emptyDataIsFailure: true` strict-floor exits (IMF-External,
WB-bulk) and extend seed-meta TTL when the contract forbids it —
silently re-masking 30-day outages. Now the handler is attached
immediately before `withRetry(fetchFn)` and removed in a try/finally
that covers all fetch-phase exit branches.

#257 (P1) — `BUNDLE_RUN_STARTED_AT_MS` now has a first-class helper.
Exported `getBundleRunStartedAtMs()` from `_seed-utils.mjs` with JSDoc
describing the bundle-freshness contract. Fleet-wide helper so the
next consumer seeder imports instead of rediscovering the idiom.

#259 (P2) — SIGTERM cleanup runs `Promise.allSettled` on disjoint-key
ops (`releaseLock` + `extendExistingTtl`). Serialising compounded
Upstash latency during the exact failure mode (Redis degraded) this
handler exists to handle, risking breach of the 5s SIGKILL grace.

#260 (P2) — `_bundle-runner.mjs` asserts topological order on
optional `dependsOn` section field. Throws on unknown-label refs and
on deps appearing at a later index. Fleet-wide contract replacing
the previous prose-comment ordering guarantee.

Tests added/updated:
- New: SIGTERM handler removed after fetchFn completes (narrowed-scope
  contract — post-fetch SIGTERM must NOT trigger TTL extension)
- New: dependsOn unknown-label + out-of-order + happy-path (3 tests)

Full test suite: 6,866 tests pass (+4 net).

* fix(seed): getBundleRunStartedAtMs returns null outside a bundle run

Review follow-up: the earlier `Math.floor(Date.now()/1000)*1000` fallback
regressed standalone (non-bundle) runs. A consumer seeder invoked
manually just after its peer wrote `fetchedAt = (now - 5s)` would see
`bundleStartMs = Date.now()`, reject the perfectly-fresh peer envelope
as "stale", and fall back to defaults — defeating the point of the
peer-read path outside the bundle.

Returning null when `BUNDLE_RUN_STARTED_AT_MS` is unset/invalid keeps
the freshness gate scoped to its real purpose (across-bundle-tick
staleness) and lets standalone runs skip the gate entirely. Consumers
check `bundleStartMs != null` before applying the comparison; see the
companion `seed-sovereign-wealth.mjs` change on the stacked PR.
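
Sketch of the null-returning behaviour. Taking `env` as a parameter and
the positive-number check are assumptions made for testability; the
real helper in `_seed-utils.mjs` may parse differently.

```javascript
function getBundleRunStartedAtMs(env = process.env) {
  const raw = env.BUNDLE_RUN_STARTED_AT_MS;
  if (raw == null || raw === '') return null; // unset: standalone run
  const ms = Number(raw);
  return Number.isFinite(ms) && ms > 0 ? ms : null; // invalid: treat as unset
}
```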

* test(seed): SIGTERM cleanup test now verifies Redis DEL + EXPIRE calls

Greptile review P2 on PR #3384: the existing test only asserted exit
code + log line, not that the Redis ops were actually issued. The
log claim was ahead of the test.

Fixture now logs every Upstash fetch call's shape (EVAL / pipeline-
EXPIRE / other) to stderr. Test asserts:

- >=1 EVAL op was issued during SIGTERM cleanup (releaseLock Lua
  script on the lock key)
- >=1 pipeline-EXPIRE op was issued (extendExistingTtl on canonical
  + seed-meta keys)
- The EVAL body carries the runSeed-generated runId (proves it's
  THIS run's release, not a phantom op)
- The EXPIRE pipeline touches both the canonicalKey AND the
  seed-meta key (proves the keys[] array was built correctly
  including the extraKeys merge path)

Full test suite: 6,866 tests pass, typecheck clean.
2026-04-25 00:14:04 +04:00
Elie Habib
ce797da3a4 chore(energy-atlas): backfill productClass on oil pipelines + enforce enum (#3383)
* chore(energy-atlas): backfill productClass on all oil pipelines + enforce enum

Prior state: 12/75 oil pipelines carried a `productClass: "crude"` tag;
63/75 did not. The field had zero consumers anywhere in the codebase
(no validator, no server handler, no frontend reader) — orphan metadata
from partial curation. Inconsistency spotted during the energy-data
audit after the Energy Atlas PR chain landed.

Changes:

1. Backfill all 63 missing entries with one of three values based on
   the pipeline's name/operator/route:
   - `crude` (70 total): crude-oil trunks, gathering lines, export
     systems. Covers Druzhba, Enbridge Mainline, Keystone-XL, CPC,
     BTC, ESPO, Sumed, Forties, Brent, OCP, OCENSA, EACOP, LAPSSET,
     etc.
   - `products` (4 total): explicit refined-product pipelines —
     Abqaiq-Yanbu Products Line, Vadinar-Kandla, Yangzi-Hefei-Hangzhou,
     Tuxpan-Mexico City.
   - `mixed` (1 total): Salina Cruz-Minatitlán, the only dual-use
     crude/products bridge in the set.

2. Promote productClass from orphan metadata to a schema invariant:
   - Oil pipelines MUST declare one of {crude, products, mixed}.
   - Gas pipelines MUST NOT carry the field (commodity IS its own
     class there).
   - Enforced in scripts/_pipeline-registry.mjs::validateRegistry.

3. Five new test assertions in tests/pipelines-registry.test.mts
   cover both the data invariant (every oil entry has a valid value;
   no gas entry has one) and the validator behavior (rejects missing,
   rejects unknown enum value, rejects gas-with-productClass).
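
Sketch of the invariant in item 2 as a per-entry check.
`productClassError` is a hypothetical helper; the real
validateRegistry covers more fields and may report errors differently.

```javascript
const VALID_OIL_PRODUCT_CLASSES = new Set(['crude', 'products', 'mixed']);

// Returns null when the entry satisfies the invariant, else a message.
function productClassError(pipeline) {
  const { id, commodityType, productClass } = pipeline;
  if (commodityType === 'oil') {
    return VALID_OIL_PRODUCT_CLASSES.has(productClass)
      ? null
      : `oil pipeline ${id}: productClass must be crude, products, or mixed`;
  }
  // Gas entries must not carry the field at all.
  return productClass === undefined
    ? null
    : `gas pipeline ${id}: productClass must be absent`;
}
```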

File formatting: the oil registry mixes two styles — multi-line (each
field on its own line) and compact (several fields packed onto one
line). The insertion preserves the local style for each entry by
reusing the whitespace that follows `"commodityType": "oil",`.

No runtime consumers yet; this lands the data hygiene so future
downstream work (crude-vs-products split on the map, refined-product
shock calcs) can rely on the field being present and valid.

* fix(energy-atlas): import VALID_OIL_PRODUCT_CLASSES in tests instead of redefining

Greptile P2 on #3383: the test file defined its own inline
`const VALID = new Set(['crude', 'products', 'mixed'])`, mirroring the
registry's `VALID_OIL_PRODUCT_CLASSES`. If a future PR adds a new class
(e.g. `condensate`) to the registry, the inline copy wouldn't update —
the data test would start reporting valid pipelines as failing before
the validator rejects them, creating a confusing diagnostic gap.

Export `VALID_OIL_PRODUCT_CLASSES` from `scripts/_pipeline-registry.mjs`
and import it in the test. Single source of truth; no drift possible.
2026-04-24 23:36:51 +04:00
Elie Habib
7c0c08ad89 feat(energy-atlas): seed-side countries[] denorm on disruptions + CountryDeepDive row (§R #5 = B) (#3377)
* feat(energy-atlas): seed-side countries[] denorm + CountryDeepDive row (§R #5 = B)

Per plan §R/#5 decision B: denormalise countries[] at seed time on each
disruption event so CountryDeepDivePanel can filter events per country
without an asset-registry round trip. Schema join (pipeline/storage
→ event.assetId) happens once in the weekly cron, not on every panel
render. The alternative (client-side join) was rejected because it
couples UI logic to asset-registry internals and duplicates the join
for every surface that wants a per-country filter.

Changes:
- `proto/.../list_energy_disruptions.proto`: add `repeated string
  countries = 15` to EnergyDisruptionEntry with doc comment tying it
  to the plan decision and the always-non-empty invariant.
- `scripts/_energy-disruption-registry.mjs`:
    • Load pipeline-gas + pipeline-oil + storage-facilities registries
      once per seed cycle; index by id.
    • `deriveCountriesForEvent()` resolves assetId to {fromCountry,
      toCountry, transitCountries} (pipeline) or {country} (storage),
      deduped + alpha-sorted so byte-diff stability holds.
    • `buildPayload()` attaches the computed countries[] to every
      event before writing.
    • `validateRegistry()` now requires non-empty countries[] of
      ISO2 codes. Combined with the seeder's `emptyDataIsFailure:
      true`, this surfaces orphaned assetIds loudly — the next cron
      tick fails validation and seed-meta stays stale, tripping
      health alarms.
- `scripts/data/energy-disruptions.json`: fix two orphaned assetIds
  that the new join caught:
    • `cpc-force-majeure-2022`: `cpc-pipeline` → `cpc` (matches the
      entry in pipelines-oil.json).
    • `pdvsa-designation-2019`: `ve-petrol-2026-q1` (non-existent) →
      `venezuela-anzoategui-puerto-la-cruz`.
- `server/.../list-energy-disruptions.ts`: project countries[] into
  the RPC response via coerceStringArray. Legacy pre-denorm rows
  surface as empty array (always present on wire, length 0 => old).
- `src/components/CountryDeepDivePanel.ts`: add 4th Atlas row —
  "Energy disruptions in {iso2}" — filtered by `iso2 ∈ countries[]`.
  Failure is silent; EnergyDisruptionsPanel (upcoming) is the
  primary disruption surface.
- `tests/energy-disruptions-registry.test.mts`: switch to validating
  the buildPayload output (post-denorm), add §R #5 B invariant
  tests, plus a raw-JSON invariant ensuring curators don't hand-edit
  countries[] (it's derived, not declared).
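
Sketch of the derivation step, taking an already-resolved asset. The
`deriveCountries` name and the asset shapes are assumptions that follow
the field names listed above.

```javascript
function deriveCountries(asset) {
  const raw = asset.country !== undefined
    ? [asset.country] // storage facility
    : [asset.fromCountry, asset.toCountry, ...(asset.transitCountries ?? [])]; // pipeline
  // Dedupe, drop blanks, and sort so repeated seeds are byte-identical.
  return [...new Set(raw.filter(Boolean))].sort();
}
```

An asset that resolves to no countries yields an empty array, which the
non-empty check in `validateRegistry()` then rejects.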

Proto regen note: `make generate` currently fails with a duplicate
openapi plugin collision in buf.gen.yaml (unrelated bug — 3 plugin
entries emit to the same out dir). Worked around by temporarily
trimming buf.gen.yaml to just the TS plugins for this regen. Added
only the `countries: string[]` wire field to both service_client and
service_server; no other generated-file drift in this PR.

* chore(proto): regenerate openapi specs for countries[] field

Runs `make generate` with the sebuf v0.11.1 plugin now correctly
resolved via the PATH fix (cherry-picked from fix/makefile-generate-path-prefix).
The new `countries` field on EnergyDisruptionEntry propagates into:

- docs/api/SupplyChainService.openapi.yaml (primary per-service spec)
- docs/api/SupplyChainService.openapi.json (machine-readable variant)
- docs/api/worldmonitor.openapi.yaml (consolidated bundle)

No TypeScript drift beyond the already-committed service_client.ts /
service_server.ts updates in 80797e7cc.

* fix(energy-atlas): drop highlightEventId emission (review P2)

Codex P2: loadDisruptionsForCountry dispatched `highlightEventId` but
neither PipelineStatusPanel nor StorageFacilityMapPanel consumes it
(the openDetailHandler reads only pipelineId / facilityId). The UI's
implicit promise (event-specific highlighting) wasn't delivered —
clickthrough was asset-generic, and the extra wire field was a
misleading API surface.

Fix: emit only {pipelineId, facilityId} in the dispatched detail.
Row click opens the asset drawer; user sees the full per-asset
disruption timeline and locates the event visually.

Symmetric fix for PR #3378's EnergyDisruptionsPanel — both emitters
now match the drawer contract exactly. Re-add `highlightEventId`
here when the drawer panels ship matching consumer code
(openDetailHandler accepts it, loadDetail stores it,
renderDisruptionTimeline scrolls + emphasises the matching event).

Typecheck clean, test:data 6698/6698 pass.

* fix(energy-atlas): collision detection + abort signal + label clamp (review P2)

Three Codex P2 findings on PR #3377:

1. `loadAssetRegistries()` spread-merged gas + oil pipelines, silently
   overwriting entries on id collision. No collision today, but a
   curator adding a pipeline under the same id to both files would
   cause `deriveCountriesForEvent` to return wrong-commodity country
   data with no test flagging it.

   Fix: explicit merge loop that throws on duplicate id. The next
   cron tick fails validation, seed-meta stays stale, health alarms
   fire — same loud-failure pattern the rest of the seeder uses.

2. `loadDisruptionsForCountry` didn't thread `this.signal` through
   the RPC fetch shim. The stale-closure guard (`currentCode !== iso2`)
   discarded stale RESULTS, but the in-flight request couldn't be
   cancelled when the user switched countries or closed the panel.

   Fix: wrap globalThis.fetch with { signal: this.signal } in the
   client factory, matching the signal lifecycle the rest of the
   panel already uses.

3. `shortDescription` values up to 200 chars rendered without
   ellipsis in the compact Atlas row, overflowing the row layout.

   Fix: new `truncateDisruptionLabel` helper clamps to 80 chars with
   ellipsis. Full text still accessible via click-through to the
   asset drawer.
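
Sketch of fixes 1 and 3. Registries are modelled as `{ [id]: entry }`
maps and `mergeRegistries` is a hypothetical name; whether the 80-char
limit includes the ellipsis is also an assumption.

```javascript
function mergeRegistries(...registries) {
  const merged = new Map();
  for (const registry of registries) {
    for (const [id, entry] of Object.entries(registry)) {
      // Fail loudly instead of letting a later file overwrite an earlier one.
      if (merged.has(id)) throw new Error(`duplicate asset id across registries: ${id}`);
      merged.set(id, entry);
    }
  }
  return merged;
}

// Assumption: the limit counts the trailing ellipsis character.
function truncateDisruptionLabel(text, max = 80) {
  return text.length <= max ? text : `${text.slice(0, max - 1)}…`;
}
```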

Typecheck clean, test:data 6698/6698 pass.
2026-04-24 19:08:07 +04:00
Sebastien Melki
e68a7147dd chore(api): sebuf migration follow-ups (post-#3242) (#3287)
* chore(api-manifest): rewrite brief-why-matters reason as proper internal-helper justification

Carried in from #3248 merge as a band-aid (called out in #3242 review followup
checklist item 7). The endpoint genuinely belongs in internal-helper —
RELAY_SHARED_SECRET-bearer auth, cron-only caller, never reached by dashboards
or partners. Same shape constraint as api/notify.ts.

Replaces the apologetic "filed here to keep the lint green" framing with a
proper structural justification: modeling it as a generated service would
publish internal cron plumbing as user-facing API surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lint): premium-fetch parity check for ServiceClients (closes #3279)

Adds scripts/enforce-premium-fetch.mjs — AST-walks src/, finds every
`new <ServiceClient>(...)` (variable decl OR `this.foo =` assignment),
tracks which methods each instance actually calls, and fails if any
called method targets a path in src/shared/premium-paths.ts
PREMIUM_RPC_PATHS without `{ fetch: premiumFetch }` on the constructor.

Per-call-site analysis (not class-level) keeps the trade/index.ts pattern
clean — publicClient with globalThis.fetch + premiumClient with
premiumFetch on the same TradeServiceClient class — since publicClient
never calls a premium method.

Wired into:
- npm run lint:premium-fetch
- .husky/pre-push (right after lint:rate-limit-policies)
- .github/workflows/lint-code.yml (right after lint:api-contract)

Found and fixed three latent instances of the HIGH(new) #1 class from
#3242 review (silent 401 → empty fallback for signed-in browser pros):

- src/services/correlation-engine/engine.ts — IntelligenceServiceClient
  built with no fetch option called deductSituation. LLM-assessment overlay
  on convergence cards never landed for browser pros without a WM key.
- src/services/economic/index.ts — EconomicServiceClient with
  globalThis.fetch called getNationalDebt. National-debt panel rendered
  empty for browser pros.
- src/services/sanctions-pressure.ts — SanctionsServiceClient with
  globalThis.fetch called listSanctionsPressure. Sanctions-pressure panel
  rendered empty for browser pros.

All three swap to premiumFetch (single shared client, mirrors the
supply-chain/index.ts justification — premiumFetch no-ops safely on
public methods, so the public methods on those clients keep working).

Verification:
- lint:premium-fetch clean (34 ServiceClient classes, 28 premium paths,
  466 src/ files analyzed)
- Negative test: revert any of the three to globalThis.fetch → exit 1
  with file:line and called-premium-method names
- typecheck + typecheck:api clean
- lint:api-contract / lint:rate-limit-policies / lint:boundaries clean
- tests/sanctions-pressure.test.mjs + premium-fetch.test.mts: 16/16 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(military): fetchStaleFallback NEG_TTL=30s parity (closes #3277)

The legacy /api/military-flights handler had NEG_TTL = 30_000ms — a short
suppression window after a failed live + stale read so we don't Redis-hammer
the stale key during sustained relay+seed outages.

Carried into the sebuf list-military-flights handler:
- Module-scoped `staleNegUntil` timestamp (per-isolate on Vercel Edge,
  which is fine — each warm isolate gets its own 30s suppression window).
- Set whenever fetchStaleFallback returns null (key missing, parse fail,
  empty array after staleToProto filter, or thrown error).
- Checked at the entry of fetchStaleFallback before doing the Redis read.
- Test seam `_resetStaleNegativeCacheForTests()` exposed for unit tests.

Test pinned in tests/redis-caching.test.mjs: drives a stale-empty cycle
three times — first read hits Redis, second within window doesn't, after
test-only reset it does again.

Verified: 18/18 redis-caching tests pass, typecheck:api clean,
lint:premium-fetch clean.
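
A rough sketch of the suppression window described above, under stated assumptions: the `readStale` and `now` parameters are hypothetical injection points added for illustration, and the real handler's Redis read and `staleToProto` filter are not reproduced here.

```typescript
const NEG_TTL_MS = 30_000; // suppression window carried over from the legacy handler

// Module-scoped, so each warm isolate keeps its own window.
let staleNegUntil = 0;

// Test seam, named as in the commit message.
function _resetStaleNegativeCacheForTests(): void {
  staleNegUntil = 0;
}

// `readStale` stands in for the Redis read; `now` makes the clock injectable.
async function fetchStaleFallback<T>(
  readStale: () => Promise<T[] | null>,
  now: () => number = Date.now,
): Promise<T[] | null> {
  // Inside the window: skip the read entirely.
  if (now() < staleNegUntil) return null;
  try {
    const rows = await readStale();
    if (rows && rows.length > 0) return rows;
  } catch {
    // A thrown error counts as a negative result, same as a missing key.
  }
  // Any negative outcome arms the window.
  staleNegUntil = now() + NEG_TTL_MS;
  return null;
}
```

The window is only armed on negative outcomes, so a successful stale read never delays the next one.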

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): rate-limit-policies regex → import() (closes #3278)

The previous lint regex-parsed ENDPOINT_RATE_POLICIES from the source
file. That worked because the literal happens to fit a single line per
key today, but a future reformat (multi-line key wrap, formatter swap,
etc.) would silently break the lint without breaking the build —
exactly the failure mode that's worse than no lint at all.

Fix:
- Export ENDPOINT_RATE_POLICIES from server/_shared/rate-limit.ts.
- Convert scripts/enforce-rate-limit-policies.mjs to async + dynamic
  import() of the policy object directly. Same TS module that the
  gateway uses at runtime → no source-of-truth drift possible.
- Run via tsx (already a dev dep, used by test:data) so the .mjs
  shebang can resolve a .ts import.
- npm script swapped to `tsx scripts/...`. .husky/pre-push uses
  `npm run lint:rate-limit-policies` so no hook change needed.

Verified:
- Clean: 6 policies / 182 gateway routes.
- Negative test (rename a key to the original sanctions typo
  /api/sanctions/v1/lookup-entity): exit 1 with the same incident-
  attributed remedy message as before.
- Reformat test (split a single-line entry across multiple lines):
  still passes — the property is what's read, not the source layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(shipping/v2): alertThreshold: 0 preserved; drop dead validation branch (#3242 followup)

Before: alert_threshold was a plain int32. proto3 scalar default is 0, so
the handler couldn't distinguish "partner explicitly sent 0 (deliver every
disruption)" from "partner omitted the field (apply legacy default 50)" —
both arrived as 0 and got coerced to 50 by `> 0 ? : 50`. Silent intent-drop
for any partner who wanted every alert. The subsequent `alertThreshold < 0`
branch was also unreachable after that coercion.

After:
- Proto field is `optional int32 alert_threshold` — TS type becomes
  `alertThreshold?: number`, so omitted = undefined and explicit 0 stays 0.
- Handler uses `req.alertThreshold ?? 50` — undefined → 50, any number
  passes through unchanged.
- Dead `< 0 || > 100` runtime check removed; buf.validate `int32.gte = 0,
  int32.lte = 100` already enforces the range at the wire layer.

Partner wire contract: identical for the omit-field and 1..100 cases.
Only behavioural change is explicit 0 — previously impossible to request,
now honored per proto3 optional semantics.
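
The before/after difference comes down to `> 0 ?` versus `??`. A minimal sketch follows; the two function names are hypothetical and exist only to put the expressions side by side.

```typescript
const LEGACY_DEFAULT_THRESHOLD = 50;

// Before: a plain int32 arrives as 0 both when omitted and when explicitly 0,
// so the truthiness-style check coerces an intentional 0 to the default.
function resolveThresholdBefore(alertThreshold: number): number {
  return alertThreshold > 0 ? alertThreshold : LEGACY_DEFAULT_THRESHOLD;
}

// After: with an optional field, omitted is undefined and explicit 0 stays 0.
// Nullish coalescing only substitutes the default for undefined or null.
function resolveThresholdAfter(alertThreshold?: number): number {
  return alertThreshold ?? LEGACY_DEFAULT_THRESHOLD;
}
```

With these definitions, `resolveThresholdBefore(0)` yields 50 while `resolveThresholdAfter(0)` yields 0, and `resolveThresholdAfter(undefined)` still yields 50.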

Scoped `buf generate --path worldmonitor/shipping/v2` to avoid the full-
regen `@ts-nocheck` drift Seb documented in the #3242 PR comments.
Re-applied `@ts-nocheck` on the two regenerated files manually.

Tests:
- `alertThreshold 0 coerces to 50` flipped to `alertThreshold 0 preserved`.
- New test: `alertThreshold omitted (undefined) applies legacy default 50`.
- `rejects > 100` test removed — proto/wire validation handles it; direct
  handler calls intentionally bypass wire and the handler no longer carries
  a redundant runtime range check.

Verified: 18/18 shipping-v2-handler tests pass, typecheck + typecheck:api
clean, all 4 custom lints clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(shipping/v2): document missing webhook delivery worker + DNS-rebinding contract (#3242 followup)

#3242 followup checklist item 6 from @koala73 — sanity-check that the
delivery worker honors the re-resolve-and-re-check contract that
isBlockedCallbackUrl explicitly delegates to it.

Audit finding: no delivery worker for shipping/v2 webhooks exists in this
repo. Grep across the entire tree (excluding generated/dist) shows the
only readers of webhook:sub:* records are the registration / inspection /
rotate-secret handlers themselves. No code reads them and POSTs to the
stored callbackUrl. The delivery worker is presumed to live in Railway
(separate repo) or hasn't been built yet — neither is auditable from
this repo.

Refreshes the comment block at the top of webhook-shared.ts to:
- explicitly state DNS rebinding is NOT mitigated at registration
- spell out the four-step contract the delivery worker MUST follow
  (re-validate URL, dns.lookup, re-check resolved IP against patterns,
   fetch with resolved IP + Host header preserved)
- flag the in-repo gap so anyone landing delivery code can't miss it

Tracking the gap as #3288 — acceptance there is "delivery worker imports
the patterns + helpers from webhook-shared.ts and applies the four steps
before each send." Action moves to wherever the delivery worker actually
lives (Railway likely).

No code change. Tests + lints unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(lint): add rate-limit-policies step (greptile P1 #3287)

Pre-push hook ran lint:rate-limit-policies but the CI workflow did not,
so fork PRs and --no-verify pushes bypassed the exact drift check the
lint was added to enforce (closes #3278). Adding it right after
lint:api-contract so it runs in the same context the lint was designed
for.

* refactor(lint): premium-fetch regex → import() + loop classRe (greptile P2 #3287)

Two fragilities greptile flagged on enforce-premium-fetch.mjs:

1. loadPremiumPaths regex-parsed src/shared/premium-paths.ts with
   /'(\/api\/[^']+)'/g — same class of silent drift we just removed
   from enforce-rate-limit-policies in #3278. Reformatting the source
   Set (double quotes, spread, helper-computed entries) would drop
   paths from the lint while leaving the runtime untouched. Fix: flip
   the shebang to `#!/usr/bin/env -S npx tsx` and dynamic-import
   PREMIUM_RPC_PATHS directly, mirroring the rate-limit pattern.
   package.json lint:premium-fetch now invokes via tsx too so the
   npm-script path matches direct execution.

2. loadClientClassMap ran classRe.exec once, silently dropping every
   ServiceClient after the first if a file ever contained more than
   one. Current codegen emits one class per file so this was latent,
   but a template change would ship un-linted classes. Fix: collect
   every class-open match with matchAll, slice each class body with
   the next class's start as the boundary, and scan methods per-body
   so method-to-class binding stays correct even with multiple
   classes per file.
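
The `matchAll` plus slice approach in item 2 might look like the sketch below. The regex and the `sliceClassBodies` name are assumptions for illustration; the lint's real `classRe` is not shown in this log.

```typescript
// Hypothetical pattern for a class-open line.
const classRe = /export class (\w+ServiceClient)\b/g;

interface ClassBody {
  name: string;
  body: string;
}

// Collect every class-open match, then slice each body using the next
// class's start offset (or end of file) as the boundary.
function sliceClassBodies(source: string): ClassBody[] {
  const matches = Array.from(source.matchAll(classRe));
  return matches.map((match, i) => {
    const start = match.index ?? 0;
    const next = matches[i + 1];
    const end = next?.index ?? source.length;
    return { name: match[1] ?? "", body: source.slice(start, end) };
  });
}
```

Scanning methods inside each sliced body, rather than across the whole file, is what keeps a method bound to the class that declares it.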

Verification:
- lint:premium-fetch clean (34 classes / 28 premium paths / 466 files
  — identical counts to pre-refactor, so no coverage regression).
- Negative test: revert src/services/economic/index.ts to
  globalThis.fetch → exit 1 with file:line, bound var name, and
  premium method list (getNationalDebt). Restore → clean.
- lint:rate-limit-policies still clean.

* fix(shipping/v2): re-add alertThreshold handler range guard (greptile nit 1 #3287)

Wire-layer buf.validate enforces 0..100, but direct handler invocation
(internal jobs, test harnesses, future transports) bypasses it. Cheap
invariant-at-the-boundary — rejects < 0 or > 100 with ValidationError
before the record is stored.

Tests: restored the rejects-out-of-range cases that were dropped when the
branch was (correctly) deleted as dead code on the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): premium-fetch method-regex → TS AST (greptile nits 2+5 #3287)

loadClientClassMap:
  The method regex `async (\w+)\s*\([^)]*\)\s*:\s*Promise<[^>]+>\s*\{\s*let
  path = "..."` assumed (a) no nested `)` in arg types, (b) no nested `>`
  in the return type, (c) `let path = "..."` as the literal first statement.
  Any codegen template shift would silently drop methods with the lint still
  passing clean — the same silent-drift class #3287 just closed on the
  premium-paths side.

  Now walks the service_client.ts AST, matches `export class *ServiceClient`,
  iterates `MethodDeclaration` members, and reads the first
  `let path: string = '...'` variable statement as a StringLiteral. Tolerant
  to any reformatting of arg/return types or method shape.

findCalls scope-blindness:
  Added limitation comment — the walker matches `<varName>.<method>()`
  anywhere in the file without respecting scope. Two constructions in
  different function scopes sharing a var name merge their called-method
  sets. No current src/ file hits this; the lint errs cautiously (flags
  both instances). Keeping the walker simple until scope-aware binding
  is needed.

webhook-shared.ts:
  Inlined issue reference (#3288) so the breadcrumb resolves without
  bouncing through an MDX that isn't in the diff.

Verification:
- lint:premium-fetch clean — 34 classes / 28 premium paths / 489 files.
  Pre-refactor: 34 / 28 / 466. Class + path counts identical; file bump
  is from the main-branch rebase, not the refactor.
- Negative test: revert src/services/economic/index.ts premiumFetch →
  globalThis.fetch. Lint exits 1 at `src/services/economic/index.ts:64:7`
  with `premium method(s) called: getNationalDebt`. Restore → clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): rate-limit OpenAPI regex → yaml parser (greptile nit 3 #3287)

Input side (ENDPOINT_RATE_POLICIES) was flipped to live `import()` in
4e79d029. Output side (OpenAPI routes) still regex-scraped top-level
`paths:` keys with `/^\s{4}(\/api\/[^\s:]+):/gm` — hard-coded 4-space
indent. Any YAML formatter change (2-space indent, flow style, line
folding) would silently drop routes and let policy-drift slip through
— same silent-drift class the input-side fix closed.

Now uses the `yaml` package (already a dep) to parse each
.openapi.yaml and reads `doc.paths` directly.

Verification:
- Clean: 6 policies / 189 routes (was 182 — yaml parser picks up a
  handful the regex missed, closing a silent coverage gap).
- Negative test: rename policy key back to /api/sanctions/v1/lookup-entity
  → exits 1 with the same incident-attributed remedy. Restore → clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(codegen): regenerate unified OpenAPI bundle for alert_threshold proto change

The shipping/v2 webhook alert_threshold field was flipped from `int32` to
`optional int32` with an expanded doc comment in f3339464. That comment
now surfaces in the unified docs/api/worldmonitor.openapi.yaml bundle
(introduced by #3341). Regenerated with sebuf v0.11.1 to pick it up.

No behaviour change — bundle-only documentation drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:00:41 +03:00
Elie Habib
184e82cb40 feat(resilience): PR 3A — net-imports denominator for sovereignFiscalBuffer (#3380)
PR 3A of cohort-audit plan 2026-04-24-002. Construct correction for
re-export hubs: the SWF rawMonths denominator was gross imports, which
double-counted flow-through trade that never represents domestic
consumption. Net-imports fix:

  rawMonths = aum / (grossImports × (1 − reexportShareOfImports)) × 12

applied to any country in the re-export share manifest. Countries NOT
in the manifest get gross imports unchanged (status-quo fallback).
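
As a worked illustration of the formula, here is a minimal sketch. `computeNetImports(gross, share)` is named in this commit, but the body, the range check, and the `rawMonths` wrapper below are assumptions rather than the seeder's actual code.

```typescript
// Net imports: strip the re-exported share from gross imports.
// A missing share (country not in the manifest) falls back to gross imports.
function computeNetImports(gross: number, share?: number): number {
  if (share === undefined) return gross;
  // Assumed boundary rule: share must be in [0, 1) so the denominator stays positive.
  if (!(share >= 0 && share < 1)) {
    throw new RangeError(`reexportShareOfImports out of range: ${share}`);
  }
  return gross * (1 - share);
}

// Months of import cover provided by the fund's assets under management.
function rawMonths(aum: number, grossImports: number, share?: number): number {
  return (aum / computeNetImports(grossImports, share)) * 12;
}
```

For two countries with the same fund and the same gross imports, a 60% re-export share gives 1 / (1 − 0.6) = 2.5 times the months of cover of a 0% share.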

Plan acceptance gates — verified synthetically in this PR:

  Construct invariant. Two synthetic countries, same SWF, same gross
  imports. A re-exports 60%; B re-exports 0%. Post-fix, A's rawMonths
  is 2.5× B's (1/(1-0.6) = 2.5). Pinned in
  tests/resilience-net-imports-denominator.test.mts.

  SWF-heavy exporter invariant. Country with share ≤ 5%: rawMonths
  lift < 5% vs baseline (negligible). Pinned.

What shipped

1. Re-export share manifest infrastructure.
   - scripts/shared/reexport-share-manifest.yaml (new, empty) — schema
     committed; entries populated in follow-up PRs with UNCTAD
     Handbook citations.
   - scripts/shared/reexport-share-loader.mjs (new) — loader + strict
     validator, mirrors swf-manifest-loader.mjs.
   - scripts/seed-recovery-reexport-share.mjs (new) — publishes
     resilience:recovery:reexport-share:v1 from manifest. Empty
     manifest = valid (no countries, no adjustment).

2. SWF seeder uses net-imports denominator.
   - scripts/seed-sovereign-wealth.mjs exports computeNetImports(gross,
     share) — pure helper, unit-tested.
   - Per-country loop: reads manifest, computes denominatorImports,
     applies to rawMonths math.
   - Payload records annualImports (gross, audit), denominatorImports
     (used in math), reexportShareOfImports (provenance).
   - Summary log reports which countries had a net-imports adjustment
     applied with source year.

3. Bundle wiring.
   - Reexport-Share runs BEFORE Sovereign-Wealth in the recovery
     bundle so the SWF seeder reads fresh re-export data in the same
     cron tick.
   - tests/seed-bundle-resilience-recovery.test.mjs expected-entries
     updated (6 → 7) with ordering preservation.

4. Cache-prefix bump (per cache-prefix-bump-propagation-scope skill).
   - RESILIENCE_SCORE_CACHE_PREFIX: v11 → v12
   - RESILIENCE_RANKING_CACHE_KEY: v11 → v12
   - RESILIENCE_HISTORY_KEY_PREFIX: v6 → v7 (history rotation prevents
     30-day rolling window from mixing pre/post-fix scores and
     manufacturing false "falling" trends on deploy day).
   - Source of truth: server/worldmonitor/resilience/v1/_shared.ts
   - Mirrored in: scripts/seed-resilience-scores.mjs,
     scripts/validate-resilience-correlation.mjs,
     scripts/backtest-resilience-outcomes.mjs,
     scripts/validate-resilience-backtest.mjs,
     scripts/benchmark-resilience-external.mjs, api/health.js
   - Test literals bumped in 4 test files (26 line edits).
   - EXTENDED tests/resilience-cache-keys-health-sync.test.mts with
     a parity pass that reads every known mirror file and asserts
     both (a) canonical prefix present AND (b) no stale v<older>
     literals in non-comment code. Found one legacy log-line that
     still referenced v9 (scripts/seed-resilience-scores.mjs:342)
     and refactored it to use the RESILIENCE_RANKING_CACHE_KEY
     constant so future bumps self-update.

Explicitly NOT in this PR

- liquidReserveAdequacy denominator fix. The plan's PR 3A wording
  mentions both dims, but the RESERVES ratio (WB FI.RES.TOTL.MO) is a
  PRE-COMPUTED WB series; applying a post-hoc net-imports adjustment
  mixes WB's denominator year with our manifest-year, and the math
  change belongs in PR 3B (unified liquidity) where the α calibration
  is explicit. This PR stays scoped to sovereignFiscalBuffer.
- Live re-export share entries. The manifest ships EMPTY in this PR;
  entries with UNCTAD citations are one-per-PR follow-ups so each
  figure is individually auditable.

Verified

- tests/resilience-net-imports-denominator.test.mts — 9 pass (construct
  contract: 2.5× ratio gate, monotonicity, boundary rejections,
  backward-compat on missing manifest entry, cohort-proportionality,
  SWF-heavy-exporter-unchanged)
- tests/reexport-share-loader.test.mts — 7 pass (committed-manifest
  shape + 6 schema-violation rejections)
- tests/resilience-cache-keys-health-sync.test.mts — 5 pass (existing 3
  + 2 new parity checks across all mirror files)
- tests/seed-bundle-resilience-recovery.test.mjs — 17 pass (expected
  entries bumped to 7)
- npm run test:data — 6714 pass / 0 fail
- npm run typecheck / typecheck:api — green
- npm run lint / lint:md — clean

Deployment notes

Score + ranking + history cache prefixes all bump in the same deploy.
Per established v10→v11 precedent (and the cache-prefix-bump-
propagation-scope skill):
- Score / ranking: 6h TTL — the new prefix populates via the Railway
  resilience-scores cron within one tick.
- History: 30d ring — the v7 ring starts empty; the first 30 days
  post-deploy lack baseline points, so trend / change30d will read as
  "no change" until v7 accumulates a window.
- Legacy v11 keys can be deleted from Redis at any time post-deploy
  (no reader references them). Leaving them in place costs storage
  but does no harm.
2026-04-24 18:14:04 +04:00
Elie Habib
0081da4148 fix(resilience): widen Comtrade period to 4y + surface picked year (#3372)
PR 1 of cohort-audit plan 2026-04-24-002. Unblocks UAE, Oman, Bahrain
(and any other late-reporter) on the importConcentration dimension.

Problem
- seed-recovery-import-hhi.mjs queries Comtrade with `period=Y-1,Y-2`
  (currently "2025,2024"). Several reporters publish Comtrade 1-2y
  behind — their 2024/2025 rows are empty while 2023 is populated.
- With no data in the queried window, parseRecords() returned [] for
  the reporter, the seeder counted a "skip", the scorer fell through
  to IMPUTE (score=50, coverage=0.3, imputationClass="unmonitored"),
  and the cohort-sanity audit flagged AE as a coverage-outlier inside
  the GCC — exactly the class of silent gap the audit is designed to
  catch.

Fix
1. Widen the Comtrade period parameter to a 4-year window Y-1..Y-4
   via a new `buildPeriodParam(now)` helper. On-time reporters still
   pick their latest year via the existing completeness tiebreak in
   parseRecords(); late reporters now pick up whatever year they
   actually published in (2023 for UAE, etc.).
2. parseRecords() now returns { rows, year } — the year surfaces in
   the per-country payload as `year: number | null` for operator
   freshness audit. The scorer already expects this shape
   (_dimension-scorers.ts:1524 RecoveryImportHhiCountry.year); this
   PR actually populates it.
3. `buildPeriodParam` + `parseRecords` are exported so their unit
   tests can pin year-selection behaviour without hitting Comtrade.
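
The widened window can be sketched as follows. The helper name and the comma-separated format come from this commit; the use of UTC and the exact body are assumptions.

```typescript
// Build the Comtrade `period` value covering Y-1 through Y-4,
// newest year first, relative to the supplied date.
function buildPeriodParam(now: Date = new Date()): string {
  const currentYear = now.getUTCFullYear(); // UTC is an assumption
  const years: number[] = [];
  for (let offset = 1; offset <= 4; offset++) {
    years.push(currentYear - offset);
  }
  return years.join(",");
}
```

For a run dated in 2026 this yields `2025,2024,2023,2022`, so a reporter whose latest published year is 2023 now falls inside the queried window.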

Note on PR 2 of the same plan
The plan calls out "PR 2 — externalDebtCoverage re-goalpost to
Greenspan-Guidotti" as unshipped. It IS shipped: commit 7f78a7561
"PR 3 §3.5 point 3 — re-goalpost externalDebtCoverage (0..5 → 0..2)"
landed under the prior workstream 2026-04-22-001. The new construct
invariants in tests/resilience-construct-invariants.test.mts
(shipped in PR 0 / #3369) confirm score(ratio=0)=100, score(1)=50,
score(2)=0 against current main. PR 2 of the cohort-audit plan is a
no-op; I'll flag this on the plan review thread rather than bundle
a plan edit into this PR.

Verified
- `npx tsx --test tests/seed-recovery-import-hhi.test.mjs` — 19 pass
  (10 existing + 9 new: buildPeriodParam shape; parseRecords picks
  completeness-tiebreak, newer-year-on-ties, late-reporter fallback;
  empty/negative/world-aggregate handling)
- `npx tsx --test tests/seed-comtrade-5xx-retry.test.mjs` — green
  (the `{ records, status }` destructure pattern at the caller still
  works; the new third field `year` is additive)
- `npm run test:data` — 6703 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no new warnings
- No cache-prefix bump: the payload shape only ADDS an optional
  field; old snapshots remain valid readers.

Acceptance per plan
- Construct invariant: score(HHI=0.05) > score(HHI=0.20) — already
  covered in tests/resilience-construct-invariants.test.mts (PR #3369)
- Monotonicity pin: score(hhi=0.15) > score(hhi=0.45) — already
  covered in tests/resilience-dimension-monotonicity.test.mts

Post-deploy verification
After the next Railway seed-bundle-resilience-recovery cron tick,
confirm AE/OM/BH appear in `resilience:recovery:import-hhi:v1`
with non-null hhi and `year` = 2023 (or their actual latest year).
Then re-run the cohort audit — the GCC coverage-outlier flag on
AE.importConcentration should disappear.
2026-04-24 18:13:41 +04:00
Elie Habib
df392b0514 feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369)
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking
structural audit (plan 2026-04-24-002). Release gate, not merge gate:
the audit tells release review what to look at before publishing a
ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator.
  Fetches the live ranking + per-country scores (or reads a fixture
  in offline mode), emits per-cohort per-dimension tables, contribution
  decomposition, saturated / outlier / identical-score flags, and a
  top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level
  anchor-value assertions with synthetic inputs. Covers HHI, external
  debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer
  (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that
  mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can
  be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc
  explaining when to run, how to read the report, and the explicit
  anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` —
  12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json
   OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs`
  runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot
  (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json)
  is deferred to a manual release-prep step: run
  `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app
   node scripts/freeze-resilience-ranking.mjs` before the first
  methodology PR (PR 1 HHI period widening) so its movers table has
  something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop.
Per-country score fetches that failed were logged to stderr, silently
stored as null, and then filtered out of cohort tables via
`codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the
very country carrying the ranking anomaly could produce a Markdown
report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and
  does NOT insert null placeholders; missing cohort members are
  computed against the requested cohort code set.
- The report has a blocker banner at top AND an always-rendered
  "Fetch failures / missing members" section (shown even when empty,
  so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch
  failure or missing cohort member, code 4 on formula-mode drift,
  code 0 otherwise. Automation can differentiate the two.
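
The `STRICT=1` exit-code contract can be sketched as a pure function. The `AuditOutcome` shape and the `strictExitCode` name are hypothetical, and the precedence of code 3 over code 4 when both conditions hold is an assumption; the commit message does not state which wins.

```typescript
// Hypothetical summary of what the audit run observed.
interface AuditOutcome {
  fetchFailures: number;
  missingMembers: number;
  formulaDrift: boolean;
}

// Map an audit outcome to a process exit code.
// Non-strict runs always exit 0; the report is written either way.
function strictExitCode(outcome: AuditOutcome, strict: boolean): number {
  if (!strict) return 0;
  if (outcome.fetchFailures > 0 || outcome.missingMembers > 0) return 3;
  if (outcome.formulaDrift) return 4; // assumed lower precedence than code 3
  return 0;
}
```

Distinct codes let a release pipeline retry on 3 (likely transient) while treating 4 as a configuration finding.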

P2 — pillar-combine formula mode invalidates contribution rows.
`docs/methodology/cohort-sanity-release-gate.md:63` tells operators
to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`,
but the contribution decomposition is a domain-weighted roll-up that
is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)`
(non-linear in dim scores); decomposition rows become materially
misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against
  `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible
  countries drift, pillar combine is flagged.
- Report emits a blocker banner at top, a "Formula mode" line in
  the header, and a "Formula-mode diagnostic" section with the first
  three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section,
  "Formula mode" operator guide, ENV table entries for STRICT +
  CONTRIB_TOLERANCE.
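
A minimal sketch of the detection rule described above. The thresholds (0.05, 0.9, 1.5, more than 50% of at least 3 eligible countries) are taken from this commit message; the response shape, the return values, and the function body are assumptions.

```typescript
// Hypothetical, simplified score-response shape.
interface Dimension { coverage: number; }
interface Domain { score: number; weight: number; dimensions: Dimension[]; }
interface CountryScore { overallScore: number; domains: Domain[]; }

type FormulaMode = "domain-weighted" | "pillar-combine" | "unknown";

function detectFormulaMode(
  scoreMap: Map<string, CountryScore>,
  tolerance = 1.5, // CONTRIB_TOLERANCE default
): FormulaMode {
  let eligible = 0;
  let drifting = 0;
  scoreMap.forEach((country) => {
    const weightSum = country.domains.reduce((sum, d) => sum + d.weight, 0);
    const complete = Math.abs(weightSum - 1) <= 0.05;
    const stable = country.domains.every((d) =>
      d.dimensions.every((dim) => dim.coverage >= 0.9),
    );
    if (!complete || !stable) return; // not eligible for the comparison
    eligible += 1;
    const contributions = country.domains.reduce((sum, d) => sum + d.score * d.weight, 0);
    if (Math.abs(contributions - country.overallScore) > tolerance) drifting += 1;
  });
  if (eligible < 3) return "unknown"; // too few countries to judge
  return drifting / eligible > 0.5 ? "pillar-combine" : "domain-weighted";
}
```

Restricting the comparison to complete, high-coverage responses keeps ordinary imputation noise from being mistaken for a formula switch.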

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process
  smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0,
  pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts
   tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files
  (refactor split buildReport complexity from 51 → under 50 by
  extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and
  GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets
   `API_BASE=''`, so the report header showed a bare "/api/..." path that
   never resolved — making a fixture run visually indistinguishable from
   a live run. Now surfaces `Source: fixture://<path>` in fixture mode.

2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing
   or unparseable `shared/country-names.json` produced a raw unhandled
   rejection. Now the read and the parse are each wrapped in their own
   try/catch; on either failure the script logs a developer-friendly
   warning and falls back to ISO-2 codes (report shows "AE" instead of
   "Uae"). Keeps the audit operable in CI-offline scenarios.

3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.**
   Downstream operators diff / grep / CSV-pipe the report; the Unicode
   minus breaks byte-level text tooling. Replaced with ASCII hyphen-
   minus. Left the U+2212 in the formula-mode diagnostic prose
   (`|Σ contributions − overallScore|`) where it's mathematical notation,
   not data.

Verified

- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
2026-04-24 18:13:22 +04:00
Elie Habib
34dfc9a451 fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.
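
The selection rule above might be sketched like this. The 400 and 40 character limits come from this commit; the `pickDescription` name, the simplified tag and entity handling, and the ordering of the checks are assumptions, and the real parser is presumably more thorough.

```typescript
const MAX_DESCRIPTION_LEN = 400;
const MIN_DESCRIPTION_LEN = 40;

// Simplified cleaner: drop tags, decode a handful of entities, collapse whitespace.
function cleanText(raw: string): string {
  return raw
    .replace(/<[^>]*>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // last, so "&amp;lt;" is not decoded twice
    .replace(/\s+/g, " ")
    .trim();
}

function normalizeForCompare(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Pick the longest cleaned candidate; return '' when nothing usable remains
// so callers can fall back to the cleaned headline.
function pickDescription(candidates: string[], headline: string): string {
  let best = "";
  for (const candidate of candidates) {
    const cleaned = cleanText(candidate);
    if (cleaned.length > best.length) best = cleaned;
  }
  if (best.length < MIN_DESCRIPTION_LEN) return "";
  if (normalizeForCompare(best) === normalizeForCompare(headline)) return "";
  return best.slice(0, MAX_DESCRIPTION_LEN);
}
```

Returning an empty string rather than `undefined` matches the fallback contract stated above: downstream consumers treat `''` as "no description available".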

CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.

Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).

Covers R1, R7.

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).

Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.
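
A rough sketch of the inclusion gate. The helper name is taken from this commit, but the `TrackedStory` shape, the other field names, and the return type are hypothetical; only the "append `description` when non-empty" rule follows the text above.

```typescript
// Hypothetical story shape for illustration.
interface TrackedStory {
  title: string;
  link: string;
  description?: string;
}

// Build the field map for the HSET. `description` is included only when
// non-empty, so rows from feeds without one stay identical to old rows.
function buildStoryTrackHsetFields(story: TrackedStory): Record<string, string> {
  const fields: Record<string, string> = {
    title: story.title,
    link: story.link,
  };
  if (story.description && story.description.length > 0) {
    fields.description = story.description;
  }
  return fields;
}
```

Because the helper is pure, the gate can be asserted on directly without a Redis connection.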

Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.

Covers R2, R6.

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:

- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
  ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
  paired 1:1 with headlines for prompt grounding. Empty array is today's
  headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).

Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.

Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today's behavior is preserved (R6).

This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.

Covers R4 (upstream half), R6.

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).

Changes:

- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
  between the metadata block and the "One editorial sentence" instruction
  when description is non-empty AND not normalise-equal to the headline.
  Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).

- sanitizeStoryForPrompt extended to cover `description`. Closes the
  asymmetry where whyMatters was sanitised and description wasn't —
  untrusted RSS bodies now flow through the same injection-marker
  neutraliser before prompt interpolation. generateStoryDescription wraps
  the story in sanitizeStoryForPrompt before calling the builder,
  matching generateWhyMatters.

- Four cache prefixes bumped atomically to evict pre-grounding rows:
    scripts/lib/brief-llm.mjs:
      brief:llm:description:v1 → v2  (Railway, description path)
      brief:llm:whymatters:v2 → v3   (Railway, whyMatters fallback)
    api/internal/brief-why-matters.ts:
      brief:llm:whymatters:v6 → v7                (edge, primary)
      brief:llm:whymatters:shadow:v4 → shadow:v5  (edge, shadow)
  hashBriefStory already includes description in the 6-field material
  (v5 contract) so identity naturally drifts; the prefix bump is the
  belt-and-braces that guarantees a clean cold-start on first tick.

- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
  Covers Context-line injection, empty/dup-of-headline rejection,
  400-char clip, sanitisation of adversarial descriptions, v2 write,
  and legacy-v1 row dark (forced cold-start).

Covers R4 + new sanitisation requirement.

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.

Three coordinated changes:

1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
   through sanitizeForPrompt (same trust treatment as geoContext — bodies
   are untrusted RSS text), clips to 400 chars, and pads to the headlines
   length so pair-wise identity is stable.

2. buildArticlePrompts accepts optional bodies and interleaves a
   `    Context: <body>` line under each numbered headline that has a
   non-empty body. Skipped in translate mode (headline[0]-only) and when
   all bodies are empty — yielding a byte-identical prompt to pre-U6
   for every current caller (R6 preserved).

3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
   (produced from headline-only prompts) cold-start cleanly. Extends
   canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
   bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
   avoid colliding with `:brief:` when pattern-matching keys. Translate
   mode is headline[0]-only and intentionally does not shift on bodies.

Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.

New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.

Covers R3, R4.

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
  word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
  currentHeadlines; passed as options.bodies to generateSummary so the
  server-side SummarizeArticle LLM grounds on the article body.

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
  → generateSummary → runApiChain → tryApiProvider; cache key now includes
  bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
  to /api/news/v1/summarize-article so the MCP tool surface is no longer
  starved.

Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
  a ~200-char truncated snippet line under each story; HTML formatDigestHtml
  renders a dim-grey description div between title and meta. Both gated
  on non-empty description (R6 — empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
  description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
  includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
  NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
  payloads render a `> <snippet>` context line under the title. When off
  or payload.description absent, output is byte-identical to pre-U7.

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
  @notification-source tags on every producer, rejects `description:` in
  domain-origin payload blocks, and verifies the relay codepath gates
  snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
  alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).

Covers R5, R6.

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:

- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
  subsection tracing parser → story:track:v1.description → NewsItem.snippet
  → brief / SummarizeArticle / dashboard / email / relay / MCP, with the
  empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
  to match the v6 bump from U6. Without this the summary cache bump silently
  regressed CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
  simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.

Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
  headlines identical but bodies different now invalidate in-flight summaries.
  Without this, switching renderItems → renderClusters mid-summary let a
  grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
  single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
  position in light-sanitized headlines while findIndex looked up the
  full-sanitized array — any headline that sanitizeHeadlines emptied
  mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
  client builds keys from RAW bodies while server sanitizes first. The
  keys diverge on injection content, which would silently miss the
  server's authoritative cache every call.

Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
  instead of a literal "somehash", so a bug where the reader still queried
  the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
  the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
  CDATA test and adds a new test documenting the malformed-]]> fallback
  behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
  headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
  headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
  and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
  grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
  existing TAG_REGEX_CACHE pattern.

Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.

When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), declining to
change. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection the argument
was over-defending the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
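
A sketch of the always-written rule, assuming a flat field/value array
for HSET. Only `description` and the helper name come from this commit;
`title`, `lastSeen`, and `descriptionOrHeadline` are hypothetical
stand-ins.

```javascript
// Sketch only: field names other than `description` are assumptions.
function buildStoryTrackHsetFields(item) {
  const fields = ['title', item.title, 'lastSeen', String(item.seenAt)];
  // Always written: an empty string overwrites a body left behind by an
  // earlier mention of the same normalized title.
  fields.push('description', typeof item.description === 'string' ? item.description : '');
  return fields;
}

// Reader side (hypothetical): empty or absent description falls back to
// the cleaned headline, which is the R6 rule.
function descriptionOrHeadline(track, cleanedHeadline) {
  return track.description ? track.description : cleanedHeadline;
}
```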

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
2026-04-24 16:25:14 +04:00
Elie Habib
b68d98972a fix(unrest): bump GDELT proxy timeout 20s → 45s (#3362)
GDELT's v1 gkg_geojson endpoint is currently responding in ~19s (direct
curl test: HTTP 200 at t=19.4s). With the old 20s proxy timeout the
Decodo leg hits Cloudflare origin timeout and returns HTTP 522 on nearly
every tick, so fetchGdeltEvents throws "both paths failed — proxy:
HTTP/1.1 522 Server Error" and runSeed freezes seed-meta fetchedAt.

Result: the unrest:events seed-meta stops advancing while Redis still
holds the last-good payload — health.js reports STALE_SEED even though
the seeder is running on schedule every 45 min. 4.5+ hours of
consecutive failures observed in production logs overnight.

Direct path has been chronically broken (UND_ERR_CONNECT_TIMEOUT in
every tick since PR #3256 added the proxy fallback), so the proxy is
the real fetch path. Giving it 45s absorbs GDELT's current degraded
response time with headroom, without changing any other behavior.

ACLED credentials remain unconfigured in this environment, so GDELT is
effectively the single upstream — separate ops task to wire ACLED as a
real second source.
2026-04-24 08:52:08 +04:00
Elie Habib
5cec1b8c4c fix(insights): trust cluster rank, stop LLM from re-picking top story (#3358)
* fix(insights): trust cluster rank, stop LLM from re-picking top story

WORLD BRIEF panel published "Iran's new supreme leader was seriously
wounded, leading him to delegate power to the Revolutionary Guards. This
development comes amid an ongoing war with Israel." to every visitor for
3h. Payload: openrouter / gemini-2.5-flash.

Root cause: callLLM sent all 10 clustered headlines with "pick the ONE
most significant and summarize ONLY that story". Clustering ranked
Lebanon journalist killing #1 (2 corroborating sources); News24 Iran
rumor ranked #3 (1 source). Gemini overrode the rank, picked #3, and
embellished with war framing from story #4. Objective rank (sourceCount,
velocity, isAlert) lost to model vibe.

Shrink the LLM's job to phrasing. Clustering already ranks — pass only
topStories[0].primaryTitle and instruct the model to rewrite it using
ONLY facts from the headline. No name/place/context invention.

Also:
- temperature 0.3 -> 0.1 (factual summary, not creative)
- CACHE_TTL 3h -> 30m so a bad brief ages out in one cron cycle
- Drop dead MAX_HEADLINES const

Payload shape unchanged; frontend untouched.

* fix(insights): corroboration gate + revert TTL + drop unconditional WHERE

Follow-up to review feedback on the ranking contract, TTL, and prompt:

1. Corroboration gate (P1a). scoreImportance() in scripts/_clustering.mjs
   is keyword-heavy (violence +125 on a single word, flashpoint +75, ^1.5
   multiplier when both hit), so a single-source sensational rumor can
   outrank a 2-source lead purely on lexical signals. Blindly trusting
   topStories[0] would let the ranker's keyword bias still pick bad
   stories. Walk topStories for sourceCount >= 2 instead — corroboration
   becomes a hard requirement, not a tiebreaker. If no cluster qualifies,
   publish status=degraded with no brief (frontend already handles this).

2. CACHE_TTL back to 10800 (P1b). 30m TTL == one cron cadence means the
   key expires on any missed or delayed run and /api/bootstrap loses
   insights entirely (api/bootstrap.js reads news:insights:v1 directly,
   no LKG across TTL-gap). The short TTL was defense-in-depth for bad
   content; the real safety is now upstream (corroboration gate + grounded
   prompt), so the LKG window doesn't need to be sacrificed for it.

3. Prompt: location conditional (P2). "Use ONLY facts present" + "Lead
   with WHAT happened and WHERE" conflicted for headlines without an
   explicit location and pushed the model toward inferred-place
   hallucination. Replaced with "Include a location, person, or
   organization ONLY if it appears in the headline."

* test(insights): lock corroboration gate + grounded-prompt invariants

Review P2: the corroboration gate and the prompt's no-invention rules
had no tests, so future edits to selectTopStories() ordering or prompt
text could silently reintroduce the original hallucination.

Extract the brief-selection helper and prompt builders into a pure
module (scripts/_insights-brief.mjs) so tests can import them without
triggering seed-insights.mjs's top-level runSeed() call:

- pickBriefCluster(topStories) returns first sourceCount>=2 cluster
- briefSystemPrompt(dateISO) returns the system prompt
- briefUserPrompt(headline) returns the user prompt
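
A rough sketch of what pickBriefCluster could look like given the
contract above. The cluster fields beyond `sourceCount` and
`primaryTitle`, and the null return for "no qualifying cluster", are
assumptions.

```javascript
// Sketch: return the first cluster with at least two corroborating sources.
function pickBriefCluster(topStories) {
  if (!Array.isArray(topStories)) return null;
  for (const cluster of topStories) {
    // Tolerate missing/null entries and non-numeric sourceCount.
    if (cluster && typeof cluster.sourceCount === 'number' && cluster.sourceCount >= 2) {
      return cluster;
    }
  }
  // Nothing corroborated: the caller would publish status=degraded, no brief.
  return null;
}
```

With a single-source rumor ranked first and a 2-source story ranked
second, the helper skips the rumor and returns the second cluster.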

Regression tests (tests/seed-insights-brief.test.mjs, 12 cases) lock:
- pickBriefCluster skips single-source rumors even when ranked above a
  multi-sourced lead (explicit regression: News24 Iran supreme leader
  2026-04-23 scenario with realistic scores)
- pickBriefCluster tolerates missing/null entries
- briefSystemPrompt forbids invented facts and proper nouns
- briefSystemPrompt's "location" rule is conditional (no unconditional
  "Lead with WHAT and WHERE" directive that would push the model toward
  place-inference when the headline has no location)
- briefSystemPrompt does not contain "pick the most important" style
  language (ranking is done by pickBriefCluster upstream)
- briefUserPrompt passes the headline verbatim and instructs
  "only facts from this headline"

Also fix a misleading comment on CACHE_TTL: corroboration is gated at
brief-selection time, not on the topStories payload itself (which still
includes single-source clusters rendered as the headline list).

test:data: 6657/6657 pass (was 6645; +12).
2026-04-24 07:21:13 +04:00
Elie Habib
def94733a8 feat(agent-readiness): Agent Skills discovery index (#3310) (#3355)
* feat(agent-readiness): Agent Skills discovery index (#3310)

Closes #3310. Ships the Agent Skills Discovery v0.2.0 manifest at
/.well-known/agent-skills/index.json plus two real, useful skills.

Skills are grounded in real sebuf proto RPCs:
- fetch-country-brief → GetCountryIntelBrief (public).
- fetch-resilience-score → GetResilienceScore (Pro / API key).

Each SKILL.md documents endpoint, auth, parameters, response shape,
worked curl, errors, and when not to use the skill.

scripts/build-agent-skills-index.mjs walks every
public/.well-known/agent-skills/<name>/SKILL.md, sha256s the bytes,
and emits index.json. Wired into prebuild + every variant build so a
deploy can never ship an index whose digests disagree with served files.

tests/agent-skills-index.test.mjs asserts the index is up-to-date
via the script's --check mode and recomputes every sha256 against
the on-disk SKILL.md bytes.

Discovery wiring:
- public/.well-known/api-catalog: new anchor entry with the
  agent-skills-index rel per RFC 9727 linkset shape.
- vercel.json: adds agent-skills-index rel to the homepage +
  /index.html Link headers; deploy-config required-rels list updated.

Canonical URLs use the apex (worldmonitor.app) since #3322 fixed
the apex redirect that previously hid .well-known paths.

* fix(agent-readiness): correct auth header + harden frontmatter parser (#3310)

Addresses review findings on #3310.

## P1 — auth header was wrong in both SKILL.md files

The published skills documented `Authorization: Bearer wm_live_...`,
but WorldMonitor API keys must be sent in `X-WorldMonitor-Key`.
`Authorization: Bearer` is for MCP/OAuth or Clerk JWTs — not raw
`wm_live_...` keys. Agents that followed the SKILL.md verbatim would
have gotten 401s despite holding valid keys.

fetch-country-brief also incorrectly claimed the endpoint was
"public"; server-to-server callers without a trusted browser origin
are rejected by `validateApiKey`, so agents do need a key there too.
Fixed both SKILL.md files to document `X-WorldMonitor-Key` and
cross-link docs/usage-auth as the canonical auth matrix.

## P2 — frontmatter parser brittleness

The hand-rolled parser used `indexOf('\n---', 4)` as the closing
fence, which matched any body line that happened to start with `---`.
Swapped for a regex that anchors the fence to its own line, and
delegated value parsing to js-yaml (already a project dep) so future
catalog growth (quoted colons, typed values, arrays) does not trip
new edge cases.
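
A sketch of a line-anchored fence match, for illustration only. The
helper name and return shape are hypothetical, and YAML value parsing
(delegated to js-yaml in this commit) is left out.

```javascript
// Hypothetical helper: split a SKILL.md-style document into raw
// frontmatter text and body.
function splitFrontmatter(source) {
  // The closing fence must be a line containing only `---` (plus optional
  // trailing spaces/tabs), so a line like `---trailing` does not close it.
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(source);
  if (!match) return { frontmatter: '', body: source };
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}
```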

Added parser-contract tests that lock in the new semantics:
body `---` does not terminate the block, values with colons survive
intact, non-mapping frontmatter throws, and no-frontmatter files
return an empty mapping.

Index.json rebuilt against the updated SKILL.md bytes.
2026-04-23 22:21:25 +04:00
Elie Habib
e9146516a5 fix(swf): restore 8/8 fund coverage + explicit per-country observability (#3352)
* fix(swf): restore 8/8 fund coverage — WB bulk mrv=1 silently dropped Gulf countries

The 2026-04-23 post-#3344 Railway run seeded 4/8 funds (NO, SA, SG) and
silently dropped AE/KW/QA. Root cause: WB's `country/all/indicator/…?mrv=1`
returns the SAME year across every country (the most recent year that any
country publishes). KW/QA/AE report NE.IMP.GNFS.CD a year or two behind
NO/SA/SG, so mrv=1 gave them `value: null` and the seeder skipped them
because the rawMonths denominator was missing.

Fix: bump to `mrv=5` and pick the most recent non-null value per country
via a new pure helper `pickLatestPerCountry(records)`. Verified via
6 back-to-back live dry-runs (all 8/8, byte-identical numbers):
  NO: GPFG          1/1  effMo=93.05   (2024 imports)
  AE: ADIA+Mubadala 2/2  effMo=3.85    (2023 imports)
  SA: PIF           1/1  effMo=1.68    (2024 imports)
  KW: KIA           1/1  effMo=45.43   (2023 imports)
  QA: QIA           1/1  effMo=8.61    (2022 imports)
  SG: GIC+Temasek   2/2  effMo=7.11    (2024 imports; Temasek via infobox)
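
A sketch of the per-country selection, under stated assumptions: the
record shape (`countryiso3code`, `date`, `value`) is modeled on World
Bank API rows, the return shape is invented for illustration, and
ISO3_TO_ISO2 is a tiny stand-in for the seeder's real lookup.

```javascript
// Illustrative lookup only; not the seeder's real table.
const ISO3_TO_ISO2 = { NOR: 'NO', ARE: 'AE', SAU: 'SA', KWT: 'KW', QAT: 'QA', SGP: 'SG' };

// Sketch: keep the most recent year with a positive, non-null value per country.
function pickLatestPerCountry(records) {
  const latest = {};
  for (const rec of records || []) {
    if (!rec) continue;
    const code = String(rec.countryiso3code || '').toUpperCase();
    // Accept iso2 as-is, map iso3 through the lookup.
    const iso2 = code.length === 2 ? code : ISO3_TO_ISO2[code];
    const year = Number(rec.date);
    if (!iso2 || !Number.isFinite(year)) continue;
    // Null and non-positive values never count as "latest".
    if (typeof rec.value !== 'number' || !(rec.value > 0)) continue;
    // Compare years explicitly so upstream array order does not matter.
    if (!latest[iso2] || year > latest[iso2].year) latest[iso2] = { year, value: rec.value };
  }
  return latest;
}
```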

Second fix (observability): every manifest country is now enumerated in
a `summary` block in the payload + logged with an explicit status and
reason. Prod 14:59Z run had logs for KW/QA ("missing WB imports") but AE
was dropped with no log line, so the operator had to cross-reference the
manifest to notice. New `buildCoverageSummary(manifest, imports, countries)`
is exported and always emits one row per manifest country: `complete`,
`partial`, or `missing` with `reason ∈ {'missing WB imports', 'no fund
AUM matched'}`. Summary is also embedded in the published payload so
downstream consumers can detect degraded runs without parsing logs.

Tests (48/48 pass, 9 new):
- `pickLatestPerCountry` — 7 cases including the exact prod scenario
  (AE-2024-null + AE-2023-non-null → resolves to 2023 row). Guards
  against upstream re-order (asserts latest-year wins regardless of
  array order), rejects null-only countries, rejects non-positive
  values, handles both iso3 and iso2 codes.
- `buildCoverageSummary` — 2 cases covering the regression
  (silent-drop of AE) and the reason-string disambiguation (operator
  should know whether to investigate WB or Wikipedia).

Validated: 6 live end-to-end dry-runs (all 8/8), full test suite
569/569 pass, biome + lint:md clean.

* fix(swf): address Greptile P2 — uniform reason field + meaningful null-filter test

Two P2 findings on PR #3352:

1. `complete` and `partial` entries in countryStatuses were pushed
   without a `reason` key, while `missing` always carried one. The log
   path tolerated this (`row.reason ? ... : ''`), but the summary is
   now persisted in Redis — any downstream consumer iterating
   countryStatuses and reading `.reason` on a `partial` would see
   undefined. Added `reason: null` to complete + partial for uniform
   persisted shape. Test now asserts the `reason` key is present on
   every row regardless of status.

2. The null-only pickLatestPerCountry test used `'XYZ'` as the ISO-3
   code, which is filtered at the iso3→iso2 lookup stage BEFORE ever
   reaching the null-value guard — a regression that removed null
   filtering entirely would leave the test green. Swapped to `'NOR'`
   (real ISO-3 with a valid iso2 mapping) so the null-filter is the
   actual gate under test. Verified via sanity probe: `NOR + null`
   still drops, `NOR + value` still lands.

Tests 48/48 pass; live dry-run still 8/8 byte-identical; biome clean.
2026-04-23 21:35:25 +04:00
Elie Habib
38218db7cd fix(energy): strict validation — emptyDataIsFailure on Atlas seeders (#3350)
Adds `emptyDataIsFailure: true` to all 5 curated-registry seeders in the
`seed-bundle-energy-sources` Railway service. File-read-and-validate
seeders whose validateFn returns false (stale container, missing data
file, shape regression, etc.) MUST leave seed-meta stale rather than
stamping fresh `recordCount: 0` via the default `publishResult.skipped`
branch in `_seed-utils.mjs:906-917`.

Why this matters — observed production incident on 2026-04-23 (post
PR #3337 merge):

- A subset of Atlas seeders hit the validation-skip path (for reasons
  involving a Railway container that was stale relative to the merged
  code, plus a local Option A run during an intermediate-file-state window).
- `_seed-utils.mjs:910` `writeFreshnessMetadata(..., 0, ...)` stamped
  `seed-meta:energy:pipelines-oil` and `seed-meta:energy:storage-facilities`
  with fresh `fetchedAt + recordCount: 0`.
- Bundle runner's interval gate at `_bundle-runner.mjs:210` reads
  `fetchedAt` only, not `recordCount`. With `elapsed < 0.8 × 10080min =
  8064min`, the gate skipped these 2 sections for ~5.5 days. No
  canonical data was written; health reported EMPTY; bundle never
  self-healed.

With `emptyDataIsFailure: true`, the strict branch at
`_seed-utils.mjs:897-905` fires instead:

  FAILURE: validation failed (empty data) — seed-meta NOT refreshed;
  bundle will retry next cycle

Seed-meta stays stale, bundle counts it as `failed++`, next cron tick
retries. Health flips STALE_SEED within max-stale-min. Operator sees
it. Loud-failure instead of silent-skip-with-meta-refresh.
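
The interaction can be sketched as follows. This is an illustration
under assumptions, not the code in `_bundle-runner.mjs` or
`_seed-utils.mjs`: the function names are hypothetical and `fetchedAt`
is assumed to be epoch milliseconds. Only the 0.8 × 10080 min threshold
and the two branch behaviors come from the description above.

```javascript
const INTERVAL_MIN = 10080;  // weekly cadence
const GATE_FRACTION = 0.8;   // threshold: 0.8 x 10080 = 8064 min

// Interval gate as described: it looks at fetchedAt only, never recordCount.
function shouldSkipSection(meta, nowMs) {
  if (!meta || typeof meta.fetchedAt !== 'number') return false;
  const elapsedMin = (nowMs - meta.fetchedAt) / 60000;
  return elapsedMin < GATE_FRACTION * INTERVAL_MIN;
}

// Hypothetical stand-in for the two publish branches on validation failure.
function metaAfterValidationFailure(prevMeta, nowMs, emptyDataIsFailure) {
  if (emptyDataIsFailure) return prevMeta;       // strict: seed-meta NOT refreshed
  return { fetchedAt: nowMs, recordCount: 0 };   // default: fresh stamp, zero rows
}
```

Under the default branch the fresh zero-row stamp makes the gate skip
the section on later ticks; under the strict branch the stale meta
keeps the section eligible for retry.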

Pattern previously documented for strict-floor validators
(IMF/WEO 180+ country seeders in
`feedback_strict_floor_validate_fail_poisons_seed_meta.md`) — now
applied to all 5 Energy Atlas curated registries for the same reasons.

No functional change in the healthy path — validation-passing runs
still publish canonical + fresh seed-meta as before.

Verification: typecheck clean, 6618/6618 data tests pass.
2026-04-23 20:43:27 +04:00
Elie Habib
8278c8e34e fix(forecasts): unwrap seed-contract envelope in canonical-key sim patcher (#3348)
* fix(forecasts): unwrap seed-contract envelope in canonical-key sim patcher

Production bug observed 2026-04-23 across both forecast worker services
(seed-forecasts-simulation + seed-forecasts-deep): every successful run
logs `[SimulationDecorations] Cannot patch canonical key — predictions
missing or not an array` and silently fails to write simulation
adjustments back to forecast:predictions:v2.

Root cause: PR #3097 (seed-contract envelope dual-write) wraps canonical
seed writes in `{_seed: {...}, data: {predictions: [...]}}` via runSeed.
The Lua patcher (_SIM_PATCH_LUA) and its JS test-path mirror both read
`payload.predictions` directly with no envelope unwrap, so they always
return 'MISSING' against the new shape. This matches the pattern documented
in the project's worldmonitor-seed-envelope-consumer-drift learning
(91 producers enveloped, private-helper consumers not migrated).

User-visible impact: ForecastPanel renders simulation-adjusted scores
only when a fast-path seed has touched a forecast since the bug landed;
deep-forecast and simulation re-scores never reach the canonical feed.

Fix:
  - _SIM_PATCH_LUA detects envelope shape (`type(payload._seed) == 'table'
    and type(payload.data) == 'table'`), reads `inner.predictions`, and
    re-encodes preserving the wrapper so envelope shape persists across
    patches. Legacy bare values still pass through unchanged.
  - JS test path mirrors the same unwrap/rewrap.
  - New test WD-20b locks the regression: enveloped store fixture, asserts
    `_seed` wrapper preserved on write + inner predictions patched.

Also resolves the per-run `[seed-contract] forecast:predictions missing
fields: sourceVersion — required in PR 3` warning by passing
`sourceVersion: 'detectors+llm-pipeline'` to runSeed (PR 3 of the
seed-contract migration will start enforcing this; cheap to fix now).

Verified: typecheck (both tsconfigs) clean; lint 0 errors; test:data
6631/6631 green (forecast suite 309/309 incl new WD-20b); edge-functions
176/176 green; markdown + version-check clean.

* fix(forecasts): tighten JS envelope guard to match Lua's strict table check

PR #3348 review (P2):

JS test path used `!!published._seed` (any truthy value) while the Lua
script requires `type(payload._seed) == 'table'` (strict object check).
Asymmetry: a fixture with `_seed: true`, `_seed: 1`, or `_seed: 'string'`
would be treated as enveloped by JS and bare by Lua — meaning the JS
test mirror could silently miss real Lua regressions that bisect on
fixture shape, defeating the purpose of having a parity test path.

Tighten JS to require both `_seed` and `data` be plain objects (rejecting
truthy non-objects + arrays), matching Lua's `type() == 'table'` semantics
exactly.

New test WD-20c locks the parity: fixture with non-table `_seed` (string)
+ bare-shape `predictions` → must succeed via bare path, identical to
what Lua would do.
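
The asymmetry between the loose and strict guards can be illustrated with a small sketch. The function names are hypothetical; the loose form stands in for the `!!published._seed` check quoted above, and the strict form follows the commit's description (plain objects only, arrays rejected).

```javascript
// Loose guard, as quoted in the review: any truthy _seed counts as enveloped.
function looksEnvelopedLoose(published) {
  return !!published && !!published._seed;
}

// Strict guard: both _seed and data must be plain objects (no arrays).
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}
function looksEnvelopedStrict(published) {
  return isPlainObject(published)
    && isPlainObject(published._seed)
    && isPlainObject(published.data);
}

// Fixture in the spirit of WD-20c: non-table _seed plus bare predictions.
const fixture = { _seed: 'string', predictions: [] };
console.log(looksEnvelopedLoose(fixture));  // true
console.log(looksEnvelopedStrict(fixture)); // false
```

With the strict guard the fixture falls through to the bare path, which is the behaviour the commit attributes to the Lua script.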

Verified: 6632/6632 tests pass; new WD-20c green.
2026-04-23 20:38:11 +04:00
Elie Habib
54479feacc fix(ci): vercel-ignore prefers merge-base over VERCEL_GIT_PREVIOUS_SHA on previews (#3347)
* fix(ci): vercel-ignore prefers merge-base over VERCEL_GIT_PREVIOUS_SHA on previews

PR #3346 was incorrectly skipped with 'Canceled by Ignored Build Step'
despite touching src/, pro-test/, and public/. Root cause: on a PR
branch's FIRST push, Vercel populates VERCEL_GIT_PREVIOUS_SHA in ways
that make the path-diff collapse to empty (e.g., same SHA as HEAD, or a
parent commit that sees no net change in the allowed paths).

The script preferred PREVIOUS_SHA and only fell back to
`git merge-base HEAD origin/main` when PREVIOUS_SHA was literally empty
or unresolvable — which misses the 'PREVIOUS_SHA resolves but gives
wrong answer' case that Vercel hits on first-push PRs.

Fix: flip the priority for PREVIEW deploys. Use merge-base first (the
stable truth: 'everything on this PR since it left main'), fall back to
PREVIOUS_SHA only for the rare shallow-clone scenario where origin/main
isn't in Vercel's clone and merge-base returns empty.

The main-branch code path (line 6) is UNCHANGED: it correctly wants
PREVIOUS_SHA = the last deployed commit, not merge-base (which would be
HEAD itself on main and skip every push).

Tested locally:
- PR branch + web change + PREVIOUS_SHA=HEAD → exit 1 (build) ✓
- PR branch + scripts-only change + PREVIOUS_SHA=HEAD → exit 0 (skip) ✓
- main + PREVIOUS_SHA=valid previous deploy → exit 1 (build) ✓
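
The priority flip can be sketched as two pure functions. The real ignore script is not shown in this commit, so `pickDiffBase`, `exitCode`, and their parameters are hypothetical; only the ordering rule and the exit-code convention (1 = build, 0 = skip) come from the text above.

```javascript
// Hypothetical sketch of the base-selection rule; all names are illustrative.
function pickDiffBase({ isPreview, mergeBase, previousSha }) {
  if (isPreview) {
    // Previews: merge-base first, PREVIOUS_SHA only when merge-base is empty.
    return mergeBase || previousSha || null;
  }
  // Main branch: compare against the last deployed commit.
  return previousSha || null;
}

// Exit-code convention from the test matrix: 1 = build, 0 = skip.
function exitCode(changedPaths, watchedPrefixes) {
  const touched = changedPaths.some((p) => watchedPrefixes.some((w) => p.startsWith(w)));
  return touched ? 1 : 0;
}
```

For example, `exitCode(['scripts/seed.mjs'], ['src/', 'pro-test/', 'public/'])` returns 0, which corresponds to the scripts-only skip case in the list above.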

Related: PR #3346 needed an empty commit to retrigger the preview
deploy. After this fix, first-push PRs should deploy without the dance.

* chore: retrigger vercel deploy (previous attempt failed on git-provider transient)
2026-04-23 20:23:45 +04:00
Elie Habib
53c50f4ba9 fix(swf): move manifest next to its loader so Railway ships it (#3344)
PR #3336 fixed the yaml dep but the next Railway tick crashed with
`ENOENT: no such file or directory, open '/docs/methodology/swf-classification-manifest.yaml'`.

Root cause: the loader at scripts/shared/swf-manifest-loader.mjs
resolved `../../docs/methodology/swf-classification-manifest.yaml`,
which works in a full repo checkout but lands at `/docs/...` (outside
`/app`) in the Railway recovery-bundle container. That service has
rootDirectory=scripts/ in the dashboard, so NIXPACKS only copies
`scripts/` into the image — `docs/` is never shipped.

Fix: move the YAML to scripts/shared/swf-classification-manifest.yaml,
alongside its loader. MANIFEST_PATH becomes `./swf-classification-manifest.yaml`
so the file is always adjacent to the code that reads it, regardless
of rootDirectory.
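
The path arithmetic behind the crash can be reproduced with the standard `URL` constructor. This sketch assumes the loader sits at `/app/shared/swf-manifest-loader.mjs` inside the container (a guess consistent with rootDirectory=scripts/ and the `/docs/...` path in the error); the real loader's resolution code is not shown here.

```javascript
// Assumption: with rootDirectory=scripts/, the loader ends up here in the image.
const loaderUrl = 'file:///app/shared/swf-manifest-loader.mjs';

// Old relative path: climbs two levels, which escapes /app in the container.
const oldPath = new URL(
  '../../docs/methodology/swf-classification-manifest.yaml',
  loaderUrl,
).pathname;

// New relative path: a sibling of the loader, wherever the tree is rooted.
const newPath = new URL('./swf-classification-manifest.yaml', loaderUrl).pathname;

console.log(oldPath); // /docs/methodology/swf-classification-manifest.yaml
console.log(newPath); // /app/shared/swf-classification-manifest.yaml
```

In an ES module the base would typically be `import.meta.url` rather than a hard-coded string; the literal is used here only to make the container layout explicit.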

Tests: 53/53 SWF tests still pass; biome clean on changed files.
2026-04-23 19:47:10 +04:00
Elie Habib
0b7069f5dc chore(railway): force rebuild of seed bundles after infra-error build failure (#3342) 2026-04-23 18:14:51 +04:00
Elie Habib
df91e99142 feat(energy): expand 5 curated registries to 100% of plan target (#3337)
* feat(energy): expand gas pipeline registry 12 → 28 (phase 1a batch 1)

Data validation after v1 launch showed pipelines shipped at ~16% of the
plan target (12 gas + 12 oil vs. the plan's 75 + 75 critical
pipelines). This commit closes ~20% of the gas gap with 16 hand-curated
global additions, every entry carrying a full evidence bundle matching
the schema enforced by scripts/_pipeline-registry.mjs.

New additions by region:

North Sea / NW Europe (6):
  europipe-1, europipe-2, franpipe, zeepipe, interconnector-uk-be, bbl

Mediterranean / North Africa (3):
  transmed (Enrico Mattei), greenstream (LY→IT, reduced),
  meg-maghreb-europe (DZ→ES via MA, offline since Oct 2021)

Middle East (1):
  arab-gas-pipeline (EG→LB via JO/SY, offline under Caesar Act)

Former Soviet / Turkey (1):
  blue-stream (RU→TR, carries EU sanctions ref)

Asia (3):
  west-east-3 (CN internal, 7378 km), myanmar-china-gas (shwe),
  igb (interconnector-greece-bulgaria, 2022)

Africa / LatAm (2):
  wagp (west african gas pipeline, 4-country transit),
  gasbol (bolivia-brazil, 3150 km)

Badge distribution on new entries:
  flowing: 12, reduced: 2, offline: 2
  First non-Russia-exposure offline entries (MEG — Morocco-Algeria
  diplomatic closure, Arab Gas — Syria sanctions) — broadens the
  geographic distribution of evidence-bundle-backed non-positive badges.

Registry tests: 17/17 pass (identity, geometry bounds, ISO2 country
codes, evidence contract, capacity-commodity pairing, validateRegistry
negative cases).

Next batches in this phase: oil pipelines +16, then a second batch for
each commodity to reach plan target (75+75). Tracked in
docs/internal/energy-atlas-registry-expansion.md.

* feat(energy): expand oil pipeline registry 12 → 28 (phase 1a batch 2)

Mirror of the gas batch — 16 hand-curated global additions with full
evidence bundles. Closes ~20% of the oil gap.

New additions by region:

North America (6):
  enbridge-mainline (CA→US 3.15 mbd), enbridge-line-3-replacement (2021),
  flanagan-south, seaway (Cushing→Gulf), marketlink (TC, Cushing→Gulf),
  spearhead

Middle East (3):
  sumed (EG crude bypass of Suez, 2.8 mbd),
  east-west-saudi (Petroline, 5 mbd — largest single oil pipeline in
  the registry by capacity),
  ipsa-2 (IQ→SA, offline since Iraq invasion of Kuwait 1990, later
  converted to gas on the western stretch)

Central Asia (1):
  kazakhstan-china-crude (KZ→CN Alashankou, 2228 km)

Africa (1):
  chad-cameroon-cotco (TD→CM Kribi, 1070 km)

South America (2):
  ocp-ecuador (heavy crude, 450 kbd),
  sote-ecuador (lighter grades, 360 kbd)

Europe (3):
  tal-trieste-ingolstadt (IT→DE via AT, 770 kbd),
  janaf-adria (HR→RS→HU, 280 kbd),
  norpipe-oil (NO→DE North Sea crude, 900 kbd)

Badge distribution on new entries:
  flowing: 15, offline: 1 (IPSA-2, regulator-sourced + nationalisation
  statement backing the offline badge per the evidence-contract rules).

Registry totals after this batch:
  gas:  12 → 28  (37% of plan target 75)
  oil:  12 → 28  (37% of plan target 75)
  total: 24 → 56

Registry tests: 17/17 registry + 23/23 evidence-derivation = 40/40 pass.
Typecheck-free (JSON only).

Next batches (per docs/internal/energy-atlas-registry-expansion.md):
  gas batch 2: +22 → 50 (North Sea remainder, Caspian, Asia)
  oil batch 2: +22 → 50 (North Sea remainder, Russia diversified,
                         Asia long-haul)

* feat(energy): expand gas pipeline registry 28 → 50 (phase 1a batch 3)

Second gas batch, 22 additions, bringing gas to ~67% of the 75-pipeline
plan target. Geographic distribution deliberately skewed this batch
toward under-represented regions (Middle East, Central Asia, South
America, Africa, Southeast Asia) since the first batch filled Europe
and North America.

New additions (22):

North Sea / UK (2):
  vesterled (NO→GB, 13 bcm/yr),
  cats (UK, 9.6 bcm/yr)

Iran family (3):
  iran-turkey-gas (Tabriz→Ankara, 14 bcm/yr, OFAC sanctions ref),
  iran-armenia-gas (2.3 bcm/yr),
  iran-iraq-basra-gas (reduced state — waiver-dependent flows)

Central Asia (2):
  central-asia-center (TM→RU via UZ/KZ, 44 bcm/yr nominal, reduced),
  turkmenistan-iran-korpeje (expired contract, reduced)

Caucasus / Turkey (2):
  south-caucasus-scp (BTE predecessor to TANAP, 22 bcm/yr),
  sakarya-black-sea-tr (2023 Turkish offshore)

China (2):
  west-east-1 (4200 km, 17 bcm/yr),
  west-east-2 (8700 km, 30 bcm/yr)

South America (2):
  bolivia-argentina-yacuiba (reduced),
  antonio-ricaurte (CO→VE, offline since 2015, PDVSA sanctions)

Saudi / Middle East (2):
  saudi-master-gas-system (SA internal, 95 bcm/yr — largest capacity
  in the registry), egypt-jordan-aqaba (AGP south leg, flowing)

Israel-Egypt (1):
  israel-egypt-arish-ashkelon (reverse-flow since 2020, IL→EG export)

Planned / FID-stage (6):
  galsi-planned (DZ→IT, consortium paused),
  eastmed-planned (IL→CY→GR, US political support withdrawn Jan 2022),
  trans-saharan-planned (NG→DZ via NE, insurgency + financing unresolved),
  morocco-nigeria-offshore-planned (NG→MA 11-country offshore route),
  power-of-siberia-2-planned (RU→CN via MN, no binding CNPC contract),
  kirkuk-dohuk-turkey-gas-planned (IQ→TR, Baghdad-Erbil dispute)

Badge distribution on new batch:
  flowing: 10 (incl. Sakarya 2023 commissioned)
  reduced: 3 (CAC, BO-AR, IR-IQ)
  offline: 1 (Antonio Ricaurte, CO-VE, with operator statement + sanction)
  unknown: 6 (all planned/FID-stage, classifierConfidence 0.6-0.75)

All non-flowing badges have evidence (sanction refs, operator
statements, or press sourcing) per the evidence-contract validator.
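
A rough sketch of what such an evidence check could look like follows. The flat entry shape and the function name are assumptions; the real contract lives in scripts/_pipeline-registry.mjs and may nest or name fields differently. This sketch also assumes a press-sourced badge counts as backed only when citation text is present in `operatorStatement`.

```javascript
// Hypothetical flat entry shape; field names echo those used in the commit
// messages (physicalState, operatorStatement, sanctionRefs) but the real
// schema may differ.
function hasSupportingEvidence(entry) {
  if (entry.physicalState === 'flowing') return true;

  const hasSanction = Array.isArray(entry.sanctionRefs) && entry.sanctionRefs.length > 0;

  // A statement may be free text or a structured object.
  const s = entry.operatorStatement;
  const hasStatement =
    (typeof s === 'string' && s.trim() !== '') ||
    (s !== null && typeof s === 'object');

  return hasSanction || hasStatement;
}
```

Under this sketch, a `reduced` entry with `sanctionRefs: []` and `operatorStatement: null` fails the check, while the same entry with one sanction ref passes.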

Registry totals after this batch:
  gas:  28 → 50  (67% of plan target; gas ≥60 gate threshold not yet
                  hit but approaching)
  oil:  28 (unchanged — batch 4 will target oil to 50)
  total: 56 → 78

Registry tests: 17/17 pass. Includes 8 new fully-hedged "unknown" /
planned-status entries; validator accepts them.

Next: oil batch 2 (+22 → 50), then gas batch 3 (+10 → 60), oil batch 3
(+10 → 60). After that the gate criteria on pipelines hit and we can
focus on storage / shortages / disruptions.

* feat(energy): expand oil pipeline registry 28 → 50 (phase 1a batch 4)

Second oil batch, 22 additions, bringing oil to 67% of plan target and
matching gas (50 each, 100 total pipelines).

New additions (22):

Russia Baltic export (2):
  bps-1 (Primorsk, 1.3 mbd — largest single line in oil registry),
  bps-2 (Ust-Luga, 0.75 mbd). Both carry G7+EU price-cap sanctions ref.

North America diversified (3):
  enbridge-line-5 (CA→CA via US Straits of Mackinac, ongoing litigation),
  keystone-xl-cancelled (CA→US, permit revoked 2021, Biden; TC
  terminated Jun 2021; listed for historical + geopolitical
  completeness, physicalState=unknown by deriver rule),
  trans-panama-pipeline (PA, 0.9 mbd cross-isthmus)

Europe remaining (4):
  rotterdam-rhine-rrp (NL→DE, 275 km),
  spse (FR→DE Lyon→Karlsruhe, 769 km),
  forties-pipeline (UK North Sea, 0.6 mbd),
  brent-pipeline (NO→GB Sullom Voe, reduced — Brent field in
  decommissioning)

Middle East (2):
  khafji-neutral-zone (SA/KW, reduced post-2015 neutral-zone dispute),
  ab-1-bahrain (SA→BH, 2018, 0.35 mbd)

Africa (4):
  greater-nile-petroleum (SS→SD Port Sudan, 1610 km),
  djeno-congo (CG terminal system),
  nigeria-forcados-export (reduced — recurring force-majeure),
  nigeria-bonny-export (Trans Niger Pipeline, reduced)

Latin America (2):
  pemex-nuevo-cactus (MX, 0.44 mbd),
  trans-andino (AR→CL, offline since 2006 export restrictions)

Ukraine (1):
  odesa-brody (offline, under EU 2022/879 Russian-crude embargo
  framework)

Asia (1):
  myanmar-china-crude (MM→CN Kunming, 771 km parallel to
  myanmar-china-gas)

Caspian (1):
  baku-novorossiysk-northern (AZ→RU historical route, reduced, carries
  Russian crude price-cap ref)

Historical / planned (2):
  kirkuk-haifa-idle (IQ→IL via JO, closed 1948 — listed for
  completeness; periodically floated as reopening proposal),
  uganda-tanzania-eacop-planned (UG→TZ, under construction, Western
  bank-financing pulled but TotalEnergies continues)

Badge distribution on new batch:
  flowing: 11 (incl. Myanmar-China-crude)
  reduced: 6 (Brent decommissioning, Khafji dispute, Greater Nile,
              Forcados, Bonny, Baku-Novorossiysk)
  offline: 3 (Odesa-Brody, Trans-Andino, Kirkuk-Haifa)
  unknown: 2 (Keystone XL cancelled, EACOP under construction)
  Total: 22.

All non-flowing badges carry supporting evidence (operator statements,
sanction refs, or press citations) per the evidence-contract validator.

Registry totals after this batch:
  gas:  50 (67% of plan target)
  oil:  28 → 50 (67% of plan target)
  total: 78 → 100

Registry tests: 17/17 + 23/23 evidence-derivation = 40/40 pass.

Next batches to hit the 60-each gate criteria from
docs/internal/energy-atlas-registry-expansion.md:
  gas batch 3: +10 → 60 (EastMed details, Galsi alternative routes,
              minor EU-interconnectors, Nigeria LNG feeder gas lines)
  oil batch 3: +10 → 60 (Pluto crude, Chinese Huabei system, Latam
              infill: Brazil Campos, Peru Northern Trunk)

After 60/60: hit gate, move to storage expansion.

* feat(energy): gas registry 50 → 75 — plan target hit

Batch 3 adds 25 more gas pipelines, bringing gas to 100% of the
75-pipeline plan target.

New additions by region (25):
- Norwegian transport spine: statpipe, sleipner-karsto, troll-a,
  oseberg-gas-transport, asgard-transport (covers the major offshore
  export collectors — the rest of the Gassco system)
- Australia: dampier-bunbury (1594 km), moomba-sydney (1299 km)
- Africa: mozambique-rompco (MZ→ZA), escravos-lagos-gas (NG),
  tanzania-mtwara-dar, ghana-gas (atuabo)
- Southeast Asia: thailand-malaysia-cakerawala, indonesia-singapore
  west-natuna + grissik-sakra
- German hubs for Nord Stream continuation: nel-pipeline, opal-pipeline,
  eugal-pipeline (built but dormant after NS2 halt/destruction),
  megal-pipeline, gascade-jagal, zeelink-germany
- Russia/Ukraine/EU transit: progress-urengoy-uzhhorod (halted 1 Jan
  2025 when Ukraine did not renew transit agreement), trans-austria-gas
- Iran: kish-iran-gas, iran-pakistan-gas-planned (Pakistani segment
  stalled since 2014)
- China/HK: china-hong-kong-gas

Badge distribution on new batch: 15 flowing, 4 reduced (NEL, OPAL,
TAG, Escravos-Lagos), 2 offline (EUGAL dormant post-NS2,
Urengoy-Uzhhorod transit halt), 4 sanction-exposed (NS-continuation
pipelines + TAG + Urengoy), 1 unknown (Iran-Pakistan stalled
completion).

Plan progress: gas 50 → 75 (100% of plan target).
Registry tests: 17/17 pass.

* feat(energy): oil registry 50 → 75 — plan target hit

Batch 4 adds 25 more oil pipelines, bringing oil to 100% of the
75-pipeline plan target. Combined with gas at 75, total registry is
150 pipelines — full plan coverage for Phase 1a.

New additions by region (25):
- Latin America: colombia-cano-limon-covenas (ELN-sabotaged, reduced),
  colombia-ocensa (main trunk), peru-norperuano (reduced from jungle
  spills + protests), ecuador-lago-agrio-orellana,
  venezuela-anzoategui-puerto-la-cruz (under OFAC PDVSA sanctions),
  mexico-salina-cruz-minatitlan, mexico-madero-cadereyta,
  mexico-gulf-coast-pipeline (Tuxpan-Mexico City)
- Africa: angola-cabinda-offshore, south-sudan-kenya-lamu-planned
  (LAPSSET)
- Middle East: iran-abadan-isfahan, iran-neka-tehran (reduced,
  Caspian swap arrangements), saudi-abqaiq-yanbu-products,
  iraq-strategic-pipeline (1000 km north-south), iraq-bai-hassan,
  oman-muscat-export (Fahud-Mina al-Fahal), uae-habshan-ruwais
- Asia-Pacific: india-salaya-mathura (1770 km, largest Indian crude
  trunk), india-vadinar-kandla, india-mundra-bhatinda,
  china-qinhuangdao-tianjin-huabei, china-yangzi-hefei-hangzhou
- Russia East: russia-sakhalin-2-crude, russia-komsomolsk-perevoznaya,
  russia-omsk-pavlodar (cross-border to KZ)

Badge distribution on this batch: 18 flowing, 6 reduced, 1 unknown
(LAPSSET planned). Sanctions-exposure diversified: Iran framework (3),
Venezuela/PDVSA (1), Russian price-cap (3). All non-flowing badges
carry supporting evidence per validator rules.

Phase 1a final state (pipelines):
  gas: 12 → 75 (100% of plan target, 6 batches)
  oil: 12 → 75 (100% of plan target, 6 batches)
  total: 24 → 150

Geographic distribution now global:
  - Russia-exposure: ~22 of 150 entries (~15%, down from 50% at v1)
  - US-only: ~8 (~5%, down from 33% storage-side skew)
  - Six continents represented in active infrastructure
  - Historical + planned pipelines (Kirkuk-Haifa, Keystone XL cancelled,
    EACOP u/c, EastMed planned, GALSI planned, TSGP planned,
    Nigeria-Morocco offshore, Power of Siberia 2, Iran-Pakistan Peace,
    LAPSSET) listed with honest 'unknown' physicalState per validator

Registry tests: 17/17 pass.

Phase 1a complete. Next phase (per
docs/internal/energy-atlas-registry-expansion.md):
  - Phase 2: storage 21 → ~200 (+179) via curation + GIIGNL/GIE/EIA
  - Phase 3: shortages 14 → 28 countries
  - Phase 4: disruptions 12 → 50 events

* feat(energy): shortages 15 → 29 entries across 28 countries — plan target hit

+14 country additions matching the 28-country plan target. The
validator's 'confirmed severity requires authoritative source' rule
caught two of my drafts (Myanmar + Sudan) where I had labeled them
confirmed with press-only evidence because regulator/operator sources
under a junta + active civil war are not independently verifiable.
Downgraded both to 'watch' with an inline note explaining the
evidence-quality choice — exactly the validator's intended behavior
(better to under-claim than over-claim severity when the authoritative
channel is broken).
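
The rule that caught the two drafts can be sketched as follows. Field names (`severity`, `evidence[].sourceType`) and the set of authoritative source types are assumptions for illustration; only the rule text itself is taken from the commit.

```javascript
// Hypothetical field names; only the rule "confirmed severity requires
// authoritative source" comes from the commit message.
const AUTHORITATIVE = new Set(['regulator', 'operator']);

function checkSeverity(entry) {
  const authoritative = (entry.evidence || []).some((e) => AUTHORITATIVE.has(e.sourceType));
  if (entry.severity === 'confirmed' && !authoritative) {
    return { ok: false, reason: 'confirmed severity requires authoritative source' };
  }
  return { ok: true };
}
```

A draft such as `{ severity: 'confirmed', evidence: [{ sourceType: 'press' }] }` is rejected, while the same evidence with `severity: 'watch'` passes, which mirrors the downgrade described above.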

New shortages (14):
- BD diesel: BPC LC delays, regulator-confirmed
- ZA diesel: loadshedding demand spike
- AO diesel: Luanda/Benguela depot delays
- MZ diesel: FX-allocation import constraints
- ZM diesel: mining-sector demand + TAZAMA product tightness
- MW diesel: FX shortfalls + MERA rationing
- GH petrol: Tema port congestion
- MM diesel: post-coup chronic (watch, press-only evidence)
- MN diesel: winter logistics
- CO diesel: trucker strike cycles
- UA diesel: war-driven chronic (confirmed — Ministry of Energy source)
- SY diesel: Caesar Act chronic (confirmed — Syrian Ministry statement)
- SD diesel: civil-war disruption (watch, press-only)
- DE heating_oil: Rhine low-water logistics (watch)

Badge distribution on new batch: 3 confirmed (BD, UA, SY — all with
regulator/operator evidence), 11 watch.

Plan progress:
  shortages: 15 → 29 entries (28 unique countries = 100% of plan)
  gas: 75 (100%)
  oil: 75 (100%)
  storage: 21 (unchanged, next batch)
  disruptions: 12 (unchanged, next batch)

Registry tests: 19/19 pass.

* feat(energy): disruption event log 12 → 52 events — plan target hit

+40 historical and ongoing events covering the asset registry,
bringing disruptions to 104% of the 50-event plan target. Every event
ties to an assetId now in pipelines/storage registries (following the
75-gas + 75-oil + 21-storage registry expansion in the preceding
commits).

New additions by eventType:

Sabotage / war (7):
- abqaiq-khurais-drone-strike-2019 (Saudi, 5.7 mbd removed 11 days)
- russia-refinery-drone-strikes-2024 (Ukrainian drone strike series)
- houthi-red-sea-attacks-2024 (indirect SuMed demand impact)
- russia-ukraine-oil-depot-strikes-2022 (series)
- nigeria-trans-niger-attacks-2024 (Bonny system)
- bai-hassan-attack-2022 (Iraq Bai Hassan)
- sudan-pipeline-attacks-2023 (Greater Nile disruption)

Sanctions (8):
- russia-price-cap-implementation-2022 (G7+EU $60/bbl cap)
- eu-oil-embargo-2022 (6th package)
- pdvsa-designation-2019 (Venezuela)
- btc-kurdistan-shutdown-2023 (ICC ruling, ongoing)
- ipsa-nationalization-2001 (SA nationalised after Iraq invasion of Kuwait)
- arctic-lng-2-foreign-partner-withdrawal-2024
- yamal-lng-arctic-sanctions-ongoing (Novatek)
- ogm-moldova-transit-2022

Mechanical (4):
- druzhba-contamination-2019 (chlorides, 3-month shut)
- keystone-milepost-14-leak-2022 (Kansas, 22-day shut, 14k bbl spill)
- forties-crack-2017 (Red Moss hairline)
- ocensa-ocp-ecuador-suspensions-2022 (Amazon landslide)

Weather (2):
- hurricane-ida-lng-2021 (Gulf coast LNG shutdown)
- rotterdam-hub-low-water-2022 (Rhine 2.5-month disruption)

Commercial (18):
- cpc-blockage-threat-2022 (Russian court 30-day halt threat)
- gme-closure-2021 (Algeria-Morocco MEG)
- ukraine-transit-end-2025 (Progress pipeline halted 1 Jan 2025)
- eugal-dormant-since-2022 (NS2 knock-on)
- keystone-xl-permit-revoked-2021 (Biden day-1)
- antonio-ricaurte-halt-2015 (CO→VE gas export halt)
- langeled-brent-decommissioning-2020
- eacop-financing-2023 (Western bank withdrawal)
- dolphin-qatar-uae-commercial-2024 (contract renegotiation)
- trans-austria-gas-reduction-2022 (Gazprom volume drops)
- cushing-stocks-tank-bottoms-2022
- spr-drawdown-2022-2023 (largest ever 180 mbbl release)
- zhoushan-storage-expansion-2023
- fujairah-stockbuild-2024
- futtsu-lng-demand-decline-2024
- bolivia-diesel-import-cut-2023 (GASBOL)
- myanmar-china-gas-reduced-2023
- yamal-europe-poland-halt-follow-on-2024

Maintenance (1): gladstone-lng-maintenance-2023

Ongoing events (endAt=null): 31 of 52 (~60%). Reflects the structural
reality that many 2022-era sanctions + war events remain live in 2026.

Plan progress:
  gas: 75 (100%)
  oil: 75 (100%)
  storage: 21 (unchanged, next batch)
  shortages: 29 (100% — 28 countries)
  disruptions: 12 → 52 events (104% of plan)

Registry tests: 16/16 pass.

* feat(energy): storage registry 21 → 66 (storage batch 1)

+45 facilities, 33% of plan. Focus: European UGS + second LNG wave.

European UGS additions (35 — mostly filling the gap against GIE AGSI+
coverage which has ~140 EU sites; we now register the majority of
operationally significant ones with non-trivial working capacity):

Germany (9): bierwang, etzel-salt-cavern, jemgum, krummhoern,
  peckensen, reckrod, uelsen, xanten, epe-salt-cavern
Netherlands (3): alkmaar, norg (largest NL, 59.2 TWh), zuidwending
Austria (3): 7fields-schonkirchen (24.6 TWh), baumgarten-uhs,
  puchkirchen
France (6): chemery (38.5 TWh), cerville-velaine, etrez, manosque,
  lussagnet (35 TWh), izaute
Italy (4): minerbio (45 TWh, largest IT), ripalta, sergnano,
  brugherio
UK (2): rough (reduced; closed 2017, partially reopened 2022), hornsea
Central/Eastern Europe (8): damborice (CZ), lobodice (CZ),
  lab-slovakia (36 TWh), hajduszoboszlo (HU), mogilno (PL),
  lille-torup (DK), incukalns (LV), gaviota (ES)

Russia (1): kasimovskoe (124 TWh — Gazprom UGS flagship; EU sanctions
  ref carried as evidence)

LNG terminals (9 additions to round out global coverage):
- US: freeport-lng, cameron-lng, cove-point-lng, elba-island-lng
- Middle East: qalhat-lng (Oman), adgas-das-island (UAE)
- Russia: sakhalin-2-lng (sanctions-exposed)
- Indonesia: tangguh-lng, bontang-lng (reduced — declining upstream)

Badge distribution on this batch: 43 operational, 2 reduced (Rough,
Bontang). Most entries from GIE AGSI+ fill-disclosed data; Russian
site + LNG terminals fill-not-disclosed (operator choice + sanctions).

Plan progress:
  gas pipelines: 75 (100%)
  oil pipelines: 75 (100%)
  fuel shortages: 29 / 28 countries (100%)
  disruptions: 52 (104%)
  storage: 21 → 66 (33% of ~200 target)

Registry tests: 21/21 pass.

Next storage batches remaining:
  batch 2 (+45): more European UGS tail + Asian national reserves
                 (CN SPR, IN SPR, JP national reserves, KR KNOC)
  batch 3 (+45): LNG import terminals + additional US tank farms +
                 European tank farms (Rotterdam detail, ARA sub-sites)
  batch 4 (+45): remainder to ~200

* feat(energy): storage registry 66 → 110 (storage batch 2)

+44 facilities. Focus: Asian national reserves + global LNG coverage
+ Singapore/ARA tank-farm detail.

Asian national reserves (11):
- IN ISPRL: vizag (9.8 Mb), mangalore (11 Mb), padur (17.4 Mb)
- CN: zhanjiang (45 Mb), huangdao (20 Mb) — fill opaque, press-only
- JP JOGMEC: shibushi (31.2 Mb), kiire (22 Mb), mutsu-ogawara (28 Mb)
- KR KNOC: yeosu (42 Mb), ulsan (33 Mb), geoje (47 Mb)

LNG export additions (12):
- Australia: pluto-lng, prelude-flng (reduced), darwin-lng (reduced
  upstream)
- Southeast Asia: mlng-bintulu (29.3 Mtpa — largest in registry),
  brunei-lng, donggi-senoro-lng
- Africa: angola-lng (reduced), equatorial-guinea-lng, hilli-episeyo-flng
- Pacific: png-lng
- Caribbean: trinidad-atlantic-lng (reduced)
- Mexico: costa-azul-lng (2025 reverse-to-export commissioning)

LNG import (11):
- UK: south-hook-lng (21 Mtpa), dragon-lng
- EU: zeebrugge-lng, dunkerque-lng, fos-cavaou-lng,
  montoir-de-bretagne-lng, gate-terminal (Rotterdam),
  revithoussa-lng
- Turkey: aliaga-ege-gaz-lng
- Chile: mejillones-lng, quintero-bay-lng

Tank farms (10):
- Africa: saldanha-bay (ZA 45 Mb)
- Norway: mongstad-crude
- ARA: antwerp-petroleum-hub (BE 55 Mb), amsterdam-petroleum-hub
- Asia hubs: singapore-jurong (120 Mb — largest in registry),
  singapore-pulau-ayer-chawan, thailand-sriracha, korea-gwangyang-crude
- Russia Baltic: ust-luga-crude-terminal, primorsk-crude-terminal
  (both carry Russian price-cap sanction refs)

Badge distribution on this batch: 39 operational, 5 reduced (including
Prelude, Darwin, Angola, Trinidad; Bontang was already registered in
storage batch 1 and is not counted here).

Plan progress:
  gas pipelines: 75 (100%)
  oil pipelines: 75 (100%)
  fuel shortages: 29 / 28 countries (100%)
  disruptions: 52 (104%)
  storage: 66 → 110 (55% of ~200 target)

Registry tests: 21/21 pass.

Next batches remaining: ~90 more storage to hit ~200
  batch 3 (+45): Middle East tank farms, Chinese coastal commercial
                 storage, EU UGS tail, African LNG import
  batch 4 (+45): remainder to 200

* feat(energy): storage registry 110 → 155 (storage batch 3)

Adds 45 facilities toward 200 plan target:
- 7 Middle East export terminals (Kharg, Sidi Kerir, Mina al-Ahmadi,
  Mesaieed, Jebel Dhanna, Mina al-Fahal, Bandar Imam Khomeini)
- 10 EU UGS tail (Reitbrook, Empelde, Kirchheilingen, Stockstadt,
  Nüttermoor, Grijpskerk, Târgu Mureș, Třanovice, Uhřice, Háje)
- 4 Chinese coastal crude (Yangshan, Qingdao, Rizhao, Maoming)
- 6 EU LNG import tail (La Spezia, Adriatic, OLT Livorno, Klaipeda,
  Mugardos, Cartagena)
- 5 Indian LNG import (Hazira, Kochi reduced, Ennore, Mundra, Dabhol)
- 6 Japan/Korea LNG import (Chita, Negishi, Sodegaura, Himeji,
  Pyeongtaek, Incheon)
- 5 NA tank farms (Lake Charles, Corpus Christi, Patoka, Edmonton,
  Hardisty)
- 2 Asia-Pacific (Kaohsiung, Nghi Son)

Registry validator: 21/21 tests pass.

* feat(energy): storage registry 155 → 200 (storage batch 4 — plan target hit)

Final batch brings storage to the 200-facility plan target with broad
geographic + facility-type coverage.

New entries (45):
- 6 LNG export: NLNG Bonny (NG, reduced), Arzew (DZ), Skikda (DZ),
  Perú LNG, Calcasieu Pass (US), North West Shelf Karratha (AU)
- 7 LNG import: Świnoujście (PL), Krk FSRU (HR), Wilhelmshaven FSRU (DE),
  Brunsbüttel (DE), Map Ta Phut (TH), Port Qasim (PK), Batangas (PH)
- 6 UGS: Bilche-Volytsko-Uherske (UA, 154 TWh — largest Europe), Banatski
  Dvor (RS), Okoli (HR), Yela (ES), Loenhout (BE), Kushchevskoe (RU)
- 26 crude tank farms: José Terminal (VE, sanctioned), Santos (BR),
  TEBAR São Sebastião (BR), Dos Bocas (MX), Bonny (NG, reduced), Es
  Sider (LY, reduced), Ras Lanuf (LY, reduced), Ceyhan (TR), Puerto
  Rosales (AR), Novorossiysk Sheskharis (RU, sanctioned), Kozmino (RU,
  sanctioned), Tema (GH, reduced), Mombasa (KE), Abidjan SIR (CI),
  Juaymah (SA), Ras Tanura (SA), Yanbu (SA), Kirkuk (IQ, reduced),
  Basra Gulf (IQ), Djibouti Horizon (DJ), Yokkaichi (JP), Mailiao
  (TW), Ventspils (LV, reduced), Gdańsk Naftoport (PL), Constanța
  (RO), Wood River IL (US).

Geographic balance improved: Africa coverage (NG, DZ, LY, GH, KE, CI,
DJ) from 5 to 12 countries; first Iraq + Saudi entries; Balkans +
Ukraine + Romania now covered. Type mix: UGS 56, SPR 15, LNG export 33,
LNG import 38, crude tank farm 58.

Non-operational entries all carry authoritative evidence (press or
operator statements + sanctionRefs for Russia/Venezuela).

Registry validator: 21/21 tests pass. Total: 200 facilities across 55
countries. Plan target hit.

* fix(energy): address Greptile review findings on registries

P1 — abqaiq-khurais-drone-strike-2019 (energy-disruptions.json):
capacityOfflineMbd was 5.7 (plant-level Saudi production loss headline)
against assetId east-west-saudi (5.0 mbd pipeline). Capped offline
figure at the linked pipeline's 5.0 mbd ceiling; moved the 5.7 mbd
historical headline into shortDescription with an explanatory note.
Preserves capacity-offline ≤ asset-capacity invariant for downstream
consumers.

P1 — russia-price-cap-implementation-2022 (energy-disruptions.json):
was linked to assetId espo (land pipeline to China — explicitly out of
scope for G7/EU price cap). Relinked to primorsk-crude-terminal
(largest Baltic seaborne crude export terminal, directly affected);
assetType pipeline → storage. Updated shortDescription to clarify
tanker-shipment scope + out-of-scope note for ESPO.

P2 — 13 reduced-state pipelines missing press citation text
(pipelines-gas.json × 8 + pipelines-oil.json × 5):
Added operatorStatement sentences naming the press/regulator sources
backing each reduction claim (Reuters, NNPC/Chevron releases, NIGC,
Pemex annual reports, S&P Platts, IEA Gas Market Report, BBC, etc.).
Clears the evidence-source-type gap flagged by Greptile for entries
that declared physicalStateSource: "press" with a null statement.

All 6583 data tests + 94 registry tests still pass.
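
The capacity-offline ≤ asset-capacity invariant from the first P1 finding could be checked with something like the sketch below. `capacityOfflineMbd` and `assetId` appear in the commit message; `capacityMbd`, `id`, and the function name are assumptions about the registry shape.

```javascript
// Hypothetical shapes: events carry { id, assetId, capacityOfflineMbd } and
// assets carry { id, capacityMbd }. Real field names may differ.
function findCapacityViolations(events, assets) {
  const capacityById = new Map(assets.map((a) => [a.id, a.capacityMbd]));
  return events
    .filter((ev) => {
      const cap = capacityById.get(ev.assetId);
      return typeof cap === 'number'
        && typeof ev.capacityOfflineMbd === 'number'
        && ev.capacityOfflineMbd > cap;
    })
    .map((ev) => ev.id);
}
```

With the pre-fix values (5.7 mbd offline against a 5.0 mbd pipeline) the function returns the event id; after capping at 5.0 it returns an empty list.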

* style(energy): restore compact registry formatting (preserve Greptile-fix evidence)

Prior commit 44b2c6859 accidentally reformatted pipelines-gas.json
and pipelines-oil.json from their compact mixed format to fully-
expanded JSON via json.dump(indent=2), producing 2479 lines of noise
for 13 one-line semantic changes.

This commit restores the original compact formatting while preserving
the 12 operatorStatement text additions from the Greptile P2 fix
(peru-norperuano was already fine — it carries a structured
operatorStatement object; the other 12 entries correctly gained press
citation text).

No data change vs 44b2c6859 — only whitespace reverts to original
layout. Pipeline registry tests (40/40) + full test:data (6583/6583)
still pass.
2026-04-23 12:32:29 +04:00
Elie Habib
9f208848b6 fix(deps): add yaml to scripts/package.json (Railway installs from THIS) (#3336)
PR #3333 added `yaml` to the root package.json, but the Railway
seed-bundle-resilience-recovery service builds with rootDirectory
pointing at scripts/ and NIXPACKS auto-detects scripts/package.json
as the install manifest. Root package.json is never visited during
the container build, so yaml stayed missing and the seeder crashed
again at 07:39:26 UTC with the identical ERR_MODULE_NOT_FOUND.

Adding yaml ^2.8.3 (matching the root promotion) to scripts/package.json
so NIXPACKS' `npm install --prefix scripts` lands it in
/app/node_modules/yaml. scripts/shared/swf-manifest-loader.mjs can
then resolve the bare specifier.

Keeping yaml in the root package.json too — it's harmless noise for
local dev + validation bundle (which also imports it via tsx), and
defensive for any future consumer that runs against the root deps.

Future question worth a separate PR: do we want Railway services
pointing at `scripts/` as rootDir, or should we move to a proper
per-service Dockerfile that makes the dep source explicit? The
current state is easy to miss because the Railway dashboard config
is invisible from the repo — second seeder to trip this exact hazard.
2026-04-23 11:53:37 +04:00
Elie Habib
8ea4c8f163 feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)
* feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change)

Ship the measurement layer before picking any recall-lift strategy.

Why: the current dedup path embeds titles only, so brief-wire headlines
that share a real event but drop the geographic anchor (e.g. "Alleged
Coup: defendant arrives in court" vs "Trial opens after Nigeria charges
six over 2025 coup plot") can slip past the 0.60 cosine threshold. To
tune recall without regressing precision we need a replayable per-tick
dataset — one record per story with the exact fields any downstream
candidate (title+slug, LLM-canonicalise, text-embedding-3-large, cross-
encoder re-rank, etc.) would need to score.
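
For reference, the threshold test this paragraph refers to can be sketched as below. The 0.60 value is from the commit message; whether the real comparison is `>=` or `>`, and the function names, are assumptions.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // zero vector: treat as dissimilar
  return dot / Math.sqrt(na * nb);
}

const COSINE_THRESHOLD = 0.60; // value quoted in the commit message

// Assumption: stories merge when similarity is at or above the threshold.
function wouldMerge(a, b, threshold = COSINE_THRESHOLD) {
  return cosine(a, b) >= threshold;
}
```

Two headlines about the same event whose title embeddings score below the threshold are kept apart, which is the recall gap the replay log is meant to measure.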

This PR ships ONLY the log. Zero behaviour change:
  - Opt-in via DIGEST_DEDUP_REPLAY_LOG=1 (default OFF).
  - Writer is best-effort: all errors swallowed + warned, never affects
    digest delivery. No throw path.
  - Records include hash, originalIndex, isRep, clusterId, raw +
    normalised title, link, severity/score/mentions/phase/sources,
    embeddingCacheKey, hasEmbedding sidecar flag, and the tick's config
    snapshot (mode, clustering, cosineThreshold, topicThreshold, veto).
  - clusterId derives from rep.mergedHashes (already set by
    materializeCluster) so the orchestrator is untouched.
  - Storage: Upstash list keyed by {variant}:{lang}:{sensitivity}:{date}
    with 30-day EXPIRE. Date suffix caps per-key growth; retention
    covers the labelling cadence + cross-candidate comparison window.
  - Env flag is '1'-only (fail-closed on typos, same pattern as
    DIGEST_DEDUP_MODE).
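
A minimal sketch of the '1'-only flag rule described above. The function
body and its signature are assumptions for illustration; the real parser
is not shown in this log.

```javascript
// Sketch only: the real flag parser is not shown in this log.
function replayLogEnabled(env) {
  // Strict comparison: only the exact string '1' turns the writer on.
  return env.DIGEST_DEDUP_REPLAY_LOG === '1';
}

// Near-misses such as 'true' or ' 1' leave the log disabled (fail-closed).
for (const value of ['1', 'true', 'yes', ' 1', '']) {
  console.log(JSON.stringify(value), '->', replayLogEnabled({ DIGEST_DEDUP_REPLAY_LOG: value }));
}
```

Strict equality means a typo can only ever disable the log, never enable it
by accident.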

Activation path (post-merge): flip DIGEST_DEDUP_REPLAY_LOG=1 on the
seed-digest-notifications Railway service. Watch one cron tick for the
RPUSH + EXPIRE pair (or a single warn line if creds/upstream flake),
then leave running for at least one week to accumulate calibration data.

Tests: 21 unit tests covering flag parsing, key shape + sanitisation,
record field correctness (isRep, clusterId, embeddingCacheKey,
hasEmbedding, tickConfig), pipeline null/throw handling, malformed
input. Existing 77 dedup tests unchanged and still green.

* fix(digest-dedup): capture topicGroupingEnabled in replay tickConfig

Review catch (PR #3330): the tickConfig snapshot omitted
topicGroupingEnabled even though readOrchestratorConfig returns it and
the digest's post-dedup topic ordering gates on it. A tick run with
DIGEST_DEDUP_TOPIC_GROUPING=0 serialised identically to a default
tick, making those runs non-replayable for the calibration work this
log is meant to enable.

Add topicGroupingEnabled to the recorded tickConfig. One-line schema
fix + regression test asserting topic-grouping-off ticks serialise
distinctly from default.

22/22 tests pass.

* fix(digest-dedup): await replay-log write to survive explicit process.exit

Review catch (PR #3330): the fire-and-forget `void writeReplayLog(...)`
call could be dropped on the explicit-exit paths — the brief-compose
failure gate at line 1539 and main().catch at line 1545 both call
process.exit(1). Unlike natural exit, process.exit does not drain
in-flight promises, so the last N ticks' replay records could be
silently lost on runs where measurement fidelity matters most.

Fix: await the writeReplayLog call. Safe because:
  - writeReplayLog returns synchronously when the flag is off
    (replayLogEnabled check is the first thing it does)
  - It has a top-level try/catch that always returns a result object
  - The Upstash pipeline call has a 10s timeout ceiling
  - buildDigest already awaits many Upstash calls (dedup, compose,
    render) so one more is not a hot-path concern

Comment block added above the call explains why the await is
deliberate — so a future refactor doesn't revert it to void thinking
it's a leftover.
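
The difference between `void` and `await` can be sketched as below. The
writer body, the `sink` array, and `recordsPersistedAtExitPoint` are
hypothetical stand-ins; the sketch measures what has landed at the point
where an explicit process.exit(1) could fire, rather than actually exiting.

```javascript
// Stand-in writer (hypothetical body): resolves to a result object and never throws.
async function writeReplayLog(records, sink) {
  try {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated network round trip
    sink.push(...records);
    return { ok: true, written: records.length };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

// Returns how many records have landed at the point where an explicit
// process.exit(1) could fire.
async function recordsPersistedAtExitPoint(awaitWrite) {
  const sink = [];
  const pending = writeReplayLog([{ hash: 'a' }, { hash: 'b' }], sink);
  if (awaitWrite) await pending; // deliberate await: finish before any exit path
  return sink.length;
}

recordsPersistedAtExitPoint(false).then((n) => console.log('fire-and-forget:', n)); // 0
recordsPersistedAtExitPoint(true).then((n) => console.log('awaited:', n)); // 2
```

In the fire-and-forget case the write is still in flight when the exit point
is reached, which is the window in which records would be lost.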

No test change: existing writeReplayLog unit tests already cover the
disabled / empty / success / error paths. The fix is a single-keyword
change in a caller that was already guaranteed-safe by the callee's
contract.

* refactor(digest-dedup): address Greptile P2 review comments on replay log

Three non-blocking polish items from the automated review, bundled
because they all touch the same new module and none change behaviour.

1. tsMs captured BEFORE deduplicateStories (seed-digest-notifications.mjs).
   Previously sampled after dedup returned, so briefTickId reflected
   dedup-completion time rather than tick-start. For downstream readers
   the natural reading of "briefTickId" is when the tick began
   processing; moved the Date.now() call to match that expectation.
   Drift is maybe 100ms-2s on cold-cache embed calls — small, but
   moving it is free.

2. buildReplayLogKey emptiness check now strips ':' and '-' in addition
   to '_'. A pathological ruleId of ':::' previously passed through
   verbatim, producing keys like `digest:replay-log:v1::::2026-04-23`
   that confuse redis-cli's namespace tooling (SCAN / KEYS / tab
   completion). The new guard falls back to "unknown" on any input
   that's all separators. Added a regression test covering the
   ':::' / '---' / '___' / mixed cases.

3. tickConfig is now a per-record shallow copy instead of a shared
   reference. Storage is unaffected (writeReplayLog serialises each
   record via JSON.stringify independently) but an in-memory consumer
   that mutated one record's tickConfig for experimentation would have
   silently affected all other records in the same batch. Added a
   regression test asserting mutation doesn't leak across records.
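
The separator-only guard from item 2 can be sketched as follows.
`safeSegment` is a hypothetical helper name; the real buildReplayLogKey is
not reproduced here and may sanitise further.

```javascript
// Hypothetical sanitiser for one key segment; not the real buildReplayLogKey.
function safeSegment(raw) {
  const text = String(raw ?? '');
  // Strip ':', '-' and '_' and see whether anything meaningful is left.
  const meaningful = text.replace(/[:\-_]/g, '');
  return meaningful.length === 0 ? 'unknown' : text;
}

for (const input of [':::', '---', '___', ':-_', 'full:en:high']) {
  console.log(JSON.stringify(input), '->', safeSegment(input));
}
```

Only inputs made up entirely of separators fall back to "unknown"; any
segment with real content passes through unchanged in this sketch.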

Tests: 24/24 pass (22 prior + 2 new regression). Typecheck + lint clean.
2026-04-23 11:50:19 +04:00
Elie Habib
8b12ecdf43 fix(aviation): seeder writes delays-bootstrap aggregate (close EMPTY-on-quiet-traffic alarm) (#3334)
* fix(aviation): seeder writes delays-bootstrap aggregate (close EMPTY-on-quiet-traffic alarm)

api/health.js BOOTSTRAP_KEYS.flightDelays points at aviation:delays-bootstrap:v1,
but no seeder ever produced it — the key was only written as a 1800s side-effect
inside list-airport-delays.ts. Quiet user-traffic windows >30 min let the
bootstrap expire, tripping EMPTY (CRIT) even with healthy upstream FAA + intl
+ NOTAM seeds. PR #3073 (Apr 13) doubled the cron cadence to 30 min, putting
the bootstrap TTL right at the failure edge.

Make seed-aviation.mjs the canonical writer:
  - New writeDelaysBootstrap() reads FAA + intl + NOTAM from Redis, applies
    the same NOTAM merge + Normal-operations filler the RPC builds, writes
    aviation:delays-bootstrap:v1 with TTL=7200 (~4 missed cron ticks of cushion).
  - Called pre-runSeed (last-good intl, covers intl-fail tick) AND inside
    afterPublishIntl (this-tick intl, happy-path overwrite).
  - Bump RPC's incidental write TTL 1800 → 7200 so a user-triggered RPC
    doesn't shorten the seeder's expiry and re-create the failure mode.

NOTAM merge logic + filler shape are now mirrored in two files (seeder + RPC's
_shared.ts). Both carry comments pointing at the other to surface drift risk.

Verified: typecheck (both tsconfigs) clean; node --test tests/aviation-*.test.mjs
green; full test:data 6590/6590 green.

* fix(aviation): seeder writes restrictedIcaos + bootstrap unwraps intl envelope

PR #3334 review (P1 + P2):

P1 — bootstrap silently dropped NOTAM restrictions
  seedNotamClosures() only tracked NOTAM_CLOSURE_QCODES; the live RPC's
  classifier in server/worldmonitor/aviation/v1/_shared.ts also derives
  restrictions via NOTAM_RESTRICTION_QCODES (RA, RO) + restriction code45s
  + restriction-text regex. Seeded NOTAM payload only had `closedIcaos`,
  so restrictedIcaos was always empty in Redis — both the new bootstrap
  aggregate AND the RPC's seed-read path silently dropped every NOTAM
  restriction. Mirror the full classifier from _shared.ts:438-452;
  side-car write now includes restrictedIcaos and seed-meta count
  reflects closures + restrictions.

P2 — pre-runSeed bootstrap built with no intl alerts on intl-fail tick
  runSeed wraps the canonical INTL_KEY in {_seed, data} when declareRecords
  is enabled. writeDelaysBootstrap()'s upstashGet only JSON.parsed — no
  envelope unwrap — so intlPayload.alerts was undefined on the pre-runSeed
  bootstrap-build path, and an intl-fail tick would publish a bootstrap
  with all intl alerts dropped instead of preserving the last-good
  snapshot. Add upstashGetUnwrapped() (delegates to unwrapEnvelope from
  _seed-envelope-source.mjs); use it for all three reads (FAA/NOTAM bare
  values pass through unchanged via unwrapEnvelope's permissive path).
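
A sketch of the permissive unwrap, assuming the `{_seed, data}` envelope
shape named above. `unwrapEnvelopeSketch` and `parseAndUnwrap` are
illustrative names, not the real unwrapEnvelope from
_seed-envelope-source.mjs or the real upstashGetUnwrapped.

```javascript
// Assumed envelope shape: { _seed: {...}, data: ... }.
function unwrapEnvelopeSketch(value) {
  const isEnvelope =
    value !== null &&
    typeof value === 'object' &&
    !Array.isArray(value) &&
    '_seed' in value &&
    'data' in value;
  return isEnvelope ? value.data : value; // bare values pass through unchanged
}

// Mirrors the "JSON.parse, then unwrap" read path described above.
function parseAndUnwrap(rawJson) {
  if (rawJson == null) return null;
  return unwrapEnvelopeSketch(JSON.parse(rawJson));
}

const wrapped = '{"_seed":{"recordCount":2},"data":{"alerts":["a","b"]}}';
const bare = '{"alerts":["c"]}';
console.log(parseAndUnwrap(wrapped).alerts.length, parseAndUnwrap(bare).alerts.length);
```

With the unwrap in place, `alerts` is defined for both the enveloped intl
payload and the bare FAA/NOTAM payloads.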

Verified: typecheck (both tsconfigs) clean; aviation + edge-functions
tests green; full test:data 6590/6590 green.

* fix(aviation): bootstrap iterates union of seeder + RPC airport registries

PR #3334 review (P2 ×2):

P2 — AIRPORTS vs MONITORED_AIRPORTS registry drift
  Today the two diverge by ~45 iata codes (29 RPC-only, 16 seeder-only).
  Pre-fix the bootstrap iterated the seeder's local AIRPORTS list for
  Normal-operations filler and NOTAM airport lookup, so 29 monitored
  airports never appeared in the bootstrap aggregate even though the
  live RPC included them. Fix: parse src/config/airports.ts as text at
  startup (regex over the static const), memoise the parse, build a
  by-iata Map union (seeder wins on conflict for canonical meta), and
  iterate that for both NOTAM lookup and filler. First-run divergence
  summary logged to surface future drift in cron logs without blocking
  writes. Degrades to seeder AIRPORTS only with a warning if parse fails.
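
The by-iata union and divergence summary can be sketched as below. The
function names and sample airport rows are hypothetical; only the "seeder
wins on conflict" rule comes from the description above.

```javascript
// Union keyed by IATA code; seeder entries are written last so they win on conflict.
function unionByIata(seederAirports, rpcAirports) {
  const byIata = new Map();
  for (const airport of rpcAirports) byIata.set(airport.iata, airport);
  for (const airport of seederAirports) byIata.set(airport.iata, airport);
  return byIata;
}

// Codes present in only one registry, for a drift log line.
function divergence(seederAirports, rpcAirports) {
  const seeder = new Set(seederAirports.map((a) => a.iata));
  const rpc = new Set(rpcAirports.map((a) => a.iata));
  return {
    rpcOnly: [...rpc].filter((code) => !seeder.has(code)),
    seederOnly: [...seeder].filter((code) => !rpc.has(code)),
  };
}

// Made-up sample rows for illustration only.
const seederList = [{ iata: 'JFK', name: 'seeder JFK' }, { iata: 'DXB', name: 'seeder DXB' }];
const rpcList = [{ iata: 'JFK', name: 'rpc JFK' }, { iata: 'LHR', name: 'rpc LHR' }];
console.log([...unionByIata(seederList, rpcList).keys()], divergence(seederList, rpcList));
```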

P2 — afterPublishIntl receives raw pre-transform data
  runSeed forwards the RAW fetchIntl() result to afterPublish, NOT the
  publishTransform()'d shape. Today publishTransform is a pass-through
  wrapper so data.alerts is correct, but coupling is subtle — added an
  inline CONTRACT comment so a future publishTransform mutation doesn't
  silently drift bootstrap from INTL_KEY.

Verified: typecheck (both tsconfigs) clean; aviation + edge-functions
tests green; full test:data 6590/6590 green; standalone parse harness
recovers all 111 MONITORED_AIRPORTS rows.
2026-04-23 11:43:54 +04:00
Elie Habib
1958b34f55 fix(digest-dedup): CLUSTERING typo fallback fails closed to complete-link (#3331)
DIGEST_DEDUP_CLUSTERING previously fell to 'single' on unrecognised
values, which silently defeated the documented kill switch. A typo
like `DIGEST_DEDUP_CLUSTERING=complet` during an over-merge incident
would stick with the aggressive single-link merger instead of rolling
back to the conservative complete-link algorithm.

Mirror the DIGEST_DEDUP_MODE typo pattern (PR #3247):
  - Unrecognised value → fall to 'complete' (SAFE / conservative).
  - Surface the raw value via new `invalidClusteringRaw` config field.
  - Emit a warn line on the dedup orchestrator's entry path so operators
    see the typo alongside the kill-switch-took-effect message.

Valid values 'single' (default), 'complete', unset, empty, and any
case variation all behave unchanged. Only true typos change behaviour
— and the new behaviour is the kill-switch-safe one.
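
A minimal sketch of the fail-closed parse. `parseClustering` is a
hypothetical name and return shape; the real config reader may differ.

```javascript
// Hypothetical parser: typos fall to 'complete' and the raw value is surfaced.
function parseClustering(raw) {
  const value = String(raw ?? '').toLowerCase();
  if (value === '' || value === 'single') {
    return { clustering: 'single', invalidClusteringRaw: null }; // default
  }
  if (value === 'complete') {
    return { clustering: 'complete', invalidClusteringRaw: null };
  }
  // Unrecognised value: take the conservative algorithm and keep the typo for the warn line.
  return { clustering: 'complete', invalidClusteringRaw: raw };
}

for (const raw of [undefined, '', 'single', 'COMPLETE', 'complet']) {
  console.log(JSON.stringify(raw), '->', JSON.stringify(parseClustering(raw)));
}
```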

Tests: updated the existing case that codified the old behaviour plus
added coverage for (a) multiple typo variants falling to complete with
invalidClusteringRaw set, (b) case-insensitive valid values not
triggering the typo path, and (c) the orchestrator emitting the warn
line even on the jaccard-kill-switch codepath (since CLUSTERING intent
applies to both modes).

81/81 dedup tests pass.
2026-04-23 11:25:05 +04:00
Elie Habib
d3d406448a feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)
* feat(resilience): PR 2 §3.4 recovery-domain weight rebalance

Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy,
sovereignFiscalBuffer) to ~10% share each of the recovery-domain
score via a new per-dimension weight channel in the coverage-weighted
mean. Matches the plan's direction that the sovereign-wealth signal
complement — rather than dominate — the classical liquid-reserves
and fiscal-space signals.

Implementation

- RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number>
  alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry
  (default 1.0) so rebalance decisions stay auditable; the two new
  recovery dims carry 0.5 each.

  Share math at full coverage (6 active recovery dims):
    weight sum                  = 4 × 1.0 + 2 × 0.5 = 5.0
    each new-dim share          = 0.5 / 5.0 = 0.10  ✓
    each core-dim share         = 1.0 / 5.0 = 0.20

  Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in
  the map; their coverage=0 neutralizes them at the coverage channel
  regardless. Explicit entries guard against a future scorer bug
  accidentally returning coverage>0 for a retired dim and falling
  through the `?? 1.0` default — every retirement decision is now
  tied to a single explicit source of truth.

- coverageWeightedMean (_shared.ts): refactored to apply
  `coverage × dimWeight` per dim instead of `coverage` alone. Backward-
  compatible when all weights default to 1.0 (reduces to the original
  mean). All three aggregation callers (buildDomainList, baselineScore,
  stressScore) pick up the weighting transparently.
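
The weighted mean and the share math above can be sketched as follows. This
is an illustration, not the _shared.ts implementation: the zero-denominator
return value, the `weightShares` helper, and the `coreA`..`coreD` dimension
ids are assumptions.

```javascript
// Only the two rebalanced dims carry explicit entries here; others default to 1.0.
const WEIGHTS = { liquidReserveAdequacy: 0.5, sovereignFiscalBuffer: 0.5 };

// Mean of scores weighted by coverage × dimWeight.
function coverageWeightedMean(dims, weights = {}) {
  let numerator = 0;
  let denominator = 0;
  for (const { id, score, coverage } of dims) {
    const w = coverage * (weights[id] ?? 1.0);
    numerator += score * w;
    denominator += w;
  }
  return denominator === 0 ? 0 : numerator / denominator; // 0 on no coverage is an assumption
}

// Each dim's share of the total effective weight.
function weightShares(dims, weights = {}) {
  const effective = dims.map((d) => d.coverage * (weights[d.id] ?? 1.0));
  const total = effective.reduce((a, b) => a + b, 0);
  return Object.fromEntries(dims.map((d, i) => [d.id, total === 0 ? 0 : effective[i] / total]));
}

// Six active recovery dims at full coverage; core ids and scores are placeholders.
const recoveryDims = ['coreA', 'coreB', 'coreC', 'coreD', 'liquidReserveAdequacy', 'sovereignFiscalBuffer']
  .map((id) => ({ id, score: 50, coverage: 1 }));
console.log(weightShares(recoveryDims, WEIGHTS));
```

With every weight at 1.0 the function reduces to the plain coverage-weighted
mean, which is the backward-compatibility property noted above.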

Test coverage

1. New `tests/resilience-recovery-weight-rebalance.test.mts`:
   pins the per-dim weight values, asserts the share math
   (0.10 new / 0.20 core), verifies completeness of the weight map,
   and documents why retired dims stay in the map at 1.0.
2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based
   Spearman-proxy sensitivity check. Asserts NO > US > YE ordering
   preserved on both the overall score and the recovery-domain
   subscore after the rebalance. (Live post-merge Spearman rerun
   against the PR 0 snapshot is tracked as a follow-up commit.)
3. resilience-scorers.test.mts fixture anchors updated in lockstep:
     baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy
       + partial-coverage SWF now contribute ~half the weight)
     overallScore:  63.60 → 64.39 (recovery subscore lifts by ~3 pts
       from the rebalance, overall by ~0.79)
     recovery flat mean: 48.75 (unchanged — flat mean doesn't apply
       weights by design; documents the coverage-weighted diff)
   Local coverageWeightedMean helper in the test mirrors the
   production implementation (weights applied per dim).

Methodology doc

- New "Per-dimension weights in the recovery domain" subsection with
  the weight table and a sentence explaining the cap. Cross-references
  the source of truth (RESILIENCE_DIMENSION_WEIGHTS).

Deliberate non-goals

- Live post-merge Spearman ≥0.85 check against the PR 0 baseline
  snapshot. Fixture ordering is preserved (new ordering test); the
  live-data check runs after Railway cron refreshes the rankings on
  the new weights and commits
  docs/snapshots/resilience-ranking-live-post-pr2-<date>.json. Tracked
  as the final piece of PR 2 §3.4 alongside the health.js / bootstrap
  graduation (waiting on the 7-day Railway cron bake-in window).

Tests: 6588/6588 data-tier tests pass. Typecheck clean on both
tsconfig configs. Biome clean on touched files. NO > US > YE
fixture ordering preserved.

* fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness

Greptile P2: the operator comparison harness
(scripts/compare-resilience-current-vs-proposed.mjs) claims its domain
scores "mirror the production scorer's coverage-weighted mean" and is
the artifact generator for Spearman / rank-delta acceptance decisions.
After PR 2 §3.4's weight rebalance, the production mirror diverged —
production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy
= 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but
the harness still used equal-weight aggregation.

Left unfixed, post-merge Spearman / rank-delta diagnostics would
compare live API scores (with the 0.5 recovery weights) against
harness predictions that assume equal-share dims — silently biasing
every acceptance decision until someone noticed a country's rank-
delta didn't track.

Fix

- Mirrored coverageWeightedMean now accepts dimensionWeights and
  applies `coverage × weight` per dim, matching _shared.ts exactly.
- Mirrored buildDomainList accepts + forwards dimensionWeights.
- main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module
  and passes it through to buildDomainList at the single call site.
- Missing-entry default = 1.0 (same contract as production). This keeps
  the harness forward-compatible with future weight refactors: if a new
  dim is added without an explicit entry, the harness falls back to 1.0
  exactly as production does and still produces the correct number.

Verification

- Harness syntax-check clean (node -c).
- RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the
  harness's import path.
- 509/509 resilience tests still pass (harness isn't in the test
  suite; the invariant is that production ↔ harness use the same
  math, and the production side is covered by
  tests/resilience-recovery-weight-rebalance.test.mts).

* fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry

Greptile P1 + P2 on PR #3328.

P1 — cache prefix not bumped after formula change
--------------------------------------------------
The per-dim weight rebalance changes the score formula, but the
`_formula` tag only distinguishes 'd6' vs 'pc' (legacy 6-domain vs
pillar-combined); it does NOT detect intra-'d6' weight changes. Left
unfixed, scores cached before deploy would be served with the old
equal-weight math for up to the full 6h TTL, and the ranking key for
up to its 12h TTL. Matches the established v9→v10 pattern for every
prior formula-changing deploy.

Bumped in lockstep:
 - RESILIENCE_SCORE_CACHE_PREFIX:     v10  → v11
 - RESILIENCE_RANKING_CACHE_KEY:      v10  → v11
 - RESILIENCE_HISTORY_KEY_PREFIX:      v5  → v6
 - scripts/seed-resilience-scores.mjs local mirrors
 - api/health.js resilienceRanking literal
 - 4 analysis/backtest scripts that read the cached keys directly
 - Test fixtures in resilience-{ranking, handlers, scores-seed,
   pillar-aggregation}.test.* that assert on literal key values

The v5→v6 history bump is the critical one: without it, pre-rebalance
history points would mix with post-rebalance points inside the 30-day
window, and change30d / trend math would diff values from different
formulas against each other, producing false-negative "falling" trends
for every country across the deploy window.

P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage
----------------------------------------------------------------------------------
Reviewer flagged that these two functions still average coverage
equally across all non-retired dims, even after the scoring aggregation
started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is
INTENTIONAL — these signals answer a different question from scoring:

  scoring aggregation: "how much does each dim matter to the score?"
  coverage signal:     "how much real data do we have on this country?"

A dim at weight 0.5 still has the same data-availability footprint as
a weight=1.0 dim: its coverage value reflects whether we successfully
fetched the upstream source, not whether the scorer cares about it.
Applying scoring weights to the coverage signal would let a
half-weight dim hide half its sparsity from the overallCoverage pill,
misleading users reading coverage as a data-quality indicator.

Added explicit comments to both functions noting the asymmetry is
deliberate and pointing at the other site for matching rationale.
No code change — just documentation.

Tests: 6588/6588 data-tier tests pass (+511 resilience-specific
including the prefix-literal assertions). Typecheck clean on both
tsconfig configs. Biome clean on touched files.

* docs(resilience): bump methodology doc cache-prefix references to v11/v6

Greptile P2 on PR #3328: Redis keys table in the reproducibility
appendix still published `score:v10` / `ranking:v10` / `history:v5`,
and the rollback instructions told operators to flush those keys.
After the recovery-domain weight rebalance, live cache runs at
`score:v11` / `ranking:v11` / `history:v6`.

- Updated the Redis keys table (lines 490-492) to match `_shared.ts`.
- Updated the rollback block to name the current keys.
- Left the historical "Activation sequence" narrative intact (it
  accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump)
  but added a parenthetical pointing at the current v11/v6 values.

No code change — doc-only correction for operator accuracy.

* fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy

Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag
starting with digit `1`, which is illegal and breaks the deploy with
"Unexpected character `1` (U+0031) before name". Surfaced after the
prior cache-prefix commit forced Mintlify to re-parse this file.

Replaced with "fewer than 137 countries" for unambiguous rendering.
Other `<` occurrences in this doc (lines 34, 642) are followed by
whitespace and don't trip MDX's tag parser.
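
A rough pre-deploy check for this pattern could look like the sketch below.
`findUnsafeAngleBrackets` is a hypothetical helper, not part of the repo or
of Mintlify; it flags any `<` immediately followed by a digit and does not
try to skip code spans or fenced blocks.

```javascript
// Flags '<' directly followed by a digit, the pattern that broke the MDX parse.
function findUnsafeAngleBrackets(text) {
  const hits = [];
  text.split('\n').forEach((line, index) => {
    const match = /<\d/.exec(line); // first offender per line only
    if (match) hits.push({ line: index + 1, column: match.index + 1 });
  });
  return hits;
}

const sample = ['coverage is fine', '(<137 countries)', 'a < 5 is followed by whitespace'].join('\n');
console.log(findUnsafeAngleBrackets(sample));
```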
2026-04-23 10:25:18 +04:00
Elie Habib
7f83e1e0c3 chore: remove dormant proactive-intelligence agent (superseded by digest) (#3325)
* chore: remove dormant proactive-intelligence agent (superseded by digest)

PR #2889 merged a Phase 4 "Proactive Intelligence Agent" in 2026-04 with
588 lines of code and a PR body explicitly requiring a 6h Railway cron
service. That service was never provisioned — no Dockerfile, no Railway
entry, no health-registry key, all 7 test-plan checkboxes unchecked.

In the meantime the daily Intelligence Brief shipped via
scripts/seed-digest-notifications.mjs (PR #3321 and earlier), covering
the same "personalized editorial brief across all channels" use-case
at a different cadence (30m rather than 6h). The proactive agent's
landscape-diff trigger was speculative; the digest is the shipped
equivalent.

This PR retires the dormant code and scrubs the aspirational
"post-launch classifier" references that docs + comments have been
quietly carrying:

- Deleted scripts/proactive-intelligence.mjs (588 lines).
- scripts/_energy-disruption-registry.mjs, scripts/seed-fuel-shortages.mjs,
  scripts/_fuel-shortage-registry.mjs, src/shared/shortage-evidence.ts:
  dropped "proactive-intelligence.mjs will extend this registry /
  classifier output" comments. Registries are curated-only; no classifier
  exists.
- docs/methodology/disruptions.mdx: replaced "post-launch classifier"
  prose with the accurate "curated-only" description of how the event
  log is maintained.
- docs/api-notifications.mdx: envelope version is shared across **two**
  producers now (notification-relay, seed-digest-notifications), not three.
- scripts/notification-relay.cjs: one cross-producer comment updated.
- proto/worldmonitor/supply_chain/v1/list_energy_disruptions.proto +
  list_fuel_shortages.proto: same aspirational wording scrubbed.
- docs/api/SupplyChainService.openapi.{yaml,json} auto-regenerated via
  `make generate` — text-only description updates, no schema changes.

Net: -626 lines, +36 lines. No runtime behavior change. 6573/6573 unit
tests pass locally.

* fix(proto): scrub stale ListFuelShortages RPC comment (PR #3325 review)

Reviewer caught a stale "classifier-extended post-launch" comment on
the ListFuelShortages RPC method in service.proto that this PR's
initial pass missed — I fixed the message-definition comment in
list_fuel_shortages.proto but not the RPC-method comment in
service.proto, which propagates into the published OpenAPI
operation description.

- proto/worldmonitor/supply_chain/v1/service.proto: rewrite the
  ListFuelShortages RPC comment to match the curated-only framing
  used elsewhere in this PR.
- docs/api/SupplyChainService.openapi.{yaml,json}: auto-regenerated
  via `make generate`. Text-only operation-description update;
  no schema / contract changes.

No runtime impact. Other `classifier` references remaining in the
OpenAPI are legitimate schema field names (classifierVersion,
classifierConfidence) and an unrelated auto-revision-log trigger
enum value, both of which describe real on-row fields that existed
before this cleanup.
2026-04-23 09:15:57 +04:00
Elie Habib
c48ceea463 feat(resilience): PR 2 dimension wiring — split reserveAdequacy + add sovereignFiscalBuffer (#3324)
* feat(resilience): PR 2 dimension wiring — split reserveAdequacy + add sovereignFiscalBuffer

Plan §3.4 follow-up to #3305 + #3319. Lands the scorer + dimension
registration so the SWF seed from the Railway cron feeds a real score
once the bake-in window closes. No weight rebalance yet (separate
commit with Spearman sensitivity check), no health.js graduation yet
(7-day ON_DEMAND window per
feedback_health_required_key_needs_railway_cron_first.md), no bootstrap
wiring yet (follow-up PR).

Shape of the change

Retirement:
- reserveAdequacy joins fuelStockDays in RESILIENCE_RETIRED_DIMENSIONS.
  The legacy scorer now mirrors scoreFuelStockDays: returns
  coverage=0 / imputationClass=null so the dimension is filtered out
  of the confidence / coverage averages via the registry filter in
  computeLowConfidence, computeOverallCoverage, and the widget's
  formatResilienceConfidence. Kept in RESILIENCE_DIMENSION_ORDER for
  structural continuity (tests, cached payload shape, registry
  membership). Indicator registry tier demoted to 'experimental'.

Two new active dimensions:
- liquidReserveAdequacy (replaces the liquid-reserves half of the
  retired reserveAdequacy). Same source (WB FI.RES.TOTL.MO, total
  reserves in months of imports) but re-anchored 1..12 months
  instead of 1..18. Twelve months ≈ IMF "full reserve adequacy"
  benchmark for a diversified emerging-market importer — the tighter
  ceiling prevents wealthy commodity-exporters from claiming outsized
  credit for on-paper reserve stocks that are not the relevant
  shock-absorption buffer.
- sovereignFiscalBuffer. Reads resilience:recovery:sovereign-wealth:v1
  (populated by scripts/seed-sovereign-wealth.mjs, landed in #3305 +
  wired into Railway cron in #3319). Computes the saturating
  transform:
    effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
    score           = 100 × (1 − exp(−effectiveMonths / 12))
  Exponential saturation prevents Norway-type outliers (effective
  months in the 100s) from dominating the recovery pillar.
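
The saturating transform above can be written out directly. The fund field
names (`aum`, `access`, `liquidity`, `transparency`) follow the formula as
printed; the function names and sample numbers are hypothetical.

```javascript
// effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
function effectiveMonths(funds, annualImports) {
  return funds.reduce(
    (sum, f) => sum + (f.aum / annualImports) * 12 * f.access * f.liquidity * f.transparency,
    0,
  );
}

// score = 100 × (1 − exp(−effectiveMonths / 12))
function saturatingScore(months) {
  return 100 * (1 - Math.exp(-months / 12));
}

// Made-up inputs: one fund worth half a year of imports, half of it liquid.
const months = effectiveMonths([{ aum: 100, access: 1, liquidity: 0.5, transparency: 1 }], 200);
console.log(months, saturatingScore(months).toFixed(2));
console.log(saturatingScore(12).toFixed(2), saturatingScore(240).toFixed(2));
```

Twelve effective months already scores about 63, and the curve flattens
towards 100 after that, which is how very large funds are kept from
dominating.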

Three code paths in scoreSovereignFiscalBuffer:
1. Seed key absent entirely → IMPUTE.recoverySovereignFiscalBuffer
   (score 50 / coverage 0.3 / unmonitored). Covers the Railway-cron
   bake-in window before the first successful tick.
2. Seed present, country NOT in manifest → score=0 with FULL coverage.
   Substantive absence, NOT imputation — per plan §3.4 "What happens
   to no-SWF countries." 0 × weight = 0 in the numerator, so the
   country correctly scores lower than SWF-holding peers on this dim.
3. Seed present, country in payload → saturating score, coverage
   derated by the partial-seed completeness signal (so a Mubadala or
   Temasek scrape drift on a multi-fund country shows up as lower
   confidence rather than a silently-understated total).
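
The three branches can be sketched as below. This is not the real
scoreSovereignFiscalBuffer: the seed shape (`countries`, `completeness`) and
the way coverage is derated are assumptions. The imputed values, the
full-coverage zero, and `totalEffectiveMonths` come from this commit message.

```javascript
// Imputed result used while the seed key is absent (values from the text above).
const IMPUTE_SOVEREIGN_FISCAL_BUFFER = { score: 50, coverage: 0.3, imputationClass: 'unmonitored' };

function scoreSovereignFiscalBufferSketch(seed, iso2) {
  // 1. Seed key absent entirely: impute.
  if (seed == null) return { ...IMPUTE_SOVEREIGN_FISCAL_BUFFER };

  // 2. Seed present, country not in manifest: substantive zero at full coverage.
  const entry = seed.countries?.[iso2]; // `countries` is an assumed field name
  if (entry == null) return { score: 0, coverage: 1.0, imputationClass: null };

  // 3. Country in payload: saturating score, coverage derated by completeness (assumed field).
  const score = 100 * (1 - Math.exp(-entry.totalEffectiveMonths / 12));
  return { score, coverage: entry.completeness, imputationClass: null };
}

const seed = { countries: { NO: { totalEffectiveMonths: 240, completeness: 1 }, AE: { totalEffectiveMonths: 36, completeness: 0.5 } } };
console.log(scoreSovereignFiscalBufferSketch(null, 'NO'));
console.log(scoreSovereignFiscalBufferSketch(seed, 'YE'));
console.log(scoreSovereignFiscalBufferSketch(seed, 'AE'));
```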

Indicator registry:
- Demoted recoveryReserveMonths (tied to retired reserveAdequacy) to
  tier='experimental'.
- Added recoveryLiquidReserveMonths: WB FI.RES.TOTL.MO, anchors 1..12,
  tier='core', coverage=188.
- Added recoverySovereignWealthEffectiveMonths: the new SWF signal,
  tier='experimental' for now because the manifest only has 8 funds
  (below the 180-core / 137-§3.6-gate threshold). Graduating to 'core'
  requires expanding the manifest past ~137 entries — a later PR.

Tests updated

- resilience-release-gate: 19→21 dim count; RETIRED_DIMENSIONS allow-
  list now includes reserveAdequacy alongside fuelStockDays.
- resilience-dimension-scorers: scoreReserveAdequacy monotonicity +
  "high reserves score well" tests migrated to scoreLiquidReserveAdequacy
  (same source, new 1..12 anchor). New retirement-shape test
  for scoreReserveAdequacy mirroring the PR 3 fuelStockDays retirement
  test. Four new scorer tests pin the three code paths of
  scoreSovereignFiscalBuffer (absent seed / no-SWF country / SWF
  country, plus the partial-completeness derate within the SWF path).
- resilience-scorers fixture: baseline 60.12→60.35, recovery-domain
  flat mean 47.33→48.75, overall 63.27→63.6. Each number commented
  with the driver (split adds liquidReserveAdequacy 18@1.0 +
  sovereignFiscalBuffer 50@0.3 at IMPUTE; retired reserveAdequacy drops out).
- resilience-dimension-monotonicity: target scoreLiquidReserveAdequacy
  instead of scoreReserveAdequacy.
- resilience-handlers: response-shape dim count 19→21.
- resilience-indicator-registry: coverage 19→21 dimensions.
- resilience-dimension-freshness: allowlisted the new sovereign-wealth
  seed-meta key in KNOWN_SEEDS_NOT_IN_HEALTH for the ON_DEMAND window.
- resilience-methodology-lint HEADING_TO_DIMENSION: added the two new
  heading mappings. Methodology doc gets H4 sections for Liquid
  Reserve Adequacy and Sovereign Fiscal Buffer; Reserve Adequacy
  section is annotated as retired.
- resilience-retired-dimensions-parity: client-side
  RESILIENCE_RETIRED_DIMENSION_IDS gets reserveAdequacy. Parser
  upgraded to strip inline `// …` comments from the array body so a
  future reviewer can drop a rationale next to an entry without
  breaking parity.
- resilience-confidence-averaging: fixture updated to include both
  retired dims (reserveAdequacy + fuelStockDays) — confirms the
  registry filter correctly excludes BOTH from the visible coverage
  reading.

Extraction harness (scripts/compare-resilience-current-vs-proposed.mjs):
- recoveryLiquidReserveMonths: reads the same reserve-adequacy seed
  field as recoveryReserveMonths.
- recoverySovereignWealthEffectiveMonths: reads the new SWF seed key
  on field totalEffectiveMonths. Absent-payload → 0 for correlation
  math (matches the substantive-no-SWF scorer branch).

Out of scope for this commit (follow-ups)

- Recovery-domain weight rebalance + Spearman sensitivity rerun
  against the PR 0 baseline.
- health.js graduation (SEED_META entry + ON_DEMAND_KEYS removal) once
  Railway cron has ~7 days of clean runs.
- api/bootstrap.js wiring once an RPC consumer needs the SWF data.
- Manifest expansion past 137 countries so sovereignFiscalBuffer can
  graduate from tier='experimental' to tier='core'.

Tests: 6573/6573 data-tier tests pass. Typecheck clean on both
tsconfig configs. Biome clean on all touched files.

* fix(resilience): PR 2 review — add widget labels for new dimensions

P2 review finding on PR #3324. DIMENSION_LABELS in src/components/
resilience-widget-utils.ts covered only the old 19 dimension IDs, so
the two new active dims (liquidReserveAdequacy, sovereignFiscalBuffer)
would render with their raw internal IDs in the confidence grid for
every country once the scorer started emitting them. The widget test
at getResilienceDimensionLabel also asserted only the 19-label set,
so the gap would have shipped silently.

Fix: add user-facing short labels for both new dims. "Reserves" is
already claimed by the retired reserveAdequacy, so the replacement
disambiguates with "Liquid Reserves"; sovereignFiscalBuffer →
"Sovereign Wealth" per the methodology doc H4 heading.

Also added a regression guard — new test asserts EVERY id in
RESILIENCE_DIMENSION_ORDER resolves to a non-id label. Any future
dimension that ships without a matching DIMENSION_LABELS entry now
fails CI loudly instead of leaking the ID into the UI.
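
The regression guard amounts to a completeness check over the dimension
order. A sketch, with a hypothetical helper name and a three-entry sample
instead of the real RESILIENCE_DIMENSION_ORDER / DIMENSION_LABELS:

```javascript
// Returns ids that have no label, an empty label, or a label equal to the raw id.
function findUnlabelledDimensions(order, labels) {
  return order.filter((id) => {
    const label = labels[id];
    return typeof label !== 'string' || label.length === 0 || label === id;
  });
}

// Sample data only; the two labels match the ones named in this commit.
const sampleOrder = ['liquidReserveAdequacy', 'sovereignFiscalBuffer', 'someFutureDimension'];
const sampleLabels = { liquidReserveAdequacy: 'Liquid Reserves', sovereignFiscalBuffer: 'Sovereign Wealth' };
console.log(findUnlabelledDimensions(sampleOrder, sampleLabels));
```

A test that asserts this list is empty fails as soon as a dimension ships
without a label.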

Tests: 502/502 resilience tests pass (+1 new coverage check).
Typecheck clean on both configs.

* fix(resilience): PR 2 review — remove dead IMPUTE.recoveryReserveAdequacy entry

Greptile P2: the retired scoreReserveAdequacy stub no longer reads
from IMPUTE (it hardcodes coverage=0 / imputationClass=null per the
retirement pattern), making IMPUTE.recoveryReserveAdequacy dead code.
Removed the entry + added a breadcrumb comment pointing at the
replacement IMPUTE.recoveryLiquidReserveAdequacy.

The second P2 (bootstrap.js not wired) is a deliberate non-goal — the
reviewer explicitly flags "for visibility" since it's tracked in the
PR body. No action this commit; bootstrap wiring lands alongside the
SEED_META graduation after the ~7-day Railway-cron bake-in.

Tests: 502/502 resilience tests still pass. Typecheck clean.
2026-04-23 09:01:30 +04:00
Elie Habib
29306008e4 fix(email): route Intelligence Brief off the alerts@ mailbox (#3321)
* fix(email): route Intelligence Brief off the alerts@ mailbox

The daily "WorldMonitor Intelligence Brief" email was shipping from
`alerts@worldmonitor.app`. If the Railway env override dropped the
`Name <…>` wrapper, Gmail/Outlook fell back to rendering the
local-part ("alerts" / "alert") as the sender name.
Recipients saw a scary-looking "alert" in their inbox for what is
actually a curated editorial read.

Split the sender so editorial mail can't share the `alerts@` mailbox
with incident pushes:

- New env var `RESEND_FROM_BRIEF` (default `WorldMonitor Brief
  <brief@worldmonitor.app>`) consumed by seed-digest-notifications.mjs.
- Falls back to `RESEND_FROM_EMAIL`, then to the built-in default, so
  existing deploys keep working and the rollout is a single Railway
  env flip on the digest service.
- notification-relay.cjs (realtime push alerts) intentionally keeps
  `RESEND_FROM_EMAIL` / `alerts@` — accurate for that path.
- .env.example documents the display-name rule so the bare-address
  trap can't re-introduce the bug.

Rollout: set `RESEND_FROM_BRIEF=WorldMonitor Brief <brief@worldmonitor.app>`
on the `seed-digest-notifications` Railway service. Domain-level Resend
verification already covers the new local-part; no DNS change needed.

* fix(email): runtime normalize sender to prevent bare-address regression

PR review feedback from codex:

  > P2 — RESEND_FROM_BRIEF is consumed verbatim, so an operator can
  > still set brief@worldmonitor.app without a display name and
  > recreate the same Gmail/Outlook rendering bug for the daily brief.
  > Today that protection is only documentation in .env.example, not
  > runtime enforcement.

Add a small shared helper `scripts/lib/resend-from.cjs` that coerces a
bare email address into a "Name <addr>" wrapper with a loud warning
log, and wire it into the digest path.

- Bare-address input (e.g. `brief@worldmonitor.app`) is rewritten to
  `WorldMonitor Brief <brief@worldmonitor.app>` so Gmail/Outlook stop
  falling back to the local-part as the display name.
- Coercion emits a single `console.warn` line per boot so operators
  see the signal in Railway logs and can fix the underlying env.
- Fail-safe (not fail-closed) — a misconfigured env does NOT take the
  cron down.
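
The coercion plus the env fallback chain can be sketched roughly as below. This is an illustration only: the function names, the warning text, and the wrapped-address regex are assumptions, not the contents of `scripts/lib/resend-from.cjs`. The sketch warns on every call, whereas the real helper is described as warning once per boot.

```javascript
const DEFAULT_BRIEF_NAME = 'WorldMonitor Brief';

// Hypothetical helper: returns null for empty input, passes wrapped senders
// through unchanged, and wraps a bare address with a default display name.
function normalizeResendSender(raw, defaultName = DEFAULT_BRIEF_NAME, warn = console.warn) {
  if (typeof raw !== 'string') return null;
  const value = raw.trim();
  if (value === '') return null;
  // Already in "Name <addr>" form: nothing to fix.
  if (/<[^<>\s]+@[^<>\s]+>\s*$/.test(value)) return value;
  warn(`[resend] sender "${value}" has no display name; using "${defaultName} <${value}>"`);
  return `${defaultName} <${value}>`;
}

// Fallback chain from the first commit: RESEND_FROM_BRIEF, then
// RESEND_FROM_EMAIL, then the built-in default.
function resolveBriefSender(env) {
  return (
    normalizeResendSender(env.RESEND_FROM_BRIEF) ??
    normalizeResendSender(env.RESEND_FROM_EMAIL) ??
    `${DEFAULT_BRIEF_NAME} <brief@worldmonitor.app>`
  );
}
```

Injecting `warn` keeps the helper testable without touching the real console, and returning a usable sender in every branch is what keeps a misconfigured env from taking the cron down.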

Also resolves the P3 doc-vs-runtime divergence by reverting
.env.example's RESEND_FROM_EMAIL default from "WorldMonitor Alerts
<...>" back to "WorldMonitor <...>" to match the existing
notification-relay.cjs runtime default. The realtime-alert path will
get the same normalizer treatment in a follow-up PR that cohesively
touches notification-relay.cjs + Dockerfile.relay.

tests: 7 new cases in tests/resend-sender-normalize.test.mjs covering
empty/null/whitespace input, wrapped passthrough, trim, bare-address
coercion, warning emission, no-warning on wrapped, console.warn default
sink. Runs under `npm run test:data`.
2026-04-23 08:51:27 +04:00
Elie Habib
8a988323d2 chore(bundle-runner): emit reliable per-section summary line on parent stdout (#3320)
* chore(bundle-runner): emit reliable per-section summary line on parent stdout

Fixes observability asymmetry in Railway bundle service logs where some
seeders appeared to skip lines like `Run ID`, `Mode`, or the
structured `seed_complete` JSON event. Root cause is Railway's log
ingestion dropping child-stdout lines when multiple seeders emit at
similar timestamps — observed in the PR #3294 launch run where
Pipelines-Gas was missing its `=== Seed ===` banner, Pipelines-Oil had
`Key:` emitted BEFORE the banner, Storage-Facilities and
Energy-Disruptions were missing Run ID + Mode + seed_complete entirely,
despite identical code paths.

All child processes emit the same lines; Railway just loses some. Fix
is to piggy-back on the observation that bundle-level lines (`[Bundle:X]
Starting`, `[Bundle:X] Finished`) ARE reliably captured, since they come
from the parent process's single stdout stream.

Changes in scripts/_bundle-runner.mjs:
- spawnSeed now captures the child's `{"event":"seed_complete",...}` JSON
  line while streaming stdout, parses it, and attaches to the settle
  result.
- Main loop emits one bundle-level summary line per section after child
  exit:
    [Bundle:X] section=NAME status=OK durationMs=1237 records=15 state=OK
  (or `status=FAILED elapsed=...s reason=...` for failures).
- Summary line survives Railway's log ingestion even when per-section
  child lines drop, giving monitors a reliable event to key off.
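
As an illustration of the two changes above, the capture and the summary formatting might look like the following. The helper names and the `seed_complete` field names (`durationMs`, `recordCount`, `state`) are assumptions; only the summary line format is taken from this message.

```javascript
// Scan captured child stdout for the last {"event":"seed_complete",...} line.
function extractSeedComplete(stdoutText) {
  let found = null;
  for (const line of stdoutText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{') || !trimmed.includes('"seed_complete"')) continue;
    try {
      const parsed = JSON.parse(trimmed);
      if (parsed.event === 'seed_complete') found = parsed;
    } catch {
      // Not valid JSON after all; keep scanning.
    }
  }
  return found;
}

// Build the parent-side summary line for one section.
// `result` is a hypothetical settle object: { ok, seedComplete, elapsedSec, reason }.
function formatSectionSummary(bundle, section, result) {
  const prefix = `[Bundle:${bundle}] section=${section}`;
  if (!result.ok) {
    return `${prefix} status=FAILED elapsed=${result.elapsedSec}s reason=${result.reason}`;
  }
  const sc = result.seedComplete ?? {};
  return `${prefix} status=OK durationMs=${sc.durationMs ?? 'n/a'} records=${sc.recordCount ?? 'n/a'} state=${sc.state ?? 'n/a'}`;
}
```

Because the parent writes this line itself, it does not depend on whether the child's own `seed_complete` line survived log ingestion; a missing event degrades to `n/a` fields rather than a missing summary.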

Observability consumers (log-based alerts, seed telemetry scrapers)
should now key off the bundle-level summary rather than per-section
child lines which remain best-effort. The per-section child lines stay
as-is for interactive debugging.

Verification: parse logic sanity-checked against the exact seed_complete
line format. Node syntax check clean. No schema changes.

* fix(bundle-runner): emit FAILED summary line to stderr, not stdout

The prior commit introduced a bundle-level structured summary line per
section. On success that correctly goes to stdout; on FAILED it was
also going to stdout — but that broke tests/bundle-runner.test.mjs test
140 ("timeout emits terminal reason BEFORE SIGTERM/SIGKILL grace").

The test concatenates stdout+stderr and asserts that `SIGKILL` appears
AFTER `Failed after` in the combined string (verifying the kill-decision
log line is emitted BEFORE the 10s SIGTERM→SIGKILL grace window, so
it survives container termination). My new FAILED summary line — which
includes the reason string `timeout after 1s (signal SIGKILL)` —
landed on stdout, which comes first in the concatenation, and its
`SIGKILL` substring matched before the stderr-side `Did not exit on
SIGTERM...SIGKILL` line. Ordering assertion failed.

Fix: route the FAILED summary line through console.error (same stream
as the pre-kill `Failed after ... sending SIGTERM` and the grace-window
`Did not exit...SIGKILL` lines). Chronological ordering in combined
output is preserved; test passes.

OK summary lines stay on stdout — they're observability data, not
error diagnostics, and belong on the normal stream alongside the
bundle Starting/Finished lines.

Local: `node --test tests/bundle-runner.test.mjs` — 4/4 pass including
the previously-failing ordering test.
2026-04-23 08:36:05 +04:00
Elie Habib
24786882ae chore(railway): wire seed-sovereign-wealth into resilience-recovery bundle (#3319)
* chore(railway): add seed-sovereign-wealth to resilience-recovery bundle

Wires the seeder landed in #3305 into the existing Railway cron service
`seed-bundle-resilience-recovery`. One-line bundle entry; no new
Railway service (the bundle pattern amortizes cron cost across the
recovery-domain seeders).

Config matches the rest of the bundle:
- intervalMs: 30 * DAY (parity with CACHE_TTL_SECONDS=35d in the
  seeder + the quarterly manifest revision cadence)
- timeoutMs: 600_000 (longer than peers because Tier 3b does N per-fund
  Wikipedia article fetches for any fund missing from the list article;
  today Temasek is the only miss but leaving headroom)

After deploy, the next cron tick populates
`resilience:recovery:sovereign-wealth:v1`, which then unblocks the
follow-up PR that adds the scorer + dimension wiring.

* fix(tests): update resilience-recovery bundle test for 6th entry

Static-analysis test in tests/seed-bundle-resilience-recovery.test.mjs
was hardcoded to `5 entries` / `all 5 entries use 30 * DAY`. Adding
Sovereign-Wealth to the bundle (previous commit) made the count 6,
breaking both assertions.

Replaced hardcoded `5` with `EXPECTED_ENTRIES.length` so the next
addition only requires appending to the allow-list at the top of the
file (and the assertion message prompts the author to do that if the
count drifts). Also appended the Sovereign-Wealth entry to the
EXPECTED_ENTRIES list.

6566/6566 data-tier tests pass locally.
2026-04-23 08:19:04 +04:00
Elie Habib
8032dc3a04 feat(resilience): PR 2 pre-scorer — SWF manifest + seeder (8/8 funds) (#3305)
* feat(resilience): PR 2 scaffolding — SWF classification manifest + seeder skeleton

Plan §3.4. First of multiple commits for PR 2 (fiscal-buffer split
and sovereign-wealth integration). This commit is SCAFFOLDING ONLY:
no dimension wiring, no scorer, no cache-keys entry yet. The goal is
to land the reviewer-facing metadata and the seeder's three-tier
source shape so an external SWF practitioner can critique before we
wire the scorer.

What is in:

1. docs/methodology/swf-classification-manifest.yaml — authoritative
   per-fund classification for the `sovereignFiscalBuffer` dimension.
   First-pass estimates for the 8 funds named in plan §3.4 table:
   Norway GPFG, UAE ADIA + Mubadala, Saudi PIF, Kuwait KIA,
   Qatar QIA, Singapore GIC + Temasek. Each fund carries:
     - three-component classification (access, liquidity, transparency)
       each on [0, 1], with rationale text citing the mandate / fiscal
       rule / asset-mix / transparency-index evidence
     - source URLs for audit
   Fund-candidates deferred for external-reviewer decision are listed
   in a trailing comment block (CIC, NWF, SOFAZ, NSIA, Future Fund,
   NZ Super, ESSF, etc.).

   external_review_status: PENDING — flip to REVIEWED on sign-off.

2. scripts/shared/swf-manifest-loader.mjs — YAML parser + strict schema
   validator. Fails loudly on any deviation (out-of-range scores,
   non-ISO2 countries, missing rationale, duplicate fund IDs, wrong
   manifest version). Single source of truth for the seeder, future
   scorer, and methodology-doc linter.

3. scripts/seed-sovereign-wealth.mjs — seeder shell with the three-tier
   source priority from plan §3.4:
     1. Official fund disclosures (MoF, central-bank, annual reports)
     2. IFSWF member filings
     3. SWFI public fund-rankings page (license-free fallback, scraped)
   Tiers 1-3 are all stubbed (return null) in this commit — the
   seeder publishes a well-formed empty payload so the scorer IMPUTE
   fallback can be exercised end-to-end without live data.
   emptyDataIsFailure: false is set deliberately so pre-wiring cron
   runs do not poison seed-meta (see
   feedback_strict_floor_validate_fail_poisons_seed_meta.md).

   SWFI scrape target is documented in the file header with the
   exact URL and a 2.5s inter-request interval. The scraper itself
   lands in the next commit after the external reviewer signs off
   on the manifest.

4. tests/swf-classification-manifest.test.mjs — 14 tests exercising
   both the shipped YAML (plan §3.4 required-fund presence, [0,1]
   bounds, rationale length, source citations, multi-fund country
   handling) and the validator's schema enforcement (rejects out-
   of-range scores, non-ISO2 codes, missing rationale, empty sources,
   duplicates, wrong version, invalid review status).

Out of scope for this commit (follow-ups, in order):
 - Implement SWFI scrape + IFSWF parse + per-fund official endpoints
 - Add `liquidReserveAdequacy` and `sovereignFiscalBuffer` dimensions
   to RESILIENCE_DIMENSION_ORDER, registry, and scorers
 - Retire `reserveAdequacy` via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys.ts + api/bootstrap.js + api/health.js wiring (new
   seed key needs ON_DEMAND_KEYS gating per Railway-cron bake-in rule)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun
 - Methodology doc: rewrite the reserveAdequacy section

Tests: 508/508 pass (resilience suite + new manifest tests).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

No external-facing behavior change — all files are new + isolated.

* feat(resilience): PR 2 commit 2 — Wikipedia SWF scraper + SWFI pivot

Implements Tier 3 of the sovereignFiscalBuffer seeder. Tier 1 (official
disclosures) and Tier 2 (IFSWF filings) remain stubbed — they require
per-fund bespoke adapters and will land incrementally.

SWFI pivot
----------
The plan's original Tier 3 target was
https://www.swfinstitute.org/fund-rankings/sovereign-wealth-fund. Live
check on 2026-04-23: the page's <tbody> is empty and AUM is gated
behind a lead-capture form (name + company + job title). SWFI per-fund
/profile/<id> pages are similarly barren. The "public fund rankings"
is effectively no longer public; scraping the lead-gated surface would
require submitting fabricated contact info (TOS violation, legally
questionable), so Tier 3 pivots to Wikipedia.

Wikipedia is legally clean (CC-BY-SA 4.0, attribution required — see
WIKIPEDIA_SOURCE_ATTRIBUTION in the seeder) and structurally scrapable.
The SWFI Linaburg-Maduell Transparency Index mentioned in manifest
rationale text is a SEPARATE SWFI publication (public index scores),
not the fund-rankings paywall — those citations stay valid.

What is in
----------

1. scripts/seed-sovereign-wealth.mjs — Wikipedia scraper implementation:
   - parseWikipediaRankingsTable(html) — exported pure function so
     the parser is unit-testable without a live fetch. Extracts the
     wikitable, parses per-fund rows (Country, Abbrev, Fund name,
     Assets USD B, Inception, Origin).
   - Strip-HTML helper strips <sup> tags to SPACES (not empty) so
     `302.0<sup>41</sup>` stays `302.0 41` — otherwise the decimal
     value and its trailing footnote ref get welded into `302.041`,
     which the Assets regex mis-parses.
   - matchWikipediaRecord(fund, cache) — abbrev + fund-name lookup
     with country disambiguation: lookup maps are now
     Map<key, Record[]> (list) rather than Map<key, Record>, and the
     matcher filters the list by manifest country before returning.
     This is the exact fix for the PIF collision:
     "PIF" resolves to BOTH Saudi Arabia's Public Investment Fund
     (~USD 925B) and Palestine's Palestine Investment Fund (~USD 900M)
     on the live article. Without country-filtering, Map.set silently
     overwrites one with the other, so Saudi PIF would return
     Palestine's AUM — three orders of magnitude wrong.
   - When the country disambiguator cannot pick, returns null rather
     than a best-guess. Seeder logs the unmatched fund; the IMPUTE
     path handles it gracefully.

2. docs/methodology/swf-classification-manifest.yaml — added
   `wikipedia` hints block to each of the 8 funds (abbrev and/or
   fund_name, matching Wikipedia's canonical naming).

3. scripts/shared/swf-manifest-loader.mjs — optional `wikipedia` field
   in the schema: `abbrev` and `fund_name` both optional strings, but
   at least one must be present if the block is provided.

4. tests/seed-sovereign-wealth.test.mjs — 12 tests exercising:
   - fixture-based parser: abbrev/name indexing, HTML + footnote
     stripping, decimal AUM, malformed rows skipped, missing-table error
   - abbrev-collision handling: both candidates retained in the list
   - country-disambiguation matcher: Saudi PIF correctly picked from
     a Saudi-vs-Palestine collision fixture (the exact live bug)
   - ambiguous lookup with unknown country returns null, not wrong record
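
A rough sketch of the two parser details called out above: tags stripped to spaces, and list-valued lookup maps with country filtering. The record shape (`abbrev`, `country`, `assetsUsdB`) and the function names are illustrative assumptions, not the seeder's actual exports.

```javascript
// Replace every tag with a space so a footnote ref cannot weld onto the
// number in front of it: '302.0<sup>41</sup>' becomes '302.0 41'.
function stripHtml(cell) {
  return cell.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Index records by lower-cased abbreviation, keeping ALL records per key.
// A plain Map.set(key, record) would silently keep only the last one.
function buildAbbrevIndex(records) {
  const index = new Map();
  for (const rec of records) {
    const key = rec.abbrev.toLowerCase();
    const list = index.get(key) ?? [];
    list.push(rec);
    index.set(key, list);
  }
  return index;
}

// Filter candidates by the manifest country; return null unless exactly one
// candidate survives, so an ambiguous lookup never yields a best guess.
function matchByAbbrev(fund, index) {
  const candidates = index.get(fund.abbrev.toLowerCase()) ?? [];
  const sameCountry = candidates.filter((rec) => rec.country === fund.country);
  return sameCountry.length === 1 ? sameCountry[0] : null;
}
```

With a fixture holding both PIF rows, a manifest entry of `{ abbrev: 'PIF', country: 'SA' }` resolves to the Saudi record, while an entry with an unrecognized country gets null and falls through to the unmatched path.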

Live verification against the shipped Wikipedia article: 7/8 funds
matched with the correct country; Saudi PIF now correctly returns
USD 925B (not Palestine's USD 0.9B) because of the country-
disambiguation fix. Temasek is the one miss — Wikipedia does not
classify it as an SWF (practitioner debate; it lists under "state
holding companies" instead). Falls through to IMPUTE in the scorer
until Tier 1/2 adapters land with an official-disclosure source.

Tests: 522/522 pass (resilience + manifest + scraper).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

Still stubbed for later commits:
 - Tier 1 per-fund official-disclosure adapters (incl. Temasek)
 - Tier 2 IFSWF secretariat parser
 - Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
 - reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS until bake-in)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun

* feat(resilience): PR 2 commit 3 — Wikipedia infobox fallback + FX → 8/8 match

Closes the Temasek gap. The Wikipedia list article excludes Temasek on
editorial grounds (classified as a "state holding company" rather than
an SWF), so the Tier-3 list-only path topped out at 7/8 funds matched.
This commit adds Tier 3b — per-fund Wikipedia article infobox scrape
— and a baked-in FX table to handle non-USD infobox currencies.

Live verification on the shipped Wikipedia articles: 8/8 funds matched.
Temasek: S$ 434B → US$ 321B via infobox + SGD→USD FX.

Implementation

1. scripts/seed-sovereign-wealth.mjs
   - FX_TO_USD table (USD, SGD, NOK, EUR, GBP, AED, SAR, KWD, QAR)
     with FX_RATES_REVIEWED_AT='2026-04-23' committed into the seed
     payload so stale rates are visible at audit time.
   - CURRENCY_SYMBOL_TO_ISO ordered list — US$ tested before S$ before
     bare $, and $ / kr require a space + digit neighbor to avoid
     false-matches in rich prose.
   - detectCurrency(text) exported pure for unit testing.
   - parseWikipediaArticleInfobox(html) exported pure — scans rows
     for "Total assets" / "Assets under management" / "AUM" / "Net
     assets" / "Net portfolio value" labels, extracts "NUMBER (trillion
     | billion | million) (YEAR)" values, applies FX conversion.
   - fetchWikipediaInfobox(fund) — per-fund article fetch, gated on
     the manifest's wikipedia.article_url hint.
   - sourceMix split into {official, ifswf, wikipedia_list,
     wikipedia_infobox} counters so the seed payload shows which tier
     delivered each fund.
   - Source priority chain: official → ifswf → wikipedia_list →
     wikipedia_infobox. Infobox last because it is N network round-
     trips; amortizing over the list article cache first minimizes
     live traffic.

2. docs/methodology/swf-classification-manifest.yaml
   - Temasek entry gains wikipedia.article_url:
     https://en.wikipedia.org/wiki/Temasek_Holdings with an inline
     comment explaining why the list-article path misses.

3. scripts/shared/swf-manifest-loader.mjs
   - article_url optional field; validator rejects anything that is
     not a https://<lang>.wikipedia.org/... URL so a typo cannot
     silently wire the seeder to an off-site fetch.

4. tests/seed-sovereign-wealth.test.mjs (10 new tests, 38/38 pass)
   - detectCurrency distinguishes US$ vs S$ vs bare $.
   - parseWikipediaArticleInfobox extracts Temasek S$ 434B → US$ 321B
     with year tag from "(2025)".
   - USD-native row pass-through with fxRate=1.0.
   - NOK trillion conversion (NOK 18.7T → USD 1.74T).
   - Returns null when no AUM row / no infobox at all.
   - Documents the unknown-currency → USD fallback contract.
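
For orientation, the ordered symbol detection and the "NUMBER (trillion | billion | million) (YEAR)" extraction could be sketched as follows. The patterns are one reading of the rules above (optional whitespace before the digit for bare `$` and `kr`), the `parseAmount` name is hypothetical, and the sketch returns null for an unknown currency instead of applying any fallback.

```javascript
// Order matters: 'US$' contains 'S$', and both contain a bare '$'.
const CURRENCY_SYMBOL_TO_ISO = [
  { pattern: /US\$/, iso: 'USD' },
  { pattern: /S\$/, iso: 'SGD' },
  { pattern: /€/, iso: 'EUR' },
  { pattern: /£/, iso: 'GBP' },
  { pattern: /\$\s?\d/, iso: 'USD' },   // bare $ only when a digit follows
  { pattern: /\bkr\s?\d/, iso: 'NOK' }, // 'kr 18.7', but not 'krona'
];

function detectCurrency(text) {
  for (const { pattern, iso } of CURRENCY_SYMBOL_TO_ISO) {
    if (pattern.test(text)) return iso;
  }
  return null; // fallback policy is left to the caller in this sketch
}

const SCALE = { trillion: 1e12, billion: 1e9, million: 1e6 };

// Extract the scaled number and an optional "(YYYY)" tag from an infobox cell.
function parseAmount(text) {
  const m = /(\d[\d,]*(?:\.\d+)?)\s*(trillion|billion|million)\b(?:[^()]*\((\d{4})\))?/i.exec(text);
  if (!m) return null;
  return {
    value: Number(m[1].replace(/,/g, '')) * SCALE[m[2].toLowerCase()],
    year: m[3] ? Number(m[3]) : null,
  };
}
```

Testing the most specific symbol first is the whole trick: a first-match scan with bare `$` at the top would classify every Temasek figure as USD.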

Tests: 532/532 pass (full resilience + manifest + scraper suite).
Typecheck clean on both tsconfig.json and tsconfig.api.json.

Still stubbed for later commits:
 - Tier 1 per-fund official-disclosure adapters
 - Tier 2 IFSWF secretariat parser
 - Dimension wiring (liquidReserveAdequacy, sovereignFiscalBuffer)
 - reserveAdequacy retirement via RESILIENCE_RETIRED_DIMENSIONS
 - cache-keys / bootstrap / health.js wiring (ON_DEMAND_KEYS)
 - Recovery-domain weight rebalance + Spearman sensitivity rerun

* refactor(resilience): reuse project-shared FX infrastructure for SWF seeder

Self-caught duplication from the previous commit (699ba832a introduced
a local FX_TO_USD table and FX_RATES_REVIEWED_AT constant). The
codebase already has the canonical path:

  scripts/_seed-utils.mjs
    SHARED_FX_FALLBACKS      (USD/SGD/NOK/EUR/GBP/AED/SAR/QAR/KWD/...)
    getSharedFxRates()       (Redis shared:fx-rates:v1 4h cache + Yahoo)
    fetchYahooFxRates()

Used by seed-grocery-basket, seed-fuel-prices, seed-bigmac. Two FX
tables would drift and the live-rate layer (Yahoo via Redis cache)
would be orphaned on the SWF path.

What changed

- Deleted local FX_TO_USD / FX_RATES_REVIEWED_AT constants.
- parseWikipediaArticleInfobox() no longer performs FX conversion.
  Returns { valueNative, currencyNative, aumYear } so the seeder
  orchestrator applies project-shared rates at call time. Parser is
  now currency-agnostic and thinner.
- Added lookupUsdRate(currency, fxRates) helper:
  * USD → 1.0 short-circuit
  * prefer the live map (getSharedFxRates output) over static fallback
  * fall back to SHARED_FX_FALLBACKS
  * return null on unknown currency (caller skips the fund — no silent
    wrong-currency misreading).
- fetchWikipediaInfobox() accepts fxRates map, converts via
  lookupUsdRate, returns enriched { aum, currencyNative, fxRate }.
- fetchSovereignWealth() fetches fxRates once at the top via
  getSharedFxRates(buildFxSymbolsForSwf(), SHARED_FX_FALLBACKS), in
  parallel with World Bank imports + Wikipedia list. Warms the shared
  Redis FX cache for other seeders at the same time.
- Seed payload drops the fxRatesReviewedAt field; the shared cache
  carries that metadata at the Redis level for all seeders.
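
The lookup order above can be sketched as follows. The one-entry fallback table is a stand-in (only the SGD 0.74 figure is quoted in this message), and the third `fallbacks` parameter and the `toUsd` wrapper are additions for testability, not the shipped signature.

```javascript
// Stand-in for the shared static table; the real one covers more currencies.
const SHARED_FX_FALLBACKS = { SGD: 0.74 };

// Resolve a currency-to-USD multiplier: USD short-circuit, then the live
// map, then the static fallback, then null for anything unknown.
function lookupUsdRate(currency, fxRates = {}, fallbacks = SHARED_FX_FALLBACKS) {
  if (currency === 'USD') return 1.0;
  const live = fxRates[currency];
  if (Number.isFinite(live) && live > 0) return live;
  const fallback = fallbacks[currency];
  if (Number.isFinite(fallback) && fallback > 0) return fallback;
  return null;
}

// Hypothetical caller: a null rate means "skip this fund", never "assume USD".
function toUsd(valueNative, currencyNative, fxRates) {
  const rate = lookupUsdRate(currencyNative, fxRates);
  return rate === null ? null : valueNative * rate;
}
```

With an empty live map, `toUsd(434, 'SGD', {})` goes through the 0.74 fallback and lands at roughly 321, matching the Temasek end-to-end case named in the tests below.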

Tests updated

- parseWikipediaArticleInfobox tests assert the native value + ISO
  code, no longer the USD-converted amount.
- New `lookupUsdRate` suite pins the project-shared FX integration:
  USD short-circuit, live-rate preference, static fallback, unknown-
  currency null, and a Temasek S$ 434B → US$ 321B end-to-end case
  via the shared fallback table.

Live re-verification still 8/8; SGD comes through SHARED_FX_FALLBACKS
at 0.74 (same number as the deleted local table), so behavior is
identical but the dedupe is real.

Tests: 536/536 pass. Typecheck clean on both tsconfig configs.

* refactor(resilience): split SWF manifest validator into sub-helpers

Biome reported validateManifest at complexity 55 vs max 50. Extracted
the per-fund validation into validateFundEntry(raw, idx, seen) and
pulled out validateClassification, validateRationale, validateSources,
validateWikipediaHints as separate helpers. Behavior and tests are
unchanged; each helper is now well under the complexity cap and the
main validator reads linearly.

Tests: 42/42 manifest + scraper tests pass. Typecheck clean.

* fix(resilience): PR 2 review — partial-seed guard + manifest REVIEWED status

Addresses two P1 findings on PR #3305.

P1.1 — partial-seed silent corruption on multi-fund countries
-------------------------------------------------------------
For multi-fund countries (AE = ADIA + Mubadala, SG = GIC + Temasek)
the previous aggregation silently published a partial
totalEffectiveMonths if a secondary fund's scraper drifted on
Wikipedia — recordCount would still look green because we counted
"any fund matched" as a successful country-seed. Downstream scorer
would under-rank those countries with no missingness signal.

Fix:
- Each country entry now carries { expectedFunds, matchedFunds,
  completeness } alongside the existing totalEffectiveMonths. The
  scorer can use completeness < 1.0 to derate (treat as degraded
  coverage) rather than accept the partial number at face value.
- declareRecords counts ONLY countries with completeness === 1.0,
  so a secondary-fund drift drops the seed-meta record_count and
  triggers the operational alarm. recordCount in runSeed opts now
  delegates to declareRecords for parity.
- A warn log fires per partial country so the Railway cron log is
  loud on drift without poisoning seed-meta.
- 4 new tests pin: all-matched counts, partial drops, empty/malformed
  payloads, and defensive handling of pre-completeness payload shape.
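
A minimal sketch of the completeness bookkeeping, assuming a payload shaped like `{ countries: { [iso2]: {...} } }`. The fund ids and the `summarizeCountry` helper are hypothetical, and treating an entry without a `completeness` field as not counted is an assumption about the "defensive handling" mentioned above.

```javascript
// Per-country bookkeeping: how many manifest funds were expected vs matched.
function summarizeCountry(expectedFundIds, matchedFundIds) {
  const expectedFunds = expectedFundIds.length;
  const matchedFunds = expectedFundIds.filter((id) => matchedFundIds.includes(id)).length;
  const completeness = expectedFunds === 0 ? 0 : matchedFunds / expectedFunds;
  return { expectedFunds, matchedFunds, completeness };
}

// Count only fully matched countries, so a drifted secondary fund lowers
// the record count instead of hiding behind "any fund matched".
function declareRecords(data) {
  const countries = data?.countries;
  if (countries === null || typeof countries !== 'object' || Array.isArray(countries)) return 0;
  return Object.values(countries).filter((c) => c?.completeness === 1).length;
}
```

For example, if SG expects GIC and Temasek but only GIC matched, SG carries `completeness: 0.5` and is excluded from the count, while a single-fund country like NO still counts.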

P1.2 — manifest external-reviewer language contradicted shipped workflow
------------------------------------------------------------------------
The YAML header said "External sovereign-wealth-practitioner review
is REQUIRED before PR 2 merges" and external_review_status=PENDING.
WorldMonitor's operating mode is fully automated (see memory
`feedback_no_external_reviewer_assumption.md`) — there is no external
practitioner gate. Reviewer correctly flagged the inconsistency
between the document and the shipped behaviour.

Fix:
- Rewrote the header to describe the actual audit discipline:
  coefficients derive from the committed rationale + cited sources
  for each fund; revisions require the same discipline in a follow-
  up PR. No external-gate language.
- Flipped external_review_status to REVIEWED, with a clarifying
  comment: REVIEWED = coefficients derive from the committed
  rationale + seeder end-to-end matches the live surfaces. PENDING
  remains reserved for future PRs that ship unresolved TBD
  coefficients.
- Rewrote the "candidates deferred from v1" trailing block. Each
  fund listed now has a concrete rationale for deferral (sanctions /
  access coefficient would pin at 0 / classification contested /
  AUM disclosure unstable) so a future PR author can argue the case
  on record. No "reviewer advice needed" placeholders.
- Tweaked two inline fund comments (UAE ADQ/ICD, Singapore Temasek)
  that still said "external reviewer" — now describe the substantive
  reason for inclusion/deferral.

Tests
-----
- 540/540 resilience + manifest + scraper tests pass.
- Typecheck clean on both tsconfig configs.
- Biome clean on all touched files.

* fix(resilience): PR 2 review — 4 P2 fixes (WB imports, null validate, nested table, aumYear)

Greptile P2 findings on PR #3305, all addressed.

(1) Silent country drop on missing WB imports
---------------------------------------------
If fetchAnnualImportsUsd() has no entry for a manifest ISO-2
(transient WB outage, new country with spotty coverage), the country
was silently skipped. Downstream scorer would then read "absent from
payload" as "no SWF" and score 0 with full coverage — substantively
wrong. Now logs a warn and adds each affected fund to the unmatched
list with a `(no WB imports)` suffix so the seed-meta observer sees
the degradation.

(2) typeof null === 'object' bypassed validate()
------------------------------------------------
Bare `typeof data?.countries === 'object'` returned true for
{ countries: null } and { countries: [] }. Downstream property
access would then crash. Strict check added: non-null plain object
only; also rejects arrays. Test pins all 5 edge cases.
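
The pitfall and the strict check, sketched under the assumption that validate() only needs to confirm shape; the real validate() may check more than this.

```javascript
// typeof null === 'object' and typeof [] === 'object', so a bare typeof
// check accepts both. A plain-object check has to exclude them explicitly.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Shape-only validation: the payload and its `countries` field must both
// be non-null, non-array objects.
function validate(data) {
  return isPlainObject(data) && isPlainObject(data.countries);
}
```

`validate({ countries: null })` and `validate({ countries: [] })` are both false here, whereas the old `typeof` comparison let them through to the property access that crashed.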

(3) Nested </table> / </td> truncated wikitable parse
-----------------------------------------------------
Lazy [\s\S]*? in the outer table regex AND the inner row/cell regexes
could silently drop every row after any cell that contained a nested
mini-table (Wikipedia footnote boxes, sort helpers). Two-step fix:
  - extractFirstWikitable: depth-aware walk counts <table>/</table>
    opens and closes, returns content at balanced depth
  - stripNestedTables: iteratively removes complete inner
    <table>…</table> blocks BEFORE row parsing, so the lazy row / cell
    regexes never see a nested </tr> or </td>
Test: 5-row fixture with a nested table inside row 1's cell — ADIA
(row 2) must still parse, GPFG (row 1 with nested) must still parse.
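
The two-step approach can be sketched like this. It follows the description above but is not the shipped code; in particular the opening-tag regex assumes a double-quoted `class` attribute containing `wikitable`.

```javascript
// Step 1: find the first wikitable and return its inner HTML, counting
// nested <table>/</table> pairs so an inner close does not end the match.
function extractFirstWikitable(html) {
  const open = /<table\b[^>]*class="[^"]*\bwikitable\b[^"]*"[^>]*>/i.exec(html);
  if (!open) return null;
  const start = open.index + open[0].length;
  const tagRe = /<(\/?)table\b[^>]*>/gi;
  tagRe.lastIndex = start;
  let depth = 1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[1] === '/' ? -1 : 1;
    if (depth === 0) return html.slice(start, m.index);
  }
  return null; // unbalanced markup
}

// Step 2: repeatedly remove innermost tables (those containing no further
// <table> open tag) until none remain, so row/cell regexes see flat markup.
function stripNestedTables(inner) {
  const innermost = /<table\b[^>]*>(?:(?!<table\b)[\s\S])*?<\/table>/gi;
  let out = inner;
  let prev;
  do {
    prev = out;
    out = out.replace(innermost, ' ');
  } while (out !== prev);
  return out;
}
```

After both steps a lazy row pattern such as `/<tr\b[^>]*>[\s\S]*?<\/tr>/gi` can no longer stop at a `</tr>` that belongs to a nested table.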

(4) aumYear reflected scrape year, not data year
------------------------------------------------
List-article entries were stamped with `new Date().getFullYear()`
even though the Wikipedia list publishes no per-row data-year
annotation (figures are typically prior-period). Consumers using
aumYear for freshness audit would see "2026" for 2024/2025 data.
Now set to null for list entries; infobox tier 3b retains year
extraction from the "(YYYY)" tag on the individual fund article.

P1 bootstrap deferral: intentional per project memory
-----------------------------------------------------
AGENTS.md says new data sources MUST wire api/bootstrap.js. Not done
in this PR by design:
  - No RPC consumer exists yet for
    `resilience:recovery:sovereign-wealth:v1` (scorer lands in a
    follow-up PR; wiring bootstrap without a consumer would be dead
    code).
  - Local memory `feedback_health_required_key_needs_railway_cron_
    first.md` requires new seed keys to sit in ON_DEMAND_KEYS for
    ~7 days of clean Railway cron before promoting to
    BOOTSTRAP_KEYS — adding bootstrap wiring now would pre-empt
    that window and risk CRIT alarms on the health surface.
The scorer PR that follows will land the bootstrap wiring + the
dimension at the same time, which is the cohesive unit.

Tests: 547/547 resilience + manifest + scraper tests pass.
Typecheck clean on both tsconfig configs. Biome clean on touched
files.
2026-04-23 07:58:40 +04:00
Elie Habib
84ee2beb3e feat(energy): Energy Atlas end-to-end — pipelines + storage + shortages + disruptions + country drill-down (#3294)
* feat(energy): pipeline registries (gas + oil) — evidence-based schema

Day 6 of the Energy Atlas Release 1 plan (Week 2). First curated asset
registry for the atlas — the real gap vs GEF.

## Curated data (critical assets only, not global completeness)

scripts/data/pipelines-gas.json — 12 critical gas lines:
  Nord Stream 1/2 (offline; Swedish EEZ sabotage 2022; EU sanctions refs),
  TurkStream, Yamal–Europe (offline; Polish counter-sanctions),
  Brotherhood/Soyuz (offline; Ukraine transit expired 2024-12-31),
  Power of Siberia, Dolphin, Medgaz, TAP, TANAP,
  Central Asia–China, Langeled.

scripts/data/pipelines-oil.json — 12 critical oil lines:
  Druzhba North/South (N offline per EU 2022/879; S under landlocked
  derogation), CPC, ESPO (+ price-cap sanction ref), BTC, TAPS,
  Habshan–Fujairah (Hormuz bypass), Keystone, Kirkuk–Ceyhan (offline
  since 2023 ICC ruling), Baku–Supsa, Trans-Mountain (TMX expansion
  May 2024), ESPO spur to Daqing.

Scope note: 75+ each is Week 2b work via GEM bulk import. Today's cut
is curated from first-hand operator disclosures + regulator filings so
I can stand behind every evidence field.

## Evidence-based schema (not conclusion labels)

Per docs/methodology/pipelines.mdx: no bare `sanctions_blocked` field.
Every pipeline carries an evidence bundle with `physicalState`,
`physicalStateSource`, `operatorStatement`, `commercialState`,
`sanctionRefs[]`, `lastEvidenceUpdate`, `classifierVersion`,
`classifierConfidence`. The public badge (`flowing|reduced|offline|
disputed`) is derived server-side from this bundle at read time.

## Seeder

scripts/seed-pipelines.mjs — single process publishes BOTH keys
(energy:pipelines:{gas,oil}:v1) via two runSeed() calls. Tiny datasets
(<20KB each) so co-location is cheap and guarantees classifierVersion
consistency.

Conventions followed (worldmonitor-bootstrap-registration skill):
- TTL 21d = 3× weekly cadence (gold-standard per
  feedback_seeder_gold_standard.md)
- maxStaleMin 20_160 = 2× cadence (health-maxstalemin-write-cadence skill)
- sourceVersion + schemaVersion + recordCount + declareRecords wired
  (seed-contract-foundation)
- Zero-case explicitly NOT allowed — MIN_PIPELINES_PER_REGISTRY=8 floor

## Health registration (dual, per feedback_two_health_endpoints_must_match)

- api/health.js: BOOTSTRAP_KEYS adds pipelinesGas + pipelinesOil;
  SEED_META adds both with maxStaleMin=20_160.
- api/seed-health.js: mirror entries with intervalMin=10_080 (maxStaleMin/2).

## Bundle registration

scripts/seed-bundle-energy-sources.mjs adds a single Pipelines entry
(not two) because seed-pipelines.mjs publishes both keys in one run —
listing oil separately would double-execute. Monitoring of the oil key
staleness happens in api/health.js instead.

## Tests (tests/pipelines-registry.test.mts)

17 passing node:test assertions covering:
- Schema validation (both registries pass validateRegistry)
- Identity resolution (no id collisions, id matches object key)
- Country ISO2 normalization (from/to/transit all match /^[A-Z]{2}$/)
- Endpoint geometry within Earth bounds
- Evidence rigor: non-flowing badges require at least one supporting
  evidence source (operator statement / sanctionRefs / ais-relay /
  satellite / press)
- ClassifierConfidence in 0..1
- Commodity/capacity pairing (gas uses capacityBcmYr, oil uses
  capacityMbd — mixing = test fail)
- validateRegistry rejects: empty object, null, no-evidence fixtures,
  below-floor counts
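
To make a few of those checks concrete, here is a simplified validator sketch. The registry shape (a flat id-keyed object with an `evidence` sub-object and `fromCountry`/`toCountry` fields) is an assumption, and `physicalState` stands in for the derived public badge; the shipped validateRegistry covers more than this.

```javascript
const MIN_PIPELINES_PER_REGISTRY = 8;
const ISO2 = /^[A-Z]{2}$/;
// Evidence source kinds named in the test description above.
const SOURCE_EVIDENCE = new Set(['ais-relay', 'satellite', 'press']);

function hasSupportingEvidence(ev) {
  return (
    Boolean(ev.operatorStatement) ||
    (Array.isArray(ev.sanctionRefs) && ev.sanctionRefs.length > 0) ||
    SOURCE_EVIDENCE.has(ev.physicalStateSource)
  );
}

// Rejects null / arrays / below-floor registries, id-key mismatches,
// non-ISO2 countries, out-of-range confidence, and non-flowing entries
// that carry no supporting evidence.
function validateRegistry(registry, floor = MIN_PIPELINES_PER_REGISTRY) {
  if (registry === null || typeof registry !== 'object' || Array.isArray(registry)) return false;
  const entries = Object.entries(registry);
  if (entries.length < floor) return false;
  return entries.every(([key, p]) => {
    const ev = p?.evidence;
    if (!ev || p.id !== key) return false;
    if (!ISO2.test(p.fromCountry) || !ISO2.test(p.toCountry)) return false;
    const c = ev.classifierConfidence;
    if (!(typeof c === 'number' && c >= 0 && c <= 1)) return false;
    return ev.physicalState === 'flowing' || hasSupportingEvidence(ev);
  });
}
```

The floor is a parameter here only so small fixtures can exercise the other rules; with the default, an empty object fails on count alone.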

Typecheck clean (both tsconfig.json and tsconfig.api.json).

Next: Day 7 will add list-pipelines / get-pipeline-detail RPCs in
supply-chain/v1. Day 8 ships PipelineStatusPanel with DeckGL PathLayer
consuming the registry.

* fix(energy): split seed-pipelines.mjs into two entry points — runSeed hard-exits

High finding from PR review. scripts/seed-pipelines.mjs called runSeed()
twice in one process and awaited Promise.all. But runSeed() in
scripts/_seed-utils.mjs hard-exits via process.exit on ~9 terminal paths
(lines 816, 820, 839, 888, 917, 989, plus fetch-retry 946, fatal 859,
skipped-lock 81). The first runSeed to reach any terminal path exits the
entire node process, so the second runSeed's resolve never fires — only
one of energy:pipelines:{gas,oil}:v1 would ever be written.

Since the bundle scheduled seed-pipelines.mjs exactly once, and both
api/health.js and api/seed-health.js expect both keys populated, the
other registry would stay permanently EMPTY/STALE after deploy.

Fix: split into two entry-point scripts around a shared utility.

- scripts/_pipeline-registry.mjs (NEW, was seed-pipelines.mjs) — shared
  helpers ONLY. Exports GAS_CANONICAL_KEY, OIL_CANONICAL_KEY,
  PIPELINES_TTL_SECONDS, MAX_STALE_MIN, buildGasPayload, buildOilPayload,
  validateRegistry, recordCount, declareRecords. Underscore prefix marks
  it as non-entry-point (matches _seed-utils.mjs / _seed-envelope-source.mjs
  convention).
- scripts/seed-pipelines-gas.mjs (NEW) — imports from the shared module,
  single runSeed('energy','pipelines-gas',…) call.
- scripts/seed-pipelines-oil.mjs (NEW) — same shape, oil.
- scripts/seed-bundle-energy-sources.mjs — register BOTH seeders (not one).
- scripts/seed-pipelines.mjs — deleted.
- tests/pipelines-registry.test.mts — update import path to the shared
  module. All 17 tests still pass.

Typecheck clean (both configs). Tests pass. No other consumers import
from the deleted script.

* fix(energy): complete pipeline bootstrap registration per 4-file checklist

High finding from PR review. My earlier PR description claimed
worldmonitor-bootstrap-registration was complete, but I only touched two
of the four registries (api/health.js + api/seed-health.js). The bootstrap
hydration payload itself (api/bootstrap.js) and the shared cache-keys
registry (server/_shared/cache-keys.ts) still had no entry for either
pipeline key, so any consumer that reads bootstrap data would see
pipelinesGas/pipelinesOil as missing on first load.

Files updated this commit:

- api/bootstrap.js — KEYS map + SLOW_KEYS set both gain pipelinesGas +
  pipelinesOil. Placed next to sprPolicies (same curated-registry cadence
  and tier). Slow tier is correct: weekly cron, not needed on first paint.
- server/_shared/cache-keys.ts — PIPELINES_GAS_KEY + PIPELINES_OIL_KEY
  exported constants (matches SPR_POLICIES_KEY pattern), BOOTSTRAP_KEYS map
  entries, and BOOTSTRAP_TIERS entries (both 'slow').

Not touched (intentional):
- server/gateway.ts — pipeline data is free-tier per the Energy Atlas
  plan; no PREMIUM_RPC_PATHS entry required. Energy Atlas monetization
  hooks (scenario runner, MCP tools, subscriptions) are Release 2.

Full 4-file checklist now complete:
- server/_shared/cache-keys.ts (this commit)
- api/bootstrap.js (this commit)
- api/health.js (earlier in PR)
- api/seed-health.js (earlier in PR, per the dual-registry rule)

Typecheck clean (both configs).

* feat(energy): ListPipelines + GetPipelineDetail RPCs with evidence-derived badges

Day 7 of the Energy Atlas Release 1 plan (Week 2). Exposes the pipeline
registries (shipped in Day 6) via two supply-chain RPCs and ships the
evidence-to-badge derivation server-side.

## Proto

proto/worldmonitor/supply_chain/v1/list_pipelines.proto — new:
- ListPipelinesRequest { commodity_type?: 'gas' | 'oil' }
- ListPipelinesResponse { pipelines[], fetched_at, classifier_version, upstream_unavailable }
- GetPipelineDetailRequest { pipeline_id (required, query-param) }
- GetPipelineDetailResponse { pipeline?, revisions[], fetched_at, unavailable }
- PipelineEntry — wire shape mirroring scripts/data/pipelines-{gas,oil}.json
  + a server-derived public_badge field
- PipelineEvidence, OperatorStatement, SanctionRef, LatLon, PipelineRevisionEntry

service.proto adds both rpc methods with HTTP_METHOD_GET + path bindings:
  /api/supply-chain/v1/list-pipelines
  /api/supply-chain/v1/get-pipeline-detail

`make generate` regenerated src/generated/{client,server}/… + docs/api/
OpenAPI json/yaml.

## Evidence-derivation

server/worldmonitor/supply-chain/v1/_pipeline-evidence.ts — new.
derivePublicBadge(evidence) → 'flowing' | 'reduced' | 'offline' | 'disputed'
is deterministic + versioned (DERIVER_VERSION='badge-deriver-v1').

Rules (first match wins):
1. offline + sanctionRef OR expired/suspended commercial → offline
2. offline + operator statement → offline
3. offline + only press/ais/satellite → disputed (single-source negative claim)
4. reduced → reduced
5. flowing → flowing
6. unknown / malformed → disputed

Staleness guard: non-flowing badges on >14d-old evidence demote to
disputed. Flowing is the optimistic default — stale "still flowing" is
safer than stale "offline". Matches seed-pipelines-{gas,oil}.mjs maxStaleMin.
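
A minimal sketch of the first-match-wins rules and the staleness guard
described above. The evidence field names (physicalState, commercialState,
sanctionRefs, operatorStatement, lastEvidenceUpdate) and the choice to
treat a missing timestamp as stale are assumptions for illustration, not
the shipped PipelineEvidence shape.

```typescript
type PublicBadge = 'flowing' | 'reduced' | 'offline' | 'disputed';

// Hypothetical evidence shape; the real PipelineEvidence fields may differ.
interface EvidenceSketch {
  physicalState?: string;
  commercialState?: string;
  sanctionRefs?: unknown[];
  operatorStatement?: unknown;
  lastEvidenceUpdate?: string; // ISO timestamp
}

const STALENESS_WINDOW_MS = 14 * 24 * 60 * 60 * 1000; // 14 days

// Rules 1-6, first match wins, ignoring evidence age.
function rawBadgeSketch(ev: EvidenceSketch): PublicBadge {
  const hasSanction = Array.isArray(ev.sanctionRefs) && ev.sanctionRefs.length > 0;
  const paperwork =
    hasSanction || ev.commercialState === 'expired' || ev.commercialState === 'suspended';
  switch (ev.physicalState) {
    case 'offline':
      if (paperwork) return 'offline';            // rule 1
      if (ev.operatorStatement) return 'offline'; // rule 2
      return 'disputed';                          // rule 3: external signals only
    case 'reduced':
      return 'reduced';                           // rule 4
    case 'flowing':
      return 'flowing';                           // rule 5
    default:
      return 'disputed';                          // rule 6
  }
}

function deriveBadgeSketch(ev: EvidenceSketch | null | undefined, nowMs: number): PublicBadge {
  if (!ev || typeof ev !== 'object') return 'disputed'; // malformed input never throws
  const badge = rawBadgeSketch(ev);
  // Flowing is exempt from the guard; disputed has nothing left to demote.
  if (badge === 'flowing' || badge === 'disputed') return badge;
  const updatedMs = Date.parse(ev.lastEvidenceUpdate || '');
  // Assumption: a missing or unparseable timestamp counts as stale.
  const stale = isNaN(updatedMs) || nowMs - updatedMs > STALENESS_WINDOW_MS;
  return stale ? 'disputed' : badge;
}
```

Passing the clock in as nowMs keeps the function pure, which is what lets
the same input produce the same badge on every caller.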

Tests (tests/pipeline-evidence-derivation.test.mts) — 15 passing cases
covering happy paths, disputed fallbacks, staleness guard, versioning.

## Handlers

server/worldmonitor/supply-chain/v1/list-pipelines.ts
- Reads energy:pipelines:{gas,oil}:v1 via getCachedJson.
- projectPipeline() narrows the Upstash `unknown` into PipelineEntry
  shape + calls derivePublicBadge.
- Honors the commodity_type filter (skips the opposite registry's Redis
  read when the client pre-filters).
- Returns upstream_unavailable=true when BOTH registries miss.

server/worldmonitor/supply-chain/v1/get-pipeline-detail.ts
- Scans both registries by id (ids are globally unique per
  tests/pipelines-registry.test.mts).
- Empty revisions[] for now; auto-revision log wires up in Week 3.

handler.ts registers both into supplyChainHandler.

## Gateway

server/gateway.ts adds 'static' cache-tier for both new RPC paths
(registry is slow-moving; 'static' matches the other read-mostly
supply-chain endpoints).

## Consumer wiring

Not in this commit — PipelineStatusPanel (Day 8) is what will call
listPipelines/getPipelineDetail via the generated client. pipelinesGas
+ pipelinesOil stay in PENDING_CONSUMERS until Day 8.

Typecheck clean (both configs). 15 new tests + 17 registry tests all pass.

* feat(energy): PipelineStatusPanel — evidence-backed status table + drawer

Day 8 of the Energy Atlas Release 1 plan. First consumer of the Day 6–7
registries + RPCs.

## What this PR adds

- src/components/PipelineStatusPanel.ts — new panel (id=pipeline-status).
  * Bootstrap-hydrates from pipelinesGas + pipelinesOil for instant first
    paint; falls through to listPipelines() RPC if bootstrap misses.
    Background re-fetch runs on every render so a classifier-version bump
    between bootstrap stamp and first view produces a visible update.
  * Table rows sorted non-flowing-first (offline / reduced / disputed
    before flowing) — what an atlas reader cares about.
  * Click-to-expand drawer calls getPipelineDetail() lazily — operator
    statements, sanction refs (with clickable source URLs), commercial
    state, classifier version + confidence %, capacity + route metadata.
  * publicBadge color-chip palette matches the methodology doc.
  * Attribution footer with GEM (CC-BY 4.0) credit + classifier version.

- src/components/index.ts — barrel export.
- src/app/panel-layout.ts — import + createPanel('pipeline-status', …).
- src/config/panels.ts — ENERGY_PANELS adds 'pipeline-status' at priority 1.

## PENDING_CONSUMERS cleanup

tests/bootstrap.test.mjs — removes 'pipelinesGas' + 'pipelinesOil' from
the allowlist. The invariant "every bootstrap key has a getHydratedData
consumer" now enforces real wiring for these keys: the panel literally
calls getHydratedData('pipelinesGas') and getHydratedData('pipelinesOil').
Future regressions that remove the consumer will fail pre-push.

## Consumer contract verified

- 67 tests pass including bootstrap.test.mjs consumer coverage check.
- Typecheck clean.
- No DeckGL PathLayer in this commit — existing 'pipelines-layer' has a
  separate data source, so modifying DeckGLMap.ts to overlay evidence-
  derived badges on the map is a follow-up commit to avoid clobbering.

## Out of scope for Day 8 (next steps on same PR)

- DeckGL PathLayer integration (color pipelines on the main map by
  publicBadge, click-to-open this drawer) — Day 8b commit.
- Storage facility registry + StorageFacilityMapPanel — Days 9-10.

* fix(energy): PipelineStatusPanel bootstrap path — client-side badge derivation

High finding from PR review. The Day-8 panel crashed on first paint
whenever bootstrap hydration succeeded, because:

- Bootstrap hydrates raw scripts/data/pipelines-{gas,oil}.json verbatim.
- That JSON does NOT include publicBadge — that field is only added by
  the server handler's projectPipeline() in list-pipelines.ts.
- PipelineStatusPanel passed raw entries into badgeChip(), which called
  badgeLabel(undefined).charAt(0) → TypeError.

The background RPC refresh that would have repaired the data never ran
because the panel threw before reaching it. So the exact bootstrap path
newly wired in commit 6b01fa537 was broken for the new panel.

Fix: move the evidence→badge deriver to src/shared/pipeline-evidence.ts
so the client panel and the server handler run the identical function on
identical inputs. Panel projects raw bootstrap JSON through the shared
deriver client-side, producing the same publicBadge the RPC would have
returned. No UI flicker on hydration because pre- and post-RPC badges
match exactly (same function, same input).

## Changes

- src/shared/pipeline-evidence.ts (NEW) — pure deriver with duck-typed
  PipelineEvidenceInput (no generated-type dependency, so both client
  and server assign their proto-typed evidence bundles by structural
  subtyping). Exports derivePipelinePublicBadge + version + type.
- server/worldmonitor/supply-chain/v1/_pipeline-evidence.ts — now a thin
  re-export of the shared module under its older name so in-handler
  imports keep working without a sweep.
- src/components/PipelineStatusPanel.ts:
  * Imports derivePipelinePublicBadge from @/shared/pipeline-evidence.
  * NEW projectRawPipeline() defensively coerces every field from
    unknown → PipelineEntry shape, mirroring the server projection.
  * buildBootstrapResponse now routes every raw entry through the
    projection before returning, so the wire-format PipelineEntry[] the
    renderer receives always has publicBadge populated.
  * badgeChip() gained a null-guard fallback to 'disputed' (belt and
    braces): even if a future caller passes undefined, the UI renders
    safely instead of throwing.
  * BootstrapRegistry renamed RawBootstrapRegistry with a comment
    explaining why the seeder ships raw JSON (not wire format).

## Regression tests

tests/pipeline-panel-bootstrap.test.mts (NEW) — 6 tests that exercise
the bootstrap-first-paint path end-to-end:

- Every gas + oil curated entry produces a valid badge.
- Raw entries never ship with pre-computed publicBadge (contract guard
  on the seed data format).
- Deriver never throws on undefined/null/{} evidence (was the crash).
- Nord Stream 1 regression check (offline + paperwork → offline).
- Druzhba-South staleness behavior (reduced when fresh, disputed after
  60 days without update).

38/38 tests now pass (17 registry + 15 deriver + 6 new bootstrap-path).
Typecheck clean on both configs.

## Invariant preserved

The server handler and the panel render identical badges because:
1. Same pure function (imported from the same module).
2. Same deterministic rules, same staleness window.
3. Same bootstrap data read by both paths (Redis → either bootstrap
   payload or RPC response).

No UI flicker on hydration.

* fix(energy): three PR-review P2s on PipelineStatusPanel + aggregators

## P2-1 — sanitizeUrl on external evidence links (XSS hardening)

Sanction-ref URLs and operator-statement URLs were interpolated with
escapeHtml only. HTML-escaping blocks tag injection but NOT javascript:
or data: URL schemes, so a bad URL in the seeded registry would execute
in-app when a reader clicked the evidence link. Every other panel in
the codebase (NewsPanel, GdeltIntelPanel, GeoHubsPanel, AirlineIntelPanel,
MonitorPanel) uses sanitizeUrl for this exact reason.

Fix: import sanitizeUrl from @/utils/sanitize and route both hrefs
through it. sanitizeUrl() drops non-http(s) schemes + returns '' on
invalid URLs. The renderer now suppresses the <a> entirely when
sanitize rejects — the date label still renders as plain text instead
of becoming an executable link.
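
A sketch of the behavior described above, not the actual @/utils/sanitize
implementation. sanitizeUrlSketch, escapeHtmlSketch and
renderEvidenceLinkSketch are hypothetical stand-ins that only illustrate
the contract: non-http(s) schemes and unparseable URLs become '', and the
renderer falls back to plain text when that happens.

```typescript
// Stand-in for sanitizeUrl: keep only absolute http(s) URLs.
function sanitizeUrlSketch(raw: unknown): string {
  if (typeof raw !== 'string') return '';
  try {
    const url = new URL(raw.trim());
    return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : '';
  } catch (err) {
    return ''; // not parseable as an absolute URL
  }
}

// Stand-in for escapeHtml: blocks tag/attribute injection, not URL schemes.
function escapeHtmlSketch(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Render the <a> only when the sanitizer accepts the URL; otherwise the
// label stays as plain text.
function renderEvidenceLinkSketch(label: string, rawUrl: unknown): string {
  const href = sanitizeUrlSketch(rawUrl);
  const text = escapeHtmlSketch(label);
  if (!href) return text;
  return `<a href="${escapeHtmlSketch(href)}" target="_blank" rel="noopener noreferrer">${text}</a>`;
}
```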

## P2-2 — loadDetail catch path missing stale-response guard

The success path at loadDetail() checked `this.selectedId !== pipelineId`
to suppress stale responses when the user has clicked another pipeline
mid-flight. The catch path at line 219 had no such guard: if the user
clicked A, then B, and A's request failed before B resolved, A's error
handler cleared detailLoading and detail, showing "Pipeline detail
unavailable" for B's drawer even though B was still loading.

Fix: mirror the same `if (this.selectedId !== pipelineId) return` guard
in the catch path. The newer request now owns the drawer state
regardless of which path (success OR failure) the older one took.
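
The guard can be sketched as follows. DetailDrawerSketch is a hypothetical
reduction of the panel's drawer state (selectedId, detailLoading, detail,
error); the real loadDetail() does more, but the point is that the success
and failure handlers check the same condition before touching state.

```typescript
interface DetailSketch { id: string; name: string }

class DetailDrawerSketch {
  selectedId: string | null = null;
  detailLoading = false;
  detail: DetailSketch | null = null;
  error: string | null = null;
  private fetchDetail: (id: string) => Promise<DetailSketch>;

  constructor(fetchDetail: (id: string) => Promise<DetailSketch>) {
    this.fetchDetail = fetchDetail;
  }

  async loadDetail(pipelineId: string): Promise<void> {
    // The newest click owns the drawer from here on.
    this.selectedId = pipelineId;
    this.detailLoading = true;
    this.detail = null;
    this.error = null;
    try {
      const detail = await this.fetchDetail(pipelineId);
      this.applySuccess(pipelineId, detail);
    } catch (err) {
      this.applyFailure(pipelineId);
    }
  }

  applySuccess(pipelineId: string, detail: DetailSketch): void {
    if (this.selectedId !== pipelineId) return; // stale response, ignore
    this.detail = detail;
    this.detailLoading = false;
  }

  applyFailure(pipelineId: string): void {
    if (this.selectedId !== pipelineId) return; // stale failure, ignore
    this.detail = null;
    this.detailLoading = false;
    this.error = 'Pipeline detail unavailable';
  }

  snapshot(): { loading: boolean; detail: DetailSketch | null; error: string | null } {
    return { loading: this.detailLoading, detail: this.detail, error: this.error };
  }
}
```

With the guard in both handlers, A failing after the user has clicked B
leaves B's loading state intact.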

## P2-3 — always-gas-preference aggregator for classifierVersion + fetchedAt

Three call sites (list-pipelines.ts handler, get-pipeline-detail.ts
handler, PipelineStatusPanel bootstrap projection) computed aggregate
classifier version and fetchedAt by `gas?.x || oil?.x || fallback`.
That was defensible when a single seed-pipelines.mjs wrote both keys
atomically, but fix commit 29b4ac78f split it into two separate Railway
cron entry points. Gas + oil now seed independently, so mixed-version
(gas=v1, oil=v2 during classifier rollout) and mixed-timestamp (oil
refreshed 6h after gas) windows are the EXPECTED state, not the
exceptional one. The comment in list-pipelines.ts even said "pick the
newest classifier version" but the code didn't actually compare.

Fix: add two shared helpers in src/shared/pipeline-evidence.ts —

- pickNewerClassifierVersion(a,b) — parses /^v(\d+)$/ and returns the
  higher-numbered version; falls back to lexicographic for non-v-
  prefixed values; handles single-missing inputs.
- pickNewerIsoTimestamp(a,b) — Date.parse()-compares and returns the
  later ISO; handles missing / malformed inputs gracefully.

Both server RPCs and the panel bootstrap projection now call these
helpers identically, so clients are told the truth about version +
freshness during partial rollouts.
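
A sketch of the two pickers as described above. The function names carry
a Sketch suffix because the bodies are a reconstruction from this
description, not a copy of src/shared/pipeline-evidence.ts; the tie-break
on equal inputs and the handling of one v-numbered plus one non-v value
are assumptions.

```typescript
function pickNewerClassifierVersionSketch(a?: string | null, b?: string | null): string {
  if (!a && !b) return 'v1';       // both missing: default
  if (!a) return b as string;      // single-missing: take the one present
  if (!b) return a;
  const ma = /^v(\d+)$/.exec(a);
  const mb = /^v(\d+)$/.exec(b);
  if (ma && mb) {
    // Numeric compare so v10 beats v9.
    return Number(ma[1]) >= Number(mb[1]) ? a : b;
  }
  // Non-v-numbered values: lexicographic fallback.
  return a >= b ? a : b;
}

function pickNewerIsoTimestampSketch(a?: string | null, b?: string | null): string {
  const ta = Date.parse(a || '');
  const tb = Date.parse(b || '');
  const okA = !isNaN(ta);
  const okB = !isNaN(tb);
  if (okA && okB) return ta >= tb ? (a as string) : (b as string);
  if (okA) return a as string;
  if (okB) return b as string;
  return '';                       // both missing or malformed
}
```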

## Tests

Extended tests/pipeline-evidence-derivation.test.mts with 8 new
assertions covering both pickers:

- Higher v-number wins regardless of order (v1 vs v2 → v2 both ways)
- Single-missing falls back to the one present
- Missing + missing → default 'v1' for version / '' for ts
- Non-v-numbered values fall back to lexicographic
- Explicit regression: "gas=v1 + oil=v2 during rollout" returns v2
- Explicit regression: "oil fresher than gas" returns the oil timestamp

38 → 46 tests. All pass. Typecheck clean on both configs.

* feat(energy): DeckGL PathLayer colored by evidence-derived badge + map↔panel link

Day 8b of the Energy Atlas plan. Pipelines now render on the main
DeckGL map of the energy variant colored by their derived publicBadge,
and clicking a pipeline on the map opens the same evidence drawer the
panel row-click opens.

## Why this commit

Day 8 shipped the PipelineStatusPanel as a table + drawer view. Even
with the reviewer findings fixed (149d33ec3 + db52965cd), a table-only
pipeline view is a weak product compared to the map-centric atlas it's
meant to rival. The map-layer differentiation is the whole point of the
feature.

## What this adds

src/components/DeckGLMap.ts:
- New createEnergyPipelinesLayer() — reads hydrated pipeline registries
  via getHydratedData, projects raw JSON through the shared deriver
  (src/shared/pipeline-evidence.ts), renders a DeckGL PathLayer colored
  by publicBadge:
    flowing  → green (46,204,113)
    reduced  → amber (243,156,18)
    offline  → red   (231,76,60)
    disputed → purple (155,89,182)
  Offline + disputed get thicker strokes (3px vs 2px) for at-a-glance
  surfacing of disrupted assets. Geometry comes from raw startPoint +
  waypoints[] + endPoint per asset (straight line when no waypoints).
- Branching at line ~1498: SITE_VARIANT === 'energy' routes to the
  new method; other variants keep the static PIPELINES config (colored
  by oil/gas type). Existing commodity/finance/full map layers are
  untouched — no cross-variant leakage.
- onClick handler emits `energy:open-pipeline-detail` as a window
  CustomEvent with { pipelineId }. Loose coupling: the map doesn't
  import the panel, the panel doesn't import the map.
- Fallback: if bootstrap hasn't hydrated yet, createEnergyPipelinesLayer
  falls back to the static createPipelinesLayer() so the pipelines
  toggle always shows *something*.
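
The badge styling and geometry assembly listed above can be sketched as
pure helpers. The RGB values and stroke widths are the ones stated in this
commit; the point shape ({ lat, lon }) and the helper names are
assumptions, and the DeckGL PathLayer wiring itself is left out.

```typescript
type BadgeSketch = 'flowing' | 'reduced' | 'offline' | 'disputed';
type Rgb = [number, number, number];
type LonLat = [number, number];

// Hypothetical point shape; the registry's real field names may differ.
interface PointSketch { lat: number; lon: number }
interface PipelineGeometrySketch {
  startPoint: PointSketch;
  endPoint: PointSketch;
  waypoints?: PointSketch[];
}

const BADGE_COLORS: Record<BadgeSketch, Rgb> = {
  flowing: [46, 204, 113],
  reduced: [243, 156, 18],
  offline: [231, 76, 60],
  disputed: [155, 89, 182],
};

function badgeColor(badge: BadgeSketch): Rgb {
  return BADGE_COLORS[badge];
}

// Offline + disputed get the thicker stroke.
function strokeWidthPx(badge: BadgeSketch): number {
  return badge === 'offline' || badge === 'disputed' ? 3 : 2;
}

// start + waypoints[] + end, in [lon, lat] order as DeckGL positions expect.
// With no waypoints this is a two-point straight line.
function buildPipelinePath(g: PipelineGeometrySketch): LonLat[] {
  const mid = Array.isArray(g.waypoints) ? g.waypoints : [];
  return [g.startPoint]
    .concat(mid, [g.endPoint])
    .map((p) => [p.lon, p.lat] as LonLat);
}
```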

src/components/PipelineStatusPanel.ts:
- Constructor registers a window event listener for
  'energy:open-pipeline-detail' → calls this.loadDetail(pipelineId) →
  drawer opens on the clicked asset. Map click and row click converge
  on the same drawer, same evidence view.
- destroy() removes the listener to prevent ghost handlers after panel
  unmount.

## Guarantees

- Bootstrap parity: the DeckGL layer calls the SAME derivePipelinePublicBadge
  as the panel and the server handler, so the map color, the table row
  chip, and the RPC response all agree on the badge. No flicker, no
  drift, no confused user.
- Variant isolation: only SITE_VARIANT === 'energy' triggers the new
  path. Commodity / finance / full map layers untouched.
- No cross-component import: the panel doesn't reference the map class
  and vice versa. The event contract is the only coupling — testable,
  swappable, tauri-safe (guarded with `typeof window !== 'undefined'`).

Typecheck clean. PR #3294 now has 8 commits.

Follow-up backlog:
- Add waypoints[] to the curated pipelines-{gas,oil}.json so the map
  draws real routes instead of straight lines (cosmetic; does not
  affect correctness).
- Tooltip case in the picking tooltip registry (line ~3748) so hover
  shows "Nord Stream 1 · OFFLINE" before click.

* fix(energy): three PR-review findings on Day 8b DeckGL integration

## P1 — getHydratedData single-use race between map + panel

src/services/bootstrap.ts:34 — `if (val !== undefined) hydrationCache.delete(key);`
The helper drains its slot on first read. Day 8 (PipelineStatusPanel) and
Day 8b (createEnergyPipelinesLayer) BOTH call getHydratedData('pipelinesGas')
and getHydratedData('pipelinesOil') — whoever renders first drains the cache
and forces the loser onto its fallback path (panel → RPC, map → static
PIPELINES layer). The commit's "shared bootstrap-backed data" guarantee
did not actually hold.

Fix: new src/shared/pipeline-registry-store.ts that reads once and memoizes.
Both consumers read through getCachedPipelineRegistries() — same data, same
reference, unlimited re-reads. When the panel's background RPC fetch lands,
it calls setCachedPipelineRegistries() to back-propagate fresh data into
the store so the map's next re-render sees the newer classifierVersion +
fetchedAt too (no map/panel drift during classifier rollouts).

Test-only injection hook (__setBootstrapReaderForTests) makes the drain-once
semantics observable without a real bootstrap payload.
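
A sketch of the read-once, read-many store, assuming a factory with an
injected reader in place of the module-level getHydratedData import and
the __setBootstrapReaderForTests hook. The factory and its get/set method
names are hypothetical; the exported functions in
pipeline-registry-store.ts are getCachedPipelineRegistries() and
setCachedPipelineRegistries().

```typescript
type BootstrapReaderSketch = (key: string) => unknown;
interface PipelineRegistriesSketch { gas: unknown; oil: unknown }

function createPipelineRegistryStoreSketch(readOnce: BootstrapReaderSketch) {
  const cache: PipelineRegistriesSketch = { gas: undefined, oil: undefined };
  let drained = false;
  return {
    // Drains the single-use bootstrap slots at most once, then serves the
    // memoized object to every consumer.
    get(): PipelineRegistriesSketch {
      if (!drained) {
        drained = true;
        // Slots already filled by an RPC response are not read at all.
        if (cache.gas === undefined) cache.gas = readOnce('pipelinesGas');
        if (cache.oil === undefined) cache.oil = readOnce('pipelinesOil');
      }
      return cache;
    },
    // RPC back-propagation: a partial update preserves the other commodity.
    set(update: { gas?: unknown; oil?: unknown }): void {
      if (update.gas !== undefined) cache.gas = update.gas;
      if (update.oil !== undefined) cache.oil = update.oil;
    },
  };
}
```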

## P2 — pipelines-layer tooltip regresses to blank label on energy variant

src/components/DeckGLMap.ts:3748 (pipelines-layer tooltip case) still assumed
the static-config shape (obj.type). The new energy layer emits objects with
commodityType + badge fields, so the tooltip's type-ternary fell through to
the generic fallback — hover rendered " pipeline" (empty leading commodity)
instead of "Nord Stream 1 · OFFLINE".

Fix: differentiate by presence of obj.badge (only the energy layer sets it).
On the energy variant, tooltip now reads name + commodity + badge. Static-
config variants (commodity / finance / full) keep their existing format
unchanged.

## P2 — createEnergyPipelinesLayer dropped highlightedAssets behavior

The static createPipelinesLayer() reads this.highlightedAssets.pipeline and
threads it into getColor / getWidth with an updateTrigger on the signature.
Any caller using flashAssets('pipeline', [...]) or highlightAssets([...])
gets a visible red-outline flash on the matching paths. My Day 8b energy
layer ignored the set entirely — those APIs silently no-op'd on the energy
variant.

Fix: createEnergyPipelinesLayer() now reads the same highlight set, applies
HIGHLIGHT_COLOR + wider stroke to matching IDs, and wires
updateTriggers: { getColor: sig, getWidth: sig } so DeckGL actually
recomputes when the set changes.

Also removed the unnecessary layerCache.set() in the energy path: the
store can update via RPC back-propagation, and a cache keyed only on
highlight-signature would serve stale data. With ~25 critical-asset
pipelines, rebuild per render is trivial.

## Tests

tests/pipeline-registry-store.test.mts (NEW) — 5 tests covering the
drain-once read-many invariant: multiple consumers get cached data
without re-draining, RPC back-propagation updates the source, partial
updates preserve the other commodity, and pure RPC-first (no bootstrap)
works without invoking the reader.

All 51 PR tests pass. Typecheck clean on both configs.

* feat(energy): Day 9 — storage facility registry (UGS + SPR + LNG + crude hubs)

Ships 21 critical strategic storage facilities as a curated registry, same
evidence-bundle pattern as the pipeline registries in Day 7/8:

- scripts/data/storage-facilities.json — 4 UGS + 4 SPR + 6 LNG export +
  3 LNG import + 4 crude tank farms. Each carries physicalState +
  sanctionRefs + classifierVersion/Confidence + fillDisclosed/fillSource.
- scripts/_storage-facility-registry.mjs — shared helpers (validator,
  builder, canonical key, MAX_STALE_MIN). Validator enforces facility-type
  × capacity-unit pairing (ugs→TWh, spr/tank-farm→Mb, LNG→Mtpa) and the
  non-operational badge ⇒ evidence invariant.
- scripts/seed-storage-facilities.mjs — single runSeed entry (only one
  key, so no split-seeder dance needed).
- Registered in the 4-file bootstrap checklist: cache-keys.ts
  (STORAGE_FACILITIES_KEY + BOOTSTRAP_CACHE_KEYS + BOOTSTRAP_TIERS),
  api/bootstrap.js (KEYS + SLOW_KEYS), api/health.js (BOOTSTRAP_KEYS +
  SEED_META, 14d threshold = 2× weekly cron), api/seed-health.js (mirror).
- tests/bootstrap.test.mjs PENDING_CONSUMERS adds storageFacilities —
  Day 10 StorageFacilityMapPanel will remove it.
- tests/storage-facilities-registry.test.mts — 20 tests covering schema,
  identity, geometry, type×capacity pairing, evidence contract, and
  negative-input validator rejection.
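
The facility-type × capacity-unit pairing enforced by the validator could
look roughly like this. The type identifiers (ugs, spr, crude_tank_farm,
lng_export, lng_import) and the facilityType / capacityUnit field names
are assumptions; only the pairing itself (ugs→TWh, spr/tank-farm→Mb,
LNG→Mtpa) comes from this commit.

```typescript
// Hypothetical type ids mapped to the unit the validator expects.
const EXPECTED_CAPACITY_UNIT: Record<string, string> = {
  ugs: 'TWh',
  spr: 'Mb',
  crude_tank_farm: 'Mb',
  lng_export: 'Mtpa',
  lng_import: 'Mtpa',
};

interface FacilitySketch { id: string; facilityType: string; capacityUnit: string }

// Returns one message per violation; an empty array means the pairing holds.
function capacityUnitErrors(facilities: FacilitySketch[]): string[] {
  const errors: string[] = [];
  for (const f of facilities) {
    const known = Object.prototype.hasOwnProperty.call(EXPECTED_CAPACITY_UNIT, f.facilityType);
    if (!known) {
      errors.push(`${f.id}: unknown facilityType "${f.facilityType}"`);
      continue;
    }
    const expected = EXPECTED_CAPACITY_UNIT[f.facilityType];
    if (f.capacityUnit !== expected) {
      errors.push(`${f.id}: ${f.facilityType} expects ${expected}, got ${f.capacityUnit}`);
    }
  }
  return errors;
}
```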

Registry fields are slow-moving; badge derivation happens at read-time
server-side once the RPC handler lands in Day 10 (panel + deckGL
ScatterplotLayer). Seeded data is live in Redis from this commit so the
Day 10 PR only adds display surfaces.

Tests: 56 pass (36 prior + 20 new). Typecheck + typecheck:api clean.

* feat(energy): Day 10 — storage atlas (ListStorageFacilities RPC + DeckGL ScatterplotLayer + panel)

End-to-end wiring for the strategic storage registry seeded in Day 9. Same
pattern as the pipeline shipping path (Days 7+8+8b): proto → handler →
shared evidence deriver → panel → DeckGL map layer, with a shared
read-once store keeping map + panel aligned.

Proto + generated code:
- list_storage_facilities.proto: ListStorageFacilities +
  GetStorageFacilityDetail messages with StorageFacilityEntry,
  StorageEvidence, StorageSanctionRef, StorageOperatorStatement,
  StorageLatLon, StorageFacilityRevisionEntry.
- service.proto wires both RPCs under /api/supply-chain/v1.
- make generate → regenerated client + server stubs + OpenAPI.

Server handlers:
- src/shared/storage-evidence.ts: shared pure deriver. Duck-typed input
  interface avoids generated-type deps; identical rules to the pipeline
  deriver (sanction/commercial paperwork vs external-signal-only offline,
  14d staleness window, version pin).
- _storage-evidence.ts: thin re-export for server handler import ergonomics.
- list-storage-facilities.ts: reads STORAGE_FACILITIES_KEY from Upstash,
  projects raw → wire format, attaches derived publicBadge, filters by
  optional facilityType query arg.
- get-storage-facility-detail.ts: single-asset lookup for drawer.
- handler.ts registers both new methods.
- gateway.ts: both routes → 'static' cache tier (registry is near-static).

Panel + map:
- src/shared/storage-facility-registry-store.ts: drain-once memo mirroring
  pipeline-registry-store. Both panel and DeckGL layer read through this
  so the single-use getHydratedData drain doesn't race between consumers.
  RPC back-propagation via setCachedStorageFacilityRegistry() keeps map ↔
  panel on the same classifierVersion during rollouts.
- StorageFacilityMapPanel.ts: table + evidence drawer. Bootstrap hot path
  projects raw registry through same deriver as server so first-paint
  badge matches post-RPC badge (no flicker). sanitizeUrl + stale-response
  guards (success + catch paths) carried over from PipelineStatusPanel.
- DeckGLMap.ts createEnergyStorageLayer(): ScatterplotLayer keyed on
  badge color; log-scale radius (6km–26km) keeps Rehden visible next to
  Ras Laffan. Click dispatches 'energy:open-storage-facility-detail' —
  panel listens and opens its drawer (loose coupling, no direct refs).
- Tooltip branch on storage-facilities-layer shows facility type, country,
  capacity unit, and badge.
- Added 'storageFacilities' optional field to MapLayers type (optional so
  existing variant literals across commodity/finance/tech/happy/full/etc.
  don't need touching). Wired into LAYER_REGISTRY + VARIANT_LAYER_ORDER.energy
  + ENERGY_MAP_LAYERS + ENERGY_MOBILE_MAP_LAYERS. Panel entry added to
  ENERGY_PANELS + panel-layout createPanel. PENDING_CONSUMERS entry from
  Day 9 removed — panel + map layer are now real consumers.
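
For the log-scale radius mentioned for createEnergyStorageLayer(), one
possible mapping is sketched below. Only the 6km–26km range comes from
this commit; the function name, the min/max capacity parameters, and the
clamping are assumptions, and how the real layer normalizes across
TWh / Mb / Mtpa is not shown here.

```typescript
// Map a capacity onto [minRadiusM, maxRadiusM] by interpolating in log space,
// so a facility 100x larger is drawn bigger but not 100x bigger.
function logScaleRadiusMeters(
  capacity: number,
  minCapacity: number,
  maxCapacity: number,
  minRadiusM: number = 6000,
  maxRadiusM: number = 26000,
): number {
  // Degenerate inputs fall back to the smallest visible dot.
  if (!(capacity > 0) || !(minCapacity > 0) || !(maxCapacity > minCapacity)) {
    return minRadiusM;
  }
  const t =
    (Math.log(capacity) - Math.log(minCapacity)) /
    (Math.log(maxCapacity) - Math.log(minCapacity));
  const clamped = Math.min(1, Math.max(0, t));
  return minRadiusM + clamped * (maxRadiusM - minRadiusM);
}
```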

Tests:
- storage-evidence-derivation.test.mts (17 tests): every curated
  facility yields a valid badge, null/malformed input never throws,
  offline sanction/commercial/operator rules, external-signal-only offline
  → disputed, staleness demotion.
- storage-facility-registry-store.test.mts (4 tests): drain-once, no-data
  drain, RPC update, pure-RPC-first path.

All 6,426 unit tests pass. Typecheck + typecheck:api clean. Pre-existing
src-tauri/sidecar/ test failure is unrelated (no diff touches src-tauri/).

* feat(energy): Day 11 — fuel-shortage registry schema + seed + RPC (classifier post-launch)

Ships v1 of the global fuel-shortage alert registry. Severity is the
CLASSIFIER OUTPUT (confirmed/watch), not a client derivation — we ship
the evidence alongside so readers can audit the grounds. v1 is seeded
from curated JSON; post-launch the proactive-intelligence classifier
(Day 12 work) extends the same key directly.

Data:
- scripts/data/fuel-shortages.json — 15 known active shortages
  (PK, LK, NG×2, CU, VE, LB, ZW, AR, IR, BO, KE, PA, EG, BY)
  spanning petrol/diesel/jet across confirmed + watch tiers. Each entry
  carries evidenceSources[] (regulator/operator/press), firstSeen,
  lastConfirmed, resolvedAt, impactTypes[], causeChain[], classifier
  version + confidence. Confirmed severity enforces authoritative
  evidence at schema level.

Seeder:
- scripts/_fuel-shortage-registry.mjs — shared validator (enforces
  iso2 country, enum products/severities/impacts/causes, authoritative
  evidence for confirmed). MIN_SHORTAGES=10.
- scripts/seed-fuel-shortages.mjs — single runSeed entry.
- Registered in seed-bundle-energy-sources.mjs at DAY cadence (shortages
  move faster than registry assets).

Bootstrap 4-file registration:
- cache-keys.ts: FUEL_SHORTAGES_KEY + BOOTSTRAP_CACHE_KEYS + BOOTSTRAP_TIERS.
- api/bootstrap.js: KEYS + SLOW_KEYS.
- api/health.js: BOOTSTRAP_KEYS + SEED_META (2880min = 2× daily cron).
- api/seed-health.js: mirrors intervalMin=1440.

Proto + RPC:
- list_fuel_shortages.proto: ListFuelShortages (country/product/severity
  query facets) + GetFuelShortageDetail messages with FuelShortageEntry,
  FuelShortageEvidence, FuelShortageEvidenceSource.
- service.proto wires both new RPCs under /api/supply-chain/v1.
- list-fuel-shortages.ts handler projects raw → wire format, supports
  server-side country/product/severity filtering.
- get-fuel-shortage-detail.ts single-shortage lookup.
- handler.ts registers both. gateway.ts: 'medium' cache-tier (daily
  classifier updates warrant moderate freshness).

Shared evidence helper:
- src/shared/shortage-evidence.ts: deriveShortageEvidenceQuality maps
  (confidence + authoritative-source count + freshness) → 'strong' |
  'moderate' | 'thin' for client-side sort/trust indicators. Does NOT
  change severity — classifier owns that decision.
- countEvidenceSources buckets sources for the drawer's "n regulator /
  m press" line.

Tests:
- tests/fuel-shortages-registry.test.mts (19 tests): schema, identity,
  enum coverage, evidence contract (confirmed → authoritative source),
  validateRegistry negative cases.
- tests/shortage-evidence.test.mts (10 tests): quality deriver edge
  cases, source bucketing.
- tests/bootstrap.test.mjs PENDING_CONSUMERS adds fuelShortages —
  FuelShortagePanel arrives Day 12 and will remove the entry.

Typecheck + typecheck:api clean. 64 tests pass.

* feat(energy): Day 12 — FuelShortagePanel + DeckGL shortage pins

End-to-end wiring of the fuel-shortage registry shipped in Day 11: panel
on the Energy variant page, ScatterplotLayer pins on the DeckGL map,
both reading through a shared single-drain store so they don't race on
the bootstrap cache.

Panel:
- src/components/FuelShortagePanel.ts — table sorted by severity (confirmed
  first) then evidence quality (strong → thin) then most-recent lastConfirmed.
  Drawer shows short description, first-seen / last-confirmed / resolved,
  impact types, cause chain, classifier version/confidence, and a typed
  evidence-source list with regulator/operator/press chips. sanitizeUrl on
  every href so classifier-ingested URLs can't render as javascript:. Same
  stale-response guards on success + catch paths as the other detail drawers.
- Consumes deriveShortageEvidenceQuality for client-side trust indicator
  (three-dot ●●● / ●●○ / ●○○), NOT for severity — severity is classifier
  output.
- Registered in ENERGY_PANELS + panel-layout.ts + components barrel.
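
The three-level table sort described for the panel can be written as a
single comparator. The row shape below is a hypothetical reduction (the
quality field stands in for the output of deriveShortageEvidenceQuality);
the ordering itself follows the description: confirmed first, then strong
→ thin, then most-recent lastConfirmed.

```typescript
type SeveritySketch = 'confirmed' | 'watch';
type QualitySketch = 'strong' | 'moderate' | 'thin';

interface ShortageRowSketch {
  id: string;
  severity: SeveritySketch;
  quality: QualitySketch;
  lastConfirmed: string; // ISO timestamp
}

const SEVERITY_RANK: Record<SeveritySketch, number> = { confirmed: 0, watch: 1 };
const QUALITY_RANK: Record<QualitySketch, number> = { strong: 0, moderate: 1, thin: 2 };

// Unparseable timestamps sort last within their group.
function toMs(iso: string): number {
  const t = Date.parse(iso);
  return isNaN(t) ? 0 : t;
}

function compareShortageRows(a: ShortageRowSketch, b: ShortageRowSketch): number {
  return (
    SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
    QUALITY_RANK[a.quality] - QUALITY_RANK[b.quality] ||
    toMs(b.lastConfirmed) - toMs(a.lastConfirmed)
  );
}
```

Usage would be rows.slice().sort(compareShortageRows), copying first so
the cached registry array is not reordered in place.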

Shared store:
- src/shared/fuel-shortage-registry-store.ts — same drain-once memoize
  pattern as pipeline- and storage-facility-registry-store. Both the
  panel and the DeckGL shortage-pins layer read through it.

DeckGL layer:
- DeckGLMap.createEnergyShortagePinsLayer: ScatterplotLayer placing one
  pin per active shortage at the country centroid (via getCountryCentroid
  from services/country-geometry). Stacking offset (~0.8° lon) when
  multiple shortages share a country so Nigeria's petrol + diesel don't
  render as a single dot. Confirmed pins 55km radius; watch 38km. Click
  dispatches 'energy:open-fuel-shortage-detail' — panel listens.
- Tooltip branch on fuel-shortages-layer: country · product · short
  description · severity.
- Layer registered in LAYER_REGISTRY, VARIANT_LAYER_ORDER.energy,
  ENERGY_MAP_LAYERS, ENERGY_MOBILE_MAP_LAYERS. MapLayers.fuelShortages
  is optional on the type so other variants' literals remain valid.
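
A sketch of the stacking offset, assuming each additional shortage in the
same country is shifted east by a fixed step. The ~0.8° step is from this
commit; the direction, the injected centroid lookup (standing in for
getCountryCentroid, whose real signature is not shown here), and the pin
shape are assumptions.

```typescript
interface ShortageRefSketch { id: string; country: string }
interface PinSketch { id: string; country: string; lat: number; lon: number }
type CentroidLookupSketch = (iso2: string) => { lat: number; lon: number } | null;

function stackShortagePins(
  shortages: ShortageRefSketch[],
  centroidOf: CentroidLookupSketch,
  stepDeg: number = 0.8,
): PinSketch[] {
  const seenPerCountry: Record<string, number> = {};
  const pins: PinSketch[] = [];
  for (const s of shortages) {
    const c = centroidOf(s.country);
    if (!c) continue; // no centroid known: skip the pin rather than guess
    const n = seenPerCountry[s.country] || 0;
    seenPerCountry[s.country] = n + 1;
    // First pin sits on the centroid; each later one moves stepDeg east.
    pins.push({ id: s.id, country: s.country, lat: c.lat, lon: c.lon + n * stepDeg });
  }
  return pins;
}
```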

Tests:
- tests/fuel-shortage-registry-store.test.mts (4 tests): drain-once,
  no-data, RPC back-prop, pure-RPC-first path.
- tests/bootstrap.test.mjs — fuelShortages removed from PENDING_CONSUMERS.

Typecheck + typecheck:api clean. 39 tests pass (plus full suite in pre-push).

* feat(energy): Day 13 — energy disruption event log + asset timeline drawer

Ships the energy:disruptions:v1 registry that threads together pipelines
and storage facilities: state transitions (sabotage, sanction, maintenance,
mechanical, weather, commercial, war) keyed by assetId so any asset's
drawer can render its history without a second registry lookup.

Data + seeder:
- scripts/data/energy-disruptions.json — 12 curated events spanning
  Nord Stream 1/2 sabotage, Druzhba sanctions, CPC force majeure,
  TurkStream maintenance, Yamal halt, Rehden trusteeship, Arctic LNG 2
  sanction, ESPO drone strikes, BTC fire (historical), Sabine Pass
  Hurricane Beryl, Power of Siberia ramp. Each event links back to a
  seeded asset.
- scripts/_energy-disruption-registry.mjs — validator enforces valid
  assetType/eventType/cause enums, http(s) sources, startAt ≤ endAt,
  MIN_EVENTS=8.
- scripts/seed-energy-disruptions.mjs — runSeed entry (weekly cron).
- Bundle entry at 7×DAY cadence.
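
The validator rules listed above could look roughly like the following. This is a sketch under assumptions: the `DisruptionEvent` field layout, the `validateEvent`/`validateRegistry` names, and the treatment of a null `endAt` as "ongoing" are illustrative; the eventType/cause enum checks are omitted because their exact value sets are not spelled out here.

```typescript
// Hypothetical event shape; field names beyond those quoted in the commit are assumed.
interface DisruptionEvent {
  id: string;
  assetId: string;
  assetType: string;
  startAt: string; // ISO-8601 timestamp
  endAt: string | null; // assumed null while the event is ongoing
  sources: string[];
}

const ASSET_TYPES = new Set(['pipeline', 'storage']);
const MIN_EVENTS = 8;

// Returns a list of human-readable problems; an empty list means the event is valid.
function validateEvent(e: DisruptionEvent): string[] {
  const errors: string[] = [];
  if (!ASSET_TYPES.has(e.assetType)) errors.push(`bad assetType: ${e.assetType}`);
  if (e.sources.length === 0 || !e.sources.every((s) => /^https?:\/\//.test(s))) {
    errors.push('sources must be http(s) URLs');
  }
  const start = Date.parse(e.startAt);
  if (isNaN(start)) errors.push('startAt is not a parseable date');
  if (e.endAt !== null) {
    const end = Date.parse(e.endAt);
    if (isNaN(end) || start > end) errors.push('startAt must be <= endAt');
  }
  return errors;
}

// The registry passes only if it meets the floor and every event is valid.
function validateRegistry(events: DisruptionEvent[]): boolean {
  return events.length >= MIN_EVENTS && events.every((e) => validateEvent(e).length === 0);
}
```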

Bootstrap 4-file registration (cache-keys.ts + bootstrap.js + health.js +
seed-health.js) — energyDisruptions in PENDING_CONSUMERS because panel
drawers fetch lazily via RPC on drawer-open rather than hydrating from
bootstrap directly.

Proto + handler:
- list_energy_disruptions.proto: ListEnergyDisruptions with
  assetId / assetType / ongoingOnly query facets. Returns events sorted
  newest-first.
- list-energy-disruptions.ts projects raw → wire format, supports all
  three query facets.
- Registered in handler.ts. gateway.ts: 'medium' cache tier.

Shared timeline helper:
- src/shared/disruption-timeline.ts — pure formatters (formatEventWindow,
  formatCapacityOffline, statusForEvent). No generated-type deps so
  PipelineStatusPanel + StorageFacilityMapPanel import the same helpers
  and render the timeline identically.

Panel integration:
- PipelineStatusPanel.loadDetail now fetches getPipelineDetail +
  listEnergyDisruptions({assetId, assetType:'pipeline'}) in parallel.
  Drawer gains "Disruption timeline (N)" section with event type, date
  window, capacity offline, cause chain, and short description per entry.
- StorageFacilityMapPanel gets identical treatment with assetType='storage'.
- Both reset detailEvents on closeDetail and on fresh click (stale-response
  safety).

Tests:
- tests/energy-disruptions-registry.test.mts (17 tests): schema, identity,
  enum coverage, evidence, negative inputs.
- tests/bootstrap.test.mjs — energyDisruptions added to PENDING_CONSUMERS.

Typecheck + typecheck:api clean. 51 tests pass locally (plus full suite
in pre-push).

* feat(energy): Day 14 — country drill-down Atlas exposure section

Extends CountryDeepDivePanel's existing "Energy Profile" card with a
mini Atlas-exposure section that surfaces per-country exposure to the
new registries we shipped in Days 7-13.

For each country:
- Pipelines touching this country (from, to, or transit) — clickable
  rows that dispatch 'energy:open-pipeline-detail' so the PipelineStatusPanel
  drawer opens on the energy variant; no-op on other variants.
- Storage facilities in this country — same loose-coupling pattern
  with 'energy:open-storage-facility-detail'.
- Active fuel shortages in this country — severity breakdown line
  (N confirmed · M watch) plus clickable rows emitting
  'energy:open-fuel-shortage-detail'.

Silent absence: sections render only when the country has matching
assets/events, so countries with no pipeline, storage, or shortage
touchpoints see the existing energy-profile card unchanged.

Lazy stores: reads go through the same shared drain-once stores
(getCachedPipelineRegistries, getCachedStorageFacilityRegistry,
getCachedFuelShortageRegistry) so CountryDeepDivePanel does NOT race
with Atlas panels over the single-drain bootstrap cache. Dynamic
import() keeps the three stores out of the panel's static import graph
so non-energy variants can tree-shake them.

Typecheck clean. No schema changes; purely additive UI read from
already-shipped registries.

* docs(energy): methodology page for energy disruption event log

Fills the /docs/methodology/disruptions URL referenced by
list_energy_disruptions.proto, scripts/_energy-disruption-registry.mjs,
and the panel attribution footers. Explains scope (state transitions
not daily noise), data shape, what counts as a disruption, classifier
evolution path, RPC contract, and ties into the sibling pipeline +
storage + shortage methodology pages.

No code change; pure docs completion for Week 4 launch polish.

* fix(energy): upstreamUnavailable only fires when Redis returned nothing

Two handlers (list-storage-facilities + list-pipelines) conflated "empty
filter result on a healthy registry" with "upstream unavailable". A
caller who queried one facilityType/commodityType and legitimately got
zero matches was told the upstream was down — which may push clients to
error-state rendering or suppress caching instead of showing a valid
empty list.

list-storage-facilities.ts — upstreamUnavailable now only fires when
`raw` is null (Redis miss). Zero filtered rows on a healthy registry
returns upstreamUnavailable: false + empty array. Matches the sibling
list-fuel-shortages handler and the wire contract in
list_storage_facilities.proto.

list-pipelines.ts — same bug, subtler shape. Now checks "requested at
least one side AND received nothing" rather than "zero rows after
collection". A filter that legitimately matches no gas/oil pipelines on
a healthy registry now returns upstreamUnavailable: false.

list-energy-disruptions.ts and list-fuel-shortages.ts already had the
correct shape (only flag unavailable when raw is missing) — left as-is.

Typecheck + typecheck:api clean. No tests added: the existing registry
schema tests cover the projection/filter helpers, and the handler-level
gating change is documented in code comments for future audits.
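
For reference, the gating rule reads roughly as below. This is a simplified sketch, not the handler itself: the `RawRegistry`/`ListResponse` shapes and the function signature are assumptions, and the real handler projects to the generated wire types.

```typescript
// Hypothetical shapes standing in for the Redis payload and the wire response.
interface StorageFacility { id: string; facilityType: string; }
interface RawRegistry { facilities?: StorageFacility[]; }
interface ListResponse { facilities: StorageFacility[]; upstreamUnavailable: boolean; }

function listStorageFacilities(raw: RawRegistry | null, facilityType?: string): ListResponse {
  // Only a Redis miss counts as "upstream unavailable".
  if (raw === null) return { facilities: [], upstreamUnavailable: true };

  const all = raw.facilities ?? [];
  const facilities = facilityType
    ? all.filter((f) => f.facilityType === facilityType)
    : all;

  // Zero matches on a healthy registry is a valid empty list, not an outage.
  return { facilities, upstreamUnavailable: false };
}
```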

* fix(energy): three Greptile findings on PR #3294

Two P1 filter bugs (resolved shortages rendered as active) and one P2
contract inconsistency on the disruptions handler.

P1: DeckGLMap createEnergyShortagePinsLayer rendered every shortage in
the registry as an active crisis pin — including entries where the
classifier has written resolvedAt to mark the crisis over. Added a
filter so only entries with a null/empty resolvedAt become map pins.
Curated v1 data has resolvedAt=null everywhere, so there is no visible
change today, but once the classifier starts writing resolutions
post-launch, resolved shortages would otherwise have appeared as ongoing.

P1: CountryDeepDivePanel renderAtlasExposure had the same bug in the
country drill-down — "N confirmed · M watch" counts included resolved
entries, inflating the active-crisis line per country. Same one-line
filter fix.
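
A minimal sketch of the shared filter, assuming a `FuelShortage` shape with an optional `resolvedAt` string; the helper names are illustrative and not the ones used in DeckGLMap or CountryDeepDivePanel.

```typescript
// Hypothetical registry entry; only the fields needed for the filter are shown.
interface FuelShortage {
  id: string;
  country: string;
  severity: 'confirmed' | 'watch';
  resolvedAt?: string | null;
}

// Active means the classifier has not written a resolution (null, undefined, or empty).
function isActiveShortage(s: FuelShortage): boolean {
  return !s.resolvedAt;
}

// Per-country "N confirmed · M watch" counts over active entries only.
function countActiveBySeverity(shortages: FuelShortage[], country: string) {
  const active = shortages.filter((s) => s.country === country && isActiveShortage(s));
  return {
    confirmed: active.filter((s) => s.severity === 'confirmed').length,
    watch: active.filter((s) => s.severity === 'watch').length,
  };
}
```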

P2: list-energy-disruptions.ts gated upstreamUnavailable on
`!raw?.events` — a partial write (top-level object present but `events`
property missing) fired the "upstream down" flag, inconsistent with
the sibling handlers (list-pipelines, list-storage-facilities,
list-fuel-shortages) that only fire on `!raw`. Rewrote to match:
`!raw` → upstreamUnavailable, empty events → normal empty list. This
also aligns with the contract documented on the upstream-unavailable-
vs-empty-filter skill extracted from the earlier P2 review.

Typecheck + typecheck:api clean. All three fixes are one-liner filter
or gate changes; no test additions needed (registry tests still pass
with v1 data since resolvedAt is null throughout).
2026-04-23 07:34:07 +04:00
Elie Habib
7cf37c604c feat(resilience): PR 3 — dead-signal cleanup (plan §3.5, §3.6) (#3297)
* feat(resilience): PR 3 §3.5 — retire fuelStockDays from core score permanently

First commit in PR 3 of the resilience repair plan. Retires
`fuelStockDays` from the core score with no replacement.

Why permanent, not replaced:
IEA emergency-stockholding rules are defined in days of NET IMPORTS
and do not bind net exporters by design. Norway/Canada/US measured
in days-of-imports are incomparable to Germany/Japan measured the
same way — the construct is fundamentally different across the two
country classes. No globally-comparable recovery-fuel signal can
be built from this source; the pre-repair probe showed 100% imputed
at 50 for every country in the April 2026 freeze.

  scoreFuelStockDays:
    - Rewritten to return coverage=0 + observedWeight=0 +
      imputationClass='source-failure' for every country regardless
      of seed content.
    - Drops the dimension from the `recovery` domain's coverage-
      weighted mean automatically; remaining recovery dimensions
      pick up the share via re-normalisation in
      `_shared.ts#coverageWeightedMean`.
    - No explicit weight transfer needed — the coverage-weighted
      blend handles redistribution.
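
To illustrate why no explicit weight transfer is needed, here is a simplified coverage-weighted mean. It is an assumption about the general shape of `_shared.ts#coverageWeightedMean`, not a copy of it; the `DimensionScore` interface is hypothetical.

```typescript
// Hypothetical per-dimension result; the real type carries more fields.
interface DimensionScore { id: string; score: number; coverage: number; }

// Each dimension contributes in proportion to its coverage, so a
// coverage=0 dimension drops out of both numerator and denominator.
function coverageWeightedMean(dims: DimensionScore[]): number {
  let weighted = 0;
  let total = 0;
  for (const d of dims) {
    weighted += d.score * d.coverage;
    total += d.coverage;
  }
  return total > 0 ? weighted / total : 0;
}
```

With a retired dimension pinned at coverage 0, the mean over `[80, 60, retired]` is 70, the same as if the retired dimension were not present at all.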

  Registry:
    - recoveryFuelStockDays re-tagged from tier='enrichment' to
      tier='experimental' so the Core coverage gate treats it as
      out-of-score.
    - Description updated to make the retirement explicit; entry
      stays in the registry for structural continuity (the
      dimension `fuelStockDays` remains in RESILIENCE_DIMENSION_ORDER
      for the 19-dimension tests; removing the dimension entirely is
      a PR 4 structural-audit concern).

  Housekeeping:
    - Removed `RESILIENCE_RECOVERY_FUEL_STOCKS_KEY` constant (no
      longer read; noUnusedLocals would reject it).
    - Removed `RecoveryFuelStocksCountry` interface for the same
      reason. Comment at the removed declaration instructs future
      maintainers not to re-add the type as a reservation; when a
      new recovery-fuel concept lands, introduce a fresh interface.

Plan reference: §3.5 point 1 of
`docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md`.

51 resilience tests pass, typecheck + biome clean. The
`recovery` domain's published score will shift slightly for every
country because the 0.10 slot that fuelStockDays was imputing to
now redistributes; the compare-harness acceptance-gate rerun at
merge time will quantify the shift per plan §6 gates.

* feat(resilience): PR 3 §3.5 — retire BIS-backed currencyExternal; rebuild on IMF inflation + WB reserves

BIS REER/DSR feeds were load-bearing in currencyExternal (weights 0.35
fxVolatility + 0.35 fxDeviation, ~70% of dimension). They cover ~60
countries max — so every non-BIS country fell through to
curated_list_absent (coverage 0.3) or a thin IMF proxy (coverage 0.45).
Combined with reserveMarginPct already removed in PR 1, currencyExternal
was the clearest "construct absent for most of the world" carrier left
in the scorer.

Changes:

_dimension-scorers.ts
- scoreCurrencyExternal now reads IMF macro (inflationPct) + WB FX
  reserves only. Coverage ladder:
    inflation + reserves → 0.85 (observed primary + secondary)
    inflation only       → 0.55
    reserves only        → 0.40
    neither              → 0.30 (IMPUTE.bisEer retained for snapshot
                                 continuity; semantics read as
                                 "no IMF + no WB reserves" now)
- Removed dead symbols: RESILIENCE_BIS_EXCHANGE_KEY constant (reserved
  via comment only, flagged by noUnusedLocals), stddev() helper,
  getCountryBisExchangeRates() loader, BisExchangeRate interface,
  dateToSortableNumber(). All were used only by the retired BIS path.
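
The coverage ladder above, written out as a small function. The function name and boolean parameters are illustrative; the four coverage values are the ones listed in this commit.

```typescript
// Coverage assigned to currencyExternal depending on which inputs were observed.
function currencyExternalCoverage(hasInflation: boolean, hasReserves: boolean): number {
  if (hasInflation && hasReserves) return 0.85; // observed primary + secondary
  if (hasInflation) return 0.55; // IMF inflation only
  if (hasReserves) return 0.40; // WB reserves only
  return 0.30; // neither source present (imputed)
}
```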

_indicator-registry.ts
- New core entry inflationStability (weight 0.60, tier=core,
  sourceKey=economic:imf:macro:v2).
- fxReservesAdequacy weight 0.15 → 0.40 (secondary reliability
  anchor).
- fxVolatility + fxDeviation demoted tier=enrichment → tier=experimental
  (BIS ~60-country coverage; off the core weight sum).
- Non-experimental weights now sum to 1.0 (0.60 + 0.40).

scripts/compare-resilience-current-vs-proposed.mjs
- EXTRACTION_RULES: added inflationStability →
  imf-macro-country-field field=inflationPct so the registry-parity
  test passes and the correlation harness sees the new construct.

tests/resilience-dimension-scorers.test.mts
- Dropped BIS-era wording ("non-BIS country") and test 266
  (BIS-outage coverage 0.35 branch) which collapsed to the inflation-
  only path post-retirement.
- Updated coverage assertions: inflation-only 0.45 → 0.55; inflation+
  reserves 0.55 → 0.85.

tests/resilience-scorers.test.mts
- domainAverages.economic 68.33 → 66.33 (US currencyExternal score
  shifts slightly under IMF+reserves vs old BIS composite).
- stressScore 67.85 → 67.21; stressFactor 0.3215 → 0.3279.
- overallScore 65.82 → 65.52.
- baselineScore unchanged (currencyExternal is stress-only).

All 6324 data-tier tests pass. typecheck:api clean. No change to
seeders or Redis keys; this is a pure scorer + registry rebuild.

* feat(resilience): PR 3 §3.5 point 3 — re-goalpost externalDebtCoverage (0..5 → 0..2)

Plan §2.1 diagnosis table showed externalDebtCoverage saturating at
score=100 across all 9 probe countries — including stressed states.
Signal was collapsed. Root cause: (worst=5, best=0) gave every country
with ratio < 0.5 a score above 90, and mapped Greenspan-Guidotti's
reserve-adequacy threshold (ratio=1.0) to score 80 — well into "no
worry" territory instead of the "mild warning" it should be.

Re-anchored on Greenspan-Guidotti directly: ratio=1.0 now maps to score
50 (mild warning), ratio=2.0 to score 0 (acute rollover-shock exposure).
Ratios above 2.0 clamp to 0, consistent with "beyond this point the
country is already in crisis; exact value stops mattering."
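
The scores quoted above are consistent with a linear goalpost normalisation clamped to 0..100. Assuming that form (the actual normalizer in _dimension-scorers.ts may differ in detail, and `normalizeLowerBetter` is a hypothetical name), the before/after mapping can be checked like this:

```typescript
// Linear "lower is better" goalpost: value=best maps to 100, value=worst maps to 0.
function normalizeLowerBetter(value: number, worst: number, best: number): number {
  const raw = ((worst - value) / (worst - best)) * 100;
  return Math.min(100, Math.max(0, raw)); // clamp outside the goalposts
}

// Old goalposts (worst=5): the Greenspan-Guidotti threshold lands at 80.
const oldAtThreshold = normalizeLowerBetter(1.0, 5, 0);
// New goalposts (worst=2): the same ratio lands at 50, and 2.0 or above at 0.
const newAtThreshold = normalizeLowerBetter(1.0, 2, 0);
const newAtAcute = normalizeLowerBetter(2.5, 2, 0);
```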

Files changed:

- _indicator-registry.ts: recoveryDebtToReserves goalposts
  {worst: 5, best: 0} → {worst: 2, best: 0}. Description updated to
  cite Greenspan-Guidotti; inline comment documents anchor + rationale.

- _dimension-scorers.ts: scoreExternalDebtCoverage normalizer bound
  changed from (0..5) to (0..2), with inline comment.

- docs/methodology/country-resilience-index.mdx: goalpost table row
  5-0 → 2-0, description cites Greenspan-Guidotti.

- docs/methodology/indicator-sources.yaml:
  * constructStatus: dead-signal → observed-mechanism (signal is now
    discriminating).
  * reviewNotes updated to describe the new anchor.
  * mechanismTestRationale names the Greenspan-Guidotti rule.

- tests/resilience-dimension-monotonicity.test.mts: updated the
  comment + picked values inside the (0..2) discriminating band (0.3
  and 1.5). Old values (1 vs 4) had 4 clamping to 0.

- tests/resilience-dimension-scorers.test.mts: Norway (NO) score threshold
  relaxed >90 → >=85 (NO ratio=0.2 now scores 90, was 96).

- tests/resilience-scorers.test.mts: fixture drift:
  * domainAverages.recovery 54.83 → 47.33 (US extDebt 70 → 25).
  * baselineScore 63.63 → 60.12 (extDebt is baseline type).
  * overallScore 65.52 → 63.27.
  * stressScore / stressFactor unchanged (extDebt is baseline-only).

All 6324 data-tier tests pass. typecheck:api clean.

* feat(resilience): PR 3 §3.6 — CI gate on indicator coverage and nominal weight

Plan §3.6 adds a new acceptance criterion (also §5 item 5):

> No indicator with observed coverage below 70% may exceed 5% nominal
> weight OR 5% effective influence in the post-change sensitivity run.

This commit enforces the NOMINAL-WEIGHT half as a unit test that runs
on every CI build. The EFFECTIVE-INFLUENCE half is produced by
scripts/validate-resilience-sensitivity.mjs as a committed artifact;
the gate file only asserts that script still exists so a refactor that
removes it breaks the build loudly.

Why the gate exists (plan §3.6):

  "A dimension at 30% observed coverage carries the same effective
   weight as one at 95%. This contradicts the OECD/JRC handbook on
   uncertainty analysis."

Implementation:

tests/resilience-coverage-influence-gate.test.mts — three tests:
  1. Nominal-weight gate: for every core indicator with coverage < 137
     countries (70% of the ~195-country universe), computes its nominal
     overall weight as
       indicator.weight × (1/dimensions-in-domain) × domain-weight
     and asserts it does not exceed 5%. Equal-share-per-dimension is
     the *upper bound* on runtime weight (coverage-weighted mean gives
     a lower share when a dimension drops out), so this is a strict
     bound: if the nominal number passes, the runtime number also
     passes for every country.
  2. Effective-influence contract: asserts the sensitivity script
     exists at its expected path. Removing it (intentionally or by
     refactor) breaks the build.
  3. Audit visibility: prints the top 10 core indicators by nominal
     overall weight. No assertion beyond "ran" — the list lets
     reviewers spot outliers that pass the gate but are near the cap.

Current state (observed from audit output):

  recoveryReserveMonths:   nominal=4.17%  coverage=188
  recoveryDebtToReserves:  nominal=4.17%  coverage=185
  recoveryImportHhi:       nominal=4.17%  coverage=190
  inflationStability:      nominal=3.40%  coverage=185
  electricityConsumption:  nominal=3.30%  coverage=217
  ucdpConflict:            nominal=3.09%  coverage=193

Every core indicator has coverage ≥ 180 (already enforced by the
pre-existing indicator-tiering test), so the nominal-weight gate has
no current violators — its purpose is catching future drift, not
flagging today's state.

All 6327 data-tier tests pass. typecheck:api clean.

* docs(resilience): PR 3 methodology doc — document §3.5 dead-signal retirements + §3.6 coverage gate

Methodology-doc update capturing the three §3.5 landings and the §3.6 CI
gate. Five edits:

1. **Known construct limitations section (#5 and #6):** strike through the
   original "dead signals" and "no coverage-based weight cap" items,
   annotate them with "Landed in PR 3 §3.5"/"Landed in PR 3 §3.6" +
   specifics of what shipped.

2. **Currency & External H4 section:** completely rewritten. Old table
   (fxVolatility / fxDeviation / fxReservesAdequacy on BIS primary) is
   replaced by the two-indicator post-PR-3 table (inflationStability at
   0.60 + fxReservesAdequacy at 0.40). Coverage ladder spelled out
   (0.85 / 0.55 / 0.40 / 0.30). Legacy BIS indicators named as
   experimental-tier drill-downs only.

3. **Fuel Stock Days H4 section:** H4 heading text kept verbatim so the
   methodology-lint H4-to-dimension mapping does not break; body
   rewritten to explain that the dimension is retired from core but the
   seeder still runs for IEA-member drill-downs.

4. **External Debt Coverage table row:** goalpost 5-0 → 2-0, description
   cites Greenspan-Guidotti reserve-adequacy rule.

5. **New v2.2 changelog entry** — PR 3 dead-signal cleanup, covering
   §3.5 points 1/2/3 + §3.6 + acceptance gates + construct-audit
   updates.

No scoring or code changes in this commit. Methodology-lint test passes
(H4 mapping intact). All 6327 data-tier tests pass.

* fix(resilience): PR 3 §3.6 gate — correct share-denominator for coverage-weighted aggregation

Reviewer catch (thanks). The previous gate computed each indicator's
nominal overall weight as

  indicator.weight × (1 / N_total_dimensions_in_domain) × domain_weight

and claimed this was an upper bound ("actual runtime weight is ≤ this
when some dimensions drop out on coverage"). That is BACKWARDS for
this scorer.

The domain aggregation is coverage-weighted
(server/worldmonitor/resilience/v1/_shared.ts coverageWeightedMean),
so when a dimension pins at coverage=0 it is EXCLUDED from the
denominator and the surviving dimensions' shares go UP, not down.

PR 3 commit 1 retires fuelStockDays by hard-coding its scorer to
coverage=0 for every country — so in the current live state the
recovery domain has 5 contributing dimensions (not 6), and each core
recovery indicator's nominal share is

  1.0 × 1/5 × 0.25 = 5.00% (was mis-reported as 4.17%)

The old gate therefore under-estimated nominal influence and could
silently pass exactly the kind of low-coverage overweight regression
it is meant to block.

Fix:

- Added `coreBearingDimensions(domainId)` helper that counts only
  dimensions that have ≥1 core indicator in the registry. A dimension
  with only experimental/enrichment entries (post-retirement
  fuelStockDays) has no core contribution → does not dilute shares.
- Updated `nominalOverallWeight` to divide by the core-bearing count,
  not the raw dimension count.
- Rewrote the helper's doc comment to stop claiming this is a strict
  upper bound — explicitly calls out the dynamic case (source failure
  raising surviving dim shares further) as the sensitivity script's
  responsibility.
- Added a new regression test: asserts (a) at least one recovery
  dimension is all-non-core (fuelStockDays post-retirement),
  (b) fuelStockDays has zero core indicators, and
  (c) recoveryDebtToReserves nominal = 0.05 exactly (not 0.0417).
  Any reversion of the retirement or regression to the
  N_total-denominator will fail loudly.
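
A condensed sketch of the corrected share math. The `Indicator` shape and the function signatures are assumptions made for illustration; the test file's helpers read the real registry rather than taking it as a parameter.

```typescript
type Tier = 'core' | 'enrichment' | 'experimental';

// Hypothetical registry row; only the fields the gate needs.
interface Indicator {
  id: string;
  dimension: string;
  domain: string;
  tier: Tier;
  weight: number;
}

// Count only dimensions that carry at least one core indicator.
function coreBearingDimensions(registry: Indicator[], domain: string): number {
  const dims = new Set<string>();
  for (const ind of registry) {
    if (ind.domain === domain && ind.tier === 'core') dims.add(ind.dimension);
  }
  return dims.size;
}

// indicator.weight × (1 / core-bearing dimensions) × domain weight
function nominalOverallWeight(registry: Indicator[], ind: Indicator, domainWeight: number): number {
  const n = coreBearingDimensions(registry, ind.domain);
  return n === 0 ? 0 : ind.weight * (1 / n) * domainWeight;
}
```

With five core-bearing recovery dimensions and a 0.25 domain weight, a weight-1.0 indicator comes out at 0.05; dividing by all six dimensions would give the mis-reported 0.0417.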

Top-10 audit output now correctly shows:

  recoveryReserveMonths:   nominal=5%     coverage=188
  recoveryDebtToReserves:  nominal=5%     coverage=185
  recoveryImportHhi:       nominal=5%     coverage=190
  (was 4.17% each under the old math)

All 486 resilience tests pass. typecheck:api clean.

Note: the 5% figure is exactly AT the cap, not over it. "exceed" means
strictly > 5%, so it still passes. But now the reviewer / audit log
reflects reality.

* fix(resilience): PR 3 review — retired-dim confidence drag + false source-failure label

Addresses the Codex review P1 + P2 on PR #3297.

P1 — retired-dim drag on confidence averages
--------------------------------------------
scoreFuelStockDays returns coverage=0 by design (retired construct),
but computeLowConfidence, computeOverallCoverage, and the widget's
formatResilienceConfidence averaged across all 19 dimensions. That
dragged every country's reported averageCoverage down — US went from
0.8556 (active dims only) to 0.8105 (all dims) — enough drift to
misclassify edge countries as lowConfidence and to shift the ranking
widget's overallCoverage pill for every country.

Fix: introduce an authoritative RESILIENCE_RETIRED_DIMENSIONS set in
_dimension-scorers.ts and filter it out of all three averages. The
filter is keyed on the retired-dim REGISTRY, not on coverage === 0,
because a non-retired dim can legitimately emit coverage=0 on a
genuinely sparse-data country via weightedBlend fall-through — those
entries MUST keep dragging confidence down (that is the sparse-data
signal lowConfidence exists to surface). Verified: sparse-country
release-gate test (marks sparse WHO/FAO countries as low confidence)
still passes with the registry-keyed filter; would have failed with
a naive coverage=0 filter.
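
The difference between the registry-keyed filter and a naive coverage filter, as a simplified sketch. The set contents match the retirement described here, but the `DimCoverage` shape and the plain unweighted average are assumptions about the real helpers.

```typescript
// Authoritative list of structurally retired dimensions (mirrors the idea, not the file).
const RESILIENCE_RETIRED_DIMENSIONS = new Set<string>(['fuelStockDays']);

interface DimCoverage { id: string; coverage: number; }

// Average coverage across active dimensions only. Non-retired dims with
// coverage=0 stay in the average so sparse-data countries still drag it down.
function computeOverallCoverage(dims: DimCoverage[]): number {
  const active = dims.filter((d) => !RESILIENCE_RETIRED_DIMENSIONS.has(d.id));
  if (active.length === 0) return 0;
  return active.reduce((sum, d) => sum + d.coverage, 0) / active.length;
}
```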

Server-client parity: widget-utils cannot import server code, so
RESILIENCE_RETIRED_DIMENSION_IDS is a hand-mirrored constant, kept
in lockstep by tests/resilience-retired-dimensions-parity.test.mts
(parses the widget file as text, same pattern as existing widget-util
tests that can't import the widget module directly).

P2 — false "Source down" label on retired dim
---------------------------------------------
scoreFuelStockDays hard-coded imputationClass: 'source-failure',
which the widget maps to "Source down: upstream seeder failed" with
a `!` icon for every country. That is semantically wrong for an
intentional retirement. Flipped to null so the widget's absent-path
renders a neutral cell without a false outage label. null is already
a legal value of ResilienceDimensionScore.imputationClass; no type
change needed.

Tests
-----
- tests/resilience-confidence-averaging.test.mts (new): pins the
  registry-keyed filter semantic for computeOverallCoverage +
  computeLowConfidence. Includes a negative-control test proving
  non-retired coverage=0 dims still flip lowConfidence.
- tests/resilience-retired-dimensions-parity.test.mts (new):
  lockstep gate between server and client retired-dim lists.
- Widget test adds a registry-keyed exclusion test with a non-retired
  coverage=0 dim in the fixture to lock in the correct semantic.
- Existing tests asserting imputationClass: 'source-failure' for
  fuelStockDays flipped to null.

All 494 resilience tests + full 6336/6336 data-tier suite pass.
Typecheck clean for both tsconfig.json and tsconfig.api.json.

* docs(resilience): align methodology + registry metadata with shipped imputationClass=null

Follow-up to the previous PR 3 review commit that flipped
scoreFuelStockDays's imputationClass from 'source-failure' to null to
avoid a false "Source down" widget label on every country. The code
changed; the doc and registry metadata did not, leaving three sites
in the methodology mdx and two comment/description sites in the
registry still claiming imputationClass='source-failure'. Any future
reviewer (or tooling that treats the registry description as
authoritative) would be misled.

This commit rewrites those sites to describe the shipped behavior:
 - imputationClass=null (not 'source-failure'), with the rationale
 - exclusion from confidence/coverage averages via the
   RESILIENCE_RETIRED_DIMENSIONS registry filter
 - the distinction between structural retirement (filtered) and
   runtime coverage=0 (kept so sparse-data countries still flag
   lowConfidence)

Touched:
 - docs/methodology/country-resilience-index.mdx (lines ~33, ~268, ~590)
 - server/worldmonitor/resilience/v1/_indicator-registry.ts
   (recoveryFuelStockDays comment block + description field)

No code-behavior change. Docs-only.

Tests: 157 targeted resilience tests pass (incl. methodology-lint +
widget + release-gate + confidence-averaging). Typecheck clean on
both tsconfig.json and tsconfig.api.json.
2026-04-22 23:57:28 +04:00
Elie Habib
c067a7dd63 fix(resilience): include hydroelectric in lowCarbonGenerationShare (PR #3289 follow-up) (#3293)
Greptile P1 review on the merged PR #3289: World Bank EG.ELC.RNEW.ZS
explicitly excludes hydroelectric. The v2 lowCarbonGenerationShare
composite was summing only nuclear + renew-ex-hydro, which would
collapse to ~0 for hydro-dominant economies the moment the
RESILIENCE_ENERGY_V2_ENABLED flag flipped:

  Norway    ~95% hydro → score near 0 on a 0.20-weight indicator
  Paraguay  ~99% hydro → same
  Brazil    ~65% hydro → same
  Canada    ~60% hydro → same

Directly contradicts the plan §3.3 intent of crediting "firm
low-carbon generation" and would produce rankings that contradict
the power-system security framing.

PR #3289 merged before the review landed. This branch applies the
fix against main.

Fix: add EG.ELC.HYRO.ZS as a third series in the composite.

  seed-low-carbon-generation.mjs:
    - INDICATORS: ['EG.ELC.NUCL.ZS', 'EG.ELC.RNEW.ZS'] + 'EG.ELC.HYRO.ZS'
    - fetchLowCarbonGeneration(): sum three series, track latest year
      across all three, same cap-at-100 guard
    - File header comment names the three-series sum with the hydro-
      exclusion rationale + the country list that would break.

  _indicator-registry.ts lowCarbonGenerationShare.description:
  rewritten to name all three WB codes + explain the hydro exclusion.

  country-resilience-index.mdx:
    - Known-limitations item 3 names all three WB codes + country list
    - Energy domain v2 table row names all three WB codes
    - v2.1 changelog Indicators-added bullet names all three WB codes
    - v2.1 changelog New-seeders bullet names all three WB codes on
      seed-low-carbon-generation

No scorer code change (composite lives in the seeder; scorer reads
the pre-summed value from resilience:low-carbon-generation:v1). No
weight change. Flag-off path remains byte-identical.
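
A sketch of the three-series sum, assuming each World Bank series has already been reduced to a per-country `{ value, year }` map. The `Series` type and `sumLowCarbon` name are hypothetical; the seeder's actual fetch and parsing code is not reproduced.

```typescript
// Per-country latest observation for one WB series (percent of generation).
type Series = Record<string, { value: number; year: number }>;

// Sum nuclear + renewables-ex-hydro + hydro per country, cap at 100,
// and track the latest year seen across the contributing series.
function sumLowCarbon(nuclear: Series, renewExHydro: Series, hydro: Series): Series {
  const out: Series = {};
  for (const series of [nuclear, renewExHydro, hydro]) {
    for (const iso2 of Object.keys(series)) {
      const point = series[iso2];
      if (!point) continue;
      const prev = out[iso2];
      out[iso2] = prev
        ? { value: Math.min(100, prev.value + point.value), year: Math.max(prev.year, point.year) }
        : { value: Math.min(100, point.value), year: point.year };
    }
  }
  return out;
}
```

A country missing from one series still gets a value from the series that were observed.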

25 resilience tests pass, typecheck + typecheck:api clean.
2026-04-22 23:47:45 +04:00
Elie Habib
32049c07ca feat(portwatch): H+F — cache by upstream maxDate + parallel window split (#3299) 2026-04-22 22:37:08 +04:00
Elie Habib
52659ce192 feat(resilience): PR 1 — energy construct repair (flag-gated) (#3289)
* docs(resilience): PR 1 foundation — Option B framing + v2 energy construct spec

First commit in PR 1 of the resilience repair plan. Zero scoring-behaviour
change; sets up the construct contract that the code changes will implement.

Declares the framing decision required by plan section 3.2 before any
scorer code lands: Option B (power-system security) is adopted. Electricity
grids are the dominant short-horizon shock-transmission channel, and the
choice lets the v2 energy indicator set share one denominator (percent of
electricity generation) instead of mixing primary-energy and power-system
measures in a composite.

Methodology doc changes:
  - Energy Domain section now documents both the legacy indicator set
    (still the default) and the v2 indicator set (flag-gated), under a
    single #### Energy H4 heading so the methodology-doc linter still
    asserts dimension-id parity with the registry.
  - v2 indicators: importedFossilDependence (EG.ELC.FOSL.ZS x
    max(EG.IMP.CONS.ZS, 0)), lowCarbonGenerationShare (EG.ELC.NUCL.ZS +
    EG.ELC.RNEW.ZS), powerLossesPct (EG.ELC.LOSS.ZS), reserveMarginPct
    (IEA), euGasStorageStress (renamed + scoped to EU), energyPriceStress
    (retained at 0.15 weight).
  - Retired under v2: electricityConsumption, gasShare, coalShare,
    dependency (all into importedFossilDependence), renewShare.
  - electricityAccess moves from energy to infrastructure under v2.
  - Added a v2.1 changelog section documenting the flag-gated rollout,
    acceptance gates (per plan section 6), and snapshot filenames for
    the post-flag-flip captures.
  - Known-limitations items 1-3 updated to note PR 1 lands the v2
    construct behind RESILIENCE_ENERGY_V2_ENABLED (default off).

Methodology-doc linter + mdx-lint + typecheck all clean. Indicator
registry, seeders, and scorer rewrite land in subsequent commits on
this same branch.

* feat(resilience): PR 1 — RESILIENCE_ENERGY_V2_ENABLED flag + scoreEnergy v2 + registry entries

Second commit in PR 1 of the resilience repair plan. Lands the flag,
the v2 scorer code path, and the registry entries the methodology
doc referenced. Default is flag off; published rankings are unchanged
until the flag flips in a later commit (after seeders land and the
acceptance-gate rerun produces a fresh post-flip snapshot).

Changes:

  - _shared.ts: isEnergyV2Enabled() function reader on the canonical
    RESILIENCE_ENERGY_V2_ENABLED env var. Dynamic read (like
    isPillarCombineEnabled) so tests can flip per-case.

  - _dimension-scorers.ts:
    - New Redis key constants for the three v2 seed keys plus the
      reserved reserveMargin key (seeder deferred per plan §3.1
      open-question).
    - EU_GAS_STORAGE_COUNTRIES set (EU + EFTA + UK) for the renamed
      euGasStorageStress signal per plan §3.5 point 2.
    - isEnergyV2EnabledLocal() — private duplicate of the flag reader
      to avoid a circular import (_shared.ts already imports from
      this module). Same env-var contract.
    - scoreEnergy split into scoreEnergyLegacy() + scoreEnergyV2().
      Public scoreEnergy() branches on the flag. Legacy path is
      byte-identical to the pre-commit behaviour.
    - scoreEnergyV2() reads four new bulk payloads, composes
      importedFossilDependence = fossilElectricityShare × max(netImports, 0)/100
      per plan §3.2, collapses net exporters to 0, and gates
      euGasStorageStress on EU membership so non-EU countries
      re-normalise rather than getting penalised for a regional
      signal.

  - _indicator-registry.ts: four new entries under `dimension: 'energy'`
    with `tier: 'experimental'` — importedFossilDependence (0.35),
    lowCarbonGenerationShare (0.20), powerLossesPct (0.10),
    reserveMarginPct (0.10). Experimental tier keeps them out of the
    Core coverage gate until seed coverage is confirmed.

  - compare-resilience-current-vs-proposed.mjs: new
    'bulk-v1-country-value' shape family in the extraction dispatcher.
    EXTRACTION_RULES now covers the four v2 registry indicators so
    the per-indicator influence harness tracks them from day one.
    When the seeders are absent, pairedSampleSize = 0 and Pearson = 0
    — the harness output surfaces the "no influence yet" state rather
    than silently dropping the indicators.

  - tests/resilience-energy-v2.test.mts: 11 new tests pinning:
    - flag-off = legacy behaviour preserved (v2 seed keys have no
      effect when flag is off — catches accidental cross-path reads)
    - flag-on = v2 composite behaves correctly:
      - lower fossilElectricityShare raises score
      - net exporter with 90% fossil > net importer with 90% fossil
        (max(·, 0) collapse verified)
      - higher lowCarbonGenerationShare raises score (nuclear credit)
      - higher powerLossesPct lowers score
      - euGasStorageStress is invariant for non-EU, responds for DE
      - all v2 inputs absent = graceful degradation, coverage < 1.0

106 resilience tests pass (existing + 11 new). Typecheck clean. Biome
clean. No production behaviour change with flag off (default).

Next commits on this branch: three World Bank seeders for the v2 keys,
health.js + SEED_META registration (gated ON_DEMAND_KEYS until Railway
cron provisions), acceptance-gate rerun at flag-flip time.

* feat(resilience): PR 1 — three WB seeders + health registration for v2 energy construct

Third commit in PR 1. Lands the seed scripts for the three v2 energy
indicator source keys, registered in api/health.js with ON_DEMAND_KEYS
gating until Railway cron provisions.

New seeders (weekly cron cadence, 8d maxStaleMin = 2x interval):
  - scripts/seed-low-carbon-generation.mjs
    Pulls EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS from World Bank, sums per
    country into `resilience:low-carbon-generation:v1`. Partial
    coverage (one series missing) still emits a value using the
    observed half — the scorer's 0-80 saturating goalpost tolerates
    it and the underlying construct is "firm low-carbon share".

  - scripts/seed-fossil-electricity-share.mjs
    Pulls EG.ELC.FOSL.ZS into `resilience:fossil-electricity-share:v1`.
    Feeds the importedFossilDependence composite at score time
    (composite = fossilShare × max(netImports, 0) / 100 per plan §3.2).

  - scripts/seed-power-reliability.mjs
    Pulls EG.ELC.LOSS.ZS into `resilience:power-losses:v1`. Direct
    grid-integrity signal replacing the retired electricityConsumption
    wealth proxy.

All three follow the existing seed-recovery-*.mjs template:
  - Shape: { countries: { [ISO2]: { value, year } }, seededAt }
  - runSeed() from _seed-utils.mjs with schemaVersion=1, ttl=35d
  - validateFn floor of 150 countries (WB coverage is 150-180 for
    the three indicators; below 150 = transient fetch failure)
  - ISO3 → ISO2 mapping via scripts/shared/iso3-to-iso2.json
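
A minimal sketch of the payload shape and the partial-coverage rule described above, for orientation only. The helper names (`buildLowCarbonPayload`, `validatePayload`) and the choice to report the later of the two observation years are assumptions, not the shipped seeder code; only the `{ countries, seededAt }` shape and the 150-country floor come from this message.

```javascript
// Hypothetical helpers; only the payload shape and the 150-country
// floor are taken from the commit message.
const MIN_COUNTRIES = 150;

// Sum nuclear + renewable shares per ISO2 country. If one series is
// missing for a country, emit the observed half instead of dropping it.
function buildLowCarbonPayload(nuclearByIso2, renewableByIso2, seededAt) {
  const countries = {};
  const isoCodes = new Set([
    ...Object.keys(nuclearByIso2),
    ...Object.keys(renewableByIso2),
  ]);
  for (const iso2 of isoCodes) {
    const parts = [nuclearByIso2[iso2], renewableByIso2[iso2]].filter(Boolean);
    if (parts.length === 0) continue;
    countries[iso2] = {
      value: parts.reduce((sum, part) => sum + part.value, 0),
      // Assumption: report the most recent year among the observed series.
      year: Math.max(...parts.map((part) => part.year)),
    };
  }
  return { countries, seededAt };
}

// Mirrors the validateFn floor: fewer than 150 countries is treated as
// a transient fetch failure rather than a real coverage change.
function validatePayload(payload) {
  return Object.keys(payload.countries).length >= MIN_COUNTRIES;
}
```

With only a handful of countries in the input, `validatePayload` returns false, which is the fail-loud behaviour the floor is meant to provide.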

No reserveMargin seeder is shipped in this commit per plan §3.1 open
question: IEA electricity-balance coverage is sparse outside OECD+G20,
and the indicator will likely ship as 'unmonitored' with weight 0.05
if it lands at all. The Redis key (`resilience:reserve-margin:v1`) is
reserved in _dimension-scorers.ts so the v2 scorer shape is stable.

api/health.js:
  - SEED_DOMAINS: add `lowCarbonGeneration`, `fossilElectricityShare`,
    `powerLosses` → their Redis keys.
  - SEED_META: same three, pointing at `seed-meta:resilience:*` meta
    keys with maxStaleMin=11520 (8d, per the worldmonitor
    health-maxstalemin-write-cadence pattern: 2x weekly cron).
  - ON_DEMAND_KEYS: three new entries gated as TRANSITIONAL until
    Railway cron provisions and the first clean run completes. Remove
    from this set after ~7 days of green production runs.

Typecheck clean; existing 106 resilience tests pass (seeders have no
in-repo callers yet, so nothing depends on them executing). Real-API
integration tests land when Railway cron is provisioned.

Next commit: Railway cron configuration + bundle-runner wiring.

* feat(resilience): PR 1 — bundle-runner + acceptance-gate verdict + flag-flip runbook

Final commit in the PR 1 tranche. Lands the three remaining pieces so
the flag-flip is fully operable once Railway cron provisions.

  - scripts/seed-bundle-resilience-energy-v2.mjs
    Railway cron bundle wrapping the three v2 energy seeders
    (low-carbon-generation, fossil-electricity-share, power-losses).
    Weekly cadence (7-day intervalMs); the underlying data is annual
    at source so polling more frequently just hammers the World Bank
    API. 5-minute per-script timeout. Mirrors the existing
    seed-bundle-resilience-recovery.mjs pattern.

  - scripts/compare-resilience-current-vs-proposed.mjs: acceptanceGates
    block. Programmatic evaluation of plan §6 gates using the inputs
    the harness already computes:
      gate-1-spearman              Spearman vs baseline >= 0.85
      gate-2-country-drift         Max country drift vs baseline <= 15
      gate-6-cohort-median         Cohort median shift vs baseline <= 10
      gate-7-matched-pair          Every pair holds expected direction
      gate-9-effective-influence   >= 80% Core indicators measurable
      gate-universe-integrity      No cohort/pair endpoint missing from scorable
    Thresholds are encoded in a const so they can't silently soften.
    Output verdict is PASS / CONDITIONAL / BLOCK. Emitted in
    summary.acceptanceVerdict for at-a-glance PR comment pasting, with
    full per-gate detail in acceptanceGates.results.
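
    A rough sketch of how the gate table above could be evaluated. The
    threshold values and gate ids come from this message; the input field
    names (other than spearmanVsBaseline, maxCountryAbsDelta, and
    cohortMissingFromScorable) and the rule that maps gate results to
    PASS / CONDITIONAL / BLOCK are assumptions made for illustration.

```javascript
// Thresholds from the gate table; frozen so they cannot be mutated at runtime.
const GATE_THRESHOLDS = Object.freeze({
  spearmanMin: 0.85,
  maxCountryDrift: 15,
  maxCohortMedianShift: 10,
  minCoreInfluenceShare: 0.8,
});

// A null/undefined input means the gate could not be measured.
function checkGate(id, value, passes) {
  const status = value == null ? 'unmeasured' : passes(value) ? 'pass' : 'fail';
  return { id, status };
}

function evaluateGates(inputs) {
  const t = GATE_THRESHOLDS;
  const results = [
    checkGate('gate-1-spearman', inputs.spearmanVsBaseline, (v) => v >= t.spearmanMin),
    checkGate('gate-2-country-drift', inputs.maxCountryAbsDelta, (v) => v <= t.maxCountryDrift),
    checkGate('gate-6-cohort-median', inputs.maxCohortMedianShift, (v) => v <= t.maxCohortMedianShift),
    checkGate('gate-7-matched-pair', inputs.allPairsHoldDirection, (v) => v === true),
    checkGate('gate-9-effective-influence', inputs.coreInfluenceShare, (v) => v >= t.minCoreInfluenceShare),
    checkGate('gate-universe-integrity', inputs.cohortMissingFromScorable, (v) => v.length === 0),
  ];
  // Assumed verdict mapping: any failure blocks; an unmeasured gate
  // downgrades an otherwise clean run to CONDITIONAL.
  let verdict = 'PASS';
  if (results.some((r) => r.status === 'fail')) verdict = 'BLOCK';
  else if (results.some((r) => r.status === 'unmeasured')) verdict = 'CONDITIONAL';
  return { verdict, results };
}
```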

  - docs/methodology/energy-v2-flag-flip-runbook.md
    Operator runbook for the flag flip. Pre-flip checklist (seeders
    green, health endpoint green, ON_DEMAND_KEYS graduation, Spearman
    verification), flip procedure (pre-flip snapshot, dry-run, cache
    prefix bump, Vercel env flip, post-flip snapshot, methodology
    doc reclassification), rollback procedure, and a reference table
    for the three possible verdict states.

PR 1 is now code-complete pending:
  1. Railway cron provisioning (ops, not code)
  2. Flag flip + acceptance-gate rerun (follows runbook, not code)
  3. Reserve-margin seeder (deferred per plan §3.1 open-question)

Zero scoring-behaviour change in this commit. 121 resilience tests
pass, typecheck clean.

* fix(resilience): PR 1 — drop unseeded reserveMargin from scorer + fix composite extractor

Addresses two P1 review findings on PR #3289.

Finding 1: scoreEnergyV2 read resilience:reserve-margin:v1 at weight
0.10 but no seeder ships in this PR (indicator deferred per plan
§3.1 open-question). On flag flip that slot would be permanently
null, silently renormalizing the remaining 90% of weight and
producing a construct different from what the methodology doc
describes. Fix: remove reserve-margin from the v2 reader +
blend entirely. Redistribute its 0.10 weight to powerLossesPct
(now 0.20); both are grid-integrity signals per plan §3.1, and
the original plan split electricityConsumption's 0.30 weight
across powerLossesPct + reserveMarginPct + importedFossilDependence
— without reserveMarginPct, powerLossesPct carries the shared
grid-integrity load until the IEA seeder ships.

  v2 weights now: 0.35 + 0.20 + 0.20 + 0.10 + 0.15 = 1.00
  (importedFossilDependence + lowCarbonGenerationShare +
   powerLossesPct + euGasStorageStress + energyPriceStress)

  Reserve-margin Redis key constant stays reserved so the v2
  scorer shape is stable when a future commit lands the seeder;
  split 0.10 back out of powerLossesPct at that point.
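
  To illustrate why a permanently-null slot is a problem, here is a small
  sketch of a weighted blend that renormalises over the inputs that are
  present. The weights are the v2 weights listed above; the `blend`
  function itself is an assumed simplification, not the scorer's code.

```javascript
// v2 weights after redistributing reserveMarginPct's 0.10 to powerLossesPct.
const V2_WEIGHTS = {
  importedFossilDependence: 0.35,
  lowCarbonGenerationShare: 0.20,
  powerLossesPct: 0.20,
  euGasStorageStress: 0.10,
  energyPriceStress: 0.15,
};

// Weighted mean over the sub-scores that are present. Null inputs drop
// out and the remaining weights are renormalised; coverage reports how
// much of the nominal weight actually contributed.
function blend(subScores, weights) {
  let weightedSum = 0;
  let availableWeight = 0;
  let totalWeight = 0;
  for (const [key, weight] of Object.entries(weights)) {
    totalWeight += weight;
    const score = subScores[key];
    if (score == null) continue;
    weightedSum += score * weight;
    availableWeight += weight;
  }
  if (availableWeight === 0) return null;
  return {
    score: weightedSum / availableWeight,
    coverage: availableWeight / totalWeight,
  };
}
```

  Under this sketch, an indicator that is always null never lowers the
  score; it only shifts weight onto the others, which is the silent
  construct change Finding 1 describes.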

Methodology doc, _shared.ts flag comment, and v2 test suite all
updated to the 5-indicator shape. New regression test asserts
that changing reserve-margin Redis content has zero effect on
the v2 score — guards against a future commit accidentally
wiring the reader back in without its seeder.

Finding 2: scripts/compare-resilience-current-vs-proposed.mjs
measured importedFossilDependence by reading fossilElectricityShare
alone. The scorer defines it as fossilShare × max(netImports, 0)
/ 100, so the extractor ignored both the net-exporter collapse to
0 and the net-import scaling, which made gate-9 effective-influence
wrong for the centrepiece construct change of PR 1.

Fix: new 'imported-fossil-dependence-composite' extractor type
in applyExtractionRule that recomputes the same composite from
both inputs (fossilShare bulk payload + staticRecord.iea.
energyImportDependency.value). Stays in lockstep with the
scorer — drift between the two would break gate-9's
interpretation.

New unit tests pin:
  - net importer: 80% × max(60, 0) / 100 = 48 ✓
  - net exporter: 80% × max(-40, 0) / 100 = 0 ✓
  - missing either input → null
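
The three pinned cases can be reproduced with a few lines. This is a sketch of the composite as stated in this message; the function name and the null-handling style are illustrative, not copied from the scorer or the harness.

```javascript
// importedFossilDependence = fossilShare × max(netImports, 0) / 100.
// Both inputs are percentages; a missing input yields null.
function importedFossilDependence(fossilSharePct, netImportsPct) {
  if (fossilSharePct == null || netImportsPct == null) return null;
  // max(·, 0) collapses net exporters (negative net imports) to 0.
  return (fossilSharePct * Math.max(netImportsPct, 0)) / 100;
}

const netImporter = importedFossilDependence(80, 60);   // 48
const netExporter = importedFossilDependence(80, -40);  // 0
const missingInput = importedFossilDependence(80, null); // null
```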

64 resilience tests pass; typecheck clean. Flag-off path is
still byte-identical to pre-PR behaviour.

* docs(resilience): PR 1 — align methodology doc with actual shipped indicators and seeders

Addresses P1 review on docs/methodology/country-resilience-index.mdx
lines 29 and 574-575. The doc still described reserveMarginPct as a
shipped v2 indicator and listed seed-net-energy-imports.mjs in the
new-seeders list, neither of which the branch actually ships.

Doc changes to match the code in this branch:

  Known-limitations item 1: restated to describe the actual v2
  replacement footprint — powerLossesPct at 0.20 (temporarily
  absorbing reserveMarginPct's 0.10) plus accessToElectricityPct
  moved to infrastructure. reserveMarginPct is named as a deferred
  companion with the split-out instructions for when its seeder
  lands.

  v2.1 changelog (Indicators added): split into "live in PR 1" and
  "deferred in PR 1" so the reader can distinguish which entries
  match real code. importedFossilDependence's composite formula
  now written out and the net-imports source attributed to the
  existing resilience:static.iea path (not a new seeder).

  v2.1 changelog (New seeders): lists the three actual files that
  ship in this branch (seed-low-carbon-generation, seed-fossil-
  electricity-share, seed-power-reliability) and explicitly notes
  seed-net-energy-imports.mjs is NOT a new seeder — the
  EG.IMP.CONS.ZS series is already fetched by seed-resilience-
  static.mjs. Adds the bundle-runner reference.

Methodology-doc linter + mdx-lint both pass (125/125). Typecheck
clean. Doc is now the source of truth for what PR 1 actually ships.

* fix(resilience): PR 1 — sync powerLossesPct registry weight with scorer (0.10 → 0.20)

Reviewer-caught mismatch between INDICATOR_REGISTRY and scoreEnergyV2.
The previous commit redistributed the deferred reserveMarginPct's 0.10
weight into powerLossesPct in the SCORER but left the REGISTRY entry
unchanged at 0.10. Two downstream effects:

  1. scripts/compare-resilience-current-vs-proposed.mjs copies
     `spec.weight` into `nominalWeight` for gate-9 reporting, so
     powerLossesPct's nominal influence would be under-reported by
     half in every post-flip acceptance run — exactly the harness PR 1
     relies on for merge evidence.
  2. Methodology doc vs registry vs scorer drift is the pattern the
     methodology-doc linter is supposed to catch; it passes here
     because the linter only checks dimension-id parity, not weights.
     Registry is now the only remaining source of truth to keep in
     lockstep with the scorer.

Change:
  - `_indicator-registry.ts` powerLossesPct.weight: 0.1 → 0.2
  - Inline comment names the deferral and instructs: "when the IEA
    electricity-balance seeder lands, split 0.10 back out and restore
    reserveMarginPct at 0.10. Keep this field in lockstep with
    scoreEnergyV2 ... because the PR 0 compare harness copies
    spec.weight into nominalWeight for gate-9 reporting."

Experimental weights per dimension invariant still holds (0.35 + 0.20
+ 0.20 = 0.75 for energy, well under the 1.0 ceiling). 64 resilience
tests pass, typecheck clean.
2026-04-22 17:10:38 +04:00
Elie Habib
da0f26a3cf feat(resilience): PR 0 diagnostic freeze + fairness-audit harness (no scoring changes) (#3284)
* feat(resilience): PR 0 diagnostic freeze + fairness-audit harness

Lands the before-state and measurement apparatus every subsequent
resilience-scorer PR validates against. Zero scoring changes. Per the
v3 plan at docs/plans/2026-04-22-001-fix-resilience-scorer-structural-
bias-plan.md this is tranche 0 of five.

What lands:
- Construct contract published in the methodology doc: absolute
  resilience not development-adjusted, mechanism test for every
  indicator, peer-relative views published separately from the core.
- Known construct limitations section: six construct errors scheduled
  for PR 1-3 repair with explicit mapping to plan tranches.
- Indicator-source manifest at docs/methodology/indicator-sources.yaml
  with source, seriesId, seriesUrl, coveragePct, lastObservedYear,
  license, mechanismTestRationale, and a constructStatus classification.
- Pre-repair ranking snapshot at
  docs/snapshots/resilience-ranking-live-pre-repair-2026-04-22.json
  (217 items + 5 greyedOut, captured 2026-04-22 08:38 UTC at commit
  425507d15).
- Cohort configuration at tests/helpers/resilience-cohorts.mts: six
  cohorts covering 87 countries (net-fuel-exporters, net-energy-
  importers-oecd, nuclear-heavy-generation, coal-heavy-domestic,
  small-island-importers, fragile-states).
- Matched-pair sanity panel at tests/helpers/resilience-matched-pairs.mts:
  six pairs (FR/DE, NO/CA, UAE/BH, JP/KR, IN/ZA, SG/CH) with expected-
  direction rationale and minGap for acceptance gate 7.
- scripts/compare-resilience-current-vs-proposed.mjs extended to emit
  cohortSummary and matchedPairSummary alongside the existing output
  shape (backward compatible).
- tests/resilience-cohort-config.test.mts: 11 validations ensuring the
  cohort + matched-pair configs stay well-formed.

Deferred to PR 0.5 (before PR 1 lands):
- Monotonicity test harness for all 19 dimension scorers pinning the
  sign of every indicator.
- Pearson-derivative variable-influence baseline inside the sensitivity
  script producing the nominal-weight-vs-effective-influence table that
  plan acceptance gate 8 requires.

Verification: typecheck:all clean, 430/430 resilience tests pass,
11/11 new cohort-config tests pass, snapshot auto-discovered and
validated by the existing snapshot-test harness.

* feat(resilience): PR 0 follow-ups — monotonicity harness, variable-influence baseline, cross-consumer formula gate

Completes the PR 0 scope per the v3 plan §5 deliverables. Three adds:

1. Monotonicity test harness
   tests/resilience-dimension-monotonicity.test.mts pins the direction
   of movement for 14 indicators across 7 dimensions (reserve adequacy,
   fiscal space 3x, external debt coverage, import concentration,
   governance WGI, food/water 2x, energy 5x). Each test builds two
   synthetic ResilienceSeedReader fixtures differing only in the target
   indicator and asserts the dimension score moves in the documented
   direction. The scoreEnergy tests explicitly flag three indicators
   (gasShare, coalShare, electricityConsumption) that PR 1 §3.1-3.2
   overturns so future readers understand which directional claims the
   plan intentionally replaces.

2. Variable-influence baseline
   scripts/compare-resilience-current-vs-proposed.mjs now computes
   per-dimension Pearson correlation against the current overallScore
   scaled by the dimension's nominal domain weight (a Pearson-derivative
   approximation of Sobol indices). The output carries a
   variableInfluence[] array sorted by abs(effectiveInfluence) desc.
   Acceptance gate 8 from the plan compares post-change effective
   influence against assigned nominal weight; divergences flag a
   wealth-proxy or saturated-signal construct problem.
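
   As a rough illustration of the computation described above (not the
   harness code): Pearson correlation between a dimension's scores and
   the overall score, scaled by the nominal weight. Returning 0 for a
   zero-variance or too-small sample is an assumption of this sketch.

```javascript
// Pearson correlation of two equal-length numeric arrays.
// Returns 0 when fewer than two pairs exist or either side has no variance.
function pearson(xs, ys) {
  const n = Math.min(xs.length, ys.length);
  if (n < 2) return 0;
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < n; i++) {
    sumX += xs[i];
    sumY += ys[i];
  }
  const meanX = sumX / n;
  const meanY = sumY / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  const denom = Math.sqrt(varX * varY);
  return denom === 0 ? 0 : cov / denom;
}

// Effective influence: correlation with the overall score, scaled by
// the dimension's nominal weight.
function effectiveInfluence(dimensionScores, overallScores, nominalWeight) {
  return pearson(dimensionScores, overallScores) * nominalWeight;
}

// Sort rows by absolute effective influence, descending.
function sortByInfluence(rows) {
  return [...rows].sort(
    (a, b) => Math.abs(b.effectiveInfluence) - Math.abs(a.effectiveInfluence)
  );
}
```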

3. Cross-consumer formula gate
   Five external consumers of resilience:score:v10:* now filter stale-
   formula entries so a flag flip does not serve mixed-formula data
   downstream:
     - server/worldmonitor/supply-chain/v1/get-route-impact.ts —
       readResilienceScore() checks _formula via the new
       getCurrentCacheFormula export and returns 0 on mismatch.
     - scripts/validate-resilience-correlation.mjs,
       scripts/validate-resilience-backtest.mjs,
       scripts/backtest-resilience-outcomes.mjs,
       scripts/benchmark-resilience-external.mjs — each inlines a
       currentCacheFormulaLocal() helper that mirrors the server's
       formula derivation from env, skips parsed entries whose
       _formula disagrees, and logs the skip count so operators can
       notice a mismatch during the flip window.
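
   The guard amounts to something like the following sketch. The
   `_formula` field and the 'd6' / 'pc' tags appear in this message; the
   helper name `filterByFormula` and the log wording are hypothetical.

```javascript
// Keep only cache entries whose _formula tag matches the formula the
// current environment would produce, and report how many were skipped.
function filterByFormula(entries, currentFormula) {
  const kept = entries.filter((entry) => entry._formula === currentFormula);
  const skipped = entries.length - kept.length;
  if (skipped > 0) {
    // Surfacing the count lets operators notice a mismatch mid-flip.
    console.warn(`skipped ${skipped} stale-formula entries (want ${currentFormula})`);
  }
  return { kept, skipped };
}

const { kept, skipped } = filterByFormula(
  [
    { country: 'FR', _formula: 'd6' },
    { country: 'DE', _formula: 'pc' },
    { country: 'NO', _formula: 'd6' },
  ],
  'd6'
);
```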

A mixed-formula cohort (some countries d6-tagged, others pc-tagged)
would confound every correlation, AUC, and Spearman this repair plan
depends on for its acceptance gates. These guards close that gap.

Verification: typecheck:all clean, 444/444 resilience tests pass
(+14 from the new monotonicity harness).

* fix(resilience): PR 0 review follow-ups — sample-union + doc tense

Two review-driven fixes on top of PR 0.

1. scripts/compare-resilience-current-vs-proposed.mjs — the cohort and
   matched-pair summaries were computed against the historical
   52-country sensitivity seed, which silently excluded the
   small-island-importers cohort (zero members in the seed) and the
   sg-vs-ch matched pair (Singapore not in the seed). With the current
   script those acceptance gates are partially measured at best.

   SAMPLE now = union(historical 52 seed, every cohort member, every
   matched-pair endpoint). The imports for RESILIENCE_COHORTS and
   MATCHED_PAIRS moved from inside main() to module scope so the union
   can be computed before the script runs.

   Net sample size grows from 52 to ~95 countries. Still fast enough
   for an interactive pass; makes the acceptance gates honest.

2. docs/methodology/country-resilience-index.mdx — the construct
   contract wording read as present-tense compliance ("Every indicator
   in the scorer passes a single mechanism test"), which contradicted
   the immediately-following passage about indicators that currently
   fail the test. Reworded to "is being evaluated against" and added
   an explicit PR-0-does-not-change-scoring paragraph that names the
   known-failing indicators (electricityConsumption, gas/coal flat
   penalties, WHO per-capita health spend) and points at the repair
   plan for the replacement schedule.

Verification: typecheck:all clean, 444/444 resilience tests pass.

* fix(resilience): compare-script loads frozen baseline + emits per-indicator influence

Addresses two P1 review findings on PR #3284:

1. Script previously compared current-6d vs proposed-pillar-combined
   from the SAME checkout; never loaded the frozen pre-PR-0 baseline,
   so acceptance gates 2/6/7 ("no country moved >15pts vs baseline",
   cohort median shift vs baseline, matched-pair gap change vs
   baseline) could not be enforced for later scorer PRs.

   Now auto-discovers the most recent
   resilience-ranking-live-pre-repair-<date>.json (or post-<pr>-<date>)
   in docs/snapshots/ and emits a baselineComparison block with:
   spearmanVsBaseline, maxCountryAbsDelta, biggestDriftsVsBaseline,
   cohortShiftVsBaseline, matchedPairGapChange. If no baseline is
   found, the block is emitted with status 'unavailable' so callers
   distinguish missing-baseline from passed-baseline.

2. variableInfluence was emitted only at the dimension level, which
   hid the exact sub-indicators the repair plan targets
   (electricityConsumption, gasShare, coalShare, etc.) inside their
   parent dimension. Added extractIndicatorValues() which pulls twelve
   construct-risk indicators per country from the shared memoized
   reader, then computes per-indicator Pearson correlation against
   the current overall score. Emitted as perIndicatorInfluence[],
   sorted by absolute effective influence.

Acceptance gate 8 ("effective influence agrees in sign and rank-order
with assigned nominal weights") is now computable at the indicator
level, not only at the dimension level.

No production code touched; diagnostic-harness only.

* fix(resilience): baseline-snapshot selection by structured parse, not filename sort

Addresses P1 review on compare-resilience-current-vs-proposed.mjs:118-130.

Plain filename sort breaks the "immediate-prior state" contract two ways:

1. Lexical ordering: `pre-repair` sorts after `post-*`
   (`pre...` vs `pos...`: 'r' > 'o'), so the PR-0 freeze would keep winning even
   after post-PR snapshots exist. Later scorer PRs would then report
   acceptance-gate deltas against the original pre-repair freeze
   instead of the immediately-prior post-PR-(N-1) snapshot — the gate
   would appear valid while measuring against the wrong baseline.

2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so PR-10 would
   lose the selection to PR-9.

Fix: parseBaselineSnapshotMeta() extracts (kind, prNumber, date) from
the filename. Sort keys are (kindRank desc, prNumber desc, date desc):
  - post always beats pre-repair (kindRank 1 vs 0)
  - among posts, prNumber compared numerically (10 beats 9)
  - date breaks ties (same-PR re-snapshots, later capture wins)
  - unlabeled post tags get prNumber 0 so they sort between
    pre-repair and any numbered PR snapshot
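
A sketch of the ordering contract, for readers who want to see the sort keys in code. The exact filename grammar is an assumption here (a `post-pr<N>-<date>` tag for numbered snapshots, any other `post-<tag>-<date>` treated as unlabeled); the real parser may accept a different pattern.

```javascript
// Assumed filename grammar:
//   resilience-ranking-live-pre-repair-YYYY-MM-DD.json
//   resilience-ranking-live-post-<tag>-YYYY-MM-DD.json  (tag "pr<N>" is numbered)
const SNAPSHOT_RE =
  /^resilience-ranking-live-(pre-repair|post-(.+?))-(\d{4}-\d{2}-\d{2})\.json$/;

function parseBaselineSnapshotMeta(filename) {
  const match = SNAPSHOT_RE.exec(filename);
  if (!match) return null;
  const isPost = match[1] !== 'pre-repair';
  const prMatch = isPost ? /^pr(\d+)$/.exec(match[2]) : null;
  return {
    filename,
    kind: isPost ? 'post' : 'pre-repair',
    kindRank: isPost ? 1 : 0,
    // Unlabeled post tags get prNumber 0.
    prNumber: prMatch ? Number(prMatch[1]) : 0,
    date: match[3],
  };
}

// Sort keys: kindRank desc, prNumber desc (numeric), date desc.
function compareSnapshots(a, b) {
  if (a.kindRank !== b.kindRank) return b.kindRank - a.kindRank;
  if (a.prNumber !== b.prNumber) return b.prNumber - a.prNumber;
  if (a.date === b.date) return 0;
  return a.date < b.date ? 1 : -1; // ISO dates compare correctly as strings
}

function selectBaseline(filenames) {
  const parsed = filenames.map(parseBaselineSnapshotMeta).filter(Boolean);
  parsed.sort(compareSnapshots);
  return parsed.length > 0 ? parsed[0] : null;
}
```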

Surfaced in output: baselineKind / baselinePrNumber / baselineDate
alongside baselineFile so the operator can verify which snapshot was
selected without having to reopen the file.

Module now isMain-guarded per feedback_seed_isMain_guard memory so
tests can import parseBaselineSnapshotMeta without firing the
scoring run.

Added tests/resilience-baseline-snapshot-ordering.test.mjs (9 tests)
pinning the ordering contract for every known failure mode.

Diagnostic-harness change only. No production code touched.

* fix(resilience): full scorable universe + registry-driven per-indicator influence

Addresses two fresh P1 review findings on the PR 0 compare harness.

Finding 1 — acceptance math ran on a curated ~95-country sample,
so plan gate 2 could miss large regressions in excluded countries.

  - Main scoring loop now iterates the FULL scorable universe
    (listScorableCountries()), not the 52-country seed + cohort union.
  - Removed SAMPLE / HISTORICAL_SENSITIVITY_SEED constants.
  - Added scorableUniverseSize + cohortMissingFromScorable to output
    so operators see universe size and any cohort/pair endpoint that
    listScorable refuses to score (fail-loud, not silent drop).

Finding 3 — per-indicator influence was a hand-picked 12-indicator
subset, hiding most registry indicators from the baseline that
later scorer PRs need.

  - Extraction is now driven by INDICATOR_REGISTRY. Every Core +
    Enrichment indicator gets a row with explicit extractionStatus:
      implemented | not-implemented (with reason) | unregistered-in-harness
  - EXTRACTION_RULES covers 40/59 indicators across 11 shape families
    (static-path, static-wb-infrastructure, static-wgi, static-wgi-mean,
    static-who, energy-mix-field, gas-storage-field, recovery-country-
    field, imf-macro/labor-country-field, national-debt, sanctions-count).
  - Remaining 19 indicators need either a scorer trace hook (PR 0.5)
    or a safe aggregation duplicate; each carries a reason string.
  - extractionCoverage summary (totalIndicators / implemented /
    notImplemented / unregisteredInHarness / coreImplemented / coreTotal)
    exposed in output so PR 0.5 progress is measurable.

Added tests/resilience-indicator-extraction-plan.test.mjs (11 tests)
pinning: every registry entry has an extraction row; not-implemented
rows carry a reason; all 12 plan-named construct-risk indicators stay
extractable; Core-tier coverage floor of 45%; shape-family unit tests.

Diagnostic-harness change only. No production code touched.

* fix(resilience): wire event-aggregate per-indicator influence via exported scorer helpers

Addresses P1 review on PR 0 compare harness. Previous commit marked 16
Core-tier indicators as 'not-implemented' because they needed scorer
event-window/severity-weighting math; that left the gate-9 acceptance
apparatus incomplete for a large part of the shipped score.

Fix: export the scorer-internal aggregation helpers so the harness
calls them directly. No aggregation math is duplicated in the
harness, so harness and scorer cannot drift.

Exported from _dimension-scorers.ts (purely additive):
  summarizeCyber, summarizeOutages, summarizeGps,
  summarizeUcdp, summarizeUnrest, summarizeSocialVelocity,
  getCountryDisplacement, getThreatSummaryScore,
  countTradeRestrictions, countTradeBarriers.

13 extraction rules moved from not-implemented to implemented:
  cyberThreats, internetOutages, infraOutages, gpsJamming,
  ucdpConflict, unrestEvents, socialVelocity, newsThreatScore,
  displacementTotal, displacementHosted, tradeRestrictions,
  tradeBarriers, recoveryConflictPressure, recoveryDisplacementVelocity.

Coverage:
  52/59 total (88%), 46/50 Core-tier (92%).

Four Core indicators remain not-implemented. Three of them are
STRUCTURAL, NOT missing code: scorer inputs are genuinely global
scalars with zero per-country variance, so Pearson(indicator,
overall) is 0 or NaN by construction. The fourth needs a resolver:
  shippingStress, transitDisruption, energyPriceStress — scorer
  reads a global scalar applied to every country; a per-country
  effective signal would need re-expression as (global x per-country
  exposure), which is a derived signal in a different entry.
  aquastatWaterAvailability — needs a distinct sub-indicator path
  resolver; enrichment follow-up.

New test asserts the three no-per-country-variance indicators STAY
not-implemented with a matching reason, so any future extraction
that appears to cover them without fixing the underlying construct
fails.

Dispatcher split into STATIC / SIMPLE / AGGREGATE extractor tables
to stay under biome complexity limit. Core-tier floor test raised
from 45% to 80%.

89 resilience tests pass, typecheck clean, biome clean. No production
behaviour changes.

* fix(resilience): tag-gated AQUASTAT extractor closes the last fixable Core gap

Reviewer flagged aquastatWaterAvailability as the only remaining Core
indicator where the not-implemented status was structurally fixable
rather than conceptually impossible.

Both aquastatWaterStress and aquastatWaterAvailability share a single
.aquastat.value field; the scorer's scoreAquastatValue splits them
by the sibling .aquastat.indicator tag keyword (stress/withdrawal/
dependency map to the stress family; availability/renewable/access
map to the availability family). The harness now mirrors this
branching:

  - classifyAquastatFamily implements the scorer's priority order
    (stress-family match wins even if the tag also contains an
    availability keyword, matching the sequential if-check at
    _dimension-scorers.ts L770-776).
  - static-aquastat-stress / static-aquastat-availability extractors
    return the value only when the family matches, so stress-family
    readings never corrupt the availability Pearson and vice versa.
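
A sketch of the tag gate, assuming a case-insensitive substring match on the indicator tag (the matching mode is an assumption; the keyword lists and the stress-first priority are taken from this message). `extractAquastat` is a hypothetical stand-in for the two extractors.

```javascript
const STRESS_KEYWORDS = ['stress', 'withdrawal', 'dependency'];
const AVAILABILITY_KEYWORDS = ['availability', 'renewable', 'access'];

// Stress-family keywords are checked first, so a tag that contains
// keywords from both families routes to 'stress'.
function classifyAquastatFamily(indicatorTag) {
  const tag = String(indicatorTag ?? '').toLowerCase();
  if (STRESS_KEYWORDS.some((keyword) => tag.includes(keyword))) return 'stress';
  if (AVAILABILITY_KEYWORDS.some((keyword) => tag.includes(keyword))) return 'availability';
  return null;
}

// Return the shared .aquastat.value only when the tag's family matches
// the family the caller is extracting; otherwise null.
function extractAquastat(record, family) {
  const aquastat = record?.aquastat;
  if (!aquastat || typeof aquastat.value !== 'number') return null;
  return classifyAquastatFamily(aquastat.indicator) === family ? aquastat.value : null;
}
```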

Core-tier coverage: 46/50 to 47/50 (94%). The 3 remaining Core
not-implemented indicators (shippingStress, transitDisruption,
energyPriceStress) are all structural impossibilities: scorer inputs
are global scalars with zero per-country variance.

New contract test pins both directions of the tag gate plus the
priority-order edge case (a tag containing both families' keywords
routes to stress).

90 resilience tests pass, typecheck clean, biome clean.
2026-04-22 16:44:12 +04:00
Sebastien Melki
58e42aadf9 chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)
* chore(api): enforce sebuf contract via exceptions manifest (#3207)

Adds api/api-route-exceptions.json as the single source of truth for
non-proto /api/ endpoints, with scripts/enforce-sebuf-api-contract.mjs
gating every PR via npm run lint:api-contract. Fixes the root-only blind
spot in the prior allowlist (tests/edge-functions.test.mjs), which only
scanned top-level *.js files and missed nested paths and .ts endpoints —
the gap that let api/supply-chain/v1/country-products.ts and friends
drift under proto domain URL prefixes unchallenged.

Checks both directions: every api/<domain>/v<N>/[rpc].ts must pair with
a generated service_server.ts (so a deleted proto fails CI), and every
generated service must have an HTTP gateway (no orphaned generated code).

Manifest entries require category + reason + owner, with removal_issue
mandatory for temporary categories (deferred, migration-pending) and
forbidden for permanent ones. .github/CODEOWNERS pins the manifest to
@SebastienMelki so new exceptions don't slip through review.
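
The entry rules above could be checked along these lines. This is a sketch only: `validateManifestEntry` and the example permanent category name are hypothetical, and the real enforcement script may validate more fields.

```javascript
// Temporary categories named in the commit message.
const TEMPORARY_CATEGORIES = new Set(['deferred', 'migration-pending']);

// Returns a list of human-readable problems; an empty list means valid.
function validateManifestEntry(entry) {
  const errors = [];
  for (const field of ['category', 'reason', 'owner']) {
    if (!entry[field]) errors.push(`missing ${field}`);
  }
  const isTemporary = TEMPORARY_CATEGORIES.has(entry.category);
  if (isTemporary && !entry.removal_issue) {
    errors.push('removal_issue is mandatory for temporary categories');
  }
  if (entry.category && !isTemporary && entry.removal_issue) {
    errors.push('removal_issue is forbidden for permanent categories');
  }
  return errors;
}
```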

The manifest only shrinks: migration-pending entries (19 today) will be
removed as subsequent commits in this PR land each migration.

* refactor(maritime): migrate /api/ais-snapshot → maritime/v1.GetVesselSnapshot (#3207)

The proto VesselSnapshot was carrying density + disruptions but the frontend
also needed sequence, relay status, and candidate_reports to drive the
position-callback system. Those only lived on the raw relay passthrough, so
the client had to keep hitting /api/ais-snapshot whenever callbacks were
registered and fall back to the proto RPC only when the relay URL was gone.

This commit pushes all three missing fields through the proto contract and
collapses the dual-fetch-path into one proto client call.

Proto changes (proto/worldmonitor/maritime/v1/):
  - VesselSnapshot gains sequence, status, candidate_reports.
  - GetVesselSnapshotRequest gains include_candidates (query: include_candidates).

Handler (server/worldmonitor/maritime/v1/get-vessel-snapshot.ts):
  - Forwards include_candidates to ?candidates=... on the relay.
  - Separate 5-min in-memory caches for the candidates=on and candidates=off
    variants; they have very different payload sizes and should not share a slot.
  - Per-request in-flight dedup preserved per-variant.

Frontend (src/services/maritime/index.ts):
  - fetchSnapshotPayload now calls MaritimeServiceClient.getVesselSnapshot
    directly with includeCandidates threaded through. The raw-relay path,
    SNAPSHOT_PROXY_URL, DIRECT_RAILWAY_SNAPSHOT_URL and LOCAL_SNAPSHOT_FALLBACK
    are gone — production already routed via Vercel, the "direct" branch only
    ever fired on localhost, and the proto gateway covers both.
  - New toLegacyCandidateReport helper mirrors toDensityZone/toDisruptionEvent.

api/ais-snapshot.js deleted; manifest entry removed. Codegen was
deliberately scoped to worldmonitor.maritime.v1 (buf generate --path):
regenerating the full tree drops // @ts-nocheck from every client/server
file and surfaces pre-existing type errors across 30+ unrelated services,
which is out of scope for this PR.

Shape-diff vs legacy payload:
  - disruptions / density: proto carries the same fields, just with the
    GeoCoordinates wrapper and enum strings (remapped client-side via
    existing toDisruptionEvent / toDensityZone helpers).
  - sequence, status.{connected,vessels,messages}: now populated from the
    proto response — was hardcoded to 0/false in the prior proto fallback.
  - candidateReports: same shape; optional numeric fields come through as
    0 instead of undefined, which the legacy consumer already handled.

* refactor(sanctions): migrate /api/sanctions-entity-search → LookupSanctionEntity (#3207)

The proto docstring already claimed "OFAC + OpenSanctions" coverage but the
handler only fuzzy-matched a local OFAC Redis index — narrower than the
legacy /api/sanctions-entity-search, which proxied OpenSanctions live (the
source advertised in docs/api-proxies.mdx). Deleting the legacy without
expanding the handler would have been a silent coverage regression for
external consumers.

Handler changes (server/worldmonitor/sanctions/v1/lookup-entity.ts):
  - Primary path: live search against api.opensanctions.org/search/default
    with an 8s timeout and the same User-Agent the legacy edge fn used.
  - Fallback path: the existing OFAC local fuzzy match, kept intact for when
    OpenSanctions is unreachable / rate-limiting.
  - Response source field flips between 'opensanctions' (happy path) and
    'ofac' (fallback) so clients can tell which index answered.
  - Query validation tightened: rejects q > 200 chars (matches legacy cap).

Rate limiting:
  - Added /api/sanctions/v1/lookup-entity to ENDPOINT_RATE_POLICIES at 30/min
    per IP — matches the legacy createIpRateLimiter budget. The gateway
    already enforces per-endpoint policies via checkEndpointRateLimit.

Docs:
  - docs/api-proxies.mdx — dropped the /api/sanctions-entity-search row
    (plus the orphaned /api/ais-snapshot row left over from the previous
    commit in this PR).
  - docs/panels/sanctions-pressure.mdx — points at the new RPC URL and
    describes the OpenSanctions-primary / OFAC-fallback semantics.

api/sanctions-entity-search.js deleted; manifest entry removed.

* refactor(military): migrate /api/military-flights → ListMilitaryFlights (#3207)

Legacy /api/military-flights read a pre-baked Redis blob written by the
seed-military-flights cron and returned flights in a flat app-friendly
shape (lat/lon, lowercase enums, lastSeenMs). The proto RPC takes a bbox,
fetches OpenSky live, classifies server-side, and returns nested
GeoCoordinates + MILITARY_*_TYPE_* enum strings + lastSeenAt — same data,
different contract.

fetchFromRedis in src/services/military-flights.ts had nothing sebuf-aware
in it. Renamed it to fetchViaProto and rewrote it to:

  - Instantiate MilitaryServiceClient against getRpcBaseUrl().
  - Iterate MILITARY_QUERY_REGIONS (PACIFIC + WESTERN) in parallel — same
    regions the desktop OpenSky path and the seed cron already use, so
    dashboard coverage tracks the analytic pipeline.
  - Dedup by hexCode across regions.
  - Map proto → app shape via new mapProtoFlight helper plus three reverse
    enum maps (AIRCRAFT_TYPE_REVERSE, OPERATOR_REVERSE, CONFIDENCE_REVERSE).

The seed cron (scripts/seed-military-flights.mjs) stays put: it feeds
regional-snapshot mobility, cross-source signals, correlation, and the
health freshness check (api/health.js: 'military:flights:v1'). None of
those read the legacy HTTP endpoint; they read the Redis key directly.
The proto handler uses its own per-bbox cache keys under the same prefix,
so dashboard traffic no longer races the seed cron's blob — the two paths
diverge by a small refresh lag, which is acceptable.

Docs: dropped the /api/military-flights row from docs/api-proxies.mdx.

api/military-flights.js deleted; manifest entry removed.

Shape-diff vs legacy:
  - f.location.{latitude,longitude} → f.lat, f.lon
  - f.aircraftType: MILITARY_AIRCRAFT_TYPE_TANKER → 'tanker' via reverse map
  - f.operator: MILITARY_OPERATOR_USAF → 'usaf' via reverse map
  - f.confidence: MILITARY_CONFIDENCE_LOW → 'low' via reverse map
  - f.lastSeenAt (number) → f.lastSeen (Date)
  - f.enrichment → f.enriched (with field renames)
  - Extra fields registration / aircraftModel / origin / destination /
    firstSeenAt now flow through where proto populates them.
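
A minimal sketch of the proto → app mapping listed above, assuming trimmed-down types. Only the names mapProtoFlight, AIRCRAFT_TYPE_REVERSE, OPERATOR_REVERSE, and CONFIDENCE_REVERSE come from this commit; the field set, the 'unknown' fallback, and the millisecond unit for lastSeenAt are assumptions for illustration.

```typescript
// Hypothetical, trimmed-down shapes; the real proto and app types carry more fields.
interface ProtoFlight {
  hexCode: string;
  location: { latitude: number; longitude: number };
  aircraftType: string; // e.g. 'MILITARY_AIRCRAFT_TYPE_TANKER'
  operator: string; // e.g. 'MILITARY_OPERATOR_USAF'
  confidence: string; // e.g. 'MILITARY_CONFIDENCE_LOW'
  lastSeenAt: number; // assumed to be epoch milliseconds
}

interface AppFlight {
  hexCode: string;
  lat: number;
  lon: number;
  aircraftType: string;
  operator: string;
  confidence: string;
  lastSeen: Date;
}

// Reverse enum maps; only the entries named in the shape-diff are shown.
const AIRCRAFT_TYPE_REVERSE = new Map<string, string>([['MILITARY_AIRCRAFT_TYPE_TANKER', 'tanker']]);
const OPERATOR_REVERSE = new Map<string, string>([['MILITARY_OPERATOR_USAF', 'usaf']]);
const CONFIDENCE_REVERSE = new Map<string, string>([['MILITARY_CONFIDENCE_LOW', 'low']]);

function mapProtoFlight(f: ProtoFlight): AppFlight {
  return {
    hexCode: f.hexCode,
    lat: f.location.latitude,
    lon: f.location.longitude,
    // 'unknown' is an assumed fallback for enum values missing from the maps.
    aircraftType: AIRCRAFT_TYPE_REVERSE.get(f.aircraftType) ?? 'unknown',
    operator: OPERATOR_REVERSE.get(f.operator) ?? 'unknown',
    confidence: CONFIDENCE_REVERSE.get(f.confidence) ?? 'unknown',
    lastSeen: new Date(f.lastSeenAt),
  };
}
```

Keeping the mapping in one helper means the panels keep consuming the flat app shape while the wire contract stays nested.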

* fix(supply-chain): thread includeCandidates through chokepoint status (#3207)

Caught by tsconfig.api.json typecheck in the pre-push hook (not covered
by the plain tsc --noEmit run that ran before I pushed the ais-snapshot
commit). The chokepoint status handler calls getVesselSnapshot internally
with a static no-auth request — now required to include the new
includeCandidates bool from the proto extension.

Passing false: server-internal callers don't need per-vessel reports.

* test(maritime): update getVesselSnapshot cache assertions (#3207)

The ais-snapshot migration replaced the single cachedSnapshot/cacheTimestamp
pair with a per-variant cache so candidates-on and candidates-off payloads
don't evict each other. Pre-push hook surfaced that tests/server-handlers
still asserted the old variable names. Rewriting the assertions to match
the new shape while preserving the invariants they actually guard:

  - Freshness check against slot TTL.
  - Cache read before relay call.
  - Per-slot in-flight dedup.
  - Stale-serve on relay failure (result ?? slot.snapshot).

* chore(proto): restore // @ts-nocheck on regenerated maritime files (#3207)

I ran 'buf generate --path worldmonitor/maritime/v1' to scope the proto
regen to the one service I was changing (to avoid the toolchain drift
that drops @ts-nocheck from 60+ unrelated files — separate issue). But
the repo convention is the 'make generate' target, which runs buf and
then sed-prepends '// @ts-nocheck' to every generated .ts file. My
scoped command skipped the sed step. The proto-check CI enforces the
sed output, so the two maritime files need the directive restored.

* refactor(enrichment): decomm /api/enrichment/{company,signals} legacy edge fns (#3207)

Both endpoints were already ported to IntelligenceService:
  - getCompanyEnrichment  (/api/intelligence/v1/get-company-enrichment)
  - listCompanySignals    (/api/intelligence/v1/list-company-signals)

No frontend callers of the legacy /api/enrichment/* paths exist. Removes:
  - api/enrichment/company.js, signals.js, _domain.js
  - api-route-exceptions.json migration-pending entries (58 remain)
  - docs/api-proxies.mdx rows for /api/enrichment/{company,signals}
  - docs/architecture.mdx reference updated to the IntelligenceService RPCs

Verified: typecheck, typecheck:api, lint:api-contract (89 files / 58 entries),
lint:boundaries, tests/edge-functions.test.mjs (136 pass),
tests/enrichment-caching.test.mjs (14 pass — still guards the intelligence/v1
handlers), make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(leads): migrate /api/{contact,register-interest} → LeadsService (#3207)

New leads/v1 sebuf service with two POST RPCs:
  - SubmitContact    → /api/leads/v1/submit-contact
  - RegisterInterest → /api/leads/v1/register-interest

Handler logic ported 1:1 from api/contact.js + api/register-interest.js:
  - Turnstile verification (desktop sources bypass, preserved)
  - Honeypot (website field) silently accepts without upstream calls
  - Free-email-domain gate on SubmitContact (422 ApiError)
  - validateEmail (disposable/offensive/typo-TLD/MX) on RegisterInterest
  - Convex writes via ConvexHttpClient (contactMessages:submit, registerInterest:register)
  - Resend notification + confirmation emails (HTML templates unchanged)

Shared helpers moved to server/_shared/:
  - turnstile.ts (getClientIp + verifyTurnstile)
  - email-validation.ts (disposable/offensive/MX checks)

Rate limits preserved via ENDPOINT_RATE_POLICIES:
  - submit-contact:    3/hour per IP (was in-memory 3/hr)
  - register-interest: 5/hour per IP (was in-memory 5/hr; desktop
    sources previously capped at 2/hr via shared in-memory map —
    now 5/hr like everyone else, accepting the small regression in
    exchange for Upstash-backed global limiting)

Callers updated:
  - pro-test/src/App.tsx contact form → new submit-contact path
  - src-tauri/sidecar/local-api-server.mjs cloud-fallback rewrites
    /api/register-interest → /api/leads/v1/register-interest when
    proxying; keeps local path for older desktop builds
  - src/services/runtime.ts isKeyFreeApiTarget allows both old and
    new paths through the WORLDMONITOR_API_KEY-optional gate

Tests:
  - tests/contact-handler.test.mjs rewritten to call submitContact
    handler directly; asserts on ValidationError / ApiError
  - tests/email-validation.test.mjs + tests/turnstile.test.mjs
    point at the new server/_shared/ modules

Deleted: api/contact.js, api/register-interest.js, api/_ip-rate-limit.js,
api/_turnstile.js, api/_email-validation.js, api/_turnstile.test.mjs.
Manifest entries removed (58 → 56). Docs updated (api-platform,
api-commerce, usage-rate-limits).

Verified: npm run typecheck + typecheck:api + lint:api-contract
(88 files / 56 entries) + lint:boundaries pass; full test:data
(5852 tests) passes; make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(pro-test): rebuild bundle for leads/v1 contact form (#3207)

Updates the enterprise contact form to POST to /api/leads/v1/submit-contact
(old path /api/contact removed in the previous commit).

Bundle is rebuilt from pro-test/src/App.tsx source change in 9ccd309d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address HIGH review findings 1-3 (#3207)

Three review findings from @koala73 on the sebuf-migration PR, all
silent bugs that would have shipped to prod:

### 1. Sanctions rate-limit policy was dead code

ENDPOINT_RATE_POLICIES keyed the 30/min budget under
/api/sanctions/v1/lookup-entity, but the generated route (from the
proto RPC LookupSanctionEntity) is /api/sanctions/v1/lookup-sanction-entity.
hasEndpointRatePolicy / getEndpointRatelimit are exact-string pathname
lookups, so the mismatch meant the endpoint fell through to the
generic 600/min global limiter instead of the advertised 30/min.

Net effect: the live OpenSanctions proxy endpoint (unauthenticated,
external upstream) had 20x the intended rate budget. Fixed by renaming
the policy key to match the generated route.
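
The failure mode can be sketched as follows, assuming a plain Map stands in for ENDPOINT_RATE_POLICIES and a hypothetical limitFor helper stands in for the gateway lookup. The two pathnames and the 30/min and 600/min budgets are the ones quoted above.

```typescript
// Assumed stand-in for the gateway lookup: exact-string match, else the global budget.
const GLOBAL_LIMIT_PER_MIN = 600;

function limitFor(policies: Map<string, number>, pathname: string): number {
  return policies.get(pathname) ?? GLOBAL_LIMIT_PER_MIN;
}

const generatedRoute = '/api/sanctions/v1/lookup-sanction-entity';

// Before the fix: the policy is keyed on a path the generator never emits.
const stalePolicies = new Map<string, number>([['/api/sanctions/v1/lookup-entity', 30]]);

// After the fix: the key matches the generated route.
const fixedPolicies = new Map<string, number>([[generatedRoute, 30]]);

const before = limitFor(stalePolicies, generatedRoute); // falls through to the global limiter
const after = limitFor(fixedPolicies, generatedRoute); // picks up the endpoint policy
```

Because the lookup is an exact match, nothing fails loudly when the key drifts; the request simply gets the looser budget.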

### 2. Lost stale-seed fallback on military-flights

Legacy api/military-flights.js cascaded military:flights:v1 →
military:flights:stale:v1 before returning empty. The new proto
handler went straight to live OpenSky/relay and returned null on miss.

A relay or OpenSky hiccup used to fall back to stale seeded data (24h TTL);
under the new handler it showed an empty map. Both keys are still
written by scripts/seed-military-flights.mjs on every run — fix just
reads the stale key when the live fetch returns null, converts the
seed's app-shape flights (flat lat/lon, lowercase enums, lastSeenMs)
to the proto shape (nested GeoCoordinates, enum strings, lastSeenAt),
and filters to the request bbox.

Read via getRawJson (unprefixed) to match the seed cron's writes,
which bypass the env-prefix system.
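
A rough sketch of the fallback described above, assuming trimmed-down shapes. All names here (SeedFlight, StaleProtoFlight, BBox, withStaleFallback) are hypothetical, and the enum-name rule (uppercase the app value behind a fixed prefix) is an assumption based on the TANKER ↔ 'tanker' pair in the earlier shape-diff.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface SeedFlight {
  hexCode: string;
  lat: number;
  lon: number;
  aircraftType: string; // lowercase app enum, e.g. 'tanker'
  lastSeenMs: number;
}

interface StaleProtoFlight {
  hexCode: string;
  location: { latitude: number; longitude: number };
  aircraftType: string;
  lastSeenAt: number;
}

interface BBox {
  south: number;
  west: number;
  north: number;
  east: number;
}

function seedToProto(s: SeedFlight): StaleProtoFlight {
  return {
    hexCode: s.hexCode,
    location: { latitude: s.lat, longitude: s.lon },
    // Assumed naming rule: fixed prefix plus the uppercased app value.
    aircraftType: `MILITARY_AIRCRAFT_TYPE_${s.aircraftType.toUpperCase()}`,
    lastSeenAt: s.lastSeenMs,
  };
}

// Simple containment test; does not handle boxes that cross the antimeridian.
function inBbox(f: StaleProtoFlight, b: BBox): boolean {
  const { latitude, longitude } = f.location;
  return latitude >= b.south && latitude <= b.north && longitude >= b.west && longitude <= b.east;
}

// Use the live result when there is one; otherwise convert and clip the stale seed.
function withStaleFallback(
  live: StaleProtoFlight[] | null,
  staleSeed: SeedFlight[],
  bbox: BBox,
): StaleProtoFlight[] {
  if (live !== null) return live;
  return staleSeed.map(seedToProto).filter((f) => inBbox(f, bbox));
}
```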

### 3. Hex-code casing mismatch broke getFlightByHex

The seed cron writes hexCode: icao24.toUpperCase() (uppercase);
src/services/military-flights.ts:getFlightByHex uppercases the lookup
input: f.hexCode === hexCode.toUpperCase(). The new proto handler
preserved OpenSky's lowercase icao24, and mapProtoFlight is a
pass-through. getFlightByHex was silently returning undefined for
every call after the migration.

Fix: uppercase in the proto handler (live + stale paths), and document
the invariant in a comment on MilitaryFlight.hex_code in
military_flight.proto so future handlers don't re-break it.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries) / lint:boundaries clean
- tests/edge-functions.test.mjs 130 pass
- make generate zero-diff (openapi spec regenerated for proto comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): restore desktop 2/hr rate cap on register-interest (#3207)

Addresses HIGH review finding #4 from @koala73. The legacy
api/register-interest.js applied a nested 2/hr per-IP cap when
`source === 'desktop-settings'`, on top of the generic 5/hr endpoint
budget. The sebuf migration lost this — desktop-source requests now
enjoy the full 5/hr cap.

Since `source` is an unsigned client-supplied field, anyone sending
`source: 'desktop-settings'` skips Turnstile AND gets 5/hr. Without
the tighter cap the Turnstile bypass is cheaper to abuse.

Added `checkScopedRateLimit` to `server/_shared/rate-limit.ts` — a
reusable second-stage Upstash limiter keyed on an opaque scope string
+ caller identifier. Fail-open on Redis errors to match existing
checkRateLimit / checkEndpointRateLimit semantics. Handlers that need
per-subscope caps on top of the gateway-level endpoint budget use this
helper.

In register-interest: when `isDesktopSource`, call checkScopedRateLimit
with scope `/api/leads/v1/register-interest#desktop`, limit=2, window=1h,
IP as identifier. On exceeded → throw ApiError(429).
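
To illustrate the second-stage cap, here is an in-memory fixed-window sketch. It is not the Upstash-backed checkScopedRateLimit from this commit and it omits the fail-open behavior; createScopedLimiter and its signature are hypothetical, and only the scope string, limit, and window are taken from the text above.

```typescript
// In-memory fixed-window limiter keyed on scope + identifier (illustration only).
function createScopedLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { count: number; start: number }>();

  return function allow(scope: string, identifier: string, nowMs: number): boolean {
    const key = `${scope}:${identifier}`;
    const current = windows.get(key);

    // No window yet, or the previous window has expired: start a new one.
    if (current === undefined || nowMs - current.start >= windowMs) {
      windows.set(key, { count: 1, start: nowMs });
      return true;
    }

    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}

const DESKTOP_SCOPE = '/api/leads/v1/register-interest#desktop';
const ONE_HOUR_MS = 60 * 60 * 1000;
const allowDesktop = createScopedLimiter(2, ONE_HOUR_MS);
```

A handler would call the limiter only when the request claims the desktop source and throw a 429 when it returns false; every other request is left to the gateway-level endpoint budget.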

### What this does not fix

This caps the blast radius of the Turnstile bypass but does not close
it — an attacker sending `source: 'desktop-settings'` still skips
Turnstile (just at 2/hr instead of 5/hr). The proper fix is a signed
desktop-secret header that authenticates the bypass; filed as
follow-up #3252. That requires coordinated Tauri build + Vercel env
changes out of scope for #3207.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries)
- tests/edge-functions.test.mjs + contact-handler.test.mjs (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): MEDIUM + LOW + rate-limit-policy CI check (#3207)

Closes out the remaining @koala73 review findings from #3242 that
didn't already land in the HIGH-fix commits, plus the requested CI
check that would have caught HIGH #1 (dead-code policy key) at
review time.

### MEDIUM #5 — Turnstile missing-secret policy default

Flip `verifyTurnstile`'s default `missingSecretPolicy` from `'allow'`
to `'allow-in-development'`. Dev with no secret = pass (expected
local); prod with no secret = reject + log. submit-contact was
already explicitly overriding to `'allow-in-development'`;
register-interest was silently getting `'allow'`. Safe default now
means a future missing-secret misconfiguration in prod gets caught
instead of silently letting bots through. Removed the now-redundant
override in submit-contact.
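
The policy flip can be sketched like this, assuming the decision reduces to a policy value plus a development flag. allowWhenSecretMissing is a hypothetical helper, not the real verifyTurnstile signature; the two policy values are the ones named above.

```typescript
type MissingSecretPolicy = 'allow' | 'allow-in-development';

// Decide whether a request may pass when no Turnstile secret is configured.
// The default mirrors the new behavior described in this commit.
function allowWhenSecretMissing(
  isDevelopment: boolean,
  policy: MissingSecretPolicy = 'allow-in-development',
): boolean {
  if (policy === 'allow') return true; // old default: passes everywhere
  return isDevelopment; // new default: passes only in development
}
```

With the old default, a production deploy missing the secret behaved exactly like a healthy one; with the new default the same misconfiguration rejects requests and shows up in logs.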

### MEDIUM #6 — Silent enum fallbacks in maritime client

`toDisruptionEvent` mapped `AIS_DISRUPTION_TYPE_UNSPECIFIED` / unknown
enum values → `gap_spike` / `low` silently. Refactored to return null
when either enum is unknown; caller filters nulls out of the array.
Handler doesn't produce UNSPECIFIED today, but the `gap_spike`
default would have mislabeled the first new enum value the proto
ever adds — dropping unknowns is safer than shipping wrong labels.
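
A small sketch of the drop-unknowns behavior, assuming trimmed-down event shapes. The proto enum names other than `AIS_DISRUPTION_TYPE_UNSPECIFIED` are hypothetical, as are the ProtoEvent / AppEvent types and the toDisruptionEvents wrapper.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ProtoEvent {
  id: string;
  type: string;
  severity: string;
}

interface AppEvent {
  id: string;
  type: string;
  severity: string;
}

// Assumed enum names; only 'gap_spike' and 'low' are taken from the commit text.
const TYPE_MAP = new Map<string, string>([['AIS_DISRUPTION_TYPE_GAP_SPIKE', 'gap_spike']]);
const SEVERITY_MAP = new Map<string, string>([['AIS_DISRUPTION_SEVERITY_LOW', 'low']]);

// Return null instead of guessing when either enum value is unknown.
function toDisruptionEvent(e: ProtoEvent): AppEvent | null {
  const type = TYPE_MAP.get(e.type);
  const severity = SEVERITY_MAP.get(e.severity);
  if (type === undefined || severity === undefined) return null;
  return { id: e.id, type, severity };
}

// Caller side: filter the nulls out of the array.
function toDisruptionEvents(events: ProtoEvent[]): AppEvent[] {
  return events.map(toDisruptionEvent).filter((e): e is AppEvent => e !== null);
}
```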

### LOW — Copy drift in register-interest email

Email template hardcoded `435+ Sources`; PR #3241 bumped marketing to
`500+`. Bumped in the rewritten file to stay consistent.

The `as any` on Convex mutation names was carried over from legacy and is
filed as follow-up #3253.

### Rate-limit-policy coverage lint

`scripts/enforce-rate-limit-policies.mjs` validates every key in
`ENDPOINT_RATE_POLICIES` resolves to a proto-generated gateway route
by cross-referencing `docs/api/*.openapi.yaml`. Fails with the
sanctions-entity-search incident referenced in the error message so
future drift has a paper trail.

Wired into package.json (`lint:rate-limit-policies`) and the pre-push
hook alongside `lint:boundaries`. Smoke-tested both directions —
clean repo passes (5 policies / 175 routes), seeded drift (the exact
HIGH #1 typo) fails with the advertised remedy text.
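
The core of such a check can be sketched as a set-membership test. This is an assumption about the approach, not the contents of `scripts/enforce-rate-limit-policies.mjs`: findOrphanPolicyKeys is a hypothetical name, and the route list is passed in directly rather than parsed from the OpenAPI files.

```typescript
// Return every policy key that does not match a known gateway route exactly.
function findOrphanPolicyKeys(policyKeys: string[], gatewayRoutes: string[]): string[] {
  const routes = new Set(gatewayRoutes);
  return policyKeys.filter((key) => !routes.has(key));
}

const knownRoutes = [
  '/api/sanctions/v1/lookup-sanction-entity',
  '/api/leads/v1/submit-contact',
  '/api/leads/v1/register-interest',
];

// Clean repo: every policy key is a generated route.
const cleanOrphans = findOrphanPolicyKeys(
  ['/api/sanctions/v1/lookup-sanction-entity', '/api/leads/v1/submit-contact'],
  knownRoutes,
);

// Seeded drift: the exact HIGH #1 typo.
const driftOrphans = findOrphanPolicyKeys(['/api/sanctions/v1/lookup-entity'], knownRoutes);
```

A lint wrapper would exit non-zero whenever the returned list is non-empty and print the offending keys.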

### Verified
- `lint:rate-limit-policies` ✓
- `typecheck` + `typecheck:api` ✓
- `lint:api-contract` ✓ (56 entries)
- `lint:boundaries` ✓
- edge-functions + contact-handler tests (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 5): decomm /api/eia/* + migrate /api/satellites → IntelligenceService (#3207)

Both targets turned out to be decomm-not-migration cases. The original
plan called for two new services (economic/v1.GetEiaSeries +
natural/v1.ListSatellitePositions) but research found neither was
needed:

### /api/eia/[[...path]].js — pure decomm, zero consumers

The "catch-all" is a misnomer — only two paths actually worked,
/api/eia/health and /api/eia/petroleum, both Redis-only readers.
Zero frontend callers in src/. Zero server-side readers. Nothing
consumes the `energy:eia-petroleum:v1` key that seed-eia-petroleum.mjs
writes daily.

The EIA data the frontend actually uses goes through existing typed
RPCs in economic/v1: GetEnergyPrices, GetCrudeInventories,
GetNatGasStorage, GetEnergyCapacity. None of those touch /api/eia/*.

Building GetEiaSeries would have been dead code. Deleted the legacy
file + its test (tests/api-eia-petroleum.test.mjs — it only covered
the legacy endpoint, no behavior to preserve). Empty api/eia/ dir
removed.

**Note for review:** the Redis seed cron keeps running daily and
nothing consumes it. If that stays unused, seed-eia-petroleum.mjs
should be retired too (separate PR). Out of scope for sebuf-migration.

### /api/satellites.js — Learning #2 strikes again

IntelligenceService.ListSatellites already exists at
/api/intelligence/v1/list-satellites, reads the same Redis key
(intelligence:satellites:tle:v1), and supports an optional country
filter the legacy didn't have.

One frontend caller in src/services/satellites.ts needed to switch
from `fetch(toApiUrl('/api/satellites'))` to the typed
IntelligenceServiceClient.listSatellites. Shape diff was tiny —
legacy `noradId` became proto `id` (handler line 36 already picks
either), everything else identical. alt/velocity/inclination in the
proto are ignored by the caller since it propagates positions
client-side via satellite.js.

Kept the client-side cache + failure cooldown + 20s timeout (still
valid concerns at the caller level).

### Manifest + docs
- api-route-exceptions.json: 56 → 54 entries (both removed)
- docs/api-proxies.mdx: dropped the two rows from the Raw-data
  passthroughs table

### Verified
- typecheck + typecheck:api ✓
- lint:api-contract (54 entries) / lint:boundaries / lint:rate-limit-policies ✓
- tests/edge-functions.test.mjs 127 pass (down from 130 — 3 tests were
  for the deleted eia endpoint)
- make generate zero-diff (no proto changes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 6): migrate /api/supply-chain/v1/{country-products,multi-sector-cost-shock} → SupplyChainService (#3207)

Both endpoints were hand-rolled TS handlers sitting under a proto URL prefix —
the exact drift the manifest guardrail flagged. Promoted both to typed RPCs:

- GetCountryProducts → /api/supply-chain/v1/get-country-products
- GetMultiSectorCostShock → /api/supply-chain/v1/get-multi-sector-cost-shock

Handlers preserve the existing semantics: PRO-gate via isCallerPremium(ctx.request),
iso2 / chokepointId validation, raw bilateral-hs4 Redis read (skip env-prefix to
match seeder writes), CHOKEPOINT_STATUS_KEY for war-risk tier, and the math from
_multi-sector-shock.ts unchanged. Empty-data and non-PRO paths return the typed
empty payload (no 403 — the sebuf gateway pattern is empty-payload-on-deny).

Client wrapper switches from premiumFetch to client.getCountryProducts/
client.getMultiSectorCostShock. Legacy MultiSectorShock / MultiSectorShockResponse /
CountryProductsResponse names remain as type aliases of the generated proto types
so CountryBriefPanel + CountryDeepDivePanel callsites compile with zero churn.

Manifest 54 → 52. Rate-limit gateway routes 175 → 177.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): add cache-tier entries for new supply-chain RPCs (#3207)

Pre-push tests/route-cache-tier.test.mjs caught the missing entries.
Both PRO-gated, request-varying — match the existing supply-chain PRO cohort
(get-country-cost-shock, get-bypass-options, etc.) at slow-browser tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 7): migrate /api/scenario/v1/{run,status,templates} → ScenarioService (#3207)

Promote the three literal-filename scenario endpoints to a typed sebuf
service with three RPCs:

  POST /api/scenario/v1/run-scenario        (RunScenario)
  GET  /api/scenario/v1/get-scenario-status (GetScenarioStatus)
  GET  /api/scenario/v1/list-scenario-templates (ListScenarioTemplates)

Preserves all security invariants from the legacy handlers:
- 405 for wrong method (sebuf service-config method gate)
- scenarioId validation against SCENARIO_TEMPLATES registry
- iso2 regex ^[A-Z]{2}$
- JOB_ID_RE path-traversal guard on status
- Per-IP 10/min rate limit (moved to gateway ENDPOINT_RATE_POLICIES)
- Queue-depth backpressure (>100 → 429)
- PRO gating via isCallerPremium
- AbortSignal.timeout on every Redis pipeline (runRedisPipeline helper)

Wire-level diffs vs legacy:
- Per-user RL now enforced at the gateway (same 10/min/IP budget).
- Rate-limit response omits Retry-After header; retryAfter is in the
  body per error-mapper.ts convention.
- ListScenarioTemplates emits affectedHs2: [] when the registry entry
  is null (all-sectors sentinel); proto repeated cannot carry null.
- RunScenario returns { jobId, status } (no statusUrl field — unused
  by SupplyChainPanel, drop from wire).

Gateway wiring:
- server/gateway.ts RPC_CACHE_TIER: list-scenario-templates → 'daily'
  (matches legacy max-age=3600); get-scenario-status → 'slow-browser'
  (premium short-circuit target, explicit entry required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: swap old run/status for the new
  run-scenario/get-scenario-status paths.
- api/scenario/v1/{run,status,templates}.ts deleted; 3 manifest
  exceptions removed (63 → 52 → 49 migration-pending).

Client:
- src/services/scenario/index.ts — typed client wrapper using
  premiumFetch (injects Clerk bearer / API key).
- src/components/SupplyChainPanel.ts — polling loop swapped from
  premiumFetch strings to runScenario/getScenarioStatus. Hard 20s
  timeout on run preserved via AbortSignal.any.

Tests:
- tests/scenario-handler.test.mjs — 18 new handler-level tests
  covering every security invariant + the worker envelope coercion.
- tests/edge-functions.test.mjs — scenario sections removed,
  replaced with a breadcrumb pointer to the new test file.

Docs: api-scenarios.mdx, scenario-engine.mdx, usage-rate-limits.mdx,
usage-errors.mdx, supply-chain.mdx refreshed with new paths.

Verified: typecheck, typecheck:api, lint:api-contract (49 entries),
lint:rate-limit-policies (6/180), lint:boundaries, route-cache-tier
(parity), full edge-functions (117) + scenario-handler (18).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 8): migrate /api/v2/shipping/{route-intelligence,webhooks} → ShippingV2Service (#3207)

Partner-facing endpoints promoted to a typed sebuf service. Wire shape
preserved byte-for-byte (camelCase field names, ISO-8601 fetchedAt, the
same subscriberId/secret formats, the same SET + SADD + EXPIRE 30-day
Redis pipeline). Partner URLs /api/v2/shipping/* are unchanged.

RPCs landed:
- GET  /route-intelligence  → RouteIntelligence  (PRO, slow-browser)
- POST /webhooks            → RegisterWebhook    (PRO)
- GET  /webhooks            → ListWebhooks       (PRO, slow-browser)

The existing path-parameter URLs remain on the legacy edge-function
layout because sebuf's HTTP annotations don't currently model path
params (grep proto/**/*.proto for `path: "{…}"` returns zero). Those
endpoints are split into two Vercel dynamic-route files under
api/v2/shipping/webhooks/, behaviorally identical to the previous
hybrid file but cleanly separated:
- GET  /webhooks/{subscriberId}                → [subscriberId].ts
- POST /webhooks/{subscriberId}/rotate-secret  → [subscriberId]/[action].ts
- POST /webhooks/{subscriberId}/reactivate     → [subscriberId]/[action].ts

Both get manifest entries under `migration-pending` pointing at #3207.

Other changes
- scripts/enforce-sebuf-api-contract.mjs: extended GATEWAY_RE to accept
  api/v{N}/{domain}/[rpc].ts (version-first) alongside the canonical
  api/{domain}/v{N}/[rpc].ts; first-use of the reversed ordering is
  shipping/v2 because that's the partner contract.
- vite.config.ts: dev-server sebuf interceptor regex extended to match
  both layouts; shipping/v2 import + allRoutes entry added.
- server/gateway.ts: RPC_CACHE_TIER entries for /api/v2/shipping/
  route-intelligence + /webhooks (slow-browser; premium-gated endpoints
  short-circuit to slow-browser but the entries are required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: route-intelligence + webhooks added.
- tests/shipping-v2-handler.test.mjs: 18 handler-level tests covering
  PRO gate, iso2/cargoType/hs2 coercion, SSRF guards (http://, RFC1918,
  cloud metadata, IMDS), chokepoint whitelist, alertThreshold range,
  secret/subscriberId format, pipeline shape + 30-day TTL, cross-tenant
  owner isolation, `secret` omission from list response.

Manifest delta
- Removed: api/v2/shipping/route-intelligence.ts, api/v2/shipping/webhooks.ts
- Added:   api/v2/shipping/webhooks/[subscriberId].ts (migration-pending)
- Added:   api/v2/shipping/webhooks/[subscriberId]/[action].ts (migration-pending)
- Added:   api/internal/brief-why-matters.ts (internal-helper) — regression
  surface from the #3248 main merge, which introduced the file without a
  manifest entry. Filed here to keep the lint green; not strictly in scope
  for commit 8 but unblocking.

Net result: 49 → 47 `migration-pending` entries (one net-removal even
though webhook path-params stay pending, because two files collapsed
into two dynamic routes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1): SupplyChainServiceClient must use premiumFetch (#3207)

Signed-in browser pro users were silently hitting 401 on 8 supply-chain
premium endpoints (country-products, multi-sector-cost-shock,
country-chokepoint-index, bypass-options, country-cost-shock,
sector-dependency, route-explorer-lane, route-impact). The shared
client was constructed with globalThis.fetch, so no Clerk bearer or
X-WorldMonitor-Key was injected. The gateway's validateApiKey runs
with forceKey=true for PREMIUM_RPC_PATHS and 401s before isCallerPremium
is consulted. The generated client's try/catch collapses the 401 into
an empty-fallback return, leaving panels blank with no visible error.

Fix is one line at the client constructor: swap globalThis.fetch for
premiumFetch. The same pattern is already in use for insider-transactions,
stock-analysis, stock-backtest, scenario, trade (premiumClient) — this
was an omission on this client, not a new pattern.

premiumFetch no-ops safely when no credentials are available, so the
5 non-premium methods on this client (shippingRates, chokepointStatus,
chokepointHistory, criticalMinerals, shippingStress) continue to work
unchanged.

This also fixes two panels that were already latently broken on main
(chokepoint-index, bypass-options, etc.); they predate #3207 and are not
regressions from it. Commit 6 expanded the surface by routing two more
methods through the same buggy client; this commit fixes the class.

From koala73 review (#3242 second-pass, HIGH new #1):
> Exact class PR #3233 fixed for RegionalIntelligenceBoard /
> DeductionPanel / trade / country-intel. Supply-chain was not in
> #3233's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2): restore 400 on input-shape errors for 2 supply-chain handlers (#3207)

Commit 6 collapsed all non-happy paths into empty-200 on
`get-country-products` and `get-multi-sector-cost-shock`, including
caller-bug cases that legacy returned 400 for:

- get-country-products: malformed iso2 → empty 200 (was 400)
- get-multi-sector-cost-shock: malformed iso2 / missing chokepointId /
  unknown chokepointId → empty 200 (was 400)

The commit message for 6 called out the 403-for-non-pro → empty-200
shift ("sebuf gateway pattern is empty-payload-on-deny") but not the
400 shift. They're different classes:

- Empty-payload-200 for PRO-deny: intentional contract change, already
  documented and applied across the service. Generated clients treat
  "you lack PRO" as "no data" — fine.
- Empty-payload-200 for malformed input: caller bug silently masked.
  External API consumers can't distinguish "bad wiring" from "genuinely
  no data", test harnesses lose the signal, bad calling code doesn't
  surface in Sentry.

Fix: `throw new ValidationError(violations)` on the 3 input-shape
branches. The generated sebuf server maps ValidationError → HTTP 400
(see src/generated/server/.../service_server.ts and leads/v1 which
already uses this pattern).

PRO-gate deny stays as empty-200 — that contract shift was intentional
and is preserved.

Regression tests added at tests/supply-chain-validation.test.mjs (8
cases) pinning the three-way contract:
- bad input                         → 400 (ValidationError)
- PRO-gate deny on valid input      → 200 empty
- valid PRO input, no data in Redis → 200 empty (unchanged)

From koala73 review (#3242 second-pass, HIGH new #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 3): restore statusUrl on RunScenarioResponse + document 202→200 wire break (#3207)

Commit 7 silently shifted /api/scenario/v1/run-scenario's response
contract in two ways that the commit message covered only partially:

1. HTTP 202 Accepted → HTTP 200 OK
2. Dropped `statusUrl` string from the response body

The `statusUrl` drop was mentioned as "unused by SupplyChainPanel" but
not framed as a contract change. The 202 → 200 shift was not mentioned
at all. This is a same-version (v1 → v1) migration, so external callers
that key off either signal — `response.status === 202` or
`response.body.statusUrl` — silently branch incorrectly.

Evaluated options:
  (a) sebuf per-RPC status-code config — not available. sebuf's
      HttpConfig only models `path` and `method`; no status annotation.
  (b) Bump to scenario/v2 — judged heavier than the break itself for
      a single status-code shift. No in-repo caller uses 202 or
      statusUrl; the docs-level impact is containable.
  (c) Accept the break, document explicitly, partially restore.

Took option (c):

- Restored `statusUrl` in the proto (new field `string status_url = 3`
  on RunScenarioResponse). Server computes
  `/api/scenario/v1/get-scenario-status?jobId=<encoded job_id>` and
  populates it on every successful enqueue. External callers that
  followed this URL keep working unchanged.
- 202 → 200 is not recoverable inside the sebuf generator, so it is
  called out explicitly in two places:
    - docs/api-scenarios.mdx now includes a prominent `<Warning>` block
      documenting the v1→v1 contract shift + the suggested migration
      (branch on response body shape, not HTTP status).
    - RunScenarioResponse proto comment explains why 200 is the new
      success status on enqueue.
  OpenAPI bundle regenerated to reflect the restored statusUrl field.

- Regression test added in tests/scenario-handler.test.mjs pinning
  `statusUrl` to the exact URL-encoded shape — locks the invariant so
  a future proto rename or handler refactor can't silently drop it
  again.
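
A minimal sketch of the restored field, assuming the server builds it with standard URL encoding. buildStatusUrl is a hypothetical helper name; the path and the `jobId` query parameter are the ones given above.

```typescript
// Build the polling URL for a job, percent-encoding the job id.
function buildStatusUrl(jobId: string): string {
  return `/api/scenario/v1/get-scenario-status?jobId=${encodeURIComponent(jobId)}`;
}

// Example with characters that must be escaped in a query value.
const exampleStatusUrl = buildStatusUrl('scenario:1/a b');
// exampleStatusUrl === '/api/scenario/v1/get-scenario-status?jobId=scenario%3A1%2Fa%20b'
```

External callers that previously branched on `response.status === 202` can instead check for a non-empty `statusUrl` (or `jobId`) in the body, which holds under both the old and the new status code.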

From koala73 review (#3242 second-pass, HIGH new #3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1/2): close webhook tenant-isolation gap on shipping/v2 (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

server/worldmonitor/shipping/v2/{register-webhook,list-webhooks}.ts
migrated without reinstating validateApiKey(req, { forceKey: true }),
diverging from both the sibling api/v2/shipping/webhooks/[subscriberId]
routes and the documented "X-WorldMonitor-Key required" contract in
docs/api-shipping-v2.mdx.

Attack surface: the gateway accepts Clerk bearer auth as a pro signal.
A Clerk-authenticated pro user with no X-WorldMonitor-Key reaches the
handler, callerFingerprint() falls back to 'anon', and every such
caller collapses into a shared webhook:owner:anon:v1 bucket. The
defense-in-depth ownerTag !== ownerHash check in list-webhooks.ts
doesn't catch it because both sides equal 'anon' — every Clerk-session
holder could enumerate / overwrite every other Clerk-session pro
tenant's registered webhook URLs.

Fix: reinstate validateApiKey(ctx.request, { forceKey: true }) at the
top of each handler, throwing ApiError(401) when absent. Matches the
sibling routes exactly and the published partner contract.
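
The collapse into a shared bucket can be sketched as follows. Everything here is a simplified stand-in: the real callerFingerprint presumably hashes the key, the ApiError constructor signature is assumed, and listWebhooksSketch with its enforceKey flag exists only to contrast the two behaviors.

```typescript
// Local stand-in for the server's ApiError.
class ApiError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    Object.setPrototypeOf(this, ApiError.prototype);
    this.status = status;
  }
}

// Stand-in fingerprint: the real helper presumably hashes the key.
function callerFingerprint(apiKey: string | undefined): string {
  return apiKey ? `key:${apiKey}` : 'anon';
}

function listWebhooksSketch(
  apiKey: string | undefined,
  store: Map<string, string[]>,
  enforceKey: boolean,
): string[] {
  // The fix: refuse keyless callers before any owner bucket is computed.
  if (enforceKey && !apiKey) throw new ApiError(401, 'X-WorldMonitor-Key required');
  const owner = callerFingerprint(apiKey);
  return store.get(`webhook:owner:${owner}:v1`) ?? [];
}
```

Without the gate, every keyless caller resolves to the same `anon` owner, so an ownership comparison between two such callers always passes.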

Tests:
- tests/shipping-v2-handler.test.mjs: two existing "non-PRO → 403"
  tests for register/list were using makeCtx() with no key, which now
  fails at the 401 layer first. Renamed to "no API key → 401
  (tenant-isolation gate)" with a comment explaining the failure mode
  being tested. 18/18 pass.

Verified: typecheck:api, lint:api-contract (no change), lint:boundaries,
lint:rate-limit-policies, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2/2): restore v1 path aliases on scenario + supply-chain (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

Commits 6 + 7 of #3207 renamed five documented v1 URLs to the sebuf
method-derived paths and deleted the legacy edge-function files:

  POST /api/scenario/v1/run                       → run-scenario
  GET  /api/scenario/v1/status                    → get-scenario-status
  GET  /api/scenario/v1/templates                 → list-scenario-templates
  GET  /api/supply-chain/v1/country-products      → get-country-products
  GET  /api/supply-chain/v1/multi-sector-cost-shock → get-multi-sector-cost-shock

server/router.ts is an exact static-match table (Map keyed on `METHOD
PATH`), so any external caller — docs, partner scripts, grep-the-
internet — hitting the old documented URL would 404 on first request
after merge. Commit 8 (shipping/v2) preserved partner URLs byte-for-
byte; the scenario + supply-chain renames missed that discipline.

Fix: add five thin alias edge functions that rewrite the pathname to
the canonical sebuf path and delegate to the domain [rpc].ts gateway
via a new server/alias-rewrite.ts helper. Premium gating, rate limits,
entitlement checks, and cache-tier lookups all fire on the canonical
path — aliases are pure URL rewrites, not a duplicate handler pipeline.

  api/scenario/v1/{run,status,templates}.ts
  api/supply-chain/v1/{country-products,multi-sector-cost-shock}.ts

Vite dev parity: file-based routing at api/ is a Vercel concern, so the
dev middleware (vite.config.ts) gets a matching V1_ALIASES rewrite map
before the router dispatch.
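
The alias behaviour can be sketched roughly as below. This is an
illustration under assumptions, not the contents of server/alias-rewrite.ts
or vite.config.ts: the full canonical paths are guessed by swapping the
last segment for the sebuf method name, and rewriteAlias(), routes and
dispatch() are hypothetical names.

```typescript
// Old documented path -> assumed canonical sebuf path.
const V1_ALIASES: ReadonlyMap<string, string> = new Map<string, string>([
  ["/api/scenario/v1/run", "/api/scenario/v1/run-scenario"],
  ["/api/scenario/v1/status", "/api/scenario/v1/get-scenario-status"],
  ["/api/scenario/v1/templates", "/api/scenario/v1/list-scenario-templates"],
  ["/api/supply-chain/v1/country-products", "/api/supply-chain/v1/get-country-products"],
  ["/api/supply-chain/v1/multi-sector-cost-shock", "/api/supply-chain/v1/get-multi-sector-cost-shock"],
]);

// Pure pathname rewrite: unknown paths pass through untouched.
function rewriteAlias(pathname: string): string {
  return V1_ALIASES.get(pathname) ?? pathname;
}

// Exact static-match table keyed on `METHOD PATH`, as the router is described.
const routes = new Map<string, () => number>([
  ["POST /api/scenario/v1/run-scenario", () => 200],
  ["GET /api/scenario/v1/get-scenario-status", () => 200],
]);

// Rewrite first, then look up once, so gating runs on the canonical path only.
function dispatch(method: string, pathname: string): number {
  const handler = routes.get(`${method} ${rewriteAlias(pathname)}`);
  return handler ? handler() : 404;
}
```

Because the table is an exact match, dispatch("POST", "/api/scenario/v1/run")
would 404 without the rewrite step; with it, the old URL resolves to the
same handler entry as the canonical one.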

Manifest: 5 new entries under `deferred` with removal_issue=#3282
(tracking their retirement at the next v1→v2 break). lint:api-contract
stays green (89 files checked, 55 manifest entries validated).

Docs:
- docs/api-scenarios.mdx: migration callout at the top with the full
  old→new URL table and a link to the retirement issue.
- CHANGELOG.md + docs/changelog.mdx: Changed entry documenting the
  rename + alias compat + the 202→200 shift (from commit 23c821a1).

Verified: typecheck:api, lint:api-contract, lint:rate-limit-policies,
lint:boundaries, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:55:59 +03:00
Elie Habib
425507d15a fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding (#3281)
* fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding

Shadow-diff of 15 v2 pairs (2026-04-22) showed the analyst pattern-
matching the loudest context numbers — VIX 19.50, top forecast
probability, MidEast FX stress 77 — into every story regardless of
editorial fit. A Rwanda humanitarian story about refugees cited VIX;
an aviation story cited a forecast probability.

Root cause: every story got the same 6-bundle context block, so the
LLM had markets / forecasts / macro in-hand and the "cite a specific
fact" instruction did the rest.

Two-layer fix:

  1. STRUCTURAL — sectionsForCategory() maps the story's category to
     an editorially-relevant subset of bundles. Humanitarian stories
     don't see marketData / forecasts / macroSignals; diplomacy gets
     riskScores only; market/energy gets markets+forecasts but drops
     riskScores. The model physically cannot cite what it wasn't
     given. Unknown categories fall back to all six (backcompat).

  2. PROMPT — WHY_MATTERS_ANALYST_SYSTEM_V2 adds a RELEVANCE RULE
     that explicitly permits grounding in headline/description
     actors when no context fact is a clean fit, and bans dragging
     off-topic market metrics into humanitarian/aviation/diplomacy
     stories. The prompt footer (inline, per-call) restates the
     same guardrail — models follow inline instructions more
     reliably than system-prompt constraints on longer outputs.
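
A rough sketch of the structural layer, assuming the six bundles are the
ones named in this PR (worldBrief, countryBrief, riskScores, marketData,
forecasts, macroSignals). The regexes, the Policy shape and the exact
market subset are illustrative guesses; only the function name, the
"default" fallback and the humanitarian/market exclusions come from the
description above.

```typescript
type Section =
  | "worldBrief" | "countryBrief" | "riskScores"
  | "marketData" | "forecasts" | "macroSignals";

// Unknown categories keep all six bundles (backcompat).
const DEFAULT_SECTIONS: readonly Section[] = [
  "worldBrief", "countryBrief", "riskScores",
  "marketData", "forecasts", "macroSignals",
];

interface Policy {
  label: string;
  match: RegExp; // hypothetical patterns, not the real ones
  sections: readonly Section[];
}

const CATEGORY_SECTION_POLICY: readonly Policy[] = [
  {
    label: "humanitarian",
    match: /humanitarian|refugee|famine/i,
    sections: ["worldBrief", "countryBrief", "riskScores"],
  },
  {
    // Assumed subset: markets + forecasts, riskScores dropped.
    label: "market",
    match: /market|energy/i,
    sections: ["worldBrief", "countryBrief", "marketData", "forecasts"],
  },
];

// First matching policy wins; otherwise fall back to the default set.
function sectionsForCategory(category: string): { policyLabel: string; sections: readonly Section[] } {
  const policy = CATEGORY_SECTION_POLICY.find((p) => p.match.test(category));
  return policy
    ? { policyLabel: policy.label, sections: policy.sections }
    : { policyLabel: "default", sections: DEFAULT_SECTIONS };
}
```

The context block is then built only from the returned sections, so a
bundle missing from the list never reaches the prompt.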

Cache keys bumped to invalidate the formulaic v5 output: endpoint
v5 to v6, shadow v3 to v4. Adds 11 unit tests pinning the 5
policies + default fallback + humanitarian structural guarantee +
market policy does-see-markets + guardrail footer presence.

Observability: endpoint now logs policyLabel per call so operators
can confirm in Vercel logs that humanitarian/aviation stories are
NOT seeing marketData without dumping the full prompt.

* test(brief): address greptile P2 — sync MAX_BODY_BYTES + add parseWhyMattersV2 coverage

Greptile PR #3281 review raised two P2 test-quality issues:

1. Test-side MAX_BODY_BYTES mirror was still 4096, although the
   endpoint was bumped to 8192 in PR #3269 (v2 output + description).
   With the stale constant, a payload in the 4097–8192 range was
   accepted by the real endpoint but looked oversized in the test
   mirror, letting the body-cap invariant silently drift. Fixed
   by syncing to 8192 + bumping the bloated fixture to 10_000
   bytes so a future endpoint-cap bump doesn't silently
   re-invalidate the assertion.

2. parseWhyMattersV2 (the only output-validation gate on the
   analyst path) had no dedicated unit tests. Adds 11 targeted
   cases covering: valid 2 and 3 sentence output, 100/500 char
   bounds (incl. boundary assertions), all 6 banned preamble
   phrases, section-label leaks (SITUATION/ANALYSIS/Watch),
   markdown leakage (#, -, *, 1.), stub echo rejection, smart/
   plain quote stripping, non-string defensive branch, and
   whitespace-only strings.
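
For orientation, a simplified validator covering a subset of the cases
listed above. It is a sketch, not parseWhyMattersV2 itself: the name
parseWhyMattersSketch is hypothetical, the 100/500 bounds are assumed to
be inclusive, and the banned-preamble, sentence-count and stub-echo
checks are left out because their exact rules are not given here.

```typescript
// Returns the cleaned text, or null when the output should be rejected.
function parseWhyMattersSketch(raw: unknown): string | null {
  // Defensive branch: anything that is not a string is rejected.
  if (typeof raw !== "string") return null;

  // Strip wrapping plain or smart quotes, then surrounding whitespace.
  const text = raw
    .trim()
    .replace(/^["'\u201C\u2018]+|["'\u201D\u2019]+$/g, "")
    .trim();

  // Length bounds (assumed inclusive); whitespace-only input fails here.
  if (text.length < 100 || text.length > 500) return null;

  // Section-label leaks at the start of any line.
  if (/^(SITUATION|ANALYSIS|Watch)\s*:/m.test(text)) return null;

  // Markdown leakage: heading, bullet, or numbered-list markers.
  if (/^\s*(#+|-|\*|\d+\.)\s/m.test(text)) return null;

  return text;
}
```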

Suite size: 50 to 61 tests, all green.

* fix(brief): add aviation policy to sectionsForCategory (PR #3281 review P1)

Reviewer caught that aviation was named in WHY_MATTERS_ANALYST_SYSTEM_V2's
RELEVANCE RULE as a category banned from off-topic market metrics, but
had no matching regex entry in CATEGORY_SECTION_POLICY. So 'Aviation
Incident' / 'Airspace Closure' / 'Plane Crash' / 'Drone Incursion' all
fell through to DEFAULT_SECTIONS and still got all 6 bundles including
marketData, forecasts, and macroSignals — exactly the VIX / forecast
probability pattern the PR claimed to structurally prevent.

Reproduced on HEAD before fix:
  Aviation Incident -> default
  Airspace Closure  -> default
  Plane Crash       -> default
  ...etc.

Fix:
  1. Adds aviation policy (same 3 bundles as humanitarian/diplomacy/
     tech: worldBrief, countryBrief, riskScores).
  2. Adds dedicated aviation-gating test with 6 category variants.
  3. Adds meta-invariant test: every category named in the system
     prompt's RELEVANCE RULE MUST have a structural policy entry,
     asserting policyLabel !== 'default'. If someone adds a new
     category name to the prompt in the future, this test fires
     until they wire up a regex — prevents soft-guard drift.
  4. Removes 'Aviation Incident' from the default-fall-through test
     list (it now correctly matches aviation).
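
The meta-invariant in step 3 could look roughly like this. The rule
text, the regexes and every helper name are hypothetical; the sketch
only shows the idea of deriving category names from the prompt and
failing when one of them resolves to 'default'.

```typescript
// Hypothetical excerpt standing in for the real RELEVANCE RULE wording.
const RELEVANCE_RULE =
  "RELEVANCE RULE: do not drag off-topic market metrics into humanitarian/aviation/diplomacy stories.";

// Illustrative label -> regex table; the real patterns are not shown here.
const POLICY_PATTERNS: ReadonlyArray<[string, RegExp]> = [
  ["humanitarian", /humanitarian|refugee/i],
  ["aviation", /aviation|airspace|plane|drone/i],
  ["diplomacy", /diplomacy|diplomatic|summit/i],
];

// Resolve a category to its policy label, or 'default' when nothing matches.
function policyLabel(category: string, patterns = POLICY_PATTERNS): string {
  const hit = patterns.find(([, re]) => re.test(category));
  return hit ? hit[0] : "default";
}

// Pull the slash-separated category names out of the rule sentence.
function categoriesNamedInRule(rule: string): string[] {
  const m = rule.match(/into ([a-z/]+) stories/i);
  return m?.[1]?.split("/") ?? [];
}

// The invariant: this list must be empty, or the prompt names a category
// that has only a soft (prompt-level) guard.
function unguardedCategories(rule: string, patterns = POLICY_PATTERNS): string[] {
  return categoriesNamedInRule(rule).filter((c) => policyLabel(c, patterns) === "default");
}
```

Dropping the aviation row from the table makes unguardedCategories()
return ["aviation"], which is the pre-fix state this commit describes.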

No cache bump needed: v6 was published to the feature branch only a
few minutes ago, so no production entries have been written yet.
2026-04-22 08:21:01 +04:00