654 Commits

Author SHA1 Message Date
Sebastien Melki
1db05e6caa feat(usage): per-request Axiom telemetry pipeline (gateway + upstream attribution) (#3403)
* feat(gateway): thread Vercel Edge ctx through createDomainGateway (#3381)

PR-0 of the Axiom usage-telemetry stack. Pure infra change: no telemetry
emission yet, only the signature plumbing required for ctx.waitUntil to
exist on the hot path.

- createDomainGateway returns (req, ctx) instead of (req)
- rewriteToSebuf propagates ctx to its target gateway
- 5 alias callsites updated to pass ctx through
- ~30 [rpc].ts callsites unchanged (export default createDomainGateway(...))

Pattern reference: api/notification-channels.ts:166.
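
For illustration, a minimal sketch of the new handler shape; everything
beyond createDomainGateway and ctx.waitUntil is a placeholder rather than
the repository's code:

```ts
interface EdgeContext {
  // Vercel Edge hands the handler a waitUntil that keeps background work
  // alive after the response has been returned.
  waitUntil(promise: Promise<unknown>): void;
}

type GatewayHandler = (req: Request, ctx?: EdgeContext) => Promise<Response>;

function createDomainGateway(/* domain config elided */): GatewayHandler {
  // Previously the returned handler accepted only (req); ctx is now threaded
  // through so later commits can call ctx.waitUntil on the hot path.
  return async (req, ctx) => {
    // ... routing, auth, rate limiting ...
    return new Response('ok');
  };
}

// Alias callsites simply forward both arguments:
const gateway = createDomainGateway();
export default (req: Request, ctx?: EdgeContext) => gateway(req, ctx);
```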

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(usage): pure UsageIdentity resolver + Axiom emit primitives (#3381)

server/_shared/usage-identity.ts
- buildUsageIdentity: pure function, consumes already-resolved gateway state.
- Static ENTERPRISE_KEY_TO_CUSTOMER map (explicit, reviewable in code).
- Does not re-verify JWTs or re-validate API keys.

server/_shared/usage.ts
- buildRequestEvent / buildUpstreamEvent: allowlisted-primitive builders only.
  Never accept Request/Response — additive field leaks become structurally
  impossible.
- emitUsageEvents → ctx.waitUntil(sendToAxiom). Direct fetch, 1.5s timeout,
  no retry, gated by USAGE_TELEMETRY=1 and AXIOM_API_TOKEN.
- Sliding-window circuit breaker (5% over 5min, min 20 samples). Trips with
  one structured console.error; subsequent drops are 1%-sampled console.warn.
- Header derivers reuse Vercel/CF headers for request_id, region, country,
  reqBytes; ua_hash null unless USAGE_UA_PEPPER is set (no stable
  fingerprinting).
- Dev-only x-usage-telemetry response header for 2-second debugging.
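
For orientation, a rough sketch of the usage.ts emit path the bullets above
describe; the event fields, dataset URL and helper wiring are assumptions,
not the actual module:

```ts
interface EdgeContext { waitUntil(p: Promise<unknown>): void }

// Allowlisted-primitive event: plain fields only, never a Request/Response.
interface UsageEvent { reason: string; route: string; status: number; customer_id: string | null }

async function sendToAxiom(events: UsageEvent[]): Promise<void> {
  // Direct fetch, short timeout, no retry, mirroring the description above.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 1500);
  try {
    await fetch('https://api.axiom.co/v1/datasets/usage/ingest', {
      method: 'POST',
      headers: { authorization: `Bearer ${process.env.AXIOM_API_TOKEN}` },
      body: JSON.stringify(events),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}

function emitUsageEvents(ctx: EdgeContext | undefined, events: UsageEvent[]): void {
  // Gated on both the feature flag and the token; a no-op otherwise.
  if (process.env.USAGE_TELEMETRY !== '1' || !process.env.AXIOM_API_TOKEN) return;
  // Delivery rides on waitUntil so it never blocks the response phase.
  ctx?.waitUntil(sendToAxiom(events).catch(() => { /* telemetry never throws */ }));
}
```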

server/_shared/auth-session.ts
- New resolveClerkSession returning { userId, orgId } in one JWT verify so
  customer_id can be Clerk org id without a second pass. resolveSessionUserId
  kept as back-compat wrapper.

No emission wiring yet — that lands in the next commit (gateway request
event + 403 + 429).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): emit Axiom request events on every return path (#3381)

Wires the request-event side of the Axiom usage-telemetry stack. Behind
USAGE_TELEMETRY=1 — no-op when the env var is unset.

Emit points (each builds identity from accumulated gateway state):
- disallowed origin (403) → reason=origin_403
- API access subscription required (403)
- legacy bearer 401 / 403 / 401-without-bearer
- entitlement check fail-through
- endpoint rate-limit 429 → reason=rate_limit_429
- global rate-limit 429 → reason=rate_limit_429
- 405 method not allowed
- 404 not found
- 304 etag match (resolved cache tier)
- 200 GET with body (resolved cache tier, real res_bytes)
- streaming / non-GET-200 final return (res_bytes best-effort)

Identity inputs (UsageIdentityInput):
- sessionUserId / clerkOrgId from new resolveClerkSession (one JWT verify)
- isUserApiKey + userApiKeyCustomerRef from validateUserApiKey result
- enterpriseApiKey when keyCheck.valid + non-wm_ wmKey present
- widgetKey from x-widget-key header (best-effort)
- tier captured opportunistically from existing getEntitlements calls

Header derivers reuse Vercel/CF metadata (x-vercel-id, x-vercel-ip-country,
cf-ipcountry, content-length, sentry-trace) — no new geo lookup, no new
crypto on the hot path. ua_hash null unless USAGE_UA_PEPPER is set.

Dev-only x-usage-telemetry response header (ok | degraded | off) attached
on the response paths for 2-second debugging in non-production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(usage): upstream events via implicit request scope (#3381)

Closes the upstream-attribution side of the Axiom usage-telemetry stack
without requiring leaf-handler changes (per koala's review).

server/_shared/usage.ts
- AsyncLocalStorage-backed UsageScope: gateway sets it once per request,
  fetch helpers read from it lazily. Defensive import — if the runtime
  rejects node:async_hooks, scope helpers degrade to no-ops and the
  request event is unaffected.
- runWithUsageScope(scope, fn) / getUsageScope() exports.

server/gateway.ts
- Wraps matchedHandler in runWithUsageScope({ ctx, requestId, customerId,
  route, tier }) so deep fetchers can attribute upstream calls without
  threading state through every handler signature.
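
A simplified sketch of the implicit scope; the real module guards the
node:async_hooks import defensively and degrades to no-ops, which is only
noted here:

```ts
import { AsyncLocalStorage } from 'node:async_hooks';

interface UsageScope {
  requestId: string;
  customerId: string | null;
  route: string;
  tier: number;
}

// One store per isolate; the gateway enters it once per matched request.
const storage = new AsyncLocalStorage<UsageScope>();

export function runWithUsageScope<T>(scope: UsageScope, fn: () => Promise<T>): Promise<T> {
  return storage.run(scope, fn);
}

export function getUsageScope(): UsageScope | undefined {
  // Deep fetch helpers read this lazily; outside a request (or when the
  // runtime rejects async_hooks) it simply returns undefined and the caller
  // skips attribution.
  return storage.getStore();
}
```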

server/_shared/redis.ts
- cachedFetchJsonWithMeta accepts opts.usage = { provider, operation? }.
  Only the provider label is required to opt in — request_id / customer_id
  / route / tier flow implicitly from UsageScope.
- Emits on the fresh path only (cache hits don't emit; the inbound
  request event already records cache_status).
- cache_status correctly distinguishes 'miss' vs 'neg-sentinel' by
  construction, matching NEG_SENTINEL handling.
- Telemetry never throws — failures are swallowed in the lazy-import
  catch, sink itself short-circuits on USAGE_TELEMETRY=0.

server/_shared/fetch-json.ts
- New optional { provider, operation } in FetchJsonOptions. Same
  opt-in-by-provider model as cachedFetchJsonWithMeta. Auto-derives host
  from URL. Reads body via .text() so response_bytes is recorded
  (best-effort; chunked responses still report 0).

Net result: any handler that uses fetchJson or cachedFetchJsonWithMeta
gets full per-customer upstream attribution by adding two fields to the
options bag. No signature changes anywhere else.
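
An illustrative call only; the URL and labels are invented and the option
names come from the description above:

```ts
// fetchJson is the repo's helper (server/_shared/fetch-json.ts); declared
// here only so the usage example stands alone.
declare function fetchJson<T>(
  url: string,
  opts?: { provider?: string; operation?: string },
): Promise<T>;

// Opting a single upstream call into attribution: only the provider label is
// required; request_id / customer_id / route / tier flow in from UsageScope.
const quotes = await fetchJson<unknown>('https://api.example-upstream.com/v1/quotes', {
  provider: 'example-upstream',
  operation: 'list-quotes',
});
```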

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): address round-1 codex feedback on usage telemetry

- ctx is now optional on the createDomainGateway handler signature so
  direct callers (tests, non-Vercel paths) no longer crash on emit
- legacy premium bearer-token routes (resilience, shipping-v2) propagate
  session.userId into the usage accumulator so successful requests are
  attributed instead of emitting as anon
- after checkEntitlement allows a tier-gated route, re-read entitlements
  (Redis-cached + in-flight coalesced) to populate usage.tier so
  analyze-stock & co. emit the correct tier rather than 0
- domain extraction now skips a leading vN segment, so /api/v2/shipping/*
  records domain="shipping" instead of "v2"
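
A sketch of the vN-skipping extraction, assuming a simple path-segment walk
(the committed implementation may differ in detail):

```ts
// '/api/v2/shipping/list-vessels' -> 'shipping'; '/api/news/v1/...' -> 'news'.
function extractDomain(pathname: string): string {
  const segments = pathname.split('/').filter(Boolean);
  let i = segments[0] === 'api' ? 1 : 0;
  // Skip a leading version segment such as v1 / v2 so it is not mistaken
  // for the domain.
  if (/^v\d+$/.test(segments[i] ?? '')) i += 1;
  return segments[i] ?? 'unknown';
}
```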

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(usage): assert telemetry payload + identity resolver + operator guide

- tests/usage-telemetry-emission.test.mts stubs globalThis.fetch to
  capture the Axiom ingest POST body and asserts the four review-flagged
  fields end-to-end through the gateway: domain on /api/v2/<svc>/* (was
  "v2"), customer_id on legacy premium bearer success (was null/anon),
  tier on entitlement-gated success via the Convex fallback path (was 0),
  plus a ctx-optional regression guard
- server/__tests__/usage-identity.test.ts unit-tests the pure
  buildUsageIdentity() resolver across every auth_kind branch, tier
  coercion, and the secret-handling invariant (raw enterprise key never
  lands in any output field)
- docs/architecture/usage-telemetry.md is the operator + dev guide:
  field reference, architecture, configuration, failure modes, local
  workflow, eight Axiom APL recipes, and runbooks for adding fields /
  new gateway return paths

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(usage): make recorder.settled robust to nested waitUntil

Promise.all(pending) snapshotted the array at call time, missing the
inner ctx.waitUntil(sendToAxiom(...)) that emitUsageEvents pushes after
the outer drain begins. Tests passed only because the fetch spy resolved
in an earlier microtask tick. Replace with a quiescence loop so the
helper survives any future async in the emit path.
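
A minimal sketch of such a quiescence loop, assuming the recorder keeps a
growing `pending` array:

```ts
// Keep draining until no new work was queued during the last drain, so
// promises pushed by nested waitUntil calls are also awaited.
async function settled(pending: Promise<unknown>[]): Promise<void> {
  let drained = 0;
  while (drained < pending.length) {
    const batch = pending.slice(drained);
    drained = pending.length;
    await Promise.allSettled(batch);
    // Give microtasks queued by the batch a chance to push more work.
    await Promise.resolve();
  }
}
```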

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: trigger preview

* fix(usage): address koala #3403 review — collapse nested waitUntil, widget-key validation, neg-sentinel status, auth_* reasons

P1
- Collapse nested ctx.waitUntil at all 3 emit sites (gateway.ts emitRequest,
  fetch-json.ts, redis.ts emitUpstreamFromHook). Export sendToAxiom and call
  it directly inside the outer waitUntil so Edge runtimes don't drop the
  delivery promise after the response phase.
- Validate X-Widget-Key against WIDGET_AGENT_KEY before populating usage.widgetKey
  so unauthenticated callers can't spoof per-customer attribution.

P2
- Emit on OPTIONS preflight (new 'preflight' RequestReason).
- Gate cachedFetchJsonWithMeta upstreamStatus=200 on result != null so the
  neg-sentinel branch no longer reports as a successful upstream call.
- Extend RequestReason with auth_401/auth_403/tier_403 and replace
  reason:'ok' on every auth/tier-rejection emit path.
- Replace 32-bit FNV-1a with a two-round XOR-folded 64-bit variant in
  hashKeySync (collision space matters once widget-key adoption grows).
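
The commit does not show the hash itself; the following is one plausible
shape of an XOR-folded 64-bit FNV-1a, for orientation only and not the
repository's hashKeySync:

```ts
const FNV64_OFFSET = 0xcbf29ce484222325n;
const FNV64_PRIME = 0x100000001b3n;
const MASK64 = 0xffffffffffffffffn;

// Standard 64-bit FNV-1a over the UTF-8 bytes, then folded by XOR-ing the
// high and low halves so the digest stays short while drawing on the full
// 64-bit collision space.
function hashKey(input: string): string {
  let hash = FNV64_OFFSET;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte);
    hash = (hash * FNV64_PRIME) & MASK64;
  }
  const folded = (hash >> 32n) ^ (hash & 0xffffffffn);
  return folded.toString(16).padStart(8, '0');
}
```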

Verification
- tests/usage-telemetry-emission.test.mts — 6/6
- tests/premium-stock-gateway.test.mts + tests/gateway-cdn-origin-policy.test.mts — 15/15
- npx vitest run server/__tests__/usage-identity.test.ts — 13/13
- npx tsc --noEmit clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: trigger preview rebuild for AXIOM_API_TOKEN

* chore(usage): note Axiom region in ingest URL comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* debug(usage): unconditional logs in sendToAxiom for preview troubleshooting

Temporary — to be reverted once Axiom delivery is confirmed working in preview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(usage): add 'live' cache tier + revert preview debug logs

- Sync UsageCacheTier with the local CacheTier in gateway.ts (main added
  'live' in PR #3402 — synthetic merge with main was failing typecheck:api).
- Revert temporary unconditional debug logs in sendToAxiom now that Axiom
  delivery is verified end-to-end on preview (event landed with all fields
  populated, including the new auth_401 reason from the koala #3403 fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 18:10:51 +03:00
Elie Habib
8cca8d19e3 feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read (#3385)
* feat(seed): BUNDLE_RUN_STARTED_AT_MS env + runSeed SIGTERM cleanup

Prereq for the re-export-share Comtrade seeder (plan 2026-04-24-003),
usable by any cohort seeder whose consumer needs bundle-level freshness.

Two coupled changes:

1. `_bundle-runner.mjs` injects `BUNDLE_RUN_STARTED_AT_MS` into every
   spawned child. All siblings in a single bundle run share one value
   (captured at `runBundle` start, not spawn time). Consumers use this
   to detect stale peer keys — if a peer's seed-meta predates the
   current bundle run, fall back to a hard default rather than read
   a cohort-peer's last-week output.

2. `_seed-utils.mjs::runSeed` registers a `process.once('SIGTERM')`
   handler that releases the acquired lock and extends existing-data
   TTL before exiting 143. `_bundle-runner.mjs` sends SIGTERM on
   section timeout, then SIGKILL after KILL_GRACE_MS (5s). Without
   this handler the `finally` path never runs on SIGKILL, leaving
   the 30-min acquireLock reservation in place until its own TTL
   expires — the next cron tick silently skips the resource.

Regression guard memory: `bundle-runner-sigkill-leaks-child-lock` (PR
#3128 root cause).
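
A simplified sketch of the cleanup handler; the helper names come from the
commit text and the rest is schematic:

```ts
function armSigtermCleanup(
  releaseLock: () => Promise<void>,
  extendExistingTtl: () => Promise<void>,
): void {
  process.once('SIGTERM', async () => {
    // Disjoint Redis keys, so both ops can settle concurrently; a degraded
    // Redis on one op cannot stall the other past the 5s SIGKILL grace.
    await Promise.allSettled([releaseLock(), extendExistingTtl()]);
    console.log('[seed] SIGTERM: lock released, existing-data TTL extended');
    process.exit(143); // 128 + SIGTERM
  });
}
```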

Tests added:
- bundle-runner env injection (value within run bounds)
- sibling sections share the same timestamp (critical for the
  consumer freshness guard)
- runSeed SIGTERM path: exit 143 + cleanup log
- process.once contract: second SIGTERM does not re-enter handler

* fix(seed): address P1/P2 review findings on SIGTERM + bundle contracts

Addresses PR #3384 review findings (todos 256, 257, 259, 260):

#256 (P1) — SIGTERM handler narrowed to fetch phase only. Was installed
at runSeed entry and armed through every `process.exit` path; could
race `emptyDataIsFailure: true` strict-floor exits (IMF-External,
WB-bulk) and extend seed-meta TTL when the contract forbids it —
silently re-masking 30-day outages. Now the handler is attached
immediately before `withRetry(fetchFn)` and removed in a try/finally
that covers all fetch-phase exit branches.

#257 (P1) — `BUNDLE_RUN_STARTED_AT_MS` now has a first-class helper.
Exported `getBundleRunStartedAtMs()` from `_seed-utils.mjs` with JSDoc
describing the bundle-freshness contract. Fleet-wide helper so the
next consumer seeder imports instead of rediscovering the idiom.

#259 (P2) — SIGTERM cleanup runs `Promise.allSettled` on disjoint-key
ops (`releaseLock` + `extendExistingTtl`). Running them serially
compounded Upstash latency during the exact failure mode (degraded Redis)
this handler exists to handle, risking a breach of the 5s SIGKILL grace.

#260 (P2) — `_bundle-runner.mjs` asserts topological order on
optional `dependsOn` section field. Throws on unknown-label refs and
on deps appearing at a later index. Fleet-wide contract replacing
the previous prose-comment ordering guarantee.
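
A sketch of that assertion, assuming each section carries a `label` and an
optional `dependsOn` list:

```ts
interface BundleSection { label: string; dependsOn?: string[] }

// Throws when a dependsOn entry names an unknown section or one scheduled
// at a later index than its dependant.
function assertTopologicalOrder(sections: BundleSection[]): void {
  const indexByLabel = new Map<string, number>();
  sections.forEach((s, i) => indexByLabel.set(s.label, i));
  sections.forEach((section, i) => {
    for (const dep of section.dependsOn ?? []) {
      const depIndex = indexByLabel.get(dep);
      if (depIndex === undefined) {
        throw new Error(`${section.label}: dependsOn references unknown section "${dep}"`);
      }
      if (depIndex >= i) {
        throw new Error(`${section.label}: must run after "${dep}" but is scheduled before it`);
      }
    }
  });
}
```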

Tests added/updated:
- New: SIGTERM handler removed after fetchFn completes (narrowed-scope
  contract — post-fetch SIGTERM must NOT trigger TTL extension)
- New: dependsOn unknown-label + out-of-order + happy-path (3 tests)

Full test suite: 6,866 tests pass (+4 net).

* fix(seed): getBundleRunStartedAtMs returns null outside a bundle run

Review follow-up: the earlier `Math.floor(Date.now()/1000)*1000` fallback
regressed standalone (non-bundle) runs. A consumer seeder invoked
manually just after its peer wrote `fetchedAt = (now - 5s)` would see
`bundleStartMs = Date.now()`, reject the perfectly-fresh peer envelope
as "stale", and fall back to defaults — defeating the point of the
peer-read path outside the bundle.

Returning null when `BUNDLE_RUN_STARTED_AT_MS` is unset/invalid keeps
the freshness gate scoped to its real purpose (across-bundle-tick
staleness) and lets standalone runs skip the gate entirely. Consumers
check `bundleStartMs != null` before applying the comparison; see the
companion `seed-sovereign-wealth.mjs` change on the stacked PR.
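
A sketch of the helper plus the consumer-side gate; names other than
getBundleRunStartedAtMs and BUNDLE_RUN_STARTED_AT_MS are placeholders:

```ts
// Null outside a bundle run (env var unset or not a finite positive number),
// so standalone seeder runs skip the freshness gate entirely.
export function getBundleRunStartedAtMs(): number | null {
  const raw = process.env.BUNDLE_RUN_STARTED_AT_MS;
  if (!raw) return null;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : null;
}

// Consumer side: only apply the staleness comparison in-bundle; the boundary
// is inclusive, so a peer writing at the first millisecond counts as fresh.
function isPeerFresh(fetchedAtMs: number, bundleStartMs: number | null): boolean {
  if (bundleStartMs === null) return true; // standalone: operator owns ordering
  return fetchedAtMs >= bundleStartMs;
}
```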

* test(seed): SIGTERM cleanup test now verifies Redis DEL + EXPIRE calls

Greptile review P2 on PR #3384: the existing test only asserted the exit
code + log line, not that the Redis ops were actually issued; the cleanup
log line claimed more than the test verified.

Fixture now logs every Upstash fetch call's shape (EVAL / pipeline-
EXPIRE / other) to stderr. Test asserts:

- >=1 EVAL op was issued during SIGTERM cleanup (releaseLock Lua
  script on the lock key)
- >=1 pipeline-EXPIRE op was issued (extendExistingTtl on canonical
  + seed-meta keys)
- The EVAL body carries the runSeed-generated runId (proves it's
  THIS run's release, not a phantom op)
- The EXPIRE pipeline touches both the canonicalKey AND the
  seed-meta key (proves the keys[] array was built correctly
  including the extraKeys merge path)

Full test suite: 6,866 tests pass, typecheck clean.

* feat(resilience): Comtrade-backed re-export-share seeder + SWF Redis read

Plan ref: docs/plans/2026-04-24-003-feat-reexport-share-comtrade-seeder-plan.md

Motivating case. Before this PR, the SWF `rawMonths` denominator for
the `sovereignFiscalBuffer` dimension used GROSS annual imports for
every country. For re-export hubs (goods transiting without domestic
settlement), this structurally under-reports resilience: UAE's 2023
$941B of imports include $334B of transit flow that never represents
domestic consumption. Net imports = gross × (1 − reexport_share).

The previous (PR 3A) design flattened a hand-curated YAML into Redis;
the YAML shipped empty and never populated, so the correction never
applied and the cohort audit showed no movement.

Gap #2 (this PR). Two coupled changes to make the correction actually
apply:

1. Comtrade-backed seeder (`scripts/seed-recovery-reexport-share.mjs`).
   Rewritten to fetch UN Comtrade `flowCode=RX` (re-exports) and
   `flowCode=M` (imports) per cohort member, compute share = RX/M at
   the latest co-populated year, clamp to [0.05, 0.95], publish the
   envelope. Header auth (`Ocp-Apim-Subscription-Key`) — subscription
   key never reaches URL/logs/Redis. `maxRecords=250000` cap with
   truncation detection. Sequential + retry-on-429 with backoff.

   Hub cohort resolved by Phase 0 empirical probe (plan §Phase 0):
   ['AE', 'PA']. Six candidates (SG/HK/NL/BE/MY/LT) return HTTP 200
   with zero RX rows — Comtrade doesn't expose RX for those reporters.

2. SWF seeder reads from Redis (`scripts/seed-sovereign-wealth.mjs`).
   Swaps `loadReexportShareByCountry()` (YAML) for
   `loadReexportShareFromRedis()` (Redis key written by #1). Guarded
   by bundle-run freshness: if the sibling Reexport-Share seeder's
   `seed-meta` predates `BUNDLE_RUN_STARTED_AT_MS` (set by the
   prereq PR's `_bundle-runner.mjs` env-injection), HARD fallback
   to gross imports rather than apply last-month's stale share.
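
For orientation, a sketch of the share math from point 1; the flow shapes
are assumptions, not the seeder's real types:

```ts
// Per-year totals for flowCode=RX (re-exports) and flowCode=M (imports).
type FlowTotals = Record<number, number>; // year -> summed trade value

// Share is RX / M at the latest year where both flows are populated;
// null when no such year exists.
function computeShareFromFlows(
  rx: FlowTotals,
  imports: FlowTotals,
): { year: number; share: number } | null {
  const years = Object.keys(rx).map(Number).filter((y) => imports[y] > 0);
  if (years.length === 0) return null;
  const year = Math.max(...years);
  return { year, share: rx[year] / imports[year] };
}

// Keep the correction in a sane band so one bad Comtrade row cannot zero out
// or erase a country's denominator.
function clampShare(share: number): number {
  return Math.min(0.95, Math.max(0.05, share));
}
```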

Health registries. Both new keys registered in BOTH `api/health.js`
SEED_META (60-day alert threshold) and `api/seed-health.js`
SEED_DOMAINS (43200min interval). feedback_two_health_endpoints_must_match.

Bundle wiring. `seed-bundle-resilience-recovery` Reexport-Share
timeout bumped 60s → 300s (Comtrade + retry can take 2-3 min
worst-case). Ordering preserved: Reexport-Share before Sovereign-
Wealth so the SWF seeder reads a freshly-written key in the same
cron tick.

Deletions. YAML + loader + 7 obsolete loader tests removed; single
source of truth is now Comtrade → Redis.

Prereq. Stacks on PR #3384 (feat/bundle-runner-env-sigterm)
which adds BUNDLE_RUN_STARTED_AT_MS env injection + runSeed
SIGTERM cleanup. This PR's bundle-freshness guard depends on
that env variable.

Tests (19 new, 7 deleted, +12 net):
- Pure math: parseComtradeFlowResponse, computeShareFromFlows,
  clampShare, declareRecords + credential-leak source scan (15)
- Integration (Gap #2 regression guards): SWF seeder
  loadReexportShareFromRedis — fresh/absent/malformed/stale-meta/missing-meta (5)
- Health registry dual-registry drift guard — scoped to this PR's
  keys, respecting pre-existing asymmetry (4)
- Bundle-ordering + timeout assertions (2)

Phase 0 cohort validation committed to plan. Full test suite
passes: 6,881 tests.

* fix(resilience): address P1/P2 review findings — adopt shared helpers, pin freshness boundary

Addresses PR #3385 review findings:

#257 (P1) consumer — `seed-sovereign-wealth.mjs` imports the shared
`getBundleRunStartedAtMs` helper from `_seed-utils.mjs` (added in the
prereq commit) instead of its own `getBundleStartMs`. Single source of
truth for the bundle-freshness contract.

#258 (P2) — `seed-recovery-reexport-share.mjs` isMain guard uses the
canonical `pathToFileURL(process.argv[1]).href === import.meta.url`
form instead of basename-suffix matching. Handles symlinks, case-
different paths on macOS HFS+, and Windows path separators without
string munging.

#260 (P2) consumer — Sovereign-Wealth declares `dependsOn:
['Reexport-Share']` in the bundle spec. `_bundle-runner.mjs` (prereq
commit) now enforces topological order on load and throws on
violation — replaces the previous prose-comment ordering contract.

#261 (P2) — added a test to `tests/seed-sovereign-wealth-reads-redis-
reexport-share.test.mts` pinning the inclusive-boundary semantic:
`fetchedAtMs === bundleStartMs` must be treated as FRESH. Guards
against a future refactor to `<=` that would silently reject peers
writing at the very first millisecond of the bundle run.

Rebased onto updated prereq. Full test suite: 6,886 tests pass (+5 net).

* fix(resilience): freshness gate skipped in standalone mode; meta still required

Review catch: the previous `bundleStartMs = Date.now()` fallback made
standalone/manual `seed-sovereign-wealth.mjs` runs ALWAYS reject any
previously-seeded re-export-share meta as "stale" — even when the
operator ran the Reexport seeder milliseconds beforehand. Defeated
the point of the peer-read path outside the bundle.

With `getBundleRunStartedAtMs()` now returning null outside a bundle
(companion commit on the prereq branch), the consumer only applies
the freshness gate when `bundleStartMs != null`. Standalone runs
accept any `fetchedAt` — the operator is responsible for ordering.

Two guards survive the change:
- Meta MUST exist (absence = peer-outage fail-safe, both modes)
- In-bundle: meta MUST be at or after `BUNDLE_RUN_STARTED_AT_MS`

Two new tests pin both modes:
- standalone: accepts meta written 10 min before this process started
- standalone: still rejects missing meta (peer-outage fail-safe
  survives gate bypass)

Rebased onto updated prereq. Full test suite: 6,888 tests (+2 net).

* fix(resilience): filter world-aggregate Comtrade rows + skip final-retry sleep

Greptile review of PR #3385 flagged two P2s in the Comtrade seeder.

Finding #3 (parseComtradeFlowResponse double-count risk):
`cmdCode=TOTAL` without a partner filter currently returns only
world-aggregate rows in practice — but `parseComtradeFlowResponse`
summed every row unconditionally. A future refactor adding per-
partner querying would silently double-count (world-aggregate row +
partner-level rows for the same year), cutting the derived share in
half with no test signal.

Fix: explicit `partnerCode ∈ {'0', 0, null/undefined}` filter. Matches
current empirical behavior (aggregate-only responses) and makes the
construct robust to a future partner-level query.
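
A sketch of the filter, with the row shape assumed from the description
above:

```ts
interface ComtradeRow { partnerCode?: string | number | null; refYear: number; primaryValue: number }

// Only world-aggregate rows (partnerCode 0 / '0' / absent) are summed, so a
// future per-partner query cannot double-count against the world total.
function isWorldAggregateRow(row: ComtradeRow): boolean {
  return row.partnerCode === undefined || row.partnerCode === null
    || row.partnerCode === 0 || row.partnerCode === '0';
}

function sumByYear(rows: ComtradeRow[]): Record<number, number> {
  const totals: Record<number, number> = {};
  for (const row of rows.filter(isWorldAggregateRow)) {
    totals[row.refYear] = (totals[row.refYear] ?? 0) + row.primaryValue;
  }
  return totals;
}
```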

Finding #4 (wasted backoff on final retry):
429 and 5xx branches slept `backoffMs` before `continue`, but on
`attempt === RETRY_MAX_ATTEMPTS` the loop condition fails immediately
after — the sleep was pure waste. Added early-return (parallel to the
existing pattern in the network-error catch branch) so the final
attempt exits the retry loop at the first non-success response
without extra latency.
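
A schematic of the retry-loop shape after the fix (status handling
simplified, constants invented):

```ts
async function fetchWithRetry(url: string, maxAttempts = 4): Promise<Response> {
  let last: Response | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.ok) return res;
    last = res;
    if (res.status !== 429 && res.status < 500) break; // non-retryable
    if (attempt === maxAttempts) break;                // final attempt: skip the wasted backoff
    await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
  }
  throw new Error(`upstream failed after retries: ${last?.status}`);
}
```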

Tests:
- 3 new `parseComtradeFlowResponse` variants: world-only filter,
  numeric-0 partnerCode shape, rows without partnerCode field
- Existing tests updated: the double-count assertion replaced with
  a "per-partner rows must NOT sum into the world-aggregate total"
  assertion that pins the new contract

Rebased onto updated prereq. Full test suite: 6,890 tests (+2 net).
2026-04-25 00:14:17 +04:00
Sebastien Melki
e68a7147dd chore(api): sebuf migration follow-ups (post-#3242) (#3287)
* chore(api-manifest): rewrite brief-why-matters reason as proper internal-helper justification

Carried in from #3248 merge as a band-aid (called out in #3242 review followup
checklist item 7). The endpoint genuinely belongs in internal-helper —
RELAY_SHARED_SECRET-bearer auth, cron-only caller, never reached by dashboards
or partners. Same shape constraint as api/notify.ts.

Replaces the apologetic "filed here to keep the lint green" framing with a
proper structural justification: modeling it as a generated service would
publish internal cron plumbing as user-facing API surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lint): premium-fetch parity check for ServiceClients (closes #3279)

Adds scripts/enforce-premium-fetch.mjs — AST-walks src/, finds every
`new <ServiceClient>(...)` (variable decl OR `this.foo =` assignment),
tracks which methods each instance actually calls, and fails if any
called method targets a path in src/shared/premium-paths.ts
PREMIUM_RPC_PATHS without `{ fetch: premiumFetch }` on the constructor.

Per-call-site analysis (not class-level) keeps the trade/index.ts pattern
clean — publicClient with globalThis.fetch + premiumClient with
premiumFetch on the same TradeServiceClient class — since publicClient
never calls a premium method.

Wired into:
- npm run lint:premium-fetch
- .husky/pre-push (right after lint:rate-limit-policies)
- .github/workflows/lint-code.yml (right after lint:api-contract)

Found and fixed three latent instances of the HIGH(new) #1 class from
#3242 review (silent 401 → empty fallback for signed-in browser pros):

- src/services/correlation-engine/engine.ts — IntelligenceServiceClient
  built with no fetch option called deductSituation. LLM-assessment overlay
  on convergence cards never landed for browser pros without a WM key.
- src/services/economic/index.ts — EconomicServiceClient with
  globalThis.fetch called getNationalDebt. National-debt panel rendered
  empty for browser pros.
- src/services/sanctions-pressure.ts — SanctionsServiceClient with
  globalThis.fetch called listSanctionsPressure. Sanctions-pressure panel
  rendered empty for browser pros.

All three swap to premiumFetch (single shared client, mirrors the
supply-chain/index.ts justification — premiumFetch no-ops safely on
public methods, so the public methods on those clients keep working).

Verification:
- lint:premium-fetch clean (34 ServiceClient classes, 28 premium paths,
  466 src/ files analyzed)
- Negative test: revert any of the three to globalThis.fetch → exit 1
  with file:line and called-premium-method names
- typecheck + typecheck:api clean
- lint:api-contract / lint:rate-limit-policies / lint:boundaries clean
- tests/sanctions-pressure.test.mjs + premium-fetch.test.mts: 16/16 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(military): fetchStaleFallback NEG_TTL=30s parity (closes #3277)

The legacy /api/military-flights handler had NEG_TTL = 30_000ms — a short
suppression window after a failed live + stale read so we don't Redis-hammer
the stale key during sustained relay+seed outages.

Carried into the sebuf list-military-flights handler:
- Module-scoped `staleNegUntil` timestamp (per-isolate on Vercel Edge,
  which is fine — each warm isolate gets its own 30s suppression window).
- Set whenever fetchStaleFallback returns null (key missing, parse fail,
  empty array after staleToProto filter, or thrown error).
- Checked at the entry of fetchStaleFallback before doing the Redis read.
- Test seam `_resetStaleNegativeCacheForTests()` exposed for unit tests.
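
A sketch of the suppression-window pattern from the bullets above, with the
Redis read abstracted behind a callback:

```ts
const NEG_TTL_MS = 30_000;

// Per-isolate suppression window: after a failed stale read, skip the Redis
// round-trip for 30s instead of hammering the key during an outage.
let staleNegUntil = 0;

export function _resetStaleNegativeCacheForTests(): void {
  staleNegUntil = 0;
}

async function fetchStaleFallback(
  readStale: () => Promise<unknown[] | null>,
): Promise<unknown[] | null> {
  if (Date.now() < staleNegUntil) return null; // inside the window: do not hit Redis
  try {
    const rows = await readStale();
    if (!rows || rows.length === 0) {
      staleNegUntil = Date.now() + NEG_TTL_MS; // missing / empty: arm the window
      return null;
    }
    return rows;
  } catch {
    staleNegUntil = Date.now() + NEG_TTL_MS;   // parse or Redis error: arm the window
    return null;
  }
}
```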

Test pinned in tests/redis-caching.test.mjs: drives a stale-empty cycle
three times — first read hits Redis, second within window doesn't, after
test-only reset it does again.

Verified: 18/18 redis-caching tests pass, typecheck:api clean,
lint:premium-fetch clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): rate-limit-policies regex → import() (closes #3278)

The previous lint regex-parsed ENDPOINT_RATE_POLICIES from the source
file. That worked because the literal happens to fit a single line per
key today, but a future reformat (multi-line key wrap, formatter swap,
etc.) would silently break the lint without breaking the build —
exactly the failure mode that's worse than no lint at all.

Fix:
- Export ENDPOINT_RATE_POLICIES from server/_shared/rate-limit.ts.
- Convert scripts/enforce-rate-limit-policies.mjs to async + dynamic
  import() of the policy object directly. Same TS module that the
  gateway uses at runtime → no source-of-truth drift possible.
- Run via tsx (already a dev dep, used by test:data) so the .mjs
  shebang can resolve a .ts import.
- npm script swapped to `tsx scripts/...`. .husky/pre-push uses
  `npm run lint:rate-limit-policies` so no hook change needed.
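
The input side of the lint now reduces to roughly the following (relative
path assumed):

```ts
// Import the same module the gateway uses at runtime instead of regex-parsing
// its source, so a reformat cannot silently break the lint.
const { ENDPOINT_RATE_POLICIES } = await import('../server/_shared/rate-limit.ts');

// Every policy key must correspond to a real gateway route (checked against
// the OpenAPI paths elsewhere in the script).
const policyRoutes = Object.keys(ENDPOINT_RATE_POLICIES);
console.log(`${policyRoutes.length} policies loaded`);
```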

Verified:
- Clean: 6 policies / 182 gateway routes.
- Negative test (rename a key to the original sanctions typo
  /api/sanctions/v1/lookup-entity): exit 1 with the same incident-
  attributed remedy message as before.
- Reformat test (split a single-line entry across multiple lines):
  still passes — the property is what's read, not the source layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(shipping/v2): alertThreshold: 0 preserved; drop dead validation branch (#3242 followup)

Before: alert_threshold was a plain int32. proto3 scalar default is 0, so
the handler couldn't distinguish "partner explicitly sent 0 (deliver every
disruption)" from "partner omitted the field (apply legacy default 50)" —
both arrived as 0 and got coerced to 50 by `> 0 ? : 50`. Silent intent-drop
for any partner who wanted every alert. The subsequent `alertThreshold < 0`
branch was also unreachable after that coercion.

After:
- Proto field is `optional int32 alert_threshold` — TS type becomes
  `alertThreshold?: number`, so omitted = undefined and explicit 0 stays 0.
- Handler uses `req.alertThreshold ?? 50` — undefined → 50, any number
  passes through unchanged.
- Dead `< 0 || > 100` runtime check removed; buf.validate `int32.gte = 0,
  int32.lte = 100` already enforces the range at the wire layer.

Partner wire contract: identical for the omit-field and 1..100 cases.
Only behavioural change is explicit 0 — previously impossible to request,
now honored per proto3 optional semantics.

Scoped `buf generate --path worldmonitor/shipping/v2` to avoid the full-
regen `@ts-nocheck` drift Seb documented in the #3242 PR comments.
Re-applied `@ts-nocheck` on the two regenerated files manually.

Tests:
- `alertThreshold 0 coerces to 50` flipped to `alertThreshold 0 preserved`.
- New test: `alertThreshold omitted (undefined) applies legacy default 50`.
- `rejects > 100` test removed — proto/wire validation handles it; direct
  handler calls intentionally bypass wire and the handler no longer carries
  a redundant runtime range check.

Verified: 18/18 shipping-v2-handler tests pass, typecheck + typecheck:api
clean, all 4 custom lints clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(shipping/v2): document missing webhook delivery worker + DNS-rebinding contract (#3242 followup)

#3242 followup checklist item 6 from @koala73 — sanity-check that the
delivery worker honors the re-resolve-and-re-check contract that
isBlockedCallbackUrl explicitly delegates to it.

Audit finding: no delivery worker for shipping/v2 webhooks exists in this
repo. Grep across the entire tree (excluding generated/dist) shows the
only readers of webhook:sub:* records are the registration / inspection /
rotate-secret handlers themselves. No code reads them and POSTs to the
stored callbackUrl. The delivery worker is presumed to live in Railway
(separate repo) or hasn't been built yet — neither is auditable from
this repo.

Refreshes the comment block at the top of webhook-shared.ts to:
- explicitly state DNS rebinding is NOT mitigated at registration
- spell out the four-step contract the delivery worker MUST follow
  (re-validate URL, dns.lookup, re-check resolved IP against patterns,
   fetch with resolved IP + Host header preserved)
- flag the in-repo gap so anyone landing delivery code can't miss it
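
For reference, a sketch of the first three steps of that contract in Node
terms; the pattern helpers are the repo's (referenced only in comments) and
the actual send is deliberately elided because pinning the resolved IP needs
agent-level SNI / Host handling:

```ts
import { lookup } from 'node:dns/promises';

async function checkCallbackBeforeSend(callbackUrl: string): Promise<string> {
  const url = new URL(callbackUrl);

  // 1. Re-validate the stored URL with the same rules applied at registration.
  // if (isBlockedCallbackUrl(url)) throw new Error('blocked callback URL');

  // 2. Resolve the hostname at send time; registration-time DNS may have changed.
  const { address } = await lookup(url.hostname);

  // 3. Re-check the resolved IP against the private/metadata-range patterns.
  // if (isBlockedAddress(address)) throw new Error('resolves to a blocked range');

  // 4. The caller must then connect to `address` while preserving the original
  //    Host header / SNI for url.hostname, so a rebind between steps 2 and 4
  //    cannot redirect the request.
  return address;
}
```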

Tracking the gap as #3288 — acceptance there is "delivery worker imports
the patterns + helpers from webhook-shared.ts and applies the four steps
before each send." Action moves to wherever the delivery worker actually
lives (Railway likely).

No code change. Tests + lints unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(lint): add rate-limit-policies step (greptile P1 #3287)

Pre-push hook ran lint:rate-limit-policies but the CI workflow did not,
so fork PRs and --no-verify pushes bypassed the exact drift check the
lint was added to enforce (closes #3278). Adding it right after
lint:api-contract so it runs in the same context the lint was designed
for.

* refactor(lint): premium-fetch regex → import() + loop classRe (greptile P2 #3287)

Two fragilities greptile flagged on enforce-premium-fetch.mjs:

1. loadPremiumPaths regex-parsed src/shared/premium-paths.ts with
   /'(\/api\/[^']+)'/g — same class of silent drift we just removed
   from enforce-rate-limit-policies in #3278. Reformatting the source
   Set (double quotes, spread, helper-computed entries) would drop
   paths from the lint while leaving the runtime untouched. Fix: flip
   the shebang to `#!/usr/bin/env -S npx tsx` and dynamic-import
   PREMIUM_RPC_PATHS directly, mirroring the rate-limit pattern.
   package.json lint:premium-fetch now invokes via tsx too so the
   npm-script path matches direct execution.

2. loadClientClassMap ran classRe.exec once, silently dropping every
   ServiceClient after the first if a file ever contained more than
   one. Current codegen emits one class per file so this was latent,
   but a template change would ship un-linted classes. Fix: collect
   every class-open match with matchAll, slice each class body with
   the next class's start as the boundary, and scan methods per-body
   so method-to-class binding stays correct even with multiple
   classes per file.

Verification:
- lint:premium-fetch clean (34 classes / 28 premium paths / 466 files
  — identical counts to pre-refactor, so no coverage regression).
- Negative test: revert src/services/economic/index.ts to
  globalThis.fetch → exit 1 with file:line, bound var name, and
  premium method list (getNationalDebt). Restore → clean.
- lint:rate-limit-policies still clean.

* fix(shipping/v2): re-add alertThreshold handler range guard (greptile nit 1 #3287)

Wire-layer buf.validate enforces 0..100, but direct handler invocation
(internal jobs, test harnesses, future transports) bypasses it. Cheap
invariant-at-the-boundary — rejects < 0 or > 100 with ValidationError
before the record is stored.

Tests: restored the rejects-out-of-range cases that were dropped when the
branch was (correctly) deleted as dead code on the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): premium-fetch method-regex → TS AST (greptile nits 2+5 #3287)

loadClientClassMap:
  The method regex `async (\w+)\s*\([^)]*\)\s*:\s*Promise<[^>]+>\s*\{\s*let
  path = "..."` assumed (a) no nested `)` in arg types, (b) no nested `>`
  in the return type, (c) `let path = "..."` as the literal first statement.
  Any codegen template shift would silently drop methods with the lint still
  passing clean — the same silent-drift class #3287 just closed on the
  premium-paths side.

  Now walks the service_client.ts AST, matches `export class *ServiceClient`,
  iterates `MethodDeclaration` members, and reads the first
  `let path: string = '...'` variable statement as a StringLiteral. Tolerant
  to any reformatting of arg/return types or method shape.
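
A sketch of that walk with the TypeScript compiler API; the surrounding lint
plumbing is elided and names are approximate:

```ts
import ts from 'typescript';

// Collect methodName -> RPC path from a generated service_client.ts by
// reading the `let path = '...'` first statement of each method body.
function collectClientMethods(fileName: string, source: string): Map<string, string> {
  const sf = ts.createSourceFile(fileName, source, ts.ScriptTarget.Latest, true);
  const methodToPath = new Map<string, string>();

  for (const stmt of sf.statements) {
    if (!ts.isClassDeclaration(stmt) || !stmt.name?.text.endsWith('ServiceClient')) continue;
    for (const member of stmt.members) {
      if (!ts.isMethodDeclaration(member) || !member.body || !ts.isIdentifier(member.name)) continue;
      const first = member.body.statements[0];
      if (!first || !ts.isVariableStatement(first)) continue;
      const decl = first.declarationList.declarations[0];
      if (decl?.initializer && ts.isStringLiteral(decl.initializer)) {
        methodToPath.set(member.name.text, decl.initializer.text);
      }
    }
  }
  return methodToPath;
}
```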

findCalls scope-blindness:
  Added limitation comment — the walker matches `<varName>.<method>()`
  anywhere in the file without respecting scope. Two constructions in
  different function scopes sharing a var name merge their called-method
  sets. No current src/ file hits this; the lint errs cautiously (flags
  both instances). Keeping the walker simple until scope-aware binding
  is needed.

webhook-shared.ts:
  Inlined issue reference (#3288) so the breadcrumb resolves without
  bouncing through an MDX that isn't in the diff.

Verification:
- lint:premium-fetch clean — 34 classes / 28 premium paths / 489 files.
  Pre-refactor: 34 / 28 / 466. Class + path counts identical; file bump
  is from the main-branch rebase, not the refactor.
- Negative test: revert src/services/economic/index.ts premiumFetch →
  globalThis.fetch. Lint exits 1 at `src/services/economic/index.ts:64:7`
  with `premium method(s) called: getNationalDebt`. Restore → clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(lint): rate-limit OpenAPI regex → yaml parser (greptile nit 3 #3287)

Input side (ENDPOINT_RATE_POLICIES) was flipped to live `import()` in
4e79d029. Output side (OpenAPI routes) still regex-scraped top-level
`paths:` keys with `/^\s{4}(\/api\/[^\s:]+):/gm` — hard-coded 4-space
indent. Any YAML formatter change (2-space indent, flow style, line
folding) would silently drop routes and let policy-drift slip through
— same silent-drift class the input-side fix closed.

Now uses the `yaml` package (already a dep) to parse each
.openapi.yaml and reads `doc.paths` directly.
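
The output side of the lint now reads the routes structurally, roughly:

```ts
import { readFileSync } from 'node:fs';
import { parse } from 'yaml';

// Any YAML layout (indent width, flow style, folding) yields the same keys.
function loadOpenApiRoutes(file: string): string[] {
  const doc = parse(readFileSync(file, 'utf8')) as { paths?: Record<string, unknown> };
  return Object.keys(doc.paths ?? {});
}
```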

Verification:
- Clean: 6 policies / 189 routes (was 182 — yaml parser picks up a
  handful the regex missed, closing a silent coverage gap).
- Negative test: rename policy key back to /api/sanctions/v1/lookup-entity
  → exits 1 with the same incident-attributed remedy. Restore → clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(codegen): regenerate unified OpenAPI bundle for alert_threshold proto change

The shipping/v2 webhook alert_threshold field was flipped from `int32` to
`optional int32` with an expanded doc comment in f3339464. That comment
now surfaces in the unified docs/api/worldmonitor.openapi.yaml bundle
(introduced by #3341). Regenerated with sebuf v0.11.1 to pick it up.

No behaviour change — bundle-only documentation drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:00:41 +03:00
Elie Habib
184e82cb40 feat(resilience): PR 3A — net-imports denominator for sovereignFiscalBuffer (#3380)
PR 3A of cohort-audit plan 2026-04-24-002. Construct correction for
re-export hubs: the SWF rawMonths denominator was gross imports, which
double-counted flow-through trade that never represents domestic
consumption. Net-imports fix:

  rawMonths = aum / (grossImports × (1 − reexportShareOfImports)) × 12

applied to any country in the re-export share manifest. Countries NOT
in the manifest get gross imports unchanged (status-quo fallback).

Plan acceptance gates — verified synthetically in this PR:

  Construct invariant. Two synthetic countries, same SWF, same gross
  imports. A re-exports 60%; B re-exports 0%. Post-fix, A's rawMonths
  is 2.5× B's (1/(1-0.6) = 2.5). Pinned in
  tests/resilience-net-imports-denominator.test.mts.

  SWF-heavy exporter invariant. Country with share ≤ 5%: rawMonths
  lift < 5% vs baseline (negligible). Pinned.
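
A sketch of the pure helper behind the formula above; the boundary handling
is an assumption and the committed computeNetImports may differ in detail:

```ts
// Countries not in the manifest pass share = 0 and keep gross imports
// unchanged (status-quo fallback).
export function computeNetImports(grossImports: number, reexportShare = 0): number {
  if (!(grossImports > 0)) throw new Error('grossImports must be positive');
  if (reexportShare < 0 || reexportShare >= 1) throw new Error('reexportShare must be in [0, 1)');
  return grossImports * (1 - reexportShare);
}

// Construct invariant from the plan: at share 0.6 the denominator shrinks by
// a factor of 2.5, so rawMonths = aum / netImports * 12 grows 2.5x versus the
// gross-imports baseline.
```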

What shipped

1. Re-export share manifest infrastructure.
   - scripts/shared/reexport-share-manifest.yaml (new, empty) — schema
     committed; entries populated in follow-up PRs with UNCTAD
     Handbook citations.
   - scripts/shared/reexport-share-loader.mjs (new) — loader + strict
     validator, mirrors swf-manifest-loader.mjs.
   - scripts/seed-recovery-reexport-share.mjs (new) — publishes
     resilience:recovery:reexport-share:v1 from manifest. Empty
     manifest = valid (no countries, no adjustment).

2. SWF seeder uses net-imports denominator.
   - scripts/seed-sovereign-wealth.mjs exports computeNetImports(gross,
     share) — pure helper, unit-tested.
   - Per-country loop: reads manifest, computes denominatorImports,
     applies to rawMonths math.
   - Payload records annualImports (gross, audit), denominatorImports
     (used in math), reexportShareOfImports (provenance).
   - Summary log reports which countries had a net-imports adjustment
     applied with source year.

3. Bundle wiring.
   - Reexport-Share runs BEFORE Sovereign-Wealth in the recovery
     bundle so the SWF seeder reads fresh re-export data in the same
     cron tick.
   - tests/seed-bundle-resilience-recovery.test.mjs expected-entries
     updated (6 → 7) with ordering preservation.

4. Cache-prefix bump (per cache-prefix-bump-propagation-scope skill).
   - RESILIENCE_SCORE_CACHE_PREFIX: v11 → v12
   - RESILIENCE_RANKING_CACHE_KEY: v11 → v12
   - RESILIENCE_HISTORY_KEY_PREFIX: v6 → v7 (history rotation prevents
     30-day rolling window from mixing pre/post-fix scores and
     manufacturing false "falling" trends on deploy day).
   - Source of truth: server/worldmonitor/resilience/v1/_shared.ts
   - Mirrored in: scripts/seed-resilience-scores.mjs,
     scripts/validate-resilience-correlation.mjs,
     scripts/backtest-resilience-outcomes.mjs,
     scripts/validate-resilience-backtest.mjs,
     scripts/benchmark-resilience-external.mjs, api/health.js
   - Test literals bumped in 4 test files (26 line edits).
   - EXTENDED tests/resilience-cache-keys-health-sync.test.mts with
     a parity pass that reads every known mirror file and asserts
     both (a) canonical prefix present AND (b) no stale v<older>
     literals in non-comment code. Found one legacy log-line that
     still referenced v9 (scripts/seed-resilience-scores.mjs:342)
     and refactored it to use the RESILIENCE_RANKING_CACHE_KEY
     constant so future bumps self-update.

Explicitly NOT in this PR

- liquidReserveAdequacy denominator fix. The plan's PR 3A wording
  mentions both dims, but the RESERVES ratio (WB FI.RES.TOTL.MO) is a
  PRE-COMPUTED WB series; applying a post-hoc net-imports adjustment
  mixes WB's denominator year with our manifest-year, and the math
  change belongs in PR 3B (unified liquidity) where the α calibration
  is explicit. This PR stays scoped to sovereignFiscalBuffer.
- Live re-export share entries. The manifest ships EMPTY in this PR;
  entries with UNCTAD citations are one-per-PR follow-ups so each
  figure is individually auditable.

Verified

- tests/resilience-net-imports-denominator.test.mts — 9 pass (construct
  contract: 2.5× ratio gate, monotonicity, boundary rejections,
  backward-compat on missing manifest entry, cohort-proportionality,
  SWF-heavy-exporter-unchanged)
- tests/reexport-share-loader.test.mts — 7 pass (committed-manifest
  shape + 6 schema-violation rejections)
- tests/resilience-cache-keys-health-sync.test.mts — 5 pass (existing 3
  + 2 new parity checks across all mirror files)
- tests/seed-bundle-resilience-recovery.test.mjs — 17 pass (expected
  entries bumped to 7)
- npm run test:data — 6714 pass / 0 fail
- npm run typecheck / typecheck:api — green
- npm run lint / lint:md — clean

Deployment notes

Score + ranking + history cache prefixes all bump in the same deploy.
Per established v10→v11 precedent (and the cache-prefix-bump-
propagation-scope skill):
- Score / ranking: 6h TTL — the new prefix populates via the Railway
  resilience-scores cron within one tick.
- History: 30d ring — the v7 ring starts empty; the first 30 days
  post-deploy lack baseline points, so trend / change30d will read as
  "no change" until v7 accumulates a window.
- Legacy v11 keys can be deleted from Redis at any time post-deploy
  (no reader references them). Leaving them in place costs storage
  but does no harm.
2026-04-24 18:14:04 +04:00
Elie Habib
34dfc9a451 fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.
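
A sketch of the acceptance rule only; the real parser also handles CDATA,
entity decoding and the RSS/Atom tag fallbacks described above:

```ts
const MAX_DESCRIPTION_CHARS = 400;
const MIN_DESCRIPTION_CHARS = 40;

function normalize(text: string): string {
  return text.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Returns '' when the candidate is empty, too short after stripping, or just
// a restatement of the headline; consumers then fall back to the cleaned
// headline, preserving pre-fix behaviour.
function acceptDescription(raw: string, headline: string): string {
  const cleaned = normalize(raw);
  if (cleaned.length < MIN_DESCRIPTION_CHARS) return '';
  if (cleaned.toLowerCase() === normalize(headline).toLowerCase()) return '';
  return cleaned.slice(0, MAX_DESCRIPTION_CHARS);
}
```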

CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.

Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).

Covers R1, R7.

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).

Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.

Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.

Covers R2, R6.

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:

- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
  ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
  paired 1:1 with headlines for prompt grounding. Empty array is today's
  headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).

Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.

Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today behavior is preserved (R6).

This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.

Covers R4 (upstream half), R6.

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).

Changes:

- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
  between the metadata block and the "One editorial sentence" instruction
  when description is non-empty AND not normalise-equal to the headline.
  Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).

- sanitizeStoryForPrompt extended to cover `description`. Closes the
  asymmetry where whyMatters was sanitised and description wasn't —
  untrusted RSS bodies now flow through the same injection-marker
  neutraliser before prompt interpolation. generateStoryDescription wraps
  the story in sanitizeStoryForPrompt before calling the builder,
  matching generateWhyMatters.

- Four cache prefixes bumped atomically to evict pre-grounding rows:
    scripts/lib/brief-llm.mjs:
      brief:llm:description:v1 → v2  (Railway, description path)
      brief:llm:whymatters:v2 → v3   (Railway, whyMatters fallback)
    api/internal/brief-why-matters.ts:
      brief:llm:whymatters:v6 → v7                (edge, primary)
      brief:llm:whymatters:shadow:v4 → shadow:v5  (edge, shadow)
  hashBriefStory already includes description in the 6-field material
  (v5 contract) so identity naturally drifts; the prefix bump is the
  belt-and-braces that guarantees a clean cold-start on first tick.

- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
  Covers Context-line injection, empty/dup-of-headline rejection,
  400-char clip, sanitisation of adversarial descriptions, v2 write,
  and legacy-v1 row dark (forced cold-start).

Covers R4 + new sanitisation requirement.

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.

Three coordinated changes:

1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
   through sanitizeForPrompt (same trust treatment as geoContext — bodies
   are untrusted RSS text), clips to 400 chars, and pads to the headlines
   length so pair-wise identity is stable.

2. buildArticlePrompts accepts optional bodies and interleaves a
   `    Context: <body>` line under each numbered headline that has a
   non-empty body. Skipped in translate mode (headline[0]-only) and when
   all bodies are empty — yielding a byte-identical prompt to pre-U6
   for every current caller (R6 preserved).

3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
   (produced from headline-only prompts) cold-start cleanly. Extends
   canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
   bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
   avoid colliding with `:brief:` when pattern-matching keys. Translate
   mode is headline[0]-only and intentionally does not shift on bodies.

Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.
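
A sketch of the interleave from point 2; prompt framing is elided and the
indentation is approximate:

```ts
// A Context line appears only under headlines that actually carry a body, so
// the all-empty case yields the same prompt text as before this change.
function buildNumberedHeadlines(headlines: string[], bodies: string[] = []): string {
  return headlines
    .map((headline, i) => {
      const body = (bodies[i] ?? '').trim().slice(0, 400);
      const contextLine = body ? `\n    Context: ${body}` : '';
      return `${i + 1}. ${headline}${contextLine}`;
    })
    .join('\n');
}
```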

New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.

Covers R3, R4.

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
  word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
  currentHeadlines; passed as options.bodies to generateSummary so the
  server-side SummarizeArticle LLM grounds on the article body.

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
  → generateSummary → runApiChain → tryApiProvider; cache key now includes
  bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
  to /api/news/v1/summarize-article so the MCP tool surface is no longer
  starved.

Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
  a ~200-char truncated snippet line under each story; HTML formatDigestHtml
  renders a dim-grey description div between title and meta. Both gated
  on non-empty description (R6 — empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
  description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
  includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
  NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
  payloads render a `> <snippet>` context line under the title. When off
  or payload.description absent, output is byte-identical to pre-U7.

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
  @notification-source tags on every producer, rejects `description:` in
  domain-origin payload blocks, and verifies the relay codepath gates
  snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
  alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).

Covers R5, R6.

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:

- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
  subsection tracing parser → story:track:v1.description → NewsItem.snippet
  → brief / SummarizeArticle / dashboard / email / relay / MCP, with the
  empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
  to match the v6 bump from U6. Without this the summary cache bump silently
  regressed CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
  simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.

Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
  headlines identical but bodies different now invalidate in-flight summaries.
  Without this, switching renderItems → renderClusters mid-summary let a
  grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
  single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
  position in light-sanitized headlines while findIndex looked up the
  full-sanitized array — any headline that sanitizeHeadlines emptied
  mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
  client builds keys from RAW bodies while server sanitizes first. The
  keys diverge on injection content, which would silently miss the
  server's authoritative cache every call.

Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
  instead of a literal "somehash", so a bug where the reader still queried
  the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
  the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
  CDATA test and adds a new test documenting the malformed-]]> fallback
  behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
  headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
  headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
  and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
  grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
  existing TAG_REGEX_CACHE pattern.

Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.

When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).
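
Roughly what the combined-text construction looks like after the fix
(the newline joiner and helper name are assumptions; the 200-char clip,
the headline/body separator, and the bodies-empty identity are the
commit's invariants):

```ts
const T5_BODY_CLIP = 200;

function buildT5Input(headlines: string[], bodies?: string[]): string {
  return headlines
    .map((headline, i) => {
      const body = bodies?.[i]?.trim() ?? '';
      if (!body) return headline; // empty/omitted bodies → pre-fix text (R6)
      return `${headline} — ${body.slice(0, T5_BODY_CLIP)}`;
    })
    .join('\n');
}
```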

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), I'm declining to
change it. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection, that argument
over-defended the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
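
Sketch of the always-written contract (the real buildStoryTrackHsetFields
writes more fields; only the unconditional description write is shown):

```ts
interface StoryMention {
  normalizedTitle: string;
  headline: string;
  description?: string; // body text; absent on body-less wire reprints
}

function buildStoryTrackHsetFields(mention: StoryMention): Record<string, string> {
  return {
    headline: mention.headline,
    // Always write description ('' when this mention has no body) so an
    // earlier mention's body can never survive under the same title hash.
    description: mention.description ?? '',
  };
}
```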

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
2026-04-24 16:25:14 +04:00
Elie Habib
d521924253 fix(resilience): fail closed on missing v2 energy seeds + health CRIT on absent inputs (#3363)
* fix(resilience): fail closed on missing v2 energy seeds + health CRIT on absent inputs

PR #3289 shipped the v2 energy construct behind RESILIENCE_ENERGY_V2_ENABLED
(default false). Audit on 2026-04-24 after the user flagged "AE only moved
1.49 points — we added nuclear credit, we should see more" revealed two
safety gaps that made a future flag flip unsafe:

1. scoreEnergyV2 silently fell back to IMPUTE when any of its three
   required Redis seeds (low-carbon-generation, fossil-electricity-share,
   power-losses) was null. A future operator flipping the flag with
   seeds absent would produce fabricated-looking numbers for every
   country with zero operator signal.

2. api/health.js had those three seed labels in BOTH SEED_META (CRIT on
   missing) AND ON_DEMAND_KEYS (which demotes CRIT to WARN). The demotion
   won. Health has been reporting WARNING on a scorer dependency that has
   been 100% missing since PR #3289 merged — no paging trail existed.

Changes:

  server/worldmonitor/resilience/v1/_dimension-scorers.ts
    - Add ResilienceConfigurationError with missingKeys[] payload.
    - scoreEnergy: preflight the three v2 seeds when flag=true. Throw
      ResilienceConfigurationError listing the specific absent keys
      (see the sketch after this change list).
    - scoreAllDimensions: wrap per-dimension dispatch in try/catch so a
      thrown ResilienceConfigurationError routes to the source-failure
      shape (imputationClass='source-failure', coverage=0) for that ONE
      dimension — country keeps scoring other dims normally. Log once
      per country-dimension pair so the gap is audit-traceable.

  api/health.js
    - Remove lowCarbonGeneration / fossilElectricityShare / powerLosses
      from ON_DEMAND_KEYS. They stay in BOOTSTRAP_KEYS + SEED_META.
    - Replace the transitional comment with a hard "do NOT add these
      back" note pointing at the scorer's fail-closed gate.

  tests/resilience-energy-v2.test.mts
    - New test: flag on + ALL three seeds missing → throws
      ResilienceConfigurationError naming all three keys.
    - New test: flag on + only one seed missing → throws naming ONLY
      the missing key (operator-clarity guard).
    - New test: flag on + all seeds present → v2 runs normally.
    - Update the file-level invariant comment to reflect the new
      fail-closed contract (replacing the prior "degrade gracefully"
      wording that codified the silent-IMPUTE bug).
    - Note: fixture's `??` fallbacks coerce null-overrides into real
      data, so the preflight tests use a direct-reader helper.

  docs/methodology/country-resilience-index.mdx
    - New "Fail-closed semantics" paragraph in the v2 Energy section
      documenting the throw + source-failure + health-CRIT contract.
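
Minimal sketch of the fail-closed gate and the per-dimension routing (the
seed labels follow the list above but may not be the literal Redis key
strings, and the reader / dimension-result shapes are simplified
stand-ins, not the production types):

```ts
class ResilienceConfigurationError extends Error {
  constructor(public readonly missingKeys: string[]) {
    super(`missing required resilience seeds: ${missingKeys.join(', ')}`);
  }
}

const ENERGY_V2_SEEDS = [
  'low-carbon-generation',
  'fossil-electricity-share',
  'power-losses',
] as const;

async function preflightEnergyV2Seeds(
  readSeed: (key: string) => Promise<unknown>,
): Promise<void> {
  const missing: string[] = [];
  for (const key of ENERGY_V2_SEEDS) {
    if ((await readSeed(key)) === null) missing.push(key);
  }
  // Name only the keys that are actually absent (operator-clarity guard).
  if (missing.length > 0) throw new ResilienceConfigurationError(missing);
}

// Per-dimension dispatch: a thrown configuration error becomes a
// source-failure result for that ONE dimension; the country keeps
// scoring its other dimensions normally.
async function scoreDimensionSafely(
  score: () => Promise<{ value: number; coverage: number }>,
): Promise<{ value: number; coverage: number; imputationClass?: string }> {
  try {
    return await score();
  } catch (err) {
    if (err instanceof ResilienceConfigurationError) {
      return { value: 0, coverage: 0, imputationClass: 'source-failure' };
    }
    throw err;
  }
}
```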

Non-goals (intentional):
  - This PR does NOT flip RESILIENCE_ENERGY_V2_ENABLED.
  - This PR does NOT provision seed-bundle-resilience-energy-v2 on Railway.
  - This PR does NOT touch RESILIENCE_PILLAR_COMBINE_ENABLED.

Operational effect post-merge:
  - /api/health flips from WARNING → CRITICAL on the three v2 seed-meta
    entries. That is the intended alarm; it reveals that the Railway
    bundle was never provisioned.
  - scoreEnergy behavior with flag=false is unchanged (legacy path).
  - scoreEnergy behavior with flag=true + seeds present is unchanged.
  - scoreEnergy behavior with flag=true + seeds absent changes from
    "silently IMPUTE all 217 countries" to "source-failure on the
    energy dim for every country, visible in widget + API response".

Tests: 511/511 resilience-* pass. Biome clean. Lint:md clean.

Related plan: docs/plans/2026-04-24-001-fix-resilience-v2-fail-closed-on-missing-seeds-plan.md

* docs(resilience): scrub stale ON_DEMAND_KEYS references for v2 energy seeds

Greptile P2 on PR #3363: four stale references implied the three v2
energy seeds were still gated as ON_DEMAND_KEYS (WARN-on-missing) even
though this PR's api/health.js change removed them (now strict
SEED_META = CRIT on missing). Scrubbing each:

  - api/health.js:196 (BOOTSTRAP_KEYS comment) — was "ON_DEMAND_KEYS
    until Railway cron provisions; see below." Updated to cite plan
    2026-04-24-001 and the strict-SEED_META posture.
  - api/health.js:398 (SEED_META comment) — was "Listed in ON_DEMAND_KEYS
    below until Railway cron provisions..." Updated for same reason.
  - docs/methodology/country-resilience-index.mdx:635 — v2.1 changelog
    entry said seed keys were ON_DEMAND_KEYS until graduation. Replaced
    with the fail-closed contract description.
  - docs/methodology/energy-v2-flag-flip-runbook.md:25 — step 3 said
    "ON_DEMAND_KEYS graduation" was required at flag-flip time.
    Rewrote to explain no graduation step is needed because the
    posture was removed pre-activation.

No code change. Tests still 14/14 on the energy-v2 suite, lint:md clean.

* fix(docs): escape MDX-unsafe `<=` in energy-v2 runbook to unblock Mintlify

Mintlify deploy on PR #3363 failed with
`Unexpected character '=' (U+003D) before name` at
`docs/methodology/energy-v2-flag-flip-runbook.md`. Two lines had
`<=` in plain prose, which MDX tries to parse as a JSX-tag-start.

Replaced both with `≤` (U+2264) — and promoted the two existing `>=`
on adjacent lines to `≥` for consistency. Prose is clearer and MDX
safe.

Same pattern as `mdx-unsafe-patterns-in-md` skill; also adjacent to
PR #3344's `(<137 countries)` fix.
2026-04-24 09:37:18 +04:00
Elie Habib
6d4c717e75 fix(health): treat empty intlDelays as OK, matching faaDelays (#3360)
intlDelays was alarming EMPTY_DATA during calm windows (seedAge 25m,
records 0) while its faaDelays sibling — written by the same aviation
seeder — was in EMPTY_DATA_OK_KEYS. The seeder itself declares
zeroIsValid: true (scripts/seed-aviation.mjs:1171) because 0 airport
disruptions is a real steady state, so the health classifier should
agree. Stale-seed degradation still kicks in once seedAge > 90min.
2026-04-24 07:11:56 +04:00
Elie Habib
d75bde4e03 fix(agent-readiness): host-aware oauth-protected-resource endpoint (#3351)
* fix(agent-readiness): host-aware oauth-protected-resource endpoint

isitagentready.com enforces that `authorization_servers[*]` share
origin with `resource` (same-origin rule, matches Cloudflare's
mcp.cloudflare.com reference — RFC 9728 §3 permits split origins
but the scanner is stricter).

A single static file served from 3 hosts (apex/www/api) can only
satisfy one origin at a time. Replacing it with an edge function that
derives both `resource` and `authorization_servers` from the
request `Host` header gives each origin self-consistent metadata.
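
A sketch of what the host-derived handler boils down to, assuming a
standard `(Request) => Response` edge signature; the real endpoint's
full metadata field set isn't reproduced here:

```ts
export default function handler(request: Request): Response {
  const host = request.headers.get('host') ?? 'worldmonitor.app';
  const origin = `https://${host}`;

  const metadata = {
    resource: origin,
    // Same origin as `resource`, which satisfies the scanner's stricter rule.
    authorization_servers: [origin],
  };

  return new Response(JSON.stringify(metadata), {
    headers: {
      'content-type': 'application/json',
      // Cache-key by hostname (the Vary: Host added later in this PR).
      vary: 'Host',
    },
  });
}
```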

No server-side behavior changes: api/oauth/*.js token issuer
doesn't bind tokens to a specific resource value (verified in
the previous PR's review).

* fix(agent-readiness): host-derive resource_metadata + runtime guardrails

Addresses P1/P2 review on this PR:

- api/mcp.ts (P1): WWW-Authenticate resource_metadata was still
  hardcoded to apex even when the client hit api.worldmonitor.app.
  Derive from request.headers.get('host') so each client gets a
  pointer matching their own origin — consistent with the host-
  aware edge function this PR introduces.
- api/oauth-protected-resource.ts (P2): add Vary: Host so any
  intermediate cache keys by hostname (belt + suspenders on top of
  Vercel's routing).
- tests/deploy-config.test.mjs (P2): replace regex-on-source with
  a runtime handler invocation asserting origin-matching metadata
  for apex/www/api hosts, and tighten the api/mcp.ts assertion to
  require host-derived resource_metadata construction.

---------

Co-authored-by: Elie Habib <elie@worldmonitor.app>
2026-04-23 21:17:32 +04:00
Elie Habib
9c3c7e8657 fix(agent-readiness): align OAuth resource with public MCP origin (#3345)
* fix(agent-readiness): align OAuth resource with actual public MCP origin

isitagentready.com's OAuth Protected Resource check enforces an
origin match between the scanned host and the metadata's `resource`
field (per the spirit of RFC 9728 §3). Our metadata declared
`resource: "https://api.worldmonitor.app"` while the MCP endpoint is
publicly served at `https://worldmonitor.app/mcp` (per vercel.json's
/mcp → /api/mcp rewrite and the MCP card's transport.endpoint).

Flip `resource` to `https://worldmonitor.app` across the three
places that declare it:

- public/.well-known/oauth-protected-resource
- public/.well-known/mcp/server-card.json (authentication block)
- api/mcp.ts (two WWW-Authenticate resource_metadata pointers)

`authorization_servers` intentionally stays on api.worldmonitor.app
— that's where /oauth/{authorize,token,register} actually live.
RFC 9728 permits AS and resource to be at different origins.

No server-side validation breaks: api/oauth/*.js and api/mcp.ts
do not bind tokens to the old resource value.

* fix(agent-readiness): align docs/tests + add MCP origin guardrail

Addresses P1/P2 review on this PR. The resource-origin flip in the
previous commit only moved the mismatch from apex to api unless the
repo also documents apex as the canonical MCP origin.

- docs/mcp.mdx: swap api.worldmonitor.app/mcp -> worldmonitor.app/mcp
  (OAuth endpoints stay on api.*, only the resource URL changes)
- tests/mcp.test.mjs: same fixture update
- tests/deploy-config.test.mjs: new guardrail block asserting that
  MCP transport.endpoint origin, OAuth metadata resource, MCP card
  authentication.resource, and api/mcp.ts resource_metadata
  pointers all share the same origin. Includes a regression guard
  that authorization_servers stays on api.worldmonitor.app (the
  intentional resource/AS split).
2026-04-23 19:42:13 +04:00
Elie Habib
d3d406448a feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)
* feat(resilience): PR 2 §3.4 recovery-domain weight rebalance

Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy,
sovereignFiscalBuffer) to ~10% share each of the recovery-domain
score via a new per-dimension weight channel in the coverage-weighted
mean. Matches the plan's direction that the sovereign-wealth signal
complement — rather than dominate — the classical liquid-reserves
and fiscal-space signals.

Implementation

- RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number>
  alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry
  (default 1.0) so rebalance decisions stay auditable; the two new
  recovery dims carry 0.5 each.

  Share math at full coverage (6 active recovery dims):
    weight sum                  = 4 × 1.0 + 2 × 0.5 = 5.0
    each new-dim share          = 0.5 / 5.0 = 0.10  ✓
    each core-dim share         = 1.0 / 5.0 = 0.20

  Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in
  the map; their coverage=0 neutralizes them at the coverage channel
  regardless. Explicit entries guard against a future scorer bug
  accidentally returning coverage>0 for a retired dim and falling
  through the `?? 1.0` default — every retirement decision is now
  tied to a single explicit source of truth.

- coverageWeightedMean (_shared.ts): refactored to apply
  `coverage × dimWeight` per dim instead of `coverage` alone. Backward-
  compatible when all weights default to 1.0 (reduces to the original
  mean). All three aggregation callers — buildDomainList, baseline-
  Score, stressScore — pick up the weighting transparently.
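
Sketch of the weighted aggregation (the dim-result shape is simplified;
the `coverage × dimWeight` product and the `?? 1.0` default mirror the
description above):

```ts
interface DimResult {
  id: string;
  score: number;
  coverage: number; // 0..1
}

function coverageWeightedMean(
  dims: DimResult[],
  dimensionWeights: Record<string, number> = {},
): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const dim of dims) {
    const weight = dim.coverage * (dimensionWeights[dim.id] ?? 1.0);
    weightedSum += dim.score * weight;
    weightTotal += weight;
  }
  // With every weight at 1.0 this reduces to the original coverage-weighted mean.
  return weightTotal > 0 ? weightedSum / weightTotal : 0;
}
```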

Test coverage

1. New `tests/resilience-recovery-weight-rebalance.test.mts`:
   pins the per-dim weight values, asserts the share math
   (0.10 new / 0.20 core), verifies completeness of the weight map,
   and documents why retired dims stay in the map at 1.0.
2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based
   Spearman-proxy sensitivity check. Asserts NO > US > YE ordering
   preserved on both the overall score and the recovery-domain
   subscore after the rebalance. (Live post-merge Spearman rerun
   against the PR 0 snapshot is tracked as a follow-up commit.)
3. resilience-scorers.test.mts fixture anchors updated in lockstep:
     baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy
       + partial-coverage SWF now contribute ~half the weight)
     overallScore:  63.60 → 64.39 (recovery subscore lifts by ~3 pts
       from the rebalance, overall by ~0.79)
     recovery flat mean: 48.75 (unchanged — flat mean doesn't apply
       weights by design; documents the coverage-weighted diff)
   Local coverageWeightedMean helper in the test mirrors the
   production implementation (weights applied per dim).

Methodology doc

- New "Per-dimension weights in the recovery domain" subsection with
  the weight table and a sentence explaining the cap. Cross-references
  the source of truth (RESILIENCE_DIMENSION_WEIGHTS).

Deliberate non-goals

- Live post-merge Spearman ≥0.85 check against the PR 0 baseline
  snapshot. Fixture ordering is preserved (new ordering test); the
  live-data check runs after Railway cron refreshes the rankings on
  the new weights and commits docs/snapshots/resilience-ranking-live-
  post-pr2-<date>.json. Tracked as the final piece of PR 2 §3.4
  alongside the health.js / bootstrap graduation (waiting on the
  7-day Railway cron bake-in window).

Tests: 6588/6588 data-tier tests pass. Typecheck clean on both
tsconfig configs. Biome clean on touched files. NO > US > YE
fixture ordering preserved.

* fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness

Greptile P2: the operator comparison harness
(scripts/compare-resilience-current-vs-proposed.mjs) claims its domain
scores "mirror the production scorer's coverage-weighted mean" and is
the artifact generator for Spearman / rank-delta acceptance decisions.
After PR 2 §3.4's weight rebalance, the production mirror diverged —
production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy
= 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but
the harness still used equal-weight aggregation.

Left unfixed, post-merge Spearman / rank-delta diagnostics would
compare live API scores (with the 0.5 recovery weights) against
harness predictions that assume equal-share dims — silently biasing
every acceptance decision until someone noticed a country's rank-
delta didn't track.

Fix

- Mirrored coverageWeightedMean now accepts dimensionWeights and
  applies `coverage × weight` per dim, matching _shared.ts exactly.
- Mirrored buildDomainList accepts + forwards dimensionWeights.
- main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module
  and passes it through to buildDomainList at the single call site.
- Missing-entry default = 1.0 (same contract as production) — makes
  the harness forward-compatible with any future weight refactor: if a
  new dim is added without an explicit entry, the production fallback
  path still produces the correct number.

Verification

- Harness syntax-check clean (node -c).
- RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the
  harness's import path.
- 509/509 resilience tests still pass (harness isn't in the test
  suite; the invariant is that production ↔ harness use the same
  math, and the production side is covered by tests/resilience-
  recovery-weight-rebalance.test.mts).

* fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry

Greptile P1 + P2 on PR #3328.

P1 — cache prefix not bumped after formula change
--------------------------------------------------
The per-dim weight rebalance changes the score formula, but the
`_formula` tag only distinguishes 'd6' vs 'pc' (pillar-combined vs
legacy 6-domain) — it does NOT detect intra-'d6' weight changes. Left
unfixed, scores cached before deploy would be served with the old
equal-weight math for up to the full 6h TTL, and the ranking key for
up to its 12h TTL. Matches the established v9→v10 pattern for every
prior formula-changing deploy.

Bumped in lockstep:
 - RESILIENCE_SCORE_CACHE_PREFIX:     v10  → v11
 - RESILIENCE_RANKING_CACHE_KEY:      v10  → v11
 - RESILIENCE_HISTORY_KEY_PREFIX:      v5  → v6
 - scripts/seed-resilience-scores.mjs local mirrors
 - api/health.js resilienceRanking literal
 - 4 analysis/backtest scripts that read the cached keys directly
 - Test fixtures in resilience-{ranking, handlers, scores-seed,
   pillar-aggregation}.test.* that assert on literal key values

The v5→v6 history bump is the critical one: without it, pre-rebalance
history points would mix with post-rebalance points inside the 30-day
window, and change30d / trend math would diff values from different
formulas against each other, producing false-negative "falling" trends
for every country across the deploy window.

P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage
----------------------------------------------------------------------------------
Reviewer flagged that these two functions still average coverage
equally across all non-retired dims, even after the scoring aggregation
started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is
INTENTIONAL — these signals answer a different question from scoring:

  scoring aggregation: "how much does each dim matter to the score?"
  coverage signal:     "how much real data do we have on this country?"

A dim at weight 0.5 still has the same data-availability footprint as
a weight=1.0 dim: its coverage value reflects whether we successfully
fetched the upstream source, not whether the scorer cares about it.
Applying scoring weights to the coverage signal would let a
half-weight dim hide half its sparsity from the overallCoverage pill,
misleading users reading coverage as a data-quality indicator.

Added explicit comments to both functions noting the asymmetry is
deliberate and pointing at the other site for matching rationale.
No code change — just documentation.

Tests: 6588/6588 data-tier tests pass (+511 resilience-specific
including the prefix-literal assertions). Typecheck clean on both
tsconfig configs. Biome clean on touched files.

* docs(resilience): bump methodology doc cache-prefix references to v11/v6

Greptile P2 on PR #3328: Redis keys table in the reproducibility
appendix still published `score:v10` / `ranking:v10` / `history:v5`,
and the rollback instructions told operators to flush those keys.
After the recovery-domain weight rebalance, live cache runs at
`score:v11` / `ranking:v11` / `history:v6`.

- Updated the Redis keys table (line 490-492) to match `_shared.ts`.
- Updated the rollback block to name the current keys.
- Left the historical "Activation sequence" narrative intact (it
  accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump)
  but added a parenthetical pointing at the current v11/v6 values.

No code change — doc-only correction for operator accuracy.

* fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy

Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag
starting with digit `1`, which is illegal and breaks the deploy with
"Unexpected character \`1\` (U+0031) before name". Surfaced after the
prior cache-prefix commit forced Mintlify to re-parse this file.

Replaced with "fewer than 137 countries" for unambiguous rendering.
Other `<` occurrences in this doc (lines 34, 642) are followed by
whitespace and don't trip MDX's tag parser.
2026-04-23 10:25:18 +04:00
Elie Habib
84ee2beb3e feat(energy): Energy Atlas end-to-end — pipelines + storage + shortages + disruptions + country drill-down (#3294)
* feat(energy): pipeline registries (gas + oil) — evidence-based schema

Day 6 of the Energy Atlas Release 1 plan (Week 2). First curated asset
registry for the atlas — the real gap vs GEF.

## Curated data (critical assets only, not global completeness)

scripts/data/pipelines-gas.json — 12 critical gas lines:
  Nord Stream 1/2 (offline; Swedish EEZ sabotage 2022; EU sanctions refs),
  TurkStream, Yamal–Europe (offline; Polish counter-sanctions),
  Brotherhood/Soyuz (offline; Ukraine transit expired 2024-12-31),
  Power of Siberia, Dolphin, Medgaz, TAP, TANAP,
  Central Asia–China, Langeled.

scripts/data/pipelines-oil.json — 12 critical oil lines:
  Druzhba North/South (N offline per EU 2022/879; S under landlocked
  derogation), CPC, ESPO (+ price-cap sanction ref), BTC, TAPS,
  Habshan–Fujairah (Hormuz bypass), Keystone, Kirkuk–Ceyhan (offline
  since 2023 ICC ruling), Baku–Supsa, Trans-Mountain (TMX expansion
  May 2024), ESPO spur to Daqing.

Scope note: 75+ each is Week 2b work via GEM bulk import. Today's cut
is curated from first-hand operator disclosures + regulator filings so
I can stand behind every evidence field.

## Evidence-based schema (not conclusion labels)

Per docs/methodology/pipelines.mdx: no bare `sanctions_blocked` field.
Every pipeline carries an evidence bundle with `physicalState`,
`physicalStateSource`, `operatorStatement`, `commercialState`,
`sanctionRefs[]`, `lastEvidenceUpdate`, `classifierVersion`,
`classifierConfidence`. The public badge (`flowing|reduced|offline|
disputed`) is derived server-side from this bundle at read time.

## Seeder

scripts/seed-pipelines.mjs — single process publishes BOTH keys
(energy:pipelines:{gas,oil}:v1) via two runSeed() calls. Tiny datasets
(<20KB each) so co-location is cheap and guarantees classifierVersion
consistency.

Conventions followed (worldmonitor-bootstrap-registration skill):
- TTL 21d = 3× weekly cadence (gold-standard per
  feedback_seeder_gold_standard.md)
- maxStaleMin 20_160 = 2× cadence (health-maxstalemin-write-cadence skill)
- sourceVersion + schemaVersion + recordCount + declareRecords wired
  (seed-contract-foundation)
- Zero-case explicitly NOT allowed — MIN_PIPELINES_PER_REGISTRY=8 floor

## Health registration (dual, per feedback_two_health_endpoints_must_match)

- api/health.js: BOOTSTRAP_KEYS adds pipelinesGas + pipelinesOil;
  SEED_META adds both with maxStaleMin=20_160.
- api/seed-health.js: mirror entries with intervalMin=10_080 (maxStaleMin/2).

## Bundle registration

scripts/seed-bundle-energy-sources.mjs adds a single Pipelines entry
(not two) because seed-pipelines.mjs publishes both keys in one run —
listing oil separately would double-execute. Monitoring of the oil key
staleness happens in api/health.js instead.

## Tests (tests/pipelines-registry.test.mts)

17 passing node:test assertions covering:
- Schema validation (both registries pass validateRegistry)
- Identity resolution (no id collisions, id matches object key)
- Country ISO2 normalization (from/to/transit all match /^[A-Z]{2}$/)
- Endpoint geometry within Earth bounds
- Evidence rigor: non-flowing badges require at least one supporting
  evidence source (operator statement / sanctionRefs / ais-relay /
  satellite / press)
- ClassifierConfidence in 0..1
- Commodity/capacity pairing (gas uses capacityBcmYr, oil uses
  capacityMbd — mixing = test fail)
- validateRegistry rejects: empty object, null, no-evidence fixtures,
  below-floor counts

Typecheck clean (both tsconfig.json and tsconfig.api.json).

Next: Day 7 will add list-pipelines / get-pipeline-detail RPCs in
supply-chain/v1. Day 8 ships PipelineStatusPanel with DeckGL PathLayer
consuming the registry.

* fix(energy): split seed-pipelines.mjs into two entry points — runSeed hard-exits

High finding from PR review. scripts/seed-pipelines.mjs called runSeed()
twice in one process and awaited Promise.all. But runSeed() in
scripts/_seed-utils.mjs hard-exits via process.exit on ~9 terminal paths
(lines 816, 820, 839, 888, 917, 989, plus fetch-retry 946, fatal 859,
skipped-lock 81). The first runSeed to reach any terminal path exits the
entire node process, so the second runSeed's resolve never fires — only
one of energy:pipelines:{gas,oil}:v1 would ever be written.

Since the bundle scheduled seed-pipelines.mjs exactly once, and both
api/health.js and api/seed-health.js expect both keys populated, the
other registry would stay permanently EMPTY/STALE after deploy.

Fix: split into two entry-point scripts around a shared utility.

- scripts/_pipeline-registry.mjs (NEW, was seed-pipelines.mjs) — shared
  helpers ONLY. Exports GAS_CANONICAL_KEY, OIL_CANONICAL_KEY,
  PIPELINES_TTL_SECONDS, MAX_STALE_MIN, buildGasPayload, buildOilPayload,
  validateRegistry, recordCount, declareRecords. Underscore prefix marks
  it as non-entry-point (matches _seed-utils.mjs / _seed-envelope-source.mjs
  convention).
- scripts/seed-pipelines-gas.mjs (NEW) — imports from the shared module,
  single runSeed('energy','pipelines-gas',…) call.
- scripts/seed-pipelines-oil.mjs (NEW) — same shape, oil.
- scripts/seed-bundle-energy-sources.mjs — register BOTH seeders (not one).
- scripts/seed-pipelines.mjs — deleted.
- tests/pipelines-registry.test.mts — update import path to the shared
  module. All 17 tests still pass.

Typecheck clean (both configs). Tests pass. No other consumers import
from the deleted script.

* fix(energy): complete pipeline bootstrap registration per 4-file checklist

High finding from PR review. My earlier PR description claimed
worldmonitor-bootstrap-registration was complete, but I only touched two
of the four registries (api/health.js + api/seed-health.js). The bootstrap
hydration payload itself (api/bootstrap.js) and the shared cache-keys
registry (server/_shared/cache-keys.ts) still had no entry for either
pipeline key, so any consumer that reads bootstrap data would see
pipelinesGas/pipelinesOil as missing on first load.

Files updated this commit:

- api/bootstrap.js — KEYS map + SLOW_KEYS set both gain pipelinesGas +
  pipelinesOil. Placed next to sprPolicies (same curated-registry cadence
  and tier). Slow tier is correct: weekly cron, not needed on first paint.
- server/_shared/cache-keys.ts — PIPELINES_GAS_KEY + PIPELINES_OIL_KEY
  exported constants (matches SPR_POLICIES_KEY pattern), BOOTSTRAP_KEYS map
  entries, and BOOTSTRAP_TIERS entries (both 'slow').

Not touched (intentional):
- server/gateway.ts — pipeline data is free-tier per the Energy Atlas
  plan; no PREMIUM_RPC_PATHS entry required. Energy Atlas monetization
  hooks (scenario runner, MCP tools, subscriptions) are Release 2.

Full 4-file checklist now complete:
   server/_shared/cache-keys.ts (this commit)
   api/bootstrap.js          (this commit)
   api/health.js             (earlier in PR)
   api/seed-health.js        (earlier in PR — dual-registry rule)

Typecheck clean (both configs).

* feat(energy): ListPipelines + GetPipelineDetail RPCs with evidence-derived badges

Day 7 of the Energy Atlas Release 1 plan (Week 2). Exposes the pipeline
registries (shipped in Day 6) via two supply-chain RPCs and ships the
evidence-to-badge derivation server-side.

## Proto

proto/worldmonitor/supply_chain/v1/list_pipelines.proto — new:
- ListPipelinesRequest { commodity_type?: 'gas' | 'oil' }
- ListPipelinesResponse { pipelines[], fetched_at, classifier_version, upstream_unavailable }
- GetPipelineDetailRequest { pipeline_id (required, query-param) }
- GetPipelineDetailResponse { pipeline?, revisions[], fetched_at, unavailable }
- PipelineEntry — wire shape mirroring scripts/data/pipelines-{gas,oil}.json
  + a server-derived public_badge field
- PipelineEvidence, OperatorStatement, SanctionRef, LatLon, PipelineRevisionEntry

service.proto adds both rpc methods with HTTP_METHOD_GET + path bindings:
  /api/supply-chain/v1/list-pipelines
  /api/supply-chain/v1/get-pipeline-detail

`make generate` regenerated src/generated/{client,server}/… + docs/api/
OpenAPI json/yaml.

## Evidence-derivation

server/worldmonitor/supply-chain/v1/_pipeline-evidence.ts — new.
derivePublicBadge(evidence) → 'flowing' | 'reduced' | 'offline' | 'disputed'
is deterministic + versioned (DERIVER_VERSION='badge-deriver-v1').

Rules (first match wins):
1. offline + sanctionRef OR expired/suspended commercial → offline
2. offline + operator statement → offline
3. offline + only press/ais/satellite → disputed (single-source negative claim)
4. reduced → reduced
5. flowing → flowing
6. unknown / malformed → disputed

Staleness guard: non-flowing badges on >14d-old evidence demote to
disputed. Flowing is the optimistic default — stale "still flowing" is
safer than stale "offline". Matches seed-pipelines-{gas,oil}.mjs maxStaleMin.
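
Condensed sketch of the rule order and staleness guard (the input is a
duck-typed simplification of the evidence bundle, not the generated
type; the 14-day window is the one stated above):

```ts
type PublicBadge = 'flowing' | 'reduced' | 'offline' | 'disputed';

interface EvidenceLike {
  physicalState?: string;
  commercialState?: string;
  sanctionRefs?: unknown[];
  operatorStatement?: unknown;
  lastEvidenceUpdate?: string;
}

const STALENESS_WINDOW_MS = 14 * 24 * 60 * 60 * 1000;

function derivePublicBadge(evidence: EvidenceLike | null | undefined, now = Date.now()): PublicBadge {
  if (!evidence) return 'disputed';                                 // rule 6
  const paperwork =
    (evidence.sanctionRefs?.length ?? 0) > 0 ||
    evidence.commercialState === 'expired' ||
    evidence.commercialState === 'suspended';

  let badge: PublicBadge;
  if (evidence.physicalState === 'offline') {
    if (paperwork || evidence.operatorStatement) badge = 'offline'; // rules 1 and 2
    else badge = 'disputed';                                        // rule 3: single-source negative claim
  } else if (evidence.physicalState === 'reduced') {
    badge = 'reduced';                                              // rule 4
  } else if (evidence.physicalState === 'flowing') {
    badge = 'flowing';                                              // rule 5
  } else {
    badge = 'disputed';                                             // rule 6
  }

  // Staleness guard: a non-flowing claim on old (or unparseable) evidence
  // demotes to disputed; flowing stays the optimistic default.
  const updated = Date.parse(evidence.lastEvidenceUpdate ?? '');
  if (badge !== 'flowing' && (Number.isNaN(updated) || now - updated > STALENESS_WINDOW_MS)) {
    return 'disputed';
  }
  return badge;
}
```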

Tests (tests/pipeline-evidence-derivation.test.mts) — 15 passing cases
covering happy paths, disputed fallbacks, staleness guard, versioning.

## Handlers

server/worldmonitor/supply-chain/v1/list-pipelines.ts
- Reads energy:pipelines:{gas,oil}:v1 via getCachedJson.
- projectPipeline() narrows the Upstash `unknown` into PipelineEntry
  shape + calls derivePublicBadge.
- Honors commodity_type filter (skip the opposite registry's Redis read
  when the client pre-filters).
- Returns upstream_unavailable=true when BOTH registries miss.

server/worldmonitor/supply-chain/v1/get-pipeline-detail.ts
- Scans both registries by id (ids are globally unique per
  tests/pipelines-registry.test.mts).
- Empty revisions[] for now; auto-revision log wires up in Week 3.

handler.ts registers both into supplyChainHandler.

## Gateway

server/gateway.ts adds 'static' cache-tier for both new RPC paths
(registry is slow-moving; 'static' matches the other read-mostly
supply-chain endpoints).

## Consumer wiring

Not in this commit — PipelineStatusPanel (Day 8) is what will call
listPipelines/getPipelineDetail via the generated client. pipelinesGas
+ pipelinesOil stay in PENDING_CONSUMERS until Day 8.

Typecheck clean (both configs). 15 new tests + 17 registry tests all pass.

* feat(energy): PipelineStatusPanel — evidence-backed status table + drawer

Day 8 of the Energy Atlas Release 1 plan. First consumer of the Day 6–7
registries + RPCs.

## What this PR adds

- src/components/PipelineStatusPanel.ts — new panel (id=pipeline-status).
  * Bootstrap-hydrates from pipelinesGas + pipelinesOil for instant first
    paint; falls through to listPipelines() RPC if bootstrap misses.
    Background re-fetch runs on every render so a classifier-version bump
    between bootstrap stamp and first view produces a visible update.
  * Table rows sorted non-flowing-first (offline / reduced / disputed
    before flowing) — what an atlas reader cares about.
  * Click-to-expand drawer calls getPipelineDetail() lazily — operator
    statements, sanction refs (with clickable source URLs), commercial
    state, classifier version + confidence %, capacity + route metadata.
  * publicBadge color-chip palette matches the methodology doc.
  * Attribution footer with GEM (CC-BY 4.0) credit + classifier version.

- src/components/index.ts — barrel export.
- src/app/panel-layout.ts — import + createPanel('pipeline-status', …).
- src/config/panels.ts — ENERGY_PANELS adds 'pipeline-status' at priority 1.

## PENDING_CONSUMERS cleanup

tests/bootstrap.test.mjs — removes 'pipelinesGas' + 'pipelinesOil' from
the allowlist. The invariant "every bootstrap key has a getHydratedData
consumer" now enforces real wiring for these keys: the panel literally
calls getHydratedData('pipelinesGas') and getHydratedData('pipelinesOil').
Future regressions that remove the consumer will fail pre-push.

## Consumer contract verified

- 67 tests pass including bootstrap.test.mjs consumer coverage check.
- Typecheck clean.
- No DeckGL PathLayer in this commit — existing 'pipelines-layer' has a
  separate data source, so modifying DeckGLMap.ts to overlay evidence-
  derived badges on the map is a follow-up commit to avoid clobbering.

## Out of scope for Day 8 (next steps on same PR)

- DeckGL PathLayer integration (color pipelines on the main map by
  publicBadge, click-to-open this drawer) — Day 8b commit.
- Storage facility registry + StorageFacilityMapPanel — Days 9-10.

* fix(energy): PipelineStatusPanel bootstrap path — client-side badge derivation

High finding from PR review. The Day-8 panel crashed on first paint
whenever bootstrap hydration succeeded, because:

- Bootstrap hydrates raw scripts/data/pipelines-{gas,oil}.json verbatim.
- That JSON does NOT include publicBadge — that field is only added by
  the server handler's projectPipeline() in list-pipelines.ts.
- PipelineStatusPanel passed raw entries into badgeChip(), which called
  badgeLabel(undefined).charAt(0) → TypeError.

The background RPC refresh that would have repaired the data never ran
because the panel threw before reaching it. So the exact bootstrap path
newly wired in commit 6b01fa537 was broken for the new panel.

Fix: move the evidence→badge deriver to src/shared/pipeline-evidence.ts
so the client panel and the server handler run the identical function on
identical inputs. Panel projects raw bootstrap JSON through the shared
deriver client-side, producing the same publicBadge the RPC would have
returned. No UI flicker on hydration because pre- and post-RPC badges
match exactly (same function, same input).

## Changes

- src/shared/pipeline-evidence.ts (NEW) — pure deriver with duck-typed
  PipelineEvidenceInput (no generated-type dependency, so both client
  and server assign their proto-typed evidence bundles by structural
  subtyping). Exports derivePipelinePublicBadge + version + type.
- server/worldmonitor/supply-chain/v1/_pipeline-evidence.ts — now a thin
  re-export of the shared module under its older name so in-handler
  imports keep working without a sweep.
- src/components/PipelineStatusPanel.ts:
  * Imports derivePipelinePublicBadge from @/shared/pipeline-evidence.
  * NEW projectRawPipeline() defensively coerces every field from
    unknown → PipelineEntry shape, mirroring the server projection.
  * buildBootstrapResponse now routes every raw entry through the
    projection before returning, so the wire-format PipelineEntry[] the
    renderer receives always has publicBadge populated.
  * badgeChip() gained a null-guard fallback to 'disputed' — belt +
    braces so even if a future caller passes an undefined, the UI
    renders safely instead of throwing.
  * BootstrapRegistry renamed RawBootstrapRegistry with a comment
    explaining why the seeder ships raw JSON (not wire format).

## Regression tests

tests/pipeline-panel-bootstrap.test.mts (NEW) — 6 tests that exercise
the bootstrap-first-paint path end-to-end:

- Every gas + oil curated entry produces a valid badge.
- Raw entries never ship with pre-computed publicBadge (contract guard
  on the seed data format).
- Deriver never throws on undefined/null/{} evidence (was the crash).
- Nord Stream 1 regression check (offline + paperwork → offline).
- Druzhba-South staleness behavior (reduced when fresh, disputed after
  60 days without update).

38/38 tests now pass (17 registry + 15 deriver + 6 new bootstrap-path).
Typecheck clean on both configs.

## Invariant preserved

The server handler and the panel render identical badges because:
1. Same pure function (imported from the same module).
2. Same deterministic rules, same staleness window.
3. Same bootstrap data read by both paths (Redis → either bootstrap
   payload or RPC response).

No UI flicker on hydration.

* fix(energy): three PR-review P2s on PipelineStatusPanel + aggregators

## P2-1 — sanitizeUrl on external evidence links (XSS hardening)

Sanction-ref URLs and operator-statement URLs were interpolated with
escapeHtml only. HTML-escaping blocks tag injection but NOT javascript:
or data: URL schemes, so a bad URL in the seeded registry would execute
in-app when a reader clicked the evidence link. Every other panel in
the codebase (NewsPanel, GdeltIntelPanel, GeoHubsPanel, AirlineIntelPanel,
MonitorPanel) uses sanitizeUrl for this exact reason.

Fix: import sanitizeUrl from @/utils/sanitize and route both hrefs
through it. sanitizeUrl() drops non-http(s) schemes + returns '' on
invalid URLs. The renderer now suppresses the <a> entirely when
sanitize rejects — the date label still renders as plain text instead
of becoming an executable link.

## P2-2 — loadDetail catch path missing stale-response guard

The success path at loadDetail() checked `this.selectedId !== pipelineId`
to suppress stale responses when the user has clicked another pipeline
mid-flight. The catch path at line 219 had no such guard: if the user
clicked A, then B, and A's request failed before B resolved, A's error
handler cleared detailLoading and detail, showing "Pipeline detail
unavailable" for B's drawer even though B was still loading.

Fix: mirror the same `if (this.selectedId !== pipelineId) return` guard
in the catch path. The newer request now owns the drawer state
regardless of which path (success OR failure) the older one took.
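
The guard pattern in isolation (drawer fields and RPC call are
simplified stand-ins; the point is the identical selectedId check on
both paths):

```ts
class DetailDrawer {
  private selectedId: string | null = null;
  private detail: unknown;
  private detailLoading = false;

  constructor(private fetchDetail: (id: string) => Promise<unknown>) {}

  async loadDetail(pipelineId: string): Promise<void> {
    this.selectedId = pipelineId;
    this.detailLoading = true;
    try {
      const detail = await this.fetchDetail(pipelineId);
      if (this.selectedId !== pipelineId) return; // a newer click owns the drawer
      this.detail = detail;
    } catch {
      if (this.selectedId !== pipelineId) return; // same guard on the failure path
      this.detail = undefined;                    // only the newest request may show the error state
    } finally {
      if (this.selectedId === pipelineId) this.detailLoading = false;
    }
  }
}
```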

## P2-3 — always-gas-preference aggregator for classifierVersion + fetchedAt

Three call sites (list-pipelines.ts handler, get-pipeline-detail.ts
handler, PipelineStatusPanel bootstrap projection) computed aggregate
classifier version and fetchedAt by `gas?.x || oil?.x || fallback`.
That was defensible when a single seed-pipelines.mjs wrote both keys
atomically (fix commit 29b4ac78f split this into two separate Railway
cron entry points). Now gas + oil cron independently, so mixed-version
(gas=v1, oil=v2 during classifier rollout) and mixed-timestamp (oil
refreshed 6h after gas) windows are the EXPECTED state, not the
exceptional one. The comment in list-pipelines.ts even said "pick the
newest classifier version" but the code didn't actually compare.

Fix: add two shared helpers in src/shared/pipeline-evidence.ts —

- pickNewerClassifierVersion(a,b) — parses /^v(\\d+)$/ and returns the
  higher-numbered version; falls back to lexicographic for non-v-
  prefixed values; handles single-missing inputs.
- pickNewerIsoTimestamp(a,b) — Date.parse()-compares and returns the
  later ISO; handles missing / malformed inputs gracefully.

Both server RPCs and the panel bootstrap projection now call these
helpers identically, so clients are told the truth about version +
freshness during partial rollouts.
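
Sketch of the two pickers per the rules above (signatures and defaults as
described; the implementation details are otherwise assumptions):

```ts
function pickNewerClassifierVersion(a?: string, b?: string): string {
  if (!a && !b) return 'v1';
  if (!a) return b as string;
  if (!b) return a;
  const na = /^v(\d+)$/.exec(a);
  const nb = /^v(\d+)$/.exec(b);
  if (na && nb) return Number(na[1]) >= Number(nb[1]) ? a : b;
  return a >= b ? a : b; // non-v-prefixed values fall back to lexicographic
}

function pickNewerIsoTimestamp(a?: string, b?: string): string {
  const ta = Date.parse(a ?? '');
  const tb = Date.parse(b ?? '');
  if (Number.isNaN(ta) && Number.isNaN(tb)) return '';
  if (Number.isNaN(ta)) return b as string;
  if (Number.isNaN(tb)) return a as string;
  return ta >= tb ? (a as string) : (b as string);
}
```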

## Tests

Extended tests/pipeline-evidence-derivation.test.mts with 8 new
assertions covering both pickers:

- Higher v-number wins regardless of order (v1 vs v2 → v2 both ways)
- Single-missing falls back to the one present
- Missing + missing → default 'v1' for version / '' for ts
- Non-v-numbered values fall back to lexicographic
- Explicit regression: "gas=v1 + oil=v2 during rollout" returns v2
- Explicit regression: "oil fresher than gas" returns the oil timestamp

38 → 46 tests. All pass. Typecheck clean on both configs.

* feat(energy): DeckGL PathLayer colored by evidence-derived badge + map↔panel link

Day 8b of the Energy Atlas plan. Pipelines now render on the main
DeckGL map of the energy variant colored by their derived publicBadge,
and clicking a pipeline on the map opens the same evidence drawer the
panel row-click opens.

## Why this commit

Day 8 shipped the PipelineStatusPanel as a table + drawer view.
Reviewer flag notwithstanding (fixed in 149d33ec3 + db52965cd), a
table-only pipeline view is a weak product compared to the map-centric
atlas it's meant to rival. The map-layer differentiation is the whole
point of the feature.

## What this adds

src/components/DeckGLMap.ts:
- New createEnergyPipelinesLayer() — reads hydrated pipeline registries
  via getHydratedData, projects raw JSON through the shared deriver
  (src/shared/pipeline-evidence.ts), renders a DeckGL PathLayer colored
  by publicBadge:
    flowing  → green (46,204,113)
    reduced  → amber (243,156,18)
    offline  → red   (231,76,60)
    disputed → purple (155,89,182)
  Offline + disputed get thicker strokes (3px vs 2px) for at-a-glance
  surfacing of disrupted assets. Geometry comes from raw startPoint +
  waypoints[] + endPoint per asset (straight line when no waypoints).
- Branching at line ~1498: SITE_VARIANT === 'energy' routes to the
  new method; other variants keep the static PIPELINES config (colored
  by oil/gas type). Existing commodity/finance/full map layers are
  untouched — no cross-variant leakage.
- onClick handler emits `energy:open-pipeline-detail` as a window
  CustomEvent with { pipelineId }. Loose coupling: the map doesn't
  import the panel, the panel doesn't import the map.
- Fallback: if bootstrap hasn't hydrated yet, createEnergyPipelinesLayer
  falls back to the static createPipelinesLayer() so the pipelines
  toggle always shows *something*.

src/components/PipelineStatusPanel.ts:
- Constructor registers a window event listener for
  'energy:open-pipeline-detail' → calls this.loadDetail(pipelineId) →
  drawer opens on the clicked asset. Map click and row click converge
  on the same drawer, same evidence view.
- destroy() removes the listener to prevent ghost handlers after panel
  unmount.
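
The event contract in miniature (event name and payload are from this
commit; the wrapper function names here are hypothetical):

```ts
const OPEN_PIPELINE_DETAIL = 'energy:open-pipeline-detail';

// Map side: a picked pipeline dispatches the event instead of importing the panel.
function onPipelinePicked(pipelineId: string): void {
  if (typeof window === 'undefined') return; // tauri/SSR guard
  window.dispatchEvent(new CustomEvent(OPEN_PIPELINE_DETAIL, { detail: { pipelineId } }));
}

// Panel side: register on construct, remove on destroy to avoid ghost handlers.
function listenForMapClicks(loadDetail: (id: string) => void): () => void {
  const handler = (evt: Event) => {
    const { pipelineId } = (evt as CustomEvent<{ pipelineId: string }>).detail;
    loadDetail(pipelineId);
  };
  window.addEventListener(OPEN_PIPELINE_DETAIL, handler);
  return () => window.removeEventListener(OPEN_PIPELINE_DETAIL, handler);
}
```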

## Guarantees

- Bootstrap parity: the DeckGL layer calls the SAME derivePipelinePublicBadge
  as the panel and the server handler, so the map color, the table row
  chip, and the RPC response all agree on the badge. No flicker, no
  drift, no confused user.
- Variant isolation: only SITE_VARIANT === 'energy' triggers the new
  path. Commodity / finance / full map layers untouched.
- No cross-component import: the panel doesn't reference the map class
  and vice versa. The event contract is the only coupling — testable,
  swappable, tauri-safe (guarded with `typeof window !== 'undefined'`).

Typecheck clean. PR #3294 now has 8 commits.

Follow-up backlog:
- Add waypoints[] to the curated pipelines-{gas,oil}.json so the map
  draws real routes instead of straight lines (cosmetic; does not
  affect correctness).
- Tooltip case in the picking tooltip registry (line ~3748) so hover
  shows "Nord Stream 1 · OFFLINE" before click.

* fix(energy): three PR-review findings on Day 8b DeckGL integration

## P1 — getHydratedData single-use race between map + panel

src/services/bootstrap.ts:34 — `if (val !== undefined) hydrationCache.delete(key);`
The helper drains its slot on first read. Day 8 (PipelineStatusPanel) and
Day 8b (createEnergyPipelinesLayer) BOTH call getHydratedData('pipelinesGas')
and getHydratedData('pipelinesOil') — whoever renders first drains the cache
and forces the loser onto its fallback path (panel → RPC, map → static
PIPELINES layer). The commit's "shared bootstrap-backed data" guarantee
did not actually hold.

Fix: new src/shared/pipeline-registry-store.ts that reads once and memoizes.
Both consumers read through getCachedPipelineRegistries() — same data, same
reference, unlimited re-reads. When the panel's background RPC fetch lands,
it calls setCachedPipelineRegistries() to back-propagate fresh data into
the store so the map's next re-render sees the newer classifierVersion +
fetchedAt too (no map/panel drift during classifier rollouts).

Test-only injection hook (__setBootstrapReaderForTests) makes the drain-once
semantics observable without a real bootstrap payload.
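
Sketch of the read-once memo (registry shape and reader signature are
simplified; in the real store the reader defaults to the bootstrap
getHydratedData helper rather than being injected):

```ts
type RawRegistry = Record<string, unknown>;

interface PipelineRegistries {
  gas: RawRegistry | null;
  oil: RawRegistry | null;
}

let cached: PipelineRegistries | null = null;
let readBootstrap: (key: string) => RawRegistry | undefined = () => undefined;

export function __setBootstrapReaderForTests(reader: typeof readBootstrap): void {
  readBootstrap = reader;
  cached = null;
}

export function getCachedPipelineRegistries(): PipelineRegistries {
  if (!cached) {
    // Drain the single-use bootstrap slots exactly once; every later caller
    // (map layer or panel) gets the same memoized reference.
    cached = {
      gas: readBootstrap('pipelinesGas') ?? null,
      oil: readBootstrap('pipelinesOil') ?? null,
    };
  }
  return cached;
}

export function setCachedPipelineRegistries(update: Partial<PipelineRegistries>): void {
  // RPC back-propagation: fresher data replaces only the commodity it carries.
  cached = { ...getCachedPipelineRegistries(), ...update };
}
```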

## P2 — pipelines-layer tooltip regresses to blank label on energy variant

src/components/DeckGLMap.ts:3748 (pipelines-layer tooltip case) still assumed
the static-config shape (obj.type). The new energy layer emits objects with
commodityType + badge fields, so the tooltip's type-ternary fell through to
the generic fallback — hover rendered " pipeline" (empty leading commodity)
instead of "Nord Stream 1 · OFFLINE".

Fix: differentiate by presence of obj.badge (only the energy layer sets it).
On the energy variant, tooltip now reads name + commodity + badge. Static-
config variants (commodity / finance / full) keep their existing format
unchanged.

## P2 — createEnergyPipelinesLayer dropped highlightedAssets behavior

The static createPipelinesLayer() reads this.highlightedAssets.pipeline and
threads it into getColor / getWidth with an updateTrigger on the signature.
Any caller using flashAssets('pipeline', [...]) or highlightAssets([...])
gets a visible red-outline flash on the matching paths. My Day 8b energy
layer ignored the set entirely — those APIs silently no-op'd on the energy
variant.

Fix: createEnergyPipelinesLayer() now reads the same highlight set, applies
HIGHLIGHT_COLOR + wider stroke to matching IDs, and wires
updateTriggers: { getColor: sig, getWidth: sig } so DeckGL actually
recomputes when the set changes.

Also removed the unnecessary layerCache.set() in the energy path: the
store can update via RPC back-propagation, and a cache keyed only on
highlight-signature would serve stale data. With ~25 critical-asset
pipelines, rebuild per render is trivial.

## Tests

tests/pipeline-registry-store.test.mts (NEW) — 5 tests covering the
drain-once read-many invariant: multiple consumers get cached data
without re-draining, RPC back-propagation updates the source, partial
updates preserve the other commodity, and pure RPC-first (no bootstrap)
works without invoking the reader.

All 51 PR tests pass. Typecheck clean on both configs.

* feat(energy): Day 9 — storage facility registry (UGS + SPR + LNG + crude hubs)

Ships 21 critical strategic storage facilities as a curated registry, same
evidence-bundle pattern as the pipeline registries in Day 7/8:

- scripts/data/storage-facilities.json — 4 UGS + 4 SPR + 6 LNG export +
  3 LNG import + 4 crude tank farms. Each carries physicalState +
  sanctionRefs + classifierVersion/Confidence + fillDisclosed/fillSource.
- scripts/_storage-facility-registry.mjs — shared helpers (validator,
  builder, canonical key, MAX_STALE_MIN). Validator enforces facility-type
  × capacity-unit pairing (ugs→TWh, spr/tank-farm→Mb, LNG→Mtpa) and the
  non-operational badge ⇒ evidence invariant.
- scripts/seed-storage-facilities.mjs — single runSeed entry (only one
  key, so no split-seeder dance needed).
- Registered in the 4-file bootstrap checklist: cache-keys.ts
  (STORAGE_FACILITIES_KEY + BOOTSTRAP_CACHE_KEYS + BOOTSTRAP_TIERS),
  api/bootstrap.js (KEYS + SLOW_KEYS), api/health.js (BOOTSTRAP_KEYS +
  SEED_META, 14d threshold = 2× weekly cron), api/seed-health.js (mirror).
- tests/bootstrap.test.mjs PENDING_CONSUMERS adds storageFacilities —
  Day 10 StorageFacilityMapPanel will remove it.
- tests/storage-facilities-registry.test.mts — 20 tests covering schema,
  identity, geometry, type×capacity pairing, evidence contract, and
  negative-input validator rejection.

Registry fields are slow-moving; badge derivation happens at read-time
server-side once the RPC handler lands in Day 10 (panel + deckGL
ScatterplotLayer). Seeded data is live in Redis from this commit so the
Day 10 PR only adds display surfaces.

Tests: 56 pass (36 prior + 20 new). Typecheck + typecheck:api clean.

* feat(energy): Day 10 — storage atlas (ListStorageFacilities RPC + DeckGL ScatterplotLayer + panel)

End-to-end wiring for the strategic storage registry seeded in Day 9. Same
pattern as the pipeline shipping path (Days 7+8+8b): proto → handler →
shared evidence deriver → panel → DeckGL map layer, with a shared
read-once store keeping map + panel aligned.

Proto + generated code:
- list_storage_facilities.proto: ListStorageFacilities +
  GetStorageFacilityDetail messages with StorageFacilityEntry,
  StorageEvidence, StorageSanctionRef, StorageOperatorStatement,
  StorageLatLon, StorageFacilityRevisionEntry.
- service.proto wires both RPCs under /api/supply-chain/v1.
- make generate → regenerated client + server stubs + OpenAPI.

Server handlers:
- src/shared/storage-evidence.ts: shared pure deriver. Duck-typed input
  interface avoids generated-type deps; identical rules to the pipeline
  deriver (sanction/commercial paperwork vs external-signal-only offline,
  14d staleness window, version pin).
- _storage-evidence.ts: thin re-export for server handler import ergonomics.
- list-storage-facilities.ts: reads STORAGE_FACILITIES_KEY from Upstash,
  projects raw → wire format, attaches derived publicBadge, filters by
  optional facilityType query arg.
- get-storage-facility-detail.ts: single-asset lookup for drawer.
- handler.ts registers both new methods.
- gateway.ts: both routes → 'static' cache tier (registry is near-static).

Panel + map:
- src/shared/storage-facility-registry-store.ts: drain-once memo mirroring
  pipeline-registry-store. Both panel and DeckGL layer read through this
  so the single-use getHydratedData drain doesn't race between consumers.
  RPC back-propagation via setCachedStorageFacilityRegistry() keeps map ↔
  panel on the same classifierVersion during rollouts.
- StorageFacilityMapPanel.ts: table + evidence drawer. Bootstrap hot path
  projects raw registry through same deriver as server so first-paint
  badge matches post-RPC badge (no flicker). sanitizeUrl + stale-response
  guards (success + catch paths) carried over from PipelineStatusPanel.
- DeckGLMap.ts createEnergyStorageLayer(): ScatterplotLayer keyed on
  badge color; log-scale radius (6km–26km) keeps Rehden visible next to
  Ras Laffan. Click dispatches 'energy:open-storage-facility-detail' —
  panel listens and opens its drawer (loose coupling, no direct refs).
- Tooltip branch on storage-facilities-layer shows facility type, country,
  capacity unit, and badge.
- Added 'storageFacilities' optional field to MapLayers type (optional so
  existing variant literals across commodity/finance/tech/happy/full/etc.
  don't need touching). Wired into LAYER_REGISTRY + VARIANT_LAYER_ORDER.energy
  + ENERGY_MAP_LAYERS + ENERGY_MOBILE_MAP_LAYERS. Panel entry added to
  ENERGY_PANELS + panel-layout createPanel. PENDING_CONSUMERS entry from
  Day 9 removed — panel + map layer are now real consumers.

Tests:
- storage-evidence-derivation.test.mts (17 tests): covers every curated
  facility yields a valid badge, null/malformed input never throws,
  offline sanction/commercial/operator rules, external-signal-only offline
  → disputed, staleness demotion.
- storage-facility-registry-store.test.mts (4 tests): drain-once, no-data
  drain, RPC update, pure-RPC-first path.

All 6,426 unit tests pass. Typecheck + typecheck:api clean. Pre-existing
src-tauri/sidecar/ test failure is unrelated (no diff touches src-tauri/).

* feat(energy): Day 11 — fuel-shortage registry schema + seed + RPC (classifier post-launch)

Ships v1 of the global fuel-shortage alert registry. Severity is the
CLASSIFIER OUTPUT (confirmed/watch), not a client derivation — we ship
the evidence alongside so readers can audit the grounds. v1 is seeded
from curated JSON; post-launch the proactive-intelligence classifier
(Day 12 work) extends the same key directly.

Data:
- scripts/data/fuel-shortages.json — 15 known active shortages
  (PK, LK, NG×2, CU, VE, LB, ZW, AR, IR, BO, KE, PA, EG, BY)
  spanning petrol/diesel/jet across confirmed + watch tiers. Each entry
  carries evidenceSources[] (regulator/operator/press), firstSeen,
  lastConfirmed, resolvedAt, impactTypes[], causeChain[], classifier
  version + confidence. Confirmed severity enforces authoritative
  evidence at schema level.

Seeder:
- scripts/_fuel-shortage-registry.mjs — shared validator (enforces
  iso2 country, enum products/severities/impacts/causes, authoritative
  evidence for confirmed). MIN_SHORTAGES=10.
- scripts/seed-fuel-shortages.mjs — single runSeed entry.
- Registered in seed-bundle-energy-sources.mjs at DAY cadence (shortages
  move faster than registry assets).

Bootstrap 4-file registration:
- cache-keys.ts: FUEL_SHORTAGES_KEY + BOOTSTRAP_CACHE_KEYS + BOOTSTRAP_TIERS.
- api/bootstrap.js: KEYS + SLOW_KEYS.
- api/health.js: BOOTSTRAP_KEYS + SEED_META (2880min = 2× daily cron).
- api/seed-health.js: mirrors intervalMin=1440.

Proto + RPC:
- list_fuel_shortages.proto: ListFuelShortages (country/product/severity
  query facets) + GetFuelShortageDetail messages with FuelShortageEntry,
  FuelShortageEvidence, FuelShortageEvidenceSource.
- service.proto wires both new RPCs under /api/supply-chain/v1.
- list-fuel-shortages.ts handler projects raw → wire format, supports
  server-side country/product/severity filtering.
- get-fuel-shortage-detail.ts single-shortage lookup.
- handler.ts registers both. gateway.ts: 'medium' cache-tier (daily
  classifier updates warrant moderate freshness).

Shared evidence helper:
- src/shared/shortage-evidence.ts: deriveShortageEvidenceQuality maps
  (confidence + authoritative-source count + freshness) → 'strong' |
  'moderate' | 'thin' for client-side sort/trust indicators. Does NOT
  change severity — classifier owns that decision.
- countEvidenceSources buckets sources for the drawer's "n regulator /
  m press" line.

Tests:
- tests/fuel-shortages-registry.test.mts (19 tests): schema, identity,
  enum coverage, evidence contract (confirmed → authoritative source),
  validateRegistry negative cases.
- tests/shortage-evidence.test.mts (10 tests): quality deriver edge
  cases, source bucketing.
- tests/bootstrap.test.mjs PENDING_CONSUMERS adds fuelShortages —
  FuelShortagePanel arrives Day 12 which will remove the entry.

Typecheck + typecheck:api clean. 64 tests pass.

* feat(energy): Day 12 — FuelShortagePanel + DeckGL shortage pins

End-to-end wiring of the fuel-shortage registry shipped in Day 11: panel
on the Energy variant page, ScatterplotLayer pins on the DeckGL map,
both reading through a shared single-drain store so they don't race on
the bootstrap cache.

Panel:
- src/components/FuelShortagePanel.ts — table sorted by severity (confirmed
  first) then evidence quality (strong → thin) then most-recent lastConfirmed.
  Drawer shows short description, first-seen / last-confirmed / resolved,
  impact types, cause chain, classifier version/confidence, and a typed
  evidence-source list with regulator/operator/press chips. sanitizeUrl on
  every href so classifier-ingested URLs can't render as javascript:. Same
  stale-response guards on success + catch paths as the other detail drawers.
- Consumes deriveShortageEvidenceQuality for client-side trust indicator
  (three-dot ●●● / ●●○ / ●○○), NOT for severity — severity is classifier
  output.
- Registered in ENERGY_PANELS + panel-layout.ts + components barrel.

Shared store:
- src/shared/fuel-shortage-registry-store.ts — same drain-once memoize
  pattern as pipeline- and storage-facility-registry-store. Both the
  panel and the DeckGL shortage-pins layer read through it.
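
The drain-once pattern, sketched with illustrative names (not the shipped module):

```ts
// The fetcher runs once; every later reader shares the same promise, so the
// single-drain bootstrap cache is never raced.
export function createDrainOnceStore<T>(fetchOnce: () => Promise<T>) {
  let inflight: Promise<T> | null = null;
  return function read(): Promise<T> {
    // First reader triggers the single bootstrap drain / RPC fetch; the panel
    // and the DeckGL layer both await the same promise afterwards.
    if (!inflight) inflight = fetchOnce();
    return inflight;
  };
}

// Usage sketch:
// const getCachedFuelShortageRegistry = createDrainOnceStore(fetchFuelShortages);
```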

DeckGL layer:
- DeckGLMap.createEnergyShortagePinsLayer: ScatterplotLayer placing one
  pin per active shortage at the country centroid (via getCountryCentroid
  from services/country-geometry). Stacking offset (~0.8° lon) when
  multiple shortages share a country so Nigeria's petrol + diesel don't
  render as a single dot. Confirmed pins 55km radius; watch 38km. Click
  dispatches 'energy:open-fuel-shortage-detail' — panel listens.
- Tooltip branch on fuel-shortages-layer: country · product · short
  description · severity.
- Layer registered in LAYER_REGISTRY, VARIANT_LAYER_ORDER.energy,
  ENERGY_MAP_LAYERS, ENERGY_MOBILE_MAP_LAYERS. MapLayers.fuelShortages
  is optional on the type so other variants' literals remain valid.
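
A sketch of the pin layer with the stacking offset; the centroid lookup and event payload are simplified assumptions, not the shipped DeckGLMap code:

```ts
import { ScatterplotLayer } from '@deck.gl/layers';

interface ShortagePin {
  id: string;
  severity: 'confirmed' | 'watch';
  centroid: [number, number]; // [lon, lat] country centroid
  stackIndex: number;         // 0, 1, 2, ... when several shortages share a country
}

export function createEnergyShortagePinsLayer(pins: ShortagePin[]) {
  return new ScatterplotLayer({
    id: 'energy-shortage-pins',
    data: pins,
    radiusUnits: 'meters',
    // ~0.8 degrees of longitude per extra shortage in the same country, so
    // Nigeria's petrol and diesel entries render as separate dots.
    getPosition: (d: ShortagePin) => [
      d.centroid[0] + d.stackIndex * 0.8,
      d.centroid[1],
    ],
    getRadius: (d: ShortagePin) => (d.severity === 'confirmed' ? 55_000 : 38_000),
    pickable: true,
    onClick: ({ object }) => {
      if (object) {
        window.dispatchEvent(
          new CustomEvent('energy:open-fuel-shortage-detail', { detail: object.id }),
        );
      }
    },
  });
}
```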

Tests:
- tests/fuel-shortage-registry-store.test.mts (4 tests): drain-once,
  no-data, RPC back-prop, pure-RPC-first path.
- tests/bootstrap.test.mjs — fuelShortages removed from PENDING_CONSUMERS.

Typecheck + typecheck:api clean. 39 tests pass (plus full suite in pre-push).

* feat(energy): Day 13 — energy disruption event log + asset timeline drawer

Ships the energy:disruptions:v1 registry that threads together pipelines
and storage facilities: state transitions (sabotage, sanction, maintenance,
mechanical, weather, commercial, war) keyed by assetId so any asset's
drawer can render its history without a second registry lookup.

Data + seeder:
- scripts/data/energy-disruptions.json — 12 curated events spanning
  Nord Stream 1/2 sabotage, Druzhba sanctions, CPC force majeure,
  TurkStream maintenance, Yamal halt, Rehden trusteeship, Arctic LNG 2
  sanction, ESPO drone strikes, BTC fire (historical), Sabine Pass
  Hurricane Beryl, Power of Siberia ramp. Each event links back to a
  seeded asset.
- scripts/_energy-disruption-registry.mjs — validator enforces valid
  assetType/eventType/cause enums, http(s) sources, startAt ≤ endAt,
  MIN_EVENTS=8.
- scripts/seed-energy-disruptions.mjs — runSeed entry (weekly cron).
- Bundle entry at 7×DAY cadence.

Bootstrap 4-file registration (cache-keys.ts + bootstrap.js + health.js +
seed-health.js) — energyDisruptions in PENDING_CONSUMERS because panel
drawers fetch lazily via RPC on drawer-open rather than hydrating from
bootstrap directly.

Proto + handler:
- list_energy_disruptions.proto: ListEnergyDisruptions with
  assetId / assetType / ongoingOnly query facets. Returns events sorted
  newest-first.
- list-energy-disruptions.ts projects raw → wire format, supports all
  three query facets.
- Registered in handler.ts. gateway.ts: 'medium' cache tier.

Shared timeline helper:
- src/shared/disruption-timeline.ts — pure formatters (formatEventWindow,
  formatCapacityOffline, statusForEvent). No generated-type deps so
  PipelineStatusPanel + StorageFacilityMapPanel import the same helpers
  and render the timeline identically.
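
Sketch of the formatter contract with assumed shapes (not the shipped helpers):

```ts
interface DisruptionEventLike {
  startAt: string;        // ISO date
  endAt?: string | null;  // missing or null means the disruption is ongoing
}

export function formatEventWindow(e: DisruptionEventLike): string {
  const start = new Date(e.startAt).toISOString().slice(0, 10);
  return e.endAt
    ? `${start} to ${new Date(e.endAt).toISOString().slice(0, 10)}`
    : `${start} to ongoing`;
}

export function statusForEvent(e: DisruptionEventLike): 'ongoing' | 'resolved' {
  return e.endAt ? 'resolved' : 'ongoing';
}
```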

Panel integration:
- PipelineStatusPanel.loadDetail now fetches getPipelineDetail +
  listEnergyDisruptions({assetId, assetType:'pipeline'}) in parallel.
  Drawer gains "Disruption timeline (N)" section with event type, date
  window, capacity offline, cause chain, and short description per entry.
- StorageFacilityMapPanel gets identical treatment with assetType='storage'.
- Both reset detailEvents on closeDetail and on fresh click (stale-response
  safety).

Tests:
- tests/energy-disruptions-registry.test.mts (17 tests): schema, identity,
  enum coverage, evidence, negative inputs.
- tests/bootstrap.test.mjs — energyDisruptions added to PENDING_CONSUMERS.

Typecheck + typecheck:api clean. 51 tests pass locally (plus full suite
in pre-push).

* feat(energy): Day 14 — country drill-down Atlas exposure section

Extends CountryDeepDivePanel's existing "Energy Profile" card with a
mini Atlas-exposure section that surfaces per-country exposure to the
new registries we shipped in Days 7-13.

For each country:
- Pipelines touching this country (from, to, or transit) — clickable
  rows that dispatch 'energy:open-pipeline-detail' so the PipelineStatusPanel
  drawer opens on the energy variant; no-op on other variants.
- Storage facilities in this country — same loose-coupling pattern
  with 'energy:open-storage-facility-detail'.
- Active fuel shortages in this country — severity breakdown line
  (N confirmed · M watch) plus clickable rows emitting
  'energy:open-fuel-shortage-detail'.

Silent absence: sections render only when the country has matching
assets/events, so countries with no pipeline, storage, or shortage
touchpoints see the existing energy-profile card unchanged.

Lazy stores: reads go through the same shared drain-once stores
(getCachedPipelineRegistries, getCachedStorageFacilityRegistry,
getCachedFuelShortageRegistry) so CountryDeepDivePanel does NOT race
with Atlas panels over the single-drain bootstrap cache. Dynamic
import() keeps the three stores out of the panel's static import graph
so non-energy variants can tree-shake them.
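
A sketch of a lazy read through one of the stores; the '@/shared/...' alias and the filtered fields are assumptions, only the store name comes from this commit:

```ts
async function loadShortageExposure(iso2: string) {
  // Dynamic import keeps the registry store out of the panel's static import
  // graph, so non-energy variants can tree-shake it away.
  const { getCachedFuelShortageRegistry } = await import(
    '@/shared/fuel-shortage-registry-store'
  );
  const shortages: Array<{ country: string; resolvedAt: string | null }> =
    await getCachedFuelShortageRegistry();
  return shortages.filter((s) => s.country === iso2 && !s.resolvedAt);
}
```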

Typecheck clean. No schema changes; purely additive UI read from
already-shipped registries.

* docs(energy): methodology page for energy disruption event log

Fills the /docs/methodology/disruptions URL referenced by
list_energy_disruptions.proto, scripts/_energy-disruption-registry.mjs,
and the panel attribution footers. Explains scope (state transitions
not daily noise), data shape, what counts as a disruption, classifier
evolution path, RPC contract, and ties into the sibling pipeline +
storage + shortage methodology pages.

No code change; pure docs completion for Week 4 launch polish.

* fix(energy): upstreamUnavailable only fires when Redis returned nothing

Two handlers (list-storage-facilities + list-pipelines) conflated "empty
filter result on a healthy registry" with "upstream unavailable". A
caller who queried one facilityType/commodityType and legitimately got
zero matches was told the upstream was down — which may push clients to
error-state rendering or suppress caching instead of showing a valid
empty list.

list-storage-facilities.ts — upstreamUnavailable now only fires when
`raw` is null (Redis miss). Zero filtered rows on a healthy registry
returns upstreamUnavailable: false + empty array. Matches the sibling
list-fuel-shortages handler and the wire contract in
list_storage_facilities.proto.
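
Sketch of the intended gating, with illustrative payload and filter shapes:

```ts
interface RegistryPayload<T> {
  items: T[];
}

function projectWithAvailability<T>(
  raw: RegistryPayload<T> | null,
  matchesFilter: (item: T) => boolean,
) {
  // Redis miss: the upstream registry is genuinely unavailable.
  if (raw === null) {
    return { upstreamUnavailable: true, items: [] as T[] };
  }
  // Healthy registry with zero matches: a valid empty result, not an outage.
  return {
    upstreamUnavailable: false,
    items: raw.items.filter(matchesFilter),
  };
}
```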

list-pipelines.ts — same bug, subtler shape. Now checks "requested at
least one side AND received nothing" rather than "zero rows after
collection". A filter that legitimately matches no gas/oil pipelines on
a healthy registry now returns upstreamUnavailable: false.

list-energy-disruptions.ts and list-fuel-shortages.ts already had the
correct shape (only flag unavailable when raw is missing) — left as-is.

Typecheck + typecheck:api clean. No tests added: the existing registry
schema tests cover the projection/filter helpers, and the handler-level
gating change is documented in code comments for future audits.

* fix(energy): three Greptile findings on PR #3294

Two P1 filter bugs (resolved shortages rendered as active) and one P2
contract inconsistency on the disruptions handler.

P1: DeckGLMap createEnergyShortagePinsLayer rendered every shortage in
the registry as an active crisis pin — including entries where the
classifier has written resolvedAt to mark the crisis over. Added a
filter so only entries with a null/empty resolvedAt become map pins.
Curated v1 data has resolvedAt=null everywhere, so there is no visible
change today; but once the classifier starts writing resolutions
post-launch, resolved shortages would have appeared as ongoing without
this filter.

P1: CountryDeepDivePanel renderAtlasExposure had the same bug in the
country drill-down — "N confirmed · M watch" counts included resolved
entries, inflating the active-crisis line per country. Same one-line
filter fix.

P2: list-energy-disruptions.ts gated upstreamUnavailable on
`!raw?.events` — a partial write (top-level object present but `events`
property missing) fired the "upstream down" flag, inconsistent with
the sibling handlers (list-pipelines, list-storage-facilities,
list-fuel-shortages) that only fire on `!raw`. Rewrote to match:
`!raw` → upstreamUnavailable, empty events → normal empty list. This
also aligns with the contract documented on the upstream-unavailable-
vs-empty-filter skill extracted from the earlier P2 review.

Typecheck + typecheck:api clean. All three fixes are one-liner filter
or gate changes; no test additions needed (registry tests still pass
with v1 data since resolvedAt is null throughout).
2026-04-23 07:34:07 +04:00
Elie Habib
c489aa6dab fix(pro-marketing): /pro reflects entitlement — swap upgrade CTAs to dashboard link for Pro users (#3301)
* fix(pro-marketing): /pro reflects entitlement — swap upgrade CTAs for dashboard link when user is already Pro

Reported: a Pro subscriber visiting /pro was pitched "UPGRADE TO PRO"
in the nav + "CHOOSE YOUR PLAN" in the hero, even though the dashboard
correctly recognized them as Pro. The avatar bubble was visible (per
PR-3250) but the upgrade CTAs were unconditional — none of the 14 prior
rollout PRs added an entitlement check to the /pro bundle.

Root cause: pro-test (/pro) is a separate React bundle with no Convex
client and no entitlement awareness. PR-3250 made the nav auth-aware
(signed-in vs anonymous) but not entitlement-aware (pro vs free). A
paying Dodo subscriber whose Clerk `publicMetadata.plan` isn't written
(our webhook pipeline doesn't set it — documented in panel-gating.ts)
still sees the upgrade pitch.

## Changes

### api/me/entitlement.ts (new)

Tiny edge endpoint returning `{ isPro: boolean }` via the existing
`isCallerPremium` helper, which does the canonical two-signal check
(Clerk pro role OR Convex Dodo entitlement tier >= 1). `Cache-Control:
private, no-store` — entitlement flips when Dodo webhooks fire, and
/pro reads it on every load.

### pro-test/src/App.tsx — useProEntitlement hook + conditional CTAs

- New `useProEntitlement(signedIn)` hook. When signed in, fetches
  `/api/me/entitlement` with the Clerk bearer token. Falls back to
  `isPro: false` on any error — /pro stays in upgrade-pitch mode
  rather than silently hiding the purchase path on a flaky network.
- Navbar: "UPGRADE TO PRO" → "GO TO DASHBOARD" (→ worldmonitor.app)
  when `isLoaded && user && isChecked && isPro`. Free/anonymous users
  see the original upgrade CTA unchanged.
- Hero: "CHOOSE YOUR PLAN" → "GO TO DASHBOARD" under the same condition.
  Also removes the #pricing anchor jump which is actively misleading
  for a paying customer.
- Deliberately delays the swap until the entitlement check resolves —
  a one-frame flash of "Upgrade" for a free signed-in user is better
  than a flash of "Go to Dashboard" for an unpaid visitor.
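
A sketch of the hook's behavior; the shipped hook takes only `signedIn`, the token getter is injected here for illustration:

```ts
import { useEffect, useState } from 'react';

export function useProEntitlement(
  signedIn: boolean,
  getToken: () => Promise<string | null>, // assumed injection point for the Clerk token
) {
  const [isChecked, setIsChecked] = useState(false);
  const [isPro, setIsPro] = useState(false);

  useEffect(() => {
    let cancelled = false;
    if (!signedIn) {
      setIsChecked(true);
      setIsPro(false);
      return;
    }
    (async () => {
      try {
        const token = await getToken();
        const res = await fetch('/api/me/entitlement', {
          headers: token ? { Authorization: `Bearer ${token}` } : {},
        });
        const body = res.ok ? await res.json() : { isPro: false };
        if (!cancelled) setIsPro(Boolean(body.isPro));
      } catch {
        // Any failure leaves /pro in upgrade-pitch mode (safe default).
        if (!cancelled) setIsPro(false);
      } finally {
        if (!cancelled) setIsChecked(true);
      }
    })();
    return () => {
      cancelled = true;
    };
  }, [signedIn, getToken]);

  return { isChecked, isPro };
}
```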

### Locale: en.json adds `nav.goToDashboard` + `hero.goToDashboard`

Other locales fall back to English via i18next's `fallbackLng` — no
translation files need updating for this change to work everywhere.

Bundle rebuilt on Node 22 to match CI.

## Post-Deploy Monitoring & Validation
- Test: sign in on /pro as an existing Pro user → nav shows
  "GO TO DASHBOARD", hero CTA shows "GO TO DASHBOARD".
- Test: sign in on /pro as a free user → original "UPGRADE TO PRO" /
  "CHOOSE YOUR PLAN" CTAs remain unchanged.
- Test: anonymous visitor → identical to pre-change behavior.
- Failure signal: any user report of "went to /pro as Pro user, still
  saw upgrade" within 48h → rollback trigger. Check Sentry for
  `surface: pro-marketing` + action `load-clerk-for-nav` or similar.

* fix(pro-marketing): address PR review — sebuf exception + Sentry on entitlement check

- api/api-route-exceptions.json: register /api/me/entitlement.ts as an
  internal-helper exception. It's a thin wrapper over the canonical
  isCallerPremium helper; the authoritative gates remain in panel-gating,
  isCallerPremium, and gateway.ts PREMIUM_RPC_PATHS. This endpoint exists
  only so the separate pro-test bundle (no Convex client) can ask the
  same question without reimplementing the two-signal check. Unblocks
  the sebuf API contract lint.
- Greptile P2: capture entitlement-check failures to Sentry to match
  the useClerkUser catch-block pattern. Tag surface=pro-marketing,
  action=check-entitlement.

* fix(pro-marketing): address PR review — retry-on-null-token + share entitlement state via context

Addresses reviewer P1 + P2 on PR #3301:

P1 — useProEntitlement treated a first null token as a final "not Pro"
result. Clerk can expose `user` before the session-token endpoint is
ready (same reason services/checkout.ts:getAuthToken retries once after
2s). Without retry, a real Pro user hitting /pro on a cold Clerk load
got a permanent isPro=false for the whole session, so the upgrade CTAs
stayed visible even after Clerk finished warming up. Fix: mirror the
checkout.ts retry pattern — try, sleep 2s, try again.

P2 — Navbar and Hero each called useProEntitlement(!!user), producing
two independent /api/me/entitlement fetches AND two independent state
machines that could disagree on transient failure (one 200, one 500 →
nav and hero showing different CTAs). Fix: hoist the effect into a
ProEntitlementProvider at the App root; Navbar and Hero now both read
from the same Context. One fetch per page load, one source of truth.
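
Sketch of the retry, with illustrative names (the delay mirrors the checkout.ts description):

```ts
async function getTokenWithRetry(
  getToken: () => Promise<string | null>,
): Promise<string | null> {
  const first = await getToken();
  if (first) return first;
  // Clerk can expose `user` before the session-token endpoint is warm:
  // wait 2s and try once more before concluding there is no token.
  await new Promise((resolve) => setTimeout(resolve, 2000));
  return getToken();
}
```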

No behavior change for anonymous users or for successful Pro checks.

* fix(api/me/entitlement): distinguish auth failure from free-tier

Reviewer P2: returning 200 { isPro: false } for both "free user" and
"bearer missing/invalid" collapses the two states, making a /pro auth
regression read like normal free-tier traffic in edge logs / monitoring.

Fix: validate the bearer with validateBearerToken BEFORE delegating to
isCallerPremium. On missing/malformed/invalid bearer return 401
{ error: "unauthenticated" }; on valid bearer return 200 { isPro } as
before. /pro's client already treats any non-200 as isPro:false (safe
default), so no behavior change for callers — only observability
improves.
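
Sketch of the 401-vs-200 split; helper signatures and framework wiring are assumptions, only the status/body contract comes from this commit:

```ts
declare function validateBearerToken(token: string): Promise<boolean>; // assumed signature
declare function isCallerPremium(req: Request): Promise<boolean>;      // assumed signature

export default async function handler(req: Request): Promise<Response> {
  const bearer = req.headers.get('authorization')?.replace(/^Bearer\s+/i, '') ?? '';
  const valid = bearer ? await validateBearerToken(bearer) : false;
  if (!valid) {
    // Missing or invalid bearer is an auth failure, not "free tier".
    return Response.json({ error: 'unauthenticated' }, { status: 401 });
  }
  const isPro = await isCallerPremium(req);
  return Response.json(
    { isPro },
    { status: 200, headers: { 'Cache-Control': 'private, no-store' } },
  );
}
```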

P1 (reviewer claim): PR-3298's wm_checkout=success bridge is not wired
end-to-end. NOT reproducible — src/services/checkout-return.ts lines
35-36, 52, and 100 already recognize the marker and return
{ kind: 'success' }, which src/app/panel-layout.ts:190 consumes via
`returnResult.kind === 'success'` to trigger showCheckoutSuccess. No
code change needed; the wiring landed in PR-3274 before PR-3298.
2026-04-22 23:39:32 +04:00
Elie Habib
52659ce192 feat(resilience): PR 1 — energy construct repair (flag-gated) (#3289)
* docs(resilience): PR 1 foundation — Option B framing + v2 energy construct spec

First commit in PR 1 of the resilience repair plan. Zero scoring-behaviour
change; sets up the construct contract that the code changes will implement.

Declares the framing decision required by plan section 3.2 before any
scorer code lands: Option B (power-system security) is adopted. Electricity
grids are the dominant short-horizon shock-transmission channel, and the
choice lets the v2 energy indicator set share one denominator (percent of
electricity generation) instead of mixing primary-energy and power-system
measures in a composite.

Methodology doc changes:
  - Energy Domain section now documents both the legacy indicator set
    (still the default) and the v2 indicator set (flag-gated), under a
    single #### Energy H4 heading so the methodology-doc linter still
    asserts dimension-id parity with the registry.
  - v2 indicators: importedFossilDependence (EG.ELC.FOSL.ZS x
    max(EG.IMP.CONS.ZS, 0)), lowCarbonGenerationShare (EG.ELC.NUCL.ZS +
    EG.ELC.RNEW.ZS), powerLossesPct (EG.ELC.LOSS.ZS), reserveMarginPct
    (IEA), euGasStorageStress (renamed + scoped to EU), energyPriceStress
    (retained at 0.15 weight).
  - Retired under v2: electricityConsumption, gasShare, coalShare,
    dependency (all into importedFossilDependence), renewShare.
  - electricityAccess moves from energy to infrastructure under v2.
  - Added a v2.1 changelog section documenting the flag-gated rollout,
    acceptance gates (per plan section 6), and snapshot filenames for
    the post-flag-flip captures.
  - Known-limitations items 1-3 updated to note PR 1 lands the v2
    construct behind RESILIENCE_ENERGY_V2_ENABLED (default off).

Methodology-doc linter + mdx-lint + typecheck all clean. Indicator
registry, seeders, and scorer rewrite land in subsequent commits on
this same branch.

* feat(resilience): PR 1 — RESILIENCE_ENERGY_V2_ENABLED flag + scoreEnergy v2 + registry entries

Second commit in PR 1 of the resilience repair plan. Lands the flag,
the v2 scorer code path, and the registry entries the methodology
doc referenced. Default is flag off; published rankings are unchanged
until the flag flips in a later commit (after seeders land and the
acceptance-gate rerun produces a fresh post-flip snapshot).

Changes:

  - _shared.ts: isEnergyV2Enabled() function reader on the canonical
    RESILIENCE_ENERGY_V2_ENABLED env var. Dynamic read (like
    isPillarCombineEnabled) so tests can flip per-case.

  - _dimension-scorers.ts:
    - New Redis key constants for the three v2 seed keys plus the
      reserved reserveMargin key (seeder deferred per plan §3.1
      open-question).
    - EU_GAS_STORAGE_COUNTRIES set (EU + EFTA + UK) for the renamed
      euGasStorageStress signal per plan §3.5 point 2.
    - isEnergyV2EnabledLocal() — private duplicate of the flag reader
      to avoid a circular import (_shared.ts already imports from
      this module). Same env-var contract.
    - scoreEnergy split into scoreEnergyLegacy() + scoreEnergyV2().
      Public scoreEnergy() branches on the flag. Legacy path is
      byte-identical to the pre-commit behaviour.
    - scoreEnergyV2() reads four new bulk payloads, composes
      importedFossilDependence = fossilElectricityShare × max(netImports, 0)/100
      per plan §3.2, collapses net exporters to 0, and gates
      euGasStorageStress on EU membership so non-EU countries
      re-normalise rather than getting penalised for a regional
      signal.

  - _indicator-registry.ts: four new entries under `dimension: 'energy'`
    with `tier: 'experimental'` — importedFossilDependence (0.35),
    lowCarbonGenerationShare (0.20), powerLossesPct (0.10),
    reserveMarginPct (0.10). Experimental tier keeps them out of the
    Core coverage gate until seed coverage is confirmed.

  - compare-resilience-current-vs-proposed.mjs: new
    'bulk-v1-country-value' shape family in the extraction dispatcher.
    EXTRACTION_RULES now covers the four v2 registry indicators so
    the per-indicator influence harness tracks them from day one.
    When the seeders are absent, pairedSampleSize = 0 and Pearson = 0
    — the harness output surfaces the "no influence yet" state rather
    than silently dropping the indicators.

  - tests/resilience-energy-v2.test.mts: 11 new tests pinning:
    - flag-off = legacy behaviour preserved (v2 seed keys have no
      effect when flag is off — catches accidental cross-path reads)
    - flag-on = v2 composite behaves correctly:
      - lower fossilElectricityShare raises score
      - net exporter with 90% fossil > net importer with 90% fossil
        (max(·, 0) collapse verified)
      - higher lowCarbonGenerationShare raises score (nuclear credit)
      - higher powerLossesPct lowers score
      - euGasStorageStress is invariant for non-EU, responds for DE
      - all v2 inputs absent = graceful degradation, coverage < 1.0
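
A minimal sketch of the plan §3.2 composite as described above (inputs illustrative, not the scorer implementation):

```ts
function importedFossilDependence(
  fossilElectricityShare: number, // EG.ELC.FOSL.ZS, percent of generation
  netEnergyImportsPct: number,    // EG.IMP.CONS.ZS, negative for net exporters
): number {
  // Net exporters collapse to 0: exporting energy is not import dependence.
  return (fossilElectricityShare * Math.max(netEnergyImportsPct, 0)) / 100;
}

// e.g. 90% fossil share as a net importer scores worse than the same share
// as a net exporter, which collapses to 0.
```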

106 resilience tests pass (existing + 11 new). Typecheck clean. Biome
clean. No production behaviour change with flag off (default).

Next commits on this branch: three World Bank seeders for the v2 keys,
health.js + SEED_META registration (gated ON_DEMAND_KEYS until Railway
cron provisions), acceptance-gate rerun at flag-flip time.

* feat(resilience): PR 1 — three WB seeders + health registration for v2 energy construct

Third commit in PR 1. Lands the seed scripts for the three v2 energy
indicator source keys, registered in api/health.js with ON_DEMAND_KEYS
gating until Railway cron provisions.

New seeders (weekly cron cadence, 8d maxStaleMin = 2x interval):
  - scripts/seed-low-carbon-generation.mjs
    Pulls EG.ELC.NUCL.ZS + EG.ELC.RNEW.ZS from World Bank, sums per
    country into `resilience:low-carbon-generation:v1`. Partial
    coverage (one series missing) still emits a value using the
    observed half — the scorer's 0-80 saturating goalpost tolerates
    it and the underlying construct is "firm low-carbon share".

  - scripts/seed-fossil-electricity-share.mjs
    Pulls EG.ELC.FOSL.ZS into `resilience:fossil-electricity-share:v1`.
    Feeds the importedFossilDependence composite at score time
    (composite = fossilShare × max(netImports, 0) / 100 per plan §3.2).

  - scripts/seed-power-reliability.mjs
    Pulls EG.ELC.LOSS.ZS into `resilience:power-losses:v1`. Direct
    grid-integrity signal replacing the retired electricityConsumption
    wealth proxy.

All three follow the existing seed-recovery-*.mjs template:
  - Shape: { countries: { [ISO2]: { value, year } }, seededAt }
  - runSeed() from _seed-utils.mjs with schemaVersion=1, ttl=35d
  - validateFn floor of 150 countries (WB coverage is 150-180 for
    the three indicators; below 150 = transient fetch failure)
  - ISO3 → ISO2 mapping via scripts/shared/iso3-to-iso2.json
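
Sketch of the seeded payload contract described above (types only):

```ts
interface SeededCountryValue {
  value: number;
  year: number;
}

interface ResilienceSeedPayload {
  countries: Record<string, SeededCountryValue>; // keyed by ISO2
  seededAt: string;                              // ISO timestamp
}

// validateFn in the same spirit as the template: coverage floor of 150 countries.
function hasMinimumCoverage(payload: ResilienceSeedPayload): boolean {
  return Object.keys(payload.countries).length >= 150;
}
```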

No reserveMargin seeder is shipped in this commit per plan §3.1 open
question: IEA electricity-balance coverage is sparse outside OECD+G20,
and the indicator will likely ship as 'unmonitored' with weight 0.05
if it lands at all. The Redis key (`resilience:reserve-margin:v1`) is
reserved in _dimension-scorers.ts so the v2 scorer shape is stable.

api/health.js:
  - SEED_DOMAINS: add `lowCarbonGeneration`, `fossilElectricityShare`,
    `powerLosses` → their Redis keys.
  - SEED_META: same three, pointing at `seed-meta:resilience:*` meta
    keys with maxStaleMin=11520 (8d, per the worldmonitor
    health-maxstalemin-write-cadence pattern: 2x weekly cron).
  - ON_DEMAND_KEYS: three new entries gated as TRANSITIONAL until
    Railway cron provisions and the first clean run completes. Remove
    from this set after ~7 days of green production runs.

Typecheck clean; existing 106 resilience tests pass (seeders have no
in-repo callers yet, so nothing depends on them executing). Real-API
integration tests land when Railway cron is provisioned.

Next commit: Railway cron configuration + bundle-runner wiring.

* feat(resilience): PR 1 — bundle-runner + acceptance-gate verdict + flag-flip runbook

Final commit in the PR 1 tranche. Lands the three remaining pieces so
the flag-flip is fully operable once Railway cron provisions.

  - scripts/seed-bundle-resilience-energy-v2.mjs
    Railway cron bundle wrapping the three v2 energy seeders
    (low-carbon-generation, fossil-electricity-share, power-losses).
    Weekly cadence (7-day intervalMs); the underlying data is annual
    at source so polling more frequently just hammers the World Bank
    API. 5-minute per-script timeout. Mirrors the existing
    seed-bundle-resilience-recovery.mjs pattern.

  - scripts/compare-resilience-current-vs-proposed.mjs: acceptanceGates
    block. Programmatic evaluation of plan §6 gates using the inputs
    the harness already computes:
      gate-1-spearman              Spearman vs baseline >= 0.85
      gate-2-country-drift         Max country drift vs baseline <= 15
      gate-6-cohort-median         Cohort median shift vs baseline <= 10
      gate-7-matched-pair          Every pair holds expected direction
      gate-9-effective-influence   >= 80% Core indicators measurable
      gate-universe-integrity      No cohort/pair endpoint missing from scorable
    Thresholds are encoded in a const so they can't silently soften.
    Output verdict is PASS / CONDITIONAL / BLOCK. Emitted in
    summary.acceptanceVerdict for at-a-glance PR comment pasting, with
    full per-gate detail in acceptanceGates.results.

  - docs/methodology/energy-v2-flag-flip-runbook.md
    Operator runbook for the flag flip. Pre-flip checklist (seeders
    green, health endpoint green, ON_DEMAND_KEYS graduation, Spearman
    verification), flip procedure (pre-flip snapshot, dry-run, cache
    prefix bump, Vercel env flip, post-flip snapshot, methodology
    doc reclassification), rollback procedure, and a reference table
    for the three possible verdict states.

PR 1 is now code-complete pending:
  1. Railway cron provisioning (ops, not code)
  2. Flag flip + acceptance-gate rerun (follows runbook, not code)
  3. Reserve-margin seeder (deferred per plan §3.1 open-question)

Zero scoring-behaviour change in this commit. 121 resilience tests
pass, typecheck clean.

* fix(resilience): PR 1 — drop unseeded reserveMargin from scorer + fix composite extractor

Addresses two P1 review findings on PR #3289.

Finding 1: scoreEnergyV2 read resilience:reserve-margin:v1 at weight
0.10 but no seeder ships in this PR (indicator deferred per plan
§3.1 open-question). On flag flip that slot would be permanently
null, silently renormalizing the remaining 90% of weight and
producing a construct different from what the methodology doc
describes. Fix: remove reserve-margin from the v2 reader +
blend entirely. Redistribute its 0.10 weight to powerLossesPct
(now 0.20); both are grid-integrity signals per plan §3.1, and
the original plan split electricityConsumption's 0.30 weight
across powerLossesPct + reserveMarginPct + importedFossilDependence
— without reserveMarginPct, powerLossesPct carries the shared
grid-integrity load until the IEA seeder ships.

  v2 weights now: 0.35 + 0.20 + 0.20 + 0.10 + 0.15 = 1.00
  (importedFossilDependence + lowCarbonGenerationShare +
   powerLossesPct + euGasStorageStress + energyPriceStress)

  Reserve-margin Redis key constant stays reserved so the v2
  scorer shape is stable when a future commit lands the seeder;
  split 0.10 back out of powerLossesPct at that point.

Methodology doc, _shared.ts flag comment, and v2 test suite all
updated to the 5-indicator shape. New regression test asserts
that changing reserve-margin Redis content has zero effect on
the v2 score — guards against a future commit accidentally
wiring the reader back in without its seeder.

Finding 2: scripts/compare-resilience-current-vs-proposed.mjs
measured importedFossilDependence by reading fossilElectricityShare
alone. The scorer defines it as fossilShare × max(netImports, 0)
/ 100, so the extractor zeroed out net exporters and
under-reported net importers — making gate-9 effective-influence
wrong for the centrepiece construct change of PR 1.

Fix: new 'imported-fossil-dependence-composite' extractor type
in applyExtractionRule that recomputes the same composite from
both inputs (fossilShare bulk payload + staticRecord.iea.
energyImportDependency.value). Stays in lockstep with the
scorer — drift between the two would break gate-9's
interpretation.

New unit tests pin:
  - net importer: 80% × max(60, 0) / 100 = 48 ✓
  - net exporter: 80% × max(-40, 0) / 100 = 0 ✓
  - missing either input → null

64 resilience tests pass; typecheck clean. Flag-off path is
still byte-identical to pre-PR behaviour.

* docs(resilience): PR 1 — align methodology doc with actual shipped indicators and seeders

Addresses P1 review on docs/methodology/country-resilience-index.mdx
lines 29 and 574-575. The doc still described reserveMarginPct as a
shipped v2 indicator and listed seed-net-energy-imports.mjs in the
new-seeders list, neither of which the branch actually ships.

Doc changes to match the code in this branch:

  Known-limitations item 1: restated to describe the actual v2
  replacement footprint — powerLossesPct at 0.20 (temporarily
  absorbing reserveMarginPct's 0.10) plus accessToElectricityPct
  moved to infrastructure. reserveMarginPct is named as a deferred
  companion with the split-out instructions for when its seeder
  lands.

  v2.1 changelog (Indicators added): split into "live in PR 1" and
  "deferred in PR 1" so the reader can distinguish which entries
  match real code. importedFossilDependence's composite formula
  now written out and the net-imports source attributed to the
  existing resilience:static.iea path (not a new seeder).

  v2.1 changelog (New seeders): lists the three actual files that
  ship in this branch (seed-low-carbon-generation, seed-fossil-
  electricity-share, seed-power-reliability) and explicitly notes
  seed-net-energy-imports.mjs is NOT a new seeder — the
  EG.IMP.CONS.ZS series is already fetched by seed-resilience-
  static.mjs. Adds the bundle-runner reference.

Methodology-doc linter + mdx-lint both pass (125/125). Typecheck
clean. Doc is now the source of truth for what PR 1 actually ships.

* fix(resilience): PR 1 — sync powerLossesPct registry weight with scorer (0.10 → 0.20)

Reviewer-caught mismatch between INDICATOR_REGISTRY and scoreEnergyV2.
The previous commit redistributed the deferred reserveMarginPct's 0.10
weight into powerLossesPct in the SCORER but left the REGISTRY entry
unchanged at 0.10. Two downstream effects:

  1. scripts/compare-resilience-current-vs-proposed.mjs copies
     `spec.weight` into `nominalWeight` for gate-9 reporting, so
     powerLossesPct's nominal influence would be under-reported by
     half in every post-flip acceptance run — exactly the harness PR 1
     relies on for merge evidence.
  2. Methodology doc vs registry vs scorer drift is the pattern the
     methodology-doc linter is supposed to catch; it passes here
     because the linter only checks dimension-id parity, not weights.
     Registry is now the only remaining source of truth to keep in
     lockstep with the scorer.

Change:
  - `_indicator-registry.ts` powerLossesPct.weight: 0.1 → 0.2
  - Inline comment names the deferral and instructs: "when the IEA
    electricity-balance seeder lands, split 0.10 back out and restore
    reserveMarginPct at 0.10. Keep this field in lockstep with
    scoreEnergyV2 ... because the PR 0 compare harness copies
    spec.weight into nominalWeight for gate-9 reporting."

Experimental weights per dimension invariant still holds (0.35 + 0.20
+ 0.20 = 0.75 for energy, well under the 1.0 ceiling). 64 resilience
tests pass, typecheck clean.
2026-04-22 17:10:38 +04:00
Sebastien Melki
58e42aadf9 chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)
* chore(api): enforce sebuf contract via exceptions manifest (#3207)

Adds api/api-route-exceptions.json as the single source of truth for
non-proto /api/ endpoints, with scripts/enforce-sebuf-api-contract.mjs
gating every PR via npm run lint:api-contract. Fixes the root-only blind
spot in the prior allowlist (tests/edge-functions.test.mjs), which only
scanned top-level *.js files and missed nested paths and .ts endpoints —
the gap that let api/supply-chain/v1/country-products.ts and friends
drift under proto domain URL prefixes unchallenged.

Checks both directions: every api/<domain>/v<N>/[rpc].ts must pair with
a generated service_server.ts (so a deleted proto fails CI), and every
generated service must have an HTTP gateway (no orphaned generated code).

Manifest entries require category + reason + owner, with removal_issue
mandatory for temporary categories (deferred, migration-pending) and
forbidden for permanent ones. .github/CODEOWNERS pins the manifest to
@SebastienMelki so new exceptions don't slip through review.

The manifest only shrinks: migration-pending entries (19 today) will be
removed as subsequent commits in this PR land each migration.

* refactor(maritime): migrate /api/ais-snapshot → maritime/v1.GetVesselSnapshot (#3207)

The proto VesselSnapshot was carrying density + disruptions but the frontend
also needed sequence, relay status, and candidate_reports to drive the
position-callback system. Those only lived on the raw relay passthrough, so
the client had to keep hitting /api/ais-snapshot whenever callbacks were
registered and fall back to the proto RPC only when the relay URL was gone.

This commit pushes all three missing fields through the proto contract and
collapses the dual-fetch-path into one proto client call.

Proto changes (proto/worldmonitor/maritime/v1/):
  - VesselSnapshot gains sequence, status, candidate_reports.
  - GetVesselSnapshotRequest gains include_candidates (query: include_candidates).

Handler (server/worldmonitor/maritime/v1/get-vessel-snapshot.ts):
  - Forwards include_candidates to ?candidates=... on the relay.
  - Separate 5-min in-memory caches for the candidates=on and candidates=off
    variants; they have very different payload sizes and should not share a slot.
  - Per-request in-flight dedup preserved per-variant.
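
Sketch of the per-variant slots (shape illustrative, not the handler):

```ts
interface SnapshotSlot<T> {
  snapshot: T | null;
  fetchedAt: number;            // epoch ms
  inflight: Promise<T> | null;  // per-slot in-flight dedup
}

const TTL_MS = 5 * 60 * 1000;

const slots = {
  'candidates-on': { snapshot: null, fetchedAt: 0, inflight: null } as SnapshotSlot<unknown>,
  'candidates-off': { snapshot: null, fetchedAt: 0, inflight: null } as SnapshotSlot<unknown>,
};

function slotFor(includeCandidates: boolean): SnapshotSlot<unknown> {
  // candidates-on payloads are far larger, so the variants never share a slot.
  return slots[includeCandidates ? 'candidates-on' : 'candidates-off'];
}

function isFresh(slot: SnapshotSlot<unknown>, now = Date.now()): boolean {
  return slot.snapshot !== null && now - slot.fetchedAt < TTL_MS;
}
```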

Frontend (src/services/maritime/index.ts):
  - fetchSnapshotPayload now calls MaritimeServiceClient.getVesselSnapshot
    directly with includeCandidates threaded through. The raw-relay path,
    SNAPSHOT_PROXY_URL, DIRECT_RAILWAY_SNAPSHOT_URL and LOCAL_SNAPSHOT_FALLBACK
    are gone — production already routed via Vercel, the "direct" branch only
    ever fired on localhost, and the proto gateway covers both.
  - New toLegacyCandidateReport helper mirrors toDensityZone/toDisruptionEvent.

api/ais-snapshot.js deleted; manifest entry removed. Codegen scope was
limited to worldmonitor.maritime.v1 (buf generate --path) — regenerating
the full tree drops // @ts-nocheck from every client/server file and
surfaces pre-existing type errors across 30+ unrelated services, which
is not in scope for this PR.

Shape-diff vs legacy payload:
  - disruptions / density: proto carries the same fields, just with the
    GeoCoordinates wrapper and enum strings (remapped client-side via
    existing toDisruptionEvent / toDensityZone helpers).
  - sequence, status.{connected,vessels,messages}: now populated from the
    proto response — was hardcoded to 0/false in the prior proto fallback.
  - candidateReports: same shape; optional numeric fields come through as
    0 instead of undefined, which the legacy consumer already handled.

* refactor(sanctions): migrate /api/sanctions-entity-search → LookupSanctionEntity (#3207)

The proto docstring already claimed "OFAC + OpenSanctions" coverage but the
handler only fuzzy-matched a local OFAC Redis index — narrower than the
legacy /api/sanctions-entity-search, which proxied OpenSanctions live (the
source advertised in docs/api-proxies.mdx). Deleting the legacy without
expanding the handler would have been a silent coverage regression for
external consumers.

Handler changes (server/worldmonitor/sanctions/v1/lookup-entity.ts):
  - Primary path: live search against api.opensanctions.org/search/default
    with an 8s timeout and the same User-Agent the legacy edge fn used.
  - Fallback path: the existing OFAC local fuzzy match, kept intact for when
    OpenSanctions is unreachable / rate-limiting.
  - Response source field flips between 'opensanctions' (happy path) and
    'ofac' (fallback) so clients can tell which index answered.
  - Query validation tightened: rejects q > 200 chars (matches legacy cap).

Rate limiting:
  - Added /api/sanctions/v1/lookup-entity to ENDPOINT_RATE_POLICIES at 30/min
    per IP — matches the legacy createIpRateLimiter budget. The gateway
    already enforces per-endpoint policies via checkEndpointRateLimit.

Docs:
  - docs/api-proxies.mdx — dropped the /api/sanctions-entity-search row
    (plus the orphaned /api/ais-snapshot row left over from the previous
    commit in this PR).
  - docs/panels/sanctions-pressure.mdx — points at the new RPC URL and
    describes the OpenSanctions-primary / OFAC-fallback semantics.

api/sanctions-entity-search.js deleted; manifest entry removed.

* refactor(military): migrate /api/military-flights → ListMilitaryFlights (#3207)

Legacy /api/military-flights read a pre-baked Redis blob written by the
seed-military-flights cron and returned flights in a flat app-friendly
shape (lat/lon, lowercase enums, lastSeenMs). The proto RPC takes a bbox,
fetches OpenSky live, classifies server-side, and returns nested
GeoCoordinates + MILITARY_*_TYPE_* enum strings + lastSeenAt — same data,
different contract.

fetchFromRedis in src/services/military-flights.ts was doing nothing
sebuf-aware. Renamed it to fetchViaProto and rewrote to:

  - Instantiate MilitaryServiceClient against getRpcBaseUrl().
  - Iterate MILITARY_QUERY_REGIONS (PACIFIC + WESTERN) in parallel — same
    regions the desktop OpenSky path and the seed cron already use, so
    dashboard coverage tracks the analytic pipeline.
  - Dedup by hexCode across regions.
  - Map proto → app shape via new mapProtoFlight helper plus three reverse
    enum maps (AIRCRAFT_TYPE_REVERSE, OPERATOR_REVERSE, CONFIDENCE_REVERSE).

The seed cron (scripts/seed-military-flights.mjs) stays put: it feeds
regional-snapshot mobility, cross-source signals, correlation, and the
health freshness check (api/health.js: 'military:flights:v1'). None of
those read the legacy HTTP endpoint; they read the Redis key directly.
The proto handler uses its own per-bbox cache keys under the same prefix,
so dashboard traffic no longer races the seed cron's blob — the two paths
diverge by a small refresh lag, which is acceptable.

Docs: dropped the /api/military-flights row from docs/api-proxies.mdx.

api/military-flights.js deleted; manifest entry removed.

Shape-diff vs legacy:
  - f.location.{latitude,longitude} → f.lat, f.lon
  - f.aircraftType: MILITARY_AIRCRAFT_TYPE_TANKER → 'tanker' via reverse map
  - f.operator: MILITARY_OPERATOR_USAF → 'usaf' via reverse map
  - f.confidence: MILITARY_CONFIDENCE_LOW → 'low' via reverse map
  - f.lastSeenAt (number) → f.lastSeen (Date)
  - f.enrichment → f.enriched (with field renames)
  - Extra fields registration / aircraftModel / origin / destination /
    firstSeenAt now flow through where proto populates them.

* fix(supply-chain): thread includeCandidates through chokepoint status (#3207)

Caught by tsconfig.api.json typecheck in the pre-push hook (not covered
by the plain tsc --noEmit run that ran before I pushed the ais-snapshot
commit). The chokepoint status handler calls getVesselSnapshot internally
with a static no-auth request — now required to include the new
includeCandidates bool from the proto extension.

Passing false: server-internal callers don't need per-vessel reports.

* test(maritime): update getVesselSnapshot cache assertions (#3207)

The ais-snapshot migration replaced the single cachedSnapshot/cacheTimestamp
pair with a per-variant cache so candidates-on and candidates-off payloads
don't evict each other. Pre-push hook surfaced that tests/server-handlers
still asserted the old variable names. Rewriting the assertions to match
the new shape while preserving the invariants they actually guard:

  - Freshness check against slot TTL.
  - Cache read before relay call.
  - Per-slot in-flight dedup.
  - Stale-serve on relay failure (result ?? slot.snapshot).

* chore(proto): restore // @ts-nocheck on regenerated maritime files (#3207)

I ran 'buf generate --path worldmonitor/maritime/v1' to scope the proto
regen to the one service I was changing (to avoid the toolchain drift
that drops @ts-nocheck from 60+ unrelated files — separate issue). But
the repo convention is the 'make generate' target, which runs buf and
then sed-prepends '// @ts-nocheck' to every generated .ts file. My
scoped command skipped the sed step. The proto-check CI enforces the
sed output, so the two maritime files need the directive restored.

* refactor(enrichment): decomm /api/enrichment/{company,signals} legacy edge fns (#3207)

Both endpoints were already ported to IntelligenceService:
  - getCompanyEnrichment  (/api/intelligence/v1/get-company-enrichment)
  - listCompanySignals    (/api/intelligence/v1/list-company-signals)

No frontend callers of the legacy /api/enrichment/* paths exist. Removes:
  - api/enrichment/company.js, signals.js, _domain.js
  - api-route-exceptions.json migration-pending entries (58 remain)
  - docs/api-proxies.mdx rows for /api/enrichment/{company,signals}
  - docs/architecture.mdx reference updated to the IntelligenceService RPCs

Verified: typecheck, typecheck:api, lint:api-contract (89 files / 58 entries),
lint:boundaries, tests/edge-functions.test.mjs (136 pass),
tests/enrichment-caching.test.mjs (14 pass — still guards the intelligence/v1
handlers), make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(leads): migrate /api/{contact,register-interest} → LeadsService (#3207)

New leads/v1 sebuf service with two POST RPCs:
  - SubmitContact    → /api/leads/v1/submit-contact
  - RegisterInterest → /api/leads/v1/register-interest

Handler logic ported 1:1 from api/contact.js + api/register-interest.js:
  - Turnstile verification (desktop sources bypass, preserved)
  - Honeypot (website field) silently accepts without upstream calls
  - Free-email-domain gate on SubmitContact (422 ApiError)
  - validateEmail (disposable/offensive/typo-TLD/MX) on RegisterInterest
  - Convex writes via ConvexHttpClient (contactMessages:submit, registerInterest:register)
  - Resend notification + confirmation emails (HTML templates unchanged)

Shared helpers moved to server/_shared/:
  - turnstile.ts (getClientIp + verifyTurnstile)
  - email-validation.ts (disposable/offensive/MX checks)

Rate limits preserved via ENDPOINT_RATE_POLICIES:
  - submit-contact:    3/hour per IP (was in-memory 3/hr)
  - register-interest: 5/hour per IP (was in-memory 5/hr; desktop
    sources previously capped at 2/hr via shared in-memory map —
    now 5/hr like everyone else, accepting the small regression in
    exchange for Upstash-backed global limiting)

Callers updated:
  - pro-test/src/App.tsx contact form → new submit-contact path
  - src-tauri/sidecar/local-api-server.mjs cloud-fallback rewrites
    /api/register-interest → /api/leads/v1/register-interest when
    proxying; keeps local path for older desktop builds
  - src/services/runtime.ts isKeyFreeApiTarget allows both old and
    new paths through the WORLDMONITOR_API_KEY-optional gate

Tests:
  - tests/contact-handler.test.mjs rewritten to call submitContact
    handler directly; asserts on ValidationError / ApiError
  - tests/email-validation.test.mjs + tests/turnstile.test.mjs
    point at the new server/_shared/ modules

Deleted: api/contact.js, api/register-interest.js, api/_ip-rate-limit.js,
api/_turnstile.js, api/_email-validation.js, api/_turnstile.test.mjs.
Manifest entries removed (58 → 56). Docs updated (api-platform,
api-commerce, usage-rate-limits).

Verified: npm run typecheck + typecheck:api + lint:api-contract
(88 files / 56 entries) + lint:boundaries pass; full test:data
(5852 tests) passes; make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(pro-test): rebuild bundle for leads/v1 contact form (#3207)

Updates the enterprise contact form to POST to /api/leads/v1/submit-contact
(old path /api/contact removed in the previous commit).

Bundle is rebuilt from pro-test/src/App.tsx source change in 9ccd309d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address HIGH review findings 1-3 (#3207)

Three review findings from @koala73 on the sebuf-migration PR, all
silent bugs that would have shipped to prod:

### 1. Sanctions rate-limit policy was dead code

ENDPOINT_RATE_POLICIES keyed the 30/min budget under
/api/sanctions/v1/lookup-entity, but the generated route (from the
proto RPC LookupSanctionEntity) is /api/sanctions/v1/lookup-sanction-entity.
hasEndpointRatePolicy / getEndpointRatelimit are exact-string pathname
lookups, so the mismatch meant the endpoint fell through to the
generic 600/min global limiter instead of the advertised 30/min.

Net effect: the live OpenSanctions proxy endpoint (unauthenticated,
external upstream) had 20x the intended rate budget. Fixed by renaming
the policy key to match the generated route.

### 2. Lost stale-seed fallback on military-flights

Legacy api/military-flights.js cascaded military:flights:v1 →
military:flights:stale:v1 before returning empty. The new proto
handler went straight to live OpenSky/relay and returned null on miss.

Relay or OpenSky hiccup used to serve stale seeded data (24h TTL);
under the new handler it showed an empty map. Both keys are still
written by scripts/seed-military-flights.mjs on every run — fix just
reads the stale key when the live fetch returns null, converts the
seed's app-shape flights (flat lat/lon, lowercase enums, lastSeenMs)
to the proto shape (nested GeoCoordinates, enum strings, lastSeenAt),
and filters to the request bbox.

Read via getRawJson (unprefixed) to match the seed cron's writes,
which bypass the env-prefix system.

### 3. Hex-code casing mismatch broke getFlightByHex

The seed cron writes hexCode: icao24.toUpperCase() (uppercase);
src/services/military-flights.ts:getFlightByHex uppercases the lookup
input: f.hexCode === hexCode.toUpperCase(). The new proto handler
preserved OpenSky's lowercase icao24, and mapProtoFlight is a
pass-through. getFlightByHex was silently returning undefined for
every call after the migration.

Fix: uppercase in the proto handler (live + stale paths), and document
the invariant in a comment on MilitaryFlight.hex_code in
military_flight.proto so future handlers don't re-break it.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries) / lint:boundaries clean
- tests/edge-functions.test.mjs 130 pass
- make generate zero-diff (openapi spec regenerated for proto comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): restore desktop 2/hr rate cap on register-interest (#3207)

Addresses HIGH review finding #4 from @koala73. The legacy
api/register-interest.js applied a nested 2/hr per-IP cap when
`source === 'desktop-settings'`, on top of the generic 5/hr endpoint
budget. The sebuf migration lost this — desktop-source requests now
enjoy the full 5/hr cap.

Since `source` is an unsigned client-supplied field, anyone sending
`source: 'desktop-settings'` skips Turnstile AND gets 5/hr. Without
the tighter cap the Turnstile bypass is cheaper to abuse.

Added `checkScopedRateLimit` to `server/_shared/rate-limit.ts` — a
reusable second-stage Upstash limiter keyed on an opaque scope string
+ caller identifier. Fail-open on Redis errors to match existing
checkRateLimit / checkEndpointRateLimit semantics. Handlers that need
per-subscope caps on top of the gateway-level endpoint budget use this
helper.

In register-interest: when `isDesktopSource`, call checkScopedRateLimit
with scope `/api/leads/v1/register-interest#desktop`, limit=2, window=1h,
IP as identifier. On exceeded → throw ApiError(429).
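
Sketch of the call; the helper exists per this commit, but the exact signatures here are assumptions:

```ts
declare function checkScopedRateLimit(opts: {
  scope: string;
  identifier: string;
  limit: number;
  windowSeconds: number;
}): Promise<{ allowed: boolean }>; // assumed signature

declare class ApiError extends Error {
  constructor(status: number, message?: string); // assumed signature
}

async function enforceDesktopCap(ip: string): Promise<void> {
  const { allowed } = await checkScopedRateLimit({
    scope: '/api/leads/v1/register-interest#desktop',
    identifier: ip,
    limit: 2,
    windowSeconds: 3600,
  });
  if (!allowed) throw new ApiError(429);
}
```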

### What this does not fix

This caps the blast radius of the Turnstile bypass but does not close
it — an attacker sending `source: 'desktop-settings'` still skips
Turnstile (just at 2/hr instead of 5/hr). The proper fix is a signed
desktop-secret header that authenticates the bypass; filed as
follow-up #3252. That requires coordinated Tauri build + Vercel env
changes out of scope for #3207.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries)
- tests/edge-functions.test.mjs + contact-handler.test.mjs (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): MEDIUM + LOW + rate-limit-policy CI check (#3207)

Closes out the remaining @koala73 review findings from #3242 that
didn't already land in the HIGH-fix commits, plus the requested CI
check that would have caught HIGH #1 (dead-code policy key) at
review time.

### MEDIUM #5 — Turnstile missing-secret policy default

Flip `verifyTurnstile`'s default `missingSecretPolicy` from `'allow'`
to `'allow-in-development'`. Dev with no secret = pass (expected
local); prod with no secret = reject + log. submit-contact was
already explicitly overriding to `'allow-in-development'`;
register-interest was silently getting `'allow'`. Safe default now
means a future missing-secret misconfiguration in prod gets caught
instead of silently letting bots through. Removed the now-redundant
override in submit-contact.

### MEDIUM #6 — Silent enum fallbacks in maritime client

`toDisruptionEvent` mapped `AIS_DISRUPTION_TYPE_UNSPECIFIED` / unknown
enum values → `gap_spike` / `low` silently. Refactored to return null
when either enum is unknown; caller filters nulls out of the array.
Handler doesn't produce UNSPECIFIED today, but the `gap_spike`
default would have mislabeled the first new enum value the proto
ever adds — dropping unknowns is safer than shipping wrong labels.
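
Sketch of the null-on-unknown mapping; the enum constant names below are assumptions (only UNSPECIFIED and the app values appear in this PR text):

```ts
const DISRUPTION_TYPE_REVERSE: Record<string, string> = {
  AIS_DISRUPTION_TYPE_GAP_SPIKE: 'gap_spike', // assumed constant name
};
const DISRUPTION_SEVERITY_REVERSE: Record<string, string> = {
  AIS_DISRUPTION_SEVERITY_LOW: 'low',         // assumed constant name
};

function toDisruptionEvent(proto: { type: string; severity: string }) {
  const type = DISRUPTION_TYPE_REVERSE[proto.type];
  const severity = DISRUPTION_SEVERITY_REVERSE[proto.severity];
  // UNSPECIFIED or any future enum value maps to null; the caller filters
  // nulls out of the array instead of mislabeling the event.
  if (!type || !severity) return null;
  return { type, severity };
}
```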

### LOW — Copy drift in register-interest email

Email template hardcoded `435+ Sources`; PR #3241 bumped marketing to
`500+`. Bumped in the rewritten file to stay consistent.

The `as any` on Convex mutation names carried over from legacy; it is
filed as follow-up #3253.

### Rate-limit-policy coverage lint

`scripts/enforce-rate-limit-policies.mjs` validates every key in
`ENDPOINT_RATE_POLICIES` resolves to a proto-generated gateway route
by cross-referencing `docs/api/*.openapi.yaml`. Fails with the
sanctions-entity-search incident referenced in the error message so
future drift has a paper trail.

Wired into package.json (`lint:rate-limit-policies`) and the pre-push
hook alongside `lint:boundaries`. Smoke-tested both directions —
clean repo passes (5 policies / 175 routes), seeded drift (the exact
HIGH #1 typo) fails with the advertised remedy text.

### Verified
- `lint:rate-limit-policies` ✓
- `typecheck` + `typecheck:api` ✓
- `lint:api-contract` ✓ (56 entries)
- `lint:boundaries` ✓
- edge-functions + contact-handler tests (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 5): decomm /api/eia/* + migrate /api/satellites → IntelligenceService (#3207)

Both targets turned out to be decomm-not-migration cases. The original
plan called for two new services (economic/v1.GetEiaSeries +
natural/v1.ListSatellitePositions) but research found neither was
needed:

### /api/eia/[[...path]].js — pure decomm, zero consumers

The "catch-all" is a misnomer — only two paths actually worked,
/api/eia/health and /api/eia/petroleum, both Redis-only readers.
Zero frontend callers in src/. Zero server-side readers. Nothing
consumes the `energy:eia-petroleum:v1` key that seed-eia-petroleum.mjs
writes daily.

The EIA data the frontend actually uses goes through existing typed
RPCs in economic/v1: GetEnergyPrices, GetCrudeInventories,
GetNatGasStorage, GetEnergyCapacity. None of those touch /api/eia/*.

Building GetEiaSeries would have been dead code. Deleted the legacy
file + its test (tests/api-eia-petroleum.test.mjs — it only covered
the legacy endpoint, no behavior to preserve). Empty api/eia/ dir
removed.

**Note for review:** the Redis seed cron keeps running daily and
nothing consumes it. If that stays unused, seed-eia-petroleum.mjs
should be retired too (separate PR). Out of scope for sebuf-migration.

### /api/satellites.js — Learning #2 strikes again

IntelligenceService.ListSatellites already exists at
/api/intelligence/v1/list-satellites, reads the same Redis key
(intelligence:satellites:tle:v1), and supports an optional country
filter the legacy didn't have.

One frontend caller in src/services/satellites.ts needed to switch
from `fetch(toApiUrl('/api/satellites'))` to the typed
IntelligenceServiceClient.listSatellites. Shape diff was tiny —
legacy `noradId` became proto `id` (handler line 36 already picks
either), everything else identical. alt/velocity/inclination in the
proto are ignored by the caller since it propagates positions
client-side via satellite.js.

Kept the client-side cache + failure cooldown + 20s timeout (still
valid concerns at the caller level).

### Manifest + docs
- api-route-exceptions.json: 56 → 54 entries (both removed)
- docs/api-proxies.mdx: dropped the two rows from the Raw-data
  passthroughs table

### Verified
- typecheck + typecheck:api ✓
- lint:api-contract (54 entries) / lint:boundaries / lint:rate-limit-policies ✓
- tests/edge-functions.test.mjs 127 pass (down from 130 — 3 tests were
  for the deleted eia endpoint)
- make generate zero-diff (no proto changes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 6): migrate /api/supply-chain/v1/{country-products,multi-sector-cost-shock} → SupplyChainService (#3207)

Both endpoints were hand-rolled TS handlers sitting under a proto URL prefix —
the exact drift the manifest guardrail flagged. Promoted both to typed RPCs:

- GetCountryProducts → /api/supply-chain/v1/get-country-products
- GetMultiSectorCostShock → /api/supply-chain/v1/get-multi-sector-cost-shock

Handlers preserve the existing semantics: PRO-gate via isCallerPremium(ctx.request),
iso2 / chokepointId validation, raw bilateral-hs4 Redis read (skip env-prefix to
match seeder writes), CHOKEPOINT_STATUS_KEY for war-risk tier, and the math from
_multi-sector-shock.ts unchanged. Empty-data and non-PRO paths return the typed
empty payload (no 403 — the sebuf gateway pattern is empty-payload-on-deny).

Client wrapper switches from premiumFetch to client.getCountryProducts/
client.getMultiSectorCostShock. Legacy MultiSectorShock / MultiSectorShockResponse /
CountryProductsResponse names remain as type aliases of the generated proto types
so CountryBriefPanel + CountryDeepDivePanel callsites compile with zero churn.

Manifest 54 → 52. Rate-limit gateway routes 175 → 177.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): add cache-tier entries for new supply-chain RPCs (#3207)

Pre-push tests/route-cache-tier.test.mjs caught the missing entries.
Both PRO-gated, request-varying — match the existing supply-chain PRO cohort
(get-country-cost-shock, get-bypass-options, etc.) at slow-browser tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 7): migrate /api/scenario/v1/{run,status,templates} → ScenarioService (#3207)

Promote the three literal-filename scenario endpoints to a typed sebuf
service with three RPCs:

  POST /api/scenario/v1/run-scenario        (RunScenario)
  GET  /api/scenario/v1/get-scenario-status (GetScenarioStatus)
  GET  /api/scenario/v1/list-scenario-templates (ListScenarioTemplates)

Preserves all security invariants from the legacy handlers:
- 405 for wrong method (sebuf service-config method gate)
- scenarioId validation against SCENARIO_TEMPLATES registry
- iso2 regex ^[A-Z]{2}$
- JOB_ID_RE path-traversal guard on status
- Per-IP 10/min rate limit (moved to gateway ENDPOINT_RATE_POLICIES)
- Queue-depth backpressure (>100 → 429)
- PRO gating via isCallerPremium
- AbortSignal.timeout on every Redis pipeline (runRedisPipeline helper)

Wire-level diffs vs legacy:
- Per-user RL now enforced at the gateway (same 10/min/IP budget).
- Rate-limit response omits Retry-After header; retryAfter is in the
  body per error-mapper.ts convention.
- ListScenarioTemplates emits affectedHs2: [] when the registry entry
  is null (all-sectors sentinel); proto repeated cannot carry null.
- RunScenario returns { jobId, status } (no statusUrl field — unused
  by SupplyChainPanel, drop from wire).

Gateway wiring:
- server/gateway.ts RPC_CACHE_TIER: list-scenario-templates → 'daily'
  (matches legacy max-age=3600); get-scenario-status → 'slow-browser'
  (premium short-circuit target, explicit entry required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: swap old run/status for the new
  run-scenario/get-scenario-status paths.
- api/scenario/v1/{run,status,templates}.ts deleted; 3 manifest
  exceptions removed (63 → 52 → 49 migration-pending).

Client:
- src/services/scenario/index.ts — typed client wrapper using
  premiumFetch (injects Clerk bearer / API key).
- src/components/SupplyChainPanel.ts — polling loop swapped from
  premiumFetch strings to runScenario/getScenarioStatus. Hard 20s
  timeout on run preserved via AbortSignal.any.

Tests:
- tests/scenario-handler.test.mjs — 18 new handler-level tests
  covering every security invariant + the worker envelope coercion.
- tests/edge-functions.test.mjs — scenario sections removed,
  replaced with a breadcrumb pointer to the new test file.

Docs: api-scenarios.mdx, scenario-engine.mdx, usage-rate-limits.mdx,
usage-errors.mdx, supply-chain.mdx refreshed with new paths.

Verified: typecheck, typecheck:api, lint:api-contract (49 entries),
lint:rate-limit-policies (6/180), lint:boundaries, route-cache-tier
(parity), full edge-functions (117) + scenario-handler (18).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 8): migrate /api/v2/shipping/{route-intelligence,webhooks} → ShippingV2Service (#3207)

Partner-facing endpoints promoted to a typed sebuf service. Wire shape
preserved byte-for-byte (camelCase field names, ISO-8601 fetchedAt, the
same subscriberId/secret formats, the same SET + SADD + EXPIRE 30-day
Redis pipeline). Partner URLs /api/v2/shipping/* are unchanged.

RPCs landed:
- GET  /route-intelligence  → RouteIntelligence  (PRO, slow-browser)
- POST /webhooks            → RegisterWebhook    (PRO)
- GET  /webhooks            → ListWebhooks       (PRO, slow-browser)

The existing path-parameter URLs remain on the legacy edge-function
layout because sebuf's HTTP annotations don't currently model path
params (grep proto/**/*.proto for `path: "{…}"` returns zero). Those
endpoints are split into two Vercel dynamic-route files under
api/v2/shipping/webhooks/, behaviorally identical to the previous
hybrid file but cleanly separated:
- GET  /webhooks/{subscriberId}                → [subscriberId].ts
- POST /webhooks/{subscriberId}/rotate-secret  → [subscriberId]/[action].ts
- POST /webhooks/{subscriberId}/reactivate     → [subscriberId]/[action].ts

Both get manifest entries under `migration-pending` pointing at #3207.

Other changes
- scripts/enforce-sebuf-api-contract.mjs: extended GATEWAY_RE to accept
  api/v{N}/{domain}/[rpc].ts (version-first) alongside the canonical
  api/{domain}/v{N}/[rpc].ts (sketched after this list); shipping/v2 is
  the first use of the reversed ordering because that's the partner
  contract.
- vite.config.ts: dev-server sebuf interceptor regex extended to match
  both layouts; shipping/v2 import + allRoutes entry added.
- server/gateway.ts: RPC_CACHE_TIER entries for /api/v2/shipping/
  route-intelligence + /webhooks (slow-browser; premium-gated endpoints
  short-circuit to slow-browser but the entries are required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: route-intelligence + webhooks added.
- tests/shipping-v2-handler.test.mjs: 18 handler-level tests covering
  PRO gate, iso2/cargoType/hs2 coercion, SSRF guards (http://, RFC1918,
  cloud metadata, IMDS), chokepoint whitelist, alertThreshold range,
  secret/subscriberId format, pipeline shape + 30-day TTL, cross-tenant
  owner isolation, `secret` omission from list response.
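
The dual-layout acceptance is, in sketch form (the real GATEWAY_RE in
scripts/enforce-sebuf-api-contract.mjs may differ):

  const GATEWAY_RE =
    /^api\/(?:[a-z0-9-]+\/v\d+|v\d+\/[a-z0-9-]+)\/\[rpc\]\.ts$/;

  GATEWAY_RE.test('api/supply-chain/v1/[rpc].ts'); // canonical      -> true
  GATEWAY_RE.test('api/v2/shipping/[rpc].ts');     // version-first  -> true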

Manifest delta
- Removed: api/v2/shipping/route-intelligence.ts, api/v2/shipping/webhooks.ts
- Added:   api/v2/shipping/webhooks/[subscriberId].ts (migration-pending)
- Added:   api/v2/shipping/webhooks/[subscriberId]/[action].ts (migration-pending)
- Added:   api/internal/brief-why-matters.ts (internal-helper) — regression
  surface from the #3248 main merge, which introduced the file without a
  manifest entry. Filed here to keep the lint green; not strictly in scope
  for commit 8 but unblocking.

Net result: 49 → 47 `migration-pending` entries (one net-removal even
though webhook path-params stay pending, because two files collapsed
into two dynamic routes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1): SupplyChainServiceClient must use premiumFetch (#3207)

Signed-in browser pro users were silently hitting 401 on 8 supply-chain
premium endpoints (country-products, multi-sector-cost-shock,
country-chokepoint-index, bypass-options, country-cost-shock,
sector-dependency, route-explorer-lane, route-impact). The shared
client was constructed with globalThis.fetch, so no Clerk bearer or
X-WorldMonitor-Key was injected. The gateway's validateApiKey runs
with forceKey=true for PREMIUM_RPC_PATHS and 401s before isCallerPremium
is consulted. The generated client's try/catch collapses the 401 into
an empty-fallback return, leaving panels blank with no visible error.

Fix is one line at the client constructor: swap globalThis.fetch for
premiumFetch. The same pattern is already in use for insider-transactions,
stock-analysis, stock-backtest, scenario, trade (premiumClient) — this
was an omission on this client, not a new pattern.
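
Sketch of the change (the constructor-option name is an assumption based on
the generated sebuf client):

  declare class SupplyChainServiceClient {
    constructor(opts: { fetch: typeof fetch });
  }
  declare const premiumFetch: typeof fetch;

  // before: new SupplyChainServiceClient({ fetch: globalThis.fetch })
  // after: Clerk bearer / X-WorldMonitor-Key injected when credentials exist
  const client = new SupplyChainServiceClient({ fetch: premiumFetch });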

premiumFetch no-ops safely when no credentials are available, so the
5 non-premium methods on this client (shippingRates, chokepointStatus,
chokepointHistory, criticalMinerals, shippingStress) continue to work
unchanged.

This also fixes panels that were already latently broken on main
(chokepoint-index, bypass-options, etc. — breakage predating #3207, not
regressions from it). Commit 6 expanded the surface by routing two more
methods through the same buggy client; this commit fixes the whole class.

From koala73 review (#3242 second-pass, HIGH new #1):
> Exact class PR #3233 fixed for RegionalIntelligenceBoard /
> DeductionPanel / trade / country-intel. Supply-chain was not in
> #3233's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2): restore 400 on input-shape errors for 2 supply-chain handlers (#3207)

Commit 6 collapsed all non-happy paths into empty-200 on
`get-country-products` and `get-multi-sector-cost-shock`, including
caller-bug cases that legacy returned 400 for:

- get-country-products: malformed iso2 → empty 200 (was 400)
- get-multi-sector-cost-shock: malformed iso2 / missing chokepointId /
  unknown chokepointId → empty 200 (was 400)

The commit message for 6 called out the 403-for-non-pro → empty-200
shift ("sebuf gateway pattern is empty-payload-on-deny") but not the
400 shift. They're different classes:

- Empty-payload-200 for PRO-deny: intentional contract change, already
  documented and applied across the service. Generated clients treat
  "you lack PRO" as "no data" — fine.
- Empty-payload-200 for malformed input: caller bug silently masked.
  External API consumers can't distinguish "bad wiring" from "genuinely
  no data", test harnesses lose the signal, bad calling code doesn't
  surface in Sentry.

Fix: `throw new ValidationError(violations)` on the 3 input-shape
branches. The generated sebuf server maps ValidationError → HTTP 400
(see src/generated/server/.../service_server.ts and leads/v1 which
already uses this pattern).

PRO-gate deny stays as empty-200 — that contract shift was intentional
and is preserved.
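
For the input-shape branches, the restored pattern is roughly as follows
(the violation shape is an assumption; the generated sebuf server maps
ValidationError to HTTP 400):

  declare class ValidationError extends Error {
    constructor(violations: Array<{ field: string; message: string }>);
  }

  function assertIso2(iso2: unknown): asserts iso2 is string {
    if (typeof iso2 !== 'string' || !/^[A-Z]{2}$/.test(iso2)) {
      // generated sebuf server maps ValidationError -> HTTP 400
      throw new ValidationError([{ field: 'iso2', message: 'must match ^[A-Z]{2}$' }]);
    }
  }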

Regression tests added at tests/supply-chain-validation.test.mjs (8
cases) pinning the three-way contract:
- bad input                         → 400 (ValidationError)
- PRO-gate deny on valid input      → 200 empty
- valid PRO input, no data in Redis → 200 empty (unchanged)

From koala73 review (#3242 second-pass, HIGH new #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 3): restore statusUrl on RunScenarioResponse + document 202→200 wire break (#3207)

Commit 7 silently shifted /api/scenario/v1/run-scenario's response
contract in two ways that the commit message covered only partially:

1. HTTP 202 Accepted → HTTP 200 OK
2. Dropped `statusUrl` string from the response body

The `statusUrl` drop was mentioned as "unused by SupplyChainPanel" but
not framed as a contract change. The 202 → 200 shift was not mentioned
at all. This is a same-version (v1 → v1) migration, so external callers
that key off either signal — `response.status === 202` or
`response.body.statusUrl` — silently branch incorrectly.

Evaluated options:
  (a) sebuf per-RPC status-code config — not available. sebuf's
      HttpConfig only models `path` and `method`; no status annotation.
  (b) Bump to scenario/v2 — judged heavier than the break itself for
      a single status-code shift. No in-repo caller uses 202 or
      statusUrl; the docs-level impact is containable.
  (c) Accept the break, document explicitly, partially restore.

Took option (c):

- Restored `statusUrl` in the proto (new field `string status_url = 3`
  on RunScenarioResponse). Server computes
  `/api/scenario/v1/get-scenario-status?jobId=<encoded job_id>` and
  populates it on every successful enqueue. External callers that
  followed this URL keep working unchanged.
- 202 → 200 is not recoverable inside the sebuf generator, so it is
  called out explicitly in two places:
    - docs/api-scenarios.mdx now includes a prominent `<Warning>` block
      documenting the v1→v1 contract shift + the suggested migration
      (branch on response body shape, not HTTP status).
    - RunScenarioResponse proto comment explains why 200 is the new
      success status on enqueue.
  OpenAPI bundle regenerated to reflect the restored statusUrl field.

- Regression test added in tests/scenario-handler.test.mjs pinning
  `statusUrl` to the exact URL-encoded shape — locks the invariant so
  a future proto rename or handler refactor can't silently drop it
  again.
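
In sketch form, the restored field and its URL-encoded shape (handler wiring
assumed; the status value here is illustrative):

  function runScenarioResponse(jobId: string, status: string) {
    // populated on every successful enqueue
    const statusUrl =
      `/api/scenario/v1/get-scenario-status?jobId=${encodeURIComponent(jobId)}`;
    return { jobId, status, statusUrl };
  }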

From koala73 review (#3242 second-pass, HIGH new #3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1/2): close webhook tenant-isolation gap on shipping/v2 (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

server/worldmonitor/shipping/v2/{register-webhook,list-webhooks}.ts
migrated without reinstating validateApiKey(req, { forceKey: true }),
diverging from both the sibling api/v2/shipping/webhooks/[subscriberId]
routes and the documented "X-WorldMonitor-Key required" contract in
docs/api-shipping-v2.mdx.

Attack surface: the gateway accepts Clerk bearer auth as a pro signal.
A Clerk-authenticated pro user with no X-WorldMonitor-Key reaches the
handler, callerFingerprint() falls back to 'anon', and every such
caller collapses into a shared webhook:owner:anon:v1 bucket. The
defense-in-depth ownerTag !== ownerHash check in list-webhooks.ts
doesn't catch it because both sides equal 'anon' — every Clerk-session
holder could enumerate / overwrite every other Clerk-session pro
tenant's registered webhook URLs.

Fix: reinstate validateApiKey(ctx.request, { forceKey: true }) at the
top of each handler, throwing ApiError(401) when absent. Matches the
sibling routes exactly and the published partner contract.
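
Sketch of the reinstated gate (the ApiError constructor shape is an
assumption; the validateApiKey call matches the sibling routes):

  declare function validateApiKey(
    req: Request,
    opts: { forceKey: boolean },
  ): Promise<{ valid: boolean } | null>;
  declare class ApiError extends Error {
    constructor(status: number, message: string);
  }

  async function requirePartnerKey(ctx: { request: Request }) {
    const keyCheck = await validateApiKey(ctx.request, { forceKey: true });
    if (!keyCheck?.valid) {
      // no X-WorldMonitor-Key -> never reach callerFingerprint()'s 'anon' fallback
      throw new ApiError(401, 'X-WorldMonitor-Key required');
    }
    return keyCheck;
  }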

Tests:
- tests/shipping-v2-handler.test.mjs: two existing "non-PRO → 403"
  tests for register/list were using makeCtx() with no key, which now
  fails at the 401 layer first. Renamed to "no API key → 401
  (tenant-isolation gate)" with a comment explaining the failure mode
  being tested. 18/18 pass.

Verified: typecheck:api, lint:api-contract (no change), lint:boundaries,
lint:rate-limit-policies, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2/2): restore v1 path aliases on scenario + supply-chain (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

Commits 6 + 7 of #3207 renamed five documented v1 URLs to the sebuf
method-derived paths and deleted the legacy edge-function files:

  POST /api/scenario/v1/run                       → run-scenario
  GET  /api/scenario/v1/status                    → get-scenario-status
  GET  /api/scenario/v1/templates                 → list-scenario-templates
  GET  /api/supply-chain/v1/country-products      → get-country-products
  GET  /api/supply-chain/v1/multi-sector-cost-shock → get-multi-sector-cost-shock

server/router.ts is an exact static-match table (Map keyed on `METHOD
PATH`), so any external caller — docs, partner scripts, grep-the-
internet — hitting the old documented URL would 404 on first request
after merge. Commit 8 (shipping/v2) preserved partner URLs byte-for-
byte; the scenario + supply-chain renames missed that discipline.

Fix: add five thin alias edge functions that rewrite the pathname to
the canonical sebuf path and delegate to the domain [rpc].ts gateway
via a new server/alias-rewrite.ts helper. Premium gating, rate limits,
entitlement checks, and cache-tier lookups all fire on the canonical
path — aliases are pure URL rewrites, not a duplicate handler pipeline.

  api/scenario/v1/{run,status,templates}.ts
  api/supply-chain/v1/{country-products,multi-sector-cost-shock}.ts
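
Each alias is a thin edge function along these lines (the aliasRewrite name
and signature are assumptions about the new server/alias-rewrite.ts helper):

  // api/scenario/v1/run.ts (sketch): pure URL rewrite; gating, rate limits
  // and cache-tier lookups all fire on the canonical path inside the gateway
  import { aliasRewrite } from '../../../server/alias-rewrite';
  import rpcGateway from './[rpc]';

  export const config = { runtime: 'edge' };

  export default (req: Request, ctx: unknown) =>
    aliasRewrite(req, ctx, '/api/scenario/v1/run-scenario', rpcGateway);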

Vite dev parity: file-based routing at api/ is a Vercel concern, so the
dev middleware (vite.config.ts) gets a matching V1_ALIASES rewrite map
before the router dispatch.

Manifest: 5 new entries under `deferred` with removal_issue=#3282
(tracking their retirement at the next v1→v2 break). lint:api-contract
stays green (89 files checked, 55 manifest entries validated).

Docs:
- docs/api-scenarios.mdx: migration callout at the top with the full
  old→new URL table and a link to the retirement issue.
- CHANGELOG.md + docs/changelog.mdx: Changed entry documenting the
  rename + alias compat + the 202→200 shift (from commit 23c821a1).

Verified: typecheck:api, lint:api-contract, lint:rate-limit-policies,
lint:boundaries, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:55:59 +03:00
Elie Habib
425507d15a fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding (#3281)
* fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding

Shadow-diff of 15 v2 pairs (2026-04-22) showed the analyst pattern-
matching the loudest context numbers — VIX 19.50, top forecast
probability, MidEast FX stress 77 — into every story regardless of
editorial fit. A Rwanda humanitarian story about refugees cited VIX;
an aviation story cited a forecast probability.

Root cause: every story got the same 6-bundle context block, so the
LLM had markets / forecasts / macro in-hand and the "cite a specific
fact" instruction did the rest.

Two-layer fix:

  1. STRUCTURAL — sectionsForCategory() maps the story's category to
     an editorially-relevant subset of bundles. Humanitarian stories
     don't see marketData / forecasts / macroSignals; diplomacy gets
     riskScores only; market/energy gets markets+forecasts but drops
     riskScores. The model physically cannot cite what it wasn't
     given. Unknown categories fall back to all six (backcompat); see
     the sketch after this list.

  2. PROMPT — WHY_MATTERS_ANALYST_SYSTEM_V2 adds a RELEVANCE RULE
     that explicitly permits grounding in headline/description
     actors when no context fact is a clean fit, and bans dragging
     off-topic market metrics into humanitarian/aviation/diplomacy
     stories. The prompt footer (inline, per-call) restates the
     same guardrail — models follow inline instructions more
     reliably than system-prompt constraints on longer outputs.
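
In sketch form, the structural layer from (1) (policy-table contents here
are illustrative, not the shipped regexes or bundle set):

  const DEFAULT_SECTIONS = [
    'worldBrief', 'countryBrief', 'riskScores',
    'marketData', 'forecasts', 'macroSignals',
  ];

  const CATEGORY_SECTION_POLICY = [
    { re: /humanitarian|refugee/i, label: 'humanitarian',
      sections: ['worldBrief', 'countryBrief', 'riskScores'] },
    { re: /market|energy/i, label: 'market',
      sections: ['worldBrief', 'countryBrief', 'marketData', 'forecasts'] },
  ];

  export function sectionsForCategory(category: string) {
    const hit = CATEGORY_SECTION_POLICY.find((p) => p.re.test(category));
    // unknown categories keep all six bundles (backcompat)
    return hit
      ? { label: hit.label, sections: hit.sections }
      : { label: 'default', sections: DEFAULT_SECTIONS };
  }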

Cache keys bumped to invalidate the formulaic v5 output: endpoint
v5 to v6, shadow v3 to v4. Adds 11 unit tests pinning the 5
policies + default fallback + humanitarian structural guarantee +
market policy does-see-markets + guardrail footer presence.

Observability: endpoint now logs policyLabel per call so operators
can confirm in Vercel logs that humanitarian/aviation stories are
NOT seeing marketData without dumping the full prompt.

* test(brief): address greptile P2 — sync MAX_BODY_BYTES + add parseWhyMattersV2 coverage

Greptile PR #3281 review raised two P2 test-quality issues:

1. Test-side MAX_BODY_BYTES mirror was still 4096 — the endpoint
   was bumped to 8192 in PR #3269 (v2 output + description). With
   the stale constant, a payload in the 4097–8192 range was
   accepted by the real endpoint but looked oversize in the test
   mirror, letting the body-cap invariant silently drift. Fixed
   by syncing to 8192 + bumping the bloated fixture to 10_000
   bytes, so a future endpoint-cap bump doesn't silently defeat
   the oversize assertion.

2. parseWhyMattersV2 (the only output-validation gate on the
   analyst path) had no dedicated unit tests. Adds 11 targeted
   cases covering: valid 2 and 3 sentence output, 100/500 char
   bounds (incl. boundary assertions), all 6 banned preamble
   phrases, section-label leaks (SITUATION/ANALYSIS/Watch),
   markdown leakage (#, -, *, 1.), stub echo rejection, smart/
   plain quote stripping, non-string defensive branch, and
   whitespace-only strings.

Suite size: 50 to 61 tests, all green.

* fix(brief): add aviation policy to sectionsForCategory (PR #3281 review P1)

Reviewer caught that aviation was named in WHY_MATTERS_ANALYST_SYSTEM_V2's
RELEVANCE RULE as a category banned from off-topic market metrics, but
had no matching regex entry in CATEGORY_SECTION_POLICY. So 'Aviation
Incident' / 'Airspace Closure' / 'Plane Crash' / 'Drone Incursion' all
fell through to DEFAULT_SECTIONS and still got all 6 bundles including
marketData, forecasts, and macroSignals — exactly the VIX / forecast
probability pattern the PR claimed to structurally prevent.

Reproduced on HEAD before fix:
  Aviation Incident -> default
  Airspace Closure  -> default
  Plane Crash       -> default
  ...etc.

Fix:
  1. Adds aviation policy (same 3 bundles as humanitarian/diplomacy/
     tech: worldBrief, countryBrief, riskScores).
  2. Adds dedicated aviation-gating test with 6 category variants.
  3. Adds meta-invariant test: every category named in the system
     prompt's RELEVANCE RULE MUST have a structural policy entry,
     asserting policyLabel !== 'default'. If someone adds a new
     category name to the prompt in the future, this test fires
     until they wire up a regex — prevents soft-guard drift.
  4. Removes 'Aviation Incident' from the default-fall-through test
     list (it now correctly matches aviation).

No cache bump needed — v6 was published to the feature branch only a
few minutes ago, no production entries have been written yet.
2026-04-22 08:21:01 +04:00
Elie Habib
fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops, penalty is
  always at most 1)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path; sketched
  below)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10
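
In sketch form (pillar math elided; the exact truthy parsing of the env var
and the type shapes are assumptions):

  declare function penalizedPillarScore(pillars: Array<{ score: number }>): number;

  export function isPillarCombineEnabled(): boolean {
    // read dynamically so tests can flip state without reloading the module
    return process.env.RESILIENCE_PILLAR_COMBINE_ENABLED === 'true';
  }

  function overallScore(
    sixDomainAggregate: number,
    pillars: Array<{ score: number }>,
    schemaV2Enabled: boolean,
  ): number {
    if (isPillarCombineEnabled() && schemaV2Enabled && pillars.length > 0) {
      // alpha 0.5, pillar weights 0.40 / 0.35 / 0.25
      return penalizedPillarScore(pillars);
    }
    return sixDomainAggregate; // unchanged 6-domain default path
  }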

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00
Elie Habib
ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown (sketched
   after this list).

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).
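
Sketch of the v2 gate from (2) (the ban lists here are illustrative, not the
real phrase set):

  export function parseWhyMattersV2(raw: unknown): string | null {
    if (typeof raw !== 'string') return null;
    // strip wrapping smart/plain quotes
    const text = raw.trim().replace(/^["'\u201C\u2018]+|["'\u201D\u2019]+$/g, '').trim();
    if (text.length < 100 || text.length > 500) return null;
    if (/^(this (story|event) matters because)/i.test(text)) return null;  // preamble
    if (/\b(SITUATION|ANALYSIS|WATCH)\b/.test(text)) return null;          // leaked labels
    if (/(^|\n)\s*(#{1,6}\s|[-*]\s|\d+\.\s)/.test(text)) return null;      // markdown
    return text;
  }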

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
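
Sketch of the widened hash input (field order and the hash helper are
assumptions):

  declare function sha256Hex(input: string): string; // assumed helper

  function hashBriefStory(story: {
    headline: string; source: string; threatLevel: string;
    category: string; country: string; description?: string;
  }): string {
    // description is now the 6th field, so identical core fields with
    // different descriptions land in different cache slots
    const input = [
      story.headline, story.source, story.threatLevel,
      story.category, story.country, story.description ?? '',
    ].join('\u0000');
    return sha256Hex(input).slice(0, 16); // {hash16} in the cache key
  }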

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests); sketched below.

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.
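
Sketch of the registration from (1) (the EdgeContext shape is assumed):

  type EdgeContext = { waitUntil?: (p: Promise<unknown>) => void };

  function recordShadow(write: () => Promise<void>, ctx?: EdgeContext): void {
    // build the promise up front so the write starts immediately either way
    const p = write().catch(() => { /* fire-and-forget: never throw */ });
    // with ctx, Vercel keeps the isolate alive until p settles; without it
    // (local harness / tests), fall back to plain unawaited execution
    ctx?.waitUntil?.(p);
  }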

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
ecd56d4212 feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast (#3236)
* feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast

Four direct-RSS sources verified from a clean IP and absent everywhere
in the repo (src/config/feeds.ts, scripts/seed-*, ais-relay.cjs, RSS
allowlist). Closes the highest-ROI Iran / Israel domestic-press gap
from the ME source audit (PR #3226) with zero infra changes.

- IRNA        https://en.irna.ir/rss
- Mehr News   https://en.mehrnews.com/rss
- Jerusalem Post https://www.jpost.com/rss/rssfeedsheadlines.aspx
- Ynetnews    https://www.ynetnews.com/Integration/StoryRss3089.xml

Propaganda-risk metadata:
- IRNA + Mehr tagged high / Iran state-affiliated (join Press TV).
- JPost + Ynetnews tagged low with knownBiases for transparency.

RSS allowlist updated in all three mirrors (shared/, scripts/shared/,
api/_rss-allowed-domains.js) per the byte-identical mirror contract
enforced by tests/edge-functions.test.mjs.

Deferred (separate PRs):
- Times of Israel: already in allowlist; was removed from feeds for
  cloud-IP 403. Needs Decodo routing.
- IDF Spokesperson: idf.il has no direct RSS endpoint; needs scraper.
- Tasnim / Press TV RSS / Israel Hayom: known cloud-IP blocks.
- WAM / SPA / KUNA / QNA / BNA: public RSS endpoints are dead; sites
  migrated to SPAs or gate with 403.

Plan doc (PR #3226) overstated the gap: it audited only feeds.ts and
missed that travel advisories + US Embassy alerts are already covered
by scripts/seed-security-advisories.mjs. NOTAM claim in that doc is
also wrong: we use ICAO's global NOTAM API, not FAA.

* fix(feeds): enable IRNA, Mehr, Jerusalem Post, Ynetnews by default

Reviewer on #3236 flagged that adding the four new ME feeds to
FULL_FEEDS.middleeast alone leaves them disabled on first run, because
App.ts:661 persists computeDefaultDisabledSources() output derived from
DEFAULT_ENABLED_SOURCES. Users would have to manually re-enable via
Settings > Sources, defeating the purpose of broadening the default
ME mix.

Add the four new sources to DEFAULT_ENABLED_SOURCES.middleeast so they
ship on by default. Placement keeps them adjacent to their peers
(IRNA / Mehr with the other Iran sources, JPost / Ynetnews after
Haaretz). Risk/slant tags already in SOURCE_PROPAGANDA_RISK ensure
downstream digest dedup + summarization weights them correctly.

* style(feeds): move JPost + Ynetnews under Low-risk section header

Greptile on #3236 flagged that both entries are risk: 'low' but were
inserted above the `// Low risk - Independent with editorial standards`
comment header, making the section boundary misleading for future
contributors. Shift them under the header where they belong.

No runtime change; cosmetic ordering only.
2026-04-20 19:07:09 +04:00
Elie Habib
661bbe8f09 fix(health): nationalDebt threshold 7d → 60d — match monthly cron interval (#3237)
* fix(health): nationalDebt threshold 7d → 60d to match monthly cron cadence

User reported health showing:
  "nationalDebt": { status: "STALE_SEED", records: 187, seedAgeMin: 10469, maxStaleMin: 10080 }

Root cause: api/health.js had `maxStaleMin: 10080` (7 days) on a seeder
that runs every 30 days via seed-bundle-macro.mjs:
  { label: 'National-Debt', intervalMs: 30 * DAY, ... }

The threshold was narrower than the cron interval, so STALE_SEED was
guaranteed between days 8 and 30 of every cycle. The original comment
"7 days — monthly seed" even spelled the mismatch out loud.

Data source cadence:
- US Treasury debt_to_penny API: updates daily but we only snapshot latest
- IMF WEO: quarterly/semi-annual release — no value in checking daily
- 30-day cron is appropriate; stale threshold should be ≥ 2× interval

Fix: bump maxStaleMin to 86400 (60 days). Matches the 2× pattern used
by faoFoodPriceIndex + recovery pillar (recoveryFiscalSpace, etc.)
which also run monthly.

Also fixes the same mismatch in scripts/regional-snapshot/freshness.mjs —
the 10080 ceiling there would exclude national-debt from capital_stress
axis scoring 23 days out of every 30 between seeds.

* fix(seed-national-debt): raise CACHE_TTL to 65d so health.js stale window is actually reachable

PR #3237 review was correct: my earlier fix set api/health.js
SEED_META.nationalDebt.maxStaleMin to 60d (86400min), but the seeder's
CACHE_TTL was still 35d. After a missed monthly cron, the canonical key
expired at day 35 — long before the 60d "stale" threshold. Result path:
  hasData=false → api/health.js:545-549 → status = EMPTY (crit)
Not STALE_SEED (warn) as my commit message claimed.

writeFreshnessMetadata() in scripts/_seed-utils.mjs:222 sets meta TTL to
max(7d, ttlSeconds), so bumping ttlSeconds alone propagates to both the
canonical payload AND the meta key.

Fix:
- CACHE_TTL 35d → 65d (5d past the 60d stale window so we get a clean
  STALE_SEED → EMPTY transition without keys vanishing mid-warn).
- runSeed opts.maxStaleMin 10080 (7d) → 86400 (60d) so the in-seeder
  declaration matches api/health.js. Field is only validated for
  presence by runSeed (scripts/_seed-utils.mjs:798), but the drift was
  what hid the TTL invariant in the first place.

Invariant this restores: for any SEED_META entry,
  seeder CACHE_TTL ≥ maxStaleMin + buffer
so the "warn before crit" gradient actually exists.

* fix(freshness): wire national-debt to seed-meta + teach extractTimestamp about seededAt

Reviewer P2 on PR #3237: my earlier freshness.mjs bump to 86400 was a
no-op. classifyInputs() (scripts/regional-snapshot/freshness.mjs:100-108,
122-132) uses the entry's metaKey or extractTimestamp()'s known field
list. national-debt had neither — payload carries only `seededAt`, and
extractTimestamp didn't know that field, so the "present but undated"
branch treated every call as fresh. The age window never mattered.

Two complementary fixes:

1. Add metaKey: 'seed-meta:economic:national-debt' to the freshness
   entry. Primary, authoritative source — seed-meta.fetchedAt is
   written by writeFreshnessMetadata() on every successful run, which is
   also what api/health.js reads, keeping both surfaces consistent.

2. Add `seededAt` to extractTimestamp()'s field list. Defense-in-depth:
   many other runSeed-based scripts (seed-iea-oil-stocks,
   seed-eurostat-country-data, etc.) wrap output as { ..., seededAt: ISO }
   with no metaKey in the freshness registry. Without this, they were
   also silently always-fresh. ISO strings parse via Date.parse.
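
Roughly (the field list contents here are illustrative, not the real set in
freshness.mjs):

  const TIMESTAMP_FIELDS = ['fetchedAt', 'updatedAt', 'timestamp', 'seededAt'];

  function extractTimestamp(payload: Record<string, unknown>): number | null {
    for (const field of TIMESTAMP_FIELDS) {
      const value = payload?.[field];
      const ms = typeof value === 'string' ? Date.parse(value) : NaN;
      if (Number.isFinite(ms)) return ms;
    }
    return null; // present but undated: the age window can't apply
  }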

Note: `economic:eu-gas-storage:v1` uses `seededAt: String(Date.now())` —
a stringified epoch number, which Date.parse does NOT handle. That seed's
freshness classification is still broken by this entry's lack of metaKey,
but it's a separate shape issue out of scope here. Flagged in PR body.
2026-04-20 19:03:47 +04:00
Elie Habib
9e022f23bb fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
  "cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }

despite the 30-min warm-ping loop running. Two bugs stacked:

1. get-cable-health.ts null-upstream path didn't write Redis.
   cachedFetchJson with a returning-null fetcher stores NEG_SENTINEL
   (10 bytes) in cable-health-v1 for 2 min. Handler then returned
   `fallbackCache || { cables: {} }` to the client WITHOUT writing to
   cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
   → strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).

   Fix: on null result, write the fallback response back to CACHE_KEY
   (short TTL matching NEG_SENTINEL so a recovered NGA fetch can
   overwrite immediately) AND refresh seed-meta with the real count.
   Health now sees hasData=true during an outage.

2. Zero-cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
   the valid healthy state — NGA had no active subsea cable warnings.
   The old `Math.max(count, 1)` on recordCount was an intentional lie
   to sidestep this; now honest.

   Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
   pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
   valid, not critical". recordCount now reports actual cables.length.

Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.

Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
2026-04-20 15:21:04 +04:00
Elie Habib
84eec7f09f fix(health): align breadthHistory maxStaleMin with actual Tue-Sat cron schedule (#3219)
Production alarm: `breadthHistory` went STALE_SEED every Monday morning
despite the seeder running correctly. Root cause was a threshold /
schedule mismatch:

- Schedule (Railway): 02:00 UTC, Tuesday through Saturday. Five ticks
  per week, capturing Mon-Fri market close → following-day 02:00 UTC.
- Threshold: maxStaleMin=2880 (48h), assuming daily cadence.
- Max real gap: Sat 02:00 UTC → Tue 02:00 UTC = 72h. The existing 48h
  alarm fired every Monday at ~02:00 UTC when the Sun/Mon cron ticks
  are intentionally absent, until the Tue 02:00 UTC run restored
  fetchedAt.

Fix: bump maxStaleMin to 5760 (96h). 72h covers the weekend gap;
extra 24h tolerates one missed Tue run without alarming. Comment now
records the actual schedule + reasoning.

No seeder change needed — logs confirm the service fires and completes
correctly on its schedule (Apr 16/17/18 02:00 UTC runs all "Done" with
3/3 readings, `Stopping Container` is normal Railway cron teardown).

Diagnostic memo: this is the class of bug where the schedule comment
lies. Original comment said "daily cron at 21:00 ET". True start time
is 22:00 EDT / 21:00 EST (02:00 UTC next day) AND only Mon-Fri,
so "daily" is wrong by two days every week.
2026-04-20 07:56:54 +04:00
Elie Habib
4853645d53 fix(brief): switch carousel to @vercel/og on edge runtime (#3210)
* fix(brief): switch carousel to @vercel/og on edge runtime

Every attempt to ship the Phase 8 Telegram carousel on Vercel's
Node serverless runtime has failed at cold start:

- PR #3174 direct satori + @resvg/resvg-wasm: Vercel edge bundler
  refused the `?url` asset import required by resvg-wasm.
- PR #3174 (fix) direct satori + @resvg/resvg-js native binding:
  Node runtime accepted it, but Vercel's nft tracer does not follow
  @resvg/resvg-js/js-binding.js's conditional
  `require('@resvg/resvg-js-<platform>-<arch>-<libc>')` pattern,
  so the linux-x64-gnu peer package was never bundled. Cold start
  threw MODULE_NOT_FOUND, isolate crashed,
  FUNCTION_INVOCATION_FAILED on every request including OPTIONS,
  and Telegram reported WEBPAGE_CURL_FAILED with no other signal.
- PR #3204 added `vercel.json` `functions.includeFiles` to force
  the binding in, but (a) the initial key was a literal path that
  Vercel micromatch read as a character class (PR #3206 fixed),
  (b) even with the corrected `api/brief/carousel/**` wildcard, the
  function still 500'd across the board. The `functions.includeFiles`
  path appears honored in the deployment manifest but not at runtime
  for this particular native-binding pattern.

Fix: swap the renderer to @vercel/og's ImageResponse, which is
Vercel's first-party wrapper around satori + resvg-wasm with
Vercel-native bundling. Runs on Edge runtime — matches every other
API route in the project. No native binding, no includeFiles, no
nft tracing surprises. Cold start ~300ms, warm ~30ms.

Changes:
- server/_shared/brief-carousel-render.ts: replace renderCarouselPng
  (Uint8Array) with renderCarouselImageResponse (ImageResponse).
  Drop ensureLibs + satori + @resvg/resvg-js dynamic-import dance.
  Keep layout builders (buildCover/buildThreads/buildStory) and
  font loading unchanged — the Satori object trees are
  wire-compatible with ImageResponse.
- api/brief/carousel/[userId]/[issueDate]/[page].ts: flip
  `runtime: 'nodejs'` -> `runtime: 'edge'`. Delegate rendering to
  the renderer's ImageResponse and return it directly; error path
  still 503 no-store so CDN + Telegram don't pin a bad render.
- vercel.json: drop the now-useless `functions.includeFiles` block.
- package.json: drop direct `@resvg/resvg-js` and `satori` deps
  (both now bundled inside @vercel/og).
- tests/deploy-config.test.mjs: replace the native-binding
  regression guards with an assertion that no `functions` block
  exists (with a comment pointing at the skill documenting the
  micromatch gotcha for future routes).
- tests/brief-carousel.test.mjs: updated comment references.

Verified:
- typecheck + typecheck:api clean
- test:data 5814/5814 pass
- node -e test: @vercel/og imports cleanly in Node (tests that
  reach through the renderer file no longer depend on native
  bindings)

Post-deploy validation:
  curl -I -H "User-Agent: TelegramBot (like TwitterBot)" \
    "https://www.worldmonitor.app/api/brief/carousel/<uid>/<slot>/0"
  # Expect: HTTP/2 403 (no token) or 200 (valid token)
  # NOT:    HTTP/2 500 FUNCTION_INVOCATION_FAILED

Then tail Railway digest logs on the next tick; the
`[digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED` line
should stop appearing, and the 3-image preview should actually land
on Telegram.

* Add renderer smoke test + fix Cache-Control duplication

Reviewer flagged residual risk: no dedicated carousel-route smoke
test for the @vercel/og path. Adds one, and catches a real bug in
the process.

Findings during test-writing:

1. @vercel/og's ImageResponse runs CLEANLY in Node via tsx — the
   comment in brief-carousel.test.mjs saying "we can't test the
   render in Node" was true for direct satori + @resvg/resvg-wasm
   but no longer holds after PR #3210. Pure Node render works
   end-to-end: satori tree-parse, jsdelivr font fetch, resvg-wasm
   init, PNG output. ~850ms first call, ~20ms warm.

2. ImageResponse sets its own default
   `Cache-Control: public, immutable, no-transform, max-age=31536000`.
   Passing Cache-Control via the constructor's headers option
   APPENDS rather than overrides, producing a duplicated
   comma-joined value like
   `public, immutable, no-transform, max-age=31536000, public, max-age=60`
   on the Response. The route handler was doing exactly this via
   extraHeaders. Fix: drop our Cache-Control override and rely on
   @vercel/og's 1-year immutable default — envelope is only
   immutable for its 7d Redis TTL so the effective ceiling is 7d
   anyway (after that the route 404s before render).

Changes:

- tests/brief-carousel.test.mjs: 6 new assertions under
  `renderCarouselImageResponse`:
    * renders cover / threads / story pages, each returning a
      valid PNG (magic bytes + size range)
    * rejects a structurally empty envelope
    * threads non-cache extraHeaders onto the Response
    * pins @vercel/og's Cache-Control default so it survives
      caller-supplied Cache-Control overrides (regression guard
      for the bug fixed in this commit)
- api/brief/carousel/[userId]/[issueDate]/[page].ts: remove the
  stacked Cache-Control; lean on @vercel/og default. Drop the now-
  unused `PAGE_CACHE_TTL` constant. Comment explains why.
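
The route's happy path after the change, sketched (envelope loading and the
layout builders are elided; dimensions and the extra header name are
assumptions):

  import { ImageResponse } from '@vercel/og';

  declare function buildCover(envelope: unknown): any; // existing Satori layout builder
  declare const envelope: unknown;

  export const config = { runtime: 'edge' };

  export default async function handler(_req: Request) {
    // no Cache-Control in the headers option; @vercel/og already sets its
    // immutable default and constructor headers append rather than replace
    return new ImageResponse(buildCover(envelope), {
      width: 1080,
      height: 1350,
      headers: { 'x-brief-page': 'cover' }, // non-cache extras only
    });
  }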

Verified:
- test:data 5820/5820 pass (was 5814, +6 smoke)
- typecheck + typecheck:api clean
- Render smoke: cover 825ms / threads 23ms / story 16ms first run
  (wasm init dominates first render)
2026-04-19 15:18:12 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently slip back through the
allowlist after the rollout.

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
2026-04-19 14:15:59 +04:00
Elie Habib
d7f87754f0 fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged that two transactional email templates still claimed '22 services' while /pro now advertises '30+':

- api/register-interest.js:90 — interest-registration confirmation email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription confirmation email ('22 services, one API key')

A user signing up via /pro would read '30+ services' on the page, then receive an email saying '22'. Both updated to '30+' matching the /pro page and the actual server domain count (31 in server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
2026-04-19 13:15:17 +04:00
Elie Habib
c7aacfd651 fix(health): persist WARNING events + add failure-log timeline (#3197)
* fix(health): persist WARNING events + add failure-log timeline

WARNING status (stale seeds) was excluded from the health:last-failure
Redis write (line 680 checked `!== 'WARNING'`). When UptimeRobot keyword-
checks for "HEALTHY" and gets a WARNING response, it flags DOWN, but no
forensic trail was left in Redis. This made stale-seed incidents invisible
to post-mortem investigation.

Changes:
- Write health:last-failure for ANY non-HEALTHY status (including WARNING)
- Add health:failure-log (LPUSH list, last 50 entries, 7-day TTL) so
  multiple incidents are preserved as a timeline, not just the latest
- Include warnCount alongside critCount in the snapshot
- Broaden the problems filter to capture all non-OK statuses

* fix(health): dedupe failure-log entries by incident signature

Repeated polls during one long WARNING window would LPUSH near-identical
snapshots, filling the 50-entry log and evicting older distinct incidents.

Now compares a signature (status + sorted problem set) against the previous
entry via health:failure-log-sig. Only appends when the incident changes.
The last-failure key is still updated every poll (latest timestamp matters).

* fix(health): add 4s timeout to persist pipelines + consistent arg types

Addresses greptile review on PR #3197:
- Both persist redisPipeline calls now pass 4_000ms timeout (main data
  pipeline uses 8_000ms; persist is less critical so shorter is fine)
- LTRIM/EXPIRE args use numbers consistently (was mixing number/string)

* fix(health): atomic sig swap via SET ... GET to eliminate dedupe race

Two concurrent /api/health requests could both read the old signature
before either write lands, appending duplicate entries. Now uses
SET key val EX ttl GET (Redis 6.2+) to atomically swap the sig and
return the previous value in one pipeline command. The LPUSH only
fires if the returned previous sig differs from the new one.

Also skips the second redisPipeline call entirely when sig matches
(no logCmds to send).

* fix(health): exclude seedAgeMin from dedupe sig + clear sig on recovery

Two issues with the failure-log dedupe:

1. seedAgeMin changes on every poll (e.g. 31min, 32min, 33min), so
   the signature changed every time and LPUSH still fired on every
   probe during a STALE_SEED window. Now uses a separate sigKeys
   array with only key:status (no age) for the signature, while
   problemKeys still includes ages for the snapshot payload.

2. The sig was never cleared on recovery. If the same problem set
   recurred after a healthy gap, the old sig (within its 24h TTL)
   would match and the recurrence would be silently skipped. Now
   DELs health:failure-log-sig when overall === 'HEALTHY'.

* fix(health): move sig write after LPUSH in same pipeline

The sig was written eagerly in the first pipeline (SET ... GET), but the
LPUSH happened in a separate background pipeline. If that second write
failed, the sig was already advanced, permanently deduping the incident
out of the timeline.

Now: GET sig first (read-only), then write last-failure + LPUSH + sig
all in one pipeline. The sig only advances if the entire pipeline
succeeds. Failure leaves the old sig in place so the next poll retries.

Reintroduces a small read-then-write race window (two concurrent probes
can both read the old sig), but the worst case is a single duplicate
entry, which is strictly better than a permanently dropped incident.
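
The one-pipeline ordering, roughly (the redisPipeline helper signature and
exact command args are assumptions; key names and TTLs follow this PR):

  declare function redisPipeline(
    cmds: Array<Array<string | number>>,
    timeoutMs: number,
  ): Promise<unknown[]>;

  async function persistFailure(sig: string, snapshot: object) {
    const [prevSig] = await redisPipeline([['GET', 'health:failure-log-sig']], 4_000);
    const cmds: Array<Array<string | number>> = [
      // last-failure is refreshed on every poll (latest timestamp matters)
      ['SET', 'health:last-failure', JSON.stringify(snapshot)],
    ];
    if (prevSig !== sig) {
      cmds.push(
        ['LPUSH', 'health:failure-log', JSON.stringify(snapshot)],
        ['LTRIM', 'health:failure-log', 0, 49],
        ['EXPIRE', 'health:failure-log', 7 * 86_400],
        // the sig only advances if this whole pipeline succeeds
        ['SET', 'health:failure-log-sig', sig, 'EX', 86_400],
      );
    }
    await redisPipeline(cmds, 4_000);
  }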
2026-04-19 10:14:19 +04:00
Elie Habib
96fca1dc2b fix(supply-chain): popup-keyed history re-query + dataAvailable flag (#3187)
* fix(supply-chain): popup-keyed history re-query + dataAvailable flag for partial coverage

Two P1 findings on #3185 post-merge review:

1. MapPopup cross-chokepoint history contamination
   Popup's async history resolve re-queried [data-transit-chart] without a
   cpId key. User opens popup A → fetch starts for cpA; user opens popup B
   before it resolves → cpA's history mounts into cpB's chart container.
   Fix: add data-transit-chart-id keyed by cpId; re-query by it on resolve.
   Mirrors SupplyChainPanel's existing data-chart-cp-id pattern.

2. Partial portwatch coverage still looked healthy
   Previous fix emits all 13 canonical summaries (zero-state fill for
   missing IDs) and records pwCovered in seed-meta, but:
   - get-chokepoint-status still zero-filled missing chokepoints and cached
     the response as healthy — panel rendered silent empty rows.
   - api/health.js only degrades on recordCount=0, so 10/13 partial read
     as OK despite the UI hiding entire chokepoints.
   Fix:
   - proto: TransitSummary.data_available (field 12). Writer tags with
     Boolean(cpData). Status RPC passes through; defaults true for pre-fix
     payloads (absence = covered).
   - Status RPC writes seed-meta recordCount as covered count (not shape
     size), and flips response-level upstreamUnavailable on partial.
   - api/health.js: new minRecordCount field on SEED_META entries + new
     COVERAGE_PARTIAL status (warn rollup). chokepoints entry declares
     minRecordCount: 13. recordCount < 13 → COVERAGE_PARTIAL.
   - Client (panel + popup): skip stats/chart rendering when
     !dataAvailable; show "Transit data unavailable (upstream partial)"
     microcopy so users understand the gap.

5759/5759 data tests pass. Typecheck + typecheck:api clean.

* fix(supply-chain): guarantee Simulate Closure button exits Computing state

User reports "Simulate Closure does nothing beyond write Computing…" — the
button sticks at Computing forever. Two causes:

1. Scenario worker appears down (0 scenario-result:* keys in Redis despite
   their 24h TTL, i.e. nothing written in the last day). Railway-side —
   separate intervention needed to redeploy scripts/scenario-worker.mjs.

2. Client leaked the "Computing…" state on multiple exit paths:
   - signal.aborted early-return inside the poll loop never reset the
     button. Second click fired abort on first → first returned without
     resetting → button stayed "Computing…" until next render.
   - !this.content.isConnected early-return also skipped reset (less
     user-visible but same class of bug).
   - catch block swallowed AbortError without resetting.
   - POST /run had no hard timeout — a hanging edge function left the
     button in Computing indefinitely.

Fix:
- resetButton(text) helper touches the btn only if still connected;
  applied in every exit path (abort, timeout, post-success, catch).
- AbortSignal.any([caller, AbortSignal.timeout(20_000)]) on POST /run.
- console.error on failure so Simulate Closure errors surface in ops.
- Error message includes "scenario worker may be down" on loop timeout
  so operators see the right suspect.
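
A minimal sketch of the exit-path discipline, in TypeScript; the
selector, payload, and caller-abort plumbing are illustrative, and the
real client also polls for the result after the POST:

  const btn = document.querySelector<HTMLButtonElement>('#simulate-closure')!; // hypothetical selector
  const callerSignal = new AbortController().signal;                           // stand-in for second-click abort
  const payload = { scenarioId: 'hormuz_strait' };                             // hypothetical body

  const resetButton = (text: string) => {
    if (btn.isConnected) { btn.disabled = false; btn.textContent = text; }     // touch only if still mounted
  };

  try {
    const res = await fetch('/api/scenario/v1/run', {
      method: 'POST',
      body: JSON.stringify(payload),
      // Caller abort OR 20s hard timeout, whichever fires first.
      signal: AbortSignal.any([callerSignal, AbortSignal.timeout(20_000)]),
    });
    if (!res.ok) throw new Error(`run failed: ${res.status}`);
    resetButton('Simulate Closure');   // post-success path
  } catch (err) {
    resetButton('Simulate Closure');   // abort / timeout / error paths all reset
    if (!(err instanceof DOMException && err.name === 'AbortError')) {
      console.error('[scenario] Simulate Closure failed, scenario worker may be down', err);
    }
  }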

Backend observations (for follow-up):
- Hormuz backend is healthy (/api/health chokepoints OK, 13 records,
  1 min old; live RPC has hormuz_strait.riskLevel=critical, wow=-22,
  flowEstimate present; GetChokepointHistory returns 174 entries).
  User-reported "Hormuz empty" is likely browser/CDN stale cache from
  before PR #3185; hard refresh should resolve.
- scenario-worker.mjs has zero result keys in 24h. Railway service
  needs verification/redeployment.

* fix(scenario): wrong Upstash RPUSH format silently broke every Simulate Closure

Railway scenario-worker log shows every job failing field validation since
at least 03:06Z today:

  [scenario-worker] Job failed field validation, discarding:
    ["{\"jobId\":\"scenario:1776535792087:cynxx5v4\",...

The leading [" in the payload is the smoking gun. api/scenario/v1/run.ts
was POSTing to /rpush/{key} with body `[payload]`, expecting Upstash to
unpack the array and push one string value. Upstash does NOT parse that
form — it stored the literal `["{...}"]` string as a single list value.

Worker BLMOVEs the literal string → JSON.parse → array → destructure
`{jobId, scenarioId, iso2}` on an array returns undefined for all three
→ every job discarded without writing a result. Client poll returns
`pending` for the full 60s timeout, then (on the prior client code path)
leaked the stuck "Computing…" button state indefinitely.

Fix: use the standard Upstash REST command format — POST to the base URL
with body `["RPUSH", key, value]`. Matches scripts/ais-relay.cjs upstashLpush.
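
A sketch of the two wire formats side by side, assuming a standard
Upstash REST setup; the job field values are illustrative:

  const base = process.env.UPSTASH_REDIS_REST_URL!;
  const headers = { Authorization: `Bearer ${process.env.UPSTASH_REDIS_REST_TOKEN!}` };

  const jobId = 'scenario:1776535792087:cynxx5v4';               // from the failing log above
  const scenarioId = 'closure';                                   // illustrative
  const iso2 = 'IR';                                              // illustrative
  const payload = JSON.stringify({ jobId, scenarioId, iso2 });

  // Broken: /rpush/{key} with a JSON array body. Upstash stores the
  // literal '["{...}"]' string as a single list value.
  // await fetch(`${base}/rpush/scenario-queue:pending`,
  //   { method: 'POST', headers, body: JSON.stringify([payload]) });

  // Fixed: full command array POSTed to the base URL.
  await fetch(base, {
    method: 'POST',
    headers,
    body: JSON.stringify(['RPUSH', 'scenario-queue:pending', payload]),
  });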

After this, the scenario-queue:pending list stores the raw payload string,
BLMOVE returns the payload, JSON.parse gives the object, validation passes,
computeScenario runs, result key gets written, client poll sees `done`.

Zero result keys existed in prod Redis in the last 24h (24h TTL on
scenario-result:*) — confirms the fix addresses the production outage.
2026-04-18 23:38:33 +04:00
Elie Habib
d37ffb375e fix(referral): stop /api/referral/me 503s on prod homepage (#3186)
* fix(referral): make /api/referral/me non-blocking to stop prod 503s

Reported in prod: every PRO homepage load was logging 'GET /api/referral/me 503' to Sentry. Root cause: a prior review required the Convex binding to block the response (rationale: don't hand users a dead share link). That turned any flaky relay call into a homepage-wide 503 for the 5-minute client cache window — every PRO user, every page reload.

Fix: dispatch registerReferralCodeInConvex via ctx.waitUntil. Response returns 200 + code + shareUrl unconditionally. Binding failures log a warning but never surface as 503. The mutation is idempotent; the next /api/referral/me fetch retries. The /pro?ref=<code> signup side reads userReferralCodes at conversion time, so a missed binding degrades to missed attribution (partial), never to blocked homepage (total).

The BRIEF_URL_SIGNING_SECRET-missing 503 path is unchanged — that's a genuine misconfig, not a flake.

Handler signature now takes ctx with waitUntil, matching api/notification-channels.ts and api/discord/oauth/callback.ts.

Regression test flipped: brief-referral-code.test.mjs previously enforced the blocking shape; now enforces the non-blocking shape + handler signature + explicit does-not-503-on-binding-failure assertion. 14/14 referral tests pass. Typecheck clean, 5706/5706 test:data, lint exit 0.

* fix(referral): narrow err in non-blocking catch instead of unsafe cast

Greptile P2 on #3186. The (err as Error).message cast was safe today (registerReferralCodeInConvex only throws Error instances) but would silently log 'undefined' if a future path ever threw a non-Error value. Swapped to instanceof narrow + String(err) fallback.
2026-04-18 23:32:48 +04:00
Elie Habib
55ac431c3f feat(brief): public share mirror + in-magazine Share button (#3183)
* feat(brief): public share mirror + in-magazine Share button

Adds the growth-vector piece listed under Future Considerations in the
original brief plan (line 399): a shareable public URL and a one-click
Share button on the reader's magazine.

Problem: the per-user magazine at /api/brief/{userId}/{issueDate} is
HMAC-signed to a specific reader. You cannot share the URL you are
looking at, because the recipient either 403s (bad token) or reads
your personalised issue against your userId. Result: no way to share
the daily brief, no way for readers to drive discovery. Opening a
growth loop requires a separate public surface.

Approach: deterministic HMAC-derived short hash per {userId,
issueDate} backed by a pointer key in Redis.
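
A minimal sketch of that derivation using the Web Crypto API available
on the edge runtime; the exact input framing inside brief-share-url.ts
may differ:

  // Sketch: 12 base64url chars of HMAC-SHA256(secret, userId + issueDate), ~72 bits.
  async function deriveShareHash(userId: string, issueDate: string): Promise<string> {
    const secret = process.env.BRIEF_SHARE_SECRET!;
    const key = await crypto.subtle.importKey(
      'raw', new TextEncoder().encode(secret),
      { name: 'HMAC', hash: 'SHA-256' }, false, ['sign'],
    );
    const mac = await crypto.subtle.sign(
      'HMAC', key, new TextEncoder().encode(`${userId}:${issueDate}`));
    const b64url = btoa(String.fromCharCode(...new Uint8Array(mac)))
      .replaceAll('+', '-').replaceAll('/', '_').replace(/=+$/, '');
    return b64url.slice(0, 12);
  }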

New files

- server/_shared/brief-share-url.ts
  Web Crypto HMAC helper. deriveShareHash returns 12 base64url chars
  (72 bits) from (userId, issueDate) using BRIEF_SHARE_SECRET.
  Pointer encode/decode helpers and a shape check. Distinct from the
  per-user BRIEF_URL_SIGNING_SECRET so a leak of one does not
  automatically unmask the other.

- api/brief/share-url.ts (edge, Clerk auth, Pro gated)
  POST /api/brief/share-url?date=YYYY-MM-DD
  Idempotently writes brief:public:{hash} pointer with the same 7 day
  TTL as the underlying brief, then returns {shareUrl, hash,
  issueDate}. 404 if the per-user brief is missing. 503 on Upstash
  failure. Accepts an optional refCode in the JSON body for referral
  attribution.

- api/brief/public/[hash].ts (edge, unauth)
  GET /api/brief/public/{hash}?ref={code}
  Reads pointer, reads the real brief envelope, renders with
  publicMode=true. Emits X-Robots-Tag: noindex,nofollow so shared
  briefs never get enumerated by search engines. 404 on any missing
  part (bad hash shape, missing pointer, missing envelope) with a
  neutral error page. 503 on Upstash failure.

Renderer changes (server/_shared/brief-render.js)

- Signature extended: renderBriefMagazine(envelope, options?)
  - options.publicMode: redacts user.name and whyMatters before any
    HTML emission; swaps the back cover to a Subscribe CTA; prepends
    a Subscribe strip across the top of the deck; omits the Share
    button + share script; adds a noindex meta tag.
  - options.refCode: appended as ?ref= to /pro links on public views.
- Non-public views gain a sticky .wm-share pill in the top-right
  chrome. Inline SHARE_SCRIPT handles the click flow: POST
  /api/brief/share-url then navigator.share with clipboard fallback
  and a prompt() ancient-browser fallback. User-visible feedback via
  data-state on the button (sharing / copied / error). No change to
  the envelope contract, no LLM calls, no composer-side work
  required.
- Validation runs on the full unredacted envelope first, so the
  public path can never accept a shape the private path would reject.

Tests

- tests/brief-share-url.test.mts (18 assertions): determinism,
  secret sensitivity, userId/date sensitivity, shape validation, URL
  composition with/without refCode, trailing-slash handling on
  baseUrl, pointer encode/decode round-trip.
- tests/brief-magazine-render.test.mjs (+13 assertions): Share
  button carries the issue date; share script emitted once;
  share-url endpoint wired; publicMode strips the button+script,
  replaces whyMatters, emits noindex meta, prepends Subscribe strip,
  passes refCode through with escaping, swaps the back cover, does
  not leak the user name, preserves story headlines, options-less
  call matches the empty-options call byte for byte.
- Full typecheck/lint/edge-bundle/test:data/edge-functions suite all
  green: 5704/5704 data tests, 171/171 edge-function tests, 0 lint
  errors.

Env vars (new)

- BRIEF_SHARE_SECRET: 64+ random hex chars, Vercel (edge) only. NOT
  needed by the Railway composer because pointer writes are lazy
  (on share, not on compose).

* fix(brief): public share round-trip + magazine Share button without auth

Two P1 findings on #3183 review.

1) Pointer wire format: share-url.ts wrote the pointer as a raw colon-delimited string via SET. The public route reads via readRawJsonFromUpstash which ALWAYS JSON.parses. A bare non-JSON string throws at parse, the route returned 503 instead of resolving. Fix: JSON.stringify on both write sites. Regression test locks the wire format.

2) Share button auth unreachable from a standalone magazine tab: inline script needed window.WM_CLERK_JWT which is never set, endpoint hard-requires Bearer, fallback to credentials:include fails. Fix: derive share URL server-side in the per-user route (same inputs share-url uses), embed as data-share-url, click handler now reads dataset and invokes navigator.share directly. No network, no auth, works in any tab.

The /api/brief/share-url endpoint stays in place for other callers (dashboard panel) with its Clerk auth intact and its pointer write now in the correct format.

QA: typecheck clean, 5708/5708 data tests, 45/45 magazine, 20/20 share-url, edge bundle OK, lint exit 0.

* fix(brief): address remaining review findings on #3183

P0-2 (comment-only): public/[hash].ts inline comment incorrectly described readRawJsonFromUpstash parse-failure behaviour. The helper rethrows on JSON.parse failure, it does not return null. Rewrote the comment to match reality (JSON-encoded wire format, parse-to-string round-trip, intentional 503-on-bug-value as the loud failure mode). The actual wire-format fix was in prior commit 045771d55.

P2 (consistency): publicStripHtml href was built via template literal + encodeURIComponent without the final escapeHtml wrap that renderBackCover uses. Safe in practice (encodeURIComponent handles all HTML-special chars + route boundary restricts refCode to [A-Za-z0-9_-]) but inconsistent. Unified by extracting publicStripHref and escaping on interpolation, matching the sibling function.

QA: typecheck clean, 45/45 magazine tests pass, lint exit 0.
2026-04-18 22:46:22 +04:00
Elie Habib
388995b1a4 fix(health): macroSignals maxStaleMin 20 → 150 to match seed-economy cron cadence (#3179)
macroSignals is a secondary key written by seed-economy.mjs, whose
primary key energy-prices has maxStaleMin=150 in its runSeed config.
A 20-min threshold guaranteed STALE_SEED between every cron run.
2026-04-18 20:50:48 +04:00
Elie Habib
b5824d0512 feat(brief): Phase 9 / Todo #223 — share button + referral attribution (#3175)
* feat(brief): Phase 9 / Todo #223 — share button + referral attribution

Adds a Share button to the dashboard Brief panel so PRO users can
spread WorldMonitor virally. Built on the existing referral plumbing
(registrations.referralCode + referredBy fields; api/register-interest
already passes referredBy through) — this PR fills in the last mile:
a stable referral code for signed-in Clerk users, a share URL, and
a client-side share sheet.

Files:

  server/_shared/referral-code.ts (new)
    Deterministic 8-char hex code: HMAC(BRIEF_URL_SIGNING_SECRET,
    'referral:v1:' + userId). Same Clerk userId always produces the
    same code. No DB write on login, no schema migration, stable for
    the life of the account.

  api/referral/me.ts (new)
    GET -> { code, shareUrl, invitedCount }. Bearer-auth via Clerk.
    Reuses BRIEF_URL_SIGNING_SECRET to avoid another Railway env
    var. Stats fail gracefully to 0 on Convex outage.

  convex/registerInterest.ts + convex/http.ts
    New internal query getReferralStatsByCode({referralCode}) counts
    registrations rows that named this code as their referredBy.
    Exposed via POST /relay/referral-stats (RELAY_SHARED_SECRET auth).

  src/services/referral.ts (new)
    getReferralProfile: 5-min cache, profile is effectively immutable
    shareReferral: Web Share API primary (mobile native sheet),
    clipboard fallback on desktop. Returns 'shared'/'copied'/'blocked'
    /'error'. AbortError is treated as 'blocked', not failure.
    clearReferralCache for account-switch hygiene.

  src/components/LatestBriefPanel.ts + src/styles/panels.css
    New share row below the brief cover card. Button disabled until
    /api/referral/me resolves; if fetch fails the row removes itself.
    invitedCount > 0 renders as 'N invited' next to the button.
    Referral cache invalidated alongside Clerk token cache on account
    switch (otherwise user B would see user A's share link for 5 min).

Tests: 10 new cases in tests/brief-referral-code.test.mjs
  - getReferralCodeForUser: hex shape, determinism, uniqueness,
    secret-rotation invalidates, input guards
  - buildShareUrl: path shape, trailing-slash trim, URL-encoding

153/153 brief + deploy tests pass. Both tsconfigs typecheck clean.

Attribution flow (already working end-to-end):
  1. Share button -> worldmonitor.app/pro?ref={code}
  2. /pro landing page already reads ?ref= and passes to
     /api/register-interest as referredBy
  3. convex registerInterest:register increments the referrer's
     referralCount and stores referredBy on the new row
  4. /api/referral/me reads the count back via the relay query
  5. 'N invited' updates on next 5-min cache refresh

Scope boundaries (deferred):
  - Convex conversion tracking (invited -> PRO subscribed). Needs a
    join from registrations.referredBy to subscriptions.userId via
    email. Surface 'N converted' in a follow-up.
  - Referral-credit / reward system: viral loop works today, reward
    logic is a separate product decision.

* fix(brief): address three P2 review findings on #3175

- api/referral/me.ts JSDoc said '503 if REFERRAL_SIGNING_SECRET is
  not configured' but the handler actually reads BRIEF_URL_SIGNING_SECRET.
  Updated the docstring so an operator chasing a 503 doesn't look for
  an env var that doesn't exist.

- server/_shared/referral-code.ts carried a RESERVED_CODES Set to
  avoid collisions with URL-path keywords ('index', 'robots', 'admin').
  The guard is dead code: the code alphabet is [0-9a-f] (hex output
  of the HMAC) so none of those non-hex keywords can ever appear.
  Removed the Set + the while loop; left a comment explaining why it
  was unnecessary so nobody re-adds it.

- src/components/LatestBriefPanel.ts passed disabled: 'true' (string)
  to the h() helper. DOM-utils' h() calls setAttribute for unknown
  props, which does disable the button — but it's inconsistent with
  the later .disabled = false property write. Fixed to the boolean
  disabled: true so the attribute and the IDL property agree.

10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(brief): address two review findings on #3175 — drop misleading count + fix user-agnostic cache

P1: invitedCount wired to the wrong attribution store.
The share URL is /pro?ref=<code>. On /pro the 'ref' feeds
Dodopayments checkout metadata (affonso_referral), NOT
registrations.referredBy. /api/referral/me counted only the
waitlist path, so the panel would show 0 invited for anyone who
converted direct-to-checkout — misleading.

Rather than ship a count that measures only one of two attribution
paths (and the less-common one at that), the count is removed
entirely. The share button itself still works. A proper metric
requires unifying the waitlist + Dodo-metadata paths into a single
attribution store, which is a follow-up.

Changes:
  - api/referral/me.ts: response shape is { code, shareUrl } — no
    invitedCount / convertedCount
  - convex/registerInterest.ts: removed getReferralStatsByCode
    internal query
  - convex/http.ts: removed /relay/referral-stats route
  - src/services/referral.ts: ReferralProfile interface no longer
    has invitedCount; fetch call unchanged in behaviour
  - src/components/LatestBriefPanel.ts: dropped the 'N invited'
    render branch

P2: referral cache was user-agnostic.
Module-global _cached had no userId key, so a stale cache primed by
user A would hand user B user A's share link for up to 5 min after
an account switch — if no panel is mounted at the transition moment
to call clearReferralCache(). Per the reviewer's point, this is a
real race.

Fix: three-part.
  (a) Cache entry carries the userId it was computed for; reads
      check the current Clerk userId and only accept hits when they
      match. Mismatch → drop + re-fetch.
  (b) src/services/referral.ts self-subscribes to auth-state at
      module load (ensureAuthSubscription). On any id transition
      _cached is dropped. Module-level subscription means the
      invalidation works even when no panel is currently mounted.
  (c) Belt-and-suspenders: post-fetch, re-check the current user
      before caching. Protects against account switches that happen
      mid-flight between 'read cache → ask network → write cache'.

Panel's local clearReferralCache() call removed — module now
self-invalidates.
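
A minimal sketch of the userId-keyed guard; getCurrentUserId and
fetchReferralProfile are hypothetical stand-ins for the real Clerk
lookup and the /api/referral/me call, and the auth-state subscription
is elided:

  interface ReferralProfile { code: string; shareUrl: string }
  declare function getCurrentUserId(): Promise<string>;              // hypothetical
  declare function fetchReferralProfile(): Promise<ReferralProfile>; // hypothetical

  let cached: { userId: string; profile: ReferralProfile; at: number } | null = null;
  const TTL_MS = 5 * 60_000;

  async function getReferralProfile(): Promise<ReferralProfile> {
    const userId = await getCurrentUserId();
    if (cached && cached.userId === userId && Date.now() - cached.at < TTL_MS) {
      return cached.profile;                       // hit only for the SAME user
    }
    const profile = await fetchReferralProfile();
    if ((await getCurrentUserId()) === userId) {   // re-check: account may have switched mid-flight
      cached = { userId, profile, at: Date.now() };
    }
    return profile;
  }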

10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(referral): address P1 review finding — share codes now actually credit the sharer

Earlier commits in this PR generated 8-char Clerk-derived HMAC codes
for the share button, but the waitlist register mutation only looked up
registrations.by_referral_code (6-char email-generated codes).
Codes issued by the share button NEVER resolved to a sharer — the
'referral attribution' half of the feature was non-functional.

Fix (schema-level, honest attribution path):

  convex/schema.ts
    - userReferralCodes   { userId, code, createdAt }  + by_code, by_user
    - userReferralCredits { referrerUserId, refereeEmail, createdAt }
                          + by_referrer, by_referrer_email

  convex/registerInterest.ts
    - register mutation: after the existing registrations.by_referral_code
      lookup, falls through to userReferralCodes.by_code. On match,
      inserts a userReferralCredits row (the Clerk user has no
      registrations row to increment, so credit needs its own
      table). Dedupes by (referrer, refereeEmail) so returning
      visitors can't double-credit.
    - new internalMutation registerUserReferralCode({userId, code})
      idempotent binding of a code to a userId. Collisions logged
      and ignored (keeps first writer).

  convex/http.ts
    - new POST /relay/register-referral-code (RELAY_SHARED_SECRET auth)
      that runs the mutation above.

  api/referral/me.ts
    - signature gains a ctx.waitUntil handle
    - after generating the user's code, fire-and-forget POSTs to
      /relay/register-referral-code so the binding is live by the
      time anyone clicks a shared link. Idempotent — a failure just
      means the NEXT call re-registers.

Still deferred: display of 'N credited' / 'N converted' in the
LatestBriefPanel. The waitlist side now resolves correctly, but the
Dodopayments checkout path (/pro?ref=<code> → affonso_referral) is
tracked in Dodo, not Convex. Surfacing a unified count requires a
separate follow-up to pull Dodo metadata into Convex.

Regression tests (3 new cases in tests/brief-referral-code.test.mjs):
  - register mutation extends to userReferralCodes + inserts credits
  - schema declares both tables with the right indexes
  - /api/referral/me registers the binding via waitUntil

13/13 referral tests pass. Both tsconfigs typecheck clean.

* fix(referral): address two P1 review findings — checkout attribution + dead-link prevention

P1: share URL didn't credit on the /pro?ref= checkout path.
The earlier PR wired Clerk codes into the waitlist path
(/api/register-interest -> userReferralCodes -> userReferralCredits)
but a visitor landing on /pro?ref=<code> and going straight to
Dodo checkout forwarded the code only into Dodo metadata
(affonso_referral). Nothing on our side credited the sharer.

Fix: convex/payments/subscriptionHelpers.ts handleSubscriptionActive
now reads data.metadata.affonso_referral when inserting a NEW
subscription row. If the code resolves in userReferralCodes, a
userReferralCredits row crediting the sharer is inserted (deduped
by (referrer, refereeEmail) so webhook replays don't double-credit).
The credit only lands on first-activation — the else-branch of the
existing/new split guards against replays.

P1: /api/referral/me returned 200 + share link even when the
(code, userId) binding failed.
ctx.waitUntil(registerReferralCodeInConvex(...)) ran the binding
asynchronously, swallowing missing env + non-2xx + network errors.
Users got a share URL that the waitlist lookup could never
resolve — dead link.

Fix: registerReferralCodeInConvex is now BLOCKING (throws on any
failure) and the handler awaits it before returning. On failure
the endpoint responds 503 service_unavailable rather than a 200
with a non-functional URL. Mutation is idempotent so client
retries are safe.

Regression tests (2 updated/new in tests/brief-referral-code.test.mjs):
  - asserts the binding is awaited, not ctx.waitUntil'd; asserts
    the failure path returns 503
  - asserts subscriptionHelpers reads affonso_referral, resolves via
    userReferralCodes.by_code, inserts a userReferralCredits row,
    and dedupes by (referrer, refereeEmail)

14/14 referral tests pass. Both tsconfigs typecheck clean.

Net effect: /pro?ref=<code> visitors who convert (direct checkout)
now credit the sharer on webhook receipt, same as waitlist
signups. The share button is no longer a dead-end UI.
2026-04-18 20:39:55 +04:00
Elie Habib
122204f691 feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm (#3174)
* feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm

Implements the Phase 8 carousel renderer (Option B): server-side PNG
generation in a Vercel edge function using Satori (JSX to SVG) +
@resvg/resvg-wasm (SVG to PNG). Zero new Railway infra, zero
Chromium, same edge runtime that already serves the magazine HTML.

Files:

  server/_shared/brief-carousel-render.ts (new)
    Pure renderer: (BriefEnvelope, CarouselPage) -> Uint8Array PNG.
    Three layouts (cover/threads/story), 1200x630 OG size.
    Satori + resvg + WASM are lazy-imported so Node tests don't trip
    over '?url' asset imports and the 800KB wasm doesn't ship in
    every bundle. Font: Noto Serif regular, fetched once from Google
    Fonts and memoised on the edge isolate.

  api/brief/carousel/[userId]/[issueDate]/[page].ts (new)
    Public edge function reusing the magazine route's HMAC token —
    same signer, same (userId, issueDate) binding, so one token
    unlocks magazine HTML AND all three carousel images. Returns
    image/png with 7d immutable cache headers. 404 on invalid page
    index, 403 on bad token, 404 on Redis miss, 503 on missing
    signing secret. Render failure falls back to a 1x1 transparent
    PNG so Telegram's sendMediaGroup doesn't 500 the brief.

  scripts/seed-digest-notifications.mjs
    carouselUrlsFrom(magazineUrl) derives the 3 signed carousel
    URLs from the already-signed magazine URL. sendTelegramBriefCarousel
    calls Telegram's sendMediaGroup with those URLs + short caption.
    Runs before the existing sendTelegram(text) so the carousel is
    the header and the text the body — long-form stories remain
    forwardable as text. Best-effort: carousel failure doesn't
    block text delivery.

  package.json + package-lock.json
    satori ^0.10.14 + @resvg/resvg-wasm ^2.6.2.

Tests (tests/brief-carousel.test.mjs, 9 cases):
  - pageFromIndex mapping + out-of-range
  - carouselUrlsFrom: valid URL, localhost origin preserved, missing
    token, wrong path, invalid issueDate, garbage input
  - Drift guard: cron must still declare the same helper + template
    string. If it drifts, test fails with a pointer to move the impl
    into a shared module.

PNG render itself isn't unit-tested — Satori + WASM need a
browser/edge runtime. Covered by smoke validation step in the
deploy monitoring plan.

Both tsconfigs typecheck clean. 152/152 brief tests pass.

Scope boundaries (deferred):
  - Slack + Discord image attachments (different payload shapes)
  - notification-relay.cjs brief_ready dispatch (real-time route)
  - Redis caching of rendered PNG (edge Cache-Control is enough for
    MVP)

* fix(brief): address two P1 review findings on Phase 8 carousel

P1-A: 200 placeholder PNG cached 7d on render failure.
Route config said runtime: 'edge' but a comment contradicted it
claiming Node semantics. More importantly, any render/init failure
(WASM load, Satori, Google Fonts) was converted to a 1x1 transparent
PNG returned with Cache-Control: public, max-age=7d, immutable.
Telegram's media fetcher and Vercel's CDN would cache that blank
for the full brief TTL per chat message — one cold-start mismatch
= every reader of that brief sees blank carousel previews for a
week.

Fix: deleted errorPng(). Render failure now returns 503 with
Cache-Control: no-store. sendMediaGroup fails cleanly for that
carousel (the digest cron already treats it as best-effort and
still sends the long-form text message), next cron tick re-renders
from a fresh isolate. Self-healing across ticks. Contradictory
comment about Node runtime removed.

P1-B: Google Fonts as silent hard dependency.
The renderer claimed 'safe embedded/fallback path' in comments but
no fallback existed. loadFont() fetches Noto Serif from gstatic.com
and rethrows on any failure. Combined with P1-A's old 200-cache-7d
path, a transient CDN blip would lock in a blank carousel for a
week.

Fix: updated comments to honestly declare the CDN dependency plus
document the self-healing semantics now that P1-A's fix no longer
caches the failure. If Google Fonts reliability becomes a problem,
swap the fetch for a bundled base64 TTF — noted as the escape hatch.

Tests (tests/brief-carousel.test.mjs): 2 new regression cases.
11/11 carousel tests pass. Both tsconfigs typecheck clean locally.

Note on currently-red CI: failures are NOT typecheck errors — npm
ci dies fetching libvips for sharp (504 Gateway Time-out from
GitHub releases). sharp is a transitive dep via @xenova/transformers,
pre-existing, not touched by this PR. Transient infra flake.

* fix(brief): switch carousel to Node + @resvg/resvg-js (fixes deploy block)

Vercel edge bundler fails the carousel deploy with:
  'Edge Function is referencing unsupported modules:
   @resvg/resvg-wasm/index_bg.wasm?url'

The ?url asset-import syntax is a Vite-ism that Vercel's edge
bundler doesn't resolve. Two ways out: find a Vercel-blessed edge
WASM import incantation, or switch to Node runtime with the native
@resvg/resvg-js binding. The second is simpler, faster per request,
and avoids the whole WASM-in-edge-bundler rabbit hole.

Changes:
  - package.json: @resvg/resvg-wasm -> @resvg/resvg-js ^2.6.2
  - api/brief/carousel/.../[page].ts: runtime 'edge' -> 'nodejs20.x'
  - server/_shared/brief-carousel-render.ts: drop initWasm path,
    dynamic-import resvg-js in ensureLibs(). Satori and resvg load
    in parallel via Promise.all, shaving ~30ms off cold start.

Also addresses the P2 finding from review: the old ensureLibsAndWasm
had a concurrent-cold-start race where two callers could reach
'await initWasm()' simultaneously. Replaced the boolean flag with a
shared _libsLoadPromise so concurrent callers await the same load.
On failure the promise resets so the NEXT request retries rather
than poisoning the isolate for its lifetime.
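
A minimal sketch of the shared-promise pattern; loadSatoriAndResvg is
a hypothetical stand-in for the real parallel dynamic imports:

  type Libs = { satori: unknown; resvg: unknown };
  declare function loadSatoriAndResvg(): Promise<Libs>;   // hypothetical loader

  let libsLoadPromise: Promise<Libs> | null = null;

  function ensureLibs(): Promise<Libs> {
    if (!libsLoadPromise) {
      libsLoadPromise = loadSatoriAndResvg().catch((err) => {
        libsLoadPromise = null;   // reset so the NEXT request retries the load
        throw err;
      });
    }
    return libsLoadPromise;       // concurrent cold-start callers share one load
  }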

Cold start ~700ms (Satori + resvg-js native init), warm ~40ms.
Carousel images are not latency-critical — fetched by Telegram's
media service, CDN-cached 7d.

Both tsconfigs typecheck clean. 11/11 carousel tests pass.

* fix(brief): carousel runtime = 'nodejs' (was 'nodejs20.x', rejected by Vercel)

Vercel's functions config validator rejects 'nodejs20.x' at deploy
time:

  unsupported "runtime" value in config: "nodejs20.x"
  (must be one of: ["edge","experimental-edge","nodejs"])

The Node version comes from the project's default (currently Node 20
via package.json engines + Vercel project settings), not from the
runtime string. Use 'nodejs' — unversioned — and let the platform
resolve it.

11/11 carousel tests pass.

* fix(brief): swap carousel font from woff2 to woff (Satori can't parse woff2)

Review on PR #3174: the FONT_URL was pointing at a gstatic.com woff2
file. Satori parses ttf / otf / woff v1 — NOT woff2. Every render
was about to throw on font decode, the route would return 503, and
the carousel would never deliver a single image.

Fix: point FONT_URL at @fontsource's Noto Serif Latin 400 WOFF v1
via jsdelivr. WOFF v1 is a TrueType wrapper that Satori parses
natively (verified: file says 'Web Open Font Format, TrueType,
version 1.1'). Same cold-start semantics as before — one fetch per
warm isolate, memoised.

Regression test: asserts FONT_URL ends in ttf/otf/woff and explicitly
rejects any .woff2 suffix. A future swap that silently reintroduces
woff2 now fails CI loudly instead of shipping a permanently-broken
renderer.

12/12 carousel tests pass. Both tsconfigs typecheck clean.
2026-04-18 20:27:41 +04:00
Elie Habib
e1c3b28180 feat(notifications): Phase 6 — web-push channel for PWA notifications (#3173)
* feat(notifications): Phase 6 — web-push channel for PWA notifications

Adds a web_push notification channel so PWA users receive native
notifications when this tab is closed. Deep-links click to the
brief magazine URL for brief_ready events, to the event link for
everything else.

Schema / API:
- channelTypeValidator gains 'web_push' literal
- notificationChannels union adds { endpoint, p256dh, auth,
  userAgent? } variant (standard PushSubscription identity triple +
  cosmetic UA for the settings UI)
- new setWebPushChannelForUser internal mutation upserts the row
- /relay/deactivate allow-list extended to accept 'web_push'
- api/notification-channels: 'set-web-push' action validates the
  triple, rejects non-https, truncates UA to 200 chars

Client (src/services/push-notifications.ts + src/config/push.ts):
- isWebPushSupported guards Tauri webview + iOS Safari
- subscribeToPush: permission + pushManager.subscribe + POST triple
- unsubscribeFromPush: pushManager.unsubscribe + DELETE row
- VAPID_PUBLIC_KEY constant (with VITE_VAPID_PUBLIC_KEY env override)
- base64 <-> Uint8Array helpers (VAPID key encoding)

Service worker (public/push-handler.js):
- Imported into VitePWA's generated sw.js via workbox.importScripts
- push event: renders notification; requireInteraction=true for
  brief_ready so a lock-screen swipe does not dismiss the CTA
- notificationclick: focuses+navigates existing same-origin client
  when present, otherwise opens a new window
- Malformed JSON falls back to raw text body, missing data falls
  back to a minimal WorldMonitor default
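
A minimal sketch of those two handlers using the standard service
worker push APIs; payload field names are illustrative:

  const sw = self as any;   // ServiceWorkerGlobalScope

  sw.addEventListener('push', (event: any) => {
    let data: any = {};
    try { data = event.data?.json() ?? {}; }          // malformed JSON falls back to raw text
    catch { data = { body: event.data?.text() }; }
    event.waitUntil(sw.registration.showNotification(data.title ?? 'WorldMonitor', {
      body: data.body ?? '',
      data: { url: data.url ?? '/' },
      requireInteraction: data.eventType === 'brief_ready',   // a swipe should not dismiss the CTA
    }));
  });

  sw.addEventListener('notificationclick', (event: any) => {
    event.notification.close();
    const url = event.notification.data?.url ?? '/';
    event.waitUntil((async () => {
      const wins = await sw.clients.matchAll({ type: 'window', includeUncontrolled: true });
      const existing = wins.find((c: any) => new URL(c.url).origin === sw.location.origin);
      if (existing) { await existing.focus(); await existing.navigate(url); }
      else await sw.clients.openWindow(url);
    })());
  });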

Relay (scripts/notification-relay.cjs):
- sendWebPush() with lazy-loaded web-push dep. 404/410 triggers
  deactivateChannel('web_push'). Missing VAPID env vars logs once
  and skips — other channels keep delivering.
- processEvent dispatch loop + drainHeldForUser both gain web_push
  branches

Settings UI (src/services/notifications-settings.ts):
- New 'Browser Push' tile with bell icon
- Enable button lazy-imports push-notifications, calls subscribe,
  renders 'Not supported' on Tauri/in-app webviews
- Remove button routes web_push specifically through
  unsubscribeFromPush so the browser side is cleaned up too

Env vars required on Railway services:
  VAPID_PUBLIC_KEY   public key
  VAPID_PRIVATE_KEY  private key
  VAPID_SUBJECT      mailto:support@worldmonitor.app (optional)

Public key is also committed as the default in src/config/push.ts
so the client bundle works without a build-time override.

Tests: 11 new cases in tests/brief-web-push.test.mjs
- base64 <-> Uint8Array round-trip + null guards
- VAPID default fallback when env absent
- SW push event rendering, requireInteraction gating, malformed JSON
  + no-data fallbacks
- SW notificationclick: openWindow vs focus+navigate, default url

154/154 tests pass. Both tsconfigs typecheck clean.

* fix(brief): address PR #3173 review findings + drop hardcoded VAPID

P1 (security): VAPID private key leaked in PR description.
Rotated the keypair. Old pair permanently invalidated. Structural fix:

  Removed DEFAULT_VAPID_PUBLIC_KEY entirely. Hardcoding the public
  key in src/config/push.ts gave rotations two sources of truth
  (code vs env) — exactly the friction that caused me to paste the
  private key in a PR description in the first place. VAPID_PUBLIC_KEY
  now comes SOLELY from VITE_VAPID_PUBLIC_KEY at build time.
  isWebPushConfigured() gates the subscribe flow so builds without
  the env var surface as 'Not supported' rather than crashing
  pushManager.subscribe.

  Operator setup (one-time):
    Vercel build:      VITE_VAPID_PUBLIC_KEY=<public>
    Railway services:  VAPID_PUBLIC_KEY=<public>
                       VAPID_PRIVATE_KEY=<private>
                       VAPID_SUBJECT=mailto:support@worldmonitor.app

  Rotation: update env on both sides, redeploy. No code change, no
  PR body — no chance of leaking a key in a commit.

P2: single-device fan-out — setWebPushChannelForUser replaces the
previous subscription silently. Per-device fan-out is a schema change
deferred to follow-up. Fix for now: surface the replacement in
settings UI copy ('Enabling here replaces any previously registered
browser.') so users who expect multi-device see the warning.

P2: 24h push TTL floods offline devices on reconnect. Event-type-aware:
  brief_ready:       12h  (daily editorial — still interesting)
  quiet_hours_batch:  6h  (by definition queued-on-wake)
  everything else:   30m  (transient alerts: noise after 30min)

REGRESSION test: VAPID_PUBLIC_KEY must be '' when env var is unset.
If a committed default is reintroduced, the test fails loudly.

11/11 web-push tests pass. Both tsconfigs typecheck clean.

* fix(notifications): deliver channel_welcome push for web_push connects (#3173 P2)

The settings UI queues a channel_welcome event on first web_push
subscribe (api/notification-channels.ts:240 via publishWelcome), but
processWelcome() in the relay only branched on slack/discord/email —
no web_push arm. The welcome event was consumed off the queue and
then silently dropped, leaving first-time subscribers with no
'connection confirmed' signal.

Fix: add a web_push branch to processWelcome. Calls sendWebPush with
eventType='channel_welcome' which maps to the 30-minute TTL tier in
the push-delivery switch — a welcome that arrives >30 min after
subscribe is noise, not confirmation.

Short body (under 80 chars) so Chrome/Firefox/Safari notification
shelves don't truncate it to an ellipsis.

11/11 web-push tests pass.

* fix(notifications): address two P1 review findings on #3173

P1-A: SSRF via user-supplied web_push endpoint.
The set-web-push edge handler accepted any https:// URL and wrote
it to Convex. The relay's sendWebPush() later POSTs to whatever
endpoint sits in that row, giving any Pro user a server-side-request
primitive bounded only by the relay's network egress.

Fix: isAllowedPushEndpointHost() allow-list in
api/notification-channels.ts. Only the four known browser
push-service hosts pass:

  fcm.googleapis.com                (Chrome / Edge / Brave)
  updates.push.services.mozilla.com (Firefox)
  web.push.apple.com                (Safari, macOS 13+)
  *.notify.windows.com              (Windows Notification Service)

Fail-closed: unknown hosts rejected with 400 before the row ever
reaches Convex. If a future browser ships a new push service we'll
need to widen this list (guarded by the SSRF regression tests).
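
A minimal sketch of the fail-closed host check; the real helper in
api/notification-channels.ts may differ in detail:

  const EXACT_PUSH_HOSTS = new Set([
    'fcm.googleapis.com',                  // Chrome / Edge / Brave
    'updates.push.services.mozilla.com',   // Firefox
    'web.push.apple.com',                  // Safari
  ]);

  function isAllowedPushEndpointHost(endpoint: string): boolean {
    let url: URL;
    try { url = new URL(endpoint); } catch { return false; }
    if (url.protocol !== 'https:') return false;
    if (EXACT_PUSH_HOSTS.has(url.hostname)) return true;
    return url.hostname.endsWith('.notify.windows.com');   // WNS uses per-tenant subdomains
  }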

P1-B: cross-account endpoint reuse on shared devices.
The browser's PushSubscription is bound to the origin, NOT to the
Clerk session. User A subscribes on device X, signs out, user B
signs in on X and subscribes — the browser hands out the SAME
endpoint/p256dh/auth triple. The previous setWebPushChannelForUser
upsert keyed only by (userId, channelType), so BOTH rows now carry
the same endpoint. Every push the relay fans out for user A also
lands on device X which is now showing user B's session.

Fix: setWebPushChannelForUser scans all web_push rows and deletes
any that match the new endpoint BEFORE upserting. Effectively
transfers ownership of the subscription to the current caller.
The previous user will need to re-subscribe on that device if they
sign in again.

No endpoint-based index on notificationChannels — the scan happens
at <10k rows and is well-bounded to the one write-path per user
per connect. If volume grows, add an endpoint index + migration.

Regression tests (tests/brief-web-push.test.mjs, 3 new cases):
  - allow-list defines all four browser hosts + fail-closed return
  - allow-list is invoked BEFORE convexRelay() in the handler
  - setWebPushChannelForUser compares + deletes rows by endpoint

14/14 web-push tests pass. Both tsconfigs typecheck clean.
2026-04-18 20:27:08 +04:00
Sebastien Melki
bc91c61a87 [codex] guard duplicate subscription checkout (#3162)
* guard duplicate subscription checkout

* address checkout guard review feedback
2026-04-18 15:19:34 +04:00
Elie Habib
64c906a406 feat(eia): gold-standard /api/eia/petroleum (Railway seed → Redis → Vercel reads only) (#3161)
* feat(eia): move /api/eia/petroleum to gold-standard (Railway seed → Redis → Vercel reads only)

Live api.eia.gov fetches from the Vercel edge function were causing
FUNCTION_INVOCATION_TIMEOUT 504s on /api/eia/petroleum (Sydney edge →
US origin with no timeout, no cache, no stale fallback — one EIA blip
blew the 25s budget).

- New seeder scripts/seed-eia-petroleum.mjs — fetches WTI/Brent/
  production/inventory from api.eia.gov with per-fetch 15s timeouts,
  writes energy:eia-petroleum:v1 with the {_seed, data} envelope.
  Accepts 1-of-4 series; 0-of-4 routes to contract-mode RETRY so
  seed-meta stays stale and the bundle retries on next cron.
- Bundled into seed-bundle-energy-sources.mjs (daily, 90s timeout) —
  no new Railway service needed.
- Rewrote api/eia/[[...path]].js as a Redis-only reader via
  readJsonFromUpstash. Same response shape for backward compat with
  widgets/MCP/external callers. 503 + Retry-After on miss (never 504).
- Registered eiaPetroleum in api/health.js STANDALONE_KEYS + gated as
  ON_DEMAND_KEYS for the deploy window; promote to SEED_META
  (maxStaleMin: 4320) in a follow-up after ~7 days of clean cron.
- Tests: 14 seeder unit tests + 9 edge handler tests.

Audit result: /api/eia/petroleum was the only Vercel route fetching
dashboard data live. Every other fetch(https://…) in api/ is
auth/payments/notifications/user-initiated enrichment.

* fix(eia): close silent-stale window — add SEED_META + seed-health registration

Review finding on PR #3161: without a SEED_META entry, readSeedMeta
returns seedStale: null and classifyKey never reaches STALE_SEED.
That meant a broken Railway cron or missing EIA_API_KEY after the first
successful seed would keep /api/eia/petroleum serving stale data for
up to 7 days (TTL) while /api/health reported OK.

- api/health.js: add SEED_META.eiaPetroleum with maxStaleMin=4320
  (72h = 3× daily bundle cadence). Keep eiaPetroleum in ON_DEMAND_KEYS
  so the Vercel-instant / Railway-delayed deploy window doesn't CRIT
  on first seed, but stale-after-seed now properly fires STALE_SEED.
- api/seed-health.js: register energy:eia-petroleum in SEED_DOMAINS
  (intervalMin=1440) so the secondary health endpoint reports it too.
- Updated ON_DEMAND_KEYS comment to reflect freshness is now enforced.
2026-04-18 14:40:00 +04:00
Elie Habib
de769ce8e1 fix(api): unblock Pro API clients at edge + accept x-api-key alias (#3155)
* fix(api): unblock Pro API clients at edge + accept x-api-key alias

Fixes #3146: Pro API subscriber getting 403 when calling from Railway.
Two independent layers were blocking server-side callers:

1. Vercel Edge Middleware (middleware.ts) blocks any UA matching
   /bot|curl\/|python-requests|go-http|java\//, which killed every
   legitimate server-to-server API client before the gateway even saw
   the request. Add bypass: requests carrying an `x-worldmonitor-key`
   or `x-api-key` header that starts with `wm_` skip the UA gate.
   The prefix is a cheap client-side signal, not auth — downstream
   server/gateway.ts still hashes the key and validates against the
   Convex `userApiKeys` table + entitlement check.

2. Header name mismatch. Docs/gateway only accepted
   `X-WorldMonitor-Key`, but most API clients default to `x-api-key`.
   Accept both header names in:
     - api/_api-key.js (legacy static-key allowlist)
     - server/gateway.ts (user-issued Convex-backed keys)
     - server/_shared/premium-check.ts (isCallerPremium)
   Add `X-Api-Key` to CORS Allow-Headers in server/cors.ts and
   api/_cors.js so browser preflights succeed.

Follow-up outside this PR (Cloudflare dashboard, not in repo):
- Extend the "Allow api access with WM" custom WAF rule to also match
  `starts_with(http.request.headers["x-api-key"][0], "wm_")`, so CF
  Managed Rules don't block requests using the x-api-key header name.
- Update the api-cors-preflight CF Worker's corsHeaders to include
  `X-Api-Key` (memory: cors-cloudflare-worker.md — Worker overrides
  repo CORS on api.worldmonitor.app).

* fix(api): tighten middleware bypass shape + finish x-api-key alias coverage

Addresses review findings on #3155:

1. middleware.ts bypass was too loose. "Starts with wm_" let any caller
   send X-Api-Key: wm_fake and skip the UA gate, shifting unauthenticated
   scraper load onto the gateway's Convex lookup. Tighten to the exact
   key format emitted by src/services/api-keys.ts:generateKey —
   `^wm_[a-f0-9]{40}$` (wm_ + 20 random bytes as hex). Still a cheap
   edge heuristic (no hash lookup in middleware), but raises spoofing
   from trivial prefix match to a specific 43-char shape (see the
   sketch after this list).

2. Alias was incomplete on bespoke endpoints outside the shared gateway:
   - api/v2/shipping/route-intelligence.ts: async wm_ user-key fallback
     now reads X-Api-Key as well
   - api/v2/shipping/webhooks.ts: webhook ownership fingerprint now
     reads X-Api-Key as well (same key value → same SHA-256 → same
     ownerTag, so a user registering with either header can manage
     their webhook from the other)
   - api/widget-agent.ts: accept X-Api-Key in the auth read AND in the
     OPTIONS Allow-Headers list
   - api/chat-analyst.ts: add X-Api-Key to the OPTIONS Allow-Headers
     list (auth path goes through shared helpers already aliased)
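
A minimal sketch of the tightened bypass from item 1, assuming Vercel
Edge Middleware conventions; only the shape check is shown, the UA
gate and real key validation are elided:

  // Cheap shape heuristic only; real auth still happens in server/gateway.ts.
  const USER_KEY_SHAPE = /^wm_[a-f0-9]{40}$/;

  function hasPlausibleUserKey(req: Request): boolean {
    const key = req.headers.get('x-worldmonitor-key') ?? req.headers.get('x-api-key');
    return key !== null && USER_KEY_SHAPE.test(key);
  }

  // In middleware: if hasPlausibleUserKey(request) is true, skip the
  // bot-UA block and let the gateway validate the key properly.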
2026-04-18 08:18:49 +04:00
Elie Habib
d46c170012 feat(brief): hosted magazine edge route + latest-brief preview RPC (Phase 2) (#3153)
* feat(brief): hosted magazine edge route + latest-brief preview RPC (Phase 2)

Phase 2 of the WorldMonitor Brief plan (docs/plans/2026-04-17-003).
Adds the read path that every downstream delivery channel binds to.
Phase 1 shipped the renderer + envelope contract; Phase 3 (composer)
will write brief:{userId}:{issueDate} to Redis. Until then the route
returns a neutral "expired" page for every request — intentional, safe
to deploy ahead of the composer.

Files:

- server/_shared/brief-url.ts
  HMAC-SHA256 sign + verify helpers using Web Crypto (edge-compatible,
  no node:crypto). Rotation supported via optional prevSecret in
  verifyBriefToken. Constant-time comparison. Strict userId and
  YYYY-MM-DD shape validation before any crypto operation. Typed
  BriefUrlError with code enum for caller branches.

- api/brief/[userId]/[issueDate].ts  (Vercel edge)
  Auth model: the HMAC-signed ?t= token IS the credential. No Clerk
  session required — URLs are delivered over already-authenticated
  channels (push, email, dashboard panel). Flow: verify token →
  Redis GET brief:{userId}:{issueDate} → renderBriefMagazine → HTML.
  403 on bad token, 404 on Redis miss, 503 on missing secret. Never
  echoes the userId in error pages.

- api/latest-brief.ts  (Clerk JWT + PRO gated, Vercel edge)
  Returns { issueDate, dateLong, greeting, threadCount, magazineUrl }
  or { status: 'composing', issueDate } when Redis is cold. Mirrors
  the auth + entitlement pattern in api/notify.ts. magazineUrl is
  freshly signed per request against the current
  BRIEF_URL_SIGNING_SECRET so rotation takes effect immediately.

- tests/brief-url.test.mjs
  20 tests: round-trip, tampering, wrong user/date/secret, malformed
  token/inputs, rotation (accept prev secret + reject after removal),
  URL composition (trailing-slash trim, path encoding).

Quality gates: typecheck + typecheck:api clean, biome lint clean,
render tests still 30/30, new HMAC tests 20/20.

PRE-MERGE: BRIEF_URL_SIGNING_SECRET must be set in Vercel before this
deploys. The route returns 503 (not 500) with a server-side log when
the secret is missing, so a mis-configured deploy is safe to roll.

* fix(brief): address Phase 2 review — broken import + HEAD body + host reflection

Addresses todos 212–217 from the /ce-review on PR #3153.

P0 / P1 blockers:

- todo 212 (P0): renderer import path was broken — pointed at
  shared/render-brief-magazine.js which no longer exists (Phase 1
  review moved it to server/_shared/brief-render.js). Route would
  have 500'd on every successful token verification; @ts-expect-error
  silenced the error at compile time. Fixed the path and removed the
  now-unnecessary @ts-expect-error since brief-render.d.ts exists.
  New tests/brief-edge-route-smoke.test.mjs imports the handler so a
  future broken path cannot pass CI.

- todo 213 (P1): HEAD returned the full HTML body (RFC 7231
  violation). htmlResponse() now takes the request and emits null
  body when method is HEAD.

- todo 214 (P1): publicBaseUrl reflected x-forwarded-host, allowing a
  signed magazineUrl to point at a non-canonical origin (preview
  deploy, forwarded-host spoof). Now pinned to
  WORLDMONITOR_PUBLIC_BASE_URL env var, falling back to the request
  origin only in dev. This env var joins BRIEF_URL_SIGNING_SECRET on
  the pre-merge Vercel checklist.

P2 cleanups:

- todo 215: dropped dead 'invalid_token_shape' enum member from
  BriefUrlError — never thrown (verifyBriefToken returns false on
  shape failure by design).

- todo 216: consolidated EXPIRED_PAGE + FORBIDDEN_PAGE (and added
  UNAVAILABLE_PAGE) behind a single renderErrorPage(heading, body)
  helper with shared styles.

- todo 217: extracted readRawJsonFromUpstash() into api/_upstash-json.js
  so api/brief and api/latest-brief share one 3-second-timeout raw-GET
  implementation. Dodges the readJsonFromUpstash unwrap that strips
  seed envelopes (our brief envelope is flat, not seed-framed).

Also:
- brief-url.ts header now documents the rotation runbook + the
  emergency kill-switch path (rotate SECRET without setting PREV).
- distinguishable log tag for malformed-envelope vs Redis-miss
  (composer-bug grep vs expired-key grep).

Deferred (todo 218, P2): token-in-URL log leak is a design-level
issue that wants a cookie-based first-visit exchange or iat/exp in
the signed payload. Out of scope for this fix commit.

Tests: 55/55 (20 HMAC + 30 render + 5 edge-route smoke). Typecheck
clean, biome lint clean.

* fix(brief): distinguish infra failure from miss + walk tz boundary

Addresses two additional P1 review findings on PR #3153.

1. readRawJsonFromUpstash collapsed every failure mode (missing creds,
   HTTP non-2xx, timeout, JSON parse failure, genuine miss) into a
   single null return. Both edge routes then treated null as empty
   state: api/brief → 404 "expired", api/latest-brief → 200
   "composing". During any Upstash outage a user with a valid brief
   would see misleading empty-state UX.

   Helper now throws on every infrastructure/parse failure and
   returns null ONLY on a genuine miss (Upstash replied 200 with
   result: null). Callers:
     - api/brief catches → 503 UNAVAILABLE_PAGE
     - api/latest-brief catches → 503 { error: service_unavailable }

2. api/latest-brief probed today UTC only. For users ahead of or
   behind UTC, a ready brief could be invisible for 12-24h around
   the date boundary.

   Route now accepts an optional ?date=YYYY-MM-DD query param so the
   dashboard panel (or any client) can send its local date directly.
   When ?date= is absent, we walk today UTC → yesterday UTC and
   return the most recent hit; composing is only reported when both
   slots miss. Malformed ?date= returns 400.

Tests added:
- helper-level: missing creds → throws, HTTP error → throws,
  genuine miss → null
- route-level: api/brief returns 503 (not 404) on Upstash HTTP error

Final suite: 60/60 (20 HMAC + 30 render + 10 smoke). Typecheck +
lint clean.

* fix(brief): tomorrow-UTC probe + unified envelope validation

Addresses two third-round review findings on PR #3153.

1. Walk-back missed the tomorrow-UTC slot. Users at positive tz
   offsets (UTC+1..UTC+14) store today's brief under tomorrow UTC;
   the previous [today, yesterday] probe never saw them. Fixed to
   walk [tomorrow, today, yesterday] UTC — covers the full tz range
   and naturally prefers the most recently composed slot. Also
   correctly echoes the caller's ?date= in composing responses
   instead of always echoing today UTC (see the sketch after this
   list).

2. readBriefPreview validated only dateLong + digest.greeting +
   stories (array). The renderer's assertBriefEnvelope is much
   stricter (closed-key contract, version guard, cross-field
   surfaced === stories.length, per-story 7 required fields, etc.).
   A partial envelope could be reported "ready" by the preview and
   404-expired on click. Exported assertBriefEnvelope from the
   renderer module, called it in the preview reader. An envelope
   that would render successfully is exactly an envelope that the
   preview reports as ready; any divergence is logged as a composer
   bug and surfaced as "composing".
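
A minimal sketch of the walk from item 1; only the candidate-date
computation is shown, the probe itself is the existing Redis read:

  const DAY_MS = 86_400_000;
  const utcDate = (offsetDays: number): string =>
    new Date(Date.now() + offsetDays * DAY_MS).toISOString().slice(0, 10);

  // Tomorrow, today, yesterday (UTC) covers readers from UTC-12 to UTC+14;
  // report 'composing' only when all three slots miss.
  const candidates = [utcDate(1), utcDate(0), utcDate(-1)];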

Tests: 62/62 (20 HMAC + 30 render + 12 smoke). New cases cover the
shared-validator export and catch the "partial envelope slips past
weak preview" regression.

Typecheck + lint clean.
2026-04-18 07:28:49 +04:00
Elie Habib
f44b3260f4 fix(relay): harden Telegram session lifecycle + add health monitoring (#3152)
* fix(relay): harden Telegram session lifecycle + add health monitoring

Three fixes for the AUTH_KEY_DUPLICATED outage that silently emptied
the Telegram Intel panel with no health signal:

1. Increase disconnect timeout from 3s to 10s in gracefulShutdown,
   and default TELEGRAM_STARTUP_DELAY_MS from 60s to 120s. The 3s
   timeout was too aggressive for the MTProto disconnect handshake,
   allowing the old session to linger past the new container's
   startup delay window, causing AUTH_KEY_DUPLICATED.

2. Register telegramFeed in health.js (STANDALONE_KEYS + SEED_META
   with maxStaleMin=10). The relay now writes both a data key and
   seed-meta on each successful poll cycle. When the poll stops
   (session invalidated, package missing, FLOOD_WAIT), the key goes
   stale within 10 minutes and surfaces as STALE_SEED in the global
   health dashboard instead of silently showing "No messages available"
   in the panel indefinitely.

3. Add destroyTelegramClient() that nulls the client reference AND
   tears down the MTProto sender's internal reconnect loop and
   underlying socket. The library's autonomous reconnect mechanism
   continued running after AUTH_KEY_DUPLICATED, spamming recv loop
   crash/reconnect log lines every 90s even though
   telegramPermanentlyDisabled was true and no polls were running.

* fix(relay): stagger telegram Redis TTLs so STALE_SEED fires before EMPTY

Data key 1800s (30min), seed-meta 900s (15min). With maxStaleMin=10,
the health timeline is:
  0-10min after last poll: OK (meta fresh, data present)
  10-15min: STALE_SEED (meta older than maxStaleMin, data still present)
  15-30min: EMPTY (meta expired, data still present but records=0)
  30min+:   EMPTY (both expired)

Previously both keys had 600s TTL, so they expired together and
health jumped straight from OK to EMPTY with no stale window.

* fix(relay): destroy locally-created client on init failure

connect() throws before telegramState.client is assigned, so
destroyTelegramClient() saw null and the leaked client's MTProto
sender kept its autonomous reconnect loop running. Now: hoist
the client variable, assign it to telegramState.client before
calling destroyTelegramClient() on any connect failure (not just
AUTH_KEY_DUPLICATED) so the socket and sender are always torn down.
2026-04-18 00:29:11 +04:00
Elie Habib
1cf249c2f8 fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId (#3143)
* fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId

Closes todo #196 (activation blocker for IMPORTANCE_SCORE_LIVE=1).

Before this fix, any authenticated Pro user could POST to /api/notify with
`payload.importanceScore: 100` and `severity: 'critical'`, bypassing the
relay's IMPORTANCE_SCORE_MIN gate and fan-out would hit every Pro user with
matching rules (no userId filter). This was a pre-existing vulnerability
surfaced during the scoring pipeline work in PR #3069.

Two changes:

1. api/notify.ts — strip `importanceScore` and `corroborationCount` from
   the user-submitted payload before publishing to wm:events:queue. These
   fields are relay-internal (computed by ais-relay's scoring pipeline).
   Also validates `severity` against the known allowlist (critical, high,
   medium, low, info) instead of accepting any string.

2. scripts/notification-relay.cjs — scope rule matching: if the event
   carries `event.userId` (browser-submitted via /api/notify), only match
   rules where `rule.userId === event.userId`. Relay-emitted events (from
   ais-relay, regional-snapshot) have no userId and continue to fan out to
   all matching Pro users. This prevents a single user from broadcasting
   crafted events to every other Pro subscriber's notification channels.

Net effect: browser-submitted events can only reach the submitting user's
own Telegram/Slack/Email/webhook channels, and cannot carry an injected
importanceScore.

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): reject internal relay control events from /api/notify

Review found that `flush_quiet_held` and `channel_welcome` are internal
relay control events (dispatched by Railway cron scripts) that the public
/api/notify endpoint accepted because only eventType length was checked.
A Pro user could POST `{"eventType":"flush_quiet_held","payload":{},
"variant":"full"}` to force-drain their held quiet-hours queue on demand,
bypassing batch_on_wake behavior.

Now returns 403 for reserved event types. The denylist approach (vs
allowlist) is deliberate: new user-facing event types shouldn't require
an API change to work, while new internal events must explicitly be
added to the deny set if they carry privileged semantics.

* fix(security): exempt browser events from score gate + hoist Sets to module scope

Two review findings from Greptile on PR #3143:

P1: Once IMPORTANCE_SCORE_LIVE=1 activates, browser-submitted rss_alert
events (which had importanceScore stripped by the first commit) would
evaluate to score=0 at the relay's top-level gate and be silently
dropped before rule matching. Fix: add `&& !event.userId` to the gate
condition — browser events carry userId and have no server-computed
score, so the gate must not apply to them. Relay-emitted events (no
userId, server-computed score) are still gated as before.

P2: VALID_SEVERITIES and INTERNAL_EVENT_TYPES Sets were allocated inside
the handler on every request. Hoisted to module scope.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 11:14:25 +04:00
Sebastien Melki
a4d9b0a5fa feat(auth): user-facing API key management (create / list / revoke) (#3125)
* feat(auth): user-facing API key management (create / list / revoke)

Adds full-stack API key management so authenticated users can create,
list, and revoke their own API keys from the Settings UI.

Backend:
- Convex `userApiKeys` table with SHA-256 key hash storage
- Mutations: createApiKey, listApiKeys, revokeApiKey
- Internal query validateKeyByHash + touchKeyLastUsed for gateway
- HTTP endpoints: /api/api-keys (CRUD) + /api/internal-validate-api-key
- Gateway middleware validates user-owned keys via Convex + Redis cache

Frontend:
- New "API Keys" tab in UnifiedSettings (visible when signed in)
- Create form with copy-on-creation banner (key shown once)
- List with prefix display, timestamps, and revoke action
- Client-side key generation + hashing (plaintext never sent to DB)

Closes #3116

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address PR review — cache invalidation, prefix validation, revoked-key guard

- Invalidate Redis cache on key revocation so gateway rejects revoked keys
  immediately instead of waiting for 5-min TTL expiry (P1)
- Enforce `wm_` prefix format with regex instead of loose length check (P2)
- Skip `touchKeyLastUsed` for revoked keys to preserve clean audit trail (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address consolidated PR review (P0–P3)

P0: gate createApiKey on pro entitlement (tier >= 1); isCallerPremium
now verifies key-owner tier instead of treating existence as premium.

P1: wire wm_ user keys into the domain gateway auth path with async
Convex-backed validation; user keys go through entitlement checks
(only admin keys bypass). Lower cache TTL 300s → 60s and await
revocation cache-bust instead of fire-and-forget.

P2: remove dead HTTP create/list/revoke path from convex/http.ts;
switch to cachedFetchJson (stampede protection, env-prefixed keys,
standard NEG_SENTINEL); add tenancy check on cache-invalidation
endpoint via new /api/internal-get-key-owner route; add 22 Convex
tests covering tier gate, per-user limit, duplicate hash, ownership
revoke guard, getKeyOwner, and touchKeyLastUsed debounce.

P3: tighten keyPrefix regex to exactly 5 hex chars; debounce
touchKeyLastUsed (5 min); surface PRO_REQUIRED in UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): gate on apiAccess (not tier), wire wm_ keys through edge routes, harden error paths

- Gate API key creation/validation on features.apiAccess instead of tier >= 1.
  Pro (tier 1, apiAccess=false) can no longer mint keys — only API_STARTER+.
- Wire wm_ user keys through standalone edge routes (shipping/route-intelligence,
  shipping/webhooks) that were short-circuiting on validateApiKey before async
  Convex validation could run.
- Restore fail-soft behavior in validateUserApiKey: transient Convex/network
  errors degrade to unauthorized instead of bubbling a 500.
- Fail-closed on cache invalidation endpoint: ownership check errors now return
  503 instead of silently proceeding (surfaces Convex outages in logs).
- Tests updated: positive paths use api_starter (apiAccess=true), new test locks
  Pro-without-API-access rejection. 23 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(webhooks): remove wm_ user key fallback from shipping webhooks

Webhook ownership is keyed to SHA-256(apiKey) via callerFingerprint(),
not to the user. With user-owned keys (up to 5 per user), this causes
cross-key blindness (webhooks invisible when calling with a different
key) and revoke-orphaning (revoking the creating key makes the webhook
permanently unmanageable). User keys remain supported on the read-only
route-intelligence endpoint. Webhook ownership migration to userId will
follow in a separate PR.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-17 07:20:39 +04:00
Elie Habib
935417e390 chore(relay): socialVelocity + wsbTickers to hourly fetch (6x Reddit traffic reduction) (#3135)
* chore(relay): socialVelocity + wsbTickers to hourly fetch (was 10min)

Reduce Reddit rate-limiting blast radius. Both seeders fetch 5 subreddits
combined (2 for SV: worldnews, geopolitics; 3 for WSB: wallstreetbets,
stocks, investing) with no proxy or OAuth. Reddit's behavioral heuristic
for datacenter IPs consistently flags the Railway IP after ~50min of
10-min polling and returns HTTP 403 on every subsequent cycle until the
container restarts with a new IP.

Evidence (2026-04-16 ais-relay log):
  13:32-14:22 UTC: 6 successful 10-min cycles for both seeders
  16:06-16:16 UTC: 2 more successful cycles after a restart
  16:26 UTC:       BOTH subs flip to HTTP 403 simultaneously
  16:36, 16:46, 16:56: every cycle, all 5 subreddits return 403

Dropping success-path frequency from 6/hour to 1/hour cuts the traffic
Reddit's heuristic sees by 6x. On the failure path the 20-min retry is kept
as-is — during a block we've already been flagged, so extra retries don't
make it worse.

Changes:
- SOCIAL_VELOCITY_INTERVAL_MS:   10min → 60min
- SOCIAL_VELOCITY_TTL:           30min → 3h   (3× new interval)
- WSB_TICKERS_INTERVAL_MS:       10min → 60min
- WSB_TICKERS_TTL:               30min → 3h   (3× new interval)
- api/health.js maxStaleMin:     30min → 180min for both (3× interval)
- api/seed-health.js intervalMin: 15 → 90 for wsb-tickers (maxStaleMin / 2)

Proper fix (proxy fallback or Reddit OAuth) deferred.

* fix(seed-health): add socialVelocity parity entry — greptile P2

Review finding on PR #3135: wsbTickers was bumped from intervalMin=15 to 90
but socialVelocity had no seed-health.js entry at all. Both Reddit seeders
now share the same 60-min cadence; adding the missing entry gives parity.

P2-1 (malformed comment lines 5682-5683) is a false positive — verified
the lines do start with '//' in the file.
2026-04-16 22:17:58 +04:00
Elie Habib
7d27cec21c feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news (#3133)
* feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news

Detect silent relay-loop failures (ERR_MODULE_NOT_FOUND at import, event-loop
blocked, container restart loop) up to 4 hours earlier than the data-level
seed-meta staleness window.

The chokepoint-flows bug that motivated this PR was invisible in health for
32 hours because each 6h cron tick fired, execFile'd the child, child died
at import, and NO ONE updated seed-meta:energy:chokepoint-flows. Since the
last successful write was still within its 3-day TTL, the data key was
present and the old seed-meta was still there — STALE_SEED triggered only
at +12h, and even then was a warn (not crit) that could easily be missed.

Fix:
- In scripts/ais-relay.cjs, write a success-only heartbeat via upstashSet
  after each execFile-spawned seeder exits cleanly. TTL = 3x the loop
  interval (18h for chokepoint-flows, 90min for climate-news) so a single
  missed cycle doesn't flap but two consecutive misses alarm.
  Payload shape matches seed-meta for drop-in compatibility with the
  existing health-check reader: { fetchedAt, recordCount, durMs }.

- In api/health.js, register two new STANDALONE_KEYS entries pointing at
  the heartbeat keys, plus SEED_META entries with tighter maxStaleMin:
    chokepointFlowsRelayHeartbeat: 480min (8h vs 720min existing)
    climateNewsRelayHeartbeat:     60min  (vs 90min existing)
  When the relay loop fails for >2 intervals, the heartbeat goes stale
  first and surfaces as STALE_SEED in /api/health, giving 4h more notice
  than waiting for seed-meta:energy:chokepoint-flows.
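
A minimal sketch of the success-only heartbeat write from the first fix bullet
(upstashSet signature assumed):

  // Existing Upstash REST helper in ais-relay.cjs; exact signature assumed.
  declare function upstashSet(key: string, value: string, ttlSeconds: number): Promise<void>;

  // Written only after an execFile-spawned seeder exits cleanly. TTL = 3x the
  // loop interval, so one missed cycle doesn't flap but two consecutive misses
  // go stale in /api/health.
  async function writeRelayHeartbeat(
    name: string,            // e.g. 'chokepoint-flows'
    intervalMs: number,
    recordCount: number,
    durMs: number,
  ): Promise<void> {
    await upstashSet(
      `relay:heartbeat:${name}`,
      JSON.stringify({ fetchedAt: new Date().toISOString(), recordCount, durMs }),
      Math.round((intervalMs * 3) / 1000),
    );
  }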

This is orthogonal to PR #3132 (fixes the actual ERR_MODULE_NOT_FOUND root
cause). Heartbeat is defensive observability for the NEXT failure mode we
can't predict.

* fix(health): gate new relay heartbeat keys as ON_DEMAND during deploy window — greptile review

Review finding on PR #3133: new heartbeat keys (relay:heartbeat:chokepoint-flows,
relay:heartbeat:climate-news) are written by ais-relay.cjs AFTER the first
successful post-deploy loop. Vercel deploys api/health.js instantly, so the
window between 'merge' and 'first heartbeat written' is:
  - chokepoint-flows: up to 6h (initial loop tick)
  - climate-news:     up to 30min

During that window the heartbeat keys don't exist in Redis. classifyKey()
would return EMPTY (crit), which counts toward critCount and can flip overall
/api/health to DEGRADED even though climateNews and chokepointFlows data
themselves are fine.

Matches existing rule in project memory
(feedback_health_required_key_needs_railway_cron_first.md) — new seeder +
health.js registration in same PR needs ON_DEMAND gating until the Railway
side catches up, then harden after ~7 days.

Fix: add both keys to ON_DEMAND_KEYS with TRANSITIONAL comments, matching
the fxYoy / hyperliquidFlow pattern already used for the same issue.
2026-04-16 18:21:51 +04:00
Elie Habib
da1fa3367b fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta (Slice B) (#3124)
* fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta

Slice B follow-up to PR #3121. Three coupled production failures observed:

1. Per-country score persistence works (Slice A), but the 222-SET single
   pipeline body (~600KB) exceeds REDIS_PIPELINE_TIMEOUT_MS (5s) on Vercel
   Edge. runRedisPipeline returns []; persistence guard correctly returns
   empty; coverage = 0/222 < 75%; ranking publish silently dropped. Live
   Railway log: "Ranking: 0 ranked, 222 greyed out" → "Rebuilt … with 222
   countries (bulk-call race left ranking:v9 null)" — second call only
   succeeded because Upstash had finally caught up between attempts.

2. The seeder's probe + rebuild block lives inside `if (missing > 0)`. When
   per-country scores survive a cron tick (TTL 6h, cron every 6h), missing=0
   and the rebuild path is skipped. Ranking aggregate then expires alone and
   is never refreshed until scores also expire — multi-hour gaps where
   `resilience:ranking:v9` is gone while seed-meta still claims freshness.

3. `writeRankingSeedMeta` fires whenever finalWarmed > 0, regardless of
   whether the ranking key is actually present. Health endpoint sees fresh
   meta + missing data → EMPTY_ON_DEMAND with a misleading seedAge.

Fixes:
- _shared.ts: split the warm pipeline SET into SET_BATCH=30-command chunks
  so each pipeline body fits well under timeout. Pad missing-batch results
  with empty entries so the per-command alignment stays correct (failed
  batches stay excluded from `warmed`, no proof = no claim).
- seed-resilience-scores.mjs: extract `ensureRankingPresent` helper, call
  it from BOTH the missing>0 and missing===0 branches so the ranking gets
  refreshed every cron. Add a post-rebuild STRLEN verification — rebuild
  HTTP can return 200 with a payload but still skip the SET (coverage gate,
  pipeline failure).
- main(): only writeRankingSeedMeta when result.rankingPresent === true.
  Otherwise log and let the next cron retry.
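
A minimal sketch of the chunked warm SET from the first fix (command shape and
helper signature are assumed):

  // Existing Upstash REST pipeline helper; exact signature assumed.
  declare function runRedisPipeline(commands: string[][]): Promise<unknown[]>;

  const SET_BATCH = 30; // commands per pipeline body, comfortably under the 5s timeout

  // Chunk the warm SETs so no single pipeline body approaches
  // REDIS_PIPELINE_TIMEOUT_MS. A failed or short batch is padded with empty
  // entries so per-command alignment survives; its keys simply never count as
  // warmed (no proof = no claim). A later commit in this PR runs the
  // independent batches through Promise.all instead of awaiting them in order.
  async function warmInBatches(commands: string[][]): Promise<unknown[]> {
    const results: unknown[] = [];
    for (let i = 0; i < commands.length; i += SET_BATCH) {
      const batch = commands.slice(i, i + SET_BATCH);
      const res = await runRedisPipeline(batch).catch(() => null);
      results.push(...(res && res.length === batch.length ? res : batch.map(() => null)));
    }
    return results;
  }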

Tests:
- resilience-ranking.test.mts: assert pipelines stay ≤30 commands.
- resilience-scores-seed.test.mjs: structural checks that the rebuild is
  hoisted (≥2 callsites of ensureRankingPresent), STRLEN verification is
  present, and meta write is gated on rankingPresent.

Full resilience suite: 373/373 pass (was 370 — 3 new tests).

* fix(resilience-ranking): seeder no longer writes seed-meta (handler is sole writer)

Reviewer P1: ensureRankingPresent() returning true only means the ranking
key exists in Redis — not that THIS cron actually wrote it. The handler
skips both the ranking SET and the meta SET when coverage < 75%, so an
older ranking from a prior cron can linger while this cron's data didn't
land. Under that scenario, the previous commit still wrote a fresh
seed-meta:resilience:ranking, recreating the stale-meta-over-stale-data
failure this PR is meant to eliminate.

Fix: remove seeder-side seed-meta writes entirely. The ranking handler
already writes ranking + meta atomically in the same pipeline when (and
only when) coverage passes the gate. ensureRankingPresent() triggers the
handler every cron, which addresses the original rationale for the seeder
heartbeat (meta going stale during quiet Pro usage) without the seeder
needing to lie.

Consequence on failure:
- Coverage gate trips → handler writes neither ranking nor meta.
- seed-meta stays at its previous timestamp; api/health reports accurate
  staleness (STALE_SEED after maxStaleMin, then CRIT) instead of a fresh
  meta over stale/empty data.

Tests updated: the "meta gated on rankingPresent" assertion is replaced
with "seeder must not SET seed-meta:resilience:ranking" + "no
writeRankingSeedMeta". Comments may still reference the key name for
maintainer clarity — the assertion targets actual SET commands.

Full resilience suite: 373/373 pass.

* fix(resilience-ranking): always refresh + 12h TTL (close timing hole)

Reviewer P1+P2:

- P1: ranking TTL == cron interval (both 6h) left a timing hole. If a cron
  wrote the key near the end of its run and the next cron fired near the
  start of its interval, the key was still alive at probe time →
  ensureRankingPresent() returned early → no rebuild → key expired a short
  while later and stayed absent until a cron eventually ran while the key
  was missing. Multi-hour EMPTY_ON_DEMAND gaps.

- P2: probing only the ranking data key (not seed-meta) meant a partial
  handler pipeline (ranking SET ok, meta SET missed) would self-heal only
  when the ranking itself expired — never during its TTL window.

Fix:

1. Bump RESILIENCE_RANKING_CACHE_TTL_SECONDS from 6h to 12h (2x cron
   interval). A single missed or slow cron no longer causes a gap.
   Server-side and seeder-side constants kept in sync.

2. Replace ensureRankingPresent() with refreshRankingAggregate(): drop the
   'if key present, skip' short-circuit. Rebuild every cron, unconditionally.
   One cheap HTTP call keeps ranking + seed-meta rolling forward together
   and self-heals the partial-pipeline case — handler retries the atomic
   pair every 6h regardless of whether the keys are currently live.

3. Update health.js comment to reflect the new TTL and refresh cadence
   (12h data TTL, 6h refresh, 12h staleness threshold = 2 missed ticks).

Tests:
- RESILIENCE_RANKING_CACHE_TTL_SECONDS asserts 12h (was 6h).
- New assertion: refreshRankingAggregate must NOT early-return on probe-
  hit, and the rebuild HTTP call must be unconditional in its body.
- DEL-guard test relaxed to allow comments between '{' and the DEL line
  (structural property preserved).

Full resilience suite: 375/375.

* fix(resilience-ranking): parallelize warm batches + atomic rebuild via ?refresh=1

Reviewer P2s:

- Warm path serialized the 8 batch pipelines with `await` in a for-loop,
  adding ~7 extra Upstash round-trips (100-500ms each on Edge) to the warm
  wall-clock. Batches are independent; Promise.all collapses them into one
  slowest-batch window.

- DEL+rebuild created a brief absence window: if the rebuild request failed
  transiently, the ranking stayed absent until the next cron. Now seeder
  calls `/api/resilience/v1/get-resilience-ranking?refresh=1` and the
  handler bypasses its cache-hit early-return, recomputing and SETting
  atomically. On rebuild failure, the existing (possibly stale-but-present)
  ranking is preserved instead of being nuked.

Handler: read ctx.request.url for the refresh query param; guard the URL
parse with try/catch so an unparseable url falls back to the cached-first
behavior.
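
A minimal sketch of that parse (handler context shape assumed):

  // ?refresh=1 bypasses the cache-hit early-return; an unparseable URL falls
  // back to the normal cached-first behavior.
  function wantsRefresh(requestUrl: string | undefined): boolean {
    if (!requestUrl) return false;
    try {
      return new URL(requestUrl, 'https://worldmonitor.app').searchParams.get('refresh') === '1';
    } catch {
      return false;
    }
  }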

Tests:
- New: ?refresh=1 must bypass the cache-hit early-return (fails on old code,
  passes now).
- DEL-guard test replaced with 'does NOT DEL' + 'uses ?refresh=1'.
- Batch chunking still asserted at SET_BATCH=30.

Full resilience suite: 376/376.

* fix(resilience-ranking): bulk-warm call also needs ?refresh=1 (asymmetric TTL hazard)

Reviewer P1: in the 6h-12h window, per-country score keys have expired
(TTL 6h) but the ranking aggregate is still alive (TTL 12h). The seeder's
bulk-warm call was hitting get-resilience-ranking without ?refresh=1, so
the handler's cache-hit early-return fired and the entire warm path was
skipped. Scores stayed missing; coverage degraded; the only recovery was
the per-country laggard loop (5-request batches) — which silently no-ops
when WM_KEY is absent. This defeated the whole point of the chunked bulk
warm introduced in this PR.

Fix: the bulk-warm fetch at scripts/seed-resilience-scores.mjs:167 now
appends ?refresh=1, matching the rebuild call. Every seeder-initiated hit
on the ranking endpoint forces the handler to route through
warmMissingResilienceScores and its chunked pipeline SET, regardless of
whether the aggregate is still cached.

Test extended: structural assertion now scans ALL occurrences of
get-resilience-ranking in the seeder and requires every one of them to
carry ?refresh=1. Fails the moment a future change adds a bare call.

Full resilience suite: 376/376.

* fix(resilience-ranking): gate ?refresh=1 on seed key + detect partial pipeline publish

Reviewer P1: ?refresh=1 was honored for any caller — including valid Pro
bearer tokens. A full warm is ~222 score computations + chunked pipeline
SETs; a Pro user looping on refresh=1 (or an automated client) could DoS
Upstash quota and Edge budget. Gate refresh behind
WORLDMONITOR_VALID_KEYS / WORLDMONITOR_API_KEY (X-WorldMonitor-Key
header) — the same allowlist the cron uses. Pro bearer tokens get the
standard cache-first path; refresh requires the seed service key.
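
A minimal sketch of the allowlist gate (header name from this commit; the
comma-separated env parsing is an assumption):

  // Honor ?refresh=1 only when X-WorldMonitor-Key matches the seed-service
  // allowlist; Pro bearer tokens stay on the standard cache-first path.
  function refreshAllowed(headerKey: string | null): boolean {
    if (!headerKey) return false;
    const allowed = new Set(
      [process.env.WORLDMONITOR_API_KEY, ...(process.env.WORLDMONITOR_VALID_KEYS ?? '').split(',')]
        .map((k) => (k ?? '').trim())
        .filter((k) => k.length > 0),
    );
    return allowed.has(headerKey.trim());
  }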

Reviewer P2: the handler's atomic runRedisPipeline SET of ranking + meta
is non-transactional on Upstash REST — either SET can fail independently.
If the ranking landed but meta missed, the seeder's STRLEN verify would
pass (ranking present) while /api/health stays stuck on stale meta.

Two-part fix:
- Handler inspects pipelineResult[0] and [1] and logs a warning when
  either SET didn't return OK. Ops-greppable signal.
- Seeder's verify now checks BOTH keys in parallel: STRLEN on ranking
  data, and GET + fetchedAt freshness (<5min) on seed-meta. Partial
  publish logs a warning; next cron retries (SET is idempotent).

Tests:
- New: ?refresh=1 without/with-wrong X-WorldMonitor-Key must NOT trigger
  recompute (falls back to cached response). Existing bypass test updated
  to carry a valid seed key header.

Full resilience suite: 376/376 + 1 new = 377/377.
2026-04-16 12:48:41 +04:00
Elie Habib
13446a2170 fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401 (#3100)
* fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401

Production probe returned {boundary: [{endpoint: '/api/bootstrap', pass: false,
status: 401, reason: 'status:401'}]}. Root cause: checkPublicBoundary's
self-fetch had no Origin header, so /api/bootstrap's validateApiKey() treated
it as a non-browser caller and required an API key.

Fix: set Origin: https://worldmonitor.app on the boundary self-fetch. This
takes the trusted-browser path without needing to embed an API key in the
probe. The probe runs edge-side with x-probe-secret internal auth; emulating
a trusted browser is only for boundary response-shape verification.

Tests still 17/17.

* fix(seed-contract-probe): explicit User-Agent on boundary self-fetch

Per AGENTS.md, server-side fetches must include a UA. middleware.ts:138
returns 403 for !ua || ua.length < 10 on non-public paths, and
/api/bootstrap is not in PUBLIC_API_PATHS — the probe works today only
because Vercel Edge implicitly adds a UA. Making it explicit.

Addresses greptile P2 on PR #3100.
2026-04-15 15:34:38 +04:00
Elie Habib
044598346e feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated

Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.

Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).

Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.

Tests: 61/61 PR 1 tests still pass.

Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration

Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md

* fix(seed-contract): unwrap envelopes in internal cross-seed readers

After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.

Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).

Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
  verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
  cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
  patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.

Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
  + legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).

Addresses both of user's P1 findings on PR #3097.

* feat(seed-contract): envelope-aware reads in server + api helpers

Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).

Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
  unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
  tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
  clients never see envelope metadata).

Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
  remain bare-shape during dual-write. unwrapEnvelope already imported at
  the meta-read boundary (PR 1) as a defensive no-op.

Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.

This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.

* fix(test): strip export keyword in vm.runInContext'd seed source

cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.

Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.

Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.

* feat(seed-contract): consumer-prices publish.ts writes envelopes

Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.

Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).

Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).
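
A minimal sketch of an inlined builder for that shape (field set as described
above; the real helper body is assumed):

  function buildEnvelope<T>(data: T, recordCount: number) {
    return {
      _seed: {
        fetchedAt: new Date().toISOString(),
        recordCount,
        sourceVersion: 'consumer-prices-core-publish-v1',
        schemaVersion: 1,
        state: recordCount > 0 ? 'OK' : 'OK_ZERO',
      },
      data,
    };
  }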

Typecheck: no new errors in publish.ts.

* fix(seed-contract): 3 more server-side readers unwrap envelopes

Found during final audit:

- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
  parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
  now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
  interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
  count-source keys (wildfire:fires:v1, news:insights:v1) which are both
  contract-mode now.

All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.

Typecheck clean.

* feat(seed-contract): ais-relay.cjs direct writes produce envelopes

32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.

Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.

Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.

* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate

54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.

Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
  health (2), imf-extended (4), macro (10), market-backup (9),
  portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)

Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.

seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.

All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.

* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks

Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:

1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
   scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
   any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
   writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
   seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
   which broke health.js parsing the value as bare {fetchedAt, recordCount}.
   Also defends against any future manual writeExtraKey(..., envelopeMeta)
   call that happens to target a seed-meta:* key.

2. seed-token-panels canonical + extras fixed.
   publishTransform returns data.defi (the defi panel itself, shape {tokens}).
   Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
   on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
   never wrote, and because runSeed returned before the extraKeys loop,
   market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
   New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
   OTHER_KEY extras reuse the same function (transforms return structurally
   identical panels). Added isMain guard so test imports don't fire runSeed.

3. api/product-catalog.js cached reader unwraps envelope.
   ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
   edge reader did raw JSON.parse(result) and returned {_seed, data} to
   clients, breaking the cached path. Fix: import unwrapEnvelope from
   ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
   downstream of getFromCache(), so the single reader fix covers both call sites.

4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
   - shouldEnvelopeKey invariant: seed-meta:* false, canonical true
   - Token-panels declareRecords works on transformed shape (canonical + both extras)
   - Explicit repro of pre-fix buggy signature returning 0 — guards against revert
   - resolveRecordCount accepts 0, rejects non-integer
   - Product-catalog envelope unwrap returns bare shape; legacy passes through

Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regression tests)
- npm run typecheck:all → clean
- node --check on every modified script

iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.

* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape

Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.

Fix: validate() now checks the canonical defi panel directly
(Array.isArray(data?.tokens) && has at least one t.price > 0). AI/OTHER panels
are validated
implicitly by their own extraKey declareRecords on write.

Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.

Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)

Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean

* feat(seed-contract): add /api/seed-contract-probe validation endpoint

Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.

What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
  and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
  guard against token-panels RETRY regression, product-catalog, wildfire,
  earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
  against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.

Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).

Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).

Usage:
  curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
       https://api.worldmonitor.app/api/seed-contract-probe

PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.

Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean

* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header

Review P2 findings: the probe's stated guards were weaker than advertised.

1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
   token-panels extra-key RETRY regression but only checked shape='envelope'
   + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
   probes would still pass because checkProbe() only inspects _seed.recordCount
   when minRecords is set. Now both enforce minRecords: 1.

2. /api/product-catalog boundary check only asserted no '_seed' leak — which
   is also true for the static fallback path. A broken cached reader
   (getFromCache returning null or throwing) could serve fallback silently
   and still pass this probe. Now:
   - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
     on the response (the json() helper gained an optional source param wired
     to each of the three branches).
   - checkPublicBoundary declaratively requires that header's value match
     'cache' for /api/product-catalog, so a fallback-serve fails the probe
     with reason 'source:fallback!=cache' or 'source:missing!=cache'.

Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
  requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
  mismatch, missing header fails, seed-leak still takes precedence, bad
  status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.

Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
2026-04-15 09:16:27 +04:00
Elie Habib
dc10e47197 feat(seed-contract): PR 1 foundation — envelope + contract + conformance test (#3095)
* feat(seed-contract): PR 1 foundation — envelope helpers + contract validators + static conformance test

Adds the foundational pieces for the unified seed contract rollout described in
docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md. Behavior-
preserving by construction: legacy-shape Redis values unwrap as { _seed: null,
data: raw } and pass through every helper unchanged.

New files:
- scripts/_seed-envelope-source.mjs — single source of truth for unwrapEnvelope,
  stripSeedEnvelope, buildEnvelope.
- api/_seed-envelope.js — edge-safe mirror (AGENTS.md:80 forbids api/* importing
  from server/).
- server/_shared/seed-envelope.ts — TS mirror with SeedMeta, SeedEnvelope,
  UnwrapResult types.
- scripts/_seed-contract.mjs — SeedContractError + validateDescriptor (10
  required fields, 10 optional, unknown-field rejection) + resolveRecordCount
  (non-negative integer or throw).
- scripts/verify-seed-envelope-parity.mjs — diffs function bodies between the
  two JS copies; TS copy guarded by tsc.
- tests/seed-envelope.test.mjs — 14 tests for the three helpers (null,
  legacy-passthrough, stringified JSON, round-trip).
- tests/seed-contract.test.mjs — 25 tests for validateDescriptor/
  resolveRecordCount + a soft-warn conformance scan that STATICALLY parses
  scripts/seed-*.mjs (never dynamic import — several seeders process.exit() at
  module load). Currently logs 91 seeders awaiting declareRecords migration.

Wiring (minimal, behavior-preserving):
- api/health.js: imports unwrapEnvelope; routes readSeedMeta's parsed value
  through it. Legacy meta has no _seed wrapper → passes through unchanged.
- scripts/_bundle-runner.mjs: readSectionFreshness prefers envelope at
  section.canonicalKey when present, falls back to the existing
  seed-meta:<key> read via section.seedMetaKey (unchanged path today since no
  bundle defines canonicalKey yet).

No seeder modified. No writes changed. All 5279 existing data tests still
green; both typechecks clean; parity verifier green; 39 new tests pass.

PR 2 will migrate seeders, bundles, and readers to envelope semantics. PR 3
removes the legacy path and hard-fails the conformance test.

* fix(seed-contract): address PR #3095 review — metaTtlSeconds opt, bundle fallback, strict conformance mode

Review findings applied:

P1 — metaTtlSeconds missing from OPTIONAL_FIELDS whitelist.
scripts/seed-jodi-gas.mjs:250 passes metaTtlSeconds to runSeed(); field is
consumed by _seed-utils writeSeedMeta. Without it in the whitelist, PR 2's
validateDescriptor wiring would throw 'unknown field' the moment jodi-gas
migrates. Added with a 'removed in PR 3' note.

P2 — Bundle canonicalKey short-circuit over-runs during migration.
readSectionFreshness previously returned null if canonicalKey had no envelope
yet, even when a legacy seed-meta key was also declared — making every cron
re-run the section. Fixed to fall through to seedMetaKey on null envelope so
the transition state is safe.

P3 — Conformance soft-warn signal was invisible in CI.
tests/seed-contract.test.mjs now emits a t.diagnostic summary line
('N/M seeders export declareRecords') visible on every run and gates hard-fail
behind SEED_CONTRACT_STRICT=1 so PR 3 can flip to strict without more code.

Nitpick — parity regex missed 'export async function'.
Added '(?:async\s+)?' to scripts/verify-seed-envelope-parity.mjs function
extraction regex.

Verified: 39 tests green, parity verifier green, strict mode correctly
hard-fails with 91 seeders missing (expected during PR 1).

* fix(seed-contract): address review round 2 — NaN/empty-string validation, Error cause, parity CI wiring

P2 — Non-finite ttlSeconds/maxStaleMin bypassed validation.
`typeof NaN === 'number'` and `NaN <= 0 === false` meant a NaN duration
passed the old typeof+<=0 checks and would have poisoned TTLs once
validateDescriptor is wired into runSeed. Now gated by Number.isFinite,
which rejects NaN and ±Infinity. Tests added for NaN/Infinity on both
fields.

P2 — Empty/whitespace-only strings for domain/resource/canonicalKey/sourceVersion
bypassed validation. Added .trim() === '' rejection + tests per field.
This mattered because canonicalKey='' would have landed writes at the
empty key and seed-meta under a blank resource namespace.

P3 — SeedContractError silently dropped the Error v2 cause option.
Constructor now forwards { cause } through super() so err.cause works
with standard tooling (Node's default stack printer, Sentry chained-cause
serialization). resolveRecordCount's manual err.cause = err assignment
was replaced with the options-bag form. Test added for both constructor
direct-use and the resolveRecordCount wrap path.

P3 — Parity verifier was not on an automated path. Added
tests/seed-envelope-parity.test.mjs which spawns scripts/verify-seed-envelope-parity.mjs
via execFile; non-zero exit (drift) → test fails. Now runs as part of
`npm run test:data` (tsx --test tests/*.test.mjs). Drift injection
confirmed: sed -i modifying api/_seed-envelope.js makes the test fail
with 'Command failed' from execFile.

51 tests total (was 39). All green on clean tree.

* fix(seed-contract): conformance test checks full descriptor, not just declareRecords

Previous conformance check green-lit any seeder that exported
declareRecords, even if the runSeed(...) call-site omitted other
validateDescriptor-required opts (validateFn, ttlSeconds, sourceVersion,
schemaVersion, maxStaleMin). That would have produced a false readiness
signal for PR 3's strict flip: test goes green, but wiring
validateDescriptor() into runSeed in PR 2 would still throw at runtime
across the fleet.

Examples verified on the PR head:
- scripts/seed-cot.mjs:188-192 — no sourceVersion/schemaVersion/maxStaleMin
- scripts/seed-market-breadth.mjs:121-124 — same
- scripts/seed-jodi-gas.mjs:248-253 — no schemaVersion/maxStaleMin

Now the conformance test:
1. AST-lite extracts the runSeed(...) call site with balanced parens,
   tolerating strings and comments.
2. Checks every REQUIRED_OPTS_FIELDS entry (validateFn, declareRecords,
   ttlSeconds, sourceVersion, schemaVersion, maxStaleMin) is present as
   an object key in that call-site.
3. Emits a per-file diagnostic listing missing fields.
4. Migration signal is now accurate: 0/91 seeders fully satisfy the
   descriptor (was claiming 0/91 missing just declareRecords). Matches
   the underlying validateDescriptor behavior.

Verified: strict mode (SEED_CONTRACT_STRICT=1) surfaces 'opt:schemaVersion,
opt:maxStaleMin' as missing fields per seeder — actionable for PR 2
migration work. 51 tests total (unchanged count; behavior change is in
which seeders the one conformance test considers migrated).

* fix(seed-contract): strip comments/strings before parsing runSeed() call site

The conformance scanner located the first 'runSeed(' substring in the raw
source, which caught commented-out mentions upstream of the real call.
Offending files where this produced false 'incomplete' diagnoses:
- scripts/seed-bis-data.mjs:209 // runSeed() calls process.exit(0)…
  real call at :220
- scripts/seed-economy.mjs:788 header comment mentioning runSeed()
  real call at :891

Three files had the same pattern. Under strict mode these would have been
false hard failures in PR 3 even when the real descriptor was migrated.

Fix:
- stripCommentsAndStrings(src) produces a view where block comments, line
  comments, and string/template literals are replaced with spaces (line
  feeds preserved). Indices stay aligned with the original source so
  extractRunSeedCall can match against the stripped view and then slice
  the original source for the real call body.
- descriptorFieldsPresent() also runs its field-presence regex against
  the stripped call body so '// TODO: validateFn' inside the call doesn't
  fool the check.
- hasRunSeedCall() uses the stripped view too, which correctly excludes
  5 seeders that only mentioned runSeed in comments. Count dropped
  91→86 real callers.
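
A simplified sketch of the index-preserving strip (regex literals and
template-literal ${} interpolation are not handled here):

  // Replace the contents of comments and string/template literals with spaces
  // so every index in the returned view lines up with the original source;
  // newlines are preserved so line positions stay meaningful too.
  function stripCommentsAndStrings(src: string): string {
    const out = src.split('');
    const blank = (from: number, to: number) => {
      for (let k = from; k < to && k < out.length; k++) {
        if (out[k] !== '\n') out[k] = ' ';
      }
    };
    let i = 0;
    while (i < src.length) {
      const pair = src.slice(i, i + 2);
      if (pair === '//') {
        const nl = src.indexOf('\n', i);
        const stop = nl === -1 ? src.length : nl;
        blank(i, stop);
        i = stop;
      } else if (pair === '/*') {
        const end = src.indexOf('*/', i + 2);
        const stop = end === -1 ? src.length : end + 2;
        blank(i, stop);
        i = stop;
      } else if (src[i] === '"' || src[i] === "'" || src[i] === '`') {
        const quote = src[i];
        let j = i + 1;
        while (j < src.length && src[j] !== quote) {
          if (src[j] === '\\') j++; // skip the escaped character
          j++;
        }
        blank(i + 1, j); // keep the quotes, blank the contents
        i = Math.min(j + 1, src.length);
      } else {
        i++;
      }
    }
    return out.join('');
  }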

Added 4 targeted tests covering:
- runSeed() inside a line comment ahead of the real call
- runSeed() inside a block comment
- runSeed() inside a string literal ("don't call runSeed() directly")
- descriptor field names inside an inline comment don't count as present

Verified on the actual files: seed-bis-data.mjs first real runSeed( in
stripped source is at line 220 (was line 209 before fix).

40 tests total, all green.

* fix(seed-contract): parity verifier survives unbalanced braces in string/template literals

Addresses Greptile P2 on PR #3095: the body extractor in
scripts/verify-seed-envelope-parity.mjs counted raw { and } on every
character. A future helper body that legitimately contains
`const marker = '{'` would have pushed depth past zero at the literal
brace and truncated the body — silently masking drift in the rest of
the function.

Extracted the scan into scanBalanced(source, start, open, close) which
skips characters inside line comments, block comments, and string /
template literals (with escape handling and template-literal ${} recursion
for interpolation). Call sites in extractFunctions updated to use the new
scanner for both the arg-list parens and the function body braces.

Made extractFunctions and scanBalanced exported so the new test file
can exercise them directly. Gated main() behind an isMain check so
importing the module from tests doesn't trigger process.exit.

New tests in tests/seed-envelope-parity.test.mjs:
- extractFunctions tolerates unbalanced braces in string literals
- same for template literals
- same for braces inside block comments
- same for braces inside line comments
- scanBalanced respects backslash-escapes inside strings
- scanBalanced recurses into template-literal ${} interpolation

Also addresses the other two Greptile P2s which were already fixed in
earlier commits on this branch:
- Empty-string gap (99646dd9a): .trim()==='' rejection added
- SeedContractError cause drop (99646dd9a): constructor forwards cause
  through super's options bag per Error v2 spec

61 tests green. Both typechecks clean.
2026-04-14 22:11:56 +04:00
Elie Habib
30ddad28d7 fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076)
* fix(seeds): gold-etf XLSX migration, IRFCL dataflow, imf-external BX/BM drop

Three upstream-drift regressions caught from the market-backup + imf-extended
bundle logs. Root causes validated by live API probes before coding.

1. seed-gold-etf-flows: SPDR /assets/dynamic/GLD/GLD_US_archive_EN.csv now
   silently returns a PDF (Content-Type: application/pdf) — site migrated
   to api.spdrgoldshares.com/api/v1/historical-archive which serves XLSX.
   Swapped the CSV parser for an exceljs-based XLSX parser. Adds
   browser-ish Origin/Referer headers (SPDR swaps payload for PDF
   without them) and a Content-Type guard. Column layout: Date | Closing |
   ... | Tonnes | Total NAV USD.

2. seed-gold-cb-reserves: PR #3038 shipped with IMF.STA/IFS dataflow and
   3-segment key M..<indicator> — both wrong. IFS isn't exposed on
   api.imf.org (HTTP 204). Gold-reserves data lives under IMF.STA/IRFCL
   with 4 dimensions (COUNTRY.INDICATOR.SECTOR.FREQUENCY). Verified live:
   *.IRFCLDT1_IRFCL56_FTO.*.M returns 111 series. Switched to IRFCL +
   IRFCLDT1_IRFCL56_FTO (fine troy ounces) and fallbacks. The
   valueIsOunces flag now matches _FTO suffix (keeps legacy _OZT/OUNCE
   detection for backward compat).

3. seed-imf-external: BX/BM (export/import LEVELS, USD) WEO coverage
   collapsed to ~10 countries in late 2026 — the seeder's >=190-country
   validate floor was failing every run. Dropped BX/BM from fetch + join;
   kept BCA (~209) / TM_RPCH (~189) / TX_RPCH (~190). exportsUsd /
   importsUsd / tradeBalanceUsd fields kept as explicit null so consumers
   see a deliberate gap. validate floor lowered to 180 (BCA∪TM∪TX union).

Tests: 32/32 pass. Rewrote gold-etf tests to use synthetic XLSX fixtures
(exceljs resolved from scripts/package.json since repo root doesn't have
it). Updated imf-external tests for the new indicator set + null BX/BM
contract + 180-country validate threshold.

* fix(mcp): update get_country_macro description after BX/BM drop

Consumer-side catch during PR #3076 validation: the MCP tool description
still promised 'exports, imports, trade balance' fields that the seeder
fix nulls out. LLM consumers would be directed to exportsUsd/importsUsd/
tradeBalanceUsd fields that always return null since seed-imf-external
dropped BX/BM (WEO coverage collapsed to ~10 countries).

Updated description to list only the indicators actually populated
(currentAccountUsd, importVolumePctChg, exportVolumePctChg) with an
explicit note about the null trade-level fields so LLMs don't attempt
to use them.

* fix(gold-cb-reserves): compute real pctOfReserves + add exceljs to root

Follow-up to #3076 review.

1. pctOfReserves was hardcoded to 0 with a "IFS doesn't give us total
   reserves" comment. That was a lazy limitation claim — IMF IRFCL DOES
   expose total official reserve assets as IRFCLDT1_IRFCL65_USD parallel
   to the gold USD series IRFCLDT1_IRFCL56_USD. fetchCbReserves now
   pulls all three indicators (primary FTO tonnage + the two USD series)
   via Promise.allSettled and passes the USD pair to buildReservesPayload
   so it can compute the true gold share per country. Falls back to 0
   only when the denominator is genuinely missing for that country
   (IRFCL coverage: 114 gold_usd, 96 total_usd series; ~15% of holders
   have no matched-month denominator). 3-month lookback window absorbs
   per-country reporting lag.

2. CI fix: tests couldn't find exceljs because scripts/package.json is
   not workspace-linked to the repo root — CI runs `npm ci` at root
   only. Added exceljs@^4.4.0 as a root devDependency. Runtime seeder
   continues to resolve it from scripts/node_modules via Node's upward
   module resolution.

3 new tests cover pct computation, missing-denominator fallback, and
the 3-month lookback window.
2026-04-14 08:19:47 +04:00
Elie Habib
e32d9b631c feat(market): Hyperliquid perp positioning flow as leading indicator (#3074)
* feat(market): Hyperliquid perp positioning flow as leading indicator

Adds a 4-component composite (funding × volume × OI × basis) "positioning
stress" score for ~14 perps spanning crypto (BTC/ETH/SOL), tokenized gold
(PAXG), commodity perps (WTI, Brent, Gold, Silver, Pt, Pd, Cu, NatGas), and
FX perps (EUR, JPY). Polls Hyperliquid /info every 5min via Railway cron;
publishes a single self-contained snapshot with embedded sparkline arrays
(60 samples = 5h history). Surfaces as a new "Perp Flow" tab in
CommoditiesPanel with separate Commodities / FX sections.

Why: existing CFTC COT is weekly + US-centric; market quotes are price-only.
Hyperliquid xyz: perps give 24/7 global positioning data that has been shown
to lead spot moves on commodities and FX by minutes-to-hours.

Implementation:
- scripts/seed-hyperliquid-flow.mjs — pure scoring math, symbol whitelist,
  content-type + schema validation, prior-state read via readSeedSnapshot(),
  warmup contract (first run / post-outage zeroes vol/OI deltas),
  missing-symbol carry-forward, $500k/24h min-notional guard to suppress
  thin xyz: noise. TTL 2700s (9× cadence).
- proto/worldmonitor/market/v1/get_hyperliquid_flow.proto + service.proto
  registration; make generate regenerated client/server bindings.
- server/worldmonitor/market/v1/get-hyperliquid-flow.ts — getCachedJson
  reader matching get-cot-positioning.ts seeded-handler pattern.
- server/gateway.ts cache-tier entry (medium).
- api/health.js: hyperliquidFlow registered with maxStaleMin:15 (3× cadence)
  + transitional ON_DEMAND_KEYS gate for the first ~7 days of bake-in.
- api/seed-health.js mirror with intervalMin:5.
- scripts/seed-bundle-market-backup.mjs entry (NIXPACKS auto-redeploy on
  scripts/** watch).
- src/components/MarketPanel.ts: CommoditiesPanel grows a Perp Flow tab
  + fetchHyperliquidFlow() RPC method; OI Δ1h derived from sparkOi tail.
- src/App.ts: prime via primeVisiblePanelData() + recurring refresh via
  refreshScheduler.scheduleRefresh() at 5min cadence (panel does NOT own
  setInterval; matches the App.ts:1251 lifecycle convention).
- 28 unit tests covering scoring parity, warmup flag, min-notional guard,
  schema rejection, missing-symbol carry-forward, post-outage cold start,
  sparkline cap, alert threshold.

Tests: test:data 5169/5169, hyperliquid-flow-seed 28/28, route-cache-tier
5/5, typecheck + typecheck:api green. One pre-existing test:sidecar failure
(cloud-fallback origin headers) is unrelated and reproduces on origin/main.

* fix(hyperliquid-flow): address review feedback — volume baseline window, warmup lifecycle, error logging

Two real correctness bugs and four review nits from PR #3074 review pass.

Correctness fixes:

1. Volume baseline was anchored to the OLDEST 12 samples, not the newest.
   sparkVol is newest-at-tail (shiftAndAppend), so slice(0, 12) pinned the
   rolling mean to the first hour of data forever once len >= 12. Volume
   scoring would drift further from current conditions each poll. Switched
   to slice(-VOLUME_BASELINE_MIN_SAMPLES) so the baseline tracks the most
   recent window. Regression test added.

2. Warmup flag flipped to false on the second successful poll while volume
   scoring still needed 12+ samples to activate. UI told users warmup
   lasted ~1h but the badge disappeared after 5 min. Tied per-asset warmup
   to real baseline readiness (coldStart OR vol samples < 12 OR prior OI
   missing). Snapshot-level warmup = any asset still warming. Three new
   tests cover the persist-through-baseline-build, clear-once-ready, and
   missing-OI paths.
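
A minimal sketch of the corrected baseline window from fix 1 (constant name
from the commit; the surrounding scoring is omitted):

  const VOLUME_BASELINE_MIN_SAMPLES = 12;

  // sparkVol is newest-at-tail, so the rolling mean must come from the end of
  // the array; slice(0, 12) would pin it to the first hour of data forever.
  function volumeBaseline(sparkVol: number[]): number | null {
    if (sparkVol.length < VOLUME_BASELINE_MIN_SAMPLES) return null; // still warming up
    const window = sparkVol.slice(-VOLUME_BASELINE_MIN_SAMPLES);
    return window.reduce((sum, v) => sum + v, 0) / window.length;
  }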

Review nits:

- Handler: bare catch swallowed Redis/parse errors; now logs err.message.
- Panel: bare catch in fetchHyperliquidFlow hid RPC 500s; now logs.
- MarketPanel.ts: deleted hand-rolled RawHyperliquidAsset; mapHyperliquidFlowResponse
  now takes GetHyperliquidFlowResponse from the generated client so proto
  drift fails compilation instead of failing silently.
- Seeder: added @ts-check + JSDoc on computeAsset per type-safety rule.
- validateUpstream: MAX_UPSTREAM_UNIVERSE=2000 cap bounds memory.
- buildSnapshot: logs unknown xyz: perps upstream (once per run) so ops
  sees when Hyperliquid adds markets we could whitelist.

Tests: 37/37 green; typecheck + typecheck:api clean.

* fix(hyperliquid-flow): wire bootstrap hydration per AGENTS.md mandate

Greptile review caught that AGENTS.md:187 mandates new data sources be wired
into bootstrap hydration. Plan had deferred this on "lazy deep-dive signal"
grounds, but the project convention is binding.

- server/_shared/cache-keys.ts: add hyperliquidFlow to BOOTSTRAP_CACHE_KEYS
  + BOOTSTRAP_TIERS ('slow' — non-blocking, page-load-parallel).
- api/bootstrap.js: add to inlined BOOTSTRAP_CACHE_KEYS + SLOW_KEYS so
  bootstrap.test.mjs canonical-mirror assertions pass.
- src/components/MarketPanel.ts:
  - Import getHydratedData from @/services/bootstrap.
  - New mapHyperliquidFlowSeed() normalizes the raw seed-JSON shape
    (numeric fields) into HyperliquidFlowView. The RPC mapper handles the
    proto shape (string-encoded numbers); bootstrap emits the raw blob.
  - fetchHyperliquidFlow now reads hydrated data first, renders
    immediately, then refreshes from RPC — mirrors FearGreedPanel pattern.

Tests: 72/72 green (bootstrap + cache-tier + hyperliquid-flow-seed).
2026-04-14 08:05:40 +04:00
Elie Habib
46d17efe55 fix(resilience): wider FX YoY upstream + sanctions absolute threshold (#3071)
* fix(resilience): wider FX YoY upstream + sanctions absolute threshold

Two backtest families consistently failed Outcome-Backtest gates because
the detectors were reading the wrong shape of upstream data, not because
the upstream seeders were missing.

FX Stress (was AUC=0.500):
- BIS WS_EER (`economic:bis:eer:v1`) only covers 12 G10/major-EM countries
  — Argentina, Egypt, Turkey, Pakistan, Nigeria etc. are absent, so the
  detector had no positive events to score against
- Add `seed-fx-yoy.mjs` fetching Yahoo Finance 2-year monthly history
  across 45 single-country currencies, computing YoY % and 24-month
  peak-to-trough drawdown
- Switch detector to read drawdown24m with -15% threshold (matches
  methodology spec); falls back to yoyChange/realChange for back-compat
- Why drawdown not just YoY: rolling 12-month windows slice through
  historic crises (Egypt's March 2024 devaluation falls outside an
  April→April window by 2026); drawdown captures actual stress magnitude
  regardless of crisis timing
- Verified locally: flags AR (-38%), TR (-28%), NG (-21%), MX (-18%)

Sanctions Shocks (was AUC=0.624):
- Detector previously used top-quartile (Q3) of country-counts which
  conflated genuine comprehensive-sanctions targets (RU, IR, KP, CU,
  SY, VE, BY, MM) with financial hubs (UK, CH, DE, US) hosting many
  sanctioned entities
- Replace with absolute threshold of 100 entities — the OFAC
  distribution is heavy-tailed enough that this cleanly separates
  targets from hubs

Both fixes use existing seeded data (or new seeded data via
seed-fx-yoy.mjs) — no hardcoded country curation.

api/health.js: register `economic:fx:yoy:v1` in STANDALONE_KEYS +
SEED_META so the validation cron monitors freshness.

Railway: deploy `seed-fx-yoy` as a daily cron service (NIXPACKS
builder, startCommand `node scripts/seed-fx-yoy.mjs`, schedule
`30 6 * * *`).

* fix(seed-fx-yoy): use running-peak max-drawdown instead of global peak

PR #3071 review (P1): the original drawdown calculation found the global
maximum across the entire window, then the lowest point AFTER that peak.
This silently erased earlier crashes when the currency later recovered to
a new high — exactly the class of events the FX Stress family is trying
to capture.

Example series [5, 10, 7, 9, 6, 11, 10]: true worst drawdown is 10 → 6 =
-40%, but the broken implementation picked the later global peak 11 and
reported only 11 → 10 = -9.1%.

Fix: sweep forward tracking the running peak; for each subsequent bar
compute the drop from that running peak; keep the largest such drop.
This is the standard max-drawdown computation and correctly handles
recover-then-fall-again sequences.
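
A minimal sketch of the running-peak computation (standard max drawdown;
function and variable names are assumed):

  // Track the running peak while sweeping forward; the worst drawdown is the
  // largest drop from any peak seen so far, so a crash followed by a recovery
  // to a new high is still captured.
  function maxDrawdownPct(series: number[]): number {
    let runningPeak = -Infinity;
    let worst = 0;
    for (const value of series) {
      if (value > runningPeak) {
        runningPeak = value;
      } else if (runningPeak > 0) {
        const drop = (value - runningPeak) / runningPeak;
        if (drop < worst) worst = drop;
      }
    }
    return worst * 100;
  }

  maxDrawdownPct([5, 10, 7, 9, 6, 11, 10]); // -40 (10 -> 6), not -9.1 (11 -> 10)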

Live data verification:
- BR (Brazilian real) was missing from the flagged set under the broken
  algorithm because BRL recovered above its 2024-04 peak. With the fix it
  correctly surfaces at drawdown=-15.8% (peak 2024-04, trough 2024-12).
- KR / CO peaks now resolve to mid-series dates instead of end-of-window,
  proving the running-peak scan is finding intermediate peaks.

Tests added covering: reviewer's regression case, peak-at-start (NGN
style), pure appreciation, multi-trough series, yoyChange anchor.

* fix(health): gate fxYoy as on-demand to avoid post-merge CRIT alarm

PR #3071 review (P1): registering `fxYoy` as a required standalone
seeded key creates an operational hazard during the deploy gap. After
merge, Vercel auto-deploys `api/health.js` immediately, but the
`seed-fx-yoy` Railway cron lives in a separate deployment surface that
must be triggered manually. Any gap (deploy-order race, first-cron
failure, env var typo) flips health to DEGRADED/UNHEALTHY because
`classifyKey()` marks the check as `EMPTY` without an on-demand or
empty-data-OK exemption.

Add `fxYoy` to ON_DEMAND_KEYS as a transitional safety net (matches the
pattern recovery* uses for "stub seeders not yet deployed"). The key is
still monitored — freshness via seed-meta — but missing data downgrades
from CRIT to WARN, which won't page anyone. Once the Railway cron has
fired cleanly for ~7 days in production we can remove this entry and
let it be a hard-required key like the rest of the FRED family.
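
A minimal sketch of the gating (the surrounding classifyKey flow is
simplified; only ON_DEMAND_KEYS, fxYoy and the CRIT-to-WARN downgrade come
from this change):

```ts
// Sketch only: severities and the surrounding classifyKey flow are simplified.
const ON_DEMAND_KEYS = new Set([
  // ...existing recovery* "stub seeder not yet deployed" entries...
  'fxYoy', // transitional: drop once the Railway cron has run cleanly for ~7 days
]);

function severityForMissing(key: string): 'CRIT' | 'WARN' {
  // Missing data normally pages; on-demand keys only warn, while freshness
  // is still tracked through seed-meta.
  return ON_DEMAND_KEYS.has(key) ? 'WARN' : 'CRIT';
}
```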

Note: the Railway service IS already provisioned (cron `30 6 * * *`,
0.5 vCPU / 0.5 GB, NIXPACKS, watchPatterns scoped to the seeder + utils)
and the `economic:fx:yoy:v1` key is currently fresh in production from
local test runs. The gating here is defense-in-depth against the
operational coupling, not against a known absent key.
2026-04-13 21:57:11 +04:00
Elie Habib
4741689d26 fix(health): EMPTY+records contradiction, year rollover, edge waitUntil, per-command errors (#3056)
* fix(health): resolve EMPTY+records contradiction, displacement-year rollover, edge waitUntil, per-command errors

Code review of api/health.js surfaced six bugs / design issues:

1. EMPTY status with records>0 (contradiction): when a data key was missing
   but seed-meta still cached the prior count, status reported EMPTY while
   records displayed the stale count (e.g. energyMixAll: EMPTY records=209).
   classifyKey() now forces records=0 whenever hasData is false so the two
   fields cannot disagree.

2. strlen heuristic misclassified small payloads: hasData = strlen > 10
   treated any payload <11 bytes as missing, including legitimate {}, [],
   or single-digit numbers. Replaced with strlenIsData(strlen): >0 and not
   exactly the negative-cache sentinel length.

3. The displacement key rolls over on UTC Jan 1 and used to go CRIT for
   hours every year. Added a displacementPrev sibling and a CASCADE_GROUPS
   entry so health stays OK_CASCADE during the cutover.

4. Summary 'warn' included on-demand-empty keys but 'overall' did not,
   so HEALTHY responses showed warn>0. Summary now reports realWarnCount
   and surfaces onDemandWarn separately.

5. Background last-failure write used void Promise.catch(() => {}). Edge
   isolates can terminate before it resolves; now uses ctx.waitUntil when
   available (see the sketch after this list).

6. Per-command Redis errors silently became 'EMPTY'. Pipeline result
   errors are now collected into keyErrors and surface as REDIS_PARTIAL.
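
A minimal sketch of the item-5 change; the background-write promise and the
ctx shape are assumptions:

```ts
// Sketch only: the last-failure write promise and ctx shape are assumed.
type EdgeCtx = { waitUntil?: (p: Promise<unknown>) => void };

function recordLastFailure(ctx: EdgeCtx | undefined, write: Promise<unknown>): void {
  const settled = write.catch(() => {}); // never let the background write throw
  if (ctx?.waitUntil) {
    ctx.waitUntil(settled); // keep the edge isolate alive until the write settles
  } else {
    void settled;           // previous best-effort behaviour as the fallback
  }
}
```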

Also collapsed the 150-line BOOTSTRAP/STANDALONE duplicated loops into a
shared classifyKey() helper. DEGRADED threshold is now 3% of total checks
instead of a hardcoded 3 (so adding keys does not silently raise the bar).
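
A minimal sketch of the consolidated helper's shape; the pipeline plumbing,
other statuses (stale-seed, cascade coverage) and the sentinel length are
simplified or assumed:

```ts
// Sketch only: shows the coupling from fixes 1, 2 and 6, nothing else.
const NEGATIVE_CACHE_STRLEN = 4; // assumed sentinel length for this sketch

const strlenIsData = (strlen: number): boolean =>
  strlen > 0 && strlen !== NEGATIVE_CACHE_STRLEN; // {} / [] / "7" count as data

function classifyKey(check: {
  strlen: number;
  metaRecords: number; // last count cached in seed-meta
  keyError?: string;   // per-command Redis error collected into keyErrors
}): { status: 'OK' | 'EMPTY' | 'REDIS_PARTIAL'; records: number } {
  if (check.keyError) {
    // Fix 6: a failed pipeline command is a Redis problem, not "no data".
    return { status: 'REDIS_PARTIAL', records: 0 };
  }
  const hasData = strlenIsData(check.strlen); // fix 2: tiny-but-real payloads count
  if (!hasData) {
    // Fix 1: force records=0 so EMPTY can never sit next to a stale count.
    return { status: 'EMPTY', records: 0 };
  }
  return { status: 'OK', records: check.metaRecords };
}
```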

* fix(health): also track per-command errors on seed-meta GET half

Review on PR #3056: error tracking only covered the STRLEN data half of the
pipeline. A failed GET seed-meta:* command was returned as result=null and
silently fell through to STALE_SEED instead of surfacing as REDIS_PARTIAL.
Collect keyMetaErrors alongside keyErrors and route either failure to the
REDIS_PARTIAL branch in classifyKey.

* fix(health): drop dead cascadeCovered check in records===0 branch

Greptile review on PR #3056: that branch only runs when hasData=true, and
isCascadeCovered() short-circuits to false in that case, so the cascade
check was structurally unreachable. Replaced with a comment so a future
maintainer doesn't add cascade siblings expecting empty-data shielding.
2026-04-13 16:24:22 +04:00