Sebastien Melki
58e42aadf9 chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)
* chore(api): enforce sebuf contract via exceptions manifest (#3207)

Adds api/api-route-exceptions.json as the single source of truth for
non-proto /api/ endpoints, with scripts/enforce-sebuf-api-contract.mjs
gating every PR via npm run lint:api-contract. Fixes the root-only blind
spot in the prior allowlist (tests/edge-functions.test.mjs), which only
scanned top-level *.js files and missed nested paths and .ts endpoints —
the gap that let api/supply-chain/v1/country-products.ts and friends
drift under proto domain URL prefixes unchallenged.

Checks both directions: every api/<domain>/v<N>/[rpc].ts must pair with
a generated service_server.ts (so a deleted proto fails CI), and every
generated service must have an HTTP gateway (no orphaned generated code).
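
A minimal sketch of that two-directional check — the function name, inputs, and message strings are illustrative; the real enforce-sebuf-api-contract.mjs also scans the filesystem and consults the exceptions manifest:

```typescript
// Sketch only: pairs gateway files with generated services in both directions.
function findContractViolations(
  gatewayFiles: string[],      // e.g. 'api/maritime/v1/get-vessel-snapshot.ts'
  generatedServices: string[], // e.g. 'maritime/v1' (has a service_server.ts)
): string[] {
  const violations: string[] = [];
  const services = new Set(generatedServices);
  const covered = new Set<string>();
  for (const file of gatewayFiles) {
    const m = file.match(/^api\/([a-z-]+)\/(v\d+)\//);
    if (!m) continue;
    const svc = `${m[1]}/${m[2]}`;
    covered.add(svc);
    // Direction 1: a gateway whose proto service was deleted fails CI.
    if (!services.has(svc)) violations.push(`gateway without generated service: ${file}`);
  }
  // Direction 2: generated code with no HTTP gateway is orphaned.
  for (const svc of generatedServices) {
    if (!covered.has(svc)) violations.push(`generated service without gateway: ${svc}`);
  }
  return violations;
}
```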

Manifest entries require category + reason + owner, with removal_issue
mandatory for temporary categories (deferred, migration-pending) and
forbidden for permanent ones. .github/CODEOWNERS pins the manifest to
@SebastienMelki so new exceptions don't slip through review.
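
The entry rules sketched as a validator — the temporary category names are quoted from above; the `permanent` label and any structure beyond category/reason/owner/removal_issue are assumptions:

```typescript
// Sketch of the api-route-exceptions.json entry rules described above.
// 'permanent' as a literal category name is an assumption.
type ExceptionCategory = 'deferred' | 'migration-pending' | 'permanent';

interface RouteException {
  category: ExceptionCategory;
  reason: string;
  owner: string;
  removal_issue?: string;
}

const TEMPORARY = new Set<ExceptionCategory>(['deferred', 'migration-pending']);

function validateException(path: string, e: RouteException): string[] {
  const errors: string[] = [];
  if (!e.reason) errors.push(`${path}: reason is required`);
  if (!e.owner) errors.push(`${path}: owner is required`);
  if (TEMPORARY.has(e.category) && !e.removal_issue) {
    errors.push(`${path}: removal_issue is mandatory for ${e.category}`);
  }
  if (!TEMPORARY.has(e.category) && e.removal_issue) {
    errors.push(`${path}: removal_issue is forbidden for ${e.category}`);
  }
  return errors;
}
```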

The manifest only shrinks: migration-pending entries (19 today) will be
removed as subsequent commits in this PR land each migration.

* refactor(maritime): migrate /api/ais-snapshot → maritime/v1.GetVesselSnapshot (#3207)

The proto VesselSnapshot was carrying density + disruptions but the frontend
also needed sequence, relay status, and candidate_reports to drive the
position-callback system. Those only lived on the raw relay passthrough, so
the client had to keep hitting /api/ais-snapshot whenever callbacks were
registered and fall back to the proto RPC only when the relay URL was gone.

This commit pushes all three missing fields through the proto contract and
collapses the dual-fetch-path into one proto client call.

Proto changes (proto/worldmonitor/maritime/v1/):
  - VesselSnapshot gains sequence, status, candidate_reports.
  - GetVesselSnapshotRequest gains include_candidates (query: include_candidates).

Handler (server/worldmonitor/maritime/v1/get-vessel-snapshot.ts):
  - Forwards include_candidates to ?candidates=... on the relay.
  - Separate 5-min in-memory caches for the candidates=on and candidates=off
    variants; they have very different payload sizes and should not share a slot.
  - Per-request in-flight dedup preserved per-variant.

Frontend (src/services/maritime/index.ts):
  - fetchSnapshotPayload now calls MaritimeServiceClient.getVesselSnapshot
    directly with includeCandidates threaded through. The raw-relay path,
    SNAPSHOT_PROXY_URL, DIRECT_RAILWAY_SNAPSHOT_URL and LOCAL_SNAPSHOT_FALLBACK
    are gone — production already routed via Vercel, the "direct" branch only
    ever fired on localhost, and the proto gateway covers both.
  - New toLegacyCandidateReport helper mirrors toDensityZone/toDisruptionEvent.

api/ais-snapshot.js deleted; manifest entry removed. Codegen was deliberately
scoped to worldmonitor.maritime.v1 (buf generate --path) — regenerating the
full tree drops // @ts-nocheck from every client/server file and surfaces
pre-existing type errors across 30+ unrelated services, which is not in
scope for this PR.

Shape-diff vs legacy payload:
  - disruptions / density: proto carries the same fields, just with the
    GeoCoordinates wrapper and enum strings (remapped client-side via
    existing toDisruptionEvent / toDensityZone helpers).
  - sequence, status.{connected,vessels,messages}: now populated from the
    proto response — was hardcoded to 0/false in the prior proto fallback.
  - candidateReports: same shape; optional numeric fields come through as
    0 instead of undefined, which the legacy consumer already handled.

* refactor(sanctions): migrate /api/sanctions-entity-search → LookupSanctionEntity (#3207)

The proto docstring already claimed "OFAC + OpenSanctions" coverage but the
handler only fuzzy-matched a local OFAC Redis index — narrower than the
legacy /api/sanctions-entity-search, which proxied OpenSanctions live (the
source advertised in docs/api-proxies.mdx). Deleting the legacy without
expanding the handler would have been a silent coverage regression for
external consumers.

Handler changes (server/worldmonitor/sanctions/v1/lookup-entity.ts):
  - Primary path: live search against api.opensanctions.org/search/default
    with an 8s timeout and the same User-Agent the legacy edge fn used.
  - Fallback path: the existing OFAC local fuzzy match, kept intact for when
    OpenSanctions is unreachable / rate-limiting.
  - Response source field flips between 'opensanctions' (happy path) and
    'ofac' (fallback) so clients can tell which index answered.
  - Query validation tightened: rejects q > 200 chars (matches legacy cap).
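
The primary/fallback flow as a sketch — function names, the result type, and the error handling are assumptions; the 8s timeout and 200-char cap are from the notes above:

```typescript
// Illustrative OpenSanctions-primary / OFAC-fallback lookup.
interface LookupResult {
  source: 'opensanctions' | 'ofac';
  entities: unknown[];
}

async function lookupEntity(
  q: string,
  searchOpenSanctions: (q: string, signal: AbortSignal) => Promise<unknown[]>,
  searchOfacLocal: (q: string) => Promise<unknown[]>,
): Promise<LookupResult> {
  if (q.length > 200) throw new Error('query too long'); // legacy 200-char cap
  try {
    // Primary: live search, bounded by an 8s timeout.
    const entities = await searchOpenSanctions(q, AbortSignal.timeout(8000));
    return { source: 'opensanctions', entities };
  } catch {
    // Fallback: local OFAC fuzzy match when OpenSanctions is unreachable.
    return { source: 'ofac', entities: await searchOfacLocal(q) };
  }
}
```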

Rate limiting:
  - Added /api/sanctions/v1/lookup-entity to ENDPOINT_RATE_POLICIES at 30/min
    per IP — matches the legacy createIpRateLimiter budget. The gateway
    already enforces per-endpoint policies via checkEndpointRateLimit.

Docs:
  - docs/api-proxies.mdx — dropped the /api/sanctions-entity-search row
    (plus the orphaned /api/ais-snapshot row left over from the previous
    commit in this PR).
  - docs/panels/sanctions-pressure.mdx — points at the new RPC URL and
    describes the OpenSanctions-primary / OFAC-fallback semantics.

api/sanctions-entity-search.js deleted; manifest entry removed.

* refactor(military): migrate /api/military-flights → ListMilitaryFlights (#3207)

Legacy /api/military-flights read a pre-baked Redis blob written by the
seed-military-flights cron and returned flights in a flat app-friendly
shape (lat/lon, lowercase enums, lastSeenMs). The proto RPC takes a bbox,
fetches OpenSky live, classifies server-side, and returns nested
GeoCoordinates + MILITARY_*_TYPE_* enum strings + lastSeenAt — same data,
different contract.

fetchFromRedis in src/services/military-flights.ts had nothing sebuf-aware
about it. Renamed it to fetchViaProto and rewrote it to:

  - Instantiate MilitaryServiceClient against getRpcBaseUrl().
  - Iterate MILITARY_QUERY_REGIONS (PACIFIC + WESTERN) in parallel — same
    regions the desktop OpenSky path and the seed cron already use, so
    dashboard coverage tracks the analytic pipeline.
  - Dedup by hexCode across regions.
  - Map proto → app shape via new mapProtoFlight helper plus three reverse
    enum maps (AIRCRAFT_TYPE_REVERSE, OPERATOR_REVERSE, CONFIDENCE_REVERSE).

The seed cron (scripts/seed-military-flights.mjs) stays put: it feeds
regional-snapshot mobility, cross-source signals, correlation, and the
health freshness check (api/health.js: 'military:flights:v1'). None of
those read the legacy HTTP endpoint; they read the Redis key directly.
The proto handler uses its own per-bbox cache keys under the same prefix,
so dashboard traffic no longer races the seed cron's blob — the two paths
diverge by a small refresh lag, which is acceptable.

Docs: dropped the /api/military-flights row from docs/api-proxies.mdx.

api/military-flights.js deleted; manifest entry removed.

Shape-diff vs legacy:
  - f.location.{latitude,longitude} → f.lat, f.lon
  - f.aircraftType: MILITARY_AIRCRAFT_TYPE_TANKER → 'tanker' via reverse map
  - f.operator: MILITARY_OPERATOR_USAF → 'usaf' via reverse map
  - f.confidence: MILITARY_CONFIDENCE_LOW → 'low' via reverse map
  - f.lastSeenAt (number) → f.lastSeen (Date)
  - f.enrichment → f.enriched (with field renames)
  - Extra fields registration / aircraftModel / origin / destination /
    firstSeenAt now flow through where proto populates them.
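
A mapProtoFlight sketch assembled from the shape-diff above; the reverse maps carry only the listed example entries and the types are illustrative:

```typescript
// Reverse maps: proto enum strings → the app's lowercase values.
// Only the example entries from the shape-diff are shown.
const AIRCRAFT_TYPE_REVERSE: Record<string, string> = { MILITARY_AIRCRAFT_TYPE_TANKER: 'tanker' };
const OPERATOR_REVERSE: Record<string, string> = { MILITARY_OPERATOR_USAF: 'usaf' };
const CONFIDENCE_REVERSE: Record<string, string> = { MILITARY_CONFIDENCE_LOW: 'low' };

interface ProtoFlight {
  hexCode: string;
  location: { latitude: number; longitude: number };
  aircraftType: string;
  operator: string;
  confidence: string;
  lastSeenAt: number; // epoch ms
}

function mapProtoFlight(f: ProtoFlight) {
  return {
    hexCode: f.hexCode,
    lat: f.location.latitude,
    lon: f.location.longitude,
    aircraftType: AIRCRAFT_TYPE_REVERSE[f.aircraftType],
    operator: OPERATOR_REVERSE[f.operator],
    confidence: CONFIDENCE_REVERSE[f.confidence],
    lastSeen: new Date(f.lastSeenAt),
  };
}
```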

* fix(supply-chain): thread includeCandidates through chokepoint status (#3207)

Caught by the tsconfig.api.json typecheck in the pre-push hook (the plain
tsc --noEmit run before the ais-snapshot push didn't cover it). The
chokepoint status handler calls getVesselSnapshot internally with a static
no-auth request, which must now supply the includeCandidates bool added by
the proto extension.

It passes false: server-internal callers don't need per-vessel reports.

* test(maritime): update getVesselSnapshot cache assertions (#3207)

The ais-snapshot migration replaced the single cachedSnapshot/cacheTimestamp
pair with a per-variant cache so candidates-on and candidates-off payloads
don't evict each other. The pre-push hook surfaced that tests/server-handlers
still asserted the old variable names. Rewrote the assertions to match the
new shape while preserving the invariants they actually guard:

  - Freshness check against slot TTL.
  - Cache read before relay call.
  - Per-slot in-flight dedup.
  - Stale-serve on relay failure (result ?? slot.snapshot).

* chore(proto): restore // @ts-nocheck on regenerated maritime files (#3207)

I ran 'buf generate --path worldmonitor/maritime/v1' to scope the proto
regen to the one service I was changing (to avoid the toolchain drift
that drops @ts-nocheck from 60+ unrelated files — separate issue). But
the repo convention is the 'make generate' target, which runs buf and
then sed-prepends '// @ts-nocheck' to every generated .ts file. My
scoped command skipped the sed step. The proto-check CI enforces the
sed output, so the two maritime files need the directive restored.

* refactor(enrichment): decomm /api/enrichment/{company,signals} legacy edge fns (#3207)

Both endpoints were already ported to IntelligenceService:
  - getCompanyEnrichment  (/api/intelligence/v1/get-company-enrichment)
  - listCompanySignals    (/api/intelligence/v1/list-company-signals)

No frontend callers of the legacy /api/enrichment/* paths exist. Removes:
  - api/enrichment/company.js, signals.js, _domain.js
  - api-route-exceptions.json migration-pending entries (58 remain)
  - docs/api-proxies.mdx rows for /api/enrichment/{company,signals}
  - docs/architecture.mdx reference updated to the IntelligenceService RPCs

Verified: typecheck, typecheck:api, lint:api-contract (89 files / 58 entries),
lint:boundaries, tests/edge-functions.test.mjs (136 pass),
tests/enrichment-caching.test.mjs (14 pass — still guards the intelligence/v1
handlers), make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(leads): migrate /api/{contact,register-interest} → LeadsService (#3207)

New leads/v1 sebuf service with two POST RPCs:
  - SubmitContact    → /api/leads/v1/submit-contact
  - RegisterInterest → /api/leads/v1/register-interest

Handler logic ported 1:1 from api/contact.js + api/register-interest.js:
  - Turnstile verification (desktop sources bypass, preserved)
  - Honeypot (website field) silently accepts without upstream calls
  - Free-email-domain gate on SubmitContact (422 ApiError)
  - validateEmail (disposable/offensive/typo-TLD/MX) on RegisterInterest
  - Convex writes via ConvexHttpClient (contactMessages:submit, registerInterest:register)
  - Resend notification + confirmation emails (HTML templates unchanged)

Shared helpers moved to server/_shared/:
  - turnstile.ts (getClientIp + verifyTurnstile)
  - email-validation.ts (disposable/offensive/MX checks)

Rate limits preserved via ENDPOINT_RATE_POLICIES:
  - submit-contact:    3/hour per IP (was in-memory 3/hr)
  - register-interest: 5/hour per IP (was in-memory 5/hr; desktop
    sources previously capped at 2/hr via shared in-memory map —
    now 5/hr like everyone else, accepting the small regression in
    exchange for Upstash-backed global limiting)

Callers updated:
  - pro-test/src/App.tsx contact form → new submit-contact path
  - src-tauri/sidecar/local-api-server.mjs cloud-fallback rewrites
    /api/register-interest → /api/leads/v1/register-interest when
    proxying; keeps local path for older desktop builds
  - src/services/runtime.ts isKeyFreeApiTarget allows both old and
    new paths through the WORLDMONITOR_API_KEY-optional gate

Tests:
  - tests/contact-handler.test.mjs rewritten to call submitContact
    handler directly; asserts on ValidationError / ApiError
  - tests/email-validation.test.mjs + tests/turnstile.test.mjs
    point at the new server/_shared/ modules

Deleted: api/contact.js, api/register-interest.js, api/_ip-rate-limit.js,
api/_turnstile.js, api/_email-validation.js, api/_turnstile.test.mjs.
Manifest entries removed (58 → 56). Docs updated (api-platform,
api-commerce, usage-rate-limits).

Verified: npm run typecheck + typecheck:api + lint:api-contract
(88 files / 56 entries) + lint:boundaries pass; full test:data
(5852 tests) passes; make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(pro-test): rebuild bundle for leads/v1 contact form (#3207)

Updates the enterprise contact form to POST to /api/leads/v1/submit-contact
(old path /api/contact removed in the previous commit).

Bundle is rebuilt from pro-test/src/App.tsx source change in 9ccd309d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address HIGH review findings 1-3 (#3207)

Three review findings from @koala73 on the sebuf-migration PR, all
silent bugs that would have shipped to prod:

### 1. Sanctions rate-limit policy was dead code

ENDPOINT_RATE_POLICIES keyed the 30/min budget under
/api/sanctions/v1/lookup-entity, but the generated route (from the
proto RPC LookupSanctionEntity) is /api/sanctions/v1/lookup-sanction-entity.
hasEndpointRatePolicy / getEndpointRatelimit are exact-string pathname
lookups, so the mismatch meant the endpoint fell through to the
generic 600/min global limiter instead of the advertised 30/min.

Net effect: the live OpenSanctions proxy endpoint (unauthenticated,
external upstream) had 20x the intended rate budget. Fixed by renaming
the policy key to match the generated route.

### 2. Lost stale-seed fallback on military-flights

Legacy api/military-flights.js cascaded military:flights:v1 →
military:flights:stale:v1 before returning empty. The new proto
handler went straight to live OpenSky/relay and returned null on miss.

Relay or OpenSky hiccup used to serve stale seeded data (24h TTL);
under the new handler it showed an empty map. Both keys are still
written by scripts/seed-military-flights.mjs on every run — fix just
reads the stale key when the live fetch returns null, converts the
seed's app-shape flights (flat lat/lon, lowercase enums, lastSeenMs)
to the proto shape (nested GeoCoordinates, enum strings, lastSeenAt),
and filters to the request bbox.

Read via getRawJson (unprefixed) to match the seed cron's writes,
which bypass the env-prefix system.

### 3. Hex-code casing mismatch broke getFlightByHex

The seed cron writes hexCode: icao24.toUpperCase() (uppercase);
src/services/military-flights.ts:getFlightByHex uppercases the lookup
input: f.hexCode === hexCode.toUpperCase(). The new proto handler
preserved OpenSky's lowercase icao24, and mapProtoFlight is a
pass-through. getFlightByHex was silently returning undefined for
every call after the migration.

Fix: uppercase in the proto handler (live + stale paths), and document
the invariant in a comment on MilitaryFlight.hex_code in
military_flight.proto so future handlers don't re-break it.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries) / lint:boundaries clean
- tests/edge-functions.test.mjs 130 pass
- make generate zero-diff (openapi spec regenerated for proto comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): restore desktop 2/hr rate cap on register-interest (#3207)

Addresses HIGH review finding #4 from @koala73. The legacy
api/register-interest.js applied a nested 2/hr per-IP cap when
`source === 'desktop-settings'`, on top of the generic 5/hr endpoint
budget. The sebuf migration lost this — desktop-source requests now
enjoy the full 5/hr cap.

Since `source` is an unsigned client-supplied field, anyone sending
`source: 'desktop-settings'` skips Turnstile AND gets 5/hr. Without
the tighter cap the Turnstile bypass is cheaper to abuse.

Added `checkScopedRateLimit` to `server/_shared/rate-limit.ts` — a
reusable second-stage Upstash limiter keyed on an opaque scope string
+ caller identifier. Fail-open on Redis errors to match existing
checkRateLimit / checkEndpointRateLimit semantics. Handlers that need
per-subscope caps on top of the gateway-level endpoint budget use this
helper.

In register-interest: when `isDesktopSource`, call checkScopedRateLimit
with scope `/api/leads/v1/register-interest#desktop`, limit=2, window=1h,
IP as identifier. On exceeded → throw ApiError(429).
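
A self-contained stand-in for the idea — the real checkScopedRateLimit is async and Upstash-backed; this in-memory bucket shows only the scope+identifier keying and the fail-open shape:

```typescript
// In-memory stand-in for a second-stage scoped limiter.
const buckets = new Map<string, { count: number; windowStart: number }>();

function checkScopedRateLimit(
  scope: string,       // e.g. '/api/leads/v1/register-interest#desktop'
  identifier: string,  // caller IP
  limit: number,
  windowMs: number,
): boolean {
  try {
    const key = `${scope}:${identifier}`;
    const now = Date.now();
    const bucket = buckets.get(key);
    if (!bucket || now - bucket.windowStart >= windowMs) {
      buckets.set(key, { count: 1, windowStart: now });
      return true;
    }
    bucket.count += 1;
    return bucket.count <= limit;
  } catch {
    return true; // fail-open, mirroring the Redis-error semantics above
  }
}
```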

### What this does not fix

This caps the blast radius of the Turnstile bypass but does not close
it — an attacker sending `source: 'desktop-settings'` still skips
Turnstile (just at 2/hr instead of 5/hr). The proper fix is a signed
desktop-secret header that authenticates the bypass; filed as
follow-up #3252. That requires coordinated Tauri build + Vercel env
changes out of scope for #3207.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries)
- tests/edge-functions.test.mjs + contact-handler.test.mjs (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): MEDIUM + LOW + rate-limit-policy CI check (#3207)

Closes out the remaining @koala73 review findings from #3242 that
didn't already land in the HIGH-fix commits, plus the requested CI
check that would have caught HIGH #1 (dead-code policy key) at
review time.

### MEDIUM #5 — Turnstile missing-secret policy default

Flip `verifyTurnstile`'s default `missingSecretPolicy` from `'allow'`
to `'allow-in-development'`. Dev with no secret = pass (expected
local); prod with no secret = reject + log. submit-contact was
already explicitly overriding to `'allow-in-development'`;
register-interest was silently getting `'allow'`. Safe default now
means a future missing-secret misconfiguration in prod gets caught
instead of silently letting bots through. Removed the now-redundant
override in submit-contact.
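
The policy semantics in a sketch — only the two policy values named above; the resolution helper itself is illustrative:

```typescript
// 'allow' always passes on a missing secret; 'allow-in-development'
// passes only outside production (the real code also logs the rejection).
type MissingSecretPolicy = 'allow' | 'allow-in-development';

function passOnMissingSecret(policy: MissingSecretPolicy, isProduction: boolean): boolean {
  if (policy === 'allow') return true;
  return !isProduction;
}
```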

### MEDIUM #6 — Silent enum fallbacks in maritime client

`toDisruptionEvent` mapped `AIS_DISRUPTION_TYPE_UNSPECIFIED` / unknown
enum values → `gap_spike` / `low` silently. Refactored to return null
when either enum is unknown; caller filters nulls out of the array.
Handler doesn't produce UNSPECIFIED today, but the `gap_spike`
default would have mislabeled the first new enum value the proto
ever adds — dropping unknowns is safer than shipping wrong labels.
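
The drop-unknowns pattern as a sketch — the enum map entry is inferred from the `gap_spike` default, and the real `toDisruptionEvent` maps more fields:

```typescript
// Return null for unknown enum strings instead of a silent default;
// the caller filters nulls so wrong labels never ship.
const DISRUPTION_TYPE: Record<string, string> = {
  AIS_DISRUPTION_TYPE_GAP_SPIKE: 'gap_spike', // assumed enum name
};

function toDisruptionEvent(raw: { type: string }): { type: string } | null {
  const type = DISRUPTION_TYPE[raw.type];
  return type === undefined ? null : { type };
}

const events = [
  { type: 'AIS_DISRUPTION_TYPE_GAP_SPIKE' },
  { type: 'AIS_DISRUPTION_TYPE_UNSPECIFIED' }, // unknown → dropped, not mislabeled
]
  .map(toDisruptionEvent)
  .filter((e): e is { type: string } => e !== null);
```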

### LOW — Copy drift in register-interest email

Email template hardcoded `435+ Sources`; PR #3241 bumped marketing to
`500+`. Bumped in the rewritten file to stay consistent.

The `as any` on Convex mutation names carried over from legacy; filed as
follow-up #3253.

### Rate-limit-policy coverage lint

`scripts/enforce-rate-limit-policies.mjs` validates that every key in
`ENDPOINT_RATE_POLICIES` resolves to a proto-generated gateway route by
cross-referencing `docs/api/*.openapi.yaml`. Failure messages reference the
sanctions-entity-search incident so future drift has a paper trail.

Wired into package.json (`lint:rate-limit-policies`) and the pre-push
hook alongside `lint:boundaries`. Smoke-tested both directions —
clean repo passes (5 policies / 175 routes), seeded drift (the exact
HIGH #1 typo) fails with the advertised remedy text.
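
The core cross-reference reduces to a set-membership check; the function name and inputs here are illustrative, using the exact HIGH #1 typo as the seeded drift:

```typescript
// Every rate-policy key must be a generated gateway route; anything
// else is a dead policy (the HIGH #1 failure mode).
function findDeadPolicyKeys(policyKeys: string[], generatedRoutes: string[]): string[] {
  const routes = new Set(generatedRoutes);
  return policyKeys.filter((key) => !routes.has(key));
}
```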

### Verified
- `lint:rate-limit-policies` ✓
- `typecheck` + `typecheck:api` ✓
- `lint:api-contract` ✓ (56 entries)
- `lint:boundaries` ✓
- edge-functions + contact-handler tests (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 5): decomm /api/eia/* + migrate /api/satellites → IntelligenceService (#3207)

Both targets turned out to be decomm-not-migration cases. The original
plan called for two new services (economic/v1.GetEiaSeries +
natural/v1.ListSatellitePositions) but research found neither was
needed:

### /api/eia/[[...path]].js — pure decomm, zero consumers

The "catch-all" is a misnomer — only two paths actually worked,
/api/eia/health and /api/eia/petroleum, both Redis-only readers.
Zero frontend callers in src/. Zero server-side readers. Nothing
consumes the `energy:eia-petroleum:v1` key that seed-eia-petroleum.mjs
writes daily.

The EIA data the frontend actually uses goes through existing typed
RPCs in economic/v1: GetEnergyPrices, GetCrudeInventories,
GetNatGasStorage, GetEnergyCapacity. None of those touch /api/eia/*.

Building GetEiaSeries would have been dead code. Deleted the legacy
file + its test (tests/api-eia-petroleum.test.mjs — it only covered
the legacy endpoint, no behavior to preserve). Empty api/eia/ dir
removed.

**Note for review:** the Redis seed cron keeps running daily and
nothing consumes it. If that stays unused, seed-eia-petroleum.mjs
should be retired too (separate PR). Out of scope for sebuf-migration.

### /api/satellites.js — Learning #2 strikes again

IntelligenceService.ListSatellites already exists at
/api/intelligence/v1/list-satellites, reads the same Redis key
(intelligence:satellites:tle:v1), and supports an optional country
filter the legacy didn't have.

One frontend caller in src/services/satellites.ts needed to switch
from `fetch(toApiUrl('/api/satellites'))` to the typed
IntelligenceServiceClient.listSatellites. Shape diff was tiny —
legacy `noradId` became proto `id` (handler line 36 already picks
either), everything else identical. alt/velocity/inclination in the
proto are ignored by the caller since it propagates positions
client-side via satellite.js.

Kept the client-side cache + failure cooldown + 20s timeout (still
valid concerns at the caller level).

### Manifest + docs
- api-route-exceptions.json: 56 → 54 entries (both removed)
- docs/api-proxies.mdx: dropped the two rows from the Raw-data
  passthroughs table

### Verified
- typecheck + typecheck:api ✓
- lint:api-contract (54 entries) / lint:boundaries / lint:rate-limit-policies ✓
- tests/edge-functions.test.mjs 127 pass (down from 130 — 3 tests were
  for the deleted eia endpoint)
- make generate zero-diff (no proto changes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 6): migrate /api/supply-chain/v1/{country-products,multi-sector-cost-shock} → SupplyChainService (#3207)

Both endpoints were hand-rolled TS handlers sitting under a proto URL prefix —
the exact drift the manifest guardrail flagged. Promoted both to typed RPCs:

- GetCountryProducts → /api/supply-chain/v1/get-country-products
- GetMultiSectorCostShock → /api/supply-chain/v1/get-multi-sector-cost-shock

Handlers preserve the existing semantics: PRO-gate via isCallerPremium(ctx.request),
iso2 / chokepointId validation, raw bilateral-hs4 Redis read (skip env-prefix to
match seeder writes), CHOKEPOINT_STATUS_KEY for war-risk tier, and the math from
_multi-sector-shock.ts unchanged. Empty-data and non-PRO paths return the typed
empty payload (no 403 — the sebuf gateway pattern is empty-payload-on-deny).

Client wrapper switches from premiumFetch to client.getCountryProducts/
client.getMultiSectorCostShock. Legacy MultiSectorShock / MultiSectorShockResponse /
CountryProductsResponse names remain as type aliases of the generated proto types
so CountryBriefPanel + CountryDeepDivePanel callsites compile with zero churn.

Manifest 54 → 52. Rate-limit gateway routes 175 → 177.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): add cache-tier entries for new supply-chain RPCs (#3207)

Pre-push tests/route-cache-tier.test.mjs caught the missing entries.
Both are PRO-gated and request-varying — matched to the existing supply-chain
PRO cohort (get-country-cost-shock, get-bypass-options, etc.) at the
slow-browser tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 7): migrate /api/scenario/v1/{run,status,templates} → ScenarioService (#3207)

Promote the three literal-filename scenario endpoints to a typed sebuf
service with three RPCs:

  POST /api/scenario/v1/run-scenario        (RunScenario)
  GET  /api/scenario/v1/get-scenario-status (GetScenarioStatus)
  GET  /api/scenario/v1/list-scenario-templates (ListScenarioTemplates)

Preserves all security invariants from the legacy handlers:
- 405 for wrong method (sebuf service-config method gate)
- scenarioId validation against SCENARIO_TEMPLATES registry
- iso2 regex ^[A-Z]{2}$
- JOB_ID_RE path-traversal guard on status
- Per-IP 10/min rate limit (moved to gateway ENDPOINT_RATE_POLICIES)
- Queue-depth backpressure (>100 → 429)
- PRO gating via isCallerPremium
- AbortSignal.timeout on every Redis pipeline (runRedisPipeline helper)
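
Two of the guards above as a sketch — the iso2 regex is quoted from the list; JOB_ID_RE's exact pattern is an assumption (the point is rejecting path-traversal input on status lookups):

```typescript
const ISO2_RE = /^[A-Z]{2}$/;               // quoted from the invariants above
const JOB_ID_RE = /^[A-Za-z0-9_-]{1,64}$/;  // assumed shape: no '/', '.', '..'

function isValidStatusLookup(iso2: string, jobId: string): boolean {
  return ISO2_RE.test(iso2) && JOB_ID_RE.test(jobId);
}
```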

Wire-level diffs vs legacy:
- Per-user RL now enforced at the gateway (same 10/min/IP budget).
- Rate-limit response omits Retry-After header; retryAfter is in the
  body per error-mapper.ts convention.
- ListScenarioTemplates emits affectedHs2: [] when the registry entry
  is null (all-sectors sentinel); proto repeated cannot carry null.
- RunScenario returns { jobId, status } (no statusUrl field — unused
  by SupplyChainPanel, drop from wire).

Gateway wiring:
- server/gateway.ts RPC_CACHE_TIER: list-scenario-templates → 'daily'
  (matches legacy max-age=3600); get-scenario-status → 'slow-browser'
  (premium short-circuit target, explicit entry required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: swap old run/status for the new
  run-scenario/get-scenario-status paths.
- api/scenario/v1/{run,status,templates}.ts deleted; 3 manifest
  exceptions removed (63 → 52 → 49 migration-pending).

Client:
- src/services/scenario/index.ts — typed client wrapper using
  premiumFetch (injects Clerk bearer / API key).
- src/components/SupplyChainPanel.ts — polling loop swapped from
  premiumFetch strings to runScenario/getScenarioStatus. Hard 20s
  timeout on run preserved via AbortSignal.any.

Tests:
- tests/scenario-handler.test.mjs — 18 new handler-level tests
  covering every security invariant + the worker envelope coercion.
- tests/edge-functions.test.mjs — scenario sections removed,
  replaced with a breadcrumb pointer to the new test file.

Docs: api-scenarios.mdx, scenario-engine.mdx, usage-rate-limits.mdx,
usage-errors.mdx, supply-chain.mdx refreshed with new paths.

Verified: typecheck, typecheck:api, lint:api-contract (49 entries),
lint:rate-limit-policies (6/180), lint:boundaries, route-cache-tier
(parity), full edge-functions (117) + scenario-handler (18).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 8): migrate /api/v2/shipping/{route-intelligence,webhooks} → ShippingV2Service (#3207)

Partner-facing endpoints promoted to a typed sebuf service. Wire shape
preserved byte-for-byte (camelCase field names, ISO-8601 fetchedAt, the
same subscriberId/secret formats, the same SET + SADD + EXPIRE 30-day
Redis pipeline). Partner URLs /api/v2/shipping/* are unchanged.

RPCs landed:
- GET  /route-intelligence  → RouteIntelligence  (PRO, slow-browser)
- POST /webhooks            → RegisterWebhook    (PRO)
- GET  /webhooks            → ListWebhooks       (PRO, slow-browser)

The existing path-parameter URLs remain on the legacy edge-function
layout because sebuf's HTTP annotations don't currently model path
params (grep proto/**/*.proto for `path: "{…}"` returns zero). Those
endpoints are split into two Vercel dynamic-route files under
api/v2/shipping/webhooks/, behaviorally identical to the previous
hybrid file but cleanly separated:
- GET  /webhooks/{subscriberId}                → [subscriberId].ts
- POST /webhooks/{subscriberId}/rotate-secret  → [subscriberId]/[action].ts
- POST /webhooks/{subscriberId}/reactivate     → [subscriberId]/[action].ts

Both get manifest entries under `migration-pending` pointing at #3207.

Other changes
- scripts/enforce-sebuf-api-contract.mjs: extended GATEWAY_RE to accept
  api/v{N}/{domain}/[rpc].ts (version-first) alongside the canonical
  api/{domain}/v{N}/[rpc].ts; first-use of the reversed ordering is
  shipping/v2 because that's the partner contract.
- vite.config.ts: dev-server sebuf interceptor regex extended to match
  both layouts; shipping/v2 import + allRoutes entry added.
- server/gateway.ts: RPC_CACHE_TIER entries for /api/v2/shipping/
  route-intelligence + /webhooks (slow-browser; premium-gated endpoints
  short-circuit to slow-browser but the entries are required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: route-intelligence + webhooks added.
- tests/shipping-v2-handler.test.mjs: 18 handler-level tests covering
  PRO gate, iso2/cargoType/hs2 coercion, SSRF guards (http://, RFC1918,
  cloud metadata, IMDS), chokepoint whitelist, alertThreshold range,
  secret/subscriberId format, pipeline shape + 30-day TTL, cross-tenant
  owner isolation, `secret` omission from list response.
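
A sketch of the extended gateway pattern — the repo's actual GATEWAY_RE may differ; this just shows both orderings matching while dynamic-route files stay excluded:

```typescript
// Canonical api/{domain}/v{N}/[rpc].ts OR version-first api/v{N}/{domain}/[rpc].ts.
const GATEWAY_RE = /^api\/(?:[a-z-]+\/v\d+|v\d+\/[a-z-]+)\/[a-z0-9-]+\.ts$/;
```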

Manifest delta
- Removed: api/v2/shipping/route-intelligence.ts, api/v2/shipping/webhooks.ts
- Added:   api/v2/shipping/webhooks/[subscriberId].ts (migration-pending)
- Added:   api/v2/shipping/webhooks/[subscriberId]/[action].ts (migration-pending)
- Added:   api/internal/brief-why-matters.ts (internal-helper) — regression
  surface from the #3248 main merge, which introduced the file without a
  manifest entry. Filed here to keep the lint green; not strictly in scope
  for commit 8 but unblocking.

Net result: 49 → 47 `migration-pending` entries (one net-removal even
though webhook path-params stay pending, because two files collapsed
into two dynamic routes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1): SupplyChainServiceClient must use premiumFetch (#3207)

Signed-in browser pro users were silently hitting 401 on 8 supply-chain
premium endpoints (country-products, multi-sector-cost-shock,
country-chokepoint-index, bypass-options, country-cost-shock,
sector-dependency, route-explorer-lane, route-impact). The shared
client was constructed with globalThis.fetch, so no Clerk bearer or
X-WorldMonitor-Key was injected. The gateway's validateApiKey runs
with forceKey=true for PREMIUM_RPC_PATHS and 401s before isCallerPremium
is consulted. The generated client's try/catch collapses the 401 into
an empty-fallback return, leaving panels blank with no visible error.

Fix is one line at the client constructor: swap globalThis.fetch for
premiumFetch. The same pattern is already in use for insider-transactions,
stock-analysis, stock-backtest, scenario, trade (premiumClient) — this
was an omission on this client, not a new pattern.

premiumFetch no-ops safely when no credentials are available, so the
5 non-premium methods on this client (shippingRates, chokepointStatus,
chokepointHistory, criticalMinerals, shippingStress) continue to work
unchanged.

This also fixes two panels that were already latently broken on main
(chokepoint-index, bypass-options, etc. — predating #3207, not
regressions from it). Commit 6 expanded the surface by routing two more
methods through the same buggy client; this commit fixes the class.

From koala73 review (#3242 second-pass, HIGH new #1):
> Exact class PR #3233 fixed for RegionalIntelligenceBoard /
> DeductionPanel / trade / country-intel. Supply-chain was not in
> #3233's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2): restore 400 on input-shape errors for 2 supply-chain handlers (#3207)

Commit 6 collapsed all non-happy paths into empty-200 on
`get-country-products` and `get-multi-sector-cost-shock`, including
caller-bug cases that legacy returned 400 for:

- get-country-products: malformed iso2 → empty 200 (was 400)
- get-multi-sector-cost-shock: malformed iso2 / missing chokepointId /
  unknown chokepointId → empty 200 (was 400)

The commit message for 6 called out the 403-for-non-pro → empty-200
shift ("sebuf gateway pattern is empty-payload-on-deny") but not the
400 shift. They're different classes:

- Empty-payload-200 for PRO-deny: intentional contract change, already
  documented and applied across the service. Generated clients treat
  "you lack PRO" as "no data" — fine.
- Empty-payload-200 for malformed input: caller bug silently masked.
  External API consumers can't distinguish "bad wiring" from "genuinely
  no data", test harnesses lose the signal, bad calling code doesn't
  surface in Sentry.

Fix: `throw new ValidationError(violations)` on the 3 input-shape
branches. The generated sebuf server maps ValidationError → HTTP 400
(see src/generated/server/.../service_server.ts and leads/v1 which
already uses this pattern).

PRO-gate deny stays as empty-200 — that contract shift was intentional
and is preserved.

Regression tests added at tests/supply-chain-validation.test.mjs (8
cases) pinning the three-way contract:
- bad input                         → 400 (ValidationError)
- PRO-gate deny on valid input      → 200 empty
- valid PRO input, no data in Redis → 200 empty (unchanged)
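The three-way contract can be sketched in one place. In the real code the `ValidationError` → 400 mapping lives in the generated sebuf server, not the handler; this sketch inlines it, and the handler/validation shapes are illustrative:

```typescript
// Input-shape failures throw; the (generated) server layer maps them to 400.
class ValidationError extends Error {
  constructor(public violations: string[]) {
    super(`validation failed: ${violations.join('; ')}`);
  }
}

// Simplified stand-in for a supply-chain handler plus its server wrapper.
function handleGetCountryProducts(iso2: unknown): { status: number; body: unknown } {
  try {
    if (typeof iso2 !== 'string' || !/^[A-Z]{2}$/.test(iso2)) {
      // Caller bug: surface as 400 instead of masking as empty-200.
      throw new ValidationError(['iso2 must be a two-letter uppercase country code']);
    }
    // PRO-gate deny and no-data-in-Redis both stay empty-200 (elided here).
    return { status: 200, body: { products: [] } };
  } catch (err) {
    if (err instanceof ValidationError) {
      return { status: 400, body: { violations: err.violations } };
    }
    throw err;
  }
}
```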

From koala73 review (#3242 second-pass, HIGH new #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 3): restore statusUrl on RunScenarioResponse + document 202→200 wire break (#3207)

Commit 7 silently shifted /api/scenario/v1/run-scenario's response
contract in two ways that the commit message covered only partially:

1. HTTP 202 Accepted → HTTP 200 OK
2. Dropped `statusUrl` string from the response body

The `statusUrl` drop was mentioned as "unused by SupplyChainPanel" but
not framed as a contract change. The 202 → 200 shift was not mentioned
at all. This is a same-version (v1 → v1) migration, so external callers
that key off either signal — `response.status === 202` or
`response.body.statusUrl` — silently branch incorrectly.

Evaluated options:
  (a) sebuf per-RPC status-code config — not available. sebuf's
      HttpConfig only models `path` and `method`; no status annotation.
  (b) Bump to scenario/v2 — judged heavier than the break itself for
      a single status-code shift. No in-repo caller uses 202 or
      statusUrl; the docs-level impact is containable.
  (c) Accept the break, document explicitly, partially restore.

Took option (c):

- Restored `statusUrl` in the proto (new field `string status_url = 3`
  on RunScenarioResponse). Server computes
  `/api/scenario/v1/get-scenario-status?jobId=<encoded job_id>` and
  populates it on every successful enqueue. External callers that
  followed this URL keep working unchanged.
- 202 → 200 is not recoverable inside the sebuf generator, so it is
  called out explicitly in two places:
    - docs/api-scenarios.mdx now includes a prominent `<Warning>` block
      documenting the v1→v1 contract shift + the suggested migration
      (branch on response body shape, not HTTP status).
    - RunScenarioResponse proto comment explains why 200 is the new
      success status on enqueue.
  OpenAPI bundle regenerated to reflect the restored statusUrl field.

- Regression test added in tests/scenario-handler.test.mjs pinning
  `statusUrl` to the exact URL-encoded shape — locks the invariant so
  a future proto rename or handler refactor can't silently drop it
  again.
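The restored `statusUrl` computation reduces to a single URL-encoding expression; `buildStatusUrl` is an illustrative name for whatever the handler actually calls:

```typescript
// Compute the status-polling URL populated on every successful enqueue.
// encodeURIComponent keeps job IDs with reserved characters wire-safe.
function buildStatusUrl(jobId: string): string {
  return `/api/scenario/v1/get-scenario-status?jobId=${encodeURIComponent(jobId)}`;
}
```

The regression test mentioned above pins exactly this URL-encoded shape so a refactor can't silently drop the encoding.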

From koala73 review (#3242 second-pass, HIGH new #3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1/2): close webhook tenant-isolation gap on shipping/v2 (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

server/worldmonitor/shipping/v2/{register-webhook,list-webhooks}.ts
migrated without reinstating validateApiKey(req, { forceKey: true }),
diverging from both the sibling api/v2/shipping/webhooks/[subscriberId]
routes and the documented "X-WorldMonitor-Key required" contract in
docs/api-shipping-v2.mdx.

Attack surface: the gateway accepts Clerk bearer auth as a pro signal.
A Clerk-authenticated pro user with no X-WorldMonitor-Key reaches the
handler, callerFingerprint() falls back to 'anon', and every such
caller collapses into a shared webhook:owner:anon:v1 bucket. The
defense-in-depth ownerTag !== ownerHash check in list-webhooks.ts
doesn't catch it because both sides equal 'anon' — every Clerk-session
holder could enumerate / overwrite every other Clerk-session pro
tenant's registered webhook URLs.

Fix: reinstate validateApiKey(ctx.request, { forceKey: true }) at the
top of each handler, throwing ApiError(401) when absent. Matches the
sibling routes exactly and the published partner contract.
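The collapse is easy to see in a simplified model of the fingerprint and bucket-key logic (names approximate the real `callerFingerprint` / `webhook:owner:*` scheme, and the hashing choice here is illustrative):

```typescript
import { createHash } from 'node:crypto';

// Without an API key, every caller falls back to the same 'anon' tag.
function callerFingerprint(apiKey: string | null): string {
  if (!apiKey) return 'anon';
  return createHash('sha256').update(apiKey).digest('hex').slice(0, 16);
}

// Two distinct Clerk-session callers with no key share one bucket —
// the tenant-isolation gap the forceKey gate closes.
function webhookOwnerBucket(apiKey: string | null): string {
  return `webhook:owner:${callerFingerprint(apiKey)}:v1`;
}
```

Reinstating `validateApiKey(ctx.request, { forceKey: true })` ahead of this logic means no keyless caller ever reaches the `'anon'` branch.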

Tests:
- tests/shipping-v2-handler.test.mjs: two existing "non-PRO → 403"
  tests for register/list were using makeCtx() with no key, which now
  fails at the 401 layer first. Renamed to "no API key → 401
  (tenant-isolation gate)" with a comment explaining the failure mode
  being tested. 18/18 pass.

Verified: typecheck:api, lint:api-contract (no change), lint:boundaries,
lint:rate-limit-policies, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2/2): restore v1 path aliases on scenario + supply-chain (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

Commits 6 + 7 of #3207 renamed five documented v1 URLs to the sebuf
method-derived paths and deleted the legacy edge-function files:

  POST /api/scenario/v1/run                       → run-scenario
  GET  /api/scenario/v1/status                    → get-scenario-status
  GET  /api/scenario/v1/templates                 → list-scenario-templates
  GET  /api/supply-chain/v1/country-products      → get-country-products
  GET  /api/supply-chain/v1/multi-sector-cost-shock → get-multi-sector-cost-shock

server/router.ts is an exact static-match table (Map keyed on `METHOD
PATH`), so any external caller — docs, partner scripts, grep-the-
internet — hitting the old documented URL would 404 on first request
after merge. Commit 8 (shipping/v2) preserved partner URLs byte-for-
byte; the scenario + supply-chain renames missed that discipline.

Fix: add five thin alias edge functions that rewrite the pathname to
the canonical sebuf path and delegate to the domain [rpc].ts gateway
via a new server/alias-rewrite.ts helper. Premium gating, rate limits,
entitlement checks, and cache-tier lookups all fire on the canonical
path — aliases are pure URL rewrites, not a duplicate handler pipeline.

  api/scenario/v1/{run,status,templates}.ts
  api/supply-chain/v1/{country-products,multi-sector-cost-shock}.ts
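The alias layer is a pure pathname rewrite over the five documented URLs listed above; `V1_ALIASES` and `rewriteAliasPath` are illustrative names for the helper's internals:

```typescript
// Old documented v1 URL -> canonical sebuf method-derived path.
const V1_ALIASES = new Map<string, string>([
  ['/api/scenario/v1/run', '/api/scenario/v1/run-scenario'],
  ['/api/scenario/v1/status', '/api/scenario/v1/get-scenario-status'],
  ['/api/scenario/v1/templates', '/api/scenario/v1/list-scenario-templates'],
  ['/api/supply-chain/v1/country-products', '/api/supply-chain/v1/get-country-products'],
  ['/api/supply-chain/v1/multi-sector-cost-shock', '/api/supply-chain/v1/get-multi-sector-cost-shock'],
]);

// Pure rewrite: premium gating, rate limits, and cache-tier lookups all
// fire later on the canonical path — no duplicate handler pipeline.
function rewriteAliasPath(pathname: string): string {
  return V1_ALIASES.get(pathname) ?? pathname;
}
```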

Vite dev parity: file-based routing at api/ is a Vercel concern, so the
dev middleware (vite.config.ts) gets a matching V1_ALIASES rewrite map
before the router dispatch.

Manifest: 5 new entries under `deferred` with removal_issue=#3282
(tracking their retirement at the next v1→v2 break). lint:api-contract
stays green (89 files checked, 55 manifest entries validated).

Docs:
- docs/api-scenarios.mdx: migration callout at the top with the full
  old→new URL table and a link to the retirement issue.
- CHANGELOG.md + docs/changelog.mdx: Changed entry documenting the
  rename + alias compat + the 202→200 shift (from commit 23c821a1).

Verified: typecheck:api, lint:api-contract, lint:rate-limit-policies,
lint:boundaries, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:55:59 +03:00
Elie Habib
425507d15a fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding (#3281)
* fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding

Shadow-diff of 15 v2 pairs (2026-04-22) showed the analyst pattern-
matching the loudest context numbers — VIX 19.50, top forecast
probability, MidEast FX stress 77 — into every story regardless of
editorial fit. A Rwanda humanitarian story about refugees cited VIX;
an aviation story cited a forecast probability.

Root cause: every story got the same 6-bundle context block, so the
LLM had markets / forecasts / macro in-hand and the "cite a specific
fact" instruction did the rest.

Two-layer fix:

  1. STRUCTURAL — sectionsForCategory() maps the story's category to
     an editorially-relevant subset of bundles. Humanitarian stories
     don't see marketData / forecasts / macroSignals; diplomacy gets
     riskScores only; market/energy gets markets+forecasts but drops
     riskScores. The model physically cannot cite what it wasn't
     given. Unknown categories fall back to all six (backcompat).

  2. PROMPT — WHY_MATTERS_ANALYST_SYSTEM_V2 adds a RELEVANCE RULE
     that explicitly permits grounding in headline/description
     actors when no context fact is a clean fit, and bans dragging
     off-topic market metrics into humanitarian/aviation/diplomacy
     stories. The prompt footer (inline, per-call) restates the
     same guardrail — models follow inline instructions more
     reliably than system-prompt constraints on longer outputs.
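The structural layer can be sketched as a first-match regex policy table over the six bundle names. The regexes and the exact bundle subsets per policy are illustrative; the real `CATEGORY_SECTION_POLICY` lives in the endpoint:

```typescript
const ALL_SECTIONS = [
  'worldBrief', 'countryBrief', 'riskScores',
  'marketData', 'forecasts', 'macroSignals',
] as const;
type Section = (typeof ALL_SECTIONS)[number];

// First-match wins; regexes here are illustrative, not the repo's.
const POLICIES: Array<{ label: string; match: RegExp; sections: Section[] }> = [
  // Humanitarian stories never see markets/forecasts/macro bundles.
  { label: 'humanitarian', match: /humanitarian|refugee|displacement/i,
    sections: ['worldBrief', 'countryBrief', 'riskScores'] },
  // Market/energy keeps markets+forecasts but drops riskScores.
  { label: 'market', match: /market|energy|commodit/i,
    sections: ['worldBrief', 'countryBrief', 'marketData', 'forecasts', 'macroSignals'] },
];

function sectionsForCategory(category: string): { label: string; sections: Section[] } {
  for (const p of POLICIES) {
    if (p.match.test(category)) return { label: p.label, sections: [...p.sections] };
  }
  // Unknown categories fall back to all six (backcompat).
  return { label: 'default', sections: [...ALL_SECTIONS] };
}
```

Because the humanitarian subset physically omits `marketData`, the model cannot cite VIX into a refugee story no matter what the prompt says — the prompt-level RELEVANCE RULE is belt-and-suspenders on top.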

Cache keys bumped to invalidate the formulaic v5 output: endpoint
v5 to v6, shadow v3 to v4. Adds 11 unit tests pinning the 5
policies + default fallback + humanitarian structural guarantee +
market policy does-see-markets + guardrail footer presence.

Observability: endpoint now logs policyLabel per call so operators
can confirm in Vercel logs that humanitarian/aviation stories are
NOT seeing marketData without dumping the full prompt.

* test(brief): address greptile P2 — sync MAX_BODY_BYTES + add parseWhyMattersV2 coverage

Greptile PR #3281 review raised two P2 test-quality issues:

1. Test-side MAX_BODY_BYTES mirror was still 4096 — the endpoint
   was bumped to 8192 in PR #3269 (v2 output + description). With
   the stale constant, a payload in the 4097–8192 range was
   accepted by the real endpoint but looked oversize in the test
   mirror, letting the body-cap invariant silently drift. Fixed
   by syncing to 8192 + bumping the bloated fixture to 10_000
   bytes so a future endpoint-cap bump doesn't silently
   re-invalidate the assertion.

2. parseWhyMattersV2 (the only output-validation gate on the
   analyst path) had no dedicated unit tests. Adds 11 targeted
   cases covering: valid 2 and 3 sentence output, 100/500 char
   bounds (incl. boundary assertions), all 6 banned preamble
   phrases, section-label leaks (SITUATION/ANALYSIS/Watch),
   markdown leakage (#, -, *, 1.), stub echo rejection, smart/
   plain quote stripping, non-string defensive branch, and
   whitespace-only strings.

Suite size: 50 to 61 tests, all green.

* fix(brief): add aviation policy to sectionsForCategory (PR #3281 review P1)

Reviewer caught that aviation was named in WHY_MATTERS_ANALYST_SYSTEM_V2's
RELEVANCE RULE as a category banned from off-topic market metrics, but
had no matching regex entry in CATEGORY_SECTION_POLICY. So 'Aviation
Incident' / 'Airspace Closure' / 'Plane Crash' / 'Drone Incursion' all
fell through to DEFAULT_SECTIONS and still got all 6 bundles including
marketData, forecasts, and macroSignals — exactly the VIX / forecast
probability pattern the PR claimed to structurally prevent.

Reproduced on HEAD before fix:
  Aviation Incident -> default
  Airspace Closure  -> default
  Plane Crash       -> default
  ...etc.

Fix:
  1. Adds aviation policy (same 3 bundles as humanitarian/diplomacy/
     tech: worldBrief, countryBrief, riskScores).
  2. Adds dedicated aviation-gating test with 6 category variants.
  3. Adds meta-invariant test: every category named in the system
     prompt's RELEVANCE RULE MUST have a structural policy entry,
     asserting policyLabel !== 'default'. If someone adds a new
     category name to the prompt in the future, this test fires
     until they wire up a regex — prevents soft-guard drift.
  4. Removes 'Aviation Incident' from the default-fall-through test
     list (it now correctly matches aviation).

No cache bump needed — v6 was published to the feature branch only a
few minutes ago, no production entries have been written yet.
2026-04-22 08:21:01 +04:00
Elie Habib
fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.
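The commit names `penalizedPillarScore` with alpha 0.5 but does not spell out the formula. One plausible non-compensatory form, consistent with the behavior described below (a penalty factor of at most 1, so the combined score can only fall relative to the compensatory aggregate), is a weighted mean pulled toward the weakest pillar — this is a sketch under that assumption, not the repo's actual math:

```typescript
// Hypothetical non-compensatory combine: weighted mean times a penalty
// derived from the weakest pillar. The real formula is in
// server/worldmonitor/resilience/v1/_shared.ts.
function penalizedPillarScore(
  pillars: Array<{ score: number; weight: number }>,
  alpha = 0.5,
): number {
  const weighted = pillars.reduce((sum, p) => sum + p.score * p.weight, 0);
  const weakest = Math.min(...pillars.map((p) => p.score));
  // (weakest / weighted)^alpha <= 1 when weights sum to 1, so the
  // combined score never exceeds the compensatory aggregate.
  const penalty = weighted > 0 ? (weakest / weighted) ** alpha : 1;
  return weighted * penalty;
}
```

With equal pillars the penalty is 1 and the score is unchanged; a weak pillar drags the result below the weighted mean, which matches the universal downward score shift reported in the sensitivity snapshot.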

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops; the penalty
  factor is at most 1, so scores can only fall)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10
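The dynamic-env-read plus three-condition branch can be sketched as follows (parameterizing the env object makes the test-flippability explicit; the real code reads `process.env` directly):

```typescript
// Read the flag at call time, not module-load time, so tests can flip
// state without reloading the module.
function isPillarCombineEnabled(env: Record<string, string | undefined>): boolean {
  return env.RESILIENCE_PILLAR_COMBINE_ENABLED === 'true'; // default off
}

// Branch on flag AND schema v2 AND pillar availability; otherwise fall
// through to the unchanged 6-domain aggregate.
function overallScore(opts: {
  env: Record<string, string | undefined>;
  schemaV2Enabled: boolean;
  pillarScore: number | null;
  domainScore: number;
}): number {
  if (isPillarCombineEnabled(opts.env) && opts.schemaV2Enabled && opts.pillarScore !== null) {
    return opts.pillarScore;
  }
  return opts.domainScore;
}
```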

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00
Elie Habib
502bd4472c docs(resilience): sync methodology/proto/widget to 6-domain + 3-pillar reality (#3264)
Brings every user-facing surface into alignment with the live resilience
scorer. Zero behavior change: overall_score is still the 6-domain
weighted aggregate, schemaVersion is still 2.0 default, and every
existing test continues to pass.

Surfaces touched:
- proto + OpenAPI: rewrote the ResiliencePillar + schema_version
  descriptions. 2.0 is correctly documented as default; shaped-but-empty
  language removed.
- Widget: added the missing recovery: 'Recovery' label (it was rendering
  literal lowercase 'recovery' before), retitled the footer data-version
  chip from 'Data' to 'Seed date' so it is clear the value reflects the
  static seed bundle, not every live input, and rewrote the help tooltip
  for 6 domains and 3 pillars, calling out the 0.25 recovery weight.
- Methodology doc: domains-and-weights table now carries all 6 rows
  with actual code weights (0.17/0.15/0.11/0.19/0.13/0.25), Recovery
  section header weight corrected from 1.0 to 0.25, new Pillar-combined
  score activation (pending) section with the measured Spearman 0.9935,
  top-5 movers, and the activation checklist.
- documentation.mdx + features.mdx: product blurbs updated from 5
  domains and 13 dimensions to 6 domains and 19 dimensions grouped into
  3 pillars.
- Tests: recovery-label regression pin, Seed date label pin, clarified
  pillar-schema degenerate-input semantics.

New scaffolding for defensibility:
- docs/snapshots/resilience-ranking-2026-04-21.json frozen published
  tables artifact with methodology metadata and commit SHA.
- docs/snapshots/resilience-pillar-sensitivity-2026-04-21.json live
  Redis capture (52-country sample) combining sensitivity stability
  with the current-vs-proposed Spearman comparison.
- scripts/freeze-resilience-ranking.mjs refresh script.
- scripts/compare-resilience-current-vs-proposed.mjs comparison script.
- tests/resilience-ranking-snapshot.test.mts 13 assertions auto
  discovered from any resilience-ranking-YYYY-MM-DD.json in snapshots.

Verification: npm run typecheck:all clean, 390/390 resilience tests
pass.

Follow-up: pillar-combined score activation. The sensitivity artifact
shows rank-preservation Spearman 0.9935 and no ceiling effects, which
clears the methodological bar. Blocker is messaging because every
country drops ~13 points under the penalty, so activation PR ships with
re-anchored release-gate bands, refreshed frozen ranking, and a v2.0
methodology note.
2026-04-21 22:37:27 +04:00
Elie Habib
e878baec52 fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap) (#3271)
* fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap)

Review finding on PR #3255: DIGEST_ONLY_USER was a sticky production
footgun. If an operator forgot to unset after a one-off validation,
the cron silently filtered every other user out indefinitely while
still completing normally (exit 0) — prolonged partial outage with
"green" runs.

Fix: mandatory `|until=<ISO8601>` suffix within 48h of now. Otherwise
the flag is IGNORED with a loud warn, fan-out proceeds normally.
Active filter emits a structured console.warn at run start listing
expiry + remaining minutes.

Valid:
  DIGEST_ONLY_USER=user_xxx|until=2026-04-22T18:00Z

Rejected (→ loud warn, normal fan-out):
- Legacy bare `user_xxx` (missing required suffix)
- Unparseable ISO
- Expiry > 48h (forever-test mistake)
- Expiry in past (auto-disable)

Parser extracted to `scripts/lib/digest-only-user.mjs` (testable
without importing seed-digest-notifications.mjs which has no isMain
guard).

Tests: 17 cases covering unset / reject / active branches, ISO
variants, boundaries, and the 48h cap. 6066 total pass. typecheck × 2
clean.

Breaking change on the flag's format, but it shipped 2h before this
finding with no prod usage — tightening now is cheaper than after
an incident.

* chore(digest): address /ce:review P2s on DIGEST_ONLY_USER parser

Two style fixes flagged by Greptile on PR #3271:

1. Misleading multi-pipe error message.
   `user_xxx|until=<iso>|extra` returned "missing mandatory suffix",
   which points the operator toward adding a suffix that is already
   present (confused operator might try `user_xxx|until=...|until=...`).
   Now distinguishes parts.length===1 ("missing suffix") from >2
   ("expected exactly one '|' separator, got N").

2. Date.parse is lenient — accepts RFC 2822, locale strings, "April 22".
   The documented contract is strict ISO 8601; the 48h cap catches
   accidental-valid dates but the documentation lied. Added a regex
   guard up-front that enforces the ISO 8601 shape
   (YYYY-MM-DD optionally followed by time + TZ). Rejects the 6
   Date-parseable-but-not-ISO fixtures before Date.parse runs.
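A shape guard of the kind described can sit in front of `Date.parse` like this (the exact regex in the repo may differ; this one is an illustrative approximation of "YYYY-MM-DD optionally followed by time + TZ"):

```typescript
// Strict ISO 8601 shape: date, optional time (seconds/millis optional),
// optional Z or numeric offset. Rejects RFC 2822 and locale strings that
// the lenient Date.parse would otherwise accept.
const ISO_8601_SHAPE =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{1,3})?)?(Z|[+-]\d{2}:?\d{2})?)?$/;

function isStrictIso(value: string): boolean {
  return ISO_8601_SHAPE.test(value) && !Number.isNaN(Date.parse(value));
}
```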

Both regressions pinned in tests/digest-only-user.test.mjs (18 pass,
was 17). typecheck × 2 clean.
2026-04-21 22:36:30 +04:00
Elie Habib
ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
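The 6-field hash can be sketched as below. The field order, the delimiter, and the 16-hex-char truncation are assumptions matching the `{hash16}` key shape; the real `hashBriefStory` may differ in detail:

```typescript
import { createHash } from 'node:crypto';

type BriefStory = {
  headline: string;
  source: string;
  threatLevel: string;
  category: string;
  country: string;
  description?: string; // 6th field — absent and empty must still differ from distinct values
};

// Cache-key hash over all prompt-relevant fields, so two stories that
// differ only in description no longer collide on one cache slot.
function hashBriefStory(story: BriefStory): string {
  const fields = [
    story.headline, story.source, story.threatLevel,
    story.category, story.country, story.description ?? '',
  ];
  // NUL delimiter prevents field-boundary ambiguity ('a'+'bc' vs 'ab'+'c').
  return createHash('sha256').update(fields.join('\u0000')).digest('hex').slice(0, 16);
}
```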

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
Elie Habib
048bb8bb52 fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter

Three changes, all operational for PR #3248's brief-why-matters feature.

1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to
/api/internal/brief-why-matters returning 403 — middleware's "short
UA" guard (~L183) rejects Node undici's default UA before the
endpoint's own Bearer-auth runs. The feature never executed in prod;
three-layer fallback silently shipped legacy Gemini output. Same
class as /api/seed-contract-probe (2026-04-15). Endpoint still
carries its own subtle-crypto HMAC auth, so bypassing the UA gate
is safe.

2. Explicit UA on callAnalystWhyMatters fetch
Defense-in-depth. Explicit 'worldmonitor-digest-notifications/1.0'
keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored,
and makes cron traffic distinguishable from ops curl in logs.

3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send
for one user on the next tick (then unset) — validates new features
end-to-end without fanning out. Empty/unset = normal fan-out. Applied
right after rule fetch so both compose and dispatch paths respect it.

Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts
pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl
UA) plus a negative sibling-path suite so a future prefix-match
refactor can't silently unblock /api/internal/.

Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing
main() complexity warning bumped 74→78 by the filter block (unchanged
in character from pre-PR).

* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs

Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under
the current middleware chain this is sufficient (sibling paths hit the
short-UA OR BOT_UA 403 regardless), but it doesn't pin *which* guard
fires. A future refactor that moves `PUBLIC_API_PATHS.has(path)` later
in the chain could let a curl or undici UA pass on a sibling path
without this suite failing.

Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty,
short/undici, curl). Every combination must still 403 regardless of
which guard catches it. 6 new test cases.

Tests: 35 pass in the middleware-bot-gate suite (was 29).
2026-04-21 19:41:58 +04:00
Elie Habib
65a1210531 fix(unrest): Decodo proxy fallback for GDELT + surface err.cause (#3256)
* fix(unrest): Decodo proxy fallback for GDELT + surface err.cause

Background: unrestEvents went STALE_SEED when every tick logged
"GDELT failed: fetch failed" (Railway log 2026-04-21). The bare
"fetch failed" string hid the actual cause (DNS/TCP/TLS), so the
outage was opaque. ACLED is disabled (no credentials) so GDELT is
the sole live source — when it fails, the seed freezes.

Changes:
- fetchGdeltEvents: direct-first, Decodo proxy fallback via
  httpsProxyFetchRaw when PROXY_URL is configured. Mirrors
  imfFetchJson / _yahoo-fetch.mjs direct→proxy pattern.
- Error messages now include err.cause.code (UND_ERR_CONNECT_TIMEOUT,
  ENOTFOUND, ECONNRESET, etc.) so the next outage surfaces the
  underlying transport error instead of "fetch failed".
- Both-paths-failed error carries direct + proxy message so either
  can be diagnosed from a single log line.

No behavior change on the happy path — direct fetch still runs first
with the existing 30s AbortSignal timeout.
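The direct-first/proxy-fallback shape (including the later HTTP-status skip from the P2 follow-up) can be sketched with fetchers abstracted away; `httpStatus` tagging mirrors what the commits describe, while the function names are illustrative:

```typescript
type Fetcher = () => Promise<string>;

// Direct fetch first; on transport failure, fall back to the proxy.
// Upstream HTTP errors (tagged with httpStatus) skip the proxy hop:
// the proxy routes to the same GDELT endpoint, so a retry is pointless.
async function fetchWithProxyFallback(direct: Fetcher, proxy: Fetcher | null): Promise<string> {
  try {
    return await direct(); // happy path unchanged: direct runs first
  } catch (directErr) {
    const httpStatus = (directErr as { httpStatus?: number }).httpStatus;
    if (!proxy || httpStatus !== undefined) throw directErr;
    try {
      return await proxy();
    } catch (proxyErr) {
      // Both-paths-failed: one log line carries both messages.
      throw new Error(
        `direct: ${(directErr as Error).message}; proxy: ${(proxyErr as Error).message}`,
      );
    }
  }
}
```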

* fix(unrest): address PR #3256 P2 review

- describeErr: handle plain-string .cause (e.g. `{ cause: 'ENOTFOUND' }`)
  that would otherwise be silently dropped since a string has no
  .code/.errno/.message accessors.
- fetchGdeltDirect: tag HTTP-status errors (!resp.ok) with httpStatus.
  fetchGdeltEvents skips the proxy hop for upstream HTTP errors since
  the proxy routes to the same GDELT endpoint — saves the 20s proxy
  timeout and avoids a pointless retry. Transport failures (DNS/TCP/TLS
  timeouts against Railway IPs) still trigger the proxy fallback, which
  is the motivating case.
2026-04-21 19:39:16 +04:00
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests).
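
   The shape of the fix, sketched (illustrative only; `writeShadowRecord`
   and the `EdgeContext` shape here are stand-ins, not the repo's actual
   identifiers):

   ```javascript
   // Sketch of the waitUntil pattern described above. writeShadowRecord
   // and ctx are stand-ins for the real helpers.
   function recordShadow(writeShadowRecord, ctx) {
     // Build the promise up front so the write starts immediately.
     const shadowPromise = writeShadowRecord().catch(() => {
       // Shadow writes are best-effort; never fail the response.
     });
     if (ctx && typeof ctx.waitUntil === 'function') {
       // Edge runtime: keep the isolate alive until the write settles.
       ctx.waitUntil(shadowPromise);
       return 'registered';
     }
     // Local harness / tests: plain unawaited execution.
     return 'unawaited';
   }
   ```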

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.
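Roughly what the guard exercises, sketched (illustrative; the real
sanitizeForPrompt in server/_shared/llm-sanitize.js may use different
rules and cover more patterns):

```javascript
// Illustrative sanitizer covering only the three markers the regression
// guard asserts; the repo's sanitizeForPrompt is the real implementation.
function sanitizeForPromptSketch(text) {
  return String(text)
    .replace(/ignore previous instructions/gi, '')
    .replace(/^###\s*Assistant:.*$/gim, '') // role-prefix lines
    .replace(/<\|im_start\|>/g, '');        // chat-template tokens
}

// Wrap the story fields that flow into the prompt build.
function sanitizeStoryForPromptSketch(story) {
  return {
    ...story,
    title: sanitizeForPromptSketch(story.title ?? ''),
    source: sanitizeForPromptSketch(story.source ?? ''),
  };
}
```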

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
89c179e412 fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time (#3254)
* fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time

Reported: a brief viewed at 13:02 local time showed "Good evening" on the
cover (slide 1) but "Good afternoon." on the digest greeting page (slide 2).

Cause: `server/_shared/brief-render.js:renderCover` had the string
`'Good evening'` hardcoded in the cover's mono-cased salutation slot.
The digest greeting page (slide 2) renders the time-of-day-correct
value from `envelope.data.digest.greeting`, which is computed by
`shared/brief-filter.js:174-179` from `localHour` in the user's TZ
(< 12 → morning, < 18 → afternoon, else → evening). So any brief
viewed outside the literal evening showed an inconsistent pair.

Fix: thread `digest.greeting` into `renderCover`; a small
`coverGreeting()` helper strips the trailing period so the cover's
no-punctuation mono style is preserved. On unexpected/missing values
it falls back to a generic "Hello" rather than silently re-hardcoding
a specific time of day.
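Sketched, the pair looks roughly like this (illustrative; the real
greeting computation lives in `shared/brief-filter.js` and the helper in
`server/_shared/brief-render.js`):

```javascript
// Time-of-day greeting per the thresholds quoted above
// (< 12 morning, < 18 afternoon, else evening).
function digestGreeting(localHour) {
  if (localHour < 12) return 'Good morning.';
  if (localHour < 18) return 'Good afternoon.';
  return 'Good evening.';
}

// Cover variant: strip the trailing period to match the cover's
// no-punctuation mono style; fall back to a generic "Hello" rather
// than re-hardcoding a specific time of day.
function coverGreeting(greeting) {
  if (typeof greeting !== 'string' || greeting.length === 0) return 'Hello';
  return greeting.replace(/\.$/, '');
}
```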

Tests: 5 regression cases in `tests/brief-magazine-render.test.mjs`
cover afternoon/morning/evening parity, period stripping, and HTML
escape (defense-in-depth). 60 total in that file pass. Full
test:data 5921 pass. typecheck + typecheck:api + biome clean.

* chore(brief): fix orphaned JSDoc on coverGreeting / renderCover

Greptile flagged: the original `renderCover` JSDoc block stayed above
`coverGreeting` when the helper was inserted, so the @param shape was
misattributed to the wrong function and `renderCover` was left
undocumented (plus the new `greeting` field was unlisted).

Moved the opts-shape JSDoc to immediately above `renderCover` and
added `greeting: string` to the param type. `coverGreeting` keeps its
own prose comment.

No runtime change.
2026-04-21 13:46:21 +04:00
Elie Habib
b0928f213c fix(live-webcams): refresh Iran-Attacks multicam + Mideast Mecca video IDs (#3251)
User reported two tiles showing "This live stream recording is not
available" — the pinned fallbackVideoIds had gone dark.

- iran-multicam (Iran Attacks → Middle East slot):
    FGUKbzulB_Y → KSwPNkzEgxg
- mecca (Mideast → Mecca slot):
    Cm1v4bteXbI → kJwEsQTegxk

Values supplied by the user from working YouTube live URLs. Only
`fallbackVideoId` is read by the runtime (buildEmbedUrl line 303, open-link
line 422); `channelHandle` is metadata and left as-is.
2026-04-21 13:09:28 +04:00
Elie Habib
c279f6f426 fix(pro-marketing): nav reflects auth state, hide pro banner for pro users (#3250)
* fix(pro-marketing): reflect auth state in nav, hide pro banner for pro users

Two related signed-in-experience bugs caught by the user during the
post-purchase flow:

1. /pro Navbar's SIGN IN button never reacted to auth state. The
   component was a static const Navbar = () => <nav>...</nav>; with
   no Clerk subscription, so signing in left the SIGN IN button in
   place even though the user was authenticated.

2. The "Pro is launched — Upgrade to Pro" announcement banner on the
   main app showed for ALL visitors including paying Pro subscribers.
   Pitching upgrade to a customer who already paid is a small but
   real annoyance, and it stays sticky for 7 days via the localStorage
   dismiss key — so a returning paying user dismisses it once and
   then never sees the (genuinely useful) banner again if they later
   downgrade.

## Changes

### pro-test/src/App.tsx — useClerkUser hook + ClerkUserButton

- New useClerkUser() hook subscribes to Clerk via clerk.addListener
  and returns { user, isLoaded } so any component can react to auth
  changes (sign-in, sign-out, account switch).
- New ClerkUserButton component mounts Clerk's native UserButton
  widget (avatar + dropdown with profile/sign-out) into a div via
  clerk.mountUserButton — inherits the existing dark-theme appearance
  options from services/checkout.ts::ensureClerk.
- Navbar swaps SIGN IN button for ClerkUserButton when user is
  signed in. Slot is intentionally empty during isLoaded=false to
  avoid a SIGN IN → avatar flicker for returning users.
- Hero hides its redundant SIGN IN CTA when signed in; collapses to
  just "Choose Plan" which is the relevant action for returning users.
- Public/pro/ rebuilt to ship the change (per PR #3229's bundle-
  freshness rule).

### src/components/ProBanner.ts — premium-aware show + reactive auto-hide

- showProBanner returns early if hasPremiumAccess() — same authoritative
  signal used by the frontend's panel-gating layer (unions API key,
  tester key, Clerk pro role, AND Convex Dodo entitlement).
- onEntitlementChange listener auto-dismisses the banner if a Convex
  snapshot arrives mid-session that flips the user to premium (e.g.
  Dodo webhook lands while they're sitting on the dashboard). Does NOT
  write the dismiss timestamp, so the banner reappears correctly if
  they later downgrade.

## Test plan

### pro-test (sign-in UI)

- [ ] Anonymous user loads /pro → SIGN IN button visible in nav.
- [ ] Click SIGN IN, complete Clerk modal → button replaced with
      Clerk's UserButton avatar dropdown.
- [ ] Open dropdown, click Sign Out → reverts to SIGN IN button.
- [ ] Hard reload as signed-in user → SIGN IN button never flashes;
      avatar appears once Clerk loads.

### main app (banner gating)

- [ ] Anonymous user loads / → "Pro is launched" banner shows.
- [ ] Click ✕ to dismiss → banner stays dismissed for 7 days
      (existing behavior preserved).
- [ ] Pro user (active Convex entitlement) loads / → banner does
      NOT appear, regardless of dismiss state.
- [ ] Free user opens /, then completes checkout in another tab and
      Convex publishes the entitlement snapshot → banner auto-hides
      in the dashboard tab without reload.
- [ ] Pro user whose subscription lapses (validUntil < now) → banner
      reappears on next page load, since dismiss timestamp wasn't
      written by the entitlement-change auto-hide.

* fix(pro-banner): symmetric show/hide on entitlement change

Reviewer caught that the previous iteration only handled the upgrade
direction (premium snapshot → hide banner) but never re-showed the
banner on a downgrade. App.ts calls showProBanner() once at init, so
without a symmetric show path, a session that started premium and
then lost entitlement (cancellation, billing grace expiry, plan
downgrade for the same user) would stay banner-less for the rest of
the SPA session — until a full reload re-ran App.ts init.

Net effect of the bug: the comment claiming "the banner reappears
correctly if they later downgrade or the entitlement lapses" was
false in practice for any in-tab transition.

Two changes:

  1. Cache the container on every showProBanner() call, including
     the early-return paths. App.ts always calls showProBanner()
     once at init regardless of premium state, so this guarantees
     the listener has the container reference even when the initial
     mount was skipped (premium user, dismissed, in iframe).

  2. Make onEntitlementChange handler symmetric:
       - premium snapshot + visible → hide (existing behavior)
       - non-premium snapshot + not visible + cached container +
         not dismissed + not in iframe → re-mount via showProBanner

The non-premium re-mount goes through showProBanner() so it gets the
same gate checks as the initial path (isDismissed, iframe, premium).
We can never surface a banner the user has already explicitly ✕'d
this week.

Edge cases handled:
  - User starts premium, no banner shown, downgrades mid-session
    → listener fires, premium false, no bannerEl, container cached,
       not dismissed → showProBanner mounts banner ✓
  - User starts free, sees banner, upgrades mid-session
    → listener fires, premium true, bannerEl present → fade out ✓
  - User starts free, dismisses banner, upgrades, downgrades
    → listener fires on downgrade, premium false, no bannerEl,
       container cached, isDismissed=true → showProBanner returns early ✓
  - User starts free, banner showing, multiple entitlement snapshots
    arrive without state change → premium=false && bannerEl present,
    neither branch fires, idempotent no-op ✓
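
The symmetric handler, sketched (illustrative; in the real code the
re-mount path goes through showProBanner(), which performs the
dismissed/iframe/premium gate checks itself rather than inline here):

```javascript
// Stand-in UI object models bannerEl visibility, the cached container,
// and the dismiss/iframe gates described above.
function onEntitlementSnapshot(state, ui) {
  if (state.premium && ui.bannerVisible) {
    ui.hide(); // upgrade direction: fade the banner out
  } else if (
    !state.premium && !ui.bannerVisible &&
    ui.containerCached && !ui.dismissed && !ui.inIframe
  ) {
    ui.show(); // downgrade direction: re-mount the banner
  }
  // Any other combination: idempotent no-op.
}
```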

* fix(pro-banner): defer initial mount while entitlement is loading

Greptile P1 round-2: hasPremiumAccess() at line 48 reads isEntitled()
synchronously, but the Convex entitlement subscription is fired
non-awaited at App.ts:868 (`void initEntitlementSubscription()`).
showProBanner() runs at App.ts:923 during init Phase 1, before the
first Convex snapshot arrives.

So a Convex-only paying user (Clerk role 'free' + Dodo entitlement
tier=1) sees this sequence:

  t=0    init runs → hasPremiumAccess() === false (isEntitled() reads
         currentState===null) → "Upgrade to Pro" banner mounts
  t=~1s  Convex snapshot arrives → onEntitlementChange fires → my
         listener detects premium=true && bannerEl !== null → fade out

That's a 1+ second flash of "you should upgrade!" content for someone
who has already paid. Worst case is closer to ~10s on a cold-start
Convex client, long enough that the upgrade pitch reads as the actual
UI rather than a flash.

Defer the initial mount when (1) the user is signed in (so they
plausibly have a Convex entitlement) AND (2) the entitlement state
hasn't loaded yet (currentState === null). The existing
onEntitlementChange listener will mount it later if the first
snapshot confirms the user is actually free.

Two reasons this is gated on "signed in":
  - Anonymous users will never have a Convex entitlement, so
    deferring would mean the banner NEVER mounts for them. Bad
    regression: anon visitors are the highest-value audience for
    the upgrade pitch.
  - For signed-in users, the worst case if no entitlement EVER
    arrives is the banner stays absent — which is identical to a
    paying user's correct state, so it fails-closed safely.
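
The defer decision reduces to a small predicate (a sketch; `isSignedIn`
and `entitlementState` stand in for the real Clerk session check and the
Convex client's `currentState`):

```javascript
// Defer the initial banner mount only when the user might have a Convex
// entitlement that has not loaded yet. Anonymous users never defer, so
// the highest-value audience always sees the banner immediately.
function shouldDeferInitialMount(isSignedIn, entitlementState) {
  return isSignedIn && entitlementState === null;
}
```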

Edge case behavior:
  - Anonymous user: no Clerk session → first condition false →
    banner mounts immediately ✓
  - Signed-in free user with first snapshot pre-loaded somehow:
    second condition false → banner mounts immediately ✓
  - Signed-in user, snapshot pending: deferred → listener mounts
    on first snapshot if user turns out free ✓
  - Signed-in user, snapshot pending, user turns out premium: never
    mounted ✓ (the desired path)
  - Signed-in user, snapshot pending, never arrives (Convex outage):
    banner never shows → see above, this fails-closed safely
2026-04-21 11:01:57 +04:00
Elie Habib
6977e9d0fe fix(gateway): accept Dodo entitlement as pro, not just Clerk role — unblocks paying users (#3249)
* fix(gateway): accept Dodo entitlement as pro, not just Clerk role

The gateway's legacy premium-paths gate (lines 388-401) was rejecting
authenticated Bearer users with 403 "Pro subscription required"
whenever session.role !== 'pro' — which is EVERY paying Dodo
subscriber, because the Dodo webhook pipeline writes Convex
entitlements and does NOT sync Clerk publicMetadata.role.

So the flow was:
  - User pays, Dodo webhook fires, Convex entitlement tier=1 written
  - User loads the dashboard, Clerk token includes Bearer but role='free'
  - Gateway sees role!=='pro' → 403 on every intelligence/trade/
    economic/sanctions premium endpoint
  - User sees a blank dashboard despite having paid

This is the exact split-brain documented at the frontend layer
(src/services/panel-gating.ts:11-27): "The Convex entitlement check
is the authoritative signal for paying customers — Clerk
`publicMetadata.plan` is NOT written by our webhook pipeline". The
frontend was fixed by having hasPremiumAccess() fall through to
isEntitled() from Convex. The backend gateway still had the
Clerk-role-only gate, so paying users got rejected even though
their Convex entitlement was active.

Align the gateway gate with the logic already in
server/_shared/premium-check.ts::isCallerPremium (line 44-49):

  1. If Clerk role === 'pro' → allow (fast path, no Redis/Convex I/O)
  2. Else if session.userId → look up Convex entitlement; allow if
     tier >= 1 AND validUntil >= Date.now() (covers lapsed subs)
  3. Else → 403
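
Sketched (illustrative; the real gate looks the entitlement up via
getEntitlements, and the entitlement shape here is an assumption):

```javascript
// Two-signal premium gate from the steps above. The entitlement arg
// stands in for the result of the Convex getEntitlements lookup.
function isCallerPremiumSketch(session, entitlement) {
  if (session.role === 'pro') return true;           // fast path, no I/O
  if (!session.userId || !entitlement) return false; // fail closed
  return entitlement.tier >= 1 && entitlement.validUntil >= Date.now();
}
```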

Same two-signal semantics as the per-handler isCallerPremium, so
the gateway and handlers can't disagree on who is premium. Uses
the already-imported getEntitlements function (line 345 already
imports it dynamically; promoting to top-level import since the new
site is in a hotter path).

Impact: unblocks all Dodo subscribers whose Clerk role is still
'free' — the common case after any fresh Pro purchase and for
every user since webhook-based role sync was never wired up.

Reported 2026-04-21 post-purchase flow: user completed Dodo payment,
landed back on dashboard, saw 403s on get-regional-snapshot,
get-tariff-trends, list-comtrade-flows, get-national-debt,
deduct-situation — all 5 are in PREMIUM_RPC_PATHS but not in
ENDPOINT_ENTITLEMENTS, so they hit this legacy gate.

* fix(gateway): move entitlement fallback to the gate that actually fires

Reviewer caught that the previous iteration of this fix put the
entitlement fallback at line ~400, inside an `if (sessionUserId &&
!keyCheck.valid && needsLegacyProBearerGate)` branch that's
unreachable for the case the PR was supposed to fix:

  - sessionUserId is only resolved when isTierGated is true (line 292)
    — JWKS lookup is intentionally skipped for non-tier-gated paths.
  - needsLegacyProBearerGate IS the non-tier-gated set
    (PREMIUM_RPC_PATHS && !isTierGated).
  - So sessionUserId is null, the branch never enters, and the actual
    legacy-Bearer rejection still happens earlier at line 367 inside
    the `keyCheck.required && !keyCheck.valid` branch.

Move the entitlement fallback INTO the line-367 check, where the
Bearer is already being validated and `session.userId` is already
exposed on the validateBearerToken() result. No extra JWKS round-trip
needed (validateBearerToken already verified the JWT). The previously-
added line-400 block is removed since it never ran.

Now for a paying Dodo subscriber whose Clerk role is still 'free':
  - Bearer validates → role !== 'pro'
  - Fall through: getEntitlements(session.userId) → tier=1, validUntil future
  - allowed = true, request proceeds to handler

Same fail-closed semantics as before for the negative cases:
  - Anonymous → no Bearer → 401
  - Bearer with invalid JWT → 401
  - Free user with no Dodo entitlement → 403
  - Pro user whose Dodo subscription lapsed (validUntil < now) → 403

* chore(gateway): drop redundant dynamic getEntitlements import

Greptile spotted that the previous commit promoted getEntitlements to
a top-level import for the new line-385 fallback site, but the older
dynamic import at line 345 (in the user-API-key entitlement check
branch) was left in place. Same module, same symbol, so the dynamic
import is now dead weight that just adds a microtask boundary to the
hot path.

Drop it; line 345's `getEntitlements(sessionUserId)` call now resolves
through the top-level import like the line-385 site already does.
2026-04-21 10:55:09 +04:00
Elie Habib
4d9ae3b214 feat(digest): topic-grouped brief ordering (size-first) (#3247) 2026-04-21 08:58:02 +04:00
Elie Habib
ee93fb475f fix(portwatch): cut HISTORY_DAYS 90 to 60 so per-country fits 540s budget (#3246)
Prod log 2026-04-21T00:02Z on the standalone portwatch-port-activity
service confirmed the per-country shape at 90 days still doesn't fit
even in its own 540s container:

  batch 1/15: 7 seeded, 5 errors (90.0s)
  batch 5/15: 42 seeded, 18 errors (392.6s)
  [SIGTERM at 540s after ~batch 7]

Math: avg ~75s/batch × 15 batches = 1125s needed vs 540s available.
Degradation guard would reject the ~50-country partial publish against
a prev snapshot of ~150+ countries.

60 days is the minimum window that still covers both aggregates the UI
consumes: last30 (days 0-30, all current metrics) + prev30 (days 30-60,
for trendDelta). Cutting from 90d→60d drops each per-country query by
~33% in row count and page count. Expected avg batch time ~50s.

No feature regression: last7/anomalySignal were already no-ops because
ArcGIS's Daily_Ports_Data max date lags 10+ days behind now, so no row
ever falls in the last-7-day window regardless of HISTORY_DAYS.

Test added asserting HISTORY_DAYS=60 so an accidental revert breaks CI.
55 portwatch tests pass. Typecheck + lint clean.
2026-04-21 06:37:40 +04:00
Elie Habib
56a792bbc4 docs(marketing): bump source-count claims from 435+ to 500+ (#3241)
Feeds.ts is at 523 entries after PR #3236 landed. The "435+" figure
has been baked into marketing copy, docs, press kit, and localized
strings for a long time and is now noticeably understated. Bump to
500+ as the new canonical figure.

Also aligned three stale claims in less-visited docs:
  docs/getting-started.mdx        70+ RSS feeds        => 500+ RSS feeds
  docs/ai-intelligence.mdx        344 sources          => 500+ sources
  docs/COMMUNITY-PROMOTION-GUIDE  170+ news feeds      => 500+ news feeds
                                  170+ news sources    => 500+ news sources

Also bumped the digest-dedup copy from 400+ to 500+ (English + French
locales + pro-test/index.html prerendered body) for consistency with
the pricing and GDELT panels.

Left alone on purpose (different metric):
  22 services / 22 service domains
  24 feeds (security-advisory seeder specifically)
  31 sources (freshness tracker)
  45 map layers

Rebuilt /pro bundle so the per-locale chunks + prerendered index.html
under public/pro/assets ship the new copy. 20 locales updated.
2026-04-20 22:39:42 +04:00
Elie Habib
d880f6a0e7 refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs (#3238)
* refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs

seed-aviation.mjs was misnamed: it wrote to a dead Redis key while the
51-airport AviationStack loop + ICAO NOTAM loop lived hidden inside
ais-relay.cjs, duplicating the NOTAM write already done by
seed-airport-delays.mjs.

Make seed-aviation.mjs the single home for every aviation Redis key:
  aviation:delays:intl:v3     (AviationStack 51 intl — primary)
  aviation:delays:faa:v1      (FAA ASWS 30 US)
  aviation:notam:closures:v2  (ICAO NOTAM 60 global)
  aviation:news::24:v1        (9 RSS feeds prewarmer)

One unified AIRPORTS registry (~85 entries) replaces the three separate lists.
Notifications preserved via wm:events:queue LPUSH + SETNX dedup; prev-state
migrated from in-process Sets to Redis so short-lived cron runs don't spam
on every tick. ICAO quota-exhaustion backoff retained.

Contracts preserved byte-identically for consumers (AirportDelayAlert shape,
seed-meta:aviation:{intl,faa,notam} meta keys, runSeed envelope writes).

Impact: kills ~8,640/mo wasted AviationStack calls (dead-key writes), strips
~490 lines of hidden seed logic from ais-relay, eliminates duplicate NOTAM
writer. Net -243 lines across three files.

Railway steps after merge:
  1. Ensure seed-aviation service env has AVIATIONSTACK_API + ICAO_API_KEY.
  2. Delete/disable the seed-airport-delays Railway service.
  3. ais-relay redeploys automatically; /aviationstack + /notam live proxies
     for user-triggered flight lookups preserved.

* fix(aviation): preserve last-good intl snapshot on unhealthy/skipped fetch + restore NOTAM quota-exhaust handling

Review feedback on PR #3238:

(1) Intl unhealthy → was silently overwriting aviation:delays:intl:v3 with
    an empty or partial snapshot because fetchAll() always returned
    { alerts } and zeroIsValid:true let runSeed publish. Now:
      • seedIntlDelays() returns { alerts, healthy, skipped } unchanged
      • fetchAll() refuses to publish when !healthy || skipped:
          - extendExistingTtl([INTL_KEY, INTL_META_KEY], INTL_TTL)
          - throws so runSeed enters its graceful catch path (which also
            extends these TTLs — idempotent)
      • Per-run cache (cachedRun) short-circuits subsequent withRetry(3)
        invocations so the retries don't burn 3x NOTAM quota + 3x FAA/RSS
        fetches when intl is sick.

(2) NOTAM quota exhausted — PR claimed "preserved" but only logged; the
    NOTAM data key was drifting toward TTL expiry and seed-meta was going
    stale, which would flip api/health.js maxStaleMin=240 red after 4h
    despite the intended 24h backoff window. Now matches the pre-strip
    ais-relay behavior byte-for-byte:
      • extendExistingTtl([NOTAM_KEY], NOTAM_TTL)
      • upstashSet(NOTAM_META_KEY, {fetchedAt: now, recordCount: 0,
        quotaExhausted: true}, 604800)
    Consumers keep serving the last known closure list; health stays green.

Also added extendExistingTtl fallbacks on FAA/NOTAM network-rejection paths
so transient network failures also don't drift to TTL expiry.

* refactor(aviation): move secondary writes + notifications into afterPublish

Review feedback on PR #3238: fetchAll() was impure — it wrote FAA / NOTAM /
news and dispatched notifications during runSeed's fetch phase, before the
canonical aviation:delays:intl:v3 publish ran. If that later publish failed,
consumers could see fresh FAA/NOTAM/news alongside a stale intl key, and
notifications could fire for a run whose primary key never published,
breaking the "single home / one cron tick" atomic contract.

Restructure:
  • fetchAll() now pure — returns { intl, faa, notam, news + rejection refs }.
    No Redis writes, no notifications.
  • Intl gate stays: unhealthy / skipped → throw. runSeed's catch extends
    TTL on INTL_KEY + seed-meta:aviation:intl and exits 0. afterPublish
    never runs, so no side effects escape.
  • publishTransform extracts { alerts } from the bundle for the canonical
    envelope; declareRecords sees the transformed shape.
  • afterPublish handles ALL secondary writes (FAA, NOTAM, news) and
    notification dispatch. Runs only after a successful canonical publish.
  • Per-run memo (cachedBundle) still short-circuits withRetry(3) retries
    so NOTAM quota isn't burned 3x when intl is sick.

NOTAM quota-exhaustion + rejection TTL-extend branches preserved inside
afterPublish — same behavior, different location.

* refactor(aviation): decouple FAA/NOTAM/news side-cars from intl's runSeed gate

Review feedback on PR #3238: the previous refactor coupled all secondary
outputs to the AviationStack primary key. If AVIATIONSTACK_API was missing
or intl was systemically unhealthy, fetchAll() threw → runSeed skipped
afterPublish → FAA/NOTAM/news all went stale despite their own upstream
sources being fine. Before consolidation, FAA and NOTAM each ran their own
cron and could freshen independently. This restores that independence.

Structure:
  • Three side-car runners: runFaaSideCar, runNotamSideCar, runNewsSideCar.
    Each acquires its own Redis lock (aviation:faa / aviation:notam /
    aviation:news — distinct from aviation:intl), fetches its source,
    writes data-key + seed-meta on success, extends TTL on failure,
    releases the lock. Completely independent of the AviationStack path.
  • NOTAM side-car keeps the quota-exhausted + rejection handling and
    dispatches notam_closure notifications inline.
  • main() runs the three side-cars sequentially, then hands off to runSeed
    for intl. runSeed still process.exit()s at the end so it remains the
    last call.
  • Intl's afterPublish now only dispatches aviation_closure notifications
    (its single responsibility).

Removed: the per-run memo for fetchAll (no longer needed — withRetry now
only re-runs the intl fetch, not FAA/NOTAM/RSS).

Net behavior:
  • AviationStack 500s / missing key → FAA, NOTAM, news still refresh
    normally; only aviation:delays:intl:v3 extends TTL + preserves prior
    snapshot.
  • ICAO quota exhausted → NOTAM extends TTL + writes fresh meta (as before);
    FAA/intl/news unaffected.
  • FAA upstream failure → only FAA extends TTL; other sources unaffected.

* fix(aviation): correct Gaborone ICAO + populate FAA alert meta from registry

Greptile review on PR #3238:

P1: GABS is not the ICAO for Gaborone — the value was faithfully copied
    from the pre-strip ais-relay NOTAM list which was wrong. Botswana's
    ICAO prefix is FB; the correct code is FBSK. NOTAM queries for GABS
    would silently exclude Gaborone from closure detection. (Pre-existing
    bug in the repo; fixing while in this neighborhood.)

P2 (FAA alerts): Now that the unified AIRPORTS registry carries
    icao/name/city/country for every FAA airport, use it. Previous code
    returned icao:'', name:iata, city:'' — consumers saw bare IATA codes
    for US-only alerts. Registry lookup via a new FAA_META map; lat/lon
    stays 0,0 by design (FAA rows aren't rendered on the globe, so lat/lon
    is intentionally absent from those registry rows).

P2 (NOTAM TTL on quota exhaustion): already fixed in commit ba7ed014e
    (pre-decouple) — confirmed line 803 calls extendExistingTtl([NOTAM_KEY])
    and line 805 writes fresh meta with quotaExhausted=true.
2026-04-20 22:37:49 +04:00
Elie Habib
ecd56d4212 feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast (#3236)
* feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast

Four direct-RSS sources verified from a clean IP and absent everywhere
in the repo (src/config/feeds.ts, scripts/seed-*, ais-relay.cjs, RSS
allowlist). Closes the highest-ROI Iran / Israel domestic-press gap
from the ME source audit (PR #3226) with zero infra changes.

- IRNA        https://en.irna.ir/rss
- Mehr News   https://en.mehrnews.com/rss
- Jerusalem Post https://www.jpost.com/rss/rssfeedsheadlines.aspx
- Ynetnews    https://www.ynetnews.com/Integration/StoryRss3089.xml

Propaganda-risk metadata:
- IRNA + Mehr tagged high / Iran state-affiliated (join Press TV).
- JPost + Ynetnews tagged low with knownBiases for transparency.

RSS allowlist updated in all three mirrors (shared/, scripts/shared/,
api/_rss-allowed-domains.js) per the byte-identical mirror contract
enforced by tests/edge-functions.test.mjs.

Deferred (separate PRs):
- Times of Israel: already in allowlist; was removed from feeds for
  cloud-IP 403. Needs Decodo routing.
- IDF Spokesperson: idf.il has no direct RSS endpoint; needs scraper.
- Tasnim / Press TV RSS / Israel Hayom: known cloud-IP blocks.
- WAM / SPA / KUNA / QNA / BNA: public RSS endpoints are dead; sites
  migrated to SPAs or gate with 403.

Plan doc (PR #3226) overstated the gap: it audited only feeds.ts and
missed that travel advisories + US Embassy alerts are already covered
by scripts/seed-security-advisories.mjs. NOTAM claim in that doc is
also wrong: we use ICAO's global NOTAM API, not FAA.

* fix(feeds): enable IRNA, Mehr, Jerusalem Post, Ynetnews by default

Reviewer on #3236 flagged that adding the four new ME feeds to
FULL_FEEDS.middleeast alone leaves them disabled on first run, because
App.ts:661 persists computeDefaultDisabledSources() output derived from
DEFAULT_ENABLED_SOURCES. Users would have to manually re-enable via
Settings > Sources, defeating the purpose of broadening the default
ME mix.

Add the four new sources to DEFAULT_ENABLED_SOURCES.middleeast so they
ship on by default. Placement keeps them adjacent to their peers
(IRNA / Mehr with the other Iran sources, JPost / Ynetnews after
Haaretz). Risk/slant tags already in SOURCE_PROPAGANDA_RISK ensure
downstream digest dedup + summarization weights them correctly.

* style(feeds): move JPost + Ynetnews under Low-risk section header

Greptile on #3236 flagged that both entries are risk: 'low' but were
inserted above the `// Low risk - Independent with editorial standards`
comment header, making the section boundary misleading for future
contributors. Shift them under the header where they belong.

No runtime change; cosmetic ordering only.
2026-04-20 19:07:09 +04:00
Elie Habib
661bbe8f09 fix(health): nationalDebt threshold 7d → 60d — match monthly cron interval (#3237)
* fix(health): nationalDebt threshold 7d → 60d to match monthly cron cadence

User reported health showing:
  "nationalDebt": { status: "STALE_SEED", records: 187, seedAgeMin: 10469, maxStaleMin: 10080 }

Root cause: api/health.js had `maxStaleMin: 10080` (7 days) on a seeder
that runs every 30 days via seed-bundle-macro.mjs:
  { label: 'National-Debt', intervalMs: 30 * DAY, ... }

The threshold was narrower than the cron interval, so every month
between days 8–30 it guaranteed STALE_SEED. Original comment
"7 days — monthly seed" even spelled the mismatch out loud.

Data source cadence:
- US Treasury debt_to_penny API: updates daily but we only snapshot latest
- IMF WEO: quarterly/semi-annual release — no value in checking daily
- 30-day cron is appropriate; stale threshold should be ≥ 2× interval

Fix: bump maxStaleMin to 86400 (60 days). Matches the 2× pattern used
by faoFoodPriceIndex + recovery pillar (recoveryFiscalSpace, etc.)
which also run monthly.

Also fixes the same mismatch in scripts/regional-snapshot/freshness.mjs —
the 10080 ceiling there would exclude national-debt from capital_stress
axis scoring 23 days out of every 30 between seeds.

* fix(seed-national-debt): raise CACHE_TTL to 65d so health.js stale window is actually reachable

PR #3237 review was correct: my earlier fix set api/health.js
SEED_META.nationalDebt.maxStaleMin to 60d (86400min), but the seeder's
CACHE_TTL was still 35d. After a missed monthly cron, the canonical key
expired at day 35 — long before the 60d "stale" threshold. Result path:
  hasData=false → api/health.js:545-549 → status = EMPTY (crit)
Not STALE_SEED (warn) as my commit message claimed.

writeFreshnessMetadata() in scripts/_seed-utils.mjs:222 sets meta TTL to
max(7d, ttlSeconds), so bumping ttlSeconds alone propagates to both the
canonical payload AND the meta key.

Fix:
- CACHE_TTL 35d → 65d (5d past the 60d stale window so we get a clean
  STALE_SEED → EMPTY transition without keys vanishing mid-warn).
- runSeed opts.maxStaleMin 10080 (7d) → 86400 (60d) so the in-seeder
  declaration matches api/health.js. Field is only validated for
  presence by runSeed (scripts/_seed-utils.mjs:798), but the drift was
  what hid the TTL invariant in the first place.

Invariant this restores: for any SEED_META entry,
  seeder CACHE_TTL ≥ maxStaleMin + buffer
so the "warn before crit" gradient actually exists.

* fix(freshness): wire national-debt to seed-meta + teach extractTimestamp about seededAt

Reviewer P2 on PR #3237: my earlier freshness.mjs bump to 86400 was a
no-op. classifyInputs() (scripts/regional-snapshot/freshness.mjs:100-108,
122-132) uses the entry's metaKey or extractTimestamp()'s known field
list. national-debt had neither — payload carries only `seededAt`, and
extractTimestamp didn't know that field, so the "present but undated"
branch treated every call as fresh. The age window never mattered.

Two complementary fixes:

1. Add metaKey: 'seed-meta:economic:national-debt' to the freshness
   entry. Primary, authoritative source — seed-meta.fetchedAt is
   written by writeFreshnessMetadata() on every successful run, which is
   also what api/health.js reads, keeping both surfaces consistent.

2. Add `seededAt` to extractTimestamp()'s field list. Defense-in-depth:
   many other runSeed-based scripts (seed-iea-oil-stocks,
   seed-eurostat-country-data, etc.) wrap output as { ..., seededAt: ISO }
   with no metaKey in the freshness registry. Without this, they were
   also silently always-fresh. ISO strings parse via Date.parse.
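A minimal sketch of the extractTimestamp behaviour after this change, assuming a simplified field list (the real function in scripts/regional-snapshot/freshness.mjs knows more fields than shown here):

```javascript
// Illustrative known-field list; the seededAt entry is the addition
// this commit describes.
const TIMESTAMP_FIELDS = ['fetchedAt', 'updatedAt', 'seededAt'];

function extractTimestamp(payload) {
  if (!payload || typeof payload !== 'object') return null;
  for (const field of TIMESTAMP_FIELDS) {
    const value = payload[field];
    if (typeof value !== 'string') continue;
    // Date.parse handles ISO strings; a stringified epoch number
    // (the eu-gas-storage shape) parses to NaN and falls through.
    const ms = Date.parse(value);
    if (!Number.isNaN(ms)) return ms;
  }
  return null; // undated payload: caller decides, no silent "fresh"
}
```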

Note: `economic:eu-gas-storage:v1` uses `seededAt: String(Date.now())` —
a stringified epoch number, which Date.parse does NOT handle. That seed's
freshness classification is still broken by this entry's lack of metaKey,
but it's a separate shape issue out of scope here. Flagged in PR body.
2026-04-20 19:03:47 +04:00
Elie Habib
42a86c5859 fix(preview): skip premium RPCs when main app runs inside /pro live-preview iframe (#3235)
* fix(preview): skip premium RPCs when main app runs inside /pro preview iframe

pro-test/src/App.tsx embeds the full main app as a "live preview"
via <iframe src="https://worldmonitor.app?alert=false" sandbox="...">.
The iframe boots an anonymous main-app session, which fires premium
RPCs (get-regional-snapshot, get-tariff-trends, list-comtrade-flows,
and on country-click the fetchProSections batch) with no Clerk bearer
available. Every call 401s, the circuit breakers catch and fall
through to empty fallbacks (so the preview renders fine), but the
401s surface on the PARENT /pro page's DevTools console and Sentry
because `sandbox` includes `allow-same-origin`.

Net effect: /pro pricing page shows a flood of fake-looking errors
that cost us a session of debugging to trace back to the iframe.
PR #3233's premiumFetch swap didn't help here (there's simply no
token to inject for an anonymous iframe).

Introduce `src/utils/embedded-preview.ts::IS_EMBEDDED_PREVIEW`, a
module-level boolean evaluated once at load from
`window.top !== window` (with try/catch for cross-origin sandboxes),
and short-circuit three init-time premium entry points when true:

  - RegionalIntelligenceBoard.loadCurrent → renderEmpty()
  - fetchTariffTrends → return emptyTariffs
  - fetchComtradeFlows → return emptyComtrade

Plus one defensive gate in country-intel.fetchProSections for the
case a user clicks a country inside the iframe preview.

Each gate returns the exact same empty fallback the breaker would
have produced after a 401, so visual behavior is unchanged — the
preview iframe still shows the dashboard layout with empty premium
panels, just without the network request and its console/Sentry
trail.

Live-tab /pro page should now see zero 401s from regional-snapshot /
tariff-trends / comtrade-flows on load.

* fix(preview): narrow iframe gate to ?embed=pro-preview marker only

Reviewer flagged that the first iteration's `window.top !== window`
check was too broad. The repo explicitly markets "Embeddable iframe
panels" as an Enterprise feature
(pro-test/src/locales/en.json: whiteLabelDesc), so legitimate
customer embeds must keep firing premium RPCs normally. Only the /pro
marketing preview — which is known-anonymous and generates expected
401 noise — should short-circuit.

Fix: replace the blanket iframe check with a unique marker that only
/pro's preview iframe carries.

  - pro-test/src/App.tsx: iframe src switched from
    `?alert=false` (dead param, unused in main app) to
    `?embed=pro-preview`. Rebuilt public/pro/ to ship the change.

  - src/utils/embedded-preview.ts: two-gate check now. Gate 1 still
    requires `window.top !== window` so the marker leaking into a
    top-level URL doesn't disable premium RPCs for the top-level app.
    Gate 2 requires `?embed=pro-preview` in location.search so only
    the known embedder matches. Enterprise white-label embeds without
    this marker behave exactly like a top-level visit.
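The two-gate check could look roughly like this; the real module evaluates it once at load from the global window, while this sketch takes an injected window-like object so it can be exercised directly:

```javascript
// Sketch of the two-gate IS_EMBEDDED_PREVIEW computation described
// above (src/utils/embedded-preview.ts is the real home).
function computeIsEmbeddedPreview(win) {
  // Gate 1: must actually be inside an iframe. A cross-origin parent
  // can make `win.top` access throw, which still means "framed".
  let framed = false;
  try {
    framed = win.top !== win;
  } catch {
    framed = true;
  }
  if (!framed) return false;
  // Gate 2: only /pro's preview iframe carries this marker, so
  // enterprise white-label embeds keep firing premium RPCs normally.
  return new URLSearchParams(win.location.search).get('embed') === 'pro-preview';
}
```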

Same three premium fetchers + the one country-intel path still gate
on IS_EMBEDDED_PREVIEW; the semantic change is purely in how the
flag is computed.

Per PR #3229 / #3228 lesson, the pro-test rebuild ships in the same
PR as the source change — public/pro/assets/index-*.js and index.html
reflect the new iframe src.
2026-04-20 17:49:50 +04:00
Elie Habib
240abaa8ed fix(premium): route RPC clients through premiumFetch — stop 401s for pro users (#3233)
* fix(premium): route premium RPC clients through premiumFetch

Four generated-client instantiations were using plain globalThis.fetch,
bypassing the Clerk bearer / tester-key / WORLDMONITOR_API_KEY injection
chain. Signed-in pro users hit the premium endpoints unauthenticated
and got 401, with no visible path to recovery:

  - src/components/RegionalIntelligenceBoard.ts
      → get-regional-snapshot, get-regime-history, get-regional-brief
  - src/components/DeductionPanel.ts
      → deduct-situation, list-market-implications
  - src/services/trade/index.ts
      → get-tariff-trends, list-comtrade-flows (+ non-premium siblings)
  - src/app/country-intel.ts::fetchProSections
      → get-national-debt, getRegimeHistory/getRegionalBrief,
        list-comtrade-flows

Swap each to `fetch: premiumFetch` (src/services/premium-fetch.ts),
which tries in order: existing auth header → WORLDMONITOR_API_KEY →
tester key → Clerk bearer token → unauthenticated passthrough. For
non-premium endpoints that share the same client (e.g. getTradeFlows,
getCustomsRevenue) the fallthrough behavior is identical to plain
globalThis.fetch — no regression surface.
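The injection order can be sketched as follows; the credential getters here are stand-ins, not the real src/services/premium-fetch.ts internals:

```javascript
// Hedged sketch of the premiumFetch fallthrough chain: first source
// that yields a token wins, otherwise the request passes through
// unauthenticated (the non-premium sibling case).
async function premiumFetchSketch(input, init = {}, deps = {}) {
  const headers = new Headers(init.headers);
  if (!headers.has('Authorization')) {
    const token =
      deps.apiKey ??                                      // static API key
      deps.testerKey ??                                   // tester key
      (await deps.getClerkToken?.().catch(() => null)) ?? // Clerk bearer
      null;
    if (token) headers.set('Authorization', `Bearer ${token}`);
  }
  return (deps.fetchImpl ?? globalThis.fetch)(input, { ...init, headers });
}
```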

This surfaced as user-facing 401s on the /pro sign-in → redirect-to-/
flow: pro users briefly saw the dashboard try to fetch, then hit 401. After
this fix the bearer token flows through and regional/deduction/trade
panels populate as expected.

Left untouched (not hitting premium paths today, so not blocking):
  - src/services/gdelt-intel.ts (searchGdeltDocuments)
  - src/services/social-velocity.ts (getSocialVelocity)
  - src/services/pizzint.ts (getPizzintStatus)

If any of those ever move into PREMIUM_RPC_PATHS, swap them too.

* fix(premium): split trade client + disable premium-breaker persistCache

Review found a real access-control leak in the first iteration of this
PR: routing the entire shared TradeServiceClient through premiumFetch
populated module-level breakers with `persistCache: true` and
auth-invariant cache keys. A pro user's get-tariff-trends /
list-comtrade-flows response would be written to localStorage by the
breaker, and a later free / signed-out session on the same browser
would be served that cached premium data directly, bypassing both
auth injection and the gateway's entitlement check.

Two-layer fix so neither the in-memory breaker nor the persistent
cache can leak premium data across auth states:

1. **Split clients.** publicClient keeps plain globalThis.fetch and
   feeds restrictions/flows/barriers/revenue breakers (non-premium,
   shareable across users). premiumClient uses premiumFetch and is
   ONLY used by fetchTariffTrends + fetchComtradeFlows.

2. **Disable persistCache for premium breakers.** tariffsBreaker and
   comtradeBreaker flip to `persistCache: false`. In-memory cache
   within a session is still fine (and expected for circuit-breaker
   behavior), but the response no longer survives a reload / cross-
   session switch where a different user could read it.

Both changes are needed: split clients alone would still let premium
responses ride through the old cached entries (if any) after the
deploy; persistCache:false alone would still mean a shared client
routes anonymous calls through premiumFetch (minor but avoidable
token leak potential). Together they're airtight for the leak vector.

Follow-up potentially worth doing: auth-keyed cache keys for breakers
used by premium data, so the in-tab SPA sign-out case is also sealed.
Not blocking today.

* fix(premium): invalidate in-memory premium breakers on Clerk identity change

Review caught the remaining leak vector: persistCache:false + split
clients closes cross-browser-reload leaks, but the module-level
tariffsBreaker/comtradeBreaker in-memory cache still lives for the
full 30-min / 6-hour TTL inside a single SPA session. A pro user
loads tariff/comtrade data → signs out in the same tab → any caller
is served the pro response from memory without re-auth.

Track the Clerk user id that last populated the premium breakers in a
module-level `lastPremiumUserId`. On every call to fetchTariffTrends
or fetchComtradeFlows, check the current Clerk identity via
getCurrentClerkUser(). If it changed (sign-out, user switch,
free↔pro transition), call clearMemoryCache() on both premium
breakers before executing. The breaker then falls back to a live
fetch through premiumFetch with the new caller's credentials.

`clearMemoryCache` (not `clearCache`) is deliberate — it only touches
the in-memory cache and leaves persistent storage alone. Non-premium
breakers on this same client (restrictions, flows, barriers, revenue)
are untouched: their responses are public and shareable across auth
states, so an identity-triggered clear would only cost us cache hits
with zero security benefit.

Edge cases handled:
  - First call: lastPremiumUserId is undefined, no clear.
  - Anonymous → anonymous: no clear (both null).
  - Anonymous → pro: clears (defensive; anon can't populate anyway
    because emptyTariffs/emptyComtrade fail shouldCache).
  - Pro → anonymous: clears (the critical case).
  - Pro A → pro B: clears (account switch).
  - getCurrentClerkUser throws (Clerk not loaded yet): treated as
    anonymous, safe default.

Closes the audit cycle on this PR's cache-leak thread.

* fix(premium): invalidate breakers on entitlement change, not just user id

Reviewer caught that the prior fingerprint was `userId` only. That
covers sign-out, user switch, and account change — but NOT the case
the prior commit's comment explicitly claimed: a pro→free downgrade
for the same signed-in user (subscription cancellation, billing
grace-period expiry, plan switch to annual-lapsed). The Clerk user id
doesn't change, so the invalidation never fired and the cached pro
response kept serving until the 30-min / 6-hour breaker TTL.

Widen the fingerprint to `${userId}:${plan}` (or 'anon' when
signed-out). `getCurrentClerkUser()` already reads `plan` off
`user.publicMetadata`, so no new Clerk calls needed.

Now covered:
  - pro A → pro B               (userId change)
  - pro → anon                  (userId → null)
  - anon → pro                  (defensive)
  - pro (active) → free (cancelled or expired) for same userId  ← new
  - free → pro (upgrade for same userId)                        ← new

Pro users who actively downgrade (or whose subscription lapses)
during an open tab will see the cached premium response invalidate
on the next premium fetcher call, at which point the live fetch
through premiumFetch + the gateway's entitlement check returns the
correct empty/403 response for their now-free plan.

* fix(premium): fingerprint on hasPremiumAccess, not Clerk publicMetadata.plan

Reviewer caught that Clerk publicMetadata.plan is NOT the authoritative
premium signal in this codebase. Per the explicit docstring on
src/services/panel-gating.ts::hasPremiumAccess (lines 11-27), the
webhook pipeline does NOT write publicMetadata.plan — the authoritative
signal is the Convex Dodo entitlement surfaced by isProUser() /
isEntitled(). Two fingerprint blind spots that ate the prior iteration:

  - Paying user with valid Dodo entitlement but no Clerk metadata:
    publicMetadata.plan === 'free' → fingerprint 'uid:free' → no
    invalidation when their session transitions (they never looked
    premium by the fingerprint, but premiumFetch was still injecting
    their token and the gateway was still serving premium data).
  - Active pro user whose Dodo subscription lapses: Clerk metadata
    doesn't change → fingerprint stays 'uid:pro' → no invalidation
    → cached tariff/comtrade response keeps serving until TTL
    (the exact pro→free case the previous commit claimed to cover).

Swap the plan source from getCurrentClerkUser().plan to
hasPremiumAccess() — the single source of truth used by panel-gating,
widgets, search, and event handlers. It unions API key, tester keys,
Clerk pro role, AND Convex entitlement, so every path that legitimately
grants premium access contributes to the fingerprint, and every path
that revokes it triggers invalidation.

Also add a reactive path: subscribe to onEntitlementChange() from
src/services/entitlements, and wipe the premium breakers the moment
Convex publishes a new entitlement snapshot. This closes the window
between subscription lapse and the user's next premium panel click —
the currently-open tariff panel clears its memory cache immediately
instead of serving stale pro data until the user navigates.

Combined: fingerprint is now (userId, hasPremiumAccess) tuple
evaluated both lazily on every premium fetcher call AND eagerly when
Convex pushes an entitlement change.
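The lazy half of that scheme can be sketched as a check run before each premium fetch; the names below are illustrative, and the real version also wires the eager onEntitlementChange() path:

```javascript
// Sketch of the (userId, hasPremiumAccess) fingerprint invalidation.
// getUserId / hasPremiumAccess / clearBreakers are stand-ins for the
// repo's actual helpers.
function makePremiumInvalidator({ getUserId, hasPremiumAccess, clearBreakers }) {
  let lastFingerprint;
  return function checkIdentity() {
    let fp;
    try {
      fp = `${getUserId() ?? 'anon'}:${hasPremiumAccess() ? 'premium' : 'free'}`;
    } catch {
      fp = 'anon:free'; // auth layer not loaded yet: safe anonymous default
    }
    if (lastFingerprint !== undefined && lastFingerprint !== fp) {
      clearBreakers(); // wipe in-memory premium caches before fetching
    }
    lastFingerprint = fp;
  };
}
```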
2026-04-20 16:28:04 +04:00
Elie Habib
d1ebc84c6c feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.

Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:

    1 <-> 5: 0.632    5 <-> 8: 0.692
    1 <-> 8: 0.500    5 <-> 10: 0.656
    1 <-> 10: 0.554   8 <-> 10: 0.510

Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.

Measured on the 12 real titles from that brief:

    Algorithm                 | Clusters | F1    | P    | R
    --------------------------|----------|-------|------|------
    complete-link @ 0.55 (was)|        7 | 0.526 | 0.56 | 0.50
    complete-link @ 0.50      |        6 | 0.435 | 0.38 | 0.50
    single-link   @ 0.55      |        4 | 0.435 | 0.28 | 1.00  over-merge
    single-link   @ 0.60      |        6 | 0.727 | 0.67 | 0.80  winner

Change
------
scripts/lib/brief-dedup-embed.mjs:
  New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
  union-find. Chain merges through strong intermediates when a
  direct pair is weak; respects the entity veto (blocked pairs
  don't union). O(N^2 alpha(N)); permutation-invariant by
  construction.
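A minimal sketch of the union-find approach, assuming items carry an `embedding` array (the real singleLinkCluster in scripts/lib/brief-dedup-embed.mjs differs in details):

```javascript
// Union-find single-link clustering: union every pair whose cosine
// clears the threshold, unless the veto hook blocks that pair.
// Chained merges through strong intermediates come for free.
function singleLinkClusterSketch(items, { cosineThreshold, vetoFn } = {}) {
  const parent = items.map((_, i) => i);
  const find = (x) => (parent[x] === x ? x : (parent[x] = find(parent[x])));
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
    return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
  };
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (cosine(items[i].embedding, items[j].embedding) < cosineThreshold) continue;
      if (vetoFn?.(items[i], items[j])) continue; // blocked pairs never union
      parent[find(i)] = find(j);
    }
  }
  const clusters = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(item);
  });
  return [...clusters.values()];
}
```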

scripts/lib/brief-dedup.mjs:
  New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
  'complete' to revert). readOrchestratorConfig returns 'clustering'
  field. Dispatch at call site picks the right function. Structured
  log line now includes clustering=<algo>.

tests/brief-dedup-embedding.test.mjs:
  +8 regressions:
    - singleLinkCluster chains the 4-way through a bridge
    - veto blocks unions even when cosine passes
    - permutation-invariance property test (5 shuffles)
    - empty-input
    - DIGEST_DEDUP_CLUSTERING default is 'single'
    - DIGEST_DEDUP_CLUSTERING=complete kill switch works
    - unrecognised values fall back to 'single'
    - log line includes clustering=<algo>

Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if bridge chaining ever resurfaces.

Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:

    DIGEST_DEDUP_COSINE_THRESHOLD=0.60

No other changes needed — clustering=single is the default.

Verification
------------
- npm run test:data           5825/5825 pass
- tests/brief-dedup-embedding  53/53   pass (45 existing + 8 new)
- typecheck + typecheck:api   clean
- biome check on changed files clean

Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
  — confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
  current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
  / Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
  no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73

Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
2026-04-20 16:21:20 +04:00
Elie Habib
d7393d8010 fix(pro): downgrade @clerk/clerk-js to v5 to restore auto-mount UI (#3232)
The actual root cause behind the "Clerk was not loaded with Ui
components" sign-in failure on /pro is NOT the import path — it's
that pro-test was on @clerk/clerk-js v6.4.0 while the main app
(which works fine) is on v5.125.7.

Clerk v6 fundamentally changed `clerk.load()`: the UI controller
is no longer auto-mounted by default. Both `@clerk/clerk-js` (the
default v6 entry) and `@clerk/clerk-js/no-rhc` (the bundled-UI
variant) expect the caller to either:
  - load Clerk's UI bundle from CDN and pass `window.__internal_ClerkUICtor`
    to `clerk.load({ ui: { ClerkUI } })`, or
  - manually wire up `clerkUICtor`.

That's why my earlier "switch to no-rhc" fix (PR #3227 + #3228)
didn't actually unbreak production — both v6 variants throw the same
assertion. The error stack on the deployed bundle confirmed it:
`assertComponentsReady` from `clerk.no-rhc-UeQvd9Xf.js`.

Fix: pin pro-test to `@clerk/clerk-js@^5.125.7` to match the main
app's working version. v5 still auto-mounts UI on `clerk.load()` —
no extra wiring needed. The plain `import { Clerk } from '@clerk/clerk-js'`
pattern (which the main app uses verbatim and which pro-test had
before #3227) just works under v5.

Verification of the rebuilt bundle (chunk: clerk-PNSFEZs8.js):
  - 3.05 MB (matches main app's clerk-DC7Q2aDh.js: 3.05 MB)
  - 44 occurrences of mountComponent (matches main: 44)
  - 3 occurrences of SignInComponent (matches main: 3)
  - 0 occurrences of "Clerk was not loaded with Ui" (the assertion
    error string is absent; UI is unconditionally mounted)

Includes the rebuilt public/pro/ artifacts so this fix is actually
deployed (PR #3229's CI check will catch any future PR that touches
pro-test/src without rebuilding).
2026-04-20 15:25:11 +04:00
Elie Habib
0a4eff0053 feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape could
not fit the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.

Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple
retries returned HTTP_STATUS 000). Not a reliable path.

The only shape ArcGIS reliably handles is per-country WHERE ISO3='X'
AND date > Y (uses the ISO3 index). Its problem was fitting 174
countries in the 420s portwatch bundle budget — solve that by giving
it its own container.

Changes:

- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the
  full Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY failure class that has
  bitten us 3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new
  assertions on the standalone bundle + Dockerfile + retry wrapper +
  ISO3 shape). Full test:data: 5883 pass. Typecheck + lint clean.
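The retry wrapper's contract (one retry, one error class) can be sketched like this; the error-shape detection is an assumption for illustration, not the real ArcGIS response parsing:

```javascript
// Sketch of fetchWithRetryOnInvalidParams: retry exactly once, and
// only for the specific intermittent ArcGIS error class. Everything
// else propagates unchanged.
async function fetchWithRetryOnInvalidParamsSketch(doFetch) {
  const isInvalidParams = (err) =>
    /invalid query parameters/i.test(err?.message ?? '');
  try {
    return await doFetch();
  } catch (err) {
    if (!isInvalidParams(err)) throw err; // other error classes: no retry
    return doFetch(); // single retry for the known-transient class
  }
}
```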

Post-merge Railway provisioning: see header of
seed-bundle-portwatch-port-activity.mjs for the 7-step checklist.
2026-04-20 15:21:43 +04:00
Elie Habib
234ec9bf45 chore(ci): enforce pro-test bundle freshness — local hook + CI backstop (#3229)
* chore(ci): enforce pro-test bundle freshness, prevent silent deploy staleness

public/pro/ is committed to the repo and served verbatim by Vercel.
The root build script only runs the main app's vite build — it does
NOT run pro-test's build. So any PR that changes pro-test/src/**
without manually running `cd pro-test && npm run build` and committing
the regenerated chunks ships to production with a stale bundle.

This footgun just cost us: PR #3227 fixed the Clerk "not loaded with
Ui components" sign-in bug in source, merged, deployed — and the live
site still threw the error because the committed chunk under
public/pro/assets/ was the pre-fix build. PR #3228 fix-forwarded by
rebuilding.

Two-layer enforcement so it doesn't happen again:

1. .husky/pre-push — mirrors the existing proto freshness block.
   If pro-test/ changed vs origin/main, rebuild and `git diff
   --exit-code public/pro/`. Blocks the push with a clear message if
   the bundle is stale or untracked files appear.

2. .github/workflows/pro-bundle-freshness.yml — CI backstop on any
   PR touching pro-test/** or public/pro/**. Runs `npm ci + npm run
   build` in pro-test and fails the check if the working tree shows
   any diff or untracked files under public/pro/. Required before
   merge, so bypassing the local hook still can't land a stale
   bundle.

Note: the hook's diff-against-origin/main check means it skips the
build when pushing a branch that already matches main on pro-test/
(e.g. fix-forward branches that only touch public/pro/). CI covers
that case via its public/pro/** path filter.

* fix(hooks): scope pro-test freshness check to branch delta, not worktree

The first version of this hook used `git diff --name-only origin/main
-- pro-test/`, which compares the WORKING TREE to origin/main. That
fires on unstaged local pro-test/ scratch edits and blocks pushing
unrelated branches purely because of dirty checkout state.

Switch to `$CHANGED_FILES` (computed earlier at line 77 from
`git diff origin/main...HEAD`), which scopes the check to commits on
the branch being pushed. This matches the convention the test-runner
gates already use (lines 93-97). Also honor `$RUN_ALL` as the safety
fallback when the branch delta can't be computed.

* fix(hooks): trigger pro freshness check on public/pro/ too, match CI

The first scoping fix used `^pro-test/` only, but the CI workflow
keys off both `pro-test/**` AND `public/pro/**`. That left a gap: a
bundle-only PR (e.g. a fix-forward rebuild like #3228, or a hand-edit
to a committed asset) skipped the local check entirely while CI would
still validate it. The hook and CI are now consistent.

Trigger condition: `^(pro-test|public/pro)/` — the rebuild + diff
check now fires whenever the branch delta touches either side of the
source/artifact pair, matching the CI workflow's path filter.
2026-04-20 15:21:25 +04:00
Elie Habib
9e022f23bb fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
  "cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }

despite the 30-min warm-ping loop running. Two bugs stacked:

1. get-cable-health.ts null-upstream path didn't write Redis.
   cachedFetchJson with a returning-null fetcher stores NEG_SENTINEL
   (10 bytes) in cable-health-v1 for 2 min. Handler then returned
   `fallbackCache || { cables: {} }` to the client WITHOUT writing to
   cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
   → strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).

   Fix: on null result, write the fallback response back to CACHE_KEY
   (short TTL matching NEG_SENTINEL so a recovered NGA fetch can
   overwrite immediately) AND refresh seed-meta with the real count.
   Health now sees hasData=true during an outage.
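   The writeback path can be sketched as follows; the cache interface
   and key names are stand-ins for the real get-cable-health.ts helpers:

```javascript
// Sketch of the null-upstream writeback: persist the fallback (short
// TTL) and refresh seed-meta so health sees hasData=true and an honest
// record count during an outage.
async function getCableHealthSketch({ fetchUpstream, cache, fallback }) {
  const CACHE_KEY = 'cable-health-v1';  // illustrative key name
  const SHORT_TTL_SEC = 120;            // matches the 2-min sentinel window
  const result = await fetchUpstream().catch(() => null);
  if (result !== null) {
    await cache.set(CACHE_KEY, result);
    return result;
  }
  const response = fallback ?? { cables: {} };
  // Write the fallback back so health reads real data, not a sentinel;
  // short TTL lets a recovered upstream fetch overwrite immediately.
  await cache.set(CACHE_KEY, response, SHORT_TTL_SEC);
  await cache.set('seed-meta:cable-health', {
    fetchedAt: new Date().toISOString(),
    records: Object.keys(response.cables).length, // honest count, may be 0
  });
  return response;
}
```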

2. Zero-cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
   the valid healthy state — NGA had no active subsea cable warnings.
   The old `Math.max(count, 1)` on recordCount was an intentional lie
   to sidestep this; now honest.

   Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
   pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
   valid, not critical". recordCount now reports actual cables.length.

Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.

Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
2026-04-20 15:21:04 +04:00
Elie Habib
2b7f83fd3e fix(pro): regenerate /pro bundle with no-rhc Clerk so deploy reflects #3227 (#3228)
PR #3227 fixed pro-test/src/services/checkout.ts to import
@clerk/clerk-js/no-rhc instead of the headless main export, but the
deployed bundle in public/pro/assets/ was never regenerated. The
Vercel deploy ships whatever is committed under public/pro/ — the
root build script does not run pro-test's vite build — so
production /pro continued serving the old broken clerk-C6kUTNKl.js
even after #3227 merged. Sign-in still threw "Clerk was not loaded
with Ui components".

Rebuild: cd pro-test && npm run build, which writes the new chunks
to ../public/pro/assets/. Deletes the stale clerk-C6kUTNKl.js +
index-J1JYVlDk.js, adds clerk.no-rhc-UeQvd9Xf.js +
index-CFLOgmG-.js, and updates pro/index.html to reference them.
2026-04-20 14:24:44 +04:00
Elie Habib
7979b4da0e fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts on /pro (#3227)
* fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts

The /pro marketing page was throwing "Clerk was not loaded with Ui
components" the moment an unauthenticated user clicked Sign In or
GET STARTED on a pricing tier, blocking every conversion.

@clerk/clerk-js v6 main export (`dist/clerk.mjs`) is the headless
build — it has no UI controller and expects `clerkUICtor` passed to
`clerk.load()`. Calling `openSignIn()` on it always throws. The
bundled-with-UI variant is exposed at `@clerk/clerk-js/no-rhc`
(same `Clerk` named export, drop-in).

Also adds explicit `Sentry.captureException` at both call sites,
because the rejection was being swallowed by `.catch(console.error)`
in App.tsx and by an unwrapped `c.openSignIn()` in checkout.ts —
which is why this regression had zero Sentry trail in production.

* fix(pro): catch Clerk load failures in startCheckout, not just openSignIn

The PricingSection CTA fires-and-forgets `startCheckout()` with no
.catch. The previous fix only wrapped `c.openSignIn()`, so any
rejection from `await ensureClerk()` (dynamic import failure, network
loss mid-load, clerk.load() throwing) still escaped as an unhandled
promise — defeating the Sentry coverage we added.

Now `startCheckout()` reports both load and openSignIn failures
explicitly and returns false rather than rejecting.

Also clear the cached `clerkLoadPromise` on failure so the next
button click can retry from scratch instead of replaying a rejected
promise forever.

* fix(pro): only publish Clerk instance after load() succeeds

_loadClerk() was assigning the module-level `clerk` singleton before
awaiting `clerk.load()`. If load() rejected (transient network failure,
malformed publishable key, Clerk frontend-api 4xx/5xx), the half-
initialized instance stayed cached. The next ensureClerk() call then
short-circuited on `if (clerk) return clerk;` and returned the broken
instance, bypassing the retry path that commit 70d96d380 added.

Hold the new instance in a local var, await load(), only then publish
to the module slot. A failed load now leaves `clerk` null and the
cleared `clerkLoadPromise` allows a genuine retry on the next click.
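The publish-after-load pattern can be sketched like this, with `createInstance` standing in for `new Clerk(publishableKey)`:

```javascript
// Sketch: hold the instance locally, await load(), and only then
// publish to the module-level slot. A failed load leaves the slot
// null and clears the in-flight promise so the next call retries.
function makeClerkLoader(createInstance) {
  let clerk = null;
  let clerkLoadPromise = null;
  return async function ensureClerk() {
    if (clerk) return clerk;
    if (!clerkLoadPromise) {
      clerkLoadPromise = (async () => {
        const instance = createInstance(); // local var, not published yet
        await instance.load();             // may reject (network, bad key)
        clerk = instance;                  // publish only after success
        return instance;
      })().catch((err) => {
        clerkLoadPromise = null;           // allow a genuine retry next call
        throw err;
      });
    }
    return clerkLoadPromise;
  };
}
```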
2026-04-20 13:45:46 +04:00
Elie Habib
1928b48e68 feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy (#3225)
* feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy

Follow-up to #3222 (stabiliser) — the real fix. Production log
2026-04-20T06:48-06:55 confirmed the stabilisers worked (per-country
cap enforced at 90.0s, SIGTERM printed stage+errors, abort propagated
through fetch + proxy paths) — but also proved the per-country shape
itself is the bug:

  batch 1/15: 7 seeded, 5 errors (90.0s)   ← per-country cap hit cleanly
  batch 5/15: 40 seeded, 20 errors (371.3s) ← 4 batches / ~70s avg
  SIGTERM at batch 6/15 after 420s

15 batches × ~70s = 1050s. Section budget is 420s. Per-country will
never fit, even with a perfectly-behaving ArcGIS. Three countries
(BRA, IDN, NGA) also returned "Invalid query parameters" on the
ISO3-filtered WHERE — a failure mode unique to the per-country shape.

Fix: replace 174 per-country round-trips with a single paginated pass
over EP3, grouped by ISO3 in memory (same pattern EP4 refs already
use via `fetchAllPortRefs`). ~150-200 sequential pages × ~1s each
≈ 2-4 min total wall time inside the 420s section. Eliminates the
per-country failure modes by construction.

Changes:

- New `fetchAllActivityRows(since, { signal, progress })`: paginated
  `WHERE date > <ts>` across the whole Daily_Ports_Data feature server,
  grouped by attributes.ISO3 into Map<iso3, rows[]>. Advances offset by
  actual features.length (same server-cap defence as EP4). Checks
  signal.aborted between pages.
- `fetchAll()` now reads the global map and drives `computeCountryPorts`
  for each eligible ISO3. No concurrency primitives, no batch loop, no
  Promise.allSettled.
- Dropped: `processCountry`, `withPerCountryTimeout`, per-country
  `fetchActivityRows`, CONCURRENCY, PER_COUNTRY_TIMEOUT_MS,
  BATCH_LOG_EVERY. All dead under the global pattern.
- `progress` shape now `{ stage, pages, countries }`. SIGTERM handler
  logs "SIGTERM during stage=<x> (pages=N, countries=M)" — still
  useful forensics if the global paginator itself hangs.
- Shutdown controller: `main()` creates an AbortController, threads
  its signal through fetchAll → fetchAllActivityRows → fetchWithTimeout
  → _proxy-utils, and the SIGTERM handler calls abort() so in-flight
  HTTP work stops instead of burning SIGKILL grace window. Reuses the
  signal-threading plumbing shipped in #3222.
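The global paginator's core loop can be sketched as below (assumed shape; `fetchPage` stands in for the ArcGIS query). The key defence is advancing the offset by the actual number of returned features, not the requested page size, so a server-side page cap cannot silently skip rows:

```javascript
// Minimal sketch of the single-pass paginator + in-memory groupBy
// (illustrative; the real function also threads query params and
// progress reporting). Checks signal.aborted between pages.
async function fetchAllActivityRows(fetchPage, { signal } = {}) {
  const byIso3 = new Map();
  let offset = 0;
  for (;;) {
    if (signal?.aborted) throw signal.reason ?? new Error('aborted');
    const features = await fetchPage(offset);
    if (features.length === 0) break;
    for (const f of features) {
      const iso3 = f.attributes.ISO3;
      if (!byIso3.has(iso3)) byIso3.set(iso3, []);
      byIso3.get(iso3).push(f);
    }
    offset += features.length;   // server-cap defence: advance by actual count
  }
  return byIso3;
}
```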

Preserved: degradation guard (>20% drop rejects), TTL extension on
failure, lock release in finally, 429 proxy fallback with signal
propagation, page-level abort checks.

Tests: 43 pass (dropped 2 withPerCountryTimeout runtime tests that
targeted removed code; kept proxyFetch pre-aborted-signal test since
the proxy plumbing is still exercised by the global fetch). Full
test:data 5865 pass. Typecheck + lint clean.

* fix(portwatch): stream-aggregate EP3 into per-port accumulators (PR #3225 P1)

Review feedback on PR #3225: the first globalisation pass materialised
the full 90-day activity dataset as Map<iso3, Feature[]> before any
aggregation. At ~2000 ports × 90 days ≈ 180k feature objects × ~400
bytes = ~70MB RSS. Trades the timeout failure mode for an OOM/restart
under large datasets on the 1GB Railway container.

Fix: replace the two-phase "fetch-all then compute" shape with a
single-pass streaming aggregator.

- `fetchAndAggregateActivity(since, { signal, progress })` folds each
  page's features into Map<iso3, Map<portId, PortAccum>> inline and
  discards the raw features. Only ~2000 per-port accumulators (~100
  bytes each = ~200KB) live across pages. Memory is O(ports), not
  O(rows).
- PortAccum holds running counters for each aggregation window:
  last30_calls, last30_count, last30_import, last30_export, prev30_calls,
  last7_calls, last7_count. Captured once per port at first sighting
  (date ASC order preserves old `rows[0].portname` behaviour).
- New `finalisePortsForCountry(portAccumMap, refMap)` — exported for
  tests — computes the exact same per-port fields as the removed
  `computeCountryPorts`: tankerCalls30d = last30_calls,
  tankerCalls30dPrev = prev30_calls, import/export from their sums,
  avg30d = last30_calls/last30_count, avg7d = last7_calls/last7_count,
  anomalySignal unchanged, trendDelta unchanged, top-50 truncation
  unchanged.
- `fetchAll()` now calls the aggregator + finaliser directly; the
  transient Feature[] map is gone.
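The per-page fold can be sketched as follows (field names assumed for illustration; the real accumulator carries all seven counters). The point is that only the accumulators survive a page, never the raw features:

```javascript
// Sketch of the streaming fold: each page's features update running
// per-port counters and are then discarded. Memory is O(ports), not O(rows).
const DAY_MS = 86_400_000;

function foldPage(accByIso3, features, nowMs) {
  for (const { attributes: a } of features) {
    let ports = accByIso3.get(a.ISO3);
    if (!ports) accByIso3.set(a.ISO3, (ports = new Map()));
    let acc = ports.get(a.portid);
    if (!acc) {
      // captured once at first sighting (date ASC preserves rows[0] semantics)
      ports.set(a.portid, (acc = { portname: a.portname, last30_calls: 0, last30_count: 0 }));
    }
    const ageDays = (nowMs - a.date) / DAY_MS;
    if (ageDays <= 30) {
      acc.last30_calls += a.portcalls;
      acc.last30_count += 1;
    }
  }
  return accByIso3;
}
```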

Preserved from PR #3225: shutdown AbortController plumbing, 429 proxy
fallback with signal propagation, degradation guard, SIGTERM
diagnostic flush, page-level abort checks.

Tests: 48 pass (was 43, +5 runtime tests for finalisePortsForCountry
covering trendDelta, anomalySignal, top-N truncation, and missing
refMap entries). Full test:data 5877 pass. Typecheck + lint clean.

* fix(portwatch): skip EP3 geometry + thread signal into EP4 refs (PR #3225 review)

Two valid findings from PR #3225 review on commit fca1eaba0:

P1 — `returnGeometry: 'false'` was present on the EP4 refs paginator
but missing on the EP3 activity paginator. ArcGIS returns geometry by
default (~100-200KB per page). Across the ~150-200 pages this PR adds
to the perf-critical path, that's tens of MB of unused coordinate
data on the wire — directly undermining the 420s section budget the
globalisation is meant to fit inside. One-line fix on the EP3 params
block.

P2 — `fetchAllPortRefs` accepted no signal, so during the 'refs'
stage a SIGTERM would abort the shutdownController without
cancelling any in-flight EP4 fetches. Each page could still run up
to FETCH_TIMEOUT (45s) after the handler fired. Low blast radius
(process.exit terminates the container regardless) but the PR
description claimed full signal threading end-to-end; this closes
the last gap.

Tests: 50 pass (was 48, +2 for the two new assertions —
returnGeometry presence in both paginators, fetchAllPortRefs
signal plumbing). Full test:data 5879 pass. Typecheck + lint clean.
2026-04-20 13:44:42 +04:00
Elie Habib
1a2295157e feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief (#3224)
* feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief

Problem
-------
The 2026-04-20 08:00 brief contained 12 stories, 7 of which were
duplicates of 4 events, alongside low-importance filler (niche
commodity, domestic crime). notification-relay's IMPORTANCE_SCORE_MIN
gate (#3223, set to 63) only applies to the realtime fanout path.
The digest cron reads the same story:track:*.currentScore but has
NO absolute score floor — it just ranks and applies `slice(0, 30)`, so on
slow news days low-importance items bubble up to fill slots.

Change
------
scripts/seed-digest-notifications.mjs:
  - New getDigestScoreMin() reads DIGEST_SCORE_MIN env at call time
    (Railway flips apply on the next cron tick, no redeploy).
  - Default 0 = no-op, so this PR is behaviour-neutral until the
    env var is set on Railway.
  - Filter runs AFTER deduplicateStories() so it drops clusters by
    the REPRESENTATIVE's score (which is the highest-scoring member
    of its cluster per materializeCluster's sort).
  - One-line operator log when the floor fires:
      [digest] score floor dropped N of M clusters (DIGEST_SCORE_MIN=X)
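The call-time env read and post-dedup filter can be sketched as (behaviour per the tests listed below; names follow the commit, the body is illustrative):

```javascript
// Sketch of the score floor: read DIGEST_SCORE_MIN at call time so
// Railway env flips apply on the next cron tick. Non-integer or
// negative values degrade to 0 (filter off).
function getDigestScoreMin() {
  const n = Number(process.env.DIGEST_SCORE_MIN);
  return Number.isInteger(n) && n > 0 ? n : 0;
}

function applyScoreFloor(clusters) {
  const min = getDigestScoreMin();
  if (min === 0) return clusters;   // short-circuit: no wasted filter pass
  return clusters.filter((c) => c.currentScore >= min);
}
```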

tests/digest-score-floor.test.mjs (6 regressions):
  - getDigestScoreMin reads from process.env (not a module const)
  - default is 0 (no-op)
  - rejects non-integer / negative values (degrades to 0)
  - filter runs AFTER dedup, BEFORE slice(0, DIGEST_MAX_ITEMS)
  - short-circuits when floor is 0 (no wasted filter pass)
  - log line emits "dropped N of M clusters"

Operator activation
-------------------
Set on Railway seed-digest-notifications service:

    DIGEST_SCORE_MIN=63

Start at 63 to match the realtime gate, then nudge up/down based
on the log lines over ~24h. Unset = off (pre-PR behaviour).

Why not bundle a cosine-threshold bump
--------------------------------------
The cosine-threshold tuning (0.60 -> 0.55 per the threshold probe)
is an env-only flip already supported by the dedup orchestrator.
Bundling an env-default change into this PR would slow rollback.
Operator sets DIGEST_DEDUP_COSINE_THRESHOLD=0.55 on Railway as a
separate action; this PR stays scoped to the score floor.

Verification
------------
- npm run test:data            5825/5825 pass
- tests/digest-score-floor      6/6 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api   clean
- biome check on changed files clean (pre-existing main() complexity
  warning on this file is unchanged)
- lint:md                      0 errors
- version:check                OK

Post-Deploy Monitoring & Validation
-----------------------------------
- **What to monitor** after setting DIGEST_SCORE_MIN on Railway:
  - `[digest] score floor dropped` lines — expect ~5-25% of
    clusters dropped on bulk-send ticks (stories=700+)
  - `[digest] Cron run complete: N digest(s) sent` stays > 0
- **Expected healthy behaviour**
  - 0-5 clusters dropped on normal ~80-story ticks
  - 50-200 dropped on bulk 700+ story ticks
  - brief still reports 10-30 stories for PRO users
- **Failure signals / rollback**
  - 0 digests sent for 24h after flipping the var
  - user-visible brief now has < 10 stories
  - Rollback: unset DIGEST_SCORE_MIN on Railway dashboard (instant,
    no deploy), next cron tick reverts to unfiltered behaviour
- **Validation window**: 24h
- **Owner**: koala73

Related
-------
- #3218 LLM prompt upgrade (source of importanceScore quality)
- #3221 geopolitical scope for critical
- #3223 notification-relay realtime gate (mirror knob)
- #3200 embedding-based dedup (the other half of brief quality)

* fix(digest): return null (not []) when score floor drains every cluster

Greptile P2 finding on PR #3224.

When DIGEST_SCORE_MIN is set high enough to filter every cluster,
buildDigest previously returned [] (empty array). The caller's
`if (!stories)` guard only catches falsy values, so [] slipped
past the "No stories in window" skip-log and the run reached
formatDigest([], nowMs) which returns null, then silently continued
at the !storyListPlain check.

Flow was still correct (no digest sent) but operators lost the
observability signal to distinguish "floor too high" from "no news
today" from "dedup ate everything".

Fix:
- buildDigest now returns null when the post-floor list is empty,
  matching the pre-dedup-empty path. Caller's existing !stories
  guard fires the canonical skip-log.
- Emits a distinct `[digest] score floor dropped ALL N clusters
  (DIGEST_SCORE_MIN=X) — skipping user` line BEFORE the return,
  so operators can spot an over-aggressive floor in the logs.
- Test added covering both the null-return contract and the
  distinct "dropped ALL" log line.
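The falsy-guard gap is easy to reproduce in miniature (sketch only; `buildDigestSketch` is a stand-in for the real buildDigest):

```javascript
// Why [] slipped past: an empty array is truthy, so only null/undefined
// reaches the "No stories in window" skip path.
function shouldSkip(stories) {
  return !stories;                      // the caller's existing guard
}

function buildDigestSketch(clusters, scoreMin) {
  const kept = clusters.filter((c) => c.score >= scoreMin);
  if (kept.length === 0) return null;   // was: return [] — slipped the guard
  return kept;
}
```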

7/7 dedup-score-floor tests pass.
2026-04-20 10:19:03 +04:00
Elie Habib
4f38ee5a19 fix(portwatch): per-country timeout + SIGTERM progress flush (#3222)
* fix(portwatch): per-country timeout + SIGTERM progress flush

Diagnosed from Railway log 2026-04-20T04:00-04:07: Port-Activity section hit
the 420s section cap with only batch 1/15 logged. Gap between batch 1 (67.3s)
and SIGTERM was 352s of silence — batch 2 stalled because Promise.allSettled
waits for the slowest country and processCountry had no per-country budget.
One slow country (USA/CHN with many ports × many pages under ArcGIS EP3
throttling) blocked the whole batch and cascaded to the section timeout,
leaving batches 2..15 unattempted.

Two changes, both stabilisers ahead of the proper fix (globalising EP3):

1. Wrap processCountry in Promise.race against a 90s PER_COUNTRY_TIMEOUT_MS.
   Bounds worst-case batch time at ~90s regardless of ArcGIS behaviour.
   Orphan fetches keep running until their own AbortSignal.timeout(45s)
   fires — acceptable since the process exits soon after either way.

2. Share a `progress` object between fetchAll() and the SIGTERM handler so
   the kill path flushes batch index, seeded count, and the first 10 error
   messages. Past timeout kills discarded the errors array entirely,
   making every regression undiagnosable.
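The stabiliser's race can be sketched as below (the real helper is `withPerCountryTimeout`; this body is illustrative). Note that, as described above, the losing work keeps running after the race settles — containment of the orphaned fetches came in the follow-up commit:

```javascript
// Sketch of the per-country budget: race the country's work against a
// timer. The timer is cleared in finally so a fast country doesn't
// leave a dangling timeout.
function withPerCountryTimeout(work, timeoutMs, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`country ${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  return Promise.race([work(), timeout]).finally(() => clearTimeout(timer));
}
```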

* fix(portwatch): address PR #3222 P1+P2 (propagate abort, eager error flush)

Review feedback on #3222:

P1 — The 90s per-country timeout did not actually stop the timed-out
country's work; Promise.race rejected but processCountry kept paginating
with fresh 45s fetch timeouts per page, violating the CONCURRENCY=12 cap
and amplifying ArcGIS throttling instead of containing it.

Fix: thread an AbortController signal from withPerCountryTimeout through
processCountry → fetchActivityRows → fetchWithTimeout. fetchWithTimeout
combines the caller signal with AbortSignal.timeout(FETCH_TIMEOUT) via
AbortSignal.any so the per-country abort propagates into the in-flight
fetch. fetchActivityRows also checks signal.aborted between pages so a
cancel lands on the next iteration boundary even if the current page
has already resolved. Node 24 runtime supports AbortSignal.any.
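The combined-signal shape can be sketched as (requires AbortSignal.any, Node ≥ 20.3; the helper name here is hypothetical — the real code lives inside fetchWithTimeout):

```javascript
// Sketch: combine the caller's per-country signal with a per-request
// timeout so either one aborts the in-flight fetch.
function combinedSignal(callerSignal, timeoutMs) {
  return callerSignal
    ? AbortSignal.any([callerSignal, AbortSignal.timeout(timeoutMs)])
    : AbortSignal.timeout(timeoutMs);
}
```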

P2 — SIGTERM diagnostics missed failures from the currently-stuck batch
because progress.errors was only populated after Promise.allSettled
returned. A kill during the pending await left progress.errors empty.

Fix: attach p.catch(err => errors.push(...)) to each wrapped promise
before Promise.allSettled. Rejections land in the shared errors array
at the moment they fire, so a SIGTERM mid-batch sees every rejection
that has already occurred (including per-country timeouts that have
already aborted their controllers). The settled loop skips rejected
outcomes to avoid double-counting.
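The eager-flush shape can be sketched as (illustrative; the real loop also tags each error with its country):

```javascript
// Sketch: attach .catch before allSettled so rejections land in the
// shared errors array the moment they fire — a SIGTERM mid-batch sees
// every rejection that has already occurred. The settled loop counts
// only fulfilled outcomes, avoiding double-counting.
async function runBatch(promises, errors) {
  for (const p of promises) {
    p.catch((err) => errors.push(String(err?.message ?? err)));
  }
  const settled = await Promise.allSettled(promises);
  return settled.filter((s) => s.status === 'fulfilled').map((s) => s.value);
}
```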

Also exports withPerCountryTimeout with an injectable timeoutMs so the
new runtime tests can exercise the abort path at 40ms. Runtime tests
verify: (a) timer fires → underlying signal aborted + work rejects with
the per-country message, (b) work-resolves-first returns the value,
(c) work-rejects-first surfaces the real error, (d) eager .catch flush
populates a shared errors array before allSettled resolves.

Tests: 45 pass (was 38, +7 — 4 runtime + 3 source-regex).
Full test:data: 5867 pass. Typecheck + lint clean.

* fix(portwatch): abort also cancels 429 proxy fallback (PR #3222 P1 follow-up)

Second review iteration on #3222: the per-country AbortController fix
from b2f4a2626 stopped at the direct fetch() and did not reach the 429
proxy fallback. httpsProxyFetchRaw only accepted timeoutMs, so a
timed-out country could keep a CONNECT tunnel + request alive for up
to another FETCH_TIMEOUT (45s) after the batch moved on — the exact
throttling scenario the PR is meant to contain. The concurrency cap
was still violated on the slow path.

Threads `signal` all the way through:

- scripts/_proxy-utils.cjs: proxyConnectTunnel + proxyFetch accept an
  optional signal option. Early-reject if `signal.aborted` before
  opening the socket. Otherwise addEventListener('abort') destroys the
  in-flight proxy socket + TLS tunnel and rejects with signal.reason.
  Listener removed in cleanup() on all terminal paths. Refactored both
  functions around resolveOnce/rejectOnce guards so the abort path
  races cleanly with timeout and network errors without double-settle.

- scripts/_seed-utils.mjs: httpsProxyFetchRaw accepts + forwards
  `signal` to proxyFetch.

- scripts/seed-portwatch-port-activity.mjs: fetchWithTimeout's 429
  branch passes its caller signal to httpsProxyFetchRaw.

Backward compatible: signal is optional in every layer, so the many
other callers of proxyFetch / httpsProxyFetchRaw across the repo are
unaffected.
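The settle-once guard pattern can be sketched as (resolveOnce/rejectOnce are the commit's names; this body is illustrative). Whichever of abort, timeout, or network error fires first wins; later settlers become no-ops and cleanup runs exactly once:

```javascript
// Sketch of the double-settle defence used around the proxy socket paths.
function makeSettlers(resolve, reject, cleanup) {
  let settled = false;
  return {
    resolveOnce(value) {
      if (settled) return;
      settled = true;
      cleanup();                 // remove abort listener, clear timers, etc.
      resolve(value);
    },
    rejectOnce(reason) {
      if (settled) return;
      settled = true;
      cleanup();
      reject(reason);
    },
  };
}
```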

Tests: 49 pass (was 45, +4). New runtime test proves pre-aborted
signals reject proxyFetch synchronously without touching the network.
Source-regex tests assert signal threading at each layer. Full
test:data 5871 pass. Typecheck + lint clean.
2026-04-20 09:36:10 +04:00
Elie Habib
6e639274f1 feat(scoring): set data-driven score gate thresholds (82/69/63) (#3223)
* feat(scoring): set data-driven score gate thresholds (82/69/63)

Calibrated from v5 shadow-log recalibration on 2026-04-20:
  critical sensitivity: 85 → 82  (fires on Hormuz closures, ship seizures, ceasefire collapses)
  high sensitivity:     65 → 69  (fires on mass shootings, blockade enforcement, major diplomatic)
  all/default MIN:      40 → 63  (drops tutorials, domestic crime, niche commodity)

Activation: set IMPORTANCE_SCORE_LIVE=1 + IMPORTANCE_SCORE_MIN=63 on
Railway notification-relay env vars after this PR merges.

Scoring pipeline journey:
  PR #3069 — fixed stale relay score (Pearson 0.31 → 0.41)
  PR #3143 — closed /api/notify bypass
  PR #3144 — weight rebalance severity 55% (Pearson 0.41 → 0.67)
  PR #3218 — LLM prompt upgrade + cache v2
  PR #3221 — geopolitical scope for critical
  This PR — final threshold constants

* fix(scoring): use IMPORTANCE_SCORE_MIN for 'all' sensitivity threshold

Review found the hardcoded 63 for 'all' sensitivity diverged from the
IMPORTANCE_SCORE_MIN env var used at the relay ingress gate. An operator
setting IMPORTANCE_SCORE_MIN=40 would still have 'all' subscribers miss
alerts scored 40-62. Now both gates use the same env var (default 63).
2026-04-20 09:19:47 +04:00
Elie Habib
14c1314629 fix(scoring): scope "critical" to geopolitical events, not domestic tragedies (#3221)
The weight rebalance (PR #3144) amplified a prompt gap: domestic mass
shootings (e.g. "8 children killed in Louisiana") scored 88 because the
LLM classified them as "critical" (mass-casualty 10+ killed) and the
55% severity weight pushed them into the critical gate. But WorldMonitor
is a geopolitical monitor — domestic tragedies are terrible but not
geopolitically destabilizing.

Prompt change (both ais-relay.cjs + classify-event.ts):
- "critical" now explicitly requires GEOPOLITICAL scope: "events that
  destabilize international order, threaten cross-border security, or
  disrupt global systems"
- Domestic mass-casualty events (mass shootings, industrial accidents)
  moved to "high" — still important, but not critical-sensitivity alerts
- Added counterexamples: "8 children killed in mass shooting in
  Louisiana → domestic mass-casualty → high" and "23 killed in fireworks
  factory explosion → industrial accident → high"
- Retained: "700 killed in Sudan drone strikes → geopolitical mass-
  casualty in active civil war → critical"

Classify cache: v2→v3 (bust stale entries that lack geopolitical scope).
Shadow-log: v4→v5 (clean dataset for recalibration under the scoped prompt).

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 08:40:29 +04:00
Elie Habib
e2255840f6 fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter (#3220)
* fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter

Follow-up to #3217. The blanket "any maplibre frame + (hostname)" rule would
drop real failures on our self-hosted R2 PMTiles bucket or any first-party
fetch that happens to run on a maplibre-framed stack. Enumerated the actual
third-party hosts our maplibre paths fetch from (tilecache.rainviewer.com,
basemaps.cartocdn.com, tiles.openfreemap.org, protomaps.github.io) into a
module-level Set and gated the filter on membership. First-party hosts keep
surfacing.
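The membership gate can be sketched as (host list copied from this commit; the message parsing is illustrative):

```javascript
// Sketch of the allowlist gate: only the enumerated third-party tile
// hosts are suppressed; first-party hosts (e.g. the R2 PMTiles bucket)
// keep surfacing.
const MAPLIBRE_TILE_HOSTS = new Set([
  'tilecache.rainviewer.com',
  'basemaps.cartocdn.com',
  'tiles.openfreemap.org',
  'protomaps.github.io',
]);

function isSuppressedTileFailure(message) {
  const m = /^Failed to fetch \(([^)]+)\)$/.exec(message);
  return m !== null && MAPLIBRE_TILE_HOSTS.has(m[1]);
}
```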

Updated regression test to mirror real-world mixed stacks (maplibre + first-party
fetch wrapper) so the allowlist is what decides, not the pre-existing
"all frames are maplibre internals" filter which is orthogonal.

* fix(sentry): route maplibre AJAX errors past the generic vendor-only filter

Review feedback: the broader "all non-infra frames are maplibre internals"
TypeError filter at main.ts:287 runs BEFORE the new host-allowlist block and
short-circuits it for all-vendor stacks. Meaning a self-hosted R2 basemap
fetch failure whose stack is purely maplibre frames would still be silently
dropped, defeating the point of the allowlist.

Carve out the `Failed to fetch (<host>)` AJAX pattern: precompute
`isMaplibreAjaxFailure` and skip the generic vendor filter when it matches,
so the host-allowlist check is always the one that decides. Added two
regression tests covering the all-maplibre edge case both ways:
- allowlisted host + all-maplibre → still suppressed
- non-allowlisted host + all-maplibre → surfaces
2026-04-20 08:21:37 +04:00
Elie Habib
1581b2dd70 fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)
* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

PR B of the scoring recalibration plan (docs/plans/2026-04-17-002).
Builds on PR A (weight rebalance, PR #3144) which achieved Pearson
0.669. This PR targets the remaining noise in the 50-69 band where
editorials, tutorials, and domestic crime score alongside real news.

LLM prompt upgrade (both writers):
- scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level
  guidelines, content-type distinction (editorial/opinion/tutorial →
  info, domestic crime → info, mass-casualty → critical), and concrete
  counterexamples.
- server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines
  added to align the second cache writer.

Classify cache bump:
- classify:sebuf:v1: → classify:sebuf:v2: in all three locations
  (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache,
  _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally
  (24h TTL). All items reclassified within 15 min of Railway deploy.

Keyword additions (_classifier.ts):
- HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions'
  (phrase patterns — no false positives on 'sanctioned 10 individuals')
- MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword
  misalignment during cold-cache window

Shadow-log v4:
- Clean dataset for post-prompt-change recalibration. v3 rolls off via
  7-day TTL.

Deploy order: Railway first (seedClassify prewarms v2 cache immediately),
then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

Review findings on PR #3218:

P1: classify-event.ts prompt was missing 2 counterexamples and the
"Focus" line present in the relay prompt. Both writers share
classify:sebuf:v2 cache, so differing prompts mean nondeterministic
classification depending on which path writes first. Now both prompts
have identical level guidelines and counterexamples (format differs:
array vs single object, but classification logic is aligned).

P2: Removed 3 dead phrase-pattern keywords (sanctions imposed/package/
new) — the existing 'sanctions' entry already substring-matches all of
them and maps to the same (high, economic). Just dead code that misled
readers into thinking they added coverage.

P2: Updated stale v3 references in cache-keys.ts (doc block + exported
constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 08:05:13 +04:00
Elie Habib
84eec7f09f fix(health): align breadthHistory maxStaleMin with actual Tue-Sat cron schedule (#3219)
Production alarm: `breadthHistory` went STALE_SEED every Monday morning
despite the seeder running correctly. Root cause was a threshold /
schedule mismatch:

- Schedule (Railway): 02:00 UTC, Tuesday through Saturday. Five ticks
  per week, capturing Mon-Fri market close → following-day 02:00 UTC.
- Threshold: maxStaleMin=2880 (48h), assuming daily cadence.
- Max real gap: Sat 02:00 UTC → Tue 02:00 UTC = 72h. The existing 48h
  alarm fired every Monday at ~02:00 UTC when the Sun/Mon cron ticks
  are intentionally absent, until the Tue 02:00 UTC run restored
  fetchedAt.

Fix: bump maxStaleMin to 5760 (96h). 72h covers the weekend gap;
extra 24h tolerates one missed Tue run without alarming. Comment now
records the actual schedule + reasoning.
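The threshold arithmetic checks out mechanically (a worked sketch, not repo code):

```javascript
// Ticks at 02:00 UTC Tue-Sat: four 24h gaps plus the Sat -> Tue weekend
// gap of 72h. maxStaleMin must exceed 72h = 4320 min; 5760 (96h) adds
// exactly one missed-run day of headroom.
const tickDays = [2, 3, 4, 5, 6];   // Tue..Sat (0 = Sunday)
const gapsHours = tickDays.map((d, i) => {
  const next = tickDays[(i + 1) % tickDays.length];
  return ((next - d + 7) % 7) * 24;
});
const maxGapHours = Math.max(...gapsHours);
```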

No seeder change needed — logs confirm the service fires and completes
correctly on its schedule (Apr 16/17/18 02:00 UTC runs all "Done" with
3/3 readings, `Stopping Container` is normal Railway cron teardown).

Diagnostic memo: this is the class of bug where the schedule comment
lies. Original comment said "daily cron at 21:00 ET". True start time
is 22:00 EDT / 21:00 EST Mon-Fri (02:00 UTC next day) AND only Mon-Fri,
so "daily" is wrong by two days every week.
2026-04-20 07:56:54 +04:00
Elie Habib
0bc5b49267 fix(sentry): filter MapLibre AJAXError tile-fetch transients (#3217)
WORLDMONITOR-NE/NF (8 events, 1 user): MapLibre wraps transient tile
fetch failures as `TypeError: Failed to fetch (<hostname>)` and rethrows
inside a Generator-backed Promise, leaking to onunhandledrejection even
though DeckGLMap's map-error handler already logs them as warnings.
Triggered mostly by adblockers/extensions and flaky mobile networks.

Add a beforeSend filter gated on (a) the maplibre-specific paren message
format (`Failed to fetch (hostname)` — our own fetch code throws plain
`Failed to fetch` without hostname) and (b) presence of a maplibre vendor
frame in the stack, so a real first-party fetch regression with the same
message shape still surfaces. Covered by 3 regression tests.
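The two-part gate can be sketched as (the frame shape is illustrative; Sentry passes richer exception/stacktrace objects in beforeSend):

```javascript
// Sketch: suppress only when BOTH hold — the maplibre-specific
// "Failed to fetch (<hostname>)" paren format (our own fetch code throws
// plain "Failed to fetch") AND a maplibre vendor frame in the stack.
function shouldDropTileTransient(message, frames) {
  const hasParenHost = /^Failed to fetch \([^)]+\)$/.test(message);
  const hasMaplibreFrame = frames.some((f) => /maplibre/.test(f.filename ?? ''));
  return hasParenHost && hasMaplibreFrame;
}
```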
2026-04-20 07:54:42 +04:00
Elie Habib
fc0c6bc163 fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected (#3216)
* fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected

WORLDMONITOR-N3: 8 events / 2 users from server-side Convex reporting
`Uncaught Error: Authentication required` thrown by requireUserId() when
a query fires before the WebSocket auth handshake completes. Every other
business error in this repo uses ConvexError("CODE"), which Convex's
server-side Sentry integration treats as expected rather than unhandled.

Migrate requireUserId to ConvexError("AUTH_REQUIRED") (no consumer parses
the message string — only a code comment references it) and add a matching
client-side ignoreErrors pattern next to the existing API_ACCESS_REQUIRED
precedent, as defense-in-depth against unhandled rejections reaching the
browser SDK.

* fix(sentry): drop broad AUTH_REQUIRED ignoreErrors — too many real call sites

Review feedback: requireUserId() backs user-initiated actions (checkout,
billing portal, API key ops), not just the benign query-race path. A bare
`ConvexError: AUTH_REQUIRED` message-regex in ignoreErrors has no stack
context, so a genuine auth regression breaking those flows for signed-in
users would be silently dropped. The server-side ConvexError migration in
convex/lib/auth.ts is enough to silence WORLDMONITOR-N3; anything that
still reaches the browser SDK should surface.
2026-04-20 01:35:33 +04:00
Elie Habib
4c9888ac79 docs(mintlify): panel reference pages (PR 2) (#3213)
* docs(mintlify): add user-facing panel reference pages (PR 2)

Six new end-user pages under docs/panels/ for the shipped panels that
had no user-facing documentation in the published docs, per the plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md.
All claims are grounded in the live component source + SEED_META +
handler dirs — no invented fields, counts, or refresh windows.

- panels/latest-brief.mdx — daily AI brief panel (ready/composing/
  locked states). Hard-gated PRO (`premium: 'locked'`).
- panels/forecast.mdx — AI Forecasts panel (internal id `forecast`,
  label "AI Forecasts"). Domain + macro-region filter pills; 10%
  probability floor. Free on web, locked on desktop.
- panels/consumer-prices.mdx — 5-tab retail-price surface (Overview
  / Categories / Movers / Spread / Health) with market, basket, and
  7/30/90-day controls. Free.
- panels/disease-outbreaks.mdx — WHO / ProMED / national health
  ministries outbreak alerts with alert/warning/watch pills. Free.
- panels/radiation-watch.mdx — EPA RadNet + Safecast observations
  with anomaly scoring and source-confidence synthesis. Free.
- panels/thermal-escalation.mdx — FIRMS/VIIRS thermal clusters with
  persistence and conflict-adjacency flags. Free.

Also:
- docs/docs.json — new Panels nav group (Latest Brief, AI Forecasts,
  Consumer Prices, Disease Outbreaks, Radiation Watch, Thermal
  Escalation).
- docs/features.mdx — cross-link every panel name in the Cmd+K
  inventory to its new page (and link Country Instability + Country
  Resilience from the same list).
- docs/methodology/country-resilience-index.mdx — short "In the
  dashboard" bridge section naming the three CRI surfaces
  (Resilience widget, Country Deep-Dive, map choropleth) so the
  methodology page doubles as the user-facing panel reference for
  CRI. No separate docs/panels/country-resilience.mdx — keeps the
  methodology page as the single source of truth.

* docs(panels): fix Latest Brief polling description

Reviewer catch: the panel does schedule a 60-second re-poll while
in the composing state. `COMPOSING_POLL_MS = 60_000` at
src/components/LatestBriefPanel.ts:78, and `scheduleComposingPoll()`
is called from `renderComposing()` at :366. The poll auto-promotes
the panel to ready without a manual refresh and is cleared when
the panel leaves composing. My earlier 'no polling timer' line was
right for the ready state but wrong as a blanket claim.

* docs(panels): fix variant-availability claims across all 6 panel pages

Reviewer catch on consumer-prices surfaced the same class of error
on 4 other panel pages: I described variant availability with loose
phrasing ('most variants', 'where X context is relevant', 'tech/
finance/happy opt-in') that didn't match the actual per-variant
panel registries in src/config/panels.ts.

Verified matrix against each *_PANELS block directly:

  Panel              | FULL | TECH | FINANCE | HAPPY | COMMODITY
  consumer-prices    | opt  |  -   |  def    |   -   |  def
  latest-brief       | def  | def  |  def    |   -   |  def   (all PRO-locked)
  disease-outbreaks  | def  |  -   |   -     |   -   |   -
  radiation-watch    | def  |  -   |   -     |   -   |   -
  thermal-escalation | def  |  -   |   -     |   -   |   -
  forecast           | def  |  -   |   -     |   -   |   -   (PRO-locked on desktop)

All 6 pages now name the exact variant blocks in src/config/panels.ts
that register them, so the claim is re-verifiable by grep rather than
drifting with future panel-registry changes.

* docs(panels): fix 5 reviewer findings — no invented controls/sources/keys

All fixes cross-checked against source.

- consumer-prices: no basket selector UI exists. The panel has a
  market bar, a range bar, and tab/category affordances; basket is
  derived from market selection (essentials-<code>, or DEFAULT_BASKET
  for the 'all' aggregate view). Per
  src/components/ConsumerPricesPanel.ts:120-123 and :216-229.
- disease-outbreaks: 'Row click opens advisory' was wrong. The only
  interactive elements in-row are the source-name <a> link
  (sanitised URL, target=_blank); clicking the row itself is a no-op
  (the only content-level listener is for [data-filter] pills and
  the search input). Per DiseaseOutbreaksPanel.ts:35-49,115-117.
- disease-outbreaks: upstream list was wrong. Actual seeder uses
  WHO DON (JSON API), CDC HAN (RSS), Outbreak News Today
  (aggregator), and ThinkGlobalHealth disease tracker
  (ProMED-sourced, 90d lookback). Noted the in-panel tooltip's
  shorter 'WHO, ProMED, health ministries' summary and gave the full
  upstream list with the 72h Redis TTL. Per seed-disease-outbreaks
  .mjs:31-38.
- radiation-watch: summary bar renders 6 cards, not 7 — Anomalies,
  Elevated, Confirmed, Low Confidence, Conflicts, Spikes. The
  CPM-derived indicator is a per-row badge (radiation-flag-converted
  at :67), not a summary card. Moved the CPM reference to the
  per-row badges list. Per RadiationWatchPanel.ts:85-112.
- latest-brief: Redis key shape corrected. The composer writes the
  envelope to brief:{userId}:{issueSlot} (where issueSlot comes from
  issueSlotInTz, not a plain date) and atomically writes a latest
  pointer at brief:latest:{userId} → {issueSlot}. Readers resolve
  via the pointer. 7-day TTL on both. Per
  seed-digest-notifications.mjs:1103-1115 and
  api/latest-brief.ts:80-89.

* docs(panels): Tier 1 — PRO/LLM panel reference pages (9)

Adds user-facing panel pages for the 9 PRO/LLM-backed surfaces flagged
in the extended audit. All claims grounded in component source +
src/config/panels.ts entries (with line cites).

- panels/chat-analyst.mdx — WM Analyst (conversational AI, 5 quick
  actions, 4 domain scopes, POSTs /api/chat-analyst via premiumFetch).
- panels/market-implications.mdx — AI Market Implications trade signals
  (LONG/SHORT/HEDGE × HIGH/MEDIUM/LOW, transmission paths, 120min
  maxStaleMin, degrade-to-warn). Carries the repo's disclaimer verbatim.
- panels/deduction.mdx — Deduct Situation (opt-in PRO; 5s cooldown;
  composes buildNewsContext + active framework).
- panels/daily-market-brief.mdx — Daily Market Brief (stanced items,
  framework selector, live vs cached source badge).
- panels/regional-intelligence.mdx — Regional Intelligence Board
  (7 BOARD_REGIONS, 6 structured blocks + narrative sections,
  request-sequence arbitrator, opt-in PRO).
- panels/strategic-posture.mdx — AI Strategic Posture (cached posture
  + live military vessels → recalcPostureWithVessels; free on web,
  enhanced on desktop).
- panels/stock-analysis.mdx — Premium Stock Analysis (per-ticker
  deep dive: signal, targets, consensus, upgrades, insiders, sparkline).
- panels/stock-backtest.mdx — Premium Backtesting (longitudinal view;
  live vs cached data badge).
- panels/wsb-ticker-scanner.mdx — WSB Ticker Scanner (retail sentiment
  + velocity score with 4-tier color bucketing).

All 9 are PRO (8 via apiKeyPanels allowlist at src/config/panels.ts:973,
strategic-posture is free-on-web/enhanced-on-desktop). Variant matrices
name the exact *_PANELS block registering each panel.

* docs(panels): Tier 2 — flagship free data panels (7)

Adds reference pages for 7 flagship free panels. Every claim grounded
in the panel component + src/config/panels.ts per-variant registration.

- panels/airline-intel.mdx — 6-tab aviation surface (ops/flights/
  airlines/tracking/news/prices), 8 aviation RPCs, user watchlist.
- panels/tech-readiness.mdx — ranked country tech-readiness index with
  6-hour in-panel refresh interval.
- panels/trade-policy.mdx — 6-tab trade-policy surface (restrictions/
  tariffs/flows/barriers/revenue/comtrade).
- panels/supply-chain.mdx — composite stress + carriers + minerals +
  Scenario Engine trigger surface (free panel, PRO scenario activation).
- panels/sanctions-pressure.mdx — OFAC SDN + Consolidated list
  pressure rollup with new/vessels/aircraft summary cards and top-8
  country rows.
- panels/hormuz-tracker.mdx — Hormuz chokepoint drill-down; status
  indicator + per-series bar charts; references Scenario Engine's
  hormuz-tanker-blockade template.
- panels/energy-crisis.mdx — IEA 2026 Energy Crisis Policy Response
  Tracker; category/sector/status filters.

All 7 are free. Variant matrices name exact *_PANELS blocks
registering each panel.

* docs(panels): Tier 3 — compact panels (5)

Adds reference pages for 5 compact user-facing panels.

- panels/world-clock.mdx — 22 global market-centre clocks with
  exchange labels + open/closed indicators (client-side only).
- panels/monitors.mdx — personal keyword alerts, localStorage-persisted;
  links to Features → Custom Monitors for longer explanation.
- panels/oref-sirens.mdx — OREF civil-defence siren feed; active +
  24h wave history; free on web, PRO-locked on desktop (_desktop &&
  premium: 'locked' pattern).
- panels/telegram-intel.mdx — topic-tabbed Telegram channel mirror
  via relay; free on web, PRO-locked on desktop.
- panels/fsi.mdx — US KCFSI + EU FSI stress composites with
  four-level colour buckets (Low/Moderate/Elevated/High).

All 5 grounded in component source + variant registrations.
oref-sirens and telegram-intel correctly describe the _desktop &&
premium: 'locked' gating pattern rather than the misleading 'PRO'
shorthand used earlier for other desktop-locked panels.

* docs(panels): Tier 4 + 5 catalogue pages, nav re-grouping, features cross-links

Closes out the comprehensive panel-reference expansion. Two catalogue
pages cover the remaining ~60 panels collectively so they're all
searchable and findable without dedicated pages per feed/tile.

- panels/news-feeds.mdx — catalogue covering all content-stream panels:
  regional news (africa/asia/europe/latam/us/middleeast/politics),
  topical news (climate/crypto/economic/markets/mining/commodity/
  commodities), tech/startup streams (startups/unicorns/accelerators/
  fintech/ipo/layoffs/producthunt/regionalStartups/thinktanks/vcblogs/
  defense-patents/ai-regulation/tech-hubs/ai/cloud/hardware/dev/
  security/github), finance streams (bonds/centralbanks/derivatives/
  forex/institutional/policy/fin-regulation/commodity-regulation/
  analysis), happy variant streams (species/breakthroughs/progress/
  spotlight/giving/digest/events/funding/counters/gov/renewable).
- panels/indicators-and-signals.mdx — catalogue covering compact
  market-indicator tiles, correlation panels, and misc signal surfaces.
  Grouped by function: sentiment, macro, calendars, market-structure,
  commodity, crypto, regional economy, correlation panels, misc signals.

docs/docs.json — split the Panels group into three for navigability:
  - Panels — AI & PRO (11 pages)
  - Panels — Data & Tracking (16 pages)
  - Panels — Catalogues (2 pages)

docs/features.mdx — Cmd+K inventory rewritten as per-family sub-lists
  with links to every panel page (or catalogue page for the ones
  that live in a catalogue). Replaces the prior run-on paragraph.

Every catalogue panel is also registered in at least one *_PANELS
block in src/config/panels.ts — the catalogue pages note this and
point readers to the config file for variant-availability details.

* docs(panels): fix airline-intel + world-clock source-of-truth errors

- airline-intel: refresh behavior section was wrong on two counts.
  (1) The panel DOES have a polling timer: a 5-minute setInterval
  in the constructor calling refresh() (which reloads ops + active
  tab). (2) The 'prices' tab does NOT re-fetch on tab switch —
  it's explicitly excluded from both tab-switch and auto-refresh
  paths, loading only on explicit search-button click. Three
  distinct refresh paths now documented with source line hints.
  Per src/components/AirlineIntelPanel.ts ~:173 (setInterval),
  :287 (prices tab-switch guard), :291 (refresh() prices skip).
- world-clock: the WORLD_CITIES list has 30 entries, not '~22'.
  Replaced the approximate count with the exact number and a
  :14-43 line-range cite so it's re-verifiable.
2026-04-20 00:53:17 +04:00
Elie Habib
d1a4cf7780 docs(mintlify): add Route Explorer + Scenario Engine workflow pages (#3211)
* docs(mintlify): add Route Explorer + Scenario Engine workflow pages

Checkpoint for review on the IA refresh (per plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).

- docs/docs.json: link Country Resilience Index methodology under
  Intelligence & Analysis so the flagship 222-country feature is
  reachable from the main nav (previously orphaned). Add a new
  Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for,
  Cmd+K entry, four tabs (Current / Alternatives / Land / Impact),
  inputs, keyboard bindings, map-state integration, PRO gating
  with free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template
  categories (conflict / weather / sanctions / tariff_shock /
  infrastructure / pandemic), how a scenario activates on the map,
  PRO gating, pointers to the async job API.

Deferred to follow-up commits in the same PR:
  - documentation.mdx landing rewrite
  - features.mdx refresh
  - maritime-intelligence.mdx link-out to Route Explorer
  - Panels nav group (waits for PR 2 content)

All content grounded in live source files cited inline.

* docs(mintlify): fix Route Explorer + Scenario Engine review findings

Reviewer caught 4 cases where I described behavior I hadn't read
carefully. All fixes cross-checked against source.

- route-explorer (free-tier): the workflow does NOT blur a numeric
  payload behind a public demo route. On free tier, fetchLane()
  short-circuits to renderFreeGate() which blurs the left rail,
  replaces the tab area with an Upgrade-to-PRO card, and applies a
  generic public-route highlight on the map. No lane data is
  rendered in any tab. See src/components/RouteExplorer/
  RouteExplorer.ts:212 + :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the
  panel and the map. Direct field jumps are F (From), T (To), P
  (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp
  binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL,
  Esc close, ? help, 1-4 tabs). See src/components/RouteExplorer/
  KeyboardHelp.ts:9 and RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates
  of 4 types today (conflict, weather, sanctions, tariff_shock).
  The ScenarioType union includes infrastructure and pandemic but
  no templates of those types ship. Dropped them from the shipped
  table and noted the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes
  status: 'done' (not 'completed') on success, 'failed' on error;
  pending is synthesised by the status endpoint when no worker
  record exists. Fixed both the new workflow page and the merged
  api-scenarios.mdx completed-response example + polling language.
  See scripts/scenario-worker.mjs:421 and
  src/components/SupplyChainPanel.ts:870.

* docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)

- api-scenarios (template example): replaced invented
  hormuz-closure-30d / ["hormuz"] with the actually-shipped
  hormuz-tanker-blockade / ["hormuz_strait"] from scenario-
  templates.ts:80. Listed the other 5 shipped template IDs so
  scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): worker writes FOUR states,
  not three. Added the intermediate "processing" state with
  startedAt, written by the worker at job pickup (scenario-
  worker.mjs:411). Lifecycle now: pending → processing →
  done|failed. Both pending and processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state
  language and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate"
  with the actual × dismiss control on the scenario banner
  (aria-label: "Dismiss scenario") per
  src/components/SupplyChainPanel.ts:790. Also described the
  banner contents (name, chokepoints, countries, tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected
  "hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb"
  across all four occurrences in the shipping v2 page (from
  PR #3209). Real IDs come from server/_shared/chokepoint-
  registry.ts (snake_case, not kebab-case, not bare "hormuz").
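The 4-state lifecycle described above can be sketched as follows (a hedged sketch — state strings are verbatim from the commit, the helper names are assumptions):

```javascript
// Hedged sketch of the scenario job lifecycle, not the real worker code.
// Worker-written states: 'processing' (at job pickup), then 'done' | 'failed'.
// 'pending' is synthesised by the status endpoint when no worker record exists.
const TERMINAL_STATES = new Set(['done', 'failed']);

function jobStatus(workerRecord) {
  return workerRecord ? workerRecord.status : 'pending';
}

function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}
```

A poller would keep polling while `!isTerminal(jobStatus(record))`.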

* docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)

- scenario-engine: accurate description of the rendered scenario
  banner. Always-present elements are the ⚠ icon, scenario name,
  top-5 impacted countries with impact %, and dismiss ×. Params
  chip (e.g. '14d · +110% cost') and 'Simulating …' tagline are
  conditional on the worker result carrying template parameters
  (durationDays, disruptionPct, costShockMultiplier). The banner
  never lists affected chokepoints by name — the map and the
  chokepoint cards surface those. Per renderScenarioBanner at
  src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record
  and the owner-index set's 30-day TTL via atomic pipeline
  (SET + SADD + EXPIRE). rotate-secret and reactivate only
  extend the record's TTL — neither touches the owner-index set,
  so the owner index can expire independently if a caller only
  rotates/reactivates within a 30-day window. Re-register to keep
  both alive. Per api/v2/shipping/webhooks.ts:230 (register
  pipeline) and :325 (rotate setCachedJson on record only).

* docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)

- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin)
  + PRO' was wrong — isCallerPremium() explicitly skips
  trusted-origin short-circuits (keyCheck.required === false) and
  only counts (a) an env-valid or user-owned wm_-prefixed API key
  with apiAccess entitlement, or (b) a Clerk bearer with role=pro
  or Dodo tier ≥ 1. Browser calls work because premiumFetch()
  injects one of those credentials per request, not because Origin
  alone authenticates. Per server/_shared/premium-check.ts:34 and
  src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section
  to state outright that authentication and PRO entitlement are
  orthogonal, and that trusted Origin is NOT accepted as PRO even
  though it is accepted for public endpoints. Listed the two real
  credential forms that pass the gate.
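The orthogonality described above — trusted Origin authenticates but never entitles — can be sketched like so (field names are illustrative assumptions, not premium-check.ts's real shape):

```javascript
// Hedged sketch of the PRO entitlement gate described above.
function isCallerPremiumSketch({ apiKey, apiKeyValid, apiAccess, clerkRole, dodoTier, trustedOrigin }) {
  // A trusted browser Origin authenticates public endpoints, but is
  // deliberately NOT counted as PRO — it plays no part in this decision.
  void trustedOrigin;
  // Credential form (a): env-valid or user-owned wm_-prefixed API key
  // carrying the apiAccess entitlement.
  const keyOk =
    typeof apiKey === 'string' && apiKey.startsWith('wm_') &&
    apiKeyValid === true && apiAccess === true;
  // Credential form (b): Clerk bearer with role=pro, or Dodo tier >= 1.
  const bearerOk = clerkRole === 'pro' || (typeof dodoTier === 'number' && dodoTier >= 1);
  return Boolean(keyOk || bearerOk);
}
```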

* docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)

Greptile review P2: prose cited MapContainer.ts:1004 but activateScenario
is declared at :1010. Line 1004 landed inside the JSDoc block.

* docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out

Completes the PR 1 items from docs/plans/2026-04-19-001-feat-docs-user-
facing-ia-refresh-plan.md that were deferred after the checkpoint on
Route Explorer + Scenario Engine + CRI nav. No new pages — only edits
to existing pages to point at and cohere with the new workflow pages.

- documentation.mdx: landing rewrite. Dropped brittle counts (344
  news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed
  services) in favor of durable product framing. Surfaced the
  shipped differentiators that were invisible on the landing
  previously: Country Resilience Index (222 countries, linked to
  its methodology page), AI daily brief, Route Explorer,
  Scenario Engine, MCP server. Kept CII and CRI as two distinct
  country-risk surfaces — do not conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the
  stale inventory list with family-grouped descriptions that
  include the panels this audit surfaced as missing (disease-
  outbreaks, radiation-watch, thermal-escalation, consumer-prices,
  latest-brief, forecast, country-resilience). Added a Workflows
  section linking to Route Explorer and Scenario Engine, and a
  Country-level risk section linking CII + CRI. Untouched
  sections (map, marker clustering, data layers, export, monitors,
  activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer
  subsection to a one-paragraph pointer at /route-explorer so the
  standalone page is the canonical home.

Panels nav group remains intentionally unadded; it waits on PR 2
content to avoid rendering an empty group in Mintlify.
2026-04-19 18:39:36 +04:00
Elie Habib
1f66b0c486 fix(billing): wrap non-Error throws before Sentry.captureException (#3212)
* fix(billing): wrap non-Error throws before Sentry.captureException

Convex/Clerk bootstrap occasionally rejects with undefined, which
Sentry.captureException then serializes as a synthetic `Error: undefined`
with zero stack frames — impossible to debug. Normalize err to a real Error
carrying the non-Error value in the message so the next occurrence yields a
usable event.

Resolves WORLDMONITOR-ND.

* fix(billing): apply non-Error normalization to openBillingPortal too

Review feedback: initSubscriptionWatch was fixed but openBillingPortal
shares the same Convex/Clerk bootstrap helpers and the same raw
Sentry.captureException(err) pattern — the synthetic `Error: undefined`
signature can still surface from that path. Extract a module-level
normalizeCaughtError() helper and apply it at both catch sites.

* fix(billing): attach original err as cause on normalized Error

Greptile P2: preserve the raw thrown value as structured `cause` data so
Sentry can display it alongside the stringified message. Assigned
post-construction because tsconfig target=ES2020 lacks the ErrorOptions
typing for `new Error(msg, { cause })`; modern browsers and Sentry read
the property either way.
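The helper described across these three commits looks roughly like this (a hedged sketch — the exact message text is an assumption, the shape follows the commit):

```javascript
// Hedged sketch of normalizeCaughtError(): wrap non-Error throws so
// Sentry.captureException gets a real Error with stack frames.
function normalizeCaughtError(err) {
  if (err instanceof Error) return err;
  const normalized = new Error(`Non-Error thrown: ${String(err)}`);
  // Assigned post-construction: with target=ES2020 the ErrorOptions overload
  // for `new Error(msg, { cause })` isn't typed, but the property itself is
  // read by modern browsers and Sentry either way.
  normalized.cause = err;
  return normalized;
}
```

Usage: `Sentry.captureException(normalizeCaughtError(err))` at both catch sites.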
2026-04-19 17:06:19 +04:00
Elie Habib
4853645d53 fix(brief): switch carousel to @vercel/og on edge runtime (#3210)
* fix(brief): switch carousel to @vercel/og on edge runtime

Every attempt to ship the Phase 8 Telegram carousel on Vercel's
Node serverless runtime has failed at cold start:

- PR #3174 direct satori + @resvg/resvg-wasm: Vercel edge bundler
  refused the `?url` asset import required by resvg-wasm.
- PR #3174 (fix) direct satori + @resvg/resvg-js native binding:
  Node runtime accepted it, but Vercel's nft tracer does not follow
  @resvg/resvg-js/js-binding.js's conditional
  `require('@resvg/resvg-js-<platform>-<arch>-<libc>')` pattern,
  so the linux-x64-gnu peer package was never bundled. Cold start
  threw MODULE_NOT_FOUND, isolate crashed,
  FUNCTION_INVOCATION_FAILED on every request including OPTIONS,
  and Telegram reported WEBPAGE_CURL_FAILED with no other signal.
- PR #3204 added `vercel.json` `functions.includeFiles` to force
  the binding in, but (a) the initial key was a literal path that
  Vercel micromatch read as a character class (PR #3206 fixed),
  (b) even with the corrected `api/brief/carousel/**` wildcard, the
  function still 500'd across the board. The `functions.includeFiles`
  path appears honored in the deployment manifest but not at runtime
  for this particular native-binding pattern.

Fix: swap the renderer to @vercel/og's ImageResponse, which is
Vercel's first-party wrapper around satori + resvg-wasm with
Vercel-native bundling. Runs on Edge runtime — matches every other
API route in the project. No native binding, no includeFiles, no
nft tracing surprises. Cold start ~300ms, warm ~30ms.

Changes:
- server/_shared/brief-carousel-render.ts: replace renderCarouselPng
  (Uint8Array) with renderCarouselImageResponse (ImageResponse).
  Drop ensureLibs + satori + @resvg/resvg-js dynamic-import dance.
  Keep layout builders (buildCover/buildThreads/buildStory) and
  font loading unchanged — the Satori object trees are
  wire-compatible with ImageResponse.
- api/brief/carousel/[userId]/[issueDate]/[page].ts: flip
  `runtime: 'nodejs'` -> `runtime: 'edge'`. Delegate rendering to
  the renderer's ImageResponse and return it directly; error path
  still 503 no-store so CDN + Telegram don't pin a bad render.
- vercel.json: drop the now-useless `functions.includeFiles` block.
- package.json: drop direct `@resvg/resvg-js` and `satori` deps
  (both now bundled inside @vercel/og).
- tests/deploy-config.test.mjs: replace the native-binding
  regression guards with an assertion that no `functions` block
  exists (with a comment pointing at the skill documenting the
  micromatch gotcha for future routes).
- tests/brief-carousel.test.mjs: updated comment references.

Verified:
- typecheck + typecheck:api clean
- test:data 5814/5814 pass
- node -e test: @vercel/og imports cleanly in Node (tests that
  reach through the renderer file no longer depend on native
  bindings)

Post-deploy validation:
  curl -I -H "User-Agent: TelegramBot (like TwitterBot)" \
    "https://www.worldmonitor.app/api/brief/carousel/<uid>/<slot>/0"
  # Expect: HTTP/2 403 (no token) or 200 (valid token)
  # NOT:    HTTP/2 500 FUNCTION_INVOCATION_FAILED

Then tail Railway digest logs on the next tick; the
`[digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED` line
should stop appearing, and the 3-image preview should actually land
on Telegram.

* Add renderer smoke test + fix Cache-Control duplication

Reviewer flagged residual risk: no dedicated carousel-route smoke
test for the @vercel/og path. Adds one, and catches a real bug in
the process.

Findings during test-writing:

1. @vercel/og's ImageResponse runs CLEANLY in Node via tsx — the
   comment in brief-carousel.test.mjs saying "we can't test the
   render in Node" was true for direct satori + @resvg/resvg-wasm
   but no longer holds after PR #3210. Pure Node render works
   end-to-end: satori tree-parse, jsdelivr font fetch, resvg-wasm
   init, PNG output. ~850ms first call, ~20ms warm.

2. ImageResponse sets its own default
   `Cache-Control: public, immutable, no-transform, max-age=31536000`.
   Passing Cache-Control via the constructor's headers option
   APPENDS rather than overrides, producing a duplicated
   comma-joined value like
   `public, immutable, no-transform, max-age=31536000, public, max-age=60`
   on the Response. The route handler was doing exactly this via
   extraHeaders. Fix: drop our Cache-Control override and rely on
   @vercel/og's 1-year immutable default — envelope is only
   immutable for its 7d Redis TTL so the effective ceiling is 7d
   anyway (after that the route 404s before render).
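The append-vs-override gotcha can be demonstrated with the standard Headers class (Node 18+ / browsers) — this models the observed behavior, not @vercel/og's internals:

```javascript
// Headers.append() joins repeated values with ", " instead of replacing,
// which is what produced the duplicated Cache-Control on the Response.
const headers = new Headers({
  'Cache-Control': 'public, immutable, no-transform, max-age=31536000', // renderer default
});
headers.append('Cache-Control', 'public, max-age=60'); // "override" actually appends
const duplicated = headers.get('Cache-Control'); // comma-joined duplicate value

// An explicit replace would need set(), but the fix in this commit is simpler:
// stop passing Cache-Control at all and keep the renderer's default.
headers.set('Cache-Control', 'public, immutable, no-transform, max-age=31536000');
```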

Changes:

- tests/brief-carousel.test.mjs: 6 new assertions under
  `renderCarouselImageResponse`:
    * renders cover / threads / story pages, each returning a
      valid PNG (magic bytes + size range)
    * rejects a structurally empty envelope
    * threads non-cache extraHeaders onto the Response
    * pins @vercel/og's Cache-Control default so it survives
      caller-supplied Cache-Control overrides (regression guard
      for the bug fixed in this commit)
- api/brief/carousel/[userId]/[issueDate]/[page].ts: remove the
  stacked Cache-Control; lean on @vercel/og default. Drop the now-
  unused `PAGE_CACHE_TTL` constant. Comment explains why.

Verified:
- test:data 5820/5820 pass (was 5814, +6 smoke)
- typecheck + typecheck:api clean
- Render smoke: cover 825ms / threads 23ms / story 16ms first run
  (wasm init dominates first render)
2026-04-19 15:18:12 +04:00
Elie Habib
e4c95ad9be docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage

Audit against api/ + proto/ revealed 9 OpenAPI specs missing from nav,
the scenario/v1 service undocumented, and MCP (32 tools + OAuth 2.1 flow)
with no user-facing docs. The stale Docs_To_Review/API_REFERENCE.md still
pointed at pre-migration endpoints that no longer exist.

- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast, Health,
  Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's proto-backed
  (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web, Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications, Shipping v2,
  Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md
  (referenced dead endpoints); add README flagging dir as archival

* docs(mintlify): move scenario docs out of generated docs/api/ tree

The pre-push hook enforces that docs/api/ is proto-generated only.
Replace the hand-written ScenarioService.openapi.yaml with a plain
MDX page (docs/api-scenarios.mdx) until the proto migration lands
(tracked in issue #3207).

* docs(mintlify): fix factual errors flagged in PR review

Reviewer caught 5 endpoints where I speculated on shape/method/limits
instead of reading the code. All fixes cross-checked against the
source:

- api-shipping-v2: route-intelligence is GET with query params
  (fromIso2, toIso2, cargoType, hs2), not POST with a JSON body.
  Response shape is {primaryRouteId, chokepointExposures[],
  bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt,
  cachedUntil, priceSource} with tier groups free|pro|api_starter|
  enterprise, not the invented {currency, plans}. Document the
  DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk
  JWT + PRO (returning {oauthUrl}), not GET redirects. Callbacks
  remain GET.
- api-platform: /api/version returns the latest GitHub Release
  ({version, tag, url, prerelease}), not deployed commit/build
  metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match
  code), not 10/hour.

Also caught while double-checking: /api/register-interest and
/api/contact are 5/60min and 3/60min respectively (1-hour window,
not 1-minute). Both require Turnstile. Removed the fabricated
limits for share-url, notification-channels, create-checkout
(they fall back to the default per-IP limit).

* docs(mintlify): second-round fixes — verify every claim against source

Reviewer caught 7 more cases where I described API behavior I hadn't
read. Each fix below cross-checked against the handler.

- api-commerce (product-catalog): tiers are flat objects with
  monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid
  tiers, price+period for free, price:null for enterprise. There is
  no nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts.
  Code is a deterministic 8-char HMAC of the Clerk userId; binding
  into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): actual action set is
  create-pairing-token, set-channel, set-web-push, delete-channel,
  set-alert-rules, set-quiet-hours, set-digest-settings. Replaced
  the made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100
  (default 50), not a severity string. Subscriber IDs are wh_+24hex;
  secret is raw 64-char hex (no whsec_ prefix). POST registration
  returns 201. Added the management routes: GET /{id},
  POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer
  RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[]
  and/or patterns[] (not {key} or {tag}), with explicit per-request
  caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not
  file=<id>. Response is a 302 to a GitHub release asset; documented
  the full platform/variant tables.
- mcp: server also accepts direct X-WorldMonitor-Key in addition to
  OAuth bearer. Fixed the curl example which was incorrectly sending
  a wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or videoId,
  not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect
  that OAuth is one of two accepted modes.

* docs(mintlify): fix Greptile review findings

- mcp.mdx: 'Five' slow tools → 'Six' (list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator
  (8_400_000_000) with plain integer (8400000000)

Greptile's third finding — /api/oauth/register rate-limit contradiction
across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx — was already
resolved in commit 4f2600b2a (reviewed commit was eb5654647).
2026-04-19 15:03:16 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
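The slot format can be sketched as follows (a hedged sketch — the real helper is issueSlotInTz in the composer; this reimplementation only illustrates the documented YYYY-MM-DD-HHMM shape):

```javascript
// Hedged sketch: build a per-run slot in the user's local tz.
function issueSlotSketch(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-CA', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}

// envelope.data.date keeps only the date portion for rendering:
function slotDate(slot) {
  return slot.slice(0, 10);
}
```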

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
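The fixed allowlist match described above behaves roughly like this (the real BRIEF_CAROUSEL_PATH_RE lives in middleware.ts; this pattern is an assumption reconstructed from the documented URL shape):

```javascript
// Hedged sketch: accept only slot-format (YYYY-MM-DD-HHMM) carousel paths,
// rejecting the pre-slot YYYY-MM-DD shape so legacy links can't leak through.
const CAROUSEL_SLOT_PATH_RE =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/\d+$/;

function isAllowlistedCarouselPath(pathname) {
  return CAROUSEL_SLOT_PATH_RE.test(pathname);
}
```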

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
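The corrected ?slot= contract in bug 1 can be sketched as follows (response shapes are assumptions for illustration, not the handler's exact fields):

```javascript
// Hedged sketch of the latest-brief status response described above.
function latestBriefStatus({ requestedSlot, envelope, todayUtc }) {
  if (envelope) return { status: 'ready', envelope };
  if (requestedSlot) {
    // Echo the caller's slot back instead of an unrelated "today" signal.
    return {
      status: 'composing',
      issueSlot: requestedSlot,
      issueDate: requestedSlot.slice(0, 10), // date portion of the slot
    };
  }
  // No-param path keeps the UTC-today placeholder.
  return { status: 'composing', issueDate: todayUtc };
}
```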
2026-04-19 14:15:59 +04:00
Elie Habib
56054bfbc1 fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key

PR #3204 shipped the right `includeFiles` value but the WRONG key:

  "api/brief/carousel/[userId]/[issueDate]/[page].ts"

Vercel's `functions` config keys are micromatch globs, not literal
paths. Bracketed segments like `[userId]` are parsed as character
classes (match any ONE character from {u,s,e,r,I,d}), so my rule
matched zero files and `includeFiles` was silently ignored. Post-
merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED on
every request. Build log shows zero mentions of `carousel` or
`resvg` — corroborates the key never applied.

Fix: wildcard path segments.

  "api/brief/carousel/**"

Matches any file under the carousel route dir. Since the only
deployed file there is the dynamic-segment handler, the effective
scope is identical to what I originally intended.

Added a second regression test that sweeps every functions key and
fails loudly if any bracketed segment slips back in. Guards against
future reverts AND against anyone copy-pasting the literal route
path without realising Vercel reads it as a glob.

23/23 deploy-config tests pass (was 22, +1 new guard).

* Address Greptile P2: widen bracket-literal guard regex

Greptile spotted that `/\[[A-Za-z]+\]/` only matches purely-alphabetic
segment names. Real-world Next.js routes often use `[user_id]`,
`[issue_date]`, `[page1]`, `[slug2024]` — none flagged by the old
regex, so the guard would silently pass on the exact kind of
regression it was written to catch.

Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
  - requires a leading letter (so legit char classes like `[0-9]`
    and `[!abc]` don't false-positive)
  - allows letters, digits, underscores after the first char
  - covers every Next.js-style dynamic-segment name convention

Also added a self-test that pins positive cases (userId, user_id,
issue_date, page1, slug2024) and negative cases (the actual `**`
glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex
breaks CI immediately instead of silently re-opening PR #3206.

24/24 deploy-config tests pass (was 23, +1 new self-test).
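The widened guard, with the commit's own positive and negative cases (the test-harness wiring around it is assumed):

```javascript
// Guard regex from this commit: flag bracketed dynamic-segment literals in
// vercel.json functions keys, where micromatch would read them as char classes.
// Requires a leading letter so legit char classes like [0-9] and [!abc] pass.
const BRACKET_LITERAL_RE = /\[[A-Za-z][A-Za-z0-9_]*\]/;

function hasBracketLiteralSegment(functionsKey) {
  return BRACKET_LITERAL_RE.test(functionsKey);
}
```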
2026-04-19 14:02:30 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line
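
The commit lists a cosineSimilarity export on brief-embedding.mjs; a minimal sketch of such a helper might look like this (the guard clauses are assumptions, not the module's actual code):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // degenerate vectors never match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1 (identical direction)
cosineSimilarity([1, 0], [0, 1]); // 0 (orthogonal)
```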

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
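
The negative-budget guard might be sketched like this — callEmbeddingsApi's real signature and the fetch wiring are assumptions; only the throw-before-fetch semantics come from the commit:

```javascript
class EmbeddingTimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = "EmbeddingTimeoutError";
  }
}

async function callEmbeddingsApi(texts, { timeoutMs, fetchImpl }) {
  // Before: a Math.max(250, ...) floor opened a doomed 250ms fetch and let
  // the wall-clock cap overshoot. Now a spent budget fails fast instead.
  if (timeoutMs <= 0) {
    throw new EmbeddingTimeoutError(`embedding budget exhausted (${timeoutMs}ms)`);
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl("/embeddings", {
      method: "POST",
      body: JSON.stringify({ input: texts }),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}
```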

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so
   later instrumentation / Edge-runtime shims don't see the wrapper —
   same class of bug as the banned `fetch.bind(globalThis)` pattern
   flagged in AGENTS.md.
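
The binding-capture difference the fix addresses can be demonstrated in a few lines (the shim here is simulated, not the actual Edge-runtime instrumentation):

```javascript
// Baseline fetch present at module-load time.
globalThis.fetch = async () => "original";

const aliased = globalThis.fetch;                       // binding captured NOW
const wrapped = (...args) => globalThis.fetch(...args); // resolved per call

// Instrumentation / runtime shim installed AFTER module load:
globalThis.fetch = async () => "instrumented";

await aliased(); // "original"     — the shim never sees this call
await wrapped(); // "instrumented" — the wrapper picks up the new binding
```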

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.
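
The fail-closed check might be sketched like this; the Upstash pipeline response shape (an array of `{result}`/`{error}` cells, or null on transport failure) is inferred from the commit text, not copied from the repo:

```javascript
// Classify a redis-pipeline response instead of treating any non-throw
// as success (the old fail-open behaviour).
function checkPipelineResult(result) {
  if (result === null) return { ok: false, reason: "pipeline_null" };
  if (!Array.isArray(result)) return { ok: false, reason: "non_array_response" };
  for (const cell of result) {
    if (cell && cell.error) return { ok: false, reason: "command_error" };
    if (!cell || cell.result !== "OK") return { ok: false, reason: "unexpected_cell" };
  }
  return { ok: true };
}

checkPipelineResult(null);                  // { ok: false, reason: "pipeline_null" }
checkPipelineResult([{ result: "OK" }]);    // { ok: true }
checkPipelineResult([{ error: "NOAUTH" }]); // { ok: false, reason: "command_error" }
```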

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.
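
The order-sensitive fix might be sketched as follows — the archive shape (`{pairs: [{key, agreed}]}`) is an assumption; only the filter-before-dedup ordering comes from the commit:

```javascript
// Enumerate pairs across SCAN batches. The mode filter runs BEFORE the
// dedup check, so an agreeing occurrence scanned first can no longer
// permanently shadow a later disagreeing occurrence of the same pair.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const archive of archives) {
    for (const pair of archive.pairs) {
      // Old bug: seen.add(pair.key) ran before this line.
      if (mode === "disagreements" && pair.agreed) continue;
      if (seen.has(pair.key)) continue;
      seen.add(pair.key);
      out.push(pair);
    }
  }
  return out;
}

// Agreement-then-disagreement across batches: the disagreeing
// occurrence still claims the dedup slot.
const batches = [
  { pairs: [{ key: "a|b", agreed: true }] },
  { pairs: [{ key: "a|b", agreed: false }] },
];
enumeratePairs(batches, "disagreements").length; // 1
```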

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
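
The greedy longest-phrase scan might look roughly like this — the gazetteer below is a tiny illustrative subset and every name besides LOCATION_GAZETTEER and MAX_PHRASE_LEN is assumed:

```javascript
// Illustrative subset; the real gazetteer's longest entry is 4 tokens.
const LOCATION_GAZETTEER = new Set([
  "red sea", "south china sea", "strait of hormuz", "abu dhabi",
  "middle east", "hormuz",
]);
const MAX_PHRASE_LEN = 4;

const isCapitalised = (token) => /^[A-Z]/.test(token);

function extractLocations(headline) {
  const raw = headline.split(/\s+/).filter(Boolean);
  const lower = raw.map((t) => t.toLowerCase().replace(/[^a-z]/g, ""));
  const found = [];
  let i = 0;
  while (i < lower.length) {
    let consumed = 0;
    // Longest phrase first, down to 2 tokens. First AND last tokens must
    // be capitalised, so headline "Strait of Hormuz" matches (lowercase
    // connector in the middle is fine) while prose "the middle east" doesn't.
    for (let len = Math.min(MAX_PHRASE_LEN, lower.length - i); len >= 2; len--) {
      const phrase = lower.slice(i, i + len).join(" ");
      if (
        LOCATION_GAZETTEER.has(phrase) &&
        isCapitalised(raw[i]) &&
        isCapitalised(raw[i + len - 1])
      ) {
        found.push(phrase);
        consumed = len;
        break;
      }
    }
    if (!consumed) {
      // Fall back to single-token lookup when no multi-word phrase fits.
      if (LOCATION_GAZETTEER.has(lower[i]) && isCapitalised(raw[i])) {
        found.push(lower[i]);
      }
      consumed = 1;
    }
    i += consumed;
  }
  return found;
}

extractLocations("Houthis strike ship in Red Sea");   // ["red sea"]
extractLocations("Tanker transits Strait of Hormuz"); // ["strait of hormuz"]
extractLocations("talks about the middle east");      // []
```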

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00
Elie Habib
27849fee1e fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn

Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED
since PR #3174 merged. Not middleware (last PR fixed that
theoretical path but not the observed failure). The Vercel
function itself crashes HTTP 500 FUNCTION_INVOCATION_FAILED on
every request including OPTIONS - the isolate can't initialise.

The handler imports brief-carousel-render which lazy-imports
@resvg/resvg-js. That package's js-binding.js does runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel
Lambda (Amazon Linux 2 glibc) that resolves to
@resvg/resvg-js-linux-x64-gnu. Vercel nft tracing does NOT
follow this conditional require so the optional peer package
isnt bundled. Cold start throws MODULE_NOT_FOUND, isolate
crashes, Vercel returns FUNCTION_INVOCATION_FAILED, Telegram
reports WEBPAGE_CURL_FAILED.

Fix: vercel.json functions.includeFiles forces linux-x64-gnu
binding into the carousel functions bundle. Only this route
needs it; every other api route is unaffected.
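
A functions entry of roughly this shape would force the binding into the bundle — the exact route key and includeFiles glob are assumptions, not the repo's actual vercel.json:

```json
{
  "functions": {
    "api/brief/carousel/**": {
      "includeFiles": "node_modules/@resvg/resvg-js-linux-x64-gnu/**"
    }
  }
}
```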

Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime
  binary on Amazon Linux 2 glibc

Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; next cron tick should clear the Railway
[digest] Telegram carousel 400 line.

* Address Greptile P2s: regression guard + arch-assumption reasoning

Two P2 findings on PR #3204:

P2 #1 (inline on vercel.json:6): Platform architecture assumption
undocumented. If Vercel migrates to Graviton/arm64 Lambda the
cold-start crash silently returns. vercel.json is strict JSON so
comments aren't possible inline.

P2 #2 (tests/deploy-config.test.mjs:17): No regression guard for
the carousel includeFiles rule. A future vercel.json tidy-up
could silently revert the fix with no CI signal.

Fixed both in a single block:

- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.

tests 22/22 pass (was 21, +1 new).
2026-04-19 13:36:17 +04:00
Elie Habib
45f02fed00 fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError

OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid re-tap),
leaking as an unhandled NotFoundError. OrbitControls is bundled into
main-*.js so hasFirstParty=true; matched by the unique setPointerCapture
message (grep confirms no first-party setPointerCapture usage).

Resolves WORLDMONITOR-NC.

* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack

Review feedback: suppressing by message alone would hide a future first-party
setPointerCapture regression. Mirror the existing OrbitControls filter's
provenance check — require absence of any source-mapped .ts/.tsx frame so the
filter only matches stacks whose only non-infra frame is the bundled main chunk.

Adds positive + negative regression tests for the pair.

* fix(sentry): gate OrbitControls filter on positive three.js context signature

Review feedback: absence of .ts/.tsx frames is not proof of third-party origin
because production stacks are often unsymbolicated. Replace the negative-only
gate with a positive OrbitControls signature — require a frame whose context
slice contains the literal `_pointers … setPointerCapture` adjacency unique to
three.js OrbitControls. Update tests to cover the production-realistic case
(unsymbolicated first-party bundle frame calling setPointerCapture must still
reach Sentry) plus a defensive no-context fallthrough.
2026-04-19 13:15:31 +04:00
Elie Habib
d7f87754f0 fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged that two transactional email templates still claimed '22 services' while /pro now advertises '30+':

- api/register-interest.js:90 — interest-registration confirmation email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription confirmation email ('22 services, one API key')

A user signing up via /pro would read '30+ services' on the page, then receive an email saying '22'. Both updated to '30+', matching the /pro page and the actual server domain count (31 in server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
2026-04-19 13:15:17 +04:00
Elie Habib
135082d84f fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)

The /pro page advertised '22 services' / '22 service domains' but server/worldmonitor/, proto/worldmonitor/, and src/generated/server/worldmonitor/ all have 31 domain dirs (aviation, climate, conflict, consumer-prices, cyber, displacement, economic, forecast, giving, health, imagery, infrastructure, intelligence, maritime, market, military, natural, news, positive-events, prediction, radiation, research, resilience, sanctions, seismology, supply-chain, thermal, trade, unrest, webcam, wildfire). api/scenario/v1/ adds a 32nd recently shipped surface.

Used '30+' rather than the literal '31' so the page doesn't drift again every time a new domain ships (the '22' was probably accurate at one point too).

168 string substitutions across all 21 locale JSON files (8 keys each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc, pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey, apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta description, og:description, twitter:description, SoftwareApplication ld+json description + Pro Monthly offer, FAQ ld+json a8, noscript fallback). Bundle rebuilt.

* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
2026-04-19 13:07:07 +04:00