Sebastien Melki
58e42aadf9 chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)
* chore(api): enforce sebuf contract via exceptions manifest (#3207)

Adds api/api-route-exceptions.json as the single source of truth for
non-proto /api/ endpoints, with scripts/enforce-sebuf-api-contract.mjs
gating every PR via npm run lint:api-contract. Fixes the root-only blind
spot in the prior allowlist (tests/edge-functions.test.mjs), which only
scanned top-level *.js files and missed nested paths and .ts endpoints —
the gap that let api/supply-chain/v1/country-products.ts and friends
drift under proto domain URL prefixes unchallenged.

Checks both directions: every api/<domain>/v<N>/[rpc].ts must pair with
a generated service_server.ts (so a deleted proto fails CI), and every
generated service must have an HTTP gateway (no orphaned generated code).
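
A minimal sketch of that two-directional check — the function name, inputs, and message strings are illustrative; the real enforce-sebuf-api-contract.mjs also scans the filesystem and consults the exceptions manifest:

```typescript
// Sketch only: pairs gateway files with generated services in both directions.
function findContractViolations(
  gatewayFiles: string[],      // e.g. 'api/maritime/v1/get-vessel-snapshot.ts'
  generatedServices: string[], // e.g. 'maritime/v1' (has a service_server.ts)
): string[] {
  const violations: string[] = [];
  const services = new Set(generatedServices);
  const covered = new Set<string>();
  for (const file of gatewayFiles) {
    const m = file.match(/^api\/([a-z-]+)\/(v\d+)\//);
    if (!m) continue;
    const svc = `${m[1]}/${m[2]}`;
    covered.add(svc);
    // Direction 1: a gateway whose proto service was deleted fails CI.
    if (!services.has(svc)) violations.push(`gateway without generated service: ${file}`);
  }
  // Direction 2: generated code with no HTTP gateway is orphaned.
  for (const svc of generatedServices) {
    if (!covered.has(svc)) violations.push(`generated service without gateway: ${svc}`);
  }
  return violations;
}
```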

Manifest entries require category + reason + owner, with removal_issue
mandatory for temporary categories (deferred, migration-pending) and
forbidden for permanent ones. .github/CODEOWNERS pins the manifest to
@SebastienMelki so new exceptions don't slip through review.
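
The entry rules sketched as a validator — the temporary category names are quoted from above; the `permanent` label and any structure beyond category/reason/owner/removal_issue are assumptions:

```typescript
// Sketch of the api-route-exceptions.json entry rules described above.
// 'permanent' as a literal category name is an assumption.
type ExceptionCategory = 'deferred' | 'migration-pending' | 'permanent';

interface RouteException {
  category: ExceptionCategory;
  reason: string;
  owner: string;
  removal_issue?: string;
}

const TEMPORARY = new Set<ExceptionCategory>(['deferred', 'migration-pending']);

function validateException(path: string, e: RouteException): string[] {
  const errors: string[] = [];
  if (!e.reason) errors.push(`${path}: reason is required`);
  if (!e.owner) errors.push(`${path}: owner is required`);
  if (TEMPORARY.has(e.category) && !e.removal_issue) {
    errors.push(`${path}: removal_issue is mandatory for ${e.category}`);
  }
  if (!TEMPORARY.has(e.category) && e.removal_issue) {
    errors.push(`${path}: removal_issue is forbidden for ${e.category}`);
  }
  return errors;
}
```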

The manifest only shrinks: migration-pending entries (19 today) will be
removed as subsequent commits in this PR land each migration.

* refactor(maritime): migrate /api/ais-snapshot → maritime/v1.GetVesselSnapshot (#3207)

The proto VesselSnapshot was carrying density + disruptions but the frontend
also needed sequence, relay status, and candidate_reports to drive the
position-callback system. Those only lived on the raw relay passthrough, so
the client had to keep hitting /api/ais-snapshot whenever callbacks were
registered and fall back to the proto RPC only when the relay URL was gone.

This commit pushes all three missing fields through the proto contract and
collapses the dual-fetch-path into one proto client call.

Proto changes (proto/worldmonitor/maritime/v1/):
  - VesselSnapshot gains sequence, status, candidate_reports.
  - GetVesselSnapshotRequest gains include_candidates (query: include_candidates).

Handler (server/worldmonitor/maritime/v1/get-vessel-snapshot.ts):
  - Forwards include_candidates to ?candidates=... on the relay.
  - Separate 5-min in-memory caches for the candidates=on and candidates=off
    variants; they have very different payload sizes and should not share a slot.
  - Per-request in-flight dedup preserved per-variant.

Frontend (src/services/maritime/index.ts):
  - fetchSnapshotPayload now calls MaritimeServiceClient.getVesselSnapshot
    directly with includeCandidates threaded through. The raw-relay path,
    SNAPSHOT_PROXY_URL, DIRECT_RAILWAY_SNAPSHOT_URL and LOCAL_SNAPSHOT_FALLBACK
    are gone — production already routed via Vercel, the "direct" branch only
    ever fired on localhost, and the proto gateway covers both.
  - New toLegacyCandidateReport helper mirrors toDensityZone/toDisruptionEvent.

api/ais-snapshot.js deleted; manifest entry removed. Codegen was deliberately
scoped to worldmonitor.maritime.v1 (buf generate --path) — regenerating the
full tree drops // @ts-nocheck from every client/server file and surfaces
pre-existing type errors across 30+ unrelated services, which is not in
scope for this PR.

Shape-diff vs legacy payload:
  - disruptions / density: proto carries the same fields, just with the
    GeoCoordinates wrapper and enum strings (remapped client-side via
    existing toDisruptionEvent / toDensityZone helpers).
  - sequence, status.{connected,vessels,messages}: now populated from the
    proto response — was hardcoded to 0/false in the prior proto fallback.
  - candidateReports: same shape; optional numeric fields come through as
    0 instead of undefined, which the legacy consumer already handled.

* refactor(sanctions): migrate /api/sanctions-entity-search → LookupSanctionEntity (#3207)

The proto docstring already claimed "OFAC + OpenSanctions" coverage but the
handler only fuzzy-matched a local OFAC Redis index — narrower than the
legacy /api/sanctions-entity-search, which proxied OpenSanctions live (the
source advertised in docs/api-proxies.mdx). Deleting the legacy without
expanding the handler would have been a silent coverage regression for
external consumers.

Handler changes (server/worldmonitor/sanctions/v1/lookup-entity.ts):
  - Primary path: live search against api.opensanctions.org/search/default
    with an 8s timeout and the same User-Agent the legacy edge fn used.
  - Fallback path: the existing OFAC local fuzzy match, kept intact for when
    OpenSanctions is unreachable / rate-limiting.
  - Response source field flips between 'opensanctions' (happy path) and
    'ofac' (fallback) so clients can tell which index answered.
  - Query validation tightened: rejects q > 200 chars (matches legacy cap).
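
The primary/fallback flow as a sketch — function names, the result type, and the error handling are assumptions; the 8s timeout and 200-char cap are from the notes above:

```typescript
// Illustrative OpenSanctions-primary / OFAC-fallback lookup.
interface LookupResult {
  source: 'opensanctions' | 'ofac';
  entities: unknown[];
}

async function lookupEntity(
  q: string,
  searchOpenSanctions: (q: string, signal: AbortSignal) => Promise<unknown[]>,
  searchOfacLocal: (q: string) => Promise<unknown[]>,
): Promise<LookupResult> {
  if (q.length > 200) throw new Error('query too long'); // legacy 200-char cap
  try {
    // Primary: live search, bounded by an 8s timeout.
    const entities = await searchOpenSanctions(q, AbortSignal.timeout(8000));
    return { source: 'opensanctions', entities };
  } catch {
    // Fallback: local OFAC fuzzy match when OpenSanctions is unreachable.
    return { source: 'ofac', entities: await searchOfacLocal(q) };
  }
}
```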

Rate limiting:
  - Added /api/sanctions/v1/lookup-entity to ENDPOINT_RATE_POLICIES at 30/min
    per IP — matches the legacy createIpRateLimiter budget. The gateway
    already enforces per-endpoint policies via checkEndpointRateLimit.

Docs:
  - docs/api-proxies.mdx — dropped the /api/sanctions-entity-search row
    (plus the orphaned /api/ais-snapshot row left over from the previous
    commit in this PR).
  - docs/panels/sanctions-pressure.mdx — points at the new RPC URL and
    describes the OpenSanctions-primary / OFAC-fallback semantics.

api/sanctions-entity-search.js deleted; manifest entry removed.

* refactor(military): migrate /api/military-flights → ListMilitaryFlights (#3207)

Legacy /api/military-flights read a pre-baked Redis blob written by the
seed-military-flights cron and returned flights in a flat app-friendly
shape (lat/lon, lowercase enums, lastSeenMs). The proto RPC takes a bbox,
fetches OpenSky live, classifies server-side, and returns nested
GeoCoordinates + MILITARY_*_TYPE_* enum strings + lastSeenAt — same data,
different contract.

fetchFromRedis in src/services/military-flights.ts had nothing sebuf-aware
about it. Renamed it to fetchViaProto and rewrote it to:

  - Instantiate MilitaryServiceClient against getRpcBaseUrl().
  - Iterate MILITARY_QUERY_REGIONS (PACIFIC + WESTERN) in parallel — same
    regions the desktop OpenSky path and the seed cron already use, so
    dashboard coverage tracks the analytic pipeline.
  - Dedup by hexCode across regions.
  - Map proto → app shape via new mapProtoFlight helper plus three reverse
    enum maps (AIRCRAFT_TYPE_REVERSE, OPERATOR_REVERSE, CONFIDENCE_REVERSE).

The seed cron (scripts/seed-military-flights.mjs) stays put: it feeds
regional-snapshot mobility, cross-source signals, correlation, and the
health freshness check (api/health.js: 'military:flights:v1'). None of
those read the legacy HTTP endpoint; they read the Redis key directly.
The proto handler uses its own per-bbox cache keys under the same prefix,
so dashboard traffic no longer races the seed cron's blob — the two paths
diverge by a small refresh lag, which is acceptable.

Docs: dropped the /api/military-flights row from docs/api-proxies.mdx.

api/military-flights.js deleted; manifest entry removed.

Shape-diff vs legacy:
  - f.location.{latitude,longitude} → f.lat, f.lon
  - f.aircraftType: MILITARY_AIRCRAFT_TYPE_TANKER → 'tanker' via reverse map
  - f.operator: MILITARY_OPERATOR_USAF → 'usaf' via reverse map
  - f.confidence: MILITARY_CONFIDENCE_LOW → 'low' via reverse map
  - f.lastSeenAt (number) → f.lastSeen (Date)
  - f.enrichment → f.enriched (with field renames)
  - Extra fields registration / aircraftModel / origin / destination /
    firstSeenAt now flow through where proto populates them.
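
A mapProtoFlight sketch assembled from the shape-diff above; the reverse maps carry only the listed example entries and the types are illustrative:

```typescript
// Reverse maps: proto enum strings → the app's lowercase values.
// Only the example entries from the shape-diff are shown.
const AIRCRAFT_TYPE_REVERSE: Record<string, string> = { MILITARY_AIRCRAFT_TYPE_TANKER: 'tanker' };
const OPERATOR_REVERSE: Record<string, string> = { MILITARY_OPERATOR_USAF: 'usaf' };
const CONFIDENCE_REVERSE: Record<string, string> = { MILITARY_CONFIDENCE_LOW: 'low' };

interface ProtoFlight {
  hexCode: string;
  location: { latitude: number; longitude: number };
  aircraftType: string;
  operator: string;
  confidence: string;
  lastSeenAt: number; // epoch ms
}

function mapProtoFlight(f: ProtoFlight) {
  return {
    hexCode: f.hexCode,
    lat: f.location.latitude,
    lon: f.location.longitude,
    aircraftType: AIRCRAFT_TYPE_REVERSE[f.aircraftType],
    operator: OPERATOR_REVERSE[f.operator],
    confidence: CONFIDENCE_REVERSE[f.confidence],
    lastSeen: new Date(f.lastSeenAt),
  };
}
```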

* fix(supply-chain): thread includeCandidates through chokepoint status (#3207)

Caught by the tsconfig.api.json typecheck in the pre-push hook (the plain
tsc --noEmit run before the ais-snapshot push didn't cover it). The
chokepoint status handler calls getVesselSnapshot internally with a static
no-auth request, which must now supply the includeCandidates bool added by
the proto extension.

It passes false: server-internal callers don't need per-vessel reports.

* test(maritime): update getVesselSnapshot cache assertions (#3207)

The ais-snapshot migration replaced the single cachedSnapshot/cacheTimestamp
pair with a per-variant cache so candidates-on and candidates-off payloads
don't evict each other. The pre-push hook surfaced that tests/server-handlers
still asserted the old variable names. Rewrote the assertions to match the
new shape while preserving the invariants they actually guard:

  - Freshness check against slot TTL.
  - Cache read before relay call.
  - Per-slot in-flight dedup.
  - Stale-serve on relay failure (result ?? slot.snapshot).

* chore(proto): restore // @ts-nocheck on regenerated maritime files (#3207)

I ran 'buf generate --path worldmonitor/maritime/v1' to scope the proto
regen to the one service I was changing (to avoid the toolchain drift
that drops @ts-nocheck from 60+ unrelated files — separate issue). But
the repo convention is the 'make generate' target, which runs buf and
then sed-prepends '// @ts-nocheck' to every generated .ts file. My
scoped command skipped the sed step. The proto-check CI enforces the
sed output, so the two maritime files need the directive restored.

* refactor(enrichment): decomm /api/enrichment/{company,signals} legacy edge fns (#3207)

Both endpoints were already ported to IntelligenceService:
  - getCompanyEnrichment  (/api/intelligence/v1/get-company-enrichment)
  - listCompanySignals    (/api/intelligence/v1/list-company-signals)

No frontend callers of the legacy /api/enrichment/* paths exist. Removes:
  - api/enrichment/company.js, signals.js, _domain.js
  - api-route-exceptions.json migration-pending entries (58 remain)
  - docs/api-proxies.mdx rows for /api/enrichment/{company,signals}
  - docs/architecture.mdx reference updated to the IntelligenceService RPCs

Verified: typecheck, typecheck:api, lint:api-contract (89 files / 58 entries),
lint:boundaries, tests/edge-functions.test.mjs (136 pass),
tests/enrichment-caching.test.mjs (14 pass — still guards the intelligence/v1
handlers), make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(leads): migrate /api/{contact,register-interest} → LeadsService (#3207)

New leads/v1 sebuf service with two POST RPCs:
  - SubmitContact    → /api/leads/v1/submit-contact
  - RegisterInterest → /api/leads/v1/register-interest

Handler logic ported 1:1 from api/contact.js + api/register-interest.js:
  - Turnstile verification (desktop sources bypass, preserved)
  - Honeypot (website field) silently accepts without upstream calls
  - Free-email-domain gate on SubmitContact (422 ApiError)
  - validateEmail (disposable/offensive/typo-TLD/MX) on RegisterInterest
  - Convex writes via ConvexHttpClient (contactMessages:submit, registerInterest:register)
  - Resend notification + confirmation emails (HTML templates unchanged)

Shared helpers moved to server/_shared/:
  - turnstile.ts (getClientIp + verifyTurnstile)
  - email-validation.ts (disposable/offensive/MX checks)

Rate limits preserved via ENDPOINT_RATE_POLICIES:
  - submit-contact:    3/hour per IP (was in-memory 3/hr)
  - register-interest: 5/hour per IP (was in-memory 5/hr; desktop
    sources previously capped at 2/hr via shared in-memory map —
    now 5/hr like everyone else, accepting the small regression in
    exchange for Upstash-backed global limiting)

Callers updated:
  - pro-test/src/App.tsx contact form → new submit-contact path
  - src-tauri/sidecar/local-api-server.mjs cloud-fallback rewrites
    /api/register-interest → /api/leads/v1/register-interest when
    proxying; keeps local path for older desktop builds
  - src/services/runtime.ts isKeyFreeApiTarget allows both old and
    new paths through the WORLDMONITOR_API_KEY-optional gate

Tests:
  - tests/contact-handler.test.mjs rewritten to call submitContact
    handler directly; asserts on ValidationError / ApiError
  - tests/email-validation.test.mjs + tests/turnstile.test.mjs
    point at the new server/_shared/ modules

Deleted: api/contact.js, api/register-interest.js, api/_ip-rate-limit.js,
api/_turnstile.js, api/_email-validation.js, api/_turnstile.test.mjs.
Manifest entries removed (58 → 56). Docs updated (api-platform,
api-commerce, usage-rate-limits).

Verified: npm run typecheck + typecheck:api + lint:api-contract
(88 files / 56 entries) + lint:boundaries pass; full test:data
(5852 tests) passes; make generate is zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(pro-test): rebuild bundle for leads/v1 contact form (#3207)

Updates the enterprise contact form to POST to /api/leads/v1/submit-contact
(old path /api/contact removed in the previous commit).

Bundle is rebuilt from pro-test/src/App.tsx source change in 9ccd309d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address HIGH review findings 1-3 (#3207)

Three review findings from @koala73 on the sebuf-migration PR, all
silent bugs that would have shipped to prod:

### 1. Sanctions rate-limit policy was dead code

ENDPOINT_RATE_POLICIES keyed the 30/min budget under
/api/sanctions/v1/lookup-entity, but the generated route (from the
proto RPC LookupSanctionEntity) is /api/sanctions/v1/lookup-sanction-entity.
hasEndpointRatePolicy / getEndpointRatelimit are exact-string pathname
lookups, so the mismatch meant the endpoint fell through to the
generic 600/min global limiter instead of the advertised 30/min.

Net effect: the live OpenSanctions proxy endpoint (unauthenticated,
external upstream) had 20x the intended rate budget. Fixed by renaming
the policy key to match the generated route.

### 2. Lost stale-seed fallback on military-flights

Legacy api/military-flights.js cascaded military:flights:v1 →
military:flights:stale:v1 before returning empty. The new proto
handler went straight to live OpenSky/relay and returned null on miss.

Relay or OpenSky hiccup used to serve stale seeded data (24h TTL);
under the new handler it showed an empty map. Both keys are still
written by scripts/seed-military-flights.mjs on every run — fix just
reads the stale key when the live fetch returns null, converts the
seed's app-shape flights (flat lat/lon, lowercase enums, lastSeenMs)
to the proto shape (nested GeoCoordinates, enum strings, lastSeenAt),
and filters to the request bbox.

Read via getRawJson (unprefixed) to match the seed cron's writes,
which bypass the env-prefix system.

### 3. Hex-code casing mismatch broke getFlightByHex

The seed cron writes hexCode: icao24.toUpperCase() (uppercase);
src/services/military-flights.ts:getFlightByHex uppercases the lookup
input: f.hexCode === hexCode.toUpperCase(). The new proto handler
preserved OpenSky's lowercase icao24, and mapProtoFlight is a
pass-through. getFlightByHex was silently returning undefined for
every call after the migration.

Fix: uppercase in the proto handler (live + stale paths), and document
the invariant in a comment on MilitaryFlight.hex_code in
military_flight.proto so future handlers don't re-break it.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries) / lint:boundaries clean
- tests/edge-functions.test.mjs 130 pass
- make generate zero-diff (openapi spec regenerated for proto comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): restore desktop 2/hr rate cap on register-interest (#3207)

Addresses HIGH review finding #4 from @koala73. The legacy
api/register-interest.js applied a nested 2/hr per-IP cap when
`source === 'desktop-settings'`, on top of the generic 5/hr endpoint
budget. The sebuf migration lost this — desktop-source requests now
enjoy the full 5/hr cap.

Since `source` is an unsigned client-supplied field, anyone sending
`source: 'desktop-settings'` skips Turnstile AND gets 5/hr. Without
the tighter cap the Turnstile bypass is cheaper to abuse.

Added `checkScopedRateLimit` to `server/_shared/rate-limit.ts` — a
reusable second-stage Upstash limiter keyed on an opaque scope string
+ caller identifier. Fail-open on Redis errors to match existing
checkRateLimit / checkEndpointRateLimit semantics. Handlers that need
per-subscope caps on top of the gateway-level endpoint budget use this
helper.

In register-interest: when `isDesktopSource`, call checkScopedRateLimit
with scope `/api/leads/v1/register-interest#desktop`, limit=2, window=1h,
IP as identifier. On exceeded → throw ApiError(429).
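
A self-contained stand-in for the idea — the real checkScopedRateLimit is async and Upstash-backed; this in-memory bucket shows only the scope+identifier keying and the fail-open shape:

```typescript
// In-memory stand-in for a second-stage scoped limiter.
const buckets = new Map<string, { count: number; windowStart: number }>();

function checkScopedRateLimit(
  scope: string,       // e.g. '/api/leads/v1/register-interest#desktop'
  identifier: string,  // caller IP
  limit: number,
  windowMs: number,
): boolean {
  try {
    const key = `${scope}:${identifier}`;
    const now = Date.now();
    const bucket = buckets.get(key);
    if (!bucket || now - bucket.windowStart >= windowMs) {
      buckets.set(key, { count: 1, windowStart: now });
      return true;
    }
    bucket.count += 1;
    return bucket.count <= limit;
  } catch {
    return true; // fail-open, mirroring the Redis-error semantics above
  }
}
```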

### What this does not fix

This caps the blast radius of the Turnstile bypass but does not close
it — an attacker sending `source: 'desktop-settings'` still skips
Turnstile (just at 2/hr instead of 5/hr). The proper fix is a signed
desktop-secret header that authenticates the bypass; filed as
follow-up #3252. That requires coordinated Tauri build + Vercel env
changes out of scope for #3207.

### Verified

- typecheck + typecheck:api clean
- lint:api-contract (56 entries)
- tests/edge-functions.test.mjs + contact-handler.test.mjs (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): MEDIUM + LOW + rate-limit-policy CI check (#3207)

Closes out the remaining @koala73 review findings from #3242 that
didn't already land in the HIGH-fix commits, plus the requested CI
check that would have caught HIGH #1 (dead-code policy key) at
review time.

### MEDIUM #5 — Turnstile missing-secret policy default

Flip `verifyTurnstile`'s default `missingSecretPolicy` from `'allow'`
to `'allow-in-development'`. Dev with no secret = pass (expected
local); prod with no secret = reject + log. submit-contact was
already explicitly overriding to `'allow-in-development'`;
register-interest was silently getting `'allow'`. Safe default now
means a future missing-secret misconfiguration in prod gets caught
instead of silently letting bots through. Removed the now-redundant
override in submit-contact.
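
The policy semantics in a sketch — only the two policy values named above; the resolution helper itself is illustrative:

```typescript
// 'allow' always passes on a missing secret; 'allow-in-development'
// passes only outside production (the real code also logs the rejection).
type MissingSecretPolicy = 'allow' | 'allow-in-development';

function passOnMissingSecret(policy: MissingSecretPolicy, isProduction: boolean): boolean {
  if (policy === 'allow') return true;
  return !isProduction;
}
```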

### MEDIUM #6 — Silent enum fallbacks in maritime client

`toDisruptionEvent` mapped `AIS_DISRUPTION_TYPE_UNSPECIFIED` / unknown
enum values → `gap_spike` / `low` silently. Refactored to return null
when either enum is unknown; caller filters nulls out of the array.
Handler doesn't produce UNSPECIFIED today, but the `gap_spike`
default would have mislabeled the first new enum value the proto
ever adds — dropping unknowns is safer than shipping wrong labels.
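
The drop-unknowns pattern as a sketch — the enum map entry is inferred from the `gap_spike` default, and the real `toDisruptionEvent` maps more fields:

```typescript
// Return null for unknown enum strings instead of a silent default;
// the caller filters nulls so wrong labels never ship.
const DISRUPTION_TYPE: Record<string, string> = {
  AIS_DISRUPTION_TYPE_GAP_SPIKE: 'gap_spike', // assumed enum name
};

function toDisruptionEvent(raw: { type: string }): { type: string } | null {
  const type = DISRUPTION_TYPE[raw.type];
  return type === undefined ? null : { type };
}

const events = [
  { type: 'AIS_DISRUPTION_TYPE_GAP_SPIKE' },
  { type: 'AIS_DISRUPTION_TYPE_UNSPECIFIED' }, // unknown → dropped, not mislabeled
]
  .map(toDisruptionEvent)
  .filter((e): e is { type: string } => e !== null);
```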

### LOW — Copy drift in register-interest email

Email template hardcoded `435+ Sources`; PR #3241 bumped marketing to
`500+`. Bumped in the rewritten file to stay consistent.

The `as any` on Convex mutation names carried over from legacy; filed as
follow-up #3253.

### Rate-limit-policy coverage lint

`scripts/enforce-rate-limit-policies.mjs` validates that every key in
`ENDPOINT_RATE_POLICIES` resolves to a proto-generated gateway route by
cross-referencing `docs/api/*.openapi.yaml`. Failure messages reference the
sanctions-entity-search incident so future drift has a paper trail.

Wired into package.json (`lint:rate-limit-policies`) and the pre-push
hook alongside `lint:boundaries`. Smoke-tested both directions —
clean repo passes (5 policies / 175 routes), seeded drift (the exact
HIGH #1 typo) fails with the advertised remedy text.
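
The core cross-reference reduces to a set-membership check; the function name and inputs here are illustrative, using the exact HIGH #1 typo as the seeded drift:

```typescript
// Every rate-policy key must be a generated gateway route; anything
// else is a dead policy (the HIGH #1 failure mode).
function findDeadPolicyKeys(policyKeys: string[], generatedRoutes: string[]): string[] {
  const routes = new Set(generatedRoutes);
  return policyKeys.filter((key) => !routes.has(key));
}
```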

### Verified
- `lint:rate-limit-policies` ✓
- `typecheck` + `typecheck:api` ✓
- `lint:api-contract` ✓ (56 entries)
- `lint:boundaries` ✓
- edge-functions + contact-handler tests (147 pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 5): decomm /api/eia/* + migrate /api/satellites → IntelligenceService (#3207)

Both targets turned out to be decomm-not-migration cases. The original
plan called for two new services (economic/v1.GetEiaSeries +
natural/v1.ListSatellitePositions) but research found neither was
needed:

### /api/eia/[[...path]].js — pure decomm, zero consumers

The "catch-all" is a misnomer — only two paths actually worked,
/api/eia/health and /api/eia/petroleum, both Redis-only readers.
Zero frontend callers in src/. Zero server-side readers. Nothing
consumes the `energy:eia-petroleum:v1` key that seed-eia-petroleum.mjs
writes daily.

The EIA data the frontend actually uses goes through existing typed
RPCs in economic/v1: GetEnergyPrices, GetCrudeInventories,
GetNatGasStorage, GetEnergyCapacity. None of those touch /api/eia/*.

Building GetEiaSeries would have been dead code. Deleted the legacy
file + its test (tests/api-eia-petroleum.test.mjs — it only covered
the legacy endpoint, no behavior to preserve). Empty api/eia/ dir
removed.

**Note for review:** the Redis seed cron keeps running daily and
nothing consumes it. If that stays unused, seed-eia-petroleum.mjs
should be retired too (separate PR). Out of scope for sebuf-migration.

### /api/satellites.js — Learning #2 strikes again

IntelligenceService.ListSatellites already exists at
/api/intelligence/v1/list-satellites, reads the same Redis key
(intelligence:satellites:tle:v1), and supports an optional country
filter the legacy didn't have.

One frontend caller in src/services/satellites.ts needed to switch
from `fetch(toApiUrl('/api/satellites'))` to the typed
IntelligenceServiceClient.listSatellites. Shape diff was tiny —
legacy `noradId` became proto `id` (handler line 36 already picks
either), everything else identical. alt/velocity/inclination in the
proto are ignored by the caller since it propagates positions
client-side via satellite.js.

Kept the client-side cache + failure cooldown + 20s timeout (still
valid concerns at the caller level).

### Manifest + docs
- api-route-exceptions.json: 56 → 54 entries (both removed)
- docs/api-proxies.mdx: dropped the two rows from the Raw-data
  passthroughs table

### Verified
- typecheck + typecheck:api ✓
- lint:api-contract (54 entries) / lint:boundaries / lint:rate-limit-policies ✓
- tests/edge-functions.test.mjs 127 pass (down from 130 — 3 tests were
  for the deleted eia endpoint)
- make generate zero-diff (no proto changes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 6): migrate /api/supply-chain/v1/{country-products,multi-sector-cost-shock} → SupplyChainService (#3207)

Both endpoints were hand-rolled TS handlers sitting under a proto URL prefix —
the exact drift the manifest guardrail flagged. Promoted both to typed RPCs:

- GetCountryProducts → /api/supply-chain/v1/get-country-products
- GetMultiSectorCostShock → /api/supply-chain/v1/get-multi-sector-cost-shock

Handlers preserve the existing semantics: PRO-gate via isCallerPremium(ctx.request),
iso2 / chokepointId validation, raw bilateral-hs4 Redis read (skip env-prefix to
match seeder writes), CHOKEPOINT_STATUS_KEY for war-risk tier, and the math from
_multi-sector-shock.ts unchanged. Empty-data and non-PRO paths return the typed
empty payload (no 403 — the sebuf gateway pattern is empty-payload-on-deny).

Client wrapper switches from premiumFetch to client.getCountryProducts/
client.getMultiSectorCostShock. Legacy MultiSectorShock / MultiSectorShockResponse /
CountryProductsResponse names remain as type aliases of the generated proto types
so CountryBriefPanel + CountryDeepDivePanel callsites compile with zero churn.

Manifest 54 → 52. Rate-limit gateway routes 175 → 177.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): add cache-tier entries for new supply-chain RPCs (#3207)

Pre-push tests/route-cache-tier.test.mjs caught the missing entries.
Both are PRO-gated and request-varying — matched to the existing supply-chain
PRO cohort (get-country-cost-shock, get-bypass-options, etc.) at the
slow-browser tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 7): migrate /api/scenario/v1/{run,status,templates} → ScenarioService (#3207)

Promote the three literal-filename scenario endpoints to a typed sebuf
service with three RPCs:

  POST /api/scenario/v1/run-scenario        (RunScenario)
  GET  /api/scenario/v1/get-scenario-status (GetScenarioStatus)
  GET  /api/scenario/v1/list-scenario-templates (ListScenarioTemplates)

Preserves all security invariants from the legacy handlers:
- 405 for wrong method (sebuf service-config method gate)
- scenarioId validation against SCENARIO_TEMPLATES registry
- iso2 regex ^[A-Z]{2}$
- JOB_ID_RE path-traversal guard on status
- Per-IP 10/min rate limit (moved to gateway ENDPOINT_RATE_POLICIES)
- Queue-depth backpressure (>100 → 429)
- PRO gating via isCallerPremium
- AbortSignal.timeout on every Redis pipeline (runRedisPipeline helper)
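
Two of the guards above as a sketch — the iso2 regex is quoted from the list; JOB_ID_RE's exact pattern is an assumption (the point is rejecting path-traversal input on status lookups):

```typescript
const ISO2_RE = /^[A-Z]{2}$/;               // quoted from the invariants above
const JOB_ID_RE = /^[A-Za-z0-9_-]{1,64}$/;  // assumed shape: no '/', '.', '..'

function isValidStatusLookup(iso2: string, jobId: string): boolean {
  return ISO2_RE.test(iso2) && JOB_ID_RE.test(jobId);
}
```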

Wire-level diffs vs legacy:
- Per-user RL now enforced at the gateway (same 10/min/IP budget).
- Rate-limit response omits Retry-After header; retryAfter is in the
  body per error-mapper.ts convention.
- ListScenarioTemplates emits affectedHs2: [] when the registry entry
  is null (all-sectors sentinel); proto repeated cannot carry null.
- RunScenario returns { jobId, status } (no statusUrl field — unused
  by SupplyChainPanel, drop from wire).

Gateway wiring:
- server/gateway.ts RPC_CACHE_TIER: list-scenario-templates → 'daily'
  (matches legacy max-age=3600); get-scenario-status → 'slow-browser'
  (premium short-circuit target, explicit entry required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: swap old run/status for the new
  run-scenario/get-scenario-status paths.
- api/scenario/v1/{run,status,templates}.ts deleted; 3 manifest
  exceptions removed (63 → 52 → 49 migration-pending).

Client:
- src/services/scenario/index.ts — typed client wrapper using
  premiumFetch (injects Clerk bearer / API key).
- src/components/SupplyChainPanel.ts — polling loop swapped from
  premiumFetch strings to runScenario/getScenarioStatus. Hard 20s
  timeout on run preserved via AbortSignal.any.

Tests:
- tests/scenario-handler.test.mjs — 18 new handler-level tests
  covering every security invariant + the worker envelope coercion.
- tests/edge-functions.test.mjs — scenario sections removed,
  replaced with a breadcrumb pointer to the new test file.

Docs: api-scenarios.mdx, scenario-engine.mdx, usage-rate-limits.mdx,
usage-errors.mdx, supply-chain.mdx refreshed with new paths.

Verified: typecheck, typecheck:api, lint:api-contract (49 entries),
lint:rate-limit-policies (6/180), lint:boundaries, route-cache-tier
(parity), full edge-functions (117) + scenario-handler (18).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commit 8): migrate /api/v2/shipping/{route-intelligence,webhooks} → ShippingV2Service (#3207)

Partner-facing endpoints promoted to a typed sebuf service. Wire shape
preserved byte-for-byte (camelCase field names, ISO-8601 fetchedAt, the
same subscriberId/secret formats, the same SET + SADD + EXPIRE 30-day
Redis pipeline). Partner URLs /api/v2/shipping/* are unchanged.

RPCs landed:
- GET  /route-intelligence  → RouteIntelligence  (PRO, slow-browser)
- POST /webhooks            → RegisterWebhook    (PRO)
- GET  /webhooks            → ListWebhooks       (PRO, slow-browser)

The existing path-parameter URLs remain on the legacy edge-function
layout because sebuf's HTTP annotations don't currently model path
params (grep proto/**/*.proto for `path: "{…}"` returns zero). Those
endpoints are split into two Vercel dynamic-route files under
api/v2/shipping/webhooks/, behaviorally identical to the previous
hybrid file but cleanly separated:
- GET  /webhooks/{subscriberId}                → [subscriberId].ts
- POST /webhooks/{subscriberId}/rotate-secret  → [subscriberId]/[action].ts
- POST /webhooks/{subscriberId}/reactivate     → [subscriberId]/[action].ts

Both get manifest entries under `migration-pending` pointing at #3207.

Other changes
- scripts/enforce-sebuf-api-contract.mjs: extended GATEWAY_RE to accept
  api/v{N}/{domain}/[rpc].ts (version-first) alongside the canonical
  api/{domain}/v{N}/[rpc].ts; first-use of the reversed ordering is
  shipping/v2 because that's the partner contract.
- vite.config.ts: dev-server sebuf interceptor regex extended to match
  both layouts; shipping/v2 import + allRoutes entry added.
- server/gateway.ts: RPC_CACHE_TIER entries for /api/v2/shipping/
  route-intelligence + /webhooks (slow-browser; premium-gated endpoints
  short-circuit to slow-browser but the entries are required by
  tests/route-cache-tier.test.mjs).
- src/shared/premium-paths.ts: route-intelligence + webhooks added.
- tests/shipping-v2-handler.test.mjs: 18 handler-level tests covering
  PRO gate, iso2/cargoType/hs2 coercion, SSRF guards (http://, RFC1918,
  cloud metadata, IMDS), chokepoint whitelist, alertThreshold range,
  secret/subscriberId format, pipeline shape + 30-day TTL, cross-tenant
  owner isolation, `secret` omission from list response.
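
A sketch of the extended gateway pattern — the repo's actual GATEWAY_RE may differ; this just shows both orderings matching while dynamic-route files stay excluded:

```typescript
// Canonical api/{domain}/v{N}/[rpc].ts OR version-first api/v{N}/{domain}/[rpc].ts.
const GATEWAY_RE = /^api\/(?:[a-z-]+\/v\d+|v\d+\/[a-z-]+)\/[a-z0-9-]+\.ts$/;
```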

Manifest delta
- Removed: api/v2/shipping/route-intelligence.ts, api/v2/shipping/webhooks.ts
- Added:   api/v2/shipping/webhooks/[subscriberId].ts (migration-pending)
- Added:   api/v2/shipping/webhooks/[subscriberId]/[action].ts (migration-pending)
- Added:   api/internal/brief-why-matters.ts (internal-helper) — regression
  surface from the #3248 main merge, which introduced the file without a
  manifest entry. Filed here to keep the lint green; not strictly in scope
  for commit 8 but unblocking.

Net result: 49 → 47 `migration-pending` entries (one net-removal even
though webhook path-params stay pending, because two files collapsed
into two dynamic routes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1): SupplyChainServiceClient must use premiumFetch (#3207)

Signed-in browser pro users were silently hitting 401 on 8 supply-chain
premium endpoints (country-products, multi-sector-cost-shock,
country-chokepoint-index, bypass-options, country-cost-shock,
sector-dependency, route-explorer-lane, route-impact). The shared
client was constructed with globalThis.fetch, so no Clerk bearer or
X-WorldMonitor-Key was injected. The gateway's validateApiKey runs
with forceKey=true for PREMIUM_RPC_PATHS and 401s before isCallerPremium
is consulted. The generated client's try/catch collapses the 401 into
an empty-fallback return, leaving panels blank with no visible error.

Fix is one line at the client constructor: swap globalThis.fetch for
premiumFetch. The same pattern is already in use for insider-transactions,
stock-analysis, stock-backtest, scenario, trade (premiumClient) — this
was an omission on this client, not a new pattern.

premiumFetch no-ops safely when no credentials are available, so the
5 non-premium methods on this client (shippingRates, chokepointStatus,
chokepointHistory, criticalMinerals, shippingStress) continue to work
unchanged.

This also fixes two panels that were already latently broken on main
(chokepoint-index, bypass-options, etc. — predating #3207, not
regressions from it). Commit 6 expanded the surface by routing two more
methods through the same buggy client; this commit fixes the class.

From koala73 review (#3242 second-pass, HIGH new #1):
> Exact class PR #3233 fixed for RegionalIntelligenceBoard /
> DeductionPanel / trade / country-intel. Supply-chain was not in
> #3233's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2): restore 400 on input-shape errors for 2 supply-chain handlers (#3207)

Commit 6 collapsed all non-happy paths into empty-200 on
`get-country-products` and `get-multi-sector-cost-shock`, including
caller-bug cases that legacy returned 400 for:

- get-country-products: malformed iso2 → empty 200 (was 400)
- get-multi-sector-cost-shock: malformed iso2 / missing chokepointId /
  unknown chokepointId → empty 200 (was 400)

The commit message for 6 called out the 403-for-non-pro → empty-200
shift ("sebuf gateway pattern is empty-payload-on-deny") but not the
400 shift. They're different classes:

- Empty-payload-200 for PRO-deny: intentional contract change, already
  documented and applied across the service. Generated clients treat
  "you lack PRO" as "no data" — fine.
- Empty-payload-200 for malformed input: caller bug silently masked.
  External API consumers can't distinguish "bad wiring" from "genuinely
  no data", test harnesses lose the signal, bad calling code doesn't
  surface in Sentry.

Fix: `throw new ValidationError(violations)` on the 3 input-shape
branches. The generated sebuf server maps ValidationError → HTTP 400
(see src/generated/server/.../service_server.ts and leads/v1 which
already uses this pattern).

PRO-gate deny stays as empty-200 — that contract shift was intentional
and is preserved.

Regression tests added at tests/supply-chain-validation.test.mjs (8
cases) pinning the three-way contract:
- bad input                         → 400 (ValidationError)
- PRO-gate deny on valid input      → 200 empty
- valid PRO input, no data in Redis → 200 empty (unchanged)
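The three-way contract can be sketched in one place. In the real code the `ValidationError` → 400 mapping lives in the generated sebuf server, not the handler; this sketch inlines it, and the handler/validation shapes are illustrative:

```typescript
// Input-shape failures throw; the (generated) server layer maps them to 400.
class ValidationError extends Error {
  constructor(public violations: string[]) {
    super(`validation failed: ${violations.join('; ')}`);
  }
}

// Simplified stand-in for a supply-chain handler plus its server wrapper.
function handleGetCountryProducts(iso2: unknown): { status: number; body: unknown } {
  try {
    if (typeof iso2 !== 'string' || !/^[A-Z]{2}$/.test(iso2)) {
      // Caller bug: surface as 400 instead of masking as empty-200.
      throw new ValidationError(['iso2 must be a two-letter uppercase country code']);
    }
    // PRO-gate deny and no-data-in-Redis both stay empty-200 (elided here).
    return { status: 200, body: { products: [] } };
  } catch (err) {
    if (err instanceof ValidationError) {
      return { status: 400, body: { violations: err.violations } };
    }
    throw err;
  }
}
```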

From koala73 review (#3242 second-pass, HIGH new #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 3): restore statusUrl on RunScenarioResponse + document 202→200 wire break (#3207)

Commit 7 silently shifted /api/scenario/v1/run-scenario's response
contract in two ways that the commit message covered only partially:

1. HTTP 202 Accepted → HTTP 200 OK
2. Dropped `statusUrl` string from the response body

The `statusUrl` drop was mentioned as "unused by SupplyChainPanel" but
not framed as a contract change. The 202 → 200 shift was not mentioned
at all. This is a same-version (v1 → v1) migration, so external callers
that key off either signal — `response.status === 202` or
`response.body.statusUrl` — silently branch incorrectly.

Evaluated options:
  (a) sebuf per-RPC status-code config — not available. sebuf's
      HttpConfig only models `path` and `method`; no status annotation.
  (b) Bump to scenario/v2 — judged heavier than the break itself for
      a single status-code shift. No in-repo caller uses 202 or
      statusUrl; the docs-level impact is containable.
  (c) Accept the break, document explicitly, partially restore.

Took option (c):

- Restored `statusUrl` in the proto (new field `string status_url = 3`
  on RunScenarioResponse). Server computes
  `/api/scenario/v1/get-scenario-status?jobId=<encoded job_id>` and
  populates it on every successful enqueue. External callers that
  followed this URL keep working unchanged.
- 202 → 200 is not recoverable inside the sebuf generator, so it is
  called out explicitly in two places:
    - docs/api-scenarios.mdx now includes a prominent `<Warning>` block
      documenting the v1→v1 contract shift + the suggested migration
      (branch on response body shape, not HTTP status).
    - RunScenarioResponse proto comment explains why 200 is the new
      success status on enqueue.
  OpenAPI bundle regenerated to reflect the restored statusUrl field.

- Regression test added in tests/scenario-handler.test.mjs pinning
  `statusUrl` to the exact URL-encoded shape — locks the invariant so
  a future proto rename or handler refactor can't silently drop it
  again.
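The restored `statusUrl` computation reduces to a single URL-encoding expression; `buildStatusUrl` is an illustrative name for whatever the handler actually calls:

```typescript
// Compute the status-polling URL populated on every successful enqueue.
// encodeURIComponent keeps job IDs with reserved characters wire-safe.
function buildStatusUrl(jobId: string): string {
  return `/api/scenario/v1/get-scenario-status?jobId=${encodeURIComponent(jobId)}`;
}
```

The regression test mentioned above pins exactly this URL-encoded shape so a refactor can't silently drop the encoding.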

From koala73 review (#3242 second-pass, HIGH new #3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 1/2): close webhook tenant-isolation gap on shipping/v2 (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

server/worldmonitor/shipping/v2/{register-webhook,list-webhooks}.ts
migrated without reinstating validateApiKey(req, { forceKey: true }),
diverging from both the sibling api/v2/shipping/webhooks/[subscriberId]
routes and the documented "X-WorldMonitor-Key required" contract in
docs/api-shipping-v2.mdx.

Attack surface: the gateway accepts Clerk bearer auth as a pro signal.
A Clerk-authenticated pro user with no X-WorldMonitor-Key reaches the
handler, callerFingerprint() falls back to 'anon', and every such
caller collapses into a shared webhook:owner:anon:v1 bucket. The
defense-in-depth ownerTag !== ownerHash check in list-webhooks.ts
doesn't catch it because both sides equal 'anon' — every Clerk-session
holder could enumerate / overwrite every other Clerk-session pro
tenant's registered webhook URLs.

Fix: reinstate validateApiKey(ctx.request, { forceKey: true }) at the
top of each handler, throwing ApiError(401) when absent. Matches the
sibling routes exactly and the published partner contract.
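The collapse is easy to see in a simplified model of the fingerprint and bucket-key logic (names approximate the real `callerFingerprint` / `webhook:owner:*` scheme, and the hashing choice here is illustrative):

```typescript
import { createHash } from 'node:crypto';

// Without an API key, every caller falls back to the same 'anon' tag.
function callerFingerprint(apiKey: string | null): string {
  if (!apiKey) return 'anon';
  return createHash('sha256').update(apiKey).digest('hex').slice(0, 16);
}

// Two distinct Clerk-session callers with no key share one bucket —
// the tenant-isolation gap the forceKey gate closes.
function webhookOwnerBucket(apiKey: string | null): string {
  return `webhook:owner:${callerFingerprint(apiKey)}:v1`;
}
```

Reinstating `validateApiKey(ctx.request, { forceKey: true })` ahead of this logic means no keyless caller ever reaches the `'anon'` branch.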

Tests:
- tests/shipping-v2-handler.test.mjs: two existing "non-PRO → 403"
  tests for register/list were using makeCtx() with no key, which now
  fails at the 401 layer first. Renamed to "no API key → 401
  (tenant-isolation gate)" with a comment explaining the failure mode
  being tested. 18/18 pass.

Verified: typecheck:api, lint:api-contract (no change), lint:boundaries,
lint:rate-limit-policies, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review HIGH 2/2): restore v1 path aliases on scenario + supply-chain (#3207)

Koala flagged this as a merge blocker in PR #3242 review.

Commits 6 + 7 of #3207 renamed five documented v1 URLs to the sebuf
method-derived paths and deleted the legacy edge-function files:

  POST /api/scenario/v1/run                       → run-scenario
  GET  /api/scenario/v1/status                    → get-scenario-status
  GET  /api/scenario/v1/templates                 → list-scenario-templates
  GET  /api/supply-chain/v1/country-products      → get-country-products
  GET  /api/supply-chain/v1/multi-sector-cost-shock → get-multi-sector-cost-shock

server/router.ts is an exact static-match table (Map keyed on `METHOD
PATH`), so any external caller — docs, partner scripts, grep-the-
internet — hitting the old documented URL would 404 on first request
after merge. Commit 8 (shipping/v2) preserved partner URLs byte-for-
byte; the scenario + supply-chain renames missed that discipline.

Fix: add five thin alias edge functions that rewrite the pathname to
the canonical sebuf path and delegate to the domain [rpc].ts gateway
via a new server/alias-rewrite.ts helper. Premium gating, rate limits,
entitlement checks, and cache-tier lookups all fire on the canonical
path — aliases are pure URL rewrites, not a duplicate handler pipeline.

  api/scenario/v1/{run,status,templates}.ts
  api/supply-chain/v1/{country-products,multi-sector-cost-shock}.ts
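The alias layer is a pure pathname rewrite over the five documented URLs listed above; `V1_ALIASES` and `rewriteAliasPath` are illustrative names for the helper's internals:

```typescript
// Old documented v1 URL -> canonical sebuf method-derived path.
const V1_ALIASES = new Map<string, string>([
  ['/api/scenario/v1/run', '/api/scenario/v1/run-scenario'],
  ['/api/scenario/v1/status', '/api/scenario/v1/get-scenario-status'],
  ['/api/scenario/v1/templates', '/api/scenario/v1/list-scenario-templates'],
  ['/api/supply-chain/v1/country-products', '/api/supply-chain/v1/get-country-products'],
  ['/api/supply-chain/v1/multi-sector-cost-shock', '/api/supply-chain/v1/get-multi-sector-cost-shock'],
]);

// Pure rewrite: premium gating, rate limits, and cache-tier lookups all
// fire later on the canonical path — no duplicate handler pipeline.
function rewriteAliasPath(pathname: string): string {
  return V1_ALIASES.get(pathname) ?? pathname;
}
```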

Vite dev parity: file-based routing at api/ is a Vercel concern, so the
dev middleware (vite.config.ts) gets a matching V1_ALIASES rewrite map
before the router dispatch.

Manifest: 5 new entries under `deferred` with removal_issue=#3282
(tracking their retirement at the next v1→v2 break). lint:api-contract
stays green (89 files checked, 55 manifest entries validated).

Docs:
- docs/api-scenarios.mdx: migration callout at the top with the full
  old→new URL table and a link to the retirement issue.
- CHANGELOG.md + docs/changelog.mdx: Changed entry documenting the
  rename + alias compat + the 202→200 shift (from commit 23c821a1).

Verified: typecheck:api, lint:api-contract, lint:rate-limit-policies,
lint:boundaries, test:data (6005/6005).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:55:59 +03:00
Elie Habib
425507d15a fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding (#3281)
* fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding

Shadow-diff of 15 v2 pairs (2026-04-22) showed the analyst pattern-
matching the loudest context numbers — VIX 19.50, top forecast
probability, MidEast FX stress 77 — into every story regardless of
editorial fit. A Rwanda humanitarian story about refugees cited VIX;
an aviation story cited a forecast probability.

Root cause: every story got the same 6-bundle context block, so the
LLM had markets / forecasts / macro in-hand and the "cite a specific
fact" instruction did the rest.

Two-layer fix:

  1. STRUCTURAL — sectionsForCategory() maps the story's category to
     an editorially-relevant subset of bundles. Humanitarian stories
     don't see marketData / forecasts / macroSignals; diplomacy gets
     riskScores only; market/energy gets markets+forecasts but drops
     riskScores. The model physically cannot cite what it wasn't
     given. Unknown categories fall back to all six (backcompat).

  2. PROMPT — WHY_MATTERS_ANALYST_SYSTEM_V2 adds a RELEVANCE RULE
     that explicitly permits grounding in headline/description
     actors when no context fact is a clean fit, and bans dragging
     off-topic market metrics into humanitarian/aviation/diplomacy
     stories. The prompt footer (inline, per-call) restates the
     same guardrail — models follow inline instructions more
     reliably than system-prompt constraints on longer outputs.
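The structural layer can be sketched as a first-match regex policy table over the six bundle names. The regexes and the exact bundle subsets per policy are illustrative; the real `CATEGORY_SECTION_POLICY` lives in the endpoint:

```typescript
const ALL_SECTIONS = [
  'worldBrief', 'countryBrief', 'riskScores',
  'marketData', 'forecasts', 'macroSignals',
] as const;
type Section = (typeof ALL_SECTIONS)[number];

// First-match wins; regexes here are illustrative, not the repo's.
const POLICIES: Array<{ label: string; match: RegExp; sections: Section[] }> = [
  // Humanitarian stories never see markets/forecasts/macro bundles.
  { label: 'humanitarian', match: /humanitarian|refugee|displacement/i,
    sections: ['worldBrief', 'countryBrief', 'riskScores'] },
  // Market/energy keeps markets+forecasts but drops riskScores.
  { label: 'market', match: /market|energy|commodit/i,
    sections: ['worldBrief', 'countryBrief', 'marketData', 'forecasts', 'macroSignals'] },
];

function sectionsForCategory(category: string): { label: string; sections: Section[] } {
  for (const p of POLICIES) {
    if (p.match.test(category)) return { label: p.label, sections: [...p.sections] };
  }
  // Unknown categories fall back to all six (backcompat).
  return { label: 'default', sections: [...ALL_SECTIONS] };
}
```

Because the humanitarian subset physically omits `marketData`, the model cannot cite VIX into a refugee story no matter what the prompt says — the prompt-level RELEVANCE RULE is belt-and-suspenders on top.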

Cache keys bumped to invalidate the formulaic v5 output: endpoint
v5 to v6, shadow v3 to v4. Adds 11 unit tests pinning the 5
policies + default fallback + humanitarian structural guarantee +
market policy does-see-markets + guardrail footer presence.

Observability: endpoint now logs policyLabel per call so operators
can confirm in Vercel logs that humanitarian/aviation stories are
NOT seeing marketData without dumping the full prompt.

* test(brief): address greptile P2 — sync MAX_BODY_BYTES + add parseWhyMattersV2 coverage

Greptile PR #3281 review raised two P2 test-quality issues:

1. Test-side MAX_BODY_BYTES mirror was still 4096 — the endpoint
   was bumped to 8192 in PR #3269 (v2 output + description). With
   the stale constant, a payload in the 4097–8192 range was
   accepted by the real endpoint but looked oversize in the test
   mirror, letting the body-cap invariant silently drift. Fixed
   by syncing to 8192 + bumping the bloated fixture to 10_000
   bytes so a future endpoint-cap bump doesn't silently
   re-invalidate the assertion.

2. parseWhyMattersV2 (the only output-validation gate on the
   analyst path) had no dedicated unit tests. Adds 11 targeted
   cases covering: valid 2 and 3 sentence output, 100/500 char
   bounds (incl. boundary assertions), all 6 banned preamble
   phrases, section-label leaks (SITUATION/ANALYSIS/Watch),
   markdown leakage (#, -, *, 1.), stub echo rejection, smart/
   plain quote stripping, non-string defensive branch, and
   whitespace-only strings.

Suite size: 50 to 61 tests, all green.

* fix(brief): add aviation policy to sectionsForCategory (PR #3281 review P1)

Reviewer caught that aviation was named in WHY_MATTERS_ANALYST_SYSTEM_V2's
RELEVANCE RULE as a category banned from off-topic market metrics, but
had no matching regex entry in CATEGORY_SECTION_POLICY. So 'Aviation
Incident' / 'Airspace Closure' / 'Plane Crash' / 'Drone Incursion' all
fell through to DEFAULT_SECTIONS and still got all 6 bundles including
marketData, forecasts, and macroSignals — exactly the VIX / forecast
probability pattern the PR claimed to structurally prevent.

Reproduced on HEAD before fix:
  Aviation Incident -> default
  Airspace Closure  -> default
  Plane Crash       -> default
  ...etc.

Fix:
  1. Adds aviation policy (same 3 bundles as humanitarian/diplomacy/
     tech: worldBrief, countryBrief, riskScores).
  2. Adds dedicated aviation-gating test with 6 category variants.
  3. Adds meta-invariant test: every category named in the system
     prompt's RELEVANCE RULE MUST have a structural policy entry,
     asserting policyLabel !== 'default'. If someone adds a new
     category name to the prompt in the future, this test fires
     until they wire up a regex — prevents soft-guard drift.
  4. Removes 'Aviation Incident' from the default-fall-through test
     list (it now correctly matches aviation).

No cache bump needed — v6 was published to the feature branch only a
few minutes ago, no production entries have been written yet.
2026-04-22 08:21:01 +04:00
Elie Habib
fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.
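The commit names `penalizedPillarScore` with alpha 0.5 but does not spell out the formula. One plausible non-compensatory form, consistent with the behavior described below (a penalty factor of at most 1, so the combined score can only fall relative to the compensatory aggregate), is a weighted mean pulled toward the weakest pillar — this is a sketch under that assumption, not the repo's actual math:

```typescript
// Hypothetical non-compensatory combine: weighted mean times a penalty
// derived from the weakest pillar. The real formula is in
// server/worldmonitor/resilience/v1/_shared.ts.
function penalizedPillarScore(
  pillars: Array<{ score: number; weight: number }>,
  alpha = 0.5,
): number {
  const weighted = pillars.reduce((sum, p) => sum + p.score * p.weight, 0);
  const weakest = Math.min(...pillars.map((p) => p.score));
  // (weakest / weighted)^alpha <= 1 when weights sum to 1, so the
  // combined score never exceeds the compensatory aggregate.
  const penalty = weighted > 0 ? (weakest / weighted) ** alpha : 1;
  return weighted * penalty;
}
```

With equal pillars the penalty is 1 and the score is unchanged; a weak pillar drags the result below the weighted mean, which matches the universal downward score shift reported in the sensitivity snapshot.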

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops; the penalty
  factor is at most 1, so scores can only fall)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10
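The dynamic-env-read plus three-condition branch can be sketched as follows (parameterizing the env object makes the test-flippability explicit; the real code reads `process.env` directly):

```typescript
// Read the flag at call time, not module-load time, so tests can flip
// state without reloading the module.
function isPillarCombineEnabled(env: Record<string, string | undefined>): boolean {
  return env.RESILIENCE_PILLAR_COMBINE_ENABLED === 'true'; // default off
}

// Branch on flag AND schema v2 AND pillar availability; otherwise fall
// through to the unchanged 6-domain aggregate.
function overallScore(opts: {
  env: Record<string, string | undefined>;
  schemaV2Enabled: boolean;
  pillarScore: number | null;
  domainScore: number;
}): number {
  if (isPillarCombineEnabled(opts.env) && opts.schemaV2Enabled && opts.pillarScore !== null) {
    return opts.pillarScore;
  }
  return opts.domainScore;
}
```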

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00
Elie Habib
502bd4472c docs(resilience): sync methodology/proto/widget to 6-domain + 3-pillar reality (#3264)
Brings every user-facing surface into alignment with the live resilience
scorer. Zero behavior change: overall_score is still the 6-domain
weighted aggregate, schemaVersion is still 2.0 default, and every
existing test continues to pass.

Surfaces touched:
- proto + OpenAPI: rewrote the ResiliencePillar + schema_version
  descriptions. 2.0 is correctly documented as default; shaped-but-empty
  language removed.
- Widget: added the missing recovery: 'Recovery' label (it was rendering
  literal lowercase 'recovery' before), retitled the footer data-version
  chip from 'Data' to 'Seed date' so it is clear the value reflects the
  static seed bundle, not every live input, and rewrote the help tooltip
  for 6 domains and 3 pillars, calling out the 0.25 recovery weight.
- Methodology doc: domains-and-weights table now carries all 6 rows
  with actual code weights (0.17/0.15/0.11/0.19/0.13/0.25), Recovery
  section header weight corrected from 1.0 to 0.25, new Pillar-combined
  score activation (pending) section with the measured Spearman 0.9935,
  top-5 movers, and the activation checklist.
- documentation.mdx + features.mdx: product blurbs updated from 5
  domains and 13 dimensions to 6 domains and 19 dimensions grouped into
  3 pillars.
- Tests: recovery-label regression pin, Seed date label pin, clarified
  pillar-schema degenerate-input semantics.

New scaffolding for defensibility:
- docs/snapshots/resilience-ranking-2026-04-21.json frozen published
  tables artifact with methodology metadata and commit SHA.
- docs/snapshots/resilience-pillar-sensitivity-2026-04-21.json live
  Redis capture (52-country sample) combining sensitivity stability
  with the current-vs-proposed Spearman comparison.
- scripts/freeze-resilience-ranking.mjs refresh script.
- scripts/compare-resilience-current-vs-proposed.mjs comparison script.
- tests/resilience-ranking-snapshot.test.mts 13 assertions auto
  discovered from any resilience-ranking-YYYY-MM-DD.json in snapshots.

Verification: npm run typecheck:all clean, 390/390 resilience tests
pass.

Follow-up: pillar-combined score activation. The sensitivity artifact
shows rank-preservation Spearman 0.9935 and no ceiling effects, which
clears the methodological bar. Blocker is messaging because every
country drops ~13 points under the penalty, so activation PR ships with
re-anchored release-gate bands, refreshed frozen ranking, and a v2.0
methodology note.
2026-04-21 22:37:27 +04:00
Elie Habib
e878baec52 fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap) (#3271)
* fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap)

Review finding on PR #3255: DIGEST_ONLY_USER was a sticky production
footgun. If an operator forgot to unset after a one-off validation,
the cron silently filtered every other user out indefinitely while
still completing normally (exit 0) — prolonged partial outage with
"green" runs.

Fix: mandatory `|until=<ISO8601>` suffix within 48h of now. Otherwise
the flag is IGNORED with a loud warn, fan-out proceeds normally.
Active filter emits a structured console.warn at run start listing
expiry + remaining minutes.

Valid:
  DIGEST_ONLY_USER=user_xxx|until=2026-04-22T18:00Z

Rejected (→ loud warn, normal fan-out):
- Legacy bare `user_xxx` (missing required suffix)
- Unparseable ISO
- Expiry > 48h (forever-test mistake)
- Expiry in past (auto-disable)

Parser extracted to `scripts/lib/digest-only-user.mjs` (testable
without importing seed-digest-notifications.mjs which has no isMain
guard).

Tests: 17 cases covering unset / reject / active branches, ISO
variants, boundaries, and the 48h cap. 6066 total pass. typecheck × 2
clean.

Breaking change on the flag's format, but it shipped 2h before this
finding with no prod usage — tightening now is cheaper than after
an incident.

* chore(digest): address /ce:review P2s on DIGEST_ONLY_USER parser

Two style fixes flagged by Greptile on PR #3271:

1. Misleading multi-pipe error message.
   `user_xxx|until=<iso>|extra` returned "missing mandatory suffix",
   which points the operator toward adding a suffix that is already
   present (confused operator might try `user_xxx|until=...|until=...`).
   Now distinguishes parts.length===1 ("missing suffix") from >2
   ("expected exactly one '|' separator, got N").

2. Date.parse is lenient — accepts RFC 2822, locale strings, "April 22".
   The documented contract is strict ISO 8601; the 48h cap catches
   accidental-valid dates but the documentation lied. Added a regex
   guard up-front that enforces the ISO 8601 shape
   (YYYY-MM-DD optionally followed by time + TZ). Rejects the 6
   Date-parseable-but-not-ISO fixtures before Date.parse runs.
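A shape guard of the kind described can sit in front of `Date.parse` like this (the exact regex in the repo may differ; this one is an illustrative approximation of "YYYY-MM-DD optionally followed by time + TZ"):

```typescript
// Strict ISO 8601 shape: date, optional time (seconds/millis optional),
// optional Z or numeric offset. Rejects RFC 2822 and locale strings that
// the lenient Date.parse would otherwise accept.
const ISO_8601_SHAPE =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{1,3})?)?(Z|[+-]\d{2}:?\d{2})?)?$/;

function isStrictIso(value: string): boolean {
  return ISO_8601_SHAPE.test(value) && !Number.isNaN(Date.parse(value));
}
```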

Both regressions pinned in tests/digest-only-user.test.mjs (18 pass,
was 17). typecheck × 2 clean.
2026-04-21 22:36:30 +04:00
Elie Habib
ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
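The 6-field hash can be sketched as below. The field order, the delimiter, and the 16-hex-char truncation are assumptions matching the `{hash16}` key shape; the real `hashBriefStory` may differ in detail:

```typescript
import { createHash } from 'node:crypto';

type BriefStory = {
  headline: string;
  source: string;
  threatLevel: string;
  category: string;
  country: string;
  description?: string; // 6th field — absent and empty must still differ from distinct values
};

// Cache-key hash over all prompt-relevant fields, so two stories that
// differ only in description no longer collide on one cache slot.
function hashBriefStory(story: BriefStory): string {
  const fields = [
    story.headline, story.source, story.threatLevel,
    story.category, story.country, story.description ?? '',
  ];
  // NUL delimiter prevents field-boundary ambiguity ('a'+'bc' vs 'ab'+'c').
  return createHash('sha256').update(fields.join('\u0000')).digest('hex').slice(0, 16);
}
```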

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
Elie Habib
048bb8bb52 fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter

Three changes, all operational for PR #3248's brief-why-matters feature.

1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to
/api/internal/brief-why-matters returning 403 — middleware's "short
UA" guard (~L183) rejects Node undici's default UA before the
endpoint's own Bearer-auth runs. The feature never executed in prod;
three-layer fallback silently shipped legacy Gemini output. Same
class as /api/seed-contract-probe (2026-04-15). Endpoint still
carries its own subtle-crypto HMAC auth, so bypassing the UA gate
is safe.

2. Explicit UA on callAnalystWhyMatters fetch
Defense-in-depth. Explicit 'worldmonitor-digest-notifications/1.0'
keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored,
and makes cron traffic distinguishable from ops curl in logs.

3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send
for one user on the next tick (then unset) — validates new features
end-to-end without fanning out. Empty/unset = normal fan-out. Applied
right after rule fetch so both compose and dispatch paths respect it.

Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts
pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl
UA) plus a negative sibling-path suite so a future prefix-match
refactor can't silently unblock /api/internal/.

Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing
main() complexity warning bumped 74→78 by the filter block (unchanged
in character from pre-PR).

* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs

Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under
the current middleware chain this is sufficient (sibling paths hit the
short-UA OR BOT_UA 403 regardless), but it doesn't pin *which* guard
fires. A future refactor that moves `PUBLIC_API_PATHS.has(path)` later
in the chain could let a curl or undici UA pass on a sibling path
without this suite failing.

Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty,
short/undici, curl). Every combination must still 403 regardless of
which guard catches it. 6 new test cases.

Tests: 35 pass in the middleware-bot-gate suite (was 29).
2026-04-21 19:41:58 +04:00
Elie Habib
65a1210531 fix(unrest): Decodo proxy fallback for GDELT + surface err.cause (#3256)
* fix(unrest): Decodo proxy fallback for GDELT + surface err.cause

Background: unrestEvents went STALE_SEED when every tick logged
"GDELT failed: fetch failed" (Railway log 2026-04-21). The bare
"fetch failed" string hid the actual cause (DNS/TCP/TLS), so the
outage was opaque. ACLED is disabled (no credentials) so GDELT is
the sole live source — when it fails, the seed freezes.

Changes:
- fetchGdeltEvents: direct-first, Decodo proxy fallback via
  httpsProxyFetchRaw when PROXY_URL is configured. Mirrors
  imfFetchJson / _yahoo-fetch.mjs direct→proxy pattern.
- Error messages now include err.cause.code (UND_ERR_CONNECT_TIMEOUT,
  ENOTFOUND, ECONNRESET, etc.) so the next outage surfaces the
  underlying transport error instead of "fetch failed".
- Both-paths-failed error carries direct + proxy message so either
  can be diagnosed from a single log line.

No behavior change on the happy path — direct fetch still runs first
with the existing 30s AbortSignal timeout.
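The direct-first/proxy-fallback shape (including the later HTTP-status skip from the P2 follow-up) can be sketched with fetchers abstracted away; `httpStatus` tagging mirrors what the commits describe, while the function names are illustrative:

```typescript
type Fetcher = () => Promise<string>;

// Direct fetch first; on transport failure, fall back to the proxy.
// Upstream HTTP errors (tagged with httpStatus) skip the proxy hop:
// the proxy routes to the same GDELT endpoint, so a retry is pointless.
async function fetchWithProxyFallback(direct: Fetcher, proxy: Fetcher | null): Promise<string> {
  try {
    return await direct(); // happy path unchanged: direct runs first
  } catch (directErr) {
    const httpStatus = (directErr as { httpStatus?: number }).httpStatus;
    if (!proxy || httpStatus !== undefined) throw directErr;
    try {
      return await proxy();
    } catch (proxyErr) {
      // Both-paths-failed: one log line carries both messages.
      throw new Error(
        `direct: ${(directErr as Error).message}; proxy: ${(proxyErr as Error).message}`,
      );
    }
  }
}
```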

* fix(unrest): address PR #3256 P2 review

- describeErr: handle plain-string .cause (e.g. `{ cause: 'ENOTFOUND' }`)
  that would otherwise be silently dropped since a string has no
  .code/.errno/.message accessors.
- fetchGdeltDirect: tag HTTP-status errors (!resp.ok) with httpStatus.
  fetchGdeltEvents skips the proxy hop for upstream HTTP errors since
  the proxy routes to the same GDELT endpoint — saves the 20s proxy
  timeout and avoids a pointless retry. Transport failures (DNS/TCP/TLS
  timeouts against Railway IPs) still trigger the proxy fallback, which
  is the motivating case.
2026-04-21 19:39:16 +04:00
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests).
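
   The shape of the fix, sketched (illustrative only; `writeShadowRecord`
   and the `EdgeContext` shape here are stand-ins, not the repo's actual
   identifiers):

   ```javascript
   // Sketch of the waitUntil pattern described above. writeShadowRecord
   // and ctx are stand-ins for the real helpers.
   function recordShadow(writeShadowRecord, ctx) {
     // Build the promise up front so the write starts immediately.
     const shadowPromise = writeShadowRecord().catch(() => {
       // Shadow writes are best-effort; never fail the response.
     });
     if (ctx && typeof ctx.waitUntil === 'function') {
       // Edge runtime: keep the isolate alive until the write settles.
       ctx.waitUntil(shadowPromise);
       return 'registered';
     }
     // Local harness / tests: plain unawaited execution.
     return 'unawaited';
   }
   ```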

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.
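Roughly what the guard exercises, sketched (illustrative; the real
sanitizeForPrompt in server/_shared/llm-sanitize.js may use different
rules and cover more patterns):

```javascript
// Illustrative sanitizer covering only the three markers the regression
// guard asserts; the repo's sanitizeForPrompt is the real implementation.
function sanitizeForPromptSketch(text) {
  return String(text)
    .replace(/ignore previous instructions/gi, '')
    .replace(/^###\s*Assistant:.*$/gim, '') // role-prefix lines
    .replace(/<\|im_start\|>/g, '');        // chat-template tokens
}

// Wrap the story fields that flow into the prompt build.
function sanitizeStoryForPromptSketch(story) {
  return {
    ...story,
    title: sanitizeForPromptSketch(story.title ?? ''),
    source: sanitizeForPromptSketch(story.source ?? ''),
  };
}
```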

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
89c179e412 fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time (#3254)
* fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time

Reported: a brief viewed at 13:02 local time showed "Good evening" on the
cover (slide 1) but "Good afternoon." on the digest greeting page (slide 2).

Cause: `server/_shared/brief-render.js:renderCover` had the string
`'Good evening'` hardcoded in the cover's mono-cased salutation slot.
The digest greeting page (slide 2) renders the time-of-day-correct
value from `envelope.data.digest.greeting`, which is computed by
`shared/brief-filter.js:174-179` from `localHour` in the user's TZ
(< 12 → morning, < 18 → afternoon, else → evening). So any brief
viewed outside the literal evening showed an inconsistent pair.

Fix: thread `digest.greeting` into `renderCover`; a small
`coverGreeting()` helper strips the trailing period so the cover's
no-punctuation mono style is preserved. On unexpected/missing values
it falls back to a generic "Hello" rather than silently re-hardcoding
a specific time of day.
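Sketched, the pair looks roughly like this (illustrative; the real
greeting computation lives in `shared/brief-filter.js` and the helper in
`server/_shared/brief-render.js`):

```javascript
// Time-of-day greeting per the thresholds quoted above
// (< 12 morning, < 18 afternoon, else evening).
function digestGreeting(localHour) {
  if (localHour < 12) return 'Good morning.';
  if (localHour < 18) return 'Good afternoon.';
  return 'Good evening.';
}

// Cover variant: strip the trailing period to match the cover's
// no-punctuation mono style; fall back to a generic "Hello" rather
// than re-hardcoding a specific time of day.
function coverGreeting(greeting) {
  if (typeof greeting !== 'string' || greeting.length === 0) return 'Hello';
  return greeting.replace(/\.$/, '');
}
```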

Tests: 5 regression cases in `tests/brief-magazine-render.test.mjs`
cover afternoon/morning/evening parity, period stripping, and HTML
escape (defense-in-depth). 60 total in that file pass. Full
test:data 5921 pass. typecheck + typecheck:api + biome clean.

* chore(brief): fix orphaned JSDoc on coverGreeting / renderCover

Greptile flagged: the original `renderCover` JSDoc block stayed above
`coverGreeting` when the helper was inserted, so the @param shape was
misattributed to the wrong function and `renderCover` was left
undocumented (plus the new `greeting` field was unlisted).

Moved the opts-shape JSDoc to immediately above `renderCover` and
added `greeting: string` to the param type. `coverGreeting` keeps its
own prose comment.

No runtime change.
2026-04-21 13:46:21 +04:00
Elie Habib
b0928f213c fix(live-webcams): refresh Iran-Attacks multicam + Mideast Mecca video IDs (#3251)
User reported two tiles showing "This live stream recording is not
available" — the pinned fallbackVideoIds had gone dark.

- iran-multicam (Iran Attacks → Middle East slot):
    FGUKbzulB_Y → KSwPNkzEgxg
- mecca (Mideast → Mecca slot):
    Cm1v4bteXbI → kJwEsQTegxk

Values supplied by the user from working YouTube live URLs. Only
`fallbackVideoId` is read by the runtime (buildEmbedUrl line 303, open-link
line 422); `channelHandle` is metadata and left as-is.
2026-04-21 13:09:28 +04:00
Elie Habib
c279f6f426 fix(pro-marketing): nav reflects auth state, hide pro banner for pro users (#3250)
* fix(pro-marketing): reflect auth state in nav, hide pro banner for pro users

Two related signed-in-experience bugs caught by the user during the
post-purchase flow:

1. /pro Navbar's SIGN IN button never reacted to auth state. The
   component was a static const Navbar = () => <nav>...</nav>; with
   no Clerk subscription, so signing in left the SIGN IN button in
   place even though the user was authenticated.

2. The "Pro is launched — Upgrade to Pro" announcement banner on the
   main app showed for ALL visitors including paying Pro subscribers.
   Pitching upgrade to a customer who already paid is a small but
   real annoyance, and it stays sticky for 7 days via the localStorage
   dismiss key — so a returning paying user dismisses it once and
   then never sees the (genuinely useful) banner again if they later
   downgrade.

## Changes

### pro-test/src/App.tsx — useClerkUser hook + ClerkUserButton

- New useClerkUser() hook subscribes to Clerk via clerk.addListener
  and returns { user, isLoaded } so any component can react to auth
  changes (sign-in, sign-out, account switch).
- New ClerkUserButton component mounts Clerk's native UserButton
  widget (avatar + dropdown with profile/sign-out) into a div via
  clerk.mountUserButton — inherits the existing dark-theme appearance
  options from services/checkout.ts::ensureClerk.
- Navbar swaps SIGN IN button for ClerkUserButton when user is
  signed in. Slot is intentionally empty during isLoaded=false to
  avoid a SIGN IN → avatar flicker for returning users.
- Hero hides its redundant SIGN IN CTA when signed in; collapses to
  just "Choose Plan" which is the relevant action for returning users.
- Public/pro/ rebuilt to ship the change (per PR #3229's bundle-
  freshness rule).

### src/components/ProBanner.ts — premium-aware show + reactive auto-hide

- showProBanner returns early if hasPremiumAccess() — same authoritative
  signal used by the frontend's panel-gating layer (unions API key,
  tester key, Clerk pro role, AND Convex Dodo entitlement).
- onEntitlementChange listener auto-dismisses the banner if a Convex
  snapshot arrives mid-session that flips the user to premium (e.g.
  Dodo webhook lands while they're sitting on the dashboard). Does NOT
  write the dismiss timestamp, so the banner reappears correctly if
  they later downgrade.

## Test plan

### pro-test (sign-in UI)

- [ ] Anonymous user loads /pro → SIGN IN button visible in nav.
- [ ] Click SIGN IN, complete Clerk modal → button replaced with
      Clerk's UserButton avatar dropdown.
- [ ] Open dropdown, click Sign Out → reverts to SIGN IN button.
- [ ] Hard reload as signed-in user → SIGN IN button never flashes;
      avatar appears once Clerk loads.

### main app (banner gating)

- [ ] Anonymous user loads / → "Pro is launched" banner shows.
- [ ] Click ✕ to dismiss → banner stays dismissed for 7 days
      (existing behavior preserved).
- [ ] Pro user (active Convex entitlement) loads / → banner does
      NOT appear, regardless of dismiss state.
- [ ] Free user opens /, then completes checkout in another tab and
      Convex publishes the entitlement snapshot → banner auto-hides
      in the dashboard tab without reload.
- [ ] Pro user whose subscription lapses (validUntil < now) → banner
      reappears on next page load, since dismiss timestamp wasn't
      written by the entitlement-change auto-hide.

* fix(pro-banner): symmetric show/hide on entitlement change

Reviewer caught that the previous iteration only handled the upgrade
direction (premium snapshot → hide banner) but never re-showed the
banner on a downgrade. App.ts calls showProBanner() once at init, so
without a symmetric show path, a session that started premium and
then lost entitlement (cancellation, billing grace expiry, plan
downgrade for the same user) would stay banner-less for the rest of
the SPA session — until a full reload re-ran App.ts init.

Net effect of the bug: the comment claiming "the banner reappears
correctly if they later downgrade or the entitlement lapses" was
false in practice for any in-tab transition.

Two changes:

  1. Cache the container on every showProBanner() call, including
     the early-return paths. App.ts always calls showProBanner()
     once at init regardless of premium state, so this guarantees
     the listener has the container reference even when the initial
     mount was skipped (premium user, dismissed, in iframe).

  2. Make onEntitlementChange handler symmetric:
       - premium snapshot + visible → hide (existing behavior)
       - non-premium snapshot + not visible + cached container +
         not dismissed + not in iframe → re-mount via showProBanner

The non-premium re-mount goes through showProBanner() so it gets the
same gate checks as the initial path (isDismissed, iframe, premium).
We can never surface a banner the user has already explicitly ✕'d
this week.

Edge cases handled:
  - User starts premium, no banner shown, downgrades mid-session
    → listener fires, premium false, no bannerEl, container cached,
       not dismissed → showProBanner mounts banner ✓
  - User starts free, sees banner, upgrades mid-session
    → listener fires, premium true, bannerEl present → fade out ✓
  - User starts free, dismisses banner, upgrades, downgrades
    → listener fires on downgrade, premium false, no bannerEl,
       container cached, isDismissed=true → showProBanner returns early ✓
  - User starts free, banner showing, multiple entitlement snapshots
    arrive without state change → premium=false && bannerEl present,
    neither branch fires, idempotent no-op ✓
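
The symmetric handler, sketched (illustrative; in the real code the
re-mount path goes through showProBanner(), which performs the
dismissed/iframe/premium gate checks itself rather than inline here):

```javascript
// Stand-in UI object models bannerEl visibility, the cached container,
// and the dismiss/iframe gates described above.
function onEntitlementSnapshot(state, ui) {
  if (state.premium && ui.bannerVisible) {
    ui.hide(); // upgrade direction: fade the banner out
  } else if (
    !state.premium && !ui.bannerVisible &&
    ui.containerCached && !ui.dismissed && !ui.inIframe
  ) {
    ui.show(); // downgrade direction: re-mount the banner
  }
  // Any other combination: idempotent no-op.
}
```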

* fix(pro-banner): defer initial mount while entitlement is loading

Greptile P1 round-2: hasPremiumAccess() at line 48 reads isEntitled()
synchronously, but the Convex entitlement subscription is fired
non-awaited at App.ts:868 (`void initEntitlementSubscription()`).
showProBanner() runs at App.ts:923 during init Phase 1, before the
first Convex snapshot arrives.

So a Convex-only paying user (Clerk role 'free' + Dodo entitlement
tier=1) sees this sequence:

  t=0    init runs → hasPremiumAccess() === false (isEntitled() reads
         currentState===null) → "Upgrade to Pro" banner mounts
  t=~1s  Convex snapshot arrives → onEntitlementChange fires → my
         listener detects premium=true && bannerEl !== null → fade out

That's a 1+ second flash of "you should upgrade!" content for someone
who has already paid. Worst case is closer to ~10s on a cold-start
Convex client, long enough that the upgrade pitch reads as the actual
UI rather than a flash.

Defer the initial mount when (1) the user is signed in (so they
plausibly have a Convex entitlement) AND (2) the entitlement state
hasn't loaded yet (currentState === null). The existing
onEntitlementChange listener will mount it later if the first
snapshot confirms the user is actually free.

Two reasons this is gated on "signed in":
  - Anonymous users will never have a Convex entitlement, so
    deferring would mean the banner NEVER mounts for them. Bad
    regression: anon visitors are the highest-value audience for
    the upgrade pitch.
  - For signed-in users, the worst case if no entitlement EVER
    arrives is the banner stays absent — which is identical to a
    paying user's correct state, so it fails-closed safely.
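
The defer decision reduces to a small predicate (a sketch; `isSignedIn`
and `entitlementState` stand in for the real Clerk session check and the
Convex client's `currentState`):

```javascript
// Defer the initial banner mount only when the user might have a Convex
// entitlement that has not loaded yet. Anonymous users never defer, so
// the highest-value audience always sees the banner immediately.
function shouldDeferInitialMount(isSignedIn, entitlementState) {
  return isSignedIn && entitlementState === null;
}
```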

Edge case behavior:
  - Anonymous user: no Clerk session → first condition false →
    banner mounts immediately ✓
  - Signed-in free user with first snapshot pre-loaded somehow:
    second condition false → banner mounts immediately ✓
  - Signed-in user, snapshot pending: deferred → listener mounts
    on first snapshot if user turns out free ✓
  - Signed-in user, snapshot pending, user turns out premium: never
    mounted ✓ (the desired path)
  - Signed-in user, snapshot pending, never arrives (Convex outage):
    banner never shows → see above, this fails-closed safely
2026-04-21 11:01:57 +04:00
Elie Habib
6977e9d0fe fix(gateway): accept Dodo entitlement as pro, not just Clerk role — unblocks paying users (#3249)
* fix(gateway): accept Dodo entitlement as pro, not just Clerk role

The gateway's legacy premium-paths gate (lines 388-401) was rejecting
authenticated Bearer users with 403 "Pro subscription required"
whenever session.role !== 'pro' — which is EVERY paying Dodo
subscriber, because the Dodo webhook pipeline writes Convex
entitlements and does NOT sync Clerk publicMetadata.role.

So the flow was:
  - User pays, Dodo webhook fires, Convex entitlement tier=1 written
  - User loads the dashboard, Clerk token includes Bearer but role='free'
  - Gateway sees role!=='pro' → 403 on every intelligence/trade/
    economic/sanctions premium endpoint
  - User sees a blank dashboard despite having paid

This is the exact split-brain documented at the frontend layer
(src/services/panel-gating.ts:11-27): "The Convex entitlement check
is the authoritative signal for paying customers — Clerk
`publicMetadata.plan` is NOT written by our webhook pipeline". The
frontend was fixed by having hasPremiumAccess() fall through to
isEntitled() from Convex. The backend gateway still had the
Clerk-role-only gate, so paying users got rejected even though
their Convex entitlement was active.

Align the gateway gate with the logic already in
server/_shared/premium-check.ts::isCallerPremium (line 44-49):

  1. If Clerk role === 'pro' → allow (fast path, no Redis/Convex I/O)
  2. Else if session.userId → look up Convex entitlement; allow if
     tier >= 1 AND validUntil >= Date.now() (covers lapsed subs)
  3. Else → 403
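
Sketched (illustrative; the real gate looks the entitlement up via
getEntitlements, and the entitlement shape here is an assumption):

```javascript
// Two-signal premium gate from the steps above. The entitlement arg
// stands in for the result of the Convex getEntitlements lookup.
function isCallerPremiumSketch(session, entitlement) {
  if (session.role === 'pro') return true;           // fast path, no I/O
  if (!session.userId || !entitlement) return false; // fail closed
  return entitlement.tier >= 1 && entitlement.validUntil >= Date.now();
}
```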

Same two-signal semantics as the per-handler isCallerPremium, so
the gateway and handlers can't disagree on who is premium. Uses
the already-imported getEntitlements function (line 345 already
imports it dynamically; promoting to top-level import since the new
site is in a hotter path).

Impact: unblocks all Dodo subscribers whose Clerk role is still
'free' — the common case after any fresh Pro purchase and for
every user since webhook-based role sync was never wired up.

Reported 2026-04-21 post-purchase flow: user completed Dodo payment,
landed back on dashboard, saw 403s on get-regional-snapshot,
get-tariff-trends, list-comtrade-flows, get-national-debt,
deduct-situation — all 5 are in PREMIUM_RPC_PATHS but not in
ENDPOINT_ENTITLEMENTS, so they hit this legacy gate.

* fix(gateway): move entitlement fallback to the gate that actually fires

Reviewer caught that the previous iteration of this fix put the
entitlement fallback at line ~400, inside an `if (sessionUserId &&
!keyCheck.valid && needsLegacyProBearerGate)` branch that's
unreachable for the case the PR was supposed to fix:

  - sessionUserId is only resolved when isTierGated is true (line 292)
    — JWKS lookup is intentionally skipped for non-tier-gated paths.
  - needsLegacyProBearerGate IS the non-tier-gated set
    (PREMIUM_RPC_PATHS && !isTierGated).
  - So sessionUserId is null, the branch never enters, and the actual
    legacy-Bearer rejection still happens earlier at line 367 inside
    the `keyCheck.required && !keyCheck.valid` branch.

Move the entitlement fallback INTO the line-367 check, where the
Bearer is already being validated and `session.userId` is already
exposed on the validateBearerToken() result. No extra JWKS round-trip
needed (validateBearerToken already verified the JWT). The previously-
added line-400 block is removed since it never ran.

Now for a paying Dodo subscriber whose Clerk role is still 'free':
  - Bearer validates → role !== 'pro'
  - Fall through: getEntitlements(session.userId) → tier=1, validUntil future
  - allowed = true, request proceeds to handler

Same fail-closed semantics as before for the negative cases:
  - Anonymous → no Bearer → 401
  - Bearer with invalid JWT → 401
  - Free user with no Dodo entitlement → 403
  - Pro user whose Dodo subscription lapsed (validUntil < now) → 403

* chore(gateway): drop redundant dynamic getEntitlements import

Greptile spotted that the previous commit promoted getEntitlements to
a top-level import for the new line-385 fallback site, but the older
dynamic import at line 345 (in the user-API-key entitlement check
branch) was left in place. Same module, same symbol, so the dynamic
import is now dead weight that just adds a microtask boundary to the
hot path.

Drop it; line 345's `getEntitlements(sessionUserId)` call now resolves
through the top-level import like the line-385 site already does.
2026-04-21 10:55:09 +04:00
Elie Habib
4d9ae3b214 feat(digest): topic-grouped brief ordering (size-first) (#3247) 2026-04-21 08:58:02 +04:00
Elie Habib
ee93fb475f fix(portwatch): cut HISTORY_DAYS 90 to 60 so per-country fits 540s budget (#3246)
Prod log 2026-04-21T00:02Z on the standalone portwatch-port-activity
service confirmed the per-country shape at 90 days still doesn't fit
even in its own 540s container:

  batch 1/15: 7 seeded, 5 errors (90.0s)
  batch 5/15: 42 seeded, 18 errors (392.6s)
  [SIGTERM at 540s after ~batch 7]

Math: avg ~75s/batch × 15 batches = 1125s needed vs 540s available.
Degradation guard would reject the ~50-country partial publish against
a prev snapshot of ~150+ countries.

60 days is the minimum window that still covers both aggregates the UI
consumes: last30 (days 0-30, all current metrics) + prev30 (days 30-60,
for trendDelta). Cutting from 90d→60d drops each per-country query by
~33% in row count and page count. Expected avg batch time ~50s.

No feature regression: last7/anomalySignal were already no-ops because
ArcGIS's Daily_Ports_Data max date lags 10+ days behind now, so no row
ever falls in the last-7-day window regardless of HISTORY_DAYS.

Test added asserting HISTORY_DAYS=60 so an accidental revert breaks CI.
55 portwatch tests pass. Typecheck + lint clean.
2026-04-21 06:37:40 +04:00
Elie Habib
56a792bbc4 docs(marketing): bump source-count claims from 435+ to 500+ (#3241)
Feeds.ts is at 523 entries after PR #3236 landed. The "435+" figure
has been baked into marketing copy, docs, press kit, and localized
strings for a long time and is now noticeably understated. Bump to
500+ as the new canonical figure.

Also aligned three stale claims in less-visited docs:
  docs/getting-started.mdx        70+ RSS feeds        => 500+ RSS feeds
  docs/ai-intelligence.mdx        344 sources          => 500+ sources
  docs/COMMUNITY-PROMOTION-GUIDE  170+ news feeds      => 500+ news feeds
                                  170+ news sources    => 500+ news sources

Also bumped the digest-dedup copy from 400+ to 500+ (English + French
locales + pro-test/index.html prerendered body) for consistency with
the pricing and GDELT panels.

Left alone on purpose (different metric):
  22 services / 22 service domains
  24 feeds (security-advisory seeder specifically)
  31 sources (freshness tracker)
  45 map layers

Rebuilt /pro bundle so the per-locale chunks + prerendered index.html
under public/pro/assets ship the new copy. 20 locales updated.
2026-04-20 22:39:42 +04:00
Elie Habib
d880f6a0e7 refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs (#3238)
* refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs

seed-aviation.mjs was misnamed: it wrote to a dead Redis key while the
51-airport AviationStack loop + ICAO NOTAM loop lived hidden inside
ais-relay.cjs, duplicating the NOTAM write already done by
seed-airport-delays.mjs.

Make seed-aviation.mjs the single home for every aviation Redis key:
  aviation:delays:intl:v3     (AviationStack 51 intl — primary)
  aviation:delays:faa:v1      (FAA ASWS 30 US)
  aviation:notam:closures:v2  (ICAO NOTAM 60 global)
  aviation:news::24:v1        (9 RSS feeds prewarmer)

One unified AIRPORTS registry (~85 entries) replaces the three separate lists.
Notifications preserved via wm:events:queue LPUSH + SETNX dedup; prev-state
migrated from in-process Sets to Redis so short-lived cron runs don't spam
on every tick. ICAO quota-exhaustion backoff retained.

Contracts preserved byte-identically for consumers (AirportDelayAlert shape,
seed-meta:aviation:{intl,faa,notam} meta keys, runSeed envelope writes).

Impact: kills ~8,640/mo wasted AviationStack calls (dead-key writes), strips
~490 lines of hidden seed logic from ais-relay, eliminates duplicate NOTAM
writer. Net -243 lines across three files.

Railway steps after merge:
  1. Ensure seed-aviation service env has AVIATIONSTACK_API + ICAO_API_KEY.
  2. Delete/disable the seed-airport-delays Railway service.
  3. ais-relay redeploys automatically; /aviationstack + /notam live proxies
     for user-triggered flight lookups preserved.

* fix(aviation): preserve last-good intl snapshot on unhealthy/skipped fetch + restore NOTAM quota-exhaust handling

Review feedback on PR #3238:

(1) Intl unhealthy → was silently overwriting aviation:delays:intl:v3 with
    an empty or partial snapshot because fetchAll() always returned
    { alerts } and zeroIsValid:true let runSeed publish. Now:
      • seedIntlDelays() returns { alerts, healthy, skipped } unchanged
      • fetchAll() refuses to publish when !healthy || skipped:
          - extendExistingTtl([INTL_KEY, INTL_META_KEY], INTL_TTL)
          - throws so runSeed enters its graceful catch path (which also
            extends these TTLs — idempotent)
      • Per-run cache (cachedRun) short-circuits subsequent withRetry(3)
        invocations so the retries don't burn 3x NOTAM quota + 3x FAA/RSS
        fetches when intl is sick.

(2) NOTAM quota exhausted — PR claimed "preserved" but only logged; the
    NOTAM data key was drifting toward TTL expiry and seed-meta was going
    stale, which would flip api/health.js maxStaleMin=240 red after 4h
    despite the intended 24h backoff window. Now matches the pre-strip
    ais-relay behavior byte-for-byte:
      • extendExistingTtl([NOTAM_KEY], NOTAM_TTL)
      • upstashSet(NOTAM_META_KEY, {fetchedAt: now, recordCount: 0,
        quotaExhausted: true}, 604800)
    Consumers keep serving the last known closure list; health stays green.

Also added extendExistingTtl fallbacks on FAA/NOTAM network-rejection paths
so transient network failures also don't drift to TTL expiry.

* refactor(aviation): move secondary writes + notifications into afterPublish

Review feedback on PR #3238: fetchAll() was impure — it wrote FAA / NOTAM /
news and dispatched notifications during runSeed's fetch phase, before the
canonical aviation:delays:intl:v3 publish ran. If that later publish failed,
consumers could see fresh FAA/NOTAM/news alongside a stale intl key, and
notifications could fire for a run whose primary key never published,
breaking the "single home / one cron tick" atomic contract.

Restructure:
  • fetchAll() now pure — returns { intl, faa, notam, news + rejection refs }.
    No Redis writes, no notifications.
  • Intl gate stays: unhealthy / skipped → throw. runSeed's catch extends
    TTL on INTL_KEY + seed-meta:aviation:intl and exits 0. afterPublish
    never runs, so no side effects escape.
  • publishTransform extracts { alerts } from the bundle for the canonical
    envelope; declareRecords sees the transformed shape.
  • afterPublish handles ALL secondary writes (FAA, NOTAM, news) and
    notification dispatch. Runs only after a successful canonical publish.
  • Per-run memo (cachedBundle) still short-circuits withRetry(3) retries
    so NOTAM quota isn't burned 3x when intl is sick.

NOTAM quota-exhaustion + rejection TTL-extend branches preserved inside
afterPublish — same behavior, different location.

* refactor(aviation): decouple FAA/NOTAM/news side-cars from intl's runSeed gate

Review feedback on PR #3238: the previous refactor coupled all secondary
outputs to the AviationStack primary key. If AVIATIONSTACK_API was missing
or intl was systemically unhealthy, fetchAll() threw → runSeed skipped
afterPublish → FAA/NOTAM/news all went stale despite their own upstream
sources being fine. Before consolidation, FAA and NOTAM each ran their own
cron and could freshen independently. This restores that independence.

Structure:
  • Three side-car runners: runFaaSideCar, runNotamSideCar, runNewsSideCar.
    Each acquires its own Redis lock (aviation:faa / aviation:notam /
    aviation:news — distinct from aviation:intl), fetches its source,
    writes data-key + seed-meta on success, extends TTL on failure,
    releases the lock. Completely independent of the AviationStack path.
  • NOTAM side-car keeps the quota-exhausted + rejection handling and
    dispatches notam_closure notifications inline.
  • main() runs the three side-cars sequentially, then hands off to runSeed
    for intl. runSeed still process.exit()s at the end so it remains the
    last call.
  • Intl's afterPublish now only dispatches aviation_closure notifications
    (its single responsibility).

Removed: the per-run memo for fetchAll (no longer needed — withRetry now
only re-runs the intl fetch, not FAA/NOTAM/RSS).

Net behavior:
  • AviationStack 500s / missing key → FAA, NOTAM, news still refresh
    normally; only aviation:delays:intl:v3 extends TTL + preserves prior
    snapshot.
  • ICAO quota exhausted → NOTAM extends TTL + writes fresh meta (as before);
    FAA/intl/news unaffected.
  • FAA upstream failure → only FAA extends TTL; other sources unaffected.

* fix(aviation): correct Gaborone ICAO + populate FAA alert meta from registry

Greptile review on PR #3238:

P1: GABS is not the ICAO for Gaborone — the value was faithfully copied
    from the pre-strip ais-relay NOTAM list which was wrong. Botswana's
    ICAO prefix is FB; the correct code is FBSK. NOTAM queries for GABS
    would silently exclude Gaborone from closure detection. (Pre-existing
    bug in the repo; fixing while in this neighborhood.)

P2 (FAA alerts): Now that the unified AIRPORTS registry carries
    icao/name/city/country for every FAA airport, use it. Previous code
    returned icao:'', name:iata, city:'' — consumers saw bare IATA codes
    for US-only alerts. Registry lookup via a new FAA_META map; lat/lon
    stays 0,0 by design (FAA rows aren't rendered on the globe, so lat/lon
    is intentionally absent from those registry rows).

P2 (NOTAM TTL on quota exhaustion): already fixed in commit ba7ed014e
    (pre-decouple) — confirmed line 803 calls extendExistingTtl([NOTAM_KEY])
    and line 805 writes fresh meta with quotaExhausted=true.
2026-04-20 22:37:49 +04:00
Elie Habib
ecd56d4212 feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast (#3236)
* feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast

Four direct-RSS sources verified from a clean IP and absent everywhere
in the repo (src/config/feeds.ts, scripts/seed-*, ais-relay.cjs, RSS
allowlist). Closes the highest-ROI Iran / Israel domestic-press gap
from the ME source audit (PR #3226) with zero infra changes.

- IRNA        https://en.irna.ir/rss
- Mehr News   https://en.mehrnews.com/rss
- Jerusalem Post https://www.jpost.com/rss/rssfeedsheadlines.aspx
- Ynetnews    https://www.ynetnews.com/Integration/StoryRss3089.xml

Propaganda-risk metadata:
- IRNA + Mehr tagged high / Iran state-affiliated (join Press TV).
- JPost + Ynetnews tagged low with knownBiases for transparency.

RSS allowlist updated in all three mirrors (shared/, scripts/shared/,
api/_rss-allowed-domains.js) per the byte-identical mirror contract
enforced by tests/edge-functions.test.mjs.

Deferred (separate PRs):
- Times of Israel: already in allowlist; was removed from feeds for
  cloud-IP 403. Needs Decodo routing.
- IDF Spokesperson: idf.il has no direct RSS endpoint; needs scraper.
- Tasnim / Press TV RSS / Israel Hayom: known cloud-IP blocks.
- WAM / SPA / KUNA / QNA / BNA: public RSS endpoints are dead; sites
  migrated to SPAs or gate with 403.

Plan doc (PR #3226) overstated the gap: it audited only feeds.ts and
missed that travel advisories + US Embassy alerts are already covered
by scripts/seed-security-advisories.mjs. NOTAM claim in that doc is
also wrong: we use ICAO's global NOTAM API, not FAA.

* fix(feeds): enable IRNA, Mehr, Jerusalem Post, Ynetnews by default

Reviewer on #3236 flagged that adding the four new ME feeds to
FULL_FEEDS.middleeast alone leaves them disabled on first run, because
App.ts:661 persists computeDefaultDisabledSources() output derived from
DEFAULT_ENABLED_SOURCES. Users would have to manually re-enable via
Settings > Sources, defeating the purpose of broadening the default
ME mix.

Add the four new sources to DEFAULT_ENABLED_SOURCES.middleeast so they
ship on by default. Placement keeps them adjacent to their peers
(IRNA / Mehr with the other Iran sources, JPost / Ynetnews after
Haaretz). Risk/slant tags already in SOURCE_PROPAGANDA_RISK ensure
downstream digest dedup + summarization weights them correctly.

* style(feeds): move JPost + Ynetnews under Low-risk section header

Greptile on #3236 flagged that both entries are risk: 'low' but were
inserted above the `// Low risk - Independent with editorial standards`
comment header, making the section boundary misleading for future
contributors. Shift them under the header where they belong.

No runtime change; cosmetic ordering only.
2026-04-20 19:07:09 +04:00
Elie Habib
661bbe8f09 fix(health): nationalDebt threshold 7d → 60d — match monthly cron interval (#3237)
* fix(health): nationalDebt threshold 7d → 60d to match monthly cron cadence

User reported health showing:
  "nationalDebt": { status: "STALE_SEED", records: 187, seedAgeMin: 10469, maxStaleMin: 10080 }

Root cause: api/health.js had `maxStaleMin: 10080` (7 days) on a seeder
that runs every 30 days via seed-bundle-macro.mjs:
  { label: 'National-Debt', intervalMs: 30 * DAY, ... }

The threshold was narrower than the cron interval, so every month
between days 8–30 it guaranteed STALE_SEED. Original comment
"7 days — monthly seed" even spelled the mismatch out loud.

Data source cadence:
- US Treasury debt_to_penny API: updates daily but we only snapshot latest
- IMF WEO: quarterly/semi-annual release — no value in checking daily
- 30-day cron is appropriate; stale threshold should be ≥ 2× interval

Fix: bump maxStaleMin to 86400 (60 days). Matches the 2× pattern used
by faoFoodPriceIndex + recovery pillar (recoveryFiscalSpace, etc.)
which also run monthly.

Also fixes the same mismatch in scripts/regional-snapshot/freshness.mjs —
the 10080 ceiling there would exclude national-debt from capital_stress
axis scoring 23 days out of every 30 between seeds.

* fix(seed-national-debt): raise CACHE_TTL to 65d so health.js stale window is actually reachable

PR #3237 review was correct: my earlier fix set api/health.js
SEED_META.nationalDebt.maxStaleMin to 60d (86400min), but the seeder's
CACHE_TTL was still 35d. After a missed monthly cron, the canonical key
expired at day 35 — long before the 60d "stale" threshold. Result path:
  hasData=false → api/health.js:545-549 → status = EMPTY (crit)
Not STALE_SEED (warn) as my commit message claimed.

writeFreshnessMetadata() in scripts/_seed-utils.mjs:222 sets meta TTL to
max(7d, ttlSeconds), so bumping ttlSeconds alone propagates to both the
canonical payload AND the meta key.

Fix:
- CACHE_TTL 35d → 65d (5d past the 60d stale window so we get a clean
  STALE_SEED → EMPTY transition without keys vanishing mid-warn).
- runSeed opts.maxStaleMin 10080 (7d) → 86400 (60d) so the in-seeder
  declaration matches api/health.js. Field is only validated for
  presence by runSeed (scripts/_seed-utils.mjs:798), but the drift was
  what hid the TTL invariant in the first place.

Invariant this restores: for any SEED_META entry,
  seeder CACHE_TTL ≥ maxStaleMin + buffer
so the "warn before crit" gradient actually exists.

* fix(freshness): wire national-debt to seed-meta + teach extractTimestamp about seededAt

Reviewer P2 on PR #3237: my earlier freshness.mjs bump to 86400 was a
no-op. classifyInputs() (scripts/regional-snapshot/freshness.mjs:100-108,
122-132) uses the entry's metaKey or extractTimestamp()'s known field
list. national-debt had neither — payload carries only `seededAt`, and
extractTimestamp didn't know that field, so the "present but undated"
branch treated every call as fresh. The age window never mattered.

Two complementary fixes:

1. Add metaKey: 'seed-meta:economic:national-debt' to the freshness
   entry. Primary, authoritative source — seed-meta.fetchedAt is
   written by writeFreshnessMetadata() on every successful run, which is
   also what api/health.js reads, keeping both surfaces consistent.

2. Add `seededAt` to extractTimestamp()'s field list. Defense-in-depth:
   many other runSeed-based scripts (seed-iea-oil-stocks,
   seed-eurostat-country-data, etc.) wrap output as { ..., seededAt: ISO }
   with no metaKey in the freshness registry. Without this, they were
   also silently always-fresh. ISO strings parse via Date.parse.
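A minimal sketch of the extractTimestamp behaviour after this change, assuming a simplified field list (the real function in scripts/regional-snapshot/freshness.mjs knows more fields than shown here):

```javascript
// Illustrative known-field list; the seededAt entry is the addition
// this commit describes.
const TIMESTAMP_FIELDS = ['fetchedAt', 'updatedAt', 'seededAt'];

function extractTimestamp(payload) {
  if (!payload || typeof payload !== 'object') return null;
  for (const field of TIMESTAMP_FIELDS) {
    const value = payload[field];
    if (typeof value !== 'string') continue;
    // Date.parse handles ISO strings; a stringified epoch number
    // (the eu-gas-storage shape) parses to NaN and falls through.
    const ms = Date.parse(value);
    if (!Number.isNaN(ms)) return ms;
  }
  return null; // undated payload: caller decides, no silent "fresh"
}
```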

Note: `economic:eu-gas-storage:v1` uses `seededAt: String(Date.now())` —
a stringified epoch number, which Date.parse does NOT handle. That seed's
freshness classification is still broken by this entry's lack of metaKey,
but it's a separate shape issue out of scope here. Flagged in PR body.
2026-04-20 19:03:47 +04:00
Elie Habib
42a86c5859 fix(preview): skip premium RPCs when main app runs inside /pro live-preview iframe (#3235)
* fix(preview): skip premium RPCs when main app runs inside /pro preview iframe

pro-test/src/App.tsx embeds the full main app as a "live preview"
via <iframe src="https://worldmonitor.app?alert=false" sandbox="...">.
The iframe boots an anonymous main-app session, which fires premium
RPCs (get-regional-snapshot, get-tariff-trends, list-comtrade-flows,
and on country-click the fetchProSections batch) with no Clerk bearer
available. Every call 401s, the circuit breakers catch and fall
through to empty fallbacks (so the preview renders fine), but the
401s surface on the PARENT /pro page's DevTools console and Sentry
because `sandbox` includes `allow-same-origin`.

Net effect: /pro pricing page shows a flood of fake-looking errors
that cost us a session of debugging to trace back to the iframe.
PR #3233's premiumFetch swap didn't help here (there's simply no
token to inject for an anonymous iframe).

Introduce `src/utils/embedded-preview.ts::IS_EMBEDDED_PREVIEW`, a
module-level boolean evaluated once at load from
`window.top !== window` (with try/catch for cross-origin sandboxes),
and short-circuit three init-time premium entry points when true:

  - RegionalIntelligenceBoard.loadCurrent → renderEmpty()
  - fetchTariffTrends → return emptyTariffs
  - fetchComtradeFlows → return emptyComtrade

Plus one defensive gate in country-intel.fetchProSections for the
case a user clicks a country inside the iframe preview.

Each gate returns the exact same empty fallback the breaker would
have produced after a 401, so visual behavior is unchanged — the
preview iframe still shows the dashboard layout with empty premium
panels, just without the network request and its console/Sentry
trail.

Live-tab /pro page should now see zero 401s from regional-snapshot /
tariff-trends / comtrade-flows on load.

* fix(preview): narrow iframe gate to ?embed=pro-preview marker only

Reviewer flagged that the first iteration's `window.top !== window`
check was too broad. The repo explicitly markets "Embeddable iframe
panels" as an Enterprise feature
(pro-test/src/locales/en.json: whiteLabelDesc), so legitimate
customer embeds must keep firing premium RPCs normally. Only the /pro
marketing preview — which is known-anonymous and generates expected
401 noise — should short-circuit.

Fix: replace the blanket iframe check with a unique marker that only
/pro's preview iframe carries.

  - pro-test/src/App.tsx: iframe src switched from
    `?alert=false` (dead param, unused in main app) to
    `?embed=pro-preview`. Rebuilt public/pro/ to ship the change.

  - src/utils/embedded-preview.ts: two-gate check now. Gate 1 still
    requires `window.top !== window` so the marker leaking into a
    top-level URL doesn't disable premium RPCs for the top-level app.
    Gate 2 requires `?embed=pro-preview` in location.search so only
    the known embedder matches. Enterprise white-label embeds without
    this marker behave exactly like a top-level visit.
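The two-gate check could look roughly like this; the real module evaluates it once at load from the global window, while this sketch takes an injected window-like object so it can be exercised directly:

```javascript
// Sketch of the two-gate IS_EMBEDDED_PREVIEW computation described
// above (src/utils/embedded-preview.ts is the real home).
function computeIsEmbeddedPreview(win) {
  // Gate 1: must actually be inside an iframe. A cross-origin parent
  // can make `win.top` access throw, which still means "framed".
  let framed = false;
  try {
    framed = win.top !== win;
  } catch {
    framed = true;
  }
  if (!framed) return false;
  // Gate 2: only /pro's preview iframe carries this marker, so
  // enterprise white-label embeds keep firing premium RPCs normally.
  return new URLSearchParams(win.location.search).get('embed') === 'pro-preview';
}
```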

Same three premium fetchers + the one country-intel path still gate
on IS_EMBEDDED_PREVIEW; the semantic change is purely in how the
flag is computed.

Per PR #3229 / #3228 lesson, the pro-test rebuild ships in the same
PR as the source change — public/pro/assets/index-*.js and index.html
reflect the new iframe src.
2026-04-20 17:49:50 +04:00
Elie Habib
240abaa8ed fix(premium): route RPC clients through premiumFetch — stop 401s for pro users (#3233)
* fix(premium): route premium RPC clients through premiumFetch

Four generated-client instantiations were using plain globalThis.fetch,
bypassing the Clerk bearer / tester-key / WORLDMONITOR_API_KEY injection
chain. Signed-in pro users hit the premium endpoints unauthenticated
and got 401, with no visible path to recovery:

  - src/components/RegionalIntelligenceBoard.ts
      → get-regional-snapshot, get-regime-history, get-regional-brief
  - src/components/DeductionPanel.ts
      → deduct-situation, list-market-implications
  - src/services/trade/index.ts
      → get-tariff-trends, list-comtrade-flows (+ non-premium siblings)
  - src/app/country-intel.ts::fetchProSections
      → get-national-debt, getRegimeHistory/getRegionalBrief,
        list-comtrade-flows

Swap each to `fetch: premiumFetch` (src/services/premium-fetch.ts),
which tries in order: existing auth header → WORLDMONITOR_API_KEY →
tester key → Clerk bearer token → unauthenticated passthrough. For
non-premium endpoints that share the same client (e.g. getTradeFlows,
getCustomsRevenue) the fallthrough behavior is identical to plain
globalThis.fetch — no regression surface.
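The injection order can be sketched as follows; the credential getters here are stand-ins, not the real src/services/premium-fetch.ts internals:

```javascript
// Hedged sketch of the premiumFetch fallthrough chain: first source
// that yields a token wins, otherwise the request passes through
// unauthenticated (the non-premium sibling case).
async function premiumFetchSketch(input, init = {}, deps = {}) {
  const headers = new Headers(init.headers);
  if (!headers.has('Authorization')) {
    const token =
      deps.apiKey ??                                      // static API key
      deps.testerKey ??                                   // tester key
      (await deps.getClerkToken?.().catch(() => null)) ?? // Clerk bearer
      null;
    if (token) headers.set('Authorization', `Bearer ${token}`);
  }
  return (deps.fetchImpl ?? globalThis.fetch)(input, { ...init, headers });
}
```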

This surfaced as user-facing 401s on the /pro sign-in → redirect-to-/
flow: pro users briefly saw the dashboard try to fetch, then hit 401. After
this fix the bearer token flows through and regional/deduction/trade
panels populate as expected.

Left untouched (not hitting premium paths today, so not blocking):
  - src/services/gdelt-intel.ts (searchGdeltDocuments)
  - src/services/social-velocity.ts (getSocialVelocity)
  - src/services/pizzint.ts (getPizzintStatus)

If any of those ever move into PREMIUM_RPC_PATHS, swap them too.

* fix(premium): split trade client + disable premium-breaker persistCache

Review found a real access-control leak in the first iteration of this
PR: routing the entire shared TradeServiceClient through premiumFetch
populated module-level breakers with `persistCache: true` and
auth-invariant cache keys. A pro user's get-tariff-trends /
list-comtrade-flows response would be written to localStorage by the
breaker, and a later free / signed-out session on the same browser
would be served that cached premium data directly, bypassing both
auth injection and the gateway's entitlement check.

Two-layer fix so neither the in-memory breaker nor the persistent
cache can leak premium data across auth states:

1. **Split clients.** publicClient keeps plain globalThis.fetch and
   feeds restrictions/flows/barriers/revenue breakers (non-premium,
   shareable across users). premiumClient uses premiumFetch and is
   ONLY used by fetchTariffTrends + fetchComtradeFlows.

2. **Disable persistCache for premium breakers.** tariffsBreaker and
   comtradeBreaker flip to `persistCache: false`. In-memory cache
   within a session is still fine (and expected for circuit-breaker
   behavior), but the response no longer survives a reload / cross-
   session switch where a different user could read it.

Both changes are needed: split clients alone would still let premium
responses ride through the old cached entries (if any) after the
deploy; persistCache:false alone would still mean a shared client
routes anonymous calls through premiumFetch (minor but avoidable
token leak potential). Together they're airtight for the leak vector.

Follow-up potentially worth doing: auth-keyed cache keys for breakers
used by premium data, so the in-tab SPA sign-out case is also sealed.
Not blocking today.

* fix(premium): invalidate in-memory premium breakers on Clerk identity change

Review caught the remaining leak vector: persistCache:false + split
clients closes cross-browser-reload leaks, but the module-level
tariffsBreaker/comtradeBreaker in-memory cache still lives for the
full 30-min / 6-hour TTL inside a single SPA session. A pro user
loads tariff/comtrade data → signs out in the same tab → any caller
is served the pro response from memory without re-auth.

Track the Clerk user id that last populated the premium breakers in a
module-level `lastPremiumUserId`. On every call to fetchTariffTrends
or fetchComtradeFlows, check the current Clerk identity via
getCurrentClerkUser(). If it changed (sign-out, user switch,
free↔pro transition), call clearMemoryCache() on both premium
breakers before executing. The breaker then falls back to a live
fetch through premiumFetch with the new caller's credentials.

`clearMemoryCache` (not `clearCache`) is deliberate — it only touches
the in-memory cache and leaves persistent storage alone. Non-premium
breakers on this same client (restrictions, flows, barriers, revenue)
are untouched: their responses are public and shareable across auth
states, so an identity-triggered clear would only cost us cache hits
with zero security benefit.

Edge cases handled:
  - First call: lastPremiumUserId is undefined, no clear.
  - Anonymous → anonymous: no clear (both null).
  - Anonymous → pro: clears (defensive; anon can't populate anyway
    because emptyTariffs/emptyComtrade fail shouldCache).
  - Pro → anonymous: clears (the critical case).
  - Pro A → pro B: clears (account switch).
  - getCurrentClerkUser throws (Clerk not loaded yet): treated as
    anonymous, safe default.

Closes the audit cycle on this PR's cache-leak thread.

* fix(premium): invalidate breakers on entitlement change, not just user id

Reviewer caught that the prior fingerprint was `userId` only. That
covers sign-out, user switch, and account change — but NOT the case
the prior commit's comment explicitly claimed: a pro→free downgrade
for the same signed-in user (subscription cancellation, billing
grace-period expiry, plan switch to annual-lapsed). The Clerk user id
doesn't change, so the invalidation never fired and the cached pro
response kept serving until the 30-min / 6-hour breaker TTL.

Widen the fingerprint to `${userId}:${plan}` (or 'anon' when
signed-out). `getCurrentClerkUser()` already reads `plan` off
`user.publicMetadata`, so no new Clerk calls needed.

Now covered:
  - pro A → pro B               (userId change)
  - pro → anon                  (userId → null)
  - anon → pro                  (defensive)
  - pro (active) → free (cancelled or expired) for same userId  ← new
  - free → pro (upgrade for same userId)                        ← new

Pro users who actively downgrade (or whose subscription lapses)
during an open tab will see the cached premium response invalidate
on the next premium fetcher call, at which point the live fetch
through premiumFetch + the gateway's entitlement check returns the
correct empty/403 response for their now-free plan.

* fix(premium): fingerprint on hasPremiumAccess, not Clerk publicMetadata.plan

Reviewer caught that Clerk publicMetadata.plan is NOT the authoritative
premium signal in this codebase. Per the explicit docstring on
src/services/panel-gating.ts::hasPremiumAccess (lines 11-27), the
webhook pipeline does NOT write publicMetadata.plan — the authoritative
signal is the Convex Dodo entitlement surfaced by isProUser() /
isEntitled(). Two fingerprint blind spots that ate the prior iteration:

  - Paying user with valid Dodo entitlement but no Clerk metadata:
    publicMetadata.plan === 'free' → fingerprint 'uid:free' → no
    invalidation when their session transitions (they never looked
    premium by the fingerprint, but premiumFetch was still injecting
    their token and the gateway was still serving premium data).
  - Active pro user whose Dodo subscription lapses: Clerk metadata
    doesn't change → fingerprint stays 'uid:pro' → no invalidation
    → cached tariff/comtrade response keeps serving until TTL
    (the exact pro→free case the previous commit claimed to cover).

Swap the plan source from getCurrentClerkUser().plan to
hasPremiumAccess() — the single source of truth used by panel-gating,
widgets, search, and event handlers. It unions API key, tester keys,
Clerk pro role, AND Convex entitlement, so every path that legitimately
grants premium access contributes to the fingerprint, and every path
that revokes it triggers invalidation.

Also add a reactive path: subscribe to onEntitlementChange() from
src/services/entitlements, and wipe the premium breakers the moment
Convex publishes a new entitlement snapshot. This closes the window
between subscription lapse and the user's next premium panel click —
the currently-open tariff panel clears its memory cache immediately
instead of serving stale pro data until the user navigates.

Combined: fingerprint is now (userId, hasPremiumAccess) tuple
evaluated both lazily on every premium fetcher call AND eagerly when
Convex pushes an entitlement change.
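The lazy half of that scheme can be sketched as a check run before each premium fetch; the names below are illustrative, and the real version also wires the eager onEntitlementChange() path:

```javascript
// Sketch of the (userId, hasPremiumAccess) fingerprint invalidation.
// getUserId / hasPremiumAccess / clearBreakers are stand-ins for the
// repo's actual helpers.
function makePremiumInvalidator({ getUserId, hasPremiumAccess, clearBreakers }) {
  let lastFingerprint;
  return function checkIdentity() {
    let fp;
    try {
      fp = `${getUserId() ?? 'anon'}:${hasPremiumAccess() ? 'premium' : 'free'}`;
    } catch {
      fp = 'anon:free'; // auth layer not loaded yet: safe anonymous default
    }
    if (lastFingerprint !== undefined && lastFingerprint !== fp) {
      clearBreakers(); // wipe in-memory premium caches before fetching
    }
    lastFingerprint = fp;
  };
}
```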
2026-04-20 16:28:04 +04:00
Elie Habib
d1ebc84c6c feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.

Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:

    1 <-> 5: 0.632    5 <-> 8: 0.692
    1 <-> 8: 0.500    5 <-> 10: 0.656
    1 <-> 10: 0.554   8 <-> 10: 0.510

Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.

Measured on the 12 real titles from that brief:

    Algorithm                 | Clusters | F1    | P    | R
    --------------------------|----------|-------|------|------
    complete-link @ 0.55 (was)|        7 | 0.526 | 0.56 | 0.50
    complete-link @ 0.50      |        6 | 0.435 | 0.38 | 0.50
    single-link   @ 0.55      |        4 | 0.435 | 0.28 | 1.00  over-merge
    single-link   @ 0.60      |        6 | 0.727 | 0.67 | 0.80  winner

Change
------
scripts/lib/brief-dedup-embed.mjs:
  New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
  union-find. Chain merges through strong intermediates when a
  direct pair is weak; respects the entity veto (blocked pairs
  don't union). O(N^2 alpha(N)); permutation-invariant by
  construction.
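A minimal sketch of the union-find approach, assuming items carry an `embedding` array (the real singleLinkCluster in scripts/lib/brief-dedup-embed.mjs differs in details):

```javascript
// Union-find single-link clustering: union every pair whose cosine
// clears the threshold, unless the veto hook blocks that pair.
// Chained merges through strong intermediates come for free.
function singleLinkClusterSketch(items, { cosineThreshold, vetoFn } = {}) {
  const parent = items.map((_, i) => i);
  const find = (x) => (parent[x] === x ? x : (parent[x] = find(parent[x])));
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
    return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
  };
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (cosine(items[i].embedding, items[j].embedding) < cosineThreshold) continue;
      if (vetoFn?.(items[i], items[j])) continue; // blocked pairs never union
      parent[find(i)] = find(j);
    }
  }
  const clusters = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(item);
  });
  return [...clusters.values()];
}
```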

scripts/lib/brief-dedup.mjs:
  New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
  'complete' to revert). readOrchestratorConfig returns 'clustering'
  field. Dispatch at call site picks the right function. Structured
  log line now includes clustering=<algo>.

tests/brief-dedup-embedding.test.mjs:
  +8 regressions:
    - singleLinkCluster chains the 4-way through a bridge
    - veto blocks unions even when cosine passes
    - permutation-invariance property test (5 shuffles)
    - empty-input
    - DIGEST_DEDUP_CLUSTERING default is 'single'
    - DIGEST_DEDUP_CLUSTERING=complete kill switch works
    - unrecognised values fall back to 'single'
    - log line includes clustering=<algo>

Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if bridge chaining ever resurfaces.

Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:

    DIGEST_DEDUP_COSINE_THRESHOLD=0.60

No other changes needed — clustering=single is the default.

Verification
------------
- npm run test:data           5825/5825 pass
- tests/brief-dedup-embedding  53/53   pass (45 existing + 8 new)
- typecheck + typecheck:api   clean
- biome check on changed files clean

Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
  — confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
  current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
  / Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
  no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73

Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
2026-04-20 16:21:20 +04:00
Elie Habib
d7393d8010 fix(pro): downgrade @clerk/clerk-js to v5 to restore auto-mount UI (#3232)
The actual root cause behind the "Clerk was not loaded with Ui
components" sign-in failure on /pro is NOT the import path — it's
that pro-test was on @clerk/clerk-js v6.4.0 while the main app
(which works fine) is on v5.125.7.

Clerk v6 fundamentally changed `clerk.load()`: the UI controller
is no longer auto-mounted by default. Both `@clerk/clerk-js` (the
default v6 entry) and `@clerk/clerk-js/no-rhc` (the bundled-UI
variant) expect the caller to either:
  - load Clerk's UI bundle from CDN and pass `window.__internal_ClerkUICtor`
    to `clerk.load({ ui: { ClerkUI } })`, or
  - manually wire up `clerkUICtor`.

That's why my earlier "switch to no-rhc" fix (PR #3227 + #3228)
didn't actually unbreak production — both v6 variants throw the same
assertion. The error stack on the deployed bundle confirmed it:
`assertComponentsReady` from `clerk.no-rhc-UeQvd9Xf.js`.

Fix: pin pro-test to `@clerk/clerk-js@^5.125.7` to match the main
app's working version. v5 still auto-mounts UI on `clerk.load()` —
no extra wiring needed. The plain `import { Clerk } from '@clerk/clerk-js'`
pattern (which the main app uses verbatim and which pro-test had
before #3227) just works under v5.

Verification of the rebuilt bundle (chunk: clerk-PNSFEZs8.js):
  - 3.05 MB (matches main app's clerk-DC7Q2aDh.js: 3.05 MB)
  - 44 occurrences of mountComponent (matches main: 44)
  - 3 occurrences of SignInComponent (matches main: 3)
  - 0 occurrences of "Clerk was not loaded with Ui" (the assertion
    error string is absent; UI is unconditionally mounted)

Includes the rebuilt public/pro/ artifacts so this fix is actually
deployed (PR #3229's CI check will catch any future PR that touches
pro-test/src without rebuilding).
2026-04-20 15:25:11 +04:00
Elie Habib
0a4eff0053 feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape could
not fit the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.

Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple
retries returned HTTP_STATUS 000). Not a reliable path.

The only shape ArcGIS reliably handles is per-country WHERE ISO3='X'
AND date > Y (uses the ISO3 index). Its problem was fitting 174
countries in the 420s portwatch bundle budget — solve that by giving
it its own container.

Changes:

- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the
  full Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY failure class that has
  bitten us 3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new
  assertions on the standalone bundle + Dockerfile + retry wrapper +
  ISO3 shape). Full test:data: 5883 pass. Typecheck + lint clean.
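The retry wrapper's contract (one retry, one error class) can be sketched like this; the error-shape detection is an assumption for illustration, not the real ArcGIS response parsing:

```javascript
// Sketch of fetchWithRetryOnInvalidParams: retry exactly once, and
// only for the specific intermittent ArcGIS error class. Everything
// else propagates unchanged.
async function fetchWithRetryOnInvalidParamsSketch(doFetch) {
  const isInvalidParams = (err) =>
    /invalid query parameters/i.test(err?.message ?? '');
  try {
    return await doFetch();
  } catch (err) {
    if (!isInvalidParams(err)) throw err; // other error classes: no retry
    return doFetch(); // single retry for the known-transient class
  }
}
```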

Post-merge Railway provisioning: see header of
seed-bundle-portwatch-port-activity.mjs for the 7-step checklist.
2026-04-20 15:21:43 +04:00
Elie Habib
234ec9bf45 chore(ci): enforce pro-test bundle freshness — local hook + CI backstop (#3229)
* chore(ci): enforce pro-test bundle freshness, prevent silent deploy staleness

public/pro/ is committed to the repo and served verbatim by Vercel.
The root build script only runs the main app's vite build — it does
NOT run pro-test's build. So any PR that changes pro-test/src/**
without manually running `cd pro-test && npm run build` and committing
the regenerated chunks ships to production with a stale bundle.

This footgun just cost us: PR #3227 fixed the Clerk "not loaded with
Ui components" sign-in bug in source, merged, deployed — and the live
site still threw the error because the committed chunk under
public/pro/assets/ was the pre-fix build. PR #3228 fix-forwarded by
rebuilding.

Two-layer enforcement so it doesn't happen again:

1. .husky/pre-push — mirrors the existing proto freshness block.
   If pro-test/ changed vs origin/main, rebuild and `git diff
   --exit-code public/pro/`. Blocks the push with a clear message if
   the bundle is stale or untracked files appear.

2. .github/workflows/pro-bundle-freshness.yml — CI backstop on any
   PR touching pro-test/** or public/pro/**. Runs `npm ci + npm run
   build` in pro-test and fails the check if the working tree shows
   any diff or untracked files under public/pro/. Required before
   merge, so bypassing the local hook still can't land a stale
   bundle.

Note: the hook's diff-against-origin/main check means it skips the
build when pushing a branch that already matches main on pro-test/
(e.g. fix-forward branches that only touch public/pro/). CI covers
that case via its public/pro/** path filter.

* fix(hooks): scope pro-test freshness check to branch delta, not worktree

The first version of this hook used `git diff --name-only origin/main
-- pro-test/`, which compares the WORKING TREE to origin/main. That
fires on unstaged local pro-test/ scratch edits and blocks pushing
unrelated branches purely because of dirty checkout state.

Switch to `$CHANGED_FILES` (computed earlier at line 77 from
`git diff origin/main...HEAD`), which scopes the check to commits on
the branch being pushed. This matches the convention the test-runner
gates already use (lines 93-97). Also honor `$RUN_ALL` as the safety
fallback when the branch delta can't be computed.

* fix(hooks): trigger pro freshness check on public/pro/ too, match CI

The first scoping fix used `^pro-test/` only, but the CI workflow
keys off both `pro-test/**` AND `public/pro/**`. That left a gap: a
bundle-only PR (e.g. a fix-forward rebuild like #3228, or a hand-edit
to a committed asset) skipped the local check entirely while CI would
still validate it. The hook and CI are now consistent.

Trigger condition: `^(pro-test|public/pro)/` — the rebuild + diff
check now fires whenever the branch delta touches either side of the
source/artifact pair, matching the CI workflow's path filter.
2026-04-20 15:21:25 +04:00
Elie Habib
9e022f23bb fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
  "cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }

despite the 30-min warm-ping loop running. Two bugs stacked:

1. get-cable-health.ts null-upstream path didn't write Redis.
   cachedFetchJson with a returning-null fetcher stores NEG_SENTINEL
   (10 bytes) in cable-health-v1 for 2 min. Handler then returned
   `fallbackCache || { cables: {} }` to the client WITHOUT writing to
   cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
   → strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).

   Fix: on null result, write the fallback response back to CACHE_KEY
   (short TTL matching NEG_SENTINEL so a recovered NGA fetch can
   overwrite immediately) AND refresh seed-meta with the real count.
   Health now sees hasData=true during an outage.
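   The writeback path can be sketched as follows; the cache interface
   and key names are stand-ins for the real get-cable-health.ts helpers:

```javascript
// Sketch of the null-upstream writeback: persist the fallback (short
// TTL) and refresh seed-meta so health sees hasData=true and an honest
// record count during an outage.
async function getCableHealthSketch({ fetchUpstream, cache, fallback }) {
  const CACHE_KEY = 'cable-health-v1';  // illustrative key name
  const SHORT_TTL_SEC = 120;            // matches the 2-min sentinel window
  const result = await fetchUpstream().catch(() => null);
  if (result !== null) {
    await cache.set(CACHE_KEY, result);
    return result;
  }
  const response = fallback ?? { cables: {} };
  // Write the fallback back so health reads real data, not a sentinel;
  // short TTL lets a recovered upstream fetch overwrite immediately.
  await cache.set(CACHE_KEY, response, SHORT_TTL_SEC);
  await cache.set('seed-meta:cable-health', {
    fetchedAt: new Date().toISOString(),
    records: Object.keys(response.cables).length, // honest count, may be 0
  });
  return response;
}
```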

2. Zero-cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
   the valid healthy state — NGA had no active subsea cable warnings.
   The old `Math.max(count, 1)` on recordCount was an intentional lie
   to sidestep this; now honest.

   Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
   pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
   valid, not critical". recordCount now reports actual cables.length.

Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.

Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
2026-04-20 15:21:04 +04:00
Elie Habib
2b7f83fd3e fix(pro): regenerate /pro bundle with no-rhc Clerk so deploy reflects #3227 (#3228)
PR #3227 fixed pro-test/src/services/checkout.ts to import
@clerk/clerk-js/no-rhc instead of the headless main export, but the
deployed bundle in public/pro/assets/ was never regenerated. The
Vercel deploy ships whatever is committed under public/pro/ — the
root build script does not run pro-test's vite build — so
production /pro continued serving the old broken clerk-C6kUTNKl.js
even after #3227 merged. Sign-in still threw "Clerk was not loaded
with Ui components".

Rebuild: cd pro-test && npm run build, which writes the new chunks
to ../public/pro/assets/. Deletes the stale clerk-C6kUTNKl.js +
index-J1JYVlDk.js, adds clerk.no-rhc-UeQvd9Xf.js +
index-CFLOgmG-.js, and updates pro/index.html to reference them.
2026-04-20 14:24:44 +04:00
Elie Habib
7979b4da0e fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts on /pro (#3227)
* fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts

The /pro marketing page was throwing "Clerk was not loaded with Ui
components" the moment an unauthenticated user clicked Sign In or
GET STARTED on a pricing tier, blocking every conversion.

@clerk/clerk-js v6 main export (`dist/clerk.mjs`) is the headless
build — it has no UI controller and expects `clerkUICtor` passed to
`clerk.load()`. Calling `openSignIn()` on it always throws. The
bundled-with-UI variant is exposed at `@clerk/clerk-js/no-rhc`
(same `Clerk` named export, drop-in).

Also adds explicit `Sentry.captureException` at both call sites,
because the rejection was being swallowed by `.catch(console.error)`
in App.tsx and by an unwrapped `c.openSignIn()` in checkout.ts —
which is why this regression had zero Sentry trail in production.

* fix(pro): catch Clerk load failures in startCheckout, not just openSignIn

The PricingSection CTA fires-and-forgets `startCheckout()` with no
.catch. The previous fix only wrapped `c.openSignIn()`, so any
rejection from `await ensureClerk()` (dynamic import failure, network
loss mid-load, clerk.load() throwing) still escaped as an unhandled
promise — defeating the Sentry coverage we added.

Now `startCheckout()` reports both load and openSignIn failures
explicitly and returns false rather than rejecting.

Also clear the cached `clerkLoadPromise` on failure so the next
button click can retry from scratch instead of replaying a rejected
promise forever.

* fix(pro): only publish Clerk instance after load() succeeds

_loadClerk() was assigning the module-level `clerk` singleton before
awaiting `clerk.load()`. If load() rejected (transient network failure,
malformed publishable key, Clerk frontend-api 4xx/5xx), the half-
initialized instance stayed cached. The next ensureClerk() call then
short-circuited on `if (clerk) return clerk;` and returned the broken
instance, bypassing the retry path that commit 70d96d380 added.

Hold the new instance in a local var, await load(), only then publish
to the module slot. A failed load now leaves `clerk` null and the
cleared `clerkLoadPromise` allows a genuine retry on the next click.
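The publish-after-load pattern can be sketched like this, with `createInstance` standing in for `new Clerk(publishableKey)`:

```javascript
// Sketch: hold the instance locally, await load(), and only then
// publish to the module-level slot. A failed load leaves the slot
// null and clears the in-flight promise so the next call retries.
function makeClerkLoader(createInstance) {
  let clerk = null;
  let clerkLoadPromise = null;
  return async function ensureClerk() {
    if (clerk) return clerk;
    if (!clerkLoadPromise) {
      clerkLoadPromise = (async () => {
        const instance = createInstance(); // local var, not published yet
        await instance.load();             // may reject (network, bad key)
        clerk = instance;                  // publish only after success
        return instance;
      })().catch((err) => {
        clerkLoadPromise = null;           // allow a genuine retry next call
        throw err;
      });
    }
    return clerkLoadPromise;
  };
}
```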
2026-04-20 13:45:46 +04:00
Elie Habib
1928b48e68 feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy (#3225)
* feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy

Follow-up to #3222 (stabiliser) — the real fix. Production log
2026-04-20T06:48-06:55 confirmed the stabilisers worked (per-country
cap enforced at 90.0s, SIGTERM printed stage+errors, abort propagated
through fetch + proxy paths) — but also proved the per-country shape
itself is the bug:

  batch 1/15: 7 seeded, 5 errors (90.0s)   ← per-country cap hit cleanly
  batch 5/15: 40 seeded, 20 errors (371.3s) ← 4 batches / ~70s avg
  SIGTERM at batch 6/15 after 420s

15 batches × ~70s = 1050s. Section budget is 420s. Per-country will
never fit, even with a perfectly-behaving ArcGIS. Three countries
(BRA, IDN, NGA) also returned "Invalid query parameters" on the
ISO3-filtered WHERE — a failure mode unique to the per-country shape.

Fix: replace 174 per-country round-trips with a single paginated pass
over EP3, grouped by ISO3 in memory (same pattern EP4 refs already
use via `fetchAllPortRefs`). ~150-200 sequential pages × ~1s each
≈ 2-4 min total wall time inside the 420s section. Eliminates the
per-country failure modes by construction.

Changes:

- New `fetchAllActivityRows(since, { signal, progress })`: paginated
  `WHERE date > <ts>` across the whole Daily_Ports_Data feature server,
  grouped by attributes.ISO3 into Map<iso3, rows[]>. Advances offset by
  actual features.length (same server-cap defence as EP4). Checks
  signal.aborted between pages.
- `fetchAll()` now reads the global map and drives `computeCountryPorts`
  for each eligible ISO3. No concurrency primitives, no batch loop, no
  Promise.allSettled.
- Dropped: `processCountry`, `withPerCountryTimeout`, per-country
  `fetchActivityRows`, CONCURRENCY, PER_COUNTRY_TIMEOUT_MS,
  BATCH_LOG_EVERY. All dead under the global pattern.
- `progress` shape now `{ stage, pages, countries }`. SIGTERM handler
  logs "SIGTERM during stage=<x> (pages=N, countries=M)" — still
  useful forensics if the global paginator itself hangs.
- Shutdown controller: `main()` creates an AbortController, threads
  its signal through fetchAll → fetchAllActivityRows → fetchWithTimeout
  → _proxy-utils, and the SIGTERM handler calls abort() so in-flight
  HTTP work stops instead of burning SIGKILL grace window. Reuses the
  signal-threading plumbing shipped in #3222.
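The global paginator's core loop can be sketched as below (assumed shape; `fetchPage` stands in for the ArcGIS query). The key defence is advancing the offset by the actual number of returned features, not the requested page size, so a server-side page cap cannot silently skip rows:

```javascript
// Minimal sketch of the single-pass paginator + in-memory groupBy
// (illustrative; the real function also threads query params and
// progress reporting). Checks signal.aborted between pages.
async function fetchAllActivityRows(fetchPage, { signal } = {}) {
  const byIso3 = new Map();
  let offset = 0;
  for (;;) {
    if (signal?.aborted) throw signal.reason ?? new Error('aborted');
    const features = await fetchPage(offset);
    if (features.length === 0) break;
    for (const f of features) {
      const iso3 = f.attributes.ISO3;
      if (!byIso3.has(iso3)) byIso3.set(iso3, []);
      byIso3.get(iso3).push(f);
    }
    offset += features.length;   // server-cap defence: advance by actual count
  }
  return byIso3;
}
```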

Preserved: degradation guard (>20% drop rejects), TTL extension on
failure, lock release in finally, 429 proxy fallback with signal
propagation, page-level abort checks.

Tests: 43 pass (dropped 2 withPerCountryTimeout runtime tests that
targeted removed code; kept proxyFetch pre-aborted-signal test since
the proxy plumbing is still exercised by the global fetch). Full
test:data 5865 pass. Typecheck + lint clean.

* fix(portwatch): stream-aggregate EP3 into per-port accumulators (PR #3225 P1)

Review feedback on PR #3225: the first globalisation pass materialised
the full 90-day activity dataset as Map<iso3, Feature[]> before any
aggregation. At ~2000 ports × 90 days ≈ 180k feature objects × ~400
bytes = ~70MB RSS. Trades the timeout failure mode for an OOM/restart
under large datasets on the 1GB Railway container.

Fix: replace the two-phase "fetch-all then compute" shape with a
single-pass streaming aggregator.

- `fetchAndAggregateActivity(since, { signal, progress })` folds each
  page's features into Map<iso3, Map<portId, PortAccum>> inline and
  discards the raw features. Only ~2000 per-port accumulators (~100
  bytes each = ~200KB) live across pages. Memory is O(ports), not
  O(rows).
- PortAccum holds running counters for each aggregation window:
  last30_calls, last30_count, last30_import, last30_export, prev30_calls,
  last7_calls, last7_count. Captured once per port at first sighting
  (date ASC order preserves old `rows[0].portname` behaviour).
- New `finalisePortsForCountry(portAccumMap, refMap)` — exported for
  tests — computes the exact same per-port fields as the removed
  `computeCountryPorts`: tankerCalls30d = last30_calls,
  tankerCalls30dPrev = prev30_calls, import/export from their sums,
  avg30d = last30_calls/last30_count, avg7d = last7_calls/last7_count,
  anomalySignal unchanged, trendDelta unchanged, top-50 truncation
  unchanged.
- `fetchAll()` now calls the aggregator + finaliser directly; the
  transient Feature[] map is gone.
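The per-page fold can be sketched as follows (field names assumed for illustration; the real accumulator carries all seven counters). The point is that only the accumulators survive a page, never the raw features:

```javascript
// Sketch of the streaming fold: each page's features update running
// per-port counters and are then discarded. Memory is O(ports), not O(rows).
const DAY_MS = 86_400_000;

function foldPage(accByIso3, features, nowMs) {
  for (const { attributes: a } of features) {
    let ports = accByIso3.get(a.ISO3);
    if (!ports) accByIso3.set(a.ISO3, (ports = new Map()));
    let acc = ports.get(a.portid);
    if (!acc) {
      // captured once at first sighting (date ASC preserves rows[0] semantics)
      ports.set(a.portid, (acc = { portname: a.portname, last30_calls: 0, last30_count: 0 }));
    }
    const ageDays = (nowMs - a.date) / DAY_MS;
    if (ageDays <= 30) {
      acc.last30_calls += a.portcalls;
      acc.last30_count += 1;
    }
  }
  return accByIso3;
}
```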

Preserved from PR #3225: shutdown AbortController plumbing, 429 proxy
fallback with signal propagation, degradation guard, SIGTERM
diagnostic flush, page-level abort checks.

Tests: 48 pass (was 43, +5 runtime tests for finalisePortsForCountry
covering trendDelta, anomalySignal, top-N truncation, and missing
refMap entries). Full test:data 5877 pass. Typecheck + lint clean.

* fix(portwatch): skip EP3 geometry + thread signal into EP4 refs (PR #3225 review)

Two valid findings from PR #3225 review on commit fca1eaba0:

P1 — `returnGeometry: 'false'` was present on the EP4 refs paginator
but missing on the EP3 activity paginator. ArcGIS returns geometry by
default (~100-200KB per page). Across the ~150-200 pages this PR adds
to the perf-critical path, that's tens of MB of unused coordinate
data on the wire — directly undermining the 420s section budget the
globalisation is meant to fit inside. One-line fix on the EP3 params
block.

P2 — `fetchAllPortRefs` accepted no signal, so during the 'refs'
stage a SIGTERM would abort the shutdownController without
cancelling any in-flight EP4 fetches. Each page could still run up
to FETCH_TIMEOUT (45s) after the handler fired. Low blast radius
(process.exit terminates the container regardless) but the PR
description claimed full signal threading end-to-end; this closes
the last gap.

Tests: 50 pass (was 48, +2 for the two new assertions —
returnGeometry presence in both paginators, fetchAllPortRefs
signal plumbing). Full test:data 5879 pass. Typecheck + lint clean.
2026-04-20 13:44:42 +04:00
Elie Habib
1a2295157e feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief (#3224)
* feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief

Problem
-------
The 2026-04-20 08:00 brief contained 12 stories, 7 of which were
duplicates of 4 events, alongside low-importance filler (niche
commodity, domestic crime). notification-relay's IMPORTANCE_SCORE_MIN
gate (#3223, set to 63) only applies to the realtime fanout path.
The digest cron reads the same story:track:*.currentScore but has
NO absolute score floor — it just ranks and applies `slice(0, 30)`, so on
slow news days low-importance items bubble up to fill slots.

Change
------
scripts/seed-digest-notifications.mjs:
  - New getDigestScoreMin() reads DIGEST_SCORE_MIN env at call time
    (Railway flips apply on the next cron tick, no redeploy).
  - Default 0 = no-op, so this PR is behaviour-neutral until the
    env var is set on Railway.
  - Filter runs AFTER deduplicateStories() so it drops clusters by
    the REPRESENTATIVE's score (which is the highest-scoring member
    of its cluster per materializeCluster's sort).
  - One-line operator log when the floor fires:
      [digest] score floor dropped N of M clusters (DIGEST_SCORE_MIN=X)
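The call-time env read and post-dedup filter can be sketched as (behaviour per the tests listed below; names follow the commit, the body is illustrative):

```javascript
// Sketch of the score floor: read DIGEST_SCORE_MIN at call time so
// Railway env flips apply on the next cron tick. Non-integer or
// negative values degrade to 0 (filter off).
function getDigestScoreMin() {
  const n = Number(process.env.DIGEST_SCORE_MIN);
  return Number.isInteger(n) && n > 0 ? n : 0;
}

function applyScoreFloor(clusters) {
  const min = getDigestScoreMin();
  if (min === 0) return clusters;   // short-circuit: no wasted filter pass
  return clusters.filter((c) => c.currentScore >= min);
}
```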

tests/digest-score-floor.test.mjs (6 regressions):
  - getDigestScoreMin reads from process.env (not a module const)
  - default is 0 (no-op)
  - rejects non-integer / negative values (degrades to 0)
  - filter runs AFTER dedup, BEFORE slice(0, DIGEST_MAX_ITEMS)
  - short-circuits when floor is 0 (no wasted filter pass)
  - log line emits "dropped N of M clusters"

Operator activation
-------------------
Set on Railway seed-digest-notifications service:

    DIGEST_SCORE_MIN=63

Start at 63 to match the realtime gate, then nudge up/down based
on the log lines over ~24h. Unset = off (pre-PR behaviour).

Why not bundle a cosine-threshold bump
--------------------------------------
The cosine-threshold tuning (0.60 -> 0.55 per the threshold probe)
is an env-only flip already supported by the dedup orchestrator.
Bundling an env-default change into this PR would slow rollback.
Operator sets DIGEST_DEDUP_COSINE_THRESHOLD=0.55 on Railway as a
separate action; this PR stays scoped to the score floor.

Verification
------------
- npm run test:data            5825/5825 pass
- tests/digest-score-floor      6/6 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api   clean
- biome check on changed files clean (pre-existing main() complexity
  warning on this file is unchanged)
- lint:md                      0 errors
- version:check                OK

Post-Deploy Monitoring & Validation
-----------------------------------
- **What to monitor** after setting DIGEST_SCORE_MIN on Railway:
  - `[digest] score floor dropped` lines — expect ~5-25% of
    clusters dropped on bulk-send ticks (stories=700+)
  - `[digest] Cron run complete: N digest(s) sent` stays > 0
- **Expected healthy behaviour**
  - 0-5 clusters dropped on normal ~80-story ticks
  - 50-200 dropped on bulk 700+ story ticks
  - brief still reports 10-30 stories for PRO users
- **Failure signals / rollback**
  - 0 digests sent for 24h after flipping the var
  - user-visible brief now has < 10 stories
  - Rollback: unset DIGEST_SCORE_MIN on Railway dashboard (instant,
    no deploy), next cron tick reverts to unfiltered behaviour
- **Validation window**: 24h
- **Owner**: koala73

Related
-------
- #3218 LLM prompt upgrade (source of importanceScore quality)
- #3221 geopolitical scope for critical
- #3223 notification-relay realtime gate (mirror knob)
- #3200 embedding-based dedup (the other half of brief quality)

* fix(digest): return null (not []) when score floor drains every cluster

Greptile P2 finding on PR #3224.

When DIGEST_SCORE_MIN is set high enough to filter every cluster,
buildDigest previously returned [] (empty array). The caller's
`if (!stories)` guard only catches falsy values, so [] slipped
past the "No stories in window" skip-log and the run reached
formatDigest([], nowMs) which returns null, then silently continued
at the !storyListPlain check.

Flow was still correct (no digest sent) but operators lost the
observability signal to distinguish "floor too high" from "no news
today" from "dedup ate everything".

Fix:
- buildDigest now returns null when the post-floor list is empty,
  matching the pre-dedup-empty path. Caller's existing !stories
  guard fires the canonical skip-log.
- Emits a distinct `[digest] score floor dropped ALL N clusters
  (DIGEST_SCORE_MIN=X) — skipping user` line BEFORE the return,
  so operators can spot an over-aggressive floor in the logs.
- Test added covering both the null-return contract and the
  distinct "dropped ALL" log line.
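The falsy-guard gap is easy to reproduce in miniature (sketch only; `buildDigestSketch` is a stand-in for the real buildDigest):

```javascript
// Why [] slipped past: an empty array is truthy, so only null/undefined
// reaches the "No stories in window" skip path.
function shouldSkip(stories) {
  return !stories;                      // the caller's existing guard
}

function buildDigestSketch(clusters, scoreMin) {
  const kept = clusters.filter((c) => c.score >= scoreMin);
  if (kept.length === 0) return null;   // was: return [] — slipped the guard
  return kept;
}
```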

7/7 dedup-score-floor tests pass.
2026-04-20 10:19:03 +04:00
Elie Habib
4f38ee5a19 fix(portwatch): per-country timeout + SIGTERM progress flush (#3222)
* fix(portwatch): per-country timeout + SIGTERM progress flush

Diagnosed from Railway log 2026-04-20T04:00-04:07: Port-Activity section hit
the 420s section cap with only batch 1/15 logged. Gap between batch 1 (67.3s)
and SIGTERM was 352s of silence — batch 2 stalled because Promise.allSettled
waits for the slowest country and processCountry had no per-country budget.
One slow country (USA/CHN with many ports × many pages under ArcGIS EP3
throttling) blocked the whole batch and cascaded to the section timeout,
leaving batches 2..15 unattempted.

Two changes, both stabilisers ahead of the proper fix (globalising EP3):

1. Wrap processCountry in Promise.race against a 90s PER_COUNTRY_TIMEOUT_MS.
   Bounds worst-case batch time at ~90s regardless of ArcGIS behaviour.
   Orphan fetches keep running until their own AbortSignal.timeout(45s)
   fires — acceptable since the process exits soon after either way.

2. Share a `progress` object between fetchAll() and the SIGTERM handler so
   the kill path flushes batch index, seeded count, and the first 10 error
   messages. Past timeout kills discarded the errors array entirely,
   making every regression undiagnosable.
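The stabiliser's race can be sketched as below (the real helper is `withPerCountryTimeout`; this body is illustrative). Note that, as described above, the losing work keeps running after the race settles — containment of the orphaned fetches came in the follow-up commit:

```javascript
// Sketch of the per-country budget: race the country's work against a
// timer. The timer is cleared in finally so a fast country doesn't
// leave a dangling timeout.
function withPerCountryTimeout(work, timeoutMs, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`country ${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  return Promise.race([work(), timeout]).finally(() => clearTimeout(timer));
}
```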

* fix(portwatch): address PR #3222 P1+P2 (propagate abort, eager error flush)

Review feedback on #3222:

P1 — The 90s per-country timeout did not actually stop the timed-out
country's work; Promise.race rejected but processCountry kept paginating
with fresh 45s fetch timeouts per page, violating the CONCURRENCY=12 cap
and amplifying ArcGIS throttling instead of containing it.

Fix: thread an AbortController signal from withPerCountryTimeout through
processCountry → fetchActivityRows → fetchWithTimeout. fetchWithTimeout
combines the caller signal with AbortSignal.timeout(FETCH_TIMEOUT) via
AbortSignal.any so the per-country abort propagates into the in-flight
fetch. fetchActivityRows also checks signal.aborted between pages so a
cancel lands on the next iteration boundary even if the current page
has already resolved. Node 24 runtime supports AbortSignal.any.
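The combined-signal shape can be sketched as (requires AbortSignal.any, Node ≥ 20.3; the helper name here is hypothetical — the real code lives inside fetchWithTimeout):

```javascript
// Sketch: combine the caller's per-country signal with a per-request
// timeout so either one aborts the in-flight fetch.
function combinedSignal(callerSignal, timeoutMs) {
  return callerSignal
    ? AbortSignal.any([callerSignal, AbortSignal.timeout(timeoutMs)])
    : AbortSignal.timeout(timeoutMs);
}
```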

P2 — SIGTERM diagnostics missed failures from the currently-stuck batch
because progress.errors was only populated after Promise.allSettled
returned. A kill during the pending await left progress.errors empty.

Fix: attach p.catch(err => errors.push(...)) to each wrapped promise
before Promise.allSettled. Rejections land in the shared errors array
at the moment they fire, so a SIGTERM mid-batch sees every rejection
that has already occurred (including per-country timeouts that have
already aborted their controllers). The settled loop skips rejected
outcomes to avoid double-counting.
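The eager-flush shape can be sketched as (illustrative; the real loop also tags each error with its country):

```javascript
// Sketch: attach .catch before allSettled so rejections land in the
// shared errors array the moment they fire — a SIGTERM mid-batch sees
// every rejection that has already occurred. The settled loop counts
// only fulfilled outcomes, avoiding double-counting.
async function runBatch(promises, errors) {
  for (const p of promises) {
    p.catch((err) => errors.push(String(err?.message ?? err)));
  }
  const settled = await Promise.allSettled(promises);
  return settled.filter((s) => s.status === 'fulfilled').map((s) => s.value);
}
```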

Also exports withPerCountryTimeout with an injectable timeoutMs so the
new runtime tests can exercise the abort path at 40ms. Runtime tests
verify: (a) timer fires → underlying signal aborted + work rejects with
the per-country message, (b) work-resolves-first returns the value,
(c) work-rejects-first surfaces the real error, (d) eager .catch flush
populates a shared errors array before allSettled resolves.

Tests: 45 pass (was 38, +7 — 4 runtime + 3 source-regex).
Full test:data: 5867 pass. Typecheck + lint clean.

* fix(portwatch): abort also cancels 429 proxy fallback (PR #3222 P1 follow-up)

Second review iteration on #3222: the per-country AbortController fix
from b2f4a2626 stopped at the direct fetch() and did not reach the 429
proxy fallback. httpsProxyFetchRaw only accepted timeoutMs, so a
timed-out country could keep a CONNECT tunnel + request alive for up
to another FETCH_TIMEOUT (45s) after the batch moved on — the exact
throttling scenario the PR is meant to contain. The concurrency cap
was still violated on the slow path.

Threads `signal` all the way through:

- scripts/_proxy-utils.cjs: proxyConnectTunnel + proxyFetch accept an
  optional signal option. Early-reject if `signal.aborted` before
  opening the socket. Otherwise addEventListener('abort') destroys the
  in-flight proxy socket + TLS tunnel and rejects with signal.reason.
  Listener removed in cleanup() on all terminal paths. Refactored both
  functions around resolveOnce/rejectOnce guards so the abort path
  races cleanly with timeout and network errors without double-settle.

- scripts/_seed-utils.mjs: httpsProxyFetchRaw accepts + forwards
  `signal` to proxyFetch.

- scripts/seed-portwatch-port-activity.mjs: fetchWithTimeout's 429
  branch passes its caller signal to httpsProxyFetchRaw.

Backward compatible: signal is optional in every layer, so the many
other callers of proxyFetch / httpsProxyFetchRaw across the repo are
unaffected.
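The settle-once guard pattern can be sketched as (resolveOnce/rejectOnce are the commit's names; this body is illustrative). Whichever of abort, timeout, or network error fires first wins; later settlers become no-ops and cleanup runs exactly once:

```javascript
// Sketch of the double-settle defence used around the proxy socket paths.
function makeSettlers(resolve, reject, cleanup) {
  let settled = false;
  return {
    resolveOnce(value) {
      if (settled) return;
      settled = true;
      cleanup();                 // remove abort listener, clear timers, etc.
      resolve(value);
    },
    rejectOnce(reason) {
      if (settled) return;
      settled = true;
      cleanup();
      reject(reason);
    },
  };
}
```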

Tests: 49 pass (was 45, +4). New runtime test proves pre-aborted
signals reject proxyFetch synchronously without touching the network.
Source-regex tests assert signal threading at each layer. Full
test:data 5871 pass. Typecheck + lint clean.
2026-04-20 09:36:10 +04:00
Elie Habib
6e639274f1 feat(scoring): set data-driven score gate thresholds (82/69/63) (#3223)
* feat(scoring): set data-driven score gate thresholds (82/69/63)

Calibrated from v5 shadow-log recalibration on 2026-04-20:
  critical sensitivity: 85 → 82  (fires on Hormuz closures, ship seizures, ceasefire collapses)
  high sensitivity:     65 → 69  (fires on mass shootings, blockade enforcement, major diplomatic)
  all/default MIN:      40 → 63  (drops tutorials, domestic crime, niche commodity)

Activation: set IMPORTANCE_SCORE_LIVE=1 + IMPORTANCE_SCORE_MIN=63 on
Railway notification-relay env vars after this PR merges.

Scoring pipeline journey:
  PR #3069 — fixed stale relay score (Pearson 0.31 → 0.41)
  PR #3143 — closed /api/notify bypass
  PR #3144 — weight rebalance severity 55% (Pearson 0.41 → 0.67)
  PR #3218 — LLM prompt upgrade + cache v2
  PR #3221 — geopolitical scope for critical
  This PR — final threshold constants

* fix(scoring): use IMPORTANCE_SCORE_MIN for 'all' sensitivity threshold

Review found the hardcoded 63 for 'all' sensitivity diverged from the
IMPORTANCE_SCORE_MIN env var used at the relay ingress gate. An operator
setting IMPORTANCE_SCORE_MIN=40 would still have 'all' subscribers miss
alerts scored 40-62. Now both gates use the same env var (default 63).
2026-04-20 09:19:47 +04:00
Elie Habib
14c1314629 fix(scoring): scope "critical" to geopolitical events, not domestic tragedies (#3221)
The weight rebalance (PR #3144) amplified a prompt gap: domestic mass
shootings (e.g. "8 children killed in Louisiana") scored 88 because the
LLM classified them as "critical" (mass-casualty 10+ killed) and the
55% severity weight pushed them into the critical gate. But WorldMonitor
is a geopolitical monitor — domestic tragedies are terrible but not
geopolitically destabilizing.

Prompt change (both ais-relay.cjs + classify-event.ts):
- "critical" now explicitly requires GEOPOLITICAL scope: "events that
  destabilize international order, threaten cross-border security, or
  disrupt global systems"
- Domestic mass-casualty events (mass shootings, industrial accidents)
  moved to "high" — still important, but not critical-sensitivity alerts
- Added counterexamples: "8 children killed in mass shooting in
  Louisiana → domestic mass-casualty → high" and "23 killed in fireworks
  factory explosion → industrial accident → high"
- Retained: "700 killed in Sudan drone strikes → geopolitical mass-
  casualty in active civil war → critical"

Classify cache: v2→v3 (bust stale entries that lack geopolitical scope).
Shadow-log: v4→v5 (clean dataset for recalibration under the scoped prompt).

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 08:40:29 +04:00
Elie Habib
e2255840f6 fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter (#3220)
* fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter

Follow-up to #3217. The blanket "any maplibre frame + (hostname)" rule would
drop real failures on our self-hosted R2 PMTiles bucket or any first-party
fetch that happens to run on a maplibre-framed stack. Enumerated the actual
third-party hosts our maplibre paths fetch from (tilecache.rainviewer.com,
basemaps.cartocdn.com, tiles.openfreemap.org, protomaps.github.io) into a
module-level Set and gated the filter on membership. First-party hosts keep
surfacing.
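The membership gate can be sketched as (host list copied from this commit; the message parsing is illustrative):

```javascript
// Sketch of the allowlist gate: only the enumerated third-party tile
// hosts are suppressed; first-party hosts (e.g. the R2 PMTiles bucket)
// keep surfacing.
const MAPLIBRE_TILE_HOSTS = new Set([
  'tilecache.rainviewer.com',
  'basemaps.cartocdn.com',
  'tiles.openfreemap.org',
  'protomaps.github.io',
]);

function isSuppressedTileFailure(message) {
  const m = /^Failed to fetch \(([^)]+)\)$/.exec(message);
  return m !== null && MAPLIBRE_TILE_HOSTS.has(m[1]);
}
```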

Updated regression test to mirror real-world mixed stacks (maplibre + first-party
fetch wrapper) so the allowlist is what decides, not the pre-existing
"all frames are maplibre internals" filter which is orthogonal.

* fix(sentry): route maplibre AJAX errors past the generic vendor-only filter

Review feedback: the broader "all non-infra frames are maplibre internals"
TypeError filter at main.ts:287 runs BEFORE the new host-allowlist block and
short-circuits it for all-vendor stacks. Meaning a self-hosted R2 basemap
fetch failure whose stack is purely maplibre frames would still be silently
dropped, defeating the point of the allowlist.

Carve out the `Failed to fetch (<host>)` AJAX pattern: precompute
`isMaplibreAjaxFailure` and skip the generic vendor filter when it matches,
so the host-allowlist check is always the one that decides. Added two
regression tests covering the all-maplibre edge case both ways:
- allowlisted host + all-maplibre → still suppressed
- non-allowlisted host + all-maplibre → surfaces
2026-04-20 08:21:37 +04:00
Elie Habib
1581b2dd70 fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)
* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

PR B of the scoring recalibration plan (docs/plans/2026-04-17-002).
Builds on PR A (weight rebalance, PR #3144) which achieved Pearson
0.669. This PR targets the remaining noise in the 50-69 band where
editorials, tutorials, and domestic crime score alongside real news.

LLM prompt upgrade (both writers):
- scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level
  guidelines, content-type distinction (editorial/opinion/tutorial →
  info, domestic crime → info, mass-casualty → critical), and concrete
  counterexamples.
- server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines
  added to align the second cache writer.

Classify cache bump:
- classify:sebuf:v1: → classify:sebuf:v2: in all three locations
  (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache,
  _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally
  (24h TTL). All items reclassified within 15 min of Railway deploy.

Keyword additions (_classifier.ts):
- HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions'
  (phrase patterns — no false positives on 'sanctioned 10 individuals')
- MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword
  misalignment during cold-cache window

Shadow-log v4:
- Clean dataset for post-prompt-change recalibration. v3 rolls off via
  7-day TTL.

Deploy order: Railway first (seedClassify prewarms v2 cache immediately),
then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

Review findings on PR #3218:

P1: classify-event.ts prompt was missing 2 counterexamples and the
"Focus" line present in the relay prompt. Both writers share
classify:sebuf:v2 cache, so differing prompts mean nondeterministic
classification depending on which path writes first. Now both prompts
have identical level guidelines and counterexamples (format differs:
array vs single object, but classification logic is aligned).

P2: Removed 3 dead phrase-pattern keywords (sanctions imposed/package/
new) — the existing 'sanctions' entry already substring-matches all of
them and maps to the same (high, economic). Just dead code that misled
readers into thinking they added coverage.

P2: Updated stale v3 references in cache-keys.ts (doc block + exported
constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 08:05:13 +04:00
Elie Habib
84eec7f09f fix(health): align breadthHistory maxStaleMin with actual Tue-Sat cron schedule (#3219)
Production alarm: `breadthHistory` went STALE_SEED every Monday morning
despite the seeder running correctly. Root cause was a threshold /
schedule mismatch:

- Schedule (Railway): 02:00 UTC, Tuesday through Saturday. Five ticks
  per week, capturing Mon-Fri market close → following-day 02:00 UTC.
- Threshold: maxStaleMin=2880 (48h), assuming daily cadence.
- Max real gap: Sat 02:00 UTC → Tue 02:00 UTC = 72h. The existing 48h
  alarm fired every Monday at ~02:00 UTC when the Sun/Mon cron ticks
  are intentionally absent, until the Tue 02:00 UTC run restored
  fetchedAt.

Fix: bump maxStaleMin to 5760 (96h). 72h covers the weekend gap;
extra 24h tolerates one missed Tue run without alarming. Comment now
records the actual schedule + reasoning.
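The threshold arithmetic checks out mechanically (a worked sketch, not repo code):

```javascript
// Ticks at 02:00 UTC Tue-Sat: four 24h gaps plus the Sat -> Tue weekend
// gap of 72h. maxStaleMin must exceed 72h = 4320 min; 5760 (96h) adds
// exactly one missed-run day of headroom.
const tickDays = [2, 3, 4, 5, 6];   // Tue..Sat (0 = Sunday)
const gapsHours = tickDays.map((d, i) => {
  const next = tickDays[(i + 1) % tickDays.length];
  return ((next - d + 7) % 7) * 24;
});
const maxGapHours = Math.max(...gapsHours);
```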

No seeder change needed — logs confirm the service fires and completes
correctly on its schedule (Apr 16/17/18 02:00 UTC runs all "Done" with
3/3 readings, `Stopping Container` is normal Railway cron teardown).

Diagnostic memo: this is the class of bug where the schedule comment
lies. Original comment said "daily cron at 21:00 ET". True start time
is 22:00 EDT / 21:00 EST Mon-Fri (02:00 UTC next day) AND only Mon-Fri,
so "daily" is wrong by two days every week.
2026-04-20 07:56:54 +04:00
Elie Habib
0bc5b49267 fix(sentry): filter MapLibre AJAXError tile-fetch transients (#3217)
WORLDMONITOR-NE/NF (8 events, 1 user): MapLibre wraps transient tile
fetch failures as `TypeError: Failed to fetch (<hostname>)` and rethrows
inside a Generator-backed Promise, leaking to onunhandledrejection even
though DeckGLMap's map-error handler already logs them as warnings.
Triggered mostly by adblockers/extensions and flaky mobile networks.

Add a beforeSend filter gated on (a) the maplibre-specific paren message
format (`Failed to fetch (hostname)` — our own fetch code throws plain
`Failed to fetch` without hostname) and (b) presence of a maplibre vendor
frame in the stack, so a real first-party fetch regression with the same
message shape still surfaces. Covered by 3 regression tests.
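The two-part gate can be sketched as (the frame shape is illustrative; Sentry passes richer exception/stacktrace objects in beforeSend):

```javascript
// Sketch: suppress only when BOTH hold — the maplibre-specific
// "Failed to fetch (<hostname>)" paren format (our own fetch code throws
// plain "Failed to fetch") AND a maplibre vendor frame in the stack.
function shouldDropTileTransient(message, frames) {
  const hasParenHost = /^Failed to fetch \([^)]+\)$/.test(message);
  const hasMaplibreFrame = frames.some((f) => /maplibre/.test(f.filename ?? ''));
  return hasParenHost && hasMaplibreFrame;
}
```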
2026-04-20 07:54:42 +04:00
Elie Habib
fc0c6bc163 fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected (#3216)
* fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected

WORLDMONITOR-N3: 8 events / 2 users from server-side Convex reporting
`Uncaught Error: Authentication required` thrown by requireUserId() when
a query fires before the WebSocket auth handshake completes. Every other
business error in this repo uses ConvexError("CODE"), which Convex's
server-side Sentry integration treats as expected rather than unhandled.

Migrate requireUserId to ConvexError("AUTH_REQUIRED") (no consumer parses
the message string — only a code comment references it) and add a matching
client-side ignoreErrors pattern next to the existing API_ACCESS_REQUIRED
precedent, as defense-in-depth against unhandled rejections reaching the
browser SDK.

* fix(sentry): drop broad AUTH_REQUIRED ignoreErrors — too many real call sites

Review feedback: requireUserId() backs user-initiated actions (checkout,
billing portal, API key ops), not just the benign query-race path. A bare
`ConvexError: AUTH_REQUIRED` message-regex in ignoreErrors has no stack
context, so a genuine auth regression breaking those flows for signed-in
users would be silently dropped. The server-side ConvexError migration in
convex/lib/auth.ts is enough to silence WORLDMONITOR-N3; anything that
still reaches the browser SDK should surface.
2026-04-20 01:35:33 +04:00
Elie Habib
4c9888ac79 docs(mintlify): panel reference pages (PR 2) (#3213)
* docs(mintlify): add user-facing panel reference pages (PR 2)

Six new end-user pages under docs/panels/ for the shipped panels that
had no user-facing documentation in the published docs, per the plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md.
All claims are grounded in the live component source + SEED_META +
handler dirs — no invented fields, counts, or refresh windows.

- panels/latest-brief.mdx — daily AI brief panel (ready/composing/
  locked states). Hard-gated PRO (`premium: 'locked'`).
- panels/forecast.mdx — AI Forecasts panel (internal id `forecast`,
  label "AI Forecasts"). Domain + macro-region filter pills; 10%
  probability floor. Free on web, locked on desktop.
- panels/consumer-prices.mdx — 5-tab retail-price surface (Overview
  / Categories / Movers / Spread / Health) with market, basket, and
  7/30/90-day controls. Free.
- panels/disease-outbreaks.mdx — WHO / ProMED / national health
  ministries outbreak alerts with alert/warning/watch pills. Free.
- panels/radiation-watch.mdx — EPA RadNet + Safecast observations
  with anomaly scoring and source-confidence synthesis. Free.
- panels/thermal-escalation.mdx — FIRMS/VIIRS thermal clusters with
  persistence and conflict-adjacency flags. Free.

Also:
- docs/docs.json — new Panels nav group (Latest Brief, AI Forecasts,
  Consumer Prices, Disease Outbreaks, Radiation Watch, Thermal
  Escalation).
- docs/features.mdx — cross-link every panel name in the Cmd+K
  inventory to its new page (and link Country Instability + Country
  Resilience from the same list).
- docs/methodology/country-resilience-index.mdx — short "In the
  dashboard" bridge section naming the three CRI surfaces
  (Resilience widget, Country Deep-Dive, map choropleth) so the
  methodology page doubles as the user-facing panel reference for
  CRI. No separate docs/panels/country-resilience.mdx — keeps the
  methodology page as the single source of truth.

* docs(panels): fix Latest Brief polling description

Reviewer catch: the panel does schedule a 60-second re-poll while
in the composing state. `COMPOSING_POLL_MS = 60_000` at
src/components/LatestBriefPanel.ts:78, and `scheduleComposingPoll()`
is called from `renderComposing()` at :366. The poll auto-promotes
the panel to ready without a manual refresh and is cleared when
the panel leaves composing. My earlier 'no polling timer' line was
right for the ready state but wrong as a blanket claim.

* docs(panels): fix variant-availability claims across all 6 panel pages

Reviewer catch on consumer-prices surfaced the same class of error
on 4 other panel pages: I described variant availability with loose
phrasing ('most variants', 'where X context is relevant', 'tech/
finance/happy opt-in') that didn't match the actual per-variant
panel registries in src/config/panels.ts.

Verified matrix against each *_PANELS block directly:

  Panel              | FULL | TECH | FINANCE | HAPPY | COMMODITY
  consumer-prices    | opt  |  -   |  def    |   -   |  def
  latest-brief       | def  | def  |  def    |   -   |  def   (all PRO-locked)
  disease-outbreaks  | def  |  -   |   -     |   -   |   -
  radiation-watch    | def  |  -   |   -     |   -   |   -
  thermal-escalation | def  |  -   |   -     |   -   |   -
  forecast           | def  |  -   |   -     |   -   |   -   (PRO-locked on desktop)

All 6 pages now name the exact variant blocks in src/config/panels.ts
that register them, so the claim is re-verifiable by grep rather than
drifting with future panel-registry changes.

* docs(panels): fix 5 reviewer findings — no invented controls/sources/keys

All fixes cross-checked against source.

- consumer-prices: no basket selector UI exists. The panel has a
  market bar, a range bar, and tab/category affordances; basket is
  derived from market selection (essentials-<code>, or DEFAULT_BASKET
  for the 'all' aggregate view). Per
  src/components/ConsumerPricesPanel.ts:120-123 and :216-229.
- disease-outbreaks: 'Row click opens advisory' was wrong. The only
  interactive elements in-row are the source-name <a> link
  (sanitised URL, target=_blank); clicking the row itself is a no-op
  (the only content-level listener is for [data-filter] pills and
  the search input). Per DiseaseOutbreaksPanel.ts:35-49,115-117.
- disease-outbreaks: upstream list was wrong. Actual seeder uses
  WHO DON (JSON API), CDC HAN (RSS), Outbreak News Today
  (aggregator), and ThinkGlobalHealth disease tracker
  (ProMED-sourced, 90d lookback). Noted the in-panel tooltip's
  shorter 'WHO, ProMED, health ministries' summary and gave the full
  upstream list with the 72h Redis TTL. Per seed-disease-outbreaks
  .mjs:31-38.
- radiation-watch: summary bar renders 6 cards, not 7 — Anomalies,
  Elevated, Confirmed, Low Confidence, Conflicts, Spikes. The
  CPM-derived indicator is a per-row badge (radiation-flag-converted
  at :67), not a summary card. Moved the CPM reference to the
  per-row badges list. Per RadiationWatchPanel.ts:85-112.
- latest-brief: Redis key shape corrected. The composer writes the
  envelope to brief:{userId}:{issueSlot} (where issueSlot comes from
  issueSlotInTz, not a plain date) and atomically writes a latest
  pointer at brief:latest:{userId} → {issueSlot}. Readers resolve
  via the pointer. 7-day TTL on both. Per
  seed-digest-notifications.mjs:1103-1115 and
  api/latest-brief.ts:80-89.

* docs(panels): Tier 1 — PRO/LLM panel reference pages (9)

Adds user-facing panel pages for the 9 PRO/LLM-backed surfaces flagged
in the extended audit. All claims grounded in component source +
src/config/panels.ts entries (with line cites).

- panels/chat-analyst.mdx — WM Analyst (conversational AI, 5 quick
  actions, 4 domain scopes, POSTs /api/chat-analyst via premiumFetch).
- panels/market-implications.mdx — AI Market Implications trade signals
  (LONG/SHORT/HEDGE × HIGH/MEDIUM/LOW, transmission paths, 120min
  maxStaleMin, degrade-to-warn). Carries the repo's disclaimer verbatim.
- panels/deduction.mdx — Deduct Situation (opt-in PRO; 5s cooldown;
  composes buildNewsContext + active framework).
- panels/daily-market-brief.mdx — Daily Market Brief (stanced items,
  framework selector, live vs cached source badge).
- panels/regional-intelligence.mdx — Regional Intelligence Board
  (7 BOARD_REGIONS, 6 structured blocks + narrative sections,
  request-sequence arbitrator, opt-in PRO).
- panels/strategic-posture.mdx — AI Strategic Posture (cached posture
  + live military vessels → recalcPostureWithVessels; free on web,
  enhanced on desktop).
- panels/stock-analysis.mdx — Premium Stock Analysis (per-ticker
  deep dive: signal, targets, consensus, upgrades, insiders, sparkline).
- panels/stock-backtest.mdx — Premium Backtesting (longitudinal view;
  live vs cached data badge).
- panels/wsb-ticker-scanner.mdx — WSB Ticker Scanner (retail sentiment
  + velocity score with 4-tier color bucketing).

All 9 are PRO (8 via apiKeyPanels allowlist at src/config/panels.ts:973,
strategic-posture is free-on-web/enhanced-on-desktop). Variant matrices
name the exact *_PANELS block registering each panel.

* docs(panels): Tier 2 — flagship free data panels (7)

Adds reference pages for 7 flagship free panels. Every claim grounded
in the panel component + src/config/panels.ts per-variant registration.

- panels/airline-intel.mdx — 6-tab aviation surface (ops/flights/
  airlines/tracking/news/prices), 8 aviation RPCs, user watchlist.
- panels/tech-readiness.mdx — ranked country tech-readiness index with
  6-hour in-panel refresh interval.
- panels/trade-policy.mdx — 6-tab trade-policy surface (restrictions/
  tariffs/flows/barriers/revenue/comtrade).
- panels/supply-chain.mdx — composite stress + carriers + minerals +
  Scenario Engine trigger surface (free panel, PRO scenario activation).
- panels/sanctions-pressure.mdx — OFAC SDN + Consolidated list
  pressure rollup with new/vessels/aircraft summary cards and top-8
  country rows.
- panels/hormuz-tracker.mdx — Hormuz chokepoint drill-down; status
  indicator + per-series bar charts; references Scenario Engine's
  hormuz-tanker-blockade template.
- panels/energy-crisis.mdx — IEA 2026 Energy Crisis Policy Response
  Tracker; category/sector/status filters.

All 7 are free. Variant matrices name exact *_PANELS blocks
registering each panel.

* docs(panels): Tier 3 — compact panels (5)

Adds reference pages for 5 compact user-facing panels.

- panels/world-clock.mdx — 22 global market-centre clocks with
  exchange labels + open/closed indicators (client-side only).
- panels/monitors.mdx — personal keyword alerts, localStorage-persisted;
  links to Features → Custom Monitors for longer explanation.
- panels/oref-sirens.mdx — OREF civil-defence siren feed; active +
  24h wave history; free on web, PRO-locked on desktop (_desktop &&
  premium: 'locked' pattern).
- panels/telegram-intel.mdx — topic-tabbed Telegram channel mirror
  via relay; free on web, PRO-locked on desktop.
- panels/fsi.mdx — US KCFSI + EU FSI stress composites with
  four-level colour buckets (Low/Moderate/Elevated/High).

All 5 grounded in component source + variant registrations.
oref-sirens and telegram-intel correctly describe the _desktop &&
premium: 'locked' gating pattern rather than the misleading 'PRO'
shorthand used earlier for other desktop-locked panels.

* docs(panels): Tier 4 + 5 catalogue pages, nav re-grouping, features cross-links

Closes out the comprehensive panel-reference expansion. Two catalogue
pages cover the remaining ~60 panels collectively so they're all
searchable and findable without dedicated pages per feed/tile.

- panels/news-feeds.mdx — catalogue covering all content-stream panels:
  regional news (africa/asia/europe/latam/us/middleeast/politics),
  topical news (climate/crypto/economic/markets/mining/commodity/
  commodities), tech/startup streams (startups/unicorns/accelerators/
  fintech/ipo/layoffs/producthunt/regionalStartups/thinktanks/vcblogs/
  defense-patents/ai-regulation/tech-hubs/ai/cloud/hardware/dev/
  security/github), finance streams (bonds/centralbanks/derivatives/
  forex/institutional/policy/fin-regulation/commodity-regulation/
  analysis), happy variant streams (species/breakthroughs/progress/
  spotlight/giving/digest/events/funding/counters/gov/renewable).
- panels/indicators-and-signals.mdx — catalogue covering compact
  market-indicator tiles, correlation panels, and misc signal surfaces.
  Grouped by function: sentiment, macro, calendars, market-structure,
  commodity, crypto, regional economy, correlation panels, misc signals.

docs/docs.json — split the Panels group into three for navigability:
  - Panels — AI & PRO (11 pages)
  - Panels — Data & Tracking (16 pages)
  - Panels — Catalogues (2 pages)

docs/features.mdx — Cmd+K inventory rewritten as per-family sub-lists
  with links to every panel page (or catalogue page for the ones
  that live in a catalogue). Replaces the prior run-on paragraph.

Every catalogue panel is also registered in at least one *_PANELS
block in src/config/panels.ts — the catalogue pages note this and
point readers to the config file for variant-availability details.

* docs(panels): fix airline-intel + world-clock source-of-truth errors

- airline-intel: refresh behavior section was wrong on two counts.
  (1) The panel DOES have a polling timer: a 5-minute setInterval
  in the constructor calling refresh() (which reloads ops + active
  tab). (2) The 'prices' tab does NOT re-fetch on tab switch —
  it's explicitly excluded from both tab-switch and auto-refresh
  paths, loading only on explicit search-button click. Three
  distinct refresh paths now documented with source line hints.
  Per src/components/AirlineIntelPanel.ts ~:173 (setInterval),
  :287 (prices tab-switch guard), :291 (refresh() prices skip).
- world-clock: the WORLD_CITIES list has 30 entries, not '~22'.
  Replaced the approximate count with the exact number and a
  :14-43 line-range cite so it's re-verifiable.
2026-04-20 00:53:17 +04:00
Elie Habib
d1a4cf7780 docs(mintlify): add Route Explorer + Scenario Engine workflow pages (#3211)
* docs(mintlify): add Route Explorer + Scenario Engine workflow pages

Checkpoint for review on the IA refresh (per plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).

- docs/docs.json: link Country Resilience Index methodology under
  Intelligence & Analysis so the flagship 222-country feature is
  reachable from the main nav (previously orphaned). Add a new
  Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for,
  Cmd+K entry, four tabs (Current / Alternatives / Land / Impact),
  inputs, keyboard bindings, map-state integration, PRO gating
  with free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template
  categories (conflict / weather / sanctions / tariff_shock /
  infrastructure / pandemic), how a scenario activates on the map,
  PRO gating, pointers to the async job API.

Deferred to follow-up commits in the same PR:
  - documentation.mdx landing rewrite
  - features.mdx refresh
  - maritime-intelligence.mdx link-out to Route Explorer
  - Panels nav group (waits for PR 2 content)

All content grounded in live source files cited inline.

* docs(mintlify): fix Route Explorer + Scenario Engine review findings

Reviewer caught 4 cases where I described behavior I hadn't read
carefully. All fixes cross-checked against source.

- route-explorer (free-tier): the workflow does NOT blur a numeric
  payload behind a public demo route. On free tier, fetchLane()
  short-circuits to renderFreeGate() which blurs the left rail,
  replaces the tab area with an Upgrade-to-PRO card, and applies a
  generic public-route highlight on the map. No lane data is
  rendered in any tab. See src/components/RouteExplorer/
  RouteExplorer.ts:212 + :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the
  panel and the map. Direct field jumps are F (From), T (To), P
  (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp
  binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL,
  Esc close, ? help, 1-4 tabs). See src/components/RouteExplorer/
  KeyboardHelp.ts:9 and RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates
  of 4 types today (conflict, weather, sanctions, tariff_shock).
  The ScenarioType union includes infrastructure and pandemic but
  no templates of those types ship. Dropped them from the shipped
  table and noted the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes
  status: 'done' (not 'completed') on success, 'failed' on error;
  pending is synthesised by the status endpoint when no worker
  record exists. Fixed both the new workflow page and the merged
  api-scenarios.mdx completed-response example + polling language.
  See scripts/scenario-worker.mjs:421 and
  src/components/SupplyChainPanel.ts:870.

* docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)

- api-scenarios (template example): replaced invented
  hormuz-closure-30d / ["hormuz"] with the actually-shipped
  hormuz-tanker-blockade / ["hormuz_strait"] from scenario-
  templates.ts:80. Listed the other 5 shipped template IDs so
  scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): worker writes FOUR states,
  not three. Added the intermediate "processing" state with
  startedAt, written by the worker at job pickup (scenario-
  worker.mjs:411). Lifecycle now: pending → processing →
  done|failed. Both pending and processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state
  language and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate"
  with the actual × dismiss control on the scenario banner
  (aria-label: "Dismiss scenario") per
  src/components/SupplyChainPanel.ts:790. Also described the
  banner contents (name, chokepoints, countries, tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected
  "hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb"
  across all four occurrences in the shipping v2 page (from
  PR #3209). Real IDs come from server/_shared/chokepoint-
  registry.ts (snake_case, not kebab-case, not bare "hormuz").
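The 4-state lifecycle described above can be sketched as follows (a hedged sketch — state strings are verbatim from the commit, the helper names are assumptions):

```javascript
// Hedged sketch of the scenario job lifecycle, not the real worker code.
// Worker-written states: 'processing' (at job pickup), then 'done' | 'failed'.
// 'pending' is synthesised by the status endpoint when no worker record exists.
const TERMINAL_STATES = new Set(['done', 'failed']);

function jobStatus(workerRecord) {
  return workerRecord ? workerRecord.status : 'pending';
}

function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}
```

A poller would keep polling while `!isTerminal(jobStatus(record))`.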

* docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)

- scenario-engine: accurate description of the rendered scenario
  banner. Always-present elements are the ⚠ icon, scenario name,
  top-5 impacted countries with impact %, and dismiss ×. Params
  chip (e.g. '14d · +110% cost') and 'Simulating …' tagline are
  conditional on the worker result carrying template parameters
  (durationDays, disruptionPct, costShockMultiplier). The banner
  never lists affected chokepoints by name — the map and the
  chokepoint cards surface those. Per renderScenarioBanner at
  src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record
  and the owner-index set's 30-day TTL via atomic pipeline
  (SET + SADD + EXPIRE). rotate-secret and reactivate only
  extend the record's TTL — neither touches the owner-index set,
  so the owner index can expire independently if a caller only
  rotates/reactivates within a 30-day window. Re-register to keep
  both alive. Per api/v2/shipping/webhooks.ts:230 (register
  pipeline) and :325 (rotate setCachedJson on record only).

* docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)

- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin)
  + PRO' was wrong — isCallerPremium() explicitly skips
  trusted-origin short-circuits (keyCheck.required === false) and
  only counts (a) an env-valid or user-owned wm_-prefixed API key
  with apiAccess entitlement, or (b) a Clerk bearer with role=pro
  or Dodo tier ≥ 1. Browser calls work because premiumFetch()
  injects one of those credentials per request, not because Origin
  alone authenticates. Per server/_shared/premium-check.ts:34 and
  src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section
  to state outright that authentication and PRO entitlement are
  orthogonal, and that trusted Origin is NOT accepted as PRO even
  though it is accepted for public endpoints. Listed the two real
  credential forms that pass the gate.
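The orthogonality described above — trusted Origin authenticates but never entitles — can be sketched like so (field names are illustrative assumptions, not premium-check.ts's real shape):

```javascript
// Hedged sketch of the PRO entitlement gate described above.
function isCallerPremiumSketch({ apiKey, apiKeyValid, apiAccess, clerkRole, dodoTier, trustedOrigin }) {
  // A trusted browser Origin authenticates public endpoints, but is
  // deliberately NOT counted as PRO — it plays no part in this decision.
  void trustedOrigin;
  // Credential form (a): env-valid or user-owned wm_-prefixed API key
  // carrying the apiAccess entitlement.
  const keyOk =
    typeof apiKey === 'string' && apiKey.startsWith('wm_') &&
    apiKeyValid === true && apiAccess === true;
  // Credential form (b): Clerk bearer with role=pro, or Dodo tier >= 1.
  const bearerOk = clerkRole === 'pro' || (typeof dodoTier === 'number' && dodoTier >= 1);
  return Boolean(keyOk || bearerOk);
}
```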

* docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)

Greptile review P2: prose cited MapContainer.ts:1004 but activateScenario
is declared at :1010. Line 1004 landed inside the JSDoc block.

* docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out

Completes the PR 1 items from docs/plans/2026-04-19-001-feat-docs-user-
facing-ia-refresh-plan.md that were deferred after the checkpoint on
Route Explorer + Scenario Engine + CRI nav. No new pages — only edits
to existing pages to point at and cohere with the new workflow pages.

- documentation.mdx: landing rewrite. Dropped brittle counts (344
  news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed
  services) in favor of durable product framing. Surfaced the
  shipped differentiators that were invisible on the landing
  previously: Country Resilience Index (222 countries, linked to
  its methodology page), AI daily brief, Route Explorer,
  Scenario Engine, MCP server. Kept CII and CRI as two distinct
  country-risk surfaces — do not conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the
  stale inventory list with family-grouped descriptions that
  include the panels this audit surfaced as missing (disease-
  outbreaks, radiation-watch, thermal-escalation, consumer-prices,
  latest-brief, forecast, country-resilience). Added a Workflows
  section linking to Route Explorer and Scenario Engine, and a
  Country-level risk section linking CII + CRI. Untouched
  sections (map, marker clustering, data layers, export, monitors,
  activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer
  subsection to a one-paragraph pointer at /route-explorer so the
  standalone page is the canonical home.

Panels nav group remains intentionally unadded; it waits on PR 2
content to avoid rendering an empty group in Mintlify.
2026-04-19 18:39:36 +04:00
Elie Habib
1f66b0c486 fix(billing): wrap non-Error throws before Sentry.captureException (#3212)
* fix(billing): wrap non-Error throws before Sentry.captureException

Convex/Clerk bootstrap occasionally rejects with undefined, which
Sentry.captureException then serializes as a synthetic `Error: undefined`
with zero stack frames — impossible to debug. Normalize err to a real Error
carrying the non-Error value in the message so the next occurrence yields a
usable event.

Resolves WORLDMONITOR-ND.

* fix(billing): apply non-Error normalization to openBillingPortal too

Review feedback: initSubscriptionWatch was fixed but openBillingPortal
shares the same Convex/Clerk bootstrap helpers and the same raw
Sentry.captureException(err) pattern — the synthetic `Error: undefined`
signature can still surface from that path. Extract a module-level
normalizeCaughtError() helper and apply it at both catch sites.

* fix(billing): attach original err as cause on normalized Error

Greptile P2: preserve the raw thrown value as structured `cause` data so
Sentry can display it alongside the stringified message. Assigned
post-construction because tsconfig target=ES2020 lacks the ErrorOptions
typing for `new Error(msg, { cause })`; modern browsers and Sentry read
the property either way.
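The helper described across these three commits looks roughly like this (a hedged sketch — the exact message text is an assumption, the shape follows the commit):

```javascript
// Hedged sketch of normalizeCaughtError(): wrap non-Error throws so
// Sentry.captureException gets a real Error with stack frames.
function normalizeCaughtError(err) {
  if (err instanceof Error) return err;
  const normalized = new Error(`Non-Error thrown: ${String(err)}`);
  // Assigned post-construction: with target=ES2020 the ErrorOptions overload
  // for `new Error(msg, { cause })` isn't typed, but the property itself is
  // read by modern browsers and Sentry either way.
  normalized.cause = err;
  return normalized;
}
```

Usage: `Sentry.captureException(normalizeCaughtError(err))` at both catch sites.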
2026-04-19 17:06:19 +04:00
Elie Habib
4853645d53 fix(brief): switch carousel to @vercel/og on edge runtime (#3210)
* fix(brief): switch carousel to @vercel/og on edge runtime

Every attempt to ship the Phase 8 Telegram carousel on Vercel's
Node serverless runtime has failed at cold start:

- PR #3174 direct satori + @resvg/resvg-wasm: Vercel edge bundler
  refused the `?url` asset import required by resvg-wasm.
- PR #3174 (fix) direct satori + @resvg/resvg-js native binding:
  Node runtime accepted it, but Vercel's nft tracer does not follow
  @resvg/resvg-js/js-binding.js's conditional
  `require('@resvg/resvg-js-<platform>-<arch>-<libc>')` pattern,
  so the linux-x64-gnu peer package was never bundled. Cold start
  threw MODULE_NOT_FOUND, isolate crashed,
  FUNCTION_INVOCATION_FAILED on every request including OPTIONS,
  and Telegram reported WEBPAGE_CURL_FAILED with no other signal.
- PR #3204 added `vercel.json` `functions.includeFiles` to force
  the binding in, but (a) the initial key was a literal path that
  Vercel micromatch read as a character class (PR #3206 fixed),
  (b) even with the corrected `api/brief/carousel/**` wildcard, the
  function still 500'd across the board. The `functions.includeFiles`
  path appears honored in the deployment manifest but not at runtime
  for this particular native-binding pattern.

Fix: swap the renderer to @vercel/og's ImageResponse, which is
Vercel's first-party wrapper around satori + resvg-wasm with
Vercel-native bundling. Runs on Edge runtime — matches every other
API route in the project. No native binding, no includeFiles, no
nft tracing surprises. Cold start ~300ms, warm ~30ms.

Changes:
- server/_shared/brief-carousel-render.ts: replace renderCarouselPng
  (Uint8Array) with renderCarouselImageResponse (ImageResponse).
  Drop ensureLibs + satori + @resvg/resvg-js dynamic-import dance.
  Keep layout builders (buildCover/buildThreads/buildStory) and
  font loading unchanged — the Satori object trees are
  wire-compatible with ImageResponse.
- api/brief/carousel/[userId]/[issueDate]/[page].ts: flip
  `runtime: 'nodejs'` -> `runtime: 'edge'`. Delegate rendering to
  the renderer's ImageResponse and return it directly; error path
  still 503 no-store so CDN + Telegram don't pin a bad render.
- vercel.json: drop the now-useless `functions.includeFiles` block.
- package.json: drop direct `@resvg/resvg-js` and `satori` deps
  (both now bundled inside @vercel/og).
- tests/deploy-config.test.mjs: replace the native-binding
  regression guards with an assertion that no `functions` block
  exists (with a comment pointing at the skill documenting the
  micromatch gotcha for future routes).
- tests/brief-carousel.test.mjs: updated comment references.

Verified:
- typecheck + typecheck:api clean
- test:data 5814/5814 pass
- node -e test: @vercel/og imports cleanly in Node (tests that
  reach through the renderer file no longer depend on native
  bindings)

Post-deploy validation:
  curl -I -H "User-Agent: TelegramBot (like TwitterBot)" \
    "https://www.worldmonitor.app/api/brief/carousel/<uid>/<slot>/0"
  # Expect: HTTP/2 403 (no token) or 200 (valid token)
  # NOT:    HTTP/2 500 FUNCTION_INVOCATION_FAILED

Then tail Railway digest logs on the next tick; the
`[digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED` line
should stop appearing, and the 3-image preview should actually land
on Telegram.

* Add renderer smoke test + fix Cache-Control duplication

Reviewer flagged residual risk: no dedicated carousel-route smoke
test for the @vercel/og path. Adds one, and catches a real bug in
the process.

Findings during test-writing:

1. @vercel/og's ImageResponse runs CLEANLY in Node via tsx — the
   comment in brief-carousel.test.mjs saying "we can't test the
   render in Node" was true for direct satori + @resvg/resvg-wasm
   but no longer holds after PR #3210. Pure Node render works
   end-to-end: satori tree-parse, jsdelivr font fetch, resvg-wasm
   init, PNG output. ~850ms first call, ~20ms warm.

2. ImageResponse sets its own default
   `Cache-Control: public, immutable, no-transform, max-age=31536000`.
   Passing Cache-Control via the constructor's headers option
   APPENDS rather than overrides, producing a duplicated
   comma-joined value like
   `public, immutable, no-transform, max-age=31536000, public, max-age=60`
   on the Response. The route handler was doing exactly this via
   extraHeaders. Fix: drop our Cache-Control override and rely on
   @vercel/og's 1-year immutable default — envelope is only
   immutable for its 7d Redis TTL so the effective ceiling is 7d
   anyway (after that the route 404s before render).
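The append-vs-override gotcha can be demonstrated with the standard Headers class (Node 18+ / browsers) — this models the observed behavior, not @vercel/og's internals:

```javascript
// Headers.append() joins repeated values with ", " instead of replacing,
// which is what produced the duplicated Cache-Control on the Response.
const headers = new Headers({
  'Cache-Control': 'public, immutable, no-transform, max-age=31536000', // renderer default
});
headers.append('Cache-Control', 'public, max-age=60'); // "override" actually appends
const duplicated = headers.get('Cache-Control'); // comma-joined duplicate value

// An explicit replace would need set(), but the fix in this commit is simpler:
// stop passing Cache-Control at all and keep the renderer's default.
headers.set('Cache-Control', 'public, immutable, no-transform, max-age=31536000');
```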

Changes:

- tests/brief-carousel.test.mjs: 6 new assertions under
  `renderCarouselImageResponse`:
    * renders cover / threads / story pages, each returning a
      valid PNG (magic bytes + size range)
    * rejects a structurally empty envelope
    * threads non-cache extraHeaders onto the Response
    * pins @vercel/og's Cache-Control default so it survives
      caller-supplied Cache-Control overrides (regression guard
      for the bug fixed in this commit)
- api/brief/carousel/[userId]/[issueDate]/[page].ts: remove the
  stacked Cache-Control; lean on @vercel/og default. Drop the now-
  unused `PAGE_CACHE_TTL` constant. Comment explains why.

Verified:
- test:data 5820/5820 pass (was 5814, +6 smoke)
- typecheck + typecheck:api clean
- Render smoke: cover 825ms / threads 23ms / story 16ms first run
  (wasm init dominates first render)
2026-04-19 15:18:12 +04:00
Elie Habib
e4c95ad9be docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage

Audit against api/ + proto/ revealed 9 OpenAPI specs missing from nav,
the scenario/v1 service undocumented, and MCP (32 tools + OAuth 2.1 flow)
with no user-facing docs. The stale Docs_To_Review/API_REFERENCE.md still
pointed at pre-migration endpoints that no longer exist.

- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast, Health,
  Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's proto-backed
  (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web, Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications, Shipping v2,
  Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md
  (referenced dead endpoints); add README flagging dir as archival

* docs(mintlify): move scenario docs out of generated docs/api/ tree

The pre-push hook enforces that docs/api/ is proto-generated only.
Replace the hand-written ScenarioService.openapi.yaml with a plain
MDX page (docs/api-scenarios.mdx) until the proto migration lands
(tracked in issue #3207).

* docs(mintlify): fix factual errors flagged in PR review

Reviewer caught 5 endpoints where I speculated on shape/method/limits
instead of reading the code. All fixes cross-checked against the
source:

- api-shipping-v2: route-intelligence is GET with query params
  (fromIso2, toIso2, cargoType, hs2), not POST with a JSON body.
  Response shape is {primaryRouteId, chokepointExposures[],
  bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt,
  cachedUntil, priceSource} with tier groups free|pro|api_starter|
  enterprise, not the invented {currency, plans}. Document the
  DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk
  JWT + PRO (returning {oauthUrl}), not GET redirects. Callbacks
  remain GET.
- api-platform: /api/version returns the latest GitHub Release
  ({version, tag, url, prerelease}), not deployed commit/build
  metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match
  code), not 10/hour.

Also caught while double-checking: /api/register-interest and
/api/contact are 5/60min and 3/60min respectively (1-hour window,
not 1-minute). Both require Turnstile. Removed the fabricated
limits for share-url, notification-channels, create-checkout
(they fall back to the default per-IP limit).

* docs(mintlify): second-round fixes — verify every claim against source

Reviewer caught 7 more cases where I described API behavior I hadn't
read. Each fix below cross-checked against the handler.

- api-commerce (product-catalog): tiers are flat objects with
  monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid
  tiers, price+period for free, price:null for enterprise. There is
  no nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts.
  Code is a deterministic 8-char HMAC of the Clerk userId; binding
  into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): actual action set is
  create-pairing-token, set-channel, set-web-push, delete-channel,
  set-alert-rules, set-quiet-hours, set-digest-settings. Replaced
  the made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100
  (default 50), not a severity string. Subscriber IDs are wh_+24hex;
  secret is raw 64-char hex (no whsec_ prefix). POST registration
  returns 201. Added the management routes: GET /{id},
  POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer
  RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[]
  and/or patterns[] (not {key} or {tag}), with explicit per-request
  caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not
  file=<id>. Response is a 302 to a GitHub release asset; documented
  the full platform/variant tables.
- mcp: server also accepts direct X-WorldMonitor-Key in addition to
  OAuth bearer. Fixed the curl example which was incorrectly sending
  a wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or videoId,
  not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect
  that OAuth is one of two accepted modes.

* docs(mintlify): fix Greptile review findings

- mcp.mdx: 'Five' slow tools → 'Six' (list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator
  (8_400_000_000) with plain integer (8400000000)

Greptile's third finding — /api/oauth/register rate-limit contradiction
across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx — was already
resolved in commit 4f2600b2a (reviewed commit was eb5654647).
2026-04-19 15:03:16 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
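The slot format can be sketched as follows (a hedged sketch — the real helper is issueSlotInTz in the composer; this reimplementation only illustrates the documented YYYY-MM-DD-HHMM shape):

```javascript
// Hedged sketch: build a per-run slot in the user's local tz.
function issueSlotSketch(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-CA', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}

// envelope.data.date keeps only the date portion for rendering:
function slotDate(slot) {
  return slot.slice(0, 10);
}
```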

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
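The fixed allowlist match described above behaves roughly like this (the real BRIEF_CAROUSEL_PATH_RE lives in middleware.ts; this pattern is an assumption reconstructed from the documented URL shape):

```javascript
// Hedged sketch: accept only slot-format (YYYY-MM-DD-HHMM) carousel paths,
// rejecting the pre-slot YYYY-MM-DD shape so legacy links can't leak through.
const CAROUSEL_SLOT_PATH_RE =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/\d+$/;

function isAllowlistedCarouselPath(pathname) {
  return CAROUSEL_SLOT_PATH_RE.test(pathname);
}
```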

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
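The corrected ?slot= contract in bug 1 can be sketched as follows (response shapes are assumptions for illustration, not the handler's exact fields):

```javascript
// Hedged sketch of the latest-brief status response described above.
function latestBriefStatus({ requestedSlot, envelope, todayUtc }) {
  if (envelope) return { status: 'ready', envelope };
  if (requestedSlot) {
    // Echo the caller's slot back instead of an unrelated "today" signal.
    return {
      status: 'composing',
      issueSlot: requestedSlot,
      issueDate: requestedSlot.slice(0, 10), // date portion of the slot
    };
  }
  // No-param path keeps the UTC-today placeholder.
  return { status: 'composing', issueDate: todayUtc };
}
```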
2026-04-19 14:15:59 +04:00
Elie Habib
56054bfbc1 fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key

PR #3204 shipped the right `includeFiles` value but the WRONG key:

  "api/brief/carousel/[userId]/[issueDate]/[page].ts"

Vercel's `functions` config keys are micromatch globs, not literal
paths. Bracketed segments like `[userId]` are parsed as character
classes (match any ONE character from {u,s,e,r,I,d}), so my rule
matched zero files and `includeFiles` was silently ignored. Post-
merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED on
every request. Build log shows zero mentions of `carousel` or
`resvg` — corroborates the key never applied.

Fix: wildcard path segments.

  "api/brief/carousel/**"

Matches any file under the carousel route dir. Since the only
deployed file there is the dynamic-segment handler, the effective
scope is identical to what I originally intended.

Added a second regression test that sweeps every functions key and
fails loudly if any bracketed segment slips back in. Guards against
future reverts AND against anyone copy-pasting the literal route
path without realising Vercel reads it as a glob.

23/23 deploy-config tests pass (was 22, +1 new guard).

* Address Greptile P2: widen bracket-literal guard regex

Greptile spotted that `/\[[A-Za-z]+\]/` only matches purely-alphabetic
segment names. Real-world Next.js routes often use `[user_id]`,
`[issue_date]`, `[page1]`, `[slug2024]` — none flagged by the old
regex, so the guard would silently pass on the exact kind of
regression it was written to catch.

Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
  - requires a leading letter (so legit char classes like `[0-9]`
    and `[!abc]` don't false-positive)
  - allows letters, digits, underscores after the first char
  - covers every Next.js-style dynamic-segment name convention

Also added a self-test that pins positive cases (userId, user_id,
issue_date, page1, slug2024) and negative cases (the actual `**`
glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex
breaks CI immediately instead of silently re-opening PR #3206.

24/24 deploy-config tests pass (was 23, +1 new self-test).
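The widened guard, with the commit's own positive and negative cases (the test-harness wiring around it is assumed):

```javascript
// Guard regex from this commit: flag bracketed dynamic-segment literals in
// vercel.json functions keys, where micromatch would read them as char classes.
// Requires a leading letter so legit char classes like [0-9] and [!abc] pass.
const BRACKET_LITERAL_RE = /\[[A-Za-z][A-Za-z0-9_]*\]/;

function hasBracketLiteralSegment(functionsKey) {
  return BRACKET_LITERAL_RE.test(functionsKey);
}
```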
2026-04-19 14:02:30 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line
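
The commit lists a cosineSimilarity export on brief-embedding.mjs; a minimal sketch of such a helper might look like this (the guard clauses are assumptions, not the module's actual code):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // degenerate vectors never match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1 (identical direction)
cosineSimilarity([1, 0], [0, 1]); // 0 (orthogonal)
```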

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
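
The negative-budget guard might be sketched like this — callEmbeddingsApi's real signature and the fetch wiring are assumptions; only the throw-before-fetch semantics come from the commit:

```javascript
class EmbeddingTimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = "EmbeddingTimeoutError";
  }
}

async function callEmbeddingsApi(texts, { timeoutMs, fetchImpl }) {
  // Before: a Math.max(250, ...) floor opened a doomed 250ms fetch and let
  // the wall-clock cap overshoot. Now a spent budget fails fast instead.
  if (timeoutMs <= 0) {
    throw new EmbeddingTimeoutError(`embedding budget exhausted (${timeoutMs}ms)`);
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl("/embeddings", {
      method: "POST",
      body: JSON.stringify({ input: texts }),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}
```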

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so
   later instrumentation / Edge-runtime shims don't see the wrapper —
   same class of bug as the banned `fetch.bind(globalThis)` pattern
   flagged in AGENTS.md.
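
The binding-capture difference the fix addresses can be demonstrated in a few lines (the shim here is simulated, not the actual Edge-runtime instrumentation):

```javascript
// Baseline fetch present at module-load time.
globalThis.fetch = async () => "original";

const aliased = globalThis.fetch;                       // binding captured NOW
const wrapped = (...args) => globalThis.fetch(...args); // resolved per call

// Instrumentation / runtime shim installed AFTER module load:
globalThis.fetch = async () => "instrumented";

await aliased(); // "original"     — the shim never sees this call
await wrapped(); // "instrumented" — the wrapper picks up the new binding
```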

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.
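
The fail-closed check might be sketched like this; the Upstash pipeline response shape (an array of `{result}`/`{error}` cells, or null on transport failure) is inferred from the commit text, not copied from the repo:

```javascript
// Classify a redis-pipeline response instead of treating any non-throw
// as success (the old fail-open behaviour).
function checkPipelineResult(result) {
  if (result === null) return { ok: false, reason: "pipeline_null" };
  if (!Array.isArray(result)) return { ok: false, reason: "non_array_response" };
  for (const cell of result) {
    if (cell && cell.error) return { ok: false, reason: "command_error" };
    if (!cell || cell.result !== "OK") return { ok: false, reason: "unexpected_cell" };
  }
  return { ok: true };
}

checkPipelineResult(null);                  // { ok: false, reason: "pipeline_null" }
checkPipelineResult([{ result: "OK" }]);    // { ok: true }
checkPipelineResult([{ error: "NOAUTH" }]); // { ok: false, reason: "command_error" }
```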

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.
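
The order-sensitive fix might be sketched as follows — the archive shape (`{pairs: [{key, agreed}]}`) is an assumption; only the filter-before-dedup ordering comes from the commit:

```javascript
// Enumerate pairs across SCAN batches. The mode filter runs BEFORE the
// dedup check, so an agreeing occurrence scanned first can no longer
// permanently shadow a later disagreeing occurrence of the same pair.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const archive of archives) {
    for (const pair of archive.pairs) {
      // Old bug: seen.add(pair.key) ran before this line.
      if (mode === "disagreements" && pair.agreed) continue;
      if (seen.has(pair.key)) continue;
      seen.add(pair.key);
      out.push(pair);
    }
  }
  return out;
}

// Agreement-then-disagreement across batches: the disagreeing
// occurrence still claims the dedup slot.
const batches = [
  { pairs: [{ key: "a|b", agreed: true }] },
  { pairs: [{ key: "a|b", agreed: false }] },
];
enumeratePairs(batches, "disagreements").length; // 1
```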

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
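
The greedy longest-phrase scan might look roughly like this — the gazetteer below is a tiny illustrative subset and every name besides LOCATION_GAZETTEER and MAX_PHRASE_LEN is assumed:

```javascript
// Illustrative subset; the real gazetteer's longest entry is 4 tokens.
const LOCATION_GAZETTEER = new Set([
  "red sea", "south china sea", "strait of hormuz", "abu dhabi",
  "middle east", "hormuz",
]);
const MAX_PHRASE_LEN = 4;

const isCapitalised = (token) => /^[A-Z]/.test(token);

function extractLocations(headline) {
  const raw = headline.split(/\s+/).filter(Boolean);
  const lower = raw.map((t) => t.toLowerCase().replace(/[^a-z]/g, ""));
  const found = [];
  let i = 0;
  while (i < lower.length) {
    let consumed = 0;
    // Longest phrase first, down to 2 tokens. First AND last tokens must
    // be capitalised, so headline "Strait of Hormuz" matches (lowercase
    // connector in the middle is fine) while prose "the middle east" doesn't.
    for (let len = Math.min(MAX_PHRASE_LEN, lower.length - i); len >= 2; len--) {
      const phrase = lower.slice(i, i + len).join(" ");
      if (
        LOCATION_GAZETTEER.has(phrase) &&
        isCapitalised(raw[i]) &&
        isCapitalised(raw[i + len - 1])
      ) {
        found.push(phrase);
        consumed = len;
        break;
      }
    }
    if (!consumed) {
      // Fall back to single-token lookup when no multi-word phrase fits.
      if (LOCATION_GAZETTEER.has(lower[i]) && isCapitalised(raw[i])) {
        found.push(lower[i]);
      }
      consumed = 1;
    }
    i += consumed;
  }
  return found;
}

extractLocations("Houthis strike ship in Red Sea");   // ["red sea"]
extractLocations("Tanker transits Strait of Hormuz"); // ["strait of hormuz"]
extractLocations("talks about the middle east");      // []
```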

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00
Elie Habib
27849fee1e fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn

Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED
since PR #3174 merged. Not middleware (last PR fixed that
theoretical path but not the observed failure). The Vercel
function itself crashes HTTP 500 FUNCTION_INVOCATION_FAILED on
every request including OPTIONS - the isolate can't initialise.

The handler imports brief-carousel-render which lazy-imports
@resvg/resvg-js. That package's js-binding.js does runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel
Lambda (Amazon Linux 2 glibc) that resolves to
@resvg/resvg-js-linux-x64-gnu. Vercel nft tracing does NOT
follow this conditional require so the optional peer package
isnt bundled. Cold start throws MODULE_NOT_FOUND, isolate
crashes, Vercel returns FUNCTION_INVOCATION_FAILED, Telegram
reports WEBPAGE_CURL_FAILED.

Fix: vercel.json functions.includeFiles forces linux-x64-gnu
binding into the carousel functions bundle. Only this route
needs it; every other api route is unaffected.
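
A functions entry of roughly this shape would force the binding into the bundle — the exact route key and includeFiles glob are assumptions, not the repo's actual vercel.json:

```json
{
  "functions": {
    "api/brief/carousel/**": {
      "includeFiles": "node_modules/@resvg/resvg-js-linux-x64-gnu/**"
    }
  }
}
```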

Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime
  binary on Amazon Linux 2 glibc

Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; next cron tick should clear the Railway
[digest] Telegram carousel 400 line.

* Address Greptile P2s: regression guard + arch-assumption reasoning

Two P2 findings on PR #3204:

P2 #1 (inline on vercel.json:6): Platform architecture assumption
undocumented. If Vercel migrates to Graviton/arm64 Lambda the
cold-start crash silently returns. vercel.json is strict JSON so
comments aren't possible inline.

P2 #2 (tests/deploy-config.test.mjs:17): No regression guard for
the carousel includeFiles rule. A future vercel.json tidy-up
could silently revert the fix with no CI signal.

Fixed both in a single block:

- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.

tests 22/22 pass (was 21, +1 new).
2026-04-19 13:36:17 +04:00
Elie Habib
45f02fed00 fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError

OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid re-tap),
leaking as an unhandled NotFoundError. OrbitControls is bundled into
main-*.js so hasFirstParty=true; matched by the unique setPointerCapture
message (grep confirms no first-party setPointerCapture usage).

Resolves WORLDMONITOR-NC.

* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack

Review feedback: suppressing by message alone would hide a future first-party
setPointerCapture regression. Mirror the existing OrbitControls filter's
provenance check — require absence of any source-mapped .ts/.tsx frame so the
filter only matches stacks whose only non-infra frame is the bundled main chunk.

Adds positive + negative regression tests for the pair.

* fix(sentry): gate OrbitControls filter on positive three.js context signature

Review feedback: absence of .ts/.tsx frames is not proof of third-party origin
because production stacks are often unsymbolicated. Replace the negative-only
gate with a positive OrbitControls signature — require a frame whose context
slice contains the literal `_pointers … setPointerCapture` adjacency unique to
three.js OrbitControls. Update tests to cover the production-realistic case
(unsymbolicated first-party bundle frame calling setPointerCapture must still
reach Sentry) plus a defensive no-context fallthrough.
2026-04-19 13:15:31 +04:00
Elie Habib
d7f87754f0 fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged that two transactional email templates still claimed '22 services' while /pro now advertises '30+':

- api/register-interest.js:90 — interest-registration confirmation email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription confirmation email ('22 services, one API key')

A user signing up via /pro would read '30+ services' on the page, then receive an email saying '22'. Both updated to '30+', matching the /pro page and the actual server domain count (31 in server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
2026-04-19 13:15:17 +04:00
Elie Habib
135082d84f fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)

The /pro page advertised '22 services' / '22 service domains' but server/worldmonitor/, proto/worldmonitor/, and src/generated/server/worldmonitor/ all have 31 domain dirs (aviation, climate, conflict, consumer-prices, cyber, displacement, economic, forecast, giving, health, imagery, infrastructure, intelligence, maritime, market, military, natural, news, positive-events, prediction, radiation, research, resilience, sanctions, seismology, supply-chain, thermal, trade, unrest, webcam, wildfire). api/scenario/v1/ adds a 32nd recently shipped surface.

Used '30+' rather than the literal '31' so the page doesn't drift again every time a new domain ships (the '22' was probably accurate at one point too).

168 string substitutions across all 21 locale JSON files (8 keys each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc, pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey, apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta description, og:description, twitter:description, SoftwareApplication ld+json description + Pro Monthly offer, FAQ ld+json a8, noscript fallback). Bundle rebuilt.

* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
2026-04-19 13:07:07 +04:00