mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
048bb8bb525393dc4a9c1998b9877c1f8cc8c011
3500 Commits
048bb8bb52
fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter
Three changes, all operational for PR #3248's brief-why-matters feature.
1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to /api/internal/brief-why-matters returning 403 — middleware's "short UA" guard (~L183) rejects Node undici's default UA before the endpoint's own Bearer-auth runs. The feature never executed in prod; the three-layer fallback silently shipped legacy Gemini output. Same class as /api/seed-contract-probe (2026-04-15). The endpoint still carries its own subtle-crypto HMAC auth, so bypassing the UA gate is safe.
2. Explicit UA on the callAnalystWhyMatters fetch
Defense-in-depth. An explicit 'worldmonitor-digest-notifications/1.0' keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored, and makes cron traffic distinguishable from ops curl in logs.
3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send for one user on the next tick (then unset) — validates new features end-to-end without fanning out. Empty/unset = normal fan-out. Applied right after rule fetch so both compose and dispatch paths respect it.
Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl UA), plus a negative sibling-path suite so a future prefix-match refactor can't silently unblock /api/internal/.
Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing main() complexity warning bumped 74→78 by the filter block (unchanged in character from pre-PR).
* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs
Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under the current middleware chain this is sufficient (sibling paths hit the short-UA or BOT_UA 403 regardless), but it doesn't pin *which* guard fires.
A future refactor that moves `PUBLIC_API_PATHS.has(path)` later in the chain could let a curl or undici UA pass on a sibling path without this suite failing.
Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty, short/undici, curl). Every combination must still 403 regardless of which guard catches it. 6 new test cases.
Tests: 35 pass in the middleware-bot-gate suite (was 29).
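The two gates described above can be sketched roughly as follows. This is an illustrative reconstruction, not the repo's actual middleware code: the length threshold for the "short UA" guard and all helper names are assumptions.

```javascript
// Exact-match allowlist checked BEFORE the short-UA bot gate, so the
// cron's default Node/undici UA is never rejected on allowlisted paths.
// The endpoint behind it still enforces its own Bearer/HMAC auth.
const PUBLIC_API_PATHS = new Set(["/api/internal/brief-why-matters"]);

function botGate(path, userAgent) {
  if (PUBLIC_API_PATHS.has(path)) return { status: 200 };
  // hypothetical "short UA" guard: empty or suspiciously short UAs 403
  if (!userAgent || userAgent.length < 10) return { status: 403 };
  return { status: 200 };
}

// DIGEST_ONLY_USER operator flag: empty/unset means normal fan-out,
// otherwise restrict compose + dispatch to the one named user.
function filterRules(rules, digestOnlyUser) {
  if (!digestOnlyUser) return rules;
  return rules.filter((r) => r.userId === digestOnlyUser);
}
```

An exact-match `Set` (rather than a prefix match) is what the negative sibling-path suite pins: `/api/internal/anything-else` must keep hitting the UA guards.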
65a1210531
fix(unrest): Decodo proxy fallback for GDELT + surface err.cause (#3256)
* fix(unrest): Decodo proxy fallback for GDELT + surface err.cause
Background: unrestEvents went STALE_SEED when every tick logged "GDELT failed: fetch failed" (Railway log 2026-04-21). The bare "fetch failed" string hid the actual cause (DNS/TCP/TLS), so the outage was opaque. ACLED is disabled (no credentials), so GDELT is the sole live source — when it fails, the seed freezes.
Changes:
- fetchGdeltEvents: direct-first, Decodo proxy fallback via httpsProxyFetchRaw when PROXY_URL is configured. Mirrors the imfFetchJson / _yahoo-fetch.mjs direct→proxy pattern.
- Error messages now include err.cause.code (UND_ERR_CONNECT_TIMEOUT, ENOTFOUND, ECONNRESET, etc.) so the next outage surfaces the underlying transport error instead of "fetch failed".
- The both-paths-failed error carries the direct + proxy messages so either can be diagnosed from a single log line.
No behavior change on the happy path — the direct fetch still runs first with the existing 30s AbortSignal timeout.
* fix(unrest): address PR #3256 P2 review
- describeErr: handle a plain-string .cause (e.g. `{ cause: 'ENOTFOUND' }`) that would otherwise be silently dropped, since a string has no .code/.errno/.message accessors.
- fetchGdeltDirect: tag HTTP-status errors (!resp.ok) with httpStatus. fetchGdeltEvents skips the proxy hop for upstream HTTP errors since the proxy routes to the same GDELT endpoint — saves the 20s proxy timeout and avoids a pointless retry. Transport failures (DNS/TCP/TLS timeouts against Railway IPs) still trigger the proxy fallback, which is the motivating case.
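The direct-first / proxy-fallback shape and the err.cause surfacing can be sketched like this. A hedged illustration only: `describeErr` and `fetchWithFallback` are stand-in names, not the repo's actual exports.

```javascript
// Surface the transport-level cause (undici attaches it as err.cause)
// instead of the opaque top-level "fetch failed" message. Handles the
// plain-string .cause case flagged in the P2 review.
function describeErr(err) {
  const cause = err && err.cause;
  if (typeof cause === "string") return `${err.message} (cause: ${cause})`;
  if (cause && (cause.code || cause.message)) {
    return `${err.message} (cause: ${cause.code || cause.message})`;
  }
  return err ? err.message : "unknown error";
}

// Direct fetch first; proxy fallback only for transport failures.
// Upstream HTTP errors (tagged with .httpStatus) skip the proxy hop,
// since the proxy routes to the same endpoint anyway.
async function fetchWithFallback(direct, proxied, proxyConfigured) {
  try {
    return await direct();
  } catch (directErr) {
    if (directErr.httpStatus || !proxyConfigured) throw directErr;
    try {
      return await proxied();
    } catch (proxyErr) {
      // both-paths-failed: one log line carries both diagnoses
      throw new Error(
        `direct: ${describeErr(directErr)}; proxy: ${describeErr(proxyErr)}`
      );
    }
  }
}
```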
2f19d96357
feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint
The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.
This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.
Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.
Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)
Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
still reads/writes during the rollout window; expires in ≤24h.
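The env knobs above might be parsed along these lines. A minimal sketch assuming the stated semantics (default analyst, typo falls back to gemini, shadow on unless literally '0'); the helper names are illustrative.

```javascript
// BRIEF_WHY_MATTERS_PRIMARY: unset/empty -> analyst (the default);
// any unrecognized (typo'd) value falls back to gemini.
function pickPrimary(raw) {
  if (raw === undefined || raw === "") return "analyst";
  return raw === "analyst" ? "analyst" : "gemini";
}

// BRIEF_WHY_MATTERS_SHADOW: default on; only the literal '0' disables.
function shadowEnabled(raw) {
  return raw !== "0";
}

// BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT: 0..100, default 100.
// roll is a uniform draw in [0, 100) supplied by the caller.
function shadowSampled(pct, roll) {
  const p = Number.isFinite(Number(pct)) ? Number(pct) : 100;
  return roll < p;
}
```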
Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.
Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md
* fix(brief): address /ce:review round 1 findings on PR #3248
Fixes 5 findings from multi-agent review, 2 of them P1:
- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
`.env`, `.env.local`, and any future secret file dropped into that
directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
`*.mjs`) so parent `.env` / secrets rules stay in effect inside
api/internal/.
- #242 P1: `Dockerfile.digest-notifications` did not COPY
`shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
container start with ERR_MODULE_NOT_FOUND. Added alongside
brief-envelope + brief-filter COPY lines.
- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
signal, violating PR #3247's own round-3 memory
(feedback_gate_on_ground_truth_not_configured_state.md). Added
structured log at the call site: `[brief-llm] whyMatters source=<src>
producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
log + shadow-record pairs can be cross-referenced.
- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
flowed verbatim into both LLM prompts, bypassing the repo's
sanitizeForPrompt convention. Added sanitizeStoryFields helper and
applied in both analyst and gemini paths.
- #245 P2: Removed redundant `validate` option from callLlmReasoning.
With only openrouter configured in prod, a parse-reject walked the
provider chain, then fell through to the other path (same provider),
then the cron's own fallback (same model) — 3x billing on one reject.
Post-call parseWhyMatters check already handles rejection cleanly.
Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).
Tests: 6022 pass. typecheck + typecheck:api clean.
* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback
Two P2 findings on PR #3248:
1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
function. Vercel can terminate the isolate after response return,
so the background redisPipeline write completes unreliably — i.e.
the rollout-validation signal the shadow keys were supposed to
provide was flaky in production.
Fix: accept an optional EdgeContext 2nd arg. Build the shadow
promise up front (so it starts executing immediately) then register
it with ctx.waitUntil when present. Falls back to plain unawaited
execution when ctx is absent (local harness / tests).
2. scripts/lib/brief-llm.mjs legacy fallback path called
buildWhyMattersPrompt(story) on raw fields with no sanitization.
The analyst endpoint sanitizes before its own prompt build, but
the fallback is exactly what runs when the endpoint misses /
errors — so hostile headlines / sources reached the LLM verbatim
on that path.
Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
from server/_shared/llm-sanitize.js (existing pattern — see
scripts/seed-digest-notifications.mjs:41). Wraps story fields
before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
story), so cache parity with the analyst endpoint's v3 entries is
preserved.
Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.
Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.
* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image
Reviewer P1 on PR #3248: my previous commit (
89c179e412
fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time (#3254)
* fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time
Reported: a brief viewed at 13:02 local time showed "Good evening" on the cover (slide 1) but "Good afternoon." on the digest greeting page (slide 2).
Cause: `server/_shared/brief-render.js:renderCover` had the string `'Good evening'` hardcoded in the cover's mono-cased salutation slot. The digest greeting page (slide 2) renders the time-of-day-correct value from `envelope.data.digest.greeting`, which is computed by `shared/brief-filter.js:174-179` from `localHour` in the user's TZ (< 12 → morning, < 18 → afternoon, else → evening). So any brief viewed outside the literal evening showed an inconsistent pair.
Fix: thread `digest.greeting` into `renderCover`; a small `coverGreeting()` helper strips the trailing period so the cover's no-punctuation mono style is preserved. On unexpected/missing values it falls back to a generic "Hello" rather than silently re-hardcoding a specific time of day.
Tests: 5 regression cases in `tests/brief-magazine-render.test.mjs` cover afternoon/morning/evening parity, period stripping, and HTML escape (defense-in-depth). 60 total in that file pass. Full test:data 5921 pass. typecheck + typecheck:api + biome clean.
* chore(brief): fix orphaned JSDoc on coverGreeting / renderCover
Greptile flagged: the original `renderCover` JSDoc block stayed above `coverGreeting` when the helper was inserted, so the @param shape was misattributed to the wrong function and `renderCover` was left undocumented (plus the new `greeting` field was unlisted). Moved the opts-shape JSDoc to immediately above `renderCover` and added `greeting: string` to the param type. `coverGreeting` keeps its own prose comment. No runtime change.
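A minimal sketch of the `coverGreeting()` behavior described above, under the stated contract (strip the trailing period; fall back to "Hello" on anything unexpected). The allowlist of known greetings is an assumption for the sketch.

```javascript
// Adapt the digest greeting ("Good afternoon.") for the cover's
// no-punctuation mono style; unknown/missing input falls back to a
// generic "Hello" rather than re-hardcoding a time of day.
function coverGreeting(greeting) {
  if (typeof greeting !== "string") return "Hello";
  const trimmed = greeting.trim().replace(/\.$/, "");
  const known = ["Good morning", "Good afternoon", "Good evening"];
  return known.includes(trimmed) ? trimmed : "Hello";
}
```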
b0928f213c
fix(live-webcams): refresh Iran-Attacks multicam + Mideast Mecca video IDs (#3251)
User reported two tiles showing "This live stream recording is not
available" — the pinned fallbackVideoIds had gone dark.
- iran-multicam (Iran Attacks → Middle East slot):
FGUKbzulB_Y → KSwPNkzEgxg
- mecca (Mideast → Mecca slot):
Cm1v4bteXbI → kJwEsQTegxk
Values supplied by the user from working YouTube live URLs. Only
`fallbackVideoId` is read by the runtime (buildEmbedUrl line 303, open-link
line 422); `channelHandle` is metadata and left as-is.
c279f6f426
fix(pro-marketing): nav reflects auth state, hide pro banner for pro users (#3250)
* fix(pro-marketing): reflect auth state in nav, hide pro banner for pro users
Two related signed-in-experience bugs caught by the user during the
post-purchase flow:
1. /pro Navbar's SIGN IN button never reacted to auth state. The
component was a static const Navbar = () => <nav>...</nav>; with
no Clerk subscription, so signing in left the SIGN IN button in
place even though the user was authenticated.
2. The "Pro is launched — Upgrade to Pro" announcement banner on the
main app showed for ALL visitors including paying Pro subscribers.
Pitching upgrade to a customer who already paid is a small but
real annoyance, and it stays sticky for 7 days via the localStorage
dismiss key — so a returning paying user dismisses it once and
then never sees the (genuinely useful) banner again if they later
downgrade.
## Changes
### pro-test/src/App.tsx — useClerkUser hook + ClerkUserButton
- New useClerkUser() hook subscribes to Clerk via clerk.addListener
and returns { user, isLoaded } so any component can react to auth
changes (sign-in, sign-out, account switch).
- New ClerkUserButton component mounts Clerk's native UserButton
widget (avatar + dropdown with profile/sign-out) into a div via
clerk.mountUserButton — inherits the existing dark-theme appearance
options from services/checkout.ts::ensureClerk.
- Navbar swaps SIGN IN button for ClerkUserButton when user is
signed in. Slot is intentionally empty during isLoaded=false to
avoid a SIGN IN → avatar flicker for returning users.
- Hero hides its redundant SIGN IN CTA when signed in; collapses to
just "Choose Plan" which is the relevant action for returning users.
- Public/pro/ rebuilt to ship the change (per PR #3229's bundle-
freshness rule).
### src/components/ProBanner.ts — premium-aware show + reactive auto-hide
- showProBanner returns early if hasPremiumAccess() — same authoritative
signal used by the frontend's panel-gating layer (unions API key,
tester key, Clerk pro role, AND Convex Dodo entitlement).
- onEntitlementChange listener auto-dismisses the banner if a Convex
snapshot arrives mid-session that flips the user to premium (e.g.
Dodo webhook lands while they're sitting on the dashboard). Does NOT
write the dismiss timestamp, so the banner reappears correctly if
they later downgrade.
## Test plan
### pro-test (sign-in UI)
- [ ] Anonymous user loads /pro → SIGN IN button visible in nav.
- [ ] Click SIGN IN, complete Clerk modal → button replaced with
Clerk's UserButton avatar dropdown.
- [ ] Open dropdown, click Sign Out → reverts to SIGN IN button.
- [ ] Hard reload as signed-in user → SIGN IN button never flashes;
avatar appears once Clerk loads.
### main app (banner gating)
- [ ] Anonymous user loads / → "Pro is launched" banner shows.
- [ ] Click ✕ to dismiss → banner stays dismissed for 7 days
(existing behavior preserved).
- [ ] Pro user (active Convex entitlement) loads / → banner does
NOT appear, regardless of dismiss state.
- [ ] Free user opens /, then completes checkout in another tab and
Convex publishes the entitlement snapshot → banner auto-hides
in the dashboard tab without reload.
- [ ] Pro user whose subscription lapses (validUntil < now) → banner
reappears on next page load, since dismiss timestamp wasn't
written by the entitlement-change auto-hide.
* fix(pro-banner): symmetric show/hide on entitlement change
Reviewer caught that the previous iteration only handled the upgrade
direction (premium snapshot → hide banner) but never re-showed the
banner on a downgrade. App.ts calls showProBanner() once at init, so
without a symmetric show path, a session that started premium and
then lost entitlement (cancellation, billing grace expiry, plan
downgrade for the same user) would stay banner-less for the rest of
the SPA session — until a full reload re-ran App.ts init.
Net effect of the bug: the comment claiming "the banner reappears
correctly if they later downgrade or the entitlement lapses" was
false in practice for any in-tab transition.
Two changes:
1. Cache the container on every showProBanner() call, including
the early-return paths. App.ts always calls showProBanner()
once at init regardless of premium state, so this guarantees
the listener has the container reference even when the initial
mount was skipped (premium user, dismissed, in iframe).
2. Make onEntitlementChange handler symmetric:
- premium snapshot + visible → hide (existing behavior)
- non-premium snapshot + not visible + cached container +
not dismissed + not in iframe → re-mount via showProBanner
The non-premium re-mount goes through showProBanner() so it gets the
same gate checks as the initial path (isDismissed, iframe, premium).
We can never surface a banner the user has already explicitly ✕'d
this week.
Edge cases handled:
- User starts premium, no banner shown, downgrades mid-session
→ listener fires, premium false, no bannerEl, container cached,
not dismissed → showProBanner mounts banner ✓
- User starts free, sees banner, upgrades mid-session
→ listener fires, premium true, bannerEl present → fade out ✓
- User starts free, dismisses banner, upgrades, downgrades
→ listener fires on downgrade, premium false, no bannerEl,
container cached, isDismissed=true → showProBanner returns early ✓
- User starts free, banner showing, multiple entitlement snapshots
arrive without state change → premium=false && bannerEl present,
neither branch fires, idempotent no-op ✓
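The symmetric listener and its edge cases can be condensed into a small decision function. An illustrative sketch only: the state field names mirror the prose above, not the component's actual identifiers.

```javascript
// Symmetric entitlement-change handler for the pro banner:
//  - upgrade   (premium snapshot + banner visible)  -> hide
//  - downgrade (non-premium + no banner + container cached,
//               not dismissed, not in iframe)        -> re-show
//  - anything else                                   -> idempotent no-op
function onEntitlementChange(state) {
  const { premium, bannerEl, cachedContainer, isDismissed, inIframe } = state;
  if (premium && bannerEl) return "hide";
  if (!premium && !bannerEl && cachedContainer && !isDismissed && !inIframe) {
    return "show"; // routed through showProBanner's normal gate checks
  }
  return "noop";
}
```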
* fix(pro-banner): defer initial mount while entitlement is loading
Greptile P1 round-2: hasPremiumAccess() at line 48 reads isEntitled()
synchronously, but the Convex entitlement subscription is fired
non-awaited at App.ts:868 (`void initEntitlementSubscription()`).
showProBanner() runs at App.ts:923 during init Phase 1, before the
first Convex snapshot arrives.
So a Convex-only paying user (Clerk role 'free' + Dodo entitlement
tier=1) sees this sequence:
t=0 init runs → hasPremiumAccess() === false (isEntitled() reads
currentState===null) → "Upgrade to Pro" banner mounts
t=~1s Convex snapshot arrives → onEntitlementChange fires → my
listener detects premium=true && bannerEl !== null → fade out
That's a 1+ second flash of "you should upgrade!" content for someone
who has already paid. Worst case is closer to ~10s on a cold-start
Convex client, which is much worse — looks like the upgrade pitch is
the actual UI.
Defer the initial mount when (1) the user is signed in (so they
plausibly have a Convex entitlement) AND (2) the entitlement state
hasn't loaded yet (currentState === null). The existing
onEntitlementChange listener will mount it later if the first
snapshot confirms the user is actually free.
Two reasons this is gated on "signed in":
- Anonymous users will never have a Convex entitlement, so
deferring would mean the banner NEVER mounts for them. Bad
regression: anon visitors are the highest-value audience for
the upgrade pitch.
- For signed-in users, the worst case if no entitlement EVER
arrives is the banner stays absent — which is identical to a
paying user's correct state, so it fails-closed safely.
Edge case behavior:
- Anonymous user: no Clerk session → first condition false →
banner mounts immediately ✓
- Signed-in free user with first snapshot pre-loaded somehow:
second condition false → banner mounts immediately ✓
- Signed-in user, snapshot pending: deferred → listener mounts
on first snapshot if user turns out free ✓
- Signed-in user, snapshot pending, user turns out premium: never
mounted ✓ (the desired path)
- Signed-in user, snapshot pending, never arrives (Convex outage):
banner never shows → see above, this fails-closed safely
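The deferral gate above reduces to a one-line predicate; sketched here with illustrative names to make the edge-case table checkable.

```javascript
// Defer the initial banner mount only when the user is signed in AND the
// Convex entitlement snapshot hasn't loaded yet (currentState === null).
// Anonymous users mount immediately; signed-in users fail closed.
function shouldDeferInitialMount(signedIn, entitlementState) {
  return signedIn && entitlementState === null;
}
```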
6977e9d0fe
fix(gateway): accept Dodo entitlement as pro, not just Clerk role — unblocks paying users (#3249)
* fix(gateway): accept Dodo entitlement as pro, not just Clerk role
The gateway's legacy premium-paths gate (lines 388-401) was rejecting
authenticated Bearer users with 403 "Pro subscription required"
whenever session.role !== 'pro' — which is EVERY paying Dodo
subscriber, because the Dodo webhook pipeline writes Convex
entitlements and does NOT sync Clerk publicMetadata.role.
So the flow was:
- User pays, Dodo webhook fires, Convex entitlement tier=1 written
- User loads the dashboard, Clerk token includes Bearer but role='free'
- Gateway sees role!=='pro' → 403 on every intelligence/trade/
economic/sanctions premium endpoint
- User sees a blank dashboard despite having paid
This is the exact split-brain documented at the frontend layer
(src/services/panel-gating.ts:11-27): "The Convex entitlement check
is the authoritative signal for paying customers — Clerk
`publicMetadata.plan` is NOT written by our webhook pipeline". The
frontend was fixed by having hasPremiumAccess() fall through to
isEntitled() from Convex. The backend gateway still had the
Clerk-role-only gate, so paying users got rejected even though
their Convex entitlement was active.
Align the gateway gate with the logic already in
server/_shared/premium-check.ts::isCallerPremium (line 44-49):
1. If Clerk role === 'pro' → allow (fast path, no Redis/Convex I/O)
2. Else if session.userId → look up Convex entitlement; allow if
tier >= 1 AND validUntil >= Date.now() (covers lapsed subs)
3. Else → 403
Same two-signal semantics as the per-handler isCallerPremium, so
the gateway and handlers can't disagree on who is premium. Uses
the already-imported getEntitlements function (line 345 already
imports it dynamically; promoting to top-level import since the new
site is in a hotter path).
Impact: unblocks all Dodo subscribers whose Clerk role is still
'free' — the common case after any fresh Pro purchase and for
every user since webhook-based role sync was never wired up.
Reported 2026-04-21 post-purchase flow: user completed Dodo payment,
landed back on dashboard, saw 403s on get-regional-snapshot,
get-tariff-trends, list-comtrade-flows, get-national-debt,
deduct-situation — all 5 are in PREMIUM_RPC_PATHS but not in
ENDPOINT_ENTITLEMENTS, so they hit this legacy gate.
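The two-signal gate (steps 1-3 above) can be sketched as below. `getEntitlements` is a stand-in for the real Convex lookup; this is an illustration of the semantics, not the gateway's actual code.

```javascript
// Two-signal premium check: Clerk role fast path first (no Convex I/O),
// then Convex Dodo entitlement fallback. Fails closed on every negative
// case (anonymous, no userId, no entitlement, lapsed validUntil).
async function isCallerPremium(session, getEntitlements, now = Date.now()) {
  if (!session) return false;
  if (session.role === "pro") return true; // fast path
  if (!session.userId) return false;
  const ent = await getEntitlements(session.userId);
  return !!ent && ent.tier >= 1 && ent.validUntil >= now;
}
```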
* fix(gateway): move entitlement fallback to the gate that actually fires
Reviewer caught that the previous iteration of this fix put the
entitlement fallback at line ~400, inside an `if (sessionUserId &&
!keyCheck.valid && needsLegacyProBearerGate)` branch that's
unreachable for the case the PR was supposed to fix:
- sessionUserId is only resolved when isTierGated is true (line 292)
— JWKS lookup is intentionally skipped for non-tier-gated paths.
- needsLegacyProBearerGate IS the non-tier-gated set
(PREMIUM_RPC_PATHS && !isTierGated).
- So sessionUserId is null, the branch never enters, and the actual
legacy-Bearer rejection still happens earlier at line 367 inside
the `keyCheck.required && !keyCheck.valid` branch.
Move the entitlement fallback INTO the line-367 check, where the
Bearer is already being validated and `session.userId` is already
exposed on the validateBearerToken() result. No extra JWKS round-trip
needed (validateBearerToken already verified the JWT). The previously-
added line-400 block is removed since it never ran.
Now for a paying Dodo subscriber whose Clerk role is still 'free':
- Bearer validates → role !== 'pro'
- Fall through: getEntitlements(session.userId) → tier=1, validUntil future
- allowed = true, request proceeds to handler
Same fail-closed semantics as before for the negative cases:
- Anonymous → no Bearer → 401
- Bearer with invalid JWT → 401
- Free user with no Dodo entitlement → 403
- Pro user whose Dodo subscription lapsed (validUntil < now) → 403
* chore(gateway): drop redundant dynamic getEntitlements import
Greptile spotted that the previous commit promoted getEntitlements to
a top-level import for the new line-385 fallback site, but the older
dynamic import at line 345 (in the user-API-key entitlement check
branch) was left in place. Same module, same symbol, so the dynamic
import is now dead weight that just adds a microtask boundary to the
hot path.
Drop it; line 345's `getEntitlements(sessionUserId)` call now resolves
through the top-level import like the line-385 site already does.
4d9ae3b214
feat(digest): topic-grouped brief ordering (size-first) (#3247)
ee93fb475f
fix(portwatch): cut HISTORY_DAYS 90 to 60 so per-country fits 540s budget (#3246)
Prod log 2026-04-21T00:02Z on the standalone portwatch-port-activity service confirmed the per-country shape at 90 days still doesn't fit even in its own 540s container:
  batch 1/15: 7 seeded, 5 errors (90.0s)
  batch 5/15: 42 seeded, 18 errors (392.6s)
  [SIGTERM at 540s after ~batch 7]
Math: avg ~75s/batch × 15 batches = 1125s needed vs 540s available. The degradation guard would reject the ~50-country partial publish against a prev snapshot of ~150+ countries.
60 days is the minimum window that still covers both aggregates the UI consumes: last30 (days 0-30, all current metrics) + prev30 (days 30-60, for trendDelta). Cutting from 90d→60d drops each per-country query by ~33% in row count and page count. Expected avg batch time ~50s.
No feature regression: last7/anomalySignal were already no-ops because ArcGIS's Daily_Ports_Data max date lags 10+ days behind now, so no row ever falls in the last-7-day window regardless of HISTORY_DAYS.
Test added asserting HISTORY_DAYS=60 so an accidental revert breaks CI. 55 portwatch tests pass. Typecheck + lint clean.
56a792bbc4
docs(marketing): bump source-count claims from 435+ to 500+ (#3241)
Feeds.ts is at 523 entries after PR #3236 landed. The "435+" figure has been baked into marketing copy, docs, press kit, and localized strings for a long time and is now noticeably understated. Bump to 500+ as the new canonical figure.
Also aligned three stale claims in less-visited docs:
- docs/getting-started.mdx: 70+ RSS feeds → 500+ RSS feeds
- docs/ai-intelligence.mdx: 344 sources → 500+ sources
- docs/COMMUNITY-PROMOTION-GUIDE: 170+ news feeds → 500+ news feeds; 170+ news sources → 500+ news sources
And bumped the digest-dedup copy from 400+ to 500+ (English + French locales + the pro-test/index.html prerendered body) for consistency with the pricing and GDELT panels.
Left alone on purpose (different metrics): 22 services / 22 service domains; 24 feeds (security-advisory seeder specifically); 31 sources (freshness tracker); 45 map layers.
Rebuilt the /pro bundle so the per-locale chunks + prerendered index.html under public/pro/assets ship the new copy. 20 locales updated.
d880f6a0e7
refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs (#3238)
* refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs
seed-aviation.mjs was misnamed: it wrote to a dead Redis key while the
51-airport AviationStack loop + ICAO NOTAM loop lived hidden inside
ais-relay.cjs, duplicating the NOTAM write already done by
seed-airport-delays.mjs.
Make seed-aviation.mjs the single home for every aviation Redis key:
aviation:delays:intl:v3 (AviationStack 51 intl — primary)
aviation:delays:faa:v1 (FAA ASWS 30 US)
aviation:notam:closures:v2 (ICAO NOTAM 60 global)
aviation:news::24:v1 (9 RSS feeds prewarmer)
One unified AIRPORTS registry (~85 entries) replaces the three separate lists.
Notifications preserved via wm:events:queue LPUSH + SETNX dedup; prev-state
migrated from in-process Sets to Redis so short-lived cron runs don't spam
on every tick. ICAO quota-exhaustion backoff retained.
Contracts preserved byte-identically for consumers (AirportDelayAlert shape,
seed-meta:aviation:{intl,faa,notam} meta keys, runSeed envelope writes).
Impact: kills ~8,640/mo wasted AviationStack calls (dead-key writes), strips
~490 lines of hidden seed logic from ais-relay, eliminates duplicate NOTAM
writer. Net -243 lines across three files.
Railway steps after merge:
1. Ensure seed-aviation service env has AVIATIONSTACK_API + ICAO_API_KEY.
2. Delete/disable the seed-airport-delays Railway service.
3. ais-relay redeploys automatically; /aviationstack + /notam live proxies
for user-triggered flight lookups preserved.
* fix(aviation): preserve last-good intl snapshot on unhealthy/skipped fetch + restore NOTAM quota-exhaust handling
Review feedback on PR #3238:
(1) Intl unhealthy → was silently overwriting aviation:delays:intl:v3 with
an empty or partial snapshot because fetchAll() always returned
{ alerts } and zeroIsValid:true let runSeed publish. Now:
• seedIntlDelays() returns { alerts, healthy, skipped } unchanged
• fetchAll() refuses to publish when !healthy || skipped:
- extendExistingTtl([INTL_KEY, INTL_META_KEY], INTL_TTL)
- throws so runSeed enters its graceful catch path (which also
extends these TTLs — idempotent)
• Per-run cache (cachedRun) short-circuits subsequent withRetry(3)
invocations so the retries don't burn 3x NOTAM quota + 3x FAA/RSS
fetches when intl is sick.
(2) NOTAM quota exhausted — PR claimed "preserved" but only logged; the
NOTAM data key was drifting toward TTL expiry and seed-meta was going
stale, which would flip api/health.js maxStaleMin=240 red after 4h
despite the intended 24h backoff window. Now matches the pre-strip
ais-relay behavior byte-for-byte:
• extendExistingTtl([NOTAM_KEY], NOTAM_TTL)
• upstashSet(NOTAM_META_KEY, {fetchedAt: now, recordCount: 0,
quotaExhausted: true}, 604800)
Consumers keep serving the last known closure list; health stays green.
Also added extendExistingTtl fallbacks on FAA/NOTAM network-rejection paths
so transient network failures also don't drift to TTL expiry.
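The preserve-last-good pattern above can be sketched over a minimal in-memory store. `extendExistingTtl` and `publishOrPreserve` here are illustrative stand-ins for the real Redis helpers, under the assumption that TTLs are only refreshed on keys that already exist (never resurrecting deleted ones).

```javascript
// Refresh TTL on existing keys only; returns which keys were extended.
function extendExistingTtl(store, keys, ttlSeconds) {
  const extended = [];
  for (const key of keys) {
    if (store.has(key)) {
      store.get(key).ttl = ttlSeconds; // extend, never create
      extended.push(key);
    }
  }
  return extended;
}

// On an unhealthy/skipped fetch: keep the last-good snapshot alive and
// throw so the caller's graceful catch path runs. Healthy: publish.
function publishOrPreserve(store, key, metaKey, snapshot, ttlSeconds) {
  if (!snapshot.healthy || snapshot.skipped) {
    extendExistingTtl(store, [key, metaKey], ttlSeconds);
    throw new Error("unhealthy fetch; preserved last-good snapshot");
  }
  store.set(key, { value: snapshot.alerts, ttl: ttlSeconds });
}
```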
* refactor(aviation): move secondary writes + notifications into afterPublish
Review feedback on PR #3238: fetchAll() was impure — it wrote FAA / NOTAM /
news and dispatched notifications during runSeed's fetch phase, before the
canonical aviation:delays:intl:v3 publish ran. If that later publish failed,
consumers could see fresh FAA/NOTAM/news alongside a stale intl key, and
notifications could fire for a run whose primary key never published,
breaking the "single home / one cron tick" atomic contract.
Restructure:
• fetchAll() now pure — returns { intl, faa, notam, news + rejection refs }.
No Redis writes, no notifications.
• Intl gate stays: unhealthy / skipped → throw. runSeed's catch extends
TTL on INTL_KEY + seed-meta:aviation:intl and exits 0. afterPublish
never runs, so no side effects escape.
• publishTransform extracts { alerts } from the bundle for the canonical
envelope; declareRecords sees the transformed shape.
• afterPublish handles ALL secondary writes (FAA, NOTAM, news) and
notification dispatch. Runs only after a successful canonical publish.
• Per-run memo (cachedBundle) still short-circuits withRetry(3) retries
so NOTAM quota isn't burned 3x when intl is sick.
NOTAM quota-exhaustion + rejection TTL-extend branches preserved inside
afterPublish — same behavior, different location.
* refactor(aviation): decouple FAA/NOTAM/news side-cars from intl's runSeed gate
Review feedback on PR #3238: the previous refactor coupled all secondary
outputs to the AviationStack primary key. If AVIATIONSTACK_API was missing
or intl was systemically unhealthy, fetchAll() threw → runSeed skipped
afterPublish → FAA/NOTAM/news all went stale despite their own upstream
sources being fine. Before consolidation, FAA and NOTAM each ran their own
cron and could freshen independently. This restores that independence.
Structure:
• Three side-car runners: runFaaSideCar, runNotamSideCar, runNewsSideCar.
Each acquires its own Redis lock (aviation:faa / aviation:notam /
aviation:news — distinct from aviation:intl), fetches its source,
writes data-key + seed-meta on success, extends TTL on failure,
releases the lock. Completely independent of the AviationStack path.
• NOTAM side-car keeps the quota-exhausted + rejection handling and
dispatches notam_closure notifications inline.
• main() runs the three side-cars sequentially, then hands off to runSeed
for intl. runSeed still process.exit()s at the end so it remains the
last call.
• Intl's afterPublish now only dispatches aviation_closure notifications
(its single responsibility).
Removed: the per-run memo for fetchAll (no longer needed — withRetry now
only re-runs the intl fetch, not FAA/NOTAM/RSS).
Net behavior:
• AviationStack 500s / missing key → FAA, NOTAM, news still refresh
normally; only aviation:delays:intl:v3 extends TTL + preserves prior
snapshot.
• ICAO quota exhausted → NOTAM extends TTL + writes fresh meta (as before);
FAA/intl/news unaffected.
• FAA upstream failure → only FAA extends TTL; other sources unaffected.
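The lock/fetch/write-or-extend lifecycle the three side-cars share can be sketched roughly as below. Everything here is illustrative: `runSideCar`, `makeFakeRedis`, and the option names are hypothetical stand-ins, not the repo's actual helpers (the real runners live in the aviation seed script and use a real Redis client).

```javascript
// In-memory stand-in for a Redis client with NX/EX set semantics (hypothetical).
function makeFakeRedis() {
  const store = new Map();
  return {
    async set(key, value, opts = {}) {
      if (opts.NX && store.has(key)) return null; // lock already held
      store.set(key, value);
      return 'OK';
    },
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async del(key) { store.delete(key); },
    async expire(key) { return store.has(key) ? 1 : 0; },
  };
}

// Sketch of one side-car run: own lock, own source, own failure handling.
async function runSideCar({ lockKey, dataKey, metaKey, ttlSec, fetchSource, redis }) {
  const token = String(Math.random());
  // Distinct NX lock (e.g. aviation:faa), so side-cars never wait on intl health.
  const acquired = await redis.set(lockKey, token, { NX: true, EX: 3600 });
  if (!acquired) return { skipped: true };
  try {
    const records = await fetchSource();
    // Success: write data key + seed-meta together.
    await redis.set(dataKey, JSON.stringify(records), { EX: ttlSec });
    await redis.set(metaKey, JSON.stringify({
      fetchedAt: new Date().toISOString(),
      count: records.length,
    }), { EX: ttlSec });
    return { published: records.length };
  } catch (err) {
    // Failure: extend TTL on the prior snapshot instead of letting it expire.
    await redis.expire(dataKey, ttlSec);
    await redis.expire(metaKey, ttlSec);
    return { extended: true, error: String(err) };
  } finally {
    if (await redis.get(lockKey) === token) await redis.del(lockKey);
  }
}
```

The point of the shape: a throwing `fetchSource` only extends TTLs for its own keys, so one sick upstream cannot stale out its siblings.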
* fix(aviation): correct Gaborone ICAO + populate FAA alert meta from registry
Greptile review on PR #3238:
P1: GABS is not the ICAO for Gaborone — the value was faithfully copied
from the pre-strip ais-relay NOTAM list which was wrong. Botswana's
ICAO prefix is FB; the correct code is FBSK. NOTAM queries for GABS
would silently exclude Gaborone from closure detection. (Pre-existing
bug in the repo; fixing while in this neighborhood.)
P2 (FAA alerts): Now that the unified AIRPORTS registry carries
icao/name/city/country for every FAA airport, use it. Previous code
returned icao:'', name:iata, city:'' — consumers saw bare IATA codes
for US-only alerts. Registry lookup via a new FAA_META map; lat/lon
stays 0,0 by design (FAA rows aren't rendered on the globe, so lat/lon
is intentionally absent from those registry rows).
P2 (NOTAM TTL on quota exhaustion): already fixed in commit
ecd56d4212
feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast (#3236)
* feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast
Four direct-RSS sources verified from a clean IP and absent everywhere
in the repo (src/config/feeds.ts, scripts/seed-*, ais-relay.cjs, RSS
allowlist). Closes the highest-ROI Iran / Israel domestic-press gap
from the ME source audit (PR #3226) with zero infra changes.
- IRNA https://en.irna.ir/rss
- Mehr News https://en.mehrnews.com/rss
- Jerusalem Post https://www.jpost.com/rss/rssfeedsheadlines.aspx
- Ynetnews https://www.ynetnews.com/Integration/StoryRss3089.xml
Propaganda-risk metadata:
- IRNA + Mehr tagged high / Iran state-affiliated (join Press TV).
- JPost + Ynetnews tagged low with knownBiases for transparency.
RSS allowlist updated in all three mirrors (shared/, scripts/shared/,
api/_rss-allowed-domains.js) per the byte-identical mirror contract
enforced by tests/edge-functions.test.mjs.
Deferred (separate PRs):
- Times of Israel: already in allowlist; was removed from feeds for
  cloud-IP 403. Needs Decodo routing.
- IDF Spokesperson: idf.il has no direct RSS endpoint; needs scraper.
- Tasnim / Press TV RSS / Israel Hayom: known cloud-IP blocks.
- WAM / SPA / KUNA / QNA / BNA: public RSS endpoints are dead; sites
  migrated to SPAs or gate with 403.
Plan doc (PR #3226) overstated the gap: it audited only feeds.ts and
missed that travel advisories + US Embassy alerts are already covered
by scripts/seed-security-advisories.mjs. NOTAM claim in that doc is
also wrong: we use ICAO's global NOTAM API, not FAA.
* fix(feeds): enable IRNA, Mehr, Jerusalem Post, Ynetnews by default
Reviewer on #3236 flagged that adding the four new ME feeds to
FULL_FEEDS.middleeast alone leaves them disabled on first run, because
App.ts:661 persists computeDefaultDisabledSources() output derived
from DEFAULT_ENABLED_SOURCES. Users would have to manually re-enable
via Settings > Sources, defeating the purpose of broadening the
default ME mix.
Add the four new sources to DEFAULT_ENABLED_SOURCES.middleeast so they
ship on by default. Placement keeps them adjacent to their peers
(IRNA / Mehr with the other Iran sources, JPost / Ynetnews after
Haaretz). Risk/slant tags already in SOURCE_PROPAGANDA_RISK ensure
downstream digest dedup + summarization weight them correctly.
* style(feeds): move JPost + Ynetnews under Low-risk section header
Greptile on #3236 flagged that both entries are risk: 'low' but were
inserted above the `// Low risk - Independent with editorial standards`
comment header, making the section boundary misleading for future
contributors. Shift them under the header where they belong. No
runtime change; cosmetic ordering only.
661bbe8f09
fix(health): nationalDebt threshold 7d → 60d — match monthly cron interval (#3237)
* fix(health): nationalDebt threshold 7d → 60d to match monthly cron cadence
User reported health showing:
"nationalDebt": { status: "STALE_SEED", records: 187, seedAgeMin: 10469, maxStaleMin: 10080 }
Root cause: api/health.js had `maxStaleMin: 10080` (7 days) on a seeder
that runs every 30 days via seed-bundle-macro.mjs:
{ label: 'National-Debt', intervalMs: 30 * DAY, ... }
The threshold was narrower than the cron interval, so every month
between days 8–30 it guaranteed STALE_SEED. Original comment
"7 days — monthly seed" even spelled the mismatch out loud.
Data source cadence:
- US Treasury debt_to_penny API: updates daily but we only snapshot latest
- IMF WEO: quarterly/semi-annual release — no value in checking daily
- 30-day cron is appropriate; stale threshold should be ≥ 2× interval
Fix: bump maxStaleMin to 86400 (60 days). Matches the 2× pattern used
by faoFoodPriceIndex + recovery pillar (recoveryFiscalSpace, etc.)
which also run monthly.
Also fixes the same mismatch in scripts/regional-snapshot/freshness.mjs —
the 10080 ceiling there would exclude national-debt from capital_stress
axis scoring 23 days out of every 30 between seeds.
* fix(seed-national-debt): raise CACHE_TTL to 65d so health.js stale window is actually reachable
PR #3237 review was correct: my earlier fix set api/health.js
SEED_META.nationalDebt.maxStaleMin to 60d (86400min), but the seeder's
CACHE_TTL was still 35d. After a missed monthly cron, the canonical key
expired at day 35 — long before the 60d "stale" threshold. Result path:
hasData=false → api/health.js:545-549 → status = EMPTY (crit)
Not STALE_SEED (warn) as my commit message claimed.
writeFreshnessMetadata() in scripts/_seed-utils.mjs:222 sets meta TTL to
max(7d, ttlSeconds), so bumping ttlSeconds alone propagates to both the
canonical payload AND the meta key.
Fix:
- CACHE_TTL 35d → 65d (5d past the 60d stale window so we get a clean
STALE_SEED → EMPTY transition without keys vanishing mid-warn).
- runSeed opts.maxStaleMin 10080 (7d) → 86400 (60d) so the in-seeder
declaration matches api/health.js. Field is only validated for
presence by runSeed (scripts/_seed-utils.mjs:798), but the drift was
what hid the TTL invariant in the first place.
Invariant this restores: for any SEED_META entry,
seeder CACHE_TTL ≥ maxStaleMin + buffer
so the "warn before crit" gradient actually exists.
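The restored invariant can be expressed as a small check. This is an illustrative guard, not repo code; the function name, parameter names, and 5-day default buffer are assumptions, with values in minutes matching the maxStaleMin convention in api/health.js.

```javascript
const DAY_MIN = 24 * 60;

// Hypothetical invariant check: the cached key must outlive the stale window
// so STALE_SEED (warn) can fire before the key expires and the status
// degrades straight to EMPTY (crit).
function ttlInvariantHolds({ cacheTtlMin, maxStaleMin, bufferMin = 5 * DAY_MIN }) {
  return cacheTtlMin >= maxStaleMin + bufferMin;
}
```

For nationalDebt, the fixed pairing (65d TTL, 60d stale window) satisfies the check, while the pre-fix pairing (35d TTL, 60d window) violates it.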
* fix(freshness): wire national-debt to seed-meta + teach extractTimestamp about seededAt
Reviewer P2 on PR #3237: my earlier freshness.mjs bump to 86400 was a
no-op. classifyInputs() (scripts/regional-snapshot/freshness.mjs:100-108,
122-132) uses the entry's metaKey or extractTimestamp()'s known field
list. national-debt had neither — payload carries only `seededAt`, and
extractTimestamp didn't know that field, so the "present but undated"
branch treated every call as fresh. The age window never mattered.
Two complementary fixes:
1. Add metaKey: 'seed-meta:economic:national-debt' to the freshness
entry. Primary, authoritative source — seed-meta.fetchedAt is
written by writeFreshnessMetadata() on every successful run, which is
also what api/health.js reads, keeping both surfaces consistent.
2. Add `seededAt` to extractTimestamp()'s field list. Defense-in-depth:
many other runSeed-based scripts (seed-iea-oil-stocks,
seed-eurostat-country-data, etc.) wrap output as { ..., seededAt: ISO }
with no metaKey in the freshness registry. Without this, they were
also silently always-fresh. ISO strings parse via Date.parse.
Note: `economic:eu-gas-storage:v1` uses `seededAt: String(Date.now())` —
a stringified epoch number, which Date.parse does NOT handle. That seed's
freshness classification is still broken by this entry's lack of metaKey,
but it's a separate shape issue out of scope here. Flagged in PR body.
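The seededAt behavior above can be sketched with a minimal extractTimestamp. This is a hedged re-sketch, not the function in scripts/regional-snapshot/freshness.mjs; the field list and return shape are assumptions for illustration.

```javascript
// Hypothetical sketch: scan known timestamp fields, accept only strings
// that Date.parse can interpret (ISO-8601); anything else falls through
// to the "present but undated" outcome (null here).
function extractTimestamp(payload, fields = ['fetchedAt', 'updatedAt', 'seededAt']) {
  for (const f of fields) {
    const v = payload?.[f];
    if (typeof v !== 'string') continue;
    const t = Date.parse(v);           // ISO strings parse fine
    if (Number.isFinite(t)) return t;  // stringified epoch ms yields NaN in V8
  }
  return null;
}
```

This makes the eu-gas-storage caveat concrete: `seededAt: '2026-04-20T15:32:00Z'` dates the payload, while `seededAt: String(Date.now())` still classifies as undated.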
42a86c5859
fix(preview): skip premium RPCs when main app runs inside /pro live-preview iframe (#3235)
* fix(preview): skip premium RPCs when main app runs inside /pro preview iframe
pro-test/src/App.tsx embeds the full main app as a "live preview" via
<iframe src="https://worldmonitor.app?alert=false" sandbox="...">. The
iframe boots an anonymous main-app session, which fires premium RPCs
(get-regional-snapshot, get-tariff-trends, list-comtrade-flows, and on
country-click the fetchProSections batch) with no Clerk bearer
available. Every call 401s, the circuit breakers catch and fall through
to empty fallbacks (so the preview renders fine), but the 401s surface
on the PARENT /pro page's DevTools console and Sentry because `sandbox`
includes `allow-same-origin`. Net effect: /pro pricing page shows a
flood of fake-looking errors that cost us a session of debugging to
trace back to the iframe. PR #3233's premiumFetch swap didn't help here
(there's simply no token to inject for an anonymous iframe).
Introduce `src/utils/embedded-preview.ts::IS_EMBEDDED_PREVIEW`, a
module-level boolean evaluated once at load from
`window.top !== window` (with try/catch for cross-origin sandboxes),
and short-circuit three init-time premium entry points when true:
- RegionalIntelligenceBoard.loadCurrent → renderEmpty()
- fetchTariffTrends → return emptyTariffs
- fetchComtradeFlows → return emptyComtrade
Plus one defensive gate in country-intel.fetchProSections for the case
a user clicks a country inside the iframe preview. Each gate returns
the exact same empty fallback the breaker would have produced after a
401, so visual behavior is unchanged — the preview iframe still shows
the dashboard layout with empty premium panels, just without the
network request and its console/Sentry trail. Live-tab /pro page should
now see zero 401s from regional-snapshot / tariff-trends /
comtrade-flows on load.
* fix(preview): narrow iframe gate to ?embed=pro-preview marker only
Reviewer flagged that the first iteration's `window.top !== window`
check was too broad. The repo explicitly markets "Embeddable iframe
panels" as an Enterprise feature (pro-test/src/locales/en.json:
whiteLabelDesc), so legitimate customer embeds must keep firing premium
RPCs normally. Only the /pro marketing preview — which is
known-anonymous and generates expected 401 noise — should
short-circuit.
Fix: replace the blanket iframe check with a unique marker that only
/pro's preview iframe carries.
- pro-test/src/App.tsx: iframe src switched from `?alert=false` (dead
  param, unused in main app) to `?embed=pro-preview`. Rebuilt
  public/pro/ to ship the change.
- src/utils/embedded-preview.ts: two-gate check now. Gate 1 still
  requires `window.top !== window` so the marker leaking into a
  top-level URL doesn't disable premium RPCs for the top-level app.
  Gate 2 requires `?embed=pro-preview` in location.search so only the
  known embedder matches. Enterprise white-label embeds without this
  marker behave exactly like a top-level visit.
Same three premium fetchers + the one country-intel path still gate on
IS_EMBEDDED_PREVIEW; the semantic change is purely in how the flag is
computed.
Per PR #3229 / #3228 lesson, the pro-test rebuild ships in the same PR
as the source change — public/pro/assets/index-*.js and index.html
reflect the new iframe src.
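The two-gate check can be sketched as a testable function. The real module evaluates this once at load into the IS_EMBEDDED_PREVIEW constant; the function name and the fake-window parameter here are illustrative.

```javascript
// Hypothetical re-sketch of the two-gate embed detection described above.
function computeEmbeddedPreview(win) {
  let framed = false;
  try {
    // Gate 1: must actually be inside a frame. A cross-origin sandbox can
    // throw on touching win.top, which itself proves we are framed.
    framed = win.top !== win;
  } catch {
    framed = true;
  }
  if (!framed) return false; // marker leaking into a top-level URL stays harmless
  // Gate 2: only /pro's own preview iframe carries this query marker.
  return new URLSearchParams(win.location.search).get('embed') === 'pro-preview';
}
```

An enterprise embed (framed, but without the marker) and a top-level visit (marker present, but not framed) both evaluate to false; only the /pro preview iframe hits both gates.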
240abaa8ed
fix(premium): route RPC clients through premiumFetch — stop 401s for pro users (#3233)
* fix(premium): route premium RPC clients through premiumFetch
Four generated-client instantiations were using plain globalThis.fetch,
bypassing the Clerk bearer / tester-key / WORLDMONITOR_API_KEY injection
chain. Signed-in pro users hit the premium endpoints unauthenticated
and got 401, with no visible path to recovery:
- src/components/RegionalIntelligenceBoard.ts
→ get-regional-snapshot, get-regime-history, get-regional-brief
- src/components/DeductionPanel.ts
→ deduct-situation, list-market-implications
- src/services/trade/index.ts
→ get-tariff-trends, list-comtrade-flows (+ non-premium siblings)
- src/app/country-intel.ts::fetchProSections
→ get-national-debt, getRegimeHistory/getRegionalBrief,
list-comtrade-flows
Swap each to `fetch: premiumFetch` (src/services/premium-fetch.ts),
which tries in order: existing auth header → WORLDMONITOR_API_KEY →
tester key → Clerk bearer token → unauthenticated passthrough. For
non-premium endpoints that share the same client (e.g. getTradeFlows,
getCustomsRevenue) the fallthrough behavior is identical to plain
globalThis.fetch — no regression surface.
Surfaces as user-facing 401 on /pro sign-in → redirect-to-/ flow, where
pro users briefly see the dashboard try to fetch then hit 401. After
this fix the bearer token flows through and regional/deduction/trade
panels populate as expected.
Left untouched (not hitting premium paths today, so not blocking):
- src/services/gdelt-intel.ts (searchGdeltDocuments)
- src/services/social-velocity.ts (getSocialVelocity)
- src/services/pizzint.ts (getPizzintStatus)
If any of those ever move into PREMIUM_RPC_PATHS, swap them too.
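The injection order described above can be sketched as a credential resolver. This is a hedged approximation, not src/services/premium-fetch.ts; the function name, option names, and the getClerkToken callback are assumptions.

```javascript
// Hypothetical sketch of premiumFetch's credential chain:
// existing header → API key → tester key → Clerk bearer → passthrough.
async function resolveAuthHeader(existing, { apiKey, testerKey, getClerkToken } = {}) {
  if (existing) return existing;                // caller already set a header
  if (apiKey) return `Bearer ${apiKey}`;        // WORLDMONITOR_API_KEY
  if (testerKey) return `Bearer ${testerKey}`;  // tester key
  try {
    const token = await getClerkToken?.();
    if (token) return `Bearer ${token}`;        // Clerk session bearer
  } catch {
    // Clerk not loaded yet: fall through to unauthenticated passthrough
  }
  return null;                                  // no credential available
}
```

The null terminal case is what makes the swap safe for non-premium endpoints sharing the client: with no credential, the request goes out exactly as plain globalThis.fetch would have sent it.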
* fix(premium): split trade client + disable premium-breaker persistCache
Review found a real access-control leak in the first iteration of this
PR: routing the entire shared TradeServiceClient through premiumFetch
populated module-level breakers with `persistCache: true` and
auth-invariant cache keys. A pro user's get-tariff-trends /
list-comtrade-flows response would be written to localStorage by the
breaker, and a later free / signed-out session on the same browser
would be served that cached premium data directly, bypassing both
auth injection and the gateway's entitlement check.
Two-layer fix so neither the in-memory breaker nor the persistent
cache can leak premium data across auth states:
1. **Split clients.** publicClient keeps plain globalThis.fetch and
feeds restrictions/flows/barriers/revenue breakers (non-premium,
shareable across users). premiumClient uses premiumFetch and is
ONLY used by fetchTariffTrends + fetchComtradeFlows.
2. **Disable persistCache for premium breakers.** tariffsBreaker and
comtradeBreaker flip to `persistCache: false`. In-memory cache
within a session is still fine (and expected for circuit-breaker
behavior), but the response no longer survives a reload / cross-
session switch where a different user could read it.
Both changes are needed: split clients alone would still let premium
responses ride through the old cached entries (if any) after the
deploy; persistCache:false alone would still mean a shared client
routes anonymous calls through premiumFetch (minor but avoidable
token leak potential). Together they're airtight for the leak vector.
Follow-up potentially worth doing: auth-keyed cache keys for breakers
used by premium data, so the in-tab SPA sign-out case is also sealed.
Not blocking today.
* fix(premium): invalidate in-memory premium breakers on Clerk identity change
Review caught the remaining leak vector: persistCache:false + split
clients closes cross-browser-reload leaks, but the module-level
tariffsBreaker/comtradeBreaker in-memory cache still lives for the
full 30-min / 6-hour TTL inside a single SPA session. A pro user
loads tariff/comtrade data → signs out in the same tab → any caller
is served the pro response from memory without re-auth.
Track the Clerk user id that last populated the premium breakers in a
module-level `lastPremiumUserId`. On every call to fetchTariffTrends
or fetchComtradeFlows, check the current Clerk identity via
getCurrentClerkUser(). If it changed (sign-out, user switch,
free↔pro transition), call clearMemoryCache() on both premium
breakers before executing. The breaker then falls back to a live
fetch through premiumFetch with the new caller's credentials.
`clearMemoryCache` (not `clearCache`) is deliberate — it only touches
the in-memory cache and leaves persistent storage alone. Non-premium
breakers on this same client (restrictions, flows, barriers, revenue)
are untouched: their responses are public and shareable across auth
states, so an identity-triggered clear would only cost us cache hits
with zero security benefit.
Edge cases handled:
- First call: lastPremiumUserId is undefined, no clear.
- Anonymous → anonymous: no clear (both null).
- Anonymous → pro: clears (defensive; anon can't populate anyway
because emptyTariffs/emptyComtrade fail shouldCache).
- Pro → anonymous: clears (the critical case).
- Pro A → pro B: clears (account switch).
- getCurrentClerkUser throws (Clerk not loaded yet): treated as
anonymous, safe default.
Closes the audit cycle on this PR's cache-leak thread.
* fix(premium): invalidate breakers on entitlement change, not just user id
Reviewer caught that the prior fingerprint was `userId` only. That
covers sign-out, user switch, and account change — but NOT the case
the prior commit's comment explicitly claimed: a pro→free downgrade
for the same signed-in user (subscription cancellation, billing
grace-period expiry, plan switch to annual-lapsed). The Clerk user id
doesn't change, so the invalidation never fired and the cached pro
response kept serving until the 30-min / 6-hour breaker TTL.
Widen the fingerprint to `${userId}:${plan}` (or 'anon' when
signed-out). `getCurrentClerkUser()` already reads `plan` off
`user.publicMetadata`, so no new Clerk calls needed.
Now covered:
- pro A → pro B (userId change)
- pro → anon (userId → null)
- anon → pro (defensive)
- pro (active) → free (cancelled or expired) for same userId ← new
- free → pro (upgrade for same userId) ← new
Pro users who actively downgrade (or whose subscription lapses)
during an open tab will see the cached premium response invalidate
on the next premium fetcher call, at which point the live fetch
through premiumFetch + the gateway's entitlement check returns the
correct empty/403 response for their now-free plan.
* fix(premium): fingerprint on hasPremiumAccess, not Clerk publicMetadata.plan
Reviewer caught that Clerk publicMetadata.plan is NOT the authoritative
premium signal in this codebase. Per the explicit docstring on
src/services/panel-gating.ts::hasPremiumAccess (lines 11-27), the
webhook pipeline does NOT write publicMetadata.plan — the authoritative
signal is the Convex Dodo entitlement surfaced by isProUser() /
isEntitled(). Two fingerprint blind spots that ate the prior iteration:
- Paying user with valid Dodo entitlement but no Clerk metadata:
publicMetadata.plan === 'free' → fingerprint 'uid:free' → no
invalidation when their session transitions (they never looked
premium by the fingerprint, but premiumFetch was still injecting
their token and the gateway was still serving premium data).
- Active pro user whose Dodo subscription lapses: Clerk metadata
doesn't change → fingerprint stays 'uid:pro' → no invalidation
→ cached tariff/comtrade response keeps serving until TTL
(the exact pro→free case the previous commit claimed to cover).
Swap the plan source from getCurrentClerkUser().plan to
hasPremiumAccess() — the single source of truth used by panel-gating,
widgets, search, and event handlers. It unions API key, tester keys,
Clerk pro role, AND Convex entitlement, so every path that legitimately
grants premium access contributes to the fingerprint, and every path
that revokes it triggers invalidation.
Also add a reactive path: subscribe to onEntitlementChange() from
src/services/entitlements, and wipe the premium breakers the moment
Convex publishes a new entitlement snapshot. This closes the window
between subscription lapse and the user's next premium panel click —
the currently-open tariff panel clears its memory cache immediately
instead of serving stale pro data until the user navigates.
Combined: fingerprint is now (userId, hasPremiumAccess) tuple
evaluated both lazily on every premium fetcher call AND eagerly when
Convex pushes an entitlement change.
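The lazy half of the final scheme, a (userId, hasPremiumAccess) fingerprint checked before each premium fetch, can be sketched like this. All names are illustrative; the real code lives alongside the trade-service breakers and also wires the eager onEntitlementChange path.

```javascript
// Hypothetical sketch: clear premium breakers whenever the auth fingerprint
// changes between premium fetcher calls.
function makePremiumGate({ getUserId, hasPremiumAccess, clearBreakers }) {
  let lastFingerprint;
  return function ensureFreshBreakers() {
    let fp;
    try {
      fp = `${getUserId() ?? 'anon'}:${hasPremiumAccess() ? 1 : 0}`;
    } catch {
      fp = 'anon:0'; // Clerk not loaded yet: safe anonymous default
    }
    const changed = lastFingerprint !== undefined && fp !== lastFingerprint;
    if (changed) clearBreakers(); // wipe in-memory premium caches only
    lastFingerprint = fp;
    return changed;
  };
}
```

Because the premium bit comes from the entitlement union rather than Clerk metadata, a same-user downgrade flips the fingerprint and triggers the clear, which is exactly the case the earlier iterations missed.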
d1ebc84c6c
feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.
Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:
1 <-> 5: 0.632 5 <-> 8: 0.692
1 <-> 8: 0.500 5 <-> 10: 0.656
1 <-> 10: 0.554 8 <-> 10: 0.510
Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.
Measured on the 12 real titles from that brief:
Algorithm                  | Clusters | F1    | P    | R    | Note
---------------------------|----------|-------|------|------|-----------
complete-link @ 0.55 (was) | 7        | 0.526 | 0.56 | 0.50 |
complete-link @ 0.50       | 6        | 0.435 | 0.38 | 0.50 |
single-link @ 0.55         | 4        | 0.435 | 0.28 | 1.00 | over-merge
single-link @ 0.60         | 6        | 0.727 | 0.67 | 0.80 | winner
Change
------
scripts/lib/brief-dedup-embed.mjs:
New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
union-find. Chain merges through strong intermediates when a
direct pair is weak; respects the entity veto (blocked pairs
don't union). O(N^2 alpha(N)); permutation-invariant by
construction.
scripts/lib/brief-dedup.mjs:
New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
'complete' to revert). readOrchestratorConfig returns 'clustering'
field. Dispatch at call site picks the right function. Structured
log line now includes clustering=<algo>.
tests/brief-dedup-embedding.test.mjs:
+8 regressions:
- singleLinkCluster chains the 4-way through a bridge
- veto blocks unions even when cosine passes
- permutation-invariance property test (5 shuffles)
- empty-input
- DIGEST_DEDUP_CLUSTERING default is 'single'
- DIGEST_DEDUP_CLUSTERING=complete kill switch works
- unrecognised values fall back to 'single'
- log line includes clustering=<algo>
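The union-find mechanics described above can be re-sketched compactly. This is a hedged approximation of the approach, not the repo's singleLinkCluster; the name, options, and the precomputed `cosine` callback are assumptions.

```javascript
// Hypothetical single-link clustering via union-find: a pair unions when its
// cosine clears the threshold AND the entity veto allows it; weak direct
// pairs still end up clustered if they chain through strong intermediates.
function singleLinkSketch(items, cosine, { threshold = 0.6, veto = () => false } = {}) {
  const parent = items.map((_, i) => i);
  const find = (i) => (parent[i] === i ? i : (parent[i] = find(parent[i])));
  for (let a = 0; a < items.length; a++) {
    for (let b = a + 1; b < items.length; b++) {
      if (cosine(items[a], items[b]) >= threshold && !veto(items[a], items[b])) {
        parent[find(a)] = find(b); // blocked pairs never reach this union
      }
    }
  }
  const clusters = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(item);
  });
  return [...clusters.values()];
}
```

With a~b = 0.65 and b~c = 0.62 but a~c = 0.30, complete-link would refuse the 3-way cluster while this union-find chains it through b, which is the F1 gain the probe measured.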
Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if a bad day ever surfaces chaining.
Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:
DIGEST_DEDUP_COSINE_THRESHOLD=0.60
No other changes needed — clustering=single is the default.
Verification
------------
- npm run test:data 5825/5825 pass
- tests/brief-dedup-embedding 53/53 pass (45 existing + 8 new)
- typecheck + typecheck:api clean
- biome check on changed files clean
Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
— confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
/ Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73
Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
d7393d8010
fix(pro): downgrade @clerk/clerk-js to v5 to restore auto-mount UI (#3232)
The actual root cause behind the "Clerk was not loaded with Ui
components" sign-in failure on /pro is NOT the import path — it's
that pro-test was on @clerk/clerk-js v6.4.0 while the main app
(which works fine) is on v5.125.7.
Clerk v6 fundamentally changed `clerk.load()`: the UI controller
is no longer auto-mounted by default. Both `@clerk/clerk-js` (the
default v6 entry) and `@clerk/clerk-js/no-rhc` (the bundled-UI
variant) expect the caller to either:
- load Clerk's UI bundle from CDN and pass `window.__internal_ClerkUICtor`
to `clerk.load({ ui: { ClerkUI } })`, or
- manually wire up `clerkUICtor`.
That's why my earlier "switch to no-rhc" fix (PR #3227 + #3228)
didn't actually unbreak production — both v6 variants throw the same
assertion. The error stack on the deployed bundle confirmed it:
`assertComponentsReady` from `clerk.no-rhc-UeQvd9Xf.js`.
Fix: pin pro-test to `@clerk/clerk-js@^5.125.7` to match the main
app's working version. v5 still auto-mounts UI on `clerk.load()` —
no extra wiring needed. The plain `import { Clerk } from '@clerk/clerk-js'`
pattern (which the main app uses verbatim and which pro-test had
before #3227) just works under v5.
Verification of the rebuilt bundle (chunk: clerk-PNSFEZs8.js):
- 3.05 MB (matches main app's clerk-DC7Q2aDh.js: 3.05 MB)
- 44 occurrences of mountComponent (matches main: 44)
- 3 occurrences of SignInComponent (matches main: 3)
- 0 occurrences of "Clerk was not loaded with Ui" (the assertion
error string is absent; UI is unconditionally mounted)
Includes the rebuilt public/pro/ artifacts so this fix is actually
deployed (PR #3229's CI check will catch any future PR that touches
pro-test/src without rebuilding).
0a4eff0053
feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape was
missing the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.
Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple retries
returned HTTP_STATUS 000). Not a reliable path. The only shape ArcGIS
reliably handles is per-country WHERE ISO3='X' AND date > Y (uses the
ISO3 index). Its problem was fitting 174 countries in the 420s
portwatch bundle budget — solve that by giving it its own container.
Changes:
- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the full
  Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY class that has bit us
  3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new assertions
  on the standalone bundle + Dockerfile + retry wrapper + ISO3 shape).
Full test:data: 5883 pass. Typecheck + lint clean. Post-merge Railway
provisioning: see header of seed-bundle-portwatch-port-activity.mjs
for the 7-step checklist.
234ec9bf45
chore(ci): enforce pro-test bundle freshness — local hook + CI backstop (#3229)
* chore(ci): enforce pro-test bundle freshness, prevent silent deploy staleness
public/pro/ is committed to the repo and served verbatim by Vercel.
The root build script only runs the main app's vite build — it does
NOT run pro-test's build. So any PR that changes pro-test/src/**
without manually running `cd pro-test && npm run build` and committing
the regenerated chunks ships to production with a stale bundle.
This footgun just cost us: PR #3227 fixed the Clerk "not loaded with
Ui components" sign-in bug in source, merged, deployed — and the live
site still threw the error because the committed chunk under
public/pro/assets/ was the pre-fix build. PR #3228 fix-forwarded by
rebuilding.
Two-layer enforcement so it doesn't happen again:
1. .husky/pre-push — mirrors the existing proto freshness block. If
   pro-test/ changed vs origin/main, rebuild and
   `git diff --exit-code public/pro/`. Blocks the push with a clear
   message if the bundle is stale or untracked files appear.
2. .github/workflows/pro-bundle-freshness.yml — CI backstop on any PR
   touching pro-test/** or public/pro/**. Runs `npm ci + npm run
   build` in pro-test and fails the check if the working tree shows
   any diff or untracked files under public/pro/. Required before
   merge, so bypassing the local hook still can't land a stale bundle.
Note: the hook's diff-against-origin/main check means it skips the
build when pushing a branch that already matches main on pro-test/
(e.g. fix-forward branches that only touch public/pro/). CI covers
that case via its public/pro/** path filter.
* fix(hooks): scope pro-test freshness check to branch delta, not worktree
The first version of this hook used
`git diff --name-only origin/main -- pro-test/`, which compares the
WORKING TREE to origin/main. That fires on unstaged local pro-test/
scratch edits and blocks pushing unrelated branches purely because of
dirty checkout state.
Switch to `$CHANGED_FILES` (computed earlier at line 77 from
`git diff origin/main...HEAD`), which scopes the check to commits on
the branch being pushed. This matches the convention the test-runner
gates already use (lines 93-97). Also honor `$RUN_ALL` as the safety
fallback when the branch delta can't be computed.
* fix(hooks): trigger pro freshness check on public/pro/ too, match CI
The first scoping fix used `^pro-test/` only, but the CI workflow keys
off both `pro-test/**` AND `public/pro/**`. That left a gap: a
bundle-only PR (e.g. a fix-forward rebuild like #3228, or a hand-edit
to a committed asset) skipped the local check entirely while CI would
still validate it. The hook and CI are now consistent.
Trigger condition: `^(pro-test|public/pro)/` — the rebuild + diff
check now fires whenever the branch delta touches either side of the
source/artifact pair, matching the CI workflow's path filter.
9e022f23bb
fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
"cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }
despite the 30-min warm-ping loop running. Two bugs stacked:
1. get-cable-health.ts null-upstream path didn't write Redis.
cachedFetchJson with a returning-null fetcher stores NEG_SENTINEL
(10 bytes) in cable-health-v1 for 2 min. Handler then returned
`fallbackCache || { cables: {} }` to the client WITHOUT writing to
cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
→ strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).
Fix: on null result, write the fallback response back to CACHE_KEY
(short TTL matching NEG_SENTINEL so a recovered NGA fetch can
overwrite immediately) AND refresh seed-meta with the real count.
Health now sees hasData=true during an outage.
2. Zero-cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
the valid healthy state — NGA had no active subsea cable warnings.
The old `Math.max(count, 1)` on recordCount was an intentional lie
to sidestep this; now honest.
Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
valid, not critical". recordCount now reports actual cables.length.
Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.
Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
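The writeback in fix 1 can be sketched as a small helper (the store interface, key names, and function shape here are hypothetical simplifications; the real handler goes through cachedFetchJson and Redis):

```typescript
// Sketch of the null-upstream fallback writeback: persist the fallback
// response to the cache key with a short TTL (matching the NEG_SENTINEL
// window so a recovered NGA fetch can overwrite immediately) and refresh
// seed-meta with the real cable count, so health sees hasData=true.
interface KV {
  set(key: string, value: string, ttlSec: number): void;
}

function writeBackFallback(
  kv: KV,
  cacheKey: string,
  metaKey: string,
  fallback: { cables: Record<string, unknown> },
  ttlSec: number,
): number {
  const count = Object.keys(fallback.cables).length;
  kv.set(cacheKey, JSON.stringify(fallback), ttlSec);
  kv.set(metaKey, JSON.stringify({ count, fetchedAt: Date.now() }), ttlSec);
  return count; // actual cables.length — no Math.max(count, 1) floor
}
```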
2b7f83fd3e |
fix(pro): regenerate /pro bundle with no-rhc Clerk so deploy reflects #3227 (#3228)
PR #3227 fixed pro-test/src/services/checkout.ts to import @clerk/clerk-js/no-rhc instead of the headless main export, but the deployed bundle in public/pro/assets/ was never regenerated. The Vercel deploy ships whatever is committed under public/pro/ — the root build script does not run pro-test's vite build — so production /pro continued serving the old broken clerk-C6kUTNKl.js even after #3227 merged. Sign-in still threw "Clerk was not loaded with Ui components". Rebuild: cd pro-test && npm run build, which writes the new chunks to ../public/pro/assets/. Deletes the stale clerk-C6kUTNKl.js + index-J1JYVlDk.js, adds clerk.no-rhc-UeQvd9Xf.js + index-CFLOgmG-.js, and updates pro/index.html to reference them. |
7979b4da0e |
fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts on /pro (#3227)
* fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts
The /pro marketing page was throwing "Clerk was not loaded with Ui
components" the moment an unauthenticated user clicked Sign In or
GET STARTED on a pricing tier, blocking every conversion.
@clerk/clerk-js v6 main export (`dist/clerk.mjs`) is the headless
build — it has no UI controller and expects `clerkUICtor` passed to
`clerk.load()`. Calling `openSignIn()` on it always throws. The
bundled-with-UI variant is exposed at `@clerk/clerk-js/no-rhc`
(same `Clerk` named export, drop-in).
Also adds explicit `Sentry.captureException` at both call sites,
because the rejection was being swallowed by `.catch(console.error)`
in App.tsx and by an unwrapped `c.openSignIn()` in checkout.ts —
which is why this regression had zero Sentry trail in production.
* fix(pro): catch Clerk load failures in startCheckout, not just openSignIn
The PricingSection CTA fires-and-forgets `startCheckout()` with no
.catch. The previous fix only wrapped `c.openSignIn()`, so any
rejection from `await ensureClerk()` (dynamic import failure, network
loss mid-load, clerk.load() throwing) still escaped as an unhandled
promise — defeating the Sentry coverage we added.
Now `startCheckout()` reports both load and openSignIn failures
explicitly and returns false rather than rejecting.
Also clear the cached `clerkLoadPromise` on failure so the next
button click can retry from scratch instead of replaying a rejected
promise forever.
* fix(pro): only publish Clerk instance after load() succeeds
_loadClerk() was assigning the module-level `clerk` singleton before
awaiting `clerk.load()`. If load() rejected (transient network failure,
malformed publishable key, Clerk frontend-api 4xx/5xx), the half-
initialized instance stayed cached. The next ensureClerk() call then
short-circuited on `if (clerk) return clerk;` and returned the broken
instance, bypassing the retry path that commit
1928b48e68 |
feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy (#3225)
* feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy
Follow-up to #3222 (stabiliser) — the real fix. Production log
2026-04-20T06:48-06:55 confirmed the stabilisers worked (per-country
cap enforced at 90.0s, SIGTERM printed stage+errors, abort propagated
through fetch + proxy paths) — but also proved the per-country shape
itself is the bug:
batch 1/15: 7 seeded, 5 errors (90.0s) ← per-country cap hit cleanly
batch 5/15: 40 seeded, 20 errors (371.3s) ← 4 batches / ~70s avg
SIGTERM at batch 6/15 after 420s
15 batches × ~70s = 1050s. Section budget is 420s. Per-country will
never fit, even with a perfectly-behaving ArcGIS. Three countries
(BRA, IDN, NGA) also returned "Invalid query parameters" on the
ISO3-filtered WHERE — a failure mode unique to the per-country shape.
Fix: replace 174 per-country round-trips with a single paginated pass
over EP3, grouped by ISO3 in memory (same pattern EP4 refs already
use via `fetchAllPortRefs`). ~150-200 sequential pages × ~1s each
≈ 2-4 min total wall time inside the 420s section. Eliminates the
per-country failure modes by construction.
Changes:
- New `fetchAllActivityRows(since, { signal, progress })`: paginated
`WHERE date > <ts>` across the whole Daily_Ports_Data feature server,
grouped by attributes.ISO3 into Map<iso3, rows[]>. Advances offset by
actual features.length (same server-cap defence as EP4). Checks
signal.aborted between pages.
- `fetchAll()` now reads the global map and drives `computeCountryPorts`
for each eligible ISO3. No concurrency primitives, no batch loop, no
Promise.allSettled.
- Dropped: `processCountry`, `withPerCountryTimeout`, per-country
`fetchActivityRows`, CONCURRENCY, PER_COUNTRY_TIMEOUT_MS,
BATCH_LOG_EVERY. All dead under the global pattern.
- `progress` shape now `{ stage, pages, countries }`. SIGTERM handler
logs "SIGTERM during stage=<x> (pages=N, countries=M)" — still
useful forensics if the global paginator itself hangs.
- Shutdown controller: `main()` creates an AbortController, threads
its signal through fetchAll → fetchAllActivityRows → fetchWithTimeout
→ _proxy-utils, and the SIGTERM handler calls abort() so in-flight
HTTP work stops instead of burning SIGKILL grace window. Reuses the
signal-threading plumbing shipped in #3222.
Preserved: degradation guard (>20% drop rejects), TTL extension on
failure, lock release in finally, 429 proxy fallback with signal
propagation, page-level abort checks.
Tests: 43 pass (dropped 2 withPerCountryTimeout runtime tests that
targeted removed code; kept proxyFetch pre-aborted-signal test since
the proxy plumbing is still exercised by the global fetch). Full
test:data 5865 pass. Typecheck + lint clean.
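The paginator's two defences (offset advanced by actual features.length, abort checked between pages) can be sketched as follows. Names and the feature shape are illustrative stand-ins; the real function also takes `since` and a `progress` object:

```typescript
// Sketch of the single global paginated pass: group rows by ISO3 in memory,
// advance the offset by the number of features actually returned (defends
// against a server-side page cap smaller than the requested page size), and
// honour the shutdown AbortSignal between pages.
type Feature = { attributes: { ISO3: string } };

async function paginateAll(
  fetchPage: (offset: number) => Promise<Feature[]>,
  signal: AbortSignal,
): Promise<Map<string, Feature[]>> {
  const byIso3 = new Map<string, Feature[]>();
  let offset = 0;
  for (;;) {
    if (signal.aborted) throw new Error("aborted between pages");
    const features = await fetchPage(offset);
    if (features.length === 0) break;
    for (const f of features) {
      const rows = byIso3.get(f.attributes.ISO3) ?? [];
      rows.push(f);
      byIso3.set(f.attributes.ISO3, rows);
    }
    offset += features.length; // actual length, not requested page size
  }
  return byIso3;
}
```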
* fix(portwatch): stream-aggregate EP3 into per-port accumulators (PR #3225 P1)
Review feedback on PR #3225: the first globalisation pass materialised
the full 90-day activity dataset as Map<iso3, Feature[]> before any
aggregation. At ~2000 ports × 90 days ≈ 180k feature objects × ~400
bytes = ~70MB RSS. Trades the timeout failure mode for an OOM/restart
under large datasets on the 1GB Railway container.
Fix: replace the two-phase "fetch-all then compute" shape with a
single-pass streaming aggregator.
- `fetchAndAggregateActivity(since, { signal, progress })` folds each
page's features into Map<iso3, Map<portId, PortAccum>> inline and
discards the raw features. Only ~2000 per-port accumulators (~100
bytes each = ~200KB) live across pages. Memory is O(ports), not
O(rows).
- PortAccum holds running counters for each aggregation window:
last30_calls, last30_count, last30_import, last30_export, prev30_calls,
last7_calls, last7_count. Captured once per port at first sighting
(date ASC order preserves old `rows[0].portname` behaviour).
- New `finalisePortsForCountry(portAccumMap, refMap)` — exported for
tests — computes the exact same per-port fields as the removed
`computeCountryPorts`: tankerCalls30d = last30_calls,
tankerCalls30dPrev = prev30_calls, import/export from their sums,
avg30d = last30_calls/last30_count, avg7d = last7_calls/last7_count,
anomalySignal unchanged, trendDelta unchanged, top-50 truncation
unchanged.
- `fetchAll()` now calls the aggregator + finaliser directly; the
transient Feature[] map is gone.
Preserved from PR #3225: shutdown AbortController plumbing, 429 proxy
fallback with signal propagation, degradation guard, SIGTERM
diagnostic flush, page-level abort checks.
Tests: 48 pass (was 43, +5 runtime tests for finalisePortsForCountry
covering trendDelta, anomalySignal, top-N truncation, and missing
refMap entries). Full test:data 5877 pass. Typecheck + lint clean.
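The streaming fold can be sketched with a reduced accumulator (only two of the commit's counter fields shown; the row shape is a hypothetical simplification of the ArcGIS attributes):

```typescript
// Sketch of the single-pass aggregation: fold each page's rows into small
// per-port accumulators keyed Map<iso3, Map<portId, PortAccum>> and discard
// the raw features, so memory is O(ports) rather than O(rows).
type Row = { iso3: string; portId: string; portname: string; calls: number };

interface PortAccum {
  portname: string; // captured once at first sighting (date ASC order)
  last30_calls: number;
  last30_count: number;
}

function foldPage(
  accums: Map<string, Map<string, PortAccum>>,
  rows: Row[],
): void {
  for (const row of rows) {
    let ports = accums.get(row.iso3);
    if (!ports) accums.set(row.iso3, (ports = new Map()));
    let acc = ports.get(row.portId);
    if (!acc) {
      acc = { portname: row.portname, last30_calls: 0, last30_count: 0 };
      ports.set(row.portId, acc);
    }
    acc.last30_calls += row.calls;
    acc.last30_count += 1;
  }
}
```

A finaliser (like the commit's `finalisePortsForCountry`) then derives per-port averages from the counters, e.g. avg30d = last30_calls / last30_count.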
* fix(portwatch): skip EP3 geometry + thread signal into EP4 refs (PR #3225 review)
Two valid findings from PR #3225 review on commit
1a2295157e |
feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief (#3224)
* feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief

Problem
-------
The 2026-04-20 08:00 brief contained 12 stories, 7 of which were duplicates of 4 events, alongside low-importance filler (niche commodity, domestic crime). notification-relay's IMPORTANCE_SCORE_MIN gate (#3223, set to 63) only applies to the realtime fanout path. The digest cron reads the same story:track:*.currentScore but has NO absolute score floor — it just ranks and slices(0, 30), so on slow news days low-importance items bubble up to fill slots.

Change
------
scripts/seed-digest-notifications.mjs:
- New getDigestScoreMin() reads DIGEST_SCORE_MIN env at call time (Railway flips apply on the next cron tick, no redeploy).
- Default 0 = no-op, so this PR is behaviour-neutral until the env var is set on Railway.
- Filter runs AFTER deduplicateStories() so it drops clusters by the REPRESENTATIVE's score (which is the highest-scoring member of its cluster per materializeCluster's sort).
- One-line operator log when the floor fires: [digest] score floor dropped N of M clusters (DIGEST_SCORE_MIN=X)

tests/digest-score-floor.test.mjs (6 regressions):
- getDigestScoreMin reads from process.env (not a module const)
- default is 0 (no-op)
- rejects non-integer / negative values (degrades to 0)
- filter runs AFTER dedup, BEFORE slice(0, DIGEST_MAX_ITEMS)
- short-circuits when floor is 0 (no wasted filter pass)
- log line emits "dropped N of M clusters"

Operator activation
-------------------
Set on Railway seed-digest-notifications service: DIGEST_SCORE_MIN=63. Start at 63 to match the realtime gate, then nudge up/down based on the log lines over ~24h. Unset = off (pre-PR behaviour).

Why not bundle a cosine-threshold bump
--------------------------------------
The cosine-threshold tuning (0.60 -> 0.55 per the threshold probe) is an env-only flip already supported by the dedup orchestrator. Bundling an env-default change into this PR would slow rollback. Operator sets DIGEST_DEDUP_COSINE_THRESHOLD=0.55 on Railway as a separate action; this PR stays scoped to the score floor.

Verification
------------
- npm run test:data 5825/5825 pass
- tests/digest-score-floor 6/6 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean (pre-existing main() complexity warning on this file is unchanged)
- lint:md 0 errors
- version:check OK

Post-Deploy Monitoring & Validation
-----------------------------------
- **What to monitor** after setting DIGEST_SCORE_MIN on Railway:
  - `[digest] score floor dropped` lines — expect ~5-25% of clusters dropped on bulk-send ticks (stories=700+)
  - `[digest] Cron run complete: N digest(s) sent` stays > 0
- **Expected healthy behaviour**
  - 0-5 clusters dropped on normal ~80-story ticks
  - 50-200 dropped on bulk 700+ story ticks
  - brief still reports 10-30 stories for PRO users
- **Failure signals / rollback**
  - 0 digests sent for 24h after flipping the var
  - user-visible brief now has < 10 stories
  - Rollback: unset DIGEST_SCORE_MIN on Railway dashboard (instant, no deploy), next cron tick reverts to unfiltered behaviour
- **Validation window**: 24h
- **Owner**: koala73

Related
-------
- #3218 LLM prompt upgrade (source of importanceScore quality)
- #3221 geopolitical scope for critical
- #3223 notification-relay realtime gate (mirror knob)
- #3200 embedding-based dedup (the other half of brief quality)

* fix(digest): return null (not []) when score floor drains every cluster

Greptile P2 finding on PR #3224. When DIGEST_SCORE_MIN is set high enough to filter every cluster, buildDigest previously returned [] (empty array). The caller's `if (!stories)` guard only catches falsy values, so [] slipped past the "No stories in window" skip-log and the run reached formatDigest([], nowMs) which returns null, then silently continued at the !storyListPlain check. Flow was still correct (no digest sent) but operators lost the observability signal to distinguish "floor too high" from "no news today" from "dedup ate everything".

Fix:
- buildDigest now returns null when the post-floor list is empty, matching the pre-dedup-empty path. Caller's existing !stories guard fires the canonical skip-log.
- Emits a distinct `[digest] score floor dropped ALL N clusters (DIGEST_SCORE_MIN=X) — skipping user` line BEFORE the return, so operators can spot an over-aggressive floor in the logs.
- Test added covering both the null-return contract and the distinct "dropped ALL" log line. 7/7 dedup-score-floor tests pass.
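The env parsing and floor semantics described above can be sketched as two small functions (shapes simplified; the real code lives in seed-digest-notifications.mjs and the caller passes process.env):

```typescript
// Sketch of DIGEST_SCORE_MIN handling: read at call time (not module load)
// so a Railway flip applies on the next cron tick; degrade unset,
// non-integer, or negative values to the 0 no-op.
function getDigestScoreMin(env: Record<string, string | undefined>): number {
  const raw = env.DIGEST_SCORE_MIN;
  if (!raw) return 0;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) return 0;
  return n;
}

// Applied after dedup, judging each cluster by its representative's score.
// Returns null (not []) when the floor drains everything, so the caller's
// falsy-check guard fires the canonical skip-log.
function applyScoreFloor<T extends { currentScore: number }>(
  clusters: T[],
  floor: number,
): T[] | null {
  if (floor <= 0) return clusters; // short-circuit: no wasted filter pass
  const kept = clusters.filter((c) => c.currentScore >= floor);
  return kept.length === 0 ? null : kept;
}
```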
4f38ee5a19 |
fix(portwatch): per-country timeout + SIGTERM progress flush (#3222)
* fix(portwatch): per-country timeout + SIGTERM progress flush
Diagnosed from Railway log 2026-04-20T04:00-04:07: Port-Activity section hit
the 420s section cap with only batch 1/15 logged. Gap between batch 1 (67.3s)
and SIGTERM was 352s of silence — batch 2 stalled because Promise.allSettled
waits for the slowest country and processCountry had no per-country budget.
One slow country (USA/CHN with many ports × many pages under ArcGIS EP3
throttling) blocked the whole batch and cascaded to the section timeout,
leaving batches 2..15 unattempted.
Two changes, both stabilisers ahead of the proper fix (globalising EP3):
1. Wrap processCountry in Promise.race against a 90s PER_COUNTRY_TIMEOUT_MS.
Bounds worst-case batch time at ~90s regardless of ArcGIS behaviour.
Orphan fetches keep running until their own AbortSignal.timeout(45s)
fires — acceptable since the process exits soon after either way.
2. Share a `progress` object between fetchAll() and the SIGTERM handler so
the kill path flushes batch index, seeded count, and the first 10 error
messages. Past timeout kills discarded the errors array entirely,
making every regression undiagnosable.
* fix(portwatch): address PR #3222 P1+P2 (propagate abort, eager error flush)
Review feedback on #3222:
P1 — The 90s per-country timeout did not actually stop the timed-out
country's work; Promise.race rejected but processCountry kept paginating
with fresh 45s fetch timeouts per page, violating the CONCURRENCY=12 cap
and amplifying ArcGIS throttling instead of containing it.
Fix: thread an AbortController signal from withPerCountryTimeout through
processCountry → fetchActivityRows → fetchWithTimeout. fetchWithTimeout
combines the caller signal with AbortSignal.timeout(FETCH_TIMEOUT) via
AbortSignal.any so the per-country abort propagates into the in-flight
fetch. fetchActivityRows also checks signal.aborted between pages so a
cancel lands on the next iteration boundary even if the current page
has already resolved. Node 24 runtime supports AbortSignal.any.
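The signal combination can be sketched in a few lines (Node ≥ 20.3 provides AbortSignal.any and AbortSignal.timeout; the real fetchWithTimeout's signature may differ):

```typescript
// Sketch of the P1 fix: combine the caller's per-country abort signal with a
// per-request timeout so either one cancels the in-flight fetch.
function combinedSignal(caller: AbortSignal, timeoutMs: number): AbortSignal {
  return AbortSignal.any([caller, AbortSignal.timeout(timeoutMs)]);
}

async function fetchWithTimeout(
  url: string,
  caller: AbortSignal,
  timeoutMs = 45_000,
): Promise<Response> {
  return fetch(url, { signal: combinedSignal(caller, timeoutMs) });
}
```

The pagination loop additionally checks `signal.aborted` between pages, so a per-country abort lands at the next iteration boundary even when the current page has already resolved.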
P2 — SIGTERM diagnostics missed failures from the currently-stuck batch
because progress.errors was only populated after Promise.allSettled
returned. A kill during the pending await left progress.errors empty.
Fix: attach p.catch(err => errors.push(...)) to each wrapped promise
before Promise.allSettled. Rejections land in the shared errors array
at the moment they fire, so a SIGTERM mid-batch sees every rejection
that has already occurred (including per-country timeouts that have
already aborted their controllers). The settled loop skips rejected
outcomes to avoid double-counting.
Also exports withPerCountryTimeout with an injectable timeoutMs so the
new runtime tests can exercise the abort path at 40ms. Runtime tests
verify: (a) timer fires → underlying signal aborted + work rejects with
the per-country message, (b) work-resolves-first returns the value,
(c) work-rejects-first surfaces the real error, (d) eager .catch flush
populates a shared errors array before allSettled resolves.
Tests: 45 pass (was 38, +7 — 4 runtime + 3 source-regex).
Full test:data: 5867 pass. Typecheck + lint clean.
* fix(portwatch): abort also cancels 429 proxy fallback (PR #3222 P1 follow-up)
Second review iteration on #3222: the per-country AbortController fix
from
6e639274f1 |
feat(scoring): set data-driven score gate thresholds (82/69/63) (#3223)
* feat(scoring): set data-driven score gate thresholds (82/69/63)

Calibrated from v5 shadow-log recalibration on 2026-04-20:
- critical sensitivity: 85 → 82 (fires on Hormuz closures, ship seizures, ceasefire collapses)
- high sensitivity: 65 → 69 (fires on mass shootings, blockade enforcement, major diplomatic)
- all/default MIN: 40 → 63 (drops tutorials, domestic crime, niche commodity)

Activation: set IMPORTANCE_SCORE_LIVE=1 + IMPORTANCE_SCORE_MIN=63 on Railway notification-relay env vars after this PR merges.

Scoring pipeline journey:
- PR #3069 — fixed stale relay score (Pearson 0.31 → 0.41)
- PR #3143 — closed /api/notify bypass
- PR #3144 — weight rebalance severity 55% (Pearson 0.41 → 0.67)
- PR #3218 — LLM prompt upgrade + cache v2
- PR #3221 — geopolitical scope for critical
- This PR — final threshold constants

* fix(scoring): use IMPORTANCE_SCORE_MIN for 'all' sensitivity threshold

Review found the hardcoded 63 for 'all' sensitivity diverged from the IMPORTANCE_SCORE_MIN env var used at the relay ingress gate. An operator setting IMPORTANCE_SCORE_MIN=40 would still have 'all' subscribers miss alerts scored 40-62. Now both gates use the same env var (default 63).
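A sketch of the per-sensitivity gate after the review fix (names are illustrative; the point is that 'all' reads the same env var as the ingress gate rather than a hardcoded 63):

```typescript
// Sketch of the sensitivity gate: critical and high use the calibrated
// constants; 'all' shares the IMPORTANCE_SCORE_MIN env var with the relay
// ingress gate (default 63), so the two gates cannot silently diverge.
type Sensitivity = "critical" | "high" | "all";

function scoreMin(env: Record<string, string | undefined>): number {
  const n = Number(env.IMPORTANCE_SCORE_MIN);
  return Number.isFinite(n) && n >= 0 ? n : 63;
}

function thresholdFor(
  sensitivity: Sensitivity,
  env: Record<string, string | undefined> = {},
): number {
  switch (sensitivity) {
    case "critical": return 82;
    case "high": return 69;
    case "all": return scoreMin(env);
  }
}
```

With this shape, an operator flipping IMPORTANCE_SCORE_MIN=40 lowers both the ingress gate and the 'all' subscriber gate together.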
14c1314629 |
fix(scoring): scope "critical" to geopolitical events, not domestic tragedies (#3221)
The weight rebalance (PR #3144) amplified a prompt gap: domestic mass shootings (e.g. "8 children killed in Louisiana") scored 88 because the LLM classified them as "critical" (mass-casualty 10+ killed) and the 55% severity weight pushed them into the critical gate. But WorldMonitor is a geopolitical monitor — domestic tragedies are terrible but not geopolitically destabilizing.

Prompt change (both ais-relay.cjs + classify-event.ts):
- "critical" now explicitly requires GEOPOLITICAL scope: "events that destabilize international order, threaten cross-border security, or disrupt global systems"
- Domestic mass-casualty events (mass shootings, industrial accidents) moved to "high" — still important, but not critical-sensitivity alerts
- Added counterexamples: "8 children killed in mass shooting in Louisiana → domestic mass-casualty → high" and "23 killed in fireworks factory explosion → industrial accident → high"
- Retained: "700 killed in Sudan drone strikes → geopolitical mass-casualty in active civil war → critical"

Classify cache: v2→v3 (bust stale entries that lack geopolitical scope).
Shadow-log: v4→v5 (clean dataset for recalibration under the scoped prompt).

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
e2255840f6 |
fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter (#3220)
* fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter

Follow-up to #3217. The blanket "any maplibre frame + (hostname)" rule would drop real failures on our self-hosted R2 PMTiles bucket or any first-party fetch that happens to run on a maplibre-framed stack.

Enumerated the actual third-party hosts our maplibre paths fetch from (tilecache.rainviewer.com, basemaps.cartocdn.com, tiles.openfreemap.org, protomaps.github.io) into a module-level Set and gated the filter on membership. First-party hosts keep surfacing.

Updated regression test to mirror real-world mixed stacks (maplibre + first-party fetch wrapper) so the allowlist is what decides, not the pre-existing "all frames are maplibre internals" filter which is orthogonal.

* fix(sentry): route maplibre AJAX errors past the generic vendor-only filter

Review feedback: the broader "all non-infra frames are maplibre internals" TypeError filter at main.ts:287 runs BEFORE the new host-allowlist block and short-circuits it for all-vendor stacks. Meaning a self-hosted R2 basemap fetch failure whose stack is purely maplibre frames would still be silently dropped, defeating the point of the allowlist.

Carve out the `Failed to fetch (<host>)` AJAX pattern: precompute `isMaplibreAjaxFailure` and skip the generic vendor filter when it matches, so the host-allowlist check is always the one that decides.

Added two regression tests covering the all-maplibre edge case both ways:
- allowlisted host + all-maplibre → still suppressed
- non-allowlisted host + all-maplibre → surfaces
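The decision can be sketched as a pure predicate (the host Set is copied from the commit message; message format and frame inspection are simplified stand-ins for the real beforeSend logic):

```typescript
// Sketch of the filter decision: only drop a maplibre-framed
// "Failed to fetch (<host>)" when the host is one of the known third-party
// tile hosts; first-party hosts (e.g. a self-hosted R2 bucket) keep surfacing.
const THIRD_PARTY_TILE_HOSTS = new Set([
  "tilecache.rainviewer.com",
  "basemaps.cartocdn.com",
  "tiles.openfreemap.org",
  "protomaps.github.io",
]);

function shouldDropTileFetchError(
  message: string,
  frameModules: string[],
): boolean {
  // Our own fetch code throws plain "Failed to fetch" with no hostname,
  // so a missing paren suffix never matches.
  const m = /^Failed to fetch \(([^)]+)\)$/.exec(message);
  if (!m) return false;
  const hasMaplibreFrame = frameModules.some((f) => f.includes("maplibre"));
  return hasMaplibreFrame && THIRD_PARTY_TILE_HOSTS.has(m[1]);
}
```

Membership in the Set, not the maplibre frame alone, is what decides: an all-maplibre stack fetching a non-allowlisted host still surfaces.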
1581b2dd70 |
fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)
* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

PR B of the scoring recalibration plan (docs/plans/2026-04-17-002). Builds on PR A (weight rebalance, PR #3144) which achieved Pearson 0.669. This PR targets the remaining noise in the 50-69 band where editorials, tutorials, and domestic crime score alongside real news.

LLM prompt upgrade (both writers):
- scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level guidelines, content-type distinction (editorial/opinion/tutorial → info, domestic crime → info, mass-casualty → critical), and concrete counterexamples.
- server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines added to align the second cache writer.

Classify cache bump:
- classify:sebuf:v1: → classify:sebuf:v2: in all three locations (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache, _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally (24h TTL). All items reclassified within 15 min of Railway deploy.

Keyword additions (_classifier.ts):
- HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions' (phrase patterns — no false positives on 'sanctioned 10 individuals')
- MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword misalignment during cold-cache window

Shadow-log v4:
- Clean dataset for post-prompt-change recalibration. v3 rolls off via 7-day TTL.

Deploy order: Railway first (seedClassify prewarms v2 cache immediately), then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

Review findings on PR #3218:

P1: classify-event.ts prompt was missing 2 counterexamples and the "Focus" line present in the relay prompt. Both writers share classify:sebuf:v2 cache, so differing prompts mean nondeterministic classification depending on which path writes first. Now both prompts have identical level guidelines and counterexamples (format differs: array vs single object, but classification logic is aligned).

P2: Removed 3 dead phrase-pattern keywords (sanctions imposed/package/new) — the existing 'sanctions' entry already substring-matches all of them and maps to the same (high, economic). Just dead code that misled readers into thinking they added coverage.

P2: Updated stale v3 references in cache-keys.ts (doc block + exported constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
84eec7f09f |
fix(health): align breadthHistory maxStaleMin with actual Tue-Sat cron schedule (#3219)
Production alarm: `breadthHistory` went STALE_SEED every Monday morning despite the seeder running correctly. Root cause was a threshold / schedule mismatch:
- Schedule (Railway): 02:00 UTC, Tuesday through Saturday. Five ticks per week, capturing Mon-Fri market close → following-day 02:00 UTC.
- Threshold: maxStaleMin=2880 (48h), assuming daily cadence.
- Max real gap: Sat 02:00 UTC → Tue 02:00 UTC = 72h.

The existing 48h alarm fired every Monday at ~02:00 UTC when the Sun/Mon cron ticks are intentionally absent, until the Tue 02:00 UTC run restored fetchedAt.

Fix: bump maxStaleMin to 5760 (96h). 72h covers the weekend gap; extra 24h tolerates one missed Tue run without alarming. Comment now records the actual schedule + reasoning.

No seeder change needed — logs confirm the service fires and completes correctly on its schedule (Apr 16/17/18 02:00 UTC runs all "Done" with 3/3 readings, `Stopping Container` is normal Railway cron teardown).

Diagnostic memo: this is the class of bug where the schedule comment lies. Original comment said "daily cron at 21:00 ET". True start time is 22:00 EDT / 21:00 EST Mon-Fri (02:00 UTC next day) AND only Mon-Fri, so "daily" is wrong by two days every week.
0bc5b49267 |
fix(sentry): filter MapLibre AJAXError tile-fetch transients (#3217)
WORLDMONITOR-NE/NF (8 events, 1 user): MapLibre wraps transient tile fetch failures as `TypeError: Failed to fetch (<hostname>)` and rethrows inside a Generator-backed Promise, leaking to onunhandledrejection even though DeckGLMap's map-error handler already logs them as warnings. Triggered mostly by adblockers/extensions and flaky mobile networks.

Add a beforeSend filter gated on (a) the maplibre-specific paren message format (`Failed to fetch (hostname)` — our own fetch code throws plain `Failed to fetch` without hostname) and (b) presence of a maplibre vendor frame in the stack, so a real first-party fetch regression with the same message shape still surfaces. Covered by 3 regression tests.
fc0c6bc163 |
fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected (#3216)
* fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected
WORLDMONITOR-N3: 8 events / 2 users from server-side Convex reporting
`Uncaught Error: Authentication required` thrown by requireUserId() when
a query fires before the WebSocket auth handshake completes. Every other
business error in this repo uses ConvexError("CODE"), which Convex's
server-side Sentry integration treats as expected rather than unhandled.
Migrate requireUserId to ConvexError("AUTH_REQUIRED") (no consumer parses
the message string — only a code comment references it) and add a matching
client-side ignoreErrors pattern next to the existing API_ACCESS_REQUIRED
precedent, as defense-in-depth against unhandled rejections reaching the
browser SDK.
* fix(sentry): drop broad AUTH_REQUIRED ignoreErrors — too many real call sites
Review feedback: requireUserId() backs user-initiated actions (checkout,
billing portal, API key ops), not just the benign query-race path. A bare
`ConvexError: AUTH_REQUIRED` message-regex in ignoreErrors has no stack
context, so a genuine auth regression breaking those flows for signed-in
users would be silently dropped. The server-side ConvexError migration in
convex/lib/auth.ts is enough to silence WORLDMONITOR-N3; anything that
still reaches the browser SDK should surface.
4c9888ac79 |
docs(mintlify): panel reference pages (PR 2) (#3213)
* docs(mintlify): add user-facing panel reference pages (PR 2)
Six new end-user pages under docs/panels/ for the shipped panels that
had no user-facing documentation in the published docs, per the plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md.
All claims are grounded in the live component source + SEED_META +
handler dirs — no invented fields, counts, or refresh windows.
- panels/latest-brief.mdx — daily AI brief panel (ready/composing/
locked states). Hard-gated PRO (`premium: 'locked'`).
- panels/forecast.mdx — AI Forecasts panel (internal id `forecast`,
label "AI Forecasts"). Domain + macro-region filter pills; 10%
probability floor. Free on web, locked on desktop.
- panels/consumer-prices.mdx — 5-tab retail-price surface (Overview
/ Categories / Movers / Spread / Health) with market, basket, and
7/30/90-day controls. Free.
- panels/disease-outbreaks.mdx — WHO / ProMED / national health
ministries outbreak alerts with alert/warning/watch pills. Free.
- panels/radiation-watch.mdx — EPA RadNet + Safecast observations
with anomaly scoring and source-confidence synthesis. Free.
- panels/thermal-escalation.mdx — FIRMS/VIIRS thermal clusters with
persistence and conflict-adjacency flags. Free.
Also:
- docs/docs.json — new Panels nav group (Latest Brief, AI Forecasts,
Consumer Prices, Disease Outbreaks, Radiation Watch, Thermal
Escalation).
- docs/features.mdx — cross-link every panel name in the Cmd+K
inventory to its new page (and link Country Instability + Country
Resilience from the same list).
- docs/methodology/country-resilience-index.mdx — short "In the
dashboard" bridge section naming the three CRI surfaces
(Resilience widget, Country Deep-Dive, map choropleth) so the
methodology page doubles as the user-facing panel reference for
CRI. No separate docs/panels/country-resilience.mdx — keeps the
methodology page as the single source of truth.
* docs(panels): fix Latest Brief polling description
Reviewer catch: the panel does schedule a 60-second re-poll while
in the composing state. `COMPOSING_POLL_MS = 60_000` at
src/components/LatestBriefPanel.ts:78, and `scheduleComposingPoll()`
is called from `renderComposing()` at :366. The poll auto-promotes
the panel to ready without a manual refresh and is cleared when
the panel leaves composing. My earlier 'no polling timer' line was
right for the ready state but wrong as a blanket claim.
* docs(panels): fix variant-availability claims across all 6 panel pages
Reviewer catch on consumer-prices surfaced the same class of error
on 4 other panel pages: I described variant availability with loose
phrasing ('most variants', 'where X context is relevant', 'tech/
finance/happy opt-in') that didn't match the actual per-variant
panel registries in src/config/panels.ts.
Verified matrix against each *_PANELS block directly:
Panel | FULL | TECH | FINANCE | HAPPY | COMMODITY
consumer-prices | opt | - | def | - | def
latest-brief | def | def | def | - | def (all PRO-locked)
disease-outbreaks | def | - | - | - | -
radiation-watch | def | - | - | - | -
thermal-escalation | def | - | - | - | -
forecast | def | - | - | - | - (PRO-locked on desktop)
All 6 pages now name the exact variant blocks in src/config/panels.ts
that register them, so the claim is re-verifiable by grep rather than
drifting with future panel-registry changes.
* docs(panels): fix 5 reviewer findings — no invented controls/sources/keys
All fixes cross-checked against source.
- consumer-prices: no basket selector UI exists. The panel has a
market bar, a range bar, and tab/category affordances; basket is
derived from market selection (essentials-<code>, or DEFAULT_BASKET
for the 'all' aggregate view). Per
src/components/ConsumerPricesPanel.ts:120-123 and :216-229.
- disease-outbreaks: 'Row click opens advisory' was wrong. The only
interactive elements in-row are the source-name <a> link
(sanitised URL, target=_blank); clicking the row itself is a no-op
(the only content-level listener is for [data-filter] pills and
the search input). Per DiseaseOutbreaksPanel.ts:35-49,115-117.
- disease-outbreaks: upstream list was wrong. Actual seeder uses
WHO DON (JSON API), CDC HAN (RSS), Outbreak News Today
(aggregator), and ThinkGlobalHealth disease tracker
(ProMED-sourced, 90d lookback). Noted the in-panel tooltip's
shorter 'WHO, ProMED, health ministries' summary and gave the full
upstream list with the 72h Redis TTL. Per seed-disease-outbreaks
.mjs:31-38.
- radiation-watch: summary bar renders 6 cards, not 7 — Anomalies,
Elevated, Confirmed, Low Confidence, Conflicts, Spikes. The
CPM-derived indicator is a per-row badge (radiation-flag-converted
at :67), not a summary card. Moved the CPM reference to the
per-row badges list. Per RadiationWatchPanel.ts:85-112.
- latest-brief: Redis key shape corrected. The composer writes the
envelope to brief:{userId}:{issueSlot} (where issueSlot comes from
issueSlotInTz, not a plain date) and atomically writes a latest
pointer at brief:latest:{userId} → {issueSlot}. Readers resolve
via the pointer. 7-day TTL on both. Per
seed-digest-notifications.mjs:1103-1115 and
api/latest-brief.ts:80-89.
* docs(panels): Tier 1 — PRO/LLM panel reference pages (9)
Adds user-facing panel pages for the 9 PRO/LLM-backed surfaces flagged
in the extended audit. All claims grounded in component source +
src/config/panels.ts entries (with line cites).
- panels/chat-analyst.mdx — WM Analyst (conversational AI, 5 quick
actions, 4 domain scopes, POSTs /api/chat-analyst via premiumFetch).
- panels/market-implications.mdx — AI Market Implications trade signals
(LONG/SHORT/HEDGE × HIGH/MEDIUM/LOW, transmission paths, 120min
maxStaleMin, degrade-to-warn). Carries the repo's disclaimer verbatim.
- panels/deduction.mdx — Deduct Situation (opt-in PRO; 5s cooldown;
composes buildNewsContext + active framework).
- panels/daily-market-brief.mdx — Daily Market Brief (stanced items,
framework selector, live vs cached source badge).
- panels/regional-intelligence.mdx — Regional Intelligence Board
(7 BOARD_REGIONS, 6 structured blocks + narrative sections,
request-sequence arbitrator, opt-in PRO).
- panels/strategic-posture.mdx — AI Strategic Posture (cached posture
+ live military vessels → recalcPostureWithVessels; free on web,
enhanced on desktop).
- panels/stock-analysis.mdx — Premium Stock Analysis (per-ticker
deep dive: signal, targets, consensus, upgrades, insiders, sparkline).
- panels/stock-backtest.mdx — Premium Backtesting (longitudinal view;
live vs cached data badge).
- panels/wsb-ticker-scanner.mdx — WSB Ticker Scanner (retail sentiment
+ velocity score with 4-tier color bucketing).
All 9 are PRO (8 via apiKeyPanels allowlist at src/config/panels.ts:973,
strategic-posture is free-on-web/enhanced-on-desktop). Variant matrices
name the exact *_PANELS block registering each panel.
* docs(panels): Tier 2 — flagship free data panels (7)
Adds reference pages for 7 flagship free panels. Every claim grounded
in the panel component + src/config/panels.ts per-variant registration.
- panels/airline-intel.mdx — 6-tab aviation surface (ops/flights/
airlines/tracking/news/prices), 8 aviation RPCs, user watchlist.
- panels/tech-readiness.mdx — ranked country tech-readiness index with
6-hour in-panel refresh interval.
- panels/trade-policy.mdx — 6-tab trade-policy surface (restrictions/
tariffs/flows/barriers/revenue/comtrade).
- panels/supply-chain.mdx — composite stress + carriers + minerals +
Scenario Engine trigger surface (free panel, PRO scenario activation).
- panels/sanctions-pressure.mdx — OFAC SDN + Consolidated list
pressure rollup with new/vessels/aircraft summary cards and top-8
country rows.
- panels/hormuz-tracker.mdx — Hormuz chokepoint drill-down; status
indicator + per-series bar charts; references Scenario Engine's
hormuz-tanker-blockade template.
- panels/energy-crisis.mdx — IEA 2026 Energy Crisis Policy Response
Tracker; category/sector/status filters.
All 7 are free. Variant matrices name exact *_PANELS blocks
registering each panel.
* docs(panels): Tier 3 — compact panels (5)
Adds reference pages for 5 compact user-facing panels.
- panels/world-clock.mdx — 22 global market-centre clocks with
exchange labels + open/closed indicators (client-side only).
- panels/monitors.mdx — personal keyword alerts, localStorage-persisted;
links to Features → Custom Monitors for longer explanation.
- panels/oref-sirens.mdx — OREF civil-defence siren feed; active +
24h wave history; free on web, PRO-locked on desktop (_desktop &&
premium: 'locked' pattern).
- panels/telegram-intel.mdx — topic-tabbed Telegram channel mirror
via relay; free on web, PRO-locked on desktop.
- panels/fsi.mdx — US KCFSI + EU FSI stress composites with
four-level colour buckets (Low/Moderate/Elevated/High).
All 5 grounded in component source + variant registrations.
oref-sirens and telegram-intel correctly describe the _desktop &&
premium: 'locked' pattern rather than the misleading 'PRO' shorthand
used earlier for other desktop-locked panels.
* docs(panels): Tier 4 + 5 catalogue pages, nav re-grouping, features cross-links
Closes out the comprehensive panel-reference expansion. Two catalogue
pages cover the remaining ~60 panels collectively so they're all
searchable and findable without dedicated pages per feed/tile.
- panels/news-feeds.mdx — catalogue covering all content-stream panels:
regional news (africa/asia/europe/latam/us/middleeast/politics),
topical news (climate/crypto/economic/markets/mining/commodity/
commodities), tech/startup streams (startups/unicorns/accelerators/
fintech/ipo/layoffs/producthunt/regionalStartups/thinktanks/vcblogs/
defense-patents/ai-regulation/tech-hubs/ai/cloud/hardware/dev/
security/github), finance streams (bonds/centralbanks/derivatives/
forex/institutional/policy/fin-regulation/commodity-regulation/
analysis), happy variant streams (species/breakthroughs/progress/
spotlight/giving/digest/events/funding/counters/gov/renewable).
- panels/indicators-and-signals.mdx — catalogue covering compact
market-indicator tiles, correlation panels, and misc signal surfaces.
Grouped by function: sentiment, macro, calendars, market-structure,
commodity, crypto, regional economy, correlation panels, misc signals.
docs/docs.json — split the Panels group into three for navigability:
- Panels — AI & PRO (11 pages)
- Panels — Data & Tracking (16 pages)
- Panels — Catalogues (2 pages)
docs/features.mdx — Cmd+K inventory rewritten as per-family sub-lists
with links to every panel page (or catalogue page for the ones
that live in a catalogue). Replaces the prior run-on paragraph.
Every catalogue panel is also registered in at least one *_PANELS
block in src/config/panels.ts — the catalogue pages note this and
point readers to the config file for variant-availability details.
* docs(panels): fix airline-intel + world-clock source-of-truth errors
- airline-intel: refresh behavior section was wrong on two counts.
(1) The panel DOES have a polling timer: a 5-minute setInterval
in the constructor calling refresh() (which reloads ops + active
tab). (2) The 'prices' tab does NOT re-fetch on tab switch —
it's explicitly excluded from both tab-switch and auto-refresh
paths, loading only on explicit search-button click. Three
distinct refresh paths now documented with source line hints.
Per src/components/AirlineIntelPanel.ts ~:173 (setInterval),
:287 (prices tab-switch guard), :291 (refresh() prices skip).
- world-clock: the WORLD_CITIES list has 30 entries, not '~22'.
Replaced the approximate count with the exact number and a
:14-43 line-range cite so it's re-verifiable.
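The three airline-intel refresh paths described above (polling timer, tab-switch reload, explicit search) can be sketched as follows. This is a minimal illustration with assumed names, not the repo's AirlineIntelPanel code; the real panel starts a 5-minute setInterval in its constructor, which is only described in a comment here so the sketch runs to completion as a script.

```typescript
type Tab = "ops" | "flights" | "airlines" | "tracking" | "news" | "prices";

// Hypothetical sketch of the refresh-path split: 'prices' is excluded from
// both the polling timer and tab-switch reloads, loading only on explicit
// search. Names and structure are illustrative assumptions.
class PanelRefreshSketch {
  activeTab: Tab = "ops";
  loads: Tab[] = []; // records which tabs were (re)fetched

  constructor() {
    // In the real panel a 5-minute setInterval calls refresh() here;
    // omitted in this sketch so the script exits cleanly.
  }

  // Path 1: polling timer target — reloads the active tab, except 'prices'.
  refresh(): void {
    if (this.activeTab !== "prices") this.loadTab(this.activeTab);
  }

  // Path 2: tab switch — 'prices' is guarded out and does not re-fetch.
  switchTab(tab: Tab): void {
    this.activeTab = tab;
    if (tab !== "prices") this.loadTab(tab);
  }

  // Path 3: explicit search-button click is the only way prices loads.
  searchPrices(): void {
    this.loadTab("prices");
  }

  private loadTab(tab: Tab): void {
    this.loads.push(tab);
  }
}
```

The point of the split is that the prices tab is backed by an expensive search, so neither the timer nor casual tab switches should trigger it.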
d1a4cf7780
docs(mintlify): add Route Explorer + Scenario Engine workflow pages (#3211)
* docs(mintlify): add Route Explorer + Scenario Engine workflow pages
Checkpoint for review on the IA refresh (per plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).
- docs/docs.json: link Country Resilience Index methodology under
Intelligence & Analysis so the flagship 222-country feature is
reachable from the main nav (previously orphaned). Add a new
Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for,
Cmd+K entry, four tabs (Current / Alternatives / Land / Impact),
inputs, keyboard bindings, map-state integration, PRO gating with
free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template
categories (conflict / weather / sanctions / tariff_shock /
infrastructure / pandemic), how a scenario activates on the map,
PRO gating, pointers to the async job API.
Deferred to follow-up commits in the same PR:
- documentation.mdx landing rewrite
- features.mdx refresh
- maritime-intelligence.mdx link-out to Route Explorer
- Panels nav group (waits for PR 2 content)
All content grounded in live source files cited inline.
* docs(mintlify): fix Route Explorer + Scenario Engine review findings
Reviewer caught 4 cases where I described behavior I hadn't read
carefully. All fixes cross-checked against source.
- route-explorer (free-tier): the workflow does NOT blur a numeric
payload behind a public demo route. On free tier, fetchLane()
short-circuits to renderFreeGate() which blurs the left rail,
replaces the tab area with an Upgrade-to-PRO card, and applies a
generic public-route highlight on the map. No lane data is rendered
in any tab. See src/components/RouteExplorer/RouteExplorer.ts:212
and :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the
panel and the map. Direct field jumps are F (From), T (To),
P (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp
binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL,
Esc close, ? help, 1-4 tabs). See
src/components/RouteExplorer/KeyboardHelp.ts:9 and
RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates
of 4 types today (conflict, weather, sanctions, tariff_shock). The
ScenarioType union includes infrastructure and pandemic but no
templates of those types ship. Dropped them from the shipped table
and noted the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes status: 'done'
(not 'completed') on success, 'failed' on error; pending is
synthesised by the status endpoint when no worker record exists.
Fixed both the new workflow page and the merged api-scenarios.mdx
completed-response example + polling language. See
scripts/scenario-worker.mjs:421 and
src/components/SupplyChainPanel.ts:870.
* docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)
- api-scenarios (template example): replaced invented
hormuz-closure-30d / ["hormuz"] with the actually-shipped
hormuz-tanker-blockade / ["hormuz_strait"] from
scenario-templates.ts:80. Listed the other 5 shipped template IDs
so scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): worker writes FOUR states, not
three. Added the intermediate "processing" state with startedAt,
written by the worker at job pickup (scenario-worker.mjs:411).
Lifecycle now: pending → processing → done|failed. Both pending and
processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state language
and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate" with the
actual × dismiss control on the scenario banner (aria-label:
"Dismiss scenario") per src/components/SupplyChainPanel.ts:790.
Also described the banner contents (name, chokepoints, countries,
tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected
"hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb"
across all four occurrences in the shipping v2 page (from PR #3209).
Real IDs come from server/_shared/chokepoint-registry.ts
(snake_case, not kebab-case, not bare "hormuz").
* docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)
- scenario-engine: accurate description of the rendered scenario
banner. Always-present elements are the ⚠ icon, scenario name,
top-5 impacted countries with impact %, and dismiss ×. Params chip
(e.g. '14d · +110% cost') and 'Simulating …' tagline are conditional
on the worker result carrying template parameters (durationDays,
disruptionPct, costShockMultiplier). The banner never lists affected
chokepoints by name — the map and the chokepoint cards surface
those. Per renderScenarioBanner at
src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record and
the owner-index set's 30-day TTL via atomic pipeline (SET + SADD +
EXPIRE). rotate-secret and reactivate only extend the record's TTL —
neither touches the owner-index set, so the owner index can expire
independently if a caller only rotates/reactivates within a 30-day
window. Re-register to keep both alive. Per
api/v2/shipping/webhooks.ts:230 (register pipeline) and :325
(rotate setCachedJson on record only).
* docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)
- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin) +
PRO' was wrong — isCallerPremium() explicitly skips trusted-origin
short-circuits (keyCheck.required === false) and only counts (a) an
env-valid or user-owned wm_-prefixed API key with apiAccess
entitlement, or (b) a Clerk bearer with role=pro or Dodo tier ≥ 1.
Browser calls work because premiumFetch() injects one of those
credentials per request, not because Origin alone authenticates.
Per server/_shared/premium-check.ts:34 and
src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section to
state outright that authentication and PRO entitlement are
orthogonal, and that trusted Origin is NOT accepted as PRO even
though it is accepted for public endpoints. Listed the two real
credential forms that pass the gate.
* docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)
Greptile review P2: prose cited MapContainer.ts:1004 but
activateScenario is declared at :1010. Line 1004 landed inside the
JSDoc block.
* docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out
Completes the PR 1 items from
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md
that were deferred after the checkpoint on Route Explorer + Scenario
Engine + CRI nav. No new pages — only edits to existing pages to
point at and cohere with the new workflow pages.
- documentation.mdx: landing rewrite. Dropped brittle counts (344
news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed
services) in favor of durable product framing. Surfaced the shipped
differentiators that were invisible on the landing previously:
Country Resilience Index (222 countries, linked to its methodology
page), AI daily brief, Route Explorer, Scenario Engine, MCP server.
Kept CII and CRI as two distinct country-risk surfaces — do not
conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the
stale inventory list with family-grouped descriptions that include
the panels this audit surfaced as missing (disease-outbreaks,
radiation-watch, thermal-escalation, consumer-prices, latest-brief,
forecast, country-resilience). Added a Workflows section linking to
Route Explorer and Scenario Engine, and a Country-level risk section
linking CII + CRI. Untouched sections (map, marker clustering, data
layers, export, monitors, activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer
subsection to a one-paragraph pointer at /route-explorer so the
standalone page is the canonical home. Panels nav group remains
intentionally unadded; it waits on PR 2 content to avoid rendering
an empty group in Mintlify.
1f66b0c486
fix(billing): wrap non-Error throws before Sentry.captureException (#3212)
* fix(billing): wrap non-Error throws before Sentry.captureException
Convex/Clerk bootstrap occasionally rejects with undefined, which
Sentry.captureException then serializes as a synthetic `Error: undefined`
with zero stack frames — impossible to debug. Normalize err to a real Error
carrying the non-Error value in the message so the next occurrence yields a
usable event.
Resolves WORLDMONITOR-ND.
* fix(billing): apply non-Error normalization to openBillingPortal too
Review feedback: initSubscriptionWatch was fixed but openBillingPortal
shares the same Convex/Clerk bootstrap helpers and the same raw
Sentry.captureException(err) pattern — the synthetic `Error: undefined`
signature can still surface from that path. Extract a module-level
normalizeCaughtError() helper and apply it at both catch sites.
* fix(billing): attach original err as cause on normalized Error
Greptile P2: preserve the raw thrown value as structured `cause` data so
Sentry can display it alongside the stringified message. Assigned
post-construction because tsconfig target=ES2020 lacks the ErrorOptions
typing for `new Error(msg, { cause })`; modern browsers and Sentry read
the property either way.
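The helper's contract described across these three commits (wrap non-Error throws, attach the raw value as `cause` post-construction) can be sketched as below. This is an assumed shape inferred from the commit messages, not the repo's actual billing-module code.

```typescript
// Sketch of the normalizeCaughtError() contract described above: non-Error
// throws become a real Error whose message carries the stringified value and
// whose `cause` holds the original, so Sentry gets stack frames plus the raw
// thrown value as structured data. (Assumed implementation, not repo code.)
function normalizeCaughtError(err: unknown): Error {
  if (err instanceof Error) return err;
  const wrapped = new Error(`Non-Error thrown: ${String(err)}`);
  // Assigned post-construction: with target=ES2020 the ErrorOptions overload
  // `new Error(msg, { cause })` does not typecheck, but the property itself
  // is read fine by modern browsers and Sentry.
  (wrapped as Error & { cause?: unknown }).cause = err;
  return wrapped;
}
```

Calling `Sentry.captureException(normalizeCaughtError(err))` at both catch sites then yields an event with a real stack frame at the wrap site instead of a frameless `Error: undefined`.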
4853645d53
fix(brief): switch carousel to @vercel/og on edge runtime (#3210)
* fix(brief): switch carousel to @vercel/og on edge runtime
Every attempt to ship the Phase 8 Telegram carousel on Vercel's Node
serverless runtime has failed at cold start:
- PR #3174 direct satori + @resvg/resvg-wasm: Vercel edge bundler
refused the `?url` asset import required by resvg-wasm.
- PR #3174 (fix) direct satori + @resvg/resvg-js native binding: Node
runtime accepted it, but Vercel's nft tracer does not follow
@resvg/resvg-js/js-binding.js's conditional
`require('@resvg/resvg-js-<platform>-<arch>-<libc>')` pattern, so
the linux-x64-gnu peer package was never bundled. Cold start threw
MODULE_NOT_FOUND, isolate crashed, FUNCTION_INVOCATION_FAILED on
every request including OPTIONS, and Telegram reported
WEBPAGE_CURL_FAILED with no other signal.
- PR #3204 added `vercel.json` `functions.includeFiles` to force the
binding in, but (a) the initial key was a literal path that Vercel
micromatch read as a character class (PR #3206 fixed), (b) even with
the corrected `api/brief/carousel/**` wildcard, the function still
500'd across the board. The `functions.includeFiles` path appears
honored in the deployment manifest but not at runtime for this
particular native-binding pattern.
Fix: swap the renderer to @vercel/og's ImageResponse, which is
Vercel's first-party wrapper around satori + resvg-wasm with
Vercel-native bundling. Runs on Edge runtime — matches every other
API route in the project. No native binding, no includeFiles, no nft
tracing surprises. Cold start ~300ms, warm ~30ms.
Changes:
- server/_shared/brief-carousel-render.ts: replace renderCarouselPng
(Uint8Array) with renderCarouselImageResponse (ImageResponse). Drop
ensureLibs + satori + @resvg/resvg-js dynamic-import dance. Keep
layout builders (buildCover/buildThreads/buildStory) and font
loading unchanged — the Satori object trees are wire-compatible with
ImageResponse.
- api/brief/carousel/[userId]/[issueDate]/[page].ts: flip
`runtime: 'nodejs'` -> `runtime: 'edge'`. Delegate rendering to the
renderer's ImageResponse and return it directly; error path still
503 no-store so CDN + Telegram don't pin a bad render.
- vercel.json: drop the now-useless `functions.includeFiles` block.
- package.json: drop direct `@resvg/resvg-js` and `satori` deps
(both now bundled inside @vercel/og).
- tests/deploy-config.test.mjs: replace the native-binding regression
guards with an assertion that no `functions` block exists (with a
comment pointing at the skill documenting the micromatch gotcha for
future routes).
- tests/brief-carousel.test.mjs: updated comment references.
Verified:
- typecheck + typecheck:api clean
- test:data 5814/5814 pass
- node -e test: @vercel/og imports cleanly in Node (tests that reach
through the renderer file no longer depend on native bindings)
Post-deploy validation:
curl -I -H "User-Agent: TelegramBot (like TwitterBot)" \
  "https://www.worldmonitor.app/api/brief/carousel/<uid>/<slot>/0"
# Expect: HTTP/2 403 (no token) or 200 (valid token)
# NOT: HTTP/2 500 FUNCTION_INVOCATION_FAILED
Then tail Railway digest logs on the next tick; the `[digest]
Telegram carousel 400 ... WEBPAGE_CURL_FAILED` line should stop
appearing, and the 3-image preview should actually land on Telegram.
* Add renderer smoke test + fix Cache-Control duplication
Reviewer flagged residual risk: no dedicated carousel-route smoke
test for the @vercel/og path. Adds one, and catches a real bug in
the process.
Findings during test-writing:
1. @vercel/og's ImageResponse runs CLEANLY in Node via tsx — the
comment in brief-carousel.test.mjs saying "we can't test the render
in Node" was true for direct satori + @resvg/resvg-wasm but no
longer holds after PR #3210. Pure Node render works end-to-end:
satori tree-parse, jsdelivr font fetch, resvg-wasm init, PNG output.
~850ms first call, ~20ms warm.
2. ImageResponse sets its own default `Cache-Control: public,
immutable, no-transform, max-age=31536000`. Passing Cache-Control
via the constructor's headers option APPENDS rather than overrides,
producing a duplicated comma-joined value like `public, immutable,
no-transform, max-age=31536000, public, max-age=60` on the Response.
The route handler was doing exactly this via extraHeaders. Fix: drop
our Cache-Control override and rely on @vercel/og's 1-year immutable
default — envelope is only immutable for its 7d Redis TTL so the
effective ceiling is 7d anyway (after that the route 404s before
render).
Changes:
- tests/brief-carousel.test.mjs: 6 new assertions under
`renderCarouselImageResponse`:
  * renders cover / threads / story pages, each returning a valid
PNG (magic bytes + size range)
  * rejects a structurally empty envelope
  * threads non-cache extraHeaders onto the Response
  * pins @vercel/og's Cache-Control default so it survives
caller-supplied Cache-Control overrides (regression guard for the
bug fixed in this commit)
- api/brief/carousel/[userId]/[issueDate]/[page].ts: remove the
stacked Cache-Control; lean on @vercel/og default. Drop the
now-unused `PAGE_CACHE_TTL` constant. Comment explains why.
Verified:
- test:data 5820/5820 pass (was 5814, +6 smoke)
- typecheck + typecheck:api clean
- Render smoke: cover 825ms / threads 23ms / story 16ms first run
(wasm init dominates first render)
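The append-rather-than-override behavior behind the duplicated header is standard `Headers` semantics from the Fetch spec (duplicate values are comma-joined on read), reproducible in plain Node without @vercel/og:

```typescript
// Demonstrates the Cache-Control duplication described above: appending a
// second Cache-Control to a Headers object comma-joins the values instead
// of replacing the first one.
const headers = new Headers({
  "Cache-Control": "public, immutable, no-transform, max-age=31536000",
});
headers.append("Cache-Control", "public, max-age=60");

// get() returns the comma-joined duplicate — the value the route handler
// was accidentally shipping. headers.set() would have replaced it instead.
const combined = headers.get("Cache-Control");
```

This is why passing a Cache-Control override through a constructor that internally appends caller headers stacks the directives rather than replacing the default.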
e4c95ad9be
docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage
Audit against api/ + proto/ revealed 9 OpenAPI specs missing from
nav, the scenario/v1 service undocumented, and MCP (32 tools + OAuth
2.1 flow) with no user-facing docs. The stale
Docs_To_Review/API_REFERENCE.md still pointed at pre-migration
endpoints that no longer exist.
- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast,
Health, Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's
proto-backed (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web,
Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications,
Shipping v2, Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md
(referenced dead endpoints); add README flagging dir as archival
* docs(mintlify): move scenario docs out of generated docs/api/ tree
The pre-push hook enforces that docs/api/ is proto-generated only.
Replace the hand-written ScenarioService.openapi.yaml with a plain
MDX page (docs/api-scenarios.mdx) until the proto migration lands
(tracked in issue #3207).
* docs(mintlify): fix factual errors flagged in PR review
Reviewer caught 5 endpoints where I speculated on shape/method/limits
instead of reading the code. All fixes cross-checked against the
source:
- api-shipping-v2: route-intelligence is GET with query params
(fromIso2, toIso2, cargoType, hs2), not POST with a JSON body.
Response shape is {primaryRouteId, chokepointExposures[],
bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt,
cachedUntil, priceSource} with tier groups
free|pro|api_starter|enterprise, not the invented {currency, plans}.
Document the DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk JWT
+ PRO (returning {oauthUrl}), not GET redirects. Callbacks remain
GET.
- api-platform: /api/version returns the latest GitHub Release
({version, tag, url, prerelease}), not deployed commit/build
metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match
code), not 10/hour.
Also caught while double-checking: /api/register-interest and
/api/contact are 5/60min and 3/60min respectively (1-hour window,
not 1-minute). Both require Turnstile. Removed the fabricated limits
for share-url, notification-channels, create-checkout (they fall
back to the default per-IP limit).
* docs(mintlify): second-round fixes — verify every claim against source
Reviewer caught 7 more cases where I described API behavior I hadn't
read. Each fix below cross-checked against the handler.
- api-commerce (product-catalog): tiers are flat objects with
monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid
tiers, price+period for free, price:null for enterprise. There is no
nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts.
Code is a deterministic 8-char HMAC of the Clerk userId; binding
into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): actual action set is
create-pairing-token, set-channel, set-web-push, delete-channel,
set-alert-rules, set-quiet-hours, set-digest-settings. Replaced the
made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100
(default 50), not a severity string. Subscriber IDs are wh_+24hex;
secret is raw 64-char hex (no whsec_ prefix). POST registration
returns 201. Added the management routes: GET /{id},
POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer
RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[]
and/or patterns[] (not {key} or {tag}), with explicit per-request
caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not
file=<id>. Response is a 302 to a GitHub release asset; documented
the full platform/variant tables.
- mcp: server also accepts direct X-WorldMonitor-Key in addition to
OAuth bearer. Fixed the curl example which was incorrectly sending a
wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or
videoId, not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect
that OAuth is one of two accepted modes.
* docs(mintlify): fix Greptile review findings
- mcp.mdx: 'Five' slow tools → 'Six' (list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator
(8_400_000_000) with plain integer (8400000000)
Greptile's third finding — /api/oauth/register rate-limit
contradiction across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx
— was already resolved in commit
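The referral-code derivation described above (a deterministic 8-char HMAC of the Clerk userId) can be sketched with Node's crypto module. The hash algorithm, encoding, truncation, and secret handling below are assumptions for illustration — only "deterministic 8-char HMAC of the userId" comes from the commit message.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical sketch: derive a stable 8-character referral code from a
// user ID. Same (userId, secret) always yields the same code, so no
// storage lookup is needed to regenerate it. Algorithm/encoding assumed.
function referralCode(userId: string, secret: string): string {
  return createHmac("sha256", secret).update(userId).digest("hex").slice(0, 8);
}
```

Determinism is what makes the Convex binding safe to run fire-and-forget: re-deriving the code on a later request gives the same value even if the earlier write was lost.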
38e6892995
fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs
Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.
Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.
No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
* fix(brief): carve middleware bot allowlist to accept slot-format carousel path
BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.
CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
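The slot contract the verifier and allowlist now enforce — accept the per-run YYYY-MM-DD-HHMM shape, reject the legacy date-only YYYY-MM-DD shape — can be pinned with a small regex. The pattern below is an illustration of that contract, not the repo's actual BRIEF_CAROUSEL_PATH_RE or verifier code.

```typescript
// Illustration (assumed, not repo code) of the slot shape described above:
// YYYY-MM-DD-HHMM with a 24h local-time suffix. The legacy date-only shape
// must fail so pre-rollout links cannot leak through.
const ISSUE_SLOT_RE = /^\d{4}-\d{2}-\d{2}-([01]\d|2[0-3])[0-5]\d$/;

function isValidIssueSlot(slot: string): boolean {
  return ISSUE_SLOT_RE.test(slot);
}
```

Note the HHMM alternation also rejects impossible times like 2500, not just the missing suffix.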
* fix(brief): preserve caller-requested slot + correct no-brief share-url error
Two contract bugs in the slot rollout that silently misled callers:
1. GET /api/latest-brief?slot=X where X has no envelope was returning
{ status: 'composing', issueDate: <today UTC> } — which reads as
"today's brief is composing" instead of "the specific slot you
asked about doesn't exist". A caller probing a known historical
slot would get a completely unrelated "today" signal. Now we echo
the requested slot back (issueSlot + issueDate derived from its
date portion) when the caller supplied ?slot=, and keep the
UTC-today placeholder only for the no-param path.
2. POST /api/brief/share-url with no slot and no latest-pointer was
falling into the generic invalid_slot_shape 400 branch. That is
not an input-shape problem; it is "no brief exists yet for this
user". Return 404 brief_not_found — the same code the
existing-envelope check returns — so callers get one coherent
contract: either the brief exists and is shareable, or it doesn't
and you get 404.
56054bfbc1
fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key
PR #3204 shipped the right `includeFiles` value but the WRONG key:
"api/brief/carousel/[userId]/[issueDate]/[page].ts"
Vercel's `functions` config keys are micromatch globs, not literal
paths. Bracketed segments like `[userId]` are parsed as character
classes (match any ONE character from {u,s,e,r,I,d}), so my rule
matched zero files and `includeFiles` was silently ignored.
Post-merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED
on every request. Build log shows zero mentions of `carousel` or
`resvg` — corroborates the key never applied.
Fix: wildcard path segments.
"api/brief/carousel/**"
Matches any file under the carousel route dir. Since the only
deployed file there is the dynamic-segment handler, the effective
scope is identical to what I originally intended.
Added a second regression test that sweeps every functions key and
fails loudly if any bracketed segment slips back in. Guards against
future reverts AND against anyone copy-pasting the literal route
path without realising Vercel reads it as a glob.
23/23 deploy-config tests pass (was 22, +1 new guard).
* Address Greptile P2: widen bracket-literal guard regex
Greptile spotted that `/\[[A-Za-z]+\]/` only matches
purely-alphabetic segment names. Real-world Next.js routes often use
`[user_id]`, `[issue_date]`, `[page1]`, `[slug2024]` — none flagged
by the old regex, so the guard would silently pass on the exact kind
of regression it was written to catch.
Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
- requires a leading letter (so legit char classes like `[0-9]` and
`[!abc]` don't false-positive)
- allows letters, digits, underscores after the first char
- covers every Next.js-style dynamic-segment name convention
Also added a self-test that pins positive cases (userId, user_id,
issue_date, page1, slug2024) and negative cases (the actual `**`
glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex breaks
CI immediately instead of silently re-opening PR #3206.
24/24 deploy-config tests pass (was 23, +1 new self-test).
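The widened guard regex quoted in the commit can be exercised directly; the wrapper function name below is illustrative, but the pattern itself is the one the commit ships.

```typescript
// The widened bracket-literal guard from the commit: flags Next.js-style
// dynamic segments ([userId], [user_id], [page1]) in a vercel.json functions
// key, while letting genuine micromatch character classes ([0-9], [!abc])
// and plain globs pass. (Wrapper name is an illustrative assumption.)
const BRACKET_LITERAL_RE = /\[[A-Za-z][A-Za-z0-9_]*\]/;

function hasBracketLiteral(functionsKey: string): boolean {
  return BRACKET_LITERAL_RE.test(functionsKey);
}
```

Any key that trips the guard would be read by Vercel's micromatch as a one-character class instead of a literal path segment, which is exactly the silent mismatch PR #3204 shipped.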
305dc5ef36
feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)
Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.
New modules (scripts/lib/):
- brief-dedup-consts.mjs tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs cities/regions gazetteer + common-caps
- brief-embedding.mjs OpenRouter /embeddings client with Upstash
cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs complete-link clustering + entity veto (pure)
- brief-dedup.mjs orchestrator, env read at call entry,
shadow archive, structured log line
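The cosineSimilarity helper listed above can be sketched as follows; this is a minimal illustration, and the actual brief-embedding.mjs implementation may handle validation and edge cases differently:

```javascript
// Minimal cosine similarity over embedding vectors, as used for the
// threshold-based clustering decision. Returns a value in [-1, 1];
// zero-magnitude vectors are treated as dissimilar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```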
Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs offline calibration runner + histogram
- golden-pair-validator.mjs live-embedder drift detector (nightly CI)
- shadow-sample.mjs Sample A/B CSV emitter over SCAN archive
Tests:
- brief-dedup-jaccard.test.mjs migrated from regex-harness to direct
import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation
property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs 20-pair mocked canary (21)
Workflows:
- .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary
(07:17 UTC), opens issue on drift
Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on new files clean
- lint:md 0 errors
Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.
* refactor(digest-dedup): address review findings 193-199
Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.
P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
Jaccard module so the outlet allow-list is single-sourced.
P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
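The fail-fast behaviour can be sketched like this; the class and function names follow the commit message, but the real signatures in brief-embedding.mjs are assumptions:

```javascript
// Sketch of the P1 #194 fix: when the remaining wall-clock budget is
// already spent, fail fast with a typed error instead of opening a
// fetch that is guaranteed to be aborted.
class EmbeddingTimeoutError extends Error {
  constructor(msg) {
    super(msg);
    this.name = 'EmbeddingTimeoutError';
  }
}

async function callEmbeddingsApi(inputs, timeoutMs, fetchImpl) {
  if (timeoutMs <= 0) {
    // Pre-fix behaviour: a Math.max(250, timeoutMs) floor forced a
    // doomed 250ms request that let the overall wall-clock cap overshoot.
    throw new EmbeddingTimeoutError(`no budget left (${timeoutMs}ms)`);
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl('/embeddings', {
      signal: controller.signal,
      body: JSON.stringify(inputs),
    });
  } finally {
    clearTimeout(timer);
  }
}
```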
P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
(zero callers; orchestrator reimplements inline).
P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
silently falling to jaccard).
P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
heredoc that would command-substitute validator stdout. Switched
to printf with sanitised LOG_TAIL (printable ASCII only) and
--body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
allowlist (SCAN | GET | EXISTS).
P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
expected shape, not just length. Also assert the reason= field.
P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
scripts/tools AGENTS.md doc note.
Todos 193-199 moved from pending to complete.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean
* fix(digest-dedup): address Greptile P2 findings on PR #3200
1. brief-embedding.mjs: wrap fetch lookup as
`(...args) => globalThis.fetch(...args)` instead of aliasing bare
`fetch`. Aliasing captures the binding at module-load time, so
later instrumentation / Edge-runtime shims don't see the wrapper —
same class of bug as the banned `fetch.bind(globalThis)` pattern
flagged in AGENTS.md.
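A minimal repro of the binding-capture difference:

```javascript
// The alias captures whatever globalThis.fetch was at module-load time,
// so shims installed later (instrumentation, Edge-runtime polyfills)
// are never seen. The arrow wrapper resolves the property on each call.
globalThis.fetch = async () => 'original';

const aliased = globalThis.fetch;                       // captured now
const wrapped = (...args) => globalThis.fetch(...args); // resolved per call

// Later, something swaps in an instrumented fetch:
globalThis.fetch = async () => 'instrumented';

aliased().then((r) => console.log(r)); // 'original' (shim missed)
wrapped().then((r) => console.log(r)); // 'instrumented' (shim seen)
```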
2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
silently swallowed the failure when any of dedup/canary/p1 labels
didn't pre-exist, breaking the drift alert channel while leaving
the job red in the Actions UI. Switched to repeated `--label`
flags + `--create-label` so any missing label is auto-created on
first drift, and dropped the `|| true` so a legitimate failure
(network / auth) surfaces instead of hiding.
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
* fix(digest-dedup): two P1s found on PR #3200
P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.
Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
— the same helper the orchestrator uses — so any classifier knob
added later is picked up automatically. The threshold and veto-
enabled flags are sourced from env by default; a --threshold CLI
flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
which operators must keep in lockstep with Railway. The
workflow_dispatch threshold input now defaults to empty; the
scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
output makes the classifier visible.
P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.
Fix:
- writeShadowArchive now inspects the pipeline return. null result,
non-array response, per-command {error}, or a cell without
{result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
structured log line carries archive_write=ok|failed so operators
can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
null-pipeline contract and asserts both the warn and the structured
field land.
Verification:
- test:data 5825/5825 pass
- dedup suites 65/65 pass (new: archive-fail regression)
- typecheck + api clean
- biome check clean on changed files
* fix(digest-dedup): two more P1s found on PR #3200
P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.
Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
same readOrchestratorConfig() helper the orchestrator uses. When
either says "embed path inactive in prod", the validator logs an
explicit skip line and exits 0. The nightly workflow then shows
green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
alongside the threshold and veto-enabled knobs, so all four
classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
canary output surfaces which classifier it validated.
P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.
Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
export so the logic is testable. Mode filter runs BEFORE the dedup
check: agreeing pairs are skipped entirely under
--mode disagreements, so any later disagreeing occurrence can
still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
cases: agreement-then-disagreement, reversed order (symmetry),
always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
kick off the CLI scan path.
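The filter-before-dedup ordering can be sketched as follows; the names and the archive shape are simplified assumptions, not the exact shadow-sample.mjs contract:

```javascript
// Each archive batch holds pairs with an `agreed` flag. The buggy
// version added every seen pair to the dedup set BEFORE filtering, so
// an agreeing occurrence could shadow a later disagreeing one under
// unspecified SCAN order. Filtering first means agreeing occurrences
// never claim the dedup slot under mode === 'disagreements'.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch.pairs) {
      if (mode === 'disagreements' && pair.agreed) continue; // filter FIRST
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue; // dedup second
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}

// Agreement-then-disagreement for the same pair: the disagreeing
// occurrence still lands in the sample.
const archives = [
  { pairs: [{ a: 'h1', b: 'h2', agreed: true }] },
  { pairs: [{ a: 'h1', b: 'h2', agreed: false }] },
];
console.log(enumeratePairs(archives, 'disagreements').length); // 1
```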
Verification:
- test:data 5825/5825 pass
- dedup suites 70/70 pass (5 new shadow-sample regressions)
- typecheck + api clean
- biome check clean on changed files
Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED
* refactor(digest-dedup): Railway is the single source of truth for dedup config
Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.
Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.
Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
every deduplicateStories() call (before the mode short-circuit,
so jaccard ticks also publish a "mode=jaccard" signal the canary
can read). Fire-and-forget; archive-write error semantics still
apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
path. Now calls fetchActiveConfigFromUpstash() and either
validates against that config, skips when the embed path is
inactive, or skips when the key is missing (with --force
override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
and the corresponding repo-variable dependency. Only the three
Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
tick (shadow AND jaccard modes) with the right shape + TTL.
Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.
Verification:
- test:data 5825/5825 pass
- dedup suites 72/72 pass (2 new config-publish regressions)
- typecheck + api clean
- biome check clean on changed files
* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow
User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:
DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs
SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
Railway deploy with OPENROUTER_API_KEY set = embeddings live on
next cron tick. Set MODE=jaccard on Railway to revert instantly.
Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.
Net diff: -1,407 lines.
Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.
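The resulting orchestrator shape can be sketched roughly like this; the dependency-injection style and exact field names are illustrative assumptions, not the real brief-dedup.mjs code:

```javascript
// Embed is the default path, DIGEST_DEDUP_MODE=jaccard is an instant
// kill switch, and any embed-path failure falls back to Jaccard with a
// reason=<ErrorName> warn so the cron never dies because embeddings
// flaked.
async function deduplicateStories(stories, env, deps) {
  const mode = env.DIGEST_DEDUP_MODE === 'jaccard' ? 'jaccard' : 'embed';
  if (mode === 'jaccard') {
    return { mode, clusters: deps.jaccard(stories) };
  }
  try {
    return { mode, clusters: await deps.embed(stories) };
  } catch (err) {
    // Timeout, provider outage, missing API key, bad response shape:
    // all land here, tagged by error name for operator grep.
    deps.warn(`dedup embed fallback reason=${err.name || 'Error'}`);
    return { mode: 'jaccard-fallback', clusters: deps.jaccard(stories) };
  }
}
```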
* fix(digest-dedup): multi-word location phrases in the entity veto
Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: [], actors: ['houthis','red','sea'] } ✗
shouldVeto("Houthis strike ship in Red Sea",
"US escorts convoy in Red Sea") → false ✗
With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.
Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.
Now:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: ['red sea'], actors: ['houthis'] } ✓
shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓
Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
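A minimal sketch of the greedy longest-phrase scan, assuming a small gazetteer; the real extractor also classifies actors and uses the full gazetteer:

```javascript
// Simplified from the entity-gazetteer fix: at each token position try
// the longest multi-word phrase first (down to 2 tokens), requiring the
// first AND last tokens to be capitalised so lowercase prose like "the
// middle east" does not falsely match, while lowercase connectors in
// between are fine ("Strait of Hormuz"). Falls back to single-token
// lookup when no multi-word phrase fits.
const LOCATION_GAZETTEER = new Set(['red sea', 'strait of hormuz', 'abu dhabi', 'tehran']);
const MAX_PHRASE_LEN = 4;

function extractLocations(headline) {
  const tokens = headline.split(/\s+/);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = 0;
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const slice = tokens.slice(i, i + len);
      if (!/^[A-Z]/.test(slice[0]) || !/^[A-Z]/.test(slice[len - 1])) continue;
      const phrase = slice.join(' ').toLowerCase();
      if (LOCATION_GAZETTEER.has(phrase)) {
        found.push(phrase);
        matched = len;
        break;
      }
    }
    if (!matched) {
      const single = tokens[i].toLowerCase();
      if (/^[A-Z]/.test(tokens[i]) && LOCATION_GAZETTEER.has(single)) found.push(single);
      matched = 1;
    }
    i += matched;
  }
  return found;
}

console.log(extractLocations('Houthis strike ship in Red Sea')); // ['red sea']
```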
Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
|
||
|
|
27849fee1e |
fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn
Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED since
PR #3174 merged. Not middleware (last PR fixed that theoretical path
but not the observed failure). The Vercel function itself crashes
HTTP 500 FUNCTION_INVOCATION_FAILED on every request including
OPTIONS - the isolate can't initialise.
The handler imports brief-carousel-render, which lazy-imports
@resvg/resvg-js. That package's js-binding.js does a runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel Lambda
(Amazon Linux 2 glibc) that resolves to @resvg/resvg-js-linux-x64-gnu.
Vercel nft tracing does NOT follow this conditional require, so the
optional peer package isn't bundled. Cold start throws
MODULE_NOT_FOUND, the isolate crashes, Vercel returns
FUNCTION_INVOCATION_FAILED, and Telegram reports WEBPAGE_CURL_FAILED.
Fix: vercel.json functions.includeFiles forces the linux-x64-gnu
binding into the carousel function's bundle. Only this route needs
it; every other api route is unaffected.
Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime binary
  on Amazon Linux 2 glibc
Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; next cron tick should clear the Railway [digest]
Telegram carousel 400 line.
* Address Greptile P2s: regression guard + arch-assumption reasoning
Two P2 findings on PR #3204:
P2 #1 (inline on vercel.json:6): Platform architecture assumption
undocumented. If Vercel migrates to Graviton/arm64 Lambda the
cold-start crash silently returns. vercel.json is strict JSON so
comments aren't possible inline.
P2 #2 (tests/deploy-config.test.mjs:17): No regression guard for the
carousel includeFiles rule. A future vercel.json tidy-up could
silently revert the fix with no CI signal.
Fixed both in a single block:
- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.
tests 22/22 pass (was 21, +1 new).
|
||
|
|
45f02fed00 |
fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError
OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid
re-tap), leaking as an unhandled NotFoundError. OrbitControls is
bundled into main-*.js so hasFirstParty=true; matched by the unique
setPointerCapture message (grep confirms no first-party
setPointerCapture usage). Resolves WORLDMONITOR-NC.
* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack
Review feedback: suppressing by message alone would hide a future
first-party setPointerCapture regression. Mirror the existing
OrbitControls filter's provenance check — require absence of any
source-mapped .ts/.tsx frame so the filter only matches stacks whose
only non-infra frame is the bundled main chunk. Adds positive +
negative regression tests for the pair.
* fix(sentry): gate OrbitControls filter on positive three.js context signature
Review feedback: absence of .ts/.tsx frames is not proof of
third-party origin because production stacks are often
unsymbolicated. Replace the negative-only gate with a positive
OrbitControls signature — require a frame whose context slice
contains the literal `_pointers … setPointerCapture` adjacency
unique to three.js OrbitControls. Update tests to cover the
production-realistic case (unsymbolicated first-party bundle frame
calling setPointerCapture must still reach Sentry) plus a defensive
no-context fallthrough.
|
||
|
|
d7f87754f0 |
fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged two transactional email
templates still claimed '22 services' while /pro now advertises
'30+':
- api/register-interest.js:90 — interest-registration confirmation
  email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription
  confirmation email ('22 services, one API key')
A user signing up via /pro would read '30+ services' on the page,
then receive an email saying '22'. Both updated to '30+' matching
the /pro page and the actual server domain count (31 in
server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
|
||
|
|
135082d84f |
fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)
The /pro page advertised '22 services' / '22 service domains' but
server/worldmonitor/, proto/worldmonitor/, and
src/generated/server/worldmonitor/ all have 31 domain dirs (aviation,
climate, conflict, consumer-prices, cyber, displacement, economic,
forecast, giving, health, imagery, infrastructure, intelligence,
maritime, market, military, natural, news, positive-events,
prediction, radiation, research, resilience, sanctions, seismology,
supply-chain, thermal, trade, unrest, webcam, wildfire).
api/scenario/v1/ adds a 32nd recently shipped surface.
Used '30+' rather than the literal '31' so the page doesn't drift
again every time a new domain ships (the '22' was probably accurate
at one point too).
168 string substitutions across all 21 locale JSON files (8 keys
each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc,
pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey,
apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta
description, og:description, twitter:description,
SoftwareApplication ld+json description + Pro Monthly offer, FAQ
ld+json a8, noscript fallback). Bundle rebuilt.
* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
|
||
|
|
cce46a1767 |
fix(pro): API tier is launched — drop 'Coming Soon' label (#3198)
The /pro comparison-table column header still read 'API (Coming
Soon)' across all 21 locales (and locale-translated variants), but
convex/config/productCatalog.ts has api_starter at
currentForCheckout=true, publicVisible=true, priceCents=9999 —
$99.99/month, with api_starter_annual at $999/year. The API tier is
shipped and self-serve.
Updated pricingTable.apiHeader → 'API ($99.99)' for every locale,
matching the same '<Tier> ($<price>)' pattern as 'Free ($0)' and
'Pro ($39.99)'. Bundle rebuilt.
|
||
|
|
c7aacfd651 |
fix(health): persist WARNING events + add failure-log timeline (#3197)
* fix(health): persist WARNING events + add failure-log timeline
WARNING status (stale seeds) was excluded from the
health:last-failure Redis write (line 680 checked `!== 'WARNING'`).
When UptimeRobot keyword-checks for "HEALTHY" and gets a WARNING
response, it flags DOWN, but no forensic trail was left in Redis.
This made stale-seed incidents invisible to post-mortem
investigation.
Changes:
- Write health:last-failure for ANY non-HEALTHY status (including
  WARNING)
- Add health:failure-log (LPUSH list, last 50 entries, 7-day TTL) so
  multiple incidents are preserved as a timeline, not just the latest
- Include warnCount alongside critCount in the snapshot
- Broaden the problems filter to capture all non-OK statuses
* fix(health): dedupe failure-log entries by incident signature
Repeated polls during one long WARNING window would LPUSH
near-identical snapshots, filling the 50-entry log and evicting
older distinct incidents. Now compares a signature (status + sorted
problem set) against the previous entry via health:failure-log-sig.
Only appends when the incident changes. The last-failure key is
still updated every poll (latest timestamp matters).
* fix(health): add 4s timeout to persist pipelines + consistent arg types
Addresses greptile review on PR #3197:
- Both persist redisPipeline calls now pass 4_000ms timeout (main
  data pipeline uses 8_000ms; persist is less critical so shorter
  is fine)
- LTRIM/EXPIRE args use numbers consistently (was mixing
  number/string)
* fix(health): atomic sig swap via SET ... GET to eliminate dedupe race
Two concurrent /api/health requests could both read the old
signature before either write lands, appending duplicate entries.
Now uses SET key val EX ttl GET (Redis 6.2+) to atomically swap the
sig and return the previous value in one pipeline command. The LPUSH
only fires if the returned previous sig differs from the new one.
Also skips the second redisPipeline call entirely when sig matches
(no logCmds to send).
* fix(health): exclude seedAgeMin from dedupe sig + clear sig on recovery
Two issues with the failure-log dedupe:
1. seedAgeMin changes on every poll (e.g. 31min, 32min, 33min), so
   the signature changed every time and LPUSH still fired on every
   probe during a STALE_SEED window. Now uses a separate sigKeys
   array with only key:status (no age) for the signature, while
   problemKeys still includes ages for the snapshot payload.
2. The sig was never cleared on recovery. If the same problem set
   recurred after a healthy gap, the old sig (within its 24h TTL)
   would match and the recurrence would be silently skipped. Now
   DELs health:failure-log-sig when overall === 'HEALTHY'.
* fix(health): move sig write after LPUSH in same pipeline
The sig was written eagerly in the first pipeline (SET ... GET), but
the LPUSH happened in a separate background pipeline. If that second
write failed, the sig was already advanced, permanently deduping the
incident out of the timeline.
Now: GET sig first (read-only), then write last-failure + LPUSH +
sig all in one pipeline. The sig only advances if the entire
pipeline succeeds. Failure leaves the old sig in place so the next
poll retries.
Reintroduces a small read-then-write race window (two concurrent
probes can both read the old sig), but the worst case is a single
duplicate entry, which is strictly better than a permanently dropped
incident.
|
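The final ordering described in this chain can be sketched with a hypothetical pipeline helper; the key names follow the commit messages, but the real api/health code differs in shape:

```javascript
// Read the previous signature first, then send last-failure + LPUSH +
// signature in ONE pipeline so the sig only advances when the timeline
// write lands. The signature is built from key:status only (no seed
// ages, which change every poll); the sig key is cleared on recovery.
function buildSignature(overall, problems) {
  const sigKeys = problems.map((p) => `${p.key}:${p.status}`).sort();
  return `${overall}|${sigKeys.join(',')}`;
}

async function persistFailure(overall, problems, redis) {
  if (overall === 'HEALTHY') {
    await redis.pipeline([['DEL', 'health:failure-log-sig']]); // clear on recovery
    return { appended: false };
  }
  const sig = buildSignature(overall, problems);
  const [prevSig] = await redis.pipeline([['GET', 'health:failure-log-sig']]);
  const cmds = [['SET', 'health:last-failure', JSON.stringify({ overall, problems })]];
  const appended = prevSig !== sig;
  if (appended) {
    cmds.push(['LPUSH', 'health:failure-log', JSON.stringify({ overall, problems })]);
    cmds.push(['LTRIM', 'health:failure-log', 0, 49]);
    cmds.push(['SET', 'health:failure-log-sig', sig, 'EX', 86400]);
  }
  await redis.pipeline(cmds);
  return { appended };
}
```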
||
|
|
63464775a5 |
feat(supply-chain): scenario UX — rich banner + projected score + faster poll (#3193)
* feat(supply-chain): rich scenario banner + projected score per chokepoint + faster poll
User reported Simulate Closure adds only a thin banner with no context —
"not clear what value user is getting, takes many many seconds". Four
targeted UX improvements in one PR:
A. Rich banner (scenario params + tagline)
Banner now reads:
⚠ Hormuz Tanker Blockade · 14d · +110% cost
CN 100% · IN 84% · TW 82% · IR 80% · US 39%
Simulating 14d / 100% closure / +110% cost on 1 chokepoint.
Chokepoint card below shows projected score; map highlights…
Surfaces the scenario template fields (durationDays, disruptionPct,
costShockMultiplier) + a one-line explainer so a first-time user
understands what "CN 100%" actually means.
B. Projected score on each affected chokepoint card
Card header now shows: `[current]/100 → [projected]/100` with a red
trailing badge + red left border on the card body.
Body prepends: "⚠ Projected under scenario: X% closure for N days
(+Y% cost)".
Projected = max(current, template.disruptionPct) — conservative
floor since the real scoring mixes threat + warnings + anomaly.
C. Faster polling
Status poll interval 2s → 1s. Max iterations 30→60 (unchanged 60s
budget). Worker processes in <1s; perceived latency drops from
2–3s to <2s in the common case. First poll still immediate.
D. ScenarioResult interface widened
Added optional `template` and `currentDisruptionScores` fields in
scenario-templates.ts to match what the scenario-worker already
emits. Optional = backward-compat with map-only consumers.
Dependent on PR #3192 (already merged) which fixed the 10000% banner
% inflation.
* fix(supply-chain): trigger render() on scenario activate/dismiss — cards must re-render
PR review caught a real bug in the new scenario UX: showScenarioSummary
and hideScenarioSummary were mutating the banner DOM directly without
triggering render(). renderChokepoints() reads activeScenarioState to
paint the projected score + red border + callout, but those only run
during render() — so the cards stayed stale on activate AND on dismiss
until some unrelated re-render happened.
Refactor to split public API from internal rendering:
- showScenarioSummary(scenarioId, result) — now just sets state + calls
render(). Was: set state + inline DOM mutation (bypassing card render).
- renderScenarioBanner() — new private helper that builds the banner
DOM from activeScenarioState. Called from render()'s postlude
(replacing the old self-recursive showScenarioSummary() call — which
only worked because it had a side-effectful early-exit path that
happened to terminate, but was a latent recursion risk).
- hideScenarioSummary() — now just sets state=null + calls render().
Was: clear state + manual banner removal + manual button-text reset
loop. The button loop is redundant now — the freshly-rendered card
template produces buttons with default "Simulate Closure" text by
construction.
Net effect: activating a scenario paints the banner AND the affected
chokepoint cards in a single render tick. Dismissing strips both in
the same tick.
* fix(supply-chain): derive scenario button state from activeScenarioState, not imperative mutation
PR review caught: the earlier re-render fix (showScenarioSummary → render())
correctly repaints cards on activate, but the button-state logic in
runScenario() is now wrong. render() detaches the old btn reference, so
the post-onScenarioActivate `resetButton('Active') + btn.disabled = true`
touches a detached node and no-ops (resetButton() explicitly skips
!btn.isConnected). The fresh button painted by render() uses the default
template text — visible button reads "Simulate Closure" enabled, and users
can queue duplicate runs of an already-active scenario.
Fix: make button state a function of panel state.
- renderChokepoints() scenario section: check
activeScenarioState.scenarioId === template.id and, when matched, emit
the button with class `sc-scenario-btn--active`, text "Active", and
`disabled` attribute. On dismiss, the next render strips those
automatically — same pattern as the card projection styling.
- runScenario(): drop the dead `resetButton('Active')` + `btn.disabled`
lines after onScenarioActivate. That path is now template-driven;
touching the detached btn was the defect.
Catch-path resets ('Simulate Closure' on abort, 'Error — retry' on real
error) are unchanged — those fire BEFORE any render could detach the btn,
so the imperative path is still correct there.
* fix(supply-chain): hide scenario projection arrow when current already ≥ template
Greptile P1: projected badge was rendered as `N/100 → N/100` whenever
current disruptionScore already met or exceeded template.disruptionPct.
Visible for Suez (80%) or Panama (50%) scenarios when a chokepoint is
already elevated — read as "scenario has zero effect", which is misleading.
The two values live on different scales — cp.disruptionScore is a
computed risk score (threat + warnings + anomaly) while
template.disruptionPct is "% of capacity blocked" — but they share the
0–100 axis so directional comparison is still meaningful for the
"does this scenario escalate things?" signal.
Fix: arrow only renders when template.disruptionPct > cp.disruptionScore.
When current already equals or exceeds the scenario level, show the
single current badge. The card's red left border + "⚠ Projected under
scenario" callout still indicate the card is the scenario target —
only the escalation arrow is suppressed.
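The badge logic reduces to a small conditional; `renderScoreBadge` is a hypothetical helper, simplified from the real renderChokepoints template code:

```javascript
// The escalation arrow only renders when the scenario's disruption
// level actually exceeds the chokepoint's current score; otherwise a
// single current badge avoids a misleading "N/100 -> N/100".
function renderScoreBadge(currentScore, templateDisruptionPct) {
  const projected = Math.max(currentScore, templateDisruptionPct);
  if (templateDisruptionPct > currentScore) {
    return `${currentScore}/100 → ${projected}/100`;
  }
  return `${currentScore}/100`; // already at or above the scenario level
}

console.log(renderScoreBadge(39, 100)); // '39/100 → 100/100'
console.log(renderScoreBadge(80, 80));  // '80/100' (no zero-effect arrow)
```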
|
||
|
|
85d6308ed0 |
fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196)
* fix(brief): allow Telegram/social UAs to fetch carousel images
middleware.ts BOT_UA regex (/bot/i) was returning 403 on Telegram sendMediaGroup
fetch of /api/brief/carousel/<u>/<d>/<p>. SOCIAL_IMAGE_UA allowlist
(includes telegrambot) was scoped to /favico/* and .png suffix only;
carousel returns image/png but the URL has no extension.
Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED
and zero images above the Telegram brief.
Fix: extend UA-bypass guard to cover /api/brief/carousel/ prefix.
HMAC token on the URL is the real auth; UA allowlist is defence-in-depth.
* Address P2 + P3: regression test + route-shape regex
P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the
contract:
- TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel
- curl, generic bot UAs, missing UA still 403 on carousel
- TelegramBot 403s on non-carousel API routes (scoped, not global)
- Malformed carousel paths (admin/dashboard, page >= 3, non-ISO
date) all still 403 via the regex
- Normal browsers pass everywhere
P3: Replace startsWith('/api/brief/carousel/') prefix with
BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by
api/brief/carousel/[userId]/[issueDate]/[page].ts
(userId / YYYY-MM-DD / page 0|1|2). A future
/api/brief/carousel/admin or similar sibling cannot inherit the
bypass. Comment now lists every social-image UA this protects.
typecheck + typecheck:api clean. test:data 5772/5772.
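The route-shape check might look roughly like this. The date and page shapes come from the commit text; the `[^/]+` userId segment in particular is an assumption, and the regex in middleware.ts may differ:

```typescript
// Approximate shape of BRIEF_CAROUSEL_PATH_RE as described above:
// /api/brief/carousel/[userId]/[issueDate YYYY-MM-DD]/[page 0|1|2].
// A sibling like /api/brief/carousel/admin has too few segments, and a
// page of 3 falls outside [012], so neither inherits the UA bypass.
const BRIEF_CAROUSEL_PATH_RE =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}\/[012]$/
```

Anchoring with `^`/`$` is what makes this safer than a `startsWith` prefix check: any future sibling route must match the full three-segment shape or it keeps the bot gate.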
6025b0ce47
chore(sentry): add Chrome/Firefox variant of UTItemActionController filter (#3194)
The Safari variant (Can't find variable: UTItemActionController) was already in ignoreErrors at line 53. Chrome/Firefox uses the "X is not defined" format instead (WORLDMONITOR-NB). Added to the existing "is not defined" group at line 119.
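The two browser-specific spellings can be illustrated with a small substring matcher. Sentry's ignoreErrors treats plain strings as partial matches of the error message; this standalone sketch only mirrors that behavior, it is not the actual Sentry config:

```typescript
// Both browser-specific spellings of the same missing global, as listed
// in ignoreErrors. Plain strings match as substrings of the message.
const ignoredPatterns = [
  "Can't find variable: UTItemActionController", // Safari
  'UTItemActionController is not defined',       // Chrome/Firefox (WORLDMONITOR-NB)
]

function isIgnored(message: string): boolean {
  return ignoredPatterns.some((pattern) => message.includes(pattern))
}
```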
434a2e0628
feat(settings): API Keys tab visible to all users with PRO upgrade CTA (#3190)
* feat(settings): show API Keys tab to all users with PRO upgrade CTA
Free users who clicked the API Keys tab triggered a server-side
ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA). Now the tab is
always visible with a PRO badge, and the content is gated client-side:
- Anonymous: lock icon + "Sign In" CTA (opens Clerk sign-in)
- Free: upgrade icon + "Upgrade to Pro" CTA (opens Dodo checkout)
- PRO: full key management UI (unchanged)
The Convex query is never called for non-PRO users, eliminating the
server error at the source while creating a natural upgrade funnel.
Reuses existing panel-locked-state CSS (gold accent, gradient button).
* fix(settings): gate API Keys on apiAccess feature, not isProUser
Addresses review findings on PR #3190:
1. Gate changed from isProUser() to hasFeature('apiAccess') — matches
the server contract in convex/apiKeys.ts, which requires apiAccess
(tier 2+), not just PRO (tier 1). PRO users without apiAccess now
correctly see the upgrade CTA instead of the full UI.
2. CTA button now launches API_STARTER_MONTHLY checkout instead of
DEFAULT_UPGRADE_PRODUCT (PRO_MONTHLY) — users buy the correct product
that actually includes API key access.
3. loadApiKeys() guard now checks both getAuthState().user AND
hasFeature('apiAccess') — prevents anonymous keyed sessions
(widget/pro keys without Clerk auth) from hitting the Convex query
that requires authentication.
* fix(settings): re-render API Keys panel when entitlements arrive
On cold load, hasFeature('apiAccess') returns false until the Convex
entitlement subscription delivers data. A paid API Starter user who
opens settings before that snapshot arrives would see the upgrade CTA,
and loadApiKeys() would be skipped. Subscribes to onEntitlementChange()
while the modal is open and re-renders the api-keys panel content +
re-attaches handlers when entitlements change. Cleans up in close()
and destroy().
Also extracts handler attachment into attachApiKeysHandlers() to avoid
duplicating the CTA click + input keydown wiring between render() and
the entitlement callback.
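The double guard on loadApiKeys() can be sketched as follows; the getAuthState/hasFeature signatures are assumptions drawn from the names in the commit, not the real APIs:

```typescript
// Sketch of the loadApiKeys() guard: the Convex query is only issued
// when a Clerk user is present AND the apiAccess entitlement is held.
// Anonymous keyed sessions (widget/pro keys without Clerk auth) fail
// the first check; free or PRO-without-apiAccess users fail the second.
interface AuthState { user: { id: string } | null }

function shouldLoadApiKeys(
  auth: AuthState,
  hasFeature: (feature: string) => boolean,
): boolean {
  return auth.user !== null && hasFeature('apiAccess')
}
```

Re-checking this predicate whenever the entitlement subscription fires is what closes the cold-load gap: the same function returns false before the snapshot arrives and true after, and the re-render follows from that.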