mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
048bb8bb525393dc4a9c1998b9877c1f8cc8c011
3500 Commits
048bb8bb52
fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter
Three changes, all operational for PR #3248's brief-why-matters feature.
1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to /api/internal/brief-why-matters returning 403 — middleware's "short UA" guard (~L183) rejects Node undici's default UA before the endpoint's own Bearer-auth runs. The feature never executed in prod; the three-layer fallback silently shipped legacy Gemini output. Same class as /api/seed-contract-probe (2026-04-15). The endpoint still carries its own subtle-crypto HMAC auth, so bypassing the UA gate is safe.
2. Explicit UA on the callAnalystWhyMatters fetch
Defense-in-depth. An explicit 'worldmonitor-digest-notifications/1.0' keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored, and makes cron traffic distinguishable from ops curl in logs.
3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send for one user on the next tick (then unset) — validates new features end-to-end without fanning out. Empty/unset = normal fan-out. Applied right after rule fetch so both compose and dispatch paths respect it.
Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl UA), plus a negative sibling-path suite so a future prefix-match refactor can't silently unblock /api/internal/.
Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing main() complexity warning bumped 74→78 by the filter block (unchanged in character from pre-PR).
* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs
Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under the current middleware chain this is sufficient (sibling paths hit the short-UA or BOT_UA 403 regardless), but it doesn't pin *which* guard fires.
A future refactor that moves `PUBLIC_API_PATHS.has(path)` later in the chain could let a curl or undici UA pass on a sibling path without this suite failing.
Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty, short/undici, curl). Every combination must still 403 regardless of which guard catches it. 6 new test cases.
Tests: 35 pass in the middleware-bot-gate suite (was 29).
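The two gates described above can be sketched roughly as follows. This is an illustrative reconstruction, not the repo's actual middleware code: the length threshold for the "short UA" guard and all helper names are assumptions.

```javascript
// Exact-match allowlist checked BEFORE the short-UA bot gate, so the
// cron's default Node/undici UA is never rejected on allowlisted paths.
// The endpoint behind it still enforces its own Bearer/HMAC auth.
const PUBLIC_API_PATHS = new Set(["/api/internal/brief-why-matters"]);

function botGate(path, userAgent) {
  if (PUBLIC_API_PATHS.has(path)) return { status: 200 };
  // hypothetical "short UA" guard: empty or suspiciously short UAs 403
  if (!userAgent || userAgent.length < 10) return { status: 403 };
  return { status: 200 };
}

// DIGEST_ONLY_USER operator flag: empty/unset means normal fan-out,
// otherwise restrict compose + dispatch to the one named user.
function filterRules(rules, digestOnlyUser) {
  if (!digestOnlyUser) return rules;
  return rules.filter((r) => r.userId === digestOnlyUser);
}
```

An exact-match `Set` (rather than a prefix match) is what the negative sibling-path suite pins: `/api/internal/anything-else` must keep hitting the UA guards.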
65a1210531
fix(unrest): Decodo proxy fallback for GDELT + surface err.cause (#3256)
* fix(unrest): Decodo proxy fallback for GDELT + surface err.cause
Background: unrestEvents went STALE_SEED when every tick logged "GDELT failed: fetch failed" (Railway log 2026-04-21). The bare "fetch failed" string hid the actual cause (DNS/TCP/TLS), so the outage was opaque. ACLED is disabled (no credentials), so GDELT is the sole live source — when it fails, the seed freezes.
Changes:
- fetchGdeltEvents: direct-first, Decodo proxy fallback via httpsProxyFetchRaw when PROXY_URL is configured. Mirrors the imfFetchJson / _yahoo-fetch.mjs direct→proxy pattern.
- Error messages now include err.cause.code (UND_ERR_CONNECT_TIMEOUT, ENOTFOUND, ECONNRESET, etc.) so the next outage surfaces the underlying transport error instead of "fetch failed".
- The both-paths-failed error carries the direct + proxy messages so either can be diagnosed from a single log line.
No behavior change on the happy path — the direct fetch still runs first with the existing 30s AbortSignal timeout.
* fix(unrest): address PR #3256 P2 review
- describeErr: handle a plain-string .cause (e.g. `{ cause: 'ENOTFOUND' }`) that would otherwise be silently dropped, since a string has no .code/.errno/.message accessors.
- fetchGdeltDirect: tag HTTP-status errors (!resp.ok) with httpStatus. fetchGdeltEvents skips the proxy hop for upstream HTTP errors since the proxy routes to the same GDELT endpoint — saves the 20s proxy timeout and avoids a pointless retry. Transport failures (DNS/TCP/TLS timeouts against Railway IPs) still trigger the proxy fallback, which is the motivating case.
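The direct-first / proxy-fallback shape and the err.cause surfacing can be sketched like this. A hedged illustration only: `describeErr` and `fetchWithFallback` are stand-in names, not the repo's actual exports.

```javascript
// Surface the transport-level cause (undici attaches it as err.cause)
// instead of the opaque top-level "fetch failed" message. Handles the
// plain-string .cause case flagged in the P2 review.
function describeErr(err) {
  const cause = err && err.cause;
  if (typeof cause === "string") return `${err.message} (cause: ${cause})`;
  if (cause && (cause.code || cause.message)) {
    return `${err.message} (cause: ${cause.code || cause.message})`;
  }
  return err ? err.message : "unknown error";
}

// Direct fetch first; proxy fallback only for transport failures.
// Upstream HTTP errors (tagged with .httpStatus) skip the proxy hop,
// since the proxy routes to the same endpoint anyway.
async function fetchWithFallback(direct, proxied, proxyConfigured) {
  try {
    return await direct();
  } catch (directErr) {
    if (directErr.httpStatus || !proxyConfigured) throw directErr;
    try {
      return await proxied();
    } catch (proxyErr) {
      // both-paths-failed: one log line carries both diagnoses
      throw new Error(
        `direct: ${describeErr(directErr)}; proxy: ${describeErr(proxyErr)}`
      );
    }
  }
}
```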
2f19d96357
feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint
The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.
This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.
Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.
Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)
Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
still reads/writes during the rollout window; expires in ≤24h.
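The env knobs above might be parsed along these lines. A minimal sketch assuming the stated semantics (default analyst, typo falls back to gemini, shadow on unless literally '0'); the helper names are illustrative.

```javascript
// BRIEF_WHY_MATTERS_PRIMARY: unset/empty -> analyst (the default);
// any unrecognized (typo'd) value falls back to gemini.
function pickPrimary(raw) {
  if (raw === undefined || raw === "") return "analyst";
  return raw === "analyst" ? "analyst" : "gemini";
}

// BRIEF_WHY_MATTERS_SHADOW: default on; only the literal '0' disables.
function shadowEnabled(raw) {
  return raw !== "0";
}

// BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT: 0..100, default 100.
// roll is a uniform draw in [0, 100) supplied by the caller.
function shadowSampled(pct, roll) {
  const p = Number.isFinite(Number(pct)) ? Number(pct) : 100;
  return roll < p;
}
```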
Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.
Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md
* fix(brief): address /ce:review round 1 findings on PR #3248
Fixes 5 findings from multi-agent review, 2 of them P1:
- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
`.env`, `.env.local`, and any future secret file dropped into that
directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
`*.mjs`) so parent `.env` / secrets rules stay in effect inside
api/internal/.
- #242 P1: `Dockerfile.digest-notifications` did not COPY
`shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
container start with ERR_MODULE_NOT_FOUND. Added alongside
brief-envelope + brief-filter COPY lines.
- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
signal, violating PR #3247's own round-3 memory
(feedback_gate_on_ground_truth_not_configured_state.md). Added
structured log at the call site: `[brief-llm] whyMatters source=<src>
producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
log + shadow-record pairs can be cross-referenced.
- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
flowed verbatim into both LLM prompts, bypassing the repo's
sanitizeForPrompt convention. Added sanitizeStoryFields helper and
applied in both analyst and gemini paths.
- #245 P2: Removed redundant `validate` option from callLlmReasoning.
With only openrouter configured in prod, a parse-reject walked the
provider chain, then fell through to the other path (same provider),
then the cron's own fallback (same model) — 3x billing on one reject.
Post-call parseWhyMatters check already handles rejection cleanly.
Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).
Tests: 6022 pass. typecheck + typecheck:api clean.
* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback
Two P2 findings on PR #3248:
1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
function. Vercel can terminate the isolate after response return,
so the background redisPipeline write completes unreliably — i.e.
the rollout-validation signal the shadow keys were supposed to
provide was flaky in production.
Fix: accept an optional EdgeContext 2nd arg. Build the shadow
promise up front (so it starts executing immediately) then register
it with ctx.waitUntil when present. Falls back to plain unawaited
execution when ctx is absent (local harness / tests).
2. scripts/lib/brief-llm.mjs legacy fallback path called
buildWhyMattersPrompt(story) on raw fields with no sanitization.
The analyst endpoint sanitizes before its own prompt build, but
the fallback is exactly what runs when the endpoint misses /
errors — so hostile headlines / sources reached the LLM verbatim
on that path.
Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
from server/_shared/llm-sanitize.js (existing pattern — see
scripts/seed-digest-notifications.mjs:41). Wraps story fields
before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
story), so cache parity with the analyst endpoint's v3 entries is
preserved.
Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.
Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.
* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image
Reviewer P1 on PR #3248: my previous commit (
89c179e412
fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time (#3254)
* fix(brief): cover greeting was hardcoded "Good evening" regardless of issue time
Reported: a brief viewed at 13:02 local time showed "Good evening" on the cover (slide 1) but "Good afternoon." on the digest greeting page (slide 2).
Cause: `server/_shared/brief-render.js:renderCover` had the string `'Good evening'` hardcoded in the cover's mono-cased salutation slot. The digest greeting page (slide 2) renders the time-of-day-correct value from `envelope.data.digest.greeting`, which is computed by `shared/brief-filter.js:174-179` from `localHour` in the user's TZ (< 12 → morning, < 18 → afternoon, else → evening). So any brief viewed outside the literal evening showed an inconsistent pair.
Fix: thread `digest.greeting` into `renderCover`; a small `coverGreeting()` helper strips the trailing period so the cover's no-punctuation mono style is preserved. On unexpected/missing values it falls back to a generic "Hello" rather than silently re-hardcoding a specific time of day.
Tests: 5 regression cases in `tests/brief-magazine-render.test.mjs` cover afternoon/morning/evening parity, period stripping, and HTML escape (defense-in-depth). 60 total in that file pass. Full test:data 5921 pass. typecheck + typecheck:api + biome clean.
* chore(brief): fix orphaned JSDoc on coverGreeting / renderCover
Greptile flagged: the original `renderCover` JSDoc block stayed above `coverGreeting` when the helper was inserted, so the @param shape was misattributed to the wrong function and `renderCover` was left undocumented (plus the new `greeting` field was unlisted). Moved the opts-shape JSDoc to immediately above `renderCover` and added `greeting: string` to the param type. `coverGreeting` keeps its own prose comment. No runtime change.
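A minimal sketch of the `coverGreeting()` behavior described above, under the stated contract (strip the trailing period; fall back to "Hello" on anything unexpected). The allowlist of known greetings is an assumption for the sketch.

```javascript
// Adapt the digest greeting ("Good afternoon.") for the cover's
// no-punctuation mono style; unknown/missing input falls back to a
// generic "Hello" rather than re-hardcoding a time of day.
function coverGreeting(greeting) {
  if (typeof greeting !== "string") return "Hello";
  const trimmed = greeting.trim().replace(/\.$/, "");
  const known = ["Good morning", "Good afternoon", "Good evening"];
  return known.includes(trimmed) ? trimmed : "Hello";
}
```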
b0928f213c
fix(live-webcams): refresh Iran-Attacks multicam + Mideast Mecca video IDs (#3251)
User reported two tiles showing "This live stream recording is not
available" — the pinned fallbackVideoIds had gone dark.
- iran-multicam (Iran Attacks → Middle East slot):
FGUKbzulB_Y → KSwPNkzEgxg
- mecca (Mideast → Mecca slot):
Cm1v4bteXbI → kJwEsQTegxk
Values supplied by the user from working YouTube live URLs. Only
`fallbackVideoId` is read by the runtime (buildEmbedUrl line 303, open-link
line 422); `channelHandle` is metadata and left as-is.
c279f6f426
fix(pro-marketing): nav reflects auth state, hide pro banner for pro users (#3250)
* fix(pro-marketing): reflect auth state in nav, hide pro banner for pro users
Two related signed-in-experience bugs caught by the user during the
post-purchase flow:
1. /pro Navbar's SIGN IN button never reacted to auth state. The
component was a static const Navbar = () => <nav>...</nav>; with
no Clerk subscription, so signing in left the SIGN IN button in
place even though the user was authenticated.
2. The "Pro is launched — Upgrade to Pro" announcement banner on the
main app showed for ALL visitors including paying Pro subscribers.
Pitching upgrade to a customer who already paid is a small but
real annoyance, and it stays sticky for 7 days via the localStorage
dismiss key — so a returning paying user dismisses it once and
then never sees the (genuinely useful) banner again if they later
downgrade.
## Changes
### pro-test/src/App.tsx — useClerkUser hook + ClerkUserButton
- New useClerkUser() hook subscribes to Clerk via clerk.addListener
and returns { user, isLoaded } so any component can react to auth
changes (sign-in, sign-out, account switch).
- New ClerkUserButton component mounts Clerk's native UserButton
widget (avatar + dropdown with profile/sign-out) into a div via
clerk.mountUserButton — inherits the existing dark-theme appearance
options from services/checkout.ts::ensureClerk.
- Navbar swaps SIGN IN button for ClerkUserButton when user is
signed in. Slot is intentionally empty during isLoaded=false to
avoid a SIGN IN → avatar flicker for returning users.
- Hero hides its redundant SIGN IN CTA when signed in; collapses to
just "Choose Plan" which is the relevant action for returning users.
- Public/pro/ rebuilt to ship the change (per PR #3229's bundle-
freshness rule).
### src/components/ProBanner.ts — premium-aware show + reactive auto-hide
- showProBanner returns early if hasPremiumAccess() — same authoritative
signal used by the frontend's panel-gating layer (unions API key,
tester key, Clerk pro role, AND Convex Dodo entitlement).
- onEntitlementChange listener auto-dismisses the banner if a Convex
snapshot arrives mid-session that flips the user to premium (e.g.
Dodo webhook lands while they're sitting on the dashboard). Does NOT
write the dismiss timestamp, so the banner reappears correctly if
they later downgrade.
## Test plan
### pro-test (sign-in UI)
- [ ] Anonymous user loads /pro → SIGN IN button visible in nav.
- [ ] Click SIGN IN, complete Clerk modal → button replaced with
Clerk's UserButton avatar dropdown.
- [ ] Open dropdown, click Sign Out → reverts to SIGN IN button.
- [ ] Hard reload as signed-in user → SIGN IN button never flashes;
avatar appears once Clerk loads.
### main app (banner gating)
- [ ] Anonymous user loads / → "Pro is launched" banner shows.
- [ ] Click ✕ to dismiss → banner stays dismissed for 7 days
(existing behavior preserved).
- [ ] Pro user (active Convex entitlement) loads / → banner does
NOT appear, regardless of dismiss state.
- [ ] Free user opens /, then completes checkout in another tab and
Convex publishes the entitlement snapshot → banner auto-hides
in the dashboard tab without reload.
- [ ] Pro user whose subscription lapses (validUntil < now) → banner
reappears on next page load, since dismiss timestamp wasn't
written by the entitlement-change auto-hide.
* fix(pro-banner): symmetric show/hide on entitlement change
Reviewer caught that the previous iteration only handled the upgrade
direction (premium snapshot → hide banner) but never re-showed the
banner on a downgrade. App.ts calls showProBanner() once at init, so
without a symmetric show path, a session that started premium and
then lost entitlement (cancellation, billing grace expiry, plan
downgrade for the same user) would stay banner-less for the rest of
the SPA session — until a full reload re-ran App.ts init.
Net effect of the bug: the comment claiming "the banner reappears
correctly if they later downgrade or the entitlement lapses" was
false in practice for any in-tab transition.
Two changes:
1. Cache the container on every showProBanner() call, including
the early-return paths. App.ts always calls showProBanner()
once at init regardless of premium state, so this guarantees
the listener has the container reference even when the initial
mount was skipped (premium user, dismissed, in iframe).
2. Make onEntitlementChange handler symmetric:
- premium snapshot + visible → hide (existing behavior)
- non-premium snapshot + not visible + cached container +
not dismissed + not in iframe → re-mount via showProBanner
The non-premium re-mount goes through showProBanner() so it gets the
same gate checks as the initial path (isDismissed, iframe, premium).
We can never surface a banner the user has already explicitly ✕'d
this week.
Edge cases handled:
- User starts premium, no banner shown, downgrades mid-session
→ listener fires, premium false, no bannerEl, container cached,
not dismissed → showProBanner mounts banner ✓
- User starts free, sees banner, upgrades mid-session
→ listener fires, premium true, bannerEl present → fade out ✓
- User starts free, dismisses banner, upgrades, downgrades
→ listener fires on downgrade, premium false, no bannerEl,
container cached, isDismissed=true → showProBanner returns early ✓
- User starts free, banner showing, multiple entitlement snapshots
arrive without state change → premium=false && bannerEl present,
neither branch fires, idempotent no-op ✓
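The symmetric listener and its edge cases can be condensed into a small decision function. An illustrative sketch only: the state field names mirror the prose above, not the component's actual identifiers.

```javascript
// Symmetric entitlement-change handler for the pro banner:
//  - upgrade   (premium snapshot + banner visible)  -> hide
//  - downgrade (non-premium + no banner + container cached,
//               not dismissed, not in iframe)        -> re-show
//  - anything else                                   -> idempotent no-op
function onEntitlementChange(state) {
  const { premium, bannerEl, cachedContainer, isDismissed, inIframe } = state;
  if (premium && bannerEl) return "hide";
  if (!premium && !bannerEl && cachedContainer && !isDismissed && !inIframe) {
    return "show"; // routed through showProBanner's normal gate checks
  }
  return "noop";
}
```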
* fix(pro-banner): defer initial mount while entitlement is loading
Greptile P1 round-2: hasPremiumAccess() at line 48 reads isEntitled()
synchronously, but the Convex entitlement subscription is fired
non-awaited at App.ts:868 (`void initEntitlementSubscription()`).
showProBanner() runs at App.ts:923 during init Phase 1, before the
first Convex snapshot arrives.
So a Convex-only paying user (Clerk role 'free' + Dodo entitlement
tier=1) sees this sequence:
t=0 init runs → hasPremiumAccess() === false (isEntitled() reads
currentState===null) → "Upgrade to Pro" banner mounts
t=~1s Convex snapshot arrives → onEntitlementChange fires → my
listener detects premium=true && bannerEl !== null → fade out
That's a 1+ second flash of "you should upgrade!" content for someone
who has already paid. Worst case is closer to ~10s on a cold-start
Convex client, which is much worse — looks like the upgrade pitch is
the actual UI.
Defer the initial mount when (1) the user is signed in (so they
plausibly have a Convex entitlement) AND (2) the entitlement state
hasn't loaded yet (currentState === null). The existing
onEntitlementChange listener will mount it later if the first
snapshot confirms the user is actually free.
Two reasons this is gated on "signed in":
- Anonymous users will never have a Convex entitlement, so
deferring would mean the banner NEVER mounts for them. Bad
regression: anon visitors are the highest-value audience for
the upgrade pitch.
- For signed-in users, the worst case if no entitlement EVER
arrives is the banner stays absent — which is identical to a
paying user's correct state, so it fails-closed safely.
Edge case behavior:
- Anonymous user: no Clerk session → first condition false →
banner mounts immediately ✓
- Signed-in free user with first snapshot pre-loaded somehow:
second condition false → banner mounts immediately ✓
- Signed-in user, snapshot pending: deferred → listener mounts
on first snapshot if user turns out free ✓
- Signed-in user, snapshot pending, user turns out premium: never
mounted ✓ (the desired path)
- Signed-in user, snapshot pending, never arrives (Convex outage):
banner never shows → see above, this fails-closed safely
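The deferral gate above reduces to a one-line predicate; sketched here with illustrative names to make the edge-case table checkable.

```javascript
// Defer the initial banner mount only when the user is signed in AND the
// Convex entitlement snapshot hasn't loaded yet (currentState === null).
// Anonymous users mount immediately; signed-in users fail closed.
function shouldDeferInitialMount(signedIn, entitlementState) {
  return signedIn && entitlementState === null;
}
```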
6977e9d0fe
fix(gateway): accept Dodo entitlement as pro, not just Clerk role — unblocks paying users (#3249)
* fix(gateway): accept Dodo entitlement as pro, not just Clerk role
The gateway's legacy premium-paths gate (lines 388-401) was rejecting
authenticated Bearer users with 403 "Pro subscription required"
whenever session.role !== 'pro' — which is EVERY paying Dodo
subscriber, because the Dodo webhook pipeline writes Convex
entitlements and does NOT sync Clerk publicMetadata.role.
So the flow was:
- User pays, Dodo webhook fires, Convex entitlement tier=1 written
- User loads the dashboard, Clerk token includes Bearer but role='free'
- Gateway sees role!=='pro' → 403 on every intelligence/trade/
economic/sanctions premium endpoint
- User sees a blank dashboard despite having paid
This is the exact split-brain documented at the frontend layer
(src/services/panel-gating.ts:11-27): "The Convex entitlement check
is the authoritative signal for paying customers — Clerk
`publicMetadata.plan` is NOT written by our webhook pipeline". The
frontend was fixed by having hasPremiumAccess() fall through to
isEntitled() from Convex. The backend gateway still had the
Clerk-role-only gate, so paying users got rejected even though
their Convex entitlement was active.
Align the gateway gate with the logic already in
server/_shared/premium-check.ts::isCallerPremium (line 44-49):
1. If Clerk role === 'pro' → allow (fast path, no Redis/Convex I/O)
2. Else if session.userId → look up Convex entitlement; allow if
tier >= 1 AND validUntil >= Date.now() (covers lapsed subs)
3. Else → 403
Same two-signal semantics as the per-handler isCallerPremium, so
the gateway and handlers can't disagree on who is premium. Uses
the already-imported getEntitlements function (line 345 already
imports it dynamically; promoting to top-level import since the new
site is in a hotter path).
Impact: unblocks all Dodo subscribers whose Clerk role is still
'free' — the common case after any fresh Pro purchase and for
every user since webhook-based role sync was never wired up.
Reported 2026-04-21 post-purchase flow: user completed Dodo payment,
landed back on dashboard, saw 403s on get-regional-snapshot,
get-tariff-trends, list-comtrade-flows, get-national-debt,
deduct-situation — all 5 are in PREMIUM_RPC_PATHS but not in
ENDPOINT_ENTITLEMENTS, so they hit this legacy gate.
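The two-signal gate (steps 1-3 above) can be sketched as below. `getEntitlements` is a stand-in for the real Convex lookup; this is an illustration of the semantics, not the gateway's actual code.

```javascript
// Two-signal premium check: Clerk role fast path first (no Convex I/O),
// then Convex Dodo entitlement fallback. Fails closed on every negative
// case (anonymous, no userId, no entitlement, lapsed validUntil).
async function isCallerPremium(session, getEntitlements, now = Date.now()) {
  if (!session) return false;
  if (session.role === "pro") return true; // fast path
  if (!session.userId) return false;
  const ent = await getEntitlements(session.userId);
  return !!ent && ent.tier >= 1 && ent.validUntil >= now;
}
```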
* fix(gateway): move entitlement fallback to the gate that actually fires
Reviewer caught that the previous iteration of this fix put the
entitlement fallback at line ~400, inside an `if (sessionUserId &&
!keyCheck.valid && needsLegacyProBearerGate)` branch that's
unreachable for the case the PR was supposed to fix:
- sessionUserId is only resolved when isTierGated is true (line 292)
— JWKS lookup is intentionally skipped for non-tier-gated paths.
- needsLegacyProBearerGate IS the non-tier-gated set
(PREMIUM_RPC_PATHS && !isTierGated).
- So sessionUserId is null, the branch never enters, and the actual
legacy-Bearer rejection still happens earlier at line 367 inside
the `keyCheck.required && !keyCheck.valid` branch.
Move the entitlement fallback INTO the line-367 check, where the
Bearer is already being validated and `session.userId` is already
exposed on the validateBearerToken() result. No extra JWKS round-trip
needed (validateBearerToken already verified the JWT). The previously-
added line-400 block is removed since it never ran.
Now for a paying Dodo subscriber whose Clerk role is still 'free':
- Bearer validates → role !== 'pro'
- Fall through: getEntitlements(session.userId) → tier=1, validUntil future
- allowed = true, request proceeds to handler
Same fail-closed semantics as before for the negative cases:
- Anonymous → no Bearer → 401
- Bearer with invalid JWT → 401
- Free user with no Dodo entitlement → 403
- Pro user whose Dodo subscription lapsed (validUntil < now) → 403
* chore(gateway): drop redundant dynamic getEntitlements import
Greptile spotted that the previous commit promoted getEntitlements to
a top-level import for the new line-385 fallback site, but the older
dynamic import at line 345 (in the user-API-key entitlement check
branch) was left in place. Same module, same symbol, so the dynamic
import is now dead weight that just adds a microtask boundary to the
hot path.
Drop it; line 345's `getEntitlements(sessionUserId)` call now resolves
through the top-level import like the line-385 site already does.
4d9ae3b214
feat(digest): topic-grouped brief ordering (size-first) (#3247)
ee93fb475f
fix(portwatch): cut HISTORY_DAYS 90 to 60 so per-country fits 540s budget (#3246)
Prod log 2026-04-21T00:02Z on the standalone portwatch-port-activity service confirmed the per-country shape at 90 days still doesn't fit even in its own 540s container:
  batch 1/15: 7 seeded, 5 errors (90.0s)
  batch 5/15: 42 seeded, 18 errors (392.6s)
  [SIGTERM at 540s after ~batch 7]
Math: avg ~75s/batch × 15 batches = 1125s needed vs 540s available. The degradation guard would reject the ~50-country partial publish against a prev snapshot of ~150+ countries.
60 days is the minimum window that still covers both aggregates the UI consumes: last30 (days 0-30, all current metrics) + prev30 (days 30-60, for trendDelta). Cutting from 90d→60d drops each per-country query by ~33% in row count and page count. Expected avg batch time ~50s.
No feature regression: last7/anomalySignal were already no-ops because ArcGIS's Daily_Ports_Data max date lags 10+ days behind now, so no row ever falls in the last-7-day window regardless of HISTORY_DAYS.
Test added asserting HISTORY_DAYS=60 so an accidental revert breaks CI. 55 portwatch tests pass. Typecheck + lint clean.
56a792bbc4
docs(marketing): bump source-count claims from 435+ to 500+ (#3241)
Feeds.ts is at 523 entries after PR #3236 landed. The "435+" figure has been baked into marketing copy, docs, press kit, and localized strings for a long time and is now noticeably understated. Bump to 500+ as the new canonical figure.
Also aligned three stale claims in less-visited docs:
- docs/getting-started.mdx: 70+ RSS feeds → 500+ RSS feeds
- docs/ai-intelligence.mdx: 344 sources → 500+ sources
- docs/COMMUNITY-PROMOTION-GUIDE: 170+ news feeds → 500+ news feeds; 170+ news sources → 500+ news sources
And bumped the digest-dedup copy from 400+ to 500+ (English + French locales + the pro-test/index.html prerendered body) for consistency with the pricing and GDELT panels.
Left alone on purpose (different metrics): 22 services / 22 service domains; 24 feeds (security-advisory seeder specifically); 31 sources (freshness tracker); 45 map layers.
Rebuilt the /pro bundle so the per-locale chunks + prerendered index.html under public/pro/assets ship the new copy. 20 locales updated.
d880f6a0e7
refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs (#3238)
* refactor(aviation): consolidate intl+FAA+NOTAM+news seeds into seed-aviation.mjs
seed-aviation.mjs was misnamed: it wrote to a dead Redis key while the
51-airport AviationStack loop + ICAO NOTAM loop lived hidden inside
ais-relay.cjs, duplicating the NOTAM write already done by
seed-airport-delays.mjs.
Make seed-aviation.mjs the single home for every aviation Redis key:
aviation:delays:intl:v3 (AviationStack 51 intl — primary)
aviation:delays:faa:v1 (FAA ASWS 30 US)
aviation:notam:closures:v2 (ICAO NOTAM 60 global)
aviation:news::24:v1 (9 RSS feeds prewarmer)
One unified AIRPORTS registry (~85 entries) replaces the three separate lists.
Notifications preserved via wm:events:queue LPUSH + SETNX dedup; prev-state
migrated from in-process Sets to Redis so short-lived cron runs don't spam
on every tick. ICAO quota-exhaustion backoff retained.
Contracts preserved byte-identically for consumers (AirportDelayAlert shape,
seed-meta:aviation:{intl,faa,notam} meta keys, runSeed envelope writes).
Impact: kills ~8,640/mo wasted AviationStack calls (dead-key writes), strips
~490 lines of hidden seed logic from ais-relay, eliminates duplicate NOTAM
writer. Net -243 lines across three files.
Railway steps after merge:
1. Ensure seed-aviation service env has AVIATIONSTACK_API + ICAO_API_KEY.
2. Delete/disable the seed-airport-delays Railway service.
3. ais-relay redeploys automatically; /aviationstack + /notam live proxies
for user-triggered flight lookups preserved.
* fix(aviation): preserve last-good intl snapshot on unhealthy/skipped fetch + restore NOTAM quota-exhaust handling
Review feedback on PR #3238:
(1) Intl unhealthy → was silently overwriting aviation:delays:intl:v3 with
an empty or partial snapshot because fetchAll() always returned
{ alerts } and zeroIsValid:true let runSeed publish. Now:
• seedIntlDelays() returns { alerts, healthy, skipped } unchanged
• fetchAll() refuses to publish when !healthy || skipped:
- extendExistingTtl([INTL_KEY, INTL_META_KEY], INTL_TTL)
- throws so runSeed enters its graceful catch path (which also
extends these TTLs — idempotent)
• Per-run cache (cachedRun) short-circuits subsequent withRetry(3)
invocations so the retries don't burn 3x NOTAM quota + 3x FAA/RSS
fetches when intl is sick.
(2) NOTAM quota exhausted — PR claimed "preserved" but only logged; the
NOTAM data key was drifting toward TTL expiry and seed-meta was going
stale, which would flip api/health.js maxStaleMin=240 red after 4h
despite the intended 24h backoff window. Now matches the pre-strip
ais-relay behavior byte-for-byte:
• extendExistingTtl([NOTAM_KEY], NOTAM_TTL)
• upstashSet(NOTAM_META_KEY, {fetchedAt: now, recordCount: 0,
quotaExhausted: true}, 604800)
Consumers keep serving the last known closure list; health stays green.
Also added extendExistingTtl fallbacks on FAA/NOTAM network-rejection paths
so transient network failures also don't drift to TTL expiry.
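The preserve-last-good pattern above can be sketched over a minimal in-memory store. `extendExistingTtl` and `publishOrPreserve` here are illustrative stand-ins for the real Redis helpers, under the assumption that TTLs are only refreshed on keys that already exist (never resurrecting deleted ones).

```javascript
// Refresh TTL on existing keys only; returns which keys were extended.
function extendExistingTtl(store, keys, ttlSeconds) {
  const extended = [];
  for (const key of keys) {
    if (store.has(key)) {
      store.get(key).ttl = ttlSeconds; // extend, never create
      extended.push(key);
    }
  }
  return extended;
}

// On an unhealthy/skipped fetch: keep the last-good snapshot alive and
// throw so the caller's graceful catch path runs. Healthy: publish.
function publishOrPreserve(store, key, metaKey, snapshot, ttlSeconds) {
  if (!snapshot.healthy || snapshot.skipped) {
    extendExistingTtl(store, [key, metaKey], ttlSeconds);
    throw new Error("unhealthy fetch; preserved last-good snapshot");
  }
  store.set(key, { value: snapshot.alerts, ttl: ttlSeconds });
}
```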
* refactor(aviation): move secondary writes + notifications into afterPublish
Review feedback on PR #3238: fetchAll() was impure — it wrote FAA / NOTAM /
news and dispatched notifications during runSeed's fetch phase, before the
canonical aviation:delays:intl:v3 publish ran. If that later publish failed,
consumers could see fresh FAA/NOTAM/news alongside a stale intl key, and
notifications could fire for a run whose primary key never published,
breaking the "single home / one cron tick" atomic contract.
Restructure:
• fetchAll() now pure — returns { intl, faa, notam, news + rejection refs }.
No Redis writes, no notifications.
• Intl gate stays: unhealthy / skipped → throw. runSeed's catch extends
TTL on INTL_KEY + seed-meta:aviation:intl and exits 0. afterPublish
never runs, so no side effects escape.
• publishTransform extracts { alerts } from the bundle for the canonical
envelope; declareRecords sees the transformed shape.
• afterPublish handles ALL secondary writes (FAA, NOTAM, news) and
notification dispatch. Runs only after a successful canonical publish.
• Per-run memo (cachedBundle) still short-circuits withRetry(3) retries
so NOTAM quota isn't burned 3x when intl is sick.
NOTAM quota-exhaustion + rejection TTL-extend branches preserved inside
afterPublish — same behavior, different location.
* refactor(aviation): decouple FAA/NOTAM/news side-cars from intl's runSeed gate
Review feedback on PR #3238: the previous refactor coupled all secondary
outputs to the AviationStack primary key. If AVIATIONSTACK_API was missing
or intl was systemically unhealthy, fetchAll() threw → runSeed skipped
afterPublish → FAA/NOTAM/news all went stale despite their own upstream
sources being fine. Before consolidation, FAA and NOTAM each ran their own
cron and could freshen independently. This restores that independence.
Structure:
• Three side-car runners: runFaaSideCar, runNotamSideCar, runNewsSideCar.
Each acquires its own Redis lock (aviation:faa / aviation:notam /
aviation:news — distinct from aviation:intl), fetches its source,
writes data-key + seed-meta on success, extends TTL on failure,
releases the lock. Completely independent of the AviationStack path.
• NOTAM side-car keeps the quota-exhausted + rejection handling and
dispatches notam_closure notifications inline.
• main() runs the three side-cars sequentially, then hands off to runSeed
for intl. runSeed still process.exit()s at the end so it remains the
last call.
• Intl's afterPublish now only dispatches aviation_closure notifications
(its single responsibility).
Removed: the per-run memo for fetchAll (no longer needed — withRetry now
only re-runs the intl fetch, not FAA/NOTAM/RSS).
Net behavior:
• AviationStack 500s / missing key → FAA, NOTAM, news still refresh
normally; only aviation:delays:intl:v3 extends TTL + preserves prior
snapshot.
• ICAO quota exhausted → NOTAM extends TTL + writes fresh meta (as before);
FAA/intl/news unaffected.
• FAA upstream failure → only FAA extends TTL; other sources unaffected.
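The lock/fetch/write-or-extend lifecycle the three side-cars share can be sketched roughly as below. Everything here is illustrative: `runSideCar`, `makeFakeRedis`, and the option names are hypothetical stand-ins, not the repo's actual helpers (the real runners live in the aviation seed script and use a real Redis client).

```javascript
// In-memory stand-in for a Redis client with NX/EX set semantics (hypothetical).
function makeFakeRedis() {
  const store = new Map();
  return {
    async set(key, value, opts = {}) {
      if (opts.NX && store.has(key)) return null; // lock already held
      store.set(key, value);
      return 'OK';
    },
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async del(key) { store.delete(key); },
    async expire(key) { return store.has(key) ? 1 : 0; },
  };
}

// Sketch of one side-car run: own lock, own source, own failure handling.
async function runSideCar({ lockKey, dataKey, metaKey, ttlSec, fetchSource, redis }) {
  const token = String(Math.random());
  // Distinct NX lock (e.g. aviation:faa), so side-cars never wait on intl health.
  const acquired = await redis.set(lockKey, token, { NX: true, EX: 3600 });
  if (!acquired) return { skipped: true };
  try {
    const records = await fetchSource();
    // Success: write data key + seed-meta together.
    await redis.set(dataKey, JSON.stringify(records), { EX: ttlSec });
    await redis.set(metaKey, JSON.stringify({
      fetchedAt: new Date().toISOString(),
      count: records.length,
    }), { EX: ttlSec });
    return { published: records.length };
  } catch (err) {
    // Failure: extend TTL on the prior snapshot instead of letting it expire.
    await redis.expire(dataKey, ttlSec);
    await redis.expire(metaKey, ttlSec);
    return { extended: true, error: String(err) };
  } finally {
    if (await redis.get(lockKey) === token) await redis.del(lockKey);
  }
}
```

The point of the shape: a throwing `fetchSource` only extends TTLs for its own keys, so one sick upstream cannot stale out its siblings.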
* fix(aviation): correct Gaborone ICAO + populate FAA alert meta from registry
Greptile review on PR #3238:
P1: GABS is not the ICAO for Gaborone — the value was faithfully copied
from the pre-strip ais-relay NOTAM list which was wrong. Botswana's
ICAO prefix is FB; the correct code is FBSK. NOTAM queries for GABS
would silently exclude Gaborone from closure detection. (Pre-existing
bug in the repo; fixing while in this neighborhood.)
P2 (FAA alerts): Now that the unified AIRPORTS registry carries
icao/name/city/country for every FAA airport, use it. Previous code
returned icao:'', name:iata, city:'' — consumers saw bare IATA codes
for US-only alerts. Registry lookup via a new FAA_META map; lat/lon
stays 0,0 by design (FAA rows aren't rendered on the globe, so lat/lon
is intentionally absent from those registry rows).
P2 (NOTAM TTL on quota exhaustion): already fixed in commit
ecd56d4212
feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast (#3236)
* feat(feeds): add IRNA, Mehr, Jerusalem Post, Ynetnews to middleeast
Four direct-RSS sources verified from a clean IP and absent everywhere
in the repo (src/config/feeds.ts, scripts/seed-*, ais-relay.cjs, RSS
allowlist). Closes the highest-ROI Iran / Israel domestic-press gap
from the ME source audit (PR #3226) with zero infra changes.
- IRNA https://en.irna.ir/rss
- Mehr News https://en.mehrnews.com/rss
- Jerusalem Post https://www.jpost.com/rss/rssfeedsheadlines.aspx
- Ynetnews https://www.ynetnews.com/Integration/StoryRss3089.xml
Propaganda-risk metadata:
- IRNA + Mehr tagged high / Iran state-affiliated (join Press TV).
- JPost + Ynetnews tagged low with knownBiases for transparency.
RSS allowlist updated in all three mirrors (shared/, scripts/shared/,
api/_rss-allowed-domains.js) per the byte-identical mirror contract
enforced by tests/edge-functions.test.mjs.
Deferred (separate PRs):
- Times of Israel: already in allowlist; was removed from feeds for
  cloud-IP 403. Needs Decodo routing.
- IDF Spokesperson: idf.il has no direct RSS endpoint; needs scraper.
- Tasnim / Press TV RSS / Israel Hayom: known cloud-IP blocks.
- WAM / SPA / KUNA / QNA / BNA: public RSS endpoints are dead; sites
  migrated to SPAs or gate with 403.
Plan doc (PR #3226) overstated the gap: it audited only feeds.ts and
missed that travel advisories + US Embassy alerts are already covered
by scripts/seed-security-advisories.mjs. NOTAM claim in that doc is
also wrong: we use ICAO's global NOTAM API, not FAA.
* fix(feeds): enable IRNA, Mehr, Jerusalem Post, Ynetnews by default
Reviewer on #3236 flagged that adding the four new ME feeds to
FULL_FEEDS.middleeast alone leaves them disabled on first run, because
App.ts:661 persists computeDefaultDisabledSources() output derived
from DEFAULT_ENABLED_SOURCES. Users would have to manually re-enable
via Settings > Sources, defeating the purpose of broadening the
default ME mix.
Add the four new sources to DEFAULT_ENABLED_SOURCES.middleeast so they
ship on by default. Placement keeps them adjacent to their peers
(IRNA / Mehr with the other Iran sources, JPost / Ynetnews after
Haaretz). Risk/slant tags already in SOURCE_PROPAGANDA_RISK ensure
downstream digest dedup + summarization weight them correctly.
* style(feeds): move JPost + Ynetnews under Low-risk section header
Greptile on #3236 flagged that both entries are risk: 'low' but were
inserted above the `// Low risk - Independent with editorial standards`
comment header, making the section boundary misleading for future
contributors. Shift them under the header where they belong. No
runtime change; cosmetic ordering only.
661bbe8f09
fix(health): nationalDebt threshold 7d → 60d — match monthly cron interval (#3237)
* fix(health): nationalDebt threshold 7d → 60d to match monthly cron cadence
User reported health showing:
"nationalDebt": { status: "STALE_SEED", records: 187, seedAgeMin: 10469, maxStaleMin: 10080 }
Root cause: api/health.js had `maxStaleMin: 10080` (7 days) on a seeder
that runs every 30 days via seed-bundle-macro.mjs:
{ label: 'National-Debt', intervalMs: 30 * DAY, ... }
The threshold was narrower than the cron interval, so every month
between days 8–30 it guaranteed STALE_SEED. Original comment
"7 days — monthly seed" even spelled the mismatch out loud.
Data source cadence:
- US Treasury debt_to_penny API: updates daily but we only snapshot latest
- IMF WEO: quarterly/semi-annual release — no value in checking daily
- 30-day cron is appropriate; stale threshold should be ≥ 2× interval
Fix: bump maxStaleMin to 86400 (60 days). Matches the 2× pattern used
by faoFoodPriceIndex + recovery pillar (recoveryFiscalSpace, etc.)
which also run monthly.
Also fixes the same mismatch in scripts/regional-snapshot/freshness.mjs —
the 10080 ceiling there would exclude national-debt from capital_stress
axis scoring 23 days out of every 30 between seeds.
* fix(seed-national-debt): raise CACHE_TTL to 65d so health.js stale window is actually reachable
PR #3237 review was correct: my earlier fix set api/health.js
SEED_META.nationalDebt.maxStaleMin to 60d (86400min), but the seeder's
CACHE_TTL was still 35d. After a missed monthly cron, the canonical key
expired at day 35 — long before the 60d "stale" threshold. Result path:
hasData=false → api/health.js:545-549 → status = EMPTY (crit)
Not STALE_SEED (warn) as my commit message claimed.
writeFreshnessMetadata() in scripts/_seed-utils.mjs:222 sets meta TTL to
max(7d, ttlSeconds), so bumping ttlSeconds alone propagates to both the
canonical payload AND the meta key.
Fix:
- CACHE_TTL 35d → 65d (5d past the 60d stale window so we get a clean
STALE_SEED → EMPTY transition without keys vanishing mid-warn).
- runSeed opts.maxStaleMin 10080 (7d) → 86400 (60d) so the in-seeder
declaration matches api/health.js. Field is only validated for
presence by runSeed (scripts/_seed-utils.mjs:798), but the drift was
what hid the TTL invariant in the first place.
Invariant this restores: for any SEED_META entry,
seeder CACHE_TTL ≥ maxStaleMin + buffer
so the "warn before crit" gradient actually exists.
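The restored invariant can be expressed as a small check. This is an illustrative guard, not repo code; the function name, parameter names, and 5-day default buffer are assumptions, with values in minutes matching the maxStaleMin convention in api/health.js.

```javascript
const DAY_MIN = 24 * 60;

// Hypothetical invariant check: the cached key must outlive the stale window
// so STALE_SEED (warn) can fire before the key expires and the status
// degrades straight to EMPTY (crit).
function ttlInvariantHolds({ cacheTtlMin, maxStaleMin, bufferMin = 5 * DAY_MIN }) {
  return cacheTtlMin >= maxStaleMin + bufferMin;
}
```

For nationalDebt, the fixed pairing (65d TTL, 60d stale window) satisfies the check, while the pre-fix pairing (35d TTL, 60d window) violates it.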
* fix(freshness): wire national-debt to seed-meta + teach extractTimestamp about seededAt
Reviewer P2 on PR #3237: my earlier freshness.mjs bump to 86400 was a
no-op. classifyInputs() (scripts/regional-snapshot/freshness.mjs:100-108,
122-132) uses the entry's metaKey or extractTimestamp()'s known field
list. national-debt had neither — payload carries only `seededAt`, and
extractTimestamp didn't know that field, so the "present but undated"
branch treated every call as fresh. The age window never mattered.
Two complementary fixes:
1. Add metaKey: 'seed-meta:economic:national-debt' to the freshness
entry. Primary, authoritative source — seed-meta.fetchedAt is
written by writeFreshnessMetadata() on every successful run, which is
also what api/health.js reads, keeping both surfaces consistent.
2. Add `seededAt` to extractTimestamp()'s field list. Defense-in-depth:
many other runSeed-based scripts (seed-iea-oil-stocks,
seed-eurostat-country-data, etc.) wrap output as { ..., seededAt: ISO }
with no metaKey in the freshness registry. Without this, they were
also silently always-fresh. ISO strings parse via Date.parse.
Note: `economic:eu-gas-storage:v1` uses `seededAt: String(Date.now())` —
a stringified epoch number, which Date.parse does NOT handle. That seed's
freshness classification is still broken by this entry's lack of metaKey,
but it's a separate shape issue out of scope here. Flagged in PR body.
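The seededAt behavior above can be sketched with a minimal extractTimestamp. This is a hedged re-sketch, not the function in scripts/regional-snapshot/freshness.mjs; the field list and return shape are assumptions for illustration.

```javascript
// Hypothetical sketch: scan known timestamp fields, accept only strings
// that Date.parse can interpret (ISO-8601); anything else falls through
// to the "present but undated" outcome (null here).
function extractTimestamp(payload, fields = ['fetchedAt', 'updatedAt', 'seededAt']) {
  for (const f of fields) {
    const v = payload?.[f];
    if (typeof v !== 'string') continue;
    const t = Date.parse(v);           // ISO strings parse fine
    if (Number.isFinite(t)) return t;  // stringified epoch ms yields NaN in V8
  }
  return null;
}
```

This makes the eu-gas-storage caveat concrete: `seededAt: '2026-04-20T15:32:00Z'` dates the payload, while `seededAt: String(Date.now())` still classifies as undated.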
42a86c5859
fix(preview): skip premium RPCs when main app runs inside /pro live-preview iframe (#3235)
* fix(preview): skip premium RPCs when main app runs inside /pro preview iframe
pro-test/src/App.tsx embeds the full main app as a "live preview" via
<iframe src="https://worldmonitor.app?alert=false" sandbox="...">. The
iframe boots an anonymous main-app session, which fires premium RPCs
(get-regional-snapshot, get-tariff-trends, list-comtrade-flows, and on
country-click the fetchProSections batch) with no Clerk bearer
available. Every call 401s, the circuit breakers catch and fall through
to empty fallbacks (so the preview renders fine), but the 401s surface
on the PARENT /pro page's DevTools console and Sentry because `sandbox`
includes `allow-same-origin`. Net effect: /pro pricing page shows a
flood of fake-looking errors that cost us a session of debugging to
trace back to the iframe. PR #3233's premiumFetch swap didn't help here
(there's simply no token to inject for an anonymous iframe).
Introduce `src/utils/embedded-preview.ts::IS_EMBEDDED_PREVIEW`, a
module-level boolean evaluated once at load from
`window.top !== window` (with try/catch for cross-origin sandboxes),
and short-circuit three init-time premium entry points when true:
- RegionalIntelligenceBoard.loadCurrent → renderEmpty()
- fetchTariffTrends → return emptyTariffs
- fetchComtradeFlows → return emptyComtrade
Plus one defensive gate in country-intel.fetchProSections for the case
a user clicks a country inside the iframe preview. Each gate returns
the exact same empty fallback the breaker would have produced after a
401, so visual behavior is unchanged — the preview iframe still shows
the dashboard layout with empty premium panels, just without the
network request and its console/Sentry trail. Live-tab /pro page should
now see zero 401s from regional-snapshot / tariff-trends /
comtrade-flows on load.
* fix(preview): narrow iframe gate to ?embed=pro-preview marker only
Reviewer flagged that the first iteration's `window.top !== window`
check was too broad. The repo explicitly markets "Embeddable iframe
panels" as an Enterprise feature (pro-test/src/locales/en.json:
whiteLabelDesc), so legitimate customer embeds must keep firing premium
RPCs normally. Only the /pro marketing preview — which is
known-anonymous and generates expected 401 noise — should
short-circuit.
Fix: replace the blanket iframe check with a unique marker that only
/pro's preview iframe carries.
- pro-test/src/App.tsx: iframe src switched from `?alert=false` (dead
  param, unused in main app) to `?embed=pro-preview`. Rebuilt
  public/pro/ to ship the change.
- src/utils/embedded-preview.ts: two-gate check now. Gate 1 still
  requires `window.top !== window` so the marker leaking into a
  top-level URL doesn't disable premium RPCs for the top-level app.
  Gate 2 requires `?embed=pro-preview` in location.search so only the
  known embedder matches. Enterprise white-label embeds without this
  marker behave exactly like a top-level visit.
Same three premium fetchers + the one country-intel path still gate on
IS_EMBEDDED_PREVIEW; the semantic change is purely in how the flag is
computed.
Per PR #3229 / #3228 lesson, the pro-test rebuild ships in the same PR
as the source change — public/pro/assets/index-*.js and index.html
reflect the new iframe src.
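The two-gate check can be sketched as a testable function. The real module evaluates this once at load into the IS_EMBEDDED_PREVIEW constant; the function name and the fake-window parameter here are illustrative.

```javascript
// Hypothetical re-sketch of the two-gate embed detection described above.
function computeEmbeddedPreview(win) {
  let framed = false;
  try {
    // Gate 1: must actually be inside a frame. A cross-origin sandbox can
    // throw on touching win.top, which itself proves we are framed.
    framed = win.top !== win;
  } catch {
    framed = true;
  }
  if (!framed) return false; // marker leaking into a top-level URL stays harmless
  // Gate 2: only /pro's own preview iframe carries this query marker.
  return new URLSearchParams(win.location.search).get('embed') === 'pro-preview';
}
```

An enterprise embed (framed, but without the marker) and a top-level visit (marker present, but not framed) both evaluate to false; only the /pro preview iframe hits both gates.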
240abaa8ed
fix(premium): route RPC clients through premiumFetch — stop 401s for pro users (#3233)
* fix(premium): route premium RPC clients through premiumFetch
Four generated-client instantiations were using plain globalThis.fetch,
bypassing the Clerk bearer / tester-key / WORLDMONITOR_API_KEY injection
chain. Signed-in pro users hit the premium endpoints unauthenticated
and got 401, with no visible path to recovery:
- src/components/RegionalIntelligenceBoard.ts
→ get-regional-snapshot, get-regime-history, get-regional-brief
- src/components/DeductionPanel.ts
→ deduct-situation, list-market-implications
- src/services/trade/index.ts
→ get-tariff-trends, list-comtrade-flows (+ non-premium siblings)
- src/app/country-intel.ts::fetchProSections
→ get-national-debt, getRegimeHistory/getRegionalBrief,
list-comtrade-flows
Swap each to `fetch: premiumFetch` (src/services/premium-fetch.ts),
which tries in order: existing auth header → WORLDMONITOR_API_KEY →
tester key → Clerk bearer token → unauthenticated passthrough. For
non-premium endpoints that share the same client (e.g. getTradeFlows,
getCustomsRevenue) the fallthrough behavior is identical to plain
globalThis.fetch — no regression surface.
Surfaces as user-facing 401 on /pro sign-in → redirect-to-/ flow, where
pro users briefly see the dashboard try to fetch then hit 401. After
this fix the bearer token flows through and regional/deduction/trade
panels populate as expected.
Left untouched (not hitting premium paths today, so not blocking):
- src/services/gdelt-intel.ts (searchGdeltDocuments)
- src/services/social-velocity.ts (getSocialVelocity)
- src/services/pizzint.ts (getPizzintStatus)
If any of those ever move into PREMIUM_RPC_PATHS, swap them too.
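The injection order described above can be sketched as a credential resolver. This is a hedged approximation, not src/services/premium-fetch.ts; the function name, option names, and the getClerkToken callback are assumptions.

```javascript
// Hypothetical sketch of premiumFetch's credential chain:
// existing header → API key → tester key → Clerk bearer → passthrough.
async function resolveAuthHeader(existing, { apiKey, testerKey, getClerkToken } = {}) {
  if (existing) return existing;                // caller already set a header
  if (apiKey) return `Bearer ${apiKey}`;        // WORLDMONITOR_API_KEY
  if (testerKey) return `Bearer ${testerKey}`;  // tester key
  try {
    const token = await getClerkToken?.();
    if (token) return `Bearer ${token}`;        // Clerk session bearer
  } catch {
    // Clerk not loaded yet: fall through to unauthenticated passthrough
  }
  return null;                                  // no credential available
}
```

The null terminal case is what makes the swap safe for non-premium endpoints sharing the client: with no credential, the request goes out exactly as plain globalThis.fetch would have sent it.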
* fix(premium): split trade client + disable premium-breaker persistCache
Review found a real access-control leak in the first iteration of this
PR: routing the entire shared TradeServiceClient through premiumFetch
populated module-level breakers with `persistCache: true` and
auth-invariant cache keys. A pro user's get-tariff-trends /
list-comtrade-flows response would be written to localStorage by the
breaker, and a later free / signed-out session on the same browser
would be served that cached premium data directly, bypassing both
auth injection and the gateway's entitlement check.
Two-layer fix so neither the in-memory breaker nor the persistent
cache can leak premium data across auth states:
1. **Split clients.** publicClient keeps plain globalThis.fetch and
feeds restrictions/flows/barriers/revenue breakers (non-premium,
shareable across users). premiumClient uses premiumFetch and is
ONLY used by fetchTariffTrends + fetchComtradeFlows.
2. **Disable persistCache for premium breakers.** tariffsBreaker and
comtradeBreaker flip to `persistCache: false`. In-memory cache
within a session is still fine (and expected for circuit-breaker
behavior), but the response no longer survives a reload / cross-
session switch where a different user could read it.
Both changes are needed: split clients alone would still let premium
responses ride through the old cached entries (if any) after the
deploy; persistCache:false alone would still mean a shared client
routes anonymous calls through premiumFetch (minor but avoidable
token leak potential). Together they're airtight for the leak vector.
Follow-up potentially worth doing: auth-keyed cache keys for breakers
used by premium data, so the in-tab SPA sign-out case is also sealed.
Not blocking today.
* fix(premium): invalidate in-memory premium breakers on Clerk identity change
Review caught the remaining leak vector: persistCache:false + split
clients closes cross-browser-reload leaks, but the module-level
tariffsBreaker/comtradeBreaker in-memory cache still lives for the
full 30-min / 6-hour TTL inside a single SPA session. A pro user
loads tariff/comtrade data → signs out in the same tab → any caller
is served the pro response from memory without re-auth.
Track the Clerk user id that last populated the premium breakers in a
module-level `lastPremiumUserId`. On every call to fetchTariffTrends
or fetchComtradeFlows, check the current Clerk identity via
getCurrentClerkUser(). If it changed (sign-out, user switch,
free↔pro transition), call clearMemoryCache() on both premium
breakers before executing. The breaker then falls back to a live
fetch through premiumFetch with the new caller's credentials.
`clearMemoryCache` (not `clearCache`) is deliberate — it only touches
the in-memory cache and leaves persistent storage alone. Non-premium
breakers on this same client (restrictions, flows, barriers, revenue)
are untouched: their responses are public and shareable across auth
states, so an identity-triggered clear would only cost us cache hits
with zero security benefit.
Edge cases handled:
- First call: lastPremiumUserId is undefined, no clear.
- Anonymous → anonymous: no clear (both null).
- Anonymous → pro: clears (defensive; anon can't populate anyway
because emptyTariffs/emptyComtrade fail shouldCache).
- Pro → anonymous: clears (the critical case).
- Pro A → pro B: clears (account switch).
- getCurrentClerkUser throws (Clerk not loaded yet): treated as
anonymous, safe default.
Closes the audit cycle on this PR's cache-leak thread.
* fix(premium): invalidate breakers on entitlement change, not just user id
Reviewer caught that the prior fingerprint was `userId` only. That
covers sign-out, user switch, and account change — but NOT the case
the prior commit's comment explicitly claimed: a pro→free downgrade
for the same signed-in user (subscription cancellation, billing
grace-period expiry, plan switch to annual-lapsed). The Clerk user id
doesn't change, so the invalidation never fired and the cached pro
response kept serving until the 30-min / 6-hour breaker TTL.
Widen the fingerprint to `${userId}:${plan}` (or 'anon' when
signed-out). `getCurrentClerkUser()` already reads `plan` off
`user.publicMetadata`, so no new Clerk calls needed.
Now covered:
- pro A → pro B (userId change)
- pro → anon (userId → null)
- anon → pro (defensive)
- pro (active) → free (cancelled or expired) for same userId ← new
- free → pro (upgrade for same userId) ← new
Pro users who actively downgrade (or whose subscription lapses)
during an open tab will see the cached premium response invalidate
on the next premium fetcher call, at which point the live fetch
through premiumFetch + the gateway's entitlement check returns the
correct empty/403 response for their now-free plan.
* fix(premium): fingerprint on hasPremiumAccess, not Clerk publicMetadata.plan
Reviewer caught that Clerk publicMetadata.plan is NOT the authoritative
premium signal in this codebase. Per the explicit docstring on
src/services/panel-gating.ts::hasPremiumAccess (lines 11-27), the
webhook pipeline does NOT write publicMetadata.plan — the authoritative
signal is the Convex Dodo entitlement surfaced by isProUser() /
isEntitled(). Two fingerprint blind spots that ate the prior iteration:
- Paying user with valid Dodo entitlement but no Clerk metadata:
publicMetadata.plan === 'free' → fingerprint 'uid:free' → no
invalidation when their session transitions (they never looked
premium by the fingerprint, but premiumFetch was still injecting
their token and the gateway was still serving premium data).
- Active pro user whose Dodo subscription lapses: Clerk metadata
doesn't change → fingerprint stays 'uid:pro' → no invalidation
→ cached tariff/comtrade response keeps serving until TTL
(the exact pro→free case the previous commit claimed to cover).
Swap the plan source from getCurrentClerkUser().plan to
hasPremiumAccess() — the single source of truth used by panel-gating,
widgets, search, and event handlers. It unions API key, tester keys,
Clerk pro role, AND Convex entitlement, so every path that legitimately
grants premium access contributes to the fingerprint, and every path
that revokes it triggers invalidation.
Also add a reactive path: subscribe to onEntitlementChange() from
src/services/entitlements, and wipe the premium breakers the moment
Convex publishes a new entitlement snapshot. This closes the window
between subscription lapse and the user's next premium panel click —
the currently-open tariff panel clears its memory cache immediately
instead of serving stale pro data until the user navigates.
Combined: fingerprint is now (userId, hasPremiumAccess) tuple
evaluated both lazily on every premium fetcher call AND eagerly when
Convex pushes an entitlement change.
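The lazy half of the final scheme, a (userId, hasPremiumAccess) fingerprint checked before each premium fetch, can be sketched like this. All names are illustrative; the real code lives alongside the trade-service breakers and also wires the eager onEntitlementChange path.

```javascript
// Hypothetical sketch: clear premium breakers whenever the auth fingerprint
// changes between premium fetcher calls.
function makePremiumGate({ getUserId, hasPremiumAccess, clearBreakers }) {
  let lastFingerprint;
  return function ensureFreshBreakers() {
    let fp;
    try {
      fp = `${getUserId() ?? 'anon'}:${hasPremiumAccess() ? 1 : 0}`;
    } catch {
      fp = 'anon:0'; // Clerk not loaded yet: safe anonymous default
    }
    const changed = lastFingerprint !== undefined && fp !== lastFingerprint;
    if (changed) clearBreakers(); // wipe in-memory premium caches only
    lastFingerprint = fp;
    return changed;
  };
}
```

Because the premium bit comes from the entitlement union rather than Clerk metadata, a same-user downgrade flips the fingerprint and triggers the clear, which is exactly the case the earlier iterations missed.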
d1ebc84c6c
feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.
Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:
1 <-> 5: 0.632 5 <-> 8: 0.692
1 <-> 8: 0.500 5 <-> 10: 0.656
1 <-> 10: 0.554 8 <-> 10: 0.510
Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.
Measured on the 12 real titles from that brief:
Algorithm                  | Clusters | F1    | P    | R    | Note
---------------------------|----------|-------|------|------|-----------
complete-link @ 0.55 (was) | 7        | 0.526 | 0.56 | 0.50 |
complete-link @ 0.50       | 6        | 0.435 | 0.38 | 0.50 |
single-link @ 0.55         | 4        | 0.435 | 0.28 | 1.00 | over-merge
single-link @ 0.60         | 6        | 0.727 | 0.67 | 0.80 | winner
Change
------
scripts/lib/brief-dedup-embed.mjs:
New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
union-find. Chain merges through strong intermediates when a
direct pair is weak; respects the entity veto (blocked pairs
don't union). O(N^2 alpha(N)); permutation-invariant by
construction.
scripts/lib/brief-dedup.mjs:
New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
'complete' to revert). readOrchestratorConfig returns 'clustering'
field. Dispatch at call site picks the right function. Structured
log line now includes clustering=<algo>.
tests/brief-dedup-embedding.test.mjs:
+8 regressions:
- singleLinkCluster chains the 4-way through a bridge
- veto blocks unions even when cosine passes
- permutation-invariance property test (5 shuffles)
- empty-input
- DIGEST_DEDUP_CLUSTERING default is 'single'
- DIGEST_DEDUP_CLUSTERING=complete kill switch works
- unrecognised values fall back to 'single'
- log line includes clustering=<algo>
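The union-find mechanics described above can be re-sketched compactly. This is a hedged approximation of the approach, not the repo's singleLinkCluster; the name, options, and the precomputed `cosine` callback are assumptions.

```javascript
// Hypothetical single-link clustering via union-find: a pair unions when its
// cosine clears the threshold AND the entity veto allows it; weak direct
// pairs still end up clustered if they chain through strong intermediates.
function singleLinkSketch(items, cosine, { threshold = 0.6, veto = () => false } = {}) {
  const parent = items.map((_, i) => i);
  const find = (i) => (parent[i] === i ? i : (parent[i] = find(parent[i])));
  for (let a = 0; a < items.length; a++) {
    for (let b = a + 1; b < items.length; b++) {
      if (cosine(items[a], items[b]) >= threshold && !veto(items[a], items[b])) {
        parent[find(a)] = find(b); // blocked pairs never reach this union
      }
    }
  }
  const clusters = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(item);
  });
  return [...clusters.values()];
}
```

With a~b = 0.65 and b~c = 0.62 but a~c = 0.30, complete-link would refuse the 3-way cluster while this union-find chains it through b, which is the F1 gain the probe measured.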
Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if a bad day ever surfaces chaining.
Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:
DIGEST_DEDUP_COSINE_THRESHOLD=0.60
No other changes needed — clustering=single is the default.
Verification
------------
- npm run test:data 5825/5825 pass
- tests/brief-dedup-embedding 53/53 pass (45 existing + 8 new)
- typecheck + typecheck:api clean
- biome check on changed files clean
Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
— confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
/ Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73
Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
d7393d8010
fix(pro): downgrade @clerk/clerk-js to v5 to restore auto-mount UI (#3232)
The actual root cause behind the "Clerk was not loaded with Ui
components" sign-in failure on /pro is NOT the import path — it's
that pro-test was on @clerk/clerk-js v6.4.0 while the main app
(which works fine) is on v5.125.7.
Clerk v6 fundamentally changed `clerk.load()`: the UI controller
is no longer auto-mounted by default. Both `@clerk/clerk-js` (the
default v6 entry) and `@clerk/clerk-js/no-rhc` (the bundled-UI
variant) expect the caller to either:
- load Clerk's UI bundle from CDN and pass `window.__internal_ClerkUICtor`
to `clerk.load({ ui: { ClerkUI } })`, or
- manually wire up `clerkUICtor`.
That's why my earlier "switch to no-rhc" fix (PR #3227 + #3228)
didn't actually unbreak production — both v6 variants throw the same
assertion. The error stack on the deployed bundle confirmed it:
`assertComponentsReady` from `clerk.no-rhc-UeQvd9Xf.js`.
Fix: pin pro-test to `@clerk/clerk-js@^5.125.7` to match the main
app's working version. v5 still auto-mounts UI on `clerk.load()` —
no extra wiring needed. The plain `import { Clerk } from '@clerk/clerk-js'`
pattern (which the main app uses verbatim and which pro-test had
before #3227) just works under v5.
Verification of the rebuilt bundle (chunk: clerk-PNSFEZs8.js):
- 3.05 MB (matches main app's clerk-DC7Q2aDh.js: 3.05 MB)
- 44 occurrences of mountComponent (matches main: 44)
- 3 occurrences of SignInComponent (matches main: 3)
- 0 occurrences of "Clerk was not loaded with Ui" (the assertion
error string is absent; UI is unconditionally mounted)
Includes the rebuilt public/pro/ artifacts so this fix is actually
deployed (PR #3229's CI check will catch any future PR that touches
pro-test/src without rebuilding).
0a4eff0053
feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape was
missing the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.
Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple retries
returned HTTP_STATUS 000). Not a reliable path. The only shape ArcGIS
reliably handles is per-country WHERE ISO3='X' AND date > Y (uses the
ISO3 index). Its problem was fitting 174 countries in the 420s
portwatch bundle budget — solve that by giving it its own container.
Changes:
- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the full
  Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY class that has bit us
  3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new assertions
  on the standalone bundle + Dockerfile + retry wrapper + ISO3 shape).
Full test:data: 5883 pass. Typecheck + lint clean. Post-merge Railway
provisioning: see header of seed-bundle-portwatch-port-activity.mjs
for the 7-step checklist.
234ec9bf45
chore(ci): enforce pro-test bundle freshness — local hook + CI backstop (#3229)
* chore(ci): enforce pro-test bundle freshness, prevent silent deploy staleness
public/pro/ is committed to the repo and served verbatim by Vercel.
The root build script only runs the main app's vite build — it does
NOT run pro-test's build. So any PR that changes pro-test/src/**
without manually running `cd pro-test && npm run build` and committing
the regenerated chunks ships to production with a stale bundle.
This footgun just cost us: PR #3227 fixed the Clerk "not loaded with
Ui components" sign-in bug in source, merged, deployed — and the live
site still threw the error because the committed chunk under
public/pro/assets/ was the pre-fix build. PR #3228 fix-forwarded by
rebuilding.
Two-layer enforcement so it doesn't happen again:
1. .husky/pre-push — mirrors the existing proto freshness block. If
   pro-test/ changed vs origin/main, rebuild and
   `git diff --exit-code public/pro/`. Blocks the push with a clear
   message if the bundle is stale or untracked files appear.
2. .github/workflows/pro-bundle-freshness.yml — CI backstop on any PR
   touching pro-test/** or public/pro/**. Runs `npm ci + npm run
   build` in pro-test and fails the check if the working tree shows
   any diff or untracked files under public/pro/. Required before
   merge, so bypassing the local hook still can't land a stale bundle.
Note: the hook's diff-against-origin/main check means it skips the
build when pushing a branch that already matches main on pro-test/
(e.g. fix-forward branches that only touch public/pro/). CI covers
that case via its public/pro/** path filter.
* fix(hooks): scope pro-test freshness check to branch delta, not worktree
The first version of this hook used
`git diff --name-only origin/main -- pro-test/`, which compares the
WORKING TREE to origin/main. That fires on unstaged local pro-test/
scratch edits and blocks pushing unrelated branches purely because of
dirty checkout state.
Switch to `$CHANGED_FILES` (computed earlier at line 77 from
`git diff origin/main...HEAD`), which scopes the check to commits on
the branch being pushed. This matches the convention the test-runner
gates already use (lines 93-97). Also honor `$RUN_ALL` as the safety
fallback when the branch delta can't be computed.
* fix(hooks): trigger pro freshness check on public/pro/ too, match CI
The first scoping fix used `^pro-test/` only, but the CI workflow keys
off both `pro-test/**` AND `public/pro/**`. That left a gap: a
bundle-only PR (e.g. a fix-forward rebuild like #3228, or a hand-edit
to a committed asset) skipped the local check entirely while CI would
still validate it. The hook and CI are now consistent.
Trigger condition: `^(pro-test|public/pro)/` — the rebuild + diff
check now fires whenever the branch delta touches either side of the
source/artifact pair, matching the CI workflow's path filter.
9e022f23bb
fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
"cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }
despite the 30-min warm-ping loop running. Two bugs stacked:
1. get-cable-health.ts null-upstream path didn't write Redis.
cachedFetchJson with a returning-null fetcher stores NEG_SENTINEL
(10 bytes) in cable-health-v1 for 2 min. Handler then returned
`fallbackCache || { cables: {} }` to the client WITHOUT writing to
cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
→ strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).
Fix: on null result, write the fallback response back to CACHE_KEY
(short TTL matching NEG_SENTINEL so a recovered NGA fetch can
overwrite immediately) AND refresh seed-meta with the real count.
Health now sees hasData=true during an outage.
2. Zero-cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
the valid healthy state — NGA had no active subsea cable warnings.
The old `Math.max(count, 1)` on recordCount was an intentional lie
to sidestep this; now honest.
Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
valid, not critical". recordCount now reports actual cables.length.
Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.
Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
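The writeback in fix 1 can be sketched as a small helper (the store interface, key names, and function shape here are hypothetical simplifications; the real handler goes through cachedFetchJson and Redis):

```typescript
// Sketch of the null-upstream fallback writeback: persist the fallback
// response to the cache key with a short TTL (matching the NEG_SENTINEL
// window so a recovered NGA fetch can overwrite immediately) and refresh
// seed-meta with the real cable count, so health sees hasData=true.
interface KV {
  set(key: string, value: string, ttlSec: number): void;
}

function writeBackFallback(
  kv: KV,
  cacheKey: string,
  metaKey: string,
  fallback: { cables: Record<string, unknown> },
  ttlSec: number,
): number {
  const count = Object.keys(fallback.cables).length;
  kv.set(cacheKey, JSON.stringify(fallback), ttlSec);
  kv.set(metaKey, JSON.stringify({ count, fetchedAt: Date.now() }), ttlSec);
  return count; // actual cables.length — no Math.max(count, 1) floor
}
```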
2b7f83fd3e |
fix(pro): regenerate /pro bundle with no-rhc Clerk so deploy reflects #3227 (#3228)
PR #3227 fixed pro-test/src/services/checkout.ts to import @clerk/clerk-js/no-rhc instead of the headless main export, but the deployed bundle in public/pro/assets/ was never regenerated. The Vercel deploy ships whatever is committed under public/pro/ — the root build script does not run pro-test's vite build — so production /pro continued serving the old broken clerk-C6kUTNKl.js even after #3227 merged. Sign-in still threw "Clerk was not loaded with Ui components". Rebuild: cd pro-test && npm run build, which writes the new chunks to ../public/pro/assets/. Deletes the stale clerk-C6kUTNKl.js + index-J1JYVlDk.js, adds clerk.no-rhc-UeQvd9Xf.js + index-CFLOgmG-.js, and updates pro/index.html to reference them. |
7979b4da0e |
fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts on /pro (#3227)
* fix(pro): switch Clerk to no-rhc bundle so sign-in modal mounts
The /pro marketing page was throwing "Clerk was not loaded with Ui
components" the moment an unauthenticated user clicked Sign In or
GET STARTED on a pricing tier, blocking every conversion.
@clerk/clerk-js v6 main export (`dist/clerk.mjs`) is the headless
build — it has no UI controller and expects `clerkUICtor` passed to
`clerk.load()`. Calling `openSignIn()` on it always throws. The
bundled-with-UI variant is exposed at `@clerk/clerk-js/no-rhc`
(same `Clerk` named export, drop-in).
Also adds explicit `Sentry.captureException` at both call sites,
because the rejection was being swallowed by `.catch(console.error)`
in App.tsx and by an unwrapped `c.openSignIn()` in checkout.ts —
which is why this regression had zero Sentry trail in production.
* fix(pro): catch Clerk load failures in startCheckout, not just openSignIn
The PricingSection CTA fires-and-forgets `startCheckout()` with no
.catch. The previous fix only wrapped `c.openSignIn()`, so any
rejection from `await ensureClerk()` (dynamic import failure, network
loss mid-load, clerk.load() throwing) still escaped as an unhandled
promise — defeating the Sentry coverage we added.
Now `startCheckout()` reports both load and openSignIn failures
explicitly and returns false rather than rejecting.
Also clear the cached `clerkLoadPromise` on failure so the next
button click can retry from scratch instead of replaying a rejected
promise forever.
* fix(pro): only publish Clerk instance after load() succeeds
_loadClerk() was assigning the module-level `clerk` singleton before
awaiting `clerk.load()`. If load() rejected (transient network failure,
malformed publishable key, Clerk frontend-api 4xx/5xx), the half-
initialized instance stayed cached. The next ensureClerk() call then
short-circuited on `if (clerk) return clerk;` and returned the broken
instance, bypassing the retry path that commit
1928b48e68 |
feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy (#3225)
* feat(portwatch): globalise EP3 — one paginated fetch, in-memory groupBy
Follow-up to #3222 (stabiliser) — the real fix. Production log
2026-04-20T06:48-06:55 confirmed the stabilisers worked (per-country
cap enforced at 90.0s, SIGTERM printed stage+errors, abort propagated
through fetch + proxy paths) — but also proved the per-country shape
itself is the bug:
batch 1/15: 7 seeded, 5 errors (90.0s) ← per-country cap hit cleanly
batch 5/15: 40 seeded, 20 errors (371.3s) ← 4 batches / ~70s avg
SIGTERM at batch 6/15 after 420s
15 batches × ~70s = 1050s. Section budget is 420s. Per-country will
never fit, even with a perfectly-behaving ArcGIS. Three countries
(BRA, IDN, NGA) also returned "Invalid query parameters" on the
ISO3-filtered WHERE — a failure mode unique to the per-country shape.
Fix: replace 174 per-country round-trips with a single paginated pass
over EP3, grouped by ISO3 in memory (same pattern EP4 refs already
use via `fetchAllPortRefs`). ~150-200 sequential pages × ~1s each
≈ 2-4 min total wall time inside the 420s section. Eliminates the
per-country failure modes by construction.
Changes:
- New `fetchAllActivityRows(since, { signal, progress })`: paginated
`WHERE date > <ts>` across the whole Daily_Ports_Data feature server,
grouped by attributes.ISO3 into Map<iso3, rows[]>. Advances offset by
actual features.length (same server-cap defence as EP4). Checks
signal.aborted between pages.
- `fetchAll()` now reads the global map and drives `computeCountryPorts`
for each eligible ISO3. No concurrency primitives, no batch loop, no
Promise.allSettled.
- Dropped: `processCountry`, `withPerCountryTimeout`, per-country
`fetchActivityRows`, CONCURRENCY, PER_COUNTRY_TIMEOUT_MS,
BATCH_LOG_EVERY. All dead under the global pattern.
- `progress` shape now `{ stage, pages, countries }`. SIGTERM handler
logs "SIGTERM during stage=<x> (pages=N, countries=M)" — still
useful forensics if the global paginator itself hangs.
- Shutdown controller: `main()` creates an AbortController, threads
its signal through fetchAll → fetchAllActivityRows → fetchWithTimeout
→ _proxy-utils, and the SIGTERM handler calls abort() so in-flight
HTTP work stops instead of burning SIGKILL grace window. Reuses the
signal-threading plumbing shipped in #3222.
Preserved: degradation guard (>20% drop rejects), TTL extension on
failure, lock release in finally, 429 proxy fallback with signal
propagation, page-level abort checks.
Tests: 43 pass (dropped 2 withPerCountryTimeout runtime tests that
targeted removed code; kept proxyFetch pre-aborted-signal test since
the proxy plumbing is still exercised by the global fetch). Full
test:data 5865 pass. Typecheck + lint clean.
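The paginator's two defences (offset advanced by actual features.length, abort checked between pages) can be sketched as follows. Names and the feature shape are illustrative stand-ins; the real function also takes `since` and a `progress` object:

```typescript
// Sketch of the single global paginated pass: group rows by ISO3 in memory,
// advance the offset by the number of features actually returned (defends
// against a server-side page cap smaller than the requested page size), and
// honour the shutdown AbortSignal between pages.
type Feature = { attributes: { ISO3: string } };

async function paginateAll(
  fetchPage: (offset: number) => Promise<Feature[]>,
  signal: AbortSignal,
): Promise<Map<string, Feature[]>> {
  const byIso3 = new Map<string, Feature[]>();
  let offset = 0;
  for (;;) {
    if (signal.aborted) throw new Error("aborted between pages");
    const features = await fetchPage(offset);
    if (features.length === 0) break;
    for (const f of features) {
      const rows = byIso3.get(f.attributes.ISO3) ?? [];
      rows.push(f);
      byIso3.set(f.attributes.ISO3, rows);
    }
    offset += features.length; // actual length, not requested page size
  }
  return byIso3;
}
```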
* fix(portwatch): stream-aggregate EP3 into per-port accumulators (PR #3225 P1)
Review feedback on PR #3225: the first globalisation pass materialised
the full 90-day activity dataset as Map<iso3, Feature[]> before any
aggregation. At ~2000 ports × 90 days ≈ 180k feature objects × ~400
bytes = ~70MB RSS. Trades the timeout failure mode for an OOM/restart
under large datasets on the 1GB Railway container.
Fix: replace the two-phase "fetch-all then compute" shape with a
single-pass streaming aggregator.
- `fetchAndAggregateActivity(since, { signal, progress })` folds each
page's features into Map<iso3, Map<portId, PortAccum>> inline and
discards the raw features. Only ~2000 per-port accumulators (~100
bytes each = ~200KB) live across pages. Memory is O(ports), not
O(rows).
- PortAccum holds running counters for each aggregation window:
last30_calls, last30_count, last30_import, last30_export, prev30_calls,
last7_calls, last7_count. Captured once per port at first sighting
(date ASC order preserves old `rows[0].portname` behaviour).
- New `finalisePortsForCountry(portAccumMap, refMap)` — exported for
tests — computes the exact same per-port fields as the removed
`computeCountryPorts`: tankerCalls30d = last30_calls,
tankerCalls30dPrev = prev30_calls, import/export from their sums,
avg30d = last30_calls/last30_count, avg7d = last7_calls/last7_count,
anomalySignal unchanged, trendDelta unchanged, top-50 truncation
unchanged.
- `fetchAll()` now calls the aggregator + finaliser directly; the
transient Feature[] map is gone.
Preserved from PR #3225: shutdown AbortController plumbing, 429 proxy
fallback with signal propagation, degradation guard, SIGTERM
diagnostic flush, page-level abort checks.
Tests: 48 pass (was 43, +5 runtime tests for finalisePortsForCountry
covering trendDelta, anomalySignal, top-N truncation, and missing
refMap entries). Full test:data 5877 pass. Typecheck + lint clean.
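The streaming fold can be sketched with a reduced accumulator (only two of the commit's counter fields shown; the row shape is a hypothetical simplification of the ArcGIS attributes):

```typescript
// Sketch of the single-pass aggregation: fold each page's rows into small
// per-port accumulators keyed Map<iso3, Map<portId, PortAccum>> and discard
// the raw features, so memory is O(ports) rather than O(rows).
type Row = { iso3: string; portId: string; portname: string; calls: number };

interface PortAccum {
  portname: string; // captured once at first sighting (date ASC order)
  last30_calls: number;
  last30_count: number;
}

function foldPage(
  accums: Map<string, Map<string, PortAccum>>,
  rows: Row[],
): void {
  for (const row of rows) {
    let ports = accums.get(row.iso3);
    if (!ports) accums.set(row.iso3, (ports = new Map()));
    let acc = ports.get(row.portId);
    if (!acc) {
      acc = { portname: row.portname, last30_calls: 0, last30_count: 0 };
      ports.set(row.portId, acc);
    }
    acc.last30_calls += row.calls;
    acc.last30_count += 1;
  }
}
```

A finaliser (like the commit's `finalisePortsForCountry`) then derives per-port averages from the counters, e.g. avg30d = last30_calls / last30_count.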
* fix(portwatch): skip EP3 geometry + thread signal into EP4 refs (PR #3225 review)
Two valid findings from PR #3225 review on commit
1a2295157e |
feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief (#3224)
* feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief

Problem
-------
The 2026-04-20 08:00 brief contained 12 stories, 7 of which were duplicates of 4 events, alongside low-importance filler (niche commodity, domestic crime). notification-relay's IMPORTANCE_SCORE_MIN gate (#3223, set to 63) only applies to the realtime fanout path. The digest cron reads the same story:track:*.currentScore but has NO absolute score floor — it just ranks and slices(0, 30), so on slow news days low-importance items bubble up to fill slots.

Change
------
scripts/seed-digest-notifications.mjs:
- New getDigestScoreMin() reads DIGEST_SCORE_MIN env at call time (Railway flips apply on the next cron tick, no redeploy).
- Default 0 = no-op, so this PR is behaviour-neutral until the env var is set on Railway.
- Filter runs AFTER deduplicateStories() so it drops clusters by the REPRESENTATIVE's score (which is the highest-scoring member of its cluster per materializeCluster's sort).
- One-line operator log when the floor fires: [digest] score floor dropped N of M clusters (DIGEST_SCORE_MIN=X)

tests/digest-score-floor.test.mjs (6 regressions):
- getDigestScoreMin reads from process.env (not a module const)
- default is 0 (no-op)
- rejects non-integer / negative values (degrades to 0)
- filter runs AFTER dedup, BEFORE slice(0, DIGEST_MAX_ITEMS)
- short-circuits when floor is 0 (no wasted filter pass)
- log line emits "dropped N of M clusters"

Operator activation
-------------------
Set on Railway seed-digest-notifications service: DIGEST_SCORE_MIN=63. Start at 63 to match the realtime gate, then nudge up/down based on the log lines over ~24h. Unset = off (pre-PR behaviour).

Why not bundle a cosine-threshold bump
--------------------------------------
The cosine-threshold tuning (0.60 -> 0.55 per the threshold probe) is an env-only flip already supported by the dedup orchestrator. Bundling an env-default change into this PR would slow rollback. Operator sets DIGEST_DEDUP_COSINE_THRESHOLD=0.55 on Railway as a separate action; this PR stays scoped to the score floor.

Verification
------------
- npm run test:data 5825/5825 pass
- tests/digest-score-floor 6/6 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean (pre-existing main() complexity warning on this file is unchanged)
- lint:md 0 errors
- version:check OK

Post-Deploy Monitoring & Validation
-----------------------------------
- **What to monitor** after setting DIGEST_SCORE_MIN on Railway:
  - `[digest] score floor dropped` lines — expect ~5-25% of clusters dropped on bulk-send ticks (stories=700+)
  - `[digest] Cron run complete: N digest(s) sent` stays > 0
- **Expected healthy behaviour**
  - 0-5 clusters dropped on normal ~80-story ticks
  - 50-200 dropped on bulk 700+ story ticks
  - brief still reports 10-30 stories for PRO users
- **Failure signals / rollback**
  - 0 digests sent for 24h after flipping the var
  - user-visible brief now has < 10 stories
  - Rollback: unset DIGEST_SCORE_MIN on Railway dashboard (instant, no deploy), next cron tick reverts to unfiltered behaviour
- **Validation window**: 24h
- **Owner**: koala73

Related
-------
- #3218 LLM prompt upgrade (source of importanceScore quality)
- #3221 geopolitical scope for critical
- #3223 notification-relay realtime gate (mirror knob)
- #3200 embedding-based dedup (the other half of brief quality)

* fix(digest): return null (not []) when score floor drains every cluster

Greptile P2 finding on PR #3224. When DIGEST_SCORE_MIN is set high enough to filter every cluster, buildDigest previously returned [] (empty array). The caller's `if (!stories)` guard only catches falsy values, so [] slipped past the "No stories in window" skip-log and the run reached formatDigest([], nowMs) which returns null, then silently continued at the !storyListPlain check. Flow was still correct (no digest sent) but operators lost the observability signal to distinguish "floor too high" from "no news today" from "dedup ate everything".

Fix:
- buildDigest now returns null when the post-floor list is empty, matching the pre-dedup-empty path. Caller's existing !stories guard fires the canonical skip-log.
- Emits a distinct `[digest] score floor dropped ALL N clusters (DIGEST_SCORE_MIN=X) — skipping user` line BEFORE the return, so operators can spot an over-aggressive floor in the logs.
- Test added covering both the null-return contract and the distinct "dropped ALL" log line. 7/7 dedup-score-floor tests pass.
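The env parsing and floor semantics described above can be sketched as two small functions (shapes simplified; the real code lives in seed-digest-notifications.mjs and the caller passes process.env):

```typescript
// Sketch of DIGEST_SCORE_MIN handling: read at call time (not module load)
// so a Railway flip applies on the next cron tick; degrade unset,
// non-integer, or negative values to the 0 no-op.
function getDigestScoreMin(env: Record<string, string | undefined>): number {
  const raw = env.DIGEST_SCORE_MIN;
  if (!raw) return 0;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) return 0;
  return n;
}

// Applied after dedup, judging each cluster by its representative's score.
// Returns null (not []) when the floor drains everything, so the caller's
// falsy-check guard fires the canonical skip-log.
function applyScoreFloor<T extends { currentScore: number }>(
  clusters: T[],
  floor: number,
): T[] | null {
  if (floor <= 0) return clusters; // short-circuit: no wasted filter pass
  const kept = clusters.filter((c) => c.currentScore >= floor);
  return kept.length === 0 ? null : kept;
}
```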
4f38ee5a19 |
fix(portwatch): per-country timeout + SIGTERM progress flush (#3222)
* fix(portwatch): per-country timeout + SIGTERM progress flush
Diagnosed from Railway log 2026-04-20T04:00-04:07: Port-Activity section hit
the 420s section cap with only batch 1/15 logged. Gap between batch 1 (67.3s)
and SIGTERM was 352s of silence — batch 2 stalled because Promise.allSettled
waits for the slowest country and processCountry had no per-country budget.
One slow country (USA/CHN with many ports × many pages under ArcGIS EP3
throttling) blocked the whole batch and cascaded to the section timeout,
leaving batches 2..15 unattempted.
Two changes, both stabilisers ahead of the proper fix (globalising EP3):
1. Wrap processCountry in Promise.race against a 90s PER_COUNTRY_TIMEOUT_MS.
Bounds worst-case batch time at ~90s regardless of ArcGIS behaviour.
Orphan fetches keep running until their own AbortSignal.timeout(45s)
fires — acceptable since the process exits soon after either way.
2. Share a `progress` object between fetchAll() and the SIGTERM handler so
the kill path flushes batch index, seeded count, and the first 10 error
messages. Past timeout kills discarded the errors array entirely,
making every regression undiagnosable.
* fix(portwatch): address PR #3222 P1+P2 (propagate abort, eager error flush)
Review feedback on #3222:
P1 — The 90s per-country timeout did not actually stop the timed-out
country's work; Promise.race rejected but processCountry kept paginating
with fresh 45s fetch timeouts per page, violating the CONCURRENCY=12 cap
and amplifying ArcGIS throttling instead of containing it.
Fix: thread an AbortController signal from withPerCountryTimeout through
processCountry → fetchActivityRows → fetchWithTimeout. fetchWithTimeout
combines the caller signal with AbortSignal.timeout(FETCH_TIMEOUT) via
AbortSignal.any so the per-country abort propagates into the in-flight
fetch. fetchActivityRows also checks signal.aborted between pages so a
cancel lands on the next iteration boundary even if the current page
has already resolved. Node 24 runtime supports AbortSignal.any.
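The signal combination can be sketched in a few lines (Node ≥ 20.3 provides AbortSignal.any and AbortSignal.timeout; the real fetchWithTimeout's signature may differ):

```typescript
// Sketch of the P1 fix: combine the caller's per-country abort signal with a
// per-request timeout so either one cancels the in-flight fetch.
function combinedSignal(caller: AbortSignal, timeoutMs: number): AbortSignal {
  return AbortSignal.any([caller, AbortSignal.timeout(timeoutMs)]);
}

async function fetchWithTimeout(
  url: string,
  caller: AbortSignal,
  timeoutMs = 45_000,
): Promise<Response> {
  return fetch(url, { signal: combinedSignal(caller, timeoutMs) });
}
```

The pagination loop additionally checks `signal.aborted` between pages, so a per-country abort lands at the next iteration boundary even when the current page has already resolved.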
P2 — SIGTERM diagnostics missed failures from the currently-stuck batch
because progress.errors was only populated after Promise.allSettled
returned. A kill during the pending await left progress.errors empty.
Fix: attach p.catch(err => errors.push(...)) to each wrapped promise
before Promise.allSettled. Rejections land in the shared errors array
at the moment they fire, so a SIGTERM mid-batch sees every rejection
that has already occurred (including per-country timeouts that have
already aborted their controllers). The settled loop skips rejected
outcomes to avoid double-counting.
Also exports withPerCountryTimeout with an injectable timeoutMs so the
new runtime tests can exercise the abort path at 40ms. Runtime tests
verify: (a) timer fires → underlying signal aborted + work rejects with
the per-country message, (b) work-resolves-first returns the value,
(c) work-rejects-first surfaces the real error, (d) eager .catch flush
populates a shared errors array before allSettled resolves.
Tests: 45 pass (was 38, +7 — 4 runtime + 3 source-regex).
Full test:data: 5867 pass. Typecheck + lint clean.
* fix(portwatch): abort also cancels 429 proxy fallback (PR #3222 P1 follow-up)
Second review iteration on #3222: the per-country AbortController fix
from
6e639274f1 |
feat(scoring): set data-driven score gate thresholds (82/69/63) (#3223)
* feat(scoring): set data-driven score gate thresholds (82/69/63)

Calibrated from v5 shadow-log recalibration on 2026-04-20:
- critical sensitivity: 85 → 82 (fires on Hormuz closures, ship seizures, ceasefire collapses)
- high sensitivity: 65 → 69 (fires on mass shootings, blockade enforcement, major diplomatic)
- all/default MIN: 40 → 63 (drops tutorials, domestic crime, niche commodity)

Activation: set IMPORTANCE_SCORE_LIVE=1 + IMPORTANCE_SCORE_MIN=63 on Railway notification-relay env vars after this PR merges.

Scoring pipeline journey:
- PR #3069 — fixed stale relay score (Pearson 0.31 → 0.41)
- PR #3143 — closed /api/notify bypass
- PR #3144 — weight rebalance severity 55% (Pearson 0.41 → 0.67)
- PR #3218 — LLM prompt upgrade + cache v2
- PR #3221 — geopolitical scope for critical
- This PR — final threshold constants

* fix(scoring): use IMPORTANCE_SCORE_MIN for 'all' sensitivity threshold

Review found the hardcoded 63 for 'all' sensitivity diverged from the IMPORTANCE_SCORE_MIN env var used at the relay ingress gate. An operator setting IMPORTANCE_SCORE_MIN=40 would still have 'all' subscribers miss alerts scored 40-62. Now both gates use the same env var (default 63).
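A sketch of the per-sensitivity gate after the review fix (names are illustrative; the point is that 'all' reads the same env var as the ingress gate rather than a hardcoded 63):

```typescript
// Sketch of the sensitivity gate: critical and high use the calibrated
// constants; 'all' shares the IMPORTANCE_SCORE_MIN env var with the relay
// ingress gate (default 63), so the two gates cannot silently diverge.
type Sensitivity = "critical" | "high" | "all";

function scoreMin(env: Record<string, string | undefined>): number {
  const n = Number(env.IMPORTANCE_SCORE_MIN);
  return Number.isFinite(n) && n >= 0 ? n : 63;
}

function thresholdFor(
  sensitivity: Sensitivity,
  env: Record<string, string | undefined> = {},
): number {
  switch (sensitivity) {
    case "critical": return 82;
    case "high": return 69;
    case "all": return scoreMin(env);
  }
}
```

With this shape, an operator flipping IMPORTANCE_SCORE_MIN=40 lowers both the ingress gate and the 'all' subscriber gate together.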
14c1314629 |
fix(scoring): scope "critical" to geopolitical events, not domestic tragedies (#3221)
The weight rebalance (PR #3144) amplified a prompt gap: domestic mass shootings (e.g. "8 children killed in Louisiana") scored 88 because the LLM classified them as "critical" (mass-casualty 10+ killed) and the 55% severity weight pushed them into the critical gate. But WorldMonitor is a geopolitical monitor — domestic tragedies are terrible but not geopolitically destabilizing.

Prompt change (both ais-relay.cjs + classify-event.ts):
- "critical" now explicitly requires GEOPOLITICAL scope: "events that destabilize international order, threaten cross-border security, or disrupt global systems"
- Domestic mass-casualty events (mass shootings, industrial accidents) moved to "high" — still important, but not critical-sensitivity alerts
- Added counterexamples: "8 children killed in mass shooting in Louisiana → domestic mass-casualty → high" and "23 killed in fireworks factory explosion → industrial accident → high"
- Retained: "700 killed in Sudan drone strikes → geopolitical mass-casualty in active civil war → critical"

Classify cache: v2→v3 (bust stale entries that lack geopolitical scope).
Shadow-log: v4→v5 (clean dataset for recalibration under the scoped prompt).

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
e2255840f6 |
fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter (#3220)
* fix(sentry): allowlist third-party tile hosts for maplibre Failed-to-fetch filter

Follow-up to #3217. The blanket "any maplibre frame + (hostname)" rule would drop real failures on our self-hosted R2 PMTiles bucket or any first-party fetch that happens to run on a maplibre-framed stack.

Enumerated the actual third-party hosts our maplibre paths fetch from (tilecache.rainviewer.com, basemaps.cartocdn.com, tiles.openfreemap.org, protomaps.github.io) into a module-level Set and gated the filter on membership. First-party hosts keep surfacing.

Updated regression test to mirror real-world mixed stacks (maplibre + first-party fetch wrapper) so the allowlist is what decides, not the pre-existing "all frames are maplibre internals" filter which is orthogonal.

* fix(sentry): route maplibre AJAX errors past the generic vendor-only filter

Review feedback: the broader "all non-infra frames are maplibre internals" TypeError filter at main.ts:287 runs BEFORE the new host-allowlist block and short-circuits it for all-vendor stacks. Meaning a self-hosted R2 basemap fetch failure whose stack is purely maplibre frames would still be silently dropped, defeating the point of the allowlist.

Carve out the `Failed to fetch (<host>)` AJAX pattern: precompute `isMaplibreAjaxFailure` and skip the generic vendor filter when it matches, so the host-allowlist check is always the one that decides.

Added two regression tests covering the all-maplibre edge case both ways:
- allowlisted host + all-maplibre → still suppressed
- non-allowlisted host + all-maplibre → surfaces
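The decision can be sketched as a pure predicate (the host Set is copied from the commit message; message format and frame inspection are simplified stand-ins for the real beforeSend logic):

```typescript
// Sketch of the filter decision: only drop a maplibre-framed
// "Failed to fetch (<host>)" when the host is one of the known third-party
// tile hosts; first-party hosts (e.g. a self-hosted R2 bucket) keep surfacing.
const THIRD_PARTY_TILE_HOSTS = new Set([
  "tilecache.rainviewer.com",
  "basemaps.cartocdn.com",
  "tiles.openfreemap.org",
  "protomaps.github.io",
]);

function shouldDropTileFetchError(
  message: string,
  frameModules: string[],
): boolean {
  // Our own fetch code throws plain "Failed to fetch" with no hostname,
  // so a missing paren suffix never matches.
  const m = /^Failed to fetch \(([^)]+)\)$/.exec(message);
  if (!m) return false;
  const hasMaplibreFrame = frameModules.some((f) => f.includes("maplibre"));
  return hasMaplibreFrame && THIRD_PARTY_TILE_HOSTS.has(m[1]);
}
```

Membership in the Set, not the maplibre frame alone, is what decides: an all-maplibre stack fetching a non-allowlisted host still surfaces.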
1581b2dd70 |
fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)
* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

PR B of the scoring recalibration plan (docs/plans/2026-04-17-002). Builds on PR A (weight rebalance, PR #3144) which achieved Pearson 0.669. This PR targets the remaining noise in the 50-69 band where editorials, tutorials, and domestic crime score alongside real news.

LLM prompt upgrade (both writers):
- scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level guidelines, content-type distinction (editorial/opinion/tutorial → info, domestic crime → info, mass-casualty → critical), and concrete counterexamples.
- server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines added to align the second cache writer.

Classify cache bump:
- classify:sebuf:v1: → classify:sebuf:v2: in all three locations (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache, _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally (24h TTL). All items reclassified within 15 min of Railway deploy.

Keyword additions (_classifier.ts):
- HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions' (phrase patterns — no false positives on 'sanctioned 10 individuals')
- MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword misalignment during cold-cache window

Shadow-log v4:
- Clean dataset for post-prompt-change recalibration. v3 rolls off via 7-day TTL.

Deploy order: Railway first (seedClassify prewarms v2 cache immediately), then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

Review findings on PR #3218:

P1: classify-event.ts prompt was missing 2 counterexamples and the "Focus" line present in the relay prompt. Both writers share classify:sebuf:v2 cache, so differing prompts mean nondeterministic classification depending on which path writes first. Now both prompts have identical level guidelines and counterexamples (format differs: array vs single object, but classification logic is aligned).

P2: Removed 3 dead phrase-pattern keywords (sanctions imposed/package/new) — the existing 'sanctions' entry already substring-matches all of them and maps to the same (high, economic). Just dead code that misled readers into thinking they added coverage.

P2: Updated stale v3 references in cache-keys.ts (doc block + exported constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
84eec7f09f |
fix(health): align breadthHistory maxStaleMin with actual Tue-Sat cron schedule (#3219)
Production alarm: `breadthHistory` went STALE_SEED every Monday morning despite the seeder running correctly. Root cause was a threshold / schedule mismatch:
- Schedule (Railway): 02:00 UTC, Tuesday through Saturday. Five ticks per week, capturing Mon-Fri market close → following-day 02:00 UTC.
- Threshold: maxStaleMin=2880 (48h), assuming daily cadence.
- Max real gap: Sat 02:00 UTC → Tue 02:00 UTC = 72h.

The existing 48h alarm fired every Monday at ~02:00 UTC when the Sun/Mon cron ticks are intentionally absent, until the Tue 02:00 UTC run restored fetchedAt.

Fix: bump maxStaleMin to 5760 (96h). 72h covers the weekend gap; extra 24h tolerates one missed Tue run without alarming. Comment now records the actual schedule + reasoning.

No seeder change needed — logs confirm the service fires and completes correctly on its schedule (Apr 16/17/18 02:00 UTC runs all "Done" with 3/3 readings, `Stopping Container` is normal Railway cron teardown).

Diagnostic memo: this is the class of bug where the schedule comment lies. Original comment said "daily cron at 21:00 ET". True start time is 22:00 EDT / 21:00 EST Mon-Fri (02:00 UTC next day) AND only Mon-Fri, so "daily" is wrong by two days every week.
0bc5b49267 |
fix(sentry): filter MapLibre AJAXError tile-fetch transients (#3217)
WORLDMONITOR-NE/NF (8 events, 1 user): MapLibre wraps transient tile fetch failures as `TypeError: Failed to fetch (<hostname>)` and rethrows inside a Generator-backed Promise, leaking to onunhandledrejection even though DeckGLMap's map-error handler already logs them as warnings. Triggered mostly by adblockers/extensions and flaky mobile networks.

Add a beforeSend filter gated on (a) the maplibre-specific paren message format (`Failed to fetch (hostname)` — our own fetch code throws plain `Failed to fetch` without hostname) and (b) presence of a maplibre vendor frame in the stack, so a real first-party fetch regression with the same message shape still surfaces. Covered by 3 regression tests.
fc0c6bc163 |
fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected (#3216)
* fix(convex): use ConvexError for AUTH_REQUIRED so Sentry treats it as expected
WORLDMONITOR-N3: 8 events / 2 users from server-side Convex reporting
`Uncaught Error: Authentication required` thrown by requireUserId() when
a query fires before the WebSocket auth handshake completes. Every other
business error in this repo uses ConvexError("CODE"), which Convex's
server-side Sentry integration treats as expected rather than unhandled.
Migrate requireUserId to ConvexError("AUTH_REQUIRED") (no consumer parses
the message string — only a code comment references it) and add a matching
client-side ignoreErrors pattern next to the existing API_ACCESS_REQUIRED
precedent, as defense-in-depth against unhandled rejections reaching the
browser SDK.
* fix(sentry): drop broad AUTH_REQUIRED ignoreErrors — too many real call sites
Review feedback: requireUserId() backs user-initiated actions (checkout,
billing portal, API key ops), not just the benign query-race path. A bare
`ConvexError: AUTH_REQUIRED` message-regex in ignoreErrors has no stack
context, so a genuine auth regression breaking those flows for signed-in
users would be silently dropped. The server-side ConvexError migration in
convex/lib/auth.ts is enough to silence WORLDMONITOR-N3; anything that
still reaches the browser SDK should surface.
4c9888ac79 |
docs(mintlify): panel reference pages (PR 2) (#3213)
* docs(mintlify): add user-facing panel reference pages (PR 2)
Six new end-user pages under docs/panels/ for the shipped panels that
had no user-facing documentation in the published docs, per the plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md.
All claims are grounded in the live component source + SEED_META +
handler dirs — no invented fields, counts, or refresh windows.
- panels/latest-brief.mdx — daily AI brief panel (ready/composing/
locked states). Hard-gated PRO (`premium: 'locked'`).
- panels/forecast.mdx — AI Forecasts panel (internal id `forecast`,
label "AI Forecasts"). Domain + macro-region filter pills; 10%
probability floor. Free on web, locked on desktop.
- panels/consumer-prices.mdx — 5-tab retail-price surface (Overview
/ Categories / Movers / Spread / Health) with market, basket, and
7/30/90-day controls. Free.
- panels/disease-outbreaks.mdx — WHO / ProMED / national health
ministries outbreak alerts with alert/warning/watch pills. Free.
- panels/radiation-watch.mdx — EPA RadNet + Safecast observations
with anomaly scoring and source-confidence synthesis. Free.
- panels/thermal-escalation.mdx — FIRMS/VIIRS thermal clusters with
persistence and conflict-adjacency flags. Free.
Also:
- docs/docs.json — new Panels nav group (Latest Brief, AI Forecasts,
Consumer Prices, Disease Outbreaks, Radiation Watch, Thermal
Escalation).
- docs/features.mdx — cross-link every panel name in the Cmd+K
inventory to its new page (and link Country Instability + Country
Resilience from the same list).
- docs/methodology/country-resilience-index.mdx — short "In the
dashboard" bridge section naming the three CRI surfaces
(Resilience widget, Country Deep-Dive, map choropleth) so the
methodology page doubles as the user-facing panel reference for
CRI. No separate docs/panels/country-resilience.mdx — keeps the
methodology page as the single source of truth.
* docs(panels): fix Latest Brief polling description
Reviewer catch: the panel does schedule a 60-second re-poll while
in the composing state. `COMPOSING_POLL_MS = 60_000` at
src/components/LatestBriefPanel.ts:78, and `scheduleComposingPoll()`
is called from `renderComposing()` at :366. The poll auto-promotes
the panel to ready without a manual refresh and is cleared when
the panel leaves composing. My earlier 'no polling timer' line was
right for the ready state but wrong as a blanket claim.
* docs(panels): fix variant-availability claims across all 6 panel pages
Reviewer catch on consumer-prices surfaced the same class of error
on 4 other panel pages: I described variant availability with loose
phrasing ('most variants', 'where X context is relevant', 'tech/
finance/happy opt-in') that didn't match the actual per-variant
panel registries in src/config/panels.ts.
Verified matrix against each *_PANELS block directly:
Panel | FULL | TECH | FINANCE | HAPPY | COMMODITY
consumer-prices | opt | - | def | - | def
latest-brief | def | def | def | - | def (all PRO-locked)
disease-outbreaks | def | - | - | - | -
radiation-watch | def | - | - | - | -
thermal-escalation | def | - | - | - | -
forecast | def | - | - | - | - (PRO-locked on desktop)
All 6 pages now name the exact variant blocks in src/config/panels.ts
that register them, so the claim is re-verifiable by grep rather than
drifting with future panel-registry changes.
* docs(panels): fix 5 reviewer findings — no invented controls/sources/keys
All fixes cross-checked against source.
- consumer-prices: no basket selector UI exists. The panel has a
market bar, a range bar, and tab/category affordances; basket is
derived from market selection (essentials-<code>, or DEFAULT_BASKET
for the 'all' aggregate view). Per
src/components/ConsumerPricesPanel.ts:120-123 and :216-229.
- disease-outbreaks: 'Row click opens advisory' was wrong. The only
interactive elements in-row are the source-name <a> link
(sanitised URL, target=_blank); clicking the row itself is a no-op
(the only content-level listener is for [data-filter] pills and
the search input). Per DiseaseOutbreaksPanel.ts:35-49,115-117.
- disease-outbreaks: upstream list was wrong. Actual seeder uses
WHO DON (JSON API), CDC HAN (RSS), Outbreak News Today
(aggregator), and ThinkGlobalHealth disease tracker
(ProMED-sourced, 90d lookback). Noted the in-panel tooltip's
shorter 'WHO, ProMED, health ministries' summary and gave the full
upstream list with the 72h Redis TTL. Per seed-disease-outbreaks
.mjs:31-38.
- radiation-watch: summary bar renders 6 cards, not 7 — Anomalies,
Elevated, Confirmed, Low Confidence, Conflicts, Spikes. The
CPM-derived indicator is a per-row badge (radiation-flag-converted
at :67), not a summary card. Moved the CPM reference to the
per-row badges list. Per RadiationWatchPanel.ts:85-112.
- latest-brief: Redis key shape corrected. The composer writes the
envelope to brief:{userId}:{issueSlot} (where issueSlot comes from
issueSlotInTz, not a plain date) and atomically writes a latest
pointer at brief:latest:{userId} → {issueSlot}. Readers resolve
via the pointer. 7-day TTL on both. Per
seed-digest-notifications.mjs:1103-1115 and
api/latest-brief.ts:80-89.
* docs(panels): Tier 1 — PRO/LLM panel reference pages (9)
Adds user-facing panel pages for the 9 PRO/LLM-backed surfaces flagged
in the extended audit. All claims grounded in component source +
src/config/panels.ts entries (with line cites).
- panels/chat-analyst.mdx — WM Analyst (conversational AI, 5 quick
actions, 4 domain scopes, POSTs /api/chat-analyst via premiumFetch).
- panels/market-implications.mdx — AI Market Implications trade signals
(LONG/SHORT/HEDGE × HIGH/MEDIUM/LOW, transmission paths, 120min
maxStaleMin, degrade-to-warn). Carries the repo's disclaimer verbatim.
- panels/deduction.mdx — Deduct Situation (opt-in PRO; 5s cooldown;
composes buildNewsContext + active framework).
- panels/daily-market-brief.mdx — Daily Market Brief (stanced items,
framework selector, live vs cached source badge).
- panels/regional-intelligence.mdx — Regional Intelligence Board
(7 BOARD_REGIONS, 6 structured blocks + narrative sections,
request-sequence arbitrator, opt-in PRO).
- panels/strategic-posture.mdx — AI Strategic Posture (cached posture
+ live military vessels → recalcPostureWithVessels; free on web,
enhanced on desktop).
- panels/stock-analysis.mdx — Premium Stock Analysis (per-ticker
deep dive: signal, targets, consensus, upgrades, insiders, sparkline).
- panels/stock-backtest.mdx — Premium Backtesting (longitudinal view;
live vs cached data badge).
- panels/wsb-ticker-scanner.mdx — WSB Ticker Scanner (retail sentiment
+ velocity score with 4-tier color bucketing).
All 9 are PRO (8 via apiKeyPanels allowlist at src/config/panels.ts:973,
strategic-posture is free-on-web/enhanced-on-desktop). Variant matrices
name the exact *_PANELS block registering each panel.
* docs(panels): Tier 2 — flagship free data panels (7)
Adds reference pages for 7 flagship free panels. Every claim grounded
in the panel component + src/config/panels.ts per-variant registration.
- panels/airline-intel.mdx — 6-tab aviation surface (ops/flights/
airlines/tracking/news/prices), 8 aviation RPCs, user watchlist.
- panels/tech-readiness.mdx — ranked country tech-readiness index with
6-hour in-panel refresh interval.
- panels/trade-policy.mdx — 6-tab trade-policy surface (restrictions/
tariffs/flows/barriers/revenue/comtrade).
- panels/supply-chain.mdx — composite stress + carriers + minerals +
Scenario Engine trigger surface (free panel, PRO scenario activation).
- panels/sanctions-pressure.mdx — OFAC SDN + Consolidated list
pressure rollup with new/vessels/aircraft summary cards and top-8
country rows.
- panels/hormuz-tracker.mdx — Hormuz chokepoint drill-down; status
indicator + per-series bar charts; references Scenario Engine's
hormuz-tanker-blockade template.
- panels/energy-crisis.mdx — IEA 2026 Energy Crisis Policy Response
Tracker; category/sector/status filters.
All 7 are free. Variant matrices name exact *_PANELS blocks
registering each panel.
* docs(panels): Tier 3 — compact panels (5)
Adds reference pages for 5 compact user-facing panels.
- panels/world-clock.mdx — 22 global market-centre clocks with
exchange labels + open/closed indicators (client-side only).
- panels/monitors.mdx — personal keyword alerts, localStorage-persisted;
links to Features → Custom Monitors for longer explanation.
- panels/oref-sirens.mdx — OREF civil-defence siren feed; active +
24h wave history; free on web, PRO-locked on desktop (_desktop &&
premium: 'locked' pattern).
- panels/telegram-intel.mdx — topic-tabbed Telegram channel mirror
via relay; free on web, PRO-locked on desktop.
- panels/fsi.mdx — US KCFSI + EU FSI stress composites with
four-level colour buckets (Low/Moderate/Elevated/High).
All 5 grounded in component source + variant registrations.
oref-sirens and telegram-intel correctly describe the _desktop &&
premium: 'locked' pattern rather than the misleading 'PRO' shorthand
used earlier for other desktop-locked panels.
* docs(panels): Tier 4 + 5 catalogue pages, nav re-grouping, features cross-links
Closes out the comprehensive panel-reference expansion. Two catalogue
pages cover the remaining ~60 panels collectively so they're all
searchable and findable without dedicated pages per feed/tile.
- panels/news-feeds.mdx — catalogue covering all content-stream panels:
regional news (africa/asia/europe/latam/us/middleeast/politics),
topical news (climate/crypto/economic/markets/mining/commodity/
commodities), tech/startup streams (startups/unicorns/accelerators/
fintech/ipo/layoffs/producthunt/regionalStartups/thinktanks/vcblogs/
defense-patents/ai-regulation/tech-hubs/ai/cloud/hardware/dev/
security/github), finance streams (bonds/centralbanks/derivatives/
forex/institutional/policy/fin-regulation/commodity-regulation/
analysis), happy variant streams (species/breakthroughs/progress/
spotlight/giving/digest/events/funding/counters/gov/renewable).
- panels/indicators-and-signals.mdx — catalogue covering compact
market-indicator tiles, correlation panels, and misc signal surfaces.
Grouped by function: sentiment, macro, calendars, market-structure,
commodity, crypto, regional economy, correlation panels, misc signals.
docs/docs.json — split the Panels group into three for navigability:
- Panels — AI & PRO (11 pages)
- Panels — Data & Tracking (16 pages)
- Panels — Catalogues (2 pages)
docs/features.mdx — Cmd+K inventory rewritten as per-family sub-lists
with links to every panel page (or catalogue page for the ones
that live in a catalogue). Replaces the prior run-on paragraph.
Every catalogue panel is also registered in at least one *_PANELS
block in src/config/panels.ts — the catalogue pages note this and
point readers to the config file for variant-availability details.
* docs(panels): fix airline-intel + world-clock source-of-truth errors
- airline-intel: refresh behavior section was wrong on two counts.
(1) The panel DOES have a polling timer: a 5-minute setInterval
in the constructor calling refresh() (which reloads ops + active
tab). (2) The 'prices' tab does NOT re-fetch on tab switch —
it's explicitly excluded from both tab-switch and auto-refresh
paths, loading only on explicit search-button click. Three
distinct refresh paths now documented with source line hints.
Per src/components/AirlineIntelPanel.ts ~:173 (setInterval),
:287 (prices tab-switch guard), :291 (refresh() prices skip).
- world-clock: the WORLD_CITIES list has 30 entries, not '~22'.
Replaced the approximate count with the exact number and a
:14-43 line-range cite so it's re-verifiable.
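The three airline-intel refresh paths described above (polling timer, tab-switch reload, explicit search) can be sketched as follows. This is a minimal illustration with assumed names, not the repo's AirlineIntelPanel code; the real panel starts a 5-minute setInterval in its constructor, which is only described in a comment here so the sketch runs to completion as a script.

```typescript
type Tab = "ops" | "flights" | "airlines" | "tracking" | "news" | "prices";

// Hypothetical sketch of the refresh-path split: 'prices' is excluded from
// both the polling timer and tab-switch reloads, loading only on explicit
// search. Names and structure are illustrative assumptions.
class PanelRefreshSketch {
  activeTab: Tab = "ops";
  loads: Tab[] = []; // records which tabs were (re)fetched

  constructor() {
    // In the real panel a 5-minute setInterval calls refresh() here;
    // omitted in this sketch so the script exits cleanly.
  }

  // Path 1: polling timer target — reloads the active tab, except 'prices'.
  refresh(): void {
    if (this.activeTab !== "prices") this.loadTab(this.activeTab);
  }

  // Path 2: tab switch — 'prices' is guarded out and does not re-fetch.
  switchTab(tab: Tab): void {
    this.activeTab = tab;
    if (tab !== "prices") this.loadTab(tab);
  }

  // Path 3: explicit search-button click is the only way prices loads.
  searchPrices(): void {
    this.loadTab("prices");
  }

  private loadTab(tab: Tab): void {
    this.loads.push(tab);
  }
}
```

The point of the split is that the prices tab is backed by an expensive search, so neither the timer nor casual tab switches should trigger it.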
d1a4cf7780
docs(mintlify): add Route Explorer + Scenario Engine workflow pages (#3211)
* docs(mintlify): add Route Explorer + Scenario Engine workflow pages
Checkpoint for review on the IA refresh (per plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).
- docs/docs.json: link Country Resilience Index methodology under
Intelligence & Analysis so the flagship 222-country feature is
reachable from the main nav (previously orphaned). Add a new
Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for,
Cmd+K entry, four tabs (Current / Alternatives / Land / Impact),
inputs, keyboard bindings, map-state integration, PRO gating with
free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template
categories (conflict / weather / sanctions / tariff_shock /
infrastructure / pandemic), how a scenario activates on the map,
PRO gating, pointers to the async job API.
Deferred to follow-up commits in the same PR:
- documentation.mdx landing rewrite
- features.mdx refresh
- maritime-intelligence.mdx link-out to Route Explorer
- Panels nav group (waits for PR 2 content)
All content grounded in live source files cited inline.
* docs(mintlify): fix Route Explorer + Scenario Engine review findings
Reviewer caught 4 cases where I described behavior I hadn't read
carefully. All fixes cross-checked against source.
- route-explorer (free-tier): the workflow does NOT blur a numeric
payload behind a public demo route. On free tier, fetchLane()
short-circuits to renderFreeGate() which blurs the left rail,
replaces the tab area with an Upgrade-to-PRO card, and applies a
generic public-route highlight on the map. No lane data is rendered
in any tab. See src/components/RouteExplorer/RouteExplorer.ts:212
and :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the
panel and the map. Direct field jumps are F (From), T (To),
P (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp
binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL,
Esc close, ? help, 1-4 tabs). See
src/components/RouteExplorer/KeyboardHelp.ts:9 and
RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates
of 4 types today (conflict, weather, sanctions, tariff_shock). The
ScenarioType union includes infrastructure and pandemic but no
templates of those types ship. Dropped them from the shipped table
and noted the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes status: 'done'
(not 'completed') on success, 'failed' on error; pending is
synthesised by the status endpoint when no worker record exists.
Fixed both the new workflow page and the merged api-scenarios.mdx
completed-response example + polling language. See
scripts/scenario-worker.mjs:421 and
src/components/SupplyChainPanel.ts:870.
* docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)
- api-scenarios (template example): replaced invented
hormuz-closure-30d / ["hormuz"] with the actually-shipped
hormuz-tanker-blockade / ["hormuz_strait"] from
scenario-templates.ts:80. Listed the other 5 shipped template IDs
so scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): worker writes FOUR states, not
three. Added the intermediate "processing" state with startedAt,
written by the worker at job pickup (scenario-worker.mjs:411).
Lifecycle now: pending → processing → done|failed. Both pending and
processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state language
and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate" with the
actual × dismiss control on the scenario banner (aria-label:
"Dismiss scenario") per src/components/SupplyChainPanel.ts:790.
Also described the banner contents (name, chokepoints, countries,
tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected
"hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb"
across all four occurrences in the shipping v2 page (from PR #3209).
Real IDs come from server/_shared/chokepoint-registry.ts
(snake_case, not kebab-case, not bare "hormuz").
* docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)
- scenario-engine: accurate description of the rendered scenario
banner. Always-present elements are the ⚠ icon, scenario name,
top-5 impacted countries with impact %, and dismiss ×. Params chip
(e.g. '14d · +110% cost') and 'Simulating …' tagline are conditional
on the worker result carrying template parameters (durationDays,
disruptionPct, costShockMultiplier). The banner never lists affected
chokepoints by name — the map and the chokepoint cards surface
those. Per renderScenarioBanner at
src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record and
the owner-index set's 30-day TTL via atomic pipeline (SET + SADD +
EXPIRE). rotate-secret and reactivate only extend the record's TTL —
neither touches the owner-index set, so the owner index can expire
independently if a caller only rotates/reactivates within a 30-day
window. Re-register to keep both alive. Per
api/v2/shipping/webhooks.ts:230 (register pipeline) and :325
(rotate setCachedJson on record only).
* docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)
- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin) +
PRO' was wrong — isCallerPremium() explicitly skips trusted-origin
short-circuits (keyCheck.required === false) and only counts (a) an
env-valid or user-owned wm_-prefixed API key with apiAccess
entitlement, or (b) a Clerk bearer with role=pro or Dodo tier ≥ 1.
Browser calls work because premiumFetch() injects one of those
credentials per request, not because Origin alone authenticates.
Per server/_shared/premium-check.ts:34 and
src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section to
state outright that authentication and PRO entitlement are
orthogonal, and that trusted Origin is NOT accepted as PRO even
though it is accepted for public endpoints. Listed the two real
credential forms that pass the gate.
* docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)
Greptile review P2: prose cited MapContainer.ts:1004 but
activateScenario is declared at :1010. Line 1004 landed inside the
JSDoc block.
* docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out
Completes the PR 1 items from
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md
that were deferred after the checkpoint on Route Explorer + Scenario
Engine + CRI nav. No new pages — only edits to existing pages to
point at and cohere with the new workflow pages.
- documentation.mdx: landing rewrite. Dropped brittle counts (344
news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed
services) in favor of durable product framing. Surfaced the shipped
differentiators that were invisible on the landing previously:
Country Resilience Index (222 countries, linked to its methodology
page), AI daily brief, Route Explorer, Scenario Engine, MCP server.
Kept CII and CRI as two distinct country-risk surfaces — do not
conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the
stale inventory list with family-grouped descriptions that include
the panels this audit surfaced as missing (disease-outbreaks,
radiation-watch, thermal-escalation, consumer-prices, latest-brief,
forecast, country-resilience). Added a Workflows section linking to
Route Explorer and Scenario Engine, and a Country-level risk section
linking CII + CRI. Untouched sections (map, marker clustering, data
layers, export, monitors, activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer
subsection to a one-paragraph pointer at /route-explorer so the
standalone page is the canonical home. Panels nav group remains
intentionally unadded; it waits on PR 2 content to avoid rendering
an empty group in Mintlify.
1f66b0c486
fix(billing): wrap non-Error throws before Sentry.captureException (#3212)
* fix(billing): wrap non-Error throws before Sentry.captureException
Convex/Clerk bootstrap occasionally rejects with undefined, which
Sentry.captureException then serializes as a synthetic `Error: undefined`
with zero stack frames — impossible to debug. Normalize err to a real Error
carrying the non-Error value in the message so the next occurrence yields a
usable event.
Resolves WORLDMONITOR-ND.
* fix(billing): apply non-Error normalization to openBillingPortal too
Review feedback: initSubscriptionWatch was fixed but openBillingPortal
shares the same Convex/Clerk bootstrap helpers and the same raw
Sentry.captureException(err) pattern — the synthetic `Error: undefined`
signature can still surface from that path. Extract a module-level
normalizeCaughtError() helper and apply it at both catch sites.
* fix(billing): attach original err as cause on normalized Error
Greptile P2: preserve the raw thrown value as structured `cause` data so
Sentry can display it alongside the stringified message. Assigned
post-construction because tsconfig target=ES2020 lacks the ErrorOptions
typing for `new Error(msg, { cause })`; modern browsers and Sentry read
the property either way.
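The helper's contract described across these three commits (wrap non-Error throws, attach the raw value as `cause` post-construction) can be sketched as below. This is an assumed shape inferred from the commit messages, not the repo's actual billing-module code.

```typescript
// Sketch of the normalizeCaughtError() contract described above: non-Error
// throws become a real Error whose message carries the stringified value and
// whose `cause` holds the original, so Sentry gets stack frames plus the raw
// thrown value as structured data. (Assumed implementation, not repo code.)
function normalizeCaughtError(err: unknown): Error {
  if (err instanceof Error) return err;
  const wrapped = new Error(`Non-Error thrown: ${String(err)}`);
  // Assigned post-construction: with target=ES2020 the ErrorOptions overload
  // `new Error(msg, { cause })` does not typecheck, but the property itself
  // is read fine by modern browsers and Sentry.
  (wrapped as Error & { cause?: unknown }).cause = err;
  return wrapped;
}
```

Calling `Sentry.captureException(normalizeCaughtError(err))` at both catch sites then yields an event with a real stack frame at the wrap site instead of a frameless `Error: undefined`.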
4853645d53
fix(brief): switch carousel to @vercel/og on edge runtime (#3210)
* fix(brief): switch carousel to @vercel/og on edge runtime
Every attempt to ship the Phase 8 Telegram carousel on Vercel's Node
serverless runtime has failed at cold start:
- PR #3174 direct satori + @resvg/resvg-wasm: Vercel edge bundler
refused the `?url` asset import required by resvg-wasm.
- PR #3174 (fix) direct satori + @resvg/resvg-js native binding: Node
runtime accepted it, but Vercel's nft tracer does not follow
@resvg/resvg-js/js-binding.js's conditional
`require('@resvg/resvg-js-<platform>-<arch>-<libc>')` pattern, so
the linux-x64-gnu peer package was never bundled. Cold start threw
MODULE_NOT_FOUND, isolate crashed, FUNCTION_INVOCATION_FAILED on
every request including OPTIONS, and Telegram reported
WEBPAGE_CURL_FAILED with no other signal.
- PR #3204 added `vercel.json` `functions.includeFiles` to force the
binding in, but (a) the initial key was a literal path that Vercel
micromatch read as a character class (PR #3206 fixed), (b) even with
the corrected `api/brief/carousel/**` wildcard, the function still
500'd across the board. The `functions.includeFiles` path appears
honored in the deployment manifest but not at runtime for this
particular native-binding pattern.
Fix: swap the renderer to @vercel/og's ImageResponse, which is
Vercel's first-party wrapper around satori + resvg-wasm with
Vercel-native bundling. Runs on Edge runtime — matches every other
API route in the project. No native binding, no includeFiles, no nft
tracing surprises. Cold start ~300ms, warm ~30ms.
Changes:
- server/_shared/brief-carousel-render.ts: replace renderCarouselPng
(Uint8Array) with renderCarouselImageResponse (ImageResponse). Drop
ensureLibs + satori + @resvg/resvg-js dynamic-import dance. Keep
layout builders (buildCover/buildThreads/buildStory) and font
loading unchanged — the Satori object trees are wire-compatible with
ImageResponse.
- api/brief/carousel/[userId]/[issueDate]/[page].ts: flip
`runtime: 'nodejs'` -> `runtime: 'edge'`. Delegate rendering to the
renderer's ImageResponse and return it directly; error path still
503 no-store so CDN + Telegram don't pin a bad render.
- vercel.json: drop the now-useless `functions.includeFiles` block.
- package.json: drop direct `@resvg/resvg-js` and `satori` deps
(both now bundled inside @vercel/og).
- tests/deploy-config.test.mjs: replace the native-binding regression
guards with an assertion that no `functions` block exists (with a
comment pointing at the skill documenting the micromatch gotcha for
future routes).
- tests/brief-carousel.test.mjs: updated comment references.
Verified:
- typecheck + typecheck:api clean
- test:data 5814/5814 pass
- node -e test: @vercel/og imports cleanly in Node (tests that reach
through the renderer file no longer depend on native bindings)
Post-deploy validation:
curl -I -H "User-Agent: TelegramBot (like TwitterBot)" \
  "https://www.worldmonitor.app/api/brief/carousel/<uid>/<slot>/0"
# Expect: HTTP/2 403 (no token) or 200 (valid token)
# NOT: HTTP/2 500 FUNCTION_INVOCATION_FAILED
Then tail Railway digest logs on the next tick; the `[digest]
Telegram carousel 400 ... WEBPAGE_CURL_FAILED` line should stop
appearing, and the 3-image preview should actually land on Telegram.
* Add renderer smoke test + fix Cache-Control duplication
Reviewer flagged residual risk: no dedicated carousel-route smoke
test for the @vercel/og path. Adds one, and catches a real bug in
the process.
Findings during test-writing:
1. @vercel/og's ImageResponse runs CLEANLY in Node via tsx — the
comment in brief-carousel.test.mjs saying "we can't test the render
in Node" was true for direct satori + @resvg/resvg-wasm but no
longer holds after PR #3210. Pure Node render works end-to-end:
satori tree-parse, jsdelivr font fetch, resvg-wasm init, PNG output.
~850ms first call, ~20ms warm.
2. ImageResponse sets its own default `Cache-Control: public,
immutable, no-transform, max-age=31536000`. Passing Cache-Control
via the constructor's headers option APPENDS rather than overrides,
producing a duplicated comma-joined value like `public, immutable,
no-transform, max-age=31536000, public, max-age=60` on the Response.
The route handler was doing exactly this via extraHeaders. Fix: drop
our Cache-Control override and rely on @vercel/og's 1-year immutable
default — envelope is only immutable for its 7d Redis TTL so the
effective ceiling is 7d anyway (after that the route 404s before
render).
Changes:
- tests/brief-carousel.test.mjs: 6 new assertions under
`renderCarouselImageResponse`:
  * renders cover / threads / story pages, each returning a valid
PNG (magic bytes + size range)
  * rejects a structurally empty envelope
  * threads non-cache extraHeaders onto the Response
  * pins @vercel/og's Cache-Control default so it survives
caller-supplied Cache-Control overrides (regression guard for the
bug fixed in this commit)
- api/brief/carousel/[userId]/[issueDate]/[page].ts: remove the
stacked Cache-Control; lean on @vercel/og default. Drop the
now-unused `PAGE_CACHE_TTL` constant. Comment explains why.
Verified:
- test:data 5820/5820 pass (was 5814, +6 smoke)
- typecheck + typecheck:api clean
- Render smoke: cover 825ms / threads 23ms / story 16ms first run
(wasm init dominates first render)
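The append-rather-than-override behavior behind the duplicated header is standard `Headers` semantics from the Fetch spec (duplicate values are comma-joined on read), reproducible in plain Node without @vercel/og:

```typescript
// Demonstrates the Cache-Control duplication described above: appending a
// second Cache-Control to a Headers object comma-joins the values instead
// of replacing the first one.
const headers = new Headers({
  "Cache-Control": "public, immutable, no-transform, max-age=31536000",
});
headers.append("Cache-Control", "public, max-age=60");

// get() returns the comma-joined duplicate — the value the route handler
// was accidentally shipping. headers.set() would have replaced it instead.
const combined = headers.get("Cache-Control");
```

This is why passing a Cache-Control override through a constructor that internally appends caller headers stacks the directives rather than replacing the default.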
e4c95ad9be
docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage
Audit against api/ + proto/ revealed 9 OpenAPI specs missing from
nav, the scenario/v1 service undocumented, and MCP (32 tools + OAuth
2.1 flow) with no user-facing docs. The stale
Docs_To_Review/API_REFERENCE.md still pointed at pre-migration
endpoints that no longer exist.
- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast,
Health, Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's
proto-backed (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web,
Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications,
Shipping v2, Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md
(referenced dead endpoints); add README flagging dir as archival
* docs(mintlify): move scenario docs out of generated docs/api/ tree
The pre-push hook enforces that docs/api/ is proto-generated only.
Replace the hand-written ScenarioService.openapi.yaml with a plain
MDX page (docs/api-scenarios.mdx) until the proto migration lands
(tracked in issue #3207).
* docs(mintlify): fix factual errors flagged in PR review
Reviewer caught 5 endpoints where I speculated on shape/method/limits
instead of reading the code. All fixes cross-checked against the
source:
- api-shipping-v2: route-intelligence is GET with query params
(fromIso2, toIso2, cargoType, hs2), not POST with a JSON body.
Response shape is {primaryRouteId, chokepointExposures[],
bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt,
cachedUntil, priceSource} with tier groups
free|pro|api_starter|enterprise, not the invented {currency, plans}.
Document the DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk JWT
+ PRO (returning {oauthUrl}), not GET redirects. Callbacks remain
GET.
- api-platform: /api/version returns the latest GitHub Release
({version, tag, url, prerelease}), not deployed commit/build
metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match
code), not 10/hour.
Also caught while double-checking: /api/register-interest and
/api/contact are 5/60min and 3/60min respectively (1-hour window,
not 1-minute). Both require Turnstile. Removed the fabricated limits
for share-url, notification-channels, create-checkout (they fall
back to the default per-IP limit).
* docs(mintlify): second-round fixes — verify every claim against source
Reviewer caught 7 more cases where I described API behavior I hadn't
read. Each fix below cross-checked against the handler.
- api-commerce (product-catalog): tiers are flat objects with
monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid
tiers, price+period for free, price:null for enterprise. There is no
nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts.
Code is a deterministic 8-char HMAC of the Clerk userId; binding
into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): actual action set is
create-pairing-token, set-channel, set-web-push, delete-channel,
set-alert-rules, set-quiet-hours, set-digest-settings. Replaced the
made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100
(default 50), not a severity string. Subscriber IDs are wh_+24hex;
secret is raw 64-char hex (no whsec_ prefix). POST registration
returns 201. Added the management routes: GET /{id},
POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer
RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[]
and/or patterns[] (not {key} or {tag}), with explicit per-request
caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not
file=<id>. Response is a 302 to a GitHub release asset; documented
the full platform/variant tables.
- mcp: server also accepts direct X-WorldMonitor-Key in addition to
OAuth bearer. Fixed the curl example which was incorrectly sending a
wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or
videoId, not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect
that OAuth is one of two accepted modes.
* docs(mintlify): fix Greptile review findings
- mcp.mdx: 'Five' slow tools → 'Six' (list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator
(8_400_000_000) with plain integer (8400000000)
Greptile's third finding — /api/oauth/register rate-limit
contradiction across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx
— was already resolved in commit
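The referral-code derivation described above (a deterministic 8-char HMAC of the Clerk userId) can be sketched with Node's crypto module. The hash algorithm, encoding, truncation, and secret handling below are assumptions for illustration — only "deterministic 8-char HMAC of the userId" comes from the commit message.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical sketch: derive a stable 8-character referral code from a
// user ID. Same (userId, secret) always yields the same code, so no
// storage lookup is needed to regenerate it. Algorithm/encoding assumed.
function referralCode(userId: string, secret: string): string {
  return createHmac("sha256", secret).update(userId).digest("hex").slice(0, 8);
}
```

Determinism is what makes the Convex binding safe to run fire-and-forget: re-deriving the code on a later request gives the same value even if the earlier write was lost.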
38e6892995
fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs
Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.
Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.
No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
* fix(brief): carve middleware bot allowlist to accept slot-format carousel path
BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.
CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
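The slot contract the verifier and allowlist now enforce — accept the per-run YYYY-MM-DD-HHMM shape, reject the legacy date-only YYYY-MM-DD shape — can be pinned with a small regex. The pattern below is an illustration of that contract, not the repo's actual BRIEF_CAROUSEL_PATH_RE or verifier code.

```typescript
// Illustration (assumed, not repo code) of the slot shape described above:
// YYYY-MM-DD-HHMM with a 24h local-time suffix. The legacy date-only shape
// must fail so pre-rollout links cannot leak through.
const ISSUE_SLOT_RE = /^\d{4}-\d{2}-\d{2}-([01]\d|2[0-3])[0-5]\d$/;

function isValidIssueSlot(slot: string): boolean {
  return ISSUE_SLOT_RE.test(slot);
}
```

Note the HHMM alternation also rejects impossible times like 2500, not just the missing suffix.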
* fix(brief): preserve caller-requested slot + correct no-brief share-url error
Two contract bugs in the slot rollout that silently misled callers:
1. GET /api/latest-brief?slot=X where X has no envelope was returning
{ status: 'composing', issueDate: <today UTC> } — which reads as
"today's brief is composing" instead of "the specific slot you
asked about doesn't exist". A caller probing a known historical
slot would get a completely unrelated "today" signal. Now we echo
the requested slot back (issueSlot + issueDate derived from its
date portion) when the caller supplied ?slot=, and keep the
UTC-today placeholder only for the no-param path.
2. POST /api/brief/share-url with no slot and no latest-pointer was
falling into the generic invalid_slot_shape 400 branch. That is
not an input-shape problem; it is "no brief exists yet for this
user". Return 404 brief_not_found — the same code the
existing-envelope check returns — so callers get one coherent
contract: either the brief exists and is shareable, or it doesn't
and you get 404.
56054bfbc1
fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key
PR #3204 shipped the right `includeFiles` value but the WRONG key:
"api/brief/carousel/[userId]/[issueDate]/[page].ts"
Vercel's `functions` config keys are micromatch globs, not literal
paths. Bracketed segments like `[userId]` are parsed as character
classes (match any ONE character from {u,s,e,r,I,d}), so my rule
matched zero files and `includeFiles` was silently ignored.
Post-merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED
on every request. Build log shows zero mentions of `carousel` or
`resvg` — corroborates the key never applied.
Fix: wildcard path segments.
"api/brief/carousel/**"
Matches any file under the carousel route dir. Since the only
deployed file there is the dynamic-segment handler, the effective
scope is identical to what I originally intended.
Added a second regression test that sweeps every functions key and
fails loudly if any bracketed segment slips back in. Guards against
future reverts AND against anyone copy-pasting the literal route
path without realising Vercel reads it as a glob.
23/23 deploy-config tests pass (was 22, +1 new guard).
* Address Greptile P2: widen bracket-literal guard regex
Greptile spotted that `/\[[A-Za-z]+\]/` only matches
purely-alphabetic segment names. Real-world Next.js routes often use
`[user_id]`, `[issue_date]`, `[page1]`, `[slug2024]` — none flagged
by the old regex, so the guard would silently pass on the exact kind
of regression it was written to catch.
Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
- requires a leading letter (so legit char classes like `[0-9]` and
`[!abc]` don't false-positive)
- allows letters, digits, underscores after the first char
- covers every Next.js-style dynamic-segment name convention
Also added a self-test that pins positive cases (userId, user_id,
issue_date, page1, slug2024) and negative cases (the actual `**`
glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex breaks
CI immediately instead of silently re-opening PR #3206.
24/24 deploy-config tests pass (was 23, +1 new self-test).
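The widened guard regex quoted in the commit can be exercised directly; the wrapper function name below is illustrative, but the pattern itself is the one the commit ships.

```typescript
// The widened bracket-literal guard from the commit: flags Next.js-style
// dynamic segments ([userId], [user_id], [page1]) in a vercel.json functions
// key, while letting genuine micromatch character classes ([0-9], [!abc])
// and plain globs pass. (Wrapper name is an illustrative assumption.)
const BRACKET_LITERAL_RE = /\[[A-Za-z][A-Za-z0-9_]*\]/;

function hasBracketLiteral(functionsKey: string): boolean {
  return BRACKET_LITERAL_RE.test(functionsKey);
}
```

Any key that trips the guard would be read by Vercel's micromatch as a one-character class instead of a literal path segment, which is exactly the silent mismatch PR #3204 shipped.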
305dc5ef36
feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)
Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.
New modules (scripts/lib/):
- brief-dedup-consts.mjs tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs cities/regions gazetteer + common-caps
- brief-embedding.mjs OpenRouter /embeddings client with Upstash
cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs complete-link clustering + entity veto (pure)
- brief-dedup.mjs orchestrator, env read at call entry,
shadow archive, structured log line
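The cosineSimilarity helper listed above can be sketched as follows; this is a minimal illustration, and the actual brief-embedding.mjs implementation may handle validation and edge cases differently:

```javascript
// Minimal cosine similarity over embedding vectors, as used for the
// threshold-based clustering decision. Returns a value in [-1, 1];
// zero-magnitude vectors are treated as dissimilar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```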
Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs offline calibration runner + histogram
- golden-pair-validator.mjs live-embedder drift detector (nightly CI)
- shadow-sample.mjs Sample A/B CSV emitter over SCAN archive
Tests:
- brief-dedup-jaccard.test.mjs migrated from regex-harness to direct
import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation
property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs 20-pair mocked canary (21)
Workflows:
- .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary
(07:17 UTC), opens issue on drift
Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on new files clean
- lint:md 0 errors
Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.
* refactor(digest-dedup): address review findings 193-199
Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.
P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
Jaccard module so the outlet allow-list is single-sourced.
P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
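The fail-fast behaviour can be sketched like this; the class and function names follow the commit message, but the real signatures in brief-embedding.mjs are assumptions:

```javascript
// Sketch of the P1 #194 fix: when the remaining wall-clock budget is
// already spent, fail fast with a typed error instead of opening a
// fetch that is guaranteed to be aborted.
class EmbeddingTimeoutError extends Error {
  constructor(msg) {
    super(msg);
    this.name = 'EmbeddingTimeoutError';
  }
}

async function callEmbeddingsApi(inputs, timeoutMs, fetchImpl) {
  if (timeoutMs <= 0) {
    // Pre-fix behaviour: a Math.max(250, timeoutMs) floor forced a
    // doomed 250ms request that let the overall wall-clock cap overshoot.
    throw new EmbeddingTimeoutError(`no budget left (${timeoutMs}ms)`);
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl('/embeddings', {
      signal: controller.signal,
      body: JSON.stringify(inputs),
    });
  } finally {
    clearTimeout(timer);
  }
}
```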
P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
(zero callers; orchestrator reimplements inline).
P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
silently falling to jaccard).
P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
heredoc that would command-substitute validator stdout. Switched
to printf with sanitised LOG_TAIL (printable ASCII only) and
--body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
allowlist (SCAN | GET | EXISTS).
P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
expected shape, not just length. Also assert the reason= field.
P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
scripts/tools AGENTS.md doc note.
Todos 193-199 moved from pending to complete.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean
* fix(digest-dedup): address Greptile P2 findings on PR #3200
1. brief-embedding.mjs: wrap fetch lookup as
`(...args) => globalThis.fetch(...args)` instead of aliasing bare
`fetch`. Aliasing captures the binding at module-load time, so
later instrumentation / Edge-runtime shims don't see the wrapper —
same class of bug as the banned `fetch.bind(globalThis)` pattern
flagged in AGENTS.md.
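A minimal repro of the binding-capture difference:

```javascript
// The alias captures whatever globalThis.fetch was at module-load time,
// so shims installed later (instrumentation, Edge-runtime polyfills)
// are never seen. The arrow wrapper resolves the property on each call.
globalThis.fetch = async () => 'original';

const aliased = globalThis.fetch;                       // captured now
const wrapped = (...args) => globalThis.fetch(...args); // resolved per call

// Later, something swaps in an instrumented fetch:
globalThis.fetch = async () => 'instrumented';

aliased().then((r) => console.log(r)); // 'original' (shim missed)
wrapped().then((r) => console.log(r)); // 'instrumented' (shim seen)
```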
2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
silently swallowed the failure when any of dedup/canary/p1 labels
didn't pre-exist, breaking the drift alert channel while leaving
the job red in the Actions UI. Switched to repeated `--label`
flags + `--create-label` so any missing label is auto-created on
first drift, and dropped the `|| true` so a legitimate failure
(network / auth) surfaces instead of hiding.
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
* fix(digest-dedup): two P1s found on PR #3200
P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.
Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
— the same helper the orchestrator uses — so any classifier knob
added later is picked up automatically. The threshold and veto-
enabled flags are sourced from env by default; a --threshold CLI
flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
which operators must keep in lockstep with Railway. The
workflow_dispatch threshold input now defaults to empty; the
scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
output makes the classifier visible.
P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.
Fix:
- writeShadowArchive now inspects the pipeline return. null result,
non-array response, per-command {error}, or a cell without
{result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
structured log line carries archive_write=ok|failed so operators
can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
null-pipeline contract and asserts both the warn and the structured
field land.
Verification:
- test:data 5825/5825 pass
- dedup suites 65/65 pass (new: archive-fail regression)
- typecheck + api clean
- biome check clean on changed files
* fix(digest-dedup): two more P1s found on PR #3200
P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.
Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
same readOrchestratorConfig() helper the orchestrator uses. When
either says "embed path inactive in prod", the validator logs an
explicit skip line and exits 0. The nightly workflow then shows
green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
alongside the threshold and veto-enabled knobs, so all four
classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
canary output surfaces which classifier it validated.
P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.
Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
export so the logic is testable. Mode filter runs BEFORE the dedup
check: agreeing pairs are skipped entirely under
--mode disagreements, so any later disagreeing occurrence can
still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
cases: agreement-then-disagreement, reversed order (symmetry),
always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
kick off the CLI scan path.
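The filter-before-dedup ordering can be sketched as follows; the names and the archive shape are simplified assumptions, not the exact shadow-sample.mjs contract:

```javascript
// Each archive batch holds pairs with an `agreed` flag. The buggy
// version added every seen pair to the dedup set BEFORE filtering, so
// an agreeing occurrence could shadow a later disagreeing one under
// unspecified SCAN order. Filtering first means agreeing occurrences
// never claim the dedup slot under mode === 'disagreements'.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch.pairs) {
      if (mode === 'disagreements' && pair.agreed) continue; // filter FIRST
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue; // dedup second
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}

// Agreement-then-disagreement for the same pair: the disagreeing
// occurrence still lands in the sample.
const archives = [
  { pairs: [{ a: 'h1', b: 'h2', agreed: true }] },
  { pairs: [{ a: 'h1', b: 'h2', agreed: false }] },
];
console.log(enumeratePairs(archives, 'disagreements').length); // 1
```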
Verification:
- test:data 5825/5825 pass
- dedup suites 70/70 pass (5 new shadow-sample regressions)
- typecheck + api clean
- biome check clean on changed files
Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED
* refactor(digest-dedup): Railway is the single source of truth for dedup config
Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.
Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.
Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
every deduplicateStories() call (before the mode short-circuit,
so jaccard ticks also publish a "mode=jaccard" signal the canary
can read). Fire-and-forget; archive-write error semantics still
apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
path. Now calls fetchActiveConfigFromUpstash() and either
validates against that config, skips when the embed path is
inactive, or skips when the key is missing (with --force
override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
and the corresponding repo-variable dependency. Only the three
Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
tick (shadow AND jaccard modes) with the right shape + TTL.
Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.
Verification:
- test:data 5825/5825 pass
- dedup suites 72/72 pass (2 new config-publish regressions)
- typecheck + api clean
- biome check clean on changed files
* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow
User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:
DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs
SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
Railway deploy with OPENROUTER_API_KEY set = embeddings live on
next cron tick. Set MODE=jaccard on Railway to revert instantly.
Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.
Net diff: -1,407 lines.
Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.
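The resulting orchestrator shape can be sketched roughly like this; the dependency-injection style and exact field names are illustrative assumptions, not the real brief-dedup.mjs code:

```javascript
// Embed is the default path, DIGEST_DEDUP_MODE=jaccard is an instant
// kill switch, and any embed-path failure falls back to Jaccard with a
// reason=<ErrorName> warn so the cron never dies because embeddings
// flaked.
async function deduplicateStories(stories, env, deps) {
  const mode = env.DIGEST_DEDUP_MODE === 'jaccard' ? 'jaccard' : 'embed';
  if (mode === 'jaccard') {
    return { mode, clusters: deps.jaccard(stories) };
  }
  try {
    return { mode, clusters: await deps.embed(stories) };
  } catch (err) {
    // Timeout, provider outage, missing API key, bad response shape:
    // all land here, tagged by error name for operator grep.
    deps.warn(`dedup embed fallback reason=${err.name || 'Error'}`);
    return { mode: 'jaccard-fallback', clusters: deps.jaccard(stories) };
  }
}
```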
* fix(digest-dedup): multi-word location phrases in the entity veto
Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: [], actors: ['houthis','red','sea'] } ✗
shouldVeto("Houthis strike ship in Red Sea",
"US escorts convoy in Red Sea") → false ✗
With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.
Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.
Now:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: ['red sea'], actors: ['houthis'] } ✓
shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓
Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
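A minimal sketch of the greedy longest-phrase scan, assuming a small gazetteer; the real extractor also classifies actors and uses the full gazetteer:

```javascript
// Simplified from the entity-gazetteer fix: at each token position try
// the longest multi-word phrase first (down to 2 tokens), requiring the
// first AND last tokens to be capitalised so lowercase prose like "the
// middle east" does not falsely match, while lowercase connectors in
// between are fine ("Strait of Hormuz"). Falls back to single-token
// lookup when no multi-word phrase fits.
const LOCATION_GAZETTEER = new Set(['red sea', 'strait of hormuz', 'abu dhabi', 'tehran']);
const MAX_PHRASE_LEN = 4;

function extractLocations(headline) {
  const tokens = headline.split(/\s+/);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = 0;
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const slice = tokens.slice(i, i + len);
      if (!/^[A-Z]/.test(slice[0]) || !/^[A-Z]/.test(slice[len - 1])) continue;
      const phrase = slice.join(' ').toLowerCase();
      if (LOCATION_GAZETTEER.has(phrase)) {
        found.push(phrase);
        matched = len;
        break;
      }
    }
    if (!matched) {
      const single = tokens[i].toLowerCase();
      if (/^[A-Z]/.test(tokens[i]) && LOCATION_GAZETTEER.has(single)) found.push(single);
      matched = 1;
    }
    i += matched;
  }
  return found;
}

console.log(extractLocations('Houthis strike ship in Red Sea')); // ['red sea']
```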
Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
|
||
|
|
27849fee1e |
fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn
Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED since
PR #3174 merged. Not middleware (last PR fixed that theoretical path
but not the observed failure). The Vercel function itself crashes
HTTP 500 FUNCTION_INVOCATION_FAILED on every request including
OPTIONS - the isolate can't initialise.
The handler imports brief-carousel-render, which lazy-imports
@resvg/resvg-js. That package's js-binding.js does a runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel Lambda
(Amazon Linux 2 glibc) that resolves to @resvg/resvg-js-linux-x64-gnu.
Vercel nft tracing does NOT follow this conditional require, so the
optional peer package isn't bundled. Cold start throws
MODULE_NOT_FOUND, the isolate crashes, Vercel returns
FUNCTION_INVOCATION_FAILED, and Telegram reports WEBPAGE_CURL_FAILED.
Fix: vercel.json functions.includeFiles forces the linux-x64-gnu
binding into the carousel function's bundle. Only this route needs
it; every other api route is unaffected.
Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime binary
  on Amazon Linux 2 glibc
Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; next cron tick should clear the Railway [digest]
Telegram carousel 400 line.
* Address Greptile P2s: regression guard + arch-assumption reasoning
Two P2 findings on PR #3204:
P2 #1 (inline on vercel.json:6): Platform architecture assumption
undocumented. If Vercel migrates to Graviton/arm64 Lambda the
cold-start crash silently returns. vercel.json is strict JSON so
comments aren't possible inline.
P2 #2 (tests/deploy-config.test.mjs:17): No regression guard for the
carousel includeFiles rule. A future vercel.json tidy-up could
silently revert the fix with no CI signal.
Fixed both in a single block:
- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.
tests 22/22 pass (was 21, +1 new).
|
||
|
|
45f02fed00 |
fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError
OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid
re-tap), leaking as an unhandled NotFoundError. OrbitControls is
bundled into main-*.js so hasFirstParty=true; matched by the unique
setPointerCapture message (grep confirms no first-party
setPointerCapture usage). Resolves WORLDMONITOR-NC.
* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack
Review feedback: suppressing by message alone would hide a future
first-party setPointerCapture regression. Mirror the existing
OrbitControls filter's provenance check — require absence of any
source-mapped .ts/.tsx frame so the filter only matches stacks whose
only non-infra frame is the bundled main chunk. Adds positive +
negative regression tests for the pair.
* fix(sentry): gate OrbitControls filter on positive three.js context signature
Review feedback: absence of .ts/.tsx frames is not proof of
third-party origin because production stacks are often
unsymbolicated. Replace the negative-only gate with a positive
OrbitControls signature — require a frame whose context slice
contains the literal `_pointers … setPointerCapture` adjacency
unique to three.js OrbitControls. Update tests to cover the
production-realistic case (unsymbolicated first-party bundle frame
calling setPointerCapture must still reach Sentry) plus a defensive
no-context fallthrough.
|
||
|
|
d7f87754f0 |
fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged two transactional email
templates still claimed '22 services' while /pro now advertises
'30+':
- api/register-interest.js:90 — interest-registration confirmation
  email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription
  confirmation email ('22 services, one API key')
A user signing up via /pro would read '30+ services' on the page,
then receive an email saying '22'. Both updated to '30+' matching
the /pro page and the actual server domain count (31 in
server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
|
||
|
|
135082d84f |
fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)
The /pro page advertised '22 services' / '22 service domains' but
server/worldmonitor/, proto/worldmonitor/, and
src/generated/server/worldmonitor/ all have 31 domain dirs (aviation,
climate, conflict, consumer-prices, cyber, displacement, economic,
forecast, giving, health, imagery, infrastructure, intelligence,
maritime, market, military, natural, news, positive-events,
prediction, radiation, research, resilience, sanctions, seismology,
supply-chain, thermal, trade, unrest, webcam, wildfire).
api/scenario/v1/ adds a 32nd recently shipped surface.
Used '30+' rather than the literal '31' so the page doesn't drift
again every time a new domain ships (the '22' was probably accurate
at one point too).
168 string substitutions across all 21 locale JSON files (8 keys
each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc,
pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey,
apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta
description, og:description, twitter:description,
SoftwareApplication ld+json description + Pro Monthly offer, FAQ
ld+json a8, noscript fallback). Bundle rebuilt.
* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
|
||
|
|
cce46a1767 |
fix(pro): API tier is launched — drop 'Coming Soon' label (#3198)
The /pro comparison-table column header still read 'API (Coming
Soon)' across all 21 locales (and locale-translated variants), but
convex/config/productCatalog.ts has api_starter at
currentForCheckout=true, publicVisible=true, priceCents=9999 —
$99.99/month, with api_starter_annual at $999/year. The API tier is
shipped and self-serve.
Updated pricingTable.apiHeader → 'API ($99.99)' for every locale,
matching the same '<Tier> ($<price>)' pattern as 'Free ($0)' and
'Pro ($39.99)'. Bundle rebuilt.
|
||
|
|
c7aacfd651 |
fix(health): persist WARNING events + add failure-log timeline (#3197)
* fix(health): persist WARNING events + add failure-log timeline
WARNING status (stale seeds) was excluded from the
health:last-failure Redis write (line 680 checked `!== 'WARNING'`).
When UptimeRobot keyword-checks for "HEALTHY" and gets a WARNING
response, it flags DOWN, but no forensic trail was left in Redis.
This made stale-seed incidents invisible to post-mortem
investigation.
Changes:
- Write health:last-failure for ANY non-HEALTHY status (including
  WARNING)
- Add health:failure-log (LPUSH list, last 50 entries, 7-day TTL) so
  multiple incidents are preserved as a timeline, not just the latest
- Include warnCount alongside critCount in the snapshot
- Broaden the problems filter to capture all non-OK statuses
* fix(health): dedupe failure-log entries by incident signature
Repeated polls during one long WARNING window would LPUSH
near-identical snapshots, filling the 50-entry log and evicting
older distinct incidents. Now compares a signature (status + sorted
problem set) against the previous entry via health:failure-log-sig.
Only appends when the incident changes. The last-failure key is
still updated every poll (latest timestamp matters).
* fix(health): add 4s timeout to persist pipelines + consistent arg types
Addresses greptile review on PR #3197:
- Both persist redisPipeline calls now pass 4_000ms timeout (main
  data pipeline uses 8_000ms; persist is less critical so shorter
  is fine)
- LTRIM/EXPIRE args use numbers consistently (was mixing
  number/string)
* fix(health): atomic sig swap via SET ... GET to eliminate dedupe race
Two concurrent /api/health requests could both read the old
signature before either write lands, appending duplicate entries.
Now uses SET key val EX ttl GET (Redis 6.2+) to atomically swap the
sig and return the previous value in one pipeline command. The LPUSH
only fires if the returned previous sig differs from the new one.
Also skips the second redisPipeline call entirely when sig matches
(no logCmds to send).
* fix(health): exclude seedAgeMin from dedupe sig + clear sig on recovery
Two issues with the failure-log dedupe:
1. seedAgeMin changes on every poll (e.g. 31min, 32min, 33min), so
   the signature changed every time and LPUSH still fired on every
   probe during a STALE_SEED window. Now uses a separate sigKeys
   array with only key:status (no age) for the signature, while
   problemKeys still includes ages for the snapshot payload.
2. The sig was never cleared on recovery. If the same problem set
   recurred after a healthy gap, the old sig (within its 24h TTL)
   would match and the recurrence would be silently skipped. Now
   DELs health:failure-log-sig when overall === 'HEALTHY'.
* fix(health): move sig write after LPUSH in same pipeline
The sig was written eagerly in the first pipeline (SET ... GET), but
the LPUSH happened in a separate background pipeline. If that second
write failed, the sig was already advanced, permanently deduping the
incident out of the timeline.
Now: GET sig first (read-only), then write last-failure + LPUSH +
sig all in one pipeline. The sig only advances if the entire
pipeline succeeds. Failure leaves the old sig in place so the next
poll retries.
Reintroduces a small read-then-write race window (two concurrent
probes can both read the old sig), but the worst case is a single
duplicate entry, which is strictly better than a permanently dropped
incident.
|
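The final ordering described in this chain can be sketched with a hypothetical pipeline helper; the key names follow the commit messages, but the real api/health code differs in shape:

```javascript
// Read the previous signature first, then send last-failure + LPUSH +
// signature in ONE pipeline so the sig only advances when the timeline
// write lands. The signature is built from key:status only (no seed
// ages, which change every poll); the sig key is cleared on recovery.
function buildSignature(overall, problems) {
  const sigKeys = problems.map((p) => `${p.key}:${p.status}`).sort();
  return `${overall}|${sigKeys.join(',')}`;
}

async function persistFailure(overall, problems, redis) {
  if (overall === 'HEALTHY') {
    await redis.pipeline([['DEL', 'health:failure-log-sig']]); // clear on recovery
    return { appended: false };
  }
  const sig = buildSignature(overall, problems);
  const [prevSig] = await redis.pipeline([['GET', 'health:failure-log-sig']]);
  const cmds = [['SET', 'health:last-failure', JSON.stringify({ overall, problems })]];
  const appended = prevSig !== sig;
  if (appended) {
    cmds.push(['LPUSH', 'health:failure-log', JSON.stringify({ overall, problems })]);
    cmds.push(['LTRIM', 'health:failure-log', 0, 49]);
    cmds.push(['SET', 'health:failure-log-sig', sig, 'EX', 86400]);
  }
  await redis.pipeline(cmds);
  return { appended };
}
```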
||
|
|
63464775a5 |
feat(supply-chain): scenario UX — rich banner + projected score + faster poll (#3193)
* feat(supply-chain): rich scenario banner + projected score per chokepoint + faster poll
User reported Simulate Closure adds only a thin banner with no context —
"not clear what value user is getting, takes many many seconds". Four
targeted UX improvements in one PR:
A. Rich banner (scenario params + tagline)
Banner now reads:
⚠ Hormuz Tanker Blockade · 14d · +110% cost
CN 100% · IN 84% · TW 82% · IR 80% · US 39%
Simulating 14d / 100% closure / +110% cost on 1 chokepoint.
Chokepoint card below shows projected score; map highlights…
Surfaces the scenario template fields (durationDays, disruptionPct,
costShockMultiplier) + a one-line explainer so a first-time user
understands what "CN 100%" actually means.
B. Projected score on each affected chokepoint card
Card header now shows: `[current]/100 → [projected]/100` with a red
trailing badge + red left border on the card body.
Body prepends: "⚠ Projected under scenario: X% closure for N days
(+Y% cost)".
Projected = max(current, template.disruptionPct) — conservative
floor since the real scoring mixes threat + warnings + anomaly.
C. Faster polling
Status poll interval 2s → 1s. Max iterations 30→60 (unchanged 60s
budget). Worker processes in <1s; perceived latency drops from
2–3s to <2s in the common case. First poll still immediate.
D. ScenarioResult interface widened
Added optional `template` and `currentDisruptionScores` fields in
scenario-templates.ts to match what the scenario-worker already
emits. Optional = backward-compat with map-only consumers.
Dependent on PR #3192 (already merged) which fixed the 10000% banner
% inflation.
* fix(supply-chain): trigger render() on scenario activate/dismiss — cards must re-render
PR review caught a real bug in the new scenario UX: showScenarioSummary
and hideScenarioSummary were mutating the banner DOM directly without
triggering render(). renderChokepoints() reads activeScenarioState to
paint the projected score + red border + callout, but those only run
during render() — so the cards stayed stale on activate AND on dismiss
until some unrelated re-render happened.
Refactor to split public API from internal rendering:
- showScenarioSummary(scenarioId, result) — now just sets state + calls
render(). Was: set state + inline DOM mutation (bypassing card render).
- renderScenarioBanner() — new private helper that builds the banner
DOM from activeScenarioState. Called from render()'s postlude
(replacing the old self-recursive showScenarioSummary() call — which
only worked because it had a side-effectful early-exit path that
happened to terminate, but was a latent recursion risk).
- hideScenarioSummary() — now just sets state=null + calls render().
Was: clear state + manual banner removal + manual button-text reset
loop. The button loop is redundant now — the freshly-rendered card
template produces buttons with default "Simulate Closure" text by
construction.
Net effect: activating a scenario paints the banner AND the affected
chokepoint cards in a single render tick. Dismissing strips both in
the same tick.
* fix(supply-chain): derive scenario button state from activeScenarioState, not imperative mutation
PR review caught: the earlier re-render fix (showScenarioSummary → render())
correctly repaints cards on activate, but the button-state logic in
runScenario() is now wrong. render() detaches the old btn reference, so
the post-onScenarioActivate `resetButton('Active') + btn.disabled = true`
touches a detached node and no-ops (resetButton() explicitly skips
!btn.isConnected). The fresh button painted by render() uses the default
template text — visible button reads "Simulate Closure" enabled, and users
can queue duplicate runs of an already-active scenario.
Fix: make button state a function of panel state.
- renderChokepoints() scenario section: check
activeScenarioState.scenarioId === template.id and, when matched, emit
the button with class `sc-scenario-btn--active`, text "Active", and
`disabled` attribute. On dismiss, the next render strips those
automatically — same pattern as the card projection styling.
- runScenario(): drop the dead `resetButton('Active')` + `btn.disabled`
lines after onScenarioActivate. That path is now template-driven;
touching the detached btn was the defect.
Catch-path resets ('Simulate Closure' on abort, 'Error — retry' on real
error) are unchanged — those fire BEFORE any render could detach the btn,
so the imperative path is still correct there.
* fix(supply-chain): hide scenario projection arrow when current already ≥ template
Greptile P1: projected badge was rendered as `N/100 → N/100` whenever
current disruptionScore already met or exceeded template.disruptionPct.
Visible for Suez (80%) or Panama (50%) scenarios when a chokepoint is
already elevated — read as "scenario has zero effect", which is misleading.
The two values live on different scales — cp.disruptionScore is a
computed risk score (threat + warnings + anomaly) while
template.disruptionPct is "% of capacity blocked" — but they share the
0–100 axis so directional comparison is still meaningful for the
"does this scenario escalate things?" signal.
Fix: arrow only renders when template.disruptionPct > cp.disruptionScore.
When current already equals or exceeds the scenario level, show the
single current badge. The card's red left border + "⚠ Projected under
scenario" callout still indicate the card is the scenario target —
only the escalation arrow is suppressed.
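The badge logic reduces to a small conditional; `renderScoreBadge` is a hypothetical helper, simplified from the real renderChokepoints template code:

```javascript
// The escalation arrow only renders when the scenario's disruption
// level actually exceeds the chokepoint's current score; otherwise a
// single current badge avoids a misleading "N/100 -> N/100".
function renderScoreBadge(currentScore, templateDisruptionPct) {
  const projected = Math.max(currentScore, templateDisruptionPct);
  if (templateDisruptionPct > currentScore) {
    return `${currentScore}/100 → ${projected}/100`;
  }
  return `${currentScore}/100`; // already at or above the scenario level
}

console.log(renderScoreBadge(39, 100)); // '39/100 → 100/100'
console.log(renderScoreBadge(80, 80));  // '80/100' (no zero-effect arrow)
```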
|
||
|
|
85d6308ed0 |
fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196)
* fix(brief): allow Telegram/social UAs to fetch carousel images
middleware.ts BOT_UA regex (/bot/i) was returning 403 on Telegram sendMediaGroup
fetch of /api/brief/carousel/<u>/<d>/<p>. SOCIAL_IMAGE_UA allowlist
(includes telegrambot) was scoped to /favico/* and .png suffix only;
carousel returns image/png but the URL has no extension.
Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED
and zero images above the Telegram brief.
Fix: extend UA-bypass guard to cover /api/brief/carousel/ prefix.
HMAC token on the URL is the real auth; UA allowlist is defence-in-depth.
* Address P2 + P3: regression test + route-shape regex
P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the
contract:
- TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel
- curl, generic bot UAs, missing UA still 403 on carousel
- TelegramBot 403s on non-carousel API routes (scoped, not global)
- Malformed carousel paths (admin/dashboard, page >= 3, non-ISO
date) all still 403 via the regex
- Normal browsers pass everywhere
P3: Replace startsWith('/api/brief/carousel/') prefix with
BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by
api/brief/carousel/[userId]/[issueDate]/[page].ts
(userId / YYYY-MM-DD / page 0|1|2). A future
/api/brief/carousel/admin or similar sibling cannot inherit the
bypass. Comment now lists every social-image UA this protects.
typecheck + typecheck:api clean. test:data 5772/5772.
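The route-shape check might look roughly like this. The date and page shapes come from the commit text; the `[^/]+` userId segment in particular is an assumption, and the regex in middleware.ts may differ:

```typescript
// Approximate shape of BRIEF_CAROUSEL_PATH_RE as described above:
// /api/brief/carousel/[userId]/[issueDate YYYY-MM-DD]/[page 0|1|2].
// A sibling like /api/brief/carousel/admin has too few segments, and a
// page of 3 falls outside [012], so neither inherits the UA bypass.
const BRIEF_CAROUSEL_PATH_RE =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}\/[012]$/
```

Anchoring with `^`/`$` is what makes this safer than a `startsWith` prefix check: any future sibling route must match the full three-segment shape or it keeps the bot gate.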
6025b0ce47
chore(sentry): add Chrome/Firefox variant of UTItemActionController filter (#3194)
The Safari variant (Can't find variable: UTItemActionController) was already in ignoreErrors at line 53. Chrome/Firefox uses the "X is not defined" format instead (WORLDMONITOR-NB). Added to the existing "is not defined" group at line 119.
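The two browser-specific spellings can be illustrated with a small substring matcher. Sentry's ignoreErrors treats plain strings as partial matches of the error message; this standalone sketch only mirrors that behavior, it is not the actual Sentry config:

```typescript
// Both browser-specific spellings of the same missing global, as listed
// in ignoreErrors. Plain strings match as substrings of the message.
const ignoredPatterns = [
  "Can't find variable: UTItemActionController", // Safari
  'UTItemActionController is not defined',       // Chrome/Firefox (WORLDMONITOR-NB)
]

function isIgnored(message: string): boolean {
  return ignoredPatterns.some((pattern) => message.includes(pattern))
}
```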
434a2e0628
feat(settings): API Keys tab visible to all users with PRO upgrade CTA (#3190)
* feat(settings): show API Keys tab to all users with PRO upgrade CTA
Free users who clicked the API Keys tab triggered a server-side
ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA). Now the tab is
always visible with a PRO badge, and the content is gated client-side:
- Anonymous: lock icon + "Sign In" CTA (opens Clerk sign-in)
- Free: upgrade icon + "Upgrade to Pro" CTA (opens Dodo checkout)
- PRO: full key management UI (unchanged)
The Convex query is never called for non-PRO users, eliminating the
server error at the source while creating a natural upgrade funnel.
Reuses existing panel-locked-state CSS (gold accent, gradient button).
* fix(settings): gate API Keys on apiAccess feature, not isProUser
Addresses review findings on PR #3190:
1. Gate changed from isProUser() to hasFeature('apiAccess') — matches
the server contract in convex/apiKeys.ts, which requires apiAccess
(tier 2+), not just PRO (tier 1). PRO users without apiAccess now
correctly see the upgrade CTA instead of the full UI.
2. CTA button now launches API_STARTER_MONTHLY checkout instead of
DEFAULT_UPGRADE_PRODUCT (PRO_MONTHLY) — users buy the correct product
that actually includes API key access.
3. loadApiKeys() guard now checks both getAuthState().user AND
hasFeature('apiAccess') — prevents anonymous keyed sessions
(widget/pro keys without Clerk auth) from hitting the Convex query
that requires authentication.
* fix(settings): re-render API Keys panel when entitlements arrive
On cold load, hasFeature('apiAccess') returns false until the Convex
entitlement subscription delivers data. A paid API Starter user who
opens settings before that snapshot arrives would see the upgrade CTA,
and loadApiKeys() would be skipped. Subscribes to onEntitlementChange()
while the modal is open and re-renders the api-keys panel content +
re-attaches handlers when entitlements change. Cleans up in close()
and destroy().
Also extracts handler attachment into attachApiKeysHandlers() to avoid
duplicating the CTA click + input keydown wiring between render() and
the entitlement callback.
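The double guard on loadApiKeys() can be sketched as follows; the getAuthState/hasFeature signatures are assumptions drawn from the names in the commit, not the real APIs:

```typescript
// Sketch of the loadApiKeys() guard: the Convex query is only issued
// when a Clerk user is present AND the apiAccess entitlement is held.
// Anonymous keyed sessions (widget/pro keys without Clerk auth) fail
// the first check; free or PRO-without-apiAccess users fail the second.
interface AuthState { user: { id: string } | null }

function shouldLoadApiKeys(
  auth: AuthState,
  hasFeature: (feature: string) => boolean,
): boolean {
  return auth.user !== null && hasFeature('apiAccess')
}
```

Re-checking this predicate whenever the entitlement subscription fires is what closes the cold-load gap: the same function returns false before the snapshot arrives and true after, and the re-render follows from that.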