Commit Graph

3471 Commits

Elie Habib
63ef0dd0f1 docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out
Completes the PR 1 items from
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md
that were deferred after the checkpoint on
Route Explorer + Scenario Engine + CRI nav. No new pages — only edits
to existing pages to point at and cohere with the new workflow pages.

- documentation.mdx: landing rewrite. Dropped brittle counts (344
  news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed
  services) in favor of durable product framing. Surfaced the
  shipped differentiators that were invisible on the landing
  previously: Country Resilience Index (222 countries, linked to
  its methodology page), AI daily brief, Route Explorer,
  Scenario Engine, MCP server. Kept CII and CRI as two distinct
  country-risk surfaces — do not conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the
  stale inventory list with family-grouped descriptions that
  include the panels this audit surfaced as missing
  (disease-outbreaks, radiation-watch, thermal-escalation, consumer-prices,
  latest-brief, forecast, country-resilience). Added a Workflows
  section linking to Route Explorer and Scenario Engine, and a
  Country-level risk section linking CII + CRI. Untouched
  sections (map, marker clustering, data layers, export, monitors,
  activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer
  subsection to a one-paragraph pointer at /route-explorer so the
  standalone page is the canonical home.

The Panels nav group is intentionally left out; it waits on PR 2
content to avoid rendering an empty group in Mintlify.
2026-04-19 17:09:05 +04:00
Elie Habib
233835e206 docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)
Greptile review P2: prose cited MapContainer.ts:1004 but activateScenario
is declared at :1010. Line 1004 landed inside the JSDoc block.
2026-04-19 17:03:21 +04:00
Elie Habib
bffa1f4498 docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)
- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin)
  + PRO' was wrong — isCallerPremium() explicitly skips
  trusted-origin short-circuits (keyCheck.required === false) and
  only counts (a) an env-valid or user-owned wm_-prefixed API key
  with apiAccess entitlement, or (b) a Clerk bearer with role=pro
  or Dodo tier ≥ 1. Browser calls work because premiumFetch()
  injects one of those credentials per request, not because Origin
  alone authenticates. Per server/_shared/premium-check.ts:34 and
  src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section
  to state outright that authentication and PRO entitlement are
  orthogonal, and that trusted Origin is NOT accepted as PRO even
  though it is accepted for public endpoints. Listed the two real
  credential forms that pass the gate.
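The "authentication and PRO entitlement are orthogonal" contract reduces to a small predicate. A minimal sketch under assumed parameter names (the real logic is isCallerPremium() in server/_shared/premium-check.ts:34):

```javascript
// Sketch of the documented entitlement contract (names are illustrative;
// the real check is isCallerPremium() in server/_shared/premium-check.ts).
function isCallerPremiumSketch({ apiKey, hasApiAccess, clerkRole, dodoTier }) {
  // (a) an env-valid or user-owned wm_-prefixed key with apiAccess
  if (apiKey && apiKey.startsWith('wm_') && hasApiAccess) return true;
  // (b) a Clerk bearer with role=pro, or a Dodo tier >= 1
  if (clerkRole === 'pro' || (dodoTier ?? 0) >= 1) return true;
  // A trusted browser Origin never reaches this gate as a credential:
  // it satisfies public-endpoint auth, not the PRO entitlement.
  return false;
}
```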
2026-04-19 17:01:45 +04:00
Elie Habib
80ede691cf docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)
- scenario-engine: accurate description of the rendered scenario
  banner. Always-present elements are the ⚠ icon, scenario name,
  top-5 impacted countries with impact %, and dismiss ×. Params
  chip (e.g. '14d · +110% cost') and 'Simulating …' tagline are
  conditional on the worker result carrying template parameters
  (durationDays, disruptionPct, costShockMultiplier). The banner
  never lists affected chokepoints by name — the map and the
  chokepoint cards surface those. Per renderScenarioBanner at
  src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record
  and the owner-index set's 30-day TTL via atomic pipeline
  (SET + SADD + EXPIRE). rotate-secret and reactivate only
  extend the record's TTL — neither touches the owner-index set,
  so the owner index can expire independently if a caller only
  rotates/reactivates within a 30-day window. Re-register to keep
  both alive. Per api/v2/shipping/webhooks.ts:230 (register
  pipeline) and :325 (rotate setCachedJson on record only).
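The register-vs-rotate TTL asymmetry can be sketched as two pipeline builders (Upstash-style command arrays; key names, record shape, and the 30-day constant are assumptions based only on the summary above):

```javascript
// Sketch of the two TTL behaviours (illustrative; the real pipeline
// lives in api/v2/shipping/webhooks.ts).
const THIRTY_DAYS = 30 * 24 * 60 * 60; // seconds

function registerCommands(id, ownerId, record) {
  // register refreshes BOTH the record and the owner-index set atomically
  return [
    ['SET', `webhook:${id}`, JSON.stringify(record), 'EX', THIRTY_DAYS],
    ['SADD', `webhook:owner:${ownerId}`, id],
    ['EXPIRE', `webhook:owner:${ownerId}`, THIRTY_DAYS],
  ];
}

function rotateCommands(id, record) {
  // rotate-secret / reactivate only re-SET the record; the owner-index
  // set keeps its old expiry and can lapse independently
  return [['SET', `webhook:${id}`, JSON.stringify(record), 'EX', THIRTY_DAYS]];
}
```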
2026-04-19 16:57:46 +04:00
Elie Habib
fc7d46829f docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)
- api-scenarios (template example): replaced invented
  hormuz-closure-30d / ["hormuz"] with the actually-shipped
  hormuz-tanker-blockade / ["hormuz_strait"] from
  scenario-templates.ts:80. Listed the other 5 shipped template IDs so
  scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): worker writes FOUR states,
  not three. Added the intermediate "processing" state with
  startedAt, written by the worker at job pickup
  (scenario-worker.mjs:411). Lifecycle now: pending → processing →
  done|failed. Both pending and processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state
  language and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate"
  with the actual × dismiss control on the scenario banner
  (aria-label: "Dismiss scenario") per
  src/components/SupplyChainPanel.ts:790. Also described the
  banner contents (name, chokepoints, countries, tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected
  "hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb"
  across all four occurrences in the shipping v2 page (from
  PR #3209). Real IDs come from
  server/_shared/chokepoint-registry.ts (snake_case, not kebab-case,
  not bare "hormuz").
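The corrected four-state lifecycle pins down as a tiny transition table — a sketch of the documented contract, not the worker's code:

```javascript
// pending → processing → done|failed, with pending and processing
// non-terminal (sketch of the contract described above).
const TRANSITIONS = {
  pending: ['processing'],        // worker picks the job up
  processing: ['done', 'failed'], // worker finishes or errors
  done: [],                       // terminal
  failed: [],                     // terminal
};

const isTerminal = (s) => TRANSITIONS[s].length === 0;
const canTransition = (from, to) => TRANSITIONS[from].includes(to);
```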
2026-04-19 16:49:43 +04:00
Elie Habib
6380245f21 docs(mintlify): fix Route Explorer + Scenario Engine review findings
Reviewer caught 4 cases where I described behavior I hadn't read
carefully. All fixes cross-checked against source.

- route-explorer (free-tier): the workflow does NOT blur a numeric
  payload behind a public demo route. On free tier, fetchLane()
  short-circuits to renderFreeGate() which blurs the left rail,
  replaces the tab area with an Upgrade-to-PRO card, and applies a
  generic public-route highlight on the map. No lane data is
  rendered in any tab. See src/components/RouteExplorer/
  RouteExplorer.ts:212 + :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the
  panel and the map. Direct field jumps are F (From), T (To), P
  (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp
  binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL,
  Esc close, ? help, 1-4 tabs). See src/components/RouteExplorer/
  KeyboardHelp.ts:9 and RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates
  of 4 types today (conflict, weather, sanctions, tariff_shock).
  The ScenarioType union includes infrastructure and pandemic but
  no templates of those types ship. Dropped them from the shipped
  table and noted the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes
  status: 'done' (not 'completed') on success, 'failed' on error;
  pending is synthesised by the status endpoint when no worker
  record exists. Fixed both the new workflow page and the merged
  api-scenarios.mdx completed-response example + polling language.
  See scripts/scenario-worker.mjs:421 and
  src/components/SupplyChainPanel.ts:870.
2026-04-19 15:56:23 +04:00
Elie Habib
44bc40ee34 docs(mintlify): add Route Explorer + Scenario Engine workflow pages
Checkpoint for review on the IA refresh (per plan
docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).

- docs/docs.json: link Country Resilience Index methodology under
  Intelligence & Analysis so the flagship 222-country feature is
  reachable from the main nav (previously orphaned). Add a new
  Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for,
  Cmd+K entry, four tabs (Current / Alternatives / Land / Impact),
  inputs, keyboard bindings, map-state integration, PRO gating
  with free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template
  categories (conflict / weather / sanctions / tariff_shock /
  infrastructure / pandemic), how a scenario activates on the map,
  PRO gating, pointers to the async job API.

Deferred to follow-up commits in the same PR:
  - documentation.mdx landing rewrite
  - features.mdx refresh
  - maritime-intelligence.mdx link-out to Route Explorer
  - Panels nav group (waits for PR 2 content)

All content grounded in live source files cited inline.
2026-04-19 15:27:27 +04:00
Elie Habib
e4c95ad9be docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage

Audit against api/ + proto/ revealed 9 OpenAPI specs missing from nav,
the scenario/v1 service undocumented, and MCP (32 tools + OAuth 2.1 flow)
with no user-facing docs. The stale Docs_To_Review/API_REFERENCE.md still
pointed at pre-migration endpoints that no longer exist.

- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast, Health,
  Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's proto-backed
  (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web, Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications, Shipping v2,
  Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md
  (referenced dead endpoints); add README flagging dir as archival

* docs(mintlify): move scenario docs out of generated docs/api/ tree

The pre-push hook enforces that docs/api/ is proto-generated only.
Replace the hand-written ScenarioService.openapi.yaml with a plain
MDX page (docs/api-scenarios.mdx) until the proto migration lands
(tracked in issue #3207).

* docs(mintlify): fix factual errors flagged in PR review

Reviewer caught 5 endpoints where I speculated on shape/method/limits
instead of reading the code. All fixes cross-checked against the
source:

- api-shipping-v2: route-intelligence is GET with query params
  (fromIso2, toIso2, cargoType, hs2), not POST with a JSON body.
  Response shape is {primaryRouteId, chokepointExposures[],
  bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt,
  cachedUntil, priceSource} with tier groups free|pro|api_starter|
  enterprise, not the invented {currency, plans}. Document the
  DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk
  JWT + PRO (returning {oauthUrl}), not GET redirects. Callbacks
  remain GET.
- api-platform: /api/version returns the latest GitHub Release
  ({version, tag, url, prerelease}), not deployed commit/build
  metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match
  code), not 10/hour.

Also caught while double-checking: /api/register-interest and
/api/contact are 5/60min and 3/60min respectively (1-hour window,
not 1-minute). Both require Turnstile. Removed the fabricated
limits for share-url, notification-channels, create-checkout
(they fall back to the default per-IP limit).

* docs(mintlify): second-round fixes — verify every claim against source

Reviewer caught 7 more cases where I described API behavior I hadn't
read. Each fix below cross-checked against the handler.

- api-commerce (product-catalog): tiers are flat objects with
  monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid
  tiers, price+period for free, price:null for enterprise. There is
  no nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts.
  Code is a deterministic 8-char HMAC of the Clerk userId; binding
  into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): actual action set is
  create-pairing-token, set-channel, set-web-push, delete-channel,
  set-alert-rules, set-quiet-hours, set-digest-settings. Replaced
  the made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100
  (default 50), not a severity string. Subscriber IDs are wh_+24hex;
  secret is raw 64-char hex (no whsec_ prefix). POST registration
  returns 201. Added the management routes: GET /{id},
  POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer
  RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[]
  and/or patterns[] (not {key} or {tag}), with explicit per-request
  caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not
  file=<id>. Response is a 302 to a GitHub release asset; documented
  the full platform/variant tables.
- mcp: server also accepts direct X-WorldMonitor-Key in addition to
  OAuth bearer. Fixed the curl example which was incorrectly sending
  a wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or videoId,
  not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect
  that OAuth is one of two accepted modes.
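The deterministic referral code described above can be sketched with Node's HMAC primitive (hash choice, secret name, and hex truncation are assumptions — only "deterministic 8-char HMAC of the Clerk userId" comes from the source):

```javascript
import { createHmac } from 'node:crypto';

// Illustrative: same userId + secret always yields the same 8-char code.
function referralCode(userId, secret) {
  return createHmac('sha256', secret).update(userId).digest('hex').slice(0, 8);
}
```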

* docs(mintlify): fix Greptile review findings

- mcp.mdx: 'Five' slow tools → 'Six' (list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator
  (8_400_000_000) with plain integer (8400000000)
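Numeric separators like 8_400_000_000 are a JavaScript literal feature, not part of JSON — a quick check of why the example had to change:

```javascript
// JSON.parse rejects underscores in numbers; the plain integer parses.
let threw = false;
try {
  JSON.parse('{"impactUsd": 8_400_000_000}');
} catch (e) {
  threw = true; // SyntaxError
}
console.log(threw); // true
console.log(JSON.parse('{"impactUsd": 8400000000}').impactUsd); // 8400000000
```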

Greptile's third finding — /api/oauth/register rate-limit contradiction
across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx — was already
resolved in commit 4f2600b2a (reviewed commit was eb5654647).
2026-04-19 15:03:16 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer's rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
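A slot of the YYYY-MM-DD-HHMM shape can be derived per compose run roughly like this (function name and the Intl usage are illustrative, not the composer's code):

```javascript
// Sketch: per-run slot in the user's tz (only the slot shape comes
// from the change above).
function briefSlot(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-CA', {
    timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hourCycle: 'h23',
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}

briefSlot(new Date(Date.UTC(2026, 3, 19, 4, 0)), 'Asia/Dubai'); // '2026-04-19-0800'
```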

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
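A slot-aware allowlist pattern of the shape described would look roughly like this (the real BRIEF_CAROUSEL_PATH_RE in middleware.ts may differ in detail):

```javascript
// Illustrative: accept slot-format carousel paths, reject the pre-slot
// YYYY-MM-DD shape so legacy links cannot leak the allowlist.
const SLOT_CAROUSEL_RE = /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/\d+$/;

console.log(SLOT_CAROUSEL_RE.test('/api/brief/carousel/user_123/2026-04-19-0800/1')); // true
console.log(SLOT_CAROUSEL_RE.test('/api/brief/carousel/user_123/2026-04-19/1'));      // false
```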

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
2026-04-19 14:15:59 +04:00
Elie Habib
56054bfbc1 fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key

PR #3204 shipped the right `includeFiles` value but the WRONG key:

  "api/brief/carousel/[userId]/[issueDate]/[page].ts"

Vercel's `functions` config keys are micromatch globs, not literal
paths. Bracketed segments like `[userId]` are parsed as character
classes (match any ONE character from {u,s,e,r,I,d}), so my rule
matched zero files and `includeFiles` was silently ignored.
Post-merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED on
every request. Build log shows zero mentions of `carousel` or
`resvg` — corroborates the key never applied.

Fix: wildcard path segments.

  "api/brief/carousel/**"

Matches any file under the carousel route dir. Since the only
deployed file there is the dynamic-segment handler, the effective
scope is identical to what I originally intended.

Added a second regression test that sweeps every functions key and
fails loudly if any bracketed segment slips back in. Guards against
future reverts AND against anyone copy-pasting the literal route
path without realising Vercel reads it as a glob.

23/23 deploy-config tests pass (was 22, +1 new guard).
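Glob character classes follow the same semantics as regex character classes, so the failure reproduces in isolation without Vercel at all:

```javascript
// [userId] as a character class matches exactly ONE character drawn
// from {u, s, e, r, I, d} — never the literal segment name.
const seg = /^[userId]$/;

console.log(seg.test('u'));      // true  — one char from the class
console.log(seg.test('d'));      // true
console.log(seg.test('userId')); // false — six chars, not one
console.log(seg.test('x'));      // false — not in the class
```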

* Address Greptile P2: widen bracket-literal guard regex

Greptile spotted that `/\[[A-Za-z]+\]/` only matches purely-alphabetic
segment names. Real-world Next.js routes often use `[user_id]`,
`[issue_date]`, `[page1]`, `[slug2024]` — none flagged by the old
regex, so the guard would silently pass on the exact kind of
regression it was written to catch.

Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
  - requires a leading letter (so legit char classes like `[0-9]`
    and `[!abc]` don't false-positive)
  - allows letters, digits, underscores after the first char
  - covers every Next.js-style dynamic-segment name convention

Also added a self-test that pins positive cases (userId, user_id,
issue_date, page1, slug2024) and negative cases (the actual `**`
glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex
breaks CI immediately instead of silently re-opening PR #3206.

24/24 deploy-config tests pass (was 23, +1 new self-test).
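The widened guard with its pinned cases, runnable standalone:

```javascript
// Leading letter required, then letters/digits/underscores — flags every
// Next.js-style dynamic segment while sparing legit glob char classes.
const BRACKET_SEGMENT = /\[[A-Za-z][A-Za-z0-9_]*\]/;

const flagged = ['[userId]', '[user_id]', '[issue_date]', '[page1]', '[slug2024]']
  .every((s) => BRACKET_SEGMENT.test(s));
const clean = ['**', '[0-9]', '[!abc]']
  .some((s) => BRACKET_SEGMENT.test(s));

console.log(flagged); // true  — every dynamic-segment spelling is caught
console.log(clean);   // false — legit glob syntax passes untouched
```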
2026-04-19 14:02:30 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line
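For reference, cosineSimilarity is the standard dot-product-over-norms form; a minimal version matching the exported name (the module's actual implementation may differ in detail):

```javascript
// Cosine similarity of two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1 — identical direction
cosineSimilarity([1, 0], [0, 1]); // 0 — orthogonal
```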

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle,
architecture-strategist, and code-simplicity reviewers. Fixes below;
all 64 dedup tests + 5825 data tests + 171 edge-function tests still
green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so
   later instrumentation / Edge-runtime shims don't see the wrapper —
   same class of bug as the banned `fetch.bind(globalThis)` pattern
   flagged in AGENTS.md.

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.
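The binding-capture difference behind fix 1 can be demonstrated without fetch itself:

```javascript
// Why aliasing freezes a binding while a wrapper stays live — the same
// class of bug as the banned fetch.bind(globalThis) pattern.
const api = { fetch: () => 'v1' };

const aliased = api.fetch;                       // captures v1 at module load
const wrapped = (...args) => api.fetch(...args); // looks the binding up per call

api.fetch = () => 'v2';                          // later instrumentation / shim

console.log(aliased()); // 'v1' — the shim never sees this call path
console.log(wrapped()); // 'v2' — the wrapper routes through the shim
```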

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.
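The filter-before-dedup ordering can be sketched as a pure function (record shape and field names are illustrative, not the real export):

```javascript
// Mode filter runs BEFORE the dedup check, so an agreeing occurrence of
// a pair never claims the dedup slot away from a later disagreeing one.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch) {
      if (mode === 'disagreements' && pair.agree) continue; // filter first
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue;                          // dedup second
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}
```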

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

The extractor was whitespace-tokenising and matching only single
tokens against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: a greedy longest-phrase scan with a sliding window. At each
token position, try the longest multi-word phrase first (down to
2 tokens) and require the first AND last tokens to be capitalised
(so lowercase prose like "the middle east" doesn't falsely match
while the headline "Middle East" does); lowercase connectors in
between are fine ("Strait of Hormuz" → phrase "strait of hormuz" ✓).
Falls back to single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
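The scan above can be sketched as follows — a hedged reconstruction from this commit's description, with a toy gazetteer; the real extractor also tracks actors and uses the full LOCATION_GAZETTEER:

```javascript
// Toy gazetteer; the real one lives in the dedup module.
const GAZETTEER = new Set([
  'red sea', 'south china sea', 'strait of hormuz',
  'abu dhabi', 'ho chi minh city', 'middle east',
]);
const MAX_PHRASE_LEN = 4; // longest entry: "ho chi minh city"
const isCapitalised = (w) => /^[A-Z]/.test(w);

function extractLocations(text) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    // Try the longest multi-word phrase first, down to 2 tokens.
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const slice = tokens.slice(i, i + len);
      // First AND last tokens must be capitalised; lowercase
      // connectors in between ("of") are fine.
      if (!isCapitalised(slice[0]) || !isCapitalised(slice[len - 1])) continue;
      const phrase = slice.join(' ').toLowerCase();
      if (GAZETTEER.has(phrase)) {
        found.push(phrase);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Fall back to single-token lookup.
      const single = tokens[i].toLowerCase();
      if (GAZETTEER.has(single)) found.push(single);
      i += 1;
    }
  }
  return found;
}
```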

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00
Elie Habib
27849fee1e fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn

Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED
since PR #3174 merged. Not middleware (the last PR fixed that
theoretical path but not the observed failure). The Vercel
function itself crashes with HTTP 500 FUNCTION_INVOCATION_FAILED on
every request, including OPTIONS: the isolate can't initialise.

The handler imports brief-carousel-render which lazy-imports
@resvg/resvg-js. That package's js-binding.js does runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel
Lambda (Amazon Linux 2 glibc) that resolves to
@resvg/resvg-js-linux-x64-gnu. Vercel's nft tracing does NOT
follow this conditional require, so the optional peer package
isn't bundled. Cold start throws MODULE_NOT_FOUND, the isolate
crashes, Vercel returns FUNCTION_INVOCATION_FAILED, Telegram
reports WEBPAGE_CURL_FAILED.

Fix: vercel.json functions.includeFiles forces the linux-x64-gnu
binding into the carousel function's bundle. Only this route
needs it; every other api route is unaffected.
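The shape of the fix, as a hedged sketch — the route path matches the carousel handler named elsewhere in this log, but the exact includeFiles glob is an assumption (vercel.json is strict JSON, so the rationale can't live inline):

```json
{
  "functions": {
    "api/brief/carousel/[userId]/[issueDate]/[page].ts": {
      "includeFiles": "node_modules/@resvg/resvg-js-linux-x64-gnu/**"
    }
  }
}
```

This assumes Vercel's x86_64 Amazon Linux 2 glibc runtime; a Graviton/arm64 migration would need the arm64 binding package instead.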

Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime
  binary on Amazon Linux 2 glibc

Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; next cron tick should clear the Railway
[digest] Telegram carousel 400 line.

* Address Greptile P2s: regression guard + arch-assumption reasoning

Two P2 findings on PR #3204:

P2 #1 (inline on vercel.json:6): Platform architecture assumption
undocumented. If Vercel migrates to Graviton/arm64 Lambda the
cold-start crash silently returns. vercel.json is strict JSON so
comments aren't possible inline.

P2 #2 (tests/deploy-config.test.mjs:17): No regression guard for
the carousel includeFiles rule. A future vercel.json tidy-up
could silently revert the fix with no CI signal.

Fixed both in a single block:

- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.

tests 22/22 pass (was 21, +1 new).
2026-04-19 13:36:17 +04:00
Elie Habib
45f02fed00 fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError

OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid re-tap),
leaking as an unhandled NotFoundError. OrbitControls is bundled into
main-*.js so hasFirstParty=true; matched by the unique setPointerCapture
message (grep confirms no first-party setPointerCapture usage).

Resolves WORLDMONITOR-NC.

* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack

Review feedback: suppressing by message alone would hide a future first-party
setPointerCapture regression. Mirror the existing OrbitControls filter's
provenance check — require absence of any source-mapped .ts/.tsx frame so the
filter only matches stacks whose only non-infra frame is the bundled main chunk.

Adds positive + negative regression tests for the pair.

* fix(sentry): gate OrbitControls filter on positive three.js context signature

Review feedback: absence of .ts/.tsx frames is not proof of third-party origin
because production stacks are often unsymbolicated. Replace the negative-only
gate with a positive OrbitControls signature — require a frame whose context
slice contains the literal `_pointers … setPointerCapture` adjacency unique to
three.js OrbitControls. Update tests to cover the production-realistic case
(unsymbolicated first-party bundle frame calling setPointerCapture must still
reach Sentry) plus a defensive no-context fallthrough.
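The gate can be sketched as follows — a hedged reconstruction over Sentry's event shape; the exact matcher is an assumption, simplified to string containment over each frame's context slice:

```javascript
// Hypothetical sketch of the positive-signature filter. The event
// field names (exception.values, stacktrace.frames, context_line,
// pre_context, post_context) follow Sentry's event schema.
function isOrbitControlsSetPointerCapture(event) {
  const top = event?.exception?.values?.[0];
  const msg = `${top?.type ?? ''}: ${top?.value ?? ''}`;
  if (!/NotFoundError/.test(msg) || !/setPointerCapture/.test(msg)) return false;
  const frames = top?.stacktrace?.frames ?? [];
  // Require the `_pointers … setPointerCapture` adjacency unique to
  // three.js OrbitControls somewhere in a frame's context slice. An
  // unsymbolicated first-party frame carries no such context, so
  // that event still reaches Sentry (the defensive fallthrough).
  return frames.some((f) => {
    const ctx = [...(f.pre_context ?? []), f.context_line ?? '', ...(f.post_context ?? [])].join('\n');
    return ctx.includes('_pointers') && ctx.includes('setPointerCapture');
  });
}
```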
2026-04-19 13:15:31 +04:00
Elie Habib
d7f87754f0 fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged that two transactional email templates still claimed '22 services' while /pro now advertises '30+':

- api/register-interest.js:90 — interest-registration confirmation email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription confirmation email ('22 services, one API key')

A user signing up via /pro would read '30+ services' on the page, then receive an email saying '22'. Both updated to '30+', matching the /pro page and the actual server domain count (31 in server/worldmonitor/*, plus api/scenario/v1/ = 32, and growing).
2026-04-19 13:15:17 +04:00
Elie Habib
135082d84f fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)

The /pro page advertised '22 services' / '22 service domains' but server/worldmonitor/, proto/worldmonitor/, and src/generated/server/worldmonitor/ all have 31 domain dirs (aviation, climate, conflict, consumer-prices, cyber, displacement, economic, forecast, giving, health, imagery, infrastructure, intelligence, maritime, market, military, natural, news, positive-events, prediction, radiation, research, resilience, sanctions, seismology, supply-chain, thermal, trade, unrest, webcam, wildfire). api/scenario/v1/ adds a 32nd recently shipped surface.

Used '30+' rather than the literal '31' so the page doesn't drift again every time a new domain ships (the '22' was probably accurate at one point too).

168 string substitutions across all 21 locale JSON files (8 keys each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc, pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey, apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta description, og:description, twitter:description, SoftwareApplication ld+json description + Pro Monthly offer, FAQ ld+json a8, noscript fallback). Bundle rebuilt.

* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
2026-04-19 13:07:07 +04:00
Elie Habib
cce46a1767 fix(pro): API tier is launched — drop 'Coming Soon' label (#3198)
The /pro comparison-table column header still read 'API (Coming Soon)' across all 21 locales (and locale-translated variants), but convex/config/productCatalog.ts has api_starter at currentForCheckout=true, publicVisible=true, priceCents=9999 — $99.99/month, with api_starter_annual at $999/year. The API tier is shipped and self-serve.

Updated pricingTable.apiHeader → 'API ($99.99)' for every locale, matching the same '<Tier> ($<price>)' pattern as 'Free ($0)' and 'Pro ($39.99)'. Bundle rebuilt.
2026-04-19 11:44:35 +04:00
Elie Habib
c7aacfd651 fix(health): persist WARNING events + add failure-log timeline (#3197)
* fix(health): persist WARNING events + add failure-log timeline

WARNING status (stale seeds) was excluded from the health:last-failure
Redis write (line 680 checked `!== 'WARNING'`). When UptimeRobot keyword-
checks for "HEALTHY" and gets a WARNING response, it flags DOWN, but no
forensic trail was left in Redis. This made stale-seed incidents invisible
to post-mortem investigation.

Changes:
- Write health:last-failure for ANY non-HEALTHY status (including WARNING)
- Add health:failure-log (LPUSH list, last 50 entries, 7-day TTL) so
  multiple incidents are preserved as a timeline, not just the latest
- Include warnCount alongside critCount in the snapshot
- Broaden the problems filter to capture all non-OK statuses

* fix(health): dedupe failure-log entries by incident signature

Repeated polls during one long WARNING window would LPUSH near-identical
snapshots, filling the 50-entry log and evicting older distinct incidents.

Now compares a signature (status + sorted problem set) against the previous
entry via health:failure-log-sig. Only appends when the incident changes.
The last-failure key is still updated every poll (latest timestamp matters).

* fix(health): add 4s timeout to persist pipelines + consistent arg types

Addresses greptile review on PR #3197:
- Both persist redisPipeline calls now pass 4_000ms timeout (main data
  pipeline uses 8_000ms; persist is less critical so shorter is fine)
- LTRIM/EXPIRE args use numbers consistently (was mixing number/string)

* fix(health): atomic sig swap via SET ... GET to eliminate dedupe race

Two concurrent /api/health requests could both read the old signature
before either write lands, appending duplicate entries. Now uses
SET key val EX ttl GET (Redis 6.2+) to atomically swap the sig and
return the previous value in one pipeline command. The LPUSH only
fires if the returned previous sig differs from the new one.

Also skips the second redisPipeline call entirely when sig matches
(no logCmds to send).

* fix(health): exclude seedAgeMin from dedupe sig + clear sig on recovery

Two issues with the failure-log dedupe:

1. seedAgeMin changes on every poll (e.g. 31min, 32min, 33min), so
   the signature changed every time and LPUSH still fired on every
   probe during a STALE_SEED window. Now uses a separate sigKeys
   array with only key:status (no age) for the signature, while
   problemKeys still includes ages for the snapshot payload.

2. The sig was never cleared on recovery. If the same problem set
   recurred after a healthy gap, the old sig (within its 24h TTL)
   would match and the recurrence would be silently skipped. Now
   DELs health:failure-log-sig when overall === 'HEALTHY'.

* fix(health): move sig write after LPUSH in same pipeline

The sig was written eagerly in the first pipeline (SET ... GET), but the
LPUSH happened in a separate background pipeline. If that second write
failed, the sig was already advanced, permanently deduping the incident
out of the timeline.

Now: GET sig first (read-only), then write last-failure + LPUSH + sig
all in one pipeline. The sig only advances if the entire pipeline
succeeds. Failure leaves the old sig in place so the next poll retries.

Reintroduces a small read-then-write race window (two concurrent probes
can both read the old sig), but the worst case is a single duplicate
entry, which is strictly better than a permanently dropped incident.
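The final persist flow across these commits can be sketched as follows — key names come from the commit text; `pipeline` is a hypothetical helper that sends an array of Redis commands (with a timeout in the real code) and resolves to their replies:

```javascript
// Signature is status + sorted problem KEYS only — no seed ages,
// which change every poll and would defeat the dedupe.
function incidentSig(status, problemKeys) {
  return `${status}|${[...problemKeys].sort().join(',')}`;
}

// Hedged sketch of the persist path for a non-HEALTHY poll. (The
// HEALTHY path, not shown, DELs health:failure-log-sig so a
// recurrence after recovery is logged again.)
async function persistFailure(pipeline, snapshot, problemKeys) {
  const sig = incidentSig(snapshot.status, problemKeys);
  // Read-only probe first: the sig must only advance if the
  // forensic writes below actually land.
  const [prevSig] = await pipeline([['GET', 'health:failure-log-sig']]);
  // last-failure is updated on EVERY poll (latest timestamp matters).
  const cmds = [['SET', 'health:last-failure', JSON.stringify(snapshot), 'EX', 604_800]];
  if (prevSig !== sig) {
    // New incident — append to the timeline and advance the sig in
    // the SAME pipeline, so a failed LPUSH leaves the old sig in
    // place and the next poll retries.
    cmds.push(
      ['LPUSH', 'health:failure-log', JSON.stringify(snapshot)],
      ['LTRIM', 'health:failure-log', 0, 49],
      ['EXPIRE', 'health:failure-log', 604_800],
      ['SET', 'health:failure-log-sig', sig, 'EX', 86_400],
    );
  }
  await pipeline(cmds);
}
```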
2026-04-19 10:14:19 +04:00
Elie Habib
63464775a5 feat(supply-chain): scenario UX — rich banner + projected score + faster poll (#3193)
* feat(supply-chain): rich scenario banner + projected score per chokepoint + faster poll

User reported Simulate Closure adds only a thin banner with no context —
"not clear what value user is getting, takes many many seconds". Four
targeted UX improvements in one PR:

A. Rich banner (scenario params + tagline)
   Banner now reads:
     ⚠ Hormuz Tanker Blockade · 14d · +110% cost
        CN 100% · IN 84% · TW 82% · IR 80% · US 39%
        Simulating 14d / 100% closure / +110% cost on 1 chokepoint.
        Chokepoint card below shows projected score; map highlights…
   Surfaces the scenario template fields (durationDays, disruptionPct,
   costShockMultiplier) + a one-line explainer so a first-time user
   understands what "CN 100%" actually means.

B. Projected score on each affected chokepoint card
   Card header now shows: `[current]/100 → [projected]/100` with a red
   trailing badge + red left border on the card body.
   Body prepends: "⚠ Projected under scenario: X% closure for N days
   (+Y% cost)".
   Projected = max(current, template.disruptionPct) — conservative
   floor since the real scoring mixes threat + warnings + anomaly.

C. Faster polling
   Status poll interval 2s → 1s. Max iterations 30→60 (unchanged 60s
   budget). Worker processes in <1s; perceived latency drops from
   2–3s to <2s in the common case. First poll still immediate.

D. ScenarioResult interface widened
   Added optional `template` and `currentDisruptionScores` fields in
   scenario-templates.ts to match what the scenario-worker already
   emits. Optional = backward-compat with map-only consumers.

Dependent on PR #3192 (already merged) which fixed the 10000% banner
% inflation.

* fix(supply-chain): trigger render() on scenario activate/dismiss — cards must re-render

PR review caught a real bug in the new scenario UX: showScenarioSummary
and hideScenarioSummary were mutating the banner DOM directly without
triggering render(). renderChokepoints() reads activeScenarioState to
paint the projected score + red border + callout, but those only run
during render() — so the cards stayed stale on activate AND on dismiss
until some unrelated re-render happened.

Refactor to split public API from internal rendering:

- showScenarioSummary(scenarioId, result) — now just sets state + calls
  render(). Was: set state + inline DOM mutation (bypassing card render).
- renderScenarioBanner() — new private helper that builds the banner
  DOM from activeScenarioState. Called from render()'s postlude
  (replacing the old self-recursive showScenarioSummary() call — which
  only worked because it had a side-effectful early-exit path that
  happened to terminate, but was a latent recursion risk).
- hideScenarioSummary() — now just sets state=null + calls render().
  Was: clear state + manual banner removal + manual button-text reset
  loop. The button loop is redundant now — the freshly-rendered card
  template produces buttons with default "Simulate Closure" text by
  construction.

Net effect: activating a scenario paints the banner AND the affected
chokepoint cards in a single render tick. Dismissing strips both in
the same tick.

* fix(supply-chain): derive scenario button state from activeScenarioState, not imperative mutation

PR review caught: the earlier re-render fix (showScenarioSummary → render())
correctly repaints cards on activate, but the button-state logic in
runScenario() is now wrong. render() detaches the old btn reference, so
the post-onScenarioActivate `resetButton('Active') + btn.disabled = true`
touches a detached node and no-ops (resetButton() explicitly skips
!btn.isConnected). The fresh button painted by render() uses the default
template text — the visible button reads "Simulate Closure", enabled, and users
can queue duplicate runs of an already-active scenario.

Fix: make button state a function of panel state.

- renderChokepoints() scenario section: check
  activeScenarioState.scenarioId === template.id and, when matched, emit
  the button with class `sc-scenario-btn--active`, text "Active", and
  `disabled` attribute. On dismiss, the next render strips those
  automatically — same pattern as the card projection styling.
- runScenario(): drop the dead `resetButton('Active')` + `btn.disabled`
  lines after onScenarioActivate. That path is now template-driven;
  touching the detached btn was the defect.

Catch-path resets ('Simulate Closure' on abort, 'Error — retry' on real
error) are unchanged — those fire BEFORE any render could detach the btn,
so the imperative path is still correct there.

* fix(supply-chain): hide scenario projection arrow when current already ≥ template

Greptile P1: projected badge was rendered as `N/100 → N/100` whenever
current disruptionScore already met or exceeded template.disruptionPct.
Visible for Suez (80%) or Panama (50%) scenarios when a chokepoint is
already elevated — read as "scenario has zero effect", which is misleading.

The two values live on different scales — cp.disruptionScore is a
computed risk score (threat + warnings + anomaly) while
template.disruptionPct is "% of capacity blocked" — but they share the
0–100 axis so directional comparison is still meaningful for the
"does this scenario escalate things?" signal.

Fix: arrow only renders when template.disruptionPct > cp.disruptionScore.
When current already equals or exceeds the scenario level, show the
single current badge. The card's red left border + "⚠ Projected under
scenario" callout still indicate the card is the scenario target —
only the escalation arrow is suppressed.
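The badge rule across these two commits reduces to a small pure function — a minimal sketch with illustrative names, not the real render path:

```javascript
// cp.disruptionScore (computed risk) and template.disruptionPct
// (% of capacity blocked) are different quantities but share the
// 0–100 axis, so directional comparison is meaningful.
function projectionBadge(currentScore, scenarioPct) {
  // Conservative floor: projected never drops below current.
  const projected = Math.max(currentScore, scenarioPct);
  // Arrow only when the scenario actually escalates; otherwise a
  // `N/100 → N/100` badge would read as "scenario has zero effect".
  return scenarioPct > currentScore
    ? `${currentScore}/100 → ${projected}/100`
    : `${currentScore}/100`;
}
```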
2026-04-19 09:25:55 +04:00
Elie Habib
85d6308ed0 fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196)
* fix(brief): allow Telegram/social UAs to fetch carousel images

middleware.ts BOT_UA regex (/bot/i) returned 403 on Telegram's sendMediaGroup
fetch of /api/brief/carousel/<u>/<d>/<p>. The SOCIAL_IMAGE_UA allowlist
(includes telegrambot) was scoped to /favico/* and .png suffix only;
carousel returns image/png but the URL has no extension.

Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED
and zero images above the Telegram brief.

Fix: extend the UA-bypass guard to cover the /api/brief/carousel/ prefix.
The HMAC token on the URL is the real auth; the UA allowlist is defence-in-depth.

* Address P2 + P3: regression test + route-shape regex

P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the
contract:
  - TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel
  - curl, generic bot UAs, missing UA still 403 on carousel
  - TelegramBot 403s on non-carousel API routes (scoped, not global)
  - Malformed carousel paths (admin/dashboard, page >= 3, non-ISO
    date) all still 403 via the regex
  - Normal browsers pass everywhere

P3: Replace startsWith('/api/brief/carousel/') prefix with
BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by
api/brief/carousel/[userId]/[issueDate]/[page].ts
(userId / YYYY-MM-DD / page 0|1|2). A future
/api/brief/carousel/admin or similar sibling cannot inherit the
bypass. Comment now lists every social-image UA this protects.

typecheck + typecheck:api clean. test:data 5772/5772.
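A plausible reconstruction of that route-shape gate — the constant name and the userId / YYYY-MM-DD / page 0|1|2 shape come from the commit text; the exact pattern and UA list are assumptions:

```javascript
// Only the exact carousel route shape may inherit the bypass; a
// future /api/brief/carousel/admin sibling fails this regex.
const BRIEF_CAROUSEL_PATH_RE = /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}\/[0-2]$/;
// Illustrative subset of the social-image UA allowlist.
const SOCIAL_IMAGE_UA = /telegrambot|slackbot|discordbot|linkedinbot/i;

function isCarouselBypass(pathname, userAgent) {
  return BRIEF_CAROUSEL_PATH_RE.test(pathname) && SOCIAL_IMAGE_UA.test(userAgent ?? '');
}
```

The HMAC token on the URL remains the real auth; this gate is defence-in-depth scoped to one route shape.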
2026-04-19 09:16:14 +04:00
Elie Habib
6025b0ce47 chore(sentry): add Chrome/Firefox variant of UTItemActionController filter (#3194)
The Safari variant (Can't find variable: UTItemActionController) was
already in ignoreErrors at line 53. Chrome/Firefox uses the "X is not
defined" format instead (WORLDMONITOR-NB). Added to the existing
"is not defined" group at line 119.
2026-04-19 08:58:07 +04:00
Elie Habib
434a2e0628 feat(settings): API Keys tab visible to all users with PRO upgrade CTA (#3190)
* feat(settings): show API Keys tab to all users with PRO upgrade CTA

Free users who clicked the API Keys tab triggered a server-side
ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA). Now the tab is
always visible with a PRO badge, and the content is gated client-side:

- Anonymous: lock icon + "Sign In" CTA (opens Clerk sign-in)
- Free: upgrade icon + "Upgrade to Pro" CTA (opens Dodo checkout)
- PRO: full key management UI (unchanged)

The Convex query is never called for non-PRO users, eliminating the
server error at the source while creating a natural upgrade funnel.
Reuses existing panel-locked-state CSS (gold accent, gradient button).

* fix(settings): gate API Keys on apiAccess feature, not isProUser

Addresses review findings on PR #3190:

1. Gate changed from isProUser() to hasFeature('apiAccess') — matches
   the server contract in convex/apiKeys.ts which requires apiAccess
   (tier 2+), not just PRO (tier 1). PRO users without apiAccess now
   correctly see the upgrade CTA instead of the full UI.

2. CTA button now launches API_STARTER_MONTHLY checkout instead of
   DEFAULT_UPGRADE_PRODUCT (PRO_MONTHLY) — users buy the correct
   product that actually includes API key access.

3. loadApiKeys() guard now checks both getAuthState().user AND
   hasFeature('apiAccess') — prevents anonymous keyed sessions
   (widget/pro keys without Clerk auth) from hitting the Convex
   query that requires authentication.
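The three-way gate in points 1–3 reduces to a small state function — a minimal sketch with illustrative names; the real panel renders full CTA markup per state:

```javascript
// Decide which API Keys panel state to render. authState/hasFeature
// mirror the getAuthState()/hasFeature() helpers named above.
function apiKeysPanelState(authState, hasFeature) {
  // Anonymous (including keyed widget/pro sessions without Clerk
  // auth): never hit the Convex query that requires authentication.
  if (!authState?.user) return 'sign-in';
  // apiAccess is the server contract (tier 2+), not just PRO.
  if (!hasFeature('apiAccess')) return 'upgrade';
  return 'manage'; // full key-management UI
}
```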

* fix(settings): re-render API Keys panel when entitlements arrive

On cold load, hasFeature('apiAccess') returns false until the Convex
entitlement subscription delivers data. A paid API Starter user who
opens settings before that snapshot arrives would see the upgrade CTA
and loadApiKeys() would be skipped.

Subscribes to onEntitlementChange() while the modal is open and
re-renders the api-keys panel content + re-attaches handlers when
entitlements change. Cleans up in close() and destroy().

Also extracts handler attachment into attachApiKeysHandlers() to
avoid duplicating the CTA click + input keydown wiring between
render() and the entitlement callback.
2026-04-19 08:24:10 +04:00
Elie Habib
7a99c3406e fix(supply-chain, news): scenario % double-multiply + scoreByEntities null-type TypeError (#3192)
Two unrelated issues reported from a live session (browser screenshot + console):

1. Scenario banner showed "CN 10000% · IN 8400% · TW 8200%"
   showScenarioSummary did (c.impactPct * 100).toFixed(0) but the
   scenario-worker already sends impactPct as a 0-100 integer:
     scripts/scenario-worker.mjs:295 — Math.min(Math.round((total / max) * 100), 100)
   Multiplying by 100 again inflated every percentage 100x.
   Fix: drop the extra * 100. 100 renders as "100%", 84 as "84%".

2. Sentry/console TypeError at parallel-analysis.ts scoreByEntities:
     [ParallelAnalysis] Error: TypeError: Cannot read properties of
     undefined (reading 'includes')
   The ML worker occasionally returns entities with undefined `type`
   or `text`. scoreByEntities did entities.filter(e => e.type.includes('LOC'))
   — NPE when e.type missing. Fix: narrow to a well-formed subset via
   a type guard on e?.type and e?.text strings before any string access.
   Apply the safe array everywhere downstream (locations/people/orgs +
   density + confidence) so the guard is the single source of truth.
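The guard for issue 2 can be sketched as follows — entity shape inferred from the commit; the real scoreByEntities computes far more than locations:

```javascript
// Narrow to well-formed entities before ANY string access; the ML
// worker occasionally emits entries with undefined `type` or `text`.
function wellFormedEntities(entities) {
  return (entities ?? []).filter(
    (e) => typeof e?.type === 'string' && typeof e?.text === 'string',
  );
}

function scoreByEntities(entities) {
  // The safe array is the single source of truth downstream.
  const safe = wellFormedEntities(entities);
  const locations = safe.filter((e) => e.type.includes('LOC')).map((e) => e.text);
  return { locations };
}
```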
2026-04-19 08:22:44 +04:00
Elie Habib
d8e479188a fix(supply-chain): don't CDN-cache empty chokepoint-history responses (#3189)
User reported "Transit history unavailable" persisting for Hormuz after
PR #3187 deployed. Direct Redis probe confirms
supply_chain:transit-summaries:history:v1:hormuz_strait has 174 entries.
Direct server-side curl to /api/supply-chain/v1/get-chokepoint-history
also returns 174. But the user's browser kept receiving
`{"history":[],"fetchedAt":"0"}`.

Root cause: gateway cache tier `slow` pins 200 responses for 30 min at
Cloudflare edge (s-maxage=1800). During the gap between Vercel instant
deploy and Railway ais-relay redeploy + first transit-summary cron tick
(~20 min), per-id history keys were absent, so the handler returned
empty. Those empty bodies got CF-cached and kept serving for 30 min
AFTER the keys were populated in Redis. Bab-el-Mandeb (which DOES render
for the user) got a fresh non-empty cache entry; Hormuz got stuck with
the empty one.

Fix: when returning empty (missing key, invalid id, error), call
markNoCacheResponse(ctx.request) so the gateway sets Cache-Control:
no-store instead of the 30-min tier cache. Every call on an empty
state re-checks Redis. Once data is present, the normal tier cache
applies on the non-empty response.

Mechanism: the gateway at server/gateway.ts:488 honors X-No-Cache
header via the same side-channel (response-headers.ts). Pattern already
used by other handlers for upstream-unavailable bodies.

Cost: per-id history keys are ~35KB, edge→Upstash round-trip <1.5s.
Slight Redis-traffic bump for as long as keys stay empty; negligible
in practice (only the deploy window).

Also no-caches invalid chokepoint IDs so scanners/junk IDs don't pin
30-min empties either.
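The empty-means-no-store rule can be sketched as follows — a hedged sketch where `markNoCache` stands in for `markNoCacheResponse(ctx.request)` in the real handler:

```javascript
function respondHistory(history, markNoCache) {
  if (!Array.isArray(history) || history.length === 0) {
    // Don't let the `slow` tier (s-maxage=1800) pin an empty body at
    // the Cloudflare edge: force Cache-Control: no-store so every
    // call on an empty state re-checks Redis.
    markNoCache();
    return { history: [], fetchedAt: '0' };
  }
  // Non-empty responses keep the normal 30-min tier cache.
  return { history, fetchedAt: String(Date.now()) };
}
```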
2026-04-19 07:33:34 +04:00
Elie Habib
d7e40bc4e5 chore(sentry): filter NS_ERROR_UNEXPECTED + ConvexError API_ACCESS_REQUIRED (#3188)
NS_ERROR_UNEXPECTED (WORLDMONITOR-N6/N7/N8/N9): Firefox 149/Ubuntu XPCOM
Worker init failure. Same family as already-filtered NS_ERROR_ABORT and
NS_ERROR_OUT_OF_MEMORY. 0 repo matches. Worker fallback confirmed working
via breadcrumbs ("keeping flat list").

ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA, 15 events/5 users):
expected business error from PR #3125 (API key management). A free user opens
the API Keys tab, the server correctly denies, and the client try/catch at
UnifiedSettings.ts:731 handles it gracefully. The Convex WS transport leaks the
rejection to Sentry before the client Promise chain catches it.
2026-04-18 23:51:12 +04:00
Elie Habib
96fca1dc2b fix(supply-chain): popup-keyed history re-query + dataAvailable flag (#3187)
* fix(supply-chain): popup-keyed history re-query + dataAvailable flag for partial coverage

Two P1 findings on #3185 post-merge review:

1. MapPopup cross-chokepoint history contamination
   Popup's async history resolve re-queried [data-transit-chart] without a
   cpId key. User opens popup A → fetch starts for cpA; user opens popup B
   before it resolves → cpA's history mounts into cpB's chart container.
   Fix: add data-transit-chart-id keyed by cpId; re-query by it on resolve.
   Mirrors SupplyChainPanel's existing data-chart-cp-id pattern.

2. Partial portwatch coverage still looked healthy
   Previous fix emits all 13 canonical summaries (zero-state fill for
   missing IDs) and records pwCovered in seed-meta, but:
   - get-chokepoint-status still zero-filled missing chokepoints and cached
     the response as healthy — panel rendered silent empty rows.
   - api/health.js only degrades on recordCount=0, so 10/13 partial read
     as OK despite the UI hiding entire chokepoints.
   Fix:
   - proto: TransitSummary.data_available (field 12). Writer tags with
     Boolean(cpData). Status RPC passes through; defaults true for pre-fix
     payloads (absence = covered).
   - Status RPC writes seed-meta recordCount as covered count (not shape
     size), and flips response-level upstreamUnavailable on partial.
   - api/health.js: new minRecordCount field on SEED_META entries + new
     COVERAGE_PARTIAL status (warn rollup). chokepoints entry declares
     minRecordCount: 13. recordCount < 13 → COVERAGE_PARTIAL.
   - Client (panel + popup): skip stats/chart rendering when
     !dataAvailable; show "Transit data unavailable (upstream partial)"
     microcopy so users understand the gap.

5759/5759 data tests pass. Typecheck + typecheck:api clean.

* fix(supply-chain): guarantee Simulate Closure button exits Computing state

User reports "Simulate Closure does nothing beyond write Computing…" — the
button sticks at Computing forever. Two causes:

1. Scenario worker appears down (0 scenario-result:* keys in Redis in the
   last 24h of 24h-TTL). Railway-side — separate intervention needed to
   redeploy scripts/scenario-worker.mjs.

2. Client leaked the "Computing…" state on multiple exit paths:
   - signal.aborted early-return inside the poll loop never reset the
     button. Second click fired abort on first → first returned without
     resetting → button stayed "Computing…" until next render.
   - !this.content.isConnected early-return also skipped reset (less
     user-visible but same class of bug).
   - catch block swallowed AbortError without resetting.
   - POST /run had no hard timeout — a hanging edge function left the
     button in Computing indefinitely.

Fix:
- resetButton(text) helper touches the btn only if still connected;
  applied in every exit path (abort, timeout, post-success, catch).
- AbortSignal.any([caller, AbortSignal.timeout(20_000)]) on POST /run.
- console.error on failure so Simulate Closure errors surface in ops.
- Error message includes "scenario worker may be down" on loop timeout
  so operators see the right suspect.

Backend observations (for follow-up):
- Hormuz backend is healthy (/api/health chokepoints OK, 13 records,
  1 min old; live RPC has hormuz_strait.riskLevel=critical, wow=-22,
  flowEstimate present; GetChokepointHistory returns 174 entries).
  User-reported "Hormuz empty" is likely browser/CDN stale cache from
  before PR #3185; hard refresh should resolve.
- scenario-worker.mjs has zero result keys in 24h. Railway service
  needs verification/redeployment.

* fix(scenario): wrong Upstash RPUSH format silently broke every Simulate Closure

Railway scenario-worker log shows every job failing field validation since
at least 03:06Z today:

  [scenario-worker] Job failed field validation, discarding:
    ["{\"jobId\":\"scenario:1776535792087:cynxx5v4\",...

The leading [" in the payload is the smoking gun. api/scenario/v1/run.ts
was POSTing to /rpush/{key} with body `[payload]`, expecting Upstash to
unpack the array and push one string value. Upstash does NOT parse that
form — it stored the literal `["{...}"]` string as a single list value.

Worker BLMOVEs the literal string → JSON.parse → array → destructuring
`{jobId, scenarioId, iso2}` from an array yields undefined for all three
→ every job discarded without writing a result. Client poll returns
`pending` for the full 60s timeout, then (on the prior client code path)
leaked the stuck "Computing…" button state indefinitely.

Fix: use the standard Upstash REST command format — POST to the base URL
with body `["RPUSH", key, value]`. Matches scripts/ais-relay.cjs upstashLpush.

After this, the scenario-queue:pending list stores the raw payload string,
BLMOVE returns the payload, JSON.parse gives the object, validation passes,
computeScenario runs, result key gets written, client poll sees `done`.

Zero result keys existed in prod Redis in the last 24h (24h TTL on
scenario-result:*) — confirms the fix addresses the production outage.
2026-04-18 23:38:33 +04:00
Elie Habib
d37ffb375e fix(referral): stop /api/referral/me 503s on prod homepage (#3186)
* fix(referral): make /api/referral/me non-blocking to stop prod 503s

Reported in prod: every PRO homepage load was logging 'GET /api/referral/me 503' to Sentry. Root cause: a prior review required the Convex binding to block the response (rationale: don't hand users a dead share link). That turned any flaky relay call into a homepage-wide 503 for the 5-minute client cache window — every PRO user, every page reload.

Fix: dispatch registerReferralCodeInConvex via ctx.waitUntil. Response returns 200 + code + shareUrl unconditionally. Binding failures log a warning but never surface as 503. The mutation is idempotent; the next /api/referral/me fetch retries. The /pro?ref=<code> signup side reads userReferralCodes at conversion time, so a missed binding degrades to missed attribution (partial), never to blocked homepage (total).

The BRIEF_URL_SIGNING_SECRET-missing 503 path is unchanged — that's a genuine misconfig, not a flake.

Handler signature now takes ctx with waitUntil, matching api/notification-channels.ts and api/discord/oauth/callback.ts.

Regression test flipped: brief-referral-code.test.mjs previously enforced the blocking shape; now enforces the non-blocking shape + handler signature + explicit does-not-503-on-binding-failure assertion. 14/14 referral tests pass. Typecheck clean, 5706/5706 test:data, lint exit 0.

* fix(referral): narrow err in non-blocking catch instead of unsafe cast

Greptile P2 on #3186. The (err as Error).message cast was safe today (registerReferralCodeInConvex only throws Error instances) but would silently log 'undefined' if a future path ever threw a non-Error value. Swapped to instanceof narrow + String(err) fallback.
2026-04-18 23:32:48 +04:00
Elie Habib
3c47c1b222 fix(supply-chain): split chokepoint transit data + close silent zero-state cache (#3185)
* fix(supply-chain): split chokepoint transit data + close silent zero-state cache

Production supply-chain panel was rendering 13 empty chokepoints because
the getChokepointStatus RPC silently cached zero-state for 5 minutes:

1. supply_chain:transit-summaries:v1 grew to ~500 KB (180d × 13 × 14 fields
   of history per chokepoint).
2. REDIS_OP_TIMEOUT_MS is 1.5 s. Vercel Sydney edge → Upstash for a 500 KB
   GET consistently exceeded the budget; getCachedJson caught the AbortError
   and returned null.
3. The 500 KB portwatch fallback read hit the same timeout.
4. summaries = {} → every summaries[cp.id] was undefined → 13 chokepoints
   got the zero-state default → cached as a non-null success response for
   REDIS_CACHE_TTL (5 min) instead of NEG_SENTINEL (120 s).

Fix (one PR, per docs/plans/chokepoint-rpc-payload-split.md):

- ais-relay.cjs: split seedTransitSummaries output.
  - supply_chain:transit-summaries:v1 — compact (~30 KB, no history).
  - supply_chain:transit-summaries:history:v1:{id} — per chokepoint
    (~35 KB each, 13 keys). Both under the 1.5 s Redis read budget.
- New RPC GetChokepointHistory: lazy-loaded on card expand.
- get-chokepoint-status.ts: drop the 500 KB portwatch/corridorrisk/
  chokepoint_transits fallback reads. Treat a null transit-summaries
  read as upstreamUnavailable=true so cachedFetchJson writes NEG_SENTINEL
  (2 min) instead of a 5-min zero-state pin. Omit history from the
  response (proto field stays declared; empty array).
- server/_shared/redis.ts: tag AbortError timeouts with [REDIS-TIMEOUT]
  key=… timeoutMs=… so log drains / Sentry-Vercel integration pick up
  large-payload timeouts instead of them being silently swallowed.
- SupplyChainPanel.ts + MapPopup.ts: lazy-fetch history on card expand
  via fetchChokepointHistory; session-scoped cache; graceful "History
  unavailable" on empty/error. PRO gating on the map popup unchanged.
- Gateway: cache-tier entry for /get-chokepoint-history (slow).
- Tests: regression guards for upstreamUnavailable gate + per-id key
  shape + handler wiring + proto query annotations.
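The null-read → NEG_SENTINEL rule at the heart of the fix can be sketched as follows (hypothetical helper name and shapes; the TTL values mirror the commit, the real logic lives in get-chokepoint-status.ts/cachedFetchJson):

```javascript
// Hypothetical sketch of the cache decision the fix enforces.
const REDIS_CACHE_TTL = 300;  // seconds, successful responses (5 min)
const NEG_SENTINEL_TTL = 120; // seconds, upstream-unavailable (2 min)

function chooseCacheWrite(summaries) {
  // A null read (Redis timeout) must never be cached as a success:
  // before the fix, `summaries = {}` produced 13 zero-state rows that
  // were pinned as a non-null response for the full 5 minutes.
  const upstreamUnavailable =
    summaries == null || Object.keys(summaries).length === 0;
  return upstreamUnavailable
    ? { kind: 'neg-sentinel', ttl: NEG_SENTINEL_TTL }
    : { kind: 'success', ttl: REDIS_CACHE_TTL };
}
```

The 2-min negative cache is what makes the zero-state self-heal on the next relay tick instead of sticking.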

Audit included in plan: no other RPC consumer read stacks >200 KB
besides displacement:summary:v1:2026 (724 KB, same risk, flagged for
follow-up PR). wildfire:fires:v1 at 1.7 MB loads via bootstrap (3 s
timeout, different path) — monitor but out of scope.

Expected impact:
- supply_chain:chokepoints:v4 payload drops from ~508 KB to <100 KB.
- supply_chain:transit-summaries:v1 drops from ~502 KB to <50 KB.
- RPC Redis reads stay well under 1.5 s in the hot path.
- Silent zero-state pinning is now impossible: null reads → 2-min neg
  cache → self-heal on next relay tick.

* fix(supply-chain): address PR #3185 review — stop caching empty/error + fix partial coverage

Two P1 regressions caught in review:

1. Client cache poisoning on empty/error (MapPopup.ts, SupplyChainPanel.ts)
   Empty-array is truthy in JS, so MapPopup's `!cached && !inflight` branch
   never fired once we cached []. Neither `cached && cached.length` fired
   either — popup stuck on "Loading transit history..." for the session.
   SupplyChainPanel had the explicit `cached && !cached.length` branch but
   still never retried, so the same transient became session-sticky there too.

   Fix: cache ONLY non-empty successful responses. Empty/error show the
   "History unavailable" placeholder but leave the cache untouched, so the
   next re-expand retries. The /get-chokepoint-history gateway tier is
   "slow" (5-min CF edge cache) → retries stay cheap.

2. Partial portwatch coverage treated as healthy (ais-relay.cjs)
   seedTransitSummaries iterated Object.entries(pw), so if seed-portwatch
   dropped N of 13 chokepoints (ArcGIS reject/empty), summaries had <13 keys.
   get-chokepoint-status upstreamUnavailable fires only on fully-empty
   summaries, so the N missing chokepoints fell through to zero-state rows
   that got pinned in cache for 5 minutes.

   Fix: iterate CANONICAL_IDS (Object.keys(CHOKEPOINT_THREAT_LEVELS)) and
   fill zero-state for any ID missing from pw. Shape is consistently 13
   keys. Track pwCovered → envelope + seed-meta recordCount reflect real
   upstream coverage (not shape size), so health.js can distinguish 13/13
   healthy from 10/13 partial. Warn-log on shortfall.
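The cache-only-non-empty rule from finding 1 can be sketched as below (illustrative names; the real panels key a session-scoped cache per chokepoint id):

```javascript
// Sketch: cache ONLY non-empty successful responses, so a transient
// empty/error result is retried on the next expand instead of being
// session-sticky.
const historyCache = new Map();

async function getHistory(id, fetchHistory) {
  if (historyCache.has(id)) return historyCache.get(id);
  let rows = [];
  try {
    rows = await fetchHistory(id);
  } catch {
    rows = []; // renders the "History unavailable" placeholder
  }
  // [] is truthy in JS, so an `if (cached)` guard cannot distinguish
  // "cached empty" from "cached data"; gate the write on length.
  if (rows.length > 0) historyCache.set(id, rows);
  return rows;
}
```

Retries stay cheap because the gateway tier for /get-chokepoint-history is "slow" (5-min CF edge cache).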

Tests: new regression guards
- panel must NOT cache empty arrays (historyCache.set with []).
- writer must iterate CANONICAL_IDS, not Object.entries(pw).
- seed-meta recordCount binds to pwCovered.

5718/5718 data tests pass. typecheck + typecheck:api clean.
2026-04-18 23:14:00 +04:00
Elie Habib
d1e084061d fix(sw): preserve open modals when tab-hide auto-reload would fire (#3184)
* fix(sw): preserve open modals when tab-hide auto-reload would fire

Scenario: a Pro user opens the Clerk sign-in modal, enters their email,
and switches to their mail app to fetch the code. If a deploy happens
while they wait and the SW update toast's 5 s dwell window has elapsed,
`visibilitychange: hidden` triggers `window.location.reload()` — which
wipes the Clerk flow, so the code in the inbox is for a now-dead attempt
and the user has to re-request. Same failure applies to UnifiedSettings,
the ⌘K search modal, story/signal popups, and anything else with modal
semantics: leaving the tab = lose your place.

Fix: in `sw-update.ts`, the hidden-tab auto-reload now checks for any
open modal/dialog via a compound selector (`[aria-modal="true"],
[role="dialog"], .modal, .cl-modalBackdrop, dialog[open]`) and
suppresses the reload when one matches. Covers Clerk's `.cl-modalBackdrop`,
the site-wide `.modal` convention (UnifiedSettings, WidgetChatModal),
and any well-authored dialog. The reload stays armed — next tab-hide
after the modal closes fires it. Manual "Reload" button click is
unaffected (explicit user intent).

Over-matching is safe (worst case: user clicks Reload manually).
Under-matching keeps the bug, so the selector errs generous.

Tests: three new cases cover modal-open suppression, re-arming after
modal close, and manual-click bypass. 25/25 sw-update tests pass.

Follow-up ticket worth filing: add `aria-modal="true"` + `role="dialog"`
to the modals that are missing them (SearchModal, StoryModal, SignalModal,
WidgetChatModal, McpConnectModal, MobileWarningModal, CountryIntelModal,
UnifiedSettings). That's the proper long-term a11y fix and would let us
narrow the selector once coverage is complete.

* fix(sw): filter modal guard by actual visibility, not just DOM presence

Addresses review feedback on #3184:

The previous selector (`[role="dialog"]` etc.) matched the UnifiedSettings
overlay, which is created in its constructor at app startup
(App.ts:977 → UnifiedSettings.ts:68-71 sets role="dialog") and stays in
the DOM for the whole session. That meant auto-reload was effectively
disabled for every user, not just those with an actually-open modal.

Fix: don't just check for selector matches — check whether the matched
element is actually rendered. Persistent modal overlays hide themselves
via `display: none` (main.css:6744: `.modal-overlay { display: none }`)
and reveal via an `.active` class (main.css:6750: `.active { display: flex }`),
so `offsetParent === null` cleanly distinguishes closed from open. We
prefer `checkVisibility()` where available (Chrome 105+, Safari 17.4+,
Firefox 125+, which covers virtually all current WM users) and fall back
to `offsetParent` otherwise.

This also handles future modals automatically, without needing us to
enumerate every `.xxx-modal-overlay.active` class the site might
introduce.

New tests:
- Modal mounted AND visible → reload suppressed (original Clerk case)
- Modal mounted but hidden → reload fires (reviewer's regression case)
- Modal visible, then hidden on return → reload fires on next tab-hide
- Manual Reload click unaffected in all cases

26/26 sw-update tests pass.

* fix(sw): replace offsetParent fallback with getClientRects for fixed overlays

Addresses second review finding on #3184:

The previous fallback `el.offsetParent !== null` silently failed on every
`position: fixed` overlay — which is every modal in this app:

- `.modal-overlay` (main.css:6737) — UnifiedSettings, WidgetChatModal
- `.story-modal-overlay` (main.css:3442)
- `.country-intel-modal-overlay` active state (main.css:18415)

MDN: `offsetParent` is specified to return null for any `position: fixed`
element, regardless of visibility. So on Firefox <125 or Safari <17.4
(where `Element.checkVisibility()` is unavailable), `isModalOpen` would
return false for actually-open modals → auto-reload fires → Clerk sign-in
and every other fixed-position flow gets wiped exactly as PR #3184 was
meant to prevent.

Fix: fall back to `getClientRects().length > 0`. This returns 0 for
`display: none` elements (how `.modal-overlay` hides when `.active` is
absent) and non-zero for rendered elements, including position:fixed.
It's universally supported and matches the semantics we want.

New tests exercise the fallback path explicitly with a `supportsCheckVisibility`
toggle on the fake env:

- visible position:fixed modal + no checkVisibility → reload suppressed
- hidden mounted modal + no checkVisibility → reload fires

28/28 sw-update tests pass.
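The final visibility check can be sketched as follows (a sketch over element-like objects; the real code queries the compound modal selector first):

```javascript
// Prefer Element.checkVisibility() (Chrome 105+, Safari 17.4+,
// Firefox 125+); fall back to getClientRects(), which, unlike
// offsetParent, works for position:fixed overlays.

function isElementRendered(el) {
  if (typeof el.checkVisibility === 'function') return el.checkVisibility();
  // offsetParent is specified to be null for any position:fixed
  // element, so it cannot distinguish a hidden fixed modal from a
  // visible one. getClientRects() returns an empty list for
  // display:none elements and a non-empty one for anything rendered.
  return el.getClientRects().length > 0;
}
```

With this predicate, a mounted-but-hidden `.modal-overlay` no longer suppresses the auto-reload, and an open one does, on both the modern and fallback paths.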

* fix(a11y): add role=dialog + aria-modal=true to five missing modals

Addresses third review finding on #3184.

SW auto-reload guard uses a `[role="dialog"]` selector but five modals
were missing the attribute, so `isModalOpen()` returned false and the
page could still auto-reload mid-flow on those screens. Broadening the
selector to enumerate specific class names was rejected because the app
has many non-modal `-overlay` classes (`#deckgl-overlay`,
`.conflict-label-overlay`, `.layer-warn-overlay`, `.mobile-menu-overlay`)
that would cause false positives and permanently disable auto-reload.

Instead, standardize on the existing convention used by UnifiedSettings:
every modal overlay sets `role="dialog"` + `aria-modal="true"` at
creation. This makes the SW selector work AND improves screen-reader
behavior (focus trap, background element suppression).

Modals updated:
- SearchModal (⌘K search) — both mobile sheet and desktop variants use
  the same element, single set-attributes call at create time
- StoryModal (news story detail)
- SignalModal (instability spike detail)
- CountryIntelModal (country deep-dive overlay)
- MobileWarningModal (mobile device warning)

No change to sw-update.ts — the existing selector already covers the
newly-attributed elements. All 28 sw-update tests still pass; typecheck
clean.
2026-04-18 22:54:58 +04:00
Elie Habib
55ac431c3f feat(brief): public share mirror + in-magazine Share button (#3183)
* feat(brief): public share mirror + in-magazine Share button

Adds the growth-vector piece listed under Future Considerations in the
original brief plan (line 399): a shareable public URL and a one-click
Share button on the reader's magazine.

Problem: the per-user magazine at /api/brief/{userId}/{issueDate} is
HMAC-signed to a specific reader. You cannot share the URL you are
looking at, because the recipient either 403s (bad token) or reads
your personalised issue against your userId. Result: no way to share
the daily brief, no way for readers to drive discovery. Opening a
growth loop requires a separate public surface.

Approach: deterministic HMAC-derived short hash per {userId,
issueDate} backed by a pointer key in Redis.

New files

- server/_shared/brief-share-url.ts
  Web Crypto HMAC helper. deriveShareHash returns 12 base64url chars
  (72 bits) from (userId, issueDate) using BRIEF_SHARE_SECRET.
  Pointer encode/decode helpers and a shape check. Distinct from the
  per-user BRIEF_URL_SIGNING_SECRET so a leak of one does not
  automatically unmask the other.

- api/brief/share-url.ts (edge, Clerk auth, Pro gated)
  POST /api/brief/share-url?date=YYYY-MM-DD
  Idempotently writes brief:public:{hash} pointer with the same 7 day
  TTL as the underlying brief, then returns {shareUrl, hash,
  issueDate}. 404 if the per-user brief is missing. 503 on Upstash
  failure. Accepts an optional refCode in the JSON body for referral
  attribution.

- api/brief/public/[hash].ts (edge, unauth)
  GET /api/brief/public/{hash}?ref={code}
  Reads pointer, reads the real brief envelope, renders with
  publicMode=true. Emits X-Robots-Tag: noindex,nofollow so shared
  briefs never get enumerated by search engines. 404 on any missing
  part (bad hash shape, missing pointer, missing envelope) with a
  neutral error page. 503 on Upstash failure.

Renderer changes (server/_shared/brief-render.js)

- Signature extended: renderBriefMagazine(envelope, options?)
  - options.publicMode: redacts user.name and whyMatters before any
    HTML emission; swaps the back cover to a Subscribe CTA; prepends
    a Subscribe strip across the top of the deck; omits the Share
    button + share script; adds a noindex meta tag.
  - options.refCode: appended as ?ref= to /pro links on public views.
- Non-public views gain a sticky .wm-share pill in the top-right
  chrome. Inline SHARE_SCRIPT handles the click flow: POST /api/
  brief/share-url then navigator.share with clipboard fallback and a
  prompt() ancient-browser fallback. User-visible feedback via
  data-state on the button (sharing / copied / error). No change to
  the envelope contract, no LLM calls, no composer-side work
  required.
- Validation runs on the full unredacted envelope first, so the
  public path can never accept a shape the private path would reject.

Tests

- tests/brief-share-url.test.mts (18 assertions): determinism,
  secret sensitivity, userId/date sensitivity, shape validation, URL
  composition with/without refCode, trailing-slash handling on
  baseUrl, pointer encode/decode round-trip.
- tests/brief-magazine-render.test.mjs (+13 assertions): Share
  button carries the issue date; share script emitted once;
  share-url endpoint wired; publicMode strips the button+script,
  replaces whyMatters, emits noindex meta, prepends Subscribe strip,
  passes refCode through with escaping, swaps the back cover, does
  not leak the user name, preserves story headlines, options-less
  call matches the empty-options call byte for byte.
- Full typecheck/lint/edge-bundle/test:data/edge-functions suite all
  green: 5704/5704 data tests, 171/171 edge-function tests, 0 lint
  errors.

Env vars (new)

- BRIEF_SHARE_SECRET: 64+ random hex chars, Vercel (edge) only. NOT
  needed by the Railway composer because pointer writes are lazy
  (on share, not on compose).

* fix(brief): public share round-trip + magazine Share button without auth

Two P1 findings on #3183 review.

1) Pointer wire format: share-url.ts wrote the pointer as a raw colon-delimited string via SET. The public route reads via readRawJsonFromUpstash, which ALWAYS JSON.parses. A bare non-JSON string throws at parse, so the route returned 503 instead of resolving. Fix: JSON.stringify on both write sites. Regression test locks the wire format.

2) Share button auth unreachable from a standalone magazine tab: inline script needed window.WM_CLERK_JWT which is never set, endpoint hard-requires Bearer, fallback to credentials:include fails. Fix: derive share URL server-side in the per-user route (same inputs share-url uses), embed as data-share-url, click handler now reads dataset and invokes navigator.share directly. No network, no auth, works in any tab.

The /api/brief/share-url endpoint stays in place for other callers (dashboard panel) with its Clerk auth intact and its pointer write now in the correct format.

QA: typecheck clean, 5708/5708 data tests, 45/45 magazine, 20/20 share-url, edge bundle OK, lint exit 0.
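The wire-format contract from finding 1 can be sketched as follows (the in-memory Map stands in for Upstash; key and pointer shapes follow the commit):

```javascript
// The reader always JSON.parses what it gets back, so the writer must
// JSON.stringify, even though the pointer itself is just a string.
const store = new Map();

function writePointer(hash, userId, issueDate) {
  const pointer = `${userId}:${issueDate}`; // colon-delimited pointer
  store.set(`brief:public:${hash}`, JSON.stringify(pointer));
}

function readPointer(hash) {
  const raw = store.get(`brief:public:${hash}`);
  if (raw === undefined) return null;
  // readRawJsonFromUpstash ALWAYS parses; a bare string here throws
  // and surfaced as the 503 the fix removed.
  return JSON.parse(raw);
}
```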

* fix(brief): address remaining review findings on #3183

P0-2 (comment-only): public/[hash].ts inline comment incorrectly described readRawJsonFromUpstash parse-failure behaviour. The helper rethrows on JSON.parse failure, it does not return null. Rewrote the comment to match reality (JSON-encoded wire format, parse-to-string round-trip, intentional 503-on-bug-value as the loud failure mode). The actual wire-format fix was in prior commit 045771d55.

P2 (consistency): publicStripHtml href was built via template literal + encodeURIComponent without the final escapeHtml wrap that renderBackCover uses. Safe in practice (encodeURIComponent handles all HTML-special chars + route boundary restricts refCode to [A-Za-z0-9_-]) but inconsistent. Unified by extracting publicStripHref and escaping on interpolation, matching the sibling function.

QA: typecheck clean, 45/45 magazine tests pass, lint exit 0.
2026-04-18 22:46:22 +04:00
Elie Habib
81536cb395 feat(brief): source links, LLM descriptions, strip suffix (envelope v2) (#3181)
* feat(brief): source links, LLM descriptions, strip publisher suffix (envelope v2)

Three coordinated fixes to the magazine content pipeline.

1. Headlines were ending with " - AP News" / " | Reuters" etc. because
   the composer passed RSS titles through verbatim. Added
   stripHeadlineSuffix() in brief-compose.mjs: a conservative,
   case-insensitive match only when the trailing token equals primarySource,
   so a real subtitle that happens to contain a dash still survives.

2. Story descriptions were the headline verbatim. Added
   generateStoryDescription to brief-llm.mjs, plumbed into
   enrichBriefEnvelopeWithLLM: one additional LLM call per story,
   cached 24h on a v1 key covering headline, source, severity,
   category, country. Cache hits are revalidated via
   parseStoryDescription so a bad row cannot flow to the envelope.
   Falls through to the cleaned headline on any failure.

3. Source attribution was plain text, no outgoing link. Bumped
   BRIEF_ENVELOPE_VERSION to 2, added BriefStory.sourceUrl. The
   composer now plumbs story:track:v1.link through
   digestStoryToUpstreamTopStory, UpstreamTopStory.primaryLink,
   filterTopStories, BriefStory.sourceUrl. The renderer wraps the
   Source line in an anchor with target=_blank, rel=noopener
   noreferrer, and UTM params (utm_source=worldmonitor,
   utm_medium=brief, utm_campaign=<issueDate>, utm_content=story-
   <rank>). UTM appending is idempotent, publisher-attributed URLs
   keep their own utm_source.
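The conservative suffix strip from item 1 can be sketched as below (an illustrative implementation, not the one in brief-compose.mjs):

```javascript
// Strip a trailing " - Source" / " | Source" only when the text after
// the final separator equals the story's primarySource,
// case-insensitively, so a real subtitle containing a dash survives.
function stripHeadlineSuffix(headline, primarySource) {
  const m = headline.match(/^(.*\S)\s+[-|–]\s+([^-|–]+)$/);
  if (!m) return headline;
  return m[2].trim().toLowerCase() === primarySource.trim().toLowerCase()
    ? m[1]
    : headline;
}
```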

Envelope validation gains a validateSourceUrl step (https/http only,
no userinfo credentials, parseable absolute URL). Stories without a
valid upstream link are dropped by filterTopStories rather than
shipping with an unlinked source.
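The idempotent UTM step from item 3 can be sketched as follows (illustrative function name; the param values mirror the commit):

```javascript
// Append tracking params to a story's source link unless a utm_source
// is already present (publisher-attributed, or already tagged by us),
// which also makes the operation idempotent.
function appendUtm(href, issueDate, rank) {
  const url = new URL(href);
  if (url.searchParams.has('utm_source')) return href;
  url.searchParams.set('utm_source', 'worldmonitor');
  url.searchParams.set('utm_medium', 'brief');
  url.searchParams.set('utm_campaign', issueDate);
  url.searchParams.set('utm_content', `story-${rank}`);
  return url.toString();
}
```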

Tests: 30 renderer tests to 38; new assertions cover UTM presence on
every anchor, HTML-escaping of ampersands in hrefs, pre-existing UTM
preservation, and all four validator rejection modes. New composer
tests cover suffix stripping, link plumb-through, and v2 drop-on-no-
link behaviour. New LLM tests for generateStoryDescription cover
cache hit/miss, revalidation of bad rows, 24h TTL, and null-on-
failure.

* fix(brief): v1 back-compat window on renderer + consolidate story hash helper

Two P1/P2 review findings on #3181.

P1 (v1 back-compat). Bumping BRIEF_ENVELOPE_VERSION 1 to 2 made every
v1 envelope still resident in Redis under the 7-day TTL fail
assertBriefEnvelope. The hosted /api/brief route would 404 "expired"
and the /api/latest-brief preview would downgrade to "composing",
breaking already-issued links from the preceding week.

Fix: renderer now accepts SUPPORTED_ENVELOPE_VERSIONS = Set([1, 2])
on READ. BRIEF_ENVELOPE_VERSION stays at 2 and is the only version
the composer ever writes. BriefStory.sourceUrl is required when
version === 2 and absent on v1; when rendering a v1 story the source
line degrades to plain text (no anchor), matching pre-v2 appearance.
When the TTL window passes the set can shrink to [2] in a follow-up.

P2 (hash dedup). hashStoryDescription was byte-identical to hashStory,
inviting silent drift if one prompt gains a field the other forgets.
Consolidated into hashBriefStory. Cache key separation remains via
the distinct prefixes (brief:llm:whymatters:v2:/brief:llm:description:v1:).

Tests: adds 3 v1 back-compat assertions (plain source line, field
validation still runs, defensive sourceUrl check), updates the
version-mismatch assertion to match the new supported-set message.
161/161 pass (was 158). Full test:data 5706/5706.
2026-04-18 21:49:17 +04:00
Elie Habib
8fc302abd9 fix(brief): mobile layout — stack story callout, floor digest typography (#3180)
* fix(brief): mobile layout — stack story callout, floor digest typography

On viewports <=640px the 55/45 story grid cramped both the headline and
the "Why this is important" callout to ~45% width each, and several
digest rules used raw vw units (blockquote 2vw, threads 1.55vw) that
collapsed to ~7-8px on a 393px iPhone frame before the browser min
clamped them to barely-readable.

Appends a single @media (max-width: 640px) block to the renderer's
STYLE_BLOCK:

- .story becomes a flex column — callout stacks under the headline,
  no column squeeze. Headline goes full-width at 9.5vw.
- Digest blockquote, threads, signals, and stat rows get max(Npx, Nvw)
  floors so they never render below ~15-17px regardless of viewport.
- Running-head stacks on digest and the absolute page-number gets
  right-hand clearance so they stop overlapping.
- Tags and source labels pinned to 11px (were scaling down with vw).

CSS-only; no envelope, no HTML structure, no new classes. All 30
renderBriefMagazine tests still pass.

* fix(brief): raise mobile digest px floors and running-head clearance

Two P2 findings from PR review on #3180:

1. .digest .running-head padding-right: 18vw left essentially zero
   clearance from the absolute .page-number block on iPhone SE (375px)
   and common Android (360px). Bumped to 22vw (~79px at 360px) which
   accommodates "09 / 12" in IBM Plex Mono at the right:5vw offset
   with a one-vw safety margin.

2. Mobile overrides were lowering base-rule px floors (thread 17px to
   15px, signal 18px to 15px). On viewports <375px this rendered
   digest body text smaller than desktop. Kept the px floors at or
   above the base rules so effective size only ever goes up on mobile.
2026-04-18 21:37:40 +04:00
Elie Habib
388995b1a4 fix(health): macroSignals maxStaleMin 20 → 150 to match seed-economy cron cadence (#3179)
macroSignals is a secondary key written by seed-economy.mjs, whose
primary key energy-prices has maxStaleMin=150 in its runSeed config.
A 20-min threshold guaranteed STALE_SEED between every cron run.
2026-04-18 20:50:48 +04:00
Elie Habib
6f6102e5a7 feat(brief): swap sienna rust for two-strength WM mint (Option B palette) (#3178)
* feat(brief): swap sienna rust for two-strength WM mint (Option B palette)

The only off-brand color in the product was the brief's sienna rust
(#8b3a1f) accent. Every other surface — /pro landing, dashboard,
dashboard panels — uses the WM mint green (#4ade80). Swapping the
brief's accent to the brand mint makes the magazine read as a sibling
of /pro rather than a separate editorial product, while keeping the
magazine-grade serif typography and even/odd page inversion intact.

Implementation (user picked Option B from brief-palette-playground.html):
  --sienna  : #8b3a1f -> #3ab567   muted mint for LIGHT pages (readable
                                   on #fafafa without the bright-mint
                                   glare of a pure-brand swap)
  --mint    :         + #4ade80   bright WM mint for DARK pages
                                   (matches /pro exactly)
  --cream   : #f1e9d8 -> #fafafa   unified with --paper; one crisp white
  --cream-ink: #1a1612 -> #0a0a0a  crisper contrast on the new paper

Accent placement (unchanged structurally — only colors swapped):
  - Digest running heads, labels, blockquote rule, stats dividers,
    end-marker rule, signal/thread tags: all muted mint on light
  - Story source line: newly mint (was unstyled bone/ink at 0.6 opacity);
    two-strength — muted on light stories, bright on dark
  - Logo ekg dot: mint on every page so the brand 'signal' pulse
    threads through the whole magazine

No layout changes. No HTML structure changes. Only color constants +
a ~20-line CSS addition for story-source + ekg-dot accents.

165/165 brief tests pass (renderer contract unchanged — envelope shape
identical, only computed styles differ). Both tsconfigs typecheck clean.

* fix(brief): darken light-page mint to pass WCAG AA + fix digest ekg-dot

Two P2 findings on PR #3178 review.

1. Digest ekg-dot used bright #4ade80 on a #fafafa background,
   contradicting the code comment that said 'light pages use the
   muted mint'. The rule was grouped with .cover and .story.dark
   (both ink backgrounds) when it should have been grouped with
   .story.light (paper background). Regrouped.

2. #3ab567 on #fafafa tests at ~2.31:1 — fails WCAG AA 4.5:1 for
   every text size and fails the 3:1 large-text floor. The PR called
   this a rollback trigger; contrast math says it would fail every
   meaningful text usage (mono running heads, source lines, labels,
   footer captions). Swapped --sienna from #3ab567 to #1f7a3f —
   tested at ~4.90:1 on #fafafa, passes AA for normal text.

   Kept the variable name '--sienna' for backwards compat (every
   .digest rule references it). The hue stays recognisably mint-
   family (green dominant) so the brand relationship with #4ade80
   on dark pages is still clear to a reader. Dark-page mint is
   unchanged — #4ade80 on #0a0a0a is ~11.4:1, passes AAA.

Playground (brief-palette-playground.html) updated to match so
future iterations work against the accessible value.

165/165 brief tests pass. Both tsconfigs typecheck clean.
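The contrast math behind the swap follows the WCAG 2.x relative-luminance and contrast-ratio definitions; exact ratios vary slightly between tools depending on rounding:

```javascript
// WCAG relative luminance for a #rrggbb color, then contrast ratio
// (L_lighter + 0.05) / (L_darker + 0.05).
function luminance(hex) {
  const c = [1, 3, 5].map((i) => {
    const v = parseInt(hex.slice(i, i + 2), 16) / 255;
    return v <= 0.03928 ? v / 12.92 : ((v + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2];
}

function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}
```

Running these confirms the shape of the argument: #3ab567 on #fafafa lands below even the 3:1 large-text floor, #1f7a3f clears 4.5:1 (AA normal text), and #4ade80 on #0a0a0a clears 7:1 (AAA).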
2026-04-18 20:50:16 +04:00
Elie Habib
048e5486ac fix(brief): Latest Brief panel locks out Pro users — gate reads Clerk metadata, not entitlement (#3177)
* fix(brief): Latest Brief panel locks out Pro users — gate reads wrong field

Reported: PR #3166 shipped a WEB_CLERK_PRO_ONLY_PANELS gate that
downgrades the Latest Brief panel to FREE_TIER/ANONYMOUS when the
user isn't Clerk-Pro. The downgrade condition was:

  state.user?.role !== 'pro'

state.user.role is derived from Clerk's publicMetadata.plan via
getCurrentClerkUser(). That field is NOT kept in sync with the
real entitlement for many users — the source of truth is the
Convex entitlements table, not Clerk metadata. Result: a confirmed
Pro user (Convex entitlement.features.tier = 1+) sees every other
premium panel unlock (hasPremiumAccess consults isEntitled() per
PR #3167) but the Latest Brief panel shows 'Upgrade to Pro'.

Fix: swap the condition from 'state.user?.role !== \'pro\'' to
'!hasTier(1)' — the same Convex-backed entitlement check
hasPremiumAccess uses. The panel's own auth subscription keeps the
separate role-based guard inside refresh() as defence in depth
(belt-and-suspenders), but the top-level gating no longer
over-fires on the wrong field.

No new behaviour for users without an entitlement. Typecheck clean.

* fix(brief): panel-side role gate also reads Convex entitlement (not Clerk metadata)

Reviewer caught that the prior PR (#3177) fixed the layout-level gate
but left the panel's own refresh() guard reading authState.user.role
— same stale-publicMetadata bug. A user whose Convex entitlement
says tier=1 but whose Clerk publicMetadata.plan is unset would
unlock past the layout gate (now correct) and then still hit the
panel's local renderUpgradeRequired() path.

Fix: swap the local role check to hasTier(1) — the same Convex
snapshot the layout now consults. Now BOTH gates agree on the
source of truth.

* fix(brief): defer Pro gate when entitlement snapshot hasn't arrived yet

Review flagged a transient 'Upgrade to Pro' flash for Pro users on
initial load. The auth-state subscription can fire before the Convex
entitlement snapshot arrives; hasTier(1) returns false by default
when currentState is null, so a Pro user briefly sees the upgrade
overlay until onEntitlementChange re-runs the gate with the real
snapshot.

Fix: treat 'entitlement not yet loaded' as distinct from 'free user'.
Both panel-layout.ts gate AND LatestBriefPanel.refresh() now check
getEntitlementState() !== null before applying the Clerk-Pro-only
downgrade. During the unknown window the panel stays in its loading
state; the onEntitlementChange listener re-runs updatePanelGating
once the snapshot lands and either unlocks or gates correctly.

No behaviour change for free users (entitlement snapshot arrives
with tier=0, still correctly gates). No behaviour change for the
steady-state Pro case. Only the cold-start window differs: flash
of upgrade-overlay → clean loading state.

* fix(brief): drop client entitlement gate from panel refresh — let server decide

Reviewer's sharper read on PR #3177: the prior 'defer-if-unknown' fix
still blocks Pro users whenever the Convex entitlement subscription
is late, skipped, or failed to establish. getEntitlementState() can
stay null indefinitely if the Convex client auth never connects;
hasTier(1) would stay false; the panel would stay on renderLoading()
forever and the server-side /api/latest-brief check would never even
fire.

The correct architecture: the server is authoritative. /api/latest-
brief already does its own entitlement check against the Clerk JWT.
Client-side entitlement is a fast-path optimisation, never a gate.

Fix: switch both call sites to AFFIRMATIVE DENIAL ONLY.

  LatestBriefPanel.refresh()
    Before: if snapshot null -> renderLoading (fetch never fires);
            if snapshot + free -> renderUpgradeRequired.
    After:  if snapshot != null AND !hasTier(1) -> renderUpgradeRequired.
            Otherwise fall through and FIRE THE FETCH. The 403 path
            (BriefAccessError 'upgrade_required') already renders the
            upgrade CTA when the server says free.

  panel-layout.ts updatePanelGating
    Already shaped as affirmative-denial (snapshot != null AND !hasTier).
    Updated the comment to make the invariant explicit so a future
    refactor doesn't flip it back to positive-gating.

Consequence: an API-key-only user with a free Clerk account will
fire one doomed fetch per refresh and see renderUpgradeRequired a
beat later than before. Accepted — the alternative locked legitimate
Pro users out whenever Convex was anything other than perfectly
healthy, which is a materially worse failure mode.
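
The affirmative-denial shape can be sketched as below — a minimal illustration, not the repo's actual code; getEntitlementState/hasTier mirror the names in this message but the bodies are stand-ins, and the two gate functions exist only to contrast the buggy and fixed shapes:

```typescript
// Hypothetical sketch of positive gating vs affirmative denial.
type EntitlementSnapshot = { tier: number } | null;

let currentState: EntitlementSnapshot = null;

function getEntitlementState(): EntitlementSnapshot {
  return currentState;
}

function hasTier(min: number): boolean {
  return currentState !== null && currentState.tier >= min;
}

// Positive gating (buggy): "only proceed if provably entitled".
// A null snapshot blocks forever when the Convex subscription never lands.
function positiveGate(): "fetch" | "blocked" {
  return hasTier(1) ? "fetch" : "blocked";
}

// Affirmative denial (fixed): only gate when we affirmatively KNOW the
// user is free; otherwise fall through and let the server's 403 decide.
function affirmativeDenialGate(): "fetch" | "upgrade" {
  const snapshot = getEntitlementState();
  if (snapshot !== null && !hasTier(1)) return "upgrade";
  return "fetch";
}
```

The only behavioural difference is the null-snapshot row: positive gating blocks, affirmative denial fetches and defers to the server.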

Both tsconfigs typecheck clean. No test changes needed — the
BriefAccessError path was already covered by existing tests.
2026-04-18 20:49:39 +04:00
Elie Habib
b5824d0512 feat(brief): Phase 9 / Todo #223 — share button + referral attribution (#3175)
* feat(brief): Phase 9 / Todo #223 — share button + referral attribution

Adds a Share button to the dashboard Brief panel so PRO users can
spread WorldMonitor virally. Built on the existing referral plumbing
(registrations.referralCode + referredBy fields; api/register-interest
already passes referredBy through) — this PR fills in the last mile:
a stable referral code for signed-in Clerk users, a share URL, and
a client-side share sheet.

Files:

  server/_shared/referral-code.ts (new)
    Deterministic 8-char hex code: HMAC(BRIEF_URL_SIGNING_SECRET,
    'referral:v1:' + userId). Same Clerk userId always produces the
    same code. No DB write on login, no schema migration, stable for
    the life of the account.

  api/referral/me.ts (new)
    GET -> { code, shareUrl, invitedCount }. Bearer-auth via Clerk.
    Reuses BRIEF_URL_SIGNING_SECRET to avoid another Railway env
    var. Stats fail gracefully to 0 on Convex outage.

  convex/registerInterest.ts + convex/http.ts
    New internal query getReferralStatsByCode({referralCode}) counts
    registrations rows that named this code as their referredBy.
    Exposed via POST /relay/referral-stats (RELAY_SHARED_SECRET auth).

  src/services/referral.ts (new)
    getReferralProfile: 5-min cache, profile is effectively immutable
    shareReferral: Web Share API primary (mobile native sheet),
    clipboard fallback on desktop. Returns 'shared'/'copied'/'blocked'
    /'error'. AbortError is treated as 'blocked', not failure.
    clearReferralCache for account-switch hygiene.

  src/components/LatestBriefPanel.ts + src/styles/panels.css
    New share row below the brief cover card. Button disabled until
    /api/referral/me resolves; if fetch fails the row removes itself.
    invitedCount > 0 renders as 'N invited' next to the button.
    Referral cache invalidated alongside Clerk token cache on account
    switch (otherwise user B would see user A's share link for 5 min).
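
The deterministic-code scheme in server/_shared/referral-code.ts can be sketched as follows — a minimal version assuming HMAC-SHA256 over the versioned message and an 8-hex-char truncation; referralCodeForUser is an illustrative name, not the module's actual export:

```typescript
import { createHmac } from "node:crypto";

// Deterministic 8-char hex code: same (secret, userId) always yields the
// same code, so no DB write is needed; rotating the secret invalidates all.
function referralCodeForUser(secret: string, userId: string): string {
  if (!secret || !userId) throw new Error("secret and userId are required");
  return createHmac("sha256", secret)
    .update(`referral:v1:${userId}`)
    .digest("hex")
    .slice(0, 8);
}
```

The truncated-HMAC output alphabet is [0-9a-f], which is also why a reserved-word collision guard would be dead code (see the later P2 fix).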

Tests: 10 new cases in tests/brief-referral-code.test.mjs
  - getReferralCodeForUser: hex shape, determinism, uniqueness,
    secret-rotation invalidates, input guards
  - buildShareUrl: path shape, trailing-slash trim, URL-encoding

153/153 brief + deploy tests pass. Both tsconfigs typecheck clean.

Attribution flow (already working end-to-end):
  1. Share button -> worldmonitor.app/pro?ref={code}
  2. /pro landing page already reads ?ref= and passes to
     /api/register-interest as referredBy
  3. convex registerInterest:register increments the referrer's
     referralCount and stores referredBy on the new row
  4. /api/referral/me reads the count back via the relay query
  5. 'N invited' updates on next 5-min cache refresh

Scope boundaries (deferred):
  - Convex conversion tracking (invited -> PRO subscribed). Needs a
    join from registrations.referredBy to subscriptions.userId via
    email. Surface 'N converted' in a follow-up.
  - Referral-credit / reward system: viral loop works today, reward
    logic is a separate product decision.

* fix(brief): address three P2 review findings on #3175

- api/referral/me.ts JSDoc said '503 if REFERRAL_SIGNING_SECRET is
  not configured' but the handler actually reads BRIEF_URL_SIGNING_SECRET.
  Updated the docstring so an operator chasing a 503 doesn't look for
  an env var that doesn't exist.

- server/_shared/referral-code.ts carried a RESERVED_CODES Set to
  avoid collisions with URL-path keywords ('index', 'robots', 'admin').
  The guard is dead code: the code alphabet is [0-9a-f] (hex output
  of the HMAC) so none of those non-hex keywords can ever appear.
  Removed the Set + the while loop; left a comment explaining why it
  was unnecessary so nobody re-adds it.

- src/components/LatestBriefPanel.ts passed disabled: 'true' (string)
  to the h() helper. DOM-utils' h() calls setAttribute for unknown
  props, which does disable the button — but it's inconsistent with
  the later .disabled = false property write. Fixed to the boolean
  disabled: true so the attribute and the IDL property agree.

10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(brief): address two review findings on #3175 — drop misleading count + fix user-agnostic cache

P1: invitedCount wired to the wrong attribution store.
The share URL is /pro?ref=<code>. On /pro the 'ref' feeds
Dodopayments checkout metadata (affonso_referral), NOT
registrations.referredBy. /api/referral/me counted only the
waitlist path, so the panel would show 0 invited for anyone who
converted direct-to-checkout — misleading.

Rather than ship a count that measures only one of two attribution
paths (and the less-common one at that), the count is removed
entirely. The share button itself still works. A proper metric
requires unifying the waitlist + Dodo-metadata paths into a single
attribution store, which is a follow-up.

Changes:
  - api/referral/me.ts: response shape is { code, shareUrl } — no
    invitedCount / convertedCount
  - convex/registerInterest.ts: removed getReferralStatsByCode
    internal query
  - convex/http.ts: removed /relay/referral-stats route
  - src/services/referral.ts: ReferralProfile interface no longer
    has invitedCount; fetch call unchanged in behaviour
  - src/components/LatestBriefPanel.ts: dropped the 'N invited'
    render branch

P2: referral cache was user-agnostic.
Module-global _cached had no userId key, so a stale cache primed by
user A would hand user B user A's share link for up to 5 min after
an account switch — if no panel is mounted at the transition moment
to call clearReferralCache(). Per the reviewer's point, this is a
real race.

Fix: three-part.
  (a) Cache entry carries the userId it was computed for; reads
      check the current Clerk userId and only accept hits when they
      match. Mismatch → drop + re-fetch.
  (b) src/services/referral.ts self-subscribes to auth-state at
      module load (ensureAuthSubscription). On any id transition
      _cached is dropped. Module-level subscription means the
      invalidation works even when no panel is currently mounted.
  (c) Belt-and-suspenders: post-fetch, re-check the current user
      before caching. Protects against account switches that happen
      mid-flight between 'read cache → ask network → write cache'.

Panel's local clearReferralCache() call removed — module now
self-invalidates.

10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(referral): address P1 review finding — share codes now actually credit the sharer

The earlier head generated 8-char Clerk-derived HMAC codes for the
share button, but the waitlist register mutation only looked up
registrations.by_referral_code (6-char email-generated codes).
Codes issued by the share button NEVER resolved to a sharer — the
'referral attribution' half of the feature was non-functional.

Fix (schema-level, honest attribution path):

  convex/schema.ts
    - userReferralCodes   { userId, code, createdAt }  + by_code, by_user
    - userReferralCredits { referrerUserId, refereeEmail, createdAt }
                          + by_referrer, by_referrer_email

  convex/registerInterest.ts
    - register mutation: after the existing registrations.by_referral_code
      lookup, falls through to userReferralCodes.by_code. On match,
      inserts a userReferralCredits row (the Clerk user has no
      registrations row to increment, so credit needs its own
      table). Dedupes by (referrer, refereeEmail) so returning
      visitors can't double-credit.
    - new internalMutation registerUserReferralCode({userId, code})
      idempotent binding of a code to a userId. Collisions logged
      and ignored (keeps first writer).

  convex/http.ts
    - new POST /relay/register-referral-code (RELAY_SHARED_SECRET auth)
      that runs the mutation above.

  api/referral/me.ts
    - signature gains a ctx.waitUntil handle
    - after generating the user's code, fire-and-forget POSTs to
      /relay/register-referral-code so the binding is live by the
      time anyone clicks a shared link. Idempotent — a failure just
      means the NEXT call re-registers.

Still deferred: display of 'N credited' / 'N converted' in the
LatestBriefPanel. The waitlist side now resolves correctly, but the
Dodopayments checkout path (/pro?ref=<code> → affonso_referral) is
tracked in Dodo, not Convex. Surfacing a unified count requires a
separate follow-up to pull Dodo metadata into Convex.

Regression tests (3 new cases in tests/brief-referral-code.test.mjs):
  - register mutation extends to userReferralCodes + inserts credits
  - schema declares both tables with the right indexes
  - /api/referral/me registers the binding via waitUntil

13/13 referral tests pass. Both tsconfigs typecheck clean.

* fix(referral): address two P1 review findings — checkout attribution + dead-link prevention

P1: share URL didn't credit on the /pro?ref= checkout path.
The earlier PR wired Clerk codes into the waitlist path
(/api/register-interest -> userReferralCodes -> userReferralCredits)
but a visitor landing on /pro?ref=<code> and going straight to
Dodo checkout forwarded the code only into Dodo metadata
(affonso_referral). Nothing on our side credited the sharer.

Fix: convex/payments/subscriptionHelpers.ts handleSubscriptionActive
now reads data.metadata.affonso_referral when inserting a NEW
subscription row. If the code resolves in userReferralCodes, a
userReferralCredits row crediting the sharer is inserted (deduped
by (referrer, refereeEmail) so webhook replays don't double-credit).
The credit only lands on first-activation — the else-branch of the
existing/new split guards against replays.

P1: /api/referral/me returned 200 + share link even when the
(code, userId) binding failed.
ctx.waitUntil(registerReferralCodeInConvex(...)) ran the binding
asynchronously, swallowing missing env + non-2xx + network errors.
Users got a share URL that the waitlist lookup could never
resolve — dead link.

Fix: registerReferralCodeInConvex is now BLOCKING (throws on any
failure) and the handler awaits it before returning. On failure
the endpoint responds 503 service_unavailable rather than a 200
with a non-functional URL. Mutation is idempotent so client
retries are safe.

Regression tests (2 updated/new in tests/brief-referral-code.test.mjs):
  - asserts the binding is awaited, not ctx.waitUntil'd; asserts
    the failure path returns 503
  - asserts subscriptionHelpers reads affonso_referral, resolves via
    userReferralCodes.by_code, inserts a userReferralCredits row,
    and dedupes by (referrer, refereeEmail)

14/14 referral tests pass. Both tsconfigs typecheck clean.

Net effect: /pro?ref=<code> visitors who convert (direct checkout)
now credit the sharer on webhook receipt, same as waitlist
signups. The share button is no longer a dead-end UI.
2026-04-18 20:39:55 +04:00
Elie Habib
122204f691 feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm (#3174)
* feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm

Implements the Phase 8 carousel renderer (Option B): server-side PNG
generation in a Vercel edge function using Satori (JSX to SVG) +
@resvg/resvg-wasm (SVG to PNG). Zero new Railway infra, zero
Chromium, same edge runtime that already serves the magazine HTML.

Files:

  server/_shared/brief-carousel-render.ts (new)
    Pure renderer: (BriefEnvelope, CarouselPage) -> Uint8Array PNG.
    Three layouts (cover/threads/story), 1200x630 OG size.
    Satori + resvg + WASM are lazy-imported so Node tests don't trip
    over '?url' asset imports and the 800KB wasm doesn't ship in
    every bundle. Font: Noto Serif regular, fetched once from Google
    Fonts and memoised on the edge isolate.

  api/brief/carousel/[userId]/[issueDate]/[page].ts (new)
    Public edge function reusing the magazine route's HMAC token —
    same signer, same (userId, issueDate) binding, so one token
    unlocks magazine HTML AND all three carousel images. Returns
    image/png with 7d immutable cache headers. 404 on invalid page
    index, 403 on bad token, 404 on Redis miss, 503 on missing
    signing secret. Render failure falls back to a 1x1 transparent
    PNG so Telegram's sendMediaGroup doesn't 500 the brief.

  scripts/seed-digest-notifications.mjs
    carouselUrlsFrom(magazineUrl) derives the 3 signed carousel
    URLs from the already-signed magazine URL. sendTelegramBriefCarousel
    calls Telegram's sendMediaGroup with those URLs + short caption.
    Runs before the existing sendTelegram(text) so the carousel is
    the header and the text the body — long-form stories remain
    forwardable as text. Best-effort: carousel failure doesn't
    block text delivery.

  package.json + package-lock.json
    satori ^0.10.14 + @resvg/resvg-wasm ^2.6.2.

Tests (tests/brief-carousel.test.mjs, 9 cases):
  - pageFromIndex mapping + out-of-range
  - carouselUrlsFrom: valid URL, localhost origin preserved, missing
    token, wrong path, invalid issueDate, garbage input
  - Drift guard: cron must still declare the same helper + template
    string. If it drifts, test fails with a pointer to move the impl
    into a shared module.

PNG render itself isn't unit-tested — Satori + WASM need a
browser/edge runtime. Covered by smoke validation step in the
deploy monitoring plan.

Both tsconfigs typecheck clean. 152/152 brief tests pass.

Scope boundaries (deferred):
  - Slack + Discord image attachments (different payload shapes)
  - notification-relay.cjs brief_ready dispatch (real-time route)
  - Redis caching of rendered PNG (edge Cache-Control is enough for
    MVP)

* fix(brief): address two P1 review findings on Phase 8 carousel

P1-A: 200 placeholder PNG cached 7d on render failure.
Route config said runtime: 'edge' but a comment contradicted it
claiming Node semantics. More importantly, any render/init failure
(WASM load, Satori, Google Fonts) was converted to a 1x1 transparent
PNG returned with Cache-Control: public, max-age=7d, immutable.
Telegram's media fetcher and Vercel's CDN would cache that blank
for the full brief TTL per chat message — one cold-start mismatch
= every reader of that brief sees blank carousel previews for a
week.

Fix: deleted errorPng(). Render failure now returns 503 with
Cache-Control: no-store. sendMediaGroup fails cleanly for that
carousel (the digest cron already treats it as best-effort and
still sends the long-form text message), next cron tick re-renders
from a fresh isolate. Self-healing across ticks. Contradictory
comment about Node runtime removed.

P1-B: Google Fonts as silent hard dependency.
The renderer claimed 'safe embedded/fallback path' in comments but
no fallback existed. loadFont() fetches Noto Serif from gstatic.com
and rethrows on any failure. Combined with P1-A's old 200-cache-7d
path, a transient CDN blip would lock in a blank carousel for a
week.

Fix: updated comments to honestly declare the CDN dependency plus
document the self-healing semantics now that P1-A's fix no longer
caches the failure. If Google Fonts reliability becomes a problem,
swap the fetch for a bundled base64 TTF — noted as the escape hatch.

Tests (tests/brief-carousel.test.mjs): 2 new regression cases.
11/11 carousel tests pass. Both tsconfigs typecheck clean locally.

Note on currently-red CI: failures are NOT typecheck errors — npm
ci dies fetching libvips for sharp (504 Gateway Time-out from
GitHub releases). sharp is a transitive dep via @xenova/transformers,
pre-existing, not touched by this PR. Transient infra flake.

* fix(brief): switch carousel to Node + @resvg/resvg-js (fixes deploy block)

Vercel edge bundler fails the carousel deploy with:
  'Edge Function is referencing unsupported modules:
   @resvg/resvg-wasm/index_bg.wasm?url'

The ?url asset-import syntax is a Vite-ism that Vercel's edge
bundler doesn't resolve. Two ways out: find a Vercel-blessed edge
WASM import incantation, or switch to Node runtime with the native
@resvg/resvg-js binding. The second is simpler, faster per request,
and avoids the whole WASM-in-edge-bundler rabbit hole.

Changes:
  - package.json: @resvg/resvg-wasm -> @resvg/resvg-js ^2.6.2
  - api/brief/carousel/.../[page].ts: runtime 'edge' -> 'nodejs20.x'
  - server/_shared/brief-carousel-render.ts: drop initWasm path,
    dynamic-import resvg-js in ensureLibs(). Satori and resvg load
    in parallel via Promise.all, shaving ~30ms off cold start.

Also addresses the P2 finding from review: the old ensureLibsAndWasm
had a concurrent-cold-start race where two callers could reach
'await initWasm()' simultaneously. Replaced the boolean flag with a
shared _libsLoadPromise so concurrent callers await the same load.
On failure the promise resets so the NEXT request retries rather
than poisoning the isolate for its lifetime.

Cold start ~700ms (Satori + resvg-js native init), warm ~40ms.
Carousel images are not latency-critical — fetched by Telegram's
media service, CDN-cached 7d.

Both tsconfigs typecheck clean. 11/11 carousel tests pass.

* fix(brief): carousel runtime = 'nodejs' (was 'nodejs20.x', rejected by Vercel)

Vercel's functions config validator rejects 'nodejs20.x' at deploy
time:

  unsupported "runtime" value in config: "nodejs20.x"
  (must be one of: ["edge","experimental-edge","nodejs"])

The Node version comes from the project's default (currently Node 20
via package.json engines + Vercel project settings), not from the
runtime string. Use 'nodejs' — unversioned — and let the platform
resolve it.

11/11 carousel tests pass.

* fix(brief): swap carousel font from woff2 to woff (Satori can't parse woff2)

Review on PR #3174: the FONT_URL was pointing at a gstatic.com woff2
file. Satori parses ttf / otf / woff v1 — NOT woff2. Every render
was about to throw on font decode, the route would return 503, and
the carousel would never deliver a single image.

Fix: point FONT_URL at @fontsource's Noto Serif Latin 400 WOFF v1
via jsdelivr. WOFF v1 is a TrueType wrapper that Satori parses
natively (verified: file says 'Web Open Font Format, TrueType,
version 1.1'). Same cold-start semantics as before — one fetch per
warm isolate, memoised.

Regression test: asserts FONT_URL ends in ttf/otf/woff and explicitly
rejects any .woff2 suffix. A future swap that silently reintroduces
woff2 now fails CI loudly instead of shipping a permanently-broken
renderer.

12/12 carousel tests pass. Both tsconfigs typecheck clean.
2026-04-18 20:27:41 +04:00
Elie Habib
e1c3b28180 feat(notifications): Phase 6 — web-push channel for PWA notifications (#3173)
* feat(notifications): Phase 6 — web-push channel for PWA notifications

Adds a web_push notification channel so PWA users receive native
notifications when this tab is closed. Deep-links click to the
brief magazine URL for brief_ready events, to the event link for
everything else.

Schema / API:
- channelTypeValidator gains 'web_push' literal
- notificationChannels union adds { endpoint, p256dh, auth,
  userAgent? } variant (standard PushSubscription identity triple +
  cosmetic UA for the settings UI)
- new setWebPushChannelForUser internal mutation upserts the row
- /relay/deactivate allow-list extended to accept 'web_push'
- api/notification-channels: 'set-web-push' action validates the
  triple, rejects non-https, truncates UA to 200 chars

Client (src/services/push-notifications.ts + src/config/push.ts):
- isWebPushSupported guards Tauri webview + iOS Safari
- subscribeToPush: permission + pushManager.subscribe + POST triple
- unsubscribeFromPush: pushManager.unsubscribe + DELETE row
- VAPID_PUBLIC_KEY constant (with VITE_VAPID_PUBLIC_KEY env override)
- base64 <-> Uint8Array helpers (VAPID key encoding)

Service worker (public/push-handler.js):
- Imported into VitePWA's generated sw.js via workbox.importScripts
- push event: renders notification; requireInteraction=true for
  brief_ready so a lock-screen swipe does not dismiss the CTA
- notificationclick: focuses+navigates existing same-origin client
  when present, otherwise opens a new window
- Malformed JSON falls back to raw text body, missing data falls
  back to a minimal WorldMonitor default

Relay (scripts/notification-relay.cjs):
- sendWebPush() with lazy-loaded web-push dep. 404/410 triggers
  deactivateChannel('web_push'). Missing VAPID env vars logs once
  and skips — other channels keep delivering.
- processEvent dispatch loop + drainHeldForUser both gain web_push
  branches

Settings UI (src/services/notifications-settings.ts):
- New 'Browser Push' tile with bell icon
- Enable button lazy-imports push-notifications, calls subscribe,
  renders 'Not supported' on Tauri/in-app webviews
- Remove button routes web_push specifically through
  unsubscribeFromPush so the browser side is cleaned up too

Env vars required on Railway services:
  VAPID_PUBLIC_KEY   public key
  VAPID_PRIVATE_KEY  private key
  VAPID_SUBJECT      mailto:support@worldmonitor.app (optional)

Public key is also committed as the default in src/config/push.ts
so the client bundle works without a build-time override.

Tests: 11 new cases in tests/brief-web-push.test.mjs
- base64 <-> Uint8Array round-trip + null guards
- VAPID default fallback when env absent
- SW push event rendering, requireInteraction gating, malformed JSON
  + no-data fallbacks
- SW notificationclick: openWindow vs focus+navigate, default url

154/154 tests pass. Both tsconfigs typecheck clean.

* fix(brief): address PR #3173 review findings + drop hardcoded VAPID

P1 (security): VAPID private key leaked in PR description.
Rotated the keypair. Old pair permanently invalidated. Structural fix:

  Removed DEFAULT_VAPID_PUBLIC_KEY entirely. Hardcoding the public
  key in src/config/push.ts gave rotations two sources of truth
  (code vs env) — exactly the friction that caused me to paste the
  private key in a PR description in the first place. VAPID_PUBLIC_KEY
  now comes SOLELY from VITE_VAPID_PUBLIC_KEY at build time.
  isWebPushConfigured() gates the subscribe flow so builds without
  the env var surface as 'Not supported' rather than crashing
  pushManager.subscribe.

  Operator setup (one-time):
    Vercel build:      VITE_VAPID_PUBLIC_KEY=<public>
    Railway services:  VAPID_PUBLIC_KEY=<public>
                       VAPID_PRIVATE_KEY=<private>
                       VAPID_SUBJECT=mailto:support@worldmonitor.app

  Rotation: update env on both sides, redeploy. No code change, no
  PR body — no chance of leaking a key in a commit.

P2: single-device fan-out — setWebPushChannelForUser replaces the
previous subscription silently. Per-device fan-out is a schema change
deferred to follow-up. Fix for now: surface the replacement in
settings UI copy ('Enabling here replaces any previously registered
browser.') so users who expect multi-device see the warning.

P2: a flat 24h push TTL floods offline devices on reconnect. Fix: TTL
is now event-type-aware:
  brief_ready:       12h  (daily editorial — still interesting)
  quiet_hours_batch:  6h  (by definition queued-on-wake)
  everything else:   30m  (transient alerts: noise after 30min)
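
The tier mapping reduces to a small switch — the helper name is an assumption, the tiers match the values above:

```typescript
// Event-type-aware push TTL, in seconds.
function pushTtlSeconds(eventType: string): number {
  switch (eventType) {
    case "brief_ready":
      return 12 * 3600; // daily editorial — still interesting half a day later
    case "quiet_hours_batch":
      return 6 * 3600; // by definition queued-on-wake
    default:
      return 30 * 60; // transient alerts: noise after 30 minutes
  }
}
```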

REGRESSION test: VAPID_PUBLIC_KEY must be '' when env var is unset.
If a committed default is reintroduced, the test fails loudly.

11/11 web-push tests pass. Both tsconfigs typecheck clean.

* fix(notifications): deliver channel_welcome push for web_push connects (#3173 P2)

The settings UI queues a channel_welcome event on first web_push
subscribe (api/notification-channels.ts:240 via publishWelcome), but
processWelcome() in the relay only branched on slack/discord/email —
no web_push arm. The welcome event was consumed off the queue and
then silently dropped, leaving first-time subscribers with no
'connection confirmed' signal.

Fix: add a web_push branch to processWelcome. Calls sendWebPush with
eventType='channel_welcome' which maps to the 30-minute TTL tier in
the push-delivery switch — a welcome that arrives >30 min after
subscribe is noise, not confirmation.

Short body (under 80 chars) so Chrome/Firefox/Safari notification
shelves don't clip past ellipsis.

11/11 web-push tests pass.

* fix(notifications): address two P1 review findings on #3173

P1-A: SSRF via user-supplied web_push endpoint.
The set-web-push edge handler accepted any https:// URL and wrote
it to Convex. The relay's sendWebPush() later POSTs to whatever
endpoint sits in that row, giving any Pro user a server-side-request
primitive bounded only by the relay's network egress.

Fix: isAllowedPushEndpointHost() allow-list in api/notification-
channels.ts. Only the four known browser push-service hosts pass:

  fcm.googleapis.com                (Chrome / Edge / Brave)
  updates.push.services.mozilla.com (Firefox)
  web.push.apple.com                (Safari, macOS 13+)
  *.notify.windows.com              (Windows Notification Service)

Fail-closed: unknown hosts rejected with 400 before the row ever
reaches Convex. If a future browser ships a new push service we'll
need to widen this list (guarded by the SSRF regression tests).
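
A minimal sketch of the fail-closed check, assuming exact matching for the three fixed hosts and a suffix match for *.notify.windows.com (the function name mirrors the message above; the body is illustrative):

```typescript
const EXACT_HOSTS = new Set([
  "fcm.googleapis.com",                 // Chrome / Edge / Brave
  "updates.push.services.mozilla.com",  // Firefox
  "web.push.apple.com",                 // Safari, macOS 13+
]);

function isAllowedPushEndpointHost(endpoint: string): boolean {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return false; // unparsable input fails closed
  }
  if (url.protocol !== "https:") return false;
  if (EXACT_HOSTS.has(url.hostname)) return true;
  // Wildcard arm: requires a dot before the suffix, so
  // "fcm.googleapis.com.evil.com"-style lookalikes don't pass.
  return url.hostname.endsWith(".notify.windows.com");
}
```

Note the check runs on url.hostname, not on a substring of the raw string, which is what makes the lookalike-host rejection hold.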

P1-B: cross-account endpoint reuse on shared devices.
The browser's PushSubscription is bound to the origin, NOT to the
Clerk session. User A subscribes on device X, signs out, user B
signs in on X and subscribes — the browser hands out the SAME
endpoint/p256dh/auth triple. The previous setWebPushChannelForUser
upsert keyed only by (userId, channelType), so BOTH rows now carry
the same endpoint. Every push the relay fans out for user A also
lands on device X which is now showing user B's session.

Fix: setWebPushChannelForUser scans all web_push rows and deletes
any that match the new endpoint BEFORE upserting. Effectively
transfers ownership of the subscription to the current caller.
The previous user will need to re-subscribe on that device if they
sign in again.

No endpoint-based index on notificationChannels — the scan happens
at <10k rows and is well-bounded to the one write path per user
per connect. If volume grows, add an index + a migration.

Regression tests (tests/brief-web-push.test.mjs, 3 new cases):
  - allow-list defines all four browser hosts + fail-closed return
  - allow-list is invoked BEFORE convexRelay() in the handler
  - setWebPushChannelForUser compares + deletes rows by endpoint

14/14 web-push tests pass. Both tsconfigs typecheck clean.
2026-04-18 20:27:08 +04:00
Elie Habib
c2356890da feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini (#3172)
* feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini

Replaces the Phase 3a stubs with editorial output from Gemini 2.5
Flash via the existing OpenRouter-backed callLLM chain. Two LLM
pathways, different caching semantics:

  whyMatters (per story): 1 editorial sentence, 18-30 words, global
  stakes. Cache brief:llm:whymatters:v1:{sha256(headline|source|severity)}
  with 24h TTL shared ACROSS users (whyMatters is not personalised).
  Bounded concurrency 5 so a 12-story brief doesn't open 12 parallel
  sockets to OpenRouter.

  digest prose (per user): JSON { lead, threads[], signals[] }
  replacing the stubs. Cache brief:llm:digest:v1:{userId}:{sensitivity}
  :{poolHash} with 4h TTL per-user. Pool hash is order-insensitive
  so rank shuffling doesn't invalidate.

Provider pinned to OpenRouter (google/gemini-2.5-flash) via
skipProviders: ['ollama', 'groq'] per explicit user direction.

Null-safe all the way down. If the LLM is unreachable, parse fails,
or cache throws, enrichBriefEnvelopeWithLLM returns the baseline
envelope with its stubs intact. The brief always ships. Kill switch
BRIEF_LLM_ENABLED is distinct from AI_DIGEST_ENABLED so the brief's
editorial prose and the email's AI summary can be toggled
independently during provider outages.

Files:
  scripts/lib/brief-llm.mjs (new) — pure prompt/parse helpers + IO
    generateWhyMatters/generateDigestProse + envelope enrichment
  scripts/seed-digest-notifications.mjs — BRIEF_LLM_ENABLED flag,
    briefLlmDeps closure, enrichment inserted between compose + SETEX
  tests/brief-llm.test.mjs (new, 34 cases)

End-to-end verification: the enriched envelope passes
assertBriefEnvelope() — the renderer's strict validator is the gate
between composer and api/brief, so we prove the enriched envelope
still validates.

156/156 brief tests pass. Both tsconfigs typecheck clean.

* fix(brief): address three P1 review findings on Phase 3b

All three findings are about cache-key correctness + envelope safety.

P1-A — whyMatters cache key under-specifies the prompt.
  hashStory keyed on headline|source|threatLevel, but the prompt also
  carries category + country. Upstream classification or geocoding
  corrections that leave those three fields unchanged would return
  pre-correction prose for a materially different prompt. Bumped to
  v2 key space (pre-fix rows ignored, re-LLM once on rollout). Added
  regression tests for category + country busting the cache.

P1-B — digest prose cache key under-specifies the prompt.
  hashDigestInput sorted stories and hashed headline|threatLevel only.
  The actual prompt includes ranked order + category + country + source.
  v2 hash now canonicalises to JSON of the fields in the prompt's
  ranked order. Test inverted to lock the corrected behaviour
  (reordering MUST miss the cache). Added a test for category change
  invalidating.

P1-C — malformed cached digest poisons the envelope at SETEX time.
  On cache hit generateDigestProse accepted any object with a string
  lead, skipping the full shape check. enrichBriefEnvelopeWithLLM then
  wrote prose.threads/.signals into the envelope, and the cron SETEXed
  unvalidated. A bad cache row would 404 /api/brief at render time.

  Two-layer fix:
    1. Extracted validateDigestProseShape(obj) — same strictness
       parseDigestProse ran on fresh output. generateDigestProse now
       runs it on cache hits too, and returns a normalised copy.
    2. Cron now re-runs assertBriefEnvelope on the ENRICHED envelope
       before SETEX. On assertion failure it falls back to the
       unenriched baseline (already passed assertion on construction).

  Regression test: malformed cached row is rejected on hit and the
  LLM is called again to overwrite.

Tests: 8 new regression cases locking all three findings. Total brief
test suite now 185/185 green. Both tsconfigs typecheck clean.

Cache-key version bumps (v1 -> v2) trigger one-off cache miss on
deploy. Editorial prose re-LLM'd on the next cron tick per user.

* fix(brief): address two P2 review findings on #3172

P2-A: misleading test name 'different users share the cache' asserted
the opposite (per-user isolation). Renamed to 'different users do NOT
share the digest cache even when the story pool is identical' so a
future reader can't refactor away the per-user key on a misreading.

P2-B: the signal length validator capped only character length (< 220
chars), so a 30-word signal could pass even though the prompt says
'<=14 words'.
Added a word-count filter with an 18-word ceiling (14 + 4 margin for
model drift / hyphenated compounds). Regression test locks the
behaviour: signals with >14-word drift are dropped, short imperatives
pass.
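
A sketch of the filter under those two ceilings (constant names are
illustrative; the caps match the values described above):

```typescript
const MAX_SIGNAL_CHARS = 220; // original character cap
const MAX_SIGNAL_WORDS = 18;  // 14-word prompt limit + 4-word drift margin

function isValidSignal(signal: string): boolean {
  const trimmed = signal.trim();
  if (trimmed.length === 0 || trimmed.length >= MAX_SIGNAL_CHARS) return false;
  // Whitespace-delimited count; hyphenated compounds count as one word.
  const words = trimmed.split(/\s+/).length;
  return words <= MAX_SIGNAL_WORDS;
}
```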

43/43 brief-llm tests pass. Both tsconfigs typecheck clean.
2026-04-18 19:37:33 +04:00
Elie Habib
0fd8cd7d5f feat(pro): complimentary entitlement tooling + subscription.expired guard (#3169)
Adds support / goodwill tooling for granting free-tier credits that
survive Dodo subscription cancellations. Triggered by the 2026-04-17/18
duplicate-subscription incident: the customer was granted a manual
extension to validUntil, but that extension is naked — our existing
handleSubscriptionExpired handler unconditionally downgrades to free
when Dodo fires the expired event, which would wipe the credit.

Three coordinated changes:

1. convex/schema.ts — add optional compUntil: number to entitlements.
   Acts as a "don't downgrade me before this" floor independent of the
   subscription billing cycle. Optional, so existing rows are untouched.

2. convex/payments/billing.ts::grantComplimentaryEntitlement —
   new internalMutation callable via `npx convex run`. Upserts the
   entitlement, sets both validUntil and compUntil to max(existing, now +
   days). Never shrinks (calling twice is idempotent upward), validates
   planKey against PRODUCT_CATALOG, and syncs the Redis cache so edge
   gateway sees the comp without waiting for TTL.

3. convex/payments/subscriptionHelpers.ts::handleSubscriptionExpired —
   before the unconditional downgrade, read the current entitlement and
   skip revocation if compUntil > eventTimestamp. This protects comp
   grants from Dodo-originated revocations; normal subscription.expired
   revocation is unchanged when there's no comp or the comp has lapsed.
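
The two pieces of logic, sketched as pure functions (names and the row
shape are illustrative; the real mutation also validates planKey against
PRODUCT_CATALOG and syncs Redis, both elided here):

```typescript
// Only the fields the sketch needs from the entitlements row.
interface Entitlement {
  planKey: string;
  validUntil: number;
  compUntil?: number;
}

// Grant never shrinks: calling twice is idempotent upward.
function applyComplimentaryGrant(
  existing: Entitlement | null,
  planKey: string,
  days: number,
  now: number,
): Entitlement {
  if (days <= 0) throw new Error("days must be positive");
  const floor = now + days * 24 * 60 * 60 * 1000;
  return {
    planKey,
    validUntil: Math.max(existing?.validUntil ?? 0, floor),
    compUntil: Math.max(existing?.compUntil ?? 0, floor),
  };
}

// subscription.expired guard: skip revocation while a comp is running.
function shouldRevokeOnExpired(
  row: Entitlement | null,
  eventTimestamp: number,
): boolean {
  return !(row?.compUntil !== undefined && row.compUntil > eventTimestamp);
}
```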

Tests (convex/__tests__/comp-entitlement.test.ts, 9 new):

  grantComplimentaryEntitlement
    creates row with validUntil == compUntil
    never shrinks an existing longer comp
    upgrades planKey on existing free-tier row
    rejects unknown planKey and non-positive days

  handleSubscriptionExpired comp guard
    revokes to free when no comp set (unchanged)
    revokes to free when comp is already expired
    preserves entitlement when comp is still valid
    end-to-end: grant + expired webhook = entitlement survives

CLI usage (requires npx convex deploy after merge, then):

  npx convex run 'payments/billing:grantComplimentaryEntitlement' \
    '{"userId":"user_XXX","planKey":"pro_monthly","days":90,"reason":"support"}'

Full check suite green: typecheck x2, biome, 5590 data tests,
171 edge tests, 9 convex tests, md lint, version sync.
2026-04-18 18:06:20 +04:00
Elie Habib
c90d40dfc5 fix(pro): consult Convex entitlement in hasPremiumAccess + isProUser (#3167)
* fix(pro): consult Convex entitlement in hasPremiumAccess and isProUser

Customer cus_0NcmwcAWw0jhVBHVOK58C still saw "Upgrade to Pro" overlays on
premium panels this morning despite PR #3163 landing, because the reload
the transition detector triggers only helps if hasPremiumAccess returns
true on the next load — and it doesn't. Our Dodo webhook writes to Convex
`entitlements` but never to Clerk publicMetadata.plan, so
hasPremiumAccess (which only checked Clerk role / API keys / tester keys)
stayed false for every paying Dodo customer. isPanelEntitled honours the
Convex entitlement and unlocks the panel content, but getPanelGateReason
-> hasPremiumAccess still returns FREE_TIER and covers it with the
upgrade overlay. Kevin's screenshot shows exactly that: PRO badges next
to panel titles, "Upgrade to Pro" body beneath.

Fix:
  - src/services/panel-gating.ts — hasPremiumAccess now checks isEntitled()
    after the Clerk role check. Convex entitlement is now authoritative for
    paying customers.
  - src/services/widget-store.ts — same fix in isProUser so widget, search,
    and event-handler gates agree with panel gating.
  - src/app/panel-layout.ts — after the existing free->pro reload branch,
    re-run updatePanelGating(getAuthState()) on every entitlement snapshot.
    Without this a legacy-pro user's null->true initial snapshot (NOT
    reloaded to avoid a loop) would leave the paywall overlay in place
    until the next auth event; likewise on WS reconnect or revocation the
    lock state must follow the current snapshot synchronously.

Import graph stays acyclic: panel-gating -> entitlements -> convex-client
-> clerk; widget-store -> entitlements likewise. Typecheck clean.

Pairs with PR #3163 (already merged). Earlier iterations of the activation
race fix were cargo-culted around the transition detector; this is the
actual UI-visible fix.

* refactor(pro): drop redundant isEntitled() from hasPremiumAccess

Greptile P2: the isEntitled() check at the end of hasPremiumAccess can
never flip the result, because isProUser() — called two lines earlier in
the same function — now also checks isEntitled() after this PR's
widget-store change. If isProUser() returns false, we already know
isEntitled() was false inside it.

Remove the redundant call and expand the docstring to say explicitly that
isProUser carries the Convex entitlement signal. Keeps hasPremiumAccess
as a thin union of signals that aren't already covered by isProUser
(WORLDMONITOR_API_KEY desktop secret + the passed-in authState.role which
could in principle differ from getAuthState()).
2026-04-18 15:49:16 +04:00
Elie Habib
01c607c27c fix(brief): compose magazine stories from digest accumulator, not news:insights (#3168)
Root cause of the "email digest lists 30 critical events, brief shows
2 random Bellingcat stories" mismatch reported today: the email and
the brief read from two unrelated Redis keys.

  email digest -> digest:accumulator:v1:{variant}:{lang}
                  live per-variant ZSET of 30+ ingested stories,
                  hydrated from story:track:v1:{hash} + sources.
                  written by list-feed-digest on every ingest cycle.

  brief        -> news:insights:v1
                  global 8-story summary written by seed-insights.
                  After sensitivity=critical filter only 2 survive.
                  A completely different pool on a different cadence.

The brief was shipping from the wrong source, so a user who had just
read "UNICEF / Hormuz / Rohingya" in their email would open their
brief and see unrelated Bellingcat pieces.

Fix: brief now composes from the same digest accumulator the email
reads. scripts/lib/brief-compose.mjs exposes a new
composeBriefFromDigestStories(rule, digestStories, insightsNumbers,
{nowMs}) that maps the digest story shape ({hash, title, severity,
sources[], ...}) through a small adapter into the upstream brief-
filter shape, applies the user's sensitivity gate, and assembles the
envelope. news:insights:v1 is still read — but only for the
clusters / multi-source counters on the stats page. A failed
insights fetch now returns zeroed stats instead of aborting brief
composition, because the stories (not the numbers) are what matter.

seed-digest-notifications:
- composeBriefsForRun now calls buildDigest(candidate, windowStart)
  per rule instead of using a single global pool
- memoizes buildDigest by (variant, lang, windowStart) to keep the
  per-user loop from issuing N identical ZRANGE+HGETALL round-trips
- BRIEF_STORY_WINDOW_MS = 24h — a weekly-cadence user still expects
  a fresh brief in the dashboard every day, independent of email
  cadence
- composeBriefForRule kept as @deprecated so tests that stub
  news:insights directly don't break; all live traffic uses the
  digest path
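
The memoisation is shaped like this (a generic sketch — the real
buildDigest signature is assumed, not quoted):

```typescript
// Memoise an async builder by (variant, lang, windowStart) so the
// per-user loop issues one ZRANGE+HGETALL round-trip per distinct pool,
// not one per user. Caching the Promise (not the resolved value) also
// dedupes concurrent callers racing on the same key.
function memoizeDigestBuilder<T>(
  build: (variant: string, lang: string, windowStart: number) => Promise<T>,
) {
  const cache = new Map<string, Promise<T>>();
  return (variant: string, lang: string, windowStart: number): Promise<T> => {
    const key = `${variant}|${lang}|${windowStart}`;
    let hit = cache.get(key);
    if (!hit) {
      hit = build(variant, lang, windowStart);
      cache.set(key, hit);
    }
    return hit;
  };
}
```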

Tests: new tests/brief-from-digest-stories.test.mjs (12 cases) locks
the mapping — empty input, source selection, sensitivity pass/drop,
12-story cap, moderate→medium severity aliasing, category/country
defaults, stats-number passthrough, determinism.

122/122 brief tests pass; both tsconfigs typecheck clean.

Operator note: today's wrong brief at brief:user_...:2026-04-18 was
already DELed manually. The next cron tick under this code composes
a correct one from the same pool the email used.
2026-04-18 15:47:08 +04:00
Elie Habib
e98df6f694 fix(brief): clerk-pro-only gate + generation-guarded token cache (#3166)
Two P1 race/gate findings from post-merge review of #3160.

Finding 1 — mixed-auth path still rendered as unlocked.
hasPremiumAccess() returns true for desktop API key / browser tester
keys even when the signed-in Clerk account is FREE. The brief is
stored at brief:{clerkUserId}:{date} in Redis — without a Clerk PRO
user there is nothing to fetch, and /api/latest-brief returns 403.
Relying on the panel's inline role check only inverts the UX: the
panel "unlocks", then paints an upgrade CTA inside an unlocked body.

Fix: WEB_CLERK_PRO_ONLY_PANELS in panel-layout.ts. When a panel is
in this set AND the Clerk role is not 'pro', the layout downgrades
the gate reason from NONE to FREE_TIER (or ANONYMOUS when no Clerk
user at all). The panel now shows the same locked overlay an actual
free user sees — consistent, and no doomed fetch.

Finding 2 — clearClerkTokenCache() didn't invalidate mid-flight fetch.
Nulling _cachedToken and _tokenInflight does not cancel a promise
that is already awaiting session.getToken(). When that promise
resolves it unconditionally writes _cachedToken = tokenA and returns
tokenA to its already-awaiting callers, re-poisoning the cache for
50s and silently shipping A's JWT into B's session. The panel's
post-response UI-user check catches the direct A-during-fetch race,
but a panel that starts a fresh refresh AFTER the switch can still
get A's token back from the stale inflight.

Fix: monotonic _tokenGen counter. Bumped by clearClerkTokenCache()
and signOut(). Each getClerkToken() captures myGen on entry; if the
generation has advanced by the time the JWT arrives, the result is
dropped on the floor (no cache write, no return value). The finally
block also guards the _tokenInflight null-out so a newer generation's
inflight isn't clobbered.
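
The guard, sketched outside the real module (function names shortened;
the 50s TTL and Clerk specifics are elided — fetchToken stands in for
session.getToken):

```typescript
type TokenFetcher = () => Promise<string>;

let _cachedToken: string | null = null;
let _tokenInflight: Promise<string | null> | null = null;
let _tokenGen = 0;

function clearTokenCache(): void {
  _tokenGen++; // advance generation: any in-flight result is now stale
  _cachedToken = null;
  _tokenInflight = null;
}

function getToken(fetchToken: TokenFetcher): Promise<string | null> {
  if (_cachedToken) return Promise.resolve(_cachedToken);
  if (_tokenInflight) return _tokenInflight;
  const myGen = _tokenGen; // captured on entry
  _tokenInflight = fetchToken()
    .then((jwt) => {
      if (myGen !== _tokenGen) return null; // stale: drop on the floor
      _cachedToken = jwt;
      return jwt;
    })
    .finally(() => {
      // Guard the null-out so a newer generation's inflight survives.
      if (myGen === _tokenGen) _tokenInflight = null;
    });
  return _tokenInflight;
}
```

  The stale branch neither writes the cache nor returns the JWT, which
  is the whole point: user A's late-resolving token can no longer
  re-poison the cache after a switch to B.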

Typecheck clean on both tsconfigs. 94/94 brief + deploy tests green.
2026-04-18 15:46:56 +04:00
Elie Habib
8684e5a398 fix(brief): per-route CSP override so magazine swipe/arrow nav runs (#3165)
* fix(brief): per-route CSP override so magazine swipe/arrow nav runs

The global CSP at /((?!docs).*) allow-lists only four SHA-256 hashes
for inline scripts (the app's own index.html scripts). brief-render.js
emits its swipe/arrow/wheel/touch nav as a deterministic inline IIFE
with a different hash, so the browser silently blocked it. The deck
rendered, pages were present, dots were drawn — but nothing advanced.

Fix mirrors the existing /api/slack/oauth/callback and
/api/discord/oauth/callback precedent: a per-route Content-Security-
Policy header for /api/brief/(.*) that relaxes script-src to
'unsafe-inline'. Everything else is tight:
- default-src 'self'
- connect-src 'self' (no outbound network)
- object-src 'none', form-action 'none'
- frame-ancestors pinned to worldmonitor domains
- style-src keeps Google Fonts; font-src keeps gstatic
- script-src keeps Cloudflare Insights beacon (auto-injected)

'unsafe-inline' is safe here because server/_shared/brief-render.js
HTML-escapes all Redis-sourced content via escapeHtml over [&<>"'].
No user-controlled string reaches the DOM unescaped.
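
For reference, such a per-route override takes roughly this shape in a
vercel.json-style headers block. The value below is condensed and
illustrative — it omits the style-src/font-src/frame-ancestors sources
described above and is not the shipped policy:

```json
{
  "headers": [
    {
      "source": "/api/brief/(.*)",
      "headers": [
        {
          "key": "Content-Security-Policy",
          "value": "default-src 'self'; script-src 'unsafe-inline' https://static.cloudflareinsights.com; connect-src 'self'; object-src 'none'; form-action 'none'; base-uri 'self'"
        }
      ]
    }
  ]
}
```

A more specific route pattern wins over the catch-all, which is what
lets the brief pages relax script-src without touching the app's
hash-pinned global policy.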

Verified: all 17 tests/deploy-config.test.mjs security-header
assertions still pass (they target the catch-all route, untouched).

* fix(brief): un-block Cloudflare Insights beacon + add CSP test coverage

Two P2 follow-ups from Greptile review on #3165.

1. connect-src was 'self' only — the Cloudflare Insights beacon script
   loaded (script-src allowed static.cloudflareinsights.com) but its
   outbound POST to https://cloudflareinsights.com/cdn-cgi/rum was
   silently blocked. Analytics for brief-page traffic was dropped with
   no console error. Added https://cloudflareinsights.com to
   connect-src so the beacon can ship its payload.

2. tests/deploy-config.test.mjs had 17 assertions for the catch-all
   CSP but nothing for the new /api/brief/(.*) override. Any future
   edit — or accidental deletion — of the rule would land without a
   red test. Added a 4-test suite covering:
   - rule exists with a CSP header
   - script-src allows 'unsafe-inline' (the whole point)
   - connect-src allows cloudflareinsights.com (this fix)
   - tight non-script defaults still present (default-src 'self',
     object-src 'none', form-action 'none', base-uri 'self')

21/21 deploy-config assertions pass locally.
2026-04-18 15:20:01 +04:00
Sebastien Melki
bc91c61a87 [codex] guard duplicate subscription checkout (#3162)
* guard duplicate subscription checkout

* address checkout guard review feedback
2026-04-18 15:19:34 +04:00
Elie Habib
c49c2f80f6 fix(pro): reliable post-payment activation (#3163)
* fix(pro): reliable post-payment activation (transition reload + auth wait + overlay-success reload)

Fixes a silent race where paying users saw locked panels after a successful
Dodo checkout and concluded PRO hadn't activated. Incident 2026-04-17/18:
one customer purchased Pro Monthly twice within 32 min (Google Pay,
then Credit Card) because the first charge showed no UI change; the
duplicate was refunded by Dodo with reason "Duplicate transaction".

Server path (webhook -> Convex entitlements row) was verified correct end
to end: all 9 webhook events processed, entitlement row written within
seconds, planKey=pro_monthly. The bug was client-side in three places.

1. panel-layout.ts replaced skipInitialSnapshot with a free->pro
   transition detector (shouldReloadOnEntitlementChange). The prior guard
   swallowed the first pro snapshot unconditionally, which collapsed two
   distinct cases: (a) legacy-pro user on normal page load (correctly no
   reload) and (b) free user whose post-payment pro snapshot arrives after
   panels rendered against free-tier gating (should reload). The
   transition detector distinguishes them by remembering the last observed
   entitlement.

2. entitlements.ts awaits waitForConvexAuth(10_000) before calling
   client.onUpdate. Mirrors the pattern already used in api-keys.ts and
   App.ts claimSubscription path. Eliminates the spurious FREE_TIER_DEFAULTS
   first snapshot from unauthenticated cold sessions that the transition
   detector would otherwise treat as the baseline.

3. checkout.ts on Dodo overlay checkout.status=succeeded schedules a
   window.location.reload() after 3s (median webhook latency <5s observed
   in prod). Belt-and-braces: guarantees the post-payment state is fresh
   even if the WS subscription is slow or the transition detector misses
   the edge for any reason.
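
The detector itself is tiny — the behaviour described above reduces to
one predicate (sketch; the real one lives in panel-layout.ts):

```typescript
// lastEntitled: last observed entitlement. null = "no snapshot seen
// yet", so a legacy-pro user's first snapshot is a baseline, not a
// transition. Reload fires only on an observed false -> true edge
// (post-payment activation).
function shouldReloadOnEntitlementChange(
  lastEntitled: boolean | null,
  nextEntitled: boolean,
): boolean {
  return lastEntitled === false && nextEntitled === true;
}
```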

Unit tests in tests/entitlement-transition.test.mts cover all six
(last, next) combinations plus the full incident-simulation sequence
(null -> false -> true -> true => exactly one reload) and the legacy-pro
reconnect sequence (null -> true -> true -> true => zero reloads).

Out of scope (tracked separately): server-side duplicate-subscription
guard in _createCheckoutSession.

* fix(pro): seed lastEntitled=false on redirect-return from checkout

Addresses a gap in the original PR: the transition detector still swallowed
the first pro snapshot when the user came back via Dodo's full-page
redirect flow (/pro page -> Dodo checkout -> return to worldmonitor.app
with ?subscription_id=...&status=active URL params handled by
handleCheckoutReturn).

On that path a fast webhook can land before the browser finishes the
return navigation. When the dashboard boots, Convex's first entitlement
snapshot already carries pro_monthly — which the detector treats as the
"legacy-pro on normal page load" case and does not reload. Panels rendered
against free-tier gating stay locked until manual refresh.

Fix: when handleCheckoutReturn() returns true, seed lastEntitled=false
instead of null. This biases the detector to treat the first pro snapshot
as the true free->pro transition that it is, not a legacy-pro baseline.

Adds two new unit tests covering both redirect-return timings (webhook
already landed; webhook still pending). Full transition suite is now
10/10 passing.

* fix(pro): seed lastEntitled=false across the overlay reload too

Prior amendment covered the full-page Dodo redirect return (URL carries
subscription_id params consumed by handleCheckoutReturn). But the overlay
success path does its own setTimeout(() => window.location.reload(), 3_000)
and the overlay uses manualRedirect:true, so the reload lands at the
original URL with no params. handleCheckoutReturn returns false there,
returnedFromCheckout stays false, lastEntitled seeds to null, and a fast
webhook's first-snapshot pro entitlement gets swallowed as legacy-pro
baseline — same class of bug that caused the 2026-04-17/18 incident, now
reproducible on the overlay path instead of the redirect path.

Fix: before the scheduled reload, set a session flag (wm-post-checkout).
On the reloaded page, panel-layout consumes the flag and treats it as a
post-checkout return, which makes the transition detector seed
lastEntitled=false and correctly route the first pro snapshot through the
reload.

Session storage is used (not local) so the flag is scoped to the tab and
doesn't leak across sessions. Silent try/catch keeps private-browsing
environments working — in that case we fall back to the pre-flag behavior
(risk bounded by the 3s reload + Convex WS catching up, same as before).
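
The flag handling, sketched with an injected storage so it's testable
outside a browser (in the app this would be window.sessionStorage;
function names are illustrative, consume-once semantics as described
above):

```typescript
const POST_CHECKOUT_FLAG = "wm-post-checkout";

interface FlagStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Set before the scheduled reload. Silent catch keeps private-browsing
// environments (where storage throws) on the pre-flag fallback path.
function markPostCheckout(storage: FlagStorage): void {
  try {
    storage.setItem(POST_CHECKOUT_FLAG, "1");
  } catch {
    /* fall back to pre-flag behaviour */
  }
}

// Consumed on the reloaded page: reading clears the flag so a later
// plain reload doesn't re-trigger the post-checkout path.
function consumePostCheckout(storage: FlagStorage): boolean {
  try {
    const set = storage.getItem(POST_CHECKOUT_FLAG) === "1";
    if (set) storage.removeItem(POST_CHECKOUT_FLAG);
    return set;
  } catch {
    return false;
  }
}
```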
2026-04-18 15:19:12 +04:00
Elie Habib
5673bc6c16 fix(brief): sign-in-required state + composing auto-refresh (#3160)
* fix(brief): sign-in-required state + composing poll + visibility refresh

Addresses two P1 review findings after PR #3159 merged.

1. hasPremiumAccess unlocks for desktop WORLDMONITOR_API_KEY and
   browser tester keys, but /api/latest-brief is Clerk-userId
   scoped — it returns 401 for those paths. Previously the panel
   unlocked then showed a retry-loop error for API-key-only users.
   Now: if authState.user is missing, render a dedicated "Sign in
   to view your brief" CTA inline (no endpoint hit). Subscribes to
   subscribeAuthState; when the user signs in mid-session, refresh()
   fires automatically.

2. composing state never auto-refreshed. Fix: every renderComposing
   schedules a 60s setTimeout re-poll; cleared on ready/error/lock/
   destroy. Also added a document visibilitychange listener that
   triggers refresh when the tab comes back into focus. Covers the
   "brief composed while tab backgrounded" case without reload.

Added matching destroy() override to clean up timeout + auth
subscription + visibility listener.

Typecheck + biome lint clean.

* fix(brief): direct Clerk fetch + sign-out clear + account-switch guard

Addresses three P1 review findings on PR #3160.

1. Desktop API key + Clerk Pro couldn't load the panel.
   premiumFetch hard-stops on WORLDMONITOR_API_KEY and never sends
   Clerk Bearer. /api/latest-brief is Bearer-only so every desktop
   request 401'd even for a signed-in Pro user. Fix: fetchLatest
   bypasses premiumFetch entirely — imports getClerkToken() and
   builds the request with Authorization: Bearer directly. The
   user-scoped pre-check already guaranteed we have a Clerk user.

2. Sign-out left the previous user's brief on screen. The auth
   subscription only triggered refresh on truthy nextId, and
   hasPremiumAccess stays true on desktop/tester keys so the
   layout-level updatePanelGating doesn't re-lock us. Fix: the
   subscription now handles ALL three transitions explicitly:
     null → id   → refresh (sign-in)
     idA → idB   → abort in-flight + refresh (account switch)
     id → null   → abort in-flight + renderSignInRequired (sign-out)
   Clears composing poll and inflight fetch on every transition.

3. Clerk account switch could paint user A's brief in user B's
   session. getClerkToken caches the JWT for 50s, so a fast A→B
   switch during an in-flight fetch would hit the server with A's
   Bearer, return A's brief, and the post-response guard (which
   only checked "still premium") would let it paint in B's UI.
   Fix: refresh() captures requestUserId at fetch-start and the
   post-response + error branches re-verify that the current
   authState.user.id still equals requestUserId before any
   this.content mutation. A transient account switch silently
   drops the stale response.
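
   The capture-and-recheck pattern, reduced to a sketch (getCurrentUserId,
   fetchBrief, and paint are injected stand-ins for the panel's authState
   read, Bearer fetch, and this.content mutation):

```typescript
async function guardedRefresh(
  getCurrentUserId: () => string | null,
  fetchBrief: (userId: string) => Promise<string>,
  paint: (brief: string) => void,
): Promise<void> {
  const requestUserId = getCurrentUserId(); // captured at fetch-start
  if (!requestUserId) return;
  try {
    const brief = await fetchBrief(requestUserId);
    // Re-verify before any content mutation: a mid-fetch account
    // switch silently drops the stale response instead of painting it.
    if (getCurrentUserId() !== requestUserId) return;
    paint(brief);
  } catch {
    if (getCurrentUserId() !== requestUserId) return;
    // error UI would render here; elided in this sketch
  }
}
```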

Typecheck + biome lint clean.

* fix(brief): clear Clerk token cache on user-id transition

Closes the remaining account-switch race the previous commit
couldn't cover from inside the panel. Detail:

My previous post-response userId check compared the CURRENT
authState.user.id to the requestUserId captured at refresh start.
For a fast A→B switch:
  - Subscription handler fires, lastUserId = B
  - refresh() captures requestUserId = B
  - fetchLatest calls getClerkToken() → returns A's cached JWT
    (50s TTL, keyed by time not user)
  - Server /api/latest-brief decodes A's sub → returns A's brief
  - Post-check: currentUserId (B) === requestUserId (B) → paint
  - Result: user A's greeting + signed magazineUrl painted in
    user B's session

Fix: clear the Clerk token cache via the existing
clearClerkTokenCache() export from clerk.ts on every observed
user-id transition. Next getClerkToken() re-fetches from
Clerk.session — bound to the CURRENT session, not the previous
one. Server now receives B's token, returns B's brief.

Covers sign-in (A=null→B), account switch (A→B), and sign-out
(B→null) symmetrically. The (B→null) branch also short-circuits
to renderSignInRequired which was already in place.

Findings 1 (sign-out clear) and 3 (desktop API key) from the
latest review were already resolved in commit 8b0690b9a —
reviewer was evaluating pre-commit state. This commit closes
the residual token-cache hole that wasn't visible from the
panel alone.

Typecheck + biome lint clean.

* fix(brief): upgrade CTA for mixed-auth + clear inflight promise too

Addresses two P1 review findings on PR #3160.

1. Mixed auth: tester/API key + FREE Clerk session. hasPremiumAccess
   unlocks the panel; the Bearer-only fetch now sends the free
   Clerk's JWT; /api/latest-brief validates entitlement from the
   JWT userId and returns 403. Before: user saw a retry-loop
   error banner. Now: panel checks authState.user.role !== 'pro'
   BEFORE fetching and renders a dedicated "Pro required" CTA
   inline. A stale client cache that says role='pro' while Convex
   says free is also covered — 403 from the server now surfaces
   as the same upgrade CTA, not a retry error. Introduces a typed
   BriefAccessError so the refresh loop can branch terminal vs
   transient failures cleanly.

2. clearClerkTokenCache only nulled _cachedToken, not
   _tokenInflight. If a token fetch for user A was in flight when
   the app switched to B, the next getClerkToken() reused A's
   promise and sent A's JWT. Server returned A's brief; post-
   response guard (currentUserId B === requestUserId B) let it
   paint. Fix: clearClerkTokenCache now nulls _tokenInflight too.
   The old promise still resolves to its closure, but no caller
   holds a reference, so the fresh getClerkToken() call starts a
   new request bound to the current Clerk session.

Typecheck + biome lint clean.
2026-04-18 14:51:32 +04:00
Elie Habib
64c906a406 feat(eia): gold-standard /api/eia/petroleum (Railway seed → Redis → Vercel reads only) (#3161)
* feat(eia): move /api/eia/petroleum to gold-standard (Railway seed → Redis → Vercel reads only)

Live api.eia.gov fetches from the Vercel edge function were causing
FUNCTION_INVOCATION_TIMEOUT 504s on /api/eia/petroleum (Sydney edge →
US origin with no timeout, no cache, no stale fallback — one EIA blip
blew the 25s budget).

- New seeder scripts/seed-eia-petroleum.mjs — fetches WTI/Brent/
  production/inventory from api.eia.gov with per-fetch 15s timeouts,
  writes energy:eia-petroleum:v1 with the {_seed, data} envelope.
  Accepts 1-of-4 series; 0-of-4 routes to contract-mode RETRY so
  seed-meta stays stale and the bundle retries on next cron.
- Bundled into seed-bundle-energy-sources.mjs (daily, 90s timeout) —
  no new Railway service needed.
- Rewrote api/eia/[[...path]].js as a Redis-only reader via
  readJsonFromUpstash. Same response shape for backward compat with
  widgets/MCP/external callers. 503 + Retry-After on miss (never 504).
- Registered eiaPetroleum in api/health.js STANDALONE_KEYS + gated as
  ON_DEMAND_KEYS for the deploy window; promote to SEED_META
  (maxStaleMin: 4320) in a follow-up after ~7 days of clean cron.
- Tests: 14 seeder unit tests + 9 edge handler tests.
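
The read path reduces to this shape (sketch only: readCached stands in
for readJsonFromUpstash, the _seed payload and the Retry-After value
are assumptions, and the real handler returns the legacy response
shape for backward compat):

```typescript
interface SeedEnvelope<T> {
  _seed: { at: number };
  data: T;
}

// Redis-only: never fetch api.eia.gov live from the edge. A cache miss
// is a fast 503 + Retry-After, never a FUNCTION_INVOCATION_TIMEOUT 504.
async function handlePetroleumRead<T>(
  readCached: (key: string) => Promise<SeedEnvelope<T> | null>,
): Promise<{
  status: number;
  headers: Record<string, string>;
  body: T | { error: string };
}> {
  const envelope = await readCached("energy:eia-petroleum:v1");
  if (!envelope) {
    return {
      status: 503,
      headers: { "Retry-After": "300" }, // value illustrative
      body: { error: "seed not available yet" },
    };
  }
  return { status: 200, headers: {}, body: envelope.data };
}
```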

Audit result: /api/eia/petroleum was the only Vercel route fetching
dashboard data live. Every other fetch(https://…) in api/ is
auth/payments/notifications/user-initiated enrichment.

* fix(eia): close silent-stale window — add SEED_META + seed-health registration

Review finding on PR #3161: without a SEED_META entry, readSeedMeta
returns seedStale: null and classifyKey never reaches STALE_SEED.
That meant a broken Railway cron or missing EIA_API_KEY after the first
successful seed would keep /api/eia/petroleum serving stale data for
up to 7 days (TTL) while /api/health reported OK.

- api/health.js: add SEED_META.eiaPetroleum with maxStaleMin=4320
  (72h = 3× daily bundle cadence). Keep eiaPetroleum in ON_DEMAND_KEYS
  so the Vercel-instant / Railway-delayed deploy window doesn't CRIT
  on first seed, but stale-after-seed now properly fires STALE_SEED.
- api/seed-health.js: register energy:eia-petroleum in SEED_DOMAINS
  (intervalMin=1440) so the secondary health endpoint reports it too.
- Updated ON_DEMAND_KEYS comment to reflect freshness is now enforced.
2026-04-18 14:40:00 +04:00
Elie Habib
fd419bcfae feat(brief): dashboard Latest Brief panel (Phase 4) (#3159)
* feat(brief): dashboard "Latest Brief" panel (Phase 4)

New PRO-gated panel that reads /api/latest-brief and renders today's
brief with a cover-style thumbnail + greeting + thread count + CTA.
Clicking opens the signed magazine URL in a new tab. Base Panel
class handles the PRO overlay (ANONYMOUS/FREE_TIER) via
premium: 'locked' — no story content, headline, or greeting leaks
through DOM on the locked state.

Three render states:
- ready     → cover card + "Read brief →" CTA linking to magazineUrl
- composing → neutral empty state ("Your brief is composing.")
- error     → base showError() with retry

Files:
- src/components/LatestBriefPanel.ts — new Panel subclass, self-
  fetching via premiumFetch (handles Clerk Bearer + X-WorldMonitor-
  Key tester keys + api key fallback)
- src/components/index.ts — export the new panel
- src/app/panel-layout.ts — createPanel('latest-brief', ...)
- src/config/panels.ts — registry entry (priority 1 so it sorts up
  front across all variant registries)
- src/styles/panels.css — cover-card + meta-strip styles using the
  same e-ink palette as the magazine (sienna kicker, bone text on
  ink cover, serif greeting)

Self-contained: no Convex migration, no new env vars, no backend
changes. Reads the /api/latest-brief endpoint already shipped in
Phase 2 (#3153 merged). Lands independently of Phase 3b / 5 / 6 / 8.

Follow-ups (not in this PR):
- CMD+K entry for "Open Latest Brief" — locale strings + commands
  registry, trivial.
- Localisation of panel title + copy strings.
- Share button (todo 223).

Typecheck clean, lint clean on the new file.

* fix(brief): register latest-brief in both premium gate registries

Addresses the review finding that the Panel base class's
`premium: 'locked'` flag is NOT what actually enforces PRO gating in
the app. Two separate registries do:

1. WEB_PREMIUM_PANELS in src/app/panel-layout.ts — the set
   updatePanelGating() iterates on every auth-state change to decide
   which panels to lock with a CTA overlay. Panels not in this set
   get `reason === NONE` and are always unlocked for whoever's
   viewing them, regardless of the Panel constructor flag.

2. The `premium:` property on each entry in src/config/panels.ts —
   which isPanelEntitled() checks to decide whether a panel is
   premium at all.

`latest-brief` was missing from both. Result for anonymous/free
users: the panel mounted, self-fetched /api/latest-brief, got 401
or 403, and showed raw error UI instead of the intended "Upgrade to
Pro" overlay. Also: a PRO user who downgraded mid-session would
retain the rendered brief because updatePanelGating() wouldn't
re-lock them.

Fixes:

- src/app/panel-layout.ts — add 'latest-brief' to WEB_PREMIUM_PANELS
  so updatePanelGating() locks the panel correctly for non-PRO users
  and RE-locks it on a mid-session downgrade.

- src/config/panels.ts — add `premium: 'locked' as const` to all
  four registry entries (full, finance, tech, happy variants) so
  isPanelEntitled() treats it as premium everywhere.

- src/components/LatestBriefPanel.ts — guard refresh() against
  running without premium access. Belt-and-suspenders against race
  conditions where the panel mounts before updatePanelGating()
  completes, and against mid-session downgrade where the panel
  stays mounted but should stop hitting the endpoint. Uses the
  same hasPremiumAccess(getAuthState()) check as the gating
  infrastructure itself.

Typecheck + biome lint clean.

* fix(brief): SVG logo now actually renders + queue concurrent refresh

Addresses two P1 + one P2 from Greptile on PR #3159.

1. P1 (line 147 + line 167): `h('div', { innerHTML: ... })` silently
   did nothing. src/utils/dom-utils.ts applyProps has no special
   case for `innerHTML` — it falls through to
   `el.setAttribute('innerHTML', svgString)` which just sets a
   literal DOM attribute. Both logo containers rendered empty.
   Switched to:
     const logo = h('div', { className: '...' });
     logo.appendChild(rawHtml(WM_LOGO_SVG));
   rawHtml() exists in dom-utils for exactly this case; returns a
   parsed DocumentFragment.

2. P2: Concurrent refresh() was silently dropped. Added a
   refreshQueued flag so a second refresh during an in-flight one
   queues a single follow-up pass instead of disappearing. Now a
   retry-after-error or a downstream caller that triggers refresh
   while another is mid-fetch always sees its intent applied.

Typecheck + biome lint clean.

* fix(brief): close downgrade-leak + blank-on-upgrade races on panel

Addresses two P1 findings on PR #3159.

1. In-flight fetch leaked premium content after downgrade.
   refresh() checked entitlement only BEFORE await premiumFetch.
   If auth flipped during the fetch, updatePanelGating had already
   replaced this.content with the locked CTA, but renderReady/
   renderComposing then overwrote it with brief content. Fixed with
   a three-gate sequence + fetch abort:

   (a) Pre-fetch check: gate + hasPremiumAccess — unchanged.
   (b) In-flight abort: override showGatedCta() to abort()
       the AbortController before super() paints the locked CTA.
       renderReady/renderComposing never even runs.
   (c) Post-response re-check: re-verify this.gateLocked +
       hasPremiumAccess before any this.content mutation. Catches
       the tight window where abort() lost the race or where an
       error-handler path could still paint brief-ish UI.

   All three are needed — a user can sign out between any two of
   them; removing any one leaves a real leakage window.

2. Upgrade → blank panel.
   unlockPanel() base-class behaviour clears the locked content and
   leaves the content element empty. No refresh was triggered on
   the free/anon → PRO transition, so the panel stayed blank until
   page reload. Overrode unlockPanel() to detect the wasLocked
   transition and call refresh() after re-rendering the loading
   state.

Also tracks gateLocked as a local mirror of the base's private
_locked, since Panel doesn't expose a getter. Synced via the two
override sites above; used in the three-gate checks.

Typecheck + biome lint clean.
2026-04-18 13:28:23 +04:00
Elie Habib
711636c7b6 feat(brief): consolidate composer into digest cron (retire standalone service) (#3157)
* feat(brief): consolidate composer into digest cron (retire standalone service)

Merges the Phase 3a standalone Railway composer into the existing
digest cron. End state: one cron (seed-digest-notifications.mjs)
writes brief:{userId}:{issueDate} for every eligible user AND
dispatches the digest to their configured channels with a signed
magazine URL appended. Net -1 Railway service.

User's architectural note: "there is no reason to have 1 digest
preparing all and sending, then another doing a duplicate". This
delivers that — infrastructure consolidation, same send cadence,
single source of truth for brief envelopes.

File moves / deletes:

- scripts/seed-brief-composer.mjs → scripts/lib/brief-compose.mjs
  Pure-helpers library: no main(), no env guards, no cron. Exports
  composeBriefForRule + groupEligibleRulesByUser + dedupeRulesByUser
  (shim) + shouldExitNonZero + date helpers + extractInsights.
- Dockerfile.seed-brief-composer → deleted.
- The seed-brief-composer Railway service is retired (user confirmed
  they would delete it manually).

New files:

- scripts/lib/brief-url-sign.mjs — plain .mjs port of the sign path
  in server/_shared/brief-url.ts (Web Crypto only, no node:crypto).
- tests/brief-url-sign.test.mjs — parity tests that confirm tokens
  minted by the scripts-side signer verify via the edge-side verifier
  and produce byte-identical output for identical input.

Digest cron (scripts/seed-digest-notifications.mjs):

- Reads news:insights:v1 once per run, composes per-user brief
  envelopes, SETEX brief:{userId}:{issueDate} via body-POST pipeline.
- Signs magazine URL per user (BRIEF_URL_SIGNING_SECRET +
  WORLDMONITOR_PUBLIC_BASE_URL new env requirements, see pre-merge).
- Injects magazineUrl into buildChannelBodies for every channel
  (email, telegram, slack, discord) as a "📖 Open your WorldMonitor
  Brief magazine" footer CTA.
- Email HTML gets a dedicated data-brief-cta-slot near the top of
  the body with a styled button.
- Compose failures NEVER block the digest send — the digest cron's
  existing behaviour is preserved when the brief pipeline has issues.
- Brief compose extracted to its own functions (composeBriefsForRun
  + composeAndStoreBriefForUser) to keep main's biome complexity at
  baseline (64 — was 63 before; inline would have pushed to 117).

Tests: 98/98 across the brief suite. New parity tests confirm cross-
module signer agreement.

PRE-MERGE: add BRIEF_URL_SIGNING_SECRET and WORLDMONITOR_PUBLIC_BASE_URL
to the digest-notifications Railway service env (same values already
set on Vercel for Phase 2). Without them, brief compose is auto-
disabled and the digest falls back to its current behaviour — safe to
deploy before env is set.

* fix(brief): digest Dockerfile + propagate compose failure to exit code

Addresses two seventh-round review findings on PR #3157.

1. Cross-directory imports + current Railway build root (todo 230).
   The consolidated digest cron imports from ../api, ../shared, and
   (transitively via scripts/lib/brief-compose.mjs) ../server/_shared.
   The running digest-notifications Railway service builds from the
   scripts/ root — those parent paths are outside the deploy tree
   and would 500 on next rebuild with ERR_MODULE_NOT_FOUND.

   New Dockerfile.digest-notifications (repo-root build context)
   COPYs exactly the modules the cron needs: scripts/ contents,
   scripts/lib/, shared/brief-envelope.*, shared/brief-filter.*,
   server/_shared/brief-render.*, api/_upstash-json.js,
   api/_seed-envelope.js. Tight list to keep the watch surface small.
   Pattern matches the retired Dockerfile.seed-brief-composer + the
   existing Dockerfile.relay.

2. Silent compose failures (todo 231). composeBriefsForRun logged
   counters but never exited non-zero. An Upstash outage or missing
   signing secret silently dropped every brief write while Railway
   showed the cron green. The retired standalone composer exited 1
   on structural failures; that observability was lost in the
   consolidation.

   Changed the compose fn to return {briefByUser, composeSuccess,
   composeFailed}. Main captures the counters, runs the full digest
   send loop first (compose-layer breakage must NEVER block user-
   visible digest delivery), then calls shouldExitNonZero at the
   very end. Exit-on-failure gives ops the Railway-red signal
   without touching send behaviour.

   Also: a total read failure of news:insights:v1 (catch branch)
   now counts as 1 compose failure so the gate trips on insights-
   key infra breakage, not just per-user write failures.

Tests unchanged (98/98). Typecheck + node --check clean. Biome
complexity ticks 63→65 — same pre-existing bucket, already tolerated
by CI; no new blocker.

PRE-MERGE Railway work still pending: set BRIEF_URL_SIGNING_SECRET
+ WORLDMONITOR_PUBLIC_BASE_URL on the digest-notifications service,
AND switch its dockerfilePath to /Dockerfile.digest-notifications
before merging. Without the dockerfilePath switch, the next rebuild
fails.

* fix(brief): Dockerfile type:module + explicit missing-secret tripwire

Addresses two eighth-round review findings on PR #3157.

1. ESM .js files parse as CommonJS in the container (todo 232).
   Dockerfile.digest-notifications COPYs shared/*.js,
   server/_shared/*.js, api/*.js — all ESM because the repo-root
   package.json has "type":"module". But the image never copies the
   root package.json, so Node's nearest-pjson walk inside /app/
   reaches / without finding one and defaults to CommonJS. First
   `export` statement throws `SyntaxError: Unexpected token 'export'`
   at startup.

   Fix: write a minimal /app/package.json with {"type":"module"}
   early in the build. Avoids dragging the full root package.json
   into the image while still giving Node the ESM hint it needs for
   repo-owned .js files.

2. Missing BRIEF_URL_SIGNING_SECRET silently tolerated (todo 233).
   The old gate folded "operator-disabled" (BRIEF_COMPOSE_ENABLED=0)
   and "required secret missing in rollout" into the same boolean
   via AND. A production deploy that forgot the env var would skip
   brief compose without any failure signal — Railway green, no
   briefs, no CTA in digests, nobody notices.

   Split the two states: BRIEF_COMPOSE_DISABLED_BY_OPERATOR (explicit
   kill switch, silent) and BRIEF_SIGNING_SECRET_MISSING (the misconfig
   we care about). When the secret is missing without the operator
   flag, composeBriefsForRun returns composeFailed=1 on first call
   so the end-of-run exit gate trips and Railway flags the run red.
   Digest send still proceeds — compose-layer issues never block
   notifications.

Tests: 98/98. Syntax + node --check clean.

* fix(brief): address 2 remaining P2 review comments on PR #3157

Greptile review (2026-04-18T05:04Z) flagged three P2 items. The
first (shouldExitNonZero never wired into cron) was already fixed in
commit 35a46aa34. This commit addresses the other two.

1. composeBriefForRule: issuedAt used Date.now() instead of the
   caller-supplied nowMs. Under the digest cron the delta is
   milliseconds and harmless, but it broke the function's
   determinism contract — same input must produce same output for
   tests + retries. Now uses the passed nowMs.

2. buildChannelBodies: magazineUrl embedded raw inside Telegram HTML
   <a href="..."> and Slack <URL|text> syntax. The URL is HMAC-
   signed and shape-validated upstream (userId regex + YYYY-MM-DD
   date), so injection is practically impossible — but the email
   CTA (injectBriefCta) escapes per-target and channel footers
   should match that discipline. Added:
     - Telegram: escape &, <, >, " to HTML entities
     - Slack: strip <, >, | (mrkdwn metacharacters)
   Discord and plain-text paths unchanged — Discord links tolerate
   raw URLs, plain text has no metacharacters to escape.
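
The two escapes can be sketched as below. Helper names are assumed,
not the real buildChannelBodies internals; only the character sets
come from this commit:

```javascript
// Telegram HTML: escape &, <, >, " — ampersand first to avoid
// double-escaping the entities produced by the later passes.
function escapeTelegramHtml(url) {
  return url
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;');
}
// Slack: <, >, | are mrkdwn metacharacters inside <URL|text>; strip them.
function stripSlackMrkdwn(url) {
  return url.replace(/[<>|]/g, '');
}
```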

Tests: 98/98 still pass (deterministic issuedAt change was
transparent to existing assertions because tests already pass nowMs
explicitly via the issuedAt fixture field).
2026-04-18 12:30:08 +04:00
Elie Habib
45da551d17 feat(brief): per-user composer writing brief:{userId}:{issueDate} (Phase 3a) (#3154)
* feat(brief): per-user composer writing brief:{userId}:{issueDate} (Phase 3a)

Phase 3a of docs/plans/2026-04-17-003. Produces the Redis-resident
envelopes that Phases 1 (renderer) and 2 (edge routes) already know
how to serve, so after this ships the end-to-end read path works
with real data.

Files:

- shared/brief-filter.{js,d.ts}: pure helpers. normaliseThreatLevel
  maps upstream 'moderate' -> 'medium' (contract pinned the union in
  Phase 1). filterTopStories applies sensitivity thresholds and caps
  at maxStories. assembleStubbedBriefEnvelope builds a full envelope
  with stubbed greeting/lead/threads/signals and runs it through the
  renderer's assertBriefEnvelope so no malformed envelope is ever
  persisted. issueDateInTz computes per-user local date via Intl
  with UTC fallback.

- scripts/seed-brief-composer.mjs: Railway cron. Reads
  news:insights:v1 once, fetches enabled alert rules via the
  existing /relay/digest-rules endpoint (same set
  seed-digest-notifications uses), then for each rule computes the
  user's local issue date, filters stories, assembles an envelope,
  and SETEX brief:{userId}:{issueDate} with 7-day TTL. Respects
  aiDigestEnabled opt-in. Honours SIGTERM. Exits non-zero when >5%
  of rules fail so Railway surfaces structural breakage.

- Dockerfile.seed-brief-composer: standalone container. Copies the
  minimum set (composer + shared/ contract + renderer validator +
  Upstash helper + seed-envelope unwrapper).

- tests/brief-filter.test.mjs: 22 pure-function tests covering
  severity normalisation (including 'moderate' alias), sensitivity
  thresholds, story cap, empty-title drop, envelope assembly passes
  the strict renderer validator, tz-aware date math across +UTC/-UTC
  offsets with a bad-timezone fallback.
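
The tz-aware date helper can be sketched as follows. This is an
assumed shape, not the real shared/brief-filter.js implementation; it
leans on the en-CA locale formatting dates as YYYY-MM-DD:

```javascript
// Per-user local issue date via Intl, with a UTC fallback for
// unknown/garbage timezones (Intl throws RangeError at construction).
function issueDateInTz(nowMs, timeZone) {
  try {
    return new Intl.DateTimeFormat('en-CA', {
      timeZone,
      year: 'numeric',
      month: '2-digit',
      day: '2-digit',
    }).format(new Date(nowMs));
  } catch {
    return new Date(nowMs).toISOString().slice(0, 10); // UTC calendar date
  }
}
```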

Out of scope for this PR:
- LLM-generated whyMatters / lead / signals (Phase 3b).
- brief_ready event fan-out to notification-relay (Phase 3c).
- Dashboard panel that consumes /api/latest-brief (Phase 4).

Pre-merge runbook:
1. Create a new Railway service from Dockerfile.seed-brief-composer.
2. Set env vars (UPSTASH_*, CONVEX_URL, RELAY_SHARED_SECRET) — reuse
   the values already in the digest service.
3. Add a cron schedule (suggested: hourly at :05 so it lands between
   the insights-seeder tick and the digest cron).
4. Verify first run: check service logs for
   "[brief-composer] Done: success=X ..." and a reader's
   /api/latest-brief should stop returning 'composing' within one
   cron cycle.

Tests: 72/72 (22 brief-filter + 30 render + 20 HMAC). Typecheck +
lint clean. Composer script parses with node --check.

* fix(brief): aiDigestEnabled default + per-user rule dedupe

Addresses two fourth-round review findings on PR #3154.

1. aiDigestEnabled default parity (todo 224). Composer was checking
   `!rule.aiDigestEnabled`, which skips legacy rules that predate the
   optional field. The rest of the codebase defaults it to true
   (seed-digest-notifications.mjs:914 uses `!== false`;
   notifications-settings.ts:228 uses `?? true`; the Convex setter
   defaults to true). Flipped the composer to `=== false` so only an
   explicit opt-out skips the brief.

2. Multi-variant last-write-wins (todo 225). alertRules are
   (userId, variant)-scoped but the brief key is user-scoped
   (brief:{userId}:{issueDate}). Users with the full+finance+tech
   variants all enabled would produce three competing writes with a
   nondeterministic survivor. Added dedupeRulesByUser() that picks
   one rule per user: prefers 'full' variant, then most permissive
   sensitivity (all > high > critical), tie-breaking on earliest
   updatedAt for stability across input reordering. Logs the
   occurrence so we can see how often users have multi-variant
   configs.
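
The selection order reads naturally as a three-key sort. A minimal
sketch, with names and rank values assumed from the description above:

```javascript
// Lower rank = more permissive sensitivity (all > high > critical).
const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };
function pickRuleForUser(rules) {
  return [...rules].sort(
    (a, b) =>
      (a.variant === 'full' ? 0 : 1) - (b.variant === 'full' ? 0 : 1) ||
      SENSITIVITY_RANK[a.sensitivity] - SENSITIVITY_RANK[b.sensitivity] ||
      a.updatedAt - b.updatedAt, // earliest wins: stable across reordering
  )[0];
}
```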

Also hardened against future regressions:

- Moved env-var guards + main() call behind an isMain() check
  (feedback_seed_isMain_guard). Previously, importing the script
  from a test would fire process.exit(0) on the
  BRIEF_COMPOSER_ENABLED=0 branch and kill the test runner. Tests
  now load the file cleanly.

- Exported dedupeRulesByUser so the tests can exercise the selection
  logic directly.

- The new tests/brief-composer-rule-dedup.test.mjs includes a
  cross-module assertion that seed-digest-notifications.mjs still
  reads `rule.aiDigestEnabled !== false`. If the digest cron ever
  drifts, this test fails loud — the brief and digest must agree on
  who is eligible.

Tests: 83/83 (was 72; +6 dedupe cases + 5 aiDigestEnabled parity
cases). Typecheck + lint clean.

* fix(brief): dedupe order + failure-rate denominator

Addresses two fifth-round review findings on PR #3154.

1. Dedupe was picking a preferred variant BEFORE checking whether it
   could actually emit a brief (todo 226). A user with
   aiDigestEnabled=false on 'full' but true on 'finance' got skipped
   entirely; same for a user with sensitivity='critical' on 'full'
   that filters to zero stories while 'finance' has matching content.

   Replaced dedupeRulesByUser with groupEligibleRulesByUser: pre-
   filters opted-out rules, then returns ALL eligible variants per
   user in preference order. The main loop walks candidates and
   takes the first one whose story filter produces non-empty content.
   Fallback is cheap (story filter is pure) and preserves the 'full'-
   first + most-permissive-sensitivity tie-breakers from before.

   dedupeRulesByUser is kept as a thin wrapper for the existing tests;
   new tests exercise the group+fallback path directly (opt-out +
   opt-in sibling, all-opted-out drop, ordering stability).

2. Failure gate denominator drifted from numerator (todo 227). After
   dedupe, `failed` counts per-user but the gate still compared to
   pre-dedupe rules.length. 60 rules → 10 users → 2 failed writes =
   20% real failure hidden behind a 60-rule denominator.

   Fix: denominator is now eligibleUserCount (Map size after
   group-and-filter). Log line reports rules + eligible_users +
   success + skipped_empty + failed + duration so ops can see the
   full shape.

Tests: 86/86 (was 83; +3 new: opt-out+sibling, all-opted-out drop,
candidate-ordering). Typecheck clean, node --check clean, biome clean.
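
The group-then-fallback walk reduces to a short loop. Function and
field names below are illustrative; only the "first candidate with
non-empty filtered content wins" rule comes from this commit:

```javascript
// Candidates arrive already in preference order; the story filter is
// pure, so trying each variant in turn is cheap.
function composeForUser(candidates, filterStories) {
  for (const rule of candidates) {
    const stories = filterStories(rule);
    if (stories.length > 0) return { rule, stories };
  }
  return null; // every eligible variant filtered to zero stories
}
```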

* fix(brief): body-POST SETEX + attempted-only failure denominator

Addresses two sixth-round review findings on PR #3154.

1. Upstash SETEX (todo 228). The previous write path URL-encoded the
   full envelope into /setex/{key}/{ttl}/{payload} which can blow
   past proxy/edge/Node HTTP request-target limits for realistic
   12-story briefs (5-20 KB JSON). Switched to body-POST via the
   existing `redisPipeline` helper — same transport every other
   write in the repo uses. Per-command error surface is preserved:
   the wrapper throws on null pipeline response or on a {error}
   entry in the result array.

2. Failure-rate denominator (todo 229). Earlier round switched
   denominator from pre-dedupe rules.length to eligibleUserCount,
   but the numerator only counts users that actually reached a
   write attempt. skipped_empty users inflate eligibleUserCount
   without being able to fail, so 4/4 failed writes against 100
   eligible (96 skipped_empty) reads as 4% and silently passes.
   Denominator is now `success + failed` (attempted writes only).
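
A sketch of the body-POST write path's pure pieces. The command array
shape follows the Upstash REST pipeline convention; the helper names
are assumptions, not the repo's actual redisPipeline internals:

```javascript
// SETEX as a pipeline command: the payload rides in the POST body, so
// envelope size never hits request-target (URL-length) limits.
function buildSetexCommand(key, ttlSeconds, envelope) {
  return ['SETEX', key, String(ttlSeconds), JSON.stringify(envelope)];
}
// Preserve the per-command error surface of the old write path.
function assertPipelineOk(results) {
  if (!results) throw new Error('null pipeline response');
  for (const entry of results) {
    if (entry && entry.error) throw new Error(entry.error);
  }
  return results;
}
```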

Extracted shouldExitNonZero({success, failed}) so the denominator
contract lives in a pure function with 7 test cases:
- 0 failures → no exit
- 100% failure on small volume → exits
- 1/20 at exact 5% threshold → exits (documented boundary)
- 1/50 below threshold → no exit
- 2/10 above Math.max(1) floor → exits
- 1/1 single isolated failure → exits
- 0 attempted (no signal) → no exit
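
Reconstructed from the seven cases above, the gate fits in a few
lines (a sketch; the real shouldExitNonZero may differ in detail):

```javascript
// Attempted-only denominator: skipped_empty users never dilute the rate.
function shouldExitNonZero({ success, failed }) {
  const attempted = success + failed;
  if (attempted === 0) return false; // nothing attempted: no signal
  if (failed < 1) return false;      // the Math.max(1, ...) floor
  return failed / attempted >= 0.05; // 5% threshold, boundary inclusive
}
```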

Tests: 93/93 (was 86; +7 threshold cases). Typecheck + lint clean.
2026-04-18 08:45:02 +04:00