mirror of https://github.com/koala73/worldmonitor.git (synced 2026-04-25 17:14:57 +02:00)
docs/user-facing-gaps
3471 Commits
63ef0dd0f1
docs(mintlify): finish PR 1 — landing rewrite, features refresh, maritime link-out
Completes the PR 1 items from docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md that were deferred after the checkpoint on Route Explorer + Scenario Engine + CRI nav. No new pages — only edits to existing pages to point at and cohere with the new workflow pages.
- documentation.mdx: landing rewrite. Dropped brittle counts (344 news sources, 49 layers, 24 CII countries, 31+ sources, 24 typed services) in favor of durable product framing. Surfaced the shipped differentiators that were previously invisible on the landing: Country Resilience Index (222 countries, linked to its methodology page), AI daily brief, Route Explorer, Scenario Engine, MCP server. Kept CII and CRI as two distinct country-risk surfaces — do not conflate.
- features.mdx: replaced the 'all 55 panels' Cmd+K claim and the stale inventory list with family-grouped descriptions that include the panels this audit surfaced as missing (disease-outbreaks, radiation-watch, thermal-escalation, consumer-prices, latest-brief, forecast, country-resilience). Added a Workflows section linking to Route Explorer and Scenario Engine, and a Country-level risk section linking CII + CRI. Untouched sections (map, marker clustering, data layers, export, monitors, activity tracking) left as-is.
- maritime-intelligence.mdx: collapsed the embedded Route Explorer subsection to a one-paragraph pointer at /route-explorer so the standalone page is the canonical home.
The Panels nav group remains intentionally unadded; it waits on PR 2 content to avoid rendering an empty group in Mintlify.

233835e206
docs(mintlify): fix stale line cite (MapContainer.activateScenario at :1010)
Greptile review P2: prose cited MapContainer.ts:1004 but activateScenario is declared at :1010. Line 1004 landed inside the JSDoc block.

bffa1f4498
docs(mintlify): fix PRO auth contract (trusted origin ≠ PRO)
- api-scenarios: 'X-WorldMonitor-Key (or trusted browser origin) + PRO' was wrong — isCallerPremium() explicitly skips trusted-origin short-circuits (keyCheck.required === false) and only counts (a) an env-valid or user-owned wm_-prefixed API key with apiAccess entitlement, or (b) a Clerk bearer with role=pro or Dodo tier ≥ 1. Browser calls work because premiumFetch() injects one of those credentials per request, not because Origin alone authenticates. Per server/_shared/premium-check.ts:34 and src/services/premium-fetch.ts:66.
- usage-auth: strengthened the 'Entitlement / tier gating' section to state outright that authentication and PRO entitlement are orthogonal, and that trusted Origin is NOT accepted as PRO even though it is accepted for public endpoints. Listed the two real credential forms that pass the gate.
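The two-credential contract can be sketched as a pure gate function. All names and shapes below are illustrative stand-ins, not the real premium-check.ts code; the point it encodes is that a trusted browser Origin never counts as PRO.

```javascript
// Illustrative sketch only: names and shapes stand in for the real
// server/_shared/premium-check.ts contract. A trusted Origin authenticates
// public calls but NEVER passes this gate; only the two credential forms
// below do.
function isCallerPremium(caller) {
  // (a) env-valid or user-owned wm_-prefixed API key with apiAccess
  if (caller.apiKey?.startsWith('wm_') && caller.entitlements?.includes('apiAccess')) {
    return true;
  }
  // (b) Clerk bearer with role=pro or Dodo tier >= 1
  if (caller.clerkToken?.role === 'pro' || (caller.clerkToken?.dodoTier ?? 0) >= 1) {
    return true;
  }
  return false; // trusted Origin alone, anonymous, or free tier
}
```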

80ede691cf
docs(mintlify): fix fourth-round findings (banner DOM, webhook TTL refresh)
- scenario-engine: accurate description of the rendered scenario banner. Always-present elements are the ⚠ icon, scenario name, top-5 impacted countries with impact %, and dismiss ×. Params chip (e.g. '14d · +110% cost') and 'Simulating …' tagline are conditional on the worker result carrying template parameters (durationDays, disruptionPct, costShockMultiplier). The banner never lists affected chokepoints by name — the map and the chokepoint cards surface those. Per renderScenarioBanner at src/components/SupplyChainPanel.ts:750.
- api-shipping-v2 (webhook TTL): register extends both the record and the owner-index set's 30-day TTL via an atomic pipeline (SET + SADD + EXPIRE). rotate-secret and reactivate only extend the record's TTL — neither touches the owner-index set, so the owner index can expire independently if a caller only rotates/reactivates within a 30-day window. Re-register to keep both alive. Per api/v2/shipping/webhooks.ts:230 (register pipeline) and :325 (rotate setCachedJson on record only).
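The TTL asymmetry can be sketched as the command list such a register pipeline would send. Key names and shapes are assumptions, not the shipped webhooks.ts code; only the SET + SADD + EXPIRE structure and the 30-day TTL come from the commit.

```javascript
// Hypothetical sketch of the register pipeline: one atomic pipeline
// refreshes BOTH the webhook record's TTL and the owner-index set's TTL.
// rotate-secret/reactivate re-write only the record key, which is why the
// owner index can expire independently. Key names are assumptions.
const THIRTY_DAYS_S = 30 * 24 * 60 * 60;

function buildRegisterPipeline(ownerId, webhookId, record) {
  return [
    ['SET', `webhook:${webhookId}`, JSON.stringify(record), 'EX', String(THIRTY_DAYS_S)],
    ['SADD', `webhook-owner:${ownerId}`, webhookId],
    ['EXPIRE', `webhook-owner:${ownerId}`, String(THIRTY_DAYS_S)],
  ];
}
```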

fc7d46829f
docs(mintlify): fix third-round review findings (real IDs + 4-state lifecycle)
- api-scenarios (template example): replaced the invented hormuz-closure-30d / ["hormuz"] with the actually-shipped hormuz-tanker-blockade / ["hormuz_strait"] from scenario-templates.ts:80. Listed the other 5 shipped template IDs so scripted users aren't dependent on a single example.
- api-scenarios (status lifecycle): the worker writes FOUR states, not three. Added the intermediate "processing" state with startedAt, written by the worker at job pickup (scenario-worker.mjs:411). Lifecycle now: pending → processing → done|failed. Both pending and processing are non-terminal.
- scenario-engine (scripted use blurb): mirror the 4-state language and link into the lifecycle table.
- scenario-engine (UI dismiss): replaced "Click Deactivate" with the actual × dismiss control on the scenario banner (aria-label: "Dismiss scenario") per src/components/SupplyChainPanel.ts:790. Also described the banner contents (name, chokepoints, countries, tagline).
- api-shipping-v2: while fixing chokepoint IDs, also corrected "hormuz" → "hormuz_strait" and "bab-el-mandeb" → "bab_el_mandeb" across all four occurrences in the shipping v2 page (from PR #3209). Real IDs come from server/_shared/chokepoint-registry.ts (snake_case, not kebab-case, not bare "hormuz").
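The four-state lifecycle can be sketched as the predicate a scripted poller needs. Helper names and the poll interval are hypothetical; only the state set and the terminal/non-terminal split come from the commit.

```javascript
// The four worker-visible states: pending -> processing -> done | failed.
// pending and processing are non-terminal; done and failed are terminal.
const SCENARIO_STATES = ['pending', 'processing', 'done', 'failed'];

function isTerminal(status) {
  return status === 'done' || status === 'failed';
}

// Hypothetical poller helper: keep waiting on both non-terminal states.
function nextPollDelayMs(status) {
  return isTerminal(status) ? null : 2000;
}
```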

6380245f21
docs(mintlify): fix Route Explorer + Scenario Engine review findings
Reviewer caught 4 cases where I described behavior I hadn't read carefully. All fixes cross-checked against source.
- route-explorer (free-tier): the workflow does NOT blur a numeric payload behind a public demo route. On free tier, fetchLane() short-circuits to renderFreeGate(), which blurs the left rail, replaces the tab area with an Upgrade-to-PRO card, and applies a generic public-route highlight on the map. No lane data is rendered in any tab. See src/components/RouteExplorer/RouteExplorer.ts:212 + :342.
- route-explorer (keyboard): Tab / Shift+Tab moves focus between the panel and the map. Direct field jumps are F (From), T (To), P (Product/HS2), not Tab-cycling. Also added the full KeyboardHelp binding list (S swap, ↑/↓ list nav, Enter commit, Cmd+, copy URL, Esc close, ? help, 1-4 tabs). See src/components/RouteExplorer/KeyboardHelp.ts:9 and RouteExplorer.ts:623.
- scenario-engine: the SCENARIO_TEMPLATES array only ships templates of 4 types today (conflict, weather, sanctions, tariff_shock). The ScenarioType union includes infrastructure and pandemic, but no templates of those types ship. Dropped them from the shipped table and noted that the type union leaves room for future additions.
- scenario-engine + api-scenarios: the worker writes status: 'done' (not 'completed') on success and 'failed' on error; pending is synthesised by the status endpoint when no worker record exists. Fixed both the new workflow page and the merged api-scenarios.mdx completed-response example + polling language. See scripts/scenario-worker.mjs:421 and src/components/SupplyChainPanel.ts:870.

44bc40ee34
docs(mintlify): add Route Explorer + Scenario Engine workflow pages
Checkpoint for review on the IA refresh (per plan docs/plans/2026-04-19-001-feat-docs-user-facing-ia-refresh-plan.md).
- docs/docs.json: link Country Resilience Index methodology under Intelligence & Analysis so the flagship 222-country feature is reachable from the main nav (previously orphaned). Add a new Workflows group containing route-explorer and scenario-engine.
- docs/route-explorer.mdx: standalone workflow page. Who it is for, Cmd+K entry, four tabs (Current / Alternatives / Land / Impact), inputs, keyboard bindings, map-state integration, PRO gating with free-tier blur + public-route highlight, data sources.
- docs/scenario-engine.mdx: standalone workflow page. Template categories (conflict / weather / sanctions / tariff_shock / infrastructure / pandemic), how a scenario activates on the map, PRO gating, pointers to the async job API.
Deferred to follow-up commits in the same PR:
- documentation.mdx landing rewrite
- features.mdx refresh
- maritime-intelligence.mdx link-out to Route Explorer
- Panels nav group (waits for PR 2 content)
All content grounded in live source files cited inline.

e4c95ad9be
docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage (#3209)
* docs(mintlify): cover MCP, OAuth, non-RPC endpoints, and usage
Audit against api/ + proto/ revealed 9 OpenAPI specs missing from nav, the scenario/v1 service undocumented, and MCP (32 tools + OAuth 2.1 flow) with no user-facing docs. The stale Docs_To_Review/API_REFERENCE.md still pointed at pre-migration endpoints that no longer exist.
- Wire 9 orphaned specs into docs.json: ConsumerPrices, Forecast, Health, Imagery, Radiation, Resilience, Sanctions, Thermal, Webcam
- Hand-write ScenarioService.openapi.yaml (3 RPCs) until it's proto-backed (tracked in issue #3207)
- New MCP page with tool catalog + client setup (Claude Desktop/web, Cursor)
- New MDX for OAuth, Platform, Brief, Commerce, Notifications, Shipping v2, Proxies
- New Usage group: quickstart, auth matrix, rate limits, errors
- Remove docs/Docs_To_Review/API_REFERENCE.md and EXTERNAL_APIS.md (referenced dead endpoints); add README flagging the dir as archival
* docs(mintlify): move scenario docs out of generated docs/api/ tree
The pre-push hook enforces that docs/api/ is proto-generated only. Replace the hand-written ScenarioService.openapi.yaml with a plain MDX page (docs/api-scenarios.mdx) until the proto migration lands (tracked in issue #3207).
* docs(mintlify): fix factual errors flagged in PR review
Reviewer caught 5 endpoints where I speculated on shape/method/limits instead of reading the code. All fixes cross-checked against the source:
- api-shipping-v2: route-intelligence is GET with query params (fromIso2, toIso2, cargoType, hs2), not POST with a JSON body. Response shape is {primaryRouteId, chokepointExposures[], bypassOptions[], warRiskTier, disruptionScore, ...}.
- api-commerce: /api/product-catalog returns {tiers, fetchedAt, cachedUntil, priceSource} with tier groups free|pro|api_starter|enterprise, not the invented {currency, plans}. Document the DELETE purge path too.
- api-notifications: Slack/Discord /oauth/start are POST + Clerk JWT + PRO (returning {oauthUrl}), not GET redirects. Callbacks remain GET.
- api-platform: /api/version returns the latest GitHub Release ({version, tag, url, prerelease}), not deployed commit/build metadata.
- api-oauth + mcp: /api/oauth/register limit is 5/60s/IP (match code), not 10/hour. Also caught while double-checking: /api/register-interest and /api/contact are 5/60min and 3/60min respectively (1-hour window, not 1-minute). Both require Turnstile. Removed the fabricated limits for share-url, notification-channels, create-checkout (they fall back to the default per-IP limit).
* docs(mintlify): second-round fixes — verify every claim against source
Reviewer caught 7 more cases where I described API behavior I hadn't read. Each fix below cross-checked against the handler.
- api-commerce (product-catalog): tiers are flat objects with monthlyPrice/annualPrice/monthlyProductId/annualProductId on paid tiers, price+period for free, price:null for enterprise. There is no nested plans[] array.
- api-commerce (referral/me): returns {code, shareUrl}, not counts. Code is a deterministic 8-char HMAC of the Clerk userId; binding into Convex is fire-and-forget via ctx.waitUntil.
- api-notifications (notification-channels): the actual action set is create-pairing-token, set-channel, set-web-push, delete-channel, set-alert-rules, set-quiet-hours, set-digest-settings. Replaced the made-up list.
- api-shipping-v2 (webhooks): alertThreshold is numeric 0-100 (default 50), not a severity string. Subscriber IDs are wh_+24hex; the secret is raw 64-char hex (no whsec_ prefix). POST registration returns 201. Added the management routes: GET /{id}, POST /{id}/rotate-secret, POST /{id}/reactivate.
- api-platform (cache-purge): auth is Authorization: Bearer RELAY_SHARED_SECRET, not an admin-key header. Body takes keys[] and/or patterns[] (not {key} or {tag}), with explicit per-request caps and prefix-blocklist behavior.
- api-platform (download): platform+variant query params, not file=<id>. Response is a 302 to a GitHub release asset; documented the full platform/variant tables.
- mcp: the server also accepts a direct X-WorldMonitor-Key in addition to an OAuth bearer. Fixed the curl example, which was incorrectly sending a wm_live_ API key as a bearer token.
- api-notifications (youtube/live): handler reads channel or videoId, not channelId.
- usage-auth: corrected the auth-matrix row for /api/mcp to reflect that OAuth is one of two accepted modes.
* docs(mintlify): fix Greptile review findings
- mcp.mdx: 'Five' slow tools → 'Six' (the list contains 6 tools)
- api-scenarios.mdx: replace invalid JSON numeric separator (8_400_000_000) with a plain integer (8400000000)
Greptile's third finding — the /api/oauth/register rate-limit contradiction across api-oauth.mdx / mcp.mdx / usage-rate-limits.mdx — was already resolved in an earlier commit.
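The referral-code shape described above (a deterministic 8-char HMAC of the Clerk userId) can be sketched with Node's crypto module. The hash algorithm, secret parameter, and hex truncation are assumptions, not the shipped handler.

```javascript
import { createHmac } from 'node:crypto';

// Sketch only: derives a deterministic 8-character referral code from a
// Clerk userId. Algorithm choice (sha256), secret name, and hex slicing
// are assumptions; the commit only states "deterministic 8-char HMAC".
function referralCodeFor(userId, secret) {
  return createHmac('sha256', secret).update(userId).digest('hex').slice(0, 8);
}
```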

38e6892995
fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs
Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.
Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.
No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
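The slot derivation above can be sketched as a small formatter. The helper name and the Intl-based timezone handling are assumptions; only the YYYY-MM-DD-HHMM shape in the user's local tz comes from the commit.

```javascript
// Sketch of the per-run slot: YYYY-MM-DD-HHMM in the user's timezone, so
// the 8am and 1pm compose runs key distinct envelopes. Helper name and
// implementation are hypothetical.
function slotFor(runDate, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hourCycle: 'h23',
  }).formatToParts(runDate);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}
```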
* fix(brief): carve middleware bot allowlist to accept slot-format carousel path
BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.
CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
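The allowlist carve can be sketched as a pattern that accepts the slot segment and rejects the pre-slot shape. This regex is illustrative; the real BRIEF_CAROUSEL_PATH_RE lives in middleware.ts, and the path prefix here is assumed from the carousel route layout.

```javascript
// Illustrative pattern only: the issueDate segment must now carry the
// -HHMM suffix, and (per the regression in the commit) the old
// YYYY-MM-DD-only shape must NOT pass the social-bot allowlist.
const SLOT_CAROUSEL_RE = /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/\d+$/;
```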
* fix(brief): preserve caller-requested slot + correct no-brief share-url error
Two contract bugs in the slot rollout that silently misled callers:
1. GET /api/latest-brief?slot=X where X has no envelope was returning
{ status: 'composing', issueDate: <today UTC> } — which reads as
"today's brief is composing" instead of "the specific slot you
asked about doesn't exist". A caller probing a known historical
slot would get a completely unrelated "today" signal. Now we echo
the requested slot back (issueSlot + issueDate derived from its
date portion) when the caller supplied ?slot=, and keep the
UTC-today placeholder only for the no-param path.
2. POST /api/brief/share-url with no slot and no latest-pointer was
falling into the generic invalid_slot_shape 400 branch. That is
not an input-shape problem; it is "no brief exists yet for this
user". Return 404 brief_not_found — the same code the
existing-envelope check returns — so callers get one coherent
contract: either the brief exists and is shareable, or it doesn't
and you get 404.

56054bfbc1
fix(brief): use wildcard glob in vercel.json functions key (PR #3204 follow-up) (#3206)
* fix(brief): use wildcard glob in vercel.json functions key
PR #3204 shipped the right `includeFiles` value but the WRONG key:
  "api/brief/carousel/[userId]/[issueDate]/[page].ts"
Vercel's `functions` config keys are micromatch globs, not literal paths. Bracketed segments like `[userId]` are parsed as character classes (match any ONE character from {u,s,e,r,I,d}), so my rule matched zero files and `includeFiles` was silently ignored. The post-merge probe still returned HTTP 500 FUNCTION_INVOCATION_FAILED on every request. The build log shows zero mentions of `carousel` or `resvg` — corroborates the key never applied.
Fix: wildcard path segments:
  "api/brief/carousel/**"
Matches any file under the carousel route dir. Since the only deployed file there is the dynamic-segment handler, the effective scope is identical to what I originally intended.
Added a second regression test that sweeps every functions key and fails loudly if any bracketed segment slips back in. Guards against future reverts AND against anyone copy-pasting the literal route path without realising Vercel reads it as a glob.
23/23 deploy-config tests pass (was 22, +1 new guard).
* Address Greptile P2: widen bracket-literal guard regex
Greptile spotted that `/\[[A-Za-z]+\]/` only matches purely-alphabetic segment names. Real-world Next.js routes often use `[user_id]`, `[issue_date]`, `[page1]`, `[slug2024]` — none flagged by the old regex, so the guard would silently pass on the exact kind of regression it was written to catch.
Widened to `/\[[A-Za-z][A-Za-z0-9_]*\]/`:
- requires a leading letter (so legit char classes like `[0-9]` and `[!abc]` don't false-positive)
- allows letters, digits, underscores after the first char
- covers every Next.js-style dynamic-segment name convention
Also added a self-test that pins positive cases (userId, user_id, issue_date, page1, slug2024) and negative cases (the actual `**` glob, `[0-9]`, `[!abc]`) so any future narrowing of the regex breaks CI immediately instead of silently re-opening PR #3206.
24/24 deploy-config tests pass (was 23, +1 new self-test).
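The widened guard regex, exercised against the commit's own positive and negative cases (only the wrapper function is illustrative):

```javascript
// Guard regex from the commit: flags Next.js-style bracketed segments
// ([userId], [user_id], [page1]) inside vercel.json functions keys while
// letting real micromatch character classes ([0-9], [!abc]) pass.
const BRACKET_LITERAL_RE = /\[[A-Za-z][A-Za-z0-9_]*\]/;

function hasBracketLiteral(functionsKey) {
  return BRACKET_LITERAL_RE.test(functionsKey);
}
```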

305dc5ef36
feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)
Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.
New modules (scripts/lib/):
- brief-dedup-consts.mjs tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs cities/regions gazetteer + common-caps
- brief-embedding.mjs OpenRouter /embeddings client with Upstash
cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs complete-link clustering + entity veto (pure)
- brief-dedup.mjs orchestrator, env read at call entry,
shadow archive, structured log line
Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs offline calibration runner + histogram
- golden-pair-validator.mjs live-embedder drift detector (nightly CI)
- shadow-sample.mjs Sample A/B CSV emitter over SCAN archive
Tests:
- brief-dedup-jaccard.test.mjs migrated from regex-harness to direct
import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation
property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs 20-pair mocked canary (21)
Workflows:
- .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary
(07:17 UTC), opens issue on drift
Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on new files clean
- lint:md 0 errors
Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.
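The cosineSimilarity primitive the embedding client ships is standard; a minimal sketch (the threshold comparison around it is where the tunables above apply):

```javascript
// Minimal cosine similarity over two embedding vectors: two stories are
// near-duplicate candidates when this value meets the configured
// DIGEST_DEDUP_COSINE_THRESHOLD.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```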
* refactor(digest-dedup): address review findings 193-199
Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.
P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
Jaccard module so the outlet allow-list is single-sourced.
P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
(zero callers; orchestrator reimplements inline).
P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
silently falling to jaccard).
P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
heredoc that would command-substitute validator stdout. Switched
to printf with sanitised LOG_TAIL (printable ASCII only) and
--body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
allowlist (SCAN | GET | EXISTS).
P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
expected shape, not just length. Also assert the reason= field.
P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
scripts/tools AGENTS.md doc note.
Todos 193-199 moved from pending to complete.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean
* fix(digest-dedup): address Greptile P2 findings on PR #3200
1. brief-embedding.mjs: wrap fetch lookup as
`(...args) => globalThis.fetch(...args)` instead of aliasing bare
`fetch`. Aliasing captures the binding at module-load time, so
later instrumentation / Edge-runtime shims don't see the wrapper —
same class of bug as the banned `fetch.bind(globalThis)` pattern
flagged in AGENTS.md.
2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
silently swallowed the failure when any of dedup/canary/p1 labels
didn't pre-exist, breaking the drift alert channel while leaving
the job red in the Actions UI. Switched to repeated `--label`
flags + `--create-label` so any missing label is auto-created on
first drift, and dropped the `|| true` so a legitimate failure
(network / auth) surfaces instead of hiding.
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
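Why the wrapper form matters can be shown in a few lines, with plain values standing in for real fetch implementations:

```javascript
// A bare alias captures the fetch binding at module-load time; the
// wrapper re-resolves globalThis.fetch on every call, so later
// instrumentation or Edge-runtime shims are still visible to it.
globalThis.fetch = () => 'original';

const aliased = globalThis.fetch;                        // frozen at "load" time
const wrapped = (...args) => globalThis.fetch(...args);  // late-bound

globalThis.fetch = () => 'instrumented';                 // a shim lands later
```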
* fix(digest-dedup): two P1s found on PR #3200
P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.
Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
— the same helper the orchestrator uses — so any classifier knob
added later is picked up automatically. The threshold and veto-
enabled flags are sourced from env by default; a --threshold CLI
flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
which operators must keep in lockstep with Railway. The
workflow_dispatch threshold input now defaults to empty; the
scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
output makes the classifier visible.
P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.
Fix:
- writeShadowArchive now inspects the pipeline return. null result,
non-array response, per-command {error}, or a cell without
{result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
structured log line carries archive_write=ok|failed so operators
can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
null-pipeline contract and asserts both the warn and the structured
field land.
Verification:
- test:data 5825/5825 pass
- dedup suites 65/65 pass (new: archive-fail regression)
- typecheck + api clean
- biome check clean on changed files
* fix(digest-dedup): two more P1s found on PR #3200
P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.
Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
same readOrchestratorConfig() helper the orchestrator uses. When
either says "embed path inactive in prod", the validator logs an
explicit skip line and exits 0. The nightly workflow then shows
green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
alongside the threshold and veto-enabled knobs, so all four
classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
canary output surfaces which classifier it validated.
P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.
Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
export so the logic is testable. Mode filter runs BEFORE the dedup
check: agreeing pairs are skipped entirely under
--mode disagreements, so any later disagreeing occurrence can
still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
cases: agreement-then-disagreement, reversed order (symmetry),
always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
kick off the CLI scan path.
Verification:
- test:data 5825/5825 pass
- dedup suites 70/70 pass (5 new shadow-sample regressions)
- typecheck + api clean
- biome check clean on changed files
Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED
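The filter-before-dedup ordering from the Sample A fix can be sketched as a pure function. Archive and pair shapes here are illustrative, not the real SCAN archive format.

```javascript
// Sketch of the fixed enumeration: the mode filter runs BEFORE the dedup
// check, so an agreeing occurrence of a pair seen earlier in SCAN order
// cannot claim the dedup slot away from a later disagreeing occurrence.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch.pairs) {
      if (mode === 'disagreements' && pair.agreed) continue; // filter first
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue;                           // dedup second
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}
```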
* refactor(digest-dedup): Railway is the single source of truth for dedup config
Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.
Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.
Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
every deduplicateStories() call (before the mode short-circuit,
so jaccard ticks also publish a "mode=jaccard" signal the canary
can read). Fire-and-forget; archive-write error semantics still
apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
path. Now calls fetchActiveConfigFromUpstash() and either
validates against that config, skips when the embed path is
inactive, or skips when the key is missing (with --force
override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
and the corresponding repo-variable dependency. Only the three
Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
tick (shadow AND jaccard modes) with the right shape + TTL.
Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.
Verification:
- test:data 5825/5825 pass
- dedup suites 72/72 pass (2 new config-publish regressions)
- typecheck + api clean
- biome check clean on changed files
* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow
User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:
DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs
SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
Railway deploy with OPENROUTER_API_KEY set = embeddings live on
next cron tick. Set MODE=jaccard on Railway to revert instantly.
Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.
Net diff: -1,407 lines.
Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.
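The embed-default-with-kill-switch contract is compact enough to sketch. A minimal illustration, assuming injected dependencies (function and option names are mine, not the actual brief-dedup.mjs API; the real code reads DIGEST_DEDUP_MODE from the environment):

```javascript
// Sketch of the binary mode + fallback contract (illustrative names).
// mode unset (or anything but 'jaccard') => embed path; 'jaccard' => kill switch.
async function deduplicateStories(stories, { mode, embedDedup, jaccardDedup }) {
  if (mode === 'jaccard') return jaccardDedup(stories); // instant revert, no redeploy
  try {
    return await embedDedup(stories); // remote-embeddings path (default)
  } catch (err) {
    // Any embed-path failure degrades to Jaccard; the cron tick itself never fails.
    console.warn(`[dedup] falling back to jaccard reason=${err?.name ?? 'Error'}`);
    return jaccardDedup(stories);
  }
}
```

The same shape covers every failure class the commit lists (timeout, provider outage, missing key, bad response), since they all surface as a rejected promise from the embed path.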
* fix(digest-dedup): multi-word location phrases in the entity veto
Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: [], actors: ['houthis','red','sea'] } ✗
shouldVeto("Houthis strike ship in Red Sea",
"US escorts convoy in Red Sea") → false ✗
With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.
Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.
Now:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: ['red sea'], actors: ['houthis'] } ✓
shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓
Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
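A condensed sketch of that scan, assuming a toy gazetteer (the real LOCATION_GAZETTEER is far larger, and the real extractor also returns actors and falls back to single-token lookup, both omitted here for brevity):

```javascript
// Greedy longest-phrase scan: at each position try the longest
// multi-word phrase first (down to 2 tokens), requiring the first
// AND last tokens to be capitalised so lowercase prose like
// "the middle east" doesn't match while headline "Middle East" does.
const LOCATION_GAZETTEER = new Set(['red sea', 'strait of hormuz', 'abu dhabi']);
const MAX_PHRASE_LEN = 4; // longest entry in the toy set is 3 tokens

const isCapitalised = (token) => /^[A-Z]/.test(token);

function extractLocations(text) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const slice = tokens.slice(i, i + len);
      // Lowercase connectors in between ("Strait of Hormuz") are fine.
      if (!isCapitalised(slice[0]) || !isCapitalised(slice[len - 1])) continue;
      const phrase = slice.join(' ').toLowerCase();
      if (LOCATION_GAZETTEER.has(phrase)) {
        found.push(phrase);
        i += len; // consume the whole phrase
        matched = true;
        break;
      }
    }
    if (!matched) i++;
  }
  return found;
}
```

The outer loop advances by at most one token per miss and by the phrase length per hit, which is the O(N · MAX_PHRASE_LEN) bound the commit cites.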
27849fee1e
fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn (#3204)
* fix(brief): bundle resvg linux-x64-gnu native binding with carousel fn
Real root cause of every Telegram carousel WEBPAGE_CURL_FAILED since
PR #3174 merged. Not middleware (last PR fixed that theoretical path
but not the observed failure). The Vercel function itself crashes
HTTP 500 FUNCTION_INVOCATION_FAILED on every request including
OPTIONS - the isolate can't initialise.
The handler imports brief-carousel-render, which lazy-imports
@resvg/resvg-js. That package's js-binding.js does a runtime
require(@resvg/resvg-js-<platform>-<arch>-<libc>). On Vercel Lambda
(Amazon Linux 2 glibc) that resolves to @resvg/resvg-js-linux-x64-gnu.
Vercel nft tracing does NOT follow this conditional require, so the
optional peer package isn't bundled. Cold start throws
MODULE_NOT_FOUND, the isolate crashes, Vercel returns
FUNCTION_INVOCATION_FAILED, and Telegram reports WEBPAGE_CURL_FAILED.
Fix: vercel.json functions.includeFiles forces the linux-x64-gnu
binding into the carousel function's bundle. Only this route needs
it; every other api route is unaffected.
Verified:
- deploy-config tests 21/21 pass
- JSON valid
- Reproduced 500 via curl on all methods and UAs
- resvg-js/js-binding.js confirms linux-x64-gnu is the runtime
  binary on Amazon Linux 2 glibc
Post-merge: curl with TelegramBot UA should return 200 image/png
instead of 500; the next cron tick should clear the Railway
[digest] Telegram carousel 400 line.
* Address Greptile P2s: regression guard + arch-assumption reasoning
Two P2 findings on PR #3204:
- P2 #1 (inline on vercel.json:6): platform-architecture assumption
  undocumented. If Vercel migrates to Graviton/arm64 Lambda the
  cold-start crash silently returns. vercel.json is strict JSON, so
  inline comments aren't possible.
- P2 #2 (tests/deploy-config.test.mjs:17): no regression guard for
  the carousel includeFiles rule. A future vercel.json tidy-up could
  silently revert the fix with no CI signal.
Fixed both in a single block:
- New describe() in deploy-config.test.mjs asserts the carousel
  route's functions entry exists AND its includeFiles points at
  @resvg/resvg-js-linux-x64-gnu. Any drift fails the build.
- The block comment above it documents the Amazon Linux 2 x86_64
  glibc assumption that would have lived next to the includeFiles
  entry if JSON supported comments. Includes the Graviton/arm64
  migration pointer.
tests 22/22 pass (was 21, +1 new).
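The shape of such an includeFiles rule, roughly (the glob and exact function key are assumptions for illustration; the real entry lives in the repo's vercel.json):

```json
{
  "functions": {
    "api/brief/carousel/[userId]/[issueDate]/[page].ts": {
      "includeFiles": "node_modules/@resvg/resvg-js-linux-x64-gnu/**"
    }
  }
}
```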
45f02fed00
fix(sentry): filter Three.js OrbitControls setPointerCapture NotFoundError (#3201)
* fix(sentry): suppress Three.js OrbitControls setPointerCapture NotFoundError
OrbitControls' pointerdown handler calls setPointerCapture after the
browser has already released the pointer (focus change, rapid
re-tap), leaking as an unhandled NotFoundError. OrbitControls is
bundled into main-*.js so hasFirstParty=true; matched by the unique
setPointerCapture message (grep confirms no first-party
setPointerCapture usage). Resolves WORLDMONITOR-NC.
* fix(sentry): gate OrbitControls setPointerCapture filter on bundle-only stack
Review feedback: suppressing by message alone would hide a future
first-party setPointerCapture regression. Mirror the existing
OrbitControls filter's provenance check — require absence of any
source-mapped .ts/.tsx frame so the filter only matches stacks whose
only non-infra frame is the bundled main chunk. Adds positive +
negative regression tests for the pair.
* fix(sentry): gate OrbitControls filter on positive three.js context signature
Review feedback: absence of .ts/.tsx frames is not proof of
third-party origin because production stacks are often
unsymbolicated. Replace the negative-only gate with a positive
OrbitControls signature — require a frame whose context slice
contains the literal `_pointers … setPointerCapture` adjacency
unique to three.js OrbitControls. Update tests to cover the
production-realistic case (unsymbolicated first-party bundle frame
calling setPointerCapture must still reach Sentry) plus a defensive
no-context fallthrough.
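A hedged sketch of that positive-signature gate. The event shape follows Sentry's exception schema (exception.values[].stacktrace.frames[] with pre_context/context_line/post_context); the helper names are mine, and the availability of frame context on these events is an assumption:

```javascript
// Only drop a setPointerCapture error when some frame's context
// slice shows the `_pointers` + `setPointerCapture` adjacency
// unique to three.js OrbitControls.
function hasOrbitControlsSignature(event) {
  const frames =
    event?.exception?.values?.flatMap((v) => v?.stacktrace?.frames ?? []) ?? [];
  return frames.some((frame) => {
    const context = [
      ...(frame?.pre_context ?? []),
      frame?.context_line ?? '',
      ...(frame?.post_context ?? []),
    ].join('\n');
    return context.includes('_pointers') && context.includes('setPointerCapture');
  });
}

function shouldDropPointerCaptureError(event) {
  const message = event?.exception?.values?.[0]?.value ?? '';
  // No context => fall through to Sentry (defensive default).
  return message.includes('setPointerCapture') && hasOrbitControlsSignature(event);
}
```

An unsymbolicated first-party frame with no context slice fails the positive check and still reaches Sentry, which is exactly the review's requirement.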
d7f87754f0
fix(emails): update transactional email copy — 22 → 30+ services (#3203)
Follow-up to #3202. Greptile flagged that two transactional email
templates still claimed '22 services' while /pro now advertises
'30+':
- api/register-interest.js:90 — interest-registration confirmation
  email ('22 Services, 1 Key')
- convex/payments/subscriptionEmails.ts:57 — API subscription
  confirmation email ('22 services, one API key')
A user signing up via /pro would read '30+ services' on the page,
then receive an email saying '22'. Both updated to '30+', matching
the /pro page and the actual server domain count (31 in
server/worldmonitor/*, plus api/scenario/v1/ = 32, growing).
135082d84f
fix(pro): correct service-domain count — 22 → 30+ (server has 31) (#3202)
* fix(pro): correct service-domain count — 22 → 30+ (server has 31, growing)
The /pro page advertised '22 services' / '22 service domains' but
server/worldmonitor/, proto/worldmonitor/, and
src/generated/server/worldmonitor/ all have 31 domain dirs
(aviation, climate, conflict, consumer-prices, cyber, displacement,
economic, forecast, giving, health, imagery, infrastructure,
intelligence, maritime, market, military, natural, news,
positive-events, prediction, radiation, research, resilience,
sanctions, seismology, supply-chain, thermal, trade, unrest, webcam,
wildfire). api/scenario/v1/ adds a 32nd recently shipped surface.
Used '30+' rather than the literal '31' so the page doesn't drift
again every time a new domain ships (the '22' was probably accurate
at one point too).
168 string substitutions across all 21 locale JSON files (8 keys
each: twoPath.proDesc, twoPath.proF1, whyUpgrade.fasterDesc,
pillars.askItDesc, dataCoverage.subtitle, proShowcase.oneKey,
apiSection.restApi, faq.a8). Plus 10 in pro-test/index.html (meta
description, og:description, twitter:description,
SoftwareApplication ld+json description + Pro Monthly offer, FAQ
ld+json a8, noscript fallback). Bundle rebuilt.
* fix(pro): Bulgarian grammar — drop definite-article suffix after 30+
cce46a1767
fix(pro): API tier is launched — drop 'Coming Soon' label (#3198)
The /pro comparison-table column header still read 'API (Coming
Soon)' across all 21 locales (and locale-translated variants), but
convex/config/productCatalog.ts has api_starter at
currentForCheckout=true, publicVisible=true, priceCents=9999 —
$99.99/month, with api_starter_annual at $999/year. The API tier is
shipped and self-serve.
Updated pricingTable.apiHeader → 'API ($99.99)' for every locale,
matching the same '<Tier> ($<price>)' pattern as 'Free ($0)' and
'Pro ($39.99)'. Bundle rebuilt.
c7aacfd651
fix(health): persist WARNING events + add failure-log timeline (#3197)
* fix(health): persist WARNING events + add failure-log timeline
WARNING status (stale seeds) was excluded from the
health:last-failure Redis write (line 680 checked `!== 'WARNING'`).
When UptimeRobot keyword-checks for "HEALTHY" and gets a WARNING
response, it flags DOWN, but no forensic trail was left in Redis.
This made stale-seed incidents invisible to post-mortem
investigation.
Changes:
- Write health:last-failure for ANY non-HEALTHY status (including
  WARNING)
- Add health:failure-log (LPUSH list, last 50 entries, 7-day TTL) so
  multiple incidents are preserved as a timeline, not just the
  latest
- Include warnCount alongside critCount in the snapshot
- Broaden the problems filter to capture all non-OK statuses
* fix(health): dedupe failure-log entries by incident signature
Repeated polls during one long WARNING window would LPUSH
near-identical snapshots, filling the 50-entry log and evicting
older distinct incidents. Now compares a signature (status + sorted
problem set) against the previous entry via health:failure-log-sig.
Only appends when the incident changes. The last-failure key is
still updated every poll (latest timestamp matters).
* fix(health): add 4s timeout to persist pipelines + consistent arg types
Addresses greptile review on PR #3197:
- Both persist redisPipeline calls now pass 4_000ms timeout (main
  data pipeline uses 8_000ms; persist is less critical so shorter is
  fine)
- LTRIM/EXPIRE args use numbers consistently (was mixing
  number/string)
* fix(health): atomic sig swap via SET ... GET to eliminate dedupe race
Two concurrent /api/health requests could both read the old
signature before either write lands, appending duplicate entries.
Now uses SET key val EX ttl GET (Redis 6.2+) to atomically swap the
sig and return the previous value in one pipeline command. The LPUSH
only fires if the returned previous sig differs from the new one.
Also skips the second redisPipeline call entirely when sig matches
(no logCmds to send).
* fix(health): exclude seedAgeMin from dedupe sig + clear sig on recovery
Two issues with the failure-log dedupe:
1. seedAgeMin changes on every poll (e.g. 31min, 32min, 33min), so
   the signature changed every time and LPUSH still fired on every
   probe during a STALE_SEED window. Now uses a separate sigKeys
   array with only key:status (no age) for the signature, while
   problemKeys still includes ages for the snapshot payload.
2. The sig was never cleared on recovery. If the same problem set
   recurred after a healthy gap, the old sig (within its 24h TTL)
   would match and the recurrence would be silently skipped. Now
   DELs health:failure-log-sig when overall === 'HEALTHY'.
* fix(health): move sig write after LPUSH in same pipeline
The sig was written eagerly in the first pipeline (SET ... GET), but
the LPUSH happened in a separate background pipeline. If that second
write failed, the sig was already advanced, permanently deduping the
incident out of the timeline.
Now: GET sig first (read-only), then write last-failure + LPUSH +
sig all in one pipeline. The sig only advances if the entire
pipeline succeeds. Failure leaves the old sig in place so the next
poll retries.
Reintroduces a small read-then-write race window (two concurrent
probes can both read the old sig), but the worst case is a single
duplicate entry, which is strictly better than a permanently dropped
incident.
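The signature scheme that the dedupe fixes converge on can be sketched as follows (names illustrative, not the exact api/health.js code):

```javascript
// Incident signature: overall status plus the sorted key:status
// problem set. seedAgeMin is deliberately excluded because it
// changes on every poll and would defeat the dedupe.
function incidentSignature(overall, problems) {
  const sigKeys = problems.map((p) => `${p.key}:${p.status}`).sort();
  return `${overall}|${sigKeys.join(',')}`;
}
```

Sorting makes the signature independent of problem ordering, so two polls that report the same incident in a different order still dedupe.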
63464775a5
feat(supply-chain): scenario UX — rich banner + projected score + faster poll (#3193)
* feat(supply-chain): rich scenario banner + projected score per chokepoint + faster poll
User reported Simulate Closure adds only a thin banner with no context —
"not clear what value user is getting, takes many many seconds". Four
targeted UX improvements in one PR:
A. Rich banner (scenario params + tagline)
Banner now reads:
⚠ Hormuz Tanker Blockade · 14d · +110% cost
CN 100% · IN 84% · TW 82% · IR 80% · US 39%
Simulating 14d / 100% closure / +110% cost on 1 chokepoint.
Chokepoint card below shows projected score; map highlights…
Surfaces the scenario template fields (durationDays, disruptionPct,
costShockMultiplier) + a one-line explainer so a first-time user
understands what "CN 100%" actually means.
B. Projected score on each affected chokepoint card
Card header now shows: `[current]/100 → [projected]/100` with a red
trailing badge + red left border on the card body.
Body prepends: "⚠ Projected under scenario: X% closure for N days
(+Y% cost)".
Projected = max(current, template.disruptionPct) — conservative
floor since the real scoring mixes threat + warnings + anomaly.
C. Faster polling
Status poll interval 2s → 1s. Max iterations 30→60 (unchanged 60s
budget). Worker processes in <1s; perceived latency drops from
2–3s to <2s in the common case. First poll still immediate.
D. ScenarioResult interface widened
Added optional `template` and `currentDisruptionScores` fields in
scenario-templates.ts to match what the scenario-worker already
emits. Optional = backward-compat with map-only consumers.
Dependent on PR #3192 (already merged) which fixed the 10000% banner
% inflation.
* fix(supply-chain): trigger render() on scenario activate/dismiss — cards must re-render
PR review caught a real bug in the new scenario UX: showScenarioSummary
and hideScenarioSummary were mutating the banner DOM directly without
triggering render(). renderChokepoints() reads activeScenarioState to
paint the projected score + red border + callout, but those only run
during render() — so the cards stayed stale on activate AND on dismiss
until some unrelated re-render happened.
Refactor to split public API from internal rendering:
- showScenarioSummary(scenarioId, result) — now just sets state + calls
render(). Was: set state + inline DOM mutation (bypassing card render).
- renderScenarioBanner() — new private helper that builds the banner
DOM from activeScenarioState. Called from render()'s postlude
(replacing the old self-recursive showScenarioSummary() call — which
only worked because it had a side-effectful early-exit path that
happened to terminate, but was a latent recursion risk).
- hideScenarioSummary() — now just sets state=null + calls render().
Was: clear state + manual banner removal + manual button-text reset
loop. The button loop is redundant now — the freshly-rendered card
template produces buttons with default "Simulate Closure" text by
construction.
Net effect: activating a scenario paints the banner AND the affected
chokepoint cards in a single render tick. Dismissing strips both in
the same tick.
* fix(supply-chain): derive scenario button state from activeScenarioState, not imperative mutation
PR review caught: the earlier re-render fix (showScenarioSummary → render())
correctly repaints cards on activate, but the button-state logic in
runScenario() is now wrong. render() detaches the old btn reference, so
the post-onScenarioActivate `resetButton('Active') + btn.disabled = true`
touches a detached node and no-ops (resetButton() explicitly skips
!btn.isConnected). The fresh button painted by render() uses the default
template text — visible button reads "Simulate Closure" enabled, and users
can queue duplicate runs of an already-active scenario.
Fix: make button state a function of panel state.
- renderChokepoints() scenario section: check
activeScenarioState.scenarioId === template.id and, when matched, emit
the button with class `sc-scenario-btn--active`, text "Active", and
`disabled` attribute. On dismiss, the next render strips those
automatically — same pattern as the card projection styling.
- runScenario(): drop the dead `resetButton('Active')` + `btn.disabled`
lines after onScenarioActivate. That path is now template-driven;
touching the detached btn was the defect.
Catch-path resets ('Simulate Closure' on abort, 'Error — retry' on real
error) are unchanged — those fire BEFORE any render could detach the btn,
so the imperative path is still correct there.
* fix(supply-chain): hide scenario projection arrow when current already ≥ template
Greptile P1: projected badge was rendered as `N/100 → N/100` whenever
current disruptionScore already met or exceeded template.disruptionPct.
Visible for Suez (80%) or Panama (50%) scenarios when a chokepoint is
already elevated — read as "scenario has zero effect", which is misleading.
The two values live on different scales — cp.disruptionScore is a
computed risk score (threat + warnings + anomaly) while
template.disruptionPct is "% of capacity blocked" — but they share the
0–100 axis so directional comparison is still meaningful for the
"does this scenario escalate things?" signal.
Fix: arrow only renders when template.disruptionPct > cp.disruptionScore.
When current already equals or exceeds the scenario level, show the
single current badge. The card's red left border + "⚠ Projected under
scenario" callout still indicate the card is the scenario target —
only the escalation arrow is suppressed.
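The final display rule, distilled into a sketch (not the panel's actual render code; badge strings are illustrative):

```javascript
// Projected score is a conservative floor: max of the current
// computed risk score and the scenario's disruption %. The
// escalation arrow only renders when the scenario actually raises
// the number; otherwise show the single current badge.
function projectedBadge(currentScore, disruptionPct) {
  const projected = Math.max(currentScore, disruptionPct);
  return disruptionPct > currentScore
    ? `${currentScore}/100 → ${projected}/100`
    : `${currentScore}/100`;
}
```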
85d6308ed0
fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196)
* fix(brief): allow Telegram/social UAs to fetch carousel images
middleware.ts BOT_UA regex (/bot/i) was 403 on Telegram sendMediaGroup
fetch of /api/brief/carousel/<u>/<d>/<p>. SOCIAL_IMAGE_UA allowlist
(includes telegrambot) was scoped to /favico/* and .png suffix only;
carousel returns image/png but the URL has no extension.
Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED
and zero images above the Telegram brief.
Fix: extend UA-bypass guard to cover /api/brief/carousel/ prefix.
HMAC token on the URL is the real auth; UA allowlist is defence-in-depth.
* Address P2 + P3: regression test + route-shape regex
P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the
contract:
- TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel
- curl, generic bot UAs, missing UA still 403 on carousel
- TelegramBot 403s on non-carousel API routes (scoped, not global)
- Malformed carousel paths (admin/dashboard, page >= 3, non-ISO
date) all still 403 via the regex
- Normal browsers pass everywhere
P3: Replace startsWith('/api/brief/carousel/') prefix with
BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by
api/brief/carousel/[userId]/[issueDate]/[page].ts
(userId / YYYY-MM-DD / page 0|1|2). A future
/api/brief/carousel/admin or similar sibling cannot inherit the
bypass. Comment now lists every social-image UA this protects.
typecheck + typecheck:api clean. test:data 5772/5772.
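The route-shape regex might look roughly like this (the userId character class is an assumption; the date and page constraints follow the description above):

```javascript
// Match ONLY the exact shape served by
// api/brief/carousel/[userId]/[issueDate]/[page].ts:
// userId / YYYY-MM-DD / page 0|1|2. Sibling paths such as a future
// /api/brief/carousel/admin cannot inherit the UA bypass.
const BRIEF_CAROUSEL_PATH_RE =
  /^\/api\/brief\/carousel\/[A-Za-z0-9_-]+\/\d{4}-\d{2}-\d{2}\/[0-2]$/;
```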
6025b0ce47
chore(sentry): add Chrome/Firefox variant of UTItemActionController filter (#3194)
The Safari variant (Can't find variable: UTItemActionController) was
already in ignoreErrors at line 53. Chrome/Firefox uses the "X is
not defined" format instead (WORLDMONITOR-NB). Added to the existing
"is not defined" group at line 119.
434a2e0628
feat(settings): API Keys tab visible to all users with PRO upgrade CTA (#3190)
* feat(settings): show API Keys tab to all users with PRO upgrade CTA
Free users who clicked the API Keys tab triggered a server-side
ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA). Now the tab is
always visible with a PRO badge, and the content is gated
client-side:
- Anonymous: lock icon + "Sign In" CTA (opens Clerk sign-in)
- Free: upgrade icon + "Upgrade to Pro" CTA (opens Dodo checkout)
- PRO: full key management UI (unchanged)
The Convex query is never called for non-PRO users, eliminating the
server error at the source while creating a natural upgrade funnel.
Reuses existing panel-locked-state CSS (gold accent, gradient
button).
* fix(settings): gate API Keys on apiAccess feature, not isProUser
Addresses review findings on PR #3190:
1. Gate changed from isProUser() to hasFeature('apiAccess') —
   matches the server contract in convex/apiKeys.ts, which requires
   apiAccess (tier 2+), not just PRO (tier 1). PRO users without
   apiAccess now correctly see the upgrade CTA instead of the full
   UI.
2. CTA button now launches API_STARTER_MONTHLY checkout instead of
   DEFAULT_UPGRADE_PRODUCT (PRO_MONTHLY) — users buy the correct
   product that actually includes API key access.
3. loadApiKeys() guard now checks both getAuthState().user AND
   hasFeature('apiAccess') — prevents anonymous keyed sessions
   (widget/pro keys without Clerk auth) from hitting the Convex
   query that requires authentication.
* fix(settings): re-render API Keys panel when entitlements arrive
On cold load, hasFeature('apiAccess') returns false until the Convex
entitlement subscription delivers data. A paid API Starter user who
opens settings before that snapshot arrives would see the upgrade
CTA, and loadApiKeys() would be skipped.
Subscribes to onEntitlementChange() while the modal is open and
re-renders the api-keys panel content + re-attaches handlers when
entitlements change. Cleans up in close() and destroy().
Also extracts handler attachment into attachApiKeysHandlers() to
avoid duplicating the CTA click + input keydown wiring between
render() and the entitlement callback.
7a99c3406e
fix(supply-chain, news): scenario % double-multiply + scoreByEntities null-type TypeError (#3192)
Two unrelated issues reported from a live session (browser screenshot + console):
1. Scenario banner showed "CN 10000% · IN 8400% · TW 8200%"
showScenarioSummary did (c.impactPct * 100).toFixed(0) but the
scenario-worker already sends impactPct as a 0-100 integer:
scripts/scenario-worker.mjs:295 — Math.min(Math.round((total / max) * 100), 100)
Multiplying by 100 again inflated every percentage 100x.
Fix: drop the extra * 100. 100 renders as "100%", 84 as "84%".
2. Sentry/console TypeError at parallel-analysis.ts scoreByEntities:
[ParallelAnalysis] Error: TypeError: Cannot read properties of
undefined (reading 'includes')
The ML worker occasionally returns entities with undefined `type`
or `text`. scoreByEntities did entities.filter(e => e.type.includes('LOC'))
— NPE when e.type missing. Fix: narrow to a well-formed subset via
a type guard on e?.type and e?.text strings before any string access.
Apply the safe array everywhere downstream (locations/people/orgs +
density + confidence) so the guard is the single source of truth.
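The guard pattern for issue 2 can be sketched as follows (function names are illustrative; the real code lives in parallel-analysis.ts scoreByEntities):

```javascript
// Narrow ML-worker entities to well-formed { type, text } strings
// once, then use the safe subset for every downstream access.
function wellFormedEntities(entities) {
  return (entities ?? []).filter(
    (e) => typeof e?.type === 'string' && typeof e?.text === 'string',
  );
}

function locationsOf(entities) {
  // No TypeError even when the worker emits { text: 'Red Sea' }
  // with no type, or null entries.
  return wellFormedEntities(entities)
    .filter((e) => e.type.includes('LOC'))
    .map((e) => e.text);
}
```

Filtering once up front makes the guard the single source of truth: people/org extraction, density, and confidence all consume the same narrowed array.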
d8e479188a
fix(supply-chain): don't CDN-cache empty chokepoint-history responses (#3189)
User reported "Transit history unavailable" persisting for Hormuz
after PR #3187 deployed. Direct Redis probe confirms
supply_chain:transit-summaries:history:v1:hormuz_strait has 174
entries. Direct server-side curl to
/api/supply-chain/v1/get-chokepoint-history also returns 174. But
the user's browser kept receiving `{"history":[],"fetchedAt":"0"}`.
Root cause: gateway cache tier `slow` pins 200 responses for 30 min
at Cloudflare edge (s-maxage=1800). During the gap between Vercel
instant deploy and Railway ais-relay redeploy + first
transit-summary cron tick (~20 min), per-id history keys were
absent, so the handler returned empty. Those empty bodies got
CF-cached and kept serving for 30 min AFTER the keys were populated
in Redis. Bab-el-Mandeb (which DOES render for the user) got a fresh
non-empty cache entry; Hormuz got stuck with the empty one.
Fix: when returning empty (missing key, invalid id, error), call
markNoCacheResponse(ctx.request) so the gateway sets Cache-Control:
no-store instead of the 30-min tier cache. Every call on an empty
state re-checks Redis. Once data is present, the normal tier cache
applies on the non-empty response.
Mechanism: the gateway at server/gateway.ts:488 honors the
X-No-Cache header via the same side-channel (response-headers.ts).
Pattern already used by other handlers for upstream-unavailable
bodies.
Cost: per-id history keys are ~35KB, edge→Upstash round-trip <1.5s.
Slight Redis-traffic bump for as long as keys stay empty; negligible
in practice (only the deploy window). Also no-caches invalid
chokepoint IDs so scanners/junk IDs don't pin 30-min empties either.
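The caching rule, reduced to a sketch (header strings illustrative; the real handler signals this through the gateway's markNoCacheResponse side-channel rather than setting Cache-Control itself):

```javascript
// Empty or error bodies must never be pinned at the edge; non-empty
// payloads keep the normal 30-minute tier cache.
function cacheControlFor(history) {
  return history.length === 0
    ? 'no-store'               // re-check Redis on every call
    : 'public, s-maxage=1800'; // slow tier: 30 min at the CF edge
}
```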
d7e40bc4e5
chore(sentry): filter NS_ERROR_UNEXPECTED + ConvexError API_ACCESS_REQUIRED (#3188)
NS_ERROR_UNEXPECTED (WORLDMONITOR-N6/N7/N8/N9): Firefox 149/Ubuntu XPCOM
Worker init failure. Same family as already-filtered NS_ERROR_ABORT and
NS_ERROR_OUT_OF_MEMORY. 0 repo matches. Worker fallback confirmed working
via breadcrumbs ("keeping flat list").
ConvexError: API_ACCESS_REQUIRED (WORLDMONITOR-NA, 15 events/5 users):
expected business error from PR #3125 (API key management). Free user opens
API Keys tab, server correctly denies, client try/catch at
UnifiedSettings.ts:731 handles gracefully. Convex WS transport leaks the
rejection to Sentry before the client Promise chain catches it.
96fca1dc2b
fix(supply-chain): popup-keyed history re-query + dataAvailable flag (#3187)
* fix(supply-chain): popup-keyed history re-query + dataAvailable flag for partial coverage
Two P1 findings on #3185 post-merge review:
1. MapPopup cross-chokepoint history contamination
   Popup's async history resolve re-queried [data-transit-chart]
   without a cpId key. User opens popup A → fetch starts for cpA;
   user opens popup B before it resolves → cpA's history mounts into
   cpB's chart container.
   Fix: add data-transit-chart-id keyed by cpId; re-query by it on
   resolve. Mirrors SupplyChainPanel's existing data-chart-cp-id
   pattern.
2. Partial portwatch coverage still looked healthy
   Previous fix emits all 13 canonical summaries (zero-state fill
   for missing IDs) and records pwCovered in seed-meta, but:
   - get-chokepoint-status still zero-filled missing chokepoints and
     cached the response as healthy — panel rendered silent empty
     rows.
   - api/health.js only degrades on recordCount=0, so 10/13 partial
     read as OK despite the UI hiding entire chokepoints.
   Fix:
   - proto: TransitSummary.data_available (field 12). Writer tags
     with Boolean(cpData). Status RPC passes through; defaults true
     for pre-fix payloads (absence = covered).
   - Status RPC writes seed-meta recordCount as covered count (not
     shape size), and flips response-level upstreamUnavailable on
     partial.
   - api/health.js: new minRecordCount field on SEED_META entries +
     new COVERAGE_PARTIAL status (warn rollup). chokepoints entry
     declares minRecordCount: 13. recordCount < 13 →
     COVERAGE_PARTIAL.
   - Client (panel + popup): skip stats/chart rendering when
     !dataAvailable; show "Transit data unavailable (upstream
     partial)" microcopy so users understand the gap.
5759/5759 data tests pass. Typecheck + typecheck:api clean.
* fix(supply-chain): guarantee Simulate Closure button exits Computing state
User reports "Simulate Closure does nothing beyond write Computing…"
— the button sticks at Computing forever. Two causes:
1. Scenario worker appears down (0 scenario-result:* keys in Redis
   in the last 24h of 24h-TTL). Railway-side — separate intervention
   needed to redeploy scripts/scenario-worker.mjs.
2. Client leaked the "Computing…" state on multiple exit paths:
   - signal.aborted early-return inside the poll loop never reset
     the button. Second click fired abort on first → first returned
     without resetting → button stayed "Computing…" until next
     render.
   - !this.content.isConnected early-return also skipped reset (less
     user-visible but same class of bug).
   - catch block swallowed AbortError without resetting.
   - POST /run had no hard timeout — a hanging edge function left
     the button in Computing indefinitely.
Fix:
- resetButton(text) helper touches the btn only if still connected;
  applied in every exit path (abort, timeout, post-success, catch).
- AbortSignal.any([caller, AbortSignal.timeout(20_000)]) on POST
  /run.
- console.error on failure so Simulate Closure errors surface in
  ops.
- Error message includes "scenario worker may be down" on loop
  timeout so operators see the right suspect.
Backend observations (for follow-up):
- Hormuz backend is healthy (/api/health chokepoints OK, 13 records,
  1 min old; live RPC has hormuz_strait.riskLevel=critical, wow=-22,
  flowEstimate present; GetChokepointHistory returns 174 entries).
  User-reported "Hormuz empty" is likely browser/CDN stale cache
  from before PR #3185; hard refresh should resolve.
- scenario-worker.mjs has zero result keys in 24h. Railway service
  needs verification/redeployment.
* fix(scenario): wrong Upstash RPUSH format silently broke every Simulate Closure
Railway scenario-worker log shows every job failing field validation
since at least 03:06Z today:
[scenario-worker] Job failed field validation, discarding:
["{\"jobId\":\"scenario:1776535792087:cynxx5v4\",...
The leading [" in the payload is the smoking gun.
api/scenario/v1/run.ts was POSTing to /rpush/{key} with body
`[payload]`, expecting Upstash to unpack the array and push one
string value. Upstash does NOT parse that form — it stored the
literal `["{...}"]` string as a single list value. Worker BLMOVEs
the literal string → JSON.parse → array → destructure
`{jobId, scenarioId, iso2}` on an array returns undefined for all
three → every job discarded without writing a result. Client poll
returns `pending` for the full 60s timeout, then (on the prior
client code path) leaked the stuck "Computing…" button state
indefinitely.
Fix: use the standard Upstash REST command format — POST to the base
URL with body `["RPUSH", key, value]`. Matches scripts/ais-relay.cjs
upstashLpush. After this, the scenario-queue:pending list stores the
raw payload string, BLMOVE returns the payload, JSON.parse gives the
object, validation passes, computeScenario runs, result key gets
written, client poll sees `done`.
Zero result keys existed in prod Redis in the last 24h (24h TTL on
scenario-result:*) — confirms the fix addresses the production
outage.
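The broken and fixed request shapes, side by side (the endpoint is a placeholder; the fixed form is the generic Upstash REST command format, a JSON array of command + arguments POSTed to the base URL):

```javascript
// Build both request shapes for comparison. The broken form POSTed
// a JSON array as the body of /rpush/{key}, which Upstash stores as
// ONE literal list value; the fixed form sends the whole command.
function buildRpush(baseUrl, key, payload) {
  return {
    broken: { url: `${baseUrl}/rpush/${key}`, body: JSON.stringify([payload]) },
    fixed: { url: baseUrl, body: JSON.stringify(['RPUSH', key, payload]) },
  };
}
```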
d37ffb375e
fix(referral): stop /api/referral/me 503s on prod homepage (#3186)
* fix(referral): make /api/referral/me non-blocking to stop prod 503s
Reported in prod: every PRO homepage load was logging
'GET /api/referral/me 503' to Sentry.
Root cause: a prior review required the Convex binding to block the
response (rationale: don't hand users a dead share link). That
turned any flaky relay call into a homepage-wide 503 for the
5-minute client cache window — every PRO user, every page reload.
Fix: dispatch registerReferralCodeInConvex via ctx.waitUntil.
Response returns 200 + code + shareUrl unconditionally. Binding
failures log a warning but never surface as 503. The mutation is
idempotent; the next /api/referral/me fetch retries. The
/pro?ref=<code> signup side reads userReferralCodes at conversion
time, so a missed binding degrades to missed attribution (partial),
never to blocked homepage (total).
The BRIEF_URL_SIGNING_SECRET-missing 503 path is unchanged — that's
a genuine misconfig, not a flake. Handler signature now takes ctx
with waitUntil, matching api/notification-channels.ts and
api/discord/oauth/callback.ts.
Regression test flipped: brief-referral-code.test.mjs previously
enforced the blocking shape; now enforces the non-blocking shape +
handler signature + explicit does-not-503-on-binding-failure
assertion. 14/14 referral tests pass. Typecheck clean, 5706/5706
test:data, lint exit 0.
* fix(referral): narrow err in non-blocking catch instead of unsafe cast
Greptile P2 on #3186. The (err as Error).message cast was safe today
(registerReferralCodeInConvex only throws Error instances) but would
silently log 'undefined' if a future path ever threw a non-Error
value. Swapped to instanceof narrow + String(err) fallback.
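The non-blocking dispatch plus the instanceof narrow, sketched together (names are illustrative, not the actual handler; the ctx shape mirrors the waitUntil convention the commit cites):

```javascript
// Response returns 200 unconditionally; the Convex binding runs via
// ctx.waitUntil, and a failure only logs a warning (idempotent
// mutation, retried on the next /api/referral/me fetch).
function handleReferralMe(ctx, code, shareUrl, registerInConvex) {
  ctx.waitUntil(
    registerInConvex(code).catch((err) => {
      // instanceof narrow + String() fallback instead of an unsafe cast.
      const message = err instanceof Error ? err.message : String(err);
      console.warn(`[referral] binding failed (will retry next fetch): ${message}`);
    }),
  );
  return { status: 200, body: { code, shareUrl } };
}
```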
||
|
|
3c47c1b222 |
fix(supply-chain): split chokepoint transit data + close silent zero-state cache (#3185)
* fix(supply-chain): split chokepoint transit data + close silent zero-state cache
Production supply-chain panel was rendering 13 empty chokepoints because
the getChokepointStatus RPC silently cached zero-state for 5 minutes:
1. supply_chain:transit-summaries:v1 grew to ~500 KB (180d × 13 × 14 fields
of history per chokepoint).
2. REDIS_OP_TIMEOUT_MS is 1.5 s. Vercel Sydney edge → Upstash for a 500 KB
GET consistently exceeded the budget; getCachedJson caught the AbortError
and returned null.
3. The 500 KB portwatch fallback read hit the same timeout.
4. summaries = {} → every summaries[cp.id] was undefined → 13 chokepoints
got the zero-state default → cached as a non-null success response for
REDIS_CACHE_TTL (5 min) instead of NEG_SENTINEL (120 s).
Fix (one PR, per docs/plans/chokepoint-rpc-payload-split.md):
- ais-relay.cjs: split seedTransitSummaries output.
- supply_chain:transit-summaries:v1 — compact (~30 KB, no history).
- supply_chain:transit-summaries:history:v1:{id} — per chokepoint
(~35 KB each, 13 keys). Both under the 1.5 s Redis read budget.
- New RPC GetChokepointHistory: lazy-loaded on card expand.
- get-chokepoint-status.ts: drop the 500 KB portwatch/corridorrisk/
chokepoint_transits fallback reads. Treat a null transit-summaries
read as upstreamUnavailable=true so cachedFetchJson writes NEG_SENTINEL
(2 min) instead of a 5-min zero-state pin. Omit history from the
response (proto field stays declared; empty array).
- server/_shared/redis.ts: tag AbortError timeouts with [REDIS-TIMEOUT]
key=… timeoutMs=… so log drains / Sentry-Vercel integration pick up
large-payload timeouts instead of them being silently swallowed.
- SupplyChainPanel.ts + MapPopup.ts: lazy-fetch history on card expand
via fetchChokepointHistory; session-scoped cache; graceful "History
unavailable" on empty/error. PRO gating on the map popup unchanged.
- Gateway: cache-tier entry for /get-chokepoint-history (slow).
- Tests: regression guards for upstreamUnavailable gate + per-id key
shape + handler wiring + proto query annotations.
Audit included in plan: no other RPC consumer read stacks >200 KB
besides displacement:summary:v1:2026 (724 KB, same risk, flagged for
follow-up PR). wildfire:fires:v1 at 1.7 MB loads via bootstrap (3 s
timeout, different path) — monitor but out of scope.
Expected impact:
- supply_chain:chokepoints:v4 payload drops from ~508 KB to <100 KB.
- supply_chain:transit-summaries:v1 drops from ~502 KB to <50 KB.
- RPC Redis reads stay well under 1.5 s in the hot path.
- Silent zero-state pinning is now impossible: null reads → 2-min neg
cache → self-heal on next relay tick.
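The TTL decision described above can be sketched as follows (constant names and the exact decision signature are assumptions drawn from the description, not the real `cachedFetchJson` code): a null or empty upstream read must be cached briefly as a negative sentinel, never pinned as a 5-minute "success".

```typescript
const REDIS_CACHE_TTL_S = 300;  // normal success TTL (5 min)
const NEG_SENTINEL_TTL_S = 120; // negative-cache TTL (2 min)

function cacheTtlFor(
  summaries: Record<string, unknown> | null,
): { ttl: number; negative: boolean } {
  // A timed-out/failed Redis read surfaces as null; a partial pipeline can
  // surface as {}. Both mean "upstream unavailable", not "13 idle chokepoints".
  const upstreamUnavailable = summaries === null || Object.keys(summaries).length === 0;
  return upstreamUnavailable
    ? { ttl: NEG_SENTINEL_TTL_S, negative: true } // self-heals on the next relay tick
    : { ttl: REDIS_CACHE_TTL_S, negative: false };
}
```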
* fix(supply-chain): address PR #3185 review — stop caching empty/error + fix partial coverage
Two P1 regressions caught in review:
1. Client cache poisoning on empty/error (MapPopup.ts, SupplyChainPanel.ts)
Empty-array is truthy in JS, so MapPopup's `!cached && !inflight` branch
never fired once we cached []. Neither `cached && cached.length` fired
either — popup stuck on "Loading transit history..." for the session.
SupplyChainPanel had the explicit `cached && !cached.length` branch but
still never retried, so the same transient became session-sticky there too.
Fix: cache ONLY non-empty successful responses. Empty/error show the
"History unavailable" placeholder but leave the cache untouched, so the
next re-expand retries. The /get-chokepoint-history gateway tier is
"slow" (5-min CF edge cache) → retries stay cheap.
2. Partial portwatch coverage treated as healthy (ais-relay.cjs)
seedTransitSummaries iterated Object.entries(pw), so if seed-portwatch
dropped N of 13 chokepoints (ArcGIS reject/empty), summaries had <13 keys.
get-chokepoint-status upstreamUnavailable fires only on fully-empty
summaries, so the N missing chokepoints fell through to zero-state rows
that got pinned in cache for 5 minutes.
Fix: iterate CANONICAL_IDS (Object.keys(CHOKEPOINT_THREAT_LEVELS)) and
fill zero-state for any ID missing from pw. Shape is consistently 13
keys. Track pwCovered → envelope + seed-meta recordCount reflect real
upstream coverage (not shape size), so health.js can distinguish 13/13
healthy from 10/13 partial. Warn-log on shortfall.
Tests: new regression guards
- panel must NOT cache empty arrays (historyCache.set with []).
- writer must iterate CANONICAL_IDS, not Object.entries(pw).
- seed-meta recordCount binds to pwCovered.
5718/5718 data tests pass. typecheck + typecheck:api clean.
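The coverage fix can be sketched like this (the ID list is an illustrative subset of the 13, and `buildSummaries` is a hypothetical stand-in for the relay's writer): iterate the canonical IDs rather than whatever subset the upstream returned, fill explicit zero-state for gaps, and report honest coverage separately from shape size.

```typescript
const CANONICAL_IDS = ["suez", "panama", "hormuz", "malacca"]; // illustrative subset

function buildSummaries(
  pw: Record<string, { transits: number }>,
): { summaries: Record<string, { transits: number }>; pwCovered: number } {
  const summaries: Record<string, { transits: number }> = {};
  let pwCovered = 0;
  for (const id of CANONICAL_IDS) {
    const upstream = pw[id];
    if (upstream) {
      summaries[id] = upstream;
      pwCovered++;
    } else {
      summaries[id] = { transits: 0 }; // explicit zero-state for missing coverage
    }
  }
  // Shape is always complete; pwCovered reflects real upstream coverage so a
  // health check can tell 13/13 healthy from a 10/13 partial.
  return { summaries, pwCovered };
}
```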
|
||
|
|
d1e084061d |
fix(sw): preserve open modals when tab-hide auto-reload would fire (#3184)
* fix(sw): preserve open modals when tab-hide auto-reload would fire

Scenario: a Pro user opens the Clerk sign-in modal, enters their email, and switches to their mail app to fetch the code. If a deploy happens while they wait and the SW update toast's 5 s dwell window has elapsed, `visibilitychange: hidden` triggers `window.location.reload()` — which wipes the Clerk flow, so the code in the inbox is for a now-dead attempt and the user has to re-request. The same failure applies to UnifiedSettings, the ⌘K search modal, story/signal popups, and anything else with modal semantics: leaving the tab = losing your place.

Fix: in `sw-update.ts`, the hidden-tab auto-reload now checks for any open modal/dialog via a compound selector (`[aria-modal="true"], [role="dialog"], .modal, .cl-modalBackdrop, dialog[open]`) and suppresses the reload when one matches. Covers Clerk's `.cl-modalBackdrop`, the site-wide `.modal` convention (UnifiedSettings, WidgetChatModal), and any well-authored dialog. The reload stays armed — the next tab-hide after the modal closes fires it. A manual "Reload" button click is unaffected (explicit user intent). Over-matching is safe (worst case: the user clicks Reload manually); under-matching keeps the bug, so the selector errs on the generous side.

Tests: three new cases cover modal-open suppression, re-arming after modal close, and manual-click bypass. 25/25 sw-update tests pass.

Follow-up ticket worth filing: add `aria-modal="true"` + `role="dialog"` to the modals that are missing them (SearchModal, StoryModal, SignalModal, WidgetChatModal, McpConnectModal, MobileWarningModal, CountryIntelModal, UnifiedSettings). That's the proper long-term a11y fix and would let us narrow the selector once coverage is complete.

* fix(sw): filter modal guard by actual visibility, not just DOM presence

Addresses review feedback on #3184: the previous selector (`[role="dialog"]` etc.) matched the UnifiedSettings overlay, which is created in its constructor at app startup (App.ts:977 → UnifiedSettings.ts:68-71 sets role="dialog") and stays in the DOM for the whole session. That meant auto-reload was effectively disabled for every user, not just those with an actually-open modal.

Fix: don't just check for selector matches — check whether the matched element is actually rendered. Persistent modal overlays hide themselves via `display: none` (main.css:6744: `.modal-overlay { display: none }`) and reveal via an `.active` class (main.css:6750: `.active { display: flex }`), so `offsetParent === null` cleanly distinguishes closed from open. We prefer `checkVisibility()` where available (Chrome 105+, Safari 17.4+, Firefox 125+, which covers virtually all current WM users) and fall back to `offsetParent` otherwise. This also handles future modals automatically, without needing us to enumerate every `.xxx-modal-overlay.active` class the site might introduce.

New tests:
- Modal mounted AND visible → reload suppressed (original Clerk case)
- Modal mounted but hidden → reload fires (reviewer's regression case)
- Modal visible, then hidden on return → reload fires on next tab-hide
- Manual Reload click unaffected in all cases
26/26 sw-update tests pass.

* fix(sw): replace offsetParent fallback with getClientRects for fixed overlays

Addresses the second review finding on #3184: the previous fallback `el.offsetParent !== null` silently failed on every `position: fixed` overlay — which is every modal in this app:
- `.modal-overlay` (main.css:6737) — UnifiedSettings, WidgetChatModal
- `.story-modal-overlay` (main.css:3442)
- `.country-intel-modal-overlay` active state (main.css:18415)
MDN: `offsetParent` is specified to return null for any `position: fixed` element, regardless of visibility. So on Firefox <125 or Safari <17.4 (where `Element.checkVisibility()` is unavailable), `isModalOpen` would return false for actually-open modals → auto-reload fires → Clerk sign-in and every other fixed-position flow gets wiped exactly as PR #3184 was meant to prevent.

Fix: fall back to `getClientRects().length > 0`. This returns 0 for `display: none` elements (how `.modal-overlay` hides when `.active` is absent) and non-zero for rendered elements, including position:fixed. It's universally supported and matches the semantics we want.

New tests exercise the fallback path explicitly with a `supportsCheckVisibility` toggle on the fake env:
- visible position:fixed modal + no checkVisibility → reload suppressed
- hidden mounted modal + no checkVisibility → reload fires
28/28 sw-update tests pass.

* fix(a11y): add role=dialog + aria-modal=true to five missing modals

Addresses the third review finding on #3184. The SW auto-reload guard uses a `[role="dialog"]` selector, but five modals were missing the attribute, so `isModalOpen()` returned false and the page could still auto-reload mid-flow on those screens. Broadening the selector to enumerate specific class names was rejected because the app has many non-modal `-overlay` classes (`#deckgl-overlay`, `.conflict-label-overlay`, `.layer-warn-overlay`, `.mobile-menu-overlay`) that would cause false positives and permanently disable auto-reload. Instead, standardize on the existing convention used by UnifiedSettings: every modal overlay sets `role="dialog"` + `aria-modal="true"` at creation. This makes the SW selector work AND improves screen-reader behavior (focus trap, background element suppression).

Modals updated:
- SearchModal (⌘K search) — both mobile sheet and desktop variants use the same element, single set-attributes call at create time
- StoryModal (news story detail)
- SignalModal (instability spike detail)
- CountryIntelModal (country deep-dive overlay)
- MobileWarningModal (mobile device warning)

No change to sw-update.ts — the existing selector already covers the newly-attributed elements. All 28 sw-update tests still pass; typecheck clean. |
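The visibility guard described above can be sketched as follows, modelling elements as plain objects so the logic is testable outside a browser (the interface and function names are illustrative, not the shipped `sw-update.ts` code): prefer `checkVisibility()`, fall back to `getClientRects()`, which stays non-zero for rendered `position: fixed` elements, unlike `offsetParent`.

```typescript
interface ModalLike {
  checkVisibility?: () => boolean;           // Chrome 105+, Safari 17.4+, Firefox 125+
  getClientRects: () => { length: number };  // 0 ⇔ display:none, even for position:fixed
}

function isModalVisible(el: ModalLike): boolean {
  if (typeof el.checkVisibility === "function") return el.checkVisibility();
  return el.getClientRects().length > 0; // universal fallback
}

function shouldSuppressReload(matchedModals: ModalLike[]): boolean {
  // Suppress the tab-hide auto-reload only if some matched modal is rendered;
  // a mounted-but-hidden persistent overlay must NOT disarm the reload.
  return matchedModals.some(isModalVisible);
}
```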
||
|
|
55ac431c3f |
feat(brief): public share mirror + in-magazine Share button (#3183)
* feat(brief): public share mirror + in-magazine Share button
Adds the growth-vector piece listed under Future Considerations in the
original brief plan (line 399): a shareable public URL and a one-click
Share button on the reader's magazine.
Problem: the per-user magazine at /api/brief/{userId}/{issueDate} is
HMAC-signed to a specific reader. You cannot share the URL you are
looking at, because the recipient either 403s (bad token) or reads
your personalised issue against your userId. Result: no way to share
the daily brief, no way for readers to drive discovery. Opening a
growth loop requires a separate public surface.
Approach: deterministic HMAC-derived short hash per {userId,
issueDate} backed by a pointer key in Redis.
New files
- server/_shared/brief-share-url.ts
Web Crypto HMAC helper. deriveShareHash returns 12 base64url chars
(72 bits) from (userId, issueDate) using BRIEF_SHARE_SECRET.
Pointer encode/decode helpers and a shape check. Distinct from the
per-user BRIEF_URL_SIGNING_SECRET so a leak of one does not
automatically unmask the other.
- api/brief/share-url.ts (edge, Clerk auth, Pro gated)
POST /api/brief/share-url?date=YYYY-MM-DD
Idempotently writes brief:public:{hash} pointer with the same 7 day
TTL as the underlying brief, then returns {shareUrl, hash,
issueDate}. 404 if the per-user brief is missing. 503 on Upstash
failure. Accepts an optional refCode in the JSON body for referral
attribution.
- api/brief/public/[hash].ts (edge, unauth)
GET /api/brief/public/{hash}?ref={code}
Reads pointer, reads the real brief envelope, renders with
publicMode=true. Emits X-Robots-Tag: noindex,nofollow so shared
briefs never get enumerated by search engines. 404 on any missing
part (bad hash shape, missing pointer, missing envelope) with a
neutral error page. 503 on Upstash failure.
Renderer changes (server/_shared/brief-render.js)
- Signature extended: renderBriefMagazine(envelope, options?)
- options.publicMode: redacts user.name and whyMatters before any
HTML emission; swaps the back cover to a Subscribe CTA; prepends
a Subscribe strip across the top of the deck; omits the Share
button + share script; adds a noindex meta tag.
- options.refCode: appended as ?ref= to /pro links on public views.
- Non-public views gain a sticky .wm-share pill in the top-right
chrome. Inline SHARE_SCRIPT handles the click flow: POST /api/
brief/share-url then navigator.share with clipboard fallback and a
prompt() ancient-browser fallback. User-visible feedback via
data-state on the button (sharing / copied / error). No change to
the envelope contract, no LLM calls, no composer-side work
required.
- Validation runs on the full unredacted envelope first, so the
public path can never accept a shape the private path would reject.
Tests
- tests/brief-share-url.test.mts (18 assertions): determinism,
secret sensitivity, userId/date sensitivity, shape validation, URL
composition with/without refCode, trailing-slash handling on
baseUrl, pointer encode/decode round-trip.
- tests/brief-magazine-render.test.mjs (+13 assertions): Share
button carries the issue date; share script emitted once;
share-url endpoint wired; publicMode strips the button+script,
replaces whyMatters, emits noindex meta, prepends Subscribe strip,
passes refCode through with escaping, swaps the back cover, does
not leak the user name, preserves story headlines, options-less
call matches the empty-options call byte for byte.
- Full typecheck/lint/edge-bundle/test:data/edge-functions suite all
green: 5704/5704 data tests, 171/171 edge-function tests, 0 lint
errors.
Env vars (new)
- BRIEF_SHARE_SECRET: 64+ random hex chars, Vercel (edge) only. NOT
needed by the Railway composer because pointer writes are lazy
(on share, not on compose).
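A sketch of the hash derivation under the assumptions above — the commit specifies HMAC over (userId, issueDate) with BRIEF_SHARE_SECRET and 12 base64url chars (72 bits); the versioned input string format here is an assumption, and the real helper uses Web Crypto while this sketch uses `node:crypto` for a synchronous illustration:

```typescript
import { createHmac } from "node:crypto";

// Hypothetical input format "share:v1:<userId>:<issueDate>" — illustrative only.
function deriveShareHash(secret: string, userId: string, issueDate: string): string {
  return createHmac("sha256", secret)
    .update(`share:v1:${userId}:${issueDate}`)
    .digest("base64url") // URL-safe alphabet, no padding
    .slice(0, 12);       // 12 chars × 6 bits = 72 bits
}
```

Because the hash is deterministic, repeated shares of the same issue are idempotent: the pointer write under `brief:public:{hash}` always lands on the same key.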
* fix(brief): public share round-trip + magazine Share button without auth
Two P1 findings on #3183 review.
1) Pointer wire format: share-url.ts wrote the pointer as a raw colon-delimited string via SET. The public route reads via readRawJsonFromUpstash which ALWAYS JSON.parses. A bare non-JSON string throws at parse, the route returned 503 instead of resolving. Fix: JSON.stringify on both write sites. Regression test locks the wire format.
2) Share button auth unreachable from a standalone magazine tab: inline script needed window.WM_CLERK_JWT which is never set, endpoint hard-requires Bearer, fallback to credentials:include fails. Fix: derive share URL server-side in the per-user route (same inputs share-url uses), embed as data-share-url, click handler now reads dataset and invokes navigator.share directly. No network, no auth, works in any tab.
The /api/brief/share-url endpoint stays in place for other callers (dashboard panel) with its Clerk auth intact and its pointer write now in the correct format.
QA: typecheck clean, 5708/5708 data tests, 45/45 magazine, 20/20 share-url, edge bundle OK, lint exit 0.
* fix(brief): address remaining review findings on #3183
P0-2 (comment-only): public/[hash].ts inline comment incorrectly described readRawJsonFromUpstash parse-failure behaviour. The helper rethrows on JSON.parse failure, it does not return null. Rewrote the comment to match reality (JSON-encoded wire format, parse-to-string round-trip, intentional 503-on-bug-value as the loud failure mode). The actual wire-format fix was in prior commit
|
||
|
|
81536cb395 |
feat(brief): source links, LLM descriptions, strip suffix (envelope v2) (#3181)
* feat(brief): source links, LLM descriptions, strip publisher suffix (envelope v2)

Three coordinated fixes to the magazine content pipeline.

1. Headlines were ending with " - AP News" / " | Reuters" etc. because the composer passed RSS titles through verbatim. Added stripHeadlineSuffix() in brief-compose.mjs: a conservative case-insensitive match only when the trailing token equals primarySource, so a real subtitle that happens to contain a dash still survives.

2. Story descriptions were the headline verbatim. Added generateStoryDescription to brief-llm.mjs, plumbed into enrichBriefEnvelopeWithLLM: one additional LLM call per story, cached 24h on a v1 key covering headline, source, severity, category, country. Cache hits are revalidated via parseStoryDescription so a bad row cannot flow to the envelope. Falls through to the cleaned headline on any failure.

3. Source attribution was plain text, with no outgoing link. Bumped BRIEF_ENVELOPE_VERSION to 2, added BriefStory.sourceUrl. The composer now plumbs story:track:v1.link through digestStoryToUpstreamTopStory, UpstreamTopStory.primaryLink, filterTopStories, BriefStory.sourceUrl. The renderer wraps the Source line in an anchor with target=_blank, rel=noopener noreferrer, and UTM params (utm_source=worldmonitor, utm_medium=brief, utm_campaign=<issueDate>, utm_content=story-<rank>). UTM appending is idempotent; publisher-attributed URLs keep their own utm_source. Envelope validation gains a validateSourceUrl step (https/http only, no userinfo credentials, parseable absolute URL). Stories without a valid upstream link are dropped by filterTopStories rather than shipping with an unlinked source.

Tests: 30 renderer tests to 38; new assertions cover UTM presence on every anchor, HTML-escaping of ampersands in hrefs, pre-existing UTM preservation, and all four validator rejection modes. New composer tests cover suffix stripping, link plumb-through, and v2 drop-on-no-link behaviour. New LLM tests for generateStoryDescription cover cache hit/miss, revalidation of bad rows, 24h TTL, and null-on-failure.

* fix(brief): v1 back-compat window on renderer + consolidate story hash helper

Two P1/P2 review findings on #3181.

P1 (v1 back-compat). Bumping BRIEF_ENVELOPE_VERSION 1 → 2 made every v1 envelope still resident in Redis under the 7-day TTL fail assertBriefEnvelope. The hosted /api/brief route would 404 "expired" and the /api/latest-brief preview would downgrade to "composing", breaking already-issued links from the preceding week. Fix: the renderer now accepts SUPPORTED_ENVELOPE_VERSIONS = Set([1, 2]) on READ. BRIEF_ENVELOPE_VERSION stays at 2 and is the only version the composer ever writes. BriefStory.sourceUrl is required when version === 2 and absent on v1; when rendering a v1 story the source line degrades to plain text (no anchor), matching pre-v2 appearance. When the TTL window passes, the set can shrink to [2] in a follow-up.

P2 (hash dedup). hashStoryDescription was byte-identical to hashStory, inviting silent drift if one prompt gains a field the other forgets. Consolidated into hashBriefStory. Cache key separation remains via the distinct prefixes (brief:llm:whymatters:v2: / brief:llm:description:v1:).

Tests: adds 3 v1 back-compat assertions (plain source line, field validation still runs, defensive sourceUrl check), updates the version-mismatch assertion to match the new supported-set message. 161/161 pass (was 158). Full test:data 5706/5706. |
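The idempotent UTM tagging described above can be sketched like this (`withUtm` is a hypothetical stand-in for the renderer's helper; the parameter values are the ones the commit specifies): append tracking params unless the publisher already attributed itself via its own utm_source.

```typescript
function withUtm(href: string, issueDate: string, rank: number): string {
  const url = new URL(href);
  // Publisher attribution wins: an existing utm_source means the URL is
  // returned untouched, which also makes repeated tagging idempotent.
  if (url.searchParams.has("utm_source")) return href;
  url.searchParams.set("utm_source", "worldmonitor");
  url.searchParams.set("utm_medium", "brief");
  url.searchParams.set("utm_campaign", issueDate);
  url.searchParams.set("utm_content", `story-${rank}`);
  return url.toString();
}
```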
||
|
|
8fc302abd9 |
fix(brief): mobile layout — stack story callout, floor digest typography (#3180)
* fix(brief): mobile layout — stack story callout, floor digest typography

On viewports <=640px the 55/45 story grid cramped both the headline and the "Why this is important" callout to ~45% width each, and several digest rules used raw vw units (blockquote 2vw, threads 1.55vw) that collapsed to ~7-8px on a 393px iPhone frame before the browser min clamped them to barely readable.

Appends a single @media (max-width: 640px) block to the renderer's STYLE_BLOCK:
- .story becomes a flex column — the callout stacks under the headline, no column squeeze. The headline goes full-width at 9.5vw.
- Digest blockquote, threads, signals, and stat rows get max(Npx, Nvw) floors so they never render below ~15-17px regardless of viewport.
- The running-head stacks on digest pages and the absolute page-number gets right-hand clearance so they stop overlapping.
- Tags and source labels pinned to 11px (were scaling down with vw).
CSS-only; no envelope, no HTML structure, no new classes. All 30 renderBriefMagazine tests still pass.

* fix(brief): raise mobile digest px floors and running-head clearance

Two P2 findings from PR review on #3180:
1. .digest .running-head padding-right: 18vw left essentially zero clearance from the absolute .page-number block on iPhone SE (375px) and common Android (360px). Bumped to 22vw (~79px at 360px), which accommodates "09 / 12" in IBM Plex Mono at the right:5vw offset with a one-vw safety margin.
2. Mobile overrides were lowering base-rule px floors (thread 17px to 15px, signal 18px to 15px). On viewports <375px this rendered digest body text smaller than desktop. Kept the px floors at or above the base rules so the effective size only ever goes up on mobile. |
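The arithmetic behind the `max(Npx, Nvw)` floors, resolved in TypeScript as a worked example (the function is illustrative — the real fix is pure CSS): raw 1.55vw on a 393px frame is about 6.1px, which is where the unreadable digest text came from, and the px floor is what rescues it.

```typescript
// CSS max(pxFloor, vwValue) resolved against a concrete viewport width.
// 1vw = viewportPx / 100.
function resolveFontPx(pxFloor: number, vwValue: number, viewportPx: number): number {
  return Math.max(pxFloor, (vwValue / 100) * viewportPx);
}
```

On a 393px iPhone frame, `max(15px, 1.55vw)` resolves to the 15px floor; on a wide desktop viewport the vw term takes over again, so desktop rendering is unchanged.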
||
|
|
388995b1a4 |
fix(health): macroSignals maxStaleMin 20 → 150 to match seed-economy cron cadence (#3179)
macroSignals is a secondary key written by seed-economy.mjs, whose primary key energy-prices has maxStaleMin=150 in its runSeed config. A 20-min threshold guaranteed STALE_SEED between every cron run. |
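The invariant behind this one-line change, sketched with hypothetical helper names: a key's staleness threshold must exceed its writer's cron period, otherwise the health check trips between every run by construction.

```typescript
function seedHealth(minutesSinceWrite: number, maxStaleMin: number): "OK" | "STALE_SEED" {
  return minutesSinceWrite > maxStaleMin ? "STALE_SEED" : "OK";
}

function thresholdGuaranteesStale(maxStaleMin: number, cronPeriodMin: number): boolean {
  // The key's age climbs to cronPeriodMin just before each write; any smaller
  // threshold is therefore exceeded on every cycle.
  return maxStaleMin < cronPeriodMin;
}
```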
||
|
|
6f6102e5a7 |
feat(brief): swap sienna rust for two-strength WM mint (Option B palette) (#3178)
* feat(brief): swap sienna rust for two-strength WM mint (Option B palette)
The only off-brand color in the product was the brief's sienna rust
(#8b3a1f) accent. Every other surface — /pro landing, dashboard,
dashboard panels — uses the WM mint green (#4ade80). Swapping the
brief's accent to the brand mint makes the magazine read as a sibling
of /pro rather than a separate editorial product, while keeping the
magazine-grade serif typography and even/odd page inversion intact.
Implementation (user picked Option B from brief-palette-playground.html):
--sienna : #8b3a1f -> #3ab567 muted mint for LIGHT pages (readable
on #fafafa without the bright-mint
glare of a pure-brand swap)
--mint : + #4ade80 bright WM mint for DARK pages
(matches /pro exactly)
--cream : #f1e9d8 -> #fafafa unified with --paper; one crisp white
--cream-ink: #1a1612 -> #0a0a0a crisper contrast on the new paper
Accent placement (unchanged structurally — only colors swapped):
- Digest running heads, labels, blockquote rule, stats dividers,
end-marker rule, signal/thread tags: all muted mint on light
- Story source line: newly mint (was unstyled bone/ink at 0.6 opacity);
two-strength — muted on light stories, bright on dark
- Logo ekg dot: mint on every page so the brand 'signal' pulse
threads through the whole magazine
No layout changes. No HTML structure changes. Only color constants +
a ~20-line CSS addition for story-source + ekg-dot accents.
165/165 brief tests pass (renderer contract unchanged — envelope shape
identical, only computed styles differ). Both tsconfigs typecheck clean.
* fix(brief): darken light-page mint to pass WCAG AA + fix digest ekg-dot
Two P2 findings on PR #3178 review.
1. Digest ekg-dot used bright #4ade80 on a #fafafa background,
contradicting the code comment that said 'light pages use the
muted mint'. The rule was grouped with .cover and .story.dark
(both ink backgrounds) when it should have been grouped with
.story.light (paper background). Regrouped.
2. #3ab567 on #fafafa tests at ~2.31:1 — fails WCAG AA 4.5:1 for
every text size and fails the 3:1 large-text floor. The PR called
this a rollback trigger; contrast math says it would fail every
meaningful text usage (mono running heads, source lines, labels,
footer captions). Swapped --sienna from #3ab567 to #1f7a3f —
tested at ~4.90:1 on #fafafa, passes AA for normal text.
Kept the variable name '--sienna' for backwards compat (every
.digest rule references it). The hue stays recognisably mint-
family (green dominant) so the brand relationship with #4ade80
on dark pages is still clear to a reader. Dark-page mint is
unchanged — #4ade80 on #0a0a0a is ~11.4:1, passes AAA.
Playground (brief-palette-playground.html) updated to match so
future iterations work against the accessible value.
165/165 brief tests pass. Both tsconfigs typecheck clean.
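The WCAG 2.x contrast math behind these palette decisions, as a self-contained check (the AA verdicts reproduce; the exact ratios can differ from the figures quoted above by small rounding steps):

```typescript
// WCAG relative luminance: linearize each sRGB channel, then weight.
function channel(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function luminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const [r, g, b] = [(n >> 16) & 255, (n >> 8) & 255, n & 255];
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Contrast ratio = (L_lighter + 0.05) / (L_darker + 0.05).
function contrast(fg: string, bg: string): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}
```

This confirms the three claims: #3ab567 on #fafafa fails even the 3:1 large-text floor, #1f7a3f on #fafafa clears the 4.5:1 AA bar for normal text, and #4ade80 on #0a0a0a is comfortably past AAA.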
|
||
|
|
048e5486ac |
fix(brief): Latest Brief panel locks out Pro users — gate reads Clerk metadata, not entitlement (#3177)
* fix(brief): Latest Brief panel locks out Pro users — gate reads wrong field

Reported: PR #3166 shipped a WEB_CLERK_PRO_ONLY_PANELS gate that downgrades the Latest Brief panel to FREE_TIER/ANONYMOUS when the user isn't Clerk-Pro. The downgrade condition was: state.user?.role !== 'pro'. state.user.role is derived from Clerk's publicMetadata.plan via getCurrentClerkUser(). That field is NOT kept in sync with the real entitlement for many users — the source of truth is the Convex entitlements table, not Clerk metadata. Result: a confirmed Pro user (Convex entitlement.features.tier = 1+) sees every other premium panel unlock (hasPremiumAccess consults isEntitled() per PR #3167) but the Latest Brief panel shows 'Upgrade to Pro'.

Fix: swap the condition from state.user?.role !== 'pro' to !hasTier(1) — the same Convex-backed entitlement check hasPremiumAccess uses. The panel's own auth subscription keeps the separate role-based guard inside refresh() as defence in depth (belt-and-suspenders), but the top-level gating no longer over-fires on the wrong field. No new behaviour for users without an entitlement. Typecheck clean.

* fix(brief): panel-side role gate also reads Convex entitlement (not Clerk metadata)

Reviewer caught that the prior PR (#3177) fixed the layout-level gate but left the panel's own refresh() guard reading authState.user.role — the same stale-publicMetadata bug. A user whose Convex entitlement says tier=1 but whose Clerk publicMetadata.plan is unset would unlock past the layout gate (now correct) and then still hit the panel's local renderUpgradeRequired() path. Fix: swap the local role check to hasTier(1) — the same Convex snapshot the layout now consults. Now BOTH gates agree on the source of truth.

* fix(brief): defer Pro gate when entitlement snapshot hasn't arrived yet

Review flagged a transient 'Upgrade to Pro' flash for Pro users on initial load. The auth-state subscription can fire before the Convex entitlement snapshot arrives; hasTier(1) returns false by default when currentState is null, so a Pro user briefly sees the upgrade overlay until onEntitlementChange re-runs the gate with the real snapshot.

Fix: treat 'entitlement not yet loaded' as distinct from 'free user'. Both the panel-layout.ts gate AND LatestBriefPanel.refresh() now check getEntitlementState() !== null before applying the Clerk-Pro-only downgrade. During the unknown window the panel stays in its loading state; the onEntitlementChange listener re-runs updatePanelGating once the snapshot lands and either unlocks or gates correctly. No behaviour change for free users (the entitlement snapshot arrives with tier=0, still correctly gates). No behaviour change for the steady-state Pro case. Only the cold-start window differs: flash of upgrade-overlay → clean loading state.

* fix(brief): drop client entitlement gate from panel refresh — let server decide

Reviewer's sharper read on PR #3177: the prior 'defer-if-unknown' fix still blocks Pro users whenever the Convex entitlement subscription is late, skipped, or failed to establish. getEntitlementState() can stay null indefinitely if the Convex client auth never connects; hasTier(1) would stay false; the panel would stay on renderLoading() forever and the server-side /api/latest-brief check would never even fire.

The correct architecture: the server is authoritative. /api/latest-brief already does its own entitlement check against the Clerk JWT. Client-side entitlement is a fast-path optimisation, never a gate.

Fix: switch both call sites to AFFIRMATIVE DENIAL ONLY.

LatestBriefPanel.refresh()
Before: if snapshot null → renderLoading (fetch never fires); if snapshot + free → renderUpgradeRequired.
After: if snapshot != null AND !hasTier(1) → renderUpgradeRequired. Otherwise fall through and FIRE THE FETCH. The 403 path (BriefAccessError 'upgrade_required') already renders the upgrade CTA when the server says free.

panel-layout.ts updatePanelGating
Already shaped as affirmative denial (snapshot != null AND !hasTier). Updated the comment to make the invariant explicit so a future refactor doesn't flip it back to positive gating.

Consequence: an API-key-only user with a free Clerk account will fire one doomed fetch per refresh and see renderUpgradeRequired a beat later than before. Accepted — the alternative locked legitimate Pro users out whenever Convex was anything other than perfectly healthy, which is a materially worse failure mode. Both tsconfigs typecheck clean. No test changes needed — the BriefAccessError path was already covered by existing tests. |
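The affirmative-denial invariant can be sketched in a few lines (the snapshot shape and function names are illustrative): the client blocks only when it positively knows the user lacks the tier; every other state — including "entitlement not yet loaded" — falls through to the fetch and lets the authoritative server-side 403 drive the upgrade CTA.

```typescript
type EntitlementSnapshot = { tier: number } | null;

function gateAction(snapshot: EntitlementSnapshot): "upgrade" | "fetch" {
  const hasTier1 = snapshot !== null && snapshot.tier >= 1;
  // Affirmative denial only: a loaded snapshot that says "free" gates locally.
  if (snapshot !== null && !hasTier1) return "upgrade";
  // Unknown OR entitled → fire the fetch; the server is the real gate.
  return "fetch";
}
```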
||
|
|
b5824d0512 |
feat(brief): Phase 9 / Todo #223 — share button + referral attribution (#3175)
* feat(brief): Phase 9 / Todo #223 — share button + referral attribution

Adds a Share button to the dashboard Brief panel so PRO users can spread WorldMonitor virally. Built on the existing referral plumbing (registrations.referralCode + referredBy fields; api/register-interest already passes referredBy through) — this PR fills in the last mile: a stable referral code for signed-in Clerk users, a share URL, and a client-side share sheet.

Files:
- server/_shared/referral-code.ts (new)
  Deterministic 8-char hex code: HMAC(BRIEF_URL_SIGNING_SECRET, 'referral:v1:' + userId). The same Clerk userId always produces the same code. No DB write on login, no schema migration, stable for the life of the account.
- api/referral/me.ts (new)
  GET -> { code, shareUrl, invitedCount }. Bearer-auth via Clerk. Reuses BRIEF_URL_SIGNING_SECRET to avoid another Railway env var. Stats fail gracefully to 0 on Convex outage.
- convex/registerInterest.ts + convex/http.ts
  New internal query getReferralStatsByCode({referralCode}) counts registrations rows that named this code as their referredBy. Exposed via POST /relay/referral-stats (RELAY_SHARED_SECRET auth).
- src/services/referral.ts (new)
  getReferralProfile: 5-min cache; the profile is effectively immutable. shareReferral: Web Share API primary (mobile native sheet), clipboard fallback on desktop. Returns 'shared'/'copied'/'blocked'/'error'. AbortError is treated as 'blocked', not failure. clearReferralCache for account-switch hygiene.
- src/components/LatestBriefPanel.ts + src/styles/panels.css
  New share row below the brief cover card. Button disabled until /api/referral/me resolves; if the fetch fails the row removes itself. invitedCount > 0 renders as 'N invited' next to the button. Referral cache invalidated alongside the Clerk token cache on account switch (otherwise user B would see user A's share link for 5 min).

Tests: 10 new cases in tests/brief-referral-code.test.mjs
- getReferralCodeForUser: hex shape, determinism, uniqueness, secret-rotation invalidates, input guards
- buildShareUrl: path shape, trailing-slash trim, URL-encoding
153/153 brief + deploy tests pass. Both tsconfigs typecheck clean.

Attribution flow (already working end-to-end):
1. Share button -> worldmonitor.app/pro?ref={code}
2. The /pro landing page already reads ?ref= and passes it to /api/register-interest as referredBy
3. convex registerInterest:register increments the referrer's referralCount and stores referredBy on the new row
4. /api/referral/me reads the count back via the relay query
5. 'N invited' updates on the next 5-min cache refresh

Scope boundaries (deferred):
- Convex conversion tracking (invited -> PRO subscribed). Needs a join from registrations.referredBy to subscriptions.userId via email. Surface 'N converted' in a follow-up.
- Referral-credit / reward system: the viral loop works today; reward logic is a separate product decision.

* fix(brief): address three P2 review findings on #3175

- api/referral/me.ts JSDoc said '503 if REFERRAL_SIGNING_SECRET is not configured' but the handler actually reads BRIEF_URL_SIGNING_SECRET. Updated the docstring so an operator chasing a 503 doesn't look for an env var that doesn't exist.
- server/_shared/referral-code.ts carried a RESERVED_CODES Set to avoid collisions with URL-path keywords ('index', 'robots', 'admin'). The guard is dead code: the code alphabet is [0-9a-f] (hex output of the HMAC), so none of those non-hex keywords can ever appear. Removed the Set + the while loop; left a comment explaining why it was unnecessary so nobody re-adds it.
- src/components/LatestBriefPanel.ts passed disabled: 'true' (string) to the h() helper. DOM-utils' h() calls setAttribute for unknown props, which does disable the button — but it's inconsistent with the later .disabled = false property write. Fixed to the boolean disabled: true so the attribute and the IDL property agree.
10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(brief): address two review findings on #3175 — drop misleading count + fix user-agnostic cache

P1: invitedCount wired to the wrong attribution store. The share URL is /pro?ref=<code>. On /pro the 'ref' feeds Dodopayments checkout metadata (affonso_referral), NOT registrations.referredBy. /api/referral/me counted only the waitlist path, so the panel would show 0 invited for anyone who converted direct-to-checkout — misleading. Rather than ship a count that measures only one of two attribution paths (and the less-common one at that), the count is removed entirely. The share button itself still works. A proper metric requires unifying the waitlist + Dodo-metadata paths into a single attribution store, which is a follow-up.

Changes:
- api/referral/me.ts: response shape is { code, shareUrl } — no invitedCount / convertedCount
- convex/registerInterest.ts: removed getReferralStatsByCode internal query
- convex/http.ts: removed /relay/referral-stats route
- src/services/referral.ts: ReferralProfile interface no longer has invitedCount; fetch call unchanged in behaviour
- src/components/LatestBriefPanel.ts: dropped the 'N invited' render branch

P2: the referral cache was user-agnostic. Module-global _cached had no userId key, so a stale cache primed by user A would hand user B user A's share link for up to 5 min after an account switch — if no panel is mounted at the transition moment to call clearReferralCache(). Per the reviewer's point, this is a real race.

Fix: three-part. (a) The cache entry carries the userId it was computed for; reads check the current Clerk userId and only accept hits when they match. Mismatch → drop + re-fetch. (b) src/services/referral.ts self-subscribes to auth-state at module load (ensureAuthSubscription). On any id transition _cached is dropped. Module-level subscription means the invalidation works even when no panel is currently mounted. (c) Belt-and-suspenders: post-fetch, re-check the current user before caching. Protects against account switches that happen mid-flight between 'read cache → ask network → write cache'. The panel's local clearReferralCache() call removed — the module now self-invalidates.
10/10 referral-code tests pass. Both tsconfigs typecheck clean.

* fix(referral): address P1 review finding — share codes now actually credit the sharer

The earlier head generated 8-char Clerk-derived HMAC codes for the share button, but the waitlist register mutation only looked up registrations.by_referral_code (6-char email-generated codes). Codes issued by the share button NEVER resolved to a sharer — the 'referral attribution' half of the feature was non-functional.

Fix (schema-level, honest attribution path):
- convex/schema.ts
  userReferralCodes { userId, code, createdAt } + by_code, by_user
  userReferralCredits { referrerUserId, refereeEmail, createdAt } + by_referrer, by_referrer_email
- convex/registerInterest.ts
  register mutation: after the existing registrations.by_referral_code lookup, falls through to userReferralCodes.by_code. On match, inserts a userReferralCredits row (the Clerk user has no registrations row to increment, so credit needs its own table). Dedupes by (referrer, refereeEmail) so returning visitors can't double-credit.
  New internalMutation registerUserReferralCode({userId, code}): idempotent binding of a code to a userId. Collisions logged and ignored (keeps first writer).
- convex/http.ts
  New POST /relay/register-referral-code (RELAY_SHARED_SECRET auth) that runs the mutation above.
- api/referral/me.ts
  Signature gains a ctx.waitUntil handle; after generating the user's code, fire-and-forget POSTs to /relay/register-referral-code so the binding is live by the time anyone clicks a shared link. Idempotent — a failure just means the NEXT call re-registers.
Still deferred: display of 'N credited' / 'N converted' in the LatestBriefPanel. The waitlist side now resolves correctly, but the Dodopayments checkout path (/pro?ref=<code> → affonso_referral) is tracked in Dodo, not Convex. Surfacing a unified count requires a separate follow-up to pull Dodo metadata into Convex. Regression tests (3 new cases in tests/brief-referral-code.test.mjs): - register mutation extends to userReferralCodes + inserts credits - schema declares both tables with the right indexes - /api/referral/me registers the binding via waitUntil 13/13 referral tests pass. Both tsconfigs typecheck clean. * fix(referral): address two P1 review findings — checkout attribution + dead-link prevention P1: share URL didn't credit on the /pro?ref= checkout path. The earlier PR wired Clerk codes into the waitlist path (/api/register-interest -> userReferralCodes -> userReferralCredits) but a visitor landing on /pro?ref=<code> and going straight to Dodo checkout forwarded the code only into Dodo metadata (affonso_referral). Nothing on our side credited the sharer. Fix: convex/payments/subscriptionHelpers.ts handleSubscriptionActive now reads data.metadata.affonso_referral when inserting a NEW subscription row. If the code resolves in userReferralCodes, a userReferralCredits row crediting the sharer is inserted (deduped by (referrer, refereeEmail) so webhook replays don't double-credit). The credit only lands on first-activation — the else-branch of the existing/new split guards against replays. P1: /api/referral/me returned 200 + share link even when the (code, userId) binding failed. ctx.waitUntil(registerReferralCodeInConvex(...)) ran the binding asynchronously, swallowing missing env + non-2xx + network errors. Users got a share URL that the waitlist lookup could never resolve — dead link. Fix: registerReferralCodeInConvex is now BLOCKING (throws on any failure) and the handler awaits it before returning. 
On failure the endpoint responds 503 service_unavailable rather than a 200 with a non-functional URL. Mutation is idempotent so client retries are safe. Regression tests (2 updated/new in tests/brief-referral-code.test.mjs): - asserts the binding is awaited, not ctx.waitUntil'd; asserts the failure path returns 503 - asserts subscriptionHelpers reads affonso_referral, resolves via userReferralCodes.by_code, inserts a userReferralCredits row, and dedupes by (referrer, refereeEmail) 14/14 referral tests pass. Both tsconfigs typecheck clean. Net effect: /pro?ref=<code> visitors who convert (direct checkout) now credit the sharer on webhook receipt, same as waitlist signups. The share button is no longer a dead-end UI. |
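The deterministic referral code above can be sketched as a plain HMAC truncation. `getReferralCodeForUser` is the name the tests mention; taking the secret as a parameter (rather than reading it from env) is an illustrative simplification:

```typescript
import { createHmac } from "node:crypto";

// Sketch of the deterministic 8-char referral code: HMAC-SHA256 keyed
// by the signing secret over a versioned domain-separated input,
// truncated to 8 hex chars. Same userId + secret always yields the
// same code; rotating the secret invalidates all codes at once.
export function getReferralCodeForUser(secret: string, userId: string): string {
  if (!secret || !userId) throw new Error("secret and userId are required");
  return createHmac("sha256", secret)
    .update(`referral:v1:${userId}`)
    .digest("hex")
    .slice(0, 8);
}
```

Because the output alphabet is `[0-9a-f]`, the code can never collide with URL-path keywords like `index` or `admin`, which is why the reserved-word guard mentioned above was dead code.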
||
|
|
122204f691 |
feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm (#3174)
* feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm
Implements the Phase 8 carousel renderer (Option B): server-side PNG
generation in a Vercel edge function using Satori (JSX to SVG) +
@resvg/resvg-wasm (SVG to PNG). Zero new Railway infra, zero
Chromium, same edge runtime that already serves the magazine HTML.
Files:
server/_shared/brief-carousel-render.ts (new)
Pure renderer: (BriefEnvelope, CarouselPage) -> Uint8Array PNG.
Three layouts (cover/threads/story), 1200x630 OG size.
Satori + resvg + WASM are lazy-imported so Node tests don't trip
over '?url' asset imports and the 800KB wasm doesn't ship in
every bundle. Font: Noto Serif regular, fetched once from Google
Fonts and memoised on the edge isolate.
api/brief/carousel/[userId]/[issueDate]/[page].ts (new)
Public edge function reusing the magazine route's HMAC token —
same signer, same (userId, issueDate) binding, so one token
unlocks magazine HTML AND all three carousel images. Returns
image/png with 7d immutable cache headers. 404 on invalid page
index, 403 on bad token, 404 on Redis miss, 503 on missing
signing secret. Render failure falls back to a 1x1 transparent
PNG so Telegram's sendMediaGroup doesn't 500 the brief.
scripts/seed-digest-notifications.mjs
carouselUrlsFrom(magazineUrl) derives the 3 signed carousel
URLs from the already-signed magazine URL. sendTelegramBriefCarousel
calls Telegram's sendMediaGroup with those URLs + short caption.
Runs before the existing sendTelegram(text) so the carousel is
the header and the text the body — long-form stories remain
forwardable as text. Best-effort: carousel failure doesn't
block text delivery.
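A minimal sketch of the URL derivation described above. The carousel path shape comes from the route file name; the magazine path segment (`/api/brief/magazine/`) and the `token` query parameter name are assumptions for illustration:

```typescript
// Hypothetical carouselUrlsFrom: derive the three signed carousel URLs
// from an already-signed magazine URL, preserving origin and token.
export function carouselUrlsFrom(magazineUrl: string): string[] | null {
  let url: URL;
  try {
    url = new URL(magazineUrl);
  } catch {
    return null; // garbage input -> no carousel, text-only delivery
  }
  const m = url.pathname.match(
    /^\/api\/brief\/magazine\/([^/]+)\/(\d{4}-\d{2}-\d{2})$/,
  );
  const token = url.searchParams.get("token");
  if (!m || !token) return null; // wrong path shape or missing token
  const [, userId, issueDate] = m;
  // Pages 0..2 map to the cover / threads / story layouts.
  return [0, 1, 2].map(
    (page) =>
      `${url.origin}/api/brief/carousel/${userId}/${issueDate}/${page}?token=${token}`,
  );
}
```

Returning `null` (rather than throwing) keeps the carousel best-effort: the caller skips sendMediaGroup and still sends the text message.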
package.json + package-lock.json
satori ^0.10.14 + @resvg/resvg-wasm ^2.6.2.
Tests (tests/brief-carousel.test.mjs, 9 cases):
- pageFromIndex mapping + out-of-range
- carouselUrlsFrom: valid URL, localhost origin preserved, missing
token, wrong path, invalid issueDate, garbage input
- Drift guard: cron must still declare the same helper + template
string. If it drifts, test fails with a pointer to move the impl
into a shared module.
PNG render itself isn't unit-tested — Satori + WASM need a
browser/edge runtime. Covered by smoke validation step in the
deploy monitoring plan.
Both tsconfigs typecheck clean. 152/152 brief tests pass.
Scope boundaries (deferred):
- Slack + Discord image attachments (different payload shapes)
- notification-relay.cjs brief_ready dispatch (real-time route)
- Redis caching of rendered PNG (edge Cache-Control is enough for
MVP)
* fix(brief): address two P1 review findings on Phase 8 carousel
P1-A: 200 placeholder PNG cached 7d on render failure.
Route config said runtime: 'edge' but a comment contradicted it
claiming Node semantics. More importantly, any render/init failure
(WASM load, Satori, Google Fonts) was converted to a 1x1 transparent
PNG returned with Cache-Control: public, max-age=7d, immutable.
Telegram's media fetcher and Vercel's CDN would cache that blank
for the full brief TTL per chat message — one cold-start mismatch
= every reader of that brief sees blank carousel previews for a
week.
Fix: deleted errorPng(). Render failure now returns 503 with
Cache-Control: no-store. sendMediaGroup fails cleanly for that
carousel (the digest cron already treats it as best-effort and
still sends the long-form text message), next cron tick re-renders
from a fresh isolate. Self-healing across ticks. Contradictory
comment about Node runtime removed.
P1-B: Google Fonts as silent hard dependency.
The renderer claimed 'safe embedded/fallback path' in comments but
no fallback existed. loadFont() fetches Noto Serif from gstatic.com
and rethrows on any failure. Combined with P1-A's old 200-cache-7d
path, a transient CDN blip would lock in a blank carousel for a
week.
Fix: updated comments to honestly declare the CDN dependency plus
document the self-healing semantics now that P1-A's fix no longer
caches the failure. If Google Fonts reliability becomes a problem,
swap the fetch for a bundled base64 TTF — noted as the escape hatch.
Tests (tests/brief-carousel.test.mjs): 2 new regression cases.
11/11 carousel tests pass. Both tsconfigs typecheck clean locally.
Note on currently-red CI: failures are NOT typecheck errors — npm
ci dies fetching libvips for sharp (504 Gateway Time-out from
GitHub releases). sharp is a transitive dep via @xenova/transformers,
pre-existing, not touched by this PR. Transient infra flake.
* fix(brief): switch carousel to Node + @resvg/resvg-js (fixes deploy block)
Vercel edge bundler fails the carousel deploy with:
'Edge Function is referencing unsupported modules:
@resvg/resvg-wasm/index_bg.wasm?url'
The ?url asset-import syntax is a Vite-ism that Vercel's edge
bundler doesn't resolve. Two ways out: find a Vercel-blessed edge
WASM import incantation, or switch to Node runtime with the native
@resvg/resvg-js binding. The second is simpler, faster per request,
and avoids the whole WASM-in-edge-bundler rabbit hole.
Changes:
- package.json: @resvg/resvg-wasm -> @resvg/resvg-js ^2.6.2
- api/brief/carousel/.../[page].ts: runtime 'edge' -> 'nodejs20.x'
- server/_shared/brief-carousel-render.ts: drop initWasm path,
dynamic-import resvg-js in ensureLibs(). Satori and resvg load
in parallel via Promise.all, shaving ~30ms off cold start.
Also addresses the P2 finding from review: the old ensureLibsAndWasm
had a concurrent-cold-start race where two callers could reach
'await initWasm()' simultaneously. Replaced the boolean flag with a
shared _libsLoadPromise so concurrent callers await the same load.
On failure the promise resets so the NEXT request retries rather
than poisoning the isolate for its lifetime.
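The shared-promise pattern can be sketched in a few lines. `_libsLoadPromise` and `ensureLibs` are named in the commit; taking the loader as a parameter (standing in for the Satori + resvg dynamic imports) is an illustrative simplification:

```typescript
// Concurrent cold-start guard: all callers await the same in-flight
// load, and a failed load resets the slot so the NEXT request retries
// instead of poisoning the isolate for its lifetime.
let _libsLoadPromise: Promise<unknown> | null = null;

export function ensureLibs(loadLibs: () => Promise<unknown>): Promise<unknown> {
  if (!_libsLoadPromise) {
    _libsLoadPromise = loadLibs().catch((err) => {
      _libsLoadPromise = null; // reset so a later caller retries
      throw err; // still reject for the callers awaiting this load
    });
  }
  return _libsLoadPromise;
}
```

Compared with a boolean "loaded" flag, the promise slot closes the race where two requests both observe "not loaded" and initialize twice.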
Cold start ~700ms (Satori + resvg-js native init), warm ~40ms.
Carousel images are not latency-critical — fetched by Telegram's
media service, CDN-cached 7d.
Both tsconfigs typecheck clean. 11/11 carousel tests pass.
* fix(brief): carousel runtime = 'nodejs' (was 'nodejs20.x', rejected by Vercel)
Vercel's functions config validator rejects 'nodejs20.x' at deploy
time:
unsupported "runtime" value in config: "nodejs20.x"
(must be one of: ["edge","experimental-edge","nodejs"])
The Node version comes from the project's default (currently Node 20
via package.json engines + Vercel project settings), not from the
runtime string. Use 'nodejs' — unversioned — and let the platform
resolve it.
11/11 carousel tests pass.
* fix(brief): swap carousel font from woff2 to woff (Satori can't parse woff2)
Review on PR #3174: the FONT_URL was pointing at a gstatic.com woff2
file. Satori parses ttf / otf / woff v1 — NOT woff2. Every render
was about to throw on font decode, the route would return 503, and
the carousel would never deliver a single image.
Fix: point FONT_URL at @fontsource's Noto Serif Latin 400 WOFF v1
via jsdelivr. WOFF v1 is a TrueType wrapper that Satori parses
natively (verified: file says 'Web Open Font Format, TrueType,
version 1.1'). Same cold-start semantics as before — one fetch per
warm isolate, memoised.
Regression test: asserts FONT_URL ends in ttf/otf/woff and explicitly
rejects any .woff2 suffix. A future swap that silently reintroduces
woff2 now fails CI loudly instead of shipping a permanently-broken
renderer.
12/12 carousel tests pass. Both tsconfigs typecheck clean.
|
||
|
|
e1c3b28180 |
feat(notifications): Phase 6 — web-push channel for PWA notifications (#3173)
* feat(notifications): Phase 6 — web-push channel for PWA notifications
Adds a web_push notification channel so PWA users receive native
notifications when this tab is closed. Deep-links click to the
brief magazine URL for brief_ready events, to the event link for
everything else.
Schema / API:
- channelTypeValidator gains 'web_push' literal
- notificationChannels union adds { endpoint, p256dh, auth,
userAgent? } variant (standard PushSubscription identity triple +
cosmetic UA for the settings UI)
- new setWebPushChannelForUser internal mutation upserts the row
- /relay/deactivate allow-list extended to accept 'web_push'
- api/notification-channels: 'set-web-push' action validates the
triple, rejects non-https, truncates UA to 200 chars
Client (src/services/push-notifications.ts + src/config/push.ts):
- isWebPushSupported guards Tauri webview + iOS Safari
- subscribeToPush: permission + pushManager.subscribe + POST triple
- unsubscribeFromPush: pushManager.unsubscribe + DELETE row
- VAPID_PUBLIC_KEY constant (with VITE_VAPID_PUBLIC_KEY env override)
- base64 <-> Uint8Array helpers (VAPID key encoding)
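The VAPID key helper above is the standard base64url-to-bytes conversion `pushManager.subscribe` needs for `applicationServerKey`. A self-contained sketch using Node's `Buffer` (the browser version swaps in `atob`):

```typescript
// base64url -> Uint8Array: swap the url-safe alphabet back to
// standard base64, re-pad to a multiple of 4, then decode.
export function urlBase64ToUint8Array(base64url: string): Uint8Array | null {
  if (!base64url) return null; // null guard for a missing key
  const swapped = base64url.replace(/-/g, "+").replace(/_/g, "/");
  const pad = "=".repeat((4 - (swapped.length % 4)) % 4);
  return new Uint8Array(Buffer.from(swapped + pad, "base64"));
}
```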
Service worker (public/push-handler.js):
- Imported into VitePWA's generated sw.js via workbox.importScripts
- push event: renders notification; requireInteraction=true for
brief_ready so a lock-screen swipe does not dismiss the CTA
- notificationclick: focuses+navigates existing same-origin client
when present, otherwise opens a new window
- Malformed JSON falls back to raw text body, missing data falls
back to a minimal WorldMonitor default
Relay (scripts/notification-relay.cjs):
- sendWebPush() with lazy-loaded web-push dep. 404/410 triggers
deactivateChannel('web_push'). Missing VAPID env vars logs once
and skips — other channels keep delivering.
- processEvent dispatch loop + drainHeldForUser both gain web_push
branches
Settings UI (src/services/notifications-settings.ts):
- New 'Browser Push' tile with bell icon
- Enable button lazy-imports push-notifications, calls subscribe,
renders 'Not supported' on Tauri/in-app webviews
- Remove button routes web_push specifically through
unsubscribeFromPush so the browser side is cleaned up too
Env vars required on Railway services:
VAPID_PUBLIC_KEY public key
VAPID_PRIVATE_KEY private key
VAPID_SUBJECT mailto:support@worldmonitor.app (optional)
Public key is also committed as the default in src/config/push.ts
so the client bundle works without a build-time override.
Tests: 11 new cases in tests/brief-web-push.test.mjs
- base64 <-> Uint8Array round-trip + null guards
- VAPID default fallback when env absent
- SW push event rendering, requireInteraction gating, malformed JSON
+ no-data fallbacks
- SW notificationclick: openWindow vs focus+navigate, default url
154/154 tests pass. Both tsconfigs typecheck clean.
* fix(brief): address PR #3173 review findings + drop hardcoded VAPID
P1 (security): VAPID private key leaked in PR description.
Rotated the keypair. Old pair permanently invalidated. Structural fix:
Removed DEFAULT_VAPID_PUBLIC_KEY entirely. Hardcoding the public
key in src/config/push.ts gave rotations two sources of truth
(code vs env) — exactly the friction that caused me to paste the
private key in a PR description in the first place. VAPID_PUBLIC_KEY
now comes SOLELY from VITE_VAPID_PUBLIC_KEY at build time.
isWebPushConfigured() gates the subscribe flow so builds without
the env var surface as 'Not supported' rather than crashing
pushManager.subscribe.
Operator setup (one-time):
Vercel build: VITE_VAPID_PUBLIC_KEY=<public>
Railway services: VAPID_PUBLIC_KEY=<public>
VAPID_PRIVATE_KEY=<private>
VAPID_SUBJECT=mailto:support@worldmonitor.app
Rotation: update env on both sides, redeploy. No code change, no
PR body — no chance of leaking a key in a commit.
P2: single-device fan-out — setWebPushChannelForUser replaces the
previous subscription silently. Per-device fan-out is a schema change
deferred to follow-up. Fix for now: surface the replacement in
settings UI copy ('Enabling here replaces any previously registered
browser.') so users who expect multi-device see the warning.
P2: 24h push TTL floods offline devices on reconnect. Event-type-aware:
brief_ready: 12h (daily editorial — still interesting)
quiet_hours_batch: 6h (by definition queued-on-wake)
everything else: 30m (transient alerts: noise after 30min)
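The TTL tiers above reduce to a small switch. The values are the ones stated in the commit; the function name is illustrative:

```typescript
// Event-type-aware web-push TTL: how long a push service should hold
// an undelivered notification for an offline device.
export function pushTtlSeconds(eventType: string): number {
  switch (eventType) {
    case "brief_ready":
      return 12 * 60 * 60; // daily editorial: still interesting later
    case "quiet_hours_batch":
      return 6 * 60 * 60; // by definition queued-on-wake
    default:
      return 30 * 60; // transient alerts: noise after 30 minutes
  }
}
```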
REGRESSION test: VAPID_PUBLIC_KEY must be '' when env var is unset.
If a committed default is reintroduced, the test fails loudly.
11/11 web-push tests pass. Both tsconfigs typecheck clean.
* fix(notifications): deliver channel_welcome push for web_push connects (#3173 P2)
The settings UI queues a channel_welcome event on first web_push
subscribe (api/notification-channels.ts:240 via publishWelcome), but
processWelcome() in the relay only branched on slack/discord/email —
no web_push arm. The welcome event was consumed off the queue and
then silently dropped, leaving first-time subscribers with no
'connection confirmed' signal.
Fix: add a web_push branch to processWelcome. Calls sendWebPush with
eventType='channel_welcome' which maps to the 30-minute TTL tier in
the push-delivery switch — a welcome that arrives >30 min after
subscribe is noise, not confirmation.
Short body (under 80 chars) so Chrome/Firefox/Safari notification
shelves don't clip past ellipsis.
11/11 web-push tests pass.
* fix(notifications): address two P1 review findings on #3173
P1-A: SSRF via user-supplied web_push endpoint.
The set-web-push edge handler accepted any https:// URL and wrote
it to Convex. The relay's sendWebPush() later POSTs to whatever
endpoint sits in that row, giving any Pro user a server-side-request
primitive bounded only by the relay's network egress.
Fix: isAllowedPushEndpointHost() allow-list in api/notification-
channels.ts. Only the four known browser push-service hosts pass:
fcm.googleapis.com (Chrome / Edge / Brave)
updates.push.services.mozilla.com (Firefox)
web.push.apple.com (Safari, macOS 13+)
*.notify.windows.com (Windows Notification Service)
Fail-closed: unknown hosts rejected with 400 before the row ever
reaches Convex. If a future browser ships a new push service we'll
need to widen this list (guarded by the SSRF regression tests).
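The allow-list check can be sketched as below. The four hosts come from the commit; the suffix-match handling for `*.notify.windows.com` is an assumption about how the wildcard is implemented:

```typescript
// Fail-closed allow-list of known browser push-service hosts.
const EXACT_PUSH_HOSTS = new Set([
  "fcm.googleapis.com",                // Chrome / Edge / Brave
  "updates.push.services.mozilla.com", // Firefox
  "web.push.apple.com",                // Safari, macOS 13+
]);

export function isAllowedPushEndpointHost(endpoint: string): boolean {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return false; // unparseable -> fail closed
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (EXACT_PUSH_HOSTS.has(host)) return true;
  // Windows Notification Service uses per-tenant subdomains.
  return host.endsWith(".notify.windows.com");
}
```

Matching on the parsed hostname (not a substring of the raw URL) is what blocks spoofs like `https://notify.windows.com.evil.com/`.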
P1-B: cross-account endpoint reuse on shared devices.
The browser's PushSubscription is bound to the origin, NOT to the
Clerk session. User A subscribes on device X, signs out, user B
signs in on X and subscribes — the browser hands out the SAME
endpoint/p256dh/auth triple. The previous setWebPushChannelForUser
upsert keyed only by (userId, channelType), so BOTH rows now carry
the same endpoint. Every push the relay fans out for user A also
lands on device X which is now showing user B's session.
Fix: setWebPushChannelForUser scans all web_push rows and deletes
any that match the new endpoint BEFORE upserting. Effectively
transfers ownership of the subscription to the current caller.
The previous user will need to re-subscribe on that device if they
sign in again.
No endpoint-based index on notificationChannels — the scan happens
at <10k rows and is well-bounded to the one write-path per user
per connect. If volume grows, add an index + migration.
Regression tests (tests/brief-web-push.test.mjs, 3 new cases):
- allow-list defines all four browser hosts + fail-closed return
- allow-list is invoked BEFORE convexRelay() in the handler
- setWebPushChannelForUser compares + deletes rows by endpoint
14/14 web-push tests pass. Both tsconfigs typecheck clean.
|
||
|
|
c2356890da |
feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini (#3172)
* feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini
Replaces the Phase 3a stubs with editorial output from Gemini 2.5
Flash via the existing OpenRouter-backed callLLM chain. Two LLM
pathways, different caching semantics:
whyMatters (per story): 1 editorial sentence, 18-30 words, global
stakes. Cache brief:llm:whymatters:v1:{sha256(headline|source|severity)}
with 24h TTL shared ACROSS users (whyMatters is not personalised).
Bounded concurrency 5 so a 12-story brief doesn't open 12 parallel
sockets to OpenRouter.
digest prose (per user): JSON { lead, threads[], signals[] }
replacing the stubs. Cache brief:llm:digest:v1:{userId}:{sensitivity}
:{poolHash} with 4h TTL per-user. Pool hash is order-insensitive
so rank shuffling doesn't invalidate.
Provider pinned to OpenRouter (google/gemini-2.5-flash) via
skipProviders: ['ollama', 'groq'] per explicit user direction.
Null-safe all the way down. If the LLM is unreachable, parse fails,
or cache throws, enrichBriefEnvelopeWithLLM returns the baseline
envelope with its stubs intact. The brief always ships. Kill switch
BRIEF_LLM_ENABLED is distinct from AI_DIGEST_ENABLED so the brief's
editorial prose and the email's AI summary can be toggled
independently during provider outages.
Files:
scripts/lib/brief-llm.mjs (new) — pure prompt/parse helpers + IO
generateWhyMatters/generateDigestProse + envelope enrichment
scripts/seed-digest-notifications.mjs — BRIEF_LLM_ENABLED flag,
briefLlmDeps closure, enrichment inserted between compose + SETEX
tests/brief-llm.test.mjs (new, 34 cases)
End-to-end verification: the enriched envelope passes
assertBriefEnvelope() — the renderer's strict validator is the gate
between composer and api/brief, so we prove the enriched envelope
still validates.
156/156 brief tests pass. Both tsconfigs typecheck clean.
* fix(brief): address three P1 review findings on Phase 3b
All three findings are about cache-key correctness + envelope safety.
P1-A — whyMatters cache key under-specifies the prompt.
hashStory keyed on headline|source|threatLevel, but the prompt also
carries category + country. Upstream classification or geocoding
corrections that leave those three fields unchanged would return
pre-correction prose for a materially different prompt. Bumped to
v2 key space (pre-fix rows ignored, re-LLM once on rollout). Added
regression tests for category + country busting the cache.
P1-B — digest prose cache key under-specifies the prompt.
hashDigestInput sorted stories and hashed headline|threatLevel only.
The actual prompt includes ranked order + category + country + source.
v2 hash now canonicalises to JSON of the fields in the prompt's
ranked order. Test inverted to lock the corrected behaviour
(reordering MUST miss the cache). Added a test for category change
invalidating.
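The corrected v2 hashing can be sketched as a canonical JSON of exactly the fields the prompt sees, in ranked order. The field set mirrors the commit; the helper name and shape are illustrative:

```typescript
import { createHash } from "node:crypto";

interface DigestStory {
  headline: string;
  threatLevel: string;
  category: string;
  country: string;
  source: string;
}

// v2 digest cache key input: any change to a prompt-visible field,
// OR a change in ranked order, produces a different hash (cache miss).
export function hashDigestInputV2(stories: DigestStory[]): string {
  const canonical = JSON.stringify(
    stories.map((s) => [s.headline, s.threatLevel, s.category, s.country, s.source]),
  );
  return createHash("sha256").update(canonical).digest("hex");
}
```

Note the contrast with the v1 key: v1 sorted stories before hashing (order-insensitive), which is exactly what P1-B corrects.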
P1-C — malformed cached digest poisons the envelope at SETEX time.
On cache hit generateDigestProse accepted any object with a string
lead, skipping the full shape check. enrichBriefEnvelopeWithLLM then
wrote prose.threads/.signals into the envelope, and the cron SETEXed
unvalidated. A bad cache row would 404 /api/brief at render time.
Two-layer fix:
1. Extracted validateDigestProseShape(obj) — same strictness
parseDigestProse ran on fresh output. generateDigestProse now
runs it on cache hits too, and returns a normalised copy.
2. Cron now re-runs assertBriefEnvelope on the ENRICHED envelope
before SETEX. On assertion failure it falls back to the
unenriched baseline (already passed assertion on construction).
Regression test: malformed cached row is rejected on hit and the
LLM is called again to overwrite.
Tests: 8 new regression cases locking all three findings. Total brief
test suite now 185/185 green. Both tsconfigs typecheck clean.
Cache-key version bumps (v1 -> v2) trigger one-off cache miss on
deploy. Editorial prose re-LLM'd on the next cron tick per user.
* fix(brief): address two P2 review findings on #3172
P2-A: misleading test name 'different users share the cache' asserted
the opposite (per-user isolation). Renamed to 'different users do NOT
share the digest cache even when the story pool is identical' so a
future reader can't refactor away the per-user key on a misreading.
P2-B: signal length validator only capped bytes (< 220 chars), so a
30-word signal could pass even though the prompt says '<=14 words'.
Added a word-count filter with an 18-word ceiling (14 + 4 margin for
model drift / hyphenated compounds). Regression test locks the
behaviour: signals with >14-word drift are dropped, short imperatives
pass.
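The two-gate filter can be sketched as below. The 220-char and 18-word ceilings come from the commit; the function name is illustrative:

```typescript
const MAX_SIGNAL_CHARS = 220; // existing byte cap
const MAX_SIGNAL_WORDS = 18;  // 14 prompt words + 4 margin for drift

// Drop signals that blow past either the char cap or the word-count
// ceiling; a chars-only check lets a terse 30-word signal through.
export function filterSignals(signals: string[]): string[] {
  return signals.filter((s) => {
    const trimmed = s.trim();
    if (!trimmed || trimmed.length >= MAX_SIGNAL_CHARS) return false;
    return trimmed.split(/\s+/).length <= MAX_SIGNAL_WORDS;
  });
}
```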
43/43 brief-llm tests pass. Both tsconfigs typecheck clean.
|
||
|
|
0fd8cd7d5f |
feat(pro): complimentary entitlement tooling + subscription.expired guard (#3169)
Adds support / goodwill tooling for granting free-tier credits that
survive Dodo subscription cancellations. Triggered by the 2026-04-17/18
duplicate-subscription incident: the customer was granted a manual
extension to validUntil, but that extension is naked — our existing
handleSubscriptionExpired handler unconditionally downgrades to free
when Dodo fires the expired event, which would wipe the credit.
Three coordinated changes:
1. convex/schema.ts — add optional compUntil: number to entitlements.
Acts as a "don't downgrade me before this" floor independent of the
subscription billing cycle. Optional, so existing rows are untouched.
2. convex/payments/billing.ts::grantComplimentaryEntitlement —
new internalMutation callable via `npx convex run`. Upserts the
entitlement, sets both validUntil and compUntil to max(existing, now +
days). Never shrinks (calling twice is idempotent upward), validates
planKey against PRODUCT_CATALOG, and syncs the Redis cache so edge
gateway sees the comp without waiting for TTL.
3. convex/payments/subscriptionHelpers.ts::handleSubscriptionExpired —
before the unconditional downgrade, read the current entitlement and
skip revocation if compUntil > eventTimestamp. This protects comp
grants from Dodo-originated revocations; normal subscription.expired
revocation is unchanged when there's no comp or the comp has lapsed.
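The two invariants above (grants never shrink; live comps block revocation) can be sketched as pure functions. Names are illustrative; in the repo this logic lives inside the Convex mutations:

```typescript
// Upsert arm of grantComplimentaryEntitlement: new floor is
// max(existing, now + days). Calling twice is idempotent upward.
export function extendedCompUntil(
  existingCompUntil: number | undefined,
  nowMs: number,
  days: number,
): number {
  if (days <= 0) throw new Error("days must be positive");
  const candidate = nowMs + days * 24 * 60 * 60 * 1000;
  return Math.max(existingCompUntil ?? 0, candidate);
}

// Guard in handleSubscriptionExpired: skip the downgrade while the
// comp floor is still ahead of the Dodo event timestamp.
export function shouldRevokeOnExpired(
  compUntil: number | undefined,
  eventTimestamp: number,
): boolean {
  return !(compUntil !== undefined && compUntil > eventTimestamp);
}
```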
Tests (convex/__tests__/comp-entitlement.test.ts, 9 new):
grantComplimentaryEntitlement
creates row with validUntil == compUntil
never shrinks an existing longer comp
upgrades planKey on existing free-tier row
rejects unknown planKey and non-positive days
handleSubscriptionExpired comp guard
revokes to free when no comp set (unchanged)
revokes to free when comp is already expired
preserves entitlement when comp is still valid
end-to-end: grant + expired webhook = entitlement survives
CLI usage (requires npx convex deploy after merge, then):
npx convex run 'payments/billing:grantComplimentaryEntitlement' \
'{"userId":"user_XXX","planKey":"pro_monthly","days":90,"reason":"support"}'
Full check suite green: typecheck x2, biome, 5590 data tests,
171 edge tests, 9 convex tests, md lint, version sync.
|
||
|
|
c90d40dfc5 |
fix(pro): consult Convex entitlement in hasPremiumAccess + isProUser (#3167)
* fix(pro): consult Convex entitlement in hasPremiumAccess and isProUser
Customer cus_0NcmwcAWw0jhVBHVOK58C still saw "Upgrade to Pro"
overlays on premium panels this morning despite PR #3163 landing,
because the reload the transition detector triggers only helps if
hasPremiumAccess returns true on the next load — and it doesn't.
Our Dodo webhook writes to Convex `entitlements` but never to Clerk
publicMetadata.plan, so hasPremiumAccess (which only checked Clerk
role / API keys / tester keys) stayed false for every paying Dodo
customer. isPanelEntitled honours the Convex entitlement and unlocks
the panel content, but getPanelGateReason -> hasPremiumAccess still
returns FREE_TIER and covers it with the upgrade overlay. Kevin's
screenshot shows exactly that: PRO badges next to panel titles,
"Upgrade to Pro" body beneath.
Fix:
- src/services/panel-gating.ts — hasPremiumAccess now checks
  isEntitled() after the Clerk role check. Convex entitlement is now
  authoritative for paying customers.
- src/services/widget-store.ts — same fix in isProUser so widget,
  search, and event-handler gates agree with panel gating.
- src/app/panel-layout.ts — after the existing free->pro reload
  branch, re-run updatePanelGating(getAuthState()) on every
  entitlement snapshot. Without this a legacy-pro user's null->true
  initial snapshot (NOT reloaded to avoid a loop) would leave the
  paywall overlay in place until the next auth event; likewise on WS
  reconnect or revocation the lock state must follow the current
  snapshot synchronously.
Import graph stays acyclic: panel-gating -> entitlements ->
convex-client -> clerk; widget-store -> entitlements likewise.
Typecheck clean. Pairs with PR #3163 (already merged). Earlier
iterations of the activation race fix were cargo-culted around the
transition detector; this is the actual UI-visible fix.
* refactor(pro): drop redundant isEntitled() from hasPremiumAccess
Greptile P2: the isEntitled() check at the end of hasPremiumAccess
can never flip the result, because isProUser() — called two lines
earlier in the same function — now also checks isEntitled() after
this PR's widget-store change. If isProUser() returns false, we
already know isEntitled() was false inside it. Remove the redundant
call and expand the docstring to say explicitly that isProUser
carries the Convex entitlement signal. Keeps hasPremiumAccess as a
thin union of signals that aren't already covered by isProUser
(WORLDMONITOR_API_KEY desktop secret + the passed-in authState.role
which could in principle differ from getAuthState()). |
||
|
|
01c607c27c |
fix(brief): compose magazine stories from digest accumulator, not news:insights (#3168)
Root cause of the "email digest lists 30 critical events, brief shows
2 random Bellingcat stories" mismatch reported today: the email and
the brief read from two unrelated Redis keys.
email digest -> digest:accumulator:v1:{variant}:{lang}
live per-variant ZSET of 30+ ingested stories,
hydrated from story:track:v1:{hash} + sources.
written by list-feed-digest on every ingest cycle.
brief -> news:insights:v1
global 8-story summary written by seed-insights.
After sensitivity=critical filter only 2 survive.
A completely different pool on a different cadence.
The brief was shipping from the wrong source, so a user who had just
read "UNICEF / Hormuz / Rohingya" in their email would open their
brief and see unrelated Bellingcat pieces.
Fix: brief now composes from the same digest accumulator the email
reads. scripts/lib/brief-compose.mjs exposes a new
composeBriefFromDigestStories(rule, digestStories, insightsNumbers,
{nowMs}) that maps the digest story shape ({hash, title, severity,
sources[], ...}) through a small adapter into the upstream brief-
filter shape, applies the user's sensitivity gate, and assembles the
envelope. news:insights:v1 is still read — but only for the
clusters / multi-source counters on the stats page. A failed
insights fetch now returns zeroed stats instead of aborting brief
composition, because the stories (not the numbers) are what matter.
seed-digest-notifications:
- composeBriefsForRun now calls buildDigest(candidate, windowStart)
per rule instead of using a single global pool
- memoizes buildDigest by (variant, lang, windowStart) to keep the
per-user loop from issuing N identical ZRANGE+HGETALL round-trips
- BRIEF_STORY_WINDOW_MS = 24h — a weekly-cadence user still expects
a fresh brief in the dashboard every day, independent of email
cadence
- composeBriefForRule kept as @deprecated so tests that stub
news:insights directly don't break; all live traffic uses the
digest path
Tests: new tests/brief-from-digest-stories.test.mjs (12 cases) locks
the mapping — empty input, source selection, sensitivity pass/drop,
12-story cap, moderate→medium severity aliasing, category/country
defaults, stats-number passthrough, determinism.
122/122 brief tests pass; both tsconfigs typecheck clean.
Operator note: today's wrong brief at brief:user_...:2026-04-18 was
already DELed manually. The next cron tick under this code composes
a correct one from the same pool the email used.
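The memoization described above can be sketched as below. Shapes are hypothetical — buildDigest's real signature and its ZRANGE+HGETALL round-trips are stubbed; only the (variant, lang, windowStart) key idea comes from the commit:

```javascript
// Memoize an expensive per-(variant, lang, window) build so the
// per-user loop reuses one fetch instead of N identical round-trips.
function makeMemoizedBuildDigest(buildDigest) {
  const cache = new Map();
  return (variant, lang, windowStart) => {
    const key = `${variant}:${lang}:${windowStart}`;
    // Cache the promise itself, so concurrent callers share one fetch.
    if (!cache.has(key)) cache.set(key, buildDigest(variant, lang, windowStart));
    return cache.get(key);
  };
}

let calls = 0;
const build = makeMemoizedBuildDigest(async (v, l, w) => { calls++; return [`${v}-${l}-${w}`]; });
build('full', 'en', 1000);
build('full', 'en', 1000); // memo hit — no second round-trip
build('tech', 'en', 1000); // different variant -> real call
console.log(calls); // 2
```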
|
||
|
|
e98df6f694 |
fix(brief): clerk-pro-only gate + generation-guarded token cache (#3166)
Two P1 race/gate findings from post-merge review of #3160.
Finding 1 — mixed-auth path still rendered as unlocked.
hasPremiumAccess() returns true for desktop API key / browser tester
keys even when the signed-in Clerk account is FREE. The brief is
stored at brief:{clerkUserId}:{date} in Redis — without a Clerk PRO
user there is nothing to fetch, and /api/latest-brief returns 403.
Relying on the panel's inline role check only inverts the UX: the
panel "unlocks", then paints an upgrade CTA inside an unlocked body.
Fix: WEB_CLERK_PRO_ONLY_PANELS in panel-layout.ts. When a panel is
in this set AND the Clerk role is not 'pro', the layout downgrades
the gate reason from NONE to FREE_TIER (or ANONYMOUS when no Clerk
user at all). The panel now shows the same locked overlay an actual
free user sees — consistent, and no doomed fetch.
Finding 2 — clearClerkTokenCache() didn't invalidate a mid-flight
fetch. Nulling _cachedToken and _tokenInflight does not cancel a
promise that is already awaiting session.getToken(). When that
promise resolves it unconditionally writes _cachedToken = tokenA and
returns tokenA to its already-awaiting callers, re-poisoning the
cache for 50s and silently shipping A's JWT into B's session. The
panel's post-response UI-user check catches the direct
A-during-fetch race, but a panel that starts a fresh refresh AFTER
the switch can still get A's token back from the stale inflight.
Fix: monotonic _tokenGen counter. Bumped by clearClerkTokenCache()
and signOut(). Each getClerkToken() captures myGen on entry; if the
generation has advanced by the time the JWT arrives, the result is
dropped on the floor (no cache write, no return value). The finally
block also guards the _tokenInflight null-out so a newer
generation's inflight isn't clobbered.
Typecheck clean on both tsconfigs. 94/94 brief + deploy tests green. |
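The monotonic-generation guard from Finding 2 can be sketched as below. The names _tokenGen, getClerkToken, and clearClerkTokenCache come from the commit; makeTokenCache and the fetchFn indirection are illustrative stand-ins for the real Clerk session call:

```javascript
// Hedged sketch of the generation-guarded token cache.
function makeTokenCache(fetchFn) {
  let gen = 0;          // monotonic generation counter (_tokenGen)
  let cached = null;
  let inflight = null;
  return {
    clear() { gen++; cached = null; inflight = null; },
    async getToken() {
      if (cached) return cached;
      const myGen = gen;                 // capture generation on entry
      if (!inflight) inflight = fetchFn();
      const myInflight = inflight;
      try {
        const token = await myInflight;
        if (gen !== myGen) return null;  // stale: drop, no cache write
        cached = token;
        return token;
      } finally {
        // Don't clobber a newer generation's inflight promise.
        if (gen === myGen && inflight === myInflight) inflight = null;
      }
    },
  };
}

// Simulate: user A's fetch resolves only AFTER the switch to user B.
let resolveA;
const cache = makeTokenCache(() => new Promise((r) => { resolveA = r; }));
const p = cache.getToken(); // in-flight with A's session
cache.clear();              // account switch: generation bumps
resolveA('jwt-A');          // A's JWT finally arrives
p.then((t) => console.log(t)); // null — A's token dropped, never cached
```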
||
|
|
8684e5a398 |
fix(brief): per-route CSP override so magazine swipe/arrow nav runs (#3165)
* fix(brief): per-route CSP override so magazine swipe/arrow nav runs
The global CSP at /((?!docs).*) allow-lists only four SHA-256 hashes
for inline scripts (the app's own index.html scripts).
brief-render.js emits its swipe/arrow/wheel/touch nav as a
deterministic inline IIFE with a different hash, so the browser
silently blocked it. The deck rendered, pages were present, dots
were drawn — but nothing advanced.
Fix mirrors the existing /api/slack/oauth/callback and
/api/discord/oauth/callback precedent: a per-route
Content-Security-Policy header for /api/brief/(.*) that relaxes
script-src to 'unsafe-inline'. Everything else is tight:
- default-src 'self'
- connect-src 'self' (no outbound network)
- object-src 'none', form-action 'none'
- frame-ancestors pinned to worldmonitor domains
- style-src keeps Google Fonts; font-src keeps gstatic
- script-src keeps Cloudflare Insights beacon (auto-injected)
'unsafe-inline' is safe here because server/_shared/brief-render.js
HTML-escapes all Redis-sourced content via escapeHtml over [&<>"'].
No user-controlled string reaches the DOM unescaped.
Verified: all 17 tests/deploy-config.test.mjs security-header
assertions still pass (they target the catch-all route, untouched).
* fix(brief): un-block Cloudflare Insights beacon + add CSP test coverage
Two P2 follow-ups from Greptile review on #3165.
1. connect-src was 'self' only — the Cloudflare Insights beacon
script loaded (script-src allowed static.cloudflareinsights.com) but
its outbound POST to https://cloudflareinsights.com/cdn-cgi/rum was
silently blocked. Analytics for brief-page traffic was dropped with
no console error. Added https://cloudflareinsights.com to
connect-src so the beacon can ship its payload.
2. tests/deploy-config.test.mjs had 17 assertions for the catch-all
CSP but nothing for the new /api/brief/(.*) override. Any future
edit — or accidental deletion — of the rule would land without a
red test.
Added a 4-test suite covering:
- rule exists with a CSP header
- script-src allows 'unsafe-inline' (the whole point)
- connect-src allows cloudflareinsights.com (this fix)
- tight non-script defaults still present (default-src 'self',
object-src 'none', form-action 'none', base-uri 'self')
21/21 deploy-config assertions pass locally. |
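The per-route override might look roughly like this, assuming a vercel.json-style headers rule. The directive list is reconstructed from the commit message; the exact domains, the frame-ancestors value, and the Retry-After-style specifics of the real config may differ:

```javascript
// Illustrative CSP rule for /api/brief/(.*) — directives per the
// commit; exact allow-listed hosts in production may differ.
const briefCsp = [
  "default-src 'self'",
  // relaxed ONLY for the deterministic inline nav IIFE:
  "script-src 'self' 'unsafe-inline' https://static.cloudflareinsights.com",
  // beacon's outbound POST target (the follow-up fix):
  "connect-src 'self' https://cloudflareinsights.com",
  "style-src 'self' 'unsafe-inline' https://fonts.googleapis.com",
  "font-src 'self' https://fonts.gstatic.com",
  "object-src 'none'",
  "form-action 'none'",
  "base-uri 'self'",
  "frame-ancestors https://worldmonitor.app", // illustrative domain pin
].join('; ');

const rule = {
  source: '/api/brief/(.*)',
  headers: [{ key: 'Content-Security-Policy', value: briefCsp }],
};
console.log(rule.headers[0].value.includes("'unsafe-inline'")); // true
```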
||
|
|
bc91c61a87 |
[codex] guard duplicate subscription checkout (#3162)
* guard duplicate subscription checkout * address checkout guard review feedback |
||
|
|
c49c2f80f6 |
fix(pro): reliable post-payment activation (#3163)
* fix(pro): reliable post-payment activation (transition reload + auth wait + overlay-success reload)
Fixes a silent race where paying users saw locked panels after a
successful Dodo checkout and concluded PRO hadn't activated.
Incident 2026-04-17/18: one customer purchased Pro Monthly twice
within 32 min on Google Pay then Credit Card because the first
charge showed no UI change; the duplicate was refunded by Dodo with
reason "Duplicate transaction". Server path (webhook -> Convex
entitlements row) was verified correct end to end: all 9 webhook
events processed, entitlement row written within seconds,
planKey=pro_monthly. The bug was client-side in three places.
1. panel-layout.ts replaced skipInitialSnapshot with a free->pro
transition detector (shouldReloadOnEntitlementChange). The prior
guard swallowed the first pro snapshot unconditionally, which
collapsed two distinct cases: (a) legacy-pro user on normal page
load (correctly no reload) and (b) free user whose post-payment pro
snapshot arrives after panels rendered against free-tier gating
(should reload). The transition detector distinguishes them by
remembering the last observed entitlement.
2. entitlements.ts awaits waitForConvexAuth(10_000) before calling
client.onUpdate. Mirrors the pattern already used in api-keys.ts
and App.ts claimSubscription path. Eliminates the spurious
FREE_TIER_DEFAULTS first snapshot from unauthenticated cold
sessions that the transition detector would otherwise treat as the
baseline.
3. checkout.ts on Dodo overlay checkout.status=succeeded schedules
a window.location.reload() after 3s (median webhook latency <5s
observed in prod). Belt-and-braces: guarantees the post-payment
state is fresh even if the WS subscription is slow or the
transition detector misses the edge for any reason.
Unit tests in tests/entitlement-transition.test.mts cover all six
(last, next) combinations plus the full incident-simulation
sequence (null -> false -> true -> true => exactly one reload) and
the legacy-pro reconnect sequence (null -> true -> true -> true =>
zero reloads).
Out of scope (tracked separately): server-side
duplicate-subscription guard in _createCheckoutSession.
* fix(pro): seed lastEntitled=false on redirect-return from checkout
Addresses a gap in the original PR: the transition detector still
swallowed the first pro snapshot when the user came back via Dodo's
full-page redirect flow (/pro page -> Dodo checkout -> return to
worldmonitor.app with ?subscription_id=...&status=active URL params
handled by handleCheckoutReturn). On that path a fast webhook can
land before the browser finishes the return navigation. When the
dashboard boots, Convex's first entitlement snapshot already
carries pro_monthly — which the detector treats as the "legacy-pro
on normal page load" case and does not reload. Panels rendered
against free-tier gating stay locked until manual refresh.
Fix: when handleCheckoutReturn() returns true, seed
lastEntitled=false instead of null. This biases the detector to
treat the first pro snapshot as the true free->pro transition that
it is, not a legacy-pro baseline.
Adds two new unit tests covering both redirect-return timings
(webhook already landed; webhook still pending). Full transition
suite is now 10/10 passing.
* fix(pro): seed lastEntitled=false across the overlay reload too
Prior amendment covered the full-page Dodo redirect return (URL
carries subscription_id params consumed by handleCheckoutReturn).
But the overlay success path does its own
setTimeout(() => window.location.reload(), 3_000) and the overlay
uses manualRedirect:true, so the reload lands at the original URL
with no params.
handleCheckoutReturn returns false there, returnedFromCheckout
stays false, lastEntitled seeds to null, and a fast webhook's
first-snapshot pro entitlement gets swallowed as a legacy-pro
baseline — the same class of bug that caused the 2026-04-17/18
incident, now reproducible on the overlay path instead of the
redirect path.
Fix: before the scheduled reload, set a session flag
(wm-post-checkout). On the reloaded page, panel-layout consumes the
flag and treats it as a post-checkout return, which makes the
transition detector seed lastEntitled=false and correctly route the
first pro snapshot through the reload. Session storage is used (not
local) so the flag is scoped to the tab and doesn't leak across
sessions. Silent try/catch keeps private-browsing environments
working — in that case we fall back to the pre-flag behavior (risk
bounded by the 3s reload + Convex WS catching up, same as before). |
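The transition detector and the seeding trick above can be sketched in a few lines. shouldReloadOnEntitlementChange is named in the commit; its real signature is assumed here:

```javascript
// Reload exactly on the observed free -> pro edge.
// last: null (no snapshot yet) | false (free) | true (pro)
function shouldReloadOnEntitlementChange(last, next) {
  return last === false && next === true;
}

// Incident simulation: null -> false -> true -> true => one reload.
let last = null;
let reloads = 0;
for (const next of [false, true, true]) {
  if (shouldReloadOnEntitlementChange(last, next)) reloads++;
  last = next;
}
console.log(reloads); // 1

// Legacy-pro reconnect (null -> true -> true) never reloads — which
// is exactly why a post-checkout return must seed last = false, so
// the first pro snapshot registers as a real free -> pro edge.
```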
||
|
|
5673bc6c16 |
fix(brief): sign-in-required state + composing auto-refresh (#3160)
* fix(brief): sign-in-required state + composing poll + visibility refresh
Addresses two P1 review findings after PR #3159 merged.
1. hasPremiumAccess unlocks for desktop WORLDMONITOR_API_KEY and
browser tester keys, but /api/latest-brief is Clerk-userId
scoped — it returns 401 for those paths. Previously the panel
unlocked then showed a retry-loop error for API-key-only users.
Now: if authState.user is missing, render a dedicated "Sign in
to view your brief" CTA inline (no endpoint hit). Subscribes to
subscribeAuthState; when the user signs in mid-session, refresh()
fires automatically.
2. composing state never auto-refreshed. Fix: every renderComposing
schedules a 60s setTimeout re-poll; cleared on ready/error/lock/
destroy. Also added a document visibilitychange listener that
triggers refresh when the tab comes back into focus. Covers the
"brief composed while tab backgrounded" case without reload.
Added matching destroy() override to clean up timeout + auth
subscription + visibility listener.
Typecheck + biome lint clean.
* fix(brief): direct Clerk fetch + sign-out clear + account-switch guard
Addresses three P1 review findings on PR #3160.
1. Desktop API key + Clerk Pro couldn't load the panel.
premiumFetch hard-stops on WORLDMONITOR_API_KEY and never sends
Clerk Bearer. /api/latest-brief is Bearer-only so every desktop
request 401'd even for a signed-in Pro user. Fix: fetchLatest
bypasses premiumFetch entirely — imports getClerkToken() and
builds the request with Authorization: Bearer directly. The
user-scoped pre-check already guaranteed we have a Clerk user.
2. Sign-out left the previous user's brief on screen. The auth
subscription only triggered refresh on truthy nextId, and
hasPremiumAccess stays true on desktop/tester keys so the
layout-level updatePanelGating doesn't re-lock us. Fix: the
subscription now handles ALL three transitions explicitly:
null → id → refresh (sign-in)
idA → idB → abort in-flight + refresh (account switch)
id → null → abort in-flight + renderSignInRequired (sign-out)
Clears composing poll and inflight fetch on every transition.
3. Clerk account switch could paint user A's brief in user B's
session. getClerkToken caches the JWT for 50s, so a fast A→B
switch during an in-flight fetch would hit the server with A's
Bearer, return A's brief, and the post-response guard (which
only checked "still premium") would let it paint in B's UI.
Fix: refresh() captures requestUserId at fetch-start and the
post-response + error branches re-verify that the current
authState.user.id still equals requestUserId before any
this.content mutation. A transient account switch silently
drops the stale response.
Typecheck + biome lint clean.
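The three-way transition handling above can be sketched as follows. The handler names (refresh, abort in-flight, render sign-in CTA) mirror the commit text; makeAuthTransitionHandler and its wiring are illustrative:

```javascript
// Sketch: route each observed user-id transition explicitly.
function makeAuthTransitionHandler({ refresh, abortInflight, renderSignInRequired }) {
  let lastUserId = null;
  return (nextId) => {
    if (lastUserId === nextId) return;         // no transition
    if (lastUserId === null && nextId) {
      refresh();                               // sign-in: null -> id
    } else if (lastUserId && nextId) {
      abortInflight(); refresh();              // account switch: idA -> idB
    } else {
      abortInflight(); renderSignInRequired(); // sign-out: id -> null
    }
    lastUserId = nextId;
  };
}

const log = [];
const onAuth = makeAuthTransitionHandler({
  refresh: () => log.push('refresh'),
  abortInflight: () => log.push('abort'),
  renderSignInRequired: () => log.push('sign-in-cta'),
});
onAuth('userA'); // sign-in
onAuth('userB'); // switch
onAuth(null);    // sign-out
console.log(log.join(',')); // refresh,abort,refresh,abort,sign-in-cta
```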
* fix(brief): clear Clerk token cache on user-id transition
Closes the remaining account-switch race the previous commit
couldn't cover from inside the panel. Detail:
My previous post-response userId check compared the CURRENT
authState.user.id to the requestUserId captured at refresh start.
For a fast A→B switch:
- Subscription handler fires, lastUserId = B
- refresh() captures requestUserId = B
- fetchLatest calls getClerkToken() → returns A's cached JWT
(50s TTL, keyed by time not user)
- Server /api/latest-brief decodes A's sub → returns A's brief
- Post-check: currentUserId (B) === requestUserId (B) → paint
- Result: user A's greeting + signed magazineUrl painted in
user B's session
Fix: clear the Clerk token cache via the existing
clearClerkTokenCache() export from clerk.ts on every observed
user-id transition. Next getClerkToken() re-fetches from
Clerk.session — bound to the CURRENT session, not the previous
one. Server now receives B's token, returns B's brief.
Covers sign-in (A=null→B), account switch (A→B), and sign-out
(B→null) symmetrically. The (B→null) branch also short-circuits
to renderSignInRequired which was already in place.
Findings 1 (sign-out clear) and 3 (desktop API key) from the
latest review were already resolved in commit
|
||
|
|
64c906a406 |
feat(eia): gold-standard /api/eia/petroleum (Railway seed → Redis → Vercel reads only) (#3161)
* feat(eia): move /api/eia/petroleum to gold-standard (Railway seed → Redis → Vercel reads only)
Live api.eia.gov fetches from the Vercel edge function were causing
FUNCTION_INVOCATION_TIMEOUT 504s on /api/eia/petroleum (Sydney edge →
US origin with no timeout, no cache, no stale fallback — one EIA blip
blew the 25s budget).
- New seeder scripts/seed-eia-petroleum.mjs — fetches WTI/Brent/
production/inventory from api.eia.gov with per-fetch 15s timeouts,
writes energy:eia-petroleum:v1 with the {_seed, data} envelope.
Accepts 1-of-4 series; 0-of-4 routes to contract-mode RETRY so
seed-meta stays stale and the bundle retries on next cron.
- Bundled into seed-bundle-energy-sources.mjs (daily, 90s timeout) —
no new Railway service needed.
- Rewrote api/eia/[[...path]].js as a Redis-only reader via
readJsonFromUpstash. Same response shape for backward compat with
widgets/MCP/external callers. 503 + Retry-After on miss (never 504).
- Registered eiaPetroleum in api/health.js STANDALONE_KEYS + gated as
ON_DEMAND_KEYS for the deploy window; promote to SEED_META
(maxStaleMin: 4320) in a follow-up after ~7 days of clean cron.
- Tests: 14 seeder unit tests + 9 edge handler tests.
Audit result: /api/eia/petroleum was the only Vercel route fetching
dashboard data live. Every other fetch(https://…) in api/ is
auth/payments/notifications/user-initiated enrichment.
* fix(eia): close silent-stale window — add SEED_META + seed-health registration
Review finding on PR #3161: without a SEED_META entry, readSeedMeta
returns seedStale: null and classifyKey never reaches STALE_SEED.
That meant a broken Railway cron or missing EIA_API_KEY after the first
successful seed would keep /api/eia/petroleum serving stale data for
up to 7 days (TTL) while /api/health reported OK.
- api/health.js: add SEED_META.eiaPetroleum with maxStaleMin=4320
(72h = 3× daily bundle cadence). Keep eiaPetroleum in ON_DEMAND_KEYS
so the Vercel-instant / Railway-delayed deploy window doesn't CRIT
on first seed, but stale-after-seed now properly fires STALE_SEED.
- api/seed-health.js: register energy:eia-petroleum in SEED_DOMAINS
(intervalMin=1440) so the secondary health endpoint reports it too.
- Updated ON_DEMAND_KEYS comment to reflect freshness is now enforced.
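The Redis-only read path above — serve the seeded envelope, return 503 + Retry-After on a miss, never fetch the upstream live — can be sketched like this. readJsonFromUpstash and the energy:eia-petroleum:v1 key are from the commit; the handler shape and the Retry-After value are simplified assumptions:

```javascript
// Sketch of a seed-backed reader: Redis hit or 503, never a live fetch.
async function handleEiaPetroleum(readJson) {
  const envelope = await readJson('energy:eia-petroleum:v1');
  if (!envelope || !envelope.data) {
    return {
      status: 503,                           // miss: retryable, never a 504
      headers: { 'Retry-After': '300' },     // value illustrative
      body: { error: 'not seeded yet' },
    };
  }
  // Same response shape as before for widgets/MCP/external callers.
  return { status: 200, headers: {}, body: envelope.data };
}

Promise.all([
  handleEiaPetroleum(async () => ({ _seed: { at: 1 }, data: { wti: 82.1 } })),
  handleEiaPetroleum(async () => null),
]).then(([hit, miss]) => console.log(hit.status, miss.status)); // 200 503
```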
|
||
|
|
fd419bcfae |
feat(brief): dashboard Latest Brief panel (Phase 4) (#3159)
* feat(brief): dashboard "Latest Brief" panel (Phase 4)
New PRO-gated panel that reads /api/latest-brief and renders today's
brief with a cover-style thumbnail + greeting + thread count + CTA.
Clicking opens the signed magazine URL in a new tab. Base Panel
class handles the PRO overlay (ANONYMOUS/FREE_TIER) via
premium: 'locked' — no story content, headline, or greeting leaks
through DOM on the locked state.
Three render states:
- ready → cover card + "Read brief →" CTA linking to magazineUrl
- composing → neutral empty state ("Your brief is composing.")
- error → base showError() with retry
Files:
- src/components/LatestBriefPanel.ts — new Panel subclass, self-
fetching via premiumFetch (handles Clerk Bearer + X-WorldMonitor-
Key tester keys + api key fallback)
- src/components/index.ts — export the new panel
- src/app/panel-layout.ts — createPanel('latest-brief', ...)
- src/config/panels.ts — registry entry (priority 1 so it sorts up
front across all variant registries)
- src/styles/panels.css — cover-card + meta-strip styles using the
same e-ink palette as the magazine (sienna kicker, bone text on
ink cover, serif greeting)
Self-contained: no Convex migration, no new env vars, no backend
changes. Reads the /api/latest-brief endpoint already shipped in
Phase 2 (#3153 merged). Lands independently of Phase 3b / 5 / 6 / 8.
Follow-ups (not in this PR):
- CMD+K entry for "Open Latest Brief" — locale strings + commands
registry, trivial.
- Localisation of panel title + copy strings.
- Share button (todo 223).
Typecheck clean, lint clean on the new file.
* fix(brief): register latest-brief in both premium gate registries
Addresses the review finding that the Panel base class's
`premium: 'locked'` flag is NOT what actually enforces PRO gating in
the app. Two separate registries do:
1. WEB_PREMIUM_PANELS in src/app/panel-layout.ts — the set
updatePanelGating() iterates on every auth-state change to decide
which panels to lock with a CTA overlay. Panels not in this set
get `reason === NONE` and are always unlocked for whoever's
viewing them, regardless of the Panel constructor flag.
2. The `premium:` property on each entry in src/config/panels.ts —
which isPanelEntitled() checks to decide whether a panel is
premium at all.
`latest-brief` was missing from both. Result for anonymous/free
users: the panel mounted, self-fetched /api/latest-brief, got 401
or 403, and showed raw error UI instead of the intended "Upgrade to
Pro" overlay. Also: a PRO user who downgraded mid-session would
retain the rendered brief because updatePanelGating() wouldn't
re-lock them.
Fixes:
- src/app/panel-layout.ts — add 'latest-brief' to WEB_PREMIUM_PANELS
so updatePanelGating() locks the panel correctly for non-PRO users
and RE-locks it on a mid-session downgrade.
- src/config/panels.ts — add `premium: 'locked' as const` to all
four registry entries (full, finance, tech, happy variants) so
isPanelEntitled() treats it as premium everywhere.
- src/components/LatestBriefPanel.ts — guard refresh() against
running without premium access. Belt-and-suspenders against race
conditions where the panel mounts before updatePanelGating()
completes, and against mid-session downgrade where the panel
stays mounted but should stop hitting the endpoint. Uses the
same hasPremiumAccess(getAuthState()) check as the gating
infrastructure itself.
Typecheck + biome lint clean.
* fix(brief): SVG logo now actually renders + queue concurrent refresh
Addresses two P1 + one P2 from Greptile on PR #3159.
1. P1 (line 147 + line 167): `h('div', { innerHTML: ... })` silently
did nothing. src/utils/dom-utils.ts applyProps has no special
case for `innerHTML` — it falls through to
`el.setAttribute('innerHTML', svgString)` which just sets a
literal DOM attribute. Both logo containers rendered empty.
Switched to:
const logo = h('div', { className: '...' });
logo.appendChild(rawHtml(WM_LOGO_SVG));
rawHtml() exists in dom-utils for exactly this case; returns a
parsed DocumentFragment.
2. P2: Concurrent refresh() was silently dropped. Added a
refreshQueued flag so a second refresh during an in-flight one
queues a single follow-up pass instead of disappearing. Now a
retry-after-error or a downstream caller that triggers refresh
while another is mid-fetch always sees its intent applied.
Typecheck + biome lint clean.
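The queued-refresh pattern from item 2 can be sketched as below — a refresh() arriving while one is in flight queues exactly one follow-up pass instead of being dropped. makeRefresher and doFetch are illustrative stand-ins for the panel's real fetch path:

```javascript
// Sketch: collapse concurrent refresh() calls into one queued pass.
function makeRefresher(doFetch) {
  let inflight = false;
  let queued = false;
  async function refresh() {
    if (inflight) { queued = true; return; } // remember intent, once
    inflight = true;
    try {
      await doFetch();
    } finally {
      inflight = false;
      if (queued) { queued = false; refresh(); } // single follow-up
    }
  }
  return refresh;
}

let fetches = 0;
const resolvers = [];
const refresh = makeRefresher(() => {
  fetches++;
  return new Promise((r) => resolvers.push(r));
});
refresh();            // fetch #1 starts (fetches === 1)
refresh(); refresh(); // both collapse into ONE queued follow-up
resolvers[0]();       // #1 completes -> follow-up (#2) starts next tick
// fetches settles at 2: the two extra calls produced one pass, not zero.
```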
* fix(brief): close downgrade-leak + blank-on-upgrade races on panel
Addresses two P1 findings on PR #3159.
1. In-flight fetch leaked premium content after downgrade.
refresh() checked entitlement only BEFORE await premiumFetch.
If auth flipped during the fetch, updatePanelGating had already
replaced this.content with the locked CTA, but renderReady/
renderComposing then overwrote it with brief content. Fixed with
a three-gate sequence + fetch abort:
(a) Pre-fetch check: gate + hasPremiumAccess — unchanged.
(b) In-flight abort: override showGatedCta() to abort()
the AbortController before super() paints the locked CTA.
renderReady/renderComposing never even runs.
(c) Post-response re-check: re-verify this.gateLocked +
hasPremiumAccess before any this.content mutation. Catches
the tight window where abort() lost the race or where an
error-handler path could still paint brief-ish UI.
All three are needed — a user can sign out between any two of
them; removing any one leaves a real leakage window.
2. Upgrade → blank panel.
unlockPanel() base-class behaviour clears the locked content and
leaves the content element empty. No refresh was triggered on
the free/anon → PRO transition, so the panel stayed blank until
page reload. Overrode unlockPanel() to detect the wasLocked
transition and call refresh() after re-rendering the loading
state.
Also tracks gateLocked as a local mirror of the base's private
_locked, since Panel doesn't expose a getter. Synced via the two
override sites above; used in the three-gate checks.
Typecheck + biome lint clean.
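The three-gate + abort sequence above can be sketched with the entitlement check and render targets stubbed out. makeGatedPanel is hypothetical; the gates (a)/(b)/(c) map directly to the commit's description:

```javascript
// Sketch of the downgrade-leak guard: pre-check, in-flight abort,
// post-response re-check.
function makeGatedPanel({ hasAccess, render }) {
  let controller = null;
  return {
    showGatedCta() {
      if (controller) controller.abort(); // gate (b): kill in-flight fetch
      render('locked-cta');
    },
    async refresh(fetchBrief) {
      if (!hasAccess()) return;           // gate (a): pre-fetch check
      controller = new AbortController();
      try {
        const brief = await fetchBrief(controller.signal);
        if (controller.signal.aborted) return; // lost race to showGatedCta
        if (!hasAccess()) return;         // gate (c): post-response re-check
        render(brief);
      } catch {
        if (!controller.signal.aborted) render('error');
      }
    },
  };
}

// Simulate a downgrade while the fetch is in flight.
let access = true;
const painted = [];
const panel = makeGatedPanel({ hasAccess: () => access, render: (x) => painted.push(x) });
let resolveFetch;
const p = panel.refresh(() => new Promise((r) => { resolveFetch = r; }));
access = false;
panel.showGatedCta();       // downgrade: abort + paint locked CTA
resolveFetch('brief-html'); // late response dropped by gates (b)/(c)
p.then(() => console.log(painted.join(','))); // locked-cta
```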
|
||
|
|
711636c7b6 |
feat(brief): consolidate composer into digest cron (retire standalone service) (#3157)
* feat(brief): consolidate composer into digest cron (retire standalone service)
Merges the Phase 3a standalone Railway composer into the existing
digest cron. End state: one cron (seed-digest-notifications.mjs)
writes brief:{userId}:{issueDate} for every eligible user AND
dispatches the digest to their configured channels with a signed
magazine URL appended. Net -1 Railway service.
User's architectural note: "there is no reason to have 1 digest
preparing all and sending, then another doing a duplicate". This
delivers that — infrastructure consolidation, same send cadence,
single source of truth for brief envelopes.
File moves / deletes:
- scripts/seed-brief-composer.mjs → scripts/lib/brief-compose.mjs
Pure-helpers library: no main(), no env guards, no cron. Exports
composeBriefForRule + groupEligibleRulesByUser + dedupeRulesByUser
(shim) + shouldExitNonZero + date helpers + extractInsights.
- Dockerfile.seed-brief-composer → deleted.
- The seed-brief-composer Railway service is retired (user confirmed
they would delete it manually).
New files:
- scripts/lib/brief-url-sign.mjs — plain .mjs port of the sign path
in server/_shared/brief-url.ts (Web Crypto only, no node:crypto).
- tests/brief-url-sign.test.mjs — parity tests that confirm tokens
minted by the scripts-side signer verify via the edge-side verifier
and produce byte-identical output for identical input.
Digest cron (scripts/seed-digest-notifications.mjs):
- Reads news:insights:v1 once per run, composes per-user brief
envelopes, SETEX brief:{userId}:{issueDate} via body-POST pipeline.
- Signs magazine URL per user (BRIEF_URL_SIGNING_SECRET +
WORLDMONITOR_PUBLIC_BASE_URL new env requirements, see pre-merge).
- Injects magazineUrl into buildChannelBodies for every channel
(email, telegram, slack, discord) as a "📖 Open your WorldMonitor
Brief magazine" footer CTA.
- Email HTML gets a dedicated data-brief-cta-slot near the top of
the body with a styled button.
- Compose failures NEVER block the digest send — the digest cron's
existing behaviour is preserved when the brief pipeline has issues.
- Brief compose extracted to its own functions (composeBriefsForRun
+ composeAndStoreBriefForUser) to keep main's biome complexity at
baseline (64 — was 63 before; inline would have pushed to 117).
Tests: 98/98 across the brief suite. New parity tests confirm cross-
module signer agreement.
PRE-MERGE: add BRIEF_URL_SIGNING_SECRET and WORLDMONITOR_PUBLIC_BASE_URL
to the digest-notifications Railway service env (same values already
set on Vercel for Phase 2). Without them, brief compose is auto-
disabled and the digest falls back to its current behaviour — safe to
deploy before env is set.
* fix(brief): digest Dockerfile + propagate compose failure to exit code
Addresses two seventh-round review findings on PR #3157.
1. Cross-directory imports + current Railway build root (todo 230).
The consolidated digest cron imports from ../api, ../shared, and
(transitively via scripts/lib/brief-compose.mjs) ../server/_shared.
The running digest-notifications Railway service builds from the
scripts/ root — those parent paths are outside the deploy tree
and would 500 on next rebuild with ERR_MODULE_NOT_FOUND.
New Dockerfile.digest-notifications (repo-root build context)
COPYs exactly the modules the cron needs: scripts/ contents,
scripts/lib/, shared/brief-envelope.*, shared/brief-filter.*,
server/_shared/brief-render.*, api/_upstash-json.js,
api/_seed-envelope.js. Tight list to keep the watch surface small.
Pattern matches the retired Dockerfile.seed-brief-composer + the
existing Dockerfile.relay.
2. Silent compose failures (todo 231). composeBriefsForRun logged
counters but never exited non-zero. An Upstash outage or missing
signing secret silently dropped every brief write while Railway
showed the cron green. The retired standalone composer exited 1
on structural failures; that observability was lost in the
consolidation.
Changed the compose fn to return {briefByUser, composeSuccess,
composeFailed}. Main captures the counters, runs the full digest
send loop first (compose-layer breakage must NEVER block user-
visible digest delivery), then calls shouldExitNonZero at the
very end. Exit-on-failure gives ops the Railway-red signal
without touching send behaviour.
Also: a total read failure of news:insights:v1 (catch branch)
now counts as 1 compose failure so the gate trips on insights-
key infra breakage, not just per-user write failures.
Tests unchanged (98/98). Typecheck + node --check clean. Biome
complexity ticks 63→65 — same pre-existing bucket, already tolerated
by CI; no new blocker.
PRE-MERGE Railway work still pending: set BRIEF_URL_SIGNING_SECRET
+ WORLDMONITOR_PUBLIC_BASE_URL on the digest-notifications service,
AND switch its dockerfilePath to /Dockerfile.digest-notifications
before merging. Without the dockerfilePath switch, the next rebuild
fails.
* fix(brief): Dockerfile type:module + explicit missing-secret tripwire
Addresses two eighth-round review findings on PR #3157.
1. ESM .js files parse as CommonJS in the container (todo 232).
Dockerfile.digest-notifications COPYs shared/*.js,
server/_shared/*.js, api/*.js — all ESM because the repo-root
package.json has "type":"module". But the image never copies the
root package.json, so Node's nearest-pjson walk inside /app/
reaches / without finding one and defaults to CommonJS. First
`export` statement throws `SyntaxError: Unexpected token 'export'`
at startup.
Fix: write a minimal /app/package.json with {"type":"module"}
early in the build. Avoids dragging the full root package.json
into the image while still giving Node the ESM hint it needs for
repo-owned .js files.
2. Missing BRIEF_URL_SIGNING_SECRET silently tolerated (todo 233).
The old gate folded "operator-disabled" (BRIEF_COMPOSE_ENABLED=0)
and "required secret missing in rollout" into the same boolean
via AND. A production deploy that forgot the env var would skip
brief compose without any failure signal — Railway green, no
briefs, no CTA in digests, nobody notices.
Split the two states: BRIEF_COMPOSE_DISABLED_BY_OPERATOR (explicit
kill switch, silent) and BRIEF_SIGNING_SECRET_MISSING (the misconfig
we care about). When the secret is missing without the operator
flag, composeBriefsForRun returns composeFailed=1 on first call
so the end-of-run exit gate trips and Railway flags the run red.
Digest send still proceeds — compose-layer issues never block
notifications.
Tests: 98/98. Syntax + node --check clean.
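The split gate from item 2 can be sketched as follows. The env var names are from the commit; classifyComposeGate, the return shape, and the operator-flag spelling are illustrative assumptions:

```javascript
// Sketch: operator kill switch is silent; a missing secret counts
// as one compose failure so the end-of-run exit gate trips.
function classifyComposeGate(env) {
  if (env.BRIEF_COMPOSE_ENABLED === '0') {
    return { run: false, composeFailed: 0, reason: 'disabled-by-operator' };
  }
  if (!env.BRIEF_URL_SIGNING_SECRET) {
    // Misconfig, not opt-out: fail loudly so Railway flags the run red.
    return { run: false, composeFailed: 1, reason: 'signing-secret-missing' };
  }
  return { run: true, composeFailed: 0, reason: 'ok' };
}

console.log(classifyComposeGate({ BRIEF_COMPOSE_ENABLED: '0' }).reason); // disabled-by-operator
console.log(classifyComposeGate({}).composeFailed);                      // 1
console.log(classifyComposeGate({ BRIEF_URL_SIGNING_SECRET: 's3cret' }).run); // true
```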
* fix(brief): address 2 remaining P2 review comments on PR #3157
Greptile review (2026-04-18T05:04Z) flagged three P2 items. The
first (shouldExitNonZero never wired into cron) was already fixed in
commit
|
||
|
|
45da551d17 |
feat(brief): per-user composer writing brief:{userId}:{issueDate} (Phase 3a) (#3154)
* feat(brief): per-user composer writing brief:{userId}:{issueDate} (Phase 3a)
Phase 3a of docs/plans/2026-04-17-003. Produces the Redis-resident
envelopes that Phases 1 (renderer) and 2 (edge routes) already know
how to serve, so after this ships the end-to-end read path works
with real data.
Files:
- shared/brief-filter.{js,d.ts}: pure helpers. normaliseThreatLevel
maps upstream 'moderate' -> 'medium' (contract pinned the union in
Phase 1). filterTopStories applies sensitivity thresholds and caps
at maxStories. assembleStubbedBriefEnvelope builds a full envelope
with stubbed greeting/lead/threads/signals and runs it through the
renderer's assertBriefEnvelope so no malformed envelope is ever
persisted. issueDateInTz computes per-user local date via Intl
with UTC fallback.
- scripts/seed-brief-composer.mjs: Railway cron. Reads
news:insights:v1 once, fetches enabled alert rules via the
existing /relay/digest-rules endpoint (same set
seed-digest-notifications uses), then for each rule computes the
user's local issue date, filters stories, assembles an envelope,
and SETEX brief:{userId}:{issueDate} with 7-day TTL. Respects
aiDigestEnabled opt-in. Honours SIGTERM. Exits non-zero when >5%
of rules fail so Railway surfaces structural breakage.
- Dockerfile.seed-brief-composer: standalone container. Copies the
minimum set (composer + shared/ contract + renderer validator +
Upstash helper + seed-envelope unwrapper).
- tests/brief-filter.test.mjs: 22 pure-function tests covering
severity normalisation (including 'moderate' alias), sensitivity
thresholds, story cap, empty-title drop, envelope assembly passes
the strict renderer validator, tz-aware date math across +UTC/-UTC
offsets with a bad-timezone fallback.
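The pure helpers above can be sketched as below. normaliseThreatLevel and filterTopStories are named in the commit; the severity ranks, thresholds, and story shape here are illustrative reconstructions:

```javascript
// Sketch: 'moderate' aliases to 'medium', then a sensitivity gate
// and a hard cap, dropping empty-title stories first.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };
const normaliseThreatLevel = (s) => (s === 'moderate' ? 'medium' : s);

function filterTopStories(stories, { sensitivity = 'all', maxStories = 12 } = {}) {
  const floor = sensitivity === 'all' ? 0 : SEVERITY_RANK[sensitivity];
  return stories
    .filter((s) => s.title && s.title.trim() !== '')
    .map((s) => ({ ...s, severity: normaliseThreatLevel(s.severity) }))
    .filter((s) => SEVERITY_RANK[s.severity] >= floor)
    .slice(0, maxStories);
}

const out = filterTopStories(
  [
    { title: 'A', severity: 'moderate' }, // -> medium, below 'high' floor
    { title: 'B', severity: 'critical' },
    { title: '', severity: 'critical' },  // dropped: empty title
  ],
  { sensitivity: 'high', maxStories: 12 },
);
console.log(out.map((s) => s.title).join(',')); // B
```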
Out of scope for this PR:
- LLM-generated whyMatters / lead / signals (Phase 3b).
- brief_ready event fan-out to notification-relay (Phase 3c).
- Dashboard panel that consumes /api/latest-brief (Phase 4).
Pre-merge runbook:
1. Create a new Railway service from Dockerfile.seed-brief-composer.
2. Set env vars (UPSTASH_*, CONVEX_URL, RELAY_SHARED_SECRET) — reuse
the values already in the digest service.
3. Add a cron schedule (suggested: hourly at :05 so it lands between
the insights-seeder tick and the digest cron).
4. Verify first run: check service logs for
"[brief-composer] Done: success=X ..." and a reader's
/api/latest-brief should stop returning 'composing' within one
cron cycle.
Tests: 72/72 (22 brief-filter + 30 render + 20 HMAC). Typecheck +
lint clean. Composer script parses with node --check.
* fix(brief): aiDigestEnabled default + per-user rule dedupe
Addresses two fourth-round review findings on PR #3154.
1. aiDigestEnabled default parity (todo 224). Composer was checking
`!rule.aiDigestEnabled`, which skips legacy rules that predate the
optional field. The rest of the codebase defaults it to true
(seed-digest-notifications.mjs:914 uses `!== false`;
notifications-settings.ts:228 uses `?? true`; the Convex setter
defaults to true). Flipped the composer to `=== false` so only an
explicit opt-out skips the brief.
2. Multi-variant last-write-wins (todo 225). alertRules are
(userId, variant)-scoped but the brief key is user-scoped
(brief:{userId}:{issueDate}). Users with the full+finance+tech
variants all enabled would produce three competing writes with a
nondeterministic survivor. Added dedupeRulesByUser() that picks
one rule per user: prefers 'full' variant, then most permissive
sensitivity (all > high > critical), tie-breaking on earliest
updatedAt for stability across input reordering. Logs the
occurrence so we can see how often users have multi-variant
configs.
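The selection order can be sketched as below. Names follow the commit; the bodies are assumptions about how the preference chain might be wired, not the shipped scripts/seed-brief-composer.mjs.

```javascript
// Lower rank = more permissive sensitivity (all > high > critical).
const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };

// Picks the preferred of two same-user rules per the order described
// above: 'full' variant first, then most permissive sensitivity, then
// earliest updatedAt so the result is stable under input reordering.
function preferRule(a, b) {
  if ((a.variant === 'full') !== (b.variant === 'full')) {
    return a.variant === 'full' ? a : b;
  }
  const ra = SENSITIVITY_RANK[a.sensitivity] ?? 99;
  const rb = SENSITIVITY_RANK[b.sensitivity] ?? 99;
  if (ra !== rb) return ra < rb ? a : b;
  return a.updatedAt <= b.updatedAt ? a : b;
}

// Collapses (userId, variant)-scoped rules to one rule per user.
function dedupeRulesByUser(rules) {
  const byUser = new Map();
  for (const rule of rules) {
    const current = byUser.get(rule.userId);
    byUser.set(rule.userId, current ? preferRule(current, rule) : rule);
  }
  return [...byUser.values()];
}
```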
Also hardened against future regressions:
- Moved env-var guards + main() call behind an isMain() check
(feedback_seed_isMain_guard). Previously, importing the script
from a test would fire process.exit(0) on the
BRIEF_COMPOSER_ENABLED=0 branch and kill the test runner. Tests
now load the file cleanly.
- Exported dedupeRulesByUser so the tests can exercise the selection
logic directly.
- The new tests/brief-composer-rule-dedup.test.mjs includes a
cross-module assertion that seed-digest-notifications.mjs still
reads `rule.aiDigestEnabled !== false`. If the digest cron ever
drifts, this test fails loud — the brief and digest must agree on
who is eligible.
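The isMain() guard likely follows the common Node ESM pattern of comparing import.meta.url to the invoked script path; a minimal sketch, assuming that pattern:

```javascript
import { pathToFileURL } from 'node:url';

// True only when this module is the script node was invoked with, not
// when it is imported (e.g. from a test file).
function isMain(metaUrl) {
  return process.argv[1] !== undefined &&
    metaUrl === pathToFileURL(process.argv[1]).href;
}

if (isMain(import.meta.url)) {
  // Env-var guards and main() live here, so importing the module never
  // hits the process.exit(0) branch and cannot kill a test runner.
}
```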
Tests: 83/83 (was 72; +6 dedupe cases + 5 aiDigestEnabled parity
cases). Typecheck + lint clean.
* fix(brief): dedupe order + failure-rate denominator
Addresses two fifth-round review findings on PR #3154.
1. Dedupe was picking a preferred variant BEFORE checking whether it
could actually emit a brief (todo 226). A user with
aiDigestEnabled=false on 'full' but true on 'finance' got skipped
entirely; same for a user with sensitivity='critical' on 'full'
that filters to zero stories while 'finance' has matching content.
Replaced dedupeRulesByUser with groupEligibleRulesByUser: pre-
filters opted-out rules, then returns ALL eligible variants per
user in preference order. The main loop walks candidates and
takes the first one whose story filter produces non-empty content.
Fallback is cheap (story filter is pure) and preserves the 'full'-
first + most-permissive-sensitivity tie-breakers from before.
dedupeRulesByUser is kept as a thin wrapper for the existing tests;
new tests exercise the group+fallback path directly (opt-out +
opt-in sibling, all-opted-out drop, ordering stability).
2. Failure gate denominator drifted from its numerator (todo 227).
   After dedupe, `failed` counts per-user failures, but the gate still
   compared it against the pre-dedupe rules.length. 60 rules → 10 users
   → 2 failed writes is a 20% real failure rate hidden behind a 60-rule
   denominator.
Fix: denominator is now eligibleUserCount (Map size after
group-and-filter). Log line reports rules + eligible_users +
success + skipped_empty + failed + duration so ops can see the
full shape.
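The group-then-fallback shape can be sketched as below. Function names match the commit; the bodies (and the stand-in story filter) are assumptions, not the shipped code.

```javascript
const RANK = { all: 0, high: 1, critical: 2 };

// Pre-filters opted-out rules, then returns ALL eligible variants per
// user, sorted in preference order: 'full' first, then most permissive
// sensitivity, then earliest updatedAt.
function groupEligibleRulesByUser(rules) {
  const byUser = new Map();
  for (const rule of rules) {
    if (rule.aiDigestEnabled === false) continue; // explicit opt-out only
    const list = byUser.get(rule.userId) ?? [];
    list.push(rule);
    byUser.set(rule.userId, list);
  }
  for (const list of byUser.values()) {
    list.sort((a, b) =>
      (b.variant === 'full') - (a.variant === 'full') ||
      (RANK[a.sensitivity] ?? 99) - (RANK[b.sensitivity] ?? 99) ||
      a.updatedAt - b.updatedAt
    );
  }
  return byUser;
}

// Main-loop shape: take the first candidate whose story filter yields
// content; filterStories is a stand-in for the pure story filter.
function pickCandidate(candidates, filterStories) {
  for (const rule of candidates) {
    const stories = filterStories(rule);
    if (stories.length > 0) return { rule, stories };
  }
  return null; // counted as skipped_empty
}
```

Because the story filter is pure, trying each variant in order costs nothing beyond the filter itself, which is what makes the fallback cheap.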
Tests: 86/86 (was 83; +3 new: opt-out+sibling, all-opted-out drop,
candidate-ordering). Typecheck clean, node --check clean, biome clean.
* fix(brief): body-POST SETEX + attempted-only failure denominator
Addresses two sixth-round review findings on PR #3154.
1. Upstash SETEX (todo 228). The previous write path URL-encoded the
full envelope into /setex/{key}/{ttl}/{payload} which can blow
past proxy/edge/Node HTTP request-target limits for realistic
12-story briefs (5-20 KB JSON). Switched to body-POST via the
existing `redisPipeline` helper — same transport every other
write in the repo uses. Per-command error surface is preserved:
the wrapper throws on null pipeline response or on a {error}
entry in the result array.
2. Failure-rate denominator (todo 229). Earlier round switched
denominator from pre-dedupe rules.length to eligibleUserCount,
but the numerator only counts users that actually reached a
write attempt. skipped_empty users inflate eligibleUserCount
without being able to fail, so 4/4 failed writes against 100
eligible (96 skipped_empty) reads as 4% and silently passes.
Denominator is now `success + failed` (attempted writes only).
Extracted shouldExitNonZero({success, failed}) so the denominator
contract lives in a pure function with 7 test cases:
- 0 failures → no exit
- 100% failure on small volume → exits
- 1/20 at exact 5% threshold → exits (documented boundary)
- 1/50 below threshold → no exit
- 2/10 above Math.max(1) floor → exits
- 1/1 single isolated failure → exits
- 0 attempted (no signal) → no exit
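Those seven cases pin the contract down enough to reconstruct the likely shape: exit non-zero when failed reaches max(1, 5% of attempted writes). A hedged sketch, not the shipped function:

```javascript
// Reconstructed from the seven test cases above: denominator is
// attempted writes only (success + failed), the 5% boundary itself
// exits, and Math.max(1, ...) floors the threshold at one failure.
function shouldExitNonZero({ success, failed }) {
  const attempted = success + failed;
  if (attempted === 0) return false; // no attempts, no signal
  return failed >= Math.max(1, attempted * 0.05);
}
```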
Tests: 93/93 (was 86; +7 threshold cases). Typecheck + lint clean.
|