2 Commits

Author SHA1 Message Date
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests).

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
711636c7b6 feat(brief): consolidate composer into digest cron (retire standalone service) (#3157)
* feat(brief): consolidate composer into digest cron (retire standalone service)

Merges the Phase 3a standalone Railway composer into the existing
digest cron. End state: one cron (seed-digest-notifications.mjs)
writes brief:{userId}:{issueDate} for every eligible user AND
dispatches the digest to their configured channels with a signed
magazine URL appended. Net -1 Railway service.

User's architectural note: "there is no reason to have 1 digest
preparing all and sending, then another doing a duplicate". This
delivers that — infrastructure consolidation, same send cadence,
single source of truth for brief envelopes.

File moves / deletes:

- scripts/seed-brief-composer.mjs → scripts/lib/brief-compose.mjs
  Pure-helpers library: no main(), no env guards, no cron. Exports
  composeBriefForRule + groupEligibleRulesByUser + dedupeRulesByUser
  (shim) + shouldExitNonZero + date helpers + extractInsights.
- Dockerfile.seed-brief-composer → deleted.
- The seed-brief-composer Railway service is retired (user confirmed
  they would delete it manually).

New files:

- scripts/lib/brief-url-sign.mjs — plain .mjs port of the sign path
  in server/_shared/brief-url.ts (Web Crypto only, no node:crypto).
- tests/brief-url-sign.test.mjs — parity tests that confirm tokens
  minted by the scripts-side signer verify via the edge-side verifier
  and produce byte-identical output for identical input.

Digest cron (scripts/seed-digest-notifications.mjs):

- Reads news:insights:v1 once per run, composes per-user brief
  envelopes, SETEX brief:{userId}:{issueDate} via body-POST pipeline.
- Signs magazine URL per user (BRIEF_URL_SIGNING_SECRET +
  WORLDMONITOR_PUBLIC_BASE_URL new env requirements, see pre-merge).
- Injects magazineUrl into buildChannelBodies for every channel
  (email, telegram, slack, discord) as a "📖 Open your WorldMonitor
  Brief magazine" footer CTA.
- Email HTML gets a dedicated data-brief-cta-slot near the top of
  the body with a styled button.
- Compose failures NEVER block the digest send — the digest cron's
  existing behaviour is preserved when the brief pipeline has issues.
- Brief compose extracted to its own functions (composeBriefsForRun
  + composeAndStoreBriefForUser) to keep main's biome complexity at
  baseline (64 — was 63 before; inline would have pushed to 117).

Tests: 98/98 across the brief suite. New parity tests confirm cross-
module signer agreement.

PRE-MERGE: add BRIEF_URL_SIGNING_SECRET and WORLDMONITOR_PUBLIC_BASE_URL
to the digest-notifications Railway service env (same values already
set on Vercel for Phase 2). Without them, brief compose is auto-
disabled and the digest falls back to its current behaviour — safe to
deploy before env is set.

* fix(brief): digest Dockerfile + propagate compose failure to exit code

Addresses two seventh-round review findings on PR #3157.

1. Cross-directory imports + current Railway build root (todo 230).
   The consolidated digest cron imports from ../api, ../shared, and
   (transitively via scripts/lib/brief-compose.mjs) ../server/_shared.
   The running digest-notifications Railway service builds from the
   scripts/ root — those parent paths are outside the deploy tree
   and would 500 on next rebuild with ERR_MODULE_NOT_FOUND.

   New Dockerfile.digest-notifications (repo-root build context)
   COPYs exactly the modules the cron needs: scripts/ contents,
   scripts/lib/, shared/brief-envelope.*, shared/brief-filter.*,
   server/_shared/brief-render.*, api/_upstash-json.js,
   api/_seed-envelope.js. Tight list to keep the watch surface small.
   Pattern matches the retired Dockerfile.seed-brief-composer + the
   existing Dockerfile.relay.

2. Silent compose failures (todo 231). composeBriefsForRun logged
   counters but never exited non-zero. An Upstash outage or missing
   signing secret silently dropped every brief write while Railway
   showed the cron green. The retired standalone composer exited 1
   on structural failures; that observability was lost in the
   consolidation.

   Changed the compose fn to return {briefByUser, composeSuccess,
   composeFailed}. Main captures the counters, runs the full digest
   send loop first (compose-layer breakage must NEVER block user-
   visible digest delivery), then calls shouldExitNonZero at the
   very end. Exit-on-failure gives ops the Railway-red signal
   without touching send behaviour.

   Also: a total read failure of news:insights:v1 (catch branch)
   now counts as 1 compose failure so the gate trips on insights-
   key infra breakage, not just per-user write failures.

Tests unchanged (98/98). Typecheck + node --check clean. Biome
complexity ticks 63→65 — same pre-existing bucket, already tolerated
by CI; no new blocker.

PRE-MERGE Railway work still pending: set BRIEF_URL_SIGNING_SECRET
+ WORLDMONITOR_PUBLIC_BASE_URL on the digest-notifications service,
AND switch its dockerfilePath to /Dockerfile.digest-notifications
before merging. Without the dockerfilePath switch, the next rebuild
fails.

* fix(brief): Dockerfile type:module + explicit missing-secret tripwire

Addresses two eighth-round review findings on PR #3157.

1. ESM .js files parse as CommonJS in the container (todo 232).
   Dockerfile.digest-notifications COPYs shared/*.js,
   server/_shared/*.js, api/*.js — all ESM because the repo-root
   package.json has "type":"module". But the image never copies the
   root package.json, so Node's nearest-pjson walk inside /app/
   reaches / without finding one and defaults to CommonJS. First
   `export` statement throws `SyntaxError: Unexpected token 'export'`
   at startup.

   Fix: write a minimal /app/package.json with {"type":"module"}
   early in the build. Avoids dragging the full root package.json
   into the image while still giving Node the ESM hint it needs for
   repo-owned .js files.

2. Missing BRIEF_URL_SIGNING_SECRET silently tolerated (todo 233).
   The old gate folded "operator-disabled" (BRIEF_COMPOSE_ENABLED=0)
   and "required secret missing in rollout" into the same boolean
   via AND. A production deploy that forgot the env var would skip
   brief compose without any failure signal — Railway green, no
   briefs, no CTA in digests, nobody notices.

   Split the two states: BRIEF_COMPOSE_DISABLED_BY_OPERATOR (explicit
   kill switch, silent) and BRIEF_SIGNING_SECRET_MISSING (the misconfig
   we care about). When the secret is missing without the operator
   flag, composeBriefsForRun returns composeFailed=1 on first call
   so the end-of-run exit gate trips and Railway flags the run red.
   Digest send still proceeds — compose-layer issues never block
   notifications.

Tests: 98/98. Syntax + node --check clean.

* fix(brief): address 2 remaining P2 review comments on PR #3157

Greptile review (2026-04-18T05:04Z) flagged three P2 items. The
first (shouldExitNonZero never wired into cron) was already fixed in
commit 35a46aa34. This commit addresses the other two.

1. composeBriefForRule: issuedAt used Date.now() instead of the
   caller-supplied nowMs. Under the digest cron the delta is
   milliseconds and harmless, but it broke the function's
   determinism contract — same input must produce same output for
   tests + retries. Now uses the passed nowMs.

2. buildChannelBodies: magazineUrl embedded raw inside Telegram HTML
   <a href="..."> and Slack <URL|text> syntax. The URL is HMAC-
   signed and shape-validated upstream (userId regex + YYYY-MM-DD
   date), so injection is practically impossible — but the email
   CTA (injectBriefCta) escapes per-target and channel footers
   should match that discipline. Added:
     - Telegram: escape &, <, >, " to HTML entities
     - Slack: strip <, >, | (mrkdwn metacharacters)
   Discord and plain-text paths unchanged — Discord links tolerate
   raw URLs, plain text has no metacharacters to escape.

Tests: 98/98 still pass (deterministic issuedAt change was
transparent to existing assertions because tests already pass nowMs
explicitly via the issuedAt fixture field).
2026-04-18 12:30:08 +04:00