worldmonitor/scripts/lib/brief-llm.mjs
Elie Habib 2f5445284b fix(brief): single canonical synthesis brain — eliminate email/brief lead divergence (#3396)
* feat(brief-llm): canonical synthesis prompt + v3 cache key

Extends generateDigestProse to be the single source of truth for
brief executive-summary synthesis (canonicalises what was previously
split between brief-llm's generateDigestProse and seed-digest-
notifications.mjs's generateAISummary). Ports Brain B's prompt
features into buildDigestPrompt:

- ctx={profile, greeting, isPublic} parameter (back-compat: 4-arg
  callers behave like today)
- per-story severity uppercased + short-hash prefix [h:XXXX] so the
  model can emit rankedStoryHashes for stable re-ranking
- profile lines + greeting opener appear only when ctx.isPublic !== true

validateDigestProseShape gains optional rankedStoryHashes (≥4-char
strings, capped to MAX_STORIES_PER_USER × 2). v2-shaped rows still
pass — field defaults to [].
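The optional-field check described above can be sketched as follows. The helper name and the cap constant's value here are illustrative stand-ins, not the real validateDigestProseShape internals:

```javascript
// Hypothetical stand-in for the rankedStoryHashes branch of
// validateDigestProseShape -- names and the cap value are illustrative.
const MAX_STORIES_PER_USER = 12; // assumed value for illustration

function validateRankedStoryHashes(value) {
  if (value === undefined) return []; // v2-shaped rows: field defaults to []
  if (!Array.isArray(value)) return null;
  // Every entry must be a string of >= 4 chars; any bad entry rejects.
  if (!value.every((h) => typeof h === 'string' && h.length >= 4)) return null;
  return value.slice(0, MAX_STORIES_PER_USER * 2); // capped to MAX_STORIES_PER_USER x 2
}
```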

hashDigestInput v3:
- material includes profile-SHA, greeting bucket, isPublic flag,
  per-story hash
- isPublic=true substitutes literal 'public' for userId in the cache
  key so all share-URL readers of the same (date, sensitivity, pool)
  hit ONE cache row (no PII in public cache key)

Adds generateDigestProsePublic(stories, sensitivity, deps) wrapper —
no userId param by design — for the share-URL surface.

Cache prefix bumped brief:llm:digest:v2 → v3. v2 rows expire on TTL.
Per the v1→v2 precedent (see hashDigestInput comment), one-tick cost
on rollout is acceptable for cache-key correctness.
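The v3 key construction amounts to hashing the listed material and substituting the userId slot, sketched here. The parameter names (profileSha, greetingBucket) and the truncation length are assumptions, not the real hashDigestInput:

```javascript
// Illustrative sketch of the v3 cache-key construction -- field names
// and hash truncation are assumptions, not the actual hashDigestInput.
import { createHash } from 'node:crypto';

function digestCacheKeyV3({ userId, isPublic, sensitivity, stories, profileSha, greetingBucket }) {
  // Material includes profile-SHA, greeting bucket, isPublic flag, and
  // per-story hash, so a hit never serves a differently-personalised prompt.
  const material = [
    profileSha ?? '',
    greetingBucket ?? '',
    String(isPublic === true),
    ...stories.map((s) => s.hash),
  ].join('|');
  const poolHash = createHash('sha256').update(material).digest('hex').slice(0, 16);
  // isPublic=true substitutes the literal 'public' for userId so all
  // share-URL readers of the same pool hit ONE cache row (no PII in key).
  const who = isPublic === true ? 'public' : userId;
  return `brief:llm:digest:v3:${who}:${sensitivity}:${poolHash}`;
}
```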

Tests: 72/72 passing in tests/brief-llm.test.mjs (8 new for the v3
behaviors), full data suite 6952/6952.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 1, Codex-approved (5 rounds).

* feat(brief): envelope v3 — adds digest.publicLead for share-URL surface

Bumps BRIEF_ENVELOPE_VERSION 2 → 3. Adds optional
BriefDigest.publicLead — non-personalised executive lead generated
by generateDigestProsePublic (already in this branch from the
previous commit) for the public share-URL surface. Personalised
`lead` is the canonical synthesis for authenticated channels;
publicLead is its profile-stripped sibling so api/brief/public/*
never serves user-specific content (watched assets/regions).

SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3] keeps v1 + v2 envelopes
in the 7-day TTL window readable through the rollout — the
composer only ever writes the current version, but readers must
tolerate older shapes that haven't expired yet. Same rollout
pattern used at the v1 → v2 bump.
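The reader-side tolerance reduces to a membership check, sketched below with the constants named in this commit (the guard function itself is illustrative):

```javascript
// Sketch of the write-current / read-window version contract. The
// constant names come from the commit; the guard is illustrative.
const BRIEF_ENVELOPE_VERSION = 3;              // the composer only ever writes this
const SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3]; // readers tolerate the 7-day TTL window

function assertReadableEnvelope(envelope) {
  if (!SUPPORTED_ENVELOPE_VERSIONS.includes(envelope.version)) {
    throw new Error(`unsupported brief envelope version: ${envelope.version}`);
  }
  return envelope;
}
```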

Renderer changes (server/_shared/brief-render.js):
- ALLOWED_DIGEST_KEYS gains 'publicLead' (closed-key-set still
  enforced; v2 envelopes pass because publicLead === undefined is
  the v2 shape).
- assertBriefEnvelope: new isNonEmptyString check on publicLead
  when present. Type contract enforced; absence is OK.

Tests (tests/brief-magazine-render.test.mjs):
- New describe block "v3 publicLead field": v3 envelope renders;
  malformed publicLead rejected; v2 envelope still passes; ad-hoc
  digest keys (e.g. synthesisLevel) still rejected — confirming
  the closed-key-set defense holds for the cron-local-only fields
  the orchestrator must NOT persist.
- BRIEF_ENVELOPE_VERSION pin updated 2 → 3 with rollout-rationale
  comment.

Test results: 182 brief-related tests pass; full data suite
6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 2, Codex Round-3 Medium #2.

* feat(brief): synthesis splice + rankedStoryHashes pre-cap re-order

Plumbs the canonical synthesis output (lead, threads, signals,
publicLead, rankedStoryHashes from generateDigestProse) through the
pure composer so the orchestration layer can hand pre-resolved data
into envelope.digest. Composer stays sync / no I/O — Codex Round-2
High #2 honored.

Changes:

scripts/lib/brief-compose.mjs:
- digestStoryToUpstreamTopStory now emits `hash` (the digest story's
  stable identifier, falls back to titleHash when absent). Without
  this, rankedStoryHashes from the LLM has nothing to match against.
- composeBriefFromDigestStories accepts opts.synthesis = {lead,
  threads, signals, rankedStoryHashes?, publicLead?}. When passed,
  splices into envelope.digest after the stub is built. Partial
  synthesis (e.g. only `lead` populated) keeps stub defaults for the
  other fields — graceful degradation when L2 fallback fires.

shared/brief-filter.js:
- filterTopStories accepts optional rankedStoryHashes. New helper
  applyRankedOrder re-orders stories by short-hash prefix match
  BEFORE the cap is applied, so the model's editorial judgment of
  importance survives MAX_STORIES_PER_USER. Stable for ties; stories
  not in the ranking come after in original order. Empty/missing
  ranking is a no-op (legacy callers unchanged).
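A simplified stand-in for applyRankedOrder, showing the short-hash prefix match, the stable tie behaviour, and the "unranked after, in original order" rule (the real helper lives in shared/brief-filter.js):

```javascript
// Illustrative sketch of applyRankedOrder -- re-orders by short-hash
// prefix match BEFORE the cap; not the real shared/brief-filter.js code.
function applyRankedOrder(stories, rankedStoryHashes) {
  if (!Array.isArray(rankedStoryHashes) || rankedStoryHashes.length === 0) {
    return stories; // empty/missing ranking is a no-op (legacy callers unchanged)
  }
  const rankOf = (story) => {
    const full = typeof story.hash === 'string' ? story.hash : '';
    // Model emits a short prefix; story carries the full hash.
    return rankedStoryHashes.findIndex((short) => full.startsWith(short));
  };
  return stories
    .map((story, i) => ({ story, i, rank: rankOf(story) }))
    .sort((a, b) => {
      if (a.rank === -1 && b.rank === -1) return a.i - b.i; // unranked keep original order
      if (a.rank === -1) return 1;  // unranked come after ranked
      if (b.rank === -1) return -1;
      return a.rank - b.rank || a.i - b.i; // stable for ties
    })
    .map((x) => x.story);
}
```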

shared/brief-filter.d.ts:
- filterTopStories signature gains rankedStoryHashes?: string[].
- UpstreamTopStory gains hash?: unknown (carried through from
  digestStoryToUpstreamTopStory).

Tests added (tests/brief-from-digest-stories.test.mjs):
- synthesis substitutes lead/threads/signals/publicLead.
- legacy 4-arg callers (no synthesis) keep stub lead.
- partial synthesis (only lead) keeps stub threads/signals.
- rankedStoryHashes re-orders pool before cap.
- short-hash prefix match (model emits 8 chars; story carries full).
- unranked stories go after in original order.

Test results: 33/33 in brief-from-digest-stories; 182/182 across all
brief tests; full data suite 6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 3, Codex Round-2 Low + Round-2 High #2.

* feat(brief): single canonical synthesis per user; rewire all channels

Restructures the digest cron's per-user compose + send loops to
produce ONE canonical synthesis per user per issueSlot — the lead
text every channel (email HTML, plain-text, Telegram, Slack,
Discord, webhook) and the magazine show is byte-identical. This
eliminates the "two-brain" divergence that was producing different
exec summaries on different surfaces (observed 2026-04-25 0802).

Architecture:

composeBriefsForRun (orchestration):
- Pre-annotates every eligible rule with lastSentAt + isDue once,
  before the per-user pass. Same getLastSentAt helper the send loop
  uses so compose + send agree on lastSentAt for every rule.

composeAndStoreBriefForUser (per-user):
- Two-pass winner walk: try DUE rules first (sortedDue), fall back
  to ALL eligible rules (sortedAll) for compose-only ticks.
  Preserves today's dashboard refresh contract for weekly /
  twice_daily users on non-due ticks (Codex Round-4 High #1).
- Within each pass, walk by compareRules priority and pick the
  FIRST candidate with a non-empty pool — mirrors today's behavior
  at scripts/seed-digest-notifications.mjs:1044 and prevents the
  "highest-priority but empty pool" edge case (Codex Round-4
  Medium #2).
- Three-level synthesis fallback chain:
    L1: generateDigestProse(fullPool, ctx={profile,greeting,!public})
    L2: generateDigestProse(envelope-sized slice, ctx={})
    L3: stub from assembleStubbedBriefEnvelope
  Distinct log lines per fallback level so ops can quantify
  failure-mode distribution.
- Generates publicLead in parallel via generateDigestProsePublic
  (no userId param; cache-shared across all share-URL readers).
- Splices synthesis into envelope via composer's optional
  `synthesis` arg (Step 3); rankedStoryHashes re-orders the pool
  BEFORE the cap so editorial importance survives MAX_STORIES.
- synthesisLevel stored in the cron-local briefByUser entry — NOT
  persisted in the envelope (renderer's assertNoExtraKeys would
  reject; Codex Round-2 Medium #5).
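The three-level chain above can be sketched as follows. runSynthesisWithFallback is the helper name used later in this PR; the deps shape and log wording here are illustrative:

```javascript
// Sketch of the L1/L2/L3 synthesis fallback chain. Deps shape is an
// assumption for illustration; distinct log lines per fallback level
// let ops quantify failure-mode distribution.
async function runSynthesisWithFallback({ fullPool, slicedPool, ctx, generateDigestProse, stubSynthesis }) {
  try {
    const l1 = await generateDigestProse(fullPool, ctx); // L1: full pool, personalised ctx
    if (l1) return { synthesis: l1, synthesisLevel: 1 };
  } catch { /* fall through */ }
  console.warn('[digest] synthesis L1 failed, trying envelope-sized slice');
  try {
    const l2 = await generateDigestProse(slicedPool, {}); // L2: sliced pool, empty ctx
    if (l2) return { synthesis: l2, synthesisLevel: 2 };
  } catch { /* fall through */ }
  console.warn('[digest] synthesis L2 failed, using stub');
  return { synthesis: stubSynthesis, synthesisLevel: 3 };  // L3: stub
}
```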

Send loop:
- Reads lastSentAt via shared getLastSentAt helper (single source
  of truth with compose flow).
- briefLead = brief?.envelope?.data?.digest?.lead — the canonical
  lead. Passed to buildChannelBodies (text/Telegram/Slack/Discord),
  injectEmailSummary (HTML email), and sendWebhook (webhook
  payload's `summary` field). All-channel parity (Codex Round-1
  Medium #6).
- Subject ternary reads cron-local synthesisLevel: 1 or 2 →
  "Intelligence Brief", 3 → "Digest" (preserves today's UX for
  fallback paths; Codex Round-1 Missing #5).

Removed:
- generateAISummary() — the second LLM call that produced the
  divergent email lead. ~85 lines.
- AI_SUMMARY_CACHE_TTL constant — no longer referenced. The
  digest:ai-summary:v1:* cache rows expire on their existing 1h
  TTL (no cleanup pass).

Helpers added:
- getLastSentAt(rule) — extracted Upstash GET for digest:last-sent
  so compose + send both call one source of truth.
- buildSynthesisCtx(rule, nowMs) — formats profile + greeting for
  the canonical synthesis call. Preserves all today's prefs-fetch
  failure-mode behavior.

Composer:
- compareRules now exported from scripts/lib/brief-compose.mjs so
  the cron can sort each pass identically to groupEligibleRulesByUser.

Test results: full data suite 6962/6962 (was 6956 pre-Step 4; +6
new compose-synthesis tests from Step 3).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 4 + 4b. Codex-approved (5 rounds).

* fix(brief-render): public-share lead fail-safe — never leak personalised lead

Public-share render path (api/brief/public/[hash].ts → renderer
publicMode=true) MUST NEVER serve the personalised digest.lead
because that string can carry profile context — watched assets,
saved-region names, etc. — written by generateDigestProse with
ctx.profile populated.

Previously: redactForPublic redacted user.name and stories.whyMatters
but passed digest.lead through unchanged. Codex Round-2 High
(security finding).

Now (v3 envelope contract):
- redactForPublic substitutes digest.lead = digest.publicLead when
  the v3 envelope carries one (generated by generateDigestProsePublic
  with profile=null, cache-shared across all public readers).
- When publicLead is absent (v2 envelope still in TTL window OR v3
  envelope where publicLead generation failed), redactForPublic sets
  digest.lead to empty string.
- renderDigestGreeting: when lead is empty, OMIT the <blockquote>
  pull-quote entirely. Page still renders complete (greeting +
  horizontal rule), just without the italic lead block.
- NEVER falls back to the original personalised lead.

assertBriefEnvelope still validates publicLead's contract (when
present, must be a non-empty string) BEFORE redactForPublic runs,
so a malformed publicLead throws before any leak risk.
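The empty-lead render rule can be sketched as below; the real renderer lives in server/_shared/brief-render.js and this markup is a minimal stand-in:

```javascript
// Illustrative sketch of the omit-the-pull-quote rule: when lead is
// empty (v2 envelope, or v3 with failed publicLead generation), the
// <blockquote> block is skipped entirely and the page still completes.
function renderDigestGreeting({ greeting, lead }) {
  const parts = [`<p class="greeting">${greeting}</p>`];
  if (typeof lead === 'string' && lead.length > 0) {
    parts.push(`<blockquote class="lead"><em>${lead}</em></blockquote>`);
  }
  parts.push('<hr>'); // greeting + horizontal rule always render
  return parts.join('\n');
}
```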

Tests added (tests/brief-magazine-render.test.mjs):
- v3 envelope renders publicLead in pull-quote, personalised lead
  text never appears.
- v2 envelope (no publicLead) omits pull-quote; rest of page
  intact.
- empty-string publicLead rejected by validator (defensive).
- private render still uses personalised lead.

Test results: 68 brief-magazine-render tests pass; full data suite
remains green from prior commit.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 5, Codex Round-2 High (security).

* feat(digest): brief lead parity log + extra acceptance tests

Adds the parity-contract observability line and supplementary
acceptance tests for the canonical synthesis path.

Parity log (per send, after successful delivery):
  [digest] brief lead parity user=<id> rule=<v>:<s>:<lang>
    synthesis_level=<1|2|3> exec_len=<n> brief_lead_len=<n>
    channels_equal=<bool> public_lead_len=<n>

When channels_equal=false an extra WARN line fires —
"PARITY REGRESSION user=… — email lead != envelope lead." Sentry's
existing console-breadcrumb hook lifts this without an explicit
captureMessage call. Plan acceptance criterion A5.
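A hypothetical formatter for the log line shown above; the field names come from this commit, but the helper name and exact composition are illustrative:

```javascript
// Hypothetical helper producing the parity-contract log line. Field
// names mirror the commit text; the function itself is illustrative.
function formatParityLogLine({ userId, rule, synthesisLevel, execLen, briefLeadLen, channelsEqual, publicLeadLen }) {
  return (
    `[digest] brief lead parity user=${userId} rule=${rule} ` +
    `synthesis_level=${synthesisLevel} exec_len=${execLen} ` +
    `brief_lead_len=${briefLeadLen} channels_equal=${channelsEqual} ` +
    `public_lead_len=${publicLeadLen}`
  );
}
```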

Tests added (tests/brief-llm.test.mjs, +9):
- generateDigestProsePublic: two distinct callers with identical
  (sensitivity, story-pool) hit the SAME cache row (per Codex
  Round-2 Medium #4 — "no PII in public cache key").
- public + private writes never collide on cache key (defensive).
- greeting bucket change re-keys the personalised cache (Brain B
  parity).
- profile change re-keys the personalised cache.
- v3 cache prefix used (no v2 writes).

Test results: 77/77 in brief-llm; full data suite 6971/6971
(was 6962 pre-Step-7; +9 new public-cache tests).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 6 (partial) + 7. Acceptance A5, A6.g, A6.f.

* test(digest): backfill A6.h/i/l/m acceptance tests via helper extraction

* fix(brief): close two correctness regressions on multi-rule + public surface

Two findings from human review of the canonical-synthesis PR:

1. Public-share redaction leaked personalised signals + threads.
   The new prompt explicitly personalises both `lead` and `signals`
   ("personalise lead and signals"), but redactForPublic only
   substituted `lead` — leaving `signals` and `threads` intact.
   Public renderer's hasSignals gate would emit the signals page
   whenever `digest.signals.length > 0`, exposing watched-asset /
   region phrasing to anonymous readers. Same privacy bug class
   the original PR was meant to close, just on different fields.

2. Multi-rule users got cross-pool lead/storyList mismatch.
   composeAndStoreBriefForUser picks ONE winning rule for the
   canonical envelope. The send loop then injected that ONE
   `briefLead` into every due rule's channel body — even though
   each rule's storyList came from its own (per-rule) digest pool.
   Multi-rule users (e.g. `full` + `finance`) ended up with email
   bodies leading on geopolitics while listing finance stories.
   Cross-rule editorial mismatch reintroduced after the cross-
   surface fix.

Fix 1 — public signals + threads:
- Envelope shape: BriefDigest gains `publicSignals?: string[]` +
  `publicThreads?: BriefThread[]` (sibling fields to publicLead).
  Renderer's ALLOWED_DIGEST_KEYS extended; assertBriefEnvelope
  validates them when present.
- generateDigestProsePublic already returned a full prose object
  (lead + signals + threads) — orchestration now captures all
  three instead of just `.lead`. Composer splices each into its
  envelope slot.
- redactForPublic substitutes:
    digest.lead    ← publicLead (or empty → omits pull-quote)
    digest.signals ← publicSignals (or empty → omits signals page)
    digest.threads ← publicThreads (or category-derived stub via
                     new derivePublicThreadsStub helper — never
                     falls back to the personalised threads)
- New tests cover all three substitutions + their fail-safes.
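The three substitutions can be sketched as one pass over the digest. derivePublicThreadsStub is named in this commit, but its body here is an illustrative placeholder, as is the simplified envelope shape:

```javascript
// Sketch of the three public substitutions -- simplified envelope shape;
// derivePublicThreadsStub's body is an illustrative placeholder.
function derivePublicThreadsStub(stories) {
  // Category-derived stub: NEVER falls back to the personalised threads.
  const tags = [...new Set(stories.map((s) => s.category).filter(Boolean))];
  return tags.map((tag) => ({ tag, teaser: '' }));
}

function redactForPublic(envelope) {
  const digest = { ...envelope.data.digest };
  digest.lead = typeof digest.publicLead === 'string' ? digest.publicLead : '';       // empty -> pull-quote omitted
  digest.signals = Array.isArray(digest.publicSignals) ? digest.publicSignals : [];   // empty -> signals page omitted
  digest.threads = Array.isArray(digest.publicThreads) && digest.publicThreads.length > 0
    ? digest.publicThreads
    : derivePublicThreadsStub(envelope.data.stories ?? []);
  return { ...envelope, data: { ...envelope.data, digest } };
}
```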

Fix 2 — per-rule synthesis in send loop:
- Each due rule independently calls runSynthesisWithFallback over
  ITS OWN pool + ctx. Channel body lead is internally consistent
  with the storyList (both from the same pool).
- Cache absorbs the cost: when this is the winner rule, the
  synthesis hits the cache row written during the compose pass
  (same userId/sensitivity/pool/ctx) — no extra LLM call. Only
  multi-rule users with non-overlapping pools incur additional
  LLM calls.
- magazineUrl still points at the winner's envelope (single brief
  per user per slot — `(userId, issueSlot)` URL contract). Channel
  lead vs magazine lead may differ for non-winner rule sends;
  documented as acceptable trade-off (URL/key shape change to
  support per-rule magazines is out of scope for this PR).
- Parity log refined: adds `winner_match=<bool>` field. The
  PARITY REGRESSION warning now fires only when winner_match=true
  AND the channel lead differs from the envelope lead (the actual
  contract regression). Non-winner sends with legitimately
  different leads no longer spam the alert.

Test results:
- tests/brief-magazine-render.test.mjs: 75/75 (+7 new for public
  signals/threads + validator + private-mode-ignores-public-fields)
- Full data suite: 6995/6995 (was 6988; +7 net)
- typecheck + typecheck:api: clean

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Addresses 2 review findings on PR #3396 not anticipated in the
5-round Codex review.

* fix(brief): unify compose+send window, fall through filter-rejection

Address two residual risks in PR #3396 (single-canonical-brain refactor):

Risk 1 — canonical lead synthesized from a fixed 24h pool while the
send loop ships stories from `lastSentAt ?? 24h`. For weekly users
that meant a 24h-pool lead bolted onto a 7d email body — the same
cross-surface divergence the refactor was meant to eliminate, just in
a different shape. Twice-daily users hit a 12h-vs-24h variant.

Fix: extract the window formula to `digestWindowStartMs(lastSentAt,
nowMs, defaultLookbackMs)` in digest-orchestration-helpers.mjs and
call it from BOTH the compose path's digestFor closure AND the send
loop. The compose path now derives windowStart per-candidate from
`cand.lastSentAt`, identical to what the send loop will use for that
rule. Removed the now-unused BRIEF_STORY_WINDOW_MS constant.
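The extracted formula is small enough to state directly; this sketch matches the semantics described (including the `??`-not-`||` guard covered by the epoch-zero test below):

```javascript
// The shared compose+send window formula: lastSentAt when known,
// otherwise now minus the default lookback.
function digestWindowStartMs(lastSentAt, nowMs, defaultLookbackMs) {
  // `??` (not `||`) so a lastSentAt of 0 (epoch zero) is respected.
  return lastSentAt ?? (nowMs - defaultLookbackMs);
}
```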

Side-effect: digestFor now receives the full annotated candidate
(`cand`) instead of just the rule, so it can reach `cand.lastSentAt`.
Backwards-compatible at the helper level — pickWinningCandidateWithPool
forwards `cand` instead of `cand.rule`.

Cache memo hit rate drops since lastSentAt varies per-rule, but
correctness > a few extra Upstash GETs.

Risk 2 — pickWinningCandidateWithPool returned the first candidate
with a non-empty raw pool as winner. If composeBriefFromDigestStories
then dropped every story (URL/headline/shape filters), the caller
bailed without trying lower-priority candidates. Pre-PR behaviour was
to keep walking. This regressed multi-rule users whose top-priority
rule's pool happens to be entirely filter-rejected.

Fix: optional `tryCompose(cand, stories)` callback on
pickWinningCandidateWithPool. When provided, the helper calls it after
the non-empty pool check; falsy return → log filter-rejected and walk
to the next candidate; truthy → returns `{winner, stories,
composeResult}` so the caller can reuse the result. Without the
callback, legacy semantics preserved (existing tests + callers
unaffected).
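The callback contract can be sketched as follows; pickWinningCandidateWithPool and tryCompose are the names used by this commit, while the candidate/pool shapes here are simplified for illustration:

```javascript
// Sketch of the optional tryCompose fall-through. Shapes simplified;
// the real helper lives in digest-orchestration-helpers.mjs.
function pickWinningCandidateWithPool(candidates, poolFor, tryCompose) {
  for (const cand of candidates) {
    const stories = poolFor(cand);
    if (!Array.isArray(stories) || stories.length === 0) continue; // empty raw pool: walk on
    if (typeof tryCompose !== 'function') {
      return { winner: cand, stories }; // legacy semantics preserved
    }
    const composeResult = tryCompose(cand, stories);
    if (composeResult) return { winner: cand, stories, composeResult }; // caller reuses result
    console.warn('[digest] candidate pool entirely filter-rejected, walking to next');
  }
  return null; // all candidates empty or filter-rejected
}
```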

Caller composeAndStoreBriefForUser passes a no-synthesis compose call
as tryCompose — cheap pure-JS, no I/O. Synthesis only runs once after
the winner is locked in, so the perf cost is one extra compose per
filter-rejected candidate, no extra LLM round-trips.

Tests:
- 10 new cases in tests/digest-orchestration-helpers.test.mjs
  covering: digestFor receiving full candidate; tryCompose
  fall-through to lower-priority; all-rejected returns null;
  composeResult forwarded; legacy semantics without tryCompose;
  digestWindowStartMs lastSentAt-vs-default branches; weekly +
  twice-daily window parity assertions; epoch-zero ?? guard.
- Updated tests/digest-cache-key-sensitivity.test.mjs static-shape
  regex to match the new `cand.rule.sensitivity` cache-key shape
  (intent unchanged: cache key MUST include sensitivity).

Stacked on PR #3396 — targets feat/brief-two-brain-divergence.
2026-04-25 16:22:31 +04:00

// Phase 3b: LLM enrichment for the WorldMonitor Brief envelope.
//
// Substitutes the stubbed `whyMatters` per story and the stubbed
// executive summary (`digest.lead` / `digest.threads` / `digest.signals`)
// with Gemini 2.5 Flash output via the existing OpenRouter-backed
// callLLM chain. The LLM provider is pinned to openrouter by
// skipProviders:['ollama','groq'] so the brief's editorial voice
// stays on one model across environments.
//
// Deliberately:
// - Pure parse/build helpers are exported for testing without IO.
// - Cache layer is parameterised (cacheGet / cacheSet) so tests use
//   an in-memory stub and production uses Upstash.
// - Any failure (null LLM result, parse error, cache hiccup) falls
//   through to the original stub — the brief must always ship.
//
// Cache semantics:
// - brief:llm:whymatters:v3:{storyHash}:{leadHash} — 24h, shared
//   across users for the same (story, lead) pair. v3 includes
//   SHA-256 of the resolved digest lead so per-story rationales
//   re-generate when the lead changes (rationales must align with
//   the headline frame). v2 rows were lead-blind and could drift.
// - brief:llm:digest:v3:{userId|public}:{sensitivity}:{poolHash}
//   — 4h. The canonical synthesis is now ALWAYS produced through
//   this path (formerly split with `generateAISummary` in the
//   digest cron). Material includes profile-SHA, greeting bucket,
//   isPublic flag, and per-story hash so cache hits never serve a
//   differently-ranked or differently-personalised prompt.
//   When isPublic=true, the userId slot in the key is the literal
//   string 'public' so all public-share readers of the same
//   (date, sensitivity, story-pool) hit the same row — no PII in
//   the public cache key. v2 rows ignored on rollout.
import { createHash } from 'node:crypto';
import {
  WHY_MATTERS_SYSTEM,
  buildWhyMattersUserPrompt,
  hashBriefStory,
  parseWhyMatters,
} from '../../shared/brief-llm-core.js';
import { sanitizeForPrompt } from '../../server/_shared/llm-sanitize.js';
// Single source of truth for the brief story cap. Both buildDigestPrompt
// and hashDigestInput must slice to this value or the LLM prose drifts
// from the rendered story cards (PR #3389 reviewer P1).
import { MAX_STORIES_PER_USER } from './brief-compose.mjs';
/**
 * Sanitize the story fields that flow into buildWhyMattersUserPrompt and
 * buildStoryDescriptionPrompt. Mirrors
 * server/worldmonitor/intelligence/v1/brief-why-matters-prompt.ts
 * sanitizeStoryFields — the legacy Railway fallback path must apply the
 * same defense as the analyst endpoint, since this is exactly what runs
 * when the endpoint misses / returns null / throws.
 *
 * `description` is included because the RSS-description fix (2026-04-24)
 * now threads untrusted article bodies into the description prompt as
 * grounding context. Without sanitising it, a hostile feed's
 * `<description>` is an unsanitised injection vector — the asymmetry with
 * whyMatters (already sanitised) was a latent bug, fixed here.
 *
 * Kept local (not promoted to brief-llm-core.js) because llm-sanitize.js
 * only lives in server/_shared and the edge endpoint already sanitizes
 * before its own buildWhyMattersUserPrompt call.
 *
 * @param {{ headline?: string; source?: string; threatLevel?: string; category?: string; country?: string; description?: string }} story
 */
function sanitizeStoryForPrompt(story) {
  return {
    headline: sanitizeForPrompt(story.headline ?? ''),
    source: sanitizeForPrompt(story.source ?? ''),
    threatLevel: sanitizeForPrompt(story.threatLevel ?? ''),
    category: sanitizeForPrompt(story.category ?? ''),
    country: sanitizeForPrompt(story.country ?? ''),
    description: sanitizeForPrompt(story.description ?? ''),
  };
}
// Re-export for backcompat with existing tests / callers.
export { WHY_MATTERS_SYSTEM, hashBriefStory, parseWhyMatters };
export const buildWhyMattersPrompt = buildWhyMattersUserPrompt;
// ── Tunables ───────────────────────────────────────────────────────────────
const WHY_MATTERS_TTL_SEC = 24 * 60 * 60;
const DIGEST_PROSE_TTL_SEC = 4 * 60 * 60;
const STORY_DESCRIPTION_TTL_SEC = 24 * 60 * 60;
const WHY_MATTERS_CONCURRENCY = 5;
// Pin to openrouter (google/gemini-2.5-flash). Ollama isn't deployed
// in Railway and groq (llama-3.1-8b) produces noticeably less
// editorial prose than Gemini Flash.
const BRIEF_LLM_SKIP_PROVIDERS = ['ollama', 'groq'];
// ── whyMatters (per story) ─────────────────────────────────────────────────
// The pure helpers (`WHY_MATTERS_SYSTEM`, `buildWhyMattersUserPrompt` (aliased
// to `buildWhyMattersPrompt` for backcompat), `parseWhyMatters`, `hashBriefStory`)
// live in `shared/brief-llm-core.js` so the Vercel-edge endpoint
// (`api/internal/brief-why-matters.ts`) can import them without pulling in
// `node:crypto`. See the `shared/` → `scripts/shared/` mirror convention.
/**
 * Resolve a `whyMatters` sentence for one story.
 *
 * Three-layer graceful degradation:
 * 1. `deps.callAnalystWhyMatters(story)` — the analyst-context edge
 *    endpoint (brief:llm:whymatters:v3 cache lives there). Preferred.
 * 2. Legacy direct-Gemini chain: cacheGet (v3) → callLLM → cacheSet.
 *    Runs whenever the analyst call is missing, returns null, or throws.
 * 3. Caller (enrichBriefEnvelopeWithLLM) uses the baseline stub if
 *    this function returns null.
 *
 * Returns null on all-layer failure.
 *
 * @param {object} story
 * @param {{
 *   callLLM: (system: string, user: string, opts: object) => Promise<string|null>;
 *   cacheGet: (key: string) => Promise<unknown>;
 *   cacheSet: (key: string, value: unknown, ttlSec: number) => Promise<void>;
 *   callAnalystWhyMatters?: (story: object) => Promise<string|null>;
 * }} deps
 */
export async function generateWhyMatters(story, deps) {
  // Priority path: analyst endpoint. It owns its own cache and has
  // ALREADY validated the output via parseWhyMatters (gemini path) or
  // parseWhyMattersV2 (analyst path, multi-sentence). We must NOT
  // re-parse here with the single-sentence v1 parser — that silently
  // truncates v2's 2–3-sentence output to the first sentence. Trust
  // the wire shape; only reject an obviously-bad payload (empty, stub
  // echo, or length outside the legal bounds for either parser).
  if (typeof deps.callAnalystWhyMatters === 'function') {
    try {
      const analystOut = await deps.callAnalystWhyMatters(story);
      if (typeof analystOut === 'string') {
        const trimmed = analystOut.trim();
        const lenOk = trimmed.length >= 30 && trimmed.length <= 500;
        const notStub = !/^story flagged by your sensitivity/i.test(trimmed);
        if (lenOk && notStub) return trimmed;
        console.warn(
          `[brief-llm] callAnalystWhyMatters → fallback: endpoint returned out-of-bounds or stub (len=${trimmed.length})`,
        );
      } else {
        console.warn('[brief-llm] callAnalystWhyMatters → fallback: null/empty response');
      }
    } catch (err) {
      console.warn(
        `[brief-llm] callAnalystWhyMatters → fallback: ${err instanceof Error ? err.message : String(err)}`,
      );
    }
  }
  // Fallback path: legacy direct-Gemini chain with the v3 cache.
  // Bumped v2→v3 on 2026-04-24 alongside the RSS-description fix: rows
  // keyed on the prior v2 prefix were produced from headline-only prompts
  // and may reference hallucinated named actors. The prefix bump forces
  // a clean cold-start on first tick after deploy; entries expire in
  // ≤24h so the prior prefix ages out naturally without a DEL sweep.
  const key = `brief:llm:whymatters:v3:${await hashBriefStory(story)}`;
  try {
    const hit = await deps.cacheGet(key);
    if (typeof hit === 'string' && hit.length > 0) return hit;
  } catch { /* cache miss is fine */ }
  // Sanitize story fields before interpolating into the prompt. The analyst
  // endpoint already does this; without it the Railway fallback path was an
  // unsanitized injection vector for any future untrusted `source` / `headline`.
  const { system, user } = buildWhyMattersPrompt(sanitizeStoryForPrompt(story));
  let text = null;
  try {
    text = await deps.callLLM(system, user, {
      maxTokens: 120,
      temperature: 0.4,
      timeoutMs: 10_000,
      skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
    });
  } catch {
    return null;
  }
  const parsed = parseWhyMatters(text);
  if (!parsed) return null;
  try {
    await deps.cacheSet(key, parsed, WHY_MATTERS_TTL_SEC);
  } catch { /* cache write failures don't matter here */ }
  return parsed;
}
// ── Per-story description (replaces title-verbatim fallback) ──────────────
const STORY_DESCRIPTION_SYSTEM =
  'You are the editor of WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'Given the story attributes below, write ONE concise sentence (16–30 words) that ' +
  'describes the development itself — not why it matters, not the reader reaction. ' +
  'Editorial, serious, past/present tense, named actors where possible. Do NOT ' +
  'repeat the headline verbatim. No preamble, no quotes, no questions, no markdown, ' +
  'no hedging. One sentence only.';
/**
 * @param {{ headline: string; source: string; category: string; country: string; threatLevel: string; description?: string }} story
 * @returns {{ system: string; user: string }}
 */
export function buildStoryDescriptionPrompt(story) {
  // Grounding context: when the RSS feed carried a real description
  // (post-RSS-description fix, 2026-04-24), interpolate it as `Context:`
  // between the metadata block and the "One editorial sentence" instruction.
  // This is the actual fix for the named-actor hallucination class — the LLM
  // now has the article's body to paraphrase instead of filling role-label
  // headlines from its parametric priors. Skip when description is empty or
  // normalise-equal to the headline (no grounding value; parser already
  // filters this but the prompt builder is a second belt-and-braces check).
  const normalise = /** @param {string} x */ (x) => x.trim().toLowerCase().replace(/\s+/g, ' ');
  const rawDescription = typeof story.description === 'string' ? story.description.trim() : '';
  const contextUseful = rawDescription.length > 0
    && normalise(rawDescription) !== normalise(story.headline ?? '');
  const contextLine = contextUseful ? `Context: ${rawDescription.slice(0, 400)}` : null;
  const lines = [
    `Headline: ${story.headline}`,
    `Source: ${story.source}`,
    `Severity: ${story.threatLevel}`,
    `Category: ${story.category}`,
    `Country: ${story.country}`,
    ...(contextLine ? [contextLine] : []),
    '',
    'One editorial sentence describing what happened (not why it matters):',
  ];
  return { system: STORY_DESCRIPTION_SYSTEM, user: lines.join('\n') };
}
/**
 * Parse + validate the LLM story-description output. Rejects empty
 * responses, boilerplate preambles that slipped through the system
 * prompt, outputs that trivially echo the headline (sanity guard
 * against models that default to copying the prompt), and lengths
 * that drift far outside the prompted range.
 *
 * @param {unknown} text
 * @param {string} [headline] used to detect headline-echo drift
 * @returns {string | null}
 */
export function parseStoryDescription(text, headline) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  const match = s.match(/^[^.!?]+[.!?]/);
  const sentence = match ? match[0].trim() : s;
  if (sentence.length < 40 || sentence.length > 400) return null;
  if (typeof headline === 'string') {
    const normalise = /** @param {string} x */ (x) => x.trim().toLowerCase().replace(/\s+/g, ' ');
    // Reject outputs that are a verbatim echo of the headline — that
    // is exactly the fallback we're replacing; shipping it as
    // "LLM enrichment" would be dishonest about cache spend.
    if (normalise(sentence) === normalise(headline)) return null;
  }
  return sentence;
}
/**
 * Resolve a description sentence for one story via cache → LLM.
 * Returns null on any failure; caller falls back to the composer's
 * baseline (cleaned headline) rather than shipping with a placeholder.
 *
 * @param {object} story
 * @param {{
 *   callLLM: (system: string, user: string, opts: object) => Promise<string|null>;
 *   cacheGet: (key: string) => Promise<unknown>;
 *   cacheSet: (key: string, value: unknown, ttlSec: number) => Promise<void>;
 * }} deps
 */
export async function generateStoryDescription(story, deps) {
  // Shares hashBriefStory() with whyMatters — the key prefix
  // (`brief:llm:description:v2:`) is what separates the two cache
  // namespaces; the material is the six fields including description.
  // Bumped v1→v2 on 2026-04-24 alongside the RSS-description fix so
  // cached pre-grounding output (hallucinated named actors from
  // headline-only prompts) is evicted. hashBriefStory itself includes
  // description in the hash material, so content drift invalidates
  // naturally too — the prefix bump is belt-and-braces.
  const key = `brief:llm:description:v2:${await hashBriefStory(story)}`;
  try {
    const hit = await deps.cacheGet(key);
    if (typeof hit === 'string') {
      // Revalidate on cache hit so a pre-fix bad row (short, echo,
      // malformed) can't flow into the envelope unchecked.
      const valid = parseStoryDescription(hit, story.headline);
      if (valid) return valid;
    }
  } catch { /* cache miss is fine */ }
  // Sanitise the story BEFORE building the prompt. `description` (RSS body)
  // is untrusted input; without sanitisation, a hostile feed's
  // `<description>` would be an injection vector. The whyMatters path
  // already does this — keep the two symmetric.
  const { system, user } = buildStoryDescriptionPrompt(sanitizeStoryForPrompt(story));
  let text = null;
  try {
    text = await deps.callLLM(system, user, {
      maxTokens: 140,
      temperature: 0.4,
      timeoutMs: 10_000,
      skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
    });
  } catch {
    return null;
  }
  const parsed = parseStoryDescription(text, story.headline);
  if (!parsed) return null;
  try {
    await deps.cacheSet(key, parsed, STORY_DESCRIPTION_TTL_SEC);
  } catch { /* ignore */ }
  return parsed;
}
// ── Digest prose (canonical synthesis) ─────────────────────────────────────
//
// This is the single LLM call that produces the brief's executive summary.
// All channels (email HTML, plain-text, Telegram, Slack, Discord, webhook)
// AND the magazine's `digest.lead` read the same string from this output.
// The cron orchestration layer also produces a separate non-personalised
// `publicLead` via `generateDigestProsePublic` for the share-URL surface.
const DIGEST_PROSE_SYSTEM_BASE =
'You are the chief editor of WorldMonitor Brief. Given a ranked list of ' +
"today's top stories for a reader, produce EXACTLY this JSON and nothing " +
'else (no markdown, no code fences, no preamble):\n' +
'{\n' +
' "lead": "<2-3 sentence executive summary, editorial tone, references ' +
'the most important 1-2 threads, addresses the reader directly>",\n' +
' "threads": [\n' +
' { "tag": "<one-word editorial category e.g. Energy, Diplomacy, Climate>", ' +
'"teaser": "<one sentence describing what is developing>" }\n' +
' ],\n' +
' "signals": ["<forward-looking imperative phrase, <=14 words>"],\n' +
' "rankedStoryHashes": ["<short hash from the [h:XXXX] prefix of the most ' +
'important story>", "..."]\n' +
'}\n' +
'Threads: 3-6 items reflecting actual clusters in the stories. ' +
'Signals: 2-4 items, forward-looking. ' +
'rankedStoryHashes: at least the top 3 stories by editorial importance, ' +
'using the short hash from each story line (the value inside [h:...]). ' +
'Lead with the single most impactful development. Lead under 250 words.';
/**
* Compute a coarse greeting bucket for cache-key stability.
* Greeting strings can vary in punctuation/capitalisation across
* locales; the bucket collapses them to one of three slots so the
* cache key only changes when the time-of-day window changes.
*
* @param {string|null|undefined} greeting
* @returns {'morning' | 'afternoon' | 'evening' | ''}
*/
export function greetingBucket(greeting) {
if (typeof greeting !== 'string') return '';
const g = greeting.toLowerCase();
if (g.includes('morning')) return 'morning';
if (g.includes('afternoon')) return 'afternoon';
if (g.includes('evening') || g.includes('night')) return 'evening';
return '';
}
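For illustration, an inlined copy of the function with its expected buckets (the sample greetings are hypothetical):

```javascript
// Inlined copy of greetingBucket above, for a self-contained illustration.
function greetingBucket(greeting) {
  if (typeof greeting !== 'string') return '';
  const g = greeting.toLowerCase();
  if (g.includes('morning')) return 'morning';
  if (g.includes('afternoon')) return 'afternoon';
  if (g.includes('evening') || g.includes('night')) return 'evening';
  return '';
}
// Punctuation and casing collapse into one slot per time-of-day window:
console.log(greetingBucket('Good Morning!')); // 'morning'
console.log(greetingBucket('good evening,')); // 'evening'
console.log(greetingBucket('Kia ora'));       // '' (unknown greetings carry no slot)
console.log(greetingBucket(null));            // ''
```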
/**
* @typedef {object} DigestPromptCtx
* @property {string|null} [profile] formatted user profile lines, or null for non-personalised
* @property {string|null} [greeting] e.g. "Good morning", or null for non-personalised
* @property {boolean} [isPublic] true = strip personalisation, build a generic lead
*/
/**
* Build the digest-prose prompt. When `ctx.profile` / `ctx.greeting`
* are present (and `ctx.isPublic !== true`), the prompt asks the
* model to address the reader by their watched assets/regions and
* open with the greeting. Otherwise the prompt produces a generic
* editorial brief safe for share-URL surfaces.
*
* Per-story line format includes a stable short-hash prefix:
* `01 [h:abc12345] [CRITICAL] Headline — Category · Country · Source`
* The model emits `rankedStoryHashes` referencing those short hashes
* so the cron can re-order envelope.stories before the cap.
*
* @param {Array<{ hash?: string; headline: string; threatLevel: string; category: string; country: string; source: string }>} stories
* @param {string} sensitivity
* @param {DigestPromptCtx} [ctx]
* @returns {{ system: string; user: string }}
*/
export function buildDigestPrompt(stories, sensitivity, ctx = {}) {
const isPublic = ctx?.isPublic === true;
const profile = !isPublic && typeof ctx?.profile === 'string' ? ctx.profile.trim() : '';
const greeting = !isPublic && typeof ctx?.greeting === 'string' ? ctx.greeting.trim() : '';
const lines = stories.slice(0, MAX_STORIES_PER_USER).map((s, i) => {
const n = String(i + 1).padStart(2, '0');
const sev = (s.threatLevel ?? '').toUpperCase();
// Short hash prefix — first 8 chars of digest story hash. Keeps
// the prompt compact while remaining collision-free for ≤30
// stories. Stories without a hash fall back to position-based
// 'p<NN>' so the prompt is always well-formed.
const shortHash = typeof s.hash === 'string' && s.hash.length >= 8
? s.hash.slice(0, 8)
: `p${n}`;
return `${n}. [h:${shortHash}] [${sev}] ${s.headline} — ${s.category} · ${s.country} · ${s.source}`;
});
const userParts = [
`Reader sensitivity level: ${sensitivity}`,
];
if (greeting) {
userParts.push('', `Open the lead with: "${greeting}."`);
}
if (profile) {
userParts.push('', 'Reader profile (use to personalise lead and signals):', profile);
}
userParts.push('', "Today's surfaced stories (ranked):", ...lines);
return { system: DIGEST_PROSE_SYSTEM_BASE, user: userParts.join('\n') };
}
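A sketch of the per-story line buildDigestPrompt emits, using a hypothetical story object (field values are illustrative):

```javascript
// Hypothetical story; mirrors the documented line format
// `01 [h:abc12345] [CRITICAL] Headline — Category · Country · Source`.
const s = {
  hash: 'abc12345deadbeef',
  threatLevel: 'critical',
  headline: 'Strait transit halted',
  category: 'Energy',
  country: 'IR',
  source: 'Reuters',
};
const n = String(1).padStart(2, '0');
const shortHash = s.hash.slice(0, 8); // first 8 chars, as in buildDigestPrompt
const line = `${n}. [h:${shortHash}] [${s.threatLevel.toUpperCase()}] ${s.headline} — ${s.category} · ${s.country} · ${s.source}`;
console.log(line);
// 01. [h:abc12345] [CRITICAL] Strait transit halted — Energy · IR · Reuters
```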
// Back-compat alias for tests that import the old constant name.
export const DIGEST_PROSE_SYSTEM = DIGEST_PROSE_SYSTEM_BASE;
/**
* Strict shape check for a parsed digest-prose object. Used by BOTH
* parseDigestProse (fresh LLM output) AND generateDigestProse's
* cache-hit path, so a bad row written under an older/buggy version
* can't poison the envelope at SETEX time. Returns a **normalised**
* copy of the object on success, null on any shape failure — never
* returns the caller's object by reference so downstream writes
* can't observe internal state.
*
* v3 (2026-04-25): adds optional `rankedStoryHashes` — short hashes
* (≥4 chars each) that the orchestration layer maps back to digest
* story `hash` values to re-order envelope.stories before the cap.
* Field is optional so v2-shaped cache rows still pass validation
* during the rollout window — they just don't carry ranking signal.
*
* @param {unknown} obj
* @returns {{ lead: string; threads: Array<{tag:string;teaser:string}>; signals: string[]; rankedStoryHashes: string[] } | null}
*/
export function validateDigestProseShape(obj) {
if (!obj || typeof obj !== 'object' || Array.isArray(obj)) return null;
const lead = typeof obj.lead === 'string' ? obj.lead.trim() : '';
if (lead.length < 40 || lead.length > 1500) return null;
const rawThreads = Array.isArray(obj.threads) ? obj.threads : [];
const threads = rawThreads
.filter((t) => t && typeof t.tag === 'string' && typeof t.teaser === 'string')
.map((t) => ({
tag: t.tag.trim().slice(0, 40),
teaser: t.teaser.trim().slice(0, 220),
}))
.filter((t) => t.tag.length > 0 && t.teaser.length > 0)
.slice(0, 6);
if (threads.length < 1) return null;
// The prompt instructs the model to produce signals of "<=14 words,
// forward-looking imperative phrase". Enforce both a word cap (with
// a small margin of 4 words for model drift and compound phrases)
// and a byte cap — a 30-word "signal" would render as a second
// paragraph on the signals page, breaking visual rhythm. Previously
// only the byte cap was enforced, allowing ~40-word signals to
// sneak through when the model ignored the word count.
const rawSignals = Array.isArray(obj.signals) ? obj.signals : [];
const signals = rawSignals
.filter((x) => typeof x === 'string')
.map((x) => x.trim())
.filter((x) => {
if (x.length === 0 || x.length >= 220) return false;
const words = x.split(/\s+/).filter(Boolean).length;
return words <= 18;
})
.slice(0, 6);
// rankedStoryHashes: optional. When present, must be array of
// non-empty short-hash strings (≥4 chars). Each entry trimmed and
// capped to 16 chars (the prompt emits 8). Length capped to
// MAX_STORIES_PER_USER × 2 to bound prompt drift.
const rawRanked = Array.isArray(obj.rankedStoryHashes) ? obj.rankedStoryHashes : [];
const rankedStoryHashes = rawRanked
.filter((x) => typeof x === 'string')
.map((x) => x.trim().slice(0, 16))
.filter((x) => x.length >= 4)
.slice(0, MAX_STORIES_PER_USER * 2);
return { lead, threads, signals, rankedStoryHashes };
}
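The rankedStoryHashes normalisation can be exercised in isolation. MAX_STORIES_PER_USER is assumed to be 30 here purely for illustration; the real constant lives elsewhere in this module:

```javascript
// Assumed cap for illustration only.
const MAX_STORIES_PER_USER = 30;
const rawRanked = ['abc12345', '  def67890  ', 'xy', 42, 'averylonghashvaluepastsixteen'];
const rankedStoryHashes = rawRanked
  .filter((x) => typeof x === 'string')  // drop non-strings (42)
  .map((x) => x.trim().slice(0, 16))     // trim, cap to 16 chars
  .filter((x) => x.length >= 4)          // drop too-short hashes ('xy')
  .slice(0, MAX_STORIES_PER_USER * 2);
console.log(rankedStoryHashes);
// [ 'abc12345', 'def67890', 'averylonghashval' ]
```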
/**
* @param {unknown} text
* @returns {{ lead: string; threads: Array<{tag:string;teaser:string}>; signals: string[]; rankedStoryHashes: string[] } | null}
*/
export function parseDigestProse(text) {
if (typeof text !== 'string') return null;
let s = text.trim();
if (!s) return null;
// Defensive: strip common wrappings the model sometimes inserts
// despite the explicit system instruction.
s = s.replace(/^```(?:json)?\s*/i, '').replace(/\s*```$/, '').trim();
let obj;
try {
obj = JSON.parse(s);
} catch {
return null;
}
return validateDigestProseShape(obj);
}
/**
* Cache key for digest prose. MUST cover every field the LLM sees,
* in the order it sees them — anything less and we risk returning
* pre-computed prose for a materially different prompt (e.g. the
* same stories re-ranked, or with corrected category/country
* metadata). The old "sort + headline|severity" hash was explicitly
* about cache-hit rate; that optimisation is the wrong tradeoff for
* an editorial product whose correctness bar is "matches the email".
*
* v3 key space (2026-04-25): material now includes the digest-story
* `hash` (per-story rankability), `ctx.profile` SHA-256, greeting
* bucket, and isPublic flag. When `ctx.isPublic === true` the userId
* slot is replaced with the literal `'public'` so all public-share
* readers of the same (sensitivity, story-pool) hit ONE cache row
* regardless of caller — no PII in public cache keys, no per-user
* inflation. v2 rows are ignored on rollout (paid for once).
*
* @param {string} userId
* @param {Array} stories
* @param {string} sensitivity
* @param {DigestPromptCtx} [ctx]
*/
function hashDigestInput(userId, stories, sensitivity, ctx = {}) {
const isPublic = ctx?.isPublic === true;
const profileSha = isPublic ? '' : (typeof ctx?.profile === 'string' && ctx.profile.length > 0
? createHash('sha256').update(ctx.profile).digest('hex').slice(0, 16)
: '');
const greetingSlot = isPublic ? '' : greetingBucket(ctx?.greeting);
// Canonicalise as JSON of the fields the prompt actually references,
// in the prompt's ranked order. Stable stringification via an array
// of tuples keeps field ordering deterministic without relying on
// JS object-key iteration order. Slice MUST match buildDigestPrompt's
// slice or the cache key drifts from the prompt content.
const material = JSON.stringify([
sensitivity ?? '',
profileSha,
greetingSlot,
isPublic ? 'public' : 'private',
...stories.slice(0, MAX_STORIES_PER_USER).map((s) => [
// hash drives ranking (model emits rankedStoryHashes); without
// it the cache ignores re-ranking and stale ordering is served.
typeof s.hash === 'string' ? s.hash.slice(0, 8) : '',
s.headline ?? '',
s.threatLevel ?? '',
s.category ?? '',
s.country ?? '',
s.source ?? '',
]),
]);
const h = createHash('sha256').update(material).digest('hex').slice(0, 16);
// userId-slot substitution for public mode — one cache row per
// (sensitivity, story-pool) shared across ALL public readers.
const userSlot = isPublic ? 'public' : userId;
return `${userSlot}:${sensitivity}:${h}`;
}
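A runnable sketch of the public-slot behaviour — a minimal key shape with assumed material, not the full hashDigestInput. Two different callers in public mode land on the same key:

```javascript
import { createHash } from 'node:crypto';

// Minimal sketch: only the userId-slot substitution is modelled here;
// the material string is an assumed stand-in for the real prompt hash.
function keyFor(userId, isPublic, material) {
  const h = createHash('sha256').update(material).digest('hex').slice(0, 16);
  return `${isPublic ? 'public' : userId}:high:${h}`;
}

const material = JSON.stringify(['high', '', '', 'public']);
console.log(keyFor('alice', true, material) === keyFor('bob', true, material));   // true
console.log(keyFor('alice', false, material) === keyFor('bob', false, material)); // false
```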
/**
* Resolve the digest prose object via cache → LLM.
*
* Backward-compatible signature: existing 4-arg callers behave like
* today (no profile/greeting → non-personalised lead). New callers
* pass `ctx` to enable canonical synthesis with greeting + profile.
*
* @param {string} userId
* @param {Array} stories
* @param {string} sensitivity
* @param {{ callLLM: Function; cacheGet: Function; cacheSet: Function }} deps
* @param {DigestPromptCtx} [ctx]
*/
export async function generateDigestProse(userId, stories, sensitivity, deps, ctx = {}) {
// v3 key: see hashDigestInput() comment. Full-prompt hash + strict
// shape validation on every cache hit.
const key = `brief:llm:digest:v3:${hashDigestInput(userId, stories, sensitivity, ctx)}`;
try {
const hit = await deps.cacheGet(key);
// CRITICAL: re-run the shape validator on cache hits. Without
// this, a bad row (written under an older buggy code path, or
// partial write, or tampered Redis) flows straight into
// envelope.data.digest and the envelope later fails
// assertBriefEnvelope() at the /api/brief render boundary. The
// user's brief URL then 404s / expired-pages. Treat a
// shape-failed hit the same as a miss — re-LLM and overwrite.
if (hit) {
const validated = validateDigestProseShape(hit);
if (validated) return validated;
}
} catch { /* cache miss fine */ }
const { system, user } = buildDigestPrompt(stories, sensitivity, ctx);
let text = null;
try {
text = await deps.callLLM(system, user, {
maxTokens: 900,
temperature: 0.4,
timeoutMs: 15_000,
skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
});
} catch {
return null;
}
const parsed = parseDigestProse(text);
if (!parsed) return null;
try {
await deps.cacheSet(key, parsed, DIGEST_PROSE_TTL_SEC);
} catch { /* ignore */ }
return parsed;
}
/**
* Non-personalised wrapper for share-URL surfaces. Strips profile
* and greeting; substitutes 'public' for userId in the cache key
* (see hashDigestInput) so all public-share readers of the same
* (sensitivity, story-pool) hit one cache row.
*
* Note the missing `userId` parameter — by design. Callers MUST
* NOT thread their authenticated user's id through this function;
* the public lead must never carry per-user salt.
*
* @param {Array} stories
* @param {string} sensitivity
* @param {{ callLLM: Function; cacheGet: Function; cacheSet: Function }} deps
* @returns {ReturnType<typeof generateDigestProse>}
*/
export async function generateDigestProsePublic(stories, sensitivity, deps) {
// userId param to generateDigestProse is unused when isPublic=true
// (see hashDigestInput's userSlot logic). Pass an empty string so
// a typo on a future caller can't accidentally salt the public
// cache.
return generateDigestProse('', stories, sensitivity, deps, {
profile: null,
greeting: null,
isPublic: true,
});
}
// ── Envelope enrichment ────────────────────────────────────────────────────
/**
* Bounded-concurrency map. Preserves input order. Doesn't short-circuit
* on individual failures — fn is expected to return a sentinel (null)
* on error and the caller decides; if fn throws anyway, the original
* item is kept in place so the output stays index-aligned with the input.
*/
async function mapLimit(items, limit, fn) {
if (!Array.isArray(items) || items.length === 0) return [];
const n = Math.min(Math.max(1, limit), items.length);
const out = new Array(items.length);
let next = 0;
async function worker() {
while (true) {
const idx = next++;
if (idx >= items.length) return;
try {
out[idx] = await fn(items[idx], idx);
} catch {
out[idx] = items[idx];
}
}
}
await Promise.all(Array.from({ length: n }, worker));
return out;
}
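A self-contained usage sketch — an inlined copy of mapLimit driven by a hypothetical doubling fn:

```javascript
// Inlined copy of mapLimit above, for a runnable illustration.
async function mapLimit(items, limit, fn) {
  if (!Array.isArray(items) || items.length === 0) return [];
  const n = Math.min(Math.max(1, limit), items.length);
  const out = new Array(items.length);
  let next = 0;
  async function worker() {
    while (true) {
      const idx = next++;
      if (idx >= items.length) return;
      try {
        out[idx] = await fn(items[idx], idx);
      } catch {
        out[idx] = items[idx]; // a throwing fn keeps the original item
      }
    }
  }
  await Promise.all(Array.from({ length: n }, worker));
  return out;
}

// At most 2 of the 4 calls run concurrently; output order matches input order.
const doubled = await mapLimit([1, 2, 3, 4], 2, async (x) => x * 2);
console.log(doubled); // [ 2, 4, 6, 8 ]
```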
/**
* Take a baseline BriefEnvelope (stubbed whyMatters + stubbed lead /
* threads / signals) and enrich it with LLM output. All failures fall
* through cleanly — the envelope that comes out is always a valid
* BriefEnvelope (structure unchanged; only string/array field
* contents are substituted).
*
* @param {object} envelope
* @param {{ userId: string; sensitivity?: string }} rule
* @param {{ callLLM: Function; cacheGet: Function; cacheSet: Function }} deps
*/
export async function enrichBriefEnvelopeWithLLM(envelope, rule, deps) {
if (!envelope?.data || !Array.isArray(envelope.data.stories)) return envelope;
const stories = envelope.data.stories;
// Default to 'high' (NOT 'all') so the digest prompt and cache key
// align with what the rest of the pipeline (compose, buildDigest,
// cache, log) treats undefined-sensitivity rules as. Mismatched
// defaults would (a) mislead personalisation — the prompt would say
// "Reader sensitivity level: all" while the actual brief contains
// only critical/high stories — and (b) bust the cache for legacy
// rules vs explicit-'all' rules that should share entries. See PR
// #3387 review (P3).
const sensitivity = rule?.sensitivity ?? 'high';
// Per-story enrichment — whyMatters AND description in parallel
// per story (two LLM calls) but bounded across stories.
const enrichedStories = await mapLimit(stories, WHY_MATTERS_CONCURRENCY, async (story) => {
const [why, desc] = await Promise.all([
generateWhyMatters(story, deps),
generateStoryDescription(story, deps),
]);
if (!why && !desc) return story;
return {
...story,
...(why ? { whyMatters: why } : {}),
...(desc ? { description: desc } : {}),
};
});
// Per-user digest prose — one call.
const prose = await generateDigestProse(rule.userId, stories, sensitivity, deps);
const digest = prose
? {
...envelope.data.digest,
lead: prose.lead,
threads: prose.threads,
signals: prose.signals,
}
: envelope.data.digest;
return {
...envelope,
data: {
...envelope.data,
digest,
stories: enrichedStories,
},
};
}