Files
worldmonitor/scripts/lib/brief-llm.mjs
Elie Habib 34dfc9a451 fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.

CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.

Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).

Covers R1, R7.
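The extraction and rejection rules above can be sketched roughly as follows (helper names, the entity table, and the regexes are illustrative, not the real parser's):

```javascript
// Illustrative sketch of the U1 rules: strip HTML, decode common entities,
// normalize whitespace, clip to 400 chars, and reject values with no
// grounding value. Names and regexes are hypothetical; the real parser differs.
const ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ' };

function cleanCandidate(raw) {
  return raw
    .replace(/<[^>]*>/g, ' ')                                        // HTML-strip
    .replace(/&(amp|lt|gt|quot|apos|nbsp);/g, (_, e) => ENTITIES[e]) // entity-decode
    .replace(/\s+/g, ' ')                                            // whitespace-normalize
    .trim();
}

function pickDescription(candidates, headline) {
  // Pick the longest non-empty candidate after cleaning.
  const cleaned = candidates.map(cleanCandidate).filter(Boolean);
  if (cleaned.length === 0) return '';
  const best = cleaned.reduce((a, b) => (b.length > a.length ? b : a), '');
  const clipped = best.slice(0, 400);                                // 400-char cap
  const norm = (x) => x.trim().toLowerCase().replace(/\s+/g, ' ');
  // Reject: too short after strip, or normalize-equal to the headline.
  // Downstream consumers treat '' as "fall back to the cleaned headline".
  if (clipped.length < 40 || norm(clipped) === norm(headline)) return '';
  return clipped;
}
```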

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).

Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.

Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.

Covers R2, R6.
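A minimal sketch of the inclusion gate as described in this commit (field set and helper shape are illustrative; note that a later commit in this PR revises the rule to always write the field):

```javascript
// Illustrative sketch of the U2 gate: only append description when
// non-empty, so old rows and feeds without a description keep returning
// undefined on HGETALL. Field names are hypothetical stand-ins.
function buildStoryTrackHsetFields(item) {
  const fields = {
    title: item.title,
    source: item.source,
    lastSeen: String(item.lastSeen),
  };
  // Additive — no key version bump needed.
  if (item.description && item.description.trim().length > 0) {
    fields.description = item.description;
  }
  return fields;
}
```

Keeping the helper pure (no Redis client in scope) is what makes the gate unit-testable, as the commit notes.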

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:

- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
  ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
  paired 1:1 with headlines for prompt grounding. Empty array is today's
  headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).

Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.

Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today's behavior is preserved (R6).

This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.

Covers R4 (upstream half), R6.
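The preference rule can be sketched as (the signature is illustrative, not the actual brief-compose.mjs adapter):

```javascript
// Illustrative sketch of the U4 fallback: prefer the real RSS description
// off the story:track:v1 row; fall back to the cleaned headline when the
// row has no description (old rows, body-less feeds) — the R6 rule.
function pickUpstreamDescription(track, cleanedHeadline) {
  const d = typeof track.description === 'string' ? track.description.trim() : '';
  return d.length > 0 ? d : cleanedHeadline;
}
```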

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).

Changes:

- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
  between the metadata block and the "One editorial sentence" instruction
  when description is non-empty AND not normalise-equal to the headline.
  Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).

- sanitizeStoryForPrompt extended to cover `description`. Closes the
  asymmetry where whyMatters was sanitised and description wasn't —
  untrusted RSS bodies now flow through the same injection-marker
  neutraliser before prompt interpolation. generateStoryDescription wraps
  the story in sanitizeStoryForPrompt before calling the builder,
  matching generateWhyMatters.

- Four cache prefixes bumped atomically to evict pre-grounding rows:
    scripts/lib/brief-llm.mjs:
      brief:llm:description:v1 → v2  (Railway, description path)
      brief:llm:whymatters:v2 → v3   (Railway, whyMatters fallback)
    api/internal/brief-why-matters.ts:
      brief:llm:whymatters:v6 → v7                (edge, primary)
      brief:llm:whymatters:shadow:v4 → shadow:v5  (edge, shadow)
  hashBriefStory already includes description in the 6-field material
  (v5 contract) so identity naturally drifts; the prefix bump is the
  belt-and-braces that guarantees a clean cold-start on first tick.

- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
  Covers Context-line injection, empty/dup-of-headline rejection,
  400-char clip, sanitisation of adversarial descriptions, v2 write,
  and legacy-v1 row dark (forced cold-start).

Covers R4 + new sanitisation requirement.

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.

Three coordinated changes:

1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
   through sanitizeForPrompt (same trust treatment as geoContext — bodies
   are untrusted RSS text), clips to 400 chars, and pads to the headlines
   length so pair-wise identity is stable.

2. buildArticlePrompts accepts optional bodies and interleaves a
   `    Context: <body>` line under each numbered headline that has a
   non-empty body. Skipped in translate mode (headline[0]-only) and when
   all bodies are empty — yielding a byte-identical prompt to pre-U6
   for every current caller (R6 preserved).

3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
   (produced from headline-only prompts) cold-start cleanly. Extends
   canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
   bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
   avoid colliding with `:brief:` when pattern-matching keys. Translate
   mode is headline[0]-only and intentionally does not shift on bodies.

Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.

New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.

Covers R3, R4.
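A rough sketch of the key shape described in point 3, using a stand-in djb2 hash instead of the project's real hash (all names and the exact key layout are illustrative):

```javascript
// Illustrative sketch of the U6 cache-key extension: pair-wise
// (headline, body) canonical sort with a body tie-break, plus a
// `:bd<hash>` segment that only appears when any body is non-empty.
function djb2(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) h = ((h * 33) ^ str.charCodeAt(i)) >>> 0;
  return h.toString(16);
}

function buildSummaryCacheKey(headlines, bodies = []) {
  // Pair each headline with its body BEFORE sorting (missing bodies → '').
  const pairs = headlines.map((h, i) => [h, (bodies[i] ?? '').slice(0, 400)]);
  // Canonical order: sort by headline, tie-break on body so duplicate
  // headlines produce stable order across runs.
  pairs.sort((a, b) => a[0].localeCompare(b[0]) || a[1].localeCompare(b[1]));
  const base = `summary:v6:${djb2(pairs.map((p) => p[0]).join('\n'))}`;
  const anyBody = pairs.some((p) => p[1].length > 0);
  // `:bd` (not `:b`) avoids colliding with `:brief:` in key-pattern scans.
  return anyBody ? `${base}:bd${djb2(pairs.map((p) => p[1]).join('\n'))}` : base;
}
```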

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
  word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
  currentHeadlines; passed as options.bodies to generateSummary so the
  server-side SummarizeArticle LLM grounds on the article body.

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
  → generateSummary → runApiChain → tryApiProvider; cache key now includes
  bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
  to /api/news/v1/summarize-article so the MCP tool surface is no longer
  starved.

Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
  a ~200-char truncated snippet line under each story; HTML formatDigestHtml
  renders a dim-grey description div between title and meta. Both gated
  on non-empty description (R6 — empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
  description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
  includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
  NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
  payloads render a `> <snippet>` context line under the title. When off
  or payload.description absent, output is byte-identical to pre-U7.

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
  @notification-source tags on every producer, rejects `description:` in
  domain-origin payload blocks, and verifies the relay codepath gates
  snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
  alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).

Covers R5, R6.
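The ~200-char word-boundary truncation shared in spirit by the dashboard and email surfaces might look like this (a sketch only — the commit explicitly defers promoting a shared truncateForDisplay helper, so this shows the idea, not the real code):

```javascript
// Illustrative word-boundary truncation (~200 chars) for display surfaces.
// Assumption: back up to the last space when one exists, then append an
// ellipsis; short inputs pass through untouched.
function truncateForDisplay(text, max = 200) {
  const t = text.trim();
  if (t.length <= max) return t;
  const cut = t.slice(0, max);
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```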

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:

- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
  subsection tracing parser → story:track:v1.description → NewsItem.snippet
  → brief / SummarizeArticle / dashboard / email / relay / MCP, with the
  empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
  to match the v6 bump from U6. Without this, the summary cache bump would
  silently regress CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
  simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.

Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
  headlines identical but bodies different now invalidate in-flight summaries.
  Without this, switching renderItems → renderClusters mid-summary let a
  grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
  single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
  position in light-sanitized headlines while findIndex looked up the
  full-sanitized array — any headline that sanitizeHeadlines emptied
  mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
  client builds keys from RAW bodies while server sanitizes first. The
  keys diverge on injection content, which would silently miss the
  server's authoritative cache every call.

Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
  instead of a literal "somehash", so a bug where the reader still queried
  the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
  the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
  CDATA test and adds a new test documenting the malformed-]]> fallback
  behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
  headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
  headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
  and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
  grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
  existing TAG_REGEX_CACHE pattern.

Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.
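The zip-sanitize-filter-dedup pass described in the second P1 fix can be sketched as (sanitizeHeadline here is a trivial stand-in for the real sanitiser):

```javascript
// Illustrative single-pass re-pairing: bodies stay attached to their
// headline through sanitisation and dedup, so a headline the sanitiser
// empties can no longer mispair every subsequent body.
function sanitizeHeadline(h) {
  return h.replace(/[<>]/g, '').trim(); // stand-in; the real sanitiser is richer
}

function pairAndDedup(headlines, bodies) {
  const seen = new Set();
  const out = [];
  for (let i = 0; i < headlines.length; i++) {
    const h = sanitizeHeadline(headlines[i]);
    if (!h) continue;            // emptied headline drops WITH its body
    if (seen.has(h)) continue;   // dedup keeps the first occurrence's body
    seen.add(h);
    out.push({ headline: h, body: bodies[i] ?? '' });
    if (out.length === 5) break; // top-5 cap, per the handler description
  }
  return out;
}
```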

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.

When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), declining to
change. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection the argument
was over-defending the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
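A minimal sketch of the revised always-write rule (field set and helper shape are illustrative):

```javascript
// Illustrative revision of the U2 helper: `description` is written on
// EVERY mention — empty string when the current item has no body — so an
// empty value overwrites any prior mention's body and the row stays
// authoritative for the current cycle. Consumers keep treating '' as
// "fall back to the cleaned headline" (R6).
function buildStoryTrackHsetFields(item) {
  return {
    title: item.title,
    source: item.source,
    lastSeen: String(item.lastSeen),
    description: (item.description ?? '').trim(), // unconditional write
  };
}
```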
2026-04-24 16:25:14 +04:00

563 lines
23 KiB
JavaScript

// Phase 3b: LLM enrichment for the WorldMonitor Brief envelope.
//
// Substitutes the stubbed `whyMatters` per story and the stubbed
// executive summary (`digest.lead` / `digest.threads` / `digest.signals`)
// with Gemini 2.5 Flash output via the existing OpenRouter-backed
// callLLM chain. The LLM provider is pinned to openrouter by
// skipProviders:['ollama','groq'] so the brief's editorial voice
// stays on one model across environments.
//
// Deliberately:
// - Pure parse/build helpers are exported for testing without IO.
// - Cache layer is parameterised (cacheGet / cacheSet) so tests use
// an in-memory stub and production uses Upstash.
// - Any failure (null LLM result, parse error, cache hiccup) falls
// through to the original stub — the brief must always ship.
//
// Cache semantics:
// - brief:llm:whymatters:v1:{storyHash} — 24h, shared across users.
// whyMatters is editorial global-stakes commentary, not user
// personalisation, so per-story caching collapses N×U LLM calls
// to N.
// - brief:llm:digest:v1:{userId}:{poolHash} — 4h, per user.
// The executive summary IS personalised to a user's sensitivity
// and surfaced story pool, so cache keys include a hash of both.
// 4h balances cost vs freshness — hourly cron pays at most once
// per 4 ticks per user.
import { createHash } from 'node:crypto';
import {
  WHY_MATTERS_SYSTEM,
  buildWhyMattersUserPrompt,
  hashBriefStory,
  parseWhyMatters,
} from '../../shared/brief-llm-core.js';
import { sanitizeForPrompt } from '../../server/_shared/llm-sanitize.js';
/**
 * Sanitize the story fields that flow into buildWhyMattersUserPrompt and
 * buildStoryDescriptionPrompt. Mirrors
 * server/worldmonitor/intelligence/v1/brief-why-matters-prompt.ts
 * sanitizeStoryFields — the legacy Railway fallback path must apply the
 * same defense as the analyst endpoint, since this is exactly what runs
 * when the endpoint misses / returns null / throws.
 *
 * `description` is included because the RSS-description fix (2026-04-24)
 * now threads untrusted article bodies into the description prompt as
 * grounding context. Without sanitising it, a hostile feed's
 * `<description>` is an unsanitised injection vector — the asymmetry with
 * whyMatters (already sanitised) was a latent bug, fixed here.
 *
 * Kept local (not promoted to brief-llm-core.js) because llm-sanitize.js
 * only lives in server/_shared and the edge endpoint already sanitizes
 * before its own buildWhyMattersUserPrompt call.
 *
 * @param {{ headline?: string; source?: string; threatLevel?: string; category?: string; country?: string; description?: string }} story
 */
function sanitizeStoryForPrompt(story) {
  return {
    headline: sanitizeForPrompt(story.headline ?? ''),
    source: sanitizeForPrompt(story.source ?? ''),
    threatLevel: sanitizeForPrompt(story.threatLevel ?? ''),
    category: sanitizeForPrompt(story.category ?? ''),
    country: sanitizeForPrompt(story.country ?? ''),
    description: sanitizeForPrompt(story.description ?? ''),
  };
}
// Re-export for backcompat with existing tests / callers.
export { WHY_MATTERS_SYSTEM, hashBriefStory, parseWhyMatters };
export const buildWhyMattersPrompt = buildWhyMattersUserPrompt;
// ── Tunables ───────────────────────────────────────────────────────────────
const WHY_MATTERS_TTL_SEC = 24 * 60 * 60;
const DIGEST_PROSE_TTL_SEC = 4 * 60 * 60;
const STORY_DESCRIPTION_TTL_SEC = 24 * 60 * 60;
const WHY_MATTERS_CONCURRENCY = 5;
// Pin to openrouter (google/gemini-2.5-flash). Ollama isn't deployed
// in Railway and groq (llama-3.1-8b) produces noticeably less
// editorial prose than Gemini Flash.
const BRIEF_LLM_SKIP_PROVIDERS = ['ollama', 'groq'];
// ── whyMatters (per story) ─────────────────────────────────────────────────
// The pure helpers (`WHY_MATTERS_SYSTEM`, `buildWhyMattersUserPrompt` (aliased
// to `buildWhyMattersPrompt` for backcompat), `parseWhyMatters`, `hashBriefStory`)
// live in `shared/brief-llm-core.js` so the Vercel-edge endpoint
// (`api/internal/brief-why-matters.ts`) can import them without pulling in
// `node:crypto`. See the `shared/` → `scripts/shared/` mirror convention.
/**
 * Resolve a `whyMatters` sentence for one story.
 *
 * Three-layer graceful degradation:
 * 1. `deps.callAnalystWhyMatters(story)` — the analyst-context edge
 *    endpoint (brief:llm:whymatters:v7 cache lives there). Preferred.
 * 2. Legacy direct-Gemini chain: cacheGet (v3) → callLLM → cacheSet.
 *    Runs whenever the analyst call is missing, returns null, or throws.
 * 3. Caller (enrichBriefEnvelopeWithLLM) uses the baseline stub if
 *    this function returns null.
 *
 * Returns null on all-layer failure.
 *
 * @param {object} story
 * @param {{
 *   callLLM: (system: string, user: string, opts: object) => Promise<string|null>;
 *   cacheGet: (key: string) => Promise<unknown>;
 *   cacheSet: (key: string, value: unknown, ttlSec: number) => Promise<void>;
 *   callAnalystWhyMatters?: (story: object) => Promise<string|null>;
 * }} deps
 */
export async function generateWhyMatters(story, deps) {
  // Priority path: analyst endpoint. It owns its own cache and has
  // ALREADY validated the output via parseWhyMatters (gemini path) or
  // parseWhyMattersV2 (analyst path, multi-sentence). We must NOT
  // re-parse here with the single-sentence v1 parser — that silently
  // truncates v2's 2–3-sentence output to the first sentence. Trust
  // the wire shape; only reject an obviously-bad payload (empty, stub
  // echo, or length outside the legal bounds for either parser).
  if (typeof deps.callAnalystWhyMatters === 'function') {
    try {
      const analystOut = await deps.callAnalystWhyMatters(story);
      if (typeof analystOut === 'string') {
        const trimmed = analystOut.trim();
        const lenOk = trimmed.length >= 30 && trimmed.length <= 500;
        const notStub = !/^story flagged by your sensitivity/i.test(trimmed);
        if (lenOk && notStub) return trimmed;
        console.warn(
          `[brief-llm] callAnalystWhyMatters → fallback: endpoint returned out-of-bounds or stub (len=${trimmed.length})`,
        );
      } else {
        console.warn('[brief-llm] callAnalystWhyMatters → fallback: null/empty response');
      }
    } catch (err) {
      console.warn(
        `[brief-llm] callAnalystWhyMatters → fallback: ${err instanceof Error ? err.message : String(err)}`,
      );
    }
  }
  // Fallback path: legacy direct-Gemini chain with the v3 cache.
  // Bumped v2→v3 on 2026-04-24 alongside the RSS-description fix: rows
  // keyed on the prior v2 prefix were produced from headline-only prompts
  // and may reference hallucinated named actors. The prefix bump forces
  // a clean cold-start on first tick after deploy; entries expire in
  // ≤24h so the prior prefix ages out naturally without a DEL sweep.
  const key = `brief:llm:whymatters:v3:${await hashBriefStory(story)}`;
  try {
    const hit = await deps.cacheGet(key);
    if (typeof hit === 'string' && hit.length > 0) return hit;
  } catch { /* cache miss is fine */ }
  // Sanitize story fields before interpolating into the prompt. The analyst
  // endpoint already does this; without it the Railway fallback path was an
  // unsanitized injection vector for any future untrusted `source` / `headline`.
  const { system, user } = buildWhyMattersPrompt(sanitizeStoryForPrompt(story));
  let text = null;
  try {
    text = await deps.callLLM(system, user, {
      maxTokens: 120,
      temperature: 0.4,
      timeoutMs: 10_000,
      skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
    });
  } catch {
    return null;
  }
  const parsed = parseWhyMatters(text);
  if (!parsed) return null;
  try {
    await deps.cacheSet(key, parsed, WHY_MATTERS_TTL_SEC);
  } catch { /* cache write failures don't matter here */ }
  return parsed;
}
// ── Per-story description (replaces title-verbatim fallback) ──────────────
const STORY_DESCRIPTION_SYSTEM =
  'You are the editor of WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'Given the story attributes below, write ONE concise sentence (16–30 words) that ' +
  'describes the development itself — not why it matters, not the reader reaction. ' +
  'Editorial, serious, past/present tense, named actors where possible. Do NOT ' +
  'repeat the headline verbatim. No preamble, no quotes, no questions, no markdown, ' +
  'no hedging. One sentence only.';
/**
 * @param {{ headline: string; source: string; category: string; country: string; threatLevel: string; description?: string }} story
 * @returns {{ system: string; user: string }}
 */
export function buildStoryDescriptionPrompt(story) {
  // Grounding context: when the RSS feed carried a real description
  // (post-RSS-description fix, 2026-04-24), interpolate it as `Context:`
  // between the metadata block and the "One editorial sentence" instruction.
  // This is the actual fix for the named-actor hallucination class — the LLM
  // now has the article's body to paraphrase instead of filling role-label
  // headlines from its parametric priors. Skip when description is empty or
  // normalise-equal to the headline (no grounding value; parser already
  // filters this but the prompt builder is a second belt-and-braces check).
  const normalise = /** @param {string} x */ (x) => x.trim().toLowerCase().replace(/\s+/g, ' ');
  const rawDescription = typeof story.description === 'string' ? story.description.trim() : '';
  const contextUseful = rawDescription.length > 0
    && normalise(rawDescription) !== normalise(story.headline ?? '');
  const contextLine = contextUseful ? `Context: ${rawDescription.slice(0, 400)}` : null;
  const lines = [
    `Headline: ${story.headline}`,
    `Source: ${story.source}`,
    `Severity: ${story.threatLevel}`,
    `Category: ${story.category}`,
    `Country: ${story.country}`,
    ...(contextLine ? [contextLine] : []),
    '',
    'One editorial sentence describing what happened (not why it matters):',
  ];
  return { system: STORY_DESCRIPTION_SYSTEM, user: lines.join('\n') };
}
/**
 * Parse + validate the LLM story-description output. Rejects empty
 * responses, boilerplate preambles that slipped through the system
 * prompt, outputs that trivially echo the headline (sanity guard
 * against models that default to copying the prompt), and lengths
 * that drift far outside the prompted range.
 *
 * @param {unknown} text
 * @param {string} [headline] used to detect headline-echo drift
 * @returns {string | null}
 */
export function parseStoryDescription(text, headline) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  const match = s.match(/^[^.!?]+[.!?]/);
  const sentence = match ? match[0].trim() : s;
  if (sentence.length < 40 || sentence.length > 400) return null;
  if (typeof headline === 'string') {
    const normalise = /** @param {string} x */ (x) => x.trim().toLowerCase().replace(/\s+/g, ' ');
    // Reject outputs that are a verbatim echo of the headline — that
    // is exactly the fallback we're replacing, shipping it as
    // "LLM enrichment" would be dishonest about cache spend.
    if (normalise(sentence) === normalise(headline)) return null;
  }
  return sentence;
}
/**
 * Resolve a description sentence for one story via cache → LLM.
 * Returns null on any failure; caller falls back to the composer's
 * baseline (cleaned headline) rather than shipping with a placeholder.
 *
 * @param {object} story
 * @param {{
 *   callLLM: (system: string, user: string, opts: object) => Promise<string|null>;
 *   cacheGet: (key: string) => Promise<unknown>;
 *   cacheSet: (key: string, value: unknown, ttlSec: number) => Promise<void>;
 * }} deps
 */
export async function generateStoryDescription(story, deps) {
  // Shares hashBriefStory() with whyMatters — the key prefix
  // (`brief:llm:description:v2:`) is what separates the two cache
  // namespaces; the material is the six fields including description.
  // Bumped v1→v2 on 2026-04-24 alongside the RSS-description fix so
  // cached pre-grounding output (hallucinated named actors from
  // headline-only prompts) is evicted. hashBriefStory itself includes
  // description in the hash material, so content drift invalidates
  // naturally too — the prefix bump is belt-and-braces.
  const key = `brief:llm:description:v2:${await hashBriefStory(story)}`;
  try {
    const hit = await deps.cacheGet(key);
    if (typeof hit === 'string') {
      // Revalidate on cache hit so a pre-fix bad row (short, echo,
      // malformed) can't flow into the envelope unchecked.
      const valid = parseStoryDescription(hit, story.headline);
      if (valid) return valid;
    }
  } catch { /* cache miss is fine */ }
  // Sanitise the story BEFORE building the prompt. `description` (RSS body)
  // is untrusted input; without sanitisation, a hostile feed's
  // `<description>` would be an injection vector. The whyMatters path
  // already does this — keep the two symmetric.
  const { system, user } = buildStoryDescriptionPrompt(sanitizeStoryForPrompt(story));
  let text = null;
  try {
    text = await deps.callLLM(system, user, {
      maxTokens: 140,
      temperature: 0.4,
      timeoutMs: 10_000,
      skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
    });
  } catch {
    return null;
  }
  const parsed = parseStoryDescription(text, story.headline);
  if (!parsed) return null;
  try {
    await deps.cacheSet(key, parsed, STORY_DESCRIPTION_TTL_SEC);
  } catch { /* ignore */ }
  return parsed;
}
// ── Digest prose (per user) ────────────────────────────────────────────────
const DIGEST_PROSE_SYSTEM =
  'You are the chief editor of WorldMonitor Brief. Given a ranked list of ' +
  "today's top stories for a reader, produce EXACTLY this JSON and nothing " +
  'else (no markdown, no code fences, no preamble):\n' +
  '{\n' +
  '  "lead": "<2–3 sentence executive summary, editorial tone, references ' +
  'the most important 1–2 threads, addresses the reader in the third person>",\n' +
  '  "threads": [\n' +
  '    { "tag": "<one-word editorial category e.g. Energy, Diplomacy, Climate>", ' +
  '"teaser": "<one sentence describing what is developing>" }\n' +
  '  ],\n' +
  '  "signals": ["<forward-looking imperative phrase, <=14 words>"]\n' +
  '}\n' +
  'Threads: 3–6 items reflecting actual clusters in the stories. ' +
  'Signals: 2–4 items, forward-looking.';
/**
* @param {Array<{ headline: string; threatLevel: string; category: string; country: string; source: string }>} stories
* @param {string} sensitivity
* @returns {{ system: string; user: string }}
*/
export function buildDigestPrompt(stories, sensitivity) {
const lines = stories.slice(0, 12).map((s, i) => {
const n = String(i + 1).padStart(2, '0');
return `${n}. [${s.threatLevel}] ${s.headline} · ${s.category} · ${s.country} · ${s.source}`;
});
const user = [
`Reader sensitivity level: ${sensitivity}`,
'',
"Today's surfaced stories (ranked):",
...lines,
].join('\n');
return { system: DIGEST_PROSE_SYSTEM, user };
}
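For illustration, one ranked story line renders roughly like this — a sketch with fabricated story fields, assuming a ` · ` separator between every metadata field:

```javascript
// Sketch: the ranked-line shape fed to the digest prompt. The story below
// is invented sample data, not real feed output.
const sample = [
  { headline: 'Grid operator warns of winter shortfall', threatLevel: 'HIGH',
    category: 'Energy', country: 'DE', source: 'Reuters' },
];
const lines = sample.slice(0, 12).map((s, i) => {
  const n = String(i + 1).padStart(2, '0');
  return `${n}. [${s.threatLevel}] ${s.headline} · ${s.category} · ${s.country} · ${s.source}`;
});
// lines[0]: '01. [HIGH] Grid operator warns of winter shortfall · Energy · DE · Reuters'
```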
/**
* Strict shape check for a parsed digest-prose object. Used by BOTH
* parseDigestProse (fresh LLM output) AND generateDigestProse's
* cache-hit path, so a bad row written under an older/buggy version
* can't poison the envelope at SETEX time. Returns a **normalised**
* copy of the object on success, null on any shape failure — never
* returns the caller's object by reference so downstream writes
* can't observe internal state.
*
* @param {unknown} obj
* @returns {{ lead: string; threads: Array<{tag:string;teaser:string}>; signals: string[] } | null}
*/
export function validateDigestProseShape(obj) {
if (!obj || typeof obj !== 'object' || Array.isArray(obj)) return null;
const lead = typeof obj.lead === 'string' ? obj.lead.trim() : '';
if (lead.length < 40 || lead.length > 800) return null;
const rawThreads = Array.isArray(obj.threads) ? obj.threads : [];
const threads = rawThreads
.filter((t) => t && typeof t.tag === 'string' && typeof t.teaser === 'string')
.map((t) => ({
tag: t.tag.trim().slice(0, 40),
teaser: t.teaser.trim().slice(0, 220),
}))
.filter((t) => t.tag.length > 0 && t.teaser.length > 0)
.slice(0, 6);
if (threads.length < 1) return null;
// The prompt instructs the model to produce signals of "<=14 words,
// forward-looking imperative phrase". Enforce both a word cap (with
// a small margin of 4 words for model drift and compound phrases)
// and a byte cap — a 30-word "signal" would render as a second
// paragraph on the signals page, breaking visual rhythm. Previously
// only the byte cap was enforced, allowing ~40-word signals to
// sneak through when the model ignored the word count.
const rawSignals = Array.isArray(obj.signals) ? obj.signals : [];
const signals = rawSignals
.filter((x) => typeof x === 'string')
.map((x) => x.trim())
.filter((x) => {
if (x.length === 0 || x.length >= 220) return false;
const words = x.split(/\s+/).filter(Boolean).length;
return words <= 18;
})
.slice(0, 6);
return { lead, threads, signals };
}
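The dual cap on signals can be seen in isolation. A minimal sketch of the same gate (`keepSignal` is an illustrative name, not exported by this module): 14 prompt words plus a 4-word drift margin gives the 18-word ceiling, alongside the length ceiling.

```javascript
// Sketch: a signal survives only if it is non-empty, under 220 chars,
// and at most 18 words (14 from the prompt + 4 words of drift margin).
const keepSignal = (x) => {
  if (typeof x !== 'string') return false;
  const s = x.trim();
  if (s.length === 0 || s.length >= 220) return false;
  return s.split(/\s+/).filter(Boolean).length <= 18;
};
```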
/**
* @param {unknown} text
* @returns {{ lead: string; threads: Array<{tag:string;teaser:string}>; signals: string[] } | null}
*/
export function parseDigestProse(text) {
if (typeof text !== 'string') return null;
let s = text.trim();
if (!s) return null;
// Defensive: strip common wrappings the model sometimes inserts
// despite the explicit system instruction.
s = s.replace(/^```(?:json)?\s*/i, '').replace(/\s*```$/, '').trim();
let obj;
try {
obj = JSON.parse(s);
} catch {
return null;
}
return validateDigestProseShape(obj);
}
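The defensive fence-strip handles a model that wraps its JSON despite the system instruction. A quick self-contained sketch of the same regex pair (`stripFences` is an illustrative name):

```javascript
// Sketch: remove a leading ```json (or bare ```) fence and a trailing
// ``` fence, then trim — the payload between them is left untouched.
const stripFences = (s) =>
  s.trim().replace(/^```(?:json)?\s*/i, '').replace(/\s*```$/, '').trim();
```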
/**
* Cache key for digest prose. MUST cover every field the LLM sees,
* in the order it sees them — anything less and we risk returning
* pre-computed prose for a materially different prompt (e.g. the
* same stories re-ranked, or with corrected category/country
* metadata). The old "sort + headline|severity" hash was explicitly
* about cache-hit rate; that optimisation is the wrong tradeoff for
* an editorial product whose correctness bar is "matches the email".
*
* v2 key space so pre-fix cache rows (under the looser key) are
* ignored on rollout — a one-tick cost to pay for clean semantics.
*/
function hashDigestInput(userId, stories, sensitivity) {
// Canonicalise as JSON of the fields the prompt actually references,
// in the prompt's ranked order. Stable stringification via an array
// of tuples keeps field ordering deterministic without relying on
// JS object-key iteration order.
const material = JSON.stringify([
sensitivity ?? '',
...stories.slice(0, 12).map((s) => [
s.headline ?? '',
s.threatLevel ?? '',
s.category ?? '',
s.country ?? '',
s.source ?? '',
]),
]);
const h = createHash('sha256').update(material).digest('hex').slice(0, 16);
return `${userId}:${sensitivity}:${h}`;
}
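The order-sensitivity the comment insists on can be demonstrated directly. A self-contained sketch (field list trimmed to two for brevity; `digestHash` is an illustrative name):

```javascript
import { createHash } from 'node:crypto';

// Sketch: canonical material as an array of tuples in ranked order — the
// same stories hashed after a re-rank yield a different key, while the
// same input is deterministic.
const digestHash = (stories) => {
  const material = JSON.stringify(
    stories.map((s) => [s.headline ?? '', s.threatLevel ?? '']),
  );
  return createHash('sha256').update(material).digest('hex').slice(0, 16);
};
```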
/**
* Resolve the digest prose object via cache → LLM.
* @param {string} userId
* @param {Array} stories
* @param {string} sensitivity
* @param {object} deps — { callLLM, cacheGet, cacheSet }
*/
export async function generateDigestProse(userId, stories, sensitivity, deps) {
// v2 key: see hashDigestInput() comment. Full-prompt hash + strict
// shape validation on every cache hit.
const key = `brief:llm:digest:v2:${hashDigestInput(userId, stories, sensitivity)}`;
try {
const hit = await deps.cacheGet(key);
// CRITICAL: re-run the shape validator on cache hits. Without
// this, a bad row (written under an older buggy code path, or
// partial write, or tampered Redis) flows straight into
// envelope.data.digest and the envelope later fails
// assertBriefEnvelope() at the /api/brief render boundary. The
// user's brief URL then 404s / expired-pages. Treat a
// shape-failed hit the same as a miss — re-LLM and overwrite.
if (hit) {
const validated = validateDigestProseShape(hit);
if (validated) return validated;
}
} catch { /* cache miss fine */ }
const { system, user } = buildDigestPrompt(stories, sensitivity);
let text = null;
try {
text = await deps.callLLM(system, user, {
maxTokens: 700,
temperature: 0.4,
timeoutMs: 15_000,
skipProviders: BRIEF_LLM_SKIP_PROVIDERS,
});
} catch {
return null;
}
const parsed = parseDigestProse(text);
if (!parsed) return null;
try {
await deps.cacheSet(key, parsed, DIGEST_PROSE_TTL_SEC);
} catch { /* ignore */ }
return parsed;
}
// ── Envelope enrichment ────────────────────────────────────────────────────
/**
* Bounded-concurrency map. Preserves input order and length. Doesn't
* short-circuit on individual failures — fn is expected to return a
* sentinel (null) on error and the caller decides; if fn throws anyway,
* the input item is kept as-is so the output stays aligned.
*/
async function mapLimit(items, limit, fn) {
if (!Array.isArray(items) || items.length === 0) return [];
const n = Math.min(Math.max(1, limit), items.length);
const out = new Array(items.length);
let next = 0;
async function worker() {
while (true) {
const idx = next++;
if (idx >= items.length) return;
try {
out[idx] = await fn(items[idx], idx);
} catch {
out[idx] = items[idx];
}
}
}
await Promise.all(Array.from({ length: n }, worker));
return out;
}
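Usage-wise, the function behaves like Promise.all with a worker pool over a shared cursor. A trimmed, self-contained sketch of the same pattern (`pooledMap` is an illustrative name, copied inline so the example runs on its own):

```javascript
// Sketch: N workers pull indices from one shared cursor; results are
// written by index, so output order matches input order regardless of
// which worker finishes first.
async function pooledMap(items, limit, fn) {
  const out = new Array(items.length);
  let next = 0;
  const worker = async () => {
    for (let idx = next++; idx < items.length; idx = next++) {
      out[idx] = await fn(items[idx], idx);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(Math.max(1, limit), items.length) }, worker),
  );
  return out;
}
```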
/**
* Take a baseline BriefEnvelope (stubbed whyMatters + stubbed lead /
* threads / signals) and enrich it with LLM output. All failures fall
* through cleanly — the envelope that comes out is always a valid
* BriefEnvelope (structure unchanged; only string/array field
* contents are substituted).
*
* @param {object} envelope
* @param {{ userId: string; sensitivity?: string }} rule
* @param {{ callLLM: Function; cacheGet: Function; cacheSet: Function }} deps
*/
export async function enrichBriefEnvelopeWithLLM(envelope, rule, deps) {
if (!envelope?.data || !Array.isArray(envelope.data.stories)) return envelope;
const stories = envelope.data.stories;
const sensitivity = rule?.sensitivity ?? 'all';
// Per-story enrichment — whyMatters AND description in parallel
// per story (two LLM calls) but bounded across stories.
const enrichedStories = await mapLimit(stories, WHY_MATTERS_CONCURRENCY, async (story) => {
const [why, desc] = await Promise.all([
generateWhyMatters(story, deps),
generateStoryDescription(story, deps),
]);
if (!why && !desc) return story;
return {
...story,
...(why ? { whyMatters: why } : {}),
...(desc ? { description: desc } : {}),
};
});
// Per-user digest prose — one call.
const prose = await generateDigestProse(rule.userId, stories, sensitivity, deps);
const digest = prose
? {
...envelope.data.digest,
lead: prose.lead,
threads: prose.threads,
signals: prose.signals,
}
: envelope.data.digest;
return {
...envelope,
data: {
...envelope.data,
digest,
stories: enrichedStories,
},
};
}
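The conditional-spread in the per-story merge is what keeps a failed enrichment from writing an empty field over the stub. A sketch of just that merge (`merge` is an illustrative name):

```javascript
// Sketch: only merge a field when enrichment produced a value — a null
// result leaves the original story untouched on that field, so the
// envelope's stubbed content survives partial LLM failures.
const merge = (story, why, desc) => ({
  ...story,
  ...(why ? { whyMatters: why } : {}),
  ...(desc ? { description: desc } : {}),
});
```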