Files
worldmonitor/shared/brief-llm-core.js
Elie Habib ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).
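The v2 acceptance rules in (2) can be sketched as a predicate. This is a condensed, illustrative version only; the real `parseWhyMattersV2` (shipped in brief-llm-core.js) also strips surrounding quotes, rejects the stub echo, and checks a longer preamble list:

```javascript
// Condensed sketch of the parseWhyMattersV2 acceptance rules; illustrative,
// not the shipped implementation.
function acceptsV2(text) {
  const s = text.trim().replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  if (s.length < 100 || s.length > 500) return false;                     // 100–500 chars
  if (/^(this matters because|the importance of)/i.test(s)) return false; // preamble boilerplate
  if (/^(#|-|\*|\d+\.\s)/.test(s)) return false;                          // markdown leakage
  if (/^(situation|analysis|watch)\s*[:\-–—]/i.test(s)) return false;     // leaked section labels
  return true;
}
```

A grounded 2–3 sentence answer passes; anything under 100 chars, or anything leading with a section label like `SITUATION:`, is rejected so the caller can fall through.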

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).
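The kill-switch semantics are roughly the following; only the env var name and value come from this commit, the function shape and the `analyst` default are illustrative assumptions:

```javascript
// Hypothetical sketch of the kill switch: BRIEF_WHY_MATTERS_PRIMARY=gemini
// routes the primary path back to gemini with no redeploy. Function name
// and default are assumptions, not the shipped code.
function pickPrimaryPath(env) {
  return env.BRIEF_WHY_MATTERS_PRIMARY === 'gemini' ? 'gemini' : 'analyst';
}
```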

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — the exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.
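The replacement check can be sketched as follows; the helper name is hypothetical, while the 30–500 char bounds and the stub-echo reject come from this commit:

```javascript
// Sketch of the P1a fix: the cron trusts the endpoint's already-validated
// string and only sanity-checks it instead of re-parsing with the v1
// single-sentence parser. Helper name is illustrative.
function acceptEndpointWhyMatters(s) {
  if (typeof s !== 'string') return null;
  const t = s.trim();
  if (t.length < 30 || t.length > 500) return null;               // bounds check only
  if (/^story flagged by your sensitivity/i.test(t)) return null; // stub-echo reject
  return t; // multi-sentence v2 output passes through unchanged
}
```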

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00

// @ts-check
/**
 * Edge-safe pure helpers for the brief LLM enrichment path. Shared by:
 * - scripts/lib/brief-llm.mjs (Railway cron, Node)
 * - api/internal/brief-why-matters.ts (Vercel edge)
 *
 * No `node:*` imports. Hashing via Web Crypto (`crypto.subtle.digest`),
 * which is available in both Edge and modern Node. Everything else is
 * pure string manipulation.
 *
 * Any change here MUST be mirrored byte-for-byte to
 * `scripts/shared/brief-llm-core.js` (enforced by the shared-mirror
 * parity test; see `feedback_shared_dir_mirror_requirement`).
 */
/**
 * System prompt for the one-sentence "why this matters" enrichment.
 * Moved verbatim from scripts/lib/brief-llm.mjs so the edge endpoint
 * and the cron fallback emit the identical editorial voice.
 */
export const WHY_MATTERS_SYSTEM =
  'You are the editor of WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'For each story below, write ONE concise sentence (18–30 words) explaining the ' +
  'regional or global stakes. Editorial, impersonal, serious. No preamble ' +
  '("This matters because…"), no questions, no calls to action, no markdown, ' +
  'no quotes. One sentence only.';
/**
 * @param {{
 *   headline: string;
 *   source: string;
 *   threatLevel: string;
 *   category: string;
 *   country: string;
 * }} story
 * @returns {{ system: string; user: string }}
 */
export function buildWhyMattersUserPrompt(story) {
  const user = [
    `Headline: ${story.headline}`,
    `Source: ${story.source}`,
    `Severity: ${story.threatLevel}`,
    `Category: ${story.category}`,
    `Country: ${story.country}`,
    '',
    'One editorial sentence on why this matters:',
  ].join('\n');
  return { system: WHY_MATTERS_SYSTEM, user };
}
/**
 * Parse + validate the LLM response into a single editorial sentence.
 * Returns null when the output is obviously wrong (empty, boilerplate
 * preamble that survived stripReasoningPreamble, too short / too long).
 *
 * @param {unknown} text
 * @returns {string | null}
 */
export function parseWhyMatters(text) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  const match = s.match(/^[^.!?]+[.!?]/);
  const sentence = match ? match[0].trim() : s;
  if (sentence.length < 30 || sentence.length > 400) return null;
  if (/^story flagged by your sensitivity/i.test(sentence)) return null;
  return sentence;
}
/**
 * Deterministic 16-char hex hash of the SIX story fields that flow
 * into the whyMatters prompt (5 core + description). Cache identity
 * MUST cover every field that shapes the LLM output, or two requests
 * with the same core fields but different descriptions will share a
 * cache entry and the second caller gets prose grounded in the first
 * caller's description (P1 regression caught in PR #3269 review).
 *
 * History:
 * - pre-v3: 5 fields, sync `node:crypto.createHash`.
 * - v3: moved to Web Crypto (async), same 5 fields.
 * - v5 (with endpoint cache bump to brief:llm:whymatters:v5:):
 *   6 fields — `description` added to match the analyst path's
 *   v2 prompt which interpolates `Description: <desc>` between
 *   headline and source.
 *
 * Uses Web Crypto so the module is edge-safe. Returns a Promise because
 * `crypto.subtle.digest` is async; cron call sites are already in an
 * async context so the await is free.
 *
 * @param {{
 *   headline?: string;
 *   source?: string;
 *   threatLevel?: string;
 *   category?: string;
 *   country?: string;
 *   description?: string;
 * }} story
 * @returns {Promise<string>}
 */
export async function hashBriefStory(story) {
  const material = [
    story.headline ?? '',
    story.source ?? '',
    story.threatLevel ?? '',
    story.category ?? '',
    story.country ?? '',
    // New in v5: description is a prompt input on the analyst path,
    // so MUST be part of cache identity. Absent on legacy paths →
    // empty string → deterministic; same-story-same-description pairs
    // still collide on purpose, different descriptions don't.
    story.description ?? '',
  ].join('||');
  const bytes = new TextEncoder().encode(material);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  let hex = '';
  const view = new Uint8Array(digest);
  for (let i = 0; i < view.length; i++) {
    hex += view[i].toString(16).padStart(2, '0');
  }
  return hex.slice(0, 16);
}
// ── Analyst-path prompt v2 (multi-sentence, grounded) ──────────────────────
//
// Shadow-diff on 12 prod stories (2026-04-21) showed the v1 analyst output
// was indistinguishable from the legacy Gemini-only output: identical
// single-sentence abstraction-speak ("destabilize / systemic / sovereign
// risk repricing") with no named actors, metrics, or dates. Root cause:
// the 18–30 word cap compressed the context's specifics out of the LLM's
// response. v2 loosens to 40–70 words across 2–3 sentences and REQUIRES
// the LLM to ground at least one specific reference from the live context.
/**
 * System prompt for the analyst-path v2 (2–3 sentences, ~40–70 words,
 * grounded in a specific named actor / metric / date / place drawn
 * from the live context). Shape nudged toward the WMAnalyst chat voice
 * (SITUATION → ANALYSIS → optional WATCH) but rendered as plain prose,
 * no section labels in the output.
 */
export const WHY_MATTERS_ANALYST_SYSTEM_V2 =
  'You are the lead analyst at WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'Using the Live WorldMonitor Context AND the story, write 2–3 sentences (40–70 words total) ' +
  'on why the story matters.\n\n' +
  'STRUCTURE:\n' +
  '1. SITUATION — what is happening right now, grounded in a SPECIFIC named actor, ' +
  'metric, date, or place drawn from the context.\n' +
  '2. ANALYSIS — the structural consequence (why this forces a repricing, shifts ' +
  'the balance, triggers a cascade).\n' +
  '3. (Optional) WATCH — the threshold or indicator to track, if clear from the context.\n\n' +
  'HARD CONSTRAINTS:\n' +
  '- Total length 40–70 words across 2–3 sentences.\n' +
  '- MUST reference at least ONE specific: named person / country / organization / ' +
  'number / percentage / date / city — drawn from the context, NOT invented.\n' +
  '- No preamble ("This matters because…", "The importance of…").\n' +
  '- No markdown, no bullet points, no section labels in the output — plain prose.\n' +
  '- Editorial, impersonal, serious. No calls to action, no questions, no quotes.';
/**
 * Parse + validate the analyst-path v2 LLM response. Accepts
 * multi-sentence output (2–3 sentences), 100–500 chars. Otherwise
 * same rejection semantics as v1 (stub echo, empty) plus explicit
 * rejection of preamble boilerplate and leaked section labels.
 *
 * Returns null when the output is obviously wrong so the caller can
 * fall through to the next layer.
 *
 * @param {unknown} text
 * @returns {string | null}
 */
export function parseWhyMattersV2(text) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  // Drop surrounding quotes if the model insisted.
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  if (s.length < 100 || s.length > 500) return null;
  // Reject the stub echo (same as v1).
  if (/^story flagged by your sensitivity/i.test(s)) return null;
  // Reject common preamble the system prompt explicitly banned.
  if (/^(this matters because|the importance of|it is important|importantly,|in summary,|to summarize)/i.test(s)) {
    return null;
  }
  // Reject markdown / section-label leakage (we told it to use plain prose).
  if (/^(#|-|\*|\d+\.\s)/.test(s)) return null;
  if (/^(situation|analysis|watch)\s*[:\-–—]/i.test(s)) return null;
  return s;
}