mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description
Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.
Root cause: 18–30 word cap compressed context specifics out.
v2 loosens three dials at once so we can settle the A/B:
1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
MUST cite one specific named actor / metric / date / place from
the context. Analyst path only; gemini path stays on v1.
2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
preamble boilerplate + leaked section labels + markdown.
3. Story description plumbed through — endpoint body accepts optional
story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
Cron forwards it when upstream has one (skipped when it equals the
headline — no new signal).
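
The forwarding rule in item 3 can be sketched as follows (helper name is hypothetical; the commit specifies only "forward when upstream has one, skip when it equals the headline, ≤ 1000 chars"):

```javascript
// Hypothetical sketch of the cron's forwarding rule: send description
// only when upstream has one AND it differs from the headline (an
// identical description carries no new signal for the prompt).
function descriptionToForward(upstream) {
  const desc = (upstream.description ?? '').trim();
  if (!desc) return undefined;                      // upstream has none
  if (desc === upstream.headline) return undefined; // no new signal
  return desc.slice(0, 1000);                       // endpoint accepts <= 1000 chars
}
```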
Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.
If shadow-diff 24h after deploy still shows no delta vs gemini, the
kill switch is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no
redeploy).
Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.
* fix(brief): stop truncating v2 multi-sentence output + description in cache hash
Two P1s caught in PR #3269 review.
P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.
Fix: trust the endpoint. Replace re-parse with bounds check
(30–500 chars) + stub-echo reject. Added regression test asserting
multi-sentence output reaches the envelope unchanged.
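
The replacement validation might look like this (a sketch only — the function name and guard ordering are hypothetical; the commit specifies just the 30–500 char bounds and the stub-echo reject):

```javascript
// Hypothetical sketch of the cron-side fix: trust the endpoint's own
// parser and only sanity-check what comes back, instead of re-parsing
// with the v1 single-sentence parser (which dropped sentences 2+3).
function acceptEndpointWhyMatters(text) {
  if (typeof text !== 'string') return null;
  const s = text.trim();
  // Bounds check only (30-500 chars) -- no sentence splitting.
  if (s.length < 30 || s.length > 500) return null;
  // Reject the stub echo, same guard the real parsers use.
  if (/^story flagged by your sensitivity/i.test(s)) return null;
  return s; // multi-sentence v2 output reaches the envelope unchanged
}
```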
P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.
Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering
different-descriptions-differ and present-vs-absent-differ.
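
The collision mechanics can be illustrated with the hash material alone (a simplified sketch: the real `hashBriefStory` SHA-256s this `'||'`-joined material via Web Crypto; helper names here are illustrative):

```javascript
// Why `description` must be in the hash material: with only the 5 core
// fields, two requests differing solely in description build identical
// material, so they collide on one cache slot. The 6th field separates
// them.
const coreMaterial = (s) =>
  [s.headline ?? '', s.source ?? '', s.threatLevel ?? '', s.category ?? '', s.country ?? ''].join('||');
const v5Material = (s) => coreMaterial(s) + '||' + (s.description ?? '');

const a = { headline: 'Strait closure', source: 'Reuters', threatLevel: 'high',
  category: 'conflict', country: 'IR', description: 'first caller context' };
const b = { ...a, description: 'second caller context' };
// coreMaterial(a) === coreMaterial(b) -> collision
// v5Material(a) !== v5Material(b)     -> distinct cache identities
```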
Tests: 6083 pass. typecheck × 2 clean.
189 lines
7.7 KiB
JavaScript
// @ts-check

/**
 * Edge-safe pure helpers for the brief LLM enrichment path. Shared by:
 * - scripts/lib/brief-llm.mjs (Railway cron, Node)
 * - api/internal/brief-why-matters.ts (Vercel edge)
 *
 * No `node:*` imports. Hashing via Web Crypto (`crypto.subtle.digest`),
 * which is available in both Edge and modern Node. Everything else is
 * pure string manipulation.
 *
 * Any change here MUST be mirrored byte-for-byte to
 * `scripts/shared/brief-llm-core.js` (enforced by the shared-mirror
 * parity test; see `feedback_shared_dir_mirror_requirement`).
 */

/**
 * System prompt for the one-sentence "why this matters" enrichment.
 * Moved verbatim from scripts/lib/brief-llm.mjs so the edge endpoint
 * and the cron fallback emit the identical editorial voice.
 */
export const WHY_MATTERS_SYSTEM =
  'You are the editor of WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'For each story below, write ONE concise sentence (18–30 words) explaining the ' +
  'regional or global stakes. Editorial, impersonal, serious. No preamble ' +
  '("This matters because…"), no questions, no calls to action, no markdown, ' +
  'no quotes. One sentence only.';

/**
 * @param {{
 *   headline: string;
 *   source: string;
 *   threatLevel: string;
 *   category: string;
 *   country: string;
 * }} story
 * @returns {{ system: string; user: string }}
 */
export function buildWhyMattersUserPrompt(story) {
  const user = [
    `Headline: ${story.headline}`,
    `Source: ${story.source}`,
    `Severity: ${story.threatLevel}`,
    `Category: ${story.category}`,
    `Country: ${story.country}`,
    '',
    'One editorial sentence on why this matters:',
  ].join('\n');
  return { system: WHY_MATTERS_SYSTEM, user };
}

/**
 * Parse + validate the LLM response into a single editorial sentence.
 * Returns null when the output is obviously wrong (empty, boilerplate
 * preamble that survived stripReasoningPreamble, too short / too long).
 *
 * @param {unknown} text
 * @returns {string | null}
 */
export function parseWhyMatters(text) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  const match = s.match(/^[^.!?]+[.!?]/);
  const sentence = match ? match[0].trim() : s;
  if (sentence.length < 30 || sentence.length > 400) return null;
  if (/^story flagged by your sensitivity/i.test(sentence)) return null;
  return sentence;
}
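
// Usage sketch (illustrative, not part of the module): quotes are
// stripped, only the first sentence survives, and out-of-bounds
// lengths reject to null:
//
//   parseWhyMatters('"Riyadh\u2019s output cut tightens crude supply into Q3. Extra sentence."')
//     → 'Riyadh’s output cut tightens crude supply into Q3.'
//   parseWhyMatters('Too short.')  → null (under 30 chars)
//   parseWhyMatters(42)            → null (not a string)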

/**
 * Deterministic 16-char hex hash of the SIX story fields that flow
 * into the whyMatters prompt (5 core + description). Cache identity
 * MUST cover every field that shapes the LLM output, or two requests
 * with the same core fields but different descriptions will share a
 * cache entry and the second caller gets prose grounded in the first
 * caller's description (P1 regression caught in PR #3269 review).
 *
 * History:
 * - pre-v3: 5 fields, sync `node:crypto.createHash`.
 * - v3: moved to Web Crypto (async), same 5 fields.
 * - v5 (with endpoint cache bump to brief:llm:whymatters:v5:):
 *   6 fields — `description` added to match the analyst path's
 *   v2 prompt which interpolates `Description: <desc>` between
 *   headline and source.
 *
 * Uses Web Crypto so the module is edge-safe. Returns a Promise because
 * `crypto.subtle.digest` is async; cron call sites are already in an
 * async context so the await is free.
 *
 * @param {{
 *   headline?: string;
 *   source?: string;
 *   threatLevel?: string;
 *   category?: string;
 *   country?: string;
 *   description?: string;
 * }} story
 * @returns {Promise<string>}
 */
export async function hashBriefStory(story) {
  const material = [
    story.headline ?? '',
    story.source ?? '',
    story.threatLevel ?? '',
    story.category ?? '',
    story.country ?? '',
    // New in v5: description is a prompt input on the analyst path,
    // so MUST be part of cache identity. Absent on legacy paths →
    // empty string → deterministic; same-story-same-description pairs
    // still collide on purpose, different descriptions don't.
    story.description ?? '',
  ].join('||');
  const bytes = new TextEncoder().encode(material);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  let hex = '';
  const view = new Uint8Array(digest);
  for (let i = 0; i < view.length; i++) {
    hex += view[i].toString(16).padStart(2, '0');
  }
  return hex.slice(0, 16);
}
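
// Usage sketch (illustrative): two stories identical except for
// `description` now hash to different cache keys (the v5 fix);
// pre-v5 five-field hashing made these collide on one slot.
//
//   await hashBriefStory({ headline: 'H', description: 'first' })
//   await hashBriefStory({ headline: 'H', description: 'second' })
//   // → two different 16-char hex strings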

// ── Analyst-path prompt v2 (multi-sentence, grounded) ──────────────────────
//
// Shadow-diff on 12 prod stories (2026-04-21) showed the v1 analyst output
// was indistinguishable from the legacy Gemini-only output: identical
// single-sentence abstraction-speak ("destabilize / systemic / sovereign
// risk repricing") with no named actors, metrics, or dates. Root cause:
// the 18–30 word cap compressed the context's specifics out of the LLM's
// response. v2 loosens to 40–70 words across 2–3 sentences and REQUIRES
// the LLM to ground at least one specific reference from the live context.

/**
 * System prompt for the analyst-path v2 (2–3 sentences, ~40–70 words,
 * grounded in a specific named actor / metric / date / place drawn
 * from the live context). Shape nudged toward the WMAnalyst chat voice
 * (SITUATION → ANALYSIS → optional WATCH) but rendered as plain prose,
 * no section labels in the output.
 */
export const WHY_MATTERS_ANALYST_SYSTEM_V2 =
  'You are the lead analyst at WorldMonitor Brief, a geopolitical intelligence magazine. ' +
  'Using the Live WorldMonitor Context AND the story, write 2–3 sentences (40–70 words total) ' +
  'on why the story matters.\n\n' +
  'STRUCTURE:\n' +
  '1. SITUATION — what is happening right now, grounded in a SPECIFIC named actor, ' +
  'metric, date, or place drawn from the context.\n' +
  '2. ANALYSIS — the structural consequence (why this forces a repricing, shifts ' +
  'the balance, triggers a cascade).\n' +
  '3. (Optional) WATCH — the threshold or indicator to track, if clear from the context.\n\n' +
  'HARD CONSTRAINTS:\n' +
  '- Total length 40–70 words across 2–3 sentences.\n' +
  '- MUST reference at least ONE specific: named person / country / organization / ' +
  'number / percentage / date / city — drawn from the context, NOT invented.\n' +
  '- No preamble ("This matters because…", "The importance of…").\n' +
  '- No markdown, no bullet points, no section labels in the output — plain prose.\n' +
  '- Editorial, impersonal, serious. No calls to action, no questions, no quotes.';

/**
 * Parse + validate the analyst-path v2 LLM response. Accepts
 * multi-sentence output (2–3 sentences), 100–500 chars. Otherwise
 * same rejection semantics as v1 (stub echo, empty) plus explicit
 * rejection of preamble boilerplate and leaked section labels.
 *
 * Returns null when the output is obviously wrong so the caller can
 * fall through to the next layer.
 *
 * @param {unknown} text
 * @returns {string | null}
 */
export function parseWhyMattersV2(text) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  // Drop surrounding quotes if the model insisted.
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  if (s.length < 100 || s.length > 500) return null;
  // Reject the stub echo (same as v1).
  if (/^story flagged by your sensitivity/i.test(s)) return null;
  // Reject common preamble the system prompt explicitly banned.
  if (/^(this matters because|the importance of|it is important|importantly,|in summary,|to summarize)/i.test(s)) {
    return null;
  }
  // Reject markdown / section-label leakage (we told it to use plain prose).
  if (/^(#|-|\*|\d+\.\s)/.test(s)) return null;
  if (/^(situation|analysis|watch)\s*[:\-–—]/i.test(s)) return null;
  return s;
}
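
// Usage sketch (illustrative, not part of the module): a grounded
// 100–500 char multi-sentence answer passes through whole; once past
// the length gate, banned openings and leaked labels reject to null:
//
//   parseWhyMattersV2(twoGroundedSentences)            → returned unchanged
//   parseWhyMattersV2('This matters because ' + rest)  → null (banned preamble)
//   parseWhyMattersV2('SITUATION: ' + rest)            → null (leaked section label)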