mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description
Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.
Root cause: 18–30 word cap compressed context specifics out.
v2 loosens three dials at once so we can settle the A/B:
1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
MUST cite one specific named actor / metric / date / place from
the context. Analyst path only; gemini path stays on v1.
2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
preamble boilerplate + leaked section labels + markdown.
3. Story description plumbed through — endpoint body accepts optional
story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
Cron forwards it when upstream has one (skipped when it equals the
headline — no new signal).
Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.
If the shadow-diff 24h after deploy still shows no delta vs gemini, the
kill switch is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).
Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.
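The forwarding rule in point 3 can be sketched as follows. This is an illustrative sketch only: the function name `descriptionForBrief` and the exact field access are hypothetical, not the actual cron code.

```javascript
// Hypothetical sketch of the cron-side rule described above: forward
// story.description only when present and when it adds signal beyond
// the headline; clamp to the endpoint's 1000-char cap.
function descriptionForBrief(story) {
  const desc = (story.description ?? '').trim();
  if (!desc) return undefined;                                  // nothing upstream
  if (desc === (story.headline ?? '').trim()) return undefined; // no new signal
  return desc.slice(0, 1000);                                   // endpoint cap
}
```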
* fix(brief): stop truncating v2 multi-sentence output + description in cache hash
Two P1s caught in PR #3269 review.
P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.
Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.
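The replacement bounds check might look like this minimal sketch (the name `acceptEndpointWhyMatters` is illustrative; the real cron helper may be named and shaped differently):

```javascript
// Illustrative sketch of the P1a fix: instead of re-parsing with the v1
// single-sentence parser, trust the endpoint's validation and only
// bounds-check (30-500 chars) and reject the stub echo.
function acceptEndpointWhyMatters(text) {
  if (typeof text !== 'string') return null;
  const s = text.trim();
  if (s.length < 30 || s.length > 500) return null;
  if (/^story flagged by your sensitivity/i.test(s)) return null; // stub echo
  return s; // multi-sentence v2 output passes through unchanged
}
```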
P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.
Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
Tests: 6083 pass. typecheck × 2 clean.
This commit is contained in:
  7  shared/brief-llm-core.d.ts (vendored)
@@ -4,6 +4,9 @@ export interface BriefStoryHashInput {
   threatLevel?: string;
   category?: string;
   country?: string;
+  /** v5: part of cache identity so same-story + different description
+   * don't collide on cached analyst output. */
+  description?: string;
 }

 export interface BriefStoryPromptInput {
@@ -24,3 +27,7 @@ export function buildWhyMattersUserPrompt(story: BriefStoryPromptInput): {
 export function parseWhyMatters(text: unknown): string | null;

 export function hashBriefStory(story: BriefStoryHashInput): Promise<string>;
+
+// ── v2 (analyst path only) ────────────────────────────────────────────────
+export const WHY_MATTERS_ANALYST_SYSTEM_V2: string;
+export function parseWhyMattersV2(text: unknown): string | null;
@@ -69,11 +69,20 @@ export function parseWhyMatters(text) {
 }

 /**
- * Deterministic 16-char hex hash of the five story fields that flow
- * into the whyMatters prompt. Same material as the pre-v3 sync
- * implementation (`scripts/lib/brief-llm.mjs:hashBriefStory`) — a
- * fixed fixture in tests/brief-llm-core.test.mjs pins the output so a
- * future refactor cannot silently invalidate every cached entry.
+ * Deterministic 16-char hex hash of the SIX story fields that flow
+ * into the whyMatters prompt (5 core + description). Cache identity
+ * MUST cover every field that shapes the LLM output, or two requests
+ * with the same core fields but different descriptions will share a
+ * cache entry and the second caller gets prose grounded in the first
+ * caller's description (P1 regression caught in PR #3269 review).
+ *
+ * History:
+ * - pre-v3: 5 fields, sync `node:crypto.createHash`.
+ * - v3: moved to Web Crypto (async), same 5 fields.
+ * - v5 (with endpoint cache bump to brief:llm:whymatters:v5:):
+ *   6 fields — `description` added to match the analyst path's
+ *   v2 prompt which interpolates `Description: <desc>` between
+ *   headline and source.
  *
  * Uses Web Crypto so the module is edge-safe. Returns a Promise because
  * `crypto.subtle.digest` is async; cron call sites are already in an
@@ -85,6 +94,7 @@ export function parseWhyMatters(text) {
  *   threatLevel?: string;
  *   category?: string;
  *   country?: string;
+ *   description?: string;
  * }} story
  * @returns {Promise<string>}
  */
@@ -95,6 +105,11 @@ export async function hashBriefStory(story) {
     story.threatLevel ?? '',
     story.category ?? '',
     story.country ?? '',
+    // New in v5: description is a prompt input on the analyst path,
+    // so MUST be part of cache identity. Absent on legacy paths →
+    // empty string → deterministic; same-story-same-description pairs
+    // still collide on purpose, different descriptions don't.
+    story.description ?? '',
   ].join('||');
   const bytes = new TextEncoder().encode(material);
   const digest = await crypto.subtle.digest('SHA-256', bytes);
@@ -105,3 +120,69 @@ export async function hashBriefStory(story) {
   }
   return hex.slice(0, 16);
 }
+
+// ── Analyst-path prompt v2 (multi-sentence, grounded) ──────────────────────
+//
+// Shadow-diff on 12 prod stories (2026-04-21) showed the v1 analyst output
+// was indistinguishable from the legacy Gemini-only output: identical
+// single-sentence abstraction-speak ("destabilize / systemic / sovereign
+// risk repricing") with no named actors, metrics, or dates. Root cause:
+// the 18–30 word cap compressed the context's specifics out of the LLM's
+// response. v2 loosens to 40–70 words across 2–3 sentences and REQUIRES
+// the LLM to ground at least one specific reference from the live context.
+
+/**
+ * System prompt for the analyst-path v2 (2–3 sentences, ~40–70 words,
+ * grounded in a specific named actor / metric / date / place drawn
+ * from the live context). Shape nudged toward the WMAnalyst chat voice
+ * (SITUATION → ANALYSIS → optional WATCH) but rendered as plain prose,
+ * no section labels in the output.
+ */
+export const WHY_MATTERS_ANALYST_SYSTEM_V2 =
+  'You are the lead analyst at WorldMonitor Brief, a geopolitical intelligence magazine. ' +
+  'Using the Live WorldMonitor Context AND the story, write 2–3 sentences (40–70 words total) ' +
+  'on why the story matters.\n\n' +
+  'STRUCTURE:\n' +
+  '1. SITUATION — what is happening right now, grounded in a SPECIFIC named actor, ' +
+  'metric, date, or place drawn from the context.\n' +
+  '2. ANALYSIS — the structural consequence (why this forces a repricing, shifts ' +
+  'the balance, triggers a cascade).\n' +
+  '3. (Optional) WATCH — the threshold or indicator to track, if clear from the context.\n\n' +
+  'HARD CONSTRAINTS:\n' +
+  '- Total length 40–70 words across 2–3 sentences.\n' +
+  '- MUST reference at least ONE specific: named person / country / organization / ' +
+  'number / percentage / date / city — drawn from the context, NOT invented.\n' +
+  '- No preamble ("This matters because…", "The importance of…").\n' +
+  '- No markdown, no bullet points, no section labels in the output — plain prose.\n' +
+  '- Editorial, impersonal, serious. No calls to action, no questions, no quotes.';
+
+/**
+ * Parse + validate the analyst-path v2 LLM response. Accepts
+ * multi-sentence output (2–3 sentences), 100–500 chars. Otherwise
+ * same rejection semantics as v1 (stub echo, empty) plus explicit
+ * rejection of preamble boilerplate and leaked section labels.
+ *
+ * Returns null when the output is obviously wrong so the caller can
+ * fall through to the next layer.
+ *
+ * @param {unknown} text
+ * @returns {string | null}
+ */
+export function parseWhyMattersV2(text) {
+  if (typeof text !== 'string') return null;
+  let s = text.trim();
+  if (!s) return null;
+  // Drop surrounding quotes if the model insisted.
+  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
+  if (s.length < 100 || s.length > 500) return null;
+  // Reject the stub echo (same as v1).
+  if (/^story flagged by your sensitivity/i.test(s)) return null;
+  // Reject common preamble the system prompt explicitly banned.
+  if (/^(this matters because|the importance of|it is important|importantly,|in summary,|to summarize)/i.test(s)) {
+    return null;
+  }
+  // Reject markdown / section-label leakage (we told it to use plain prose).
+  if (/^(#|-|\*|\d+\.\s)/.test(s)) return null;
+  if (/^(situation|analysis|watch)\s*[:\-–—]/i.test(s)) return null;
+  return s;
+}
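A quick exercise of the v2 parser's accept/reject semantics. The validation function is reproduced from the diff above; the sample strings are invented for illustration only.

```javascript
// parseWhyMattersV2 reproduced from the diff above, exercised with
// illustrative inputs: grounded multi-sentence prose passes; preamble,
// leaked section labels, and non-strings are rejected (null).
function parseWhyMattersV2(text) {
  if (typeof text !== 'string') return null;
  let s = text.trim();
  if (!s) return null;
  // Drop surrounding quotes if the model insisted.
  s = s.replace(/^[\u201C"']+/, '').replace(/[\u201D"']+$/, '').trim();
  if (s.length < 100 || s.length > 500) return null;
  if (/^story flagged by your sensitivity/i.test(s)) return null;
  if (/^(this matters because|the importance of|it is important|importantly,|in summary,|to summarize)/i.test(s)) {
    return null;
  }
  if (/^(#|-|\*|\d+\.\s)/.test(s)) return null;
  if (/^(situation|analysis|watch)\s*[:\-–—]/i.test(s)) return null;
  return s;
}

// Invented 2-sentence sample in the target shape (~170 chars, grounded).
const grounded =
  'Brent crude jumped 8 percent on March 3 after Ankara closed the Bosphorus ' +
  'to tankers, tightening an already thin market. Refiners in Rotterdam now ' +
  'face a six-week resupply gap.';
```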