Files
worldmonitor/shared
Elie Habib ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
..