fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)

* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

  PR B of the scoring recalibration plan (docs/plans/2026-04-17-002). Builds on PR A (weight rebalance, PR #3144), which achieved Pearson 0.669. This PR targets the remaining noise in the 50-69 band, where editorials, tutorials, and domestic crime score alongside real news.

  LLM prompt upgrade (both writers):
  - scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level guidelines, content-type distinction (editorial/opinion/tutorial → info, domestic crime → info, mass-casualty → critical), and concrete counterexamples.
  - server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines added to align the second cache writer.

  Classify cache bump:
  - classify:sebuf:v1: → classify:sebuf:v2: in all three locations (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache, _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally (24h TTL). All items reclassified within 15 min of Railway deploy.

  Keyword additions (_classifier.ts):
  - HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions' (phrase patterns; no false positives on 'sanctioned 10 individuals')
  - MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword misalignment during cold-cache window

  Shadow-log v4:
  - Clean dataset for post-prompt-change recalibration. v3 rolls off via 7-day TTL.

  Deploy order: Railway first (seedClassify prewarms v2 cache immediately), then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

  🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

  Review findings on PR #3218:

  P1: classify-event.ts prompt was missing 2 counterexamples and the "Focus" line present in the relay prompt. Both writers share classify:sebuf:v2 cache, so differing prompts mean nondeterministic classification depending on which path writes first. Now both prompts have identical level guidelines and counterexamples (format differs: array vs single object, but classification logic is aligned).

  P2: Removed 3 dead phrase-pattern keywords (sanctions imposed/package/new): the existing 'sanctions' entry already substring-matches all of them and maps to the same (high, economic). Just dead code that misled readers into thinking they added coverage.

  P2: Updated stale v3 references in cache-keys.ts (doc block + exported constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
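
Illustration of the P2 dead-keyword finding: a minimal sketch, assuming the keyword classifier does a plain lowercase substring match as the review note describes. `matchKeyword` and this one-entry `HIGH_KEYWORDS` map are hypothetical stand-ins, not the contents of _classifier.ts.

```typescript
// Hypothetical, simplified matcher; the real _classifier.ts may differ.
type KeywordMap = Record<string, string>;

// Only the pre-existing entry; the three removed phrase patterns are absent.
const HIGH_KEYWORDS: KeywordMap = {
  sanctions: 'economic',
};

// Returns the first keyword found as a substring of the lowercased title.
function matchKeyword(
  title: string,
  keywords: KeywordMap,
): { keyword: string; category: string } | null {
  const lower = title.toLowerCase();
  for (const [keyword, category] of Object.entries(keywords)) {
    if (lower.includes(keyword)) return { keyword, category };
  }
  return null;
}

// Every removed phrase contains 'sanctions', so the single entry covers them.
const phraseTitles = [
  'EU agrees new sanctions on shipping firms',
  'Sanctions imposed after border clash',
  'G7 unveils sanctions package',
];
const phraseMatches = phraseTitles.map((t) => matchKeyword(t, HIGH_KEYWORDS));

// 'sanctioned' does not contain 'sanctions', so this title is not matched.
const sanctionedMatch = matchKeyword('US sanctioned 10 individuals', HIGH_KEYWORDS);
```

Under this assumption the phrase patterns could never match a title that the bare 'sanctions' entry missed, which is why removing them changes no classification.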
2026-04-25 17:14:57 +02:00 · 2026-04-20 08:05:13 +04:00
parent 84eec7f09f
commit 1581b2dd70
9 changed files with 51 additions and 19 deletions
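
For readers skimming the diff below, the classify cache bump can be sketched as follows. This is a rough sketch, not the project's code: it uses `createHash` from `node:crypto` as a stand-in for the repository's `sha256Hex` helper, and `classifyCacheKey` here is a hypothetical function whose shape is inferred from the `enrichWithAiCache` hunk.

```typescript
import { createHash } from 'node:crypto';

// Prefix values taken from the diff; v1 is the old prefix, v2 the new one.
const CLASSIFY_CACHE_PREFIX_V1 = 'classify:sebuf:v1:';
const CLASSIFY_CACHE_PREFIX_V2 = 'classify:sebuf:v2:';

// Key = prefix + first 16 hex chars of SHA-256 over the lowercased title.
function classifyCacheKey(title: string, prefix: string = CLASSIFY_CACHE_PREFIX_V2): string {
  const hash = createHash('sha256')
    .update(title.toLowerCase())
    .digest('hex')
    .slice(0, 16);
  return `${prefix}${hash}`;
}

const title = '700 killed in Sudan drone strikes';
const v2Key = classifyCacheKey(title);
const v1Key = classifyCacheKey(title, CLASSIFY_CACHE_PREFIX_V1);
const upperKey = classifyCacheKey(title.toUpperCase());
```

Because only the prefix changes, the same title yields a different key under v2, so readers of the new prefix never see entries written under the old prompt; those are left to expire on their TTL. Every writer and reader has to derive the key identically, which is why the commit updates all three locations together.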
--- a/server/_shared/cache-keys.ts
+++ b/server/_shared/cache-keys.ts
@@ -22,8 +22,9 @@ export const STORY_TRACKING_TTL_S = 172800;
 * TTL for all: 172800s (48h), refreshed each digest cycle.
 * Shadow scoring key (written by notification-relay.cjs, which owns the live
 * value — the constant here is documentation only, not imported):
- *   shadow:score-log:v3            ZSet   score=epoch_ms, member=JSON{ts,importanceScore,severity,eventType,title,source,publishedAt,corroborationCount,variant}
- *   shadow:score-log:v2            ZSet   legacy (weight rebalance PR) — self-prunes via 7d ZREMRANGEBYSCORE
+ *   shadow:score-log:v4            ZSet   score=epoch_ms, member=JSON{ts,importanceScore,severity,eventType,title,source,publishedAt,corroborationCount,variant}
+ *   shadow:score-log:v3            ZSet   legacy (weight rebalance) — self-prunes via 7d ZREMRANGEBYSCORE
+ *   shadow:score-log:v2            ZSet   legacy (stale-score fix) — self-prunes
 *   shadow:score-log:v1            ZSet   legacy (pre-PR #3069) — self-prunes
 */
 export const STORY_TRACK_KEY = (titleHash: string) => `story:track:v1:${titleHash}`;
@@ -31,10 +32,10 @@ export const STORY_SOURCES_KEY = (titleHash: string) => `story:sources:v1:${titl
 export const STORY_PEAK_KEY = (titleHash: string) => `story:peak:v1:${titleHash}`;
 export const DIGEST_ACCUMULATOR_KEY = (variant: string, lang = 'en') => `digest:accumulator:v1:${variant}:${lang}`;
 export const DIGEST_LAST_SENT_KEY = (userId: string, variant: string) => `digest:last-sent:v1:${userId}:${variant}`;
-// NOTE: notification-relay.cjs owns the live value (shadow:score-log:v3 since weight rebalance).
+// NOTE: notification-relay.cjs owns the live value (shadow:score-log:v4 since prompt upgrade).
 // This export is documentation/discoverability; changing it here does NOT affect the relay.
 // If you modify the key, also update scripts/notification-relay.cjs SHADOW_SCORE_LOG_KEY.
-export const SHADOW_SCORE_LOG_KEY = 'shadow:score-log:v3';
+export const SHADOW_SCORE_LOG_KEY = 'shadow:score-log:v4';
 export const STORY_TTL = 604800;           // 7 days — enough for sustained multi-day stories
 export const DIGEST_ACCUMULATOR_TTL = 172800; // 48h — lookback window for digest content

--- a/server/worldmonitor/intelligence/v1/_shared.ts
+++ b/server/worldmonitor/intelligence/v1/_shared.ts
@@ -9,7 +9,7 @@ import { hashString, sha256Hex } from '../../../_shared/hash';
 // ========================================================================

 export const UPSTREAM_TIMEOUT_MS = 25_000;
-const CLASSIFY_CACHE_PREFIX = 'classify:sebuf:v1:';
+const CLASSIFY_CACHE_PREFIX = 'classify:sebuf:v2:';

 // ========================================================================
 // Tier-1 country definitions (used by risk-scores + country-intel-brief)
--- a/server/worldmonitor/intelligence/v1/classify-event.ts
+++ b/server/worldmonitor/intelligence/v1/classify-event.ts
@@ -52,7 +52,22 @@ export async function classifyEvent(
 Levels: critical, high, medium, low, info
 Categories: conflict, protest, disaster, diplomatic, economic, terrorism, cyber, health, environmental, military, crime, infrastructure, tech, general

-Focus: geopolitical events, conflicts, disasters, diplomacy. Classify by real-world severity and impact.
+Guidelines for LEVEL assignment:
+- critical: Active military strikes, mass-casualty events (10+ killed), ceasefire agreements/collapses, nuclear incidents, pandemic declarations, coups, strait/waterway closures
+- high: Armed conflict updates, major diplomatic actions, sanctions packages, significant natural disasters, blockades, terrorist attacks
+- medium: Ongoing conflict analysis, economic impact reports, protest movements, regional policy changes, military exercises
+- low: Diplomatic meetings, trade discussions, humanitarian aid, election updates, peacekeeping deployments
+- info: Opinion/editorial pieces, analysis/explainer articles, historical retrospectives, lifestyle, entertainment, routine local news, tutorials
+
+Key distinction: classify by THE EVENT, not the headline's emotional tone.
+- "Guardian view on ceasefire: need real peace" → editorial, not a ceasefire → info
+- "Trump's obsession with energy" → opinion/analysis → info
+- "Man killed his estranged wife" → domestic crime, not geopolitical → info
+- "How to Crack the SAM Database" → tutorial → info
+- "700 killed in Sudan drone strikes" → mass-casualty → critical
+
+Focus: geopolitical events, conflicts, disasters, diplomacy.
+Classify by real-world event severity, not headline sentiment.

 Return: {"level":"...","category":"..."}`;

--- a/server/worldmonitor/news/v1/_classifier.ts
+++ b/server/worldmonitor/news/v1/_classifier.ts
@@ -101,6 +101,7 @@ const MEDIUM_KEYWORDS: KeywordMap = {
  'epidemic': 'health',
  'infection spread': 'health',
  'oil spill': 'environmental',
+  'ceasefire': 'diplomatic',
  'pipeline explosion': 'infrastructure',
  'blackout': 'infrastructure',
  'power outage': 'infrastructure',
@@ -119,7 +120,6 @@ const LOW_KEYWORDS: KeywordMap = {
  'talks': 'diplomatic',
  'peacekeeping': 'diplomatic',
  'humanitarian aid': 'diplomatic',
-  'ceasefire': 'diplomatic',
  'peace treaty': 'diplomatic',
  'climate change': 'environmental',
  'emissions': 'environmental',
--- a/server/worldmonitor/news/v1/list-feed-digest.ts
+++ b/server/worldmonitor/news/v1/list-feed-digest.ts
@@ -275,7 +275,7 @@ async function enrichWithAiCache(items: ParsedItem[]): Promise<void> {
  const keyMap = new Map<string, ParsedItem[]>();
  for (const item of candidates) {
    const hash = (await sha256Hex(item.title.toLowerCase())).slice(0, 16);
-    const key = `classify:sebuf:v1:${hash}`;
+    const key = `classify:sebuf:v2:${hash}`;
    const existing = keyMap.get(key) ?? [];
    existing.push(item);
    keyMap.set(key, existing);