fix(scoring): upgrade LLM classifier prompt + keywords + cache v2 (#3218)

* fix(scoring): upgrade LLM classifier prompt + keywords + cache v2

PR B of the scoring recalibration plan (docs/plans/2026-04-17-002).
Builds on PR A (weight rebalance, PR #3144), which reached a Pearson
correlation of 0.669. This PR targets the remaining noise in the 50-69
band, where editorials, tutorials, and domestic crime score alongside
real news.

LLM prompt upgrade (both writers):
- scripts/ais-relay.cjs CLASSIFY_SYSTEM_PROMPT: added per-level
  guidelines, content-type distinction (editorial/opinion/tutorial →
  info, domestic crime → info, mass-casualty → critical), and concrete
  counterexamples.
- server/worldmonitor/intelligence/v1/classify-event.ts: same guidelines
  added to align the second cache writer.
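
The counterexamples in the prompt double as a small regression fixture.
A minimal sketch, assuming a classifier is available as a plain function
from title to level; `findMismatches` and `alwaysInfo` are hypothetical
names used for illustration and are not part of this PR:

```typescript
type Level = "critical" | "high" | "medium" | "low" | "info";

// Headline -> expected level, copied from the counterexamples in the prompt.
const COUNTEREXAMPLES: ReadonlyArray<{ title: string; expected: Level }> = [
  { title: "Guardian view on ceasefire: need real peace", expected: "info" },
  { title: "Trump's obsession with energy", expected: "info" },
  { title: "Man killed his estranged wife", expected: "info" },
  { title: "How to Crack the SAM Database", expected: "info" },
  { title: "700 killed in Sudan drone strikes", expected: "critical" },
];

// Hypothetical helper: run any classifier over the fixture and return the
// titles whose level differs from the expected one.
function findMismatches(classify: (title: string) => Level): string[] {
  return COUNTEREXAMPLES
    .filter(({ title, expected }) => classify(title) !== expected)
    .map(({ title }) => title);
}

// Stand-in classifier for demonstration only; the real one calls the LLM.
const alwaysInfo = (_title: string): Level => "info";
const mismatches = findMismatches(alwaysInfo);
```

A classifier that labels everything info fails only the Sudan headline,
which is the mass-casualty case the prompt is meant to keep at critical.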

Classify cache bump:
- classify:sebuf:v1: → classify:sebuf:v2: in all three locations
  (ais-relay.cjs classifyCacheKey, list-feed-digest.ts enrichWithAiCache,
  _shared.ts CLASSIFY_CACHE_PREFIX). Old v1 entries expire naturally
  (24h TTL). All items are reclassified within 15 min of the Railway deploy.
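
Why the prefix bump invalidates every entry can be sketched as follows.
This assumes the key is the prefix plus the first 16 hex characters of
the SHA-256 of the lowercased title, as in list-feed-digest.ts; the
project uses its own sha256Hex helper, so Node's createHash here is a
stand-in, and whether ais-relay.cjs hashes identically is an assumption:

```typescript
import { createHash } from "node:crypto";

const CLASSIFY_CACHE_PREFIX_V1 = "classify:sebuf:v1:";
const CLASSIFY_CACHE_PREFIX_V2 = "classify:sebuf:v2:";

// Illustrative key builder: versioned prefix + truncated hash of the title.
function classifyCacheKey(title: string, prefix: string): string {
  const hash = createHash("sha256")
    .update(title.toLowerCase())
    .digest("hex")
    .slice(0, 16);
  return `${prefix}${hash}`;
}

const title = "700 killed in Sudan drone strikes";
const v1Key = classifyCacheKey(title, CLASSIFY_CACHE_PREFIX_V1);
const v2Key = classifyCacheKey(title, CLASSIFY_CACHE_PREFIX_V2);
```

The same title hashes to the same suffix under both prefixes, but the
full keys differ, so v2 readers never see a v1 classification and the
old entries are simply left to expire.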

Keyword additions (_classifier.ts):
- HIGH: 'sanctions imposed', 'sanctions package', 'new sanctions'
  (phrase patterns — no false positives on 'sanctioned 10 individuals')
- MEDIUM: promoted 'ceasefire' from LOW to reduce prompt/keyword
  misalignment during cold-cache window

Shadow-log v4:
- Clean dataset for post-prompt-change recalibration. v3 rolls off via
  7-day TTL.
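
The roll-off can be modelled in a few lines. A minimal sketch, assuming
entries are scored by epoch milliseconds and pruned with a 7-day
ZREMRANGEBYSCORE as the cache-keys.ts doc block describes; the helper
names and the in-memory model are illustrative, not the relay's code:

```typescript
const SHADOW_SCORE_LOG_KEY = "shadow:score-log:v4";
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

interface ShadowEntry {
  score: number; // epoch milliseconds
  member: string; // JSON payload
}

// Arguments for the Redis command that drops everything at or before the cutoff.
function pruneCommand(key: string, nowMs: number): [string, string, string, string] {
  return ["ZREMRANGEBYSCORE", key, "-inf", String(nowMs - SEVEN_DAYS_MS)];
}

// In-memory equivalent of the same prune: keep entries newer than the cutoff.
function pruneOlderThan(entries: ShadowEntry[], nowMs: number): ShadowEntry[] {
  const cutoff = nowMs - SEVEN_DAYS_MS;
  return entries.filter((entry) => entry.score > cutoff);
}

const now = 1_000_000_000_000;
const kept = pruneOlderThan(
  [
    { score: now - SEVEN_DAYS_MS - 1, member: "old" },
    { score: now - 1000, member: "recent" },
  ],
  now,
);
const command = pruneCommand(SHADOW_SCORE_LOG_KEY, now);
```

Because each version lives under its own key, nothing has to delete the
v3 set explicitly: once writes stop, its entries age out of the window.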

Deploy order: Railway first (seedClassify prewarms v2 cache immediately),
then Vercel. First ~15 min of v4 may carry stale digest-cached scores.

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(scoring): align classify-event prompt + remove dead keywords + update v4 docs

Review findings on PR #3218:

P1: The classify-event.ts prompt was missing 2 counterexamples and the
"Focus" line present in the relay prompt. Both writers share the
classify:sebuf:v2 cache, so differing prompts mean nondeterministic
classification depending on which path writes first. Both prompts now
have identical level guidelines and counterexamples (the return format
differs, array vs single object, but the classification logic is aligned).
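
A drift check between the two prompts could look like the sketch below.
It assumes guideline and counterexample lines are the ones starting with
"- ", as in the classify-event.ts prompt; the prompt strings are
abbreviated, the relay's Return line is a placeholder, and the helper
names are hypothetical:

```typescript
// Extract the bullet lines (level guidelines and counterexamples) from a prompt.
function guidelineLines(prompt: string): string[] {
  return prompt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "));
}

// Bullet lines present in the reference prompt but absent from the candidate.
function missingFrom(reference: string, candidate: string): string[] {
  const have = new Set(guidelineLines(candidate));
  return guidelineLines(reference).filter((line) => !have.has(line));
}

const shared = [
  '- "Man killed his estranged wife" → domestic crime, not geopolitical → info',
  '- "How to Crack the SAM Database" → tutorial → info',
];

// Abbreviated stand-ins for the two writers' prompts.
const relayPrompt = [...shared, "Return: (array format, placeholder)"].join("\n");
const eventPromptBefore = [shared[0], 'Return: {"level":"...","category":"..."}'].join("\n");
const eventPromptAfter = [...shared, 'Return: {"level":"...","category":"..."}'].join("\n");

const driftBefore = missingFrom(relayPrompt, eventPromptBefore);
const driftAfter = missingFrom(relayPrompt, eventPromptAfter);
```

Only the bullet lines are compared, so the differing Return formats do
not count as drift.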

P2: Removed 3 dead phrase-pattern keywords ('sanctions imposed',
'sanctions package', 'new sanctions'). The existing 'sanctions' entry
already substring-matches all of them and maps to the same
(high, economic) result, so they were dead code that misled readers
into thinking they added coverage.
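
The redundancy is easy to see with a plain substring matcher. A minimal
sketch, assuming the keyword classifier lowercases the title and checks
each key with `includes`; the one-entry map and `matchKeyword` are
illustrative, not the contents of _classifier.ts:

```typescript
type KeywordMap = Record<string, string>;

// Only the pre-existing entry; the three removed phrases are not needed.
const HIGH_KEYWORDS: KeywordMap = {
  sanctions: "economic",
};

// Hypothetical matcher: first keyword contained in the lowercased title wins.
function matchKeyword(title: string, keywords: KeywordMap): string | null {
  const lower = title.toLowerCase();
  for (const [keyword, category] of Object.entries(keywords)) {
    if (lower.includes(keyword)) return category;
  }
  return null;
}

const imposed = matchKeyword("New sanctions imposed on shipping firms", HIGH_KEYWORDS);
const packaged = matchKeyword("EU agrees sanctions package", HIGH_KEYWORDS);
const sanctioned = matchKeyword("US sanctioned 10 individuals", HIGH_KEYWORDS);
```

Both phrase headlines already match through 'sanctions', while
'sanctioned 10 individuals' does not, with or without the extra phrases.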

P2: Updated stale v3 references in cache-keys.ts (doc block + exported
constant) and shadow-score-report.mjs header to v4.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Elie Habib
2026-04-20 08:05:13 +04:00
committed by GitHub
parent 84eec7f09f
commit 1581b2dd70
9 changed files with 51 additions and 19 deletions


@@ -22,8 +22,9 @@ export const STORY_TRACKING_TTL_S = 172800;
 * TTL for all: 172800s (48h), refreshed each digest cycle.
 * Shadow scoring key (written by notification-relay.cjs, which owns the live
 * value — the constant here is documentation only, not imported):
-* shadow:score-log:v3 ZSet score=epoch_ms, member=JSON{ts,importanceScore,severity,eventType,title,source,publishedAt,corroborationCount,variant}
-* shadow:score-log:v2 ZSet legacy (weight rebalance PR) — self-prunes via 7d ZREMRANGEBYSCORE
+* shadow:score-log:v4 ZSet score=epoch_ms, member=JSON{ts,importanceScore,severity,eventType,title,source,publishedAt,corroborationCount,variant}
+* shadow:score-log:v3 ZSet legacy (weight rebalance) — self-prunes via 7d ZREMRANGEBYSCORE
+* shadow:score-log:v2 ZSet legacy (stale-score fix) — self-prunes
 * shadow:score-log:v1 ZSet legacy (pre-PR #3069) — self-prunes
 */
 export const STORY_TRACK_KEY = (titleHash: string) => `story:track:v1:${titleHash}`;
@@ -31,10 +32,10 @@ export const STORY_SOURCES_KEY = (titleHash: string) => `story:sources:v1:${titl
 export const STORY_PEAK_KEY = (titleHash: string) => `story:peak:v1:${titleHash}`;
 export const DIGEST_ACCUMULATOR_KEY = (variant: string, lang = 'en') => `digest:accumulator:v1:${variant}:${lang}`;
 export const DIGEST_LAST_SENT_KEY = (userId: string, variant: string) => `digest:last-sent:v1:${userId}:${variant}`;
-// NOTE: notification-relay.cjs owns the live value (shadow:score-log:v3 since weight rebalance).
+// NOTE: notification-relay.cjs owns the live value (shadow:score-log:v4 since prompt upgrade).
 // This export is documentation/discoverability; changing it here does NOT affect the relay.
 // If you modify the key, also update scripts/notification-relay.cjs SHADOW_SCORE_LOG_KEY.
-export const SHADOW_SCORE_LOG_KEY = 'shadow:score-log:v3';
+export const SHADOW_SCORE_LOG_KEY = 'shadow:score-log:v4';
 export const STORY_TTL = 604800; // 7 days — enough for sustained multi-day stories
 export const DIGEST_ACCUMULATOR_TTL = 172800; // 48h — lookback window for digest content


@@ -9,7 +9,7 @@ import { hashString, sha256Hex } from '../../../_shared/hash';
 // ========================================================================
 export const UPSTREAM_TIMEOUT_MS = 25_000;
-const CLASSIFY_CACHE_PREFIX = 'classify:sebuf:v1:';
+const CLASSIFY_CACHE_PREFIX = 'classify:sebuf:v2:';
 // ========================================================================
 // Tier-1 country definitions (used by risk-scores + country-intel-brief)


@@ -52,7 +52,22 @@ export async function classifyEvent(
 Levels: critical, high, medium, low, info
 Categories: conflict, protest, disaster, diplomatic, economic, terrorism, cyber, health, environmental, military, crime, infrastructure, tech, general
-Focus: geopolitical events, conflicts, disasters, diplomacy. Classify by real-world severity and impact.
+Guidelines for LEVEL assignment:
+- critical: Active military strikes, mass-casualty events (10+ killed), ceasefire agreements/collapses, nuclear incidents, pandemic declarations, coups, strait/waterway closures
+- high: Armed conflict updates, major diplomatic actions, sanctions packages, significant natural disasters, blockades, terrorist attacks
+- medium: Ongoing conflict analysis, economic impact reports, protest movements, regional policy changes, military exercises
+- low: Diplomatic meetings, trade discussions, humanitarian aid, election updates, peacekeeping deployments
+- info: Opinion/editorial pieces, analysis/explainer articles, historical retrospectives, lifestyle, entertainment, routine local news, tutorials
+Key distinction: classify by THE EVENT, not the headline's emotional tone.
+- "Guardian view on ceasefire: need real peace" → editorial, not a ceasefire → info
+- "Trump's obsession with energy" → opinion/analysis → info
+- "Man killed his estranged wife" → domestic crime, not geopolitical → info
+- "How to Crack the SAM Database" → tutorial → info
+- "700 killed in Sudan drone strikes" → mass-casualty → critical
+Focus: geopolitical events, conflicts, disasters, diplomacy.
+Classify by real-world event severity, not headline sentiment.
 Return: {"level":"...","category":"..."}`;


@@ -101,6 +101,7 @@ const MEDIUM_KEYWORDS: KeywordMap = {
 'epidemic': 'health',
 'infection spread': 'health',
 'oil spill': 'environmental',
+'ceasefire': 'diplomatic',
 'pipeline explosion': 'infrastructure',
 'blackout': 'infrastructure',
 'power outage': 'infrastructure',
@@ -119,7 +120,6 @@ const LOW_KEYWORDS: KeywordMap = {
 'talks': 'diplomatic',
 'peacekeeping': 'diplomatic',
 'humanitarian aid': 'diplomatic',
-'ceasefire': 'diplomatic',
 'peace treaty': 'diplomatic',
 'climate change': 'environmental',
 'emissions': 'environmental',


@@ -275,7 +275,7 @@ async function enrichWithAiCache(items: ParsedItem[]): Promise<void> {
 const keyMap = new Map<string, ParsedItem[]>();
 for (const item of candidates) {
 const hash = (await sha256Hex(item.title.toLowerCase())).slice(0, 16);
-const key = `classify:sebuf:v1:${hash}`;
+const key = `classify:sebuf:v2:${hash}`;
 const existing = keyMap.get(key) ?? [];
 existing.push(item);
 keyMap.set(key, existing);