fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test (#3069)

* fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test

Before this change, classified rss_alert events published by ais-relay carried a stale importanceScore. The digest computed it from the keyword-level threat before the LLM upgrade, and the relay republished that value unchanged. The shadow log (2,850 entries / 7 days) showed a Pearson correlation of 0.31 against human ratings, with zero events reaching the ≥85 critical gate. The score being measured was the keyword fallback, not the AI classification.

Fixes:
- ais-relay.cjs: recompute importanceScore locally from the post-LLM level using an exact mirror of the digest formula (SEVERITY_SCORES, SCORE_WEIGHTS, SOURCE_TIERS, formula). The published event now includes corroborationCount for downstream shadow-log enrichment.
- notification-relay.cjs: delete the duplicate shadowLogScore() call that produced ~50% near-duplicate pairs. Move the shadow log to a v2 key with JSON-encoded members carrying severity, source, corroborationCount and variant, so future calibration cycles get cleaner data.
- shadow-score-{report,rank}.mjs: parse both v2 JSON and legacy v1 string members; default to v2, override via the SHADOW_SCORE_KEY env var.
- _classifier.ts: narrow keyword additions; blockade, siege, sanction (singular) and escalation → HIGH; evacuation orders (plural) → CRITICAL.
- tests/importance-score-parity.test.mjs: extracts the tier map and formula from both the TS digest and the CJS relay sources and asserts identical output across 12 sample cases, so any future drift is caught.
- tests/relay-importance-recompute.test.mjs + notification-relay-shadow-log.test.mjs: regression tests for the publish path and single-write discipline.

Gates remain OFF. After deploy, collect 48h of fresh shadow:score-log:v2 data, re-run scripts/shadow-score-rank.mjs for calibration, then set the final IMPORTANCE_SCORE_MIN / high / critical thresholds before enabling IMPORTANCE_SCORE_LIVE=1. See docs/internal/scoringDiagnostic.md (local) for the full diagnosis.
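A minimal sketch of the relay-side recompute, in JavaScript to match ais-relay.cjs. The real SEVERITY_SCORES, SCORE_WEIGHTS and formula are not part of this message, so every number and the helpers `sourceScore`, `corroborationScore`, `digestScore` and `relayScore` below are hypothetical. What it takes from the PR is the shape of the fix: the relay re-runs the digest formula on the post-LLM level, the two copies are checked for parity, and the one intentional deviation (`?? 0` on the severity lookup, described in the review amendments) keeps an unknown level from turning the score into NaN.

```javascript
// Hypothetical tables: the real values live in the TS digest and ais-relay.cjs.
const SEVERITY_SCORES = { critical: 100, high: 75, medium: 50, low: 25 };
const SCORE_WEIGHTS = { severity: 0.6, source: 0.25, corroboration: 0.15 };
// Stand-in for shared/source-tiers.json (tier 1 = most authoritative).
const SOURCE_TIERS = { Reuters: 1, 'BBC World': 2, 'Defense One': 3, 'Hacker News': 4 };

// Tier 1 -> 100 ... tier 4 -> 25; unlisted sources are treated as tier 4 (assumption).
function sourceScore(source) {
  return (5 - (SOURCE_TIERS[source] ?? 4)) * 25;
}

// Corroboration saturates at 4 reports (hypothetical cap).
function corroborationScore(count) {
  return Math.min(count, 4) * 25;
}

// Digest-style scorer: an unknown level makes the severity lookup undefined,
// and undefined * weight poisons the whole sum with NaN.
function digestScore({ level, source, corroborationCount }) {
  return Math.round(
    SEVERITY_SCORES[level] * SCORE_WEIGHTS.severity +
      sourceScore(source) * SCORE_WEIGHTS.source +
      corroborationScore(corroborationCount) * SCORE_WEIGHTS.corroboration,
  );
}

// Relay-style scorer: same formula, re-run on the post-LLM level, with the one
// intentional deviation: `?? 0` so an unknown level contributes 0 instead of NaN.
function relayScore({ level, source, corroborationCount }) {
  return Math.round(
    (SEVERITY_SCORES[level] ?? 0) * SCORE_WEIGHTS.severity +
      sourceScore(source) * SCORE_WEIGHTS.source +
      corroborationScore(corroborationCount) * SCORE_WEIGHTS.corroboration,
  );
}

// Parity over known levels; the unknown level is checked separately.
const cases = [
  { level: 'critical', source: 'Reuters', corroborationCount: 3 },
  { level: 'high', source: 'BBC World', corroborationCount: 1 },
  { level: 'medium', source: 'Defense One', corroborationCount: 0 },
  { level: 'low', source: 'Unlisted Blog', corroborationCount: 9 },
];
for (const c of cases) {
  if (digestScore(c) !== relayScore(c)) throw new Error(`drift on ${JSON.stringify(c)}`);
}
const unknown = { level: 'bogus', source: 'Reuters', corroborationCount: 0 };
console.log(Number.isNaN(digestScore(unknown)), relayScore(unknown)); // prints: true 25
```

Because the relay copy differs only in the fallback, a parity test can compare both scorers over fixed cases and assert the unknown-level case on its own.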
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(scoring): PR #3069 review amendments: revert risky keywords + extract SOURCE_TIERS

Addresses review findings on PR #3069 (todos/193 through 204).

BLOCKING (P1):
- Revert 5 keyword additions in _classifier.ts. Review showed that `escalation`, `sanction`, `siege`, `blockade` and `evacuation orders` fire HIGH on `de-escalation`, `sanctioned`, `besieged` and `blockaded` (substring matches), and the plural `evacuation orders` is already covered by the singular. Classifier work will land in a separate PR with phrase-based rules.
- Delete the dead `digestImportanceScore` field from relay allTitles metadata (written in two places, read nowhere).

IMPORTANT (P2):
- Extract SOURCE_TIERS to shared/source-tiers.{json,cjs}, following the existing shared/rss-allowed-domains.* precedent. Dockerfile.relay already does `COPY shared/` (whole dir), so no infra change is needed. This deletes the 255-line inline duplicate from ais-relay.cjs; the TS digest imports the same JSON via resolveJsonModule. Tier-map parity is now structural.
- Simplify the parity test: tier extraction is no longer needed. SEVERITY_SCORES + SCORE_WEIGHTS + scoring-function parity is retained across 12 cases plus an unknown-level defensiveness case. Deleted a no-op regex replace (`.replace(/X/g, 'X')`). Fixed a misleading recency docstring.
- Pipeline the shadow log: ZADD + ZREMRANGEBYSCORE + belt-and-suspenders EXPIRE now go in a single Upstash /pipeline POST (~50% round-trip reduction, no billing delta).
- Bound the ZRANGE in shadow-score-report.mjs (20k cap + warn if reached).
- Bump the outbound webhook envelope v1→v2 to signal the new `corroborationCount` field on rss_alert payloads.
- Restore the rss_alert eventType gate at the shadowLogScore caller (skips the promise cost for non-rss events).
- Align the ais-relay scorer comment with reality: it has ONE intentional deviation from the digest (`?? 0` on severity for defensiveness, returning 0 vs NaN on unknown levels). Documented + tested.

P3:
- Narrow loadEnv in scripts to only UPSTASH_REDIS_REST_* (it was setting any UPPER_UNDERSCORE env var from .env.local).
- Escape markdown specials in rating-sheet.md title embeds.

Pre-existing activation blockers NOT fixed here (tracked as todos 196, 203): `/api/notify` accepts arbitrary importanceScore from authenticated Pro users, and notification channel bodies don't escape mrkdwn/Discord markup. Both must close before `IMPORTANCE_SCORE_LIVE=1`.

Net: -614 lines (more deleted than added). 26 regression assertions pass. npm run typecheck, typecheck:api and test:data all pass.

🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(scoring): mirror source-tiers.{json,cjs} into scripts/shared/

The `scripts-shared-mirror` enforcement in tests/edge-functions.test.mjs requires every *.json and *.cjs in shared/ to have a byte-identical copy in scripts/shared/ (the Railway rootDirectory=scripts deploy bundle cannot reach repo-root shared/). The previous commit added shared/source-tiers.{json,cjs} without mirroring them; CI caught it.

* fix(scoring): revert webhook envelope to v1 + log shadow-log pipeline failures

Two P1/P2 review findings on PR #3069:

P1: Bumping the webhook envelope to version: '2' was a unilateral breaking change. The other webhook producers (proactive-intelligence.mjs:407, seed-digest-notifications.mjs:736) still emit v1, so the same endpoint would receive mixed envelope versions per event type. A consumer validating `version === '1'` would break specifically on realtime rss_alert deliveries while proactive_brief and digest events kept working. Revert to '1' and document why: `corroborationCount` is an additive payload field, backwards-compatible for typical consumers; strict consumers using `additionalProperties: false` should be handled via a coordinated version bump across all producers in a separate PR.

P2: The new shadow-log /pipeline write swallowed all errors silently (no resp.ok check, no per-command error inspection), a regression from the old upstashRest() path, which at least logged non-2xx responses. Since the 48h recalibration cycle depends on shadow:score-log:v2 filling with clean data, a bad auth token or malformed pipeline body would leave operators staring at an empty ZSET with no signal. The write now logs HTTP failures and per-command pipeline errors.

* docs(scoring): fix stale v1 references + clarify two-copy source-tiers mirror

Two follow-up review findings on PR #3069:

P2: Source-tier "single source of truth" comments were outdated. PR #3069 ships TWO JSON copies (shared/source-tiers.json for Vercel edge + the main relay container, scripts/shared/source-tiers.json for Railway services using rootDirectory=scripts). Comments in server/_shared/source-tiers.ts and scripts/ais-relay.cjs now explicitly document the mirror setup and point at the two tests that enforce byte-identity: the existing scripts-shared-mirror test (tests/edge-functions.test.mjs:37-48) and a new explicit cross-check in tests/importance-score-parity.test.mjs. Adding the assertion here is belt-and-suspenders: if edge-functions.test.mjs is ever narrowed, the parity test still catches drift.

P3: Stale v1 references in shared metadata/docs. The actual writer moved to shadow:score-log:v2 in notification-relay.cjs, but server/_shared/cache-keys.ts:23-31 still documented v1 and exported the v1 string. No runtime impact (the export has zero importers; the relay uses its own local const), but misleading. Updated the doc block to explain both v1 (legacy, self-pruning) and v2 (current), bumped the constant to v2, and added a comment that notification-relay.cjs is the owner. The header comment in scripts/shadow-score-report.mjs now documents the SHADOW_SCORE_KEY override path too.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
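The report and rank scripts have to pick the key (v2 by default, SHADOW_SCORE_KEY to override) and read both member formats. A minimal sketch, assuming v2 members are JSON objects and treating anything that does not parse to a plain object as a legacy v1 string; the v1 layout is not described here, so the sketch keeps it raw. `resolveShadowKey`, `parseShadowMember` and the sample `variant` value are hypothetical, not the actual helpers in shadow-score-{report,rank}.mjs.

```javascript
// Key name from this PR; SHADOW_SCORE_KEY overrides it.
const DEFAULT_SHADOW_KEY = 'shadow:score-log:v2';

function resolveShadowKey(env) {
  return env.SHADOW_SCORE_KEY || DEFAULT_SHADOW_KEY;
}

// v2 members are JSON objects (severity, source, corroborationCount, variant, ...).
// Anything else is treated as a legacy v1 string and returned untouched.
function parseShadowMember(member) {
  try {
    const parsed = JSON.parse(member);
    if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)) {
      return { ...parsed, version: 2 };
    }
  } catch {
    // Not JSON: fall through to the legacy path.
  }
  return { version: 1, raw: member };
}

// Usage with one v2 member and one legacy member ('full' is a hypothetical variant).
const demo = [
  JSON.stringify({ severity: 'high', source: 'Reuters', corroborationCount: 2, variant: 'full' }),
  'legacy-v1-member',
].map(parseShadowMember);
console.log(resolveShadowKey({}), demo.map((m) => m.version));
```

Checking for a plain object, rather than accepting any successful `JSON.parse`, matters because a legacy string that happens to look like a number would otherwise be misread as a v2 member.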
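The single Upstash /pipeline POST and the error logging added in the follow-up commit can be sketched as follows. It relies on the Upstash REST convention that `POST {url}/pipeline` takes a JSON array of command arrays and answers with one `{ result }` or `{ error }` object per command, so an HTTP 200 can still hide a failed command. The retention window, the TTL, the use of a millisecond timestamp as the ZSET score, and the helper names are assumptions for illustration, not values from notification-relay.cjs. It needs Node 18+ for the global `fetch`.

```javascript
// Hypothetical retention settings; the PR does not state the real window or TTL.
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000;
const TTL_SECONDS = 8 * 24 * 60 * 60;

// One round trip: add the member, prune entries that fell out of the window,
// refresh the TTL. Assumes the ZSET score is the write time in milliseconds.
function buildShadowLogPipeline(key, member, nowMs) {
  return [
    ['ZADD', key, String(nowMs), member],
    ['ZREMRANGEBYSCORE', key, '-inf', String(nowMs - RETENTION_MS)],
    ['EXPIRE', key, String(TTL_SECONDS)],
  ];
}

// Collect per-command failures from the pipeline response array.
function pipelineErrors(commands, results) {
  return results
    .map((entry, i) => (entry && entry.error ? `${commands[i][0]}: ${entry.error}` : null))
    .filter(Boolean);
}

async function writeShadowLog(url, token, key, member, nowMs = Date.now()) {
  const commands = buildShadowLogPipeline(key, member, nowMs);
  const resp = await fetch(`${url}/pipeline`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(commands),
  });
  if (!resp.ok) {
    // Surface auth or body problems instead of silently leaving the ZSET empty.
    console.warn(`[shadow-log] pipeline HTTP ${resp.status}`);
    return;
  }
  for (const msg of pipelineErrors(commands, await resp.json())) {
    console.warn(`[shadow-log] pipeline command failed: ${msg}`);
  }
}
```

Network failures make `fetch` reject, which this sketch does not catch; the caller would need its own try/catch if a failed shadow-log write must never affect notification delivery.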
2026-04-25 17:14:57 +02:00 · 2026-04-13 21:53:21 +04:00
parent 1525518498
commit 7aa8dd1bf2
13 changed files with 1503 additions and 347 deletions
--- a/shared/source-tiers.cjs
+++ b/shared/source-tiers.cjs
@@ -0,0 +1,2 @@
+// CJS wrapper — source of truth is source-tiers.json
+module.exports = require('./source-tiers.json');
--- a/shared/source-tiers.json
+++ b/shared/source-tiers.json
@@ -0,0 +1,257 @@
+{
+  "Reuters": 1,
+  "Reuters World": 1,
+  "Reuters Business": 1,
+  "Reuters US": 1,
+  "AP News": 1,
+  "AFP": 1,
+  "Bloomberg": 1,
+  "White House": 1,
+  "State Dept": 1,
+  "Pentagon": 1,
+  "UN News": 1,
+  "CISA": 1,
+  "UK MOD": 1,
+  "IAEA": 1,
+  "WHO": 1,
+  "UNHCR": 1,
+  "MIIT (China)": 1,
+  "MOFCOM (China)": 1,
+  "Tagesschau": 1,
+  "ANSA": 1,
+  "NOS Nieuws": 1,
+  "SVT Nyheter": 1,
+  "BBC World": 2,
+  "BBC Middle East": 2,
+  "BBC Persian": 2,
+  "Guardian World": 2,
+  "Guardian ME": 2,
+  "NPR News": 2,
+  "CNN World": 2,
+  "CNBC": 2,
+  "MarketWatch": 2,
+  "Al Jazeera": 2,
+  "Financial Times": 2,
+  "Politico": 2,
+  "Axios": 2,
+  "EuroNews": 2,
+  "France 24": 2,
+  "Le Monde": 2,
+  "Wall Street Journal": 1,
+  "Fox News": 2,
+  "NBC News": 2,
+  "CBS News": 2,
+  "ABC News": 2,
+  "PBS NewsHour": 2,
+  "The National": 2,
+  "Yonhap News": 2,
+  "Chosun Ilbo": 2,
+  "El País": 2,
+  "El Mundo": 2,
+  "BBC Mundo": 2,
+  "Brasil Paralelo": 2,
+  "Der Spiegel": 2,
+  "Die Zeit": 2,
+  "DW News": 2,
+  "Corriere della Sera": 2,
+  "Repubblica": 2,
+  "NRC": 2,
+  "De Telegraaf": 2,
+  "Dagens Nyheter": 2,
+  "Svenska Dagbladet": 2,
+  "BBC Turkce": 2,
+  "DW Turkish": 2,
+  "Hurriyet": 2,
+  "TVN24": 2,
+  "Polsat News": 2,
+  "Rzeczpospolita": 2,
+  "BBC Russian": 2,
+  "Meduza": 2,
+  "Novaya Gazeta Europe": 2,
+  "Bangkok Post": 2,
+  "Thai PBS": 2,
+  "ABC News Australia": 2,
+  "Guardian Australia": 2,
+  "VnExpress": 2,
+  "Tuoi Tre News": 2,
+  "Nikkei Tech": 2,
+  "NHK World": 2,
+  "Nikkei Asia": 2,
+  "Kathimerini": 2,
+  "Naftemporiki": 2,
+  "Premium Times": 2,
+  "Vanguard Nigeria": 2,
+  "Channels TV": 2,
+  "ThisDay": 2,
+  "Treasury": 2,
+  "DOJ": 2,
+  "DHS": 2,
+  "CDC": 2,
+  "FEMA": 2,
+  "Military Times": 2,
+  "USNI News": 2,
+  "Oryx OSINT": 2,
+  "RUSI": 2,
+  "CNAS": 2,
+  "Arms Control Assn": 2,
+  "Bulletin of Atomic Scientists": 2,
+  "FAO GIEWS": 2,
+  "War on the Rocks": 2,
+  "DigiChina": 2,
+  "Y Combinator Blog": 2,
+  "a16z Blog": 2,
+  "Sequoia Blog": 2,
+  "Crunchbase News": 2,
+  "CB Insights": 2,
+  "PitchBook News": 2,
+  "The Information": 2,
+  "Paul Graham Essays": 2,
+  "Stratechery": 2,
+  "Acquired Podcast": 2,
+  "All-In Podcast": 2,
+  "a16z Podcast": 2,
+  "The Twenty Minute VC": 2,
+  "Hard Fork (NYT)": 2,
+  "Pivot (Vox)": 2,
+  "Benedict Evans": 2,
+  "The Pragmatic Engineer": 2,
+  "Lenny Newsletter": 2,
+  "How I Built This": 2,
+  "Masters of Scale": 2,
+  "Good News Network": 2,
+  "Positive.News": 2,
+  "Reasons to be Cheerful": 2,
+  "Optimist Daily": 2,
+  "Yes! Magazine": 2,
+  "My Modern Met": 2,
+  "Politico Tech": 2,
+  "EU Commission Digital": 2,
+  "OECD Digital": 2,
+  "Stanford HAI": 2,
+  "Defense One": 3,
+  "Breaking Defense": 3,
+  "The War Zone": 3,
+  "Defense News": 3,
+  "Janes": 3,
+  "Task & Purpose": 3,
+  "gCaptain": 3,
+  "Foreign Policy": 3,
+  "The Diplomat": 3,
+  "Bellingcat": 3,
+  "Atlantic Council": 3,
+  "Foreign Affairs": 3,
+  "CrisisWatch": 3,
+  "CSIS": 3,
+  "RAND": 3,
+  "Brookings": 3,
+  "Carnegie": 3,
+  "Krebs Security": 3,
+  "Ransomware.live": 3,
+  "Federal Reserve": 3,
+  "SEC": 3,
+  "MIT Tech Review": 3,
+  "Ars Technica": 3,
+  "Iran International": 3,
+  "Fars News": 3,
+  "Xinhua": 3,
+  "TASS": 3,
+  "RT": 3,
+  "RT Russia": 3,
+  "Layoffs.fyi": 3,
+  "OpenAI News": 3,
+  "The Hill": 3,
+  "Brookings Tech": 3,
+  "CSIS Tech": 3,
+  "MIT Tech Policy": 3,
+  "AI Now Institute": 3,
+  "Bruegel (EU)": 3,
+  "Chatham House Tech": 3,
+  "ISEAS (Singapore)": 3,
+  "ORF Tech (India)": 3,
+  "RIETI (Japan)": 3,
+  "Lowy Institute": 3,
+  "China Tech Analysis": 3,
+  "Wilson Center": 3,
+  "GMF": 3,
+  "Stimson Center": 3,
+  "EU ISS": 3,
+  "AEI": 3,
+  "Responsible Statecraft": 3,
+  "FPRI": 3,
+  "Jamestown": 3,
+  "AI Regulation": 3,
+  "Tech Antitrust": 3,
+  "EFF News": 3,
+  "EU Digital Policy": 3,
+  "Euractiv Digital": 3,
+  "China Tech Policy": 3,
+  "UK Tech Policy": 3,
+  "India Tech Policy": 3,
+  "EU Startups": 3,
+  "Tech.eu": 3,
+  "Sifted (Europe)": 3,
+  "The Next Web": 3,
+  "Tech in Asia": 3,
+  "TechCabal (Africa)": 3,
+  "Inc42 (India)": 3,
+  "YourStory": 3,
+  "e27 (SEA)": 3,
+  "DealStreetAsia": 3,
+  "Pandaily (China)": 3,
+  "36Kr English": 3,
+  "TechNode (China)": 3,
+  "China Tech News": 3,
+  "The Bridge (Japan)": 3,
+  "Japan Tech News": 3,
+  "Korea Tech News": 3,
+  "KED Global": 3,
+  "Entrackr (India)": 3,
+  "India Tech News": 3,
+  "Taiwan Tech News": 3,
+  "La Silla Vacía": 3,
+  "LATAM Tech News": 3,
+  "Startups.co (LATAM)": 3,
+  "Contxto (LATAM)": 3,
+  "Brazil Tech News": 3,
+  "Mexico Tech News": 3,
+  "LATAM Fintech": 3,
+  "Wamda (MENA)": 3,
+  "Magnitt": 3,
+  "Daily Trust": 3,
+  "in.gr": 3,
+  "iefimerida": 3,
+  "Proto Thema": 3,
+  "This Week in Startups": 3,
+  "Lex Fridman Tech": 3,
+  "The Vergecast": 3,
+  "Decoder (Verge)": 3,
+  "AI Podcast (NVIDIA)": 3,
+  "Gradient Dissent": 3,
+  "Eye on AI": 3,
+  "The Pitch": 3,
+  "Upworthy": 3,
+  "DailyGood": 3,
+  "Good Good Good": 3,
+  "GOOD Magazine": 3,
+  "Sunny Skyz": 3,
+  "The Better India": 3,
+  "Mongabay": 3,
+  "Conservation Optimism": 3,
+  "Shareable": 3,
+  "GNN Heroes Spotlight": 3,
+  "GNN Science": 3,
+  "GNN Animals": 3,
+  "GNN Health": 3,
+  "GNN Heroes": 3,
+  "GNN Earth": 3,
+  "Hacker News": 4,
+  "The Verge": 4,
+  "The Verge AI": 4,
+  "VentureBeat AI": 4,
+  "Yahoo Finance": 4,
+  "TechCrunch Layoffs": 4,
+  "ArXiv AI": 4,
+  "AI News": 4,
+  "Layoffs News": 4,
+  "GloNewswire (Taiwan)": 4
+}