mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test (#3069)
* fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test
Before this change, classified rss_alert events published by ais-relay carried
a stale importanceScore: the digest computed it from the keyword-level threat
before the LLM upgrade, and the relay republished that value unchanged. Shadow
log (2,850 entries / 7 days) had Pearson 0.31 vs human rating with zero events
reaching the ≥85 critical gate — the score being measured was the keyword
fallback, not the AI classification.
Fixes:
- ais-relay.cjs: recompute importanceScore locally from the post-LLM level
using an exact mirror of the digest formula (SEVERITY_SCORES, SCORE_WEIGHTS,
SOURCE_TIERS, formula). Publish includes corroborationCount for downstream
shadow-log enrichment.
- notification-relay.cjs: delete the duplicate shadowLogScore() call that
produced ~50% near-duplicate pairs. Move shadow log to v2 key with
JSON-encoded members carrying severity, source, corroborationCount,
variant — future calibration cycles get cleaner data.
- shadow-score-{report,rank}.mjs: parse both v2 JSON and legacy v1 string
members; default to v2, override via SHADOW_SCORE_KEY env.
- _classifier.ts: narrow keyword additions — blockade, siege, sanction
(singular), escalation → HIGH; evacuation orders (plural) → CRITICAL.
- tests/importance-score-parity.test.mjs: extracts tier map and formula from
both TS digest and CJS relay sources, asserts identical output across 12
sample cases. Catches any future drift.
- tests/relay-importance-recompute.test.mjs + notification-relay-shadow-log.test.mjs:
regression tests for the publish path and single-write discipline.
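The dual-format parsing described for shadow-score-{report,rank}.mjs could look roughly like the sketch below. The helper names `parseShadowMember` and `resolveShadowKey` are hypothetical, and the legacy v1 string layout is not shown in this commit, so v1 members are passed through as raw strings instead of being decoded.

```javascript
// Hypothetical helper: decode one sorted-set member from the shadow log.
// v2 members are JSON objects; anything that is not a JSON object is
// treated as a legacy v1 string and returned untouched.
function parseShadowMember(member) {
  try {
    const parsed = JSON.parse(member);
    if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)) {
      return { ...parsed, format: 'v2' };
    }
  } catch {
    // Not valid JSON: fall through to the legacy branch.
  }
  return { format: 'v1', raw: member };
}

// Key selection as described in the commit: v2 by default, env override wins.
function resolveShadowKey(env) {
  return env.SHADOW_SCORE_KEY || 'shadow:score-log:v2';
}

const sample = JSON.stringify({
  severity: 'high',
  source: 'Reuters',
  corroborationCount: 3,
  variant: 'full',
});
console.log(resolveShadowKey(process.env), parseShadowMember(sample));
```

A bare number such as `42` is valid JSON but not an object, so the sketch deliberately routes it to the v1 branch as well.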
Gates remain OFF. After deploy, collect 48h of fresh shadow:score-log:v2
data, re-run scripts/shadow-score-rank.mjs for calibration, then set final
IMPORTANCE_SCORE_MIN / high / critical thresholds before enabling
IMPORTANCE_SCORE_LIVE=1.
See docs/internal/scoringDiagnostic.md (local) for full diagnosis.
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): PR #3069 review amendments — revert risky keywords + extract SOURCE_TIERS
Addresses review findings on PR #3069 (todos/193 through 204).
BLOCKING (P1):
- Revert 5 keyword additions in _classifier.ts. Review showed `escalation`,
`sanction`, `siege`, `blockade`, `evacuation orders` fire HIGH on
`de-escalation`, `sanctioned`, `besieged`, `blockaded` (substring matches),
and the plural `evacuation orders` is already covered by the singular.
Classifier work will land in a separate PR with phrase-based rules.
- Delete dead `digestImportanceScore` field from relay allTitles metadata
(written in two places, read nowhere).
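The substring problem behind the revert can be reproduced in a few lines. This is a minimal sketch, not the code in _classifier.ts: the function names and sample titles are made up, and the word-boundary variant is only one possible tightening, not the phrase-based rule set planned for the follow-up PR.

```javascript
// Keywords taken from the reverted change.
const KEYWORDS = ['escalation', 'sanction', 'siege', 'blockade'];

// Plain substring matching, the behaviour the review flagged.
function substringHits(title) {
  const lower = title.toLowerCase();
  return KEYWORDS.filter((kw) => lower.includes(kw));
}

// Word-boundary matching (an assumption for comparison, not the planned fix).
function wordBoundaryHits(title) {
  return KEYWORDS.filter((kw) => new RegExp(`\\b${kw}\\b`, 'i').test(title));
}

// Made-up sample titles, one per false positive named in the review.
const titles = [
  'Talks aim at de-escalation',
  'Court rules on sanctioned tanker',
  'Aid reaches besieged town',
  'Port remains blockaded',
];
for (const title of titles) {
  console.log(title, substringHits(title), wordBoundaryHits(title));
}
```

Substring matching reports a hit on all four titles. The word-boundary variant clears `sanctioned`, `besieged` and `blockaded`, but still matches `de-escalation`, because the hyphen counts as a word boundary. Boundary anchors alone would therefore not be enough for that case.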
IMPORTANT (P2):
- Extract SOURCE_TIERS to shared/source-tiers.{json,cjs} using the
existing shared/rss-allowed-domains.* precedent. Dockerfile.relay
already `COPY shared/` (whole dir), so no infra change. Deletes
255-line inline duplicate from ais-relay.cjs; TS digest imports the
same JSON via resolveJsonModule. Tier-map parity is now structural.
- Simplify parity test — tier extraction no longer needed. SEVERITY_SCORES
+ SCORE_WEIGHTS + scoring function parity retained across 12 cases
plus an unknown-level defensiveness case. Deleted no-op regex replace
(`.replace(/X/g, 'X')`). Fixed misleading recency docstring.
- Pipeline shadow log: ZADD + ZREMRANGEBYSCORE + belt+suspenders EXPIRE
now go in a single Upstash /pipeline POST (~50% RT reduction, no
billing delta).
- Bounded ZRANGE in shadow-score-report.mjs (20k cap + warn if reached).
- Bump outbound webhook envelope v1→v2 to signal the new
`corroborationCount` field on rss_alert payloads.
- Restore rss_alert eventType gate at shadowLogScore caller (skip
promise cost for non-rss events).
- Align ais-relay scorer comment with reality: it has ONE intentional
deviation from digest (`?? 0` on severity for defensiveness, returning
0 vs NaN on unknown levels). Documented + tested.
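The `?? 0` deviation mentioned in the last P2 item can be illustrated as follows. The numbers in `SEVERITY_SCORES` and `SEVERITY_WEIGHT` are placeholders chosen for this sketch (the real tables are not shown in this commit), and both function names are hypothetical. Only the guarded-versus-unguarded lookup is the point.

```javascript
// Placeholder values for illustration only; the real SEVERITY_SCORES and
// SCORE_WEIGHTS live in the digest and relay sources.
const SEVERITY_SCORES = { critical: 100, high: 75, medium: 50, low: 25 };
const SEVERITY_WEIGHT = 0.5;

// Unguarded lookup: an unknown level yields undefined, and
// undefined * number evaluates to NaN.
function severityComponentUnguarded(level) {
  return SEVERITY_SCORES[level] * SEVERITY_WEIGHT;
}

// Guarded lookup with `?? 0`: an unknown level contributes 0 instead.
function severityComponentGuarded(level) {
  return (SEVERITY_SCORES[level] ?? 0) * SEVERITY_WEIGHT;
}

console.log(severityComponentUnguarded('unknown-level')); // NaN
console.log(severityComponentGuarded('unknown-level')); // 0
console.log(severityComponentGuarded('high') === severityComponentUnguarded('high')); // true
```

For every known level the two variants agree, which is why a parity test can cover the shared cases and treat the unknown-level case separately.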
P3:
- Narrow loadEnv in scripts to only UPSTASH_REDIS_REST_* (was setting
any UPPER_UNDERSCORE env var from .env.local).
- Escape markdown specials in rating-sheet.md title embeds.
Pre-existing activation blockers NOT fixed here (tracked as todos 196,
203): `/api/notify` accepts arbitrary importanceScore from authenticated
Pro users, and notification channel bodies don't escape mrkdwn/Discord
markup. Both must close before `IMPORTANCE_SCORE_LIVE=1`.
Net: -614 lines (more deleted than added). 26 regression assertions pass.
npm run typecheck, typecheck:api, test:data all pass.
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): mirror source-tiers.{json,cjs} into scripts/shared/
The `scripts-shared-mirror` enforcement in tests/edge-functions.test.mjs
requires every *.json and *.cjs in shared/ to have a byte-identical copy
in scripts/shared/ (Railway rootDirectory=scripts deploy bundle cannot
reach repo-root shared/). Last commit added shared/source-tiers.{json,cjs}
without mirroring them. CI caught it.
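A byte-identity check of the kind described here might be sketched as below. This is not the code in tests/edge-functions.test.mjs; `findMirrorDrift` is a hypothetical helper that lists the *.json and *.cjs files whose mirror copy is missing or different.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Returns the names of *.json / *.cjs files in sharedDir whose copy in
// mirrorDir is missing or not byte-identical.
function findMirrorDrift(sharedDir, mirrorDir) {
  return fs
    .readdirSync(sharedDir)
    .filter((name) => name.endsWith('.json') || name.endsWith('.cjs'))
    .filter((name) => {
      const mirrorPath = path.join(mirrorDir, name);
      if (!fs.existsSync(mirrorPath)) return true;
      const original = fs.readFileSync(path.join(sharedDir, name));
      // Buffer#equals compares raw bytes, so whitespace changes count as drift.
      return !original.equals(fs.readFileSync(mirrorPath));
    })
    .sort();
}
```

Called as `findMirrorDrift('shared', 'scripts/shared')` from the repo root, an empty array would mean the mirror is in sync. For the state this commit fixes, the result would have included the two source-tiers files.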
* fix(scoring): revert webhook envelope to v1 + log shadow-log pipeline failures
Two P1/P2 review findings on PR #3069:
P1: Bumping webhook envelope to version: '2' was a unilateral breaking
change — the other webhook producers (proactive-intelligence.mjs:407,
seed-digest-notifications.mjs:736) still emit v1, so the same endpoint
would receive mixed envelope versions per event type. A consumer
validating `version === '1'` would break specifically on realtime
rss_alert deliveries while proactive_brief and digest events kept
working. Revert to '1' and document why — `corroborationCount` is an
additive payload field, backwards-compatible for typical consumers;
strict consumers using `additionalProperties: false` should be handled
via a coordinated version bump across all producers in a separate PR.
P2: The new shadow-log /pipeline write swallowed all errors silently
(no resp.ok check, no per-command error inspection), a regression from
the old upstashRest() path which at least logged non-2xx. Since the
48h recalibration cycle depends on shadow:score-log:v2 filling with
clean data, a bad auth token or malformed pipeline body would leave
operators staring at an empty ZSET with no signal. Now logs HTTP
failures and per-command pipeline errors.
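The two checks described in the P2 fix (HTTP status, then per-command errors) can be sketched without any network access. The function names and the retention and TTL parameters are hypothetical, and using the write timestamp as the ZSET score is an assumption. The response shape, one `{ result }` or `{ error }` object per command, follows the Upstash REST pipeline format.

```javascript
// Build the three-command body for a single Upstash REST /pipeline POST.
// Assumption: members are scored by write time so old entries can be pruned.
function buildShadowLogPipeline(key, member, nowMs, retentionMs, ttlSeconds) {
  return [
    ['ZADD', key, String(nowMs), member],
    ['ZREMRANGEBYSCORE', key, '-inf', String(nowMs - retentionMs)],
    ['EXPIRE', key, String(ttlSeconds)],
  ];
}

// Collect loggable failures from the HTTP status and the parsed response body.
function pipelineFailures(status, body, commands) {
  if (status < 200 || status >= 300) return [`HTTP ${status}`];
  if (!Array.isArray(body)) return ['unexpected pipeline response shape'];
  return body.flatMap((entry, i) =>
    entry && entry.error ? [`${commands[i]?.[0] ?? `command ${i}`}: ${entry.error}`] : []
  );
}

const commands = buildShadowLogPipeline('shadow:score-log:v2', '{"severity":"high"}', 1000, 400, 60);
const failures = pipelineFailures(200, [{ result: 1 }, { error: 'ERR syntax error' }, { result: 1 }], commands);
for (const failure of failures) console.warn('[shadow-log] pipeline failure:', failure);
```

Keeping the inspection in a pure function means a bad token (non-2xx) and a malformed command (2xx with an `error` entry) both surface in the logs, and both paths can be unit tested without a Redis instance.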
* docs(scoring): fix stale v1 references + clarify two-copy source-tiers mirror
Two follow-up review findings on PR #3069:
P2 — Source-tier "single source of truth" comments were outdated.
PR #3069 ships TWO JSON copies (shared/source-tiers.json for Vercel edge +
main relay container, scripts/shared/source-tiers.json for Railway services
using rootDirectory=scripts). Comments in server/_shared/source-tiers.ts
and scripts/ais-relay.cjs now explicitly document the mirror setup and
point at the two tests that enforce byte-identity: the existing
scripts-shared-mirror test (tests/edge-functions.test.mjs:37-48) and a
new explicit cross-check in tests/importance-score-parity.test.mjs.
Adding the assertion here is belt-and-suspenders: if edge-functions.test.mjs
is ever narrowed, the parity test still catches drift.
P3 — Stale v1 references in shared metadata/docs. The actual writer
moved to shadow:score-log:v2 in notification-relay.cjs, but
server/_shared/cache-keys.ts:23-31 still documented v1 and exported the
v1 string. No runtime impact (the export has zero importers — relay
uses its own local const) but misleading. Updated the doc block to
explain both v1 (legacy, self-pruning) and v2 (current), bumped the
constant to v2, and added a comment that notification-relay.cjs is the
owner. Header comment in scripts/shadow-score-report.mjs now documents
the SHADOW_SCORE_KEY override path too.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2
shared/source-tiers.cjs
Normal file
@@ -0,0 +1,2 @@
// CJS wrapper — source of truth is source-tiers.json
module.exports = require('./source-tiers.json');
257
shared/source-tiers.json
Normal file
@@ -0,0 +1,257 @@
{
  "Reuters": 1,
  "Reuters World": 1,
  "Reuters Business": 1,
  "Reuters US": 1,
  "AP News": 1,
  "AFP": 1,
  "Bloomberg": 1,
  "White House": 1,
  "State Dept": 1,
  "Pentagon": 1,
  "UN News": 1,
  "CISA": 1,
  "UK MOD": 1,
  "IAEA": 1,
  "WHO": 1,
  "UNHCR": 1,
  "MIIT (China)": 1,
  "MOFCOM (China)": 1,
  "Tagesschau": 1,
  "ANSA": 1,
  "NOS Nieuws": 1,
  "SVT Nyheter": 1,
  "BBC World": 2,
  "BBC Middle East": 2,
  "BBC Persian": 2,
  "Guardian World": 2,
  "Guardian ME": 2,
  "NPR News": 2,
  "CNN World": 2,
  "CNBC": 2,
  "MarketWatch": 2,
  "Al Jazeera": 2,
  "Financial Times": 2,
  "Politico": 2,
  "Axios": 2,
  "EuroNews": 2,
  "France 24": 2,
  "Le Monde": 2,
  "Wall Street Journal": 1,
  "Fox News": 2,
  "NBC News": 2,
  "CBS News": 2,
  "ABC News": 2,
  "PBS NewsHour": 2,
  "The National": 2,
  "Yonhap News": 2,
  "Chosun Ilbo": 2,
  "El País": 2,
  "El Mundo": 2,
  "BBC Mundo": 2,
  "Brasil Paralelo": 2,
  "Der Spiegel": 2,
  "Die Zeit": 2,
  "DW News": 2,
  "Corriere della Sera": 2,
  "Repubblica": 2,
  "NRC": 2,
  "De Telegraaf": 2,
  "Dagens Nyheter": 2,
  "Svenska Dagbladet": 2,
  "BBC Turkce": 2,
  "DW Turkish": 2,
  "Hurriyet": 2,
  "TVN24": 2,
  "Polsat News": 2,
  "Rzeczpospolita": 2,
  "BBC Russian": 2,
  "Meduza": 2,
  "Novaya Gazeta Europe": 2,
  "Bangkok Post": 2,
  "Thai PBS": 2,
  "ABC News Australia": 2,
  "Guardian Australia": 2,
  "VnExpress": 2,
  "Tuoi Tre News": 2,
  "Nikkei Tech": 2,
  "NHK World": 2,
  "Nikkei Asia": 2,
  "Kathimerini": 2,
  "Naftemporiki": 2,
  "Premium Times": 2,
  "Vanguard Nigeria": 2,
  "Channels TV": 2,
  "ThisDay": 2,
  "Treasury": 2,
  "DOJ": 2,
  "DHS": 2,
  "CDC": 2,
  "FEMA": 2,
  "Military Times": 2,
  "USNI News": 2,
  "Oryx OSINT": 2,
  "RUSI": 2,
  "CNAS": 2,
  "Arms Control Assn": 2,
  "Bulletin of Atomic Scientists": 2,
  "FAO GIEWS": 2,
  "War on the Rocks": 2,
  "DigiChina": 2,
  "Y Combinator Blog": 2,
  "a16z Blog": 2,
  "Sequoia Blog": 2,
  "Crunchbase News": 2,
  "CB Insights": 2,
  "PitchBook News": 2,
  "The Information": 2,
  "Paul Graham Essays": 2,
  "Stratechery": 2,
  "Acquired Podcast": 2,
  "All-In Podcast": 2,
  "a16z Podcast": 2,
  "The Twenty Minute VC": 2,
  "Hard Fork (NYT)": 2,
  "Pivot (Vox)": 2,
  "Benedict Evans": 2,
  "The Pragmatic Engineer": 2,
  "Lenny Newsletter": 2,
  "How I Built This": 2,
  "Masters of Scale": 2,
  "Good News Network": 2,
  "Positive.News": 2,
  "Reasons to be Cheerful": 2,
  "Optimist Daily": 2,
  "Yes! Magazine": 2,
  "My Modern Met": 2,
  "Politico Tech": 2,
  "EU Commission Digital": 2,
  "OECD Digital": 2,
  "Stanford HAI": 2,
  "Defense One": 3,
  "Breaking Defense": 3,
  "The War Zone": 3,
  "Defense News": 3,
  "Janes": 3,
  "Task & Purpose": 3,
  "gCaptain": 3,
  "Foreign Policy": 3,
  "The Diplomat": 3,
  "Bellingcat": 3,
  "Atlantic Council": 3,
  "Foreign Affairs": 3,
  "CrisisWatch": 3,
  "CSIS": 3,
  "RAND": 3,
  "Brookings": 3,
  "Carnegie": 3,
  "Krebs Security": 3,
  "Ransomware.live": 3,
  "Federal Reserve": 3,
  "SEC": 3,
  "MIT Tech Review": 3,
  "Ars Technica": 3,
  "Iran International": 3,
  "Fars News": 3,
  "Xinhua": 3,
  "TASS": 3,
  "RT": 3,
  "RT Russia": 3,
  "Layoffs.fyi": 3,
  "OpenAI News": 3,
  "The Hill": 3,
  "Brookings Tech": 3,
  "CSIS Tech": 3,
  "MIT Tech Policy": 3,
  "AI Now Institute": 3,
  "Bruegel (EU)": 3,
  "Chatham House Tech": 3,
  "ISEAS (Singapore)": 3,
  "ORF Tech (India)": 3,
  "RIETI (Japan)": 3,
  "Lowy Institute": 3,
  "China Tech Analysis": 3,
  "Wilson Center": 3,
  "GMF": 3,
  "Stimson Center": 3,
  "EU ISS": 3,
  "AEI": 3,
  "Responsible Statecraft": 3,
  "FPRI": 3,
  "Jamestown": 3,
  "AI Regulation": 3,
  "Tech Antitrust": 3,
  "EFF News": 3,
  "EU Digital Policy": 3,
  "Euractiv Digital": 3,
  "China Tech Policy": 3,
  "UK Tech Policy": 3,
  "India Tech Policy": 3,
  "EU Startups": 3,
  "Tech.eu": 3,
  "Sifted (Europe)": 3,
  "The Next Web": 3,
  "Tech in Asia": 3,
  "TechCabal (Africa)": 3,
  "Inc42 (India)": 3,
  "YourStory": 3,
  "e27 (SEA)": 3,
  "DealStreetAsia": 3,
  "Pandaily (China)": 3,
  "36Kr English": 3,
  "TechNode (China)": 3,
  "China Tech News": 3,
  "The Bridge (Japan)": 3,
  "Japan Tech News": 3,
  "Korea Tech News": 3,
  "KED Global": 3,
  "Entrackr (India)": 3,
  "India Tech News": 3,
  "Taiwan Tech News": 3,
  "La Silla Vacía": 3,
  "LATAM Tech News": 3,
  "Startups.co (LATAM)": 3,
  "Contxto (LATAM)": 3,
  "Brazil Tech News": 3,
  "Mexico Tech News": 3,
  "LATAM Fintech": 3,
  "Wamda (MENA)": 3,
  "Magnitt": 3,
  "Daily Trust": 3,
  "in.gr": 3,
  "iefimerida": 3,
  "Proto Thema": 3,
  "This Week in Startups": 3,
  "Lex Fridman Tech": 3,
  "The Vergecast": 3,
  "Decoder (Verge)": 3,
  "AI Podcast (NVIDIA)": 3,
  "Gradient Dissent": 3,
  "Eye on AI": 3,
  "The Pitch": 3,
  "Upworthy": 3,
  "DailyGood": 3,
  "Good Good Good": 3,
  "GOOD Magazine": 3,
  "Sunny Skyz": 3,
  "The Better India": 3,
  "Mongabay": 3,
  "Conservation Optimism": 3,
  "Shareable": 3,
  "GNN Heroes Spotlight": 3,
  "GNN Science": 3,
  "GNN Animals": 3,
  "GNN Health": 3,
  "GNN Heroes": 3,
  "GNN Earth": 3,
  "Hacker News": 4,
  "The Verge": 4,
  "The Verge AI": 4,
  "VentureBeat AI": 4,
  "Yahoo Finance": 4,
  "TechCrunch Layoffs": 4,
  "ArXiv AI": 4,
  "AI News": 4,
  "Layoffs News": 4,
  "GloNewswire (Taiwan)": 4
}
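A consumer of this map might look up tiers along the following lines. This is a sketch under assumptions: the four entries are an excerpt inlined so the snippet runs on its own (a real caller would load shared/source-tiers.cjs), `tierFor` and `countByTier` are hypothetical helpers, and the commit does not say which tier unlisted sources receive, so the fallback is left to the caller.

```javascript
// Excerpt of the map, inlined for a self-contained example.
const SOURCE_TIERS = { 'Reuters': 1, 'BBC World': 2, 'Bellingcat': 3, 'Hacker News': 4 };

// Look up a source by its exact feed name. The own-property check keeps
// inherited keys such as "constructor" from being mistaken for a source.
function tierFor(source, fallbackTier) {
  return Object.prototype.hasOwnProperty.call(SOURCE_TIERS, source)
    ? SOURCE_TIERS[source]
    : fallbackTier;
}

// Count how many sources sit in each tier, e.g. for a quick sanity report.
function countByTier(tiers) {
  const counts = {};
  for (const tier of Object.values(tiers)) {
    counts[tier] = (counts[tier] ?? 0) + 1;
  }
  return counts;
}

console.log(tierFor('Reuters', 4), tierFor('Unlisted Feed', 4), countByTier(SOURCE_TIERS));
```

Keys are matched exactly, so a feed renamed upstream would silently fall back to the caller's default tier until the JSON and its scripts/shared/ mirror are updated.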