mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test
Before this change, classified rss_alert events published by ais-relay carried
a stale importanceScore: the digest computed it from the keyword-level threat
before the LLM upgrade, and the relay republished that value unchanged. The
shadow log (2,850 entries / 7 days) showed a Pearson correlation of 0.31
against human ratings, with zero events reaching the ≥85 critical gate — the
score being measured was the keyword fallback, not the AI classification.
Fixes:
- ais-relay.cjs: recompute importanceScore locally from the post-LLM level
using an exact mirror of the digest formula (SEVERITY_SCORES, SCORE_WEIGHTS,
SOURCE_TIERS, formula). Publish includes corroborationCount for downstream
shadow-log enrichment.
- notification-relay.cjs: delete the duplicate shadowLogScore() call that
produced ~50% near-duplicate pairs. Move shadow log to v2 key with
JSON-encoded members carrying severity, source, corroborationCount,
variant — future calibration cycles get cleaner data.
- shadow-score-{report,rank}.mjs: parse both v2 JSON and legacy v1 string
members; default to v2, override via SHADOW_SCORE_KEY env.
- _classifier.ts: narrow keyword additions — blockade, siege, sanction
(singular), escalation → HIGH; evacuation orders (plural) → CRITICAL.
- tests/importance-score-parity.test.mjs: extracts tier map and formula from
both TS digest and CJS relay sources, asserts identical output across 12
sample cases. Catches any future drift.
- tests/relay-importance-recompute.test.mjs and
  notification-relay-shadow-log.test.mjs: regression tests for the publish
  path and single-write discipline.
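The recomputed score mirrors the digest's weighted-sum shape. As a hedged sketch of that shape (every constant below is illustrative; the real SEVERITY_SCORES, SCORE_WEIGHTS, and SOURCE_TIERS live in list-feed-digest.ts and shared/source-tiers.json):

```javascript
// Illustrative sketch of the mirrored scoring shape; NOT the shipped constants.
// Real values live in server/.../list-feed-digest.ts and shared/source-tiers.json.
const SEVERITY_SCORES = { critical: 100, high: 75, medium: 50, low: 25, info: 10 };
const SCORE_WEIGHTS = { severity: 0.5, source: 0.2, corroboration: 0.2, recency: 0.1 };
const SOURCE_TIERS = { Reuters: 1, 'BBC World': 1, 'Hacker News': 3 };

function computeImportanceScore(level, source, corroborationCount, publishedAt, now = Date.now()) {
  const severity = SEVERITY_SCORES[level] ?? 0;      // relay-style `?? 0` on unknown levels
  const tier = SOURCE_TIERS[source] ?? 4;            // unknown sources default to tier 4
  const sourceScore = (5 - tier) * 25;               // tier 1 -> 100 ... tier 4 -> 25
  const corroboration = Math.min(corroborationCount, 5) * 20;
  const ageHours = (now - publishedAt) / 3_600_000;
  const recency = Math.max(0, 100 - ageHours * 10);  // linear decay, floored at 0
  return Math.round(
    severity * SCORE_WEIGHTS.severity +
      sourceScore * SCORE_WEIGHTS.source +
      corroboration * SCORE_WEIGHTS.corroboration +
      recency * SCORE_WEIGHTS.recency,
  );
}
```

The point of the mirror is that a classification upgraded to `critical` post-LLM re-enters this formula with the new level, instead of shipping the stale pre-LLM score.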
Gates remain OFF. After deploy, collect 48h of fresh shadow:score-log:v2
data, re-run scripts/shadow-score-rank.mjs for calibration, then set final
IMPORTANCE_SCORE_MIN / high / critical thresholds before enabling
IMPORTANCE_SCORE_LIVE=1.
See docs/internal/scoringDiagnostic.md (local) for full diagnosis.
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): PR #3069 review amendments — revert risky keywords + extract SOURCE_TIERS
Addresses review findings on PR #3069 (todos/193 through 204).
BLOCKING (P1):
- Revert 5 keyword additions in _classifier.ts. Review showed `escalation`,
`sanction`, `siege`, `blockade`, `evacuation orders` fire HIGH on
`de-escalation`, `sanctioned`, `besieged`, `blockaded` (substring matches),
and the plural `evacuation orders` is already covered by the singular.
Classifier work will land in a separate PR with phrase-based rules.
- Delete dead `digestImportanceScore` field from relay allTitles metadata
(written in two places, read nowhere).
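The phrase-based replacement is deferred to a separate PR; one possible shape (hypothetical, not the shipped _classifier.ts logic) uses hyphen-aware lookarounds, since a plain word boundary does not help here: `/\bescalation\b/` still matches inside `de-escalation`.

```javascript
// Hypothetical phrase rule; NOT the shipped classifier.
// Substring matching (the reverted behaviour) fires on derived forms:
//   'de-escalation talks'.includes('escalation') === true
// A token rule that also rejects hyphen-joined neighbours avoids that:
const phraseRule = (kw) => new RegExp(`(?<![\\w-])${kw}(?![\\w-])`, 'i');
const hits = (kw, text) => phraseRule(kw).test(text);
```

Under this rule `sanctioned`, `besieged`, and `blockaded` no longer trigger their base keywords, while standalone uses still do.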
IMPORTANT (P2):
- Extract SOURCE_TIERS to shared/source-tiers.{json,cjs} using the
existing shared/rss-allowed-domains.* precedent. Dockerfile.relay
already runs `COPY shared/` (whole dir), so no infra change. Deletes
255-line inline duplicate from ais-relay.cjs; TS digest imports the
same JSON via resolveJsonModule. Tier-map parity is now structural.
- Simplify parity test — tier extraction no longer needed. SEVERITY_SCORES
+ SCORE_WEIGHTS + scoring function parity retained across 12 cases
plus an unknown-level defensiveness case. Deleted no-op regex replace
(`.replace(/X/g, 'X')`). Fixed misleading recency docstring.
- Pipeline shadow log: ZADD + ZREMRANGEBYSCORE + belt+suspenders EXPIRE
now go in a single Upstash /pipeline POST (~50% RT reduction, no
billing delta).
- Bounded ZRANGE in shadow-score-report.mjs (20k cap + warn if reached).
- Bump outbound webhook envelope v1→v2 to signal the new
`corroborationCount` field on rss_alert payloads.
- Restore rss_alert eventType gate at shadowLogScore caller (skip
promise cost for non-rss events).
- Align ais-relay scorer comment with reality: it has ONE intentional
deviation from digest (`?? 0` on severity for defensiveness, returning
0 vs NaN on unknown levels). Documented + tested.
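The single-round-trip shadow-log write described above can be sketched against Upstash's REST `/pipeline` endpoint, which takes an array of command arrays and returns one result object per command. Key names and retention values here are illustrative, not the shipped constants:

```javascript
// Sketch of the batched shadow-log write (illustrative key/TTL values).
function buildShadowLogPipeline(key, score, member, now, retentionMs, ttlSec) {
  return [
    ['ZADD', key, String(score), member],                          // append the scored entry
    ['ZREMRANGEBYSCORE', key, '-inf', String(now - retentionMs)],  // prune aged-out entries
    ['EXPIRE', key, String(ttlSec)],                               // belt+suspenders TTL
  ];
}

// One POST instead of three round trips.
async function writeShadowLog(baseUrl, token, commands) {
  return fetch(`${baseUrl}/pipeline`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(commands),
  });
}
```

Pipelined commands are billed per command on Upstash, which is why the batching cuts round-trip time without a billing delta.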
P3:
- Narrow loadEnv in scripts to only UPSTASH_REDIS_REST_* (was setting
any UPPER_UNDERSCORE env var from .env.local).
- Escape markdown specials in rating-sheet.md title embeds.
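Escaping title embeds can be as simple as backslash-escaping the specials; the exact character set used by the rating-sheet script is an assumption here:

```javascript
// Hypothetical escaper; the precise special set in the shipped script is assumed.
function escapeMdTitle(s) {
  return s.replace(/([\\`*_{}[\]()#+\-.!|])/g, '\\$1');
}
```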
Pre-existing activation blockers NOT fixed here (tracked as todos 196,
203): `/api/notify` accepts arbitrary importanceScore from authenticated
Pro users, and notification channel bodies don't escape mrkdwn/Discord
markup. Both must close before `IMPORTANCE_SCORE_LIVE=1`.
Net: -614 lines (more deleted than added). 26 regression assertions pass.
npm run typecheck, typecheck:api, test:data all pass.
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): mirror source-tiers.{json,cjs} into scripts/shared/
The `scripts-shared-mirror` enforcement in tests/edge-functions.test.mjs
requires every *.json and *.cjs in shared/ to have a byte-identical copy
in scripts/shared/ (Railway rootDirectory=scripts deploy bundle cannot
reach repo-root shared/). Last commit added shared/source-tiers.{json,cjs}
without mirroring them. CI caught it.
* fix(scoring): revert webhook envelope to v1 + log shadow-log pipeline failures
Two P1/P2 review findings on PR #3069:
P1: Bumping webhook envelope to version: '2' was a unilateral breaking
change — the other webhook producers (proactive-intelligence.mjs:407,
seed-digest-notifications.mjs:736) still emit v1, so the same endpoint
would receive mixed envelope versions per event type. A consumer
validating `version === '1'` would break specifically on realtime
rss_alert deliveries while proactive_brief and digest events kept
working. Revert to '1' and document why — `corroborationCount` is an
additive payload field, backwards-compatible for typical consumers;
strict consumers using `additionalProperties: false` should be handled
via a coordinated version bump across all producers in a separate PR.
P2: The new shadow-log /pipeline write swallowed all errors silently
(no resp.ok check, no per-command error inspection), a regression from
the old upstashRest() path which at least logged non-2xx. Since the
48h recalibration cycle depends on shadow:score-log:v2 filling with
clean data, a bad auth token or malformed pipeline body would leave
operators staring at an empty ZSET with no signal. Now logs HTTP
failures and per-command pipeline errors.
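The failure-logging shape described above, as a hedged sketch (function and log-prefix names are assumptions; the Upstash `/pipeline` response is either a non-2xx HTTP error for the whole batch or a per-command array of `{result}`/`{error}` objects):

```javascript
// Sketch of the now-logged pipeline write: surface both batch-level HTTP
// failures and per-command Redis errors instead of swallowing them.
async function postShadowLogPipeline(baseUrl, token, commands) {
  const resp = await fetch(`${baseUrl}/pipeline`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(commands),
  });
  if (!resp.ok) {
    console.error(`[shadow-log] pipeline HTTP ${resp.status}: ${await resp.text()}`);
    return false;
  }
  const results = await resp.json();
  let ok = true;
  results.forEach((r, i) => {
    if (r && r.error) {
      console.error(`[shadow-log] pipeline cmd ${i} (${commands[i][0]}) failed: ${r.error}`);
      ok = false;
    }
  });
  return ok;
}
```

With this shape a bad auth token shows up as a logged HTTP 401 rather than a silently empty ZSET.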
* docs(scoring): fix stale v1 references + clarify two-copy source-tiers mirror
Two follow-up review findings on PR #3069:
P2 — Source-tier "single source of truth" comments were outdated.
PR #3069 ships TWO JSON copies (shared/source-tiers.json for Vercel edge +
main relay container, scripts/shared/source-tiers.json for Railway services
using rootDirectory=scripts). Comments in server/_shared/source-tiers.ts
and scripts/ais-relay.cjs now explicitly document the mirror setup and
point at the two tests that enforce byte-identity: the existing
scripts-shared-mirror test (tests/edge-functions.test.mjs:37-48) and a
new explicit cross-check in tests/importance-score-parity.test.mjs.
Adding the assertion here is belt-and-suspenders: if edge-functions.test.mjs
is ever narrowed, the parity test still catches drift.
P3 — Stale v1 references in shared metadata/docs. The actual writer
moved to shadow:score-log:v2 in notification-relay.cjs, but
server/_shared/cache-keys.ts:23-31 still documented v1 and exported the
v1 string. No runtime impact (the export has zero importers — relay
uses its own local const) but misleading. Updated the doc block to
explain both v1 (legacy, self-pruning) and v2 (current), bumped the
constant to v2, and added a comment that notification-relay.cjs is the
owner. Header comment in scripts/shadow-score-report.mjs now documents
the SHADOW_SCORE_KEY override path too.
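The dual-format read path in shadow-score-{report,rank}.mjs can be sketched as follows (the v2 field names are taken from the commit above; the returned envelope shape is an assumption):

```javascript
// Sketch of dual-format member parsing: v2 members are JSON objects,
// v1 members are opaque legacy strings. The returned shape is assumed.
function parseShadowMember(member) {
  try {
    const parsed = JSON.parse(member);
    if (parsed && typeof parsed === 'object') {
      // v2: { severity, source, corroborationCount, variant, ... }
      return { version: 2, ...parsed };
    }
  } catch {
    // not JSON: fall through to the legacy v1 string path
  }
  return { version: 1, raw: member };
}
```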
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
210 lines
8.6 KiB
JavaScript
/**
 * Parity test: the relay-inlined importance scorer (scripts/ais-relay.cjs)
 * must produce identical output to the canonical digest scorer
 * (server/worldmonitor/news/v1/list-feed-digest.ts).
 *
 * Background: PR #2604 introduced importanceScore in the digest. The relay
 * republishes classified headlines as rss_alert events and must carry a score
 * recomputed from the post-LLM threat level (see docs/internal/scoringDiagnostic.md).
 * Both sides load SOURCE_TIERS from shared/source-tiers.json (same bytes), so
 * tier-map parity is structural. This test covers SEVERITY_SCORES, SCORE_WEIGHTS,
 * and computeImportanceScore() itself — the pieces still duplicated until a
 * follow-up moves them into shared/ too (todo #195, part 2).
 *
 * Run: node --test tests/importance-score-parity.test.mjs
 */

import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import { readFileSync } from 'node:fs';
import { resolve, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';

const __dirname = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(__dirname, '..');

const digestSrc = readFileSync(
  resolve(repoRoot, 'server/worldmonitor/news/v1/list-feed-digest.ts'),
  'utf-8',
);
const relaySrc = readFileSync(
  resolve(repoRoot, 'scripts/ais-relay.cjs'),
  'utf-8',
);

// Shared source of truth: both sides load this JSON at runtime.
// The test uses it as the oracle for tier lookups.
const sharedSourceTiers = JSON.parse(
  readFileSync(resolve(repoRoot, 'shared/source-tiers.json'), 'utf-8'),
);

// ── Extract constants from source files ──────────────────────────────────────

function extractObjectLiteral(src, varName) {
  // Locate `<prefix>const NAME ... = ` then brace-match the literal. Works for
  // single-line and multi-line objects and tolerates `as const` / type suffixes.
  // Not JS-aware: does not skip strings/comments/templates. Current constants
  // are plain objects of primitives so this is sufficient; if the tracked
  // literals ever grow embedded braces inside strings, upgrade this to the
  // TypeScript compiler API.
  const re = new RegExp(`(?:export\\s+)?const\\s+${varName}\\b[^=]*=\\s*\\{`);
  const match = src.match(re);
  if (!match) throw new Error(`Could not find declaration for ${varName}`);
  const braceStart = match.index + match[0].length - 1;
  let depth = 1;
  let i = braceStart + 1;
  while (i < src.length && depth > 0) {
    const ch = src[i];
    if (ch === '{') depth++;
    else if (ch === '}') depth--;
    i++;
  }
  if (depth !== 0) throw new Error(`Unbalanced braces in ${varName}`);
  const literal = src.slice(braceStart, i);
  return new Function(`return (${literal});`)();
}

function extractFunctionBody(src, fnSignature) {
  const idx = src.indexOf(fnSignature);
  if (idx === -1) throw new Error(`Could not find ${fnSignature}`);
  const openIdx = src.indexOf('{', idx + fnSignature.length);
  let depth = 1;
  let i = openIdx + 1;
  while (i < src.length && depth > 0) {
    if (src[i] === '{') depth++;
    else if (src[i] === '}') depth--;
    i++;
  }
  return src.slice(openIdx + 1, i - 1);
}

const digestSeverityScores = extractObjectLiteral(digestSrc, 'SEVERITY_SCORES');
const digestScoreWeights = extractObjectLiteral(digestSrc, 'SCORE_WEIGHTS');

const relaySeverityScores = extractObjectLiteral(relaySrc, 'RELAY_SEVERITY_SCORES');
const relayScoreWeights = extractObjectLiteral(relaySrc, 'RELAY_SCORE_WEIGHTS');

// ── Reconstruct the scorers as pure functions for output comparison ─────────

const digestFnBody = extractFunctionBody(digestSrc, 'function computeImportanceScore(');
const digestComputeImportanceScore = new Function(
  'level', 'source', 'corroborationCount', 'publishedAt',
  'SEVERITY_SCORES', 'SCORE_WEIGHTS', 'SOURCE_TIERS',
  `
  function getSourceTier(name) { return SOURCE_TIERS[name] ?? 4; }
  ${digestFnBody}
  `,
);

function digestScore(level, source, corroboration, publishedAt) {
  return digestComputeImportanceScore(
    level, source, corroboration, publishedAt,
    digestSeverityScores, digestScoreWeights, sharedSourceTiers,
  );
}

const relayFnBody = extractFunctionBody(relaySrc, 'function relayComputeImportanceScore(');
const relayComputeImportanceScore = new Function(
  'level', 'source', 'corroborationCount', 'publishedAt',
  'RELAY_SEVERITY_SCORES', 'RELAY_SCORE_WEIGHTS', 'RELAY_SOURCE_TIERS',
  `
  function relayGetSourceTier(name) { return RELAY_SOURCE_TIERS[name] ?? 4; }
  ${relayFnBody}
  `,
);

function relayScore(level, source, corroboration, publishedAt) {
  return relayComputeImportanceScore(
    level, source, corroboration, publishedAt,
    relaySeverityScores, relayScoreWeights, sharedSourceTiers,
  );
}

// ── Tests ────────────────────────────────────────────────────────────────────

describe('SOURCE_TIERS structural parity', () => {
  it('shared/source-tiers.json has the expected shape', () => {
    assert.ok(Object.keys(sharedSourceTiers).length > 100, 'tier map unexpectedly small');
    for (const [name, tier] of Object.entries(sharedSourceTiers)) {
      assert.ok([1, 2, 3, 4].includes(tier), `${name} has invalid tier ${tier}`);
    }
  });

  it('scripts/shared/source-tiers.json matches shared/source-tiers.json byte-for-byte', () => {
    // Also guarded by tests/edge-functions.test.mjs (scripts-shared-mirror).
    // Duplicated here as an explicit parity cross-check so drift can't sneak
    // through if the edge-functions test is ever narrowed.
    const canonical = readFileSync(resolve(repoRoot, 'shared/source-tiers.json'), 'utf-8');
    const mirror = readFileSync(resolve(repoRoot, 'scripts/shared/source-tiers.json'), 'utf-8');
    assert.equal(
      mirror, canonical,
      'scripts/shared/source-tiers.json drifted from shared/source-tiers.json — run: cp shared/source-tiers.json scripts/shared/',
    );
  });
});

describe('SEVERITY_SCORES parity (digest ↔ relay)', () => {
  it('matches the canonical level → score mapping', () => {
    assert.deepEqual(relaySeverityScores, digestSeverityScores);
  });
});

describe('SCORE_WEIGHTS parity (digest ↔ relay)', () => {
  it('matches the canonical component weights', () => {
    assert.deepEqual(relayScoreWeights, digestScoreWeights);
  });

  it('weights sum to 1.0', () => {
    const sum = Object.values(digestScoreWeights).reduce((a, b) => a + b, 0);
    assert.ok(Math.abs(sum - 1.0) < 1e-9, `weights sum to ${sum}, expected 1.0`);
  });
});

describe('computeImportanceScore parity (digest ↔ relay)', () => {
  // Both scorers call Date.now() internally, so recency is non-deterministic
  // across calls but identical on the same call (we evaluate digest then relay
  // with the same wall-clock). publishedAt is "1h before the test ran" only
  // as a rough anchor — the exact recency score drifts with test run time,
  // which is acceptable because both sides see the same drift.
  const oneHourAgo = Date.now() - 3600_000;

  const cases = [
    ['critical', 'Reuters', 5],
    ['critical', 'BBC World', 3],
    ['critical', 'Defense One', 1],
    ['critical', 'Hacker News', 1],
    ['high', 'AP News', 2],
    ['high', 'Al Jazeera', 4],
    ['high', 'unknown-source', 1], // unknown source defaults to tier 4
    ['medium', 'BBC World', 1],
    ['medium', 'Federal Reserve', 5],
    ['low', 'Reuters', 1],
    ['info', 'Reuters', 1],
    ['info', 'Hacker News', 5],
  ];

  for (const [level, source, corr] of cases) {
    it(`${level} / ${source} / corr=${corr}`, () => {
      const a = digestScore(level, source, corr, oneHourAgo);
      const b = relayScore(level, source, corr, oneHourAgo);
      assert.equal(
        b, a,
        `score mismatch for ${level}/${source}/corr=${corr}: digest=${a} relay=${b}`,
      );
    });
  }

  // Intentional asymmetry documented at the relay's inline comment:
  // relay defensively returns 0 for unknown severity; digest returns NaN.
  // If the shared module refactor completes (todo #195 part 2), this
  // divergence disappears.
  it('handles unknown severity level without throwing', () => {
    const bad = 'bogus-level';
    const d = digestScore(bad, 'Reuters', 1, oneHourAgo);
    const r = relayScore(bad, 'Reuters', 1, oneHourAgo);
    // digest → NaN (propagates from undefined * number); relay → finite number (?? 0 fallback)
    assert.ok(Number.isNaN(d) || d === 0, `digest should be NaN or 0, got ${d}`);
    assert.ok(Number.isFinite(r), `relay should be finite (defensive), got ${r}`);
  });
});