Files
worldmonitor/tests/brief-composer-rule-dedup.test.mjs
Elie Habib 9c14820c69 fix(digest): brief filter-drop instrumentation + cache-key correctness (#3387)
* fix(digest): include sensitivity in digestFor cache key

buildDigest filters by rule.sensitivity BEFORE dedup, but digestFor
memoized only on (variant, lang, windowStart). Stricter-sensitivity
users in a shared bucket inherited the looser populator's pool,
producing the wrong story set and defeating downstream topic-grouping
adjacency once filterTopStories re-applied sensitivity.

Solution 1 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.

* feat(digest): instrument per-user filterTopStories drops

Adds an optional onDrop metrics callback to filterTopStories and threads
it through composeBriefFromDigestStories. The seeder aggregates counts
per composed brief and emits one structured log line per user per tick:

  [digest] brief filter drops user=<id> sensitivity=<s> in=<count>
    dropped_severity=<n> dropped_url=<n> dropped_headline=<n>
    dropped_shape=<n> out=<count>

Decides whether the conditional Solution 3 (post-filter regroup) is
warranted by quantifying how often post-group filter drops puncture
multi-member topics in production. No behaviour change for callers
that omit onDrop.

Solution 0 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.

* fix(digest): close two Sol-0 instrumentation gaps from code review

Review surfaced two P2 gaps in the filter-drop telemetry that weakened
its diagnostic purpose for Sol-3 gating:

1. Cap-truncation silent drop: filterTopStories broke on
   `out.length >= maxStories` BEFORE the onDrop emit sites, so up to
   (DIGEST_MAX_ITEMS - MAX_STORIES_PER_USER) stories per user were
   invisible. Added a 'cap' reason to DropMetricsFn and emit one event
   per skipped story so `in - out - sum(dropped_*) == 0` reconciles.

2. Wipeout invisibility: composeAndStoreBriefForUser only logged drop
   stats for the WINNING candidate. When every candidate composed to
   null, the log line never fired — exactly the wipeout case Sol-0
   was meant to surface. Now tracks per-candidate drops and emits an
   aggregate `outcome=wipeout` line covering all attempts.

Also tightens the digest-cache-key sensitivity regex test to anchor
inside the cache-key template literal (it would otherwise match the
unrelated `chosenCandidate.sensitivity ?? 'high'` in the new log line).

PR review residuals from
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md
ce-code-review run 20260424-232911-37a2d5df.

* chore: ignore .context/ ce-code-review run artifacts

The ce-code-review skill writes per-run artifacts (reviewer JSON,
synthesis.md, metadata.json) under .context/compound-engineering/.
These are local-only — neither tracked nor linted.

* fix(digest): emit per-attempt filter-drop rows, not per-user

Addresses two PR #3387 review findings:

- P2: Earlier candidates that composed to null (wiped out by post-group
  filtering) had their dropStats silently discarded when a later
  candidate shipped — exactly the signal Sol-0 was meant to surface.
- P3: outcome=wipeout row was labeled with allCandidateDrops[0]
  .sensitivity, misleading when candidates within one user have
  different sensitivities.

Fix: emit one structured row per attempted candidate, tagged with that
candidate's own sensitivity and variant. Outcome is shipped|rejected.
A wipeout is now detectable as "all rows for this user are rejected
within the tick" — no aggregate-row ambiguity. Removes the
allCandidateDrops accumulator entirely.

* fix(digest): align composeBriefFromDigestStories sensitivity default to 'high'

Addresses PR #3387 review (P2): composeBriefFromDigestStories defaulted
to `?? 'all'` while buildDigest, the digestFor cache key, and the new
per-attempt log line all default to `?? 'high'`. The mismatch is
harmless in production (the live cron path pre-filters the pool) but:

- A non-prefiltered caller with undefined sensitivity would silently
  ship medium/low stories.
- Per-attempt telemetry labels the attempt as `sensitivity=high` while
  compose actually applied 'all' — operators are misled.

Aligning compose to 'high' makes the four sites agree and the telemetry
honest. Production output is byte-identical (input pool was already
'high'-filtered upstream).

Adds 3 regression tests asserting the new default: critical/high admitted,
medium/low dropped, and onDrop fires reason=severity for the dropped
levels (locks in alignment with per-attempt telemetry).

* fix(digest): align remaining sensitivity defaults to 'high'

Addresses PR #3387 review (P2 + P3): three more sites still defaulted
missing sensitivity to 'all' while compose/buildDigest/cache/log now
treat it as 'high'.

P2 — compareRules (scripts/lib/brief-compose.mjs:35-36): the rank
function used to default to 'all', placing legacy undefined-sensitivity
rules FIRST in the candidate order. Compose then applied a 'high'
filter to them, shipping a narrow brief while an explicit 'all' rule
for the same user was never tried. Aligned to 'high' so the rank
matches what compose actually applies.

P3 — enrichBriefEnvelopeWithLLM (scripts/lib/brief-llm.mjs:526):
the digest prompt and cache key still used 'all' for legacy rules,
misleading personalization ("Reader sensitivity level: all" while the
brief contains only critical/high stories) and busting the cache for
legacy vs explicit-'all' rows that should share entries.

Also aligns the @deprecated composeBriefForRule (line 164) for
consistency, since tests still import it.

3 new regression tests in tests/brief-composer-rule-dedup.test.mjs
lock in the new ranking: explicit 'all' beats undefined-sensitivity,
undefined-sensitivity ties with explicit 'high' (decided by updatedAt),
and groupEligibleRulesByUser candidate order respects the rank.

6853/6853 tests pass (was 6850 → +3).
2026-04-25 00:23:29 +04:00

256 lines
11 KiB
JavaScript

// Regression tests for the Phase 3a composer's rule-selection logic.
//
// Two guards:
// 1. aiDigestEnabled default parity — undefined must be opt-IN, matching
// seed-digest-notifications.mjs:914 and notifications-settings.ts:228.
// 2. Per-user dedupe — alertRules are (userId, variant)-scoped but the
// brief key is user-scoped. Multi-variant users must produce exactly
// one brief per issue, with a deterministic tie-breaker.
import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import {
dedupeRulesByUser,
groupEligibleRulesByUser,
shouldExitNonZero,
} from '../scripts/lib/brief-compose.mjs';
function rule(overrides = {}) {
return {
userId: 'user_abc',
variant: 'full',
enabled: true,
digestMode: 'daily',
sensitivity: 'high',
aiDigestEnabled: true,
digestTimezone: 'UTC',
updatedAt: 1_700_000_000_000,
...overrides,
};
}
describe('dedupeRulesByUser', () => {
it('keeps a single rule unchanged', () => {
const out = dedupeRulesByUser([rule()]);
assert.equal(out.length, 1);
assert.equal(out[0].variant, 'full');
});
it('dedupes multi-variant users to one rule, preferring "full"', () => {
const out = dedupeRulesByUser([
rule({ variant: 'finance', sensitivity: 'high' }),
rule({ variant: 'full', sensitivity: 'critical' }),
rule({ variant: 'tech', sensitivity: 'all' }),
]);
assert.equal(out.length, 1);
assert.equal(out[0].variant, 'full');
});
it('when no full variant: picks most permissive sensitivity', () => {
const out = dedupeRulesByUser([
rule({ variant: 'tech', sensitivity: 'critical' }),
rule({ variant: 'finance', sensitivity: 'all' }),
rule({ variant: 'energy', sensitivity: 'high' }),
]);
assert.equal(out.length, 1);
// 'all' is the most permissive.
assert.equal(out[0].variant, 'finance');
});
it('never cross-contaminates across userIds', () => {
const out = dedupeRulesByUser([
rule({ userId: 'user_a', variant: 'full' }),
rule({ userId: 'user_b', variant: 'tech' }),
rule({ userId: 'user_a', variant: 'finance' }),
]);
assert.equal(out.length, 2);
const a = out.find((r) => r.userId === 'user_a');
const b = out.find((r) => r.userId === 'user_b');
assert.equal(a.variant, 'full');
assert.equal(b.variant, 'tech');
});
it('drops rules without a string userId', () => {
const out = dedupeRulesByUser([
rule({ userId: /** @type {any} */ (null) }),
rule({ userId: 'user_ok' }),
]);
assert.equal(out.length, 1);
assert.equal(out[0].userId, 'user_ok');
});
it('is deterministic across duplicate full-variant rules via updatedAt tie-breaker', () => {
const older = rule({ variant: 'full', sensitivity: 'high', updatedAt: 1_000 });
const newer = rule({ variant: 'full', sensitivity: 'high', updatedAt: 2_000 });
const out1 = dedupeRulesByUser([older, newer]);
const out2 = dedupeRulesByUser([newer, older]);
// Earlier updatedAt wins — stable under input reordering.
assert.equal(out1[0].updatedAt, 1_000);
assert.equal(out2[0].updatedAt, 1_000);
});
describe('undefined sensitivity ranks as "high" (NOT "all")', () => {
// PR #3387 review (P2): the rank function used to default to 'all',
// which would place a legacy undefined-sensitivity rule FIRST in
// the candidate order — but composeBriefFromDigestStories now
// applies a 'high' filter to undefined-sensitivity rules. Result:
// an explicit 'all' rule for the same user would never be tried,
// and the user would silently receive a narrower brief. Rank must
// match what compose actually applies.
function ruleWithoutSensitivity(overrides = {}) {
const r = rule(overrides);
delete r.sensitivity;
return r;
}
it('explicit "all" rule beats undefined-sensitivity rule of same variant + age', () => {
const explicitAll = rule({ variant: 'full', sensitivity: 'all', updatedAt: 1_000 });
const undefSens = ruleWithoutSensitivity({ variant: 'full', updatedAt: 1_000 });
// Both arrival orders must produce the same winner.
const out1 = dedupeRulesByUser([explicitAll, undefSens]);
const out2 = dedupeRulesByUser([undefSens, explicitAll]);
assert.equal(out1[0].sensitivity, 'all');
assert.equal(out2[0].sensitivity, 'all');
});
it('undefined-sensitivity rule ties with explicit "high" (decided by updatedAt)', () => {
// Both should rank as 'high' → tiebreak by updatedAt → newer (older?)
// matches existing semantics: earlier updatedAt wins per the
// "stable under input reordering" test above.
const undefSens = ruleWithoutSensitivity({ variant: 'full', updatedAt: 1_000 });
const explicitHigh = rule({ variant: 'full', sensitivity: 'high', updatedAt: 2_000 });
const out1 = dedupeRulesByUser([undefSens, explicitHigh]);
const out2 = dedupeRulesByUser([explicitHigh, undefSens]);
// Earlier updatedAt wins → undefined rule (1_000 < 2_000).
assert.equal(out1[0].updatedAt, 1_000);
assert.equal(out2[0].updatedAt, 1_000);
});
it('candidate order in groupEligibleRulesByUser respects new ranking', () => {
// groupEligibleRulesByUser sorts candidates so the most-permissive
// (and most-preferred) is tried first by composeAndStoreBriefForUser.
// After the rank-default fix, undefined-sensitivity should sit
// BELOW explicit 'all' in the try order.
const explicitAll = rule({ variant: 'full', sensitivity: 'all', updatedAt: 1_000 });
const undefSens = ruleWithoutSensitivity({ variant: 'full', updatedAt: 2_000 });
const grouped = groupEligibleRulesByUser([undefSens, explicitAll]);
const candidates = grouped.get('user_abc');
assert.equal(candidates[0].sensitivity, 'all', 'explicit "all" should be tried first');
assert.equal(candidates[1].sensitivity, undefined, 'undefined sensitivity should come second');
});
});
});
describe('aiDigestEnabled default parity', () => {
// The composer's main loop short-circuits on `rule.aiDigestEnabled
// === false`. Exercising the predicate directly so a refactor that
// re-inverts it (back to `!rule.aiDigestEnabled`) fails loud.
function shouldSkipForAiDigest(rule) {
return rule.aiDigestEnabled === false;
}
it('includes rules with aiDigestEnabled: true', () => {
assert.equal(shouldSkipForAiDigest({ aiDigestEnabled: true }), false);
});
it('includes rules with aiDigestEnabled: undefined (legacy rows)', () => {
assert.equal(shouldSkipForAiDigest({ aiDigestEnabled: undefined }), false);
});
it('includes rules with no aiDigestEnabled field at all (legacy rows)', () => {
assert.equal(shouldSkipForAiDigest({}), false);
});
it('excludes only when explicitly false', () => {
assert.equal(shouldSkipForAiDigest({ aiDigestEnabled: false }), true);
});
it('groupEligibleRulesByUser: opted-out preferred variant falls back to opted-in sibling', () => {
const grouped = groupEligibleRulesByUser([
rule({ variant: 'full', aiDigestEnabled: false, updatedAt: 100 }),
rule({ variant: 'finance', aiDigestEnabled: true, updatedAt: 200 }),
]);
const candidates = grouped.get('user_abc');
assert.ok(candidates, 'user is still eligible via the opt-in variant');
assert.equal(candidates.length, 1);
assert.equal(candidates[0].variant, 'finance');
});
it('groupEligibleRulesByUser: user with all variants opted-out is dropped entirely', () => {
const grouped = groupEligibleRulesByUser([
rule({ variant: 'full', aiDigestEnabled: false }),
rule({ variant: 'finance', aiDigestEnabled: false }),
]);
assert.equal(grouped.size, 0);
});
it('groupEligibleRulesByUser: retains all eligible candidates in preference order', () => {
const grouped = groupEligibleRulesByUser([
rule({ variant: 'finance', sensitivity: 'critical', updatedAt: 100 }),
rule({ variant: 'full', sensitivity: 'critical', updatedAt: 200 }),
rule({ variant: 'tech', sensitivity: 'all', updatedAt: 300 }),
]);
const candidates = grouped.get('user_abc');
assert.equal(candidates.length, 3);
// First is full (preferred variant); then tech (most permissive sensitivity);
// then finance. Fallback loop in the main() script tries them in this order.
assert.equal(candidates[0].variant, 'full');
assert.equal(candidates[1].variant, 'tech');
assert.equal(candidates[2].variant, 'finance');
});
it('shouldExitNonZero: returns false when no failures', () => {
assert.equal(shouldExitNonZero({ success: 10, failed: 0 }), false);
});
it('shouldExitNonZero: catches 100% failure on small attempted volume', () => {
// 4 attempted, 4 failed, 96 eligible skipped-empty. The earlier
// (eligibleUserCount) denominator would read 4/100=4% and pass.
assert.equal(shouldExitNonZero({ success: 0, failed: 4 }), true);
});
it('shouldExitNonZero: 1/20 failures is exactly at 5% (floor(20*0.05)=1), trips', () => {
// Exact-threshold boundary: documents intentional behaviour.
assert.equal(shouldExitNonZero({ success: 19, failed: 1 }), true);
});
it('shouldExitNonZero: 1/50 failures stays under threshold (floor(50*0.05)=2)', () => {
// Threshold floor is Math.max(1, floor(N*0.05)). For N<40 a
// single failure always trips. At N=50 the threshold is 2, so
// 1/50 stays green. Ops intuition: the 5% bar is only a "bar"
// once you have a meaningful sample.
assert.equal(shouldExitNonZero({ success: 49, failed: 1 }), false);
});
it('shouldExitNonZero: 2/10 exceeds threshold', () => {
// floor(10 * 0.05) = 0 → Math.max forces 1. failed=2 >= 1.
assert.equal(shouldExitNonZero({ success: 8, failed: 2 }), true);
});
it('shouldExitNonZero: single isolated failure still tripwires', () => {
// floor(1 * 0.05) = 0 → Math.max forces 1. failed=1 >= 1.
assert.equal(shouldExitNonZero({ success: 0, failed: 1 }), true);
});
it('shouldExitNonZero: zero attempted means no signal, returns false', () => {
assert.equal(shouldExitNonZero({ success: 0, failed: 0 }), false);
});
it('matches seed-digest-notifications convention', async () => {
// Cross-reference: the existing digest cron uses the same
// `!== false` test. If it drifts, the brief and digest will
// disagree on who is eligible. This assertion lives here to
// surface the divergence loudly.
const fs = await import('node:fs/promises');
const src = await fs.readFile(
new URL('../scripts/seed-digest-notifications.mjs', import.meta.url),
'utf8',
);
assert.ok(
src.includes('rule.aiDigestEnabled !== false'),
'seed-digest-notifications.mjs must keep `rule.aiDigestEnabled !== false`',
);
});
});