mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* fix(digest): include sensitivity in digestFor cache key
buildDigest filters by rule.sensitivity BEFORE dedup, but digestFor
memoized only on (variant, lang, windowStart). Stricter-sensitivity
users in a shared bucket inherited the looser populator's pool,
producing the wrong story set and defeating downstream topic-grouping
adjacency once filterTopStories re-applied sensitivity.
Solution 1 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
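The shape of the fix can be sketched as below. `digestFor`, its cache, and the key layout are simplified stand-ins for the real seeder internals (names and signatures here are assumptions, not the actual implementation) — the point is only that sensitivity must participate in the memo key:

```javascript
// Hypothetical sketch: the memo key must include every input that
// changes buildDigest's output. Before the fix the key omitted
// sensitivity, so users sharing (variant, lang, windowStart) shared
// one pool regardless of how strictly their rules filter.
const digestCache = new Map();

function digestFor(rule, windowStart, buildDigest) {
  const sensitivity = rule.sensitivity ?? 'high';
  // After the fix: sensitivity participates in the key.
  const key = `${rule.variant}:${rule.lang}:${windowStart}:${sensitivity}`;
  if (!digestCache.has(key)) {
    digestCache.set(key, buildDigest(rule, windowStart));
  }
  return digestCache.get(key);
}
```

With the old three-part key, the second call below would have been a (wrong) cache hit; with sensitivity in the key each sensitivity gets its own pool.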
* feat(digest): instrument per-user filterTopStories drops
Adds an optional onDrop metrics callback to filterTopStories and threads
it through composeBriefFromDigestStories. The seeder aggregates counts
per composed brief and emits one structured log line per user per tick:
[digest] brief filter drops user=<id> sensitivity=<s> in=<count>
dropped_severity=<n> dropped_url=<n> dropped_headline=<n>
dropped_shape=<n> out=<count>
Decides whether the conditional Solution 3 (post-filter regroup) is
warranted by quantifying how often post-group filter drops puncture
multi-member topics in production. No behaviour change for callers
that omit onDrop.
Solution 0 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
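A minimal sketch of how the seeder side of this could aggregate drop events into the log line above. The real `DropMetricsFn` signature lives in `shared/brief-filter.js`; `makeDropCounter` and `formatDropLine` are illustrative names, not repo code:

```javascript
// Hypothetical aggregator: collect onDrop reason strings into
// per-user counters, then format one structured line per brief.
function makeDropCounter() {
  const counts = { severity: 0, url: 0, headline: 0, shape: 0 };
  const onDrop = (reason) => {
    if (reason in counts) counts[reason] += 1;
  };
  return { counts, onDrop };
}

function formatDropLine(userId, sensitivity, inCount, outCount, counts) {
  return `[digest] brief filter drops user=${userId} sensitivity=${sensitivity} ` +
    `in=${inCount} dropped_severity=${counts.severity} dropped_url=${counts.url} ` +
    `dropped_headline=${counts.headline} dropped_shape=${counts.shape} out=${outCount}`;
}
```

The `onDrop` closure is what would be threaded into `composeBriefFromDigestStories` as the optional `onDrop` option.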
* fix(digest): close two Sol-0 instrumentation gaps from code review
Review surfaced two P2 gaps in the filter-drop telemetry that weakened
its diagnostic purpose for Sol-3 gating:
1. Cap-truncation silent drop: filterTopStories broke on
`out.length >= maxStories` BEFORE the onDrop emit sites, so up to
(DIGEST_MAX_ITEMS - MAX_STORIES_PER_USER) stories per user were
invisible. Added a 'cap' reason to DropMetricsFn and emit one event
per skipped story so `in - out - sum(dropped_*) == 0` reconciles.
2. Wipeout invisibility: composeAndStoreBriefForUser only logged drop
stats for the WINNING candidate. When every candidate composed to
null, the log line never fired — exactly the wipeout case Sol-0
was meant to surface. Now tracks per-candidate drops and emits an
aggregate `outcome=wipeout` line covering all attempts.
Also tightens the digest-cache-key sensitivity regex test to anchor
inside the cache-key template literal (it would otherwise match the
unrelated `chosenCandidate.sensitivity ?? 'high'` in the new log line).
PR review residuals from
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md
ce-code-review run 20260424-232911-37a2d5df.
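The reconciliation invariant gap 1 restores can be shown with a toy filter. `filterWithCap` is a stand-in for the real `filterTopStories`, not repo code — the point is that every input story is either kept or reported, so the drop ledger always balances:

```javascript
// Hypothetical sketch: a capped filter that reports every skip.
// With the silent `break` it replaced, stories past the cap vanished
// from the ledger and `in - out - totalDropped` went negative.
function filterWithCap(items, maxOut, keep, onDrop) {
  const out = [];
  for (const item of items) {
    if (out.length >= maxOut) {
      onDrop?.('cap'); // the previously-silent truncation path
      continue;
    }
    if (!keep(item)) {
      onDrop?.('severity');
      continue;
    }
    out.push(item);
  }
  return out;
}
```
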
* chore: ignore .context/ ce-code-review run artifacts
The ce-code-review skill writes per-run artifacts (reviewer JSON,
synthesis.md, metadata.json) under .context/compound-engineering/.
These are local-only — neither tracked nor linted.
* fix(digest): emit per-attempt filter-drop rows, not per-user
Addresses two PR #3387 review findings:
- P2: Earlier candidates that composed to null (wiped out by post-group
filtering) had their dropStats silently discarded when a later
candidate shipped — exactly the signal Sol-0 was meant to surface.
- P3: outcome=wipeout row was labeled with allCandidateDrops[0]
.sensitivity, misleading when candidates within one user have
different sensitivities.
Fix: emit one structured row per attempted candidate, tagged with that
candidate's own sensitivity and variant. Outcome is shipped|rejected.
A wipeout is now detectable as "all rows for this user are rejected
within the tick" — no aggregate-row ambiguity. Removes the
allCandidateDrops accumulator entirely.
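The per-attempt scheme can be sketched as follows. `compose()` stands in for `composeBriefFromDigestStories`; `attemptRows` and `isWipeout` are illustrative names, but the row shape mirrors the commit's description — attempts stop at the first candidate that ships, and each row carries its own candidate's sensitivity and variant:

```javascript
// Hypothetical sketch of per-attempt telemetry rows.
function attemptRows(candidates, compose) {
  const rows = [];
  for (const candidate of candidates) {
    const envelope = compose(candidate);
    rows.push({
      sensitivity: candidate.sensitivity ?? 'high',
      variant: candidate.variant,
      outcome: envelope ? 'shipped' : 'rejected',
    });
    if (envelope) break; // later candidates are never attempted
  }
  return rows;
}

// A wipeout needs no aggregate row: it is simply "every attempted
// candidate for this user was rejected within the tick".
const isWipeout = (rows) => rows.length > 0 && rows.every((r) => r.outcome === 'rejected');
```
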
* fix(digest): align composeBriefFromDigestStories sensitivity default to 'high'
Addresses PR #3387 review (P2): composeBriefFromDigestStories defaulted
to `?? 'all'` while buildDigest, the digestFor cache key, and the new
per-attempt log line all default to `?? 'high'`. The mismatch is
harmless in production (the live cron path pre-filters the pool) but:
- A non-prefiltered caller with undefined sensitivity would silently
ship medium/low stories.
- Per-attempt telemetry labels the attempt as `sensitivity=high` while
compose actually applied 'all' — operators are misled.
Aligning compose to 'high' makes the four sites agree and the telemetry
honest. Production output is byte-identical (input pool was already
'high'-filtered upstream).
Adds 3 regression tests asserting the new default: critical/high admitted,
medium/low dropped, and onDrop fires reason=severity for the dropped
levels (locks in alignment with per-attempt telemetry).
* fix(digest): align remaining sensitivity defaults to 'high'
Addresses PR #3387 review (P2 + P3): three more sites still defaulted
missing sensitivity to 'all' while compose/buildDigest/cache/log now
treat it as 'high'.
P2 — compareRules (scripts/lib/brief-compose.mjs:35-36): the rank
function used to default to 'all', placing legacy undefined-sensitivity
rules FIRST in the candidate order. Compose then applied a 'high'
filter to them, shipping a narrow brief while an explicit 'all' rule
for the same user was never tried. Aligned to 'high' so the rank
matches what compose actually applies.
P3 — enrichBriefEnvelopeWithLLM (scripts/lib/brief-llm.mjs:526):
the digest prompt and cache key still used 'all' for legacy rules,
misleading personalization ("Reader sensitivity level: all" while the
brief contains only critical/high stories) and busting the cache for
legacy vs explicit-'all' rows that should share entries.
Also aligns the @deprecated composeBriefForRule (line 164) for
consistency, since tests still import it.
3 new regression tests in tests/brief-composer-rule-dedup.test.mjs
lock in the new ranking: explicit 'all' beats undefined-sensitivity,
undefined-sensitivity ties with explicit 'high' (decided by updatedAt),
and groupEligibleRulesByUser candidate order respects the rank.
6853/6853 tests pass (was 6850 → +3).
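The ranking those regression tests lock in can be exercised directly. The sketch below re-implements the comparator from the listing that follows so it runs standalone; in the repo you would import `groupEligibleRulesByUser` from `scripts/lib/brief-compose.mjs` rather than copying this:

```javascript
// Stand-in copy of the ranking logic: full variant first, then by
// sensitivity permissiveness with undefined defaulting to 'high',
// then by oldest updatedAt.
const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };

function compareRules(a, b) {
  const aFull = a.variant === 'full' ? 0 : 1;
  const bFull = b.variant === 'full' ? 0 : 1;
  if (aFull !== bFull) return aFull - bFull;
  const aRank = SENSITIVITY_RANK[a.sensitivity ?? 'high'] ?? 0;
  const bRank = SENSITIVITY_RANK[b.sensitivity ?? 'high'] ?? 0;
  if (aRank !== bRank) return aRank - bRank;
  return (a.updatedAt ?? 0) - (b.updatedAt ?? 0);
}
```

With the old `?? 'all'` default, a legacy undefined-sensitivity rule would have sorted ahead of an explicit `'all'` rule; now it ties with explicit `'high'` and the tie falls to `updatedAt`.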
323 lines
14 KiB
JavaScript
// WorldMonitor Brief compose library.
//
// Pure helpers for producing the per-user brief envelope that the
// hosted magazine route (api/brief/*) + dashboard panel + future
// channels all consume. Shared between:
// - scripts/seed-digest-notifications.mjs (the consolidated cron;
//   composes a brief for every user it's about to dispatch a
//   digest to, so the magazine URL can be injected into the
//   notification output).
// - future tests + ad-hoc tools.
//
// Deliberately has NO top-level side effects: no env guards, no
// process.exit, no main(). Import anywhere.
//
// History: this file used to include a stand-alone Railway cron
// (`seed-brief-composer.mjs`). That path was retired in the
// consolidation PR — the digest cron now owns the compose+send
// pipeline so there is exactly one cron writing brief:{userId}:
// {issueDate} keys.

import {
  assembleStubbedBriefEnvelope,
  filterTopStories,
  issueDateInTz,
} from '../../shared/brief-filter.js';

// ── Rule dedupe (one brief per user, not per variant) ───────────────────────

const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };

function compareRules(a, b) {
  const aFull = a.variant === 'full' ? 0 : 1;
  const bFull = b.variant === 'full' ? 0 : 1;
  if (aFull !== bFull) return aFull - bFull;
  // Default missing sensitivity to 'high' (NOT 'all') so the rank
  // matches what compose/buildDigest/cache/log actually treat the
  // rule as. Otherwise a legacy undefined-sensitivity rule would be
  // ranked as the most-permissive 'all' and tried first, but compose
  // would then apply a 'high' filter — shipping a narrow brief while
  // an explicit 'all' rule for the same user is never tried.
  // See PR #3387 review (P2).
  const aRank = SENSITIVITY_RANK[a.sensitivity ?? 'high'] ?? 0;
  const bRank = SENSITIVITY_RANK[b.sensitivity ?? 'high'] ?? 0;
  if (aRank !== bRank) return aRank - bRank;
  return (a.updatedAt ?? 0) - (b.updatedAt ?? 0);
}

/**
 * Group eligible (not-opted-out) rules by userId with each user's
 * candidates sorted in preference order. Callers walk the candidate
 * list and take the first that produces non-empty stories — falls
 * back across variants cleanly.
 */
export function groupEligibleRulesByUser(rules) {
  const byUser = new Map();
  for (const rule of rules) {
    if (!rule || typeof rule.userId !== 'string') continue;
    if (rule.aiDigestEnabled === false) continue;
    const list = byUser.get(rule.userId);
    if (list) list.push(rule);
    else byUser.set(rule.userId, [rule]);
  }
  for (const list of byUser.values()) list.sort(compareRules);
  return byUser;
}

/**
 * @deprecated Kept for existing test imports. Prefer
 * groupEligibleRulesByUser + per-user fallback at call sites.
 */
export function dedupeRulesByUser(rules) {
  const out = [];
  for (const candidates of groupEligibleRulesByUser(rules).values()) {
    if (candidates.length > 0) out.push(candidates[0]);
  }
  return out;
}

// ── Failure gate ─────────────────────────────────────────────────────────────

/**
 * Decide whether the consolidated cron should exit non-zero because
 * the brief-write failure rate is structurally bad (not just a
 * transient blip). Denominator is ATTEMPTED writes, not eligible
 * users: skipped-empty users never reach the write path and must not
 * dilute the ratio.
 *
 * @param {{ success: number; failed: number; thresholdRatio?: number }} counters
 */
export function shouldExitNonZero({ success, failed, thresholdRatio = 0.05 }) {
  if (failed <= 0) return false;
  const attempted = success + failed;
  if (attempted <= 0) return false;
  const threshold = Math.max(1, Math.floor(attempted * thresholdRatio));
  return failed >= threshold;
}

// ── Insights fetch ───────────────────────────────────────────────────────────

/** Unwrap news:insights:v1 envelope and project the fields the brief needs. */
export function extractInsights(raw) {
  const data = raw?.data ?? raw;
  const topStories = Array.isArray(data?.topStories) ? data.topStories : [];
  const clusterCount = Number.isFinite(data?.clusterCount) ? data.clusterCount : topStories.length;
  const multiSourceCount = Number.isFinite(data?.multiSourceCount) ? data.multiSourceCount : 0;
  return {
    topStories,
    numbers: { clusters: clusterCount, multiSource: multiSourceCount },
  };
}

// ── Date + display helpers ───────────────────────────────────────────────────

const MONTH_NAMES = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

export function dateLongFromIso(iso) {
  const [y, m, d] = iso.split('-').map(Number);
  return `${d} ${MONTH_NAMES[m - 1]} ${y}`;
}

export function issueCodeFromIso(iso) {
  const [, m, d] = iso.split('-');
  return `${d}.${m}`;
}

export function localHourInTz(nowMs, timezone) {
  try {
    const fmt = new Intl.DateTimeFormat('en-US', {
      timeZone: timezone,
      hour: 'numeric',
      hour12: false,
    });
    const hour = fmt.formatToParts(new Date(nowMs)).find((p) => p.type === 'hour')?.value;
    const n = Number(hour);
    return Number.isFinite(n) ? n : 9;
  } catch {
    return 9;
  }
}

export function userDisplayNameFromId(userId) {
  // Clerk IDs look like "user_2abc…". Phase 3b will hydrate real
  // names via a Convex query; for now a generic placeholder so the
  // magazine's greeting reads naturally.
  void userId;
  return 'Reader';
}

// ── Compose a full brief for a single rule ──────────────────────────────────

const MAX_STORIES_PER_USER = 12;

/**
 * Filter + assemble a BriefEnvelope for one alert rule from a
 * prebuilt upstream top-stories list (news:insights:v1 shape).
 *
 * @deprecated The live path is composeBriefFromDigestStories(), which
 * reads from the same digest:accumulator pool as the email. This
 * entry point is kept only for tests that stub a news:insights payload
 * directly — real runs would ship a brief with a different story
 * list than the email and should use the digest-stories path.
 *
 * @param {object} rule — enabled alertRule row
 * @param {{ topStories: unknown[]; numbers: { clusters: number; multiSource: number } }} insights
 * @param {{ nowMs: number }} [opts]
 */
export function composeBriefForRule(rule, insights, { nowMs = Date.now() } = {}) {
  // Default to 'high' (NOT 'all') for parity with composeBriefFromDigestStories,
  // buildDigest, the digestFor cache key, and the per-attempt log line.
  // See PR #3387 review (P2).
  const sensitivity = rule.sensitivity ?? 'high';
  const tz = rule.digestTimezone ?? 'UTC';
  const stories = filterTopStories({
    stories: insights.topStories,
    sensitivity,
    maxStories: MAX_STORIES_PER_USER,
  });
  if (stories.length === 0) return null;
  const issueDate = issueDateInTz(nowMs, tz);
  return assembleStubbedBriefEnvelope({
    user: { name: userDisplayNameFromId(rule.userId), tz },
    stories,
    issueDate,
    dateLong: dateLongFromIso(issueDate),
    issue: issueCodeFromIso(issueDate),
    insightsNumbers: insights.numbers,
    // Same nowMs as the rest of the envelope so the function stays
    // deterministic for a given input — tests + retries see identical
    // output.
    issuedAt: nowMs,
    localHour: localHourInTz(nowMs, tz),
  });
}

// ── Compose from digest-accumulator stories (the live path) ─────────────────

// RSS titles routinely end with " - <Publisher>" / " | <Publisher>" /
// " — <Publisher>" (Google News normalised form + most major wires).
// Leaving the suffix in place means the brief headline reads like
// "... as Iran reimposes restrictions - AP News" instead of "... as
// Iran reimposes restrictions", and the source attribution underneath
// ends up duplicated. We strip the suffix ONLY when it matches the
// primarySource we're about to attribute anyway — so we never strip
// a real subtitle that happens to look like "foo - bar".
const HEADLINE_SUFFIX_RE_PART = /\s+[-\u2013\u2014|]\s+([^\s].*)$/;

/**
 * @param {string} title
 * @param {string} publisher
 * @returns {string}
 */
export function stripHeadlineSuffix(title, publisher) {
  if (typeof title !== 'string' || title.length === 0) return '';
  if (typeof publisher !== 'string' || publisher.length === 0) return title.trim();
  const trimmed = title.trim();
  const m = trimmed.match(HEADLINE_SUFFIX_RE_PART);
  if (!m) return trimmed;
  const tail = m[1].trim();
  // Case-insensitive full-string match. We're conservative: only strip
  // when the tail EQUALS the publisher — a tail that merely contains
  // it (e.g. "- AP News analysis") is editorial content and stays.
  if (tail.toLowerCase() !== publisher.toLowerCase()) return trimmed;
  return trimmed.slice(0, m.index).trimEnd();
}

/**
 * Adapter: the digest accumulator hydrates stories from
 * story:track:v1:{hash} (title / link / severity / lang / score /
 * mentionCount / description?) + story:sources:v1:{hash} SMEMBERS. It
 * does NOT carry a category or country-code — those fields are optional
 * in the upstream brief-filter shape and default cleanly.
 *
 * Since envelope v2, the story's `link` field is carried through as
 * `primaryLink` so filterTopStories can emit a BriefStory.sourceUrl.
 * Stories without a valid link are still passed through here — the
 * filter drops them at the validation boundary rather than this adapter.
 *
 * Description plumbing (post RSS-description fix, 2026-04-24):
 * When the ingested story:track row carries a cleaned RSS description,
 * it rides here as `s.description` and becomes the brief's baseline
 * description. When absent (old rows inside the 48h bleed, or feeds
 * without a description), we fall back to the cleaned headline —
 * preserving today's behavior and letting Phase 3b's LLM enrichment
 * still operate over something, not nothing.
 *
 * @param {object} s — digest-shaped story from buildDigest()
 */
function digestStoryToUpstreamTopStory(s) {
  const sources = Array.isArray(s?.sources) ? s.sources : [];
  const primarySource = sources.length > 0 ? sources[0] : 'Multiple wires';
  const rawTitle = typeof s?.title === 'string' ? s.title : '';
  const cleanTitle = stripHeadlineSuffix(rawTitle, primarySource);
  const rawDescription = typeof s?.description === 'string' ? s.description.trim() : '';
  return {
    primaryTitle: cleanTitle,
    // When upstream persists a real RSS description (via story:track:v1
    // post-fix), forward it; otherwise fall back to the cleaned headline
    // so downstream consumers (brief filter, Phase 3b LLM) always have
    // something to ground on.
    description: rawDescription || cleanTitle,
    primarySource,
    primaryLink: typeof s?.link === 'string' ? s.link : undefined,
    threatLevel: s?.severity,
    // story:track:v1 carries neither field, so the brief falls back
    // to 'General' / 'Global' via filterTopStories defaults.
    category: typeof s?.category === 'string' ? s.category : undefined,
    countryCode: typeof s?.countryCode === 'string' ? s.countryCode : undefined,
  };
}

/**
 * Compose a BriefEnvelope from a per-rule digest-accumulator pool
 * (same stories the email digest uses), plus global insights numbers
 * for the stats page.
 *
 * Returns null when no story survives the sensitivity filter — caller
 * falls back to another variant or skips the user.
 *
 * @param {object} rule — enabled alertRule row
 * @param {unknown[]} digestStories — output of buildDigest(rule, windowStart)
 * @param {{ clusters: number; multiSource: number }} insightsNumbers
 * @param {{ nowMs?: number, onDrop?: import('../../shared/brief-filter.js').DropMetricsFn }} [opts]
 *   `onDrop` is forwarded to filterTopStories so the seeder can
 *   aggregate per-user filter-drop counts without this module knowing
 *   how they are reported.
 */
export function composeBriefFromDigestStories(rule, digestStories, insightsNumbers, { nowMs = Date.now(), onDrop } = {}) {
  if (!Array.isArray(digestStories) || digestStories.length === 0) return null;
  // Default to 'high' (NOT 'all') for undefined sensitivity, aligning
  // with buildDigest at scripts/seed-digest-notifications.mjs:392 and
  // the digestFor cache key. The live cron path pre-filters the pool
  // to {critical, high}, so this default is a no-op for production
  // calls — but a non-prefiltered caller with undefined sensitivity
  // would otherwise silently widen to {medium, low} stories while the
  // operator log labels the attempt as 'high', misleading telemetry.
  // See PR #3387 review (P2) and Defect 2 / Solution 1 in
  // docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
  const sensitivity = rule.sensitivity ?? 'high';
  const tz = rule.digestTimezone ?? 'UTC';
  const upstreamLike = digestStories.map(digestStoryToUpstreamTopStory);
  const stories = filterTopStories({
    stories: upstreamLike,
    sensitivity,
    maxStories: MAX_STORIES_PER_USER,
    onDrop,
  });
  if (stories.length === 0) return null;
  const issueDate = issueDateInTz(nowMs, tz);
  return assembleStubbedBriefEnvelope({
    user: { name: userDisplayNameFromId(rule.userId), tz },
    stories,
    issueDate,
    dateLong: dateLongFromIso(issueDate),
    issue: issueCodeFromIso(issueDate),
    insightsNumbers,
    issuedAt: nowMs,
    localHour: localHourInTz(nowMs, tz),
  });
}