mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(brief-llm): canonical synthesis prompt + v3 cache key
Extends generateDigestProse to be the single source of truth for
brief executive-summary synthesis (canonicalises what was previously
split between brief-llm's generateDigestProse and seed-digest-
notifications.mjs's generateAISummary). Ports Brain B's prompt
features into buildDigestPrompt:
- ctx={profile, greeting, isPublic} parameter (back-compat: 4-arg
callers behave like today)
- per-story severity uppercased + short-hash prefix [h:XXXX] so the
model can emit rankedStoryHashes for stable re-ranking
- profile lines + greeting opener appear only when ctx.isPublic !== true
validateDigestProseShape gains optional rankedStoryHashes (≥4-char
strings, capped to MAX_STORIES_PER_USER × 2). v2-shaped rows still
pass — field defaults to [].
hashDigestInput v3:
- material includes profile-SHA, greeting bucket, isPublic flag,
per-story hash
- isPublic=true substitutes literal 'public' for userId in the cache
key so all share-URL readers of the same (date, sensitivity, pool)
hit ONE cache row (no PII in public cache key)
Adds generateDigestProsePublic(stories, sensitivity, deps) wrapper —
no userId param by design — for the share-URL surface.
Cache prefix bumped brief:llm:digest:v2 → v3. v2 rows expire on TTL.
Per the v1→v2 precedent (see hashDigestInput comment), one-tick cost
on rollout is acceptable for cache-key correctness.
Tests: 72/72 passing in tests/brief-llm.test.mjs (8 new for the v3
behaviors), full data suite 6952/6952.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 1, Codex-approved (5 rounds).
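A minimal sketch of the v3 public-key substitution described above. Only the rule (isPublic=true replaces userId with the literal 'public') and the `brief:llm:digest:v3` prefix come from the commit; the hashed-material layout and the `cacheKeyV3` name are assumptions for illustration.

```javascript
import { createHash } from 'node:crypto';

// Hedged sketch: the real hashDigestInput v3 also folds in a profile-SHA
// and greeting bucket; the exact field order here is an assumption.
function cacheKeyV3({ userId, isPublic, issueDate, sensitivity, storyHashes }) {
  // isPublic=true: literal 'public' replaces userId, so every anonymous
  // share-URL reader of the same (date, sensitivity, pool) shares ONE row.
  const subject = isPublic === true ? 'public' : userId;
  const material = [subject, issueDate, sensitivity, ...storyHashes].join('|');
  const digest = createHash('sha256').update(material).digest('hex').slice(0, 16);
  return `brief:llm:digest:v3:${digest}`;
}
```

Two public readers collide on one cache row (no PII in the key); an authenticated user's key stays distinct.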
* feat(brief): envelope v3 — adds digest.publicLead for share-URL surface
Bumps BRIEF_ENVELOPE_VERSION 2 → 3. Adds optional
BriefDigest.publicLead — non-personalised executive lead generated
by generateDigestProsePublic (already in this branch from the
previous commit) for the public share-URL surface. Personalised
`lead` is the canonical synthesis for authenticated channels;
publicLead is its profile-stripped sibling so api/brief/public/*
never serves user-specific content (watched assets/regions).
SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3] keeps v1 + v2 envelopes
in the 7-day TTL window readable through the rollout — the
composer only ever writes the current version, but readers must
tolerate older shapes that haven't expired yet. Same rollout
pattern used at the v1 → v2 bump.
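The write-one-version / read-many-versions contract above can be sketched as follows; the two constant names come from the commit, the guard body is an assumption:

```javascript
const BRIEF_ENVELOPE_VERSION = 3;              // the composer writes only this
const SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3]; // readers tolerate all three

// Hedged sketch of a reader-side guard: older envelopes still inside the
// 7-day TTL window must parse; anything outside the supported set throws.
function assertReadableEnvelopeVersion(envelope) {
  if (!SUPPORTED_ENVELOPE_VERSIONS.includes(envelope?.version)) {
    throw new Error(`unsupported brief envelope version: ${envelope?.version}`);
  }
  return envelope;
}
```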
Renderer changes (server/_shared/brief-render.js):
- ALLOWED_DIGEST_KEYS gains 'publicLead' (closed-key-set still
enforced; v2 envelopes pass because publicLead === undefined is
the v2 shape).
- assertBriefEnvelope: new isNonEmptyString check on publicLead
when present. Type contract enforced; absence is OK.
Tests (tests/brief-magazine-render.test.mjs):
- New describe block "v3 publicLead field": v3 envelope renders;
malformed publicLead rejected; v2 envelope still passes; ad-hoc
digest keys (e.g. synthesisLevel) still rejected — confirming
the closed-key-set defense holds for the cron-local-only fields
the orchestrator must NOT persist.
- BRIEF_ENVELOPE_VERSION pin updated 2 → 3 with rollout-rationale
comment.
Test results: 182 brief-related tests pass; full data suite
6956/6956.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 2, Codex Round-3 Medium #2.
* feat(brief): synthesis splice + rankedStoryHashes pre-cap re-order
Plumbs the canonical synthesis output (lead, threads, signals,
publicLead, rankedStoryHashes from generateDigestProse) through the
pure composer so the orchestration layer can hand pre-resolved data
into envelope.digest. Composer stays sync / no I/O — Codex Round-2
High #2 honored.
Changes:
scripts/lib/brief-compose.mjs:
- digestStoryToUpstreamTopStory now emits `hash` (the digest story's
stable identifier, falls back to titleHash when absent). Without
this, rankedStoryHashes from the LLM has nothing to match against.
- composeBriefFromDigestStories accepts opts.synthesis = {lead,
threads, signals, rankedStoryHashes?, publicLead?}. When passed,
splices into envelope.digest after the stub is built. Partial
synthesis (e.g. only `lead` populated) keeps stub defaults for the
other fields — graceful degradation when L2 fallback fires.
shared/brief-filter.js:
- filterTopStories accepts optional rankedStoryHashes. New helper
applyRankedOrder re-orders stories by short-hash prefix match
BEFORE the cap is applied, so the model's editorial judgment of
importance survives MAX_STORIES_PER_USER. Stable for ties; stories
not in the ranking come after in original order. Empty/missing
ranking is a no-op (legacy callers unchanged).
shared/brief-filter.d.ts:
- filterTopStories signature gains rankedStoryHashes?: string[].
- UpstreamTopStory gains hash?: unknown (carried through from
digestStoryToUpstreamTopStory).
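The pre-cap re-order described for `applyRankedOrder` can be sketched like this (the real helper lives in shared/brief-filter.js and may differ in detail; prefix matching, stable ties, unranked-after, and empty-ranking no-op are the behaviors the commit states):

```javascript
// Hedged sketch: re-order by short-hash prefix match BEFORE the cap.
function applyRankedOrder(stories, rankedStoryHashes) {
  if (!Array.isArray(rankedStoryHashes) || rankedStoryHashes.length === 0) {
    return stories; // empty/missing ranking: no-op, legacy callers unchanged
  }
  const UNRANKED = Number.MAX_SAFE_INTEGER;
  const rankOf = (story) => {
    const hash = typeof story?.hash === 'string' ? story.hash : '';
    // Prefix match: the model may emit only the first 8 chars of a hash.
    const i = rankedStoryHashes.findIndex(
      (h) => typeof h === 'string' && h.length > 0 && hash.startsWith(h),
    );
    return i === -1 ? UNRANKED : i;
  };
  return stories
    .map((story, originalIndex) => ({ story, originalIndex, rank: rankOf(story) }))
    // Stable: equal ranks (including all-unranked) keep original order.
    .sort((a, b) => (a.rank - b.rank) || (a.originalIndex - b.originalIndex))
    .map((entry) => entry.story);
}
```

Applying this before `slice(0, maxStories)` is what lets the model's ranking survive the MAX_STORIES_PER_USER cap.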
Tests added (tests/brief-from-digest-stories.test.mjs):
- synthesis substitutes lead/threads/signals/publicLead.
- legacy 4-arg callers (no synthesis) keep stub lead.
- partial synthesis (only lead) keeps stub threads/signals.
- rankedStoryHashes re-orders pool before cap.
- short-hash prefix match (model emits 8 chars; story carries full).
- unranked stories go after in original order.
Test results: 33/33 in brief-from-digest-stories; 182/182 across all
brief tests; full data suite 6956/6956.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 3, Codex Round-2 Low + Round-2 High #2.
* feat(brief): single canonical synthesis per user; rewire all channels
Restructures the digest cron's per-user compose + send loops to
produce ONE canonical synthesis per user per issueSlot — the lead
text every channel (email HTML, plain-text, Telegram, Slack,
Discord, webhook) and the magazine show is byte-identical. This
eliminates the "two-brain" divergence that was producing different
exec summaries on different surfaces (observed 2026-04-25 0802).
Architecture:
composeBriefsForRun (orchestration):
- Pre-annotates every eligible rule with lastSentAt + isDue once,
before the per-user pass. Same getLastSentAt helper the send loop
uses so compose + send agree on lastSentAt for every rule.
composeAndStoreBriefForUser (per-user):
- Two-pass winner walk: try DUE rules first (sortedDue), fall back
to ALL eligible rules (sortedAll) for compose-only ticks.
Preserves today's dashboard refresh contract for weekly /
twice_daily users on non-due ticks (Codex Round-4 High #1).
- Within each pass, walk by compareRules priority and pick the
FIRST candidate with a non-empty pool — mirrors today's behavior
at scripts/seed-digest-notifications.mjs:1044 and prevents the
"highest-priority but empty pool" edge case (Codex Round-4
Medium #2).
- Three-level synthesis fallback chain:
L1: generateDigestProse(fullPool, ctx={profile,greeting,!public})
L2: generateDigestProse(envelope-sized slice, ctx={})
L3: stub from assembleStubbedBriefEnvelope
Distinct log lines per fallback level so ops can quantify
failure-mode distribution.
- Generates publicLead in parallel via generateDigestProsePublic
(no userId param; cache-shared across all share-URL readers).
- Splices synthesis into envelope via composer's optional
`synthesis` arg (Step 3); rankedStoryHashes re-orders the pool
BEFORE the cap so editorial importance survives MAX_STORIES.
- synthesisLevel stored in the cron-local briefByUser entry — NOT
persisted in the envelope (renderer's assertNoExtraKeys would
reject; Codex Round-2 Medium #5).
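The three-level fallback chain above might be shaped roughly like this; `runSynthesisWithFallback` is named by a later commit in this log, but this signature and body are assumptions:

```javascript
// Hedged sketch of the L1 → L2 → L3 chain. Distinct log lines per level
// let ops quantify the failure-mode distribution, as the commit notes.
async function runSynthesisWithFallback({ fullPool, slicedPool, ctx, deps }) {
  try {
    // L1: full pool, personalised ctx (profile + greeting).
    return { level: 1, prose: await deps.generateDigestProse(fullPool, ctx) };
  } catch (err) {
    console.warn('[digest] synthesis L1 failed, trying L2:', err?.message);
  }
  try {
    // L2: envelope-sized slice, empty ctx.
    return { level: 2, prose: await deps.generateDigestProse(slicedPool, {}) };
  } catch (err) {
    console.warn('[digest] synthesis L2 failed, falling back to stub:', err?.message);
  }
  // L3: caller keeps the assembleStubbedBriefEnvelope defaults.
  return { level: 3, prose: null };
}
```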
Send loop:
- Reads lastSentAt via shared getLastSentAt helper (single source
of truth with compose flow).
- briefLead = brief?.envelope?.data?.digest?.lead — the canonical
lead. Passed to buildChannelBodies (text/Telegram/Slack/Discord),
injectEmailSummary (HTML email), and sendWebhook (webhook
payload's `summary` field). All-channel parity (Codex Round-1
Medium #6).
- Subject ternary reads cron-local synthesisLevel: 1 or 2 →
"Intelligence Brief", 3 → "Digest" (preserves today's UX for
fallback paths; Codex Round-1 Missing #5).
Removed:
- generateAISummary() — the second LLM call that produced the
divergent email lead. ~85 lines.
- AI_SUMMARY_CACHE_TTL constant — no longer referenced. The
digest:ai-summary:v1:* cache rows expire on their existing 1h
TTL (no cleanup pass).
Helpers added:
- getLastSentAt(rule) — extracted Upstash GET for digest:last-sent
so compose + send both call one source of truth.
- buildSynthesisCtx(rule, nowMs) — formats profile + greeting for
the canonical synthesis call. Preserves all today's prefs-fetch
failure-mode behavior.
Composer:
- compareRules now exported from scripts/lib/brief-compose.mjs so
the cron can sort each pass identically to groupEligibleRulesByUser.
Test results: full data suite 6962/6962 (was 6956 pre-Step 4; +6
new compose-synthesis tests from Step 3).
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 4 + 4b. Codex-approved (5 rounds).
* fix(brief-render): public-share lead fail-safe — never leak personalised lead
Public-share render path (api/brief/public/[hash].ts → renderer
publicMode=true) MUST NEVER serve the personalised digest.lead
because that string can carry profile context — watched assets,
saved-region names, etc. — written by generateDigestProse with
ctx.profile populated.
Previously: redactForPublic redacted user.name and stories.whyMatters
but passed digest.lead through unchanged. Codex Round-2 High
(security finding).
Now (v3 envelope contract):
- redactForPublic substitutes digest.lead = digest.publicLead when
the v3 envelope carries one (generated by generateDigestProsePublic
with profile=null, cache-shared across all public readers).
- When publicLead is absent (v2 envelope still in TTL window OR v3
envelope where publicLead generation failed), redactForPublic sets
digest.lead to empty string.
- renderDigestGreeting: when lead is empty, OMIT the <blockquote>
pull-quote entirely. Page still renders complete (greeting +
horizontal rule), just without the italic lead block.
- NEVER falls back to the original personalised lead.
assertBriefEnvelope still validates publicLead's contract (when
present, must be a non-empty string) BEFORE redactForPublic runs,
so a malformed publicLead throws before any leak risk.
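The fail-safe can be reduced to one rule: the public surface gets `publicLead` or nothing, never the personalised lead. A minimal sketch (the real `redactForPublic` also redacts user.name and stories.whyMatters):

```javascript
// Hedged sketch of the lead substitution only.
function redactLeadForPublic(digest) {
  const publicLead =
    typeof digest.publicLead === 'string' && digest.publicLead.length > 0
      ? digest.publicLead
      : ''; // v2 envelope or failed generation: empty string, never digest.lead
  return { ...digest, lead: publicLead };
}
```

An empty `lead` then makes the renderer omit the pull-quote block entirely.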
Tests added (tests/brief-magazine-render.test.mjs):
- v3 envelope renders publicLead in pull-quote, personalised lead
text never appears.
- v2 envelope (no publicLead) omits pull-quote; rest of page
intact.
- empty-string publicLead rejected by validator (defensive).
- private render still uses personalised lead.
Test results: 68 brief-magazine-render tests pass; full data suite
remains green from prior commit.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 5, Codex Round-2 High (security).
* feat(digest): brief lead parity log + extra acceptance tests
Adds the parity-contract observability line and supplementary
acceptance tests for the canonical synthesis path.
Parity log (per send, after successful delivery):
[digest] brief lead parity user=<id> rule=<v>:<s>:<lang>
synthesis_level=<1|2|3> exec_len=<n> brief_lead_len=<n>
channels_equal=<bool> public_lead_len=<n>
When channels_equal=false an extra WARN line fires —
"PARITY REGRESSION user=… — email lead != envelope lead." Sentry's
existing console-breadcrumb hook lifts this without an explicit
captureMessage call. Plan acceptance criterion A5.
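Building that log line might look like the sketch below; the field names and order come from the commit, the helper name and `channels_equal` computation are assumptions:

```javascript
// Hedged sketch of the per-send parity line.
function parityLogLine({ userId, rule, synthesisLevel, execLead, briefLead, publicLead }) {
  const channelsEqual = execLead === briefLead; // byte-identical contract
  return (
    `[digest] brief lead parity user=${userId} rule=${rule} ` +
    `synthesis_level=${synthesisLevel} exec_len=${execLead.length} ` +
    `brief_lead_len=${briefLead.length} channels_equal=${channelsEqual} ` +
    `public_lead_len=${publicLead.length}`
  );
}
```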
Tests added (tests/brief-llm.test.mjs, +9):
- generateDigestProsePublic: two distinct callers with identical
(sensitivity, story-pool) hit the SAME cache row (per Codex
Round-2 Medium #4 — "no PII in public cache key").
- public + private writes never collide on cache key (defensive).
- greeting bucket change re-keys the personalised cache (Brain B
parity).
- profile change re-keys the personalised cache.
- v3 cache prefix used (no v2 writes).
Test results: 77/77 in brief-llm; full data suite 6971/6971
(was 6962 pre-Step-7; +9 new public-cache tests).
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 6 (partial) + 7. Acceptance A5, A6.g, A6.f.
* test(digest): backfill A6.h/i/l/m acceptance tests via helper extraction
* fix(brief): close two correctness regressions on multi-rule + public surface
Two findings from human review of the canonical-synthesis PR:
1. Public-share redaction leaked personalised signals + threads.
The new prompt explicitly personalises both `lead` and `signals`
("personalise lead and signals"), but redactForPublic only
substituted `lead` — leaving `signals` and `threads` intact.
Public renderer's hasSignals gate would emit the signals page
whenever `digest.signals.length > 0`, exposing watched-asset /
region phrasing to anonymous readers. Same privacy bug class
the original PR was meant to close, just on different fields.
2. Multi-rule users got cross-pool lead/storyList mismatch.
composeAndStoreBriefForUser picks ONE winning rule for the
canonical envelope. The send loop then injected that ONE
`briefLead` into every due rule's channel body — even though
each rule's storyList came from its own (per-rule) digest pool.
Multi-rule users (e.g. `full` + `finance`) ended up with email
bodies leading on geopolitics while listing finance stories.
Cross-rule editorial mismatch reintroduced after the cross-
surface fix.
Fix 1 — public signals + threads:
- Envelope shape: BriefDigest gains `publicSignals?: string[]` +
`publicThreads?: BriefThread[]` (sibling fields to publicLead).
Renderer's ALLOWED_DIGEST_KEYS extended; assertBriefEnvelope
validates them when present.
- generateDigestProsePublic already returned a full prose object
(lead + signals + threads) — orchestration now captures all
three instead of just `.lead`. Composer splices each into its
envelope slot.
- redactForPublic substitutes:
digest.lead ← publicLead (or empty → omits pull-quote)
digest.signals ← publicSignals (or empty → omits signals page)
digest.threads ← publicThreads (or category-derived stub via
new derivePublicThreadsStub helper — never
falls back to the personalised threads)
- New tests cover all three substitutions + their fail-safes.
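The three substitutions and their fail-safes can be sketched together; `derivePublicThreadsStub` is named by the commit, but this helper name (`redactPublicDigestFields`) and the exact guards are assumptions:

```javascript
// Hedged sketch: each personalised field is replaced by its public
// sibling, with a fail-safe that never falls back to personalised data.
function redactPublicDigestFields(digest, derivePublicThreadsStub) {
  return {
    ...digest,
    // Empty lead → renderer omits the pull-quote.
    lead: typeof digest.publicLead === 'string' ? digest.publicLead : '',
    // Empty signals → hasSignals gate omits the signals page.
    signals: Array.isArray(digest.publicSignals) ? digest.publicSignals : [],
    // Category-derived stub, never the personalised threads.
    threads: Array.isArray(digest.publicThreads) && digest.publicThreads.length > 0
      ? digest.publicThreads
      : derivePublicThreadsStub(digest),
  };
}
```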
Fix 2 — per-rule synthesis in send loop:
- Each due rule independently calls runSynthesisWithFallback over
ITS OWN pool + ctx. Channel body lead is internally consistent
with the storyList (both from the same pool).
- Cache absorbs the cost: when this is the winner rule, the
synthesis hits the cache row written during the compose pass
(same userId/sensitivity/pool/ctx) — no extra LLM call. Only
multi-rule users with non-overlapping pools incur additional
LLM calls.
- magazineUrl still points at the winner's envelope (single brief
per user per slot — `(userId, issueSlot)` URL contract). Channel
lead vs magazine lead may differ for non-winner rule sends;
documented as acceptable trade-off (URL/key shape change to
support per-rule magazines is out of scope for this PR).
- Parity log refined: adds `winner_match=<bool>` field. The
PARITY REGRESSION warning now fires only when winner_match=true
AND the channel lead differs from the envelope lead (the actual
contract regression). Non-winner sends with legitimately
different leads no longer spam the alert.
Test results:
- tests/brief-magazine-render.test.mjs: 75/75 (+7 new for public
signals/threads + validator + private-mode-ignores-public-fields)
- Full data suite: 6995/6995 (was 6988; +7 net)
- typecheck + typecheck:api: clean
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Addresses 2 review findings on PR #3396 not anticipated in the
5-round Codex review.
* fix(brief): unify compose+send window, fall through filter-rejection
Address two residual risks in PR #3396 (single-canonical-brain refactor):
Risk 1 — canonical lead synthesized from a fixed 24h pool while the
send loop ships stories from `lastSentAt ?? 24h`. For weekly users
that meant a 24h-pool lead bolted onto a 7d email body — the same
cross-surface divergence the refactor was meant to eliminate, just in
a different shape. Twice-daily users hit a 12h-vs-24h variant.
Fix: extract the window formula to `digestWindowStartMs(lastSentAt,
nowMs, defaultLookbackMs)` in digest-orchestration-helpers.mjs and
call it from BOTH the compose path's digestFor closure AND the send
loop. The compose path now derives windowStart per-candidate from
`cand.lastSentAt`, identical to what the send loop will use for that
rule. Removed the now-unused BRIEF_STORY_WINDOW_MS constant.
Side-effect: digestFor now receives the full annotated candidate
(`cand`) instead of just the rule, so it can reach `cand.lastSentAt`.
Backwards-compatible at the helper level — pickWinningCandidateWithPool
forwards `cand` instead of `cand.rule`.
Cache memo hit rate drops since lastSentAt varies per-rule, but
correctness > a few extra Upstash GETs.
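The extracted window formula is small enough to show whole; the helper name and parameters come from the commit, the body is the stated `lastSentAt ?? default` semantics:

```javascript
// Single source of truth for the digest window, called from BOTH the
// compose path's digestFor closure and the send loop.
function digestWindowStartMs(lastSentAt, nowMs, defaultLookbackMs) {
  // `??`, not `||`: an epoch-zero lastSentAt (0) is a real timestamp
  // and must not fall through to the default lookback.
  return lastSentAt ?? (nowMs - defaultLookbackMs);
}
```

A weekly rule with a 7-day-old `lastSentAt` now yields the same window for the lead synthesis and the shipped story list.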
Risk 2 — pickWinningCandidateWithPool returned the first candidate
with a non-empty raw pool as winner. If composeBriefFromDigestStories
then dropped every story (URL/headline/shape filters), the caller
bailed without trying lower-priority candidates. Pre-PR behaviour was
to keep walking. This regressed multi-rule users whose top-priority
rule's pool happens to be entirely filter-rejected.
Fix: optional `tryCompose(cand, stories)` callback on
pickWinningCandidateWithPool. When provided, the helper calls it after
the non-empty pool check; falsy return → log filter-rejected and walk
to the next candidate; truthy → returns `{winner, stories,
composeResult}` so the caller can reuse the result. Without the
callback, legacy semantics preserved (existing tests + callers
unaffected).
Caller composeAndStoreBriefForUser passes a no-synthesis compose call
as tryCompose — cheap pure-JS, no I/O. Synthesis only runs once after
the winner is locked in, so the perf cost is one extra compose per
filter-rejected candidate, no extra LLM round-trips.
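The tryCompose fall-through can be sketched as below; the real `pickWinningCandidateWithPool` is part of the orchestration helpers and likely async with pool fetching, so this synchronous shape is an assumption:

```javascript
// Hedged sketch of the winner walk with optional tryCompose fall-through.
function pickWinningCandidateWithPool(candidates, poolFor, tryCompose) {
  for (const cand of candidates) {
    const stories = poolFor(cand);
    if (!Array.isArray(stories) || stories.length === 0) continue; // empty raw pool
    if (typeof tryCompose !== 'function') {
      return { winner: cand, stories }; // legacy semantics: first non-empty pool wins
    }
    const composeResult = tryCompose(cand, stories);
    if (!composeResult) {
      // Every story dropped by URL/headline/shape filters: keep walking
      // to the next (lower-priority) candidate instead of bailing.
      console.log('[digest] candidate filter-rejected, trying next', cand?.rule?.variant);
      continue;
    }
    return { winner: cand, stories, composeResult }; // caller reuses the result
  }
  return null;
}
```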
Tests:
- 10 new cases in tests/digest-orchestration-helpers.test.mjs
covering: digestFor receiving full candidate; tryCompose
fall-through to lower-priority; all-rejected returns null;
composeResult forwarded; legacy semantics without tryCompose;
digestWindowStartMs lastSentAt-vs-default branches; weekly +
twice-daily window parity assertions; epoch-zero ?? guard.
- Updated tests/digest-cache-key-sensitivity.test.mjs static-shape
regex to match the new `cand.rule.sensitivity` cache-key shape
(intent unchanged: cache key MUST include sensitivity).
Stacked on PR #3396 — targets feat/brief-two-brain-divergence.
422 lines
18 KiB
JavaScript
// WorldMonitor Brief compose library.
//
// Pure helpers for producing the per-user brief envelope that the
// hosted magazine route (api/brief/*) + dashboard panel + future
// channels all consume. Shared between:
//   - scripts/seed-digest-notifications.mjs (the consolidated cron;
//     composes a brief for every user it's about to dispatch a
//     digest to, so the magazine URL can be injected into the
//     notification output).
//   - future tests + ad-hoc tools.
//
// Deliberately has NO top-level side effects: no env guards, no
// process.exit, no main(). Import anywhere.
//
// History: this file used to include a stand-alone Railway cron
// (`seed-brief-composer.mjs`). That path was retired in the
// consolidation PR — the digest cron now owns the compose+send
// pipeline so there is exactly one cron writing brief:{userId}:
// {issueDate} keys.

import {
  assembleStubbedBriefEnvelope,
  filterTopStories,
  issueDateInTz,
} from '../../shared/brief-filter.js';

// ── Rule dedupe (one brief per user, not per variant) ───────────────────────

const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 };

// Exported so the cron orchestration's two-pass winner walk
// (sortedDue / sortedAll) can sort each pass identically to how
// `groupEligibleRulesByUser` already orders candidates here. Kept as
// a same-shape function so callers can reuse it without re-deriving
// the priority key.
export function compareRules(a, b) {
  const aFull = a.variant === 'full' ? 0 : 1;
  const bFull = b.variant === 'full' ? 0 : 1;
  if (aFull !== bFull) return aFull - bFull;
  // Default missing sensitivity to 'high' (NOT 'all') so the rank
  // matches what compose/buildDigest/cache/log actually treat the
  // rule as. Otherwise a legacy undefined-sensitivity rule would be
  // ranked as the most-permissive 'all' and tried first, but compose
  // would then apply a 'high' filter — shipping a narrow brief while
  // an explicit 'all' rule for the same user is never tried.
  // See PR #3387 review (P2).
  const aRank = SENSITIVITY_RANK[a.sensitivity ?? 'high'] ?? 0;
  const bRank = SENSITIVITY_RANK[b.sensitivity ?? 'high'] ?? 0;
  if (aRank !== bRank) return aRank - bRank;
  return (a.updatedAt ?? 0) - (b.updatedAt ?? 0);
}

/**
 * Group eligible (not-opted-out) rules by userId with each user's
 * candidates sorted in preference order. Callers walk the candidate
 * list and take the first that produces non-empty stories — falls
 * back across variants cleanly.
 */
export function groupEligibleRulesByUser(rules) {
  const byUser = new Map();
  for (const rule of rules) {
    if (!rule || typeof rule.userId !== 'string') continue;
    if (rule.aiDigestEnabled === false) continue;
    const list = byUser.get(rule.userId);
    if (list) list.push(rule);
    else byUser.set(rule.userId, [rule]);
  }
  for (const list of byUser.values()) list.sort(compareRules);
  return byUser;
}

/**
 * @deprecated Kept for existing test imports. Prefer
 * groupEligibleRulesByUser + per-user fallback at call sites.
 */
export function dedupeRulesByUser(rules) {
  const out = [];
  for (const candidates of groupEligibleRulesByUser(rules).values()) {
    if (candidates.length > 0) out.push(candidates[0]);
  }
  return out;
}

// ── Failure gate ─────────────────────────────────────────────────────────────

/**
 * Decide whether the consolidated cron should exit non-zero because
 * the brief-write failure rate is structurally bad (not just a
 * transient blip). Denominator is ATTEMPTED writes, not eligible
 * users: skipped-empty users never reach the write path and must not
 * dilute the ratio.
 *
 * @param {{ success: number; failed: number; thresholdRatio?: number }} counters
 */
export function shouldExitNonZero({ success, failed, thresholdRatio = 0.05 }) {
  if (failed <= 0) return false;
  const attempted = success + failed;
  if (attempted <= 0) return false;
  const threshold = Math.max(1, Math.floor(attempted * thresholdRatio));
  return failed >= threshold;
}

// ── Insights fetch ───────────────────────────────────────────────────────────

/** Unwrap news:insights:v1 envelope and project the fields the brief needs. */
export function extractInsights(raw) {
  const data = raw?.data ?? raw;
  const topStories = Array.isArray(data?.topStories) ? data.topStories : [];
  const clusterCount = Number.isFinite(data?.clusterCount) ? data.clusterCount : topStories.length;
  const multiSourceCount = Number.isFinite(data?.multiSourceCount) ? data.multiSourceCount : 0;
  return {
    topStories,
    numbers: { clusters: clusterCount, multiSource: multiSourceCount },
  };
}

// ── Date + display helpers ───────────────────────────────────────────────────

const MONTH_NAMES = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

export function dateLongFromIso(iso) {
  const [y, m, d] = iso.split('-').map(Number);
  return `${d} ${MONTH_NAMES[m - 1]} ${y}`;
}

export function issueCodeFromIso(iso) {
  const [, m, d] = iso.split('-');
  return `${d}.${m}`;
}

export function localHourInTz(nowMs, timezone) {
  try {
    const fmt = new Intl.DateTimeFormat('en-US', {
      timeZone: timezone,
      hour: 'numeric',
      hour12: false,
    });
    const hour = fmt.formatToParts(new Date(nowMs)).find((p) => p.type === 'hour')?.value;
    const n = Number(hour);
    return Number.isFinite(n) ? n : 9;
  } catch {
    return 9;
  }
}

export function userDisplayNameFromId(userId) {
  // Clerk IDs look like "user_2abc…". Phase 3b will hydrate real
  // names via a Convex query; for now a generic placeholder so the
  // magazine's greeting reads naturally.
  void userId;
  return 'Reader';
}

// ── Compose a full brief for a single rule ──────────────────────────────────

// Cap on stories shown per user per brief.
//
// Default 12 — kept at the historical value because the offline sweep
// harness (scripts/sweep-topic-thresholds.mjs) showed bumping the cap
// to 16 against 2026-04-24 production replay data DROPPED visible
// quality at the active 0.45 threshold (visible_quality 0.916 → 0.716;
// positions 13-16 are mostly singletons or members of "should-separate"
// clusters at this threshold, so they dilute without helping adjacency).
//
// Env-tunable via DIGEST_MAX_STORIES_PER_USER so future sweep evidence
// (different threshold, different label set, different pool composition)
// can be acted on with a Railway env flip without a redeploy. Any
// invalid / non-positive value falls back to the 12 default.
//
// "Are we getting better" signal: re-run scripts/sweep-topic-thresholds.mjs
// with --cap N before flipping the env, and the daily
// scripts/brief-quality-report.mjs after.
function readMaxStoriesPerUser() {
  const raw = process.env.DIGEST_MAX_STORIES_PER_USER;
  if (raw == null || raw === '') return 12;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : 12;
}

// Exported so brief-llm.mjs (buildDigestPrompt + hashDigestInput) can
// slice to the same cap. Hard-coding `slice(0, 12)` there would mean
// the LLM prose only references the first 12 stories even when the
// brief envelope carries more — a quiet mismatch between what the
// reader sees as story cards vs the AI summary above them. Reviewer
// P1 on PR #3389.
export const MAX_STORIES_PER_USER = readMaxStoriesPerUser();

/**
 * Filter + assemble a BriefEnvelope for one alert rule from a
 * prebuilt upstream top-stories list (news:insights:v1 shape).
 *
 * @deprecated The live path is composeBriefFromDigestStories(), which
 * reads from the same digest:accumulator pool as the email. This
 * entry point is kept only for tests that stub a news:insights payload
 * directly — real runs would ship a brief with a different story
 * list than the email and should use the digest-stories path.
 *
 * @param {object} rule — enabled alertRule row
 * @param {{ topStories: unknown[]; numbers: { clusters: number; multiSource: number } }} insights
 * @param {{ nowMs: number }} [opts]
 */
export function composeBriefForRule(rule, insights, { nowMs = Date.now() } = {}) {
  // Default to 'high' (NOT 'all') for parity with composeBriefFromDigestStories,
  // buildDigest, the digestFor cache key, and the per-attempt log line.
  // See PR #3387 review (P2).
  const sensitivity = rule.sensitivity ?? 'high';
  const tz = rule.digestTimezone ?? 'UTC';
  const stories = filterTopStories({
    stories: insights.topStories,
    sensitivity,
    maxStories: MAX_STORIES_PER_USER,
  });
  if (stories.length === 0) return null;
  const issueDate = issueDateInTz(nowMs, tz);
  return assembleStubbedBriefEnvelope({
    user: { name: userDisplayNameFromId(rule.userId), tz },
    stories,
    issueDate,
    dateLong: dateLongFromIso(issueDate),
    issue: issueCodeFromIso(issueDate),
    insightsNumbers: insights.numbers,
    // Same nowMs as the rest of the envelope so the function stays
    // deterministic for a given input — tests + retries see identical
    // output.
    issuedAt: nowMs,
    localHour: localHourInTz(nowMs, tz),
  });
}

// ── Compose from digest-accumulator stories (the live path) ─────────────────
|
|
|
|
// RSS titles routinely end with " - <Publisher>" / " | <Publisher>" /
|
|
// " — <Publisher>" (Google News normalised form + most major wires).
|
|
// Leaving the suffix in place means the brief headline reads like
|
|
// "... as Iran reimposes restrictions - AP News" instead of "... as
|
|
// Iran reimposes restrictions", and the source attribution underneath
|
|
// ends up duplicated. We strip the suffix ONLY when it matches the
|
|
// primarySource we're about to attribute anyway — so we never strip
|
|
// a real subtitle that happens to look like "foo - bar".
|
|
const HEADLINE_SUFFIX_RE_PART = /\s+[-\u2013\u2014|]\s+([^\s].*)$/;
|
|
|
|
/**
|
|
* @param {string} title
|
|
* @param {string} publisher
|
|
* @returns {string}
|
|
*/
|
|
export function stripHeadlineSuffix(title, publisher) {
|
|
if (typeof title !== 'string' || title.length === 0) return '';
|
|
if (typeof publisher !== 'string' || publisher.length === 0) return title.trim();
|
|
const trimmed = title.trim();
|
|
const m = trimmed.match(HEADLINE_SUFFIX_RE_PART);
|
|
if (!m) return trimmed;
|
|
const tail = m[1].trim();
|
|
// Case-insensitive full-string match. We're conservative: only strip
|
|
// when the tail EQUALS the publisher — a tail that merely contains
|
|
// it (e.g. "- AP News analysis") is editorial content and stays.
|
|
if (tail.toLowerCase() !== publisher.toLowerCase()) return trimmed;
|
|
return trimmed.slice(0, m.index).trimEnd();
|
|
}
|
|
|
|
/**
|
|
* Adapter: the digest accumulator hydrates stories from
|
|
* story:track:v1:{hash} (title / link / severity / lang / score /
|
|
* mentionCount / description?) + story:sources:v1:{hash} SMEMBERS. It
|
|
* does NOT carry a category or country-code — those fields are optional
|
|
* in the upstream brief-filter shape and default cleanly.
|
|
*
|
|
* Since envelope v2, the story's `link` field is carried through as
|
|
* `primaryLink` so filterTopStories can emit a BriefStory.sourceUrl.
|
|
* Stories without a valid link are still passed through here — the
|
|
* filter drops them at the validation boundary rather than this adapter.
|
|
*
|
|
* Description plumbing (post RSS-description fix, 2026-04-24):
|
|
* When the ingested story:track row carries a cleaned RSS description,
|
|
* it rides here as `s.description` and becomes the brief's baseline
|
|
* description. When absent (old rows inside the 48h bleed, or feeds
|
|
* without a description), we fall back to the cleaned headline —
|
|
* preserving today's behavior and letting Phase 3b's LLM enrichment
|
|
* still operate over something, not nothing.
|
|
*
|
|
* @param {object} s — digest-shaped story from buildDigest()
|
|
*/
|
|
function digestStoryToUpstreamTopStory(s) {
  const sources = Array.isArray(s?.sources) ? s.sources : [];
  const primarySource = sources.length > 0 ? sources[0] : 'Multiple wires';
  const rawTitle = typeof s?.title === 'string' ? s.title : '';
  const cleanTitle = stripHeadlineSuffix(rawTitle, primarySource);
  const rawDescription = typeof s?.description === 'string' ? s.description.trim() : '';
  return {
    primaryTitle: cleanTitle,
    // When upstream persists a real RSS description (via story:track:v1
    // post-fix), forward it; otherwise fall back to the cleaned headline
    // so downstream consumers (brief filter, Phase 3b LLM) always have
    // something to ground on.
    description: rawDescription || cleanTitle,
    primarySource,
    primaryLink: typeof s?.link === 'string' ? s.link : undefined,
    threatLevel: s?.severity,
    // story:track:v1 carries neither field, so the brief falls back
    // to 'General' / 'Global' via filterTopStories defaults.
    category: typeof s?.category === 'string' ? s.category : undefined,
    countryCode: typeof s?.countryCode === 'string' ? s.countryCode : undefined,
    // Stable digest story hash. Carried through so:
    // (a) the canonical synthesis prompt can emit `rankedStoryHashes`
    //     referencing each story by hash (not position, not title),
    // (b) `filterTopStories` can re-order the pool by ranking BEFORE
    //     applying the MAX_STORIES_PER_USER cap, so the model's
    //     editorial judgment of importance survives the cap.
    // Falls back to titleHash when the digest path didn't materialise
    // a primary `hash` (rare; shape varies across producer versions).
    hash: typeof s?.hash === 'string' && s.hash.length > 0
      ? s.hash
      : (typeof s?.titleHash === 'string' ? s.titleHash : undefined),
  };
}
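
// Illustrative mapping (a sketch, not executed in this module — the exact
// primaryTitle depends on stripHeadlineSuffix's suffix rules, and the input
// values here are hypothetical):
//
//   digestStoryToUpstreamTopStory({
//     title: 'Example headline - Example Wire',
//     sources: ['Example Wire'],
//     severity: 'high',
//     titleHash: 'abcd1234',
//   })
//
// would yield primarySource 'Example Wire', a primaryTitle with the source
// suffix stripped, a description falling back to that cleaned title (the
// input carries no `description`), threatLevel 'high', category and
// countryCode undefined, and hash 'abcd1234' via the titleHash fallback.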
/**
 * Compose a BriefEnvelope from a per-rule digest-accumulator pool
 * (same stories the email digest uses), plus global insights numbers
 * for the stats page.
 *
 * Returns null when no story survives the sensitivity filter — caller
 * falls back to another variant or skips the user.
 *
 * Pure / synchronous. The cron orchestration layer pre-resolves the
 * canonical synthesis (`exec` from `generateDigestProse`) and the
 * non-personalised `publicLead` (`generateDigestProsePublic`) and
 * passes them in via `opts.synthesis` — this module performs no LLM
 * I/O.
 *
 * @param {object} rule — enabled alertRule row
 * @param {unknown[]} digestStories — output of buildDigest(rule, windowStart)
 * @param {{ clusters: number; multiSource: number }} insightsNumbers
 * @param {{
 *   nowMs?: number,
 *   onDrop?: import('../../shared/brief-filter.js').DropMetricsFn,
 *   synthesis?: {
 *     lead?: string,
 *     threads?: Array<{ tag: string, teaser: string }>,
 *     signals?: string[],
 *     rankedStoryHashes?: string[],
 *     publicLead?: string,
 *     publicSignals?: string[],
 *     publicThreads?: Array<{ tag: string, teaser: string }>,
 *   },
 * }} [opts]
 *   `onDrop` is forwarded to filterTopStories so the seeder can
 *   aggregate per-user filter-drop counts without this module knowing
 *   how they are reported.
 *   `synthesis` (when provided) substitutes envelope.digest.lead /
 *   threads / signals / publicLead with the canonical synthesis from
 *   the orchestration layer, and re-orders the candidate pool by
 *   `synthesis.rankedStoryHashes` before applying the cap.
 */
export function composeBriefFromDigestStories(rule, digestStories, insightsNumbers, { nowMs = Date.now(), onDrop, synthesis } = {}) {
  if (!Array.isArray(digestStories) || digestStories.length === 0) return null;
  // Default to 'high' (NOT 'all') for undefined sensitivity, aligning
  // with buildDigest at scripts/seed-digest-notifications.mjs:392 and
  // the digestFor cache key. The live cron path pre-filters the pool
  // to {critical, high}, so this default is a no-op for production
  // calls — but a non-prefiltered caller with undefined sensitivity
  // would otherwise silently widen to {medium, low} stories while the
  // operator log labels the attempt as 'high', misleading telemetry.
  // See PR #3387 review (P2) and Defect 2 / Solution 1 in
  // docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
  const sensitivity = rule.sensitivity ?? 'high';
  const tz = rule.digestTimezone ?? 'UTC';
  const upstreamLike = digestStories.map(digestStoryToUpstreamTopStory);
  const stories = filterTopStories({
    stories: upstreamLike,
    sensitivity,
    maxStories: MAX_STORIES_PER_USER,
    onDrop,
    rankedStoryHashes: synthesis?.rankedStoryHashes,
  });
  if (stories.length === 0) return null;
  const issueDate = issueDateInTz(nowMs, tz);
  const envelope = assembleStubbedBriefEnvelope({
    user: { name: userDisplayNameFromId(rule.userId), tz },
    stories,
    issueDate,
    dateLong: dateLongFromIso(issueDate),
    issue: issueCodeFromIso(issueDate),
    insightsNumbers,
    issuedAt: nowMs,
    localHour: localHourInTz(nowMs, tz),
  });
  // Splice canonical synthesis into the envelope's digest. Done as a
  // shallow merge so the assembleStubbedBriefEnvelope path stays the
  // single source for greeting/numbers/threads-default. We only
  // override the LLM-driven fields when the orchestrator supplied
  // them; missing fields fall back to the stub for graceful
  // degradation when synthesis fails.
  if (synthesis && envelope?.data?.digest) {
    if (typeof synthesis.lead === 'string' && synthesis.lead.length > 0) {
      envelope.data.digest.lead = synthesis.lead;
    }
    if (Array.isArray(synthesis.threads) && synthesis.threads.length > 0) {
      envelope.data.digest.threads = synthesis.threads;
    }
    if (Array.isArray(synthesis.signals)) {
      envelope.data.digest.signals = synthesis.signals;
    }
    if (typeof synthesis.publicLead === 'string' && synthesis.publicLead.length > 0) {
      envelope.data.digest.publicLead = synthesis.publicLead;
    }
    // Public signals/threads are non-personalised siblings produced by
    // generateDigestProsePublic. Captured separately from the
    // personalised signals/threads above so the share-URL renderer
    // never has to choose between leaking and omitting a whole page.
    if (Array.isArray(synthesis.publicSignals) && synthesis.publicSignals.length > 0) {
      envelope.data.digest.publicSignals = synthesis.publicSignals;
    }
    if (Array.isArray(synthesis.publicThreads) && synthesis.publicThreads.length > 0) {
      envelope.data.digest.publicThreads = synthesis.publicThreads;
    }
  }
  return envelope;
}
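
// Illustrative call (a sketch — the rule row, pool, and handler names here
// are hypothetical, not taken from a real caller):
//
//   const envelope = composeBriefFromDigestStories(
//     { userId: 'u1', sensitivity: 'high', digestTimezone: 'Europe/Paris' },
//     digestStories,                    // from buildDigest(rule, windowStart)
//     { clusters: 12, multiSource: 4 },
//     {
//       onDrop: recordDrop,            // a DropMetricsFn (hypothetical handler)
//       synthesis: {
//         lead: '…',
//         rankedStoryHashes: ['abcd1234'],
//         publicLead: '…',
//       },
//     },
//   );
//
// envelope is null when nothing survives the sensitivity filter; otherwise
// envelope.data.digest carries the spliced lead/publicLead, and the story
// pool was re-ordered by rankedStoryHashes before the per-user cap.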