mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
2f5445284b6e62493c8d2e1b31f95b57c283f1fd
8 Commits
2f5445284b
fix(brief): single canonical synthesis brain — eliminate email/brief lead divergence (#3396)
* feat(brief-llm): canonical synthesis prompt + v3 cache key
Extends generateDigestProse to be the single source of truth for
brief executive-summary synthesis (canonicalises what was previously
split between brief-llm's generateDigestProse and seed-digest-
notifications.mjs's generateAISummary). Ports Brain B's prompt
features into buildDigestPrompt:
- ctx={profile, greeting, isPublic} parameter (back-compat: 4-arg
callers behave like today)
- per-story severity uppercased + short-hash prefix [h:XXXX] so the
model can emit rankedStoryHashes for stable re-ranking
- profile lines + greeting opener appear only when ctx.isPublic !== true
validateDigestProseShape gains optional rankedStoryHashes (≥4-char
strings, capped to MAX_STORIES_PER_USER × 2). v2-shaped rows still
pass — field defaults to [].
hashDigestInput v3:
- material includes profile-SHA, greeting bucket, isPublic flag,
per-story hash
- isPublic=true substitutes literal 'public' for userId in the cache
key so all share-URL readers of the same (date, sensitivity, pool)
hit ONE cache row (no PII in public cache key)
Adds generateDigestProsePublic(stories, sensitivity, deps) wrapper —
no userId param by design — for the share-URL surface.
Cache prefix bumped brief:llm:digest:v2 → v3. v2 rows expire on TTL.
Per the v1→v2 precedent (see hashDigestInput comment), one-tick cost
on rollout is acceptable for cache-key correctness.
Tests: 72/72 passing in tests/brief-llm.test.mjs (8 new for the v3
behaviors), full data suite 6952/6952.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 1, Codex-approved (5 rounds).
* feat(brief): envelope v3 — adds digest.publicLead for share-URL surface
Bumps BRIEF_ENVELOPE_VERSION 2 → 3. Adds optional
BriefDigest.publicLead — non-personalised executive lead generated
by generateDigestProsePublic (already in this branch from the
previous commit) for the public share-URL surface. Personalised
`lead` is the canonical synthesis for authenticated channels;
publicLead is its profile-stripped sibling so api/brief/public/*
never serves user-specific content (watched assets/regions).
SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3] keeps v1 + v2 envelopes
in the 7-day TTL window readable through the rollout — the
composer only ever writes the current version, but readers must
tolerate older shapes that haven't expired yet. Same rollout
pattern used at the v1 → v2 bump.
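The rollout pattern above can be sketched as follows; the two constant names match the commit, the surrounding wiring is illustrative.

```javascript
// Writer emits only the current version; readers tolerate every version
// still alive inside the 7-day TTL window.
const BRIEF_ENVELOPE_VERSION = 3;
const SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3];

function assertReadableEnvelope(envelope) {
  if (!SUPPORTED_ENVELOPE_VERSIONS.includes(envelope.version)) {
    throw new Error(`unsupported brief envelope version: ${envelope.version}`);
  }
  return envelope;
}

function writeEnvelope(digest) {
  // The composer never writes an older shape.
  return { version: BRIEF_ENVELOPE_VERSION, data: { digest } };
}
```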
Renderer changes (server/_shared/brief-render.js):
- ALLOWED_DIGEST_KEYS gains 'publicLead' (closed-key-set still
enforced; v2 envelopes pass because publicLead === undefined is
the v2 shape).
- assertBriefEnvelope: new isNonEmptyString check on publicLead
when present. Type contract enforced; absence is OK.
Tests (tests/brief-magazine-render.test.mjs):
- New describe block "v3 publicLead field": v3 envelope renders;
malformed publicLead rejected; v2 envelope still passes; ad-hoc
digest keys (e.g. synthesisLevel) still rejected — confirming
the closed-key-set defense holds for the cron-local-only fields
the orchestrator must NOT persist.
- BRIEF_ENVELOPE_VERSION pin updated 2 → 3 with rollout-rationale
comment.
Test results: 182 brief-related tests pass; full data suite
6956/6956.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 2, Codex Round-3 Medium #2.
* feat(brief): synthesis splice + rankedStoryHashes pre-cap re-order
Plumbs the canonical synthesis output (lead, threads, signals,
publicLead, rankedStoryHashes from generateDigestProse) through the
pure composer so the orchestration layer can hand pre-resolved data
into envelope.digest. Composer stays sync / no I/O — Codex Round-2
High #2 honored.
Changes:
scripts/lib/brief-compose.mjs:
- digestStoryToUpstreamTopStory now emits `hash` (the digest story's
stable identifier, falls back to titleHash when absent). Without
this, rankedStoryHashes from the LLM has nothing to match against.
- composeBriefFromDigestStories accepts opts.synthesis = {lead,
threads, signals, rankedStoryHashes?, publicLead?}. When passed,
splices into envelope.digest after the stub is built. Partial
synthesis (e.g. only `lead` populated) keeps stub defaults for the
other fields — graceful degradation when L2 fallback fires.
shared/brief-filter.js:
- filterTopStories accepts optional rankedStoryHashes. New helper
applyRankedOrder re-orders stories by short-hash prefix match
BEFORE the cap is applied, so the model's editorial judgment of
importance survives MAX_STORIES_PER_USER. Ordering is stable for
ties; stories not in the ranking follow in their original order. An
empty or missing ranking is a no-op (legacy callers unchanged).
shared/brief-filter.d.ts:
- filterTopStories signature gains rankedStoryHashes?: string[].
- UpstreamTopStory gains hash?: unknown (carried through from
digestStoryToUpstreamTopStory).
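A sketch of the pre-cap re-ordering contract: applyRankedOrder is the helper named above, but this body is an illustration of its behaviour, not the shipped implementation. The model emits short-hash prefixes (e.g. 8 chars) while stories carry full hashes, so matching is by prefix.

```javascript
function applyRankedOrder(stories, rankedStoryHashes) {
  if (!Array.isArray(rankedStoryHashes) || rankedStoryHashes.length === 0) {
    return stories; // empty/missing ranking is a no-op for legacy callers
  }
  const rankOf = (story) => {
    const hash = typeof story.hash === "string" ? story.hash : "";
    const i = rankedStoryHashes.findIndex((prefix) => hash.startsWith(prefix));
    return i === -1 ? Infinity : i; // unranked sorts after every ranked story
  };
  return stories
    .map((story, index) => ({ story, index, rank: rankOf(story) }))
    .sort((a, b) => a.rank - b.rank || a.index - b.index) // index breaks ties
    .map((entry) => entry.story);
}

// Re-ordering runs BEFORE the cap, so the model's top picks survive it:
function filterTopStoriesSketch(stories, maxStories, rankedStoryHashes) {
  return applyRankedOrder(stories, rankedStoryHashes).slice(0, maxStories);
}
```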
Tests added (tests/brief-from-digest-stories.test.mjs):
- synthesis substitutes lead/threads/signals/publicLead.
- legacy 4-arg callers (no synthesis) keep stub lead.
- partial synthesis (only lead) keeps stub threads/signals.
- rankedStoryHashes re-orders pool before cap.
- short-hash prefix match (model emits 8 chars; story carries full).
- unranked stories go after in original order.
Test results: 33/33 in brief-from-digest-stories; 182/182 across all
brief tests; full data suite 6956/6956.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 3, Codex Round-2 Low + Round-2 High #2.
* feat(brief): single canonical synthesis per user; rewire all channels
Restructures the digest cron's per-user compose + send loops to
produce ONE canonical synthesis per user per issueSlot — the lead
text every channel (email HTML, plain-text, Telegram, Slack,
Discord, webhook) and the magazine show is byte-identical. This
eliminates the "two-brain" divergence that was producing different
exec summaries on different surfaces (observed 2026-04-25 0802).
Architecture:
composeBriefsForRun (orchestration):
- Pre-annotates every eligible rule with lastSentAt + isDue once,
before the per-user pass. Same getLastSentAt helper the send loop
uses so compose + send agree on lastSentAt for every rule.
composeAndStoreBriefForUser (per-user):
- Two-pass winner walk: try DUE rules first (sortedDue), fall back
to ALL eligible rules (sortedAll) for compose-only ticks.
Preserves today's dashboard refresh contract for weekly /
twice_daily users on non-due ticks (Codex Round-4 High #1).
- Within each pass, walk by compareRules priority and pick the
FIRST candidate with a non-empty pool — mirrors today's behavior
at scripts/seed-digest-notifications.mjs:1044 and prevents the
"highest-priority but empty pool" edge case (Codex Round-4
Medium #2).
- Three-level synthesis fallback chain:
L1: generateDigestProse(fullPool, ctx={profile,greeting,!public})
L2: generateDigestProse(envelope-sized slice, ctx={})
L3: stub from assembleStubbedBriefEnvelope
Distinct log lines per fallback level so ops can quantify
failure-mode distribution.
- Generates publicLead in parallel via generateDigestProsePublic
(no userId param; cache-shared across all share-URL readers).
- Splices synthesis into envelope via composer's optional
`synthesis` arg (Step 3); rankedStoryHashes re-orders the pool
BEFORE the cap so editorial importance survives MAX_STORIES.
- synthesisLevel stored in the cron-local briefByUser entry — NOT
persisted in the envelope (renderer's assertNoExtraKeys would
reject; Codex Round-2 Medium #5).
Send loop:
- Reads lastSentAt via shared getLastSentAt helper (single source
of truth with compose flow).
- briefLead = brief?.envelope?.data?.digest?.lead — the canonical
lead. Passed to buildChannelBodies (text/Telegram/Slack/Discord),
injectEmailSummary (HTML email), and sendWebhook (webhook
payload's `summary` field). All-channel parity (Codex Round-1
Medium #6).
- Subject ternary reads cron-local synthesisLevel: 1 or 2 →
"Intelligence Brief", 3 → "Digest" (preserves today's UX for
fallback paths; Codex Round-1 Missing #5).
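The L1 → L2 → L3 chain above can be sketched as below. Only generateDigestProse is a real name from this PR; the wrapper and its deps are hypothetical, and the real calls are async LLM round-trips (kept synchronous here for illustration).

```javascript
function runSynthesisWithFallbackSketch(fullPool, slice, ctx, deps) {
  try {
    // L1: full pool, personalised ctx (profile + greeting, not public)
    return { level: 1, prose: deps.generateDigestProse(fullPool, ctx) };
  } catch (err) {
    console.warn("[digest] synthesis L1 failed, trying L2:", err.message);
  }
  try {
    // L2: envelope-sized slice, empty ctx
    return { level: 2, prose: deps.generateDigestProse(slice, {}) };
  } catch (err) {
    console.warn("[digest] synthesis L2 failed, falling back to stub:", err.message);
  }
  // L3: stubbed prose, no LLM involved; each level logs distinctly so ops
  // can quantify the failure-mode distribution.
  return { level: 3, prose: deps.assembleStubbedProse(slice) };
}
```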
Removed:
- generateAISummary() — the second LLM call that produced the
divergent email lead. ~85 lines.
- AI_SUMMARY_CACHE_TTL constant — no longer referenced. The
digest:ai-summary:v1:* cache rows expire on their existing 1h
TTL (no cleanup pass).
Helpers added:
- getLastSentAt(rule) — extracted Upstash GET for digest:last-sent
so compose + send both call one source of truth.
- buildSynthesisCtx(rule, nowMs) — formats profile + greeting for
the canonical synthesis call. Preserves all today's prefs-fetch
failure-mode behavior.
Composer:
- compareRules now exported from scripts/lib/brief-compose.mjs so
the cron can sort each pass identically to groupEligibleRulesByUser.
Test results: full data suite 6962/6962 (was 6956 pre-Step 4; +6
new compose-synthesis tests from Step 3).
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 4 + 4b. Codex-approved (5 rounds).
* fix(brief-render): public-share lead fail-safe — never leak personalised lead
Public-share render path (api/brief/public/[hash].ts → renderer
publicMode=true) MUST NEVER serve the personalised digest.lead
because that string can carry profile context — watched assets,
saved-region names, etc. — written by generateDigestProse with
ctx.profile populated.
Previously: redactForPublic redacted user.name and stories.whyMatters
but passed digest.lead through unchanged. Codex Round-2 High
(security finding).
Now (v3 envelope contract):
- redactForPublic substitutes digest.lead = digest.publicLead when
the v3 envelope carries one (generated by generateDigestProsePublic
with profile=null, cache-shared across all public readers).
- When publicLead is absent (v2 envelope still in TTL window OR v3
envelope where publicLead generation failed), redactForPublic sets
digest.lead to empty string.
- renderDigestGreeting: when lead is empty, OMIT the <blockquote>
pull-quote entirely. Page still renders complete (greeting +
horizontal rule), just without the italic lead block.
- NEVER falls back to the original personalised lead.
assertBriefEnvelope still validates publicLead's contract (when
present, must be a non-empty string) BEFORE redactForPublic runs,
so a malformed publicLead throws before any leak risk.
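The fail-safe can be sketched as follows; function names and digest shape are illustrative, the contract (publicLead or empty string, never the personalised lead) is from the commit.

```javascript
function redactLeadForPublicSketch(digest) {
  const publicLead =
    typeof digest.publicLead === "string" && digest.publicLead.length > 0
      ? digest.publicLead
      : ""; // v2 row still in TTL window, or v3 where publicLead generation failed
  return { ...digest, lead: publicLead, publicLead: undefined };
}

// Renderer side: an empty lead omits the <blockquote> pull-quote entirely.
function renderPullQuoteSketch(lead) {
  return lead ? `<blockquote>${lead}</blockquote>` : "";
}
```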
Tests added (tests/brief-magazine-render.test.mjs):
- v3 envelope renders publicLead in pull-quote, personalised lead
text never appears.
- v2 envelope (no publicLead) omits pull-quote; rest of page
intact.
- empty-string publicLead rejected by validator (defensive).
- private render still uses personalised lead.
Test results: 68 brief-magazine-render tests pass; full data suite
remains green from prior commit.
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 5, Codex Round-2 High (security).
* feat(digest): brief lead parity log + extra acceptance tests
Adds the parity-contract observability line and supplementary
acceptance tests for the canonical synthesis path.
Parity log (per send, after successful delivery):
[digest] brief lead parity user=<id> rule=<v>:<s>:<lang>
synthesis_level=<1|2|3> exec_len=<n> brief_lead_len=<n>
channels_equal=<bool> public_lead_len=<n>
When channels_equal=false an extra WARN line fires —
"PARITY REGRESSION user=… — email lead != envelope lead." Sentry's
existing console-breadcrumb hook lifts this without an explicit
captureMessage call. Plan acceptance criterion A5.
Tests added (tests/brief-llm.test.mjs, +9):
- generateDigestProsePublic: two distinct callers with identical
(sensitivity, story-pool) hit the SAME cache row (per Codex
Round-2 Medium #4 — "no PII in public cache key").
- public + private writes never collide on cache key (defensive).
- greeting bucket change re-keys the personalised cache (Brain B
parity).
- profile change re-keys the personalised cache.
- v3 cache prefix used (no v2 writes).
Test results: 77/77 in brief-llm; full data suite 6971/6971
(was 6962 pre-Step-7; +9 new public-cache tests).
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 6 (partial) + 7. Acceptance A5, A6.g, A6.f.
* test(digest): backfill A6.h/i/l/m acceptance tests via helper extraction
* fix(brief): close two correctness regressions on multi-rule + public surface
Two findings from human review of the canonical-synthesis PR:
1. Public-share redaction leaked personalised signals + threads.
The new prompt explicitly personalises both `lead` and `signals`
("personalise lead and signals"), but redactForPublic only
substituted `lead` — leaving `signals` and `threads` intact.
Public renderer's hasSignals gate would emit the signals page
whenever `digest.signals.length > 0`, exposing watched-asset /
region phrasing to anonymous readers. Same privacy bug class
the original PR was meant to close, just on different fields.
2. Multi-rule users got cross-pool lead/storyList mismatch.
composeAndStoreBriefForUser picks ONE winning rule for the
canonical envelope. The send loop then injected that ONE
`briefLead` into every due rule's channel body — even though
each rule's storyList came from its own (per-rule) digest pool.
Multi-rule users (e.g. `full` + `finance`) ended up with email
bodies leading on geopolitics while listing finance stories.
A cross-rule editorial mismatch was reintroduced just as the
cross-surface one was fixed.
Fix 1 — public signals + threads:
- Envelope shape: BriefDigest gains `publicSignals?: string[]` +
`publicThreads?: BriefThread[]` (sibling fields to publicLead).
Renderer's ALLOWED_DIGEST_KEYS extended; assertBriefEnvelope
validates them when present.
- generateDigestProsePublic already returned a full prose object
(lead + signals + threads) — orchestration now captures all
three instead of just `.lead`. Composer splices each into its
envelope slot.
- redactForPublic substitutes:
digest.lead ← publicLead (or empty → omits pull-quote)
digest.signals ← publicSignals (or empty → omits signals page)
digest.threads ← publicThreads (or category-derived stub via
new derivePublicThreadsStub helper — never
falls back to the personalised threads)
- New tests cover all three substitutions + their fail-safes.
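A sketch of the three-field substitution above. derivePublicThreadsStub is named in the commit as category-derived; its body here, and the thread shape, are assumptions for illustration.

```javascript
// Hypothetical stub: one thread label per distinct story category.
function derivePublicThreadsStubSketch(stories) {
  const categories = [...new Set(stories.map((s) => s.category).filter(Boolean))];
  return categories.map((c) => ({ label: c, storyHashes: [] }));
}

function redactDigestForPublicSketch(digest, stories) {
  return {
    ...digest,
    lead: digest.publicLead || "",       // empty → pull-quote omitted
    signals: digest.publicSignals || [], // empty → signals page omitted
    threads:
      digest.publicThreads && digest.publicThreads.length > 0
        ? digest.publicThreads
        : derivePublicThreadsStubSketch(stories), // never the personalised threads
  };
}
```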
Fix 2 — per-rule synthesis in send loop:
- Each due rule independently calls runSynthesisWithFallback over
ITS OWN pool + ctx. Channel body lead is internally consistent
with the storyList (both from the same pool).
- Cache absorbs the cost: when this is the winner rule, the
synthesis hits the cache row written during the compose pass
(same userId/sensitivity/pool/ctx) — no extra LLM call. Only
multi-rule users with non-overlapping pools incur additional
LLM calls.
- magazineUrl still points at the winner's envelope (single brief
per user per slot — `(userId, issueSlot)` URL contract). Channel
lead vs magazine lead may differ for non-winner rule sends;
documented as acceptable trade-off (URL/key shape change to
support per-rule magazines is out of scope for this PR).
- Parity log refined: adds `winner_match=<bool>` field. The
PARITY REGRESSION warning now fires only when winner_match=true
AND the channel lead differs from the envelope lead (the actual
contract regression). Non-winner sends with legitimately
different leads no longer spam the alert.
Test results:
- tests/brief-magazine-render.test.mjs: 75/75 (+7 new for public
signals/threads + validator + private-mode-ignores-public-fields)
- Full data suite: 6995/6995 (was 6988; +7 net)
- typecheck + typecheck:api: clean
Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Addresses 2 review findings on PR #3396 not anticipated in the
5-round Codex review.
* fix(brief): unify compose+send window, fall through filter-rejection
Address two residual risks in PR #3396 (single-canonical-brain refactor):
Risk 1 — canonical lead synthesized from a fixed 24h pool while the
send loop ships stories from `lastSentAt ?? 24h`. For weekly users
that meant a 24h-pool lead bolted onto a 7d email body — the same
cross-surface divergence the refactor was meant to eliminate, just in
a different shape. Twice-daily users hit a 12h-vs-24h variant.
Fix: extract the window formula to `digestWindowStartMs(lastSentAt,
nowMs, defaultLookbackMs)` in digest-orchestration-helpers.mjs and
call it from BOTH the compose path's digestFor closure AND the send
loop. The compose path now derives windowStart per-candidate from
`cand.lastSentAt`, identical to what the send loop will use for that
rule. Removed the now-unused BRIEF_STORY_WINDOW_MS constant.
Side-effect: digestFor now receives the full annotated candidate
(`cand`) instead of just the rule, so it can reach `cand.lastSentAt`.
Backwards-compatible at the helper level — pickWinningCandidateWithPool
forwards `cand` instead of `cand.rule`.
Cache memo hit rate drops since lastSentAt varies per-rule, but
correctness > a few extra Upstash GETs.
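The extracted formula is small enough to sketch whole. Name and signature match the commit; the body is an inference from "lastSentAt ?? 24h" plus the "epoch-zero ?? guard" test: ?? rather than ||, so a lastSentAt of 0 (epoch) is honoured instead of being silently replaced by the default lookback.

```javascript
function digestWindowStartMs(lastSentAt, nowMs, defaultLookbackMs) {
  // Called from BOTH the compose path and the send loop so the two
  // always agree on the story window for a given rule.
  return lastSentAt ?? (nowMs - defaultLookbackMs);
}
```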
Risk 2 — pickWinningCandidateWithPool returned the first candidate
with a non-empty raw pool as winner. If composeBriefFromDigestStories
then dropped every story (URL/headline/shape filters), the caller
bailed without trying lower-priority candidates. Pre-PR behaviour was
to keep walking. This regressed multi-rule users whose top-priority
rule's pool happens to be entirely filter-rejected.
Fix: optional `tryCompose(cand, stories)` callback on
pickWinningCandidateWithPool. When provided, the helper calls it after
the non-empty pool check; falsy return → log filter-rejected and walk
to the next candidate; truthy → returns `{winner, stories,
composeResult}` so the caller can reuse the result. Without the
callback, legacy semantics preserved (existing tests + callers
unaffected).
Caller composeAndStoreBriefForUser passes a no-synthesis compose call
as tryCompose — cheap pure-JS, no I/O. Synthesis only runs once after
the winner is locked in, so the perf cost is one extra compose per
filter-rejected candidate, no extra LLM round-trips.
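A sketch of the fall-through contract above. Candidate and pool shapes are illustrative; the behaviour matches the commit: without the callback, the first candidate with a non-empty pool wins (legacy semantics); with it, a falsy compose result means "filter-rejected, keep walking".

```javascript
function pickWinningCandidateWithPoolSketch(candidates, poolFor, tryCompose) {
  for (const cand of candidates) {
    const stories = poolFor(cand);
    if (!stories || stories.length === 0) continue;     // empty raw pool: skip
    if (!tryCompose) return { winner: cand, stories };  // legacy behaviour
    const composeResult = tryCompose(cand, stories);
    if (!composeResult) {
      console.warn("[digest] candidate filter-rejected, walking on");
      continue; // compose dropped every story; try lower-priority rules
    }
    return { winner: cand, stories, composeResult };    // caller reuses the result
  }
  return null; // all candidates empty or filter-rejected
}
```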
Tests:
- 10 new cases in tests/digest-orchestration-helpers.test.mjs
covering: digestFor receiving full candidate; tryCompose
fall-through to lower-priority; all-rejected returns null;
composeResult forwarded; legacy semantics without tryCompose;
digestWindowStartMs lastSentAt-vs-default branches; weekly +
twice-daily window parity assertions; epoch-zero ?? guard.
- Updated tests/digest-cache-key-sensitivity.test.mjs static-shape
regex to match the new `cand.rule.sensitivity` cache-key shape
(intent unchanged: cache key MUST include sensitivity).
Stacked on PR #3396 — targets feat/brief-two-brain-divergence.
3373b542e9
feat(brief): make MAX_STORIES_PER_USER env-tunable (default 12, evidence kept it at 12) (#3389)
* fix(brief): bump MAX_STORIES_PER_USER 12 → 16
Production telemetry from PR #3387 surfaced cap-truncation as the
dominant filter loss: 73% of `sensitivity=all` users had
`dropped_cap=18` per tick (30 qualified stories truncated to 12).
Multi-member topics straddling the position-12 boundary lost members.
Bumping the cap to 16 lets larger leading topics fit fully without
affecting `sensitivity=critical` users (their pools cap at 7-10
stories — well below either threshold). Reduces dropped_cap from ~18
to ~14 per tick.
Validation signal: watch the `[digest] brief filter drops` log line
on Railway after deploy — `dropped_cap=` should drop by ~4 per tick.
Side effect: this addresses the dominant production signal that
Solution 3 (post-filter regroup, originally planned in
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md)
was supposed to handle. Production evidence killed Sol-3's premise
(0 non-cap drops in 70 samples), so this is a simpler,
evidence-backed alternative.
* revise(brief): keep MAX_STORIES_PER_USER default at 12, add env-tunability
Reviewer asked "why 16?" and the honest answer turned out to be: the
data doesn't support it. After landing PR #3390's sweep harness with
visible-window metrics, re-ran against 2026-04-24 production replay:
threshold=0.45 cap=12 -> visible_quality 0.916 (best at this cap)
threshold=0.45 cap=16 -> visible_quality 0.716 (cap bump HURTS)
threshold=0.42 cap=12 -> visible_quality 0.845
threshold=0.42 cap=16 -> visible_quality 0.845 (neutral)
At the current 0.45 threshold, positions 13-16 are mostly singletons
or members of "should-separate" clusters — they dilute the brief
without helping topic adjacency. Bumping the cap default to 16 was a
wrong inference from the dropped_cap=18 signal alone.
Revised approach:
- Default MAX_STORIES_PER_USER stays at 12 (matches historical prod).
- Constant becomes env-tunable via DIGEST_MAX_STORIES_PER_USER so any
future sweep result can be acted on with a Railway env flip, no
redeploy.
The actual evidence-backed adjacency fix from the sweep is to lower
DIGEST_DEDUP_TOPIC_THRESHOLD from 0.45 -> 0.42 (env flip; see PR
#3390).
* fix(brief-llm): tie buildDigestPrompt + hashDigestInput slice to MAX_STORIES_PER_USER
Greptile P1 on PR #3389: with MAX_STORIES_PER_USER now env-tunable,
hard-coded stories.slice(0, 12) in buildDigestPrompt and
hashDigestInput would mean the LLM prose only references the first 12
stories when the brief carries more. Stories 13+ would appear as
visible cards but be invisible to the AI summary — a quiet mismatch
between reader narrative and brief content. The cache key MUST stay
aligned with the prompt slice or it drifts from the prompt content;
the same constant fixes both sites.
Exports MAX_STORIES_PER_USER from brief-compose.mjs (single source of
truth) and imports it in brief-llm.mjs. No behaviour change at the
default cap of 12.
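The env-tunability pattern described above can be sketched as below; the constant and env-var names are from the commit, the parsing details are illustrative.

```javascript
// Default stays 12; DIGEST_MAX_STORIES_PER_USER overrides at runtime.
const parsed = Number.parseInt(process.env.DIGEST_MAX_STORIES_PER_USER ?? "", 10);
const MAX_STORIES_PER_USER = Number.isInteger(parsed) && parsed > 0 ? parsed : 12;

// brief-llm.mjs imports the same constant, so the prompt slice (and the
// cache-key slice) stays aligned with the visible story cards.
function sliceForPrompt(stories) {
  return stories.slice(0, MAX_STORIES_PER_USER);
}
```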
9c14820c69
fix(digest): brief filter-drop instrumentation + cache-key correctness (#3387)
* fix(digest): include sensitivity in digestFor cache key
buildDigest filters by rule.sensitivity BEFORE dedup, but digestFor
memoized only on (variant, lang, windowStart). Stricter-sensitivity
users in a shared bucket inherited the looser populator's pool,
producing the wrong story set and defeating downstream topic-grouping
adjacency once filterTopStories re-applied sensitivity.
Solution 1 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
* feat(digest): instrument per-user filterTopStories drops
Adds an optional onDrop metrics callback to filterTopStories and threads
it through composeBriefFromDigestStories. The seeder aggregates counts
per composed brief and emits one structured log line per user per tick:
[digest] brief filter drops user=<id> sensitivity=<s> in=<count>
dropped_severity=<n> dropped_url=<n> dropped_headline=<n>
dropped_shape=<n> out=<count>
Decides whether the conditional Solution 3 (post-filter regroup) is
warranted by quantifying how often post-group filter drops puncture
multi-member topics in production. No behaviour change for callers
that omit onDrop.
Solution 0 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.
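A sketch of the onDrop callback shape. Drop reasons mirror the log-line fields (plus the 'cap' reason added by the follow-up fix so `in - out - sum(dropped_*) == 0` reconciles); the filter predicates here are stand-ins for the real ones.

```javascript
function filterTopStoriesWithDrops(stories, maxStories, onDrop = () => {}) {
  const out = [];
  for (const story of stories) {
    if (out.length >= maxStories) { onDrop("cap", story); continue; } // not silent
    if (!story.url) { onDrop("url", story); continue; }
    if (!story.headline) { onDrop("headline", story); continue; }
    out.push(story);
  }
  return out; // callers that omit onDrop see no behaviour change
}
```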
* fix(digest): close two Sol-0 instrumentation gaps from code review
Review surfaced two P2 gaps in the filter-drop telemetry that weakened
its diagnostic purpose for Sol-3 gating:
1. Cap-truncation silent drop: filterTopStories broke on
`out.length >= maxStories` BEFORE the onDrop emit sites, so up to
(DIGEST_MAX_ITEMS - MAX_STORIES_PER_USER) stories per user were
invisible. Added a 'cap' reason to DropMetricsFn and emit one event
per skipped story so `in - out - sum(dropped_*) == 0` reconciles.
2. Wipeout invisibility: composeAndStoreBriefForUser only logged drop
stats for the WINNING candidate. When every candidate composed to
null, the log line never fired — exactly the wipeout case Sol-0
was meant to surface. Now tracks per-candidate drops and emits an
aggregate `outcome=wipeout` line covering all attempts.
Also tightens the digest-cache-key sensitivity regex test to anchor
inside the cache-key template literal (it would otherwise match the
unrelated `chosenCandidate.sensitivity ?? 'high'` in the new log line).
PR review residuals from
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md
ce-code-review run 20260424-232911-37a2d5df.
* chore: ignore .context/ ce-code-review run artifacts
The ce-code-review skill writes per-run artifacts (reviewer JSON,
synthesis.md, metadata.json) under .context/compound-engineering/.
These are local-only — neither tracked nor linted.
* fix(digest): emit per-attempt filter-drop rows, not per-user
Addresses two PR #3387 review findings:
- P2: Earlier candidates that composed to null (wiped out by post-group
filtering) had their dropStats silently discarded when a later
candidate shipped — exactly the signal Sol-0 was meant to surface.
- P3: outcome=wipeout row was labeled with allCandidateDrops[0]
.sensitivity, misleading when candidates within one user have
different sensitivities.
Fix: emit one structured row per attempted candidate, tagged with that
candidate's own sensitivity and variant. Outcome is shipped|rejected.
A wipeout is now detectable as "all rows for this user are rejected
within the tick" — no aggregate-row ambiguity. Removes the
allCandidateDrops accumulator entirely.
* fix(digest): align composeBriefFromDigestStories sensitivity default to 'high'
Addresses PR #3387 review (P2): composeBriefFromDigestStories defaulted
to `?? 'all'` while buildDigest, the digestFor cache key, and the new
per-attempt log line all default to `?? 'high'`. The mismatch is
harmless in production (the live cron path pre-filters the pool) but:
- A non-prefiltered caller with undefined sensitivity would silently
ship medium/low stories.
- Per-attempt telemetry labels the attempt as `sensitivity=high` while
compose actually applied 'all' — operators are misled.
Aligning compose to 'high' makes the four sites agree and the telemetry
honest. Production output is byte-identical (input pool was already
'high'-filtered upstream).
Adds 3 regression tests asserting the new default: critical/high admitted,
medium/low dropped, and onDrop fires reason=severity for the dropped
levels (locks in alignment with per-attempt telemetry).
* fix(digest): align remaining sensitivity defaults to 'high'
Addresses PR #3387 review (P2 + P3): three more sites still defaulted
missing sensitivity to 'all' while compose/buildDigest/cache/log now
treat it as 'high'.
P2 — compareRules (scripts/lib/brief-compose.mjs:35-36): the rank
function used to default to 'all', placing legacy undefined-sensitivity
rules FIRST in the candidate order. Compose then applied a 'high'
filter to them, shipping a narrow brief while an explicit 'all' rule
for the same user was never tried. Aligned to 'high' so the rank
matches what compose actually applies.
P3 — enrichBriefEnvelopeWithLLM (scripts/lib/brief-llm.mjs:526):
the digest prompt and cache key still used 'all' for legacy rules,
misleading personalization ("Reader sensitivity level: all" while the
brief contains only critical/high stories) and busting the cache for
legacy vs explicit-'all' rows that should share entries.
Also aligns the @deprecated composeBriefForRule (line 164) for
consistency, since tests still import it.
3 new regression tests in tests/brief-composer-rule-dedup.test.mjs
lock in the new ranking: explicit 'all' beats undefined-sensitivity,
undefined-sensitivity ties with explicit 'high' (decided by updatedAt),
and groupEligibleRulesByUser candidate order respects the rank.
6853/6853 tests pass (was 6850 → +3).
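The aligned ranking can be sketched as follows. The rank values are hypothetical; the behaviour is what the new regression tests lock in: explicit 'all' beats undefined-sensitivity, and undefined ties with explicit 'high' (decided by updatedAt).

```javascript
const SENSITIVITY_RANK = { all: 0, high: 1, critical: 2 }; // illustrative values
const effectiveSensitivity = (rule) => rule.sensitivity ?? "high"; // was ?? 'all'

function compareRulesSketch(a, b) {
  const diff =
    SENSITIVITY_RANK[effectiveSensitivity(a)] -
    SENSITIVITY_RANK[effectiveSensitivity(b)];
  if (diff !== 0) return diff;
  return (b.updatedAt ?? 0) - (a.updatedAt ?? 0); // newer rule wins ties
}
```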
34dfc9a451
fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)
Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.
CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.
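The extraction gates above (strip, decode, normalize, reject, clip) can be sketched per candidate field. The real parser also picks the longest of description/content:encoded (RSS) or summary/content (Atom); the entity decoding here covers only a few common entities for illustration.

```javascript
function cleanFeedDescription(raw, headline) {
  const stripped = String(raw ?? "")
    .replace(/<[^>]*>/g, " ")                      // HTML-strip
    .replace(/&amp;/g, "&").replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">").replace(/&quot;/g, '"') // minimal entity decode
    .replace(/\s+/g, " ")                          // whitespace-normalize
    .trim();
  if (stripped.length < 40) return "";             // too short to ground on
  if (stripped === headline.trim()) return "";     // adds nothing over the headline
  return stripped.slice(0, 400);                   // clip to 400 chars
}
```

Downstream consumers fall back to the cleaned headline on the empty string, so feeds without a usable description keep today's behaviour.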
Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).
Covers R1, R7.
* feat(news/story-track): persist description on story:track:v1 HSET (U2)
Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).
Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.
Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.
Covers R2, R6.
* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)
Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:
- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
paired 1:1 with headlines for prompt grounding. Empty array is today's
headline-only behavior.
Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).
Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.
Covers R3, R5.
* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)
Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today behavior is preserved (R6).
This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.
Covers R4 (upstream half), R6.
* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)
The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).
Changes:
- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
between the metadata block and the "One editorial sentence" instruction
when description is non-empty AND not normalise-equal to the headline.
Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
No Context line → identical prompt to pre-fix (R6 preserved).
- sanitizeStoryForPrompt extended to cover `description`. Closes the
asymmetry where whyMatters was sanitised and description wasn't —
untrusted RSS bodies now flow through the same injection-marker
neutraliser before prompt interpolation. generateStoryDescription wraps
the story in sanitizeStoryForPrompt before calling the builder,
matching generateWhyMatters.
- Four cache prefixes bumped atomically to evict pre-grounding rows:
scripts/lib/brief-llm.mjs:
brief:llm:description:v1 → v2 (Railway, description path)
brief:llm:whymatters:v2 → v3 (Railway, whyMatters fallback)
api/internal/brief-why-matters.ts:
brief:llm:whymatters:v6 → v7 (edge, primary)
brief:llm:whymatters:shadow:v4 → shadow:v5 (edge, shadow)
hashBriefStory already includes description in the 6-field material
(v5 contract) so identity naturally drifts; the prefix bump is the
belt-and-braces that guarantees a clean cold-start on first tick.
- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
Covers Context-line injection, empty/dup-of-headline rejection,
400-char clip, sanitisation of adversarial descriptions, v2 write,
and legacy-v1 row dark (forced cold-start).
Covers R4 + new sanitisation requirement.
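The Context-line gating above can be sketched as a small helper. Names and the normalisation rule are assumptions; the real buildStoryDescriptionPrompt in scripts/lib/brief-llm.mjs builds a larger metadata block around this logic:

```javascript
// Hypothetical sketch of the Context-line rule in the description prompt.
const MAX_CONTEXT_CHARS = 400; // belt-and-braces after the upstream parser cap

const normalize = (s) => s.toLowerCase().replace(/\s+/g, ' ').trim();

function buildContextLine(headline, description) {
  const body = (description ?? '').trim();
  // No Context line when the body is empty or just repeats the headline,
  // so the prompt stays byte-identical to the pre-fix version (R6).
  if (!body || normalize(body) === normalize(headline)) return '';
  return `Context: ${body.slice(0, MAX_CONTEXT_CHARS)}\n`;
}
```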
* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)
SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.
Three coordinated changes:
1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
through sanitizeForPrompt (same trust treatment as geoContext — bodies
are untrusted RSS text), clips to 400 chars, and pads to the headlines
length so pair-wise identity is stable.
2. buildArticlePrompts accepts optional bodies and interleaves a
` Context: <body>` line under each numbered headline that has a
non-empty body. Skipped in translate mode (headline[0]-only) and when
all bodies are empty — yielding a byte-identical prompt to pre-U6
for every current caller (R6 preserved).
3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
(produced from headline-only prompts) cold-start cleanly. Extends
canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
avoid colliding with `:brief:` when pattern-matching keys. Translate
mode is headline[0]-only and intentionally does not shift on bodies.
Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.
New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.
Covers R3, R4.
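The pair-wise key shape can be sketched like this. The layout, hash widths, and function name are assumptions; the real builder lives in src/utils/summary-cache-key.ts:

```javascript
import { createHash } from 'node:crypto';

// Hypothetical sketch of the v6 cache-key shape with a pair-wise bodies segment.
const CACHE_VERSION = 'v6';

const sha16 = (x) =>
  createHash('sha256').update(JSON.stringify(x)).digest('hex').slice(0, 16);

function buildSummaryCacheKeySketch(headlines, bodies = []) {
  // Pair each headline with its body so sorting keeps pairs together,
  // making the key order-insensitive across rank shuffles; tie-break on
  // body so duplicate headlines produce a stable order.
  const pairs = headlines
    .map((h, i) => [h, (bodies[i] ?? '').slice(0, 400)])
    .sort((a, b) => a[0].localeCompare(b[0]) || a[1].localeCompare(b[1]));
  const allEmpty = pairs.every((p) => p[1] === '');
  // `:bd` rather than `:b` so key-pattern matching can't collide with `:brief:`.
  const bodySegment = allEmpty ? '' : ':bd' + sha16(pairs.map((p) => p[1]));
  return `summary:${CACHE_VERSION}:${sha16(pairs.map((p) => p[0]))}${bodySegment}`;
}
```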
* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)
Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.
Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
currentHeadlines; passed as options.bodies to generateSummary so the
server-side SummarizeArticle LLM grounds on the article body.
Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
→ generateSummary → runApiChain → tryApiProvider; cache key now includes
bodies (via U6's buildSummaryCacheKey signature).
MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
to /api/news/v1/summarize-article so the MCP tool surface is no longer
starved.
Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
a ~200-char truncated snippet line under each story; HTML formatDigestHtml
renders a dim-grey description div between title and meta. Both gated
on non-empty description (R6 — empty → today's behavior).
Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
includes `description` in the /api/notify payload when present.
Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
payloads render a `> <snippet>` context line under the title. When off
or payload.description absent, output is byte-identical to pre-U7.
Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
@notification-source tags on every producer, rejects `description:` in
domain-origin payload blocks, and verifies the relay codepath gates
snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).
Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).
Covers R5, R6.
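The ~200-char word-boundary truncation used by the dashboard `.item-snippet` line and the email snippet can be sketched as below. The helper name is hypothetical; a shared truncateForDisplay helper is listed as deferred follow-up work, so each render site currently carries its own copy:

```javascript
// Hypothetical sketch of word-boundary truncation with an ellipsis.
function truncateForDisplaySketch(text, max = 200) {
  const t = text.trim();
  if (t.length <= max) return t;
  const cut = t.slice(0, max);
  const lastSpace = cut.lastIndexOf(' ');
  // Break on a word boundary when one exists inside the window.
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```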
* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)
Final verification for the RSS-description-end-to-end fix:
- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
subsection tracing parser → story:track:v1.description → NewsItem.snippet
→ brief / SummarizeArticle / dashboard / email / relay / MCP, with the
empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
to match the v6 bump from U6. Without this the summary cache bump silently
regressed CI's pinned-version assertion.
Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.
Covers R4, R6, R7.
* fix(rss-description): address /ce:review findings before merge
14 fixes from structured code review across 13 reviewer personas.
Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
headlines identical but bodies different now invalidate in-flight summaries.
Without this, switching renderItems → renderClusters mid-summary let a
grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
position in light-sanitized headlines while findIndex looked up the
full-sanitized array — any headline that sanitizeHeadlines emptied
mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
client builds keys from RAW bodies while server sanitizes first. The
keys diverge on injection content, which would silently miss the
server's authoritative cache every call.
Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
instead of a literal "somehash", so a bug where the reader still queried
the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
CDATA test and adds a new test documenting the malformed-]]> fallback
behavior (plain regex captures, article content survives).
Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
existing TAG_REGEX_CACHE pattern.
Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test
Tests: 6749/6749 pass. Typecheck clean on both configs.
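The single zip-sanitize-filter-dedup pass described in the first P1 set can be sketched as one loop. The sanitizer here is a stand-in; the real logic lives in summarize-article.ts:

```javascript
// Hypothetical sketch: sanitize, filter, and dedup headlines while
// carrying each body along, so a dropped headline can never shift the
// pairing of every subsequent body.
function pairAndDedup(headlines, bodies, sanitize = (s) => s.trim()) {
  const seen = new Set();
  const out = [];
  for (let i = 0; i < headlines.length; i++) {
    const h = sanitize(headlines[i]);
    if (!h) continue;          // emptied headline drops its body too
    if (seen.has(h)) continue; // duplicate headline drops its body too
    seen.add(h);
    out.push({ headline: h, body: (bodies[i] ?? '').slice(0, 400) });
  }
  return out;
}
```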
* fix(summarization): thread bodies through browser T5 path (Codex #2)
Addresses the second of two Codex-raised findings on PR #3370:
The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.
Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.
When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).
On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), declining to
change. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.
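The T5 interleave can be sketched as a joiner. The separator string is the one the commit names; the function name and joiner punctuation between items are assumptions:

```javascript
// Hypothetical sketch: interleave each body with its paired headline.
// When bodies is empty/omitted, output is byte-identical to the
// headline-only joiner (R6).
function buildT5Input(headlines, bodies = []) {
  return headlines
    .map((h, i) => {
      // Clip per body to stay inside T5-small's ~512-token context window.
      const body = (bodies[i] ?? '').slice(0, 200);
      return body ? `${h} — ${body}` : h;
    })
    .join('. ');
}
```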
* fix(story-track): description always-written to overwrite stale bodies (Codex #1)
Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection the argument
was over-defending the current behavior.
Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.
Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.
Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
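The always-write rule and its T0/T1/T2 overwrite sequence can be sketched in a few lines. The field-builder name is hypothetical; the real helper is buildStoryTrackHsetFields:

```javascript
// Hypothetical sketch: `description` is written on every mention, empty
// string included, so the row is always authoritative for the current
// cycle. Consumers treat '' as "fall back to cleaned headline" (R6).
function buildDescriptionField(currentBody) {
  return { description: (currentBody ?? '').trim() };
}

// T0 body -> T1 empty -> T2 new body: the row reflects the latest mention.
const row = {};
Object.assign(row, buildDescriptionField('first body'));
Object.assign(row, buildDescriptionField(''));
const afterEmpty = row.description; // stale body overwritten by ''
Object.assign(row, buildDescriptionField('new body'));
```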
ec35cf4158

feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description
Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.
Root cause: 18–30 word cap compressed context specifics out.
v2 loosens three dials at once so we can settle the A/B:
1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
MUST cite one specific named actor / metric / date / place from
the context. Analyst path only; gemini path stays on v1.
2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
preamble boilerplate + leaked section labels + markdown.
3. Story description plumbed through — endpoint body accepts optional
story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
Cron forwards it when upstream has one (skipped when it equals the
headline — no new signal).
Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.
If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).
Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.
* fix(brief): stop truncating v2 multi-sentence output + description in cache hash
Two P1s caught in PR #3269 review.
P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — exact regression #3269 was meant to fix.
Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.
P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.
Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
Tests: 6083 pass. typecheck × 2 clean.
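The P1b fix can be sketched as a 6-field hash. Field order and hash width are assumptions; the real helper is hashBriefStory in shared/brief-llm-core.js:

```javascript
import { createHash } from 'node:crypto';

// Hypothetical sketch of the 6-field story hash (v5 contract): adding
// `description` as the 6th field means two requests that differ only in
// description no longer collide on one cache slot.
function hashBriefStorySketch(story) {
  const material = [
    story.headline, story.source, story.threatLevel,
    story.category, story.country, story.description ?? '',
  ].join('|');
  return createHash('sha256').update(material).digest('hex').slice(0, 16);
}
```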
2f19d96357
feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint
The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.
This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.
Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.
Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)
Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
still reads/writes during the rollout window; expires in ≤24h.
Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.
Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md
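The three-layer graceful degradation (endpoint, then legacy Gemini-direct, then stub) can be sketched as a fallback chain. All dependency names here are hypothetical stand-ins injected for illustration:

```javascript
// Hypothetical sketch: each layer is tried in order; any throw or empty
// result falls through, so the brief always ships.
async function resolveWhyMatters(story, { callEndpoint, callGeminiDirect, stub }) {
  try {
    const fromEndpoint = await callEndpoint(story);
    if (fromEndpoint) return { whyMatters: fromEndpoint, producedBy: 'analyst-endpoint' };
  } catch { /* fall through to legacy path */ }
  try {
    const fromGemini = await callGeminiDirect(story);
    if (fromGemini) return { whyMatters: fromGemini, producedBy: 'gemini-legacy' };
  } catch { /* fall through to stub */ }
  return { whyMatters: stub(story), producedBy: 'stub' };
}
```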
* fix(brief): address /ce:review round 1 findings on PR #3248
Fixes 5 findings from multi-agent review, 2 of them P1:
- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
`.env`, `.env.local`, and any future secret file dropped into that
directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
`*.mjs`) so parent `.env` / secrets rules stay in effect inside
api/internal/.
- #242 P1: `Dockerfile.digest-notifications` did not COPY
`shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
container start with ERR_MODULE_NOT_FOUND. Added alongside
brief-envelope + brief-filter COPY lines.
- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
signal, violating PR #3247's own round-3 memory
(feedback_gate_on_ground_truth_not_configured_state.md). Added
structured log at the call site: `[brief-llm] whyMatters source=<src>
producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
log + shadow-record pairs can be cross-referenced.
- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
flowed verbatim into both LLM prompts, bypassing the repo's
sanitizeForPrompt convention. Added sanitizeStoryFields helper and
applied in both analyst and gemini paths.
- #245 P2: Removed redundant `validate` option from callLlmReasoning.
With only openrouter configured in prod, a parse-reject walked the
provider chain, then fell through to the other path (same provider),
then the cron's own fallback (same model) — 3x billing on one reject.
Post-call parseWhyMatters check already handles rejection cleanly.
Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).
Tests: 6022 pass. typecheck + typecheck:api clean.
* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback
Two P2 findings on PR #3248:
1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
function. Vercel can terminate the isolate after response return,
so the background redisPipeline write completes unreliably — i.e.
the rollout-validation signal the shadow keys were supposed to
provide was flaky in production.
Fix: accept an optional EdgeContext 2nd arg. Build the shadow
promise up front (so it starts executing immediately) then register
it with ctx.waitUntil when present. Falls back to plain unawaited
execution when ctx is absent (local harness / tests).
2. scripts/lib/brief-llm.mjs legacy fallback path called
buildWhyMattersPrompt(story) on raw fields with no sanitization.
The analyst endpoint sanitizes before its own prompt build, but
the fallback is exactly what runs when the endpoint misses /
errors — so hostile headlines / sources reached the LLM verbatim
on that path.
Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
from server/_shared/llm-sanitize.js (existing pattern — see
scripts/seed-digest-notifications.mjs:41). Wraps story fields
before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
story), so cache parity with the analyst endpoint's v3 entries is
preserved.
Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.
Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.
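The shadow-write scheduling pattern from finding 1 can be sketched as a small wrapper. The function name is hypothetical; the waitUntil shape follows the Edge-runtime convention the commit describes:

```javascript
// Hypothetical sketch: build the promise up front so it starts executing
// immediately, then register it with ctx.waitUntil when an EdgeContext is
// present; otherwise leave it unawaited (local harness / tests).
function scheduleShadowWrite(writeFn, ctx) {
  const p = writeFn().catch(() => {}); // fire-and-forget: never fail the request
  if (ctx && typeof ctx.waitUntil === 'function') ctx.waitUntil(p);
  return p;
}
```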
* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image
Reviewer P1 on PR #3248: my previous commit (
81536cb395
feat(brief): source links, LLM descriptions, strip suffix (envelope v2) (#3181)
* feat(brief): source links, LLM descriptions, strip publisher suffix (envelope v2)
Three coordinated fixes to the magazine content pipeline.
1. Headlines were ending with " - AP News" / " | Reuters" etc. because
the composer passed RSS titles through verbatim. Added
stripHeadlineSuffix() in brief-compose.mjs, a conservative
case-insensitive match only when the trailing token equals
primarySource, so a real subtitle that happens to contain a dash
still survives.
2. Story descriptions were the headline verbatim. Added
generateStoryDescription to brief-llm.mjs, plumbed into
enrichBriefEnvelopeWithLLM: one additional LLM call per story, cached
24h on a v1 key covering headline, source, severity, category,
country. Cache hits are revalidated via parseStoryDescription so a
bad row cannot flow to the envelope. Falls through to the cleaned
headline on any failure.
3. Source attribution was plain text with no outgoing link. Bumped
BRIEF_ENVELOPE_VERSION to 2, added BriefStory.sourceUrl. The composer
now plumbs story:track:v1.link through digestStoryToUpstreamTopStory,
UpstreamTopStory.primaryLink, filterTopStories, BriefStory.sourceUrl.
The renderer wraps the Source line in an anchor with target=_blank,
rel=noopener noreferrer, and UTM params (utm_source=worldmonitor,
utm_medium=brief, utm_campaign=<issueDate>, utm_content=story-<rank>).
UTM appending is idempotent; publisher-attributed URLs keep their own
utm_source. Envelope validation gains a validateSourceUrl step
(https/http only, no userinfo credentials, parseable absolute URL).
Stories without a valid upstream link are dropped by filterTopStories
rather than shipping with an unlinked source.
Tests: 30 renderer tests to 38; new assertions cover UTM presence on
every anchor, HTML-escaping of ampersands in hrefs, pre-existing UTM
preservation, and all four validator rejection modes. New composer
tests cover suffix stripping, link plumb-through, and v2
drop-on-no-link behaviour.
New LLM tests for generateStoryDescription cover cache hit/miss,
revalidation of bad rows, 24h TTL, and null-on-failure.
* fix(brief): v1 back-compat window on renderer + consolidate story hash helper
Two P1/P2 review findings on #3181.
P1 (v1 back-compat). Bumping BRIEF_ENVELOPE_VERSION 1 to 2 made every
v1 envelope still resident in Redis under the 7-day TTL fail
assertBriefEnvelope. The hosted /api/brief route would 404 "expired"
and the /api/latest-brief preview would downgrade to "composing",
breaking already-issued links from the preceding week.
Fix: the renderer now accepts SUPPORTED_ENVELOPE_VERSIONS = Set([1, 2])
on READ. BRIEF_ENVELOPE_VERSION stays at 2 and is the only version
the composer ever writes. BriefStory.sourceUrl is required when
version === 2 and absent on v1; when rendering a v1 story the source
line degrades to plain text (no anchor), matching pre-v2 appearance.
When the TTL window passes, the set can shrink to [2] in a follow-up.
P2 (hash dedup). hashStoryDescription was byte-identical to hashStory,
inviting silent drift if one prompt gains a field the other forgets.
Consolidated into hashBriefStory. Cache key separation remains via the
distinct prefixes (brief:llm:whymatters:v2: / brief:llm:description:v1:).
Tests: adds 3 v1 back-compat assertions (plain source line, field
validation still runs, defensive sourceUrl check), updates the
version-mismatch assertion to match the new supported-set message.
161/161 pass (was 158). Full test:data 5706/5706.
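The conservative suffix strip can be sketched as a single regex pass. The dash/pipe set and the exact match rule are assumptions; the real stripHeadlineSuffix() in brief-compose.mjs may differ in detail:

```javascript
// Hypothetical sketch: strip a trailing " - AP News" / " | Reuters" style
// token only when it equals primarySource (case-insensitive), so a real
// subtitle that happens to contain a dash survives.
function stripHeadlineSuffixSketch(headline, primarySource) {
  const m = headline.match(/^(.*\S)\s+[-–—|]\s+(.+)$/);
  if (!m) return headline;
  return m[2].trim().toLowerCase() === primarySource.trim().toLowerCase()
    ? m[1]
    : headline;
}
```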
c2356890da
feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini (#3172)
* feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini
Replaces the Phase 3a stubs with editorial output from Gemini 2.5
Flash via the existing OpenRouter-backed callLLM chain. Two LLM
pathways, different caching semantics:
whyMatters (per story): 1 editorial sentence, 18-30 words, global
stakes. Cache brief:llm:whymatters:v1:{sha256(headline|source|severity)}
with 24h TTL shared ACROSS users (whyMatters is not personalised).
Bounded concurrency 5 so a 12-story brief doesn't open 12 parallel
sockets to OpenRouter.
digest prose (per user): JSON { lead, threads[], signals[] }
replacing the stubs. Cache brief:llm:digest:v1:{userId}:{sensitivity}
:{poolHash} with 4h TTL per-user. Pool hash is order-insensitive
so rank shuffling doesn't invalidate.
Provider pinned to OpenRouter (google/gemini-2.5-flash) via
skipProviders: ['ollama', 'groq'] per explicit user direction.
Null-safe all the way down. If the LLM is unreachable, parse fails,
or cache throws, enrichBriefEnvelopeWithLLM returns the baseline
envelope with its stubs intact. The brief always ships. Kill switch
BRIEF_LLM_ENABLED is distinct from AI_DIGEST_ENABLED so the brief's
editorial prose and the email's AI summary can be toggled
independently during provider outages.
Files:
scripts/lib/brief-llm.mjs (new) — pure prompt/parse helpers + IO
generateWhyMatters/generateDigestProse + envelope enrichment
scripts/seed-digest-notifications.mjs — BRIEF_LLM_ENABLED flag,
briefLlmDeps closure, enrichment inserted between compose + SETEX
tests/brief-llm.test.mjs (new, 34 cases)
End-to-end verification: the enriched envelope passes
assertBriefEnvelope() — the renderer's strict validator is the gate
between composer and api/brief, so we prove the enriched envelope
still validates.
156/156 brief tests pass. Both tsconfigs typecheck clean.
* fix(brief): address three P1 review findings on Phase 3b
All three findings are about cache-key correctness + envelope safety.
P1-A — whyMatters cache key under-specifies the prompt.
hashStory keyed on headline|source|threatLevel, but the prompt also
carries category + country. Upstream classification or geocoding
corrections that leave those three fields unchanged would return
pre-correction prose for a materially different prompt. Bumped to
v2 key space (pre-fix rows ignored, re-LLM once on rollout). Added
regression tests for category + country busting the cache.
P1-B — digest prose cache key under-specifies the prompt.
hashDigestInput sorted stories and hashed headline|threatLevel only.
The actual prompt includes ranked order + category + country + source.
v2 hash now canonicalises to JSON of the fields in the prompt's
ranked order. Test inverted to lock the corrected behaviour
(reordering MUST miss the cache). Added a test for category change
invalidating.
P1-C — malformed cached digest poisons the envelope at SETEX time.
On cache hit generateDigestProse accepted any object with a string
lead, skipping the full shape check. enrichBriefEnvelopeWithLLM then
wrote prose.threads/.signals into the envelope, and the cron SETEXed
unvalidated. A bad cache row would 404 /api/brief at render time.
Two-layer fix:
1. Extracted validateDigestProseShape(obj) — same strictness
parseDigestProse ran on fresh output. generateDigestProse now
runs it on cache hits too, and returns a normalised copy.
2. Cron now re-runs assertBriefEnvelope on the ENRICHED envelope
before SETEX. On assertion failure it falls back to the
unenriched baseline (already passed assertion on construction).
Regression test: malformed cached row is rejected on hit and the
LLM is called again to overwrite.
Tests: 8 new regression cases locking all three findings. Total brief
test suite now 185/185 green. Both tsconfigs typecheck clean.
Cache-key version bumps (v1 -> v2) trigger one-off cache miss on
deploy. Editorial prose re-LLM'd on the next cron tick per user.
* fix(brief): address two P2 review findings on #3172
P2-A: misleading test name 'different users share the cache' asserted
the opposite (per-user isolation). Renamed to 'different users do NOT
share the digest cache even when the story pool is identical' so a
future reader can't refactor away the per-user key on a misreading.
P2-B: signal length validator only capped bytes (< 220 chars), so a
30-word signal could pass even though the prompt says '<=14 words'.
Added a word-count filter with an 18-word ceiling (14 + 4 margin for
model drift / hyphenated compounds). Regression test locks the
behaviour: signals with >14-word drift are dropped, short imperatives
pass.
43/43 brief-llm tests pass. Both tsconfigs typecheck clean.
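The P2-B signal validator can be sketched with both caps. Constant names are hypothetical; the thresholds are the ones the commit states:

```javascript
// Hypothetical sketch: the byte cap alone let a 30-word signal through;
// a word-count ceiling (14 from the prompt + 4 margin for model drift /
// hyphenated compounds) closes that gap.
const MAX_SIGNAL_CHARS = 220;
const MAX_SIGNAL_WORDS = 18;

function isValidSignal(signal) {
  const s = signal.trim();
  if (!s || s.length >= MAX_SIGNAL_CHARS) return false;
  return s.split(/\s+/).length <= MAX_SIGNAL_WORDS;
}
```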