feat(brief): source links, LLM descriptions, strip suffix (envelope v2) (#3181)

* feat(brief): source links, LLM descriptions, strip publisher suffix (envelope v2)

Three coordinated fixes to the magazine content pipeline.

1. Headlines were ending with " - AP News" / " | Reuters" etc. because
   the composer passed RSS titles through verbatim. Added
   stripHeadlineSuffix() in brief-compose.mjs, conservative case-
   insensitive match only when the trailing token equals primarySource,
   so a real subtitle that happens to contain a dash still survives.

2. Story descriptions were the headline verbatim. Added
   generateStoryDescription to brief-llm.mjs, plumbed into
   enrichBriefEnvelopeWithLLM: one additional LLM call per story,
   cached 24h on a v1 key covering headline, source, severity,
   category, country. Cache hits are revalidated via
   parseStoryDescription so a bad row cannot flow to the envelope.
   Falls through to the cleaned headline on any failure.

3. Source attribution was plain text, no outgoing link. Bumped
   BRIEF_ENVELOPE_VERSION to 2, added BriefStory.sourceUrl. The
   composer now plumbs story:track:v1.link through
   digestStoryToUpstreamTopStory, UpstreamTopStory.primaryLink,
   filterTopStories, BriefStory.sourceUrl. The renderer wraps the
   Source line in an anchor with target=_blank, rel=noopener
   noreferrer, and UTM params (utm_source=worldmonitor,
   utm_medium=brief, utm_campaign=<issueDate>, utm_content=story-
   <rank>). UTM appending is idempotent, publisher-attributed URLs
   keep their own utm_source.

Envelope validation gains a validateSourceUrl step (https/http only,
no userinfo credentials, parseable absolute URL). Stories without a
valid upstream link are dropped by filterTopStories rather than
shipping with an unlinked source.

Tests: 30 renderer tests to 38; new assertions cover UTM presence on
every anchor, HTML-escaping of ampersands in hrefs, pre-existing UTM
preservation, and all four validator rejection modes. New composer
tests cover suffix stripping, link plumb-through, and v2 drop-on-no-
link behaviour. New LLM tests for generateStoryDescription cover
cache hit/miss, revalidation of bad rows, 24h TTL, and null-on-
failure.

* fix(brief): v1 back-compat window on renderer + consolidate story hash helper

Two P1/P2 review findings on #3181.

P1 (v1 back-compat). Bumping BRIEF_ENVELOPE_VERSION 1 to 2 made every
v1 envelope still resident in Redis under the 7-day TTL fail
assertBriefEnvelope. The hosted /api/brief route would 404 "expired"
and the /api/latest-brief preview would downgrade to "composing",
breaking already-issued links from the preceding week.

Fix: renderer now accepts SUPPORTED_ENVELOPE_VERSIONS = Set([1, 2])
on READ. BRIEF_ENVELOPE_VERSION stays at 2 and is the only version
the composer ever writes. BriefStory.sourceUrl is required when
version === 2 and absent on v1; when rendering a v1 story the source
line degrades to plain text (no anchor), matching pre-v2 appearance.
When the TTL window passes the set can shrink to [2] in a follow-up.

P2 (hash dedup). hashStoryDescription was byte-identical to hashStory,
inviting silent drift if one prompt gains a field the other forgets.
Consolidated into hashBriefStory. Cache key separation remains via
the distinct prefixes (brief:llm:whymatters:v2:/brief:llm:description:v1:).

Tests: adds 3 v1 back-compat assertions (plain source line, field
validation still runs, defensive sourceUrl check), updates the
version-mismatch assertion to match the new supported-set message.
161/161 pass (was 158). Full test:data 5706/5706.
This commit is contained in:
Elie Habib
2026-04-18 21:49:17 +04:00
committed by GitHub
parent 8fc302abd9
commit 81536cb395
12 changed files with 748 additions and 39 deletions

View File

@@ -21,7 +21,7 @@
// - Brainstorm: docs/brainstorms/2026-04-17-worldmonitor-brief-magazine-requirements.md
// - Plan: docs/plans/2026-04-17-003-feat-worldmonitor-brief-magazine-plan.md
import { BRIEF_ENVELOPE_VERSION } from '../../shared/brief-envelope.js';
import { BRIEF_ENVELOPE_VERSION, SUPPORTED_ENVELOPE_VERSIONS } from '../../shared/brief-envelope.js';
/**
* @typedef {import('../../shared/brief-envelope.js').BriefEnvelope} BriefEnvelope
@@ -123,9 +123,49 @@ const ALLOWED_STORY_KEYS = new Set([
'headline',
'description',
'source',
'sourceUrl',
'whyMatters',
]);
// Closed list of URL schemes we will interpolate into `href=`. A source
// record with an unknown scheme is a composer bug, not something to
// render — the story is dropped at envelope-validation time rather than
// shipping with an unlinked / broken source.
const ALLOWED_SOURCE_URL_SCHEMES = new Set(['https:', 'http:']);
/**
* Parses and validates a story source URL. Returns the normalised URL
* string on success; throws a descriptive error otherwise. The renderer
* validator wraps this in a per-story path-prefixed error so composer
* bugs are easy to locate.
*
* @param {unknown} raw
* @returns {string}
*/
function validateSourceUrl(raw) {
if (typeof raw !== 'string' || raw.length === 0) {
throw new Error('must be a non-empty string');
}
let parsed;
try {
parsed = new URL(raw);
} catch {
throw new Error(`must be a parseable absolute URL (got ${JSON.stringify(raw)})`);
}
if (!ALLOWED_SOURCE_URL_SCHEMES.has(parsed.protocol)) {
throw new Error(`scheme ${JSON.stringify(parsed.protocol)} is not allowed (http/https only)`);
}
// Bar `javascript:`-style smuggling via credentials or a Unicode host
// that renders like a legitimate outlet. These aren't exploitable
// through the renderer (we only emit the URL in an href with
// rel=noopener and we escape it), but they're always a composer bug
// so flag at write time.
if (parsed.username || parsed.password) {
throw new Error('must not include userinfo credentials');
}
return parsed.toString();
}
/**
* @param {Record<string, unknown>} obj
* @param {Set<string>} allowed
@@ -165,9 +205,15 @@ export function assertBriefEnvelope(envelope) {
const env = /** @type {Record<string, unknown>} */ (envelope);
assertNoExtraKeys(env, ALLOWED_ENVELOPE_KEYS, 'envelope');
if (env.version !== BRIEF_ENVELOPE_VERSION) {
// Accept any version in SUPPORTED_ENVELOPE_VERSIONS. The composer
// only ever writes the current BRIEF_ENVELOPE_VERSION; older
// versions are tolerated on READ so links issued in the 7-day TTL
// window survive a renderer rollout. Unknown versions are still
// rejected — an unexpected shape would lead the renderer to
// interpolate garbage.
if (typeof env.version !== 'number' || !SUPPORTED_ENVELOPE_VERSIONS.has(env.version)) {
throw new Error(
`renderBriefMagazine: envelope.version=${JSON.stringify(env.version)} does not match renderer version=${BRIEF_ENVELOPE_VERSION}. Deploy a matching renderer before producing envelopes at this version.`,
`renderBriefMagazine: envelope.version=${JSON.stringify(env.version)} is not in supported set [${[...SUPPORTED_ENVELOPE_VERSIONS].join(', ')}]. Deploy a matching renderer before producing envelopes at this version.`,
);
}
if (!isFiniteNumber(env.issuedAt)) {
@@ -242,6 +288,20 @@ export function assertBriefEnvelope(envelope) {
`envelope.data.stories[${i}].threatLevel must be one of critical|high|medium|low (got ${JSON.stringify(st.threatLevel)})`,
);
}
// sourceUrl is required on v2 and absent on v1. When present on
// either version, it must parse cleanly — a malformed URL would
// break the href. On v1 it's expected to be absent; a v1 envelope
// that somehow carries a sourceUrl is still validated (cheap
// defence against composer regressions).
if (env.version === BRIEF_ENVELOPE_VERSION || st.sourceUrl !== undefined) {
try {
validateSourceUrl(st.sourceUrl);
} catch (err) {
throw new Error(
`envelope.data.stories[${i}].sourceUrl ${/** @type {Error} */ (err).message}`,
);
}
}
});
// Cross-field invariant: surfaced count must match the actual number
@@ -455,11 +515,44 @@ function renderDigestSignals({ signals, dateShort, pageIndex, totalPages }) {
}
/**
* @param {{ story: BriefStory; rank: number; palette: 'light' | 'dark'; pageIndex: number; totalPages: number }} opts
* Build a tracked outgoing URL for the source line. Adds utm_source /
* utm_medium / utm_campaign / utm_content only when absent — if the
* upstream feed already embeds UTM (many publisher RSS do), we keep
* their attribution intact and just append ours after.
*
* Returns the original `raw` on URL parse failure. This path is
* unreachable in practice because assertBriefEnvelope already proved
* the URL parses, but fail-safe is cheap.
*
* @param {string} raw validated absolute https URL
* @param {string} issueDate envelope.data.date (YYYY-MM-DD)
* @param {number} rank 1-indexed story rank
*/
function renderStoryPage({ story, rank, palette, pageIndex, totalPages }) {
function buildTrackedSourceUrl(raw, issueDate, rank) {
try {
const u = new URL(raw);
if (!u.searchParams.has('utm_source')) u.searchParams.set('utm_source', 'worldmonitor');
if (!u.searchParams.has('utm_medium')) u.searchParams.set('utm_medium', 'brief');
if (!u.searchParams.has('utm_campaign')) u.searchParams.set('utm_campaign', issueDate);
if (!u.searchParams.has('utm_content')) u.searchParams.set('utm_content', `story-${pad2(rank)}`);
return u.toString();
} catch {
return raw;
}
}
/**
* @param {{ story: BriefStory; rank: number; palette: 'light' | 'dark'; pageIndex: number; totalPages: number; issueDate: string }} opts
*/
function renderStoryPage({ story, rank, palette, pageIndex, totalPages, issueDate }) {
const threatClass = HIGHLIGHTED_LEVELS.has(story.threatLevel) ? ' crit' : '';
const threatLabel = THREAT_LABELS[story.threatLevel];
// v1 envelopes don't carry sourceUrl — render the source as plain
// text (matching pre-v2 appearance). v2 envelopes always have a
// validated URL, so we wrap in a UTM-tracked anchor.
const sourceBlock = story.sourceUrl
? `<a class="source-link" href="${escapeHtml(buildTrackedSourceUrl(story.sourceUrl, issueDate, rank))}" target="_blank" rel="noopener noreferrer">${escapeHtml(story.source)}</a>`
: escapeHtml(story.source);
return (
`<section class="page story ${palette}">` +
'<div class="left">' +
@@ -472,7 +565,7 @@ function renderStoryPage({ story, rank, palette, pageIndex, totalPages }) {
'</div>' +
`<h3>${escapeHtml(story.headline)}</h3>` +
`<p class="desc">${escapeHtml(story.description)}</p>` +
`<div class="source">Source · ${escapeHtml(story.source)}</div>` +
`<div class="source">Source · ${sourceBlock}</div>` +
'</div>' +
'</div>' +
'<div class="right">' +
@@ -715,6 +808,17 @@ const STYLE_BLOCK = `<style>
}
.story.light .source { color: var(--sienna); }
.story.dark .source { color: var(--mint); }
/* Outgoing source anchor — inherit the palette colour from .source,
underline for affordance. rel=noopener noreferrer and target=_blank
are set in HTML; this is purely visual. */
.story .source-link {
color: inherit;
text-decoration: underline;
text-decoration-thickness: 1px;
text-underline-offset: 0.18em;
transition: text-decoration-thickness 160ms ease;
}
.story .source-link:hover { text-decoration-thickness: 2px; }
/* Logo ekg dot: mint on every page so the brand "signal" pulse
shows across the whole magazine. Light pages use the muted mint
so it doesn't glare against #fafafa. */
@@ -986,6 +1090,7 @@ export function renderBriefMagazine(envelope) {
palette: i % 2 === 0 ? 'light' : 'dark',
pageIndex: ++p,
totalPages,
issueDate: date,
}),
);
});