worldmonitor/tests/middleware-bot-gate.test.mts
Elie Habib 38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer's rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
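
A minimal sketch of the slot shape described above, assuming a plain regex check. `SLOT_RE` and `parseSlot` are hypothetical names for illustration and are not taken from the signer or composer. The sketch shows how the YYYY-MM-DD rendering date can be derived from a slot, and that a bare date is not a valid slot.

```typescript
// Hypothetical sketch: the real signer/verifier regex may differ.
const SLOT_RE = /^(\d{4}-\d{2}-\d{2})-(\d{2})(\d{2})$/;

interface ParsedSlot {
  date: string; // YYYY-MM-DD, the value envelope.data.date keeps
  hour: number;
  minute: number;
}

// Returns null for anything that is not YYYY-MM-DD-HHMM,
// including the pre-slot YYYY-MM-DD shape.
function parseSlot(slot: string): ParsedSlot | null {
  const m = SLOT_RE.exec(slot);
  if (!m) return null;
  const hour = Number(m[2]);
  const minute = Number(m[3]);
  if (hour > 23 || minute > 59) return null;
  return { date: m[1], hour, minute };
}
```

Under this sketch the 8am and 1pm runs on the same day would carry the slots 2026-04-19-0800 and 2026-04-19-1300, both mapping back to the date 2026-04-19.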

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.
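
The pointer lookup can be sketched roughly as follows. This uses an in-memory store in place of Redis; `TtlStore`, `writeBrief`, `readLatestBrief`, and the envelope key format are assumptions made for illustration. Only the `brief:latest:{userId}` key name comes from the description above, and storing the bare slot string as the pointer value is also an assumption.

```typescript
// In-memory stand-in for Redis with per-key expiry. Illustration only.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

class TtlStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  set(key: string, value: string, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key: string, now: number): string | null {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) return null;
    return entry.value;
  }
}

// Assumed key shapes: the envelope key format here is a guess.
const envelopeKey = (userId: string, slot: string): string => `brief:${userId}:${slot}`;
const latestKey = (userId: string): string => `brief:latest:${userId}`;

// Each compose run writes a new frozen envelope and overwrites the pointer.
function writeBrief(store: TtlStore, userId: string, slot: string, envelope: string, now: number): void {
  store.set(envelopeKey(userId, slot), envelope, SEVEN_DAYS_MS, now);
  store.set(latestKey(userId), slot, SEVEN_DAYS_MS, now);
}

// Readers that do not know the slot follow the pointer instead of probing dates.
function readLatestBrief(store: TtlStore, userId: string, now: number): string | null {
  const slot = store.get(latestKey(userId), now);
  return slot === null ? null : store.get(envelopeKey(userId, slot), now);
}
```

The design point is that the older envelope stays readable under its own key while the pointer always names the newest slot.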

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would get a 403 (for Telegram this surfaces as a
sendMediaGroup failure), breaking previews for the new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression test asserting the pre-slot shape is
now rejected, so legacy links cannot silently slip through the
allowlist after the rollout.
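
For illustration, a pattern with the behaviour described above might look like the sketch below. `CAROUSEL_PATH_SKETCH_RE` and `isSocialCarouselPath` are hypothetical stand-ins. The actual BRIEF_CAROUSEL_PATH_RE in middleware.ts is not reproduced here and may differ, for example in how it constrains the user segment.

```typescript
// Hypothetical stand-in for BRIEF_CAROUSEL_PATH_RE; not the real pattern.
// Shape: /api/brief/carousel/{userId}/{YYYY-MM-DD-HHMM}/{page 0..2}
const CAROUSEL_PATH_SKETCH_RE =
  /^\/api\/brief\/carousel\/[^\/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/[0-2]$/;

// True only for paths that should bypass the generic bot gate
// for social-platform image fetchers.
function isSocialCarouselPath(pathname: string): boolean {
  return CAROUSEL_PATH_SKETCH_RE.test(pathname);
}
```

With this shape, `/api/brief/carousel/user_abc/2026-04-19-0800/0` matches while the date-only `/api/brief/carousel/user_abc/2026-04-19/0` does not, which is the regression the updated test pins.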

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
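
The two contract fixes above can be sketched as pure functions. This is a rough illustration under stated assumptions, not the handlers' actual code: `composingPlaceholder`, `resolveShareSlot`, and the `ShareUrlError` shape are hypothetical, the status field is assumed to remain 'composing' when a slot is echoed, and the requested slot is assumed to be shape-validated before case 1 runs.

```typescript
type LatestBriefPlaceholder = {
  status: 'composing';
  issueDate: string;
  issueSlot?: string;
};

// Case 1: no envelope was found for GET /api/latest-brief.
function composingPlaceholder(requestedSlot: string | undefined, now: Date): LatestBriefPlaceholder {
  if (requestedSlot !== undefined) {
    // Echo the caller's slot; issueDate is its YYYY-MM-DD portion.
    return { status: 'composing', issueSlot: requestedSlot, issueDate: requestedSlot.slice(0, 10) };
  }
  // No ?slot= param: keep the UTC-today placeholder.
  return { status: 'composing', issueDate: now.toISOString().slice(0, 10) };
}

type ShareUrlError =
  | { httpStatus: 400; code: 'invalid_slot_shape' }
  | { httpStatus: 404; code: 'brief_not_found' };

// Case 2: slot resolution for POST /api/brief/share-url.
function resolveShareSlot(bodySlot: string | undefined, latestPointer: string | null): string | ShareUrlError {
  const slot = bodySlot ?? latestPointer;
  // Nothing supplied and no pointer: the brief does not exist yet.
  if (slot === null) return { httpStatus: 404, code: 'brief_not_found' };
  // Something was supplied but it is not YYYY-MM-DD-HHMM.
  if (!/^\d{4}-\d{2}-\d{2}-\d{4}$/.test(slot)) return { httpStatus: 400, code: 'invalid_slot_shape' };
  return slot;
}
```

Separating "nothing to share" from "malformed input" keeps the 400 branch reserved for genuine input-shape errors.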
2026-04-19 14:15:59 +04:00


// Regression tests for middleware.ts's bot-UA gate.
//
// Pins the contract around the `/api/brief/carousel/` carve-out
// shipped in PR #3196: social-platform image fetchers
// (Slack/Telegram/Discord/LinkedIn/etc.) must be able to download
// the carousel PNGs even though their UAs contain "bot" and thus
// match BOT_UA, while the generic bot gate must still 403 plain
// scrapers on every other API path.
//
// Without this test the allowlist is the kind of policy that
// silently regresses on future middleware edits — Telegram's
// sendMediaGroup failure mode ("WEBPAGE_CURL_FAILED") does not
// surface as a CI failure anywhere else.
import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import middleware from '../middleware';
const TELEGRAM_BOT_UA = 'TelegramBot (like TwitterBot)';
const SLACKBOT_UA = 'Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)';
const DISCORDBOT_UA = 'Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)';
const LINKEDINBOT_UA = 'LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)';
const GENERIC_CURL_UA = 'curl/8.1.2';
const GENERIC_SCRAPER_UA = 'Mozilla/5.0 (compatible; SomeRandomBot/1.2)';
const CHROME_UA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36';
// Slot format: YYYY-MM-DD-HHMM — per compose run, matches the
// carousel route's ISSUE_DATE_RE and the signer's slot regex.
const CAROUSEL_PATH = '/api/brief/carousel/user_abc/2026-04-19-0800/0';
// Bare YYYY-MM-DD (the pre-slot shape) must no longer match, so digest
// links that predate the slot rollout naturally fall into the bot gate
// instead of silently leaking the allowlist.
const LEGACY_DATE_ONLY_CAROUSEL_PATH = '/api/brief/carousel/user_abc/2026-04-19/0';
const OTHER_API_PATH = '/api/notifications';
const MALFORMED_CAROUSEL_PATH = '/api/brief/carousel/admin/dashboard';
function call(pathOrUrl: string, ua: string): Response | void {
  const url = pathOrUrl.startsWith('http')
    ? pathOrUrl
    : `https://www.worldmonitor.app${pathOrUrl}`;
  const req = new Request(url, {
    headers: ua ? { 'user-agent': ua } : {},
  });
  return middleware(req) as Response | void;
}
describe('middleware bot gate / carousel allowlist', () => {
  it('passes TelegramBot through on the carousel route (the PR #3196 fix)', () => {
    const res = call(CAROUSEL_PATH, TELEGRAM_BOT_UA);
    assert.equal(res, undefined, 'Telegram must be able to fetch carousel images');
  });

  it('passes Slackbot through on the carousel route', () => {
    const res = call(CAROUSEL_PATH, SLACKBOT_UA);
    assert.equal(res, undefined);
  });

  it('passes Discordbot through on the carousel route', () => {
    const res = call(CAROUSEL_PATH, DISCORDBOT_UA);
    assert.equal(res, undefined);
  });

  it('passes LinkedInBot through on the carousel route', () => {
    const res = call(CAROUSEL_PATH, LINKEDINBOT_UA);
    assert.equal(res, undefined);
  });

  it('still 403s curl on the carousel route (bot gate protects from non-social UAs)', () => {
    const res = call(CAROUSEL_PATH, GENERIC_CURL_UA);
    assert.ok(res instanceof Response, 'should return a Response, not pass through');
    assert.equal(res.status, 403);
  });

  it('still 403s a generic "bot" UA on the carousel route', () => {
    const res = call(CAROUSEL_PATH, GENERIC_SCRAPER_UA);
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });

  it('still 403s TelegramBot on non-carousel API routes (allowlist is scoped, not global)', () => {
    const res = call(OTHER_API_PATH, TELEGRAM_BOT_UA);
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });

  it('still 403s TelegramBot on malformed carousel paths (regex enforces route shape)', () => {
    const res = call(MALFORMED_CAROUSEL_PATH, TELEGRAM_BOT_UA);
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });

  it('still 403s missing UA on the carousel route (short-UA guard)', () => {
    const res = call(CAROUSEL_PATH, '');
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });

  it('passes normal browsers through on the carousel route', () => {
    const res = call(CAROUSEL_PATH, CHROME_UA);
    assert.equal(res, undefined);
  });

  it('passes normal browsers through on any API route', () => {
    const res = call(OTHER_API_PATH, CHROME_UA);
    assert.equal(res, undefined);
  });

  it('does not accept page 3+ on the carousel route (pageFromIndex only has 0/1/2)', () => {
    const res = call('/api/brief/carousel/user_abc/2026-04-19-0800/3', TELEGRAM_BOT_UA);
    assert.ok(res instanceof Response, 'out-of-range page must hit the bot gate');
    assert.equal(res.status, 403);
  });

  it('does not accept non-slot segments on the carousel route', () => {
    const res = call('/api/brief/carousel/user_abc/today/0', TELEGRAM_BOT_UA);
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });

  it('does not accept the pre-slot YYYY-MM-DD shape (slot rollout parity)', () => {
    // Once the composer moves to slot URLs, legacy date-only paths
    // should NOT leak the social allowlist: they correspond to
    // expired pre-rollout links whose Redis keys no longer exist.
    const res = call(LEGACY_DATE_ONLY_CAROUSEL_PATH, TELEGRAM_BOT_UA);
    assert.ok(res instanceof Response);
    assert.equal(res.status, 403);
  });
});