16 Commits

Author SHA1 Message Date
Elie Habib
048bb8bb52 fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter

Three changes, all operational for PR #3248's brief-why-matters feature.

1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to
/api/internal/brief-why-matters returning 403 — middleware's "short
UA" guard (~L183) rejects Node undici's default UA before the
endpoint's own Bearer-auth runs. The feature never executed in prod;
three-layer fallback silently shipped legacy Gemini output. Same
class as /api/seed-contract-probe (2026-04-15). Endpoint still
carries its own subtle-crypto HMAC auth, so bypassing the UA gate
is safe.

2. Explicit UA on callAnalystWhyMatters fetch
Defense-in-depth. Explicit 'worldmonitor-digest-notifications/1.0'
keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored,
and makes cron traffic distinguishable from ops curl in logs.

3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send
for one user on the next tick (then unset) — validates new features
end-to-end without fanning out. Empty/unset = normal fan-out. Applied
right after rule fetch so both compose and dispatch paths respect it.
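
A minimal sketch of the single-user filter from item 3, assuming each rule carries a `userId` field. The `DigestRule` type and the `filterRulesForOperator` name are hypothetical and not taken from the repository; the real cron presumably reads the flag from its environment.

```typescript
// Hypothetical rule shape; the real digest rule type is not shown in this log.
interface DigestRule {
  userId: string;
  channel: string;
}

// Empty or unset flag means normal fan-out: the rules come back unchanged.
// Otherwise only the named user's rules survive, so both the compose and
// dispatch paths downstream see the same reduced list.
function filterRulesForOperator(
  rules: DigestRule[],
  digestOnlyUser: string | undefined,
): DigestRule[] {
  const only = (digestOnlyUser ?? "").trim();
  if (only === "") return rules;
  return rules.filter((rule) => rule.userId === only);
}

const sampleRules: DigestRule[] = [
  { userId: "user_a", channel: "email" },
  { userId: "user_b", channel: "telegram" },
  { userId: "user_a", channel: "telegram" },
];

const fanOut = filterRulesForOperator(sampleRules, undefined);
const singleUser = filterRulesForOperator(sampleRules, "user_a");
```

Filtering once, right after the rule fetch, is what lets a single flag cover every later stage without each stage re-checking it.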

Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts
pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl
UA) plus a negative sibling-path suite so a future prefix-match
refactor can't silently unblock /api/internal/.

Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing
main() complexity warning bumped 74→78 by the filter block (unchanged
in character from pre-PR).

* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs

Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under
the current middleware chain this is sufficient (sibling paths hit the
short-UA OR BOT_UA 403 regardless), but it doesn't pin *which* guard
fires. A future refactor that moves `PUBLIC_API_PATHS.has(path)` later
in the chain could let a curl or undici UA pass on a sibling path
without this suite failing.

Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty,
short/undici, curl). Every combination must still 403 regardless of
which guard catches it. 6 new test cases.
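
The matrix described above can be sketched as follows. This is a simplified stand-in for the middleware, not its actual code: the guard order, the 10-character short-UA threshold, and the sibling paths are assumptions for illustration. The BOT_UA pattern is the one quoted in the #3155 entry of this log.

```typescript
// Paths named in this log as allowlisted; matched exactly, never by prefix.
const PUBLIC_API_PATHS = new Set<string>([
  "/api/version",
  "/api/health",
  "/api/seed-contract-probe",
  "/api/internal/brief-why-matters",
]);

const BOT_UA = /bot|curl\/|python-requests|go-http|java\//i;
const MIN_UA_LENGTH = 10; // assumed threshold for the "short UA" guard

function isBlocked(path: string, userAgent: string): boolean {
  if (PUBLIC_API_PATHS.has(path)) return false; // endpoint carries its own auth
  if (userAgent.length < MIN_UA_LENGTH) return true; // empty or short UA
  return BOT_UA.test(userAgent);
}

// Hypothetical sibling paths that must never inherit the bypass.
const SIBLING_PATHS = [
  "/api/internal/",
  "/api/internal/brief-why-matters/extra",
  "/api/internal/other",
];
const TRIGGER_UAS = ["", "undici", "curl/8.4.0"]; // empty, short, curl

// Every path x UA combination must be blocked, whichever guard fires.
const leaks: string[] = [];
for (const path of SIBLING_PATHS) {
  for (const ua of TRIGGER_UAS) {
    if (!isBlocked(path, ua)) leaks.push(path + " with UA '" + ua + "'");
  }
}
```

An empty `leaks` array is the property the suite pins; a prefix-match refactor of the allowlist would populate it.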

Tests: 35 pass in the middleware-bot-gate suite (was 29).
2026-04-21 19:41:58 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer's rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.
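
As an illustration of the slot shape described above, here is a small parser that accepts only YYYY-MM-DD-HHMM and derives the rendering date from it. The `Slot` type and function names are hypothetical; only the slot format and the `brief:latest:{userId}` pointer key come from this commit message.

```typescript
// Hypothetical parsed form of a slot string.
interface Slot {
  issueSlot: string; // full YYYY-MM-DD-HHMM value
  issueDate: string; // YYYY-MM-DD portion, used for rendering
  hour: number;
  minute: number;
}

const SLOT_RE = /^(\d{4}-\d{2}-\d{2})-(\d{2})(\d{2})$/;

// Returns null for anything that is not a well-formed slot, including the
// legacy date-only shape, which the rollout deliberately stops accepting.
function parseSlot(raw: string): Slot | null {
  const m = SLOT_RE.exec(raw);
  if (m === null) return null;
  const hour = Number(m[2]);
  const minute = Number(m[3]);
  if (hour > 23 || minute > 59) return null;
  return { issueSlot: raw, issueDate: m[1], hour, minute };
}

// Pointer key named in the commit message; overwritten on each compose.
function latestPointerKey(userId: string): string {
  return "brief:latest:" + userId;
}

const morning = parseSlot("2026-04-19-0800");
const afternoon = parseSlot("2026-04-19-1300");
const legacy = parseSlot("2026-04-19"); // null: old format is rejected
```

Two runs on the same day yield distinct `issueSlot` values but the same `issueDate`, which matches the requirement that envelope.data.date stays YYYY-MM-DD.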

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently slip through the allowlist
after the rollout.
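
A sketch of what a slot-aware carousel allowlist check might look like. The userId character class and the social-UA pattern are assumptions; the log confirms only the path layout (userId / slot / page 0|1|2) and the four fetchers named in the tests.

```typescript
// Assumed userId charset; the real pattern may be stricter or looser.
const BRIEF_CAROUSEL_PATH_RE =
  /^\/api\/brief\/carousel\/[A-Za-z0-9_-]+\/\d{4}-\d{2}-\d{2}-\d{4}\/[0-2]$/;

// Assumed subset of the social image fetchers covered by the allowlist.
const SOCIAL_IMAGE_UA = /telegrambot|slackbot|discordbot|linkedinbot/i;

// The bypass applies only when BOTH the exact route shape and a known
// social fetcher UA match; everything else falls to the generic bot gate.
function allowsSocialFetch(path: string, userAgent: string): boolean {
  return BRIEF_CAROUSEL_PATH_RE.test(path) && SOCIAL_IMAGE_UA.test(userAgent);
}

const slotPath = "/api/brief/carousel/user_abc/2026-04-19-0800/0";
const legacyPath = "/api/brief/carousel/user_abc/2026-04-19/0";
const telegram = "TelegramBot (like TwitterBot)";
```

Anchoring the pattern at both ends is what keeps a sibling route under the same prefix from inheriting the bypass.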

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
2026-04-19 14:15:59 +04:00
Elie Habib
85d6308ed0 fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196)
* fix(brief): allow Telegram/social UAs to fetch carousel images

The middleware.ts BOT_UA regex (/bot/i) was returning 403 to Telegram's
sendMediaGroup fetch of /api/brief/carousel/<u>/<d>/<p>. SOCIAL_IMAGE_UA allowlist
(includes telegrambot) was scoped to /favico/* and .png suffix only;
carousel returns image/png but the URL has no extension.

Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED
and zero images above the Telegram brief.

Fix: extend UA-bypass guard to cover /api/brief/carousel/ prefix.
HMAC token on the URL is the real auth; UA allowlist is defence-in-depth.

* Address P2 + P3: regression test + route-shape regex

P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the
contract:
  - TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel
  - curl, generic bot UAs, missing UA still 403 on carousel
  - TelegramBot 403s on non-carousel API routes (scoped, not global)
  - Malformed carousel paths (admin/dashboard, page >= 3, non-ISO
    date) all still 403 via the regex
  - Normal browsers pass everywhere

P3: Replace startsWith('/api/brief/carousel/') prefix with
BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by
api/brief/carousel/[userId]/[issueDate]/[page].ts
(userId / YYYY-MM-DD / page 0|1|2). A future
/api/brief/carousel/admin or similar sibling cannot inherit the
bypass. Comment now lists every social-image UA this protects.

typecheck + typecheck:api clean. test:data 5772/5772.
2026-04-19 09:16:14 +04:00
Elie Habib
de769ce8e1 fix(api): unblock Pro API clients at edge + accept x-api-key alias (#3155)
* fix(api): unblock Pro API clients at edge + accept x-api-key alias

Fixes #3146: Pro API subscriber getting 403 when calling from Railway.
Two independent layers were blocking server-side callers:

1. Vercel Edge Middleware (middleware.ts) blocks any UA matching
   /bot|curl\/|python-requests|go-http|java\//, which killed every
   legitimate server-to-server API client before the gateway even saw
   the request. Add bypass: requests carrying an `x-worldmonitor-key`
   or `x-api-key` header that starts with `wm_` skip the UA gate.
   The prefix is a cheap client-side signal, not auth — downstream
   server/gateway.ts still hashes the key and validates against the
   Convex `userApiKeys` table + entitlement check.

2. Header name mismatch. Docs/gateway only accepted
   `X-WorldMonitor-Key`, but most API clients default to `x-api-key`.
   Accept both header names in:
     - api/_api-key.js (legacy static-key allowlist)
     - server/gateway.ts (user-issued Convex-backed keys)
     - server/_shared/premium-check.ts (isCallerPremium)
   Add `X-Api-Key` to CORS Allow-Headers in server/cors.ts and
   api/_cors.js so browser preflights succeed.

Follow-up outside this PR (Cloudflare dashboard, not in repo):
- Extend the "Allow api access with WM" custom WAF rule to also match
  `starts_with(http.request.headers["x-api-key"][0], "wm_")`, so CF
  Managed Rules don't block requests using the x-api-key header name.
- Update the api-cors-preflight CF Worker's corsHeaders to include
  `X-Api-Key` (memory: cors-cloudflare-worker.md — Worker overrides
  repo CORS on api.worldmonitor.app).

* fix(api): tighten middleware bypass shape + finish x-api-key alias coverage

Addresses review findings on #3155:

1. middleware.ts bypass was too loose. "Starts with wm_" let any caller
   send X-Api-Key: wm_fake and skip the UA gate, shifting unauthenticated
   scraper load onto the gateway's Convex lookup. Tighten to the exact
   key format emitted by src/services/api-keys.ts:generateKey —
   `^wm_[a-f0-9]{40}$` (wm_ + 20 random bytes as hex). Still a cheap
   edge heuristic (no hash lookup in middleware), but raises spoofing
   from trivial prefix match to a specific 43-char shape.

2. Alias was incomplete on bespoke endpoints outside the shared gateway:
   - api/v2/shipping/route-intelligence.ts: async wm_ user-key fallback
     now reads X-Api-Key as well
   - api/v2/shipping/webhooks.ts: webhook ownership fingerprint now
     reads X-Api-Key as well (same key value → same SHA-256 → same
     ownerTag, so a user registering with either header can manage
     their webhook from the other)
   - api/widget-agent.ts: accept X-Api-Key in the auth read AND in the
     OPTIONS Allow-Headers list
   - api/chat-analyst.ts: add X-Api-Key to the OPTIONS Allow-Headers
     list (auth path goes through shared helpers already aliased)
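
The tightened edge check can be sketched as below. The regex is the one quoted in this commit message; the `HeaderBag` type, the helper names, and the precedence between the two header names are assumptions. As the message says, this is only a shape heuristic, and real validation still happens in the gateway.

```typescript
// Exact key shape: "wm_" followed by 20 random bytes as lowercase hex.
const WM_KEY_SHAPE = /^wm_[a-f0-9]{40}$/;

// Hypothetical header container with lower-cased names.
type HeaderBag = { [name: string]: string | undefined };

// Accept either header name; which one wins when both are present is an
// assumption made for this sketch.
function readApiKey(headers: HeaderBag): string | undefined {
  return headers["x-worldmonitor-key"] ?? headers["x-api-key"];
}

// Cheap pre-filter only: passing it skips the UA gate, nothing more.
function mayBypassUaGate(headers: HeaderBag): boolean {
  const key = readApiKey(headers);
  return key !== undefined && WM_KEY_SHAPE.test(key);
}

const wellFormedKey = "wm_" + "a1".repeat(20); // 43 characters in total
const viaAlias = mayBypassUaGate({ "x-api-key": wellFormedKey });
const spoofed = mayBypassUaGate({ "x-api-key": "wm_fake" });
```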
2026-04-18 08:18:49 +04:00
Elie Habib
1346946f15 fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099)
Vercel log showed 'Middleware 403 Forbidden' on /api/seed-contract-probe
for both curl-from-ops and UptimeRobot requests. middleware.ts's BOT_UA
regex matches 'curl/' and 'bot', so any monitoring/probe UA was blocked
before reaching the handler — even though the probe has its own
RELAY_SHARED_SECRET auth that makes the UA check redundant.

Added /api/seed-contract-probe to PUBLIC_API_PATHS (joining /api/version
and /api/health). Safe: the endpoint enforces x-probe-secret matching
RELAY_SHARED_SECRET internally; bypassing the generic UA gate does not
reduce security.

Commented the allowlist to spell out the invariant: entries must carry
their own auth, because this list disables the middleware's generic bot
gate.

Verified via Vercel Inspector log trace:
  Firewall: bypass → OK
  Middleware: 403 Forbidden ← this commit fixes it
  Handler: (unreachable before fix)
2026-04-15 09:40:35 +04:00
Elie Habib
0169245f45 feat(seo): BlogPosting schema, FAQPage JSON-LD, extensible author system (#2284)
* feat(seo): BlogPosting schema, FAQPage JSON-LD, author system, AI crawler welcome

Blog structured data:
- Change @type Article to BlogPosting for all blog posts
- Author: Organization to Person with extensible default (Elie Habib)
- Add per-post author/authorUrl/authorBio/modifiedDate frontmatter fields
- Auto-extract FAQPage JSON-LD from FAQ sections in all 17 posts
- Show Updated date when modifiedDate differs from pubDate
- Add author bio section with GitHub avatar and fallback

Main app:
- Add commodity variant to middleware VARIANT_HOST_MAP and VARIANT_OG
- Add commodity.worldmonitor.app to sitemap.xml
- Shorten index.html meta description to 136 chars (was 161)
- Remove worksFor block from index.html author JSON-LD
- Welcome all bots in robots.txt (removed per-bot blocks, global allows)
- Update llms.txt: five variants listed, all 17 blog post URLs added

* fix(seo): scope FAQ regex to section boundary, use author-aware avatar

- extractFaqLd now slices only to the next ## heading (was: to end of body)
  preventing bold text in post-FAQ sections from being mistakenly extracted
- Avatar src now derived from DEFAULT_AUTHOR_GITHUB constant (koala73)
  only when using the default author; custom authors fall back to favicon
  so multi-author posts show a correct image instead of the wrong profile
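
To illustrate the section-boundary fix, here is a sketch that slices the FAQ section up to the next `## ` heading before looking for bold questions. It assumes posts are markdown with an FAQ heading containing the text "FAQ" and questions written in bold; the function names are hypothetical and the real extractFaqLd is not shown in this log.

```typescript
// Returns only the text between the FAQ heading and the next "## " heading.
function sliceFaqSection(body: string): string {
  const start = body.search(/^## .*FAQ.*$/m);
  if (start === -1) return "";
  const headingEnd = body.indexOf("\n", start);
  if (headingEnd === -1) return "";
  const rest = body.slice(headingEnd + 1);
  const next = rest.search(/^## /m);
  return next === -1 ? rest : rest.slice(0, next);
}

// Collects every **bold** run in the given text.
function extractBoldRuns(text: string): string[] {
  const runs: string[] = [];
  const re = /\*\*(.+?)\*\*/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) runs.push(m[1]);
  return runs;
}

const post = [
  "## Intro",
  "**Not a question**",
  "## FAQ",
  "**What is it?**",
  "A dashboard.",
  "## Closing",
  "**Bold text after the FAQ**",
].join("\n");

const questions = extractBoldRuns(sliceFaqSection(post));
```

Without the slice, the bold line under "## Closing" would be picked up as a question, which is the bug the commit describes.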
2026-03-26 12:48:56 +04:00
Elie Habib
e1258ba1c4 fix: exclude /api/health from bot UA filtering in middleware (#1499)
UptimeRobot UA contains "bot" (UptimeRo*bot*) which triggers the
BOT_UA regex, causing 403 on health checks. Add /api/health to
PUBLIC_API_PATHS alongside /api/version.
2026-03-12 22:14:28 +04:00
Hasan AlDoy
02f3fe77a9 feat: Arabic font support and HLS live streaming UI (#1020)
* feat: enhance support for HLS streams and update font styles

* chore: add .vercelignore to exclude large local build artifacts from Vercel deploys

* chore: include node types in tsconfig to fix server type errors on Vercel build

* fix(middleware): guard optional variant OG lookup to satisfy strict TS

* fix: desktop build and live channels handle null safety

- scripts/build-sidecar-sebuf.mjs: Skip building removed [domain]/v1/[rpc].ts (removed in #785)
- src/live-channels-window.ts: Add optional chaining for handle property to prevent null errors
- src-tauri/Cargo.lock: Bump version to 2.5.24

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: address review issues on PR #1020

- Remove AGENTS.md (project guidelines belong to repo owner)
- Restore tracking script in index.html (accidentally removed)
- Revert tsconfig.json "node" types (leaks Node globals to frontend)
- Add protocol validation to isHlsUrl() (security: block non-http URIs)
- Revert Cargo.lock version bump (release management concern)

* fix: address P2/P3 review findings

- Preserve hlsUrl for HLS-only channels in refreshChannelInfo (was
  incorrectly clearing the stream URL on every refresh cycle)
- Replace deprecated .substr() with .substring()
- Extract duplicated HLS display name logic into getChannelDisplayName()

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-03-05 10:16:43 +04:00
Elie Habib
898ac7b1c4 perf(rss): route RSS direct to Railway, skip Vercel middleman (#961)
* perf(rss): route RSS direct to Railway, skip Vercel middleman

Vercel /api/rss-proxy has a 65% error rate (207K failed invocations/12h).
Route browser RSS requests directly to Railway (proxy.worldmonitor.app)
via Cloudflare CDN, eliminating Vercel as middleman.

- Add VITE_RSS_DIRECT_TO_RELAY feature flag (default off) for staged rollout
- Centralize RSS proxy URL in rssProxyUrl() with desktop/dev/prod routing
- Make Railway /rss public (skip auth, keep rate limiting with CF-Connecting-IP)
- Add wildcard *.worldmonitor.app CORS + always emit Vary: Origin on /rss
- Extract ~290 RSS domains to shared/rss-allowed-domains.cjs (single source of truth)
- Convert Railway domain check to Set for O(1) lookups
- Remove rss-proxy from KEYED_CLOUD_API_PATTERN (no longer needs API key header)
- Add edge function test for shared domain list import

* fix(edge): replace node:module with JSON import for edge-compatible RSS domains

api/_rss-allowed-domains.js used createRequire from node:module which is
unsupported in Vercel Edge Runtime, breaking all edge functions (including
api/gpsjam). Replaced with JSON import attribute syntax that works in both
esbuild (Vercel build) and Node.js 22+ (tests).

Also fixed middleware.ts TS18048 error where VARIANT_OG[variant] could be
undefined.

* test(edge): add guard against node: built-in imports in api/ files

Scans ALL api/*.js files (including _ helpers) for node: module imports
which are unsupported in Vercel Edge Runtime. This would have caught the
createRequire(node:module) bug before it reached Vercel.
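
The guard could be approximated with a source-text scan like the one below. This is a sketch, not the repository's test: it works on a string so it stays self-contained, whereas the real suite presumably reads each api/*.js file from disk. The function name and the regex are assumptions.

```typescript
// Finds "node:" specifiers in static imports, side-effect imports,
// dynamic import() calls, and require() calls.
function findNodeImports(source: string): string[] {
  const re =
    /(?:from\s+|import\s*\(\s*|require\s*\(\s*|import\s+)["'](node:[^"']+)["']/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) found.push(m[1]);
  return found;
}

const brokenHelper = [
  "import { createRequire } from 'node:module';",
  "const require = createRequire(import.meta.url);",
].join("\n");

const edgeSafeHelper = 'import domains from "./rss-domains.js";';

const offenders = findNodeImports(brokenHelper);
const clean = findNodeImports(edgeSafeHelper);
```

A test would assert that the result is empty for every file under api/, failing with the offending specifier when it is not.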

* fix(edge): inline domain array and remove NextResponse reference

- Replace `import ... with { type: 'json' }` in _rss-allowed-domains.js
  with inline array — Vercel esbuild doesn't support import attributes
- Replace `NextResponse.next()` with bare `return` in middleware.ts —
  NextResponse was never imported

* ci(pre-push): add esbuild bundle check and edge function tests

The pre-push hook now catches Vercel build failures locally:
- esbuild bundles each api/*.js entrypoint (catches import attribute
  syntax, missing modules, and other bundler errors)
- runs edge function test suite (node: imports, module isolation)
2026-03-04 18:42:00 +04:00
Elie Habib
c956b73470 feat: consolidate 4 Vercel deployments into 1 via runtime variant detection (#756)
Replace build-time VITE_VARIANT resolution with hostname-based detection
for web deployments. A single build now serves all 4 variants (full, tech,
finance, happy) via subdomain routing, eliminating 3 redundant Vercel
projects and their build minutes.

- Extract VARIANT_META into shared src/config/variant-meta.ts
- Detect variant from hostname (tech./finance./happy. subdomains)
- Preserve VITE_VARIANT env var for desktop builds and localhost dev
- Add social bot OG responses in middleware for variant subdomains
- Swap favicons and meta tags at runtime per resolved variant
- Restrict localStorage variant reads to localhost/Tauri only
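
A rough sketch of hostname-based variant resolution, under stated assumptions: the full hostnames are inferred from the subdomain prefixes listed above, the localhost check is simplified, and `resolveVariant` and `isVariant` are hypothetical names. VARIANT_HOST_MAP is the map name mentioned elsewhere in this log.

```typescript
type Variant = "full" | "tech" | "finance" | "happy";

const VARIANTS: Variant[] = ["full", "tech", "finance", "happy"];

// Assumed hostnames, inferred from the tech./finance./happy. prefixes.
const VARIANT_HOST_MAP: { [host: string]: Variant | undefined } = {
  "tech.worldmonitor.app": "tech",
  "finance.worldmonitor.app": "finance",
  "happy.worldmonitor.app": "happy",
};

function isVariant(value: string | undefined): value is Variant {
  return value !== undefined && VARIANTS.indexOf(value as Variant) !== -1;
}

// On localhost the build-time value (e.g. VITE_VARIANT) still wins; on the
// web the hostname decides, and unknown hosts fall back to "full".
function resolveVariant(hostname: string, envVariant?: string): Variant {
  const host = hostname.toLowerCase();
  const isLocal = host === "localhost" || host === "127.0.0.1";
  if (isLocal && isVariant(envVariant)) return envVariant;
  return VARIANT_HOST_MAP[host] ?? "full";
}
```

Ignoring the override off localhost mirrors the last bullet above: a production visitor cannot switch variants by tampering with local state.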
2026-03-02 16:39:02 +04:00
Elie Habib
36e36d8b57 Cost/traffic hardening, runtime fallback controls, and PostHog removal (#638)
- Remove PostHog analytics runtime and configuration
- Add API rate limiting (api/_rate-limit.js)
- Harden traffic controls across edge functions
- Add runtime fallback controls and data-loader improvements
- Add military base data scripts (fetch-mirta-bases, fetch-osm-bases)
- Gitignore large raw data files
- Settings playground prototypes
2026-03-01 11:53:20 +04:00
Elie Habib
dec52f2787 feat: add Cloudflare edge caching infrastructure for api.worldmonitor.app (#471)
Route web production RPC traffic through api.worldmonitor.app via fetch
interceptor (installWebApiRedirect). Add default Cache-Control headers
(s-maxage=300, stale-while-revalidate=60) on GET 200 responses, with
no-store override for real-time endpoints (vessel snapshot). Update CORS
to allow GET method. Skip Vercel bot middleware for API subdomain using
hostname check (non-spoofable, replacing CF-Ray header approach). Update
desktop cloud fallback to route through api.worldmonitor.app.
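
The default-header rule reads roughly like this sketch. The header values come from the commit message; the function name, the `vessel-snapshot` path marker, and the choice to leave a handler-set header untouched are assumptions.

```typescript
const DEFAULT_CACHE_CONTROL = "s-maxage=300, stale-while-revalidate=60";

// Hypothetical path markers for real-time endpoints that must not be cached.
const REALTIME_PATH_MARKERS = ["vessel-snapshot"];

// Returns the Cache-Control value to send, or undefined to send none.
function cacheControlFor(
  method: string,
  status: number,
  path: string,
  existing?: string,
): string | undefined {
  const realtime = REALTIME_PATH_MARKERS.some((p) => path.indexOf(p) !== -1);
  if (realtime) return "no-store"; // override wins over everything else
  if (existing !== undefined) return existing; // handler already decided
  if (method === "GET" && status === 200) return DEFAULT_CACHE_CONTROL;
  return undefined; // errors and non-GET responses stay uncached
}
```

Restricting the default to GET 200 keeps error responses out of the edge cache, so a transient upstream failure is not served for five minutes.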
2026-02-27 19:48:49 +04:00
Elie Habib
408d5d3374 security: harden IPC, gate DevTools, isolate external windows, exempt /api/version (#348)
* security: harden IPC commands, gate DevTools, and isolate external windows

- Remove devtools from default Tauri features; gate behind opt-in
  Cargo feature so production builds never expose DevTools
- Add IPC origin validation (require_trusted_window) to 9 sensitive
  commands: get_secret, get_all_secrets, set_secret, delete_secret,
  get_local_api_token, read/write/delete_cache_entry, fetch_polymarket
- Isolate youtube-login window into restricted capability (core:window
  only) — prevents external-origin webview from invoking app commands
- Add 5-minute TTL to cached sidecar auth token in fetch patch closure
- Document renderer trust boundary threat model in runtime.ts
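
The 5-minute TTL on the cached sidecar token could look like the closure below. This is a synchronous sketch with an injectable clock so it can be exercised without waiting; the real fetch patch is presumably asynchronous, and `createTokenCache` is a hypothetical name.

```typescript
const TOKEN_TTL_MS = 5 * 60 * 1000;

// Wraps a token source so the value is reused until it is 5 minutes old.
function createTokenCache(
  fetchToken: () => string,
  now: () => number,
): () => string {
  let entry: { token: string; fetchedAt: number } | null = null;
  return function getToken(): string {
    const t = now();
    if (entry !== null && t - entry.fetchedAt < TOKEN_TTL_MS) {
      return entry.token; // still fresh, no new fetch
    }
    const token = fetchToken();
    entry = { token, fetchedAt: t };
    return token;
  };
}

// Fake clock and counter to make refreshes observable.
let fakeNow = 0;
let fetchCount = 0;
const getToken = createTokenCache(
  () => "token-" + ++fetchCount,
  () => fakeNow,
);
```

Bounding the lifetime limits how long a token captured in the closure stays usable after the sidecar rotates or revokes it.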

* docs: add contributors, security acknowledgments, and desktop security policy

- Add Contributors section to README with all 16 GitHub contributors
- Add Security Acknowledgments crediting Cody Richard for 3 disclosures
- Update SECURITY.md with desktop runtime security model (Tauri IPC
  origin validation, DevTools gating, sidecar auth, capability isolation,
  fetch patch trust boundary)
- Add Tauri-specific items to security report scope
- Correct API key storage description to cover both web and desktop

* fix: exempt /api/version from bot-blocking middleware

The desktop update check and sidecar requests were getting 403'd by the
middleware's bot UA filter (curl/) and short UA check.
2026-02-25 06:14:16 +00:00
Elie Habib
eee0eece50 fix: whitelist social preview bots + restrict SW routes to same-origin (#251)
* fix: restrict SW route patterns to same-origin only

The broad regex /^https?:\/\/.*\/api\/.*/i matched ANY URL with /api/
in the path, including external APIs like NASA EONET
(eonet.gsfc.nasa.gov/api/v3/events). Workbox intercepted these
cross-origin requests with NetworkOnly, causing no-response errors
when CORS failed.

Changed all /api/, /ingest/, and /rss/ SW route patterns to use
sameOrigin callback check so only our Vercel routes get NetworkOnly
handling. External APIs now pass through without SW interference.
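
A sketch contrasting the old pattern with a same-origin callback. Workbox passes `sameOrigin` and `url` to route match callbacks; the minimal `MatchOptions` type here stands in for that object so the example runs without Workbox, and `isOwnRoute` is a hypothetical name.

```typescript
// Old pattern from the commit message: matches any URL with /api/ in it.
const BROAD_API_RE = /^https?:\/\/.*\/api\/.*/i;

// Minimal stand-in for the options Workbox hands to a match callback.
interface MatchOptions {
  sameOrigin: boolean;
  url: { pathname: string };
}

// Only our own /api/, /ingest/ and /rss/ routes get NetworkOnly handling.
function isOwnRoute({ sameOrigin, url }: MatchOptions): boolean {
  return sameOrigin && /^\/(api|ingest|rss)\//.test(url.pathname);
}

const eonet = "https://eonet.gsfc.nasa.gov/api/v3/events";
const oldMatchesExternal = BROAD_API_RE.test(eonet); // true: the bug
const newMatchesExternal = isOwnRoute({
  sameOrigin: false,
  url: { pathname: "/api/v3/events" },
});
```

With the callback, a cross-origin request never matches, so the service worker leaves it alone regardless of its path.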

* fix: whitelist social preview bots on OG image assets

Slack-ImgProxy (distinct from Slackbot) was blocked from fetching
/favico/og-image.png by both our bot filter and Vercel Attack Challenge.
Extend middleware matcher to /favico/* and allow all social preview/image
bots through on static asset paths.
2026-02-23 10:44:28 +00:00
Elie Habib
b714e7b13e fix: harden API routes, batch FRED requests, and sanitize tooltip HTML
- fred-data: batch mode (comma-separated series_id) reduces 7 edge
  function invocations to 1; cap at 15 series; propagate upstream
  502s instead of masking as empty 200; add X-Data-Status header
- ucdp-events: parallelize page fetches; track failed pages and use
  short cache TTL for partial results instead of caching at full 6h
- ucdp: add OPTIONS/method guard matching ucdp-events pattern
- middleware: exact-match social bot paths instead of startsWith
- vercel.json: use VERCEL_GIT_PREVIOUS_SHA for multi-commit diffs;
  add middleware.ts, settings.html, vercel.json to watch list
- Panel.ts: use safeHtml() allowlist sanitizer for tooltip content
- dom-utils: add safeHtml() with tag/attribute allowlist and
  javascript: URI blocking
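
For the fred-data batch mode in the first bullet, parsing a comma-separated series_id with the 15-series cap might look like this. Whether the real handler truncates or rejects an over-long list is not stated in the log; truncation is an assumption here, and `parseSeriesIds` is a hypothetical name.

```typescript
const MAX_SERIES = 15;

// Splits "A,B,C" into ids, drops blanks, and keeps at most MAX_SERIES.
function parseSeriesIds(param: string): string[] {
  return param
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "")
    .slice(0, MAX_SERIES);
}

const batch = parseSeriesIds("GDP, UNRATE,,CPIAUCSL");

// Twenty ids in one request are cut down to the cap.
const manyIds: string[] = [];
for (let i = 0; i < 20; i++) manyIds.push("SERIES" + i);
const capped = parseSeriesIds(manyIds.join(","));
```

One request carrying several ids is what collapses seven edge invocations into one; the cap bounds the upstream work a single call can trigger.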
2026-02-20 14:51:54 +04:00
Elie Habib
3ffe76b208 perf: add bot protection middleware and robots.txt to reduce API abuse
Block crawlers/scrapers from /api/* routes via Edge Middleware (403 for
bot user-agents and missing/short UAs). Social preview bots (Twitter,
Facebook, LinkedIn, Slack, Discord) are allowed on /api/story and
/api/og-story for OG previews. robots.txt reinforces the same policy.
2026-02-20 13:38:43 +04:00