worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	048bb8bb52	fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255 ) * fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter Three changes, all operational for PR #3248's brief-why-matters feature. 1. middleware.ts PUBLIC_API_PATHS allowlist Railway logs post-#3248 merge showed every cron call to /api/internal/brief-why-matters returning 403 — middleware's "short UA" guard (~L183) rejects Node undici's default UA before the endpoint's own Bearer-auth runs. The feature never executed in prod; three-layer fallback silently shipped legacy Gemini output. Same class as /api/seed-contract-probe (2026-04-15). Endpoint still carries its own subtle-crypto HMAC auth, so bypassing the UA gate is safe. 2. Explicit UA on callAnalystWhyMatters fetch Defense-in-depth. Explicit 'worldmonitor-digest-notifications/1.0' keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored, and makes cron traffic distinguishable from ops curl in logs. 3. DIGEST_ONLY_USER=user_xxx filter Operator single-user test flag. Set on Railway to run compose + send for one user on the next tick (then unset) — validates new features end-to-end without fanning out. Empty/unset = normal fan-out. Applied right after rule fetch so both compose and dispatch paths respect it. Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl UA) plus a negative sibling-path suite so a future prefix-match refactor can't silently unblock /api/internal/. Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing main() complexity warning bumped 74→78 by the filter block (unchanged in character from pre-PR). * test(middleware): expand sibling-path negatives to cover all 3 trigger UAs Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. 
Under the current middleware chain this is sufficient (sibling paths hit the short-UA OR BOT_UA 403 regardless), but it doesn't pin which guard fires. A future refactor that moves `PUBLIC_API_PATHS.has(path)` later in the chain could let a curl or undici UA pass on a sibling path without this suite failing. Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty, short/undici, curl). Every combination must still 403 regardless of which guard catches it. 6 new test cases. Tests: 35 pass in the middleware-bot-gate suite (was 29).	2026-04-21 19:41:58 +04:00
Elie Habib	38e6892995	fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205 ) * fix(brief): per-run slot URL so same-day digests link to distinct briefs Digest emails at 8am and 1pm on the same day pointed to byte-identical magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz. Each compose run overwrote the single daily envelope in place, and the composer rolling 24h story window meant afternoon output often looked identical to morning. Readers clicking an older email got whatever the latest cron happened to write. Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The magazine URL, carousel URLs, and Redis key all carry the slot, and each digest dispatch gets its own frozen envelope that lives out the 7d TTL. envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026". The digest cron also writes a brief:latest:{userId} pointer (7d TTL, overwritten each compose) so the dashboard panel and share-url endpoint can locate the most recent brief without knowing the slot. The previous date-probing strategy does not work once keys carry HHMM. No back-compat for the old YYYY-MM-DD format: the verifier rejects it, the composer only ever writes the new shape, and any in-flight notifications signed under the old format will 403 on click. Acceptable at the rollout boundary per product decision. * fix(brief): carve middleware bot allowlist to accept slot-format carousel path BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn image fetchers would 403 on sendMediaGroup, breaking previews for the new digest links. CI missed this because tests/middleware-bot-gate.test.mts still exercised the old /YYYY-MM-DD/ path shape. 
Swap the fixture to the slot format and add a regression asserting the pre-slot shape is now rejected, so legacy links cannot silently leak the allowlist after the rollout. * fix(brief): preserve caller-requested slot + correct no-brief share-url error Two contract bugs in the slot rollout that silently misled callers: 1. GET /api/latest-brief?slot=X where X has no envelope was returning { status: 'composing', issueDate: <today UTC> } — which reads as "today's brief is composing" instead of "the specific slot you asked about doesn't exist". A caller probing a known historical slot would get a completely unrelated "today" signal. Now we echo the requested slot back (issueSlot + issueDate derived from its date portion) when the caller supplied ?slot=, and keep the UTC-today placeholder only for the no-param path. 2. POST /api/brief/share-url with no slot and no latest-pointer was falling into the generic invalid_slot_shape 400 branch. That is not an input-shape problem; it is "no brief exists yet for this user". Return 404 brief_not_found — the same code the existing-envelope check returns — so callers get one coherent contract: either the brief exists and is shareable, or it doesn't and you get 404.	2026-04-19 14:15:59 +04:00
Elie Habib	85d6308ed0	fix(brief): unblock Telegram carousel fetch in middleware bot gate (#3196 ) * fix(brief): allow Telegram/social UAs to fetch carousel images middleware.ts BOT_UA regex (/bot/i) was 403 on Telegram sendMediaGroup fetch of /api/brief/carousel/<u>/<d>/<p>. SOCIAL_IMAGE_UA allowlist (includes telegrambot) was scoped to /favico/* and .png suffix only; carousel returns image/png but the URL has no extension. Symptom: Railway log [digest] Telegram carousel 400 ... WEBPAGE_CURL_FAILED and zero images above the Telegram brief. Fix: extend UA-bypass guard to cover /api/brief/carousel/ prefix. HMAC token on the URL is the real auth; UA allowlist is defence-in-depth. * Address P2 + P3: regression test + route-shape regex P2: Add tests/middleware-bot-gate.test.mts — 13 cases pinning the contract: - TelegramBot/Slackbot/Discordbot/LinkedInBot pass on carousel - curl, generic bot UAs, missing UA still 403 on carousel - TelegramBot 403s on non-carousel API routes (scoped, not global) - Malformed carousel paths (admin/dashboard, page >= 3, non-ISO date) all still 403 via the regex - Normal browsers pass everywhere P3: Replace startsWith('/api/brief/carousel/') prefix with BRIEF_CAROUSEL_PATH_RE matching the exact shape enforced by api/brief/carousel/[userId]/[issueDate]/[page].ts (userId / YYYY-MM-DD / page 0|1|2). A future /api/brief/carousel/admin or similar sibling cannot inherit the bypass. Comment now lists every social-image UA this protects. typecheck + typecheck:api clean. test:data 5772/5772.	2026-04-19 09:16:14 +04:00
Elie Habib	de769ce8e1	fix(api): unblock Pro API clients at edge + accept x-api-key alias (#3155 ) * fix(api): unblock Pro API clients at edge + accept x-api-key alias Fixes #3146: Pro API subscriber getting 403 when calling from Railway. Two independent layers were blocking server-side callers: 1. Vercel Edge Middleware (middleware.ts) blocks any UA matching /bot|curl\/|python-requests|go-http|java\//, which killed every legitimate server-to-server API client before the gateway even saw the request. Add bypass: requests carrying an `x-worldmonitor-key` or `x-api-key` header that starts with `wm_` skip the UA gate. The prefix is a cheap client-side signal, not auth — downstream server/gateway.ts still hashes the key and validates against the Convex `userApiKeys` table + entitlement check. 2. Header name mismatch. Docs/gateway only accepted `X-WorldMonitor-Key`, but most API clients default to `x-api-key`. Accept both header names in: - api/_api-key.js (legacy static-key allowlist) - server/gateway.ts (user-issued Convex-backed keys) - server/_shared/premium-check.ts (isCallerPremium) Add `X-Api-Key` to CORS Allow-Headers in server/cors.ts and api/_cors.js so browser preflights succeed. Follow-up outside this PR (Cloudflare dashboard, not in repo): - Extend the "Allow api access with WM" custom WAF rule to also match `starts_with(http.request.headers["x-api-key"][0], "wm_")`, so CF Managed Rules don't block requests using the x-api-key header name. - Update the api-cors-preflight CF Worker's corsHeaders to include `X-Api-Key` (memory: cors-cloudflare-worker.md — Worker overrides repo CORS on api.worldmonitor.app). * fix(api): tighten middleware bypass shape + finish x-api-key alias coverage Addresses review findings on #3155: 1. middleware.ts bypass was too loose. "Starts with wm_" let any caller send X-Api-Key: wm_fake and skip the UA gate, shifting unauthenticated scraper load onto the gateway's Convex lookup. 
Tighten to the exact key format emitted by src/services/api-keys.ts:generateKey — `^wm_[a-f0-9]{40}$` (wm_ + 20 random bytes as hex). Still a cheap edge heuristic (no hash lookup in middleware), but raises spoofing from trivial prefix match to a specific 43-char shape. 2. Alias was incomplete on bespoke endpoints outside the shared gateway: - api/v2/shipping/route-intelligence.ts: async wm_ user-key fallback now reads X-Api-Key as well - api/v2/shipping/webhooks.ts: webhook ownership fingerprint now reads X-Api-Key as well (same key value → same SHA-256 → same ownerTag, so a user registering with either header can manage their webhook from the other) - api/widget-agent.ts: accept X-Api-Key in the auth read AND in the OPTIONS Allow-Headers list - api/chat-analyst.ts: add X-Api-Key to the OPTIONS Allow-Headers list (auth path goes through shared helpers already aliased)	2026-04-18 08:18:49 +04:00
Elie Habib	1346946f15	fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099 ) Vercel log showed 'Middleware 403 Forbidden' on /api/seed-contract-probe for both curl-from-ops and UptimeRobot requests. middleware.ts's BOT_UA regex matches 'curl/' and 'bot', so any monitoring/probe UA was blocked before reaching the handler — even though the probe has its own RELAY_SHARED_SECRET auth that makes the UA check redundant. Added /api/seed-contract-probe to PUBLIC_API_PATHS (joining /api/version and /api/health). Safe: the endpoint enforces x-probe-secret matching RELAY_SHARED_SECRET internally; bypassing the generic UA gate does not reduce security. Commented the allowlist to spell out the invariant: entries must carry their own auth, because this list disables the middleware's generic bot gate. Verified via Vercel Inspector log trace: Firewall: bypass → OK Middleware: 403 Forbidden ← this commit fixes it Handler: (unreachable before fix)	2026-04-15 09:40:35 +04:00
Elie Habib	0169245f45	feat(seo): BlogPosting schema, FAQPage JSON-LD, extensible author system (#2284 ) * feat(seo): BlogPosting schema, FAQPage JSON-LD, author system, AI crawler welcome Blog structured data: - Change @type Article to BlogPosting for all blog posts - Author: Organization to Person with extensible default (Elie Habib) - Add per-post author/authorUrl/authorBio/modifiedDate frontmatter fields - Auto-extract FAQPage JSON-LD from FAQ sections in all 17 posts - Show Updated date when modifiedDate differs from pubDate - Add author bio section with GitHub avatar and fallback Main app: - Add commodity variant to middleware VARIANT_HOST_MAP and VARIANT_OG - Add commodity.worldmonitor.app to sitemap.xml - Shorten index.html meta description to 136 chars (was 161) - Remove worksFor block from index.html author JSON-LD - Welcome all bots in robots.txt (removed per-bot blocks, global allows) - Update llms.txt: five variants listed, all 17 blog post URLs added * fix(seo): scope FAQ regex to section boundary, use author-aware avatar - extractFaqLd now slices only to the next ## heading (was: to end of body) preventing bold text in post-FAQ sections from being mistakenly extracted - Avatar src now derived from DEFAULT_AUTHOR_GITHUB constant (koala73) only when using the default author; custom authors fall back to favicon so multi-author posts show a correct image instead of the wrong profile	2026-03-26 12:48:56 +04:00
Elie Habib	e1258ba1c4	fix: exclude /api/health from bot UA filtering in middleware (#1499 ) UptimeRobot UA contains "bot" (UptimeRobot) which triggers the BOT_UA regex, causing 403 on health checks. Add /api/health to PUBLIC_API_PATHS alongside /api/version.	2026-03-12 22:14:28 +04:00
Hasan AlDoy	02f3fe77a9	feat: Arabic font support and HLS live streaming UI (#1020 ) * feat: enhance support for HLS streams and update font styles * chore: add .vercelignore to exclude large local build artifacts from Vercel deploys * chore: include node types in tsconfig to fix server type errors on Vercel build * fix(middleware): guard optional variant OG lookup to satisfy strict TS * fix: desktop build and live channels handle null safety - scripts/build-sidecar-sebuf.mjs: Skip building removed [domain]/v1/[rpc].ts (removed in #785) - src/live-channels-window.ts: Add optional chaining for handle property to prevent null errors - src-tauri/Cargo.lock: Bump version to 2.5.24 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix: address review issues on PR #1020 - Remove AGENTS.md (project guidelines belong to repo owner) - Restore tracking script in index.html (accidentally removed) - Revert tsconfig.json "node" types (leaks Node globals to frontend) - Add protocol validation to isHlsUrl() (security: block non-http URIs) - Revert Cargo.lock version bump (release management concern) * fix: address P2/P3 review findings - Preserve hlsUrl for HLS-only channels in refreshChannelInfo (was incorrectly clearing the stream URL on every refresh cycle) - Replace deprecated .substr() with .substring() - Extract duplicated HLS display name logic into getChannelDisplayName() --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Elie Habib <elie.habib@gmail.com>	2026-03-05 10:16:43 +04:00
Elie Habib	898ac7b1c4	perf(rss): route RSS direct to Railway, skip Vercel middleman (#961 ) * perf(rss): route RSS direct to Railway, skip Vercel middleman Vercel /api/rss-proxy has 65% error rate (207K failed invocations/12h). Route browser RSS requests directly to Railway (proxy.worldmonitor.app) via Cloudflare CDN, eliminating Vercel as middleman. - Add VITE_RSS_DIRECT_TO_RELAY feature flag (default off) for staged rollout - Centralize RSS proxy URL in rssProxyUrl() with desktop/dev/prod routing - Make Railway /rss public (skip auth, keep rate limiting with CF-Connecting-IP) - Add wildcard .worldmonitor.app CORS + always emit Vary: Origin on /rss - Extract ~290 RSS domains to shared/rss-allowed-domains.cjs (single source of truth) - Convert Railway domain check to Set for O(1) lookups - Remove rss-proxy from KEYED_CLOUD_API_PATTERN (no longer needs API key header) - Add edge function test for shared domain list import fix(edge): replace node:module with JSON import for edge-compatible RSS domains api/_rss-allowed-domains.js used createRequire from node:module which is unsupported in Vercel Edge Runtime, breaking all edge functions (including api/gpsjam). Replaced with JSON import attribute syntax that works in both esbuild (Vercel build) and Node.js 22+ (tests). Also fixed middleware.ts TS18048 error where VARIANT_OG[variant] could be undefined. * test(edge): add guard against node: built-in imports in api/ files Scans ALL api/*.js files (including _* helpers) for node: module imports which are unsupported in Vercel Edge Runtime. This would have caught the createRequire(node:module) bug before it reached Vercel. fix(edge): inline domain array and remove NextResponse reference - Replace `import ... 
with { type: 'json' }` in _rss-allowed-domains.js with inline array — Vercel esbuild doesn't support import attributes - Replace `NextResponse.next()` with bare `return` in middleware.ts — NextResponse was never imported * ci(pre-push): add esbuild bundle check and edge function tests The pre-push hook now catches Vercel build failures locally: - esbuild bundles each api/*.js entrypoint (catches import attribute syntax, missing modules, and other bundler errors) - runs edge function test suite (node: imports, module isolation)	2026-03-04 18:42:00 +04:00
Elie Habib	c956b73470	feat: consolidate 4 Vercel deployments into 1 via runtime variant detection (#756 ) Replace build-time VITE_VARIANT resolution with hostname-based detection for web deployments. A single build now serves all 4 variants (full, tech, finance, happy) via subdomain routing, eliminating 3 redundant Vercel projects and their build minutes. - Extract VARIANT_META into shared src/config/variant-meta.ts - Detect variant from hostname (tech./finance./happy. subdomains) - Preserve VITE_VARIANT env var for desktop builds and localhost dev - Add social bot OG responses in middleware for variant subdomains - Swap favicons and meta tags at runtime per resolved variant - Restrict localStorage variant reads to localhost/Tauri only	2026-03-02 16:39:02 +04:00
Elie Habib	36e36d8b57	Cost/traffic hardening, runtime fallback controls, and PostHog removal (#638 ) - Remove PostHog analytics runtime and configuration - Add API rate limiting (api/_rate-limit.js) - Harden traffic controls across edge functions - Add runtime fallback controls and data-loader improvements - Add military base data scripts (fetch-mirta-bases, fetch-osm-bases) - Gitignore large raw data files - Settings playground prototypes	2026-03-01 11:53:20 +04:00
Elie Habib	dec52f2787	feat: add Cloudflare edge caching infrastructure for api.worldmonitor.app (#471 ) Route web production RPC traffic through api.worldmonitor.app via fetch interceptor (installWebApiRedirect). Add default Cache-Control headers (s-maxage=300, stale-while-revalidate=60) on GET 200 responses, with no-store override for real-time endpoints (vessel snapshot). Update CORS to allow GET method. Skip Vercel bot middleware for API subdomain using hostname check (non-spoofable, replacing CF-Ray header approach). Update desktop cloud fallback to route through api.worldmonitor.app.	2026-02-27 19:48:49 +04:00
Elie Habib	408d5d3374	security: harden IPC, gate DevTools, isolate external windows, exempt /api/version (#348 ) * security: harden IPC commands, gate DevTools, and isolate external windows - Remove devtools from default Tauri features; gate behind opt-in Cargo feature so production builds never expose DevTools - Add IPC origin validation (require_trusted_window) to 9 sensitive commands: get_secret, get_all_secrets, set_secret, delete_secret, get_local_api_token, read/write/delete_cache_entry, fetch_polymarket - Isolate youtube-login window into restricted capability (core:window only) — prevents external-origin webview from invoking app commands - Add 5-minute TTL to cached sidecar auth token in fetch patch closure - Document renderer trust boundary threat model in runtime.ts * docs: add contributors, security acknowledgments, and desktop security policy - Add Contributors section to README with all 16 GitHub contributors - Add Security Acknowledgments crediting Cody Richard for 3 disclosures - Update SECURITY.md with desktop runtime security model (Tauri IPC origin validation, DevTools gating, sidecar auth, capability isolation, fetch patch trust boundary) - Add Tauri-specific items to security report scope - Correct API key storage description to cover both web and desktop * fix: exempt /api/version from bot-blocking middleware The desktop update check and sidecar requests were getting 403'd by the middleware's bot UA filter (curl/) and short UA check.	2026-02-25 06:14:16 +00:00
Elie Habib	eee0eece50	fix: whitelist social preview bots + restrict SW routes to same-origin (#251 ) * fix: restrict SW route patterns to same-origin only The broad regex /^https?:\/\/.*\/api\/.*/i matched ANY URL with /api/ in the path, including external APIs like NASA EONET (eonet.gsfc.nasa.gov/api/v3/events). Workbox intercepted these cross-origin requests with NetworkOnly, causing no-response errors when CORS failed. Changed all /api/, /ingest/, and /rss/ SW route patterns to use sameOrigin callback check so only our Vercel routes get NetworkOnly handling. External APIs now pass through without SW interference. * fix: whitelist social preview bots on OG image assets Slack-ImgProxy (distinct from Slackbot) was blocked from fetching /favico/og-image.png by both our bot filter and Vercel Attack Challenge. Extend middleware matcher to /favico/* and allow all social preview/image bots through on static asset paths.	2026-02-23 10:44:28 +00:00
Elie Habib	b714e7b13e	fix: harden API routes, batch FRED requests, and sanitize tooltip HTML - fred-data: batch mode (comma-separated series_id) reduces 7 edge function invocations to 1; cap at 15 series; propagate upstream 502s instead of masking as empty 200; add X-Data-Status header - ucdp-events: parallelize page fetches; track failed pages and use short cache TTL for partial results instead of caching at full 6h - ucdp: add OPTIONS/method guard matching ucdp-events pattern - middleware: exact-match social bot paths instead of startsWith - vercel.json: use VERCEL_GIT_PREVIOUS_SHA for multi-commit diffs; add middleware.ts, settings.html, vercel.json to watch list - Panel.ts: use safeHtml() allowlist sanitizer for tooltip content - dom-utils: add safeHtml() with tag/attribute allowlist and javascript: URI blocking	2026-02-20 14:51:54 +04:00
Elie Habib	3ffe76b208	perf: add bot protection middleware and robots.txt to reduce API abuse Block crawlers/scrapers from /api/* routes via Edge Middleware (403 for bot user-agents and missing/short UAs). Social preview bots (Twitter, Facebook, LinkedIn, Slack, Discord) are allowed on /api/story and /api/og-story for OG previews. robots.txt reinforces the same policy.	2026-02-20 13:38:43 +04:00
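
The bot gate introduced in 3ffe76b208 and refined by the later commits above (exact-match PUBLIC_API_PATHS, the wm_ key-shape bypass from #3155, the slot-format carousel allowlist from #3205) can be read as one ordered chain of checks. The following is a minimal sketch of that chain as the commit messages describe it, not the contents of middleware.ts. The constant names BOT_UA, SOCIAL_IMAGE_UA, PUBLIC_API_PATHS and BRIEF_CAROUSEL_PATH_RE and the regex fragments come from the log; the function gateStatus, the USER_KEY_SHAPE name, the MIN_UA_LENGTH threshold, the order of the checks, and the reduced SOCIAL_IMAGE_UA list are assumptions made for illustration.

```typescript
// Hypothetical sketch reconstructed from the commit messages; the real
// middleware.ts may order or implement these checks differently.

// UA pattern quoted in de769ce8e1.
const BOT_UA = /bot|curl\/|python-requests|go-http|java\//i;

// Assumed subset of the social-image fetchers named in 85d6308ed0.
const SOCIAL_IMAGE_UA = /telegrambot|slackbot|discordbot|linkedinbot/i;

// Exact-match allowlist. Per 1346946f15, every entry must carry its own auth.
const PUBLIC_API_PATHS = new Set<string>([
  "/api/version",
  "/api/health",
  "/api/seed-contract-probe",
  "/api/internal/brief-why-matters",
]);

// userId / YYYY-MM-DD-HHMM slot / page 0|1|2 (slot shape per 38e6892995).
const BRIEF_CAROUSEL_PATH_RE =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/[0-2]$/;

// Key shape quoted in de769ce8e1: wm_ + 20 random bytes as hex.
const USER_KEY_SHAPE = /^wm_[a-f0-9]{40}$/;

// Assumed threshold for the "short UA" guard; the log does not state it.
const MIN_UA_LENGTH = 10;

// Header names are assumed to be lower-cased by the caller.
function gateStatus(
  path: string,
  headers: Record<string, string | undefined>,
): 200 | 403 {
  const ua = headers["user-agent"] ?? "";

  // 1. Allowlisted paths skip the UA gate entirely (exact match, no prefix).
  if (PUBLIC_API_PATHS.has(path)) return 200;

  // 2. A well-formed user key skips the UA gate. This is a cheap edge
  //    heuristic, not auth: the gateway still validates the key downstream.
  const key = headers["x-worldmonitor-key"] ?? headers["x-api-key"] ?? "";
  if (USER_KEY_SHAPE.test(key)) return 200;

  // 3. Social image fetchers pass only on the exact carousel route shape.
  if (SOCIAL_IMAGE_UA.test(ua) && BRIEF_CAROUSEL_PATH_RE.test(path)) return 200;

  // 4. Generic gate: missing/short UA or a bot-like UA is rejected.
  if (ua.length < MIN_UA_LENGTH || BOT_UA.test(ua)) return 403;

  return 200;
}
```

In this sketch a sibling such as /api/internal/brief-why-matters-v2 still returns 403 for an empty, short, or curl UA, because the allowlist is a Set lookup and not a prefix match. That is the property the sibling-path negatives in 048bb8bb52 are described as pinning.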

16 Commits
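
Commit 38e6892995 describes the per-run slot as YYYY-MM-DD-HHMM in the user's local time zone, with the rendering date still derived from the YYYY-MM-DD portion and the old date-only shape rejected. A rough sketch of that round trip follows, assuming an IANA time zone name is available for the user. The names SLOT_RE, formatSlot and issueDateFromSlot are hypothetical and do not come from the repository.

```typescript
// Hypothetical helpers illustrating the slot shape from 38e6892995.
// Date portion, then HH (00-23) and MM (00-59) with no separator.
const SLOT_RE = /^(\d{4}-\d{2}-\d{2})-([01]\d|2[0-3])([0-5]\d)$/;

// Format an instant as a slot in the given IANA time zone.
function formatSlot(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(instant);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  // Some engines report midnight as "24" when hour12 is false; normalise it.
  const hour = get("hour") === "24" ? "00" : get("hour");
  return `${get("year")}-${get("month")}-${get("day")}-${hour}${get("minute")}`;
}

// Return the YYYY-MM-DD portion of a valid slot, or null for any other
// shape (including the pre-slot date-only format).
function issueDateFromSlot(slot: string): string | null {
  const m = SLOT_RE.exec(slot);
  return m ? (m[1] ?? null) : null;
}
```

Under these assumptions, two compose runs on the same local day produce different slots and therefore different URLs, while both map back to the same issue date for rendering.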