mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00


Elie Habib 8ea4c8f163 feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)

* feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change)

Ship the measurement layer before picking any recall-lift strategy.

Why: the current dedup path embeds titles only, so brief-wire headlines
that share a real event but drop the geographic anchor (e.g. "Alleged
Coup: defendant arrives in court" vs "Trial opens after Nigeria charges
six over 2025 coup plot") can slip past the 0.60 cosine threshold. To
tune recall without regressing precision we need a replayable per-tick
dataset — one record per story with the exact fields any downstream
candidate (title+slug, LLM-canonicalise, text-embedding-3-large, cross-
encoder re-rank, etc.) would need to score.
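For readers unfamiliar with the threshold test above, here is a minimal sketch of a title-embedding cosine check. It assumes two titles count as duplicates when their cosine similarity is at or above 0.60 (whether the real comparison is `>=` or `>` is not stated). The function names and the toy vectors are hypothetical: they are not the project's code or real embedding output.

```typescript
// Hypothetical sketch of a title-only cosine dedup check.
const COSINE_THRESHOLD = 0.6; // value quoted in the PR text

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: "at or above the threshold" means duplicate.
function isDuplicate(a: readonly number[], b: readonly number[]): boolean {
  return cosineSimilarity(a, b) >= COSINE_THRESHOLD;
}
```

Two headlines about the same event that share little wording, such as one that drops the geographic anchor, can score below the cutoff. That is the recall gap the replay log is meant to measure.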

This PR ships ONLY the log. Zero behaviour change:
  - Opt-in via DIGEST_DEDUP_REPLAY_LOG=1 (default OFF).
  - Writer is best-effort: all errors swallowed + warned, never affects
    digest delivery. No throw path.
  - Records include hash, originalIndex, isRep, clusterId, raw +
    normalised title, link, severity/score/mentions/phase/sources,
    embeddingCacheKey, hasEmbedding sidecar flag, and the tick's config
    snapshot (mode, clustering, cosineThreshold, topicThreshold, veto).
  - clusterId derives from rep.mergedHashes (already set by
    materializeCluster) so the orchestrator is untouched.
  - Storage: Upstash list keyed by {variant}:{lang}:{sensitivity}:{date}
    with 30-day EXPIRE. Date suffix caps per-key growth; retention
    covers the labelling cadence + cross-candidate comparison window.
  - Env flag is '1'-only (fail-closed on typos, same pattern as
    DIGEST_DEDUP_MODE).
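The flag parsing and per-story record fields in the list above can be sketched as follows. This is a hypothetical illustration, not the module's code. The `Story` and `ClusterRep` shapes are assumptions, only a subset of record fields is shown, and using the rep's position as `clusterId` is a guess at how the id is derived from `rep.mergedHashes`.

```typescript
// Hypothetical input shapes; the real pipeline carries many more fields.
interface Story {
  hash: string;
  title: string;
  link: string;
}

interface ClusterRep {
  hash: string;
  // Assumed to list the hashes of stories folded into this rep.
  mergedHashes: string[];
}

interface ReplayRecord {
  hash: string;
  originalIndex: number;
  isRep: boolean;
  clusterId: number | null;
  title: string;
  link: string;
}

// Fail-closed flag: only the exact string "1" enables logging, so typos
// such as "true", "yes" or " 1" leave it off.
function replayLogEnabled(env: Record<string, string | undefined>): boolean {
  return env.DIGEST_DEDUP_REPLAY_LOG === "1";
}

function buildReplayRecords(stories: Story[], reps: ClusterRep[]): ReplayRecord[] {
  const repHashes = new Set(reps.map((r) => r.hash));
  // Map every hash (the rep plus its merged members) to a cluster id.
  const clusterOf = new Map<string, number>();
  reps.forEach((rep, clusterId) => {
    for (const h of [rep.hash, ...rep.mergedHashes]) clusterOf.set(h, clusterId);
  });
  return stories.map((story, originalIndex) => ({
    hash: story.hash,
    originalIndex,
    isRep: repHashes.has(story.hash),
    clusterId: clusterOf.get(story.hash) ?? null,
    title: story.title,
    link: story.link,
  }));
}
```

Deriving cluster membership from data the clustering step already produced is what keeps the orchestrator untouched: the log only reads existing output.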

Activation path (post-merge): flip DIGEST_DEDUP_REPLAY_LOG=1 on the
seed-digest-notifications Railway service. Watch one cron tick for the
RPUSH + EXPIRE pair (or a single warn line if creds/upstream flake),
then leave running for at least one week to accumulate calibration data.
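The RPUSH + EXPIRE pair maps naturally onto Upstash's REST pipeline endpoint (`POST {url}/pipeline` with a JSON array of commands and a Bearer token). The sketch below assumes that transport. `pushReplayRecords`, the injectable `fetchImpl` parameter and the result shape are hypothetical. The 30-day TTL, the 10-second timeout and the never-throw contract come from the PR text.

```typescript
// Minimal fetch-compatible signature so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type WriteResult = { ok: boolean; error?: string };

// 30-day retention, per the PR description.
const REPLAY_TTL_SECONDS = 30 * 24 * 60 * 60;

async function pushReplayRecords(
  upstashUrl: string,
  token: string,
  key: string,
  records: object[],
  fetchImpl: FetchLike = fetch,
): Promise<WriteResult> {
  if (records.length === 0) return { ok: true }; // nothing to write
  try {
    // One round trip: append every record, then (re)set the key's TTL.
    const commands = [
      ["RPUSH", key, ...records.map((r) => JSON.stringify(r))],
      ["EXPIRE", key, String(REPLAY_TTL_SECONDS)],
    ];
    const res = await fetchImpl(`${upstashUrl}/pipeline`, {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(commands),
      signal: AbortSignal.timeout(10_000), // 10s ceiling
    });
    if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
    // The pipeline replies with one { result } or { error } entry per command.
    const replies = (await res.json()) as Array<{ result?: unknown; error?: string }>;
    const failed = replies.find((r) => r.error !== undefined);
    return failed ? { ok: false, error: failed.error } : { ok: true };
  } catch (err) {
    // Best-effort: warn and report, never rethrow into digest delivery.
    console.warn("[replay-log] write failed:", err);
    return { ok: false, error: String(err) };
  }
}
```

Because the function catches every error and resolves with a result object, a caller can await it on the delivery path without wrapping it in its own try/catch.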

Tests: 21 unit tests covering flag parsing, key shape + sanitisation,
record field correctness (isRep, clusterId, embeddingCacheKey,
hasEmbedding, tickConfig), pipeline null/throw handling, malformed
input. Existing 77 dedup tests unchanged and still green.

* fix(digest-dedup): capture topicGroupingEnabled in replay tickConfig

Review catch (PR #3330): the tickConfig snapshot omitted
topicGroupingEnabled even though readOrchestratorConfig returns it and
the digest's post-dedup topic ordering gates on it. A tick run with
DIGEST_DEDUP_TOPIC_GROUPING=0 serialised identically to a default
tick, making those runs non-replayable for the calibration work this
log is meant to enable.

Add topicGroupingEnabled to the recorded tickConfig. One-line schema
fix + regression test asserting topic-grouping-off ticks serialise
distinctly from default.
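A minimal sketch of the snapshot fix, assuming a flat config object. The field names come from the PR text, but the `OrchestratorConfig` type, the field types and `snapshotTickConfig` are illustrative assumptions. Listing `topicGroupingEnabled` explicitly is what makes a tick with topic grouping disabled serialise differently from a default tick.

```typescript
// Illustrative config shape; field types are assumptions.
interface OrchestratorConfig {
  mode: string;
  clustering: string;
  cosineThreshold: number;
  topicThreshold: number;
  veto: boolean;
  topicGroupingEnabled: boolean;
}

// Copy every field that can change dedup or ordering behaviour, so two
// ticks with different settings never produce the same snapshot.
function snapshotTickConfig(cfg: OrchestratorConfig): OrchestratorConfig {
  return {
    mode: cfg.mode,
    clustering: cfg.clustering,
    cosineThreshold: cfg.cosineThreshold,
    topicThreshold: cfg.topicThreshold,
    veto: cfg.veto,
    topicGroupingEnabled: cfg.topicGroupingEnabled, // the field the fix adds
  };
}
```

An explicit field list keeps unrelated settings out of the log, at the cost of needing an update whenever the orchestrator gains a new knob. That is exactly the kind of omission this fix addresses.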

22/22 tests pass.

* fix(digest-dedup): await replay-log write to survive explicit process.exit

Review catch (PR #3330): the fire-and-forget `void writeReplayLog(...)`
call could be dropped on the explicit-exit paths — the brief-compose
failure gate at line 1539 and main().catch at line 1545 both call
process.exit(1). Unlike natural exit, process.exit does not drain
in-flight promises, so the last N ticks' replay records could be
silently lost on runs where measurement fidelity matters most.

Fix: await the writeReplayLog call. Safe because:
  - writeReplayLog returns synchronously when the flag is off
    (replayLogEnabled check is the first thing it does)
  - It has a top-level try/catch that always returns a result object
  - The Upstash pipeline call has a 10s timeout ceiling
  - buildDigest already awaits many Upstash calls (dedup, compose,
    render) so one more is not a hot-path concern

Comment block added above the call explains why the await is
deliberate — so a future refactor doesn't revert it to void thinking
it's a leftover.
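The ordering this fix guarantees can be shown with a small, hypothetical harness. `finishTick`, its injectable `exit` parameter and the result shape are illustrative assumptions, not the script's code. Awaiting the write means the record is handed off before `process.exit(1)` runs. With `void writeReplayLog()`, the exit could fire first.

```typescript
type ReplayResult = { ok: boolean; skipped?: boolean; error?: string };

async function finishTick(
  writeReplayLog: () => Promise<ReplayResult>,
  composeFailed: boolean,
  exit: (code: number) => void = (code) => process.exit(code),
): Promise<ReplayResult> {
  // Awaited on purpose: process.exit() ends the process without draining
  // pending promises, so `void writeReplayLog()` could lose the write.
  // The writer's contract (catch-all, always resolves) makes awaiting safe.
  const result = await writeReplayLog();
  if (composeFailed) exit(1); // mirrors a compose-failure exit gate
  return result;
}
```

Injecting `exit` is only a testing convenience. In the real script the call would be `process.exit(1)` directly.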

No test change: existing writeReplayLog unit tests already cover the
disabled / empty / success / error paths. The fix is a single-keyword
change in a caller that was already guaranteed-safe by the callee's
contract.

* refactor(digest-dedup): address Greptile P2 review comments on replay log

Three non-blocking polish items from the automated review, bundled
because they all touch the same new module and none change behaviour.

1. tsMs captured BEFORE deduplicateStories (seed-digest-notifications.mjs).
   Previously sampled after dedup returned, so briefTickId reflected
   dedup-completion time rather than tick-start. For downstream readers
   the natural reading of "briefTickId" is when the tick began
   processing; moved the Date.now() call to match that expectation.
   Drift is maybe 100ms-2s on cold-cache embed calls — small, but
   moving it is free.

2. buildReplayLogKey emptiness check now strips ':' and '-' in addition
   to '_'. A pathological ruleId of ':::' previously passed through
   verbatim, producing keys like `digest:replay-log:v1::::2026-04-23`
   that confuse redis-cli's namespace tooling (SCAN / KEYS / tab
   completion). The new guard falls back to "unknown" on any input
   that's all separators. Added a regression test covering the
   ':::' / '---' / '___' / mixed cases.

3. tickConfig is now a per-record shallow copy instead of a shared
   reference. Storage is unaffected (writeReplayLog serialises each
   record via JSON.stringify independently) but an in-memory consumer
   that mutated one record's tickConfig for experimentation would have
   silently affected all other records in the same batch. Added a
   regression test asserting mutation doesn't leak across records.
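Items 2 and 3 can be sketched together. The `digest:replay-log:v1` prefix is taken from the example key in item 2. The `buildReplayLogKey(ruleId, date)` signature, the character whitelist and `attachTickConfig` are assumptions for illustration only.

```typescript
// Prefix copied from the example key in the PR text.
const KEY_PREFIX = "digest:replay-log:v1";

function buildReplayLogKey(ruleId: string, date: string): string {
  // Assumption: characters outside [A-Za-z0-9:_-] are replaced with "_".
  const cleaned = ruleId.replace(/[^A-Za-z0-9:_-]/g, "_");
  // Emptiness guard: an id made only of ':', '-' and '_' (or nothing at
  // all) falls back to "unknown" instead of a run of bare separators.
  const safe = cleaned.replace(/[:\-_]/g, "") === "" ? "unknown" : cleaned;
  return `${KEY_PREFIX}:${safe}:${date}`;
}

// Each record gets its own shallow copy of the tick config, so mutating
// one record's tickConfig cannot leak into the others in the same batch.
function attachTickConfig<T extends object, C extends object>(
  records: T[],
  tickConfig: C,
): Array<T & { tickConfig: C }> {
  return records.map((r) => ({ ...r, tickConfig: { ...tickConfig } }));
}
```

`{ ...tickConfig }` is a shallow copy. It isolates top-level fields, which is enough for a flat snapshot, but nested objects would still be shared between records.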

Tests: 24/24 pass (22 prior + 2 new regression). Typecheck + lint clean.

2026-04-23 11:50:19 +04:00

.github

chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)

2026-04-22 09:55:59 +03:00

.husky

chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)

2026-04-22 09:55:59 +03:00

api

feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)

2026-04-23 10:25:18 +04:00

blog-site

feat(seo): BlogPosting schema, FAQPage JSON-LD, extensible author system (#2284)

2026-03-26 12:48:56 +04:00

consumer-prices-core

fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)

2026-04-17 08:35:20 +04:00

convex

refactor(emails): refresh Pro welcome email — surface WM Analyst, Widgets, MCP (#3300)

2026-04-22 23:18:32 +04:00

data

feat(oref): Tzeva Adom as primary alert source + Hebrew translation dictionaries (#2863)

2026-04-09 12:39:34 +04:00

deploy/nginx

Add Brotli-first API compression for sidecar and nginx

2026-02-20 08:41:22 +04:00

docker

fix(csp): allow Dodo payment frames + Google Pay permission (#2789)

2026-04-07 20:26:50 +04:00

docs

feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)

2026-04-23 10:25:18 +04:00

e2e

feat(auth): integrate clerk.dev (#1812)

2026-03-26 13:47:22 +02:00

plans

perf(map): lazy supercluster init, memoize filterByTime, lazy static layers (#1985)

2026-03-21 15:32:51 +04:00

pro-test

fix(pro-marketing): /pro reflects entitlement — swap upgrade CTAs to dashboard link for Pro users (#3301)

2026-04-22 23:39:32 +04:00

proto

chore: remove dormant proactive-intelligence agent (superseded by digest) (#3325)

2026-04-23 09:15:57 +04:00

public

feat(agent-readiness): publish MCP Server Card at /.well-known/mcp/server-card.json (#3329)

2026-04-23 09:23:00 +04:00

scripts

feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)

2026-04-23 11:50:19 +04:00

server

fix(aviation): seeder writes delays-bootstrap aggregate (close EMPTY-on-quiet-traffic alarm) (#3334)

2026-04-23 11:43:54 +04:00

shared

fix(brief): category-gated context + RELEVANCE RULE to stop formulaic grounding (#3281)

2026-04-22 08:21:01 +04:00

src

fix(idb-cleanup): swallow TransactionInactiveError in idempotent IDB cursor loops (#3335)

2026-04-23 11:47:20 +04:00

src-tauri

chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)

2026-04-22 09:55:59 +03:00

tests

feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)

2026-04-23 11:50:19 +04:00

todos

feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)

2026-04-19 13:49:48 +04:00

.dockerignore

feat: self-hosted Docker stack (#1521)

2026-03-19 12:07:20 +04:00

.env.example

fix(email): route Intelligence Brief off the alerts@ mailbox (#3321)

2026-04-23 08:51:27 +04:00

.gitignore

feat(brief): route whyMatters through internal analyst-context endpoint (#3248)

2026-04-21 14:03:27 +04:00

.markdownlint-cli2.jsonc

refactor(sanctions): simplify handler to Redis-read-only, fix seed OOM risk (#1753)

2026-03-17 12:20:10 +04:00

.npmrc

chore: suppress npm deprecation warnings via .npmrc loglevel=error (#1862)

2026-03-19 09:48:23 +04:00

.nvmrc

fix(dx): add node_modules guard to pre-push hook and pin Node 22 (#1368)

2026-03-10 08:27:44 +04:00

.vercelignore

feat: Arabic font support and HLS live streaming UI (#1020)

2026-03-05 10:16:43 +04:00

AGENTS.md

feat: harness engineering P0 - linting, testing, architecture docs (#1587)

2026-03-14 21:29:21 +04:00

ARCHITECTURE.md

feat: harness engineering P0 - linting, testing, architecture docs (#1587)

2026-03-14 21:29:21 +04:00

biome.json

feat(supply-chain): Global Shipping Intelligence — Sprint 0 + Sprint 1 (#2870)

2026-04-09 17:06:03 +04:00

brief-palette-playground.html

feat(brief): swap sienna rust for two-strength WM mint (Option B palette) (#3178)

2026-04-18 20:50:16 +04:00

CHANGELOG.md

chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)

2026-04-22 09:55:59 +03:00

CODE_OF_CONDUCT.md

docs: add community guidelines (contributing, code of conduct, security) (#226)

2026-02-22 16:26:13 +02:00

compound-engineering.local.md

feat(finance-panels): add 7 macro/market panels + Daily Brief context (issues #2245-#2253) (#2258)

2026-03-26 08:03:09 +04:00

CONTRIBUTING.md

fix: add India boundary override from Natural Earth 50m data (#1796)

2026-03-19 02:40:05 +04:00

DEPLOYMENT-PLAN.md

feat(auth): integrate clerk.dev (#1812)

2026-03-26 13:47:22 +02:00

docker-compose.yml

Revert "feat: seed orchestrator with auto-seeding, persistence, and managemen…" (#2060)

2026-03-22 19:59:42 +04:00

Dockerfile

Revert "feat: seed orchestrator with auto-seeding, persistence, and managemen…" (#2060)

2026-03-22 19:59:42 +04:00

Dockerfile.digest-notifications

feat(brief): route whyMatters through internal analyst-context endpoint (#3248)

2026-04-21 14:03:27 +04:00

Dockerfile.relay

fix(relay): COPY missing _seed-envelope-source + _seed-contract — chokepointFlows stale 32h (#3132)

2026-04-16 17:28:16 +04:00

Dockerfile.seed-bundle-portwatch-port-activity

feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)

2026-04-20 15:21:43 +04:00

Dockerfile.seed-bundle-resilience-validation

fix(resilience): ship full scripts/ tree in validation Docker image (#3054)

2026-04-13 15:43:55 +04:00

index.html

fix(csp): allow Stripe 3D Secure frames + consolidate Dodo CSP entries (#2806)

2026-04-07 23:47:27 +04:00

LICENSE

chore: switch license to AGPL-3.0, externalize Sentry DSN

2026-02-19 07:24:47 +04:00

live-channels.html

feat(live): custom channel management with review fixes (#282)

2026-02-23 22:51:44 +00:00

Makefile

feat(advisories): gold standard migration for security advisories (#1637)

2026-03-15 11:54:08 +04:00

middleware.ts

fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)

2026-04-21 19:41:58 +04:00

nixpacks.toml

fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)

2026-04-17 08:35:20 +04:00

package-lock.json

fix(deps): promote yaml from transitive peer to top-level dependency (#3333)

2026-04-23 11:22:54 +04:00

package.json

chore(lint): exclude docs/brainstorms and docs/ideation from lint:md (#3332)

2026-04-23 11:25:14 +04:00

playwright.config.ts

test: add coverage for finance/trending/reload and stabilize map harness

2026-02-17 19:22:55 +04:00

README.md

docs(marketing): bump source-count claims from 435+ to 500+ (#3241)

2026-04-20 22:39:42 +04:00

SECURITY.md

security: harden IPC, gate DevTools, isolate external windows, exempt /api/version (#348)

2026-02-25 06:14:16 +00:00

SELF_HOSTING.md

feat: self-hosted Docker stack (#1521)

2026-03-19 12:07:20 +04:00

settings.html

docs: expand AGPL-3.0 license section in README (#1143)

2026-03-06 23:47:04 +04:00

tsconfig.api.json

fix(resilience): satisfy release gate validation (#2686)

2026-04-04 19:31:02 +04:00

tsconfig.json

Add missing layers to DeckGLMap for feature parity with D3 Map

2026-01-25 07:21:33 +00:00

vercel.json

chore(vercel): RFC 9116 canonical /security.txt → /.well-known/security.txt 301 (#3326)

2026-04-23 09:13:40 +04:00

vite.config.ts

chore(api): enforce sebuf contract + migrate drifting endpoints (#3207) (#3242)

2026-04-22 09:55:59 +03:00

vitest.config.mts

feat: Dodo Payments integration + entitlement engine & webhook pipeline (#2024)

2026-04-03 00:25:18 +04:00

README.md

World Monitor

Real-time global intelligence dashboard — AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface.

Documentation · Releases · Contributing

What It Does

500+ curated news feeds across 15 categories, AI-synthesized into briefs
Dual map engine — 3D globe (globe.gl) and WebGL flat map (deck.gl) with 45 data layers
Cross-stream correlation — military, economic, disaster, and escalation signal convergence
Country Intelligence Index — composite risk scoring across 12 signal categories
Finance radar — 92 stock exchanges, commodities, crypto, and 7-signal market composite
Local AI — run everything with Ollama, no API keys required
5 site variants from a single codebase (world, tech, finance, commodity, happy)
Native desktop app (Tauri 2) for macOS, Windows, and Linux
21 languages with native-language feeds and RTL support

For the full feature list, architecture, data sources, and algorithms, see the documentation.

Quick Start

git clone https://github.com/koala73/worldmonitor.git
cd worldmonitor
npm install
npm run dev

Open localhost:5173. No environment variables required for basic operation.

For variant-specific development:

npm run dev:tech       # tech.worldmonitor.app
npm run dev:finance    # finance.worldmonitor.app
npm run dev:commodity  # commodity.worldmonitor.app
npm run dev:happy      # happy.worldmonitor.app

See the self-hosting guide for deployment options (Vercel, Docker, static).

Tech Stack

Category	Technologies
Frontend	Vanilla TypeScript, Vite, globe.gl + Three.js, deck.gl + MapLibre GL
Desktop	Tauri 2 (Rust) with Node.js sidecar
AI/ML	Ollama / Groq / OpenRouter, Transformers.js (browser-side)
API Contracts	Protocol Buffers (92 protos, 22 services), sebuf HTTP annotations
Deployment	Vercel Edge Functions (60+), Railway relay, Tauri, PWA
Caching	Redis (Upstash), 3-tier cache, CDN, service worker

Full stack details in the architecture docs.

Flight Data

Flight data graciously provided by Wingbits, the most advanced ADS-B flight data solution.

Data Sources

WorldMonitor aggregates 65+ external data sources across geopolitics, finance, energy, climate, aviation, cyber, military, infrastructure, and news intelligence. See the full data sources catalog for providers, feed tiers, and collection methods.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

npm run typecheck        # Type checking
npm run build:full       # Production build

License

AGPL-3.0 for non-commercial use. Commercial license required for any commercial use.

Use Case	Allowed?
Personal / research / educational	Yes
Self-hosted (non-commercial)	Yes, with attribution
Fork and modify (non-commercial)	Yes, share source under AGPL-3.0
Commercial use / SaaS / rebranding	Requires commercial license

See LICENSE for full terms. For commercial licensing, contact the maintainer.

Author

Elie Habib — GitHub


Security Acknowledgments

We thank the following researchers for responsibly disclosing security issues:

Cody Richard — Disclosed three security findings covering IPC command exposure, renderer-to-sidecar trust boundary analysis, and fetch patch credential injection architecture (2026)

See our Security Policy for responsible disclosure guidelines.

worldmonitor.app · docs.worldmonitor.app · finance.worldmonitor.app · commodity.worldmonitor.app


Languages

TypeScript 49.1%

JavaScript 47%

CSS 2.9%

HTML 0.4%

Rust 0.3%

Other 0.1%