mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Elie Habib 90f4ac0f78 feat(consumer-prices): strict search-hit validator (shadow mode) (#3101 )

* feat(consumer-prices): add 'candidate' match state + negativeTokens schema

Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
  'candidate' so weak search hits can be persisted without entering
  aggregates (aggregate.ts + snapshots filter on ('auto','approved')
  so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
  reject tokens for obvious class errors (e.g. 'canned' for fresh
  tomatoes). Product-taxonomy splits like plain vs greek yogurt
  belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
  so reviewers can see why a match was downgraded.

* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures

Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:

  - class-error rejects from BasketItem.negativeTokens (whole-token
    match for single words; substring match for hyphenated entries
    like 'plant-based' so 'Plant-Based Yogurt' trips without needing
    token-splitting gymnastics)
  - non-food indicators (seeds, fertilizer, planting) — shared with
    the legacy gate
  - token-overlap ratio over identity tokens (>2 chars, non-packaging)
  - quantity-window conformance against minBaseQty/maxBaseQty
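A minimal sketch of the class-error rule from the first bullet above. Only the whole-token vs. substring distinction comes from this commit message; `tokenize` and `findNegativeHits` are hypothetical names, and the real validator may tokenize differently.

```typescript
// Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(title: string): string[] {
  return title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Returns the negative tokens that trip for this title.
// Single words must equal a whole token; hyphenated entries match as substrings.
function findNegativeHits(title: string, negativeTokens: string[]): string[] {
  const lower = title.toLowerCase();
  const tokens = new Set(tokenize(title));
  return negativeTokens.filter((neg) => {
    const n = neg.toLowerCase();
    return n.includes("-") ? lower.includes(n) : tokens.has(n);
  });
}

// 'plant-based' trips on the raw title even though the tokenizer splits it in two.
console.log(findNegativeHits("Plant-Based Yogurt 500g", ["plant-based"]));
// Whole-token matching keeps 'cane' from firing inside 'sugarcane'.
console.log(findNegativeHits("Sugarcane Juice", ["cane"]));
```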

Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-
clean 0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.
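The weighted sum can be sketched as below. The weights and the threshold are the ones stated above; the mapping of the three size weights (0.35 / 0.2 / 0) to "in window", "unknown" and "out of window" is an assumption, and `scoreHit` and `SizeFit` are hypothetical names.

```typescript
const AUTO_MATCH_THRESHOLD = 0.75; // value stated in this commit message

// Assumed meaning of the three size weights 0.35 / 0.2 / 0.
type SizeFit = "in-window" | "unknown" | "out-of-window";

function scoreHit(overlapRatio: number, sizeFit: SizeFit, classClean: boolean): number {
  const sizeWeight = sizeFit === "in-window" ? 0.35 : sizeFit === "unknown" ? 0.2 : 0;
  const raw = 0.55 * overlapRatio + sizeWeight + (classClean ? 0.1 : 0);
  // Round to three decimals so floating-point noise does not leak into comparisons.
  return Math.round(raw * 1000) / 1000;
}

// Full overlap, size in window, no class error: 0.55 + 0.35 + 0.10 = 1
console.log(scoreHit(1, "in-window", true));
// Half overlap with everything else clean: 0.275 + 0.35 + 0.10 = 0.725, below the threshold
console.log(scoreHit(0.5, "in-window", true) >= AUTO_MATCH_THRESHOLD);
```

Under these weights a hit with a clean class and a conforming size still needs a token-overlap ratio of at least 0.3 / 0.55 ≈ 0.55 to reach the auto threshold.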

Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).

* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match

search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
  substitutionGroup) through discoverTargets → fetchTarget → parseListing
  using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
  Legacy gate remains the hard gate; validator is shadow-only for now —
  when legacy accepts but validator rejects, a [search:shadow-reject]
  line is logged with reasons + score so the rollout diff report can
  inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
  can score the match without re-running the validator.

scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
  validator score from the adapter payload. Hits with ok=true and
  score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
  (including validator.ok=false) writes 'candidate' with evidence_json
  carrying the reasons + signals. Aggregates filter on ('auto','approved')
  so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
  legacy 1.0/auto behavior so this PR is a no-op for non-search paths.
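The scrape-side decision described above can be sketched as a small pure function. The field names on `ValidatorResult` follow the `{ ok, score, reasons, signals }` shape mentioned earlier; `decideMatch` and `MatchDecision` are hypothetical names, not the actual scrape.ts code.

```typescript
const AUTO_MATCH_THRESHOLD = 0.75; // value stated in this commit message

interface ValidatorResult {
  ok: boolean;
  score: number;
  reasons: string[];
  signals: Record<string, unknown>;
}

interface MatchDecision {
  status: "auto" | "candidate";
  matchScore: number;
  evidence?: { reasons: string[]; signals: Record<string, unknown> };
}

function decideMatch(validator?: ValidatorResult): MatchDecision {
  // Adapters without a validator keep the legacy 1.0/auto behavior.
  if (!validator) return { status: "auto", matchScore: 1.0 };

  if (validator.ok && validator.score >= AUTO_MATCH_THRESHOLD) {
    return { status: "auto", matchScore: validator.score };
  }
  // Everything else, including ok=false, is stored as a candidate with its evidence.
  return {
    status: "candidate",
    matchScore: validator.score,
    evidence: { reasons: validator.reasons, signals: validator.signals },
  };
}

console.log(decideMatch({ ok: true, score: 0.725, reasons: [], signals: {} }).status);
```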

* feat(consumer-prices): populate negativeTokens for 6 known-bad groups

* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects

Addresses PR #3101 review:

1. Pinned direct hits bypassed the validator downgrade — the new
   auto-vs-candidate decision only ran inside the !wasDirectHit block,
   so a pin that drifted onto the wrong product (the steady-state
   common path) would still flow poisoned prices into aggregates
   through the existing 'auto' match. Now: before inserting an
   observation, if the direct hit's validator.ok === false, skip the
   observation and route the target through handlePinError so the pin
   soft-disables after 3 strikes. Legacy isTitlePlausible continues to
   gate the pin extraction itself.

2. 'cane' was a hard reject for sugar_white across all 10 baskets but
   'white cane sugar' is a legitimate SKU descriptor — would have
   downgraded real products to candidate and dropped coverage. Removed
   from every essentials_*.yaml sugar_white negativeTokens list.
   Added a regression test that locks in 'Silver Spoon White Cane
   Sugar 1kg' as a must-pass positive case.

* fix(consumer-prices): strip size tokens from identity + protect approved rows

Addresses PR #3101 round 2 review:

1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
   tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
   ["1","kg"] — both below the length>2 floor — so the compact "1kg"
   token could never match. Short canonical names like "Onions 1kg"
   lost 0.5 token overlap and legitimate hits landed at score 0.725 <
   AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
   fidelity is already enforced by the quantity-window check; identity
   tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
   locks in "Fresh Red Onions 1 kg" as a must-pass case.

2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
   A re-scrape whose validator scored an already-approved URL below
   0.75 would silently demote human-curated 'approved' rows to
   'candidate'. Added a CASE guard so approved stays approved; every
   other state follows the new validator verdict.
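The size-token fix from item 1 can be illustrated with a simplified tokenizer. The regex and the length>2 floor are the ones quoted above; `identityTokens` and `overlapRatio` are hypothetical names, and the packaging-token filter of the real validator is left out.

```typescript
// Regex quoted in the commit message: compact size tokens such as "1kg" or "250ml".
const SIZE_TOKEN = /^\d+(?:\.\d+)?[a-z]+$/;

// Identity tokens: longer than 2 characters and not a compact size token.
function identityTokens(name: string): string[] {
  return name
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 2 && !SIZE_TOKEN.test(t));
}

// Share of the canonical name's identity tokens that also appear in the title.
function overlapRatio(canonical: string, title: string): number {
  const wanted = identityTokens(canonical);
  if (wanted.length === 0) return 0;
  const seen = new Set(identityTokens(title));
  return wanted.filter((t) => seen.has(t)).length / wanted.length;
}

// "1kg" no longer counts as an identity token, so the spaced "1 kg" costs nothing.
console.log(identityTokens("Onions 1kg"));
console.log(overlapRatio("Onions 1kg", "Fresh Red Onions 1 kg"));
```

With "1kg" still counted, the same pair would match only one of two identity tokens, which is the 0.5 overlap and 0.725 score described in item 1.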

* fix(consumer-prices): widen curated-state guard to review + rejected

PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.

Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
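The widened guard lives in the SQL upsert; the TypeScript below only mirrors its decision logic so the rule is easy to read. `resolveStatus` and `CURATED` are hypothetical names, and the pin_disabled_at handling is not modeled.

```typescript
type MatchStatus = "auto" | "candidate" | "approved" | "review" | "rejected";

// States set by humans or by the review workflow; a re-scrape must not overwrite them.
const CURATED = new Set<MatchStatus>(["approved", "review", "rejected"]);

// existing = status already stored for the row (null for a first insert),
// incoming = verdict derived from the fresh validator result.
function resolveStatus(existing: MatchStatus | null, incoming: "auto" | "candidate"): MatchStatus {
  return existing !== null && CURATED.has(existing) ? existing : incoming;
}

console.log(resolveStatus("rejected", "auto"));  // curated state wins
console.log(resolveStatus("auto", "candidate")); // machine state follows the validator
```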

* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.
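A sketch of the reconciliation table in item 2, assuming the caller has already computed the trailing-year payment count, the most recent inter-payment gap, and the interval classifier's estimate. `reconcilePaymentsPerYear` and its parameter names are hypothetical; the null-gap fallback follows the earlier note that the count is used when fewer than 2 entries exist.

```typescript
function reconcilePaymentsPerYear(
  recentCount: number,        // payments inside the trailing year
  lastGapDays: number | null, // most recent inter-payment gap, null with fewer than 2 payments
  intervalEstimate: number,   // payments per year according to the interval classifier
): number {
  if (recentCount === 0) return 0;               // suspended: empty frequency
  if (recentCount >= 3) return intervalEstimate; // robust to calendar drift
  // recent 1..2: decide between a real slowdown and calendar drift.
  if (lastGapDays === null || lastGapDays > 180) return recentCount;
  return intervalEstimate;
}

// 12 historical quarterly payments plus one payment a year later: Annual.
console.log(reconcilePaymentsPerYear(1, 365, 4));
// Quarterly payer whose oldest trailing-year payment slipped just outside the window.
console.log(reconcilePaymentsPerYear(2, 91, 4));
```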

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Even two payments in the trailing year give 1 gap, which suffices for the
median (gap count >=1, median well-defined), and this branch always has at
least 3 payments. The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
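The difference between the upper-middle pick and a true median is easiest to see on a two-gap sample. This is a generic median sketch with hypothetical names (`median`, `upperMiddle`), not the code from the analyze-stock handler.

```typescript
// Previous behavior as described above: picks the upper-middle element for even lengths.
function upperMiddle(gaps: number[]): number {
  const sorted = gaps.slice().sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// True median: average of the two middle values for even lengths.
function median(gaps: number[]): number {
  if (gaps.length === 0) return NaN;
  const sorted = gaps.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Two gaps (in days) from a sparse trailing-year sample.
const gaps = [30, 91];
console.log(upperMiddle(gaps)); // 91: biased toward the slower cadence
console.log(median(gaps));      // 60.5
```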

2026-04-15 14:28:18 +04:00

.github

fix(readme): replace broken docs link with relative path (#2639 )

2026-04-03 09:21:38 +04:00

.husky

feat(supply-chain): Sprint C — scenario engine (templates, job API, Railway worker, map activation) (#2890 )

2026-04-10 14:44:14 +04:00

api

feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097 )

2026-04-15 09:16:27 +04:00

blog-site

feat(seo): BlogPosting schema, FAQPage JSON-LD, extensible author system (#2284 )

2026-03-26 12:48:56 +04:00

consumer-prices-core

feat(consumer-prices): strict search-hit validator (shadow mode) (#3101 )

2026-04-15 14:28:18 +04:00

convex

fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061 ) (#3066 )

2026-04-14 13:14:53 +04:00

data

feat(oref): Tzeva Adom as primary alert source + Hebrew translation dictionaries (#2863 )

2026-04-09 12:39:34 +04:00

deploy/nginx

Add Brotli-first API compression for sidecar and nginx

2026-02-20 08:41:22 +04:00

docker

fix(csp): allow Dodo payment frames + Google Pay permission (#2789 )

2026-04-07 20:26:50 +04:00

docs

feat(market): Hyperliquid perp positioning flow as leading indicator (#3074 )

2026-04-14 08:05:40 +04:00

e2e

feat(auth): integrate clerk.dev (#1812 )

2026-03-26 13:47:22 +02:00

plans

perf(map): lazy supercluster init, memoize filterByTime, lazy static layers (#1985 )

2026-03-21 15:32:51 +04:00

pro-test

feat(email): add deliverability guards to reduce waitlist bounces (#2819 )

2026-04-08 11:21:40 +04:00

proto

feat(market): Hyperliquid perp positioning flow as leading indicator (#3074 )

2026-04-14 08:05:40 +04:00

public

fix(pro): bake Clerk publishable key into /pro static bundle (#2800 )

2026-04-07 22:28:29 +04:00

scripts

feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097 )

2026-04-15 09:16:27 +04:00

server

fix(analyze-stock): classify dividend frequency by median gap (#3102 )

2026-04-15 14:00:57 +04:00

shared

fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test (#3069 )

2026-04-13 21:53:21 +04:00

src

feat(analytics): send sub status & planKey with Umami identity (#3093 )

2026-04-14 21:33:12 +04:00

src-tauri

chore: bump version to 2.6.7 (#2637 )

2026-04-03 07:37:45 +04:00

tests

fix(analyze-stock): classify dividend frequency by median gap (#3102 )

2026-04-15 14:00:57 +04:00

todos

Phase 0: Regional Intelligence snapshot writer foundation (#2940 )

2026-04-11 17:55:39 +04:00

.dockerignore

feat: self-hosted Docker stack (#1521 )

2026-03-19 12:07:20 +04:00

.env.example

feat: " climate disasters alerts seeders " (#2550 )

2026-04-04 23:18:53 +04:00

.gitignore

fix(catalog): update prices to match Dodo catalog via API (#2678 )

2026-04-04 15:05:00 +04:00

.markdownlint-cli2.jsonc

refactor(sanctions): simplify handler to Redis-read-only, fix seed OOM risk (#1753 )

2026-03-17 12:20:10 +04:00

.npmrc

chore: suppress npm deprecation warnings via .npmrc loglevel=error (#1862 )

2026-03-19 09:48:23 +04:00

.nvmrc

fix(dx): add node_modules guard to pre-push hook and pin Node 22 (#1368 )

2026-03-10 08:27:44 +04:00

.vercelignore

feat: Arabic font support and HLS live streaming UI (#1020 )

2026-03-05 10:16:43 +04:00

AGENTS.md

feat: harness engineering P0 - linting, testing, architecture docs (#1587 )

2026-03-14 21:29:21 +04:00

ARCHITECTURE.md

feat: harness engineering P0 - linting, testing, architecture docs (#1587 )

2026-03-14 21:29:21 +04:00

biome.json

feat(supply-chain): Global Shipping Intelligence — Sprint 0 + Sprint 1 (#2870 )

2026-04-09 17:06:03 +04:00

CHANGELOG.md

feat(route-explorer): Sprint 6 — free-tier blur + analytics + docs (#3000 )

2026-04-12 11:31:20 +04:00

CODE_OF_CONDUCT.md

docs: add community guidelines (contributing, code of conduct, security) (#226 )

2026-02-22 16:26:13 +02:00

compound-engineering.local.md

feat(finance-panels): add 7 macro/market panels + Daily Brief context (issues #2245-#2253) (#2258 )

2026-03-26 08:03:09 +04:00

CONTRIBUTING.md

fix: add India boundary override from Natural Earth 50m data (#1796 )

2026-03-19 02:40:05 +04:00

DEPLOYMENT-PLAN.md

feat(auth): integrate clerk.dev (#1812 )

2026-03-26 13:47:22 +02:00

docker-compose.yml

Revert "feat: seed orchestrator with auto-seeding, persistence, and managemen…" (#2060 )

2026-03-22 19:59:42 +04:00

Dockerfile

Revert "feat: seed orchestrator with auto-seeding, persistence, and managemen…" (#2060 )

2026-03-22 19:59:42 +04:00

Dockerfile.relay

feat(energy): Ember monthly electricity seed (V5-6a) (#2815 )

2026-04-08 12:25:54 +04:00

Dockerfile.seed-bundle-resilience-validation

fix(resilience): ship full scripts/ tree in validation Docker image (#3054 )

2026-04-13 15:43:55 +04:00

index.html

fix(csp): allow Stripe 3D Secure frames + consolidate Dodo CSP entries (#2806 )

2026-04-07 23:47:27 +04:00

LICENSE

chore: switch license to AGPL-3.0, externalize Sentry DSN

2026-02-19 07:24:47 +04:00

live-channels.html

feat(live): custom channel management with review fixes (#282 )

2026-02-23 22:51:44 +00:00

Makefile

feat(advisories): gold standard migration for security advisories (#1637 )

2026-03-15 11:54:08 +04:00

middleware.ts

fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099 )

2026-04-15 09:40:35 +04:00

nixpacks.toml

fix(relay): add explicit start command to nixpacks.toml (#2403 )

2026-03-28 08:58:05 +04:00

package-lock.json

fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076 )

2026-04-14 08:19:47 +04:00

package.json

fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076 )

2026-04-14 08:19:47 +04:00

playwright.config.ts

test: add coverage for finance/trending/reload and stabilize map harness

2026-02-17 19:22:55 +04:00

README.md

docs: Update README.md with corrected links to documentation (#2551 )

2026-04-06 16:00:19 +04:00

SECURITY.md

security: harden IPC, gate DevTools, isolate external windows, exempt /api/version (#348 )

2026-02-25 06:14:16 +00:00

SELF_HOSTING.md

feat: self-hosted Docker stack (#1521 )

2026-03-19 12:07:20 +04:00

settings.html

docs: expand AGPL-3.0 license section in README (#1143 )

2026-03-06 23:47:04 +04:00

tsconfig.api.json

fix(resilience): satisfy release gate validation (#2686 )

2026-04-04 19:31:02 +04:00

tsconfig.json

Add missing layers to DeckGLMap for feature parity with D3 Map

2026-01-25 07:21:33 +00:00

vercel.json

fix(csp): add Stripe to payment Permissions-Policy (#2807 )

2026-04-08 00:33:51 +04:00

vite.config.ts

feat(resilience): add service proto and stub handlers (#2657 )

2026-04-04 08:04:46 +04:00

vitest.config.mts

feat: Dodo Payments integration + entitlement engine & webhook pipeline (#2024 )

2026-04-03 00:25:18 +04:00

README.md

World Monitor

Real-time global intelligence dashboard — AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface.

Documentation · Releases · Contributing

What It Does

435+ curated news feeds across 15 categories, AI-synthesized into briefs
Dual map engine — 3D globe (globe.gl) and WebGL flat map (deck.gl) with 45 data layers
Cross-stream correlation — military, economic, disaster, and escalation signal convergence
Country Intelligence Index — composite risk scoring across 12 signal categories
Finance radar — 92 stock exchanges, commodities, crypto, and 7-signal market composite
Local AI — run everything with Ollama, no API keys required
5 site variants from a single codebase (world, tech, finance, commodity, happy)
Native desktop app (Tauri 2) for macOS, Windows, and Linux
21 languages with native-language feeds and RTL support

For the full feature list, architecture, data sources, and algorithms, see the documentation.

Quick Start

git clone https://github.com/koala73/worldmonitor.git
cd worldmonitor
npm install
npm run dev

Open localhost:5173. No environment variables required for basic operation.

For variant-specific development:

npm run dev:tech       # tech.worldmonitor.app
npm run dev:finance    # finance.worldmonitor.app
npm run dev:commodity  # commodity.worldmonitor.app
npm run dev:happy      # happy.worldmonitor.app

See the self-hosting guide for deployment options (Vercel, Docker, static).

Tech Stack

Category	Technologies
Frontend	Vanilla TypeScript, Vite, globe.gl + Three.js, deck.gl + MapLibre GL
Desktop	Tauri 2 (Rust) with Node.js sidecar
AI/ML	Ollama / Groq / OpenRouter, Transformers.js (browser-side)
API Contracts	Protocol Buffers (92 protos, 22 services), sebuf HTTP annotations
Deployment	Vercel Edge Functions (60+), Railway relay, Tauri, PWA
Caching	Redis (Upstash), 3-tier cache, CDN, service worker

Full stack details in the architecture docs.

Flight Data

Flight data provided graciously by Wingbits, the most advanced ADS-B flight data solution.

Data Sources

WorldMonitor aggregates 65+ external data sources across geopolitics, finance, energy, climate, aviation, cyber, military, infrastructure, and news intelligence. See the full data sources catalog for providers, feed tiers, and collection methods.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

npm run typecheck        # Type checking
npm run build:full       # Production build

License

AGPL-3.0 for non-commercial use. Commercial license required for any commercial use.

Use Case	Allowed?
Personal / research / educational	Yes
Self-hosted (non-commercial)	Yes, with attribution
Fork and modify (non-commercial)	Yes, share source under AGPL-3.0
Commercial use / SaaS / rebranding	Requires commercial license

See LICENSE for full terms. For commercial licensing, contact the maintainer.

Author

Elie Habib — GitHub

Security Acknowledgments

We thank the following researchers for responsibly disclosing security issues:

Cody Richard — Disclosed three security findings covering IPC command exposure, renderer-to-sidecar trust boundary analysis, and fetch patch credential injection architecture (2026)

See our Security Policy for responsible disclosure guidelines.

worldmonitor.app · docs.worldmonitor.app · finance.worldmonitor.app · commodity.worldmonitor.app

Languages

TypeScript 49.1% · JavaScript 47% · CSS 2.9% · HTML 0.4% · Rust 0.3% · Other 0.1%