36 Commits

Elie Habib
d9194a5179 fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)
Ubuntu's noble-security package-index CDN is returning hash-sum
mismatches (2026-04-17), causing ALL Railway NIXPACKS builds to fail
at the 'apt-get update && apt-get install curl' layer with exit
code 100. Multiple Railway services are red.

NIXPACKS' aptPkgs = ['curl'] generates a strict
'apt-get update && apt-get install -y' that fails hard on any
mirror error. Fix: replace aptPkgs with manual cmds that:
  1. Allow apt-get update to partially fail (|| true)
  2. Use --fix-missing on apt-get install so packages from healthy
     mirrors still install even if one mirror is broken

Same treatment for consumer-prices-core/Dockerfile.

Files changed:
- nixpacks.toml (root — used by ais-relay + standalone cron seeders)
- scripts/nixpacks.toml (used by bundled seed services)
- consumer-prices-core/Dockerfile

The || true on apt-get update is safe because:
  1. curl is the only package we install and it's often already present
     in the NIXPACKS base image (nix-env provides it)
  2. If curl genuinely isn't available, the seeder will fail at runtime
     with a clear 'curl: not found' error — not a silent degradation
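The manual cmds described above might look like this in nixpacks.toml — a sketch only; the phase name and the exact command list are assumptions, the commit specifies only curl, `|| true`, and `--fix-missing`:

```toml
# Sketch: tolerate a broken mirror instead of failing the whole build.
[phases.setup]
cmds = [
  # Partial index failures (hash-sum mismatch on one mirror) are tolerated:
  "apt-get update || true",
  # --fix-missing installs from whichever mirrors are still healthy:
  "apt-get install -y --fix-missing curl",
]
```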
2026-04-17 08:35:20 +04:00
Elie Habib
90f4ac0f78 feat(consumer-prices): strict search-hit validator (shadow mode) (#3101)
* feat(consumer-prices): add 'candidate' match state + negativeTokens schema

Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
  'candidate' so weak search hits can be persisted without entering
  aggregates (aggregate.ts + snapshots filter on ('auto','approved')
  so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
  reject tokens for obvious class errors (e.g. 'canned' for fresh
  tomatoes). Product-taxonomy splits like plain vs greek yogurt
  belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
  so reviewers can see why a match was downgraded.

* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures

Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:

  - class-error rejects from BasketItem.negativeTokens (whole-token
    match for single words; substring match for hyphenated entries
    like 'plant-based' so 'Plant-Based Yogurt' trips without needing
    token-splitting gymnastics)
  - non-food indicators (seeds, fertilizer, planting) — shared with
    the legacy gate
  - token-overlap ratio over identity tokens (>2 chars, non-packaging)
  - quantity-window conformance against minBaseQty/maxBaseQty

Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-clean
0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.

Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).
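As a rough sketch of the weighted sum: the weights and AUTO_MATCH_THRESHOLD come from the commit text, but the signal encoding and names below are assumptions, not the real helper:

```typescript
// Hypothetical encoding of the validator's signals; only the weights
// and AUTO_MATCH_THRESHOLD are taken from the commit message.
interface Signals {
  overlapRatio: number;        // identity-token overlap, 0..1
  sizeConformance: 0 | 1 | 2;  // 0 = outside window, 1 = near, 2 = within (assumed)
  classClean: boolean;         // no negativeTokens / non-food indicator hit
}

const AUTO_MATCH_THRESHOLD = 0.75;
const SIZE_WEIGHTS = [0, 0.2, 0.35] as const;

function scoreSearchHit(s: Signals): { ok: boolean; score: number } {
  // Class errors are hard rejects; the score still carries diagnostics.
  const score =
    0.55 * s.overlapRatio +
    SIZE_WEIGHTS[s.sizeConformance] +
    (s.classClean ? 0.1 : 0);
  return { ok: s.classClean, score };
}
```

The scrape side then compares score against AUTO_MATCH_THRESHOLD for the auto-vs-candidate decision.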

* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match

search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
  substitutionGroup) through discoverTargets → fetchTarget → parseListing
  using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
  Legacy gate remains the hard gate; validator is shadow-only for now —
  when legacy accepts but validator rejects, a [search:shadow-reject]
  line is logged with reasons + score so the rollout diff report can
  inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
  can score the match without re-running the validator.

scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
  validator score from the adapter payload. Hits with ok=true and
  score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
  (including validator.ok=false) writes 'candidate' with evidence_json
  carrying the reasons + signals. Aggregates filter on ('auto','approved')
  so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
  legacy 1.0/auto behavior so this PR is a no-op for non-search paths.

* feat(consumer-prices): populate negativeTokens for 6 known-bad groups

* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects

Addresses PR #3101 review:

1. Pinned direct hits bypassed the validator downgrade — the new
   auto-vs-candidate decision only ran inside the !wasDirectHit block,
   so a pin that drifted onto the wrong product (the steady-state
   common path) would still flow poisoned prices into aggregates
   through the existing 'auto' match. Now: before inserting an
   observation, if the direct hit's validator.ok === false, skip the
   observation and route the target through handlePinError so the pin
   soft-disables after 3 strikes. Legacy isTitlePlausible continues to
   gate the pin extraction itself.

2. 'cane' was a hard reject for sugar_white across all 10 baskets but
   'white cane sugar' is a legitimate SKU descriptor — would have
   downgraded real products to candidate and dropped coverage. Removed
   from every essentials_*.yaml sugar_white negativeTokens list.
   Added a regression test that locks in 'Silver Spoon White Cane
   Sugar 1kg' as a must-pass positive case.

* fix(consumer-prices): strip size tokens from identity + protect approved rows

Addresses PR #3101 round 2 review:

1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
   tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
   ["1","kg"] — both below the length>2 floor — so the compact "1kg"
   token could never match. Short canonical names like "Onions 1kg"
   lost 0.5 token overlap and legitimate hits landed at score 0.725 <
   AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
   fidelity is already enforced by the quantity-window check; identity
   tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
   locks in "Fresh Red Onions 1 kg" as a must-pass case.

2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
   A re-scrape whose validator scored an already-approved URL below
   0.75 would silently demote human-curated 'approved' rows to
   'candidate'. Added a CASE guard so approved stays approved; every
   other state follows the new validator verdict.
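A minimal sketch of the size-token exclusion described in (1) — only the regex comes from the commit; the tokenisation details around it are assumed:

```typescript
// Compact size tokens ("1kg", "1.5ltr") are dropped from identity tokens;
// size fidelity is enforced separately by the quantity-window check.
const SIZE_TOKEN = /^\d+(?:\.\d+)?[a-z]+$/;

function identityTokens(title: string): string[] {
  return title
    .toLowerCase()
    .split(/[^a-z0-9.]+/)
    .filter((t) => t.length > 2 && !SIZE_TOKEN.test(t));
}
```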

* fix(consumer-prices): widen curated-state guard to review + rejected

PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.

Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
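The widened guard's decision table reduces to something like the following TypeScript sketch (the real guard is a SQL CASE inside upsertProductMatch; names here are illustrative):

```typescript
// Statuses set by humans or validate.ts are immutable on re-scrape.
const CURATED_STATES = new Set(["approved", "review", "rejected"]);

function nextMatchStatus(
  existing: string | null,
  incoming: "auto" | "candidate",
): string {
  // New rows and machine-written states follow the fresh validator verdict.
  if (existing !== null && CURATED_STATES.has(existing)) return existing;
  return incoming;
}
```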

* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
depending on the calendar date for exactly this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.
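The three-branch reconciliation above, sketched in TypeScript (function name and the numeric payments-per-year encoding are assumptions):

```typescript
function reconcilePaymentsPerYear(
  recentCount: number,   // payments inside the trailing 365.25d window
  lastGapDays: number,   // most-recent inter-payment gap
  fromInterval: number,  // median-gap classifier result
): number {
  if (recentCount === 0) return 0;           // suspended → empty frequency
  if (recentCount >= 3) return fromInterval; // robust to calendar drift
  // 1..2 recent payments: distinguish a real slowdown from window drift.
  return lastGapDays > 180 ? recentCount : fromInterval;
}
```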

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
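Putting these commits together, the classifier core might look like this sketch — the true-median rule and the ~91d quarterly gap come from the commits; the other day thresholds are illustrative assumptions:

```typescript
const DAY_MS = 86_400_000;

// True median: average the two middles for even-length arrays.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function paymentsPerYearFromInterval(timestampsMs: number[]): number {
  if (timestampsMs.length < 2) return 0; // caller falls back to raw count
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const gapsDays = sorted.slice(1).map((t, i) => (t - sorted[i]) / DAY_MS);
  const m = median(gapsDays);
  if (m <= 45) return 12;  // monthly
  if (m <= 135) return 4;  // quarterly ≈ 91d median gap
  if (m <= 270) return 2;  // semi-annual
  return 1;                // annual
}
```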
2026-04-15 14:28:18 +04:00
Elie Habib
044598346e feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated

Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
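The envelope and its state machine, sketched — the _seed/data/schemaVersion/recordCount/state fields are named in this PR's commits; everything else (fetchedAt, the option bag) is a guess at the real contract:

```typescript
type SeedState = "OK" | "OK_ZERO" | "RETRY";

interface SeedEnvelope {
  _seed: { schemaVersion: number; recordCount: number; state: SeedState; fetchedAt: string };
  data: unknown;
}

function buildEnvelope(
  data: unknown,
  recordCount: unknown, // result of the seeder's declareRecords()
  opts: { zeroIsValid?: boolean; schemaVersion?: number } = {},
): SeedEnvelope {
  // Contract violation: declareRecords must return an integer.
  if (typeof recordCount !== "number" || !Number.isInteger(recordCount)) {
    throw new Error("declareRecords must return an integer");
  }
  const state: SeedState =
    recordCount > 0 ? "OK" : opts.zeroIsValid ? "OK_ZERO" : "RETRY";
  return {
    _seed: {
      schemaVersion: opts.schemaVersion ?? 1,
      recordCount,
      state,
      fetchedAt: new Date().toISOString(),
    },
    data,
  };
}
```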

Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).

Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.

Tests: 61/61 PR 1 tests still pass.

Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration

Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md

* fix(seed-contract): unwrap envelopes in internal cross-seed readers

After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.

Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
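The pass-through contract quoted above reduces to a sketch like this (the real unwrapEnvelope may carry more metadata):

```typescript
// Envelope shapes are unwrapped; any legacy bare shape passes through
// unchanged with _seed: null.
function unwrapEnvelope(raw: unknown): { _seed: unknown; data: unknown } {
  if (raw !== null && typeof raw === "object" && "_seed" in raw && "data" in raw) {
    const { _seed, data } = raw as { _seed: unknown; data: unknown };
    return { _seed, data };
  }
  return { _seed: null, data: raw };
}
```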

Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
  verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
  cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
  patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.

Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
  + legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).

Addresses both of user's P1 findings on PR #3097.

* feat(seed-contract): envelope-aware reads in server + api helpers

Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).

Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
  unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
  tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
  clients never see envelope metadata).

Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
  remain bare-shape during dual-write. unwrapEnvelope already imported at
  the meta-read boundary (PR 1) as a defensive no-op.

Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.

This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.

* fix(test): strip export keyword in vm.runInContext'd seed source

cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.

Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.

Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.

* feat(seed-contract): consumer-prices publish.ts writes envelopes

Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.

Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).

Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).

Typecheck: no new errors in publish.ts.

* fix(seed-contract): 3 more server-side readers unwrap envelopes

Found during final audit:

- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
  parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
  now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
  interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
  count-source keys (wildfire:fires:v1, news:insights:v1) which are both
  contract-mode now.

All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.

Typecheck clean.

* feat(seed-contract): ais-relay.cjs direct writes produce envelopes

32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.

Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.

Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.

* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate

54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.

Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
  health (2), imf-extended (4), macro (10), market-backup (9),
  portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)

Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.

seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.

All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.

* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks

Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:

1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
   scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
   any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
   writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
   seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
   which broke health.js parsing the value as bare {fetchedAt, recordCount}.
   Also defends against any future manual writeExtraKey(..., envelopeMeta)
   call that happens to target a seed-meta:* key.

2. seed-token-panels canonical + extras fixed.
   publishTransform returns data.defi (the defi panel itself, shape {tokens}).
   Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
   on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
   never wrote, and because runSeed returned before the extraKeys loop,
   market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
   New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
   OTHER_KEY extras reuse the same function (transforms return structurally
   identical panels). Added isMain guard so test imports don't fire runSeed.

3. api/product-catalog.js cached reader unwraps envelope.
   ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
   edge reader did raw JSON.parse(result) and returned {_seed, data} to
   clients, breaking the cached path. Fix: import unwrapEnvelope from
   ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
   downstream of getFromCache(), so the single reader fix covers both.

4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
   - shouldEnvelopeKey invariant: seed-meta:* false, canonical true
   - Token-panels declareRecords works on transformed shape (canonical + both extras)
   - Explicit repro of pre-fix buggy signature returning 0 — guards against revert
   - resolveRecordCount accepts 0, rejects non-integer
   - Product-catalog envelope unwrap returns bare shape; legacy passes through
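The helper-level invariant in (1) is small enough to quote as a sketch — the seed-meta: prefix rule is the only behavior the commit describes:

```typescript
// seed-meta:* keys are health-endpoint metadata and must stay bare-shape;
// every other canonical key gets the {_seed, data} envelope.
function shouldEnvelopeKey(key: string): boolean {
  return !key.startsWith("seed-meta:");
}
```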

Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script

iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.

* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape

Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.

Fix: validate() now checks the canonical defi panel directly
(Array.isArray(data?.tokens) && has at least one t.price > 0). AI/OTHER
panels are validated implicitly by their own extraKey declareRecords on write.

Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.

Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)

Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean

* feat(seed-contract): add /api/seed-contract-probe validation endpoint

Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.

What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
  and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
  guard against token-panels RETRY regression, product-catalog, wildfire,
  earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
  against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.

Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).

Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).

Usage:
  curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
       https://api.worldmonitor.app/api/seed-contract-probe

PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.

Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean

* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header

Review P2 findings: the probe's stated guards were weaker than advertised.

1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
   token-panels extra-key RETRY regression but only checked shape='envelope'
   + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
   probes would still pass because checkProbe() only inspects _seed.recordCount
   when minRecords is set. Now both enforce minRecords: 1.

2. /api/product-catalog boundary check only asserted no '_seed' leak — which
   is also true for the static fallback path. A broken cached reader
   (getFromCache returning null or throwing) could serve fallback silently
   and still pass this probe. Now:
   - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
     on the response (the json() helper gained an optional source param wired
     to each of the three branches).
   - checkPublicBoundary declaratively requires that header's value match
     'cache' for /api/product-catalog, so a fallback-serve fails the probe
     with reason 'source:fallback!=cache' or 'source:missing!=cache'.

Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
  requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
  mismatch, missing header fails, seed-leak still takes precedence, bad
  status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.

Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
2026-04-15 09:16:27 +04:00
Elie Habib
224d6fa2e3 fix(consumer-prices): count risers+fallers in movers recordCount (#3098)
* fix(consumer-prices): count risers+fallers in movers recordCount

Health endpoint reported consumerPricesMovers as EMPTY_DATA whenever the
30d window had zero risers, because recordCount's `??` chain in publish.ts
picks only one sibling array. Bipolar payloads (risers[] + fallers[]) need
the sum; otherwise a valid all-fallers payload registers as 0 records and
trips false staleness alarms.

Fix both the authoritative publish job and the manual fallback seed script.

* fix(consumer-prices): floor movers recordCount at 1 + include essentialsSeries fallback

Addresses PR #3098 review:

1. All-flat markets (every sampled item unchanged) legitimately produce
   risers=[] AND fallers=[] from buildMoversSnapshot. Summing the two still
   yields 0 → health reports EMPTY_DATA for a valid snapshot. Floor at 1;
   advanceSeedMeta already gates writes on upstream freshness, so this
   can't mask an upstream-unavailable case.

2. Seed script's non-movers fallback was missing essentialsSeries, so
   basket-series payloads from the manual script reported recordCount=1
   instead of the series length. Align with publish.ts.

* fix(consumer-prices): force recordCount=0 for upstreamUnavailable placeholders

Addresses PR #3098 review: flooring movers at 1 in the manual fallback seeder
also floored the synthetic emptyMovers() placeholder (upstreamUnavailable=true)
the script writes when BASE_URL is unset or the upstream returns null. Since
writeExtraKeyWithMeta always persists seed-meta, that made a real outage read
green in api/health.js. Short-circuit upstreamUnavailable payloads to 0 so
the outage surfaces.
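The three fixes in this PR compose into one rule, sketched below (payload field names from the commits; the function shape is assumed):

```typescript
interface MoversPayload {
  risers?: unknown[];
  fallers?: unknown[];
  upstreamUnavailable?: boolean;
}

function moversRecordCount(p: MoversPayload): number {
  if (p.upstreamUnavailable) return 0; // real outage must surface in health
  // Sum both sides; floor at 1 so a valid all-flat snapshot isn't flagged stale.
  return Math.max(1, (p.risers?.length ?? 0) + (p.fallers?.length ?? 0));
}
```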
2026-04-15 09:06:24 +04:00
Elie Habib
261d40dfa1 fix(consumer-prices): block restaurant URLs and reject null productName extractions (#2156)
Two data-quality regressions found in the 2026-03-23 scrape:

1. ananinja_sa: Exa returns restaurant menu URLs (e.g. /restaurants/burger-king)
   alongside product pages. isTitlePlausible passes "onion rings" for "Onions 1kg"
   because "onion" is a substring. Add urlPathContains: /product/ to restrict Exa
   results to product pages only, matching the existing pattern used by tamimi/carrefour.

2. search.ts parseListing: when Firecrawl can't extract a productName, we fell back
   to the canonical name as rawTitle. This silently stored unverifiable matches
   (raw_title = canonical_name) as real prices — e.g. Noon matched a chicken product
   as "Eggs Fresh 12 Pack". Reject the result outright when productName is missing.

DB: manually disabled 30 known-bad pins (26 canonical-name=raw-title failures,
2 restaurant URLs, 2 bundle-size mismatches) via pin_disabled_at.
2026-03-23 21:47:53 +04:00
Elie Habib
f6f0fcbd55 feat(consumer-prices): add second retailers for 5 markets + re-enable carrefour_sa (#2157)
Addresses the single-retailer confidence problem: 6 of 8 markets had only 1
active retailer, meaning any scrape failure or bad pin produced undetectable bad data.

Changes:
- carrefour_sa: re-enabled (#2148 OOS prompt fix merged)
- noon_sa (NEW): second SA retailer; Exa finds noon.com/saudi-en/ URLs
  urlPathContains=/saudi-en/ isolates SA from UAE/Egypt storefronts
- coles_au (NEW): second AU retailer alongside Woolworths
- ocado_gb (NEW): second GB retailer alongside Tesco
- jiomart_in (NEW): second IN retailer alongside BigBasket
- coldstorage_sg (NEW): second SG retailer alongside FairPrice
- carrefour_br (NEW): second BR retailer alongside Pão de Açúcar

DB: disabled Walmart organic jasmine rice (wrong type) and Kroger PLU-4093 onion (by-weight produce item, not 3lb bag)
2026-03-23 21:47:17 +04:00
Elie Habib
96f568f934 fix(ananinja-sa): add inStockFromPrice=true — price aggregator, Firecrawl misreads availability (#2147) 2026-03-23 17:13:44 +04:00
Elie Habib
c49f5290d0 fix(consumer-prices): add size-aware extraction to prevent Walmart pack-size mismatches (#2148)
Extend SearchAdapter to embed a size constraint clause in the Firecrawl
prompt when the canonical name contains a parseable size (e.g. "1 gallon",
"5lb"). This guides the LLM to reject wrong-size results instead of picking
the nearest listing regardless of pack count or volume.

Changes:
- size.ts: add gallon/gal/fl to UNIT_MAP; extend PACK_PATTERN to accept
  word-separator "N pack" in addition to "Nx" and "N×"
- search.ts: import parseSize; add exported extractSizeHint() helper;
  add sizeText field to ExtractedProduct; embed size clause in Firecrawl
  prompt; propagate rawSizeText from extracted data instead of hardcoding null
- tests: add gallon/gal/pack tests to size.test.ts; new
  search-extract-size.test.ts for extractSizeHint coverage
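An illustrative version of the widened pack pattern — the real PACK_PATTERN and UNIT_MAP live in size.ts and may differ:

```typescript
// Accepts "6x", "6×", and the word-separated "6 pack" forms.
const PACK_PATTERN = /(\d+)\s*(?:x|×|pack)/i;

function packCount(text: string): number | null {
  const m = PACK_PATTERN.exec(text);
  return m ? Number(m[1]) : null;
}
```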
2026-03-23 17:08:14 +04:00
Elie Habib
c1c016b03c feat(sa): add ananinja_sa (9/12 items) + fix OOS price hallucination in extraction prompt (#2139)
ananinja.com is well-indexed by Exa with real SAR prices (eggs 23, milk 6,
chicken 17 SAR). carrefour_sa and panda_sa added as disabled configs for
future re-evaluation.

Also fixes Firecrawl extraction prompt that instructed it to return a price
"even if out of stock", causing it to grab carousel prices (always 3.95 SAR)
when the main product had no displayed price. Updated prompt now returns null
for OOS items with no price shown.

Tamimi disabled (0/12 after query tweak). carrefour_sa mostly OOS.
panda.sa has no Exa indexing.
2026-03-23 15:21:55 +04:00
Elie Habib
0c8a754f43 fix(tamimi-sa): disable retailer after 0/12 products on 2026-03-23 (#2137)
Firecrawl can't extract tamimimarkets.com even with updated query template.
Soft-disabled with dated comment; re-enable once extraction is unblocked.
2026-03-23 15:07:20 +04:00
Elie Habib
56dba6c159 feat(consumer-prices): product pinning, BigBasket fix, spread threshold, retailer sync (#2136)
* feat(consumer-prices): product pinning, BigBasket fix, spread threshold, retailer sync

Implements scraper stability plan to prevent URL churn between runs:

Product pinning (core fix):
- Migration 007: adds pin_disabled_at, consecutive_out_of_stock, pin_error_count columns
- After first successful Exa+Firecrawl match, reuse the stored URL directly on subsequent
  runs without re-running Exa. Stale pins are soft-disabled (never deleted) after 3x OOS
  or 3x fetch errors, triggering automatic Exa rediscovery on the next run.
- On direct-path failure, falls back to normal Exa flow in the same run.
- Compound map key "basketSlug:canonicalName" prevents collisions in multi-basket markets.

Retailer active-state sync:
- getOrCreateRetailer now writes active=config.enabled on every upsert.
- scrapeAll iterates ALL configs (not just enabled) so disabled retailers get synced to DB.
- Eliminates need for manual SQL hotfixes to set active=false.

Analytics correctness:
- All product_matches reads in aggregate, validate, worldmonitor snapshot now filter
  pin_disabled_at IS NULL so soft-disabled stale matches don't skew indices.
- getBaselinePrices adds missing match_status IN ('auto','approved') guard.

BigBasket IN fix:
- inStockFromPrice: true flag overrides out-of-stock when price > 0.
- BigBasket gates on delivery pincode, not product availability — Firecrawl misread
  the pincode gate as out-of-stock for all 12 basket items.
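The override described above reduces to a small rule — a sketch under assumed names (`resolveInStock` and its parameters are illustrative, not the actual code):

```typescript
// Sketch: inStockFromPrice overrides an extracted "out of stock" verdict
// when a positive price was found (the BigBasket pincode-gate case).
function resolveInStock(
  extractedInStock: boolean,
  price: number | null,
  inStockFromPrice: boolean,
): boolean {
  if (inStockFromPrice && price !== null && price > 0) return true;
  return extractedInStock; // otherwise trust the extraction
}
```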

Spread reliability:
- Minimum 4 common categories required to compute retailer_spread_pct.
- Writes explicit 0 when below threshold to prevent stale noisy value persisting.
- US spread 134.8% from single cooking_oil pair is now suppressed.

Other:
- Tamimi SA: better Exa query template targeting tamimimarkets.com directly.
- Remove KE from frontend MARKETS array (basket data preserved in DB).
- 13 new vitest unit tests covering pinning, inStockFromPrice, host validation.

* fix(scrape): distinguish wasDirectHit from isDirect to close pin-fallback loop

When a pinned target falls back to Exa (fetchTarget sets direct:false in
payload), the old code read isDirect from target.metadata (always true),
causing two bugs:
1. upsertProductMatch was skipped → new Exa URL never pinned
2. Stale-pin counters reset on any pin target → broken pins never disabled

Introduce wasDirectHit = isDirect && rawPayload.direct === true so:
- Stale-pin maintenance (OOS/reset) only fires when pin URL was actually used
- Exa fallback (isDirect && !wasDirectHit) fires handlePinError on old pin
- upsertProductMatch guard uses !wasDirectHit so fallback results get pinned
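The three behaviors can be reduced to a decision table — a sketch with illustrative names shadowing those above (the real scrape.ts wiring differs):

```typescript
// Sketch of the pin-fallback decision. `isDirect` comes from target
// metadata (a pin exists); `payloadDirect` reflects whether the pinned
// URL actually served the result.
interface PinDecision {
  runStaleMaintenance: boolean; // OOS/error counters on the pin
  recordPinError: boolean;      // pin failed, Exa fallback was used
  pinNewUrl: boolean;           // persist the URL via upsertProductMatch
}

function decidePinActions(isDirect: boolean, payloadDirect: boolean): PinDecision {
  const wasDirectHit = isDirect && payloadDirect === true;
  return {
    runStaleMaintenance: wasDirectHit,         // only when pin URL was used
    recordPinError: isDirect && !wasDirectHit, // pin fell back to Exa
    pinNewUrl: !wasDirectHit,                  // fresh Exa result gets pinned
  };
}
```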
2026-03-23 14:19:30 +04:00
Elie Habib
ed12bf5c84 fix(consumer-prices): disable migros_ch, coop_ch, sainsburys_gb (#2113)
All three return 0/12 every run:
- migros_ch, coop_ch: Exa domain bleed (competitor URLs returned)
- sainsburys_gb: Exa returns no pages for all GB grocery queries

Disabled with dated comments explaining the root cause.
2026-03-23 08:41:12 +04:00
Elie Habib
8db8ef3f6e fix(consumer-prices): disable wholefoods_us -- Amazon login wall blocks Firecrawl (#2100)
* fix(consumer-prices): disable wholefoods_us -- Amazon login wall blocks Firecrawl

WFM product pages require Amazon authentication. Without a session cookie,
every page renders an auth modal containing img-login-tomato.png and
img-login-cheese.png. Firecrawl's LLM extracts those image alt-texts as
the product name, causing 11/12 title mismatch failures per run.

Confirmed via direct Firecrawl scrape: markdown shows "Log in with Amazon"
and the tomato/cheese auth prompt images -- no product content rendered.
US basket is well covered by kroger_us (12/12) and walmart_us (11/12).

* fix(consumer-prices): seed 9 new global baskets and their canonical products

Migration 006 seeds baskets for AU, BR, CH, GB, IN, KE, SA, SG, US.
These were added as YAML configs in the global expansion PR but never
seeded into the DB, causing "Basket not found in DB -- run seed first"
failures for all 9 markets in every aggregate run.

Also fixes ON CONFLICT for canonical_products to use the partial index
added in migration 003 (canonical_products_name_category_null_idx),
which covers rows where brand_norm/variant_norm/size_value/size_unit
are all NULL. Shared canonical names (Basmati Rice, Sunflower Oil 1L,
White Sugar 1kg, etc.) are reused via ON CONFLICT DO NOTHING.

Note: migration 005 (computed_indices_null_idx) must also be applied
on the production DB -- it was committed but may not have run, causing
the separate "no unique or exclusion constraint" error on essentials-ae.
2026-03-23 08:04:25 +04:00
Elie Habib
47df75daab fix(consumer-prices): parallel scraping, retailer config fixes, disable naivas (#2086)
* fix(consumer-prices): parallel scraping, retailer config fixes, disable naivas

- scrapeAll() now uses 5-worker concurrent pool (was sequential) -- cuts
  28-min run to ~5-6 min since each retailer hits a different domain
- Move initProviders/teardownAll out of scrapeRetailer() to avoid
  registry teardown race when workers share the _providers singleton
- sainsburys_gb: fix baseUrl to groceries.sainsburys.co.uk (Exa returns
  URLs from this subdomain, not www.sainsburys.co.uk); remove wrong path filter
- coop_ch: fix urlPathContains /de/food/ -> /de/lebensmittel/ (German site)
- migros_ch: remove urlPathContains /de/produkt/ (Exa doesn't index these paths)
- naivas_ke: disable (Exa returns 0 matching URLs for www.naivas.online,
  wasting 72 API calls * 6s each per run)
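A 5-worker pool over a shared index is enough for this shape of problem, since each retailer hits a different domain. A generic sketch, not the actual scrapeAll() code:

```typescript
// Generic bounded-concurrency pool: up to `concurrency` lanes drain a
// shared work queue. Illustrative only.
async function runPool<T>(
  items: T[],
  worker: (item: T) => Promise<void>,
  concurrency = 5,
): Promise<void> {
  let next = 0;
  const lane = async () => {
    while (next < items.length) {
      const item = items[next++]; // each lane claims the next unclaimed item
      await worker(item);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, lane),
  );
}
```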

* fix(consumer-prices): harden scrapeRetailer against stuck runs and teardown leaks

P1: move API key validation before createScrapeRun() so a missing key never
leaves a scrape_run row stuck in status='running' forever

P2: wrap single-retailer CLI path in try/finally so teardownAll() is called
even when scrapeRetailer() throws, preventing Playwright process leaks

P2: add comment explaining why initProviders() is kept in scrapeAll() --
GenericPlaywrightAdapter (playwright/p0 adapters) uses fetchWithFallback()
from the registry; search/exa-search bypass it with their own instances

P3: add comment in migros_ch.yaml documenting why urlPathContains was removed
2026-03-23 01:31:56 +04:00
Elie Habib
895bcc3ba3 feat(consumer-prices): SearchAdapter global pipeline -- 10 markets, 18 retailers (#2063)
* fix(consumer-prices): smarter Exa queries + Firecrawl URL fallback for missed prices

- Add market-aware query building (maps marketCode to country name) so LuLu
  queries prefer UAE pages over BHD/SAR/QAR GCC country pages
- Support searchQueryTemplate in acquisition config per retailer:
  - noon_grocery_ae: adds "grocery fresh food" to avoid seeds and storage boxes
  - lulu_ae: adds UAE country hint to anchor to the right market
- Firecrawl URL fallback: when all Exa summaries fail price extraction, scrape
  the first result URL with Firecrawl (JS-rendered markdown surfaces prices
  that static Exa summaries miss, fixes Noon chicken/rice/bread pricing gaps)
- Pass FIRECRAWL_API_KEY to ExaSearchAdapter in scrape.ts

* feat(consumer-prices): SearchAdapter -- Exa URL discovery + Firecrawl structured extraction

Replace ExaSearchAdapter regex-on-AI-summary with proper two-stage pipeline:
Stage 1 (Exa): neural search on retailer domain, ranked product page URLs.
Stage 2 (Firecrawl): structured LLM extraction from JS-rendered page.

Fixes Noon JS price invisibility and LuLu wrong-currency results.
Domain allowlist (exact hostname) + isTitlePlausible (40% token overlap) block
seed/container mismatches. All 4 AE retailers switched to adapter: search.
queryTemplate per retailer: LuLu anchors to UAE, Noon adds grocery food context.

* fix(consumer-prices): SearchAdapter smoke test fixes

- Fix FirecrawlProvider extract response path: data.data.extract not data.data
- Add prompt to ExtractSchema so Firecrawl LLM parses split prices (4 + .69 = 4.69)
- Update extract prompt to return listed price even when out of stock
- isTitlePlausible: strip packaging words (Pack/Box/Bag) from canonical tokens
  so size variants (eggs x 6, eggs x 15) match canonical Eggs Fresh 12 Pack on
  the eggs token, while keeping the 40% threshold to block seeds/canned goods
- Add urlPathContains to SearchConfigSchema for URL path filtering
- Noon: urlPathContains /p/ blocks category pages returning wrong products
- LuLu: baseUrl -> gcc.luluhypermarket.com (site migrated), urlPathContains /en-ae/
- All retailers: numResults 3 -> 5 for more fallback URLs
- Add ExtractSchema.prompt field to acquisition types

* feat(consumer-prices): SearchAdapter genericity improvements

- MARKET_NAMES: expand to 20 markets (GB, US, CA, AU, DE, FR, NL, SG, IN, PK, NG, KE, ZA)
  fallback to marketCode.toUpperCase() for any unlisted market
- Default query: add 'grocery' hint so new retailers work well without custom queryTemplate
- queryTemplate: add {category} token (maps to basket item category)
- isTitlePlausible: naive stemming (tomatoes->tomato, carrots->carrot) with two guards:
  1. NON_FOOD_INDICATORS set (seeds/planting/seedling) rejects garden products before token check
  2. stem only applied when result length >= 4 and differs from original (blocks egg->egg false positive)
- stem guard prevents 'eggs'->'egg' matching 'Egg Storage Box Container'
- stem guard allows 'tomatoes'->'tomato' matching 'Fresh Tomato India 1kg'
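The guarded stemmer can be sketched as below. Names (`stemToken`, `titleTokenMatches`) are illustrative and the guard rules mirror this message, not the real search.ts:

```typescript
// Sketch of the guarded naive stemmer described above.
const NON_FOOD_INDICATORS = new Set(["seeds", "seedling", "planting"]);

function stemToken(token: string): string {
  // naive stem: drop trailing "es"/"s" (tomatoes -> tomato)
  const stemmed = token.replace(/(?:es|s)$/, "");
  // Guard: only use the stem when it is >= 4 chars and actually changed —
  // blocks 'eggs' -> 'egg' matching 'Egg Storage Box Container'.
  return stemmed.length >= 4 && stemmed !== token ? stemmed : token;
}

function titleTokenMatches(canonicalToken: string, resultTitle: string): boolean {
  const titleTokens = resultTitle.toLowerCase().split(/\W+/);
  // Guard 1: reject garden products before any token comparison.
  if (titleTokens.some((t) => NON_FOOD_INDICATORS.has(t))) return false;
  const canonical = canonicalToken.toLowerCase();
  const stem = stemToken(canonical);
  return titleTokens.some((t) => t === canonical || stemToken(t) === stem);
}
```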

* feat(consumer-prices): global expansion -- 9 markets, 13 retailers, USD normalization

- Add 9 basket configs: US, UK, SG, IN, CH, SA, AU, KE, BR
- Add 13 retailer configs: Walmart/Kroger/WholeFoods (US), Tesco/Sainsbury's (UK),
  FairPrice (SG), BigBasket (IN), Migros/Coop (CH), Tamimi (SA),
  Woolworths (AU), Naivas (KE), Pao de Acucar (BR)
- Add src/fx/rates.ts: static FX table (16 currencies to USD)
- aggregate.ts: compute basket_total_usd metric for cross-country comparison
- search.ts: add Switzerland (ch) and Brazil (br) to MARKET_NAMES

* fix(consumer-prices): code review fixes -- match gate, MARKET_NAMES dedup, domain check

- scrape.ts: extend match-creation gate to include search adapter (was exa-search only)
  Without this, all 13 new global retailers never wrote product_matches rows so the
  aggregate job produced no index values -- the global expansion was a silent no-op
- market-names.ts: extract shared MARKET_NAMES; exa-search had an incomplete 7-market
  copy silently producing blank market context for non-GCC queries
- exa-search.ts: add isAllowedHost check before firecrawlFetch (domain validation bypass)
- fx/rates.ts: add RATES_DATE export for ops staleness visibility
- essentials_in/ke: add 12th item (paneer / processed cheese) for coverage parity
- wholefoods_us: add urlPathContains /product/ to block non-product Exa results
2026-03-23 00:21:09 +04:00
Elie Habib
09dbee1eaa fix(consumer-prices): fold validate into aggregate job, remove separate cron step (#2062) 2026-03-22 20:29:18 +04:00
Elie Habib
a46cce463e feat(consumer-prices): add validate job as price sanity gate (#2059)
* feat(consumer-prices): add validate job as price sanity gate between scrape and aggregate

* fix(consumer-prices): fix computed_indices duplicate rows from NULL unique constraint
2026-03-22 20:25:15 +04:00
Elie Habib
a060ec773b feat(consumer-prices): add Spinneys UAE retailer; add SPAR UAE config (disabled) (#2054) 2026-03-22 13:44:19 +04:00
Elie Habib
20f83701e5 fix(consumer-prices): drop unused schema and fix snapshot correctness (#2048) 2026-03-22 13:10:01 +04:00
Elie Habib
6d66c06f07 fix(consumer-prices): pipeline hardening, basket spread fix, panel bugs, sw-update test sync (#2040)
* fix(consumer-prices): harden scrape/aggregate/publish pipeline

- scrape: treat a 0-product parse as an error (increments errorsCount, skips
  pagesSucceeded) so a noon_grocery_ae run missing eggs_12/tomatoes_1kg is
  marked partial instead of completed
- publish: fix freshData gate (freshnessMin >= 0) so a scrape finishing
  at exactly 0 min lag still advances seed-meta
- aggregate: wrap per-basket aggregation in try/catch so one failing
  basket does not skip remaining baskets; re-throw if any failed
- seed-consumer-prices.mjs: require --force flag to prevent accidentally
  stomping publish.ts 26h TTLs with short 10-60min fallback TTLs
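The per-basket isolation in the aggregate fix above follows a collect-then-rethrow pattern — a sketch with stand-in names, not the actual aggregate.ts:

```typescript
// Sketch: one failing basket no longer skips the rest, but the run
// still reports failure at the end.
async function aggregateAll(
  baskets: string[],
  aggregateOne: (basket: string) => Promise<void>,
): Promise<void> {
  const failed: string[] = [];
  for (const basket of baskets) {
    try {
      await aggregateOne(basket);
    } catch (err) {
      failed.push(basket); // record, keep going
    }
  }
  if (failed.length > 0) {
    throw new Error(`aggregation failed for: ${failed.join(", ")}`);
  }
}
```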

* fix(consumer-prices): correct basket comparison with intersection + dedup

Both aggregate.ts and the retailer spread snapshot were summing ALL
matched SKUs per retailer without deduplication, making Carrefour
appear most expensive simply because it had more matched products
(31 "items" vs Noon's 20 for a 12-item basket).

Fixes:
- aggregate.ts retailer_spread_pct: deduplicate per (retailer, basketItem)
  taking cheapest price, then only compare on items all retailers carry
- worldmonitor.ts buildRetailerSpreadSnapshot: same dedup + intersection
  logic in SQL — one best_price per (retailer, basket_item), common_items
  CTE filters to items every active retailer covers
- exa-search.ts parseListing: log whether Exa returned 0 results or
  results with no extractable price, to distinguish the two failure modes
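In TypeScript terms the dedup + intersection logic amounts to the following. The real fix lives in SQL; this is an illustrative equivalent with made-up names:

```typescript
// One cheapest price per (retailer, item), compared only on items every
// retailer carries.
interface Obs { retailer: string; item: string; price: number }

function retailerSpreadPct(obs: Obs[]): number {
  // 1. Dedup: cheapest price per (retailer, item)
  const best = new Map<string, number>();
  for (const o of obs) {
    const k = `${o.retailer}:${o.item}`;
    best.set(k, Math.min(best.get(k) ?? Infinity, o.price));
  }
  // 2. Intersection: items carried by every retailer
  const retailers = [...new Set(obs.map((o) => o.retailer))];
  const items = [...new Set(obs.map((o) => o.item))].filter((item) =>
    retailers.every((r) => best.has(`${r}:${item}`)),
  );
  if (items.length === 0) return 0;
  // 3. Compare basket totals over the common items only
  const totals = retailers.map((r) =>
    items.reduce((sum, item) => sum + best.get(`${r}:${item}`)!, 0),
  );
  const min = Math.min(...totals);
  return ((Math.max(...totals) - min) / min) * 100;
}
```

Without step 1 a retailer with more matched SKUs sums to a bigger "basket"; without step 2 retailers are compared on different item sets.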

* fix(consumer-prices-panel): correct parse rate display, category names, and freshness colors

- parseSuccessRate is stored as 0-100 but UI was doing *100 again (shows 10000%)
- Category name builder converts snake_case to Title Case (Cooking_oil → Cooking Oil)
- Add missing cp-fresh--ok/warn/stale/unknown CSS classes (freshness labels had no color)
- Add border-radius to stat cards and range buttons; add font-family to range buttons
- Add padding + bottom border to cp-range-bar for visual separation

* fix(consumer-prices): gate overview spread_pct query to last 2 days

buildOverviewSnapshot queried retailer_spread_pct with no recency
filter, so ORDER BY metric_date DESC LIMIT 1 would serve an
arbitrarily old row when today's aggregate run omitted a write
(no retailer intersection). Add INTERVAL '2 days' cutoff — covers
24h cron cadence plus scheduling drift. Falls through to 0 (→ UI
shows '—') when no recent value exists.
2026-03-22 11:46:40 +04:00
Elie Habib
1de2b1e674 chore(consumer-prices): note aggregate runs as independent Railway cron (#2015) 2026-03-21 21:57:20 +04:00
Elie Habib
ec6edb8cb8 chore(consumer-prices): note publish runs as independent Railway cron (#2014) 2026-03-21 21:40:32 +04:00
Elie Habib
fdef684af2 fix(consumer-prices): select cats.category in GROUP BY to fix publish SQL error (#2012)
PostgreSQL requires every non-aggregated SELECT expression to appear in
GROUP BY. t.category and cats.category are equal by the JOIN condition but
PG doesn't fold them — selecting cats.category (which is already in GROUP BY)
resolves the error on overview:ae and categories:ae queries.
2026-03-21 21:22:18 +04:00
Elie Habib
9feefecd1e fix(consumer-prices): prevent Playwright hang from blocking aggregate + publish (#2006)
* fix(consumer-prices): prevent Playwright hang from blocking aggregate + publish jobs

browser.close() can silently hang after scraping completes, keeping the
node process alive indefinitely. This blocked the && chain in the Railway
startCommand, so aggregate.js and publish.js never ran.

Two-layer fix:
- teardown(): race browser.close() against a 5s timeout so Chromium
  unresponsiveness never blocks teardownAll()
- scrape.ts: add a 12-minute hard-kill timer as an ultimate safety net
  in case any other async handle prevents natural exit

* fix(consumer-prices): replace startup watchdog with bounded closePool timeout

The 12-minute hard-kill timer started at process startup, meaning a
legitimate long scrape would be killed mid-job with exit code 0 —
letting aggregate and publish run against partial data.

Replace with a 5s race on closePool() in the finally block, mirroring
the teardown() fix in playwright.ts. With both hang points bounded,
main() always resolves promptly after scraping completes and
process.exit() fires immediately with the correct exit code.
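Both hang points are bounded with the same pattern: race the hang-prone close against a timeout. A sketch (the helper name and log text are illustrative):

```typescript
// Race a promise against a timeout so a hung close (browser.close(),
// closePool()) can never keep the process alive.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T | void> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(() => {
      console.warn(`${label} did not finish within ${ms}ms; continuing`);
      resolve(); // give up waiting so main() can resolve
    }, ms);
  });
  return Promise.race([p.finally(() => clearTimeout(timer)), timeout]);
}
```

Usage would look like `await withTimeout(browser.close(), 5000, "browser.close")` in teardown() and `await withTimeout(closePool(), 5000, "closePool")` in the finally block.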
2026-03-21 20:25:21 +04:00
Elie Habib
4ee8065e6e fix(consumer-prices): ensure scrape job exits 0 to unblock aggregate and publish (#1999)

* fix(consumer-prices): ensure scrape job exits 0 to unblock aggregate and publish

Playwright browser teardown or lingering event loop handles were causing
node to exit with a non-zero code, silently breaking the && chain so
aggregate and publish never ran, leaving Redis empty and the panel stuck
at "Data collection in progress".

Wraps the entry point in an explicit async main() with try/finally and
forces process.exit(0) so the && chain always proceeds to aggregate and
publish regardless of Playwright or pg cleanup errors.

* fix(consumer-prices): preserve non-zero exit code for real scrape failures

The previous fix always exited 0, masking actual failures like DB errors
or config issues and making them look like successful runs to Railway.

Now only teardown noise (Playwright/pg handles) is neutralized by forcing
process.exit() — real failures set process.exitCode = 1 so Railway still
sees a failed run and alerting remains accurate.
2026-03-21 17:49:29 +04:00
Elie Habib
cb5a4ebd71 fix(consumer-prices): call closePool() after each job so && chain exits cleanly (#1978)
scrape.js, aggregate.js and publish.js all open a pg.Pool, which keeps the
Node.js event loop alive indefinitely. Without pool.end() the process
never exits — so "node scrape.js && node aggregate.js && node publish.js"
runs only scrape and then hangs: aggregate and publish never run, and all
consumer-prices:* Redis keys stay empty.

Fix: import closePool() from db/client.ts and call it in a .finally()
on the top-level promise of each job, ensuring the process exits after
work completes (or on error).
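The pattern is the standard try/finally job wrapper — a sketch where `main` and `closePool` are stand-ins for the real job entry point and db/client.ts export:

```typescript
// Whatever the job does, close the pool afterwards so no open pg handles
// keep the event loop alive and the process can exit.
async function runJob(
  main: () => Promise<void>,
  closePool: () => Promise<void>,
): Promise<void> {
  try {
    await main();
  } finally {
    await closePool(); // always runs — on success or on throw
  }
}
```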
2026-03-21 11:54:05 +04:00
Elie Habib
2e16159bb6 feat(economic): WoW price tracking + weekly cadence for BigMac & Grocery panels (#1974)
* feat(economic): add WoW tracking and fix plumbing for bigmac/grocery-basket panels

Phase 1 — Fix Plumbing:
- Adjust CACHE_TTL to 10 days (864000s) for bigmac and grocery-basket seeds
- Align health.js SEED_META maxStaleMin to 10080 (7 days) for both
- Add grocery-basket and bigmac to seed-health.js SEED_DOMAINS with intervalMin: 5040
- Refactor publish.ts writeSnapshot to accept advanceSeedMeta param; only
  advance seed-meta when fresh data exists (overallFreshnessMin < 120)
- Add manual-fallback-only comment to seed-consumer-prices.mjs

Phase 2 — Week-over-Week Tracking:
- Add wow_pct field to BigMacCountryPrice and CountryBasket proto messages
- Add wow_avg_pct, wow_available, prev_fetched_at to both response protos
- Regenerate client/server TypeScript from updated protos
- Add readCurrentSnapshot() helper + WoW computation to seed-bigmac.mjs
  and seed-grocery-basket.mjs; write :prev key via extraKeys
- Update BigMacPanel.ts to show per-country WoW column and global avg summary
- Update GroceryBasketPanel.ts to show WoW badge on total row and basket avg summary
- Add .bm-wow-up, .bm-wow-down, .bm-wow-summary, .gb-wow CSS classes
- Fix server handlers to include new WoW fields in fallback responses

* fix(economic): guard :prev extraKey against null on first seed run; eliminate double freshness query in publish.ts

* refactor(economic): address code review findings from PR #1974

- Extract readSeedSnapshot() into _seed-utils.mjs (DRY: was duplicated
  verbatim in seed-bigmac and seed-grocery-basket)
- Add FRESH_DATA_THRESHOLD_MIN constant in publish.ts (replace magic 120)
- Fix seed-consumer-prices.mjs contradictory JSDoc (remove stale
  "Deployed as: Railway cron service" line that contradicted manual-only warning)
- Add i18n keys panels.bigmacWow / panels.bigmacCountry to en.json
- Replace hardcoded "WoW" / "Country" with t() calls in BigMacPanel
- Replace IIFE-in-ternary pattern with plain if blocks in BigMacPanel
  and GroceryBasketPanel (P2/P3 from code review)

* fix(publish): gate advanceSeedMeta on any-retailer freshness, not average

overallFreshnessMin is the arithmetic mean across all retailers, so with
1 fresh + 2 stale retailers the average can exceed 120 min and suppress
seed-meta advancement even while fresh data is being published.

Use retailers.some(r => r.freshnessMin < 120) to correctly implement
"at least one retailer scraped within the last 2 hours."
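The corrected gate is a one-liner — a sketch mirroring the names in this message (`FRESH_DATA_THRESHOLD_MIN` is the constant from publish.ts; the rest is illustrative):

```typescript
// Advance seed-meta when ANY retailer is fresh, not when the average is:
// with 1 fresh + 2 stale retailers the mean freshness can exceed the
// threshold even though fresh data is being published.
const FRESH_DATA_THRESHOLD_MIN = 120;

interface RetailerFreshness { freshnessMin: number }

function shouldAdvanceSeedMeta(retailers: RetailerFreshness[]): boolean {
  return retailers.some((r) => r.freshnessMin < FRESH_DATA_THRESHOLD_MIN);
}
```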
2026-03-21 10:56:48 +04:00
Elie Habib
c4c03e7e65 fix(consumer-prices): restore seed script + fix publish job writing to wrong Redis (#1966)
* fix(consumer-prices): restore dropped seed script, fix maxStaleMin

seed-consumer-prices.mjs was included in PR #1901 but accidentally
dropped during a subsequent squash merge. Restored verbatim from the
PR diff. The script fetches from consumer-prices-core and writes all
Redis keys (overview, categories, movers, spread, freshness, series).

Also corrects health.js maxStaleMin values from 2880 (48h) to match
actual seed TTLs: overview/categories/movers=90min, spread=120min,
freshness=30min. The previous values would mask outages for 2 days.

Needs CONSUMER_PRICES_CORE_BASE_URL set in Railway env to produce
real data. Without it the script writes empty-but-valid placeholders.

* fix(consumer-prices): switch publish job to Upstash HTTP REST client

Replaced standard redis TCP client (REDIS_URL → Valkey) with Upstash
HTTP REST calls (UPSTASH_REDIS_REST_URL + UPSTASH_REDIS_REST_TOKEN),
matching the pattern used by all other seed scripts. This is why all
consumer-prices Redis keys were permanently empty — publish.ts was
writing to Railway-internal Valkey instead of Upstash.

Railway env vars added to seed-consumer-prices service separately.
2026-03-21 08:36:35 +04:00
Elie Habib
2b19bf0317 fix(consumer-prices): write retailer_spread_pct metric in aggregate job (#1941)
buildOverviewSnapshot queries retailer_spread_pct from computed_indices
but aggregate.ts never wrote it, causing the spread column to always be NULL.
2026-03-20 20:08:20 +04:00
Elie Habib
455449821b fix(consumer-prices): target partial index in upsertCanonicalProduct ON CONFLICT (#1939) 2026-03-20 19:57:10 +04:00
Elie Habib
e2bdd55268 fix(consumer-prices): allow self-signed SSL cert for Railway Postgres (#1938) 2026-03-20 19:48:04 +04:00
Elie Habib
3ae1327d73 fix(consumer-prices): fix tsc errors — highlights cast + QueryResultRow constraint (#1937) 2026-03-20 19:03:46 +04:00
Elie Habib
80243c1a38 fix(consumer-prices): install devDeps before tsc build in Dockerfile (#1935) 2026-03-20 18:45:57 +04:00
Elie Habib
eb988e9a3a chore(consumer-prices): add package-lock.json for Railway Docker build (#1934) 2026-03-20 18:28:22 +04:00
Elie Habib
84f8fe25b1 feat(consumer-prices): wire Exa engine into DB pipeline for time-series tracking (#1932)
- Add migration 002: seed canonical_products, baskets, basket_items for essentials-ae
- Add migration 003: partial unique index to fix NULL uniqueness gap in canonical_products
- Add ExaSearchAdapter to scrape.ts; auto-creates canonical product→basket matches
- Fix getBasketItemId to lookup by canonical_name via JOIN (not category) to avoid
  dairy collision (3 items share same category)
- Fix getOrCreateRetailer race condition with ON CONFLICT upsert
- Add per-category index writes in aggregate.ts; guard division-by-zero
- Set all publish.ts TTLs to 93600s (26h) to survive cron scheduling drift
- Set health.js maxStaleMin to 2880 (daily cron × 2) for correct staleness detection
- Remove redundant seed-consumer-prices.mjs (publish.ts writes to Redis directly)
- Add src/cli/validate.ts DB health check script
- Fix z.record() to z.record(z.string(), z.unknown()) for Zod compat
2026-03-20 17:59:42 +04:00
Elie Habib
7711e9de03 feat(consumer-prices): add basket price monitoring domain (#1901)
* feat(consumer-prices): add basket price monitoring domain

Adds end-to-end consumer price tracking to enable inflation monitoring
across key markets, starting with UAE essentials basket.

- consumer-prices-core/: companion scraping service with pluggable
  acquisition providers (Playwright, Exa, Firecrawl, Parallel P0),
  config-driven retailer YAML, Postgres schema, Redis snapshots
- proto/worldmonitor/consumer_prices/v1/: 6-RPC service definition
- api/consumer-prices/v1/[rpc].ts: Vercel edge route
- server/worldmonitor/consumer-prices/v1/: Redis-backed RPC handlers
- src/services/consumer-prices/: circuit breakers + bootstrap hydration
- src/components/ConsumerPricesPanel.ts: 5-tab panel (overview /
  categories / movers / spread / health)
- scripts/seed-consumer-prices.mjs: Railway cron seed script
- Wire into bootstrap, health, panels, gateway, cache-keys, locale

* fix(consumer-prices): resolve all code review findings

P0: populate topCategories — categoryResult was fetched but never used.
Added buildTopCategories() helper with grouped CTE query that extracts
current_index and week-over-week pct per category.

P1 (4 fixes):
- aggregate: replace N+1 getBaselinePrice loop with single batch query
  getBaselinePrices(ids[], date) via ANY($1) — eliminates 119 DB roundtrips
  per basket run
- aggregate/computeValueIndex: was dividing all category floors by the same
  arbitrary first baseline; now uses per-item floor price with per-item
  baseline (same methodology as fixed index but with cheapest price)
- basket-series endpoint now seeded: added buildBasketSeriesSnapshot() to
  worldmonitor.ts, /basket-series route in companion API, publish.ts writes
  7d/30d/90d series per basket, seed script fetches and writes all three ranges
- scrape: call teardownAll() after each retailer run to close Playwright
  browser; without this the Chromium process leaked on Railway

P2 (4 fixes):
- db/client: remove rejectUnauthorized: false — was bypassing TLS cert
  validation on all non-localhost connections
- publish: seed-meta now writes { fetchedAt, recordCount } matching the format
  expected by _seed-utils.mjs writeExtraKeyWithMeta (was writing { fetchedAt, key })
- products: remove unused getMatchedProductsForBasket — exact duplicate of
  getBasketRows in aggregate.ts; never imported by anything

Snapshot type overhaul:
- Flatten WMOverviewSnapshot to match proto GetConsumerPriceOverviewResponse
  (was nested under overview:{}; handlers read flat)
- All asOf fields changed from number to string (int64 → string per proto JSON)
- freshnessMin/parseSuccessRate null -> 0 defaults
- lastRunAt changed from epoch number to ISO string
- Mover items now include currentPrice and currencyCode
- emptyOverview/Movers/Spread/Freshness in seed script use String(Date.now())

* feat(consumer-prices): wire Exa search engine as acquisition backend for UAE retailers

Ports the proven Exa+summary price extraction from PR #1904 (seed-grocery-basket.mjs)
into consumer-prices-core as ExaSearchAdapter, replacing unvalidated Playwright CSS
scraping for all three UAE retailers (Carrefour, Lulu, Noon).

- New ExaSearchAdapter: discovers targets from basket YAML config (one per item),
  calls Exa API with contents.summary to get AI-extracted prices, uses matchPrice()
  regex (ISO codes + symbol fallback + CURRENCY_MIN guards) to extract AED amounts
- New db/queries/matches.ts: upsertProductMatch() + getBasketItemId() for auto-linking
  scraped Exa results to basket items without a separate matching step
- scrape.ts: selects ExaSearchAdapter when config.adapter === 'exa-search'; after
  insertObservation(), auto-creates canonical product and product_match (status: 'auto')
  so aggregate.ts can compute indices immediately without manual review
- All three UAE retailer YAMLs switched to adapter: exa-search and enabled: true;
  CSS extraction blocks removed (not used by search adapter)
- config/types.ts: adds 'exa-search' to adapter enum

* fix(consumer-prices): use EXA_API_KEYS (with fallback to EXA_API_KEY) matching PR #1904 pattern

* fix(consumer-prices): wire ConsumerPricesPanel in layout + fix movers limit:0 bug

Addresses Codex P1 findings on PR #1901:
- panel-layout.ts: import and createPanel('consumer-prices') so the panel
  actually renders in finance/commodity variants where it is enabled in config
- consumer-prices/index.ts: limit was hardcoded 0 causing slice(0,0) to always
  return empty risers/fallers after bootstrap is consumed; fixed to 10

* fix(consumer-prices): add categories snapshot to close P2 gap

consumer-prices:categories:ae:* was in BOOTSTRAP_KEYS but had no producer,
so the Categories tab always showed upstreamUnavailable.

- buildCategoriesSnapshot() in worldmonitor.ts — wraps buildTopCategories()
  and returns WMCategoriesSnapshot matching ListConsumerPriceCategoriesResponse
- /categories route in consumer-prices-core API
- publish.ts writes consumer-prices:categories:{market}:{range} for 7d/30d/90d
- seed-consumer-prices.mjs fetches all three ranges from consumer-prices-core
  and writes them to Redis alongside the other snapshots

P1 issues (snapshot structure mismatch + limit:0 movers) were already fixed
in earlier commits on this branch.

* fix(types): add variants? to PANEL_CATEGORY_MAP type
2026-03-20 17:08:22 +04:00