Mirror of https://github.com/koala73/worldmonitor.git
Synced 2026-04-25 17:14:57 +02:00
Branch: main (13 commits)

90f4ac0f78
feat(consumer-prices): strict search-hit validator (shadow mode) (#3101)
* feat(consumer-prices): add 'candidate' match state + negativeTokens schema
Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
'candidate' so weak search hits can be persisted without entering
aggregates (aggregate.ts + snapshots filter on ('auto','approved')
so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
reject tokens for obvious class errors (e.g. 'canned' for fresh
tomatoes). Product-taxonomy splits like plain vs greek yogurt
belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
so reviewers can see why a match was downgraded.
* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures
Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:
- class-error rejects from BasketItem.negativeTokens (whole-token
match for single words; substring match for hyphenated entries
like 'plant-based' so 'Plant-Based Yogurt' trips without needing
token-splitting gymnastics)
- non-food indicators (seeds, fertilizer, planting) — shared with
the legacy gate
- token-overlap ratio over identity tokens (>2 chars, non-packaging)
- quantity-window conformance against minBaseQty/maxBaseQty
Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-
clean 0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.
Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).
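The four-signal validator described above can be sketched roughly as follows. The names validateSearchHit and AUTO_MATCH_THRESHOLD, the weights (overlap 0.55, size 0.35, class-clean 0.10), and the whole-token vs. hyphenated-substring negative-token rules come from the commit message; the result shape shown here, the sizeInWindow parameter, and the tokenisation regexes are illustrative assumptions, not the actual implementation.

```typescript
interface ValidatorResult {
  ok: boolean;
  score: number;
  reasons: string[];
}

const AUTO_MATCH_THRESHOLD = 0.75;

function validateSearchHit(
  canonicalName: string,
  title: string,
  negativeTokens: string[] = [],
  sizeInWindow = true, // stand-in for the minBaseQty/maxBaseQty check
): ValidatorResult {
  const reasons: string[] = [];
  const lower = title.toLowerCase();

  // class-error rejects: whole-token match for single words,
  // substring match for hyphenated entries like "plant-based"
  const titleTokens = lower.split(/[^a-z0-9-]+/).filter(Boolean);
  for (const neg of negativeTokens) {
    const hit = neg.includes('-') ? lower.includes(neg) : titleTokens.includes(neg);
    if (hit) reasons.push(`negative-token:${neg}`);
  }

  // token-overlap ratio over identity tokens (>2 chars)
  const identity = (s: string) =>
    s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
  const want = identity(canonicalName);
  const have = new Set(identity(title));
  const overlap = want.length
    ? want.filter((t) => have.has(t)).length / want.length
    : 0;

  const sizeScore = sizeInWindow ? 0.35 : 0;
  const classClean = reasons.length === 0 ? 0.1 : 0;
  const score = Math.min(0.55 * overlap + sizeScore + classClean, 1);
  const ok = reasons.length === 0 && overlap > 0;
  return { ok, score, reasons };
}
```

A clean hit with full overlap scores 1.0 and clears the 0.75 auto-match bar; any negative-token hit forces ok=false regardless of score, which is what routes the hit to 'candidate'.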
* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match
search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
substitutionGroup) through discoverTargets → fetchTarget → parseListing
using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
Legacy gate remains the hard gate; validator is shadow-only for now —
when legacy accepts but validator rejects, a [search:shadow-reject]
line is logged with reasons + score so the rollout diff report can
inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
can score the match without re-running the validator.
scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
validator score from the adapter payload. Hits with ok=true and
score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
(including validator.ok=false) writes 'candidate' with evidence_json
carrying the reasons + signals. Aggregates filter on ('auto','approved')
so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
legacy 1.0/auto behavior so this PR is a no-op for non-search paths.
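The scrape-side decision above reduces to a small pure function. AUTO_MATCH_THRESHOLD and the 'auto'/'candidate' states are from the commit; decideMatchStatus and the payload shape are illustrative names for a sketch of the rule, including the legacy fallback for validator-less adapters:

```typescript
type MatchStatus = 'auto' | 'candidate';

const AUTO_MATCH_THRESHOLD = 0.75;

interface ValidatorResult { ok: boolean; score: number }

// validator is absent for exa-search and other non-search adapters,
// which keep the legacy unconditional 1.0/'auto' behavior
function decideMatchStatus(
  validator?: ValidatorResult,
): { status: MatchStatus; matchScore: number } {
  if (!validator) {
    return { status: 'auto', matchScore: 1.0 };
  }
  const auto = validator.ok && validator.score >= AUTO_MATCH_THRESHOLD;
  return { status: auto ? 'auto' : 'candidate', matchScore: validator.score };
}
```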
* feat(consumer-prices): populate negativeTokens for 6 known-bad groups
* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects
Addresses PR #3101 review:
1. Pinned direct hits bypassed the validator downgrade — the new
auto-vs-candidate decision only ran inside the !wasDirectHit block,
so a pin that drifted onto the wrong product (the steady-state
common path) would still flow poisoned prices into aggregates
through the existing 'auto' match. Now: before inserting an
observation, if the direct hit's validator.ok === false, skip the
observation and route the target through handlePinError so the pin
soft-disables after 3 strikes. Legacy isTitlePlausible continues to
gate the pin extraction itself.
2. 'cane' was a hard reject for sugar_white across all 10 baskets but
'white cane sugar' is a legitimate SKU descriptor — would have
downgraded real products to candidate and dropped coverage. Removed
from every essentials_*.yaml sugar_white negativeTokens list.
Added a regression test that locks in 'Silver Spoon White Cane
Sugar 1kg' as a must-pass positive case.
* fix(consumer-prices): strip size tokens from identity + protect approved rows
Addresses PR #3101 round 2 review:
1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
["1","kg"] — both below the length>2 floor — so the compact "1kg"
token could never match. Short canonical names like "Onions 1kg"
lost 0.5 token overlap and legitimate hits landed at score 0.725 <
AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
fidelity is already enforced by the quantity-window check; identity
tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
locks in "Fresh Red Onions 1 kg" as a must-pass case.
2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
A re-scrape whose validator scored an already-approved URL below
0.75 would silently demote human-curated 'approved' rows to
'candidate'. Added a CASE guard so approved stays approved; every
other state follows the new validator verdict.
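The size-token fix above (point 1) amounts to one extra filter on identity tokens. The regex is the one quoted in the commit; the helper name and split pattern are illustrative:

```typescript
// Compact size tokens ("1kg", "500g", "250ml") are size fidelity, not
// identity -- the quantity-window check already enforces size, so they
// must not cost token overlap when the retailer writes "1 kg" spaced.
const SIZE_TOKEN = /^\d+(?:\.\d+)?[a-z]+$/;

function identityTokens(name: string): string[] {
  return name
    .toLowerCase()
    .split(/[^a-z0-9.]+/)
    .filter((t) => t.length > 2 && !SIZE_TOKEN.test(t));
}
```

With this filter, "Onions 1kg" contributes only the onions token, so "Fresh Red Onions 1 kg" can reach full overlap instead of stalling at 0.5.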
* fix(consumer-prices): widen curated-state guard to review + rejected
PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.
Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
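The widened guard above is equivalent to this status-transition rule (shown here as a pure TypeScript stand-in for the SQL CASE in the upsert; the function name is illustrative):

```typescript
type MatchStatus = 'auto' | 'candidate' | 'approved' | 'review' | 'rejected';

// Human-curated states are immutable on re-scrape; only machine
// states ('auto', 'candidate') follow the fresh validator verdict.
const CURATED = new Set<MatchStatus>(['approved', 'review', 'rejected']);

function nextStatus(existing: MatchStatus, incoming: MatchStatus): MatchStatus {
  return CURATED.has(existing) ? existing : incoming;
}
```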
* fix(analyze-stock): classify dividend frequency by median gap
recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.
Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.
Also documents the CAGR filter invariant in the test helper.
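The median-gap approach above can be sketched as follows. The ~91-day quarterly gap and the <2-entries fallback are from the commit; the cadence thresholds and function name are illustrative assumptions:

```typescript
// Classify payment cadence from the median day-gap between
// consecutive payments, not from a count inside a hard
// 365.25-day window that can bisect the series.
function paymentsPerYearFromInterval(timestampsMs: number[]): number {
  if (timestampsMs.length < 2) return 0; // caller falls back to the old count
  const DAY = 86_400_000;
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const gaps = sorted.slice(1).map((t, i) => (t - sorted[i]) / DAY);
  gaps.sort((a, b) => a - b);
  const mid = Math.floor(gaps.length / 2);
  const median =
    gaps.length % 2 === 0 ? (gaps[mid - 1] + gaps[mid]) / 2 : gaps[mid];
  if (median <= 45) return 12;  // monthly
  if (median <= 135) return 4;  // quarterly (~91d median gap)
  if (median <= 270) return 2;  // semi-annual
  return 1;                     // annual
}
```

A quarterly payer classifies as quarterly regardless of where the trailing-12-month cutoff happens to land relative to the Q1 payment.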
* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns
Addresses PR #3102 review:
1. Suspended programs no longer leak a frequency badge. When recentDivs
is empty, dividendYield and trailingAnnualDividendRate are both 0;
emitting 'Quarterly' derived from historical median would contradict
those zeros in the UI. paymentsPerYear now short-circuits to 0 before
the interval classifier runs.
2. Whole-history median-gap no longer masks cadence regime changes. The
reconciliation now depends on trailing-year count:
recent >= 3 → interval classifier (robust to calendar drift)
recent 1..2 → inspect most-recent inter-payment gap:
> 180d = real slowdown, trust count (Annual)
<= 180d = calendar drift, trust interval (Quarterly)
recent 0 → empty frequency (suspended)
The interval classifier itself is now scoped to the last 2 years so
it responds to regime changes instead of averaging over 5y of history.
Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
quarterly payments + 1 recent annual payment must report 'Annual'.
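The reconciliation branches above form a small decision tree. Encoding frequency as payments-per-year (0 meaning suspended/empty), a sketch with illustrative names might look like:

```typescript
// recentCount: payments in the trailing year
// lastGapDays: most recent inter-payment gap
// intervalPPY / countPPY: the two competing classifications
function reconcileFrequency(
  recentCount: number,
  lastGapDays: number,
  intervalPPY: number,
  countPPY: number,
): number {
  if (recentCount === 0) return 0;          // suspended: emit empty frequency
  if (recentCount >= 3) return intervalPPY; // interval classifier, robust to drift
  // 1..2 recent payments: distinguish a real slowdown from calendar drift
  return lastGapDays > 180 ? countPPY : intervalPPY;
}
```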
* fix(analyze-stock): scope interval median to trailing year when recent>=3
Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.
Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.
Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.
* fix(analyze-stock): use true median (avg of two middles) for even gap counts
PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
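The even-length fix above is the standard true-median rule; a minimal sketch (helper name is ours):

```typescript
// gaps[floor(length/2)] returns the upper-middle value for
// even-length arrays; averaging the two middles removes the
// slow-cadence bias at sparse sample sizes.
function medianGap(gaps: number[]): number {
  const sorted = [...gaps].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}
```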

261d40dfa1
fix(consumer-prices): block restaurant URLs and reject null productName extractions (#2156)
Two data-quality regressions found in the 2026-03-23 scrape:
1. ananinja_sa: Exa returns restaurant menu URLs (e.g. /restaurants/burger-king) alongside product pages. isTitlePlausible passes "onion rings" for "Onions 1kg" because "onion" is a substring. Add urlPathContains: /product/ to restrict Exa results to product pages only, matching the existing pattern used by tamimi/carrefour.
2. search.ts parseListing: when Firecrawl can't extract a productName, we fell back to the canonical name as rawTitle. This silently stored unverifiable matches (raw_title = canonical_name) as real prices — e.g. Noon matched a chicken product as "Eggs Fresh 12 Pack". Reject the result outright when productName is missing.
DB: manually disabled 30 known-bad pins (26 canonical-name=raw-title failures, 2 restaurant URLs, 2 bundle-size mismatches) via pin_disabled_at.

f6f0fcbd55
feat(consumer-prices): add second retailers for 5 markets + re-enable carrefour_sa (#2157)
Addresses the single-retailer confidence problem: 6 of 8 markets had only 1 active retailer, meaning any scrape failure or bad pin produced undetectable bad data.
Changes:
- carrefour_sa: re-enabled (#2148 OOS prompt fix merged)
- noon_sa (NEW): second SA retailer; Exa finds noon.com/saudi-en/ URLs; urlPathContains=/saudi-en/ isolates SA from UAE/Egypt storefronts
- coles_au (NEW): second AU retailer alongside Woolworths
- ocado_gb (NEW): second GB retailer alongside Tesco
- jiomart_in (NEW): second IN retailer alongside BigBasket
- coldstorage_sg (NEW): second SG retailer alongside FairPrice
- carrefour_br (NEW): second BR retailer alongside Pão de Açúcar
DB: disabled Walmart organic jasmine rice (wrong type) and Kroger PLU-4093 onion (by-weight produce item, not 3lb bag)

96f568f934
fix(ananinja-sa): add inStockFromPrice=true — price aggregator, Firecrawl misreads availability (#2147)

c1c016b03c
feat(sa): add ananinja_sa (9/12 items) + fix OOS price hallucination in extraction prompt (#2139)
ananinja.com is well-indexed by Exa with real SAR prices (eggs 23, milk 6, chicken 17 SAR). carrefour_sa and panda_sa added as disabled configs for future re-evaluation.
Also fixes the Firecrawl extraction prompt that instructed it to return a price "even if out of stock", causing it to grab carousel prices (always 3.95 SAR) when the main product had no displayed price. Updated prompt now returns null for OOS items with no price shown.
Tamimi disabled (0/12 after query tweak). carrefour_sa mostly OOS. panda.sa has no Exa indexing.

0c8a754f43
fix(tamimi-sa): disable retailer after 0/12 products on 2026-03-23 (#2137)
Firecrawl can't extract tamimimarkets.com even with the updated query template. Soft-disabled with a dated comment; re-enable once extraction is unblocked.

56dba6c159
feat(consumer-prices): product pinning, BigBasket fix, spread threshold, retailer sync (#2136)
* feat(consumer-prices): product pinning, BigBasket fix, spread threshold, retailer sync
Implements scraper stability plan to prevent URL churn between runs:
Product pinning (core fix):
- Migration 007: adds pin_disabled_at, consecutive_out_of_stock, pin_error_count columns
- After first successful Exa+Firecrawl match, reuse the stored URL directly on subsequent
runs without re-running Exa. Stale pins are soft-disabled (never deleted) after 3x OOS
or 3x fetch errors, triggering automatic Exa rediscovery on the next run.
- On direct-path failure, falls back to normal Exa flow in the same run.
- Compound map key "basketSlug:canonicalName" prevents collisions in multi-basket markets.
Retailer active-state sync:
- getOrCreateRetailer now writes active=config.enabled on every upsert.
- scrapeAll iterates ALL configs (not just enabled) so disabled retailers get synced to DB.
- Eliminates need for manual SQL hotfixes to set active=false.
Analytics correctness:
- All product_matches reads in aggregate, validate, worldmonitor snapshot now filter
pin_disabled_at IS NULL so soft-disabled stale matches don't skew indices.
- getBaselinePrices adds missing match_status IN ('auto','approved') guard.
BigBasket IN fix:
- inStockFromPrice: true flag overrides out-of-stock when price > 0.
- BigBasket gates on delivery pincode, not product availability — Firecrawl misread
the pincode gate as out-of-stock for all 12 basket items.
Spread reliability:
- Minimum 4 common categories required to compute retailer_spread_pct.
- Writes explicit 0 when below threshold to prevent stale noisy value persisting.
- US spread 134.8% from single cooking_oil pair is now suppressed.
Other:
- Tamimi SA: better Exa query template targeting tamimimarkets.com directly.
- Remove KE from frontend MARKETS array (basket data preserved in DB).
- 13 new vitest unit tests covering pinning, inStockFromPrice, host validation.
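The inStockFromPrice override above is a one-line rule worth pinning down. A sketch under assumed names (the flag name is from the commit; the function and parameter names are illustrative):

```typescript
// When a retailer gates on something other than availability
// (BigBasket gates on delivery pincode), a positive extracted price
// overrides the scraped out-of-stock signal.
function resolveInStock(
  scrapedInStock: boolean,
  price: number,
  inStockFromPrice: boolean,
): boolean {
  if (inStockFromPrice && price > 0) return true;
  return scrapedInStock;
}
```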
* fix(scrape): distinguish wasDirectHit from isDirect to close pin-fallback loop
When a pinned target falls back to Exa (fetchTarget sets direct:false in
payload), the old code read isDirect from target.metadata (always true),
causing two bugs:
1. upsertProductMatch was skipped → new Exa URL never pinned
2. Stale-pin counters reset on any pin target → broken pins never disabled
Introduce wasDirectHit = isDirect && rawPayload.direct === true so:
- Stale-pin maintenance (OOS/reset) only fires when pin URL was actually used
- Exa fallback (isDirect && !wasDirectHit) fires handlePinError on old pin
- upsertProductMatch guard uses !wasDirectHit so fallback results get pinned
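The wasDirectHit distinction above can be sketched as a pure classification. The expression wasDirectHit = isDirect && rawPayload.direct === true is from the commit; the surrounding shapes and the helper name are illustrative:

```typescript
interface RawPayload { direct?: boolean }

// target.metadata says the target *is* a pin (isDirect); the adapter
// payload says whether the pinned URL was *actually used* this run.
function classifyPinOutcome(isDirect: boolean, rawPayload: RawPayload) {
  const wasDirectHit = isDirect && rawPayload.direct === true;
  return {
    wasDirectHit,
    // pinned target fell back to Exa: count a strike against the old pin
    pinFellBack: isDirect && !wasDirectHit,
    // only non-direct-hit results create/refresh a product match (pin)
    shouldUpsertMatch: !wasDirectHit,
  };
}
```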

ed12bf5c84
fix(consumer-prices): disable migros_ch, coop_ch, sainsburys_gb (#2113)
All three return 0/12 every run:
- migros_ch, coop_ch: Exa domain bleed (competitor URLs returned)
- sainsburys_gb: Exa returns no pages for all GB grocery queries
Disabled with dated comments explaining the root cause.

8db8ef3f6e
fix(consumer-prices): disable wholefoods_us -- Amazon login wall blocks Firecrawl (#2100)
* fix(consumer-prices): disable wholefoods_us -- Amazon login wall blocks Firecrawl
WFM product pages require Amazon authentication. Without a session cookie, every page renders an auth modal containing img-login-tomato.png and img-login-cheese.png. Firecrawl's LLM extracts those image alt-texts as the product name, causing 11/12 title mismatch failures per run.
Confirmed via direct Firecrawl scrape: markdown shows "Log in with Amazon" and the tomato/cheese auth prompt images -- no product content rendered. US basket is well covered by kroger_us (12/12) and walmart_us (11/12).
* fix(consumer-prices): seed 9 new global baskets and their canonical products
Migration 006 seeds baskets for AU, BR, CH, GB, IN, KE, SA, SG, US. These were added as YAML configs in the global expansion PR but never seeded into the DB, causing "Basket not found in DB -- run seed first" failures for all 9 markets in every aggregate run.
Also fixes ON CONFLICT for canonical_products to use the partial index added in migration 003 (canonical_products_name_category_null_idx), which covers rows where brand_norm/variant_norm/size_value/size_unit are all NULL. Shared canonical names (Basmati Rice, Sunflower Oil 1L, White Sugar 1kg, etc.) are reused via ON CONFLICT DO NOTHING.
Note: migration 005 (computed_indices_null_idx) must also be applied on the production DB -- it was committed but may not have run, causing the separate "no unique or exclusion constraint" error on essentials-ae.

47df75daab
fix(consumer-prices): parallel scraping, retailer config fixes, disable naivas (#2086)
* fix(consumer-prices): parallel scraping, retailer config fixes, disable naivas
- scrapeAll() now uses a 5-worker concurrent pool (was sequential) -- cuts the 28-min run to ~5-6 min since each retailer hits a different domain
- Move initProviders/teardownAll out of scrapeRetailer() to avoid a registry teardown race when workers share the _providers singleton
- sainsburys_gb: fix baseUrl to groceries.sainsburys.co.uk (Exa returns URLs from this subdomain, not www.sainsburys.co.uk); remove wrong path filter
- coop_ch: fix urlPathContains /de/food/ -> /de/lebensmittel/ (German site)
- migros_ch: remove urlPathContains /de/produkt/ (Exa doesn't index these paths)
- naivas_ke: disable (Exa returns 0 matching URLs for www.naivas.online, wasting 72 API calls * 6s each per run)
* fix(consumer-prices): harden scrapeRetailer against stuck runs and teardown leaks
- P1: move API key validation before createScrapeRun() so a missing key never leaves a scrape_run row stuck in status='running' forever
- P2: wrap the single-retailer CLI path in try/finally so teardownAll() is called even when scrapeRetailer() throws, preventing Playwright process leaks
- P2: add a comment explaining why initProviders() is kept in scrapeAll() -- GenericPlaywrightAdapter (playwright/p0 adapters) uses fetchWithFallback() from the registry; search/exa-search bypass it with their own instances
- P3: add a comment in migros_ch.yaml documenting why urlPathContains was removed
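The 5-worker concurrent pool mentioned above is a standard pattern: N lanes draining a shared work list. A minimal sketch (runPool and its shape are illustrative, not the repo's actual helper):

```typescript
// N workers share one cursor into the item list; each lane pulls the
// next unclaimed item until the list is drained. Single-threaded JS
// means the read-then-increment on `next` is race-free.
async function runPool<T>(
  items: T[],
  worker: (item: T) => Promise<void>,
  concurrency = 5,
): Promise<void> {
  let next = 0;
  const lanes = Array.from(
    { length: Math.min(concurrency, items.length) },
    async () => {
      while (next < items.length) {
        const item = items[next++];
        await worker(item);
      }
    },
  );
  await Promise.all(lanes);
}
```

With retailers on distinct domains, this parallelism is safe from per-host rate limits, which is why the sequential 28-minute run collapses to a few minutes.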

895bcc3ba3
feat(consumer-prices): SearchAdapter global pipeline -- 10 markets, 18 retailers (#2063)
* fix(consumer-prices): smarter Exa queries + Firecrawl URL fallback for missed prices
- Add market-aware query building (maps marketCode to country name) so LuLu
queries prefer UAE pages over BHD/SAR/QAR GCC country pages
- Support searchQueryTemplate in acquisition config per retailer:
- noon_grocery_ae: adds "grocery fresh food" to avoid seeds and storage boxes
- lulu_ae: adds UAE country hint to anchor to the right market
- Firecrawl URL fallback: when all Exa summaries fail price extraction, scrape
the first result URL with Firecrawl (JS-rendered markdown surfaces prices
that static Exa summaries miss, fixes Noon chicken/rice/bread pricing gaps)
- Pass FIRECRAWL_API_KEY to ExaSearchAdapter in scrape.ts
* feat(consumer-prices): SearchAdapter -- Exa URL discovery + Firecrawl structured extraction
Replace ExaSearchAdapter regex-on-AI-summary with proper two-stage pipeline:
Stage 1 (Exa): neural search on retailer domain, ranked product page URLs.
Stage 2 (Firecrawl): structured LLM extraction from JS-rendered page.
Fixes Noon JS price invisibility and LuLu wrong-currency results.
Domain allowlist (exact hostname) + isTitlePlausible (40% token overlap) block
seed/container mismatches. All 4 AE retailers switched to adapter: search.
queryTemplate per retailer: LuLu anchors to UAE, Noon adds grocery food context.
* fix(consumer-prices): SearchAdapter smoke test fixes
- Fix FirecrawlProvider extract response path: data.data.extract not data.data
- Add prompt to ExtractSchema so Firecrawl LLM parses split prices (4 + .69 = 4.69)
- Update extract prompt to return listed price even when out of stock
- isTitlePlausible: strip packaging words (Pack/Box/Bag) from canonical tokens so
size variants (eggs x 6, eggs x 15) match canonical Eggs Fresh 12 Pack on eggs token
while keeping 40% threshold to block seeds/canned goods
- Add urlPathContains to SearchConfigSchema for URL path filtering
- Noon: urlPathContains /p/ blocks category pages returning wrong products
- LuLu: baseUrl -> gcc.luluhypermarket.com (site migrated), urlPathContains /en-ae/
- All retailers: numResults 3 -> 5 for more fallback URLs
- Add ExtractSchema.prompt field to acquisition types
* feat(consumer-prices): SearchAdapter genericity improvements
- MARKET_NAMES: expand to 20 markets (GB, US, CA, AU, DE, FR, NL, SG, IN, PK, NG, KE, ZA)
fallback to marketCode.toUpperCase() for any unlisted market
- Default query: add 'grocery' hint so new retailers work well without custom queryTemplate
- queryTemplate: add {category} token (maps to basket item category)
- isTitlePlausible: naive stemming (tomatoes->tomato, carrots->carrot) with two guards:
1. NON_FOOD_INDICATORS set (seeds/planting/seedling) rejects garden products before token check
2. stem only applied when result length >= 4 and differs from original (blocks egg->egg false positive)
- stem guard prevents 'eggs'->'egg' matching 'Egg Storage Box Container'
- stem guard allows 'tomatoes'->'tomato' matching 'Fresh Tomato India 1kg'
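The guarded stemmer described above can be sketched as follows. The guard conditions (result length >= 4, result differs from original) and the NON_FOOD_INDICATORS idea are from the commit; the suffix handling shown is a deliberately naive assumption:

```typescript
const NON_FOOD_INDICATORS = new Set(['seeds', 'planting', 'seedling']);

// Reject garden products before any token comparison runs.
function isNonFood(title: string): boolean {
  return title.toLowerCase().split(/\s+/).some((t) => NON_FOOD_INDICATORS.has(t));
}

// Naive stemming: strip a trailing "es" or "s", but only accept the
// stem when it is >= 4 chars and actually changed -- this blocks
// 'eggs' -> 'egg' (which would match 'Egg Storage Box Container')
// while allowing 'tomatoes' -> 'tomato'.
function stem(token: string): string {
  let candidate = token;
  if (token.endsWith('es')) candidate = token.slice(0, -2);
  else if (token.endsWith('s')) candidate = token.slice(0, -1);
  return candidate !== token && candidate.length >= 4 ? candidate : token;
}
```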
* feat(consumer-prices): global expansion -- 9 markets, 13 retailers, USD normalization
- Add 9 basket configs: US, UK, SG, IN, CH, SA, AU, KE, BR
- Add 13 retailer configs: Walmart/Kroger/WholeFoods (US), Tesco/Sainsbury's (UK),
FairPrice (SG), BigBasket (IN), Migros/Coop (CH), Tamimi (SA),
Woolworths (AU), Naivas (KE), Pao de Acucar (BR)
- Add src/fx/rates.ts: static FX table (16 currencies to USD)
- aggregate.ts: compute basket_total_usd metric for cross-country comparison
- search.ts: add Switzerland (ch) and Brazil (br) to MARKET_NAMES
* fix(consumer-prices): code review fixes -- match gate, MARKET_NAMES dedup, domain check
- scrape.ts: extend match-creation gate to include search adapter (was exa-search only)
Without this, all 13 new global retailers never wrote product_matches rows so the
aggregate job produced no index values -- the global expansion was a silent no-op
- market-names.ts: extract shared MARKET_NAMES; exa-search had an incomplete 7-market
copy silently producing blank market context for non-GCC queries
- exa-search.ts: add isAllowedHost check before firecrawlFetch (domain validation bypass)
- fx/rates.ts: add RATES_DATE export for ops staleness visibility
- essentials_in/ke: add 12th item (paneer / processed cheese) for coverage parity
- wholefoods_us: add urlPathContains /product/ to block non-product Exa results

a060ec773b
feat(consumer-prices): add Spinneys UAE retailer; add SPAR UAE config (disabled) (#2054)

7711e9de03
feat(consumer-prices): add basket price monitoring domain (#1901)
* feat(consumer-prices): add basket price monitoring domain
Adds end-to-end consumer price tracking to enable inflation monitoring
across key markets, starting with UAE essentials basket.
- consumer-prices-core/: companion scraping service with pluggable
acquisition providers (Playwright, Exa, Firecrawl, Parallel P0),
config-driven retailer YAML, Postgres schema, Redis snapshots
- proto/worldmonitor/consumer_prices/v1/: 6-RPC service definition
- api/consumer-prices/v1/[rpc].ts: Vercel edge route
- server/worldmonitor/consumer-prices/v1/: Redis-backed RPC handlers
- src/services/consumer-prices/: circuit breakers + bootstrap hydration
- src/components/ConsumerPricesPanel.ts: 5-tab panel (overview /
categories / movers / spread / health)
- scripts/seed-consumer-prices.mjs: Railway cron seed script
- Wire into bootstrap, health, panels, gateway, cache-keys, locale
* fix(consumer-prices): resolve all code review findings
P0: populate topCategories — categoryResult was fetched but never used.
Added buildTopCategories() helper with grouped CTE query that extracts
current_index and week-over-week pct per category.
P1 (4 fixes):
- aggregate: replace N+1 getBaselinePrice loop with single batch query
getBaselinePrices(ids[], date) via ANY($1) — eliminates 119 DB roundtrips
per basket run
- aggregate/computeValueIndex: was dividing all category floors by the same
arbitrary first baseline; now uses per-item floor price with per-item
baseline (same methodology as fixed index but with cheapest price)
- basket-series endpoint now seeded: added buildBasketSeriesSnapshot() to
worldmonitor.ts, /basket-series route in companion API, publish.ts writes
7d/30d/90d series per basket, seed script fetches and writes all three ranges
- scrape: call teardownAll() after each retailer run to close Playwright
browser; without this the Chromium process leaked on Railway
P2 (4 fixes):
- db/client: remove rejectUnauthorized: false — was bypassing TLS cert
validation on all non-localhost connections
- publish: seed-meta now writes { fetchedAt, recordCount } matching the format
expected by _seed-utils.mjs writeExtraKeyWithMeta (was writing { fetchedAt, key })
- products: remove unused getMatchedProductsForBasket — exact duplicate of
getBasketRows in aggregate.ts; never imported by anything
Snapshot type overhaul:
- Flatten WMOverviewSnapshot to match proto GetConsumerPriceOverviewResponse
(was nested under overview:{}; handlers read flat)
- All asOf fields changed from number to string (int64 → string per proto JSON)
- freshnessMin/parseSuccessRate null -> 0 defaults
- lastRunAt changed from epoch number to ISO string
- Mover items now include currentPrice and currencyCode
- emptyOverview/Movers/Spread/Freshness in seed script use String(Date.now())
* feat(consumer-prices): wire Exa search engine as acquisition backend for UAE retailers
Ports the proven Exa+summary price extraction from PR #1904 (seed-grocery-basket.mjs)
into consumer-prices-core as ExaSearchAdapter, replacing unvalidated Playwright CSS
scraping for all three UAE retailers (Carrefour, Lulu, Noon).
- New ExaSearchAdapter: discovers targets from basket YAML config (one per item),
calls Exa API with contents.summary to get AI-extracted prices, uses matchPrice()
regex (ISO codes + symbol fallback + CURRENCY_MIN guards) to extract AED amounts
- New db/queries/matches.ts: upsertProductMatch() + getBasketItemId() for auto-linking
scraped Exa results to basket items without a separate matching step
- scrape.ts: selects ExaSearchAdapter when config.adapter === 'exa-search'; after
insertObservation(), auto-creates canonical product and product_match (status: 'auto')
so aggregate.ts can compute indices immediately without manual review
- All three UAE retailer YAMLs switched to adapter: exa-search and enabled: true;
CSS extraction blocks removed (not used by search adapter)
- config/types.ts: adds 'exa-search' to adapter enum
* fix(consumer-prices): use EXA_API_KEYS (with fallback to EXA_API_KEY) matching PR #1904 pattern
* fix(consumer-prices): wire ConsumerPricesPanel in layout + fix movers limit:0 bug
Addresses Codex P1 findings on PR #1901:
- panel-layout.ts: import and createPanel('consumer-prices') so the panel
actually renders in finance/commodity variants where it is enabled in config
- consumer-prices/index.ts: limit was hardcoded 0 causing slice(0,0) to always
return empty risers/fallers after bootstrap is consumed; fixed to 10
* fix(consumer-prices): add categories snapshot to close P2 gap
consumer-prices:categories:ae:* was in BOOTSTRAP_KEYS but had no producer,
so the Categories tab always showed upstreamUnavailable.
- buildCategoriesSnapshot() in worldmonitor.ts — wraps buildTopCategories()
and returns WMCategoriesSnapshot matching ListConsumerPriceCategoriesResponse
- /categories route in consumer-prices-core API
- publish.ts writes consumer-prices:categories:{market}:{range} for 7d/30d/90d
- seed-consumer-prices.mjs fetches all three ranges from consumer-prices-core
and writes them to Redis alongside the other snapshots
P1 issues (snapshot structure mismatch + limit:0 movers) were already fixed
in earlier commits on this branch.
* fix(types): add variants? to PANEL_CATEGORY_MAP type