mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
Two data-quality regressions found in the 2026-03-23 scrape:

1. ananinja_sa: Exa returns restaurant menu URLs (e.g. /restaurants/burger-king) alongside product pages. isTitlePlausible passes "onion rings" for "Onions 1kg" because "onion" is a substring. Add urlPathContains: /product/ to restrict Exa results to product pages only, matching the existing pattern used by tamimi/carrefour.

2. search.ts parseListing: when Firecrawl can't extract a productName, we fell back to the canonical name as rawTitle. This silently stored unverifiable matches (raw_title = canonical_name) as real prices; e.g. Noon matched a chicken product as "Eggs Fresh 12 Pack". Reject the result outright when productName is missing.

DB: manually disabled 30 known-bad pins (26 canonical-name=raw-title failures, 2 restaurant URLs, 2 bundle-size mismatches) via pin_disabled_at.
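
The two fixes can be sketched roughly as below. This is a minimal illustration, not the actual code in search.ts: the ExaResult, Extraction, and Listing shapes, the filterByUrlPath helper, the parseListing signature, and the example.com URLs are all hypothetical. Only the names urlPathContains, productName, and rawTitle come from the description above. The price check is an added assumption as well.

```typescript
// Hypothetical shapes; the real types in search.ts may differ.
interface ExaResult {
  url: string;
  title: string;
}

interface Extraction {
  productName?: string | null;
  price?: number | null;
}

interface Listing {
  rawTitle: string;
  price: number;
  url: string;
}

// Fix 1 (sketch): keep only results whose URL path contains the configured
// fragment, e.g. "/product/", so restaurant menu pages are dropped before
// any title-plausibility check runs.
function filterByUrlPath(results: ExaResult[], urlPathContains: string): ExaResult[] {
  return results.filter((r) => {
    try {
      return new URL(r.url).pathname.includes(urlPathContains);
    } catch (_err) {
      return false; // unparseable URL: treat as not a product page
    }
  });
}

// Fix 2 (sketch): reject the result when no productName was extracted,
// instead of falling back to the canonical name as rawTitle.
function parseListing(extraction: Extraction, url: string): Listing | null {
  const rawTitle = extraction.productName?.trim();
  if (!rawTitle) {
    return null; // nothing to verify the match against
  }
  // Assumption: a listing without a finite numeric price is also unusable.
  if (typeof extraction.price !== "number" || !Number.isFinite(extraction.price)) {
    return null;
  }
  return { rawTitle, price: extraction.price, url };
}

// Usage with made-up URLs:
const candidates: ExaResult[] = [
  { url: "https://example.com/restaurants/burger-king", title: "Onion rings" },
  { url: "https://example.com/product/onions-1kg", title: "Onions 1kg" },
];
const productPages = filterByUrlPath(candidates, "/product/");
const rejected = parseListing({ price: 9.5 }, "https://example.com/product/x");
```

Here productPages keeps only the /product/ URL and rejected is null, so a missing productName never reaches storage with raw_title equal to canonical_name.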