worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-26 01:24:59 +02:00

Author	SHA1	Message	Date
Elie Habib	7c0c08ad89	feat(energy-atlas): seed-side countries[] denorm on disruptions + CountryDeepDive row (§R #5 = B) (#3377 ) * feat(energy-atlas): seed-side countries[] denorm + CountryDeepDive row (§R #5 = B) Per plan §R/#5 decision B: denormalise countries[] at seed time on each disruption event so CountryDeepDivePanel can filter events per country without an asset-registry round trip. Schema join (pipeline/storage → event.assetId) happens once in the weekly cron, not on every panel render. The alternative (client-side join) was rejected because it couples UI logic to asset-registry internals and duplicates the join for every surface that wants a per-country filter. Changes: - `proto/.../list_energy_disruptions.proto`: add `repeated string countries = 15` to EnergyDisruptionEntry with doc comment tying it to the plan decision and the always-non-empty invariant. - `scripts/_energy-disruption-registry.mjs`: • Load pipeline-gas + pipeline-oil + storage-facilities registries once per seed cycle; index by id. • `deriveCountriesForEvent()` resolves assetId to {fromCountry, toCountry, transitCountries} (pipeline) or {country} (storage), deduped + alpha-sorted so byte-diff stability holds. • `buildPayload()` attaches the computed countries[] to every event before writing. • `validateRegistry()` now requires non-empty countries[] of ISO2 codes. Combined with the seeder's `emptyDataIsFailure: true`, this surfaces orphaned assetIds loudly — the next cron tick fails validation and seed-meta stays stale, tripping health alarms. - `scripts/data/energy-disruptions.json`: fix two orphaned assetIds that the new join caught: • `cpc-force-majeure-2022`: `cpc-pipeline` → `cpc` (matches the entry in pipelines-oil.json). • `pdvsa-designation-2019`: `ve-petrol-2026-q1` (non-existent) → `venezuela-anzoategui-puerto-la-cruz`. - `server/.../list-energy-disruptions.ts`: project countries[] into the RPC response via coerceStringArray. 
Legacy pre-denorm rows surface as empty array (always present on wire, length 0 => old). - `src/components/CountryDeepDivePanel.ts`: add 4th Atlas row — "Energy disruptions in {iso2}" — filtered by `iso2 ∈ countries[]`. Failure is silent; EnergyDisruptionsPanel (upcoming) is the primary disruption surface. - `tests/energy-disruptions-registry.test.mts`: switch to validating the buildPayload output (post-denorm), add §R #5 B invariant tests, plus a raw-JSON invariant ensuring curators don't hand-edit countries[] (it's derived, not declared). Proto regen note: `make generate` currently fails with a duplicate openapi plugin collision in buf.gen.yaml (unrelated bug — 3 plugin entries emit to the same out dir). Worked around by temporarily trimming buf.gen.yaml to just the TS plugins for this regen. Added only the `countries: string[]` wire field to both service_client and service_server; no other generated-file drift in this PR. * chore(proto): regenerate openapi specs for countries[] field Runs `make generate` with the sebuf v0.11.1 plugin now correctly resolved via the PATH fix (cherry-picked from fix/makefile-generate-path-prefix). The new `countries` field on EnergyDisruptionEntry propagates into: - docs/api/SupplyChainService.openapi.yaml (primary per-service spec) - docs/api/SupplyChainService.openapi.json (machine-readable variant) - docs/api/worldmonitor.openapi.yaml (consolidated bundle) No TypeScript drift beyond the already-committed service_client.ts / service_server.ts updates in `80797e7cc`. * fix(energy-atlas): drop highlightEventId emission (review P2) Codex P2: loadDisruptionsForCountry dispatched `highlightEventId` but neither PipelineStatusPanel nor StorageFacilityMapPanel consumes it (the openDetailHandler reads only pipelineId / facilityId). The UI's implicit promise (event-specific highlighting) wasn't delivered — clickthrough was asset-generic, and the extra wire field was a misleading API surface. 
Fix: emit only {pipelineId, facilityId} in the dispatched detail. Row click opens the asset drawer; user sees the full per-asset disruption timeline and locates the event visually. Symmetric fix for PR #3378's EnergyDisruptionsPanel — both emitters now match the drawer contract exactly. Re-add `highlightEventId` here when the drawer panels ship matching consumer code (openDetailHandler accepts it, loadDetail stores it, renderDisruptionTimeline scrolls + emphasises the matching event). Typecheck clean, test:data 6698/6698 pass. * fix(energy-atlas): collision detection + abort signal + label clamp (review P2) Three Codex P2 findings on PR #3377: 1. `loadAssetRegistries()` spread-merged gas + oil pipelines, silently overwriting entries on id collision. No collision today, but a curator adding a pipeline under the same id to both files would cause `deriveCountriesForEvent` to return wrong-commodity country data with no test flagging it. Fix: explicit merge loop that throws on duplicate id. The next cron tick fails validation, seed-meta stays stale, health alarms fire — same loud-failure pattern the rest of the seeder uses. 2. `loadDisruptionsForCountry` didn't thread `this.signal` through the RPC fetch shim. The stale-closure guard (`currentCode !== iso2`) discarded stale RESULTS, but the in-flight request couldn't be cancelled when the user switched countries or closed the panel. Fix: wrap globalThis.fetch with { signal: this.signal } in the client factory, matching the signal lifecycle the rest of the panel already uses. 3. `shortDescription` values up to 200 chars rendered without ellipsis in the compact Atlas row, overflowing the row layout. Fix: new `truncateDisruptionLabel` helper clamps to 80 chars with ellipsis. Full text still accessible via click-through to the asset drawer. Typecheck clean, test:data 6698/6698 pass.	2026-04-24 19:08:07 +04:00
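The collision-detecting merge and the seed-side country derivation described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the entry shapes and the helper names `mergeRegistries` and `deriveCountries` are assumptions, chosen to mirror the behaviour the message attributes to `loadAssetRegistries()` and `deriveCountriesForEvent()`.

```typescript
// Hypothetical entry shapes; the real registry rows likely carry more fields.
type PipelineEntry = {
  id: string;
  fromCountry: string;
  toCountry: string;
  transitCountries?: string[];
};
type StorageEntry = { id: string; country: string };
type Asset = PipelineEntry | StorageEntry;

// Merge several registries into one id-indexed map. A duplicate id throws
// instead of being silently overwritten, which is what a spread-merge would do.
function mergeRegistries(...registries: Asset[][]): Map<string, Asset> {
  const byId = new Map<string, Asset>();
  for (const registry of registries) {
    for (const asset of registry) {
      if (byId.has(asset.id)) {
        throw new Error(`duplicate asset id across registries: ${asset.id}`);
      }
      byId.set(asset.id, asset);
    }
  }
  return byId;
}

// Resolve an assetId to a deduplicated, alphabetically sorted list of country
// codes. An unknown (orphaned) assetId yields [], which a validator that
// requires a non-empty countries[] can then reject.
function deriveCountries(assetId: string, byId: Map<string, Asset>): string[] {
  const asset = byId.get(assetId);
  if (!asset) return [];
  const codes =
    'country' in asset
      ? [asset.country]
      : [asset.fromCountry, asset.toCountry, ...(asset.transitCountries ?? [])];
  return Array.from(new Set(codes)).sort();
}
```

Sorting after deduplication keeps the output independent of registry field order, which is the property the message refers to as byte-diff stability.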
Sebastien Melki	e68a7147dd	chore(api): sebuf migration follow-ups (post-#3242) (#3287 ) * chore(api-manifest): rewrite brief-why-matters reason as proper internal-helper justification Carried in from #3248 merge as a band-aid (called out in #3242 review followup checklist item 7). The endpoint genuinely belongs in internal-helper — RELAY_SHARED_SECRET-bearer auth, cron-only caller, never reached by dashboards or partners. Same shape constraint as api/notify.ts. Replaces the apologetic "filed here to keep the lint green" framing with a proper structural justification: modeling it as a generated service would publish internal cron plumbing as user-facing API surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(lint): premium-fetch parity check for ServiceClients (closes #3279) Adds scripts/enforce-premium-fetch.mjs — AST-walks src/, finds every `new <ServiceClient>(...)` (variable decl OR `this.foo =` assignment), tracks which methods each instance actually calls, and fails if any called method targets a path in src/shared/premium-paths.ts PREMIUM_RPC_PATHS without `{ fetch: premiumFetch }` on the constructor. Per-call-site analysis (not class-level) keeps the trade/index.ts pattern clean — publicClient with globalThis.fetch + premiumClient with premiumFetch on the same TradeServiceClient class — since publicClient never calls a premium method. Wired into: - npm run lint:premium-fetch - .husky/pre-push (right after lint:rate-limit-policies) - .github/workflows/lint-code.yml (right after lint:api-contract) Found and fixed three latent instances of the HIGH(new) #1 class from #3242 review (silent 401 → empty fallback for signed-in browser pros): - src/services/correlation-engine/engine.ts — IntelligenceServiceClient built with no fetch option called deductSituation. LLM-assessment overlay on convergence cards never landed for browser pros without a WM key. 
- src/services/economic/index.ts — EconomicServiceClient with globalThis.fetch called getNationalDebt. National-debt panel rendered empty for browser pros. - src/services/sanctions-pressure.ts — SanctionsServiceClient with globalThis.fetch called listSanctionsPressure. Sanctions-pressure panel rendered empty for browser pros. All three swap to premiumFetch (single shared client, mirrors the supply-chain/index.ts justification — premiumFetch no-ops safely on public methods, so the public methods on those clients keep working). Verification: - lint:premium-fetch clean (34 ServiceClient classes, 28 premium paths, 466 src/ files analyzed) - Negative test: revert any of the three to globalThis.fetch → exit 1 with file:line and called-premium-method names - typecheck + typecheck:api clean - lint:api-contract / lint:rate-limit-policies / lint:boundaries clean - tests/sanctions-pressure.test.mjs + premium-fetch.test.mts: 16/16 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(military): fetchStaleFallback NEG_TTL=30s parity (closes #3277) The legacy /api/military-flights handler had NEG_TTL = 30_000ms — a short suppression window after a failed live + stale read so we don't Redis-hammer the stale key during sustained relay+seed outages. Carried into the sebuf list-military-flights handler: - Module-scoped `staleNegUntil` timestamp (per-isolate on Vercel Edge, which is fine — each warm isolate gets its own 30s suppression window). - Set whenever fetchStaleFallback returns null (key missing, parse fail, empty array after staleToProto filter, or thrown error). - Checked at the entry of fetchStaleFallback before doing the Redis read. - Test seam `_resetStaleNegativeCacheForTests()` exposed for unit tests. Test pinned in tests/redis-caching.test.mjs: drives a stale-empty cycle three times — first read hits Redis, second within window doesn't, after test-only reset it does again. 
Verified: 18/18 redis-caching tests pass, typecheck:api clean, lint:premium-fetch clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(lint): rate-limit-policies regex → import() (closes #3278) The previous lint regex-parsed ENDPOINT_RATE_POLICIES from the source file. That worked because the literal happens to fit a single line per key today, but a future reformat (multi-line key wrap, formatter swap, etc.) would silently break the lint without breaking the build — exactly the failure mode that's worse than no lint at all. Fix: - Export ENDPOINT_RATE_POLICIES from server/_shared/rate-limit.ts. - Convert scripts/enforce-rate-limit-policies.mjs to async + dynamic import() of the policy object directly. Same TS module that the gateway uses at runtime → no source-of-truth drift possible. - Run via tsx (already a dev dep, used by test:data) so the .mjs shebang can resolve a .ts import. - npm script swapped to `tsx scripts/...`. .husky/pre-push uses `npm run lint:rate-limit-policies` so no hook change needed. Verified: - Clean: 6 policies / 182 gateway routes. - Negative test (rename a key to the original sanctions typo /api/sanctions/v1/lookup-entity): exit 1 with the same incident- attributed remedy message as before. - Reformat test (split a single-line entry across multiple lines): still passes — the property is what's read, not the source layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(shipping/v2): alertThreshold: 0 preserved; drop dead validation branch (#3242 followup) Before: alert_threshold was a plain int32. proto3 scalar default is 0, so the handler couldn't distinguish "partner explicitly sent 0 (deliver every disruption)" from "partner omitted the field (apply legacy default 50)" — both arrived as 0 and got coerced to 50 by `> 0 ? : 50`. Silent intent-drop for any partner who wanted every alert. The subsequent `alertThreshold < 0` branch was also unreachable after that coercion. 
After: - Proto field is `optional int32 alert_threshold` — TS type becomes `alertThreshold?: number`, so omitted = undefined and explicit 0 stays 0. - Handler uses `req.alertThreshold ?? 50` — undefined → 50, any number passes through unchanged. - Dead `< 0 \|\| > 100` runtime check removed; buf.validate `int32.gte = 0, int32.lte = 100` already enforces the range at the wire layer. Partner wire contract: identical for the omit-field and 1..100 cases. Only behavioural change is explicit 0 — previously impossible to request, now honored per proto3 optional semantics. Scoped `buf generate --path worldmonitor/shipping/v2` to avoid the full- regen `@ts-nocheck` drift Seb documented in the #3242 PR comments. Re-applied `@ts-nocheck` on the two regenerated files manually. Tests: - `alertThreshold 0 coerces to 50` flipped to `alertThreshold 0 preserved`. - New test: `alertThreshold omitted (undefined) applies legacy default 50`. - `rejects > 100` test removed — proto/wire validation handles it; direct handler calls intentionally bypass wire and the handler no longer carries a redundant runtime range check. Verified: 18/18 shipping-v2-handler tests pass, typecheck + typecheck:api clean, all 4 custom lints clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(shipping/v2): document missing webhook delivery worker + DNS-rebinding contract (#3242 followup) #3242 followup checklist item 6 from @koala73 — sanity-check that the delivery worker honors the re-resolve-and-re-check contract that isBlockedCallbackUrl explicitly delegates to it. Audit finding: no delivery worker for shipping/v2 webhooks exists in this repo. Grep across the entire tree (excluding generated/dist) shows the only readers of webhook:sub:* records are the registration / inspection / rotate-secret handlers themselves. No code reads them and POSTs to the stored callbackUrl. 
The delivery worker is presumed to live in Railway (separate repo) or hasn't been built yet — neither is auditable from this repo. Refreshes the comment block at the top of webhook-shared.ts to: - explicitly state DNS rebinding is NOT mitigated at registration - spell out the four-step contract the delivery worker MUST follow (re-validate URL, dns.lookup, re-check resolved IP against patterns, fetch with resolved IP + Host header preserved) - flag the in-repo gap so anyone landing delivery code can't miss it Tracking the gap as #3288 — acceptance there is "delivery worker imports the patterns + helpers from webhook-shared.ts and applies the four steps before each send." Action moves to wherever the delivery worker actually lives (Railway likely). No code change. Tests + lints unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(lint): add rate-limit-policies step (greptile P1 #3287) Pre-push hook ran lint:rate-limit-policies but the CI workflow did not, so fork PRs and --no-verify pushes bypassed the exact drift check the lint was added to enforce (closes #3278). Adding it right after lint:api-contract so it runs in the same context the lint was designed for. * refactor(lint): premium-fetch regex → import() + loop classRe (greptile P2 #3287) Two fragilities greptile flagged on enforce-premium-fetch.mjs: 1. loadPremiumPaths regex-parsed src/shared/premium-paths.ts with /'(\/api\/[^']+)'/g — same class of silent drift we just removed from enforce-rate-limit-policies in #3278. Reformatting the source Set (double quotes, spread, helper-computed entries) would drop paths from the lint while leaving the runtime untouched. Fix: flip the shebang to `#!/usr/bin/env -S npx tsx` and dynamic-import PREMIUM_RPC_PATHS directly, mirroring the rate-limit pattern. package.json lint:premium-fetch now invokes via tsx too so the npm-script path matches direct execution. 2. 
loadClientClassMap ran classRe.exec once, silently dropping every ServiceClient after the first if a file ever contained more than one. Current codegen emits one class per file so this was latent, but a template change would ship un-linted classes. Fix: collect every class-open match with matchAll, slice each class body with the next class's start as the boundary, and scan methods per-body so method-to-class binding stays correct even with multiple classes per file. Verification: - lint:premium-fetch clean (34 classes / 28 premium paths / 466 files — identical counts to pre-refactor, so no coverage regression). - Negative test: revert src/services/economic/index.ts to globalThis.fetch → exit 1 with file:line, bound var name, and premium method list (getNationalDebt). Restore → clean. - lint:rate-limit-policies still clean. * fix(shipping/v2): re-add alertThreshold handler range guard (greptile nit 1 #3287) Wire-layer buf.validate enforces 0..100, but direct handler invocation (internal jobs, test harnesses, future transports) bypasses it. Cheap invariant-at-the-boundary — rejects < 0 or > 100 with ValidationError before the record is stored. Tests: restored the rejects-out-of-range cases that were dropped when the branch was (correctly) deleted as dead code on the previous commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(lint): premium-fetch method-regex → TS AST (greptile nits 2+5 #3287) loadClientClassMap: The method regex `async (\w+)\s\([^)]\)\s:\sPromise<[^>]+>\s\{\slet path = "..."` assumed (a) no nested `)` in arg types, (b) no nested `>` in the return type, (c) `let path = "..."` as the literal first statement. Any codegen template shift would silently drop methods with the lint still passing clean — the same silent-drift class #3287 just closed on the premium-paths side. 
Now walks the service_client.ts AST, matches `export class ServiceClient`, iterates `MethodDeclaration` members, and reads the first `let path: string = '...'` variable statement as a StringLiteral. Tolerant to any reformatting of arg/return types or method shape. findCalls scope-blindness: Added limitation comment — the walker matches `<varName>.<method>()` anywhere in the file without respecting scope. Two constructions in different function scopes sharing a var name merge their called-method sets. No current src/ file hits this; the lint errs cautiously (flags both instances). Keeping the walker simple until scope-aware binding is needed. webhook-shared.ts: Inlined issue reference (#3288) so the breadcrumb resolves without bouncing through an MDX that isn't in the diff. Verification: - lint:premium-fetch clean — 34 classes / 28 premium paths / 489 files. Pre-refactor: 34 / 28 / 466. Class + path counts identical; file bump is from the main-branch rebase, not the refactor. - Negative test: revert src/services/economic/index.ts premiumFetch → globalThis.fetch. Lint exits 1 at `src/services/economic/index.ts:64:7` with `premium method(s) called: getNationalDebt`. Restore → clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> refactor(lint): rate-limit OpenAPI regex → yaml parser (greptile nit 3 #3287) Input side (ENDPOINT_RATE_POLICIES) was flipped to live `import()` in `4e79d029`. Output side (OpenAPI routes) still regex-scraped top-level `paths:` keys with `/^\s{4}(\/api\/[^\s:]+):/gm` — hard-coded 4-space indent. Any YAML formatter change (2-space indent, flow style, line folding) would silently drop routes and let policy-drift slip through — same silent-drift class the input-side fix closed. Now uses the `yaml` package (already a dep) to parse each .openapi.yaml and reads `doc.paths` directly. Verification: - Clean: 6 policies / 189 routes (was 182 — yaml parser picks up a handful the regex missed, closing a silent coverage gap). 
- Negative test: rename policy key back to /api/sanctions/v1/lookup-entity → exits 1 with the same incident-attributed remedy. Restore → clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(codegen): regenerate unified OpenAPI bundle for alert_threshold proto change The shipping/v2 webhook alert_threshold field was flipped from `int32` to `optional int32` with an expanded doc comment in `f3339464`. That comment now surfaces in the unified docs/api/worldmonitor.openapi.yaml bundle (introduced by #3341). Regenerated with sebuf v0.11.1 to pick it up. No behaviour change — bundle-only documentation drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 18:00:41 +03:00
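The `alertThreshold` change in the commit above comes down to the difference between a truthiness-style default and a nullish default. The sketch below shows both side by side, under assumptions: the request interface, the function names, and the use of `RangeError` are illustrative stand-ins (the message says the real handler rejects with `ValidationError`).

```typescript
// Hypothetical request shape. With proto3 `optional int32`, the generated
// TypeScript field is optional, so "omitted" arrives as undefined.
interface WebhookRequestSketch {
  alertThreshold?: number;
}

const DEFAULT_ALERT_THRESHOLD = 50;

// Old behaviour as described in the message: a plain int32 defaults to 0 on
// the wire, so an explicit 0 and an omitted field both collapse to 50.
function legacyResolveAlertThreshold(alertThreshold: number): number {
  return alertThreshold > 0 ? alertThreshold : DEFAULT_ALERT_THRESHOLD;
}

// New behaviour: only undefined falls back to the default; an explicit 0 is
// preserved. The range guard mirrors the re-added handler check for callers
// that bypass wire-level validation.
function resolveAlertThreshold(req: WebhookRequestSketch): number {
  const threshold = req.alertThreshold ?? DEFAULT_ALERT_THRESHOLD;
  if (threshold < 0 || threshold > 100) {
    // Stand-in for the handler's ValidationError.
    throw new RangeError(`alertThreshold must be within 0..100, got ${threshold}`);
  }
  return threshold;
}
```

For example, `resolveAlertThreshold({ alertThreshold: 0 })` returns 0, while `legacyResolveAlertThreshold(0)` returns 50.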
Elie Habib	34dfc9a451	fix(news): ground LLM surfaces on real RSS description end-to-end (#3370 ) * feat(news/parser): extract RSS/Atom description for LLM grounding (U1) Add description field to ParsedItem, extract from the first non-empty of description/content:encoded (RSS) or summary/content (Atom), picking the longest after HTML-strip + entity-decode + whitespace-normalize. Clip to 400 chars. Reject empty, <40 chars after strip, or normalize-equal to the headline — downstream consumers fall back to the cleaned headline on '', preserving current behavior for feeds without a description. CDATA end is anchored to the closing tag so internal ]]> sequences do not truncate the match. Preserves cached rss:feed:v1 row compatibility during the 1h TTL bleed since the field is additive. Part of fix: pipe RSS description end-to-end so LLM surfaces stop hallucinating named actors (docs/plans/2026-04-24-001-...). Covers R1, R7. * feat(news/story-track): persist description on story:track:v1 HSET (U2) Append description to the story:track:v1 HSET only when non-empty. Additive — no key version bump. Old rows and rows from feeds without a description return undefined on HGETALL, letting downstream readers fall back to the cleaned headline (R6). Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is unit-testable without Redis. Update the contract comment in cache-keys.ts so the next reader of the schema sees description as an optional field. Covers R2, R6. * feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3) Add two additive proto fields so the article description can ride to every LLM-adjacent consumer without a breaking change: - NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped, ≤400 chars, empty when unavailable. Wired on toProtoItem. - SummarizeArticleRequest.bodies (field 8): optional article bodies paired 1:1 with headlines for prompt grounding. Empty array is today's headline-only behavior. 
Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1 (PATH=~/go/bin required; Homebrew's protoc-gen-openapiv3 is an older pre-bundle-mode build that collides on duplicate emission). Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle call sites in src/services/summarization.ts; U6 replaces them with real article bodies once SummarizeArticle handler reads the field. Covers R3, R5. * feat(brief/digest): forward RSS description end-to-end through brief envelope (U4) Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now plumbs the optional `description` field off each story:track:v1 HGETALL into the digest story object. The brief adapter (brief-compose.mjs::digestStoryToUpstreamTopStory) prefers the real RSS description over the cleaned headline; when the upstream row has no description (old rows in the 48h bleed, feeds that don't carry one), we fall back to the cleaned headline so today's behavior is preserved (R6). This is the upstream half of the description cache path. U5 lands the LLM-side grounding + cache-prefix bump so Gemini actually sees the article body instead of hallucinating a named actor from the headline. Covers R4 (upstream half), R6. * feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5) The actual fix for the headline-only named-actor hallucination class: Gemini 2.5 Flash now receives the real article body as grounding context, so it paraphrases what the article says instead of filling role-label headlines from parametric priors ("Iran's new supreme leader" → "Ali Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes the actual article-named actor). Changes: - buildStoryDescriptionPrompt interpolates a `Context: <body>` line between the metadata block and the "One editorial sentence" instruction when description is non-empty AND not normalise-equal to the headline. Clips to 400 chars as a second belt-and-braces after the U1 parser cap. 
No Context line → identical prompt to pre-fix (R6 preserved). - sanitizeStoryForPrompt extended to cover `description`. Closes the asymmetry where whyMatters was sanitised and description wasn't — untrusted RSS bodies now flow through the same injection-marker neutraliser before prompt interpolation. generateStoryDescription wraps the story in sanitizeStoryForPrompt before calling the builder, matching generateWhyMatters. - Four cache prefixes bumped atomically to evict pre-grounding rows: scripts/lib/brief-llm.mjs: brief:llm:description:v1 → v2 (Railway, description path) brief:llm:whymatters:v2 → v3 (Railway, whyMatters fallback) api/internal/brief-why-matters.ts: brief:llm:whymatters:v6 → v7 (edge, primary) brief:llm:whymatters:shadow:v4 → shadow:v5 (edge, shadow) hashBriefStory already includes description in the 6-field material (v5 contract) so identity naturally drifts; the prefix bump is the belt-and-braces that guarantees a clean cold-start on first tick. - Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs. Covers Context-line injection, empty/dup-of-headline rejection, 400-char clip, sanitisation of adversarial descriptions, v2 write, and legacy-v1 row dark (forced cold-start). Covers R4 + new sanitisation requirement. * feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6) SummarizeArticle now grounds on per-headline article bodies when callers supply them, so the dashboard "News summary" path stops hallucinating across unrelated headlines when the upstream RSS carried context. Three coordinated changes: 1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry through sanitizeForPrompt (same trust treatment as geoContext — bodies are untrusted RSS text), clips to 400 chars, and pads to the headlines length so pair-wise identity is stable. 2. buildArticlePrompts accepts optional bodies and interleaves a ` Context: <body>` line under each numbered headline that has a non-empty body. 
Skipped in translate mode (headline[0]-only) and when all bodies are empty — yielding a byte-identical prompt to pre-U6 for every current caller (R6 preserved). 3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows (produced from headline-only prompts) cold-start cleanly. Extends canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to avoid colliding with `:brief:` when pattern-matching keys. Translate mode is headline[0]-only and intentionally does not shift on bodies. Dedup reorder preserved: the handler re-pairs bodies to the deduplicated top-5 via findIndex, so layout matches without breaking cache identity. New tests: 7 on buildArticlePrompts (bodies interleave, partial fill, translate-mode skip, clip, short-array tolerance), 8 on buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate skip). Existing summary-cache-key assertions updated v5→v6. Covers R3, R4. * feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7) Thread the RSS description from the ingestion path (U1-U5) into every user-facing LLM-adjacent surface. Audit the notification producers so RSS-origin and domain-origin events stay on distinct contracts. Dashboard (proto snippet → client → panel): - src/types/index.ts NewsItem.snippet?:string (client-side field). - src/app/data-loader.ts proto→client mapper propagates p.snippet. - src/components/NewsPanel.ts renders snippet as a truncated (~200 chars, word-boundary ellipsis) `.item-snippet` line under each headline. - NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with currentHeadlines; passed as options.bodies to generateSummary so the server-side SummarizeArticle LLM grounds on the article body. 
Summary plumbing: - src/services/summarization.ts threads bodies through SummarizeOptions → generateSummary → runApiChain → tryApiProvider; cache key now includes bodies (via U6's buildSummaryCacheKey signature). MCP world-brief: - api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies` to /api/news/v1/summarize-article so the MCP tool surface is no longer starved. Email digest: - scripts/seed-digest-notifications.mjs plain-text formatDigest appends a ~200-char truncated snippet line under each story; HTML formatDigestHtml renders a dim-grey description div between title and meta. Both gated on non-empty description (R6 — empty → today's behavior). Real-time alerts: - src/services/breaking-news-alerts.ts BreakingAlert gains optional description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert includes `description` in the /api/notify payload when present. Notification relay: - scripts/notification-relay.cjs formatMessage gated on NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin payloads render a `> <snippet>` context line under the title. When off or payload.description absent, output is byte-identical to pre-U7. Audit (RSS vs domain): - tests/notification-relay-payload-audit.test.mjs enforces file-level @notification-source tags on every producer, rejects `description:` in domain-origin payload blocks, and verifies the relay codepath gates snippet rendering under the flag. - Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain), alert-emitter.mjs (domain), breaking-news-alerts.ts (rss). Deferred (plan explicitly flags): InsightsPanel + cluster-producer plumbing (bodies default to [] — will unlock gradually once news:insights:v1 producer also carries primarySnippet). Covers R5, R6. 
* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8) Final verification for the RSS-description-end-to-end fix: - docs/architecture.mdx — one-paragraph "News Grounding Pipeline" subsection tracing parser → story:track:v1.description → NewsItem.snippet → brief / SummarizeArticle / dashboard / email / relay / MCP, with the empty-description R6 fallback rule called out explicitly. - tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated to match the v6 bump from U6. Without this the summary cache bump silently regressed CI's pinned-version assertion. Final sweep (2026-04-24): - grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test simulation (by design: proves the v2 bump forces cold-start). - grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references. - grep -rn 'summary:v5' → no references. - CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts. - Full tsx --test sweep across all tests/.test.{mjs,mts}: 6747/6747 pass. - npm run typecheck + typecheck:api: both clean. Covers R4, R6, R7. fix(rss-description): address /ce:review findings before merge 14 fixes from structured code review across 13 reviewer personas. Correctness-critical (P1 — fixes that prevent R6/U7 contract violations): - NewsPanel signature covers currentBodies so view-mode toggles that leave headlines identical but bodies different now invalidate in-flight summaries. Without this, switching renderItems → renderClusters mid-summary let a grounded response arrive under a stale (now-orphaned) cache key. - summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by position in light-sanitized headlines while findIndex looked up the full-sanitized array — any headline that sanitizeHeadlines emptied mispaired every subsequent body, grounding the LLM on the wrong story. 
- Client skips the pre-chain cache lookup when bodies are present, since client builds keys from RAW bodies while server sanitizes first. The keys diverge on injection content, which would silently miss the server's authoritative cache every call. Test + audit hardening: - Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix instead of a literal "somehash", so a bug where the reader still queried the v1 prefix at the real key would actually be caught. - tests/summary-cache-key.test.mts adds 400-char clip identity coverage so the canonicalizer's clip and any downstream clip can't silently drift. - tests/news-rss-description-extract.test.mts renames the well-formed CDATA test and adds a new test documenting the malformed-]]> fallback behavior (plain regex captures, article content survives). Safe_auto cleanups: - Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs. - BETA-mode groq warm call now passes bodies, warming the right cache slot. - seed-digest shares a local normalize-equality helper for description != headline comparison, matching the parser's contract. - Pair-wise sort in summary-cache-key tie-breaks on body so duplicate headlines produce stable order across runs. - buildSummaryCacheKey gained JSDoc documenting the client/server contract and the bodies parameter semantics. - MCP get_world_brief tool description now mentions RSS article-body grounding so calling agents see the current contract. - _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`. - extractRawTagBody regexes cached in module-level Map, mirroring the existing TAG_REGEX_CACHE pattern. 
Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain (Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the local browser T5 path at tryBrowserT5 was still summarising from headlines alone. In BETA_MODE that ungrounded path runs BEFORE the grounded server providers; in normal mode it remains the last fallback. Whenever T5-small won, the dashboard summary surface regressed to the headline-only path, the exact hallucination class this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and interleaves each body with its paired headline via a `headline — body` separator in the combined text (clipped to 200 chars per body to stay within T5-small's ~512-token context window). All three call sites (BETA warm, BETA cold, normal-mode fallback) now pass the bodies threaded down from generateSummary options.bodies. When bodies is empty/omitted, the combined text is byte-identical to pre-fix (R6 preserved).

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body from an earlier mention of the same normalized title), declining to change.
The current rule ("if this mention has a body, overwrite; otherwise leave the prior body alone") is defensible: a body from mention A is not falsified by mention B being body-less (a wire reprint doesn't invalidate the original source's body). A feed that publishes a corrected headline creates a new normalized-title hash, so no stale body carries forward. The failure window is narrow (live story evolving while keeping the same title through hours of body-less wire reprints) and the 7-day STORY_TTL is the backstop. Opening a follow-up issue to revisit semantics if real-world evidence surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous response declined the fix with reasoning; on reflection the argument was over-defending the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only when non-empty. Because story:track:v1 rows are collapsed by normalized-title hash, an earlier mention's body would persist for up to STORY_TTL (7 days) on subsequent body-less mentions of the same story. Consumers reading `track.description` via HGETALL could not distinguish "this mention's body" from "some mention's body from the last week," silently grounding brief / whyMatters / SummarizeArticle LLMs on text the current mention never supplied. That violates the grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention (empty string when the current item has no body, real body when it does). An empty value overwrites any prior mention's body so the row is always authoritative for the current cycle. Consumers continue to treat empty description as "fall back to cleaned headline" (R6 preserved). The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed B (AP body-less wire reprint) arrives for the same normalized title, even though Feed A's body is factually correct. Rationale: the alternative (keeping Feed A's body indefinitely) means the user sees Feed A's body attributed (by proximity) to an AP mention at a later timestamp, which is at minimum misleading and at worst carries retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty → T2 new body), existing "writes description when non-empty" preserved, existing "omits when empty" inverted to "writes empty, overwriting".

cache-keys.ts contract comment updated to mark description as always-written rather than optional.	2026-04-24 16:25:14 +04:00
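Illustration of the always-written `description` rule from the fix(story-track) commit above. This is a rough sketch under stated assumptions, not the project's buildStoryTrackHsetFields: the `Mention` shape, both builder functions, and the in-memory `hset` helper are hypothetical, and the hash row is simulated with a plain object instead of Redis. It replays the T0 body → T1 empty → T2 new body sequence under the old additive rule and under the unconditional rule.

```typescript
type HashRow = Record<string, string>;

// Hypothetical mention shape (assumption for this sketch).
interface Mention {
  title: string;
  description?: string;
}

// Old rule: write `description` only when the mention carries a body.
function buildFieldsAdditive(m: Mention): HashRow {
  const fields: HashRow = { title: m.title };
  if (m.description) fields.description = m.description;
  return fields;
}

// New rule: always write `description`, using the empty string when there is no body.
function buildFieldsAlwaysWritten(m: Mention): HashRow {
  return { title: m.title, description: m.description ?? "" };
}

// In-memory stand-in for HSET: named fields are overwritten, other fields are left untouched.
function hset(row: HashRow, fields: HashRow): HashRow {
  return { ...row, ...fields };
}

// Replays mentions against one row and records the description visible after each write.
function replay(mentions: Mention[], build: (m: Mention) => HashRow): string[] {
  let row: HashRow = {};
  return mentions.map((m) => {
    row = hset(row, build(m));
    return row.description ?? "";
  });
}

const sequence: Mention[] = [
  { title: "Same story", description: "body from T0" },
  { title: "Same story" },
  { title: "Same story", description: "body from T2" },
];

console.log(replay(sequence, buildFieldsAdditive));
console.log(replay(sequence, buildFieldsAlwaysWritten));
```

Under the additive builder the T1 read still returns the T0 body. Under the always-written builder the T1 read returns an empty string, which consumers are described as treating as the signal to fall back to the cleaned headline.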
Sebastien Melki	fcbb8bc0a1	feat(proto): unified OpenAPI bundle via sebuf v0.11.0 (#3341)

* feat(proto): generate unified worldmonitor.openapi.yaml bundle

Adds a third protoc-gen-openapiv3 invocation that merges every service into a single docs/api/worldmonitor.openapi.yaml spanning all 68 RPCs, using the new bundle support shipped in sebuf 0.11.0 (SebastienMelki/sebuf#158). Per-service YAML/JSON files are untouched and continue to back the Mintlify docs in docs/docs.json. The bundle runs with strategy: all and bundle_only=true so only the aggregate file is emitted, avoiding duplicate-output conflicts with the existing per-service generator.

Requires protoc-gen-openapiv3 >= v0.11.0 locally:
go install github.com/SebastienMelki/sebuf/cmd/protoc-gen-openapiv3@v0.11.0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(proto): bump sebuf to v0.11.0 and document unified OpenAPI bundle

- Makefile: SEBUF_VERSION v0.7.0 → v0.11.0 (required for bundle support).
- proto/buf.gen.yaml: point bundle_server at https://api.worldmonitor.app.
- CONTRIBUTING.md: new "OpenAPI Output" section covering per-service specs vs the unified worldmonitor.openapi.yaml bundle, plus a note that all three sebuf plugins must be installed from the pinned version.
- AGENTS.md: clarify that `make generate` also produces the unified spec and requires sebuf v0.11.0.
- CHANGELOG.md: Unreleased entry announcing the bundle and version bump.

Also regenerates the bundle with the updated server URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(codegen): regenerate TS client/server with sebuf v0.11.0

Mechanical output of the bumped protoc-gen-ts-client and protoc-gen-ts-server. Two behavioral improvements roll in from sebuf:
- Proto enum fields now use the proper `_UNSPECIFIED` sentinel in default-value checks instead of the empty string, so generated query-string serializers correctly omit enum params only when they actually equal the proto default.
- `repeated string` query params now serialize via `forEach(v => params.append(...))` instead of being coerced through `String(req.field)`, matching the existing `parseStringArray()` contract on the server side.

All files also drop the `// @ts-nocheck` header that earlier sebuf versions emitted; 0.11.0 output type-checks cleanly under our tsconfig. No hand edits. Reproduce with `make install-plugins && make generate`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(proto): bump sebuf v0.11.0 → v0.11.1, realign tests with repeated-param wire format

- Bump SEBUF_VERSION to v0.11.1, pulling in the OpenAPI fix for repeated scalar query params (SebastienMelki/sebuf#161). `repeated string` fields now emit `type: array` + `items.type: string` + `style: form` + `explode: true` instead of `type: string`, so SDK generators consuming the unified bundle produce correct array clients.
- Regenerate all 12 OpenAPI specs (unified bundle + Aviation, Economic, Infrastructure, Market, Trade per-service). TS client/server codegen is byte-identical to v0.11.0; only the OpenAPI emitter was out of sync.
- Update three tests that asserted the pre-v0.11 comma-joined wire format (`symbols=AAPL,MSFT`) to match the current repeated-param form (`symbols=AAPL&symbols=MSFT`) produced by `params.append(...)`:
  - tests/market-service-symbol-casing.test.mjs (2 cases: getAll)
  - tests/stock-analysis-history.test.mts
  - tests/stock-backtest.test.mts

Locally: test:data 6619/6619 pass, typecheck clean, lint exit 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-04-23 16:24:03 +03:00
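Illustration of the two wire formats contrasted in the commit above. This is a small sketch that uses only the standard `URLSearchParams` API; the helper names are hypothetical and it is not the generated sebuf client code. It contrasts coercing an array through `String(...)` with appending one query parameter per value, and shows what a `getAll`-style read sees in each case.

```typescript
// Pre-v0.11 style (as described above): the whole array is coerced into one value.
function serializeCommaJoined(name: string, values: string[]): string {
  const params = new URLSearchParams();
  params.set(name, String(values)); // String(["AAPL", "MSFT"]) is "AAPL,MSFT"
  return params.toString();
}

// Repeated-param style: one `name=value` pair per element.
function serializeRepeated(name: string, values: string[]): string {
  const params = new URLSearchParams();
  values.forEach((v) => params.append(name, v));
  return params.toString();
}

// Reads every occurrence of `name` from a query string.
function readAll(query: string, name: string): string[] {
  return new URLSearchParams(query).getAll(name);
}

const symbols = ["AAPL", "MSFT"];

const joined = serializeCommaJoined("symbols", symbols);
const repeated = serializeRepeated("symbols", symbols);

// URLSearchParams percent-encodes the comma, so the joined form prints as symbols=AAPL%2CMSFT.
console.log(joined, readAll(joined, "symbols"));
// The repeated form prints as symbols=AAPL&symbols=MSFT and reads back as two values.
console.log(repeated, readAll(repeated, "symbols"));
```

With the comma-joined form the reader sees a single string and has to split it itself. With the repeated form `getAll` returns one entry per value, which matches the `type: array` with `explode: true` description in the regenerated OpenAPI output.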

4 Commits