mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-26 01:24:59 +02:00
* feat(scoring): add composite importance score + story tracking infrastructure
- Extract SOURCE_TIERS/getSourceTier to server/_shared/source-tiers.ts so
server handlers can import it without pulling in client-only modules;
src/config/feeds.ts re-exports for backward compatibility
- Add story tracking Redis key helpers to cache-keys.ts
(story:track:v1, story:sources:v1, story:peak:v1, digest:accumulator:v1)
- Export SEVERITY_SCORES from _classifier.ts for server-side score math
- Add upstashPipeline() to redis.ts for arbitrary batched Redis writes
- Add importanceScore/corroborationCount/storyPhase fields to proto,
generated TS, and src/types NewsItem
- Add StoryMeta message and StoryPhase enum to proto
- In list-feed-digest.ts:
- Build corroboration map across full corpus BEFORE per-category truncation
- Compute importanceScore (severity 40% + tier 20% + corroboration 30%
+ recency 10%) per item
- Sort by importanceScore desc before truncating at MAX_ITEMS_PER_CATEGORY
- Write story:track / story:sources / story:peak / digest:accumulator
to Redis in 80-story pipeline batches after each digest build
Score gate in notification-relay.cjs follows in the next PR (shadow mode,
behind IMPORTANCE_SCORE_LIVE flag). RELAY_GATES_READY removal of
/api/notify comes after 48h shadow comparison confirms parity.
* fix(scoring): add storyPhase field + regenerate proto types
- Add storyPhase to ParsedItem and toProtoItem (defaults UNSPECIFIED)
- Regenerate service_server.ts: required fields, StoryPhase type relocated
- Regenerate service_client.ts and OpenAPI docs from buf generate
- Fix typecheck:api error on missing required storyPhase in NewsItem
* fix(scoring): address all code review findings from PR #2604
P1:
- await writeStoryTracking instead of fire-and-forget to prevent
silent data loss on edge isolate teardown
- remove duplicate upstashPipeline; use existing runRedisPipeline
- strip non-https links before Redis write (XSS prevention)
- implement storyPhase read path: HGETALL batch + computePhase()
so BREAKING/DEVELOPING/SUSTAINED/FADING badges are now live
P2/P3:
- extend STORY_TTL 48h → 7 days (sustained stories no longer reset)
- extract SCORE_WEIGHTS named constants with rationale comment
- move SEVERITY_SCORES out of _classifier.ts into list-feed-digest.ts
- add normalizeTitle comment referencing todo #102
- pre-compute title hashes once, share between phase read + write
* fix(scoring): correct enrichment-before-scoring and write-before-read ordering
Two sequencing bugs:
1. enrichWithAiCache ran after truncation (post-slice), so items whose
threat level was upgraded by the LLM cache could have already been
cut from the top-20, and downgraded items kept inflated scores.
Fix: enrich ALL items from the full corpus before scoring, so
importanceScore always uses the final post-LLM classification level.
2. Phase HGETALL read happened before writeStoryTracking, meaning
first-time stories had no Redis entry and always returned UNSPECIFIED
instead of BREAKING, and all existing stories lagged one cycle behind.
Fix: write tracking first, then read back for phase assignment.
1 line
9.6 KiB
JSON
1 line
9.6 KiB
JSON
{"components":{"schemas":{"CategoriesEntry":{"properties":{"key":{"type":"string"},"value":{"$ref":"#/components/schemas/CategoryBucket"}},"type":"object"},"CategoryBucket":{"properties":{"items":{"items":{"$ref":"#/components/schemas/NewsItem"},"type":"array"}},"type":"object"},"Error":{"description":"Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.","properties":{"message":{"description":"Error message (e.g., 'user not found', 'database connection failed')","type":"string"}},"type":"object"},"FeedStatusesEntry":{"properties":{"key":{"type":"string"},"value":{"type":"string"}},"type":"object"},"FieldViolation":{"description":"FieldViolation describes a single validation error for a specific field.","properties":{"description":{"description":"Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')","type":"string"},"field":{"description":"The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')","type":"string"}},"required":["field","description"],"type":"object"},"GeoCoordinates":{"description":"GeoCoordinates represents a geographic location using WGS84 coordinates.","properties":{"latitude":{"description":"Latitude in decimal degrees (-90 to 90).","format":"double","maximum":90,"minimum":-90,"type":"number"},"longitude":{"description":"Longitude in decimal degrees (-180 to 180).","format":"double","maximum":180,"minimum":-180,"type":"number"}},"type":"object"},"GetSummarizeArticleCacheRequest":{"description":"GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.","properties":{"cacheKey":{"description":"Deterministic cache key computed by buildSummaryCacheKey().","type":"string"}},"type":"object"},"ListFeedDigestRequest":{"properties":{"lang":{"description":"ISO 639-1 language code (en, fr, ar, etc.)","type":"string"},"variant":{"description":"Site variant: full, tech, finance, happy","type":"string"}},"type":"object"},"ListFeedDigestResponse":{"properties":{"categories":{"additionalProperties":{"$ref":"#/components/schemas/CategoryBucket"},"description":"Per-category buckets — keys match category names from feed config","type":"object"},"feedStatuses":{"additionalProperties":{"type":"string"},"description":"Per-feed status — only non-ok states emitted; absent key implies ok.\n Values: empty (feed returned 0 items), timeout (timed out during fetch).","type":"object"},"generatedAt":{"description":"ISO 8601 timestamp of when this digest was generated","type":"string"}},"type":"object"},"NewsItem":{"description":"NewsItem represents a single news article from RSS feed aggregation.","properties":{"corroborationCount":{"description":"Number of distinct sources that reported the same story in this digest cycle.","format":"int32","type":"integer"},"importanceScore":{"description":"Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.\n Absent (0) when not yet scored.","format":"int32","type":"integer"},"isAlert":{"description":"Whether this article triggered an alert condition.","type":"boolean"},"link":{"description":"Article URL.","type":"string"},"location":{"$ref":"#/components/schemas/GeoCoordinates"},"locationName":{"description":"Human-readable location name.","type":"string"},"publishedAt":{"description":"Publication time, as Unix epoch milliseconds.. Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"source":{"description":"Source feed name.","minLength":1,"type":"string"},"storyPhase":{"description":"StoryPhase represents the lifecycle stage of a tracked news story.","enum":["STORY_PHASE_UNSPECIFIED","STORY_PHASE_BREAKING","STORY_PHASE_DEVELOPING","STORY_PHASE_SUSTAINED","STORY_PHASE_FADING"],"type":"string"},"threat":{"$ref":"#/components/schemas/ThreatClassification"},"title":{"description":"Article headline.","minLength":1,"type":"string"}},"required":["source","title"],"type":"object"},"SummarizeArticleRequest":{"description":"SummarizeArticleRequest specifies parameters for LLM article summarization.","properties":{"geoContext":{"description":"Geographic signal context to include in the prompt.","type":"string"},"headlines":{"items":{"description":"Headlines to summarize (max 8 used).","minItems":1,"type":"string"},"minItems":1,"type":"array"},"lang":{"description":"Output language code, default \"en\".","type":"string"},"mode":{"description":"Summarization mode: \"brief\", \"analysis\", \"translate\", \"\" (default).","type":"string"},"provider":{"description":"LLM provider: \"ollama\", \"groq\", \"openrouter\"","minLength":1,"type":"string"},"systemAppend":{"description":"Optional system prompt append for analytical framework instructions.","type":"string"},"variant":{"description":"Variant: \"full\", \"tech\", or target language for translate mode.","type":"string"}},"required":["provider"],"type":"object"},"SummarizeArticleResponse":{"description":"SummarizeArticleResponse contains the LLM summarization result.","properties":{"error":{"description":"Error message if the request failed.","type":"string"},"errorType":{"description":"Error type/name (e.g. \"TypeError\").","type":"string"},"fallback":{"description":"Whether the client should try the next provider in the fallback chain.","type":"boolean"},"model":{"description":"Model identifier used for generation.","type":"string"},"provider":{"description":"Provider that produced the result (or \"cache\").","type":"string"},"status":{"description":"SummarizeStatus indicates the outcome of a summarization request.","enum":["SUMMARIZE_STATUS_UNSPECIFIED","SUMMARIZE_STATUS_SUCCESS","SUMMARIZE_STATUS_CACHED","SUMMARIZE_STATUS_SKIPPED","SUMMARIZE_STATUS_ERROR"],"type":"string"},"statusDetail":{"description":"Human-readable detail for non-success statuses (skip reason, etc.).","type":"string"},"summary":{"description":"The generated summary text.","type":"string"},"tokens":{"description":"Token count from the LLM response.","format":"int32","type":"integer"}},"type":"object"},"ThreatClassification":{"description":"ThreatClassification represents an AI-assessed threat level for a news item.","properties":{"category":{"description":"Event category.","type":"string"},"confidence":{"description":"Confidence score (0.0 to 1.0).","format":"double","maximum":1,"minimum":0,"type":"number"},"level":{"description":"ThreatLevel represents the assessed threat level of a news event.","enum":["THREAT_LEVEL_UNSPECIFIED","THREAT_LEVEL_LOW","THREAT_LEVEL_MEDIUM","THREAT_LEVEL_HIGH","THREAT_LEVEL_CRITICAL"],"type":"string"},"source":{"description":"Classification source — \"keyword\", \"ml\", or \"llm\".","type":"string"}},"type":"object"},"ValidationError":{"description":"ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.","properties":{"violations":{"description":"List of validation violations","items":{"$ref":"#/components/schemas/FieldViolation"},"type":"array"}},"required":["violations"],"type":"object"}}},"info":{"title":"NewsService API","version":"1.0.0"},"openapi":"3.1.0","paths":{"/api/news/v1/list-feed-digest":{"get":{"description":"ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.","operationId":"ListFeedDigest","parameters":[{"description":"Site variant: full, tech, finance, happy","in":"query","name":"variant","required":false,"schema":{"type":"string"}},{"description":"ISO 639-1 language code (en, fr, ar, etc.)","in":"query","name":"lang","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListFeedDigestResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"ListFeedDigest","tags":["NewsService"]}},"/api/news/v1/summarize-article":{"post":{"description":"SummarizeArticle generates an LLM summary with provider selection and fallback support.","operationId":"SummarizeArticle","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleRequest"}}},"required":true},"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"SummarizeArticle","tags":["NewsService"]}},"/api/news/v1/summarize-article-cache":{"get":{"description":"GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).","operationId":"GetSummarizeArticleCache","parameters":[{"description":"Deterministic cache key computed by buildSummaryCacheKey().","in":"query","name":"cache_key","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"GetSummarizeArticleCache","tags":["NewsService"]}}}} |