worldmonitor/docs/api/NewsService.openapi.yaml
Elie Habib 80cb7d5aa7 fix(cache): digest TTL alignment + slow-browser tier + feedStatuses trim (#1798)
* fix(cache): align Redis digest + RSS feed TTLs to CF CDN TTL

RSS feed TTL 600s → 3600s; digest TTL 900s → 3600s.
CF CDN caches at 3600s, so Redis expiring earlier caused every hourly
CF revalidation to hit a cold origin and run the full buildDigest()
pipeline (75 feeds, up to 25s). Aligning both to 3600s ensures CF
revalidation gets a warm Redis hit and returns immediately.

* fix(cache): emit only non-ok feedStatuses; update proto comment + make generate

Digest was emitting 'ok' for every successful feed (~50 entries, ~1-2KB
per response). No in-repo client reads feedStatuses values. Changed to
only emit 'empty' and 'timeout'; absent key implies ok.
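
A client that does want to read feedStatuses has to treat a missing key as success. The sketch below shows one way to do that; the helper name and types are hypothetical and do not come from the repository.

```typescript
// Statuses the digest may emit; "ok" is never written and is implied by absence.
type EmittedFeedStatus = "empty" | "timeout";
type FeedStatus = EmittedFeedStatus | "ok";

// Hypothetical client-side helper, not part of the repository.
function feedStatusOf(
  feedStatuses: Record<string, string> | undefined,
  feedName: string,
): FeedStatus {
  const status = feedStatuses ? feedStatuses[feedName] : undefined;
  if (status === "empty" || status === "timeout") return status;
  // Absent key (or any other value, such as a legacy "ok") is treated as ok.
  return "ok";
}
```

Treating unknown values as ok is a design choice of this sketch; a stricter client could surface them instead.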

Updated proto comment to document the absence-implies-ok contract and
ran make generate to regenerate docs/api/ OpenAPI files.

* fix(cache): add slow-browser tier; move digest route to it

New 'slow-browser' tier is identical to 'slow' but adds max-age=300,
letting browsers skip the network for 5 minutes. Without max-age,
browsers ignore s-maxage and send conditional If-None-Match on every
20-min poll — each costing 1 billable edge request even for 304s.

Scoped only to list-feed-digest (a safe polling endpoint). Premium
user-triggered endpoints (analyze-stock, backtest-stock) stay on 'slow'
where browser caching is inappropriate.
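
As a rough illustration of how the two tiers differ, the sketch below derives 'slow-browser' from 'slow' by adding max-age only. The CacheTier shape, the cacheControl() helper, the "public" directive, and the s-maxage value are assumptions made for this example; the gateway's actual tier table may look different.

```typescript
// Illustrative tier table; the real gateway's values and header layout may differ.
interface CacheTier {
  sMaxAge: number; // shared-cache (edge/CDN) freshness, in seconds
  maxAge?: number; // browser freshness, in seconds; omitted means no browser caching
}

const SLOW: CacheTier = { sMaxAge: 1800 }; // assumed value
const SLOW_BROWSER: CacheTier = { sMaxAge: SLOW.sMaxAge, maxAge: 300 };

// Builds a Cache-Control value; max-age is emitted only when the tier defines it.
function cacheControl(tier: CacheTier): string {
  const parts = ["public"];
  if (tier.maxAge !== undefined) parts.push(`max-age=${tier.maxAge}`);
  parts.push(`s-maxage=${tier.sMaxAge}`);
  return parts.join(", ");
}
```

With these assumed values, cacheControl(SLOW) yields "public, s-maxage=1800" and cacheControl(SLOW_BROWSER) yields "public, max-age=300, s-maxage=1800".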

* test: regression tests for feedStatuses and slow-browser tier

- digest-no-reclassify: assert buildDigest does not write 'ok' to feedStatuses
- route-cache-tier: include slow-browser in tier regex; assert slow-browser
  has max-age and slow tier does not

* fix(cache): add variant to per-feed RSS cache key

rss:feed:v1:${url} was shared across variants even though classifyByKeyword()
bakes variant-specific threat/category labels into the cached ParsedItem[].
Feeds shared between full and tech variants (Verge, Ars, HN, etc.) had
whichever variant populated the cache first control the other variant's
classifications for the full 3600s TTL — turning a pre-existing 10-minute
bleed-through into a 1-hour accuracy bug for the tech dashboard.

Fix: key is now rss:feed:v1:${variant}:${url}.
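
A minimal sketch of the new key shape, assuming a hypothetical rssFeedCacheKey() helper (the variant names are taken from the list-feed-digest parameter description):

```typescript
type Variant = "full" | "tech" | "finance" | "happy";

// Hypothetical helper: the variant is part of the key, so each variant
// caches its own classified ParsedItem[] for the same feed URL.
function rssFeedCacheKey(variant: Variant, url: string): string {
  return `rss:feed:v1:${variant}:${url}`;
}
```

The same feed URL now maps to distinct keys for the full and tech variants, so neither can overwrite the other's classifications.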

* fix(cache): bypass browser HTTP cache on digest fetch

max-age=300 on the slow-browser tier lets browsers serve the digest
from their HTTP cache for up to 5 minutes, including on explicit
in-app refresh (window.location.reload) or page reload after a
breaking event. Users would see stale data until the TTL expired.

Add cache: 'no-cache' to tryFetchDigest() so every fetch revalidates
against CF edge. CF returns 304 (minimal cost) when data is unchanged,
or 200 with the current digest. s-maxage and CF-level caching are
unaffected; max-age still benefits browser back/forward cache.
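
The request side of this change can be sketched as follows. buildDigestRequest() and its return shape are hypothetical; the only parts taken from the source are the endpoint path, the variant and lang query parameters, and the cache: 'no-cache' fetch option.

```typescript
interface DigestRequest {
  url: string;
  init: { cache: "no-cache" };
}

// Hypothetical helper that assembles the arguments for fetch().
function buildDigestRequest(variant: string, lang: string): DigestRequest {
  const query = `variant=${encodeURIComponent(variant)}&lang=${encodeURIComponent(lang)}`;
  return {
    url: `/api/news/v1/list-feed-digest?${query}`,
    // "no-cache" keeps the cached copy but revalidates it before every reuse.
    init: { cache: "no-cache" },
  };
}
```

A caller would pass the result straight through, for example fetch(req.url, req.init), so a cached copy is always checked against the edge before it is reused.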

* fix(cache): 15-min consistent TTL + degrade guard for digest

Issue 1 — TTL alignment: Redis digest TTL reverted to 900s (from 3600).
slow-browser tier reduced from s-maxage=1800/CDN=3600 to s-maxage=900 on
both sides, matching the Redis TTL. The freshness window is now consistently
15 minutes across Redis, Vercel edge, and CF CDN. max-age=300 (browser
local) is kept to avoid unnecessary revalidations on tab switch.
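
The alignment rule can be written down as a small check. This is only a sketch: the constant and function names are hypothetical, and the numbers are the ones quoted in the paragraph above.

```typescript
// Values taken from the note above; the names are hypothetical.
const DIGEST_TTLS = {
  redisSeconds: 900,
  edgeSMaxAgeSeconds: 900,
  cdnSeconds: 900,
  browserMaxAgeSeconds: 300,
};

// Returns a list of misalignments; an empty list means the layers agree.
function ttlProblems(t: typeof DIGEST_TTLS): string[] {
  const problems: string[] = [];
  if (t.redisSeconds !== t.edgeSMaxAgeSeconds || t.redisSeconds !== t.cdnSeconds) {
    problems.push("shared-cache TTLs differ from the Redis TTL");
  }
  if (t.browserMaxAgeSeconds > t.redisSeconds) {
    problems.push("browser max-age outlives the server-side freshness window");
  }
  return problems;
}
```

A check like this could sit in a unit test so that a later change to one layer's TTL does not silently drift from the others.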

Issue 2 — Cache poisoning: replaced cachedFetchJson in listFeedDigest with
explicit getCachedJson/setCachedJson. After buildDigest(), if total items
across all categories is 0 the response is treated as degraded: Redis write
is skipped and markNoCacheResponse(ctx.request) is called so the gateway
sets Cache-Control: no-store instead of the normal tier headers. This
prevents a transient bad run from poisoning Redis and browser/CDN for the
full TTL. Error paths also call markNoCacheResponse.
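
The degrade guard boils down to one decision after buildDigest() returns. The sketch below isolates that decision as a pure function; Digest, totalItems(), and planDigestCaching() are hypothetical names, and the real handler wires the result to setCachedJson and markNoCacheResponse rather than returning a plan object.

```typescript
// Minimal digest shape for this sketch: category name -> bucket of items.
interface Digest {
  categories: Record<string, { items: unknown[] }>;
}

// Sums the item counts across all category buckets.
function totalItems(digest: Digest): number {
  let total = 0;
  for (const name of Object.keys(digest.categories)) {
    total += digest.categories[name].items.length;
  }
  return total;
}

// Decides what the handler should do with a freshly built digest.
function planDigestCaching(digest: Digest): { writeRedis: boolean; noStore: boolean } {
  const degraded = totalItems(digest) === 0;
  // Degraded run: skip the Redis write and ask the gateway for no-store.
  return { writeRedis: !degraded, noStore: degraded };
}
```

Keeping the decision separate from the I/O makes the zero-item case easy to cover in a regression test.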
2026-03-18 10:19:17 +04:00

338 lines
14 KiB
YAML

openapi: 3.1.0
info:
  title: NewsService API
  version: 1.0.0
paths:
  /api/news/v1/summarize-article:
    post:
      tags:
        - NewsService
      summary: SummarizeArticle
      description: SummarizeArticle generates an LLM summary with provider selection and fallback support.
      operationId: SummarizeArticle
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SummarizeArticleRequest'
        required: true
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SummarizeArticleResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
  /api/news/v1/summarize-article-cache:
    get:
      tags:
        - NewsService
      summary: GetSummarizeArticleCache
      description: GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).
      operationId: GetSummarizeArticleCache
      parameters:
        - name: cache_key
          in: query
          description: Deterministic cache key computed by buildSummaryCacheKey().
          required: false
          schema:
            type: string
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SummarizeArticleResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
  /api/news/v1/list-feed-digest:
    get:
      tags:
        - NewsService
      summary: ListFeedDigest
      description: ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.
      operationId: ListFeedDigest
      parameters:
        - name: variant
          in: query
          description: 'Site variant: full, tech, finance, happy'
          required: false
          schema:
            type: string
        - name: lang
          in: query
          description: ISO 639-1 language code (en, fr, ar, etc.)
          required: false
          schema:
            type: string
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ListFeedDigestResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  schemas:
    Error:
      type: object
      properties:
        message:
          type: string
          description: Error message (e.g., 'user not found', 'database connection failed')
      description: Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.
    FieldViolation:
      type: object
      properties:
        field:
          type: string
          description: The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')
        description:
          type: string
          description: Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')
      required:
        - field
        - description
      description: FieldViolation describes a single validation error for a specific field.
    ValidationError:
      type: object
      properties:
        violations:
          type: array
          items:
            $ref: '#/components/schemas/FieldViolation'
          description: List of validation violations
      required:
        - violations
      description: ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.
    SummarizeArticleRequest:
      type: object
      properties:
        provider:
          type: string
          minLength: 1
          description: 'LLM provider: "ollama", "groq", "openrouter"'
        headlines:
          type: array
          items:
            type: string
            minItems: 1
            description: Headlines to summarize (max 8 used).
          minItems: 1
        mode:
          type: string
          description: 'Summarization mode: "brief", "analysis", "translate", "" (default).'
        geoContext:
          type: string
          description: Geographic signal context to include in the prompt.
        variant:
          type: string
          description: 'Variant: "full", "tech", or target language for translate mode.'
        lang:
          type: string
          description: Output language code, default "en".
      required:
        - provider
      description: SummarizeArticleRequest specifies parameters for LLM article summarization.
    SummarizeArticleResponse:
      type: object
      properties:
        summary:
          type: string
          description: The generated summary text.
        model:
          type: string
          description: Model identifier used for generation.
        provider:
          type: string
          description: Provider that produced the result (or "cache").
        tokens:
          type: integer
          format: int32
          description: Token count from the LLM response.
        fallback:
          type: boolean
          description: Whether the client should try the next provider in the fallback chain.
        error:
          type: string
          description: Error message if the request failed.
        errorType:
          type: string
          description: Error type/name (e.g. "TypeError").
        status:
          type: string
          enum:
            - SUMMARIZE_STATUS_UNSPECIFIED
            - SUMMARIZE_STATUS_SUCCESS
            - SUMMARIZE_STATUS_CACHED
            - SUMMARIZE_STATUS_SKIPPED
            - SUMMARIZE_STATUS_ERROR
          description: SummarizeStatus indicates the outcome of a summarization request.
        statusDetail:
          type: string
          description: Human-readable detail for non-success statuses (skip reason, etc.).
      description: SummarizeArticleResponse contains the LLM summarization result.
    GetSummarizeArticleCacheRequest:
      type: object
      properties:
        cacheKey:
          type: string
          description: Deterministic cache key computed by buildSummaryCacheKey().
      description: GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.
    ListFeedDigestRequest:
      type: object
      properties:
        variant:
          type: string
          description: 'Site variant: full, tech, finance, happy'
        lang:
          type: string
          description: ISO 639-1 language code (en, fr, ar, etc.)
    ListFeedDigestResponse:
      type: object
      properties:
        categories:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/CategoryBucket'
          description: Per-category buckets — keys match category names from feed config
        feedStatuses:
          type: object
          additionalProperties:
            type: string
          description: |-
            Per-feed status — only non-ok states emitted; absent key implies ok.
            Values: empty (feed returned 0 items), timeout (timed out during fetch).
        generatedAt:
          type: string
          description: ISO 8601 timestamp of when this digest was generated
    CategoriesEntry:
      type: object
      properties:
        key:
          type: string
        value:
          $ref: '#/components/schemas/CategoryBucket'
    FeedStatusesEntry:
      type: object
      properties:
        key:
          type: string
        value:
          type: string
    CategoryBucket:
      type: object
      properties:
        items:
          type: array
          items:
            $ref: '#/components/schemas/NewsItem'
    NewsItem:
      type: object
      properties:
        source:
          type: string
          minLength: 1
          description: Source feed name.
        title:
          type: string
          minLength: 1
          description: Article headline.
        link:
          type: string
          description: Article URL.
        publishedAt:
          type: integer
          format: int64
          description: 'Publication time, as Unix epoch milliseconds. Warning: Values > 2^53 may lose precision in JavaScript'
        isAlert:
          type: boolean
          description: Whether this article triggered an alert condition.
        threat:
          $ref: '#/components/schemas/ThreatClassification'
        location:
          $ref: '#/components/schemas/GeoCoordinates'
        locationName:
          type: string
          description: Human-readable location name.
      required:
        - source
        - title
      description: NewsItem represents a single news article from RSS feed aggregation.
    ThreatClassification:
      type: object
      properties:
        level:
          type: string
          enum:
            - THREAT_LEVEL_UNSPECIFIED
            - THREAT_LEVEL_LOW
            - THREAT_LEVEL_MEDIUM
            - THREAT_LEVEL_HIGH
            - THREAT_LEVEL_CRITICAL
          description: ThreatLevel represents the assessed threat level of a news event.
        category:
          type: string
          description: Event category.
        confidence:
          type: number
          maximum: 1
          minimum: 0
          format: double
          description: Confidence score (0.0 to 1.0).
        source:
          type: string
          description: Classification source — "keyword", "ml", or "llm".
      description: ThreatClassification represents an AI-assessed threat level for a news item.
    GeoCoordinates:
      type: object
      properties:
        latitude:
          type: number
          maximum: 90
          minimum: -90
          format: double
          description: Latitude in decimal degrees (-90 to 90).
        longitude:
          type: number
          maximum: 180
          minimum: -180
          format: double
          description: Longitude in decimal degrees (-180 to 180).
      description: GeoCoordinates represents a geographic location using WGS84 coordinates.
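
For readers consuming this schema from TypeScript, the sketch below shows one way to handle the publishedAt precision warning on NewsItem. The interface is a hand-written subset of the schema, and publishedDate() is a hypothetical helper, not generated code.

```typescript
// Subset of the NewsItem schema; only `source` and `title` are required.
interface NewsItem {
  source: string;
  title: string;
  link?: string;
  publishedAt?: number; // Unix epoch milliseconds (int64 in the schema)
}

const MAX_SAFE_MS = 9007199254740991; // 2^53 - 1, largest exactly representable integer

// Hypothetical helper: returns null when the timestamp is missing or outside
// the range JavaScript numbers can represent exactly.
function publishedDate(item: NewsItem): Date | null {
  const ms = item.publishedAt;
  if (ms === undefined || !isFinite(ms) || Math.abs(ms) > MAX_SAFE_MS) return null;
  return new Date(ms);
}
```

Real timestamps are far below the 2^53 limit, so the guard mainly protects against corrupted or sentinel values.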