worldmonitor/scripts/seed-bundle-portwatch-port-activity.mjs
Elie Habib 0a4eff0053 feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape was
missing the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.

Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple
retries returned HTTP_STATUS 000). Not a reliable path.

The only shape ArcGIS reliably handles is per-country WHERE ISO3='X'
AND date > Y (uses the ISO3 index). Its problem was fitting 174
countries in the 420s portwatch bundle budget — solve that by giving
it its own container.
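
The per-country query shape described above can be sketched as follows. This is an illustration under stated assumptions, not the seeder's actual code: `buildCountryPageQuery`, `fetchCountryRows`, the injected `fetchPage` callback, the timestamp literal format, and the page size of 2000 are all hypothetical. Only the `ISO3` and `date` field names come from the text above.

```javascript
// Sketch only: field names ISO3 and date come from the commit message above;
// the timestamp literal format and the page size are assumptions.
function buildCountryPageQuery({ iso3, since, offset, pageSize }) {
  // Reject anything that is not a three-letter code so the value can be
  // interpolated into the WHERE clause safely.
  if (!/^[A-Z]{3}$/.test(iso3)) {
    throw new Error(`expected an ISO3 code, got: ${iso3}`);
  }
  // "2026-04-10T00:00:00.000Z" -> "2026-04-10 00:00:00"
  const stamp = since.toISOString().slice(0, 19).replace('T', ' ');
  return new URLSearchParams({
    where: `ISO3='${iso3}' AND date > timestamp '${stamp}'`,
    outFields: '*',
    orderByFields: 'date ASC',
    resultOffset: String(offset),
    resultRecordCount: String(pageSize),
    f: 'json',
  }).toString();
}

// Pages through one country. fetchPage is a hypothetical injected function
// that takes a query string and resolves to the parsed JSON response.
async function fetchCountryRows(iso3, since, fetchPage, pageSize = 2000) {
  const rows = [];
  for (let offset = 0; ; offset += pageSize) {
    const query = buildCountryPageQuery({ iso3, since, offset, pageSize });
    const body = await fetchPage(query);
    const features = body.features ?? [];
    rows.push(...features);
    // Stop when the server reports no further pages or returns nothing.
    if (!body.exceededTransferLimit || features.length === 0) break;
  }
  return rows;
}
```

Because each request filters on `ISO3` first, the row count per query is bounded by one country's history rather than the whole table. The sketch collects raw rows for brevity; the restored seeder instead folds rows into a per-port accumulator.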

Changes:

- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the
  full Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY class of bug that has
  bitten us 3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new
  assertions on the standalone bundle + Dockerfile + retry wrapper +
  ISO3 shape). Full test:data: 5883 pass. Typecheck + lint clean.
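
The single-retry behaviour listed above can be sketched as below. This is a hypothetical stand-in, not the body of fetchWithRetryOnInvalidParams: the name `retryOnceOnInvalidParams`, the `doFetch` callback, and matching on the error message text are assumptions made for illustration.

```javascript
// Hypothetical sketch; the real wrapper in the seeder may detect the error
// class differently (for example from the parsed ArcGIS error body).
const INVALID_PARAMS = /Invalid query parameters/i;

async function retryOnceOnInvalidParams(doFetch) {
  try {
    return await doFetch();
  } catch (err) {
    // Any other error class propagates immediately, without a retry.
    if (!INVALID_PARAMS.test(String(err && err.message))) throw err;
    // Second and final attempt; a repeat failure reaches the caller.
    return doFetch();
  }
}
```

Limiting the retry to one attempt on one error class keeps a genuinely malformed query from being hammered repeatedly, while still absorbing the intermittent failures seen in production.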

Post-merge Railway provisioning: see header of
seed-bundle-portwatch-port-activity.mjs for the 7-step checklist.
2026-04-20 15:21:43 +04:00


#!/usr/bin/env node
// Standalone Railway cron service for supply_chain:portwatch-ports.
//
// Split out of seed-bundle-portwatch.mjs on 2026-04-20 because ArcGIS
// Daily_Ports_Data queries scale poorly at the N-countries level: even
// with per-country ISO3-indexed WHERE clauses + concurrency 12, wall
// time exceeded the bundle's 540s budget. Globalising the fetch (PR
// #3225) traded timeouts for a different failure mode (42s full-table
// scans + intermittent "Invalid query parameters"). Giving this seeder
// its own container decouples its worst-case runtime from the main
// portwatch bundle and lets it run on an interval appropriate to the
// ~10-day upstream dataset lag.
//
// Railway service provisioning checklist (after merge):
// 1. Create new service: portwatch-port-activity-seed
// 2. Builder: DOCKERFILE, dockerfilePath: Dockerfile.seed-bundle-portwatch-port-activity
// 3. Root directory: "" (empty), which avoids NIXPACKS auto-detection (see
//    feedback_railway_dockerfile_autodetect_overrides_builder.md)
// 4. Cron schedule: "0 0 * * *" (daily at 00:00 UTC). Dataset lag means
//    12h cadence is overkill; 24h keeps us inside the freshness
//    expectations downstream
// 5. Env vars (copy from existing seed services):
//      UPSTASH_REDIS_REST_URL, UPSTASH_REDIS_REST_TOKEN,
//      PROXY_URL (for 429 fallback)
// 6. Watch paths (in service settings):
//      scripts/seed-portwatch-port-activity.mjs,
//      scripts/seed-bundle-portwatch-port-activity.mjs,
//      scripts/_seed-utils.mjs,
//      scripts/_proxy-utils.cjs,
//      scripts/_country-resolver.mjs,
//      scripts/_bundle-runner.mjs,
//      Dockerfile.seed-bundle-portwatch-port-activity
// 7. Monitor first run for STALE_SEED recovery on portwatch-ports.
import { runBundle, HOUR } from './_bundle-runner.mjs';

await runBundle('portwatch-port-activity', [
  {
    label: 'PW-Port-Activity',
    script: 'seed-portwatch-port-activity.mjs',
    seedMetaKey: 'supply_chain:portwatch-ports',
    canonicalKey: 'supply_chain:portwatch-ports:v1:_countries',
    // 12h interval gate, matching the historical cadence. The actual Railway
    // cron should trigger at 24h; the interval gate prevents rapid-fire
    // re-runs if someone manually retriggers mid-day.
    intervalMs: 12 * HOUR,
    // 540s section timeout: the full budget for the one section. The bundle
    // runner still SIGTERMs if the child hangs, and the seeder's
    // SIGTERM handler releases the lock + extends TTLs.
    timeoutMs: 540_000,
  },
], { maxBundleMs: 570_000 });
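
The interaction between the 24h cron and the 12h interval gate can be illustrated with a small sketch. It assumes the gate simply compares elapsed time since the last successful run against `intervalMs`; the real logic in `_bundle-runner.mjs` is not shown here and may differ. `shouldRunSection` and its parameters are hypothetical names.

```javascript
const HOUR = 60 * 60 * 1000;

// Hypothetical gate: run when the section has never run, or when at least
// intervalMs has elapsed since the last successful run.
function shouldRunSection(lastRunAtMs, intervalMs, nowMs = Date.now()) {
  if (lastRunAtMs == null) return true;
  return nowMs - lastRunAtMs >= intervalMs;
}

// Scheduled run at 00:00 UTC, manual retrigger at 06:00, next cron at 00:00.
const lastRun = Date.UTC(2026, 3, 20, 0, 0, 0);
const manualRetrigger = shouldRunSection(lastRun, 12 * HOUR, lastRun + 6 * HOUR);
const nextScheduled = shouldRunSection(lastRun, 12 * HOUR, lastRun + 24 * HOUR);
console.log({ manualRetrigger, nextScheduled });
```

Under this assumption the mid-day retrigger is skipped while the next daily cron run passes the gate, which is the behaviour the comment on `intervalMs` describes.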