worldmonitor/Dockerfile.seed-bundle-portwatch-port-activity
Elie Habib 0a4eff0053 feat(portwatch): split port-activity into standalone Railway cron + restore per-country shape (#3231)
Context: PR #3225 globalised EP3 because the per-country shape was
overrunning the section budget. Post-merge production log (2026-04-20)
proved the globalisation itself was worse: 42s/page full-table scans
(ArcGIS has no `date` index — confirmed via service metadata probe)
AND intermittent "Invalid query parameters" on the global WHERE.

Probes of outStatistics as an alternative showed it works for small
countries (BRA: 19s, 103 ports) but times out server-side for heavy
ones (USA: 313k historic rows, 30s+ server-compute, multiple
retries returned HTTP_STATUS 000). Not a reliable path.

The only shape ArcGIS reliably handles is per-country WHERE ISO3='X'
AND date > Y (uses the ISO3 index). Its problem was fitting 174
countries in the 420s portwatch bundle budget — solve that by giving
it its own container.
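
A minimal sketch of the per-country query shape described above, assuming the standard ArcGIS REST query parameters (`where`, `resultOffset`, `resultRecordCount`, `exceededTransferLimit` in the response). The helper names `buildCountryQuery` and `accumulateCountry`, the page size, the date-literal format, and the `portid` / `portcalls` field names are hypothetical and not taken from the seeder; the point is only to show the ISO3-scoped WHERE and an accumulator that stays O(ports-per-country).

```javascript
// Hypothetical helper: builds the query string for one page of one country.
// Assumes the layer exposes ISO3 and date fields, as the commit message states.
function buildCountryQuery(iso3, sinceIso, offset = 0, pageSize = 2000) {
  if (!/^[A-Z]{3}$/.test(iso3)) throw new Error(`bad ISO3: ${iso3}`);
  return new URLSearchParams({
    // Per-country filter, so the server can use the ISO3 index.
    where: `ISO3='${iso3}' AND date > timestamp '${sinceIso}'`,
    outFields: '*',
    resultOffset: String(offset),
    resultRecordCount: String(pageSize),
    f: 'json',
  });
}

// Pages through one country and folds rows into a per-port total.
// `fetchPage` is injected (assumption) so the sketch runs without a network.
async function accumulateCountry(iso3, sinceIso, fetchPage, pageSize = 2000) {
  const perPort = new Map(); // memory grows with ports, not with rows
  let offset = 0;
  for (;;) {
    const body = await fetchPage(buildCountryQuery(iso3, sinceIso, offset, pageSize));
    const features = body.features ?? [];
    for (const feature of features) {
      // Field names below are placeholders for whatever the layer returns.
      const { portid, portcalls } = feature.attributes;
      perPort.set(portid, (perPort.get(portid) ?? 0) + (portcalls ?? 0));
    }
    // Stop when the server reports no further pages or returns nothing.
    if (!body.exceededTransferLimit || features.length === 0) break;
    offset += features.length;
  }
  return perPort;
}
```

Injecting `fetchPage` keeps the paging logic testable and leaves timeout and retry handling to the caller.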

Changes:

- scripts/seed-portwatch-port-activity.mjs: restore per-country
  paginated EP3 with the accumulator shape from PR #3225 folded into
  the per-country loop (memory stays O(ports-per-country), not
  O(all-rows)). Keep every stabiliser: AbortSignal.any through
  fetchWithTimeout, SIGTERM handler with stage/batch/errors flush,
  per-country Promise.race with AbortController that actually cancels
  the work, eager p.catch for mid-batch error flush.
- Add fetchWithRetryOnInvalidParams — single retry on the specific
  "Invalid query parameters" error class ArcGIS has returned
  intermittently in prod. Does not retry other error classes.
- Bump LOCK_TTL_MS from 30 to 60 min to match the wider wall-time
  budget of the standalone cron.
- scripts/seed-bundle-portwatch.mjs: remove PW-Port-Activity from the
  main portwatch bundle. Keeps PW-Disruptions (hourly), PW-Main (6h),
  PW-Chokepoints-Ref (weekly).
- scripts/seed-bundle-portwatch-port-activity.mjs: new 1-section
  bundle. 540s section timeout, 570s bundle budget. Includes the
  full Railway service provisioning checklist in the header.
- Dockerfile.seed-bundle-portwatch-port-activity: mirrors the
  resilience-validation pattern — node:22-alpine, full scripts/ tree
  copy (avoids the add-an-import-forget-to-COPY class of bug that has
  bitten us 3+ times), shared/ for _country-resolver.
- tests/portwatch-port-activity-seed.test.mjs: rewrite assertions for
  the per-country shape. 54 tests pass (was 50, +4 for new
  assertions on the standalone bundle + Dockerfile + retry wrapper +
  ISO3 shape). Full test:data: 5883 pass. Typecheck + lint clean.
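
For illustration, two of the stabilisers listed above can be sketched as follows. This is not the code from seed-portwatch-port-activity.mjs: the signatures, the message-based error match, the retry delay, and the helper name `withCountryTimeout` are assumptions. The sketch shows a single retry limited to the "Invalid query parameters" class, and a per-country Promise.race whose timeout also aborts the in-flight work.

```javascript
// Sketch: retry exactly once, and only for the "Invalid query parameters" class.
// Matching on the error message is an assumption about how the error surfaces.
async function fetchWithRetryOnInvalidParams(doFetch, { retryDelayMs = 500 } = {}) {
  try {
    return await doFetch();
  } catch (err) {
    if (!/Invalid query parameters/i.test(String(err?.message))) throw err;
    await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    return doFetch(); // a second failure propagates to the caller
  }
}

// Sketch: race the per-country work against a timer, and abort the work
// when the timer wins so the underlying requests stop as well.
async function withCountryTimeout(work, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the race settles with the timeout error,
      // then cancel whatever the work is still doing.
      reject(new Error(`country timed out after ${timeoutMs}ms`));
      controller.abort();
    }, timeoutMs);
  });
  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the process alive on success
  }
}
```

A caller would pass `controller.signal` down to its fetch calls (for example combined with other signals via `AbortSignal.any`), so that a timed-out country stops consuming the wall-time budget.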

Post-merge Railway provisioning: see header of
seed-bundle-portwatch-port-activity.mjs for the 7-step checklist.
2026-04-20 15:21:43 +04:00


# =============================================================================
# Seed Bundle: PortWatch Port-Activity (daily cron, standalone)
# =============================================================================
# Runs scripts/seed-bundle-portwatch-port-activity.mjs which spawns the single
# PW-Port-Activity section via _bundle-runner.mjs. Split out of the main
# portwatch bundle on 2026-04-20 because the per-country ArcGIS
# Daily_Ports_Data queries for 174 countries do not fit the 420s budget of
# that bundle. See the bundle script header for the full rationale.
#
# Runtime deps this image needs:
# - node:22-alpine (ESM, AbortSignal.any, fetch)
# - scripts/ tree (seeder + _seed-utils + _proxy-utils + _country-resolver
#   + _bundle-runner + proxy helpers)
# - shared/ (country mappings that _country-resolver depends on)
# - .nvmrc? (not needed; node version baked into the base image)
# =============================================================================
FROM node:22-alpine
WORKDIR /app
# Ship the full scripts/ tree rather than cherry-picking. The cherry-picked
# approach has broken three+ seed services in the past when a local import
# was added without updating the Dockerfile (PR #3041, #3052, #3196). Shipping
# 2 MB of scripts in exchange for robustness is a good trade.
COPY scripts/ ./scripts/
COPY shared/ ./shared/
ENV NODE_OPTIONS="--max-old-space-size=1024 --dns-result-order=ipv4first"
CMD ["node", "scripts/seed-bundle-portwatch-port-activity.mjs"]