worldmonitor/scripts/seed-etf-flows.mjs
Elie Habib 9b07fc8d8a feat(yahoo): _yahoo-fetch helper with curl-only Decodo proxy fallback + 4 seeder migrations (#3120)
* feat(_yahoo-fetch): curl-only Decodo proxy fallback helper

Yahoo Finance throttles Railway egress IPs aggressively. 4 seeders
(seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes)
duplicated the same fetchYahooWithRetry block with no proxy fallback.
This helper consolidates them and adds the proxy fallback.

Yahoo-specific: CURL-ONLY proxy strategy. Probed 2026-04-16:
  query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
  query1.finance.yahoo.com via curl    (curlFetch):          HTTP 200
Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl
egress IPs. Helper deliberately omits the CONNECT leg — adding it
would burn time on guaranteed-404 attempts. Production defaults expose
ONLY curlProxyResolver + curlFetcher.

All learnings from PR #3118 + #3119 reviews baked in:
- lastDirectError accumulator across the loop, embedded in final throw +
  Error.cause chain
- catch block uses break (NOT throw) so thrown errors also reach proxy
- DI seams (_curlProxyResolver, _proxyCurlFetcher) for hermetic tests
- _PROXY_DEFAULTS exported for production-default lock tests
- Sync curlFetch wrapped with await Promise.resolve() to future-proof
  against an async refactor (Greptile P2 from #3119)
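
The control flow in the list above can be sketched roughly as follows. This is an illustrative approximation, not the contents of scripts/_yahoo-fetch.mjs: the function name fetchJsonWithFallback, the option names fetchDirect and fetchViaProxy, and the rule that only HTTP 429 is retried directly are all assumptions made for the example.

```javascript
// Hypothetical sketch of a direct-then-curl-proxy fetch with an error accumulator.
// fetchDirect and fetchViaProxy are injected stand-ins (assumed names) so the
// example runs offline; the real helper's seams are _curlProxyResolver and
// _proxyCurlFetcher, whose signatures are not shown in this commit message.
async function fetchJsonWithFallback(url, opts = {}) {
  const {
    label = url,
    maxRetries = 3,
    retryBaseMs = 1000,
    fetchDirect,          // async (url) => ({ ok, status, json() })
    fetchViaProxy = null, // (url) => raw response body as a string, or null if no proxy
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  let lastDirectError = null; // survives the loop so the final throw can report it
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const resp = await fetchDirect(url);
      if (resp.ok) return await resp.json();
      lastDirectError = new Error(`HTTP ${resp.status}`);
      // Assumption: only 429 is worth retrying directly; other statuses go to the proxy.
      if (resp.status !== 429) break;
    } catch (err) {
      lastDirectError = err;
      // break, not throw: a thrown error on the last attempt must still reach the proxy leg.
      if (attempt === maxRetries) break;
    }
    if (attempt < maxRetries) await sleep(retryBaseMs * attempt);
  }

  if (fetchViaProxy) {
    try {
      // Promise.resolve() keeps this correct whether the fetcher is sync or async.
      const text = await Promise.resolve(fetchViaProxy(url));
      const json = JSON.parse(text); // parse first, log success only afterwards
      console.log(`[YAHOO] proxy (curl) succeeded for ${label}`);
      return json;
    } catch (proxyErr) {
      throw new Error(
        `Yahoo retries exhausted for ${label}: direct=${lastDirectError?.message}; proxy=${proxyErr.message}`,
        { cause: proxyErr },
      );
    }
  }
  throw new Error(`Yahoo retries exhausted for ${label}: ${lastDirectError?.message}`, {
    cause: lastDirectError,
  });
}
```

Injecting the fetchers and the sleep function is what makes the behaviour testable without network access or real delays, which is the same motivation as the DI seams listed above.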

Tests (tests/yahoo-fetch.test.mjs, 11 cases):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg present (regression guard)
- 200 OK passthrough, never touches proxy
- 429 with no proxy → throws exhausted with HTTP 429 in message
- Retry-After header parsed correctly
- 429 + curl proxy succeeds → returns proxy data
- Thrown fetch error on final retry → proxy fallback runs (P1 guard)
- 429 + proxy ALSO fails → both errors visible in message + cause chain
- Proxy malformed JSON → throws exhausted
- Non-retryable 500 → no extra direct retry, falls to proxy
- parseRetryAfterMs unit (exported sanity check)

Verification: 11/11 helper tests pass. node --check clean.

Phase 1 of 2 — seeder migrations follow.

* feat(yahoo-seeders): migrate 4 seeders to _yahoo-fetch helper

Removes the duplicated fetchYahooWithRetry function (4 byte-identical
copies across seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes,
seed-market-quotes) and routes all Yahoo Finance fetches through the
new scripts/_yahoo-fetch.mjs helper. Each seeder gains the curl-only
Decodo proxy fallback baked into the helper.

Per-seeder changes (mechanical):
- import { fetchYahooJson } from './_yahoo-fetch.mjs'
- delete the local fetchYahooWithRetry function
- replace 'const resp = await fetchYahooWithRetry(url, label); if (!resp)
  return X; const json = await resp.json()' with
  'let json; try { json = await fetchYahooJson(url, { label }); }
  catch { return X; }'
- prune now-unused CHROME_UA/sleep imports where applicable
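
As a rough illustration of the migrated call-site shape described in the list above, the sketch below wraps the new pattern in a small function. The names loadChart and fetchJson are hypothetical; fetchJson stands in for fetchYahooJson and is passed as a parameter only so the example runs without network access.

```javascript
// Sketch of the post-migration call site. fetchJson is an injected stand-in
// (assumed name) for fetchYahooJson; null plays the role of the "return X" fallback.
async function loadChart(url, label, fetchJson) {
  let json;
  try {
    json = await fetchJson(url, { label });
  } catch {
    // Replaces the old `if (!resp) return X` branch: any failure, including
    // exhausted retries and proxy failure, ends up here.
    return null;
  }
  return json;
}
```

The try/catch takes over from the old null-response check because a helper of this shape signals failure by throwing rather than by returning a falsy response.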

Latent bugs fixed in passing:
- seed-etf-flows.mjs:23 and seed-market-quotes.mjs:38 referenced
  CHROME_UA without importing it (would throw ReferenceError at
  runtime if the helper were called). The call site is now gone in
  etf-flows; in market-quotes, CHROME_UA is properly imported because
  the Finnhub call still uses it.

seed-commodity-quotes also has fetchYahooChart1y (separate non-retry
function for gold history). Migrated to use fetchYahooJson under the
hood — preserves return shape, adds proxy fallback automatically.

Verification:
- node --check clean on all 4 modified seeders
- npm run typecheck:all clean
- npm run test:data: 5374/5374 pass

Phase 2 of 2.

* fix(_yahoo-fetch): log success AFTER parse + add _sleep DI seam for honest Retry-After test

Greptile P2: "[YAHOO] proxy (curl) succeeded" was logged BEFORE
JSON.parse(text). On malformed proxy JSON, Railway logs would show:

  [YAHOO] proxy (curl) succeeded for AAPL
  throw: Yahoo retries exhausted ...

Contradictory + breaks the post-deploy log-grep verification this PR
relies on ("look for [YAHOO] proxy (curl) succeeded"). Fix: parse
first; success log only fires when parse succeeds AND the value is
about to be returned.

Greptile P3: 'Retry-After header parsed correctly' test used header
value '0', but parseRetryAfterMs() treats non-positive seconds as null
→ helper falls through to default linear backoff. So the test was
exercising the wrong branch despite its name.

Fix: added _sleep DI opt seam to the helper. New test injects a sleep
spy and asserts the captured duration:

  Retry-After: '7' → captured sleep == [7000]   (Retry-After branch)
  no Retry-After  → captured sleep == [10]      (default backoff = retryBaseMs * 1)

Two paired tests lock both branches separately so a future regression
that collapses them is caught.
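
The two branches can be sketched as below. This is a hypothetical re-implementation for illustration only: the real parseRetryAfterMs may differ (for example, it may also accept the HTTP-date form of Retry-After), and the linear formula retryBaseMs * attempt is an assumption inferred from the "retryBaseMs * 1" note above.

```javascript
// Hypothetical stand-in for parseRetryAfterMs; handles only the delta-seconds form.
function parseRetryAfterMsSketch(headerValue) {
  if (headerValue == null) return null;
  const seconds = Number(String(headerValue).trim());
  // Non-numeric, zero, and negative values are treated as "no usable header".
  if (!Number.isFinite(seconds) || seconds <= 0) return null;
  return seconds * 1000;
}

// Picks the sleep duration: the Retry-After branch if the header is usable,
// otherwise the default linear backoff (assumed to be retryBaseMs * attempt).
function backoffMs(retryAfterHeader, attempt, retryBaseMs) {
  return parseRetryAfterMsSketch(retryAfterHeader) ?? retryBaseMs * attempt;
}

console.log(backoffMs('7', 1, 10));  // 7000 (Retry-After branch)
console.log(backoffMs(null, 1, 10)); // 10   (default backoff)
console.log(backoffMs('0', 1, 10));  // 10   ('0' is rejected, so default backoff)
```

The last call shows why a test using the header value '0' cannot exercise the Retry-After branch.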

Also added a log-ordering regression test: malformed proxy JSON must
NOT emit the 'succeeded' log. Captures console.log into an array and
asserts no 'proxy (curl) succeeded' line appeared before the throw.

Verification:
- tests/yahoo-fetch.test.mjs: 13/13 (was 11, +2)
- npm run test:data: 5376/5376 (+2)
- npm run typecheck:all: clean

Followup commits on PR #3120.
2026-04-16 09:25:06 +04:00

#!/usr/bin/env node
import { loadEnvFile, loadSharedConfig, runSeed } from './_seed-utils.mjs';
import { fetchYahooJson } from './_yahoo-fetch.mjs';
import { fetchAvBulkQuotes } from './_shared-av.mjs';
const etfConfig = loadSharedConfig('etfs.json');
loadEnvFile(import.meta.url);
const CANONICAL_KEY = 'market:etf-flows:v1';
const CACHE_TTL = 5400; // 90min = 75min buffer over 15min cron cadence (was 60min = 45min buffer)
const YAHOO_DELAY_MS = 200;
const ETF_LIST = etfConfig.btcSpot;
function sleep(ms) {
  return new Promise((r) => setTimeout(r, ms));
}
// Derives price/volume stats and a heuristic flow estimate from a Yahoo chart
// response. Returns null when fewer than two valid closes are available.
function parseEtfChartData(chart, ticker, issuer) {
  const result = chart?.chart?.result?.[0];
  if (!result) return null;
  const quote = result.indicators?.quote?.[0];
  const closes = quote?.close || [];
  const volumes = quote?.volume || [];
  const validCloses = closes.filter((p) => p != null);
  const validVolumes = volumes.filter((v) => v != null);
  if (validCloses.length < 2) return null;
  const latestPrice = validCloses[validCloses.length - 1];
  const prevPrice = validCloses[validCloses.length - 2];
  const priceChange = prevPrice ? ((latestPrice - prevPrice) / prevPrice) * 100 : 0;
  const latestVolume = validVolumes.length > 0 ? validVolumes[validVolumes.length - 1] : 0;
  // Average of every valid volume except the latest one.
  const avgVolume =
    validVolumes.length > 1
      ? validVolumes.slice(0, -1).reduce((a, b) => a + b, 0) / (validVolumes.length - 1)
      : latestVolume;
  const volumeRatio = avgVolume > 0 ? latestVolume / avgVolume : 1;
  const direction = priceChange > 0.1 ? 'inflow' : priceChange < -0.1 ? 'outflow' : 'neutral';
  // Heuristic: 10% of the latest dollar volume, signed by price direction.
  // Note: a priceChange of exactly 0 gets a negative sign here.
  const estFlowMagnitude = latestVolume * latestPrice * (priceChange > 0 ? 1 : -1) * 0.1;
  return {
    ticker,
    issuer,
    price: +latestPrice.toFixed(2),
    priceChange: +priceChange.toFixed(2),
    volume: latestVolume,
    avgVolume: Math.round(avgVolume),
    volumeRatio: +volumeRatio.toFixed(2),
    direction,
    estFlow: Math.round(estFlowMagnitude),
  };
}
async function fetchEtfFlows() {
  const etfs = [];
  let misses = 0;
  const avKey = process.env.ALPHA_VANTAGE_API_KEY;
  const covered = new Set();

  // --- Primary: Alpha Vantage REALTIME_BULK_QUOTES ---
  if (avKey) {
    const tickers = ETF_LIST.map(e => e.ticker);
    const avData = await fetchAvBulkQuotes(tickers, avKey);
    for (const { ticker, issuer } of ETF_LIST) {
      const av = avData.get(ticker);
      if (!av) continue;
      const { price, change: priceChange, volume } = av;
      const direction = priceChange > 0.1 ? 'inflow' : priceChange < -0.1 ? 'outflow' : 'neutral';
      const estFlow = Math.round(volume * price * (priceChange > 0 ? 1 : -1) * 0.1);
      // avgVolume and volumeRatio require 5-day history not available from REALTIME_BULK_QUOTES
      etfs.push({ ticker, issuer, price: +price.toFixed(2), priceChange: +priceChange.toFixed(2), volume, avgVolume: 0, volumeRatio: 0, direction, estFlow });
      covered.add(ticker);
      console.log(` [AV] ${ticker}: $${price.toFixed(2)} (${direction})`);
    }
  }

  // --- Fallback: Yahoo (for any ETFs not covered by AV) ---
  let yahooIdx = 0;
  for (let i = 0; i < ETF_LIST.length; i++) {
    const { ticker, issuer } = ETF_LIST[i];
    if (covered.has(ticker)) continue;
    if (yahooIdx > 0) await sleep(YAHOO_DELAY_MS);
    yahooIdx++;
    try {
      const url = `https://query1.finance.yahoo.com/v8/finance/chart/${ticker}?range=5d&interval=1d`;
      let chart;
      try {
        chart = await fetchYahooJson(url, { label: ticker });
      } catch {
        misses++;
        continue;
      }
      const parsed = parseEtfChartData(chart, ticker, issuer);
      if (parsed) {
        etfs.push(parsed);
        covered.add(ticker);
        console.log(` [Yahoo] ${ticker}: $${parsed.price} (${parsed.direction})`);
      } else {
        misses++;
      }
    } catch (err) {
      console.warn(` [Yahoo] ${ticker} error: ${err.message}`);
      misses++;
    }
    if (misses >= 3 && etfs.length === 0) break;
  }

  if (etfs.length === 0) {
    throw new Error(`All ETF fetches failed (${misses} misses)`);
  }

  const totalVolume = etfs.reduce((sum, e) => sum + e.volume, 0);
  const totalEstFlow = etfs.reduce((sum, e) => sum + e.estFlow, 0);
  const inflowCount = etfs.filter((e) => e.direction === 'inflow').length;
  const outflowCount = etfs.filter((e) => e.direction === 'outflow').length;
  etfs.sort((a, b) => b.volume - a.volume);
  return {
    timestamp: new Date().toISOString(),
    summary: {
      etfCount: etfs.length,
      totalVolume,
      totalEstFlow,
      netDirection: totalEstFlow > 0 ? 'NET INFLOW' : totalEstFlow < 0 ? 'NET OUTFLOW' : 'NEUTRAL',
      inflowCount,
      outflowCount,
    },
    etfs,
    rateLimited: false,
  };
}
function validate(data) {
  return Array.isArray(data?.etfs) && data.etfs.length >= 1;
}

export function declareRecords(data) {
  return Array.isArray(data?.etfs) ? data.etfs.length : 0;
}

runSeed('market', 'etf-flows', CANONICAL_KEY, fetchEtfFlows, {
  validateFn: validate,
  ttlSeconds: CACHE_TTL,
  sourceVersion: 'alphavantage+yahoo-chart-5d',
  declareRecords,
  schemaVersion: 1,
  maxStaleMin: 60,
}).catch((err) => {
  const _cause = err.cause ? ` (cause: ${err.cause.message || err.cause.code || err.cause})` : '';
  console.error('FATAL:', (err.message || err) + _cause);
  process.exit(1);
});