mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(_yahoo-fetch): curl-only Decodo proxy fallback helper

Yahoo Finance throttles Railway egress IPs aggressively. 4 seeders (seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes) duplicated the same fetchYahooWithRetry block with no proxy fallback. This helper consolidates them and adds the proxy fallback.

Yahoo-specific: CURL-ONLY proxy strategy. Probed 2026-04-16:
  query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
  query1.finance.yahoo.com via curl (curlFetch): HTTP 200
Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl egress IPs. Helper deliberately omits the CONNECT leg; adding it would burn time on guaranteed-404 attempts. Production defaults expose ONLY curlProxyResolver + curlFetcher.

All learnings from PR #3118 + #3119 reviews baked in:
- lastDirectError accumulator across the loop, embedded in final throw + Error.cause chain
- catch block uses break (NOT throw) so thrown errors also reach proxy
- DI seams (_curlProxyResolver, _proxyCurlFetcher) for hermetic tests
- _PROXY_DEFAULTS exported for production-default lock tests
- Sync curlFetch wrapped with await Promise.resolve() to future-proof against an async refactor (Greptile P2 from #3119)

Tests (tests/yahoo-fetch.test.mjs, 11 cases):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg present (regression guard)
- 200 OK passthrough, never touches proxy
- 429 with no proxy → throws exhausted with HTTP 429 in message
- Retry-After header parsed correctly
- 429 + curl proxy succeeds → returns proxy data
- Thrown fetch error on final retry → proxy fallback runs (P1 guard)
- 429 + proxy ALSO fails → both errors visible in message + cause chain
- Proxy malformed JSON → throws exhausted
- Non-retryable 500 → no extra direct retry, falls to proxy
- parseRetryAfterMs unit (exported sanity check)

Verification: 11/11 helper tests pass. node --check clean.

Phase 1 of 2: seeder migrations follow.
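The control flow behind the first two review learnings can be sketched in isolation: a thrown error on the last direct attempt must `break` into the proxy leg rather than `throw` past it, and the last direct error is carried into the final message and `cause`. This is a simplified illustration, not the helper itself. `fetchWithFallback`, `directFetch`, and `proxyFetch` are hypothetical names; backoff sleeps, logging, and the proxy-resolver check are omitted, and only 429 is treated as retryable.

```javascript
// Hypothetical, simplified sketch of the direct-then-proxy control flow.
// Not scripts/_yahoo-fetch.mjs: no backoff, no logging, only 429 retried,
// and the proxy leg is always attempted (no resolver check).
// directFetch() resolves to { ok, status, body } or throws;
// proxyFetch() resolves to raw JSON text or throws.
async function fetchWithFallback(directFetch, proxyFetch, { maxRetries = 2 } = {}) {
  let lastDirectError = null;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let res;
    try {
      res = await directFetch();
    } catch (err) {
      lastDirectError = err;
      if (attempt < maxRetries) continue; // retry the direct path
      break; // final attempt threw: fall through to the proxy, do NOT throw
    }
    if (res.ok) return res.body;
    lastDirectError = new Error(`HTTP ${res.status}`);
    if (res.status === 429 && attempt < maxRetries) continue;
    break; // non-retryable status, or retries used up
  }

  try {
    // Parse before treating the proxy as a success, so malformed JSON lands in catch.
    return JSON.parse(await proxyFetch());
  } catch (proxyErr) {
    // Both failures go into the message; the direct error is kept as `cause`.
    throw new Error(
      `retries exhausted (last direct: ${lastDirectError?.message ?? 'unknown'}; last proxy: ${proxyErr?.message ?? proxyErr})`,
      { cause: lastDirectError ?? proxyErr },
    );
  }
}
```

Replacing the `break` with `throw err` would let a thrown timeout on the final attempt escape before `proxyFetch` is ever called, which is the regression the "Thrown fetch error on final retry" test guards against.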
* feat(yahoo-seeders): migrate 4 seeders to _yahoo-fetch helper

Removes the duplicated fetchYahooWithRetry function (4 byte-identical copies across seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes) and routes all Yahoo Finance fetches through the new scripts/_yahoo-fetch.mjs helper. Each seeder gains the curl-only Decodo proxy fallback baked into the helper.

Per-seeder changes (mechanical):
- import { fetchYahooJson } from './_yahoo-fetch.mjs'
- delete the local fetchYahooWithRetry function
- replace 'const resp = await fetchYahooWithRetry(url, label); if (!resp) return X; const json = await resp.json()' with 'let json; try { json = await fetchYahooJson(url, { label }); } catch { return X; }'
- prune now-unused CHROME_UA/sleep imports where applicable

Latent bugs fixed in passing:
- seed-etf-flows.mjs:23 and seed-market-quotes.mjs:38 referenced CHROME_UA without importing it (would throw ReferenceError at runtime if the helper were called). Now the call site is gone in etf-flows; in market-quotes CHROME_UA is properly imported because the Finnhub call still uses it.

seed-commodity-quotes also has fetchYahooChart1y (separate non-retry function for gold history). It now uses fetchYahooJson under the hood, which preserves the return shape and adds the proxy fallback automatically.

Verification:
- node --check clean on all 4 modified seeders
- npm run typecheck:all clean
- npm run test:data: 5374/5374 pass

Phase 2 of 2.

* fix(_yahoo-fetch): log success AFTER parse + add _sleep DI seam for honest Retry-After test

Greptile P2: "[YAHOO] proxy (curl) succeeded" was logged BEFORE JSON.parse(text). On malformed proxy JSON, Railway logs would show:
  [YAHOO] proxy (curl) succeeded for AAPL
  throw: Yahoo retries exhausted ...
This is contradictory and breaks the post-deploy log-grep verification this PR relies on ("look for [YAHOO] proxy (curl) succeeded").

Fix: parse first; the success log only fires when parsing succeeds AND the value is about to be returned.
Greptile P3: 'Retry-After header parsed correctly' test used header value '0', but parseRetryAfterMs() treats non-positive seconds as null → helper falls through to default linear backoff. So the test was exercising the wrong branch despite its name.

Fix: added _sleep DI opt seam to the helper. New test injects a sleep spy and asserts the captured duration:
  Retry-After: '7' → captured sleep == [7000] (Retry-After branch)
  no Retry-After   → captured sleep == [10] (default backoff = retryBaseMs * 1)
Two paired tests lock both branches separately so a future regression that collapses them is caught.

Also added a log-ordering regression test: malformed proxy JSON must NOT emit the 'succeeded' log. Captures console.log into an array and asserts no 'proxy (curl) succeeded' line appeared before the throw.

Verification:
- tests/yahoo-fetch.test.mjs: 13/13 (was 11, +2)
- npm run test:data: 5376/5376 (+2)
- npm run typecheck:all: clean

Followup commits on PR #3120.
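The `_sleep` seam works because the retry loop only ever hands a duration to whatever sleep function it was given. The sketch below copies `parseRetryAfterMs` from scripts/_yahoo-fetch.mjs and pairs it with two hypothetical helpers that are not part of the module or its test file: `backoffOnce`, a stand-in for one retryable (429/503) iteration that makes the same `retryAfter ?? retryBaseMs * (attempt + 1)` choice, and `makeSleepSpy`, which records durations instead of waiting.

```javascript
const MAX_RETRY_AFTER_MS = 60_000;

// Copied from parseRetryAfterMs in scripts/_yahoo-fetch.mjs:
// seconds or HTTP-date, capped at 60 s; dates in the past floor at 1 s.
function parseRetryAfterMs(value) {
  if (!value) return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds) && seconds > 0) {
    return Math.min(seconds * 1000, MAX_RETRY_AFTER_MS);
  }
  const retryAt = Date.parse(value);
  if (Number.isFinite(retryAt)) {
    return Math.min(Math.max(retryAt - Date.now(), 1000), MAX_RETRY_AFTER_MS);
  }
  return null;
}

// Hypothetical stand-in for one retryable iteration of the loop:
// choose the delay the same way the helper does, then call the injected sleep.
async function backoffOnce({ retryAfterHeader, attempt, retryBaseMs, sleep }) {
  const retryMs = parseRetryAfterMs(retryAfterHeader) ?? retryBaseMs * (attempt + 1);
  await sleep(retryMs);
  return retryMs;
}

// Hypothetical sleep spy: records requested durations instead of waiting.
function makeSleepSpy() {
  const calls = [];
  return { calls, sleep: async (ms) => { calls.push(ms); } };
}

(async () => {
  const withHeader = makeSleepSpy();
  await backoffOnce({ retryAfterHeader: '7', attempt: 0, retryBaseMs: 10, sleep: withHeader.sleep });
  console.log(withHeader.calls); // [ 7000 ]  (Retry-After branch)

  const noHeader = makeSleepSpy();
  await backoffOnce({ retryAfterHeader: null, attempt: 0, retryBaseMs: 10, sleep: noHeader.sleep });
  console.log(noHeader.calls); // [ 10 ]  (retryBaseMs * 1)
})();
```

Because the two cases use different spies, a regression that ignored Retry-After would show up as `[ 10 ]` in the first case rather than silently passing.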
172 lines
7.2 KiB
JavaScript
// Yahoo Finance fetch helper with curl-only Decodo proxy fallback.
//
// Yahoo Finance throttles Railway egress IPs aggressively (429s). Existing
// seeders had identical `fetchYahooWithRetry` blocks duplicated 4 times
// (seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes,
// seed-market-quotes) with no proxy fallback. This helper consolidates
// them and adds the proxy fallback.
//
// PROXY STRATEGY: CURL ONLY, NO CONNECT
//
// Decodo provides two egress paths via different hosts:
//   - resolveProxyForConnect() → gate.decodo.com (CONNECT egress pool)
//   - resolveProxy()           → us.decodo.com (curl-x egress pool)
//
// Probed 2026-04-16:
//   query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
//   query1.finance.yahoo.com via curl (curlFetch):             HTTP 200
//
// Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl
// egress IPs. So this helper deliberately omits the CONNECT leg; adding
// it would burn time on a guaranteed-404 attempt before the curl path
// runs anyway. Production defaults expose ONLY the curl resolver +
// fetcher (see _PROXY_DEFAULTS).
//
// If Yahoo's behavior toward Decodo CONNECT changes (e.g. Decodo rotates
// the CONNECT pool), add a second leg following the
// scripts/_open-meteo-archive.mjs cascade pattern.

import { CHROME_UA, sleep, resolveProxy, curlFetch } from './_seed-utils.mjs';

const RETRYABLE_STATUSES = new Set([429, 503]);
const MAX_RETRY_AFTER_MS = 60_000;

/**
 * Production defaults. Exported so tests can lock the wiring at the
 * helper level (see tests/yahoo-fetch.test.mjs production-defaults
 * cases). Mixing these up (e.g. swapping in resolveProxyForConnect)
 * would route requests through the egress pool Yahoo blocks.
 */
export const _PROXY_DEFAULTS = Object.freeze({
  curlProxyResolver: resolveProxy,
  curlFetcher: curlFetch,
});

/**
 * Parse a `Retry-After` header value (seconds OR HTTP-date). Mirrors the
 * helper in scripts/_open-meteo-archive.mjs; duplicated for now to keep
 * each helper module self-contained. Consolidate to _seed-utils.mjs if
 * a third helper needs it.
 */
export function parseRetryAfterMs(value) {
  if (!value) return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds) && seconds > 0) {
    return Math.min(seconds * 1000, MAX_RETRY_AFTER_MS);
  }
  const retryAt = Date.parse(value);
  if (Number.isFinite(retryAt)) {
    return Math.min(Math.max(retryAt - Date.now(), 1000), MAX_RETRY_AFTER_MS);
  }
  return null;
}

/**
 * Fetch JSON from a Yahoo Finance endpoint with retry + proxy fallback.
 *
 * @param {string} url - Yahoo Finance URL (typically
 *   `https://query1.finance.yahoo.com/v8/finance/chart/<symbol>...`).
 * @param {object} [opts]
 * @param {string} [opts.label] - Symbol or label for log lines (default 'unknown').
 * @param {number} [opts.timeoutMs] - Per-attempt timeout (default 10_000).
 * @param {number} [opts.maxRetries] - Direct retries (default 3 → 4 attempts total).
 * @param {number} [opts.retryBaseMs] - Linear backoff base (default 5_000).
 * @returns {Promise<unknown>} Parsed JSON. Throws on exhaustion.
 *
 * Throws (does NOT return null) on exhaustion; the caller decides whether
 * to swallow the error with try/catch. Pre-helper code returned null on
 * failure; migrating callers should wrap the call in try/catch where null
 * semantics are required (rare; most should propagate the error).
 */
export async function fetchYahooJson(url, opts = {}) {
  const {
    label = 'unknown',
    timeoutMs = 10_000,
    maxRetries = 3,
    retryBaseMs = 5_000,
    // Test hooks. Production callers leave these unset and get
    // _PROXY_DEFAULTS. Tests inject mocks to exercise the proxy path
    // without spinning up real curl execs. `_sleep` lets tests assert
    // the actual backoff durations (e.g. Retry-After parsing) without
    // sleeping in real time.
    _curlProxyResolver = _PROXY_DEFAULTS.curlProxyResolver,
    _proxyCurlFetcher = _PROXY_DEFAULTS.curlFetcher,
    _sleep = sleep,
  } = opts;

  // Track the last direct-path failure so the eventual throw carries
  // useful upstream context (HTTP status, error message). Without this
  // the helper would throw "retries exhausted" alone and lose the signal
  // that triggered the proxy attempt.
  let lastDirectError = null;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let resp;
    try {
      resp = await fetch(url, {
        headers: { 'User-Agent': CHROME_UA },
        signal: AbortSignal.timeout(timeoutMs),
      });
    } catch (err) {
      lastDirectError = err;
      if (attempt < maxRetries) {
        const retryMs = retryBaseMs * (attempt + 1);
        console.warn(` [YAHOO] ${label} ${err?.message ?? err}; retrying in ${Math.round(retryMs / 1000)}s (${attempt + 1}/${maxRetries})`);
        await _sleep(retryMs);
        continue;
      }
      // Final direct attempt threw (timeout, ECONNRESET, DNS, etc.).
      // Fall through to the proxy fallback below; NEVER throw here.
      // PR #3118 review: throwing here silently bypasses the proxy path
      // for thrown-error cases.
      break;
    }

    if (resp.ok) return await resp.json();

    lastDirectError = new Error(`HTTP ${resp.status}`);

    if (RETRYABLE_STATUSES.has(resp.status) && attempt < maxRetries) {
      const retryAfter = parseRetryAfterMs(resp.headers.get('retry-after'));
      const retryMs = retryAfter ?? retryBaseMs * (attempt + 1);
      console.warn(` [YAHOO] ${label} ${resp.status} — waiting ${Math.round(retryMs / 1000)}s (${attempt + 1}/${maxRetries})`);
      await _sleep(retryMs);
      continue;
    }

    break;
  }

  // Curl-only proxy fallback. See module header for why CONNECT is
  // omitted (Yahoo blocks Decodo's CONNECT egress IPs).
  const curlProxyAuth = _curlProxyResolver();
  if (curlProxyAuth) {
    try {
      console.log(` [YAHOO] direct exhausted on ${label} (${lastDirectError?.message ?? 'unknown'}); trying proxy (curl)`);
      // _proxyCurlFetcher (curlFetch / execFileSync) is sync today;
      // wrap with await Promise.resolve so a future async refactor of
      // curlFetch silently keeps working instead of handing a Promise
      // to JSON.parse (Greptile P2 from PR #3119).
      const text = await Promise.resolve(_proxyCurlFetcher(url, curlProxyAuth, { 'User-Agent': CHROME_UA, Accept: 'application/json' }));
      // Parse BEFORE logging success. If JSON.parse throws, the catch block
      // below folds the parse error (curlErr) into the "exhausted" throw, so
      // there is no contradictory "succeeded" log line followed by an
      // "exhausted" throw. The post-deploy verification in the PR description
      // relies on this success log being a true success signal.
      const parsed = JSON.parse(text);
      console.log(` [YAHOO] proxy (curl) succeeded for ${label}`);
      return parsed;
    } catch (curlErr) {
      throw new Error(
        `Yahoo retries exhausted for ${label} (last direct: ${lastDirectError?.message ?? 'unknown'}; last proxy: ${curlErr?.message ?? curlErr})`,
        { cause: lastDirectError ?? curlErr },
      );
    }
  }

  throw new Error(
    `Yahoo retries exhausted for ${label}${lastDirectError ? ` (last direct: ${lastDirectError.message})` : ''}`,
    lastDirectError ? { cause: lastDirectError } : undefined,
  );
}
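To put the defaults in numbers: sleeps happen only between direct attempts, so `maxRetries = 3` means three sleeps across four attempts. The arithmetic sketch below uses a hypothetical `backoffSchedule` function (not part of the module) that reproduces the `retryAfter ?? retryBaseMs * (attempt + 1)` choice. With no Retry-After it yields 5 s, 10 s and 15 s, i.e. 30 s of sleep. If every attempt instead times out at `timeoutMs = 10_000`, Retry-After never applies (the catch branch always uses linear backoff) and up to 4 × 10 s = 40 s is spent on the fetches themselves, about 70 s before the curl proxy is tried. If Yahoo sends a Retry-After at or above the 60 s cap on every 429, the sleeps alone reach 180 s.

```javascript
// Hypothetical arithmetic helper mirroring fetchYahooJson's delay choice:
// one sleep after each non-final attempt (attempt = 0 .. maxRetries - 1).
// retryAfterMs[i] is an already-parsed (already-capped) Retry-After for
// attempt i, or null/undefined when the header is absent.
function backoffSchedule({ maxRetries = 3, retryBaseMs = 5_000, retryAfterMs = [] } = {}) {
  const schedule = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    schedule.push(retryAfterMs[attempt] ?? retryBaseMs * (attempt + 1));
  }
  return schedule;
}

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// Defaults with no Retry-After (also the thrown-error path): 5 s, 10 s, 15 s.
const linear = backoffSchedule();
console.log(linear, sum(linear)); // [ 5000, 10000, 15000 ] 30000

// Every attempt times out at timeoutMs = 10_000: 4 attempts plus the sleeps.
const timeoutMs = 10_000;
console.log(sum(linear) + (3 + 1) * timeoutMs); // 70000

// Retry-After at the 60 s cap on every retryable response.
const capped = backoffSchedule({ retryAfterMs: [60_000, 60_000, 60_000] });
console.log(sum(capped)); // 180000
```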