worldmonitor/scripts/shared/iso2-to-iso3.json
Elie Habib 02555671f2 refactor: consolidate country name/code mappings into single canonical sources (#2676)
* refactor(country-maps): consolidate country name/ISO maps

Expand shared/country-names.json from 265 to 309 entries by merging
geojson names, COUNTRY_ALIAS_MAP, upstream API variants (World Bank,
WHO, UN, FAO), and seed-correlation extras.

Add ISO3 map generator (generate-iso3-maps.cjs) producing
iso3-to-iso2.json (239 entries) and iso2-to-iso3.json (239 entries)
with TWN and XKX supplements.
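
The generator step described above could look roughly like the sketch below. It is not the contents of generate-iso3-maps.cjs: the feature shape (properties.iso2 / properties.iso3), the sample records, and the function name buildIsoMaps are assumptions made for illustration. Only the TWN and XKX supplements come from the commit message.

```javascript
// Hypothetical feature records; the real generator reads countries.geojson,
// whose property names are not shown in this commit message.
const SAMPLE_FEATURES = [
  { properties: { iso2: "US", iso3: "USA" } },
  { properties: { iso2: "FR", iso3: "FRA" } },
  { properties: { iso2: "XK", iso3: "-99" } }, // hypothetical non-ISO placeholder
];

// Supplements named in the commit message (Taiwan, Kosovo).
const SUPPLEMENTS = { TW: "TWN", XK: "XKX" };

// Sort entries by key so the generated JSON is stable across runs.
function sortByKey(obj) {
  return Object.fromEntries(
    Object.entries(obj).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  );
}

function buildIsoMaps(features, supplements) {
  const iso2ToIso3 = {};
  for (const feature of features) {
    const { iso2, iso3 } = feature.properties;
    // Keep only well-formed code pairs; anything else is left to the supplements.
    if (/^[A-Z]{2}$/.test(iso2) && /^[A-Z]{3}$/.test(iso3)) {
      iso2ToIso3[iso2] = iso3;
    }
  }
  Object.assign(iso2ToIso3, supplements);

  // The reverse map is derived from the forward map, so both always
  // contain the same number of entries.
  const iso3ToIso2 = {};
  for (const [a2, a3] of Object.entries(iso2ToIso3)) {
    iso3ToIso2[a3] = a2;
  }
  return { iso2ToIso3: sortByKey(iso2ToIso3), iso3ToIso2: sortByKey(iso3ToIso2) };
}
```

Deriving one map from the other is one way to keep the two entry counts equal, which matches the 239/239 figures reported above.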

Add build-country-names.cjs for reproducible expansion from all sources.
Sync scripts/shared/ copies for edge-function test compatibility.

* refactor: consolidate country name/code mappings into single canonical sources

Eliminates fragmented country mapping across the repo. Every feature
(resilience, conflict, correlation, intelligence) was maintaining its
own partial alias map.

Data consolidation:
- Expand shared/country-names.json from 265 to 302 entries covering
  World Bank, WHO, UN, FAO, and correlation script naming variants
- Generate shared/iso3-to-iso2.json (239 entries) and
  shared/iso2-to-iso3.json from countries.geojson + supplements
  (Taiwan TWN, Kosovo XKX)

Consumer migrations:
- _country-resolver.mjs: delete COUNTRY_ALIAS_MAP (37 entries),
  replace 2MB geojson parse with 5KB iso3-to-iso2.json
- conflict/_shared.ts: replace 33-entry ISO2_TO_ISO3 literal
- seed-conflict-intel.mjs: replace 20-entry ISO2_TO_ISO3 literal
- _dimension-scorers.ts: replace geojson-based ISO3 construction
- get-risk-scores.ts: replace 31-entry ISO3_TO_ISO2 literal
- seed-correlation.mjs: replace 102-entry COUNTRY_NAME_TO_ISO2
  and 90-entry ISO3_TO_ISO2, use resolveIso2() from canonical
  resolver, lower short-alias threshold to 2 chars with word
  boundary matching, export matchCountryNamesInText(), add isMain
  guard
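
The short-alias change in seed-correlation.mjs (2-character aliases allowed, but only on word boundaries) can be illustrated with a minimal sketch. The alias table, the function name matchNamesWithBoundaries, and its signature are hypothetical; the real matchCountryNamesInText() is not shown in this commit message.

```javascript
// Hypothetical excerpt of a normalized name -> ISO2 table.
const SAMPLE_NAME_TO_ISO2 = {
  "united kingdom": "GB",
  uk: "GB",
  iran: "IR",
  oman: "OM",
};

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the sorted, de-duplicated ISO2 codes whose names occur in the text
// as whole words. Aliases shorter than minLength are ignored.
function matchNamesWithBoundaries(text, nameToIso2, minLength = 2) {
  const lower = text.toLowerCase();
  const found = new Set();
  for (const [name, iso2] of Object.entries(nameToIso2)) {
    if (name.length < minLength) continue;
    const pattern = new RegExp(`\\b${escapeRegExp(name)}\\b`);
    if (pattern.test(lower)) found.add(iso2);
  }
  return [...found].sort();
}

// "uk" and "iran" match as whole words:
//   matchNamesWithBoundaries("UK sanctions on Iran", SAMPLE_NAME_TO_ISO2)  -> ["GB", "IR"]
// but not inside "Duke" or "Tirana":
//   matchNamesWithBoundaries("Duke visits Tirana", SAMPLE_NAME_TO_ISO2)    -> []
```

Without the boundary check, a plain substring test would report GB for "Duke" and IR for "Tirana", which is presumably why the lower threshold is paired with word-boundary matching.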

Tests:
- New tests/country-resolver.test.mjs with structural validation,
  parity regression for all 37 old aliases, ISO3 bidirectional
  consistency, and Taiwan/Kosovo assertions
- Updated resilience seed test for new resolver signature
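
The ISO3 bidirectional consistency check mentioned in the test list can be sketched as a round-trip comparison. This is an illustration only, not the code in tests/country-resolver.test.mjs; the function name findRoundTripMismatches and the inline sample maps are assumptions.

```javascript
// Report every entry that does not survive a round trip through both maps.
function findRoundTripMismatches(iso2ToIso3, iso3ToIso2) {
  const problems = [];
  for (const [a2, a3] of Object.entries(iso2ToIso3)) {
    if (iso3ToIso2[a3] !== a2) {
      problems.push(`${a2} -> ${a3} -> ${String(iso3ToIso2[a3])}`);
    }
  }
  for (const [a3, a2] of Object.entries(iso3ToIso2)) {
    if (iso2ToIso3[a2] !== a3) {
      problems.push(`${a3} -> ${a2} -> ${String(iso2ToIso3[a2])}`);
    }
  }
  return problems;
}

// Small excerpts standing in for the two generated JSON files.
const forwardSample = { TW: "TWN", US: "USA", XK: "XKX" };
const reverseSample = { TWN: "TW", USA: "US", XKX: "XK" };

// A consistent pair yields an empty list; a missing reverse entry is reported.
const cleanResult = findRoundTripMismatches(forwardSample, reverseSample);
const brokenResult = findRoundTripMismatches(forwardSample, { TWN: "TW", USA: "US" });
```

Here cleanResult is empty, while brokenResult contains the single entry "XK -> XKX -> undefined".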

Net: -190 lines, 0 hardcoded country maps remaining

* fix: normalize raw text before country name matching

Text matchers (geo-extract, seed-security-advisories, seed-correlation)
were matching normalized keys against raw text containing diacritics
and punctuation. "Curaçao", "Timor-Leste", "Hong Kong S.A.R." all
failed to resolve after country-names.json keys were normalized.

Fix: apply NFKD + diacritic stripping + punctuation normalization to
input text before matching, same transform used on the keys.
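
A minimal sketch of such a transform follows. The function name normalizeForMatching is hypothetical, and the exact punctuation rules are an assumption: here every run of non-alphanumeric characters becomes a single space, which is consistent with keys such as 'u s virgin islands' but may differ from the repo's actual helper.

```javascript
// Normalize raw text the same way the lookup keys are assumed to be normalized.
function normalizeForMatching(text) {
  return text
    .normalize("NFKD")                // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")      // punctuation and whitespace runs -> one space
    .trim();
}

// normalizeForMatching("Curaçao")           -> "curacao"
// normalizeForMatching("Timor-Leste")       -> "timor leste"
// normalizeForMatching("Hong Kong S.A.R.")  -> "hong kong s a r"
```

Applying the same function to both the keys and the input text is what makes the lookup symmetric; normalizing only one side reproduces the failure described above.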

Also add "hong kong" and "sao tome" as short-form keys for bigram
headline matching in geo-extract.

* fix: remove 'u s' alias that caused US/VI misattribution

'u s' in country-names.json matched before 'u s virgin islands' in
geo-extract's bigram scanner, attributing Virgin Islands headlines
to the US. The alias is removed because 'usa', 'united states', and the
uppercase US expansion already cover the United States.
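
The shadowing effect can be reproduced with a toy scanner. This is a hypothetical first-match-wins scanner written for illustration, not geo-extract's implementation; the function name firstCountryHit and the window sizes are assumptions.

```javascript
// Scan left to right, trying the shortest phrase first at each position,
// and stop at the first phrase found in the table.
function firstCountryHit(normalizedText, nameToIso2, maxWords = 4) {
  const words = normalizedText.split(" ").filter(Boolean);
  for (let i = 0; i < words.length; i++) {
    for (let n = 2; n <= maxWords && i + n <= words.length; n++) {
      const phrase = words.slice(i, i + n).join(" ");
      if (Object.prototype.hasOwnProperty.call(nameToIso2, phrase)) {
        return nameToIso2[phrase];
      }
    }
  }
  return null;
}

const headline = "storm hits u s virgin islands";

// With the short alias present, the bigram "u s" wins before the longer key is tried.
const withAlias = firstCountryHit(headline, { "u s": "US", "u s virgin islands": "VI" });

// With the alias removed, the scan reaches the four-word key.
const withoutAlias = firstCountryHit(headline, { "u s virgin islands": "VI" });
```

In this sketch withAlias is "US" and withoutAlias is "VI". Trying longer phrases first would be another way to avoid the problem, but the commit takes the simpler route of dropping the redundant alias.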
2026-04-04 15:38:02 +04:00

242 lines, 3.5 KiB, JSON

{
"AD": "AND",
"AE": "ARE",
"AF": "AFG",
"AG": "ATG",
"AI": "AIA",
"AL": "ALB",
"AM": "ARM",
"AO": "AGO",
"AQ": "ATA",
"AR": "ARG",
"AS": "ASM",
"AT": "AUT",
"AU": "AUS",
"AW": "ABW",
"AX": "ALA",
"AZ": "AZE",
"BA": "BIH",
"BB": "BRB",
"BD": "BGD",
"BE": "BEL",
"BF": "BFA",
"BG": "BGR",
"BH": "BHR",
"BI": "BDI",
"BJ": "BEN",
"BL": "BLM",
"BM": "BMU",
"BN": "BRN",
"BO": "BOL",
"BR": "BRA",
"BS": "BHS",
"BT": "BTN",
"BW": "BWA",
"BY": "BLR",
"BZ": "BLZ",
"CA": "CAN",
"CD": "COD",
"CF": "CAF",
"CG": "COG",
"CH": "CHE",
"CI": "CIV",
"CK": "COK",
"CL": "CHL",
"CM": "CMR",
"CN": "CHN",
"CO": "COL",
"CR": "CRI",
"CU": "CUB",
"CV": "CPV",
"CW": "CUW",
"CY": "CYP",
"CZ": "CZE",
"DE": "DEU",
"DJ": "DJI",
"DK": "DNK",
"DM": "DMA",
"DO": "DOM",
"DZ": "DZA",
"EC": "ECU",
"EE": "EST",
"EG": "EGY",
"EH": "ESH",
"ER": "ERI",
"ES": "ESP",
"ET": "ETH",
"FI": "FIN",
"FJ": "FJI",
"FK": "FLK",
"FM": "FSM",
"FO": "FRO",
"FR": "FRA",
"GA": "GAB",
"GB": "GBR",
"GD": "GRD",
"GE": "GEO",
"GG": "GGY",
"GH": "GHA",
"GI": "GIB",
"GL": "GRL",
"GM": "GMB",
"GN": "GIN",
"GQ": "GNQ",
"GR": "GRC",
"GS": "SGS",
"GT": "GTM",
"GU": "GUM",
"GW": "GNB",
"GY": "GUY",
"HK": "HKG",
"HM": "HMD",
"HN": "HND",
"HR": "HRV",
"HT": "HTI",
"HU": "HUN",
"ID": "IDN",
"IE": "IRL",
"IL": "ISR",
"IM": "IMN",
"IN": "IND",
"IO": "IOT",
"IQ": "IRQ",
"IR": "IRN",
"IS": "ISL",
"IT": "ITA",
"JE": "JEY",
"JM": "JAM",
"JO": "JOR",
"JP": "JPN",
"KE": "KEN",
"KG": "KGZ",
"KH": "KHM",
"KI": "KIR",
"KM": "COM",
"KN": "KNA",
"KP": "PRK",
"KR": "KOR",
"KW": "KWT",
"KY": "CYM",
"KZ": "KAZ",
"LA": "LAO",
"LB": "LBN",
"LC": "LCA",
"LI": "LIE",
"LK": "LKA",
"LR": "LBR",
"LS": "LSO",
"LT": "LTU",
"LU": "LUX",
"LV": "LVA",
"LY": "LBY",
"MA": "MAR",
"MC": "MCO",
"MD": "MDA",
"ME": "MNE",
"MF": "MAF",
"MG": "MDG",
"MH": "MHL",
"MK": "MKD",
"ML": "MLI",
"MM": "MMR",
"MN": "MNG",
"MO": "MAC",
"MP": "MNP",
"MR": "MRT",
"MS": "MSR",
"MT": "MLT",
"MU": "MUS",
"MV": "MDV",
"MW": "MWI",
"MX": "MEX",
"MY": "MYS",
"MZ": "MOZ",
"NA": "NAM",
"NC": "NCL",
"NE": "NER",
"NF": "NFK",
"NG": "NGA",
"NI": "NIC",
"NL": "NLD",
"NO": "NOR",
"NP": "NPL",
"NR": "NRU",
"NU": "NIU",
"NZ": "NZL",
"OM": "OMN",
"PA": "PAN",
"PE": "PER",
"PF": "PYF",
"PG": "PNG",
"PH": "PHL",
"PK": "PAK",
"PL": "POL",
"PM": "SPM",
"PN": "PCN",
"PR": "PRI",
"PS": "PSE",
"PT": "PRT",
"PW": "PLW",
"PY": "PRY",
"QA": "QAT",
"RO": "ROU",
"RS": "SRB",
"RU": "RUS",
"RW": "RWA",
"SA": "SAU",
"SB": "SLB",
"SC": "SYC",
"SD": "SDN",
"SE": "SWE",
"SG": "SGP",
"SH": "SHN",
"SI": "SVN",
"SK": "SVK",
"SL": "SLE",
"SM": "SMR",
"SN": "SEN",
"SO": "SOM",
"SR": "SUR",
"SS": "SSD",
"ST": "STP",
"SV": "SLV",
"SX": "SXM",
"SY": "SYR",
"SZ": "SWZ",
"TC": "TCA",
"TD": "TCD",
"TF": "ATF",
"TG": "TGO",
"TH": "THA",
"TJ": "TJK",
"TL": "TLS",
"TM": "TKM",
"TN": "TUN",
"TO": "TON",
"TR": "TUR",
"TT": "TTO",
"TV": "TUV",
"TW": "TWN",
"TZ": "TZA",
"UA": "UKR",
"UG": "UGA",
"UM": "UMI",
"US": "USA",
"UY": "URY",
"UZ": "UZB",
"VA": "VAT",
"VC": "VCT",
"VE": "VEN",
"VG": "VGB",
"VI": "VIR",
"VN": "VNM",
"VU": "VUT",
"WF": "WLF",
"WS": "WSM",
"XK": "XKX",
"YE": "YEM",
"ZA": "ZAF",
"ZM": "ZMB",
"ZW": "ZWE"
}
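
A consumer might wrap this table in a small lookup helper along the following lines. The helper name toIso3 and the inline excerpt are assumptions for illustration; a real consumer would load the JSON file rather than copy entries, and the repo's own resolver may behave differently.

```javascript
// Excerpt of the table above, inlined so the sketch is self-contained.
const ISO2_TO_ISO3_EXCERPT = { FR: "FRA", TW: "TWN", US: "USA", XK: "XKX" };

// Return the ISO3 code for an ISO2 code, or null when the input is not a
// string or the code is not in the table.
function toIso3(iso2, table = ISO2_TO_ISO3_EXCERPT) {
  if (typeof iso2 !== "string") return null;
  const key = iso2.trim().toUpperCase();
  // Own-property check so inherited keys like "constructor" never match.
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : null;
}

// toIso3("fr")   -> "FRA"
// toIso3(" xk ") -> "XKX"
// toIso3("ZZ")   -> null
```

Returning null rather than throwing lets callers decide how to handle codes the table does not cover.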