worldmonitor/shared/country-names.json
Elie Habib 02555671f2 refactor: consolidate country name/code mappings into single canonical sources (#2676)
* refactor(country-maps): consolidate country name/ISO maps

Expand shared/country-names.json from 265 to 309 entries by merging
geojson names, COUNTRY_ALIAS_MAP, upstream API variants (World Bank,
WHO, UN, FAO), and seed-correlation extras.

Add ISO3 map generator (generate-iso3-maps.cjs) producing
iso3-to-iso2.json (239 entries) and iso2-to-iso3.json (239 entries)
with TWN and XKX supplements.

Add build-country-names.cjs for reproducible expansion from all sources.
Sync scripts/shared/ copies for edge-function test compatibility.

* refactor: consolidate country name/code mappings into single canonical sources

Eliminates fragmented country mapping across the repo. Every feature
(resilience, conflict, correlation, intelligence) was maintaining its
own partial alias map.

Data consolidation:
- Expand shared/country-names.json from 265 to 302 entries covering
  World Bank, WHO, UN, FAO, and correlation script naming variants
- Generate shared/iso3-to-iso2.json (239 entries) and
  shared/iso2-to-iso3.json from countries.geojson + supplements
  (Taiwan TWN, Kosovo XKX)
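
A minimal sketch of how the two ISO maps could be derived, assuming each geojson feature exposes its codes as `iso2` and `iso3` properties (hypothetical names; the real countries.geojson and generate-iso3-maps.cjs may differ). `buildIso3Maps` is an illustrative helper, not the repo's generator.

```typescript
// Hypothetical feature shape; the real geojson property names may differ.
type CountryFeature = { properties: { iso2: string; iso3: string } };

// Supplements named in the commit message: Taiwan (TWN) and Kosovo (XKX).
const ISO3_SUPPLEMENTS: Record<string, string> = { TWN: "TW", XKX: "XK" };

function buildIso3Maps(features: CountryFeature[]): {
  iso3ToIso2: Record<string, string>;
  iso2ToIso3: Record<string, string>;
} {
  const iso3ToIso2: Record<string, string> = {};
  for (const feature of features) {
    const { iso2, iso3 } = feature.properties;
    // Keep only well-formed codes; anything else (e.g. a "-99" placeholder) is skipped.
    if (/^[A-Z]{3}$/.test(iso3) && /^[A-Z]{2}$/.test(iso2)) {
      iso3ToIso2[iso3] = iso2;
    }
  }
  // Apply the supplements last so they are always present.
  for (const iso3 of Object.keys(ISO3_SUPPLEMENTS)) {
    iso3ToIso2[iso3] = ISO3_SUPPLEMENTS[iso3];
  }
  // The reverse map is derived from the forward map so the two stay in sync.
  const iso2ToIso3: Record<string, string> = {};
  for (const iso3 of Object.keys(iso3ToIso2)) {
    iso2ToIso3[iso3ToIso2[iso3]] = iso3;
  }
  return { iso3ToIso2, iso2ToIso3 };
}
```

Deriving the reverse map from the forward one, rather than building both independently, is one way to end up with the same entry count in both files.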

Consumer migrations:
- _country-resolver.mjs: delete COUNTRY_ALIAS_MAP (37 entries),
  replace 2MB geojson parse with 5KB iso3-to-iso2.json
- conflict/_shared.ts: replace 33-entry ISO2_TO_ISO3 literal
- seed-conflict-intel.mjs: replace 20-entry ISO2_TO_ISO3 literal
- _dimension-scorers.ts: replace geojson-based ISO3 construction
- get-risk-scores.ts: replace 31-entry ISO3_TO_ISO2 literal
- seed-correlation.mjs: replace 102-entry COUNTRY_NAME_TO_ISO2
  and 90-entry ISO3_TO_ISO2, use resolveIso2() from canonical
  resolver, lower short-alias threshold to 2 chars with word
  boundary matching, export matchCountryNamesInText(), add isMain
  guard
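
The word-boundary matching for short aliases can be sketched as follows. This is an illustration only: `matchNamesWithBoundaries` is a hypothetical helper, not the exported matchCountryNamesInText(), and the name table is a small subset of shared/country-names.json.

```typescript
// Illustrative subset of shared/country-names.json.
const NAMES_SUBSET: Record<string, string> = {
  uk: "GB",
  ukraine: "UA",
  oman: "OM",
  romania: "RO",
};

// Escape regex metacharacters so a name is always matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the ISO2 codes of every name found as a whole word.
function matchNamesWithBoundaries(text: string, names: Record<string, string>): string[] {
  const lower = text.toLowerCase();
  const found: string[] = [];
  for (const name of Object.keys(names)) {
    if (name.length < 2) continue; // 2-character threshold from the commit message
    const pattern = new RegExp("\\b" + escapeRegExp(name) + "\\b");
    const code = names[name];
    if (pattern.test(lower) && found.indexOf(code) === -1) {
      found.push(code);
    }
  }
  return found;
}
```

Without the `\b` anchors, "uk" would match inside "ukraine" and "oman" inside "romania", which is why lowering the threshold to 2 characters needs boundary matching alongside it.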

Tests:
- New tests/country-resolver.test.mjs with structural validation,
  parity regression for all 37 old aliases, ISO3 bidirectional
  consistency, and Taiwan/Kosovo assertions
- Updated resilience seed test for new resolver signature
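
A bidirectional consistency check of the kind described above could look like this sketch. `findRoundTripMismatches` is a hypothetical helper; the actual assertions in tests/country-resolver.test.mjs may be structured differently.

```typescript
// Returns every code whose round trip through the opposite map does not
// lead back to itself. An empty result means the two maps are consistent.
function findRoundTripMismatches(
  iso3ToIso2: Record<string, string>,
  iso2ToIso3: Record<string, string>,
): string[] {
  const mismatches: string[] = [];
  for (const iso3 of Object.keys(iso3ToIso2)) {
    if (iso2ToIso3[iso3ToIso2[iso3]] !== iso3) mismatches.push(iso3);
  }
  for (const iso2 of Object.keys(iso2ToIso3)) {
    if (iso3ToIso2[iso2ToIso3[iso2]] !== iso2) mismatches.push(iso2);
  }
  return mismatches;
}
```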

Net: -190 lines, 0 hardcoded country maps remaining

* fix: normalize raw text before country name matching

Text matchers (geo-extract, seed-security-advisories, seed-correlation)
were matching normalized keys against raw text containing diacritics
and punctuation. "Curaçao", "Timor-Leste", "Hong Kong S.A.R." all
failed to resolve after country-names.json keys were normalized.

Fix: apply NFKD + diacritic stripping + punctuation normalization to
input text before matching, same transform used on the keys.

Also add "hong kong" and "sao tome" as short-form keys for bigram
headline matching in geo-extract.
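
The transform described in this fix can be sketched as below. `normalizeForMatch` is a hypothetical name and the exact rules in the repo may differ; keys such as "democratic peoples republic of korea" suggest the real transform treats apostrophes differently from this sketch, which turns all punctuation into spaces.

```typescript
// Sketch of the NFKD + diacritic stripping + punctuation normalization step.
function normalizeForMatch(text: string): string {
  return text
    .normalize("NFKD")                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")      // punctuation, hyphens, and whitespace runs become one space
    .trim();
}

// The three names from the commit message, plus one more key from the file:
// normalizeForMatch("Curaçao")          -> "curacao"
// normalizeForMatch("Timor-Leste")      -> "timor leste"
// normalizeForMatch("Hong Kong S.A.R.") -> "hong kong s a r"
// normalizeForMatch("Côte d'Ivoire")    -> "cote d ivoire"
```

Running the same function over both the keys and the input text is what keeps the two sides comparable.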

* fix: remove 'u s' alias that caused US/VI misattribution

'u s' in country-names.json matched before 'u s virgin islands' in
geo-extract's bigram scanner, attributing Virgin Islands headlines
to US. Removed since 'usa', 'united states', and the uppercase US
expansion already cover the United States.
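
The shadowing effect can be reproduced with a toy scanner. This is an assumption-laden sketch: `firstCountryHit` is hypothetical and only mimics a scanner that tries word pairs before longer phrases; geo-extract's real scanning order is not shown in this commit.

```typescript
// Hypothetical scanner: tries 2-word phrases first, then 3- and 4-word phrases,
// and returns the code of the first phrase found in the name table.
function firstCountryHit(normalizedText: string, names: Record<string, string>): string | null {
  const words = normalizedText.split(" ");
  for (const size of [2, 3, 4]) {
    for (let i = 0; i + size <= words.length; i++) {
      const phrase = words.slice(i, i + size).join(" ");
      if (Object.prototype.hasOwnProperty.call(names, phrase)) {
        return names[phrase];
      }
    }
  }
  return null;
}

const withShortAlias: Record<string, string> = { "u s": "US", "u s virgin islands": "VI" };
const withoutShortAlias: Record<string, string> = { "u s virgin islands": "VI" };

// With the short alias, the pair "u s" wins and the headline is attributed to US.
const before = firstCountryHit("storm hits u s virgin islands", withShortAlias);
// With the alias removed, only the full phrase matches and the result is VI.
const after = firstCountryHit("storm hits u s virgin islands", withoutShortAlias);
```

An alternative would be to prefer the longest match, but removing the alias is the smaller change, and per the commit message the United States remains covered by 'usa', 'united states', and the uppercase US expansion.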
2026-04-04 15:38:02 +04:00

306 lines, 6.7 KiB, JSON

{
"afghanistan": "AF",
"aland": "AX",
"albania": "AL",
"algeria": "DZ",
"american samoa": "AS",
"andorra": "AD",
"angola": "AO",
"anguilla": "AI",
"antarctica": "AQ",
"antigua and barbuda": "AG",
"argentina": "AR",
"armenia": "AM",
"aruba": "AW",
"australia": "AU",
"austria": "AT",
"azerbaijan": "AZ",
"bahamas": "BS",
"bahamas the": "BS",
"bahrain": "BH",
"bangladesh": "BD",
"barbados": "BB",
"belarus": "BY",
"belgium": "BE",
"belize": "BZ",
"benin": "BJ",
"bermuda": "BM",
"bhutan": "BT",
"bolivarian republic of venezuela": "VE",
"bolivia": "BO",
"bosnia and herzegovina": "BA",
"botswana": "BW",
"brazil": "BR",
"british indian ocean territory": "IO",
"british virgin islands": "VG",
"brunei": "BN",
"brunei darussalam": "BN",
"bulgaria": "BG",
"burkina faso": "BF",
"burma": "MM",
"burundi": "BI",
"cabo verde": "CV",
"cambodia": "KH",
"cameroon": "CM",
"canada": "CA",
"cape verde": "CV",
"cayman islands": "KY",
"central african republic": "CF",
"chad": "TD",
"chile": "CL",
"china": "CN",
"colombia": "CO",
"comoros": "KM",
"congo": "CG",
"congo brazzaville": "CG",
"congo dem rep": "CD",
"congo kinshasa": "CD",
"congo rep": "CG",
"cook islands": "CK",
"costa rica": "CR",
"cote d ivoire": "CI",
"croatia": "HR",
"cuba": "CU",
"curacao": "CW",
"cyprus": "CY",
"czech republic": "CZ",
"czechia": "CZ",
"democratic peoples republic of korea": "KP",
"democratic republic of the congo": "CD",
"denmark": "DK",
"djibouti": "DJ",
"dominica": "DM",
"dominican republic": "DO",
"dr congo": "CD",
"drc": "CD",
"east timor": "TL",
"ecuador": "EC",
"egypt": "EG",
"egypt arab rep": "EG",
"el salvador": "SV",
"equatorial guinea": "GQ",
"eritrea": "ER",
"estonia": "EE",
"eswatini": "SZ",
"ethiopia": "ET",
"falkland islands": "FK",
"faroe islands": "FO",
"federated states of micronesia": "FM",
"fiji": "FJ",
"finland": "FI",
"france": "FR",
"french polynesia": "PF",
"french southern and antarctic lands": "TF",
"gabon": "GA",
"gambia": "GM",
"gambia the": "GM",
"gaza": "PS",
"georgia": "GE",
"germany": "DE",
"ghana": "GH",
"gibraltar": "GI",
"greece": "GR",
"greenland": "GL",
"grenada": "GD",
"guam": "GU",
"guatemala": "GT",
"guernsey": "GG",
"guinea": "GN",
"guinea bissau": "GW",
"guyana": "GY",
"haiti": "HT",
"heard island and mcdonald islands": "HM",
"honduras": "HN",
"hong kong": "HK",
"hong kong s a r": "HK",
"hong kong sar china": "HK",
"hungary": "HU",
"iceland": "IS",
"india": "IN",
"indonesia": "ID",
"iran": "IR",
"iran islamic rep": "IR",
"iraq": "IQ",
"ireland": "IE",
"isle of man": "IM",
"israel": "IL",
"italy": "IT",
"ivory coast": "CI",
"jamaica": "JM",
"japan": "JP",
"jersey": "JE",
"jordan": "JO",
"kazakhstan": "KZ",
"kenya": "KE",
"kiribati": "KI",
"korea dem peoples rep": "KP",
"korea rep": "KR",
"kosovo": "XK",
"kuwait": "KW",
"kyrgyz republic": "KG",
"kyrgyzstan": "KG",
"lao pdr": "LA",
"laos": "LA",
"latvia": "LV",
"lebanon": "LB",
"lesotho": "LS",
"liberia": "LR",
"libya": "LY",
"liechtenstein": "LI",
"lithuania": "LT",
"luxembourg": "LU",
"macao s a r": "MO",
"macao sar china": "MO",
"madagascar": "MG",
"malawi": "MW",
"malaysia": "MY",
"maldives": "MV",
"mali": "ML",
"malta": "MT",
"marshall islands": "MH",
"mauritania": "MR",
"mauritius": "MU",
"mexico": "MX",
"micronesia": "FM",
"micronesia fed sts": "FM",
"moldova": "MD",
"monaco": "MC",
"mongolia": "MN",
"montenegro": "ME",
"montserrat": "MS",
"morocco": "MA",
"morocco western sahara": "MA",
"mozambique": "MZ",
"myanmar": "MM",
"namibia": "NA",
"nauru": "NR",
"nepal": "NP",
"netherlands": "NL",
"new caledonia": "NC",
"new zealand": "NZ",
"nicaragua": "NI",
"niger": "NE",
"nigeria": "NG",
"niue": "NU",
"norfolk island": "NF",
"north korea": "KP",
"north macedonia": "MK",
"northern mariana islands": "MP",
"norway": "NO",
"occupied palestinian territory": "PS",
"oman": "OM",
"pakistan": "PK",
"palau": "PW",
"palestine": "PS",
"palestine state of": "PS",
"palestinian territories": "PS",
"panama": "PA",
"papua new guinea": "PG",
"paraguay": "PY",
"peru": "PE",
"philippines": "PH",
"pitcairn islands": "PN",
"plurinational state of bolivia": "BO",
"poland": "PL",
"portugal": "PT",
"puerto rico": "PR",
"qatar": "QA",
"republic of korea": "KR",
"republic of serbia": "RS",
"republic of the congo": "CG",
"romania": "RO",
"russia": "RU",
"russian federation": "RU",
"rwanda": "RW",
"saint barthelemy": "BL",
"saint helena": "SH",
"saint kitts and nevis": "KN",
"saint lucia": "LC",
"saint martin": "MF",
"saint pierre and miquelon": "PM",
"saint vincent and the grenadines": "VC",
"samoa": "WS",
"san marino": "SM",
"sao tome": "ST",
"sao tome and principe": "ST",
"saudi arabia": "SA",
"senegal": "SN",
"serbia": "RS",
"seychelles": "SC",
"sierra leone": "SL",
"singapore": "SG",
"sint maarten": "SX",
"slovak republic": "SK",
"slovakia": "SK",
"slovenia": "SI",
"solomon islands": "SB",
"somalia": "SO",
"south africa": "ZA",
"south georgia and the islands": "GS",
"south korea": "KR",
"south sudan": "SS",
"spain": "ES",
"sri lanka": "LK",
"st kitts and nevis": "KN",
"st lucia": "LC",
"st vincent and the grenadines": "VC",
"sudan": "SD",
"suriname": "SR",
"swaziland": "SZ",
"sweden": "SE",
"switzerland": "CH",
"syria": "SY",
"syrian arab republic": "SY",
"taiwan": "TW",
"tajikistan": "TJ",
"tanzania": "TZ",
"thailand": "TH",
"the bahamas": "BS",
"the comoros": "KM",
"the gambia": "GM",
"the maldives": "MV",
"the netherlands": "NL",
"the philippines": "PH",
"the seychelles": "SC",
"timor leste": "TL",
"togo": "TG",
"tonga": "TO",
"trinidad and tobago": "TT",
"tunisia": "TN",
"turkey": "TR",
"turkiye": "TR",
"turkmenistan": "TM",
"turks and caicos": "TC",
"turks and caicos islands": "TC",
"tuvalu": "TV",
"u s virgin islands": "VI",
"uae": "AE",
"uganda": "UG",
"uk": "GB",
"ukraine": "UA",
"united arab emirates": "AE",
"united kingdom": "GB",
"united republic of tanzania": "TZ",
"united states": "US",
"united states minor outlying islands": "UM",
"united states of america": "US",
"united states virgin islands": "VI",
"uruguay": "UY",
"usa": "US",
"uzbekistan": "UZ",
"vanuatu": "VU",
"vatican": "VA",
"venezuela": "VE",
"venezuela rb": "VE",
"viet nam": "VN",
"vietnam": "VN",
"wallis and futuna": "WF",
"west bank": "PS",
"west bank and gaza": "PS",
"western sahara": "EH",
"yemen": "YE",
"yemen rep": "YE",
"zambia": "ZM",
"zimbabwe": "ZW"
}