BRACC Data Source Catalog
Generated from docs/source_registry_br_v1.csv (as-of UTC: 2026-03-01T23:05:00Z)
- Universe v1 sources: 108
- Implemented pipelines: 45
- Loaded sources (load_state=loaded): 36
- Partial sources (load_state=partial): 8
- Not loaded sources (load_state=not_loaded): 64
- Status counts: loaded=36, partial=5, stale=3, blocked_external=1, not_built=63
Catalog note: counts and status labels are generated from the public registry (docs/source_registry_br_v1.csv).
This document includes reference production inventory context and backlog discovery; it is not a guarantee that every listed source is currently loaded in your local environment.
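The summary block above can be re-derived from the registry CSV. The following is a minimal sketch, not the project's actual generator: `load_state` is named in the summary, while the `status` and `source_id` column names and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize_registry(csv_text: str) -> dict:
    """Count registry rows overall, by load_state, and by status."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "universe": len(rows),
        "load_state": Counter(row["load_state"] for row in rows),
        "status": Counter(row["status"] for row in rows),  # column name assumed
    }


# Hypothetical sample rows; the real file is docs/source_registry_br_v1.csv.
SAMPLE = """source_id,load_state,status
cnpj,loaded,loaded
pncp,partial,partial
caged,partial,stale
datajud,not_loaded,not_built
"""

summary = summarize_registry(SAMPLE)
print(summary["universe"])          # 4
print(dict(summary["load_state"]))  # {'loaded': 1, 'partial': 2, 'not_loaded': 1}

# Both breakdowns should add up to the universe size,
# as they do in the summary above (36 + 8 + 64 = 108).
assert sum(summary["load_state"].values()) == summary["universe"]
assert sum(summary["status"].values()) == summary["universe"]
```

To run it against the real registry, read the file with `open(path, encoding="utf-8").read()` and pass the text to `summarize_registry`.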
1. Reference Production Snapshot (Loaded/Implemented Inventory)
The table below is a timestamped reference snapshot and should be interpreted together with the generated summary block above.
| # | Source | Pipeline | Nodes Created | Rels Created | Notes |
|---|---|---|---|---|---|
| 1 | CNPJ (Receita Federal) | cnpj | 53.6M Company, 1.98M Person | 24.6M SOCIO_DE | ~85GB uncompressed |
| 2 | TSE (Elections + Donations) | tse | 7.1M Person, 101K Election | 8.2M DOOU, 2.93M CANDIDATO_EM | 2002-2024 historical |
| 3 | Transparencia (Contracts) | transparencia | 38K Contract, 27.6K Amendment | 32K VENCEU, 29K AUTOR_EMENDA | Federal contracts |
| 4 | CEIS/CNEP (Sanctions) | sanctions | 23.8K Sanction | 23.8K SANCIONADA | Banned companies/persons |
| 5 | BNDES (Dev. Bank Loans) | bndes | 9.2K Finance | 8.7K RECEBEU_EMPRESTIMO | |
| 6 | PGFN (Tax Debt) | pgfn | 24M Finance | 24M DEVE | Divida ativa da Uniao |
| 7 | ComprasNet (Contracts) | comprasnet | 1.08M Contract | 1.07M VENCEU | Federal procurement |
| 8 | TCU (Audit Sanctions) | tcu | 45K Sanction | 45K SANCIONADA | Inabilitados/inidoneos |
| 9 | TransfereGov | transferegov | 71K Amendment, 67K Convenio | 320K BENEFICIOU, 70K GEROU_CONVENIO | Federal transfers |
| 10 | RAIS (Labor Stats) | rais | 29.5K LaborStats | -- | Aggregate by CNAE+UF (no CPF) |
| 11 | INEP (Education) | inep | 224K Education | 18K MANTEDORA_DE | Education census |
| 12 | DATASUS/CNES | datasus | 602K Health | 435K OPERA_UNIDADE | Health facility registry |
| 13 | IBAMA (Embargoes) | ibama | 79K Embargo | 79K EMBARGADA | Environmental enforcement |
| 14 | DOU (Official Gazette) | dou | 3.98M DOUAct | 169K MENCIONOU, 13K PUBLICOU | Parquet via BigQuery |
| 15 | Camara (Expenses) | camara | 4.6M Expense | 4.6M GASTOU, 4.9M FORNECEU | Deputy CEAP expenses |
| 16 | Senado (Expenses) | senado | 272K Expense | 272K FORNECEU | Senator CEAPS expenses |
| 17 | ICIJ (Offshore Leaks) | icij | 4.8K OffshoreEntity, 6.6K OffshoreOfficer | 2.3K OFFICER_OF | Panama/Paradise/Pandora papers |
| 18 | OpenSanctions (Global PEPs) | opensanctions | 118K GlobalPEP | 7.6K GLOBAL_PEP_MATCH | Name-matched to Brazilian entities |
| 19 | CVM (Proceedings) | cvm | 522 CVMProceeding | 1.1K CVM_SANCIONADA | Securities sanctions |
| 20 | CVM Funds | cvm_funds | 41K Fund | -- | Investment fund registry |
| 21 | Servidores (Public Servants) | (transparencia) | 635K PublicOffice | 636K RECEBEU_SALARIO | Federal servants + salaries |
| 22 | CEAF (Expelled Servants) | ceaf | 4.1K Expulsion | 4.1K EXPULSO | Fired for misconduct |
| 23 | CEPIM (Barred NGOs) | cepim | 3.6K BarredNGO | 3.6K IMPEDIDA | NGOs barred from agreements |
| 24 | CPGF (Govt Credit Cards) | cpgf | 1.46M GovCardExpense | -- | LGPD masks CPFs |
| 25 | Viagens a Servico | viagens | 3.71M GovTravel | -- | LGPD masks CPFs |
| 26 | Renuncias Fiscais | renuncias | 291.8K TaxWaiver | 291.8K RECEBEU_RENUNCIA | R$414B+ in tax waivers |
| 27 | Acordos de Leniencia | leniency | 112 LeniencyAgreement | -- | Companies that confessed |
| 28 | BCB Penalidades | bcb | 3.5K BCBPenalty | -- | Fines on financial institutions |
| 29 | STF (Supreme Court) | stf | 2.38M LegalCase | -- | Supreme court proceedings |
| 30 | PEP CGU | pep_cgu | 133.8K PEPRecord | -- | Politically exposed persons |
| 31 | TSE Bens (Candidate Assets) | tse_bens | 14.3M DeclaredAsset | 14.3M DECLAROU_BEM | Declared patrimony |
| 32 | TSE Filiados | tse_filiados | 16.5M PartyMembership | -- | Party membership history |
| 33 | OFAC SDN | ofac | 39.2K InternationalSanction* | -- | US Treasury sanctions |
| 34 | EU Sanctions | eu_sanctions | (merged above) | -- | EU consolidated sanctions |
| 35 | UN Sanctions | un_sanctions | (merged above) | -- | UN Security Council sanctions |
| 36 | World Bank Debarment | world_bank | (merged above) | -- | Debarred firms |
| 37 | Holdings (CNPJ derived) | holdings | -- | 59K HOLDING_DE | Derived from CNPJ socios |
| 38 | SIOP (Budget Amendments) | siop | 71.1K Amendment | -- | Parliamentary amendment execution |
| 39 | Senado CPIs | senado_cpis | 3 CPI | -- | Congressional investigations |
* InternationalSanction: 39.2K total across OFAC + EU + UN + World Bank
Production totals (2026-02-26): ~141M nodes, ~92M relationships across 35 node labels and 33 relationship types.
2. PIPELINE EXISTS — DATA PENDING (3 sources)
| Source | Pipeline | Status | Blocker |
|---|---|---|---|
| PNCP (Bid Publications) | pncp | Downloading: 35 files (2021-08→2024-06), still running to 2026-02 | Time: API paginates by month |
| SICONFI (Municipal Finance) | siconfi | Downloading 2024 data (~530K/700K rows), pipeline fixed (CSV not JSON) | Time: 5,570 municipalities × 5 years |
| CAGED (Labor Movements) | caged | Pipeline rewritten as aggregate LaborStats. Needs re-download from PDET FTP | Public data has no employer CNPJ. FTP URL: ftp://ftp.mtps.gov.br/pdet/microdados/NOVO CAGED/ |
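Because the PNCP download is paginated by month, the remaining work can be estimated by enumerating monthly windows. This is a hedged sketch: `month_windows` is a hypothetical helper rather than part of the `pncp` pipeline, and it assumes one download file per calendar month, which is consistent with the 35 files reported for 2021-08→2024-06.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (first_day, last_day) for each calendar month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        first = date(year, month, 1)
        # Roll over to January of the next year after December.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        last = date(next_year, next_month, 1) - timedelta(days=1)
        yield first, last
        year, month = next_year, next_month


done = list(month_windows(date(2021, 8, 1), date(2024, 6, 1)))
total = list(month_windows(date(2021, 8, 1), date(2026, 2, 1)))
print(len(done), len(total), len(total) - len(done))  # 35 55 20
```

Under that assumption, 20 monthly windows remain before the download reaches 2026-02.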
3. NOT YET BUILT (60+ sources)
3.1 CGU / Transparencia Portal
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 1 | Bolsa Familia/BPC | portaldatransparencia.gov.br/download-de-dados/bolsa-familia-pagamentos | CSV | ~20M | SocialBenefit nodes | LOW | CPFs masked by LGPD |
3.2 BCB / Central Bank
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 2 | BCB Multas | dados.bcb.gov.br | CSV | ~5K | BankFine nodes | HIGH | Administrative fines |
| 3 | ESTBAN | dados.bcb.gov.br | CSV | ~500K/mo | BankingStats nodes | LOW | Bank branch balance sheets |
| 4 | IF.data | dados.bcb.gov.br | CSV | ~2K quarterly | FinancialInstitution nodes | LOW | Financial institution metrics |
| 5 | BCB Liquidacao | dados.bcb.gov.br | CSV | ~200 | BankLiquidation nodes | MEDIUM | Liquidated financial institutions |
3.3 Judiciary
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 6 | CNJ DataJud | api-publica.datajud.cnj.jus.br | REST API (self-service key) | Tens of millions | LegalCase nodes | VERY HIGH | Proceedings across all courts |
| 7 | STJ Dados Abertos | dadosabertos.stj.jus.br | CSV/XML | ~500K | LegalCase nodes | HIGH | Superior court decisions |
| 8 | CNCIAI (Improbidade) | cnj.jus.br (part of DataJud) | API | ~10K | ImprobityCase nodes | VERY HIGH | Administrative misconduct convictions |
| 9 | CARF (Tax Appeals) | carf.fazenda.gov.br | Structured | ~500K | TaxAppeal nodes | MEDIUM | Federal tax appeal decisions |
3.4 Regulatory Agencies (11 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 19 | ANP (Oil/Gas Royalties) | dados.gov.br/dados/conjuntos-dados/anp | API + CSV | ~100K/yr | Royalty, FuelPrice nodes | MEDIUM | Oil royalties + fuel pricing |
| 20 | ANEEL (Energy) | dadosabertos.aneel.gov.br | API | ~50K | EnergyContract nodes | MEDIUM | Energy concessions and contracts |
| 21 | ANM (Mining) | dados.gov.br/dados/conjuntos-dados/anm | API + CSV | ~100K | MiningConcession nodes | HIGH | Mining rights, often tied to deforestation |
| 22 | ANTT (Roads) | dados.gov.br/dados/conjuntos-dados/antt | API | ~10K | TransportContract nodes | LOW | Transport concessions |
| 23 | ANS (Health Insurance) | dados.gov.br/dados/conjuntos-dados/ans | API | ~50K | HealthPlan nodes | LOW | Health plan operators |
| 24 | ANVISA (Drug/Food) | dados.gov.br/dados/conjuntos-dados/anvisa | API | ~100K | RegulatoryApproval nodes | LOW | Product registrations |
| 25 | ANAC (Aviation) | dados.gov.br/dados/conjuntos-dados/anac | API | ~10K | AviationConcession nodes | LOW | Airport concessions |
| 26 | ANTAQ (Waterways) | dados.gov.br/dados/conjuntos-dados/antaq | API | ~5K | PortContract nodes | LOW | Port authority contracts |
| 27 | ANA (Water) | dados.gov.br/dados/conjuntos-dados/ana | API | ~10K | WaterConcession nodes | LOW | Water resource grants |
| 28 | ANATEL (Telecom) | dados.gov.br/dados/conjuntos-dados/anatel | API | ~50K | TelecomLicense nodes | LOW | Telecom licenses |
| 29 | SUSEP (Insurance) | dados.gov.br/dados/conjuntos-dados/susep | CSV | ~10K | InsuranceEntity nodes | LOW | Insurance market data |
3.5 Financial / Securities (2 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 30 | CVM Full (Ownership/Funds) | dados.cvm.gov.br | CSV | Millions | DETEM_PARTICIPACAO rels | HIGH | Shareholder chains, fund ownership |
| 31 | Receita DIRBI | dados.gov.br | CSV | Large | TaxBenefit nodes | MEDIUM | Tax benefit declarations |
3.6 Environmental (3 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 32 | MapBiomas Alerta | alerta.mapbiomas.org/api | REST API | 465K+ alerts | DeforestationAlert nodes | HIGH | Validated deforestation, property overlap |
| 33 | SiCAR (Rural Registry) | car.gov.br/publico/municipios/downloads | Bulk shapefiles | ~7M properties | RuralProperty nodes | HIGH | Rural property boundaries + owners |
| 34 | ICMBio/CNUC | icmbio.gov.br | API | ~2.5K | ConservationUnit nodes | LOW | Protected area boundaries |
3.7 Labor (2 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 35 | CAGED | basedosdados.org (br_me_caged) | BigQuery | ~2M/mo | LaborMovement nodes | MEDIUM | Monthly hiring/firing (no CPF in public data) |
| 36 | RAIS Microdata | basedosdados.org (br_me_rais) | BigQuery | ~50M/yr | DetailedLabor nodes | MEDIUM | Identified data requires formal authorization |
3.8 Budget / Fiscal (4 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 37 | SIOP Emendas | siop.planejamento.gov.br | CSV + API | ~30K/yr | DetailedAmendment nodes | HIGH | Parliamentary amendment execution details |
| 38 | SICONFI | siconfi.tesouro.gov.br | REST API (siconfipy) | ~5.5K municipalities | MunicipalFinance nodes | MEDIUM | Municipal/state fiscal data |
| 39 | Tesouro Emendas | tesouro.gov.br | CSV | ~50K | TreasuryAmendment nodes | HIGH | Treasury-tracked amendment spending |
| 40 | SIGA Brasil | www12.senado.leg.br/orcamento/sigabrasil | CSV export | Massive | BudgetExecution nodes | MEDIUM | Full federal budget execution |
3.9 Legislative (4 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 41 | Camara Full API (Votes/Bills) | dadosabertos.camara.leg.br/api/v2 | REST API + BigQuery | Millions | Vote, Bill nodes | MEDIUM | Deputy votes, bill authorship |
| 42 | Senado Full API (Votes/CPIs) | legis.senado.leg.br/dadosabertos | REST API + BigQuery | Large | SenateVote, CPI nodes | MEDIUM | Senate votes, CPI details |
| 43 | TSE Filiados | basedosdados.org (br_tse_eleicoes.filiacao_partidaria) | BigQuery | ~15M | PartyMember edges | MEDIUM | Party membership history |
| 44 | TSE Bens (Candidate Assets) | basedosdados.org (br_tse_eleicoes.bens_candidato) | BigQuery | ~500K | DeclaredAsset nodes | HIGH | Declared patrimony per election |
3.10 International Sanctions (5 sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 45 | OFAC SDN | sanctionssearch.ofac.treas.gov | Direct CSV | ~12K | InternationalSanction nodes | HIGH | US Treasury sanctions list |
| 46 | EU Sanctions | data.europa.eu/data/datasets/consolidated-list-of-persons | Direct CSV | ~5K | InternationalSanction nodes | HIGH | EU consolidated sanctions |
| 47 | UN Sanctions | scsanctions.un.org/resources/xml | Direct XML | ~2K | InternationalSanction nodes | HIGH | UN Security Council sanctions |
| 48 | World Bank Debarment | worldbank.org/en/projects-operations/procurement/debarred-firms | CSV (OpenSanctions mirror) | ~1K | InternationalSanction nodes | MEDIUM | Debarred firms/individuals |
| 49 | INTERPOL Red Notices | interpol.int/How-we-work/Notices/Red-Notices | REST API | ~7K | InternationalNotice nodes | MEDIUM | Requires API key |
3.11 State / Municipal (10+ sources)
| # | Source | URL | Format | Est. Volume | Nodes/Rels | Value | Notes |
|---|---|---|---|---|---|---|---|
| 50 | PNCP Full | pncp.gov.br/api/consulta | Swagger REST API | Massive | Procurement nodes | HIGH | National procurement portal, paginate by date |
| 51 | TCE-SP | transparencia.tce.sp.gov.br | REST API | Large | StateProcurement nodes | HIGH | Sao Paulo state audit court |
| 52 | TCE-PE | sistemas.tce.pe.gov.br | REST API (CPF/CNPJ search) | Large | StateProcurement nodes | MEDIUM | Pernambuco audit court |
| 53 | TCE-RJ | dados.tce.rj.gov.br | REST API | Large | StateProcurement nodes | MEDIUM | Rio de Janeiro audit court |
| 54 | TCE-RS | portal.tce.rs.gov.br | Bulk downloads | Large | StateProcurement nodes | MEDIUM | Rio Grande do Sul audit court |
| 55 | MiDES | basedosdados.org (br_mides) | BigQuery | Massive | MunicipalProcurement nodes | VERY HIGH | 72% of municipalities covered |
| 56 | Querido Diario | queridodiario.ok.org.br/api | REST API + bulk ZIPs | 104K+ issues | MunicipalGazetteAct nodes | HIGH | Municipal gazette full text |
| 57-66 | State Transparency Portals | (SP, MG, BA, CE, GO, PR, SC, RS, PE, RJ) | Varies | Varies | StateExpense nodes | MEDIUM | Each state has its own portal |
4. GITHUB SHORTCUTS (pre-processed data)
Community-maintained datasets and tools that accelerate ingestion.
| # | Repo / Source | What | Volume | Value | Status |
|---|---|---|---|---|---|
| G1 | brasil-io-public.s3.amazonaws.com (holding.csv.gz) | Company-to-company ownership chains | 787K rels, 9MB | HIGH | Ready to load |
| G2 | SINARC | Pre-built anti-corruption graph | 90GB | REFERENCE | Format unclear, use as validation |
| G3 | cnpj-chat/cnpj-data-pipeline | State-level CNPJ Parquet from GitHub Releases | Large | MEDIUM | Alternative CNPJ format |
| G4 | rictom/rede-cnpj | Pre-computed CNPJ relationship SQLite | Large | MEDIUM | Includes TSE/Transparencia crosslinks |
| G5 | hackfestcc/dados-hackfestcc | Curated anti-corruption datasets | Small | LOW | Reference datasets |
| G6 | DanielFillol/DataJUD_API_CALLER | Go-based DataJud bulk downloader | -- | HIGH | Speeds up CNJ ingestion |
| G7 | Serenata de Amor (suspicions.xz) | Flagged CEAP anomalies | 8K records | MEDIUM | Pre-analyzed deputy expenses |
| G8 | mcp-senado | MCP server wrapping Senate API (56 tools) | -- | LOW | Developer tool, not data |
| G9 | mcp-portal-transparencia | MCP server wrapping Transparency Portal API | -- | LOW | Developer tool, not data |
5. BIGQUERY DATASETS (via Base dos Dados)
basedosdados.org provides cleaned, standardized Brazilian public data in BigQuery. The free tier has usage limits; heavy use requires a paid plan.
| BQ Dataset ID | Key Tables | Loaded? | Notes |
|---|---|---|---|
| br_rf_cnpj | empresas, socios, estabelecimentos | YES (direct CSV) | Used direct Receita download instead |
| br_tse_eleicoes | candidatos, receitas, despesas, bens_candidato, filiacao_partidaria | PARTIAL | Candidates + donations loaded via TSE direct; bens + filiados not yet |
| br_me_rais | microdados_vinculos | PARTIAL | Aggregate loaded; microdata requires formal auth |
| br_me_caged | microdados_movimentacao | NO | Monthly labor data |
| br_stf_corte_aberta | decisoes | NO | Supreme court decisions |
| br_camara_dados_abertos | votacao, proposicao, deputado | PARTIAL | Expenses loaded; votes/bills not yet |
| br_senado_cpipedia | cpi | NO | CPI investigation data |
| br_bd_diretorios_brasil | municipio, uf, setor_censitario | NO | Reference tables for joins |
| br_mides | licitacao, contrato, item | NO | Municipal procurement (72% coverage) |
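The dataset IDs and table names above map onto fully qualified BigQuery table references. The sketch below only builds query text and does not call BigQuery. It assumes that the public project ID is `basedosdados` and that the tables expose a year column named `ano`; both should be verified against the Base dos Dados documentation before use.

```python
BD_PROJECT = "basedosdados"  # assumption: Base dos Dados public BigQuery project ID


def table_ref(dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table reference."""
    return f"`{BD_PROJECT}.{dataset}.{table}`"


def build_year_query(dataset: str, table: str, year: int, columns=("*",)) -> str:
    """Build a SELECT restricted to one year; the `ano` column name is an assumption."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {table_ref(dataset, table)} WHERE ano = {int(year)}"


print(build_year_query("br_tse_eleicoes", "bens_candidato", 2022))
# SELECT * FROM `basedosdados.br_tse_eleicoes.bens_candidato` WHERE ano = 2022
```

Selecting only the needed columns instead of `*` reduces the bytes scanned, which matters under the free-tier limits. A `LIMIT` clause alone does not reduce the bytes BigQuery scans.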
6. INGESTION PRIORITY MATRIX
Recommended build order, based on value for pattern detection, implementation effort, and data volume.
| Priority | Source | Effort | Volume | Value | Rationale |
|---|---|---|---|---|---|
| 1 | CGU PEP List | Trivial (CSV) | ~100K | HIGH | Replaces hardcoded PEP_ROLES; authoritative PEP classification |
| 2 | CEAF (Expelled Servants) | Easy (CSV) | ~10K | HIGH | Servants expelled for misconduct; cross-ref with companies |
| 3 | Acordos de Leniencia | Trivial (CSV) | ~34 | VERY HIGH | Companies that admitted wrongdoing; tiny dataset, immense value |
| 4 | OFAC SDN | Easy (CSV) | ~12K | HIGH | International sanctions; direct download, well-structured |
| 5 | Brasil.IO Holdings | Trivial (9MB download) | 787K rels | HIGH | Company-to-company ownership chains; immediate graph enrichment |
| 6 | DOU via IN XML | Medium (XML parsing) | Large | HIGH | Bypasses Cloudflare; official gazette appointments and acts |
| 7 | TSE Bens (Candidate Assets) | Easy (BigQuery) | ~500K | HIGH | Declared patrimony; detect unexplained wealth growth |
| 8 | TSE Filiados (Party Members) | Easy (BigQuery) | ~15M | MEDIUM | Party membership history; useful for political network mapping |
| 9 | CVM Full Ownership | Medium (CSV) | Millions | HIGH | Shareholder chains reveal hidden beneficial ownership |
| 10 | CNJ DataJud | Medium (API + key) | Massive | VERY HIGH | Judicial proceedings; largest gap in current graph |
Effort Scale
- Trivial: Direct CSV download, schema matches existing patterns, <1 day
- Easy: CSV/BigQuery, minor transforms needed, 1-2 days
- Medium: API pagination, format conversion, or authentication required, 3-5 days
- Hard: Scraping, Cloudflare bypass, complex parsing, or formal data request, 1-2 weeks
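The effort scale can be applied to the priority matrix to get a rough total for the ten listed sources. This is an illustrative calculation only: it treats "<1 day" as 0 to 1 day and "1-2 weeks" as 5 to 10 working days, both of which are assumptions, and it ignores download time.

```python
# (min_days, max_days) per effort level; the Trivial and Hard bounds are assumptions.
EFFORT_DAYS = {
    "Trivial": (0, 1),
    "Easy": (1, 2),
    "Medium": (3, 5),
    "Hard": (5, 10),
}

# Effort level of priorities 1 through 10, copied from the matrix above.
PRIORITY_EFFORT = [
    "Trivial", "Easy", "Trivial", "Easy", "Trivial",
    "Medium", "Easy", "Easy", "Medium", "Medium",
]


def estimate_days(efforts):
    """Sum the lower and upper day bounds over a list of effort levels."""
    low = sum(EFFORT_DAYS[level][0] for level in efforts)
    high = sum(EFFORT_DAYS[level][1] for level in efforts)
    return low, high


total = estimate_days(PRIORITY_EFFORT)
print(total)  # (13, 26)
```

Under these assumptions the full matrix amounts to roughly 13 to 26 working days of implementation.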