Team A pipelines: CGU PEP registry, CEAF expelled servants, Leniência agreements,
OFAC SDN sanctions, Brasil.IO company holdings. Each with download script, tests,
fixtures. Promoted deadlock retry from 4 pipelines into Neo4jBatchLoader.
Created docs/data-sources.md master catalog (85+ sources). Updated init.cypher
(12 constraints, 15 indexes), meta API (24 sources), frontend (tokens, icons, i18n).
798 tests green (190 API + 454 ETL + 154 frontend).
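The deadlock retry now shared through Neo4jBatchLoader can be pictured roughly as below. This is a minimal sketch, not the loader's actual code: `TransientDeadlock`, `run_with_deadlock_retry`, and the backoff parameters are hypothetical names and values chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDeadlock(Exception):
    """Hypothetical stand-in for the driver's transient deadlock error."""


def run_with_deadlock_retry(
    write_batch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run write_batch, retrying with jittered exponential backoff on deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_batch()
        except TransientDeadlock:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Back off 0.1s, 0.2s, 0.4s, ... plus jitter so that competing
            # batches do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be >= 1")
```

Keeping the retry in one helper means each pipeline passes its batch write as a callable instead of carrying its own copy of the loop.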
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Câmara: deputies without CPF now linked via deputy_id (~10K recovered).
Senado: senator lookup from the Dados Abertos API enables CPF-first GASTOU
matching, with a name fallback that skips the CANDIDATO_EM filter (~200K recoverable).
GlobalPEP: post-load Cypher script for 2-phase exact name matching.
Also: OpenRouter MCP config, triple-AI consensus skill.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Servidores pipeline redesign: LGPD-masked CPFs (6-digit partials) caused
136K merge collisions when used as Person/PublicOffice keys. Now uses
SHA-256 hash IDs (servidor_id, office_id) as merge keys, keeping
cpf_partial as a property for entity resolution matching.
Production results: 635K PublicOffice, 632K Person, 36K SAME_AS links.
Also fixes: deputy_supplier_loop.cypher path (DOOU→Person→Election),
senado.py GASTOU relationship creation, MCP server docs in CLAUDE.md.
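A rough sketch of the hash-based merge keys described above. Which fields go into each hash is an assumption made for illustration (the commit only states that SHA-256 IDs replace the masked CPF as merge key); `stable_id` and the sample row are hypothetical.

```python
import hashlib


def stable_id(*parts: str) -> str:
    """Return a SHA-256 hex digest of the normalized parts, for use as a merge key."""
    # Normalize so cosmetic differences (case, padding) do not change the ID.
    normalized = "|".join(p.strip().upper() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical input fields; the real pipeline may hash different columns.
row = {"cpf_partial": "123456", "name": "Maria da Silva", "org": "ORG-1", "role": "Analista"}

servidor_id = stable_id(row["cpf_partial"], row["name"])
office_id = stable_id(servidor_id, row["org"], row["role"])

# cpf_partial stays on the node as a plain property for later entity resolution.
person = {"servidor_id": servidor_id, "cpf_partial": row["cpf_partial"], "name": row["name"]}
```

Two servidores who share the same six visible digits but have different names now get different `servidor_id` values, so they no longer collide on MERGE.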
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The same person can run in multiple years with different sq_candidato IDs
but the same CPF. Merging by sq_candidato creates one Person node per
candidacy, and the second node violates the Person.cpf uniqueness
constraint. Now merges by CPF when available, with sq_candidato as fallback.
Uses coalesce() in the CANDIDATO_EM/DOOU queries to find persons by either key.
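The key selection can be sketched as follows. `person_merge_key` is a hypothetical helper written for illustration, and the sample IDs are made up.

```python
from typing import Optional, Tuple


def person_merge_key(cpf: Optional[str], sq_candidato: str) -> Tuple[str, str]:
    """Pick the property name and value to MERGE a Person on."""
    if cpf and cpf.strip():
        return ("cpf", cpf.strip())
    # No usable CPF: fall back to the per-election candidate ID.
    return ("sq_candidato", sq_candidato)


# Same person, two elections, two sq_candidato values (made-up IDs).
run_a = person_merge_key("111.222.333-44", "250000000001")
run_b = person_merge_key("111.222.333-44", "260000000002")
no_cpf = person_merge_key(None, "270000000003")
```

Both candidacies resolve to the same `("cpf", ...)` key, so they merge into one Person instead of tripping the uniqueness constraint.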
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CVM: Portal restructured URLs (PAS→PROCESSO/SANCIONADOR). New format
uses NUP as ID, semicolon-delimited, latin-1 encoding, ZIP archive.
Pipeline now uses name-based matching (no CPF/CNPJ in new data).
Senado: CSVs have a metadata row on line 1; add skiprows=1.
Câmara: CSVs use a UTF-8 BOM; switch from latin-1 to utf-8-sig.
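The two CSV fixes can be illustrated with the standard library alone. This is a sketch under assumptions: the sample bytes, column names, semicolon delimiter, and the Senado file encoding are invented for the example, and `read_rows` is a hypothetical helper (the pipelines themselves pass `skiprows=1` and `encoding="utf-8-sig"` to their CSV reader).

```python
import csv
import io


def read_rows(raw: bytes, encoding: str, skip_lines: int = 0) -> list[dict[str, str]]:
    """Decode raw CSV bytes, drop leading non-data lines, and parse with a header."""
    text = raw.decode(encoding)
    lines = text.splitlines(keepends=True)[skip_lines:]
    return list(csv.DictReader(io.StringIO("".join(lines)), delimiter=";"))


# Hypothetical Camara-style file: UTF-8 with a byte-order mark in front.
camara_raw = "\ufeffnome;valor\nFulano;10\n".encode("utf-8")
good = read_rows(camara_raw, "utf-8-sig")  # BOM stripped, header is "nome"
bad = read_rows(camara_raw, "latin-1")     # BOM bytes leak into the first header

# Hypothetical Senado-style file: one metadata line before the real header.
senado_raw = "ULTIMA ATUALIZACAO;01/01/2024\nSENADOR;VALOR\nFulana;20\n".encode("latin-1")
senado = read_rows(senado_raw, "latin-1", skip_lines=1)
```

Decoding the BOM-prefixed file as latin-1 does not fail; it silently corrupts the first column name, which is why the bug shows up later as a missing field.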
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- graph_expand, entity_connections, entity_timeline, entity_score, node_degree:
add entity label allowlist to all elementId() lookups (same IDOR class as SEC-01)
- search.cypher: exclude User/Investigation/Annotation/Tag from full-text results
- pattern_service.py: separate TimeoutError from unexpected exceptions for visibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- deploy.sh: defer DOMAIN check until after arg parsing (dry-run works without DOMAIN)
- pattern_donation_amendment_loop: ORDER BY amendment_value (was a.value, null for new fields)
- loader.py: reject non-identifier keys to prevent Cypher injection from malformed data
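The key check in loader.py might look roughly like this. It is a sketch, not the loader's code: `build_set_clause`, the `n` alias, and the `row` parameter name are assumptions for illustration.

```python
import re

# Property keys are interpolated into the query text, so they must be plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_set_clause(keys: list[str], alias: str = "n", param: str = "row") -> str:
    """Build a Cypher SET clause, refusing keys that are not plain identifiers."""
    for key in keys:
        if not _IDENTIFIER.fullmatch(key):
            raise ValueError(f"unsafe property key: {key!r}")
    return "SET " + ", ".join(f"{alias}.{k} = {param}.{k}" for k in keys)
```

Values still travel as query parameters; only the key names end up in the query string, which is why they are the part that needs validating.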
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- loader.py: build SET from union of all row keys (not just first row)
- datasus.py: normalize atende_sus to '1'/'0' (was storing raw 'SIM'/'NAO')
- sanctions.py: store None for missing date_end (was empty string, broke IS NULL check)
- deploy.sh: fail fast if DOMAIN env var is not set
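A minimal sketch of the first and third fixes above, using hypothetical helper names (`union_keys`, `pad_rows`): the property list is taken from every row, and missing or empty values become None so that IS NULL checks behave.

```python
from typing import Any


def union_keys(rows: list[dict[str, Any]]) -> list[str]:
    """Collect every key seen in any row, preserving first-seen order."""
    seen: dict[str, None] = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


def pad_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Give every row the same keys; missing or empty-string values become None."""
    keys = union_keys(rows)
    return [
        {k: (None if row.get(k) in (None, "") else row[k]) for k in keys}
        for row in rows
    ]


rows = [
    {"sanction_id": "1", "date_end": ""},
    {"sanction_id": "2", "date_end": "2020-01-01", "type": "CEIS"},
]
padded = pad_rows(rows)
```

With keys taken only from the first row, the `type` column of the second row would have been dropped without any error.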
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10 new pipelines: bndes, comprasnet, datasus, dou, ibama, inep, pgfn, rais, tcu, transferegov
Date formatting transform, test fixtures for all sources
Download scripts (comprasnet, datasus, dou), audit tool, graph viz doc
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Servidores have LGPD-masked CPFs (only 6 middle digits visible). This adds
two-layer SAME_AS matching to link 739K servidores to TSE/CNPJ persons:
- Phase 0: pre-compute cpf_middle6 on existing full-CPF Person nodes
- Phase 4: partial CPF + exact name match (confidence 0.95)
- Phase 5: unique name-only match for classified servidores (confidence 0.85)
Integration tests against real Neo4j caught a Cypher bug, now fixed: MERGE
cannot use a list index (targets[0]) directly; the element needs a WITH alias first.
Also: make link-persons target, cpf_middle6/cpf_partial indexes,
testcontainers conftest fix, neutrality fix in value_sanitization.py.
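The matching layers above can be sketched in plain Python. Assumptions, labeled loudly: the six visible digits are taken to be digits 4 to 9 of the 11-digit CPF (the `***.XXX.XXX-**` masking layout), classified servidores are modeled as having no partial CPF at all, and `match_confidence` is a hypothetical helper rather than the actual Cypher.

```python
import re
from typing import Optional


def cpf_middle6(cpf: str) -> Optional[str]:
    """Return digits 4-9 of a full 11-digit CPF, or None if it is not a full CPF."""
    digits = re.sub(r"\D", "", cpf)
    return digits[3:9] if len(digits) == 11 else None


def match_confidence(servidor: dict, person: dict, name_is_unique: bool) -> Optional[float]:
    """Return the SAME_AS confidence for a candidate pair, or None for no match."""
    same_name = servidor["name"].strip().upper() == person["name"].strip().upper()
    if not same_name:
        return None
    partial = servidor.get("cpf_partial")
    if partial and partial == cpf_middle6(person.get("cpf", "")):
        return 0.95  # Phase 4: partial CPF + exact name
    if not partial and name_is_unique:
        return 0.85  # Phase 5: unique name-only match (assumed: no partial CPF)
    return None
```

Phase 0 corresponds to computing `cpf_middle6` once per full-CPF Person and indexing it, so Phase 4 becomes an indexed equality lookup instead of a string operation per comparison.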
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cypher queries updated for all 15 node labels and 15 relationship types
(graph_expand, entity_connections, entity_timeline, entity_score,
schema_init). 3 new patterns: debtor_contracts, embargoed_receiving,
loan_debtor. Landing redesign with HeroGraph, FeatureIcons, typewriter
hook, 13 data sources. Frontend: 5 new data colors, 13 rel types in
graph store, i18n for new entity/relationship types. Schema: 5 new
uniqueness constraints + 3 new indexes. 473 tests green.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RC3: nodeCanvasObject + nodeCanvasObjectMode wrapped in useCallback — prevents
ForceGraph2D from re-initializing render pipeline on every React render.
RC4: pauseAnimation() on engine stop (halts RAF loop after layout settles) and
on unmount (prevents CPU burn after navigating away). Simulation tuned:
cooldownTime=4000, d3AlphaDecay=0.03, d3VelocityDecay=0.5 for faster settling.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Investigation router: prefix-based routing (relative paths), shared route on separate mini-router
- FastAPI: redirect_slashes=False safety net
- Schema: added Sanction.date_start and Amendment.date indexes
- New models: ExposureFactor, ExposureResponse, TimelineEvent, TimelineResponse
- New Cypher: entity_score, entity_score_peers, entity_timeline (cursor-paginated)
- New service: score_service.py with weighted exposure index (connections/sources/financial/patterns/baseline)
- New endpoints: GET /entity/{id}/exposure, GET /entity/{id}/timeline
- Fix: meta.py dict type annotations (pre-existing mypy errors)
- Tests: 17 new tests (score_service + entity_timeline), all 143 pass
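For orientation, a weighted index over the five listed factors could be computed along these lines. The weights, the 0 to 100 scale, and the clamping are placeholders invented for this sketch; the real values and formula live in score_service.py.

```python
# Hypothetical weights for illustration only (they sum to 1.0).
WEIGHTS = {
    "connections": 0.25,
    "sources": 0.25,
    "financial": 0.20,
    "patterns": 0.20,
    "baseline": 0.10,
}


def exposure_index(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores, each clamped to [0, 1], scaled to 0-100."""
    total = sum(
        weight * min(max(factors.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return round(100 * total, 1)
```

Missing factors count as zero, so an entity with data from only one source still gets a score rather than an error.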
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WS1: Fix donation-contract pattern — reverse DOOU path direction
(Company→Person not Company→Election), d.value→d.valor
WS2: Schema bootstrap — copy init.cypher to queries/, add ensure_schema()
called in lifespan. All IF NOT EXISTS, idempotent.
WS3: Rate limiting — add default_limits to Limiter, per-endpoint decorators
on auth (10/min), search (30/min), patterns (30/min)
WS4: Investigation ownership — GET investigation/annotations/tags now require
CurrentUser, Cypher matches User→OWNS→Investigation. Exports pass user_id.
WS5: Entity ID standardization — search/graph return document_id via coalesce,
investigation queries match on cpf/cnpj/contract_id/sanction_id/amendment_id
OR elementId. New /by-element-id endpoint. Frontend EntityDetail auto-routes.
WS6: Frontend SourceAttribution — sources typed as {database} objects not strings,
components use source.database, test mocks updated.
292 tests green (124 API + 103 ETL + 65 frontend), neutrality clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TSE pipeline stored company donor CNPJs as raw 14-digit strings while all
other pipelines used formatted XX.XXX.XXX/XXXX-XX. This caused ~17K duplicate
Company nodes and broke cross-source MERGE operations. Transparencia accepted
short/invalid CNPJs (e.g. "11"), creating 4,427 garbage Company nodes.
- tse.py: format_cnpj(donor_doc) instead of storing raw digits
- transparencia.py: reject CNPJ with len != 14 digits (covers empty + short)
- Tests: assert formatted CNPJ for company donors, add short-CNPJ skip test
- Neo4j migration: merged 937 duplicates, reformatted 16,360 in place, deleted
4,427 garbage nodes (53,654,691 → 53,649,327 Company nodes)
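A possible shape for the normalization, written as a hypothetical reimplementation: the project's own format_cnpj may differ, in particular in how it signals invalid input (returning None here is an assumption).

```python
import re
from typing import Optional


def format_cnpj(raw: Optional[str]) -> Optional[str]:
    """Format a CNPJ as XX.XXX.XXX/XXXX-XX; return None unless it has exactly 14 digits."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 14:
        return None  # covers empty, short ("11"), and over-long values
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"
```

Because raw and already-formatted inputs produce the same string, every pipeline MERGEs on one canonical key and the duplicates cannot reappear.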
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TSE pipeline: add razao_social to company donor nodes (was only name)
- Sanctions pipeline: add razao_social to Company nodes (not Person)
- Transparencia pipeline: skip contracts with empty CNPJ digits
- API search/graph: fallback to name when razao_social missing
- Root cause: 24.6K Company nodes from TSE/sanctions had null razao_social
because those pipelines only set name, not razao_social
- Add 4 tests for razao_social mapping and contract_id validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows skipping completed phases when restarting a streaming load,
avoiding redundant MERGE operations on already-loaded data.
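In outline, the phase skipping works like the sketch below. `run_phases`, `start_phase`, and the phase names are hypothetical; the actual option name and phase list are not stated in this message.

```python
from typing import Callable, Sequence, Tuple

Phase = Tuple[int, str, Callable[[], None]]


def run_phases(phases: Sequence[Phase], start_phase: int = 1) -> list[str]:
    """Run phases in order, skipping those numbered below start_phase."""
    executed = []
    for number, name, load in phases:
        if number < start_phase:
            continue  # already loaded in a previous run; skip the redundant MERGEs
        load()
        executed.append(name)
    return executed


log: list[str] = []
phases: list[Phase] = [
    (1, "companies", lambda: log.append("companies")),
    (2, "partners", lambda: log.append("partners")),
    (3, "relationships", lambda: log.append("relationships")),
]
resumed = run_phases(phases, start_phase=3)
```

MERGE keeps a full rerun safe, but on tens of millions of rows repeating the finished phases costs hours, which is what the skip avoids.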
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .env.example: document Neo4j memory settings for 40M+ node production
- docker-compose.prod.yml: remove misleading VITE_API_URL runtime env
(Vite bakes env at build time; Caddy proxies relative paths correctly)
- deploy.sh: health check through Caddy (HTTPS) instead of direct API port
- deploy.yml: pin appleboy/ssh-action to commit hash (supply-chain safety)
- backup-cron.sh: installer for daily Neo4j dump backup at 03:00 UTC
- snapshot-volume.sh: Hetzner Cloud volume snapshot via hcloud CLI
- healthcheck-cron.sh: uptime monitor every 5 min with webhook alerts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Using list_rows() with selected_fields avoids temp tables and their storage quota.
Downloads run in parallel (one per table). Fix default output path to ../data/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RF server unreachable — use Base dos Dados BQ mirror instead.
New download script streams BQ → CSV page-by-page (no OOM).
run_streaming() auto-detects RF vs BQ format, reuses same transforms.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New endpoint GET /api/v1/investigations/{id}/export/pdf with lang param.
Jinja2 A4 template, weasyprint rendering, lazy import for test compat.
Frontend: export button + API client method + i18n keys. 6 new tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers login/register/logout/restore flows in auth store and
Login page rendering, form submission, error display, navigation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auth store (Zustand): login/register/logout/restore, localStorage persistence,
auto-injects Bearer token via api/client.ts on all requests.
Login page: email/password form with register toggle and invite code field.
Protected routes: /investigations requires auth, redirects to /login.
AppShell: shows user email + logout when authenticated, login link when not.
ErrorBoundary: React error boundary wrapping entire app in main.tsx.
i18n: auth keys added for PT-BR + EN.
Also fixes: ETL pandas-stubs for mypy, sanctions.py no-any-return,
ETL integration conftest graceful testcontainers import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: 13 Cypher queries, investigation service with 12 async ops,
router with 13 endpoints (CRUD, annotations, tags, share tokens, export).
Frontend: InvestigationPanel, InvestigationDetail (inline editing),
AnnotationEditor, TagManager (color picker), Timeline. Zustand store
with full state management. Bilingual i18n. 79 API + 20 frontend tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Baseline service compares entities against CNAE sector and regional
peers. Frontend adds PatternCard, PatternResultCard, Patterns page
with bilingual i18n and route integration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Design tokens: entity type colors, spacing, typography as TS constants
- MoneyLabel: R$ formatting (Brazilian Real with locale)
- SourceBadge: data source colored badges
- ConfidenceBadge: solid/dashed visual for match confidence
- Disclaimer: neutral data disclaimer via i18n
- SearchBar: input with type filter dropdown
- SearchResults: result list with entity colors and source badges
- API client: typed fetch wrapper with search/entity/graph endpoints
- i18n: expanded PT-BR and EN translations
- 12 vitest tests passing (lint + type-check + vitest clean)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>