bruno cesar e3ebbe6d76 Servidor entity matching: bypass LGPD-masked CPFs via partial CPF + name
Servidores have LGPD-masked CPFs (only 6 middle digits visible). This adds
two-layer SAME_AS matching to link 739K servidores to TSE/CNPJ persons:

- Phase 0: pre-compute cpf_middle6 on existing full-CPF Person nodes
- Phase 4: partial CPF + exact name match (confidence 0.95)
- Phase 5: unique name-only match for classified servidores (confidence 0.85)
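The phases above can be sketched in plain Python as an in-memory stand-in for the Neo4j queries. This is a minimal sketch under stated assumptions, not the repository's implementation. The Person fields and the `classified` flag are hypothetical. The `***.XXX.XXX-**` mask layout (visible digits 4 to 9 of the 11-digit CPF) and the accent, case and whitespace name normalization are also assumptions. So is the rule that Phase 4 only links when exactly one candidate matches.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical in-memory stand-in for a Neo4j Person node.
    id: str
    name: str
    cpf: Optional[str] = None          # full 11-digit CPF, when known
    cpf_partial: Optional[str] = None  # LGPD-masked CPF, e.g. "***.123.456-**"
    classified: bool = False           # hypothetical flag gating Phase 5


def digits(value: str) -> str:
    return re.sub(r"\D", "", value)


def cpf_middle6(cpf: str) -> str:
    # Phase 0 (assumed mask layout): keep digits 4-9 of the full CPF,
    # i.e. the six left visible by "***.XXX.XXX-**".
    return digits(cpf)[3:9]


def partial_middle6(masked: str) -> str:
    # Masked positions are '*', so only the six visible digits survive.
    return digits(masked)


def norm_name(name: str) -> str:
    # Assumption: "exact" match after stripping accents, upper-casing
    # and collapsing whitespace.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return " ".join(text.upper().split())


def link(servidores, persons):
    """Return (servidor_id, person_id, confidence) SAME_AS candidates."""
    by_key, by_name = {}, {}
    for p in persons:
        if not p.cpf:
            continue  # only full-CPF persons are link targets
        name = norm_name(p.name)
        by_key.setdefault((cpf_middle6(p.cpf), name), []).append(p)
        by_name.setdefault(name, []).append(p)

    links = []
    for s in servidores:
        name = norm_name(s.name)
        if s.cpf_partial:
            # Phase 4: partial CPF + name, linked only if unambiguous.
            hits = by_key.get((partial_middle6(s.cpf_partial), name), [])
            if len(hits) == 1:
                links.append((s.id, hits[0].id, 0.95))
                continue
        # Phase 5: name-only, restricted to classified servidores and
        # to names that map to exactly one full-CPF person.
        hits = by_name.get(name, [])
        if s.classified and len(hits) == 1:
            links.append((s.id, hits[0].id, 0.85))
    return links
```

Requiring a unique candidate in both phases trades recall for precision: a homonym with a colliding middle-six CPF, or a common name, simply produces no link.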

Integration tests against a real Neo4j instance caught a Cypher bug, now fixed:
MERGE cannot use a list index (targets[0]) directly; it needs a WITH alias first.
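A minimal illustration of the MERGE fix, written as Cypher strings held in Python. The labels, property names and the `$confidence` parameter are hypothetical and only approximate a Phase 5 style query; the WITH-alias pattern itself is what the commit describes. A MERGE node pattern must be a variable, so an expression such as `targets[0]` is rejected.

```python
# Hypothetical queries; labels and properties are illustrative only.

# Rejected by Cypher: a node pattern cannot be the expression targets[0].
BROKEN = """
MATCH (s:Person) WHERE s.cpf_partial IS NOT NULL
MATCH (p:Person {name: s.name}) WHERE p.cpf IS NOT NULL AND p <> s
WITH s, collect(p) AS targets
WHERE size(targets) = 1
MERGE (s)-[:SAME_AS]->(targets[0])
"""

# Accepted: bind the list element to a variable with WITH, then MERGE on it.
FIXED = """
MATCH (s:Person) WHERE s.cpf_partial IS NOT NULL
MATCH (p:Person {name: s.name}) WHERE p.cpf IS NOT NULL AND p <> s
WITH s, collect(p) AS targets
WHERE size(targets) = 1
WITH s, targets[0] AS target
MERGE (s)-[r:SAME_AS]->(target)
SET r.confidence = $confidence
"""
```

With the official neo4j Python driver, the fixed query could be run as `session.run(FIXED, confidence=0.85)`.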

Also: new link-persons make target, cpf_middle6/cpf_partial indexes,
testcontainers conftest fix, and a neutrality fix in value_sanitization.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:46:51 -03:00