Who the asterisk protects
In some routine official gazette, in the header of a single-judge decision from the State Court of Accounts, you find this sentence:
INTERESTED PARTY: Mariana Esteves Carvalho Albuquerque. CPF no. ***.123.456-**.
The sentence is well composed. The full name, every surname in place. The CPF chopped at both ends. Iperon, when granting the retirement, saw no reason to hide the retiree's name; the Court of Accounts, when registering the act, saw no reason to change that choice; but both saw reason to hide two chunks of the CPF. The document publishes and conceals on the same line, with the serenity of a well-trained civil service.
The scene repeats across hundreds of decisions. The Court of Accounts performs the summary review of retirement acts granted by the state pension institute and publishes the result in its own Official Gazette. Each decision carries the interested party's full name, position and posting, the articles of the Constitution and the amendments on which the act is founded, and the CPF masked at both ends. Nobody seems to have read the whole page and asked: if the name is right here, what are the asterisks protecting?
The math nobody does
The Brazilian CPF has eleven digits. The first nine are, in principle, free; the last two are check digits, computed from the first nine by a predictable modulo-eleven operation fixed in a Receita Federal regulation.¹ In other words, the last two add no information that isn't already contained in the first nine. They exist to detect typos, not to hide information.
When a CPF is masked in the form ***.XXX.XXX-**, five digits are hidden. The casual reader counts five asterisks and imagines five digits of uncertainty. Five decimal digits would mean a hundred thousand possibilities. A hundred thousand is a big number.
It's the wrong number.
The last two asterisks don't hide anything the others haven't already said: given any nine-digit prefix, the two check digits are uniquely determined. That leaves the three asterisks at the start. Three decimal digits. A thousand possibilities.
To enumerate those thousand possibilities, all you need is a three-level for loop in any language with integer arithmetic. For each candidate triple, you compute the two check digits, complete the CPF, and you're done: one valid CPF per candidate, a thousand candidates in total. The operation fits in fifteen lines of Python. It runs in milliseconds.
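A minimal sketch of that loop, assuming the modulo-eleven rule spelled out in the footnote. The helper names `check_digits` and `unmask` are hypothetical, and `itertools.product` stands in for the three nested loops. It enumerates every CPF compatible with the opening example's mask, ***.123.456-**.

```python
from itertools import product


def check_digits(prefix: str) -> str:
    """Return the two CPF check digits for a nine-digit prefix (modulo-11 rule)."""
    digits = [int(c) for c in prefix]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2: 10..2 for the first
        # check digit, 11..2 for the second (which also weighs the first).
        weights = range(len(digits) + 1, 1, -1)
        total = sum(w * d for w, d in zip(weights, digits))
        remainder = (total * 10) % 11
        # Convention: a result of 10 becomes 0.
        digits.append(0 if remainder == 10 else remainder)
    return f"{digits[9]}{digits[10]}"


def unmask(middle_six: str) -> list[str]:
    """Enumerate every CPF compatible with the mask ***.XXX.XXX-**."""
    candidates = []
    # The three hidden leading digits: 10 * 10 * 10 = 1,000 combinations.
    for a, b, c in product("0123456789", repeat=3):
        prefix = a + b + c + middle_six
        # The two trailing asterisks are not free: they follow from the prefix.
        candidates.append(prefix + check_digits(prefix))
    return candidates


cpfs = unmask("123456")  # the mask ***.123.456-** from the opening example
print(len(cpfs))  # 1000
first = cpfs[0]
print(f"{first[:3]}.{first[3:6]}.{first[6:9]}-{first[9:]}")  # 000.123.456-01
```

Every one of the thousand outputs carries valid check digits, which is exactly why the two trailing asterisks add nothing: they are computed, not guessed.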
The math is mathing. Five asterisks look like five digits. They are not.
The name is the front door
The previous exercise, generating a thousand candidates, is elegant and unnecessary. In almost every practical case, nobody needs to generate a thousand candidates, because the five asterisks sit surrounded by information that already identifies the person uniquely.
Mariana Esteves Carvalho Albuquerque, whose name appears in the single-judge decision, is not just any Mariana. She is a retired state civil servant, with a defined position, a recorded posting, a numbered registration. The Transparency Portal publishes the full name, registration number, position, posting and salary of the entire payroll. The state's Electronic Official Gazette, searchable by full text across almost two decades of archive, carries the appointment ordinance, some promotion, some leave, the publication of the retirement act. Somewhere in those publications, over those twenty years, the CPF appeared in full. The LGPD became law in 2018; the rest of the servant's documentary history is older, and was indexed.
The question the asterisk pretends to dodge is a question the asterisk has no way of dodging: who is this person. The act has already answered. The chopped CPF is a redundant confirmation of an identification already performed by the document's own header.
When the Brazilian system of performative protection feels especially diligent (ahem ahem, IPERON 🤧), it also anonymizes the registration number. Something like ****-1234 appears next to the chopped CPF. The operation is mathematically worse than publishing either of the two in full. Two partially masked identifiers cross by intersection: the set of candidates compatible with ***.452.318-** intersected with the set compatible with ****-1234 collapses, in most cases, to a single person, even without the name. The handbook that hides two chunks of the CPF and part of the registration number is giving more information, not less.
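A toy sketch of that intersection. The four-row roster below is invented for illustration and stands in for a real payroll; the `fits` helper and the values are hypothetical, and check digits are not validated. Each mask alone leaves several candidates; the two together leave one.

```python
def fits(mask: str, value: str) -> bool:
    """True if value matches mask position by position; '*' matches any character."""
    return len(mask) == len(value) and all(m in ("*", v) for m, v in zip(mask, value))


# Hypothetical toy roster of (CPF, registration) pairs, invented for illustration.
ROSTER = [
    ("101.452.318-07", "7788-1234"),
    ("202.452.318-44", "9001-5678"),
    ("303.999.111-22", "3141-1234"),
    ("404.452.318-90", "2718-4321"),
]

CPF_MASK = "***.452.318-**"
REG_MASK = "****-1234"

# Records compatible with each mask separately.
by_cpf = {row for row in ROSTER if fits(CPF_MASK, row[0])}
by_reg = {row for row in ROSTER if fits(REG_MASK, row[1])}
# Records compatible with both masks at once.
both = by_cpf & by_reg

print(len(by_cpf), len(by_reg), len(both))  # 3 2 1
print(both)  # {('101.452.318-07', '7788-1234')}
```

On a real payroll the sets are larger, but the mechanism is the same: each partial mask is a filter, and filters compose.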
It wasn't always this way. Sometime between 2018 and 2022, everyone in the Brazilian public service became convinced, by a combination of stray handbooks and fear of the legal office, that the chopped CPF was the formal mark of LGPD compliance. The chop was applied without touching the rest. The name stayed in full because removing it would, this time for real, contradict the purpose of the act. The CPF was the offering laid on the altar.
```mermaid
flowchart LR
A["Act in the Court Gazette<br/>full name<br/>partial CPF"] --> B[Transparency Portal]
A --> C[Searchable Official Gazette]
B --> D[registration, position, posting]
C --> E["older publications<br/>(full CPF)"]
D --> F[unique identification]
E --> F
```
Robson and Dona Maria
Robson is twenty-seven, an IT technician at a gas station on the BR-364 highway, and knows enough Python to solve small problems. He maintains the card terminals, configures the convenience store's Wi-Fi, updates the pump's system. He reads the act because his brother-in-law has just retired and he's curious. The asterisks don't stop him because he doesn't even need to decipher them: he pastes the name into Google, finds the servant on the Transparency Portal, confirms it on the approved-candidates page of some old civil service exam, and in ten minutes he has the full picture. He used no tool that isn't free. He downloaded nothing. He ran no script. He just read, and the Brazilian system of official publications allows reading.
Dona Maria lives next to a civil servant who retired for permanent disability last year but still plays pickup soccer on Sundays. She's a widow, has read newspapers her whole life, and she's suspicious. She looks up her neighbor's name in the Official Gazette, finds the single-judge decision, reads "disability retirement", and sees the CPF chopped at the ends. She has no technical training. She doesn't know about the Transparency Portal. The asterisks paralyze her, not because they are insurmountable, but because they signal legal ritual, and Dona Maria has understood, correctly, that she wasn't invited to the ritual. She closes the browser. The social oversight she could have exercised, one of those small civic vigilances that sustain control over administrative acts, did not happen.
The central question of the whole post fits in one sentence: which of the two does the anonymization work against?
Against Dona Maria. Robson doesn't even know it exists.
The hacker from Araraquara
For the case in which Robson can't close the case through web triangulation (stubborn homonymy, a servant with a clean digital presence, a target whose CPF was never published anywhere), there's no need to invoke a new category. It's the same Robson, with more tenacity and more free time. We can call him the hacker from Araraquara, in honor of the character from Brazilian political folklore who was moved to open prison last week. The only difference from Robson is this: this one downloaded, from some torrent, the 2021 Serasa dump, two hundred and twenty million CPFs with full name, date of birth, address and mother's name, indexed in a SQLite file on an external drive. In any hard case, he resolves it in fifteen seconds.
The technical ceiling of the non-state, non-Big-Tech Brazilian adversary has a name, a criminal record and an ankle monitor, and is, materially, the Robson from the previous paragraph with more stubbornness. The handbook's barrier never even reached Robson's level.
```mermaid
flowchart LR
M["Dona Maria"] -.stopped by asterisks.-> X["❌"]
R["Robson"] -->|10 minutes| ID["unique<br/>identification"]
R -.+ stubbornness<br/>+ Serasa dump.-> H["hacker from<br/>Araraquara"]
H -->|15 seconds| ID
```
The PET bottle on top of the meter
There's a technical name for this kind of mistake, and it predates the CPF case: security theater, the expression coined by the cryptographer Bruce Schneier in the early 2000s to describe public protection rituals whose real function is merely to display that a protection is being carried out. The shoe inspection at airports is the canonical American example. Bars on the windows with the back door unlocked is the generic Brazilian example, and almost any condominium in the country offers its own variation.
The paradigmatic case, though, is a better one: remember when we used to put a PET bottle full of water on top of the electricity meter? It was the national ritual of the 1990s and early 2000s: a full bottle, lying down or standing up, on top of the meter, in the serene faith that it slowed consumption. It didn't. Water has no opinion about the meter. But it worked through another path: we saw the bottle there, every day, and remembered to turn off the living-room light. The ritual was false in physics and true in psychology. It worked by mistake, but it worked.
The asterisk in the Official Gazette is a PET bottle without even the reminder effect. Whoever sees the five asterisks doesn't think "I need to protect the servant's data"; they think, at best, "ah, anonymization", and move on to the full name right next to it.
The other 843 Franklin Silveira Baldos and I publicly thank you for hiding the 7, the 6 and the 4 of my CPF right after printing each of our full names.
And whoever produces the act, on the other end, isn't thinking about protection either; they're thinking about formal compliance. Neither side of the publication is being psychologically reminded of anything. The typical Brazilian ritual pays for its technical uselessness with a psychological profit. This one neither pays nor profits.
It isn't security theater. It's theater of security theater.
The self-contradicting handbook
The production of the handbook has its own sociology, and the first absurdity is that there isn't one handbook: there are hundreds. No unified technical guidance came out of the National Data Protection Authority. No general normative instruction came out of the federal government. No directive that the whole public sector could follow came out of any central body. Instead, in every autonomous agency, every court, every state secretariat, every professional council, every public university, a data-governance committee of its own was formed, with people from legal, from the chief of staff's office, from IT and from communications. Each of these committees meets. Each produces, in some quarter, a document titled, with discreet local variation, Best Practices for Anonymization of Personal Data in Administrative Acts. It runs between four and twelve pages and bears the body's coat of arms, some grounding in the LGPD, and a final section with masking examples. The invariably recommended example is ***.XXX.XXX-**. The handbook is approved by ordinance. The ordinance is published in the Official Gazette. In that same Official Gazette, three pages later, someone's retirement act appears with the full name and the chopped CPF.
Hundreds of independent committees, in parallel, over years, worked to arrive at the same wrong answer.
The kind of institutional productivity only Brazil can pull off.
A small, low-risk appeal to credentials: my master's thesis was on administrative transparency. It's not a noble title; at most, it authorizes a technically qualified irritation with the normative PET bottle.
There's a detail that makes the thing even more elegant. The handbook's authors (legal, the chief of staff's office, IT) are exactly the people with full access to the body's databases. They themselves constitute the set against which the anonymization of the CPF in the publication would, in theory, be a defense. They are the internal Robsons, with the difference that they have credentials. The ritual is being executed, in significant part, by the very actors against whom it would appear to protect, and in practice it has never protected, because nobody needs a chopped CPF when they have a login to the system. The handbook is not a security policy. It is a performance of compliance, written by the very actors who would render it ineffective, addressed to an external adversary who does not exist.
To measure the depth of the reflex, I asked a commercial language model for editorial feedback on this essay. The poor thing, trained on terabytes of Brazilian public text post-2018, recommended, with the best intentions, that I anonymize the opening example, because citing a real name next to a partially masked CPF could, according to it, expose the specific person. The handbook has contaminated even the synthetic reader. It left Porto Velho, crossed the Equator, ended up in training data on some server in California, and came back intact in the form of well-meaning editorial advice. The ritual found a way to propagate itself even without committees.
What the LGPD actually says
The LGPD defined anonymization in art. 5, item XI, with words that don't admit the Brazilian use of the term:
Anonymization: the use of reasonable and available technical means at the time of processing, by which a datum loses the possibility of association, directly or indirectly, with an individual.
A thousand candidates crossed with full name, position, posting and two decades of indexed Official Gazette do not constitute a datum that has lost the possibility of association. Robson is not an unreasonable technical means. He's a gas-station tech with Python. The legal definition of anonymization is generously broad, and even so the Brazilian practice doesn't fit inside it.
The verb in the definition is specific: loses the possibility of association. Doesn't make it harder. Doesn't make it more expensive. Doesn't discourage the curious. Loses. The LGPD adopted a binary definition: either the datum was in fact disconnected from the subject, or it wasn't. There is no intermediate regime, there is no half-anonymization. Tricks that leave reidentification trivial for any Robson don't meet the legal standard; they don't even try. From the privacy side, then, the chop has nothing to stand on.
That leaves examining it from the opposite side: transparency. The LGPD provides, in art. 23, a specific legal basis for the processing of personal data by public authorities, articulated with the Access to Information Law, whose art. 8 defines the catalog of active transparency: salaries, personnel acts, contracts. The Constitution, in art. 37, caput, makes publicity a guiding principle of public administration. The Supreme Federal Court, in ARE 652.777 of 2015, decided that the nominal disclosure of civil servants' salaries is a legitimate consequence of that principle. The legal system, in other words, has already made its choice in favor of transparency for civil-servant administrative acts, and the chop of the CPF operates below that choice, raising the cost of verification for those who should be able to verify. It doesn't anonymize because it can't. It gets in the way because the full name right next to it invites a verification that the chop makes harder for no reason. It does the worst of both worlds, and does it firmly.
The missing mens legis
The LGPD was not conceived in Brazil. It is, to a large extent, the Brazilian cousin of the European General Data Protection Regulation, the GDPR, adopted in 2016 and applicable since 2018. The GDPR did not come from a legislative vacuum: it came, in considerable part, from the political response to the growing perception, throughout the 2010s, that some companies were concentrating disproportionate informational power. The Cambridge Analytica scandal, in 2018, gave a name and a face to that perception: Facebook revealed that the data of up to eighty-seven million users had been exposed to a political consulting firm that used them for electoral microtargeting, in an episode that ran through the Brexit campaign and the 2016 American election. The GDPR's legislative work was already done before the scandal; Cambridge Analytica gave a popular name to what was being regulated. The LGPD, two years after the GDPR, reflected the same motivation.
What happened on the way from the law to the handbook is a form of transference. The companies that originated the concern keep operating essentially as they operated. Systemic leaks cross the Brazilian landscape without provoking a proportional institutional response. Serasa leaked some two hundred and twenty million CPFs in 2021. INSS records have appeared on forums for years. The telemarketer who calls during our lunch break knows the exact value of our last bill, and we've given up asking how he knows. The LGPD exists while all of this happens. But the part of the LGPD that actually bites (the part that generates committees, handbooks, training sessions, internal disciplinary actions, removal of useful information from public databases) is the part that squeezes the least dangerous agent in the system: the front-desk servant, the academic researcher, the local journalist, the citizen overseer.
Whoever wrote the LGPD was thinking about Mark Zuckerberg. Whoever applies the LGPD is thinking about Dona Maria.
It isn't necessary to attribute systemic bad faith to anyone for this to happen, and I don't. The ritual survives on its own, by a combination of institutional risk aversion, the fragmentation of the state's technical capacity, and the administrative inertia that prefers a demonstrable formal protection to a substantive protection that's hard to display. The handbook is displayable. Internal segregation of duties isn't. The asterisk is the visible mark of compliance, and that's why it multiplied.
```mermaid
flowchart TD
Q["Whom the partial<br/>asterisk doesn't stop"]
P["Whom the partial<br/>asterisk stops"]
Q --> BT["Big Tech / data brokers"]
Q --> H["hacker from Araraquara"]
Q --> R["Robson"]
P --> DM["Dona Maria"]
```
The honest alternative
The honest technical path for civil-servant administrative acts is simple and old. For each datum, either you publish by name what the Constitution wants public (name, position, posting, legal grounds, value of the benefits) and accept that oversight is, in part, popular; or you actually protect what needs to be protected (health, dependents, banking data, home address) through segregation of duties, access logs by registration number, periodic auditing of internal queries and mechanisms that detect patterns of inappropriate curiosity in database access. The two operations are compatible: the first is publicity, the second is protection. The asterisk in the Official Gazette is neither. It is a third thing, which looks like the second while undoing the first.
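A minimal sketch of the last mechanism, assuming a hypothetical access log in which every internal query is tagged with the servant's registration number, and every servant has a set of records assigned to their caseload. The names `Access` and `flag_curiosity`, and the threshold of three, are illustrative and not taken from any real system. It flags servants who open more records outside their caseload than the threshold allows.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    employee: str  # registration number of the servant who ran the query
    record: str    # identifier of the record that was opened


def flag_curiosity(
    log: list[Access], assigned: dict[str, set[str]], threshold: int = 3
) -> dict[str, int]:
    """Return {employee: count} for employees whose accesses to records
    outside their own caseload exceed `threshold`."""
    outside = Counter(
        a.employee for a in log if a.record not in assigned.get(a.employee, set())
    )
    return {emp: n for emp, n in outside.items() if n > threshold}


# Hypothetical example data.
assigned = {"1001": {"r1", "r2"}, "2002": {"r3"}}
log = [Access("1001", "r1"), Access("1001", "r2"), Access("2002", "r3")]
log += [Access("2002", f"neighbor-{i}") for i in range(4)]  # four off-caseload lookups

print(flag_curiosity(log, assigned))  # {'2002': 4}
```

The control sits where the data lives, on the internal query path, rather than on the published page.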
The asterisk in the Official Gazette doesn't hide a person. It hides who is allowed to look at her. Robson is looking.
Further reading
- Law no. 13.709/2018 (LGPD), art. 5, XI: the legal definition of anonymization that Brazilian practice fails to meet.
- Latanya Sweeney, k-Anonymity: A Model for Protecting Privacy (2002): the canonical paper, with the finding that three demographic attributes uniquely identify roughly 87% of American citizens.
- Arvind Narayanan and Vitaly Shmatikov, Robust De-anonymization of Large Sparse Datasets (2008): the Netflix Prize, empirical proof that "anonymized" datasets frequently are not.
- Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization (UCLA Law Review, 2010): the American legal essay against the illusion of perfect anonymization.
- Bruce Schneier, Beyond Fear (2003): the book in which the expression security theater first appears, and the systematization of what is real vs. performative protection.
- STF, ARE 652.777/SP (2015): the nominal disclosure of civil servants' salaries as a consequence of the constitutional principle of publicity.
- Law no. 12.527/2011 (LAI), art. 8: active transparency as a duty of the State, taking priority over the privacy of the public agent in the exercise of office.
- Wikipedia entry on Walter Delgatti Neto: the hacker from Araraquara as documentary character; the average Brazilian technical ceiling has a name, an address, a criminal record and an ankle monitor.
- Jorge Luis Borges, Funes el memorioso: on what happens when the database doesn't forget.
Footnotes
1. The reader who clicked this footnote is probably also the reader who would write the fifteen lines of Python. The CPF's two check digits are defined as follows: given the nine-digit prefix $d_1 \dots d_9$, you compute the weighted sum $s_1 = 10 d_1 + 9 d_2 + 8 d_3 + \dots + 2 d_9$; the tenth digit $D_1$ is $(s_1 \cdot 10) \bmod 11$, with the convention that the result becomes 0 if it equals 10. The eleventh, $D_2$, is defined analogously, with weights from 11 down to 2 applied to $d_1 \dots d_9$ and the freshly computed $D_1$. The operation is deterministic and cheap. It runs silently inside any system that validates a CPF (banks, tax returns, forms) and has done so for decades. Hiding the last two digits is like hiding the result of a sum whose every term is in plain sight.
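A worked instance of that rule, on the illustrative prefix 111.444.777 (chosen only because the arithmetic is easy to follow), writing $s_2$ for the second weighted sum:

$$
\begin{aligned}
s_1 &= 10\cdot1 + 9\cdot1 + 8\cdot1 + 7\cdot4 + 6\cdot4 + 5\cdot4 + 4\cdot7 + 3\cdot7 + 2\cdot7 = 162,\\
D_1 &= (162\cdot10) \bmod 11 = 1620 \bmod 11 = 3,\\
s_2 &= 11\cdot1 + 10\cdot1 + 9\cdot1 + 8\cdot4 + 7\cdot4 + 6\cdot4 + 5\cdot7 + 4\cdot7 + 3\cdot7 + 2\cdot3 = 204,\\
D_2 &= (204\cdot10) \bmod 11 = 2040 \bmod 11 = 5.
\end{aligned}
$$

So the complete number is 111.444.777-35: nothing in the last two digits was not already in the first nine.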