US bans differential privacy in Census data

447 points - today at 1:54 PM

Comments

asolove today at 3:22 PM

The replies here arguing we should publish it all are wild in the worst kind of first-order thinking way.

It’s a census: it just asks questions.

If you start publishing and weaponizing the data against people with various attributes, they’ll just lie or not answer. And then you are left with worse than nothing: bad data people try to act on.

kajman today at 4:53 PM

I "enumerated" for the last census. Trust in my community was already not high* and I had lots of interesting encounters. I really believed the rather invasive data I was collecting with a friendly face would be used and handled responsibly. I feel for the poor souls who'll sign up to go door to door for 2030 now that the firewalls against weaponizing and monetizing all of our sensitive government data have been torn down, and even more for those who will volunteer information that can hurt them.

The comments saying this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be good for its future quality. I've grown very jaded now, seeing so many things taken for granted in this country lost or degraded recently with a whimper.

*: To be fair, they sent me specifically to places that didn't respond, so I was naturally led to believe that everyone in my region hated the government, ignored bizarrely threatening fliers, or had recently moved and had no knowledge of the inhabitants (if any) during the census period.

Kim_Bruning today at 3:52 PM

Coming from a certain European country, you never know what answer on the census might get you into trouble.

"What is your religious affiliation?" Seems perfectly innocuous, but it turned out to be retroactively fatal if a certain foreign occupier in the 1940s could attribute your answer to you.

arjie today at 4:14 PM

Pretty sad, in my opinion. In my ideal world, the state should have visibility into the shape of the people present so that we can make good decisions about our combined organization. I think we’re making a mistake we will come to regret by intentionally damaging our data collection infrastructure.

I think a large amount of the US’s success is the result of good institutions handling granular data. Policies can be adjusted to match outcomes more rapidly than otherwise.

I understand why people decide to diminish all state capacity - they feel that governments are populated by their opponents who will use state capacity against them. But as our relative strength wanes, our ability to overcome these forces of inertia does as well. And then our governments become less capable and eventually life starts getting worse.

We don’t need house-level data immediately (except perhaps to place census blocks within their appropriate congressional districts, etc.). But above a certain level of aggregation, we should be using the best information we possibly can.

MinimalAction today at 3:51 PM

Whatever you do, there is a level of trust that is assumed when a census takes place: trust that the data won't be identified in a way that could be targeted by scams, fraud, and other such evils. In NY, for example, house sale records are made public, and, much to homeowners' detriment, many scammers posing as mortgage companies send fake bills for payment.

Differential privacy is absolutely necessary, and social scientists being unable to reconstruct the data at an individual level is intended. A macroscopic description is enough for most purposes, and anything more is asking for a surveillance state.

jmole today at 3:23 PM

Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
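One possible reading of "add it to the analysis", sketched in Python under our own assumptions: the raw records stay private, and noise is added only when someone asks a question, with the noise distribution passed in as a parameter. The names `noisy_count`, `laplace_noise`, and `gaussian_noise` and the sample records are hypothetical; the Gaussian scale uses the classic (ε, δ) calibration, which is only valid for 0 < ε < 1.

```python
import math
import random
from typing import Callable, Sequence

Record = dict  # hypothetical private record: attribute name -> value

def laplace_noise(sensitivity: float, epsilon: float) -> float:
    """Laplace(0, sensitivity/epsilon), sampled as a difference of two exponentials."""
    scale = sensitivity / epsilon
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def gaussian_noise(sensitivity: float, epsilon: float, delta: float = 1e-5) -> float:
    """Gaussian noise with the classic (epsilon, delta) calibration (assumes 0 < epsilon < 1)."""
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon
    return random.gauss(0.0, sigma)

def noisy_count(records: Sequence[Record],
                predicate: Callable[[Record], bool],
                epsilon: float,
                noise: Callable[[float, float], float] = laplace_noise) -> float:
    """Answer a counting query with noise; the raw records never leave this function."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1).
    return true_count + noise(1.0, epsilon)

records = [{"household_size": 3}, {"household_size": 1}, {"household_size": 5}]
print(noisy_count(records, lambda r: r["household_size"] > 2, epsilon=0.5))
print(noisy_count(records, lambda r: r["household_size"] > 2, epsilon=0.5,
                  noise=gaussian_noise))
```

The design choice here is that the published artifact is an answer, not a dataset, so each analyst's question can carry whatever noise flavor and budget the data holder allows.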

I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".

Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.

tbrownaw today at 3:22 PM

> Differential privacy makes this trade-off explicit, and thus impossible to ignore. Maybe banning it is a way of pretending that the problem doesn't exist, in the hope that it will go away?

Or it's saying that one of these conflicting goals is more valuable than the other, and so shouldn't be sacrificed for it.

foolfoolz today at 4:10 PM

i have such a hard time reconciling stuff like this:

> The census bureau decided to adopt differential privacy for the 2020 Census

and:

> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe

so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.

this makes it feel like an emotional overblown problem

iugtmkbdfil834 today at 4:29 PM

Can anyone explain to me the previous state and why it was desirable? I admittedly do not understand why people are getting riled up. I am not being difficult. I really don't understand the original state and the changed state here.

sherburt3 today at 5:49 PM

So "differential privacy" pretty much sounds like someone gets to modify the results of a census and how it gets modified is entirely up to their discretion.

Seems like something that could be abused to achieve political objectives.

Kim_Bruning today at 4:20 PM

https://www.npr.org/2026/06/12/nx-s1-5855734/census-bureau-d...

thih9 today at 4:33 PM

I guess this could be implemented externally.

E.g., via some app that instructs respondents to enter a specific answer in a pseudorandomly chosen question.

Of course security would be another question.
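What's described here resembles randomized response, a long-standing survey technique for local privacy. A minimal Python sketch for a single yes/no question, assuming each respondent's device flips a private coin: with probability `p_truth` it reports the true answer, otherwise it reports a fair coin flip. No single report can be trusted, but the aggregate can be debiased. The function names and the 0.75 default are illustrative assumptions, not any agency's design.

```python
import random

def randomized_response(true_answer: bool, p_truth: float = 0.75) -> bool:
    """Report the truth with probability p_truth, otherwise a fair coin flip."""
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert P(report yes) = p_truth * rate + (1 - p_truth) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth

if __name__ == "__main__":
    random.seed(0)
    true_rate = 0.30  # hypothetical share of true "yes" answers
    population = [random.random() < true_rate for _ in range(100_000)]
    reports = [randomized_response(answer) for answer in population]
    print(f"raw share of 'yes' reports: {sum(reports) / len(reports):.3f}")
    print(f"debiased estimate:          {estimate_true_rate(reports):.3f}")
```

With `p_truth = 0.75`, a true "yes" is reported as "yes" with probability 0.875 and a true "no" with probability 0.125, so the raw share sits near 0.35 and the debiased estimate near 0.30. That likelihood ratio of 7 corresponds to local differential privacy with ε = ln 7 ≈ 1.95.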

ProllyInfamous today at 4:20 PM

The fines for non-compliance are low enough that you can afford to remain silent.

Do so. The American Community Survey (a randomly selected long-form questionnaire) is dangerously overinvasive.

Bratmon today at 6:42 PM

This is a rare occasion of the Trump administration getting something right.

Why even do a census if you're just going to synthesize random data as the last step?

0xbadcafebee today at 6:19 PM

Any privacy-diminishing changes at the federal level during this administration are for one reason only: to amass more power for Conservative administration/governance. At the federal level it's Project 2025; at the state level it's making sure states stay red and disenfranchising minorities.

watersb today at 3:30 PM

The better to sell the data, all your privates are belong to us.

mikelitoris today at 5:19 PM

But why?? Differential privacy works? It's not even "woke" or whatever these people perceive. It's just math, man...

lokar today at 3:55 PM

Can anyone share how other countries handle this?

ThePhysicist today at 4:35 PM

I think it should be noted that, as far as I know, there was a lot of dissatisfaction among users of the census data. So it wasn't banned just for political reasons or because they hate privacy... Some people I talked to in the privacy field even called the whole thing a total disaster and weren't shy about blaming John Abowd, who apparently pushed it through despite a lot of internal opposition and concerns. I'm not sure that's true, but what is definitely true is that the way the data was released caused serious downstream problems, because most researchers and statisticians who ingested the data weren't prepared to receive noisy values.

Differential privacy was applied in a way that didn't preserve many invariants data users cared about. That was expected, since you can't preserve every invariant and add meaningful noise to the data at the same time. The thing is, with a differentially private release like this, you need to adapt all of your downstream analyses to account for the exact mechanism by which the data was altered. And since the Census Bureau used a very intricate mechanism that didn't just add Laplace noise to data values, but instead relied on a multi-stage process that preserved some invariants and not others, it was very difficult even to write routines that accounted for the changes. They essentially asked every data user to rewrite their whole analysis pipeline around the exact disclosure mechanism, which involved a large number of bespoke choices about which invariants to preserve and produced a mix of noisy, synthesized data that was really hard to reason about.

I don't even know if there would've been a better way to do this, but the fact is that not every small county or school district has top-tier statisticians on hand who can read a whole monograph on differentially private synthesized census data and then hotpatch their existing analysis systems to work with it.
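To make the invariant problem concrete, here is a deliberately naive Python sketch, not the Census Bureau's actual TopDown Algorithm, which is far more intricate: add independent Laplace noise to a few block-level counts and to their county total. The block names and populations are made up. The noisy blocks generally no longer sum to the noisy total, and individual counts can come out fractional or negative, which downstream pipelines often assume cannot happen.

```python
import random

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_counts(counts: dict[str, int], epsilon: float) -> dict[str, float]:
    """Add independent Laplace(1/epsilon) noise to each count (sensitivity 1 per count)."""
    return {name: c + laplace(1 / epsilon) for name, c in counts.items()}

# Hypothetical block populations inside one county.
blocks = {"block_a": 412, "block_b": 3, "block_c": 0, "block_d": 1187}
county_total = sum(blocks.values())

random.seed(42)
epsilon = 0.5
# Disjoint blocks cost epsilon together; the overlapping total costs another epsilon.
noisy_blocks = noisy_counts(blocks, epsilon)
noisy_total = county_total + laplace(1 / epsilon)

for name, value in noisy_blocks.items():
    print(f"{name}: true={blocks[name]:5d} noisy={value:8.2f}")
print(f"sum of noisy blocks: {sum(noisy_blocks.values()):.2f}")
print(f"noisy county total:  {noisy_total:.2f}")  # generally differs from the sum above
```

Post-processing such as rounding, clamping negatives to zero, or rescaling the blocks to match the total does not weaken the privacy guarantee, but each such step changes the error pattern that downstream analyses then have to model.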

I was a big fan of differential privacy, but now I think it might be doing more harm than good. I haven't seen a single case where it was applied successfully to a problem where it actually mattered, and it contributed strongly to discrediting and preventing a lot of work on other anonymization techniques, because the research community deemed it the only way to preserve privacy. Showing up with enhancements to k-anonymity, or any other noise mechanism not rooted in it, was a sure way to get ridiculed and ignored. And it's just not a practical mechanism: even when it works for a single disclosure, you always end up having to blow the privacy budget up to a ridiculous amount to keep disclosing statistics, because otherwise, for almost all real-world data, you would run out of budget after a few publications.
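The budget arithmetic can be sketched under basic sequential composition, the simplest accounting rule: separate releases at ε₁, ε₂, … cost ε₁ + ε₂ + … in total. If a fixed total budget is split evenly across k releases, each release gets ε/k, so the Laplace noise scale for a count grows to k/ε. The total budget of 1.0 below is an arbitrary assumption chosen for illustration.

```python
def laplace_scale_per_release(total_epsilon: float, releases: int,
                              sensitivity: float = 1.0) -> float:
    """Noise scale per release when total_epsilon is split evenly (basic composition)."""
    per_release_epsilon = total_epsilon / releases
    return sensitivity / per_release_epsilon

def total_spent(epsilons: list[float]) -> float:
    """Basic sequential composition: the privacy losses of separate releases add up."""
    return sum(epsilons)

if __name__ == "__main__":
    total_budget = 1.0  # arbitrary illustrative budget
    for k in (1, 5, 20, 100):
        scale = laplace_scale_per_release(total_budget, k)
        # For Laplace(0, b), the mean absolute error is exactly b.
        print(f"{k:3d} releases -> per-release epsilon {total_budget / k:.3f}, "
              f"mean absolute error per count ~ {scale:.0f}")
```

Tighter accounting methods exist (advanced composition, Rényi or zero-concentrated DP accounting) and slow this growth, but they do not remove it, which is the pressure toward ever-larger budgets described above.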

So, for me, it's a technique that works in areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in areas where it actually would matter (publishing fine-grained data about individuals or small groups). There are some niche use cases, but in my view the privacy community has really overstated the importance of differential privacy by portraying it as the only way to reliably anonymize data.

BTW, the German census bureau has an interesting approach to anonymization that it has used for several decades, and so far I haven't heard of any successful de-anonymization of its data. Maybe the US bureau should have a look at that for its own needs.

declan_roberts today at 5:55 PM

Census data is extremely powerful. It's why some states lost house seats and why some gained house seats.

It must therefore be maximally transparent. Do you want President Trump or Palantir to decide on the "noise infusion" algorithm?

wnc3141 today at 3:30 PM

Stalin's demographic researchers kept disappearing until they came up with the numbers he wanted.

delichon today at 3:18 PM

The dueling political demands of accuracy and privacy are simply incompatible at some level. After reading this, maybe Hanlon's Razor isn't the right standard. Besides malice and stupidity, there is impossibility. Some problems just aren't solvable under certain constraints. I don't envy the statisticians tasked with finding a politically palatable solution to a math problem.

SpicyLemonZest today at 5:14 PM

I really have to take the anti-noise side here. I get why it's a hard problem, and I get why the Census Bureau thought this was a neat solution. But I'm imagining an accountant stepping through a similar chain of logic:

* I want to accurately report the finances of our company to the best of my ability.

* But that report would allow people to reconstruct private data about the terms of our contracts with various counterparties. I'd really like to avoid that; there's no rule that says we're supposed to release that data. In fact, some of those contracts probably came with nondisclosure agreements!

* So here's what I'm going to do. I'm going to calculate our results to the best of my ability, and then I'm going to add random values to them and report only the randomized ones. Any reconstruction people try to do will be wrong because of the randomness.

* If the SEC says "no, you need to report your actual numbers", I will explain to them that there's no such thing as an actual number because all data is noisy.

I can't get behind it.
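The "reconstruction" risk in this analogy is easiest to see as a classic differencing attack, sketched here in Python with made-up contract values: if exact totals are published both with and without one counterparty, subtracting them reveals that counterparty's figure exactly. Independent noise on each publication turns the difference into a blurred estimate. The names, values, and the noise scale of 50,000 are hypothetical; a real mechanism would calibrate the scale to the largest possible single contribution.

```python
import random
from typing import Optional

# Hypothetical private contract values, keyed by counterparty.
contracts = {"acme": 120_000, "globex": 75_500, "initech": 40_250}

def exact_total(values: dict[str, int], exclude: Optional[str] = None) -> int:
    """Sum of all values, optionally leaving one counterparty out."""
    return sum(v for name, v in values.items() if name != exclude)

def noisy_total(values: dict[str, int], exclude: Optional[str] = None,
                scale: float = 50_000.0) -> float:
    """Exact total plus Laplace(0, scale) noise; the scale is an arbitrary illustration."""
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return exact_total(values, exclude) + noise

# Two exact publications leak globex's value precisely.
leaked = exact_total(contracts) - exact_total(contracts, exclude="globex")
print("differencing exact totals:", leaked)

# With independent noise on each publication, the difference is only a noisy estimate.
random.seed(7)
guess = noisy_total(contracts) - noisy_total(contracts, exclude="globex")
print(f"differencing noisy totals: {guess:,.0f}")
```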

ck2 today at 4:32 PM

if you want to keep your sanity, I suggest silently adding the phrase

     "...for the next 950 days"

every time you read some politically spiteful news like this

because the next two years are going to become insanely miserable

zkzk_gamal today at 4:35 PM

I think they will use AI as leverage over other countries, to give them orders.

yegortk today at 4:41 PM

Data shall set you free... or not

xenophonf today at 3:20 PM

This is a gift to reactionary gerrymandering and voting restriction efforts, along with things like yesterday's FBI raid of an Ohio voting rights organization.

https://www.statenews.org/government-politics/2026-06-12/ohi...

Representative Joyce Beatty is from Ohio and was instrumental in stopping Trump from illegally renaming the Kennedy Center.

https://www.theatlantic.com/culture/2026/06/kennedy-center-b...

Pragmata today at 3:14 PM

Frankly, I see no reason to keep this data private. They should simply publish the full census dataset, with no data coarsening, differential privacy, etc.

Fundamentally, this is public data. If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.

There are very few things that the state has data on that should not be made public. Census data is simply not one of those things.

Publishing should be the default for any data, and keeping it unpublished should require substantially good reasons that affect the country as a whole. Frankly, unless it's detailed national defence plans, I struggle to see any data that should not be public.

whatever1 today at 3:14 PM

We can make them more accurate by leveraging ICE going door to door.