US bans differential privacy in Census data
447 points - today at 1:54 PM
It’s a census: it just asks questions.
If you start publishing and weaponizing the data against people with various attributes, they’ll just lie or not answer. And then you are left with worse than nothing: bad data people try to act on.
The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality. I've grown very jaded now seeing all the things taken for granted in this country and lost or degraded recently with a whimper.
*: To be fair, they sent me specifically to places that didn't respond, so I was naturally led to believe that everyone in my region hated the government, ignored bizarrely threatening fliers, or had recently moved and had no knowledge of the inhabitants (if any) during the census period.
"What is your religious affiliation?" Seems perfectly innocuous, but it turned out to be retroactively fatal if your answer could be attributed to you by a certain foreign occupier in the 1940s.
I think a large amount of the US’s success is the result of good institutions handling granular data. Policies can be adjusted to match outcomes more rapidly than otherwise.
I understand why people decide to diminish all state capacity - they feel that governments are populated by their opponents who will use state capacity against them. But as our relative strength wanes, our ability to overcome these forces of inertia does as well. And then our governments become less capable and eventually life starts getting worse.
We don’t need house-level data immediately (except perhaps in order to place census blocks within their appropriate congressional district etc). But there are aggregation units above that level for which we should be using the best information we possibly can.
Differential privacy is absolutely necessary, and the social scientists being unable to reconstruct the data at an individual level is intended. A macroscopic description is rather enough for most purposes, and anything more is asking for a surveillance state.
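To make the "macroscopic description is enough" point concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is not the Census Bureau's actual disclosure avoidance system. The epsilon value and the block/tract/county sizes are made-up assumptions chosen only to show how the same noise matters less as counts grow.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Textbook Laplace mechanism for a counting query (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_relative_error(true_count, epsilon, trials=2000, seed=0):
    """Average |noisy - true| / true over many simulated releases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials / true_count

if __name__ == "__main__":
    # Hypothetical population sizes, for illustration only.
    for label, count in [("block", 30), ("tract", 4_000), ("county", 250_000)]:
        err = mean_relative_error(count, epsilon=0.5)
        print(f"{label:>6} (n={count}): mean relative error {err:.5f}")
```

The absolute noise is the same at every level, so the relative error shrinks roughly in proportion to the size of the count. That is why the disagreement in this thread is mostly about small geographies and small subgroups, not national totals.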
I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".
Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.
Or it's saying that one of these conflicting goals is more valuable than the other, and so shouldn't be sacrificed for it.
> The census bureau decided to adopt differential privacy for the 2020 Census
and:
> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe
so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
this makes it feel like an emotional overblown problem
Seems like something that could be abused to achieve political objectives.
E.g., via some app that instructs respondents to enter a specific answer to a pseudorandomly chosen question.
Of course security would be another question.
Do. The American Community Survey (the randomly selected long-form questionnaire) is dangerously overinvasive.
Why even do a census if you're just going to synthesize random data as the last step?
I was a big fan of differential privacy, but now I think it might be doing more harm than good. I haven't seen a single case where it was applied successfully to a problem where it actually mattered, and it contributed strongly to discrediting and preventing a lot of work on other anonymization techniques, because the research community deemed it the only way to preserve privacy. Showing up with enhancements to k-anonymity or any other noise mechanism not rooted in it was a sure way to get ridiculed and ignored. And it's just not a practical mechanism: even when it works for a single disclosure, you always end up having to blow up the privacy budget to a ridiculous amount in order to keep disclosing statistics, as otherwise, for almost all real-world data, you would run out of budget after a few publications.
So, for me it's a technique that works in the areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in other areas where it would actually matter (publishing fine-grained data about individuals or small groups). There are some niche use cases but in my view the privacy community has really overblown the importance of differential privacy by portraying it as the only way to reliably anonymize data.
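The budget complaint can be illustrated with basic sequential composition, under which the epsilons of successive releases simply add up. The sketch below is an assumption-laden toy: `PrivacyBudget` and the helper functions are hypothetical names, the numbers are arbitrary, and real deployments often use tighter accounting than plain addition.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Spend epsilon on one release; refuse once the total is exceeded."""
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent

def releases_until_exhausted(total_epsilon, per_release_epsilon):
    """How many releases fit if each one costs per_release_epsilon."""
    budget = PrivacyBudget(total_epsilon)
    count = 0
    while True:
        try:
            budget.charge(per_release_epsilon)
        except RuntimeError:
            return count
        count += 1

def noise_scale_per_release(total_epsilon, num_releases):
    """Laplace scale for a sensitivity-1 query when a fixed total budget
    is split evenly across num_releases publications."""
    return num_releases / total_epsilon

if __name__ == "__main__":
    print(releases_until_exhausted(1.0, 0.25))
    for k in (1, 10, 100):
        print(k, "releases -> noise scale", noise_scale_per_release(1.0, k))
```

Either the number of publications is capped, or each one gets noisier as the budget is split further, or the total epsilon is raised. The last option is the "blow up the budget" outcome described above.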
BTW the German census bureau has an interesting approach to anonymization that it has used for several decades, and so far I haven't heard of any cases of successful de-anonymization of the data. Maybe the US bureau should have a look at that for its own needs.
It must therefore be maximally transparent. Do you want President Trump or Palantir to decide on the "noise infusion" algorithm?
* I want to accurately report the finances of our company to the best of my ability.
* But that report would allow people to reconstruct private data about the terms of our contracts with various counterparties. I'd really like to avoid that, there's no rule that says we're supposed to release that data. In fact some of those contracts probably came with nondisclosure agreements!
* So here's what I'm going to do. I'm going to calculate our results to the best of my ability, and then I'm going to add random values to them and report only the randomized ones. Any reconstruction people try to do will be wrong because of the randomness.
* If the SEC says "no, you need to report your actual numbers", I will explain to them that there's no such thing as an actual number because all data is noisy.
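The reconstruction worry in the list above is essentially a differencing attack: two exact aggregates that differ by one party reveal that party's value. A minimal sketch follows, with invented contract values and an arbitrary noise scale that is not calibrated to any formal privacy guarantee.

```python
import random

# Hypothetical contract values, invented purely for illustration.
CONTRACTS = {"A": 120, "B": 340, "C": 75, "D": 910}

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def exact_total(names):
    """Publish the true sum over the named counterparties."""
    return sum(CONTRACTS[n] for n in names)

def make_noisy_total(scale, seed):
    """Return a release function that adds fresh noise to every total."""
    rng = random.Random(seed)
    def noisy_total(names):
        return exact_total(names) + laplace_noise(scale, rng)
    return noisy_total

def differencing_attack(release, target):
    """Estimate one party's value from two published aggregates."""
    everyone = sorted(CONTRACTS)
    everyone_else = [n for n in everyone if n != target]
    return release(everyone) - release(everyone_else)

if __name__ == "__main__":
    print("from exact totals:", differencing_attack(exact_total, "D"))
    noisy = make_noisy_total(scale=50.0, seed=1)
    print("from noisy totals:", differencing_attack(noisy, "D"))
```

With exact totals the attack recovers the target's value exactly. With noisy totals it returns a perturbed estimate, at the cost of every published figure being slightly off, which is the trade-off the list above is objecting to.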
I can't get behind it.
"...for the next 950 days"
every time you read some politically spiteful news like this, because the next two years are going to become insanely miserable
https://www.statenews.org/government-politics/2026-06-12/ohi...
Representative Joyce Beatty is from Ohio and was instrumental in stopping Trump from illegally renaming the Kennedy Center.
https://www.theatlantic.com/culture/2026/06/kennedy-center-b...
Fundamentally this is public data. If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
There are very few things that the state has data on that should not be made public. Census data is simply not one of those things.
Publishing should be the default for any data, and keeping it unpublished should require substantially good reasons that impact the country as a whole. Frankly, if it isn't detailed national defence plans, I struggle to see any data that should not be public.