When a little imprecision can help: Case studies from statistical privacy
To sanitize data for disclosure control is to destroy its precision in some way. When done in an explicit or controlled manner, that imprecision can be salvaged to the statistician's benefit. This talk discusses how the imprecision that results from privacy protection may be appropriated to improve our statistical understanding of the data at hand. Two ideas are sketched. The first demonstrates how knowledge of the imprecision can be harnessed to facilitate statistical computation and recover inference in a manner faithful to the downstream task. The second employs the vocabulary of imprecise probabilities to establish analytical limits on key inferential quantities under minimal knowledge or assumptions about the downstream task and the privacy mechanism. Both ideas serve as persuasive arguments for a formal and transparent approach to disclosure control.
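As a concrete, purely illustrative sketch of the first idea (not the speaker's actual method), the snippet below treats differentially private Laplace noise of known scale as a measurement-error model and corrects a naive variance estimate accordingly. All numbers here (epsilon, sensitivity, the data-generating distribution) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed privacy parameters: Laplace noise of scale b = sensitivity / epsilon.
epsilon, sensitivity = 1.0, 1.0
b = sensitivity / epsilon

# Hypothetical confidential attribute, then its privatized release.
true = rng.normal(loc=50.0, scale=5.0, size=10_000)
released = true + rng.laplace(scale=b, size=true.size)

# A naive variance estimate ignores the privacy noise ...
naive_var = released.var(ddof=1)
# ... while a noise-aware estimate subtracts the known Laplace
# noise variance, 2 * b**2, recovering the confidential-data variance.
aware_var = naive_var - 2 * b**2

print(f"true variance     ~ {true.var(ddof=1):.2f}")
print(f"naive estimate    ~ {naive_var:.2f}")
print(f"adjusted estimate ~ {aware_var:.2f}")
```

The point of the toy example is that transparency about the mechanism (here, the noise scale b) is exactly what makes the correction possible; an undisclosed or ad hoc perturbation would leave the bias unfixable.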
This body of work bears witness to the challenges that emerged from the U.S. Census Bureau's revamp of its disclosure avoidance system for the 2020 Decennial Census, and more broadly from efforts to expand data access in support of research and policymaking under modern data governance directives. I conclude with an assessment of strongly quantitative notions of privacy, notably differential privacy, against prevailing qualitative guidelines for confidentiality protection, highlighting their respective benefits and limitations.
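For concreteness, the standard formulation of differential privacy (background, not text from the abstract itself) requires that a randomized mechanism M satisfy, for any pair of datasets D and D' differing in a single record and any measurable set S of outputs:

```latex
% Epsilon-differential privacy: changing one record can shift the
% distribution of released outputs by at most a factor of e^epsilon.
\Pr\bigl[M(D) \in S\bigr] \le e^{\varepsilon} \, \Pr\bigl[M(D') \in S\bigr]
```

Smaller epsilon means stronger protection; the Laplace-noise sketch above satisfies this guarantee when the noise scale is set to sensitivity divided by epsilon.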