Algorithmic Data Privacy, summer semester 22/23

Monday 10:40 in S8

Webpage for a course on data privacy, taught by Pavel Hubáček and Pavel Veselý. Formally, it is offered as the subject Selected Topics in Algorithms II (NTIN111); please enroll there if you want to get credits.

Brief introduction: Datasets containing sensitive personal information can serve as invaluable resources for policy-makers, public institutions, and companies. Yet their analysis or release may jeopardize the privacy of the individuals whose data they contain. Several approaches have been proposed to tackle this issue, such as anonymizing the dataset. However, most of these ad hoc attempts at privacy failed spectacularly, which led to serious leaks of sensitive personal information.

To defend against all possible attacks that extract sensitive information from aggregate statistics, the robust notion of differential privacy was developed in 2006 by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith (the authors received the prestigious 2017 Gödel Prize for this work). Since then, differential privacy has received significant attention and made it into production systems of companies such as Google and Apple.

This course will describe algorithms for analyzing data in a differentially private way. We will also cover lower bounds on differentially private algorithms as well as the main variants of the model.

Prerequisites: Differentially private algorithms are necessarily randomized, and their privacy and utility guarantees are analyzed mathematically. We will thus need a certain level of mathematical maturity and a background in probability. We will briefly review all necessary probability tools, most often without proofs, since they are covered by other courses.

Lectures

Slides (continually updated)
13 Feb P.H.: Intro & some failed attempts at achieving privacy, namely simple deanonymization and k-anonymity (examples: NYC taxi data, Netflix prize, memorization in neural networks, census data, ...). Reconstruction attack due to Dinur and Nissim '03 (a toy reconstruction sketch appears below the lecture list). Based on [G] (the first two lectures) and [DR] (Section 8.1). If you missed the first lecture, it is still fine to join and come to the second one, since we have not started with differential privacy yet.
20 Feb P.H.: Reconstruction attacks (continued) and introduction to differential privacy (DP): the definition of DP, randomized response (sketched below the lecture list), ell_1 sensitivity of functions, and the Laplace mechanism. Based on [G] (lecture 3 and part of lecture 4).
27 Feb P.H.: Laplace mechanism (continued; sketched below the lecture list), general counting versus histogram queries, post-processing property, and group privacy. Based on [G] (lecture 4).
6 Mar P.V.: Quick recap. Proof of basic composition of epsilon-DP (including how to deal with adaptivity of queries/mechanisms). Approximate differential privacy, that is, (epsilon,delta)-DP, a discussion of the choice of delta (the name-and-shame mechanism), and the Gaussian mechanism, with only a part of the proof of (epsilon,delta)-DP; the rest is in the lecture notes [G] (lecture 5). A sketch of the Gaussian mechanism appears below the lecture list. Based on [G] (end of lecture 4 and lecture 5).
13 Mar P.V.: Advanced composition (proof based on Section 3.5 of [DR], but instead of KL divergence and max divergence, we used the privacy loss random variable). Exponential mechanism (definition and an example of how to use it for single-item auctions; a sampling sketch appears below the lecture list), based on [G] (lecture 7).
20 Mar P.V.: Exponential mechanism (the Laplace mechanism as a special case and an example with finding the maximum; proofs of the privacy and utility guarantees), based on [G] (lecture 7). Sparse vector technique, mainly the AboveThreshold algorithm (sketched below the lecture list), based on [G] (a part of lecture 9).
27 Mar P.H.: Private PAC learning and truthful mechanism design using differential privacy, based on [B] (lectures 9 and 10).
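
The lecture entries above refer to a few illustrative Python sketches; they follow here and are informal companions to the lectures, not course materials. First, a toy version of the Dinur–Nissim reconstruction attack: brute-force all candidate binary databases and return one that is consistent, up to error E, with the noisy subset-sum answers. The exact Dinur–Nissim theorem queries all 2^n subsets and guarantees that any such consistent candidate disagrees with the true database on at most 4E positions; the demo below uses only a sample of random subsets, so closeness is typical rather than guaranteed.

```python
import itertools
import random

def reconstruct(n, subsets, answers, E):
    """Return any database in {0,1}^n whose subset sums are within E of all
    noisy answers; by Dinur-Nissim, it must be close to the true database."""
    for cand in itertools.product([0, 1], repeat=n):
        if all(abs(sum(cand[i] for i in S) - a) <= E
               for S, a in zip(subsets, answers)):
            return cand
    return None

# Demo on a tiny random database, with every answer perturbed by at most E.
random.seed(0)
n, E = 10, 1
x = [random.randint(0, 1) for _ in range(n)]
subsets = [[i for i in range(n) if random.random() < 0.5] for _ in range(60)]
answers = [sum(x[i] for i in S) + random.uniform(-E, E) for S in subsets]
print("true:", x)
print("rec: ", reconstruct(n, subsets, answers, E))
```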
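
Randomized response fits in a few lines. A minimal sketch, assuming the classic two-coin variant (which is ln(3)-DP): each respondent tells the truth with probability 3/4 and lies with probability 1/4, and the aggregate can then be debiased.

```python
import random

def randomized_response(true_bit: bool) -> bool:
    """Flip a coin: on heads answer truthfully, on tails answer with a fresh
    uniformly random bit. Each output is ln(3)-differentially private."""
    if random.random() < 0.5:
        return true_bit
    return random.random() < 0.5

def estimate_fraction(responses):
    """Debias: P[yes] = 1/4 + p/2, so p = 2 * (observed fraction - 1/4)."""
    f = sum(responses) / len(responses)
    return 2 * (f - 0.25)
```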
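
The Laplace mechanism itself is essentially a one-liner. A sketch, assuming a real-valued query with known ell_1 sensitivity: adding Lap(sensitivity/epsilon) noise yields epsilon-DP.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release true_answer + Lap(sensitivity / epsilon); this is epsilon-DP
    for any query whose ell_1 sensitivity is at most `sensitivity`."""
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many rows satisfy a predicate?") has sensitivity 1:
noisy_count = laplace_mechanism(true_answer=1234, sensitivity=1.0, epsilon=0.1)
```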
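
For the Gaussian mechanism, a sketch using the classic calibration sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon from [DR], which gives (epsilon,delta)-DP for epsilon < 1; the query is assumed to be vector-valued with ell_2 sensitivity Delta_2.

```python
import numpy as np

def gaussian_mechanism(true_answer, l2_sensitivity, epsilon, delta):
    """Add N(0, sigma^2) noise to each coordinate, with sigma calibrated so
    that the result is (epsilon, delta)-DP (classic analysis, epsilon < 1)."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    answer = np.asarray(true_answer, dtype=float)
    return answer + np.random.normal(loc=0.0, scale=sigma, size=answer.shape)
```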
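
The exponential mechanism, sketched as weighted sampling over a finite candidate set: the weight exp(epsilon * score / (2 * Delta)) uses the sensitivity Delta of the score function, and subtracting the maximum logit before exponentiating is only for numerical stability.

```python
import numpy as np

def exponential_mechanism(candidates, scores, score_sensitivity, epsilon):
    """Sample a candidate with probability proportional to
    exp(epsilon * score / (2 * score_sensitivity)); this is epsilon-DP."""
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * score_sensitivity)
    logits -= logits.max()              # numerical stability only
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[np.random.choice(len(candidates), p=probs)]
```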
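
Finally, AboveThreshold, following the version in [DR] with Lap(2/epsilon) noise on the threshold and Lap(4/epsilon) on each query answer, assuming sensitivity-1 queries. Halting after the first success is what makes the whole run epsilon-DP regardless of how many below-threshold queries are processed.

```python
import numpy as np

def above_threshold(queries, database, threshold, epsilon):
    """Report the index of the first query whose answer noisily exceeds the
    threshold, then halt; queries are assumed to have sensitivity 1."""
    noisy_threshold = threshold + np.random.laplace(scale=2.0 / epsilon)
    for i, q in enumerate(queries):
        if q(database) + np.random.laplace(scale=4.0 / epsilon) >= noisy_threshold:
            return i
    return None  # no query exceeded the noisy threshold
```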

Literature and online resources

[Figure: Illustration of the Laplace additive noise mechanism for enforcing differential privacy]
[Figure: Illustration of the Gaussian additive noise mechanism for enforcing differential privacy]

Image source: Laplace and Gaussian