In the previous two posts we looked at a randomization scheme for protecting the privacy of a binary response. This post will look briefly at adding noise to continuous or unbounded data. I like to keep the posts here fairly short, but this topic is fairly technical. To keep it short I’ll omit some of the details and give more of an intuitive overview.
Differential privacy
The idea of differential privacy is to guarantee bounds on how much information may be revealed by someone’s participation in a database. These bounds are described by two numbers, ε (epsilon) and δ (delta). We’re primarily interested in the multiplicative bound described by ε. This number is roughly the number of bits of information an analyst might gain regarding an individual. (The multiplicative bound is exp(ε) and so ε, the natural log of the multiplicative bound, would be the information measure, though technically in nats rather than bits since we’re using natural logs rather than logs base 2.)
The δ term is added to the multiplicative bound. Ideally δ is 0, that is, we’d prefer (ε, 0)-differential privacy, but sometimes we have to settle for (ε, δ)-differential privacy. Roughly speaking, the δ term represents the possibility that a few individuals may stand to lose more privacy than the rest, that the multiplicative bound doesn’t apply to everyone. If δ is very small, this risk is very small.
Update: See a detailed introduction to differential privacy here.
Laplace mechanism
The Laplace distribution is also known as the double exponential distribution because its distribution function looks like the exponential distribution function with a copy reflected about the y-axis; these two exponential curves join at the origin to create a sort of circus tent shape. The absolute value of a Laplace random variable is an exponential random variable.
Why are we interested this particular distribution? Because we’re interested in multiplicative bounds, and so it’s not too surprising that exponential distributions might make the calculations work out because of the way the exponential scales multiplicatively.
The Laplace mechanism adds Laplacian-distributed noise to a function. If Δf is the sensitivity of a function f, a measure of how revealing the function might be, then adding Laplace noise with scale Δf/ε preserves (ε 0)-differential privacy.
Technically, Δf is the l1 sensitivity. We need this detail because the results for Gaussian noise involve l2 sensitivity. This is just a matter of whether we measure sensitivity by the l1 (sum of absolute values) norm or the l2 (root sum of squares) norm.
Gaussian mechanism
The Gaussian mechanism protects privacy by adding randomness with a more familiar normal (Gaussian) distribution. Here the results are a little messier. Let ε be strictly between 0 and 1 and pick δ > 0. Then the Gaussian mechanism is (ε, δ)-differentially private provided the scale of the Gaussian noise satisfies
It’s not surprising that the l2 norm appears in this context since the normal distribution and l2 norm are closely related. It’s also not surprising that a δ term appears: the Laplace distribution is ideally suited to multiplicative bounds but the normal distribution is not.
Fascinating work.