Differentially Private Gaussian Processes

Differentially Private
Gaussian Processes

Presented by Mike Smith
m.t.smith@sheffield.ac.uk

Differential Privacy

A Quick Introduction

See The Algorithmic Foundations of Differential Privacy by Dwork and Roth for a rigorous introduction to the framework.

Anonymity Gone Wrong

In the mid-1990s the Massachusetts Group Insurance Commission released 'anonymised' health records for state employees.

Anonymity Gone Wrong

The Governor assured the public that GIC had protected patient privacy by deleting identifiers.
Broken Promises of Privacy, Paul Ohm

Anonymity Gone Wrong

The data was 'anonymised' by removing names. Other identifying columns remained.

Anonymity Gone Wrong

Latanya Sweeney used the voter rolls to find the Governor's medical records...

...which she posted to him.

This is a linkage attack: it uses auxiliary information to compromise privacy in a database.

Example

We might want to find out whether people hold a controversial view, that they wouldn't normally express:

The trams were actually a good idea

Many people in this room might secretly agree, but maybe wouldn't like to say 'Yes' to this question.

If flips are different, tell the truth.

If they're the same...

say

YES for heads

and

NO for tails

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	50	38	27	19	13	9	6	4	3	2	1	1	1	0	0	0	0	0	0	0
1	63	50	38	28	20	14	10	7	4	3	2	1	1	1	0	0	0	0	0	0
2	73	62	50	39	29	21	15	11	7	5	3	2	2	1	1	1	0	0	0	0
3	81	72	62	50	40	30	22	16	11	8	6	4	3	2	1	1	1	0	0	0
4	87	80	72	61	50	40	31	23	17	12	9	6	4	3	2	1	1	1	0	0
5	91	86	79	71	61	51	41	32	24	18	13	10	7	5	3	2	2	1	1	1
6	94	91	85	79	70	61	51	41	32	25	19	14	10	8	5	4	3	2	1	1
7	96	94	90	85	78	69	60	51	42	33	26	20	15	11	8	6	4	3	2	2
8	97	96	93	89	84	77	69	60	51	42	34	27	21	16	12	9	7	5	3	3
9	98	97	95	92	88	83	76	68	60	51	42	35	28	22	17	13	10	7	5	4
10	99	98	97	95	92	87	82	75	68	59	51	43	35	28	23	18	14	10	8	6
11	99	99	98	96	94	91	87	81	75	67	59	51	43	36	29	23	18	14	11	8
12	99	99	98	97	96	94	90	86	80	74	67	59	51	43	36	30	24	19	15	12
13	100	99	99	98	97	95	93	90	85	80	73	66	59	51	44	37	30	25	20	16
14	100	100	99	99	98	97	95	92	89	84	79	73	66	58	51	44	37	31	25	21
15	100	100	100	99	99	98	96	94	92	88	84	78	72	65	58	51	44	38	32	26
16	100	100	100	99	99	98	97	96	94	91	87	83	78	71	65	58	51	44	38	32
17	100	100	100	100	99	99	98	97	96	93	91	87	82	77	71	65	58	51	45	38
18	100	100	100	100	100	99	99	98	97	95	93	90	86	82	76	71	64	58	51	45
19	100	100	100	100	100	99	99	99	98	97	95	92	89	86	81	76	70	64	58	51

Summary

We want to protect a user from a linkage attack...

...while still performing inference over the whole group.

Making a dataset private needs more than just erasing names.

To achieve a level of privacy one needs to add randomness to the data.

This is a fundamental feature of differential privacy.

Differential Privacy for GPs

We have a dataset in which the inputs, $X$, are public. The outputs, $\mathbf{y}$, we want to keep private.

Data consists of the heights and weights of 287 women from a census of the !Kung

Vectors and Functions

Hall et al. (2013) showed that one can ensure that a version of $f$, function $\tilde{f}$ is $(\varepsilon, \delta)$-differentially private by adding a scaled sample from a GP prior.

Applied to GPs

We applied this method to the GP posterior.

The covariance of the posterior only depends on the inputs, $X$. So we can compute this without applying DP.

The mean function, $f_D(\mathbf{x_*})$, does depend on $\mathbf{y}$. $f_D(\mathbf{x_*}) = \mathbf{k}_*^\top K^{-1} \mathbf{y}$

We are interested in finding $|| f_D(\mathbf{x_*}) - f_{D^\prime}(\mathbf{x_*}) ||_H^2$

...how much the mean function (in RKHS) can change due to a change in $\mathbf{y}$.

Applied to GPs

Using the representer theorem, we can write $|| f_D(\mathbf{x_*}) - f_{D^\prime}(\mathbf{x_*}) ||_H^2$
as:

$\Big|\Big|\sum_{i=1}^n k(\mathbf{x_*},\mathbf{x}_i) \left(\alpha_i - \alpha^\prime_i\right)\Big|\Big|_H^2$

where $\mathbf{\alpha} - \mathbf{\alpha}^\prime = K^{-1} \left(\mathbf{y} - \mathbf{y}^\prime \right)$

$\Big|\Big|\sum_{i=1}^n k(\mathbf{x_*},\mathbf{x}_i) \left(\alpha_i - \alpha^\prime_i\right)\Big|\Big|_H^2$

where $\mathbf{\alpha} - \mathbf{\alpha}^\prime = K^{-1} \left(\mathbf{y} - \mathbf{y}^\prime \right)$

We constrain the kernel: $-1\leq k \leq 1$ and we only allow one element of $\mathbf{y}$ and $\mathbf{y}'$ to differ (by at most $d$).

So only one column of $K^{-1}$ will be involved in the change of mean (which we are summing over).

The distance above can then be shown to be no greater than $d\;||K^{-1}||_\infty$

Applied to GPs

This 'works' in that it allows DP predictions...

But to avoid too much noise, the value of $\varepsilon$ is too large (here it is 100!)

Inducing Inputs

Using sparse methods (i.e. inducing inputs) can help reduce the sensitivity a little.

Effect of perturbation

Previously I mentioned that the noise is sampled from the GP's prior.

This is not necessarily the most 'efficient' covariance to use.

Effect of perturbation

Cloaking

Left: Ideal covariance. Right: actual covariance

Cloaking

Hall et al. (2013) also presented a bound on vectors.

We need to find a bound ($\Delta$) on the scale of the output change, in term of its Mahalanobis distance with respect to the added noise covariance.

$\sup_{D \sim {D'}} ||M^{-1/2} (\mathbf{y}_D - \mathbf{y}_{D'})||_2 \leq \Delta$

Then use this to add noise to our vector $\mathbf{v}_D$:

$\tilde{\mathbf{v}_D} = \mathbf{v}_D + \frac{\text{c}(\delta)\Delta}{\varepsilon}Z$

Cloaking

Intuitively we want to construct M so that it has greatest covariance in those directions most affected by changes in training points, so that it will be most able to mask those changes.

The posterior mean predictions are,

$\mathbf{y}_* = K_{*f} K^{-1} \mathbf{y}$

The effect of perturbing each training point on each test point is represented in the cloaking matrix, $C = K_{*f} K^{-1}$

Cloaking

We assume we only need protect the change in one training input at a time.

...introduce y-y' earlier

sub into bound

optimisation

Example: House prices

Example: citibike

Also need to do intro re uganda

The go-to book on differential privacy, by Dwork and Roth;
Dwork, Cynthia, and Aaron Roth. "The algorithmic foundations of differential privacy." Theoretical Computer Science 9.3-4 (2013): 211-407. link

I found this paper allowed me to start applying DP to GP;
Hall, Rob, Alessandro Rinaldo, and Larry Wasserman. "Differential privacy for functions and functional data." The Journal of Machine Learning Research 14.1 (2013): 703-727. link

Articles about the Massachusetts privacy debate
Barth-Jones, Daniel C. "The're-identification'of Governor William Weld's medical information: a critical re-examination of health data identification risks and privacy protections, then and now." Then and Now (June 4, 2012) (2012). link

Ohm, Paul. "Broken promises of privacy: Responding to the surprising failure of anonymization." UCLA Law Review 57 (2010): 1701. link

Narayanan, Arvind, and Edward W. Felten. "No silver bullet: De-identification still doesn’t work." White Paper (2014). link

Howell, N. Data from a partial census of the !kung san, dobe. 1967-1969. https://public.tableau. com/profile/john.marriott#!/vizhome/ kung-san/Attributes, 1967.

Images used: BostonGlobe: Mass Mutual, Weld. Harvard: Sweeney. Rich on flickr: Sheffield skyline.

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	50	38	27	19	13	9	6	4	3	2	1	1	1	0	0	0	0	0	0	0
1	63	50	38	28	20	14	10	7	4	3	2	1	1	1	0	0	0	0	0	0
2	73	62	50	39	29	21	15	11	7	5	3	2	2	1	1	1	0	0	0	0
3	81	72	62	50	40	30	22	16	11	8	6	4	3	2	1	1	1	0	0	0
4	87	80	72	61	50	40	31	23	17	12	9	6	4	3	2	1	1	1	0	0
5	91	86	79	71	61	51	41	32	24	18	13	10	7	5	3	2	2	1	1	1
6	94	91	85	79	70	61	51	41	32	25	19	14	10	8	5	4	3	2	1	1
7	96	94	90	85	78	69	60	51	42	33	26	20	15	11	8	6	4	3	2	2
8	97	96	93	89	84	77	69	60	51	42	34	27	21	16	12	9	7	5	3	3
9	98	97	95	92	88	83	76	68	60	51	42	35	28	22	17	13	10	7	5	4
10	99	98	97	95	92	87	82	75	68	59	51	43	35	28	23	18	14	10	8	6
11	99	99	98	96	94	91	87	81	75	67	59	51	43	36	29	23	18	14	11	8
12	99	99	98	97	96	94	90	86	80	74	67	59	51	43	36	30	24	19	15	12
13	100	99	99	98	97	95	93	90	85	80	73	66	59	51	44	37	30	25	20	16
14	100	100	99	99	98	97	95	92	89	84	79	73	66	58	51	44	37	31	25	21
15	100	100	100	99	99	98	96	94	92	88	84	78	72	65	58	51	44	38	32	26
16	100	100	100	99	99	98	97	96	94	91	87	83	78	71	65	58	51	44	38	32
17	100	100	100	100	99	99	98	97	96	93	91	87	82	77	71	65	58	51	45	38
18	100	100	100	100	100	99	99	98	97	95	93	90	86	82	76	71	64	58	51	45
19	100	100	100	100	100	99	99	99	98	97	95	92	89	86	81	76	70	64	58	51

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	50	38	27	19	13	9	6	4	3	2	1	1	1	0	0	0	0	0	0	0
1	63	50	38	28	20	14	10	7	4	3	2	1	1	1	0	0	0	0	0	0
2	73	62	50	39	29	21	15	11	7	5	3	2	2	1	1	1	0	0	0	0
3	81	72	62	50	40	30	22	16	11	8	6	4	3	2	1	1	1	0	0	0
4	87	80	72	61	50	40	31	23	17	12	9	6	4	3	2	1	1	1	0	0
5	91	86	79	71	61	51	41	32	24	18	13	10	7	5	3	2	2	1	1	1
6	94	91	85	79	70	61	51	41	32	25	19	14	10	8	5	4	3	2	1	1
7	96	94	90	85	78	69	60	51	42	33	26	20	15	11	8	6	4	3	2	2
8	97	96	93	89	84	77	69	60	51	42	34	27	21	16	12	9	7	5	3	3
9	98	97	95	92	88	83	76	68	60	51	42	35	28	22	17	13	10	7	5	4
10	99	98	97	95	92	87	82	75	68	59	51	43	35	28	23	18	14	10	8	6
11	99	99	98	96	94	91	87	81	75	67	59	51	43	36	29	23	18	14	11	8
12	99	99	98	97	96	94	90	86	80	74	67	59	51	43	36	30	24	19	15	12
13	100	99	99	98	97	95	93	90	85	80	73	66	59	51	44	37	30	25	20	16
14	100	100	99	99	98	97	95	92	89	84	79	73	66	58	51	44	37	31	25	21
15	100	100	100	99	99	98	96	94	92	88	84	78	72	65	58	51	44	38	32	26
16	100	100	100	99	99	98	97	96	94	91	87	83	78	71	65	58	51	44	38	32
17	100	100	100	100	99	99	98	97	96	93	91	87	82	77	71	65	58	51	45	38
18	100	100	100	100	100	99	99	98	97	95	93	90	86	82	76	71	64	58	51	45
19	100	100	100	100	100	99	99	99	98	97	95	92	89	86	81	76	70	64	58	51

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	50	38	27	19	13	9	6	4	3	2	1	1	1	0	0	0	0	0	0	0
1	63	50	38	28	20	14	10	7	4	3	2	1	1	1	0	0	0	0	0	0
2	73	62	50	39	29	21	15	11	7	5	3	2	2	1	1	1	0	0	0	0
3	81	72	62	50	40	30	22	16	11	8	6	4	3	2	1	1	1	0	0	0
4	87	80	72	61	50	40	31	23	17	12	9	6	4	3	2	1	1	1	0	0
5	91	86	79	71	61	51	41	32	24	18	13	10	7	5	3	2	2	1	1	1
6	94	91	85	79	70	61	51	41	32	25	19	14	10	8	5	4	3	2	1	1
7	96	94	90	85	78	69	60	51	42	33	26	20	15	11	8	6	4	3	2	2
8	97	96	93	89	84	77	69	60	51	42	34	27	21	16	12	9	7	5	3	3
9	98	97	95	92	88	83	76	68	60	51	42	35	28	22	17	13	10	7	5	4
10	99	98	97	95	92	87	82	75	68	59	51	43	35	28	23	18	14	10	8	6
11	99	99	98	96	94	91	87	81	75	67	59	51	43	36	29	23	18	14	11	8
12	99	99	98	97	96	94	90	86	80	74	67	59	51	43	36	30	24	19	15	12
13	100	99	99	98	97	95	93	90	85	80	73	66	59	51	44	37	30	25	20	16
14	100	100	99	99	98	97	95	92	89	84	79	73	66	58	51	44	37	31	25	21
15	100	100	100	99	99	98	96	94	92	88	84	78	72	65	58	51	44	38	32	26
16	100	100	100	99	99	98	97	96	94	91	87	83	78	71	65	58	51	44	38	32
17	100	100	100	100	99	99	98	97	96	93	91	87	82	77	71	65	58	51	45	38
18	100	100	100	100	100	99	99	98	97	95	93	90	86	82	76	71	64	58	51	45
19	100	100	100	100	100	99	99	99	98	97	95	92	89	86	81	76	70	64	58	51