Error Analysis and Variable Selection for Differential Private Learning Algorithm ()
1. Introduction
Privacy protection attracts much attention in many branches of computer sci- ence. To deal with this, Dwork et al. proposed differential privacy in [1] . Soon [2] builds an exponential mechanism which is a useful approach to construct a differential private algorithm. The concept is introduced into learning theory in [3] . There, the authors consider output perturbation and object perturbation for ERM algorithms. Analysis of privacy and generalization for those algorithms also has been conducted. P. Jain and his collaborators have done a lot of work on differential private learning afterwards [4] [5] and etc. Recently, in [6] , the authors find that the empirical average of the output from a differential private algorithm can converge to its expectation. And [7] provides another analysis of this convergence, which motivates our work.
In this paper, we consider the following statistical learning model (see [8] [9] for more details): The input space
is a compact metric space, and the output space is
as a regression problem. Throughout the paper, we assume the output
is uniformly bounded, i.e.,
for some
almost surely. On the sample space
, we try to find a function
via some algorithms
, reflecting the relationship between the input and output. Algorithm
relies on the random chosen sample
, while the sample is drawn according to a distribution function
on
. Furthermore, we assume there is a marginal distribution
on
and conditional distribution
on
given some
.
Now we expect the algorithm can provide some privacy protection. We assume
satisfies the
differential private condition [1] . Denoting the Hamming distance between two sample sets
is
i.e., there is only one element is different. Then
-differential private is defined as follows:
Definition 1 A random algorithm
is
-differential private if for every two data sets
satisfying
, and every sets
we have
Here
is a function space from
to
, which is called the hypothesis space. In the sequel, we focus on the
-differential privacy with some
, which is always called
-differential privacy for simplicity. How to choose an appropriate
is a fundamental problem in differential private algorithms [10] , and we will provide a method during our error estimation in the following sections.
2. Concentration Inequality
In this section, we study the error between average and expectation for an algorithm
providing
-differential privacy. Our first result can be stated as follow:
Theorem 1 If an algorithm
provides
-differential privacy, and outputs a positive function
with bounded expectation
for some
, where the expectation is taken over the sample via the algorithm output. Then
and
Denote sample sets
for
. We observe that
Then
On the other hand,
This leads to
These verify our results.
Remark 1 Similar results are proposed in [6] and [7] . However, there the authors limits the function to take value in
or
, our result here extends theirs to the function taking value in
. This makes our following error analysis implementable.
3. Differential Private Learning Algorithm
In this section we consider the differential private least squares regularization algorithm. For a Mercer kernel
defined on
, the function space
is the induced reproducing kernel Hilbert space (RKHS). Denote
for any
, and
. It is well known that
as the reproducing property. In the sequel, we always assume
for some constant
. The least squares regularization algorithm, which has been extensively studied in such as [8] [11] [12] and etc. is:
(1)
Denote
as a projection operator as we did in [13] [14] :
Then we add a noise term
in the original algorithm (1) like the output perturbation algorithm in [3] :
(2)
where the density of
is independent with
which will be clarified in the following analysis. Moreover, we take the following notation for simplicity:
Definition 2 We denote
as the maximum infinite norm of difference when changing one sample point in
, i.e., if
,
Then we have the following result:
Lemma 1 Assume
is bounded, and
has density function
proportion to
, then algorithm (2) provides
-differential
privacy.
The proof is just as Theorem 4 in [15] . For all possible function
, and
differ in one element, then
and
So
Then the lemma is proved by a union bound.
Now we will bound the term
.
Lemma 2 For the function
obtained from algorithm (1), assume
for any
for some
, and
, we have
Assume
and
are two results derived via algorithm (1) given any sample set
satisfying
. Without loss of generality, we set
. Since the two functions are both the optimizer of algorithm (1), take derivative for
we have
and
These lead to
Take inner product with
by both sides we have
This means
The last inequality is from the fact that
Since
, then
as well. Therefore,
for any
. So
for any
, and our lemma holds.
It can be easily verified by discussion that
for any
, so we have the choice of noise
and the result for algorithm (2).
Proposition 1 Assume
for any
for some
, and
takes value in
, we choose the density of
to be
, where
, then the algorithm (2) pro-
vides
-differential privacy.
The proof is by combining the two lemmas and the inequality above. And by simply calculation we can get the expression of
.
4. Error Analysis for Differential Private Learning Algorithm
In this section, we will study the expectation of the error between
, where
is the regression function which minimizes
. Firstly we shall introduce the error decomposition:
(3)
where
is a function in
to be determined and
Here
and
involve the function
from random algorithm (2) so we call them random errors.
and
are similar as classical ones in the past literature in learning theory and we still call them sample error and approximation error. In the following, we will study these errors respectively.
4.1. Error Bounds for Random Errors
Proposition 2 For function
obtained from algorithm (2) with density of
as described in Proposition 1, we have
Note that
analogous analysis to the proof of Theorem 1 tells us that
which verifies the proposition.
For the term
, we have the same analysis.
Proposition 3 For function
obtained from algorithm (2) with density of
as described in Proposition 1, we have
Since
we have
And the proposition is proved.
4.2. Error Estimates for Sample Error and Approximation Error
Error estimates for sample error and approximation error have been extensively studied since [8] . Here we provide the proof for completeness. It is known that
in the error decomposition (3) can be arbitrarily chosen in
in [12] [13] [14] and etc. Here we simply choose it to be the classical one
From [16] [17] we have the expression of
is
where
is the operator defined on
as
[8] told us that
has a eigenvalue sequence
satisfies
when
, and
. Now we recall the Hoeffding inequality [18] .
Lemma 3 Let
be a random variable on a probability space
satisfying
for some
for almost all
, then
Then we have the following analysis.
Proposition 4 For
and
defined as above, assume
, we have
Firstly we bound the sample error.
Let
, since
, and
we have
. So from Hoeffding inequality there holds
Then we have
For the approximation error, note that
[9]
which is independent with
and
, we have
On the other hand, in [8] , the authors pointed out that
for
any
. So
Combining the 3 bounds above, we can verify the proposition.
4.3. Convergence Result with Fixed
In our analysis for
above, we indeed have the following result
Therefore, the error decomposition can be
Then by choosing
for balance we have the following
result.
Theorem 2 Let
derived from algorithm (2),
,
defined in the
above subsections, and assume
, take
,
there holds
where constant
.
4.4. Selection of
and Total Error Bound
From the analysis for random error, sample error and approximation error above, we can obtain the whole error bound as follow.
Theorem 3 Let
derived from algorithm (2),
,
defined in the
above subsections, and assume
, take
and
we have
where constant
It can be seen from error decomposition (3) that
Since
, we have
, i.e., we can choose
. Now take
and
for balance, and the result is proved.
5. Conclusions
Theorem 2, where
is taken as a constant, reveals that the generalization error
converges not to the one of regression function
, but a little different one
in expectation.
It can be seen from the definition of differential privacy that algorithms will provide more privacy when
tends to 0. However, Theorem 3 shows that
cannot be too small, since the expected error will be very large accordingly. Hence our choice can be regarded as a balance between privacy protection and the expected error. In [19] , the authors announce that
also needs tend to 0 in some rates to keep generalization which matches our result.
Compared with previous learning theory results [12] [20] [21] [22] and etc., our learning rate is not so good since a perturbation term is introduced. However, in our result Theorem 1, we did not need a capacity condition as what we did in classical error analysis, i.e., conditions on covering numbers, VC or Vg dimensions. Instead the
-differential private condition is adopted. So it may be capable and interesting for us to apply such condition to other learning algorithms.
Acknowledgements
This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), National Social Science Fund in China (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2016KQNCX162) and the Major Incubation Research Project of Huizhou University (No. hzux1201619).