Error Analysis and Variable Selection for a Differentially Private Learning Algorithm

In this paper, we construct a modified least squares regression algorithm that provides privacy protection. A new concentration inequality is applied, and an expected error bound is derived via error decomposition. Furthermore, through the error analysis, we find a method to choose an appropriate privacy parameter ε that balances the error and the privacy.


Introduction
Privacy protection attracts much attention in many branches of computer science. To address it, Dwork et al. proposed differential privacy in [1]. Soon afterwards, [2] built an exponential mechanism, which is a useful approach for constructing differentially private algorithms. The concept was introduced into learning theory in [3].
There, the authors consider output perturbation and objective perturbation for ERM algorithms, and analyze both the privacy and the generalization of those algorithms. P. Jain and his collaborators have since done much work on differentially private learning [4] [5]. Recently, in [6], the authors found that the empirical average of the output of a differentially private algorithm can converge to its expectation, and [7] provides another analysis of this convergence, which motivates our work.
In this paper, we consider the following statistical learning model (see [8] [9] for more details). The input space X is a compact metric space, and the output space is Y ⊂ ℝ, as in a regression problem. Throughout the paper, we assume the output y is uniformly bounded, i.e., |y| ≤ M for some M > 0 almost surely. On the sample space Z := X × Y, we try to find a function f: X → Y via some algorithm A, reflecting the relationship between the input and the output. Algorithm A relies on the randomly chosen sample z = {(x_i, y_i)}_{i=1}^m, which is drawn according to a distribution ρ on Z; furthermore, ρ induces a marginal distribution ρ_X on X and a conditional distribution ρ(y|x) on Y given x. Now we expect the algorithm to provide some privacy protection, so we assume A satisfies the (ε, γ)-differential privacy condition of [1]. Denote the Hamming distance between two sample sets z_1, z_2 ∈ Z^m by d_H(z_1, z_2) = #{1 ≤ i ≤ m : z_{1,i} ≠ z_{2,i}}, so that d_H(z_1, z_2) = 1 means only one element differs. Then (ε, γ)-differential privacy is defined as follows: A is (ε, γ)-differentially private if, for every two data sets z_1, z_2 satisfying d_H(z_1, z_2) = 1 and every measurable set S of outputs, Pr{A(z_1) ∈ S} ≤ e^ε Pr{A(z_2) ∈ S} + γ. Here the outputs lie in H, a function space from X to Y, which is called the hypothesis space. In the sequel, we focus on (ε, 0)-differential privacy for some ε > 0, which is usually called ε-differential privacy for simplicity. How to choose an appropriate ε is a fundamental problem for differentially private algorithms [10], and we will provide a method through our error estimation in the following sections.
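To make the definition concrete, here is a minimal sketch of the classical Laplace mechanism of [1] applied to releasing a clipped sample mean; the statistic, the function names, and the clipping range are our own illustration, not part of the algorithm studied in this paper:

```python
import math
import random

def laplace_mechanism(data, epsilon, lo=0.0, hi=1.0):
    """Release the mean of `data` (values clipped to [lo, hi]) with Laplace
    noise calibrated to the mean's sensitivity (hi - lo) / n.  Changing one
    record moves the clipped mean by at most the sensitivity, so the density
    ratio between neighbouring data sets is at most e^epsilon."""
    n = len(data)
    clipped = [min(hi, max(lo, v)) for v in data]
    sensitivity = (hi - lo) / n
    scale = sensitivity / epsilon           # Laplace scale b = Delta / epsilon
    # Sample Laplace(scale) as a random sign times an exponential magnitude.
    sign = 1.0 if random.random() < 0.5 else -1.0
    noise = sign * random.expovariate(1.0 / scale)
    return sum(clipped) / n + noise
```

Smaller ε forces a larger noise scale, which is exactly the privacy/accuracy trade-off the later sections quantify for the regression algorithm.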

Concentration Inequality
In this section, we study the error between the empirical average and the expectation of the output of an algorithm A providing ε-differential privacy. Our first result can be stated as follows.

Theorem 1 Suppose an algorithm A provides ε-differential privacy and outputs a positive function f bounded by some G > 0, where the expectation below is taken over the sample and over the algorithm's output. Then the empirical average (1/m) Σ_{i=1}^m f(z_i) concentrates around the expectation E f, with a deviation bound depending on ε, G and m.

This verifies our result.
Remark 1 Similar results are proposed in [6] and [7]. However, there the authors limit the function to take values in [0, 1] or {0, 1}; our result extends theirs to functions taking values in ℝ₊. This makes our subsequent error analysis implementable.

Differential Private Learning Algorithm
In this section we consider the differentially private least squares regularization algorithm. For a Mercer kernel K defined on X × X, the hypothesis space H_K is the reproducing kernel Hilbert space associated with K, with the reproducing property f(x) = ⟨f, K_x⟩_K. In the sequel, we always assume |y| ≤ M for some constant M > 0. The least squares regularization algorithm, which has been extensively studied in, e.g., [8] [11] [12], is

(1)  f_{z,λ} = arg min_{f ∈ H_K} { (1/m) Σ_{i=1}^m (f(x_i) − y_i)² + λ ‖f‖_K² }.

Denote by π the projection operator, as we did in [13] [14]: π(f)(x) = f(x) if |f(x)| ≤ M, and π(f)(x) = M · sign(f(x)) otherwise. Then we add a noise term b to the original algorithm (1), like the output perturbation algorithm in [3]:

(2)  f̄_{z,λ} = π(f_{z,λ}) + b,

where the density of b is independent of z and will be clarified in the following analysis. Moreover, we take the following notation for simplicity: Δ denotes the maximum infinity norm of the change when one sample point of z is replaced, i.e., Δ = sup_{d_H(z, z′) = 1} ‖π(f_{z,λ}) − π(f_{z′,λ})‖_∞.
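Algorithms (1) and (2) can be sketched in a few lines. The sketch below assumes a Gaussian kernel and a scalar Laplace noise term b; the function names, kernel choice, and noise distribution are our illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def kernel_matrix(X1, X2, sigma=1.0):
    # Gaussian (Mercer) kernel K(x, x') = exp(-|x - x'|^2 / (2 sigma^2)).
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def least_squares_regularization(X, y, lam, sigma=1.0):
    """Algorithm (1).  By the representer theorem the minimizer is
    f = sum_i c_i K(x_i, .) with c = (K + lam * m * I)^{-1} y."""
    m = len(y)
    K = kernel_matrix(X, X, sigma)
    c = np.linalg.solve(K + lam * m * np.eye(m), y)
    return lambda Xt: kernel_matrix(Xt, X, sigma) @ c

def private_predict(f, Xt, M, scale, rng):
    """Algorithm (2): project the predictions onto [-M, M] (the operator pi)
    and add a single Laplace noise term b with the given scale."""
    preds = np.clip(f(Xt), -M, M)
    return preds + rng.laplace(0.0, scale)
```

With scale = Δ/ε this becomes the output-perturbation scheme in the style of [3]; the analysis below makes the choice of the noise density precise.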
Then we have the following result: the noise b can be chosen so that algorithm (2) satisfies the ε-differential privacy condition. The proof is just as that of Theorem 4 in [15]: for every possible output function r, and any z, z′ differing in one element, the ratio of the output densities Pr{f̄_{z,λ} = r} / Pr{f̄_{z′,λ} = r} is bounded by e^ε. The lemma is then proved by a union bound. Without loss of generality, we fix the following normalization.

Now we will bound the term ‖f_{z,λ} − f_{z′,λ}‖ for two samples z, z′ differing in one element. Since the two functions are both optimizers of algorithm (1), taking the derivative with respect to f at each of them and comparing the resulting optimality conditions bounds their difference. The last inequality follows from the fact that sup_x ‖K_x‖_K = sup_x √K(x, x) ≤ κ, so that ‖f‖_∞ ≤ κ‖f‖_K for any f ∈ H_K. This holds for any such z, z′, and our lemma holds. It can be easily verified by discussion that ‖π(f_{z,λ}) − π(f_{z′,λ})‖_∞ is bounded for any z, z′, which gives the choice of the noise b in Proposition 1; with that choice, algorithm (2) provides ε-differential privacy.

The proof is by combining the two lemmas and the inequality above, and a simple calculation gives the expression of the noise parameter α.
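The stability quantity Δ can also be probed numerically. The sketch below estimates, for a simple linear ridge model (our stand-in for algorithm (1)), the sup-norm change of the clipped solution when one sample point is replaced, which is the quantity the Laplace scale α = Δ/ε would be calibrated to; all names here are our own illustration:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Regularized least squares: w = argmin (1/m)|Xw - y|^2 + lam |w|^2.
    m, d = X.shape
    return np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

def empirical_sensitivity(X, y, lam, M, trials, rng):
    """Estimate Delta = sup over neighbouring samples of
    ||pi(f_z) - pi(f_z')||_inf, by replacing one random point `trials`
    times and comparing clipped predictions on the original inputs."""
    w = ridge_fit(X, y, lam)
    base = np.clip(X @ w, -M, M)
    delta = 0.0
    for _ in range(trials):
        i = rng.integers(len(y))
        X2, y2 = X.copy(), y.copy()
        X2[i] = rng.uniform(-1.0, 1.0, X.shape[1])   # replace one sample point
        y2[i] = rng.uniform(-M, M)
        w2 = ridge_fit(X2, y2, lam)
        other = np.clip(X @ w2, -M, M)
        delta = max(delta, float(np.max(np.abs(base - other))))
    return delta
```

Consistent with the lemma above, the estimated sensitivity shrinks as the regularization parameter λ grows, since the solution depends less on any single sample point.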

Error Analysis for Differential Private Learning Algorithm
In this section, we will study the expectation of the error between π(f̄_{z,λ}) and the regression function f_ρ, which minimizes the generalization error E(f). Firstly we shall introduce the error decomposition

(3)  E(π(f̄_{z,λ})) − E(f_ρ) ≤ E₁ + E₂ + S(z) + D(λ),

where f_λ is a function in H_K to be determined. Here E₁ and E₂ involve the function f̄_{z,λ} from the random algorithm (2), so we call them random errors. S(z) and D(λ) are similar to the classical quantities in the learning theory literature, and we still call them the sample error and the approximation error. In the following, we study these errors respectively.
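One standard way to realize such a decomposition, written here in the style of the classical literature (this is our reconstruction; the exact terms of (3) may differ slightly), is

```latex
\mathcal{E}(\pi(\bar f_{z,\lambda})) - \mathcal{E}(f_\rho)
\le \underbrace{\mathcal{E}(\pi(\bar f_{z,\lambda}))
      - \mathcal{E}_z(\pi(\bar f_{z,\lambda}))}_{\mathcal{E}_1}
 + \underbrace{\mathcal{E}_z(\pi(\bar f_{z,\lambda}))
      - \mathcal{E}_z(\pi(f_{z,\lambda}))}_{\mathcal{E}_2}
 + \underbrace{\mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda)}_{\mathcal{S}(z)}
 + \underbrace{\mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho)
      + \lambda \|f_\lambda\|_K^2}_{\mathcal{D}(\lambda)},
```

where E_z denotes the empirical risk on the sample z. The inequality uses E_z(π(f_{z,λ})) + λ‖f_{z,λ}‖_K² ≤ E_z(f_λ) + λ‖f_λ‖_K², which holds because f_{z,λ} minimizes (1) and the projection π does not increase the squared loss when |y| ≤ M.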

Error Bounds for Random Errors
Proposition 2 For the function f̄_{z,λ} obtained from algorithm (2), with the density of b as described in Proposition 1, the random error E₁ admits an expected bound obtained by applying Theorem 1, which verifies the proposition.
For the term E₂ we have the same analysis. Proposition 3 For the function f̄_{z,λ} obtained from algorithm (2), with the density of b as described in Proposition 1, the random error E₂ admits an analogous expected bound, and the proposition is proved.

Error Estimates for Sample Error and Approximation Error
Error estimates for the sample error and the approximation error have been extensively studied since [8]; here we provide the proof for completeness. It is known that f_λ in the error decomposition (3) can be chosen arbitrarily in H_K, as in [12] [13] [14]; here we simply choose it to be the classical one. From [16] [17], its expression is f_λ = (L_K + λI)^{−1} L_K f_ρ, where L_K is the integral operator defined on L²_{ρ_X} by (L_K f)(x) = ∫_X K(x, t) f(t) dρ_X(t).

Lemma 3 Let ξ be a random variable on a probability space Z satisfying |ξ| ≤ M̃ almost surely. Then for every t > 0, Pr{ |(1/m) Σ_{i=1}^m ξ(z_i) − E ξ| ≥ t } ≤ 2 exp(−m t² / (2 M̃²)).

Then we have the following analysis.
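Lemma 3 is the Hoeffding inequality of [18]. A quick Monte Carlo check of this tail bound, with ξ uniform on [−1, 1] (so E ξ = 0 and M̃ = 1); this is our own illustration:

```python
import math
import random

def hoeffding_bound(m, t, M):
    # Hoeffding: Pr{ |sample mean - E xi| >= t } <= 2 exp(-m t^2 / (2 M^2))
    # for a variable bounded by |xi| <= M.
    return 2.0 * math.exp(-m * t * t / (2.0 * M * M))

def empirical_tail(m, t, runs, rng):
    """Empirical frequency of |sample mean| >= t over `runs` repetitions,
    for xi uniform on [-1, 1]."""
    hits = 0
    for _ in range(runs):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(m)) / m
        if abs(mean) >= t:
            hits += 1
    return hits / runs
```

The observed tail frequency should never exceed the Hoeffding bound; in fact it is usually far below it, since Hoeffding ignores the variance of ξ.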
Proposition 4 For f_λ and f_ρ defined as above, assume the standard regularity condition on f_ρ holds. Firstly we bound the sample error: the random variable involved is bounded, so the Hoeffding inequality of Lemma 3 yields a high-probability bound for S(z). For the approximation error, note that f_λ is deterministic, i.e., independent of z and b, so D(λ) can be bounded directly; on the other hand, in [8] the authors pointed out a bound for ‖f_λ − f_ρ‖ under the regularity condition.

Combining the three bounds above, we can verify the proposition.

Convergence Result with Fixed ε
In our analysis of the random error E₁ above, we indeed have the corresponding bound for each fixed ε. Therefore, the error decomposition can be assembled accordingly, and then, by choosing the regularization parameter λ appropriately, we obtain the convergence stated in Theorem 2.

Selection of ε and Total Error Bound
From the analysis of the random errors, the sample error and the approximation error above, we can obtain the whole error bound as follows.
Theorem 3 Let f̄_{z,λ} be derived from algorithm (2), with f_{z,λ} and f_λ defined in the above subsections, and assume the conditions of the propositions above hold. Then the total expected error is bounded; choosing ε to balance the competing terms proves the result.
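The balancing step can be visualized numerically. Purely for illustration (the functional form below is a hypothetical surrogate, not the paper's actual bound): suppose the noise-induced error term scales like A/ε, since the Laplace scale grows as ε shrinks, while a privacy-loss term grows like B·ε. The total cost A/ε + B·ε is then minimized at ε* = √(A/B):

```python
def choose_epsilon(A, B, grid=None):
    """Pick epsilon minimizing the hypothetical total cost A/eps + B*eps
    over a log-spaced grid.  The closed-form minimizer is sqrt(A / B)."""
    if grid is None:
        grid = [10.0 ** (k / 50.0) for k in range(-200, 101)]  # 1e-4 .. 1e2
    return min(grid, key=lambda e: A / e + B * e)
```

As in Theorem 3, a larger weight on privacy loss (larger B) pushes the selected ε toward 0, at the price of a larger expected error.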

Conclusions
Theorem 2, where ε is taken as a constant, reveals that the generalization error E(π(f̄_{z,λ})) converges not to that of the regression function, E(f_ρ), but to a slightly different limit. It can be seen from the definition of differential privacy that algorithms provide more privacy as ε tends to 0. However, Theorem 3 shows that ε cannot be too small, since the expected error would then become very large.
Hence our choice can be regarded as a balance between privacy protection and the expected error. In [19], the authors show that ε also needs to tend to 0 at some rate to preserve generalization, which matches our result.
Compared with previous learning theory results [12] [20] [21] [22], our learning rate is not as good, since a perturbation term is introduced. However, in our result Theorem 1 we did not need a capacity condition as in classical error analysis, i.e., conditions on covering numbers, VC or V_γ dimensions; instead, the ε-differential privacy condition is adopted. So it may be feasible and interesting to apply such a condition to other learning algorithms.

Proposition 1 For the function f_{z,λ} obtained from algorithm (1), assume ‖π(f_{z,λ}) − π(f_{z′,λ})‖_∞ ≤ R for any two sample sets z, z′ satisfying d_H(z, z′) = 1 and some R ≥ M. Since b takes values in (−∞, +∞), we choose the density of b to be proportional to exp(−ε|b|/R).

An analogous analysis to the proof of Theorem 1 tells us that the corresponding concentration bound holds for f̄_{z,λ} derived from algorithm (2), with f_{z,λ} and f_λ defined in the above subsections; here we recall the Hoeffding inequality [18] and the error decomposition (3).