In this paper, we construct a modified least squares regression algorithm that provides privacy protection. A new concentration inequality is applied, and an expected error bound is derived via error decomposition. Furthermore, the error analysis yields a method for choosing an appropriate parameter to balance the error and the privacy.

Privacy protection attracts much attention in many branches of computer science. To address this issue, Dwork et al. proposed differential privacy in [

In this paper, we consider the following statistical learning model (see [

We now expect the algorithm to provide some privacy protection. We assume $A$ satisfies the $(\epsilon, \gamma)$-differential privacy condition [

$$d(z_1, z_2) = \#\{ i = 1, \dots, m : z_{1,i} \neq z_{2,i} \},$$

so that $d(z_1, z_2) = 1$ means exactly one element is different. Then $(\epsilon, \gamma)$-differential privacy is defined as follows:

Definition 1 A random algorithm $A : Z^m \to H$ is $(\epsilon, \gamma)$-differentially private if for every two data sets $z_1, z_2$ satisfying $d(z_1, z_2) = 1$ and every set $O \subset H$ we have

$$\Pr\{A(z_1) \in O\} \le e^{\epsilon} \cdot \Pr\{A(z_2) \in O\} + \gamma.$$

Here $H$ is a function space from $X$ to $Y$, called the hypothesis space. In the sequel, we focus on $(\epsilon, 0)$-differential privacy with some $0 < \epsilon < 1$, which is usually called $\epsilon$-differential privacy for simplicity. How to choose an appropriate $\epsilon$ is a fundamental problem in differentially private algorithms [
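For intuition, Definition 1 can be checked numerically for the classic Laplace mechanism on a counting query with sensitivity 1 (a standard textbook example, not an algorithm from this paper): for every output value $r$, the ratio of the output densities under two neighbouring data sets lies in $[e^{-\epsilon}, e^{\epsilon}]$, which is exactly $(\epsilon, 0)$-differential privacy.

```python
import math

eps = 0.5
scale = 1.0 / eps  # Laplace scale = sensitivity / eps for a sensitivity-1 query

def laplace_density(r, center):
    # Density of Laplace(center, scale) at r.
    return math.exp(-abs(r - center) / scale) / (2 * scale)

# Query answers on two neighbouring data sets with d(z1, z2) = 1.
q1, q2 = 10.0, 11.0
for r in (8.0, 10.5, 13.0):
    ratio = laplace_density(r, q1) / laplace_density(r, q2)
    assert math.exp(-eps) - 1e-12 <= ratio <= math.exp(eps) + 1e-12
```

The assertion holds for every $r$, not just the sampled values, since $\bigl||r - q_1| - |r - q_2|\bigr| \le |q_1 - q_2| = 1$.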

In this section, we study the error between the empirical average and the expectation for an algorithm $A$ providing $\epsilon$-differential privacy. Our first result can be stated as follows:

Theorem 1 Suppose an algorithm $A$ provides $\epsilon$-differential privacy and outputs a positive function $g_{z,A} : X \times Y \to \mathbb{R}$ with bounded expectation $E_{z,A}\, g_{z,A} \le G$ for some $G > 0$, where the expectation is taken over the sample and the output of the algorithm. Then

$$E_{z,A}\left( \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\, d\rho \right) \le 2G\epsilon,$$

and

$$E_{z,A}\left( \int_Z g_{z,A}(z)\, d\rho - \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \right) \le 2G\epsilon.$$

Denote the sample sets $w_j = \{ z_1, z_2, \dots, z_{j-1}, z'_j, z_{j+1}, \dots, z_m \}$ for $j \in \{1, 2, \dots, m\}$, where $z'_j$ is an independent copy of $z_j$. We observe that

$$\begin{aligned}
E_{z,A}\left( \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \right) &= \frac{1}{m} \sum_{i=1}^m E_z E_A\big( g_{z,A}(z_i) \big) = \frac{1}{m} \sum_{i=1}^m E_z E_{z'_i} \int_0^{+\infty} \Pr_A\{ g_{z,A}(z_i) \ge t \}\, dt \\
&\le \frac{1}{m} \sum_{i=1}^m E_z E_{z'_i} \int_0^{+\infty} e^{\epsilon} \Pr_A\{ g_{w_i,A}(z_i) \ge t \}\, dt = e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m E_{w_i} E_{z_i} E_A\big( g_{w_i,A}(z_i) \big) \\
&= e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m E_{w_i,A} E_{z_i}\big( g_{w_i,A}(z_i) \big) = e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m E_{w_i,A} \int_Z g_{w_i,A}(z)\, d\rho \\
&= e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m E_{z,A} \int_Z g_{z,A}(z)\, d\rho = e^{\epsilon}\, E_{z,A} \int_Z g_{z,A}(z)\, d\rho .
\end{aligned}$$

Then, since $e^{\epsilon} - 1 \le 2\epsilon$ for $0 < \epsilon < 1$,

$$E_{z,A}\left( \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\, d\rho \right) \le (e^{\epsilon} - 1)\, E_{z,A}\left( \int_Z g_{z,A}(z)\, d\rho \right) \le 2G\epsilon.$$

On the other hand,

$$\begin{aligned}
E_{z,A} \int_Z g_{z,A}(z)\, d\rho &= \frac{1}{m} \sum_{i=1}^m E_z E_A \int_Z g_{z,A}(z)\, d\rho = \frac{1}{m} \sum_{i=1}^m E_{w_i} E_A \int_Z g_{w_i,A}(z)\, d\rho \\
&= \frac{1}{m} \sum_{i=1}^m E_{w_i} E_A \int_Z g_{w_i,A}(z_i)\, d\rho(z_i) = \frac{1}{m} \sum_{i=1}^m E_{w_i} E_{z_i} E_A\big( g_{w_i,A}(z_i) \big) \\
&= \frac{1}{m} \sum_{i=1}^m E_z E_{z'_i} \int_0^{+\infty} \Pr_A\{ g_{w_i,A}(z_i) \ge t \}\, dt \le \frac{1}{m} \sum_{i=1}^m E_z E_{z'_i}\, e^{\epsilon} \int_0^{+\infty} \Pr_A\{ g_{z,A}(z_i) \ge t \}\, dt \\
&= e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m E_z E_A\big( g_{z,A}(z_i) \big) = e^{\epsilon}\, E_{z,A}\, \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i).
\end{aligned}$$

This leads to

$$E_{z,A}\left( \int_Z g_{z,A}(z)\, d\rho - \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \right) \le (e^{\epsilon} - 1)\, E_{z,A}\, \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \le 2G\epsilon.$$

These two bounds verify our results.

Remark 1 Similar results are proposed in [

In this section we consider the differentially private least squares regularization algorithm. For a Mercer kernel $K$ defined on $X \times X$, the function space $H_K := \overline{\mathrm{span}\{ K(x, \cdot) : x \in X \}}$ is the induced reproducing kernel Hilbert space (RKHS). Denote $K_x(y) = K(x, y)$ for any $x, y \in X$, and $\kappa = \sup_{x \in X} \sqrt{K(x, x)}$. It is well known that the reproducing property $f(x) = \langle f, K_x \rangle_K$ holds. In the sequel, we always assume $|y| \le M$ for some constant $M > 0$. The least squares regularization algorithm, which has been extensively studied in, e.g., [

$$f_{z,\lambda} = \arg\min_{f \in H_K} \frac{1}{m} \sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda \| f \|_K^2. \qquad (1)$$
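By the representer theorem, the minimizer of (1) can be written as $f_{z,\lambda} = \sum_{i=1}^m c_i K_{x_i}$, where the coefficients solve the linear system $(\mathbf{K} + \lambda m I)c = y$ with kernel matrix $\mathbf{K}_{ij} = K(x_i, x_j)$. A minimal sketch (the Gaussian kernel and the data below are illustrative assumptions, not from the paper):

```python
import numpy as np

def krr_fit(X, y, lam, kernel):
    """Solve algorithm (1): coefficients c with (K + lam * m * I) c = y."""
    m = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def krr_predict(c, X, x, kernel):
    """Evaluate f_{z,lam}(x) = sum_i c_i K(x_i, x)."""
    return sum(ci * kernel(xi, x) for ci, xi in zip(c, X))

# Illustrative use with a Gaussian kernel.
gauss = lambda a, b: np.exp(-(a - b) ** 2)
X = np.array([-1.0, -0.3, 0.2, 0.8])
y = np.sin(3 * X)
c = krr_fit(X, y, lam=1e-6, kernel=gauss)
# With a tiny lam the fit nearly interpolates the training data.
```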

Denote by $\pi$ the projection operator, as we did in [

$$\pi(f)(x) = \begin{cases} M, & f(x) > M, \\ f(x), & -M \le f(x) \le M, \\ -M, & f(x) < -M. \end{cases}$$
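In code, $\pi$ is simply a clip of the prediction to $[-M, M]$ (an illustrative one-liner, not from the paper):

```python
import numpy as np

def project(fx, M):
    """The truncation operator pi: clip a prediction to [-M, M]."""
    return np.clip(fx, -M, M)

# Clipping is 1-Lipschitz, so |pi(f(x)) - pi(g(x))| <= |f(x) - g(x)|;
# this contraction property is what calibrates the noise later on.
```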

Then we add a noise term $b$ to the original algorithm (1), in the spirit of the output perturbation algorithm in [

$$f_{z,A}(x) = \pi(f_{z,\lambda}(x)) + b, \qquad (2)$$

where the density of $b$ is independent of $z$ and will be specified in the following analysis. Moreover, we adopt the following notation for simplicity:

$$\mathcal{E}(f) = \int_Z (f(x) - y)^2\, d\rho, \qquad \mathcal{E}_z(f) = \frac{1}{m} \sum_{i=1}^m (f(x_i) - y_i)^2.$$

Definition 2 We denote by $\Delta f_z$ the maximum sup-norm change of the output when one sample point in $z$ is replaced, i.e.,

$$\Delta f_z = \sup_{d(z, z') = 1} \| f_z - f_{z'} \|_\infty.$$

Then we have the following result:

Lemma 1 Assume $\Delta \pi(f_{z,\lambda})$ is bounded, and $b$ has density function proportional to $\exp\{ -\epsilon |b| / \Delta \pi(f_{z,\lambda}) \}$; then algorithm (2) provides $\epsilon$-differential privacy.

The proof proceeds as in Theorem 4 of [

$$\Pr\{ f_{z,A} = r \} = \Pr_b\{ b = r - \pi(f_{z,\lambda}) \} \propto \exp\left( -\frac{\epsilon \| r - \pi(f_{z,\lambda}) \|_\infty}{\Delta \pi(f_{z,\lambda})} \right),$$

and

$$\Pr\{ f_{z',A} = r \} = \Pr_b\{ b = r - \pi(f_{z',\lambda}) \} \propto \exp\left( -\frac{\epsilon \| r - \pi(f_{z',\lambda}) \|_\infty}{\Delta \pi(f_{z',\lambda})} \right).$$

So

$$\Pr\{ f_{z,A} = r \} \le \Pr\{ f_{z',A} = r \} \cdot e^{\epsilon \| \pi(f_{z,\lambda}) - \pi(f_{z',\lambda}) \|_\infty / \Delta \pi(f_{z,\lambda})} \le e^{\epsilon} \Pr\{ f_{z',A} = r \}.$$

Then the lemma is proved by a union bound.

Now we bound the term $\Delta f_{z,\lambda}$.

Lemma 2 For the function $f_{z,\lambda}$ obtained from algorithm (1), assume $\| f_{z,\lambda} \|_K \le R$ for any $z \in Z^m$ and some $R \ge M$, and let $0 < \lambda \le 1$. Then

$$\Delta f_{z,\lambda} \le \frac{2 R \kappa^2 (\kappa + 1)}{\lambda m}.$$

Let $f_{z,\lambda}$ and $f_{z',\lambda}$ be the two minimizers obtained from algorithm (1) for sample sets $z, z'$ satisfying $d(z, z') = 1$. Without loss of generality, set $z' = (z_1, z_2, \dots, z_{m-1}, z'_m)$. Since both functions minimize the objective in (1), setting the derivative with respect to $f$ to zero gives

$$\frac{2}{m} \sum_{i=1}^m (f_{z,\lambda}(x_i) - y_i) K_{x_i} + 2\lambda f_{z,\lambda} = 0$$

and

$$\frac{2}{m} \sum_{i=1}^{m-1} (f_{z',\lambda}(x_i) - y_i) K_{x_i} + \frac{2}{m} (f_{z',\lambda}(x'_m) - y'_m) K_{x'_m} + 2\lambda f_{z',\lambda} = 0.$$

These lead to

$$\frac{1}{m} \sum_{i=1}^m (f_{z,\lambda}(x_i) - f_{z',\lambda}(x_i)) K_{x_i} + \lambda (f_{z,\lambda} - f_{z',\lambda}) = \frac{1}{m} \left[ (f_{z',\lambda}(x'_m) - y'_m) K_{x'_m} - (f_{z',\lambda}(x_m) - y_m) K_{x_m} \right].$$

Taking the inner product of both sides with $f_{z,\lambda} - f_{z',\lambda}$, we obtain

$$\frac{1}{m} \sum_{i=1}^m (f_{z,\lambda}(x_i) - f_{z',\lambda}(x_i))^2 + \lambda \| f_{z,\lambda} - f_{z',\lambda} \|_K^2 = \frac{1}{m} \left[ (f_{z',\lambda}(x'_m) - y'_m)(f_{z,\lambda}(x'_m) - f_{z',\lambda}(x'_m)) - (f_{z',\lambda}(x_m) - y_m)(f_{z,\lambda}(x_m) - f_{z',\lambda}(x_m)) \right].$$

This means

$$\lambda \| f_{z,\lambda} - f_{z',\lambda} \|_K^2 \le \frac{1}{m} \left[ |f_{z',\lambda}(x'_m) - y'_m| + |f_{z',\lambda}(x_m) - y_m| \right] \cdot \| f_{z,\lambda} - f_{z',\lambda} \|_\infty \le \frac{1}{m} \left( 2\| f_{z',\lambda} \|_\infty + 2M \right) \kappa \| f_{z,\lambda} - f_{z',\lambda} \|_K.$$

The last inequality follows from the fact that

$$\| f \|_\infty = \sup_{x \in X} |f(x)| = \sup_{x \in X} |\langle f, K_x \rangle_K| \le \sup_{x \in X} \| K_x \|_K \cdot \| f \|_K \le \kappa \| f \|_K.$$

Since $\| f_{z,\lambda} \|_K \le R$, we have $\| f_{z',\lambda} \|_K \le R$ as well. Therefore,

$$\| f_{z,\lambda} - f_{z',\lambda} \|_K \le \frac{1}{\lambda m} (2R\kappa + 2M)\kappa \le \frac{2 R \kappa (\kappa + 1)}{\lambda m}$$

for any $0 < \lambda \le 1$. So

$$\| f_{z,\lambda} - f_{z',\lambda} \|_\infty \le \frac{2 R \kappa^2 (\kappa + 1)}{\lambda m}$$

for any $z, z'$, and our lemma holds.
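As a sanity check of Lemma 2, the two solutions of algorithm (1) on neighbouring data sets can be compared numerically. The sketch below is illustrative, not from the paper: it uses a Gaussian kernel (for which $\kappa = 1$), synthetic data, and the bound $R = M/\sqrt{\lambda}$ justified later in the proof of Theorem 3.

```python
import numpy as np

rng = np.random.default_rng(0)
gauss = lambda a, b: np.exp(-(a - b) ** 2)   # kappa = 1 for this kernel

def krr(X, y, lam):
    m = len(X)
    K = np.array([[gauss(a, b) for b in X] for a in X])
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def evaluate(c, X, grid):
    return np.array([sum(ci * gauss(xi, t) for ci, xi in zip(c, X)) for t in grid])

m, lam, M, kappa = 50, 0.1, 1.0, 1.0
X = rng.uniform(-1, 1, m)
y = np.clip(np.sin(3 * X) + 0.1 * rng.standard_normal(m), -M, M)

# Replace the last sample point, so d(z, z') = 1.
X2, y2 = X.copy(), y.copy()
X2[-1], y2[-1] = rng.uniform(-1, 1), rng.uniform(-M, M)

grid = np.linspace(-1, 1, 200)
diff = np.max(np.abs(evaluate(krr(X, y, lam), X, grid)
                     - evaluate(krr(X2, y2, lam), X2, grid)))

R = M / np.sqrt(lam)                          # valid since lam * ||f||_K^2 <= M^2
bound = 2 * R * kappa**2 * (kappa + 1) / (lam * m)
assert diff <= bound                          # sup-norm change within the Lemma 2 bound
```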

It can be easily verified, by checking cases, that

$$\| \pi(f_{z,\lambda}) - \pi(f_{z',\lambda}) \|_\infty \le \| f_{z,\lambda} - f_{z',\lambda} \|_\infty$$

for any $z, z'$, since the projection $\pi$ is a contraction. This gives the choice of the noise $b$ and the result for algorithm (2).

Proposition 1 Assume $\| f_{z,\lambda} \|_K \le R$ for any $z \in Z^m$ and some $R \ge M$, and let $b$ take values in $(-\infty, +\infty)$. If we choose the density of $b$ to be

$$\frac{1}{\alpha} \exp\left( -\frac{\lambda m \epsilon |b|}{2 R \kappa^2 (\kappa + 1)} \right), \qquad \text{where } \alpha = \frac{4 R \kappa^2 (\kappa + 1)}{\lambda m \epsilon},$$

then algorithm (2) provides $\epsilon$-differential privacy.

The proof follows by combining the two lemmas with the inequality above, and the expression for $\alpha$ is obtained by a simple normalization calculation.
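The density in Proposition 1 is exactly a Laplace density with scale $s = 2R\kappa^2(\kappa+1)/(\lambda m \epsilon)$, since $\frac{1}{2s} e^{-|b|/s}$ matches it with $\alpha = 2s$. So algorithm (2) can be sketched as "clip, then add Laplace noise". The function name and parameter values below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def private_prediction(f_value, M, R, kappa, lam, m, eps):
    """Algorithm (2): clip the prediction f_{z,lam}(x) to [-M, M] and add
    Laplace noise calibrated to the sensitivity bound of Lemma 2."""
    delta = 2 * R * kappa**2 * (kappa + 1) / (lam * m)   # Lemma 2: sensitivity bound
    b = rng.laplace(loc=0.0, scale=delta / eps)          # density prop. to exp(-eps|b|/delta)
    return float(np.clip(f_value, -M, M) + b)
```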

In this section, we study the expectation of the excess error $\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho)$, where $f_\rho(x) = \int_Y y\, d\rho(y|x)$ is the regression function, which minimizes $\mathcal{E}(f)$. First we introduce the error decomposition:

$$\begin{aligned}
\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) &\le \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda \| f_{z,\lambda} \|_K^2 \\
&\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(\pi(f_{z,\lambda})) + \lambda \| f_{z,\lambda} \|_K^2 - \mathcal{E}(f_\rho) \\
&\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(f_{z,\lambda}) + \lambda \| f_{z,\lambda} \|_K^2 - \mathcal{E}(f_\rho) \\
&\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(f_\lambda) + \lambda \| f_\lambda \|_K^2 - \mathcal{E}(f_\rho) \\
&= R_1 + R_2 + S + D(\lambda), \qquad (3)
\end{aligned}$$

where $f_\lambda$ is a function in $H_K$ to be determined and

$$\begin{aligned}
R_1 &= \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}), \\
R_2 &= \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})), \\
S &= \mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda), \\
D(\lambda) &= \mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) + \lambda \| f_\lambda \|_K^2.
\end{aligned}$$

Here $R_1$ and $R_2$ involve the function $f_{z,A}$ from the random algorithm (2), so we call them random errors. $S$ and $D(\lambda)$ are similar to classical quantities in the learning theory literature, and we still call them the sample error and the approximation error, respectively. In the following, we study these errors one by one.

Proposition 2 For the function $f_{z,A}$ obtained from algorithm (2) with the density of $b$ as described in Proposition 1, we have

$$E_{z,A} R_1 \le 8\epsilon \left( \frac{2 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + M^2 \right).$$

Note that

$$R_1 = \int_Z (f_{z,A}(x) - y)^2\, d\rho - \frac{1}{m} \sum_{i=1}^m (f_{z,A}(x_i) - y_i)^2.$$

An analysis analogous to the proof of Theorem 1 tells us that

$$\begin{aligned}
E_{z,A}\left( \int_Z (f_{z,A}(x) - y)^2\, d\rho - \frac{1}{m} \sum_{i=1}^m (f_{z,A}(x_i) - y_i)^2 \right) &\le (e^{\epsilon} - 1)\, E_z E_A\, \frac{1}{m} \sum_{i=1}^m \big( \pi(f_{z,\lambda}(x_i)) + b - y_i \big)^2 \\
&\le 2\epsilon\, E_z E_b\, \frac{1}{m} \sum_{i=1}^m \left( b^2 + 2b\big( \pi(f_{z,\lambda}(x_i)) - y_i \big) + \big( \pi(f_{z,\lambda}(x_i)) - y_i \big)^2 \right) \\
&\le 2\epsilon \left( \frac{8 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 4M^2 \right),
\end{aligned}$$

where the last inequality uses $E_b b = 0$, $E_b b^2 = 8 R^2 \kappa^4 (\kappa + 1)^2 / (\lambda^2 m^2 \epsilon^2)$, and $|\pi(f_{z,\lambda}(x_i)) - y_i| \le 2M$. This verifies the proposition.

For the term $R_2$, we have a similar analysis.

Proposition 3 For the function $f_{z,A}$ obtained from algorithm (2) with the density of $b$ as described in Proposition 1, we have

$$E_{z,A} R_2 \le \frac{8 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2}.$$

Since

$$R_2 = \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) = \frac{1}{m} \sum_{i=1}^m \left[ (f_{z,A}(x_i) - y_i)^2 - \big( \pi(f_{z,\lambda}(x_i)) - y_i \big)^2 \right] = \frac{1}{m} \sum_{i=1}^m b\left( b + 2\pi(f_{z,\lambda}(x_i)) - 2y_i \right) = b^2 + 2b \cdot \frac{1}{m} \sum_{i=1}^m \big( \pi(f_{z,\lambda}(x_i)) - y_i \big),$$

and the cross term vanishes in expectation because $E_b b = 0$, we have

$$E_{z,A} R_2 = E_z E_b\, b^2 \le \frac{8 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2},$$

and the proposition is proved.
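The bound in Proposition 3 is just the second moment of the noise: a Laplace density with scale $s$ has $E\, b^2 = 2s^2$, and here $s = 2R\kappa^2(\kappa+1)/(\lambda m \epsilon)$, so $2s^2 = 8R^2\kappa^4(\kappa+1)^2/(\lambda^2 m^2 \epsilon^2)$. A quick numeric check with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
R, kappa, lam, m, eps = 1.0, 1.0, 0.1, 100, 0.5
s = 2 * R * kappa**2 * (kappa + 1) / (lam * m * eps)     # Laplace scale of b

# Second moment of Laplace(s) is 2 s^2, matching Proposition 3.
second_moment = 2 * s**2
empirical = np.mean(rng.laplace(scale=s, size=500_000) ** 2)
assert abs(empirical - second_moment) / second_moment < 0.05
```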

Error estimates for sample error and approximation error have been extensively studied since [

$$f_\lambda = \arg\min_{f \in H_K} \mathcal{E}(f) + \lambda \| f \|_K^2.$$

From [

$$f_\lambda = (L_K + \lambda I)^{-1} L_K f_\rho,$$

where $L_K$ is the integral operator defined on $L^2_{\rho_X}$ by

$$L_K f(t) = \int_X f(x) K(x, t)\, d\rho_X(x).$$

[

Lemma 3 Let $\xi$ be a random variable on a probability space $Z$ satisfying $|\xi(z) - E\xi| \le B$ for some $B > 0$ for almost all $z \in Z$. Then

$$\Pr\left\{ \left| \frac{1}{m} \sum_{i=1}^m \xi(z_i) - E\xi \right| \ge \varepsilon \right\} \le 2 \exp\left\{ -\frac{m \varepsilon^2}{2 B^2} \right\}.$$
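Lemma 3 can be illustrated by a small Monte Carlo experiment (not from the paper): for a bounded variable, the empirical frequency of large deviations of the sample mean stays below the stated bound.

```python
import numpy as np

# xi ~ Uniform[-1, 1], so E xi = 0 and |xi - E xi| <= B with B = 1.
rng = np.random.default_rng(2)
m, B, dev, trials = 100, 1.0, 0.3, 20_000

means = rng.uniform(-1, 1, size=(trials, m)).mean(axis=1)
empirical = np.mean(np.abs(means) >= dev)
bound = 2 * np.exp(-m * dev**2 / (2 * B**2))
assert empirical <= bound
```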

Then we have the following analysis.

Proposition 4 For $f_\lambda$ and $f_\rho$ defined as above, assume $f_\rho \in L_K^r(L^2_{\rho_X})$. Then we have

$$E_{z,A} S + D(\lambda) \le \frac{8\sqrt{2\pi}\, M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}} \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2.$$

First we bound the sample error:

$$S = \mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda) = \frac{1}{m} \sum_{i=1}^m (f_\lambda(x_i) - y_i)^2 - \int_Z (f_\lambda(x) - y)^2\, d\rho.$$

Let $\xi(z) = (f_\lambda(x) - y)^2$. Since $|f_\rho(x)| = \left| \int_Y y\, d\rho(y|x) \right| \le M$ and

$$\| f_\lambda \|_\infty = \| (L_K + \lambda I)^{-1} L_K f_\rho \|_\infty \le \| (L_K + \lambda I)^{-1} L_K \| \cdot \| f_\rho \|_\infty \le M,$$

we have $|\xi - E\xi| \le 8M^2$. So from Hoeffding's inequality (Lemma 3) there holds

$$\Pr_z\left\{ \left| \int_Z (f_\lambda(x) - y)^2\, d\rho - \frac{1}{m} \sum_{i=1}^m (f_\lambda(x_i) - y_i)^2 \right| \ge \varepsilon \right\} \le 2 \exp\left\{ -\frac{m \varepsilon^2}{128 M^4} \right\}.$$

Then we have

$$E_{z,A} S \le E_z |S| = \int_0^{+\infty} \Pr_z\{ |S| \ge t \}\, dt \le \int_0^{+\infty} 2 \exp\left\{ -\frac{m t^2}{128 M^4} \right\} dt = \frac{8\sqrt{2\pi}\, M^2}{\sqrt{m}}.$$
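The last step evaluates the Gaussian integral $\int_0^{+\infty} 2\exp\{-mt^2/(128M^4)\}\, dt = \sqrt{128\pi M^4/m} = 8\sqrt{2\pi}M^2/\sqrt{m}$, which can be confirmed numerically (the values of $m$ and $M$ below are illustrative):

```python
import math

# Riemann-sum check of: integral_0^inf 2 exp(-m t^2 / (128 M^4)) dt
#                       = 8 sqrt(2 pi) M^2 / sqrt(m).
m, M = 50, 1.0
a = m / (128 * M**4)
h, T = 1e-4, 20.0                       # step size and truncation point
riemann = sum(2 * math.exp(-a * (k * h) ** 2) * h for k in range(int(T / h)))
closed_form = 8 * math.sqrt(2 * math.pi) * M**2 / math.sqrt(m)
assert abs(riemann - closed_form) < 1e-3
```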

For the approximation error, note that $\mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) = \| f_\lambda - f_\rho \|_\rho^2$ [

Since $f_\lambda$ is independent of $z$ and $b$, we have

$$\begin{aligned}
E_{z,A}\big( \mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) \big) &= \| f_\lambda - f_\rho \|_\rho^2 = \| (L_K + \lambda I)^{-1} (L_K - (L_K + \lambda I)) f_\rho \|_\rho^2 = \lambda^2 \| (L_K + \lambda I)^{-1} L_K^r L_K^{-r} f_\rho \|_\rho^2 \\
&\le \lambda^2 \| (L_K + \lambda I)^{-1} L_K^r \|^2\, \| L_K^{-r} f_\rho \|_\rho^2 \le \begin{cases} \lambda^{2r} \| L_K^{-r} f_\rho \|_\rho^2, & r \le 1, \\ \lambda^2 \kappa^{4(r-1)} \| L_K^{-r} f_\rho \|_\rho^2, & r > 1 \end{cases} \\
&\le \lambda^{\min\{2r, 2\}} \left( \kappa^{4(r-1)} + 1 \right) \| L_K^{-r} f_\rho \|_\rho^2.
\end{aligned}$$

On the other hand, in [ it is proved that $\| f \|_K = \| L_K^{-1/2} f \|_\rho$ for any $f \in H_K$. So

$$\begin{aligned}
E_{z,A}\, \lambda \| f_\lambda \|_K^2 &= \lambda \| (L_K + \lambda I)^{-1} L_K f_\rho \|_K^2 = \lambda \| (L_K + \lambda I)^{-1} L_K^{1/2} f_\rho \|_\rho^2 \le \lambda \| (L_K + \lambda I)^{-1} L_K^{1/2 + r} \|^2 \cdot \| L_K^{-r} f_\rho \|_\rho^2 \\
&\le \begin{cases} \lambda^{2r} \| L_K^{-r} f_\rho \|_\rho^2, & r \le \tfrac12, \\ \lambda \cdot \kappa^{4r-2} \| L_K^{-r} f_\rho \|_\rho^2, & r > \tfrac12 \end{cases} \le \lambda^{\min\{2r, 1\}} \left( \kappa^{4r-2} + 1 \right) \| L_K^{-r} f_\rho \|_\rho^2.
\end{aligned}$$

Combining the three bounds above verifies the proposition.

In our analysis of $E_{z,A} R_1$ above, we in fact obtained the following sharper result:

$$E_{z,A} R_1 \le \frac{16 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon} + 2\epsilon\, E_z\, \mathcal{E}_z(\pi(f_{z,\lambda})).$$

Therefore, the error decomposition becomes

$$\begin{aligned}
E_{z,A}\big( \mathcal{E}(f_{z,A}) - (1 + 2\epsilon) \mathcal{E}(f_\rho) \big) &\le E_{z,A}\big( R_1 + R_2 + S + D(\lambda) - 2\epsilon \mathcal{E}(f_\rho) \big) \\
&\le \frac{16 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon} + \frac{8 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 2\epsilon\, E_z\big( \mathcal{E}_z(\pi(f_{z,\lambda})) - \mathcal{E}(f_\rho) \big) + E_z(S + D(\lambda)) \\
&\le \frac{24 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 2\epsilon\, E_z\big( \mathcal{E}_z(f_{z,\lambda}) + \lambda \| f_{z,\lambda} \|_K^2 - \mathcal{E}(f_\rho) \big) + E_z(S + D(\lambda)) \\
&\le \frac{24 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 2\epsilon\, E_z\big( \mathcal{E}_z(f_\lambda) + \lambda \| f_\lambda \|_K^2 - \mathcal{E}(f_\rho) \big) + E_z(S + D(\lambda)) \\
&\le \frac{24 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + (1 + 2\epsilon)\, E_z(S + D(\lambda)) \\
&\le \frac{24 M^2 \kappa^4 (\kappa + 1)^2}{\lambda^3 m^2 \epsilon^2} + (1 + 2\epsilon) \left( \frac{8\sqrt{2\pi}\, M^2}{\sqrt{m}} + \lambda^{\min\{1, 2r\}} \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2 \right),
\end{aligned}$$

where the last step uses the bound $R = M/\sqrt{\lambda}$ established below.

Then, by choosing $\lambda = (1/m)^{2 / \min\{4,\, 3 + 2r\}}$ to balance the terms, we obtain the following result.

Theorem 2 Let $f_{z,A}$ be derived from algorithm (2), with $f_{z,\lambda}$ and $f_\lambda$ defined as in the subsections above, and assume $f_\rho \in L_K^r(L^2_{\rho_X})$. Taking $\lambda = (1/m)^{2 / \min\{4,\, 3 + 2r\}}$, there holds

$$E_{z,A}\big( \mathcal{E}(f_{z,A}) - (1 + 2\epsilon) \mathcal{E}(f_\rho) \big) \le C_\epsilon \left( \frac{1}{m} \right)^{\min\left\{ \frac{1}{2},\, \frac{4r}{3 + 2r} \right\}},$$

where the constant

$$C_\epsilon = \frac{24 M^2 \kappa^4 (\kappa + 1)^2}{\epsilon^2} + (1 + 2\epsilon) \left( 8\sqrt{2\pi}\, M^2 + \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2 \right).$$

From the analysis of the random errors, the sample error and the approximation error above, we can obtain the whole error bound as follows.

Theorem 3 Let $f_{z,A}$ be derived from algorithm (2), with $f_{z,\lambda}$ and $f_\lambda$ defined as above, and assume $f_\rho \in L_K^r(L^2_{\rho_X})$. Taking

$$\lambda = \left( \frac{1}{m\epsilon} \right)^{2 / \min\{4,\, 3 + 2r\}}$$

and

$$\epsilon = \left( \frac{1}{m} \right)^{\min\{1/3,\; 4r/(3 + 6r)\}},$$

we have

$$E_{z,A}\big( \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) \big) \le \tilde{C} \left( \frac{1}{m} \right)^{\min\left\{ \frac{1}{3},\, \frac{4r}{3 + 6r} \right\}},$$

where the constant

$$\tilde{C} = 8\left( 1 + \sqrt{2\pi} \right) M^2 + 24 M^2 \kappa^4 (\kappa + 1)^2 + \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2.$$

It can be seen from the error decomposition (3) that

$$\begin{aligned}
E_{z,A}\big( \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) \big) &\le E_{z,A}\big( \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda \| f_{z,\lambda} \|_K^2 \big) \le E_{z,A}\big( R_1 + R_2 + S + D(\lambda) \big) \\
&\le 8\epsilon \left( \frac{2 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + M^2 \right) + \frac{8 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + \frac{8\sqrt{2\pi}\, M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}} \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2 \\
&\le 8 M^2 \epsilon + \frac{24 R^2 \kappa^4 (\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + \frac{8\sqrt{2\pi}\, M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}} \left( \kappa^{4r-2} + \kappa^{4r-4} + 2 \right) \| L_K^{-r} f_\rho \|_\rho^2.
\end{aligned}$$

Since $\lambda \| f_{z,\lambda} \|_K^2 \le \mathcal{E}_z(f_{z,\lambda}) + \lambda \| f_{z,\lambda} \|_K^2 \le \mathcal{E}_z(0) \le M^2$, we have $\| f_{z,\lambda} \|_K \le M/\sqrt{\lambda}$, i.e., we can choose $R = M/\sqrt{\lambda}$. Now take $\lambda = (1/(m\epsilon))^{2/\min\{4,\, 3+2r\}}$ and $\epsilon = (1/m)^{\min\{1/3,\; 4r/(3+6r)\}}$ to balance the terms, and the result is proved.
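The parameter choices of Theorem 3 can be packaged as a small helper (an illustrative sketch; the function name is ours, not the paper's):

```python
def choose_parameters(m, r):
    """Parameter choices from Theorem 3, given sample size m and the
    assumed regularity r of f_rho."""
    eps = (1 / m) ** min(1 / 3, 4 * r / (3 + 6 * r))
    lam = (1 / (m * eps)) ** (2 / min(4, 3 + 2 * r))
    rate = (1 / m) ** min(1 / 3, 4 * r / (3 + 6 * r))  # resulting error rate
    return eps, lam, rate

# For r >= 1/2 the rate is m^(-1/3); smaller r (less regularity)
# gives the slower rate m^(-4r/(3+6r)).
```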

Theorem 2, where $\epsilon$ is taken as a constant, reveals that the generalization error $\mathcal{E}(f_{z,A})$ converges in expectation not to that of the regression function, $\mathcal{E}(f_\rho)$, but to the slightly different quantity $(1 + 2\epsilon)\mathcal{E}(f_\rho)$.

It can be seen from the definition of differential privacy that an algorithm provides more privacy as $\epsilon$ tends to 0. However, Theorem 3 shows that $\epsilon$ cannot be too small, since the expected error would then become very large. Hence our choice can be regarded as a balance between privacy protection and the expected error. In [

Compared with previous learning theory results [

This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), National Social Science Fund in China (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2016KQNCX162) and the Major Incubation Research Project of Huizhou University (No. hzux1201619).

Nie, W.L. and Wang, C. (2017) Error Analysis and Variable Selection for Differential Private Learning Algorithm. Journal of Applied Mathematics and Physics, 5, 900-911. https://doi.org/10.4236/jamp.2017.54079