1. Introduction
Online learning is widely used recently in computer sciences, due to its effi- ciency in calculation and well theoretical results. Compared with the classical batch learning in learning theory, online algorithms update the output only according to the last sample point. So such algorithms are very effective to handle the practical problems and have been studied in [1] [2] [3] [4] [5] and etc. However, as the technologic development of data analysis, there are risks for applying such algorithms on a big data set. A commonly used notion for mea- suring the risk is differential privacy [6] . Little references on this topic can be found except for [7] . There the authors conducted an analysis for online convex programming. Choice for the parameters of differential privacy and utilities ana- lysis are presented for algorithms such as implicit gradient descent and genera- lized infinitesimal gradient descent. In this paper, the line of work begins with [8] is considered which can be thought as a kernel online learning algorithm.
2. Fundamental Principles
Our setting for online learning is introduced as follow. Let the input space
be a compact metric space, and output
for some
as a regression problem. Denote
as the sample space. Assume there is a probability measure
on
, which can be decomposed to marginal dis- tribution
on
and conditional distribution
on
at
. Then the regression function is defined by
(1)
which is indeed the conditional expectation of
given
. The regression fun- ction minimizes the least square generalization error (see [9] for more details)
(2)
So learning algorithms always aim to approximate the regression function
based on samples
, which are drawn independently from
distribution
. Let
be a Mercer Kernel, and
is the in- duced reproducing kernel Hilbert space (RKHS, [10] ), i.e., the completion of
where
for any
with respect to
the inner product
. The corresponding norm in
is denoted as
. Now our online learning algorithm as
(3)
with
. Here
is the step size and
is the regularization parameter.
When applying this online algorithm on private data set, it may leak some sensitive information. To deal with this privacy problem, Dwork et al. introdu- ced differential privacy in [11] . Which can be described as follow. For the sample space
introduced above, the Hamming distance between two sample sets
is
(4)
Definition 1 A random algorithm
is
-differential pri- vate if for every two data sets
satisfying
, and every set
, there holds
(5)
To endow our online algorithm the differential privacy property, a perturba- tion term is added into the output of (3), that is,
(6)
where
takes value in
with distribution to be determined in following analysis.
Differential private online learning has already been studied in [7] , there the authors consider a differentially private online convex programming problem. Here our algorithm is different, which is based on the Mercer kernels. Our pur- pose in this paper is to firstly provide the explicit density function for
and then conduct an error analysis for (6), which reveals the learning rate.
3. Differential Privacy Analysis
In this section, a detail analysis for the perturbation term
in algorithm (6) will be conducted. Firstly recall the useful definition of sensitivity and lemma proposed in [11] .
Definition 2 denote
as the maximum infinite norm of difference betw- een the outputs when changing the last sample point in
. Let
and
,
and
derived from (3) accordingly, it is clear that
(7)
Then a similar result to [11] is:
Lemma 1 Assume
is bounded by
, and
has density function
proportion to
, then algorithm (6) provides
-differential privacy.
Proof. For all possible output function
, and
differ in last element, then
(8)
and
(9)
So by triangle inequality,
(10)
Then the lemma is proved by a union bound. 
It is obvious that if finding the upper bound for
, the distribution for
can be derived. Set
and
for some
and
. Moreover, denote
(as
is Mercer Kernel
on compact metric space
). The next lemma is taken from [1] to bound
.
Lemma 2 If
, then for all
, there holds
(11)
Now the main result for differential privacy for algorithm (6) follows.
Theorem 1 When choosing
and
for some
and
, let the density function of
is
with
and
(12)
then the algorithm (6) provides
-differential privacy.
Proof. From (3) there holds
(13)
and
(14)
Then
(15)
From the above lemma
for all
. By the reproducing proper- ty that
(see [9] ),
(16)
Therefore
(17)
Set
to be the right hand side in lemma 1 then the theorem is proved. 
4. Error Analysis
In this section,
is assumed for simplicity. It will be shown that
obtained from (6) still converge to regression function
by choosing appro- priate parameter
under the choice of
and
as in the theorem in the last section. To this end, an error decomposition is needed. Denote operators
as
for
, and
as the identity operator. It is easy to verify that
. Notice that
, the following decomposition can be deduced:
(18)
(19)
(20)
Here
and
. In the follow- ing the first term is called initial error and second one is sample error. The initial error is easy to bound from the analysis above. Since
is such that
,
is a positive operator with
.
For the sample error, it is more difficult and the Pinelis-Bernstein inequality [4] will be applied.
Lemma 3 Let
be a martingale difference sequence in a Hilbert space. Sup- pose that almost surely
and
for some constants
. Then for any
, with probability at least
, there holds
(21)
Now the error bounds for sample error can be derived. Notice that
. Set
, then
(22)
(23)
(24)
And
. So for any
, with probability at least
,
(25)
Note that
, hence
(26)
Combining the initial error, sample error bounds and applying Markov in- equality for the fact that
, the total error estimation is obtained.
Theorem 2 Choose
and
as in the theorem in the last section, with confidence
, there holds
(27)
where constant
.
5. Conclusion
In this paper, analysis is performed for the differential privacy (Theorem 1) and generalization property (Theorem 2) for the online differential private learning algorithm (6). Under the choice of parameters in our theorems, the algorithm (6) can provide
-differential privacy and keep learning rate close to
, for any
. However, this error bound is not satisfactory enough. It might be an interesting problem to promote the error bound from
to
in our future work.
Fund
This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2013LYM_0089), National Social Science Fund in China (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040) and Doctor Grants of Huizhou University (No. C511.0206).