Quantile Regression Based on Laplacian Manifold Regularizer with the Data Sparsity in l1 Spaces
1. Introduction
Classical least-squares regression models focus mainly on estimating conditional mean functions. In contrast, quantile regression provides richer information about the conditional distribution of the response variable, such as the behavior of its lower and upper tails, so it is particularly useful in applications where lower, upper, or all quantiles are of interest. In recent years, quantile regression has become a popular statistical method in various research fields, such as reference charts in medicine [1], survival analysis [2] and economics [3].
In addition, relative to least-squares regression, quantile regression estimates are more robust against outliers in the response measurements. We introduce a framework for data-dependent regularization that exploits the geometry of the marginal distribution: the geometric structure learned from both labeled and unlabeled data is incorporated as an additional regularization term. Hence, there are two regularization terms: one controlling the complexity of the estimator in the ambient space and the other controlling its complexity as measured by the geometry of the distribution in the intrinsic space.
2. The Model
In this paper, under the framework of learning theory, we study l1-regularized and manifold regularized quantile regression. Let $X$ be a compact subset of $\mathbb{R}^n$ and $Y \subseteq \mathbb{R}$. There is a probability distribution $\rho$ on $Z = X \times Y$ according to which examples are generated for function learning. Labeled examples are pairs $(x, y) \in Z$ generated according to $\rho$, while unlabeled examples are simply points $x \in X$ drawn according to the marginal distribution $\rho_X$ of $\rho$. We will make the specific assumption that there is an identifiable relation between $x$ and the conditional distribution $\rho(\cdot \mid x)$. For a quantile level $t \in (0, 1)$, the conditional $t$-quantile is a set-valued function defined by

$$f_{t,\rho}(x) = \Big\{ u \in Y : \rho\big((-\infty, u] \mid x\big) \ge t \ \text{ and } \ \rho\big([u, \infty) \mid x\big) \ge 1 - t \Big\}, \qquad (2.1)$$

where $x \in X$, $t \in (0, 1)$ is the quantile level, and $\rho(\cdot \mid x)$ denotes the conditional distribution of $y$ given $x$.
The empirical method for estimating the conditional $t$-quantile function is based on the $t$-pinball loss

$$\psi_t(u) = \begin{cases} t\,u, & \text{if } u \ge 0, \\ (t - 1)\,u, & \text{if } u < 0. \end{cases} \qquad (2.2)$$

The generalization error of a measurable function $f : X \to \mathbb{R}$ with this loss, minimized by the conditional $t$-quantile function $f_{t,\rho}$, is

$$\mathcal{E}(f) = \int_Z \psi_t\big(y - f(x)\big)\, d\rho, \qquad (2.3)$$

where the integral is taken with respect to $\rho$ on $Z = X \times Y$. Based on the labeled observations $\mathbf{z} = \{(x_i, y_i)\}_{i=1}^{m}$, the empirical risk of the function $f$ is

$$\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{m} \sum_{i=1}^{m} \psi_t\big(y_i - f(x_i)\big). \qquad (2.4)$$
Next, we assume that the response is bounded, that is, $|y| \le M$ almost surely with respect to $\rho$ for some constant $M > 0$.
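To make (2.2) and (2.4) concrete, the following minimal Python sketch (the function names and the numerical check are ours, not the paper's) evaluates the $t$-pinball loss and the empirical risk, and verifies numerically that the constant minimizing the empirical pinball risk is close to the empirical $t$-quantile of the responses.

```python
import numpy as np

def pinball_loss(u, t):
    """t-pinball loss psi_t(u): t*u for u >= 0 and (t - 1)*u for u < 0."""
    return np.where(u >= 0, t * u, (t - 1.0) * u)

def empirical_risk(f_values, y, t):
    """Empirical pinball risk (2.4) of the predictions f_values on the responses y."""
    return np.mean(pinball_loss(y - f_values, t))

# Check: the constant c minimizing the empirical risk is (close to) the empirical t-quantile of y.
rng = np.random.default_rng(0)
y = rng.standard_normal(200)
t = 0.25
grid = np.linspace(y.min(), y.max(), 2001)
risks = [empirical_risk(np.full_like(y, c), y, t) for c in grid]
print(grid[int(np.argmin(risks))], np.quantile(y, t))  # the two values should be close
```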
In kernel-based learning, this minimization usually takes place in a hypothesis space, a Reproducing Kernel Hilbert Space (RKHS) [4] [5] $\mathcal{H}_K$ generated by a kernel function $K : X \times X \to \mathbb{R}$. In the empirical case, with labeled samples $\{(x_i, y_i)\}_{i=1}^{m}$ and unlabeled samples $\{x_i\}_{i=m+1}^{m+u}$, a graph-based regularized quantile regression problem can typically be formulated as the optimization problem

$$f_{\mathbf{z}} = \arg\min_{f \in \mathcal{H}_K} \frac{1}{m} \sum_{i=1}^{m} \psi_t\big(y_i - f(x_i)\big) + \lambda_1 \|f\|_K^2 + \lambda_2\, \mathbf{f}^{T} L\, \mathbf{f}, \qquad (2.5)$$

where $\mathbf{f} = \big(f(x_1), \ldots, f(x_{m+u})\big)^{T}$ and $L$ is the graph Laplacian defined below. By the representer theorem, the solution to (2.5) can be written as

$$f_{\mathbf{z}}(x) = \sum_{i=1}^{m+u} \alpha_i K(x, x_i). \qquad (2.6)$$
The l1-norm penalty not only shrinks the fitted coefficients toward zero but also forces some of them to be exactly zero when the corresponding regularization parameter is sufficiently large. When the data lie on a low-dimensional manifold, graph-based methods are often more effective for semi-supervised learning, and many approaches have been proposed, for instance Transductive SVM [6], Measure-based Regularization [7] and so on. The l1-regularized and manifold regularized quantile regression is then formulated as
$$f_{\mathbf{z}} = \sum_{i=1}^{m+u} \alpha_i^{\mathbf{z}} K(\cdot, x_i), \qquad \boldsymbol{\alpha}^{\mathbf{z}} = \arg\min_{\boldsymbol{\alpha} \in \mathbb{R}^{m+u}} \frac{1}{m} \sum_{i=1}^{m} \psi_t\big(y_i - f_{\boldsymbol{\alpha}}(x_i)\big) + \lambda_1 \sum_{i=1}^{m+u} |\alpha_i| + \lambda_2\, \mathbf{f}_{\boldsymbol{\alpha}}^{T} L\, \mathbf{f}_{\boldsymbol{\alpha}}, \qquad (2.7)$$

where $f_{\boldsymbol{\alpha}} = \sum_{i=1}^{m+u} \alpha_i K(\cdot, x_i)$, $\mathbf{f}_{\boldsymbol{\alpha}} = \big(f_{\boldsymbol{\alpha}}(x_1), \ldots, f_{\boldsymbol{\alpha}}(x_{m+u})\big)^{T}$, and $\lambda_1$, $\lambda_2$ are nonnegative regularization parameters. Here $L = D - W$ is the unnormalized graph Laplacian, where $D$ is a diagonal matrix with diagonal entries $D_{ii} = \sum_{j=1}^{m+u} W_{ij}$. The weight $W_{ij}$ is given by a similarity function, for instance the Gaussian weight $W_{ij} = \exp\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$: the more similar $x_i$ and $x_j$ are, the larger $W_{ij}$ should be.
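The following NumPy sketch makes the ingredients of (2.7), as reconstructed above, concrete: it builds the unnormalized graph Laplacian $L = D - W$ with Gaussian weights from the labeled and unlabeled inputs and evaluates the l1- and manifold-regularized pinball objective for a given coefficient vector. The names, the choice of a Gaussian kernel for both the expansion and the weights, and the bandwidths are our illustrative assumptions; this is not the paper's solver.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def graph_laplacian(X_all, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W with Gaussian similarity weights."""
    W = gaussian_kernel(X_all, X_all, sigma)
    np.fill_diagonal(W, 0.0)              # no self-loops
    D = np.diag(W.sum(axis=1))            # D_ii = sum_j W_ij
    return D - W

def objective(alpha, y_lab, X_all, t, lam1, lam2, sigma=1.0):
    """Value of the l1- and manifold-regularized pinball objective, mirroring (2.7) as written above."""
    K_all = gaussian_kernel(X_all, X_all, sigma)   # kernel sections at labeled + unlabeled points
    f_all = K_all @ alpha                          # f_alpha evaluated at all inputs
    f_lab = f_all[: len(y_lab)]                    # labeled points are listed first
    u = y_lab - f_lab
    pinball = np.where(u >= 0, t * u, (t - 1.0) * u).mean()
    manifold = f_all @ graph_laplacian(X_all, sigma) @ f_all
    return pinball + lam1 * np.abs(alpha).sum() + lam2 * manifold

# Tiny usage example with 5 labeled and 3 unlabeled one-dimensional inputs.
rng = np.random.default_rng(1)
X_lab, X_unl = rng.normal(size=(5, 1)), rng.normal(size=(3, 1))
X_all = np.vstack([X_lab, X_unl])
y_lab = X_lab[:, 0] + 0.1 * rng.normal(size=5)
alpha0 = np.zeros(len(X_all))
print(objective(alpha0, y_lab, X_all, t=0.5, lam1=0.1, lam2=0.01))
```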
3. The Restriction
Definition 3.1. The projection operator $\pi$ on the space of measurable functions $f : X \to \mathbb{R}$ is defined by

$$\pi(f)(x) = \begin{cases} M, & \text{if } f(x) > M, \\ f(x), & \text{if } -M \le f(x) \le M, \\ -M, & \text{if } f(x) < -M. \end{cases} \qquad (3.1)$$

Hence, it is natural to measure the approximation ability of $f$ by the distance between $\pi(f)$ and the target $f_{t,\rho}$, with a norm to be specified below.
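As a quick illustration of the truncation in (3.1) (with the bound $M$ on the response that we assumed above), the projection simply clips predicted values to the interval $[-M, M]$:

```python
import numpy as np

def project(f_values, M=1.0):
    """Projection pi(f): clip predicted values onto the interval [-M, M]."""
    return np.clip(f_values, -M, M)

print(project(np.array([-3.0, 0.2, 5.0]), M=1.0))  # -> [-1.   0.2  1. ]
```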
Definition 3.2. Let $p \in (0, \infty]$ and $q \in (1, \infty)$. We say that $\rho$ has a $t$-quantile of $p$-average type $q$ if for almost all $x \in X$ with respect to $\rho_X$, there exist a $t$-quantile $f_{t,\rho}(x)$ and constants $a_x \in (0, 2]$, $b_x > 0$ such that for each $s \in [0, a_x]$,

$$\rho\big(\{y : f_{t,\rho}(x) - s < y < f_{t,\rho}(x)\} \mid x\big) \ge b_x s^{q-1} \qquad (3.2)$$

$$\rho\big(\{y : f_{t,\rho}(x) < y < f_{t,\rho}(x) + s\} \mid x\big) \ge b_x s^{q-1}, \qquad (3.3)$$

and that the function $\gamma : X \to \mathbb{R}$, $\gamma(x) = b_x a_x^{q-1}$, satisfies $\gamma^{-1} \in L^p_{\rho_X}$.

For $p \in (0, \infty]$ and $q \in (1, \infty)$, denote

$$\theta = \min\Big\{ \frac{2}{q},\ \frac{p}{p+1} \Big\}. \qquad (3.4)$$
Lemma 3.1. If $\rho$ has a $t$-quantile of $p$-average type $q$ for some $p \in (0, \infty]$ and $q \in (1, \infty)$, then for any measurable function $f : X \to \mathbb{R}$, there holds

$$\mathbb{E}\Big[\big(\psi_t(y - f(x)) - \psi_t(y - f_{t,\rho}(x))\big)^2\Big] \le C_\theta \big(\mathcal{E}(f) - \mathcal{E}(f_{t,\rho})\big)^{\theta}, \qquad (3.5)$$

where $\theta$ is given by (3.4) and $C_\theta$ is a constant depending only on $p$, $q$ and $\|\gamma^{-1}\|_{L^p_{\rho_X}}$.
Definition 3.3. We say that the kernel function
is a
kernel with
if there exist constants
, such that
(3.6)
We assume throughout the paper that the kernel $K$ is continuous and bounded on $X \times X$, and denote $\kappa := \sup_{x, v \in X} |K(x, v)| < \infty$. Our approximation condition is given as

$$f_{t,\rho} = L_{\widetilde{K}}^{r}\, g_0 \quad \text{for some } r > 0 \text{ and } g_0 \in L^2_{\rho_X}; \qquad (3.7)$$
here, the kernel $\widetilde{K}$ is defined by

$$\widetilde{K}(x, y) = \int_X K(x, v)\, K(y, v)\, d\rho_X(v), \qquad x, y \in X. \qquad (3.8)$$

Hence, although the kernel $K$ is not necessarily positive semi-definite, $\widetilde{K}$ is a Mercer kernel, and $\mathcal{H}_{\widetilde{K}}$ denotes the associated reproducing kernel Hilbert space. The kernel $\widetilde{K}$ defines an integral operator $L_{\widetilde{K}}$ on $L^2_{\rho_X}$ by

$$\big(L_{\widetilde{K}} f\big)(x) = \int_X \widetilde{K}(x, y)\, f(y)\, d\rho_X(y), \qquad f \in L^2_{\rho_X}. \qquad (3.9)$$
Note that $L_{\widetilde{K}}$ is a self-adjoint positive operator on $L^2_{\rho_X}$. Hence its $s$-th power $L_{\widetilde{K}}^{s}$ is well defined for any $s > 0$. We take the RKHS $\mathcal{H}_{\widetilde{K}}$ with

(3.10)

It is easy to see that $\mathcal{H}_{\widetilde{K}} = L_{\widetilde{K}}^{1/2}\big(L^2_{\rho_X}\big)$, so that any function $f \in \mathcal{H}_{\widetilde{K}}$ can be expressed as $f = L_{\widetilde{K}}^{1/2} g$ for some $g \in L^2_{\rho_X}$.
Definition 3.4. Define a Banach space
with the norm
(3.11)
Definition 3.5. For every
, the l2-empirical covering number of
is
(3.12)
Lemma 3.2. There exist an exponent
and a constant
such that
(3.13)
Suppose that
(3.14)
Define an operator
on
as
(3.15)
with
. The above equation tells us that
(3.16)
Next, the performance of
approaching
can be described through the regularizing function
defined as
(3.17)
The above function
given by (3.13) can be expressed as
(3.18)
where
. Moreover,
is a continuous function on
and
(3.19)
Definition 3.6. Let
be a set of functions on
,
. The
metric between functions on
is
(3.20)
Denote the ball of radius
as
.
4. Error Analysis
4.1. Error Decomposition
Proposition 4.1. Let
and
be given by (2.6). Then
(4.1.1)
where
(4.1.2)
(4.1.3)
(4.1.4)
(4.1.5)
(4.1.6)
(4.1.7)
Proof. A direct decomposition shows that
(4.1.8)
The fact
implies that
. Hence, the second term on the right-hand side of the above equation is at most 0, because
is the minimizer of (2.7). Due to
, we see that the second-to-last term is at most 0. The fifth term is bounded by using the
fact that
. Thus we complete the proof.
4.2. Estimation of the Regularization Error
Proposition 4.2. Assume (3.7) holds and denote
for some
. Then we have
(4.2.1)
where
.
Proof. Denote
. By Proposition 2 in [8], we get the following relationships
and
. Combining these with the definition of
, we have
where
. We thus derive the desired bound.
4.3. Estimation of the Manifold Error
In this subsection, we estimate the manifold error. Denote
(4.3.1)
(4.3.2)
(4.3.3)
(4.3.4)
where
,
,
. So we can see that
Lemma 4.1. Let
be a random variable on a probability space
with
satisfying
for some constant
. Then for any
, we have
(4.3.5)
Proposition 4.3. Under the approximation condition (3.13), let
and
for some
. Then for any
with confidence
, there holds
(4.3.6)
Proof. By the definition of
, we have
. Since
, there holds
,
,
. Applying Lemma 4.1, with confidence
,
(4.3.7)
(4.3.8)
(4.3.9)
(4.3.10)
Then we find that the manifold error bound holds true.
4.4. Estimation of the Hypothesis Error
This subsection is devoted to estimating the hypothesis error. Under the assumption that the sample is drawn i.i.d. from
and
a.e., we estimate
as follows.
Proposition 4.4. For any
, with confidence
, we have
(4.4.1)
(4.4.2)
Proof. We estimate
. Recall
, then
.
Applying Lemma 4.1 to the random variable
on
with values in
, we obtain
(4.4.3)
(4.4.4)
with confidence
, there holds
(4.4.5)
since
(4.4.6)
Then the bound of the following term is derived with
(4.4.7)
Finally, we have
(4.4.8)
The
has been proved in [8].
4.5. Estimation of the Sample Error
Since
is a function-valued random variable which depends on the sample, we estimate the sample error in the data-independent space
, which contains all possible hypothesis spaces
. Our estimates for
are based on the following concentration inequality; see [8].
Lemma 4.2. Let
be a class of measurable functions on
. Assume that there are constants
and
such that
and
for every
. If for some
and
,
(4.5.1)
then there exists a constant
depending only on
such that for any
, with confidence
, there holds
(4.5.2)
where
. The same bound also holds true for
.
The following proposition, which has been proved in [9], will be utilized to bound
.
Proposition 4.5. Suppose that
has a t-quantile of p-average type
for some
,
. Let
and
. Assume
satisfies the capacity assumption (3.12) with some
. Then, for any
, with confidence
, there holds, for all
,
(4.5.3)
Here
and
are the constants depending on
and
.
The following proposition, which has been proved in [9], will be utilized to bound
.
Proposition 4.6. Under the assumptions of Proposition 4.5, for any
, with confidence
, there holds
(4.5.4)
Here
is a constant independent of
.
Proof. We consider the following function set with
to bound
(4.5.5)
Since
and
, for any
, we have
(4.5.6)
By Lemma 3.1, the variance-expectation condition of
is satisfied with
given by (3.4) and
. Then we get
(4.5.7)
Applying Lemma 4.2 to
, for any
, with confidence
, there holds that, for any
,
(4.5.8)
where
and
. From the process of estimating
, for any
, with confidence
, we have
(4.5.9)
which implies that there exists a subset
of
with measure at most
such that
(4.5.10)
The above inequality guarantees
for every
. By Lemma 4.2 and (4.5.8), there exists a subset
with measure at most
such that for every
, we have
and
(4.5.11)
Proposition 4.2 implies that
and
(4.5.12)
Proposition 4.4 tells us that there exists a subset
of
with measure at most
such that for every
,
(4.5.13)
Let
. Obviously, the measure of
is at most
and for every
, the above inequalities hold. Finally, combining (4.5.11), (4.5.12) and (4.5.13), the proof is completed.
5. Total Error Bound
Proposition 5.1. Suppose that
has a t-quantile of p-average type
for some
and
, and that the approximation condition (3.7) and the capacity condition (3.12) hold. Let
and
. Then, there exists a subset
of
with measure at most
such that for any
, we have
(5.1)
Here
are constants independent of
and
Proof. Proposition 4.4 ensures the existence of a subset
of
with measure at most
such that
(5.2)
(5.3)
hold for any
.
Proposition 4.5 tells us that there exists a subset
of
with measure at most
, such that
(5.4)
Proposition 4.6 ensures the existence of a subset
of
with measure at most
such that
(5.5)
Proposition 4.3 ensures that there exists a subset
of
with measure at most
such that
(5.6)
Taking
, the measure of
is at most
. Combining (5.2)-(5.6) and (4.2.1), for every
we get
(5.7)
Here
is a constant independent of
, and
are as above.
Next, let
. Hence, the inequality (5.6) can be expressed as
(5.8)
where
is the remainder term. From Lemma 7.2 in the book Learning Theory: An Approximation Theory Viewpoint by Cucker and Zhou, the inequality (5.8) has a unique positive solution
which can be bounded as
(5.9)
Then the result is derived.
6. Convergence Radius and Main Result
Proposition 6.1. Under the assumptions in Proposition 5.1, we take
,
with
. Then, for any
, with confidence
, there holds
(6.1)
Proof. Applying
with
and letting
to Proposition 5.1, for any
, there exists a subset
of
with measure at most
such that
(6.2)
where the constants are given by
(6.3)
where
is a constant independent of
.
It follows that
(6.4)
Then, we define a sequence
by
and, for
(6.5)
Since
ensures
, we have
(6.6)
with
. Hence the measure of the corresponding set is at least the required level. By the iteration formula (6.5), we have
(6.7)
where
Noting that
, to ensure that
(6.8)
we only need
(6.9)
Then we get
(6.10)
Combining (6.2) and (6.10), we have
(6.11)
with
(6.12)
where the constants are as above. The bound follows by replacing
by
.
Theorem 6.1. Assume (3.7) and (3.13) hold. Taking
and
. Suppose that
has a t-quantile of
average type
for some
and
,
. Then for any
, with confidence
, we have
(6.13)
Proof. Applying Lemma 3.1, Proposition 6.1 and Proposition 5.1, with confidence
, we have
(6.14)
Here
is a constant independent of
and
(6.15)
with
. The proof is complete.
7. The Sparsity of the Algorithm
In this section, we investigate the sparsity of algorithm (2.7). Here sparsity means the vanishing of some coefficients in the expansion
(7.1)
We provide a general result for the vanishing of the coefficients.
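Before stating it, here is a small numerical sketch of the sparsity phenomenon (our own illustration under assumed settings, not the paper's algorithm: it drops the manifold term and uses a simple proximal-subgradient loop with soft-thresholding for the l1 part). It fits the l1-regularized pinball objective and reports how many expansion coefficients remain nonzero; the count typically shrinks as the regularization parameter grows, in line with the qualitative message of Proposition 7.1.

```python
import numpy as np

def soft_threshold(v, thr):
    """Proximal map of thr * ||.||_1: shrinks entries toward zero and sets small ones exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def fit_sparse_quantile(K, y, t, lam1, step=0.05, n_iter=2000):
    """Proximal-subgradient sketch for min_alpha (1/m) sum_i psi_t(y_i - (K alpha)_i) + lam1 * ||alpha||_1."""
    m, n = K.shape
    alpha = np.zeros(n)
    for _ in range(n_iter):
        u = y - K @ alpha
        # Subgradient of the averaged pinball term with respect to alpha.
        g = -(K.T @ np.where(u >= 0, t, t - 1.0)) / m
        # Gradient-type step followed by soft-thresholding (the l1 proximal step).
        alpha = soft_threshold(alpha - step * g, step * lam1)
    return alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)
K = np.exp(-((X - X.T) ** 2) / 0.5)   # Gaussian Gram matrix over the inputs
for lam1 in (0.001, 0.01, 0.1):
    alpha = fit_sparse_quantile(K, y, t=0.5, lam1=lam1)
    print(lam1, int(np.sum(np.abs(alpha) > 1e-8)), "nonzero coefficients")
```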
Proposition 7.1. Let
,
and
be the coefficient vector of
. If
(7.2)
(7.3)
(7.4)
hold, then we have
.
Proof. Define the function
in (2.7) to be optimized with
. Denote
as the vector obtained by setting the jth component of
to zero. Comparing
with
, since
and
, we have
(7.5)
If (7.2)-(7.4) are satisfied, we see from
that we must have
. In order to estimate the error
, we only need to bound
. We thus derive the following inequality, which plays an important role in our mathematical analysis.
8. Conclusion
In this paper, we have discussed the convergence rate of quantile regression with manifold regularization, which exploits the intrinsic structure of the data through the unlabeled examples. The main result establishes an upper bound for the total
error of order less than
. Meanwhile, quantile regression
provides a piecewise linear yet convex technique that overcomes difficulties such as a highly nonlinear dependence on the predictor and suboptimal linear models. Finally, the sparsity of the algorithm is analyzed in the l1
space.
Acknowledgements
We would like to thank the referees for their constructive comments and suggestions, which have improved the paper. This work was supported by the Natural Science Foundation of China under Grant No. 71401124.