1. Introduction
In the past twenty years, the most popular direction of statistics is high- dimensional data. In functional magnetic resonance imaging (FMRI), bioin- formatics, Web mining, climate research, risk management and social science, it not only has a wide range of applications, but also the main direction of scientific research at present. In theoretical and practical, high-dimensional precision matrix estimation always plays a very important role and has wide applications in many fields.
Thus, estimation of high-dimensional precision matrix is increasingly becoming a crucial question in many field. However, estimation of high- dimensional precision matrix has two difficulty: 1) sparsity of estimator; (ii) the positive-definiteness constraint. Huang et al. [1] considered using Cholesky decomposition to estimate the precision matrix. Although the regularized Cholesky decomposition approach can achieve a positive-semidefiniteness, it can not guarantee sparsity of estimator. Meinshausen et al. [2] use a neigh- bourhood selection scheme in which one can sequentially estimate the support of each row of precision matrix by fitting lasso penalized least squares regression model. Peng et al. [3] considered a joint neighbourhood estimator by using the lasso penalization. Yuan [4] considered the Dantzig selector to replace the lasso penalized least squares in the neighbourhood selection scheme. Cai et al. [5] considered a constrained
minimization estimator for estimating sparse precision matrices. However, this methods mentioned are not always achieve a positive-definiteness.
To overcome the difficulty (ii), one possible method is using the eigen- decomposition of
and designing
to satisfy condition
. Assume that
has the eigen-decomposition
and then a positive semi- definite estimator was gained by setting
. However, this strategy destroys the sparsity pattern of
for sparse precision matrix estimation. Yuan et al. [6] considered the lasso penalized likelihood criterion and used the maxd et al. gorithm to compute the estimator. Friedman et al. [7] considered the graphical lasso algorithm for solving the lasso penalized Gaussian likelihood estimator. Witten et al. [8] optimized the graphical lasso. By the lasso or
penalized Gaussian likelihood estimator, thoses methods simultaneously achieve positive-definiteness and sparsity.
Recently, Zhang et al. [9] consider a constrained convex optimization frame- work for high-dimensional precision matrix. They used lasso penalized D-trace loss replace traditional lasso function, and enforced the positive-definite constraint
for some arbitrarily small
. In their work, focusing on solving problem as follow:
(1)
It is important to note that
is not a tuning parameter like
. We simply include
in the procedure to ensure that the smallest eigenvalue of the estimator is at least
. They developed an efficient alternating direction method of multipliers (ADMM) to solve the challenging optimization problem (1) and establish its convergence properties.
To gain a better estimator for high-dimensional precision matrix and achieve the more optimal convergence rate, this paper mainly propose an effective algorithm, an accelerated gradient method ( [10] ), with fast global convergence rates to solve problem (1). This method mainly basis on the Nesterov's method for accelerating the gradient method ( [11] [12] ), showing that by exploiting the special structure of the trace norm, the classical gradient method for smooth problems can be adapted to solve the trace regularized nonsmooth problems. However, for our problem (1), we have not trace norm, instead of is
norm form, but this method have the similar efficiently result for our problem. Numerical results show that this method for our problem (1) not only has significant computational advantages, but also achieves the optimal convergence
rate as
.
The paper is organized as follows: Section 2 introduces our methodology, including model establishing in Section 2.1; step size estimation in Section 2.2; an accelerate gradient method algorithm in Section 2.3; the convergence analysis results of this algorithm in Section 2.4. Section 3 introduced numerical results for our method in comparing with other methods. And discussion are made in Section 4. All proofs are given in the Appendix.
2. An Accelerated Gradient Method
2.1. Model Establishing
According to introduction, our optimization problem D-trace Loss function as follow:
(2)
where
is a nonnegative penalization parameter,
is the sample cova-
riance matrix.
, and
is the
off- diagonal penalty. Defining
, and
is a con-
tinuously differentiable function. Considering the gradient step
(3)
where
is a stepsize,
. The smooth part (3)
can be reformulated equivalent as a proximal regularization of the linearized function
at
:
(4)
where
(5)
Based on this equivalence relationship, solving the optimization problem (2) by the following iterative step:
(6)
with equality in the last line by ignoring terms that do not depend on
.
Defining
as the projection of a matrix
onto the convex cone
. Assuming that
has the eigen-decomposition
, and then
can be obtained as
. Defining an entry-wise soft-thresholding rule for all the off-diagonal elements of a matrix
with
. Thus, the above problem can be summarized in the following theorem:
Theorem 1: Let
, and
is symmetric covariance matrix, then:
(7)
is given by
, where
with
.
The proof of this theorem is easy by applying the soft-thresholding method.
2.2. Step Size Estimation
To guarantee the convergence rate of the resulting iterative sequence, Firstly giving the relationship between our proximal function
and the objection function
at the certain point.
Lemma 1: Let
(8)
where
is defined in Equation (6). Assuming the following inequality holds:
(9)
then for any
, then:
(10)
This lemma is proved in the Appendix.
At each iterative step of the algorithm, an appropriate step size
is needed to satisfy
and
(11)
Since the gradient of f(・) satisfies Lipschitz continuous, according to Nesterov et al. [11] work, having follow lemma.
Lemma 2: Supposing that
is a convex function, and the gradient of
denote
is Lipschitz continuous with constant
, then:
(12)
so, we have
(13)
Hence, when
, then:
(14)
The above results show that the condition in Equation (11) is always satisfied when the update rule
(15)
2.3. An Accelerate Gradient Method Algorithm
In practice,
may be unknown or it is expensive to compute. To use the following step size estimation method, usually, giving an initial estimate of
as
and increasing this estimate with a multiplicative factor
re- peatedly until the condition in Equation (11) is satisfied. It is well known ( [11] [12] ) that if the objection function is smooth, then the accelerate gradient
method can achieve the optimal convergence rate of
. Recently, there
have other similar methods applying in problems consisted a smooth part and a non-smooth part ( [10] [13] [14] [15] ). Then giving the accelerate gradient algorithm to solve the optimization problem in Equation (2).
Algorithm1:An accelerate gradient method algorithm for high-dimensional precision matrix
1) Initialize:
,
,
,
2) Iterate:
3) set
4) While
, set
5) Set
and update
2.4. Convergence Analysis
In our method, two sequences
and
are updated recursively. In particular,
is the approximate solution at the kth step and
is called the search point ( [11] [12] ), which is constructed as a linear combination of the latest two approximate solutions
and
. In this section, the con-
vergence rate of the method can be showed as
. This result is sum-
marized in the following theorem.
Theorem 2: Let
and
be the covariance matrices sequence generated by our algorithm. Then for any
, having
(16)
where
.
3. Simulation
In this section, providing numerical results for our algorithm which will show our algorithmic advantages by three model. In the simulation study, data were generated from
, where
. And the sample size was taken to be n = 400 in all models, and let p =500 in Models 1 and 2, and p = 484 in Model 3, which is similar to Zhang et al. [9] . The numerical results of three models as follow:
Model 1:
for
and
otherwise.
Model 2:
for
and
otherwise.
Model 3:
for mod
,
and
otherwise; this is the grid model in Ravikumar et al. [16] and requires
to be an integer.
Simulation results based on 100 independent replications are showed in Table 1. This paper mainly compare the three methods in terms of four quantities: the
operator risk E
, the matrix
risk E
, and the
percentages of correctly estimated nonzeros and zeros (TP and TN), where
norm
is written as
. In the first two columns smaller numbers are better; in the last two columns larger numbers are better. In general, Table 1 shows that our estimator performs better than Zhang et al.’s method estimator and the lasso penalized Gaussian likelihood estimator.
4. Conclusion
This paper mainly estimate positive-definite sparse precision matrix estimation
Table 1. Comparison of our method with Zhang et al.’s method and graphical lasso.
via lasso penalized D-trace loss by an efficient accelerated gradient method. The positive-definiteness and sparsity are the most important property of large covariance matrices, our method not only efficiently achieves these property, but also shows an better convergence rate. Numerical results have show that our estimator also have a better performance, comparing to Zhang et al.’s method and the Graphical lasso method.
Funding
This project was supported by National Natural Science Foundation of China (71601003) and the National Statistical Scientific Research Projects (2015LZ54).
Appendix: Proff of Theorems and Lemmas
Appendix: Proof of Lemma 1
Since that both the trace function and
norm are all convex function, so
(17)
(18)
where
, is the sub-gradient of
norm at point
.
Since
and combing in Equations (17), (18) then
(19)
Since
is a minimizer of
, thus
(20)
So the Equation (19) can be simplified as:
(21)
Appendix: Proof of Theorem 2
Defining
,
, easily obtaining
(22)
since
. so
(23)
By applying Lemma 1, easily obtaining:
(24)
so:
(25)
Applying (23) (25), then:
(26)
Combining the Equation (26) and the relation
, easily ob-
taining:
(27)
Submit or recommend next manuscript to SCIRP and we will provide best service for you:
Accepting pre-submission inquiries through Email, Facebook, LinkedIn, Twitter, etc.
A wide selection of journals (inclusive of 9 subjects, more than 200 journals)
Providing 24-hour high-quality service
User-friendly online submission system
Fair and swift peer-review system
Efficient typesetting and proofreading procedure
Display of the result of downloads and visits, as well as the number of cited articles
Maximum dissemination of your research work
Submit your manuscript at: http://papersubmission.scirp.org/
Or contact apm@scirp.org