Modified Cp Criterion for Optimizing Ridge and Smooth Parameters in the MGR Estimator for the Nonparametric GMANOVA Model

Longitudinal trends of observations can be estimated using the generalized multivariate analysis of variance (GMANOVA) model proposed by [10]. In the present paper, we consider estimating the trends nonparametrically using known basis functions. Then, as in nonparametric regression, an overfitting problem occurs. [13] showed that the GMANOVA model is equivalent to the varying coefficient model with non-longitudinal covariates. Hence, as in the ordinary linear regression model, when the number of covariates becomes large, the estimator of the varying coefficients becomes unstable. In the present paper, we avoid the overfitting and instability problems by applying the concepts behind penalized smoothing spline regression and multivariate generalized ridge regression. In addition, we propose two criteria for optimizing the hyperparameters, namely, a smoothing parameter and ridge parameters. Finally, we compare the ordinary least squares estimator with the new estimator.


Introduction
We consider the generalized multivariate analysis of variance (GMANOVA) model, proposed by [10], with n observations of p-dimensional vectors of response variables. Let Y be an n × p matrix of response variables, A an n × k matrix of non-stochastic between-individual explanatory variables satisfying A'1_n = 0_k, X a q × p matrix of non-stochastic within-individual explanatory variables of rank q (q ≤ p), and E an n × p matrix of error variables, where n is the sample size, 1_n is an n-dimensional vector of ones, and 0_k is a k-dimensional vector of zeros. Then, we can express the GMANOVA model as

Y = 1_n μ' X + A Ξ X + E,  (1)

where Ξ is a k × q unknown coefficient matrix and μ is a q-dimensional unknown vector. We assume that the rows of E are independently distributed according to the p-dimensional normal distribution with mean 0_p and unknown covariance matrix Σ.

 
Let S be the unbiased estimator of the unknown covariance matrix Σ given by

S = Y'(I_n − 1_n 1_n'/n − A(A'A)^{-1} A') Y / (n − k − 1).  (2)

Then, the maximum likelihood (ML) estimators of μ and Ξ are given by

μ̂_ML' = n^{-1} 1_n' Y S^{-1} X' (X S^{-1} X')^{-1},  Ξ̂_ML = (A'A)^{-1} A' Y S^{-1} X' (X S^{-1} X')^{-1},

respectively. The ML estimators are unbiased and asymptotically efficient estimators of μ and Ξ.
In the GMANOVA model, the polynomial vector x(t_j) = (1, t_j, …, t_j^{q−1})', evaluated at the j-th time point t_j, is often used to construct X.
Then, we estimate the longitudinal trends of Y using polynomial curves of degree q − 1. However, occasionally, a polynomial curve cannot thoroughly express flexible longitudinal trends. Hence, we consider estimating the longitudinal trends nonparametrically in the same manner as [11] and [5], i.e., we use known basis functions as x(t) and assume that p is large. In the present paper, we refer to the GMANOVA model with X obtained from the basis functions as the nonparametric GMANOVA model. In the nonparametric GMANOVA model, it is well known that the ML estimators become unstable because S^{-1} becomes unstable when p is large. Thus, we deal with the least squares (LS) estimators of μ and Ξ, which are obtained by minimizing tr{(Y − 1_n μ' X − A Ξ X)(Y − 1_n μ' X − A Ξ X)'}. The LS estimators of μ and Ξ are

μ̂' = n^{-1} 1_n' Y X' (X X')^{-1},  Ξ̂ = (A'A)^{-1} A' Y X' (X X')^{-1},  (3)

respectively. Note that μ̂ does not depend on A. The LS estimators are simple and unbiased estimators of μ and Ξ. However, as in the ordinary nonparametric regression model, the LS estimators cause an overfitting problem when we use basis functions to estimate the longitudinal trends nonparametrically.
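As a minimal numerical sketch of the LS computation in (3): the snippet below assumes the model form Y = 1_n μ'X + AΞX + E with a centered A (so that A'1_n = 0_k) and uses a simple polynomial basis for X; all dimensions and variable names are illustrative, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: n individuals, p time points, k covariates, q basis terms.
n, p, k, q = 30, 8, 2, 4

# Within-individual design X (q x p): polynomial basis evaluated at p time points.
t = np.linspace(0.0, 1.0, p)
X = np.vstack([t**j for j in range(q)])            # q x p

# Between-individual design A (n x k), centered so that A'1_n = 0_k.
A = rng.standard_normal((n, k))
A -= A.mean(axis=0)

# True parameters and data generated from Y = 1_n mu' X + A Xi X + E.
mu = rng.standard_normal(q)
Xi = rng.standard_normal((k, q))
E = 0.1 * rng.standard_normal((n, p))
Y = np.outer(np.ones(n), mu @ X) + A @ Xi @ X + E

# LS estimators from (3): both project Y onto the row space of X.
XXt_inv = np.linalg.inv(X @ X.T)
mu_hat = (np.ones(n) @ Y / n) @ X.T @ XXt_inv                # q-vector
Xi_hat = np.linalg.inv(A.T @ A) @ A.T @ Y @ X.T @ XXt_inv    # k x q
```

Because A is centered, the A-term vanishes from μ̂ exactly, illustrating that μ̂ does not depend on A.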
In order to avoid the overfitting problem, we use X X' + λ K instead of X X', as in penalized smoothing spline regression (see, e.g., [2]), where λ (≥ 0) is a smoothing parameter and K is a q × q known penalty matrix. Moreover, when multicollinearity occurs in A, a ridge parameter can be introduced and optimized by minimizing the Cp criterion proposed by [7,8] in the univariate linear regression model (for the multivariate case, see, e.g., [15]). For the model (1) with X = I_p and p > 1, [17] proposed the Cp and its bias-corrected Cp (modified Cp; MCp) criteria for optimizing the ridge parameter. However, an optimal ridge parameter cannot be obtained without an iterative computational algorithm because it cannot be obtained in closed form.
On the other hand, [4] also proposed generalized ridge (GR) regression in the univariate linear regression model, i.e., the model (1) with X = I_p and p = 1, simultaneously with ridge regression. The GR estimator is defined not by a single ridge parameter, but rather by multiple ridge parameters θ = (θ_1, …, θ_k)'. Several authors have proposed non-iterative GR estimators (see, e.g., [6]). [18] proposed GR regression in the multivariate linear regression model, i.e., the model (1) with X = I_p and p > 1. We call this generalized ridge regression the multivariate GR (MGR) regression. They also proposed the Cp and MCp criteria for optimizing the ridge parameters θ in the MGR regression, and showed that the θ optimized by minimizing these two criteria is obtained in closed form. [9] proposed non-iterative MGR estimators by extending the non-iterative GR estimators. Several computational tasks are required when estimating the trends nonparametrically because we determine the optimal λ and the number of basis functions simultaneously. Fortunately, [18] reported that the performance of the MGR regression is almost the same as that of the multivariate ridge regression. Hence, we use the MGR regression in order to avoid the multicollinearity problem that occurs in A while reducing the number of computational tasks.

The remainder of the present paper is organized as follows: In Section 2, we propose new estimators using the concepts of penalized smoothing spline regression and MGR regression. In Section 3, we derive the target mean squared error (MSE) of a predicted value of Y. We then propose the Cp and MCp criteria to optimize the ridge parameters and the smoothing parameter in the new estimator.
Using these criteria, we show that the optimized ridge parameters are obtained in closed form under a fixed λ.
We also show the magnitude relationships between the optimized ridge parameters. In Section 4, we compare the LS estimator in (3) with the proposed estimator through numerical studies. In Section 5, we present our conclusions.

The New Estimators
In the model (1), we consider estimating the longitudinal trends nonparametrically by using basis functions X. Then, we consider the following estimators in order to avoid the overfitting problem in the nonparametric GMANOVA model:

μ̂_λ' = n^{-1} 1_n' Y X' (X X' + λ K)^{-1},  Ξ̂_λ = (A'A)^{-1} A' Y X' (X X' + λ K)^{-1},  (5)

where λ (≥ 0) is a smoothing parameter and K is a q × q known penalty matrix. We must determine K before using this estimator. Since K is usually set as some nonnegative definite matrix, we assume that K is a nonnegative definite matrix.
If λ = 0 or K = O, where O is a q × q matrix of zeros, then this estimator corresponds to the LS estimators μ̂ and Ξ̂ in (3). Note that this estimator controls the smoothness of each estimated curve through only one parameter λ. When we use this estimator, we need to optimize the parameter λ because this estimator changes with λ.
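A minimal sketch of the penalized estimators, assuming the form in which XX' is replaced by XX' + λK; the second-difference penalty K = D'D used below is only one common nonnegative definite choice, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k, q = 30, 8, 2, 4
t = np.linspace(0.0, 1.0, p)
X = np.vstack([t**j for j in range(q)])            # q x p basis design
A = rng.standard_normal((n, k))
A -= A.mean(axis=0)                                # center so that A'1_n = 0_k
Y = rng.standard_normal((n, p))                    # placeholder response

# Second-difference penalty K = D'D, a common nonnegative definite choice.
D = np.diff(np.eye(q), n=2, axis=0)                # (q-2) x q
K = D.T @ D

def penalized_estimators(Y, A, X, K, lam):
    """Estimators of the form (5): XX' is replaced by XX' + lam*K."""
    n = Y.shape[0]
    G = np.linalg.inv(X @ X.T + lam * K)
    mu_hat = (np.ones(n) @ Y / n) @ X.T @ G
    Xi_hat = np.linalg.inv(A.T @ A) @ A.T @ Y @ X.T @ G
    return mu_hat, Xi_hat

# lam = 0 (or K = O) recovers the LS estimators in (3).
mu0, Xi0 = penalized_estimators(Y, A, X, K, 0.0)
mu_ls = (np.ones(n) @ Y / n) @ X.T @ np.linalg.inv(X @ X.T)
```

The check at the end illustrates the statement above: setting λ = 0 returns exactly the unpenalized LS fit.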

If multicollinearity occurs in A, the estimator in (5) has a large variance. Hence, we consider the following ridge-type estimator:

Ξ̂_{λ,δ} = (A'A + δ I_k)^{-1} A' Y X' (X X' + λ K)^{-1},  (6)

where δ (≥ 0) is a ridge parameter. This estimator with K = I_q corresponds to the estimator of [16]. If δ = 0, then this estimator corresponds to the estimator in (5). Note that Ξ̂_{λ,δ} corresponds to the ridge estimator for a multivariate linear model [17]. In this estimator, we need to optimize λ and δ because the estimator changes with these parameters. However, we cannot obtain the optimized λ and δ in closed form. Thus, we need to use an iterative computational algorithm to optimize the two parameters. From another point of view, this estimator controls the smoothness of each estimated curve through only one parameter λ. Hence, this estimator does not fit well when the smoothnesses of the true curves differ.
Hence, we apply the concept of the MGR estimator [18] to Ξ̂_{λ,δ} in order to obtain the optimized ridge parameters in closed form. We derive the MGR estimator for the nonparametric GMANOVA model as follows:

Ξ̂_{λ,θ} = (A'A + Q Θ Q')^{-1} A' Y X' (X X' + λ K)^{-1},  (7)

where A'A = Q D Q' is the eigenvalue decomposition of A'A with an orthogonal matrix Q and D = diag(d_1, …, d_k), Θ = diag(θ_1, …, θ_k), and θ = (θ_1, …, θ_k)' with θ_i ≥ 0. In this estimator, θ shrinks the estimators of the coefficients. If X = I_p, this estimator corresponds to the MGR estimator in [18].
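A sketch of the MGR-type computation, assuming the form in which A'A is replaced by A'A + QΘQ' with A'A = Q diag(d) Q'; names and dimensions are illustrative. Setting all θ_i to a common value δ recovers the single-parameter ridge estimator (6), which the usage check below confirms.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, q = 30, 8, 3, 4
t = np.linspace(0.0, 1.0, p)
X = np.vstack([t**j for j in range(q)])
A = rng.standard_normal((n, k))
A -= A.mean(axis=0)
Y = rng.standard_normal((n, p))
D2 = np.diff(np.eye(q), n=2, axis=0)
K = D2.T @ D2                                      # nonnegative definite penalty

def mgr_estimator(Y, A, X, K, lam, theta):
    """MGR-type estimator of the form (7): A'A is replaced by
    A'A + Q diag(theta) Q', where A'A = Q diag(d) Q' (eigendecomposition)."""
    d, Q = np.linalg.eigh(A.T @ A)
    shrink = np.linalg.inv(A.T @ A + Q @ np.diag(theta) @ Q.T)
    G = np.linalg.inv(X @ X.T + lam * K)
    return shrink @ A.T @ Y @ X.T @ G

# theta = 0_k recovers the estimator (5); a common theta = delta*1_k
# recovers the single-parameter ridge estimator (6).
Xi_a = mgr_estimator(Y, A, X, K, 0.5, np.zeros(k))
Xi_b = mgr_estimator(Y, A, X, K, 0.5, np.full(k, 2.0))
```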

Target MSE
In order to define the MSE of the predicted value of Y, we prepare the following discrepancy function for measuring the distance between n × p matrices F_1 and F_2:

r(F_1, F_2) = tr{(F_1 − F_2) Σ^{-1} (F_1 − F_2)'}.  (8)

Since Σ is an unknown covariance matrix, we use the unbiased estimator S in (2) instead of Σ, which gives the following sample discrepancy function:

r_S(F_1, F_2) = tr{(F_1 − F_2) S^{-1} (F_1 − F_2)'}.

Using the discrepancy function r, the MSE of the predicted value of Y is defined as

MSE = E[r(Ŷ_{λ,θ}, E[Y])],  (9)

where Ŷ_{λ,θ} = 1_n μ̂_λ' X + A Ξ̂_{λ,θ} X is the predicted value of Y when we use μ̂_λ and Ξ̂_{λ,θ} in (7). In the present paper, we regard the θ and λ making the MSE smallest as the principal optimum. However, we cannot use the MSE in (9) in actual applications because this MSE includes unknown parameters. Hence, we must estimate (9) in order to estimate the optimal θ and λ.
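A direct implementation of the discrepancy, assuming it takes the weighted trace form tr{(F1 − F2) Σ^{-1} (F1 − F2)'}; the function name is illustrative.

```python
import numpy as np

def r_disc(F1, F2, Sigma_inv):
    """Weighted squared distance between n x p matrices:
    tr{(F1 - F2) Sigma^{-1} (F1 - F2)'} (an assumed form of (8))."""
    Dm = F1 - F2
    return float(np.trace(Dm @ Sigma_inv @ Dm.T))

# With Sigma = I_p, r reduces to the squared Frobenius norm.
F1 = np.array([[1.0, 2.0], [3.0, 4.0]])
F2 = np.zeros((2, 2))
val = r_disc(F1, F2, np.eye(2))   # 1 + 4 + 9 + 16 = 30
```

Replacing `Sigma_inv` with `S^{-1}` gives the sample version r_S used by the criteria below.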

From the properties of the function r, and using the facts that E[Y], A, X, K, and Σ are non-stochastic, the MSE can be expanded into expectation terms. For calculating the expectations in the MSE, we prove the following lemma.
Lemma 3.1. For any p × p non-stochastic matrix J, the expectation E[S^{-1} J S^{-1}] can be expressed as a linear combination of Σ^{-1} J Σ^{-1} and tr(J Σ^{-1}) Σ^{-1}, with coefficients depending only on n, k, and p. The proof follows by direct calculation from the moments of the Wishart distribution. Thus, the lemma is proven.
Using this lemma, we can propose an intuitive estimator of the MSE, referred to as the Cp criterion, which replaces the unknown quantities in (9) with their sample counterparts; we denote this criterion by (10). When we use this criterion, we optimize the ridge parameters θ and the smoothing parameter λ by the following algorithm: for each candidate λ, we obtain the θ minimizing the criterion, and we then optimize λ by minimizing the criterion evaluated at that θ. Note that this Cp criterion corresponds to that in [18] when X = I_p. There is some bias between the MSE in (9) and the Cp criterion in (10) because S^{-1} is a biased estimator of Σ^{-1}; indeed, E[S^{-1}] = {(n − k − 1)/(n − k − p − 2)} Σ^{-1} (see, e.g., [14]). Correcting this bias, we obtain an unbiased estimator of the MSE, which we call the modified Cp (MCp) criterion and denote by (11). As in the case of the Cp criterion, we optimize θ and λ using the MCp criterion by the same algorithm. Since the optimal θ under a fixed λ is obtained in closed form, we do not need an iterative computational algorithm for θ.
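The bias fact E[S^{-1}] = {(n − k − 1)/(n − k − p − 2)} Σ^{-1}, which motivates the MCp correction, can be checked by Monte Carlo simulation. The sketch below draws (n − k − 1)S directly as a Wishart matrix with Σ = I_p; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, p = 40, 2, 3
m = n - k - 1                      # degrees of freedom: (n-k-1)S ~ W_p(m, Sigma)
Sigma = np.eye(p)

# Monte Carlo estimate of E[S^{-1}].
reps = 20000
acc = np.zeros((p, p))
for _ in range(reps):
    Z = rng.standard_normal((m, p))   # rows ~ N_p(0, I_p)
    S = Z.T @ Z / m                   # unbiased estimator of Sigma
    acc += np.linalg.inv(S)
emp = acc / reps

# Theoretical value: {(n-k-1)/(n-k-p-2)} Sigma^{-1}.
theo = (n - k - 1) / (n - k - p - 2) * np.linalg.inv(Sigma)
```

The empirical average `emp` should be close to `theo`, whose diagonal exceeds 1: this is exactly the upward bias of S^{-1} that the Cp criterion inherits.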

Optimizations Using the Cp and MCp Criteria
Note that the terms with respect to θ in the Cp and MCp criteria take a common form, so the optimal θ under a fixed λ is obtained in closed form as (12) and (13), where u_ii and v_ii are the i-th diagonal elements of the matrices U and V defined in the Appendix. We then optimize the ridge parameters and the smoothing parameter as (θ̂^{(C)}, λ̂^{(C)}) and (θ̂^{(M)}, λ̂^{(M)}), respectively, by using the closed forms in (12) and (13).
This means that we can reduce the processing time to optimize the parameters, and we need to use an optimization algorithm for only one parameter, λ, for any k.
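The one-dimensional search described above can be sketched as follows; `theta_closed_form` is a hypothetical stand-in for the closed forms (12) and (13), which are not reproduced here, and `criterion` stands in for Cp or MCp.

```python
import numpy as np

def optimize_lambda(criterion, theta_closed_form, lam_grid):
    """Grid search over the single smoothing parameter lambda.
    For each candidate lambda, theta is filled in by its closed form,
    so no iterative search over theta is needed."""
    best_val, best_lam, best_theta = np.inf, None, None
    for lam in lam_grid:
        theta = theta_closed_form(lam)
        val = criterion(lam, theta)
        if val < best_val:
            best_val, best_lam, best_theta = val, lam, theta
    return best_lam, best_theta

# Toy check with a criterion minimized at lambda = 0.3.
lam_hat, theta_hat = optimize_lambda(
    lambda lam, th: (lam - 0.3) ** 2 + float(np.sum(th ** 2)),
    lambda lam: np.zeros(3),
    np.linspace(0.0, 1.0, 11),
)
```

Any one-dimensional minimizer (golden-section search, etc.) can replace the grid; the point is that θ never enters the numerical search.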

Magnitude Relationships between Optimized Ridge Parameters
In this subsection, we prove the magnitude relationships between the optimized ridge parameters.

Since we assume that K is a nonnegative definite matrix, there exists a matrix L that satisfies K = L'L (see, e.g., [3]). Then, λK is a nonnegative definite matrix for λ ≥ 0, which means that all of the relevant eigenvalues are nonnegative. Through simple calculation, we also obtain c_M > 0 for the constant arising in the MCp criterion. Using these relationships, we obtain the following theorem.
Theorem 3.1. For any λ̂ ≥ 0, the optimized ridge parameters satisfy θ̂_i^{(C)} ≤ θ̂_i^{(M)} for each i.

Proof. We consider two situations according to whether the closed forms in (12) and (13) are truncated at their boundary. In the first situation, the inequality holds trivially; hence, we only consider the second situation. Since S is a positive definite matrix, u_ii > 0 for any λ̂ ≥ 0. From Corollary 3.1, the required inequality follows. Thus, this theorem is proven.
This theorem corresponds to that in [9] when X = I_p, for each optimized smoothing parameter.

Theorem 3.2. We consider several situations according to the signs of the quantities in the closed forms (12) and (13). 1) For any λ̂^{(C)} ≥ 0 and λ̂^{(M)} ≥ 0, we obtain the relationships (i)-(iv) based on these situations. 2) In situations (2) and (3), relationship (ii) holds. 3) In situations (2) and (4), relationship (iii) holds.

Proof. In situations (1) and (5), the relationships (i) and (iv) are true. Hence, we need only prove relationships (ii) and (iii). Using the closed forms of (12) and (13) and through simple calculation, since d_i > 0 and the denominator is positive, the sign of the difference of the optimized ridge parameters is determined. Hence, we obtain relationships (ii) and (iii). Thus, the theorem is proven.

Numerical Studies
In this section, we compare the LS estimators μ̂ and Ξ̂ in (3) with the proposed estimators. The explanatory matrix A is constructed from a matrix N, each row vector of which is generated independently from the k-dimensional normal distribution with mean 0_k and covariance matrix I_k. Let m_i (i = 1, …, 12) be p-dimensional vectors defining the true longitudinal trends, where h(t; z_1, z_2, z_3) is Richards' growth curve model [12].
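The true trends are built from Richards' growth curves. The parameterization below is one common variant, assumed here for illustration only; the paper's exact h(t; z_1, z_2, z_3) is not reproduced in this extraction.

```python
import numpy as np

def richards_curve(t, asym, b, growth, m):
    """One common parameterization of Richards' growth curve:
    a flexible sigmoid approaching `asym` as t grows (illustrative only)."""
    return asym / (1.0 + b * np.exp(-growth * t)) ** (1.0 / m)

# A p-dimensional trend vector evaluated at p time points.
t = np.linspace(0.0, 10.0, 8)
trend = richards_curve(t, asym=2.0, b=1.0, growth=1.0, m=1.0)
```

Varying the shape parameter `m` changes where the inflection point sits, which is why this family can represent trends of differing smoothness.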
We set the longitudinal trends using these m_i, and then standardized them. We set each element of X using cubic B-spline basis functions; since X is set using the cubic B-spline, q is determined by p. Additional details concerning K and X are reported in [2]. We simulate 10,000 repetitions for each n, p, k, ρ_a, and ρ_y. In each repetition, we fix A, but Y varies. We search for λ̂^{(C)} and λ̂^{(M)} using the closed forms of (12) and (13) in each repetition. In each repetition, we also need to optimize q because X and K vary with q. We calculate each criterion in each repetition, and then adopt the optimized q by minimizing each criterion.
After that, we calculate the MSE for each estimator, which is obtained using the optimized λ, θ, and q. As in the case of using Ξ̂_{λ,θ}, we adopt q by using each criterion in each repetition for Ξ̂ and Ξ̂_λ. Some of the results are shown in Tables 1 and 2. The values in the tables are obtained from the simulated MSE of each estimator. Each estimator optimized by using the MCp criterion for λ, θ, and q improves on that optimized by using the Cp criterion in almost all situations. This indicates that the MCp criterion is a better estimator of the MSE of each predicted value of Y than the Cp criterion. The reason for this is that the MCp criterion is an unbiased estimator of the MSE.

Conclusions
In the present paper, we estimated the longitudinal trends nonparametrically by using the nonparametric GMANOVA model in (1), which is defined using basis functions as X in the GMANOVA model. When we use basis functions as X, the LS estimators μ̂ and Ξ̂ incur overfitting. In order to avoid this problem, we proposed μ̂_λ and Ξ̂_λ in (5) using the smoothing parameter λ (≥ 0) and the known q × q nonnegative definite penalty matrix K. However, if multicollinearity occurs in A, Ξ̂ and Ξ̂_λ are not good estimators due to their large variance. In the present paper, we also proposed Ξ̂_{λ,θ} in (7) in order to avoid both the multicollinearity problem that occurs in A and the overfitting problem caused by using basis functions as X. The estimator Ξ̂_λ controls the smoothness of each estimated longitudinal curve using only one parameter λ. On the other hand, in the estimator Ξ̂_{λ,θ}, the rough smoothness of the estimated longitudinal curves is controlled by λ, and the smoothness of each coefficient curve in the varying coefficient model (4) is controlled by θ.
We also proposed the Cp and MCp criteria in (10) and (11) for optimizing the ridge parameters θ and the smoothing parameter λ. Then, using the GCp criterion in (14) and minimizing this criterion as in Theorem A, we obtained the θ optimized by the Cp and MCp criteria in closed form as (12) and (13). An iterative algorithm for θ is not needed because the optimized θ obtained using each criterion is always given by (12) and (13) for any k. Furthermore, we must optimize q if we use the model (1) to estimate the longitudinal trends. This means that we optimize the parameters in the estimators and calculate the value of the criterion for each q, and then compare these values in order to optimize q. Since this optimization requires an iterative computational algorithm, we must reduce the processing time for estimating the parameters in the estimator. Hence, the advantage of using Ξ̂_{λ,θ} is very important. The optimized ridge parameters in (12) and (13) correspond to those in [18] when X = I_p.

Acknowledgments
I would like to express my deepest gratitude to Dr. Hirokazu Yanagihara of Hiroshima University for his valuable ideas and useful discussions. In addition, I would like to thank Prof. Yasunori Fujikoshi, Prof. Hirofumi Wakaki, Dr. Kenichi Satoh and Dr. Kengo Kato of Hiroshima University for their useful suggestions and comments. Finally, I would like to thank Dr. Tomoyuki Akita of Hiroshima University for his advice with regard to programming.

Minimization of the GCp Criterion
In this appendix, we show that the optimizations using the Cp and MCp criteria in (10) and (11) are obtained in closed form as (12) and (13) for any λ (≥ 0). [9] proposed the generalized Cp (GCp) criterion for the MGR regression (originally, the GCp criterion for selecting variables in univariate regression was proposed by [1]). Following their idea, we propose the GCp criterion for the nonparametric GMANOVA model.
By omitting constant terms and some terms with respect to λ in the Cp and MCp criteria, both criteria can be written in the form of the GCp criterion, where the function r is given by (8). The minimizing θ is then obtained in closed form, as shown in the following theorem.

Theorem A. For any i and λ (≥ 0), the minimizing θ_i is given by (15).

Proof. The terms depending on θ_i can be isolated, where u_ij and v_ij are the (i, j)-th elements of U and V, respectively. Clearly, u_ij and v_ij also vary with λ. Note that S is a positive definite matrix for any λ ≥ 0 (see, e.g., [3]).

 
Let the part of the GCp criterion that depends on θ_i be written as a function of w. If we restrict w to be greater than or equal to 0, then this function is equivalent to the function in (17).

We optimize the ridge parameters and the smoothing parameter by using the GCp criterion, which contains the MCp criterion in (11) and the Cp criterion in (10) through constant terms such as c_M. Hence, we consider obtaining the optimal θ by minimizing the GCp criterion. From Theorem A, the optimal θ is obtained in closed form as (15). Substituting the closed form in (15), which is a function of λ, we can regard the Cp and MCp criteria for optimizing θ and λ in (10) and (11) as functions of λ alone. This means that we can use these criteria to optimize λ. Then, we can rewrite the optimization algorithms to optimize the ridge parameters θ and the smoothing parameter λ by minimizing each criterion with respect to λ only.

Hence, we obtain that the relevant matrix is nonnegative definite for any λ ≥ 0. Thus, the lemma is proven. Using the same idea, we obtain the corresponding result for any λ̂ ≥ 0.

From Theorem 3.1, we obtained the relationships between the optimized ridge parameters.

The last six vectors m_7, …, m_12 are obtained by changing the scale of h(t). The response matrix Y is generated from the model (1). A search program in the software Matlab is used to find the minimum value of each criterion; the starting point for the search is set as λ = 0. Then, we obtain the optimized ridge parameters.

We compare the proposed estimators with those using the LS estimators μ̂ and Ξ̂, and the estimators μ̂_λ and Ξ̂_λ in (5). When we use Ξ̂_λ, we obtain λ by minimizing each criterion. In Theorem 3.2, we also established the relationships between the optimized ridge parameters.

The MCp criterion can be used to optimize the smoothing parameter λ and the number of basis functions q. Hence, we can use μ̂_λ and Ξ̂_{λ,θ}, the parameters θ, λ, and q of which are optimized by the MCp criterion, for estimating the longitudinal trends.

For the Cp and MCp criteria in (10) and (11), these criteria are included in a class of criteria specified by a constant α (≥ 0). This class is expressed by the GCp criterion. To minimize it, we calculate the second and third terms on the right-hand side of the criterion and consider the resulting function of w; the function in (17) has a minimum value at ŵ.

Table 2. MSE when q is selected using each criterion for each method in each repetition. The method optimized by the MCp criterion is the best in almost all cases.