Hierarchical Penalized Mixed Model

Penalized spline has been a popular method for estimating an unknown function in the non-parametric regression due to their use of low-rank spline bases, which make computations tractable. However its performance is poor when estimating functions that are rapidly varying in some regions and are smooth in other regions. This is contributed by the use of a global smoothing parameter that provides a constant amount of smoothing across the function. In order to make this spline spatially adaptive we have introduced hierarchical penalized splines which are obtained by modelling the global smoothing parameter as another spline.


Introduction
Non parametric smoothing involves letting the data determine the amount of smoothing. Classical smoothing splines use a global smoothing parameter in order to control the amount of smoothing in a function. When homogeneity of the smoothness cannot be reasonably assumed across the whole domain of the function, a natural extension is to allow the smoothing parameter to vary over the domain as a penalty function of independent variable, adapting to the change of roughness [1] [2]. Adaptive smoothing has been an interesting topic in statistics and it involves allowing the smoothing parameter, the bandwidth or the placement of knots to vary across the domain, adapting to the change of roughness [3]- [10]. In penalized regression splines, [11] modeled the penalty function by a linear interpolation on the logarithmic scale, [12] modeled the penalty function from full Bayesian approach and used Markov chain Monte Carlo for computa-tion, [13] developed a fast and simple algorithm for the Bayesian p-spline based on Laplace approximation for the marginal likelihood. Modeling the smoothing parameter as a penalty function of independent variable can also be used to achieve adaptiveness. This involves formulating the adaptive smoothing as a minimization problem with a new penalty function in which the estimate has the same form as the smoothing spline and method developed for classical smoothing splines can be used. [1] derived the reproducing kernels for a generic penalty function and suggested modeling it by B-splines. [2] studied the solution of the penalized least square estimate in which the Smoothing parameter is a varying function across the domain under the Reproducing Kernel Hilbert Space approach. [14] proposed to model the penalty function by a step function where the segmentation is data driven and estimate it by maximizing the generalized likelihood. A complexity penalty was added to the generalized likelihood in selecting the best step function from a collection of candidate. This approach was very computational expensive due to the large number of candidate models and proposed search algorithm and thus has a serious limitation. In this research we aim at developing a Hierarchical penalty model using p-splines which will result in more adaptive smoothing.

Penalized Splines
P-splines are low-order basis spline with a penalty to avoid under smoothing.
They are typically not spatially adaptive and hence have trouble when functions are varying rapidly. Regression splines are approximations to functions typically using low-order number of basis function. These splines are subject to lack of smoothness and various strategies have been proposed to attain this smoothness. e.g Regression P-splines [15] achieves smoothness by penalizing the sum of squares or likelihood by a single penalty parameter. The penalty parameter and the fit using P-splines are easy to compute using mixed model technology; [16] [17] [18] and are not sensitive to knot parameter selection [11]. A penalized spline can be seen as a compromise between smoothing and regression spline and it combines the attractive attributes of regression and smoothing splines.
They are basically regression spline in which the penalty is applied directly to the coefficients of the piecewise polynomial. Hence one can retain a large number of knots and constrain their effect using a penalty to avoid over fitting. The number of knots defining the spline function is larger than that justified by the data but smaller than the number of observations. Thus they are referred to as low-rank smoothers and this significantly reduces numerical effort. The level of over fitting is controlled by a roughness penalty over the curve. The most common choice is a penalty based on the integral of a squared derivative of a spline curve.
To avoid the drawbacks in regression spline and optimize the fit we can choose a large number of knots e.g. min (n/4, 40) as suggested by [11] and prevent over fitting by penalizing the coefficients of splines. That is, one finds subject to 2 d a < for non negative constant a. Where Y is the response variable, β and d are the fixed and random effects vectors, X and S are the design matrices associated with the fixed and random effects vectors. Using a Lagrange multiplier, this minimization can be written as The resulting estimate is given by The smoothness of this estimate varies continuously as a function of a global smoothing parameter ω. The larger the value of ω the more the fit shrinks towards polynomial fit while small values of ω result in an over fitted estimate.
Penalized spline can be seen as a generalization of the spline smoothing with more flexible choice of bases, penalties and knots. One chooses the spline basis based on sufficiently large number of knot and penalizes unnecessary structure.
This spline possesses a number of good properties: It shows no boundary effect as many kernels smoother do. i.e. the spreading of a fitted curve as density outside of the domain of the data generally accompanied by bending towards zero, it is a straight forward extension of (generalized) linear regression models, conserve moments (means, variances) of the data i.e. Given a linear p spline with degree q + 1 and a penalty of order q + 1 or higher This property is very useful in density smoothing where mean and variance of the estimated density are the same as mean and the variance of the data for any amount of smoothing. It also has polynomial curve fit as its limits. That is, for a penalty of order q and large values of the smoothing parameter ω, the fitted function will approach a polynomial of degree 1 q − , if the degree of the p-spline is equal or higher than q. Also the computations, including those of cross validation are relatively cheap and can easily be incorporated into standard software [15]. fects, d is an unknown vector of random effects or subject specific, with mean zero and variance W, X and S are design matrices associated with a vector of fixed effects β and a vector of random effects d respectively and ε is a vector of residual error term with zero mean and covariance matrix P. The dimensions of the design matrices X and S must conform to the lengths of the observation vector Y and the number of fixed and random effects respectively. It is generally assumed that the elements of d are uncorrelated with the elements of ε in which case the covariance matrix of the random effects and residual error term is a block diagonal

Mixed Models
The matrices S and W will themselves be block diagonal if the data arise from A mixed model is of the form,

Y X Sd
Assuming that d and ε are multivariate normal; The log likelihood becomes, which although not obvious algebraically must also satisfy the mixed model equation given earlier. Since one of the ways in which these equations can be derived is directly from multivariate normality assumption. Typically W and P will not be known and can be estimated by substituting the expression for β back into ( ) ; ; l W P β and maximizing the result over the parameters defining W and P. Once estimates for W and P have been determined, we can return to the mixed model equations and determined the BLUP d of random effects vector d as the vector that minimizes the expected mean squared error of prediction.
The BLUP of d can be expressed as the posterior expectation of the random effects given the data The knots are placed over the range of x's and the dimension of r is chosen generously. In penalized spline the approach is to put a penalty on the coefficient of k ρ . The standard approach is to minimize sum of squares and the quadratic penalty T D ωρ ρ , where ω is the penalty parameter and D is the penalty square matrix. In truncated polynomial D is an identity matrix and the penalty is T ωρ ρ . In B spline basis the penalty is constructed using the difference between neighboring spline coefficients [15]. An important feature of penalized spline is its links to linear mixed model. Due to this link we assume where 1 2 , , , l α α α  is a second layer of knots covering the range of 1 2 , , , r τ τ τ  . l is practically less than r. The hierarchical penalized smoothing model is completed by the shrinkage assumption