Estimation of Regression Model Using a Two Stage Nonparametric Approach

Based on empirical or theoretical qualitative information about the relationship between the response variable and the covariates, we propose a new approach to polynomial regression modeling that applies shape restricted regression after estimating the direction by sufficient dimension reduction. The purpose of this paper is to illustrate that, in the absence of prior information other than the shape constraints, our approach provides a flexible fit to the data and improves regression predictions. We use the central subspace to estimate the directions and fit a final model by shape restricted regression when the shape is known or is stipulated from empirical inspection. Comparisons with an alternative nonparametric regression are included. Simulated and real data analyses are conducted to illustrate the performance of our approach.


Introduction
Even though assumptions of monotonicity, convexity or concavity are common, shape restricted regression has not been extensively applied in real applications, for two main reasons: as the number of observations n, the data dimensionality d, and the number of constraints m increase, computational and statistical difficulties (i.e., overfitting) are encountered; see [1,2] for a detailed discussion. These and other authors have proposed different methods to overcome the computational difficulties, but there is no optimal solution.
To tackle these limitations, we estimate the direction by sufficient dimension reduction and fit a final model by shape restricted regression, based on the theoretical shape or the shape stipulated from empirical results. The recent literature on sufficient dimension reduction has proposed practical methods that provide adequate information about a regression with many predictors. Reference [3] considered a general method for estimating the direction in regressions that can be described fully by linear combinations of the predictors, without assuming a model for the conditional distribution of Y | X, where Y and X are the response and explanatory variables, respectively. They also introduced a method to estimate the direction in a single-index regression, and [4] extended it to multiple-index regression by successive direction extraction.
More specifically, the main goal of this research is to show that polynomial regression modeling by the Central Subspace (CS) and Shape Restriction (SR) methods works well in practice, especially if the scatter plot shows a pattern. Curve fitting is the task of finding a curve that matches a series of data points and possibly other constraints. This approach is commonly used by scientists and engineers to visualize and plot the curve that best describes the shape and behavior of their data. When more than two dimensions are involved, we no longer have the luxury of graphical representation, but we may have theoretical information about the relationship between the response variable and the predictors. Shape restricted regression is a nonparametric approach for building models whose fits are monotone, convex or concave in their covariates. These assumptions are commonly applied in biology [5], ranking [6], medicine [7], statistics [8] and psychology [9].
In general, one fits a straight line when the relationship between the response variable and the linear combination of the predictors is linear. Otherwise, one applies polynomial, logarithmic or exponential regression to fit the data. These regressions are practical methodologies when the mean function of the predictors is smooth. It is well known that the estimation approaches from regression theory are useful in building linear or nonlinear relationships between the values of the predictors and the corresponding conditional mean of the response variable. See [10] for a detailed exposition of widely studied regression methods, particularly polynomial regression. However, such a straightforward and efficient analysis is generally not possible with many predictors. In many situations, when the underlying regression function or scatter plot has a particular shape or form, the fitted model can be characterized by certain order or shape restrictions. In this case, shape restricted classes of regression functions are preferred. This nonparametric regression method provides a flexible fit to the data and improves regression predictions.
In addition, when the empirical relationship between the response and the predictors appears to have a particular shape that obeys certain order or shape restrictions, shape restricted regression functions may best explain the relationship. Taking shape restrictions into account, one can reduce the model root mean square error or increase the power of the test. This improves the efficiency of a statistical analysis, provided that the hypothesized shape restriction actually holds [11].
To put the goal of this article in context, it is necessary to review the concepts of CS and SR. In Section 2, we summarize the notion of the CS and an estimation method for the CS when the dimension d is assumed to be known. We also suggest a data-dependent approach to detect an unknown dimension. In Section 3, we review shape restricted regression and the constraint cone over which we minimize the sum of squared errors in the one-dimensional case. We apply our new approach to simulated and real data in Section 4. Section 5 contains a few comments and concluding remarks.

Estimation Method by Central Subspace
Let $Y$ be a scalar response variable and $X$ be a $p \times 1$ covariate vector. Suppose the goal is to make an inference about how the conditional distribution $Y \mid X$ varies with the values of $X$. Then the sufficient dimension reduction method seeks $d \le p$ linear combinations $B^T X$, where $B$ is a $p \times d$ matrix, such that
$$Y \perp\!\!\!\perp X \mid B^T X. \qquad (1)$$
This indicates a useful reduction in the dimension of $X$, where all the information in $X$ about $Y$ is contained in the $d$ linear combinations $B^T X$. A dimension reduction subspace (DRS) is any subspace spanned by the columns of a matrix $B$ for which (1) holds. Here, (1) holds trivially for $B = I_p$, so a DRS always exists. Hence, if the intersection of all DRSs is itself a DRS, the Central Subspace (CS) is defined as the intersection of all DRSs, which is written as $S_{Y|X}$. That is, the CS is the minimum DRS that preserves the original information relating to the data.
In this article, we use a method for estimating the CS that is based on an information index $\mathcal{I}(h)$ defined for $p \times d$ matrices $h$. The two forms in (2) are the informational correlation and the expected conditional log-likelihood, respectively. The idea behind this setting is that maximizing the information index $\mathcal{I}(h)$ over all $p \times d$ matrices $h$ is equivalent to maximizing the expected conditional log-likelihood. This information index is similar to the Kullback-Leibler information between the joint density of $(Y, h^T X)$ and the product of the marginal densities, quantifying the dependence of $Y$ on $h^T X$. The important properties of the above information index imply that there would be no loss of information from the predictors if $X$ were replaced by the $d$ linear combinations. This is equivalent to finding a matrix $h$ that maximizes $\mathcal{I}(h)$.
The computation proceeds by maximizing a sample version of $\mathcal{I}(h)$ to estimate a basis for the CS. If all the densities were known, a sample version $\mathcal{I}_n(h)$ would be maximized over all $p \times d$ matrices $h$. Because the densities in $\mathcal{I}_n(h)$ are unknown in practice, we estimate the one-dimensional and multi-dimensional densities nonparametrically. For the choice of kernels and the selection of bandwidths, we follow the general guideline proposed by [13]. Since the Gaussian kernel performed well for the simulated and real data sets, we use density estimates based on a Gaussian kernel for the one-dimensional densities and a product of Gaussian kernels for the multi-dimensional densities. Let $K$ be the univariate Gaussian kernel, $v = (v_1, \dots, v_k)^T$ be the vector at which the density is evaluated, and $v_i = (v_{i1}, \dots, v_{ik})^T$ be the $i$th observation. Then the $k$-dimensional density estimate has the following form:
$$\hat f(v) = \frac{1}{n \, b_1 \cdots b_k} \sum_{i=1}^{n} \prod_{j=1}^{k} K\!\left(\frac{v_j - v_{ij}}{b_j}\right), \qquad (3)$$
where $b_j$ is the optimal bandwidth for the $j$th coordinate in the sense of minimizing the mean integrated squared error, from [13]. The densities in $\mathcal{I}_n(h)$ are replaced by the estimates defined in (3), and the resulting criterion (4) is maximized over all $p \times d$ matrices $h$ subject to the normalizing constraint on $h$. The maximization is carried out with the sequential quadratic programming procedure of [14], which incorporates this constraint.
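As a rough illustration of the density estimation step, the sketch below evaluates a product-Gaussian-kernel density estimate in plain Python. The function names are hypothetical, and the `normal_reference_bandwidth` helper is an assumption made for this example (a common normal-reference rule for Gaussian kernels); the bandwidth actually taken from [13] may differ.

```python
import math


def gaussian_kernel(u):
    """Standard univariate Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_reference_bandwidth(sample_sd, n, k):
    """Assumed normal-reference bandwidth for one coordinate of a
    k-dimensional Gaussian product kernel (not necessarily the rule in [13])."""
    return (4.0 / (k + 2.0)) ** (1.0 / (k + 4.0)) * sample_sd * n ** (-1.0 / (k + 4.0))


def product_kernel_density(point, observations, bandwidths):
    """Product-Gaussian-kernel density estimate at `point`.

    point        : list of k coordinates
    observations : list of n observations, each a list of k coordinates
    bandwidths   : list of k positive bandwidths, one per coordinate
    """
    n = len(observations)
    total = 0.0
    for obs in observations:
        prod = 1.0
        for v, v_i, b in zip(point, obs, bandwidths):
            prod *= gaussian_kernel((v - v_i) / b) / b
        total += prod
    return total / n
```

With a single coordinate this reduces to the usual univariate kernel density estimate; in the estimation procedure the coordinates would be the response and the estimated linear combinations of the predictors.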
Since prior information about d may not be available in practice, it is useful to have a simple way to determine d from the data. Several sufficient dimension reduction methods have been proposed for determining the minimal dimension of the CS; see [15-17] for details. In this paper, the dimension is determined using the AIC-type estimating function (5).

Fitting Model with Shape Restricted Regression
In this section, we review some fundamental concepts that lay the groundwork for the construction of the shape restricted method. More details about the properties of the constraint cone and polar cones can be found in [11,18-21].

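Suppose we have the following model:
$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
where the restriction on $f$ can be monotone, convex or concave, based on the qualitative information about the relationship between the response variable and the predictors or on empirical results.
For simplicity, consider the one-dimensional case $q = 1$ and let $\theta_i = f(x_i)$; see [20,22] for a detailed discussion. The constraint cone over which we minimize the sum of squared errors is constructed as follows. Assuming the $x_i$ are sorted in increasing order, the monotone nondecreasing constraints can be written as $\theta_1 \le \theta_2 \le \cdots \le \theta_n$. The restriction of $f$ to the set of convex functions is accomplished by the inequalities
$$\frac{\theta_2 - \theta_1}{x_2 - x_1} \le \frac{\theta_3 - \theta_2}{x_3 - x_2} \le \cdots \le \frac{\theta_n - \theta_{n-1}}{x_n - x_{n-1}}.$$
In our case, $x_i$ is a realized value of the linear combination of the predictors, $\hat\beta^T X$, where $\hat\beta$ is estimated using CS. Any of these sets of inequalities defines $m$ half spaces in $R^n$, and their intersection forms a closed polyhedral convex cone in $R^n$. The cone is designated by $C = \{\theta : A\theta \ge 0\}$ for an $m \times n$ constraint matrix $A$. If the $x$ coordinates are equally spaced, the nondecreasing concave and convex constraints are given by corresponding constraint matrices $A$.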
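For equally spaced x coordinates, constraint matrices of this kind can be built from first and second differences. The following sketch is an assumed construction with hypothetical helper names, not the authors' R code: a vector theta lies in the cone when every entry of A theta is nonnegative, and the concave case is obtained by negating the rows of the convex matrix.

```python
def monotone_constraint_matrix(n):
    """(n-1) x n first-difference matrix A.

    A theta >= 0 holds exactly when theta is nondecreasing.
    """
    rows = []
    for i in range(n - 1):
        row = [0] * n
        row[i], row[i + 1] = -1, 1
        rows.append(row)
    return rows


def convex_constraint_matrix(n):
    """(n-2) x n second-difference matrix A.

    For equally spaced x, A theta >= 0 holds exactly when theta is convex.
    """
    rows = []
    for i in range(n - 2):
        row = [0] * n
        row[i], row[i + 1], row[i + 2] = 1, -2, 1
        rows.append(row)
    return rows


def satisfies(A, theta, tol=1e-12):
    """Check whether theta lies in the cone {theta : A theta >= 0}."""
    return all(sum(a * t for a, t in zip(row, theta)) >= -tol for row in A)
```

Combining the rows of both matrices gives the cone of vectors that are simultaneously nondecreasing and convex.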
Copyright © 2013 SciRes. AM
Some computational details: The ordinary least-squares regression estimator is the projection of the data vector $y$ onto a lower-dimensional linear subspace of $R^n$. In contrast, the shape restricted estimator can be obtained through the projection of $y$ onto an $m$-dimensional polyhedral convex cone in $R^n$ [23]. We have the following useful proposition, which shows the existence and uniqueness of the projection of the vector $y$ onto a closed convex set (see [11]).
Proposition 1 Let $C$ be a closed convex subset of $R^n$. Then for any $y \in R^n$ there exists a unique point $\hat\theta \in C$ that minimizes $\|y - \theta\|^2$ over $\theta \in C$.
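To make the projection idea concrete for the simplest cone, the sketch below computes the least-squares projection of a data vector onto the set of nondecreasing vectors. It uses the pool-adjacent-violators algorithm, which is a standard method for this special case and is not the algorithm used by the authors; it assumes the observations are already sorted by the x values.

```python
def isotonic_regression(y):
    """Least-squares projection of y onto {theta_1 <= ... <= theta_n}.

    Pool-adjacent-violators: keep a stack of blocks (sum, count) and merge
    the last two blocks whenever their means violate the ordering.
    """
    blocks = []
    for value in y:
        blocks.append([value, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # each block is fitted by its mean
    return fit
```

For example, the vector (1, 3, 2, 4) is projected to (1, 2.5, 2.5, 4): the violating pair is replaced by its average. Convex or concave cones need a more general solver such as the algorithms cited in this section.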
The edge vectors of the constraint cone for the monotone, nondecreasing concave, and nondecreasing convex cases can be obtained from the constraint matrix by a formula, or they can be written directly in an explicit form. For a large data set, it is better to use the explicit vectors, because the previous method of obtaining the edges is computationally intensive. Another advantage is that the computations of the inner products with the second approach are faster because of all the zero entries in the vectors.
The polar cone of the constraint cone $C$ is ([19], p. 121)
$$C^0 = \{\rho \in R^n : \langle \rho, \theta \rangle \le 0 \ \text{for all} \ \theta \in C\}.$$
Geometrically, the polar cone is the set of points in $R^n$ that make an obtuse angle with all points in $C$. The polar cone $C^0$ has some straightforward properties, and the following proposition is a useful tool for finding the constrained least squares estimator. Its proof is discussed in detail in [23].
The projection of $y$ onto the constraint cone lies on one of its faces, indexed by a set $J$. Since there is only a finite number of faces, the mixed primal-dual bases algorithm of [18] or the hinge algorithm of [23] (see also [11]) may be used to find it. The code for the algorithm was written in R and can be obtained from the authors upon request.
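The role of the polar cone can be checked numerically in the monotone case. The sketch below is an illustration under stated assumptions, with hypothetical names: it uses the fact that the monotone cone is generated by plus or minus the constant vector and by the step vectors (0, ..., 0, 1, ..., 1), and verifies that the residual of a projection lies in the polar cone and is orthogonal to the fit.

```python
def in_polar_of_monotone_cone(rho, tol=1e-9):
    """True if rho lies in the polar cone of {theta_1 <= ... <= theta_n}.

    rho must be orthogonal to the constant vector and have a non-positive
    inner product with every step vector (0, ..., 0, 1, ..., 1).
    """
    if abs(sum(rho)) > tol:
        return False
    tail = 0.0
    for value in reversed(rho[1:]):  # tail sums rho_j + ... + rho_n for j >= 2
        tail += value
        if tail > tol:
            return False
    return True


y = [1.0, 3.0, 2.0, 4.0]
theta_hat = [1.0, 2.5, 2.5, 4.0]  # projection of y onto the monotone cone
residual = [a - b for a, b in zip(y, theta_hat)]

# The residual is orthogonal to the fit and belongs to the polar cone.
orthogonal = abs(sum(r * t for r, t in zip(residual, theta_hat))) < 1e-9
residual_in_polar = in_polar_of_monotone_cone(residual)
```

These two conditions, together with the fit lying in the cone, characterize the projection onto a closed convex cone.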

Numerical Illustration
We examined the performance of the proposed methodology using a real data set and four simulated data sets. For each simulated data set, we carried out the computational algorithms described in Sections 2 and 3 for sample size n = 100. Recently, [24] investigated a sufficient dimension reduction method for estimating directions and dimensions in time series, and showed that the performance in detecting them improves as the sample size increases and that the computation is more intensive for higher dimensions such as d = 2 and 3. For the first three examples, we simulated four data sets under the worst scenario, n = 100, and with the computationally less intensive dimensions d = 1 and 2 only, to illustrate clearly how well the shape restricted method works in these directions. For the nonparametric alternative we used kernel regression. Although optimal bandwidth selection is essential, we used the data-adaptive fixed bandwidth recommended by [25]. In the bandwidth formula, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values used in estimation, respectively, and n is the sample size.
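A kernel regression competitor of this kind can be sketched as follows. The example assumes a Nadaraya-Watson estimator with a Gaussian kernel, which is one common choice and may not match the authors' exact estimator; the bandwidth is passed in as a parameter because the fixed-bandwidth formula of [25] is not reproduced here.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel regression estimate of E[y | x = x0] with a Gaussian kernel.

    xs, ys    : observed predictor values and responses
    bandwidth : fixed positive bandwidth (chosen by the caller)
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)  # may underflow to 0 if x0 is far from every x
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The estimate is a locally weighted average of the responses, so unlike the shape restricted fit it does not enforce monotonicity or convexity.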
We considered a quadratic regression model of a linear combination of six predictors for the first example and of ten predictors for the second example. In the third example, we simulated data from a cubic regression of a linear combination of ten predictors.
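Data of this single-index type can be generated along the following lines. This is a sketch only: the coefficient vector, the standard normal predictors, the noise level and the function name are all assumptions chosen for illustration, not the settings used in the paper.

```python
import random


def simulate_single_index(n, beta, mean_fn, noise_sd, seed=0):
    """Simulate y = mean_fn(beta^T x) + noise with standard normal predictors.

    Returns the predictor matrix X, the true index values beta^T x, and y.
    """
    rng = random.Random(seed)
    X, index, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        u = sum(b * xj for b, xj in zip(beta, x))
        X.append(x)
        index.append(u)
        y.append(mean_fn(u) + rng.gauss(0.0, noise_sd))
    return X, index, y


# Hypothetical quadratic model with six predictors and n = 100.
beta = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
X, index, y = simulate_single_index(100, beta, lambda u: u * u, 0.5)
```

Replacing the mean function with a cubic, and lengthening `beta` to ten entries, gives data of the kind described for the other simulated examples.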
Example 1: Model 1
We simulated data from the above model, where the mean function is a quadratic function of a linear combination with six predictors. First, we estimated the dimension and direction by CS combined with AIC (5), as described in Section 2.
Example 2: Model 2
In this model, we considered another quadratic mean function of one linear combination with ten predictors. Using the same procedures as in the previous example, we estimated the dimension and direction by CS. Based on AIC, the true dimension d = 1 was detected; see Table 1 for details. The estimated vector is (0, 0.003, 0.013, 0.013, 0.046, 0.029).
Example 3: Model 3
In this example, we simulated data from a cubic polynomial model of one linear combination with ten predictors. In the first step, we estimated the dimension and direction by CS. Table 1 indicated that the true dimension was detected. The final model was fitted using convex regression. Figure 2 shows that the shape restricted regression fits the data better than kernel regression, which leads to a smaller error sum of squares.
Table 1. AIC values for the simulated and real data sets using Equation (5).
The entries of Table 2 are the square roots of the average squared error loss (ASEL). The results in Table 2 demonstrate that our method performed fairly well in all cases. It is better than kernel regression, in particular when the data are generated from quadratic and cubic regressions. In general, our method provides better or comparable fits for the simulated examples, which is also supported by Figures 1-3.
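The comparison criterion can be computed as in the sketch below. The definition used here, the mean of squared differences between fitted values and target values, is an assumption made for illustration; the paper's exact definition of ASEL is not reproduced in this section, and the function names are hypothetical.

```python
import math


def average_squared_error_loss(fitted, target):
    """Assumed ASEL: mean squared difference between fitted and target values."""
    if len(fitted) != len(target) or not fitted:
        raise ValueError("fitted and target must be non-empty and equally long")
    return sum((f - t) ** 2 for f, t in zip(fitted, target)) / len(fitted)


def root_asel(fitted, target):
    """Square root of the ASEL, the quantity reported in the comparison table."""
    return math.sqrt(average_squared_error_loss(fitted, target))
```

Computing this value for the shape restricted fit and for the kernel fit on the same data gives one pair of entries to compare.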
Example 4: Highway Accident Data
For illustration, we applied our method to a real data set, the Highway Accident Data. See Weisberg (2005) for a detailed description of these data. The data include 39 sections of large highways in the state of Minnesota in 1973, and the variables relate the automobile accident rate, in accidents per million vehicle miles, to several potential terms. We use log(Rate) as the response variable and eleven terms as explanatory variables. The definitions of the terms are given in Table 10.5 of Weisberg (2005).
First, we estimated the directions of the predictor variables using CS, without losing any information. The response variable Y is non-decreasing in both estimated predictors, $\hat\beta_1^T X$ and $\hat\beta_2^T X$. In addition, the marginal scatter plots of Y versus each of them display increasing trends. Next, we fitted a model by multiple isotonic regression. The isotonic fit is shown in Figure 4. This shows that our approach may be a better choice than parametric or nonparametric models that do not use the constraints, and that it works well even for a two-dimensional model.
For the purpose of comparison, we computed the Average Squared Error Loss (ASEL) of our models and the alternatives. From the scatter plot in Figure 4, there is some curvature in the relationship between Y and $\hat\beta_1^T X$. Hence, a concave curve may be a good choice to reflect the relationship between the response variable and the linear combination of the predictors. Figure 4 shows the concave and cubic regression fits. The plot in Figure 5 and the ASEL in Table 2 suggest that our method gives a reasonable fit to this data set.

Comments
Polynomial regression is one of the handy methods in regression analysis. However, this straightforward analysis is generally not possible with many predictors. Hence, the major message that we would like to deliver in this paper is that estimating the direction by CS and fitting the model by SR is advantageous for high-dimensional data with many predictors. After estimating the direction by sufficient dimension reduction, it is not easy to choose the appropriate polynomial regression model from the pattern of the scatter plot without any theoretical basis.

A Dimension Reduction Subspace (DRS) for $Y$ on $X$ is defined as any subspace $S(B)$ of $R^p$ for which (1) holds, where $B$ is a $p \times d$ matrix whose columns form a basis for the subspace and $S(B)$ is the space spanned by the columns of $B$. That is, (1) represents that $Y$ is independent of $X$ given $B^T X$.
Note that $V \subset C$ in both cases. The constraint cone can be specified by a set of linearly independent vectors $\delta^1, \dots, \delta^m$.

Figure 1. (Model 1) Data are generated from a quadratic function of a linear combination with six predictors. The solid curve is the quadratic fit and the dotted curve is the shape restricted fit.

Figure 2. (Model 2) Data are generated from a quadratic function of a linear combination with ten predictors. The solid curve is the quadratic fit and the dotted curve is the shape restricted fit.

Figure 3. (Model 3) Data are generated from a cubic function of a linear combination with six predictors. The solid curve is the cubic fit and the dotted curve is the shape restricted fit.

Figure 5. Highway Accident Data: The solid line is the cubic fit and the dotted line is the shape restricted fit.
