CBPS-Based Inference in Nonlinear Regression Models with Missing Data

This article studies nonlinear regression models with missing responses, with the aim of improving on the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. The proposed estimators are proved to be asymptotically normal. In simulation studies, they show improved performance relative to the usual augmented inverse probability weighted estimators.


Introduction
Consider the nonlinear regression model

$$Y_i = f(X_i, \beta) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$

where $Y_i$ is a scalar response variate, $X_i$ is a $d \times 1$ vector of covariates, $\beta$ is a $p \times 1$ vector of unknown regression parameters, $f(\cdot)$ is a known function that is nonlinear with respect to $\beta$, and $\varepsilon_i$ is a random error. In general, $d$ is different from $p$. The model has been studied by many authors, such as Jennrich [1], Wu [2], and Crainiceanu and Ruppert [3].
Missing data are frequently encountered in statistical studies, and ignoring them can lead to biased estimation and misleading conclusions. Inverse probability weighting (Horvitz and Thompson [4]) and imputation are the two main methods for dealing with missing data. Since Scharfstein et al. [5] noted that the augmented inverse probability weighted (AIPW) estimator of Robins et al. [6] is doubly robust, many estimators with the double robustness property have been proposed; see Tan [7], Kang and Schafer [8], and Cao et al. [9]. An estimator is doubly robust in the sense that it is consistent if either the outcome regression model or the propensity score model is correctly specified. The AIPW estimators have been advocated for routine use (Bang and Robins [10]). For model (1), in the absence of missing data, the weighted least squares estimator of β can be obtained by minimizing a weighted sum of squared residuals. In the presence of missing data, this method cannot be used directly, so we apply the AIPW method to model (1).
Throughout this paper, we assume that the covariates X are completely observed and that Y is missing at random (Rubin [11]).
Thus, the data actually observed are independent and identically distributed $(X_i, Y_i, \delta_i)$, $i = 1, \ldots, n$, where $\delta_i = 1$ if $Y_i$ is observed and $\delta_i = 0$ if $Y_i$ is missing. The missing at random (MAR) assumption implies that δ and Y are conditionally independent given X, that is, $P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X) = \pi(X)$. This probability is called the propensity score (Rosenbaum and Rubin [12]). If $f(X, \beta) = X^{\top}\beta$, then (1) is just the classical linear model. Linear models with missing data have been studied in existing papers, such as Wang and Rao ([13] [14]), Xue [15], and Qin and Lei [16]. The inverse probability weighted imputation methods of Xue [15] and other papers are based on nonparametric estimators of the propensity score model. However, nonparametric estimators are difficult to obtain because of the "curse of dimensionality", and, as noted by Kang and Schafer [8], the AIPW estimators can be severely biased when both models are misspecified. In addition, little work has been done on model (1) with missing responses.
In this paper, we construct estimators for β and µ in model (1), based on the covariate balancing propensity score (CBPS) method proposed by Imai and Ratkovic [17]. As mentioned in Imai and Ratkovic [18], the weights based on CBPS are robust in the sense that they improve covariate balance even when the propensity score model is misspecified. Our estimator has the following merits: 1) it avoids the "curse of dimensionality"; 2) it avoids selecting an optimal bandwidth; 3) it improves the performance of the AIPW estimators in terms of bias, standard deviation (SD) and mean-squared error (MSE), even when both the outcome regression model and the propensity score model are misspecified.
The rest of this paper is organized as follows. In Section 2, based on the CBPS and AIPW methods, estimators for the regression parameter β and the population mean µ are proposed, and their asymptotic properties are investigated. In Section 3, simulation studies are carried out to assess the performance of the proposed method. Section 4 gives concluding remarks. The proofs of the main results are given in the Appendix.

Construction of Estimators
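The most popular choice of $\pi(X)$ is a logistic regression function (Qin and Zhang [19]). We make the same choice and posit a logistic regression model for $\pi(X)$:

$$\pi(X, \alpha) = \frac{\exp(X^{\top}\alpha)}{1 + \exp(X^{\top}\alpha)}, \tag{2}$$

where $\alpha \in \Theta$ is a d-dimensional unknown column vector of parameters.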
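As a minimal sketch, the logistic propensity score model can be evaluated as follows. The function name `propensity` and the list-based representation of X and α are illustrative assumptions, not notation from the paper.

```python
import math


def propensity(x, alpha):
    """Logistic propensity score: exp(x'alpha) / (1 + exp(x'alpha)).

    x and alpha are sequences of equal length (hypothetical representation).
    """
    eta = sum(xj * aj for xj, aj in zip(x, alpha))
    # Branch on the sign of eta so that exp() never receives a large
    # positive argument, which keeps the evaluation numerically stable.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Example: a covariate vector with an intercept term and two covariates.
print(propensity([1.0, 0.3, 0.7], [0.5, -1.0, 2.0]))
```

The returned value always lies strictly between 0 and 1 for moderate values of the linear predictor, which is what inverse probability weighting requires.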

CBPS-Based Estimator for the Propensity Score
Based on $(X_i, \delta_i)$, $i = 1, \ldots, n$, one can obtain an estimator of α by maximizing the log-likelihood function

$$\ell(\alpha) = \sum_{i=1}^{n} \left\{ \delta_i \log \pi(X_i, \alpha) + (1 - \delta_i) \log\left[1 - \pi(X_i, \alpha)\right] \right\}. \tag{3}$$

Assuming that $\pi(X, \alpha)$ is twice continuously differentiable with respect to α, maximizing (3) implies the first-order condition

$$\frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{\delta_i \pi'(X_i, \alpha)}{\pi(X_i, \alpha)} - \frac{(1 - \delta_i)\pi'(X_i, \alpha)}{1 - \pi(X_i, \alpha)} \right\} = 0, \tag{4}$$

where $\pi'(X, \alpha) = \partial \pi(X, \alpha)/\partial \alpha$. The main drawback of this standard method is that the propensity score model $\pi(X)$ may be misspecified, yielding biased estimators of the parameters of interest, such as β and µ. To overcome this drawback, we borrow the following ideas of Imai and Ratkovic [17]. Similar to the arguments presented by Imai and Ratkovic [17], we operationalize the covariate balancing property by using inverse propensity score weighting, which yields the moment condition (5). Equation (5) ensures that the first moment of each covariate is balanced, and the weights based on CBPS are robust even when the propensity score model is misspecified. The key idea behind the CBPS is that a single propensity score model determines both the missingness mechanism and the covariate balancing weights; see Imai and Ratkovic [17]. Equation (6) is the sample analogue of the covariate balancing moment condition given in Equation (5).

According to Imai and Ratkovic [17], the CBPS is said to be just-identified when the number of moment conditions equals the number of parameters. If we use the covariate balancing conditions given in Equation (6) alone, the CBPS is just-identified. If we combine Equation (6) with the score condition given in Equation (4), then the CBPS is overidentified because the number of moment conditions exceeds the number of parameters.
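To illustrate the just-identified case, the sketch below solves a sample balancing condition by a damped Newton iteration. It rests on stated assumptions: the balancing condition is taken in the form used by Imai and Ratkovic with the treatment indicator replaced by δ, namely (1/n) Σ [δ_i/π_i − (1 − δ_i)/(1 − π_i)] X_i = 0, which may differ from the exact Equation (6); the model is restricted to two parameters (an intercept and one covariate) so that the linear algebra stays explicit; and the function names and toy data are hypothetical.

```python
import math


def balance_moment(alpha, X, delta):
    """Sample covariate balancing moment for a two-parameter logistic model.

    Assumed form: (1/n) * sum_i [d_i / pi_i - (1 - d_i) / (1 - pi_i)] * x_i.
    """
    n = len(X)
    g = [0.0, 0.0]
    for x, d in zip(X, delta):
        eta = alpha[0] * x[0] + alpha[1] * x[1]
        # Logistic model: 1/pi = 1 + exp(-eta) and 1/(1 - pi) = 1 + exp(eta).
        w = d * (1.0 + math.exp(-eta)) - (1 - d) * (1.0 + math.exp(eta))
        g[0] += w * x[0] / n
        g[1] += w * x[1] / n
    return g


def cbps_just_identified(X, delta, tol=1e-10, max_iter=100):
    """Solve balance_moment(alpha) = 0 by a damped Newton iteration."""
    n = len(X)
    alpha = [0.0, 0.0]
    for _ in range(max_iter):
        g = balance_moment(alpha, X, delta)
        norm_g = math.hypot(g[0], g[1])
        if norm_g < tol:
            break
        # Jacobian of the moment vector: symmetric matrix [[a, b], [b, c]].
        a = b = c = 0.0
        for x, d in zip(X, delta):
            eta = alpha[0] * x[0] + alpha[1] * x[1]
            dw = -(d * math.exp(-eta) + (1 - d) * math.exp(eta)) / n
            a += dw * x[0] * x[0]
            b += dw * x[0] * x[1]
            c += dw * x[1] * x[1]
        det = a * c - b * b
        # Newton direction J^{-1} g, using the explicit 2x2 inverse.
        step = [(c * g[0] - b * g[1]) / det, (a * g[1] - b * g[0]) / det]
        # Halve the step until the norm of the moment vector decreases.
        t = 1.0
        cand = alpha
        while t > 1e-8:
            cand = [alpha[0] - t * step[0], alpha[1] - t * step[1]]
            gc = balance_moment(cand, X, delta)
            if math.hypot(gc[0], gc[1]) < norm_g:
                break
            t /= 2.0
        alpha = cand
    return alpha


# Toy data (hypothetical): intercept plus one covariate, with overlap in delta.
X = [[1.0, 0.1 * k] for k in range(1, 11)]
delta = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
alpha_hat = cbps_just_identified(X, delta)
print(alpha_hat, balance_moment(alpha_hat, X, delta))
```

With two moment conditions and two parameters, the balancing condition can be satisfied exactly in the sample, which is what distinguishes the just-identified case from the overidentified one. The iteration assumes the observed and missing cases are not perfectly separated by the covariates.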
Combining Equation (6) with the score condition given in Equation (4), we obtain the stacked moment condition (7). Let $\hat{\alpha}$ denote the solution of Equation (7). For the overidentified CBPS, the GMM (Hansen [20]) estimator $\hat{\alpha}$ can be obtained by minimizing, with respect to α, the GMM objective (8), a quadratic form in the stacked sample moments with some positive-semidefinite symmetric weight matrix W. It is easy to show that, under some regularity conditions, $\hat{\alpha}$ is a consistent estimator of $\alpha_0$, the true value of α. For the just-identified CBPS, we borrow the ideas of Imai and Ratkovic [17] and still minimize Equation (8), without the score condition, to find $\hat{\alpha}$.
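The overidentified objective can be made concrete with a short sketch that evaluates a GMM quadratic form for the logistic model. The assumptions are as follows: the stacked sample moment vector consists of the logistic score, which reduces to (δ_i − π_i) X_i, followed by the balancing term in the form used by Imai and Ratkovic; the weight matrix defaults to the identity purely for illustration; and the function names are hypothetical.

```python
import math


def stacked_moments(alpha, X, delta):
    """Stack the logistic score and the balancing moment (both averaged over i)."""
    n = len(X)
    d = len(alpha)
    score = [0.0] * d
    balance = [0.0] * d
    for x, dl in zip(X, delta):
        eta = sum(xj * aj for xj, aj in zip(x, alpha))
        pi = 1.0 / (1.0 + math.exp(-eta))
        # Assumed balancing weight: delta / pi - (1 - delta) / (1 - pi).
        w = dl / pi - (1 - dl) / (1.0 - pi)
        for j in range(d):
            score[j] += (dl - pi) * x[j] / n
            balance[j] += w * x[j] / n
    return score + balance


def gmm_objective(alpha, X, delta, W=None):
    """Quadratic form g' W g in the stacked sample moments."""
    g = stacked_moments(alpha, X, delta)
    m = len(g)
    if W is None:
        # Identity weight matrix: an illustrative default, not the paper's choice.
        W = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    return sum(g[i] * W[i][j] * g[j] for i in range(m) for j in range(m))


# Two observations with an intercept and one covariate (hypothetical data).
X = [[1.0, 0.5], [1.0, -0.5]]
delta = [1, 0]
print(gmm_objective([0.0, 0.0], X, delta))
```

Because there are 2d moment conditions for d parameters, the objective generally cannot be driven to zero, so the estimate is the minimizer of this function over α; any general-purpose numerical optimizer could be applied to `gmm_objective` for that purpose.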

Estimator for the Regression Parameter
To make use of the AIPW method, we borrow the idea of Seber and Wild [21] and define the least squares estimator of β based on the complete-case data as the solution of the estimating equation (9). There is no closed form for this estimator, but it can be obtained by the iterative equation (10). If the distance between successive iterates, measured by $\|\cdot\|$, falls below $c$, where $c$ is a prespecified tolerance and $\|\cdot\|$ denotes the $L_2$ norm, then we stop the iterative algorithm and obtain the least squares estimator of β, denoted by $\hat{\beta}_c$.
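The complete-case iteration and its stopping rule can be sketched as follows. This is an illustration under assumptions rather than the paper's Equation (10): the update is taken to be a Gauss-Newton step, the parameter β is scalar, and the regression function exp(βx) and all names are hypothetical.

```python
import math


def gauss_newton_cc(x, y, delta, f, df, beta0, c=1e-10, max_iter=200):
    """Complete-case nonlinear least squares for a scalar parameter.

    Only observations with delta_i = 1 enter the sums. The loop stops when
    successive iterates differ by less than the tolerance c.
    """
    beta = beta0
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for xi, yi, di in zip(x, y, delta):
            if di == 1:
                grad = df(xi, beta)          # derivative of f in beta
                num += grad * (yi - f(xi, beta))
                den += grad * grad
        new_beta = beta + num / den          # Gauss-Newton update
        if abs(new_beta - beta) < c:
            return new_beta
        beta = new_beta
    return beta


# Hypothetical regression function f(x, beta) = exp(beta * x).
def f(x, beta):
    return math.exp(beta * x)


def df(x, beta):
    return x * math.exp(beta * x)


# Noise-free toy data with true beta = 0.7; missing responses stored as None.
x = [0.1 * k for k in range(1, 11)]
delta = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y = [f(xi, 0.7) if di == 1 else None for xi, di in zip(x, delta)]
print(gauss_newton_cc(x, y, delta, f, df, beta0=0.5))
```

For a vector-valued β the same scheme applies with the scalar division replaced by the solution of a linear system, and the absolute difference replaced by the L2 norm of the difference between iterates.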
Although the complete-case method is simple to implement, simply excluding the missing data may lead to misleading conclusions. In this section, we introduce an AIPW method based on CBPS to deal with the problems of the complete-case method. The AIPW-adjusted responses lead to model (11), which is a full-data model without missing data. So, similar to Equation (10), we can obtain an estimator $\hat{\beta}_I$ of β by an iterative equation, where $\hat{\alpha}$ is obtained by the CBPS method.
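A sketch of the AIPW adjustment of the responses is given below. The form of the adjusted response is an assumption based on the standard AIPW construction, δ_i Y_i/π_i + (1 − δ_i/π_i) m_i, where π_i is the estimated propensity score and m_i is a fitted value from a preliminary outcome fit; the paper's exact definition is not reproduced here, and the function name is hypothetical.

```python
def aipw_responses(y, delta, pi_hat, fitted):
    """Return AIPW-adjusted responses, defined for every unit.

    y      : responses (entries with delta_i = 0 are ignored, e.g. None)
    delta  : missingness indicators (1 = observed, 0 = missing)
    pi_hat : estimated propensity scores, e.g. from the CBPS fit
    fitted : fitted values from a preliminary outcome regression
    """
    adjusted = []
    for yi, di, pi, mi in zip(y, delta, pi_hat, fitted):
        if di == 1:
            # Observed case: inverse-weighted response plus augmentation term.
            adjusted.append(yi / pi + (1.0 - 1.0 / pi) * mi)
        else:
            # Missing case: the adjusted response is the fitted value alone.
            adjusted.append(mi)
    return adjusted


print(aipw_responses([2.0, None], [1, 0], [0.5, 0.5], [1.0, 3.0]))
```

Since every unit receives an adjusted response, a nonlinear least squares routine written for fully observed data can then be applied to the adjusted responses without modification.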
Theorem 2 gives the asymptotic normality of $\hat{\beta}_I$.
To apply Theorem 2 to construct a confidence region for β, we use the plug-in estimators (12) and (13) of the quantities that appear in the asymptotic covariance. The confidence interval for β can then be constructed from (12) and (13).
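Once a plug-in estimate of the asymptotic variance is available, a normal-approximation interval for a single component of β can be computed as sketched below. The sketch assumes that √n times the estimation error of that component is asymptotically normal with mean zero and variance `avar`; the function name and arguments are hypothetical, and the variance estimate itself must be supplied by the user.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, avar, n, level=0.95):
    """Normal-approximation confidence interval for one parameter.

    estimate : point estimate of the parameter component
    avar     : plug-in estimate of the asymptotic variance of
               sqrt(n) * (estimate - truth)
    n        : sample size
    level    : nominal coverage level
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    half_width = z * math.sqrt(avar / n)
    return estimate - half_width, estimate + half_width


print(wald_interval(1.0, 4.0, 100))
```

The interval is symmetric about the point estimate and shrinks at the rate 1/√n, in line with the asymptotic normality result.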

Estimator for the Response Mean
It is of interest to estimate the mean of Y, say µ, when some of the responses are missing. Here we use the method of Xue [15] to construct the estimator of µ, denoted by $\hat{\mu}$, starting from an auxiliary random variable whose expectation is zero if µ is the true parameter. The following theorem states the asymptotic properties of $\hat{\mu}$.
Borrowing the method of Xue [15], we can also obtain a consistent estimator of the asymptotic variance V.
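For illustration, an AIPW-type estimator of the response mean can be computed as below. The formula used, the average of δ_i Y_i/π_i + (1 − δ_i/π_i) m_i, is the standard AIPW form and is an assumption about the proposed estimator rather than a transcription of it; the function name and the split into two partial sums are illustrative.

```python
def aipw_mean(y, delta, pi_hat, fitted):
    """AIPW-type estimate of the mean of Y under missing responses.

    The estimate is the sum of an inverse probability weighted part and an
    augmentation part built from the fitted outcome regression values.
    """
    n = len(delta)
    ipw_part = 0.0
    aug_part = 0.0
    for yi, di, pi, mi in zip(y, delta, pi_hat, fitted):
        if di == 1:
            ipw_part += yi / pi
            aug_part += (1.0 - 1.0 / pi) * mi
        else:
            aug_part += mi
    return (ipw_part + aug_part) / n


# With all responses observed and pi = 1, the estimate is the sample mean.
print(aipw_mean([1.0, 2.0, 3.0], [1, 1, 1], [1.0, 1.0, 1.0], [9.0, 9.0, 9.0]))
```

The decomposition makes the double robustness heuristic visible: the augmentation part has mean zero when the propensity score is correct, and it corrects the weighted part when the outcome regression is correct.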

Simulation Examples
We conducted simulation studies to examine the performance of the proposed estimation methods. The simulated data are generated from a model of the form (1). The components of $X = (X_1, X_2, X_3)$ are each generated from the uniform distribution $U(0, 1)$, ε is generated from the standard normal distribution, and δ is generated from a Bernoulli distribution whose success probability is given by the true propensity score model.

To examine whether our method can improve the empirical performance of doubly robust estimators when one or both models are misspecified, we follow the approach of Kang and Schafer [8]. Similar to Kang and Schafer [8], only transformed covariates are assumed to be observed; a model fitted with these transformed covariates in place of X is misspecified. As in the original study, we conduct simulations for the population mean µ under four scenarios: 1) both the outcome and propensity score models are correctly specified; 2) only the propensity score model is correct; 3) only the outcome model is correct; 4) both the outcome and propensity score models are misspecified. Because the regression parameter β belongs to the outcome regression model, we only conduct simulations for β under scenarios 1) and 3). For each scenario, we conduct 1000 simulations and calculate the bias, standard deviation (SD) and mean-squared error (MSE) for β and µ. The results of our simulations are presented in Tables 1-3.

Table 1. Relative performance of the estimators for the regression parameter based on different propensity score estimation methods when both models are correct.

For a given scenario, we examine the performance of the estimators on the basis of four different propensity score methods: a) the usual GLM method; b) the just-identified CBPS estimation with the covariate balancing moment conditions and without the score condition (CBPS1); c) the overidentified CBPS estimation with both the covariate balancing and score conditions (CBPS2); d) the true propensity score model, which does not need to be estimated (TRUE).
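The three performance measures reported in the simulations can be computed from the replicated estimates as sketched below. The function name is hypothetical, and the use of the divisor m − 1 for the standard deviation is an assumption, since the text does not state which divisor was used.

```python
import math


def summarize(estimates, truth):
    """Return (bias, SD, MSE) of replicated estimates of a scalar parameter."""
    m = len(estimates)
    mean = sum(estimates) / m
    bias = mean - truth
    # Sample standard deviation across replications (divisor m - 1, assumed).
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (m - 1))
    # Mean-squared error around the true value.
    mse = sum((e - truth) ** 2 for e in estimates) / m
    return bias, sd, mse


# Three hypothetical replications of an estimator whose target is 1.5.
print(summarize([1.0, 2.0, 3.0], 1.5))
```

In a study with 1000 replications, `estimates` would hold the 1000 values of the estimator obtained under one scenario and one propensity score method.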
From Table 1 and Table 2, we can see that the SD and MSE of our estimators for β decrease as n increases. Whether or not the propensity score model is correctly specified, the proposed estimators based on CBPS have smaller SD and MSE than the usual GLM estimators in most cases. The CBPS, with or without the score condition, can substantially improve the performance of the usual estimator. Our proposed estimators perform comparably to the estimators based on the true propensity score model in terms of SD and MSE. Table 3 shows that, under the four scenarios, the SD and MSE of our proposed estimators remain lower than those of the usual GLM estimators. Similar to Imai and Ratkovic [17], the final scenario illustrates the most important point made by Kang and Schafer [8], namely that the doubly robust estimator can deteriorate when both the outcome and the propensity score models are misspecified. Under this scenario, the doubly robust estimators based on the usual GLM have a substantial amount of bias and variance, whereas the CBPS can improve the performance of doubly robust estimators. In short, we reach the same conclusion as Imai and Ratkovic [17]: the CBPS can yield robust estimators of the population mean, even when both the outcome and propensity score models are misspecified.

Concluding Remarks
We have proposed an improved estimation method for the parameters of interest in the nonlinear regression model with missing responses. The estimators based on the CBPS and AIPW methods have the following merits: 1) they avoid the "curse of dimensionality" and avoid selecting an optimal bandwidth; 2) when either the outcome regression model or the propensity score model is correctly specified, the proposed estimators perform as well as estimators based on the true propensity score model in terms of SD and MSE; 3) when both the outcome regression and propensity score models are misspecified, the usual AIPW estimator can be severely biased, as mentioned in Section 1, but our method improves its performance and yields an improved estimator of the population mean. The simulations show that the proposed method is feasible. Furthermore, with appropriate modification, the proposed method can be extended to other models with missing responses. The detailed procedure will be presented in our future work.

Appendix: Proofs of the Main Results
Throughout, let $\alpha_0$ be the true value of $\alpha$. To complete the proofs of Theorems 1-3, a lemma is needed. To prove Theorem 2, we verify the required asymptotic normality, making use of the MAR assumption.