Computer-intensive methods have recently been studied extensively in mathematics, statistics, physics, engineering, and the behavioral and life sciences. The bootstrap is a computer-intensive method that can be used to estimate the variability of estimators, to estimate probabilities and quantiles related to test statistics, to construct confidence intervals, to explore the shape of the distribution of estimators or test statistics, and to construct predictive distributions that show their asymptotic behavior. In this paper, we fitted the classical logistic regression model and performed both parametric and non-parametric bootstrap to estimate confidence intervals for the parameters of the logistic model and for the odds ratio. We also tested the hypothesis that the prevalence does not depend on age. Conclusions from both bootstrap methods were similar to those of classical logistic regression.

Knowing the distribution of a test statistic computed from a random sample drawn from the population of interest provides clues as to the methods to employ in analyzing the data. Statisticians normally have to decide on the nature of the distribution of the population from which the sample was obtained. A good guess about the population distribution leads to a powerful test. However, a high price is paid if the distributional assumption is wrong. Under normal circumstances it is not possible to validate the distribution of the sample by re-sampling from the population, because of the high cost of implementation. It is therefore important to consider other methods of analyzing data that are flexible in the choice of distribution, and bootstrap methods were introduced for this purpose.

Bootstrap methods are computer-based methods for assigning measures of accuracy to statistical estimates such as the sample mean and standard errors (Efron and Tibshirani, 1994 [

The aims of this paper are to formulate a logistic regression model and estimate the probability of infection as a function of age using a Generalized Linear Model for binary data, to construct 95% confidence intervals for the unknown parameters of the model, and to test the hypothesis that the prevalence does not depend on age, using both classical and bootstrap (non-parametric and parametric) methods.

The Keil dataset (see Appendix) contains serological data on Hepatitis A from Bulgaria. It records the age of the subjects (in one-year age groups), the number of seropositive subjects (the number infected by Hepatitis A), and the sample size in each age group.
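As an illustration of the classical fit on data of this shape (age, number seropositive, sample size), the following is a minimal sketch rather than the analysis used in the paper. The counts are made up for the example and are not the Keil data; the names `fit_logistic`, `ages`, `positive` and `size` are hypothetical, and the model form logit(p) = b0 + b1*age with Wald intervals is assumed.

```python
import math

# Made-up grouped counts in the layout described above:
# age, number seropositive, sample size. These are NOT the Keil data.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
positive = [27, 36, 46, 56, 66, 75, 82, 87]
size = [100] * 8


def fit_logistic(x, y, n, iters=25):
    """Maximum likelihood fit of logit(p) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se0, se1). The standard errors come from the inverse
    Fisher information evaluated at the start of the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)   # binomial variance weight
            r = yi - ni * p          # observed minus expected count
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


b0, b1, se0, se1 = fit_logistic(ages, positive, size)
z = 1.959964  # 97.5% quantile of the standard normal
ci_b1 = (b1 - z * se1, b1 + z * se1)                 # Wald 95% interval
prevalence_at_20 = 1.0 / (1.0 + math.exp(-(b0 + b1 * 20)))
print("slope:", b1, "s.e.:", se1, "95% C.I.:", ci_b1)
print("estimated prevalence at age 20:", prevalence_at_20)
```

A two-parameter Newton-Raphson loop keeps the sketch free of external dependencies; in practice a GLM routine from a statistics package would normally be used instead.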

Bootstrapping is rapidly becoming a popular alternative tool for estimating the parameters and standard errors of the logistic regression model (Ariffin and Midi, 2012 [

where

In applications where the standard asymptotic theory does not hold, the null reference distribution can be obtained through parametric bootstrapping (Reynolds and Templin, 2004 [

Parametric bootstrap confidence interval

Using an algorithm by (Zoubir and Iskander, 2004 [

1) Estimate parameters (

2) Draw bootstrap sample

3) For each

4) Estimate the bootstrap mean and standard error of

5) Estimate
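The parametric procedure can be sketched as follows. This is an illustrative reading of the steps, not the authors' code: the data are made up (not the Keil data), the function names are hypothetical, bootstrap counts are simulated from the fitted binomial model, and the 95% interval is taken by the simple percentile rule, which is an assumption.

```python
import math
import random

# Made-up grouped counts (age, seropositive, sample size); NOT the Keil data.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
positive = [27, 36, 46, 56, 66, 75, 82, 87]
size = [100] * 8


def inv_logit(eta):
    eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, n, iters=25):
    """Newton-Raphson MLE of (b0, b1) in logit(p) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = inv_logit(b0 + b1 * xi)
            w = ni * p * (1.0 - p)
            r = yi - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def parametric_bootstrap(x, y, n, B=500, seed=1):
    """Bootstrap mean, s.e. and percentile interval for the slope."""
    rng = random.Random(seed)
    b0, b1 = fit_logistic(x, y, n)                 # fit to the observed data
    fitted = [inv_logit(b0 + b1 * xi) for xi in x]
    slopes = []
    for _ in range(B):
        # simulate seropositive counts from the fitted binomial model
        y_star = [sum(rng.random() < p for _ in range(ni))
                  for ni, p in zip(n, fitted)]
        slopes.append(fit_logistic(x, y_star, n)[1])   # refit on the replicate
    mean = sum(slopes) / B
    se = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (B - 1))
    slopes.sort()
    ci = (slopes[int(0.025 * B)], slopes[int(0.975 * B) - 1])
    return b1, mean, se, ci


b1_hat, boot_mean, boot_se, boot_ci = parametric_bootstrap(ages, positive, size)
print("slope:", b1_hat, "bootstrap mean:", boot_mean)
print("bootstrap s.e.:", boot_se, "95% percentile C.I.:", boot_ci)
```

The ages and sample sizes stay fixed across replicates; only the seropositive counts are regenerated from the fitted model.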

The non-parametric bootstrap belongs to the general sub-field non-parametric statistics that is defined by (Dudewicz, 1976 [

The idea, called the substitution principle or the plug-in rule, gives explicit recognition to the fact that frequentist inference involves replacing an unknown probability distribution by an estimate. In the simplest setting a random sample is available and the non-parametric estimate is the empirical distribution function, while a parametric model with a parameter of fixed dimension is replaced by its maximum likelihood estimate (Davison et al., 2003 [

Non-Parametric bootstrap confidence interval

Using a procedure proposed by (Zoubir and Iskander, 2004 [

1) Make a new dataset for binary response with covariate(s)

2) Draw bootstrap sample by sampling the pairs with replacement from the new dataset

3) For each

4) Estimate the bootstrap mean and standard error of

5) Estimate
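A sketch of the non-parametric (pairs) procedure is given below, under the same caveats: made-up counts rather than the Keil data, hypothetical function names, and a simple percentile interval as an assumed choice. The grouped data are first expanded to one (age, response) pair per subject, and whole pairs are then resampled with replacement.

```python
import math
import random

# Made-up grouped counts (age, seropositive, sample size); NOT the Keil data.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
positive = [27, 36, 46, 56, 66, 75, 82, 87]
size = [100] * 8


def fit_logistic(x, y, n, iters=25):
    """Newton-Raphson MLE of (b0, b1) in logit(p) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)
            r = yi - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def expand(x, y, n):
    """One (age, response) pair per subject; response 1 means seropositive."""
    pairs = []
    for xi, yi, ni in zip(x, y, n):
        pairs += [(xi, 1)] * yi + [(xi, 0)] * (ni - yi)
    return pairs


def regroup(pairs):
    """Collapse resampled pairs back to counts per age for fitting."""
    counts = {}
    for xi, resp in pairs:
        pos, tot = counts.get(xi, (0, 0))
        counts[xi] = (pos + resp, tot + 1)
    xs = sorted(counts)
    return xs, [counts[v][0] for v in xs], [counts[v][1] for v in xs]


def pairs_bootstrap(x, y, n, B=500, seed=2):
    """Bootstrap mean, s.e. and percentile interval for the slope."""
    rng = random.Random(seed)
    pairs = expand(x, y, n)
    b1 = fit_logistic(x, y, n)[1]
    slopes = []
    for _ in range(B):
        sample = rng.choices(pairs, k=len(pairs))   # pairs, with replacement
        slopes.append(fit_logistic(*regroup(sample))[1])
    mean = sum(slopes) / B
    se = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (B - 1))
    slopes.sort()
    ci = (slopes[int(0.025 * B)], slopes[int(0.975 * B) - 1])
    return b1, mean, se, ci


b1_hat, boot_mean, boot_se, boot_ci = pairs_bootstrap(ages, positive, size)
print("slope:", b1_hat, "bootstrap s.e.:", boot_se, "95% C.I.:", boot_ci)
```

Because whole pairs are resampled, the number of subjects at each age varies from replicate to replicate, unlike in the parametric scheme where the covariate values stay fixed.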

In many applications, significance testing can be used to assess the plausibility of a hypothesis. The likelihood ratio test, the score test and the Wald test are three asymptotically equivalent test procedures. For regular cases, their null distribution is a

The algorithm for parametric test of hypothesis given by (Fox, 2015 [

1) Estimate parameters (

2) Estimate

3) Draw bootstrap sample

4) For each

5) Calculate bootstrap P-value by

where # represents the number of times.
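A parametric bootstrap test of no age effect can be sketched as below. Several choices here are assumptions made for illustration and are not taken from the paper: the likelihood ratio statistic is used as the test statistic, bootstrap counts are simulated from the fitted null model with a common prevalence, and the P-value uses the convention (1 + #{T* >= T}) / (B + 1). The data are made up and the function names are hypothetical.

```python
import math
import random

# Made-up grouped counts (age, seropositive, sample size); NOT the Keil data.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
positive = [27, 36, 46, 56, 66, 75, 82, 87]
size = [100] * 8


def inv_logit(eta):
    eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, n, iters=25):
    """Newton-Raphson MLE of (b0, b1) in logit(p) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = inv_logit(b0 + b1 * xi)
            w = ni * p * (1.0 - p)
            r = yi - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def lr_statistic(x, y, n):
    """Twice the log-likelihood gain of the age model over a constant prevalence."""
    b0, b1 = fit_logistic(x, y, n)
    ll_full = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = inv_logit(b0 + b1 * xi)
        ll_full += yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    pos, tot = sum(y), sum(n)
    p0 = pos / tot                       # MLE of the common prevalence under H0
    ll_null = pos * math.log(p0) + (tot - pos) * math.log(1.0 - p0)
    return 2.0 * (ll_full - ll_null)


def parametric_bootstrap_test(x, y, n, B=199, seed=3):
    rng = random.Random(seed)
    t_obs = lr_statistic(x, y, n)
    p0 = sum(y) / sum(n)
    exceed = 0
    for _ in range(B):
        # simulate counts under H0: the same prevalence at every age
        y_star = [sum(rng.random() < p0 for _ in range(ni)) for ni in n]
        if lr_statistic(x, y_star, n) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (B + 1)


t_obs, p_value = parametric_bootstrap_test(ages, positive, size)
print("LR statistic:", t_obs, "bootstrap P-value:", p_value)
```

With B = 199 replicates the smallest attainable P-value under this convention is 1/200 = 0.005; a larger B gives a finer resolution at the cost of more refits.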

An algorithm for non-parametric test of hypothesis given by (Fox, 2015 [

1) Make a new dataset for binary response with covariate(s)

2) Estimate parameters (

3) By fixing x, draw bootstrap sample by sampling from only y with replacement from the new dataset

4) For each

5) Calculate bootstrap P-value by

where # represents the number of times.
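The non-parametric test can be sketched in the same spirit: x is held fixed and only the responses y are resampled with replacement, which removes any association between age and infection status in the bootstrap samples. The absolute value of the estimated slope as test statistic and the P-value convention (1 + #{|b1*| >= |b1|}) / (B + 1) are assumptions made for this illustration; the data are made up and the names are hypothetical.

```python
import math
import random

# Made-up grouped counts (age, seropositive, sample size); NOT the Keil data.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
positive = [27, 36, 46, 56, 66, 75, 82, 87]
size = [100] * 8


def fit_logistic(x, y, n, iters=25):
    """Newton-Raphson MLE of (b0, b1) in logit(p) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)
            r = yi - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def resample_y_test(x, y, n, B=199, seed=4):
    """Bootstrap P-value for H0: no age effect, resampling y with x fixed."""
    rng = random.Random(seed)
    # binary response for every subject, laid out age group by age group
    responses = []
    for yi, ni in zip(y, n):
        responses += [1] * yi + [0] * (ni - yi)
    slope_obs = abs(fit_logistic(x, y, n)[1])
    exceed = 0
    for _ in range(B):
        y_star = rng.choices(responses, k=len(responses))  # y only, with replacement
        counts, start = [], 0
        for ni in n:                      # x stays fixed: same group sizes
            counts.append(sum(y_star[start:start + ni]))
            start += ni
        if abs(fit_logistic(x, counts, n)[1]) >= slope_obs:
            exceed += 1
    return slope_obs, (1 + exceed) / (B + 1)


slope_obs, p_value = resample_y_test(ages, positive, size)
print("observed |slope|:", slope_obs, "bootstrap P-value:", p_value)
```

For a binary response, drawing y with replacement from the pooled responses amounts to drawing independent Bernoulli values with the pooled prevalence, so this scheme generates data under the same null model as simulating from a fitted constant-prevalence model.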

The parameter estimates together with standard errors (s.e.) and confidence intervals (C.I.) by logistic model (1) using the classical approach are shown in Table 1.

Results obtained from the logistic model (1) by parametric bootstrap are shown in Table 2.

The parameter estimates together with standard errors (s.e.) and confidence intervals (C.I.) of the logistic model (1) using the non-parametric bootstrap approach are presented in Table 3.

From Tables 1-3, it can be observed that the parameter estimates are very close. The standard errors for the parametric bootstrap were slightly smaller than those for the non-parametric bootstrap and very close to those obtained from the classical approach. This is due to the fact that in both the classical and parametric bootstrap methods,

Parameter | Estimate | s.e. | 95% C.I. | z value | Pr(>\|z\|) |
---|---|---|---|---|---|
Intercept | −1.4301 | 0.1736 | (−1.7704, −1.0899) | −8.24 | <0.0001 |
AGE | 0.0838 | 0.0066 | (0.0709, 0.0967) | 12.72 | <0.0001 |
Odds ratio (AGE) | 1.0874 | 0.0072 | (1.0735, 1.1015) | | |
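The last row of the classical table (the odds ratio for age) can be checked from the AGE row by exponentiation. The short sketch below assumes a Wald interval on the coefficient scale and a delta-method standard error for the odds ratio; both are consistent with the reported figures but are assumptions about how they were obtained.

```python
import math

# Values copied from the AGE row of the classical table.
beta_age, se_age = 0.0838, 0.0066
z = 1.959964  # 97.5% quantile of the standard normal

wald_ci = (beta_age - z * se_age, beta_age + z * se_age)
odds_ratio = math.exp(beta_age)                  # odds ratio per year of age
or_ci = (math.exp(0.0709), math.exp(0.0967))     # exponentiated reported limits
or_se = odds_ratio * se_age                      # delta-method approximation

print("Wald 95% C.I.:", [round(v, 4) for v in wald_ci])
print("odds ratio:", round(odds_ratio, 4), "s.e.:", round(or_se, 4))
print("odds ratio 95% C.I.:", [round(v, 4) for v in or_ci])
```

To four decimals these calculations give 1.0874 for the odds ratio, 0.0072 for its standard error and (1.0735, 1.1015) for its interval, matching the table. An odds ratio of about 1.09 means the odds of being seropositive rise by roughly 9% per additional year of age.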

Parameter | Estimate | s.e. | 95% C.I. | P-value |
---|---|---|---|---|
Intercept | −1.4412 | 0.1756 | (−1.7988, −1.1119) | |
AGE | 0.0844 | 0.0066 | (0.0722, 0.0981) | <0.0001 |
Odds ratio (AGE) | 1.0880 | 0.0072 | (1.0748, 1.1031) | |

Parameter | Estimate | s.e. | 95% C.I. | P-value |
---|---|---|---|---|
Intercept | −1.4400 | 0.1813 | (−1.8107, −1.1007) | |
AGE | 0.0843 | 0.0070 | (0.0716, 0.0988) | <0.0001 |
Odds ratio (AGE) | 1.0880 | 0.0076 | (1.0743, 1.1038) | |

the design matrix for the covariate

The P-values obtained for testing hypothesis

Comparing the lengths of the confidence intervals from the three methods, the non-parametric intervals are wider than the classical and parametric ones, while the classical and parametric intervals have similar lengths.
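The comparison of interval lengths can be reproduced directly from the intercept and odds-ratio intervals reported in the three tables. The snippet below is only a small arithmetic check; the dictionary layout and key names are illustrative.

```python
# Interval limits copied from the three tables (classical, parametric
# bootstrap, non-parametric bootstrap).
intervals = {
    "classical": {"intercept": (-1.7704, -1.0899), "odds_ratio": (1.0735, 1.1015)},
    "parametric": {"intercept": (-1.7988, -1.1119), "odds_ratio": (1.0748, 1.1031)},
    "non-parametric": {"intercept": (-1.8107, -1.1007), "odds_ratio": (1.0743, 1.1038)},
}

# Length of each interval, rounded to the precision of the tables.
lengths = {
    method: {name: round(hi - lo, 4) for name, (lo, hi) in rows.items()}
    for method, rows in intervals.items()
}
for method, row in lengths.items():
    print(method, row)
```

For the intercept the lengths are 0.6805 (classical), 0.6869 (parametric) and 0.7100 (non-parametric); for the odds ratio they are 0.0280, 0.0283 and 0.0295, so the non-parametric intervals are the widest in both cases.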

The 95% confidence intervals for predicted prevalence by using both parametric and non-parametric methods are presented in

The bootstrap techniques used for estimation and testing proved flexible, and most of their results were similar to the classical results established under probability theory.

From the classical logistic regression estimates, it was observed that the prevalence of Hepatitis A infection increased with age. The parametric and non-parametric bootstrap methods used to investigate the effect of age gave similar results.

We conclude that this computer-intensive method gives an idea of the asymptotic behavior of estimators and is easy to implement through simulation.

Adjei, I.A. and Karim, R. (2016) An Application of Bootstrapping in Logistic Regression Model. Open Access Library Journal, 3: e3049. http://dx.doi.org/10.4236/oalib.1103049

Dataset
