Open Journal of Statistics

Volume 5, Issue 5 (August 2015)

ISSN Print: 2161-718X   ISSN Online: 2161-7198

Small Sample Behaviors of the Delete-d Cross Validation Statistic

PP. 382-392
DOI: 10.4236/ojs.2015.55040
Author(s): Kastens, J.

ABSTRACT

Built upon an iterative process of resampling without replacement and out-of-sample prediction, the delete-d cross validation statistic CV(d) provides a robust estimate of forecast error variance. To compute CV(d), a dataset consisting of n observations of predictor and response values is systematically and repeatedly partitioned (split) into subsets of size n − d (used for model training) and d (used for model testing). Two aspects of CV(d) are explored in this paper. First, estimates for the unknown expected value E[CV(d)] are simulated in an OLS linear regression setting. Results suggest general formulas for E[CV(d)] dependent on σ² (“true” model error variance), n − d (training set size), and p (number of predictors in the model). The conjectured E[CV(d)] formulas are connected back to theory and generalized. The formulas break down at the two largest allowable d values (d = n − p − 1 and d = n − p, the 1 and 0 degrees of freedom cases), and numerical instabilities are observed at these points. An explanation for this distinct behavior remains an open question. For the second analysis, simulation is used to demonstrate how the previously established asymptotic conditions {d/n → 1 and n − d → ∞ as n → ∞} required for optimal linear model selection using CV(d) for model ranking are manifested in the smallest sample setting, using either independent or correlated candidate predictors.
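
The partition-and-predict procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `delete_d_cv`, the choice to average squared prediction error within each split and then across all splits, and the simulated data are all assumptions made for the example. It enumerates every way of holding out d of the n observations, so it is only practical for small n.

```python
from itertools import combinations

import numpy as np


def delete_d_cv(X, y, d):
    """Hypothetical helper: exhaustive delete-d cross validation for OLS.

    X is an (n, p) design matrix used as given (add an intercept column
    yourself if one is wanted); y has length n. Returns the held-out mean
    squared prediction error averaged over all C(n, d) splits.
    """
    n, p = X.shape
    # The training set must keep at least p rows for an OLS fit.
    if not 1 <= d <= n - p:
        raise ValueError("d must satisfy 1 <= d <= n - p")
    all_idx = np.arange(n)
    split_errors = []
    for test in combinations(range(n), d):
        test = np.array(test)
        train = np.setdiff1d(all_idx, test)  # the remaining n - d rows
        # Fit OLS on the training rows only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Out-of-sample prediction error on the d held-out rows.
        resid = y[test] - X[test] @ beta
        split_errors.append(np.mean(resid ** 2))
    return float(np.mean(split_errors))


# Simulated example (assumed setup): n = 12 observations, p = 2 predictors.
rng = np.random.default_rng(0)
n, p, sigma = 12, 2, 1.0
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + sigma * rng.normal(size=n)
cv_values = {d: delete_d_cv(X, y, d) for d in (1, 3, 6)}
```

With d = 1 this reduces to ordinary leave-one-out cross validation. For larger n the number of splits grows combinatorially, so a random sample of splits would normally replace the full enumeration.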

Share and Cite:

Kastens, J. (2015) Small Sample Behaviors of the Delete-d Cross Validation Statistic. Open Journal of Statistics, 5, 382-392. doi: 10.4236/ojs.2015.55040.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.