Open Journal of Statistics

Volume 13, Issue 6 (December 2023)

ISSN Print: 2161-718X   ISSN Online: 2161-7198

Google-based Impact Factor: 0.53

Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random

PP. 850-871
DOI: 10.4236/ojs.2023.136043

ABSTRACT

Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also produces data of very high dimension. In addition, some patients drop out of the study, which leaves their responses missing. A method is therefore needed for estimating the mean of a response variable with missing values in ultra-high dimensional datasets. In this paper, we propose RF-SIS, a two-stage ultra-high dimensional variable screening method based on random forest regression, which addresses the difficulty of handling missing values when the data dimension is excessive. After RF-SIS reduces the dimension, mean interpolation is applied to the missing responses; the combined procedure is denoted RF-SIS-MI. Simulation results show that, compared with estimation after directly deleting the missing observations, RF-SIS-MI has clear advantages in terms of interval coverage proportion, average interval length, and average absolute deviation.
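
The screen-then-impute idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' RF-SIS-MI: absolute marginal correlation stands in for random-forest-based screening, and a one-variable least-squares fit stands in for their interpolation step. The data-generating model, the missingness rule, and all names (`simulate`, `pearson`, `fit_line`) are assumptions made for illustration only.

```python
import random

def simulate(n=2000, p=50, seed=0):
    """Toy data: y depends on feature 0 only; its true mean is 0."""
    rng = random.Random(seed)
    X, y, observed = [], [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(p)]
        yi = 2.0 * row[0] + rng.gauss(0, 1)
        # Missing at random: the chance of observing y depends only on
        # an always-observed covariate, never on y itself.
        p_obs = 0.9 if row[0] < 0 else 0.3
        X.append(row)
        y.append(yi)
        observed.append(rng.random() < p_obs)
    return X, y, observed

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sxy / sxx
    return my - slope * mx, slope

X, y, observed = simulate()
cc_rows = [i for i, seen in enumerate(observed) if seen]  # complete cases
y_cc = [y[i] for i in cc_rows]

# Stage 1 (stand-in for screening): score each feature on the complete
# cases and keep the single highest-scoring one.
scores = [abs(pearson([X[i][j] for i in cc_rows], y_cc))
          for j in range(len(X[0]))]
top = max(range(len(scores)), key=scores.__getitem__)

# Stage 2 (stand-in for imputation): predict each missing response from
# the selected feature, then average over all n subjects.
intercept, slope = fit_line([X[i][top] for i in cc_rows], y_cc)
filled = [y[i] if observed[i] else intercept + slope * X[i][top]
          for i in range(len(y))]

est_cc = sum(y_cc) / len(y_cc)      # delete missing observations
est_imp = sum(filled) / len(filled) # impute, then average
print(f"selected feature: {top}")
print(f"complete-case mean: {est_cc:.3f}")
print(f"imputed mean:       {est_imp:.3f}")
```

In this setup the responses are observed more often when feature 0 is negative, so the complete-case mean is pulled below the true value of 0, while the imputed mean uses the covariate information from subjects whose responses are missing.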

Share and Cite:

Li, W. , Deng, G. and Pan, D. (2023) Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random. Open Journal of Statistics, 13, 850-871. doi: 10.4236/ojs.2023.136043.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.