TITLE:
Statistical Methods of SNP Data Analysis and Applications
AUTHORS:
Alexander Bulinski, Oleg Butkovsky, Victor Sadovnichy, Alexey Shashkin, Pavel Yaskov, Alexander Balatskiy, Larisa Samokhodskaya, Vsevolod Tkachuk
KEYWORDS:
Genetic Data Statistical Analysis; Multifactor Dimensionality Reduction; Ternary Logic Regression; Random Forests; Stochastic Gradient Boosting; Independent Rule; Single Nucleotide Polymorphisms; Coronary Heart Disease; Myocardial Infarction
JOURNAL NAME:
Open Journal of Statistics, Vol. 2 No. 1, January 6, 2012
ABSTRACT: We develop various statistical methods important for multidimensional genetic data analysis and establish theorems justifying their application. We concentrate on multifactor dimensionality reduction, logic regression, random forests, and stochastic gradient boosting, along with new modifications of these methods. We use complementary approaches to study the risk of complex diseases such as cardiovascular diseases. The roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors are examined. The data analysis concerning coronary heart disease and myocardial infarction was performed on the Lomonosov Moscow State University supercomputer “Chebyshev”.