TITLE:
Identification of Deleterious Single Amino Acid Polymorphism Using Sequence Information Based on Feature Selection and Parameter Optimization
AUTHORS:
Xiao Chen, Qinke Peng, Jia Lv
KEYWORDS:
Single Amino Acid Polymorphisms; Support Vector Machine; Univariate Marginal Distribution Algorithm
JOURNAL NAME:
Engineering, Vol. 5, No. 10B, December 18, 2013
ABSTRACT:
Most human genetic variations are single nucleotide polymorphisms (SNPs), and
among them non-synonymous SNPs, also known as single amino acid polymorphisms
(SAPs), attract extensive interest. SAPs can be neutral or disease associated,
and many studies have sought to distinguish deleterious SAPs from neutral ones.
Because many previous methods rely on both structural and sequence features of
the SAP, they cannot be applied when protein structures are unavailable. In this
paper, we developed a method based on the univariate marginal distribution
algorithm (UMDA) and the support vector machine (SVM) that uses protein sequence
information to predict the disease association of SAPs. For each SAP, we
extracted a set of features that are independent of protein structure. An SVM
classifier whose parameters were tuned by grid search was then applied to
predict the possible disease association of SAPs, and it reached good
prediction accuracy. However, the SVM input contains irrelevant and noisy
features, and the SVM parameters also affect prediction performance, so we
introduced a UMDA-based wrapper approach to search for the best combination of
features and parameters. The UMDA-based method greatly improved prediction
performance, and compared with current methods, our method achieved better
performance.