TITLE:
The Comparison between Random Forest and Support Vector Machine Algorithm for Predicting β-Hairpin Motifs in Proteins
AUTHORS:
Shaochun Jia, Xiuzhen Hu, Lixia Sun
KEYWORDS:
Random Forest Algorithm; Support Vector Machine Algorithm; β-Hairpin Motif; Increment of Diversity; Scoring Function; Predicted Secondary Structure Information
JOURNAL NAME:
Engineering, Vol.5 No.10B, December 17, 2013
ABSTRACT:
Building on earlier research on predicting β-hairpin motifs in proteins, we apply the Random Forest and Support Vector Machine algorithms to predict β-hairpin motifs in the ArchDB40 dataset. Motifs with loop lengths of 2 to 8 amino acid residues are extracted as the research object, and a fixed-length pattern of 12 amino acids is selected. When the same characteristic parameters and the same test method are used, the Random Forest algorithm is more effective than the Support Vector Machine. In addition, because the Random Forest algorithm does not overfit when the characteristic parameters are high-dimensional, we use Random Forest with higher-dimensional characteristic parameters to predict β-hairpin motifs. This yields better prediction results: the overall accuracy and Matthews correlation coefficient (MCC) of 5-fold cross-validation reach 83.3% and 0.59, respectively.
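The abstract reports its results as an overall accuracy and a Matthews correlation coefficient obtained by 5-fold cross-validation. The following is a minimal sketch of how these two figures are commonly computed. It is not the authors' code. It assumes binary labels (1 = β-hairpin, 0 = non-hairpin) and pools the held-out predictions from all folds before scoring. `fit_predict` is a hypothetical callable that stands in for either classifier: it trains on the training folds and returns labels for the held-out fold.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_predictions(X: Sequence, y: Sequence[int],
                                fit_predict: Callable, k: int = 5,
                                seed: int = 0) -> List[int]:
    """Predict every sample exactly once, using a model trained on the other k-1 folds.

    fit_predict(X_train, y_train, X_test) -> list of 0/1 labels (hypothetical
    interface; wrap a Random Forest or SVM implementation behind it).
    """
    preds: List[int] = [0] * len(y)
    for test_fold in kfold_indices(len(y), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(y)) if i not in held_out]
        fold_preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in test_fold])
        for i, p in zip(test_fold, fold_preds):
            preds[i] = p
    return preds


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN for 0/1 labels, where 1 marks a beta-hairpin."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Overall accuracy and Matthews correlation coefficient from pooled predictions."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy placeholder data (not from ArchDB40) and a majority-class baseline
    # standing in for a real classifier.
    y = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    X = [[float(v)] for v in range(len(y))]

    def majority(X_tr, y_tr, X_te):
        label = max(set(y_tr), key=y_tr.count)
        return [label] * len(X_te)

    preds = cross_validated_predictions(X, y, majority, k=5)
    acc, mcc = accuracy_and_mcc(y, preds)
    print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

MCC is included alongside accuracy because hairpin and non-hairpin samples need not be balanced. In the toy run above, a classifier that always predicts the majority class can still score a respectable accuracy, but its MCC is 0. To compare the two algorithms, run the same `kfold_indices` split for both so that they are scored on identical folds.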