TITLE:
PFP-RFSM: Protein fold prediction by using random forests and sequence motifs
AUTHORS:
Junfei Li, Jigang Wu, Ke Chen
KEYWORDS:
Protein Fold; Structure Analysis; Random Forest; Sequence Motifs
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.6 No.12, December 20, 2013
ABSTRACT:
Protein tertiary structure is indispensable in revealing the biological functions of
proteins. De novo prediction of
protein tertiary structure depends on protein fold recognition. This study
proposes a novel method for prediction of protein fold types which takes primary
sequence as input. The proposed method, PFP-RFSM, employs
a random forest classifier and a comprehensive feature representation, including
both sequence and predicted structure descriptors. In particular, we
propose a method for generating features based on sequence motifs; these
features are applied to protein fold prediction for the first time. PFP-RFSM and ten
representative protein fold predictors are evaluated on a benchmark dataset
consisting of 27 fold types. Experiments demonstrate that PFP-RFSM outperforms
all ten of the compared predictors, improving success rates by 2%-14%.
The results suggest that sequence motifs are effective for the classification and
analysis of protein sequences.
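
The idea of turning sequence motifs into features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which motifs PFP-RFSM uses or how they are encoded, so the motif list, the regular-expression encoding, and the function name motif_features below are all assumptions made for illustration. Each sequence is mapped to a vector holding one occurrence count per motif.

```python
import re

# Hypothetical motifs written as regular expressions ("." matches any residue).
# These are illustrative placeholders, not the motifs used by PFP-RFSM.
MOTIFS = [r"G.G..G", r"C..C", r"N[^P][ST]"]


def motif_features(sequence, motifs=MOTIFS):
    """Return one count per motif, counting overlapping occurrences."""
    seq = sequence.upper()
    # A zero-width lookahead matches at every start position where the motif
    # occurs, so overlapping occurrences are each counted once.
    return [len(re.findall(f"(?=(?:{m}))", seq)) for m in motifs]


if __name__ == "__main__":
    print(motif_features("GAGAAGCAACNGS"))
```

A vector of this kind could be concatenated with other sequence and predicted-structure descriptors and passed to a random forest classifier; that step is omitted here because the abstract gives no details of the classifier's configuration.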