PFP-RFSM: Protein fold prediction by using random forests and sequence motifs

PP. 1161-1170
DOI: 10.4236/jbise.2013.612145

ABSTRACT

Protein tertiary structure is indispensable for revealing the biological functions of proteins, and de novo prediction of protein tertiary structure depends on protein fold recognition. This study proposes a novel method that predicts protein fold types from the primary sequence. The proposed method, PFP-RFSM, combines a random forest classifier with a comprehensive feature representation that includes both sequence descriptors and predicted-structure descriptors. In particular, we propose a method for generating features from sequence motifs, and these features are used here for the first time in protein fold prediction. PFP-RFSM and ten representative protein fold predictors are evaluated on a benchmark dataset covering 27 fold types. Experiments show that PFP-RFSM outperforms all ten existing predictors, improving the success rates by 2%-14%. The results suggest that sequence motifs are effective features for the classification and analysis of protein sequences.
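The abstract does not specify how motif-based features are encoded. As a rough illustration only, here is a minimal sketch that assumes each motif is written as a regular expression and that a protein is represented by its amino acid composition followed by per-motif occurrence counts. The motif list, the function names (`composition_features`, `motif_features`, `feature_vector`, `success_rate`) and the feature layout are hypothetical and are not PFP-RFSM's actual encoding; the success rate is assumed to be the usual overall fraction of correctly classified test proteins.

```python
import re
from typing import List, Sequence

# Hypothetical feature layout: 20 composition values, then one count per motif.
# The 20 standard amino acids in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(seq: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def motif_features(seq: str, motifs: Sequence[str]) -> List[int]:
    """Count (possibly overlapping) occurrences of each motif regex in seq."""
    seq = seq.upper()
    counts = []
    for motif in motifs:
        # A zero-width lookahead lets overlapping matches be counted.
        pattern = re.compile(f"(?={motif})")
        counts.append(sum(1 for _ in pattern.finditer(seq)))
    return counts


def feature_vector(seq: str, motifs: Sequence[str]) -> List[float]:
    """Concatenate composition and motif-count features (assumed layout)."""
    return composition_features(seq) + [float(c) for c in motif_features(seq, motifs)]


def success_rate(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Assumed definition: fraction of test proteins whose fold is predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Illustrative (hypothetical) motif set written as regular expressions.
    motifs = ["C..C", "GK[ST]", "N[^P][ST]"]
    print(motif_features("MCAACGKSNAT", motifs))
    print(len(feature_vector("MCAACGKSNAT", motifs)))
```

The lookahead is a deliberate choice: `re.finditer("C..C", "CCCCCC")` finds only one non-overlapping match, whereas the lookahead form counts all three start positions. Vectors built this way could be stacked into a matrix and passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` via its `fit` and `predict` methods.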

 

Cite as:

Li, J., Wu, J. and Chen, K. (2013) PFP-RFSM: Protein fold prediction by using random forests and sequence motifs. Journal of Biomedical Science and Engineering, 6, 1161-1170. doi: 10.4236/jbise.2013.612145.

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.