TITLE:
Sequence based prediction of relative solvent accessibility using two-stage support vector regression with confidence values
AUTHORS:
Ke Chen, Michal Kurgan, Lukasz Kurgan
KEYWORDS:
relative solvent accessibility; support vector regression; PSI-BLAST; PSI-PRED; secondary protein structure
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.1 No.1, June 6, 2008
ABSTRACT: Predicted relative solvent accessibility (RSA) provides useful information for the prediction of binding sites and for the reconstruction of the 3D structure of a protein from its sequence. Recent years have seen the development of several RSA prediction methods, including those that generate real values and those that predict discrete states (buried vs. exposed). We propose a novel method for real-value prediction that aims to lower the prediction error relative to six existing methods. The proposed method is based on a two-stage Support Vector Regression (SVR) predictor. The improved prediction quality results from the developed composite sequence representation, which includes a custom-selected subset of features from the PSI-BLAST profile, the secondary structure predicted with PSI-PRED, and a binary code that indicates the position of a given residue with respect to the sequence termini. Cross-validation tests on a benchmark dataset show that our method achieves a mean absolute error of 14.3 and a correlation of 0.68. We also propose a confidence value that is associated with each predicted RSA value. The confidence is computed from the difference between the predictions of the two-stage SVR and those of a second, two-stage Linear Regression (LR) predictor. The confidence values can be used to indicate the quality of the output RSA predictions.
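The two evaluation measures named in the abstract (mean absolute error and correlation) and the idea of deriving a confidence from the disagreement between two predictors can be illustrated with a minimal sketch. The abstract does not give the confidence formula, so `disagreement_confidence` below is a hypothetical stand-in: it assumes RSA is expressed on a 0 to 100 scale and maps a larger gap between the SVR and LR predictions to a lower confidence. The correlation is assumed to be the Pearson coefficient. All function names and the toy numbers are illustrative and are not taken from the paper.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted RSA values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def disagreement_confidence(svr_pred, lr_pred, scale=100.0):
    """Hypothetical confidence in [0, 1]: one minus the scaled gap between
    the two predictors (ASSUMPTION, not the formula used in the paper)."""
    return [max(0.0, 1.0 - abs(s - l) / scale) for s, l in zip(svr_pred, lr_pred)]


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    actual = [10.0, 45.0, 80.0, 30.0]
    svr = [15.0, 40.0, 70.0, 35.0]
    lr = [20.0, 42.0, 55.0, 33.0]
    print("MAE:", mean_absolute_error(actual, svr))
    print("Correlation:", pearson_correlation(actual, svr))
    print("Confidence:", disagreement_confidence(svr, lr))
```

In this sketch, residues on which the two predictors agree closely receive a confidence near 1, while large disagreements push the value toward 0. Any monotonically decreasing function of the gap would serve the same illustrative purpose.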