TITLE:
mLysPTMpred: Multiple Lysine PTM Site Prediction Using Combination of SVM with Resolving Data Imbalance Issue
AUTHORS:
Md. Al Mehedi Hasan, Shamim Ahmad
KEYWORDS:
Multi-Label PTM Site Predictor, Sequence-Coupling Model, General PseAAC, Data Imbalance Issue, Different Error Costs, Support Vector Machine
JOURNAL NAME:
Natural Science, Vol.10 No.9, September 30, 2018
ABSTRACT:
Post-translational modification (PTM) increases the functional diversity of proteins by
introducing new functional groups to the side chains of their amino acids.
Among all amino acid residues, the side chain of lysine (K) can undergo many
types of PTM, called K-PTM, such as “acetylation”, “crotonylation”, “methylation” and “succinylation”.
Multiple PTMs can also occur at the same lysine residue of a protein, which makes
multi-label PTM site identification necessary. However, most existing computational
methods were built to predict single-label PTM sites, and the very few developed for
the multi-label problem need further improvement. Here, we have developed a
computational tool termed mLysPTMpred to predict multi-label lysine PTM sites
by 1) incorporating sequence-coupled information into the general pseudo
amino acid composition, 2) balancing the effect of the skewed training dataset with
the Different Error Costs method, and 3) constructing a multi-label predictor from
a combination of support vector machines (SVMs). This predictor achieved 83.73%
accuracy in predicting multi-label K-PTM sites. Moreover, all
the experimental results, including accuracy, outperformed those of the existing
predictor iPTM-mLys. A user-friendly web server for mLysPTMpred is available at http://research.ru.ac.bd/mLysPTMpred/.
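
As a rough illustration of steps 2) and 3), the sketch below shows one common way to set class-dependent error costs for an imbalanced binary problem and to merge several binary decisions into one multi-label result. It is not the authors' implementation: the function names `dec_costs` and `combine_binary_decisions`, the `base_cost` parameter, and the inverse-class-size cost ratio are assumptions made for this example, and the abstract does not state how the costs were actually chosen.

```python
from collections import Counter

# The four K-PTM types named in the abstract.
PTM_TYPES = ("acetylation", "crotonylation", "methylation", "succinylation")


def dec_costs(labels, base_cost=1.0):
    """Return (cost_positive, cost_negative) for a list of 0/1 labels.

    Assumed heuristic (not taken from the paper): the larger class keeps
    base_cost and the smaller class is scaled by the size ratio, so that
    both classes carry the same total misclassification penalty.
    """
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    if n_pos <= n_neg:
        return base_cost * n_neg / n_pos, base_cost
    return base_cost, base_cost * n_pos / n_neg


def combine_binary_decisions(decisions):
    """Merge per-PTM binary outputs (dict: name -> bool) into a label set."""
    return {ptm for ptm in PTM_TYPES if decisions.get(ptm, False)}


# Hypothetical toy data: 10 modified sites versus 90 unmodified sites.
labels = [1] * 10 + [0] * 90
c_pos, c_neg = dec_costs(labels)

# Hypothetical outputs of four independent binary classifiers for one site.
site_labels = combine_binary_decisions(
    {"acetylation": True, "crotonylation": False, "methylation": True}
)
```

In practice, costs of this kind would be handed to an SVM trainer as per-class penalties (for instance through the `class_weight` argument of scikit-learn's `SVC`), with one such classifier trained per PTM type. An empty result set corresponds to a lysine predicted to carry none of the four modifications.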