TITLE:
A novel over-sampling method and its application to miRNA prediction
AUTHORS:
Xuan Tho Dang, Osamu Hirose, Thammakorn Saethang, Vu Anh Tran, Lan Anh T. Nguyen, Tu Kien T. Le, Mamoru Kubo, Yoichi Yamada, Kenji Satou
KEYWORDS:
Imbalanced Dataset; Over-Sampling; SMOTE; miRNA Classification
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.6 No.2A, February 27, 2013
ABSTRACT:
MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable
role in the gene regulation of many biological processes. Most current
computational methods, both comparative and non-comparative, distinguish human
precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and
other non-coding RNAs (ncRNAs). Although a few approaches have achieved
promising results by applying class imbalance learning methods, existing methods
have still not fully solved the problem caused by the imbalanced class
distribution of these datasets. For example, SMOTE is a well-known,
general-purpose over-sampling method for this problem; however, in some cases it
fails to improve, or even reduces, classification performance. We therefore
developed a novel over-sampling method named incremental-SMOTE to distinguish
human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs.
Experimental results on the pre-miRNA datasets of Batuwita et al. showed that
our method achieved better Sensitivity and G-mean than the control (no
over-sampling), SMOTE, and several modified successors of SMOTE, including
safe-level-SMOTE and borderline-SMOTE. In addition, we applied the novel method
to five imbalanced benchmark datasets from the UCI Machine Learning Repository
and achieved improvements in Sensitivity and G-mean.
These results suggest that our method outperforms SMOTE and several of its
successors in various biomedical classification problems, including miRNA
classification.
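SMOTE, the baseline this abstract compares against, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbours. The sketch below illustrates only that standard interpolation step; it is not the authors' incremental-SMOTE, whose details are not given in this abstract. The helper names `k_nearest` and `smote` are hypothetical, and choosing the base sample at random (the original SMOTE formulation instead generates a fixed number of synthetic points per minority sample) is a simplifying assumption.

```python
import math
import random


def k_nearest(sample, others, k):
    # Hypothetical helper: brute-force k nearest neighbours by Euclidean distance.
    return sorted(others, key=lambda o: math.dist(sample, o))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a minority sample x
    and one of its k nearest minority neighbours x_nn:
        x_new = x + gap * (x_nn - x),  with gap drawn uniformly from [0, 1)
    Assumption: the base sample x is picked at random for each new point.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]  # neighbours exclude x itself
        x_nn = rng.choice(k_nearest(x, others, k))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_nn)))
    return synthetic


# Usage: oversample a tiny 2-D minority class with k = 2 neighbours.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. This is also why plain SMOTE can place points in regions that overlap the majority class, which is the kind of weakness that variants such as safe-level-SMOTE and borderline-SMOTE try to address.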
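The abstract evaluates methods by Sensitivity and G-mean rather than plain accuracy. The abstract does not state the formulas, so the sketch below assumes the definitions commonly used in class-imbalance work: Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP), and G-mean = sqrt(Sensitivity × Specificity). The function name `sensitivity_specificity_gmean` is hypothetical.

```python
import math


def sensitivity_specificity_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, G-mean) for binary labels.

    Assumed definitions (standard in imbalanced-learning literature):
        Sensitivity = TP / (TP + FN)   recall on the positive (minority) class
        Specificity = TN / (TN + FP)   recall on the negative (majority) class
        G-mean      = sqrt(Sensitivity * Specificity)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0  # guard against an empty class
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec, math.sqrt(sens * spec)


# Usage: 2 positives (e.g. real pre-miRNAs) among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(sensitivity_specificity_gmean(y_true, [0] * 10))       # (0.0, 1.0, 0.0)
print(sensitivity_specificity_gmean(y_true, [1] + [0] * 9))  # sensitivity 0.5, G-mean about 0.707
```

A classifier that labels everything negative scores 80% accuracy here, yet its G-mean is 0 because it finds no positives. This illustrates why G-mean and Sensitivity are more informative than accuracy on imbalanced datasets.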