TITLE:
Combined Use of k-Mer Numerical Features and Position-Specific Categorical Features in Fixed-Length DNA Sequence Classification
AUTHORS:
Dau Phan, Ngoc Giang Nguyen, Favorisen Rosyking Lumbanraja, Mohammad Reza Faisal, Bahriddin Abapihi, Bedy Purnama, Mera Kartika Delimayanti, Mamoru Kubo, Kenji Satou
KEYWORDS:
Sequence Classification, Numerical and Categorical Features, Feature Selection
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.10 No.8, August 30, 2017
ABSTRACT: To classify DNA sequences, k-mer frequency is widely used because it converts variable-length sequences into fixed-length numerical feature vectors. However, in the case of fixed-length DNA sequence classification, the subsequences starting at specific positions of a given sequence can also be used as categorical features. In a performance evaluation on six datasets of fixed-length DNA sequences, our algorithm, which combines both kinds of features, achieved performance comparable to or better than that of other state-of-the-art algorithms.
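The two feature types named in the abstract can be illustrated with a short sketch. It is a minimal illustration, not the authors' implementation. It assumes a plain A/C/G/T alphabet, uses relative frequencies (counts divided by the number of valid windows) as the numerical k-mer features, and treats the k-mer starting at each position as one categorical feature. The helper names (`all_kmers`, `kmer_frequencies`, `positional_kmers`, `combined_features`) and the `freq_`/`pos_` feature keys are hypothetical. The paper's actual choice of k, normalisation, encoding and feature selection may differ.

```python
from itertools import product
from typing import Dict, List, Union


def all_kmers(k: int) -> List[str]:
    """Every possible k-mer over the assumed alphabet {A, C, G, T}, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Numerical features: relative frequency of each k-mer.

    The output always has 4**k entries, whatever the length of seq,
    which is why k-mer frequency suits variable-length sequences.
    """
    seq = seq.upper()
    index = {kmer: i for i, kmer in enumerate(all_kmers(k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # windows containing other symbols (e.g. N) are skipped
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def positional_kmers(seq: str, k: int) -> List[str]:
    """Categorical features: the k-mer starting at each position.

    Only meaningful when all sequences share the same length, so that
    feature i refers to the same position in every sample.
    """
    seq = seq.upper()
    return [seq[start:start + k] for start in range(len(seq) - k + 1)]


def combined_features(seq: str, k: int) -> Dict[str, Union[float, str]]:
    """Merge both feature types into one record (hypothetical key naming)."""
    features: Dict[str, Union[float, str]] = {}
    for kmer, freq in zip(all_kmers(k), kmer_frequencies(seq, k)):
        features[f"freq_{kmer}"] = freq
    for pos, kmer in enumerate(positional_kmers(seq, k)):
        features[f"pos_{pos}"] = kmer
    return features


if __name__ == "__main__":
    record = combined_features("ACGTAC", 2)
    print(record["freq_AC"], record["pos_0"], record["pos_4"])
```

For the 6-nt sequence `ACGTAC` with k = 2, this yields 16 numerical features plus 5 positional categorical features (`AC`, `CG`, `GT`, `TA`, `AC`). A downstream classifier would typically need the categorical values encoded (for example one-hot), which this sketch leaves out.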