TITLE:
A novel voting system for the identification of eukaryotic genome promoters
AUTHORS:
Lin Lei, Kaiyan Feng, Zhisong He, Yudong Cai
KEYWORDS:
MRMR (Minimum Redundancy Maximum Relevance); Transcription Start Sites (TSS); Promoter Identification; Promoter Identifier Integration
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.3 No.7, July 15, 2010
ABSTRACT: Motivation: Accurate identification and delineation of promoters and transcription start sites (TSSs) is important for improving genome annotation and for designing experiments to study transcriptional regulation. Many promoter identifiers have been developed, but each has its own focus and limitations. We therefore introduce an integration scheme that combines several identifiers to achieve better prediction performance. Result: Eight promoter identifiers (Proscan, TSSG, TSSW, FirstEF, eponine, ProSOM, EP3, FPROM) are chosen for the investigation of integration. A feature selection method, mRMR (Minimum Redundancy Maximum Relevance), is applied in a novel way to promoter identifier selection, choosing a group of robust and complementary promoter identifiers. For comparison, four integration methods (SMV, WMV, SMV_IS, WMV_IS), ranging from simple to complex, are developed and applied to a training dataset of 1400 sequences and a testing dataset of 378 sequences. Five identifiers (FPROM, FirstEF, TSSG, eponine, TSSW) are chosen by mRMR, and their integration achieves correct prediction rates of 70.08% and 67.83% on the training and testing datasets respectively. This is better than any single identifier: the best single one achieves only 59.32% and 61.78% on the training and testing datasets respectively.
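The integration idea can be illustrated with a minimal sketch. The abstract does not spell out SMV and WMV, so the code below assumes they denote simple and weighted majority voting over binary identifier outputs (1 = promoter, 0 = non-promoter). The function names, the example votes, and the weights are hypothetical and are not taken from the paper; the SMV_IS and WMV_IS variants are not covered.

```python
from typing import List


def simple_majority_vote(votes: List[int]) -> int:
    """Return 1 (promoter) if strictly more than half of the votes are 1."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def weighted_majority_vote(votes: List[int], weights: List[float]) -> int:
    """Return 1 if the weight behind the 'promoter' votes exceeds half the total weight."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    positive_weight = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if positive_weight * 2 > sum(weights) else 0


# Hypothetical outputs of five identifiers for one sequence (assumption, not paper data).
example_votes = [1, 1, 0, 0, 1]
# Hypothetical weights, e.g. each identifier's accuracy on a training set (assumption).
example_weights = [3, 2, 2, 1, 1]

smv_result = simple_majority_vote(example_votes)
wmv_result = weighted_majority_vote(example_votes, example_weights)
```

In this sketch a weighted vote can overrule the head count: a single identifier with a large enough weight decides the outcome even when the other identifiers disagree.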