Accurate Plant MicroRNA Prediction Can Be Achieved Using Sequence Motif Features

Abstract

MicroRNAs (miRNAs) are short (~21 nt) nucleotide sequences that are either co-transcribed during the production of mRNA or organized in intergenic regions transcribed by RNA polymerase II. Drosha in animals and DCL1 in plants recognize pre-miRNAs, which set themselves apart by their characteristic stem-loop (hairpin) structure. This structure appears to be important for their recognition during maturation, the process that leads to functional mature miRNAs. A large body of research is available on computational pre-miRNA detection in animals, but less on detection in plants. Pre-miRNA prediction usually relies on machine learning, which requires converting each pre-miRNA into a set of computable features; many such features have been described. Here, we select a subset of the previously described features and add sequence motifs as new features. The resulting model, which we call MotifmiRNAPred, was tested on known pre-miRNAs listed in miRBase, and its accuracy was compared to that of existing approaches in the field. With an accuracy of 99.95% for the generalized plant model, it distinguishes itself from previously published results, which reach average accuracies between 74% and 98%. We believe that our approach is useful for predicting pre-miRNAs in plants without per-species adjustment.
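
As a rough illustration of what turning a pre-miRNA into motif-based features can look like, the following minimal Python sketch counts occurrences of a few motifs in a sequence and returns them as a feature vector. The motif list, the function name `motif_features`, and the example sequence are hypothetical and chosen only for illustration; they are not the motifs, code, or data used for MotifmiRNAPred.

```python
import re

# Hypothetical motifs for illustration only; not taken from the paper.
EXAMPLE_MOTIFS = ["UGGA", "GC[AU]G", "AAUC"]


def motif_features(sequence, motifs=EXAMPLE_MOTIFS):
    """Return one count per motif: the number of (possibly overlapping) matches."""
    # Normalize to upper-case RNA so DNA-style input (T) is also accepted.
    seq = sequence.upper().replace("T", "U")
    features = []
    for motif in motifs:
        # A lookahead matches at every start position, so overlaps are counted.
        pattern = re.compile("(?=(" + motif + "))")
        features.append(len(pattern.findall(seq)))
    return features


# Made-up example sequence, not a real pre-miRNA.
hairpin = "UGGAUGGAGCAGAAUCGCUG"
print(motif_features(hairpin))
```

A vector like this could be concatenated with other previously described features before being passed to a classifier; which classifier and which additional features to use is left open here.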

Share and Cite:

Yousef, M., Allmer, J. and Khalifa, W. (2016) Accurate Plant MicroRNA Prediction Can Be Achieved Using Sequence Motif Features. Journal of Intelligent Learning Systems and Applications, 8, 9-22. doi: 10.4236/jilsa.2016.81002.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.