A distribution pattern assisted method of transcription factor binding site discovery for both yeast and filamentous fungi


Transcription factors (TFs) are the core sentinels of gene regulation, functioning by binding to highly specific DNA sequences to activate or repress the recruitment of RNA polymerase. The ability to identify transcription factor binding sites (TFBSs) is necessary to understand gene regulation and infer regulatory networks. Although bioinformatics tools for the computational identification of TFBSs have been developed for years, accurate prediction remains challenging because the DNA motifs recognized by TFs are typically short and often lack obvious patterns. In this study, we introduce a new attribute, the motif distribution pattern (MDP), to assist in TFBS prediction. MDP was developed from a TF distribution pattern curve generated by analyzing 25 yeast TFs and 37 of their experimentally validated binding motifs, followed by the calculation of a score that quantifies the reliability of each motif prediction. MDP was then tested on a separate set of 7 TFs with known binding sites to validate the approach in silico. The method was further tested in a non-yeast system, the filamentous fungus Magnaporthe oryzae, using its transcription factor MoCRZ1. We demonstrate superior prediction reranking results using MDP compared with the widely used program MEME and four other predictors. The data show significant improvements in the ranking of validated TFBSs, and MDP provides a more sensitive, statistics-based approach to motif discovery.
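
The abstract describes MDP only at a high level: a positional distribution curve built from validated binding sites, and a score that measures how well a candidate motif's occurrences fit that curve, which is then used to rerank predictions. The sketch below illustrates that general idea under stated assumptions; it is not the authors' scoring formula, which the abstract does not give. The 1000 bp promoter window, 50 bp bins, pseudocount, and the log-ratio-versus-uniform score are hypothetical choices, and names such as `distribution_curve` and `mdp_like_score` are illustrative only.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical settings: promoter window and bin width, in bp upstream of the start codon.
WINDOW = 1000
BIN = 50
N_BINS = WINDOW // BIN


def bin_index(pos: int) -> int:
    """Map a distance upstream of the start codon to a bin, clamping to the window."""
    return min(max(pos, 0), WINDOW - 1) // BIN


def distribution_curve(positions: Sequence[int], pseudocount: float = 1.0) -> List[float]:
    """Normalized histogram of validated site positions, smoothed with a pseudocount."""
    counts = [pseudocount] * N_BINS
    for p in positions:
        counts[bin_index(p)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def mdp_like_score(positions: Sequence[int], curve: Sequence[float]) -> float:
    """Mean per-site log2 ratio of the reference curve versus a uniform background.

    Positive values mean the candidate's occurrences fall where validated sites
    tend to fall; negative values mean they fall in under-represented regions.
    """
    if not positions:
        return float("-inf")
    uniform = 1.0 / N_BINS
    return sum(math.log2(curve[bin_index(p)] / uniform) for p in positions) / len(positions)


def rerank(candidates: Dict[str, Sequence[int]], curve: Sequence[float]) -> List[str]:
    """Order candidate motifs (e.g. from a motif finder) by descending positional score."""
    return sorted(candidates, key=lambda m: mdp_like_score(candidates[m], curve), reverse=True)


if __name__ == "__main__":
    # Toy data, not from the paper: validated sites clustered 100-300 bp upstream.
    reference = distribution_curve([120, 150, 180, 200, 210, 250, 260, 290, 310, 140])
    candidates = {
        "motif_A": [130, 220, 270],  # occurrences near the reference peak
        "motif_B": [850, 900, 960],  # occurrences far upstream
    }
    print(rerank(candidates, reference))  # → ['motif_A', 'motif_B']
```

Scoring against a uniform background keeps the example simple; a real positional prior would more likely be compared against the background distribution of motif occurrences in the same promoter set, and combined with the original motif-finder score rather than replacing it.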


Hu, J., Chen, C., Huang, K. and Mitchell, T. (2013) A distribution pattern assisted method of transcription factor binding site discovery for both yeast and filamentous fungi. Advances in Bioscience and Biotechnology, 4, 509-517. doi:10.4236/abb.2013.44067

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2021 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.