Incorporating heterogeneous biological data sources in clustering gene expression data
Gang-Guo Li, Zheng-Zhi Wang
DOI: 10.4236/health.2009.11004

Abstract

In this paper, a similarity measure between genes based on protein-protein interaction data is proposed. The ChIP-chip data are converted into the same form as gene expression data, with the Pearson correlation as their similarity measure. On the basis of the similarity measures for the protein-protein interaction data and the ChIP-chip data, a combined dissimilarity measure is defined. This combined measure is introduced into the K-means method, yielding what can be regarded as an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset, and their performance is assessed by a prediction accuracy analysis based on known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that incorporating heterogeneous data sources is helpful and meaningful when clustering gene expression data, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
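The abstract describes the method only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the ChIP-chip data arrive as a gene × regulator matrix of binding values that is treated like an expression matrix, and that Pearson correlation r is turned into a dissimilarity via (1 − r)/2. The PPI dissimilarity (0 for direct interaction partners, 1 otherwise), the weighted-sum form of the combined measure, and the weights themselves are hypothetical stand-ins for the tuning coefficients. Because a weighted sum of dissimilarity matrices has no natural centroid, the sketch swaps in a medoid-based update (k-medoids style) with farthest-first initialisation. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def pearson_dissimilarity(profiles):
    """(1 - r) / 2 for every pair of rows, so values lie in [0, 1]."""
    n = len(profiles)
    return [[(1.0 - pearson(profiles[i], profiles[j])) / 2.0 for j in range(n)]
            for i in range(n)]


def ppi_dissimilarity(n, interactions):
    """Hypothetical PPI dissimilarity: 0 for a gene and its direct
    interaction partners, 1 for every other pair."""
    d = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for i, j in interactions:
        d[i][j] = d[j][i] = 0.0
    return d


def combined_dissimilarity(matrices, weights):
    """Weighted sum of per-source dissimilarity matrices; the weights play
    the role of the tuning coefficients (an assumed functional form)."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
            for i in range(n)]


def cluster_medoids(d, k, max_iter=100):
    """Medoid-based K-means variant on a precomputed dissimilarity matrix."""
    n = len(d)
    # Farthest-first initialisation, starting from gene 0 (an assumption).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment: each gene joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Update: the new medoid minimises total dissimilarity to its cluster.
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep the old medoid of an empty cluster
                continue
            updated.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return labels, medoids


# Toy data: six genes forming two groups (values are illustrative only).
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 7],
        [4, 3, 2, 1], [8, 6, 4, 2], [7, 5, 3, 1]]
chip = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.9, 0.2, 0.7],
        [0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.1, 0.7, 0.3]]
ppi_edges = [(0, 1), (1, 2), (3, 4), (4, 5)]

d = combined_dissimilarity(
    [pearson_dissimilarity(expr), pearson_dissimilarity(chip), ppi_dissimilarity(6, ppi_edges)],
    weights=[0.5, 0.3, 0.2],
)
labels, medoids = cluster_medoids(d, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Re-running `combined_dissimilarity` with different `weights` (setting one to 0 drops that source) mirrors, in spirit, the abstract's experiment of varying the tuning coefficients. The weights shown are placeholders rather than values from the paper.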

Cite as:

Li, G. and Wang, Z. (2009) Incorporating heterogeneous biological data sources in clustering gene expression data. Health, 1, 17-23. doi: 10.4236/health.2009.11004.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.