The Application of Cluster Analysis in Type II Diabetes Genome Association Study


Complex genetic diseases, such as Type II diabetes, are caused by a combination of environmental factors and mutations in multiple genes. Patients who have been diagnosed with such diseases cannot easily be treated. However, many of these diseases can be avoided if people at high risk change their lifestyle, for example their diet. Genome association studies have been used to identify risk factors for genetic diseases. With the development of DNA microarray techniques, it is now possible to access human genetic information related to specific diseases. This paper uses a combinatorial method to analyze genetic case-control data for Type II diabetes. A distance-based clustering method has been applied to publicly available Type II diabetes genotype data for an epidemiological study and achieved highly accurate results.
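The abstract does not spell out the clustering procedure. As a rough illustration only, the sketch below assumes each genotype is encoded as one minor-allele count (0, 1 or 2) per SNP. It measures the distance between two individuals as the number of SNPs at which they differ (Hamming distance), groups individuals with a simple k-medoids loop, and labels each cluster by the majority case/control status of its members. A new individual is then classified by the cluster of its nearest medoid. The encoding, the distance, the k-medoids choice, the function names (`hamming`, `nearest`, `k_medoids`, `cluster_status`) and the toy data are all hypothetical. They are not taken from the paper and are not the authors' method.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumed (hypothetical) encoding: one integer per SNP, counting copies of the minor allele (0, 1 or 2).
Genotype = Sequence[int]


def hamming(a: Genotype, b: Genotype) -> int:
    """Number of SNP positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(x: Genotype, data: List[Genotype], medoids: List[int]) -> int:
    """Index of the cluster whose medoid is closest to x; ties go to the lower index."""
    return min(range(len(medoids)), key=lambda c: hamming(x, data[medoids[c]]))


def k_medoids(data: List[Genotype], init: List[int], max_iter: int = 20) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids; returns (cluster label per sample, medoid indices into data)."""
    medoids = list(init)
    labels = [nearest(x, data, medoids) for x in data]
    for _ in range(max_iter):
        # Update step: each cluster's new medoid is the member with the smallest total distance to the others.
        new_medoids = []
        for c, m in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(m)  # keep the old medoid of an empty cluster
                continue
            new_medoids.append(min(members, key=lambda i: sum(hamming(data[i], data[j]) for j in members)))
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
        # Assignment step: every sample joins its nearest medoid.
        labels = [nearest(x, data, medoids) for x in data]
    return labels, medoids


def cluster_status(labels: List[int], status: List[int], k: int) -> List[Optional[int]]:
    """Majority case (1) / control (0) status of each cluster; None for an empty cluster."""
    votes = [Counter() for _ in range(k)]
    for lab, s in zip(labels, status):
        votes[lab][s] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy data invented for illustration only: three cases followed by three controls, four SNPs each.
DATA = [[2, 2, 1, 0], [2, 1, 1, 0], [2, 2, 2, 0],
        [0, 0, 1, 2], [0, 1, 0, 2], [0, 0, 0, 1]]
STATUS = [1, 1, 1, 0, 0, 0]

labels, medoids = k_medoids(DATA, init=[0, 3])
by_cluster = cluster_status(labels, STATUS, k=2)
new_person = [2, 2, 1, 1]
print(by_cluster[nearest(new_person, DATA, medoids)])  # 1, i.e. predicted case
```

On this toy input the loop converges immediately: the first three genotypes form one cluster labelled as cases and the last three form a control cluster. Fixed initial medoids keep the sketch deterministic; a real analysis would need to choose the number of clusters and the initialization with more care.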

Hu, H. and Mao, W. (2014) The Application of Cluster Analysis in Type II Diabetes Genome Association Study. Journal of Computer and Communications, 2, 1-8. doi: 10.4236/jcc.2014.29001.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.