Similarity/dissimilarity analysis of protein sequences using the spatial median as a descriptor

Abstract

A novel 3-D graphical representation of protein sequences is introduced. A right cone with a unit base and unit height is used to represent protein sequences on its surface. The twenty amino acids are represented by 20 circles, and the residues of a protein are represented by n lines on the cone's surface. The spots that represent the protein's residues are shown in the cone's top view. The spatial median of these spots is used as a new descriptor of a protein sequence. The approach is applied to two short protein segments of the yeast Saccharomyces cerevisiae. An examination of the similarities/dissimilarities among the eight ND5 proteins and the six β-globin proteins illustrates the utility of the approach. A linear correlation and significance analysis is provided to compare the results with the percentage sequence alignment identity.
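As a rough illustration of the descriptor named above, the spatial (geometric) median of a set of 2-D spots can be approximated with Weiszfeld's iteration. This is only a sketch under stated assumptions: the abstract does not say how residues are mapped to spots in the cone's top view, nor which numerical method the author used, so the input points, the function name `spatial_median`, and the parameters `tol` and `max_iter` are all hypothetical.

```python
import math


def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial (geometric) median of 2-D points.

    Uses Weiszfeld's iteration, a standard method that is assumed here;
    the abstract does not specify how the median is computed.
    """
    if not points:
        raise ValueError("at least one point is required")

    # Start from the centroid (arithmetic mean of the points).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)

    for _ in range(max_iter):
        weight_sum = 0.0
        x_sum = 0.0
        y_sum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                # Simplification: skip a point that coincides with the
                # current estimate to avoid dividing by zero.
                continue
            w = 1.0 / d
            weight_sum += w
            x_sum += w * px
            y_sum += w * py
        if weight_sum == 0.0:
            # Every point coincides with the current estimate.
            break
        new_x = x_sum / weight_sum
        new_y = y_sum / weight_sum
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


if __name__ == "__main__":
    # Hypothetical spots, not data from the paper.
    spots = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0), (1.0, 4.0)]
    print(spatial_median(spots))
```

Unlike the centroid, the spatial median minimizes the sum of Euclidean distances to the points, so a single distant spot pulls it far less than it pulls the mean.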

Share and Cite:

Abo-Elkhier, M.M. (2012) Similarity/dissimilarity analysis of protein sequences using the spatial median as a descriptor. Journal of Biophysical Chemistry, 3, 142-148. doi: 10.4236/jbpc.2012.32016.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.