Prot-Class: A bioinformatics tool for protein classification based on amino acid signatures


Knowledge of the characteristics shared by known members of a protein family enables the identification of further members within the complete set of proteins in an organism. Shared features are usually expressed through motifs, which can incorporate specific patterns and even amino acid (AA) biases. Based on a set of classification patterns and biases, it can be determined which additional proteins may belong to a specific family and share its functionality. A bioinformatics tool (Prot-Class) was implemented to examine protein sequences and characterize them based on user-defined AA composition percentages and user-defined AA patterns. In addition, the tool allows for the identification of repeated AA patterns, biased AA compositions within windows of user-defined length, and the characteristics of putative signal peptides and glycosylphosphatidylinositol (GPI) lipid anchors. Prot-Class is general-purpose and can be applied to analyze protein sequences from any organism. The Prot-Class source code is available under the GNU General Public License v3 and can be accessed via the Google Code Repository:
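
The kinds of checks described in the abstract (AA composition percentages, pattern occurrences, and biased composition within fixed-length windows) can be illustrated with a minimal Python sketch. This is not the Prot-Class source code: the function names, the use of regular expressions as the pattern syntax, and the rule that combines the criteria are all assumptions made for illustration, and the sequence is a toy example rather than a real protein.

```python
import re
from collections import Counter


def aa_percent(seq, residues):
    """Percentage of positions in seq occupied by any residue in `residues`."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    # set() guards against double counting if a residue is listed twice
    return 100.0 * sum(counts[r] for r in set(residues.upper())) / len(seq)


def count_pattern(seq, pattern):
    """Number of non-overlapping matches of a regular-expression pattern."""
    return len(re.findall(pattern, seq.upper()))


def biased_windows(seq, residues, window, min_percent):
    """0-based start indices of windows whose composition meets the threshold."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - window + 1)
        if aa_percent(seq[i:i + window], residues) >= min_percent
    ]


def classify(seq, residues, min_percent, pattern, min_hits):
    """Hypothetical rule: pass both the composition and the pattern criterion."""
    return (aa_percent(seq, residues) >= min_percent
            and count_pattern(seq, pattern) >= min_hits)


toy = "APAPAPGGGG"  # toy sequence, not a real protein
print(aa_percent(toy, "PA"))              # share of A and P residues
print(count_pattern(toy, "AP"))           # occurrences of the dipeptide AP
print(biased_windows(toy, "PA", 4, 100))  # windows made up only of A and P
print(classify(toy, "PA", 50, "AP", 3))
```

Keeping each criterion in its own function mirrors the idea of user-defined thresholds: the residue set, window length, and cutoffs are plain parameters, so a different protein family only needs different arguments.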

Cite as:

Lichtenberg, J., Keppler, B., Conley, T., Gu, D., Burns, P., Welch, L. and Showalter, A. (2012) Prot-Class: A bioinformatics tool for protein classification based on amino acid signatures. Natural Science, 4, 1161-1164. doi: 10.4236/ns.2012.412A141.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.