A Brief Overview of a Few Popular and Important Protein Databases


A database is a repository of information, and many types of databases are available today. This review focuses on a few popular and widely used biological databases that store protein sequence and structure information. The databases most important for basic biological research are PDB, SCOP, CATH, UniProt/SwissProt, and GenBank. Each serves a different purpose and plays an important role in different fields of biology and bioinformatics. PDB provides structural information on proteins, protein complexes, and proteins complexed with other macromolecules. SCOP and CATH store various annotations of protein sequences and structures. UniProt is a central repository of protein sequences and functions, created by combining the information contained in SwissProt and TrEMBL.
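As a small illustration of how the SwissProt and TrEMBL origins of a UniProt record remain visible to users, the sketch below parses a UniProtKB-style FASTA header, in which the prefix `sp` marks a SwissProt (reviewed) entry and `tr` marks a TrEMBL (unreviewed) entry. The names `UniProtHeader`, `SECTIONS`, and `parse_uniprot_header` are hypothetical helpers introduced here for illustration and are not part of any UniProt tool; the example header is shortened and shown without its sequence.

```python
from typing import NamedTuple


class UniProtHeader(NamedTuple):
    """Fields recovered from one FASTA header line (hypothetical helper type)."""
    section: str      # "Swiss-Prot" (reviewed) or "TrEMBL" (unreviewed)
    accession: str    # e.g. "P69905"
    entry_name: str   # e.g. "HBA_HUMAN"
    description: str  # free text after the identifier


# Two-letter prefixes used in UniProtKB FASTA headers.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(line: str) -> UniProtHeader:
    """Parse a header of the form '>db|ACCESSION|ENTRY_NAME description'."""
    if not line.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    # Split the identifier from the free-text description at the first space.
    identifier, _, description = line[1:].strip().partition(" ")
    parts = identifier.split("|")
    if len(parts) != 3 or parts[0] not in SECTIONS:
        raise ValueError(f"not a UniProtKB-style header: {line!r}")
    db, accession, entry_name = parts
    return UniProtHeader(SECTIONS[db], accession, entry_name, description)


# Illustrative, shortened header for human hemoglobin subunit alpha.
example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
header = parse_uniprot_header(example)
print(header.section, header.accession, header.entry_name)
```

The parser rejects headers that do not follow the three-field `db|accession|name` pattern instead of guessing, since FASTA files from other sources such as GenBank use different header layouts.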

Share and Cite:

A. Bagchi, "A Brief Overview of a Few Popular and Important Protein Databases," Computational Molecular Bioscience, Vol. 2 No. 4, 2012, pp. 115-120. doi: 10.4236/cmb.2012.24012.

Conflicts of Interest

The author declares no conflicts of interest.



Copyright © 2021 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.