Using schema transformation pathways for biological data integration


In web environments, proteomics data integration in the life sciences needs to handle the problem of data conflicts arising from the heterogeneity of data resources and from incompatibilities between the inputs and outputs of services used in the analysis of the resources. The integration of complex, fast-changing biological data repositories can potentially be supported by Grid computing to enable distributed data analysis. This paper presents an approach addressing the data conflict problems of proteomics data integration. We describe a proposed proteomics data integration architecture, in which a heterogeneous data integration system interoperates with Web Services and query processing tools for the virtual and materialised integration of a number of proteomics resources, either locally or remotely. Finally, we discuss how the architecture can be further used for supporting data maintenance and analysis activities.

Share and Cite:

Fan, H. and Wang, F. (2008) Using schema transformation pathways for biological data integration. Journal of Biomedical Science and Engineering, 1, 204-209. doi: 10.4236/jbise.2008.13035.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.