Novel Inorganic Pyrophosphatase from Soil Metagenomic and Family and Subfamily Prediction
1. Introduction
Inorganic pyrophosphatases (PPases, EC 3.6.1.1) are ubiquitous and are central enzymes of phosphorus metabolism. They control the cellular concentration of inorganic pyrophosphate (PPi) and thereby regulate biosynthetic reactions such as nucleic acid and protein synthesis. Some PPases are also responsible for pumping (translocating) ions across membranes.
The PPases are made up of two groups: soluble PPases (Families I and II) and membrane PPases (M-PPases). These groups of enzymes are essential for maintaining life. For this reason, we searched for a novel PPase gene in a soil metagenome, because soil harbors an immense diversity of microorganisms, most of which remains unexplored. The metagenomic approach has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, the bulk of which is as yet uncultivable, and it is a powerful method for studying soil. We therefore worked with a metagenomic library from a Eucalyptus spp. arboretum (EAA), belonging to the Laboratory of Biochemistry and Microorganisms of Plants (LBMP), to identify a novel PPase gene, which was then subjected to bioinformatics analyses and modeling to assess its biological potential.
2. Methods
We used the metagenomic library from a Eucalyptus spp. arboretum (EAA). The library was screened by PCR to identify positive clones, using degenerate primers designed for this work from sequences deposited in the NCBI database (DQ182493, DQ916115, DQ916118): Hydro-F, 5'-CGTSGGVTAYCGSTAYTTYGA-3'; Hydro-R, 5'-CGMTYDCCYGCSCCDCCYTC-3'. The positive clone A09 plaque 13 was subjected to shotgun subcloning. DNAs containing inserts were sequenced by standard protocols on an ABI PRISM® 3100 Genetic Analyzer. Base calling, assembly, and editing of the DNA sequence were done with the programs Phred, Phrap, and Consed. The open reading frames (ORFs) were identified and translated with ORF Finder at NCBI (http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi) and compared with protein sequences using BLASTX. After functional identification of the genes, the data were analyzed with ProDom, a comprehensive database of protein domain families generated from the global comparison of all available protein sequences, and with Pfam, a database of protein families in which families are sets of protein regions that share a significant degree of sequence similarity, thereby suggesting homology; similarity is detected using HMMER3.
For homology determination we used the I-TASSER server, an online platform for protein structure and function prediction. 3D models are built from multiple threading alignments by LOMETS and iterative template fragment assembly simulations; functional insights are derived by matching the 3D models against the BioLiP protein function database. Structures were visualized with the PyMOL Molecular Graphics System, version 1.5.0.4, a user-sponsored molecular visualization system on an open-source foundation. We used MEMSAT3 and MEMSAT-SVM, a newer version of a widely used transmembrane topology prediction method, together with PSIPRED, to identify the subfamily signature. Sequences were compared with those available online in GenBank using the Basic Local Alignment Search Tool (BLAST). Sequence alignments were first done using Clustal W (version 1.8) and then adjusted using BioEdit, version 5.0.9. Phylogenetic relationships were inferred from alignments of the membrane PPase (M-PPase) protein sequences obtained from GenBank, using the program MEGA5 (version 2.1). Bootstrap analysis was performed with 1000 replicates.
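The degenerate primers above use IUPAC ambiguity codes (S, V, Y, M, D), so each primer stands for a pool of concrete oligonucleotides. The following is a minimal sketch, not part of the original workflow, that counts and enumerates the sequences encoded by Hydro-F and Hydro-R. The function names `degeneracy` and `expand` are hypothetical helpers introduced here for illustration; only the primer sequences come from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes: each code maps to the bases it can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the Methods (5' to 3').
HYDRO_F = "CGTSGGVTAYCGSTAYTTYGA"
HYDRO_R = "CGMTYDCCYGCSCCDCCYTC"


def degeneracy(primer):
    """Return the number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for bases in product(*(IUPAC[code] for code in primer.upper())):
        yield "".join(bases)


if __name__ == "__main__":
    for name, primer in (("Hydro-F", HYDRO_F), ("Hydro-R", HYDRO_R)):
        print(name, len(primer), "nt, degeneracy", degeneracy(primer))
```

Under the standard IUPAC codes, this gives a 96-fold degeneracy for Hydro-F and 288-fold for Hydro-R, which is one way to judge how diverse the primer pool in a PCR screen of this kind would be.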
3. Results
The shotgun library data were assembled and the genes were identified with ORF Finder; the predicted genes were then functionally annotated by database comparison searches using ProDom and Pfam. The assembled sequence was submitted to GenBank under accession number KF715620.
We identified a novel inorganic pyrophosphatase (PPase) gene, which we named MetaPPase. We then submitted its amino acid sequence to the I-TASSER server, which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is used to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models, with a correlation coefficient of 0.91. Using a C-score cutoff > −1.5 for models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD.
Template proteins with similar folds were retrieved from the PDB (Protein Data Bank) library. The result was 100.0% confidence that the single highest-scoring template is a true homolog: PDB template code 4A06, sequence identity 25%; Fold: H-PPase; Superfamily: H(+)-translocating pyrophosphatase.
After predicting the structure and function of MetaPPase, we used PyMOL to visualize the structure (Figure 1). The structure resembles that of M-PPases, which differ markedly from the other families: they are homodimers with 15 - 17 transmembrane (TM) helices. To confirm this characteristic, we used the MEMSAT3 and MEMSAT-SVM programs to identify transmembrane helices (Figure 2).
As can be seen in Figure 2, each yellow segment represents a transmembrane helix; the N-terminus is in the periplasm and the C-terminus in the cytoplasm. Furthermore, M-PPases include the H+-transporting PPases (H+-PPases), which are divided into K+-independent and K+-dependent subfamilies: K+-independent enzymes are insensitive to monovalent cations, whereas K+-dependent enzymes need millimolar concentrations of K+ for activity. These results suggest that MetaPPase may be an H+-transporting PPase (H+-PPase). The signature of MetaPPase includes a region containing three conserved aspartates that are involved in the binding of cations (Figure 3).
Figure 1. MetaPPase structure (a) and template conformation, PDB accession number 4AV6 (b). The α-helices are in cyan, the β-sheets in pink, and the loops in salmon, for both structures.
Figure 2. Schematic diagram of the MEMSAT3 and MEMSAT-SVM predictions for the query sequence. Traces indicate the RAW outputs for the prediction SVMs. Dashed lines indicate the prediction threshold. PL: pore-lining residue; SP: signal peptide residue; RE: re-entrant helix residue; iL/oL&HL: helix prediction.
The analyses performed so far allowed us to assign the novel MetaPPase to a family; to obtain more information about its subfamily, we carried out a phylogenetic analysis. The sequences in the tree were selected by redundancy filtering, leaving only representative sequences for each group of highly similar sequences. The sequences for each PPase subfamily were found in the NCBI protein sequence database.
The evolutionary history was inferred using the Neighbor-Joining method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The evolutionary distances were computed using the JTT matrix-based method and are in units of the number of amino acid substitutions per site. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 80 amino acid sequences (Figure 4). All positions containing gaps and missing data were eliminated. There were a total of 367 positions in the final dataset. Evolutionary analyses were conducted in MEGA5. The tree shows the evolutionary position of MetaPPase; it includes plant and protist H+-PPases and consists of the independently evolving K+-independent and K+-dependent families.
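Two of the steps described above, eliminating all positions with gaps or missing data and resampling alignment columns for the bootstrap test, can be illustrated with a short sketch. This is not MEGA5's implementation; it is a simplified illustration that assumes `-` marks a gap and `?` marks missing data, and the toy sequences and function names (`complete_deletion`, `bootstrap_replicate`) are hypothetical.

```python
import random

# Assumption: '-' denotes an alignment gap and '?' denotes missing data.
GAP_OR_MISSING = set("-?")


def complete_deletion(alignment):
    """Keep only the columns in which no sequence has a gap or missing-data symbol."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    kept = [col for col in zip(*seqs) if not GAP_OR_MISSING.intersection(col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to build one bootstrap pseudo-replicate."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from this study.
    toy = {"seqA": "MK-LVA", "seqB": "MKTLV?", "seqC": "MRTLIA"}
    clean = complete_deletion(toy)
    print(clean)
    print(bootstrap_replicate(clean, random.Random(0)))
```

In a full analysis, a distance matrix and a Neighbor-Joining tree would be computed for each of the 1000 pseudo-replicates, and the bootstrap value of a branch is the percentage of replicate trees that recover it.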
4. Discussion
MetaPPase belongs to the integral membrane PPase (M-PPase) family, which can be divided into four subfamilies based on ion-pumping specificity (Na+ and/or H+) and, for the latter, on whether or not they depend on K+. In Figure 1 we observed similarities characteristic of the M-PPase family, whose members couple the hydrolysis of PPi to the transport of cations across membranes.
These transporter subfamilies are widespread in bacteria, archaea, and plants. More recently, Na+-transporting PPases (Na+-PPases) were identified in mesophilic microorganisms; these enzymes are similar to the H+-PPases in many respects but require both K+ and Na+ for activity.
Our topology model of transmembrane helices (Figure 2) confirmed the 17-transmembrane-domain structure predicted by hydropathy analysis of the primary structure; moreover, the N-terminus is in the periplasm and the C-terminus in the cytoplasm. The same characteristics were described for Streptomyces coelicolor and the H+-PPase subfamily.
Figure 3 suggests that MetaPPase has the enzymatic function of an H+-PPase. According to Suzuki, alignment of the predicted amino acid sequence identifies the active site, and the ligands are highly conserved.
The evolutionary tree of membrane PPases (Figure 4) allowed us to group MetaPPase with the H+-PPases. It probably belongs to the K+-independent subfamily, although MetaPPase lies in a separate clade with low bootstrap support relative to the other microorganisms. It has been demonstrated that all members of the K+-independent family appear to operate as H+ pumps.
Figure 3. The consensus sequences of membrane PPases identify the conserved signature of the H+-PPase subfamily.
Figure 4. The sequences in the tree were selected from verified and putative membrane PPases. The sequences were found in the NCBI protein databases for each PPase subfamily (K+-independent or K+-dependent H+-PPases, and Na+-PPases).
5. Conclusions
Our results corroborate the hypothesis that the largely unexplored microbial diversity of soil is a source of novel genes. The metagenomic approach has become an indispensable tool for isolating novel genes, and functional annotation of the predicted genes by database comparison searches was essential to identify MetaPPase.
The use of different bioinformatics tools supports the family and subfamily predictions for MetaPPase, which is presumed to operate as a K+-independent H+-transporting PPase (H+-PPase), as observed in the evolutionary tree.
This suggests a special feature of the enzyme; in ongoing work, the gene will be cloned into an expression vector for subsequent kinetic characterization and crystallization.
Acknowledgements
We thank the Postgraduate Program in Agropecuary Microbiology (PPMA) and Coordenação de Aperfeiçoamento de Nível Superior (CAPES) for the financial support.
Abbreviations List
ABI: PRISM® 3100 Genetic Analyzer
BioEdit: Sequence Alignment Editor for Windows 95/98/NT/XP/Vista/7
BLAST: Basic Local Alignment Search Tool
ClustalW: Multiple Sequence Alignment
Consed: Sequence assembly editor companion to Phrap
DNA: Deoxyribonucleic acid
EAA: Eucalyptus spp. arboretum
GenBank: Sequence database provided by the National Center for Biotechnology Information (NCBI)
H-PPase: Pyrophosphate-energised proton pump
H+-PPase: Hydrogen ion-transporting PPase
HMMER3: Searches sequence databases for homologs of protein sequences
iL/oL&HL: Helix prediction
I-Tasser: Server for protein structure and function prediction
K+: Potassium ions
LBMP: Laboratory of Biochemistry and Microorganisms of Plants
LOMETS: Local Meta-Threading-Server
MEGA5: Molecular Evolutionary Genetics Analysis
MEMSAT3 & MEMSAT-SVM: Membrane Helix Prediction
MetaPPase: Metagenomic inorganic pyrophosphatase
Na+-PPase: Sodium ions transporting PPase
NCBI: National Center for Biotechnology Information
OrfFinder: Searches for open reading frames
ORFs: Open reading frames
PCR: Polymerase Chain Reaction
PDB: Protein Data Bank
Pfam: Database of protein families
Phrap: Program for shotgun sequence assembly
Phred: Base calling software with quality estimation
PL: Pore lining residue
PPases: Inorganic pyrophosphatases
PPi: Inorganic pyrophosphate
ProDom: Protein Domain Prediction
PSIPRED: Protein Sequence Analysis Workbench
PyMOL: Molecular Graphics System
RAW: Traces indicate outputs for the prediction SVMs
RE: Re-entrant helix
SP: Signal peptide
SVMs: Support vector machines
TM: Transmembrane
NOTES
*Corresponding authors.