Novel Inorganic Pyrophosphatase from a Soil Metagenome and Prediction of Its Family and Subfamily

Inorganic pyrophosphatase (PPase) is widely studied and is essential for the survival of plants and microorganisms. PPases catalyze an essential reaction, the hydrolysis of inorganic pyrophosphate (PPi) to inorganic phosphate (Pi). Studies of the PPase mechanism have been performed in cultured microorganisms, and we found no reports of PPases derived from soil metagenomic libraries. Soil harbors an immense diversity of microorganisms, most of which remain unexplored, and metagenomics is the technology used to investigate the potential of uncultured microorganisms. Our aim was to identify novel genes using metagenomic approaches from a bioinformatics perspective, in the hope that the results will serve as a useful resource. For this purpose, we used the metagenomic library of a Eucalyptus spp. arboretum (EAA). We screened the library to select a positive clone and submitted it to shotgun sequencing. The data obtained were submitted to bioinformatics analyses, which identified the novel MetaPPase gene and classified it according to its predicted family and subfamily.


Introduction
Inorganic pyrophosphatases (PPases, EC 3.6.1.1) are ubiquitous and are central enzymes of phosphorus metabolism [1]. They control the cellular concentration of inorganic pyrophosphate (PPi) and thereby biosynthetic reactions such as nucleic acid and protein synthesis; the membrane-bound forms are also responsible for translocating ions across membranes [2].
PPases fall into two groups: soluble PPases (Family I and II) and membrane PPases (M-PPases) [2]-[4]. Both groups of enzymes are very important for maintaining life. For this reason, we searched for a novel gene of the PPase families in a soil metagenome, because soil harbors an immense diversity of microorganisms, most of which remain unexplored [5]. The metagenomic approach has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, the bulk of which are as yet noncultivable, and is a powerful method for exploring soil [6]-[10]. We therefore worked with a metagenomic library of a Eucalyptus spp. arboretum (EAA), belonging to the Laboratory of Biochemistry and Microorganisms of Plants (LBMP), to identify a novel gene of the PPase families, which was then submitted to bioinformatics analyses and structural modeling to assess its biological potential.

Methods
We used the metagenomic library from a Eucalyptus spp. arboretum (EAA) [11]. The library was screened by PCR to identify positive clones, using degenerate primers designed for this work from sequences deposited in the NCBI database (DQ182493, DQ916115, DQ916118): Hydro-F, 5'-CGTSGGVTAYCGSTAYTTYGA-3'; Hydro-R, 5'-CGMTYDCCYGCSCCDCCYTC-3' [10]. The positive clone A09 from plate 13 was submitted to shotgun subcloning [12]. DNAs containing inserts were sequenced using standard protocols with an ABI PRISM® 3100 Genetic Analyzer. The DNA sequence was determined with the programs Phred [13], Phrap and Consed [14]. Open reading frames (ORFs) were identified and translated using the ORF Finder program at NCBI (http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi), and BLASTX was used to analyze similarity to known proteins. After functional identification of the genes, the data were analyzed in ProDom [15], a comprehensive database of protein domain families generated from the global comparison of all available protein sequences, and in Pfam [16], a database of protein families in which families are sets of protein regions that share a significant degree of sequence similarity, thereby suggesting homology; similarity is detected using HMMER3.
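The degenerate primers above are written with IUPAC ambiguity codes, so each primer stands for a pool of concrete oligonucleotides. The following is a minimal sketch, not part of the original protocol, that counts and enumerates that pool; the function names `degeneracy` and `expand` are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]

# Primer sequences as given in the text (5' to 3').
HYDRO_F = "CGTSGGVTAYCGSTAYTTYGA"
HYDRO_R = "CGMTYDCCYGCSCCDCCYTC"

for name, seq in (("Hydro-F", HYDRO_F), ("Hydro-R", HYDRO_R)):
    print(name, len(seq), "nt, degeneracy:", degeneracy(seq))
```

Under these codes Hydro-F (21 nt) encodes 96 sequences and Hydro-R (20 nt) encodes 288, which gives an idea of how broadly the primer pair samples the target region.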
For homology modeling we used the I-TASSER server [17], an online platform for protein structure and function prediction. 3D models are built from multiple threading alignments by LOMETS and iterative template fragment assembly simulations; functional insights are derived by matching the 3D models against the BioLiP protein function database. Structures were visualized with the PyMOL Molecular Graphics System, version 1.5.0.4, a user-sponsored molecular visualization system on an open-source foundation [18]. We used MEMSAT3 and MEMSAT-SVM, a novel version of a widely used transmembrane topology prediction method [19], and PSIPRED to identify the subfamily signature [20]. Sequences were compared with those available online in GenBank using the Basic Local Alignment Search Tool (BLAST) [21]. Sequence alignments were first made with Clustal W (version 1.8) [22] and then adjusted with BioEdit, version 5.0.9 [23]. Phylogenetic relationships were inferred from alignments of membrane PPase (M-PPase) protein sequences obtained from GenBank, using the program MEGA5 (version 2.1) [24]. Bootstrap analysis was performed with 1000 replicates [25].
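The bootstrap step mentioned above resamples alignment columns with replacement, and a tree is then rebuilt from each pseudo-alignment. The sketch below illustrates only the resampling; the toy alignment and the function name `bootstrap_alignment` are hypothetical, and the tree building that MEGA5 performs on each replicate is not reproduced.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }

# Toy alignment (hypothetical data, for illustration only).
toy = {"seqA": "MKVLAG", "seqB": "MKILAG", "seqC": "MRVLSG"}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(toy["seqA"]), "columns each")
```

The bootstrap value shown on a branch is the percentage of such replicates whose tree recovers that same grouping of taxa.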

Results
The shotgun library data were assembled and ORFs were identified with ORF Finder; the predicted genes were then functionally annotated by database comparison searches using ProDom and Pfam [15] [16]. The assembled sequence was submitted to GenBank under accession number KF715620.
We identified a novel inorganic pyrophosphatase (PPase) gene, which we named MetaPPase. We then submitted its amino acid sequence to the I-TASSER server, which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. I-TASSER estimates the accuracy of its predictions with a scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates. A large-scale benchmark test showed a strong correlation between the C-score and the TM-score (a structural similarity measure with values in [0, 1]) of the first models, with a correlation coefficient of 0.91. Using a C-score cutoff > −1.5 for models of correct topology, both the false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD [17].
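To make the two quantities above concrete, the sketch below evaluates the commonly published TM-score expression for a single, already-superposed pair of structures and applies the C-score cutoff of −1.5. It is an illustration under stated assumptions, not I-TASSER code: the real TM-score takes the maximum over all superpositions, the clamp of d0 for short chains is an assumption, and the function names are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 21:
        # Assumption: clamp for short chains, where the cube-root
        # expression below becomes very small or negative.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    `distances` holds the distance (angstroms) between each aligned
    residue pair. The search over superpositions is not performed here.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length

def passes_cscore_cutoff(c_score, cutoff=-1.5):
    """True if the C-score is above the cutoff quoted in the text."""
    return c_score > cutoff

# Hypothetical example: 100-residue target, every pair 2 angstroms apart.
print(round(tm_score([2.0] * 100, 100), 3))
print(passes_cscore_cutoff(-1.0), passes_cscore_cutoff(-2.0))
```

Each aligned pair contributes a value between 0 and 1 that decays with distance, so a perfect superposition of all residues gives a score of 1 and unaligned residues contribute nothing.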
Templates of similar fold were searched in the PDB (Protein Data Bank) library. The single highest-scoring template, PDB code 4A06, was reported as a true homolog with 100.0% confidence and 25% sequence identity (Fold: H-PPase; Superfamily: H(+)-translocating pyrophosphatase).
After predicting the structure and function of the MetaPPase gene product, we used PyMOL to visualize the structure (Figure 1). The structure resembles an M-PPase, a family with characteristics that distinguish it from the other families: its members are homodimers with 15-17 transmembrane (TM) helices [26]. To confirm this characteristic, we used the MEMSAT3 and MEMSAT-SVM programs [19] to identify transmembrane helices (Figure 2). H+-PPases are divided into K+-independent and K+-dependent subfamilies: K+-independent enzymes are insensitive to monovalent cations, whereas K+-dependent enzymes need millimolar concentrations of K+ for activity. These results suggest that MetaPPase may be an H+-transporting PPase (H+-PPase). The MetaPPase signature lies in a region containing three conserved aspartates that are involved in the binding of cations (Figure 3). The analyses performed so far allowed us to estimate the family of the novel MetaPPase; to obtain more information about the subfamily, we carried out a phylogenetic analysis. The sequences in the tree were selected by redundancy filtering to leave only representative sequences for each group of highly similar sequences [27]. The sequences for each PPase subfamily were retrieved from the NCBI protein sequence database.

Figure 4 shows the evolutionary tree of the membrane PPases. The evolutionary history was inferred using the Neighbor-Joining method [28]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [25]. The evolutionary distances were computed using the JTT matrix-based method [29] and are in units of the number of amino acid substitutions per site. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 80 amino acid sequences (Figure 4). All positions containing gaps and missing data were eliminated, leaving a total of 367 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [24]. The tree places MetaPPase among the membrane PPases; it includes plant and protist H+-PPases and consists of independently evolving K+-independent and K+-dependent families.
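The elimination of all positions containing gaps or missing data (complete deletion) can be sketched as follows. This is an illustration of the filtering step only, with a hypothetical toy alignment; the symbols treated as gap or missing ("-" and "?") are an assumption, and the JTT distance calculation done in MEGA5 is not reproduced.

```python
def complete_deletion(alignment, missing="-?"):
    """Drop every alignment column with a gap or missing-data symbol.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if all(s[i] not in missing for s in seqs)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }

# Toy alignment (hypothetical data): columns 2 and 4 contain "-" or "?".
toy_gapped = {"seqA": "MK-LAG", "seqB": "MKVL?G", "seqC": "MRVLSG"}
filtered = complete_deletion(toy_gapped)
print(filtered)
print("positions kept:", len(next(iter(filtered.values()))))
```

Applied to the real 80-sequence alignment, the same filter is what reduces the dataset to the positions that are present in every sequence.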

Discussion
MetaPPase belongs to the integral membrane PPase (M-PPase) family, which can be divided into four subfamilies based on ion-pumping specificity (Na+ and/or H+), the latter being either K+-dependent or K+-independent [18]-[20]. In Figure 1 we observed similarities that are characteristic of the M-PPase family, whose members couple the hydrolysis of PPi to the transport of cations across membranes [2]-[4] [26].
These subfamilies of transporters are widespread in bacteria, archaea and plants [4] [26]-[28]. A Na+-transporting PPase (Na+-PPase) was recently identified in mesophilic microorganisms; this enzyme is similar to the H+-PPase in many respects but requires both K+ and Na+ for activity [2] [3].
Our topology model of the transmembrane helices (Figure 2) confirmed the 17-transmembrane-domain structure predicted by hydropathy analysis of the primary structure. Moreover, the N-terminus is in the periplasm and the C-terminus in the cytoplasm; the same characteristics were described for Streptomyces coelicolor and the H+-PPase subfamily [26].
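A hydropathy analysis of a primary structure can be sketched with a Kyte-Doolittle sliding window. This is a deliberately simpler technique than MEMSAT3 or MEMSAT-SVM, which were the programs actually used; the window of 19 residues and the threshold of 1.6 are the classic Kyte-Doolittle settings, and the test sequence and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows above the threshold.

    Returns 0-based, half-open (start, end) residue ranges.
    """
    segments = []
    for i, value in enumerate(hydropathy_profile(seq, window)):
        if value > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments

# Hypothetical test sequence: two hydrophobic cores between polar flanks.
flank = "DDKKEE" * 3
core = "LLIVALLIVALLIVALLIV"
toy_protein = flank + core + flank + core + flank
tm_segments = candidate_tm_segments(toy_protein)
print(len(tm_segments), "candidate transmembrane segments:", tm_segments)
```

Counting the segments returned for a real sequence gives a rough estimate of the number of transmembrane helices, which dedicated topology predictors then refine and orient.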
Figure 3 suggests that MetaPPase has the enzymatic function of an H+-PPase [27]. According to Suzuki [30], alignment of the predicted amino acid sequence can be used to identify the active site, and the ligand-binding residues are highly conserved.
The evolutionary tree of membrane PPases (Figure 4) allowed us to group MetaPPase with the H+-PPases. It probably belongs to the K+-independent subfamily, although MetaPPase lies in a separate clade with low bootstrap support relative to the other microorganisms. It has been demonstrated that all members of the K+-independent family appear to operate as H+ pumps [3].

Conclusions
Our results support the hypothesis that the high, unexplored microbial diversity of soil is a source of novel genes. The metagenomic approach has become an indispensable tool for isolating novel genes, and functional annotation of the predicted genes by database comparison searches was essential to identify MetaPPase. The use of different bioinformatics tools supports the family and subfamily predictions for MetaPPase, which is predicted to operate as a K+-independent H+-transporting PPase (H+-PPase), as observed in the evolutionary tree.
Given these features, our next step will be to clone the gene into an expression vector for subsequent kinetic characterization and crystallization.

Figure 1. MetaPPase structure (a) and template conformation, PDB accession number 4AV6 (b). In both structures, the α-helices are in cyan, the β-sheets in pink and the loops in salmon.

Figure 3. The consensus sequences of membrane PPases identify the conserved signature of the H+-PPase subfamily.

Figure 4. The sequences in the tree were selected from verified and putative membrane PPases found in the NCBI protein databases for each PPase subfamily (K+-independent or K+-dependent H+-PPases, and Na+-PPases).