Evaluation of Molecular Techniques in Characterization of Deep Terrestrial Biosphere

A suite of molecular methods targeting 16S rRNA genes (i.e., DGGE, clone and high-throughput [HTP] amplicon library sequencing) was used to profile the microbial communities in deep Fennoscandian crystalline bedrock fracture fluids. Variation among bacterial 16S rRNA genes was examined with two commonly used primer pairs: P1/P2 and U968f/U1401r. DGGE using U968f/U1401r mostly detected β- and γ-proteobacteria and Firmicutes, while P1/P2 primers additionally detected other proteobacterial clades and candidate divisions. However, in combination with clone libraries the U968f/U1401r primers detected a higher bacterial diversity than DGGE alone. HTP amplicon sequencing with P1/P2 revealed an abundance of the DGGE bacterial groups as well as many other bacterial taxa likely representing minor components of these communities. Archaeal diversity was investigated via DGGE or HTP amplicon sequencing using primers A344F/519RP. The majority of archaea detected with HTP amplicon sequencing belonged to uncultured Thermoplasmatales and Pendant 33/DHVE3, 4, 6 groups. DGGE of the same samples detected mostly SAGMEG and Methanosarcinales archaea, but almost none of those were revealed by HTP amplicon sequencing. Overall, our results show that the inferred diversity and composition of microbial communities in deep fracture fluids are highly dependent on analytical technique and that the method should be carefully selected with this in mind.


Introduction
Due to the difficulty of obtaining samples, microbial communities in deep terrestrial subsurface environments are some of the least studied in the world. In deep fracture fluids of Fennoscandian crystalline bedrock, microbes typically occur at densities of 10³ to 10⁶ cells·ml⁻¹ [1]-[5]. Cell numbers are highest in groundwater layers near the surface but decrease rapidly with depth, dropping tenfold at 100 m and up to 100-fold at 500 m [1] [2] [4] [6]. Despite the relatively low concentration of cells, taxonomic and functional diversity of microbial communities in the deep terrestrial subsurface is believed to be high [1]-[6].
According to culture-dependent techniques such as Most Probable Number (MPN), microbial communities in the anaerobic deep subsurface of the Fennoscandian Shield reduce iron, sulphate, manganese, nitrate and oxygen and produce acetate and methane [7] [8]. While this information is important for estimating the functional potential of these communities, identification of the microbes detected by MPN remains challenging due to the difficulty of in vitro culture. It has been estimated that only 1%-5% of the microbial cells in an environmental sample can be successfully cultured on laboratory media. Thus, in vitro cultivation techniques offer a very incomplete picture of microbial communities [9] [10] and it is unclear if and how artificial conditions affect the metabolism or functional activity of isolated microbes [11]. Given that microbial communities in the deep subsurface may comprise thousands of species, in vitro cultivation is an impractical option [12].
Analysis of phylogenetic marker genes using molecular methods offers a way to detect and identify the microbial taxa present in fracture fluids and can thereby be used to generate estimates of community diversity and composition [4] [13]. The gene for the 16S ribosomal RNA subunit (16S rRNA) is commonly used as a taxonomic marker because it is highly conserved and found in all bacteria and archaea, and because systematists and evolutionary biologists have generated a rich 16S rRNA database against which sequences from fracture fluids can be compared and identified [14] [15].
Recent advances in high-throughput (HTP) sequencing technologies have further enhanced the study of microbial communities in deep subsurface environments. HTP sequencing is becoming increasingly popular due to improved cost-efficiency, robust techniques, and the possibility for greater coverage and resolution of the microbial community. With higher detection sensitivity, it has become possible to obtain more complete estimates of community composition and infer relationships between microbes and their geochemical surroundings [5]. HTP amplicon library sequencing techniques can also detect microbes that occur infrequently or at low abundances in environmental samples much better than other methods [5] [21] [22].
However, molecular techniques, like all methods, are limited or biased according to their design and sophistication. For example, selection of PCR primers used to amplify 16S rRNA genes has a tremendous impact on the microbial diversity detected [23]-[25]. In addition, the method used to separate and identify PCR amplicons is important. DGGE is widely used in molecular ecology to rapidly screen microbial communities in different habitats (e.g., [3] [26] [27]). However, PCR-DGGE is believed to be a rather coarse technique that can over-emphasize abundant groups while failing to detect taxa comprising <1% of microbial cells in a sample [27]-[30]. Furthermore, different techniques can be biased with respect to the detection of certain taxa. For example, the SAR11 cluster of aquatic α-proteobacteria has been shown by clone library screening and fluorescent in situ hybridization (FISH) to be the dominant group of bacteria in most marine ecosystems, but its members are only rarely detected by DGGE [31] [32]. DGGE can also underestimate microbial diversity in that different amplicons with similar migration properties will not be resolved on the gel [33]-[35]. On the other hand, while sequences obtained from clone libraries are generally of better quality than those amplified from DGGE bands, the technique is biased towards gene fragments that are more compatible with the cloning system and leaves "difficult" fragments undetected [36] [37]. HTP sequencing of amplicon libraries yields a much higher number of sequences per sample than is possible for DGGE or practical for clone libraries. However, although the read length for amplicon library sequencing is increasing, it remains relatively short and can consequently yield imprecise or uncertain phylogenetic resolution. In addition, the higher rate of sequencing error in comparison to Sanger sequencing can mislead estimates of species richness and community diversity.
In this study, we used DGGE, clone libraries and HTP amplicon sequencing to investigate microbial (i.e., bacterial and archaeal) community composition in terrestrial groundwater collected at depths of 78-546 m from Olkiluoto Island, Finland. We aimed to determine the influence of classical (DGGE and clone libraries) and novel molecular tools (amplicon library sequencing) on the characterization of microbial communities and population diversity. We also investigated the effect of primer selection on the detection of bacterial groups in groundwater samples.

Materials and Methods
Sample preparation. Groundwater samples were obtained from several drillholes on Olkiluoto Island, Finland (61˚14'13"N, 21˚26'27"E). The site is described in detail in [4]. Samples were drawn at depths from 78-546 m below ground surface (mbgs) (Table 1). All samples except ONK-PVA3/83m were obtained with devices that enable samples to be retrieved at in situ pressure (as described in [4] [6]). Sample ONK-PVA3/83m originated from a free-flowing drillhole in the ONKALO tunnel (mbsl) (http://www.posiva.fi/en/final_disposal/onkalo#.U329rC-D2Go) and was collected anaerobically and aseptically into a sterile and anaerobic headspace vial (vol. 120 ml) sealed with a butyl rubber septum and aluminum crimp cap (Bellco Glass, NJ, USA). Groundwater was drawn into the glass vial via airtight poly-acetate tubing attached to a sterile hypodermic needle. The concentration of biomass and the protocol for DNA extraction were as described in [4]. In short, the microbial cells from each water sample were collected on 0.2 µm pore size nitrocellulose acetate filters (Corning Inc., NY, USA) by vacuum suction. Prior to DNA extraction the filters were cut into pieces of approximately 2 × 2 mm in a laminar fume hood with sterile scalpels. The DNA was extracted with the Power Soil DNA extraction kit (Mo-Bio Laboratories Inc., Solana Beach, CA, USA). The microbial biomass collected on the filter pieces was lysed by bead beating with a Ribolyser (Hybaid Ltd., Ashford, UK) device for 30 s at 6 m·s⁻¹, after which the DNA extraction proceeded according to the manufacturer's instructions.
The samples were subsequently submitted to different molecular biological methods (Table 2).
PCR. Two different primer pairs were used for amplification of bacterial 16S rRNA gene fragments. Primer pair P1 and P2 [26] was used to amplify a 193 bp long fragment covering the V3 variable region of the bacterial 16S rRNA gene, while primers U968f and U1401r [38] were used to amplify a 473 bp long fragment covering the V6-8 regions. For DGGE analysis, the forward primers were modified with a GC-clamp at their 5' end [26] [38].
PCR amplification was carried out in 50 µl reaction volumes containing 1 × Dynazyme® II buffer (10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl₂, 50 mM KCl and 1% Triton-X-100), 0.2 mM each deoxynucleoside triphosphate, 20 or 50 pmol of each primer for primer pairs P1/P2 and U968f/U1401r, respectively, and 1 U Dynazyme® II DNA polymerase (Finnzymes, Espoo, Finland). For primer pair P1/P2, 0.5 µl formamide was added to each PCR, and 10 µg BSA was added for PCRs with U968f/U1401r. As template, 2 µl undiluted DNA extract (approximately 1-3 ng DNA µl⁻¹) was used. The PCR program for P1/P2 consisted of an initial denaturation step of 5 min at 94˚C followed by 35 cycles of 1 min at 94˚C, 1 min at 55˚C and 1 min at 72˚C, followed by a final elongation at 72˚C for 5 min. The PCR program for U968f/U1401r consisted of 5 min initial denaturation at 94˚C followed by 30 cycles of 30 s at 94˚C, 20 s at 55˚C and 40 s at 72˚C and a final elongation at 72˚C for 7 min.
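For comparison with protocols that report primers as molar concentrations, the per-reaction primer amounts above can be converted as follows. This is a minimal sketch of the unit arithmetic only; the function name is illustrative and is not part of the original protocol.

```python
def pmol_to_micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Convert an amount in pmol dissolved in volume_ul microlitres to µM.

    1 pmol per µl is 1e-12 mol / 1e-6 l = 1e-6 mol/l = 1 µM,
    so the conversion reduces to a division.
    """
    if volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return amount_pmol / volume_ul


# Primer amounts per 50 µl reaction, as stated in the text.
p1p2_um = pmol_to_micromolar(20, 50)   # P1/P2
u968_um = pmol_to_micromolar(50, 50)   # U968f/U1401r

# Template mass: 2 µl of extract at approximately 1-3 ng DNA per µl.
template_ng = (2 * 1, 2 * 3)

print(f"P1/P2: {p1p2_um:.1f} µM of each primer")
print(f"U968f/U1401r: {u968_um:.1f} µM of each primer")
print(f"Template: {template_ng[0]}-{template_ng[1]} ng DNA per reaction")
```

Under these figures, each U968f/U1401r reaction contains 2.5 times more primer than a P1/P2 reaction.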
The archaeal 16S rRNA gene fragments were produced by a nested PCR [4]. An 806 bp fragment of the archaeal 16S rRNA gene was first amplified with A109f and Arch915R [39] [40]. The reaction mixture was as described above. The PCR program consisted of an initial denaturation of 5 min at 95˚C followed by 30 cycles of 1 min at 95˚C, 1 min at 54˚C and 2 min at 72˚C. In the second PCR, a 227 bp fragment was produced with A344F-GC (GC-clamp at the 5' end) and 519RP [40] [41] flanking the V3 hypervariable region of 16S rRNA. The reaction mixture was as in the first PCR except that 2 µl of purified amplification product was used as template. The PCR program was 5 min at 95˚C, 30 cycles of 1 min at 95˚C, 1 min at 50˚C, 1 min at 72˚C followed by 10 min at 72˚C. PCR products were visualized via electrophoresis in a 1.2% agarose gel stained with EtBr (0.2 µg·ml⁻¹). The gel was run in 1× TAE buffer for 1 h at 150 V and subsequently photographed in UV light with the Bio-Rad GelDoc imager (Bio-Rad, CA, USA).
DGGE. Bacterial and archaeal 16S rRNA amplicons were resolved by denaturing gradient gel electrophoresis (DGGE) in an 8% acrylamide/bis (37.5:1) gel. The denaturing gradient for amplicons obtained with primer pair P1/P2 and the archaeal primers A344FGC/519RP was 20%-65%, and 38%-60% for amplicons obtained with U968f/U1401r. Electrophoresis was performed at 60˚C for 19 h and 65 V for P1/P2 products, 16 h and 85 V for U968f/U1401r products, and 18 h and 65 V for archaeal products. Following electrophoresis, DGGE gels were stained with Sybr Green II (Lonza, Switzerland) according to the manufacturer's recommendations and visualized under UV light with a Bio-Rad GelDoc imager. Bands were cut from gels with sterile disposable plastic Pasteur pipettes. Each excised band was incubated in 20 µl ultra clean water overnight at 4˚C in order to elute the DNA. Eluted products were reamplified with the same primers and conditions as in the original PCRs and the resulting amplicons were purified with the Qiaquick PCR Purification kit (Qiagen, Germany). Sequencing was performed from both ends of the amplicon with the BigDye® Terminator v3.1 Cycle Sequencing kit (Applied Biosystems, California, USA) in the ABI Prism 310 Genetic Analyzer (Applied Biosystems, California, USA).
Clone libraries. Bacterial U968f/U1401r amplicons were used to construct clone libraries. Amplicons were purified with the QIAquick PCR purification kit (Qiagen, Germany) according to the manufacturer's protocol. The TOPO-TA cloning kit (Life Technologies, UK) was used for ligation and chemically competent DH5α Escherichia coli cells were used for transformation. Cloning was performed according to the manufacturer's instructions. Two parallel LB plates containing 50 µg·ml⁻¹ kanamycin were prepared for each cloning reaction and 40 µl X-gal (40 mg·ml⁻¹) was spread on the surface. Plates were pre-incubated for 1 h at 37˚C before 10 and 50 µl aliquots were taken from each reaction and plated. Plates were incubated overnight at 37˚C. Between 128 and 229 white clones from each reaction were checked for insertion by colony PCR. Bacterial cells from individual colonies were suspended in 50 µl sterile molecular grade water. Of this suspension, 2.5 µl was used as template in 10 µl PCR reactions. PCR reactions consisted of 1 × reaction buffer, 0.25 mM dNTP, 10 pmol each of vector-specific primers M13F and M13R and 0.25 U Dynazyme II. The amplification program consisted of an initial cell lysis and denaturation step of 5 min at 94˚C, followed by 35 cycles of 1 min at 94˚C, 1 min at 54˚C and 3 min at 72˚C, and a final extension step of 10 min at 72˚C. The colony PCR products were checked on a 1.2% agarose gel as described above. Ninety-five positive clones were randomly selected for growth in Luria broth containing 50 µg·ml⁻¹ kanamycin and plasmids were extracted using the QIAprep Spin Miniprep Kit (Qiagen). The clones were sequenced using the M13 primers and BigDye® Terminator v3.1 cycle sequencing kit in an ABI Prism 310 Genetic Analyzer.
Amplicon libraries. Amplicon libraries for HTP sequencing were prepared by PCR from OL-KR40/385m, OL-KR49/532m and OL-KR40/609m. For bacterial 16S rRNA gene libraries, P1/P2 with adapter and barcode sequence modifications at their 5' end were used in PCR (Table 2). For archaeal 16S rRNA gene libraries, a nested PCR approach was used with A109f and Arch915R [39] [40] in the first round, after which A344FGC/519RP modified with 5' adapter and barcode sequences were used for the production of the amplicon library.
The amplicon libraries for bacterial 16S rRNA were produced using the Dynazyme II polymerase (Finnzymes, Finland) in 50 µl reaction volumes as described above. The archaeal 16S rRNA gene amplicon libraries were generated in 50 µl reactions consisting of 1 × GC reaction buffer, 0.2 mM dNTP, 2.5 pmol of each primer and 1 unit of Phusion polymerase (Finnzymes, Finland). As template, 2 µl first-round PCR product was used. The PCR program for amplicon libraries consisted of an initial denaturation at 98˚C for 30 s followed by 35 cycles of 98˚C for 10 s, 55˚C for 15 s and 72˚C for 15 s and a final extension of 5 min at 72˚C. Amplicon libraries were sequenced at the Institute of Biotechnology using Roche 454 FLX technology.
Community analyses and phylogeny. Sequences obtained from clone and amplicon libraries were analyzed with Mothur [42]. First, the adapter and barcode sequences were removed from the amplicon library sequences, and the primer sequences were removed from both clone and amplicon library sequences. Sequences were then aligned to reference alignments composed of bacterial (14,956) or archaeal (2297) 16S rRNA gene sequences obtained from the Silva database [43]. Bacterial sequences were at least 95 bp long for the amplicon libraries and at least 320 bp long for the clone libraries. Archaeal amplicon library sequences were at least 90 bp long. All sequences that aligned to the reference were included in subsequent analyses. Sequences that did not align well (e.g., due to poor sequence quality) were discarded at this point.
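The trimming and minimum-length steps described above were carried out in Mothur. The following pure-Python sketch only illustrates the logic and does not reproduce Mothur's commands; the barcode and primer strings are hypothetical placeholders, not the sequences used in the study, and exact prefix matching is a simplifying assumption.

```python
from typing import Optional


def trim_read(read: str, barcode: str, primer: str, min_len: int) -> Optional[str]:
    """Strip a leading barcode+primer and enforce a minimum insert length.

    Returns the trimmed insert, or None when the read does not start with
    the expected barcode+primer or the remaining insert is shorter than min_len.
    """
    prefix = (barcode + primer).upper()
    read = read.upper()
    if not read.startswith(prefix):
        return None
    insert = read[len(prefix):]
    return insert if len(insert) >= min_len else None


# Hypothetical placeholders, NOT the barcodes or primers used in the study.
BARCODE = "ACGTAC"
PRIMER = "GGATTAGC"

reads = [
    BARCODE + PRIMER + "A" * 120,    # long enough: kept
    BARCODE + PRIMER + "A" * 40,     # insert too short: discarded
    "TTTTTT" + PRIMER + "A" * 120,   # unexpected barcode: discarded
]

# 95 bp is the minimum length reported for bacterial amplicon reads.
trimmed = (trim_read(r, BARCODE, PRIMER, min_len=95) for r in reads)
kept = [t for t in trimmed if t is not None]
print(f"{len(kept)} of {len(reads)} reads kept")
```

In practice a tolerance for barcode or primer mismatches is usually allowed; that is omitted here for brevity.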
Sequences at least 90% identical were combined and treated as Operational Taxonomic Units (OTUs). Representativeness of sequencing was tested by rarefaction analysis of the resulting OTUs. The Chao [44] and ACE [45] species richness estimates, the rarefaction curves and the Shannon [46] and Simpson [47] diversity indices were calculated for archaea and bacteria separately for each sample with Mothur [42].
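To make the indices named above concrete, the sketch below computes the Shannon index, a Simpson index and the bias-corrected Chao1 estimate from a list of OTU abundances. It is an illustration under stated assumptions, not the Mothur implementation: the exact estimator variants used by Mothur may differ, ACE is not covered, and the OTU counts are invented for the example.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1)).

    D is the probability that two sequences drawn without replacement
    belong to the same OTU, so lower values mean higher diversity.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Invented OTU abundances for one sample (sequences per OTU).
otu_counts = [40, 25, 10, 5, 2, 2, 1, 1, 1]

print(f"Observed OTUs: {len(otu_counts)}")
print(f"Chao1 richness: {chao1(otu_counts):.1f}")
print(f"Shannon H': {shannon(otu_counts):.3f}")
print(f"Simpson D: {simpson(otu_counts):.3f}")
```

With three singletons and two doubletons among nine observed OTUs, the Chao1 estimate for this toy sample is one OTU above the observed richness.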
Representative sequences of each OTU as well as the sequences obtained from DGGE were subjected to phylogenetic analysis. Sequence sets obtained with different primer pairs were analyzed separately. Sequences were aligned with appropriate references and the most similar sequences were identified from the NCBI database using the blastn tool as applied in Geneious Pro (Biomatters Ltd, New Zealand). Sequences were aligned in Geneious Pro using ClustalW and then checked and edited by eye before being trimmed to a region represented by all OTUs. The trimmed alignment was then realigned using Muscle with default settings included in Geneious Pro. Maximum likelihood trees were calculated using the PhyML algorithm [48] included in Geneious Pro. Support values for inferred branches were calculated with 1000 bootstrap pseudoreplicates.
In silico testing of primer coverage. The taxonomic coverage of the primers used in this study was tested against the 16S rRNA gene sequence database of the Ribosomal Database Project (RDP) release 11 [49] using the Probe Match tool. The search was restricted to the area covered by the primers, i.e., positions 341-534 for primers P1/P2, 960-1450 for U968f/U1401r, 109-934 for A109f/A934r and 344-354 for A344f/A519r, E. coli numbering. In addition, the primers were compared to the NCBI genome database using blastn.
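The idea behind such a coverage test can be sketched as counting the fraction of target sequences that contain a site matching the primer within an allowed number of mismatches. The code below is a simplified illustration, not the RDP Probe Match implementation: it scans the forward strand only (a reverse primer would first need to be reverse-complemented), and the primer and target sequences are invented toy data.

```python
from typing import Sequence

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Number of positions where the site base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def primer_hits(primer: str, target: str, max_mismatches: int = 0) -> bool:
    """True if any window of the target matches the primer within the tolerance."""
    primer, target = primer.upper(), target.upper()
    k = len(primer)
    return any(
        count_mismatches(primer, target[i:i + k]) <= max_mismatches
        for i in range(len(target) - k + 1)
    )


def coverage(primer: str, targets: Sequence[str], max_mismatches: int = 0) -> float:
    """Fraction of target sequences hit by the primer."""
    hits = sum(primer_hits(primer, t, max_mismatches) for t in targets)
    return hits / len(targets)


# Invented toy data: a degenerate primer (Y = C or T) and four target sequences.
toy_primer = "ACGYTG"
toy_targets = ["TTACGCTGAA", "TTACGTTGAA", "TTACGATGAA", "TTGGGATGAA"]

print(f"0 mismatches: {coverage(toy_primer, toy_targets, 0):.0%}")
print(f"1 mismatch:   {coverage(toy_primer, toy_targets, 1):.0%}")
```

Restricting the search to a position window, as done in the study, would amount to slicing each aligned target to the relevant coordinates before calling `coverage`.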
Statistical analyses. Pearson's linear correlation coefficients (r) between the presence or absence of bacterial taxa and the community profiling technique (DGGE, clone libraries, HTP sequencing), primer pair, number of identified sequences and geochemical parameters were calculated using PAST v. 3.0 [50] in order to determine which of these parameters had the greatest effect on the detected taxa.
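As an illustration of this kind of test, Pearson's r between a presence/absence vector and a coded explanatory variable can be computed as below. The analysis in the study was run in PAST; this sketch is not PAST's implementation, the binary coding of the technique (0 = DGGE, 1 = clone library) is an assumption made for the example, and the data are invented.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Invented data: six assays, coded 0 = DGGE and 1 = clone library (assumed coding).
technique = [0, 0, 0, 1, 1, 1]
# Presence (1) or absence (0) of a hypothetical taxon in each assay.
taxon_detected = [0, 0, 1, 1, 1, 1]

r = pearson_r(technique, taxon_detected)
print(f"r = {r:.3f}")
```

With two binary vectors this coefficient is equivalent to the phi coefficient; a significance test (p value) is not included in the sketch.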

Results
DGGE.
Bacteria. DGGE analysis of bacterial communities was performed with the two primer sets described above. U968f/U1401r [38] amplified 5-18 DGGE bands in each sample (Figure 1, Table 3(a), Table 4(a)), 2-7 of which were successfully sequenced per sample. P1/P2 [26] amplified 11-19 products per sample (Figure 1, Table 3(a), Table 4(a)), 3-10 of which were sequenced, but only a few of these sequences were of sufficient quality for subsequent analysis. Richness estimates were not calculated for DGGE results because of the relatively low number of bands and the difficulty of determining the abundance of a given DGGE band sequence.
Archaea. The archaeal community in samples OL-KR40/385m, OL-KR49/532m and OL-KR40/609m was characterized using nested PCR with A109f and Arch915R as first-round primers and A344FGC and 519RP in the second round. Between 8 and 23 DGGE bands were detected per sample (Table 3).
Clone libraries. Clone libraries of the bacterial 16S rRNA genes were constructed from three samples, ONK-PVA3/83m, OL-KR43/96m and OL-KR42/175m, with U968f/U1401r. Out of 96 clones chosen for sequencing from each clone library, good quality sequences were obtained from 88, 90 and 66 clones, respectively (Table 3).
Coverage of primers. The P1/P2 primer pair covered 81% of the bacterial 16S rRNA gene sequences of the RDP database spanning the predetermined positions of the primer pair when allowing for no mismatches between the primer and target sequences. The primers U968f/U1401r, on the other hand, covered less than 10% of the total number of bacterial 16S rRNA gene sequences spanning the predetermined position of the 16S rRNA genes. While P1/P2 matched between 79% and 99% of the 16S rRNA gene sequences of the different proteobacterial classes, U968f/U1401r matched only 0.1% of the β-proteobacteria and none of the other proteobacterial groups. In general, the primers U968f/U1401r matched between 0% and 0.5% of the sequences in the bacterial groups detected by P1/P2. However, without restricting the search to positions between nucleotides 960 and 1450, the detected diversity was much greater and included representatives of all proteobacterial classes and of all bacterial classes and phyla detected by P1/P2. By allowing a mismatch in primer U968f in position 9 and restricting the search to positions 960-1450, the detection increased to 15.2% of the bacterial 16S rRNA genes in the RDP, increasing especially the detectability of β-proteobacteria, Clostridia, Chlamydiae and Verrucomicrobia.
Statistics. The detection of the different bacterial groups did not correlate with the pair of bacterial primers used. Instead, the detection of Actinomycetes, Chloroflexi, Chlorobi, and Candidate Divisions OP3, OP11 and TM7 correlated positively (>0.6, p < 0.01) with the method of detection. These groups were detected from the clone libraries, but not by the corresponding DGGE. The detection of α-proteobacteria, Clostridia, Tenericutes, Erysipelotrichi, Chlamydiae and Spirochaetes, on the other hand, corresponded strongly (0.5-0.9, p < 0.05) with the number of sequences identified per sample. These bacterial groups were only detected by HTP or clone library sequencing. Of the geochemical parameters measured, Chloroflexi, Chlorobi, and Candidate Divisions OD1 and OP11 correlated with DIC (>0.5, p < 0.03); Verrucomicrobia, Chloroflexi, Chlorobi and Candidate Division OP11 corresponded with alkalinity (>0.5, p < 0.05); and Candidate Division OD1 correlated with ammonia (>0.5, p < 0.03). These correspondences were, however, clearly lower than those shown for the method of detection or the number of identified sequences.

Discussion
Primers used during PCR amplification of target gene regions and the techniques used to analyze their products can greatly influence the results of microbial community studies (e.g., [25] [51] [52]). In this study, we evaluated two commonly used primer sets for the amplification of bacterial 16S rRNA genes from deep groundwater samples in three alternative analytical approaches: DGGE, clone libraries and recently developed amplicon libraries for HTP sequencing. Primers P1/P2 gave a much better resolution of bacterial communities than primers U968f/U1401r. While U968f/U1401r in most samples mainly detected γ-proteobacteria in the DGGE analysis, P1/P2 also detected a variety of rarer bacterial lineages, such as Bacteroidetes and α-proteobacteria. In general, however, both primer sets amplified 16S rRNA genes from a range of distantly related bacteria when used for clone libraries or HTP amplicon sequencing (Figure 1, Table 3(a)). Although neither primer set had a statistically significant effect on the diversity of detected bacterial groups, sequences obtained with U968f/U1401r were generally of higher quality and longer than those obtained with P1/P2. Longer sequences are desirable because they provide greater phylogenetic resolution, which favors the use of U968f/U1401r for molecular screening of microbial diversity.
DGGE enabled a rapid visualization of differences in community composition among samples. Earlier research on the Fennoscandian deep subsurface has demonstrated its potential for monitoring changes in microbial community composition at different depths over time and for screening samples prior to the construction of clone libraries or metagenomic work [2] [5] [13]. However, despite the use of two universal bacterial primer pairs, DGGE appeared to be biased towards Proteobacteria, especially the γ- and β-groups. In fact, most of the bacterial community was not detected by DGGE, regardless of which primer set was used. Groups such as Nitrospirae, Candidate Divisions OD1, OP3, OP11 and TM7, Verrucomicrobia and the Chlorobi-Bacteroidetes group detected in the clone libraries with primers U968f/U1401r were not obtained in the DGGE analysis (Table 3). Similar results have been shown in aquatic environments, e.g., for the SAR11 α-proteobacteria [31] [32]. These microorganisms were not detected by DGGE but were inferred to be the dominant bacteria in the water column by FISH and clone libraries. According to [31], low primer affinity was the main reason for the low detection of SAR11. The authors also speculated that SAR11 is not readily detected by DGGE due to high microdiversity among similar species or strains that may cause a multitude of faint bands in DGGE that elude detection. This is in accordance with our DGGE results, where numerous bands were detected but only a few were strong enough to be isolated and reamplified. In addition, many DGGE bands were not successfully sequenced, which may bias the DGGE results towards the abundant and readily amplified taxa. Kisand and Vikner [30] also showed that using a broad denaturing gradient (20%-70%) resulted in so-called multi-domain melting profiles (i.e., smears) for 11 flavobacterial isolates but a defined DGGE band for γ-proteobacteria.
In comparison to DGGE, clone libraries revealed significantly greater bacterial diversity, sequences were of higher quality than those re-amplified from DGGE bands, and a higher number of sequences were obtained per sample. While only two sequences were obtained from each sample by DGGE, 23-45 different OTUs were observed from the clone libraries of the same samples. As such, clone libraries appear to be better able to characterize uncultured microbial communities in the deep terrestrial subsurface but can fail to detect unknown parts of the community due to host-vector-insert incompatibility.
As expected, amplicon library sequencing surpassed both DGGE and clone library analyses in terms of sequence coverage and microbial diversity detected. By obtaining a ten- to hundred-fold higher number of sequence reads from each sample, both dominant and rare microbial groups were detected. In accordance with DGGE performed with the same primers, γ- and β-proteobacteria dominated the amplicon libraries. However, other groups (e.g., Bacteroidetes) were detected by amplicon library sequencing at high abundances in samples OL-KR40/385m and OL-KR40/609m (7.9%-26%) yet were not detected by DGGE. Similarly, α-, β-, δ- and ε-proteobacteria, and Firmicutes were also detected in amplicon libraries but not in DGGE (Table 3(a)).
In this study, clone libraries and amplicon libraries were produced from different samples, which limits a direct comparison of the two methods. Nevertheless, both methods produced a statistically significant increase in the detection of specific bacterial groups. This agrees with two related studies by Itävaara et al. [2] [3], where the deep biosphere of Outokumpu was examined using the P1/P2 primer set for DGGE and amplicon library sequencing and U968f/U1401r for clone libraries. Despite diverse fingerprints of the deep subsurface bacterial community, only a few bands were successfully sequenced from DGGE. Most of these sequences belonged to Clostridia with only one sequence affiliating with β-proteobacteria [2]. In contrast, clone libraries showed that β-proteobacteria dominated the community close to the surface while Clostridia dominated at greater depths [3]. Results obtained with amplicon library sequencing corresponded well with those obtained with clone libraries, although the relative abundance of β-proteobacteria and Clostridia differed between methods.
In the case of archaeal communities, the results of amplicon sequencing and DGGE of the Olkiluoto samples investigated in this study agreed but were not identical. For example, Thermoplasmatales archaea were detected as a dominant group in all amplicon libraries, but were not detected by DGGE from sample OL-KR49/532m. Likewise, SAGMEG/SAGMA1 archaea were detected in all samples by DGGE but only by the amplicon library of OL-KR40/385m. This is in accordance with Nyyssönen et al. [5], where archaeal communities in the Outokumpu deep biosphere were studied using the same primers for DGGE and amplicon library sequencing as in this study.

Conclusion
We compared different PCR primers and molecular methods (i.e., DGGE, clone libraries and HTP amplicon library sequencing) used to survey the community of uncultured microorganisms in deep terrestrial fracture fluids.
Our results show that primer selection is of critical importance and, as yet, no single primer pair is able to produce a comprehensive picture of the microbial community in fracture fluids. In addition, different methods provide diverse results depending on their sensitivity and resolving power and should be used judiciously. While DGGE analysis can be used for rapid sample screening, it is often stated that taxa comprising <1% of the community fail to be detected. Our results, however, indicate that the detection limit may be even higher and that other factors such as PCR primers play a role in the detection rates or probabilities of certain groups. Clone libraries can provide longer sequences than DGGE or amplicon libraries, resulting in more precise classification of uncultured microorganisms, but this approach is laborious and expensive compared to amplicon sequencing. Amplicon library sequencing covers the microbial diversity most efficiently because a great number of sequences are obtained, and it may provide the most reliable picture of microbial community composition in deep subsurface environments.

Figure 2. Rarefaction curves of the sequences obtained from (a) the clone libraries of bacterial 16S rRNA genes with primers U968f/U1401r, (b) the amplicon library sequencing of bacterial 16S rRNA genes with primers P1/P2, and (c) the amplicon library sequencing of archaeal 16S rRNA gene fragments. The distance 0.1 was used.

Figure S1 .Figure S2 .
Figure S1. Maximum likelihood tree of the bacterial 16S rRNA gene sequences obtained from DGGE and clone libraries produced with primers U968f/U1401r. The DGGE bands are shown in red and the clone library sequences in blue. Bootstrap support values were calculated with 500 pseudoreplicates. Nodes with >50% bootstrap support are indicated.

Table 1. Geochemical and physical characteristics of the samples (courtesy of Posiva Oy).

Table 2. Assays used for each sample.

Table 3. The (a) bacterial and (b) archaeal diversity obtained by HTP amplicon sequencing or clone library sequencing in relation to DGGE. The numbers indicate the relative abundance of sequences for HTP amplicon sequencing and clone libraries, and the number of identified DGGE bands for DGGE.

Table 4. The number of identified sequences, diversity and richness indices of the samples characterized by HTP amplicon library sequencing and clone library sequencing in relation to DGGE of (a) bacteria and (b) archaea.