Metagenomic Shotgun Sequencing Provides Prevalence Data for Pathogens, and Source-Tracking Indices Useful in Public Health Risk Assessment of Environmental Waters

State-approved membrane filtration (MF) techniques for water quality assessments were contrasted with metagenomic shotgun sequencing (MSS) protocols to evaluate their efficacy in providing precise health-risk indices for surface waters. Samples from a freshwater receiving pond (ABI-1002) and two upstream storm water ditches (ABI-1003 and ABI-1004) yielded alarmingly high Fecal coliform MF densities of 220, >2000 and >2000 CFU/100ml, respectively. The indicator Enterococcus bacteria exceeded allowable limits in all but the equipment control (ABI-1001). Using MSS, the relative numerical abundance of pathogenic bacteria, virulence genes and antibiotic resistance genes revealed the status and potential pollution sources of each ditch. High levels of Shigella sp. (0 (ABI-1001), 4945 (ABI-1002), 55,008 (ABI-1003), and 2221 (ABI-1004) genomic reads/100ml) correlated with virulence genes and antibiotic resistance genes found in fecal samples for ABI-1003 but not for ABI-1004. Traditional culture methods (TCM) showed possible fecal contamination in two of the four samples, and no contamination in the others. MSS clearly distinguished between fecal and environmental bacterial contamination sources, and pinpointed actual risks from pathogens. Our data underscore the potential utility of MSS in precision risk assessment for public and biodiversity health, and in the tracking of environmental microbiome shifts by field managers and policy makers.


Introduction
For over 50 years, microbiological risk assessment of environmental samples has been based on monitoring the prevalence of indicator organisms that are generally not harmful, but indicate the possible presence of pathogenic bacteria, viruses and protozoans (EPA 822-R-10-005, 2010). The widely used indicators include easily cultured heterotrophic bacteria: Enterococcus spp., Escherichia coli, Fecal coliforms and, more recently, Bacteroidetes spp., Clostridia spp., some phages and biomarker genes (Esiobu et al., 2004; Wade et al., 2006; Noble et al., 2003; Coakley et al., 2015). The current regulations of surface water quality standards for Florida are referenced in document 62-302 (Water Quality Standards Variances. Epa.gov). Enteric organisms such as E. coli, Fecal coliforms, and Enterococci spp. are the only well-regulated indicators of microbiological health risks. The safety thresholds vary and are based on the class of water and the organism being tested. The sensitivity and specificity of the indicator detection systems have been continually revised and improved with advances in chromogenic media and one-step assays (Odonkor & Ampofo, 2013; Ferguson et al., 2013). However, numerous limitations remain. Reviews and discussions of these challenges abound in the literature (Rochelle-Newall et al., 2015; Tan et al., 2015; Figueras & Borrego, 2010; Evangelista & Coburn, 2010), and include, but are not limited to, questions about the suitability of enteric indicators for respiratory/skin illnesses, the interpretation of prevalence data in non-point source environments, the difficulty in distinguishing between environmental strains and real indicators (fecal strains), the lack of indicators for biodiversity in preserved areas and, finally, the utility of the numbers (Esiobu et al., 2013).
The US Environmental Protection Agency (EPA) manual for monitoring (EPA 5.11 Fecal Bacteria, 2012; EPA-820-R-14-010, 2014; EPA-820-R-14-011, 2014) justified the indicator system because "it is difficult, time-consuming, and expensive to test directly for the presence of a large variety of pathogens; water is usually tested for coliforms and fecal Streptococci instead". In some instances, these relatively inexpensive culture techniques provide data considered sufficient to accurately assess (at least in part) the public health risk of water samples. On the other hand, interpretation of results from non-point sources of pollution is not always clear-cut. To address the many challenges that remain unsolved, emerging technologies such as metagenomic shotgun sequencing, which can detect virulence genes and all microbial life forms (viruses, bacteria, fungi and protozoans) in a single assay, are being developed. These permit a direct rather than an indirect assessment of public health risks. Techniques such as amplicon metagenomic analysis using 16S rRNA pyrosequencing (Gomez-Alvarez et al., 2012) are being used to rapidly and effectively monitor different disinfection treatments of drinking water samples. Similarly, Cabral et al. (2018) employed sequencing technologies to successfully characterize microbial communities in waterways, while other recent studies (Mohiuddin et al., 2017; Roy et al., 2018; Cocolin et al., 2018; Li et al., 2018) have demonstrated the utility of next-generation sequencing in providing insight into the complex problems associated with surface water quality, and in the direct detection of possible pathogenic organisms in beach sand. Furthermore, whole genome sequencing has been employed to study functional genes in potable water treatment systems and associated biofilms (Douterelo et al., 2018).
Wide-scale use of these emerging sequencing technologies for environmental monitoring will require experimentally modelled interpretations of the big data generated, as well as comparative analysis of the results with validated culture methods and other epidemiological indices. In this study, we determined and compared the microbiological water quality of four samples using the traditional EPA standard methods and metagenomic shotgun sequencing of genomic DNA extracted directly from water, to evaluate their relative efficacy in predicting actual health risks to the public and to biodiversity in the environment.

Sampling
Four water samples were collected aseptically in September 2016 from: (A) deionized water used to ensure sampling equipment was contaminant free (ABI-1001; quality control), (B) a receiving freshwater pond used for recreational activities (ABI-1002), (C) a storm water ditch whose effluent enters the freshwater pond (ABI-1003), and (D) a second storm water ditch located furthest from the pond, whose effluent merges with that of ABI-1003 before emptying into the pond (ABI-1004). The exact locations of the sample points are kept confidential for privacy purposes. All samples were collected at the same event after rainfall to ensure the storm water ditches held water. Fecal coliform membrane filtration and Enterococcus membrane filtration samples were collected in 100 ml sterile vessels containing sodium thiosulfate, sufficient to neutralize any free chlorine present. Metagenomic shotgun sequencing samples were collected from the same locations, but in 1 L amber bottles (sterilized by rinsing with 10% bleach) with no preservatives. All samples were placed on ice immediately after collection and stored at 4˚C in the laboratory until analysis.

Fecal coliform Enumeration
Fecal coliform bacteria were enumerated following Standard Methods SM 9222D (Clesceri et al., 1998). Filtration volumes were based on multiple years of historical data for these sample sites, and results were reported in Colony Forming Units (CFU)/100ml. Two dilutions were prepared for each sample (10 ml and 50 ml) following the NELAP protocol of recovering 20-60 CFU.
Blanks were also prepared using sterile deionized (DI) water to ensure sterility of the membrane filter equipment. All samples were processed within 8 hours of collection, per Florida Department of Environmental Protection (DEP) guidelines.
Enterococcus spp. were detected using the EPA 1600 method for enumeration of Enterococci from water (EPA-821-R-06-009, 2006). Dilutions were carried out as for Fecal coliforms above, based on multiple years of historical data for these sample sites, and results were reported in CFU/100ml. Two dilutions were prepared for each sample (10 ml and 50 ml) following the Standard Methods protocol of recovering 20-60 CFU. Blanks were also prepared using sterile DI water to ensure sterility of the membrane filter equipment. All samples were processed within 8 hours of collection, per Florida DEP guidelines.
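The conversion from a membrane plate count to a reported density in CFU/100ml can be sketched as below. This is a minimal illustration, not the laboratory's actual worksheet: the function names, the plate-selection rules and the colony counts in the example are hypothetical, and only the 10 ml and 50 ml volumes and the 20-60 CFU range come from the text.

```python
from typing import Dict, Tuple

# Countable range per membrane quoted in the text (20-60 CFU).
IDEAL_RANGE: Tuple[int, int] = (20, 60)


def cfu_per_100ml(colonies: int, volume_ml: float) -> float:
    """Scale a colony count on one membrane to CFU per 100 ml of sample."""
    if volume_ml <= 0:
        raise ValueError("filtered volume must be positive")
    return colonies * 100.0 / volume_ml


def reportable_density(plates: Dict[float, int]) -> float:
    """Pick a reportable density from several membranes.

    `plates` maps filtered volume (ml) to the colonies counted on it.
    """
    low, high = IDEAL_RANGE
    in_range = [(v, c) for v, c in plates.items() if low <= c <= high]
    if in_range:
        # Assumed rule: if several plates qualify, use the largest volume.
        volume, colonies = max(in_range)
        return cfu_per_100ml(colonies, volume)
    # Assumed fallback: pool all membranes when none is in the ideal range.
    return cfu_per_100ml(sum(plates.values()), sum(plates.keys()))


if __name__ == "__main__":
    # Hypothetical counts for the 10 ml and 50 ml filtrations of one sample.
    print(reportable_density({10: 22, 50: 110}))
```

With the hypothetical counts shown, only the 10 ml membrane falls inside the 20-60 range, so the density is computed from that plate alone.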

Metagenomic Shotgun Sequencing
Five hundred (500) ml of each water sample was thoroughly mixed and vacuum filtered through 0.2 µm pore-size polycarbonate filters, and the remaining 500 ml was filtered as a replicate. All cells were washed off and concentrated by placing the polycarbonate filter into a 50 ml centrifuge tube containing 5 ml of sterile DI water and vortexing. The suspension of cells was then harvested by ultra-centrifugation at 15,000 rpm in 1.5 ml aliquots.
The pelleted cells were pooled into one tube before lysis and DNA extraction.

Results
Results obtained from this study are all reported as CFU/100ml unless otherwise stated, and the samples are labeled ABI-1001 to ABI-1004, with ABI-1001 being the equipment control (blank), as previously mentioned.
[Figure 1: Fecal coliforms and Enterococci enumerated using traditional culture techniques, SM 9222D and EPA 1600.]
The smallest volume filtered for these samples was 10 ml, which sets the upper reporting limit at 2000 CFU/100ml. All results at 2000 CFU/100ml are therefore presumed to be greater than 2000 CFU/100ml, because the colonies on those plates were too numerous to count. Results reported for these culturing techniques are based on the most accurate number calculated from the volume of sample filtered. It is noteworthy that while Fecal coliforms (a nearly ubiquitous enteric group of bacteria in animals) exceeded detection limits in ABI-1003 and ABI-1004, enterococcal levels contrasted, being significantly lower in the latter.
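The reporting ceiling described above follows from simple arithmetic, sketched here. A 2000 CFU/100ml ceiling at a 10 ml filtration implies an upper countable limit of 200 colonies per membrane; that figure is an assumption inferred from the numbers in the text, and the function names are hypothetical.

```python
from typing import Optional

# Assumption: upper countable limit per membrane, inferred from the
# 2000 CFU/100 ml ceiling at a 10 ml filtration volume.
MAX_COUNTABLE_COLONIES = 200


def upper_reporting_limit(smallest_volume_ml: float,
                          max_countable: int = MAX_COUNTABLE_COLONIES) -> float:
    """Highest density (CFU/100 ml) that can be reported as a number."""
    if smallest_volume_ml <= 0:
        raise ValueError("filtered volume must be positive")
    return max_countable * 100.0 / smallest_volume_ml


def report(colonies_on_smallest_volume: Optional[int],
           smallest_volume_ml: float = 10.0) -> str:
    """Format a result from the smallest-volume plate.

    Pass None for a plate that was too numerous to count.
    """
    limit = upper_reporting_limit(smallest_volume_ml)
    if (colonies_on_smallest_volume is None
            or colonies_on_smallest_volume >= MAX_COUNTABLE_COLONIES):
        # Results at the ceiling are presumed to exceed it.
        return f">{limit:.0f}"
    density = colonies_on_smallest_volume * 100.0 / smallest_volume_ml
    return f"{density:.0f}"


if __name__ == "__main__":
    print(report(None))  # a plate too numerous to count
    print(report(22))    # hypothetical countable plate
```

Filtering a smaller volume would raise the ceiling proportionally, which is why values reported as >2000 CFU/100ml carry no information about how far above the limit a sample lies.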
The numbers in the receiving pond and control samples were within expected levels. [...] Fecal coliforms in ABI-1003; only 499 for ABI-1004 and more than 3000 for the receiving pond. Although the equivalence of genomic reads and colony forming units is not necessarily linear, the remarkable difference between the two techniques underscores the challenge with the sensitivity and specificity of risk assessment assays.
In Figure 3, the prevalence of some potentially pathogenic bacterial species found in each sample is presented in genomic reads/100ml of sample to allow comparison with results obtained from traditional methods. Shigella sp. were detected at the following densities: 0 (ABI-1001), 4945 (ABI-1002), 55,008 (ABI-1003), and 2221 (ABI-1004) genomic reads/100ml. With the exception of Pseudomonas aeruginosa, which is ubiquitous in nature and resides in soils, water, and on human skin, sample ABI-1003 contained 100 to 1000 times more potential pathogens than ABI-1004, even though the traditional indicator culture approach incorrectly rated both samples as similarly polluted (Figure 1). The pathogens detected in high numbers include Vibrio cholerae, Staphylococcus spp. and other enteric organisms. None of the common pathogens was detected in the receiving pond.
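Between-sample comparisons of read densities can be expressed as simple fold differences, as in the sketch below. The Shigella sp. values are the only per-sample read densities quoted in the text, so they are used here; the other taxa in Figure 3 would need their own values. The function name and the handling of a zero-read sample are assumptions made for illustration.

```python
from typing import Dict, Optional

# Shigella sp. densities quoted in the text (genomic reads/100 ml).
SHIGELLA_READS: Dict[str, int] = {
    "ABI-1001": 0,
    "ABI-1002": 4945,
    "ABI-1003": 55008,
    "ABI-1004": 2221,
}


def fold_difference(reads: Dict[str, int],
                    numerator: str,
                    denominator: str) -> Optional[float]:
    """Ratio of read densities between two samples.

    Returns None when the denominator sample has zero reads, since the
    ratio is undefined in that case.
    """
    base = reads[denominator]
    if base == 0:
        return None
    return reads[numerator] / base


if __name__ == "__main__":
    for other in ("ABI-1001", "ABI-1002", "ABI-1004"):
        ratio = fold_difference(SHIGELLA_READS, "ABI-1003", other)
        label = "undefined (no reads)" if ratio is None else f"{ratio:.1f}x"
        print(f"ABI-1003 vs {other}: {label}")
```

Because read counts and colony counts are not necessarily on a linear common scale, such ratios are best read as relative rankings between samples rather than as absolute risk multipliers.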
In addition to specific pathogens and indicator genera, a single MSS assay also detects virulence and antibiotic resistance genes.
[Figure 4: (a) virulence genes and (b) the top 50% of antibiotic resistance genes detected in each sample. The more maroon the bar, the more frequently that gene was detected.]
Figure 4(b) displays a strikingly important distribution of antibiotic resistance genes associated with the samples. ABI-1003, again, had much more diversity and prevalence of antibiotic resistance genes than could be included in this chart. There is a clear correlation between virulence/antibiotic resistance genes and the density of Shigella sp. and other potential pathogens presented in Figure 3. The absence or relatively low density of these genes in sample ABI-1004 is congruent with the results displayed in Figure 2 and Figure 3, but contrasts sharply with the traditional culture results, which recorded a high abundance of Fecal coliforms and enterococci (Figure 1).
The relative occurrence of bacteria phyla detected in all samples is shown in

Traditional Culture Methods
Using traditional culturing methods alongside metagenomic shotgun sequencing allowed us to compare these methods for the first time and to relate microbial community indices to membrane plate count results. The traditional culturing methods (Figure 1) showed that samples ABI-1003 and ABI-1004 exceeded the Florida Fecal coliform limit (not to exceed 800 CFU/100ml on any one day), as determined by SM 9222D for Fecal coliform organisms (Clesceri et al., 1998).
Furthermore, ABI-1002 showed relatively low concentrations of traditional indicator organisms, and traditional test results were within the acceptable criteria of Florida surface water limits. These methods are beneficial because they are cost effective and produce results quickly. Although this information is quick and easy to obtain, traditional indicator organisms tend to adapt and survive for prolonged periods in the humid and warm Florida climate (Bonilla et al., 2007; Hartz et al., 2008). In addition, the phenotypic and biochemical assays are prone to yield false positive results because of the enormous versatility of environmental bacterial metabolism. Many environmental factors can also influence the results of these methods, such as runoff of soil from rainfall, debris entering the sampling point, and natural conditions such as shade from sunlight.
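The comparison against the single-day Fecal coliform limit can be sketched as follows, using the MF densities reported in the abstract (220, >2000 and >2000 CFU/100ml). The function names are hypothetical, and the treatment of censored ">" values is an assumption about how such results would be handled, not a rule taken from the regulation.

```python
from typing import Dict, Optional, Tuple

# Single-day limit quoted in the text (CFU/100 ml).
FECAL_COLIFORM_LIMIT = 800.0

# Fecal coliform MF densities reported in the abstract.
FECAL_COLIFORM_MF: Dict[str, str] = {
    "ABI-1002": "220",
    "ABI-1003": ">2000",
    "ABI-1004": ">2000",
}


def parse_density(value: str) -> Tuple[float, bool]:
    """Split a reported value into (number, is_censored_above)."""
    text = value.strip()
    censored = text.startswith(">")
    return float(text.lstrip(">")), censored


def exceeds_limit(value: str,
                  limit: float = FECAL_COLIFORM_LIMIT) -> Optional[bool]:
    """True/False when decidable, None when a censored value is inconclusive."""
    number, censored = parse_density(value)
    if censored:
        # The true density is above `number`; that settles the question
        # only when `number` is already at or above the limit.
        return True if number >= limit else None
    return number > limit


if __name__ == "__main__":
    for sample, density in FECAL_COLIFORM_MF.items():
        print(sample, density, exceeds_limit(density))
```

A check of this kind reproduces the culture-based verdict only; it says nothing about whether the organisms counted are of fecal or environmental origin.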

Metagenomic Shotgun Sequencing
Metagenomic shotgun sequencing yielded a plethora of data not discernible from the traditional culture methods used in this study. MSS results enabled the construction of a clear risk assessment from more than one layer of evidence: human pathogens, human pathogen relative abundance, antibiotic resistance genes, virulence genes, viruses, and all other microorganisms such as environmental bacteria. The relative abundance of human-associated pathogens can indicate the source and potential risk of an environmental sample (Wade et al., 2006). All of the parameters obtained in a single test (including strain-level detection) allowed for potential source-tracking of any fecal pollution present in a given sample. Metagenomic shotgun sequencing can also be applied to define a microbiome for particular environmental water bodies.
Understanding the microbiome condition of a water body could allow for fewer repeats of traditional culture testing and create a more comprehensive fingerprint for long-term monitoring of changes in user behaviors and climate. Most of the bacteria found in sample ABI-1003 in Figure 3 are enteric bacteria of fecal origin. Closer scrutiny revealed that only one Bacteroides sp. was found in ABI-1003, the classic Bacteroides fragilis str. S6L5. This strain of Bacteroides fragilis is also possibly associated with human gastrointestinal disease, although research is still being conducted. No specific human Bacteroides sp. [...] detected. Another interpretation of the apparent discrepancy is that the organisms were well below risk levels and within safety thresholds, so failing to detect such microbes would not alter the utility of MSS. Nevertheless, experimental determination of the actual threshold sensitivity of the test will enable its large-scale application in food and environmental quality control.

Benefits and Downfalls of TCM and MSS
Unlike TCM, metagenomic shotgun sequencing couples multiple layers of risk assessment indices obtained concurrently, allowing predictions to be made with a very high level of confidence. The profile of virulence genes detected in ABI-1004 is more diverse and abundant than that of the relatively uncontaminated ABI-1002, and both samples were completely devoid of the genes present in ABI-1003. When coupled with the data in Figure 3, it becomes apparent that ABI-1004 contained non-pathogenic, naturally occurring organisms. Given the larger dataset of pathogenic bacteria, virulence genes, and antibiotic resistance genes associated with human enteric organisms, MSS flags ABI-1003 as a suspect. The utilization of new technologies to assess public health impacts from water bodies is an overlooked benefit.

Conclusion
This study confirmed the gross limitations of the traditional culture gold standard for assessing the quality of surface waters, including false positives for indicator bacteria and the lack of source-tracking components. The numbers of real positives did not always correlate with actual pathogen presence. However, TCM is a relatively inexpensive assay with a short turnaround time. Metagenomic shotgun sequencing technology, on the other hand, provided powerful resolution for a water body where state-mandated thresholds for safe recreation were exceeded, and for which a non-point or point source was not clearly evident.
Multiple risk assessment parameters obtained from the relatively expensive MSS analysis include specific identification/quantification of all microbiological entities, virulence genes and antibiotic resistance genes, allowing for a robust health risk evaluation in one step. In addition, the comprehensive data on the microbiomes of the niche provide an important baseline reference for early detection and intervention in cases of anthropogenic or climate change perturbation. The need to define and standardize the sensitivity of MSS (assuring the detection of only viable bacteria, even at low relative abundance) is critical to its wide application in environmental health management. Until the high cost of the MSS is

Authors' Contributions
BM organized sample collection, assisted with wet-lab microbiology, and contributed to data analysis and writing of the manuscript. KD helped with preparation of metagenomic DNA samples. LM assisted with sample collection and contributed to manuscript write-up and data presentation. NE conceived and designed the study, supervised experimental procedures, directed data analysis and interpretation, and helped with writing of the manuscript.