Application of High-Throughput Sequencing : Discovery of Informative SNPs to Subtype Bacillus anthracis

Single Nucleotide Polymorphisms (SNPs) are the most common and abundant genetic variation found in the genome of any living species, from bacteria to humans. In bacterial genotyping, these evolutionarily stable point mutations represent important DNA markers that can be used to elucidate deep phylogenetic relationships among worldwide strains, but also to discriminate closely related strains. With the advent of next generation sequencing (NGS) technologies, affordable solutions are now available to get access to the complete genome sequence of an organism. Sequencing efforts of an increasing number of strains provide an unprecedented opportunity to create comprehensive species phylogenies. In this study, a comparative analysis of 161 genomes of Bacillus anthracis has being conducted to discover new informative SNPs that further resolves B. anthracis SNP-based phylogenetic tree. Nine previously unpublished SNPs that define major groups or sub-groups within the B. anthracis species were selected and developed into SNP discriminative assays. We report here a cost-effective high-resolution melting-based genotyping method for the screening of 27 canonical SNPs that includes these new diagnostic markers. The present assays are useful to rapidly assign an isolate to one sub-lineages or sub-groups and determine its phylogenetic placement on the B. anthracis substructure population.


Introduction
Bacillus anthracis is a zoonotic pathogen that primarily affects grazing herbivores, but can be used as an agent of bioterrorism.This Gram-positive endospore-forming bacterium is the causative agent of anthrax, an infectious and life threatening diseases in all mammals, including humans.B. anthracis evolved as a highly fit clonal pathogen that has spread throughout the world and became ecologically established as local, genetically distinct populations [1].
In both microbiological forensics and epidemiological investigations, obtaining reliable information regarding the identification and source of a suspicious strain is often essential to set up effective responses, as highlighted by the Amerithrax investigations following the so-called 2001 "anthrax letter attacks" in the United States [2].The principal genomic markers used for genotyping a highly monomorphic bacterium such as B. anthracis are variable number tandem repeats (VNTRs) [3]- [5], single nucleotide repeats (SNRs) [6]- [8] and SNPs [9].Since 2007, the genotyping of a set of 13 canonical SNPs has emerged as the gold standard method to be used for determining phylogenetic relationships within this bacterium [10]- [13].SNPs are generated by nucleotide substitutions, probably associated to errors during DNA replication that are not repaired.These rare events occur at an estimated rate of approximately 10 −10 changes per nucleotide per generation in the B. anthracis species [14].Because of their low mutation rates, SNPs represent ideal targets for phylogenetic analyses.SNPs are evolutionarily stable markers that are unlikely to mutate back to their ancestral state [9].This stability can be used for defining canonical markers representative of groups of related strains, such as phylogenetic sub-groups [9].
The substructure of the B. anthracis population is divided into three major clades (A, B and C), with further subdivisions into 13 major lineages or groups [10] [12] [13].The A-radiation has known the most dramatic dispersal and clonal expansion.It represents nowadays the majority of anthrax cases reported around the world [3] [15] [16].The eight A-lineages or groups exhibit distinct geographical patterns across the five continents.Most basal of the A-clade is the African A.Br.005/006 group, consistent with an "out of Africa" hypothesis for B. anthracis.Soon after this divergence, the Vollum-clade appeared, with isolates currently ecologically established in Iran, Pakistan and Afghanistan, for instance.The highly successful TransEurasian (TEA) group (A.Br.008/009) is slightly more recent and spread across Europe and Asia from which a derived lineage (A.Br.WNA) was introduced into North America.Another recent bifurcation led to the A.Br.003/004, A.Br. Australia-94 and Ames/Sterne (A.Br.001/002)groups which contain isolates found, respectively, in South America (A.Br.003/004),Asia and part of Europe (A.Br.Aust94 and A.Br.001/002) [1] [10].Collections from western and south-central Asia (Turkey, India) and western China are dominated by genotypes belonging to the A.Br.Aust94 lineage.Further into central and eastern China the genotypes are dominated by isolates belonging to the A.Br.001/002 group and its derived A.Br.Ames lineage.The B clade is divided into two genetically distinct lineages with geographical restricted repartition.The B.Br.001/002 group (including the derived B.Br.Kruger lineage) is ecologically established in Southern Africa, in particular in the Kruger National Park (South Africa), where it co-exists with other strains from the A-clade [10] [17].The B.Br.CNEVA lineage is exclusively found in continental Europe [10] [18]- [21].The C-clade is composed of an uncommon lineage, C.Br.A1055, clustering only three collection's isolates of unknown origin [10] [22].
Whole genome sequencing is the ultimate genotyping tool, giving the highest possible resolution, with comparison information spanning from distant phylogenetic relationships to the highest level of subtyping.Identification of thousands of SNPs retrieved from compiled NGS sequences facilitates high-resolution strain tracking and provides the level of discrimination required for microbial forensics.Canonical SNP typing and whole genome comparisons of multiple strains will soon become the method of choice in B. anthracis genotyping [23].We have recently reported the high throughput sequencing and phylogeographical analysis of 122 strains of B. anthracis isolated in France [24].The resulting whole-genome SNPs analysis determined relationships existing between French strains at a level of discrimination never reached before and reconstructed the French population substructure in details.Eight new canonical SNPs have been identified that allow classifying French B. anthracis isolates at a higher resolution [24].In this study, we compared the chromosomal sequences of 161 globally diverse strains, focusing on identifying additional informative SNPs, to provide further resolution into current SNP-based typing of B. anthracis.Twenty-seven SNP discrimination assays, including nine newly described markers, are reported.The PCR-HRM-based method presented here is a cost-effective tool to genotype the highly clonal B. anthracis species.

Extraction of Whole-Genome SNPs among B. anthracis Strains
In this study, the genome of one African A.Br.005/006-affiliated strain (IEMVT 89) and a hundreds of French strains (n = 123) belonging to 3 groups or lineages (A.Br.001/002,A.Br.011/009 (A.Br.008/009sub-group), and B.Br.CNEVA) [24] were whole-genome compared to thirty-seven globally diverse genomes publically available in database [25] (Table 1).Ames Ancestor [GenBank: AE017334.2]was used as reference genome.Sequences alignment and whole-genome SNPs discovery were carried out using the BioNumerics version 7.0 software (Applied Maths, Belgium) and its Chromosome Comparisons module.SNPs were next filtered to avoid erroneous and biased data.Missing sequences, ribosomal operons, VNTR loci and contiguous SNPs in a 10 pb window were eliminated from the analysis.

Phylogenetic Analysis
A minimum spanning tree was drawn in BioNumerics by using the filtered whole genome sequencing SNP data as input.Nodes were automatically numbered by the software.Canonical SNPs that defining major group, sub-groups or clusters of strains along the SNP tree were identified by searching for key signatures with allelic states shared only by these selected subgroups.

SNP Discrimination Assays by HRM
After identification, primer sequences suitable for HRM analysis were designed for each SNP marker using the Primer 3+ software (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/)(Table 2).Short amplicons (<100 pb) were selected.High-resolution melting (HRM) is a post-PCR technique that determines with high precision the melt profile of PCR products.A new generation dye is incorporated into a double-stranded DNA.Using a slow constant increase in temperature, fluorescence acquisition allows the distinction between two different populations of amplicons.Amplification was performed on the ViiA7™ Real-Time PCR System (Life Technologies) using the Light-Cycler ® 480 High Resolution Melting Master Mix (Roche Diagnostics).The reaction mixture consisted of 0.2 μM of each primer, 1 × LightCycler ® 480 HRM master mix and 2.5 mM MgCl 2 in a 10 μl final volume.The following parameters were used: 10 min at 95˚C were followed by 40 cycles consisting of 10 s at 95˚C, 10 s at 58˚C and 20 s at 72˚C.Samples were next heated to 95˚C for 30 s, cooled down to 65˚C for 1 min and heated from 65˚C to 88˚C at a rate of 1˚C/s with 25 acquisitions/˚C.HRM data were analyzed by the ViiA7™ Software (version 1.2.1).

In Silico SNP Typing
Thirty-seven B. anthracis genomes available in public NCBI database or Sequence Read Archive (SRA) were used for this study [25] (Table 1).These globally diverse genomes were first subjected to in silico canSNP typing using the 14 canSNPs references set previously published [10] [12] to determine their phylogenetic placement on the B. anthracis canSNP tree (Table 1).French strains were previously shown to belong to A.Br.011/009 (a sub-group of the TEA population) (n = 32), A.Br.001/002 (n = 24) and B.Br.CNEVA (n = 67) lineages or groups.The African IEMVT 89 strain is affiliated to A.Br.005/006 [24].

Phylogenetic Analysis
A total of 161 genomes of B. anthracis were aligned to the Ames ancestor reference sequence and whole-genome compared with a focus on SNPs discovery.SNPs that define the major A.Br.Aust94 lineage (four genomes), A.Br.003/004 group (one genome) and A.Br.005/006 group (two genomes) were identified and one canonical marker selected for each subpopulation (Table 2, Figure 1).Interestingly, the whole-genome SNP analysis further resolved the A.Br.001/002 subpopulation (32 genomes) into two distinct phylogenetic sub-groups which were named A01 and A02 (Figure 1, inset A).The A01 sub-group (also termed "Ames sub-group") radiates very shortly after the A01-A02 divergence (1 SNP at position 515111) into at least 5 sub-branches.The first one contains the A.Br.Ames lineage (strains Ames Ancestor, Ames, A2012 and A0248).The second one is composed by the A0389 strain.The third one includes the Japanese strain Ba103 and one old French strain (CIP A211).The two last branches are composed of, respectively, two old French isolates (CIP 81.89 and CIP 53.169), and the A16 Chinese strain.The A02 sub-group (also termed "Sterne sub-group") clusters the Sterne vaccine strain and all recent strains isolated in France from the Doubs department, including the 08-8_20 genome [24].Noteworthy to mention, the Carbosap vaccine was found to be phylogenetically related to five French genomes positioned within the third branch of the six sub-branches previously discovered within the A.Br.011/009 sub-groups.SNPs specific to these A.Br.011/009 branches were also selected (Table 2, Figure 1, inset B).

Discussion
Cost is an important issue for all diagnostic assays.In this respect, real-time PCR platforms and high-resolution DNA melting analysis (HRM) are attractive technologies for SNP-interrogation.HRM techniques can determine with high precision the melt profile of PCR products using accurate fluorescence data acquisition over small temperature increments [26].Two standard primers are used to amplify short segments flanking each SNP.The melting profile of the resulting amplicon is characteristic of its GC content, i.e., a substitution of a G or C to an A or T reduces the melting temperature (Tm) while a substitution of an A or T to a G or C increases the Tm.
Based on its simplicity, low cost, non-destructive nature, high sensitivity and specificity, the popularity of HRM analysis has grown considerably in the last few years.HRM analysis is an effective alternative to other PCR-based methods such as Dual Probe TaqMan PCR assays [27] and Mismatch Amplification Mutation Assays (MAMA) [28] [29].HRM is an ideal format for scoring a small number of SNPs.PCR-HRM assays can be rapidly designed around SNPs to determine the extent to which each marker varies in the B. anthracis population.
Over the last decade, significant research efforts have been undertaken to develop genotyping methods with increased resolving power for B. anthracis strain differentiation.The absence of horizontal transfer of genetic material, a slow rate of accumulation of mutations and the paucity of large genome rearrangements within the genome of B. anthracis [9] make SNPs markers of choice for genotyping a species.The advent of next-generation sequencing technologies and comparative genome analyses offers a powerful approach for identifying rare polymorphisms for the unbiased typing of highly clonal pathogens.Since 2007, and the identification by Van Ert et al. of 13 "canonical" SNPs representing key phylogenetic positions along the B. anthracis phylogenetic SNP tree, a growing number of diagnostic SNPs has been discovered and used to survey B. anthracis diversity in nature [11] [24] [30]- [32].In this study, a same but large-scale approach, comparing 161 B. anthracis genomes of diverse origins, were conducted to discover novel informative SNPs that can further discriminate within lineages or groups of strains.Nine novel DNA signatures were selected and developed into SNP discrimination assays.Combined with previously established canSNPs [10]- [12] [24], the set of 27 markers described here allows a more precise and reliable phylogenetically assignment of any B. anthracis strain into 20 sub-lineages or sub-groups.The present method will contribute to improve our ability to characterize B. anthracis strains and to react rapidly when the identity and origin of a suspicious strain need to be established.

Figure 1 .
Figure 1.Phylogeny of the major canSNP groups of B. anthracis.Dendrogram of 14 canSNP analysis of Bacillus anthracis isolates after Van Ert et al. (2007) and Marston et al. (2011).Canonical SNP positions and names that define lineages, groups or sub-groups are indicated in black arrows (previously published canSNP) and red arrows (newly discovered canSNPs).The A.Br.001/002 group is now subdivided into two sub-groups: A.Br.01 and A.Br.02 (insetA).The A.Br.011/009 sub-group is further resolved into six branches as previously described for the French TEA subpopulation (inset B).These new groups are named after the canSNP marker and assay that define their position.

Table 1 .
Whole genome sequences available in public database used in this study.

Table 2 .
canSNPs and primer sequences used for HRM analysis.

Table 3 .
Melting temperature of new canonical SNPs.