for constant epidemiological rates  . Combining this two models   , and embedding the result in a Bayesian inference framework, gave rise the birth-death model  . This new approach was successfully employed to the study of HCV samples that were collected at one time point  .
We employed 90 million steps of MCMC. Statistical uncertainty in the data was reflected by the 95% highest probability density (HPD) values. Results were examined using the TRACER v1.4 program  from the BEAST package. Convergence was assessed with ESS (effective sample size) values after a burn-in of 3 million steps.
2.5. Phylogenetic Tree Analysis
Phylogenetic tree analysis was done by two different approaches. First, we use DensiTree program  . Briefly, DensiTree is a program for drawing sets of trees stored in Nexus format. The main idea is to draw all trees in the set, but instead of using opaque lines, it uses transparency. As a result, areas where a lot of the trees agree on the topology and branch length, there will be many lines drawn and the screen will show a densely colored area. Areas where there are a few competing topologies will be highlighted by a web of lines. Uncertainty in node heights and their distribution can be shown by smears around the mean node height  . A total of 20,000 trees were drawn using this approach in the tree shown in this work using the output of the Bayesian Markov MCMC.
Second, maximum clade credibility trees were generated using Tree Annotator program from the BEAST package and the FigTree program v1.2.2 (available at: http://tree.bio.ed.ac.uk) was used for the visualization of the annotated trees.
2.6. Median-Joining Networks
Reconstructing phylogenies from intraspecific data is often a challenging task because of large sample sizes and small genetic distances between individuals  .
For this reason, different and complementary approaches may be needed in order to draw accurate conclusions from data analysis. Median-joining method for constructing networks combines algorithms for finding minimum spanning trees by favoring short connections and maximum-parsimony heuristic algorithms, which sequentially adds new vertices called median vectors  . Moreover, median-joining method is applicable to multistate characters, like amino acid sequences. We constructed the quasispecies population network using the reduced-median-network approaches of Network 4.6 program (available at: http://www.fluxus-engineering.com).
2.7. Evolutionary Interactions between Amino Acid Sites of the HCV HVR1
In order to gain insight into the evolutionary interactions between sites of the HVR1 we employed a Bayesian Graphical Model (BGM)  applied to reconstructed evolutionary histories of individual sites to find evidence of co-evolution between sites in the HCV HVR1 sequences of the quasispecies population. This was done by means of the use of the Datamonkey server  . Breafly, using the JTT model, a one parent per node (i.e. a node can be conditionally dependent on no more than one other node in the graph) analysis was done. From the 22 sites selected in this analysis, co-evolving sites we observed by means of the use of BGM, where sites were selected based on entropy. Co-evolving sites were assessed by the posterior probability of correlated substitutions between the pair of sites considered (i.e. the posterior probability for site 1 and site 2 being conditionally dependent).
2.8. Mapping of Amino Acid Sites in an in silico 3D Structure of HCV HVR1
The crystallographic structure of the complete HVC E2 is presently unknown. In order to gain insight into the positions of co-evolving aminoacid sites in the HVR1 region, we modeled in silico the HVR1 of HCV isolate H77 (genotype 1) and mapped the interacting sites in this 3D structure. First, an in silico 3D structure of the HVR1 was obtained by means of the use of 3D-JIGSAW v3.0 program  . Once obtained, the structure was visualized using the JMol v14.0.4 program  .
3.1. Bayesian Coalescent Analysis of HCV HVR1 Quasiespecies Sequences
In order to gain insight into the mode of evolution of an HCV HVR1 quasispecies population, we used a Bayesian Markov Chain MCMC approach to analyze 100 HCV HVR1 amino acid sequences from a quasispecies population circulating in a HCV 1b genotype patient  . These sequences represent 100 different haplotypes and their frequencies in the population varied from 10.49% to 0.10%  . The results shown in Table 1 are the outcome of the analysis for 90 million steps of the MCMC, using the JTT model + invariants, a relaxed clock  and the Birth-Death Skyline Contemporary model  .
As it can be seen in Table 1, convergence is supported by high ESS values. When the JTT model was used, a mean rate of 4.80 × 10−2 amino acid substitutions per site per year is observed in the HVR1 of this evolving HCV quasispecies population, revealing a significant and high rate of evolution of this HCV genome region.
3.2. Phylogenetic Tree Analysis of HCV HVR1 Quasispecies Sequences
To study the phylogenetic relations among the HCV HVR1 quasispecies enrolled in these studies we first
Table 1. Bayesian coalescent inference of an HCV quasispecies population.
aIn all cases, the main values are shown. bHigh probability density values. cEffective sample size. dMean rate was calculated in amino acid substitution/site/year.
constructed phylogenetic trees using DensiTree  to draw all trees in the set transparently. As a result, areas where a lot of the trees agree in topology and branch lengths show up as highly colored areas, while areas with little agreement show up as webs. Thus, DensiTree provides a quick method for qualitative analysis of tree sets  . The results of these studies are shown in Figure 1.
As it can be seen in the figure, the HCV quasispecies population isolated from the patient can be assigned to three different major clusters. The most frequent haplotype represented in the quasispecies population (Seq 1, with a frequency of 10.49%) is assigned to cluster 1, the less frequent haplotype (Seq 100, with a frequency of 0.10%) is assigned to cluster 2. Interestingly, a complete different phylogenetic cluster (cluster 3) is clearly observed, with haplotypes ranging from 7.41% to 0.107% in abundance. These results also suggest a temporal co-circulation of the three different clusters (see Figure 1).
In order to confirm these findings, a maximum clade credibility phylogenetic tree analysis was conducted. The results of these studies are shown in Figure 2.
Again, three major phylogenetic clusters were observed. Strains assigned to the same cluster in Figure 1 using DensiTree approach are clustered together in the maximum clade credibility tree (see Figure 2). Cluster 3 has been evolving in a different genetic lineage from clusters 1 and 2 from ancestors that existed approximately around 1.7 years before the time of isolation. Temporal co-circulation of different clusters can be also observed.
Figure 1. DensiTree phylogenetic tree analysis of HVR1 of E2 genes from HCV quasispecies. Consensus over 20,000 trees is shown. Three clusters depicting different quasispecies sub-populations are indicated by numbers at the right side of the figure. Cluster 1 contains 23 sequences (Seq 1, 2, 13, 14, 17, 19, 23, 31, 39, 40, 44, 45, 55, 58, 59, 60, 61, 63, 64, 75, 77, 89 and 95); Cluster 2 contains 40 sequences (Seq 8, 9, 16, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 41, 47, 48, 49, 52, 53, 54, 56, 62, 65, 68, 69, 71, 73, 74, 79, 81, 85, 86, 90, 93, 94, 96, 97, 98, 99, 100) and Cluster 3 contains 37 sequences (Seq 3, 4, 5, 6, 7, 10, 11, 12, 15, 18, 20, 22, 24, 25, 28, 38, 42,43, 46, 50, 51, 57, 66, 67, 70, 72, 75, 76, 78, 80, 82, 83, 84, 87,88, 91, 92). A dominant topology is observed for most of the tree. Inside each sub-population, three alternative interpretations are given, blue for most frequently occurring topology, red for second, and light green for third. The most abundant strain (Seq 1) is shown by an arrow; the less abundant one (Seq 100) is shown by a star.
Figure 2. Bayesian MCMC phylogenetic tree analysis of HVR1 of E2 genes from HCV quasispecies. A maximum credibility clade obtained using the JTT model, the birth-death skyline population model and a relaxed clock (uncorrelated exponential) is shown. The tree is rooted to their most recent common ancestor (MRCA). Bar at the bottom of the tree show time in years. Therest as same as Figure 1.
3.3. Median-Joining Networks of an HCV Quasispecies Population
In order to study the degree of genetic variability and evolution of the HCV quasispecies it is advisable to use different and complementary approaches. For these reasons and in order to confirm the results found by the phylogenetic tree analyses performed, we constructed a median-joining network of the quasispecies present in the population and enrolled in these studies. We have used a reduced median-joining network option by means of the use of Network 4.6 program  . The reason for using a reduced instead of a full median network is to improve clarity since a full median network can easily contain too many links and median vectors to visualize and interpret. The results of these studies are shown in Figure 3.
As it can be seen in the figure, although a star-like network is found in the center of the network, in which the most abundant sequence is found (Seq1), significant median vectors distances separate two other genetic groups. This speaks again of a high degree of genetic variability among the quasispecies found in the population. The most abundant sequence in the quasispecies is not the center of a tight and complex network around this sequence, suggesting that different genetic lineages are present in the quasispecies population in agreement with the phylogenetic tree analyses.
3.4. Co-Evolution of Amino Acid Sites in the HVR1 of HCV Quasispecies Genomes
HVR1 plays a major role in both HCV cell entry and immune evasion  . HVR1 contains dominant neutralizing epitopes, and its variation may lead to virus escape from pre-existing neutralizing antibodies   . Substitutions in HVR1 have been proposed to be driven by immune pressure and to play an important role in the establishment of persistent infection and disease progression  . Three different HVR1 microdomains have been recently proposed  . The first microdomain (residues 14, 15, and 25 - 27) play a key role in binding of the HCV to the SR-BI receptor and any residue in this microdomain is indispensable for HCV cell entry  .
In order to gain insight into the co-evolving and interacting sites among the HVR1, we performed a Bayesian Graphical Model analysis  . The results of these studies are shown in Figure 4.
Interestingly, strong associations in the consensus network occurred between residues 12 and 15, and residues 13 and 14 (posterior probability of correlated substitutions of 1.00). A large component of associations inside the network comprised associations with residue 12, which also has a strong supported association with residue 26 (posterior probability of 0.81). This speaks of the possible roll of residues at positions 12 and 13 of the HCV HVR1 to allow the virus to shift between combinations of residues to escape the immune system while retaining its structure and functions.
A structural interaction occurs between residues that cooperate in the formation and stabilization of secondary or tertiary protein structures  . Unfortunately, the structure of complete E2 remains unknown at present. For that reason, an in order to gain insight into the possible interactions among HVR1 site, we modeled in silico the HVR1 of HCV H77 isolate (genotype 1) and mapped interacting sites in this model. The results of these studies are shown in Figure 5.
Figure 3. HVR1 E2 HCV intrahost quasispecies network analysis. A reduced median-joining network is shown. In the network, each node represents a unique haplotype within the viral quasispecies population. The length of the link represents the aminoacid differences between the two different haplotypes. Sequences are shown by name next to the nodes. Numbers depict the major sub-populations in the network. Colors represent the relative abundance frequencies found in the quasispecies population  : Seq1, 10.49% (most abundant haplotype) is shown in fucsia; Seq 2, 8.54%, is shown in green, Seq5, 4.29%, is shown in orange; Seq 8, 2.25%, is shown in blue, Seq 9, 2.25%, is shown in violet; Seq 15 - 18, 1.07%, is shown in yellow; Seq 21 - 23, 26 and 27, 0.80 to 0.65%, is shown in brown; haplotypes with less than 0.62 % in abundance are shown in grey.
Figure 4. Consensus network of co-evolving sites in the HVR1 region of an HCV quasispecies population. Each node, represented by a square, corresponds to a residue in the HVR1, numbered according to their relative position to the HVR1 of HCV isolate H77 (genotype 1). The posterior probability of correlated substitutions between each pair of sites is indicated by numbers between the two sites.
Figure 5. In silico structural 3D model of the HVR1 of HCV H77 isolate. A three dimensional structure of the molecule is shown. Microdomains 1 through 3 of the HVR1  are shown in blue, green and white, respectively. Co-evolving sites 12, 15 and 26 are indicated in red and by yellow lines and their relative positions in the 3D structure are shown. In (a) the predicted structure of the HVR1 is shown in a cartoon style scheme; in (b) amino acids in the molecule are shown by sticks and balls.
The in silico 3D model of the HVR1 describes a coil (aminoacids positions 1-8), an α-helix (positions 9-18), a second coil (positions 19-22), a second α-helix (positions 23-24) and a final third coil region (positions 25-27) (see Figure 5). Interestingly, residues found to be relevant in the network of co-evolving sites, like positions 12 and 15 or position 13 and 14, are situated in an ordered α-helix region of the HVR1. This suggests that interactions may allow residues to be replaced by other combinations while the overall ordered structure is maintained.
RNA viruses exist in large intra-host populations which display great genotypic and phenotypic diversity with a rate of evolution that are in the range of 10−2 to 10−3 substitutions per site per year   . These rates rank among the highest ones reported for RNA viruses  .
The use of deep sequencing methods has increased significantly through the recent years and has revealed a new perspective in evolutionary studies of RNA viruses, enabling an unparalleled capacity for characterization of minor variants from the mutant spectrum of viral quasispecies   .
On the other hand, Bayesian analysis provides a new and powerful method for reconstruction of evolutionary relationships. Moreover, Bayesian methods permit to estimates uncertainty of the models employed   .
Using a Bayesian coalescent MCMC approach, the results of these studies revealed a mean rate of evolution of HCV HVR1 of the intra-host quasispecies population of 4.80 × 10−2 amino acid substitutions/site/year (Table 1). This is in agreement with recent studies that established a significantly high rate of evolution of the HVR1 region within a single host. Besides, these studies demonstrated that the evolution of this region within a single host resulted to be considerably higher than that of among hosts  .
Recent studies on the evolution of HCV during chronic infection, like the case of the patient where the quasispecies population was isolated  , revealed that after progression to chronicity the virus evolves primary under negative selection pressure  .These studies suggest that during the disease course, the virus evolves through four stages: 1) HCV establishes a population in a new host under relaxed selection and in the absence of specific immune response; 2) the variability of the viral population increases and immune escape variants are generated; 3) the diversification of the viral population into a set of subpopulations that supersede the initial dominant population; and 4) HCV settles under strong negative selection  . This is in agreement with the results of this work, since a sharp and rapid diversification in three different sub-populations was observed (see Figure 1 and Figure 2).
Interestingly, while previous work on quasispecies population speaks of a master variant, which is surrounded by a spectrum of mutants with a certain probability distribution  , and the master variant is often perceived as the variant with the highest fitness under particular environmental conditions  . The results of this work, based in ultra-deep sequence analysis of a quasispecies population, revealed that this is not the case in the quasispecies population analyzed. The most abundant sequence (Seq1, 10.49% abundance) is evolving in just one sub-population (see Figure 1 and Figure 2), showing a more close genetic relation with cluster 1 sub-population haplotypes and a more distant genetic relation with clusters 2 and 3 sub-populations (Figure 1 and Figure 2). Importantly, these results suggest that the quasispecies concept, that assumes that the population as a whole rather than an individual variant is the actual target of selection  . Thus, the advantage of the heterogeneous ensemble of variants lies in the improved ability of the HCV population to efficiently explore a wide sequence space to find the fitness optimum  .
Median-joining network analysis showed that the quasispecies population was composed by three distinct (1-3) subpopulations (see Figure 3). Again, at least two different sub-populations (2 and 3) that accounts for more than 50% of the hayplotypes observed arose in the quasispecies population and are situated in different spaces in the network by comparison with sub-population 1, which contains the master sequence (Figure 3). These results suggest a divergent virus quasispecies population that occupied large extensions of the sequence space  .
Detection of coevolving residues in a protein by the comparative analysis of homologous gene sequences is an important source of evidence for the functional and/or structural characterization of proteins  . The extensive of the HVR1 remains an obstacle for the development of appropriate HCV anti-viral strategies and vaccine development. HVR1 plays a substantial roll in mediating virus cell entry, antibody-mediated neutralization, and immune evasion  . It has been suggested that there is a complex interplay between HVR1 and the SR-BI and CD81 receptors  . Interestingly, we have identified two amino acid positions that play a key role in the co-evolution of sites in the HVR1 of this quasispecies population (positions 12 and 13, see Figure 4). These amino acids interact with key amino acid positions recently identified to be fundamental for binding of the HCV envelope protein to the SR-BI receptor (positions 14, 15 and 26)  . Although our in silico model represents an approximation to resolve in detail the 3D structure of the HVR1 and complete E2 protein (unfortunately not available at this time), our results suggest that most of these sties lie in an ordered α-helix structure and co-evo- lution among suitable combinations of amino acids may be needed to maintain this structure (see Figure 5). This is in agreement with recent results revealing that deletion of these key residues situated in microdomain 1 of HVR1 could create local conformational changes, suggesting that their role in interaction with SR-BI may be indirect  .
This study has permitted us to observe a quasispecies population evolution to a greater detail. The composition, complexity, and amplitude of the mutant spectrum of a HCV quasispecies population that ultra-deep sequence studies permits to observe reveals the powerful influence of the quasispecies nature of the HCV populations has in the behavior of the virus, disease and response to antiviral strategies. In-deep studies on the HCV quasispecies complexity are extremely important, since the advantage of a broad mutant spectrum lies in the increased capacity of the viral population to find portions of sequence space in which to increase its fitness  .
A key issue in HCV treatment is whether drug-resistant variants preexist in HCV populations and if the presence of these minority subpopulations with mutations that decrease sensitivity to direct antiviral agents (DAAs)s would help to determine treatment outcome. An international expert panel (HCV Drug Development Advisory Group; HCV DRAG) has implemented detailed recommendations for resistance testing during clinical evaluation of new DAAs in which pre-treatment samples should be rigorously analyzed by NGS approaches in order to identify mutation patterns of known or novel pre-existing variants and to provide the baseline for mutations emerging at later time points  . This speaks of the importance of the use of appropriate and accurate approaches in order to cope to these important matters.
The results of this work highlight the importance and suitability of an in-depth analysis of the HCV quasispecies population circulating in vivo in patients infected with the virus by means of next-generation sequencing (NGS) platforms. Towards that goal, we combined in this work a deep-sequencing approach, with a proper and reliable data reconstruction method  , embedded in a Bayesian coalescent phylogenetic analysis approach in order to accurate reconstruct the phylogeny of an evolving HCV quasispecies population in great detail  . Moreover, by the very nature of evolving viral quasispecies, different and complementary approaches are needed in order to draw accurate conclusions from intra-host virus evolution  . We found the median-joining method a suitable and complementary approach  .
The future of HCV treatment will rely in our capacity of assessing pre-existence memory genomes with resistance mutations at baseline and during treatment in order to decide the most suitable HCV antiviral combination therapy  . The results of this work highlight the suitability of these analyses towards this important goal to control HCV infection.
A Bayesian coalescent phylogenetic analysis of hypervariable region 1 (HVR1) sequences of a HCV quasispecies population circulating in a chronic patient obtained by ultra-deep sequencing was performed. The results of these studies revealed a sharp and rapid diversification of the HCV quasispecies in three different sub-popula- tions, where at least one cluster was entirely composed of less frequent haplotypes ranging from 7.41% to 0.107% in abundance. The most abundant variant in the quasispecies population was not the center of a complex network around this sequence, suggesting that the quasispecies population as a whole efficiently explore a wide sequence space in which to increase its fitness. The results of this work highlight the importance of minority genomes in HCV population history and evolution as well as the roll of mutant clouds as reservoirs of phenotypic and genetic variants for virus adaptability. A NGS approach, combined with a proper data reconstruction method together with a Bayesian coalescent analysis approach have permitted a reliable reconstruction of the history and evolution of the HCV quasispecies population circulating in the patient. All these will have important implications for assessing the presence of memory genomes in relation to treatment and control of HCV infection.
We acknowledge support by Agencia Nacional de Investigacion e Innovacion (ANII) through project PE_ALI_ 2009_1_1603 and Sistema Nacional de Becas, and PEDECIBA, Uruguay.