Human Immunodeficiency Virus (HIV) dynamics in Africa are characterised by sparse sampling of DNA sequences from infected individuals. Some sub-groups are more at risk than the general population and have higher infectivity rates. We propose a likelihood inference model based on a multi-type birth-death process that can be used to make inference for the HIV epidemic in an African setting. The likelihood incorporates a probability of removal from the infectious pool upon sampling. We simulated trees, made parameter inference on the simulated trees, and investigated whether the model distinguishes between heterogeneous and homogeneous dynamics. The model makes fairly good parameter inference and distinguishes well between heterogeneous and homogeneous dynamics. Parameter estimation was also performed under a sparse sampling scenario. We investigated whether trees obtained from a structured population are more balanced than those from a non-structured host population using tree statistics that measure tree balance and imbalance. Trees from the non-structured population were more balanced based on the Colless and Sackin indices.

Human Immunodeficiency Virus (HIV) is among the most infectious diseases globally. According to [

Many African countries are battling with HIV epidemic. For example, the total burden of HIV in Uganda is increasing according to spectrum estimates from the Ministry of Health as documented in [

To make inference about the HIV epidemic in such a host population, viewing it as a structured population gives more reliable parameter inference because individuals are heterogeneous in infectivity. In this paper, we are interested in a model that depicts HIV dynamics in an African setting. Disproportionate infectivity is one of the significant characteristics of HIV dynamics in Africa. To account for this, we employ models that make inference for a structured host population. Applying such models to HIV dynamics in Africa is still a challenge, as the models should capture characteristics such as sparse sampling and heterogeneity in infectivity among infected individuals. We incorporate these characteristics in the model for a structured host population, which aids parameter inference for the HIV epidemic in an African setting. Molecular sequence data uncover underlying disease dynamics in structured host populations much better than prevalence and incidence data.

Molecular sequence data is increasingly becoming available even in Africa, for example Southern African Treatment and Resistance Network has a growing database of greater than 70,000 HIV sequences [

We extend the multi-type birth death model of likelihood inference implemented by [

The model used in this paper differs from the Bayesian inference that was implemented by [

Using ideas developed in [

For example, deme 1 can be a general population and deme 2 a sub-population at risk.

We describe the parameters in the first deme. Those for the second deme are described analogously. The following are the parameters for the model in deme 1:

1) Parameter $\lambda_{1,1}$ is the rate at which an individual in deme 1 transmits to (infects) an individual of deme 1. Similarly, $\lambda_{1,2}$ is the rate at which an individual in deme 1 transmits to an individual of deme 2. These are the speciation (transmission) rate parameters in deme 1. There is both within-deme ($\lambda_{1,1}$) and across-deme ($\lambda_{1,2}$) transmission in this multi-type birth-death model.

2) Parameter $\mu_1$ is the rate at which an individual in deme 1 dies. This is the extinction rate parameter.

3) Parameter $\psi_1$ is the rate at which an individual of deme 1 is sampled. Sampling in this context is the way HIV sequences are included in the study.

4) Parameter $r$ is the probability that an individual in either of the demes becomes non-infectious upon sampling (is removed from the infectious pool). An individual therefore remains infectious with probability $1 - r$. Parameter $r$ is the added parameter in our model which is not in the model presented in [

For the derivation of the likelihood inference for parameter estimation, we still build on ideas used in [

In

dotted vertical line, 0 is the present, $\tau$ is the time at which a certain individual that belongs to deme 2 was sampled, and $t$ is the time taken to give rise to the sub-tree (dashed oval) that is subtended by an individual represented by an edge $E$. $T$ is the duration of the process or epidemic, that is, the time from the origin of the epidemic to the present. The model illustrated in

In this derivation, $D_{E1}(t)$ is the probability density for an individual represented by an edge $E$ in deme 1 giving rise to the sub-tree of age $t$ (dotted oval). For $D_{E1}(t)$, $t > \tau$ as shown in

$$D_{E1}(t + \Delta t) = \Big(1 - \Big(\sum_{j=1}^{2} \lambda_{1,j} + \mu_1 + \psi_1\Big)\Delta t\Big) D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\,\Delta t\, Z_j(t)\, D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\,\Delta t\, Z_1(t)\, D_{Ej}(t) + O(\Delta t^2) \tag{1}$$

In Equation (1), the first term on the right is for no birth, death or sampling event; the second term represents the birth of an individual of deme $j$ while lineage $j$ produces no samples in time $t$; the third term signifies the birth of an individual of deme $j$ while lineage 1 produces no samples in time $t$. The last term on the right is the probability that more than one event occurs in the time interval $\Delta t$.

After re-arranging the terms in Equation (1) and letting $\Delta t \to 0$, we obtain the differential equation:

$$\frac{d}{dt} D_{E1}(t) = -\Big(\sum_{j=1}^{2} \lambda_{1,j} + \mu_1 + \psi_1\Big) D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\, Z_j(t)\, D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\, Z_1(t)\, D_{Ej}(t) \tag{2}$$
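The limiting step can be spelled out explicitly; the following is a worked sketch that uses only the terms already present in Equation (1). Subtracting $D_{E1}(t)$ from both sides of Equation (1) and dividing by $\Delta t$ gives

$$\frac{D_{E1}(t + \Delta t) - D_{E1}(t)}{\Delta t} = -\Big(\sum_{j=1}^{2} \lambda_{1,j} + \mu_1 + \psi_1\Big) D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\, Z_j(t)\, D_{E1}(t) + \sum_{j=1}^{2} \lambda_{1,j}\, Z_1(t)\, D_{Ej}(t) + O(\Delta t).$$

As $\Delta t \to 0$, the left-hand side tends to the derivative of $D_{E1}(t)$ and the $O(\Delta t)$ remainder vanishes, so the three remaining terms form the right-hand side of the differential equation.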

The initial condition at $\tau$ is given as:

$$D_{E1}(\tau) = r\,\mu_1 \psi_1 + (1 - r) \sum_{j=1}^{2} \lambda_{1,j}\, \psi_1 \ \text{ for } j = 1, \quad \text{and} \quad D_{E1}(\tau) = 0 \ \text{ for } j \neq 1. \tag{3}$$

The first term on the right of Equation (3) depicts individuals who become non-infectious upon sampling, while the last term signifies those who are not removed from the infectious pool, i.e., they remain infectious even after being sampled. In typical African communities, many individuals infected with HIV continue to infect others, either knowingly or unknowingly. When $r$ is 1, the initial condition given in Equation (3) reduces to that used in [

Equation (2) is used to obtain the quantity $D_{E1}(T)$, which is the desired probability density for the multi-type birth-death tree. The derivation of $Z_1(t)$ is required to obtain the density of the tree. $Z_1(t)$ is derived using steps similar to those used in the derivation of Equation (2). The differential equation for $Z_1(t)$ is given as:

$$\frac{d}{dt} Z_1(t) = (1 - \psi_1)\mu_1 - \Big(\sum_{j=1}^{2} \lambda_{1,j} + \mu_1 + \psi_1\Big) Z_1(t) + \sum_{j=1}^{2} \lambda_{1,j}\, Z_1(t)\, Z_j(t), \tag{4}$$

with the initial condition,

$$Z_1(0) = Z_2(0) = 1. \tag{5}$$

The first term on the right of Equation (4) represents death without sampling; the second signifies no birth, death or sampling, as well as an individual in deme 1 producing no samples. The last term depicts the birth of an individual of deme $j$, with both individuals 1 and $j$ producing no samples in time $t$. The initial condition given in Equation (5) implies that an individual at time $t = 0$ is not sampled. This is the case because the model accounts for sampling through time only. Contemporaneous sampling, which is sampling at only one time point (typically the present), is not modelled in this inference.

The differential equations (2) and (4), with the initial condition given by (3), are for deme 1. The corresponding differential equations for deme 2 were derived analogously. The system of differential equations, with their initial conditions, that is integrated along a given edge for 2 demes is given as:

$$\begin{aligned}
\frac{d}{dt} Z_1(t) &= (1 - \psi_1)\mu_1 - (\lambda_{1,1} + \lambda_{1,2} + \mu_1 + \psi_1) Z_1(t) + \lambda_{1,1} Z_1^2(t) + \lambda_{1,2} Z_1(t) Z_2(t) \\
\frac{d}{dt} Z_2(t) &= (1 - \psi_2)\mu_2 - (\lambda_{2,1} + \lambda_{2,2} + \mu_2 + \psi_2) Z_2(t) + \lambda_{2,1} Z_2(t) Z_1(t) + \lambda_{2,2} Z_2^2(t) \\
\frac{d}{dt} D_{E1}(t) &= -(\lambda_{1,1} + \lambda_{1,2} + \mu_1 + \psi_1) D_{E1}(t) + (\lambda_{1,1} Z_1(t) + \lambda_{1,2} Z_2(t)) D_{E1}(t) + \lambda_{1,1} Z_1(t) D_{E1}(t) + \lambda_{1,2} Z_1(t) D_{E2}(t) \\
\frac{d}{dt} D_{E2}(t) &= -(\lambda_{2,1} + \lambda_{2,2} + \mu_2 + \psi_2) D_{E2}(t) + (\lambda_{2,1} Z_1(t) + \lambda_{2,2} Z_2(t)) D_{E2}(t) + \lambda_{2,1} Z_2(t) D_{E1}(t) + \lambda_{2,2} Z_2(t) D_{E2}(t) \\
Z_1(0) &= Z_2(0) = 1 \\
D_{E1}(\tau) &= r\,\mu_1 \psi_1 + (1 - r)(\lambda_{1,1} + \lambda_{1,2})\,\psi_1 \\
D_{E2}(\tau) &= r\,\mu_2 \psi_2 + (1 - r)(\lambda_{2,1} + \lambda_{2,2})\,\psi_2
\end{aligned} \tag{6}$$

Solving the system of equations in (6) numerically gives the calculations along an edge of a tree. This is then followed by pruning at the nodes and finally reaching the root of the tree. This results in obtaining either $D_{E1}(T)$ or $D_{E2}(T)$, depending on whether the root was in deme 1 or 2.
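The numerical integration along a single edge can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the class `Deme`, the function names, the fixed-step Runge-Kutta scheme and all parameter values are assumptions made for the example. The sketch first integrates $Z_1$ and $Z_2$ from 0 to $\tau$, then applies the tip condition of system (6) and continues to time $t$.

```python
from dataclasses import dataclass


@dataclass
class Deme:
    lam_within: float  # transmission rate to the same deme (lambda_{i,i})
    lam_across: float  # transmission rate to the other deme (lambda_{i,j}, j != i)
    mu: float          # death rate mu_i
    psi: float         # sampling parameter psi_i


def tip_condition(deme, r):
    """Value of D at a sampled tip, following the initial conditions in (6)."""
    return (r * deme.mu * deme.psi
            + (1 - r) * (deme.lam_within + deme.lam_across) * deme.psi)


def rhs(state, d1, d2):
    """Right-hand side of system (6); state = (Z1, Z2, D_E1, D_E2)."""
    z1, z2, e1, e2 = state
    out1 = d1.lam_within + d1.lam_across + d1.mu + d1.psi
    out2 = d2.lam_within + d2.lam_across + d2.mu + d2.psi
    dz1 = ((1 - d1.psi) * d1.mu - out1 * z1
           + d1.lam_within * z1 * z1 + d1.lam_across * z1 * z2)
    dz2 = ((1 - d2.psi) * d2.mu - out2 * z2
           + d2.lam_across * z2 * z1 + d2.lam_within * z2 * z2)
    de1 = (-out1 * e1 + (d1.lam_within * z1 + d1.lam_across * z2) * e1
           + d1.lam_within * z1 * e1 + d1.lam_across * z1 * e2)
    de2 = (-out2 * e2 + (d2.lam_across * z1 + d2.lam_within * z2) * e2
           + d2.lam_across * z2 * e1 + d2.lam_within * z2 * e2)
    return (dz1, dz2, de1, de2)


def integrate(state, t0, t1, d1, d2, steps=2000):
    """Classical fixed-step fourth-order Runge-Kutta from t0 to t1."""
    h = (t1 - t0) / steps
    y = tuple(state)
    for _ in range(steps):
        k1 = rhs(y, d1, d2)
        k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k1)), d1, d2)
        k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k2)), d1, d2)
        k4 = rhs(tuple(a + h * b for a, b in zip(y, k3)), d1, d2)
        y = tuple(a + h / 6 * (p + 2 * q + 2 * s + w)
                  for a, p, q, s, w in zip(y, k1, k2, k3, k4))
    return y


def edge_density(tau, t, tip_deme, r, d1, d2):
    """Integrate along an edge whose tip was sampled at time tau in tip_deme."""
    z1, z2, _, _ = integrate((1.0, 1.0, 0.0, 0.0), 0.0, tau, d1, d2)
    e1 = tip_condition(d1, r) if tip_deme == 1 else 0.0
    e2 = tip_condition(d2, r) if tip_deme == 2 else 0.0
    return integrate((z1, z2, e1, e2), tau, t, d1, d2)


# Hypothetical parameter values, chosen only for illustration.
general = Deme(lam_within=2.0, lam_across=0.5, mu=1.0, psi=0.2)
at_risk = Deme(lam_within=4.0, lam_across=1.0, mu=1.0, psi=0.2)
z1, z2, e1, e2 = edge_density(0.5, 1.5, tip_deme=2, r=0.8, d1=general, d2=at_risk)
print(z1, z2, e1, e2)
```

A full likelihood calculation would repeat this along every edge and combine the values at internal nodes (the pruning step), which is not shown here. In practice an adaptive ODE solver may be preferable to the fixed step used in this sketch.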

The probability density of the tree $\mathcal{T}$ with the first individual at time $T$, conditioned on observing at least one sampled individual, for a structured population with 2 demes is given as:

$$p(\mathcal{T} \mid \lambda, \mu, \psi, T) = f_1 \frac{D_{E1}(T)}{1 - Z_1(T)} + f_2 \frac{D_{E2}(T)}{1 - Z_2(T)} \tag{7}$$

The probability that an individual at time $T$ is of deme 1 is given by $f_1$, and its distribution must be specified. Equation (7) is the likelihood of the parameters given the data. This is used for parameter inference. It should be noted that neither $Z_1(T)$ nor $Z_2(T)$ can be 1. This is because if either takes the value 1, then the process will not give rise to the inferred tree. The conditioning is necessary since studies have shown that this conditioning gives more accurate parameter values [

$\lambda = (\lambda_{1,1}, \lambda_{1,2}, \lambda_{2,1}, \lambda_{2,2})$, $\mu = (\mu_1, \mu_2)$, $\psi = (\psi_1, \psi_2)$, $r$ and $T$.
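As a small illustration of Equation (7), the root values can be combined as below. This is a sketch under stated assumptions: the function name is hypothetical, the inputs are taken as already computed, and $f_2 = 1 - f_1$ is assumed because there are only two demes.

```python
def tree_likelihood(f1, d_e1, d_e2, z1, z2):
    """Combine root densities as in Equation (7).

    f1         : probability that the first individual is of deme 1
    d_e1, d_e2 : D_E1(T) and D_E2(T) obtained at the root
    z1, z2     : Z_1(T) and Z_2(T)
    """
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must be a probability")
    if z1 >= 1.0 or z2 >= 1.0:
        # The conditioning on at least one sample is undefined in this case.
        raise ValueError("Z_1(T) and Z_2(T) must be below 1")
    f2 = 1.0 - f1  # assumption: two demes only
    return f1 * d_e1 / (1.0 - z1) + f2 * d_e2 / (1.0 - z2)


# Hypothetical inputs, for illustration only.
print(tree_likelihood(0.5, 0.2, 0.4, 0.5, 0.6))
```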

The necessary changes suggested in Equation (3) to reflect HIV dynamics in an African setting were effected in R using an R package, Treesim of [

Another set of simulations was done for 100 simulated trees with 200 sampled tips. We varied the probability of removal, taking $r = 0.8$ and $r = 0.5$. The sampling probability remained at 0.2 for both sub-populations. In another set of simulations, the sampling probabilities were set to $\psi_1 = \psi_2 = 0.05$ for both sub-populations. This depicted sparse sampling, which is common in HIV epidemics in Africa. Again, the probability of removal was varied, being set at $r = 0.2$ and $r = 0.8$. To investigate the effect of unequal sampling proportions, we simulated trees with $\psi_1 = 0.2$ and $\psi_2 = 0.01$.

For comparison between two models, we used likelihood ratios. From [

$$LR = -2\left[\max_{\theta \in \Theta_0} \log(L(\theta)) - \max_{\theta \in \Theta_1} \log(L(\theta))\right] \tag{8}$$

In [

$(X \mid H)$ is the likelihood of the data ($X$) under hypothesis $H$. This likelihood statistic simplifies to the one given in Equation (8). The likelihood ratio statistic is asymptotically chi-squared distributed under the null hypothesis according to [

For heterogeneous dynamics, we assume that the population is structured into demes, say 2 as in our model. Each deme has different birth and death parameters. In the homogeneous case, the population is not structured and individuals have the same birth and death rates. We investigated how well our model distinguished between heterogeneous and homogeneous dynamics. For simplicity, we had two sets of simulations, named A and B.

In A, both tree simulation and parameter inference were done under MTBD-2. For B, parameter inference was made under the multi-type birth-death model with 1 deme (MTBD-1), yet the trees were obtained under MTBD-2. We then performed likelihood ratio tests to find out for how many trees the null model was rejected in favour of the MTBD-2 (heterogeneous) dynamics. For homogeneous dynamics, we simulated trees under MTBD-1 and made parameter inference under both MTBD-1 and MTBD-2. Likelihood ratio tests were used to find out for how many trees inference under MTBD-2 was rejected.

Since the use of likelihood ratios requires that $H_0$ is derived from a full alternative $H_1$, in A we tested $H_0$: tree simulation under MTBD-2 and parameter inference under MTBD-1, against $H_1$: tree simulation under MTBD-2 and parameter inference under MTBD-2. In B, we had $H_0$: tree simulation under MTBD-1 and parameter inference under MTBD-1, against $H_1$: tree simulation under MTBD-1 and parameter inference under MTBD-2. $H_0$ and $H_1$ are defined as the null and alternative models, respectively.
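The decision rule for such a test can be sketched as follows. The function names and the log-likelihood values are hypothetical, and the critical value must be taken from the chi-squared distribution with degrees of freedom equal to the difference in the number of free parameters between the two models; the value 3.841 used here is the 5% critical value for one degree of freedom and serves only as an illustration.

```python
def likelihood_ratio(loglik_null, loglik_alt):
    """Likelihood ratio statistic from maximised log-likelihoods, as in Equation (8)."""
    return -2.0 * (loglik_null - loglik_alt)


def reject_null(loglik_null, loglik_alt, critical_value):
    """Reject H0 when the statistic exceeds the chosen chi-squared critical value."""
    return likelihood_ratio(loglik_null, loglik_alt) > critical_value


# Hypothetical maximised log-likelihoods for one tree.
loglik_mtbd1 = -120.0  # null model
loglik_mtbd2 = -115.0  # alternative model
print(likelihood_ratio(loglik_mtbd1, loglik_mtbd2))
print(reject_null(loglik_mtbd1, loglik_mtbd2, critical_value=3.841))
```

Applying `reject_null` to every simulated tree and counting the rejections gives the kind of tally used to compare the heterogeneous and homogeneous models.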

In the absence of real DNA sequence data for a structured host population, DNA sequences were simulated using a stochastic agent-based model called the Discrete Spatial Phylo Simulator (DSPS). DSPS was also used in [

To make the analysis faster, we down-sampled the set of sequences used in the analysis. This set contained 21,532 sequences, from which we sampled 200 sequences uniformly at random. We further reduced the sub-groups to homosexuals, heterosexuals and bisexuals. Out of the 200 sequences, 157 were from homosexuals, 38 from bisexuals and only 5 from heterosexuals. Since we were interested in the MTBD-2 model, we discarded the 5 sequences from heterosexuals. This resulted in an analysis of 195 sequences from our set of simulated HIV sequences.
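Uniform down-sampling of this kind can be sketched in a few lines. The sequence identifiers, the function name and the seed are hypothetical; only the set size (21,532) and the sample size (200) come from the text.

```python
import random


def downsample(sequence_ids, k, seed=None):
    """Draw k identifiers uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(sequence_ids, k)


# Hypothetical identifiers standing in for the simulated sequences.
all_ids = [f"seq_{i}" for i in range(21532)]
subset = downsample(all_ids, 200, seed=1)
print(len(subset))
```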

We made a Bayesian study for a sampled set of sequences using Bayesian Evolution Analysis by Sampling Trees (BEAST) developed by [

We used the mean estimates for the parameters obtained from this Bayesian study to simulate trees with 200 sampled tips. The death rate parameters used in the simulation were equivalent to the becoming-non-infectious rate parameters from the Bayesian study. The birth-rate parameters were not inferred directly from this study; they were instead computed from the reproductive numbers, which the BEAST study inferred for each deme. Since the reproductive number of deme $i$ is $R_{0i} = \frac{\sum_{j=1}^{2} \lambda_{i,j}}{\mu_i}$, and we obtained $R_{0i}$ and $\mu_i$ from the BEAST study, we set $\sum_{j=1}^{2} \lambda_{i,j} = \mu_i R_{0i}$ during the simulations with the mean parameter estimates from the BEAST study of the simulated sequences. From the simulated trees, we estimated the parameters to evaluate the performance of our model with parameters that were obtained from the simulated DNA sequences.
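The conversion from a reproductive number to a total birth rate can be illustrated as below. The function name and the numerical values are hypothetical; the text only fixes the sum of the two transmission rates of a deme, so the sketch returns that sum and does not split it into within-deme and across-deme parts.

```python
def total_birth_rate(mu, r0):
    """Sum of the transmission rates of a deme, given its death rate and R0."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu * r0


# Hypothetical values for one deme.
mu_i = 0.5
r0_i = 2.0
print(total_birth_rate(mu_i, r0_i))
```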

We investigated whether there was a relationship between some tree statistics and the structuring of the host population. This was done using tree statistics that measure balance and imbalance of a phylogeny. In [

Gamma ( γ ) statistic was defined in [

$$\gamma = \frac{\left(\dfrac{1}{n-2} \displaystyle\sum_{i=2}^{n-1} \left(\sum_{k=2}^{i} k\, g_k\right)\right) - \dfrac{T}{2}}{T \sqrt{\dfrac{1}{12(n-2)}}} \tag{9}$$

where

$$T = \sum_{j=2}^{n} j\, g_j$$
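A direct transcription of Equation (9) is sketched below. It assumes that $g_k$ denotes the internode interval during which the tree has $k$ lineages and that $n$ is the number of tips, which is the usual reading of this statistic; the function name and the input format (a dictionary from $k$ to $g_k$) are choices made for the example.

```python
import math


def gamma_statistic(g):
    """Gamma statistic of Equation (9).

    g maps k = 2, ..., n to the internode interval g_k (assumed meaning:
    the time during which the tree has k lineages).
    """
    n = max(g)
    if n < 3:
        raise ValueError("at least three tips are required")
    total = sum(j * g[j] for j in range(2, n + 1))  # T in Equation (9)
    partial = 0.0  # inner sum over k = 2, ..., i
    outer = 0.0    # outer sum over i = 2, ..., n - 1
    for i in range(2, n):
        partial += i * g[i]
        outer += partial
    numerator = outer / (n - 2) - total / 2.0
    denominator = total * math.sqrt(1.0 / (12.0 * (n - 2)))
    return numerator / denominator


# When every k * g_k is equal, numerator terms cancel and gamma is zero.
even = {k: 1.0 / k for k in range(2, 11)}
print(gamma_statistic(even))
```

Long intervals early in the tree (large $g_2$) push the statistic above zero, while long intervals near the tips push it below zero.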

We analysed whether the values for γ -statistic are affected by the structuring of a host population. We computed gamma-statistic for both structured and non-structured host population. In [

Another tree statistic which we used is the Sackin index, which adds up the number of internal nodes between each leaf and the root. It is defined as:

$$I_s^n = \sum_{i=1}^{n} N_i \tag{10}$$

where $n$ is the number of leaves of the tree and $N_i$ is the number of internal nodes crossed in the path from leaf $i$ to the root. It is well known in systematic biology that the expectation of $I_s^n$ under the Yule model is of order $2n\ln(n)$. The normalized Sackin index converges in distribution as the number of leaves, $n$, grows to infinity. The normalized Sackin index is defined in [

The Colless index is another tree statistic that assesses tree imbalance. It is defined as:

$$I_c = \frac{2}{(n-1)(n-2)} \sum_{i=1}^{n-1} |T_R - T_L| \tag{11}$$

where $i$ is an internal node and $n$ is the total number of leaves. For each internal node, $T_R$ is the number of terminal taxa subtended by the right-hand branch and $T_L$ is the number of terminal taxa subtended by the left-hand branch. The normalised $I_c$ ranges from 0 (perfect balance) to 1 (complete imbalance).
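Both balance indices can be computed with a short recursive sketch. The tree representation (nested 2-tuples with strings as leaves) and the function names are assumptions made for the example; this is not the routine from the R packages used in the study.

```python
def leaf_count(tree):
    """Number of leaves below a node; a leaf is anything that is not a tuple."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def sackin(tree, depth=0):
    """Sackin index of Equation (10): sum over leaves of internal nodes up to the root."""
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


def colless_sum(tree):
    """Unnormalised Colless sum: |T_R - T_L| added over all internal nodes."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    here = abs(leaf_count(right) - leaf_count(left))
    return here + colless_sum(left) + colless_sum(right)


def colless_normalised(tree):
    """Normalised Colless index of Equation (11); needs at least three leaves."""
    n = leaf_count(tree)
    return 2.0 * colless_sum(tree) / ((n - 1) * (n - 2))


caterpillar = ((("a", "b"), "c"), "d")  # fully imbalanced four-leaf tree
balanced = (("a", "b"), ("c", "d"))     # fully balanced four-leaf tree
print(sackin(caterpillar), colless_normalised(caterpillar))
print(sackin(balanced), colless_normalised(balanced))
```

The caterpillar tree reaches the upper bound of the normalised Colless index, while the balanced tree gives 0 and a smaller Sackin index, in line with the interpretation of both indices as measures of imbalance.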

Tree balance is an important consideration for phylogenies because the balance of the true phylogeny affects the accuracy of its estimates as stated in [

We simulated phylogenetic trees in R using TreeSim package of [

In the first set of simulations, we simulated 5000 trees with 200 sampled tips under both structured and non-structured host populations (A1 and A2). In the second set of simulations, we increased the number of sampled tips to 500 while the other parameters remained the same as those for the first set of simulations under both structured and non-structured host populations (B1 and B2). In the third set of simulations, the parameters were the same as those in the first set of simulations, but we increased the number of simulated trees to 15,000 for both structured and non-structured populations (C1 and C2). Under each of the simulations, we computed gamma-statistic values for the simulated trees. Using an R package apTreeshape of [

For fixed sampling probabilities (

Having observed that increasing the number of simulated trees to 100 improved parameter estimates, we increased the number of sampled tips from 100 to 200 in the next set of simulations. This is shown in

| For | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 17.000 | 16.542 | 16.489 | [11.007, 22.704] |
| | 4.000 | 4.724 | 2.895 | [0.000, 25.838] |
| | 6.000 | 10.782 | 6.489 | [0.000, 39.772] |
| | 8.000 | 7.273 | 6.827 | [0.000, 16.052] |
| | 9.000 | 11.593 | 9.781 | [0.000, 23.597] |
| | 3.000 | 0.935 | 1.243 | [−25.061, 20.470] |
| For | | | | |
| | 17.000 | 16.464 | 16.520 | [11.207, 20.851] |
| | 4.000 | 3.983 | 3.765 | [0.000, 10.515] |
| | 6.000 | 8.092 | 6.922 | [0.000, 32.281] |
| | 8.000 | 8.153 | 7.651 | [0.000, 17.024] |
| | 9.000 | 10.015 | 9.252 | [0.000, 23.330] |
| | 3.000 | 3.014 | 2.104 | [−14.388, 33.727] |
| For | | | | |
| | 17.000 | 16.137 | 16.920 | [5.484, 20.558] |
| | 4.000 | 5.007 | 4.483 | [0.000, 15.587] |
| | 6.000 | 6.611 | 5.016 | [0.000, 25.069] |
| | 8.000 | 7.959 | 8.684 | [0.000, 12.711] |
| | 9.000 | 11.302 | 9.492 | [3.754, 30.200] |
| | 3.000 | 3.237 | 1.899 | [−2.926, 16.476] |

| Parameters for | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 17.000 | 16.438 | 16.898 | [7.990, 19.241] |
| | 4.000 | 4.407 | 3.598 | [0.000, 15.162] |
| | 6.000 | 6.508 | 6.108 | [0.000, 18.591] |
| | 8.000 | 7.693 | 8.484 | [0.000, 11.829] |
| | 9.000 | 10.983 | 9.038 | [4.330, 26.605] |
| | 3.000 | 2.985 | 2.058 | [−2.162, 13.363] |
| For | | | | |
| | 17.000 | 16.869 | 16.905 | [12.755, 20.473] |
| | 4.000 | 3.766 | 3.510 | [0.000, 10.938] |
| | 6.000 | 7.258 | 6.429 | [0.000, 21.580] |
| | 8.000 | 7.959 | 8.107 | [2.528, 11.630] |
| | 9.000 | 10.094 | 9.433 | [2.249, 23.875] |
| | 3.000 | 2.335 | 2.395 | [−11.925, 13.089] |

estimation. The 95% confidence intervals for parameters in the upper part of

To investigate whether sampling intensity had an effect on parameter estimates, we varied the sampling proportion from 0.2 to 0.05. Parameter estimates when the sampling intensity was fixed at

When parameter inference was made with different sampling proportions, i.e, for

For the error bars shown in

To establish whether our model distinguished between a heterogeneous population (MTBD-2) and a homogeneous population (MTBD-1), we performed likelihood ratio tests. In the likelihood ratio inference, we used a decision rule of rejecting the null hypothesis if the likelihood ratio (test statistic) is greater than

| Parameters for | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 17.000 | 16.347 | 15.770 | [0.000, 29.305] |
| | 4.000 | 4.351 | 2.190 | [0.000, 14.730] |
| | 6.000 | 12.800 | 6.120 | [0.000, 41.502] |
| | 8.000 | 7.482 | 5.842 | [0.000, 25.795] |
| | 9.000 | 11.718 | 9.402 | [2.312, 34.556] |
| | 3.000 | 3.865 | 0.886 | [−18.917, 36.919] |
| For | | | | |
| | 17.000 | 16.788 | 16.200 | [7.715, 22.555] |
| | 4.000 | 4.499 | 3.623 | [0.000, 16.305] |
| | 6.000 | 12.477 | 8.373 | [0.000, 36.119] |
| | 8.000 | 5.452 | 5.410 | [0.000, 12.472] |
| | 9.000 | 13.710 | 11.637 | [0.000, 43.433] |
| | 3.000 | −1.834 | −1.238 | [−18.265, 11.133] |

| (A) | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 17.000 | 16.586 | 17.067 | [8.634, 23.338] |
| | 4.000 | 4.285 | 3.076 | [0.000, 12.518] |
| | 6.000 | 10.645 | 7.853 | [0.000, 33.991] |
| | 8.000 | 6.939 | 6.774 | [0.000, 15.088] |
| | 9.000 | 12.285 | 9.990 | [0.019, 28.269] |
| | 3.000 | 0.519 | −0.396 | [−21.688, 18.662] |
| (B) | | | | |
| | | 35.774 | 33.637 | [2.879, 61.085] |
| | | 4.139 | 3.489 | [1.741, 8.112] |
| | | −0.969 | −1.999 | [−7.964, 9.120] |

| (A) | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 15.000 | 14.707 | 14.514 | [9.530, 20.433] |
| | 7.500 | 5.625 | 5.871 | [0.000, 15.468] |
| | 15.000 | 24.068 | 18.018 | [2.885, 81.330] |
| | 7.500 | 6.973 | 6.460 | [0.000, 20.342] |
| | 6.000 | 6.515 | 3.488 | [0.000, 30.117] |
| | 6.000 | 9.578 | 4.414 | [−35.129, 96.456] |
| (B) | | | | |
| | 15.000 | 8.079 | 6.877 | [0.873, 19.679] |
| | 7.500 | 17.461 | 16.330 | [3.898, 35.268] |
| | 6.000 | 5.303 | 2.467 | [−12.899, 45.491] |

However, according to [

We analysed 195 sequences from the sample. These sequences were structured into two groups, namely MSM and bisexuals. For this sample, we ran an MCMC chain of length 12,000,000 in BEAST. The mixing was not very good: some parameters had low effective sample sizes (ESS), although others had high ESS values above 400. Our interest was mainly in three parameters, namely

From results shown in

| Parameters for | True value | Mean | Median | 95% CI |
|---|---|---|---|---|
| | 0.100 | 0.374 | 0.134 | [0.000, 1.813] |
| | 0.140 | 0.178 | 0.147 | [0.000, 0.759] |
| | 1.280 | 1.306 | 1.315 | [0.991, 1.613] |
| | 0.800 | 0.746 | 0.627 | [0.035, 2.274] |
| | 0.215 | 0.325 | 0.264 | [0.000, 1.009] |
| | 0.215 | 0.175 | 0.110 | [−0.950, 1.354] |

| Tree statistic | Mean | Median | 95% CI |
|---|---|---|---|
| A1: 5000 simulated trees with 200 sampled tips in a structured population | | | |
| Gamma-statistic | −8.0765 | −35.0077 | [−95.1967, 36.576] |
| Colless’ index | 1848.324 | 1763 | [1111, 3090] |
| Sackin’s index | 2873.392 | 2785 | [2191, 4062] |
| A2: 5000 simulated trees with 200 sampled tips in a non-structured population | | | |
| Gamma-statistic | −25.4391 | −34.6674 | [−125.8019, 55.931] |
| Colless’ index | 1558.561 | 1490 | [947, 2532] |
| Sackin’s index | 2609.295 | 2539 | [2050, 3533] |
| B1: 5000 simulated trees with 500 sampled tips in a structured population | | | |
| Gamma-statistic | −52.1761 | −51.7531 | [−109.566, 4.93] |
| Colless’ index | 5964.97 | 5714 | [3821, 9482] |
| Sackin’s index | 8974.741 | 8734 | [6955, 12375] |
| B2: 5000 simulated trees with 500 sampled tips in a non-structured population | | | |
| Gamma-statistic | −103.5037 | −52.006 | [−154.277, 36.237] |
| Colless’ index | 4934.575 | 4747 | [3247, 7631] |
| Sackin’s index | 8016.174 | 7830 | [6441, 10616] |
| C1: 15,000 simulated trees with 200 sampled tips in a structured population | | | |
| Gamma-statistic | −44.139 | −35.071 | [−105.172, 40.067] |
| Colless’ index | 1854 | 1764 | [1092, 3121] |
| Sackin’s index | 2878.715 | 2788 | [2173, 4090] |
| C2: 15,000 simulated trees with 200 sampled tips in a non-structured population | | | |
| Gamma-statistic | −25.312 | −34.754 | [−124.382, 71.212] |
| Colless’ index | 1553.154 | 1484 | [948, 2539] |
| Sackin’s index | 2603.978 | 2534 | [2052, 3549] |

negative. However, the confidence intervals became narrower. When the number of simulated trees was increased from 5000 to 15,000, while keeping the number of sampled tips at 200, the mean values of the gamma-statistic became more negative under the structured population but remained almost constant under the non-structured population. The median values remained almost constant.

For the Colless index, the mean and median values were higher under the structured population than under the non-structured population. Increasing the number of sampled tips from 200 to 500 resulted in higher mean and median Colless index values. Increasing the number of simulated trees from 5000 to 15,000 made no substantial difference to the mean and median values. The Sackin index values follow the same pattern as the Colless index values.

The densities for the three tree statistics are shown in

From the results shown in Tables 1-6 and Figures 3-6, parameter estimation was better when sampling probability was fixed at 0.2 compared to when it was set at 0.05 to depict sparse sampling. It was also observed that although parameter estimation for sparse sampling (

Our proposed model distinguished between heterogeneous and homogeneous populations. This is because 41 trees out of 50 supported the heterogeneous model dynamics, rejecting the homogeneous type of model inference. For homogeneous dynamics, all the 50 trees accepted the homogeneous dynamics and rejected the heterogeneous type dynamics. We employed the likelihood test statistic for this model selection using a level of significance (

For the investigation between structuring of a population and tree balance, it was concluded from the results shown in

We thank the Editor and reviewers for their comments which greatly improved this paper. The authors thank Pan-African University Institute of Basic Sciences, Technology and Innovation (PAUSTI) for funding this research.

The authors declare no conflicts of interest regarding the publication of this paper.

Kayondo, H.W., Mwalili, S. and Mango, J.M. (2019) Inferring Multi-Type Birth-Death Parameters for a Structured Host Population with Application to HIV Epidemic in Africa. Computational Molecular Bioscience, 9, 108-131. https://doi.org/10.4236/cmb.2019.94009