Increase Data Characters to Construct the Molecular Phylogeny of the Drosophila auraria Species Complex

Previous phylogenetic analyses of the auraria species complex have led to conflicting hypotheses concerning the relationships among its species; therefore, new sequence data are needed to resolve the phylogeny of this species complex. Here we present new data derived from 22 genes to reconstruct the phylogeny of the auraria species complex. A variety of statistical tests, as well as maximum likelihood mapping analysis, were performed to assess data quality; they suggested that all genes contributed substantially to resolving the phylogeny. Each individual locus was analyzed using maximum likelihood (ML), and the concatenated dataset (21,882 bp) was analyzed using partitioned ML and Bayesian analyses. The separate analyses produced differing phylogenetic relationships. The topologies from the ML and Bayesian analyses of the concatenated dataset show that D. subauraria is the first species, a placement well supported by the separate analyses, the concatenated analysis, and some previous analyses, followed by two lineages: D. auraria with D. biauraria, and D. quadraria with D. triauraria. The close relationship of D. quadraria and D. triauraria is consistent with most previous studies. Resolving the phylogenetic position of D. auraria and D. biauraria will require more datasets.


Sequence Alignment and Statistical Tests
Multiple DNA sequences were aligned for each gene with MUSCLE [30]. Base composition, variable sites, and the average genetic p-distance among all taxa were calculated in MEGA 4 [31]. The degree of nucleotide substitution saturation for each gene was tested using DAMBE 4.5.47 [32].
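As a rough illustration of the average genetic p-distance mentioned above, the following Python sketch computes the proportion of differing sites for each pair of aligned sequences and averages it over all pairs. It is not MEGA's implementation: the function names, the toy sequences, and the choice to skip gaps and ambiguity codes pair by pair are assumptions made for illustration only.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: gaps and ambiguity codes are skipped pair by pair.
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(alignment):
    """Average p-distance over all pairs of sequences in a dict of name -> sequence."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment; taxon labels and sequences are made up, not data from this study.
toy_alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTAA",
    "taxon3": "ACGAACG-AC",
}

if __name__ == "__main__":
    print(round(mean_p_distance(toy_alignment), 4))
```

Averaging over all taxon pairs gives one summary value per gene, which is the kind of figure compared across loci in the results.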
A test for homogeneity of base frequencies across taxa was conducted using

Nucleotide Evolutionary Model Selection, Phylogenetic Analysis
For the separate analyses, maximum likelihood (ML) trees for each locus were constructed in PAUP* v.4.0b10 [33] with the best nucleotide substitution model as determined by the Akaike Information Criterion (AIC). The concatenated dataset was divided into 22 partitions representing 25 genes. The best-known likelihood (BKL) tree for the concatenated dataset was inferred from 1000 RAxML runs using the -f d option for thorough searching, and bootstrap replicates were performed in the multithreaded version of RAxML-7.04. Bayesian analysis was run in MrBayes-3.1.2 [34] for 1,000,000 MCMC generations using the substitution model and parameters deduced from Modeltest 3.06 [35].
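To make the AIC criterion concrete, the sketch below scores candidate substitution models with AIC = 2k - 2 lnL and picks the lowest score. The log-likelihoods are hypothetical numbers, not results from this study, and the parameter counts cover only substitution-model parameters, on the assumption that all models are compared on the same tree so branch lengths add the same constant to each score.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits):
    """Return (best model name, all scores) for a dict of name -> (lnL, k)."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one locus: (log-likelihood, free substitution parameters).
example_fits = {
    "JC69": (-5230.4, 0),   # equal rates, equal base frequencies
    "HKY85": (-5105.7, 4),  # ts/tv ratio + 3 free base frequencies
    "GTR+G": (-5098.2, 9),  # 5 rates + 3 base frequencies + gamma shape
}

if __name__ == "__main__":
    best, scores = best_model_by_aic(example_fits)
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: AIC = {score:.1f}")
    print("selected:", best)
```

The penalty term 2k is what keeps a richer model from being chosen unless its gain in likelihood outweighs the added parameters.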

Alternative Phylogenetic Hypotheses Test
The SH test, as implemented in CONSEL version 0.1 [36], was performed to assess the statistical support for most of the previous hypotheses (Figure 1) and for the hypotheses deduced from the separate analyses. The BKL tree, taken as the optimal likelihood tree, was modified in TreeView [37] to produce phylogenetic trees representing the alternative hypotheses.

Sequence Alignment and Statistical Tests
Aligned sequences for the individual gene regions varied from 334 to 2455 bp in length, and the numbers of variable and parsimony-informative sites differed considerably among genes; bab1 and 28S contain the highest and lowest numbers of parsimony-informative sites, respectively (Table 3). Most of the average p-distances among the taxa were lower than 10% (18 out of 22); mrp1 and bab1 have larger values, whereas 28S, adh234, and odo have very small values (all lower than 2.2%). The test for substitution saturation [32] shows that none of the gene regions is saturated. The sequences of all fragments show homogeneity of base frequencies (P ≥ 0.05).
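The distinction between variable and parsimony-informative sites can be sketched as follows. A column is counted as variable if it contains at least two different bases, and as parsimony informative if at least two different bases each occur in at least two sequences. The function name and the toy alignment are illustrative assumptions, and gaps or ambiguity codes are simply ignored within each column.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def classify_sites(sequences):
    """Return (variable site count, parsimony-informative site count)."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    variable = 0
    informative = 0
    for column in zip(*sequences):
        # Assumption: gaps and ambiguity codes are ignored within a column.
        counts = Counter(base for base in column if base in VALID_BASES)
        if len(counts) >= 2:
            variable += 1
        # Informative: at least two bases, each present in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment of four made-up sequences (not data from this study).
toy_sequences = ["AAGTC", "AAGTC", "ATGCC", "ATGAC"]

if __name__ == "__main__":
    print(classify_sites(toy_sequences))
```

In the toy alignment, the second column splits the sequences two against two and is informative, while the fourth column is variable but uninformative because only one base occurs more than once.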

Phylogenetic Analysis
The topologies of the trees deduced from the concatenated dataset under the ML and Bayesian analyses were completely identical (Figure 2). The five species of the auraria species complex form three lineages: D. subauraria is the first species, followed by D. auraria with D. biauraria, and D. quadraria with D. triauraria. The bootstrap support values in the ML analysis and the posterior probabilities in the Bayesian analysis are all 100% and 1.0, respectively. The ML trees for each locus constructed in PAUP* v.4.0b10 [33] differed from one another (Figure 3).
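One simple way to check whether two rooted trees share a topology is to compare their sets of clades. The sketch below does this for trees written as nested tuples. The representation and function names are assumptions for illustration, and the third tree is a hypothetical alternative included only to show a mismatch; the first two encode the topology described above with D. melanogaster as the outgroup.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) of a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def same_topology(tree_a, tree_b):
    """Two rooted trees share a topology if they contain the same clades."""
    return clades(tree_a) == clades(tree_b)


# Topology described in the text, written twice with different branch order.
ml_tree = ("melanogaster",
           ("subauraria",
            (("auraria", "biauraria"), ("quadraria", "triauraria"))))
bayes_tree = (((("triauraria", "quadraria"), ("biauraria", "auraria")),
               "subauraria"),
              "melanogaster")
# Hypothetical alternative in which D. auraria, not D. subauraria, is first.
alternative_tree = ("melanogaster",
                    ("auraria",
                     (("subauraria", "biauraria"), ("quadraria", "triauraria"))))

if __name__ == "__main__":
    print(same_topology(ml_tree, bayes_tree))
    print(same_topology(ml_tree, alternative_tree))
```

Because clades are stored as unordered sets, rotating branches around a node does not affect the comparison; only the grouping of species matters.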

Alternative Hypotheses Test
The ML trees from all genes fully supported a melanogaster species group comprising three monophyletic lineages: the ananassae subgroup, the montium subgroup, and the melanogaster subgroup plus the oriental subgroups; however, the genes differed in the relationships among these groups. The montium subgroup was supported as the sister taxon of all remaining members of the melanogaster group by 6 of the 17 genes. The close relationships of the melanogaster, suzukii, and takahashii subgroups were supported by 4 genes. All 17 genes supported suzukii and takahashii as sister lineages, and 7 genes supported a monophyletic clade of the ficusphila, eugracilis, elegans, and rhopaloa subgroups. Five of the 17 genes accepted the paraphyly of the suzukii subgroup, in which D. lucipennis is the sister species of D. elegans (see Table 3 and the supplemental material). The p-values of all alternative phylogenetic hypotheses (Figure 1), except the hypothesis (p-value = 0.016) suggested by van der Linde and Houle (2008), are lower than 0.005.

Quality Evaluation for All the Representative Genes
Data quality is crucial for phylogenetic analysis, especially for analyses based on DNA datasets; improper data that conceal conflicting evidence could lead to incorrect phylogenetic tree topologies [38]. All data in this study included various parsimony-informative sites, and the average genetic p-distance was close to 10% (Table 1), except for 28S (4.0%) and the four mtDNA genes (COI = 8.6%, COII = 8.1%, ND1 = 7.8%, and ND4 = 7.9%). The 28S fragment is 343 bp in length, a small part of the complete 28S sequence; its low genetic p-distance could result from arbitrary selection, in that a conserved region of this gene was selected. The mtDNA genes were traditionally considered conservative; genes of this kind were effective for discovering "higher-level" phylogenies. The likelihood mapping (Table 3) results also indicated that these four genes contribute significantly to resolving the phylogeny of the melanogaster species group (COI = 80.0%, COII =
L. Gan et al.
of these species, it is advantageous to consider adding more DNA sequence data.
From all the separate analyses and most previous analyses, it was obvious that data from any single gene were of limited power to resolve the phylogenetic relationships of this species complex, because almost all the previous and separate analyses were rejected by the analysis based on the concatenated dataset. Especially, some authors emphasized that phylogenetic hypotheses.

Conclusions
1) The phylogenetic relationships of the auraria species complex were analyzed based on different data: D. subauraria is the first species, with D. auraria and D. biauraria, and D. quadraria and D. triauraria, forming the subsequent clusters.
2) For inferring the phylogenetic relationships of closely related species, a large concatenated dataset is more advantageous than a single locus.

Figure 1. Diagram of the phylogenetic relationships of the auraria species complex based on different datasets. (A, B, Q, S and T are D. auraria, D. biauraria, D. quadraria, D. subauraria and D. triauraria, respectively.)

Figure 2. The phylogeny of the auraria complex deduced from the concatenated dataset under the ML and Bayesian analyses. The numbers on the branches refer to bootstrap support in the ML analysis and posterior probabilities in the Bayesian analysis. D. melanogaster was the outgroup.

Figure 3. Phylogenetic relationships of the auraria species complex constructed from each single locus.

Table 1. Experimental species names and GenBank accession numbers.

Table 2. PCR conditions and primers.

Table 3. The characters of the 22 genes across the 5 species of the auraria species complex.