Increase Data Characters to Construct the Molecular Phylogeny of the Drosophila auraria Species Complex


Previous phylogenetic analyses of the auraria species complex have led to conflicting hypotheses about the relationships among its members; additional sequence data are therefore needed to resolve the phylogeny of this species complex. Here we present data from 22 genes to reconstruct the phylogeny of the auraria species complex. A variety of statistical tests, together with maximum likelihood mapping analysis, were performed to assess data quality; they suggested that all genes contributed substantially to resolving the phylogeny. Each locus was analyzed separately using maximum likelihood (ML), and the concatenated dataset (21,882 bp) was analyzed using partitioned ML and Bayesian analyses. The separate analyses produced various phylogenetic relationships. The ML and Bayesian topologies based on the concatenated dataset show D. subauraria as the first species, a position well supported by the separate analyses, the concatenated dataset analysis, and some previous analyses, followed by D. auraria and D. biauraria, and D. quadraria and D. triauraria. The close relationship of D. quadraria and D. triauraria is consistent with most previous studies. The phylogenetic positions of D. auraria and D. biauraria remain to be resolved with more data sets.

Share and Cite:

Gan, L. , Li, G. , Li, W. , Zeng, Q. and Yang, Y. (2017) Increase Data Characters to Construct the Molecular Phylogeny of the Drosophila auraria Species Complex. Open Journal of Genetics, 7, 40-49. doi: 10.4236/ojgen.2017.71004.

1. Introduction

The auraria species complex, which ordinarily comprises five members (D. auraria, D. biauraria, D. subauraria, D. quadraria and D. triauraria) [1] [2], has been considered an ideal model for studying reproductive isolation, flight activity, diapause ability, courtship songs and cold tolerance [3] . Recently, the phylogeny of the auraria species complex has been studied based on various data, in particular DNA sequence data. However, these analyses produced conflicting phylogenetic hypotheses [4] - [18] (Figure 1). The cause of the conflict is not known. Previous studies on the phylogeny of this species complex were based on different sample sizes or genetic markers, and differences in the number of taxa and the number of genes can affect phylogenetic accuracy [19] . In many previous phylogenetic treatments of this species complex, representatives of only 4 species or fewer were included [4] [7] [13] [14] [15] [16] [17] [20] . Incomplete or insufficient taxon sampling has led to major inconsistencies in phylogenetic reconstructions [20] [21] [22] [23] [24] . In addition, previous studies selected differing sets of genetic markers, and most were based on no more than two genetic markers [9] [11] [12] [15] [16] [17] ; phylogenetic hypotheses deduced from small amounts of sequence data tend to be incongruent or poorly supported [25] . Moreover, highly conserved genetic markers were used in some analyses [3] [7] [13] [14] , although some authors have suggested that fast-evolving DNA regions are preferable for analyzing the molecular phylogenies of closely related species [26] . Although the phylogenetic relationships of these five members were deduced from 17 loci [18] , the hypothesis that “increasing sampling outside the group may decrease accuracy” [27] may have applied; therefore, Yang et al. (2012) did not resolve this problem.
Many investigations have suggested that maximizing the number of genes is advantageous for resolving complex phylogenies [28] [29] . Consequently, it was advantageous to reconstruct the phylogeny of the five species based on increased gene sampling.

Figure 1. Diagram of the phylogenetic relationships of the auraria species complex based on different data sets. (A, B, S, Q and T are D. auraria, D. biauraria, D. subauraria, D. quadraria and D. triauraria, respectively.)

In this study, 22 gene segments were used for the first time to reanalyze the phylogenetic relationships of D. auraria, D. biauraria, D. subauraria, D. quadraria and D. triauraria. These loci included partial sequences of the mitochondrial genes cytochrome oxidase subunit I (COI), cytochrome oxidase subunit II (COII), ND1 and ND4; the nuclear ribosomal sequences 28S rDNA (28S) and the internal transcribed spacer of nuclear ribosomal DNA (ITS, including ITS1, 5.8S, 2S, and ITS2); and the nuclear genes amylase (amy), a paralogue of the amylase genes (amr), sn-glycerol-3-phosphate dehydrogenase (gpdh), histone 2 spacers (h2s), Dopa decarboxylase (ddc), extra sexcombs (esc), hunchback (hb), exons 2, 3 and 4 of the alcohol dehydrogenase gene (adh234), the nucleoporin 96 - 98 gene (nup), the membrane protein (patched) gene (ptc), Xenopus Cdc6 (cdc), the genes for odorant-binding proteins 57d and 57e (odo), multidrug-resistance associated protein 1 (mrp1), wingless (wgl), intron 1 of the bab gene (bab1), and endophilin B (endoB).

2. Materials and Methods

2.1. The Study Taxa and Sequences Data

The sequences of COI, COII, ND1, ND4, 28S, ITS, amy, amyrel, gpdh, h2s, ddc, esc, hb, adh234, nup, ptc and Cdc6 were downloaded from GenBank (the GenBank accession numbers are listed in Yang et al., 2012). Sequences of odorant-binding proteins 57d and 57e (odo), multidrug-resistance associated protein 1 (mrp1), wingless (wgl), intron 1 of the bab gene (bab1) and endophilin B (endoB) are newly presented in this study. Detailed information is given in Table 1. D. melanogaster was selected as the outgroup. PCR conditions and primers are listed in Table 2.

2.2. Sequence Alignment and Statistical Tests

Multiple DNA sequences were aligned with MUSCLE for each gene [30] . The base composition, variable sites, and average genetic p-distance among all taxa were calculated with MEGA 4 [31] . The degree of nucleotide substitution saturation for each gene was tested using DAMBE 4.5.47 [32] . A test for homogeneity of base frequencies across taxa was conducted using PAUP 4.0 beta 10 [33] .

Table 1. Experimental species names and GenBank accession numbers.

Table 2. PCR conditions and primers.
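As an illustration of the alignment statistics described in this section, the following minimal Python sketch computes the uncorrected p-distance (the proportion of compared sites that differ) and the base composition. It is not the MEGA implementation; the function names and the toy alignment are hypothetical, and skipping gaps and ambiguous bases is an assumption about how missing data are handled.

```python
from collections import Counter
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_p_distance(alignment):
    """Average p-distance over all pairs of taxa in a {name: sequence} dict."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def base_composition(seq):
    """Relative frequency of A, C, G and T, ignoring all other characters."""
    counts = Counter(base for base in seq.upper() if base in VALID_BASES)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGTAA",
    "taxon_3": "ACGAAC-TAC",
}
```

In a real analysis the dictionary would be filled from each gene's MUSCLE alignment, and `mean_p_distance` would be called once per gene.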

2.3. Nucleotide Evolutionary Model Selection, Phylogenetic Analysis

For the separate analyses, maximum likelihood (ML) trees for each locus were constructed in PAUP* v.4.0b10 [33] with the best nucleotide substitution model as determined by the Akaike Information Criterion (AIC). The concatenated dataset was divided into 22 partitions representing the 22 genes. The best-known likelihood (BKL) tree for the concatenated dataset was inferred from 1000 RAxML runs using the -f d option for thorough searching, and bootstrap replicates were performed in the multithreaded version of RAxML 7.0.4. Bayesian analysis was run in MrBayes 3.1.2 [34] for 1,000,000 MCMC generations using the substitution model and parameters determined with Modeltest 3.06 [35] .
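The model selection and partitioning steps can be sketched as follows. This is a minimal Python illustration, not the Modeltest or RAxML code: AIC is computed as 2k - 2 lnL, the log-likelihoods and gene lengths are made-up placeholder values, and counting only substitution-model parameters in k is a simplifying assumption. The partition lines follow the general `DNA, name = start-end` layout of a RAxML partition file.

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """Return the model name with the lowest AIC.

    candidates maps a model name to a (log_likelihood, n_free_params) tuple.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


def partition_ranges(gene_lengths):
    """Turn [(gene, aligned_length), ...] into 1-based inclusive column ranges
    for a concatenated alignment built in the same gene order."""
    ranges = []
    start = 1
    for name, length in gene_lengths:
        end = start + length - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges


def format_partitions(ranges):
    """Render the ranges as one partition line per gene."""
    return "\n".join(f"DNA, {name} = {start}-{end}" for name, start, end in ranges)


# Placeholder values for illustration only (not results from this study).
example_models = {
    "JC69": (-1050.0, 0),
    "HKY85": (-1020.0, 4),
    "GTR+G": (-1018.5, 9),
}
example_lengths = [("geneA", 300), ("geneB", 450)]

if __name__ == "__main__":
    print(best_model_by_aic(example_models))
    print(format_partitions(partition_ranges(example_lengths)))
```

With these placeholder numbers the richer GTR+G model has the highest likelihood but is penalized for its extra parameters, so the intermediate model is selected.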

2.4. Alternative Phylogenetic Hypotheses Test

The SH test, implemented in CONSEL version 0.1 [36] , was performed to test the statistical support for most of the previous hypotheses (Figure 1) and for the hypotheses derived from the separate analyses. The BKL tree, taken as the optimal likelihood tree, was modified using TreeView [37] to produce phylogenetic trees representing the alternative hypotheses.

3. Results

3.1. Sequence Alignment and Statistical Tests

Aligned sequences for the individual gene regions varied from 334 to 2455 bp in length, and the numbers of variable and parsimony-informative sites differed considerably among genes; bab1 and 28S contained the highest and lowest numbers of parsimony-informative sites, respectively (Table 3). Most of the average p-distances among the taxa were lower than 10% (18 out of 22); mrp1 and bab1 had larger values, whereas 28S, adh234, and odo had very small values (all lower than 2.2%). The test for substitution saturation [32] showed that none of the gene regions was saturated. The sequences of all fragments showed homogeneity of base frequencies (P ≥ 0.05).

Table 3. Characteristics of the 22 genes across the 5 species of the auraria species complex.

P*: parsimony-informative sites; A*: average genetic p-distance (%).
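The counts of variable and parsimony-informative sites summarized in Table 3 can be illustrated with a short Python sketch. It assumes the usual definition (a site is parsimony-informative when at least two character states each occur in at least two sequences), ignores gaps and ambiguous bases, and uses a hypothetical function name and toy sequences rather than the data of this study.

```python
from collections import Counter


def classify_sites(sequences):
    """Return (variable_sites, parsimony_informative_sites) for aligned sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable = 0
    informative = 0
    for column in zip(*sequences):
        # Count only unambiguous nucleotides in this alignment column.
        counts = Counter(
            base for base in (char.upper() for char in column) if base in "ACGT"
        )
        if len(counts) < 2:
            continue  # constant site (or no usable data)
        variable += 1
        # Informative: at least two states, each present in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Hypothetical toy alignment of four sequences.
toy_sequences = ["AAGTC", "AAGTC", "ATGCC", "ATGCA"]

if __name__ == "__main__":
    print(classify_sites(toy_sequences))
```

In the toy alignment, columns 2 and 4 are parsimony-informative, while column 5 is variable but uninformative because one of its two states occurs in a single sequence.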

3.2. Phylogenetic Analysis

The topologies of the trees inferred from the concatenated dataset under ML and Bayesian analysis were identical (Figure 2). The five species in the auraria species complex formed three lineages: D. subauraria is the first species, followed by the D. auraria and D. biauraria lineage and the D. quadraria and D. triauraria lineage. The bootstrap support values in the ML analysis and the posterior probabilities in the Bayesian analysis were all 100 and 1.0, respectively. The ML trees for the individual loci constructed in PAUP* v.4.0b10 [33] differed from one another (Figure 3).

3.3. Alternative Hypotheses Test

The ML trees from every gene supported a melanogaster species group comprising three monophyletic lineages: the ananassae subgroup, the montium subgroup, and the melanogaster subgroup plus the oriental subgroups; however, the genes differed in the relationships among these groups. The montium subgroup was supported as the sister taxon of all remaining members of the melanogaster group by 6 of the 17 genes. The close relationship of the melanogaster, suzukii, and takahashii subgroups was supported by 4 genes. All 17 genes supported suzukii and takahashii as sister lineages, and 7 genes supported a monophyletic clade of the ficusphila, eugracilis, elegans, and rhopaloa subgroups. Five of the 17 genes supported the paraphyly of the suzukii subgroup, in which D. lucipennis is the sister species of D. elegans (see Table 3, and supplemental material). The p-values of all alternative phylogenetic hypotheses (Figure 1), except the hypothesis suggested by van der Linde and Houle (2008) (p-value = 0.016), were lower than 0.005.

4. Discussion

Quality Evaluation for All the Representative Genes

Data quality is crucial for phylogenetic analysis, especially analysis based on DNA datasets; improper data that conceal conflicting evidence could lead to incorrect phylogenetic tree topologies [38] . All data in this study included various parsimony-informative sites, and the average genetic p-distance was close to 10% (Table 1), except for 28S (4.0%) and the four mtDNA genes (COI = 8.6%, COII = 8.1%, ND1 = 7.8%, and ND4 = 7.9%). The 28S fragment is 343 bp in length, a small part of the complete 28S sequence; its low genetic p-distance could result from arbitrary selection, in that a conserved region of this gene was selected. The mtDNA genes have traditionally been considered conserved; genes of this kind are effective for resolving “higher-level” phylogenies. The likelihood mapping results (Table 3) also indicated that these four genes contribute significantly to resolving the phylogeny of the melanogaster species group (COI = 80.0%, COII = 81.5%, ND1 = 88.8%, and ND4 = 89.9%).

Figure 2. Phylogeny of the auraria complex inferred from the concatenated dataset under ML and Bayesian analysis. The numbers on the branches refer to bootstrap support in the ML analysis and posterior probabilities in the Bayesian analysis. D. melanogaster was the outgroup.

Figure 3. Phylogenetic relationships of the auraria species complex constructed from individual loci.

According to the test described by Xia et al. (2003), most of the genes showed no saturation (Table 2); the exceptions were ITS and nup, which showed little saturation. Although saturation can introduce noise into phylogenetic analysis, these two genes contain useful phylogenetic information, and their average genetic p-distances suggest that they have “fast” evolutionary rates. Thollesson (1999) suggested that fast genes also carry phylogenetic signal, and Schlötterer and Hauser (1994) demonstrated that divergent regions, e.g., ITS, are appropriate genetic markers for resolving the phylogeny of the melanogaster species group. The likelihood mapping also showed that these two genes contain considerable phylogenetic signal. Therefore, these genes were included to avoid loss of phylogenetic signal.

The concatenated dataset contains a large number of parsimony-informative sites (4365), and its average genetic p-distance (14.0%), together with the likelihood mapping value (resolved quartets, 99.5%), indicates that the combined data provide sufficient phylogenetic signal to resolve the phylogenetic relationships of the melanogaster species group.

In the present study, the phylogenetic relationships of the auraria species complex were reconstructed based on 22 gene segments (Figure 2). The phylogenetic tree indicated that D. subauraria is the first species, followed by D. auraria and D. biauraria, and D. quadraria and D. triauraria, with high bootstrap values and posterior probabilities (100 and 1.0, respectively). D. subauraria was absent from phylogenetic analyses before it was found in Japan [39] , but based on RAPD data, it was the first species in this species complex. The position of D. subauraria as the first species was also supported by some previous analyses [3] [5] [8] [10] and by half of the separate analyses (11 of the 22 genes). In addition, no evidence shows that D. subauraria can produce fertile offspring with other members of this species complex. All of this evidence suggests that D. subauraria was the earliest-diverging species. D. quadraria was always considered the first species before D. subauraria was included in phylogenetic analyses [8] ; however, D. quadraria and D. triauraria can produce fertile hybrid offspring [2] , which indicates that there is no reproductive isolation between these two species, and they were therefore once treated as the same species [1] [2] . Moreover, some previous analyses based on two-dimensional electrophoresis and DNA data sets [9] [15] supported the close relationship of D. quadraria and D. triauraria, which was also supported by six gene data sets in the present analysis (bab1, edo, ND1, ptc, amy, and COII). The relationship of D. auraria and D. biauraria was supported by only three gene data sets in the separate analyses (ptc, bab1, and mrp1) and by some previous evidence [4] [18] . From all the analyses, a preliminary conclusion can be drawn that D. subauraria was the first species and that D. quadraria and D. triauraria are two closely related species, but the phylogenetic positions of D. auraria and D. biauraria should be resolved based on more data sets. In future analyses of the phylogenetic relationships of these species, it would be advantageous to add more DNA sequence data. From all the separate analyses and most previous analyses, it is clear that any single-gene data set is of limited use for resolving the phylogenetic relationships of this species complex, because almost all the hypotheses from previous and separate analyses were rejected by the analysis based on the concatenated dataset. Especially, some authors emphasized that phylogenetic hypotheses.

5. Conclusions

1) The phylogenetic relationships of the auraria species complex were analyzed based on different data: D. subauraria is the first species, with D. auraria and D. biauraria, and D. quadraria and D. triauraria, forming the subsequent clusters.

2) For analyzing the phylogenetic relationships of closely related species, a large concatenated dataset is more advantageous than a single locus.


This work was supported by the Department science of Hubei Province (2012FFB00304) and the Foundation of Wuhan City Technology Bureau (2013070104010013). We thank the Drosophila Genetic Resource Center and the Bloomington Drosophila center for supplying the flies used in this study.


*These authors contributed equally to this work.

Conflicts of Interest

The authors declare no conflicts of interest.


[1] Bock, I.R. and Wheeler, M.R. (1972) The Drosophila melanogaster Species Group Studies in Genetics VII. Univ. Texas Publ., 7213.
[2] Kimura, M.T. (1987) Habitat Differentiation and Speciation in the Drosophila auraria Species-Complex (Diptera, Drosophilidae). Japanese Journal of Entomology, 55, 429-436.
[3] Miyake, H. and Watada, M. (2007) Molecular phylogeny of the Drosophila auraria Species Complex and Allied Species of Japan Based on Nuclear and Mitochondrial DNA Sequences. Genes & Genetic Systems, 82, 77-88.
[4] Ohnishi, S., Kim, K.-W. and Watanabe, T.K. (1983) Biochemical Phylogeny of the Drosophila montium Species Subgroup. The Japanese Journal of Genetics, 58, 141-151.
[5] Kim, B.K., Watanabe, T.K. and Kitagawa, O. (1989) Evolutionary Genetics of the Drosophila montium Subgroup. I. Reproductive Isolations and the Phylogeny. The Japanese Journal of Genetics, 64, 177-190.
[6] Dai, Z. (1994) Study on Evolutionary Genetics of Drosophila auraria Species Complex: Cladistic Analysis and Phenetic Analysis. Journal of Genetics and Genomics, 21, 436-440.
[7] Goto, S.G. and Kimura, M. (2001) Phylogenetic Utility of Mitochondrial COI and Nuclear Gpdh Genes in Drosophila. Molecular Phylogenetics and Evolution, 18, 404-422.
[8] Zhao, Z.-M. (2001) Genetic Differentiation within Drosophila auraria Species Complex Revealed by Random Amplified Polymorphic DNA (RAPD). Acta Zoologica Sinica, 47, 625-631.
[9] Liu, Z.-M. (2002) Preliminary Studies on the Thr-Gly Region of the Period Gene in the Drosophila auraria Species Complex. Zoological Research, 23, 1-6.
[10] Lu, J., Lu, J., Chen, H.-X., Zhang, W.-X. and Dai, Z.-H. (2002) Molecular Phylogeny of Drosophila auraria Species Complex. Journal of Genetics and Genomics, 29, 39-49.
[11] Zhang, Z. and Inomata, N. (2003) Phylogeny and the Evolution of the Amylase Multigenes in the Drosophila montium Species Subgroup. Journal of Molecular Evolution, 56, 121-130.
[12] Yang, Y., Zhang, Y.P., Qian, Y.H. and Zeng, Q.T. (2004) Phylogenetic Relationships of the Drosophila melanogaster Species Group Deduced from Spacer Regions of Histone Gene H2A. Molecular Phylogenetics and Evolution, 30, 336-343.
[13] Mou, S.L., Zeng, Q.T., Yang, Y., Qian, Y.H. and Hu, G.A. (2005) Phylogeny of Melanogaster Species Group Inferred from ND4L and ND4 Genes. Zoological Research, 26, 344-349.
[14] Lewis, R.L., Beckenbach, A.T. and Mooers, A.O. (2005) The Phylogeny of the Subgroups within the Melanogaster Species Group: Likelihood Tests on COI and COII Sequences and a Bayesian Estimate of Phylogeny. Molecular Phylogenetics and Evolution, 37, 15-24.
[15] Da Lage, J.L., Kergoat, G.J., Maczkowiak, F., Silvain, J.-F., Cariou, M.-L. and Lachaise, D. (2007) A Phylogeny of Drosophilidae Using the Amyrel Gene: Questioning the Drosophila melanogaster Species Group Boundaries. Journal of Zoological Systematics and Evolutionary Research, 45, 47-63.
[16] Van der Linde, K. and Houle, D. (2008) A Supertree Analysis and Literature Review of the Genus Drosophila and Closely Related Genera (Diptera, Drosophilidae). Insect Systematics & Evolution, 39, 241-267.
[17] Van der Linde, K., Houle, D., Spicer, G.S. and Steppan, S. (2010) A Supermatrix-Based Molecular Phylogeny of the Family Drosophilidae. Genetics Research, 92, 25-38.
[18] Yang, Y., Hou, Z.-C., Qian, Y.-H., Kang, H. and Zeng, Q.-T. (2012) Increasing the Data Size to Accurately Reconstruct the Phylogenetic Relationships between Nine Subgroups of the Drosophila melanogaster Species Group (Drosophilidae, Diptera). Molecular Phylogenetics and Evolution, 62, 214-223.
[19] Rokas, A. and Carroll, S.B. (2005) More Genes or More Taxa? The Relative Contribution of Gene Number and Taxon Number to Phylogenetic Accuracy. Molecular Biology and Evolution, 22, 1337-1344.
[20] Schawaroch, V.A. (2002) Phylogeny of a Paradigm Lineage: The Drosophila melanogaster Species Group (Diptera: Drosophilidae). Biological Journal of the Linnean Society, 76, 21-37.
[21] Hillis, D.M., Pollock, D., Mcguire, J.A. and Zwickl, D.J. (2003) Is Sparse Taxon Sampling a Problem for Phylogenetic Inference? Systematic Biology, 52, 124-126.
[22] Rosenberg, M.S. and Kumar, S. (2001) Incomplete Taxon Sampling Is Not a Problem for Phylogenetic Inference. Proceedings of the National Academy of Sciences of the United States of America, 98, 10751-10756.
[23] Pollock, D.D., Zwickl, D.J., Mcguire, J.A. and Hillis, D.M. (2002) Increased Taxon Sampling Is Advantageous for Phylogenetic Inference. Systematic Biology, 51, 664-671.
[24] Zwickl, D.J., Hillis, D.M. and Crandall, K. (2002) Increased Taxon Sampling Greatly Reduces Phylogenetic Error. Systematic Biology, 51, 588-598.
[25] Kopp, A. and True, J.R. (2002) Phylogeny of the Oriental Drosophila melanogaster Species Group: A Multilocus Reconstruction. Systematic Biology, 51, 786-805.
[26] Schlötterer, C. and Hauser, M.T. (1994) Comparative Evolutionary Analysis of rDNA ITS Regions in Drosophila. Molecular Biology and Evolution, 11, 513-522.
[27] Rannala, B., Huelsenbeck, J.P., Yang, Z., Nielsen, R. and Cannatella, D. (1998) Taxon Sampling and the Accuracy of Large Phylogenies. Systematic Biology, 47, 702-710.
[28] Bapteste, E., Brinkmann, H., Lee, J. A., Moore, D. V., Sensen, C. W., Gordon, P., Durufle, L., Gaasterland, T., Lopez, P., Muller, M. and Philippe, H. (2002) The Analysis of 100 Genes Supports the Grouping of Three Highly Divergent Amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proceedings of the National Academy of Sciences of the United States of America, 99, 1414-1419.
[29] Wolf, A.M., Conaway, M.R., Crowther, J.Q., Hazen, K.Y., Nadler, J.L., Oneida, B. and Bovbjerg, V.E. (2004) Translating Lifestyle Intervention to Practice in Obese Patients with Type 2 Diabetes: Improving Control with Activity and Nutrition (ICAN) Study. Diabetes Care, 27, 1570-1576.
[30] Edgar, R.C. (2004) MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Research, 32, 1792-1797.
[31] Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Molecular Biology and Evolution, 24, 1596-1599.
[32] Xia, X., Xie, Z., Salemi, M., Chen, L. and Wang, Y. (2003) An Index of Substitution Saturation and Its Application. Molecular Phylogenetics and Evolution, 26, 1-7.
[33] Swofford, D.L. (2002) PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods) Version 4. Sinauer Associates, Sunderland, MA.
[34] Ronquist, F. and Huelsenbeck, J.P. (2003) MrBayes 3: Bayesian Phylogenetic Inference under Mixed Models. Bioinformatics, 19, 1572-1574.
[35] Posada, D. and Crandall, K. (1998) MODELTEST: Testing the Model of DNA Substitution. Bioinformatics, 14, 817-818.
[36] Shimodaira, H. and Hasegawa, M. (2001) CONSEL: For Assessing the Confidence of Phylogenetic Tree Selection. Bioinformatics, 17, 1246-1247.
[37] Page, R.D. (1996) TreeView: An Application to Display Phylogenetic Trees on Personal Computers. Computer Applications in the Biosciences, 12, 357-358.
[38] Kimura, M. (1983) The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
[39] Ohnishi, S. and Watanabe, T.K. (1984) Systematics of the Drosophila montium Species Subgroup: A Biochemical Approach. Zoological Science, 1, 801-807.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.