Open Journal of Statistics

Volume 10, Issue 2 (April 2020)

ISSN Print: 2161-718X   ISSN Online: 2161-7198

Predicting the Underlying Structure for Phylogenetic Trees Using Neural Networks and Logistic Regression

PP. 239-251
DOI: 10.4236/ojs.2020.102017


Understanding the underlying structure of phylogenetic trees is important because it informs the choice of methods for phylogenetic inference: the methods used for a structured population differ from those needed when a population is not structured. In this paper, we compared two supervised machine learning techniques, artificial neural networks (ANN) and logistic regression, for predicting the underlying structure of phylogenetic trees. We tuned the parameters of both models to identify optimal models and then performed 10-fold cross-validation on the optimal logistic regression and ANN models. We also applied clustering, an unsupervised technique, to determine the number of clusters that could be identified from simulated phylogenetic trees. The trees came from both structured and non-structured populations. Both clustering and classification used tree statistics such as the Colless, Sackin and cophenetic indices, among others. The 10-fold cross-validation showed that logistic regression and ANN performed comparably, with both models achieving average accuracy rates above 0.75. Most of the clustering indices used gave 2 or 3 as the optimal number of clusters.
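
To make the tree statistics named in the abstract concrete, the following is a minimal sketch of how the Sackin and Colless indices can be computed for a rooted binary tree. It uses the standard definitions (Sackin: the sum of leaf depths; Colless: the sum over internal nodes of the absolute difference in leaf counts between the two subtrees). The nested-tuple tree representation and the function name `tree_stats` are assumptions made for illustration; the abstract does not state which software or data structures the authors used.

```python
from typing import Tuple, Union

# Hypothetical representation: a leaf is a string label,
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def tree_stats(tree: Tree) -> Tuple[int, int, int]:
    """Return (number of leaves, Sackin index, Colless index)."""
    if isinstance(tree, str):
        # A single leaf has depth 0 and contributes no imbalance.
        return 1, 0, 0
    left, right = tree
    n_left, sackin_left, colless_left = tree_stats(left)
    n_right, sackin_right, colless_right = tree_stats(right)
    n = n_left + n_right
    # Every leaf below this node is one edge deeper than in its subtree.
    sackin = sackin_left + sackin_right + n
    # This node adds the difference in leaf counts of its two subtrees.
    colless = colless_left + colless_right + abs(n_left - n_right)
    return n, sackin, colless


# A fully balanced tree and a maximally unbalanced (caterpillar) tree.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(tree_stats(balanced))     # (4, 8, 0)
print(tree_stats(caterpillar))  # (4, 9, 3)
```

Both indices are larger for the caterpillar tree than for the balanced one, which is the kind of shape difference that imbalance statistics like these can expose to a classifier or a clustering method.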

Cite this paper

Kayondo, H. and Mwalili, S. (2020) Predicting the Underlying Structure for Phylogenetic Trees Using Neural Networks and Logistic Regression. Open Journal of Statistics, 10, 239-251. doi: 10.4236/ojs.2020.102017.

Copyright © 2020 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.