Graph-Theoretic Algorithms to Improve Phylogenomic Analyses

NSF grant CCF-1535977, collaborative with UC Berkeley. We are developing new theory and discrete algorithms to improve the estimation of large species trees and gene trees, and in particular to enable statistical methods to scale to ultra-large datasets.


Funding: U.S. National Science Foundation grant CCF-1535977 (Algorithms in the Field).

Project Overview: Understanding the history of life on Earth, that is, how species evolved from their common ancestor, is a major goal of biological research. Evolutionary trees are very hard to construct with high accuracy, because nearly all of the most accurate approaches require solving computationally hard optimization problems. Furthermore, research has shown that the evolutionary tree for a single gene can differ from the evolutionary tree for the species, and current methods do not provide adequate accuracy on genome-scale data. As a result, large evolutionary trees, covering big portions of "The Tree of Life", are very difficult to compute with high accuracy. This project will develop methods that enable highly accurate species tree estimation. The key approach is the development of novel divide-and-conquer strategies, whereby a dataset is divided into overlapping subsets, species trees are constructed on the subsets, and the subset species trees are then merged into a tree on the full dataset. These approaches will be combined with powerful statistical estimation methods, with the potential to transform the ability of evolutionary biologists to analyze their data. The project will also provide open source software for the new methods and train biologists in its use at national meetings. It will contribute to interdisciplinary training for two doctoral students, one at Illinois and one at Berkeley, and course materials for computational biology will be made available online.
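
The divide-and-conquer strategy described above can be illustrated with a minimal Python sketch. This is not the project's software: the function names (overlapping_subsets, divide_and_conquer, star_tree, merge_star_trees) are hypothetical, the subsets are formed by a simple sliding window over an arbitrary taxon order rather than by any phylogenetically informed decomposition, and the tree estimation and merging steps are toy stand-ins that work with unresolved star trees (represented only by their leaf sets). In practice, a statistical tree estimation method and a supertree-style merger would be plugged in at those two points.

```python
from typing import Callable, FrozenSet, List, Sequence

Tree = FrozenSet[str]  # toy stand-in: a star tree, identified by its leaf set


def overlapping_subsets(taxa: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split taxa into consecutive windows of `size` that share `overlap` taxa."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    subsets = []
    start = 0
    while True:
        subsets.append(list(taxa[start:start + size]))
        if start + size >= len(taxa):
            break  # the last window reaches the end of the taxon list
        start += step
    return subsets


def divide_and_conquer(
    taxa: Sequence[str],
    size: int,
    overlap: int,
    estimate_tree: Callable[[List[str]], Tree],
    merge_trees: Callable[[List[Tree]], Tree],
) -> Tree:
    """Divide into overlapping subsets, estimate a tree on each, then merge."""
    subsets = overlapping_subsets(taxa, size, overlap)
    subtrees = [estimate_tree(subset) for subset in subsets]
    return merge_trees(subtrees)


def star_tree(subset: List[str]) -> Tree:
    """Placeholder estimator (assumption): returns an unresolved star tree."""
    return frozenset(subset)


def merge_star_trees(trees: List[Tree]) -> Tree:
    """Placeholder merger (assumption): star tree on the union of all leaves."""
    merged = set()
    for tree in trees:
        merged |= tree
    return frozenset(merged)


taxa = list("ABCDEFGH")
subsets = overlapping_subsets(taxa, size=4, overlap=2)
full_tree = divide_and_conquer(taxa, 4, 2, star_tree, merge_star_trees)
print(subsets)
print(sorted(full_tree))
```

The estimator and merger are passed in as arguments so that the overall pipeline stays the same when the toy stand-ins are replaced by real methods; the overlap between neighboring subsets is what gives a merger shared taxa to align the subset trees on.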


Project Software:

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.