Novel Methodologies for Genome-scale Evolutionary Analysis of Multilocus Data
PI: Tandy Warnow
Funding: U.S. National Science Foundation grant DBI-1461364.
The project reports can be found on
this NSF web page
Project Overview:
The Novel Methodologies for Genome-scale Evolutionary Analysis of Multilocus Data project is
a joint effort among groups at Stanford University, UT Austin, Rice University, and Linfield College. The project aims to (1) devise new algorithms for species tree inference; (2) develop new methods for scaling inference algorithms to large-scale genomic data; and (3) perform mathematical, simulation-based, and empirical evaluations of the properties of species tree inference algorithms.
Highlights:
The major effort for the Warnow Lab
has been the development of computationally efficient
methods for species tree estimation in the presence of
gene tree discord due to incomplete lineage sorting.
The highlights of this work include:
- ASTRAL, a polynomial-time
method for computing a species tree from a set of gene trees
that is statistically consistent under the multi-species
coalescent model. ASTRAL is very fast and can
analyze datasets with up to 1,000 genes and 1,000 species
in under a day.
ASTRAL was used in the Thousand Plant Transcriptome Project
(1KP) to compute a plant phylogeny in
Wickett, Mirarab, et al.,
PNAS 2014.
See Mirarab et al., Bioinformatics 2014, and
Mirarab and Warnow,
Bioinformatics 2015 (special
issue for ISMB 2015).
- ASTRID, a coalescent-based method for
estimating species trees from multiple gene trees
that is faster than ASTRAL and often as accurate.
See Vachaspati
and Warnow, BMC Genomics 2015.
- Statistical binning, a technique to improve
the estimation of gene trees in a multi-locus
phylogenomic project. See
Mirarab et al., Science
2014 for the original
paper, and
Bayzid et al., PLOS One, for the improved version (weighted
statistical binning).
Statistical binning was used to provide a coalescent-based
estimation of the Avian phylogeny in Jarvis, Mirarab, et al., Science 2014.
- Theoretical work evaluating
the impact of gene tree estimation error
on coalescent-based
species tree estimation methods.
See
Roch and Warnow, Systematic Biology 2015.
- BBCA,
a technique
to improve the scalability of *BEAST (Heled and Drummond),
for co-estimating gene trees and species trees, so that it
could run on larger numbers of genes with greater efficiency.
See
Zimmermann, Mirarab,
and Warnow, BMC Genomics, 2014.
- FastRFS, a new and very fast supertree method
that finds an exact solution to the Robinson-Foulds Supertree problem
within a constrained search space. FastRFS
was published in Bioinformatics 2016, and presented
at the RECOMB Comparative Genomics 2016 conference.
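The Robinson-Foulds (RF) distance that underlies the supertree problem mentioned above can be illustrated with a short sketch. This is not FastRFS itself, only a minimal illustration of the distance between two trees on the same leaf set. It assumes trees are written as nested Python tuples with string leaf names; the function names are hypothetical and chosen for this example.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) induced by the tree's edges.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# Two five-leaf trees that disagree on the placement of B and C.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2
```

In the supertree setting the source trees generally have different leaf sets, so a supertree is compared with each source tree after restriction to that tree's leaves; that step is omitted here.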
Project Software:
The Warnow Lab also produced other software, focusing on supertree estimation,
multiple sequence alignment, metagenomic data analysis,
and other topics.
See this page
for a nearly full list of software produced by the lab. Below, we
provide links to
software for estimating species trees from heterogeneous sets of gene trees, where heterogeneity is due to incomplete lineage sorting (ILS).
- Statistical Binning,
methods for improving gene tree estimation in a multi-locus setting.
Statistical binning was used by the Avian Phylogenomics
Project to produce a species tree for 48 birds using approximately
14,000 loci.
- ASTRAL, software for estimating
a species tree from a set of gene trees, taking gene tree discordance
due to incomplete lineage sorting into account.
ASTRAL was used by the
Thousand Plant Transcriptome Project to
produce a species tree for land plants,
and is in use by many phylogenomics projects.
- ASTRID,
software for estimating a species tree from a set of gene trees,
taking gene tree discordance into account. ASTRID is
an improved version of NJst, and is both
faster than ASTRAL and sometimes more accurate.
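NJst-style methods summarize a set of gene trees as a matrix of average internode distances between taxa and then build a tree from that matrix with a distance method such as neighbor joining. The sketch below shows only the matrix step and is not the ASTRID implementation. It assumes gene trees are nested Python tuples with string leaf names, written with an unrooted-style top-level node of three or more children, and it measures the distance between two leaves as the number of edges on the path between them (an assumption of this sketch). All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf name to its path of child indices from the top node."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def internode_distances(tree):
    """Pairwise leaf-to-leaf path lengths (in edges) within one gene tree."""
    paths = leaf_paths(tree)
    distances = {}
    for a, b in combinations(sorted(paths), 2):
        path_a, path_b = paths[a], paths[b]
        shared = 0
        for step_a, step_b in zip(path_a, path_b):
            if step_a != step_b:
                break
            shared += 1
        # Edges from each leaf up to their most recent common ancestor.
        distances[(a, b)] = (len(path_a) - shared) + (len(path_b) - shared)
    return distances


def average_internode_distances(gene_trees):
    """Average each pairwise distance over the gene trees containing both taxa."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        for pair, distance in internode_distances(tree).items():
            totals[pair] += distance
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two discordant gene trees on the taxa A, B, C, D.
gene_trees = [
    (("A", "B"), "C", "D"),
    (("A", "C"), "B", "D"),
]
matrix = average_internode_distances(gene_trees)
print(matrix[("A", "B")])  # 2.5
```

Averaging only over the gene trees that contain both taxa lets the sketch accept gene trees with missing taxa; a pair that never co-occurs in any gene tree simply has no entry in the matrix.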
Summer Symposia and Software Schools:
The grant supports several summer symposia and software schools (some held locally
at the University of Illinois at Urbana-Champaign)
that train researchers
(from students through faculty) in new multiple
sequence alignment methods, species tree estimation
from sets of gene trees or their sequence alignments,
and other topics within phylogenomics.
Publications:
See my online publication list for all
papers (many of which can be downloaded). Specific papers related
to phylogenomic estimation (most of which
are supported by this grant) are given below:
- Yu, Y., T. Warnow, and L. Nakhleh.
"Algorithms for MDC-based Multi-locus Phylogeny Inference."
Proceedings of RECOMB 2011
(PDF).
The full paper has additional
results:
"Algorithms for MDC-Based Multi-Locus Phylogeny Inference:
Beyond Rooted Binary Gene Trees on Single Alleles,"
J. Computational Biology November 2011, Vol. 18, No. 11, pp 1543-1559.
- Yang, J. and T. Warnow.
"Fast and accurate methods for phylogenomic analyses."
RECOMB Comparative Genomics 2011, and
BMC Bioinformatics 12(Suppl 9): S4 (5 October 2011).
- Swenson, M.S., R. Suri, C.R. Linder, and T. Warnow.
"SuperFine: fast and accurate supertree estimation."
Systematic Biology
(2012) 61(2):214-227.
- Neves, D. T., T. Warnow, J. L. Sobral and K. Pingali.
"Parallelizing SuperFine."
27th Symposium on Applied Computing
(ACM-SAC),
Bioinformatics, 2012,
pages 1361-1367,
doi: 10.1145/2231936.2231992.
- Bayzid, Md. S. and T. Warnow.
"Estimating Optimal Species Trees from Incomplete
Gene Trees under Deep Coalescence."
Journal of Computational Biology,
June 2012, Vol. 19, No. 6: 591-605,
special issue for Simon Tavare and Michael Waterman.
(HTML).
- Nguyen, N., S. Mirarab, and T. Warnow.
"MRL and SuperFine+MRL: new supertree methods."
Algorithms for Molecular Biology 7:3, 2012.
- Nelesen, S., K. Liu, L.-S. Wang, C. R. Linder, and T. Warnow.
"DACTAL: divide-and-conquer trees (almost)
without alignments."
Bioinformatics
Vol 28, ISMB 2012, pages i274-i282.
- M.S. Bayzid, S. Mirarab, and T. Warnow.
"Inferring optimal species trees under gene duplication and loss."
Pacific Symposium on Biocomputing, 18:250-261 (2013).
(PDF).
- M.S. Bayzid and T. Warnow.
"Naive binning improves phylogenomic analyses".
Bioinformatics,
supplementary materials.
- T. Warnow.
"Large-scale multiple sequence alignment and
phylogeny estimation."
Chapter 6 in "Models and Algorithms for Genome Evolution", edited by Cedric Chauve, Nadia El-Mabrouk and Eric Tannier, Springer series on "Computational Biology".
For a preprint (not in final form) of this chapter, see this
PDF.
- S. Mirarab, R. Reaz, Md. S. Bayzid, T. Zimmermann, M.S. Swenson,
and T. Warnow.
"ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation."
Proceedings, ECCB (European Conference on
Computational Biology), 2014.
Also, Bioinformatics 2014 30 (17): i541-i548.
doi: 10.1093/bioinformatics/btu462.
(PDF)
- Md S. Bayzid, T. Hunt, and T. Warnow.
"Disk Covering Methods Improve Phylogenomic
Analyses".
Proceedings of
RECOMB-CG (Comparative Genomics), 2014,
and BMC Genomics 2014, 15(Suppl 6): S7.
(PDF) and
Supplementary materials
- T. Zimmermann, S. Mirarab and T. Warnow.
"BBCA: Improving the scalability of *BEAST
using random binning".
Proceedings of RECOMB-CG (Comparative
Genomics), 2014, and
BMC Genomics 2014, 15(Suppl 6): S11.
(PDF) and
Supplementary materials.
- S. Mirarab, Md S. Bayzid, and T. Warnow.
"Evaluating summary methods for multi-locus species
tree estimation in the presence of incomplete lineage
sorting".
Systematic Biology, doi: 10.1093/sysbio/syu063.
(PDF)
- N. Wickett, S. Mirarab, N. Nguyen, T. Warnow, et al. (37 authors).
"Phylotranscriptomic analysis of the origin and diversification
of land plants."
Proceedings of the National Academy of Sciences (PNAS),
doi: 10.1073/pnas.1323926111.
(PDF).
- N. Nguyen, S. Mirarab, B. Liu, M. Pop, and T. Warnow.
"TIPP: Taxonomic Identification and Phylogenetic Profiling."
Bioinformatics, 2014;
doi: 10.1093/bioinformatics/btu721.
(webpage)
- S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow.
"Statistical binning enables an accurate coalescent-based estimation of the avian tree".
Science, 12 December 2014: 1250463.
Science
- E. D. Jarvis, S. Mirarab, A. J. Aberer, B. Li, P. Houde, C. Li, S. Y. W. Ho, B. C. Faircloth, B. Nabholz, J. T. Howard, A. Suh, C. C. Weber, R. R. da Fonseca, J. Li, F. Zhang, H. Li, L. Zhou, N. Narula, L. Liu, G. Ganapathy, B. Boussau, Md. S. Bayzid, V. Zavidovych, S. Subramanian, T. Gabaldon, S. Capella-Gutierrez, J. Huerta-Cepas, B. Rekepalli, K. Munch, M. Schierup, B. Lindow,
W. C. Warren, D. Ray, R. E. Green, M. W. Bruford, X. Zhan, A. Dixon, S. Li, N. Li, Y. Huang, E. P. Derryberry, M. F. Bertelsen, F. H. Sheldon, R. T. Brumfield, C. V. Mello, P. V. Lovell, M. Wirthlin,
M. P. C. Schneider, F. Prosdocimi, J. A. Samaniego, A. M. V. Velazquez, A. Alfaro-Nunez,
P. F. Campos, B. Petersen, T. Sicheritz-Ponten, A. Pas, T. Bailey, P. Scofield, M. Bunce,
D. M. Lambert, Q. Zhou, P. Perelman, A. C. Driskell, B. Shapiro, Z. Xiong, Y. Zeng, S. Liu, Z. Li,
B. Liu, K. Wu, J. Xiao, X. Yinqi, Q. Zheng, Y. Zhang, H. Yang, J. Wang, L. Smeds, F. E. Rheindt,
M. Braun, J. Fjeldsa, L. Orlando, F. K. Barker, K. A. Jonsson, W. Johnson, K.-P. Koepfli,
S. O'Brien, D. Haussler, O. A. Ryder, C. Rahbek, E. Willerslev, G. R. Graves, T. C. Glenn,
J. McCormack, D. Burt, H. Ellegren, P. Alstrom, S. V. Edwards, A. Stamatakis, D. P. Mindell,
J. Cracraft, E. L. Braun, T. Warnow, W. Jun, M. T. P. Gilbert, and G. Zhang. "Whole-genome analyses resolve early branches in the tree of life of modern birds."
Science 12 December 2014: 1320-1331.
- N. Nguyen, S. Mirarab, K. Kumar, and T. Warnow. "Ultra-large alignments using phylogeny-aware profiles". Proceedings RECOMB 2015 and Genome Biology (2015) 16:124.
- M. S. Bayzid, S. Mirarab, B. Boussau, and T. Warnow.
"Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses", PLOS One, 2015, DOI: 10.1371/journal.pone.0129183.
(html)
(PDF)