Novel Methodologies for Genome-scale Evolutionary Analysis of Multilocus Data

PI: Tandy Warnow

Funding: U.S. National Science Foundation grant DBI-1461364.

Project Overview: The Novel Methodologies for Genome-scale Evolutionary Analysis of Multilocus Data project is a joint effort among groups at Stanford University, UT Austin, Rice University, and Linfield College. The project aims to (1) devise new algorithms for species tree inference; (2) develop new methods for scaling inference algorithms to large-scale genomic data; and (3) perform mathematical, simulation-based, and empirical evaluations of the properties of species tree inference algorithms.

Highlights: The major effort for the Warnow Lab has been the development of computationally efficient methods for species tree estimation in the presence of gene tree discord due to incomplete lineage sorting. The highlights of this work include

Project Software: The Warnow Lab has also produced other software, focusing on supertree estimation, multiple sequence alignment, metagenomic data analysis, and other topics. See this page for a nearly complete list of software produced by the lab. Below, we provide links to software for estimating species trees from heterogeneous sets of gene trees, where heterogeneity is due to incomplete lineage sorting (ILS).

Summer Symposia and Software Schools: The grant supports summer symposia and software schools (some held locally at the University of Illinois at Urbana-Champaign) that train researchers, from students through faculty, in new multiple sequence alignment methods, species tree estimation from sets of gene trees or their sequence alignments, and other topics within phylogenomics.

Publications: See my online publication list for all papers (many of which can be downloaded). Specific papers related to phylogenomic estimation (most of which are supported by this grant) are given below:

  1. Yu, Y., T. Warnow, and L. Nakhleh. "Algorithms for MDC-based Multi-locus Phylogeny Inference." Proceedings of RECOMB 2011 (PDF). The full paper has additional results: "Algorithms for MDC-Based Multi-Locus Phylogeny Inference: Beyond Rooted Binary Gene Trees on Single Alleles," J. Computational Biology November 2011, Vol. 18, No. 11, pp 1543-1559.
  2. Yang, J. and T. Warnow "Fast and accurate methods for phylogenomic analyses." RECOMB Comparative Genomics 2011, and BMC Bioinformatics 12(Suppl 9): S4 (5 October 2011).
  3. Swenson, M.S., R. Suri, C.R. Linder, and T. Warnow. "SuperFine: fast and accurate supertree estimation." Systematic Biology (2012) 61(2):214-227.
  4. Neves, D. T., T. Warnow, J. L. Sobral and K. Pingali. "Parallelizing SuperFine." 27th Symposium on Applied Computing (ACM-SAC), Bioinformatics, 2012, pages 1361-1367, doi: 10.1145/2231936.2231992.
  5. Bayzid, Md. S. and T. Warnow. "Estimating Optimal Species Trees from Incomplete Gene Trees under Deep Coalescence." Journal of Computational Biology, June 2012, Vol. 19, No. 6: 591-605, special issue for Simon Tavare and Michael Waterman. (HTML).
  6. Nguyen, N., S. Mirarab, and T. Warnow. "MRL and SuperFine+MRL: new supertree methods." Algorithms for Molecular Biology 7:3, 2012.
  7. Nelesen, S., K. Liu, L.-S. Wang, C. R. Linder, and T. Warnow. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics Vol 28, ISMB 2012, pages i274-i282.
  8. M.S. Bayzid, S. Mirarab, and T. Warnow. "Inferring optimal species trees under gene duplication and loss." Pacific Symposium on Biocomputing, 18:250-261 (2013). (PDF).
  9. M.S. Bayzid and T. Warnow. "Naive binning improves phylogenomic analyses". Bioinformatics, supplementary materials.
  10. T. Warnow. "Large-scale multiple sequence alignment and phylogeny estimation." Chapter 6 in "Models and Algorithms for Genome Evolution", edited by Cedric Chauve, Nadia El-Mabrouk and Eric Tannier, Springer series on "Computational Biology". For a preprint (not in final form) of this chapter, see this PDF.
  11. S. Mirarab, R. Reaz, Md. S. Bayzid, T. Zimmermann, M.S. Swenson, and T. Warnow. "ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation." Proceedings, ECCB (European Conference on Computational Biology), 2014. Also, Bioinformatics 2014 30 (17): i541-i548. doi: 10.1093/bioinformatics/btu462. (PDF)
  12. Md S. Bayzid, T. Hunt, and T. Warnow. "Disk Covering Methods Improve Phylogenomic Analyses". Proceedings of RECOMB-CG (Comparative Genomics), 2014, and BMC Genomics 2014, 15(Suppl 6): S7. (PDF) and Supplementary materials
  13. T. Zimmermann, S. Mirarab and T. Warnow. "BBCA: Improving the scalability of *BEAST using random binning". Proceedings of RECOMB-CG (Comparative Genomics), 2014, and BMC Genomics 2014, 15(Suppl 6): S11. (PDF) and Supplementary materials.
  14. S. Mirarab, Md S. Bayzid, and T. Warnow. "Evaluating summary methods for multi-locus species tree estimation in the presence of incomplete lineage sorting". Systematic Biology, doi: 10.1093/sysbio/syu063. (PDF)
  15. N. Wickett, S. Mirarab, N. Nguyen, T. Warnow, et al. (37 authors). "Phylotranscriptomic analysis of the origin and diversification of land plants." Proceedings of the National Academy of Sciences (PNAS), doi: 10.1073/pnas.1323926111. (PDF).
  16. N. Nguyen, S. Mirarab, B. Liu, M. Pop, and T. Warnow. "TIPP: Taxonomic Identification and Phylogenetic Profiling." Bioinformatics, 2014; doi: 10.1093/bioinformatics/btu721. (webpage)
  17. S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow. "Statistical binning enables an accurate coalescent-based estimation of the avian tree". Science, 12 December 2014: 1250463.
  18. E. D. Jarvis, S. Mirarab, A. J. Aberer, B. Li, P. Houde, C. Li, S. Y. W. Ho, B. C. Faircloth, B. Nabholz, J. T. Howard, A. Suh, C. C. Weber, R. R. da Fonseca, J. Li, F. Zhang, H. Li, L. Zhou, N. Narula, L. Liu, G. Ganapathy, B. Boussau, Md. S. Bayzid, V. Zavidovych, S. Subramanian, T. Gabaldon, S. Capella-Gutierrez, J. Huerta-Cepas, B. Rekepalli, K. Munch, M. Schierup, B. Lindow, W. C. Warren, D. Ray, R. E. Green, M. W. Bruford, X. Zhan, A. Dixon, S. Li, N. Li, Y. Huang, E. P. Derryberry, M. F. Bertelsen, F. H. Sheldon, R. T. Brumfield, C. V. Mello, P. V. Lovell, M. Wirthlin, M. P. C. Schneider, F. Prosdocimi, J. A. Samaniego, A. M. V. Velazquez, A. Alfaro-Nunez, P. F. Campos, B. Petersen, T. Sicheritz-Ponten, A. Pas, T. Bailey, P. Scofield, M. Bunce, D. M. Lambert, Q. Zhou, P. Perelman, A. C. Driskell, B. Shapiro, Z. Xiong, Y. Zeng, S. Liu, Z. Li, B. Liu, K. Wu, J. Xiao, X. Yinqi, Q. Zheng, Y. Zhang, H. Yang, J. Wang, L. Smeds, F. E. Rheindt, M. Braun, J. Fjeldsa, L. Orlando, F. K. Barker, K. A. Jonsson, W. Johnson, K.-P. Koepfli, S. O'Brien, D. Haussler, O. A. Ryder, C. Rahbek, E. Willerslev, G. R. Graves, T. C. Glenn, J. McCormack, D. Burt, H. Ellegren, P. Alstrom, S. V. Edwards, A. Stamatakis, D. P. Mindell, J. Cracraft, E. L. Braun, T. Warnow, W. Jun, M. T. P. Gilbert, and G. Zhang. "Whole-genome analyses resolve early branches in the tree of life of modern birds." Science 12 December 2014: 1320-1331.
  19. N. Nguyen, S. Mirarab, K. Kumar, and T. Warnow, "Ultra-large alignments using phylogeny aware profiles". Proceedings RECOMB 2015 and Genome Biology (2015) 16:124
  20. M. S. Bayzid, S. Mirarab, B. Boussau, and T. Warnow. "Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses", PLOS One, 2015, DOI: 10.1371/journal.pone.0129183. (html) (PDF)
  21. S. Mirarab and T. Warnow. "ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes", Proceedings ISMB 2015, and Bioinformatics 2015 31 (12): i44-i52 doi: 10.1093/bioinformatics/btv234 (PDF)
  22. S. Roch and T. Warnow. "On the robustness to gene tree estimation error (or lack thereof) of coalescent-based species tree methods", Systematic Biology, 64(4):663-676, 2015, (PDF)
  23. T. Warnow. "Concatenation analyses in the presence of incomplete lineage sorting", PLOS Currents: Tree of Life 2015 May 22. Edition 1. doi: 10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7 (HTML)
  24. R. Davidson, P. Vachaspati, S. Mirarab, and T. Warnow. "Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer." RECOMB-Comparative Genomics, and BMC Genomics, 2015, 16 (Suppl 10): S1. Preliminary version at (PDF).
  25. J. Chou, A. Gupta, S. Yaduvanshi, R. Davidson, M. Nute, S. Mirarab and T. Warnow. "A comparative study of SVDquartets and other coalescent-based species tree estimation methods." RECOMB-Comparative Genomics and BMC Genomics, 2015, 16 (Suppl 10): S2.
  26. P. Vachaspati and T. Warnow. "ASTRID: Accurate Species TRees from Internode Distances." RECOMB-Comparative Genomics and BMC Genomics, 2015, 16 (Suppl 10): S3.
  27. S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow. "Response to Comment on 'Statistical binning enables an accurate coalescent-based estimation of the avian tree'." Science, 2015, volume 350, number 6257, p. 171, DOI: 10.1126/science.aaa7719.
  28. J.E. Tarver, M. d. Reis, S. Mirarab, R. J. Moran, S. Parker, J.E. O'Reilly, B.L. King, M.J. O'Connell, R.J. Asher, T. Warnow, K. J. Peterson, P.C.J. Donoghue, and D. Pisani. "The interrelationships of placental mammals and the limits of phylogenetic inference." Genome Biology and Evolution, doi:10.1093/gbe/evv261.
  29. L. Uricchio, T. Warnow, and N. Rosenberg (2016). "An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees." BMC Bioinformatics, 17 (Suppl 14): 1266, special issue for RECOMB-CG. (HTML)
  30. P. Vachaspati and T. Warnow (2016). "FastRFS: Fast and Accurate Robinson-Foulds Supertrees using Constrained Exact Optimization". Bioinformatics 2016; doi: 10.1093/bioinformatics/btw600. (Special issue for papers from RECOMB-CG) (PDF)
  31. N.-P. Nguyen, M. Nute, S. Mirarab, and T. Warnow (2016). "HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs." BMC Genomics 17 (Suppl 10):765, special issue for RECOMB-CG. (HTML) (supplement)

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.