Boosting BAli-Phy: using PASTA and UPP to enable BAli-Phy to analyze large and ultra-large datasets

Mike Nute
Department of Statistics
The University of Illinois at Urbana-Champaign

Multiple sequence alignment is an important initial step in many bioinformatics analyses, including metagenomics and phylogeny estimation. When sequences have evolved down a tree, the most accurate alignments are frequently produced by Bayesian methods that co-estimate the tree and the alignment, but these methods tend to be slow and become impractical on datasets with more than 200 sequences. We show how one such method, BAli-Phy, can be used within a divide-and-conquer strategy built on PASTA and UPP to extend its accuracy to datasets of up to 10,000 sequences. We also present results on simulated data showing improved accuracy not only in the estimated alignment but also in the estimated phylogeny.