Research Overview: My research combines mathematics, computer science, probability, and statistics to develop more accurate algorithms for large-scale, complex estimation problems in phylogenomics (genome-scale phylogeny estimation), multiple sequence alignment, metagenomics, and historical linguistics.

I focus on the hardest computational problems in these areas, where large datasets and complex models leave existing approaches without sufficient accuracy. For these problems, I develop new strategies (often graph-theoretic divide-and-conquer algorithms combined with powerful statistical methods) and prove theorems about the methods we develop. In historical linguistics, our work seeks to estimate how language families (e.g., Indo-European) evolved. We evaluate the methods we develop on real data and through massive simulations, and we collaborate closely with biologists and linguists on data analysis.

Our current collaborations include the 1KP (Thousand Transcriptome Project) and the Avian Phylogenomics Project. This work involves both data analysis and the development of new methods for estimating alignments and trees (both gene trees and species trees). We welcome collaborations with biologists whose data are difficult to analyze, either because the datasets are too large for current methods or because current methods are not sufficiently accurate.

For examples of my work, please see my recent talks.