Research Overview

My research combines mathematics, computer science, probability, and statistics to develop algorithms with improved accuracy for large-scale and complex estimation problems in phylogenomics, multiple sequence alignment, and metagenomics.

I work especially on the hardest computational problems in these areas, where large dataset sizes and model complexity leave existing approaches insufficiently accurate. For these problems, I develop innovative strategies (often including graph-theoretic algorithms that employ divide-and-conquer, combined with machine learning methods), develop software, analyze biological datasets (in collaboration with biologists around the world), and prove theorems about the methods we develop.

Recent and current NSF grants to support my work include:

I also work in historical linguistics, which seeks to estimate how language families (e.g., Indo-European) evolved. We use real data and perform massive simulations to evaluate the performance of the methods that we develop, and we collaborate closely with biologists and linguists on data analysis.


Our current collaborations include the 1KP (Thousand Transcriptome Project) and the Avian Phylogenomics Project. These collaborations include data analysis and the development of new methods for estimating alignments and trees (both gene trees and species trees). We welcome collaborations with biologists who have data that are difficult to analyze, either because the datasets are too large for current methods or because current methods are not sufficiently accurate.

Computational analyses

A great deal of my work involves exploration of the design space of the algorithms we develop, which in turn depends very much on the availability of substantial computational resources. At the University of Illinois, I have been able to do these analyses using the Illinois Campus Cluster Program as well as Blue Waters.

For more about my research:

Please see recent talks, list of publications, or news articles.