I stand with the African-American community and all my friends and colleagues who are outraged by the killing of George Floyd and other African-American men and women.


Research Overview: I am a computer scientist, data scientist, and perhaps even a statistician. I work on algorithmic problems in computational biology with the aim of developing methods that biologists will use and that will have transformative accuracy and scalability. Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data). So implementation and testing are very important. All of my methods are a combination of graph algorithms and machine learning or statistical learning. My work in machine learning in particular involves the development of novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is largely unsupervised or semi-supervised learning, largely because there is very limited reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (on data, in other words) drives my research. My current work is on large-scale and complex estimation problems in phylogenomics (genome-scale phylogeny estimation), multiple sequence alignment, metagenomics, and historical linguistics. I am a big fan of Blue Waters, and have benefitted from several allocations. I also very much like collaborating with biologists, and have worked with the Avian Phylogenomics Project and the Thousand Plant Transcriptome project, among others.
I am seeking new graduate students: I have openings in my group for graduate students (PhD or MS) to work on developing computational methods for large-scale multiple sequence alignment, phylogeny estimation, metagenomics, and even historical linguistics. Strong programming skills, mathematical intuition, and interest in collaboration are necessary. If you are interested in working with me, you should take my graduate course CS 581: Algorithmic Genomic Biology, which I will teach in Fall 2020.

Postdoc positions at UIUC Computer Science. These are flexible postdoctoral positions that can be held with any faculty member in the CS department. If you want to teach, then these positions will be funded 50% by the department and 50% by the research faculty mentor. In exchange for departmental funding, these postdocs will teach 1 course per year, based on department needs and the candidate's interest; if the candidate wants to teach more, they will have the opportunity to do so.

Computational Phylogenetics: An introduction to designing methods for phylogeny estimation, published by Cambridge University Press (and available for purchase at Amazon and as an E-book at Google Play). Errata are posted as I find them. The image of the Monterey Cypress is there because of the NSF-funded CIPRES project, whose purpose was to develop the methods and computational infrastructure to improve large-scale phylogeny estimation. Why I wrote this book.

I dedicated the book to my PhD advisor Gene Lawler, who died in 1994; see this memoriam (published in the Journal of Computational Biology, 10 Jun 2009) that I co-authored with Dan Gusfield, David Shmoys, and Jan Karel Lenstra about Gene.

Bioinformatics and Phylogenetics: Seminal Contributions of Bernard Moret, published by Springer. This book is a Festschrift for Bernard Moret, who retired from EPFL in December 2016. The book contains a collection of self-contained chapters that can be used for an advanced course in computational biology and bioinformatics.

Current Funding:

Recent NSF funding has supported work in phylogenomics, described here. This is still an area of very active research in my group. I also recently benefited from the support of the John Simon Guggenheim Memorial Foundation, and from earlier support from the David and Lucile Packard Foundation, the Radcliffe Institute for Advanced Study at Harvard University, the Program for Evolutionary Dynamics at Harvard University, and Microsoft Research, New England. The Founder Professorship is funded through the Grainger Engineering Breakthroughs Initiative, which is supporting development of research in Big Data and Bioengineering at UIUC. I am grateful to the National Science Foundation for its continuous support since 1994. See this page for completed projects funded by NSF, starting in 2001.

"Plus de détails, plus de détails, disait-il à son fils, il n'y a d'originalité et de vérité que dans les détails..." ("More details, more details," he would say to his son, "there is originality and truth only in the details...") -- Stendhal, Lucien Leuwen (a quote much loved by my stepfather, Martin J. Klein, and an essential guide for all scholarship).

