Tandy Warnow
Associate Director, Siebel School of Computing and Data Science
Grainger Distinguished Chair in Engineering
Member, Carl R. Woese Institute for Genomic Biology
Affiliate, National Center for Supercomputing Applications
Affiliate, Coordinated Sciences Laboratory
Affiliate, Unit for Criticism and Interpretive Theory
Affiliate, departments of Electrical and Computer Engineering; Mathematics; Statistics; Evolution, Ecology, and Behavior; Entomology; and Plant Biology
Member (Treasurer), Executive Committee and Board of Directors, International Society for Computational Biology (ISCB), 2025-2027
Fellow of the ISCB (International Society for Computational Biology), 2017
Graduate students: My group is full and I am not looking for new students. If you are interested in working in computational biology, please contact the other faculty in our Bioinformatics and Computational Biology group (see this page). If you are interested in network science, please check with my collaborator George Chacko; we work with several students.
Undergraduate students: I have had wonderful experiences with several outstanding undergraduate students, several of whom have gone on to do PhDs in CS at MIT, UCLA, and other strong institutions. If you want to work with me, you should first take a course with me. For students who want to work in bioinformatics, that would mean my graduate course, CS 581, which I offer once a year (typically in the Spring semester). On occasion I will consider advanced students (junior or senior year) who have not taken a course with me, but this generally happens only when they have prior work on a topic that I am currently working on. If you are an undergraduate student and would like to work with me, please send me an email with your transcript and CV, and indicate which papers of mine you have read and why they interest you. Note that the first semester will be a for-credit, unpaid CS 397 course; the grade for the 397 will be based on what you achieve and how it is written up in a final report. The grade is based in part on writing quality (30%) as well as content (see this page for some advice about writing). Also see No, You Don't Get an A for Effort. Finally, please note that I expect all students who work with me to have strong programming skills in several languages, and to have completed CS 222, 225, 361, and 374, with grades not lower than A-.
Research in Bioinformatics
My main research is focused on algorithmic problems in computational biology, with the aim of developing methods that biologists will use and that will have transformative accuracy and scalability. Some of this work is summarized in Philosophical Transactions of the Royal Society B, but see also the extended version in a preprint, "Recent Progress on Methods for Estimating and Updating Large Phylogenies". Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data), so implementation and testing are very important. All of my methods are a combination of graph algorithms and machine learning or statistical learning. My work in machine learning in particular involves the development of novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is largely unsupervised or semi-supervised, because there is very limited reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (on data, in other words) drives my research. My current work is on large-scale and complex estimation problems in phylogenomics (genome-scale phylogeny estimation), multiple sequence alignment, and metagenomics. I very much like collaborating with biologists, and have worked with the Avian Phylogenomics Project and the Thousand Plant Transcriptome project, among others. Finally, I will hold a workshop on large-scale phylogenetics and multiple sequence alignment at the NSF-funded Institute for Mathematical and Statistical Innovation, August 11-14, 2025; see this page.
Research in Network Science
I have a new interest in Network Science, and I collaborate with George Chacko. Our work focuses on two main directions: (1) understanding the organization of scientific communities, especially emerging trends in biomedical research, and (2) developing novel clustering methods that enable discovery from citation networks. Among the highlights of this collaboration are two papers we published in Quantitative Science Studies (part of MIT Press): (1) Bradley et al., in which we identify model misspecification as a problem for a prior publication in Science, and (2) Wedell et al., in which we propose a new model and method for community detection based on center-periphery structures and apply it to a citation graph for the field of extracellular vesicles. We also have a new paper (accepted to PLOS Complex Systems, 2024) about the failure of clustering methods to produce well-connected clusters; see Park et al., Well-Connected Communities in Real-World Networks. A preprint is available on arXiv (HTML).
Research in Historical Linguistics
Just as species evolve, so do languages, and the inference of the
evolutionary histories of different languages is of great interest to me.
Some of my early work in this area was done in collaboration with
Don Ringe (University of Pennsylvania), Steve Evans (Berkeley), and Luay
Nakhleh (Rice University). See our webpage on
historical linguistics.
Computational Phylogenetics:
An introduction to designing methods for phylogeny
estimation, published by Cambridge University Press
(and available for purchase at Amazon and
as an E-book at Google Play).
Errata are posted as I find them.
The image of the Monterey Cypress is there because of
the NSF-funded CIPRES project,
whose purpose was to develop the methods and computational
infrastructure to improve large-scale phylogeny estimation.
Why I wrote this book.
I dedicated the book to
my PhD advisor
Gene Lawler,
who died in 1994; see
this in memoriam
(published in the Journal of Computational Biology,
10 Jun 2009)
that I co-authored
with Dan Gusfield, David Shmoys, and Jan Karel Lenstra
about Gene.
Bioinformatics and Phylogenetics:
Seminal Contributions of Bernard Moret, published by
Springer.
This book is a
Festschrift for Bernard Moret,
who retired from EPFL in December 2016.
The book contains a collection of self-contained chapters
that can be used for an advanced course in
computational biology and bioinformatics.
Current Funding:
"Plus de détails, plus de détails, disait-il à son fils, il n'y a d'originalité et de vérité que dans les détails..." -- Stendhal, Lucien Leuwen (a quote much loved by my stepfather, Martin J. Klein, and an essential guide for all scholarship).