Co-Chief Scientist, C3.ai Digital Transformation Institute
Associate Head, Department of Computer Science
Special Advisor to the Head of Computer Science
Grainger Distinguished Chair in Engineering
Member, Bioinformatics and Computational Biology Group
Member, Carl R. Woese Institute for Genomic Biology
Affiliate, National Center for Supercomputing Applications
Affiliate, Coordinated Science Laboratory
Affiliate, Unit for Criticism and Interpretive Theory
Affiliate, Departments of Electrical and Computer Engineering; Bioengineering; Mathematics; Statistics; Evolution, Ecology, and Behavior; Entomology; and Plant Biology.
Fellow of the ISCB (International Society for Computational Biology), 2017
Fellow of the ACM (Association for Computing Machinery), 2015
Fellow of the AAAS (American Association for the Advancement of Science), 2021
PhD (Mathematics) University of California at Berkeley, 1991
B.S. (Mathematics) University of California at Berkeley, 1984
Google Scholar page
Statement of support for Iranian women
I feel for the brave women of Iran who
are endangered. I am deeply upset by the
deaths that are reported.
I know I am not alone in this.
Statement of support for Black Lives Matter
I stand with the African-American community and
all my friends and colleagues who are outraged
by the killing of George Floyd and other African-American
men and women.
- CS 597 offerings:
I am eager to supervise CS 597 (Independent Study) courses with current PhD or MS students
at UIUC who are looking for potential thesis topics.
The research I can supervise would be in
one of these areas:
(1) clustering and community detection (joint with Professor George Chacko),
(2) computational historical linguistics (joint with
linguists I work with), or
(3) computational biology.
Please email me directly
if this interests you.
I contributed to a paper about
1 Billion Years of Green Plant Evolution,
by the One Thousand Plant Transcriptomes Initiative.
The CS department has a Future Faculty Fellows program for postdocs.
We are still accepting applications --
please contact me if you want more information.
Teaching Fall 2022
CS 581: Algorithmic Genomic Biology, Fall 2022.
This is a course designed for graduate students potentially interested in working in computational biology, but it does not assume any knowledge of biology.
See this page for a partial list of papers that resulted from course projects
(with links to some of them).
This course will be held TuTh 11AM-12:15PM, over Zoom.
CS 598: Computational Scientometrics (CS 598CGG).
I will co-teach this course with George Chacko.
This course will focus on reading the literature in both
scientometrics and its data-mining aspects,
and will feature talks from major contributors to the field.
Please get in touch with George or me for more information.
The course will be held MW 9:30-10:45 AM, over Zoom.
CS 591: BIO.
I will co-teach this with Mohammed El-Kebir; the course will
consist primarily of PhD students in the BCB (Bioinformatics and
Computational Biology) group presenting their work.
The course will be hybrid, held in person 4-4:50 PM Thursdays in 1304 Siebel Center,
with a Zoom link (provided to registered participants).
Research in Bioinformatics
My main research focuses on
algorithmic problems in computational biology, with the aim of developing methods that biologists will use and that offer transformative accuracy and scalability.
Some of this work is summarized in
a preprint, "Recent Progress on Methods for Estimating and Updating Large Phylogenies".
Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data). So implementation and testing are very important. All of my methods combine graph algorithms with machine learning or statistical learning. My work in machine learning in particular involves the development of novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is mostly unsupervised or semi-supervised, because there is very limited reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (that is, performance on data) drives my research.
My current work is on
large-scale and complex estimation problems in
phylogenomics (genome-scale phylogeny estimation),
multiple sequence alignment,
I very much enjoy collaborating with biologists, and have
worked with the Avian Phylogenomics Project and the
One Thousand Plant Transcriptomes Initiative, among others.
Research in Scientometrics
I have a new interest in scientometrics.
Our work focuses on two main directions:
(1) understanding the organization of scientific communities, especially emerging trends in biomedical research, and
(2) developing novel clustering methods that enable discovery from citation networks.
Among the highlights of this collaboration with George Chacko are two papers
we published in
Quantitative Science Studies (MIT Press):
Bradley et al.,
in which we identify
model misspecification as a problem for a prior study
published in Science, and
Wedell et al.,
in which we propose a new model and method for community detection based
on center-periphery structures
and apply it to a citation graph for the field of
Research in Historical Linguistics
Just as species evolve, so do languages, and the inference of the
evolutionary histories of different languages is of great interest to me.
Some of my early work in this area was done in collaboration with
Don Ringe (University of Pennsylvania), Steve Evans (Berkeley), and Luay
Nakhleh (Rice University). See our webpage at
Interested in working with me?
If you are interested in working with me as a graduate student,
please read this first, since it will give
you context for what I work on.
I have a full lab (6 current PhD students) right now, and so I am
only taking new students where there is a clearly strong fit
between the student's interest and mine.
To work with me, you should take one of the classes I am teaching this
semester (Fall 2022; see above); these classes involve course projects,
which will give you a chance to do research on a problem that could
become a thesis topic.
If there is a good outcome (and a good fit with my group), then
in Spring 2023 we can do a CS 597 (Independent Study)
and discuss potential advising.
If you are interested in working with me as a postdoc,
please read this first and then contact me.
If you are an undergraduate, please see this page,
and then contact me.
Future Faculty Fellows at UIUC Computer Science.
These are flexible postdoctoral positions that can be held with any faculty member
in the CS department. If you want to teach, then
these positions will be funded 50% by the
department and 50% by the research faculty mentor.
In exchange for departmental funding, these postdocs will teach 1 course per year, based on department needs and the candidate's interest; if the candidate wants to teach more, they will have the opportunity to do so.
An introduction to designing methods for phylogeny
estimation, published by Cambridge University Press
(and available for purchase at Amazon and
as an E-book at Google Play).
Errata are posted as I find them.
The image of the Monterey Cypress is there because of
the NSF-funded CIPRES project,
whose purpose was to develop the methods and computational
infrastructure to improve large-scale phylogeny estimation.
Why I wrote this book.
I dedicated the book to
my PhD advisor
who died in 1994; see
(published in the Journal of Computational Biology,
10 Jun 2009)
that I co-authored
with Dan Gusfield, David Shmoys, and Jan Karel Lenstra
Bioinformatics and Phylogenetics:
Seminal Contributions of Bernard Moret, published by
This book is a
Festschrift for Bernard Moret,
who retired from EPFL in December 2016.
The book contains a collection of self-contained chapters
that can be used for an advanced course in
computational biology and bioinformatics.
Recent NSF funding has supported
work in metagenomics,
and graph-theoretic algorithms.
All of these are still very active research areas in my group.
I also recently benefited from support
from the John Simon Guggenheim Memorial Foundation, and
earlier support from the
David and Lucile Packard Foundation, the
Institute for Advanced Study at Harvard University,
for Evolutionary Dynamics at Harvard University,
and Microsoft Research New England.
The Grainger Distinguished Chair in Engineering is
funded through the
Grainger Engineering Breakthroughs Initiative, which is supporting development of
research in Big Data and Bioengineering at UIUC.
I am grateful to the
National Science Foundation for its
continuous support since 1994.
for completed projects funded by NSF, starting in 2001.
"Plus de détails, plus de détails, disait-il à son
n'y a d'originalité et de vérité que dans les
("More details, more details," he said to his
there is originality and truth only in the)
-- Stendhal, Lucien Leuwen (a quote much loved by my stepfather,
J. Klein, and an essential guide for all scholarship).