General information for graduate students and postdocs interested in joining my lab
This page is for potential graduate students and postdocs interested in working with me.
If you are an undergraduate, please apply through the CS Department
Summer Research Program.
Update for postdoc applicants
I have very limited funding for a postdoc at this time. If you have your own
funding and have already worked on phylogenetics or computational
scientometrics, feel free to contact me.
Update for new graduate students, December 19, 2022
My current group has 6 PhD students, which is large
enough for me, and
I am unlikely to consider a new student unless there is
clearly a strong overlap in the student's prior research
and my research
interests (i.e., phylogenetics, multiple sequence alignment,
community detection, and historical linguistics).
Nevertheless, I am very glad to do CS 597 (independent study)
with new graduate students, and if the
research goes well I can consider taking you on as a dissertation student.
Overview of my research
I work on algorithmic problems (mostly in computational biology, but also
in scientometrics and historical linguistics) with the aim of developing methods that biologists will use and that will have transformative accuracy and scalability. Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data). As a result, implementation and testing are very important.
All of my methods use a combination of graph algorithms and
machine learning or statistical learning.
My work
in machine learning in particular involves the development
of novel ensemble methods, using phylogenetic estimation to
guide the
design of the ensemble.
The machine learning I do is mostly
unsupervised or semi-supervised,
because there is very limited reliable
labeled data in my field;
as a result, I do not work in
deep learning.
Mathematical proofs are part of what I do, but
my focus on empirical performance (on data, in other words)
drives my research.
If your interests are clearly in deep learning and other
types of supervised learning, or if you don't like
implementing methods, you should
probably find
another advisor. Otherwise,
please read on.
My lab prepares students for academic careers as faculty members,
so if you clearly are more interested in an industry job, this is
probably not the right lab for you.
If you are interested in working with me, please read the
entire page here, and then write back.
I am looking for whether there is a good fit between
your interests and skills, and what I am working on.
Therefore,
it's best if you have read at least one of my papers in
some depth and can tell me what interests you
about that paper. At the bottom of this page,
I have a list of papers
that are closely related to my current
research, and your comments on one or more
of those (based on a careful reading) will
be helpful to me.
Skills required for students
The primary objective of my research is to
produce new algorithms and
software that can dramatically improve
phylogenetic analysis (whether in linguistics or in biology),
as tested in simulation or on real data.
Theoretical research
is often done at the same time, using probability theory to
predict performance under Markov models of evolution, but
then testing these predictions in simulation.
If you are a student who loves to design
algorithms, likes the challenge of developing
good heuristics for
NP-hard optimization problems, loves to program, and enjoys
collaborations (especially with scientists!),
you may find this research area fun and
rewarding.
Absolutely no background in biology or linguistics is required.
Research in my lab requires
strong skills in
algorithm design and analysis and
software development.
In addition,
excellent interpersonal skills,
oral and written communication skills,
and a passion for research are also necessary.
Overall, the required technical skills or coursework can be
described by:
- Strong programming skills in several
programming languages, such as Java, C/C++, Python, Perl
and R, and ability to learn others
(essential)
- Upper division course in algorithm design
and analysis (essential)
- Upper division courses in graph theory,
statistics, and probability theory
(necessary, but can be obtained after joining)
The first year in the lab
The CS department has instituted a new approach to
PhD graduate student development, where the first year
is spent participating in potentially
more than one lab.
This allows students to experience different research
groups before settling into their "research group" home.
I think this is a great approach, since students often change
their minds about
what they want to work on in the first year!
The first semester in my group can be considered something
like a
rotation, and is generally
spent on a specific research project in
collaboration with other students, so that
you can find out about
the research area.
If you are interested in working with me in computational
biology or historical linguistics, then
you will take
CS 581 in this semester, which I teach every year.
(Check with me about what to take if you want to work on scientometrics.)
CS 581 requires a final project, and for your course project you should do
a research project on a topic that might be suitable for a paper.
I will work with you on that.
In the second semester, if you wish to continue working with me,
you should sign up for CS 597, and do research with me in that semester.
That research will likely involve substantial implementation and evaluation, and
may involve other students as well, but the goal will be to produce a paper that can
be submitted to a strong journal or conference by the end of the semester.
You are encouraged to obtain the textbook for CS 581,
Computational Phylogenetics:
An Introduction to Designing Methods for Phylogeny
Estimation, published by Cambridge University Press.
After the second semester,
you and I can discuss
what you could do if you want
to do a PhD in my lab.
Please feel free to talk with my current or former students
about working with me; the list of students is
available here.
To find out more about my computational biology work
Please see my
textbook on computational phylogenetics, which is what I use
for CS 581.
The following are representative of the
kinds of work I am doing
(the parenthetical numbers refer to the
numbers in my online publication list).
I've also selected one or two papers in each
category that I recommend you
read before talking with me, and
put those in boldface.
The course notes for my Fall 2021 CS 581 course in Algorithmic Genomic Biology.
The first chapter of my textbook, available
here, provides an overview of the type of work that I do in
phylogenetics.
Some of my recent papers
(the parenthetical numbers refer to the
number in my online publication list, and
boldfaced items are the ones I recommend in particular):
- Multiple Sequence Alignment: PASTA (116 and 122), UPP (126), and
MAGUS (183)
- Ensembles of HMMs: SEPP (106), TIPP (121), UPP (126), and
HIPPI (141)
- Phylogenomics: ASTRAL (117 and 128), ASTRID (133),
FastMulRFS (177), FASTRAL (187), and DISCO (196)
- Supertree methods: FastRFS (144) and SuperFine (105); see also
a recent unpublished survey paper
- Large-scale tree estimation using divide-and-conquer: DACTAL (112),
INC (159, 166, 167, 178), NJMerge (162, 169),
TreeMerge (168), GTM (175), and (192), which applies GTM in particular
to maximum likelihood tree estimation; see
also Chapter 11 in my textbook (which has
divide-and-conquer methods that use chordal graph theory)
- Clustering and community detection in scientometrics
(HTML)
(see in particular the paper by Wedell et al.).
You may also want to read some of the
following survey papers for this research area:
- Large-scale multiple
sequence alignment and phylogeny estimation,
T. Warnow, 2013, in Models and Algorithms for Genome Evolution, Springer Computational Biology Series, C. Chauve, N. El-Mabrouk, and E. Tannier, Editors.
- Disk Covering Methods: improving the
accuracy and speed of large-scale phylogenetic
analyses, by T. Warnow (appeared as
"Large-scale phylogenetic reconstruction,"
in S. Aluru (editor), Handbook of Computational Biology, Chapman
& Hall, CRC Computer and Information Science Series, 2005).
Next steps
Students applying for admission to a UIUC graduate program:
As I have indicated above, my group is already fairly large and may become larger
(depending on who joins my group this year). Hence, if you are not
already admitted to UIUC, it is unlikely that I will be able to consider you
for a position. However, if you already have publications in my research area
or have a very strong background and keen interest in working with me, then
do get in touch.
Please note, however,
that admission to the CS or other graduate programs at UIUC is done by
committees - not by individual faculty, and hence not by me!
Please also note that I am permitted to supervise graduate
students in many programs, but have a preference for
those students with appropriate backgrounds in CS, ECE, Statistics,
or Mathematics.
If you are applying to a PhD program in some other department,
I will be glad to talk with you about being a member of your dissertation committee,
but am unlikely to be your thesis advisor.
Current UIUC Graduate students:
Graduate students (whether MS or PhD) who are
already enrolled at UIUC in CS, ECE, Statistics, or Mathematics
are encouraged
to contact me about thesis research possibilities.
If you are enrolled in another program (e.g., Bioengineering
or some biology program), I can consider you for a project
where I am a member of your thesis committee, but
you will need to have someone else
be your main supervisor.
Prospective postdocs:
If you are interested in joining the lab as a postdoc,
you should have a PhD in Computer Science, Mathematics, or
Statistics, and
you should have
already published several peer-reviewed
papers in algorithms for phylogenetics, computational
historical linguistics, or scientometrics.
You will need to provide three letters of
reference from faculty members in computer science,
mathematics, or statistics doing research
in phylogenetics.
Please write to me first to see if I am able to support you for a postdoc.
If I am, then you should submit your application
through the Future Faculty Fellows
submission website.
Information to provide if you are interested in working with me (and are a current
graduate student or applying for a postdoc)
- Say which research area(s) interest you (i.e., computational
biology, historical linguistics, or scientometrics) and why
- List the courses you have taken from the required
coursework for that area (and note that if you have not taken the
required courses, I will ask you to complete them before we continue the discussion)
- Let me know which of my papers you've read and what you thought about them
(it's good if you ask questions)
- List the programming languages you are strong in
- Provide information about your most significant research experience to date (if any)
- Provide your CV (including a list of publications)
and transcript (if a graduate student)