Research Experiences for Undergrads in the Warnow Lab
Research Overview
My research is in three areas: computational biology,
scientometrics, and historical linguistics.
I enjoy working with undergraduates, but I require
that students have completed a graduate course in
the relevant topic (CS 581: Algorithmic Genomic Biology or CS 598: Computational Scientometrics).
Research in my lab for these areas involves algorithm design
and implementation, so students who are strong programmers and
meticulous experimentalists are welcome.
However, each area has required coursework, indicated below, that you
must complete before working with me.
If you have not yet completed it, please understand
that I will ask you to do so before
we can begin working together.
You will also need to provide the information requested in the "How to apply" section, listed below.
- Scientometrics.
I work with CS Professor George Chacko in method development
for scientometrics.
Required coursework: CS 598 (Fall 2022) with George Chacko (I am co-teaching).
Also:
CS 361 and 374,
STAT 400 and 410, and at least one of CS 412, 446, or 498 (Applied Machine
Learning).
- Computational biology research.
My research combines
mathematics, computer science,
probability, and
statistics, in order to develop
algorithms with improved accuracy for
large-scale and complex estimation problems in
phylogenomics (genome-scale phylogeny estimation),
multiple sequence alignment, and
metagenomics.
Required coursework: CS 581.
Interested in applying?
CS/ECE/Stats/Math students:
Note that you do not need to know any biology to do this research!
To succeed in this research you should have very strong
programming skills (especially in Python),
be interested in challenging yourself, be good
at working both with others and independently, and
have strong communication skills (both oral and written).
Biology students:
I am also interested in working with biology undergraduate students,
as long as you have practical experience in phylogeny estimation
or multiple sequence alignment, and your research interests involve
doing analyses of this type.
Let me know what you would like to do research on, and
how it fits into my research.
Papers to read:
Before you apply to work with me, please
first read a few of my recent papers.
The following are representative of the
kinds of work I am doing
(the parenthetical numbers refer to
entries in my online publication list):
- Multiple Sequence Alignment: PASTA (116 and 122), UPP (126), and MAGUS (183)
- Ensembles of HMMs: SEPP (106), TIPP (121), UPP (126), and HIPPI (141)
- Phylogenomics: ASTRAL (117 and 128), ASTRID (133),
FastMulRFS (177),
and DISCO (196)
- Supertree methods: FastRFS (144) and SuperFine (105); see also
a recent unpublished
survey paper
- Large-scale tree estimation using divide-and-conquer: DACTAL (112),
INC (159,166,167), NJMerge (162,169), TreeMerge (168), GTM (175),
and two survey papers (192,201)
- Community finding in networks (applied to scientometrics): Center-Periphery Communities (203)
Possible research projects:
I am open to many different possible research projects, but the
most likely ones to succeed would be ones where you would work with one
of my current PhD students.
However, if you have something specific in mind, please let me know what you
would like to do.
Here are some types of research projects that I would be glad to support:
- Designing new methods and testing existing
methods for evolutionary tree estimation
on large datasets using simulations
- Re-analyzing some important phylogenomic datasets using
new computational methods
- Comparing divide-and-conquer strategies on ultra-large
datasets, and designing new divide-and-conquer strategies
- Implementing a GUI for phylogenomic software
Here are two examples of publications by undergraduate students who worked with
me:
- Kodi Collins (now Kodi Taraszka) worked on multiple sequence alignment with me, and published
PASTA for Proteins
in Bioinformatics 2018.
Kodi is now a PhD student in Computer Science at UCLA.
- Thien Le worked on gene tree estimation with me, and published
Using INC within Divide-and-Conquer
in the Proceedings of Algorithms for Computational Biology, 2019.
Thien is now a PhD student in EECS at MIT.
Doing a research project with me involves
a substantial commitment.
Research students have
individual meetings (at least weekly, but
more often when you are implementing and
testing methods, or writing up results for publication)
with me and one or more of
my graduate students.
It will also involve attendance at weekly group meetings.
I provide mentoring in
learning how to present
research results, analyze data, read scientific
papers, and design methods.
In other words,
being a research student involves a substantial effort
and time commitment on your part, but also from me and
from graduate students in my group.
How to apply
I receive many applications for research positions in my lab,
and can only accept students who are serious about the
effort involved and who are a good fit
with my
group.
Also note that I will not consider your
application unless you have taken the required
courses (CS 581 for students interested in
computational biology, and CS 598: Computational Scientometrics
for students interested in scientometrics).
Please
send me an email with your current transcript and
answers to the following questions:
- Have you taken the required courses (CS 581 for computational
biology students, and CS 598: Computational Scientometrics for those
interested in scientometrics)? If so, provide your transcript (unofficial is fine).
- When do you expect to graduate with your undergraduate degree?
- Do you want to go to graduate school? If so, would this be in Computer Science or some other field? Would it be for a PhD or an MS? What are your career goals?
- Do you want to do an undergraduate thesis? Are you considering this research as a potential thesis project?
- Why do you want to do a research project with me?
- Which of my papers have you read? (If you haven't read any of the papers,
please read some from the last three or so years, and let me know
which papers you liked and why.)
- Do you prefer theory (and in particular proving theorems about methods) or developing, implementing, and testing methods on data? (If you like both, that's fine.)
- What have your favorite courses been so far, and why?
- What programming languages are you most comfortable with?
- How comfortable are you with learning material on your own?
- How comfortable are you with collaborating with others?
- How comfortable are you with taking existing code and fixing errors or refactoring it?
- Practical issues:
  - Do you need to get paid for your research? Or would getting course credit be enough?
  - How many hours a week do you want to work on a research project with me? Would you want to work during the summer?
  - Finally, are you a U.S. citizen or permanent resident? (This impacts sources of funding.)
Summer 2019 REU students
During the summer 2019 semester, I worked with 10 undergraduate students
(all from UIUC).
These REU students
learned the mathematical foundations
of the material, which is covered
in my textbook.
In addition, they looked
at the
lectures for CS 581, and did
modified homeworks (suitable for undergrads), which are available at
this
page.
Finally, they did
some research projects, which are described in
my notes on first projects.
My textbook:
Computational Phylogenetics:
An introduction to designing methods for phylogeny
estimation, published by Cambridge University Press.