I work on algorithmic problems in computational biology, with the aim of developing methods that biologists will use and that offer transformative accuracy and scalability. Part of this work is mathematical (to understand the theoretical guarantees of the methods I develop, and of other methods), and part is empirical (to understand performance on data). So implementation and testing are very important. All of my methods combine graph algorithms with machine learning or statistical learning. My work in machine learning in particular involves developing novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is mostly unsupervised or semi-supervised, because there is very little reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (i.e., performance on data) drives my research. If your interests are clearly in deep learning and other types of supervised learning, or if you don't like implementing methods, you should probably find another advisor. Otherwise, please read on.
I am actively recruiting new PhD students and postdocs to participate in cutting-edge research on developing methods with breakthrough performance for algorithmic problems in phylogeny estimation and related subjects. My research area is a blend of statistical estimation and discrete algorithms, so the best students usually come from statistics or computer science (and possibly mathematics or biology). There are many opportunities for theoretical research (e.g., proving theorems under Markov models of evolution, and proofs of NP-hardness and approximability). There are also some opportunities for machine learning (but far fewer than in other areas of computational biology and bioinformatics). If you want to work on problems that involve deep learning, you should contact Prof. Jian Peng, who has many such problems.
The skills I am looking for are described below, but can be summarized as: keen interest in solving real problems, enjoyment of discrete mathematics and algorithms, appreciation for data and hence statistics, and excellent coding skills. Communication (both oral and written) is very important, as is the ability to collaborate well with other students.
Also, my lab prepares students for academic careers as faculty members, so if you are clearly more interested in an industry job, this is probably not the right lab for you.
If you are interested in working with me, please read this entire page and then write to me. I want to know whether there is a good fit between your interests and skills and what I am working on. Therefore, it's best if you have read at least one of my papers in some depth and can tell me what interests you about it. At the bottom of this page, I have a list of papers that are closely related to my current research, and your comments on one or more of them (based on a careful reading) will be helpful to me.
Current UIUC graduate students: Graduate students (whether MS or PhD) who are already enrolled at UIUC in CS, ECE, Statistics, or Mathematics are encouraged to contact me about thesis research possibilities. If you are enrolled in another program (e.g., Bioengineering or a biology program), I can consider you for a project where I serve on your thesis committee, but someone else will need to be your main supervisor.
Students applying for admission to a UIUC graduate program:
If you are not yet admitted to a graduate program at UIUC, please note that admission to these programs is decided by committees, not by individual faculty (and hence not by me!). I encourage you to write to me about your interests, but admission to a graduate program depends on meeting the expectations and standards of the program, not just on finding someone keen to advise you. In particular, successful applicants to the MS or PhD program in Computer Science are typically CS majors from strong CS undergraduate programs. If your undergraduate degree is not in CS, you may do better applying to another graduate program. You should check directly with the graduate programs that are potential good fits for you.
First semester is a rotation: The first semester is generally spent as a rotation on a specific research project in collaboration with other students, so that you can learn about the research area. In addition, all graduate students who wish to work with me must take my graduate course. You are encouraged to obtain the textbook for this course, Computational Phylogenetics: An introduction to designing methods for phylogeny estimation, published by Cambridge University Press. After a semester as a rotation student, you and I can discuss what you could do if you want to do a PhD in my lab. Please feel free to talk with my current or former students about working with me; the list of students is available here.
Prospective postdocs:
If you are interested in joining the lab as a postdoc, you should have a PhD in Computer Science, Mathematics, or Statistics, and you should have already published several peer-reviewed papers in algorithms for phylogenetics. You will need to provide three letters of reference from faculty members in computer science, mathematics, or statistics who do research in phylogenetics.
To find out more (for all students): If you are a UIUC student, either an upper-division undergraduate or a graduate student, you should take my graduate class in Algorithmic Genomic Biology (CS 581), or else obtain the textbook for the course and read it! CS 581 introduces students to computational phylogenomics, and many students do research as a course project. These research projects often result in published journal and conference papers, and so are a great way to learn about the research area.
If you have not taken this course yet, please first read a few of my recent papers. In particular, the following papers are good representatives of the kinds of work I am doing in my three active projects (the parenthetical numbers refer to the numbers in my online publication list). I've also selected one or two papers in each category that I recommend you read before talking with me, and put those in boldface.