Information for prospective undergraduates

Please see this page, and note that I do not consider undergraduate researchers who have not taken my graduate course, CS 581.

Information for prospective graduate students and postdocs

For applicants to the PhD student program: If you are interested in applying to a PhD program at UIUC and in working in my research group, please read this carefully. I am indeed looking for new students, and so I will be glad to review your application to the graduate program at UIUC. Please be aware that admission is decided by a committee, and not by individual faculty. However, if you list me among the faculty that interest you, then after your application is received (and the letters are all in), I will be contacted by the committee to look at your file. It will help if you specifically discuss one or more of my recent papers that interest you, so that I can understand your research interests.

For people interested in postdoctoral positions: I have limited funding at this time for new members of my lab, so postdoctoral applicants will need to apply for supplementary funding (e.g., through the CS department's Future Faculty Fellows program).

For applicants to the MS program: I prefer to work with PhD students, because research is a long-term commitment. However, MS students who want to do PhDs are welcome to write to me.

I work on algorithmic problems in computational biology with the aim of developing methods that biologists will use and that will have transformative accuracy and scalability. Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data). So implementation and testing are very important. All of my methods use a combination of graph algorithms and machine learning or statistical learning. My work in machine learning in particular involves the development of novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is largely unsupervised or semi-supervised learning, largely because there is very limited reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (on data, in other words) drives my research. If your interests are clearly in deep learning and other types of supervised learning, or if you don't like implementing methods, you should probably find another advisor. Otherwise, please read on.

I am actively recruiting new graduate students and postdocs to participate in cutting-edge research in developing new methods with breakthrough performance for algorithmic problems in phylogeny estimation and related subjects. My research area is a blend of statistical estimation and discrete algorithms, and so the best students come from either statistics or computer science (and possibly mathematics or biology). There are many opportunities for theoretical research (as in proving theorems under Markov models of evolution, proofs of NP-hardness and approximability). There are also some opportunities for machine learning (but not nearly as much as other topics in computational biology and bioinformatics). If you want to work on problems that involve deep learning, you should contact Prof. Jian Peng, who has many such problems.

The skills I am looking for are described below, but can be summarized by: keen interest in solving real problems, enjoyment of discrete mathematics and algorithms, appreciation for data and hence statistics, and excellent coding skills. Communication (both oral and written) is very important, and being able to collaborate well with other students is also important.

Also, my lab prepares students for academic careers as faculty members, so if you clearly are more interested in an industry job, this is probably not the right lab for you.

If you are interested in working with me, please read the entire page here, and then write back. I am looking for a good fit between your interests and skills and what I am working on. Therefore, it's best if you have read at least one of my papers in some depth and can tell me what interests you about that paper. At the bottom of this page, I have a list of papers that are closely related to my current research, and your comments on one or more of those (based on a careful reading) will be helpful to me.

My research

The primary objective of my research is to produce new algorithms and software that can dramatically improve phylogenetic analysis (whether in linguistics or in biology), as tested in simulation or on real data. Theoretical research is often done at the same time, using probability theory to predict performance under Markov models of evolution, but then testing these predictions in simulation. If you are a student who loves to design algorithms, likes the challenge of developing good heuristics for NP-hard optimization problems, loves to program, and enjoys collaborations (especially with scientists!), you may find this research area fun and rewarding. Absolutely no background in biology or linguistics is required. Research in my lab requires strong skills in algorithm design and analysis and software development. In addition, excellent interpersonal skills, oral and written communication skills, and a passion for research are also necessary. Overall, the required technical skills or coursework can be described by:

Current UIUC Graduate students: Graduate students (whether MS or PhD) who are already enrolled at UIUC in CS, ECE, Statistics, or Mathematics, are encouraged to contact me about thesis research possibilities. If you are enrolled in another program (e.g., Bioengineering or some biology program), I can consider you for a project where I am a member of your thesis committee, but you will need to have someone else be your main supervisor.

Students applying for admission to a UIUC graduate program: If you are not yet admitted to a graduate program at UIUC, please note that admission to these programs is done by committees, not by individual faculty, and hence not by me! I encourage you to write to me about your interests, but please note that admission to a graduate program depends on meeting the expectations and standards of the graduate program, not just on finding someone keen to advise you as a graduate student. In particular, successful applicants to the MS or PhD program in Computer Science typically are CS majors from strong CS undergraduate programs. If your undergraduate degree is not in CS, you may do better in applying to some other graduate program. You should check with the graduate programs that are potential good fits for you directly.

First semester is a rotation: The first semester is a rotation, generally spent on a specific research project in collaboration with other students, so that you can find out about the research area. However, all graduate students who wish to work with me must take my graduate course. You are encouraged to obtain the textbook for this course, Computational Phylogenetics: An introduction to designing methods for phylogeny estimation, published by Cambridge University Press. After a semester as a rotation student, you and I can discuss what you could do if you want to do a PhD in my lab. Please feel free to talk with my current or former students about working with me; the list of students is available here.

Prospective postdocs: If you are interested in joining the lab as a postdoc, you should have a PhD in Computer Science, Mathematics, or Statistics, and you should have already published several peer-reviewed papers in algorithms for phylogenetics. You will need to provide three letters of reference from faculty members in computer science, mathematics, or statistics doing research in phylogenetics.

To find out more (for all students): If you are a student at UIUC and either an upper division undergraduate or a graduate student, then you should take my graduate class in Algorithmic Genomic Biology, or else obtain the textbook for the course and read it! The CS 581 course introduces students to computational phylogenomics, and many students do research as a course project. These research projects often result in published journal and conference papers, and thus are a great way to learn about the research area.

If you have not taken this course yet, please first read a few of my recent papers. In particular, the following are good representatives of the kinds of work I am doing in my three active projects (the parenthetical numbers refer to the number in my online publication list). I've also selected one or two papers in each category as papers I recommend you read before talking with me, and put those in boldface.

  • The course notes for my Spring 2020 graduate class in Algorithmic Genomic Biology
  • My textbook, Computational Phylogenetics: an introduction to designing methods for phylogeny estimation.
  • Some of my recent papers (the parenthetical numbers refer to the number in my online publication list, and boldfaced items are the ones I recommend in particular):

You may also want to read some of the following introductory materials to this research area:

Next steps

If you are interested in working with me, please contact me by email and let me know which of my papers you've read, what projects you'd like to work on, and what your background is (see above). It would be best if you picked at least one paper from the list above and told me what interests you about the work in the paper.