General information for graduate students and postdocs interested in joining my lab

This page is for potential graduate students and postdocs interested in working with me. If you are an undergraduate, please apply through the CS Department Summer Research Program.

Update for postdoc applicants

I have very limited funding for a postdoc at this time. If you have your own funding and have already worked on phylogenetics or computational scientometrics, feel free to contact me.

Update for new graduate students, December 19, 2022

My current group has 6 PhD students, which is large enough for me, and I am unlikely to consider a new student unless there is clearly a strong overlap between the student's prior research and my research interests (i.e., phylogenetics, multiple sequence alignment, community detection, and historical linguistics).

Nevertheless, I am very glad to do CS 597 (independent study) with new graduate students, and if the research goes well I can consider taking you on as a dissertation student.

Overview of my research

I work on algorithmic problems (mostly in computational biology, but also in scientometrics and historical linguistics) with the aim of developing methods that biologists will use and that will have transformative accuracy and scalability. Part of this work involves mathematics (to understand the theoretical guarantees of the methods I develop, and of other methods), but part of it is also empirical (to understand performance on data). So implementation and testing are very important. All of my methods use a combination of graph algorithms and machine learning or statistical learning. My work in machine learning in particular involves the development of novel ensemble methods, using phylogenetic estimation to guide the design of the ensemble. The machine learning I do is largely unsupervised or semi-supervised learning, largely because there is very limited reliable labeled data in my field; as a result, I do not work in deep learning. Mathematical proofs are part of what I do, but my focus on empirical performance (on data, in other words) drives my research. If your interests are clearly in deep learning and other types of supervised learning, or if you don't like implementing methods, you should probably find another advisor. Otherwise, please read on.

My lab prepares students for academic careers as faculty members, so if you clearly are more interested in an industry job, this is probably not the right lab for you. If you are interested in working with me, please read the entire page here, and then write back. I am looking for whether there is a good fit between your interests and skills, and what I am working on. Therefore, it's best if you have read at least one of my papers in some depth and can tell me what interests you about that paper. At the bottom of this page, I have a list of papers that are closely related to my current research, and your comments on one or more of those (based on a careful reading) will be helpful to me.

Skills required for students

The primary objective of my research is to produce new algorithms and software that can dramatically improve phylogenetic analysis (whether in linguistics or in biology), as tested in simulation or on real data. Theoretical research is often done at the same time, using probability theory to predict performance under Markov models of evolution, but then testing these predictions in simulation. If you are a student who loves to design algorithms, likes the challenge of developing good heuristics for NP-hard optimization problems, loves to program, and enjoys collaborations (especially with scientists!), you may find this research area fun and rewarding. Absolutely no background in biology or linguistics is required.

Research in my lab requires strong skills in algorithm design and analysis and software development. In addition, excellent interpersonal skills, oral and written communication skills, and a passion for research are also necessary. Overall, the required technical skills or coursework can be described by:

The first year in the lab

The CS department has instituted a new approach to PhD graduate student development, where the first year is spent participating in potentially more than one lab. This allows students to experience different research groups before settling into their "research group" home. I think this is a great approach, since students often change their minds about what they want to work on in the first year!

The first semester in my group can be considered something like a rotation, and is generally spent on a specific research project in collaboration with other students, so that you can find out about the research area. If you are interested in working with me in computational biology or historical linguistics, then you will take CS 581 in this semester, which I teach every year. (Check with me about what to take if you want to work on scientometrics.) CS 581 requires a final project, and for your course project you should choose a research topic that might be suitable for a paper. I will work with you on that.

In the second semester, if you wish to continue working with me, you should sign up for CS 597, and do research with me in that semester. That research will likely involve substantial implementation and evaluation, and may involve other students as well, but the goal will be to produce a paper that can be submitted to a strong journal or conference by the end of the semester. You are encouraged to obtain the textbook for this course, Computational Phylogenetics: An introduction to designing methods for phylogeny estimation, published by Cambridge University Press.

After the second semester, you and I can discuss what you could do if you want to do a PhD in my lab. Please feel free to talk with my current or former students about working with me; the list of students is available here.

To find out more about my computational biology work

Please see my textbook on computational phylogenetics, which is what I use for CS 581. The following are representative of the kinds of work I am doing (the parenthetical numbers refer to the number in my online publication list). I've also selected one or two papers in each category that I recommend you read before talking with me, and put those in boldface.
  • The course notes for my Fall 2021 CS 581 course in Algorithmic Genomic Biology.
  • The first chapter of my textbook, available here, provides an overview of the type of work that I do in phylogenetics.
  • Some of my recent papers (the parenthetical numbers refer to the number in my online publication list, and boldfaced items are the ones I recommend in particular):
  • You may also want to read some of the following survey papers for this research area:

Next steps

Students applying for admission to a UIUC graduate program: As I have indicated above, my group is already fairly large and may become larger (depending on who joins my group this year). Hence, if you are not already admitted to UIUC, it is unlikely that I will be able to consider you for a position. However, if you already have publications in my research area or have a very strong background and keen interest in working with me, then do get in touch. Please note, however, that admission to the CS or other graduate programs at UIUC is done by committees, not by individual faculty, and hence not by me! Please also note that I am permitted to supervise graduate students in many programs, but have a preference for those students with appropriate backgrounds in CS, ECE, Statistics, or Mathematics. If you are applying to a PhD program in some other department, I will be glad to talk with you about being a member of your dissertation committee, but am unlikely to be your thesis advisor.

Current UIUC graduate students: Graduate students (whether MS or PhD) who are already enrolled at UIUC in CS, ECE, Statistics, or Mathematics are encouraged to contact me about thesis research possibilities. If you are enrolled in another program (e.g., Bioengineering or some biology program), I can consider you for a project where I am a member of your thesis committee, but you will need to have someone else be your main supervisor.

Prospective postdocs: If you are interested in joining the lab as a postdoc, you should have a PhD in Computer Science, Mathematics, or Statistics, and you should have already published several peer-reviewed papers in algorithms for phylogenetics, computational historical linguistics, or scientometrics. You will need to provide three letters of reference from faculty members in computer science, mathematics, or statistics doing research in phylogenetics. Please write to me first to see if I am able to support you for a postdoc. If I am, then you should submit your application through the Future Faculty Fellows submission website.

Information to provide if you are interested in working with me (and are a current graduate student or applying for a postdoc)