CS 581 (Fall 2022): Algorithmic Computational Genomics

Instructor: Tandy Warnow, Grainger Distinguished Chair in Engineering

Time: TuTh 11 AM to 12:15 PM by Zoom (the link will be sent to registered students) - please email the instructor for the registration link. However, this class has a required in-person activity, in which you present your course project proposal (see below).

Office hours: Tandy's will be Fridays, 11 AM to noon, and Yasamin's will be Thursdays, 3-4 PM. Starting Sept 30, these office hours (both Tandy's and Yasamin's) will be by Zoom. The Zoom link was sent to the registered students by email on 9/30. Exceptions will be listed here:

Announcements

Teaching Assistant: Yasamin Tabatabaee. Office hours: Thursdays, 3-4 PM (by Zoom)

Lectures: These will be posted at least 24 hours in advance. You are expected to read these before coming to class.

Homework: These include reading assignments as well as problem sets. You are expected to do all the reading by the due date.

Course description: This is a course on applied algorithms, focusing on the use of discrete mathematics, graph theory, probability theory, statistics, machine learning, and simulations to design and analyze algorithms for phylogeny (evolutionary tree) estimation, multiple sequence alignment, and genome-scale phylogenetics, with extra topics based on student interest (e.g., genome assembly and annotation, and metagenomics). See the detailed syllabus for a fuller description of the course material. Each of these biological problems is important and unsolved, so new methods are needed. Every year, at least one student in the course has done a project that was subsequently published in scientific conferences and journals (see this page); you can be one of these students!

Course project: The course requires a final project from each student. You are strongly encouraged to do a research project, but you can also write a survey paper on a topic relevant to the course material. In both cases, your project should be a paper of at least 3000 words (not including the bibliography) in a format and style appropriate for submission to a journal such as Bioinformatics (however, please provide this in single-column format, not double-column). Research projects can involve two students, but survey papers must be done by yourself. (Note: When a project is done by two students, the division of the work should be communicated in the write-up, and each student should submit their own course project write-up. Please see me to discuss requirements regarding division of work and write-ups for your specific project, if this applies to you.) Grades on the final project depend upon the kind of project you do. For a research paper, your grade will be 30% writing and 70% content. For a survey paper, the grade will be 40% writing and 60% content. In both cases, you should include a thoughtful discussion of the relevant literature and have an appropriate bibliography. Note also the requirements for reproducibility (for research papers) and the expectations about writing quality; see this PDF for some writing advice. See also this page for suggested topics. I meet with each group several times during the semester to help them make progress, and the TA is also available to help. To see some of the papers that have resulted from these course projects, see this page. Finally, note that there is a required in-person presentation of the course project proposal. You will need to schedule this with me, and it will not take place during the class lecture.

Pre-requisites: CS 374 and CS 361/STAT 361, or consent of the instructor; no biology background is required. As most of you did not do your undergraduate degrees at UIUC, you would not have taken these courses here. I am not concerned if you are a graduate student in the CS program, since this would imply you have this background anyway. But if you are a graduate student in another program, you will need to meet with me to discuss your background. The first homework will be used, to some extent, to evaluate your readiness for the course in terms of your background training. I will grade this myself and then meet with you if your performance on the homework does not reflect sufficient background. In that case, we should discuss options, including switching to Credit/No Credit, dropping the course but auditing, etc.

COVID-19 precautions: In an abundance of caution, I have decided to hold the course lectures online, and will provide the Zoom link to registered students. However, I will schedule small group "office hours" in person and hold these outdoors while the weather permits; these in-person office hours will require appointments, and will be limited to 2-3 students at a time, so that adequate distance can be maintained between everyone. I will also hold office hours by Zoom each week (you do not need to schedule these). If you are ill, have been exposed to COVID-19, or have recently tested positive for COVID-19, do not come to in-person office hours. Moreover, please keep several feet away from me, the TA, and other students during the in-person office hours. Please also see this page for additional specific information about my COVID-19 policies.

Who should take this class: The course is designed for graduate students in CS, ECE, Math, and Statistics; no background in biology is required.

Undergraduate students: If you are an advanced undergraduate student (in CS, ECE, Mathematics, Physics, or Statistics) and interested in taking the course, please email me to discuss your qualifications. I generally do not let undergraduate students into the class because this is a research-focused advanced course requiring many different skills (including theorem proving, implementation, analysis of algorithms, scientific literature reviews, etc.). However, if you are sufficiently advanced (preferably a senior with substantial coursework already completed that demonstrates this range of skills), serious about the commitment necessary to do this course, and planning to apply for PhD programs, then I may allow you into the class.

Assigned reading: The assigned reading will include papers from the scientific literature, as well as the required textbook Computational Phylogenetics: An introduction to designing methods for phylogeny estimation, published by Cambridge University Press. Nearly all of the textbook will be covered during the class, and most of the homework will be taken from the textbook, so you do need to get it. Please check the campus bookstore for availability. Students have also obtained the book from Amazon or Cambridge University Press. An e-book is also available from Google Play.

Grading

Midterm: The midterm is take-home, and you are expected to do it entirely by yourself. If you have questions, please ask the TA or the instructor, rather than consulting others. It will be distributed on Thursday October 6 (at 11 AM) and will be due on Tuesday October 11 by 10:30 AM (before class begins). Please submit your exam either in PDF form in Moodle, or in hardcopy to Samantha Smith in Siebel 4301 by the deadline.

Guidance on writing assignments: Many of the activities in this course involve writing, and the grade for these assignments depends in part on the quality of the writing. This is especially true for the final project. It is very important that you familiarize yourself with expectations about scholarly writing, and in particular with how to avoid plagiarism. Please see the information in the Academic Integrity page, and specifically note the instructions about plagiarism and how improper paraphrasing can count as plagiarism. In addition, please see my write-up with guidelines for reviewing computational papers.

Expectations

Absence policy: Absences are allowed, but you are required to learn what was covered in any lecture (held via Zoom) that you did not attend. Note that the lectures are not recorded! For this reason, course presentations are provided on the course webpage.

Additional Syllabus statements