Spring 2020 graduate course, CS 581: Algorithmic Computational Genomics

Instructor: Tandy Warnow, Founder Professor of Computer Science

Course meets Tuesdays and Thursdays, 11:00 AM-12:15 PM, in 1131 Siebel Center.

Tandy's office hours: Tuesdays 1-2 PM (and by appointment) in 3235 Siebel Center

Teaching Assistant: Vladimir Smirnov (smirnov3@illinois.edu). Office hours Wednesdays, 3:00-4:00 PM, in Siebel 0218

Lectures

Homework

Course description: This is a course on applied algorithms. It focuses on using discrete mathematics, graph theory, probability theory, statistics, machine learning, and simulations to design and analyze algorithms for phylogeny (evolutionary tree) estimation, multiple sequence alignment, genome-scale phylogenetics, genome assembly and annotation, and metagenomics. Each of these biological problems is important and unsolved, so new methods are needed. Hence, this course will give computer scientists, mathematicians, and statisticians the opportunity to do original and important research that can have an impact on biology. Every year, at least one student in the course has done a project that was subsequently published in scientific conference proceedings and journals; you can be one of these students! For examples of these papers, see Mirarab et al., Bioinformatics 2014, Zimmermann et al., BMC Genomics 2014, Davidson et al., BMC Genomics 2015, Chou et al., BMC Genomics 2015, Vachaspati and Warnow, BMC Genomics 2015, Nute and Warnow, BMC Genomics 2016, Christensen et al., Algorithms for Molecular Biology 2018, and Pattabiraman and Warnow, ACM-BCB 2018. In addition, Palash Sashittal (who took CS 581 in Fall 2018) and Mohammed El-Kebir have just had a paper accepted to RECOMB-CG 2019 for work Palash did for his course project.

Who should take this class: The course is designed for graduate students in computer science, computer engineering, bioengineering, mathematics, and statistics, and does not depend on any prior background in biology.

Biology graduate students: Every year, biology graduate students have taken the course for credit and done well. Therefore, if you are a biology graduate student whose research involves these questions (especially if you use phylogeny estimation or multiple sequence alignment), you are very welcome in the class! However, please meet with me to discuss my expectations regarding homework and exams for biology students.

Pre-requisites: CS 374 and CS 361/STAT 361, or consent of the instructor; no biology background is required. If you did not take these pre-requisites at UIUC but have equivalent coursework in algorithms and probability/statistics, you will probably do fine. If you are a biologist without this background but you are working on problems where phylogeny estimation or multiple sequence alignment are important, you may be able to take the course as well with some extra work. Please see me if you have any questions about whether the course is suitable for you!

Course Textbook: Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation, published by Cambridge University Press. Errata are posted as I find them. You can get the hardcopy at the university bookstore (it is supposed to be there) or on Amazon. You can also get the e-book at Google Play. The image of the Monterey cypress refers to the NSF-funded CIPRES project, whose purpose was to develop the methods and computational infrastructure to improve large-scale phylogeny estimation.

Other course materials: Approximately the first half of the course will cover phylogenomics and multiple sequence alignment, and is based on the textbook. The second half of the course will be based on reading and presenting papers on topics of interest to the students that are related to the course material. You are expected to do all assigned reading (whether from the textbook or of published papers) in advance of coming to class.

Grading:

Homeworks: Homeworks must be submitted to Moodle in PDF format by 10 PM on the due date, which will normally be a Thursday. Homeworks can be submitted up to 48 hours past the deadline for reduced credit (80% if within 24 hours and 60% if within 48 hours). The single worst homework grade will be dropped.

Midterm: The midterm will be take-home; no collaboration and no consultation of materials (online or in papers/books) other than the textbook is permitted. The midterm will be distributed in class on March 5, and solutions must be submitted to Moodle by 5 PM on Sunday, March 8. Hardcopy solutions (matching what you submitted in Moodle) must be handed in to Tandy Warnow or Samantha Smith (3240 Siebel Center) no later than 2 PM on Monday, March 9. The midterm will contain four parts:

There will be four questions in Part IV, each worth 10 points, and you should do them all; we will drop the worst score. See this review document.

Final Project: Each student must complete a final project, which is due in class on the last day the class meets. Please give me a hardcopy directly, either in class or during my office hours. You are strongly encouraged to do a research project, but you can also write a survey paper on a topic relevant to the course material. In both cases, your project should be a paper (of about 15 pages) in a format and style appropriate for submission to a journal. Research projects can involve two students, but survey papers must be done individually. Grades on the final project depend on the kind of project you do. For a research paper, your grade will be 30% writing, 40% scientific/algorithmic rigor, and 30% impact. For a survey paper, your grade will be 30% writing, 30% summary of the literature you discuss, and 40% commentary (i.e., insight and critical, thoughtful discussion of the issues that come up). See the chapter on Projects in the course textbook for possible research projects. You might also want to look at this list of suggested final projects. You will need to submit a 1-2 page proposal for your final project in advance (via Moodle, deadline March 11). You should meet with me to discuss this proposal before you submit it, and meet with me again after I have given you feedback on the proposal. An approved proposal is required (via Moodle).

Guidance on writing assignments. Many of the activities in this course involve writing, and this is particularly true for the final project if you do any kind of survey of the literature. It is very important that you familiarize yourself with expectations about scholarly writing, and in particular with how to avoid plagiarism. Please see the information on the Academic Integrity page, and note specifically the instructions about plagiarism and how improper paraphrasing can count as plagiarism. In addition, please see my write-up with guidelines for reviewing computational papers.

Reading and presenting the scientific literature: All students will present research papers from the recent scientific literature. The presentation of scientific papers is a major part of the course, and all students are expected to participate actively in discussing these papers. Your class participation counts 10% towards your course grade, and half of this is based on class presentations (both your own presentation and your Q&A).

Course Participation: Your course participation will be evaluated by how you participate in the in-class discussions of the scientific literature we are reading and of the presentations of scientific papers given by the other students. For possible papers, see this list of papers to read and review, and read this document on how to critique them. (You are, of course, welcome to select any paper you want, just not one of mine.)

Academic integrity

Emergency response recommendations

Optional additional reading

Please see the websites for the course from previous semesters, such as CS 581 from Fall 2018, CS 581 from Spring 2018, and CS 581 from Spring 2017, which have substantial overlap with this semester's course.