Course meets Tuesdays and Thursdays 12:30-1:45 PM Tuesday/Thursday in 1105 SC.

*Tandy's office hours*: Tu 2-3 PM (and by appointment) in Siebel Center 3235

**Teaching Assistant**: Wei Qian, weiqian3@illinois.edu.
Office hours: Thursdays 11 AM to 12 PM, in 1218 Siebel Center.

** Course description**:
This is a course on applied algorithms, focusing on the use of
discrete mathematics, graph theory, probability theory,
statistics, machine learning,
and simulations, to
design and analyze algorithms for
phylogeny (evolutionary tree) estimation,
multiple sequence alignment, genome-scale phylogenetics,
genome assembly and annotation, and metagenomics.
Each of these biological problems is important and unsolved,
so that new methods are needed.
Hence, this course will provide opportunities for
computer scientists, mathematicians, and statisticians, to
do original and important research that can have an
impact on biology.
Every year, at least one student in the course
has done a project that was subsequently
published in scientific conferences and journals; you can
be one of these students!
For examples of these papers,
see Mirarab et al., Bioinformatics 2014,
Zimmermann et al.,
BMC Genomics 2014,
Davidson
et al., BMC Genomics 2015,
Chou et al., BMC Genomics 2015,
Vachaspati and Warnow, BMC Genomics 2015,
Nute and Warnow, BMC Genomics 2016,
Christensen et al., Algorithms for Molecular Biology 2018, and
Pattabiraman and Warnow, ACM-BCB 2018.

**Who should take this class:**
The course is designed for
graduate students in computer science, computer
engineering, bioengineering, mathematics,
and statistics, and does not depend on any prior background
in biology.

**Biology graduate students:**
Every year, biology graduate students have taken the course
for credit and done well.
Therefore, if you are a
biology graduate student
where these questions are relevant to your research
(especially if using phylogeny estimation or
multiple sequence alignments), you are very welcome in the class!
However, please meet with me to
discuss my expectations regarding homework and exams for
biology students.

**
Pre-requisites**:
CS 374 and CS 361/STAT 361, or consent of
the instructor;
no biology background is required.
If you did not take these pre-requisites at UIUC but have
equivalent coursework in algorithms and probability/statistics,
you will probably do fine.
If you are a biologist without this background but you are
working on problems where phylogeny estimation or multiple
sequence alignment are important, you may be able to
take the course as well with some extra work.
Please see me if you have any questions about whether the
course is suitable for you!

**Course Textbook: **
*Computational Phylogenetics*:
An introduction to designing methods for phylogeny
estimation, published by Cambridge University Press.
Errata are posted as I find them.
You can get the hardcopy
at the university bookstore (it is supposed to be there) or
on
Amazon.
You can also get the E-book at Google Play.
The image of the Monterey Cypress is there because of
the NSF-funded CIPRES project,
whose purpose was to develop the methods and computational
infrastructure to improve large-scale phylogeny estimation.

** Other course materials**:
Approximately the first half of the course is based on phylogenomics and
multiple sequence alignment, and is based on the textbook.
The second half of the course will cover
genome assembly and annotation, comparative
genomics, and metagenomics, and will be based
on the scientific literature.
You are
expected to do all assigned reading (whether from
the textbook or of published papers)
in advance of coming to class.

**
Grading**:

- Homeworks: 40% (worst homework grade dropped)
- Take home midterm: 25% (handed out November 8, return November 13 noon by Moodle or hardcopy)
- Final Project: 25% (due last day of class)
- Course Participation and paper presentation: 10%

** Homeworks**:
Homeworks need to be submitted to MOODLE in
PDF format; these
are due at 10 PM on the due date, which
will normally be on Thursdays.
Homeworks can be submitted
up to 48 hours past the deadline for reduced
credit (80% if within 24 hours
and 60% if within 48 hours).
The single worst homework grade will be dropped.

** Final Project**:
The course requires a final project of each student, and is
due in class on the last day the class meets.
Please provide
hardcopy to
me directly - in class or in my office hours.
You are strongly encouraged to do a research project, but you can also do a survey paper on some topic relevant to the course material. In both cases, your project should be a paper (of about 15 pages) in a format and style appropriate for submission to a journal. Research projects can involve two students, but survey papers must be done by yourself.
Grades on the final project depend upon the kind of project you do. For a research paper, your grade will be 30% writing, 40% scientific/algorithmic rigor, and 30% impact. If you do a survey paper, the grade will be 30% writing, 30% summary of the literature you discuss, and 40% commentary (i.e., insight, critical and thoughtful discussion of the issues that come up).
See the chapter on Projects from the course textbook
for possible research projects.
You might also want to look at
this list of suggested
final projects.
*You will need to submit a 1-2 page proposal for your final project
in advance (via Moodle).
You should meet with me to discuss this proposal before you submit it,
and then again meet with me after I have given you feedback about
the proposal.
An approved proposal is required (via Moodle).
*

** Class Presentation**:
All students will present research papers from
the recent scientific literature.
The presentation of scientific papers is
a major part of the course, and all students
are expected to participate actively
in discussing these papers.

*If you are presenting the paper:*You will select a paper from the recent scientific literature (related to the course material); these papers will be distributed to the entire class, and all students will read all the papers and submit questions to you in advance to help you prepare the presentation. You will have 15 minutes for the presentation, followed by 15 minutes of Q&A from the other students. You will send the PDF of your presentation to me at least 48 hours before the presentation (to receive feedback from me, submit it at least 72 hours before).-
*If you are not presenting the paper:*You are expected to read and understand the paper, and to participate actively in the discussion. You will submit a one paragraph summary of the paper and two questions about the paper to the professor at least 72 hours before the presentation; these will be forwarded to the student doing the presentation. You should ask at least one question during the Q&A period. The questions you ask count towards the course participation grade. - Paper presentations:
- Sayantani Basu will present "Collecting reliable clades using the Greedy Strict Consensus Merger" (PDF)
- Thomas Cowell will present "Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information", (HTML)
- Surbhi Jain will present "A two-state model of tree evolution and its application to Alu Retrotransposition" (HTML)
- Bryce Kille will present "On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)", SICOMP 1998, https://doi.org/10.1137/S0097539795296334.
- Yuanyuan Qi will present "Quartet Inference from SNP Data Under the Coalescent Model", a paper on SVDquartets by Chifman and Kubatko. (PDF)
- Palash Sashitall will present "QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data" (HTML)
- Sarah Schieferstein will present "Horses or farmers? The tower of Babel and confidence in trees" (HTML)
- Mia Schoening will present "IQ-Tree: A fast and effective stochastic algorithm for maximum-likelihood phylogenies", (HTML)
- Vlad Smirnov will present "Clustering by Passing Messages Between Data Points" (PDF)
- Xilin Yu will present "TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees", (HTML)

- "Mash: fast genome and metagenome distance estimation using MinHash" (HTML)
- "Phase transition in the sample complexity of likelihood-based phylogeny inference" (HTML)
- "Renewing Felsenstein's phylogenetic bootstrap in the era of big data" (HTML)
- "Twisted trees and inconsistency of tree estimation when gaps are treated as missing data -- The impact of model mis-specification in distance corrections" (HTML)
- "On the Variance of Internode Distance Under the Multispecies Coalescent" (HTML)
- "Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees" (PDF)
- "Distance-based species tree estimation under the coalescent: Information-theoretic trade-off between number of loci and sequence length" (HTML) (arXiv)
- "Species Trees from Gene Trees Despite a High Rate of Lateral Genetic Transfer: A Tight Bound" (arXiv)
- "Quartet MaxCut: A fast algorithm for amalgamating quartet trees" (HTML)
- "SiFit: A Method for Inferring Tumor Trees from Single-Cell Sequencing Data under Finite-site Models" (bioRxiv)
- "Towards an accurate and efficient heuristic for species/gene tree co-estimation" (HTML)

**Course Participation:**
Your course participation will be evaluated in terms of
how you
participate in the in-class discussions of the scientific literature
we are reading, and also of the presentations
of scientific papers given
by the other students.

**Emergency response recommendations**

Please see the websites for the course from previous semesters, such as CS 581 from Spring 2018 and CS 581 from Spring 2017, which have substantial overlap with this semester's course.