CS 598 AGB, Spring 2016 Course Schedule
2016 Course website
For your homework problems, please use the
old version of the textbook, which is here.
- January 19, 2016. Introduction to course.
(PPTX)
(PDF)
- January 21, 2016.
Introduction to stochastic models of sequence evolution,
using the Cavender-Farris-Neyman
(CFN)
model as an example.
Phylogeny estimation
under the CFN model.
(PPT)
(PDF)
For details about distance-based methods, see these:
(PPT)
(PDF)
Reading before class: Chapters 1-3 from textbook.
- January 26, 2016.
The Newick string representation of rooted trees.
Representation of rooted trees using
subtrees, distances,
clades and bipartitions.
Constructing rooted trees from clades by constructing Hasse Diagrams.
Constructing unrooted trees from unrooted four-leaf trees using
the All Quartets Algorithm.
(PPT)
(PDF)
Reading before class: Chapter 4.1-4.4, 4.6, and Chapter 5
Homework #1: Do at least 5 of the
following 11 problems from the textbook: 3.3(7), 3.3(8), 3.3(9).1-2, 3.3(17),
4.1(1), 4.2(8), 4.3(1), 4.3(2), 4.6(1), 5.1(4), and 5.2(3).
- January 28 and February 2, 2016.
Note: no office hour on February 2. Please come
to my office hour on February 1, from 12 to 1, instead.
January 28: Maximum parsimony (MP):
computational complexity and dynamic
programming solution for fixed tree variant.
February 2:
Parsimony-informative characters and
why MP is not statistically consistent under the CFN model
(the Felsenstein Zone).
(PPT) (PDF)
Reading before January 28 class:
Chapters 6.1-6.3, 6.5, 6.6, 6.8, and 9.13; also
the classic paper
"Cases in which parsimony and compatibility methods
will be positively misleading", by Joseph
Felsenstein, Systematic Zoology, Volume
27, No. 4 (1978), pp. 401-410.
- February 4, 2016.
Problem solving in class (very similar to homework 2),
using these
problems.
Pranjal Vachaspati and Ashu Gupta, guest lecturers.
Reading before class: Chapter 7.1-7.9
- February 9, 2016.
Analyzing sets of trees. Consensus
methods and supertree methods.
The Aho, Sagiv, Szymanski, and Ullman algorithm.
(PPTX)
Reading before class: Chapter 8.1-8.3
Homework #2:
Do at least 5 of the following problems:
6.2(1), 6.3(2), 6.4(2), 6.4(3), 6.8(5),
7.2(1), 7.3(1), and 7.4(2).
- February 11, 2016.
Statistical gene tree estimation, theory and practice.
(PPTX)
(PDF)
Reading before class: Chapter 9.1-9.12
and
The Hobgoblin of Phylogenetics, by David
M. Hillis, John P. Huelsenbeck, and David L. Swofford (Nature,
Vol. 369, 2 June 1994), one of the classic papers
in phylogenetics.
Homework #3: a one-page paper (PDF) discussing
the assigned paper by Hillis et al.
- February 16, 2016.
Multiple sequence alignment.
1) Insertions, deletions, and pairwise sequence alignment.
2) Edit distances.
3) The Needleman-Wunsch algorithm, which can be
phrased in terms of maximizing the score
(http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm or
http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html)
or minimizing the edit distance
(http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/).
4) Multiple sequence alignment optimization problems.
5) MSA methods in practice.
Edit distances and pairwise alignment: (PDF)
MSA methods in practice:
(PDF)
(PPTX)
Reading before class: Chapters 10.1-10.3.
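As a rough illustration of item 3, the two phrasings of global pairwise alignment can share one dynamic program and differ only in the scoring and in whether the recurrence takes a max or a min. The sketch below is not taken from the lecture notes or the textbook; the function names and the default match/mismatch/gap values are assumptions chosen for illustration.

```python
def global_alignment(s, t, pair_score, gap, best):
    """Fill the standard (len(s)+1) x (len(t)+1) table and return the corner value.

    pair_score(a, b): value of aligning letters a and b (illustrative helper).
    gap: value of aligning a letter against a gap (linear gap model assumed).
    best: max for the score formulation, min for the edit-distance formulation.
    """
    m, n = len(s), len(t)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table[i][0] = i * gap          # s[:i] aligned against gaps only
    for j in range(1, n + 1):
        table[0][j] = j * gap          # t[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = best(
                table[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                table[i - 1][j] + gap,  # letter of s against a gap
                table[i][j - 1] + gap,  # letter of t against a gap
            )
    return table[m][n]


def edit_distance(s, t):
    """Minimization form: unit cost for each substitution, insertion, deletion."""
    return global_alignment(s, t, lambda a, b: 0 if a == b else 1, 1, min)


def nw_score(s, t, match=1, mismatch=-1, gap=-1):
    """Maximization form with assumed default scores."""
    return global_alignment(
        s, t, lambda a, b: match if a == b else mismatch, gap, max
    )
```

With match = 0 and mismatch = gap = -1, the maximum score is the negative of the unit-cost edit distance, which is one way to see that the two formulations describe the same optimization.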
- February 18, 2016.
No office hours Monday, February 22.
Class discussion of papers selected for Homework #4.
Reading before class: Chapter 10.5-10.9.
Homework #4:
Select a paper (from 2000 to present) that shows a comparison
of MSA or tree estimation methods on simulated or
biological datasets (e.g., one of the papers
from page 6 of the presentation for February 11). Write a
paper of 2-5 pages with your discussion
of the paper. Make sure
to (a) provide full bibliographic information about
the paper, (b) summarize the paper, and (c) discuss whether you
agree with the conclusions and why.
Suggest a follow-up experiment or study, or identify
a question that was not answered by the study.
(The point is to be critical.)
In class: Give a 3-minute presentation about the paper
you selected and what you thought of it.
- February 23, 2016.
Hidden Markov Models (HMMs), and their use in multiple sequence
alignment
(PPTX)
(PDF)
Reading before class: Chapter 10.4,
and
http://www.cs.princeton.edu/~mona/Lecture/HMM1.pdf.
Homework #5:
Everyone (including biologists):
- Download all software for the tutorial on February 25
from this location.
- (7.5 pts) Do problems 1-17 from the
Review Questions.
You will receive credit for the best 15 problems from
1-17.
- (2.5 pts) Do at least one of problems 18-25 from the Review Questions,
OR read one of the papers selected by one of
the other students from February 18, and write your own
2-3 page review of the paper.
Extra credit: Do one or more of Problems 3.3(18)-3.3(22)
from the textbook.
- February 25, 2016.
Tutorial by Dr. Nam-phuong Nguyen on PASTA (co-estimation
of multiple sequence alignment and tree),
computing distances between trees,
distances between alignments, and visualizing trees and alignments.
- March 1, 2016.
Ensembles of HMMs and their use in biomolecular sequence
analysis.
Guest lecture by Dr. Nam Nguyen (PPTX).
Reading before class:
UPP paper and
TIPP paper.
Homework #6: Do at least 13
problems from Chapters 8-10, with at least
four problems in each chapter.
You will receive credit for the best 10 problems.
- March 3, 2016.
Phylogenomics (genome-scale phylogeny estimation):
Inferring species trees in the presence
of Incomplete Lineage Sorting.
(PDF)
Background material:
(PDF)
(PPTX)
Reading before class: Chapter 11.1-11.4
Extra credit:
Use PASTA or UPP to
compute at least two multiple sequence
alignments and phylogenies
for some biological sequence dataset
of at least 20 sequences using at least two different
pipelines (vary the multiple sequence alignment
method, and/or vary the method for computing the phylogeny
given an alignment).
Compare alignments if you can, and compare trees
using bipartition distances (also called RF distances).
For the comparison of
trees, it will be
helpful if you estimate branch support (using bootstrapping
or some other technique), so that the significance
of the differences can be appreciated.
Comment on the differences you observe.
Write this up!
(Also, once you know how to do this, you might look and see
what happens when you take two nearly identical
sequence datasets, where the second is obtained from the first by
replacing one of its sequences with
a random sequence. How different
are the two trees you obtain?)
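The bipartition (RF) comparison in this extra-credit exercise can be sketched as follows. This is a minimal illustration, not the course's tooling: it assumes each tree is already available as nested Python tuples of leaf names (in practice you would first parse the Newick output of your pipeline), and the function names are hypothetical. It returns the unnormalized count of nontrivial bipartitions found in exactly one of the two trees.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the nontrivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always gets the same representation
    no matter where the tuple structure happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree1, tree2):
    """Number of nontrivial bipartitions present in exactly one tree."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# Hypothetical five-taxon example trees.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted tree as t1, rooted elsewhere
```

Here `t1` and `t2` share the split DE | ABC but disagree on the other one, so each contributes one bipartition to the symmetric difference; `t1` and `t3` differ only in rooting and so have identical bipartition sets.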
- March 8, 2016.
New methods for species tree estimation in the
presence of ILS.
Guest lecture: Pranjal Vachaspati (or perhaps Jed Chou)
(Pranjal's PDF)
(Jed's PDF)
Reading before class:
Homework #7:
(a) Write a 2-5 page critique of one paper either providing
a new method for
species tree estimation from multi-locus data, or
comparing methods for species tree estimation.
(b) Prepare a 5-7 minute presentation (in PDF format)
of the paper and your critique.
Submit the critique
either in class or by email to Tandy Warnow, and submit the
PDF presentation by email to Tandy.
No office hours Monday and Tuesday; will reschedule
for Friday.
- March 10, 2016.
Class presentations, discussing papers about methods for
species tree estimation from multi-locus data.
Here is Siavash's presentation from ISMB about
ASTRAL-2.
- March 15, 2016.
We will discuss the midterm.
Due today (in class): 2-3 page document (PDF) describing
one or two final projects you might want to do.
Note: survey papers are fine, but research projects
are doable and more fun.
If you want to do a research project,
you can do this with another student in the
class; otherwise, you should work by yourself.
You must also list two papers (related
to your final project) that you have
already read, and that you would
be willing to present in class.
See this list for suggestions of possible final projects.
- March 17, 2016. Class discussion: final projects.
Each person should present their
plans for a final project.
This does not require any PDF/PPT, but do be prepared to stand up
and talk about what you are thinking about doing.
Homework #8: Send a PDF (by email to me) of the paper you will present during the March
31 to April 14 period.
The paper needs to be related to your final project.
Your presentation should be 20 minutes long, and you will
need to send me your presentation (in PDF or PPTX/PPT format)
at least 48 hours before your presentation date.
Your presentations and the paper you are presenting will be posted
to the class webpage so that the other students
in the class can see both before your talk. Also, you will receive
questions from the other students in the class 24 hours before
your presentation.
By March 19, I will assign you a date to present the paper.
- March 21-25: Spring vacation
- March 29, 2016.
Midterm papers due in class by 11:10 AM (or emailed to me
in PDF format,
or delivered to Elaine Wilson before then).
Solutions to Parts 1 and 2.
- March 31, 2016. Student presentations of midterm
projects.
- April 5, 2016. Student presentations.
- Mike Nute will talk about
"Joint Bayesian estimation
of alignment and phylogeny" by
B. Redelings and M. Suchard,
Systematic Biology 54(3):401-418, 2005.
(PDF)
Homework #9: for each presentation, write a one-paragraph summary
of the paper and provide two
questions for the student presenting the paper. These homeworks
are due by email at least 48 hours
before the student presentations (i.e., by Sunday at
11 AM), and will be forwarded to the students.
- April 7, 2016. Student presentations.
- Martin Hellwig will give a talk about
"Full modeling versus summarizing gene-tree
uncertainty: Method choice and species-tree accuracy"
by L.L. Knowles et al., Molecular Phylogenetics
and Evolution 65 (2012): 501-509.
(PPTX)
- Danielle Campbell will talk about
"Comparing two Bayesian methods for
gene tree/species tree reconstruction: simulations
with incomplete lineage sorting and
horizontal gene transfer" by Chung and Ané,
Systematic Biology (2011): syr003.
(PDF)
Homework #10: for each presentation, write a one-paragraph summary
of the paper and provide two
questions for the student presenting the paper. These homeworks
are due by email at least 48 hours
before the student presentations (i.e., by Tuesday at 11 AM), and will be forwarded to the students.
- April 12, 2016. Student presentations.
Homework #11:
for each presentation, write a one-paragraph summary
of the paper and provide two
questions for the student presenting the paper. These homeworks
are due by email at least 48 hours
before the student presentations, and will be forwarded to the students.
- April 14, 2016. Student presentations.
- Kajori Banerjee will talk about
"FastTree: computing large minimum evolution
trees with profiles instead of a distance
matrix" by MN Price, PS Dehal, and AP Arkin,
Molecular Biology and
Evolution 26(7):1641-1650, 2009.
(PDF)
- Jordan Luber will give a talk about
"MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform" by K. Katoh et al. Nucleic Acids Research 30.14 (2002): 3059-3066.
(PDF)
(PPTX)
Homework #12:
for each presentation, write a one-paragraph summary
of the paper and provide two
questions for the student presenting the paper. These homeworks
are due by email at least 48 hours
before the student presentations, and will be forwarded to the students.
- April 19, 2016.
Advanced topic: New approaches
for supertree estimation.
Pranjal Vachaspati, guest lecturer.
(PDF)
No office hours April 18-22
- April 21, 2016. Advanced topic:
New methods for
co-estimation of gene trees and species trees.
Ashu Gupta,
guest lecturer
(PPTX)
Homework #13: write a one-page summary of the 4/19 presentation
and include one question.
- April 26, 2016.
Advanced topic:
Computational historical linguistics (constructing phylogenetic trees and
networks from linguistic data)
(PDF)
(PPTX)
Homework #14: write a one-page summary of the 4/21 presentation
and include one question.
- April 28, 2016.
BBCA: Improving *BEAST using random binning
(PDF)
Note: no more regularly scheduled office hours;
if you wish to meet with me, we can arrange a meeting by appointment.
- May 3, 2016.
LAST CLASS DAY.
Jian Peng, guest lecturer, will speak
about computational methods for predicting
protein structure.
- May 5, 2016.
FINAL PROJECTS DUE by email (anytime before midnight).
Note: If you wish to get feedback on an early draft, submit it by
email by
May 1, 2016.
If you need an extension, please request it in advance.