Homework for CS 581, Fall 2022
Homework policies
Due date:
All homework is due at 10 PM on the due date, via Moodle (unless
otherwise specified).
Late homeworks (up to 48 hours late) can be accepted for reduced
credit:
80% if within 24 hours and 60% if within 48 hours.
Collaboration policy:
You are expected to write up the homework yourself, but you are
welcome to discuss the homework with other students in the class.
If you discuss the homework with other students, clearly specify who you
worked with on
your submitted homework.
Reading assignments:
Some homeworks involve homework problems from the textbook, and
many homework assignments involve
reading the textbook or published papers.
The class discussion depends on you doing the reading,
as I will not be teaching all the material.
Homeworks connected to student presentations (starting Oct. 24)
Note that starting on October 25, each student will be presenting
a paper in the class.
That presentation, plus the discussion afterwards,
will use up the entire class.
Every student is expected to read the papers in advance,
and there will be a homework assignment (for each paper)
with submission date (in Moodle) the evening before the presentation.
That homework assignment includes writing at least 3 paragraphs
about the paper and providing questions
for the author (see Moodle for instructions).
The average of the homework grades, across these 7 assignments,
will count for 3 homework grades.
Review questions:
The textbook has two types of questions: review questions and
homework problems.
In general, I will be assigning problems from the
homework problems and not from the review questions (although
do note that this is not true for some homework assignments).
We may discuss
the review questions in class, so
please look over the review questions as well.
Disputing a grade:
Please come see me directly if you have questions
or concerns about how your homework was graded.
Grading policy:
The homework overall contributes 40% of the course grade,
and each homework (unless otherwise specified) contributes the same amount.
The worst two hw grades are dropped.
Reading assignments
The assignments from the textbook do not include the "Further Reading"
sections.
Also note that we will
be including assigned reading from the scientific literature,
with 1-2 papers a week assigned after we finish going through the
textbook assignments.
In general, reading about 10-20 pages each week is to be expected.
The due dates for the reading assignments are the night before the class meeting,
which means you should come to class having already done the assigned reading.
- August 29: Chapter 1
(PDF)
and Chap 2.1-2.2, 3.1-3.4.2
-
August 31: Chapters 2.3-2.10, 3.4.3-3.5, 5.1-5.5.1, 6.1-6.2.
- September 5: Chapter 4.1-4.7, 8.1-8.8
- September 7: Chapters 5.6-5.7, 5.10, 5.11, 8.13
- September 12: Chapter 10.1-10.5,
and Zaharias and Warnow 2022
- September 14: Chapter 10.6-10.7, 7.1-7.7.9, 11.8
- September 19: Chapter 9.1-9.5
- September 21: Chapter 9.6-9.12
- October 10: Chapter 9.13-9.20
- October 13: Yasamin's paper at
(HTML)
- November 28. Please read the following papers, which address the impact of
taxon sampling. These papers are just a few of the papers
written in response to the paper by Hillis on the Hobgoblin of Phylogenetics,
which will be presented by Jiayue.
-
Junhyong Kim.
"Large-scale phylogenies and measuring the performance of phylogenetic estimators",
Systematic Biology 1988.
DOI: 10.1080/106351598261021
-
Ziheng Yang and Nick Goldman. "Are big trees indeed easy?." Trends in Ecology and Evolution 12.9 (1997): 357-357.
DOI: 10.1016/s0169-5347(97)83196-5.
-
David Hillis.
"Are big trees indeed easy? Reply from
D.M. Hillis".
Trends in Ecology and Evolution 12.9 (1997).
DOI: 10.1016/s0169-5347(97)83198-9
Homework Assignments
- Homework 0. Due Tuesday, August 30. This will be graded but
the grade will not count towards the course grade.
-
All review questions for Chapter 1.
- Chapter 1, problems 1, 8, 10, and 11.
- Appendix B, problem 21
- Prove by induction on n (where n is a positive integer)
that every connected graph on n nodes
has at least n-1 edges.
- Homework 1. Due Friday, September 2, 10 PM.
-
Chapter 2: problems 29 and 32.
-
Chapter 3: problems 5 and 22.
-
Homework 2. Due Friday, September 9, 10 PM
-
Chapter 4 problems 1, 5, 6, 17.
-
Chapter 5 problems 7, 9, 12, 18.
-
Chapter 8 problems 7-9.
-
Homework 3. Due Friday, September 16, 10 PM
- Chapter 6, problem 2,4,8
- Chapter 7, review questions 1,2,4.
-
Homework 4. Due Friday, September 23, 10 PM.
- Chapter 10, problems 4-7, 10
- Homework 5. Due Sunday, September 25, 10 PM (but
late homework past this time not allowed).
In this
week's homework, you will be doing analyses of datasets using
phylogeny estimation software, and you will write it all up as well.
The collaboration policy for this homework is you can discuss
with others, but you must write your own scripts, analyze the
data yourself, and write it all up yourself.
You will need to get help from
the TA, so please get ready.
Write up your results sufficiently for the experiment you did to be reproducible
(i.e., method version numbers and commands).
Comment on: what did you expect to see? What did you see?
What did you learn?
Include properly cited references for any software that you use.
-
This is for gene tree estimation, using ``true alignments".
Familiarize yourself with
the description of the
simulated datasets from the 1000M1 and 1000M4 model conditions
(see this webpage)
from the original
paper that introduced them (Liu et al., "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.).
The datasets themselves are at this page.
You will construct trees on true alignments for 5 replicates,
using 5 methods: FastTree (under GTR and also under JC) and Neighbor Joining
(using logdet, JC distances, and p-distances).
Note, you may also want to try some other methods, but first get
the 5 listed above.
For each tree you compute, compare it to the PIMT for
the dataset, and record the FN and FP error rates.
Do not be surprised if they are not the same, and do think about what this means.
Make a table or figure of average tree errors for each of the five methods for
the two models.
For each of the two model conditions (i.e., 1000M1 and 1000M4), comment on all trends you observe, including:
- Which method is the most accurate?
- Does changing the estimation model for FastTree impact accuracy?
If so, how?
- Does changing the distance correction for Neighbor Joining
impact accuracy? If so, how?
Next, compare results under the model conditions 1000M1 and 1000M4:
- Do any of the trends above change as you change model conditions?
- Does the relative accuracy of methods remain the same between
the model conditions?
- One of the model conditions is "easier" (in that the methods have higher
accuracy); which one? And what is it about the model condition (i.e.,
numeric parameters and/or empirical statistics of the
sequences that are produced) that
suggests why that model condition is easier than the other?
Also comment on the following:
- Describe how the data were generated (this requires you look at the paper from Science 2009); what sequence evolution model was used?
- Which of the methods that you ran is statistically consistent for the model that generated the data? Justify
your answers.
- Did the statistically consistent methods
always produce more accurate trees than the methods not
guaranteed to be consistent?
Note: your grade for this assignment will be based on
both content (75) and writing (25%).
Reproducibilty is part of content.
Please review the "Guidelines on writing assignments", on the course
webpage.
- Homework 6. Due Sept 30, 10 PM.
- Review questions:
Chapter 9, review questions 1-14
-
HW problems:
Chapter 9 problems 1-3,
-
Read the following papers, and also find at least one other paper
on the topic of multiple sequence alignment that cites at least one
of these papers.
Write a review of the papers you've read, summarizing
and critiquing each paper, and provide comments on your
thoughts overall after having read the papers.
Be sure to comment on
your opinion on whatever debates might be present in these
papers.
Your review should be at 11 pt font, 1.5" margins,
single spaced, and at least 3 pages (not counting the references.
The grade will depend on writing as well as content.
-
Boyce, Kieran, Fabian Sievers, and Desmond G. Higgins. "Simple chained guide trees give high-quality protein multiple sequence alignments." Proceedings of the National Academy of Sciences 111.29 (2014): 10556-10561.
-
Garriga, E., Di Tommaso, P., Magis, C. et al. Large multiple sequence alignments with a root-to-leaf regressive method. Nat Biotechnol 37,
1466--1470 (2019). https://doi.org/10.1038/s41587-019-0333-6)
- Homework 7. Due October 6, 9 AM.
- Note assigned reading (above).
- Chapter 9, homework problems 10-12.
- Give a polynomial time algorithm for the
following problem. The input is a pair of strings, and the output
is the length of the longest common substring.
Note specifically that this means there are no omitted letters, which is allowed
in common subsequences.
For example, the longest shared substring between CAATAGAC and CCGTAGATTTA is TAGA
but TAGA is not the longest common subsequence.
Give a big-O analysis for the runtime.
Full credit depends on how efficient the algorithm is.
- Course Project Assignment 1. Due October 18, 10 PM. (This only
counts towards class participation.)
Submit a 1-2 page course project proposal. Detail is not
important, as we will iterate together based on whatever you submit.
Explain the problem you wish to address and what you plan to do.
This can be a literature survey if you do not want to do
research.
If you are not working alone, then say who you
will work with;
in general, teams should not be bigger
than two people.
Provide a list of papers related to your topic, with at least 5 papers
(and note that most should be relatively recent, within the last
5 to 10 years).
-
October 24: homework associated with student presentation
on October 25 (see Moodle, and also see information above).
- Course Project Assignment 2. Due October 24. Revised proposals due. Please see and read instructions on project proposals.
- October 26: homework associated with student presentation on October 27.
- October 29: homework associated with student presentation on November 1
- Course Project Assignment 3. Due October 31. Final version of course proposals due. Please see and read instructions on project proposals.
- Nov 2: homework associated with student presentation on November 3
- Nov 7: homework associated with student presentation on November 8
- Nov 9: homework associated with student presentation on November 10
- Nov 16: homework associated with student presentation on November 17
- Nov 28: homework associated with student presentation on November 29
- Course Project Assignment 4 (FINAL). Due last day of class.