Homework for CS 581, Fall 2022
All homework is due at 10 PM on the due date, via Moodle (unless
Late homeworks (up to 48 hours late) can be accepted for reduced
80% if within 24 hours and 60% if within 48 hours.
You are expected to write up the homework yourself, but you are
welcome to discuss the homework with other students in the class.
If you discuss the homework with other students, clearly specify who you
worked with on
your submitted homework.
Some homeworks involve homework problems from the textbook, and
many homework assignments involve
reading the textbook or published papers.
The class discussion depends on you doing the reading,
as I will not be teaching all the material.
The textbook has two types of questions: review questions and
In general, I will be assigning problems from the
homework problems and not from the review questions (although
do note that this is not true for some homework assignments).
We may discuss
the review questions in class, so
please look over the review questions as well.
Disputing a grade:
Please come see me directly if you have questions
or concerns about how your homework was graded.
The homework overall contributes 40% of the course grade,
and each homework (unless otherwise specified) contributes the same amount.
The worst two hw grades are dropped.
The assignments from the textbook do not include the "Further Reading"
Also note that we will
be including assigned reading from the scientific literature,
with 1-2 papers a week assigned after we finish going through the
In general, reading about 10-20 pages each week is to be expected.
The due dates for the reading assignments are the night before the class meeting,
which means you should come to class having already done the assigned reading.
- August 29: Chapter 1
and Chap 2.1-2.2, 3.1-3.4.2
August 31: Chapters 2.3-2.10, 3.4.3-3.5, 5.1-5.5.1, 6.1-6.2.
- September 5: Chapter 4.1-4.7, 8.1-8.8
- September 7: Chapters 5.6-5.7, 5.10, 5.11, 8.13
- September 12: Chapter 10.1-10.5,
and Zaharias and Warnow 2022
- September 14: Chapter 10.6-10.7, 7.1-7.7.9, 11.8
- September 19: Chapter 9.1-9.5
- September 21: Chapter 9.6-9.12
- October 10: Chapter 9.13-9.20
- Homework 0. Due Tuesday, August 30. This will be graded but
the grade will not count towards the course grade.
All review questions for Chapter 1.
- Chapter 1, problems 1, 8, 10, and 11.
- Appendix B, problem 21
- Prove by induction on n (where n is a positive integer)
that every connected graph on n nodes
has at least n-1 edges.
- Homework 1. Due Friday, September 2, 10 PM.
Chapter 2: problems 29 and 32.
Chapter 3: problems 5 and 22.
Homework 2. Due Friday, September 9, 10 PM
Chapter 4 problems 1, 5, 6, 17.
Chapter 5 problems 7, 9, 12, 18.
Chapter 8 problems 7-9.
Homework 3. Due Friday, September 16, 10 PM
- Chapter 6, problem 2,4,8
- Chapter 7, review questions 1,2,4.
Homework 4. Due Friday, September 23, 10 PM.
- Chapter 10, problems 4-7, 10
- Homework 5. Due Sunday, September 25, 10 PM (but
late homework past this time not allowed).
week's homework, you will be doing analyses of datasets using
phylogeny estimation software, and you will write it all up as well.
The collaboration policy for this homework is you can discuss
with others, but you must write your own scripts, analyze the
data yourself, and write it all up yourself.
You will need to get help from
the TA, so please get ready.
Write up your results sufficiently for the experiment you did to be reproducible
(i.e., method version numbers and commands).
Comment on: what did you expect to see? What did you see?
What did you learn?
Include properly cited references for any software that you use.
This is for gene tree estimation, using ``true alignments".
Familiarize yourself with
the description of the
simulated datasets from the 1000M1 and 1000M4 model conditions
(see this webpage)
from the original
paper that introduced them (Liu et al., "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.).
The datasets themselves are at this page.
You will construct trees on true alignments for 5 replicates,
using 5 methods: FastTree (under GTR and also under JC) and Neighbor Joining
(using logdet, JC distances, and p-distances).
Note, you may also want to try some other methods, but first get
the 5 listed above.
For each tree you compute, compare it to the PIMT for
the dataset, and record the FN and FP error rates.
Do not be surprised if they are not the same, and do think about what this means.
Make a table or figure of average tree errors for each of the five methods for
the two models.
For each of the two model conditions (i.e., 1000M1 and 1000M4), comment on all trends you observe, including:
Next, compare results under the model conditions 1000M1 and 1000M4:
- Which method is the most accurate?
- Does changing the estimation model for FastTree impact accuracy?
If so, how?
- Does changing the distance correction for Neighbor Joining
impact accuracy? If so, how?
Also comment on the following:
- Do any of the trends above change as you change model conditions?
- Does the relative accuracy of methods remain the same between
the model conditions?
- One of the model conditions is "easier" (in that the methods have higher
accuracy); which one? And what is it about the model condition (i.e.,
numeric parameters and/or empirical statistics of the
sequences that are produced) that
suggests why that model condition is easier than the other?
Note: your grade for this assignment will be based on
both content (75) and writing (25%).
Reproducibilty is part of content.
Please review the "Guidelines on writing assignments", on the course
- Describe how the data were generated (this requires you look at the paper from Science 2009); what sequence evolution model was used?
- Which of the methods that you ran is statistically consistent for the model that generated the data? Justify
- Did the statistically consistent methods
always produce more accurate trees than the methods not
guaranteed to be consistent?
- Homework 6. Due Sept 30, 10 PM.
- Review questions:
Chapter 9, review questions 1-14
Chapter 9 problems 1-3,
Read the following papers, and also find at least one other paper
on the topic of multiple sequence alignment that cites at least one
of these papers.
Write a review of the papers you've read, summarizing
and critiquing each paper, and provide comments on your
thoughts overall after having read the papers.
Be sure to comment on
your opinion on whatever debates might be present in these
Your review should be at 11 pt font, 1.5" margins,
single spaced, and at least 3 pages (not counting the references.
The grade will depend on writing as well as content.
Boyce, Kieran, Fabian Sievers, and Desmond G. Higgins. "Simple chained guide trees give high-quality protein multiple sequence alignments." Proceedings of the National Academy of Sciences 111.29 (2014): 10556-10561.
Garriga, E., Di Tommaso, P., Magis, C. et al. Large multiple sequence alignments with a root-to-leaf regressive method. Nat Biotechnol 37,
1466--1470 (2019). https://doi.org/10.1038/s41587-019-0333-6)
- Homework 7. Due October 6, 9 AM.
- Note assigned reading (above).
- Chapter 9, homework problems 10-12.
- Give a polynomial time algorithm for the
following problem. The input is a pair of strings, and the output
is the length of the longest common substring.
Note specifically that this means there are no omitted letters, which is allowed
in common subsequences.
For example, the longest shared substring between CAATAGAC and CCGTAGATTTA is TAGA
but TAGA is not the longest common subsequence.
Give a big-O analysis for the runtime.
Full credit depends on how efficient the algorithm is.
- Course Project Assignment 1. Due October 18, 10 PM. (This only
counts towards class participation.)
Submit a 1-2 page course project proposal. Detail is not
important, as we will iterate together based on whatever you submit.
Explain the problem you wish to address and what you plan to do.
This can be a literature survey if you do not want to do
If you are not working alone, then say who you
will work with;
in general, teams should not be bigger
than two people.
Provide a list of papers related to your topic, with at least 5 papers
(and note that most should be relatively recent, within the last
5 to 10 years).
Starting October 17, submit a one-paragraph summary
and 2-3 questions for each paper that is being presented the next day,
and send it to the student doing the presentation by 10 AM the day before
(with a cc to Tandy and Yasamin).
- Course Project Assignment 2. Due October 24. Revised proposals due. Please see and read instructions on project proposals.
- Course Project Assignment 3. Due October 31. Final version of course proposals due. Please see and read instructions on project proposals.
- Course Project Assignment 4 (FINAL). Due last day of class.