Homework Assignments for CS 581, Spring 2025
The homeworks are typically due on Wednesdays, and need to be submitted
to Moodle. You can submit as many times as you want, up until the deadline.
Note: the deadline is now 11 AM on Wednesdays, so I can go over solutions
in class on Thursdays.
The homework comes in at least three types:
-
Some homework involves doing mathematical problems, typically
taken from the textbook.
-
Some homework involves reading papers in the scientific literature,
then summarizing and critiquing them.
Please see this document
for advice specific to this type of assignment.
-
Other homework involves running computational analyses on the
Campus Cluster using the Instructional queue and writing up a report on what you did. Please note that these assignments do
require your early attention, because if you have problems using the
campus cluster or the specified software, you will need help from the
TA (Eleanor Wedell), and it is best to get that help early.
Note that the homework requires writing, and that for the second and third
types of assignment the writing quality will be evaluated and will be considered in the homework grade.
Please
see my advice about writing at this page.
Homework #1 does not count towards your grade.
Of Homeworks 2 and later, the worst homework grade is dropped.
Use of AI in homework assignments is not allowed:
Unless specifically indicated otherwise, the use of any assistive AI is not allowed, not even for grammar checking.
Do not consult any external sources, such as the web.
When you are asked to write something, you must do this yourself, without use of assistive AI. (Spell checkers and grammar checkers are fine.)
Please also read carefully the rules about plagiarism, which also include paraphrasing other work.
Using Moodle and the late homework policy:
The homework needs to be written clearly (preferably using LaTeX), saved as a PDF, and submitted in Moodle. You can revise your submission as many times as you want before the deadline, so early submission is encouraged. Submissions after the deadline are penalized, as indicated on the course webpage (i.e., 25% of the points are removed if submitted late but within 24 hours, and no submissions are allowed afterwards). If you have a real emergency (e.g., you are in the hospital, or you are attending a funeral), please let me know as early as possible so I can consider your case specially. Please do note that the bottom homework grade is dropped, and since there will be many homeworks, no single homework counts that much. In other words: do not stress about this.
Collaboration policy:
The first homework was specifically for each person to work without any collaboration
with any other student.
For other homeworks (unless specified), you are allowed to collaborate with
other students in the class (but not outside the class).
If you do this, you must write the name of the other student with whom you
worked on the homework.
You must write up your solutions yourself, and you must not look at each other's write-ups.
Justifying answers:
In the homework assignments, please justify your answers and explain your
reasoning - don't just put down a solution.
For example, if a problem asks "Is this set of characters compatible?", don't just say
Yes or No; explain how you derived the answer.
Reading assignments:
Each homework has reading assignments.
Any reading assignment that is from the textbook should be completed
by Monday of that week, since the Tuesday lecture will assume you have
read the material (and know all the content, including definitions).
Not all content in the assigned reading is covered in class, but you are
nevertheless responsible for it.
Homework assignments
-
Homework #1.
Due Monday January 27, 5 PM.
This homework is designed to evaluate your preparedness for
the course, and should be relatively easy if you have mastered
the course pre-requisites.
(PDF)
-
Homework #2.
Reading assignment (due Monday, Feb 3)
- Chapter 1 and Chapter 2.1-2.2, 3.1-3.4.2
from the textbook.
- The paper by Nakhleh et al. from PSB 2002
(PDF)
Homework (for submission in Moodle, due Wednesday Feb 5)
-
All review questions from Chapter 1.
-
Write a 1-2 page summary and critique of the Nakhleh et al. paper.
See my advice about how to write such critiques at this page.
Please do take care with your writing!
-
Chapter 2, problems 29 and 32.
-
Chapter 3, problems 5 and 22.
-
Homework #3.
Reading assignment (due Monday, Feb 10)
-
Chapter 2.3-2.8, 3.5, 4.1-4.7, 5.1-5.6, 5.10, 6.1-6.2, 8.1-8.8
Homework (for submission in Moodle, due Wednesday Feb 12)
-
Chapter 4, problems 5, 6, 17
-
Chapter 5, problems 7, 9, 12, 18
-
Chapter 6, problems 2, 4, 12
-
Homework #4
Reading assignment (due Monday, Feb 17)
Homework (for submission in Moodle, due Wednesday, Feb 19)
-
Chapter 8, problems 7-9, 13, 14
(For problems 13 and 14, give high-level arguments;
do not calculate exact probabilities.)
-
Chapter 10, problems 4-7, 9
- Homework #5
Reading assignment (due Monday Feb 24)
Homework (for submission in Moodle, due Wednesday Feb 26)
- Pick one of the assigned papers from HW 4 or 5.
For that paper, look at the papers that cite it, and pick two that are
sufficiently close in topic and that either present an algorithm or discuss
methodological issues.
Write a 3-4 page review and critique of the three papers that
compares them and explains how they are similar or different in findings.
Make sure to provide a proper bibliography.
(This part of the homework counts for 60 points)
- Chapter 9, problems 1, 3, 5, 11 (this part counts for 40 points)
- Homework #6 (due March 13 at 11 PM) - late submission not allowed
- Familiarize yourself with the datasets at this link
- Read the paper by Liu et al. from Homework #5 carefully.
- In this
week's homework, you will be doing analyses of datasets using
phylogeny estimation software, and you will write it all up as well.
The collaboration policy for this homework is that you can discuss it
with others, but you must write your own scripts, analyze the
data yourself, and write it all up yourself.
You will need to get help from
the TA, so please get ready.
Write up your results in enough detail for your experiment to be reproducible
(i.e., include method version numbers and commands).
Comment on: what did you expect to see? What did you see?
What did you learn?
Include properly cited references for any software that you use.
-
This is for gene tree estimation, using "true alignments".
Familiarize yourself with
the description of the
simulated datasets from the 1000M1 and 1000M4 model conditions from
the original
paper that introduced them (Liu et al., "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009).
The datasets themselves are at the link above.
You will construct trees on true alignments for 5 replicates,
using 5 methods: FastTree (under GTR and also under JC) and Neighbor Joining
(using JC distances, logdet, and p-distances).
Note: you may also want to try some other methods, but first get
the ones listed above working.
For each tree you compute, compare it to the PIMT for
the dataset, and record the FN and FP error rates.
Do not be surprised if they are not the same, and do think about what this means.
Make a table or figure of average tree errors for each of the five methods for
the two models.
For each of the two model conditions (i.e., 1000M1 and 1000M4), comment on all trends you observe, including:
- Which method is the most accurate?
- Does changing the estimation model for FastTree impact accuracy?
If so, how?
- Does changing the distance correction for Neighbor Joining
impact accuracy? If so, how?
Next, compare results under the model conditions 1000M1 and 1000M4:
- Do any of the trends above change as you change model conditions?
- Does the relative accuracy of methods remain the same between
the model conditions?
- One of the model conditions is "easier" (in that the methods have higher
accuracy); which one? And what is it about the model condition (i.e.,
numeric parameters and/or empirical statistics of the
sequences that are produced) that
suggests why that model condition is easier than the other?
Also comment on the following:
- Describe how the data were generated (this requires that you look at the 2009 Science paper); what sequence evolution model was used?
- Which of the methods that you ran is statistically consistent for the model that generated the data? Justify
your answers.
- Did the statistically consistent methods
always produce more accurate trees than the methods not
guaranteed to be consistent?
Note: your grade for this assignment will be based on
both content (75%) and writing (25%).
Reproducibility is part of content.
Please review the "Guidelines on writing assignments", on the course
webpage.
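As an illustration of the FN and FP error rates requested in Homework #6, here is a minimal Python sketch on two hand-written toy trees. It is only meant to make the definitions concrete; for the assignment itself you still need to write your own scripts and run them on the real datasets. The nested-tuple tree encoding and the function names (leaf_set, bipartitions, fn_fp_rates) are assumptions made for this example and are not part of any assigned software. The normalization used here (missing branches divided by the internal branches of the reference tree, false branches divided by the internal branches of the estimated tree) is one common convention, so please check it against the definitions in the textbook. Because the reference tree below has a polytomy while the estimated tree is binary, the two rates come out different.

```python
# Toy FN/FP computation on trees written as nested tuples of leaf names.
# The encoding and all function names are hypothetical, chosen for illustration.

def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions induced by the internal branches.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split is always represented the same way.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaf_set(node)
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def fn_fp_rates(reference_tree, estimated_tree):
    """Return (FN rate, FP rate) of the estimated tree against the reference."""
    ref = bipartitions(reference_tree)
    est = bipartitions(estimated_tree)
    # Assumed convention: a tree with no internal branches contributes rate 0.
    fn = len(ref - est) / len(ref) if ref else 0.0
    fp = len(est - ref) / len(est) if est else 0.0
    return fn, fp


# Reference tree with a polytomy (D, E, F); estimated tree fully resolved.
reference = (("A", "B"), "C", ("D", "E", "F"))
estimated = ((("A", "C"), "B"), (("D", "E"), "F"))

fn, fp = fn_fp_rates(reference, estimated)
print(f"FN = {fn:.3f}, FP = {fp:.3f}")  # prints "FN = 0.500, FP = 0.667"
```

In this toy case the reference tree has two internal branches and the estimated tree has three, so one missing branch gives FN = 1/2 while two false branches give FP = 2/3.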
- Optional homework
Note: if you do this homework, you can replace
your grade on one homework (and hence improve your course grade).
This is due by Monday April 21 (midnight)
-
Chapter 8, problem 11
-
Chapter 9, problem 10.
-
Chapter 10, problem 8
-
Write a 2-page review about the DEPP paper, discussing its
contributions, its
relationship to the other literature, etc.
Make sure to include a good bibliography.