Homework for CS 581, Fall 2021
Homework policies
Due date:
All homework is due at 10 PM on the due date, via Moodle (unless
otherwise specified).
Late homework (up to 48 hours late) may be accepted for reduced credit:
80% if submitted within 24 hours and 60% if submitted within 48 hours.
Collaboration policy:
You are expected to write up the homework yourself, but you are
welcome to discuss the homework with other students in the class.
If you discuss the homework with other students, clearly specify this on
your homework.
Reading assignments:
Some homework assignments include problems from the textbook, and
many involve reading the textbook or published papers.
Class discussion depends on your doing the reading,
as I will not be teaching all the material.
Review questions:
The textbook has two types of questions: review questions and
homework problems.
In general, I will assign homework problems rather than review questions
(although some homework assignments are exceptions).
We may discuss the review questions in class, so
please look over them as well.
Disputing a grade:
Please come see me directly if you have questions
or concerns about how your homework was graded.
Grading policy:
The homework overall contributes 40% of the course grade,
and each homework (unless otherwise specified) contributes the same amount.
The lowest homework grade is dropped.
Reading assignments
Textbook assignments do not include the "Further Reading"
sections, and page counts are approximate. After we finish the
textbook assignments, we will also read from the scientific literature,
with 1-2 papers assigned each week.
In general, expect to read about 10-20 pages each week.
- August 31: Chapter 1 (PDF) (24 pages)
- September 3: Chapters 2, 3, 5.1-5.5.1 (18+7+7=32 pages)
- September 6: Chapter 4.1-4.4, 6.1-6.2, 7.1-7.2, 7.5 (11+7+4+2=24 pages)
- September 8: Chapters 5.6, 8.1-8.2, 8.4-8.8 (1+19=20 pages)
- September 13: Chapter 10.1-10.5 (20 pages)
- September 15: Chapter 10.7 (5 pages)
- September 20: Chapter 9.1-9.5 (17 pages)
- September 22: Chapter 9.6-9.12 (18 pages)
- September 27: "The Effect of the Guide Tree on Multiple Sequence Alignments and Subsequent Phylogenetic Analyses," by Nelesen et al. (PSB 2008) (PDF).
- October 14: "Recent progress on methods for estimating and updating large phylogenies," by Paul Zaharias and Tandy Warnow.
- October 19: Read the papers presented by students (Yasamin's paper and Sarthak's paper), and submit your write-up (a paragraph and 2-3 questions) for each paper in Moodle.
- October 21: Read the papers presented by students (Homa's paper and Melissa's paper), and submit your write-up (a paragraph and 2-3 questions) for each paper in Moodle.
- October 26: Read the papers presented by students: Greg's paper and Akhil's paper.
- October 28: Read the papers presented by students: Mingye's paper and Kowshika's paper.
- November 2: Read Yutong's paper.
- November 15: Book chapter on phylogenetic networks (HTML) by Luay Nakhleh, who speaks on Tuesday, November 16.
Homework Assignments
- Homework 0. Due Tuesday, August 31. This will not be graded.
- All review questions for Chapter 1.
- Chapter 1: problems 1, 8, 10, and 11.
- Homework 1. Due Friday, September 3, 10 PM.
- Chapter 2: problems 29 and 32.
- Chapter 3: problems 5 and 22.
- Homework 2. Due Friday, September 10, 10 PM.
- Chapter 4: problems 1, 5, 6, and 17.
- Chapter 5: problems 7, 9, 12, and 18.
- Chapter 8: problems 7-9.
- Homework 3. Due Friday, September 17, 10 PM.
- Chapter 6: problems 2, 4, and 8.
- Chapter 7: review questions 1, 2, and 4.
- Chapter 10: problems 4-7 and 10.
- Homework 4. Due Sunday, September 26, 10 PM (late homework will not be accepted for this assignment).
In this week's homework, you will analyze datasets using
phylogeny estimation software and write up your results.
For this homework, you may discuss the assignment with others,
but you must write your own scripts, analyze the data yourself,
and write everything up yourself.
You will need help from Eleanor, so please plan ahead.
Write up your results in enough detail for your experiment to be reproducible
(i.e., include method version numbers and the commands you ran).
Comment on the following: What did you expect to see? What did you see?
What did you learn?
Include properly cited references for any software that you use.
- This is for gene tree estimation using "true alignments".
Familiarize yourself with the simulated datasets from the 1000M1 and 1000M4
model conditions (see this webpage), as described in the paper that introduced them
(Liu et al., "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009).
The datasets themselves are at this page.
You will construct trees on true alignments for 5 replicates,
using 5 methods: FastTree (under GTR and also under JC) and Neighbor Joining
(using logdet distances, JC distances, and p-distances).
Note that you may also want to try other methods, but first obtain
results for the five listed above.
For each tree you compute, compare it to the PIMT for
the dataset, and record the false negative (FN) and false positive (FP) error rates.
Do not be surprised if the FN and FP rates differ, and do think about what this means.
Make a table or figure showing the average tree error of each of the five methods
under each of the two model conditions.
For each of the two model conditions (i.e., 1000M1 and 1000M4), comment on all trends you observe, including:
- Which method is the most accurate?
- Does changing the estimation model for FastTree impact accuracy?
If so, how?
- Does changing the distance correction for Neighbor Joining
impact accuracy? If so, how?
Next, compare results under the model conditions 1000M1 and 1000M4:
- Do any of the trends above change as you change model conditions?
- Does the relative accuracy of methods remain the same between
the model conditions?
- One of the model conditions is "easier" (in that the methods have higher
accuracy); which one? What is it about that model condition (i.e., its
numeric parameters and/or the empirical statistics of the
sequences it produces) that suggests why it is easier than the other?
Also comment on the following:
- Describe how the data were generated (this requires you to look at the 2009 Science paper). What sequence evolution model was used?
- Which of the methods that you ran are statistically consistent for the model that generated the data? Justify
your answers.
- Did the statistically consistent methods
always produce more accurate trees than the methods not
guaranteed to be consistent?
Note: your grade for this assignment will be based on
both content (75%) and writing (25%).
Please review the "Guidelines on writing assignments" on the course
webpage.
- Homework 5. Due October 5, 10 PM.
- Review questions: Chapter 9, review questions 1-14.
- Homework problems: Chapter 9, problems 1-3, 10, 11, and 12.
- Write a summary (1-2 pages) of the linked paper (PDF).
What problem does the paper address? What is the goal? What does it achieve?
What techniques are used, and are they successful?
What do you think would be the next steps?
- Homework 6. Due October 18, 10 PM. (This counts only toward class participation.)
Submit a 1-2 page course project proposal. Detail is not
important, as we will iterate together based on whatever you submit.
Explain the problem you wish to address and what you plan to do.
This can be a literature survey if you do not want to do
research.
If you are not working alone, say whom you will work with;
in general, teams should have no more than two people.
Provide a list of at least 5 papers related to your topic; most should
be relatively recent (within the last 5 to 10 years).
- Starting Monday, October 18, write a one-paragraph summary
and 2-3 questions for each paper being presented the next day,
and send them to the presenting student by 10 AM the day before the presentation
(cc Tandy and Eleanor).
- Homework 7. Due October 24. Revised project proposals due. Please read the instructions on project proposals.
- Homework 8. Due October 31. Final project proposals due. Please read the instructions on project proposals.
- Course project due on the last day of class.
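For Homework 4, two quantities come up repeatedly: distances between aligned sequences (p-distances and their Jukes-Cantor correction) and the FN/FP error rates of an estimated tree relative to the true tree. The Python sketch below illustrates both under stated assumptions. Gapped sites are skipped when computing p-distances (an assumed convention; tools may handle gaps differently). The JC correction is the standard JC69 formula d = -(3/4) ln(1 - (4/3)p). Trees are written as nested tuples of leaf names, a hypothetical stand-in for parsed Newick output. FN rate is the fraction of the true tree's nontrivial bipartitions missing from the estimated tree, and FP rate is the fraction of the estimated tree's nontrivial bipartitions absent from the true tree. The sketch does not implement the logdet distance or Newick parsing, and it is not a replacement for the software you run and cite.

```python
import math
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical tree encoding: nested tuples of leaf names, e.g. ((("a", "b"), "c"), ("d", "e")).
# Real FastTree / Neighbor Joining output is Newick and would have to be parsed first.
Tree = Union[str, tuple]


def p_distance(s1: str, s2: str) -> float:
    """Fraction of compared sites at which two aligned DNA sequences differ.

    Assumption: sites where either sequence has a gap ('-') are skipped.
    """
    if len(s1) != len(s2):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped sites (assumed convention)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def jc_distance(p: float) -> float:
    """JC69 correction of a p-distance; infinite once p >= 3/4 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf names at or below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does not contain a fixed
    reference leaf, so the same split gets the same key in every tree.
    """
    everything = leaf_set(tree)
    ref = min(everything)
    n = len(everything)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = everything - below if ref in below else below
        # Keep only nontrivial splits: both sides have at least two leaves.
        if 2 <= len(side) <= n - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def fn_fp_rates(true_tree: Tree, est_tree: Tree) -> Tuple[float, float]:
    """(FN rate, FP rate) of est_tree relative to true_tree on the same leaf set."""
    if leaf_set(true_tree) != leaf_set(est_tree):
        raise ValueError("trees must have the same leaf set")
    true_splits = bipartitions(true_tree)
    est_splits = bipartitions(est_tree)
    # FN: true bipartitions missing from the estimate.
    fn = len(true_splits - est_splits) / len(true_splits) if true_splits else 0.0
    # FP: estimated bipartitions that are not in the true tree.
    fp = len(est_splits - true_splits) / len(est_splits) if est_splits else 0.0
    return fn, fp


if __name__ == "__main__":
    print(p_distance("ACGT-A", "ACCTTA"))  # 0.2 (1 difference over 5 ungapped sites)
    print(jc_distance(0.2))  # larger than 0.2: the correction accounts for unseen multiple substitutions
    true_tree = ((("a", "b"), "c"), ("d", "e"))
    est_tree = ((("a", "c"), "b"), ("d", "e"))
    print(fn_fp_rates(true_tree, est_tree))  # (0.5, 0.5)
```

Storing each split as the side without a fixed reference leaf is one simple way to make bipartitions from differently rooted trees directly comparable with set operations.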