CS 466: Introduction to Bioinformatics
All homework must be submitted through Moodle in PDF
format and are due
at 1 PM on their due date.
Please make sure that the homeworks are legible.
Unless otherwise specified,
all homework problems are
taken from the homework or review problems in the Computational Phylogenetics
textbook provided at the end of the specified chapter.
Please note the difference between homework and review problems!
Rules regarding homework (academic integrity code):
Do not copy any text from any source without proper attribution.
The safest way to do this is to put your copied text in quotes
and specify the source.
Even paraphrasing is considered plagiarism.
You are not allowed to consult with anyone outside of the class (other than the instructor and TA) about homework problems. However, you are encouraged to work with other students currently enrolled in this course on the homework. If you work with someone else, observe the following guidelines: (1) indicate on the homework who you worked with, (2) do not look at the other homework solutions when you write your own solutions, (3) do not share your solutions with anyone, and (4) write your homework solutions entirely on your own, using your own language.
Homework regrade policy: You can request regrading of
up to three problems in total (note - not three homeworks).
To request a regrade, let me know
within one week of the homework
return date, and I will give you a new
problem assignment. You will need to submit
your answer via Moodle, in the same PDF as the next homework
(i.e., in one combined PDF).
Extra credit policy:
There are two types of extra credit homework
assignments. Some of these are explicitly identified
as such, and all students can do these for
extra credit (see HW 13). Also, some of
the homeworks that are assigned to the graduate
students taking 4 credit units can be done by
other students for extra credit; see
HW 5, Hw 6, and HW 8 (Part 1).
A grade of 80% or better on any extra credit homework can then be used
to replace any prior homework grade.
- In general,
homeworks submitted late but before 24 hours after
the deadline receive 80% of the grade.
Homework submitted after 24 hours but
before 48 hours receive 60% of the grade.
No homework will be accepted after 48 hours past the
some of these homeworks are questions on papers that
will be presented or on course project proposals; these
are subject to a more restricted
late policy. For these homeworks, submitting after the
deadline but before the next morning at 7 AM will
receive 50% credit, and homework submitted after
that will not be accepted.
The assignments based on comments or questions for
papers or course project proposals count as 1/3 the credit of
the earlier homeworks.
Graduate students receiving 4 graduate units:
Extra assignments are provided that you should do, in
addition to the regular homework problems.
(Other students are allowed to do these for "extra credit",
as described above.)
April 13 (HW 14):
Submit comments on all project proposals presented on April 12.
April 15 (HW 15):
Submit comments on all project proposals presented on April 14.
April 18 (HW 16):
Submit comments on all project proposals presented on April 17.
April 20 (HW 17):
Submit comments on all project proposals presented on April 19.
April 21 (HW 18):
Submit at least two questions
about the paper on Clustal Omega (reading assignment for April 19);
at least one question must be about the supplementary materials.
To receive credit for this homework, you must
ask the questions during the
Submit draft of course project on Moodle (HW 19).
May 3: Submit final project on Moodle (this will appear as
a homework assignment).
- January 21: If you
did not take CS 173, then do
homework problems 7, 14, 15, and 29 from Appendix B.
(This homework does not count towards your grade.)
- January 24
Homework problems 1, 3, 11, 13, 14, 17, and 27
from Appendix B.
January 31 (HW 2):
(a) Homework problems 19, 20, and 21 from Appendix B.
(b) Write down
the De Bruijn graph for the string given by ACGTAACTAAGACGATAAT using k=4
(i.e., the vertices of the graph refer to 3-mers).
- February 7 (HW 3):
- Part 1:
Play the rockgame from
Homework problem 35 from
Appendix B with a friend. Try various starting conditions
(e.g., 3 rocks on each pile, or 3 rocks on one pile and 4 rocks on
the other, etc.).
Suppose you are the first player and you try really hard; do you think you
can always win, no matter what your opponent does when
the starting condition has 10 rocks on each pile?
What if the starting condition has 10 rocks on one pile and 11 on
- Part 2:
Write a short (2-3 page) summary of the three papers you read
(see assigned reading for January 30 and February 1). Make
sure you provide a
bibliography and your thoughts on the differing
opinions. Submit in PDF format on Moodle.
Please note that your grade will be based on
both what you say and your writing (grammar, spelling, etc.).
- February 14 (HW 4):
1. Homework problems 1-3 from Chapter 9.
2. Consider a two-player rock game where there are
two piles of rocks.
In each round, a player picks a pile and takes one or two rocks off.
The player who removes the last rock wins. Give
a dynamic programming algorithm to compute the winner.
Thus, the input will be the number of rocks on each pile,
and the output will be "1" if the first player is guaranteed
to have a winning strategy, and "2" if
the second player has a winning strategy.
Hence, your DP algorithm should compute a matrix WINNER(i,j) indicating who wins on input
with i rocks on one pile and j rocks on the other.
Analyze the running time for your algorithm, and explain why it
Show the matrix computed by your DP algorithm on input
where there are 5 rocks on one
pile and 6 rocks on the other pile.
(See homework problem 35 from
Appendix B as an example of this kind of problem.)
3. Design a dynamic programming algorithm to
compute the minimum edit distance between two
binary strings (i.e., strings of 0s and 1s), where
indels each cost C and substitutions cost C'.
Hence, the input will be four things:
strings X and Y, and the constants C and C'.
Your PDF should include:
- Your algorithm in PSEUDO-CODE, with explanations for the meaning
of each variable and what each line does.
complete Big-O analysis of the
matrix computed by your algorithm on
strings X=01000 and Y = 100011,
where C=2 and C'=3.
- February 21 (HW 5):
Do homework problem 11 from Chapter 9, and
answer all the questions from this document.
Graduate students receiving 4 graduate units
(extra credit for everyone else):
Implement the algorithm for the rock game from February 14 (any
language you like).
Run it for the following input cases:
Submit PDF of your
(well-documented) code and the output on the test data.
(a) 5 rocks on each pile and
(b) 100 rocks on each pile.
- February 28 (HW 6):
Do any 10 of review questions 1-14 from Chapter 9.
(b) Graduate students receiving 4 graduate units
(extra credit for everyone else):
Do homework problem 10 from Chapter 9.
- March 7 (HW 7):
Do (a) any 5 of the review questions from Chapter 1 and
(b) homework problems 1 and 2 from Chapter 1.
Graduate students receiving 4 graduate units:
Pick two papers (published in the last 5 years) you
would like to present in the course, and submit
a PDF for each paper
and an explanation of
why you selected them.
I will let you know which one you should
Note: the papers must be directly related to the
course topics: algorithmic aspects of genome assembly, multiple
sequence alignment, or phylogeny estimation.
In some cases, the
application of these methods to a scientific question
may be acceptable.
You may wish to select a paper that is discussed in
- March 14 and 18 (HW 8):
- Due March 14:
Do review questions 13-16 and homework problem 10 for Chapter 8.
Graduate students receiving 4 graduate units :
- Part 1 (extra credit for other students):
Homework problem 15 from Chapter 5 (due March 18)
Homework problem 13 from Chapter 8 (due March 18)
- Part 2:
paper selected for your presentation in PDF format; please note that the paper
will be distributed to the students in the class and
posted on the class website.
You will also need to submit
the PDF of your presentation (to me and the TA) a minimum of 48 hours
before your presentation time.
The presentations will be later in the semester.
- March 26 (HW 9): See Moodle for problem set. (This homework does
not have a grace period for submitting answers, and counts more than
the usual homework assignment.)
- April 3 (HW 10): Final project proposals (2-4 pages, with bibliography).
- April 8 (HW 11): Revised project proposal (2-5 pages, with bibliography).
- April 11 (HW 12a,b and 13):
- Submit comments on all project proposals presented on April 10.
Submit a PDF with figures showing an alignment and/or a
tree that you computed
on one biological dataset, and provide details for how you calculated
12b: Extra credit (do at most one). Each of the
following problems involves analyzing a dataset
Implement a method for the following problem:
Submit your code (well documented) and show
the cost of T for input that I will provide.
Note: develop the code entirely by yourself - do not copy code
for MST from any external source.
- Input: A multiple sequence
alignment M on a set S of sequences, and
a cost C for substitutions and C' for indels.
- Output: a minimum spanning tree
for the set S where the cost of an edge
(v,w) is the cost of the pairwise
alignment of u and v according to the
multiple sequence alignment and
the costs C and C'.
Construct a phylogenetic tree on a biological dataset (provided at
the URL listed below) using at least
two different multiple sequence alignment methods, followed
one or more phylogeny estimation methods (e.g., neighbor joining,
maximum parsimony, maximum likelihood).
Write a paper about your analysis, including
full details for how you analyzed the dataset (software,
version numbers, commands) and draw the trees you computed.
If the trees are different, comment on the differences.
See https://pranj.al/mammalian-tutorial.html for the dataset
and links to software for
getting started with this.
- HW 13: Second revision of project proposals (2-5 pages, with bibliography).