CS 581: Algorithmic Genomic Biology
Detailed Syllabus:

Jan 17: Introduction to course

January 19 to Feb 2: Phylogenetic trees
 Reading assignment:
 January 19: Sections 1.11.4 from textbook
 January 24: Sections 1.51.8 from textbook
 January 26: Sections 8.18.7 from textbook
 January 31: Sections 4.14.3 and 8.8 from textbook
 February 2: Sections 11.1 and 11.2 from textbook
 Lectures:
 January 19: Is it just the data? (PDF)
and Introduction to trees
(PDF)
 January 24, 26, 31, and Feb 2: Phylogenetic tree estimation
(PDF)
(PPT)
(January 24 slides 146,
January 26 slides 4771,
January 31 slides 7288,
February 2 slides 89111)
 We also discussed the strict and majority
consensus trees, why they always exist and are unique,
and how to construct trees from sets of
bipartitions (i.e., binary characters).
See Sections 4.4.1, 6.2.1, and 6.2.2 for this
material.
 Feb 7: Pairwise sequence alignment

Reading assignment:

Sections 9.19.4 from Computational Phylogenetics.

Lectures:

DP algorithm for pairwise edit distance
(PDF)

Feb 9 and 14: Hidden Markov models and
their applications in bioinformatics.
 Reading assignment:
 Feb 9: Sections 9.69.7.2 from Computational Phylogenetics,
and
Mark Paskin's Introduction to Probability Theory (PDF),
 Feb 14:
Sections 9.7.3 and 9.7.4 from
textbook, and
Mona Singh's course notes on profile HMMs
(PDF),
 Lectures:

Introduction to HMMs
(PPTX)
(PDF)

The Viterbi Algorithm
(PDF).
If time permits, I will also talk
about "Ensembles of HMMs", see
(PDF).

Feb 16: Attend bioinformatics
talks from 1112 in CSL room B02 (Mike Nute will
talk about
HIPPI, research he
did for a prior version of this course).
Register for the CSL conference
at this website.

Feb 21: Multiple sequence alignment.
 Reading assignment:
 Feb 21: Section 9.5 and 9.7.5 from
textbook
 Lectures:
 Feb 21: Tree Alignment
(PDF)
and Multiple sequence alignment techniques
(PDF)

Feb 23March 16: Species tree estimation
 Reading assignment:
 Feb 23: Sections 9.119.13 from textbook
 Feb 28: Sections 9.159.19 from textbook.
 March 2: Sections 10.110.4 from textbook.
Read papers from the list below.
Also, during the March 216 period, each student
will present a paper in the course.
Make sure you have read the paper they selected
and submitted questions on the paper via Moodle
at least 24 hours before their presentation.
The boldfaced papers below are
those papers selected by students,
and also
those that
I will present.

E.S. Allman, J.H. Degnan, and J.A. Rhodes. Identifying the rooted species tree from the distribution
of unrooted gene trees under the coalescent. Journal of Mathematical Biology, 62:833862,
2011.

E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014.
syu087.

M.S. Bansal, G. Banay, J.P. Gogarten, and R. Shamir. Detecting highways of horizontal gene
transfer. Journal of Computational Biology, 18(9):10871114, 2011.

M.S. Bayzid, T. Hunt, and T. Warnow. Disk covering methods improve phylogenomic analyses.
BMC Genomics, 15(Suppl 6):S7, 2014. A preliminary version appeared in RECOMBComparative
Genomics.

V. Berry and O. Gascuel.
Inferring evolutionary trees with strong combinatorial evidence.
Theoretical Computer Science 24 (2000), 271298.

B. Boussau, G.J. Szollsi, and L. Duret. Genomescale coestimation of species and
gene trees. Genome Research, 23(2):323330, December 2013

M. Brinkmeyer, T. Griebel, and S. Bocker. Polynomial supertree methods revisited. Advances
in Bioinformatics, 2011. Article ID 524182, doi=10.1155/2011/524182.

D. Bryant and J. Lagergren.
Compatibility of unrooted phylogenetic trees is FPT.
Theoretical Computer Science 351 (2006), 296302.

D. Bryant and M.A. Steel. 2001. Constructing optimal trees from quartets. Journal of
Algorithms, 38, 237259.

J. Chifman and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent.
Bioinformatics, 30(23), 33173324.

J. Chifman and L. Kubatko 2015. Identifiability of the unrooted species tree topology
under the coalescent model with timereversible substitution processes, sitespecific
rate variation, and invariable sites. Journal of Theoretical Biology, 374, 3547.

G. Dasarathy, R. Nowak, and S. Roch 2015. Data requirement for phylogenetic inference
from multiple loci: a new distance method. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, 12(2), 422432.

M.I. DeGiorgio and J.H. Degnan 2010. Fast and consistent estimation of species trees
using supermatrix rooted triples. Molecular Biology and Evolution, 27(3), 55269.

L.S. Kubatko and J.H. Degnan 2007. Inconsistency of phylogenetic estimates from concatenated
data under coalescence. Systematic Biology, 56, 17.
 F.J. Lapointe and G. Cucumel. The average consensus procedure: combination of weighted
trees containing identical or overlapping sets of taxa. Systematic Biology, 46(2):306312, 1997.
 B. Larget, S.K. Kotha, C.N. Dewey, and C. Ané 2010. BUCKy: Gene tree/species tree
reconciliation with the Bayesian concordance analysis. Bioinformatics, 26(22),
29102911.

S. Mirarab and T. Warnow 2015. ASTRALII: coalescentbased species tree estimation
with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44i52.
 E. Mossel and S. Roch 2011. Incomplete lineage sorting: consistent phylogeny estimation
from multiple loci. IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 7(1), 166171.
 E. Mossel and S. Roch 2015. Distancebased species tree estimation:
information theoretic
tradeoff between number of loci and sequence length under the coalescent.
arXiv preprint. arXiv:1504.05289v1.

S. Roch and S. Snir 2013. Recovering the treelike trend of evolution despite extensive
lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology,
20, 93112.

S. Roch and M.A. Steel. Likelihoodbased tree reconstruction on a concatenation
of aligned sequence data sets can be statistically inconsistent. Theoretical Population
Biology 2015, 100, pp. 5662.

S. Roch and T. Warnow. 2015. On the robustness to gene tree estimation error (or lack
thereof) of coalescentbased species tree methods. Systematic Biology,
64(4), 663676.
 C. Scornavacca and N. Galtier. Incomplete lineage sorting in mammalian phylogenomics. Syst Biol 2017; 66 (1): 112120. doi: 10.1093/sysbio/syw082

J. Tonini, A. Moore, D. Stern, M. Shcheglovitova, and G. Orti.
Concatenation and species tree methods have
statistically indistinguishable accuracy under a range of
simulated conditions",
PLOS Currents: Tree of Life, 2015.
 Y. Yu, J. Dong, K. Liu, and L. Nakhleh. 2014. Maximum likelihood inference of reticulate
evolutionary histories. Proceedings of the National Academy of Sciences (USA),
111(46), 1644816453.

T. Zimmermann, S. Mirarab, and T. Warnow. 2014. BBCA: Improving the scalability of
*BEAST using random binning. BMC Genomics, 15(Suppl 6), S11. Proceedings of
RECOMBCG (Comparative Genomics).
 March 4: Section 10.5 from textbook.
 Lectures:
 Feb 23:
Coalescentbased species tree estimation (PDF)
 Feb 28: Introduction to supertree methods
(PDF)
 March 2: Introduction to
species tree estimation methods (PDF)
 March 7: Sarah Christensen, Yunan Luo, and
Muhammad Khan.

Sarah will present
"Constructing optimal trees from quartets", by
Bryant and Steel, Journal of
Algorithms 2001, 38, 237259.
(PDF)

Yunan will present
"Weighted quartets phylogenetics" by
Avni, Cohen, and Snir, Systematic Biology, 2015, 64(2):232242.
See (PDF).

Muhammad will present
"Recovering the treelike trend of evolution despite
extensive lateral genetic transfer: a probabilistic analysis"
by Roch and Snir, Journal of Computational Biology 2013, 20, 93112.
See (PDF).
 March 9: Shoham Das,
Jeremy Kemball, and Ben Kurtovic.
 March 14. Daewon Seo,
Thien Le, and Syed Shalan Naqvi.
 March 16. Qing Ye,
Ehsan Saleh, and
Shant Boodaghians.

March 2024: Spring Break

March 28: Preparation for midterm.

Reading assignment: Appendix B.10 from
Computational Phylogenetics
and Research Ethics.
Also see the following three presentations about
multiple sequence alignment methods, in practice

(PDF) (specifically look at slides 23, 25, 26, 30, 31, 4244)

(PDF) (about solutions to Generalized
Tree Alignment)

(PDF) (about the impact of
guide trees, and using SATé and PASTA to boost methods)

(PDF) (about method evaluation,
in general)

March 30: Distribution of midterm and discussion of the
questions
 April 4: Review of midterm
 April 6: Guest lecture by Erin Molloy
 April 1113: Largescale tree estimation

Reading assignment:
 April 11:
Sections 11.311.4 and 8.118.12.
 Introduction to chordal graphs and tree construction
(PDF)
 Why divideandconquer is helpful
(PDF)
 April 13:
Section 11.511.8
(PDF)
 April 18.
Introduction to historical linguistics.
(PDF)
 April 20.
Computing perfect phylogenies.
(PDF)
(PPT)
 April 25.
Mohammed ElKebir will talk
Combinatorial Algorithms in Tumor Phylogenetics.
Abstract:
Cancer is a genetic disease, where cell division, mutation and selection produce a heterogeneous tumor composed of distinct clones, i.e. different subpopulations of cells with different complements of mutations. In the later stages of cancer progression, cancerous cells from the primary tumor migrate and seed metastases at distant anatomical sites. Similarly to the evolutionary history of species, we can represent the cell division and mutation history of an individual tumor by a characterbased phylogenetic tree, where characters are mutations and taxa are clones. With cancer bulk sequencing data, however, we do not directly observe the leaves (taxa) of the phylogenetic tree. Instead, we are given variant allele frequencies that correspond to a mixture of unknown leaves in unknown proportions. The task in the Perfect Phylogeny Mixture Deconvolution Problem is to infer a twostate perfect phylogeny and mixing proportions of its leaves that explain the given allele frequencies. I will introduce algorithms for solving this problem based on a combinatorial characterization of perfect phylogeny trees as a restricted class of spanning trees in a graph, a characterization that also demonstrates the computational complexity of the problem. In addition, I will introduce a novel theoretical framework for analyzing the history of migrations of cells between anatomical sites in metastatic cancers. Using these methods, I analyze several cancers and identify tumor phylogenies and migration histories that are more biologically plausible than previously reported analyses.
 April 27.
Introduction to metagenomics.
(PDF)
 May 2 (last day of class:
turn in your final projects)