CS 581: detailed syllabus
The course covers Chapters 1-10 of Computational Phylogenetics: An
Introduction to Designing Methods for Phylogeny Estimation, followed by
additional topics based on student interest.
Some of the material is learned through reading major papers from the
scientific literature.
Topics covered include:
- Stochastic models of evolution, including sequence evolution models
  (e.g., Cavender-Farris-Neyman, Jukes-Cantor, Generalized Time Reversible,
  and No Common Mechanism) and gene evolution models (e.g., Multi-Species
  Coalescent and DLCOAL).
- Statistical concepts: statistical identifiability of a model parameter,
  statistical consistency and sample complexity of an estimation procedure,
  the "safety radius" of a phylogeny estimation method, and absolute fast
  converging methods.
- Discrete mathematics for characterization and inference of trees (e.g.,
  representation of rooted trees by clades and rooted triplets, and
  representation of unrooted trees by additive matrices, bipartitions,
  and quartet trees).
- Major phylogeny estimation methods (maximum likelihood, Bayesian
  estimation, maximum parsimony, and distance-based methods) and their
  theoretical and empirical performance, as well as branch support
  estimation.
- Multiple sequence alignment estimation, including pairwise alignment through
  Needleman-Wunsch, Bayesian estimation, and Hidden Markov Models (HMMs).
- Algorithm design techniques (dynamic programming, divide-and-conquer, and
  local search strategies) and the specific challenges involved in
  large-scale estimation of alignments and trees.
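As a small illustration of the dynamic programming named in the list above,
pairwise alignment by Needleman-Wunsch can be sketched as follows. This is a
minimal sketch, not course material: the function name needleman_wunsch, the
linear gap penalty, and the scoring values (match=1, mismatch=-1, gap=-1) are
assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the course.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell depends on its diagonal, upper, and left cells.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

With these assumed scores, needleman_wunsch("ACGT", "AGT") returns a score of
2 with the alignment ACGT over A-GT. The table has (n+1)(m+1) cells, so time
and memory both grow with the product of the sequence lengths, which is one
reason large-scale alignment calls for other techniques.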
Each year, special topics are included based on student interest.
These topics are generally drawn from the recent scientific literature
and may include:
- Phylogenetic networks
- Machine learning in phylogenomics (including deep learning and
  reinforcement learning)
- Phylogenetic estimation and epidemiology
- Whole genome alignment
- Computational historical linguistics