High-level suggestions
To find papers to read, you can follow specific authors,
conferences, and journals where high-quality, readable publications
relevant to computational phylogenetics are likely to appear.
Here are some suggestions for each (incomplete in all cases,
and the lists are alphabetized):
- Authors:
Almost anything by these people will be good (well written
and relevant to phylogenetics).
Some are more empirical, some are more mathematical,
and some focus on implementation issues for software.
Elizabeth Allman,
Cecile Ane,
Joe Felsenstein,
Olivier Gascuel,
Nick Goldman,
Dan Gusfield,
David Hillis,
Barbara Holland,
Katharina Huber,
Daniel Huson,
Junhyong Kim,
Laura Kubatko,
Siavash Mirarab,
Bernard Moret,
Vincent Moulton,
Luay Nakhleh,
Sebastien Roch,
David Sankoff,
Alexis Stamatakis,
Mike Steel,
and
Simon Whelan.
- Conferences:
Some of the easier papers to read are in conference
proceedings, such as
RECOMB, WABI, RECOMB-Comparative Genomics, ISMB, and ECCB.
- Journals:
Bioinformatics,
BMC Genomics,
IEEE/ACM Transactions on Computational Biology and Bioinformatics,
Journal of Computational Biology,
Molecular Biology and Evolution,
Molecular Phylogenetics and Evolution, and
Systematic Biology.
In general, the more recent the paper, the more relevant it is to
research you might want to do. But the earlier
papers may be easier to read.
My recommendation is that you look at a few papers, and then
pick one (or two) that you are comfortable reading
and presenting.
Also, look at the supplementary materials!
Before you present the paper or write up your review,
read this document.
Papers related to maximum parsimony
-
Links between maximum likelihood and maximum parsimony under a simple model of site substitution.
Tuffley and Steel (1997).
Bulletin of Mathematical Biology
59(3):581-607.
(HTML)
-
Success of Parsimony in the Four-Taxon Case: Long-Branch Repulsion by Likelihood in the Farris Zone.
Siddall (2005).
Cladistics
(HTML)
-
General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa.
Kim (1996).
Systematic Biology 45.3 (1996): 363-374.
(HTML)
Papers about taxon sampling
-
Hobgoblin of phylogenetics?
Hillis et al. (1994).
Nature volume 369, pages 363-364
(also read the follow-up papers)
(HTML).
-
Taxon sampling and the accuracy of phylogenetic analyses,
Heath et al. (2008).
Journal of Systematics and Evolution 46 (3): 239-257
(PDF)
-
Success of Phylogenetic Methods in the Four-Taxon Case.
Huelsenbeck and Hillis (1993).
Systematic Biology, Volume 42, Issue 3, Pages 247-264,
(HTML)
-
Is it better to add taxa or characters to a difficult phylogenetic problem?
Graybeal (1998). Systematic Biology,
47.1: 9-17
(HTML)
Papers about distance-based tree estimation
-
On the Approximability of Numerical Taxonomy (Fitting Distances
by Tree Metrics).
Agarwala et al. (1998) SIAM J. Computing
(HTML)
-
Large-scale neighbor-joining with NINJA.
Wheeler (2009).
International Workshop on Algorithms in Bioinformatics (WABI).
(PDF)
-
A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences.
Lacey and Chang (2006).
Mathematical Biosciences, 199(2), 188-215.
(HTML)
-
The performance of neighbor-joining methods of phylogenetic reconstruction.
Atteson (1999).
Algorithmica 25.2-3: 251-278.
(PDF)
Papers about consensus and agreement subtrees
-
Maximum agreement subtree in a set of evolutionary trees: Metrics and efficient algorithms.
Amir and Keselman (1997). SIAM J. Computing.
Vol. 26, No. 6, pp. 1656-1669.
(HTML)
-
Collecting reliable clades using the Greedy Strict Consensus Merger.
Fleischauer and Bocker.
PeerJ (2016).
(PDF)
-
Against consensus.
Barrett et al. (1991).
Systematic Zoology 40.4: 486-493.
(HTML)
Papers on multiple sequence alignment
-
T-Coffee: a novel method for fast and accurate multiple sequence alignment.
Notredame et al. (2000).
Journal of Molecular Biology, 302, 205-217.
(HTML)
-
The maximum weight trace problem in multiple sequence alignment.
Kececioglu (1993).
Pages 106-119 of: Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 684. Springer-Verlag.
(PDF)
-
ProbCons: probabilistic consistency-based multiple sequence alignment.
Do et al. (2005).
Genome Research, 15(2), 330-340.
(HTML)
-
Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.
Loytynoja and Goldman (2008).
Science 320, no. 5883: 1632-1635.
(HTML)
-
Probalign: multiple sequence alignment using partition function posterior probabilities.
Roshan and Livesay.
Bioinformatics, Volume 22, Issue 22, 15 November 2006, Pages 2715-2721,
(HTML)
-
PROMALS: towards accurate multiple sequence alignments of distantly related proteins.
Pei and Grishin.
Bioinformatics, Volume 23, Issue 7, 1 April 2007, Pages 802-808,
(HTML)
-
Current Methods for Automated Filtering of Multiple
Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.
Tan et al.,
Syst. Biol. 2015, doi:10.1093/sysbio/syv033
(HTML)
-
Selection of Conserved Blocks from Multiple
Alignments for Their Use in Phylogenetic Analysis.
Castresana.
Molecular
Biology and Evolution, Volume 17, Issue 4, 1 April 2000,
Pages 540-552,
(HTML)
-
Approximation algorithms for tree alignment with a given phylogeny.
Wang et al. (1996).
Algorithmica 16 (3), 302-315
(HTML)
-
Efficient methods for multiple sequence alignment with guaranteed error bounds.
Gusfield (1993).
Bulletin of Mathematical Biology 55 (1), 141-154
(PDF)
-
Sequence embedding for fast
construction of guide trees for multiple sequence alignment.
Blackshields et al. (2010).
Algorithms for Molecular Biology 5(21).
(HTML)
Papers on genome rearrangements
-
Assignment of orthologous genes via genome rearrangement.
Chen et al. (2005).
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB),
Vol 2, Issue 4, pp. 302-315.
(PDF)
-
A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study.
Bader et al.,
J. Computational Biology,
vol 8, number 5, 2001,
pp. 483-491.
(PDF)
-
Multiple Genome Rearrangement and Breakpoint Phylogeny.
Sankoff and Blanchette.
Journal of Computational Biology, 1998, Vol. 5, No. 3.
(HTML)
-
A Very Elementary Presentation of the Hannenhalli-Pevzner Theory.
Bergeron (2001).
In: Amir A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg
(HTML)
-
Approximating the true evolutionary distance between two genomes.
Swenson et al. (2008).
ACM J. of Experimental Algorithmics 12, 3.5
(HTML)
-
Scaling up accurate phylogenetic reconstruction from gene-order data.
Tang and Moret (2003).
11th Conf. on Intelligent Systems for Molecular Biology (ISMB),
published in Bioinformatics.
(HTML)
Papers on maximum likelihood software
-
IQ-TREE: A fast and effective stochastic algorithm for
estimating maximum-likelihood phylogenies.
Nguyen et al., Molecular Biology and Evolution, Volume 32, Issue 1,
January 2015, Pages 268-274, https://doi.org/10.1093/molbev/msu300
(HTML)
-
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference.
Kozlov et al.
Bioinformatics, Volume 35, Issue 21, 1 November 2019, Pages 4453-4455,
(HTML)
-
FastTree 2 - approximately maximum-likelihood trees for large alignments.
Price et al.
PLoS One 2010, 5 (3), e9490
(HTML)
Papers on species tree estimation
-
Gene trees in species trees.
W. Maddison (1997).
Systematic biology 46.3: 523-536.
(HTML)
-
Practical speed-up of Bayesian inference of species phylogenies by
restricting the space of gene trees.
Wang et al., bioRxiv 2019,
(PDF)
-
Towards an accurate and efficient heuristic for species/gene tree
co-estimation.
Wang and Nakhleh, Bioinformatics 34(17): i697-i705
(HTML)
-
Quartet Inference from SNP Data Under the Coalescent Model.
Chifman and Kubatko.
Bioinformatics, Volume 30, Issue 23, 1 December 2014, Pages 3317-3324,
(HTML)
-
ASTRAL-Pro: quartet-based species tree inference despite paralogy.
Zhang et al.
bioRxiv 2019.12.12.874727
(HTML)
-
Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction.
Erfan Sayyari et al.
Molecular Biology and Evolution, Volume 34, Issue 12, December 2017, Pages
3279-3291,
(HTML)
-
Distance-based species tree estimation under the coalescent:
Information-theoretic trade-off between number of loci and sequence length.
Mossel and Roch.
(HTML)
(arXiv)
-
On the Variance of Internode Distance Under the Multispecies Coalescent.
Roch, RECOMB-CG 2018, pp. 196-206
(HTML)
-
Incomplete lineage sorting: consistent phylogeny estimation from multiple loci.
Mossel and Roch (2008).
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(1), 166-171.
(PDF)
-
Species tree inference by minimizing deep coalescences.
Than and Nakhleh (2009).
PLoS computational biology 5 (9)
(HTML)
Papers on deep learning in phylogenetics
-
Deep residual neural networks resolve quartet molecular phylogenies,
Zhengting Zou et al.,
Molecular Biology and Evolution, msz307,
(HTML)
-
Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning.
Anton Suvorov et al.
Systematic Biology, Volume 69, Issue 2, March 2020, Pages 221-233,
(HTML)
Papers on complex sequence evolution
-
GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments.
Stephen M Crotty et al.
Systematic Biology, Volume 69, Issue 2, March 2020, Pages 249-264,
(HTML)
-
Modeling Compositional Heterogeneity.
Peter Foster.
Systematic Biology 53(3):485-495, 2004.
(HTML)
-
Reversible polymorphism-aware phylogenetic models and their application to tree inference.
Schrempf et al.
Journal of Theoretical Biology (2016),
407: 362-370.
(HTML)
-
Detecting and visualising the impact of heterogeneous evolutionary processes on phylogenetic estimates.
Jermiin et al.
bioRxiv 2020
(HTML)
Papers on quartet amalgamation methods
-
Constructing optimal trees from quartets.
Bryant and Steel (2001).
Journal of Algorithms 38.1 (2001): 237-259.
(PDF)
-
Quartet MaxCut: A fast algorithm for amalgamating quartet trees.
Snir and Rao.
Molecular Phylogenetics and Evolution, 2012, 62(1):1-8.
(HTML)
-
Weighted Quartets Phylogenetics. Avni et al.
Systematic Biology, Volume 64, Issue 2, March 2015,
pages 232-242
(HTML)
-
Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach.
Reaz et al.
PLOS One, 2014, 9(8): e104008.
(HTML)
-
Inferring evolutionary trees with strong combinatorial evidence.
Berry and Gascuel.
Theoretical computer science 240 (2), 271-298
(PDF)
-
A practical algorithm for recovering the best supported edges of an evolutionary tree,
Berry et al., SODA 2000
(PDF)
-
Fast error-tolerant quartet phylogeny algorithms.
Brown and Truszkowski (2011).
Annual Symposium on Combinatorial Pattern Matching, 147-161
(HTML)
Papers on reticulate evolution and phylogenetic networks
-
Networks: expanding evolutionary thinking.
Bapteste et al.
Trends in Genetics 29 (8), 439-441
(PDF)
-
The cobweb of life revealed by genome-scale estimates of horizontal gene transfer.
Ge et al.
PLoS biology 3.10 (2005).
(HTML)
-
RIATA-HGT: a fast and accurate heuristic for reconstructing horizontal gene transfer.
Nakhleh et al.
International Computing and Combinatorics Conference, 84-93
(HTML)
-
Species Trees from Gene Trees Despite a High Rate of Lateral Genetic Transfer: A Tight Bound (Extended Abstract).
Daskalakis and Roch.
SODA 2016,
DOI:10.1137/1.9781611974331.ch110
(arXiv)
-
Recovering the Treelike Trend of Evolution Despite
Extensive Lateral Genetic Transfer: A Probabilistic Analysis.
Roch and Snir.
Journal of Computational Biology, 2013, Vol. 20, No. 2.
(HTML)
-
Inferring Phylogenetic Networks Using PhyloNet.
Wen et al. (2018).
Systematic Biology, Volume 67, Issue 4, July 2018, Pages 735-740,
(HTML)
-
Computational approaches to species phylogeny inference and gene tree reconciliation.
Nakhleh (2013).
Trends in Ecology and Evolution, Volume 28 (12), 719-728.
(HTML)
-
Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks,
Bryant and Moulton (2004).
Molecular Biology and Evolution, Volume 21, Issue 2, Pages 255-265
(HTML)
Papers on alignment-free tree estimation
-
Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.
Bogusz and Whelan.
Syst. Biol. 66(2):218-231, 2017
DOI:10.1093/sysbio/syw074
-
An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood.
Zahin et al.
bioRxiv 2019.
(HTML)
Papers on tumor phylogenetics
-
Computational Models for Cancer Phylogenetics.
Schwartz (2019),
book chapter in
Bioinformatics and Phylogenetics, pp 243-275.
(PDF)
-
Inferring clonal evolution of tumors from single nucleotide somatic mutations.
Jiao et al.
BMC Bioinformatics (2014), 15(1), 35. doi:10.1186/1471-2105-15-35
(HTML)
-
Computational approaches for inferring tumor evolution from single-cell genomic data.
Zafar et al.
Current Opinion in Systems Biology, Volume 7, pages 16-25, 2018.
(PDF)
-
A Combinatorial Approach for Single-cell Variant Detection via Phylogenetic Inference.
Edrisi et al., bioRxiv 2019,
(HTML)
-
Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures.
El-Kebir et al.
Cell systems 3 (1), 43-53 (2016).
(HTML)
Papers about phylogenetic placement
-
pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.
Matsen et al. (2010).
BMC Bioinformatics 11(1):538.
(HTML)
-
APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments.
Balaban et al.
Systematic Biology 2019, syz063,
(HTML)
-
Phylogenetic Placement of Exact Amplicon Sequences Improves Associations
with Clinical Information.
Janssen et al., mSystems, Volume 3, Issue 3, 2018
(HTML)
Other papers
Many of these papers actually should be classified above somewhere, but I haven't gotten around to doing that...
-
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees.
Matthews and Williams (2010).
BMC Bioinformatics 11.S1: S15.
(HTML)
-
A two-state model of tree evolution and its application to
Alu Retrotransposition.
Moshiri and Mirarab.
Systematic Biology, Volume 67, Issue 3, pages 475-489 (2018)
(HTML)
-
QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data.
Skums et al.
Bioinformatics, Vol. 34, Issue 1, pages 163-170 (2018)
(HTML)
-
Horses or farmers? The tower of Babel and confidence in trees.
Nicholls, J. Royal Statistical Society, volume 5, issue 3, pp. 112-117
(HTML)
-
Cactus: Algorithms for genome multiple sequence alignment.
Paten et al.
Genome research 21.9 (2011): 1512-1528.
(HTML)
-
Progressive alignment with Cactus: a multiple-genome aligner for
the thousand-genome era.
Armstrong et al.
bioRxiv (2019): 730531.
(HTML)
-
Mash: fast genome and metagenome distance estimation using MinHash.
Ondov et al. (2016).
Genome Biology 17:132.
(HTML)
-
Phase transition in the sample complexity of likelihood-based
phylogeny inference.
Roch and Sly.
Probability Theory and Related Fields 169, 3-62 (2017).
(HTML)
-
Renewing Felsenstein's phylogenetic bootstrap in the era of big data.
Lemoine et al., Nature, 556, 452-456 (2018).
(HTML)
-
Twisted trees and inconsistency of tree estimation when gaps
are treated as missing data -- The impact of model mis-specification in distance corrections.
McTavish et al.
Molecular Phylogenetics and Evolution, Volume 93, pp. 289-295 (2015).
(HTML)
-
Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees.
Fan and Roch.
Electron. J. Probab. 23 (2018), no. 47, 1-24.
(PDF)
-
Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences.
Li and Godzik (2006).
Bioinformatics, 22(13), 1658-1659. doi:10.1093/bioinformatics/btl158
(HTML)
-
Haplotyping as perfect phylogeny: conceptual framework and efficient solutions.
Gusfield (2002).
Proceedings of the sixth annual international conference on Computational Biology (RECOMB).
(PDF)
-
Emerging Frontiers in the Study of Molecular Evolution.
Liberles et al. (2020).
Journal of Molecular Evolution,
(HTML)
-
Computational pan-genomics: status, promises and challenges.
The Computational Pan-Genomics Consortium.
Briefings in Bioinformatics (2018), 19 (1), 118-135.
(HTML)
-
On the inference of ancestries in admixed populations.
Sankararaman et al. (2008).
Genome research 18 (4), 668-675.
(HTML)
-
Learning nonsingular phylogenies and hidden Markov models.
Mossel and Roch (2005).
Proceedings of the thirty-seventh annual ACM Symposium on
Theory of Computing (STOC),
Pages 366-375
(HTML)
-
Terraces in phylogenetic tree space.
Sanderson et al. (2011).
Science 333 (6041), 448-450.
(HTML)