CPHL: COMPUTATIONAL PHYLOGENETICS IN HISTORICAL LINGUISTICS


[INTRODUCTION] [PEOPLE] [PUBLICATIONS] [DATASETS] [SOFTWARE]

INTRODUCTION

The CPHL project, funded in part by the National Science Foundation, is a collaborative effort involving linguistics, computer science, and statistics. Its goals are:
  1. Producing and maintaining real linguistic datasets, in particular of Indo-European languages.
  2. Formulating statistical models that capture the evolution of historical linguistic data.
  3. Designing simulation tools that generate synthetic data, together with accuracy measures, for studying the performance of reconstruction methods.
  4. Developing and implementing statistically based as well as combinatorial methods for reconstructing language phylogenies, including phylogenetic networks.
New York Times article from 1996, about early work with Don Ringe.

NSF support for this project was provided through grants 0312911 and 0312830.


Former Courses and Workshops


PEOPLE



PUBLICATIONS

Copyright Notice: The documents accessible through these links are included by the author as a means to ensure convenient electronic dissemination of technical work on a non-commercial basis. Copyright and all rights therein are maintained by the copyright holders (the authors or the publishers), notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's and publisher's copyright. In particular, these works may not be re-posted without permission of the copyright holders.


18 T. Warnow, S.N. Evans, and L. Nakhleh (2023) "Progress on Constructing Phylogenetic Networks for Languages." arXiv:2306.06298v2. PDF
17 F. Barbancon, S.N. Evans, L. Nakhleh, D. Ringe, and T. Warnow (2013) "An experimental study comparing linguistic phylogenetic reconstruction methods." Diachronica 30(2):143-170, and appendix. A preliminary version of this paper appeared in the conference "Languages and Genes", held at UC Santa Barbara, and organized by Bernard Comrie. PDF
16 J. Nichols and T. Warnow, "Tutorial on computational linguistic phylogeny." Language and Linguistics Compass, Vol. 2, Issue 5, September 2008, pages 760-820. PDF
15 D. Ringe and T. Warnow, "Linguistic history and computational cladistics." In: Origin and Evolution of Languages: Approaches, Models, Paradigms, B. Laks (ed.), Equinox Publishing, March 2008.
14 L. Nakhleh, T. Warnow, D. Ringe, and S.N. Evans, "A Comparison of Phylogenetic Reconstruction Methods on an IE Dataset." Transactions of the Philological Society, 103(2):171-192, 2005. (The full version of the paper includes more details about running the methods, the datasets, etc.) PDF
13 L. Nakhleh, D. Ringe, and T. Warnow, "Perfect Phylogenetic Networks: A New Methodology for Reconstructing the Evolutionary History of Natural Languages." LANGUAGE, Journal of the Linguistic Society of America, 81(2):382-420, 2005. PDF
12 T. Warnow, S.N. Evans, D. Ringe, and L. Nakhleh, "A stochastic model of language evolution that incorporates homoplasy and borrowing." Phylogenetic Methods and the Prehistory of Languages. McDonald Institute for Archaeological Research, 2006. PDF
11 T. Warnow, S.N. Evans, D. Ringe, and L. Nakhleh, "Stochastic models of language evolution and an application to the Indo-European family of languages." Technical report, Department of Statistics, The University of California, Berkeley, 2004. PDF
10 S.N. Evans, D. Ringe, and T. Warnow, "Inference of divergence times as a statistical inverse problem." Phylogenetic Methods and the Prehistory of Languages. Cambridge, UK, July 2004. PDF
9 S.N. Evans and T. Warnow, "Unidentifiable divergence times in rates-across-sites models." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(3):130-134, 2004. PDF
8 E. Erdem, V. Lifschitz, L. Nakhleh, and D. Ringe, "Reconstructing the evolutionary history of Indo-European languages using answer set programming." Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages (PADL 03), 2003. PDF
7 D. Ringe, T. Warnow, and A. Taylor, "Indo-European and Computational Cladistics." Transactions of the Philological Society, 100(1):59-129, 2002. PDF
6 M. Bonet, C.A. Phillips, T. Warnow, and S. Yooseph, "Constructing evolutionary trees in the presence of polymorphic characters." SIAM J. Computing, 29(1):103-131, 1999. PDF
5 D. Ringe, "Tocharian class II presents and subjunctives and the reconstruction of the Proto-Indo-European verb." Tocharian and Indo-European Studies 9:121-142, 2000.
4 D. Ringe, T. Warnow, and A. Taylor, "Computational cladistics and the position of Tocharian." In: The Bronze Age and early Iron Age peoples of eastern Central Asia (1998, ed. Victor Mair; JIES Monograph 26), pp. 391-414.
3 T. Warnow, "Mathematical approaches to comparative linguistics." Proceedings of the National Academy of Sciences, Vol. 94, pp. 6585-6590, 1997.
2 T. Warnow, D. Ringe, and A. Taylor, "Reconstructing the evolutionary history of natural languages." Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1996, pp. 314-322.
1 T. Warnow, D. Ringe, and A. Taylor, "Reconstructing the evolutionary history of natural languages." IRCS Report 95-16. Philadelphia (1995): Institute for Research in Cognitive Science, University of Pennsylvania. Technical report, 18 pp.


DATASETS

Linguists Don Ringe and Ann Taylor have produced two datasets, one unscreened and one screened, each covering 24 languages that represent the 12 major subgroups of the Indo-European (IE) languages. The screened dataset is derived from the unscreened dataset by removing all characters that clearly exhibit parallel evolution and/or back-mutation; together, these two phenomena are usually referred to as "homoplasy".
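
The screening step described above can be pictured as dropping columns from a character matrix. The following is only a minimal sketch under stated assumptions: the matrix layout, the function name screen_characters, the placeholder language labels, and the set of flagged characters are all hypothetical, and deciding which characters exhibit homoplasy is a linguistic judgment that this code does not attempt to make.

```python
from typing import Dict, List, Set


def screen_characters(
    matrix: Dict[str, List[str]], homoplastic: Set[int]
) -> Dict[str, List[str]]:
    """Return a copy of `matrix` without the character columns in `homoplastic`.

    `matrix` maps each language to its list of character states;
    `homoplastic` holds 0-based indices of characters to remove.
    """
    n_chars = len(next(iter(matrix.values())))
    kept = [i for i in range(n_chars) if i not in homoplastic]
    return {lang: [states[i] for i in kept] for lang, states in matrix.items()}


# Hypothetical toy matrix: 3 placeholder languages x 4 characters.
# The state labels are arbitrary and are not taken from the real datasets.
unscreened = {
    "lang_A": ["1", "1", "2", "1"],
    "lang_B": ["1", "2", "2", "1"],
    "lang_C": ["2", "1", "1", "2"],
}

# Hypothetical expert judgment: characters 1 and 3 clearly show homoplasy.
flagged = {1, 3}

screened = screen_characters(unscreened, flagged)
print(screened)
```

The unscreened matrix is left unmodified, so both versions remain available for comparing reconstruction methods on the same languages.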

SOFTWARE

Software tools (some of which are now publicly available, and others that we are in the process of making public):