The CPHL project, funded in part by the National Science Foundation, is a collaborative effort involving linguistics, computer science, and statistics. It has four main goals:
  1. Producing and maintaining real linguistic datasets, in particular of Indo-European languages.
  2. Formulating statistical models that capture the evolution of historical linguistic data.
  3. Designing simulation tools that generate synthetic data, together with accuracy measures, for studying the performance of reconstruction methods.
  4. Developing and implementing statistically based as well as combinatorial methods for reconstructing language phylogenies, including phylogenetic networks.
Press coverage: a New York Times article from 1996 about early work with Don Ringe.

NSF support for this project was provided through grants 0312911 and 0312830.

Linguists Don Ringe and Ann Taylor have produced two datasets, one unscreened and one screened, each covering 24 languages that represent the 12 major subgroups of the Indo-European (IE) languages. The screened dataset is derived from the unscreened dataset by removing all characters that clearly exhibit parallel evolution and/or back-mutation (two phenomena usually referred to together as "homoplasy").
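As a rough illustration of the screening step, the sketch below drops flagged characters from a small character matrix. Everything in it is an assumption made for illustration: the language names, character names, state codings, and the `screen` helper are hypothetical, and the set of flagged characters stands in for the linguists' own judgment about which characters clearly show homoplasy. It is not the project's actual data format or tooling.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each language maps to one state per character.
unscreened: Dict[str, List[int]] = {
    "LangA": [1, 1, 2, 1],
    "LangB": [1, 2, 2, 1],
    "LangC": [2, 2, 1, 1],
}
character_names: List[str] = ["c1", "c2", "c3", "c4"]

# Hypothetical: characters a linguist has flagged as clearly homoplastic
# (parallel evolution and/or back-mutation).
flagged_homoplastic: Set[str] = {"c2"}


def screen(
    matrix: Dict[str, List[int]],
    names: List[str],
    flagged: Set[str],
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Return a copy of the matrix with the flagged characters removed."""
    keep = [i for i, name in enumerate(names) if name not in flagged]
    kept_names = [names[i] for i in keep]
    screened = {
        lang: [states[i] for i in keep] for lang, states in matrix.items()
    }
    return screened, kept_names


screened, kept_names = screen(unscreened, character_names, flagged_homoplastic)
```

The languages are unchanged by screening; only the set of characters shrinks, so both datasets cover the same languages.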
Software tools (some are now publicly available; others we are in the process of making public):
