IIBR Informatics: Advancing Bioinformatics Methods using Ensembles of Profile Hidden Markov Models


Funding: U.S. National Science Foundation grant DBI-2006069 (ABI Innovation), $500,000.

Project Overview: Profile Hidden Markov Models (profile HMMs) are probabilistic graphical models that are widely used in bioinformatics. Research over the last decade has shown that ensembles of profile HMMs (e-HMMs) can provide greater accuracy than a single profile HMM for many applications in bioinformatics, including phylogenetic placement, multiple sequence alignment, and taxonomic identification of metagenomic reads. Although these improvements have been substantial, the design of e-HMMs has been fairly ad hoc, and their use can be computationally intensive, which reduces their appeal in practice. This project will advance the use of e-HMMs by developing statistically rigorous techniques for building them, with the goal of improving both their accuracy and our understanding of them, and will also develop methods that use e-HMMs for protein structure and function prediction. Broader impacts include software schools, engagement with under-represented groups, and open-source software. Project software and papers are available at http://tandy.cs.illinois.edu/eHMMproject.html.

Journal publications supported by this grant:

Preprints supported by this grant:

Project Software:

Symposia and Software Schools: The grant will provide symposia and software schools to train researchers (from students through faculty) in new methods.

Presentations: See http://tandy.cs.illinois.edu/talks.html for the full list of talks.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.