An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy

Koch, Jordan M.; Holder, Mark T.

dc.contributor.author	Koch, Jordan M.
dc.contributor.author	Holder, Mark T.
dc.date.accessioned	2014-03-21T14:40:30Z
dc.date.available	2014-03-21T14:40:30Z
dc.date.issued	2012-12-14
dc.identifier.citation	Koch JM, Holder MT. An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy. PLOS Currents Tree of Life. 2012 Dec 14. Edition 1. http://dx.doi.org/10.1371/4fd1286980c08.
dc.identifier.uri	http://hdl.handle.net/1808/13347
dc.description.abstract	Felsenstein’s pruning algorithm allows one to calculate the probability of any particular data pattern arising on a phylogeny given a model of character evolution. Here we present a similar dynamic programming algorithm that treats the tree and model as known. The algorithm makes it feasible to calculate the probability that a randomly selected character will be a member of a particular class of character patterns. Specifically, we are interested in binning patterns by the number of parsimony steps and the set of states observed at the tips of the tree. This algorithm was developed to expand the range of data set sizes that can be used with Waddell et al.’s marginal testing approach for assessing the adequacy of a model. The algorithms introduced can also be used in likelihood calculations that correct for ascertainment biases. For example, Lewis introduced the Mkv model, which corrects for the lack of constant sites. The probability of a constant pattern arising can be calculated using the algorithm that we present, or by enumerating all possible constant patterns and calculating the probability of each one. Because the number of constant data patterns is small, both methods are efficient. However, elaborations of the Mkv model (such as those in Nylander et al.) require calculating the probability of parsimony-uninformative patterns arising. For large trees and characters with many possible character states, the number of possible parsimony-uninformative patterns is immense. In these cases, the algorithms introduced here will be more efficient. The algorithm has been implemented in open source software written in C++.
dc.description.sponsorship	JMK would like to thank NIH 5 R25GM62232 and the Initiative for Maximizing Student Development for funding. MTH thanks NSF-DEB-1208393 and NSF-DEB-0732920 for financial support.
dc.publisher	Public Library of Science
dc.title	An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy
dc.type	Article
kusw.kuauthor	Holder, Mark T.
kusw.kudepartment	Department of Ecology and Evolutionary Biology
kusw.oastatus	fullparticipation
dc.identifier.doi	10.1371/4fd1286980c08
kusw.oaversion	Scholarly/refereed, publisher version
kusw.oapolicy	This item meets KU Open Access policy criteria.
dc.rights.accessrights	openAccess
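
The abstract notes that the probability of a constant pattern can be obtained by enumerating every constant pattern and scoring each with Felsenstein's pruning algorithm, and that the Mkv model uses this probability to correct for the absence of constant sites. The following is a minimal sketch of that enumeration approach only; it is not the authors' dynamic programming algorithm for classes of patterns, and it is unrelated to their C++ implementation. It assumes an equal-rates Mk model with uniform root frequencies and branch lengths in expected changes per character. The nested-list tree encoding and all function names are hypothetical choices made for illustration.

```python
import math

def mk_transition(k, t):
    """Return (P(no net change), P(change to one specific other state))
    under an equal-rates Mk model with k states and branch length t."""
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1.0) / k * decay
    diff = 1.0 / k - decay / k
    return same, diff

def partials(node, pattern, k):
    """Felsenstein pruning: for each state at `node`, the probability of the
    tip data below it. A leaf is a name (str); an internal node is a list of
    (child, branch_length) pairs. This encoding is an assumption."""
    if isinstance(node, str):
        return [1.0 if s == pattern[node] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, t in node:
        child_partial = partials(child, pattern, k)
        same, diff = mk_transition(k, t)
        total = sum(child_partial)
        for s in range(k):
            # Sum over child states j of P(s -> j) * L_child(j).
            result[s] *= same * child_partial[s] + diff * (total - child_partial[s])
    return result

def pattern_probability(tree, pattern, k):
    """Probability of one tip pattern, assuming uniform root frequencies."""
    return sum(partials(tree, pattern, k)) / k

def leaf_names(node):
    """Collect the tip names below a node."""
    if isinstance(node, str):
        return [node]
    return [name for child, _ in node for name in leaf_names(child)]

def constant_probability(tree, k):
    """Enumerate the k constant patterns and add up their probabilities."""
    names = leaf_names(tree)
    return sum(pattern_probability(tree, {n: s for n in names}, k)
               for s in range(k))

def mkv_likelihood(tree, pattern, k):
    """Likelihood of a variable pattern conditioned on the character being
    variable, i.e. the Mkv correction for unobserved constant sites."""
    return pattern_probability(tree, pattern, k) / (1.0 - constant_probability(tree, k))

# Hypothetical three-tip tree: ((B:0.2, C:0.05):0.1, A:0.1)
example_tree = [("A", 0.1), ([("B", 0.2), ("C", 0.05)], 0.1)]
print(constant_probability(example_tree, 2))
print(mkv_likelihood(example_tree, {"A": 0, "B": 1, "C": 1}, 2))
```

Enumeration stays cheap here because there are only k constant patterns. For parsimony-uninformative patterns on large trees with many states the number of patterns to enumerate grows very quickly, which is the case the abstract says the dynamic programming algorithm is meant to handle.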

Files in this item

Name: Holder.pdf
Size: 2.562Mb
Format: PDF
