Wednesday, April 30, 2008

How shalt thou partition a Bayesian analysis?

Partitioned Bayesian phylogenetic analysis is all the rage. In a partitioned analysis, the data are divided a priori into regions of DNA that are thought to evolve at different rates (e.g., first, second, and third codon positions). It makes sense to partition datasets: we know that different types of DNA evolve at different rates. How many partitions to use, and which ones, is more controversial. Matt Brandley et al. (Syst. Biol. 54:373-390) were among the first to address this question with an empirical dataset. They used Bayes factors to compare alternative partitioning strategies and ultimately selected the most heavily partitioned analysis, because it had a significantly better likelihood score than the alternatives even after penalizing for overparameterization. Most subsequent studies have reached a similar conclusion: Bayes factors nearly always favor the most heavily partitioned dataset.

McGuire et al. (Syst. Biol. 56:837-856) consider other ways of choosing among partitioning strategies, such as decision-theoretic (DT) methods. They found that not all methods share the Bayes factor's tendency to select the most heavily partitioned scheme. Although they favor DT, which selects a somewhat less heavily partitioned analysis, the choice of partitioning strategy appears to have little impact on their overall conclusions. It's always good to do the most sophisticated analyses, but it can be a bit frustrating to find you would have gotten the same answer with a lot less effort...
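For readers who haven't run one of these comparisons, here is a minimal sketch of how a Bayes factor comparison of partitioning strategies can work. It assumes each strategy's marginal likelihood is estimated with the harmonic mean of the post-burn-in log-likelihoods sampled by MCMC, and that a more partitioned strategy is accepted only when 2 ln BF exceeds 10 (the Kass and Raftery "very strong evidence" convention). Both the estimator and the cutoff are assumptions for illustration, and the function names are hypothetical; this is not necessarily the exact procedure used in either paper.

```python
import math


def harmonic_mean_log_marginal(log_likelihoods):
    """Harmonic-mean estimate of ln p(data | model) from posterior lnL samples.

    m ~= N / sum_i exp(-lnL_i), so ln m = ln N - ln sum_i exp(-lnL_i).
    The sum is computed with the log-sum-exp trick to avoid overflow.
    """
    neg = [-x for x in log_likelihoods]
    peak = max(neg)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in neg))
    return math.log(len(log_likelihoods)) - log_sum


def two_ln_bayes_factor(log_m_complex, log_m_simple):
    """2 ln BF of the more partitioned model over the less partitioned one."""
    return 2.0 * (log_m_complex - log_m_simple)


def choose_partitioning(log_marginals, threshold=10.0):
    """Pick a strategy from (name, ln marginal likelihood) pairs.

    The list is assumed (hypothetically) to be ordered from fewest to most
    partitions; a more partitioned strategy replaces the current choice only
    if its 2 ln BF against that choice exceeds the threshold.
    """
    best_name, best_lm = log_marginals[0]
    for name, lm in log_marginals[1:]:
        if two_ln_bayes_factor(lm, best_lm) > threshold:
            best_name, best_lm = name, lm
    return best_name
```

The Bayes factor's penalty for extra parameters is implicit: the marginal likelihood averages the likelihood over the prior rather than maximizing it. The harmonic mean estimator is known to be unstable, so numbers produced this way are best treated with caution.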
