- Methodology article
- Open Access
An improved approximate-Bayesian model-choice method for estimating shared evolutionary history
- Jamie R Oaks^{1, 2}Email author
https://doi.org/10.1186/1471-2148-14-150
© Oaks; licensee BioMed Central Ltd. 2014
- Received: 7 March 2014
- Accepted: 10 June 2014
- Published: 3 July 2014
Abstract
Background
To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergences times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes.
Results
By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa.
Conclusions
The results demonstrate the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support.
Keywords
- Dirichlet-process prior
- Approximate-Bayesian computation
- Model choice
- Phylogeography
- Biogeography
Background
Understanding the processes that generate biodiversity and regulate community assembly is a major goal of evolutionary biology. Large-scale changes to the environment, including geological and climatic events, can affect the evolutionary history of entire communities of co-distributed species and their associated microbiota. For example, by partitioning communities, such an event can isolate groups of populations and cause a temporal cluster of speciation events across co-distributed taxa. Given the dynamic nature of our planet, such biogeographical processes likely play a significant role in determining diversification rates and patterns. At recent timescales, temporal clusters of diversification caused by biogeographical events can leave a signature in the genetic variation within and among the affected lineages. Thus, methods for accurately estimating models of shared evolutionary events across co-distributed taxa from genetic data are important for better understanding how regional and global biogeographical processes affect biodiversity.
This inference problem is challenging due to the stochastic nature by which mutations occur in populations and how they are inherited over generations [1, 2]. Thus, a method for estimating historical patterns of divergences across taxa should explicitly model the stochastic mutational and ancestral processes that generate and filter the genetic variation we observe in present-day genetic data. An appealing approach would be a comparative, Bayesian model-choice method for inferring the probability of competing divergence histories while integrating over uncertainty in mutational and ancestral processes via models of nucleotide substitution and lineage coalescence. The sample space of such a model-choice procedure would include all models ranging from a single divergence-time parameter (i.e., simultaneous divergence of all co-distributed taxa) to the fully generalized model in which each taxon diverged at a unique time.
The software package msBayes implements such an approach in an approximate-Bayesian model-choice framework [3, 4]. The method models temporally clustered divergences across taxa caused by a biogeographical event (or a “divergence event”) as a single, instantaneous occurrence. In other words, a divergence event causes a set of taxa to share the same moment of divergence along a continuous time scale (i.e., simultaneous divergence). Given aligned sequence data for Y pairs of populations, msBayes estimates the number of divergence events shared among the pairs, the timing of the events, and the assignment of pairs to the events, while integrating out uncertainty in demographic parameters and the genealogical histories of the sequences. Thus, the method samples over all possible divergence models of differing dimensionality (i.e., all the possible partitions of Y pairs to 1,2,…,Y divergence-time parameters), and, in so doing, estimates the posterior probability of each model.
msBayes has been used to address biogeographical questions in a variety of empirical systems. Some examples include (1) whether the rise of the Isthmus of Panama caused co-divergence among species of echinoids co-distributed across the Pacific and Atlantic sides of the isthmus [3], (2) if an historical seaway across the Baja Peninsula caused co-divergence across species of squamates and mammals co-distributed both north and south of the putative seaway [5], (3) if species of gall-wasps and their associated parasitoids share divergences across putative glacial refugia [6], and (4) whether repeated fragmentation of the oceanic Islands of the Philippines during Pleistocene sea-level fluctuations caused diversification of vertebrate taxa distributed across the islands [7]. Such applications of the method often result in strong posterior support for co-divergence among all or subsets of the taxa investigated (e.g., [3, 5–12]).
For priors on divergence-time and demographic parameters, msBayes uses continuous uniform probability distributions. This causes divergence models with more divergence-time parameters to integrate over a much greater parameter space with low likelihood yet high prior density, which can result in small marginal likelihoods relative to models with fewer divergence-time parameters [13, 14]. Given that the marginal likelihood of a model weighted by its prior is what determines its posterior probability, this can cause support for models with fewer divergence events [7, 15]. This is not a critique of Bayesian model choice in general; comparing models by their marginal likelihoods provides a “natural” penalty for over-parameterization and can be a great strength of the Bayesian approach. However, given the sensitivity of marginal likelihoods to the prior, care is needed when selecting prior distributions [14]. Selecting distributions that will often place high prior density in large regions of parameter space with low likelihood can lead to small marginal likelihoods of parameter-rich models even if they are correct.
Furthermore, msBayes uses a discrete uniform prior over the number of divergence events 1,2,…,Y. Because there are many more possible assignments of population pairs to intermediate numbers of divergence events, this imposes a prior on divergence models that puts most of the prior mass on models with either very few or very many divergence-time parameters (see Figure five of [7]; for brevity I will refer to this prior as “U-shaped”). Given that models with many divergence events can have small marginal likelihoods due to the uniform priors on divergence-time parameters, the U-shaped prior will effectively create a strong prior preference for models with very few divergence events.
Recently, Oaks et al. [7, 15] found via simulation that msBayes will often strongly support models with a small number of divergence events shared among taxa, even when divergences were random over broad timescales. They suggested this behavior was due to the combination of uniform priors on parameters causing small marginal likelihoods of richer models and the U-shaped prior on divergence models. Hickerson et al. [16] suggested the problem was caused by sampling error, and proposed as a solution an approximate-Bayesian model averaging approach that samples over empirically informed uniform priors. However, Oaks et al. [15] evaluated the approach proposed by Hickerson et al. [16] using simulations and found that it did not mitigate the method’s propensity to incorrectly infer clustered divergences, and often preferred priors that excluded the true values of the model’s parameters. Here, I describe a new approach that successfully mitigates spurious inference of co-divergence while avoiding negative side effects of empirically informed uniform priors.
In this study, I introduce a new method, implemented in the software dpp-msbayes, that extends the model of msBayes. I use this method to test whether alternative parameterizations and priors improve the behavior of the approximate-Bayesian model-choice approach to estimating shared divergence events. The new approach uses a Dirichlet-process prior (DPP) over all possible models of divergence, and gamma and beta probability distributions in place of uniform priors on many of the model’s parameters. Using simulations, I show that the new implementation has improved robustness, accuracy, and power compared to the original model. The results confirm that the improved performance of the new model is due to a combination of (1) more flexible priors on divergence-time and demographic parameters that avoid placing high prior density in improbable regions of parameter space, and (2) a diffuse Dirichlet-process prior that does not strongly disfavor divergence models with intermediate numbers of divergence events. After reanalyzing sequence data from 22 pairs of taxa from the Philippines [7] under the new model, I find a large amount of posterior uncertainty in the number of divergence events shared among the taxa; a result in contrast with the original msBayes model and congruent with intuition given the richness of the model and the relatively small amount of information in the data.
Methods
The model
Summary of the notation used throughout this work; modified from Oaks et al. [7]
Symbol | Description |
---|---|
Y | Number of population pairs. |
n _{ i } | The number of genome copies sampled from population pair i, with n_{1,i} sampled from population 1, and n_{2,i} from population 2. |
k _{ i } | Number of loci sampled from population pair i. |
K | Total number of unique loci sampled. |
X _{i, j} | Sequence alignment of locus j sampled from population pair i. |
${S}_{i,\phantom{\rule{0.3em}{0ex}}j}^{\ast}$ | Population genetic summary statistics calculated from X_{i, j}. |
X | Vector containing the sequence alignments of each locus from each population pair: $\left({X}_{1,1},\dots ,{X}_{Y}{,}_{{k}_{Y}}\right)$. |
S ^{∗} | Vector containing the summary statistics of each locus from each population pair: $\left({\mathit{\text{S}}}_{1,1}^{\ast},\dots ,{\mathit{\text{S}}}_{Y,{k}_{Y}}^{\ast}\right)$. |
B_{ ε }(S^{∗}) | Multi-dimensional Euclidean space around the observed summary statistics, S^{∗}. |
ε | Radius of B_{ ε }(S^{∗}), i.e., the tolerance of the ABC estimation. |
G _{i, j} | Gene tree of the sequences in X_{i, j}. |
G | Vector containing the gene trees of each locus from each population pair: $\left({G}_{1,1},\dots ,{G}_{Y,{k}_{Y}}\right)$. |
|τ| | Number of population divergence-time parameters shared among the Y population pairs. |
τ | Time of population divergence in 4N_{ C } generations. |
τ | Set of divergence-time parameters: {τ_{1},…,τ_{|τ|}}. |
t _{ i } | The index of the divergence-time in τ to which population pair i is mapped. |
t | Vector of divergence-time indices: (t_{1},…,t_{ Y }). |
T _{ i } | Time of divergence in 4N_{ C } generations between the populations of pair i. |
T | Vector of divergence times for each of the population pairs: (T_{1},…,T_{ Y }). |
${\mathcal{T}}_{i,\phantom{\rule{0.3em}{0ex}}j}$ | Scaled time of divergence between the populations of pair i for locus j. |
| Vector containing the scaled divergence times of each locus from each population pair: $({\mathcal{T}}_{1,1},\dots ,{\mathcal{T}}_{Y,{k}_{Y}})$. |
θ_{D1,i},θ_{D2,i} | Mutation-rate-scaled effective population size of the 1^{ s t } and 2^{ n d } descendent population, respectively, of pair i. |
θ _{A,i} | Mutation-rate-scaled effective population size of the population ancestral to pair i. |
θ_{ D1 },θ_{ D2 } | Vectors (θ_{D1,1},…,θ_{D1,Y}) and (θ_{D2,1},…,θ_{D2,Y}), respectively. |
θ _{ A } | Vector containing the θ_{ A } parameters for each population pair: (θ_{A,1},…,θ_{A,Y}). |
υ _{ j } | Mutation-rate multiplier of locus j. |
υ | Vector containing the locus-specific mutation-rate multipliers: (υ_{1},…,υ_{ K }). |
α | The shape parameter of the gamma prior distribution on υ. |
ζ_{D1,i},ζ_{D2,i} | θ-scaling parameters that determine the magnitude of the population bottleneck in the 1^{ s t } and 2^{ n d } descendant population of pair i, |
respectively. The bottleneck in each descendant population begins immediately after divergence. | |
ζ_{ D1 },ζ_{ D2 } | Vectors (ζ_{D1,1},…,ζ_{D1,Y}) and (ζ_{D2,1},…,ζ_{D2,Y}), respectively. |
τ _{B,i} | Proportion of time between present and T_{ i } when the bottleneck ends for the descendant populations of pair i. |
τ _{ B } | Vector containing the τ_{ B } parameters for each population pair: (τ_{B,1},…,τ_{B,Y}). |
m_{ i } | Symmetric migration rate between the descendant populations of pair i. |
m | Vector containing the migration rates for each population pair: (m_{ i },…,m_{ Y }). |
ρ _{i, j} | θ-scaling constant provided by the investigator for locus j of pair i. This constant is required to scale θ for differences in ploidy among loci |
or differences in generation times among taxa. | |
ν _{i, j} | θ-scaling constant provided by the investigator for locus j of pair i. This constant is required to scale θ for differences in mutation rates |
among loci or among taxa. | |
ρ | Vector of ploidy and/or generation-time scaling constants: $({\rho}_{1,\phantom{\rule{0.3em}{0ex}}1},\dots ,{\rho}_{\mathit{\text{Y}},{k}_{Y}})$ |
ν | Vector of mutation-rate scaling constants: $({\nu}_{1,\phantom{\rule{0.3em}{0ex}}1},\dots ,{\nu}_{\mathit{\text{Y}},{k}_{Y}})$ |
$\stackrel{\u0304}{\mathit{\text{T}}}$ | Mean of divergence times across the Y population pairs. |
${s}_{T}^{2}$ | Variance of divergence times across the Y population pairs. |
D _{ T } | Dispersion index of divergence times across the Y population pairs $\left({s}_{T}^{2}/\stackrel{\u0304}{\mathit{\text{T}}}\right)$. |
n | Number of samples from the joint prior. |
Λ | Vector of parameter values drawn from the joint prior. |
S | Vector containing the summary statistics calculated from data simulated under parameter values drawn from the prior (Λ). |
Λ | Random sample of Λ_{1},…,Λ_{ n } drawn form the prior. |
| Summary statistic vectors S_{1},…,S_{ n } for each Λ_{1},…,Λ_{ n } drawn from the prior. |
I assume an investigator is interested in inferring the distribution of divergence times among Y pairs of populations. For each pair i, n_{ i } genome copies have been sampled, with n_{1,i} copies sampled from population 1, and n_{2,i} sampled from population 2. From these genomes, let k_{ i } be the number of DNA sequence loci collected for population pair i, and K be the total number of unique loci sampled across the Y pairs of populations. I use X_{i, j} to represent the multiple sequence alignment of locus j for population pair i. $\mathbf{X}=({X}_{1,1},\dots ,{X}_{Y,{k}_{Y}})$ is the full dataset, i.e., a vector of sequence alignments for all pairs and loci. Let G_{i, j} represent the gene tree upon which X_{i, j} evolved according to fixed HKY85 substitution model parameters ϕ_{i,j}. The investigator must specify the parameters of all $\mathit{\varphi}=({\varphi}_{1,1},\dots ,{\varphi}_{\mathit{\text{Y}},{k}_{Y}})$ substitution models by which the alignments evolved along the $\mathbf{G}=({G}_{1,1},\dots ,{G}_{\mathit{\text{Y}},{k}_{Y}})$ gene trees. Furthermore, the investigator must specify a vector of fixed constants $\mathit{\rho}=({\rho}_{1,1},\dots ,{\rho}_{\mathit{\text{Y}},{k}_{Y}})$ that scale the population-size parameters for known differences in ploidy among loci and/or differences in generation times among population pairs. Lastly, the investigator must also specify a vector of fixed constants $\mathit{\nu}=({\nu}_{1,1},\dots ,{\nu}_{\mathit{\text{Y}},{k}_{Y}})$ that scale the population-size parameters for known differences in mutation rates among loci and/or among taxa.
where T=(T_{1},…,T_{ Y }) is a vector of population divergence times for each of the Y pairs of populations, Θ=(Θ_{1},…,Θ_{ Y }) is a vector of the demographic parameters for each of the Y population pairs, υ=(υ_{1},…υ_{ K }) is a vector of locus-specific mutation-rate multipliers for each of the K loci, α is the shape parameter of a gamma-distributed prior on υ, and p(X|ϕ,ρ,ν), is the probability of the data (or the marginal likelihood of the model) given the fixed constants provided by the investigator.
where B_{ ε }(S^{∗}) is the multidimensional Euclidean space around the vector of summary statistics, the radius of which is the tolerance ε. The sources of approximation are the insufficiency of the statistics and the ε being greater than zero. I describe the full model in detail before delving into the numerical method of estimating the approximate model.
Likelihood and gene-tree prior terms of Equation 2
The first term, p(X_{i, j}|G_{i, j},ϕ_{i, j}), is the probability of the sequence alignment of locus j for population pair i given the gene tree and HKY85 [17] substitution model parameters [18, i.e., the “Felsenstein likelihood”]. The model allows for an intra-locus recombination rate r, which, for simplicity, is assumed to be zero in Equation 2. If r is non-zero, this term requires an additional product over the columns (sites) of each sequence alignment to allow sites to have different genealogies. The second term, p(G_{i, j}|T_{ i },Θ_{ i },υ_{ j },ρ_{i, j},ν_{i, j}), is the probability of the gene tree under a multi-population coalescent model (i.e., species tree) where the ancestral population of pair i diverges and gives rise to the two sampled descendant populations. Each Θ contains the following demographic parameters: The mutation-rate-scaled effective sizes (θ = 4N μ) of the ancestral, θ_{ A }, and descendant populations, θ_{D1} and θ_{D2}; the proportion of the first, ζ_{D1}, and second population, ζ_{D2}, that persist during bottlenecks that begin immediately after divergence in forward-time; the proportion of time between present and divergence when the bottlenecks end for both populations, τ_{ B }; and the symmetric migration rate between the descendant populations, m. Thus, the probability of the n_{ i }−1 coalescence times (node heights) of gene tree G_{i, j} is given by a multi-population Kingman-coalescent model [19] where the ancestral population of size θ_{A,i}ρ_{i, j}ν_{i, j}υ_{ j } diverges at time T_{ i } into two descendant populations of constant size θ_{D1,i}ρ_{i, j}ν_{i, j}υ_{ j }ζ_{D1,i} and θ_{D2,i}ρ_{i, j}ν_{i, j}υ_{ j }ζ_{D2,i}, which, after time T_{ i }τ_{B,i}, grow exponentially to their present size θ_{D1,i}ρ_{i, j}ν_{i, j}υ_{ j } and θ_{D2,i}ρ_{i, j}ν_{i, j}υ_{ j }, respectively. Following divergence, the descendant populations of pair i exchange migrants at a symmetric rate of m_{ i }.
Additional prior terms of Equation 2
where each υ is independently and identically distributed (iid) as υ∼G a m m a(α,1/α). If the recombination rate r is allowed to be non-zero, the prior term p(r) would be added to Equation 2, and the prior would be r∼G a m m a(a_{ r },b_{ r }), where a_{ r } and b_{ r } are specified by the investigator.
The priors for the demographic parameters are ${\theta}_{A}\sim \mathit{\text{Gamma}}({a}_{{\theta}_{A}},{b}_{{\theta}_{A}}),{\theta}_{D1}\sim \mathit{\text{Gamma}}({a}_{{\theta}_{D}},{b}_{{\theta}_{D}}),{\theta}_{D2}\sim \mathit{\text{Gamma}}({a}_{{\theta}_{D}},{b}_{{\theta}_{D}}),{\zeta}_{D1}\sim \mathit{\text{Beta}}({a}_{{\zeta}_{D}},{b}_{{\zeta}_{D}}),{\zeta}_{D2}\sim \mathit{\text{Beta}}({a}_{{\zeta}_{D}},{b}_{{\zeta}_{D}}),{\tau}_{B}\sim U(0,1)$, and m∼G a m m a(a_{ m },b_{ m }), where the hyper-parameters of each prior distribution can be specified by the investigator. By default, θ_{ A }, θ_{D1}, and θ_{D2} share the same prior (i.e., ${a}_{{\theta}_{A}}={a}_{{\theta}_{D}}$ and ${b}_{{\theta}_{A}}={b}_{{\theta}_{D}}$), but a separate gamma-distributed prior can be assigned to θ_{ A }. Also, the ζ_{D1},ζ_{D2}, and m parameters are optional (i.e., the investigator can assume that there has been no migration between populations of each pair and/or the population size of each descendant population has been constant through time).
Priors on divergence models
to each population pair i, such that T = (T_{1} = f(τ,t,1),…,T_{ Y }=f(τ,t,Y)).
Biologically speaking, τ contains the times of divergence events, the length of which |τ| is the number of divergence events shared across the Y pairs of populations. For example, if τ contains a single divergence-time parameter τ_{1}, all Y pairs of populations are constrained to diverge at this time (i.e., t would contain the index 1 repeated Y times, and T would contain the value τ_{1} repeated Y times), whereas if it contains Y divergence-time parameters, the model is fully generalized to allow all of the pairs to diverge at unique times.
Unlike the model implemented in msBayes, here I place priors on t and τ, rather than |τ| and τ. As a result, t determines the number of divergence-time parameters (|τ|) in the model. Below, I first describe the prior used for τ and the timescale it imposes on the model before discussing the priors implemented for t.
Thus, for each of the divergence times in τ to be on the same scale, the relative mutation rates among the pairs of populations are assumed to be known and fixed according to the user-provided values in ν.
where ${\stackrel{\u0304}{\theta}}_{D,i}$ is the mean of θ_{D1} and θ_{D2} for pair i. This gives us the vector of scaled divergence times $\mathcal{T}=({\mathcal{T}}_{1,1},\dots ,{\mathcal{T}}_{\mathit{\text{Y}},{k}_{Y}})$.
which is the Bell number [20]. The original msBayes model samples over the unordered realizations of t, such that the sample space is reduced to all the possible integer partitions of Y[4, 7, 21–23] (Additional file 1: Table S1). I denote the set of all possible integer partitions of the Y pairs of populations as a(Y) and the length of that set as |a(Y)|, and I hereinafter refer to these integer partitions as “unordered” divergence models or partitions. The advantages, disadvantages, and justification of ignoring the order of t is discussed in detail below.
i.e., a discrete uniform prior over all the integer partitions of Y (unordered divergence models). I denote this prior as t∼D U{a(Y)}.
The second prior is based on the Dirichlet process, which is a stochastic process that groups random variables into an unknown number of discrete parameter classes [24, 25]. The Dirichlet process has been used as a non-parametric Bayesian approach to many inference problems in evolutionary biology [26–31]. Here, I use the Dirichlet process to place a prior over all possible ordered partitions of Y population pairs into divergence-time parameter classes (i.e., “divergence events”). As discussed above, the time of each divergence-time parameter is drawn from the base distribution τ∼G a m m a(a_{ τ },b_{ τ }). The partitioning of the population pairs to divergence-time classes is controlled by the concentration parameter χ, which determines how clustered the process will be. I take a hierarchical approach and use a prior probability distribution (i.e., hyperprior) for χ[32]. More specifically, I use a gamma-distributed prior χ∼G a m m a(a_{ χ },b_{ χ }), where a_{ χ } and b_{ χ } are specified by the investigator. I use t∼D P(χ) to denote this Dirichlet-process prior.
where c(·,·) are the unsigned Stirling numbers of the first kind. Equations 15 and 16 show that smaller values of χ will favor fewer divergence-time parameters, and thus more clustered models of divergence, whereas larger values will favor more divergence-time parameters, and thus less clustered models of divergence.
Differences between this model and the original msBayesmodel
The prior on divergence models
One of the key differences between this model and that of msBayes[4] is the prior distribution on divergence models. As discussed in Oaks et al. [7], in msBayes the prior used for t is a combination of a discrete uniform prior over the possible number of divergence events |τ| from 1 to Y with a multinomial distribution on the number of times each index of τ appears in t, with the constraint that all τ parameters are represented at least once (see Equation two of [7]). I denote this prior used by msBayes as t∼D U{1,…,Y}. Oaks et al. [7] discuss how placing a uniform prior over the number of divergence parameters (denoted |τ| here, and as Ψ in [4]) imposes an “U-shaped” prior over divergence models (t; see Figure five(B) of [7]). To avoid this, I place priors directly on the sample space of divergence models, thus eliminating the parameter Ψ from the model. I introduce two priors on divergence models: (1) a prior that is uniform over all unordered divergence models, and (2) a Dirichlet-process prior on all ordered divergence models. The latter provides an investigator with a great deal of flexibility in expressing their prior beliefs about models of divergence.
Estimating ordered divergence models
As mentioned above, msBayes samples over unordered divergence models (i.e., unordered partitions of the Y pairs of populations). That is, the identity of each population pair, and all the information associated with it, is discarded. In my implementation, inference can be done on either unordered or ordered models of divergence. This is discussed in more detail in the description of the ABC implementation below.
The priors on nuisance parameters
I have replaced the use of continuous uniform distributions for priors on many of the model’s parameters (τ,θ_{ A },θ_{D1},θ_{D2},ζ_{D1},ζ_{D2},r,m) with more flexible parametric distributions from the exponential family. I introduce gamma-distributed priors for rate parameters that have a sample space of all positive real numbers (τ,θ_{ A },θ_{D1},θ_{D2},r,m), and beta-distributed priors for parameters that are proportions bounded by zero and one (ζ_{D1} and ζ_{D2}). These priors provide an investigator with much greater flexibility in expressing prior uncertainty regarding the parameters of the model.
such that the user-specified uniform prior on descendant population size is a prior on the mean size of the two descendant populations of each pair. Under my model, the sizes of the descendant populations of each pair are iid as ${\theta}_{D1}\sim \mathit{\text{Gamma}}({a}_{{\theta}_{D}},{b}_{{\theta}_{D}})$ and ${\theta}_{D2}\sim \mathit{\text{Gamma}}({a}_{{\theta}_{D}},{b}_{{\theta}_{D}})$. This relaxes the assumption that the sizes of the two descendant populations are interdependent and negatively correlated.
Flexibility in parameterizing the model
In the new implementation, I provide the ability to control the richness of the model. For the θ parameters, by default, the model is fully generalized to allow each population pair to have three parameters: θ_{ A },θ_{D1}, and θ_{D2}. Furthermore, if an investigator prefers to reduce the number of parameters, any model of θ parameters nested within this general model can also be specified, including the most restricted model where the ancestral and descendant populations of each pair share a single θ parameter.
I also provide the option of eliminating the parameters associated with the post-divergence bottlenecks in the descendant populations of each pair (τ_{ B }, ζ_{D1}, and ζ_{D2}), which constrains the descendant populations to be of constant size from present back to the divergence event. Also, rather than eliminate the bottleneck parameters, I allow ζ_{D1} and ζ_{D2} to be constrained to be equal, which removes one free parameter from the model for each of the population pairs.
Overall, my implementation allows an investigator to specify a model that has as many as seven parameters per population pair (θ_{ A },θ_{D1},θ_{D2},τ_{ B },ζ_{D1},ζ_{D2}, and m) or as few as one parameter per pair (θ), in addition to the n_{ i }−1 coalescence-time parameters (i.e., the node heights of the gene tree).
Time scale
As described above, divergence times are in units of θ_{ C }/μ generations, where θ_{ C } is the expectation of the prior on descendant-population size. As described by Oaks et al. [7], in msBayes, θ_{ C } is half of the upper limit of the continuous uniform prior on the mean of the descendant population sizes. This is only equal to the expectation of the prior if the lower limit of the prior is zero.
ABC estimation of the posterior of the model
Sampling from the prior
To estimate the approximate posterior of Equation 3, I use an ABC rejection algorithm. The first step of this algorithm entails collecting a random sample of parameter values from the joint prior and their associated summary statistics. Each sample is generated by (1) drawing values of all the model’s parameters, which I denote Λ, from their respective prior distributions; (2) simulating gene trees $\mathbf{G}=({G}_{1,1},\dots ,{G}_{Y,{k}_{Y}})$ for each locus of each population pair by drawing coalescent times from a multi-population Kingman-coalescent model given the demographic parameters; (3) simulating sequence alignments $\mathbf{X}=({X}_{1,1},\dots ,{X}_{Y,{k}_{Y}})$ along the gene trees under the HKY85 substitution parameters $\mathit{\varphi}=({\varphi}_{1,1},\dots ,{\varphi}_{Y,{k}_{Y}})$ that have the same number of sequences and sequence lengths as the observed dataset; and (4) calculating population genetic summary statistics $\mathbf{S}=({S}_{1,1},\dots ,{S}_{Y,{k}_{Y}})$ from the simulated sequence alignments. Optionally, an additional step can be performed to reduce the summary statistics to the means across loci for each population pair to get S=(S_{1},…,S_{ Y }). Either way, S contains the same summary statistics as those estimated from the observed data S^{∗}. After repeating this procedure n times, we have a random sample of parameter vectors Λ=(Λ_{1},…,Λ_{ n }) from the model prior and their associated vectors of summary statistics $\mathbb{S}=({\mathbf{S}}_{1},\dots ,{S}_{n})$.
For all of the analyses below, I use four summary statistics for each pair of populations: π[33], θ_{ W }[34], π_{ n e t }[35], and S D(π−θ_{ W }) [36]. Furthermore, in addition to model parameters, each sample Λ also contains four statistics that summarize T: the mean ($\stackrel{\u0304}{T}$), variance $\left({s}_{T}^{2}\right)$, dispersion index (${D}_{T}={s}_{T}^{2}/\stackrel{\u0304}{T}$), and the number of divergence time parameters (|τ|). Previously, these have been denoted as E(τ)V a r(τ),Ω, and Ψ, respectively [3, 4, 7]. I use $\stackrel{\u0304}{T}$ and ${s}_{T}^{2}$ in place E(τ) and V a r(τ) to make clear that these values do not represent the prior or posterior expectation/variance of divergence times. I use D_{ T } in place of Ω to clarify that this is a statistic rather than a parameter of the model. Lastly, I use |τ| in place of Ψ, because the number of divergence-time parameters is no longer a parameter in the new implementation.
Obtaining an approximate posterior from the prior samples
I use a rejection algorithm to retain an approximate posterior sample of Λ from the prior sample Λ=(Λ_{1},…,Λ_{ n }). First, the observed summary statistics S^{∗}, and the summary statistics of the prior samples $\mathbb{S}=({\mathbf{S}}_{1},\dots ,{S}_{\mathbf{n}})$, are standardized using the means and standard deviations of the statistics from the prior sample (i.e., the prior mean is subtracted from each statistic, and the difference is divided by the prior standard deviation). After all statistics are standardized, the Euclidean distance between S^{∗} and each S within is calculated. The samples that fall within a range of tolerance ε around S^{∗} are retained. The range of tolerance is determined by specifying the desired number of posterior samples to be retained. Post-hoc adjustment of the posterior sample can also be performed with a number of regression techniques [37–39]. For analyses below, I use the general linear model (GLM) regression adjustment [39] as implemented in ABCtoolbox v1.1 [40], which Oaks et al. [7] showed performs very similarly to weighted local-linear regression and multinomial logistic regression adjustments [37] for msBayes posteriors.
Ordering of taxon-specific summary statistics
As alluded to in the model description, msBayes does not maintain the order of the taxon-specific summary statistics S within each S. Rather, the summary statistics are re-ordered by descending values of average pairwise differences between the descendant populations (π_{ b }) [4, 41]. This has the advantage of reducing the sample space of possible divergence models t, but there are at least two disadvantages. First, additional information in the data is lost. By discarding the identity of the Y pairs of populations, all pair-specific information about the amount of data (e.g., the number of gene copies collected from each of the populations [n_{1} and n_{2}], the number of loci, and the length of the loci), and the taxon- and locus-specific parameters (ϕ,ν,ρ, and υ) is lost. Second, the results are more difficult to interpret, because divergence models and parameter estimates cannot be directly associated to the taxa under study.
The re-ordering of the summary statistic vectors also has an important implication for the ABC algorithm. When calculating the Euclidean distance between the observed data and each simulated dataset, the summary statistics being compared often represent sequence alignments of different taxon pairs and/or loci. More specifically, the summary statistics calculated from the observed sequence alignments are being compared to summary statistics calculated from datasets simulated with potentially different (1) numbers of sequences (n_{1} and n_{2}), (2) length of alignments, (3) numbers of loci (k), (4) HKY85 model parameters (ϕ), (5) mutation-rate multipliers (ν), and (6) ploidy multipliers (ρ).
In the original descriptions of the msBayes method [3, 4], this re-ordering is justified by the fact that the expected value of π_{ b } is unrelated to sample size n_{1} and n_{2} and thus exchangeable among pairs. This is incorrect for two reasons. First, the entire vector of summary statistics S for each pair of populations is re-ordered across pairs, which implies that the justification for re-ordering π_{ b } applies to all the statistics within each S. However, the expectations for statistics that estimate gross diversity (e.g., π and θ_{ W }) are not independent of sample size for structured populations (e.g., the divergent pairs of populations modeled by msBayes), and other statistics are not independent of sample size in general (e.g., S D(π−θ_{ W })). Second, and more importantly, having the same expectation does not ensure random variables are exchangeable. Rather, for variables to be exchangeable their marginal distributions must be the same (i.e., they must be identically distributed). None of the summary statistics used by msBayes, including π_{ b }, have this property when there is any variation among taxa or loci in the (1) numbers of sequences (n_{1} and n_{2}), (2) length of alignments, (3) numbers of loci (k), (4) HKY85 model parameters (ϕ), (5) mutation-rate multipliers (ν), or (6) ploidy multipliers (ρ). Whenever such variation is present (i.e., nearly all empirical applications), the taxon-specific summary statistics S are not exchangeable, and the reshuffling of the summary statistic vectors is not mathematically valid.
The magnitude of the affect of this violation of exchangeability is not known. Huang et al. [4] demonstrated that the reordering of the summary statistic vectors can greatly increase the method’s tendency to infer a single divergence event. By definition, if the summary statistic vectors were exchangeable, the reordering would not change the likelihood or posterior (barring sampling error). Thus, the results of Huang et al. [4] suggest the reordering of the statistics is potentially introducing sizeable error to the analysis.
For comparability with msBayes, I maintain the option for re-ordering taxon-specific summary statistics by π_{ b }. However, by default, the order is preserved, and ordered divergence models are estimated. In all of the simulation-based analyses described below, the summary statistic vectors are exchangeable, because the simulated datasets have the same (1) numbers of sequences, (2) length of sequences, (3) numbers of loci, (4) HKY85 model parameters, (5) mutation-rate multipliers, and (6) ploidy multipliers.
Assessing model-choice behavior and robustness
- 1.
The M _{ m s B a y e s } model represents the original msBayes implementation with the U-shaped prior on unordered divergence models and uniform priors on divergence-time and demographic parameters; t∼D U{1,…,Y},τ∼U(0,10),θ _{ A }∼U(0,0.05), and ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$.
The models evaluated in the simulation-based analyses
Priors | ||||
---|---|---|---|---|
Model | t | τ | θ | |
M _{ m s B a y e s } | t∼D U{1,…,Y} | τ∼U(0,10 [ 25 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ |
M _{ U s h a p e d } | t∼D U{1,…,Y} | τ∼E x p(m e a n=2.887[ 7.22M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | |
M _{ U n i f o r m } | t∼D U{a(Y)} | τ∼E x p(m e a n=2.887[ 7.22M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | |
M _{ D P P } | t∼D P(χ∼G a m m a(·,·)) | τ∼E x p(m e a n=2.887[ 7.22M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) |
- 2.
The M _{ Ushaped } model with the U-shaped prior of msBayes on unordered divergence models, but with exponential priors on divergence-time and demographic parameters; t∼D U{1,…,Y},τ∼E x p(m e a n=2.887),θ _{ A }∼E x p(m e a n=0.025),θ _{D1}∼E x p(m e a n=0.025), and θ _{D2}∼E x p(m e a n=0.025).
- 3.
The M _{ U n i f o r m } model with a uniform prior over unordered divergence models and exponential priors on divergence-time and demographic parameters; t∼D U{a(Y)},τ∼E x p(m e a n=2.887),θ _{ A }∼E x p(m e a n=0.025),θ _{D1}∼E x p(m e a n=0.025), and θ _{D2}∼E x p(m e a n=0.025).
- 4.
The M _{ D P P } model with a Dirichlet-process prior on ordered divergence models and exponential priors on divergence-time and demographic parameters; t ∼ D P(χ∼G a m m a(2,2)),τ ∼ E x p(m e a n=2.887), θ _{ A } ∼ E x p(m e a n=0.025),θ _{D1} ∼ E x p(m e a n=0.025), and θ _{D2}∼E x p(m e a n=0.025).
I selected the exponential prior on divergence time used in models M_{ D P P },M_{ U n i f o r m }, and M_{ U s h a p e d } to have the same variance as the uniform prior in model M_{ m s B a y e s }. I selected the exponential prior on population size used in models M_{ D P P }, M_{ U n i f o r m }, and M_{ U n i f o r m } to have the same mean as the uniform prior in model M_{ m s B a y e s }, so that all four models have the same θ_{ C } and thus the same units of time. All of the models were the same in other respects, with three free θ parameters for each population pair, two uniformly distributed (b e t a(1,1)) ζ_{ D } parameters per pair, no migration, no recombination, and re-sorting of taxon-specific summary statistics by π_{ b } (i.e., sampling unordered divergence models). For all simulations, I used a data structure of eight population pairs, with a single 1000 base-pair locus sampled from 10 individuals from each population.
For each of the four models, I simulated 1×10^{6} samples from the prior and 50,000 datasets, also drawn from the prior. I then analyzed each of the simulated datasets, retaining a posterior of 1000 samples from the respective prior. A GLM-regression adjusted posterior was also estimated from each of the posterior samples [39]. To assess the robustness of each of the four models, I also analyzed the datasets simulated under the other three models. Overall, for each model, I produced 200,000 posterior estimates, 50,000 from the datasets simulated under that model, and 150,000 from the datasets simulated under the other three models.
For each set of 50,000 simulated datasets, I used the posterior estimates to assess the model-choice behavior of each model. I did this by assigning the 50,000 estimates of the posterior probability of one-divergence event to 20 bins of width 0.05, and plotted the estimated posterior probability of each bin against the proportion of replicates in that bin with a true value consistent with one divergence event [7, 42]. Ideally, the estimated posterior probability of the one-divergence model should estimate the probability that the one-divergence model is correct. For large numbers of simulation replicates, the proportion of the replicates in each bin for which the one-divergence model is true will approximate the probability that the one-divergence model is the correct model. Thus, if the method has the desirable behavior such that the estimated posterior probability of the one-divergence model is an unbiased estimate of the probability that the one-divergence model is correct, the points should fall near the identity line. For example, let us say the method estimates a posterior probability of 0.90 for 1000 datasets simulated from the prior. If the method is accurately estimating the probability that the one-divergence model is correct given the data, then the one-divergence model should be the true model in approximately 900 of the 1000 replicates. Any trend away from the identity line indicates the method is biased in the sense that it is not accurately estimating the probability that the one-divergence model is the correct model.
I constructed these plots using two criteria for the one-divergence model: (1) the number of divergence-time parameters (|τ|=1) and (2) the dispersion index of divergence times (D_{ T }<0.01). For the latter, D_{ T }<0.01 has been commonly used as an arbitrary criterion for a single “simultaneous” divergence event (e.g., [3, 5, 6]). I focused on the one-divergence model to assess model-choice behavior, because it is often of biogeographic interest and is easily comparable among the three different priors used on divergence models.
In addition to the four models above, I also assessed the behavior of a model that samples over ordered divergence models (i.e., the order of the taxon-specific summary statistic vectors were maintained for the observed and simulated datasets); all other settings were identical to the M_{ D P P } model. I denote this model as M°_{ D P P }. I simulated 1×10^{6} prior samples and 50,000 datasets, and analyzed them as above. I was not able to analyze the simulated datasets of the other models under the ordered model, because the identity of the population pairs is not contained in the simulations of the other models.
Assessing power
I evaluated the power of the same four models (Table 2) to detect random variation in divergence times using methods similar to Oaks et al. [7]. For all power simulations, I used a data structure identical to that of the empirical dataset of Philippine vertebrates analyzed by Oaks et al. [7], which consists of 22 pairs of populations. Due to the larger number of pairs, I used a different hyperprior on the concentration parameter for the M_{ D P P } model; I used a prior of t∼D P(χ∼G a m m a(1.5,18.1)) over divergence models for the model M_{ D P P }. All other aspects of the four models in Table 2 were identical to those used in the validation analyses described above. For each of the four models, I generated 2 × 10^{6} samples from the prior.
Next, I simulated datasets from three series of models in which the divergence times of the 22 pairs were random (i.e., no clustering; |τ|=22). The models comprising each series differ in the variance of the distribution from which the divergence times are randomly drawn. When the variance of random divergence times is small, all of the models in Table 2 are expected to struggle to detect this variation and will often incorrectly estimate highly clustered models of divergence (i.e., few divergence events). The goal is to assess how much temporal variation in random divergence times is necessary before the behavior of the models of Table 2 begins to improve. This will determine the timescales over which the models can reliably detect random variation in divergence times and avoid spurious inference of clustered divergence models.
- 1.
The ${\mathcal{\mathcal{M}}}_{\mathit{\text{msBayes}}}$ models are identically distributed as M _{ m s B a y e s } except the divergence times for each of the 22 pairs of populations are randomly drawn from a series of uniform distributions, U(0,τ _{ m a x }), where τ _{ m a x } was set to: 0.2, 0.4, 0.6, 0.8, 1.0, and 2.0, in 4N _{ C } generations.
The models used to simulate pseudo-replicate datasets for assessing the power of the models in Table 2
Priors | ||||
---|---|---|---|---|
Model series | t | τ | θ | |
${\mathcal{\mathcal{M}}}_{\mathit{\text{msBayes}}}$ | |τ|=22 | τ∼U(0,0.2 [ 0.5 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ |
|τ|=22 | τ∼U(0,0.4 [ 1.0 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ | |
|τ|=22 | τ∼U(0,0.6 [ 1.5 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ | |
|τ|=22 | τ∼U(0,0.8 [ 2.0 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ | |
|τ|=22 | τ∼U(0,1.0 [ 2.5 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ | |
|τ|=22 | τ∼U(0,2.0 [ 5.0 M G A]) | θ_{ A }∼U(0,0.05) | ${\stackrel{\u0304}{\theta}}_{D}\sim U(0,0.05)$ | |
${\mathcal{\mathcal{M}}}_{\mathit{\text{Uniform}}}$ | |τ|=22 | τ∼U(0,0.2 [ 0.5 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | |
|τ|=22 | τ∼U(0,0.4 [ 1.0 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼U(0,0.6 [ 1.5 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼U(0,0.8 [ 2.0 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼U(0,1.0 [ 2.5 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼U(0,2.0 [ 5.0 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
${\mathcal{\mathcal{M}}}_{\mathit{\text{Exp}}}$ | |τ|=22 | τ∼E x p(m e a n=0.058 [ 0.14 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | |
|τ|=22 | τ∼E x p(m e a n=0.115 [ 0.29 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼E x p(m e a n=0.173 [ 0.43 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼E x p(m e a n=0.231 [ 0.58 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼E x p(m e a n=0.289 [ 0.72 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) | ||
|τ|=22 | τ∼E x p(m e a n=0.577 [ 1.44 M G A]) | θ_{ A }∼θ_{D1}∼θ_{D2}∼E x p(m e a n=0.025) |
- 2.
The ${\mathcal{\mathcal{M}}}_{\mathit{\text{Uniform}}}$ models are identically distributed as M _{ U n i f o r m } and M _{ D P P } except the 22 divergence times are randomly drawn from the same series of uniform priors as above.
- 3.
The ${\mathcal{\mathcal{M}}}_{\mathit{\text{Exp}}}$ models are also identically distributed as M _{ U n i f o r m } and M _{ D P P } except the 22 divergence times are randomly drawn from a series of of exponential distributions: E x p(m e a n=0.058), E x p(m e a n=0.115), E x p(m e a n=0.173), E x p(m e a n=0.231), E x p(m e a n=0.289), and E x p(m e a n=0.577). These exponential distributions have the same variance as their uniform counterparts in the first two series of models.
For each of the six models in each of the three series of models, I simulated 1000 datasets (18,000 datasets in total). I then analyzed each simulated dataset under all four prior models (Table 2), producing 72,000 posterior estimates, each with 1000 samples. I also estimated a GLM-regression adjusted posterior from each of the posterior samples [39].
An empirical application
The models used to analyze the data from the 22 pairs of taxa from the Philippines ( M ), and a subset of nine of those pairs from the Islands of Negros and Panay ( $\mathbb{M}$ )
Model | Priors |
---|---|
M _{ m s B a y e s } | t∼D U{1,…,Y} τ∼U(0,34.64 [ 17.3 M G A]) θ_{ A }∼U(0,0.01) θ_{D1},θ_{D2}∼B e t a(1,1)×2×U(0,0.01) ζ_{D1}∼U(0,1) ζ_{D2}∼U(0,1) |
M _{ U n i f o r m } | t∼D U{a(Y)} τ∼E x p(m e a n=10 [ 5 M G A]) θ_{ A }∼E x p(m e a n=0.005) θ_{D1}∼E x p(m e a n=0.005) θ_{D2}∼E x p(m e a n=0.005) |
ζ_{D1}∼B e t a(5,1) ζ_{D2}∼B e t a(5,1) | |
M _{ D P P } | t∼D P(χ∼G a m m a(1.5,18.1)) τ∼E x p(m e a n=10 [ 5 M G A]) θ_{ A }∼E x p(m e a n=0.005) θ_{D1}∼E x p(m e a n=0.005) |
θ_{D2}∼E x p(m e a n=0.005) ζ_{D1}∼B e t a(5,1) ζ_{D2}∼B e t a(5,1) | |
${\mathbf{M}}_{\mathit{\text{DPP}}}^{\mathit{\text{inform}}}$ | t∼D P(χ ∼G a m m a(1.5,18.1)) τ ∼E x p(m e a n=6 [ 3 M G A]) θ_{ A } ∼E x p(m e a n=0.005) θ_{D1} ∼E x p(m e a n=0.005) |
θ_{D2} ∼E x p(m e a n=0.005) ζ_{D1}∼B e t a(5,1) ζ_{D2}∼B e t a(5,1) | |
${\mathbf{M}}_{\mathit{\text{DPP}}}^{\mathit{\text{simple}}}$ | t∼D P(χ∼G a m m a(1.5,18.1)) τ∼E x p(m e a n=10 [ 5 M G A]) θ_{ A }=θ_{D1}=θ_{D2}∼E x p(m e a n=0.005) ζ_{D1}=ζ_{D2}=1.0 |
${\mathbb{M}}_{\mathit{\text{DPP}}}$ | t∼D P(χ∼G a m m a(1.5,5.0)) τ∼E x p(m e a n=10 [ 5 M G A]) θ_{ A }∼E x p(m e a n=0.005) θ_{D1}=θ_{D2}∼E x p(m e a n=0.005) |
ζ_{D1}=ζ_{D2}=1.0 |
To compare models that sample over ordered versus unordered models of divergence, I also analyzed the data from the subset of nine-taxon pairs that are sampled from the Islands of Negros and Panay in the Philippines. The model I used for these analyses had a Dirichlet-process prior over divergence models and two demographic parameters (θ_{ A } and (θ_{ D }) for each pair of taxa, in addition to the n_{ i }−1 coalescent times (see Table 4 for details). One of the models, which I denote $\mathbb{M}{\text{\xb0}}_{\mathit{\text{DPP}}}$, maintained the identity of the taxon pairs and sampled over ordered models of divergence, while the other (${\mathbb{M}}_{\mathit{\text{DPP}}}$) re-sorted the summary statistics of the pairs by π_{ b }, losing the identity of the taxa and thus sampled over unordered models of divergence. For both analyses, I simulated 5 ×10^{7} samples from the prior and retained an approximate posterior of 10,000 samples.
Results
Validation analyses: Estimation accuracy
In terms of estimating the variance of divergence times (D_{ T }), the models with exponentially distributed priors (M_{ U s h a p e d }, M_{ U n i f o r m }, and M_{ D P P }) perform similarly when applied to datasets generated under all four of the models in Table 2 (Additional file 1: Figure S1). The M_{ m s B a y e s } model performs similarly to these models when applied to its own datasets, but is sensitive to model violations and is more biased when applied to data generated under the other three models (Additional file 1: Figure S1). Results are similar for the GLM-adjusted estimates of D_{ T }, albeit the regression adjustment tends to improve estimates of this continuous statistic for all the models (Additional file 1: Figure S2).
The same general pattern is seen for estimates of $\stackrel{\u0304}{T}$, with (1) all four models performing similarly when applied to the data generated under the M_{ m s B a y e s } model, (2) the models with exponentially distributed priors performing similarly when applied to data generated under the other three models, and (3) the M_{ m s B a y e s } model is sensitive to model violations and is more biased whenever applied to data generated under other models (Additional file 1: Figure S3). Also, the regression adjustment tends to slightly improve estimates of this continuous statistic for all of the models (Additional file 1: Figure S4).
In terms of estimating the number of divergence events (|τ|), the M_{ D P P } model has the lowest root mean square error (RMSE) when applied to data generated under most of the models of Table 2 (Additional file 1: Figure S5). The M_{ m s B a y e s } model performs slightly better when applied to its own data, but is the worst performer when applied to data generated under other models (Additional file 1: Figure S5). There is a trend of M_{ D P P }>M_{ U n i f o r m }>M_{ U s h a p e d }>M_{ m s B a y e s } in terms of estimation accuracy as measured by RMSE when the models are applied to data generated under most of the models (Additional file 1: Figure S5). Unlike for the continuous statistics, regression adjustment of this discrete statistic tends to increase estimation bias; all of the models tend to underestimate |τ| after the GLM-adjustment (Additional file 1: Figure S6).
Validation analyses: Model-choice accuracy
I find that all four models accurately estimate the posterior probability of the one-divergence model when applied to their own datasets (i.e., when the prior is correct; see diagonal of Figures 1 and 2). The M_{ U n i f o r m } and M_{ D P P } models show robustness to prior violations and perform well when applied to data generated under other models (Figures 1 and 2). However, both are less accurate and tend to underestimate the probability of the one-divergence model when applied to the data generated under M_{ U s h a p e d } (Figures 1 and 2). In contrast, the M_{ m s B a y e s } model is biased toward overestimating the posterior probability of the one-divergence model when applied to data generated under the other three models (Figures 1 and 2). This bias is particularly strong whenever divergence models are not distributed under its U-shaped prior (Figure 1C–D). The other model with the U-shaped prior on divergence models, but exponential priors on parameters (M_{ U s h a p e d }), performs similarly to the M_{ m s B a y e s } model in that it performs well when applied to its own data, but overestimates the probability of the one-divergence model when applied to data generated by the other models (Figures 1 and 2). However, the bias is stronger in the M_{ m s B a y e s } model than M_{ U s h a p e d }.
Overall, the results suggest that the M_{ D P P } and M_{ U n i f o r m } models are relatively robust in terms of model-choice accuracy, and when model violations do cause them to be biased, they tend to under-estimate the probability of the model with a single, shared divergence event. In contrast, the M_{ m s B a y e s } model is very sensitive to model violations, and strongly over-estimates the probability of the one-divergence model whenever the model is misspecified. Furthermore, the results suggest that using exponentially distributed priors on nuisance parameters rather than uniform priors helps the M_{ U s h a p e d } model perform better than M_{ m s B a y e s }, but it is still hindered by the U-shaped prior on divergence models and tends to over-estimate the probability of the one-divergence model whenever there are violations of the model.
Validation analyses: Ordered divergence models
The results show that the method performs similarly when sampling over ordered models of divergence (Additional file 1: Figures S9 and S10). This suggests that the method is not adversely affected by the increase in the number of possible discrete models (from 22 unordered to 4140 ordered models) when there are eight pairs of populations. This is encouraging, because, as discussed above, estimating unordered models of divergence by shuffling the summary statistic vectors calculated from the sequence alignments is not valid for most empirical datasets. Given these results, estimation of unordered divergence models should be avoided for empirical applications of the method.
Power analyses: Estimation accuracy
All of the models I evaluated (Table 2) struggle to estimate the variance of divergence times D_{ T } regardless of which of the three series of models (Table 3) the data were generated under (Additional file 1: Figures S11–13). The models with the U-shaped prior on divergence models (M_{ m s B a y e s } and M_{ U s h a p e d }) tend to underestimate the variance in divergence times (Plots A–L of Additional file 1: Figures S11–13). whereas the models with Uniform or Dirichlet-process priors over divergence models tend to overestimate variance in divergence times (Plots M–X of Additional file 1: Figures S11–13).
When the divergence times of the 22 population pairs are randomly drawn from a series of exponential priors (${\mathcal{\mathcal{M}}}_{\mathit{\text{Exp}}}$), the M_{ D P P } model is the best estimator of D_{ T }, followed by M_{ U n i f o r m } (Additional file 1: Figure S11). The M_{ m s B a y e s } model is strongly biased toward underestimating D_{ T }, estimating values of zero for most of the replicates across all the data models of ${\mathcal{\mathcal{M}}}_{\mathit{\text{Exp}}}$ (Additional file 1: Figure S11). The results of the M_{ U s h a p e d } model are intermediate between those of M_{ m s B a y e s } and the new models M_{ D P P } and M_{ U n i f o r m } (Additional file 1: Figure S11).
Similarly, when the true divergence times are randomly drawn from a series of uniform priors (${\mathcal{\mathcal{M}}}_{\mathit{\text{Uniform}}}$), the M_{ D P P } and M_{ U n i f o r m } models tend to over-estimate the variance in divergence times, whereas the M_{ m s B a y e s } model underestimates D_{ T }, estimating values of zero for most replicates across all the data models of ${\mathcal{\mathcal{M}}}_{\mathit{\text{Uniform}}}$ (Additional file 1: Figure S12). Again, the performance of the M_{ U s h a p e d } model is intermediate between the M_{ m s B a y e s } and M_{ D P P }/ M_{ U n i f o r m } models (Additional file 1: Figure S12). The results are very similar when the four models are applied to the data simulated under the ${\mathcal{\mathcal{M}}}_{\mathit{\text{msBayes}}}$ series of models (Additional file 1: Figure S13).
Power analyses: Model choice
As mentioned above, the increased power of the new models is also evident when looking at the estimated posterior probability of the one-divergence model across the power analyses (Figure 4 and Additional file 1: Figures S17–19). The M_{ D P P } and M_{ U n i f o r m } models estimate low posterior probability of |τ|=1 across all of the simulations. This is in contrast to the M_{ m s B a y e s } model, which infers high posterior probabilities of a single divergence for most replicates across all simulations (Figure 4 and Additional file 1: Figures S17–19). The exponential priors on divergence-time and demographic parameters (model M_{ U s h a p e d }) result in lower estimates of the probability of one divergence when compared to M_{ m s B a y e s }, but higher estimates when compared to M_{ U n i f o r m } and M_{ D P P } (Figure 4 and Additional file 1: Figures S17–19). The M_{ D P P } and M_{ U n i f o r m } models do frequently support the one-divergence model according to a Bayes factor criterion of greater than 10, but still less frequently than the M_{ m s B a y e s } model. This result is not surprising given the extremely small prior probability of the one-divergence model under the M_{ D P P } and M_{ U n i f o r m } models (i.e., very few posterior samples of the one-divergence model will result in a large Bayes factor under these models). However, the small posterior probability of the one-divergence model estimated under M_{ D P P } and M_{ U n i f o r m } should prevent an investigator from overinterpreting the Bayes factor as strong support for clustered divergences.
Empirical results
The disparity between the results of the M_{ m s B a y e s } model and the new models is even more pronounced when looking at the 10 divergence models (t) estimated to have the highest probability under each of the models (Additional file 1: Figures S26–30). Again, the new models estimate more divergences, a large amount of posterior uncertainty, and an order of magnitude smaller probability for their respective MAP-divergence model when compared to the M_{ m s B a y e s } model (Additional file 1: Figures S26–30).
The small difference between the results of the ${\mathbb{M}}_{\mathit{\text{DPP}}}$ and $\mathbb{M}{\text{\xb0}}_{\mathit{\text{DPP}}}$ models is consistent across multiple analyses, and thus could be due to error introduced to the ${\mathbb{M}}_{\mathit{\text{DPP}}}$ model by the invalid shuffling of the summary statistic vectors. Both models estimate a similar set of 10 unordered divergence models with the highest posterior probability (Additional file 1: Figures S31 and S32).
The main advantages of the $\mathbb{M}{\text{\xb0}}_{\mathit{\text{DPP}}}$ model over the ${\mathbb{M}}_{\mathit{\text{DPP}}}$ are that (1) the incorrect shuffling of the summary statistic vectors is avoided, (2) the identity of the taxa is maintained, and thus a fully marginalized estimate of divergence times across the taxa can be obtained (Additional file 1: Figure S33), and (3) the probability of co-divergence among any set of taxa can be estimated from the posterior sample.
Discussion
My results demonstrate that using alternative priors on parameters and divergence models improved the behavior of the msBayes model. In the new implementation, model-choice estimation is more accurate and shows greater robustness to model violations (Figures 1 and 2). The original model is very sensitive to violations and, when present, strongly over-estimates the probability of one-divergence event shared across all taxa (Figures 1 and 2). When more appropriate priors are used for divergence-time and demographic parameters, and either a Dirichlet-process or uniform prior applied across divergence models, the model is less sensitive to violations, and, when violations do cause bias, the method tends to underestimate the probability of models with temporally clustered divergences (Figures 1 and 2). Given that clustered models are often of particular interest to biogeographers, this behavior of the new method can be considered conservative.
The modifications also improve the method’s power to detect random variation in divergence times, reducing the tendency to estimate clustered divergences (Figures 3, 4, 5 and 6). My results are similar to those of Oaks et al. [7] in that I find msBayes will often infer strong support for clustered divergences when divergences are random over quite broad timescales (Figures 3, 4, 5 and 6). My results expand on this by showing that this behavior is consistent across a range of conditions underlying the data. The new method, dpp-msBayes, has greater power to detect random temporal variation in divergences, is less prone to spurious inference of clustered divergence models, and much less likely to incorrectly infer such models with strong support (Figures 3, 4, 5 and 6).
By evaluating a model intermediate between the old and new implementation (M_{ U s h a p e d }), I was able to determine the relative affects of my modifications to the model. Across all of the analyses, the results show that using better priors on divergence-time and demographic parameters alone does improve the performance of the method. The magnitude of the bias toward inferring support for the one-divergence model when there are model violations is reduced when the exponential priors are used in place of the uniform priors (Figures 1 and 2). Furthermore, using exponential priors improves the method’s power to detect temporally random divergences (Figures 3, 4, 5 and 6). Throughout the analyses, the intermediate model (M_{ U s h a p e d }) performs better than the msBayes model, but not as well as the models with alternative priors on divergence models. This suggests, as predicted by Oaks et al. [7, 15], that the tendency of msBayes to erroneously support models of temporally clustered divergences is caused by a combination of (1) small marginal likelihoods of models with more τ parameters due to uniform priors on divergence-time and demographic parameters and (2) the U-shaped prior on divergence models giving low prior density to models with intermediate numbers of divergence parameters. The former essentially rules out models with many τ parameters, which causes the latter to act like an "L-shaped" prior with a spike of prior density on the one-divergence model. Given the parameter richness of the model and the relatively small amount of information in the summary statistics, it is not surprising that the combination of these two factors can create a strong tendency to infer clustered models of divergence.
While the modifications improve the behavior of the model, I urge caution when using the method and interpreting its results. The method attempts to approximate the posterior of a very parameter-rich model using relatively little information from the data. For example, when applied to the dataset of 22 taxon pairs from the Philippines [7], the model has as many as 604–625 free parameters (depending on |τ|), and samples over 1002 unordered divergence models. Even under the simplest possible model allowed under the new implementation, the model still has 471–492 free parameters. Furthermore, the stochastic coalescent and mutational processes being modeled predict a large amount of variation in possible datasets even when the parameter values are known. The richness and stochastic nature of the model makes for a difficult inference problem, especially when using a small number of summary statistics calculated from the sequence alignments of each taxon pair. The population-genetic summary statistics used by the method contain little information about many of the free parameters in the model. Thus, I expect the improved method will still be sensitive to priors, and the power, while improved, may still be low. While there is much less prior sensitivity under the new model compared to those observed by Oaks et al. [7], there is still an effect when comparing the results of the empirical data analyzed under a diffuse (M_{ D P P }) and informative $\left({\mathbf{M}}_{\mathit{\text{DPP}}}^{\mathit{\text{inform}}}\right)$ divergence-time prior (Figure 7C versus D). The fact that the posterior shifts toward the prior under the informative prior suggests that the shift away from the prior toward fewer divergence events under the diffuse prior might still be caused by small marginal likelihoods of models with more divergence-time parameters (Figure 7).
Nonetheless, it is reassuring to see a large amount of posterior uncertainty when the new implementation is applied to the empirical datasets (Figures 7 and 8). Applications of the msBayes model often result in strong posterior support for estimated scenarios (e.g., [3, 5–12]), as I found here (Figure 7). Given the richness of the model, the variance of the processes being modeled, and the relatively small amount of information in the summary statistics calculated from the sequence data, finding strong posterior support for any scenario is unexpected. Based on results of the empirical and power analyses (Figures 4, 6, 7 and 8), the new implementation more accurately reflects posterior uncertainty and avoids spurious support for biogeographical scenarios.
I also urge caution when using dpp-msBayes due to the lack of theoretical validation of Bayesian model choice when the full data are replaced by summary statistics that are insufficient for discriminating across models under comparison [44], which is certainly the case here. Robert et al. [44] demonstrated that ABC estimates of model posterior probabilities can be inaccurate when such across-model insufficient statistics are used.
Given all of these caveats, I encourage investigators to view this method as a means of exploring their data for general temporal patterns of divergences across taxa, rather than a rigorous means of evaluating hypotheses. As recommended by Oaks et al. [7], any results from the method should be accompanied by (1) analyses under a variety of priors to assess the assumptions underlying model inference and the prior sensitivity of the results, and (2) simulation-based power analyses to provide insight into the temporal resolution of the method. Both approaches are important to help guide the interpretation of results.
Given the difficulty of this estimation problem, I anticipate that full-likelihood methods that can leverage all of the information present in the sequence data will become increasingly important for robustly estimating shared evolutionary history across taxa [45]. With improving numerical methods for sampling over models of differing dimensionality [46, 47], advances in Monte Carlo techniques [48], and increasing efficiency of likelihood calculations [49], analyzing rich comparative phylogeograpical models in a full-likelihood Bayesian framework is becoming computationally practical, especially when considering that simulating millions of random datasets from the prior under the simple ABC rejection approach is inefficient and computationally nontrivial.
Conclusions
I introduced a new model for estimating shared divergence histories across taxa from DNA sequence data within an approximate-Bayesian model-choice framework. The new method, dpp-msBayes, takes a non-parametric approach to the problem by using a Dirichlet-process prior on the temporal distribution of divergences across taxa. The new method shows improved robustness, accuracy, and power compared to the existing method, msBayes. Compared to msBayes, the new approach better estimates posterior uncertainty, which greatly reduces the chances of incorrectly estimating biogeographical scenarios of shared divergence events. This is important, because models of shared divergence events are often ofparticular interest to researchers who employ these methods. This new tool will allow evolutionary biologists to better leverage comparative genetic data to assess the affects of regional and global biogeographical processes on biodiversity.
Declarations
Acknowledgements
I am greatful to Melissa Callahan, Mark Holder, Emily McTavish, Daniel Money, Jordan Koch, Adam Leaché, Peter Foster, two anonymous reviewers, and Christian Robert and his blog (http://xianblog.wordpress.com/) for insightful comments that greatly improved this work. I thank the National Science Foundation for supporting this work (DEB 1011423 and DBI 1308885). This work was also supported by the University of Kansas (KU) Office of Graduate Studies, Society of Systematic Biologists, Sigma Xi Scientific Research Society, KU Department of Ecology and Evolutionary Biology, and the KU Biodiversity Institute. I also thank Mark Holder, the KU Information and Telecommunication Technology Center, KU Computing Center, and the iPlant Collaborative for the computational support necessary to conduct the analyses presented herein.
Authors’ Affiliations
References
- Hudson RR: Gene genealogies and the coalescent process. Oxf Surv Evol Biol. 1990, 7 (1): 1-44.Google Scholar
- Wakeley J: Coalescent Theory: An Introduction. 2009, Greenwood Village, Colorado, USA: Roberts and Company PublishersGoogle Scholar
- Hickerson MJ, Stahl EA, Lessios HA: Test for simultaneous divergence using approximate Bayesian computation. Evolution. 2006, 60 (12): 2435-2453.PubMedView ArticleGoogle Scholar
- Huang W, Takebayashi N, Qi Y, Hickerson MJ: MTML-msBayes: Approximate Bayesian comparative phylogeographic inference from multiple taxa and multiple loci with rate heterogeneity. BMC Bioinformatics. 2011, 12: 1-doi:10.1186/1471-2105-12-1PubMedPubMed CentralView ArticleGoogle Scholar
- Crews SC, Hickerson MJ, Leaché AD: Two waves of diversification in mammals and reptiles of Baja California revealed by hierarchical Bayesian analysis. Biol Lett. 2007, 3 (6): 646-650. doi:10.1098/rsbl.2007.0368PubMedPubMed CentralView ArticleGoogle Scholar
- Stone GN, Lohse K, Nicholls JA, Fuentes-Utrilla P, Sinclair F, Schönrogge K, Csóka G, Melika G, Nieves-Aldrey J-L, Pujade-Villar J, Tavakoli M, Askew RR, Hickerson MJ: Reconstructing community assembly in time and space reveals enemy escape in a Western Palearctic insect community. Curr Biol. 2012, 22 (6): 532-537.PubMedView ArticleGoogle Scholar
- Oaks JR, Sukumaran J, Esselstyn JA, Linkem CW, Siler CD, Holder MT, Brown RM: Evidence for climate-driven diversification? a caution for interpreting ABC inferences of simultaneous historical events. Evolution. 2013, 67 (4): 991-1010. doi:10.1111/j.1558-5646.2012.01840.xPubMedView ArticleGoogle Scholar
- Barber BR, Klicka J: Two pulses of diversification across the Isthmus of Tehuantepec in a montane Mexican bird fauna. Proc R Soc B-Biol Sci. 1694, 277: 2675-2681. doi:10.1098/rspb.2010.0343View ArticleGoogle Scholar
- Carnaval AC, Hickerson MJ, Haddad CFB, Rodrigues MT, Moritz C: Stability predicts genetic diversity in the Brazilian Atlantic forest Hotspot. Science. 2009, 323 (5915): 785-789. doi:10.1126/science.1166955PubMedView ArticleGoogle Scholar
- Chan LM, Brown JL, Yoder AD: Integrating statistical genetic and geospatial methods brings new power to phylogeography. Mol Phylogenet Evol. 2011, 59 (2): 523-537. doi:10.1016/j.ympev.2011.01.020PubMedView ArticleGoogle Scholar
- Plouviez S, Shank TM, Faure B, Daguin-Thiebaut C, Viard F, Lallier FH, Jollivet D: Comparative phylogeography among hydrothermal vent species along the East Pacific Rise reveals vicariant processes and population expansion in the South. Mol Ecol. 2009, 18 (18): 3903-3917. doi:10.1111/j.1365-294X.2009.04325.xPubMedView ArticleGoogle Scholar
- Voje KL, Hemp C, Saetre G-P, Stenseth NC, Flagstad Ø: Climatic change as an engine for speciation in flightless Orthoptera species inhabiting African mountains. Mol Ecol. 2009, 18 (1): 93-108. doi:10.1111/j.1365-294X.2008.04002.xPubMedGoogle Scholar
- Jeffreys H: Theory of Probability. 1939, Oxford, UK: Clarendon PressGoogle Scholar
- Lindley DV: A statistical paradox. Biometrika. 1957, 44: 187-192.View ArticleGoogle Scholar
- Oaks JR, Linkem CW, Sukumaran J: Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al. 2014. arXiv:1402.6397 [q-bio.PE] http://arxiv.org/abs/1402.6397.
- Hickerson MJ, Stone GN, Lohse K, Demos TC, Xie X, Landerer C, Takebayashi N: Recommendations for using msbayes to incorporate uncertainty in selecting an ABC model prior: A response to Oaks et al. Evolution. 2014, 68 (1): 284-294. doi:10.1111/evo.12241PubMedView ArticleGoogle Scholar
- Hasegawa M, Kishino H, Yano T-A: Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985, 22 (2): 160-174.PubMedView ArticleGoogle Scholar
- Felsenstein J: Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol. 1981, 17: 368-376.PubMedView ArticleGoogle Scholar
- Kingman JFC: The coalescent. Stochastic Process Appl. 1982, 13: 235-248.View ArticleGoogle Scholar
- Bell ET: Exponential numbers. Am Math Month. 1934, 41: 411-419.View ArticleGoogle Scholar
- Sloan NJA: The on-line encyclopedia of integer sequences, sequence A000041. http://oeis.org/A000041.
- Sloan NJA: The on-line encyclopedia of integer sequences, Sequence A008284. http://oeis.org/A008284.
- Malenfant J: Finite, closed-form expressions for the partition function and for Euler, Bernoulli, and Stirling numbers. 2011, arXiv:1103.1585v6 [math.NT] http://arxiv.org/abs/1103.1585.Google Scholar
- Ferguson TS: A Bayesian analysis of some nonparametric problems. Ann Stat. 1973, 1 (2): 209-230.View ArticleGoogle Scholar
- Antoniak CE: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat. 1974, 2 (6): 1152-1174.View ArticleGoogle Scholar
- Lartillot N, Philippe H: A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004, 21 (6): 1095-1109. doi:10.1093/molbev/msh112PubMedView ArticleGoogle Scholar
- Huelsenbeck JP, Andolfatto P: Inference of population structure under a Dirichlet process model. Genetics. 2007, 175 (4): 1787-1802. doi:10.1534/genetics.106.061317PubMedPubMed CentralView ArticleGoogle Scholar
- Huelsenbeck JP, Suchard MA: A nonparametric method for accomodating and testing across-site rate variation. Syst Biol. 2007, 56 (6): 975-987. doi:10.1080/10635150701670569PubMedView ArticleGoogle Scholar
- Larget B, Baum DA, Smith SD, Rokas A, Ané C: Bayesian estimation of concordance among gene trees. Mol Biol Evol. 2007, 24 (2): 412-426. doi:10.1093/molbev/msl170PubMedGoogle Scholar
- Heath TA, Holder MT, Huelsenbeck JP: A Dirichlet process prior for estimating lineage-specific substitution rates. Mol Biol Evol. 2011, 29 (3): 939-955. doi:10.1093/molbev/msr255PubMedPubMed CentralView ArticleGoogle Scholar
- Heath TA: A hierarchical Bayesian model for calibrating estimates of species divergence times. Syst Biol. 2012, 61 (5): 793-809. doi:10.1093/sysbio/sys032PubMedPubMed CentralView ArticleGoogle Scholar
- Escobar MD, West M: Bayesian density estimation and inference using mixtures. J Am Stat Assoc. 1995, 90 (430): 577-588.View ArticleGoogle Scholar
- Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105 (2): 437-460.PubMedPubMed CentralGoogle Scholar
- Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975, 7 (2): 256-276.PubMedView ArticleGoogle Scholar
- Takahata N, Nei M: Gene genealogy and variance of interpopulational nucleotide differences. Genetics. 1985, 110 (2): 325-344.PubMedPubMed CentralGoogle Scholar
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123 (3): 585-595.PubMedPubMed CentralGoogle Scholar
- Beaumont M, Zhang W, Balding DJ: Approximate Bayesian computation in population genetics. Genetics. 2002, 162: 2025-2035.PubMedPubMed CentralGoogle Scholar
- Blum MGB, François O: Non-linear regression models for Approximate Bayesian Computation. Stat Comput. 2009, 20 (1): 63-73.View ArticleGoogle Scholar
- Leuenberger C, Wegmann D: Bayesian computation and model selection without likelihoods. Genetics. 2010, 184: 243-252. doi:10.1534/genetics.109.109058PubMedPubMed CentralView ArticleGoogle Scholar
- Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L: ABCtoolbox: a versatile toolkit for approximate bayesian computations. BMC Bioinformatics. 2010, 11: 116-doi:10.1186/1471-2105-11-116PubMedPubMed CentralView ArticleGoogle Scholar
- Nei M, Li W-H: Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Nat Acad Sci. 1979, 76 (10): 5269-5273. doi:10.1073/pnas.76.10.5269PubMedPubMed CentralView ArticleGoogle Scholar
- Huelsenbeck JP, Rannala B: Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst Biol. 2004, 53 (6): 904-913.PubMedView ArticleGoogle Scholar
- Oaks JR, Sukumaran J, Esselstyn JA, Linkem CW, Siler CD, Holder MT, Brown RM: Evidence for climate-driven diversification? a caution for interpreting ABC inferences of simultaneous historical events. Dryad Digital Repository. 2012, doi:10.5061/dryad.5s07mGoogle Scholar
- Robert CP, Cornuet J-M, Marin J-M, Pillai NS: Lack of confidence in approximate Bayesian computation model choice. Proc Nat Acad Sci. 2011, 108 (37): 15112-15117.PubMedPubMed CentralView ArticleGoogle Scholar
- Sukumaran J: Geographies and genealogies: Phylogeographic simulation and Bayesian approaches to statistical phylogeographic model selection. PhD thesis,. 2012, University of KansasLawrence, Kansas, USAGoogle Scholar
- Green PJ: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995, 82 (4): 711-732.View ArticleGoogle Scholar
- Lemey P, Rambaut A, Drummond AJ, Suchard MA: Bayesian phylogeography finds its roots. PLoS Comput Biol. 2009, 5 (9): 1000520-View ArticleGoogle Scholar
- Sankararaman S, Jordan MI, Bouchard-Côté A: Phylogenetic inference via sequential Monte Carlo. Syst Biol. 2012, 61 (4): 579-593. doi:10.1093/sysbio/syr131PubMedPubMed CentralView ArticleGoogle Scholar
- Ayres DL, Darling A, Zwickl DJ, Beerli P, Holder MT, Lewis PO, Huelsenbeck JP, Ronquist F, Swofford DL, Cummings MP, Rambaut A, Suchard MA: BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics. Syst Biol. 2012, 61 (1): 170-173.PubMedPubMed CentralView ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.