Inferring ancestral states without assuming neutrality or gradualism using a stable model of continuous character evolution

Background The value of a continuous character evolving on a phylogenetic tree is commonly modelled as the location of a particle moving under one-dimensional Brownian motion with constant rate. The Brownian motion model is best suited to characters evolving under neutral drift or tracking an optimum that drifts neutrally. We present a generalization of the Brownian motion model which relaxes assumptions of neutrality and gradualism by considering increments to evolving characters to be drawn from a heavy-tailed stable distribution (of which the normal distribution is a specialized form). Results We describe Markov chain Monte Carlo methods for fitting the model to biological data paying special attention to ancestral state reconstruction, and study the performance of the model in comparison with a selection of existing comparative methods, using both simulated data and a database of body mass in 1,679 mammalian species. We discuss hypothesis testing and model selection. The stable model outperforms Brownian and Ornstein-Uhlenbeck approaches under simulations in which traits evolve with occasional large “jumps” in their value, but does not perform markedly worse for traits evolving under a truly Brownian process. Conclusions The stable model is well suited to a stochastic process with a volatile rate of change in which biological characters undergo a mixture of neutral drift and occasional evolutionary events of large magnitude.


Background
Statistical methods that take into account the dependencies introduced into comparative data by phylogenetic relatedness are fundamental to hypothesis testing and exploration in comparative biology [1,2]. Each comparative method implicitly imputes to the evolutionary process some specific stochastic model [3]. The validity of phylogenetic comparative methods depends on the degree to which historical events can be accommodated by the underlying stochastic model of trait evolution, and a mismatch between model and reality can yield erroneous statistical results, especially in the reconstruction of inaccurate ancestral character states [4][5][6][7]. For this reason it is important to develop realistic stochastic models of *Correspondence: me378@cam.ac.uk 1 St. John's College, University of Cambridge, Cambridge CB2 1TP, UK Full list of author information is available at the end of the article character evolution (and to constrain those models using empirical data where possible). The Brownian motion model of evolution -in which the value of a continuous trait evolves by accruing incremental changes drawn from a random distribution with zero mean and finite constant variance, such that the sum of many increments is distributed according to a normal density [1] -was introduced to model changes in gene frequencies by Cavalli-Sforza and Edwards [8] but now underlies (directly or indirectly) a range of popular methods for the analysis of continuous traits distributed over phylogenetic trees. In the realm of ancestral state reconstruction the value of a trait evolving across a phylogenetic tree can at all times be shown to be probabilistically distributed according to a normal distribution with variance depending only on tree topology and branch lengths [9][10][11], while in the realm of regression analysis the model yields normally distributed residuals and a covariance matrix depending only on tree topology and http://www.biomedcentral.com/1471-2148/14/226 branch lengths [12], both cases giving rise to simple and analytically-tractable solutions. The Brownian walk of a trait value can be regarded as a model of gradualistic neutral evolution since variation in the trait arises from a process of random drift over the branches of a phylogeny at a constant rate and without directionality. A further application thus involves identifying significant departures from the expectation of the Brownian motion model as a means of detecting adaptive variation in the tempo and mode of trait evolution [13][14][15].
We here describe a simple generalization of the Brownian motion model of continuous character evolution which extends the model to include cases where increments to an evolving character may arise from a symmetric stochastic process but without assuming constant finite variance. Not only is the Brownian assumption of constant finite variance, to the best of our knowledge, unverified in existing biological systems, but its relaxation may better suit the domain of application of the standard Brownian motion model, namely biological characters likely subject to some degree of selection, and in many cases it may offer a more robust form of statistical inference with respect to outliers in the data. Broadly speaking, we conceptualize diversifying selection on continuous characters as causing an increase and purifying selection on such characters a decrease in the rate of evolutionary change. The relative frequencies of these forms of natural selection along with neutral drift are expected to generate -for sums of evolutionary increments over long periods of time -limit distributions with heavier tails than expected under the Brownian motion model.
Just as the limit distribution of sums of variates drawn from a distribution with constant finite variance is the normal distribution, so the limit distribution of sums of variates drawn from a distribution without fixed finite variance is the stable distribution [16][17][18], which has the normal distribution as a special case and which otherwise is characterized by heavy tails, closure under convolution and, potentially, by skew [19]. Stochastic processes and random walks with volatile variance and heavy tails have previously been modelled robustly using stable distributions in areas as diverse as the study of fractional diffusion [20], earthquake forecasting [21], signal processing [22], animal foraging [23], rainfall modelling [24], commodity pricing [25], real estate markets [26], foreign exchange rates [27], financial statistics [28], image processing [29] and telecommunications management [30]. Landis et al. [31] recently used a Brownian motion model with Lévy stable jumps to model jumps in the evolution of continuous traits. According to our view of selection resulting in the generation of evolutionary rate volatility we limit our attention in this paper to the symmetric zero-centred stable distribution parameterized by α, the index of stability and c, the scale. We model evolution using the stable generalization of Brownian motion, the stable random walk [17]. The stable random walk shares some attractive properties of Brownian motion, the most important being closure under convolution, such that the sum of several stable distributions is itself a stable distribution with the same α parameter. This closure under linear transformation contrasts with all other non-stable heavy tailed distributions, which may yield complex and analytically intractable mixtures in convolution. Thus, after accumulating increments from a symmetrical zero-centred stable distribution, the value of an evolving trait is always probabilistically distributed according to a stable distribution with mean equal to the trait value prior to accumulation and with scale proportional to the number of increments, or in phylogenetic parlance the branch length. Figure 1 illustrates the log probability densities of some unit-scale zero-centred symmetrical stable distributions that differ in the value of α, including the special cases of the normal distribution (α = 2) and the Cauchy distribution (α = 1). Figure 2 illustrates some stable random walks driven by accumulation of increments from stable noise with various α values. It is clear from inspection of these figures that declining α is associated with increasingly heavy tails, translating into increasingly volatile random walks which exhibit Brownian-like drift (associated with increments drawn from the high-probability modal region of the underlying distribution) interspersed with occasional rapid jumps in trait value (associated with increments drawn from the low-probability heavy tails).
As a motivating example of how the accommodation of evolutionary rate volatility may affect the inference of evolutionary processes, we present a simple toy model of continuous character evolution on a phylogenetic tree with six tips. The values of an evolving character were simulated under the Brownian motion model and values at internal nodes were reconstructed based on values at the tips only, first by fitting a Brownian motion model and second by fitting a stable model using the method presented later in this paper. Next, the value of the character on a single tip was artificially inflated by a factor of ten, to represent a bout of adaptive evolution (or other event such as measurement error) on the branch leading to that tip. Again, values at internal nodes were reconstructed under both models. Results are illustrated in Figure 3.
Both reconstruction methods yield similar ancestral states for the original Brownian motion data, but differ for the manipulated data. The Brownian motion model exhibits an "averaging effect" in which the apparently high rate of evolution resulting from the manipulation is distributed somewhat evenly over all the branches, causing a large increase in estimated ancestral states for several internal Figure 3 Ancestral state reconstruction of data simulated under Brownian motion. Top row: phylogenies exhibiting ancestral state reconstructions; each tip is labeled with the known character state, while at each internal node, upper values represent the Brownian motion maximum likelihood reconstruction and lower values represent the reconstruction under the stable Markov chain Monte Carlo model to be described in this paper (left: original simulated data; right: the character value of a single tip has been multiplied by ten to simulate rapid evolution (or measurement error) on a single branch). Bottom row: the marginal probability density derived from MCMC for the ancestral state assignment of the root node using unmodified (left) and modified (right) data; the Brownian motion marginal probability density is grey and the stable margin probability density is black. http://www.biomedcentral.com/1471-2148/ 14/226 nodes. The stable model, however, can entertain rare increments of large magnitude, and so is not strongly affected by the manipulation; the apparently high rate of evolution on a single branch can be accommodated as a rare event in a heavy-tailed stochastic process.
In this paper we describe the stable model and the procedures used to fit it to empirical data, with an emphasis on ancestral state reconstruction. We outline strategies for hypothesis testing and model selection. We apply the stable model to simulated data in order to estimate error rates of the model selection procedure, and also use it, alongside a number of alternative models, in an illustrative case study of ancestral state reconstruction based on a large dataset of mammalian body masses. Finally we discuss the relationship between the stable model of trait evolution and a number of alternative extensions of the Brownian motion model that have been proposed in the literature, and suggest avenues for further development.

Model
Consider a rooted phylogenetic tree T which may or may not contain polytomies. A continuous character X evolves along the branches of T , taking values b 1 and b 2 at the beginning and end, respectively, of each branch b. Under the standard Brownian motion model of evolution, the continuous character evolves by accumulating random independent increments drawn from a probability distribution with constant mean zero and constant finite variance σ 2 . According to the central limit theorem, the sum of such increments along a branch b of length t b is probabilistically distributed according to a normal density with mean zero and variance t b σ 2 , a density which we denote φ b 2 − b 1 ; t b σ 2 . Given the independence of increments, and therefore of branches, the likelihood of an ancestral state reconstruction of a continuous character evolving under Brownian motion is given by the product: If the variance of the increment generating distribution is not constant and finite (as we suppose to be the case under departures from neutrality and gradualism) then according to the generalized central limit theorem the limit distribution for the sum of random independent variates is not normal but falls into the more general class of stable distributions, parameterized by an index of stability α and a scale c. The symmetric stable distribution has probability density denoted S(x; α, c). Following Matsui and Takemura [33], the unit stable density with c = 1 may be defined as: and the general symmetrical stable density S(x; α, c) is a transformation of the unit stable density S x c ; α, 1 /c. One important property of the stable distribution is that the normal distribution is a special case with α = 2. For the zero-centred symmetrical cases treated here, we note that: Furthermore, the sum of t variates drawn from a stable distribution S(α, c) is distributed as S α, (tc α ) 1 α . Thus, under a stable model of evolution, the likelihood of an ancestral state reconstruction of X is given by: which is functionally identical to Eq. 1 when α = 2.
Unfortunately, there is no analytical solution to the stable probability density function in Eq. 2, so it is necessary to employ numerical methods [19,[33][34][35][36] to calculate likelihoods of stable models. The model does not lend itself to direct maximum likelihood estimation of parameters due to the existence of a highly multi-modal likelihood surface and because arbitrarily high likelihoods can be obtained by setting b 1 = b 2 for any single internal branch b and having c → 0, a problem exhibited by other statistical models with non-constant variance [37], and circumvented through the placing of an appropriate prior on the scale parameter that penalizes the approach to zero, and fitting the model using a Bayesian approach. Since numerical estimation methods are unreliable under extremely heavy tails (i.e. α < 0.2) [33] we apply flat or triangular priors to the index of stability on the domain 0.2 < α ≤ 2, and a loose inverse gamma prior on the scale parameter which has Pr(c → 0) → 0 (see section on Choice of Priors below).

MCMC estimation
Markov chain Monte Carlo (MCMC) methods are widely used to estimate complex multivariate probability densities in numerous biological fields. The goal of such methods is to generate a sample from a probability distribution by constructing a Markov chain that has the desired distribution as its equilibrium density. A common strategy is to utilize a Metropolis-Hastings sampler [38] in which the statistical model is initialized with some set of parameter values θ, a new candidate parameter θ is generated by a symmetrical proposal distribution, and accepted as the next step of the Markov chain with probability equal to P(θ )/P(θ). We found that the Metropolis-Hastings sampler performed poorly in estimating ancestral states and parameters of the stable model due to the multimodality of the local likelihood surface ( Figure 4) and the non-independence of ancestral state values, which together generate numerous very small "islands" of high likelihood which are unlikely to be explored by the Markov chain in a reasonable amount of time. This results from the fact that proposals generated by the Metropolis-Hastings algorithm are constrained by the proposal distribution to be close in value to the current parameter value. Modified versions of the procedure, such as Metropoliscoupled Markov chain Monte Carlo [39] did not yield any benefit.
We found that implementation of a slice sampler [40] manifestly improved the mixing of Markov chains. New proposals under a slice sampler are drawn from the entire range of possible values of the parameter and are not restricted by a proposal distribution to be close in value to the current estimate. Instead of accepting new proposals according to the likelihood ratio criterion, slice samplers accept all new proposals but tend to make proposals that have likelihood similar to the current likelihood of the chain. This allows large jumps across widely dispersed peaks in the multi-modal likelihood surface at each step of the Markov chain, offering an "escape" from local suboptimal peaks in the likelihood surface. To be specific, each step of the chain involves the value of each individual parameter θ i being replaced by a new value θ i drawn randomly from the conditional probability distribution P θ i |θ j , θ k , . . . .
The procedure for generating new proposals under slice sampling is a three-step process. First, the conditional proability of the current parameter, given fixed values for all other parameters in the model, P θ i |θ j , θ k , . . . , is estimated. This is a numerical value between zero and one that is proportional to the likelihood of the current model. Second, a random number y is drawn from the uniform distribution between zero and the calculated conditional probability. Third, we identify the set of all possible parameter values which would have conditional probability P θ i |θ j , θ k , . . . greater than y and accept a new proposal at random from this set. If y is very close to the current conditional probability, the set of proposals with conditional probability greater than y will tend to include large numbers that fit the data better than the current model. Selecting a proposal at random from this set would tend to increase the likelihood of the Markov chain over time. However, when y is much less than the current conditional probability, the set of proposals with conditional probability higher than y will also include many candidates with lower likelihood, and selecting a proposal from the set would permit the chain to decline in likelihood over time. Neal (2003) has shown that the stationary distribution of such a Markov chain constitutes a sample from the complete posterior distribution that we are attempting to characterize. Critically, for a highly multi-modal posterior distribution, slice sampling permits large jumps away from suboptimal likelihood peaks even when the sampled distribution exhibits widely separated modes. The procedure is illustrated graphically in Figure 5.

Choice of priors
We specify prior distributions on the stable parameters α and c, denoted Pr(α) and Pr(c) respectively. We have found that choice of priors on the index of stability have little effect on the ancestral state reconstruction, while the prior on the scale parameter effects the smoothness of the posterior density ( Figure 4, right panel). How should we choose appropriate priors for the Monte Carlo simulation? We believe that the Brownian motion model is a useful null model for the evolutionary process. Since the Brownian motion model is a special case of the stable model it is conservative to choose priors that maximize the a priori likelihood of the Brownian model fit. Consider Equations 1 and 5. If we set α = 2 and c = σ/ √ 2 then the stable model will be identical to the inferred Brownian motion model. So, priors with maximum likelihood at α = 2 and c = σ/ √ 2 are conservative with respect to the null hypothesis.
We use a uniform or triangular distribution between 0.2 and 2 for Pr(α). We do not permit α < 0.2 since numerical estimation methods are unreliable at this extreme of heavy tails [33]; if a triangular rather than uniform prior is selected then its maximum is set equal to 2.
We use an inverted gamma distribution for our prior on c, though of course in principle the stable model accommodates any reasonable priors that maintain 0 < α ≤ 2 and c > 0. Our choice of Pr(c) is motivated by the fact that the inverted gamma distribution has an appropriate shape for the scale of a stable distribution and, as the conjugate prior for a normal variance, it is commonly used as the prior on the unknown variance of a normal distribution with fixed mean [41]. It is defined as: with a > 0 and b > 0. As suggested above, it seems conservative to suppose that the rate of evolutionary change with highest a priori likelihood should be the rate of change imputed by the neutral Brownian motion reconstruction, denoted σ in Equation 1. This rate of evolution is readily calculated from ancestral states generated by a squared change parsimony ancestral state reconstruction or other methods; hence, the inverted gamma prior on the scale of the evolutionary process should have its mode at σ/sqrt2. The mode of an inverted gamma distribution is defined as x = b/(a − 1), so the values of the prior hyperparameters a and b should obey the following constraints: a = −1 + b √ 2/σ and b > σ/ √ 2. So, the value of a is entirely determined and the appropriate value of b is easily obtained by linear search or Newton-Raphson optimization, maximizing the likelihood of Equation 1.
In this way, our use of the Brownian motion model as a null hypothesis allows us to choose directly some reasonable and conservative priors on the more general stable model, in which the Brownian motion ancestral state reconstruction has highest prior likelihood. By default, the software accompanying this paper uses this approach, but can use custom values of the prior hyperparameters a and b upon request.

Hypothesis testing and model selection
It is desirable to formulate a statistical model selection criterion to determine whether the stable model of continuous character evolution (with α < 2) fits the data better than than the Brownian motion model (with α = 2), as a means of estimating the best possible ancestral state http://www.biomedcentral.com/1471-2148/14/226 estimates and of testing the hypothesis that a set of character data at the tips of a phylogeny exhibits patterns consistent with departure of the evolutionary process from neutrality. For maximum likelihood approaches, Akaike's Information Criterion (AIC) [42] is a popular choice. However, since AIC depends upon maximized likelihood, while our Markov chain Monte Carlo procedure only generates a sample from the posterior (conditional) distribution of stable parameters and ancestral states, it is difficult to calculate Akaike's Information Criterion in this case.
There exist a number of Bayesian generalizations of AIC which may be calculated from posterior MCMC samples, most notably the Deviance Information Criterion (DIC) [43]. Statistical deviance is a quantity that, for our purposes, can be defined as a measure of goodness of fit that is calculated at each step in the Markov chain, and is equal to minus twice the log likelihood of the stable model specified at the focal step of the Markov chain, i.e., D = −2log L(X, α, c; T ). Hence, a lower deviance indicates a higher likelihood and better fit to the data. Akaike's Information Criterion is equal to the deviance of the maximum likelihood model, offset by a penalty for complexity equal to twice the number of parameters in the model. In the absence of a maximum likelihood estimate of the deviance, DIC makes use ofD, the average deviance across all posterior samples from the Markov chain, as its primary measure of goodness of fit. It is also possible to calculate a "base-line" measure of goodness of fit by calculating, for each parameter, the average value over all steps in the Markov chain, and then calculating the deviance of this average model, denotedD. IfD is much higher thanD, the implication is that the model being fit is highly complex, since parameter estimates during the chain must be oscillating around values far removed from their mean across the chain as a whole. The Deviance Information Criterion encapsulates this notion of complexity in an "effective number of parameters" or p D equal toD −D. So, while AIC is defined as the deviance of the maximized likelihood plus the number of parameters, DIC is defined as the average deviance across all posterior samples plus the effective number of parameters, or, D + p D .
Our simulation studies indicated that in phylogenetic datasets p D did not increase sufficiently in line with tree size, resulting in over-fitting of stable models on large trees. This is likely because tree size is not incorporated as a component of model complexity in the information criterion. A Bayesian Predictive Information Criterion (BPIC) developed by Ando [personal communication [44]], which amounts to DIC with a increased multiplier on the p D penalty, resolved this problem of over-fitting (see Results below). Specifically, we use the criterion D + 2p D .

Software implementation
Efficient software for fitting the stable model to phylogenetic trees and their associated data was written in C++ and is available at http://www.sfu.ca/~micke/stabletraits. html as source code and also compiled for various operating systems. The software reports a posterior sample of ancestral state reconstructions and stable parameter values in a format compatible with the Tracer software application [45], along with the proportional scale reduction factor convergence diagnostic [46] and Bayesian Predictive Information Criterion for assessment of model fit [44]. Multiple chains are run on independent threads, or on independent processors in a cluster computing environment (for which a torque job submission script is also available).

Application to simulated and natural data
In order to assess the improvement (if any) in quality of ancestral state reconstruction and the ability of statistical tests to identify biological characters that have evolved under a stable rather than Brownian stochastic process, data were simulated under a variety of conditions. Evolutionary increments were generated randomly from a stable model with unit scale and index of stability ranging from 1.0 to 2.0 in steps of 0.2, on random phylogenetic trees with 25, 40, 60, 90, 130, 175, 235, 325, 440 and 600 tips generated under the Yule model in Mesquite [47]. This procedure gave rise to 60 experimental conditions, each consisting of 250 trees and simulated datasets, and varying in tree size and index of stability of simulated trait values. Based on data at the tips, ancestral states were reconstructed using the modal posterior density estimate from MCMC, first with α fixed at 2.0 (representing a Brownian motion model) and second with a free α (representing a general symmetric stable model). Ancestral states were also estimated using the homogenous Ornstein-Uhlenbeck model for comparison. Estimates of Type I and Type II error rates under the BPIC criterion were made for each tree size/stability condition. Accuracy was assessed for Brownian, Stable and OU models by calculating variance of the inferred ancestral states from the true simulated states for each simulated dataset under each condition. We report here the median variance ratio of stable/Brownian and stable/Ornstein-Uhlenbeck reconstructions within each experimental condition.
To provide a demonstration of the model's application to real biological datasets, we made use of a supertree of mammalian species [48] and fitted the stable model of character evolution to data on the log adult body mass of 1,679 eutherian mammals [combined data from [49,50]]. For purposes of comparison we also estimated ancestral states under two homogenous evolutionary models, the Brownian motion model and the Ornstein-Uhlenbeck model, the latter estimates obtained by maximum http://www.biomedcentral.com/1471-2148/14/226 likelihood search from parameter estimates generated by the SLOUCH package in R [51]. Finally we also inferred ancestral states using two heterogenous models, the timeheterogenous Early Burst model [14,52] in which the rate of a Brownian process increases or decreases exponentially in time, and the clade-heterogenous model of Eastman et al. [53] in which the rate of a Brownian process is "inherited" within clades but allowed to occasionally shift in value on probabilistically selected branches of a phylogeny. The former ancestral state estimates were made by maximum likelihood search based on parameter estimates and scaled phylogenies derived from the GEIGER package in R [54] while the latter ancestral state estimates were made by Brownian motion maximum likelihood reconstruction on a tree with branch lengths scaled by the maximum posterior probability estimates of rates reported by the Auteur package in R [53]. Connections between these various models and the stable model are discussed below.

Results
Our analysis of simulated evolution on random phylogenetic trees indicates that biological traits derived from a Brownian process can be statistically distinguished from those derived from a non-Brownian symmetrical stable process on the basis of the Bayesian Posterior Information Criterion (Table 1). This model selection criterion was found to be highly conservative, with a low rate of false rejection of the null hypothesis for trees of all sizes, but with the expected low power to detect small departures from the neutral Brownian model on small trees.
In ancestral state reconstruction of traits simulated under Brownian motion the stable model performed as well as the Brownian model (Table 2), with median squared error ratio ranging from 1.02 to 1.00, and better than the Ornstein-Uhlenbeck model, with median squared error ratio ranging from 0.36 on the smallest tree to 0.86 on the largest (Table 3). For trees simulated under the stable process with α < 2, the stable model yielded more accurate ancestral state estimates, increasingly so for large trees and large deviations from Brownian motion. For the most extreme index of stability considered here (α = 1.0) the mean squared error under Brownian motion reconstruction was from 4.8 to 100 times higher, and under Ornstein-Uhlenbeck reconstruction from 5.9 to 100 times higher, than the mean squared error under stable reconstruction.
Results of our analysis of a dataset of mammalian body masses are presented in Table 4. The maximum posterior probability estimate of the index of stability α was 1.55 and the Brownian motion model was soundly rejected in favour of the stable model under the BPIC model selection criterion ( BPIC = 465). The Ornstein-Uhlenbeck also fit significantly better than the Brownian motion model ( AICc = 24) as did the multi-rate model of Eastman et al. [53] (posterior probability of no rate changes = 0). In order to provide the entries in Table 4 for Early Burst we inferred maximum likelihood ancestral states under the global parameter estimate reported by Cooper and Purvis [55] in their broader study of mammalian body mass evolution: the Early Burst model did not differ significantly from the Brownian motion model on this dataset.

Discussion
Reconstructing a historical narrative of trait evolution over time is central to both the formulation and testing of hypotheses in evolutionary biology [66][67][68]. Comparative phylogenetic methods do so in a formal framework using stochastic models of the evolutionary process that implicitly or explicitly assume some probabilistic distribution of ancestral states over internal nodes of a phylogeny [1][2][3]47,69]. Brownian motion is a fundamental stochastic model of evolution which assumes that biological traits evolve by accruing incremental changes drawn from a random distribution with zero mean and finite constant variance. However, most evolutionary hypotheses of interest involve traits thought to be subject to selection leading to directional tendencies, relatively rapid grade shifts and   [4]. Indeed, studies of the performance of ancestral state reconstruction using known ancestral states derived from fossil estimates [6,7,70,71] or from taxa evolving sufficiently rapidly to be observed in real time [5] indicate that a mismatch between the stochastic model and historical reality can result in incorrect estimates [but see also [72]].

Table 1 Type I and Type II error rates resulting from stable versus Brownian model selection using the BPIC model selection criterion, with data simulated on trees of various sizes under stable processes with varying indices of stability
In this paper we have described a stochastic model of continuous character evolution based on a generalization of the Brownian model of evolution that does not assume that the rate of evolutionary change is constant and finite. Under these relaxed assumptions, the sum of increments accruing to an evolving character along each branch of a phylogeny is known to tend toward a stable limit distribution, which is identical to a normal distribution in the special case of Brownian motion but otherwise has heavier tails (Figure 1). These heavy tails allow rare evolutionary increments of large magnitude to occur, resulting in a volatile evolutionary process characterized by occasional "adaptive" evolutionary shifts interspersed with neutrallike patterns of variation ( Figure 2). Stable parameters and ancestral states can be fit to biological data distributed over a phylogenetic tree using Markov chain Monte Carlo methods. We have implemented software that makes use of a slice sampler [40] to sample the posterior probability distribution of ancestral state assignments at each node and the values of stable parameters (the index of stability α, which is equal to 2 under Brownian motion, and scale c). The slice sampler is able to take advantage of our knowledge of the approximate location of modal regions to move across a multi-modal likelihood surface without becoming trapped in locally but not globally optimal regions ( Figure 5). An additional benefit of the slice sampler is its adaptive step size, requiring no tuning of proposal distributions, which makes practical application of the method straightforward. Our analysis of simulated data indicates that the Bayesian Predictive Information Criterion (BPIC) provides a conservative test of the hypothesis of departures from neutrality (i.e., the existence of heavy tails) in an evolutionary process ( Table 1), and that the stable model estimates ancestral states with reduced error in comparison with the Brownian motion model, when traits evolve by accumulating increments from a probability distribution without constant finite variance ( Table 2).
We found the stable model to fit the eutherian body-size data significantly better than the Brownian motion model (Table 4), suggesting the existence of departures from neutrality. Under the stable model, the ancestral eutherian is relatively small; in line with fossil evidence (also presented in Table 4), low body mass persists through early diversification of the Superorders Afrotheria, Euarchontoglires and Laurasiathera and the origin of various orders of small size such as Primates, Rodentia, Lagomorpha, Scandentia, Afrosoricida and Macroscelidea; large  Reconstructed values are provided for the most recent common ancestors of extant taxa in the specified clades. The right column details a selection of oldest fossil taxa within each clade for which body mass estimates are available [data from [56][57][58][59][60][61][62][63][64][65]].
reductions in body size are rare, occuring in Chiroptera, while large increases in body size occur with the origin of several modern Orders of relatively large species including the ungulates (Perissodactyla + Artiodactyla), Carnivora, Cetartiodactyla, Proboscidea and Sirenia. The Brownian motion model differs in several respects. First the Brownian motion reconstruction of the ancestral eutherian's body mass is an order of magnitude greater than the stable reconstruction; the Brownian motion reconstruction thus posits significant reductions of body size prior to the evolution of orders of small body size including Rodentia, Scandentia, Chiroptera, Eulipotyphla and Macroscelidea. As expected, the Brownian motion model exhibits an "averaging effect" more generally, in which transformations in body mass are distributed somewhat evenly over the phylogeny, while the stable model permits large transformations in body mass to occur on a smaller subset of branches. For this reason, the ancestral state reconstruction of taxa ancestral to typically large species (i.e., Cetartiodactyla, Perissodactyla, Proboscidea) are smaller under the Brownian motion model, and the ancestral state reconstruction of taxa ancestral to typically small http://www. A desire to model the adaptive evolution of continuous traits has given rise to a number of approaches that refine or extend the Brownian model. We categorize these as homogenous approaches, in which the stochastic process underlying the generation of evolutionary increments to an evolving character does not vary across branches of a phylogenetic tree, versus heterogenous processes, in which the stochastic process varies across branches. One popular homogenous model is the global Ornstein-Uhlenbeck model of trait evolution [73], in which the direction and rate of evolution at any time depends upon a selection coefficient and the degree of deviation of the trait's current value from some global optimum or "phylogenetic mean" . This so-called "mean-reverting" process has been used as a model of stabilizing selection since deviation away from the phylogenetic mean is penalized under maximum likelihood reconstruction. Our simulation studies indicate that the stable model estimates ancestral states with reduced error in comparison to the homogenous Ornstein-Uhlenbeck model, when traits evolve by accumulating increments from a probability distribution without fixed variance ( Table 3). The homogenous Ornstein-Uhlenbeck reconstruction of mammalian body mass (Table 4) is in some respects intermediate between the Brownian motion and stable reconstructions, with relatively small ancestors of Orders with small body size and relatively large ancestors of Orders with large body size. This can be interpreted in terms of the stabilizing selection model with an intermediate phylogenetic mean: the proposal of a small ancestral rodent (for example) permits a tendency to evolve back toward the mean within the rodent clade, while the proposal of a large ancestral carnivore (for example) permits the same tendency in the opposite direction within the carnivore clade. This pattern is most striking in the ancestral state inferred for Afrotheria (8.95 kg versus 1.73 kg under Brownian motion and 0.36 kg under the stable model), where a large ancestor reduces the rate of evolution on branches leading ultimately to the large elephants and manatees while permitting many high-likelihood reductions of size toward the phylogenetic mean in small taxa such as Afrosoricida and Macroscelidea. This reconstruction for Afrotheria seems unlikely and may lead us to suppose that a single phylogenetic mean for the entire Eutheria does not form a realistic model of stabilizing selection.
Cooper and Purvis [55] report success in fitting more complex heterogenous models to a larger set of mammalian body mass data. The Ornstein-Uhlenbeck model is easily extended to the heterogenous case by permitting more than one clade-specific phylogenetic means [51,[73][74][75], the number and phylogenetic position of such means being specified a priori or estimated from the data. Various transformations of the phylogeny, such as raising all branch lengths to a constant power in order to approximate speciational change [67] or somewhat ad hoc transformation of branch lengths to maximize the likelihood of a Brownian model [12,76,77] have also been used to generate implicitly heterogenous models. The Early Burst model [14,52], also applied by Cooper and Purvis [55], is interesting in generating rate heterogeneity by allowing the rate of a Brownian motion process to vary over time, rather than across branches or clades. The rate of evolution is taken to be an exponentially increasing or decreasing function of node height, allowing a greater proportion of evolutionary change to occur early in the phylogeny or late in the phylogeny depending on the choice of an exponential scaling factor r. We applied the Early Burst model to our mammalian body mass dataset but did not identify a significant deviation from r = 0; ancestral state estimates in Table 4 are derived from a reconstruction based on r = −0.009 estimated by Cooper and Purvis [55]. The concentration of more evolutionary change in basal branches of the phylogeny appears to allow rapid early deviation from the phylogenetic mean value, with the majority of ancestral taxa exhibiting marginally smaller body sizes, often more consistent with the fossil evidence, than under the Brownian motion reconstruction, yet surprisingly with considerably less ordinal-level diversification than imputed by the stable model which does not build early diversification in to the stochastic process itself.
Homogenous approaches offer a number of advantages over heterogenous approaches in that the latter category must not only infer the parameters of the stochastic process but also must infer the structure of rate heterogeneity over the phylogeny. Especially when heterogeneity is associated with clades, over-fitting heterogenous models by positing too many rate shifts or clade-specific evolutionary regimes may become a danger. Eastman et al. [53] have proposed a heterogenous model of trait evolution that explicitly avoids over-fitting by sampling over model parameter values and the number of model parameters simultaneously. The model is an extension of the standard Brownian motion model in which the rate of evolution is "inherited" over time but may undergo occasional shifts in value. Each shift introduces a new parameter that is penalized in a reversible jump Markov chain Monte Carlo algorithm. We found that the complexity of the reversible jump algorithm considerably increases the computational burden of fitting the model: for the body mass data considered here, the stable slice sampler accomplished around http://www.biomedcentral.com/1471-2148/14/226 70,000 steps per minute on a modest dual-core home laptop, versus around 2,500 per minute for the multi-rate model, without the need for a lengthy calibration of proposal densities beforehand. Ancestral states reconstructed under the Eastman model (Table 4) were in broad agreement with other non-Brownian methods presented here in imputing smaller early mammals, and were in close agreement with results of the Early Burst analysis.
In general, the stable model suggests a greater degree of ordinal-level diversification of mammalian body masses, and appears to accommodate a more volatile evolutionary process, than any of the other models considered here. For 13 of the 22 nodes listed in Table 4 the stable model reconstructs the smallest body masses of any model, and for 4 nodes it reconstructs the largest body mass of any model, making the stable reconstruction consistent with the hypothesis of small early mammals and occasional marked ordinal-level enlargement. The ability of the stable model to accommodate striking variation in evolutionary rate, even more so than approaches such as that of Eastman et al. [53] explicitly designed to model such variation, is most apparent in the highly diverse and species-poor Afrotheria, where the stable reconstruction involves the largest ancestral elephants and manatees yet the smallest Afroinsectivores of any of the methods considered here. While the heterogenous Ornstein-Uhlenbeck model binds rate volatility to the structure of the phylogeny through the assumption of clade-specific phylogenetic means, and the Eastman et al. RJ-MCMC model binds rate volatility to the structure of the phylogeny through the "inheritance" of rate shifts from ancestral to descendant branches, the stable model through its homogenously heavy tails provides unstructured volatility that is able to concentrate the production of evolutionary variation onto relatively few branches scattered across the phylogeny.
The stable model is the simplest non-Brownian model considered here, requiring only a single parameter in addition to the standard Brownian motion model. The relative efficiency of the estimation procedure used to fit the stable model may make it attractive for analysis of very large trees or large sets of trees derived from Bayesian phylogenetics. Furthermore, deviation from the Brownian model according to the BPIC criterion may be used to provide independent statistical support for the adoption of one of the more complex heterogenous models currently available. Rates imputed by the stable model may guide appropriate selection of branches for independent rates in such cases. In the mammal body mass data examined here, for example, the frequency distribution of standardized trait changes along branches of the phylogeny (reported by the accompanying software) indicates accelerated evolution at the origin of a number of clades including Hyomys (white-eared giant rats), Tragulidae (mouse deer), Manidae (pangolins), Megachiroptera (megabats), Megadermatidae (false vampire bats), Solenodontidae (solenodons), Orycteropodidae (aardvark) and Hyracoidea (hyraxes), suggesting that these clades may merit their own phylogenetic mean values under a heterogenous Ornstein-Uhlenbeck approach. In order to determine whether such a model is useful in any particular case it is necessary, as with any stochastic model of evolution, to rigorously constrain the model empirically, and while the results presented here are primarily illustrative and to provide comparison across unconstrained models, we note that the low ancestral state inferences for extinct taxa at the root of Rodentia, Lagomorpha, Primates, Chiroptera and Lipotyphla, and the high ancestral state inferences for taxa at the root of Sirenia and Elephantidae, appear broadly in line with fossil evidence.
While our homogenous approach may be associated with some advantages with respect to efficiency, simplicity and unstructured volatility, the heterogenous models such as Early Burst have the benefit of imposing an explicit evolutionary narrative on the process of trait diversification which may be useful for exploring and testing general hypotheses about historical processes [14]. Heterogenous models typically involve the elaboration of a simple Gaussian kernel to accommodate phylogeny-or time-structured variation in the evolutionary process. We suggest that in future work heterogenous stable models analogous to those considered above may be readily generated by directly replacing the Gaussian kernel with the more general stable kernel, at the expense of a single parameter. Stable Ornstein-Uhlenbeck processes, for example, are already well-characterized [17]. The stable model we introduce to phylogenetic evolutionary biology here may find other uses, for example in assigning substitution rates to edges on phylogenetic trees under relaxed clock models [78]. One primary obstacle to the replacement of Gaussian kernels by stable kernels in models of continuous character evolution is that stable distributions have undefined variance [17]. Methods making direct use of variance are typically used to detect correlated evolution between multiple continuous characters evolving on the same phylogenetic tree [2]. Independent contrasts [79] for example, generates standardized data points for each univariate character by scaling the increments accruing along paired branches of a phylogeny by the square root of the sum of branch lengths, which is proportional to the expected standard deviation of a Brownian process. Methods of phylogenetic regression [12,80,81] extend least squares methods to multivariate phylogenetic data by incorporating branch length and topological information into the model's covariance matrix. The fact that stable variance is undefined means that there is no stable equivalent to standard deviation or the covariance matrix. We note that regression and correlation models based http://www.biomedcentral.com/1471-2148/14/226 on stochastic processes driven by non-Gaussian stable perturbations have been implemented successfully in non-phylogenetic fields [i.e., [82][83][84]]. These approaches raise the prospect that likelihood-based analysis of heavy tailed multivariate distributions may offer useful insights into future studies of correlated evolution of multiple continuous characters in evolutionary biology, since correlated evolution is precisely the kind of problem domain in which the putative Brownian assumptions of neutrality and gradualism are likely to be invalid.

Conclusions
Stochastic process models of evolution regard an evolving trait as accumulating, over time, random increments drawn from some underlying probability distribution. We have described a generalization of the Brownian motion model in which the increment-generating function is a stable distribution characterized by heavy tails, which accommodates both the neutral drift associated with Brownian motion but also occasional burst of rapid evolutionary change. Simulation and empirical studies indicate that stable models can successfully be fit to biological data, and Bayesian model selection criteria can be used to assess goodness of fit in comparison with the Brownian motion model, which is a special case of the more general symmetrical stable distribution. The model presented in this paper is a homogenous model in which a single stochastic process, common to all branches of a phylogeny, gives rise to increments to evolving continuous traits. The approach may be contrasted with heterogenous models in which different evolutionary regimes are bound to different subtrees of a phylogeny, or arise stochastically across the branches of a phylogeny. While homogenous models offer simplicity and computational convenience, it is currently unclear whether such models -and even more highly parameterized heterogenous models of trait evolution -are capable of capturing adequately the richness and complexity of evolutionary processes in nature. We have made an empirical attempt to corroborate results from the stable model on the basis of published fossil data on extinct mammalian body masses. The various homogenous and heterogenous models are consistent in some respects but also exhibit marked differences in reconstructed ancestral states. A major line of future research should be to expand the availability of fossil and other historical data that would facilitate the empirical measurement of the distribution of evolutionary changes over time for known traits and phylogenetic trees. In general, we believe it is likely to be the case that models of continuous trait evolution should be tailored specifically for the empirical question at hand. The present research suggests that models of evolution incorporating heavy tails and volatile stochastic processes may be a useful addition to the toolset of biologists interested in traits exhibiting heterogenous patterns of diversification driven by adaptive evolution.