Exploring power and parameter estimation of the BiSSE method for analyzing species diversification

Background There has been a considerable increase in studies investigating rates of diversification and character evolution, with one of the promising techniques being the BiSSE method (binary state speciation and extinction). This study uses simulations under a variety of different sample sizes (number of tips) and asymmetries of rate (speciation, extinction, character change) to determine BiSSE’s ability to test hypotheses, and investigate whether the method is susceptible to confounding effects. Results We found that the power of the BiSSE method is severely affected by both sample size and high tip ratio bias (one character state dominates among observed tips). Sample size and high tip ratio bias also reduced accuracy and precision of parameter estimation, and resulted in the inability to infer which rate asymmetry caused the excess of a character state. In low tip ratio bias scenarios with appropriate tip sample size, BiSSE accurately estimated the rate asymmetry causing character state excess, avoiding the issue of confounding effects. Conclusions Based on our findings, we recommend that future studies utilizing BiSSE that have fewer than 300 terminals and/or have datasets where high tip ratio bias is observed (i.e., fewer than 10% of species are of one character state) should be extremely cautious with the interpretation of hypothesis testing results.


Background
Maddison et al. [1] described a method using a binarystate speciation and extinction model (BiSSE) that estimates rates of change in a binary character and rates of speciation and extinction contingent on the character state, given a known distribution of observed states on the tips of a tree of contemporaneous species. The BiSSE method assumes that the tree includes all extant species and that all species have known data for the state of the single binary character [1]. FitzJohn et al. [2] provided additional methodology for including known species diversity into an incomplete phylogeny if a researcher could confidently place taxa into unresolved terminal clades. BiSSE provides estimates for the rates of speciation in each character state (λ 0 , λ 1 ), extinction in each state (μ 0 , μ 1 ), and character transition rates between states (q 01 , q 10 ). Such estimates are important to studies of whether a particular feature is controlling the diversification rates of clades and whether the effect is on speciation, extinction, or both [3][4][5]. Recent studies have also modified the basic approach of the BiSSE method to estimate further parameters associated with quantitative characters [6] and geography [7].
Maddison et al. [1] discussed the development of the BiSSE method and demonstrated its ability to estimate rates from simulated data sets. Recently there has been a burst in the number of studies that have utilized the BiSSE method to explore rates of diversification in various taxonomic groups for the purpose of testing hypotheses involving key innovations and the evolution of characters [8][9][10][11][12][13][14][15][16][17][18][19]. However, the majority of these studies explore diversification and character evolution hypotheses with fewer than 200 taxa, despite the initial warning by Maddison et al. [1] that the power of analysis may be affected by low sample size.
Wise use of any statistical method should be guided by an understanding of its power and ability to distinguish hypotheses of interest, but current empirical studies lack sufficient guidance, because there has been little work on the BiSSE method's behavior under different sample sizes (numbers of species) and parameter values. The primary goal of this study is to explore the power and accuracy of parameter estimation of the BiSSE method. Using simulations we explore the number of species needed to obtain good power, the advantages of estimating fewer parameters, and the effect of the extreme asymmetries in rates. Additionally, because there are many ways that an observed excess of a character state can be explained through macroevolutionary processes (e.g., increased speciation rates in taxa with State 1, higher extinction rates in taxa with State 0), there is also concern regarding confounding effects [20], and whether BiSSE can identify which rate asymmetries are causing the observed character excess in empirical data.

Power of BiSSE method Asymmetries in speciation rate
Extremely low power (<5%) was observed when tree size included 50 taxa, regardless of the degree of rate asymmetry ( Figure 1a, Additional file 1: Table S1). Power marginally improved for tree sizes of 100 taxa, but decreased considerably as the asymmetry increased to 10 and 20× rate difference (Figure 1a). A tree size of 300 tips indicated a higher overall power for each difference in rate than those observed with 50 or 100 taxa. Power increased as the degree of difference in rate asymmetry increased, until reaching a rate difference of four times the speciation rate where the power begins to decrease as the degree of rate asymmetry grows. This same pattern was observed for a tree size of 500 tips (Figure 1a). In general, power increased as tree size increased (Figure 1a), and a pattern of power decrease following an increase in tip-ratio bias resulting from rate asymmetry was observed in the simulation including more tips (Figure 1a).

Asymmetries in rate of character state change
For each asymmetrical model of character change (e.g., 2×, 5×), power increased with an increase in tree size ( Figure 1b). Power remained low and did not increase with a difference in rates when the tree size was 50 taxa. There was a slight increase in power as the degree of difference in rates increased with tree sizes of 100 taxa (Figure 1b). Power was higher for simulations with 300 taxa for each respective difference in rates compared to the same simulations with 100 and 50 taxa. Power increased as the rate difference increased to 5× then leveled off to 10×, followed by a strong decrease in power as the difference in rates increased to 20× and 40× (Figure 1b). This same pattern was observed in simulations of 500 taxa and in general there was a decrease in power observed in all tip sizes beyond a 10x difference in rates of character change (Figure 1b).

Asymmetries in extinction rate
As with rates of speciation and character state change, power increased as tree size increased regardless of the amount of difference in extinction rate (Figure 1c). With tree sizes of 50 taxa, power was low regardless of the degree of rate asymmetry, and power was similarly low with 100 taxa. With tree sizes of 300 and 500 taxa, power increased until a rate difference of 3×, with power decreasing as the degree of rate asymmetry continue to increase. Overall, power in hypothesis testing with extinction rates was lower than those for speciation or character state change (Figure 1c).   Figure 1 Power of BiSSE under simulations with asymmetrical rates of (a) speciation, (b) character change, (c) extinction. See Table 1 and Additional file 1: Table S1-S3 for list of rate values.
Power of six parameter model vs. four parameter model The change in power among low and high tip bias scenarios when using a model that estimates fewer parameters is observed in Figure 2 (Additional file 1: Table  S4). The reduced four parameter model differs from the full six parameter model (λ 0 , λ 1 , μ 0 , μ 1 , q 01 , q 10 ) with two of the rates constrained to be equal in both character states, for example (λ 0 , λ 1 , μ 0 =μ 1 , q 01 =q 10 ). In nearly all cases, especially with tree sizes greater than 300 taxa, there was an increase in power when the four parameter model was used compared to the six parameter model ( Figure 2). The greatest increase in power occurred in the scenarios with a greater degree of bias (7:1 State 1 favored), and the effect was observed regardless of which parameter possessed the rate asymmetry ( Figure 2, Additional file 1: Table S4). As was the case with the six parameter estimations, power is low when tree size is below 300 taxa with a four parameter model.

Parameter estimation
Estimating parameters under asymmetrical speciation As previously identified by Maddison et al. (2007), the BiSSE method estimates speciation rates well, with strong delimitation of known asymmetrical rates given an appropriate tree size (Figure 3a), although precision decreases as the speciation rate asymmetry, and tip bias, increases. For estimates of known symmetrical rates of character change under asymmetrical speciation rates, accuracy and precision of estimating the known rates become significantly worse under the higher tip bias scenario ( Figure 3b). A similar pattern is recovered for estimates of known symmetrical extinction values where rates were estimated with more accuracy and precision under low tip bias (asymmetrical speciation rate of 1.25), with accuracy and precision strongly decreasing under high tip bias (20× asymmetry in speciation rates) as observed in Figure 3c. Accuracy and precision of parameter estimation greatly decreases for all rates with a small sample size of tips, regardless of low or high tip bias (Additional file 2: Figure S1).

Estimating parameters under asymmetrical character change
Estimates of asymmetries in character change are not as accurate or precise as estimating rates of speciation ( Figure 4b). In general, precision decreases as the rate difference increase results in a high tip bias scenario, with parameter estimation being poor for a known estimate of a 40× rate difference in character change. Symmetrical speciation rates (λ 0 = 0.1, λ 1 = 0.1) are well estimated when the rate of character change is 2× (low tip bias) as seen in Figure 4a. However, with a 40× (high tip bias) difference in the rate of character change the precision of parameter estimation appears to decrease, and the number of estimates for highly asymmetrical speciation rates increases (Figure 4a). Parameter estimation of known extinction rates (μ 0 = 0.03, μ 1 = 0.03) is more accurate under the 2× (low tip bias) scenario rather than the 40× (high tip bias) scenario where estimates of known extinction values are very poor (Figure 4c).

Estimating parameters under asymmetrical extinction
Estimates of known extinction values are poor and seem to lack precision, with precision decreasing as  Table S4 for rate values.
the difference in extinction rates increase ( Figure 5c). Speciation values are well estimated to the known values (λ 0 = 0.1, λ 1 = 0.1) when the difference between extinction rates is 2× (low tip bias), but accuracy and precision seems to decrease dramatically as the extinction rate asymmetry increases (high tip bias), leading to an abundance of highly asymmetrical estimates of speciation rates (Figure 5a). Parameter estimation of character change has the same pattern, in which the known rate (q 01 = 0.01, q 10 = 0.01) is well estimated under 2× (low tip bias) extinction rate differences ( Figure 5b), but is poorly estimated under an increased difference in extinction rate asymmetry that leads to high tip bias ( Figure 5b). This poor estimation leads to a dramatic increase in asymmetrical rates of character change that favor transitions from State 0 to 1.

Impact of tree size on power
The statistical power of the BiSSE method depends on the number of taxa and the degree of asymmetry in rates of speciation, extinction, and character state change. In terms of tree size, BiSSE achieves extremely low power when testing hypotheses of rate asymmetry if fewer than 100 taxa are used in the analysis, even when rates are known to be highly asymmetrical ( Figure 1). As a result, the potential for a Type II error (failing to reject the null hypothesis when the alternate hypothesis is true) is extremely high. The highest power attributed to any rate asymmetry associated with 300 taxa is only 50% (Figure 1) under a six parameter model. Researchers that attempt to utilize the BiSSE method with fewer than 300 taxa should take caution. Below 200 taxa there is little statistical power associated with identifying rate asymmetries, regardless of whether strong asymmetries exist. Maddison et al. [1] hypothesized that large amounts of data would be needed to distinguish significant asymmetries because there are many ways to arrive at a given phylogeny.

Power of simplified models
If external information justifies a simplification of the model, then greater power can be achieved. We found an overall increase in power when utilizing a four parameter model over a six parameter model regardless of which rate possesses the asymmetry (Figure 2, Additional file 1: Table S4), although tree sizes greater than 300 tips are still desirable. Whether researchers can take advantage of this greater power using a model with fewer free parameters depends of course on whether it is reasonable to assume in advance that any asymmetries are restricted to one rate.

Confounding processes
Strong asymmetries in rates of speciation, character state change, and extinction yielded, as expected, a strong excess of tips with a single character state ( Table 1). The question to a biologist observing such a pattern is; which rate's asymmetry might have led to such excess? Maddison [20] hypothesized that teasing apart parameters that are the cause of taxonomic state frequency asymmetry is difficult, and that simultaneously estimating these parameters may help address this issue. We found that under best-case scenarios where high tip bias is absent and sample size is high (preferably greater than 300 taxa), parameters are estimated accurately and precisely (Figures 3, 4, 5, Additional file 2: Figure S1,  Table 2). This indicates that the BiSSE method is capable of identifying the rate asymmetry that is causing taxonomic excess when datasets are used with a sufficient amount of tips and where high tip bias is absent, as seen in Figures 3, 4, 5 ( Table 2). In these scenarios, BiSSE is able to properly identify the known process with the rate asymmetry (Table 2), whether it was speciation (Figure 3a), character change (Figure 4b), or extinction (Figure 5c). This is further corroborated by the power, which remains high in cases where sample size is large and tip ratio bias is minimized (Figure 1). However, teasing apart which rate asymmetries are causing taxonomic excess of one character state is a problem under two conditions; if tree size is small (Additional file 2: Figure S1), regardless of the degree of tip bias, and if there is a high degree of tip ratio bias towards one state, even while simultaneously estimating parameters using the BiSSE method with trees of substantial size (500 tips, Figures 3, 4, 5, Additional file 2: Figure S1, Table 2). Below we discuss why high tip ratio bias leads to a decrease in power and parameter estimation, and how much tip ratio bias is necessary to have an adverse affect on teasing apart confounding processes.
When taxa are saturated for a particular state (high tip ratio bias), the BiSSE method estimates high rate asymmetries to explain this pattern, even for those rates known to be low and symmetrical (Figure 3, 4, 5, Table 2). For example, when extinction is 10× higher for State 0 than State 1 (Tip bias of 10 1 :1 0 ), the extinction asymmetry is detected, but the rate of character change is also estimated to be highly asymmetrical ( Table 2). The method infers rapid change from State 0 to 1 ( Figure 5b, Table 2) when in fact, the constrained rates were fairly low and symmetrical (q 01 = 0.01, q 10 = 0.01). The inability to identify confounding processes when there is high tip ratio bias results in the observed decrease in power when the degree of rate asymmetry increases in either the rate of speciation, extinction, or character change (Figure 1). This issue is not resolved with larger tree sizes (Figure 1), and the degree of tip bias needed to adversely affect the accuracy and precision of parameter estimation varied with the process (e.g., speciation, extinction, character change). The amount of bias needed to reduce accuracy and precision of parameter estimation, leading to worst-case scenarios, changed depending on the process. When estimating parameters associated with asymmetries of speciation rates, power sharply decreases when fewer than 2.5% of the taxa have one of the binary traits (Tables 1, 2, Figures 1, 3). Character change and extinction rate asymmetries are more adversely affected by trait rarity, with the worst-case scenarios happening when a binary trait occurs in fewer than 8-10% of the terminal taxa. As with speciation, high tip ratio bias leads to a sharp decrease in power regardless of sample size (Figures 1b,  1c, 4, 5; Tables 1, 2, Additional file 1: Tables S1-S3). Overall, caution is recommended when using the BiSSE method if tip ratio bias is greater than a 10:1 ratio for binary states in terminal taxa, as this level of trait rarity in one state may have a negative impact on the power of the analysis (regardless of sample size) and the ability to identify confounding processes under these worst-case scenarios.
The observed pattern of a decrease in power associated with speciation, extinction, and character change when rates become increasingly asymmetrical is troubling ( Figure 1). In general, investigators should be cautious using the BiSSE method when one of the binary characters in question is exceedingly rare in their data sets (less than 10%). BiSSE may mistakenly estimate the wrong parameter (or combination of parameters) to be the cause of taxonomic excess in situations where high tip ratio bias is observed, and increased tree size does not alleviate this issue. If investigators report a significant result, there is a possibility that the inability to identify confounding processes in these scenarios may impact the interpretation of the causes for the observed pattern of state asymmetries. Further work is needed to establish a robust methodology within the BiSSE framework for teasing apart the parameters that are directly contributing to character state bias under these high tip ratio scenarios.

Difference in estimation accuracy among processes of change
We found that the BiSSE method is far more accurate and precise with estimates of rates of speciation and character change than with extinction rates ( Table 2, Figures 3, 4, 5, Additional file 2: Figure S1), which was also noted by Maddison et al. [1]. Overall, a greater degree of tip ratio bias is needed to reduce the accuracy and precision of speciation parameter estimation and a loss of power when the rate asymmetry is related to speciation (discussed previously). While extinction can leave a signal in molecular phylogenetic trees recovered from extant taxa alone [21,22], estimating extinction rates from molecular phylogenies often results in rates approaching zero, which is potentially the result of cladogenetic events being directly inferred from molecular phylogenies and not extinction events [22]. Recently, Rabosky [23] indicated that without information from the fossil record, estimating extinction rates from molecular data alone may be potentially impossible if rates of Table 1 Parameter values for rate asymmetry simulations used to assess power and parameter estimation of BiSSE method The observed percent of terminals with State 0 are for simulations with 500 taxa. Observed tip ratios were nearly identical for other taxa sizes (50, 100, and 300).
diversification vary across a topology. As demonstrated by this study, BiSSE is capable of estimating rates of extinction from known values given sufficient data (e.g., large tree size, low tip bias), albeit with far less accuracy and precision than rates of speciation or character change. Also, if an asymmetry in extinction rates leads to a high tip ratio bias, the accuracy and precision of extinction rates estimation decreases (Figure 5c). Finally, the accuracy and precision of estimating rates is also impacted by a small sample size (<300 tips) regardless of tip ratio bias as indicated in Additional file 2: Figure S1. In addition to low power associated with small tree sizes, investigators should be cautious of a significant result if tree size is less than 300 tips, as the inferred cause(s) for the potential evolutionary pattern may be misled by the issue of confounding processes.

Identifying critical values
An additional issue uncovered in this analysis is the difficulty in finding appropriate critical values. The critical values from our simulations suffer from a great deal of variation (Additional file 1: Tables S1-S4). The nearly consistent difference in critical values suggests that simply comparing the statistic to a χ 2 distribution may not be appropriate as suggested by Maddison et al. [1]. Therefore, we suggest simulating your own critical values, using as many replicates as possible, as we did here. Suitable simulators are available in both Mesquite and the Diversitree R-package [2].

Implementation issues
Another possibility is that the method performs better than these results suggest, but limitations of our likelihood optimizer limit the power of our implementation. Our implementation, as described in Maddison et al. [1] uses several techniques to overcome the limitations of the Brent [24] optimizer, including using multiple searches from randomly generated starting points, and the option of starting searches from values estimated from more constrained models. Tests (Midford unpublished) with alternative optimizers, using the Diversitree package described FitzJohn et al. [2] indicate that little, if any, loss of power is actually related to the choice of optimizer.

Conclusion
The power of the BiSSE likelihood method to test hypotheses of rate asymmetry is susceptible to both tree size and variation in parameter rates. Overall, power of the BiSSE method is low if the tree size is below 300 taxa, and investigators should take special care to investigate the power of their results if applying the BiSSE method to topologies with fewer than 300 tips. Power is increased when estimating fewer parameters, so utilizing a four parameter model to test hypotheses may be preferable if appropriate. This study indicates that contrary to the hope expressed in Maddison [20], the problem of confounding effects can still occur while estimating process parameters simultaneously if there is low sample size and/or high tip ratio bias. Under scenarios of large sample size (greater than 300 taxa) and low tip bias, the BiSSE method accurately and precisely identifies the rate parameters contributing to the observed taxonomic excess. However, if diversification rate parameters are too asymmetrical (yielding a high tip ratio bias) and/or sample size is low, BiSSE is unable to accurately estimate rates. This in turn results in a dramatic decrease of power. We recommend that investigators must be cautious when interpreting their results if there is a character state bias among tips greater than a 10:1 ratio in favor of either binary state. In these worst-case situations, properly identifying the process responsible for taxonomic excess may be impossible regardless of the number of tips in the dataset. If investigators using data with fewer than 300 tips and/or with high tip bias report a significant result, there exists a possibility that the issue of confounding effects has misled the identified rate cause(s) of the significant result. Further work is needed within the BiSSE framework to develop methods to better identify which parameters are causing the character state bias in these worst-case scenarios (e.g., low sample size, high tip bias). Further exploration of the impact of multiple rate asymmetries is also needed. However, it is clear that if multiple rate asymmetries are occurring that promote high tip ratio bias there will be difficulties with power and parameter estimation.

Hypothesis testing and the power of the BiSSE method
The BiSSE likelihood calculation and parameter estimations were done in the Diverse package [25] of Mesquite 2.7 [26]. Maddison et al. [1] suggested that the probability of rejecting a false null hypothesis (power) may vary with the number of species in an analysis, and with the degree of rate difference among parameters. The initial exploration of power for the BiSSE method in Maddison et al. [1] focused on three rate asymmetric scenarios (one for speciation, character change, and extinction) with a tree size of 500 tips. To explore the full range of power for the BiSSE method when using the six parameter model, 500 trees were simulated under a variety of tip sizes and parameter combinations where an asymmetry in one rate parameter was introduced (e.g., λ 0 < λ 1 ).
When a tree and character are simulated with asymmetrical rates in one processspeciation, extinction, or character state changeour question was whether the BiSSE method could detect this asymmetry and estimate the rates correctly. A biologist facing such a question with real data may be interested in just one of the processes (e.g., is there an asymmetry in speciation rates?), and would therefore face a choice: are the other processes assumed to be symmetrical (i.e. extinction rates μ 0 = μ 1 ) or not? Assuming the other processes to be symmetrical reduces the complexity of the models and permits the method to focus entirely on the process of interest. We studied the benefits of such simplifying assumptions as described in the next section. However, because biologists typically would not have information confirming the other processes to be symmetrical, in most of our analyses we permitted all three processes to be asymmetrical, thus requiring us to compare the null five-parameter model against a full six parameter model.
Thus, for any given asymmetry simulation, the BiSSE likelihood score for the full six parameter model was calculated and compared to the likelihood score of a corresponding five parameter model where the rate parameter with an asymmetry being tested is constrained to be equal for both states (e.g., λ 0 = λ 1 ). In addition, a null hypothesis set of simulations with 500 trees was generated where all rate values are symmetrical, and a distribution of the BiSSE likelihood score difference for the six and five parameter models were calculated for each null hypothesis. Power was determined as the percentage of likelihood difference scores between the six and five parameter models of the asymmetrical simulations that were above the 5% cutoff value established by the corresponding likelihood difference score distributions of the rate symmetrical null hypothesis simulations. We simulated sets of 500 trees with varying bias for each of the three parameter pairs as well as sets without bias and each rate parameter combination (Table 1) was tested under tree sizes of 50, 100, 300, and 500 taxa, respectively, in which the probability of the root state was stationary [27].
Many asymmetries in rates would yield biases in the frequency of the two character states observed in the contemporaneous species at the tips of the tree, with stronger asymmetries yielding strong biases. Insofar as BiSSE's behavior might differ depending on the strength of the bias in state frequencies, we explored examples with rate parameter asymmetries that would result in a range of observed high and low biases in character state distributions across the tips (Tables 1 and 2). Ratios of tip bias (taxa with state 0 relative to state 1) were calculated from their corresponding biases in the model rates using the stationary frequency formula in Appendix 2 of Maddison et al. [1].
As expected, tip bias increased as asymmetries in the simulation parameters increased steadily across all tree sizes, and the observed asymmetry in taxa with each state matched expectations given the starting rate asymmetries (Table 1, Additional file 1: Tables S1, S2, and S3).

Power in reduced (4-parameter) models
We investigated whether estimating fewer parameters would lead to an increase in power. In some biological systems it may be reasonable to assume from the beginning that some processes are symmetrical. To examine this we compared the results of the previously described 5 vs. 6 parameter tests with the results of tests involving reduced models of 3 vs. 4 parameters. The reduced test compared a four-parameter model, where two of the rates were constrained to be equal in both character states, for example (λ 0 , λ 1 , μ 0 =μ 1 , q 01 =q 10 ), and a three-parameter model where all three rates were constrained to be equal in both character states. These scenarios were done for two tip bias scenarios, one with a small bias (3:1 character state ratio) and one with a greater degree of bias (7:1 character state ratio) as seen in Additional file 1: Table S4. We simulated sets of 500 trees with both levels of bias for each of the three parameter pairs as well as sets without bias for a six/five and four/three model comparisons as seen in Additional file 1: Table S4.

Estimating parameters in asymmetrical scenarios
Using BiSSE to estimate rates of speciation, extinction and character change may be illuminating not only to understand the degree of any asymmetries, but also to distinguish which potential factor (biased speciation, extinction, or character change) is responsible for an observed excess of species with a particular state. Rate parameters for unconstrained (six parameter) models were tabulated under a best-case scenario representing a small degree of tip bias and a worst-case scenario that included a high degree of tip bias with tree sizes of 500 taxa. Parameters were estimated from the same 500 trees and respective characters that were used to calculate the BiSSE likelihood difference (Table 1). With these simulated cases we asked whether the parameter values were estimated well, with a special focus on whether a bias in one process (e.g., extinction) might be confounded with a bias in another process (e.g., character change).

Implementation
The computer software used in this study was a refined version of the package Diverse [25] described in Maddison et al. [1], with these refinements already implemented in Mesquite [26]. Two of these refinements are described here. The simulation module ("Evolving Binary Speciation/Extinction Character") was modified to generate trees more efficiently by means of a continuous approximation. The updated module calculates the rate of events on the tree as the product of the number of terminal branches on the tree and sum of rates for each event type. The time to the next event was drawn from an exponential distribution and the type and location (branch) of the new event were drawn from appropriately weighted uniform distributions. Following this, all terminal branches were extended to the time point of the generated event. This process continued until the tree reached a limiting number of tips, or the unlikely event that all terminal branches became extinct. The second refinement is an enhanced parameter estimator that uses a numerical integrator that implements the RKF45 method [28]. The RKF45 method improved on the RK4 method, used in Maddison et al. [1] by adaptively adjusting the step-size used in the integration process. Our implementation specified a starting step-size and subsequent changes in step-size were limited a range of 1/10x to 10x the original step-size.