Detecting patterns of species diversification in the presence of both rate shifts and mass extinctions

Recent methodological advances allow better examination of speciation and extinction processes and patterns. A major open question is the origin of large discrepancies in species number between groups of the same age. Existing frameworks to model this diversity either focus on changes between lineages, neglecting global effects such as mass extinctions, or focus on changes over time which would affect all lineages. Yet it seems probable that both lineages differences and mass extinctions affect the same groups. Here we used simulations to test the performance of two widely used methods under complex scenarios of diversification. We report good performances, although with a tendency to over-predict events with increasing complexity of the scenario. Overall, we find that lineage shifts are better detected than mass extinctions. This work has significance to assess the methods currently used to estimate changes in diversification using phylogenetic trees. Our results also point toward the need to develop new models of diversification to expand our capabilities to analyse realistic and complex evolutionary scenarios.


Background
The estimation of the rates of speciation and extinction provides important information on the macroevolutionary processes shaping biodiversity through time [1]. Since the seminal paper by Nee et al. [2], much work has been done to extend the applicability of the birthdeath process, which now allows us to test a wide range of hypotheses on the dynamics of the diversification process.
Several approaches have been developed to identify the changes in rates of diversification occurring along a phylogenetic tree. Among them, we can distinguish between lineage-dependent, trait-dependent, timedependent and diversity-dependent changes. Lineage specific methods identify changes in macro-evolutionary rates -speciation and extinction rates, denoted as λ and μ, respectively -at inner nodes of a phylogenetic tree [3][4][5]. We can also identify trait-dependence in speciation and extinction rates if the states of the particular trait of Among them, the Cretaceaous-Paleogene (K-Pg) boundary and the Permian-Triassic events, which happened 65 million years ago (Mya) and 251 Mya, respectively, induced the most dramatic losses of biodiversity [15]. Moreover, other less extensive events have also occurred in the past hundred million years [16].
Alternative models have been proposed for mass extinctions. They could be represented as a high number of species disappearing at the same time (single-pulse model), or as an increase of the background rate of extinction during an extended period of time (time-slice model) [17]. They could also impact biodiversity in different ways. Three main hypotheses, corresponding to different patterns of extinction, have been proposed [18]. First, the event could affect all lineages equally and terminate any extant lineage with the same probability. This "field of bullets" scenario is often used as a null model [19,20]. Second, in the "fair game" scenario, some form of lineage selection would occur, where the most successful species -in our case, the most diversifying speciesbefore the event would be the most likely to survive. This could, for instance, happen if the probability of survival depends on a specific trait varying across the lineages of the phylogeny [21]. Finally, in the "wanton destruction" scenario [22], the event could induce such changes in the environmental conditions that the probability of extinction of the species and their post-event diversification rate would be uncorrelated to their initial speciation and extinction rates.
Although lineage-dependent differences in macroevolutionary rates and mass extinctions are known to happen, the performances of the existing methods to identify both lineage-specific rate shifts when mass extinctions have occurred, and mass extinctions when lineage-specific rate shifts have occurred has not, to our knowledge, been investigated. The aim of this study was thus to assess the performance of current methods to estimate the rates of diversification using complex scenarios involving both mass extinctions and lineage shifts. We used simulations to assess the impact of varying number and magnitude of rate shifts and mass extinction events. Figure 1 gives an overview of the simulation design. We used a backward algorithm to simulate phylogenetic trees as implemented in the function sim.rateshift.taxa from the R [23] package TreeSim [24]. Direct forward approaches to simulate trees using a birth-death process are also available. They can be used by conditioning either on the number of tips or on the total amount of time of the process. The former approach can lead to bias [25], while the latter could be less practical in our specific context as the procedure would result in trees with highly variable numbers of taxa, in particular when adding Fig. 1 Workflow of the simulation process. Hypothetic case of a 50 species tree, 3 lineage shifts and 2 mass extinctions. The number of species in each lineage is randomly drawn first. Each tree is grown separately with different (λ, μ) but with identical survival rates (ρ) at each mass extinction events. The four trees are then successively joined at branches ensuring ultrametricity. Vertical continuous lines: simulated mass extinction events, full circles: ancestor where diversification change occurred mass extinction events. A backward simulation procedure is therefore the best solution to simulate the different diversification scenarios of interest for our study. This procedure enables both single-pulse or time-slice modeling of mass extinctions, but we chose to represent them only using the single-pulse model because paleontological data indicates very high species loss at major mass extinction events in a limited amount of time. For instance, a 52 % decrease in marine families was observed at the Permian-Triassic boundary [14].

Methods
Our algorithm takes as input the number of extant species, the evolutionary rates λ and μ, and the time of occurrence and survival rate ρ for mass extinction events. We assumed in the first part of our simulations that these events happened according to the field of bullet scenario (step 1). We randomly grafted different trees having experienced the same mass extinction events but different evolutionary rates to account for rate shifts in diversification (step 2; see Table 1). First, we ran as many backward simulations as the number of lineages shifts in our tree. We defined the number of species in each backward simulation by drawing samples from a Dirichlet distribution to keep the total sum equal to the overall number of species. We then ranked the trees by decreasing order of their total age, which included the stem branch length provided by TreeSim. We selected from the oldest tree (referred to as acceptor tree) the branches that overlapped in time with the age of the stem branch of the second oldest tree (referred to as donor tree). Thus, the branches considered for possible grafting were the ones that included the age of the donor tree between the timing of the two speciation events defining them in the acceptor tree. We randomly chose one of those branches to graft the donor tree onto the acceptor. This ensures ultrametricity of the newly created tree and leaves the branch lengths of each separate tree unmodified once the lineage having experienced the diversification shift is removed. We iterated over this protocol until all donor trees, whose number varied in our simulations between 0 and 5 (Table 1), were grafted. Finally, we ran Medusa [4] and TreePar [9] analyses on each simulated tree to investigate our capacity to recover the signal of mass extinctions and diversification shifts (Fig. 2). We simulated trees with different numbers of lineages and extinction events to assess the influence of these factors. Table 1 summarizes the parameter space explored for the 16,371 trees that we simulated. For the values of λ and μ, we targeted distributions similar to the estimates calculated on a mammalian phylogeny [26].
Despite the issues to use existing forward algorithm, we nevertheless compared our backward algorithm with a "forward-like" algorithm based on the R package TESS [27]. We simulated trees with different values of λ, μ and species number to model the lineage shifts in diversification rates. We carry a similar grafting process as described in our backward algorithm. However, we removed all daughter species of the sister clade of the donnor tree in the acceptor tree. This step has the consequence of  removing the instance of the artificially created speciation event that was present in our former algorithm and effectively mimic a forward algorithm with a change in diversification rate possible anywhere between two speciation events. As the first conditioning is made on the number of species, and as we subsequently remove species, the total number of species at the end of the process in not constant but varies slightly below the number used for the conditioning. We simulated trees according to both our backward and forward algorithms and compared them using two different measurements: the distribution of branching times and the outcomes of Medusa on both our trees (Additional file 1). These two measures resulted in very similar outcomes and we present here only the results obtained by the backward algorithm.
Medusa is a maximum likelihood-based framework to detect shifts in diversification by iteratively adding breakpoints on inner branches of the tree with different rates of speciation and extinction. It uses AIC to discriminate between models with an increasing number of parameters [4]. Rabosky also recently presented a new method (BAMM) to estimate the number of possible rate changes along a phylogenetic tree and to fit exponential responses in macroevolutionary rates to time or to species number [28]. Unlike Medusa, BAMM uses a Bayesian framework, with reversible jump Markov chain Monte Carlo to estimate the number of shifts in diversification in the phylogeny. In our design, we chose not to simulate varying speciation and extinction rates except at speciation nodes, thus using higher complexity models is not necessary. Comparisons between BAMM and Medusa have been performed, but only on simulations involving either timedependent or diversity-dependent rates [28]. This framework led to a clear bias in favor of BAMM as Medusa can not evaluate such models, and resulted in Medusa estimating a lower number of events than what was actually simulated [28]. The numbers of estimated shifts obtained with Medusa can therefore be considered as conservative. Finally, we do not expect a different behavior for Medusa and BAMM regarding the identification of mass extinction events, as neither method incorporates them in their model. Those reasons, as well as the large computational burden to run Bayesian analyses on over 16,000 trees, led us to favor the simpler Medusa framework for the rest of the study. Medusa was run until a more complex model was not supported by the AIC. We did not extract the macro evolutionary rate estimations from Medusa as we were only interested in testing the ability of the method to detect the events, and not the accuracy of the parameter estimation.
TreePar uses the birth-death process to identify changes in λ and μ through time. This is done by estimating the probability of a change in parameter values within small time intervals, which can be extended to test for the occurrence of mass extinction events [9]. The parameters of the rate shifts might be correlated with those related to mass extinction [9], which will be a problem for our simulations. We therefore restricted our analysis to the identification of mass extinction events to avoid this issue. The number of iterations of TreePar was set to the simulated number of mass extinction events plus one to test for the appearance of false positive events. A standard Likelihood Ratio Test (LRT) is used to extract the most likely models from TreePar and more complex models were favored when their p-value was less than 0.01, following the standard approach for this framework [9]. Similarly to what was done with Medusa, we did not analyze estimations of survival rates at mass extinctions events given by this framework.
To verify that our simulation design had no effects on the methods evaluated, we tested the influence of the subtree grafting approach with a constant rate of diversification. We simulated trees with 200 species using both the standard procedures implemented in TreeSim and by grafting two subtrees of 150 and 50 species having evolved under the same λ and μ values. We then compared the results obtained by TreePar and Medusa. We ran 250 pairs of simulations and we observed no significant differences in the number of false positive found between the groups with and without artificial grafting (7 and 13 for Medusa respectively, and none in both cases for TreePar), showing that our simulation design does not bias the estimation of the rate shifts by the two methods used.
We used a slightly different framework to study the impact of the different types of mass extinction events. We simulated a scenario that aimed at testing for the presence of the K-Pg mass extinction event using high order phylogenetic trees. We therefore simulated trees with a large number of extant species (5,000 tips, similar to the number of mammalian species) and a large number of lineage shifts (5), but only one event of mass extinction. The other parameters were still drawn at random from the ranges specified in Table 1, except for the survival rate ρ that was modified according to the models of mass extinction. For the fair game hypothesis, we randomly drew λ and μ for the 5 different lineage shifts, but the survival rate ρ was modified for each lineage based on its diversification rate (r, λ − μ). We thus considered that the trait influencing the probability of extinction for each species was its diversification rate. For the wanton destruction hypothesis, the mass extinction event induced a change in rates for each lineage, again drawn according to the distribution stated in Table 1, and their survival rate ρ was then based on their new diversification value. For the wanton destruction, our simulations included both a global rate shift and a mass extinction and we ran TreePar twice in order to detect both events. For the two latter cases, we chose to linearly parametrize ρ with regards to diversification. As diversification could range between 0 and 0.25 and ρ between 0 and 1, we applied a factor four to the diversification to obtain the survival rates of the lineages. We also ran Medusa on the three sets of simulations to assess the potential impact of the three extinction hypotheses on the detection of lineage shifts. For this second part, we generated over 700 trees for each model of mass extinction event, for a total of 2289 simulations.

Baseline performances
The backward and "forward-like" algorithms gave very similar results (Additional file 1) and we only present here the results obtained with the backward algorithm. To estimate the baseline behavior of both frameworks, we first tested the performance of the methods on the simplest scenarios. We thus selected simulations that included a single rate shift for Medusa, or a single mass extinction for TreePar. Figure 3 represents the fraction of shifts detected by Medusa relative to the absolute difference between the new and the old diversification values (Fig. 3a) and to the number of species in the lineage (Fig. 3b). More than 80 % of the changes in diversification larger than 0.05 are detected by Medusa, which shows a good performance in assessing strong shifts. Further, Fig. 3b shows that the overall tree size has no influence on the detection, since We then checked the ability of TreePar to detect mass extinction as a function of the survival rate, ρ, as well as of the number of ancestral species predating this event in the reconstructed tree. We also used first the simplest simulation to limit the effect of other parameters. Figure 4a shows that the signal of mass extinction in the phylogenetic tree is very weak when less than 100 ancestral species are present before the event. This has implications for our ability to find evidence for the K-Pg boundary using phylogenetic trees of vertebrates, for example. We can only reach more than a hundred ancestral species older than 65 My by considering phylogenetic trees encompassing distantly related lineages of tetrapods (see [26] or [29]). Besides, as detection drops with increasing survival rate (Fig. 4b), the signal is even less likely to be picked as the ancestors of the extant species might have experienced the mildest extinction rates.

Mixed scenarios of diversification
In a second stage, we analyzed simulations with more events and a mix of different types of events. We evaluated the performance of rate shift detection by Medusa, or of mass extinction events by TreePar, by comparing the events detected to the relevant simulated events. To perform the assignment between detected and simulated events (see Fig. 2), we chose to minimize the sum of the distances between each potential pairing of events The simulations incorporated several factors and we tested the effect on the framework of three categorical parameters: total number of tips, number of mass  Table 1 for their possible values). To ensure that the effects observed were related to the parameter of interest, we designed a reshuffling scheme for each parameter. First, we randomly selected an equal number of simulations for each combination of every possible value of the other two parameters. As an example, to study the outputs for trees of 200 tips, we randomly drew an equal number of simulations with (i) no lineage shift, no mass extinction and 200 tips; (ii) one lineage shift, no mass extinction and 200 tips; (iii) one lineage shift, one mass extinction and 200 tips; etc. This draw was repeated a hundred times and we determined, for each bin created, the proportion of simulations for which each method favored the model with the correct number of relevant events it was looking for, and the proportion of simulations for which they favored a model with too many events. Finally, we report the median and 95 % intervals of those proportions based on our hundred bins.

Tree size influence
Both Medusa and TreePar perform better in assessing the correct number of events they are set to detect with an increasing number of tips (Fig. 5). The median proportion of simulations correctly assessed reaches 60 % for Medusa and 32 % for TreePar with 5,000 tips. The increase in the number of tips also leads to an increased acceptance by TreePar of models with too many mass extinctions (28 % for 5,000 tips). However, the number of tips in the tree has no effect on the error of the estimated time of mass extinction (Fig. 6), even though more events are predicted. We only see a slight effect of tree size for Medusa, which is probably due to the fact that the method only detects lineage related events and does not depend on the total number of tips. We also investigated the effect of lineage size on the outputs of Medusa. We first compared the variance of lineage sizes relative to the overall tree size, contrasting the simulations with false positives to those with the correct number of rate shifts found. To remove the effect of lineage number, we compared groups of trees with the same number of diversification shifts. To account for a potential effect of tree imbalance, we compared the variance in lineage sizes inside trees, with or without false positives. There is no effect in most cases, except in the simulations with 4 or 5 rate shifts (p-values: 0.01 and 3.6 · 10 −3 , respectively, Mann-Whitney test). Thus, simulations with lineages of similar size are more likely to yield false positives only when they include more than 4 rate shifts. We also compared the variance in lineage sizes between simulations for which we recovered the correct number of events against those for which we recovered too few events. For every possible number of lineages, we find significantly lower variance for simulations that were correctly assessed. Thus, we only see a slight effect of the lineage size on the occurrence of false positives, whereas high variance in lineage size significantly increases false negatives. This indicates on the one hand, a tendency to overestimate the number of shifts when lineages are comparable in size, and on the other hand, problems with Medusa for identifying diversification shifts specific to a low number of species, as showed in the first part.

Impact of events violating the model
We tested the robustness of the methods by studying the behavior of (1) Medusa to detect rate shifts with an increasing number of mass extinctions, and (2) TreePar to detect mass extinction events with an increasing number   (Fig. 7). In contrast, an increase in the number of lineage shifts results in an increase of the proportion of false positives for TreePar (2 % with no lineage shift vs. 20 % with five; Fig. 7). However, the accuracy of the estimate of the timing of the event is not affected (Fig. 8). The number of lineage shifts has almost no impact on the probability of detecting a true mass extinction event, i.e. on false negatives.
We note that false positive rates remain very low throughout all cases for Medusa, less than 10 % overall and around 5 % when dealing with simulations without mass extinctions (Fig. 7a). Recently, May et al. [30] have also studied the performances of Medusa but with a different focus. Medusa also enables the characterization of diversification changes on incomplete phylogenies by letting the user assign species diversity at each tips of the tree. Two different equations are then used to calculate the likelihood function. One of them incorporates the likelihood of getting a specific number of species given a pair of λ and μ after a certain amount of time, and is now used to account for the terminally input species numbers. May et al. simulated complete phylogenies before introducing uncertainties by sequentially collapsing some of the tips, and tested the different flavors of the three different Medusa algorithms ever made available. They found high Type I errors in every algorithm and biased parameter estimates. We note that in our study, we did not consider the estimation of the macro evolutionary parameters, and did not use unresolved trees, that can be used in Medusa to account uncertainties in the phylogeny. Interestingly, May et al. also tested the algorithm that we used in this study (turboMedusa, defined as tMEDUSA in their study) on completely resolved trees, and found about the same rate of Type I errors as we did in the comparable trees ( Figure S.20 of their study). Thus even though the focus of the two studies differs, they are in agreement in the few common analysis.

Impact of patterns of extinction
The effect of different scenarios of mass extinction on the results of Medusa and TreePar are presented in Fig. 9. First, as expected, no effect of the extinction scenarios is observed on the detection of lineage rate shifts detected by Medusa (Fig. 9a). In contrast, the fair game and wanton destruction scenarios impact the estimation made by TreePar. They produce, for comparable levels of detection, more false positives than the field of bullets which was used in the previous simulations (73 % and 74 % for fair and wanton against 58 % for field of bullets, Fig. 9b).  Irrespective of the type of mass extinction simulated, there are very few false negatives, i.e. at least one extinction was detected in almost every tree. The error on the timing of this event was kept under 5 % of the root age. We also performed a search for global rate shifts in the case of wanton destruction (Fig. 9b, dashed background). Regarding this scenario, we also compared simulations where all lineages undergo an increase of diversification after the mass extinction event against those who undergo a decrease and observe no difference between the outcomes of the two frameworks. Even though the shifts are different between lineages (i.e., increase of diversification in some lineages, decrease in others), TreePar detects the period of this shift with more power than for the detection of the associated mass extinction (34 % and 21 % correctly assessed simulations, respectively). Overall, these results show that departure from the simplest model of mass extinction should not affect our ability to detect these events in phylogenetic trees (i.e. no increase in false negatives rate). But it should lead to an increase of false positive detections.

Conclusion
Previous studies involving mass extinctions and changes in macro-evolutionary rates have only focused on their effect on lineage through time plots [31]. This lead to the identification of a possible mass extinction event in some plants lineages around 32 Mya, which was further suggested to be linked with changes in climate. Recently, Hohna [32] developed a new algorithm to perform simulations with varying macro-evolutionary rates, allowing for mass extinction events. Other ongoing work aims at studying and simulating increasingly complex scenarios of diversification [25,33], but we would like to emphasize that no method allows the simultaneous discovery of both time-specific or lineage-specific rate changes and mass extinction events.
The study of diversification rates has become a standard part of the analysis of large phylogenetic trees [12,29,34], and recent efforts have also assessed the methods used when their assumptions are violated [28]. We have shown that departure from the assumption of consistency in rates across lineages causes a large increase in false positives when looking for mass extinction events. This can be problematic as we know that rate consistency rarely holds [3,12,13], and casts doubts on our ability to reliably find such events using only phylogenetic trees. Nevertheless, an increasing number of disparities between lineages caused neither a decrease in the probability of detecting an event nor an increase in the error on its timing. As we observed the same pattern under more complex scenarios of extinction, the difficulty in detecting the K-Pg event in mammals is therefore probably not due to biases in the methods used. We might be limited by the power of TreePar to detect mass extinction events, although in simulations we reach 60 % of true events detected for a tree size similar to that of mammals.
Recent efforts aim to reach a better agreement between paleontological and molecular data [35], including looking for mass extinctions in molecular phylogenies. For instance, there is much debate on whether the K-Pg extinction event triggered the mammalian diversification [9,26,29,36,37]. The fossil record also indicates higher extinction rates of mammalians species around 65 Mya [38]. In this work, we have shown that for phylogenetic trees similar in size to that of mammals (i.e. ca. 5000 species), the signal for mass extinctions was usually recovered in the tree, even though lineage discrepancies in macro-evolutionary rates had a tendency to yield more false positives. Thus, if the ancestor lineages of the extant mammal families did experience a mass extinction at the K-Pg boundary, we should theoretically be able to identify it using phylogenetic trees. The underlying assumption about the mass extinction made when using TreePar is that lineages are terminated randomly with a fixed ρ value everywhere in the tree, i.e. a field of bullets type of mass extinction. But other models of extinction seem to increase false positives but not false negatives, not explaining difficulties in finding a K-Pg signal in real phylogenetic trees.
Recent studies have used Markov processes to account for the effect of specific traits upon the probability of extinction of a species, thus extending models of mass extinction beyond the field of bullets [21]. Such models can be used for instance to estimate the loss of phylogenetic diversity after a mass extinction event [39]. Our simulations can be seen as a special case of such models, where the trait influencing survival probabilities is the diversification value of the species. We have shown that more complex models of mass extinction cause more false positive detection than the simple field of bullets, as well as a decrease in the error for the fair game scenario. Choosing a specific model of extinction (field of bullets, wanton destruction, fair game) might require the incorporation of fossil information into the phylogenetic tree, and thus the further development of methods capable of dealing with both molecular and fossil data.

Additional file
Additional file 1: Details of the comparisons between our backward and the forward-like algorithms used to perform the simulations.