Reassessing the temporal evolution of orchids with new fossils and a Bayesian relaxed clock, with implications for the diversification of the rare South American genus Hoffmannseggella (Orchidaceae: Epidendroideae)

Background The temporal origin and diversification of orchids (family Orchidaceae) has been subject to intense debate in the last decade. The description of the first reliable fossil in 2007 enabled a direct calibration of the orchid phylogeny, but little attention has been paid to the potential influence of dating methodology in obtaining reliable age estimates. Moreover, two new orchid fossils described in 2009 have not yet been incorporated in a molecular dating analysis. Here we compare the ages of major orchid clades estimated under two widely used methods, a Bayesian relaxed clock implemented in BEAST and Penalized Likelihood implemented in r8s. We then perform a new family-level analysis by integrating all 3 available fossils and using BEAST. To evaluate how the newly estimated ages may influence the evolutionary interpretation of a species-level phylogeny, we assess divergence times for the South American genus Hoffmannseggella (subfam. Epidendroideae), for which we present an almost complete phylogeny (40 out of 41 species sampled). Results Our results provide additional support that all extant orchids shared a most recent common ancestor in the Late Cretaceous (~77 million years ago, Ma). However, we estimate the crown age of the five orchid subfamilies to be generally (~1-8 Ma) younger than previously calculated under the Penalized Likelihood algorithm and using a single internal fossil calibration. The crown age of Hoffmannseggella is estimated here at ~11 Ma, some 3 Ma more recently than estimated under Penalized Likelihood. Conclusions Contrary to recent suggestions that orchid diversification began in a period of global warming, our results place the onset of diversification of the largest orchid subfamilies (Orchidoideae and Epidendroideae) in a period of global cooling subsequent to the Early Eocene Climatic Optimum. The diversification of Hoffmannseggella appears even more correlated to late Tertiary climatic fluctuations than previously suggested. With the incorporation of new fossils in the orchid phylogeny and the use of a method that is arguably more adequate given the present data, our results represent the most up-to-date estimate of divergence times in orchids.


Background
Orchidaceae is the largest and one of the ecologically and morphologically most diverse families of flowering plants [1]. Several ages have been proposed for the origin of modern orchid lineages (i.e., their crown age), ranging from ~26 million years (Ma) [2], ~40 Ma [3], ~80 Ma [4] to as much as ~110 Ma [5]. A correct time estimation is essential for our understanding of the mechanisms underlying the diversification of orchids, and could contribute to discern between alternative hypotheses of diversification -such as significant increases in speciation rates temporally correlated to climatic changes, tectonic events, or radiation of pollinators.  [4]. Numbers at nodes are median ages in million of years (Ma). Dashed branches indicate nodes with Bayesian posterior probabilities below 0.90. Circles indicate age-constrained nodes; the yellow circle indicates the root node, the purple circle indicates the internal calibration point for subtribe Goodyerinae (Pachyplectron -Dossinia). Inset: The extinct stingless bee Proplebeia dominicana with well-preserved pollinia attached to the mesoscutellum; these pollinia have been demonstrated to belong to an extinct orchid species in subtribe Goodyerinae. This represents the first definitive fossil record for the Orchidaceae. Reprinted by permission from Macmillan Publishers Ltd: Nature 448 (30), copyright 2007. Many parameters have been identified to affect divergence time estimates in phylogenies, including taxon sampling, reliability, number and placement of internal calibration points, and dating method [6][7][8][9][10][11][12][13][14][15][16]. Until recently, molecular dating of the Orchidaceae has been challenging due the complete absence of reliable orchid fossils. The finding of a 15-20 Ma fossil of an extinct stingless bee (Proplebeia dominicana), covered with pollinia from an orchid species belonging to the subtribe Goodyerinae, finally allowed for temporal calibration of the family [4]. Using this fossil as an internal calibration point, and departing from a phylogenetic tree obtained from the analysis of plastid DNA sequences (matK and rbcL), Ramirez et al. [4] estimated the origin of Orchi-daceae at 76-84 Ma. These results supported an 'ancient' origin of orchids in the Late Cretaceous.
Although the study by Ramirez et al. [4] unquestionably constituted a milestone in orchid research, the large discrepancies in age estimates obtained in the last decadesome 80 Ma between the youngest [2] and oldest [5] crown ages -suggests that the matter is probably not completely settled. In a recent study in the family Begoniaceae, Goodall-Copestake et al. [16] found that the second largest source of variance in age estimates (after availability and placement of internal calibration points) was derived from the choice of dating method employed, a result consistent with previous evaluations of empirical data [10,11]. In particular, recent developments in molec-       [17]] and Penalized Likelihood [PL; [18]] -the two methods employed by Ramirez et al. [4]. Whereas NPRS has been largely abandoned in favour of its successor PL [see discussion in [18]], both implemented in the software r8s [19], PL competes today in popularity with Bayesian dating [20] implemented in the software BEAST [21].
PL and BEAST operate in very different ways: i) PL requires a fixed phylogram as input, whereas BEAST samples topologies simultaneously as it calculates divergence times under a MCMC analysis, and allows the choice of several different priors and models; ii) PL assumes autocorrelation of rates within the phylogeny (i.e. that mutational rates are inherited, resulting in closely related taxa exhibiting similar evolutionary rates), whereas BEAST allows branches to vary in evolutionary rate; iii) in PL, nodes can be calibrated to be either fixed to a certain age, or constrained by a maximum or a minimal bound; whereas in BEAST, several additional alternatives are available for calibrating a node, because such calibrations represent age priors drawn from distributions of various shapes (e.g., normal, lognormal, exponential, or uniform). The methodological and conceptual differences between r8s, BEAST, and some other meth-ods available today for molecular dating have been reviewed by several authors [12,14,[22][23][24].
Although the methodology and assumptions implemented in each molecular dating method can be readily compared, our knowledge of how time estimates are influenced by the choice of method is still poor. For instance, Goodall-Copestake et al. [16] obtained younger ages in the Begoniaceae using PL than using NPRS, but as these authors noted the inverse situation was found by Clement et al. [25,26] on the same taxonomic group. According to Goodall-Copestake [16], this surprising discrepancy was probably caused by differences in density of sampled taxa and calibration points. Similarly, it may be very difficult to predict differences in age estimates using PL and BEAST: in the study by Goodall-Copestake [16], PL produced considerably younger ages than BEAST, whilst the opposite situation was found within family Caryophyllaceae [26]. These results exemplify the potential influence of methodology on age estimations.
In this study we aim at reassessing the temporal origin and diversification of Orchidaceae, using the Bayesian uncorrelated relaxed molecular clock approach implemented in BEAST. In addition to choosing a different dating method, we conduct a new analysis on an expanded taxon sampling by adding two internal calibration points in the orchid phylogeny. We base these calibrations on fossil leaves described subsequent to the study by Ramirez et al. [4] from Early Miocene deposits of New Zealand, which were confidently assigned to genera Dendrobium and Earina [27]. Then, to explore how the highlevel age estimates obtained here may affect the evolutionary interpretation of a species-level orchid clade, we date the origin and diversification of the rare South American orchid genus Hoffmannseggella.
Hoffmannseggella belongs to the Epidendroideae, the largest subfamily within Orchidaceae, which comprises over half of all orchid species [28]. The subfamily has been divided into 'lower' and 'higher' Epidendroids [29] and this latter clade includes the monophyletic subtribe Laeliinae, where Hoffmannseggella is nested [30]. The genus is endemic to Brazil, where it is confined to the High Altitude Rocky Complexes (Brazilian Campos Rupestres and Campos de Altitude) of Minas Gerais, Rio de Janeiro, Espírito Santos and Bahia states. It comprises exclusively rupicolous species, i.e. growing among rocks [31]. Adding to the 32 different species recognized by Chiron and Castro Neto [32], several new species have recently been described and today Hoffmannseggella comprises 41 species [31,[33][34][35][36][37][38][39]. Half of these are "microendemic" -known from a single natural population, and some only from the type collection. We have been able to obtain or generate sequences for all but a single species, thus reaching a 98% complete species sampling.

Results
The maximum credibility tree for the fossil-calibrated relaxed molecular clock analysis of the family Orchidaceae, using the same matrix and calibration points as Ramirez et al. [4], is shown in Figure 1. Support values for the different clades and branches were high, with only a few values below 0.90 Bayesian posterior probability. Our age estimates indicate that extant Orchidaceae shared a most recent common ancestor (MRCA) in the Late Cretaceous, ~80 Ma (95% confidence intervals, CI: 56 -105 Ma). Accordingly, median divergence times for the five orchid subfamilies currently recognized (Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae and Epidendroideae) varied between ~31 Ma and ~58 Ma (  Table 1).
The maximum credibility tree for the calibrated relaxed molecular clock analysis of the 'Higher Epidendroids' including the genus Hoffmannseggella is shown in Figure  3. All Hoffmannseggella species clustered together with high support value (posterior probability above 0.90), supporting the monophyly of the genus. However, the resolution within the genus was poor, mainly distinguishing two subclades. According to our dating, the genus Hoffmannseggella shared a MRCA in the Late Miocene, 11 Ma (95% CI: 5 -20 Ma) (Figure 3).  1). Also in the first analysis where we used the same matrix as Ramirez et al. [4] and obtained precisely the same topology for the phylogenetic tree, median age estimates for the five orchid subfamilies are younger (Table  1). Ramirez et al. [4] proposed a Late Palaeocene radiation (~56 Ma) for the Orchidaceae, thus during the prominent temperature increase of the Early Cenozoic (from 59 Ma to 52 Ma; [40,41]). In contrast, our results suggest a younger diversification, placing the origin of the two largest orchid subfamilies Orchidoideae and Epidendroideae in the Eocene (~53 Ma and ~49 Ma, respectively), thus at the onset of a long period of temperature decrease [40,41] ( Figure 4). If these estimates are correct, a potential explanation for the initial radiation of Orchidaceae during the Eocene could be that cooler temperatures increased the global heterogeneity of ecosystems (e.g., with more open and dry habitats), creating new habitats that could foster adaptive radiation and/or increasing allopatric speciation (e.g. [42]). An alternative or complementary scenario is that the Early Eocene Climatic Optimum 51-53 Ma [40,41], when mean global temperatures reached ~12°C higher than today's level (Figure 4), would have caused a wave of large-scale extinction and thus left many empty ecological niches available for orchid diversification.

Radiation of the genus Hoffmannseggella
Our estimate for the crown age of Hoffmannseggella indicates a Late Miocene radiation for the genus (~11 Ma; Figure 3). This is some 3 Ma younger than the dates obtained using a Penalized Likelihood analysis over a sample of Bayesian phylograms [43,44] (mean 14.2 Ma, 95% CI: 9.69 -18.6).
Antonelli et al. [44] postulated a strong correlation between climate cooling following the Mid-Miocene Climatic Optimum and range expansion and diversification in Hoffmannseggella. This younger age estimate may imply an even stronger link than originally conceived, since the intensity of climatic oscillations augmented towards the end of the Tertiary [40,41].

Reliability of results
The age estimation of genus Hoffmannseggella is based on a single, secondary calibration point (the crown group age of 'Higher Epidendroids' obtained in the high-level dating analysis), which should have a direct effect on all internal divergence times. However, the node age estimations between Cattleya and Masdevallia are strikingly similar in both analyses (25 Ma in the high-level Orchi-  daceae data set, CI:14 -35; and 26.5 Ma in the Hoffmannseggella data set, CI: 14 -41). This agreement provides some cross-validation for the use of the 'Higher Epidendroids' as a calibration point for the Hoffmannseggella data set. As outlined in the Introduction, PL and BEAST make different evolutionary assumptions and have very different algorithms. This precludes categorical assertions on which of these methods yields the most correct divergence time estimates. One way to assess the autocorrelation assumption made by PL is to examine the covariance between parent and child branch in each phylogeny. This value is calculated by the software Tracer v1.4 [45] from the log files of the MCMC analyses, and should be significantly positive when rates are autocorrelated, and near zero when there is no evidence of autocorrelation (see BEAST manual). For the Orchidaceae data set, this covariance had a mean of 0.10 and 95% confidence intervals ranging from -0.05 to 0.26. Although the covariance has been criticized as a weak measure of autocorrelation and more critical discussion on this subject is needed [46], the low covariance found in this study does not provide positive evidence for autocorrelation, thus favouring the BEAST results reported here.

Influence of internal fossil calibrations
It is worth noting that the single fossil used by Ramirez et al. [4], for tribe Goodyerinae, did not influence any age estimates for Orchidaceae in the BEAST analysis. This is evident by examining the distribution of ages of the MRCA of tribe Goodyerinae, which falls outside the lower age prior of 15 Ma (Figure 5a). In contrast, both New Zealand fossils affected age estimates, as is apparent from the age distributions of the two constrained nodes, truncated at their younger bound 20 Ma (Figure 5b, c). While the Dendrobium crown age constraint was most frequently reached during the analyses (Figure 5b), the Earina stem age constraint was often reached but did not prevent a normally distributed age distribution with a mean older than the fossil constraint ( Figure 5c).

Conclusions
Molecular dating techniques have greatly improved in the last years, offering novel opportunities to study the temporal evolution of taxa. However, it is essential to critically evaluate the impact of methodology and other parameters (taxon sampling, fossil calibrations, sequence regions) on the reliability of results. This study has shown that age estimations for orchid clades vary by several million years when using BEAST or Penalized Likelihood. While the addition of two new internal calibration points makes our study the most up-to-date estimate of the temporal evolution of orchids, additional studies may be required before a stable chronogram of this charismatic plant family is achieved.

Taxon sampling and genetic markers
In the first analysis for the Orchidaceae family, we used the matrix compiled by Ramirez et al. [4], including aligned plastid DNA sequences (matK and rbcL). The matrix comprised 60 taxa and 2858 characters (see [4] for a detailed list of included species and GenBank accession numbers). In the second dating analysis of the Orchidaceae, we used six additional taxa, including sequences of both matK and rbcL, to allow the use of two recently described orchid fossils from New Zealand as calibration points. We also added sequences from four new outgroup genera, which enabled us to use the stem node of Asparagales as root calibration point, instead of the crown node, as done by Ramirez et al. [4]. We did this because the oldest fossil used by Ramirez et al. [4] was Liliacidites sp., which does not have any diagnostic features of crown Asparagales. The additional sequences were re-aligned with those of Ramirez et al. [4], using MAFFT version 5.64 [47]. The final matrix comprised 70 taxa and 2905 characters (see Table 2 for the additional species and their corresponding GenBank accession numbers).
For Hoffmannseggella, 40 out of 41 species were sampled ( Table 3). The ITS region (ITS1-5.8S-ITS2) was sequenced for 14 species to complement the study by van den Berg et al. [30]. The matrix, including outgroup species selected on the basis of previous phylogenetic analyses [4,30], comprised 56 taxa and 660 characters. No gaps were coded in any of the matrices.

Sequence alignment and dating analyses
The matrix for Orchidaceae [4] was analysed using a relaxed molecular clock approach with the software BEAST v1.5.3 [21]. The input data were compiled in BEAUti v1.5.3 with the tree priors set as follows: i) age for the monophyletic subtribe Goodyerinae (corresponding to the age of the fossil orchid pollinia 15 -20 Ma old; [4]): uniform prior distribution with a lower bound of 15 Ma and an upper bound of 120 Ma; ii) age for the root of the tree (corresponding to the oldest known fossil record for Asparagales; see discussion in [4]): normal prior distribution with mean 106.5 Ma and standard deviation of 8.21 (giving a 95% CI ranging from 93 -120 Ma). The second family-level matrix (with the additional taxa) was analysed with the following tree priors: i) age for the monophyletic subtribe Goodyerinae set as above; ii) the two additional calibration points for Dendrobium and Earina set as uniform prior distributions with a lower bound of 20 Ma and an upper bound of 120 Ma (phylogenetic placement following [27]); iii) age for the root of the tree set to an uniform prior distribution with a lower bound of 93 Ma and an upper bound of 120 Ma. The upper (maximum) age constraint of 120 Ma for the calibrations above corresponds to the oldest known monocot fossils [52]. We acknowledge that this constraint may be questionable since fossils generally provide minimal ages, but in absence of further evidence such upper bounds are technically advantageous for preventing the root of the tree to assume unreasonably old ages.
The Yule process was chosen as speciation process for all three data sets. The Akaike Information Criterion in MrModelTest v2.3 [53] was used to choose the best-fitting evolutionary model for each sequence region (GTR+Γ+I for both partitions in the Orchidaceae data set and GTR+Γ for ITS in the Hoffmannseggella data set). Five separate runs were performed in BEAST with 20 million generations each. Log files were analysed with Tracer v1.5 [45], to assess convergence and confirm that the combined effective sample sizes for all parameters were larger than 200, in order to ensure that the MCMC chain had run long enough to get a valid estimate of the parameters [54]. All resulting trees were then combined with LogCombiner v1.5.3, with a burn-in of 25%. A maximum credibility tree was then produced using TreeAnnotator v1.5.3 [21].

Authors' contributions
ALSG and AA designed the study; ALSG and CFV conducted fieldwork; ALSG generated molecular data and conduced data analysis; ALSG and AA interpreted the results and wrote the paper. All authors read and approved the final manuscript.