- Research article
- Open Access
A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree
BMC Evolutionary Biology volume 13, Article number: 214 (2013)
The Solanaceae is a plant family of great economic importance. Despite a wealth of phylogenetic work on individual clades and a deep knowledge of particular cultivated species such as tomato and potato, a robust evolutionary framework with a dated molecular phylogeny for the family is still lacking. Here we investigate molecular divergence times for Solanaceae using a densely-sampled species-level phylogeny. We also review the fossil record of the family to derive robust calibration points, and estimate a chronogram using an uncorrelated relaxed molecular clock.
Our densely-sampled phylogeny shows strong support for all previously identified clades of Solanaceae and strongly supported relationships between the major clades, particularly within Solanum. The Tomato clade is shown to be sister to section Petota, and the Regmandra clade is the first branching member of the Potato clade. The minimum age estimates for major splits within the family provided here correspond well with results from previous studies, indicating splits between tomato & potato around 8 Million years ago (Ma) with a 95% highest posterior density (HPD) 7–10 Ma, Solanum & Capsicum c. 19 Ma (95% HPD 17–21), and Solanum & Nicotiana c. 24 Ma (95% HPD 23–26).
Our large time-calibrated phylogeny provides a significant step towards completing a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family. The chronogram now includes 40% of known species and all but two monotypic genera, and is one of the best sampled angiosperm family phylogenies both in terms of taxon sampling and resolution published thus far. The increased resolution in the chronogram combined with the large increase in species sampling will provide much needed data for the examination of many biological questions using Solanaceae as a model system.
Divergence times are of major interest for studies of evolutionary biology and historical biogeography, but also to researchers who focus on understanding various types of trait evolution, such as the development of chemical and genetic pathways, climatic niche and geographic range sizes, and morphological, ecological and behavioural characters. With the recent publication of fully annotated genomes in the Solanaceae [1–3], genomic tools now exist for unravelling genetic mechanisms that control such traits and their development. What is lacking, however, is a robust phylogenetic framework that encompasses species and generic diversity across the family in order to maximise the potential of these new data sources in a wider evolutionary context. Although several studies have focused on understanding evolution of particular characteristics in Solanaceae in a phylogenetic context, including analyses of genome and chromosome evolution [4–6], life history and polyploidy [7–9], floral and fruit morphology [10, 11], gene family evolution and sub-functionalization [12–15], and broad-scale biogeographic patterns , only a single study has examined character evolution through time . A central problem has been the lack of a robust, densely sampled, dated molecular phylogeny for the entire family.
Solanaceae are a particularly interesting angiosperm family, not only because they include many major crop species (e.g., potato - Solanum tuberosum, tomato - S. lycopersicum, eggplant - S. melongena, sweet and chili peppers - Capsicum spp., tobacco – Nicotiana spp.) and ornamentals (e.g., Petunia spp., Solanum spp.), but also a number of taxa used as biological model systems (e.g., Nicotiana spp., Solanum spp., Petunia spp., Datura spp.) [17–20]. A taxonomic and phylogenetic framework is now available for the family at the generic level (see http://www.solanaceaesource.org) [21, 22]. Separate studies have explored relationships at tribal [23–26] and generic levels [27–33], and many phylogenetic studies have focused on Solanum, a genus that comprises nearly half of the species within the family [34–40]; see  for references prior to 2006]. Some of these studies have resulted in taxonomic changes, such as re-circumscription of the previously distinct genera Lycopersicon, Cyphomandra, Normania, and Triguera as parts of a monophyletic Solanum [35, 41, 42]. Such changes, although sometimes disruptive in the short term, have helped to both stabilise names and provide a better evolutionary context for future studies.
Although a relatively robust understanding of the major clades within Solanaceae exists, a densely sampled species-level phylogeny is still lacking. The most recent molecular systematic study focused on establishing major relationships within the family, but lacked depth in terms of sampling as it only included 190 (c. 7.3%) of a total c. 2,700 Solanaceae species . A larger phylogenetic analysis with 995 species by Goldberg and colleagues  focused on the evolution of breeding systems, and did not discuss details of topology or implications for family-wide systematics. For Solanum itself, the most recent phylogeny included only 102 (7.7%) of the total c. 1,325 species in the genus . Because several phylogenetic studies at various taxonomic levels across Solanaceae have since been published (see above), there is now a large quantity of new sequence data available for a wider family-level analysis.
Molecular divergence time analyses do not only depend on the availability of a robust, well-sampled phylogeny, but also require robust fossil calibration points [44, 45]. The Solanaceae fossil record has never been fully reviewed, and only a few fossils have been used in molecular studies [8, 46, 47]. These studies used fossils as calibration points without a careful comparison of fossil morphology in relation to extant diversity. A recent survey of the earliest fossil record of the Asterid clade, including Solanaceae, highlighted the need to re-assess the earliest putative Solanaceae fossils that could provide robust calibration points for the crown or stem node of the family .
This study is part of a collaborative approach to studying the taxonomy and phylogeny of the Solanaceae. Here we present a densely sampled phylogenetic study of the family coupled with a molecular dating analysis with fossil calibrations. We review all known seed fossils in the family, and assess them for identity, age, and phylogenetic position. We then use all available sequences for seven DNA loci found in GenBank with nearly all genera and 1,075 species represented. A dating analysis is run using an uncorrelated relaxed molecular clock model within a Bayesian framework with direct fossil calibrations. The resulting time-calibrated phylogeny offers important insights into the evolution of the family at different taxonomic levels, and a robust platform for future evolutionary studies.
A total of 50 fossil records previously assigned to Solanaceae were found in the literature (Table 1, see Additional file 1 for full details). These included 39 seed fossils, one leaf fossil, five flowers, two wood and three pollen fossils (Table 1). None of the leaf or flower fossils showed any distinct morphological characters that allowed us to definitely assign them to the family. Of the two wood fossils, Solanumxylum paranensis can be clearly assigned to Solanaceae based on a large number of anatomical characters such as para- and apotracheal axial parenchyma that is diffuse in aggregates, simple perforation plates, bordered and alternate intervessel pits, homocellular rays, fibres that are polygonal and quadrangular in section, and the presence of septate fibres (Table 1) . The other wood fossil shows no specific characters of Solanaceae except those common to Solanaceae and Asteraceae and lacks axial parenchyma; we do not consider this a member of Solanaceae (Table 1) . Of the two pollen records, the classification of Datura cf. discolor awaits further examination, since no description or illustration of the fossil was provided in the original publication (Table 1). A pollen fossil-taxon from California, based on two poorly preserved specimens of 3-colporate, 5-colpate, prolate shaped grains with striate ornamentation, resembles pollen grains of Lycium, Nolana, and Hyoscyamus [51, 52]. Similar characters appear in the pollen of the unrelated genera Brucea (Simaroubaceae) and Skimmia (Rutaceae) [51, 53], and hence we have not assigned this pollen fossil-taxon to Solanaceae for our analysis (Table 1).
The putative Solanaceae seed fossils were analysed using a combination of characteristics known from clades within the family : (1) Seeds flattened, (2) circular to reniform in shape, (3) hilum sub-laterally or laterally positioned, and (4) testa cells sinuate-margined. We assigned seeds with all four of these characters to the subfamily Solanoideae (N = 28), while those with some but not all of these were assigned to the family as a whole (N = 6) (Table 1) . The currently recognised earliest fossils assignable to Solanaceae include two seed fossils from Eocene Europe: Solanispermum reniforme recorded from various beds from southern England [55, 56], and Solanum arnense, a fossil-taxon described based on a few specimens found in the Lower Bagshot (Table 1) . Neither of these shows the combination of flat seeds with sinuate-margined testa cells, a unique combination that could tie them to the subfamily Solanoideae. The flattened seeds of Solanispermum reniforme lack sinuate-margined testa cells, and Solanum arnense seeds show the characteristic testa cells but are round rather than flattened. Hence, we consider these fossils as the earliest evidence of Solanaceae and of the presence of the family in Eocene Europe, but do not assign them to any particular clade within the family. Seeds of the fossil-taxon Cantisolanum daturoides have previously been cited as the oldest known Solanaceae fossil by some authors  but considered doubtfully a member of the family by others . Results from a CT-scanning study have shown that this Cantisolanum seed is anatropous and does not belong to Solanaceae, but has likely affinities to the monocot family Philydraceae [T. Särkinen, M. Collinson, P. Kenrick, F. Ahmed, unpublished observations].
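The four-character rule described above can be written as a small decision function. This is only a minimal sketch of the stated rule, not the procedure actually used to build Table 1: the `SeedFossil` class, its field names, and the `assign_seed` function are hypothetical, and the character states given for the two example fossils beyond those mentioned in the text (flattening and testa cells) are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedFossil:
    """Hypothetical record of the four seed characters scored in the text."""
    name: str
    flattened: bool
    circular_to_reniform: bool
    hilum_lateral: bool          # hilum sub-laterally or laterally positioned
    sinuate_testa_cells: bool


def assign_seed(fossil: SeedFossil) -> str:
    """Return the most inclusive group supported by the scored characters."""
    scores = [
        fossil.flattened,
        fossil.circular_to_reniform,
        fossil.hilum_lateral,
        fossil.sinuate_testa_cells,
    ]
    if all(scores):
        return "Solanoideae"     # all four characters present
    if any(scores):
        return "Solanaceae"      # some but not all characters present
    return "unassigned"          # no diagnostic character scored


# Flattening and testa-cell states follow the text; the other two
# character states are placeholders chosen for illustration only.
reniforme = SeedFossil("Solanispermum reniforme", True, True, True, False)
arnense = SeedFossil("Solanum arnense", False, True, True, True)

for seed in (reniforme, arnense):
    print(seed.name, "->", assign_seed(seed))
```

Under these assumed scores both Eocene fossils fall into the family-level category, which matches the treatment described above.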
Our final supermatrix had a taxon coverage density of 0.45, and included 1,075 species of Solanaceae, representing all but two genera (the monospecific Darcyanthus and Capsicophysalis) and 40% of total species within the family, including 34% sampling of species within the large genus Solanum. Two plastid regions, ndhF and trnL-F, were available for all genera except Darcyanthus and Capsicophysalis and the plastid and nuclear regions ITS, waxy and trnL-F were the most densely sampled regions at the species level (Table 2). The matrix included a total of 4,576 variable characters, with an aligned length of 10,672 bp (Table 2). A total of 1,902 bp were excluded from analyses due to ambiguous alignment (see Methods section) resulting in a matrix of 8,770 bp (Table 2). Proportionately, waxy (33.9%) and ndhF (20.6%) contributed most PI (parsimony informative) characters (Table 2). The relatively little-used plastid region trnS-G showed a surprising number of PI characters (13.5% of total), considering it had relatively poor taxon coverage density (0.23), compared to trnL-F which had a coverage of 0.66 but only 6.6% of total PI characters (Table 2). The final matrix included 54.7% missing data (Table 2). At the species level, there was an average of 58.7% missing data, as measured by number of base pairs, but only 49.9% when measured in terms of PI characters expected from the missing regions.
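The matrix statistics above can be reproduced with simple arithmetic. The sketch below assumes that "taxon coverage density" means the fraction of taxon-by-locus cells that contain a sequence; that reading is an assumption, although it is consistent with the reported density of 0.45 alongside 54.7% missing data. The function name `coverage_density` and the toy sampling table are hypothetical and do not represent the real data set.

```python
def coverage_density(matrix: dict[str, set[str]], loci: list[str]) -> float:
    """Fraction of taxon x locus cells that contain a sequence."""
    if not matrix or not loci:
        raise ValueError("need at least one taxon and one locus")
    wanted = set(loci)
    filled = sum(len(present & wanted) for present in matrix.values())
    return filled / (len(matrix) * len(loci))


# Figures taken from the text.
ALIGNED_LENGTH_BP = 10_672
EXCLUDED_BP = 1_902
analysed_bp = ALIGNED_LENGTH_BP - EXCLUDED_BP   # matches the reported 8,770 bp

# Toy sampling table (hypothetical, for illustration only).
toy = {
    "Solanum tuberosum": {"ITS", "waxy", "ndhF", "trnL-F"},
    "Solanum lycopersicum": {"ITS", "waxy"},
    "Capsicum annuum": {"ndhF"},
}
toy_loci = ["ITS", "waxy", "ndhF", "trnL-F"]
toy_density = coverage_density(toy, toy_loci)   # 7 filled cells out of 12

print(analysed_bp, round(toy_density, 2))
```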
The resolved Maximum Likelihood topology shows strong support for all previously identified major clades within Solanaceae , and increased node support is observed particularly within Solanum (Figure 1). Only major clades and their relationships are discussed here due to the fact that our analyses only accounted for incongruence issues amongst data sets between major clades rather than at shallow taxonomic levels. We encourage readers to refer back to available clade-specific studies for detailed species-level phylogenies (see references cited here and in ref.  for studies prior to 2006); these studies have incorporated larger sets of markers than used here, incorporate methods that test/account for gene tree – species tree incongruence, and discuss issues that could have led to any detected incongruences between gene trees such as polyploidy and/or hybridisation, and incomplete lineage sorting.
The branching order at the base of Solanaceae is not well defined, similar to the findings of Olmstead et al. , and four groups are identified as the first branching taxa: Schizanthus, Duckeodendron, the previously unplaced Reyesia, and the subfamily Goetzeoideae (Figure 1). Reyesia has been previously associated with Salpiglossis, but is here placed with Goetzeoideae and Duckeodendron (Figure 1, Additional file 2). The previously unsampled genera Heteranthia, Trianaea and Schraderanthus are placed within Schwenckieae, Juanulloeae, and Physalinae, respectively (Figure 1, Additional file 2). The informally named X = 12 clade is here recovered with strong support and Nicotianoideae is resolved as sister to the rest of the clade (Figure 1). Within the Physalinae, work is clearly needed to delimit monophyletic genera (Figure 1, Additional file 2, see [58, 59]). Two closely related genera, Larnax and Deprea, are resolved as sister to Withaninae, in agreement with morphology (Figure 1, Additional file 2). These genera have been linked with Iochrominae in some molecular analyses , but considered distant outgroups of Iochrominae by others [58, 59]. The molecular data support the treatment of Schraderanthus as distinct from Leucophysalis, and Schraderanthus is here found as sister to Brachistus + Witheringia (Figure 1, Additional file 2).
Within Solanum, all 12 major clades identified by Weese & Bohs  are recovered, with nearly fully resolved relationships among them (Figure 1). The Thelopodium clade is resolved as the first branching group, and the remaining Solanum species are divided into two strongly supported clades. Clade I comprises all non-spiny, often herbaceous (e.g., tomatoes, potatoes) species without stellate hairs, but also includes woody climbers (e.g., Dulcamaroids) and some shrubby species (e.g., Morelloids). Clade II comprises species that are often shrubs or small trees (although some are only weakly woody), many with prickles and/or stellate hairs (Figure 1). Within Clade I, which includes a total of c. 525 known species, two clear clades are resolved: (1) the Potato clade, with Regmandra clade as the first branching group, and (2) Clade M, including Morelloid, Dulcamaroid, Archaesolanum, Normania, and the African Non-Spiny clades (Figure 1). Relationships within Clade M are well resolved and highly supported, revealing the position of the African Non-Spiny clade as distinct from and not closely related to the Dulcamaroid clade, despite their morphological similarities such as a twining habit and twisting petioles . Within the Potato clade, relationships are equally well resolved: section Petota is resolved as sister to a group comprising the Tomato clade plus a set of smaller early-branching clades (Figure 1). The Regmandra clade, a group of 11 species whose centre of diversity is the hyper-arid Atacama desert, is here resolved as part of the Potato Clade for the first time (Figure 1), a result supported by morphology [62, 63].
Relationships within Clade II are less well-resolved. The clade consists of c. 800 mostly woody species, and includes the large Leptostemonum clade known as "spiny solanums". There is moderate support for S. clandestinum + S. mapiriense as sister to the rest of Clade II (Figure 1). Relationships within the large Leptostemonum clade remain relatively unresolved, but all 14 major clades found in previous analyses  are supported. A set of previously unplaced species, S. crotonoides, S. hayesii, and S. multispinum, are resolved sequentially as sister to the Torva clade (Additional file 2), although on morphological grounds S. hayesii would be a member of the Torva clade.
The general topology of the Bayesian maximum clade credibility tree matched that of the best scoring Maximum Likelihood tree with similar levels of support for major clades (Additional file 3). The only topological difference, although not a hard incongruence, was observed at the base of Solanaceae: Bayesian analyses resolved Schwenckieae as the first branching group within the family, while the base of the tree remained largely unresolved in the maximum likelihood topology. Results from PATHd8 gave ages generally similar to those from the BEAST analysis (Table 3). A notable trend is that BEAST ages were consistently younger, especially towards the early-branching nodes (Table 3). The younger ages obtained from the BEAST analysis reflect that diversification rates across Solanaceae have been non-linear especially towards the base of the tree, and/or that extinction and speciation rates have varied across the tree. We will focus our discussion on the BEAST results, which we consider to be more robust due to the more realistic model assumptions used, including the relaxed molecular clock model that accounts for rate variation across lineages, as well as a birth-death tree model that accounts for extinction .
The BEAST results place the stem age of Solanaceae at c. 49 million years ago (Ma, 95% highest posterior density (HPD): 46–54 Ma), and the crown node at c. 30 Ma (95% HPD 26–34) (Additional files 2 and 3). The crown node of the X = 12 clade, which is the split between Nicotiana and Solanum, was estimated at c. 24 Ma (95% HPD 23–26) (Additional files 2 and 3). The Solanoideae began diversifying c. 21 Ma (95% HPD 19–23) (Additional files 2 and 3). Solanum, a genus which includes nearly half of the total species diversity in the family, split from Jaltomata c. 17 Ma (95% HPD 15–19) and started diversifying c. 16 Ma (95% HPD 13–18) (Additional files 2 and 3). The Solanum – Capsicum split, corresponding to the most recent common ancestor of Solanum & Physalis, occurred c. 19 Ma (95% HPD 17–21) (Additional files 2 and 3). Within Solanum, major splits include tomato – potato c. 8 Ma (95% HPD 7–10) and the eggplant – tomato/potato lineages corresponding to the Clade I – Clade II split c. 14 Ma (95% HPD 13–16) (Additional files 2 and 3). Crown node age estimates show that section Petota, which includes all cultivated potatoes, started diversifying c. 7 Ma (95% HPD 6–9), section Lycopersicon, which includes the cultivated tomato, c. 2 Ma (95% HPD 1–3), the group containing all species of cultivated eggplants (S. melongena, S. anguivi and S. macrocarpon) c. 3 Ma (95% HPD 2–4), and the group (C. frutescens – C. eximium) including all cultivated pepper species c. 3 Ma (95% HPD 2–4) (Additional files 2 and 3).
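For readers unfamiliar with the interval quoted throughout this section, a 95% HPD is the shortest interval that contains 95% of the posterior samples for a node age. The following is a minimal sketch of that calculation for a unimodal sample; it is a generic illustration, not the code used by BEAST or its companion tools, and the function name `hpd_interval` and the toy ages are hypothetical.

```python
import math


def hpd_interval(samples: list[float], mass: float = 0.95) -> tuple[float, float]:
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))          # samples that must fall inside
    best_lo, best_hi = ordered[0], ordered[k - 1]
    # Slide a window of k consecutive order statistics and keep the narrowest.
    for i in range(1, n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Toy posterior sample of node ages in Ma (hypothetical values).
toy_ages = [1.0, 5.0, 6.0, 7.0, 20.0, 30.0]
print(hpd_interval(toy_ages, mass=0.5))
```

Unlike an equal-tailed interval, the HPD is not forced to be symmetric around the median, which is why skewed age posteriors can give intervals such as 46–54 Ma around an estimate of c. 49 Ma.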
Phylogenetic relationships within Solanaceae
Although individual studies have contributed significantly to a better understanding of the systematics and evolution of the family at generic and tribal levels, our results bring together data from a large number of studies into a single analysis, and present a coherent view on the current systematic knowledge of this diverse family and its major clades. Our analyses support all of the major clades previously identified within Solanaceae , Solanum and the Leptostemonum clade of Solanum. All of these major clades within the family are now strongly supported, and furthermore, our results reveal strongly-supported relationships between the major clades of the mega-diverse genus Solanum, strengthening the backbone.
The increased resolution in the current phylogeny can be attributed to both the increased sampling of markers as well as species. In the quest for better resolved phylogenies, studies often seek large amounts of sequence data, but it is now well established that increased species sampling can have an equally positive effect on phylogenetic resolution and accuracy [66–68]. The addition of more species to a data set has the effect of splitting long branches and detecting multiple substitutions, as well as resolving phylogenetic conflict, improving parameter estimation, and making inferences less dependent on particular evolutionary models . In our approach we chose to maximise species sampling, while minimising missing data by choosing only the most densely sampled markers available. This approach generally boosted resolution without introducing any of the significant negative effects that large amounts of missing data can have on phylogeny estimation.
The study presented here is a significant step towards a fully sampled species-level phylogeny for Solanaceae. A previous study by Goldberg et al.  included 995 species but did not present a fully annotated molecular phylogeny that would allow an analysis of systematic relationships within the family. With > 1,000 species now covered, the current phylogeny includes 40% of known species and all genera of Solanaceae, except the monospecific and recently segregated Darcyanthus and Capsicophysalis. This is a substantial improvement on previous studies, and our current phylogeny is one of the best sampled family-level studies in angiosperms (e.g., [69–71]).
The sampling is now adequate to test for generic monophyly in previously poorly sampled groups. Although the number of genera is becoming stable with 97 currently recognised genera in Solanaceae (recent changes include those documented in refs. [26, 60, 72]), our analyses support previous results in identifying a set of groups where generic re-evaluation will be necessary, including Lycianthes/Capsicum, the genera in the Physalinae (especially Physalis) , Deprea/Larnax, the Iochrominae , and the Australian endemics in the Anthocercideae (see Additional file 2); many of these clusters of generic problems have been identified by previous authors.
Broader level relationships within Solanaceae and Solanum, as well as generic delimitations and problems identified in previous studies, are supported by our species-rich dataset. Relationships between some of the major clades remain unresolved, however, most notably those at the base of the family and within the Solanoideae, and the Leptostemonum clade of Solanum. Resolving these nodes will be a priority in order to better understand evolution of some particularly complex traits, such as chromosome evolution. For example, resolving the sister group to the X = 12 clade, as well as the first branching taxa within Solanaceae, would allow us to determine the ancestral base chromosome number in the family and to fully understand directionality of chromosome evolution. Despite the increased resolution introduced by the use of more sequence data and higher species-level sampling, our results do not show any improvement in the resolution of these critical nodes. More genes will be needed to resolve these relationships, but the question remains which genes should be used. Highly variable nuclear loci, such as COSII markers already used in Solanaceae [73, 74], and the PPR genes used in families within the related Asterid order Lamiales [75, 76], present the most promising candidates. The widely sequenced regions ITS, waxy, ndhF, and trnS-G are the most variable across the Solanaceae and species-level sampling using these regions should be increased. The traditionally used plastid marker trnT-F, which is relatively slowly evolving within Solanaceae, is known to include pseudogenes in Solanum and care should be taken when using this region in phylogenetic studies.
Solanaceae fossil record
A few fossils have been used in previous molecular dating studies of Solanaceae, but without re-evaluation of fossil morphology and hence their placement within the phylogeny [29, 46]. As revealed by our literature review, a relatively large record for the family exists. The most usable evidence comes from fossil Solanaceae seeds, the oldest of which are from Eocene Europe (c. 48–40 Ma), with a sharp increase in the number of seed morphotypes observed towards the Pleistocene. The fossil seeds can be divided into two sets: (1) seeds showing four morphological characters present in the extant members of the Solanoideae, and (2) seeds that bear resemblance to the family in general but cannot be assigned to more specific clades within it because they lack the unique combination of seed flattening and presence of sinuate-margined testa cells. Although some of these fossils have been described with names associated with extant species and/or genera, our morphological review shows that none of them shows unique morphological characters that can be used to place it in any extant genus. We consider the placement of these fossils on terminal nodes, as has been done by previous authors [29, 46], unjustified.
All of the fossils we were able to unambiguously identify as Solanaceae are from Eocene Europe, where none of the first branching lineages of the family occur. South America is the centre of diversity of extant Solanaceae, and all of the early diverging lineages are exclusively found in the New World. This suggests that the fossil record of the family is still far from complete, and that further studies on South American fossils might reveal crucial evidence with respect to the timing of diversification in Solanaceae. A promising avenue for future fossil studies would be to carefully evaluate wood fossil records, especially Cretaceous-Eocene material from the area in which the early-branching lineages all now occur [16, 22].
Dates for Solanaceae
Our study is the largest Bayesian molecular dating analysis executed to date in terms of taxon sampling. Most previous studies have used Bayesian dating methods after pruning their original, large phylogenetic datasets largely due to an a priori assumption that Bayesian methods cannot cope with datasets with >500 terminals e.g., [78–80]. Our study with 1,075 species and >10,000 bp of sequence data demonstrates that large matrices with >500 terminals can be analysed using Bayesian dating methods. Further studies are needed, however, to fully explore best methods for analysing large datasets with the currently available dating methods that implement relaxed molecular clock models required for analyses of diverse clades where rates are expected to vary [81, 82]. Such studies should focus on exploring trade-offs between number of taxa, complexity of models and partitions used in order to fully understand limitations and potential error sources in large scale analyses.
In our dating analysis, we followed the recent recommended best practice guidelines for fossil calibration  and placed fossil calibrations at stem nodes of the most inclusive extant groups using apomorphy-based morphological assignment. Morphological evidence from the seed fossils only allowed assignment to the broad groups Solanoideae or Solanaceae as a whole. Fossils provide only minimum age estimates for the nodes to which they are assigned, and hence results from our dating analysis where fossil calibrations were used should be considered as minimum age estimates. We further biased our results towards younger ages by assigning the oldest known fossils of Solanaceae to the stem node of the family rather than to more specific nodes within Solanaceae, due to the lack of morphological and anatomical characters that could be used to assign them to more specific nodes. There is always a possibility, however, that these seeds represent more specific clades within Solanaceae, which would push back age estimates for the family. Currently, the earliest fossil evidence for the family comes from Eocene Europe, but based on biogeographic analyses, the crown group of Solanaceae is thought to have originated and first diversified in South America [16, 22, 84]. Total evidence analysis, where fossils are placed as terminal taxa in the dating analysis using a combined molecular and morphological data matrix, could help in exploring the robustness of fossil placement , but as pointed out above, the lack of characters in the Solanaceae seed fossils does not currently permit such analyses. The most promising avenue for strengthening the dating analysis would be to find further fossil records (see Solanaceae fossil record above). This would increase the number of fossil calibration points and allow the use of cross-validation methods .
The rate of molecular evolution in plants has been found to correlate with life history traits, whereby longer living species show consistently lower substitution rates compared to shorter living species . Molecular clock models should incorporate such rate variation, especially in groups such as Solanaceae which include a range of growth and life forms. Our dating analyses did not incorporate such models, although the model used in our Bayesian analysis allows rates to vary between lineages independently. The lack of such models in our analyses implies that the age of herbaceous, shorter lived plants (e.g., Schizanthus and the Tomato clade of Solanum) will be systematically overestimated, while ages in dominantly woody clades (e.g. Solanum Clade II) will be consistently underestimated. Future studies should explore how molecular clock models that account for rate variation due to life history traits could be implemented.
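The idea of rates varying independently between lineages can be illustrated with a toy simulation. The sketch below draws each branch rate independently from a shared lognormal distribution; the choice of a lognormal is an assumption (the text only specifies an uncorrelated relaxed clock), and the function names, rate values, and branch durations are hypothetical. It is not a description of the BEAST implementation.

```python
import math
import random


def draw_branch_rates(n_branches: int, mean_rate: float, sigma: float,
                      rng: random.Random) -> list[float]:
    """Draw one independent rate per branch from a lognormal with the given mean."""
    if n_branches < 0 or mean_rate <= 0 or sigma < 0:
        raise ValueError("invalid parameters")
    # For a lognormal, mean = exp(mu + sigma^2 / 2), so solve for mu.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def expected_substitutions(durations_my: list[float],
                           rates: list[float]) -> list[float]:
    """Expected substitutions per site on each branch: rate x time."""
    if len(durations_my) != len(rates):
        raise ValueError("one rate per branch is required")
    return [t * r for t, r in zip(durations_my, rates)]


rng = random.Random(1)                         # fixed seed for repeatability
durations = [8.0, 8.0, 6.0, 14.0]              # branch durations in My (toy values)
rates = draw_branch_rates(len(durations), mean_rate=0.001, sigma=0.5, rng=rng)
print(expected_substitutions(durations, rates))
```

Setting `sigma` to zero recovers a strict clock in which every branch shares the same rate. Because the draws are independent of any trait, a model of this kind does not by itself encode the life-history effect discussed above.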
Previous studies have produced a wide range of estimates for the stem age of the family, ranging from 34 to 85 Ma [48, 87–89], but none of these studies included dense sampling within the family or used robust Solanaceae-specific fossil calibrations. Paape et al.  analysed divergence times within Solanaceae but with a small dataset consisting of only 29 species. This study was based on three fossil calibration points without re-assessment or morphological study of the original fossils, and estimated the Solanaceae stem node to have diverged 62 Ma (95% HPD 54–70 Ma) . The oldest estimates for the family stem node age come from earlier molecular studies which used calibration points with more simplistic dating methods (65–85 Ma) [87, 88], while the most recent molecular dating study of angiosperms by Bell et al. , who used 36 fossil calibrations across the tree and a relaxed molecular clock model, estimated the Solanaceae stem node to have diverged c. 59 Ma (95% HPD 49–68 Ma). Our results, which we consider as minimum ages, are broadly consistent with Bell et al.  in estimating the stem node of Solanaceae to date back to c. 49 Ma (95% HPD: 46–54).
The age of the major splits within the family has been of interest to various fields, including studies on chromosomal  and genome evolution [5, 6, 91]. Our minimum age estimates for the major splits between tomato – potato (c. 8 Ma, 95% HPD 7–10), eggplant – tomato/potato (c. 14 Ma, 95% HPD 13–16), Solanum – Capsicum (c. 19 Ma, 95% HPD 17–21), and Solanum – Nicotiana (c. 24 Ma, 95% HPD 23–26) are consistent with the age estimates produced in previous studies without fossil calibrations using much sparser sampling and more simplistic molecular clock models [4, 6, 91]. Our results for the Nicotiana – Symonanthus split (c. 15 Ma, 95% HPD 11–20) corroborate results obtained using island age (c. 15 Ma)  and those calculated using paralogy-free subtree analysis (>15 Ma for section Suaveolentes) . Our results presented here suggest that the rate of chromosomal and genome evolution within Solanaceae has been marginally slower at least within particular lineages than previously thought. With the densely sampled chronogram presented in this study, a more detailed analysis of chromosomal evolution at the species level could now be performed in the Solanaceae to study rate differences and drivers of chromosomal changes such as environmental or life history factors. Similarly, morphological characters such as fruit type  could be analysed in relation to diversification rates to identify whether particular morphological traits are associated with speciation rate shifts in Solanaceae.
Despite much focus on character and trait evolution within Solanaceae, little has been known about the origin of traits in the family in terms of time. We present here minimum age estimates and associated confidence intervals for the entire Solanaceae using a species-rich dataset comprising almost half of the species diversity within the family. This densely sampled chronogram will provide the basis for unravelling the tempo and mode of evolution of many of the much-studied and complex traits in this diverse and economically important family such as self-incompatibility, fruit type, cold and salt tolerance, disease resistance, chromosomal re-arrangements, genome size, and gene sub-functionalization.
References to fossil records were compiled from various sources, including Yale Paleobotany Online Catalog (http://peabody.yale.edu/collections/paleobotany), the Paleobiology Database (http://paleodb.org), InsideWood Database (http://insidewood.lib.ncsu.edu), Burke Paleontology Collection Database (http://www.washington.edu/burkemuseum/collections/paleontology), the Stratigraphy Database (http://www.stratigraphy.net), Fossil Record 2, and Google searches on the terms "Solanaceae" and "fossil". The morphology of two fossil specimens was analysed using high-resolution X-ray computed tomography (Table 1) [T. Särkinen, M. Collinson, P. Kenrick, F. Ahmed, unpublished observations]. The morphology of other specimens was evaluated using descriptions and illustrations provided in original publications. The numeric ages for fossils were derived by matching the specific strata in which fossils were found with the most recent geochronological stratigraphy found in the literature (see Additional file 1). The oldest fossil specimens assigned to the Solanaceae and the Solanoideae stem nodes were then used as calibration points (see below). The younger age brackets of these oldest specimens were used following best practice guidelines.
Supermatrix construction and analysis
Our supermatrix data harvesting and construction largely followed the modified supermatrix method termed 'mega-phylogeny', designed for larger datasets by Smith et al. In the mega-phylogeny method, maximally dense supermatrices are built based on BLAST searches of all GenBank sequences limited to the taxonomic rank of interest. This differs from the traditional supermatrix approach, where no threshold for missing data or taxa is set and the resulting sparser matrices are built using clustering techniques.
We looked for all orthologous sequence data available in GenBank release 184 using the PhyLoTA Browser. PhyLoTA identifies available sequence clusters based on BLAST searches, where all sequences for the specified taxonomic group are blasted against each other. We explored all phylogenetically informative sequence clusters identified by PhyLoTA for Solanaceae, and chose the seven clusters that had the highest taxon sampling both in terms of genera and species. These seven clusters included data from two nuclear (waxy and ITS) and five plastid regions (matK, ndhF, trnS-G, trnL-F, psbA-trnH) (Table 2). Gaps in generic sampling were identified and sequences for three previously unsampled genera, Trianaea, Heteranthia, and Archihyoscyamus, were generated for ndhF, trnL-F, and ITS (Additional file 4). Further sequences were generated for poorly sampled genera (Reyesia, Benthamiella, Deprea, and particular clades of Solanum) (Additional file 4). The new sequences were joined with the clusters downloaded from PhyLoTA. Each region was aligned using the profile alignment algorithms MUSCLE and MAFFT [98, 99], after which all datasets were manually checked and adjusted to ensure high-quality alignments. Based on visual comparisons, MAFFT produced better alignments than MUSCLE for the most complex regions (ITS and waxy). Short multirepeats and ambiguously alignable regions were excluded. For trnL-F, a variable repeat region towards the 5' end of the intergenic spacer was removed; this is where putative pseudogenic copies of trnF have been found in Solanum. Taxon names were checked for synonymy in all matrices. Duplicate sequences for species were pruned out. Montinia (Montiniaceae), Convolvulus and Ipomoea (Convolvulaceae) were added as outgroups representing two of the closely related families of Solanaceae within the order Solanales. Gene regions were analysed individually using MrBayes v. 3.1.2 [101, 102] via the Oslo Bioportal in order to visually check for topological incongruence, rogue taxa, and the presence of potentially misidentified sequences.
Ten potentially misidentified sequences were detected in the individual analyses and removed prior to supermatrix construction (Additional file 5). No hard incongruences were detected between the individual matrices with respect to the major clades of the Solanaceae. Incongruence was not tested at shallower taxonomic levels due to methodological constraints, and hence the individual studies cited in the Background section should be referred to for phylogenetic relationships within genera or major clades in Solanum. The software AIR-Appender as implemented in the Oslo BioPortal was used to concatenate the individual matrices. We measured missing data in two ways: per gene region and per species. For each species we calculated two measures, missing data and missing information. Missing data was measured as the absolute number of missing base pairs, while missing information was measured as the sum of the parsimony informative characters of missing regions. All species with > 90% missing data and/or information were removed prior to analysis.
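The two per-species measures and the 90% cut-off described above can be illustrated with a minimal sketch. The region lengths, counts of parsimony informative characters, and function names below are hypothetical values chosen for illustration, not figures from this study, and the sketch assumes a region is either fully present or fully absent for a species.

```python
# Hypothetical aligned lengths (bp) and parsimony-informative character
# counts per region; these numbers are illustrative assumptions only.
REGION_LENGTH = {"ITS": 600, "waxy": 1800, "matK": 1500, "ndhF": 2100, "trnL-F": 1000}
REGION_INFORMATIVE = {"ITS": 350, "waxy": 500, "matK": 300, "ndhF": 450, "trnL-F": 150}


def missing_fractions(sampled_regions):
    """Return (missing data, missing information) as fractions of the totals.

    Missing data counts the base pairs of regions not sampled for a species;
    missing information sums the parsimony-informative characters of those
    same missing regions.
    """
    missing = [r for r in REGION_LENGTH if r not in sampled_regions]
    missing_bp = sum(REGION_LENGTH[r] for r in missing)
    missing_info = sum(REGION_INFORMATIVE[r] for r in missing)
    return (missing_bp / sum(REGION_LENGTH.values()),
            missing_info / sum(REGION_INFORMATIVE.values()))


def keep_species(sampled_regions, threshold=0.90):
    """Keep a species only if neither measure exceeds the threshold."""
    data, info = missing_fractions(sampled_regions)
    return data <= threshold and info <= threshold


if __name__ == "__main__":
    for name, regions in [("species_a", {"ITS"}), ("species_b", {"ndhF"})]:
        print(name, missing_fractions(regions), keep_species(regions))
```

Under these toy numbers a species sampled only for ITS exceeds the threshold on missing data alone, which is enough for removal under the "and/or" rule.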
Before analysis, the matrix was cleaned by pruning rogue taxa, identified as unstable terminals causing artificial lowering of branch support, using the software RogueNaRok. RogueNaRok analyses were based on trees derived from fast RAxML bootstrap analyses, using a 50% majority-rule consensus threshold and support values for optimization, with the maximum dropset size set to one. Four iterations were run and rogue taxa were removed after each iteration. Rapid bootstrap analyses were run in RAxML-VI-HPC v2.0.1 [105, 106] via the CIPRES Science Gateway, applying partitioning for each gene region using the GTR + CAT approximation of the substitution model and the rapid bootstrap algorithm with 100 replicates. We removed a total of 85 rogue taxa, some of which had a large amount of missing data and/or information (60–90%), while others had nearly complete sampling. The final matrix included 10,672 bp of aligned sequence data, of which 1,902 bp were excluded due to ambiguous alignment (Additional files 6, 7, 8). The matrix included a total of 1,075 Solanaceae species and a single outgroup (Ipomoea, Convolvulaceae). We minimized outgroup sampling in order to simplify the BEAST analysis, as the number of outgroups significantly affected run time. The final supermatrix was analysed using RAxML-VI-HPC v2.0.1 via the CIPRES Science Gateway, applying partitioning for each gene region using the GTR + CAT approximation of the substitution model and the rapid bootstrap algorithm with 1,000 replicates. The resulting trees were used either as input trees or as starting topologies for dating analyses.
Molecular dating analyses
The Bayesian uncorrelated relaxed clock model as implemented in BEAST [108, 109] was used as the primary dating method because it allows for rate variation across branches and measures for rate autocorrelation between lineages. Topology and node ages are estimated simultaneously in BEAST, hence topological uncertainty is incorporated into node age estimation. The best tree from the RAxML search was used as a starting topology (Additional file 9). Each region was partitioned separately and given its own substitution model (GTR + G) and rate. A Birth-Death tree prior was used, which accounts for both speciation and extinction. The Solanoideae seed fossils were used to constrain the stem node of Solanoideae with a lognormal prior with an offset of 23.0 Ma, mean of 0.01, and standard deviation (SD) of 1.0. The age constraint reflects the youngest age bracket of the oldest known fossil seed assignable to the Solanoideae. Similarly, the Solanaceae stem node was constrained with a lognormal prior with an offset of 46.0 Ma, mean of 0.01, and SD of 1.0, based on the youngest age estimate of the oldest fossil specimen of Solanaceae-type seeds. Priors for the relaxed clock model mean rate and standard deviation were set to 1.0 and 0.3, respectively, based on known substitution rates in plants. The parameter weights of the delta exchange operator were modified to reflect the length of each partition. Default priors were used for all other parameters. A total of 100 million generations (10 runs with c. 10 million generations each) were run in BEAST v.1.7.4. Results were combined using LogCombiner and TreeAnnotator (BEAST package).
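To give a feel for what an offset lognormal calibration implies, the following sketch computes quantiles of such a prior for the two offsets given above. It is an illustration only: it assumes the stated mean (0.01) and SD (1.0) are parameters of the underlying normal distribution in log space, which the text does not specify, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def offset_lognormal_quantile(p, offset, log_mean, log_sd):
    """Quantile of offset + LogNormal(log_mean, log_sd).

    Assumption: log_mean and log_sd parameterise the normal distribution
    of log(age - offset); the source does not state which space is used.
    """
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return offset + math.exp(log_mean + log_sd * z)


if __name__ == "__main__":
    calibrations = [("Solanoideae stem", 23.0), ("Solanaceae stem", 46.0)]
    for name, offset in calibrations:
        low, median, high = (
            offset_lognormal_quantile(p, offset, 0.01, 1.0)
            for p in (0.025, 0.5, 0.975)
        )
        print(f"{name}: 2.5% = {low:.2f} Ma, median = {median:.2f} Ma, "
              f"97.5% = {high:.2f} Ma")
```

Because the offset acts as a hard lower bound, every quantile lies above the fossil age, which matches the treatment of the fossils as minimum age constraints.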
A second dating analysis was run using PATHd8. PATHd8 is a local rate smoothing method that estimates node ages by calculating mean path lengths from the node to the tips. Deviations from a strict molecular clock are corrected as suggested by the calibrated nodes. Only simple calibrations are allowed, as point estimates of minimum, maximum or mean ages. Because substitution rates are smoothed locally, rather than simultaneously over the whole tree, PATHd8 allows analysis of very large trees. The best tree from the RAxML search was used as the input phylogeny for the PATHd8 analysis (Additional file 10). The stem node of Solanoideae was constrained with the identified Solanoideae seed fossils with a minimum age of 23.0 Ma. PATHd8 requires a minimum of one fixed node constraint, and hence the stem node of the family was constrained with a fixed age of 46.0 Ma. Results from both the Maximum Likelihood and Bayesian dating analyses have been deposited in TreeBase (http://purl.org/phylo/treebase/phylows/study/TB2:S14458).
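The mean path length idea that underlies this approach can be sketched on a toy tree. The example below is a deliberately simplified illustration with a single fixed calibration; it is not the PATHd8 program's algorithm, it omits the handling of minimum and maximum constraints, and the tree, branch lengths, and function names are hypothetical.

```python
# Toy rooted tree: internal node -> list of (child, branch length in
# substitutions per site). Tips are names that do not appear as keys.
TREE = {
    "root": [("A", 0.10), ("n1", 0.04)],
    "n1": [("B", 0.05), ("C", 0.07)],
}


def tip_path_lengths(node):
    """Return the path lengths from a node to each of its descendant tips."""
    children = TREE.get(node)
    if not children:  # a tip: zero distance to itself
        return [0.0]
    paths = []
    for child, length in children:
        paths.extend(length + p for p in tip_path_lengths(child))
    return paths


def mean_path_length(node):
    """Average of all node-to-tip path lengths below a node."""
    paths = tip_path_lengths(node)
    return sum(paths) / len(paths)


def node_ages(fixed_node, fixed_age):
    """Scale mean path lengths to ages using one fixed-age calibration node."""
    rate = mean_path_length(fixed_node) / fixed_age  # substitutions/site/Myr
    return {node: mean_path_length(node) / rate for node in TREE}


if __name__ == "__main__":
    # Fix the root at 46.0 Ma, echoing the fixed family stem constraint.
    for node, age in node_ages("root", 46.0).items():
        print(f"{node}: {age:.1f} Ma")
```

A single global rate is used here for clarity; the point of local smoothing is precisely that rates are allowed to differ between parts of the tree as the calibrations require.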
The Potato Genome Sequencing Consortium: Genome sequence and analysis of the tuber crop potato. Nature. 2011, 475: 189-195. 10.1038/nature10158.
The Tomato Genome Consortium: The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012, 485: 635-641. 10.1038/nature11119.
Bombarely A, Rosli HG, Vrebalov J, Moffett P, Mueller LA, Martin GB: A draft genome sequence of Nicotiana benthamiana to enhance molecular plant microbe biology research. Mol Plant Microbe Interact. 2012, 25: 1523-1530. 10.1094/MPMI-06-12-0148-TA.
Wu F, Tanksley SD: Chromosomal evolution in the plant family Solanaceae. BMC Genomics. 2010, 11: 182-10.1186/1471-2164-11-182.
Doganlar S, Frary A, Daunay M-C, Lester RN, Tanksley SD: A comparative genetic linkage map of eggplant (Solanum melongena) and its implications for genome evolution in the Solanaceae. Genetics. 2002, 161: 1697-1711.
Wang Y, Diehl A, Wu F, Vrebalov J, Giovannoni J, Siepel A, Tanksley SD: Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics. 2008, 180: 391-408. 10.1534/genetics.108.087981.
Robertson KA, Goldberg EE, Igic B: Comparative evidence for the correlated evolution of polyploidy and self-compatibility in Solanaceae. Evolution. 2011, 65: 139-155. 10.1111/j.1558-5646.2010.01099.x.
Goldberg EE, Kohn JR, Lande R, Robertson KA, Smith SA, Igic B: Species selection maintains self-incompatibility. Science. 2010, 330: 459-460. 10.1126/science.1198063.
Igic B, Bohs L, Kohn JR: Ancient polymorphism reveals unidirectional breeding system shifts. Proc Natl Acad Sci USA. 2006, 103: 1359-1363. 10.1073/pnas.0506283103.
Knapp S: Tobacco and tomatoes: a phylogenetic perspective on fruit diversity in the Solanaceae. J Exp Bot. 2002, 53: 2001-2022. 10.1093/jxb/erf068.
Knapp S: On 'various contrivances’: pollination, phylogeny and flower form in the Solanaceae. Phil Trans R Soc B. 2010, 365: 449-460. 10.1098/rstb.2009.0236.
Rodriguez GR, Munos S, Anderson C, Sim S-C, Michel A, Causse M, McSpadden BB, Francis D, van der Knaap E: Distribution of SUN, OVATE, LC, and FAS in the tomato germplasm and the relationship to fruit shape diversity. Plant Physiol. 2011, 156: 275-285. 10.1104/pp.110.167577.
Van der Knaap E, Tanksley SD: The making of a bell pepper-shaped tomato fruit: identification of loci controlling fruit morphology in Yellow Stuffer tomato. Theor Appl Genet. 2003, 107: 139-147.
Lippman ZB, Cohen O, Alvarez JP, Abu-Abied M, Pekker I, Paran I, Eshed Y, Zamir D: The making of a compound inflorescence in tomato and related nightshades. PLoS Biol. 2008, 6: e288.
Chitwood DH, Headland LR, Filiault DL, Kumar R, Jimenez-Gomez JM, Schrager AV, Park DS, Peng J, Sinha NR, Maloof JN: Native environment modulates leaf size and response to simulated foliar shade across wild tomato species. PLoS One. 2012, 7: e29570-10.1371/journal.pone.0029570.
Olmstead RG: Phylogeny and biogeography in Solanaceae, Verbenaceae and Bignoniaceae: a comparison of continental and intercontinental diversification patterns. Bot J Linn Soc. 2013, 171: 80-102. 10.1111/j.1095-8339.2012.01306.x.
Chitwood DH, Headland LR, Ranjan A, Martinez CC, Braybrook SA, Koenig DP, Kuhlemeier C, Smith RS, Sinha NR: Leaf asymmetry as a developmental constraint imposed by auxin-dependent phyllotactic patterning. Plant Cell. 2012, 24: 2318-2327. 10.1105/tpc.112.098798.
Kallenbach M, Bonaventure G, Gilardoni PA, Wissgott A, Baldwin IT: Empoasca leafhoppers attack wild tobacco plants in a jasmonate-dependent manner and identify jasmonate mutants in natural populations. Proc Natl Acad Sci U S A. 2012, 109: E1548-E1557. 10.1073/pnas.1200363109.
Weinhold A, Baldwin IT: Trichome-derived O-acyl sugars are a first meal for caterpillars that tags them for predation. Proc Natl Acad Sci USA. 2011, 108: 7855-7859. 10.1073/pnas.1101306108.
Schmidt DD, Kessler A, Kessler D, Schmidt S, Lim M, Gase K, Baldwin IT: Solanum nigrum: A model ecological expression system and its tools. Mol Ecol. 2004, 13: 981-995. 10.1111/j.1365-294X.2004.02111.x.
Olmstead RG, Bohs L: A summary of molecular systematic research in Solanaceae: 1982–2006. 6th International Solanaceae Conference. Edited by: Spooner DB, Bohs L, Giovannoni J, Olmstead RG, Shibata D. 2007, Acta Horticulturae 745, 255-268.
Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM: A molecular phylogeny of the Solanaceae. Taxon. 2008, 57: 1159-1181.
Garcia VF, Olmstead RG: Phylogenetics of Tribe Anthocercideae (Solanaceae) based on ndhF and trnL/F sequence data. Syst Bot. 2003, 28: 609-615.
Clarkson JJ, Knapp S, Aoki S, Garcia VG, Olmstead RG, Chase MW: Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions. Mol Phylogenet Evol. 2004, 33: 75-90. 10.1016/j.ympev.2004.05.002.
Yuan Y-W, Zhang Z-Y, Chen ZD, Olmstead RG: Tracking ancient polyploids: A retroposon reveals an extinct diploid ancestor in the polyploid ancestry of Belladonna. Mol Biol Evol. 2006, 23: 2263-2267. 10.1093/molbev/msl099.
Levin RA, Bernardello G, Whiting C, Miller JS: A new generic circumscription in tribe Lycieae (Solanaceae). Taxon. 2011, 60: 681-690.
Chen S, Matsubara K, Omori T, Kokubun H, Kodama H, Watanabe H, Hashimoto G, Marchesi E, Bullrich L, Ando T: Phylogenetic analysis of the genus Petunia (Solanaceae) based on the sequence of the Hf1 gene. J Plant Res. 2007, 120: 385-397. 10.1007/s10265-006-0070-z.
Miller RJ, Mione T, Phan H-L, Olmstead RG: Color by numbers: Nuclear gene phylogeny of Jaltomata (Solanaceae), sister genus to Solanum, supports three clades differing in fruit color. Syst Bot. 2011, 36: 153-162. 10.1600/036364411X553243.
Tu T, Dillon MO, Sun H, Wen J: Phylogeny of Nolana (Solanaceae) of the Atacama and Peruvian deserts inferred from sequences of four plastid markers and the nuclear LEAFY second intron. Mol Phylogenet Evol. 2008, 49: 561-573. 10.1016/j.ympev.2008.07.018.
Tate JA, Acosta MC, McDill J, Moscone EA, Simpson BB, Cocucci AA: Phylogeny and character evolution in Nierembergia (Solanaceae): Molecular, morphological, and cytogenetic evidence. Syst Bot. 2009, 34: 198-206. 10.1600/036364409787602249.
Filipowicz N, Renner SS: Brunfelsia (Solanaceae): A genus evenly divided between South America and radiations on Cuba and other Antillean islands. Mol Phylogenet Evol. 2012, 64: 1-11. 10.1016/j.ympev.2012.02.026.
Fregonezi JN, Freitas LB, Bonatto SL, Semir J, Stehmann JR: Infrageneric classification of Calibrachoa (Solanaceae) based on morphological and molecular evidence. Taxon. 2012, 61: 120-130.
Fregonezi JN, Turchetto C, Bonatto SL, Freitas LB: Biogeographical history and diversification of Petunia and Calibrachoa (Solanaceae) in the Neotropical pampas grassland. Bot J Linn Soc. 2013, 171: 140-153.
Poczai P, Hyvönen J, Symon DE: Phylogeny of kangaroo apples (Solanum subg. Archaesolanum, Solanaceae). Mol Biol Rep. 2011, 38: 5243-5259. 10.1007/s11033-011-0675-8.
Bohs L: Phylogeny of the Cyphomandra clade of the genus Solanum (Solanaceae) based on ITS sequence data. Taxon. 2007, 56: 1012-1026. 10.2307/25065901.
Stern S, Bohs L: An explosive innovation: Phylogenetic relationships of Solanum section Gonatotrichum (Solanaceae). PhytoKeys. 2012, 8: 83-98. 10.3897/phytokeys.8.2199.
Fajardo D, Spooner DM: Phylogenetic relationships of Solanum series Conicibaccata and related species in Solanum section Petota inferred from five conserved ortholog sequences. Syst Bot. 2011, 36: 163-170. 10.1600/036364411X553252.
Ames M, Spooner DM: Phylogeny of Solanum series Piurana and related species in Solanum section Petota based on five conserved ortholog sequences. Taxon. 2010, 59: 1091-1101.
Rodriguez F, Spooner DM: Nitrate reductase phylogeny of potato (Solanum sect. Petota) genomes with emphasis on the origins of the polyploid species. Syst Bot. 2009, 34: 207-219. 10.1600/036364409787602195.
Tepe E, Farruggia FT, Bohs L: A 10-gene phylogeny of Solanum section Herpystichum (Solanaceae) and a comparison of phylogenetic methods. Am J Bot. 2011, 98: 1356-1365. 10.3732/ajb.1000516.
Bohs L, Olmstead RG: A reassessment of Normania and Triguera (Solanaceae). Plant Systemat Evol. 2001, 228: 33-48. 10.1007/s006060170035.
Spooner DM, Anderson GJ, Jansen RK: Chloroplast DNA evidence for the interrelationships of tomatoes, potatoes, and pepinos (Solanaceae). Am J Bot. 1993, 80: 676-688. 10.2307/2445438.
Weese TL, Bohs L: A three-gene phylogeny of the genus Solanum (Solanaceae). Syst Bot. 2007, 32: 445-463. 10.1600/036364407781179671.
Quental TB, Marshall CR: Diversity dynamics: molecular phylogenies need the fossil record. Trends Ecol Evol. 2010, 25: 434-441.
Morlon H, Parsons TD, Plotkin JB: Reconciling molecular phylogenies with the fossil record. Proc Natl Acad Sci U S A. 2011, 108: 16327-16332. 10.1073/pnas.1102543108.
Dillon MO, Tu T, Xie L, Quipuscoa Silvestre V, Wen J: Biogeographic diversification in Nolana (Solanaceae), a ubiquitous member of the Atacama and Peruvian Deserts along the western coast of South America. J Syst Evol. 2009, 47: 457-476. 10.1111/j.1759-6831.2009.00040.x.
Tu T, Volis S, Dillon MO, Sun H, Wen J: Dispersals of Hyoscyameae and Mandragoreae (Solanaceae) from the New World to Eurasia in the early Miocene and their biogeographic diversification within Eurasia. Mol Phylogenet Evol. 2010, 57: 1226-1237. 10.1016/j.ympev.2010.09.007.
Martínez-Millán M: Fossil record and age of the Asteridae. Bot Rev. 2010, 76: 83-135. 10.1007/s12229-010-9040-1.
Franco MJ, Brea M: Leños fósiles de la formación Paraná (Mioceno Medio), Toma Vieja, Paraná, Entre Ríos, Argentina: registro de bosques estacionales mixtos. Ameghiniana. 2008, 45: 699-717.
Page VM: Dicotyledonous wood from the Upper Cretaceous of central California II. J Arnold Arbor. 1980, 61: 723-748.
Erdtman G: Pollen morphology and plant taxonomy – Angiosperms. 1952, Stockholm: Almqvist & Wiksell
Bernardello L, Lujan MC: Pollen morphology of tribe Lycieae: Grabowskia, Lycium, Phrodus (Solanaceae). Rev Palaeobot Palynol. 1997, 96: 305-315. 10.1016/S0034-6667(96)00057-7.
Fukuda T, Naiki A, Nagamasu H: Pollen morphology of the genus Skimmia (Rutaceae) and its taxonomic implications. J Plant Res. 2008, 121: 463-471. 10.1007/s10265-008-0174-8.
Hunziker AT: Genera Solanacearum: The genera of Solanaceae illustrated, arranged according to a new system. 2001, Ruggell, Liechtenstein: Gantner
Chandler MEJ: The Lower Tertiary Floras of Southern England II Flora of the Pipe- Clay Series of Dorset (Lower Bagshot). 1962, London: British Museum (Natural History)
Chandler MEJ: The Oligocene Flora of the Bovey Tracey Lake Basin, Devonshire. Bull Brit Mus (Nat Hist) Geol. 1957, 3: 71-123.
Reid C, Chandler MEJ: The London clay flora. 1933, London: British Museum (Natural History)
Smith S, Baum DA: Phylogenetics of the florally diverse Andean clade Iochrominae (Solanaceae). Am J Bot. 2006, 93: 1140-1153. 10.3732/ajb.93.8.1140.
Whitson M, Manos PS: Untangling Physalis (Solanaceae) from the Physaloids: A two-gene phylogeny of the Physalinae. Syst Bot. 2005, 30: 216-230. 10.1600/0363644053661841.
Averett JE: Schraderanthus, a new genus of Solanaceae. Phytologia. 2009, 91: 54-61.
Knapp S: A revision of the Dulcamaroid clade of Solanum L. (Solanaceae). PhytoKeys. 2013, 22: 1-428. 10.3897/phytokeys.22.4041.
Bennett JR: Revision of Solanum section Regmandra (Solanaceae). Edinb J Bot. 2008, 65: 69-112.
Peralta IE, Spooner DM, Knapp S: Taxonomy of wild tomatoes and their relatives (Solanum sect. Lycopersicoides, sect. Juglandifolia, sect. Lycopersicon; Solanaceae). Syst Bot Monogr. 2008, 84: 1-186.
Stern S, de Fatima Agra M, Bohs L: Molecular delimitation of clades within New World species of "spiny solanums" (Solanum subg. Leptostemonum). Taxon. 2011, 60: 1429-1441.
dos Reis M, Inoue J, Hasegawa M, Asher RJ, Donoghue PCJ, Yang Z: Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc R Soc B. 2012, 279: 3491-3500. 10.1098/rspb.2012.0683.
Philippe H, Brinkmann H, Lavrov D, Littlewood DTJ, Manuel M, Wörheide G, Baurain D: Resolving difficult phylogenetic questions: Why more sequences are not enough. PLoS Biol. 2011, 9: e1000602-10.1371/journal.pbio.1000602.
Driskell A, Ane C, Burleigh JG, McMahon MM, O’Meara BC, Sanderson MJ: Prospects for building the Tree of Life from large sequence databases. Science. 2004, 306: 1172-1174. 10.1126/science.1102036.
Heath TA, Hedtke SM, Hillis DM: Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol. 2008, 46: 239-257.
The Legume Phylogeny Working Group: Legume phylogeny and classification in the 21st century: Progress, prospects and lessons for other species-rich clades. Taxon. 2013, 62: 217-248.
Hinchliff CE, Roalson EH: Using supermatrices for phylogenetic inquiry: an example using the sedges. Syst Biol. 2013, 62: 205-219. 10.1093/sysbio/sys088.
Baker WJ, Savolainen V, Asmussen-Lange CB, Chase MW, Dransfield J, Forest F, Harley MM, Uhl NW, Wilkinson M: Complete generic-level phylogenetic analyses of palms (Arecaceae) with comparisons of supertree and supermatrix approaches. Syst Biol. 2009, 58: 240-256. 10.1093/sysbio/syp021.
Averett JE, Martínez M: Capsicophysalis: a new genus of Solanaceae (Physaleae) from Mexico and Central America. J Bot Res Inst Texas. 2009, 3: 71-75.
Levin RA, Whelan A, Miller JS: The utility of nuclear conserved ortholog set II (COSII) genomic regions for species-level phylogenetic inference in Lycium (Solanaceae). Mol Phylogenet Evol. 2009, 53: 881-890. 10.1016/j.ympev.2009.08.016.
Tepe EJ, Bohs L: A molecular phylogeny of Solanum sect. Pteroidea (Solanaceae) and the utility of COSII markers in resolving relationships among closely related species. Taxon. 2010, 59: 733-743.
Yuan Y-W, Liu C, Marx HE, Olmstead RG: An empirical demonstration of using pentatricopeptide repeat (PPR) genes as plant phylogenetic tools: Phylogeny of Verbenaceae and the Verbena complex. Mol Phylogenet Evol. 2010, 54: 23-35. 10.1016/j.ympev.2009.08.029.
Yuan Y-W, Liu C, Marx HE, Olmstead RG: The pentatricopeptide repeat (PPR) gene family, a tremendous resource for plant phylogenetic studies. New Phytol. 2009, 182: 272-283. 10.1111/j.1469-8137.2008.02739.x.
Poczai P, Hyvönen J: Identification and characterization of plastid trnF(GAA) pseudogenes in four species of Solanum (Solanaceae). Biotechnol Lett. 2011, 33: 2317-2323. 10.1007/s10529-011-0701-x.
Couvreur TLP, Forest F, Baker WJ: Origin and global diversification patterns of tropical rain forests: inferences from a complete genus-level phylogeny of palms. BMC Biol. 2011, 9: 44-10.1186/1741-7007-9-44.
Bininda-Emonds ORP, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A: The delayed rise of present-day mammals. Nature. 2007, 446: 507-512. 10.1038/nature05634.
Hunt T, Bergsten J, Levkanicova Z, Papadopoulou A, St John O, et al: A comprehensive phylogeny of beetles reveals the evolutionary origins of a superradiation. Science. 2007, 318: 1913-1916. 10.1126/science.1146954.
Smith SA, Donoghue MJ: Rates of molecular evolution are linked to life history in flowering plants. Science. 2008, 322: 86-89. 10.1126/science.1163197.
Andreasen K, Baldwin BG: Unequal evolutionary rates between annual and perennial lineages of checker mallows (Sidalcea, Malvaceae): evidence from 18S-26S rDNA internal and external transcribed spacers. Mol Biol Evol. 2001, 18: 936-944. 10.1093/oxfordjournals.molbev.a003894.
Parham JF, Donoghue PCJ, Bell CJ, Calway TD, Head JJ, et al: Best practices for justifying fossil calibrations. Syst Biol. 2012, 61: 346-359. 10.1093/sysbio/syr107.
Olmstead RG, Palmer JD: A chloroplast DNA phylogeny of the Solanaceae: subfamilial relationships and character evolution. Ann Mo Bot Gard. 1992, 79: 346-360. 10.2307/2399773.
Ronquist F, Klopfstein S, Vilhelmsen L, Schulmeister S, Murray D, et al: A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Syst Biol. 2012, 61: 973-999. 10.1093/sysbio/sys058.
Near TJ, Bolnick DI, Wainwright PC: Fossil calibrations and molecular divergence time estimates in centrarchid fishes (Teleostei: Centrarchidae). Evolution. 2005, 59: 1768-1782.
Wikström N, Savolainen V, Chase MW: Evolution of the angiosperms: calibrating the family tree. Proc R Soc Lond B. 2001, 268: 2211-2220. 10.1098/rspb.2001.1782.
Bremer K, Friis EM, Bremer B: Molecular phylogenetic dating of Asterid flowering plants shows early Cretaceous diversification. Syst Biol. 2004, 53: 496-505. 10.1080/10635150490445913.
Bell CD, Soltis DE, Soltis PS: The age and diversification of the angiosperms re-revisited. Am J Bot. 2010, 97: 1296-1303. 10.3732/ajb.0900346.
Paape T, Igic B, Smith SD, Olmstead R, Bohs L, et al: A 15-Myr-old genetic bottleneck. Mol Biol Evol. 2008, 25: 655-663. 10.1093/molbev/msn016.
Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004, 16: 1667-1678. 10.1105/tpc.021345.
Clarkson JJ, Lim KY, Kovarik A, Chase MW, Knapp S, Leitch AR: Long-term diploidization in allopolyploid Nicotiana section Repandae. New Phytol. 2005, 168: 241-252.
Ladiges PY, Marks CE, Nelson G: Biogeography of Nicotiana section Suaveolentes (Solanaceae) reveals geographical tracks in arid Australia. J Biogeogr. 2011, 38: 2066-2077. 10.1111/j.1365-2699.2011.02554.x.
Benton MJ: The fossil record 2. 1993, London: Chapman & Hall
Smith SA, Beaulieu JM, Donoghue MJ: Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches. BMC Evol Biol. 2009, 9: 37-10.1186/1471-2148-9-37.
Sanderson MJ, Boss D, Chen D, Cranston KA, Wehe A: The PhyLoTA Browser: processing GenBank for molecular phylogenetics research. Syst Biol. 2008, 57: 335-346. 10.1080/10635150802158688.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33: 511-518. 10.1093/nar/gki198.
Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008, 9: 286-298. 10.1093/bib/bbn013.
The Angiosperm Phylogeny Group: An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc. 2009, 161: 105-121.
Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogeny. Bioinformatics. 2001, 17: 754-755.
Ronquist F, Huelsenbeck JP: MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574.
Kumar S, Skjæveland Å, Orr R, Enger P, Ruden T, Mevik B-H, Burki F, Botnen A, Shalchian-Tabrizi K: AIR: A batch-oriented web program package for construction of supermatrices ready for phylogenomic analyses. BMC Bioinforma. 2009, 10: 357-10.1186/1471-2105-10-357.
Aberer AJ, Krompass D, Stamatakis A: Pruning rogue taxa improves phylogenetic accuracy: An efficient algorithm and webservice. Syst Biol. 2013, 62: 162-166. 10.1093/sysbio/sys078.
Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22: 2688-2690. 10.1093/bioinformatics/btl446.
Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for the RAxML Web servers. Syst Biol. 2008, 57: 758-771. 10.1080/10635150802429642.
Miller MA, Pfeiffer W, Schwartz T: Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010, New Orleans, LA: Proceedings of the Gateway Computing Environments Workshop (GCE), 1-8.
Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007, 7: 214-10.1186/1471-2148-7-214.
Drummond AJ, Ho SWY, Phillips MJ, Rambaut A: Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006, 4: e88-10.1371/journal.pbio.0040088.
Gernhard T: The conditioned reconstructed process. J Theor Biol. 2008, 253: 769-778. 10.1016/j.jtbi.2008.04.005.
Britton T, Anderson CL, Jacquet D, Lundqvist S, Bremer K: Estimating divergence times in large phylogenetic trees. Syst Biol. 2007, 56: 741-752. 10.1080/10635150701613783.
We thank Cody Hinchliff and Chris Barton for helpful guidance on mega-phylogeny methods, Mario dos Reis for advice on Bayesian dating methods, Farah Ahmed, Peta Hayes and Paul Kenrick for assistance in fossil review, and Peter Foster for assistance in running analyses. This work is supported by the National Science Foundation (NSF) grant (to LB & SK) 'PBI Solanum – a world treatment’ DEB-0316614.
The authors declare that they have no competing interests.
TS carried out the molecular sequence alignments and analyses, and the fossil review. LB provided sequence data for missing and poorly sampled genera. RO participated in the sequence alignment. SK participated in the design and coordination of the study. All authors contributed to writing, and read and approved the final manuscript.