Research article | Open | Published:
The effects of repeated whole genome duplication events on the evolution of cytokinin signaling pathway
BMC Evolutionary Biologyvolume 18, Article number: 76 (2018)
It is thought that after whole-genome duplications (WGDs), a large fraction of the duplicated gene copies is lost over time while few duplicates are retained. Which factors promote survival or death of a duplicate remains unclear and the underlying mechanisms are poorly understood. According to the model of gene dosage balance, genes encoding interacting proteins are predicted to be preferentially co-retained after WGDs. Among these are genes encoding proteins involved in complexes or in signal transduction.
We have investigated the way that repeated WGDs during land plant evolution have affected cytokinin signaling to study patterns of gene duplicability and co-retention in this important signal transduction pathway. Through the integration of phylogenetic analyses with comparisons of genome collinearity, we have found that signal input mediated by cytokinin receptors proved to be highly conserved over long evolutionary time-scales, with receptors showing predominantly gene loss after repeated WGDs. However, the downstream elements, e,g. response regulators, were mainly retained after WGDs and thereby formed gene families in most plant lineages.
Gene dosage balance between the interacting components indicated by co-retention after WGDs seems to play a minor role in the evolution of cytokinin signaling pathway. Overall, core genes of cytokinin signaling show a highly heterogeneous pattern of gene retention after WGD, reflecting complex relationships between the various factors that shape the long-term fate of a duplicated gene.
Duplications of individual genes and whole genomes are a dominant feature of plant evolution and have been detected in all land plant lineages [1,2,3,4]. Gene duplication is assumed to be a stochastic process and a common fate of a duplicate is its loss [5, 6]. However, the retention of duplicate genes seems to be biased toward certain functional classes of genes [7,8,9]. Another factor that seems to influence the long-term survival of a duplicated gene is the mode of duplication as genes which are predominantly retained differ between whole-genome duplications (WGDs) and small-scale duplication events [10,11,12,13,14]. Studies of WGD events and their effect on core angiosperm genes (i.e., gene families shared by all angiosperm species) showed a generalized pattern of “gene duplicability”, meaning the ability of genes to be retained following WGD. Three categories could be defined: a) “singleton” genes: the majority of core genes occur as single copies and are functionally involved in the maintenance of genome integrity; b) “multicopy” genes: genes remain in a duplicated state throughout time and are functionally biased toward signaling, transport, and metabolism; and c) “intermediate” genes: these genes show a pattern of prolonged duplicate retention spanning several tens of millions of years following WGD but appear eventually to return to singleton status. This later group (intermediate genes) is enriched for genes that are involved in development, growth, and regulation of transcription .
For the categories b) and c), gene dosage balance (GDB) theory is discussed to be a major driver of gene retention after WGD . Basically, the GDB theory states that, for many genes whose products participate in protein complexes, the stoichiometry among interacting gene products (i.e., proteins) must be maintained [10, 15,16,17,18]. Thus, according to GDB, dosage-balance-sensitive genes are predicted to be co-retained after a WGD event. These genes are also predicted to continuously experience purifying selection after duplication leading to prolonged retention. This prolonged retention accompanied by the gradual circumvention of dosage-balance-constraints may increase the possibility that duplicate genes diversify (sub- or neofunctionalization) and become permanently preserved [17, 19]. Genes in the “multicopy” group may have been retained – at least initially – because of dosage balance constraints. The “intermediate” group of gene families can be explained by a scenario of dosage balance that wears off over time, leading to prolonged preservation but ultimate loss of duplicates .
GDB has been extended to informational pathways (e.g., signal tansduction) , in agreement with the observation that preferentially retained gene categories after WGD include signal transduction genes in diverse species such as banana , Arabidopsiss , and the ciliate Paramecium tetraurelia . Here, we have studied the pattern of gene retention and loss of the individual components of core cytokinin signaling after repeated WGDs during land plant evolution to test whether a bias exists in the gene duplicability of the individual components and to explore whether GDB can explain the observed pattern. Cytokinins are plant hormones that play pivotal roles in plant development and its response to changes in the environment . Various studies have indicated that the cytokinin signaling system was established in early divergent land plants, and even some Charophyceae green algae have been found to encode family members of all four components of this signaling pathway [25,26,27]. Thus, cytokinin signaling is an ideal model system for studying the way that the independent and repeated WGDs during land plant evolution have affected the evolution of the individual components of a signaling pathway.
The core signaling of the phytohormone cytokinin is mediated via a variant of the two-component signaling system  (Fig. 1a). The cytokinin molecules are perceived by binding to the Cyclases/Histidine kinases Associated Sensing Extracellular (CHASE) domain of a membrane-bound hybrid histidine kinase (CHASE domain containing histidine kinase, CHK) that serves as receptor [29, 30]. The binding of the hormone leads to the autophosphorylation of the histidine kinase domain. After an intramolecular phosphotransfer to the c-terminal response regulator domain of the receptor, the signal is transferred to histidine phosphotransfer proteins (HPTs). These proteins have been shown to shuttle between the cytoplasm and the nucleus . The HPTs can be divided into enzymatically active and inactive orthologs (pseudo-HPTs). The pseudo-HPTs lack a conserved histidine residue that acts as a phosphorylation site and negatively interfere with pathway activity [32, 33]. HPTs can phosphorylate the response regulator domain of various response regulators. In cytokinin signaling, two types of response regulators have been shown to be important: i) the type-B response regulators (RRB), which are Myb type transcription factors that, upon phosphorylation, initiate the transcription of their target genes, and ii) the type-A response regulators (RRA), which are transcriptionally regulated by the RRB  and have been shown to be negative regulators of the cytokinin signaling pathway .
The study presented here reveals that the individual components of cytokinin signaling were duplicated and retained independently of each other. Although the cytokinin signaling pathway expanded mainly via WGD events, the observed pattern of gene duplicability and the pattern of co-retention after WGDs does not correlate with the predictions of GDB. Instead, downstream elements of the pathway show a trend towards higher gene duplicability compared with upstream elements.
Repeated WGDs during land plant evolution provide the background to study the evolutionary patterns of the cytokinin signaling components
In order to study the evolutionary pattern of the individual components of cytokinin signaling after whole genome duplications, plant species were chosen for further analysis to cover the major meso- and paleopolyploidy events reported in land plant evolution [4, 36,37,38] (Fig. 1b). Furthermore, to allow the identification of all members of the four protein families involved in cytokinin signaling pathway the availability of a large dataset, e.g., a fully sequenced genome or transcriptome, was another criterion to select species. Thus, this study focused on 14 “core” plant species (Table 1, Fig. 1b) for comparative analyses of cytokinin signaling. Beginning with Klebsormidium flaccidum as a representative of the Charophyceae, the algae lineage that gave rise to land plants, the whole spectrum of land plants was covered.
Comparison of copy numbers of the various cytokinin signaling components among the investigated species
Sequences encoding the four components of the cytokinin signaling pathway were identified in the core species and categorized as bona fide CHKs, HPTs, RRAs, or RRBs. The copy numbers of the identified components varied between species and also between the different protein families (Table 1). The number of cytokinin receptors was relatively stable between species, ranging from two to four copies across most land plants. Exceptions found were M. acuminata, Zea mays, Physcomitrella patens, and K. flaccidum with eight, seven, eleven, and nine receptor genes, respectively. In contrast, for HPTs, a steady increase in copy number was detected during land plant evolution starting from two HPTs in the moss P. patens to eight and 10 in Brassica species and Musa acuminata, respectively. In the gymnosperm Picea abies, seven HPT copies were identified, in comparison with three HPT copies in the closely related Pinus taeda. Another noteworthy trend was the emergence of pseudo HPTs in P. abies and in angiosperms with copy numbers ranging from one to two, with the exception of rice for which three pseudo HPTs were identified.
RRA and RRB copy number increased steadily during land plant evolution to form middle size gene families in dicots and monocots. Furthermore, with a few exceptions, the number of RRAs and RRBs in flowering plant species were found to be roughly equal (Table 1).
These differences in the copy number between the four gene families involved in cytokinin signaling indicated that the individual components experienced different evolutionary pressures that influenced their duplicability or rather their retention after WGD.
Reconstruction of cytokinin receptor evolution
The complete sequence of cytokinin receptors included four protein domains (PFAM domains of CHASE, HisKinaseA, HATPase, response regulator receiver) as alignable regions, which covered in total 466 amino acids. To reconstruct CHK evolution during land plant evolution, genes encoding CHKs of the 14 above-mentioned “core species” (Table 1, Fig. 1) were analyzed. Additional species were sampled to improve the phylogenetic reconstructions (Fig. 1). Thus, the final dataset included CHKs from 51 plant species ranging from mosses, lycophytes, ferns, and gymnosperms up to flowering plants (Additional file 1: Table S1). To test the robustness of the tree topology, trees based on different substitution models (nucleotide, codon, and protein substitution models) were calculated and compared. Furthermore, maximum likelihood (ML) and Bayesian reconstruction were performed.
In the resulting trees, the main branching pattern was highly similar. Robinson-Foulds-(RF)-distances (Additional file 2: Table S2) between pairs of trees showed that less than 25% of branches were dissimilar for most pairwise comparisons. Codon models fitted the data best according to a model test with ModelOMatic . Overall, phylogenetic signal in the dataset was sufficient, and tree topology was robust. All reconstructed trees supported the presence of three major clades within angiosperm CHKs (Additional file 3: Figure S1 and S2). These clades were named according to the three well-characterized cytokinin receptors of A. thaliana, which were located within the clades (the AHK2, the AHK3, and the AHK4 clade, respectively). The more basal branches of the phylogenetic tree reconstructions were generally less supported. Furthermore, the positioning of the CHKs from gymnosperms was inconsistent in the various tree reconstructions. However, CHKs of the early diverging land plants (lycophytes and bryophytes) fell in all tree reconstructions reproducibly outside the above-mentioned groups of the land plant CHKs. At the very base of the tree, we found sequences encoding CHKs of the algae K. flaccidum and the moss P. patens.
Ancient duplications of cytokinin receptors
To analyze the ancient duplication events during cytokinin receptor evolution in more detail, the above-described set of gene trees (ML; Mr. Bayes, codon, protein, nucleotides trees) were reconciled with a species tree that reflected the commonly accepted evolution of land plants (Fig. 1). Furthermore, genomic organization concerning location in collinear or syntenic blocks was studied (Additional file 2: Table S2). The results are summarized in Table 2 and information on colinearity is included in the reconciled CHK tree (Fig. 2).
All reconciled trees supported two ancient duplication events: i) one duplication before the split of gymnosperm and angiosperms giving rise to the ancestor of the AHK4 clade and the ancestor of the AHK2/AHK3 clade (Fig. 2, Additional file 3: Figure S3) which coincides with the ancient ζ WGD event ; ii) a second duplication event before the radiation of angiosperms giving rise to the AHK2 and AHK3 clades which coincides with a well-established ε WGD event at the basis of angiosperm evolution . A third ancient duplication event is predicted within the monocot clade of AHK4. The different reconciled trees support either a correlation with the described commelinid specific τ WGD  (Fig. 2a) or the grass specific σ WGD  (Fig. 2b). Lineage specific duplications of CHKs are supported by the reconciled trees in Z. mays, Gossypium raimondii, and M. acuminata that correlate with lineage specific WGDs. Exactely these species all have a CHK copy number greater than four (see Table 1).
While genomic comparisons did not provide further evidence that the above-described ancient duplications could be traced back to the discussed WGD events, these comparisons could clearly show that the lineage specific WGD in Z. mays and G. raimondii were involved in the copy number increase in these species. For example, in Z. mays, the paralogous CHK pairs (ZeamayCHK01/CHK02, ZeamayCHK03/CHK04, ZeamayCHK05/CHK06) are located in collinear or syntenic regions (Fig. 2; Table 2; Additional file 2: Table S2). Furthermore, the estimated divergence time of the collinear regions (i.e., the paralogous blocks) based on pairwise synonymous nucleotide substitution rates (Ks distances) vary between 0.15 and 0.29 and correlate well with the Ks distance characteristic for gene pairs created by the maize-lineage specific WGD [42, 43]. Interestingly, only three of the four duplicates resulting from this WGD were retained (Fig. 2). In G. raimondii, genomic organization and Ks distances also supports the lineage-specific gene-expansion of CHKs via the five- to six-fold ploidy increase over ~ 60 Mya (Fig. 2; Table 2), characterized by Ks distances ~ 0.5 . In M. acuminata however, which possesses eight CHK-encoding gene copies and experienced three rounds of WGD in the range of 65 to 100 Mya , intragenomic comparisons only support an origin of the paralogs MusacuCHK15/CHK5 (Table 2) through one of these events. Besides, one pair of tandem duplicates, MusacuCHK01/CHK03, has been identified within an ultracontig of the M. acuminata genome. One further duplication event, which led to a copy number increase in Brassica sp., could be identified as a large scale duplication. The paralogs BrarapCHK04/CHK03 are located within an collinear intragenomic region (Fig. 2, Additional file 2: Table S2). Based on phylogenetic reconstruction and Ks distances, they originated from the Brassicaceae-wide α WGD.
Concerning the functionality of the above mentioned additional CHK copies in Z. mays, G. raimondii and M. acuminata, in silico analyses of transmembrane helices predict only three of the eight MusacuCHK copies (MusacuCHK01, CHK06 andCHK15) and five of the seven ZeamayCHK copies (ZeamayCHK03–07) encode functional receptors while all CHKs from G. raimondii are predicted to be functional. Additionally, for ZeamayCHK01/CHK02/CHK04 secretory signal peptides that targets its passenger protein for translocation across the endoplasmic reticulum membrane in eukaryotes are predicted.
In summary, given the scenario of ancient origin of three CHK copies together with the repeated WGDs that occurred in the subsequent radiation of angiosperm species, this indicates that the most common fate of CHK duplicates resulting from WGDs is gene loss. Only in some species that experienced rather recent polyploidization events (Ks values 0.1–0.4) have several duplicates been retained.
HPT copy number increased through palaeo- and mesopolyploidy events
We analyzed and reconstructed the evolution of HPT-encoding genes analogous to the CHKs by combining phylogenetic tree reconstruction, gene tree-species tree reconciliation, and gene synteny analyses to obtain predictions of duplication events and their timing. Compared with the analyses of CHKs, a reduced dataset including only angiosperm species was used as the alignable region of the HPTs covering only the 85 amino acids of the PFAM HPT domain contained limited phylogenetic signal. The HPT copy number in the analyzed species ranged from five to 10 (Table 1).
The reconciled trees supported two different topologies with either two or three repeated duplications before the split of mono- and dicots (Fig. 3a and Additional file 3: Figure S4) indicating that ζ and ε WGD may have been involved in ancestral HPT amplification. According to the reconciled tree, HPTs experienced further lineage-specific amplification in mono- and dicots through WGDs. For example, three of the five HPTs from A. thaliana (AHP2, AHP3 and AHP5) are found in a clade specific for dicots (HPT clade 1, Fig. 3a) and most likely originated by the Brassicaceae-specific α and β WGD (Fig. 3a, Additional file 2: Table S2). Gene colinearity of AHP2 and AHP3 support this phylogeny based reconstruction of WGD events (Table 2, Fig. 2). In contrast, the other two HPTs from A. thaliana that group with monocot HPTs were not amplified by the Brassicaceae-specific WGDs, meaning that the resulting duplicates have been lost (Fig. 3a, HPT clade 2 and 3). The monocot HPTs increased either via the pancereal σ or ρ WGD (Fig. 3a, HPT clade 2) which is supported by colinearity of OsAHP1/OsAHP2 and OsPHP1/OsPHP3. However, with the limitations of Ks distance based dating in mind, it is not possible to clearly identify which of the two WGD was involved (Table 2, Fig. 3). The duplicates resulting from the mesopolyploidy events in the lineage towards Brassica and Z. mays were predominantly lost, except the paralogous pairs BrarapHPT02/HPT03, BrarapHPT07/HPT08, and ZeamayHPT01/HPT02 (Fig. 3b). All three pairs are located in colinear regions with Ks distances similar to the duplicates of the lineage specific polyploidy events. HPTs in M. acuminata showed similar to CHKs a highly species-specific evolutionary pattern and included tandem duplications and/or possibly retroposition as evidenced by the observation that MusacuHPT02/HPT04 are located on one chromosome in close vicinity (chromosome 7) and MusacuHPT02/HPT03/HPT04 completely lack introns.
Concerning the evolution of pseudo-HPTs and thus the evolution of a new regulatory component in cytokinin signaling, the phylogenetic reconstructions support that they have emerged at least twice independently. The duplicability of monocot pseudo-HPTs (OsPHP1/OsPHP3, see above) differed from the dicot pseudo-HPTs as the former were amplified via a lineage-specific WGD event (ρ and/or σ) while the latter were not amplified.
In summary, the most dominant fate of HPT duplicates that arose from WGD events was again gene loss. However, some HPTs copies were retained after the β, α, α,’ and ρ WGD events which in comparison did not led to a similar copy number increase in the upstream acting CHKs.
Response regulator families expand continuously via repeated WGD
The evolution and duplication pattern of RRAs and RRBs of mono- and dicots was reconstructed separately by using the same approach as that for CHKs and HPTs. The 112-amino-acid PFAM response regulator receiver (RR) domain was used as alignable region. However, the tree signal of RRBs was poor indicated by a relative high percentage of low branch supports. Thus, the threshold for the branch support for gene tree reconciliation in the RRB trees was lowered to > 0.75 for SH-like branch support. Furthermore, RRB trees reconstructed with maximum likelihood and Bayesian inference were incongruent with regard to the basal branching pattern and also the placing of some highly diverged RRB sequences from M. acuminata (MusacuRRB38, RRB40, RRB43), but individual clades were consistent between the two reconstructions (Additional file 3: Figure S5 and S6). These branches are highlighted in Fig. 5.
Although similar numbers of RRAs and RRBs exist in flowering plants (Table 1), the in-depth phylogenetic analyses show that their evolutionary pattern is different. For RRAs, the reconciled trees support a scenario of constant copy number increase via repeated paleao- and mesopolyploidy events. The RRAs of monocots and dicots form two main groups indicating that their last common ancestor possessed two RRA copies, which might have arisen from the ancient ζ or ε WGD. Duplication and differentiation of RRAs then occurred independently during monocot and dicot radiation leading to the observed two main clades and the increase in copy number in both plant groups. For example, the phylogenetic analyses indicate that β and γ WGD were involved in the expansion of RRAs within clade 1 (Fig. 4). Phylogeny and collinearity suggest an origin of four paralogous RRA pairs in Arabidopsis thaliana (ARR6/ARR5, ARR15/ARR7, ARR8/ARR9, ARR17/ARR16) via the α WGD event. Thus, RRA duplicates were more likely to be retained compared to the CHKs and HPTs evolution.
On the other hand, RRBs show a more heterogeneous pattern of evolution. Some clades share sequences from monocots and dicots (Fig. 5, clades w - z), suggesting that the last common ancestor possessed at least four RRB copies. Of note, within three of these clades, which include A. thaliana PRR2, ARR11, and ARR14 (clade w, y, z), WGD events rarely led to an increase of the copy number. However, individual RRB copies, such as the A. thaliana ARR10/ARR12 and ARR2/ARR1 gene pairs, originated via WGD events. Compared with RRAs, amplification of RRBs via WGD is patchier. Moreover, RRBs show one more noticeable feature: some gene copies in O. sativa subspec. Japonica  and B. rapa originated from tandem duplication.
To summarize, RRAs and RRBs differ in their evolutionary pattern, although both gene families show a high gene duplicability. RRAs exhibit a clear trend towards gene retention instead of gene loss after WGD. RRBs may have been amplified by additional duplication mechanisms, e.g., tandem duplication. However, the reconstruction of RRBs evolution is in general vaguer, suggesting that they might comprise a functionally heterogeneous group of sequences.
Genes encoding cytokinin receptors were recruited early in land plant evolution for cytokinin signaling
The results of our phylogenetic reconstructions of CHK, HPT, RRA, and RRB evolution are in solid agreement with similar, albeit smaller-scale studies [25, 26, 45, 46] and support an early origin of cytokinin signaling at the base of land plant evolution, and functional cytokinin signaling perhaps is even present in K. flaccidum. However, the CHKs of K. flaccidum, which form a monophyletic clade at the basis of the CHK gene trees, group next to a group of CHKs from the early diverging land plant P. patens recently identified by Gruhn et al. . Although a heterologously expressed member of this subfamily (PpCHK4) has previously been shown to be able to bind cytokinin hormones and to translate this binding into a cellular response in a dose-dependent manner , in P. patens a group of three further CHKs evolved which clade with the classic cytokinin receptors from the other land plants . Genetic and biochemical experiments have shown that these three classic cytokinin receptors are necessary to mediate the cytokinin response in this moss . One can speculate at this point that these three copies were recruited to function specifically in cytokinin signaling and have evolved from ancestors forming the very basal group of CHKs from K. flaccidum and P. patens, whose biological function needs to be investigated.
Interestingly, HPTs and RRBs are also present in Chlorophyceae algae, the branch of the green algae that did not give rise to land plants . However, their function in these green algae is unclear. Thus, we are tempted to speculate that the CHKs might have been recruited specifically in the Charophyceae lineage to serve as a new “plug-in” for the established HPT-RRB network. As has recently been proposed, the reorganization of gene regulatory network architecture is a major factor underlying evolution, and phytohormonal pathways might be used to redirect such gene regulatory networks to fulfill new functions .
Cytokinin signaling pathway expanded via WGD events
The applied approach of integrating phylogenetic analyses with comparisons of genome collinearity, which has previously been successfully applied to identify ancient WGD in monocots , indicates that many of the duplications that have affected cytokinin signaling components are genome-wide or other large-scale duplications. Also for ethylene signaling in M. acuminata, WGD has been shown to be the dominant duplication mode . However, a few exceptions can be found to this common trend. In O. sativum subspec. Japonica, the RRBs OsPRR11, OsPRR12, OsRR29, and OsRR28 originated by tandem duplications (Fig. 5), consistent with repeated unequal cross-overs . In M. acuminata, the genes encoding MusacuHPT02/HPT04 have been identified as tandem duplicates. Both copies are intronless. MusacuHPT03 is closely related to this pair of HPTs and also lacks introns. One can speculate that retroposition was involved in the origin of these three copies; this would represent yet another mechanism for gene duplication affecting members of the cytokinin signaling pathway.
In order to estimate the timing of WGD events that affected cytokinin signaling, we have used Ks distances between paralogous pairs located in collinear intragenomic blocks. These distances only offer rough dates that have to be taken with caution because of variable rates among the different gene families  and between species [50, 51]. Nevertheless, the WGDs identified as affecting cytokinin signaling by using this approach are in good agreement with those in previous studies. A genome alignment spanning major Poaceae lineages supports the amplification of the cytokinin signaling components through WGD events in this lineage . Furthermore, a phylogenomic analysis of ancestral polyploidy events indicates that RRAs and CHKs were amplified before the split of basal angiosperms (Aristolochia, Liriodendron, Nuphar, and Amborella), most likely via the ε WGD .
Patterns of GDB in cytokinin signaling
Of special interest in this study has been the testing of whether the cytokinin signaling components show a signature of GDB. Many previous studies have implied that GDB preferentially “acts” on multiprotein complexes and signal transduction/regulatory networks [15, 17, 20]. The central assumption is: if a pathway, whose components are linked in a dosage-sensitive relationship, is duplicated as a whole via a WGD event, the relative dosage between genes will be preserved, and the duplicated dosage-sensitive genes will be preferentially retained. Thus, “interacting” genes are bound to be co-retained over evolutionary time . In core cytokinin signaling, interacting genes are CHKs, HPTs, RRAs and RRBs (Fig. 1a) and if GDB is a dominating force, co-retention after WGD of these components is predicted (Fig. 6a), leading to a concerted increase of all interacting genes. According to our data, however, a different trend exists in that downstream elements in cytokinin signaling are more likely to be retained while the upstream elements tend to be lost (Fig. 6b). This pattern is in agreement to the trend that upstream genes in a biochemical pathway evolve more slowly than downstream genes [52, 53], which might be explained by network characteristics, as most likely the rate- or flux-controlling elements on which natural selection preferentially acts are the upstream elements in a pathway .
Focusing on the upstream part – the CHKs, which channel the signal into cytokinin signaling – we found relatively simple orthologous relationships, and flowering plants have a similar repertoire of CHKs [45, 46]. Whereas CHKs were amplified before the split of angiosperms possibly via WGD events, the subsequent repeated rounds of palaeopolyploidy events during flowering plant evolution did not affect copy number of CHKs, meaning that the resulting duplicates were lost. This pattern of an unique amplification at the trunk of a tree with very few or no subsequent additions of gene copies is consistent with the phenomenon of “frozen duplications” described by Makarova et al.  as evolutionarily stable paralogous clusters that are not further amplified. Only the more recent mesopolyploidy events led to a copy number increase of CHKs in G. raimondii, Z. mays, and possibly also in M. acuminata. In the two latter species however, several of the additional copies contain extra sequence motifs like transmembrane helices or secretory signal peptides possibly rendering them non-functional in cytokinin signaling. One can speculate, that these copies are in the process of pseudogenization and eventually will be purged from the genome, which may also be the long-term fate for the additional CHK copies in G. raimondii.
HPTs, RRAs, and RRBs displayed overall a higher degree of lineage-specific expansion compared with that of CHKs. However, the in-depth phylogenetic analyses showed that the copy number increase of these components can be traced back to different polyploidy events meaning that the downstream parts were not amplified in a concerted way. Especially striking is the different evolutionary pattern of RRAs and RRBs. While RRAs show a continuous increase in copy number via WGDs, RRBs are much more heterogenous. But, RRBs comprise in general a more heterogenous group of genes. Most but not all of the RRB genes encode an additional Myb domain and some RRBs lack the conserved phosphorylation site. Functional differentiation between these classes is not well understood and thus, the present analyses might guide further functional studies.
Another interesting aspect of the cytokinin signaling are the pseudo-HPTs. GDB is especially suggested to be a driving force between proteins with opposing actions, such as enzymatic or transcriptional activators and inhibitors within a pathway, and might shape the duplicability of the encoding genes . Within the cytokinin signaling pathway the HPTs and the pseudo-HPTs, which lack the conserved His residue, are discussed to be antagonistic players and thus predicted to be co-retained after WGD (Fig. 6c). Of the six HPTs from A. thaliana, only AHP6 is a pseudo-HPT [24, 32]. It negatively interferes with pathway activity, most likely by competing with AHP1–5 for interaction with the phosphorelay machinery [31, 56]. Whereas in A. thaliana only one pseudo-HPT (AHP6) has been identified, this group is especially large in rice with three copies (OsPHP1, PHP2, and PHP3) (Fig. 3). Regarding duplicability, or co-retention after WGD, pseudo-HPTs strongly differ from HPTs. For example, the ancestor of AHP6 and its orthologs in Brassica experienced according to the reconstructed phylogeny up to three WGD events (α, β, γ) but the resulting duplicates have not been retained (Fig. 3). However, the corresponding antagonists (AHP2, AHP3, AHP5) were partly amplified via the α and β WGDs. Thus, in Arabidopsis and Brassica, either the pseudo-HPTs experienced unequal loss after WGDs (Fig. 6c), or their function as repressor evolved only after the last common WGD. In rice, both the pseudo- and the authentic-HPT gene family show a copy number increase due to polyploidy events, however most likely different, pancereal WGD events were involved. Thus, no pattern of co-retention between pseudo-HPT and authentic HPTs is observed.
In general, lineage-specific gene family expansions seem to be one of the principal means of adaptation and one of the most important sources of organizational and regulatory diversity in crown-group eukaryotes . In cytokinin signaling, expansion is mostly found for downstream elements. This amplification might allow the regulatory fine tuning (subfunctionalization) or the rewiring of signal output to execute novel functionality. Furthermore, no unique pattern of co-retention after WGDs, indicative of GDB among the encoded proteins, was identified in cytokinin signaling. Possible explanations for this result are: i) an important prerequisite for GDB to manifest is that the duplicates are co-expressed. While immediately after a WGD event paralogs show an identical expression pattern as coding as well as regulatory sequences have been duplicated, expression pattern subsequently can change. Indeed, expression divergence between paralogs seems to be the rule rather than the exception  and represents an important way of subfunctionalization between duplicates. Furthermore, gene dosage balance can be compensated by regulatory changes . To further test the potential role of GDB in the evolution of cytokinin signaling, detailed co-expression studies of the interacting components (Fig. 6a), the antagonistic interactions partners (Fig. 6c) as well as their duplicated copies in polyploids of different age are necessary. GDB predicts that interaction partners should show a correlated expression level. Thus, after a WGD, an unequal loss of one of the interactions partners needs to be compensated according to the GDB. For several model plants, online tools to study expression pattern are available to get first insights in the dynamic of duplication and expression pattern, e.g. the Bio-Analytic Resource for Plant Biology (bar.utoronto.ca). And ii) a central assumption of GDB, namely that gene expression is directly correlated to copy number variation (CNV), may be not valid. For example, CNV in humans only partly account for the differences in expression between individuals, whereas a large portion of the variance must stem from other sources . Furthermore, most genes present in CNVs in Drosophila melangolaster show no evidence of increased or diminished transcription . Thus, this central assumption of GDB needs to be further studied.
However, while GDB may be of minor importance to co-stabilize duplicates of cytokinin signaling in the long run, in M. acuminata, two antagonistic components of ethylene signaling have been shown to be co-retained after the three lineage-specific WGD events indicating that GDB shaped their evolution . Together, these results reflect the complexity of molecular mechanisms that shape gene duplicability. The fixation and the retention of duplicated genes in plant genomes seem to be context-dependent events, and relevant factors include intrinsic properties, such as gene function, and the environment in which the duplication occurred [60, 61]. Several specific gene features including gene ontology (GO-slim classification), sequence-related features (gene and protein sizes, the GC content in the third codon position, protein domain size), expression-related features (level of expression and biotic and abiotic responsiveness), and conservation-related features (omega, the ratio of relative fixation rates of synonymous and nonsynonymous mutations, as an indicator of selective pressure), have been shown to influence gene retention after WGD . Another important feature of proteins effecting the duplicability of the encoding genes might be their tendency to form symmetrical homomers. Duplication of a gene that encodes a homomeric protein can lead to the phenomenon of paralog interference, which basically predicts a functional link between paralogous genes via interactions of the encoded proteins in multimeric complexes and one important outcome of paralog interference is a prolonged retention timer after duplication . Examples of paralog interference has been described for MADS box transcriptional regulators in the yeast Kluyveromyces lactis  and steroid receptors , both form dimers. Of note, also cytokinin receptors are also thought to function as dimers [28, 65] and paralog interference could influence their duplicability. A prolonged retention of CHKs in Z. mays compared to HPTs, RRAs and RRBs (compare Figs. 2, 3, 4 and 5) could be an indicator but further studies of protein interactions are necessary.
The observation of the non-random loss of genes following WGD has stimulated much discussion regarding the molecular mechanisms that influence these outcomes. The pattern of gene retention after WGDs within the cytokinin signaling pathway fits best to the model stating that downstream genes in a pathway evolve faster. However, most striking is the heterogeneous pattern of gene retention of the various components of cytokinin signaling and the diverse modes of duplication as besides WGD also tandem duplication and possibly retroposition was found. Obviously, various mechanisms at diverse levels of interaction act in shaping the evolution of this signal transduction pathway, all of which require further experimental exploration. Detailed mechanistic studies of specific candidate genes, e.g., young paralogs in neopolyploid species, including analyses of gene expression, gene function (in vivo and in vitro), and protein interactions will allow to gain a more complete picture of the forces that shape the fate of a duplicated gene.
Identification of cytokinin receptor encoding sequences
Sequences encoding putative or confirmed cytokinin receptors were obtained from the genomes and transcriptomes of selected species (Additional file 1: Table S1, Additional file 4: alignment 1) by screening the online platforms “Phytozome”, “Ensemble”, and “1 K database” [66,67,68] by using the Basic Local Alignment Search Tool (BLAST). Species were chosen to cover the evolution of cytokinin signaling during land plant evolution by using the charophyte alga Klebsormidium flaccidum  as an outgroup. For each species, we performed an iterative BLASTN search with the sequences encoding the Cyclase Histidine kinase Associated Sensory Extracellular (CHASE) domain of cytokinin receptors of the model plant Arabidopsis thaliana as query sequences (NCBI accession numbers AHK4 NM_201667, AHK2 BT002530, AHK3 NM_102494). The CHASE domain is the ligand binding domain of cytokinin receptors . Subsequently we used every identified sequence again as query to search for further sequences with homology to cytokinin receptors in the respective species. Sequences were named as CHASE-domain containing His kinase (CHK) according to the common nomenclature  and numbered serially. For the simple and rapid identification of the species, the first three initials of the genus and of the species name are given, e.g., Aratha for Arabidopsis thaliana (Additional file 1: Table S1). To differentiate between Oryza sativa ssp. japonia and Oryza sativa ssp. indica, the first two initials of the genus, species, and subspecies name are given. Furthermore, we added the annotation of well-characterized cytokinin receptors from Arabidopsis thaliana  and Oryza sativa ssp. japonica . If different splicing variants of a gene were identified, only the longest variant was used for subsequent analyses. The individual sequences of this sequence collection were analyzed regarding their protein domain structure by using the PFAM database . Sequences that comprised the following four PFAM protein domains were identified as functional cytokinin receptors: i) CHASE domain (PF03924); ii) Response regulator receiver domain (PF00072); iii) Histidine kinase-, DNA gyrase B-, HSP90-like ATPase domain (PF02518); and iv) His Kinase A (phospho-acceptor) domain (PF00512). As we are interested in the retention of functional genes after WGDs, stringent selection criteria were applied, and sequences that lacked more than 50% in one of the four domains were rejected. The resulting dataset of 166 sequences was used for phylogenetic analyses. Genes encoding cytokinin receptors were also analyzed concerning the presence of transmembrane helices with the “TMHMM Server v. 2.0”  and the presence of signal peptides with the SignalP  and Phobius .
Identification of HPT and RR genes
To investigate the evolutionary pattern of the whole cytokinin signaling pathway, we additionally identified HPT and RR encoding genes in a subset of species that covered flowering plants (Additional file 4: alignment 2–4). We followed the same strategy as that described above, performing an iterative BLASTN search starting with sequences encoding HPT and RR domains from Arabidopsis thaliana . We also used the PFAM database  to analyze the protein domain structure of the identified sequences. Sequences that lacked more than 15% of RR (PF00072)- and HPT (PF01627)-PFAM domains were rejected. For phylogenetic analysis, only the conserved regions encoding the HPT and RR PFAM domain were used as the alignable region. HPT encoding sequences were further classified as authentic His-containing phosphotransfer proteins if they contained a conserved histidine residue, and sequences that lacked this site were classified as pseudo His-containing phosphotransfer proteins . Moreover, response regulators were further categorized into type-A, type-B, type-C, and clock related- and pseudo-response regulators and named as RRA, RRB, RRC, and PRR, respectively . We assigned the sequences of our dataset to these categories based on phylogenetic analysis as previously suggested . We used the well-characterized set of RRs from A. thaliana as a reference (see ) in these phylogenetic analyses and compared the individual species-specific sets of RRs with this reference. For the more divergent species investigated in this study (P. abies, P taeda, S. moellendorffii, and K. flaccidum), A trichopoda was additionally included in the comparative phylogenetic analyses to classify RRs. To perform these analyses, we calculated multiple sequence alignments with the MUSCLE algorithm  implemented in MEGA Software package, version 6.06  and calculated Maximum Likelihood trees with MEGA based on the General Time Reversible (GTR) substitution model with a Gamma distribution to model evolutionary rate differences among sites (5 categories) with 100 bootstrap replicates. Sites with less than 95% site coverage were excluded from the analyses. The RRAs were clearly distinguished as they formed a well-supported monophlyetic clade, even in phylogenetic analysis with diverse species. The same is true for RRCs, which clearly group with the response regulator receiver domain from cytokinin receptors . The PRRs were identified by their phylogenetic position and by an additional characteristic protein motif: the Constans/Constans-like/TOC1 (CCT) domain . The remaining response regulators were designated as RRBs. This group is heterogeneous concerning its phylogenetic composition and includes response regulators that have an additional myeloblastosis (Myb)-related DNA binding domain  but also response regulators that lack this domain as well as response regulators that have been annotated as pseudo-response regulators by Suzuki et al.  based on alterations in the conserved phosphorylation site (the DDK motif) . The biological roles played by latter, noncanonical members are not clear but based on the agreed nomenclature for cytokinin signaling components, they were included in the RRB group in this study . When experimental evidence was available, the information was included in the analysis. Thus, based on work by Tsai and colleagues , OsRR27, OsRR31, OsRR32, OsRR30, and OsRR33 were shown not to function in cytokinin signaling, and thus, these sequences were excluded from our analyses. Furthermore, sequences that contained an additional EHD1 domain were excluded.
To reconstruct the process and pattern of evolution of the cytokinin signaling pathway during land plant evolution with emphasis on WGD events, the following datasets were assembled from the sequence collection for phylogenetic analyses: i) sequences encoding cytokinin receptors of land plants ranging from Klebsormidium flaccidum to monocots and dicots, and from monocots and dicots: ii) sequences encoding HPTs, iii) sequences encoding RRAs, and iv) sequences encoding RRBs. The respective sequences were aligned with the MAFFT multiple sequence alignment algorithm (MSA)  based on codons by using the GUIDANCE web server , which evaluates the confidence of the MSA. Based on the GUIDANCE confidence scores, unreliable columns were removed (threshold 0.93). We used ModelOMatic , which allows comparisons of nucleotide, amino acid, and codon models, to identify the best substitution model for subsequent tree reconstruction. For reconstructing the evolution of cytokinin receptors, the robustness of the reconstructed tree topology was tested by comparing phylogenetic trees based on nucleotide and codon substitution models. When reconstructing the evolution of HPTs, RRAs, and RRBs we focused on the best substitution model predicted by ModelOMatic. In general, maximum likelihood trees based on the general time reversible (GTR) nucleotide substitution model were calculated with RaxML , and the CAT approximation of rate heterogeneity was used to model rate heterogeneity. RaxML involves a rapid hill climbing algorithm to search for the best tree, with the subtree pruning re-grafting (SPR) starting on a randomized maximum parsimony tree. The initial rearrangement settings of SPR and the number of rate categories for the CAT approximation were optimized through comparative analysis by using various settings. Based on the original alignment, 20 tree inferences with the optimized settings were performed to find the best maximum likelihood tree according to the final likelihoods. Branch support was determined by 100 bootstrap repeats and mapped onto the best tree of the 20 inferences.
To calculate maximum likelihood trees under codon models, we used CodonPhyML . We compared three codon models, namely i) the Goldman and Yang model , ii) the Muse and Gaut model , and iii) the YAP model , all of which differ in their instantaneous substitution rates between codons. The stationary frequency of codons and the transition-transversion ratio were estimated by maximum likelihood. The ratio of synonymous and nonsynonymous substitution rates was modeled as being constant over sites (M0 model). Site-rate variation was drawn from a discrete gamma distribution with four classes. Starting from an initial tree built by using the BioNJ algorithm, nearest neighbor interchange was used to search for the best tree topology. Branch support was assessed by using SH-aLRT statistics, which is a conservative test for branch support, comparable to standard bootstrap . The tree with the highest log likelihood was selected for further analyses. In addition to the maximum likelihood approach, we used Bayesian inference to reconstruct phylogenetic trees (Mr. Bayes version 3.2 software; ). The following substitution models were used: i) the GTR model of DNA substitution with among-site variation drawn from a gamma distribution, ii) the Jones fixed-rate model of amino acid substitution, which was chosen via the model jumping option in Mr. Bayes, iii) the implemented codon model, which is based on the GY and MG codon models, to reconstruct phylogenetic trees. In the case of the codon model, the ratio of synonymous and nonsynonymous substitution rates (ω) was again modeled as being constant over sites. The posterior probabilities of the phylogenetic tree model were estimated as part of the Bayesian analyses by using Markov chain Monte Carlo sampling with Metropolis coupling running four chains in two simultaneous analyses. The analyses were run with uniform prior distributions for tree topology. A flat Dirichlet distribution (1.0) was used for stationary frequencies of nucleotides, codons, and amino acids, nucleotide substitution rates, and ω. Exponential priors were used for the shape parameter of the gamma distribution of rate variation. Branch lengths were unconstrained. The MCMC chain was sampled every 500 generations with the burn-in set to 25% from the cold chain. Convergence diagnostics were calculated every 5000 generations and analyses were continued until the average standard deviation of split frequencies reached 0.01. For all trees, cutoffs for the branch support were selected according to the tree signal, also for the subsequent gene tree reconciliation. Trees with a high percentage of low support were interpreted to contain low phylogenetic signal and to be less robust. To compare the tree topologies calculated based on different substitution models and the different algorithms (Mr. Bayes and maximum likelihood), we determined Robinson-Foulds distances (RF distance)  in R, package phangorn 2.4.0, and normalized them by dividing by the maximal possible distance . RF-distance matrix is provided as Additional file 5: Table S3. For illustration, reconstructed RRB cladograms based on maximum likelihood (codonphyML, codon substitution model YAP CF3x4) and Mr. Bayes (codon substitution model) are compared with R package dendextend 1.7.0 . The tangelgram is provided as Additional file 3: Figure S7.
Gene tree reconciliation
We used Notung  for exploring alternate hypotheses about duplication events of the cytokinin signaling genes. Both rooting and a rearrangement tool were used to minimize the overall Duplication/Loss score of a gene tree. A cladogram reflecting land plant evolution was built based on the work of the Angiosperm Phylogeny Group  and Zou et al.  for relationships between rice species. The most basal eudicot is Nelumbo nucifera . Amborella trichopoda is sister to all extant angiosperms . Most basal land lineages are represented by the bryophyte Physcomitrella patens  and Klebsormidium flaccidium marking the transition of aquatic to terrestrial life . This cladogram (Fig. 1) was used as a species tree for gene tree reconciliation.
Comparative genomics (gene collinearity)
We used the PLAZA 3.0 online database  to study the genomic organization of the cytokinin signaling genes. In the PLAZA database, collinear and syntenic regions within and between genomes are pre-computed by using i-ADHoRe . An intra-species comparison of the whole genome (WGDotplot) reports all collinear regions, i.e., duplicated blocks, found within a genome. The age of the paralogs is provided based on pairwise synonymous substitution rate (Ks distances) calculated with PAML .
- CCT domain:
Cyclases/Histidine kinases Associated Sensing Extracellular
CHASE domain containing histidine kinase
Gene dosage balance
General time reversible (substitution model)
Histidine phosphotransfer protein
- Myb domain:
Myeloblastosis related DNA-binding domain
Distance robinson-foulds distance
Type-A response regulator
Type-B response regulator
Whole genome duplication
Whole genome triplication
Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr Opin Plant Biol. 2005;8:135–41.
Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH. The frequency of polyploid speciation in vascular plants. Proc Natl Acad Sci U S A. 2009;106:13875–9.
Mayrose I, Zhan SH, Rothfels CJ, Magnuson-Ford K, Barker MS, Rieseberg LH, et al. Recently formed polyploid plants diversify at lower rates. Science. 2011;333:1257.
Vanneste K, Baele G, Maere S, de Peer YV. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the cretaceous–Paleogene boundary. Genome Res. 2014;24:1334–47.
Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–5.
Karev GP, Wolf YI, Berezovskaya FS, Koonin EV. Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models. BMC Evol Biol. 2004;4:32.
De Smet R, Adams KL, Vandepoele K, Van Montagu MCE, Maere S, Van de Peer Y. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc Natl Acad Sci U S A. 2013;110:2898–903.
Geiser C, Mandáková T, Arrigo N, Lysak MA, Parisod C. Repeated whole-genome duplication, karyotype reshuffling, and biased retention of stress-responding genes in buckler mustard. Plant Cell. 2016;28:17–27.
Li Z, Defoort J, Tasdighian S, Maere S, de Peer YV, Smet RD. Gene duplicability of core genes is highly consistent across all angiosperms. Plant Cell. 2016;28:326–44.
Papp B, Pál C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003;424:194–7.
Maere S, Bodt SD, Raes J, Casneuf T, Montagu MV, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005;102:5454–9.
Freeling M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol. 2009;60:433–53.
Huminiecki L, Heldin CH. 2R and remodeling of vertebrate signal transduction engine. BMC Biol. 2010;8:146.
Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, et al. Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome Res. 2012;22:95–105.
Veitia RA, Bottani S, Birchler JA. Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet. 2008;24:390–7.
Edger PP, Pires JC. Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosom Res. 2009;17:699–717.
Birchler JA, Veitia RA. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc Natl Acad Sci U S A. 2012;109:14746–53.
Veitia RA, Bottani S, Birchler JA. Gene dosage effects: nonlinearities, genetic interactions, and dosage compensation. Trends Genet. 2013;29:385–93.
Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Curr Opin Plant Biol. 2014;19:91–8.
Veitia RA. Gene dosage balance: deletions, duplications and dominance. Trends Genet. 2005;21:33–5.
D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–7.
Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–91.
Aury J-M, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–8.
Hwang I, Sheen J, Müller B. Cytokinin signaling networks. Annu Rev Plant Biol. 2012;63:353–80.
Hori K, Maruyama F, Fujisawa T, Togashi T, Yamamoto N, Seo M, et al. Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nat Commun. 2014;5:3978.
Wang C, Liu Y, Li S-S, Han G-Z. Insights into the origin and evolution of plant hormone signaling machinery. Plant Physiol. 2015;167:872–86.
von Schwartzenberg K, Lindner A-C, Gruhn N, Šimura J, Novák O, Strnad M, et al. CHASE domain-containing receptors play an essential role in the cytokinin response of the moss Physcomitrella patens. J Exp Bot. 2016;67:667–79.
Heyl A, Schmülling T. Cytokinin signal perception and transduction. Curr Opin Plant Biol. 2003;6:480–8.
Heyl A, Wulfetange K, Pils B, Nielsen N, Romanov GA, Schmülling T. Evolutionary proteomics identifies amino acids essential for ligand-binding of the cytokinin receptor CHASE domain. BMC Evol Biol. 2007;7:62.
Hothorn M, Dabi T, Chory J. Structural basis for cytokinin recognition by Arabidopsis thaliana histidine kinase 4. Nat Chem Biol. 2011;7:766–8.
Punwani JA, Hutchison CE, Schaller GE, Kieber JJ. The subcellular distribution of the Arabidopsis histidine phosphotransfer proteins is independent of cytokinin signaling. Plant J. 2010;62:473–82.
Mähönen AP, Bishopp A, Higuchi M, Nieminen KM, Kinoshita K, Törmäkangas K, et al. Cytokinin signaling and its inhibitor AHP6 regulate cell fate during vascular development. Science. 2006;311:94–8.
Ruszkowski M, Brzezinski K, Jedrzejczak R, Dauter M, Dauter Z, Sikorski M, et al. M. Truncatula histidine-containing phosphotransfer protein: structural and biochemical insights into cytokinin transduction pathway in plants. FEBS J. 2013;280:3709–20.
Hwang I, Sheen J. Two-component circuitry in arabidopsis cytokinin signal transduction. Nature. 2001;413:383–9.
TO JPC, Haberer G, Ferreira FJ, Deruère J, Mason MG, Schaller GE, et al. Type-a arabidopsis response regulators are partially redundant negative regulators of cytokinin signaling. Plant Cell. 2004;16:658–71.
Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K. The flowering world: a tale of duplications. Trends Plant Sci. 2009;14:680–8.
Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011;473:97–100.
Mandáková T, Joly S, Krzywinski M, Mummenhoff K, Lysak MA. Fast diploidization in close mesopolyploid relatives of arabidopsis. Plant Cell. 2010;22:2277–90.
Whelan S, Allen JE, Blackburne BP, Talavera D. ModelOMatic: fast and automated model selection between RY, nucleotide, amino acid, and codon substitution models. Syst Biol. 2015;64:42–55.
Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell. 2014;26:2792–802.
Tang H, Bowers JE, Wang X, Paterson AH. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci U S A. 2010;107:472–7.
Blanc G, Wolfe KH. Widespread Paleopolyploidy in Model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004;16:1667–78.
Estep MC, McKain MR, Diaz DV, Zhong J, Hodge JG, Hodkinson TR, et al. Allopolyploidy, diversification, and the miocene grassland expansion. Proc Natl Acad Sci U S A. 2014;111:15149–54.
Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, et al. Repeated polyploidization of gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012;492:423–7.
Tsai Y-C, Weir NR, Hill K, Zhang W, Kim HJ, Shiu S-H, et al. Characterization of genes involved in cytokinin signaling and metabolism from rice. Plant Physiol. 2012;158:1666–84.
Gruhn N, Halawa M, Snel B, Seidl MF, Heyl A. A subfamily of putative cytokinin receptors is revealed by an analysis of the evolution of the two-component signaling system of plants. Plant Physiol. 2014;165:227–37.
Pils B, Heyl A. Unraveling the evolution of cytokinin signaling. Plant Physiol. 2009;151:782–91.
Pires ND, Yi K, Breuninger H, Catarino B, Menand B, Dolan L. Recruitment and remodeling of an ancient gene regulatory network during land plant evolution. Proc Natl Acad Sci U S A. 2013;110:9571–6.
Jourda C, Cardi C, Mbéguié-A-Mbéguié D, Bocs S, Garsmeur O, D’Hont A, et al. Expansion of banana (Musa acuminata) gene families involved in ethylene biosynthesis and signalling after lineage-specific whole-genome duplications. New Phytol. 2014;202:986–1000.
Smith SA, Donoghue MJ. Rates of molecular evolution are linked to life history in flowering plants. Science. 2008;322:86–9.
Wang X, Wang J, Jin D, Guo H, Lee T-H, Liu T, et al. Genome alignment spanning major poaceae lineages reveals heterogeneous evolutionary rates and alters inferred dates for key evolutionary events. Mol Plant. 2015;8:885–98.
Rausher MD, Miller RE, Tiffin P. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 1999;16:266–74.
Lu Y, Rausher MD. Evolutionary rate variation in anthocyanin pathway genes. Mol Biol Evol. 2003;20:1844–53.
Olson-Manning CF, Lee C-R, Rausher MD, Mitchell-Olds T. Evolution of flux control in the glucosinolate pathway in arabidopsis thaliana. Mol Biol Evol. 2013;30:14–23.
Makarova KS, Wolf YI, Mekhedov SL, Mirkin BG, Koonin EV. Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell. Nucleic Acids Res. 2005;33:4626–38.
Moghe GD, Hufnagel DE, Tang H, Xiao Y, Dworkin I, Town CD, et al. Consequences of whole-genome triplication as revealed by comparative genomic analyses of the wild radish raphanus raphanistrum and three other brassicaceae species. Plant Cell. 2014;26:1925–37.
Lespinet O, Wolf YI, Koonin EV, Aravind L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 2002;12:1048–59.
Renny-Byfield S, Gallagher JP, Grover CE, Szadkowski E, Page JT, Udall JA, et al. Ancient gene duplicates in gossypium (cotton) exhibit near-complete expression divergence. Genome Biol Evol. 2014;6:559–71.
Zhou J, Lemos B, Dopman EB, Hartl DL. Copy-number variation: the balance between gene dosage and expression in drosophila melanogaster. Genome Biol Evol. 2011;3:1014–24.
Hudson CM, Puckett EE, Bekaert M, Pires JC, Conant GC. Selection for higher gene copy number after different types of plant gene duplications. Genome Biol Evol. 2011;3:1369–80.
Rody HVS, Baute GJ, Rieseberg LH, Oliveira LO. Both mechanism and age of duplications contribute to biased gene retention patterns in plants. BMC Genomics. 2017;18:46.
Kaltenegger E, Ober D. Paralogue interference affects the dynamics after gene duplication. Trends Plant Sci. 2015;20:814–21.
Baker CR, Hanson-Smith V, Johnson AD. Following gene duplication, paralog interference constrains transcriptional circuit evolution. Science. 2013;342:104–8.
Bridgham JT, Brown JE, Rodríguez-Marí A, Catchen JM, Thornton JW. Evolution of a new function by degenerative mutation in cephalochordate steroid receptors. PLoS Genet. 2008;4:e1000191.
Keshishian EA, Rashotte AM. Plant cytokinin signalling. Essays Biochem. 2015;58:13–27.
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178–86.
Matasci N, Hung L-H, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, et al. Data access for the 1000 plants (1KP) project. Giga Science. 2014;3:17.
Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D, et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–6.
Heyl A, Brault M, Frugier F, Kuderova A, Lindner A-C, Motyka V, et al. Nomenclature for members of the two-component signaling pathway of plants. Plant Physiol. 2013;161:1063–5.
Suzuki T, Miwa K, Ishikawa K, Yamada H, Aiba H, Mizuno T. The arabidopsis sensor his-kinase, AHk4, can respond to cytokinins. Plant Cell Physiol. 2001;42:107–13.
Schaller GE, Doi K, Hwang I, Kieber JJ, Khurana JP, Kurata N, et al. Nomenclature for two-component signaling elements of rice. Plant Physiol. 2007;143:555–7.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30.
Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden markov model: application to complete genomes. J Mol Biol. 2001;305:567–80.
Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8:785–6.
Käll L, Krogh A, Sonnhammer ELL. Advantages of combined transmembrane topology and signal peptide prediction--the phobius web server. Nucleic Acids Res. 2007;35:W429–32.
Suzuki T, Sakurai K, Imamura A, Nakamura A, Ueguchi C, Mizuno T. Compilation and characterization of histidine-containing phosphotransmitters implicated in his-to-asp phosphorelay in plants: AHP signal transducers of Arabidopsis thaliana. Biosci Biotechnol Biochem. 2000;64:2486–9.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
Urao T, Yakubov B, Yamaguchi-Shinozaki K, Shinozaki K. Stress-responsive expression of genes for two-component response regulator-like proteins in arabidopsis thaliana. FEBS Lett. 1998;427:175–8.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 2002;30:3059–66.
Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015;43:W7–14.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.
Gil M, Zanetti MS, Zoller S, Anisimova M. CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models. Mol Biol Evol. 2013;30:1270–80.
Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–36.
Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994;11:715–24.
Yap VB, Lindsay H, Easteal S, Huttley G. Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol. 2010;27:726–34.
Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol. 2011;60:685–99.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42.
Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981;53:131–47.
Schliep K, Potts AJ, Morrison DA, Grimm GW. Intertwining phylogenetic trees and networks. Methods Ecol Evol. 2017;8:1212–20.
Galili T. Dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31:3718–20.
Stolzer M, Lai H, Xu M, Sathaye D, Vernot B, Durand D. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinforma Oxf Engl. 2012;28:i409–15.
The Angiosperm Phylogeny Group. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181:1–20.
Zou X-H, Zhang F-M, Zhang J-G, Zang L-L, Tang L, Wang J, et al. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 2008;9:R49.
Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li L-T, et al. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 2013;14:R41.
Albert VA, Barbazuk WB, dePamphilis CW, Der JP, Leebens-Mack J, Ma H, et al. The Amborella genome and the evolution of flowering plants. Science. 2013;342:1467.
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319:64–9.
Proost S, Van Bel M, Vaneechoutte D, Van de Peer Y, Inzé D, Mueller-Roeber B, et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 2014;43:D974–81.
Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, et al. I-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 2012;40:e11.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.
Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, et al. Polyploid evolution of the brassicaceae during the cenozoic era. Plant Cell. 2014;26:2777–91.
Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008;18:1944–54.
We are grateful to Prof. Dietrich Ober and to Prof. Lawrence Hobbie for discussing the manuscript. We also thank Anne Kupczok of the bioinformatics support group of the University of Kiel for her advice with the phylogenetic analyses.
The project was funded by the Dahlem Center of Plant Sciences and the University of Kiel (promotion program for women in science).
Availability of data and materials
The data sets supporting the conclusions of this article are included within the article (and its Additional files).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
List of analysed species and used data source, and list of the analysed CHK encoding sequences. (XLSX 46 kb)
Collinear regions, Ks distances. (XLSX 357 kb)
Figure S1. ML Tree CHKs 1, Figure S2. Mr. Bayes CHKs, Figure S3. Reconciled ML tree CHKs, Figure S4. Reconciled ML tree HPTs, Figure S5. ML tree RRBs, Figure S6. Mr. Bayes tree RRBs, Figure S7. Cladogram comparison RRBs. (PDF 2407 kb)
Supplemental alignment of #CHKs, #HPTs, #RRAs and #RRBs. (FASTA 770 kb)
RF distances between phylogenetic trees of CHKs, HPTs, RRAs and RRBs. (XLSX 15 kb)