Gene duplication is considered a major mechanism for generating evolutionary novelty and adaptation [23, 57, 58]. In plants, gene duplication followed by functional divergence is particularly important for the diversification of biochemical metabolites [9, 36, 37, 59]. Isoprenoids are a large and diverse class of metabolites derived from five-carbon isoprene units. They can be classified as regular or irregular according to the bond between isoprene units, or as monoterpenes, sesquiterpenes and diterpenes according to the number of isoprene units [1, 16, 61]. FDSs are involved in the biosynthesis of sesquiterpenes and are encoded by a small gene family. This gene family appears to have experienced lineage-specific expansions multiple times. For example, the two copies from Arabidopsis thaliana formed a species-specific clade, and the FDS copies from Oryza sativa and Sorghum bicolor formed a Poaceae-specific clade (Additional file 3). Our phylogenetic analyses (Figure 1B) showed that two rounds of gene duplication occurred in the evolutionary history of the Asteraceae FDS gene family. The first round appears to have occurred in the common ancestor of the Asteraceae, since the genes from Asteraceae formed a monophyletic group separate from the clusters of species from other families (Figure 1B and Additional file 3), including species of two families closely related to Asteraceae (Nymphoides peltata of Menyanthaceae and Platycodon grandiflorus of Campanulaceae; The Compositae Genome Project, personal communication). Because FDSs function in sesquiterpene biosynthesis, the FDS gene duplications in Asteraceae might have contributed to the diversity of sesquiterpenes in this family. This is consistent with the large number of sesquiterpenes that have been extracted from Asteraceae [62, 63].
The second duplication, which generated the CDS lineage, occurred after the divergence of the Mutisieae from the other tribes of Asteraceae and before the divergence of the tribe Anthemideae. The evidence is as follows: 1) one FDS copy in G. anandria (Mutisieae) clustered with FDS1, while the other clustered with the ancestor of FDS2 and CDS; 2) all of the sampled Anthemideae species had three FDS copies (FDS1, FDS2 and CDS); and 3) CDS was also cloned from Aster ageratoides (tribe Astereae) and Helianthus annuus (tribe Heliantheae), which are close relatives of the tribe Anthemideae [39, 41]. After its origin, CDS acquired a new function in the biosynthetic pathway of irregular monoterpenes. The CDS gene is common in the tribe Anthemideae, which is consistent with the fact that its products are typically found in Anthemideae species. Our results suggest that the duplication and divergence of FDS genes have played a major role in the origin of irregular monoterpenes in Anthemideae.
After gene duplication, CDS accumulated amino-acid changes associated with a change of substrate. Relative to FDS, CDSs carry four substitutions in the substrate-binding and catalytic sites: T201 → G244 (or S244), F239 → Y281, D243 → N285, and R351 → G393. Substitutions can be classified as either radical or conservative, based on the biochemical properties of the amino acids [64–67]. For example, substitutions that change the polarity group are defined as radical, and those that leave the polarity group unaltered as conservative [65, 67]. In the present study, all the substitutions except T201 → S244 are radical, which is consistent with the view that the evolution of a new function requires alterations in the biochemical properties of the amino-acid sequence. F239 and R351 in FDS are involved in binding the nucleophilic substrate IPP [30, 31]. The radical replacements at these two sites in CDS agree well with the finding that CDS does not prefer IPP as a nucleophilic reagent. F239 in FDS binds IPP through hydrophobic interactions [30, 31]. The corresponding residue in CDS is Y281, which, owing to the polarity of its hydroxyl group, may not interact with IPP in this way. R351 in FDS interacts with the pyrophosphate moiety of IPP through water-mediated hydrogen bonds [30, 31] (Figure 2B). The radical replacement R351 → G393 changes both the charge (R is positively charged and G is nonpolar) and the molecular volume (R has a larger side chain than G), which could affect IPP binding. Hence, these two substitutions might explain why CDS does not prefer IPP as a substrate.
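The radical/conservative classification described above can be illustrated with a short sketch. The four-group scheme used here (nonpolar, polar uncharged, acidic, basic) is a common textbook grouping and is an assumption; the classification schemes of [64–67] may draw the group boundaries differently. The function and variable names are hypothetical. Under this assumed grouping, the sketch labels only T201 → S244 as conservative.

```python
# Assumption: a common textbook four-group scheme by side-chain polarity and
# charge. The schemes cited in the text [64-67] may define groups differently.
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar_uncharged": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}

# Map each one-letter amino-acid code to its group name.
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}

# The FDS -> CDS substitutions listed in the text (residue letter + position).
SUBSTITUTIONS = [
    ("T201", "G244"),
    ("T201", "S244"),
    ("F239", "Y281"),
    ("D243", "N285"),
    ("R351", "G393"),
]


def classify(fds_residue: str, cds_residue: str) -> str:
    """Return 'conservative' if both residues share a group, else 'radical'."""
    same_group = AA_TO_GROUP[fds_residue] == AA_TO_GROUP[cds_residue]
    return "conservative" if same_group else "radical"


def classify_all(pairs=SUBSTITUTIONS) -> dict:
    """Classify each substitution by the first letter (the residue) of each label."""
    return {f"{fds}->{cds}": classify(fds[0], cds[0]) for fds, cds in pairs}


if __name__ == "__main__":
    for label, kind in classify_all().items():
        print(f"{label}: {kind}")
```

Changing the contents of GROUPS is enough to test how sensitive the radical/conservative labels are to the choice of scheme.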
Substitutions can change the function of duplicated genes, and may result from either a relaxation of purifying selection or the action of positive selection [19, 23, 32, 33]. Branch-site model A provided evidence of positive selection acting at 29 sites (posterior probability > 80%) along the branch ancestral to CDS (Figure 2A). Interestingly, Y281 (F239 in FDS) and N285 (D243 in FDS), noted above, were among the sites found to be under positive selection by the branch-site model, suggesting an important role for positive selection in the functional evolution of CDS. The biochemical context of the substitutions under positive selection is consistent with a scenario involving the adaptive evolution of CDS. These sites (posterior probability > 80%) were scattered throughout the primary sequence (Figure 2A), whereas in the three-dimensional structure (Figure 2C) they clustered in the large central cavity. Among these sites, two (102M and 103V) were located in conserved region I, and nine (279M, 281Y, 285N, 290T, 293D, 295D, 300T, 306E and 307C) were located in conserved region V. The corresponding positions are all conserved in the FDS gene family and are important for the precise function of the protein [27–31]. The substitutions at these sites suggest that their roles in enzymatic activity have changed in CDS. A few of the sites inferred to have experienced positive selection with high probability may be responsible for the novel function of CDS. Further studies using site-directed mutagenesis are needed to determine whether these positively selected sites, especially those with high posterior probability (103V, 218T and 290T), enable CDS to discriminate between different substrate types.
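For readers unfamiliar with how a branch-site test is usually evaluated, the sketch below shows the standard likelihood-ratio calculation. It assumes the usual comparison of branch-site model A against a null model with the foreground ω fixed at 1; the specific settings used in this study are not restated here. The function name and the log-likelihood values in the usage lines are hypothetical and are not results from this study.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float):
    """Likelihood-ratio test for a nested pair of models differing by one parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and two p-values:
    one from a chi-square distribution with 1 degree of freedom, and one from
    a 50:50 mixture of a point mass at zero and chi-square(1).
    """
    # Numerical noise can make the statistic slightly negative; clamp at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Under the mixture, half of the null mass sits at zero.
    p_mixture = 0.5 * p_chi2 if stat > 0 else 1.0
    return stat, p_chi2, p_mixture


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p_chi2, p_mixture = branch_site_lrt(lnl_null=-1005.0, lnl_alt=-1000.0)
    print(f"2*dlnL = {stat:.2f}, p(chi2, df=1) = {p_chi2:.4g}, p(mixture) = {p_mixture:.4g}")
```

Only when this test rejects the null model are the per-site posterior probabilities, such as the 80% threshold used above, normally interpreted as evidence of positive selection at individual sites.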
It has been proposed that positive selection promotes the functional divergence of gene family members encoding enzymes involved in secondary metabolism, because secondary products are considered to be a response to challenges imposed by the environment [36, 37]. For example, the methylthioalkylmalate synthase gene (MAM) controls an early step in the biosynthesis of glucosinolates, which play an important role in the defense of Arabidopsis thaliana and other crucifers against herbivorous insects. Benderoth et al. found that positive selection had driven the evolution of MAM2, which originated from a lineage-specific duplication of MAMa in A. thaliana. Another example is the SABATH gene family of methyltransferases, which encodes enzymes catalyzing the formation of a variety of secondary metabolites in plants, such as those that contribute to floral scent and plant defense. Branch-site analysis suggested that positive selection for a single amino-acid change promoted the substrate discrimination of salicylic acid methyltransferase. Here, we provide an additional example in which positive selection has promoted the functional divergence of duplicated genes in a secondary metabolic pathway. The adaptive evolution of the CDS gene at the molecular level is consistent with the adaptive roles of its products (irregular monoterpenes), which play an important role in plant survival.
Many models have been proposed to explain the evolutionary fates of duplicated genes, including the neofunctionalization, duplication-degeneration-complementation (or subfunctionalization) and escape from adaptive conflict (EAC) models [18–25]. In contrast to CDS, in which positive selection was detected, FDS2, the sister copy of CDS, did not show any signature of positive selection. The evolution of FDS2 and CDS therefore does not appear to be consistent with the predictions of the EAC model, under which both duplicated copies would evolve under positive selection [24, 25]. Notably, CDS has a ~50-amino-acid extension at its N-terminus, which was identified as a plastidial transit peptide, in agreement with the Category II-c model (Gene duplication with a modified function). However, Blast searches showed that the transit peptide of CDS shares little similarity with any other database entry. Further work, including functional analysis and investigation of the origin of the CDS transit peptide, would provide insights into the evolutionary fate of the FDS gene family in Asteraceae.