Our analyses of protein-coding sequences highlighted lineage- and gene-specific evolutionary patterns in two species of Cardamine characterized by distinct habitat preferences, life-histories and, possibly, breeding systems. C. resedifolia and C. impatiens are not sister species ; hence, the substitution rates estimated in the present study apply to the lineages leading to the two species after the split from their common ancestor, rather than to these specific taxa as such. However, C. resedifolia, C. impatiens, and the clades they belong to, are characterized by divergent phenotypic and life-history traits that have profoundly shaped their evolutionary histories; for instance, the clade comprising C. resedifolia (group A in Figure 1 of Carlsen et al. ) is basal to the Cardamine radiation and includes only perennial species living in high altitude alpine habitats with well-developed petals indicating the existence of a mixed mating system. In contrast, C. impatiens is, to our knowledge, the only annual selfing species of its group (group C in Figure 1 of Carlsen et al. ), which includes species that are mainly found at moderate elevations. C. impatiens is further characterized by having very reduced or no petals, in agreement with a predominantly selfing mating system [29, 30]. Thus, the observed patterns of molecular evolution reflect the differences in life history traits and habitats between the two lineages.
The effects of such differences on the rate of molecular evolution are quite complex. For instance, the rate of non-synonymous substitution was significantly higher for the C. resedifolia lineage than for that of C. impatiens. Previous studies have reported a higher substitution rate in annual plants compared to perennial plants [52–54] (but see ). However, our observations are in contrast with this prediction, since the annual C. impatiens had lower substitution rate than the perennial C. resedifolia. Moreover, since the rate of synonymous substitution was similar between lineages, there is no support for the hypothesis that mutation rate is higher along the C. resedifolia lineage than along that of C. impatiens.
The interspecific differences in non-synonymous substitution rates could also be the result of differences in effective population size [56, 57]. For instance, low effective population size can affect the substitution rate by allowing slightly deleterious mutations to escape purifying selection and reach fixation (reviewed in ). Alternatively, high effective population size can increase the efficiency of positive selection by facilitating the fixation of favorable alleles in genes undergoing adaptive evolution . Thus, a high dispersion in the distribution of d
N across the genome is expected to be associated with a large effective population size, especially when there is recombination (e.g. [49, 59]). The analysis of non-synonymous substitution fixation rate revealed that the variance in d
N was significantly larger in C. resedifolia than in C. impatiens (6.6 × 10-15
vs. 9.2 × 10-15; F test, P < 10-15, also when correcting for gene length). This difference in protein evolution heterogeneity is thus consistent with more efficient positive selection and higher effective population size in C. resedifolia. Both of these factors, in turn, would explain why d
N is higher in this lineage than in C. impatiens. Preliminary analyses of recombination levels and polymorphism suggest indeed that selfing rates are lower in C. resedifolia, and the effective population size is slightly larger than in C. impatiens (Ometto and Varotto, unpublished results).
Intriguingly, mean ω was larger in C. impatiens than in C. resedifolia. The fact that in C. impatiens ω was considerably lower than one is compatible with either moderately positive or relaxed selection. Three lines of evidence suggest a reduced efficiency of purifying selection in C. impatiens, rather than pervasive positive selection at a genome-wide level. Firstly, the analyses on d
N suggest that positive selection was more efficient along the C. resedifolia lineage than the C. impatiens lineage. Secondly, only one gene (using a FDR threshold of 0.20; Table 2; Additional File 5) exhibited strong evidence for site-specific positive selection along the C. impatiens branch. By comparison, seven genes exhibited strong evidence for site-specific positive selection along the C. resedifolia branch. Finally, the analysis of codon usage failed to detect differences in the efficiency of positive selection between the two lineages. However, the contrast between rates of non-synonymous substitution and of ω warrants some caution in drawing firm conclusions. The most likely explanation for the opposite patterns observed in d
N and d
S lies on the high heterogeneity in substitution rate among genes. For instance the correlations and the coefficients of determination in the regressions between d
S and ω (ρ = -0.314, P < 10-15, R2 = 0.052 in C. resedifolia; ρ = -0.292, P < 10-15, R2 = 0.037 in C. impatiens) and between d
N and ω (ρ = 0.880, P < 10-15, R2 = 0.326 in C. resedifolia; ρ = 0.879, P < 10-15, R2 = 0.305 in C. impatiens), suggest that substitution rates are relatively weak predictors of the actual selective pressure experienced by a gene. The statistically significant correlation between synonymous substitution rate and ω is rather unusual: previous studies established that the correlation is caused by a combination of (i) adjacent substitutions  and (ii) of a bias on substitution rate estimation due to either low divergence and/or statistical methods [60, 61]. Appropriate analyses will be necessary to unravel the effects of such factors in our system. The complex dynamics of substitution rates are also evident in the statistically significant correlation to gene length. The influence of gene length on sequence evolution had been previously reported for Populus tremula , stressing the importance of accounting for all diverse determinants of levels of gene and protein evolution. Additional studies, including analyses on intraspecific polymorphism, will be certainly necessary to disentangle the neutral and selective forces that have shaped such patterns [62, 63].
The molecular evolution analyses in stress-related genes also revealed important lineage-specific patterns that may be associated with the distinct life-history traits and habitats of C. resedifolia and C. impatiens. For instance, similar selective regimes affected the evolution of genes involved in stress response in the C. impatiens lineage. Conversely, there was a correlation between the type of stress response and the rate of molecular evolution along the C. resedifolia lineage; for example, genes involved in photosynthesis evolved slower than other genes, consistent with selective constraints that limited the accumulation of non-synonymous substitution. This makes sense given their involvement in a process that is particularly relevant in the high altitude habitat of C. resedifolia. Strangely, in C. resedifolia selection acted in opposite directions in the two functional classes associated with cold responses. That is, compared to the genome-wide levels of selective pressure, genes that were identified as involved in cold response based on functional studies (i.e., CRG genes) displayed significantly lower levels of selective pressure, while cold responsive (CGO) genes were under more selective constraints. Since there was little overlap between the two classes (Additional File 1), the discrepancy in the levels of selective pressure between CGO and CRG genes may have captured different selective pressures acting along the cold response pathway. However, the difference may also stem from the approaches used to define the two cold-response functional classes. Specifically, the gene ontology annotations of the CGO genes were taken from the TAIR database , which is a less accurate source of functional information than the direct assays used to define CRG genes. For instance, the TAIR annotations may be based solely on sequence or structural similarity, and a gene may have been annotated as cold responsive because it is indirectly up- or down-regulated in plants exposed to low temperatures (e.g. more than 3,000 genes were reported as cold responsive in A. thaliana ). However, without a comprehensive knowledge on the response pathway it is difficult to evaluate their role in cold response in Cardamine, especially considering that their expression patterns may differ from those of A. thaliana.
The levels of selective pressure were also associated with the duration and pattern of gene expression. For instance, in C. impatiens and C. resedifolia, ω was lower for genes that were expressed only briefly during specific stress responses than for genes that were up-regulated over a longer period of time (as inferred by A. thaliana expression patterns). Most importantly, the correlation existed only for specific stress responses in each species, suggesting habitat-specific selective pressures. In particular, in C. resedifolia, ω was significantly affected by the duration of the responses to osmotic, salt and cold stress. The responses to these three stresses involve partially overlapping pathways [65–67], as cold stress causes membrane leakage and, as a consequence, the activation of the physiological responses also observed in high salt and osmotic stresses . These results support the adaptive relevance of these genes in response to the severe temperature changes that this species experiences in alpine habitats. On the other hand, in C. impatiens ω was significantly correlated with the extent of up-regulation in those genes involved in UV-B stress response. C. impatiens is a nemoral species that is likely to be particularly sensitive to the effect of exposure to UV-B light. Therefore, it is reasonable to expect that this species has evolved response mechanisms based on gene up-regulation to cope with discontinuous UV stress. It will be extremely interesting to experimentally measure the expression of the Cardamine genes in the functional classes considered, as species-specific adaptive variations in expression levels cannot be excluded.
While unequal selective pressures across the stress-related functional classes were well-documented here, we detected positive selection in only a few single genes. In plants, previous studies have found evidence of genome-wide positive selection in some species (e.g. [69–71]), but in most cases there was little indication of widespread adaptive evolution (e.g. [13, 72–75]). Other studies have identified positive selection of some genes involved in stress response (e.g. cold-hardiness in conifers ; drought stress in wild tomatoes ). The most likely explanation for the low rate of adaptive evolution is that most plants have low effective population size [13, 76], which ultimately diminishes the efficiency of positive selection. Estimation of the effective population size of C. resedifolia and C. impatiens will be useful to verify whether this parameter can explain the scarce evidence for positive selection in our dataset.
Four methodological issues may have further resulted in an underestimation of the genes targets of positive selection. The first is that the power of codon substitution models is fairly conservative compared to that of models incorporating polymorphism data (i.e., McDonald-Kreitman test ). The second is that our reciprocal best-hit approach was prone to miss genes with high sequence divergence caused by adaptive evolution, as it was intended to avoid an overestimation of divergence by comparing paralogues. In particular, this approach may have overlooked duplicated genes, which are common in A. thaliana [78, 79] and in other plants (e.g. cottonwood , grapevine ), and which can undergo sub- or neo-functionalization driven by positive selection . For instance, our dataset included only ~11% of all A. thaliana genes, and we analyzed between 17% and 48% of the genes annotated as involved in the stress responses considered in the present study. An examination of our dataset revealed that many of the well-characterized transcription factors involved in cold response (e.g. CBF genes, ICE1) were absent. This indicates either that we are missing many of the genes upstream of the stress response pathways, or that in Cardamine these transcription factors may be differently regulated than in Arabidopsis thaliana or do not participate to the cold response. However, it is unlikely that this pathway is missing, since cold response is well-conserved across distantly related taxa (e.g. Arabidopsis ; Citrus ; Solanum ; Poaceae ). A third issue is related to the use of partial genes, which may have reduced the power of the likelihood ratio tests as a result of an insufficient number of informative sites. Finally, genes expressed at a low level, including the aforementioned transcription factors, may be missing from our dataset because of insufficient coverage or normalization. Because genes expressed at a low level are those evolving faster, this bias in gene representation could contribute to the relatively low number of rapidly evolving genes identified in this study. Deeper sequencing efforts could undoubtedly improve the situation by increasing both coverage and average transcriptome length.
One C. impatiens gene and seven C. resedifolia genes showed signatures of positive selection under the branch-sites model of codon substitution. The C. impatiens gene orthologue of AT4G17520 has not been functionally characterized in A. thaliana, although it is known that the protein encoded by this genes displays homology to the hyaluronan/mRNA binding protein family, a group of proteins binding both specific RNA and a high-molecular-mass polysaccharide extremely abundant in the connective tissue and extracellular matrix of animals . Specific studies will be necessary to understand its role in adaptive processes in the C. impatiens lineage. As for the C. resedifolia candidate genes, no functional information is available for the orthologue of AT1G21680, which codes for a protein of unknown function with homology to TolB, a protein involved in outer membrane stability and uptake of biomolecules in E. coli . Instead, AT3G06130 is known to code for a protein putatively involved in metal ion binding, and is similar to proteins of the heavy metal transport/detoxification superfamily. Heavy metal hyperaccumulation has been documented in several Brassicaceae species  and C. resedifolia has been reported to accumulate large quantities of nickel from the nickel-rich debris of the glacial till, where this species usually grows . Therefore, it is possible that the signature of positive selection identified in this orthologue is related to high contents of heavy metals in soil experienced by C. resedifolia. A third gene, orthologue of AT1G14610, is a valine-tRNA synthetase. Given the multiple metabolic pathways in which valine is involved (e.g. glucosinolate  and pantothenate biosynthesis , conjugation to plant hormones [93, 94]), it would be highly speculative to associate this gene to adaptive processes in the C. resedifolia lineage.
Based on functional evidence from Arabidopsis and other plant species, two other genes identified as putative targets of positive selection in C. resedifolia may play major roles in the response against bacterial pathogens and insect herbivory. The first gene, AT5G20900, is part of the Jasmonate-ZIM-domain protein family [95, 96], whose members are involved in response to wounding and herbivory . The other candidate is the orthologue of AT1G54040, which codes for the Arabidopsis epithiospecifier protein (ESP). ESP catalyzes the formation of simpler nitriles and epithionitriles from glucosinolates, thus modulating the release of isothiocyanate, a metabolite involved in herbivory and pathogen defense in Brassicaceae . In two closely related Boechera species (Brassicaceae), the level of glucosinolates is negatively correlated with elevation preferences and growth rates, but positively correlated with drought tolerance . However, the response to herbivory as a function of elevation is not uniform across plants, with some species experiencing more (e.g. ) and other less (e.g. [101–103]) damage with increase in elevation. Interestingly, a positive correlation with levels of insect herbivory was found for C. cordifolia exposed to full sunlight, possibly resulting from moderate water stress associated with a different insect guild compared to shadowy environments . These observations provide the framework to experimentally test whether the signature of positive selection identified in AT5G20900 and AT1G54040 orthologues might be related to the higher exposition to sun, lower water availability or slower growth rate characterizing C. resedifolia as compared to C. impatiens. Interestingly, the orthologues of AT1G07890 and AT3G52910, may also be related to the light regimes characterizing C. resedifolia and C. impatiens habitats. AT1G07890 codes for a cytosolic ascorbate peroxidase (APX1) that scavenges hydrogen peroxide in plant cells , thus reducing the accumulation of reactive oxygen species (ROS) that cause cellular damage through protein oxidation . The product of AT3G52910 is a transcriptional activator of the growth regulating factor family. Genes from this family have been demonstrated to be involved in the regulation of cell expansion and division in leaves, cotyledons and petals [107, 108]. Strikingly, the product of AT3G52910 is one of the proteins oxidized in apx1 mutant plants in response to moderate light stress , indicating that it could also be part of the signaling cascade activated by ROS stress.
The site codon substitution models also identified putative targets of positive selection that deserve further characterization. In particular, the orthologue of AT2G31610 codes for a ribosomal protein (RPS3A) that is involved in response to salt and genotoxic stress in A. thaliana . This gene is part of the ribosomal protein S3 family, which includes three paralogues (AT2G31610, AT3G53870 and AT5G35530) with similar sequences and function. Gene families can undergo relaxed selection immediately following duplication ; however, this does not seem to be the case for AT2G31610, as the phylogenetic tree clearly indicates that the duplication occurred before the split between the Arabidopsis and Cardamine lineages (Additional File 6).
Additional studies will be necessary to corroborate these findings and link the evolutionary pattern of each gene to its phenotypic effects. In particular, the use of intraspecific variation and functional analyses in the model species A. thaliana will be crucial to ascertaining whether positive selection or relaxed selection accelerated the evolution of these genes and their relevance in adaptive processes in Cardamine.