The prevalence of adaptive evolution in natural populations is one of the most widely investigated questions in evolutionary genetics. The long-held view that the vast majority of mutations are either neutral or strongly deleterious has recently been called into question by evidence to the contrary. Several model organisms, including Drosophila melanogaster, Mus musculus, Escherichia coli, Capsella grandiflora, and several Helianthus species, are estimated to have a large proportion (40–50%) of amino acid divergence driven to fixation by positive selection. Estimates for other organisms obtained with comparable approaches are much lower, and thus more consistent with the neutral theory; these include humans [7, 8] and Arabidopsis thaliana [5, 9–11]. The presence and prevalence of species-wide fixations of beneficial mutations across genomes therefore appear to vary among taxa, and are currently a focal point of interest in the field of molecular evolution.
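Proportions like the 40–50% quoted above are conventionally written as α, the fraction of amino acid substitutions fixed by positive selection. As a hedged illustration only (we do not claim it is the exact method behind the cited estimates), the sketch below implements one widely used McDonald–Kreitman-style estimator, α = 1 − (D_s·P_n)/(D_n·P_s), where D_n and D_s are nonsynonymous and synonymous fixed differences between species and P_n and P_s are the corresponding polymorphisms within a species. The function name and all counts are hypothetical.

```python
def adaptive_fraction(dn: int, ds: int, pn: int, ps: int) -> float:
    """Hypothetical helper: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    # Under strict neutrality Dn/Ds equals Pn/Ps, so alpha is 0; an excess of
    # nonsynonymous divergence relative to polymorphism pushes alpha above 0.
    return 1.0 - (ds * pn) / (dn * ps)
```

With the made-up counts D_n = 100, D_s = 200, P_n = 30, P_s = 100, the function returns 0.4, i.e. 40% of amino acid divergence attributed to positive selection; raising P_n to 50 gives 0, the neutral expectation. Because slightly deleterious nonsynonymous polymorphisms inflate P_n, this simple form tends to underestimate α, which is why published analyses generally use more elaborate variants.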
One possible reason for differences in the amount of adaptive evolution between species is a difference in effective population size [1, 12]. Effective population size influences the substitution rate of beneficial mutations: smaller populations have lower rates of adaptive substitution than larger ones, and they also fix more slightly deleterious mutations by genetic drift. Both effects reduce the efficiency and prevalence of positive and purifying selection in the genome. The genetic model plant Arabidopsis thaliana has been shown to have less efficient positive and purifying selection than its close relative Capsella grandiflora, a result that is consistent with the stronger population structure, recent range expansion, and lower effective population size of A. thaliana. A comparison of another genetic model system, the house mouse (Mus musculus), with humans showed a similar pattern, in line with their difference in effective population size, and an analysis of six sunflower (Helianthus) species showed a positive correlation between effective population size and the rate of adaptive evolution by positive selection.
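The dependence of substitution rates on effective population size can be made concrete with Kimura's diffusion approximation for the fixation probability of a new additive mutation, u(s) = (1 − e^(−2s)) / (1 − e^(−4Ns)), starting at frequency 1/(2N). The substitution rate relative to a neutral mutation is then 2N·u(s). The sketch below assumes, for simplicity, that effective and census size are equal; the function names and the example values of s and N are hypothetical.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for a new additive mutation (initial freq 1/(2N)).

    Simplifying assumption: effective size equals census size n.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: probability equals initial frequency
    try:
        # expm1 keeps precision when 2s or 4Ns is small:
        # (1 - e^-2s) / (1 - e^-4Ns) == expm1(-2s) / expm1(-4Ns)
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # Strongly deleterious (4Ns very negative): fixation is effectively impossible.
        return 0.0


def relative_substitution_rate(s: float, n: int) -> float:
    """Substitution rate relative to a neutral mutation: 2N * u(s)."""
    return 2 * n * fixation_probability(s, n)
```

For a weakly beneficial mutation with s = 1e-4, the relative rate is about 1.2 when N = 1,000 but about 40 (roughly 4Ns) when N = 100,000. For a slightly deleterious mutation with s = −1e-4 it is about 0.81 at N = 1,000 but vanishingly small at N = 100,000, which is the sense in which small populations fix more slightly deleterious mutations and fewer beneficial ones.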
In addition to affecting the rates of positive and negative selection, differences in effective population size between species can influence the strength and impact of balancing selection. On the one hand, severe reductions in effective population size could lead to a loss of the diversity that is maintained by balancing selection, which could have important deleterious consequences. For instance, balancing selection at the major histocompatibility complex (MHC) is critical for the ability of vertebrates to maintain resistance to parasites (fish; prairie chickens; deer mice), and balancing selection has also been shown in a number of cases at plant disease resistance (R) loci (Arabidopsis). Strong reductions in effective size could therefore greatly reduce the capacity of populations to maintain resistance. On the other hand, strong balancing selection, either past or present, could maintain high polymorphism at specific loci in heavily bottlenecked populations, producing a pronounced retention of diversity in particular regions of the genome despite a genome-wide loss of diversity. At the MHC, several cases of striking retention of diversity through severe bottlenecks have been found [16, 19]. In other cases, a loss of balancing selection has been observed at the MHC. A similar pattern has been observed at the plant self-incompatibility (SI) locus: long-term allelic variation maintained by balancing selection at the SI locus was lost following an ancient population bottleneck in the Solanaceae.
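The genome-wide loss of diversity against which balancing selection would have to act follows the standard neutral expectation for drift: each generation, expected heterozygosity is multiplied by (1 − 1/(2N_e)), so H_t = H_0 ∏(1 − 1/(2N_e,i)) over generations i. The sketch below ignores mutation and selection, and the bottleneck sizes and durations in the usage note are hypothetical; it only illustrates why a brief period of very small N_e can erase far more neutral variation than a long history at large N_e.

```python
from math import prod


def expected_heterozygosity(h0: float, ne_by_generation: list[float]) -> float:
    """Neutral drift only: H_t = H_0 * prod(1 - 1/(2 * Ne_i)) over generations.

    Ignores mutation and selection; Ne values are supplied per generation.
    """
    return h0 * prod(1 - 1 / (2 * ne) for ne in ne_by_generation)


# Hypothetical histories of 1,000 generations each.
stable = [500_000] * 1000                 # large, constant Ne
bottleneck = [10] * 20 + [500_000] * 980  # 20 generations at Ne = 10, then recovery

h_stable = expected_heterozygosity(1.0, stable)          # about 0.999
h_bottleneck = expected_heterozygosity(1.0, bottleneck)  # about 0.36
```

A locus under strong balancing selection violates the neutral assumption of this calculation, which is why it could retain diversity through the same bottleneck that removes most neutral variation.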
Here, we investigate the comparative population genetics of a set of disease resistance (R) genes in two plant species of the Brassicaceae, Capsella grandiflora and Capsella rubella. Capsella grandiflora is an annual, self-incompatible herb that is closely related to the genetic model Arabidopsis thaliana (~20 MYA divergence time). Capsella rubella, a recently diverged relative, is self-fertilizing and has experienced a severe population bottleneck. The bottleneck and the change in mating system resulted in a major reduction in genetic diversity and effective population size. The recency of speciation, coupled with the severity of the diversity reduction, makes these two species a useful system in which to explore the evolutionary fate of genes under selection following a dramatic shift in genetic background.
Capsella grandiflora is native to western Greece, and its geographic range is largely restricted to this area, apart from small populations in Albania and northern Italy [29, 30]. Its effective population size is large, approximately 500,000 individuals, and appears to have been relatively stable over a long period, as it shows no evidence of recent changes in population size. There is also relatively little population structure in this species, and the effective recombination rate is high [28, 31]. As noted above, selection has been inferred to be highly efficient in this species, with over 40% of amino acid divergence attributed to positive selection.
Capsella rubella diverged from C. grandiflora in a single event that is estimated to have taken place within the last 20,000 years [28, 32]. Speciation was associated with the breakdown of self-incompatibility in C. rubella, and this species has evolved to be highly self-fertilizing. The transition in mating system was followed by a geographic range expansion throughout most of southern and central Europe, as well as North Africa, Australia, and North and South America [29, 30]. Genetic diversity is greatly reduced in C. rubella compared to C. grandiflora, even more than would be expected from inbreeding alone, owing to a nearly complete population bottleneck. Capsella rubella therefore has a much smaller effective population size than C. grandiflora, approximately 100- to 1500-fold smaller, as well as a lower effective recombination rate. Together, these two species represent a recent, rapid, and dramatic shift in genomic characteristics, including a mating system transition, a reduction in genetic diversity and effective population size following a population bottleneck, and a recent widespread expansion in geographic range. Despite this severe bottleneck, however, the retention of polymorphism varies strongly among loci in C. rubella [28, 31]. One possible explanation for this heterogeneity is that ongoing and/or historical balancing selection has led to a higher retention of diversity in a subset of genomic regions.
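Why inbreeding alone cannot explain the reduction in C. rubella can be checked with a short calculation. Under partial selfing at rate S, the equilibrium inbreeding coefficient is F = S/(2 − S), and effective size scales as N_e = N/(1 + F), so even complete selfing (F = 1) at most halves N_e. The sketch below uses the approximate figures given in the text (N_e ≈ 500,000 for C. grandiflora and a 100- to 1500-fold reduction in C. rubella); the function names are hypothetical.

```python
def equilibrium_inbreeding(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = S / (2 - S) under selfing rate S."""
    return selfing_rate / (2.0 - selfing_rate)


def ne_with_selfing(ne_outcrossing: float, selfing_rate: float) -> float:
    """Ne = N / (1 + F); inbreeding alone reduces Ne by at most a factor of 2."""
    return ne_outcrossing / (1.0 + equilibrium_inbreeding(selfing_rate))


NE_GRANDIFLORA = 500_000   # approximate value quoted in the text
FOLD_RANGE = (100, 1500)   # fold reduction in C. rubella quoted in the text

# Implied C. rubella effective size: 5,000 down to about 333.
ne_rubella_range = [NE_GRANDIFLORA / f for f in FOLD_RANGE]

# Fold reduction expected from complete selfing alone.
inbreeding_only = NE_GRANDIFLORA / ne_with_selfing(NE_GRANDIFLORA, 1.0)  # 2.0

# Remaining reduction not explained by inbreeding.
unexplained = [f / inbreeding_only for f in FOLD_RANGE]  # [50.0, 750.0]
```

The remaining 50- to 750-fold reduction must therefore come from processes other than inbreeding, which the text attributes to the nearly complete population bottleneck.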
The genes we investigated here are a subset of the genes thought to be involved in plant immune system function, the disease resistance (R) genes. These genes are abundant in every plant species investigated to date and can be subdivided into classes based on their coding domains. The largest class is characterized by a nucleotide-binding site combined with a region of leucine-rich repeats, referred to as the NBS-LRR region, which is thought to be the site of pathogen recognition. R-genes typically act through a gene-for-gene interaction [35, 36], whereby each gene in the plant specifically recognizes an avirulence (avr) gene in the pathogen, and recognition triggers a defense response in the plant.
Evidence for natural selection on plant R-genes, including positive and balancing selection, has been well documented for several well-characterized genes in the genetic model Arabidopsis thaliana, including RPM1, RPS2, RPS4, RPS5, RPP1, RPP13, and RPP8 [20, 22, 37–41]. Most of the clear evidence for selection at these genes points to balancing selection, the exception being RPS4, which has undergone a recent selective sweep. Evidence for balancing selection has also been found in Arabidopsis lyrata [42, 43], and the types of selection acting at the same R-genes differ between A. thaliana and A. lyrata. For example, A. lyrata carries a segregating presence-absence polymorphism at the R-gene RFL1 that is not observed at its A. thaliana counterpart, RPS5. Likewise, polymorphism patterns at RPP13 indicate positive selection in A. thaliana but purifying selection in A. lyrata. Plant R-genes often segregate for alleles that confer either resistance or susceptibility to a specific pathogen. Under balancing selection, both alleles are maintained over long periods. The proposed mechanism is frequency-dependent selection, in which resistance alleles are advantageous when the pathogen is common but incur a fitness cost when it is rare. The result is a cycle in which resistance and susceptibility alleles alternate in frequency, following the dynamics of the pathogen population [20, 22]. Although it has not yet been demonstrated directly, balancing selection could also arise when alleles at a single locus differ in their specificity to different pathogen strains and are subject to frequency-dependent selection. On the other hand, R-genes may also experience relaxed constraint when their target pathogens are absent and there is no cost of resistance. Large surveys of R-genes generally show clearer evidence for balancing selection than for positive selection, although only at a subset of loci [40, 45]. Overall, the patterns in Arabidopsis species indicate that new R-gene alleles are constantly being generated but only briefly maintained, a scenario closer to diversifying selection.
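The cycling of resistance and susceptibility alleles described above can be illustrated with a toy haploid gene-for-gene model. This is a minimal sketch under stated assumptions, not a model used in the cited studies: resistant hosts pay a constitutive cost and are attacked only by virulent pathogens, susceptible hosts are attacked by all pathogens, virulent pathogens pay a cost, and avirulent pathogens succeed only on susceptible hosts. The function name and every parameter value are hypothetical; a small symmetric mutation rate keeps frequencies away from 0 and 1.

```python
def simulate_gene_for_gene(p0=0.5, v0=0.1, cost_res=0.05, cost_vir=0.1,
                           severity=0.5, mu=1e-3, generations=300):
    """Toy haploid host-pathogen model; all parameter values are hypothetical.

    p: frequency of the resistance (R) allele in the host population
    v: frequency of the virulent allele (escapes R recognition) in the pathogen
    Returns a list of (p, v) pairs, one per generation including the start.
    """
    p, v = p0, v0
    trajectory = [(p, v)]
    for _ in range(generations):
        # Host fitness: resistant hosts pay a cost and are harmed only by
        # virulent pathogens; susceptible hosts are harmed by every pathogen.
        w_res = (1 - cost_res) * (1 - severity * v)
        w_sus = 1 - severity
        # Pathogen fitness: virulent strains infect every host but pay a cost;
        # avirulent strains succeed only on susceptible hosts.
        w_vir = 1 - cost_vir
        w_avr = 1 - p
        # Haploid selection: reweight allele frequencies by relative fitness.
        p_sel = p * w_res / (p * w_res + (1 - p) * w_sus)
        v_sel = v * w_vir / (v * w_vir + (1 - v) * w_avr)
        # Symmetric recurrent mutation keeps frequencies within [mu, 1 - mu].
        p = p_sel * (1 - mu) + (1 - p_sel) * mu
        v = v_sel * (1 - mu) + (1 - v_sel) * mu
        trajectory.append((p, v))
    return trajectory


trajectory = simulate_gene_for_gene()
resistance_freqs = [p for p, _ in trajectory]
```

With these parameters the resistance allele first rises while avirulent pathogens dominate, then declines once virulent pathogens become common and resistance no longer pays for its cost, which in turn favors avirulent pathogens again. The frequency dependence enters only through w_res depending on v and w_avr depending on p.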
Here, we aim to investigate the consequences of a severe population bottleneck and mating system transition for the polymorphism patterns at R-genes in the two Capsella species. Genetic signatures of natural selection present at disease resistance genes in Capsella grandiflora may be diminished or absent in C. rubella if the bottleneck has eroded the allelic variation maintained by selection. However, if the selective signatures at R-genes in C. grandiflora are also present in C. rubella, this would suggest that strong balancing or diversifying selection associated with pathogen resistance, or a history of such selection in C. grandiflora, has maintained allelic diversity in these regions in C. rubella despite a genome-wide loss of neutral variation. We take advantage of an extensive dataset on coding-region polymorphism in the two species at 283 reference genes to contrast R-gene diversity in a comparable population sample with the genome-wide pattern.
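One simple way to frame the contrast between R-genes and the reference genes is to compute nucleotide diversity (π, the mean number of pairwise differences per site) at each locus in both species, express C. rubella diversity as a fraction of C. grandiflora diversity, and ask where an R-gene's retention falls within the distribution of the reference loci. The sketch below is a heuristic under stated assumptions (aligned, gap-free sequences; no formal test); the function names and the example values are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: mean pairwise differences per site for aligned, gap-free sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def retention(pi_selfer: float, pi_outcrosser: float) -> float:
    """Fraction of the outcrosser's diversity retained in the selfer at one locus."""
    return pi_selfer / pi_outcrosser


def empirical_percentile(value: float, reference: list[float]) -> float:
    """Fraction of reference loci whose statistic is less than or equal to value."""
    return sum(r <= value for r in reference) / len(reference)
```

For example, nucleotide_diversity(["AAAA", "AAAT", "AATT"]) gives 4 differences over 3 pairs and 4 sites, i.e. 1/3. Applied to real data, one would compute retention for each reference gene to form the background distribution; an R-gene falling near the top of that distribution would be a candidate for diversity maintained by balancing or diversifying selection, though demographic stochasticity alone can also produce outliers.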