Recombination rate and protein evolution in yeast

Background Theory and artificial selection experiments show that recombination can promote adaptation by enhancing the efficacy of natural selection, but the extent to which recombination affects levels of adaptation across the genome is still an open question. Because patterns of molecular evolution reflect long-term processes of mutation and selection in nature, interactions between recombination rate and genetic differentiation between species can be used to test the benefits of recombination. However, this approach faces a major difficulty: different evolutionary processes (i.e. negative versus positive selection) produce opposing relationships between recombination rate and genetic divergence, and obscure patterns predicted by individual benefits of recombination. Results We use a combination of polymorphism and genomic data from the yeast Saccharomyces cerevisiae to infer the relative importance of nearly-neutral (i.e. slightly deleterious) evolution in different gene categories. For genes with high opportunities for slightly deleterious substitution, recombination substantially reduces the rate of molecular evolution, whereas divergence in genes with little opportunity for slightly deleterious substitution is not strongly affected by recombination. Conclusion These patterns indicate that adaptation throughout the genome can be strongly influenced by each gene's recombinational environment, and suggest substantial long-term fitness benefits of enhanced purifying selection associated with sexual recombination.


Background
Genetic drift is expected to overpower natural selection when selection is weak and effective population size (N_e) is small [1][2][3]. Recombination increases the effective population size in which genes evolve by reducing interference between linked loci under selection [4,5]. As a result, recombination is expected to facilitate the spread of beneficial mutations and the elimination of deleterious mutations [6,7]. Because recombination rates vary between different regions of a genome [e.g. yeast: [8]; Drosophila: [9]; mammals: [10]; plants: [11]], adaptation at the molecular level might be strongly affected by each gene's recombinational environment: genes evolving in low recombination regions are expected to be poorly adapted relative to those in high recombination regions [12,13].
Comparative genome analyses are potentially useful for assessing whether recombination promotes adaptation because long-term evolutionary processes are reflected in patterns of genetic divergence between species [13][14][15][16][17][18][19]. However, genomic approaches face a major challenge: multiple processes can contribute to evolutionary divergence between species, and each predicts a different relationship between protein evolution and recombination rate (Table 1). The rate of neutral substitution is unaffected by recombination and will tend to reduce correlations between recombination rate and total nucleotide divergence between species [20]; mildly deleterious (i.e. nearly neutral) substitutions will generate a negative correlation between recombination and divergence [21]; and adaptive substitutions will generate a positive correlation between recombination and divergence [22].
The relationship between recombination and divergence will be shaped by the predominant process of molecular evolution (i.e. neutral; slightly deleterious; adaptive). To test whether recombination facilitates adaptation throughout the genome (by enhancing purifying and positive selection), genes evolving under purifying selection and those evolving via positive selection should be analyzed separately, as each predicts a different relationship between divergence and recombination rate. Unfortunately, inferring the processes causing molecular divergence has traditionally been problematic without detailed within- and between-species genetic data [23], which limits the extent of the genome that can be analyzed.
Here we take an alternative approach. By capitalizing on an extensive volume of yeast (Saccharomyces spp.) genomic and polymorphism data, individual genes can be partitioned by their 'opportunity' for slightly deleterious evolution, and benefits of recombination can be tested. Furthermore, direct estimates of local recombination rates are available for most genes in the S. cerevisiae genome, whole-genome sequencing projects provide data for estimating protein evolutionary rates, and previous studies have revealed that the average strength of selection predictably varies between genes with different functional attributes [reviewed in [22]; see below]. Evolutionary theory predicts that mildly deleterious substitutions will accumulate readily in genes subject to weak selection, but not in those subject to strong selection (see Fig. 1). To the extent that substitutions differentiating species are often deleterious (e.g., genes with weakly-selected mutations; Fig. 1, small |N_e s_d|), increased recombination is expected to decrease the rate of protein evolution by inhibiting the spread of deleterious mutations. However, in genes with little opportunity for slightly deleterious divergence (e.g. genes subject to strong purifying selection; Fig. 1, large |N_e s_d|), recombination will increase the rate of divergence by enhancing the spread of beneficial mutations (to the extent that such mutations frequently arise).

Figure 1. [...] is twice as strong as s_d; C. s_d is twice as strong as s_b. When selection is weak, mildly-deleterious substitutions outnumber adaptive substitutions; divergence in weakly-selected genes is therefore predicted to be negatively correlated with local recombination rate. As the strength of selection increases, adaptive substitutions predominate; divergence in strongly-selected genes is therefore predicted to be positively correlated with recombination rate. The adaptive divergence rate is $u_b (1 - e^{-4Nsp})/(1 - e^{-4Ns})$, where s is the average benefit conferred by each mutation, and p is the initial frequency of each mutation (results for p = 0.0001 are shown). The slightly deleterious rate is $u_d (1 - e^{-4Nsp})/(1 - e^{-4Ns})$, where u_d is the deleterious mutation rate, and s is the average cost of each mutation (equations modified from [30]; N_e = N). The assumption that the beneficial mutation rate is much smaller than the deleterious mutation rate is supported by theory and mutation accumulation experiments [38,55-57; but see 58,59].

Table 1, footnote 2: The selection coefficient, s (-1 ≤ s < ∞), reflects the strength and direction of selection acting upon new mutations during the process of fixation.
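The two rate expressions in the Figure 1 caption can be evaluated numerically to see how the balance between slightly deleterious and adaptive substitutions shifts with population size. The following is a minimal sketch, not the authors' code. It assumes that s is signed (negative for deleterious mutations, as in the Table 1 footnote), and the mutation rates `U_D` and `U_B`, the selection coefficient `S`, and the population sizes are hypothetical values chosen only for illustration; p = 0.0001 is taken from the caption.

```python
import math

def fixation_probability(n, s, p):
    """Diffusion approximation (1 - e^{-4Nsp}) / (1 - e^{-4Ns}).

    n: population size (N_e = N, as in the caption)
    s: signed selection coefficient (assumption: s < 0 for deleterious mutations)
    p: initial frequency of the mutation
    """
    if s == 0:
        return p  # neutral limit of the expression
    # expm1(x) = e^x - 1; the two minus signs cancel in the ratio
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)

def substitution_rate(u, n, s, p):
    """Mutation rate times fixation probability, as written in the caption."""
    return u * fixation_probability(n, s, p)

# Hypothetical illustration values (not taken from the paper)
U_D, U_B = 1.0, 0.01   # deleterious mutations assumed far more common than beneficial ones
S = 1e-4               # magnitude of the selection coefficient
P = 1e-4               # initial frequency, as in the caption

for n in (1_000, 10_000, 100_000):
    deleterious = substitution_rate(U_D, n, -S, P)
    adaptive = substitution_rate(U_B, n, S, P)
    print(f"N = {n:>7}: deleterious rate = {deleterious:.3e}, adaptive rate = {adaptive:.3e}")
```

Under these assumed values, raising N lowers the deleterious rate and raises the adaptive rate, which is the qualitative contrast the caption describes.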

Inferring the fitness effects of deleterious mutations
Highly expressed genes appear to evolve under stronger purifying selection than low-expressed genes (although the mechanistic basis of this pattern is still debated; [24-27]). Consequently, gene expression level is a good predictor of the average fitness effect of deleterious mutations. Experimental gene knockouts have also identified suites of genes that are essential for survival, while many others are nonessential. To the extent that whole gene knockout phenotypes reflect the fitness effects of individual mutations, mutations in essential genes are predicted to have larger fitness effects than mutations in nonessential genes [25,28-30]. Lastly, proteins have variable numbers of interaction partners (protein-protein interactions per gene, PPI, range up to nearly 300 in yeast; Connallon & Knowles unpub.), which indicate the level of constraint due to pleiotropy [25]. Because individual mutations are likely to disrupt more cellular processes in genes with many PPI compared to those with few PPI, purifying selection is expected to be stronger in genes with many PPI [31].
Previous inferences of the strength of purifying selection acting on different gene categories are based on an observed elevated rate of nonsynonymous substitution in low-expressed nonessential genes with few PPI [25], which assumes that elevated rates of substitution are caused by genetic drift. An alternative possibility is that rapidly evolving genes undergo frequent bouts of positive selection. To test this assumption, we analyzed available polymorphism data from S. cerevisiae genes (see Methods and Additional file 1). Under a neutral/nearly-neutral model, patterns of within-species polymorphism are expected to mirror patterns of interspecific substitution (i.e. genes with high substitution rates also exhibit high levels of polymorphism [32]). Positive selection decouples patterns of polymorphism and divergence and is expected to increase the number of substitutions relative to polymorphisms [33].
Low-expressed genes harbor more nonsynonymous polymorphisms (represented by the P_n/P_s ratio; P_n and P_s refer to nonsynonymous and synonymous polymorphisms, respectively) and substitutions (D_n/D_s) than highly expressed genes, consistent with the neutral/nearly-neutral model (Fig. 2; see Additional file 1). Low-expressed genes also harbor higher levels of moderate- to high-frequency polymorphism (i.e. "non-singleton" polymorphism). High-frequency polymorphisms are not expected to be strongly deleterious (see [34]), but rather will consist of neutral and slightly deleterious mutations, which can potentially become fixed via genetic drift. D_n/D_s ratios are significantly lower than P_n/P_s ratios for all gene expression categories (G test; singletons included: P < 0.0001; singletons excluded: P < 0.005, except for upper 25% expressed genes: P > 0.1), indicating that the predominant pattern of selection on yeast genes is purifying. Similar results are reached by partitioning the data into gene essentiality categories (see Additional file 1; a meaningful statistical analysis based on PPI was not possible due to small sample size). The results strongly support previous inferences of selection intensity based on divergence data, and suggest that nonessential genes with low expression are relatively likely to evolve under a nearly-neutral process.
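The comparison of P_n/P_s with D_n/D_s described above amounts to a test of independence on a 2 x 2 table of counts. The sketch below shows one way such a G test could be computed with the Python standard library. It is an illustration under stated assumptions, not the authors' procedure: the function name `g_test_2x2` is hypothetical, no continuity correction is applied, and the counts in the usage lines are invented for demonstration rather than taken from the paper.

```python
import math

def g_test_2x2(pn, ps, dn, ds):
    """G test of independence for the table [[Pn, Ps], [Dn, Ds]].

    Returns (G, p_value) with 1 degree of freedom and no continuity
    correction (an assumption; the paper does not state which variant was used).
    """
    observed = [[pn, ps], [dn, ds]]
    total = pn + ps + dn + ds
    row_sums = [pn + ps, dn + ds]
    col_sums = [pn + dn, ps + ds]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# Hypothetical counts: Pn/Ps = 40/60 exceeds Dn/Ds = 10/90
g, p = g_test_2x2(pn=40, ps=60, dn=10, ds=90)
print(f"G = {g:.2f}, P = {p:.2e}")
```

A P_n/P_s ratio that significantly exceeds D_n/D_s, as in these invented counts, is the pattern the text interprets as evidence of purifying selection.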

Recombination and protein divergence
Analysis of 4786 genes in the yeast species Saccharomyces cerevisiae and S. paradoxus shows that nonsynonymous divergence is weakly negatively correlated with recombination rate (partial r = -0.052, P < 0.01 after Bonferroni correction; as previously reported by Pal et al. [14]). This negative relationship is markedly stronger for low-expressed genes, particularly nonessential genes with few PPI (Fig. 3). In contrast, the divergence of highly expressed genes tends to correlate positively with recombination rate, though all such associations are weak. For both classes of nonessential genes as well as for the entire data- [...]

Figure 2. Ratios of replacement to silent polymorphism (P_n/P_s) in S. cerevisiae, and substitutions (D_n/D_s) between S. cerevisiae and S. paradoxus. Results were obtained by pooling polymorphism and divergence data for multiple genes within each expression category (see Fig. S1 for similar results using a different approach). P_n/P_s ratios are lower in highly expressed genes: upper vs. lower 50% with singletons, P = 0.172; upper vs. lower 25% with singletons, P = 0.026; upper vs. lower 50% without singletons, P = 0.203; upper vs. lower 25% without singletons, P = 0.023. D_n/D_s ratios are lower for highly expressed genes for all comparisons (P < 0.0001).
These results clearly show that recombination can influence the rate of protein evolution at a genome-wide scale and that the impact of recombination rate variation is strongest for low-expressed, nonessential genes with few PPI. Associations between recombination and divergence rate cannot be explained by covariation between recombination rate and several variables that independently affect protein evolution (these effects were controlled for; see Methods). Estimates of the relative recombination rate between genes are coarse and limited by the quality of the S. cerevisiae recombination map, and recombination rates may have changed during the divergence of S. cerevisiae and S. paradoxus. However, both of these factors will decrease the strength of associations between divergence and recombination, and will cause our test to be conservative.
Mutation bias is also unlikely to account for the effect of recombination on protein evolution. We present associations between recombination and divergence at nonsynonymous sites [...] regions of high recombination [36,37], which should make our tests conservative. Despite these caveats with respect to using dS to estimate underlying mutational dynamics across the genome, dN/dS produces nearly identical patterns of covariation with recombination (see Additional file 1, Fig. S3).

Figure caption: Recombination and protein evolutionary rate. The relationship (r = partial correlation coefficient; see materials and methods) between recombination rate and dN for five gene expression intervals: the lower gene expression quartile (2.25 to 3.15 log mRNA abundance), the lower 50% expression (3.31 to 4.54 log mRNA abundance), and the upper gene expression quartile (3.53 to 4.54 log mRNA abundance). * P < 0.05; ** P < 0.01; *** P < 0.001 (after Bonferroni correction for five comparisons).
The results are consistent with evolutionary theory suggesting that recombination enhances the efficacy of selection [e.g. [6,12]]. Mutations with weak fitness effects respond to selection when the effective population size (N_e) is large, but evolve via genetic drift when N_e is small. By increasing N_e, recombination enhances the power of selection and minimizes genetic drift. Furthermore, the adaptive consequences of recombination may be extreme in yeast since most genes in the yeast genome can be defined by weak purifying selection (~75% of genes are nonessential; ~50% have one PPI; TC & LLK unpub.). Such genes also tend to reside in genomic regions with relatively low recombination frequencies (see Additional file 1, Fig. S4). The correlations revealed by the data are particularly striking when one considers the method by which the genes are partitioned. Functional genomic data permit classification of genes according to their relative opportunities for slightly deleterious evolution. However, multiple types of substitutions (i.e. slightly deleterious, neutral, and beneficial) are likely to contribute to each gene's total genetic divergence between species. This plurality should dampen patterns predicted by any single process, and will cause the conclusions presented here to be conservative.
Why does highly-expressed gene divergence show no correlation with recombination rate? There are two major possibilities. First, if genes under stronger selection tend to experience fewer beneficial mutations, their overall divergence rate might be relatively unaffected by local recombination. This might occur because strongly-selected genes are closer to perfection than weakly-selected genes and therefore have less opportunity for improvement, or because tradeoffs via pleiotropy (as indicated by high PPI [31]) limit the opportunity for beneficial mutations [38][39][40]. Second, selection for beneficial mutations might be very strong. The adaptive impact of varying recombination rate (and thus N_e) is expected to decrease with the strength of selection (i.e. s; [3,41]). If beneficial mutations tend to be strongly advantageous, they will tend to become fixed in both low and high recombinational environments.

Conclusion
This study shows that recombination reduces evolutionary divergence in genes under relatively weak purifying selection (e.g. low-expressed, nonessential, few PPI), and at best, marginally increases divergence in genes under strong purifying selection (e.g. highly expressed, essential, many PPI). This pattern suggests that enhanced purifying selection is a primary long-term benefit of recombination in nature. The efficient removal of deleterious mutations might increase the competitive ability of sexual species and contribute to the observed ubiquity of sexual reproduction in eukaryotes [6,42].
While this interpretation is appealing and supported by both theory and data from other taxa [e.g. [12,13]], it should be noted that inferences about the processes driving nucleotide divergence between species are tentative and reflect a major limitation of molecular divergence data. Future studies using entire-genome polymorphism and divergence data can add resolution by estimating the proportion of adaptive substitutions per gene [see [23]]. Such estimates, combined with inferences about the slightly deleterious substitution rate (derived from expression, essentiality and PPI data), will permit a much improved analysis of the benefits of recombination.

Data
Publicly available polymorphism data were obtained via [43] and [44]. Genes with at least four samples from S. cerevisiae and at least one polymorphic site were included in the analysis, resulting in a dataset of 35 genes (lower 25% expression, n = 11; lower 50% expression, n = 12; upper 50% expression, n = 23; upper 25% expression, n = 17; essential genes, n = 7; nonessential genes, n = 28), comprising 34443 nonsynonymous and 9975 synonymous nucleotide sites. The mean number of samples per gene was 22. Orthologous sequences from S. paradoxus were obtained via BLAST search at [45].
Per gene recombination rates for S. cerevisiae are those reported by Gerton et al. [8] [data available at [46]]. These estimates refer to recombination rates per sexual generation. The total, per generation recombination rate during the evolutionary history of each gene is the product of the rate under sexual reproduction (R_sex) and the frequency of outcrossing (O_c): R_TOT = R_sex O_c (modified from [47]).
Because the exact value of O_c will be the same for all genes in the genome, R_sex for gene i, relative to R_sex for other genes, will be the same as the corresponding relative value of R_TOT. Recombination, expression, length, SBG and divergence estimates were available for 4786 genes in total, including essential genes with two or more PPI (n = 329), essential genes with 1 PPI (n = 195), nonessential genes with two or more PPI (n = 762), and nonessential genes with 1 PPI (n = 835). Our dataset is available upon request.

Analysis
Population samples for each gene were aligned with ClustalW [52], available online, and manually adjusted. P_n, P_s, D_n, and D_s values were calculated with DnaSP, Version 4.10 [53]. Watterson's estimate of silent nucleotide diversity (theta) was calculated by hand (as described in [54]). The complete polymorphism dataset is provided in Additional file 1.
All divergence estimates were log10 transformed to facilitate linear comparisons, which are presented (values of dN = 0 were converted to dN = 0.0001 prior to log transformation); the results are robust and are also obtained with nonparametric comparisons (TC & LLK unpub.). Partial correlation analysis was used to compare recombination rate with dN (the rate of nonsynonymous substitutions); the same results were obtained for the comparison between recombination rate and dN/dS. The partial r statistic reported here reflects the association between recombination rate and protein divergence after associations with gene expression, gene length, and SBG were removed. These factors are known to influence patterns of protein evolution [[25]; TC & LLK unpub.], are all correlated with one another (i.e. recombination is positively correlated with expression and gene density, but negatively correlated with length), and can therefore give rise to spurious correlations between the variables of interest. All statistical analyses were carried out with JMP (SAS Institute). Statistical comparisons between r for different gene categories were carried out with software available online http://department.obg.cuhk.edu.hk/researchsupport/Correlation.asp. Bonferroni corrections for multiple comparisons (α/5; because of the 5 categories explored in Figure 2) were used to adjust P values of statistical significance.
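The transformation and partial correlation steps described above can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction, not the JMP procedure the authors used: the function names are hypothetical, the partial correlation uses the standard recursive formula on Pearson correlations, and the Bonferroni threshold assumes a nominal α of 0.05, which the text does not state explicitly.

```python
import math

def log10_divergence(dn, floor=1e-4):
    """log10 transform of dN, with dN = 0 replaced by 0.0001 as in the text."""
    return math.log10(dn if dn > 0 else floor)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing linear effects of the controls.

    Uses the recursion
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied one control variable at a time.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Bonferroni-adjusted threshold for five comparisons (alpha = 0.05 is an assumption)
ALPHA_PER_TEST = 0.05 / 5
```

In the setting of this study, `x` would hold per-gene recombination rates, `y` the log-transformed dN values, and `controls` the expression, length, and SBG vectors; a call such as `partial_corr(recombination, log_dn, [expression, length, sbg])` (all hypothetical variable names) would then return the partial r.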