Molecular evolution of synonymous codon usage in Populus
© Ingvarsson. 2008
Received: 08 July 2008
Accepted: 04 November 2008
Published: 04 November 2008
Skip to main content
© Ingvarsson. 2008
Received: 08 July 2008
Accepted: 04 November 2008
Published: 04 November 2008
Evolution of synonymous codon usage is thought to be determined by a balance between mutation, genetic drift and natural selection on translational efficiency. However, natural selection on codon usage is considered to be a weak evolutionary force and selection on codon usage is expected to be strongest in species with large effective population sizes.
I examined the evolution of synonymous codons using EST data from five species of Populus. Data on relative synonymous codon usage in genes with high and low gene expression were used to identify 25 codons from 18 different amino acids that were deemed to be preferred codons across all five species. All five species show significant correlations between codon bias and gene expression, independent of base composition, thus indicating that translational selection has shaped synonymous codon usage. Using a set of 158 orthologous genes I detected an excess of unpreferred to preferred (U → P) mutations in two lineages, P. tremula and P. deltoides. Maximum likelihood estimates of the strength of selection acting on synonymous codons was also significantly greater than zero in P. tremula, with the ML estimate of 4 N e s = 0.720.
The data is consistent with weak selection on preferred codons in all five species. There is also evidence suggesting that selection on synonymous codons has increased in P. tremula. Although the reasons for the increase in selection on codon usage in the P. tremula lineage are not clear, one possible explanation is an increase in the effective population size in P. tremula.
Codon bias, the preferential use of subset of synonymous codons, has been documented in a wide variety of organisms, from prokaryotes, to unicellular and multicellular eukaryotes [1–3]. While codon bias appears to be almost universal, the magnitude of codon bias largely depends on the effective population size, with codon bias being higher in species with larger effective population sizes . Evolution of synonymous codon usage is a process where natural selection is sufficiently weak (N s ~ 1) that the outcome is influenced by both selection, mutation and genetic drift [1, 2]. At the same time, synonymous changes within and between species are sufficiently common that abundant data is available for testing evolutionary hypotheses of synonymous codon usage, explaining why much attention has been directed to understanding the relative contributions of mutation, genetic drift and natural selection to the patterns of codon bias seen within and between species.
where μ and ν are the mutation rates from and to preferred codons, respectively, and 4 N s is the scaled selection coefficient acting on preferred codons [6, 7]. Equation (1) suggest that changes in evolutionary parameters, such as population size or mutation rate, can have large effects on the evolution of synonymous codon usage even in closely related species. While major shifts in codon preferences are unlikely between closely related species, changes in evolutionary parameters, particular in effective population sizes, are relatively common even over short evolutionary time scales . Population size changes are expected to affect codon usage across all synonymous codon families and results in enhanced codon bias following a population size increase and an erosion of codon bias after a population size decline . For instance, several studies have documented reductions in codon bias in Drosophila melanogaster and an increased fixation rate of unpreferred codons compared to the closely related species D. simulans and this has been attributed to a reduction in the effective population size of D. melanogaster and a relaxation of selection on codon bias [8–10]. Reductions in codon bias, likely driven by reductions in effective population sizes, have also been documented in D. miranda  and several species from the D. virils group . Mating system is another potential factor affecting effective population sizes. For instance, self-fertilization is associated with a reduction in the effective population size  and Caenorhabditis species which have experienced a transition from obligately outcrossing to selfing show a concomitant decrease in codon bias .
I have recently shown that patterns of synonymous codon usage in the long-lived tree Populus tremula is consistent with weak natural selection acting on translational efficiency as evidenced by a strong positive correlation between gene expression and codon bias . In this paper I extend my earlier studies of codon bias in P. tremula to five species covering most of the phylogenetic diversity found within the genus Populus. Using a greatly expanded set of genes and species allows for more thorough studies of the molecular evolution of synonymous codon usage across lineages with the hope of answering questions pertaining to the stability of evolutionary parameters, such as mutation rates, population sizes and patterns of natural selection.
In a recent study I estimated optimal codons in P. tremula based on correspondence analysis of codon usage in 558 different genes using the program codonw . The results presented here are both based on more species (five vs. one) and substantially more genes from each species (> 4500 genes per species). Nevertheless there is a large degree of overlap between optimal codons I identified in P. tremula  and those identified in this study. In fact, the 11 codons identified as optimal in P. tremula  represents a subset of the 25 codons identified in this study. The greater number of optimal codons identified here is likely the result of a greater power to identify subtle differences in codon usage between high and low expression genes due to the substantially larger data set used in this study.
Number of unique transcripts and an index of codon usage bias for five different species of Populus.
cor(EST hits, F op )
Initially I tested for rate heterogeneity among lineages using only synonymous sites by contrasting a model assuming a single molecular clock for the entire phylogeny with a model assuming no clock. These analyses show overwhelming support for a model without a clock, (2Δ L = 475.7, df = 4, p < 0.001), suggesting that the molecular clock hypothesis of a constant substitution rate across the genus Populus can be rejected. I then used relative rate tests  to compare substitution rates as synonymous sites between pairs of lineages. These tests show significant rate heterogeneities between P. deltoides and P. trichocarpa (X 2 = 4.40, p = 0.036) and between the P. nigra and the P. tremula - P. euphratica lineages (X 2 = 103.6, p < 0.001). To test for lineage specific effects on synonymous and non-synonymous substitution rates I contrasted a model with a single d N /d S ratio (one-ratio model) with a model allowing for different d N /d S values across branches of the phylogenetic tree (free-ratio model). In general, the log-likelihood values were greater for the free-ratio model than for the one-ratio model and in six of the 158 genes this difference is statistically significant at p < 0.05, suggesting significant heterogeneity in selective constraint at these genes. However, none of the genes remain significant after multiple test correction is applied. There was, however, evidence for significant heterogeneity in selective constraint in the concatenated data set (2Δ L = 13.84, df = 7, p = 0.019). The median d N /d S ratio was lowest in P. trichocarpa (0.132) and highest in P. nigra (0.232) (see also Figure 3 for d N /d S ratios estimated from the concatenated data set). The internal branches leading to the P. tremula - P. euphratica lineages and the P. tremula - P. euphratica - P. nigra lineages have substantially greater d N /d S ratios (0.316 and 0.349, respectively). However, these branches are very short and d N /d S ratios are subject to a large degree of uncertainty. Nevertheless, all d N /d S ratios are substantially lower than one, suggesting strong purifying selection across all genes.
I also used likelihood ratio tests to test for the presence of heterogeneous selection pressures among sites within individual genes. The null model (M0 ), assumes a single d N /d S ratio across sites which is calculated from the data. This model is contrasted with two alternative models; M1a , which assumes two classes of sites, one class which is neutral d N /d S = 1 and one class where 0 < d N /d S < 1, and model M2a that in addition to the two classes defined for model M1a allows for a third class of sites with d N /d S > 1 . A total of 14 genes show evidence for heterogeneous selection pressures across codons as evidenced by likelihood ratio tests that were significant at a nominal p -value of 0.05. Again none of these tests are significant after multiple test correction. However, this test is known to have low power to detect the action of positive selection when few, relatively closely related sequences are compared in each test, so the results should be interpreted with caution . The concatenated data set provides a more robust test of heterogeneous selection pressures across sites. Here model M1a provides a substantially greater fit to the data, compared to model M0 (2Δ L = 422, p < 0.001). Furthermore, model M2a is significantly better than model M1a (2Δ L = 40.9, p < 0.001), conclusively demonstrating that there is heterogeneity in selection pressures among sites. Under model M2a, the majority of sites (78.3%) are under strong purifying selection (d N /d S ≈ 0) and another large class of sites are evolving neutrally (20.2% of sites with d N /d S = 1). Finally, a small fraction of sites (1.4%) appear to be evolving under the action of relatively strong positive selection (d N /d S = 4.2). These sites appear to occur randomly across the genes, with no apparent clustering.
Lineage-specific substitutions at synonymous (S) and non-synonymous (NS) sites, preferred and unpreferred codons (U → P and P → U) and A/T and mutations (W → S and S → W).
P → U
U → P
d UP - PU
S → W
W → S
d WS - SW
U → P mutations shows as slight, but significant excess across the entire phylogeny, suggesting that codon usage has not reached an equilibrium in Populus. The excess of U → P mutations can, however, largely be explained by the excess of U → P substitutions in the P. tremula and the P. deltoides lineages (Table 2). Both of these lineages do show a consistent excess of U → P mutations across loci (p = 0.013 and p < 0.001 for P. deltoides and P. tremula, respectively). The remaining lineages appear to have reached a stationary distribution of codon usage and show no significant deviation from an equal number of P → U and U → P mutations. Taken together, the changes observed in the P. tremula and P. deltoides lineages suggest that selection on synonymous codon usage codon bias has increased in these two lineages. The degree of codon bias, measured using F op , is higher in P. tremula that in both the most closely related species (P. euphratica) or in the inferred ancestor, consistent with the observed increase in selection on codon usage. Although these differences are small (mean = 0.0072 and 0.0065 across loci, respectively) they are consistent and highly significant across loci (Wilcoxon signed-ranks test p < 0.001 in both cases). Comparisons of F op for P. deltoides with either P. trichocarpa or their most recent common ancestor are not significant however.
There is also a significant negative correlation between d UP - PUand F op across loci over the entire phylogeny (ρ = -0.234, p = 0.003), as would be expected if selection is shaping synonymous codon usage . I did not detect any correlations between d S and codon bias within any of the lineages. Although codon bias is expected to be negatively correlated to d S when selection on codon bias is strong, theory shows that weak selection (N s < 1) actually has little effect on synonymous substitution rates .
Estimates of selection on synonymous codons for the concatenated sequence data.
Symmetric mutation rates
p < 0.0001
Mutation rates equal in all lineages
p < 0.0001
No selection in P. deltoides
No selection in P. euphratica
p = 0.237
No selection in P. nigra
p = 0.371
No selection in P. tremula
No selection in P. trichocarpa
p = 0.527
Selection equal in all lineages
The concatenated data, including data from all 158 genes, show significant evidence for weak, but positive selection on synonymous sites only in the P. tremula lineage, with the estimated 4 N e s = 0.720 (Table 3, Figure 3). The selection coefficient in P. deltoides is also marginally significant (p = 0.083, Table 3), but with selection only about half as strong as in P. tremula (4 N e s = 0.330).
The gene-specific analyses identify a total of 21 genes that show significant evidence for selection on synonymous codon usage in P. tremula at a false discovery rate  of 5%. For genes with evidence for significant selection in P. tremula, 16 have estimates of S > 0 and 5 have S < 0. This suggests that although selection on synonymous codon usage favors preferred codons in the majority of cases, there are genes for which there is active selection against preferred codons. This mirror results from Drosophila, where the majority of genes show evidence for selection favoring preferred codons but where a few genes show significantly negative estimates of 4 N e s .
Although the changes in codon usage observed in the P. tremula lineage are consistent with an increase in the strength of natural selection acting on codon usage, an alternative explanation is a change in base composition driven by changes in mutational biases. One way to test this assumption is to compare mutational changes from A/T to G/C and vice versa across the phylogeny and in the individual lineages. Since preferred synonymous codons predominantly end with G/C (Figure 1), synonymous sites cannot be used for such comparisons as it would be difficult to disentangle mutational biases from natural selection on codon usage. Ideally, non-coding regions in the vicinity of each gene could be used for such a comparison, since this would account for chromosome-wide differences in base composition. However, since the sequences used for the current analyses are derived from EST projects, non-coding sequences are not available. I have therefore used non-synonymous mutations to infer mutational changes in different lineages.
Following Akashi et al. , I classified mutations as S → W (strong [G/C] to weak [A/T], following the standard DNA ambiguity code) or W → S (weak to strong) across lineages and deviations from equilibrium base composition were quantified as d WS - SW= (WS - SW)/(WS + SW). The data show that there are no significant departures from an equilibrium base composition, either across the entire phylogeny or within any of the individual lineages (Table 2). There is also no correlation between d UP - PUand d WS - SWacross loci (ρ = 0.112, p = 0.16) or between d WS - SWand F op (ρ = -0.013, p = 0.86).
Selection on synonymous codon usage appears to be common among higher eukaryotes [2, 3] but the efficacy of such selection is usually thought to be lower in species with lower effective population sizes . Nevertheless, even long-lived species like Populus show ample evidence for selection having shaped synonymous codon usage.
Optimal codons appear to be largely the same across the five species used in this analysis, and most of the differences between species can be explained by codons with low ΔRSCU values, where the statistical power to detect changes in codon usage between low and high expression genes may be low. Also, the ΔRSCU analyses are based on publicly available EST-collections and the sets of genes used for the different species are likely very different thereby contributing to the slightly different sets of optimal codons identified in the different species (Figure 1). The codons identified as optimal in this study also correspond well with optimal codons identified in an earlier study of codon bias in P. tremula where optimal codons were identified using substantially less data and slightly different methods . The 11 codons identified as optimal in P. tremula  actually form a subset of 25 optimal codons identified in the ΔRSCU analyses. The estimates of for the different species, which provides rough estimates of the overall strength of selection for preferred codons in the different species, is similar to what has been previously observed in Arabidopsis, but is lower than observations from either Drosophila or Caenorhabditis [14, 28]. I have used EST data as a proxy for gene expression in this study and this is known to be associated with a lot of problems, like relatively shallow and biased library coverage . However studies that have estimated codon bias parameters using expression profiles from either EST data or other methods, such as MPSS or microarrays, have yielded similar results despite the uncertainties associated with the EST data [28, 30, 31]. Codon usage bias can in principle be driven by natural selection or by mutational pressure and/or gene conversion [2, 7]. Although both of these forces are likely operating, there are unequivocal signs of natural selection having shaped codon usage biases in Populus. There are strong positive correlations between gene expression and the degree of codon bias across individual genes in all five species (Table 1). The gene expression-codon bias correlations are independent of base composition differences (Figure 2), indicating that natural selection, operating through translational efficiency or accuracy, has been a significant force shaping synonymous codon usage in Populus. The lack of association between d S and codon bias suggest that selection on synonymous codon usage is strong enough to influence rates of synonymous divergence among species. A significant correlation with d S is only expected when natural selection on codon usage is strong. When selection on codon bias is weak, theory shows that it has only small effects on the rate of synonymous substitutions even if the effects on codon usage can be significant .
Interestingly, gene length appears to have virtually no effect on codon bias, except in P. trichocarpa (Figure 2). Earlier studies in P. tremula have suggested a weak, but significantly negative effect of gene length on codon bias  and negative correlations have been found in other eukaryotes as well . Several reasons can explain the lack of a correlation between gene length and codon bias in the present study. First, there might be genuine differences between species in the effect of gene length. Cutter et al.  found that for some species of Caenorhabitis gene length had a significant effect on codon bias whereas for other species no such effect were found, suggesting that the effects of gene length might differ even between closely related species. Alternatively, different sets of genes were used for the different species and it is conceivable that this could explain why an effect of gene length is lacking in some species. However, the number of genes used is quite large for all species (Table 1), so if gene length has an effect the sample sizes are likely large enough that it would be detected. Finally, gene lengths were taken from the annotated genome of P. trichocarpa but were assumed to be representative of the corresponding gene from the remaining species. It is possible, although perhaps unlikely, that gene lengths differ substantially between species. Nevertheless, the effects of gene length on codon bias clearly deserves further attention.
At first impression, codon usage does not appear to have reached an equilibrium in Populus as there was an excess of U → P substitutions across the entire phylogeny (Table 2). However, the excess of U → P substitutions could largely be explained by significant excesses in two different lineages, P. deltoides and P. tremula, with the remaining lineages showing no deviations from an equal numbers of P → U and U → P substitutions(Table 2). In addition, the maximum-likelihood estimates of the selection coefficient acting on synonymous codons are positive in both of these lineages (4 N e s = 0.330 in P. deltoides and 4 N e s = 0.720 in P. tremula).
The 4 N e estimate for P. tremula is significantly greater than zero whereas the estimate for P. deltoides is approaching significance (Table 3). The estimate of selection in P. tremula (S = 0.720) is only slightly lower than estimates of selection from Drosophila simulans and D. yakuba , two species for which there are abundant evidence for selection on synonymous codon usage [9, 10]. In addition, P. tremula has also significantly higher F op than both the most closely related species (P. euphratica) or their common ancestor, supporting an increase in the degree of codon bias as a result of ongoing selection for preferred codons in P. tremula.
For the remaining lineages of the phylogeny the maximum-likelihood estimates of 4 N e s are either close to zero or even negative, although none of these estimates are significantly different from zero (Table 3). These results suggest that the strength of selection acting on codon usage has increased independently in two different lineages in the genus Populus, with natural selection favoring preferred codons in P. tremula and P. deltoides. Despite lacking evidence for selection on preferred codons, the remaining species show signs of past selection on codon usage as evidenced by the positive association between gene expression and codon bias discussed above. Finally there is no evidence for changes in base composition across lineages (Table 3), although these conclusions are based on changes at non-synonymous sites. There are significant differences in the GC content of coding regions and surrounding non-coding regions in P. tremula  but also no apparent correlation between GC contents of coding and non-coding regions, suggesting that non-synonymous sites might provide an accurate picture of the patterns of mutations in coding regions in Populus.
What accounts for the increase in the strength of selection acting on synonymous codons in P. tremula (and in P. deltoides)? One thing that distinguish P. tremula from many other species of Populus is its very wide distribution, ranging throughout Eurasia, from Western Europe to Eastern Asia. Therefore based only on the distribution range, P. tremula likely has a substantially greater effective population size than other species of Populus, that in many cases have both geographically more restricted ranges and more fragmented distributions where they do occur . It is interesting to note that P. deltoides also have a relatively wide distribution range , suggesting that the N e could be relative large also in P. deltoides. Sequence-based estimates of the effective population size of P. tremula suggest that N e is at least 105 . This estimate of N e is heavily dependent on the genome-wide mutation rate in Populus and this has not been characterized in great detail . However, estimates of N e are lacking for other species of Populus, and the conclusions regarding the relative population sizes of the different species are clearly tentative.
In other species, population size in known to have a large effect on the strength of selection acting on preferred codons. As mentioned above, the effective population size appears to have been reduced in D. melanogaster and this have resulted in reduced selection on synonymous codons at least when compared to the closely related species D. simulans [8–10]. A similar picture is emerging from data on synonymous codon usage in nematodes where species that have evolved self-fertilization and/or have a parasitic life-style have experienced a reduction in N e which translates into lower selection on codon bias [14, 16]. These results suggest that changes in effective population size are perhaps the most important parameter explaining differences in codon bias between closely related species. Interestingly, selection on codon usage does not differ dramatically between Arabidopsis thaliana and A. lyrata, despite the the former bing highly selfing and the latter obligately out-crossing. This mirrors data on rates of protein evolution which also does not differ between the two species and which suggest that the effective population sizes are not dramatically between A. thaliana and A. lyrata .
This study conclusively demonstrates that natural selection on synonymous codon usage is occurring also in a long-lived, perennial plant species like Populus. Data on synonymous codon usage suggest both ancient selection on synonymous codon usage in the genus Populus and current selection in P. tremula. The data also show that there has been a shift in the strength of selection acting on preferred codons in the P. tremula lineage and possibly also in P. deltoides. The analyses of the present data make shifts in mutations rates among species unlikely, suggesting that increases in the effective population size in these lineages can explain the observed increase in selection on synonymous mutations.
I downloaded all available ESTs for five different species of Populus, P. deltoides (14661 ESTs), P. euphratica (13905), P. nigra (51361), P. tremula (37313) and P. trichocarpa (89943) from PlantGDB . I also downloaded the corresponding PlantGDB-assembled Unique Transcripts (PUT) for P. deltoides, P. euphratica, P. nigra and P. tremula. These PUTs are unique transcripts assembled from all mRNA sequences for a given species available in public databases and have been trimmed to remove bacterial contamination, repetitive sequences and polyA tails. For the remainder of this paper I will refer to these PUTs as genes, recognizing that these PUTs do, for the most part, not correspond to full-length transcripts. I also downloaded all predicted gene models for P. trichocarpa from the publicly available genome sequence at http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html. Reading frames of all genes were checked, and corrected, for sequencing errors using FrameD . Pairwise comparisons were then performed between all genes within species using BLASTN and redundant sequences as well as sequences from gene families with very similar paralogous copies (> 85% sequence similarity) were removed. Finally, sequences shorter than 300 bps were also excluded. The final numbers of unique genes per species are listed in Table 1.
Expression profiles for all genes were obtained by all-against-all BLASTN searches of genes and the complete EST data for each species separately. Gene-EST alignments were required to show at least a 90% identity across 100 bp to be recorded as a match and the number of BLASTN hits were used as a proxy for expression levels. I partitioned the gene expression data for each species into two classes, genes with a single EST hit and genes which had EST hit numbers that were equal or greater than the species specific 90th percentile [14, 28]. I then calculated relative synonymous codon usage (RSCU) for all genes in each species and expression category separately using the codonw (version 1.4.2 http://codonw.sourceforge.net/). I then identified codons which are over-represented in highly expressed genes by comparing differences in ΔRSCU between high and low expression genes (ΔRSCU) using t -tests using R . Codons showing significantly higher levels of ΔRSCU in high expression genes were taken as optimal codons for the species in question. Finally, following Cutter et al. , overall indices of codon usage for each of the five different species were calculated as the average of all positive ΔRSCU values, (Table 1).
Optimal codons identified from the ΔRSCU analyses were used to calculate the frequency of optimal codons (F op )  and the GC content of the coding region using codonw http://codonw.sourceforge.net. Each gene was also associated with the most likely orthologous gene from the P. trichocarpa genome sequence using BLAST. The lengths of the corresponding full-length sequences from P. trichocarpa were used in combination with gene expression (EST hit count) and base composition (GC content) to estimate the amount of variation in F op that the variables and their first order interactions explain . As pointed out by Cutter et al. , a significant positive effect of gene expression is consistent with selection having shaped optimal codon usage.
Putative one-to-one orthologous genes were identified from the five different species using the program orthoMCL, using default parameters . Groups of putatively orthologous sequences were aligned using ClustalW . All identified genes were also concatenated within species and used to construct the most likely phylogenetic tree using baseml from the PAML package. The baseml analyses used the HKY85 substitution model with rate variation among sites and with the transition:transversion ratio estimated from the data (HKY85+Γ). The resulting tree was rooted based on results from phylogenetic analyses of the entire genus Populus .
Synonymous and non-synonymous substitutions for each gene and each lineage of the phylogeny were calculated using the codeml program from PAML  based on the phylogenetic tree obtained from the concatenated data set. The model assumed different d N /d S ratios for all branches in the phylogeny (i.e model = 1) and with codon frequencies calculated average nucleotide frequencies at the three codon positions (F 3 x 4).
To test for evidence of heterogeneous selection acting on codons, codeml from PAML was used to fit a null model with a single d N /d S ration (model M0 ) and two models that allow the d N /d S ratio to vary between codons (models M1a and M2a ). A gene was assumed to be under positive selection if a likelihood ratio test, comparing models M1a and M2a, was significant at p < 0.05. As d S did not show any association with codon bias it was not necessary to correct d S for possible effects of selection at synonymous sites .
Ancestral sequences at internal nodes of the phylogeny were reconstructed using baseml from the PAML package . baseml use a defined substitution model and a phylogenetic tree to estimate branch lengths and to assign posterior probabilities to different ancestral nucleotides at internal nodes of the phylogenetic tree. The ancestral reconstruction also used the HKY85+Γ nucleotide substitution model. Probabilities of ancestral codons were calculated as the product of probabilities of ancestral nucleotides at the three different nucleotide positions .
For codons that differ at more than a single nucleotide position, there are multiple alternative paths between ancestral and derived codons. I used the method of Akashi et al.  which weight different paths equally. With low levels of sequence divergence, as in this data set, this method give results similar to methods that weight paths by the number of synonymous to non-synonymous substitutions . However this method has a much lower computational demand. Following Akashi et al.  multiple hits were ignored, again motivated by the low levels of sequence divergence in the data set (d S < 5 – 7%). Once the ancestral sequences of the internal nodes of the tree had been obtained, lineage-specific synonymous and non-synonymous substitutions were calculated. In addition, synonymous substitutions were classified according to the putative fitness effects of the ancestral and derived codons. Codon preferences for all species were assumed to follow the preference table established from the ΔRSCU analyses described above. Mutations between codon preference classes were classified as unpreferred to preferred (P → U) or preferred to unpreferred (P → U) depending on the state of the ancestral and derived codons. Mutations within codon classes (U → U and P → P) were also scored.
If codon usage is at equilibrium, an equal number of U → P and P → U mutations are expected. To test this assumption I applied a goodness-of-fit test to the number of P → U and U → P mutation in each lineage of the phylogeny.
To estimate the strength of selection acting on alternative codons in different lineages of the Populus phylogeny I used the method of Nielsen et al. . Briefly, the method of Nielsen et al.  extends codon-based likelihood methods  with a parameter that allow for selection on synonymous codon usage. The method estimate the scaled coefficient of selection (S = 4 N e s) acting on mutations from unpreferred to preferred codons, with S > 0 indicating selection favoring the preferred state and S < 0 indicating selection against the preferred state. In addition to estimating S, the method also provides lineage specific estimates of the ratio of non-synonymous to synonymous substitution rates (ω = d N /d S ) and parameters of the mutation matrix . The input tree and branch lengths were taken from the PAML analyses described above.
Mutation rates were initially estimated using a data set consisting of concatenated sequences from all loci. Technically it is difficult to accurately estimate the mutation rate parameters on either of the lineages surrounding the root, because the placement of the root in the tree not uniquely determined in the data set. Mutation rate estimates for these lineages are therefore only approximate. Mutation rates estimated from the concatenated sequence data were then used in gene-specific analyses to reduce computational time.
To test for selection on synonymous sites, using either the concatenated data sets or individually for all loci, I separately fit two models. The first model has S as a free parameter and therefore allows for lineage specific estimation of S. The alternative model constrains S to zero on a specific lineage of interest and therefore assumes that synonymous sites are neutral along that lineage. The two models are nested and can be compared using a likelihood ratio test, where twice the difference in the likelihood of the two models (Δlog L = log(L 2) - log(L 1)) is assumed to follow a χ 2-distribution with degrees of freedom equal to the difference in number of parameters between the two models (n p (2)- n p (1)).
This study has been funded by grants from the Swedish Research Council to PKI. I am grateful to four anonymous reviewers who provided very helpful comments on earlier versions of the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.