The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses

Background: Fungal and animal mitochondrial genomes typically have one tRNA for each synonymous codon family. The codon-anticodon adaptation hypothesis predicts that the wobble nucleotide of a tRNA anticodon should evolve towards maximizing Watson-Crick base pairing with the most frequently used codon within each synonymous codon family, whereas the wobble versatility hypothesis argues that the nucleotide at the wobble site should be occupied by a nucleotide most versatile in wobble pairing, i.e., the tRNA wobble nucleotide should be G for NNY codon families, and U for NNR and NNN codon families (where Y stands for C or U, R for A or G and N for any nucleotide).

Results: We here integrate these two traditional hypotheses on tRNA anticodons into a unified model based on an analysis of the wobble costs associated with different wobble base pairs. This novel approach allows the relative cost of wobble pairing to be qualitatively evaluated. A comprehensive study of 36 fungal genomes suggests very different costs between two kinds of U:G wobble pairs, i.e., (1) between a G at the wobble site of a tRNA anticodon and a U at the third codon position (designated MU3:G) and (2) between a U at the wobble site of a tRNA anticodon and a G at the third codon position (designated MG3:U).

Conclusion: In general, MU3:G is much smaller than MG3:U, suggesting no selection against U-ending codons in NNY codon families with a wobble G in the tRNA anticodon but strong selection against G-ending codons in NNR codon families with a wobble U at the tRNA anticodon. This finding resolves several puzzling observations in fungal genomics and corroborates previous studies showing that U3:G wobble is energetically more favorable than G3:U wobble.


Background
The wobble versatility hypothesis [1][2][3][4][5][6], abbreviated as WVH, states that the wobble site of tRNA anticodon should have G for NNY codons (where Y stands for C or U and N for any nucleotide) because G can pair with both C and U in RNA, and should have U for NNR to pair with both A and G. For NNN codon families, the wobble site should be U because U is known to be the most versatile in wobble-pairing [7][8][9][10][11][12]. In contrast, the codon-anticodon adaptation hypothesis, or CAAH for short, invokes the codon usage bias as a determining factor, i.e., the wobble site of tRNA anticodon should co-evolve with codon usage so that the nucleotide in the wobble site of tRNA anticodon should match the most abundant codon in a synonymous codon family [6,13-15]. The association between the major codon and the anticodon of the most abundant tRNA has been documented in Escherichia coli [16,17], Saccharomyces cerevisiae [18], and other species and organelles [15,19-22].
Here we develop a general hypothesis of codon-anticodon adaptation based on an analysis of wobble costs, and derive its predictions that can be tested by genomic data. The wobble cost may be viewed as reduction in decoding efficiency and accuracy because such reduction would be selected against over evolutionary time. We will refer to this new general hypothesis based on wobble cost as WCH (for wobble cost hypothesis). The two traditional hypotheses, CAAH and WVH, will be shown to be special forms of WCH.
Following the shorthand notation of Ogle et al. [23], we designate the translation cost of wobble base-pairing between nucleotide i at the third codon position and nucleotide j at the wobble site of the tRNA anticodon as M i3:j (M stands for wobble cost; the letter C would be a more natural symbol for cost, but it could be confused with the nucleotide C). We assume M i3:j = 0 if nucleotides i and j form a Watson-Crick base pair. The reason for this assumption is that C:G and A:U pairs have not been found to contribute to ribosomal stalling (which reduces translation efficiency) or amino acid mis-incorporation (which reduces translation accuracy), whereas almost all non-Watson-Crick base pairings have been shown to reduce translation efficiency and accuracy. We define M Y3:U as the wobble cost between a wobble U at the tRNA anticodon and a C or U at the third codon position. M Y3:U is expected to be smaller than M O because a wobble U at the tRNA anticodon is known to be the most versatile in wobble-pairing [7][8][9][10][11][12].
A classical study of U:G wobble pairs [24] suggests a preference for the U being at the 3' end rather than at the 5' end, which implies that U3:G wobble pair is energetically more favorable than G3:U wobble pair [25]. Subsequent studies have shown that, while the U3:G wobble pair occurs on the ribosome, the unmodified G3:U wobble pair does not [2,23,26]. These findings suggest that M U3:G may be smaller than M G3:U , although its generality is unknown.

Two-fold NNY and NNR codon families
First consider the NNY codon family, where Y is either C or U. Designate the number of C-ending and U-ending synonymous codons by N C and N U , respectively, and the total cost of wobble pairing as M wG when the wobble site of the anticodon is G, and as M wA when the wobble nucleotide is A (we do not need to consider a wobble U or C for NNY codon families because such cases have never been observed and because a tRNA with a wobble U or C translating NNC and NNU codons is contrary to physicochemical principles). We now express the total costs M wG and M wA as

$$M_{wG} = N_C M_{C3:G} + N_U M_{U3:G} = N_U M_{U3:G}, \qquad M_{wA} = N_C M_{C3:A} + N_U M_{U3:A} = N_C M_{O} \tag{1}$$

Note that M C3:G = M U3:A = 0 according to our definition. The dependence of M wG and M wA on the proportion of C-ending codons in the NNY codon family (P C ) is shown graphically in Fig. 1, with M O assumed to equal 2·M U3:G . The condition for M wG = M wA , i.e., when the wobble site of the tRNA anticodon can take either G or A without a fitness differential, is

$$\frac{N_C}{N_U} = \frac{M_{U3:G}}{M_O}$$

In the scenario in Fig. 1 ...

Third, when N C < N U but N C > N U ·M U3:G /M O , WCH will still predict a G at the wobble nucleotide of the tRNA anticodon because M wG is still smaller than M wA . Take the scenario in Fig. 1 for example: when N C < N U but P C > 1/3 (which corresponds to the shaded area in Fig. 1), we have M wG < M wA (Fig. 1), so natural selection should favor a G at the wobble site of the tRNA anticodon. WVH happens to make the same prediction. However, CAAH will predict an A at the wobble site of the tRNA anticodon if U-ending codons are more abundant than C-ending codons. This is in contrast to WVH, i.e., CAAH is inapplicable when P C is within the shaded range in Fig. 1.
Fourth, when N C << N U , especially in the extreme case when N C = 0, then Eq. (1) is reduced to M wG = N U ·M U3:G and M wA = 0. Because now M wG > M wA , WCH predicts an A at the anticodon wobble site. In this case, WVH would still predict a G at the wobble site because it ignores the codon frequencies (i.e., it ignores the relative magnitude of N C and N U ), but CAAH would predict an A at the wobble site, which is the same prediction as WCH. Only in this particular case when N C << N U can CAAH and WVH be clearly differentiated. As is depicted in Fig. 1 ... cise). Alternatively, with many N C /N U ratios from many NNY codon families translated by tRNA with a G at its anticodon wobble site, we may compute the lower 95% confidence limit of the N C /N U ratio (LCL 95.G , where the subscript G indicates tRNA with a G at the anticodon wobble site) and infer that M U3:G /M C3:A < LCL 95.G .

Figure 1. Conceptual illustration of the dependence of wobbling cost involving a G or an A at the wobble site of tRNA anticodon (M wG and M wA ) on the proportion of C-ending codons (P C ) in an NNY codon family, with M C3:A = 2M U3:G . The shaded area corresponds to P C smaller than 1/2 but larger than 1/3.
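The comparison of the three hypotheses for an NNY codon family can be sketched in a few lines of Python. This is only an illustration under the definitions used in the text (M wG = N U ·M U3:G and M wA = N C ·M O ); the function names and the cost values passed in are hypothetical, with M O = 2·M U3:G chosen to mirror the scenario of Fig. 1.

```python
def nny_costs(n_c, n_u, m_u3g, m_o):
    """Total wobble costs (M_wG, M_wA) for an NNY codon family."""
    m_wg = n_u * m_u3g  # wobble G: C3:G is Watson-Crick (cost 0), U3:G wobbles
    m_wa = n_c * m_o    # wobble A: U3:A is Watson-Crick (cost 0), C3:A mismatches
    return m_wg, m_wa


def predict_wobble(n_c, n_u, m_u3g, m_o):
    """Wobble nucleotide predicted by WCH, CAAH and WVH for an NNY family."""
    m_wg, m_wa = nny_costs(n_c, n_u, m_u3g, m_o)
    # WCH: pick the wobble nucleotide with the smaller total cost.
    wch = "G" if m_wg < m_wa else "A" if m_wg > m_wa else "G or A"
    # CAAH: match the more abundant codon (C-ending -> G, U-ending -> A).
    caah = "G" if n_c > n_u else "A" if n_c < n_u else "G or A"
    # WVH: ignores codon frequencies; G is the versatile choice for NNY.
    return {"WCH": wch, "CAAH": caah, "WVH": "G"}


# Shaded region of Fig. 1 (1/3 < P_C < 1/2): WCH agrees with WVH, not CAAH.
shaded = predict_wobble(n_c=40, n_u=60, m_u3g=1, m_o=2)
# Extreme case N_C = 0: WCH agrees with CAAH, not WVH.
extreme = predict_wobble(n_c=0, n_u=100, m_u3g=1, m_o=2)
```

With these illustrative costs, the two calls reproduce the third and fourth cases discussed above: the hypotheses only separate cleanly once the codon counts and the cost ratio are both taken into account.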
If we always have very large N C /N U ratios, we may infer that selection against U-ending codons must be strong, with little chance for mutation to elevate N U . This is a strong indication of a large M U3:G . Along the same line of reasoning, we may infer that M U3:G is very small if N U can often be as large as, or even larger than, N C .
Similarly, if nature has chosen an A at the wobble site of tRNA anticodon, then we may infer M wG > M wA , so

$$\frac{M_{U3:G}}{M_O} > \frac{N_C}{N_U}$$

We can apply exactly the same rationale to the NNR codon family, leading to parallel conclusions. For example, if nature has chosen a U at the wobble site of the tRNA anticodon, then we may infer that M wU < M wC , so that

$$\frac{M_{G3:U}}{M_O} < \frac{N_A}{N_G}$$

Similarly, if M G3:U is very large, then there should be strong selection against G-ending codons in favor of A-ending codons. This will produce large N A /N G ratios. In contrast, a large N G comparable to N A indicates a very small M G3:U cost.
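For readers who want the NNR case spelled out, the parallel derivation can be sketched as follows. It assumes, by analogy with the NNY case, that the A3:C mismatch carries the generic cost M O ; that identification is an assumption of this sketch and is not stated explicitly in the text.

```latex
\[
\begin{aligned}
M_{wU} &= N_A M_{A3:U} + N_G M_{G3:U} = N_G M_{G3:U}
  && \text{(A3:U is Watson-Crick, cost 0)} \\
M_{wC} &= N_A M_{A3:C} + N_G M_{G3:C} = N_A M_{O}
  && \text{(G3:C is Watson-Crick, cost 0)} \\
M_{wU} < M_{wC}
  &\iff N_G M_{G3:U} < N_A M_{O}
  \iff \frac{M_{G3:U}}{M_O} < \frac{N_A}{N_G}
\end{aligned}
\]
```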
Given previous studies indicating that U3:G is energetically much more favorable than G3:U [2,23-26], we should expect M U3:G < M G3:U . The reasoning above paves the way for us to test whether this is generally true among the genetically diverse fungal species.

Three-fold AUH (Isoleucine) codons
Designate the numbers of codons ending with A, C, and U as N A , N C and N U , respectively. The wobble cost of having an A, G, C, or U at the wobble site of the tRNA anticodon is

$$\begin{aligned} M_{wA} &= N_A M_{A3:A} + N_C M_{C3:A} = (N_A + N_C) M_O \\ M_{wG} &= N_A M_{A3:G} + N_U M_{U3:G} = N_A M_O + N_U M_{U3:G} \\ M_{wC} &= N_A M_{A3:C} + N_C M_{C3:C} + N_U M_{U3:C} = (N_A + N_C + N_U) M_O \\ M_{wU} &= N_C M_{C3:U} + N_U M_{U3:U} = (N_C + N_U) M_{Y3:U} \end{aligned}$$

It is obvious that M wC is always greater than M wA , so we should never find a C at the wobble site of a tRNA Ile anticodon, i.e., we can disregard M wC . If nature has chosen G at the wobble site of tRNA Ile anticodon, then we may infer that M wG is the smallest. From M wG < M wA and M wG < M wU , we have

$$\frac{M_{U3:G}}{M_O} < \frac{N_C}{N_U}, \qquad N_A M_O + N_U M_{U3:G} < (N_C + N_U) M_{Y3:U} \tag{7}$$

where the assumption M Y3:U < M O is made on the basis of previous observations that U is generally the most versatile in wobble-pairing among the four nucleotides [7][8][9][10][11][12].

Cells in fungal species are generally rapidly replicating, which necessitates efficient translation. Rapidly replicating unicellular organisms are theoretically expected to be under strong selection to increase the rate of biosynthesis [15,28] and they typically exhibit strong codon-anticodon adaptation [29]. Thus, fungal species should be ideal for evaluating evolutionary hypotheses on codon-anticodon adaptation.

Methods
We retrieved 36 fungal mitochondrial genomes ... genomes use translation table 16 (Table 1). When results are similar among genomes using the same translation table, only results from a representative genome are presented. The numbers of codon families supporting CAAH (N CAAH ) and WVH (N WVH ) are compiled according to the following rationale [27]. Suppose a lysine (Lys) codon family has 20 AAA and 60 AAG codons. WVH would ignore the codon usage bias and predict a wobble U in the tRNA Lys anticodon because U can pair with both A and G, whereas CAAH would predict a wobble C in the tRNA Lys anticodon to maximize the Watson-Crick match with the more frequent G-ending codons. If the tRNA Lys anticodon is found to have a wobble U, then WVH is supported; if a wobble C is found, then CAAH is supported. If instead we have 60 AAA codons and 20 AAG codons and the tRNA Lys anticodon has a wobble U, then both hypotheses are supported; such cases cannot distinguish the two hypotheses and are not included in Table 1.
The methionine codon families are not included in Table 1 but are discussed in detail elsewhere [27,30].
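The counting rationale for an NNR codon family can be written out as a small Python sketch. The function name and return labels are hypothetical and only restate the lysine example above; how the original analysis handled ties or unexpected wobble nucleotides is not stated, so those branches are assumptions.

```python
def classify_nnr_family(n_a, n_g, observed_wobble):
    """Decide which hypothesis an NNR codon family supports.

    n_a, n_g: counts of A-ending and G-ending codons.
    observed_wobble: wobble nucleotide found in the tRNA anticodon ('U' or 'C').
    """
    wvh_prediction = "U"  # WVH: U pairs with both A and G, regardless of usage
    # CAAH: Watson-Crick match with the more frequent codon (A -> U, G -> C).
    # Assumption: with equal counts CAAH makes no prediction.
    caah_prediction = "U" if n_a > n_g else "C" if n_g > n_a else None

    supports_wvh = observed_wobble == wvh_prediction
    supports_caah = observed_wobble == caah_prediction
    if supports_wvh and supports_caah:
        return "both"  # indistinguishable; excluded from the tally
    if supports_wvh:
        return "WVH"
    if supports_caah:
        return "CAAH"
    return "neither"


# The lysine examples from the text.
lys_wvh = classify_nnr_family(20, 60, "U")
lys_caah = classify_nnr_family(20, 60, "C")
lys_both = classify_nnr_family(60, 20, "U")
```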
The tRNA and CDS sequences were extracted and analyzed by using DAMBE [31,32]. The CDS-derived codon usage was also computed with DAMBE. The anticodon in almost all tRNA sequences from all species shares the regular feature of being flanked by two nucleotides on either side to form a loop that is held together by a stem. For example, the anticodon loop (AC loop) of the tRNA Arg gene translating CGN codons in Epidermophyton floccosum is 28CGUGUUACGGCCACG42, where the starting and ending numbers indicate the position of the AC loop in the tRNA sequence, with the anticodon 5'-ACG-3' (matching codon CGU) flanked by two nucleotides on either side to form a loop that is held together by a stem made of the first and the last four nucleotides. Similarly, the AC loop of the other tRNA Arg, which translates AGR codons, is 25AAAAUACUUCUAAUAUUUU43, held together by a six-base stem. However, some tRNA sequences have a suspicious AC loop, and DAMBE flags them. The AC loop is then identified by aligning the tRNA sequences against other isoaccepting tRNA sequences with a regular AC loop [6]. Some tRNA anticodon loops have the anticodon flanked by three instead of two nucleotides. For example, the anticodon loop in tRNA Leu in the mitochondrial genome of Kluyveromyces thermotolerans is GAUACUCUUAAGAUGUAUU, with the anticodon UAA flanked by three nucleotides on both sides. There are a few tRNA sequences in which the anticodon loop cannot be identified.
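The loop geometry described above (stem, flanking nucleotides, anticodon, flanking nucleotides, stem) can be illustrated with a short Python sketch. This is not how DAMBE works internally; the function names are hypothetical, and the five-base stem used for the K. thermotolerans example is an assumption, since the text gives only the flank width for that loop.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_from_loop(ac_loop, stem_len, flank=2):
    """Extract the anticodon (5'->3') from a stem-loop-stem AC loop segment."""
    loop = ac_loop[stem_len:len(ac_loop) - stem_len]
    if len(loop) != 2 * flank + 3:
        raise ValueError("loop length does not fit flank + anticodon + flank")
    return loop[flank:flank + 3]


def matching_codon(anticodon):
    """Codon (5'->3') that is fully Watson-Crick paired with the anticodon."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


# E. floccosum tRNA-Arg (CGN): four-base stem, two flanking nucleotides.
arg_cgn = anticodon_from_loop("CGUGUUACGGCCACG", stem_len=4)
# E. floccosum tRNA-Arg (AGR): six-base stem, two flanking nucleotides.
arg_agr = anticodon_from_loop("AAAAUACUUCUAAUAUUUU", stem_len=6)
# K. thermotolerans tRNA-Leu: three flanking nucleotides (stem_len=5 assumed).
leu = anticodon_from_loop("GAUACUCUUAAGAUGUAUU", stem_len=5, flank=3)
```

A loop that does not fit the expected geometry raises an error, which loosely corresponds to the "suspicious AC loop" case that needs manual alignment.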
Some mitochondrial genomes in GenBank are annotated incorrectly. For example, tRNA Pro in the mitochondrial genome of Ashbya gossypii ATCC 10895 has an anticodon of UGG matching codon CCA (the most frequently used proline codon), but the GenBank file (NC_005789) annotated the anticodon to match codon CCU.
A few fungal mitochondrial genomes do not have a complete set of tRNA genes. For example, the mitochondrial genomes of Hyaloraphidium curvatum and Harpochytrium sp. JEL94 have seven and eight tRNA genes, respectively, and consequently will need tRNA import from the nuclear genome. This may cause complication in analyzing codon-anticodon adaptation. However, removing such genomes does not alter the conclusions.
Some species exhibit extreme avoidance of certain codon families. For example, Ashbya gossypii ATCC 10895 codes Arg with only AGR codons without using any CGN codons. In contrast, Hyaloraphidium curvatum codes Arg with only CGN codons without using any AGR codons. Such avoidance of certain codon families would facilitate the evolutionary loss of the associated tRNA [33][34][35], although it is not always clear whether the avoidance is the cause or the consequence of the loss of the associated tRNA.
We computed relative synonymous codon usage, or RSCU [36], as a measure of codon usage bias within a codon family by using DAMBE [31,32]. Some coding sequences are incomplete. For example, the cox1 CDS in Aspergillus niger is annotated as "join(<19768..20614,21640..22495)". The first two nucleotides (i.e., at positions 19768 and 19769) represent a partial codon and are discarded in computing codon frequencies.
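As a minimal sketch of the calculation (independent of DAMBE), RSCU for a codon is its observed count divided by the mean count of all codons in the same synonymous family. The helper names below are hypothetical; `skip` mimics discarding the leading partial codon of an incomplete CDS such as the cox1 example.

```python
from collections import Counter


def count_codons(cds, skip=0):
    """Count in-frame codons after discarding `skip` leading nucleotides.

    A trailing partial codon, if any, is ignored.
    """
    seq = cds[skip:]
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(family_counts):
    """RSCU for each codon in one synonymous codon family."""
    total = sum(family_counts.values())
    n_codons = len(family_counts)
    if total == 0:
        return {codon: float("nan") for codon in family_counts}
    return {codon: count * n_codons / total
            for codon, count in family_counts.items()}


# Lysine example with 20 AAA and 60 AAG codons.
lys_rscu = rscu({"AAA": 20, "AAG": 60})
# A toy CDS fragment whose first two nucleotides form a partial codon.
toy_counts = count_codons("GGAUGAAAAAGA", skip=2)
```

An RSCU of 1 means a codon is used as often as expected under uniform usage within its family; values above or below 1 indicate over- or under-use.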

Wobble cost between G and U: MU3:G and MG3:U
Recall that the two inequalities

$$\frac{M_{U3:G}}{M_O} < \frac{N_C}{N_U} \qquad \text{and} \qquad \frac{M_{G3:U}}{M_O} < \frac{N_A}{N_G}$$

are, respectively, for NNY codons translated by tRNA with a G at the wobble site of the anticodon, and for NNR codons translated by tRNA with a U at the wobble site of the anticodon. The observed N C /N U ratios for Allomyces macrogynus (representing fungal mitochondrial genomes with translation table 4) are much smaller than the N A /N G ratios (Table 2). The smallest N C /N U value is 0.279 whereas the smallest N A /N G value is 2.372 (Table 2). We have mentioned before that, if M U3:G is very small, then a wobble G at the tRNA anticodon will not impose strong selection against U-ending codons, and N U may drift up and down with mutation relative to N C . This will lead to relatively small N C /N U ratios. From the minimum N C /N U value of 0.279, we may infer that M U3:G < 0.279·M O , i.e., M U3:G is quite small relative to M O .
The AUH codon family coding for amino acid Ile in the A. macrogynus mitochondrial genome is translated by a tRNA with a GAU anticodon. According to Eq. (7), the M U3:G /M O ratio should also be smaller than the N C /N U ratio. The observed N C /N U ratio is 0.3605 (= 159/441). This is similar to the N C /N U ratio in NNY codon families (Table 2). Thus, the wobble cost of M U3:G relative to M O from the AUH codon family is similar to that derived from NNY codon families.
The CUN codon family coding for amino acid Leu in the A. macrogynus mitochondrial genome is translated by a tRNA with an AAG anticodon. Note that no A→I conversion has been observed in mitochondria [37,38], so Eq. (9) is applicable. According to Eq. ... A at the wobble site of its anticodon. In the mitochondrial genome of Penicillium marneffei, the tRNA translating the AAY codon family has a wobble A in its anticodon. The N C /N U ratio is 0.087 (= 41/473), which is much smaller than 0.1905. Similarly, in the mitochondrial genome of Pichia canadensis, the tRNA translating the AGY codon family has a wobble A in its anticodon. The N C /N U ratio is 0.0083 (= 1/120), which is also much smaller than 0.1905. Thus, the prediction that a wobble A at the tRNA anticodon is advantageous over a wobble G when N C /N U < 0.1905 is consistent with empirical data.
In contrast to the small N C /N U ratios in NNY, AUH and CUN codon families, all N A /N G ratios in NNR codon families are substantially larger (Table 2). We have argued before that, if M G3:U is very small, then a U at the wobble site of tRNA anticodon would impose little selection against G-ending codons in NNR codon families, and mutation may allow N G to drift up, leading to large N G values relative to N A . However, if M G3:U is large, then G-ending codons should be strongly selected against and N G would be small relative to N A , leading to large N A /N G ratios. The much larger N A /N G ratios than N C /N U ratios (t = 5.2967, DF = 10, p = 0.0003, two-tailed test) suggest that M G3:U is much greater than M U3:G .
There is a caveat in evaluating the relative magnitude of M U3:G /M O and M G3:U /M O by the N C /N U and N A /N G ratios because these ratios can be affected by AT-biased mutations. The mitochondrial genome of A. macrogynus is 57472 bp, with the number of C+G being 22700 and that of A+U being 34772. If we exclude those nucleotides in coding sequences, then the numbers of C+G and A+U are 14136 and 17656, respectively. This may be considered as the background frequencies maintained by mutation bias, which leads to the expected N C /N U ratio of 0.8001 (= N C+G /N A+U ) and that of N A /N G ratio of 1.2490 (= N A+U /N C+G ). Thus, we expect N C /N U to be smaller than N A /N G even when there is no difference between M U3:G and M G3:U . To establish the argument that M U3:G is indeed smaller than M G3:U , we need to demonstrate that (1) there is no selection against U-ending codons in NNY codon families by showing that the observed N C /N U ratio is not greater than 0.8001, and (2) there is selection against G-ending codons in NNR codon families by showing that N A /N G is significantly greater than 1.2490. It is not enough to show that N C /N U < N A /N G .
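The background expectation is a one-line calculation, sketched here in Python with the non-coding counts quoted above. The function name is hypothetical, and the sketch simply takes the stated assumption that non-coding base composition reflects mutation bias.

```python
def expected_ratios(n_cg, n_au):
    """Expected N_C/N_U and N_A/N_G ratios from background base counts."""
    return n_cg / n_au, n_au / n_cg


# Whole-genome counts add up to the stated genome length.
genome_length = 22700 + 34772

# Non-coding counts for A. macrogynus quoted in the text.
exp_c_over_u, exp_a_over_g = expected_ratios(n_cg=14136, n_au=17656)
```

With these counts the two expectations come out at roughly 0.80 and 1.25, and they are reciprocals of each other by construction, which is why N C /N U < N A /N G alone cannot establish a cost difference.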
We note that the observed N C /N U values for the seven NNY codon families in Table 2 are all smaller than the expected 0.8001, suggesting no selection against U-ending codons in NNY codon families (i.e., small M U3:G ). In contrast, the observed N A /N G ratios for the five NNR codon families are all much greater than the expected 1.2490 (Table 2), consistent with selection against G-ending codons in NNR codon families (i.e., large M G3:U ). Together, these observations support the interpretation that M U3:G < M G3:U . One can perform a χ 2 -test for each of the NNR codon families to see if G-ending codons are underused. The tests are all highly significant, with p < 0.00001.
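A goodness-of-fit test of this kind can be sketched with the Python standard library alone. The exact test setup used in the original analysis is not described, so the version below is an assumption: it compares observed A-ending and G-ending counts with the counts expected from a background A/G ratio, using one degree of freedom. The codon counts in the example are hypothetical, not values from Table 2.

```python
import math


def chi2_two_class(obs_a, obs_g, expected_a_over_g):
    """Chi-square goodness-of-fit test (df = 1) for A- vs G-ending codons."""
    total = obs_a + obs_g
    p_a = expected_a_over_g / (1.0 + expected_a_over_g)
    exp_a = total * p_a
    exp_g = total - exp_a
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_g - exp_g) ** 2 / exp_g
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical NNR family: 300 A-ending and 20 G-ending codons,
# tested against the background A/G expectation of 1.2490.
chi2, p_value = chi2_two_class(300, 20, expected_a_over_g=1.2490)
```

The test is two-sided; whether the deviation is in the direction of G-ending underuse still has to be read from the observed and expected counts.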
The results are similar for fungal genomes with translation table 3, with the result from a representative species (Saccharomyces cerevisiae) presented in Table 3. Again, the N C /N U ratios in NNY codon families are much smaller than the N A /N G ratios in NNR codon families. We should note that the S. cerevisiae mitochondrial genome is much more AT-biased than the A. macrogynus mitochondrial genome, with the proportion of (G+C) in non-coding sequences being only 0.1484. The reason for the GC deficiency in yeast is not clear, but it may be caused either by mutation bias or by the low abundance of C in living cells [39][40][41]. In any case, the expected N C /N U and N A /N G ratios, given the biased genomic AT content, are 0.1742 and 5.7405, respectively. We note that the observed N C /N U ratios among the NNY codon families are all smaller than the expected value of 0.1742 except for the UUY (Phe) codon family (Table 3), again suggesting little selection against U-ending codons (i.e., small M U3:G ). In contrast, the N A /N G ratios are all significantly greater than the expected 5.7405 except for the AUR (Met) codon family, suggesting selection against G-ending codons (i.e., large M G3:U ). The exceptional AUR (Met) codon family has a tRNA with a CAU anticodon, which would favor G-ending codons, and is therefore expected to be different.
The N C /N U ratio for the UUY (Phe) codon family is consistently greater than those of other NNY codon families (Tables 2, 3). One may wonder whether, for this particular codon family, there is a significant M U3:G . The rate of tRNA Phe with anticodon 3'-AAG-5' dissociating from the UUU codon is about twice as high as that from the fully matched UUC codon [42]. Also, the tRNA Phe misreads CUC codons more than twice as often as CUU codons [42]. These two lines of evidence suggest that the C3:G pair is much more favorable than the U3:G pair, i.e., M U3:G for the UUY (Phe) codon family may indeed be substantially greater than M C3:G . Unfortunately, there have been no similar studies on codon-anticodon pairing for other NNY codon families. One should also note that the tRNA Phe in that study comes from Escherichia coli, and the result may not be applicable to fungal species.
The two fungal species using genetic ...

In short, results from fungal mitochondrial genomes are consistent with no selection against U-ending codons in NNY codon families but significant selection against G-ending codons in NNR codon families, indicating that M U3:G is smaller than M G3:U . These findings corroborate previous biochemical studies demonstrating that U3:G is energetically much more favorable than G3:U [2,23-26]. However, G3:U can be almost as good as A3:U when U is modified to xm 5 U [43].
The finding of a small M U3:G can explain puzzling observations in codon usage in fungal mitochondrial genomes. Take the tRNA Ser translating the AGY codon family in the mitochondrial genome of Ashbya gossypii ATCC 10895 for example. The genome contains 31 AGU codons and no AGC codon. CAAH would have predicted an ACU anticodon ... there is no need to overuse the UGC codons to avoid M U3:G , so AT-biased mutation in yeast (whose genome is highly AT-biased), which increases the frequency of U-ending codons, is not checked by counteracting selection.
If the finding of M U3:G << M G3:U is applicable to nuclear genomes, then we predict that NNY codon families need only one type of tRNA (i.e., tRNA with a wobble G at the tRNA anticodon) for translation. In contrast, NNR codon families should ideally be translated by two different types of tRNAs, one with a wobble U for NNA codons and the other with a wobble C for NNG codons (to avoid the relatively high wobble cost of M G3:U ). A corollary is that, if an NNY codon family is translated only by tRNA with a wobble G and an NNR codon family only by tRNA with a wobble U, then codon usage bias should be smaller in the NNY codon family (in which selection against U-ending codons is weak because of the small M U3:G ) than in the NNR codon family (in which selection against G-ending codons is strong because of the relatively large M G3:U ). Below we test these predictions with genomic data.

NNY codon families are translated by one type of tRNA with a wobble G and NNR codon families are translated by two types of tRNA with a wobble U and a wobble C, respectively, in fungal nuclear genomes
We have inferred that a tRNA with a wobble G at its anticodon should be efficient not only in translating C-ending codons, but also in translating U-ending codons because of the small M U3:G . For this reason, only one type of tRNA with a wobble G should generally be sufficient in translating NNY codon families. In contrast, in NNR codon families, tRNA with a wobble U at its anticodon should be poor in translating G-ending codons because of the large M G3:U . Thus, the presence of G-ending codons in NNR codon families should favour the use of two types of tRNAs, one with a wobble U for translating NNA codons and another with a wobble C for translating NNG codons. In mitochondrial genomes with limited gene content, each codon family is generally translated by a single tRNA species. So this prediction cannot be tested. However, this prediction can be tested with nuclear genomes where the number of tRNA genes is not so limited as in mitochondrial genomes.
The prediction is strongly supported by results from the nuclear genome of Saccharomyces cerevisiae (Table 4). All NNR codon families are translated by two types of tRNAs with anticodons matching NNA and NNG codons, respectively, whereas all NNY codon families are translated by one type of tRNA with a wobble G at the tRNA anticodon (Table 4). This is consistent with the interpretation that the inference of M U3:G << M G3:U derived from fungal mitochondrial genome is also applicable to fungal nuclear genomes.
One may propose an alternative hypothesis for the observation that NNY codon families are translated by tRNAs with a wobble G whereas NNR codon families are translated by tRNAs with a wobble C and a wobble U, respectively. The wobble A in some nuclear tRNAs is known to be converted to inosine, or I for short [45][46][47], which may pair with A, C or U. If a tRNA translating an NNY codon family has its wobble G mutated to wobble A, then the wobble A may undergo the A→I conversion and misread NNA codons. For this reason, the wobble G→A mutation should be strongly selected against, which would explain the lack of wobble A in tRNA translating NNY codons.
This alternative hypothesis invoking the A→I conversion, while logically sound, was dismissed quite early after the discovery [45,47] that the A→I conversion is quite restrictive and occurs mainly at ACG or ARV anticodons (where R is the IUB code for A or G, and V is for A, G or C). Among tRNAs translating NNY codons, only tRNA Phe has a middle A in the anticodon, and the rest do not have an R in the middle of the anticodon. This means that, for all NNY-translating tRNAs except for tRNA Phe , even if their regular wobble G mutates to A, the resulting wobble A will not be converted to inosine. Thus, the alternative hypothesis can only explain the avoidance of a wobble A in tRNA Phe but cannot explain the avoidance of a wobble A in other NNY-translating tRNAs.
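The argument can be checked mechanically with a short Python sketch. It takes the two motifs quoted above (ACG and ARV) at face value, builds the anticodon each NNY-reading tRNA would have after a wobble G→A mutation, and tests it against the motifs. The function names are hypothetical, and the list of two-fold NNY families is that of the standard genetic code, used here as an assumption.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "V": "ACG"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def matches(anticodon, motif):
    """True if the anticodon fits an IUPAC-coded motif of the same length."""
    return len(anticodon) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(anticodon, motif))


def may_undergo_a_to_i(anticodon, motifs=("ACG", "ARV")):
    """True if the anticodon fits any motif reported to allow A->I editing."""
    return any(matches(anticodon, motif) for motif in motifs)


def mutant_anticodon(codon_prefix):
    """Anticodon (5'->3') of an NNY-reading tRNA after a wobble G->A mutation."""
    first, second = codon_prefix
    return "A" + COMPLEMENT[second] + COMPLEMENT[first]


# Two-fold NNY families of the standard genetic code (first two codon bases).
NNY_PREFIXES = {"Phe": "UU", "Tyr": "UA", "His": "CA", "Asn": "AA",
                "Asp": "GA", "Cys": "UG", "Ser": "AG"}

flagged = [aa for aa, prefix in NNY_PREFIXES.items()
           if may_undergo_a_to_i(mutant_anticodon(prefix))]
```

Under these assumptions only the mutated tRNA Phe anticodon (AAA) fits a motif, in line with the statement that the alternative hypothesis covers tRNA Phe alone.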

Conclusion
In summary, our general hypothesis based on wobble costs allows the integration of the two conventional hypotheses (i.e., CAAH and WVH) on codon-anticodon wobble pairing. The integration leads to new ways of evaluating relative wobble cost of different wobble pairings.
In particular, the finding that M U3:G is much smaller than M G3:U corroborates previous structural studies showing that U3:G is energetically more favourable than G3:U and leads to a better understanding of the translation efficiency mediated by codon and anticodon wobble pairing.