Comparison of translation loads for standard and alternative genetic codes
© Sammet et al; licensee BioMed Central Ltd. 2010
Received: 7 August 2009
Accepted: 14 June 2010
Published: 14 June 2010
The (almost) universality of the genetic code is one of the most intriguing properties of cellular life. Nevertheless, several variants of the standard genetic code have been observed, which differ in one or several of 64 codon assignments and occur mainly in mitochondrial genomes and in nuclear genomes of some bacterial and eukaryotic parasites. These variants are usually considered to be the result of non-adaptive evolution. It has been shown that the standard genetic code is preferential to randomly assembled codes for its ability to reduce the effects of errors in protein translation.
Using a genotype-to-phenotype mapping based on a quantitative model of protein folding, we compare the standard genetic code to seven of its naturally occurring variants with respect to the fitness loss associated to mistranslation and mutation. These fitness losses are computed through computer simulations of protein evolution with mutations that are either neutral or lethal, and different mutation biases, which influence the balance between unfolding and misfolding stability. We show that the alternative codes may produce significantly different mutation and translation loads, particularly for genomes evolving with a rather large mutation bias. Most of the alternative genetic codes are found to be disadvantageous to the standard code, in agreement with the view that the change of genetic code is a mutationally driven event. Nevertheless, one of the studied alternative genetic codes is predicted to be preferable to the standard code for a broad range of mutation biases.
Our results show that, with one exception, the standard genetic code is generally better able to reduce the translation load than the naturally occurring variants studied here. Besides this exception, some of the other alternative genetic codes are predicted to be better adapted for extreme mutation biases. Hence, the fixation of alternative genetic codes might be a neutral or nearly-neutral event in the majority of the cases, but adaptation cannot be excluded for some of the studied cases.
The origin and universality of the genetic code is one of the biggest enigmas in biology . Soon after the genetic code of Escherichia coli was deciphered , it was realized that this specific code out of more than 1084 possible codes is shared by all studied life forms (albeit sometimes with minor modifications). The question of how this specific code appeared and which physical or chemical constraints and evolutionary forces have shaped its highly non-random codon assignment is subject of an intense debate. In particular, the feature that codons differing by a single nucleotide usually code for either the same or a chemically very similar amino acid and the associated block structure of the assignments is thought to be a necessary condition for the robustness of the genetic code both against mutations as well as against errors in translation [3–13]. This robustness reduces fitness losses due to mutation and mistranslation, which is believed to be a major force in coding sequence evolution . There are three basic theories of the genetic code's nature, origin, and evolution. Whereas the stereochemical theory first proposed by Gamow  asserts that the codon assignment was originated by the physicochemical affinity between the amino acid and the codon or anticodon, the adaptive theory posits that the genetic code was shaped under selection for robustness, either against mutations [16, 17] or against translation errors [18, 19], or against both [18, 20, 21]; finally, the coevolution theory postulates that the structure of the genetic code reflects the pathways of amino acid biosynthesis .
In this work, we address an issue that has received somewhat less attention in this broader context, namely the existing variants of the standard genetic code. These variants are used for instance in the mitochondria of many species, and they consist in the modification of one or several codons of the standard genetic code. Two main mechanisms have been proposed to explain how they may have evolved despite the large fitness cost that is expected to be associated with the modification of a codon [9, 23, 24]: through an ambiguous intermediate state and through the reassignment of a temporarily unused codon. These mechanisms are not mutually exclusive. The first one assumes that a codon is temporarily recognized both by the current as well as by a mutated tRNA, so that it can result in two different amino acids. Such ambiguity might be preferential in some circumstances and remain present for some time. A codon reassignment occurs if the mutated tRNA finally takes over. The second scenario takes place if one codon disappears from a given genome. This is particularly likely in small genomes with large guanine/cytosine (GC) or adenine/thymine (AT) content, as it is the case for many mitochondrial genomes and nuclear genomes of parasitic microbes. In this case the translation system may change without any cost, and the codon may be 'conquered' by another amino acid. Both scenarios consider that alternative genetic codes are the result of non-adaptive or neutral evolution, even though 'genomic streamlining' (i.e., selective pressure to minimize the genome by eliminating a tRNA) has been proposed as a possible advantage of code changes [25, 26]. However, it has not been addressed whether these variants differ from the standard genetic code as far as mutation and translation loads are concerned.
Here, we use computer simulations of neutral protein evolution constrained to maintain the folding and misfolding stability of the native state in order to study the differences between the standard genetic code and seven naturally occurring variants concerning their effects on protein stability. These effects are predicted using a simplified model of protein folding , the same that we consistently use in the evolutionary simulations [28–32]. Despite its simplicity, this model is able to predict with similar accuracy as other more complicated models the effects of mutations on folding stability. Due to our simplifying assumption to consider a neutral model, the different genetic codes hardly have any influence on the average unfolding and misfolding stabilities. However, alternative codes yield significantly different mutation and translation loads, in particular for genomes evolving under strong AT or GC mutation bias.
Results and Discussion
In this work, we study how the genetic code influences the fitness consequences of errors (loads) during mutation and translation. This influence may arise because of two mechanisms: (1) Directly, through the change in the rate of occurrence of different amino acid misincorporations in the translated protein; (2) Indirectly, through the evolutionary influence that the genetic code may have on protein stability. We simulated our previously proposed model of protein evolution in order to study this indirect influence as well.
Our model of protein evolution has been presented in previous works [28–32], and it is similar to models used by others [33–40]. It has been successfully used to explain non-Poissonian rates in neutral evolution  and the observed site-specific amino acid distributions [31, 32], to name two examples. It considers a genetically homogeneous population, i.e. the product of the population size N and the mutation rate μ is assumed to be small. The assumption of a small mutation rate μ is justified when considering an individual protein, but not an entire genome. If we considered a whole evolving genome instead of a single protein, the approximation of very small mutation rate would not be justified, since genomic mutation rates are in a range of 0.003 to 0.004 mutations per genome per generation for DNA-based microbes, including viruses, bacteria, and eukaryotes . In this context, a new interesting effect has to be considered, namely the hitch-hiking effect, which consists in the fixation of mildly disfavorable alleles driven by a positively selected allele present in the same chromosome. However, considering the hitch-hiking effect would make the study much more complicated, and we leave it as a subsequent step. In our model, the fitness of an individual carrying a particular gene depends only on the folding properties of the translated protein, which are estimated through a simple protein folding model. A characteristic of our model that distinguishes it from similar ones is that we consider two types of stability, with respect to misfolding and with respect to unfolding. They are calculated by estimating the normalized energy gap α(A) and the folding free energy F(A), respectively. Misfolding stability is measured through α and unfolding stability is measured through -F, which are computed for each protein sequence A encountered in the simulated evolution. The protein structure is assumed to have been already optimized by natural selection and is kept fixed throughout evolution, as represented by the experimental structure found in the Protein Data Bank (PDB) (see Methods). If the folding stability is too small, the protein will not be stable in its native state; if the misfolding stability is too small, misfolded structures can trap the folding process, and they can expose hydrophobic patches and promote aggregation. In the spirit of Kimura's neutral theory of molecular evolution [42, 43], we assume that mutations are either neutral or strongly deleterious. More specifically, all proteins having both unfolding and misfolding stabilities above previously fixed thresholds are regarded as viable and they are assigned the same fitness ℱ = 1 (in arbitrary units) and all proteins for which at least one of the stabilities is below threshold are regarded as unviable and they are assigned fitness ℱ = 0. Mutations to stop codons are considered lethal and receive a fitness ℱ = 0. The two neutral thresholds αthr and Fthr are chosen proportional to the values of αnat and Fnat of the respective protein in the PDB, multiplied with coefficients slightly smaller than one so that the native protein is above threshold. We present results with both coefficients equal to 0.98, but our results are robust to changing this prefactor in a reasonable range. Note that fitness functions depending continuously on stabilities can be considered, but the resulting non-neutral evolutionary dynamics is significantly more complex due to population size effects . Neutrality of the fitness landscape is assumed here for the sake of simplicity, since otherwise the model would depend on at least two additional parameters, i.e. the smoothness of the fitness landscape and the effective size of the population, making it very difficult to reach clear conclusions about the effect of the genetic code and the mutation bias.
'The Echinoderm and Flatworm Mitochondrial Code' (NCBI transl_table = 9) ; Taxonomic range: Asterozoa (starfishes), Echinozoa (sea urchins), Rhabditophora among the Platyhelminthes.
Orange: 'The Invertebrate Mitochondrial Code' (NCBI transl_table = 5) ; Taxonomic range: Nematoda: Ascaris, Caenorhabditis; Mollusca: Bivalvia; Polyplacophora. Arthropoda/Crustacea: Artemia; Arthropoda/Insecta: Drosophila; Locusta migratoria (migratory locust), Apis mellifera (honeybee).
Magenta: 'The Ascidian Mitochondrial Code' (NCBI transl_table = 13) . Taxonomic range: Urochordata: Tunicates.
'The Vertebrate Mitochondrial Code' (NCBI transl_table = 2) ; Taxonomic range: Vertebrata.
'The Yeast Mitochondrial Code' (NCBI transl_table = 3) ; Taxonomic range: Saccharomyces cerevisiae, Candida glabrata, Hansenula saturnus, and Kluyveromyces thermotolerans.
'The Ciliate, Dasycladacean and Hexamita Nuclear Code' (NCBI transl_table = 6) . Taxonomic range: Ciliata: Oxytricha and Stylonychia, Paramecium, Tetrahymena, Oxytrichidae and probably Glaucoma chattoni. Dasycladaceae: Acetabularia and Batophora. Diplomonadida: Hexamita inflata, Diplomonadida ATCC50330 and ATCC50380.
'The Alternative Yeast Nuclear Code' (NCBI transl_table = 12) . Taxonomic range: Endomycetales (yeasts): Candida albicans, Candida cylindracea, Candida melibiosica, Candida parapsilosis, and Candida rugosa (However, other yeasts, including Saccharomyces cerevisiae, Candida azyma, Candida diversa, Candida magnoliae, Candida rugopelliculosa, Yarrowia lipolytica, and Zygoascus hellenicus, definitely use the standard (nuclear) code).
Unfolding and misfolding stabilities
We first study the direct effect of the eight different genetic codes on the average unfolding and misfolding stabilities (see Methods). Since we chose a neutral fitness landscape where mutations are either neutral or lethal, we expect that, independent of the mutation rate and the genetic code, the two folding stabilities will be close to the neutral thresholds, i.e. the minimum allowed stability values, which correspond to the maximum number of sequences, while larger stabilities correspond to many fewer sequences. However, large AT content (more than 50% AT) favors unfolding stability at the expense of misfolding stability, whereas small AT content (less than 50% AT) favors misfolding stability at the expense of unfolding stability. Consequently, for large AT content selection mainly acts on misfolding stability, which is expected to be closer to the neutral threshold, whereas unfolding stability is easily obtained and it is above the threshold. Conversely, for small AT content selection mainly acts on unfolding stability, which is expected to be close to its neutral threshold. In general, the smaller of the two stabilities is very close to the neutral threshold and almost independent of the genetic code, whereas the stability favored by the mutation process is above the neutral threshold, although this does not imply any gain in fitness, and it may depend on the genetic code and the mutation bias.
The behavior of the average unfolding stability -F for different genetic codes (as summarized in Fig. 2) is exemplified using the epsilon subunit of F1F0-ATP synthase (PDB id. 1aqt, chain A) and the transition-to-transversion ratio k = 2, see Fig. 3. The transition-to-transversion ratio k does not affect the results significantly, so that we omit the results for k = 20. As expected, there is hardly any influence of the genetic code on the average unfolding stability, except for extremely large AT biases (approx. 80% AT content or more) where unfolding stability is well above threshold and selection mainly acts on misfolding stability. In such cases, most alternative genetic codes display a smaller unfolding stability than the standard code (less negative F), but there are two exceptions, the 'Echinoderm and Flatworm Mitochondrial Code' (blue triangles) and the 'Alternative Yeast Nuclear Code' (red stars), which have slightly better unfolding stability than the standard code, even though the difference is minor and it does not imply any difference in fitness. The two other proteins we studied show a similar behavior (data not shown).
We see from the two figures that the genetic code may influence the balance between unfolding and misfolding stability. For instance, the 'Yeast Mitochondrial Code' (cyan diamonds) yields systematically lower stability against unfolding (less negative F) when compared with other codes, and higher stability against misfolding at low AT, which is an effect similar to the one obtained by decreasing the AT content. Nevertheless, these differences are not relevant, since they are buffered at the level of fitness. In fact, we assume a neutral model in which the fitness is either ℱ = 0 or ℱ = 1 (in arbitrary units), so that differences in stability do not yield differences in fitness. Moreover, notice that the difference of, say, stability against unfolding between different codes is only significant at an AT content at which this stability is anyway high, i.e. it is far from the neutral threshold, so that the selective pressure mainly affects stability against misfolding. These differences might become relevant in a non-neutral fitness landscape where fitness depends smoothly on stability .
Mutation and translation load
Next, we study the effect of the different genetic codes on mutation and translation loads (see Methods). These represent the fitness loss due to mutations and translation errors. The two loads differ by the rate at which a given error occurs and by the treatment of stop codons: In the case of mutation load, the rate of a mutation is given by the mutation process we use, which includes both the mutation bias and the transition-to-transversion ratio; mutations to stop codons are equivalent to mutations to sense codons as far as the chemical modification of the DNA sequence is concerned and hence included into the definition of the corresponding load (cf. Eq. (3) in Methods), and the associated fitness is zero. In the case of translation load, all mistranslation to sense codons are assigned equal rate. Since a premature end of translation by misinterpreting a sense codon as a stop codon is caused by release factors and not by tRNAs (and hence by a different mechanism than misinterpreting a sense codon as a another sense codon) which furthermore involves neighboring codons , for simplicity, we consider its error rate much smaller than the rate of missense errors in translation, and we neglect mistranslations to stop codons. In this way, the translation load is not explicitly influenced by the number of stop codons in the genetic code under consideration (cf. Eq. (4) in Methods).
It is interesting to note that there seems not to be any trivial dependence of the average mutation load on the number of stop codons in the genetic code. Since the mutation load is calculated including mutations to stop codons, one would expect genetic codes containing more stop codons to have a larger mutation load than genetic codes containing fewer stop codons. This is, however, not the case, as one sees in Fig. 5 by comparing the 'Echinoderm and Flatworm Mitochondrial Code' (blue triangles) and the 'Yeast Mitochondrial Code' (cyan diamonds), which both have two stop codons, with the standard genetic code, which has three stop codons. Additionally, as our definition of translation load excludes mistranslations corresponding to stop codons (as misinterpreting a sense codon as a stop codon is caused by release factors and not by tRNAs and hence by a different mechanism than misinterpreting a sense codon as another sense codon), the lower translation load of the 'Echinoderm and Flatworm Mitochondrial Code' (blue triangles) in comparison to the standard genetic code seen in Fig. 6 is not trivially due to the fact that this alternative genetic code has two instead of three stop codons.
Assumptions and empirical observations
Like all mathematical models of evolution, our model depends on several assumptions and parameters. An important assumption is that the population is genetically homogeneous, i.e. the product Nμ of population size times mutation rate is small. This assumption is considered approximately valid for eukaryotic and bacterial populations when considering an individual protein, in particular if population size is small. Strict validity of this assumption would imply that the number of different alleles at a typical locus is not larger than two. Despite that this is not the case, the number of alleles at a typical locus is usually small, so that the assumption is at least approximately valid. The high mutation rates of RNA viruses violate this assumption, and in this case recent work [54, 55] has shown that even the neutral model should be re-formulated in the framework of the quasi species theory . If we considered a whole evolving genome instead of a single protein, the approximation of very small mutation rate would not be justified, and a new interesting effect has to be considered, namely the hitch-hiking effect, which consists in the fixation of mildly disfavorable alleles driven by a positively selected allele present in the same chromosome. The mutation process was modelled using two parameters, the mutation bias and the transition-to-transversion ratio. While this parameterization might appear too simplified, it has the merit to focus on two variables whose relevance has been pointed out by a large number of experimental studies, statistical analysis, and models.
As evolutionary model, we adopt a model in spirit Kimura's neutral theory of molecular evolution [42, 43], in which mutations are either neutral or lethal. The assumption of neutrality prevents us to study how the genetic code affects the fitness that can be achieved in evolution (in the neutral model the fitness of a viable sequence is equal to one in arbitrary units by definition), however it allows to study its influence on the mutation and translation load without any further assumptions concerning the shape of the fitness landscape. Whether a mutation is neutral or lethal is decided based on unfolding and misfolding stabilities of the resulting amino acid sequence.
The ingredient of our model that seems more debatable is the genotype to phenotype mapping, which is based on predicted unfolding and misfolding stabilities. While we do not claim that our predictions are accurate in individual cases, our experience suggests that they are statistically correct, so that they are able to discover statistical trends, which is what we address here. However, a limitation of our approach consists in that the mitochondrial proteome mainly contains membrane proteins, whereas our predictions of unfolding and misfolding stabilities are only valid for soluble (i.e., non-membrane) proteins, hence implying caution about the interpretation of the deleterious effect of mitochondrial codes. Besides of that, we note that the statistical potentials that we use here are quite general, as they have been optimized based on all soluble globular proteins in the protein data bank, and they are not limited to a particular organism or protein family. An alternative way to derive empirical potentials for protein evolution consists in fitting the potentials to maximize the likelihood of the observed sequences, which provides an improved fit in an evolutionary context .
Another important point concerns the choice of the neutral thresholds αthr and Fthr.We have tested in previous studies that changing the neutral thresholds within reasonable limits (approx. 25% in both directions) does not significantly affect the results of neutral simulations [28–32].
Furthermore, we assume in our model that all synonymous mutations that do not change the amino acid sequence are neutral. Nevertheless, it is known that the use of alternative codons has important phenotypic effects on the translation dynamics [58, 59], and it can affect the rate of translation errors . Modelling these effects, however, would require assumptions on the abundance of different tRNA molecules and the dynamics of the ribosome that are outside the scope of our model. Therefore, selection on optimal codon usage is a way to reduce the load due to translation error that is complementary to the influence of the genetic code that we investigate here.
Finally, an interesting empirical observation that might be related with our protein evolution model is the finding that long genes tend to have lower codon usage bias . One of us previously observed that longer proteins have contact interaction energies that are less optimized than for shorter proteins. This finding has a simple neutral explanation, since the number of contact interaction per protein is larger in longer proteins, whereas the conformation entropy loss per residue that these interactions have to compensate does not depend on protein length . Therefore, contact interactions are subject to weaker selective constraints in long proteins. If this is true, and if the selective forces on codon bias are mainly due to the advantage of reducing folding problems in mistranslated proteins, which is now considered the prevailing view , one would expect that the selective forces on codon bias are also reduced for longer proteins, consistent with the empirical observation by Duret and Mouchiroud . This subject may be addressed in the framework of an improved model in which fitness takes into account both the two protein folding stabilities and the translation load.
Our results show that the standard genetic code is generally preferable to naturally occurring variants, in the sense that it typically yields smaller mutation and translation loads. This finding is consistent with the view that the standard genetic code is well adapted for reducing the consequences of translation errors on protein folding stability, as expected in the framework of the adaptive theory on the origin of the genetic code, and that the fixation of alternative genetic codes is either a slightly deleterious event that is mutationally driven or, if it brings some selective advantage, this is through the reduction of the number of tRNA needed.
Nevertheless, we found one alternative genetic code (the 'Echinoderm and Flatworm Mitochondrial Code') that seems to be better at reducing mutation and translation loads, and particularly yields better translation load except for very small AT content (approx. 20% AT content or less). As translation load is a very important constraint in protein evolution , this difference might result in small but significant fitness differences and hence be subject to positive selection. Therefore, although our model confirms the view that code changes are slightly deleterious events in the majority of the cases, it also suggests that adaptation cannot be excluded for one of the studied cases.
Unfolding and misfolding stability
where U(a, b) is the contact interaction matrix determined in Ref. . Although rather simple, this model is accurate enough to allow quantitative predictions of the folding free energy of small proteins that fold with two-state thermodynamics (the correlation coefficient between experimental and predicted free energy is r = 0.92 over a representative test set of 20 proteins, UB, unpublished result) and of the stability effect of mutations (correlation coefficient r = 0.72 over a set of 195 mutations, UB, unpublished result). This is comparable to state-of-the-art programs such as Fold-X . However, the computational simplicity of the model makes it affordable to use it for simulating very long evolutionary trajectories, which would not be possible using other tools. The unfolding free energy should also take into account the loss of conformation entropy upon folding, which we modelled in other works as sL with s being the chain entropy per amino acid and L the protein length. However, this term only induces a constant shift sL in the unfolding free energy, F'(A) = ∑ i<j C ij U(A i , A j ) + sL and its effect is just to shift the neutral threshold Fthr in the same direction, without influencing the results.
with A = 0.1, B = 4, q0 = 0.1, and N c = ∑i<jC ij . ⟨e⟩ A and σe,Aare the mean and standard deviation of the interaction energy of both native and non-native contacts in sequence A.
Mutations are modelled through the HKY process , in which the mutation rate from nucleotide n to n', T(n, n'), is μf (n') if n→n' is a transition, and μkf(n') if it is a transversion. The transition-to-transversion ratios used in this work are k = 2 and k = 20, suitable for nuclear and mitochondrial DNA, respectively [46, 47]. The microscopic rate μ is assumed to be very small and it does not affect the results. We further assume π(A) = π(T) and π(C) = π(G) (Chargaff second parity rule), so that the only parameter of the mutation model is the stationary AT frequency, AT = π(A) + π(T).
Simulation of the evolutionary process
Our model of protein evolution cannot be treated analytically, so that we have to study it using numerical simulations (see Fig. 1). We start each simulation with the native amino acid sequence obtained from the Protein Databank (PDB) of the chosen structure, from which we construct a corresponding 'native' DNA sequence by randomly choosing codons using the genetic code under consideration with weights determined by the given AT content (inverse translation). The simulation thus starts as close as possible to the equilibrium with the chosen AT content. The initial part of the trajectory is discarded to ensure that relevant quantities are sampled at the stationary state. To do so, we visually verified that the stabilities had reached the stationary state for simulations with the standard code, and then discarded the same transient part of the trajectory for all alternative codes.
The simulations are performed as follows. At every step, we randomly select one DNA site j with probability dependent on the nucleotide n j occupying it, and we extract the mutated nucleotide n' ≠ n j with probability proportional to T μ (n j ,n'). The mutated DNA is then translated to an amino acid sequence, whose unfolding and misfolding stabilities are computed through Eqs. (1) and (2). The mutation receives fitness ℱ = 1 (in arbitrary units) and is accepted if both F <Fthr and α > αthr, or gets a fitness ℱ = 0 and is rejected otherwise, hence assuming a model in spirit of Kimura's neutral theory of molecular evolution. Mutations to stop codons are considered lethal and receive a fitness ℱ = 0. As the mutation process is continuous, the waiting time until a new mutation arises is a Poissonian variable with mean μ-1. Instead of drawing an explicit waiting time for each mutation to arise, we assign each mutation the mean time μ-1 (this is equivalent to performing an average over possible realizations of waiting times). In case several mutations occur before one gets fixed, the weighting of the sequence before the accepted mutation is increased accordingly. The simulation is run for a large number of 106 mutations to obtain long evolutionary trajectories which are used to calculate the averages. The neutral thresholds Fthr and αthr are calculated for each simulated protein and kept fixed during the simulations. We set Fthr = γFnat and αthr = γαnat, where Fnat and αnat are the unfolding free energy and misfolding stability of the respective native amino acid sequence. The factor γ is chosen as γ = 0.98, so that the native sequence is considered viable. Changing γ within reasonable limits (approx. 25% in both directions) does not significantly effect the results.
Mutation and translation load
where Δℱ(A[n] → A[n']) is the fitness difference between amino acid sequence A[n'], as translated from the mutated DNA sequence n', and amino acid sequence A[n], and Rmut(n→n') is the rate of a mutation from n to n', which is calculated according to our mutation process with the mutation rate μ and only single nucleotide mutations are considered (i.e. only terms linear in μ, ignoring the higher order terms which have rates proportional to μ2 and μ3 and are hence much smaller than μ). Since we study a neutral model, so that the fitness of a viable sequence is ℱ = 1 (in arbitrary units) and ℱ = 0 otherwise, the fitness difference Δℱ(A[n] → A[n']) can only take the values 0 if A[n'] is a viable sequence as well, and 1 if A[n'] is not a viable sequence. As we restrict ourselves to those n' which differ from n by a single nucleotide, the sum in Eq. (3) contains 9L terms, and hence the normalization Nmut(n) ≡ ∑ n' Θ[Rmut(n → n')], where Θ[Rmut(n → n')] = 1 if Rmut(n→n')> 0 and 0 otherwise, yields Nmut(n)= 9L independent of n. Consequently, in the extreme case of all DNA sequence n' which differ from n by a single nucleotide (one point mutation) being viable and hence Δℱ(A[n] → A[n']) = 0 for all these n', the mutational load is Lmut (n) = 0. If the transition-to-transversion ratio was k = 1 so that Rmut(n→n') = μ for all n' differing from n by a single nucleotide, then μNmut = ∑ n' Rmut (n → n') and the mutation load Lmut (n) would be the fraction of lethal sequences n ' . Due to our choice for the normalization, there is no explicit dependence of Lmut(n) on sequence length.
where Rtrans (A[n] → A[n']) is the rate of a translation error resulting in amino acid sequence A[n'] instead of A[n] and v the rate of single nucleotide mismatches. For simplicity, we assume that Rtrans(n→n') = v if nucleotide sequence n' resulted from n by a single nucleotide mismatch and does not contain any stop codon and 0 otherwise (i.e. only terms linear in v are considered, as for the mutation load), so that our definition of the translation load does not depend on the error rate of translation, which is approximately 10-4 per translated mRNA codon  but may differ from species to species. We exclude nucleotide sequences n' containing a stop codon for the computation of the translation load since a premature end of translation by misinterpreting a sense codon as a stop codon is caused by release factors and not by tRNAs (and hence by a different mechanism than misinterpreting a sense codon as another sense codon) which furthermore depends on neighboring codons . For simplicity, we neglect here this error rate with respect to the rate of missense errors in translation. In this way, the translation load does not explicitly dependent on the number of stop codons, and the normalization Ntrans(n) = ∑ n' Θ[Rtrans(n → n')], where Θ[Rtrans(n → n')] = 1 if Rtrans(n → n') > 0 and 0 otherwise, does dependent on n. Even though we use the above general definition, Eq. (4), in analogy to the mutation load, note that with our choice for Rtrans(n → n'), νNtrans = ∑ n' Rtrans(n → n') and the translation load Ltrans(n) is the fraction of lethal sequences among all n' differing from n by a single nucleotide and not containing a stop codon. Due to our choice for the normalization, there is neither an explicit dependence of Ltrans(n) on sequence length nor on the number of stop codons in the genetic code considered.
We studied the three following proteins structures: (i) the epsilon subunit of F1F0-ATP synthase (PDB id. 1aqt, chain A, α + β protein, 135 amino acids, GenBank CBG36944.1), (ii) the acyl carrier protein (PDB id. 1hy8, chain A, all-β protein, 76 amino acids, GenBank BAA10975.1 and CAB13465.1), and (iii) the cold-shock protein (PDB id. 1c9o, chain A, all-β protein, 66 amino acids, GenBank CAA51790.1).
We acknowledge contributions of Andreas Buhr in early stages of this work. UB acknowledges financial support from the Spanish Science and Innovation Ministry through the Ramon y Cajal program and through the projects BIO2008-04384 and CSD2006-00023. Our collaboration was facilitated through the program "Acciones Integradas España-Alemania" of the Spanish Science and Innovation Ministry, project HA2006-0044, and of the Deutscher Akademischer Austauschdienst project D/06/12848.
- Koonin E, Nozozhilov A: Origin and Evolution of the Genetic Code: The Universal Enigma. IUBMB Life. 2009, 61: 99-111. 10.1002/iub.146.PubMed CentralView ArticlePubMedGoogle Scholar
- Nirenberg M, Jones W, Leder P, Clark B, Sly W, Pestka S: On the coding of genetic information. Cold Spring Harbor Symp Quant Biol. 1963, 28: 549-557.View ArticleGoogle Scholar
- Haig D, Hurst L: A quantitative measure of error minimization in the genetic code. J Mol Evol. 1991, 33: 412-417. 10.1007/BF02103132.View ArticlePubMedGoogle Scholar
- Freeland S, Hurst L: The genetic code is one in a million. J Mol Evol. 1998, 47: 238-248. 10.1007/PL00006381.View ArticlePubMedGoogle Scholar
- Freeland S, Knight R, Landweber L, Hurst L: Early fixation of an optimal genetic code. Mol Biol Evol. 2000, 17: 511-518.View ArticlePubMedGoogle Scholar
- Gilis D, Massar S, Cerf N, Rooman M: Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biology. 2001, 2: 11-10.1186/gb-2001-2-11-research0049.View ArticleGoogle Scholar
- Zhu C, Zeng X, Huang W: Codon usage decreases the error minimization within the genetic code. J Mol Evol. 2003, 57: 533-537. 10.1007/s00239-003-2505-7.View ArticlePubMedGoogle Scholar
- Goodarzi H, Nejad HA, Torabi N: On the optimality of the genetic code, with the consideration of termination codons. BioSystems. 2004, 77: 163-173. 10.1016/j.biosystems.2004.05.031.View ArticlePubMedGoogle Scholar
- Santos M, Moura G, Massey S, Tuite M: Driving change: The evolution of alternative genetic codes. Trends in Genetics. 2004, 20: 95-102. 10.1016/j.tig.2003.12.009.View ArticlePubMedGoogle Scholar
- Sella G, Ardell D: The coevolution of genes and genetic code: Crick's frozen accident revisited. J Mol Evol. 2006, 63: 297-10.1007/s00239-004-0176-7.View ArticlePubMedGoogle Scholar
- Hohn M, Park H, O'Donoghue P, Schnitzbauer M, Söll D: Emergenze of the universal genetic code imprinted in an RNA record. Proc Natl Acad Sci USA. 2006, 103: 18095-18100. 10.1073/pnas.0608762103.PubMed CentralView ArticlePubMedGoogle Scholar
- Freeland S, Wu T, Keulmann N: The case for an error minimizing standard genetic code. Origins of Life and Evolution of the Biosphere. 2003, 33: 457-477. 10.1023/A:1025771327614.View ArticlePubMedGoogle Scholar
- Itzkovitz S, Alon U: The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Research. 2007, 17: 405-412. 10.1101/gr.5987307.PubMed CentralView ArticlePubMedGoogle Scholar
- Drummond D, Wilke C: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134: 341-352. 10.1016/j.cell.2008.05.042.PubMed CentralView ArticlePubMedGoogle Scholar
- Gamow G: Possible relation between deoxyribonucleic acid and protein structure. Nature. 1954, 173: 318-10.1038/173318a0.View ArticleGoogle Scholar
- Sonneborn T: Degeneracy of the genetic code: Extent, nature, and genetic implications. Evolving Genes and Proteins. Edited by: Bryson V, Vogel H. 1965, New York: Academic Press, 377-397.Google Scholar
- Epstein C: Role of amino acid "code" and selection for confirmation in the evolution of proteins. Nature. 1966, 210: 25-28. 10.1038/210025a0.View ArticlePubMedGoogle Scholar
- Woese C: On the evolution of the genetic code. Proc Natl Acad Sci USA. 1965, 54: 1546-1552. 10.1073/pnas.54.6.1546.PubMed CentralView ArticlePubMedGoogle Scholar
- Goldberg A, Wittes R: Genetic code: Aspects of organization. Science. 1966, 153: 240-10.1126/science.153.3734.420.View ArticleGoogle Scholar
- Davies J, Gilbert W, Gorini L: Streptomycin, suppression, and the code. Proc Natl Acad Sci USA. 1964, 51: 883-890. 10.1073/pnas.51.5.883.PubMed CentralView ArticlePubMedGoogle Scholar
- Friedman S, Weinstein I: Lack of fidelity in the translation of ribopolynucleotides. Proc Natl Acad Sci USA. 1964, 52: 988-996. 10.1073/pnas.52.4.988.PubMed CentralView ArticlePubMedGoogle Scholar
- Wong J: Coevolution theory of the genetic code. Proc Natl Acad Sci USA. 1975, 72: 1909-1912. 10.1073/pnas.72.5.1909.PubMed CentralView ArticlePubMedGoogle Scholar
- Sengupta S, Higgs P: A unified model of codon reassignment in alternative genetic codes. Genetics. 2005, 170: 831-840. 10.1534/genetics.104.037887.PubMed CentralView ArticlePubMedGoogle Scholar
- Sengupta S, Yang X, Higgs P: The mechanisms of codon reassignments in mitochondrial genetic codes. J Mol Evol. 2007, 64: 662-688. 10.1007/s00239-006-0284-7.PubMed CentralView ArticlePubMedGoogle Scholar
- Andersson S, Kurland C: Genomic evolution drives the evolution of the translation system. Biochem Cell Biol. 1995, 73: 775-787. 10.1139/o95-086.View ArticlePubMedGoogle Scholar
- Andersson S, Kurland C: Reductive evolution of resident genomes. Trends Microbiol. 1998, 6: 263-268. 10.1016/S0966-842X(98)01312-2.View ArticlePubMedGoogle Scholar
- Bastolla U, Farwer J, Knapp E, Vendruscolo M: How to guarantee optimal stability for most representative structures in the Protein Data Bank. Proteins. 2001, 44: 79-96. 10.1002/prot.1075.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Roman H, Vendruscolo M: Lack of self-averaging in neutral evolution of proteins. Phys Rev Lett. 2002, 89: 208101-10.1103/PhysRevLett.89.208101.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Roman H, Vendruscolo M: Connectivity of neutral networks, overdispersion and structural conservation in protein evolution. J Mol Evol. 2003, 56: 243-254. 10.1007/s00239-002-2350-0.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Roman H, Vendruscolo M: Statistical properties of neutral evolution. J Mol Evol. 2003, 57: S103-S119. 10.1007/s00239-003-0013-4.View ArticlePubMedGoogle Scholar
- Porto M, Roman H, Vendruscolo M, Bastolla U: Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol. 2005, 22: 630-638. 10.1093/molbev/msi048.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Roman H, Vendruscolo M: A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol. 2006, 6: 43-10.1186/1471-2148-6-43.PubMed CentralView ArticlePubMedGoogle Scholar
- Babajide A, Hofacker I, Sippl M, Stadler P: Neutral networks in protein space. Fol Des. 1997, 2: 261-269. 10.1016/S1359-0278(97)00037-0.View ArticleGoogle Scholar
- Bussemaker H, Thirumalai D, Bhattacharjee J: Thermodynamic stability of folded proteins against mutations. Phys Rev Lett. 1997, 79: 3530-3533. 10.1103/PhysRevLett.79.3530.View ArticleGoogle Scholar
- Govindarajan S, Goldstein R: On the thermodynamic hypothesis of protein folding. Proc Natl Acad Sci USA. 1998, 95: 5545-5549. 10.1073/pnas.95.10.5545.PubMed CentralView ArticlePubMedGoogle Scholar
- Tiana G, Broglia R, Roman H, Vigezzi E, Shakhnovich E: Folding and misfolding of designed proteinlike chains with mutations. J Chem Phys. 1998, 108: 757-761. 10.1063/1.475435.View ArticleGoogle Scholar
- LA Mirny VA, Shakhnovich E: How evolution makes proteins fold quickly. Proc Natl Acad Sci USA. 1998, 95: 4976-4981. 10.1073/pnas.95.9.4976.View ArticleGoogle Scholar
- Bornberg-Bauer E, Cha H: Modeling evolutionary landscapes: Mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA. 1999, 96: 10689-10694. 10.1073/pnas.96.19.10689.PubMed CentralView ArticlePubMedGoogle Scholar
- Dokholyan N, Shakhnovich E: Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001, 312: 289-307. 10.1006/jmbi.2001.4949.View ArticlePubMedGoogle Scholar
- Parisi G, Echave J: Structural constraints and emergence of sequence patterns in protein evolution. Mol Biol Evol . 2001, 18: 750-756.View ArticlePubMedGoogle Scholar
- Drake J: Avoiding dangerous missense: Thermophiles display especially low mutation rates. PLoS Genetics. 2009, 5: e1000520-10.1371/journal.pgen.1000520.PubMed CentralView ArticlePubMedGoogle Scholar
- Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.View ArticlePubMedGoogle Scholar
- Graur D, Li W: The neutral theory of molecular evolution. 1983, Cambridge University Press, CambridgeGoogle Scholar
- Mendez R, Fritsche M, Porto M, Bastolla U: Mutation bias favors protein folding stability in the evolution of small populations. PLoS Comp. Biol. 2010, 6: e1000767-10.1371/journal.pcbi.1000767.View ArticleGoogle Scholar
- Bastolla U, Moya A, Viguera E, van Ham R: Genomic determinants of protein folding thermodynamics. J Mol. Biol. 2004, 343: 1451-1466. 10.1016/j.jmb.2004.08.086.View ArticlePubMedGoogle Scholar
- Graur D, Li W: Fundamentals of molecular evolution. 2000, Sinauer, SunderlandGoogle Scholar
- Belle E, Piganeau G, Gardner M, Eyre-Walker A: An investigation of the variation in the transition bias among various animal mitochondrial DNA. Gene. 2005, 355: 58-66. 10.1016/j.gene.2005.05.019.View ArticlePubMedGoogle Scholar
- Yokobori S, Suzuki T, Watanabe K: Genetic code variations in mitochondria: tRNA as a major determinant of genetic code plasticity. J Mol Evol. 2001, 53: 314-326. 10.1007/s002390010221.View ArticlePubMedGoogle Scholar
- Anderson S, Bankier A, Barrell B, de Bruijn M, Coulson A, Drouin J, Eperon I, Nierlich D, Roe B, Sanger F, Schreier P, Smith A, Staden R, Young I: Sequence and organization of the human mitochondrial genome. Nature. 1981, 290: 457-465. 10.1038/290457a0.View ArticlePubMedGoogle Scholar
- Clark-Walker G, Weiller G: The structure of the small mitochondrial DNA of kluyveromyces thermotolerans is likely to reflect the ancestral gene order in fungi. Biochimica et Biophysica Acta. 1995, 1228: 1-27. 10.1016/0005-2728(94)00161-W.View ArticleGoogle Scholar
- Hoffman D, Anderson R, DuBois M, Prescott D: Macronuclear gene-sized molecules of hypotrichs. Nucl Ac Res. 1995, 23: 1279-1283. 10.1093/nar/23.8.1279.View ArticleGoogle Scholar
- Santos M, Keith G, Tuite M: Non-standard translational events in candida albicans mediated by an unusual seryl-tRNA with a 5'-cag-3' (leucine) anticodon. The EMBO Journal. 1993, 12: 607-616.PubMed CentralPubMedGoogle Scholar
- Freistroffer D, Kwiatkowski M, Buckingham R, Ehrenberg M: The accuracy of codon recognition by polypeptide release factors. Proc. Natl. Acad. Sci. USA. 2000, 97: 2046-2051. 10.1073/pnas.030541097.PubMed CentralView ArticlePubMedGoogle Scholar
- van Nimwegen E, Crutchfield J, Huynen M: Neutral evolution of mutational robustness. Proc Natl Acad Sci USA. 1999, 96: 9716-9720. 10.1073/pnas.96.17.9716.PubMed CentralView ArticlePubMedGoogle Scholar
- Wilke C: Molecular clock in neutral protein evolution. BMC Genetics. 2004, 5: 25-10.1186/1471-2156-5-25.PubMed CentralView ArticlePubMedGoogle Scholar
- Eigen M: Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften. 1971, 58: 465-523. 10.1007/BF00623322.View ArticlePubMedGoogle Scholar
- Kleinman C, Rodrigue N, Lartillot N, Philippe H: Statistical potentials for improved structurally constrained evolutionary models. Mol Biol Evol. 2010,Google Scholar
- Cannarozzi G, Schraudolph N, Faty M, von Rohr P, Friberg M, Roth A, Gonnet P, Gonnet G, Barral Y: A role for codon order in translation dynamics. Cell. 2010, 141: 355-67. 10.1016/j.cell.2010.02.036.View ArticlePubMedGoogle Scholar
- Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, Pan T, Dahan O, Furman I, Pilpel Y: An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010, 141: 344-54. 10.1016/j.cell.2010.03.031.View ArticlePubMedGoogle Scholar
- Hershberg R, Petrov D: Selection on codon bias. Annu Rev Genet. 2008, 42: 287-99. 10.1146/annurev.genet.42.110807.091442.View ArticlePubMedGoogle Scholar
- Duret L, Mouchiroud D: Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA. 1999, 96: 4482-7. 10.1073/pnas.96.8.4482.PubMed CentralView ArticlePubMedGoogle Scholar
- Bastolla U, Demetrius L: Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds. Protein Eng Des Sel. 2005, 18: 405-415. 10.1093/protein/gzi045.View ArticlePubMedGoogle Scholar
- Guerois R, Nielsen JE, Serrano L: Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. J Mol Biol. 2002, 320: 369-387. 10.1016/S0022-2836(02)00442-4.View ArticlePubMedGoogle Scholar
- Derrida B: Random Energy Model: an exactly solvable model of disordered systems. Phys Rev B. 1981, 24: 2613-2626. 10.1103/PhysRevB.24.2613.View ArticleGoogle Scholar
- Shakhnovich E, Gutin A: Formation of unique structure in polypeptide chains. Theoretical investigation with the aid of a replica approach. Biophys Chem. 1989, 34: 187-199. 10.1016/0301-4622(89)80058-4.View ArticlePubMedGoogle Scholar
- Hasegawa M, Kishino H, Yano T: Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985, 22: 160-174. 10.1007/BF02101694.View ArticlePubMedGoogle Scholar
- Kurland C: Translational accuracy and the fitness of bacteria. Annu Rev Genet. 1992, 26: 29-50.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.