Phenotypic error threshold; additivity and epistasis in RNA evolution
© Takeuchi et al; licensee BioMed Central Ltd. 2005
Received: 28 October 2004
Accepted: 03 February 2005
Published: 03 February 2005
The error threshold puts a limit on the amount of information maintainable in Darwinian evolution. The error threshold was first formulated in terms of genotypes. However, if a genotype-phenotype map involves redundancy ("mutational neutrality"), the error threshold should be formulated in terms of phenotypes since there is no unique fittest genotype. A previous study formulated the error threshold in terms of phenotypes, and their results showed that a rather low degree of mutational neutrality can increase the error threshold unlimitedly.
We obtain an analytical formulation of the phenotypic error threshold by considering the "additive assumption", in which base substitutions do not influence each other (no epistasis). Our formulation shows that an increase of the error threshold due to mutational neutrality is limited. Computer simulations of RNA evolution are conducted to verify our formulation, and the results show a good agreement between the analytical prediction and the simulations. The comparison with the previous formulation illustrates that it is important for the prediction of the error threshold to consider that the number of base substitutions per replication is rather large near the error threshold. To examine the additive assumption, a detailed analysis of additivity and epistasis in RNA folding of a particular sequence is performed. The results show a high degree of epistasis in RNA folding; furthermore, the analysis also elucidates the reason of the success of the additive assumption.
We conclude that an increase of the error threshold by mutational neutrality is limited, and that the additive assumption achieves a good prediction of the error threshold in spite of a high degree of epistasis in RNA folding because the average number of base substitutions of sequences retaining the phenotype per replication is sufficiently small to avoid of the effect of epistasis.
The error threshold is a limit on the permissible mutation rate for which "survival of the fittest" holds in Darwinian evolution . The error threshold can be seen as a limit on the amount of information maintainable in evolutionary systems (information threshold) since an increase in sequence length results in an increase in error rate. The information threshold leads to a paradox in prebiotic evolution . Suppose that to increase the maintainable amount of information, an evolving system must acquire a more complex molecular mechanism to reduce the mutation rate. However, to have such a complex molecular mechanism the system must maintain a longer sequence in the first place. Thus, the system will encounter a barrier in the evolution of complexity (cf. ).
The error threshold was first formulated in terms of genotypes. However if some changes in genotype do not alter the phenotype or the fitness (mutational neutrality), there is no unique genotype which can be stably maintained. Instead, the survival of a phenotype should be considered, and thus the error threshold should be formulated in terms of phenotypes .
In this paper, first we formulate the phenotypic error threshold analytically by employing the additive assumption, in which base substitutions do not influence each other. Under the additive assumption, we obtain the probability that a replication does not alter the phenotype (neutral replication) as a function of the number of base substitutions and the fraction of "neutral substitutions". Our results show a qualitative difference from the previous formulation . Second, we analyze epistasis in RNA folding of a particular RNA sequence to examine the additive assumption.
Results and discussion
Phenotypic error threshold
The quasispecies equation describes (prebiotic) replicator dynamics in well-mixed systems . We transform the equation in two ways: (1) describing the abundance of phenotypes instead of that of genotypes by denoting the population of genotypes which share the same phenotype by one variable (see  for mathematical details); (2) distinguishing only two classes of phenotypes, the focal phenotype (denoted by x) and the others called mutants (denoted by y). The population size is assumed to be large enough to express the abundance of the phenotypes by normalized concentration. The population dynamics of the phenotypes is described as
dx/dt = σ Qx + σ Λ (1 - Q)x - Dx - Φx,
dy/dt = y + σ (1 - Λ)(1 - Q)x - Dy - Φy (1)
where σ (> 1) is the replication rate of x (that of y is normalized to 1); Q is the replication accuracy of x. Λ is the fraction of neutral mutants of x. D is the degradation rate (or the death rate) assumed to be uniform over the phenotypes. Φ = (σ - D)x + (1 - D)y is the excess production (or the mean fitness). The terms - Φx and - Φy induce a selection pressure. We neglect back mutation from y to x (this simplification will be discussed in the next section). From Eq. 1, we obtain the survival condition of x as Q + Λ (1 - Q) >σ-1, for which the stationary value of x is larger than zero. From this inequality, we will deduce the phenotypic error threshold, i.e., the maximum error rate of replication for which x can be stably maintained in the system.
We distinguish two classes of single base substitutions: neutral and deleterious substitutions. The class of a substitution is determined by the effect of the substitution on the phenotype when there are no other substitutions in the genotype: A substitution which retains the phenotype is a neutral substitution; otherwise it is a deleterious substitution. A beneficial substitution is not considered since the focus of the study is on the maintenance of x. [A replicator is thought of as a polymer in our study. We refer to a monomer as a base, having RNA in mind. The formulation of the phenotypic error threshold itself is independent of this terminology.]
To calculate the effective replication accuracy Q e = Q + Λ (1 - Q), we assume that a mutant is neutral iff there is no deleterious substitution: The effect of a substitution on the phenotype is independent of the other substitutions; i.e., no epistasis is assumed (the additive assumption). Let λ denote the fraction of neutral substitutions in all possible single substitutions, and let d denote the number of substitutions per replication. Then, the probability that d substitutions are all neutral substitutions (thus neutral replication) is approximated by λ d by assuming that the number of neutral substitutions in d substitutions follows the binomial distribution (the binomial approximation). This approximation is valid if the probability of correct replication per base (denoted by q) is sufficiently large so that d is small. Denoting the sequence length of replicators by N, the effective replication accuracy (Q e ) is obtained as
by assuming that q and λ are uniform among the genotypes in x, that q is invariable over sequence positions, and that N is the same among the populations. [A similar formula was obtained in  as Q e = (q + v(1 - q)) N , where v is a parameter to be tuned to match the formula to the observed value of Q e . Therefore v in  implicitly involves both the additive effect and epistasis.]
The minimum q for which x can survive is derived from Q + Λ(1 - Q) >σ-1 and Eq. 2 as
qmin = (σ-1/N- λ)/(1 - λ). (3)
From Eq. 3, we obtain the information threshold, i.e., the maximum permissible sequence length as
Nmax = ln(σ-1)/ln(q + (1 - q)λ). (4)
Comparison between the analytical prediction and computer simulations
We compare our analytical prediction with computer simulations. Our computer program simulates the evolution of RNA replicators in a well-mixed flow reactor (e.g. see ). In the simulations, each RNA sequence replicates and/or is diluted (be taken out from the reactor) with a certain probability in every time step. RNA folding is utilized again as a genotype-phenotype map (computed by ): The fitness of an RNA sequence depends on the secondary structure (i.e., the phenotype) of the RNA sequence. The fittest phenotype is set to the secondary structure of a yeast tRNAphe (the clover leaf structure, N = 76). RNA sequences which have the fittest phenotype replicate with the probability 0.01 per time step; all the other RNA sequences (mutants) replicate with the probability 0.001 per time step (thus σ = 10). The replication introduces mutations with a certain probability. Back mutations are not allowed to occur – the effect of back mutations is negligible if the sequence length is large enough  (this was confirmed by the simulations which were the same as the above except for allowing back mutations [data not shown]). The dilution probability (Φ) is calculated as the average probability of replication divided by the target population size. The target population size is set to 10000. All the simulations start with 10000 yeast tRNAphe sequences. The degradation of sequences is ignored (D = 0).
The results of the computer simulations showed that the "representative λ" (defined in Methods section – Non-uniform distribution of λ) of the fittest sequences increased from 0.307 to ca. 0.40 for the examined values of the error rate (data not shown). (This is also true for the population average of λ [op. cit.].) The value of the representative λ fluctuates over the time (st. dev. = 0.01 at 1 - q = 0.0475).
Comparison with a previous formulation
Reidys et al.  derived the phenotypic error threshold as
from Q e = Q + λ (1 - Q). This equation shows an unlimited increase in the error threshold for λ ≥ σ-1 (see the dashed line in Fig. 1 and the [green] dotted line in Fig. 3). However, Q e = Q + λ (1 - Q) is valid only if either (1) a neutral set is uniformly distributed over the genotype space [a neutral set is a set of genotypes where all genotypes map to the same phenotype], or (2) q is so large that most mutants have d = 1. The uniform distribution of neutral sets in the genotype space is not applicable in RNA folding as shown later. The latter possibility is discussed next.
Reidys et al.  obtained an extension of Eq. 5, the so called "four λ approximation". This extension divides a sequence in four sub-sequences in order to take into account the fact that the fraction of neutral substitutions varies over the sequence position. This extension still overestimates Q e though less so than Eq. 5 because the approximation now permits four substitutions per replication as a side effect of the subdivision. Note that this extension makes a fairly good prediction on the error threshold (see the [blue] dashed line Fig. 3) because the use of a small non-evolved λ value coincidentally cancels out the overestimation.
In conclusion, it is crucial for the calculation of the error threshold to consider that the number of substitutions per replication is large near the error threshold.
Epistasis in RNA folding
The rather impressive success of the additive assumption is counter-intuitive in view of RNA folding, in which many interactions occur between bases. In the next part of the paper, we study a particular RNA sequence, namely the yeast tRNAphe (which comprises the initial population of the previously described RNA evolution simulations), in terms of additivity and epistasis. The objective of this study is to understand how the additive assumption achieves a good prediction in spite of a high degree of nonlinearity in RNA folding [12, 13].
Definition of positive and negative epistasis.
δ = 0
δ > 0
We calculate the effective replication accuracy (Q e ) including the effect of epistasis in order to compare it with Q e calculated under the additive assumption. The first trial was to include a "trivial" epistasis in base paired regions (helices) as a part of the additive effect (see Methods section – Trivial epistasis). However, the analysis showed that epistasis occurs mainly in a "non-trivial" way (data not shown), and thus it is not sufficient for our sake to include a trivial epistasis. We next took a probabilistic approach to calculate Q e with epistasis (see Methods section – Probabilistic approach). The results of this method agree with the observation (see the dashed line in Fig. 5).
We compare Q e calculated under the additive assumption to that calculated with epistasis as shown in Fig. 6b (the solid line). As the comparison shows, the additive assumption indeed underestimates Q e ; however, the underestimation becomes prominent only if the error rate is higher than the error threshold (1 - qmin = 0.05). As Fig. 6b (the dashed line) shows, the average d per neutral replication is ca. 1.5 at the error threshold, which is much smaller than the average d per replication (ca. 3.8). This means that the main contribution to Q e under the additive assumption is from the mutants of d = 1 or 2 at the error threshold. According to Fig. 6a the additive assumption is a good approximation at d = 1 or 2. Therefore, the additive assumption accurately estimates Q e and thus the error threshold. When the average d per neutral replication reaches 3, the additive assumption substantially underestimates Q e (see Fig. 6b), which is consistent with Fig. 6a. [However, note that this analysis does not imply that the average number of substitutions per replication is safely assumed to be near one. On the contrary, in our calculation it was more than one – actually 3.8 – at the error threshold.]
In the above examination of the additive assumption, there are two points which must be examined further: (1) The analysis of epistasis was performed on a yeast tRNAphe, which comprises the initial population of the RNA evolution simulations, but the results may differ if the analysis is done for a sequence which appears later in the RNA evolution simulations. Thus, we performed the same analysis to a sequence which was chosen from the population of the fittest sequences after the evolution in the simulations (at the 20000th time step). The results, however, did not change our conclusion (data not shown). (2) If the length of sequences is larger, the average d per neutral replication may increase, and thus the additive assumption may break down before the error threshold. However, it turns out from the analytical calculation that the average d per neutral replication at the error threshold decreases as N increases when λ is invariant (cf. the caption of Fig. 6b). Furthermore, λ decrease as N increases (see the filled circles in Fig. 2). Therefore, if the sequence length is larger, the average d per neutral replication will be actually smaller. We also conducted computer simulations of RNA evolution with a longer sequence length (200 bases). The results showed that the average d per neutral replication (calculated under the additive assumption) at the error threshold was indeed smaller (ca. 1.2 substitutions with λ ≈ 0.35) than in the previous case of the shorter sequence length (ca. 1.5 substitutions with λ ≈ 0.4), and the additive assumption still predicts the results of the simulations closely (data not shown).
• The phenotypic error threshold was formulated under the additive assumption. The formulation asserted that mutational neutrality increases the error threshold but the increase is limited.
• The importance of considering multiple substitutions per replication at the error threshold was illustrated.
•The comparison with the computer simulations and the analysis of epistasis showed that the additive assumption correctly estimates the effective replication accuracy (Q e ) and thus the error threshold.
•The reason why the additive assumption achieves a good prediction of the error threshold in spite of a high degree of (non-trivial) epistasis in RNA folding is that the average number of substitutions per neutral replication is small enough to avoid of the effect of epistasis.
Non-uniform distribution of λ
If λ is not uniform over the genotypes sharing the same phenotype, the effective replication accuracy (Q e ) depends on the distribution of the genotypes in the population. In this case, Q e is calculated under the additive assumption as
where X I denotes the population of the focal phenotype (I), and x i (resp. λ i ) is the population (resp. the fraction of neutral substitutions) of the genotype i. The set S I denotes the set of genotypes which have the phenotype I. If x i and λ i are known, the representative λ of the phenotype can be calculated from the following equation as
The difference between the representative λ and the population average of λ was very small in the computer simulations. (The population average was always slightly smaller [ca. 99%] than the representative λ unless the distribution of λ in the fittest population is completely homogeneous [data not shown].)
Calculation of Q e with epistasis
Trivial epistasis in RNA folding
It is trivial that epistasis occurs between bases which make a pair (hydrogen bond) in the reference secondary structure. Our first trial to include epistasis in the calculation of Q e was to include this epistasis as a part of the additive effects of mutations as described in . In this approach, the reference sequence is subdivided into non-paired regions and paired regions; paired regions are treated as strings of base pairs (one pair of bases is considered as one character); a substitution of a base pair is considered as an elementary step of mutations in paired regions. Following this procedure, the epistasis occurring between bases in a pair is now treated as an additive effect. [For example, two mutations – GC→GG and GC→CC – occurring in paired base must be deleterious because the bases can not make a pair any more. Given that the combined mutation – GC→CG – is neutral, it will be a case of positive epistasis in the previous procedure. However, in the new procedure it will be a case of additive neutral because the combined mutation is treated as one substitution of a base pair.] We categorized the mutants into the previously defined four groups of mutants (i.e. additive neutral, additive deleterious, positive epistasis and negative epistasis) using the same data as that of Fig. 5. However, the result did not differ much from that shown in Fig. 5 (data not shown). We conclude that epistasis occurs mainly in a non-trivial way, and thus this approach is not effective for our purpose.
Since (non-trivial) epistasis makes it difficult to predict what happens to the phenotype given a specific change in genotype, we take the following probabilistic approach: We assume that a mutant is neutral with a certain probability (denoted by μ(v, δ)), which depends on the number of neutral base substitutions (denoted by v) and on that of deleterious base substitutions (denoted by δ). Then, the probability of neutral replication is obtained (by using the binomial approximation) as
where d = v + δ. Q e is thereupon derived as
where ε n , ε d and ε nd are the epistatic parameters of the interactions among neutral substitutions, among deleterious substitutions, and between neutral and deleterious substitutions, respectively. Note that in the additive assumption, all epistatic parameters are zero. α and η represent non-exponential decay. To express the compensation by neutral substitutions, we arbitrarily used a saturation function β v/(γ + v) where β and γ are parameters. To obtain the parameters, we fitted Eq. 10 to the data in Fig. 7ab (the solid lines) as explained in the caption. As shown in Fig. 7a (the dotted lines), the theoretical estimation turns out to be a slight underestimation. (d) was calculated from the above obtained parameters, and the calculated values match the observed ones (see the dashed line in Fig. 5).
- Eigen M: Selforganization of Matter and the Evolution of Biological Macromolecules. Naturwissenschaften. 1971, 58: 465-523. 10.1007/BF00623322.View ArticlePubMedGoogle Scholar
- Maynard Smith J, Szathmary E: Chaper 4. The Major Transitions in Evolution. 1997, New York: Oxford University Press, 41-58. reprintGoogle Scholar
- van Nimwegen E, Crutchfield JP: Metastable Evolutionary Dynamics: Crossing Fitness Barriers or Escaping via Neutral Paths?. Bull Math Biol. 2000, 62: 799-848. 10.1006/bulm.2000.0180.View ArticlePubMedGoogle Scholar
- Huynen MA, Stadler PF, Fontana W: Smoothness within ruggedness: the role of neutrality in adaptation. Proc Natl Acad Sci USA. 1996, 93: 397-401. 10.1073/pnas.93.1.397.PubMed CentralView ArticlePubMedGoogle Scholar
- Reidys C, Forst CV, Schuster P: Replication and mutation on neutral networks. Bull Math Biol. 2001, 63: 57-94. 10.1006/bulm.2000.0206.View ArticlePubMedGoogle Scholar
- Wilke CO: Selection for fitness versus selection for robustness in RNA secondary structure folding. Evolution. 2001, 55: 2412-2420.View ArticlePubMedGoogle Scholar
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast Folding and Comparison of RNA Secondary Structures. Monatsh Chem. 1994, 125: 167-188. 10.1007/BF00818163.View ArticleGoogle Scholar
- Fontana W, Schuster P: A computer model of evolutionary optimization. Biophys Chem. 1987, 26: 123-47. 10.1016/0301-4622(87)80017-0.View ArticlePubMedGoogle Scholar
- Eigen M, McCaskill J, Schuster P: The molecular quasi-species. Adv Chem Phys. 1989, 75: 149-263.Google Scholar
- Nowak M, Schuster P: Error Thresholds of Replication in Finite Populations Mutation Frequencies and the Onset of Muller's Ratchet. J Theor Biol. 1989, 137: 375-395.View ArticlePubMedGoogle Scholar
- van Nimwegen E, Crutchfield JP, Huynen M: Neutral evolution of mutational robustness. Proc Natl Acad Sci USA. 1999, 96: 9716-9720. 10.1073/pnas.96.17.9716.PubMed CentralView ArticlePubMedGoogle Scholar
- Huynen MA, Konings DA, Hogeweg P: Multiple coding and the evolutionary properties of RNA secondary structure. J Theor Biol. 1993, 165: 251-267. 10.1006/jtbi.1993.1188.View ArticlePubMedGoogle Scholar
- Wilke CO, Lenski RE, Adami C: Compensatory mutations cause excess of antagonistic epistasis in RNA secondary structure folding. BMC Evol Biol. 2003, 3: 3-10.1186/1471-2148-3-3.PubMed CentralView ArticlePubMedGoogle Scholar
- Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP: RNA-Catalyzed RNA Polymerization: Accurate and General RNA-Templated Primer Extension. Science. 2001, 292: 1319-1325. 10.1126/science.1060786.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.