Evolution of phage with chemically ambiguous proteomes

Background: The widespread introduction of amino acid substitutions into organismal proteomes has occurred during natural evolution, but has been difficult to achieve by directed evolution. The adaptation of the translation apparatus represents one barrier, but the multiple mutations that may be required throughout a proteome in order to accommodate an alternative amino acid or analogue are an even more daunting problem. The evolution of a small bacteriophage proteome to accommodate an unnatural amino acid analogue can provide insights into the number and type of substitutions that individual proteins will require to retain functionality.
Results: The bacteriophage Qβ initially grows poorly in the presence of the amino acid analogue 6-fluorotryptophan. After 25 serial passages, the fitness of the phage on the analogue was substantially increased; there was no loss of fitness when the evolved phage were passaged in the presence of tryptophan. Seven mutations were fixed throughout the phage in two independent lines of descent. None of the mutations changed a tryptophan residue.
Conclusions: A relatively small number of mutations allowed an unnatural amino acid to be functionally incorporated into a highly interdependent set of proteins. These results support the 'ambiguous intermediate' hypothesis for the emergence of divergent genetic codes, in which the adoption of a new genetic code is preceded by the evolution of proteins that can simultaneously accommodate more than one amino acid at a given codon. It may now be possible to direct the evolution of organisms with novel genetic codes using methods that promote ambiguous intermediates.


Background
Organismal proteomes are generally thought of as being chemically distinct, in the sense that a genetic code is maintained by codon:anticodon interactions and the specificities of aminoacyl-tRNA synthetases will almost always lead to the translation of mRNAs into proteins of defined sequence and chemical composition. While alternative codes are known [1], these also yield chemically distinct proteomes. The evolution of an organism with novel codon:anticodon interactions and aminoacyl-tRNA synthetase specificities may produce proteins whose sequences and compositions differ from those generated by an organism with the 'Universal' code, but still will not produce proteins that have multiple, different amino acids at a given sequence position.
This chemical distinctness of organismal proteomes is maintained by the relatively low rate of amino acid misincorporation that occurs during protein biosynthesis. Many aminoacyl-tRNA synthetases have been found to have at least a thousand-fold preference for their cognate amino acid, after editing (reviewed in [2]). EF-Tu further discriminates between cognate and non-cognate codon:anticodon pairs prior to and after GTP hydrolysis [3]. Because of these mechanisms, the overall error rate for amino acid insertion into proteins is typically no higher than 3 × 10⁻³, and frequently lower [2][3][4].
Although amino acid misincorporations seldom occur in Nature, chemically ambiguous proteomes can be generated in laboratory settings. Many aminoacyl-tRNA synthetases will efficiently charge tRNA molecules with amino acid analogues [2]. In particular, the ability of the Bacillus subtilis tryptophanyl-tRNA synthetase to discriminate against fluorine-substituted analogues of tryptophan has been examined. Discrimination against 4-fluorotryptophan (4fW) was only 6-fold, while discrimination against 6-fluorotryptophan (6fW) was 20-fold [5]. Consistent with this, Escherichia coli strains that are transiently grown in the presence of high concentrations of fluorotryptophan analogues will incorporate a mixture of natural and unnatural amino acids throughout their proteomes [6][7][8][9][10]. Similarly, norleucine and norvaline have been shown to be synthesized as side-products of branched chain amino acid biosynthesis [2]. Norleucine is incorporated with alacrity into proteins, replacing up to 20% of methionine residues once methionine has been exhausted during protein overexpression [11,12].
Such chemical ambiguity typically exacts a phenotypic cost. An E. coli auxotroph selected to grow continuously on a high proportion of 4fW [6] accumulated 5 (identified) mutations in three genes responsible for tryptophan incorporation (tryptophanyl tRNA synthetase, aromatic amino acid permease, and a transcriptional repressor of aromatic amino acid permease). Nonetheless, the evolved strain grew extremely poorly, and had a doubling time of over a day. E. coli mutants selected to grow with cysteine incorporated at a valine codon accumulated mutations in the editing domain of valyl-tRNA synthetase [13]. Increased mischarging led to the substitution of 24% of valines with aminobutyrate. Finally, the yeast Candida spp. has been found to ambiguously (albeit inefficiently [14]) translate the leucine codon CUG as serine [15]. This ambiguous tRNA was transferred to Saccharomyces cerevisiae on a plasmid, and the dual incorporation of serine and leucine throughout the yeast proteome resulted in a 50% decrease in growth rate [16].
The design or evolution of organisms with novel genetic codes has been undertaken by a number of groups [6,13,17][18][19]. One potential route to the evolution of an organism with a novel genetic code is to initially select for the mixed incorporation of natural and unnatural amino acids throughout the proteome [6]. Growth defects that arise from the misincorporation of the amino acid analogue can potentially be ameliorated by the evolution of those proteins whose functions are inhibited by the analogue. Such chemically ambiguous proteomes might then further evolve over time to fully incorporate the analogue. In order to better understand the initial route of adaptation of an organismal proteome to chemical ambiguity, we chose to adapt a simple proteome, that of bacteriophage Qβ, to function in the presence of an amino acid analogue.

Results and Discussion
The evolution of phage that could utilize or tolerate an amino acid analogue required the availability of a host that could grow on the analogue. We and others have previously shown that E. coli can be grown in high concentrations of fluorotryptophan analogues, with concomitantly high incorporation of the analogues into cellular proteins [6][7][8][9][10]. The replication of Qβ phage was therefore examined in an E. coli auxotroph grown in the presence of a series of tryptophan analogues. The number of doublings in 20 hours was used as a measure of fitness, and was determined using a standardized assay. While most analogues did not seem to affect phage growth, 6fW significantly depressed Qβ fitness, decreasing the number of doublings by ca. 10-fold in a standard assay ( Figure 1). This is equivalent to an approximately 180 million-fold smaller increase in titer over 20 hours. We therefore chose to adapt phage to 95% 6fW. As expected based on previous experiments with tryptophan analogues, 6fW was found to be incorporated into cellular proteins at a level of approximately 60%, irrespective of whether a single, isolated protein or the bacterial proteome was analyzed (Table 1).
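The relationship between a difference in doublings and a fold-difference in titer increase can be checked with a short calculation. The sketch below is illustrative only: the function names are our own and are not part of the original assay, and it assumes nothing more than that each doubling corresponds to a factor of two in titer.

```python
import math


def fold_from_doublings(delta_doublings: float) -> float:
    """Fold-difference in titer increase implied by a difference in doublings."""
    return 2.0 ** delta_doublings


def doublings_from_fold(fold: float) -> float:
    """Difference in doublings implied by a fold-difference in titer increase."""
    return math.log2(fold)


# A 180 million-fold smaller increase in titer corresponds to
# log2(1.8e8) fewer doublings over the 20 hour assay.
delta = doublings_from_fold(1.8e8)
print(f"{delta:.1f} fewer doublings")  # prints "27.4 fewer doublings"
```

Because fitness is measured on a log2 scale, modest differences in the number of doublings translate into very large differences in final titer.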
Initially, two replicate lines of phage were evolved over ten serial passages in tryptophan alone, to help ensure that any mutations that were adaptive for the growth conditions alone would sweep the population in advance. Both lines were split and then further evolved over an additional 15 rounds of selection in W or an additional 25 rounds of selection in 95% 6fW (see also Figure 2). After 25 serial passages the fitness of both replicate lines increased by slightly more than 4-fold on the analogue. The kinetics of fitness improvements were quite different between the replicate lines ( Figure 3), indicating that the phage may have taken different evolutionary paths to similar phenotypes. Variant lineages have previously been observed during the natural or directed evolution of other phenotypes, including the evolution of drug-resistant HIV-1 [20], the evolution of φX174 bacteriophage with altered host ranges and thermal optima [21], and the evolution of ribozymes that could cleave a novel substrate [22]. The increase in fitness appeared to have leveled off after 25 rounds of selection in at least one of the lines (Figure 3), and the selection was therefore stopped and the population further characterized.
In order to more closely discern similarities or differences in the evolutionary paths taken by the phage, the genomes of populations of ancestral and evolved phage, as well as genomes of individual variants from the evolved populations, were isolated and sequenced (Figure 2, Figure 4). All phage apparently had a number of sequence differences relative to a previously published sequence of Qβ phage (A558C, A1607G, A2111G, C2944T, T3229C, C3712T, C4019T) [23]. However, the sequences we have determined are consistent with other published sequences of the coat and A1 proteins (Medline accession number M99039) and the replicase protein (accession number X14764). While bacteriophage Qβ can evolve extremely quickly, we believe that our sequences represent the first complete, accurate, and electronically accessible sequence of the bacteriophage genome, and the first detailed examination of the genome of a Qβ quasi-species. Nonetheless, it should be noted that prior to the development of RNA sequencing techniques, Domingo et al. [24] used RNase T1 fingerprinting to demonstrate that bacteriophage Qβ was in fact a quasi-species in which the genome was "a weighted average of a large number of different individual sequences." Replicate experiments were carried out in parallel on tryptophan media. In these lines, only one mutation, P160S in the A1 protein, was fixed; this mutation was not found in the lines evolved on 6fW (Figure 2). Interestingly, proline and serine seem to toggle back and forth at position 160 during the passage of the population. It may also be that these mutations do not alternately sweep the population, but instead vary between high (detectable at the population level) and low (undetectable) frequencies over time. Mutations that cyclically appear have been observed during the evolution of other phage, although usually as a result of iterative passages between different environmental conditions (for example, see [25]). Finally, the dominant genomic sequence of these populations is identical, except for the variable presence of P160S substitutions in A1. This being the case, we expect the fitness levels of the unselected population, the population after ten rounds of selection on W, and the population after fifteen additional rounds of selection on W to be highly comparable.
Figure 1. Fitness of ancestral and selected phage on various tryptophan analogues. Ancestral and round 25 selected phage were tested for fitness on eight additional tryptophan analogues (95% analogue, 5% W) for Line 1 (left) and Line 2 (right). Analogues used were 4-fluorotryptophan (4fW), 5-fluorotryptophan (5fW), 4-methyltryptophan (4MeW), 5-methyltryptophan (5MeW), 6-methyltryptophan (6MeW), 7-methyltryptophan (7MeW), 5-hydroxytryptophan (5OHW) and 5-methoxytryptophan (5MeOW). Data for fitness on 95% 6fW and W are taken from Figures 3 and 5, respectively. Error bars represent standard deviations of at least three replicates.
Given this control, it is likely that the mutations that were fixed at the population level during growth on 6fW were adaptive. Each evolved population had seven mutations that were either fixed or at high frequency, although only two of these mutations were common to both lines (Figure 2). There were some discrepancies between the mutations found in the population and the mutations identified in individual isolates. Mutations that appeared to be fixed at the population level were found to be missing from either one clone (one instance) or two clones (from different lines, one instance). Conversely, there were two mutations (i.e., t66a and a3309t = I320F in the replicase) that appeared in two clones, but did not appear at the population level.
On average, individual clones from the two populations had 13 mutations (standard deviation of 3.4), 7 fixed mutations and 6 mutations that were unique to a given isolate. Some isolates contained only 2 unique mutations while others had up to 10 unique mutations. A total of more than 50 unique mutations were found. Qβ phage is typically thought of as an error-prone, quasi-species comprised of numerous different variants, and it has been estimated to have mutation rates as high as 6.5 nucleotide substitutions per genome per replicative cycle [26,27]. While our results are also consistent with considering Qβ a quasi-species [24], they may also support the hypothesis that selection is continuing to act on a transient population of variants, especially in line 2, in which fitness may still be increasing. In support of this hypothesis, the final
Figure: Genotype changes over the course of the selection.
Amino acid substitutions were distributed throughout the phage genome (Figure 4, Figure 6a), but the improvement in fitness on 6fW occurred without the isolation of a single mutation in a tryptophan codon, either at the population level or in individual clones. Interestingly, the coat protein contained no tryptophans, and was also found to contain no fixed amino acid substitutions. However, the fact that this gene is short and would therefore have accumulated fewer random mutations may also explain this phenomenon. In contrast, the read-through protein A1 contained two fixed amino acid substitutions, S221R and T223N. However, it should be noted that while the S221R substitution was consistently found at the population level it was not found in either clone 1 of Line 1 or clone 3 of Line 2. Because the recombination frequency of Qβ phage is known to be low (on the order of 10⁻⁸ [28]), it is possible that S221R may be a mutation which was accidentally fixed along with another, truly adaptive mutation, and was in the process of being slowly diluted out of the population.
Each of the replicate lines also had additional amino acid substitutions that were fixed at the population level. An amino acid substitution (P149L) was found in the A1 protein in Line 2, there were two different amino acid substitutions (one in each line) in the A2 protein, and two different amino acid substitutions in the replicase. Sequence alignments of the replicase genes from Qβ, SP, MS2 and GA, representing the four serotypes of RNA phage, revealed a number of regions of high conservation [29,30]. The F380L substitution in the replicase protein of Line 2 occurred in what was otherwise a phylogenetically conserved residue. Similarly, the amino acid substitutions D250N (clone 3, Line 2) and L290P (clone 3, Line 1) occurred in highly conserved residues. The appearance of mutations in otherwise highly conserved residues strongly suggests that these mutations were adaptive. By comparison, I320F, found in clones 1 and 3 of Line 1, substituted the residue found in Group B single-stranded RNA phage for the residue found in Group A, suggesting that this substitution is functionally conservative [29,30].
The simplest explanation for these results is that the amino acid substitutions in the three Qβ proteins somehow compensated for intramolecular disruptions due to the incorporation of 6-fluorotryptophan or for intermolecular disruptions with fluorinated E. coli proteins. A number of interactions between phage and host proteins have been described. Interactions between Qβ replicase and various E. coli proteins are known, including EF-Tu, EF-Ts, ribosomal protein S1, and an RNA-binding protein called Hfq [31][32][33]. A2 is known to interact with MurA and inhibit cell wall biosynthesis, resulting in cell lysis [34]. Finally, the entry of Qβ phage is mediated by the F-pilus. A2 binds to the pilus and uses it for transport of the genome. The read-through protein A1 is also required for this process [30], although its precise function is not yet known [35,36].
The identification of five fixed yet silent substitutions (three in Line 1 and two in Line 2; Figure 6b) was consistent with results from previous directed evolution experiments with Qβ and the related RNA phage MS2, which indicated that mutations affecting RNA structure could be as or more important than those affecting proteins. For example, when a hairpin structure that controls the expression levels of the MS2 coat protein was mutated, compensatory mutations were recovered that restored the hairpin [37]. Eight of the selected MS2 operator mutations were silent and retained the wild-type amino acid sequence of the coat protein; only one altered the amino acid sequence [37].
Since it is clear that the secondary structures of RNA phage are under selective pressure, it is formally possible that the amino acid substitutions we observed were not important in and of themselves, but rather were by-products of the evolution of an altered RNA structure. Both the S221R and T223N substitutions in the Qβ A1 protein occur successively in a stem-loop structure [38]. The U2006G (S221R) mutation converts an A:U base-pair into an A:G mismatch, while the C2011A (T223N) mutation converts a G:C base-pair to a G:A mismatch. However, given that both of these mutations would be expected to destabilize the stem structure, it is telling that no non-coding mutations were found that would similarly destabilize this structure. Moreover, if silent mutations were involved in functional alterations of RNA structure then it might be expected that compensatory base-pairing mutations would have been observed. For example, when Qβ was selected to grow in an hfq host a G:C base pair was found to be mutated to an A:U base pair [31,39]. This covariation destabilized the 3'-terminus of the plus strand and promoted melting of the phage RNA structure, a function ascribed to Hfq. No such compensatory base-pairing mutations were found in our selection. Overall, the simplest explanation for the fixation of amino acid substitutions is that these substitutions preserved the stability or function of Qβ proteins in the presence of a mixture of W and 6fW.
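The base-pairing argument above can be made concrete with a small check of which substitutions preserve or disrupt a stem pair. This is a minimal sketch, not the analysis used in the study: the helper names are hypothetical, and it assumes that only Watson-Crick pairs and the G:U wobble pair count as paired, ignoring stacking energies and stem context.

```python
# Watson-Crick pairs plus the G:U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(base_a: str, base_b: str) -> bool:
    """Return True if the two RNA bases form a canonical or wobble pair."""
    return (base_a.upper(), base_b.upper()) in PAIRS


def stem_effect(partner: str, old_base: str, new_base: str) -> str:
    """Classify how a point mutation changes pairing with a fixed partner base."""
    before = can_pair(partner, old_base)
    after = can_pair(partner, new_base)
    if before and not after:
        return "pair disrupted"
    if before and after:
        return "pair preserved"
    if after:
        return "pair created"
    return "unpaired before and after"


# U2006G (S221R): the partner base is A, so A:U becomes A:G.
print(stem_effect("A", "U", "G"))  # pair disrupted
# C2011A (T223N): the partner base is G, so G:C becomes G:A.
print(stem_effect("G", "C", "A"))  # pair disrupted
# Covariation seen in the hfq selection: G:C replaced by A:U, both of which pair.
print(can_pair("G", "C"), can_pair("A", "U"))  # True True
```

Under these assumptions, both A1 substitutions break a stem pair, whereas the G:C to A:U change reported for the hfq selection keeps the pair intact while weakening it.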
Additional experiments revealed that the adaptive mutations allowed the phage to better tolerate a mixture of tryptophan and 6-fluorotryptophan, without loss of fitness on the wild-type amino acid ( Figure 5). Moreover, fitness remained the same or improved slightly when evolved phage were assayed on eight other tryptophan analogues ( Figure 1). The retention of fitness under multiple growth conditions was not a foregone conclusion. For example, when φX174 phage were adapted to grow on Salmonella typhimurium, they lost the ability to infect E. coli C [25]. E. coli adapted to grow on glucose elicited no growth improvement on maltose [40,41]. While the same bacteria evolved to grow at 37°C lost fitness at temperatures further from optimal, they gained fitness at nearby temperatures [42]. One likely explanation for the lack of a trade-off during growth on the unnatural amino acid is that the natural amino acid was still present, and thus any given tryptophan codon would have had to accommodate both compounds at some point in the evolutionary history of the phage. This may also explain why there was no loss of fitness on a number of other tryptophan analogues.
The most important aspect of these results, though, is that they reveal that it is unlikely that the original diminution in phage fitness and subsequent evolutionary recovery were a consequence of the diminished growth rate of the host on the unnatural amino acid. Strain C600p, a strain of E. coli closely related to the host strain used here, has been shown to grow robustly in 95% 6fW, but approximately half as well in 95% 4fW [6]. In contrast, Qβ phage grew poorly on hosts grown in 95% 6fW, but grew as well in hosts grown in 95% 4fW as in hosts grown in pure tryptophan (Figure 1). Thus, it is the effects of the amino acid on the phage itself that seem to be functionally important, as opposed to any indirect effects due to changes in host fitness.
Overall, these results have implications for the origins of alternate genetic codes. Several competing hypotheses for codon reassignment have been proposed (reviewed in [43]). The first of these hypotheses, the 'disappearing intermediate' hypothesis [44][45][46], posits that certain codons were eliminated by genetic drift throughout genomes that evolved skewed GC or AT contents. Following codon loss, relevant tRNA adaptors became functionless and were deleted. At some later point in evolution sequence composition changed again, and a different tRNA adaptor duplicated, mutated at the anticodon position, and recaptured the codon which had previously disappeared. A variant of this hypothesis suggests that evolutionary pressure on a number of genotypic characteristics, including genome size and organization as well as composition, may have influenced codon reassignment [47].
Alternatively, in the 'ambiguous intermediate' hypothesis [48][49][50] a duplicated and mutated tRNA could recognize a normally non-cognate codon and insert its amino acid in competition with the cognate amino acid. Propagation of organisms with ambiguous proteomes could occur if the non-cognate amino acid were either close to selectively neutral or provided a net selective advantage that overcame any deficits in the function of individual proteins. The further evolution of those proteins whose functions were compromised by amino acid substitutions would eventually repair any minor decreases in fitness. Following the adaptation of individual proteins, a discrete but altered genetic code could be re-established.
In our system, incorporation of the amino acid analogue was beneficial relative to growth in the presence of low or no tryptophan, yet still caused a decrease in phage fitness. This is analogous to the finding that a yeast tRNA that ambiguously encoded serine and leucine allowed growth in diverse environments, yet also led to a 50% decrease in growth rate [14][15][16]. Thus, the requirements for an experimental test of the ambiguous intermediate hypothesis were established. The fact that fitness deficits in the phage Qβ proteome were overcome by amino acid substitutions unrelated to the ambiguous amino acid itself strongly supports the 'ambiguous intermediate' hypothesis. The evolved phage are a plausible, experimental example of the penultimate step in amino acid substitution under the ambiguous intermediate model. By way of comparison, a failure to isolate phage with increased fitness on 6fW or the widespread elimination of tryptophan codons would have indicated that codon ambiguity was not an acceptable evolutionary path. Of course, the substitution of an even more chemically dissimilar amino acid might have generated an intractable barrier to evolution.

Conclusion
As in the natural selection of an ambiguous intermediate, evolutionary engineering of an unnatural organism should occur in two stages: first, the incorporation of an unnatural amino acid into a proteome, and second, the adaptation of the proteome to the unnatural amino acid. Previous experiments have focused largely on the first stage. Taken together, our experiments now suggest that while amino acid ambiguity is poorly tolerated initially, a secondary, proteomic adaptation to ambiguity is possible. Of course, the number of proteins in the phage proteome is small relative to larger, organismal proteomes. In this regard, our results with Qβ phage can be seen as either discouraging or encouraging. From one vantage, the fact that three of the four Qβ proteins accumulated substitutions in order to increase the fitness of the phage may imply that literally thousands of independent mutations may be required to isolate organisms that can fully utilize unnatural amino acids. Alternatively, only a few proteins critical for growth may need to adapt to chemical ambiguity, and the highly interdependent phage proteins may therefore all have been under selection pressure. This latter interpretation is most in keeping with the single example of an organism that has been evolved to have an altered genetic code. Starting with a B. subtilis auxotroph, Wong evolved a strain that could not only fully substitute 4fW throughout its proteome, but actually preferred 4fW for growth [19]. While the number and type of genomic mutations responsible for this phenotype are not known, the strain was generated via only four sequential rounds of mutation and selection. The most parsimonious hypothesis for these results, that only a few key proteins in the bacteria were mutated, is consonant with our observation that a relatively small number of mutations were required to adapt the Qβ phage proteome for chemical ambiguity.
Irrespective of whether critical targets were spread throughout an organismal proteome or concentrated in the highly interdependent phage proteome, these targets evolved in response to the change in the genetic code. The evolution of phage with chemically ambiguous proteomes now provides a springboard to the evolution of phage with novel genetic codes, and a means to quantify the relative evolutionary costs of such changes.

to a fresh tube and two aliquots (1 ml each) were taken for storage. Phage were titered on LB + Kn, and diluted appropriately such that approximately 1000 plaque-forming units were used for subsequent rounds of selection. This process was repeated for twenty-five cycles.

Selection on 95% 6fW
Phage from round 10 of the selection on M9B1TLW + Kn plates were further selected for 25 rounds on M9B1TL95%6fW + Kn plates. Since plaques were never visible, each round of selection was carried out for a standard 20 hours. PBS (2 ml) was used to recover the phage, and 100 µl of solution was used in the subsequent rounds of selection. Phage were titered with C600F on LB + Kn. The phage solution was not extracted with chloroform.

Fitness assays
Host bacteria for fitness assays were grown up in LB + Kn to a concentration of approximately 10⁸ colony-forming units/ml. The culture was spun down and resuspended in 1/100 volume of 20% glycerol, aliquoted, and stored at -80°C. Aliquots were thawed as needed and grown for 1 hour in 100 volumes M9B1TLW + Kn before plating. This procedure served to standardize the physiological state of cells used for assays.
After 1 hour of growth, bacteria were plated with ca. 1000 phage from the population being assayed. Plates varied in terms of what amino acids or analogues were added, but were always M9B1TL + Kn. After 20 hours of growth at 37°C, the top agar was scraped away, and phage were eluted in PBS (2 ml), spun down at 6000 rpm for 15', and phage were titered on LB + Kn in parallel with phage stocks used to initiate the fitness assay. Plaques were counted and fitness was expressed as the number of doublings in a 20 hour period according to the equation: fitness = log₂(number of phage at end of assay) − log₂(number of phage at start of assay).
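The fitness measure defined above is straightforward to compute from the two titers. The following is a minimal sketch with a function name of our own choosing; the phage numbers in the usage line are purely illustrative and are not data from this study.

```python
import math


def fitness_doublings(phage_start: float, phage_end: float) -> float:
    """Number of doublings over the assay: log2(end) - log2(start)."""
    if phage_start <= 0 or phage_end <= 0:
        raise ValueError("phage counts must be positive")
    return math.log2(phage_end) - math.log2(phage_start)


# Illustrative values only: 1000 phage plated, 1,024,000 recovered after 20 hours.
print(round(fitness_doublings(1_000, 1_024_000), 6))  # prints 10.0
```

Titering the starting stock in parallel with the eluted phage, as described, means that both terms are measured under the same plating conditions, so systematic plating biases largely cancel in the difference.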

Sequencing of phage genomes
Phage RNA genomes were purified essentially as previously described [51]. In brief, 100 µl of a solution of phage, either directly from the selection or from a population grown on C600F in LB, was extracted with phenol:chloroform:0.1% SDS, chloroform extracted, ethanol precipitated, resuspended in 50 µl of water, and passed through a Centri-Sep column (Princeton Separations, Adelphia, NJ) to remove unincorporated small molecules.
Purified phage genome was used for reverse transcription (Superscript II RT kit, Invitrogen, Carlsbad, CA). In short, 10 µl of phage RNA, 9 µg of random hexamers and water to a total of 21 µl was heated to 70°C for 3', placed on ice, and the remainder of the reaction was assembled according to the manufacturer's instructions. Reverse transcription reactions were incubated for 1 hour at 42°C. A portion of this reaction (4 µl) was used to seed polymerase chain reactions (100 µl). Different reactions contained different primers to amplify different portions of the phage genome. PCR products were gel-purified (QIAquick Gel Extraction Kit, Qiagen, Valencia, CA) prior to sequencing. The complete sequence of the wild-type phage genome has been deposited in GenBank (accession number AY099114). The primers used for the amplification of the genome limited our ability to identify sequence changes to nucleotides 40-4200 of the phage genome.
In some instances, phage were first grown on C600F in LB + Kn prior to reverse transcription and sequencing. In order to ensure that growth on LB did not drastically affect the distribution of phage genotypes, a 1.6 Kb region of the phage genome was also sequenced from non-LB-grown phage stocks. The sequences were found to be identical to those from LB-grown phage stocks.

Determination of amino acid incorporation ratios
Global amino acid incorporation ratios were determined from 100 ml overnight cultures of C600F grown on M9B1TLW + Kn or M9B1TL95%6fW + Kn. The bacteria were spun down and lysed in 200 µl B-PER II (Pierce, Beverly, MA). Half of this volume was passed through a Centri-Sep column. The eluant was dried down and hydrolyzed overnight in 5.4 M HCl, 10% thioglycolic acid at 110°C under vacuum. Hydrolysates were again dried down, and then resuspended in 50 µl of water. These hydrolysates were analyzed by HPLC-ESI at the Mass Spectrometry Facility at the University of Texas at Austin. Hydrolysates were also analyzed by HPLC. Samples (20 µl) were injected onto a C-18 column and eluted with 50 mM NH₄OAc, pH 5.0 in a 3% to 1% MeOH gradient. Peaks were collected and lyophilized, followed by reinjection on the same column and developed with 0.1 M NaH₂PO₄, pH 2.5, 10% MeOH. Identities of peaks that absorb at 280 nm were confirmed by determining the elution times of standards.
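An incorporation ratio can in principle be derived from the integrated areas of the W and 6fW peaks. The sketch below shows one way this might be done; it is not the calculation reported here. The function name, the response-factor parameters, and the peak areas are all hypothetical, and the default assumption that the two compounds have equal detector response at 280 nm would need to be verified with standards.

```python
def incorporation_fraction(
    area_analogue: float,
    area_natural: float,
    response_analogue: float = 1.0,  # hypothetical relative response factor
    response_natural: float = 1.0,   # hypothetical relative response factor
) -> float:
    """Fraction of analogue among (analogue + natural) residues from peak areas."""
    if min(area_analogue, area_natural) < 0:
        raise ValueError("peak areas must be non-negative")
    # Convert peak areas to relative molar amounts using the response factors.
    amount_analogue = area_analogue / response_analogue
    amount_natural = area_natural / response_natural
    total = amount_analogue + amount_natural
    if total == 0:
        raise ValueError("no signal in either peak")
    return amount_analogue / total


# Hypothetical peak areas chosen only to illustrate the arithmetic.
print(f"{incorporation_fraction(60.0, 40.0):.0%}")  # prints "60%"
```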