Modelling the evolution of the archaeal tryptophan synthase
© Merkl. 2007
Received: 13 February 2007
Accepted: 10 April 2007
Published: 10 April 2007
Microorganisms and plants are able to produce tryptophan. The seven enzymes catalysing the final steps of tryptophan biosynthesis are encoded in the canonical trp operon. Among the trp genes, trpA and trpB, which code for the alpha and beta subunits of tryptophan synthase, are the most frequent. In several prokaryotic genomes, two variants of trpB (named trpB1 and trpB2) occur in different combinations. The evolutionary history of these trpB genes is under debate.
In order to study the evolution of trp genes, completely sequenced archaeal and bacterial genomes containing trpB were analysed. Phylogenetic trees indicated that TrpB sequences constitute four distinct groups; their composition is in agreement with the location of the respective genes. The first group consisted exclusively of trpB1 genes, most of which belonged to trp operons. Groups two to four contained trpB2 genes. The largest group (trpB2_o) contained trpB2 genes, all located outside of operons. Most of these genes originated from species that additionally possess an operon-based trpB1. Groups three and four pertain to trpB2 genes of genomes containing exclusively one or two trpB2 genes but no trpB1. One group (trpB2_i) consisted of trpB2 genes located inside the trp operon, the other (trpB2_a) of trpB2 genes located outside it. TrpA and TrpB form a heterodimer and cooperate biochemically. To characterise trpB variants and stages of TrpA/TrpB cooperation in silico, several approaches were combined. Phylogenetic trees were constructed for all trp genes; their structure was assessed via bootstrapping. Alternative models of trpB evolution were evaluated with parsimony arguments. The four groups of trpB variants were correlated with archaeal speciation. Several stages of TrpA/TrpB cooperation were identified and trpB variants were characterised. Most plausibly, trpB2 represents the predecessor of the modern trpB gene, and trpB1 evolved in an ancestral bacterium.
In archaeal genomes, several stages of trpB evolution, TrpA/TrpB cooperation, and operon formation can be observed. Thus, archaeal trp genes may serve as a model system for studying the evolution of protein-protein interactions and operon formation.
The synthesis of tryptophan is a common metabolic capability of microorganisms and higher plants that mammals lack. The prokaryotic trp operon encodes the enzymes catalysing the final, pathway-specific steps from chorismate to L-tryptophan. For more than 40 years, the enterobacterial operon has been the classical model system for studying the evolutionary relationship of genes and enzymes (see [1, 2] and references therein) as well as gene regulation. Several conceptually quite different mechanisms of gene regulation have been described for the trp operon. Most of them were elucidated in bacterial species (see e.g. [3–5], and references therein). However, regulation of trp operon expression has also been shown for the archaea Methanothermobacter thermoautotrophicus [6, 7] and Thermococcus kodakaraensis. Such elaborate regulation may be explained by the fact that tryptophan is one of the amino acids whose biosynthesis is very expensive. Besides regulation, other features of tryptophan biosynthesis have been studied extensively. The composition of the operon and several aspects of its evolution have been analysed, and at least one 3D structure has been determined for each enzyme. Taken together, the trp operon is, besides the ribosomal protein operons, one of the best-characterised gene clusters of microorganisms. Its investigation has provided fundamental insights into many aspects of bacterial genetics and enzymology.
The canonical trp operon encodes seven enzymes responsible for the synthesis of L-tryptophan from chorismate. The first reaction is catalysed by anthranilate synthase, a glutamine amidotransferase complex consisting of a larger synthase subunit (TrpE) and a smaller glutaminase subunit (TrpG). TrpG provides the glutamine amidotransferase function that allows glutamine to serve as the amino donor in anthranilate formation. Anthranilate phosphoribosyl transferase (TrpD) then transfers a phosphoribosyl moiety to anthranilate. The two subsequent enzymes, TrpF and TrpC, catalyse the isomerisation of phosphoribosylanthranilate and the synthesis of indole-3-glycerol phosphate, respectively.
TrpA and TrpB constitute the αββα tryptophan synthase complex, which catalyses the final reaction, indole-3-glycerol phosphate + L-serine → L-tryptophan + glyceraldehyde-3-phosphate + H2O. The α subunit (TrpA) cleaves indole-3-glycerol phosphate to glyceraldehyde-3-phosphate and indole. Indole is transported through a hydrophobic tunnel to the associated β subunit (TrpB), where it is condensed with L-serine to yield L-tryptophan. A sophisticated allosteric mechanism links the α and β subunits of the synthase.
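As a quick consistency check on the reaction scheme, the following minimal Python sketch verifies atom balance for the α half-reaction, the β half-reaction and the overall reaction. It assumes neutral (uncharged) molecular formulas and ignores protonation states; the helper names `atoms` and `balanced` are hypothetical and the check is an illustration, not part of the original analysis.

```python
import re
from collections import Counter

# Neutral molecular formulas (protonation states ignored; an assumption that is
# harmless here because the phosphate group appears on both sides).
FORMULAS = {
    "indole-3-glycerol phosphate": "C11H14NO6P",
    "indole": "C8H7N",
    "glyceraldehyde-3-phosphate": "C3H7O6P",
    "L-serine": "C3H7NO3",
    "L-tryptophan": "C11H12N2O2",
    "H2O": "H2O",
}


def atoms(formula: str) -> Counter:
    """Count atoms in a bracket-free formula such as 'C11H12N2O2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def balanced(lhs: list[str], rhs: list[str]) -> bool:
    """True if both sides of a reaction contain exactly the same atoms."""
    left = sum((atoms(FORMULAS[m]) for m in lhs), Counter())
    right = sum((atoms(FORMULAS[m]) for m in rhs), Counter())
    return left == right


# alpha site: IGP -> indole + G3P; beta site: indole + L-Ser -> L-Trp + H2O
ALPHA = (["indole-3-glycerol phosphate"], ["indole", "glyceraldehyde-3-phosphate"])
BETA = (["indole", "L-serine"], ["L-tryptophan", "H2O"])
OVERALL = (["indole-3-glycerol phosphate", "L-serine"],
           ["L-tryptophan", "glyceraldehyde-3-phosphate", "H2O"])

for name, (lhs, rhs) in {"alpha": ALPHA, "beta": BETA, "overall": OVERALL}.items():
    print(name, balanced(lhs, rhs))
```

The overall check succeeds only when glyceraldehyde-3-phosphate is listed among the products, because the α half-reaction releases it and it does not re-enter the β reaction.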
Several Trp enzymes are paradigms for larger classes of proteins with similar function or architecture: TrpG is similar to HisH (an enzyme of histidine biosynthesis) and to other glutaminases of type I glutamine amidotransferases. TrpF, TrpC and TrpA are all (βα)8 barrels with similar phosphate binding sites. The (βα)8 barrel is the most common enzyme fold in the PDB database of known protein structures.
For the bacterial trp genes, the following order was determined: large anthranilate synthase subunit (trpE), small anthranilate synthase subunit (trpG), anthranilate phosphoribosyl transferase (trpD), indole-3-glycerol phosphate synthase (trpC), phosphoribosyl anthranilate isomerase (trpF), tryptophan synthase β subunit (trpB) and tryptophan synthase α subunit (trpA), abbreviated trpEGDCFBA. The gene fusions trpGD and trpEG have been observed in several species; in other genomes, the operon is broken up into several gene clusters. In archaeal genomes, the order of trp genes is highly variable. In Sulfolobus solfataricus, an intact operon trpBADFEGC is observed. In Haloferax volcanii, the trp genes are divided into two isolated clusters, trpCBA and trpDFEG, separated by more than 1200 kb. The genome of Natronomonas pharaonis contains three homologs of trpD and two homologs each of trpB, trpE and trpG. Pyrococcus horikoshii lacks the genes for tryptophan synthesis (and for the synthesis of the other aromatic amino acids) except for a single trpB2 gene.
The gene pairs trpB/trpA and trpE/trpG frequently occur in the same order and in close proximity, i.e. they form the linkage groups trpBA and trpEG. In both cases, the gene products constitute a bienzyme complex whose active centres interact with each other. Because these linkage groups occur in both bacterial and archaeal genomes, they have been identified as ancestral. A reconstruction of the tentative ancestral trp operon is hampered by the observation that trp genes are poor phylogenetic reporters. Different rates of evolution, multiple gene duplications and convergent evolution resulting from specific adaptation to environmental demands may cause the inconsistencies seen when phylogenies deduced from trp genes are compared with those deduced from rRNA. Therefore, the evolution of each element of the trp operon has to be examined separately.
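The notion of a linkage group can be made concrete with a small sketch. It assumes that each trp cluster is written as a string of single-letter gene suffixes in genomic order (trpEGDCFBA becomes "EGDCFBA") and that a linkage group such as trpBA is present when both letters occur adjacently, in that order, within one cluster. The gene orders are those quoted in the text (for Thermoplasma, the trpA2DFEGC cluster described in the discussion); the function name `linkage_groups` is hypothetical.

```python
# Linkage groups discussed in the text, as ordered pairs of gene suffixes.
LINKAGE_GROUPS = ("BA", "EG")

# Gene orders quoted in the text; each string is one contiguous cluster.
GENE_ORDERS = {
    "bacterial order (trpEGDCFBA)": ["EGDCFBA"],
    "Sulfolobus solfataricus": ["BADFEGC"],
    "Haloferax volcanii": ["CBA", "DFEG"],
    "Thermoplasma (trpA2DFEGC cluster)": ["ADFEGC"],
}


def linkage_groups(clusters: list[str]) -> dict[str, bool]:
    """For each linkage group, report whether its two genes are adjacent and
    in the same order within at least one cluster (never across clusters)."""
    return {group: any(group in cluster for cluster in clusters)
            for group in LINKAGE_GROUPS}


for genome, clusters in GENE_ORDERS.items():
    print(genome, linkage_groups(clusters))
```

In this representation both linkage groups survive even when the operon is split, as in H. volcanii, whereas the Thermoplasma cluster retains only trpEG because trpB lies elsewhere in the genome.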
For evolutionary studies, tryptophan synthase is an especially interesting candidate. This enzyme has been analysed for decades in order to understand the structural basis and functional consequences of protein-protein interactions. The isolated TrpA and TrpB proteins form stable but poorly active α monomers and ββ homodimers, respectively [18, 19]. Their assembly into the native αββα complex induces conformational changes in both subunit types, as shown by X-ray crystallography for the Pyrococcus furiosus synthase. The result of this communication between the α and β subunits is a reciprocal activation by one to two orders of magnitude. Conformational changes crucial for the allosteric communication between the active sites of the α- and β-subunits have been analysed in detail for the Salmonella typhimurium tryptophan synthase; see e.g. [21–24].
The β-subunit is of particular importance for the evolution of tryptophan synthase. In archaea and bacteria, two variants of trpB genes occur, which can clearly be distinguished by their protein sequences. The major group, comprising proteins of type TrpB1, includes the enzymes of enterobacteria and Bacillus subtilis. The minor group (denoted TrpB2) contains many archaeal proteins. Most prokaryotes, like E. coli, possess a single trpB1 gene. However, several bacterial and archaeal genomes contain a combination of one trpB1 and one trpB2 gene. In addition, some species have only one or two trpB2 genes but no trpB1 gene. This variety prompted us to characterise the evolution of TrpB and its interaction with TrpA in detail, both biochemically and in silico.
Based on biochemical findings, a model for the evolution of the tryptophan synthase complex has recently been introduced. This model assumes the existence of an ancient, non operon-based trpB2. After duplication, presumably only one trpB2 copy was integrated into the trp operon. Differences in evolutionary pressure may have been responsible for the divergence of non operon-based and operon-based trpB genes. Coevolution with trpA may have led to a better adapted trpB1. The data on complex formation and subunit activation led us to consider existing trpB variants as representatives of evolutionary steps in the postulated model.
In this study, I have assessed this model by phylogenetic methods. Two basic questions have been addressed: i) What is the evolutionary relationship of trpB1 and trpB2? ii) How did extant archaeal trp operons evolve? Extending previous work, I will discuss novel hypotheses concerning the properties of TrpB2 and operon formation. Comparative analyses of trp sequences from 26 completely sequenced archaeal genomes, and of their locations in these genomes, will be reported in order to reconstruct the evolution of TrpB-type subunits and the coevolution of TrpA/TrpB. It will be shown that TrpB2 variants represent different stages of TrpA/TrpB cooperation and that TrpB2 is favoured over TrpB1 in certain environments. Moreover, TrpB2 has features of a more ancient TrpB variant.
cons cl scores for trp genes (cons cl values for: anthranilate/para-aminobenzoate synthases, component I; anthranilate/para-aminobenzoate synthases, component II; indole-3-glycerol phosphate synthase; tryptophan synthase alpha chain; tryptophan synthase beta chain; paralogue of TrpB)
It has been hypothesised that TrpB2 possesses a second function and acts as a serine deaminase. This prediction was deduced from the analysis of phyletic patterns, i.e. the absence of an encoded serine deaminase function in certain genomes. However, it has been shown that TrpB1 of Thermotoga maritima and the TrpB2_o proteins of Sulfolobus solfataricus and T. maritima have only poor serine deaminase activity. An alternative method of non-homologous gene annotation is the exploitation of gene neighbourhoods, as implemented e.g. in AMIGOS. For trpB2, AMIGOS did not detect any conserved gene neighbourhood other than the one constituting trp operons. Thus, this approach provided no clues for a function of trpB2 besides tryptophan synthesis.
The two variants of trpB occur in various genomes in different combinations. To facilitate the analysis of phylogenetic trees, a naming scheme was introduced. Names of genes and gene products were generated according to the scheme SPECIES_LOC|TYPE|TAX, in which LOC, TYPE and TAX are written consecutively without separators. Here, SPECIES is an abbreviation of the species name (see Materials). LOC indicates the position of the specific trpB gene relative to a putative trp operon (more precisely: relative to a trpA gene). If two trpB genes occur in a genome, they were labelled _i (if the gene was located inside the trp operon) or _o (if located outside the operon). If only a single trpB gene occurred in the genome, it was labelled _s if the gene was linked to trpA, and _S if it was separate from trpA. TYPE indicates the gene type: 1 for trpB1 and 2 for trpB2. Finally, TAX gives the taxonomic classification: C for Crenarchaeota, E for Euryarchaeota and B for Bacteria. The following examples explain how to resolve sequence names: Aperni_o2C names a trpB gene in the genome of Aeropyrum pernix (Aperni) that occurs outside the trp operon (_o) and is of type trpB2 (2). As A. pernix belongs to the Crenarchaeota, the name ends with C. The _o notation indicates that a second trpB gene exists in A. pernix. This gene was consequently named Aperni_i2C, as it is a trpB2 gene inside the trp operon. Note that pairs like Tmarit_i1B and Tmarit_o2B also exist, indicating the occurrence of a trpB1 gene inside and a trpB2 gene outside the trp operon. Sacido_s2C designates a trpB2 gene located inside the trp operon; as Sulfolobus acidocaldarius possesses only one trpB gene, it was labelled _s. Since Thermoplasma volcanium possesses only one trpB gene, which is non operon-based and of type trpB2, this gene was named Tvolca_S2E. The encoded proteins were designated correspondingly.
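Because the naming scheme is systematic, labels can be decoded mechanically. The following Python sketch uses a hypothetical `parse` function and a regular expression written for the label formats shown in the examples (a species abbreviation, an underscore, then LOC, TYPE and TAX as single characters); the wording of the decoded fields is paraphrased from the scheme.

```python
import re
from typing import NamedTuple

LOCATIONS = {
    "i": "inside the trp operon (genome has two trpB genes)",
    "o": "outside the trp operon (genome has two trpB genes)",
    "s": "single trpB gene, linked to trpA",
    "S": "single trpB gene, separate from trpA",
}
TYPES = {"1": "trpB1", "2": "trpB2"}
TAXA = {"C": "Crenarchaeota", "E": "Euryarchaeota", "B": "Bacteria"}

# SPECIES _ LOC TYPE TAX, e.g. "Aperni_o2C"
LABEL = re.compile(r"^(?P<species>[A-Za-z]+)_(?P<loc>[ioSs])(?P<type>[12])(?P<tax>[CEB])$")


class TrpBName(NamedTuple):
    species: str
    location: str
    gene_type: str
    taxon: str


def parse(label: str) -> TrpBName:
    """Decode a SPECIES_LOC|TYPE|TAX label into its components."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a SPECIES_LOC|TYPE|TAX label: {label!r}")
    return TrpBName(
        species=match["species"],
        location=LOCATIONS[match["loc"]],
        gene_type=TYPES[match["type"]],
        taxon=TAXA[match["tax"]],
    )


for example in ("Aperni_o2C", "Tmarit_i1B", "Sacido_s2C", "Tvolca_S2E"):
    print(parse(example))
```

Rejecting malformed labels with a `ValueError` makes inconsistent abbreviations easy to spot when labels are collected from several sources.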
To determine the distribution of trpB variants, the COG and STRING databases were used. For all completely sequenced archaeal and bacterial genomes, the occurrence of trpB variants and their locations were determined. Depending on the occurrence of trpB variants, archaeal species were grouped into five categories, named species-types in the following. Note that species-types characterise the content of whole genomes; they are linked to the gene naming scheme via gene location and type (e.g. an i1_o2 species possesses an operon-based trpB1 and a trpB2 outside the operon).
Classifying known archaeal genomes according to the occurrence of trpB genes
Species | trpB genes (number of tryptophan codons) | Habitat
Species-types S2 (3 species) and s2 (1 species):
S. acidocaldarius | s2C (3) | TA
T. volcanium | S2E (3) | TA
T. acidophilum | S2E (3) | TA
P. horikoshii | S2E (2) | HT
Species-type i2_o2 (mean number of tryptophan codons: i2 3.0, o2 3.0):
A. pernix | i2C (4), o2C (2) | HT
P. aerophilum | i2C (3), o2C (4) | HT
P. torridus | i2E (2), o2E (2) | TA
S. solfataricus | i2C (3), o2C (3) | TA
S. tokodaii | i2C (3), o2C (4) | TA
Species-type i1_o2 (mean number of tryptophan codons: i1 1.6, o2 3.7):
A. fulgidus | i1E (2), o2E (6) | HT
M. acetivorans | i1E (1), o2E (5) | MS
M. barkeri | i1E (1), o2E (4) | MS
M. burtonii | i1E (1), o2E (4) | MS
M. hungatei | i1E (1), o2E (4) | MS
M. mazei | i1E (1), o2E (5) | MS
M. thermoautotrophicus | i1E (1), o2E (3) | TP
P. abyssi | i1E (3), o2E (2) | HT
P. furiosus | i1E (3), o2E (2) | HT
T. kodakaraensis | i1E (2), o2E (2) | HT
Species-type i1_o1 (mean number of tryptophan codons: i1 1.0, o1 0.0):
N. pharaonis | i1E (1), o1E (0) | HP
Species-types S1 (1 species) and s1 (5 species):
M. kandleri | S1E (4) | YP
Halobacterium | s1E (1) | HP
H. marismortui | s1E (1) | HP
M. maripaludis | s1E (1) | MS
M. stadtmanae | s1E (1) | MS
M. jannaschii | s1E (2) | YP
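The group means in the classification table can be recomputed from the per-gene values. The sketch below transcribes those values (read here as tryptophan codon counts, which is consistent with the means reported in the text), derives each genome's species-type by sorting its location/type labels, and reproduces the means for i2_o2, i1_o2 and i1_o1 species, the pooled mean of 2.75 for S2 and s2 species, and the i1_o2 ratio 1.6/3.7 < 0.5. The function names are hypothetical.

```python
from collections import defaultdict

# Tryptophan codon count per trpB gene, keyed by LOC+TYPE (taxon letter dropped).
TRP_CODONS = {
    "S. acidocaldarius": {"s2": 3}, "T. volcanium": {"S2": 3},
    "T. acidophilum": {"S2": 3}, "P. horikoshii": {"S2": 2},
    "A. pernix": {"i2": 4, "o2": 2}, "P. aerophilum": {"i2": 3, "o2": 4},
    "P. torridus": {"i2": 2, "o2": 2}, "S. solfataricus": {"i2": 3, "o2": 3},
    "S. tokodaii": {"i2": 3, "o2": 4},
    "A. fulgidus": {"i1": 2, "o2": 6}, "M. acetivorans": {"i1": 1, "o2": 5},
    "M. barkeri": {"i1": 1, "o2": 4}, "M. burtonii": {"i1": 1, "o2": 4},
    "M. hungatei": {"i1": 1, "o2": 4}, "M. mazei": {"i1": 1, "o2": 5},
    "M. thermoautotrophicus": {"i1": 1, "o2": 3}, "P. abyssi": {"i1": 3, "o2": 2},
    "P. furiosus": {"i1": 3, "o2": 2}, "T. kodakaraensis": {"i1": 2, "o2": 2},
    "N. pharaonis": {"i1": 1, "o1": 0},
    "M. kandleri": {"S1": 4}, "Halobacterium": {"s1": 1},
    "H. marismortui": {"s1": 1}, "M. maripaludis": {"s1": 1},
    "M. stadtmanae": {"s1": 1}, "M. jannaschii": {"s1": 2},
}


def species_type(genes: dict[str, int]) -> str:
    """{'i1': .., 'o2': ..} -> 'i1_o2'; a single gene keeps its own label."""
    return "_".join(sorted(genes))


def mean_trp_codons(table) -> dict[tuple[str, str], float]:
    """Mean Trp codon count per (species-type, gene label)."""
    samples = defaultdict(list)
    for genes in table.values():
        for label, count in genes.items():
            samples[(species_type(genes), label)].append(count)
    return {key: sum(v) / len(v) for key, v in samples.items()}


def pooled_mean(table, types: set[str]) -> float:
    """Mean over all trpB genes of genomes whose species-type is in `types`."""
    counts = [c for genes in table.values() if species_type(genes) in types
              for c in genes.values()]
    return sum(counts) / len(counts)


means = mean_trp_codons(TRP_CODONS)
for key in [("i2_o2", "i2"), ("i2_o2", "o2"), ("i1_o2", "i1"),
            ("i1_o2", "o2"), ("i1_o1", "i1"), ("i1_o1", "o1")]:
    print(key, f"{means[key]:.1f}")
print("S2/s2 trpB2:", pooled_mean(TRP_CODONS, {"S2", "s2"}))
print("i1_o2 ratio:", round(means[("i1_o2", "i1")] / means[("i1_o2", "o2")], 2))
```

Keeping the raw counts rather than the printed means makes it easy to check how a single atypical genome shifts a group mean.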
Bacterial species did not contribute species-types noticeably different from those observed among archaea (data not shown). The two Geobacter species represent special cases most plausibly explained by ongoing genomic rearrangements: Gsulfu_i2B is an operon-based trpB2 gene of type TrpB2_o. The trp operon of G. sulfurreducens harbours both a trpB1 and a trpB2 gene. According to the annotation, the trpB1 gene (locus tag GSU2375) contains a frameshift and is annotated as a pseudogene. A direct neighbour of trpB1 in G. metallireducens is a transposase, which makes a recent transfer of this gene plausible. Compared with archaea, trpB2 occurred less frequently in bacterial genomes, and no bacterial genome contained exclusively trpB2 genes.
In agreement with previous findings, TrpB1 and TrpB2 clearly fall into two distinct groups, a distinction supported by a high bootstrap value; see Figure 1. Moreover, a finer sub-clustering of TrpB2 sequences could be deduced, which was in agreement with the location of the genes. One group (labelled TrpB2_o) consisted of products of trpB2 genes not located in operons. 14 of its 16 members were TrpB2 sequences originating from i1_o2 species, i.e. species that possess an operon-based trpB1 in addition to an isolated trpB2. The genes Paerop_o2C and Aperni_o2C of the two i2_o2 species Pyrobaculum aerophilum and A. pernix also belonged to this group; these two species possess a trp operon containing a trpB2 gene. Bacterial TrpB2_o sequences, which originated from the i1_o2 species T. maritima and G. metallireducens, did not form an isolated subtree. This finding argues for a common origin of bacterial and archaeal trpB2_o genes.
The other two subgroups of TrpB2 variants were clearly distinct from the TrpB2_o cluster. Their sequences originated from archaeal S2 (Thermoplasmataceae), s2 or i2_o2 species (Sulfolobaceae, Picrophilus torridus, A. pernix, P. aerophilum), i.e. species possessing exclusively one or two trpB2 genes. These sequences formed two clearly separated sets. The first set, named TrpB2_i, comprises products of operon-based trpB2 genes: Stokod_i2C, Sacido_s2C, Ssolfa_i2C, Ptorri_i2E, Aperni_i2C and Paerop_i2C. The second set, named TrpB2_a, consisted of Ptorri_o2E, Stokod_o2C, Ssolfa_o2C, Tacido_S2E and Tvolca_S2E, i.e. products of trpB2 genes located outside trp operons. For Thermoplasma volcanium and Thermoplasma acidophilum, these trpB2 genes were the only trpB genes; for S. solfataricus, S. tokodaii and P. torridus, a second, distinguishable trpB2 gene of type trpB2_i was part of the trp operon. Proteins of type TrpB2_i formed two finer subgroups: those of P. torridus and the Sulfolobaceae were more similar to TrpB2_a sequences, whereas those of A. pernix and P. aerophilum, which possess a non operon-based trpB2_o gene, differed both from TrpB2_a and from TrpB2_o sequences; see Figure 1. All relevant edges separating these groups are statistically highly significant, as indicated by their high bootstrap values.
As a single exception, the genome of P. horikoshii did not follow the general classification scheme. It possesses a single trpB2 gene, which is of type trpB2_o and not, as expected, of type trpB2_a. However, this genome lacks all other trp genes, which has previously been interpreted as reductive evolution. The presence of a trpB2_o gene might be due to the loss of the complete trp operon after the speciation of trpB2_i and trpB2_o. The fact that the P. horikoshii trpB2_o gene was not affected by the reduction has been considered an argument for assigning it another selective function, which has not yet been identified. As noted above, the two bacterial Geobacter species represent special cases associated with the presumptive rearrangement of trp genes. Briefly, the trpB variants can be characterised as follows: trpB1 genes occur almost exclusively in trp operons. trpB2_o variants are genes located outside operons in species that have an operon-based trpB1. Several archaeal species possess exclusively trpB2 genes: if only one trpB2 gene exists, it is of type trpB2_a; if two trpB2 genes occur, one is an operon-based trpB2_i and the second is a trpB2_a or a trpB2_o gene.
Correlated with TrpB speciation, TrpA proteins showed a division into two statistically highly significant subgroups; see Figure 2. The larger TrpA1 group consisted of TrpA sequences originating from genomes that possess a trpB1 gene. Most likely, TrpA1 proteins interact with the operon-encoded TrpB1 and thus fall into the same class. The smaller TrpA2 group contained exclusively TrpA proteins of species-types S2, s2 or i2_o2, i.e. TrpA proteins whose putative interaction partner is exclusively a TrpB2 protein. The high bootstrap value of 1000 (≜100%) for the central edge emphasises the distinction between TrpA1 and TrpA2. S2, s2 and i2_o2 species formed three statistically significant subtrees; compare Figure 2. These harboured the TrpA sequences of (i) Sulfolobaceae, (ii) Thermoplasmatales (T. acidophilum, T. volcanium, P. torridus) and (iii) P. aerophilum and A. pernix. The composition of these groups is in agreement with the TrpB2_a and TrpB2_i groups in Figure 1 and indicates the coevolution of trpB2 variants with trpA.
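Bootstrap values such as 1000 count how often an edge is recovered among resampled data sets; assuming 1000 replicates, as the ≜100% correspondence implies, a value of 1000 means the edge appeared in every replicate. The sketch below shows only the resampling step of Felsenstein's nonparametric bootstrap (alignment columns drawn with replacement) and the conversion of a count into a percentage. Building a tree from each replicate and checking for the edge, which the actual analysis requires, is omitted; the function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Draw alignment columns with replacement to build one pseudo-alignment."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in columns) for row in alignment]


def support_percent(times_recovered: int, replicates: int) -> float:
    """Express a raw bootstrap count as a percentage of replicates."""
    return 100.0 * times_recovered / replicates


rng = random.Random(42)                 # fixed seed for reproducibility
toy_alignment = ["MKVL-EST", "MRIL-DSA", "MKIL-ESA"]   # hypothetical sequences
for _ in range(3):
    print(bootstrap_replicate(toy_alignment, rng))
print(support_percent(1000, 1000))
```

Resampling whole columns keeps the residues of one alignment position together, which is what allows each replicate to be analysed with the same tree-building method as the original alignment.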
In all three trees (see Figures 3 and 4), the proteins of both the Thermoplasmatales and the three Sulfolobaceae constituted sub-clusters. The edges determined for the TrpD or TrpE entries of these species have lengths similar to those calculated for TrpA or TrpB. An increased rate of evolution has previously been postulated specifically for the trpA and trpB genes of these species. However, the comparison of trees and edge lengths showed that in these species evolutionary divergence is similarly high for several proteins encoded by the trp operon. These findings argue against a specifically increased rate of trpA and trpB evolution. In general, smaller genomes evolve faster. Therefore, a higher evolutionary rate of the trp genes of Thermoplasmatales is more plausibly explained by a general trend associated with their small genome size.
Interestingly, no sub-clustering into smaller, distinctly separated groups was observed for TrpE and TrpG, which, like TrpA and TrpB, form a heteromeric complex. This finding distinguishes the subunits of tryptophan synthase from those of anthranilate synthase. The compactness of its phylogenetic tree characterised TrpG as the evolutionarily most stable trp protein; see Figure 4.
The three Euryarchaeota Halobacterium (s1), Haloarcula marismortui (s1) and Natronomonas pharaonis (i1_o1 species) constituted an isolated group in all five trees (Figures 1, 2, 3 and 4); their edge lengths were comparable to those of s2 or i2_o2 species. This congruence indicates an elevated evolutionary rate for all elements of these trp operons. Note that these operons harbour trpB1 genes.
Pairwise sequence similarity values of TrpB proteins
49, 72, 2
76, 88, 0
47, 65, 2
46, 67, 1
26, 44, 17
46, 64, 2
32, 43, 18
47, 66, 2
28, 43, 12
30, 46, 11
26, 42, 6
54, 75, 2
56, 71, 3
53, 72, 1
34, 48, 14
57, 72, 1
34, 47, 12
54, 73, 1
30, 46, 15
30, 45, 12
28, 42, 14
50, 66, 2
52, 71, 1
33, 50, 13
47, 68, 1
32, 46, 13
48, 70, 1
28, 43, 16
30, 45, 11
27, 40, 10
57, 68, 3
35, 49, 14
54, 63, 3
35, 50, 12
54, 65, 3
32, 48, 12
34, 46, 15
31, 43, 13
30, 45, 13
63, 78, 1
32, 46, 13
60, 75, 1
28, 44, 16
33, 46, 11
28, 41, 16
31, 44, 11
65, 78, 0
36, 48, 10
65, 85, 0
65, 78, 0
59, 76, 1
34, 47, 12
64, 76, 0
30, 45, 9
32, 45, 9
29, 40, 12
35, 47, 11
61, 79, 1
64, 79, 1
58, 75, 1
34, 46, 9
32, 46, 8
29, 40, 12
59, 77, 0
58, 77, 0
57, 72, 0
The MSA shows that nearly all differences between TrpB1 and TrpB2 are due to larger indels, in agreement with previous observations. Interestingly, an insertion of 2 to 6 residues between positions 243 and 244 occurred in both TrpB2_a and TrpB2_o sequences, i.e. exclusively in non operon-based proteins. All considered TrpB1 and TrpB2_i sequences lack this subsequence, for which Jpred predicted no well-defined secondary-structure element. Several representatives of these two sets of operon-based proteins were shown to interact with TrpA [26, 39]. It is therefore probable that this putative loop influences the allosteric communication with TrpA. Most residues in contact with ligands in the known TrpB1 structure were strictly conserved among all TrpB1 and TrpB2 sequences. The only exception is residue C225, which is V225 in TrpB2_a sequences. The active-site residues H81, K82 and S371 were strictly conserved, whereas active-site residue K162 was conserved only in TrpB1 proteins, and active-site residue D300 (TrpB1) was an arginine in TrpB2. Several residues of the interface regions, adjacent to active sites and near ligand-binding sites, had a bimodal occurrence pattern distinguishing TrpB1 and TrpB2. Among these were residues 2 and 110, which were strictly conserved tryptophan residues in all TrpB2 proteins. Given its position near the gene start, W2 may have a role in translation control. W110 follows a cluster of strictly conserved residues, suggesting a role in stability or protein function.
It has been postulated that the avoidance of tryptophan residues in enzymes for tryptophan synthesis provides a selective advantage, as has been shown for a number of amino acid biosynthetic enzymes. This criterion was also applied to the trpB genes by assessing the frequency of tryptophan codons (Table 2). trpB1 genes contained mostly one or two tryptophan codons, with a mean value of 1.6 both for S1 and s1 species and for i1_o2 species. trpB2 genes contained two or more tryptophan codons, with a mean of 2.75 for S2 and s2 species and 3.0 for i2_o2 species. The difference was most pronounced for i1_o2 species: here, trpB2 genes had a mean of 3.7, whereas trpB1 genes had a mean of 1.6 tryptophan codons. These trpB1 genes showed a habitat-specific imbalance in tryptophan codon occurrence, with one codon in mesophilic species and at least two in hyperthermophiles. In summary, according to the notion of tryptophan codon avoidance, trpB2 genes are less optimised than trpB1 genes.
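Counting tryptophan codons is straightforward because TGG is the only tryptophan codon both in the standard genetic code and in the bacterial and archaeal code (NCBI translation table 11). A minimal sketch, assuming an in-frame coding sequence given as a DNA or RNA string; `count_trp_codons` is a hypothetical helper and the sequences used with it are toy examples, not trpB genes.

```python
def count_trp_codons(cds: str) -> int:
    """Count in-frame TGG codons, i.e. tryptophan residues, in a coding sequence."""
    seq = cds.upper().replace("U", "T")   # accept RNA as well as DNA
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return sum(1 for i in range(0, len(seq), 3) if seq[i:i + 3] == "TGG")


print(count_trp_codons("ATGTGGAAATGGTAA"))   # two in-frame TGG codons
print(count_trp_codons("ATGGCC"))            # TGG only out of frame
```

Reading strictly in frame matters: a TGG that straddles two codons, as in the second example, does not encode tryptophan.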
It has been argued that simple trp clusters may have been unstable until the complexity of regulation and the foundation of a metabolic theme had reached a certain level. Gene clusters observed in s2 and i2_o2 species can be considered less evolved stages of cluster organisation; compare Panels A–C. Moreover, the only archaeal trp gene regulatory systems identified so far are part of the trp operons of M. thermoautotrophicus and T. kodakaraensis, both of which have a bacterial-like composition.
Besides Nanoarchaeum equitans, the Thermoplasmata (T. volcanium, T. acidophilum and P. torridus) possess the smallest archaeal genomes sequenced so far. Most plausibly, strong selective pressure associated with the colonised habitat enforces the minimisation of genome size. Nevertheless, both Thermoplasma species possess the gene cluster trpA2DFEGC; the need for tryptophan synthesis can therefore be taken for granted. The separation of trpB2 from the remaining trp genes is consistent with a demand for individual gene regulation and expression, presumably due to an additional function of TrpB2. Most plausibly, under these constraints trpB2 is the better-suited variant and is favoured over trpB1 in this specific environment.
Recently, TrpA, Tmarit_i1B and Tmarit_o2B of T. maritima have been produced in E. coli, purified and characterised. It has been shown that recombinant TrpA forms an α monomer and that both recombinant TrpB proteins form β2 homodimers. However, only the operon-encoded Tmarit_i1B, but not Tmarit_o2B, associated with TrpA to constitute the conventional αββα tryptophan synthase complex in which both subunits reciprocally activate each other. An analogous experiment has been carried out for genes of S. solfataricus. The results showed that Ssolfa_i2C, but not Ssolfa_o2C, associates transiently with TrpA during catalysis to form a functional tryptophan synthase complex. However, in contrast to regular tryptophan synthases, the affinity between the two subunit types was weak, and activation was unidirectional from Ssolfa_i2C to TrpA. These results indicate the following ranking of binding affinity to TrpA: TrpB2_o < TrpB2_i < TrpB1.
In modelling trpB evolution, the relationship between the trpB variants has to be made plausible. A possible explanation for the existence of two trpB variants would be convergent evolution, i.e. the independent development of trpB1 and trpB2 towards trpB function. In this case, only the few residues critical for function would be expected to correspond, embedded in polypeptides that are otherwise relatively dissimilar at the sequence level. In contrast, a comparison of TrpB1 and TrpB2 sequences shows that on average 30% of the residues are identical and 40% are similar; compare Table 3. This finding and the conservation of indels make convergent evolution highly improbable and argue for a common origin of trpB1 and trpB2 genes.
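Identity and similarity percentages of this kind are derived from pairwise alignments. The sketch below illustrates one common convention under stated assumptions: identity and similarity are computed over alignment columns in which neither sequence has a gap, gap columns are reported separately, and "similar" means identical or belonging to the same group in a small, hypothetical set of physico-chemical residue groups. Many tools instead count a pair as similar when its BLOSUM62 score is positive; the exact procedure behind Table 3 is not specified here, so this code need not reproduce its numbers.

```python
# Hypothetical groups of physico-chemically similar residues (an assumption).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(a: str, b: str) -> tuple[float, float, float]:
    """Return (% identical, % similar, % gap columns) for two aligned sequences.

    Identity and similarity use only columns without a gap in either sequence;
    the gap percentage refers to all alignment columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0, 100.0
    identical = sum(x == y for x, y in pairs)
    similar = sum(
        x == y or (x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y])
        for x, y in pairs
    )
    gaps = 100.0 * (len(a) - len(pairs)) / len(a)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs), gaps


# Toy aligned fragments (hypothetical, not real TrpB sequences).
print(identity_similarity("MKVL-EST", "MRIL-DSA"))
```

Because identity and similarity are normalised by gap-free columns only, long indels, such as those distinguishing TrpB1 from TrpB2, show up in the gap percentage rather than lowering the identity value.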
The most widely accepted model for the evolution of novel protein functions postulates gene duplication and the generation of a redundant gene copy. It is assumed that the selective pressure on the copy is largely relaxed, which facilitates the evolution of a paralogue with a novel function. This model is based on the notion that negative trade-offs dominate evolutionary processes. According to this model, one of the trpB genes originates from a copy of the ancestral variant. Which of the two existing variants represents the more ancient gene? The arguments listed below suggest that trpB2 is the ancestral trpB gene.
i) trpB1 is not universally distributed among archaea; Crenarchaeota possess exclusively trpB2 genes. ii) A low frequency of an amino acid in the enzymes required for its synthesis provides a selective advantage. In general, trpB1 genes contain fewer tryptophan codons than trpB2 genes; in i1_o2 species, the ratio is 1.6/3.7, i.e. less than 0.5. Therefore, trpB1 is the more evolved gene. iii) The sophisticated inter-subunit communication suggests that the products of trpB1 and trpA1 of species-types s1 or i1_o2 are the most efficient enzymes. Hence, TrpB1 is the more optimised and later evolved TrpB variant. iv) It has been postulated that ancient enzymes possess broad specificities. The occurrence of trpB2 outside trp operons argues for either a new function or a broader specificity. In summary, it is plausible to regard trpB2 as the more ancient variant of trpB.
For bacteria, an ancestral trp operon of type trpEGDCFB1A1 is most likely. Therefore, the existence of a trpB1 gene in the bacterial predecessor was taken for granted. In addition, it has been concluded for bacterial trp operons that horizontal gene transfer (HGT) did not affect the path of their evolutionary history.
trpB1, trpB2, trpA1 and trpA2 each arose only once. The analysis of multiple sequence alignments (see Figures 5 and 6) shows that the main differences distinguishing the variants are conserved indels. It has been convincingly argued that conserved indels are less likely than e.g. point mutations to result from independent mutational events and therefore provide useful milestones for identifying evolutionary phases. In addition, the strong coherence seen in the TrpB subtree argues against independent evolution occurring in parallel in bacteria and archaea. Owing to the existence of conserved indels, an evolutionary process trpB2_i → trpB1 → trpB2_o, or vice versa, is also unlikely.
As has been deduced previously, the following order of importance was assumed for the processes of genome evolution: gene loss > gene genesis > gene duplication > HGT.
The integration of a trpB gene into the trp operon (or linkage group) was rated less probable than other translocations, gene duplications, gene losses and mutations. It is presumably very rare for a particular gene to become integrated into a specific gene cluster, in this case the trp operon.
It is unlikely that several recent HGT events explain the taxonomically widespread occurrence of trpB2 genes in bacteria. In bacteria, trpB2 genes were found in hyperthermophilic (Aquificae and Thermotogae) and mesophilic bacteria belonging to the taxonomic groups Alpha- and Gammaproteobacteria and Bacteroides. The program SIGI identifies genomic islands, i.e. gene clusters with a conspicuous codon usage indicating recent HGT events. In none of the considered genomes were trpB1 or trpB2 genes (inside or outside operons) elements of such islands.
The most plausible predecessor of all Crenarchaeota is of type i2_o2; for Bacteria and for Euryarchaeota it is of type i1_o2. Assuming this and excluding Thermoplasmata (see below), 14 of the 23 modern archaeal species have the same species-type as their ancestor. Of the 9 species with a deviating type, 7 can be explained by a single gene loss, and a more complicated genomic rearrangement has to be postulated for only 2 modern species: loss of trpB2 and dislocation of trpB1 for M. kandleri (representing Methanopyri), which is an S1 species, and replacement of trpB2_o with a copy of trpB1 to explain the i1_o1 genome of N. pharaonis. The only euryarchaeal class requiring a more complex explanation than gene loss and translocation is the Thermoplasmata, which possess exclusively trpB2 and trpA2 genes. The composition of congruency groups (compare Figures 1 and 2) makes a common evolution with Sulfolobales, or the acquisition of the same trp genes, probable. The similarity of operon structures supports this assumption: the operon structures of P. torridus and Sulfolobales are identical (compare Panel B of Figure 7). For T. acidophilum, extensive HGT with S. solfataricus, which is found in the same habitat, has been made plausible. In summary, a common evolutionary history of the trpB2 and trpA2 genes of Sulfolobales and Thermoplasmata is highly plausible, suggesting an ancestor of species-type i2_o2 for both taxonomic classes. Assuming an i2_o2 ancestor, gene loss is sufficient to explain the genome composition of all modern Thermoplasmata.
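The counts in this paragraph can be reproduced from the classification table with a short sketch. It assumes the ancestral types stated in the text (i2_o2 for Crenarchaeota, i1_o2 for Euryarchaeota) and models a single gene loss by removing one trpB from the two-gene ancestor and relabelling the surviving gene according to the naming scheme: _s if it was operon-based and therefore still linked to trpA, _S otherwise. Any other deviation is classed as a more complex rearrangement. Function and category names are hypothetical.

```python
from collections import Counter

# Ancestral species-types assumed in the text, per taxon (C/E).
ANCESTOR = {"C": "i2_o2", "E": "i1_o2"}

# Modern (taxon, species-type) per genome, Thermoplasmata excluded,
# transcribed from the classification table.
MODERN = {
    "S. acidocaldarius": ("C", "s2"), "A. pernix": ("C", "i2_o2"),
    "P. aerophilum": ("C", "i2_o2"), "S. solfataricus": ("C", "i2_o2"),
    "S. tokodaii": ("C", "i2_o2"), "P. horikoshii": ("E", "S2"),
    "A. fulgidus": ("E", "i1_o2"), "M. acetivorans": ("E", "i1_o2"),
    "M. barkeri": ("E", "i1_o2"), "M. burtonii": ("E", "i1_o2"),
    "M. hungatei": ("E", "i1_o2"), "M. mazei": ("E", "i1_o2"),
    "M. thermoautotrophicus": ("E", "i1_o2"), "P. abyssi": ("E", "i1_o2"),
    "P. furiosus": ("E", "i1_o2"), "T. kodakaraensis": ("E", "i1_o2"),
    "N. pharaonis": ("E", "i1_o1"), "M. kandleri": ("E", "S1"),
    "Halobacterium": ("E", "s1"), "H. marismortui": ("E", "s1"),
    "M. maripaludis": ("E", "s1"), "M. stadtmanae": ("E", "s1"),
    "M. jannaschii": ("E", "s1"),
}


def after_single_loss(ancestor: str) -> set[str]:
    """Species-types reachable from a two-gene ancestor by losing one trpB.

    The surviving gene becomes 's' if it was inside the operon (still linked
    to trpA) and 'S' if it was outside, following the naming scheme.
    """
    genes = ancestor.split("_")
    reachable = set()
    for lost in genes:
        kept = [g for g in genes if g != lost][0]
        location, gene_type = kept[0], kept[1]
        reachable.add(("s" if location == "i" else "S") + gene_type)
    return reachable


def classify(taxon: str, species_type: str) -> str:
    ancestor = ANCESTOR[taxon]
    if species_type == ancestor:
        return "same as ancestor"
    if species_type in after_single_loss(ancestor):
        return "single gene loss"
    return "more complex rearrangement"


tally = Counter(classify(taxon, st) for taxon, st in MODERN.values())
print(len(MODERN), dict(tally))
```

Under these assumptions the tally is 14 unchanged genomes, 7 single gene losses and 2 more complex cases (M. kandleri and N. pharaonis), the same partition as stated above.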
In Panel A of Figure 9, an ancestral trpB*, an intermediate between trpB1 and trpB2, was postulated for the LUCA. trpB* might then have diverged into a bacterial trpB1 and an archaeal trpB2 variant. To explain the existence of a non-operon-based trpB2 in archaea, a duplication of trpB2 is necessary. The emergence of a euryarchaeal i1_o2 predecessor requires the replacement of the linkage group trpB2A2 with trpB1A1 via HGT from bacteria to archaea. The occurrence of trpB2 in bacterial genomes demands an early transfer of trpB2 from an archaeal to a bacterial predecessor.
Panel B of Figure 9 depicts an alternative model for the evolution of the LUCA towards the bacterial and archaeal ancestors. As introduced above, gene duplication is regarded as the first step in evolving a novel gene function. In addition, trpB2 must be considered the more ancient variant of trpB. Therefore, the evolution towards the LUCA of bacteria and archaea is most plausibly explained by the duplication of a non-operon-based trpB2 gene, which was subsequently integrated into the trp operon and constituted an ancient linkage group trpB2A2. This makes a common ancestor of type i2_o2 plausible. These considerations form the basis for further reconstructing the evolution of the predecessors; Panels B and C show two alternatives.
In Panel B, it is assumed that the LUCA was of type i2_o2 and that the evolution trpB2 → trpB1 occurred in an early bacterial species. In this case, species-types of the LUCA and the crenarchaeal predecessor are identical. To explain the advent of an euryarchaeal predecessor of type i1_o2, an ancient event of HGT from Bacteria to Archaea has to be postulated for the acquisition of the linkage group trpB1A1, which replaced trpB2A2.
In Panel C, it is assumed that the LUCA was of type i1_o2, i.e. the evolution trpB2 → trpB1 occurred before the speciation of Bacteria and Archaea. In this case, the species-types of the LUCA and the predecessors of Bacteria and Euryarchaeota are identical. However, a replacement of the operon-based trpB1 by a trpB2 gene (trpB2_i) is then necessary to constitute the crenarchaeal predecessor of type i2_o2.
Model A requires at least two ancient HGT events to explain the occurrence of trpB2 in Bacteria and of trpB1A1 in Euryarchaeota. The phenomenon of non-orthologous displacement in situ is well characterised [51, 52]. In addition to HGT, Model A needs a duplication of the trpB2 gene in the predecessor of Archaea. It is therefore not the most parsimonious model: Model B demands only one HGT event, the ancient acquisition of the linkage group trpB1A1 by a euryarchaeal predecessor.
Model C postulates a LUCA of species-type i1_o2. The sophisticated inter-subunit communication clearly suggests that the products of trpB1 and trpA1 genes are the most specialised and most recently evolved tryptophan synthases; see  and references therein. Thus, the replacement of trpB1 with trpB2, which is needed to explain a crenarchaeal predecessor of type i2_o2, would lead to a tryptophan synthase with less optimal protein-protein interaction. This seems unlikely, given a sustained need for tryptophan synthesis in Crenarchaeota.
In archaeal genomes, various stages of trpB function have been conserved. Most plausibly, trpB2 represents the ancestral variant of trpB genes. With respect to TrpA/TrpB communication and cooperativity, the situation observed in S2 species (T. acidophilum and T. volcanium) is probably the least complex one. The non-operon-based trpB2 genes of Sulfolobaceae are similarly archaic, whereas their operon-based trpB genes are more evolved. S1 and i1_o2 species possess highly cooperative synthases. Thus, the archaeal tryptophan synthase (especially its trpB variants) constitutes a model system for the study of protein complex formation. Owing to different environmental conditions, several stages of cooperativity have been conserved, which make it possible to characterise the progress of trpA/trpB coevolution on the basis of gene expression and functional cooperativity.
Genomic content was determined by analysing version 6.2 of the STRING database .
All protein sequences were downloaded via the "Genome Project" database of the NCBI, which provides access to completely sequenced genomes. The respective COG tables were consulted to determine the COG group of each gene and to download sequences. Genes from the following completely sequenced genomes were analysed (abbreviations used in the Figures and genome accession numbers in parentheses):
Crenarchaeota: Aeropyrum pernix K1 (Aperni, NC_000854), Pyrobaculum aerophilum str. IM2 (Paerop, NC_003364), Sulfolobus acidocaldarius DSM 639 (Sacido, NC_007181), Sulfolobus solfataricus P2 (Ssolfa, NC_002754), Sulfolobus tokodaii str. 7 (Stokod, NC_003106).
Euryarchaeota: Archaeoglobus fulgidus DSM 4304 (Afulgi, NC_000917), Haloarcula marismortui ATCC 43049 (Hmaris, NC_006396), Halobacterium sp. NRC-1 (Halob, NC_002607), Methanocaldococcus jannaschii DSM 2661 (Mjanna, NC_000909), Methanococcoides burtonii DSM 6242 (Mburto, NC_007955), Methanococcus maripaludis S2 (Mmarip, NC_005791), Methanopyrus kandleri AV19 (Mkandl, NC_003551), Methanosarcina acetivorans C2A (Maceti, NC_003552), Methanosarcina barkeri str. Fusaro (Mbarke, NC_007355), Methanosarcina mazei Go1 (Mmazei, NC_003901), Methanosphaera stadtmanae DSM 3091 (Mstadt, NC_007681), Methanospirillum hungatei JF-1 (Mhunga, NC_007796), Methanothermobacter thermautotrophicus str. Delta H (Mtherm, NC_000916), Natronomonas pharaonis DSM 2160 (Nphara, NC_007426), Picrophilus torridus DSM 9790 (Ptorri, NC_005877), Pyrococcus abyssi GE5 (Pabyss, NC_000868), Pyrococcus furiosus DSM 3638 (Pfurio, NC_003413), Pyrococcus horikoshii OT3 (Phorik, NC_000961), Thermococcus kodakaraensis KOD1 (Tkodak, NC_006624), Thermoplasma acidophilum DSM 1728 (Tacido, NC_002578), Thermoplasma volcanium GSS1 (Tvolca, NC_002689).
Bacteria: Escherichia coli K-12 (Ecoli, NC_000913), Geobacter metallireducens GS-15 (Gmetal, NC_007517), Geobacter sulfurreducens PCA (Gsulfu, NC_002939), Thermotoga maritima (Tmarit, NC_000853).
Multiple sequence alignments (MSAs) were generated with the program M-Coffee, which combines the output of nine individual MSA methods into a "meta"-MSA. M-Coffee has been shown to outperform each of the individual MSA methods.
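M-Coffee's internals are not described in this article. The Python sketch below only illustrates the general consistency idea behind T-Coffee-style meta-alignment: a residue pair aligned by more of the input methods deserves more weight. Given several alignments of the same two sequences as gapped rows, it reports the fraction of alignments supporting each aligned residue pair. The function names are hypothetical, and the real tool additionally extends such a library and builds the final MSA progressively.

```python
from collections import Counter


def aligned_pairs(row_a, row_b):
    """Residue-index pairs (i, j) placed in the same column of two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows of one alignment must have equal length")
    pairs, i, j = [], 0, 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.append((i, j))
        # Advance the residue counters only over non-gap characters.
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def pair_support(alignments):
    """Fraction of input alignments that align each residue pair (a toy 'library')."""
    counts = Counter(p for a, b in alignments for p in aligned_pairs(a, b))
    return {pair: n / len(alignments) for pair, n in counts.items()}
```

With three hypothetical method outputs `[("AC-D", "ACED"), ("ACD-", "ACED"), ("AC-D", "ACED")]`, the pairs (0, 0) and (1, 1) receive support 1.0, (2, 3) receives 2/3 and (2, 2) only 1/3, so a combined alignment would favour placing D opposite D.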
For each position in an MSA, residue conservation, secondary structure, location in the interface area, active sites and residues characteristic of sequence types were determined and plotted. 3D data were deduced from the PDB file 1WDW, which describes the TrpA/TrpB complex of P. furiosus. For secondary structure prediction, Jpred was used. SDPpred was used to identify residues that distinguish sequence groups owing to their skewed or bimodal distribution. Annotations referring to active site residues were taken from the PDBsum page [56, 57], and interface residues were annotated according to the Protein interfaces, surfaces and assemblies service PISA [58, 59]. Both services were hosted on the web server of the European Bioinformatics Institute (EMBL-EBI).
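Two of the per-position quantities mentioned above can be sketched with elementary information theory: residue conservation as the Shannon entropy of an alignment column, and group-discriminating positions as the mutual information between the residues of a column and the sequence-group labels. SDPpred's scoring is, as described by its authors, based on mutual information but adds statistical corrections that are omitted here. The function names, the group labels and the treatment of gaps as an ordinary symbol are assumptions of this Python sketch.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits; 0 for a fully conserved column."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


def mutual_information(column, groups):
    """I(residue; group) = H(residue) + H(group) - H(residue, group), in bits."""
    return entropy(column) + entropy(groups) - entropy(list(zip(column, groups)))


def rank_columns(rows, groups):
    """Score every MSA column by mutual information with the group labels, best first."""
    scores = [(i, mutual_information(col, groups)) for i, col in enumerate(zip(*rows))]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For the toy rows `["MAK", "MAK", "MGR", "MGK"]` with hypothetical labels `["trpB1", "trpB1", "trpB2", "trpB2"]`, column 1 (A versus G) ranks first with 1 bit, the conserved column 0 scores 0, and column 2 lies in between because its R/K split only partly follows the groups.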
SplitsTree4, a framework for phylogenetic analyses, was used to generate and analyse phylogenetic trees. The MSAs produced by M-Coffee were used to calculate maximum likelihood protein distance estimates based on the JTT model. Trees were generated with the bio-neighbour joining (BioNJ) approach and assessed by bootstrapping (1000 replications each).
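As a rough illustration of this pipeline, the Python sketch below uses two plainly named substitutes: a Poisson-corrected p-distance instead of JTT maximum likelihood distances, and classical neighbour joining (Saitou and Nei) instead of BioNJ, which additionally uses variance estimates when reducing the distance matrix. It also shows how one bootstrap pseudo-alignment is drawn by resampling columns with replacement. All function names are hypothetical, and SplitsTree4 is not involved.

```python
import math
import random


def poisson_distance(a, b):
    """-ln(1 - p) over gap-free sites; a simple stand-in for JTT ML distances."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 1.0:
        raise ValueError("too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def distance_matrix(rows):
    return [[poisson_distance(r, s) for s in rows] for r in rows]


def bootstrap_replicate(rows, rng):
    """One pseudo-alignment: alignment columns resampled with replacement."""
    length = len(rows[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in rows]


def neighbor_joining(names, d):
    """Classical NJ (not BioNJ) on a symmetric distance matrix; returns a Newick string."""
    nodes, d = list(names), [list(row) for row in d]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) * d[i][j] - r[i] - r[j].
        _, i, j = min(((n - 2) * d[i][j] - r[i] - r[j], i, j)
                      for i in range(n) for j in range(i + 1, n))
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch to node i
        lj = d[i][j] - li                                   # branch to node j
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        # Distances from the new internal node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = ([[d[a][b] for b in keep] + [new_row[a]] for a in keep]
             + [[new_row[k] for k in keep] + [0.0]])
        nodes = [nodes[k] for k in keep] + [joined]
    # The last two nodes are connected by a single edge of length d[0][1].
    return f"({nodes[0]}:{d[0][1]:.3f},{nodes[1]});"
```

On the additive four-taxon matrix `[[0, 3, 3, 5], [3, 0, 4, 6], [3, 4, 0, 4], [5, 6, 4, 0]]` for A to D, `neighbor_joining` recovers `((A:1.000,B:2.000):1.000,(C:1.000,D:3.000));`. For bootstrapping, one would compute `neighbor_joining(names, distance_matrix(bootstrap_replicate(rows, rng)))` 1000 times and count how often each split of the original tree recurs; that split comparison is left out of this sketch.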
I want to thank Reinhard Sterner and Rüdiger Schmitt for stimulating discussions and their help in preparing this manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.