Differential gene transfers and gene duplications in primary and secondary endosymbioses
© Zauner et al; licensee BioMed Central Ltd. 2006
Received: 22 December 2005
Accepted: 26 April 2006
Published: 26 April 2006
Most genes introduced into phototrophic eukaryotes during the process of endosymbiosis are either lost or relocated into the host nuclear genome. In contrast, groEL homologues are found in different genome compartments among phototrophic eukaryotes. Comparative sequence analyses of recently available genome data have allowed us to reconstruct the evolutionary history of these genes and to propose a hypothesis that explains the unusual genome distribution of groEL homologues.
Our analyses indicate that while two distinct groEL genes were introduced into eukaryotes by a progenitor of plastids, these particular homologues have not been maintained in all evolutionary lineages. This is of significant interest, because two chaperone proteins always co-occur in oxygenic photosynthetic organisms. We infer strikingly different lineage-specific processes of evolution involving deletion, duplication and targeting of groEL proteins.
The requirement of two groEL homologues for chaperone function in phototrophs has provided a constraint that has shaped convergent evolutionary scenarios in divergent evolutionary lineages. GroEL provides a general evolutionary model for studying gene transfers and convergent evolutionary processes among eukaryotic lineages.
Plastids, the solar-powered energy factories of phototrophic eukaryotes, either translate mRNAs for their organelle-encoded genes or import nuclear-encoded proteins. In both cases, correct folding of proteins is managed by chaperones such as those of the GroEL family. These are an abundant class of chaperones, which are also found in mitochondria, hydrogenosomes and prokaryotes [1, 2]. Their importance and distribution have led to intensive investigation of their function, culminating in the 'molecular chaperone concept' [3, 4], which has strongly influenced current understanding of protein folding and assembly.
Early genomic studies of the cyanobacterium Synechocystis sp. PCC 6803 and of plastid chromosomes of eukaryotes have highlighted a surprisingly varied distribution of genome locations for GroEL homologues amongst photosynthetic taxa [5, 6]. Synechocystis sp. PCC 6803 harbours two different groEL genes, whereas only one is maintained in red algal plastomes and the plastid genome of the cyanelle. Genes encoding GroEL have not been located within the sequenced genomes of chloroplasts in green algae and land plants, but two nuclear homologues of groEL, the cpn60s, have been detected in the nuclear genome of Chlamydomonas reinhardtii and some land plants. In Euglena gracilis, no groEL gene has been identified. The secondary endosymbionts of a cryptomonad (Guillardia theta) and a diatom (Odontella sinensis) are known to encode a single groEL gene in their plastid, and it can be speculated that the presence of a single copy of groEL may indicate the ancestral state in a primary endosymbiont [8, 9]. Interestingly, the nucleomorph genome of the cryptomonad Guillardia theta harbours a groEL homologue, and a cpn60-like homologue has recently also been found in the nucleomorph genome of another secondary endosymbiont, the chlorarachniophyte Bigelowiella natans (Gilson & McFadden, unpublished). Additionally, cpn60-like genes have recently been discovered in the nuclear genomes of other photosynthetic organisms, including a red alga (Cyanidioschyzon merolae), a diatom (Thalassiosira pseudonana), and Plasmodium falciparum (a parasitic organism that harbours a degenerate plastid). With the exception of P. falciparum, the co-occurrence of two groEL genes in the genomes of these recently sequenced organisms was predicted as necessary for maintaining chloroplast function. We develop here a hypothesis for differential transfer and gene duplication that explains the distribution of groEL homologues amongst the multiple genomes of photosynthetic taxa. We discuss how these proteins may act as an important regulator for plastid functions.
Substitution model selection
In all analyses, except those of the red/brown algal GroEL orthologues, an RtREV model that accommodated positional rate heterogeneity in some form was selected as best by ProtTest under the AIC. In the case of the GroEL orthologues in red/brown algae, we noted that only small differences in AIC values separated a variety of different models. When an RtREV model was assumed, and positional rate heterogeneity was approximated by either a constant proportion of variable sites or a discrete gamma distribution of rate classes, we noticed that the optimal estimates for these parameters varied among evolutionary lineages (i.e. different estimates were obtained for cyanobacterial GroEL1, cyanobacterial GroEL2, red algal/heterokont plastid GroEL1-like sequences, and highly diverged Cpn60-like sequences). When a uniform rate distribution was assumed, p var values ranged from 0.3 (cyanobacterial GroEL1) to 0.9 (highly diverged Cpn60-like sequences). When p var was set to 1, alpha shape parameter values ranged from 0.2 (for cyanobacterial GroEL1) to 0.8 (for highly diverged Cpn60-like sequences). We suggest that this degree of variation in parameter estimates among phylogenetic groupings of anciently diverged sequences is likely to reflect lineage-specific differences in structural and functional constraints [10–12]. As such it is potentially problematic for phylogenetic reconstruction, since parallel increases in proportions of variable sites in different evolutionary lineages can sometimes induce a form of long branch attraction [10–12].
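To make the reported range of alpha values more concrete, the following minimal sketch (not part of the original analysis) relies on a standard property of the gamma model of site rates: when the distribution is scaled to a mean rate of 1, a shape parameter alpha gives a variance of 1/alpha. The function names are illustrative assumptions, and the sketch uses the continuous gamma distribution rather than the discrete approximation used in tree building.

```python
import math


def gamma_rate_variance(alpha: float) -> float:
    """Variance of site rates under a gamma distribution with mean 1.

    With shape alpha and scale 1/alpha, the mean is 1 and the variance is 1/alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / alpha


def gamma_rate_cv(alpha: float) -> float:
    """Coefficient of variation (standard deviation / mean) of the site rates."""
    return math.sqrt(gamma_rate_variance(alpha))


# Shape values at the two ends of the range reported in the text.
for label, alpha in [("cyanobacterial GroEL1", 0.2), ("diverged Cpn60-like", 0.8)]:
    variance = gamma_rate_variance(alpha)
    cv = gamma_rate_cv(alpha)
    print(f"{label}: alpha={alpha}, variance={variance:.2f}, CV={cv:.2f}")
```

Under this reading, the smaller alpha estimated for cyanobacterial GroEL1 corresponds to a more strongly skewed distribution of rates across sites than the larger alpha estimated for the Cpn60-like sequences.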
Evolutionary tree building
Although it contains more GroEL homologues than were available to Wastl et al., our phylogenetic reconstruction here is nevertheless consistent with the observations and inferences made by Wastl et al. Figure 1 shows that (a) the groEL1-like genes from the plastid genome of red algae/heterokonts and cryptophytes are most closely related to the cyanobacterial groEL1 genes. (b) Interestingly, the cyanelle groEL1-like homologue is somewhat intermediate between cyanobacterial and red algal/chromist groEL1-like sequences, a finding that may reflect the cyanobacterial-like nature of this plastid. (c) The non-photosynthetic eubacterial groEL sequences are arguably more similar to the groEL2 sequences found in cyanobacteria. In any event, the groEL2 orthologues in cyanobacteria are genetically more diverse than the groEL1 sequences in the same taxa. These observations may suggest that groEL2 orthologues represent an ancestral form of GroEL. However, our inference that structural/functional constraints differ amongst GroEL homologues means that it is not possible to exclude other interpretations. (d) Assuming the root of the tree joins the branch leading to the non-photosynthetic taxa, Figure 1 places the Gloeobacter "A" and "B" sequences as the ancestral forms of GroEL homologues in photosynthetic taxa. (e) In the nuclear genomes of a diatom (Thalassiosira pseudonana), an alveolate (Plasmodium falciparum), the higher plants and the nucleomorph genomes of a chlorarachniophyte (Bigelowiella natans), a cryptophyte (Guillardia theta) and a red alga (Cyanidioschyzon merolae), groEL2-type sequences are found. In the case of higher plants, two forms of groEL2 occur, and appear to represent forms of groEL duplicated from an ancestral groEL2-like sequence. Inferences concerning the origin of this duplication are potentially problematic. A relatively high bootstrap value in Figure 1 (77%) suggests that it may have occurred prior to the divergence of plastids. However, potential long branch attraction problems concerning the placement of α and β Cpn60-like sequences from eukaryotes make this conclusion tentative. Gene duplication within the green lineage and differences in the functional/structural constraints of green α and β Cpn60-like sequences might also explain the results observed. Chlamydomonas reinhardtii Cpn60 α and β sequences (AAA98642 and AAA98643) are not included in our phylogenetic analysis shown in Figure 1, because their inclusion significantly reduced the alignment length and increased phylogenetic uncertainty. However, it is clear from other phylogenetic analyses (unpublished) that this green alga contains the two forms of groEL2 also present in higher plants.
Hypotheses of origin
Our phylogenetic reconstruction suggests a complex pattern of genome transfers, losses and duplications in the evolution of groEL sequences from photosynthetic taxa. In the earliest cyanobacterial-like prokaryotes, the ancestral groEL sequence appears to have duplicated to form an "A" and a "B" type sequence that is still present today in the genome of Gloeobacter [14, 15]. It appears that the "A" (groEL2-like) and "B" (more groEL1-like) forms have been inherited by most cyanobacteria, and also by the endosymbiont(s) involved in primary plastid endosymbiosis. During the process of endosymbiosis, the groEL2-like orthologue has been transferred from the endosymbiont genome to the nuclear genome in heterokonts, red algae, green algae and higher plants, and cryptophytes. We predict that this will also be true for haptophytes and glaucocystophytes. Less certain is the prediction of the pattern of evolution in peridinin-containing dinoflagellates, which in other respects appear highly distinctive in their evolution [16, 17]. An interesting observation is that Cyanophora paradoxa, the molecular prototype of the glaucophytes, shows archaic features not only by encoding a groES in its plastid genome, but also by containing a groEL1 sequence somewhat intermediate in sequence identity between cyanobacterial and plastid-encoded groEL1 genes. In the case of the chlorophyte lineage, it appears that the groEL1 form has been lost, and in Chlamydomonas and higher plants this has been replaced by a duplicated form of a groEL2-like sequence. This situation may also be true for the endosymbiont of the chlorarachniophyte B. natans (generally assumed to have been a green alga), since phylogenetic reconstructions, including those with partial Chlamydomonas GroEL sequences (not shown), provide some support for Bigelowiella and cpn60α-containing taxa being most closely related (e.g. Figure 1).
Several investigations on the evolution of the Plasmodium apicoplast indicate a red algal origin for this organelle [18, 19]. If so, then based on the generalisations stated above, one would expect that the groEL/cpn60 distribution should be similar to the situation found in heterokonts. Additional duplications of groEL1 genes (in the presence of groEL2) have also occurred in some species of filamentous cyanobacteria, and this observation is discussed in the following section.
GroEL proteins and protein folding
In all eukaryotes and prokaryotes that carry out oxygenic photosynthesis, and for which genome data is complete, two different forms of groEL/cpn60 are known to exist. Thus, at least two copies of the gene or protein seem to be necessary for a complete chaperone function. Further, the importance of having two divergent forms of GroEL for protein folding is suggested by observations on GroEL sequences in Synechocystis sp. PCC 6803, where it has been observed that GroEL1 and GroEL2 respond differently to heat shock and light conditions [20, 21]. An interesting speculation is that the number and genetic diversity of GroEL homologues reflects complexity of morphotype in cyanobacteria. This is suggested by comparison within filamentous cyanobacteria. Those strains that harbour three GroEL genes exhibit complex developmental stages (akinetes and hormogonia), whereas filamentous strains with a simpler morphotype, such as Nostoc sp. PCC 7120, harbour only two GroEL genes. Further study is needed to evaluate whether particular groEL homologues are specifically expressed in different developmental stages. Interestingly, P. falciparum encodes one cpn60 gene, but no groEL1 in the apicoplast genome. If phylogenetic inferences of a close relationship with red algae [18, 19] are correct, then this observation may reflect relaxed constraints on the folding of apicoplast proteins, and this speculation is also worth further investigation.
Although phylogenetic reconstruction of individual gene histories is inherently problematic for anciently diverged taxa [5, 10, 11], phylogenetic reconstruction for groEL homologues nevertheless provides a framework for developing understanding of genome-wide patterns of gene loss, relocation and multiple events of gene duplication. Our results presented here support and extend the hypothesis of groEL/cpn60 evolution by Wastl et al., which suggests a pattern of differential serial gene transfer and gene duplication.
Resources for sequences
Substitution model selection
The online version of ProtTest v1.2.6 [30, 31], implementing the Akaike Information Criterion (AIC), was used to select the most appropriate amino acid substitution models for tree building analyses. The cyanobacterial GroEL1 and GroEL2 datasets used for estimating parameters had identical taxon sampling (10 taxa: Synechocystis PCC 6803, Crocosphaera watsonii WH 8501, Trichodesmium erythraeum IMS101, Synechococcus elongatus 7942, Synechococcus vulcanus, Synechococcus sp. WH8102, Prochlorococcus marinus str. MIT9313, Prochlorococcus marinus str. CCMP1375, Prochlorococcus pastoris str. CCMP1986). Estimates were also made for bacteria (9 taxa: Thermus thermophilus, Rhodospirillum rubrum, Neisseria meningitidis, Geobacter metallireducens, Pseudomonas aeruginosa, Mycobacterium avium, Bacillus subtilis, Aquifex aeolicus and Chlorobium tepidum), for red/brown algal chloroplast-located orthologues (8 taxa: Guillardia theta, Pyrenomonas salina, Cyanidium caldarium, Porphyra purpurea, Gracilaria tenuistipitata var. liui, Odontella sinensis, Thalassiosira pseudonana and Cyanidioschyzon merolae) and for highly diverged Cpn60-like sequences (5 taxa: Thalassiosira pseudonana, Cyanidioschyzon merolae, Plasmodium falciparum, Bigelowiella natans and Guillardia theta).
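The selection criterion described above can be illustrated with a minimal sketch. It computes AIC = 2k - 2 lnL for a set of candidate models and ranks them by their difference from the best score. It is not ProtTest itself: the log-likelihoods and parameter counts below are hypothetical values chosen only to show how models separated by small AIC differences appear in a ranking, and the function names are illustrative.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_models(fits):
    """Rank candidate models by AIC.

    fits maps a model name to (log-likelihood, number of free parameters).
    Returns a list of (name, AIC, delta AIC) sorted from best to worst.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda entry: entry[1])


# Hypothetical log-likelihoods and parameter counts, for illustration only.
example_fits = {
    "RtREV": (-10250.0, 17),
    "RtREV+I": (-10110.0, 18),
    "RtREV+G": (-10080.0, 18),
    "RtREV+I+G": (-10078.5, 19),
}

ranking = rank_models(example_fits)
for name, score, delta in ranking:
    print(f"{name:10s} AIC={score:.1f} delta={delta:.1f}")
```

In this made-up example the two best models differ by a single AIC unit, which is the kind of situation in which the choice among models is weakly supported.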
Evolutionary tree building
Trees were reconstructed from amino acid sequences using the Windows version of PhyML. Trees were built assuming an RtREV model, and the optimal tree shown in Figure 1 was displayed using SplitsTree4.0. The robustness of phylogenetic reconstructions to variations in assumptions of positional rate heterogeneity was investigated by assuming (a) different proportions of variable sites (p var range = 0.3–1.0) and (b) a discrete gamma distribution of rate classes and a range of alpha shape parameter values (0.2–1). Non-parametric bootstrap trees were analyzed to assess the significance of sampling variability.
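The resampling step behind non-parametric bootstrapping can be sketched as follows. Alignment columns are drawn with replacement to build pseudo-replicate alignments of the original length, and a tree would then be estimated from each replicate. This is an illustration of the general technique rather than the PhyML implementation; the toy sequences, the function name and the fixed random seed are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment maps a taxon name to its aligned sequence; all sequences must
    have the same length. Columns are sampled with replacement, and the same
    sampled columns are used for every taxon.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, not GroEL data).
toy = {
    "taxonA": "MAKELKFG",
    "taxonB": "MAKDIKFS",
    "taxonC": "MSKEIRFG",
}

rng = random.Random(42)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```

The proportion of replicate trees that recover a given grouping is what is reported as a bootstrap value, such as the 77% quoted for Figure 1.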
Our work is supported by the Deutsche Forschungsgemeinschaft (SFB TR1, TP A7), the Alexander von Humboldt Foundation, New Zealand Marsden Fund, the Australian Research Council, the National Health and Medical Research Council and the Howard Hughes Medical Institute.
- Bui ET, Bradley PJ, Johnson PJ: A common evolutionary origin for mitochondria and hydrogenosomes. Proc Natl Acad Sci U S A. 1996, 93: 9651-9656. 10.1073/pnas.93.18.9651.
- Walter S: Structure and function of the GroE chaperone. Cell Mol Life Sci. 2002, 59: 1589-1597. 10.1007/PL00012485.
- Ellis RJ, van der Vies SM: Molecular chaperones. Annu Rev Biochem. 1991, 60: 321-347. 10.1146/annurev.bi.60.070191.001541.
- Ellis RJ: Protein folding: importance of the Anfinsen cage. Curr Biol. 2003, 13: R881-R883. 10.1016/j.cub.2003.10.051.
- Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV: Gene transfer to the nucleus and the evolution of chloroplasts. Nature. 1998, 393: 162-165. 10.1038/30234.
- Wastl J, Fraunholz M, Zauner S, Douglas S, Maier UG: Ancient gene duplication and differential gene flow in plastid lineages: the GroEL/Cpn60 example. J Mol Evol. 1999, 48: 112-117. 10.1007/PL00006438.
- Hallick RB, Hong L, Drager RG, Favreau MR, Monfort A, Orsat B, Spielmann A, Stutz E: Complete sequence of Euglena gracilis chloroplast DNA. Nucleic Acids Res. 1993, 21: 3537-3544.
- Kowallik KV, Stoebe B, Schaffran I, Kroth-Pancic PG, Freier U: The chloroplast genome of a chl a+c-containing alga, Odontella sinensis. Plant Mol Biol Reporter. 1995, 13: 336-342.
- Douglas SE, Penny SL: The plastid genome of the cryptophyte alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common ancestry with red algae. J Mol Evol. 1999, 48: 236-244. 10.1007/PL00006462.
- Lockhart PJ, Novis P, Milligan BG, Riden J, Rambaut A, Larkum AWD: Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol. 2005, 23: 40-45. 10.1093/molbev/msj005.
- Lockhart PJ, Steel MA: A tale of two processes. Syst Biol. 2005, 54: 948-951. 10.1080/10635150500234682.
- Guo Z, Stiller JW: Comparative genomics and evolution of proteins associated with RNA polymerase II C-terminal domain. Mol Biol Evol.
- Holland BR, Huber KT, Moulton V, Lockhart PJ: Using consensus networks to visualize contradictory evidence for species phylogeny. Mol Biol Evol. 2004, 21: 1459-1461. 10.1093/molbev/msh145.
- Nakamura Y, Kaneko T, Sato S, Mimuro M, Miyashita H, Tsuchiya T, Sasamoto S, Watanabe A, Kawashima K, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Nakazaki N, Shimpo S, Takeuchi C, Yamada M, Tabata S: Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Res. 2003, 10: 137-145. 10.1093/dnares/10.4.137.
- Nakamura Y, Kaneko T, Sato S, Mimuro M, Miyashita H, Tsuchiya T, Sasamoto S, Watanabe A, Kawashima K, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Nakazaki N, Shimpo S, Takeuchi C, Yamada M, Tabata S: Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids (supplement). DNA Res. 2003, 10: 181-201. 10.1093/dnares/10.4.181.
- Stoebe B, Maier UG: One, two, three: nature's tool box for building plastids. Protoplasma. 2002, 219: 123-130. 10.1007/s007090200013.
- Keeling PJ: Diversity and evolutionary history of plastids and their hosts. Amer J Bot. 2004, 91: 1481-1493.
- Foth BJ, McFadden GI: The apicoplast: a plastid in Plasmodium falciparum and other Apicomplexan parasites. Int Rev Cytol. 2003, 224: 57-110.
- Coppin A, Varre JS, Lienard L, Dauvillee D, Guerardel Y, Soyer-Gobillard MO, Buleon A, Ball S, Tomavo S: Evolution of plant-like crystalline storage polysaccharide in the protozoan parasite Toxoplasma gondii argues for a red alga ancestry. J Mol Evol. 2005, 60: 257-267. 10.1007/s00239-004-0185-6.
- Glatz A, Horvath I, Varvasovszki V, Kovacs E, Torok Z, Vigh L: Chaperonin genes of the Synechocystis PCC 6803 are differentially regulated under light-dark transition during heat stress. Biochem Biophys Res Commun. 1997, 239: 291-297. 10.1006/bbrc.1997.7463.
- Kovacs E, van der Vies SM, Glatz A, Torok Z, Varvasovszki V, Horvath I, Vigh L: The chaperonins of Synechocystis PCC 6803 differ in heat inducibility and chaperone activity. Biochem Biophys Res Commun. 2001, 289: 908-915. 10.1006/bbrc.2001.6083.
- Hill JE, Penny SL, Crowell KG, Goh SH, Hemmingsen SM: cpnDB: a chaperonin sequence database. Genome Res. 2004, 14: 1669-1675. 10.1101/gr.2649204.
- Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S: Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-encoding regions. DNA Res. 1996, 3: 109-136. 10.1093/dnares/3.3.109.
- Matsuzaki M, Misumi O, Shin IT, Maruyama S, Takahara M, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Nishida K, Yagisawa F, Nishida K, Yoshida Y, Nishimura Y, Nakao S, Kobayashi T, Momoyama Y, Higashiyama T, Minoda A, Sano M, Nomoto H, Oishi K, Hayashi H, Ohta F, Nishizaka S, Haga S, Miura S, Morishita T, Kabeya Y, Terasawa K, Suzuki Y, Ishii Y, Asakawa S, Takano H, Ohta N, Kuroiwa H, Tanaka K, Shimikzu N, Sugano S, Sato , Nozaki H, Ogasawara N, Kohara Y, Kuroiwa T: Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature. 2004, 428: 653-657. 10.1038/nature02398.
- Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, Brezinski MA, Chaal BK, Chiovitti A, Davis AK, Demarest MS, Detter JC, Glavina T, Goodstein D, Hadi MZ, Hellstein U, Hildebrand M, Jenkins BD, Jurka J, Kapitonov VV, Kroger N, Lau WW, Lane TW, Larimer FW, Lippmeier JC, Lucas S, Medina M, Montsant A, Obornik M, Parker MS, Palenik B, Pazour GJ, Richardson PM, Rynearson TA, Saito MA, Schwartz DC, Thamatrakoln K, Valentin K, Vardi A, Wilkerson FP, Rokhsar DS: The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 2004, 306: 79-86. 10.1126/science.1101156.
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419: 498-511. 10.1038/nature01097.
- Sato S, Wilson RJ: The use of DsRED in single- and dual-color fluorescence labeling of mitochondrial and plastid organelles in Plasmodium falciparum. Mol Biochem Parasitol. 2004, 134: 175-179. 10.1016/j.molbiopara.2003.11.015.
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.
- Abascal F, Zardoya R, Posada D: ProtTest: Selection of best-fit models of protein evolution. Bioinformatics. 2005, 21: 2104-2105. 10.1093/bioinformatics/bti263.
- Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006, 23: 254-267. 10.1093/molbev/msj030.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.