The cyanobacterial endosymbiont of the unicellular alga Rhopalodia gibba shows reductive genome evolution

Background Bacteria occur in facultative associations and intracellular symbioses with a diversity of eukaryotic hosts. Recently, we have helped to characterise an intracellular nitrogen-fixing bacterium, the so-called spheroid body, located within the diatom Rhopalodia gibba. Spheroid bodies are of cyanobacterial origin and exhibit features that suggest physiological adaptation to their intracellular lifestyle. To investigate the genome modifications that have accompanied the process of endosymbiosis, here we compare gene structure, content and organisation in spheroid body and cyanobacterial genomes. Results Comparison of the spheroid body's genome sequence with corresponding regions of close free-living relatives indicates that multiple modifications have occurred in the endosymbiont's genome. These include localised changes that have led to the elimination of some genes. This gene loss has been accompanied either by deletion of the respective DNA region or by its replacement with AT-rich non-coding DNA. In addition, genome modifications have led to the fusion and truncation of genes. We also report that the spheroid body's genome has accumulated deleterious mutations in genes for cell wall biosynthesis and for processes controlled by transposases. Interestingly, pseudogenes have formed in the spheroid body in the presence of intact, and presumably functional, recA and recF genes. This contrasts with most investigated obligate intracellular bacterium-eukaryote symbioses, in which at least one of recA and recF has been eliminated. Conclusion Our analyses suggest highly specific targeting/loss of individual genes during genome reduction and the establishment of a cyanobacterial endosymbiont inside a eukaryotic cell. Our findings confirm, at the genome level, earlier speculation on the obligate intracellular status of the spheroid body in Rhopalodia gibba.
This association is the first example of an obligate cyanobacterial symbiosis involving nitrogen fixation for which genomic data are available. It represents a new model system for studying the molecular adaptations in genome evolution that accompany a switch from a free-living to an intracellular existence.


Background
A diversity of extracellular and intracellular symbiotic interactions occurs between bacteria and eukaryotic hosts. The degree of interdependence between partners ranges from the weak dependence of some extracellular associations to permanent, or obligate, intracellular symbiosis. In the latter case, the endosymbiont is transmitted vertically to the next generation without any need for re-infection. The dependence on the host can be stabilised by loss or inactivation of genes whose products are no longer required in the partnership [1,2]. Consequently, obligate intracellular symbionts lose their autonomy and therefore the capacity for a host-independent lifestyle. Isolated from free-living populations, vertically transmitted endosymbionts have limited opportunities for genetic exchange through processes such as conjugation or transformation. Typically, endosymbiont genes diverge rapidly in comparison to their homologues in free-living relatives, a phenomenon that may reflect genetic drift operating on small populations [3,4] and/or relaxation of structural/functional constraints on endosymbiont protein evolution [5]. The genomes of obligate intracellular bacteria often show an accumulation of deleterious mutations and a higher AT content, accompanied by a reduction in genome size, when compared to their free-living relatives [6]. These processes can be as extreme as in the reduced genome of Buchnera sp., an endosymbiont of aphids with a genome size of 641 kbp [7,8], and of Carsonella, a γ-proteobacterial symbiont of phloem sap-feeding insects with a genome size of only 160 kbp [9]. Others, like the endosymbionts of the rice weevils Sitophilus zeamais (SZPE) and Sitophilus oryzae (SOPE) as well as Sodalis glossinidius, a symbiont of tsetse flies [10][11][12], represent the other extreme and exhibit only a slight reduction in genome size in comparison to free-living close relatives.
Unlike the genomes of Buchnera and Carsonella, the genomes of these endosymbionts do not exhibit unusually high AT content. Intracellular symbionts, including SZPE and SOPE, have lost at least one of the recombinational repair genes recA and recF, a characteristic found in all other bacterial intracellular symbionts. The only known exception is S. glossinidius, in which intact recA and recF genes have been found. The presence of genes for the flagellar apparatus in the genome of S. glossinidius might indicate that this symbiosis has only recently been established [12]. Cyanobacterial interactions with plants and protists are also well known [13][14][15][16][17]. These associations are in most cases facultative and do not involve vertical transmission; instead, the cyanobacterial symbiont re-infects the host in every generation. As with other facultative symbioses, no genetic modification of the endosymbiont genome has yet been detected [17].
The pennate diatom Rhopalodia gibba harbours endosymbionts closely related to extant cyanobacteria. Some of the closest free-living relatives of these so-called spheroid bodies are diazotrophic cyanobacteria of the Cyanothece sp. group [18]. The spheroid bodies encode genes for nitrogen fixation and have the capacity to fix molecular nitrogen [18,19]. Although the spheroid bodies are of cyanobacterial origin, they lack the typical photosynthetic pigmentation and have thus been assumed to be photosynthetically inactive. Unlike all other unicellular nitrogen-fixing cyanobacteria, they fix nitrogen only in the light [18][19][20]. Depending on culture conditions, we observe one to four spheroid bodies per host cell; these are transmitted vertically to the daughter cells during host cell division [15]. Altogether, these findings have led to the supposition that the spheroid bodies of R. gibba are obligate endosymbionts. Physiological adaptation to an intracellular endosymbiotic association is expected to result in genome modification, and this expectation has motivated our investigation of gene structure, content and organisation in the spheroid body's genome of R. gibba. To this end, we constructed fosmid libraries of the spheroid body and of Cyanothece sp. ATCC 51142 and analysed genomic regions of special interest. Here we describe observations and analyses of the nif gene region and also of loci relevant to the question of the obligate nature of the spheroid body endosymbiosis. Our investigations reveal extensive changes in the spheroid body's genome. These include inactivation and loss of genes and the creation of large non-coding AT-rich regions. Our observations confirm the obligate nature of the spheroid body endosymbiont and provide insight into the nature of genome changes that accompany endosymbiosis and organelle formation.

Results
The nif gene region of spheroid bodies and Cyanothece sp. ATCC 51142
For this study we cloned and sequenced the nif operon and flanking regions from the genomes of the intracellular symbiont of the diatom R. gibba (the spheroid body) and of a close free-living relative, the diazotrophic cyanobacterium Cyanothece sp. ATCC 51142 (Figure 1). Altogether we sequenced and analysed a contiguous 63,362 bp Cyanothece fragment comprising the nif gene region and a contiguous 51,475 bp fragment including the corresponding region of the spheroid body's genome. Additionally, we sequenced and analysed 140,000 bp of non-contiguous genomic DNA of the endosymbiont. Using these additional datasets we characterised the spheroid body's sequences for recA, recF, psbC and psbD. For further analyses we also used information from the genome of the recently sequenced Cyanothece strain CCY0110 and of other closely related cyanobacteria whose genome sequences are available in the NCBI nr database. To obtain a phylogenetic framework for inferring genome modifications in the spheroid body's genome, we reconstructed maximum likelihood gene trees for all homologues longer than 200 amino acids in the 63,362 bp region. Where possible, trees were outgroup-rooted using homologues from Synechocystis sp. PCC 6803. Figure 2 shows supernetworks [21] built for these taxa. These networks summarise the relationships in individual gene trees, which need not share the same taxon sampling. In this analysis, for some proteins (NifB, NifN, NifS) the spheroid body sequences were most closely related to Cyanothece sp. ATCC 8801 (Figure 2a), whereas for others (NifD, NifH, NifK, NifE) they were more closely related to Cyanothece sp. ATCC 51142, Cyanothece strain CCY0110, Crocosphaera watsonii WH 8501 and Gloeothece sp. KO68DGA (Figure 2b).
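The length criterion used to select homologues for tree reconstruction can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the homologous protein sequences are available as FASTA text, the function names `parse_fasta` and `select_for_trees` are hypothetical, and the subsequent maximum likelihood inference and supernetwork construction are not shown.

```python
from typing import Dict, Optional

MIN_LENGTH_AA = 200  # homologues must be longer than 200 amino acids


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word of the header."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            # Remove '*' so that a terminal stop symbol does not count as a residue.
            records[current] += line.replace("*", "")
    return records


def select_for_trees(proteins: Dict[str, str], min_len: int = MIN_LENGTH_AA) -> Dict[str, str]:
    """Keep only proteins strictly longer than min_len residues."""
    return {name: seq for name, seq in proteins.items() if len(seq) > min_len}


if __name__ == "__main__":
    demo = ">protA\n" + "M" * 250 + "\n>protB\n" + "M" * 150 + "*\n"
    print(sorted(select_for_trees(parse_fasta(demo))))  # ['protA']
```

A protein of exactly 200 residues is excluded, matching the "longer than 200 amino acids" wording.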
Figure 1 summarises the nif-operon-related regions of Cyanothece sp. ATCC 51142 and the spheroid body's genome. Both encode components of the nitrogenase-dependent cyanobacterial nitrogen fixation machinery, including the structural nitrogenase genes nifH, nifD and nifK, cofactor genes (nifB, nifN, nifE, nifV, nifW) and genes for processing proteins involved in metal centre biosynthesis (nifU, nifS). As shown in Figure 1, synteny of the nif genes, including the size of intergenic regions, is very high. The overall G/C content of this region is 40.8% in the Cyanothece genome and 37.2% in the spheroid body sequence. Codon usage is nearly equivalent in both genome regions, with a slight AT bias at the third codon position in the spheroid body genes (see Table 1).
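The G/C content and codon-position composition compared above (Table 1) are simple base counts. A minimal Python sketch is shown below; it assumes in-frame coding sequences supplied as plain strings, counts only unambiguous A/C/G/T bases, and uses hypothetical function names. It is not the software used in the study.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Percentage of G+C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def codon_position_at(cds: str) -> Dict[int, float]:
    """A+T percentage at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return {pos: 100.0 - gc_content(cds[pos - 1:usable:3]) for pos in (1, 2, 3)}


if __name__ == "__main__":
    cds = "GCAGCTGCG"  # toy sequence, not real spheroid body data
    print(round(gc_content(cds), 1))  # 77.8
    print({k: round(v, 1) for k, v in codon_position_at(cds).items()})  # {1: 0.0, 2: 0.0, 3: 66.7}
```

Applied to concatenated coding sequences of each fragment, the position-3 value is the quantity in which a slight AT bias would show up.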
Although the spheroid body's genome contains the same set of genes at the nif locus as Cyanothece ATCC 51142, there are notable differences. Between nifB and nifS, the functional fdxN gene is replaced by a pseudogene (fdxN*) in the spheroid body's genome; its coding sequence is interrupted by several stop codons. In contrast, an intact fdxN reading frame is conserved in all other close free-living cyanobacterial relatives, indicating that formation of the pseudogene is a derived feature of the spheroid body lineage (Figure 3). The spheroid body's nifU gene is also truncated, lacking a region corresponding to approximately 170 amino acids at the N-terminus (Figure 4). The truncated nifU homologue still contains an intact open reading frame for the [2Fe-2S]-binding domain and the C-terminal NifU domain. Interestingly, more distantly related cyanobacteria (Synechocystis sp. PCC 6803 and Gloeobacter violaceus PCC 7421) also carry a truncated homologue. Based on the phylogenetic analyses reported in Figure 2 and on comparative analyses of the NifU protein (not shown), truncation of nifU in spheroid bodies appears to be a derived feature of the endosymbiont lineage.
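The diagnostic used for fdxN*, a coding sequence interrupted by premature stop codons, can be sketched as a scan of one reading frame. The sketch assumes the standard bacterial stop codons (TAA, TAG, TGA) and a known frame; it does not detect frameshifts, which would require comparing all three frames against the homologue, and the function names are hypothetical.

```python
from typing import List

# Stop codons of the standard bacterial genetic code (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds: str, frame: int = 0) -> List[int]:
    """Return 0-based offsets of in-frame stop codons, ignoring a terminal stop codon."""
    cds = cds.upper()
    codon_starts = range(frame, len(cds) - 2, 3)
    stops = [i for i in codon_starts if cds[i:i + 3] in STOP_CODONS]
    # A stop in the last complete codon is the normal terminator, not a disruption.
    if stops and stops[-1] == codon_starts[-1]:
        stops.pop()
    return stops


def looks_disrupted(cds: str, frame: int = 0) -> bool:
    """True if the reading frame is interrupted by at least one premature stop codon."""
    return bool(internal_stops(cds, frame))


if __name__ == "__main__":
    print(internal_stops("ATGAAATTTTAA"))        # []
    print(internal_stops("ATGTAAAAATGATTTTAA"))  # [3, 9]
```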
Figure 1. Gene content in, and downstream of, the nif gene region of Cyanothece sp. ATCC 51142 and spheroid body of R. gibba. Blue and red bars represent orfs coded on the leading or lagging strand of DNA, respectively. The locations of pseudogenes in the spheroid body fragment are indicated with green bars. Genes have been named either according to homology matches in BLAST analyses or numbered consecutively for each organism (see also additional files 1 and 2). A GATA [29] plot is shown and indicates regions of high synteny between both organisms.

Gene deletions and modifications downstream of the nif region
Genes encoding subunits of the NADH dehydrogenase and ferredoxin, as well as Fe and Mo transporters, are conserved downstream of the nif operon in both Cyanothece ATCC 51142 and the spheroid body (Figure 1). However, genes encoding the precursors of the photosynthetic electron carriers cytochrome c6 (petJ) and plastocyanin (petE) are absent from this region of the spheroid body's genome. Interestingly, elsewhere in the genome, the photosystem II protein genes psbC (CP43) and psbD (D2) are arranged in an operon-like structure, similar to that of Cyanothece sp. CCY0110 and other cyanobacteria. However, in the spheroid body these genes are highly truncated or disrupted by several stop codons and thus exist as pseudogenes. This finding is consistent with the lack of photosynthetic activity previously reported for the spheroid body [18,19].
Downstream of the highly conserved nif gene region, further genetic modifications can be inferred. In Cyanothece ATCC 51142 and other closely related cyanobacteria, the open reading frame (orf) cyl0012 is flanked by the genes for an iron transporter (feoA) and a Mo-ABC transporter (modC). This orf has been deleted from the spheroid body's genome (Figure 1); the whole gene has been removed without any trace of pseudogenisation or local sequence conservation. The orf identified as cyl0019 in Cyanothece ATCC 51142 has also been lost from the nif region. In this species and other close relatives, this orf is flanked by two conserved orfs, cyl0018 and cyl0020. In the spheroid body's genome, cyl0019 is deleted and the flanking orfs cyl0018 and cyl0020 are fused to give the hypothetical protein sbl0010 (Figure 5). Not all gene deletions in the spheroid body's genome have resulted in genome compaction, as some genome regions appear to have been replaced by non-coding regions, as described in the following section.

Extensive modifications lead to large non-coding regions in the spheroid body's genome
A significant difference between the genome of Cyanothece sp. ATCC 51142 and that of the spheroid body is the number of non-coding DNA stretches longer than 500 bp (Figure 6). In the former there are three such stretches at the nif locus (including the downstream region), whereas there are seven in the spheroid body's genome fragment. One of these additional non-coding regions is located adjacent to the nif gene region, in a region of high synteny between both genomes. The non-coding regions of the spheroid body's genome are characterised by elevated levels of A and T nucleotides. Several pseudogenes are also located within these genome regions (Figures 1 and 6).
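Locating non-coding stretches longer than 500 bp and measuring their A+T content, as summarised in Figure 6, amounts to scanning the gaps between annotated orfs. The sketch below assumes orf coordinates are available as 0-based, half-open intervals; the function name and input layout are hypothetical, and this is an illustration rather than the procedure used in the study.

```python
from typing import List, Sequence, Tuple

MIN_GAP_BP = 500  # non-coding stretches "greater than 500 bp"


def noncoding_stretches(
    fragment: str, orfs: Sequence[Tuple[int, int]], min_len: int = MIN_GAP_BP
) -> List[Tuple[int, int, float]]:
    """Return (start, end, AT%) for every orf-free stretch longer than min_len.

    Orfs are 0-based, half-open (start, end) intervals on either strand; only
    their extent matters, and overlapping orfs are allowed.
    """
    gaps: List[Tuple[int, int]] = []
    cursor = 0  # first position not yet covered by an orf
    for start, end in sorted(orfs):
        if start - cursor > min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if len(fragment) - cursor > min_len:
        gaps.append((cursor, len(fragment)))

    result = []
    for start, end in gaps:
        segment = fragment[start:end].upper()
        at = 100.0 * (segment.count("A") + segment.count("T")) / len(segment)
        result.append((start, end, at))
    return result


if __name__ == "__main__":
    toy = "G" * 100 + "A" * 600 + "C" * 100 + "T" * 50  # toy fragment, not real data
    print(noncoding_stretches(toy, [(0, 100), (700, 800)]))  # [(100, 700, 100.0)]
```

Stretches at either end of a cloned fragment may be cut off by the fragment boundaries, so their lengths are lower bounds.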
In Cyanothece ATCC 51142, the gene fdxIII and an orf coding for a conserved hypothetical protein (cyl0018) flank four open reading frames (cyl0014/vagC, cyl0015/pilT, cyl0016, cyl0017/vapC). Three of these have highest similarity to virulence-associated proteins (Vap proteins) and the fourth has greatest similarity to proteins containing a PIN domain (Figure 1). In spheroid bodies, fdxIII and the homologue of cyl0018 (as part of sbl0010, see above) frame a non-coding region of about 2000 bp that contains two pseudogenes, one for a long-chain flavodoxin and one for a hypothetical protein conserved in Crocosphaera sp. (CwatDRAFT_1967). This non-coding region shows an increased AT content of 73.5% (Figure 6). Its presence is the result of extensive genome modification that has led to the deletion of multiple genes. Unlike other regions where genes have been deleted (e.g. cyl0012), this modification is not accompanied by deletion of the genomic region itself. These regions might be the outcome of multiple mutations that led to the loss of genes, now no longer recognisable as pseudogenes, while the genomic locus was preserved and might subsequently be eliminated [22]. Because the size of the whole spheroid body genome has not yet been determined experimentally, it is not known whether these modifications have led to an overall decrease in genome size. As described in more detail in the Discussion, we used a bioinformatic model to predict the size of the whole spheroid body genome, which is estimated to be approximately 2.6 Mb.
Table 1 note: Genome size, AT content and nucleotide composition of each codon position are indicated (References: [7,9,12,22,38-40]). n.d.: not determined.
To investigate whether intact copies of missing or pseudogenised genes (e.g. resulting from genome rearrangement or gene duplication events) are present elsewhere in the spheroid body's genome, we performed PCR analysis with specific or degenerate primers for the cyanobacterial genes fdxN, petJ, psbC, cyl0012 and cyl0017. As shown in Figure 7, no products were amplified using either spheroid body or R. gibba DNA as template, suggesting that the identified (pseudo)genes do not have functional counterparts encoded elsewhere in the endosymbiont's genome.
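Degenerate primers encode alternative bases with IUPAC ambiguity codes. An in silico check of where such a primer could anneal exactly can be sketched by expanding each code into a regular-expression character class. This is an illustrative aid only: the primer sequences used in the study are not given here, the example primer below is invented, only perfect matches are reported, and a computational match does not replace PCR.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(template: str, primer: str) -> List[Tuple[str, int]]:
    """Exact matches of a degenerate primer on both strands as (strand, 0-based start)."""
    template = template.upper()
    hits: List[Tuple[str, int]] = []
    for strand, probe in (("+", primer.upper()), ("-", reverse_complement(primer))):
        body = "".join(f"[{IUPAC[base]}]" for base in probe)
        # A zero-width lookahead lets overlapping matches be reported.
        pattern = re.compile(f"(?={body})")
        hits.extend((strand, m.start()) for m in pattern.finditer(template))
    return hits


if __name__ == "__main__":
    primer = "ATGRY"  # invented primer for illustration only
    print(binding_sites("CCATGACGG", primer))  # [('+', 2)]
    print(binding_sites("GTCATCC", primer))    # [('-', 0)]
```

For reverse-strand hits the reported position is where the reverse-complemented primer starts on the given template strand.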

RecA and RecF are encoded by the spheroid body's genome
The proteins RecA and RecF play an important role in recombinational repair of DNA as well as in other repair pathways such as nucleotide excision repair (reviewed in [23]). Studies of insect-bacterium symbioses have shown that gene loss and pseudogene formation are often associated with defects in recA and/or recF [10]. To investigate whether the observed inactivation and deletion of genes in the spheroid body's genome might be explained by defects in bacterial repair systems, we searched the genomes of the spheroid body and of Cyanothece ATCC 51142 for recA and recF. As shown in Figure 8, intact and highly conserved orfs can be identified in both genomes, suggesting that these repair pathways are functional in the R. gibba endosymbiont.

Accumulation of pseudogenes in the spheroid body's genome
As already mentioned, analysis of the contiguous spheroid body genome fragment (Figure 1) readily identified several pseudogenes, among them a mutated ferredoxin gene (fdxN*) within the nif operon as well as other genes downstream of the nif gene region. Additional pseudogenes for conserved cyanobacterial proteins were identified when screening 140,000 bp of non-contiguous genome sequence from the spheroid bodies. These were found either via BLAST homology searches (minimum e-value: 5e-10) or by analysis of spheroid body genome regions corresponding to operons conserved in other cyanobacteria. Regions with homologues in other species that were split into two fragments by a stop codon or frameshift, or that were simply truncated, were considered pseudogenes if the region of similarity was less than half the length of the homologue [12]. Deleterious mutations leading to pseudogenes were found in genes encoding proteins involved in cellular processes such as cell wall biosynthesis and transposon-controlled genome rearrangement (summarised in Table 2). In Cyanothece sp. ATCC 51142, only one pseudogene was identified within the genome fragment containing the nif operon (Table 2). In contrast, six pseudogenes were found in the spheroid body's genome fragment. No pseudogenes were identified in the genome of Cyanothece sp. strain CCY0110, a very close relative of Cyanothece sp. ATCC 51142.
Figure 5. Spheroid body orf sbl0010 encodes a fusion protein derived from homologues of the Cyanothece sp. ATCC 51142 Cyl0018 and Cyl0020 proteins. A. Alignment of predicted amino acid sequences for spheroid body protein Sbl0010 and Cyanothece proteins Cyl0018 and Cyl0020. The deletion of cyl0019, which created sbl0010, can be inferred to have occurred in the endosymbiont genome during reductive genome evolution. In Sbl0010, the homologues of Cyl0018 and Cyl0020 are conserved in full length and are separated by 17 amino acid residues. Cyl0018: green, Cyl0020: orange, Sbl0010: black. B. Cyl0018 and Cyl0020 are highly conserved in cyanobacteria closely related to the spheroid body. They are separated by 1-5 genes when they co-occur at the same locus, but in some cases they are encoded at different loci of the genome (indicated by x).
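The pseudogene criterion stated in this section can be written as a small decision function. This sketch follows a literal reading of the text, in which the half-length condition applies both to regions split by a stop codon or frameshift and to truncated regions; the data class and field names are hypothetical, and the inputs are assumed to come from a prior BLAST comparison.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 5e-10  # BLAST e-value threshold quoted in the text
MAX_COVERAGE = 0.5      # similar region must be shorter than half the homologue


@dataclass
class CandidateRegion:
    name: str              # label for the genomic region (hypothetical)
    evalue: float          # e-value of the best BLAST hit to a homologue
    similar_len_aa: int    # length of the region similar to the homologue
    homologue_len_aa: int  # full length of the homologous protein
    disrupted: bool        # split by a stop codon or frameshift
    truncated: bool        # truncated relative to the homologue


def is_pseudogene(region: CandidateRegion) -> bool:
    """Literal reading of the criterion: a homologue hit that is disrupted or truncated
    and whose similar region covers less than half of the homologue."""
    if region.evalue > E_VALUE_CUTOFF:
        return False  # hit too weak to count as a homologue
    if not (region.disrupted or region.truncated):
        return False  # intact reading frame
    return region.similar_len_aa < MAX_COVERAGE * region.homologue_len_aa


if __name__ == "__main__":
    print(is_pseudogene(CandidateRegion("orfX", 1e-20, 40, 100, True, False)))  # True
```

If the half-length condition is meant to apply only to truncated regions, the final check would be restricted to that case.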

Discussion
An intriguing example of an obligate intracellular symbiotic interaction is the cyanobacterium-diatom symbiosis found in Rhopalodia gibba [18]. Here the symbiont (spheroid body) can fix nitrogen for its eukaryotic host, and we have hypothesised that this capacity has been a driving force for establishing the intracellular endosymbiotic relationship [17]. The spheroid body of Rhopalodia gibba provides an opportunity to investigate changes in endosymbiont physiology and genome evolution during adaptation of a symbiont to an intracellular environment.
Previous studies have reported changes in the genomes of bacteria following the development of symbiotic relationships. In bacteria thought to have become symbiotic recently or transiently, changes include the occurrence of multiple transposable elements and deletion of important components of recombinational DNA repair mechanisms [1]. In longer-established symbiotic and parasitic eukaryote-bacterium interactions, significant gene losses have been observed, accompanied by reduction of genome size and the generation of AT-rich genomes [3,6,24]. The changes that have occurred in the spheroid body's genome cannot be categorised as an obvious example of either type of relationship. For example, the spheroid body's genome encodes several transposase genes, all with disrupted reading frames, indicating that these are pseudogenes. This finding is consistent with a stable diatom-spheroid body endosymbiosis and a long-term host-endosymbiont interaction, which can be traced back to the Miocene [25].
Contrasting with the occurrence of transposase pseudogenes is evidence suggesting a functional DNA repair system in the spheroid body, a finding more consistent with a relatively young endosymbiotic relationship. In nearly all intracellular bacteria studied to date, at least one of the genes encoding the DNA repair proteins RecA and RecF has been eliminated; this is thought to facilitate restructuring of the symbiont genome (for the exception see [12]). In the spheroid body's genome, both recA and recF are present and have intact open reading frames. Thus, the genome modifications that we report for the spheroid body have all occurred against the background of a presumably intact DNA repair system. These modifications suggest that the selective pressure on certain genes changed upon establishment of the interaction, and the challenge is to understand their potential relevance for functions that are necessary or redundant in an obligate endosymbiotic relationship. For example, gene truncations such as that detected in nifU could remove gene regions that are dispensable for diazotrophic growth [26], and such deletions might be an early event in genome reduction of the symbiont. A subsequent, or perhaps parallel, step would be the inactivation of genes whose products are no longer needed in the initial symbiotic association. Several scenarios for this can be hypothesised: inactivation of genes by deleterious mutations, resulting in the accumulation of pseudogenes, or loss of genome fragments through deletion of larger DNA portions via rearrangements [27]. Another hypothesis posits a "domino effect", in which initial pseudogenisation triggers subsequent large-scale gene loss [28]. In this scenario, random pseudogenisation might inactivate a pathway through mutation of a single essential factor, followed by large-scale deletion of other genes involved in that pathway. In each case, the selective pressure would differ for genes coding for different functions, and loss would depend on whether the function could be compensated by other genes in the endosymbiont or host cell genome. In the latter case, as in highly adapted interactions, signal-dependent transport of the protein from the host cytoplasm into the endosymbiont would be necessary.
Table 1. A/T-G/C frequencies in Cyanothece sp. ATCC 51142 and spheroid body genome fragments
We detected several examples of coding regions disrupted by mutations (Table 2), in which the original gene is still detectable by analysis of all three possible reading frames. These include the pseudogenisation of fdxN (fdxN*), a gene that has been found to be non-essential for nitrogen fixation in Anabaena variabilis [20], and several other genes on the spheroid body genome fragments that we have sequenced (Table 2, Figure 1). Such observations provide further evidence that pseudogenisation of genes that are non-essential for an endosymbiotic lifestyle is an important feature in the early reductive genome evolution of obligate intracellular cyanobacteria. Gene loss through independent DNA deletion events could also be inferred from comparative analysis of the spheroid body's genome fragment, among them the deletion of genes conserved in diverse cyanobacterial lineages (cyl0012, cyl0019). Because the immediate DNA region has been eliminated, these modifications have led to a localised increase in gene density.
In one extreme case, deletion has produced a fusion of non-adjacent genes in the endosymbiont's genome (sbl0010). In other cases of gene deletion, genes have been removed and replaced with non-coding sequence that is much higher in AT content than the coding regions (Figure 6). It is unclear whether this difference in composition reflects a shift in substitutional bias favouring A and T residues and/or whether an existing bias becomes more apparent in de novo regions that are under reduced structural/functional constraints. In either event, the existence of these AT-rich non-coding regions suggests that pseudogenisation and DNA deletion are not inevitably linked events in a sequential process of degenerative genome evolution in spheroid bodies. However, non-coding regions are rare in the genomes of free-living bacteria. Since DNA can be introduced into prokaryotic genomes in several ways, their compactness is maintained by the deletion of harmful DNA. Given the intracellular existence of spheroid bodies, their genome may be less exposed, and less susceptible, to the introduction of foreign DNA through horizontal gene transfer and lysogenic bacteriophages than the genomes of free-living bacteria. If so, processes excluding non-coding DNA and pseudogenes from the spheroid body's genome may well be less efficient than those operating in free-living bacteria. Such a hypothesis might help explain the greater extent of non-coding DNA and pseudogenes in the spheroid body's genome. Increased mutation rates, thought to be associated with reductive genome evolution, would also contribute to the accumulation of these genome features [29]. The genome modifications observed in the spheroid body are in some respects comparable to those of Sodalis glossinidius.
A large fraction (49%) of the Sodalis genome is composed of non-coding DNA that has accompanied reductive genome evolution. Moreover, the Sodalis chromosome contains many unusual pseudogenes [12]. The spheroid body's genome differs from that of Sodalis in its generally higher AT content.
Table 2 note: Pseudogenes within the nif operon and downstream regions (genome regions shown in Figure 1) are indicated in grey.
Figure 7. PCR analysis of missing or pseudogenised genes.
The diverse features of reductive genome evolution in obligate intracellular symbionts (and pathogens) include a significant reduction of overall genome size. However, experimental determination of the spheroid body's genome size using standard molecular techniques is difficult because of the extreme stability of the host-spheroid body interaction and the limited amount of intact, purified endosymbionts that can be obtained from R. gibba. A recent study on the dynamics of reductive evolution inferred exponential relationships between genome size and SSU rDNA GC content in mitochondria, free-living bacteria and obligate intracellular bacteria [30]. Based on the model these authors propose, and using previously published 16S sequence data [18], we estimate the genome size of spheroid bodies to be approximately 2.6 Mb. The genome of free-living Cyanothece sp. CCY0110 is 5.8 Mb. Hence, if our estimate is accurate, reduction has produced a genome currently similar in size to that of Synechococcus (2.2-2.6 Mb), which may indicate that the endosymbiosis is still at an early stage of development.
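The genome size estimate rests on an exponential relationship between genome size and SSU rDNA GC content [30]. How such a relationship can be fitted and then used for prediction is sketched below. The functional form, fitting procedure and parameter values of [30] are not reproduced here: the sketch assumes the form size = a·exp(b·GC), fits it by ordinary least squares on the log scale (which is not identical to nonlinear least squares on the original scale), and is demonstrated on synthetic points only. Applying it would require the reference dataset of [30] and the spheroid body 16S sequence [18].

```python
import math
from typing import Sequence, Tuple


def fit_exponential(x: Sequence[float], size: Sequence[float]) -> Tuple[float, float]:
    """Fit size = a * exp(b * x) by ordinary least squares on ln(size); return (a, b)."""
    n = len(x)
    if n != len(size) or n < 2:
        raise ValueError("need at least two paired observations")
    ys = [math.log(s) for s in size]
    mean_x = sum(x) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_size(a: float, b: float, x: float) -> float:
    """Genome size predicted by the fitted exponential model at GC content x."""
    return a * math.exp(b * x)


if __name__ == "__main__":
    # Synthetic points generated from a = 0.01, b = 0.1 (arbitrary values, not real genomes).
    gc = [40.0, 45.0, 50.0, 55.0]
    sizes = [0.01 * math.exp(0.1 * g) for g in gc]
    a, b = fit_exponential(gc, sizes)
    print(round(a, 6), round(b, 6))  # 0.01 0.1
```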
Our comparative analyses of spheroid body fosmid sequences indicate that the photosynthetic genes psbC and psbD have been inactivated by mutation in the endosymbiont genome. Their gene products are essential factors in the photosynthetic light reactions of photosystem II [31]. According to the "domino effect" hypothesis [28], initial inactivation of components such as PsbC and PsbD is expected to lead to mass deletion of other genes involved in the photosynthetic light reactions. Consistent with this prediction, additional photosynthetic factors that occur in closely related cyanobacteria are either absent (e.g. the cytochrome c6 precursor PetJ and the plastocyanin precursor PetE) or present only as non-functional pseudogenes (e.g. the flavodoxin fldA*) in the spheroid body's genome.
Aside from gene loss resulting from reductive genome evolution, the absence of certain genes within the analysed genome region could also be explained by gene duplications or rearrangements. Without the complete sequence of the spheroid body's genome, we cannot exclude the possibility that, following duplication, pseudogenisation affected copies of some genes within the analysed region while functional copies were retained elsewhere. However, the phenotypic loss of photosynthetic pigmentation suggests complete loss of at least one essential photosynthetic factor from the endosymbiont's genome. In addition, PCR analysis did not detect intact psbC and psbD genes anywhere else in the spheroid body's genome (Figure 7).
The diverse modifications in the analysed spheroid body genome fragment are not evenly distributed over the whole sequence but accumulate downstream of the conserved nif gene region (Figures 1 and 5). This skewed distribution of degenerative modifications possibly reflects purifying selection acting on the nif gene region during the molecular adaptation process [32]. Apart from the mutation of fdxN*, whose product is unimportant for nitrogen fixation, and the truncation of nifU, all proteins for nitrogen fixation are conserved in the region without signs of degenerative genome evolution. This conservation of nif genes is consistent with the hypothesis that fixation of molecular nitrogen has been an important driving force for the endosymbiotic interaction.
It can be expected that endosymbiont and host biochemistry will change as the symbiotic interaction develops. Genes whose products become superfluous for symbiont-host coexistence are expected targets for mutation. At early stages of the accumulation of deleterious mutations, homologues will still be identifiable by BLAST homology searches. Table 2 lists many pseudogenes that may fit this category.

Conclusion
A diverse range of genetic modifications has occurred in the genome of R. gibba spheroid bodies; these would compromise the ability of the endosymbiont to exist as a free-living cyanobacterium, thereby confirming its suspected obligate status. Our findings provide insight into the genome evolution of a nitrogen-fixing endosymbiotic cyanobacterium living within a unicellular eukaryotic host. They are of special importance because past inferences about processes of reductive genome evolution have mainly been based on the study of insect-bacterium interactions. In these, the symbionts reside within specialised cells or organs, and thus their genomes may have been subject to selection pressures different from those acting on the genomes of intracellular endosymbionts of unicellular host organisms. Further analysis of the whole spheroid body genome and comparison with free-living cyanobacteria will provide important additional information on the age of the interaction and the importance of different molecular processes and genetic modifications.
Since the spheroid body is derived from cyanobacterium-like ancestors, the interaction could also serve as a useful model system for understanding early events in the evolution of chloroplast genomes.
between 1000 and 1500 bp were isolated. The fragments were ligated into the pEZSeq™ vector (Lucigen) as described by the manufacturer and used to transform E. coli XL1-Blue MRF' cells. Shotgun plasmids were sequenced by cycle sequencing with 700 and 800 nm fluorescently labelled oligonucleotides on the LICOR™ sequencing system. 5'- and 3'-end sequencing of positive fosmid clones was performed using the primers M13 (700) CYrecF and SBrecF were used as positive controls, with the same primers used for the fosmid screening experiments. PCR reactions were performed using standard procedures.

Sequence homology determinations and annotation
We assembled, finished and annotated sequences using the Sequencher [34] and Sequin software to organise the data and facilitate annotation. Orfs were identified using BLAST analyses. Genome fragments were compared using BLAST analysis and the GATA tool for comparative sequence analysis [35]. Pseudogenes with one or more mutations were identified by BLAST searches or direct analysis of all open reading frames. The genome fragments of Cyanothece sp. ATCC 51142 and R. gibba spheroid bodies are annotated in GenBank under the accession numbers AY728386 and AY728387, respectively. Genes and orfs identified in both organisms were named according to the names of their BLASTp protein homologues. Orfs with homology to uncharacterised conserved hypothetical proteins and orfs without any BLAST hit are numbered and referred to as conserved hypothetical proteins and hypothetical proteins, respectively. Orfs oriented in the forward or reverse direction of the analysed fragment are named cyl or cyr for