Previously, we reported that recA was probably lost at an early stage of RGE in Calyptogena clam symbionts. The present study showed that some of the extant clam symbionts still have an intact recA (Figure 2). We hypothesized that, in the early phase of RGE in the clam symbionts before recA was lost, large deletions occurred through RecA-dependent recombination. This type of deletion requires repeated sequences longer than 200 bp, which have been depleted from the genomes of Rma and Vok [8, 19]. It is still unclear whether the genomes of the Calyptogena clam symbionts that retain recA contain large (> 200 bp) repeated sequences. The presence of an intact or nearly intact recA, and of mutY, in clade II symbionts other than Rma suggests that the genomes of clade II symbionts are larger than those of clade I symbionts and that their RGE is at an earlier stage. Resolving these questions will require sequencing and analyzing their genomes.
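Whether a genome still contains direct repeats long enough to serve as substrates for RecA-dependent deletion can be checked with a simple scan. The sketch below is a minimal illustration, not the method used in this study. It reports the first exact direct repeat of at least `min_len` bases (200 bp by default, following the threshold above), relying on the fact that any such repeat contains a duplicated `min_len`-mer. The function name is hypothetical. The sketch ignores inverted repeats and near-identical copies and keeps every k-mer in memory, so dedicated repeat-finding tools would be preferable for genome-scale work.

```python
from typing import Optional, Tuple


def find_direct_repeat(seq: str, min_len: int = 200) -> Optional[Tuple[int, int]]:
    """Return 0-based start positions (first, second) of an exact direct repeat
    of at least min_len bases, or None if there is none.

    Any exact direct repeat of length >= min_len contains a min_len-mer that
    occurs twice, so scanning for duplicated min_len-mers is sufficient.
    """
    seq = seq.upper()
    first_seen: dict[str, int] = {}  # k-mer -> position of its first occurrence
    for i in range(len(seq) - min_len + 1):
        kmer = seq[i:i + min_len]
        if kmer in first_seen:
            return first_seen[kmer], i
        first_seen[kmer] = i
    return None


# Toy example (hypothetical sequence): "ACGT" occurs at positions 0 and 6.
print(find_direct_repeat("ACGTTTACGT", min_len=4))  # (0, 6)
```

Only direct repeats are scanned because recombination between direct repeats is what removes the intervening segment; recombination between inverted repeats would instead invert it.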
The coding region of recA was mostly deleted in Rma and in clade I symbionts (Figure 2A and 2B). A similar large deletion was found in each of the recA amplicons of clade I symbionts (Figure 2B). This indicates that the shared part of these deletions arose in the common ancestor of clade I symbionts after its divergence from the ancestor of clade II symbionts [arrowhead (6) in Figure 1]. Although both Rma and clade I symbionts lack recA, the phylogenetic tree strongly suggests that the gene was lost independently in the ancestral Rma lineage and in the common ancestor of clade I symbionts (Figure 1).
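The reasoning that the two recA losses were independent can be made explicit. If a lost gene is never regained, the minimum number of loss events equals the number of maximal subtrees whose leaves all lack the gene. The sketch below uses a deliberately simplified, hypothetical topology. Node names such as `cladeI` and `II_b` are placeholders and do not reproduce the taxa or branching order of Figure 1. The tree encodes only the feature that matters here: Rma is nested within clade II.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # internal node -> child nodes; leaves have no entry


def leaves_under(node: str, children: Tree) -> Set[str]:
    """Return the set of leaf names descending from node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    leaves: Set[str] = set()
    for kid in kids:
        leaves |= leaves_under(kid, children)
    return leaves


def minimal_losses(node: str, children: Tree, lacking: Set[str]) -> List[str]:
    """Return the roots of the maximal subtrees whose leaves all lack the gene.

    Under irreversible loss, each returned subtree needs exactly one loss on the
    branch leading to it, and no smaller set of loss events fits the data.
    """
    if leaves_under(node, children) <= lacking:
        return [node]
    losses: List[str] = []
    for kid in children.get(node, []):
        losses.extend(minimal_losses(kid, children, lacking))
    return losses


# Hypothetical, simplified topology: Rma sits inside clade II.
toy_tree: Tree = {
    "root": ["cladeI", "cladeII"],
    "cladeI": ["I_a", "I_b"],
    "cladeII": ["Rma", "II_x"],
    "II_x": ["II_b", "II_c"],
}
print(minimal_losses("root", toy_tree, {"I_a", "I_b", "Rma"}))  # ['cladeI', 'Rma']
```

Because Rma and the clade I symbionts do not form a single clade, at least two separate loss events are needed, which matches the conclusion drawn from Figure 1.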
The degradation of the recA ORFs in Rma and in the symbionts of C. stearnsii, C. fausta and C. nautilei indicates that RGE in the extant clade II symbionts of Calyptogena clams is in a transitional stage of recA loss. The loss of recA may begin with degeneration of its ORF by point mutations or by insertions/deletions of a few bases, as in the symbionts of C. fausta, C. nautilei and C. stearnsii (Figure 2; Additional file 1, Figure S1). It may then proceed to larger deletions, such as those in Rma and in clade I symbionts (Figure 2; Additional file 1, Figure S1), generated by successive illegitimate recombination events or replication slippage without RecA [8, 23]. This also suggests that longer (> 200 bp) repeated sequences had already been depleted from the symbiont genomes. As a result, RecA could no longer act as a recombinase generating deletions even before the gene itself was lost.
A three-dimensional (3D) homology model of RecA, built using the crystal structure of E. coli RecA as a template, showed that the 3D structure of RecA in the symbiont of C. phaseoliformis is similar to that of E. coli RecA (Additional files 3 and 4, Figure S3). RecA consists of three domains: the N-terminal domain functions as the monomer-monomer interface, the central domain binds ATP, and the C-terminal domain binds dsDNA. These models suggest that RecA in the symbionts of C. phaseoliformis, C. fossajaponica and C. pacifica is functional. In contrast, the truncated RecA of the C. fausta and C. nautilei symbionts, which retains only the N-terminal 68 amino acids, is nonfunctional (Additional files 3 and 4, Figure S3).
In the symbiont genomes of C. fausta and C. nautilei, recA was truncated in both cases by the same two-base (CC) insertion at the same position in the gene (Additional file 1, Figure S1). Two explanations are possible. The insertion may have occurred once in the common ancestor of the symbionts of C. fausta, C. nautilei and C. pacifica [arrowhead (5) in Figure 1] and later been removed in the C. pacifica symbiont. Alternatively, it may have occurred independently in the two symbiont lineages of C. nautilei and C. fausta [arrowheads (3) and (4) in Figure 1]. If insertions occur at random positions in the genome, the identical two-base insertion is unlikely to have arisen independently at the same position in two different genomes at approximately the same time. This question should be addressed in future studies of their genomes.
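The argument in the preceding paragraph can be quantified under its own simplifying assumption that insertions are equally likely at every site and that the inserted bases are uniformly random. For a genome of L sites and an insertion of k bases, the probability that a second, independent insertion matches the first in both position and sequence is then (1/L)·(1/4)^k. The sketch below uses a hypothetical 1-Mb genome for illustration only. Real insertions are not uniform: slippage-prone sites such as short homopolymer runs make recurrence more likely than this uniform model implies.

```python
def p_identical_independent_insertion(n_sites: int, insert_len: int = 2) -> float:
    """Probability that an independent second insertion of insert_len bases
    lands at the same site as a given first insertion and has the same
    sequence, assuming uniformly random sites and bases (hypothetical model)."""
    if n_sites <= 0 or insert_len <= 0:
        raise ValueError("n_sites and insert_len must be positive")
    p_same_site = 1.0 / n_sites
    p_same_bases = 0.25 ** insert_len  # e.g. "CC" again: (1/4) * (1/4)
    return p_same_site * p_same_bases


# Hypothetical 1-Mb genome and a two-base insertion such as "CC".
print(p_identical_independent_insertion(1_000_000))  # about 6.25e-08
```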
Because none of the insertions/deletions or stop-codon-generating substitutions found in the C. stearnsii symbiont were shared with the symbionts of C. fausta and C. nautilei, the mutations in the C. stearnsii symbiont probably occurred independently in its own lineage [arrowhead (2) in Figure 1].
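Both kinds of ORF degeneration discussed here end translation early: a small insertion/deletion that shifts the reading frame, and a substitution that creates a stop codon. The minimal sketch below counts the codons read before the first in-frame stop codon, using the bacterial stop codons TAA, TAG and TGA. It then shows that a two-base insertion, like the CC insertion described above, can expose a stop codon in the shifted frame. The 18-base sequence is an invented toy example and is not part of any symbiont recA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the bacterial genetic code


def codons_before_stop(cds: str) -> int:
    """Count codons read in frame from the first base before the first stop
    codon (or to the end of the sequence if no stop is found)."""
    cds = cds.upper()
    n_codons = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return n_codons
        n_codons += 1
    return n_codons


def insert_bases(seq: str, pos: int, bases: str) -> str:
    """Insert bases after the first pos nucleotides of seq (0-based cut point)."""
    return seq[:pos] + bases + seq[pos:]


# Invented toy ORF: ATG AAA GTA AGC GGC TAA (5 sense codons, then a stop).
wild_type = "ATGAAAGTAAGCGGCTAA"
# A two-base "CC" insertion after codon 2 shifts the downstream frame:
# ATG AAA CCG TAA ... -> a premature stop after only 3 codons.
mutant = insert_bases(wild_type, 6, "CC")
print(codons_before_stop(wild_type), codons_before_stop(mutant))  # 5 3
```

Any insertion or deletion whose length is not a multiple of three shifts the downstream frame in this way. In a random-like sequence, a stop codon is then usually encountered within a few dozen codons.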
The recA genes of the C. fausta and C. nautilei symbionts also carry additional insertions (Additional file 1, Figure S1). These insertions may have accumulated because selective pressure was relaxed after the CC insertion had already inactivated the gene. Although RecA is important for recombination and for repairing DNA damage such as double-strand breaks, intracellular symbionts tend to lose it. The selective pressure to retain recA probably persisted in the early evolutionary stages of the Calyptogena clam symbionts. However, after large repeated sequences were lost, the selective pressure to retain recA may have decreased.
The present data indicate that recA is currently deteriorating in clade II symbionts. This further supports the hypothesis above that the stage of RGE driven by RecA-dependent deletion is probably ending in these extant genomes.
The DNA repair gene mutY was found in the genomes of clade II symbionts except Rma (Figure 1). In Rma, mutY is split into two ORFs (Figure 3A) by a G-to-A substitution at nucleotide 501 that creates a new stop codon (Additional file 2, Figure S2). The phylogenetic tree indicates that this mutation occurred in the Rma lineage after its divergence from the symbionts of C. phaseoliformis and C. fossajaponica [arrowhead (1) in Figure 1]. MutY consists of an N-terminal and a C-terminal domain (Additional files 5 and 6, Figure S4), and substrate DNA binds in the cleft between these two domains. 3D homology modeling suggested that MutY of the C. phaseoliformis, C. fossajaponica, C. fausta, C. nautilei and C. pacifica symbionts has an intact 3D structure and is functional. In contrast, the two products of the split Rma mutY are nonfunctional (Additional files 5 and 6, Figure S4). Because the split gene still encodes an almost intact amino acid sequence, Rma probably lost mutY function relatively recently.
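The position of the Rma substitution can be mapped onto codons with simple arithmetic. This sketch assumes that nucleotide 501 is numbered from the first base of the mutY start codon in the original reading frame, which is an assumption about the numbering convention. Under that assumption, position 501 falls at the third position of codon 167. The sketch also enumerates which sense codons a G-to-A change at that position could convert into a stop codon. The answer, TGG (Trp) to TGA, is an inference from the genetic code, not a result reported above.

```python
from itertools import product
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_coordinates(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based nucleotide position in a coding sequence (counted from the
    first base of the start codon) to (codon number, position in codon), both 1-based."""
    if nt_pos < 1:
        raise ValueError("nt_pos must be >= 1")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def sense_codons_turned_stop(pos_in_codon: int, old: str, new: str) -> List[str]:
    """List sense codons carrying `old` at pos_in_codon (1-3) that become a
    stop codon when that base is replaced by `new`."""
    hits = []
    for bases in product("ACGT", repeat=3):
        codon = "".join(bases)
        if codon in STOP_CODONS or codon[pos_in_codon - 1] != old:
            continue
        mutated = codon[:pos_in_codon - 1] + new + codon[pos_in_codon:]
        if mutated in STOP_CODONS:
            hits.append(codon)
    return hits


codon_no, pos = codon_coordinates(501)
print(codon_no, pos)                            # 167 3
print(sense_codons_turned_stop(pos, "G", "A"))  # ['TGG']
```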
The G+C content of genomes generally tends to decrease in obligate intracellular symbionts as genome size decreases. MutY is known to repair A·G mismatches to C·G. Without MutY, such mismatches are more likely to become fixed as G·C→T·A transversions, so the loss of mutY is expected to decrease the G+C content [8, 27]. However, many insect intracellular symbionts with low-G+C genomes, such as Buchnera spp., still have mutY. In addition, the recently described, very small genome of the insect symbiont Ca. Hodgkinia cicadicola lacks mutY yet has a high G+C content. These observations appear to contradict the view above and suggest that the loss of mutY does not contribute significantly to the decrease in genomic G+C content. However, in this study, the G+C content of the 16S and 23S rRNA gene sequences was significantly lower in the Calyptogena symbionts without mutY than in those with mutY (Table 4). This supports the hypothesis that the loss of mutY contributes to the decrease in genomic G+C content [20, 27]. The G+C content of Rma was intermediate between those of the two symbiont clades. This is consistent with Rma having lost functional mutY more recently than clade I symbionts. It also agrees with the higher G+C content of the Rma genome (34.0%) compared with that of Vok (31.6%) [16, 17]. Stewart et al. recently reported that the G+C contents of nine genes, including the 16S and 23S rRNA genes, of symbionts in the gigas/kilmeri clade were significantly lower than those of another clade (Additional file 7, Figure S5). The gigas/kilmeri clade corresponds to clade I in the present study, and the other clade corresponds to clade II. Stewart et al. did not establish whether the symbionts in that other clade have mutY. The present results suggest that they do, which would explain why the G+C contents of their genes are higher than those of the gigas/kilmeri clade symbionts.
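The comparison summarized in Table 4 rests on two steps: computing the G+C content of each rRNA gene sequence and testing whether the groups with and without mutY differ. The sketch below shows one way to do both. The test shown is an exact one-sided permutation test on the difference in mean G+C content, chosen here for illustration. It is not necessarily the test used for Table 4, and the input values are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import List


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    n_acgt = sum(seq.count(b) for b in "ACGT")
    if n_acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / n_acgt


def permutation_p_value(higher: List[float], lower: List[float]) -> float:
    """Exact one-sided permutation test of mean(higher) > mean(lower).

    Enumerates every way of relabeling the pooled values into groups of the
    original sizes and returns the fraction whose difference in means is at
    least the observed one.
    """
    pooled = higher + lower
    observed = mean(higher) - mean(lower)
    n_total = n_extreme = 0
    for idx in combinations(range(len(pooled)), len(higher)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        if mean(group_a) - mean(group_b) >= observed - 1e-12:  # tolerate rounding
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical G+C values (%) for symbionts with and without mutY.
with_mutY = [40.0, 41.0, 42.0]
without_mutY = [30.0, 31.0, 32.0]
print(permutation_p_value(with_mutY, without_mutY))  # 0.05 (1 of 20 relabelings)
```

With only three sequences per group, the smallest attainable p-value is 1/20. Small samples therefore limit how strong such a comparison can be, whatever test is used.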
It has recently been shown that mutation in bacteria is generally biased toward GC→AT, and that this bias may be counterbalanced by biased gene conversion and natural selection that maintain G+C content [29–31]. In intracellular symbionts, several factors may contribute to a lower G+C content [4, 31, 32]. These include relaxed natural selection, a lower recombination frequency, small effective population size, codon usage, nucleotide availability in the cytoplasmic pool, and the loss of DNA repair genes. In addition to the loss of mutY, any of these factors may have contributed to the greater reduction of G+C content in clade I symbionts than in clade II symbionts. This remains to be investigated.
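The balance described above can be illustrated with a standard two-rate mutation-pressure model. In this model, u is the per-site rate of GC→AT mutation and v is the rate of AT→GC mutation. The G+C fraction then changes as dGC/dt = v(1 - GC) - u·GC and settles at the equilibrium GC* = v/(u + v). Any change that raises u lowers the equilibrium G+C content in this neutral sketch. One such change would be losing a repair enzyme that prevents G·C→T·A transversions. The rates below are hypothetical. The model deliberately omits selection and biased gene conversion, the forces noted above as counterweights to the mutational bias.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium G+C fraction under mutation pressure alone.

    u: per-site rate of GC->AT mutation; v: per-site rate of AT->GC mutation.
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


def simulate_gc(gc0: float, u: float, v: float, steps: int) -> float:
    """Iterate GC <- GC + v*(1 - GC) - u*GC for a number of discrete steps."""
    gc = gc0
    for _ in range(steps):
        gc += v * (1.0 - gc) - u * gc
    return gc


# Hypothetical rates: raising u from 0.02 to 0.03 (e.g. after losing a repair gene).
print(round(equilibrium_gc(0.02, 0.01), 4))          # 0.3333
print(round(equilibrium_gc(0.03, 0.01), 4))          # 0.25
print(round(simulate_gc(0.5, 0.02, 0.01, 2000), 4))  # converges to 0.3333
```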
The present phylogenetic tree shows that both mutY and recA have been lost in Rma and in clade I symbionts (Figure 1). Were these losses in clade I symbionts and Rma coincidental, or are they related? The loss of recA may increase the mutation rate of the genome and thereby raise the probability of losing other genes such as mutY. It is also noteworthy that the Rma branch is longer than the other branches in the clade II lineage. In addition, the branch from the node joining clades I and II (* in Figure 1) to the node of clade I radiation (*** in Figure 1) is longer than the branch to the node of clade II radiation (** in Figure 1). Thus, the recA losses that occurred independently in Rma and in clade I symbionts may have raised the mutation rate and lengthened these branches. This may in turn have increased the probability of losing other genes, including mutY.
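The link between a rate increase and a longer branch can be written out under a simple clock-like assumption. Under this assumption, the expected number of substitutions per site on a branch is the substitution rate integrated over the branch's duration. Suppose the rate steps from mu0 to mu1 when recA is lost, a time t_loss before the end of a branch of duration T. The expected length is then mu0·(T - t_loss) + mu1·t_loss, which exceeds mu0·T whenever mu1 > mu0. The sketch below evaluates this with hypothetical values. It ignores multiple hits at the same site and rate variation among sites.

```python
def expected_branch_length(mu_before: float, mu_after: float,
                           duration: float, time_since_loss: float) -> float:
    """Expected substitutions per site on a branch whose rate switches from
    mu_before to mu_after at time_since_loss before the end of the branch
    (simple clock-like model; ignores saturation and among-site variation)."""
    if not 0.0 <= time_since_loss <= duration:
        raise ValueError("time_since_loss must lie within the branch duration")
    return mu_before * (duration - time_since_loss) + mu_after * time_since_loss


# Hypothetical numbers: equal branch durations, rate doubled for the last half.
print(expected_branch_length(1.0, 1.0, 100.0, 0.0))   # 100.0 (no loss)
print(expected_branch_length(1.0, 2.0, 100.0, 50.0))  # 150.0 (loss midway)
```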
Once a gene loses its function, the selective pressure on it is relaxed, and mutations are expected to accumulate in it at a higher rate. One additional mutation was found in each of the nonfunctional recA genes of C. fausta and C. nautilei (Additional file 1, Figure S1). Two additional deletions were found in the Rma mutY (Additional file 2, Figure S2). These may result from relaxed selective pressure after the genes lost their function.
Although an evolutionary event such as the loss of a DNA repair or recombination gene may occur spontaneously in a particular lineage, it can strongly affect the subsequent evolutionary fate of that lineage. We previously suggested that the loss of recA probably stabilized the genome architecture of Calyptogena clam symbionts [8, 34]. The present data raise the possibility that the loss of mutY affected the G+C content of the Calyptogena symbiont genomes. The effects of losing DNA recombination and repair genes on RGE will be analyzed by sequencing the genomes of other Calyptogena clam symbionts. This work is in progress and will be published elsewhere.