Evolution of ribonuclease H genes in prokaryotes to avoid inheritance of redundant genes
BMC Evolutionary Biology volume 7, Article number: 128 (2007)
A theoretical model of genetic redundancy has proposed that the fates of redundant genes depend on the degree of functional redundancy, and that functionally redundant genes will not be inherited together. However, no example of actual gene evolution has been reported that can be used to test this model. Here, we analyzed the molecular evolution of the ribonuclease H (RNase H) family in prokaryotes and used the results to examine the implications of functional redundancy for gene evolution.
In prokaryotes, RNase H has been classified into RNase HI, HII, and HIII on the basis of amino acid sequences. Using 353 prokaryotic genomes, we identified the genes encoding the RNase H group and examined combinations of these genes in individual genomes. We found that the RNase H group may have evolved in such a way that the RNase HI and HIII genes do not coexist within a single genome; in other words, these genes are inherited in a mutually exclusive manner. This avoidance of simultaneous inheritance of the RNase HI and HIII genes is particularly prominent when RNase HI contains an additional non-RNase H domain, the double-stranded RNA and RNA-DNA hybrid-binding domain, which is often observed in eukaryotic RNase H1. This evolutionary process may have resulted from functional redundancy of these genes, because the substrate preferences of RNase HI and RNase HIII are similar.
We provide two possible evolutionary models for RNase H genes in which functional redundancy contributes to the exclusion of redundant genes from the genome of a species. This is the first empirical study to show the effect of functional redundancy on changes in gene constitution during the course of evolution.
The science of molecular evolution has paved the way for understanding evolutionary processes in which genetic redundancy (the presence of two or more genes capable of serving the same functional role in an organism) is a major source of genetic novelty and robustness. In fact, a recent analysis of 106 bacterial genomes revealed that a significant number of genetic redundancies have persisted in individual genomes, and systematic gene deletion experiments have demonstrated that approximately 300 out of 4000 genes are indispensable for two bacterial species, Escherichia coli and Bacillus subtilis, suggesting the presence of considerable redundancy in the bacterial genome. However, these findings also raise the question of how genetic redundancy is maintained within a genome, because functionally redundant genes are likely to be eliminated by selective pressure, as shown in a large-scale analysis of protein-protein interactions in Saccharomyces cerevisiae, in which the redundant interactions due to duplicated genes generally do not persist long after gene duplication. Although a theoretical model has been developed to provide insight into the retention of redundant genes, which is hypothesized to depend on their degree of functional redundancy, there have been no empirical studies of gene evolution to support this theoretical model directly. The study described here aimed to substantiate the model of evolution of genetic redundancy on the basis of the analysis of a ribonuclease family that has contributed to our understanding of some aspects of molecular evolution, such as adaptive evolution [6, 7], positive Darwinian selection, and the origin of retroviruses with long terminal repeats (LTR).
Ribonuclease H (RNase H; EC 3.1.26.4), one member of the ribonuclease family, is an enzyme that specifically degrades the RNA moiety of RNA-DNA hybrids. Because various studies have revealed the presence of RNase H in eukaryotes, prokaryotes, and retroviruses, this enzyme is considered to be one of the most widely conserved enzymes. Although the physiological functions of RNase H are not fully understood, this enzyme is thought to play several roles in DNA replication [12–14], DNA repair [15, 16], and RNA transcription [17, 18]. In terms of its medical importance, RNase H activity in retroviruses (including HIV-1) is necessary for their replication, and the enzyme has thus been regarded as one of the drug targets for AIDS chemotherapy. This enzyme is also suggested to be related to the antiviral immune response in humans, because mutations of the RNase H-encoding gene have been found in individuals affected by a human neurological disease, Aicardi-Goutières syndrome. Thus, the experimental data on the secondary structures and enzymatic features of RNase H that have accumulated from many studies of its biological significance provide an opportunity to apply this knowledge in the field of molecular evolution.
Unlike retroviruses, which possess a single RNase H gene, most prokaryotic and eukaryotic genomes contain multiple RNase H genes. According to the nomenclature for these enzymes, prokaryotic RNase H is generally classified into three groups: RNase HI, HII, and HIII. Eukaryotic RNase H is divided into RNase H1 and H2 . Phylogenetic analyses using RNase H sequences have proposed the following classification: Type 1 (prokaryotic RNase HI, eukaryotic RNase H1, and viral RNase H) and Type 2 (prokaryotic RNase HII and HIII, and eukaryotic RNase H2), and it is important to note that no prokaryotic species with a combination of RNase HI and HIII genes has yet been identified . Additionally, in contrast to eukaryotes, which tend to contain both RNase H1 and H2 genes, the combination of RNase H genes in prokaryotes varies among species, and the overall nature of this variation is poorly understood. Therefore, further study is required to clarify the presence or absence of RNase H genes in these species.
We conducted a comparative analysis of the complete genomes of 353 prokaryotes (326 bacteria and 27 archaea) and examined the combination of RNase H genes and the potential evolutionary processes that could explain the effects of functional redundancy on gene evolution, as described by a theoretical model of genetic redundancy . Our findings suggest that the RNase HI and HIII genes have evolved in a mutually exclusive manner owing to their functional similarities. This molecular evolution of RNase H genes is the first actual example of how the degree of functional redundancy has implications for changes in gene constitution during the course of evolution.
Genome-wide identification of RNase H genes
To identify RNase HI, HII, and HIII coding sequences in a genome, two strategies (a remote homology search and a protein domain search) were applied to ensure maximum coverage of the genes (See Methods). Using the complete genomes for 326 strains from 235 bacterial species and 27 strains from 27 archaeal species, we retrieved 342 RNase HI genes, 333 RNase HII genes, and 76 RNase HIII genes (see Additional file 1). Almost all genomes contained one or more RNase H genes, and there was little difference in the types and numbers of RNase H genes among several strains of a given species, with the exception of Buchnera aphidicola and Xanthomonas campestris. The RNase HI-related gene of B. aphidicola str. APS (Acyrthosiphon pisum) contained a frameshift mutation that resulted in a loss of RNase H activity (Dr. Naoto Ohtani, Keio University, personal communication), whereas non-frameshifted RNase HI genes were identified in B. aphidicola str. Bp (Baizongia pistaciae) and B. aphidicola str. Sg (Schizaphis graminum). A frameshift mutation was also found in the RNase HII-related gene of Xanthomonas campestris pv. vesicatoria str. 85-10. In contrast, other strains of X. campestris pv. campestris (str. 8004 and str. ATCC 33913) had a non-frameshifted RNase HII. Therefore, we assumed that B. aphidicola had an RNase HI gene and X. campestris had an RNase HII gene at the species level. Accordingly, we counted the number of RNase H genes in the 27 archaeal and 235 bacterial species listed in Table 1. The RNase HI gene was present in 33% (9/27) of the archaeal species and 89% (210/235) of the bacterial species, the RNase HII gene was present in all archaeal species and in 94% (220/235) of the bacterial species, and the RNase HIII gene was identified in 4% (1/27) of the archaeal species and in 17% (40/235) of the bacterial species. This result is consistent with a previous report that RNase HII is the more universal gene in prokaryotes . 
Most species had a single copy of a given gene, but multiple genes encoding RNase HI were found in 11% (3/27) of the archaeal species and 16% (37/235) of the bacterial species.
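The species-level counting rule used above (a species is credited with a gene if at least one of its strains carries a non-frameshifted copy) can be sketched as follows. This is only an illustration, not the authors' code: the record layout, the function names, and the boolean "intact" flag are assumptions, and the toy records merely restate the B. aphidicola and X. campestris cases described in the text.

```python
from collections import defaultdict

# Hypothetical strain-level calls: (species, strain, gene, is_intact).
# is_intact is False when the copy found in that strain is frameshifted.
strain_calls = [
    ("Buchnera aphidicola", "APS", "HI", False),
    ("Buchnera aphidicola", "Bp", "HI", True),
    ("Buchnera aphidicola", "Sg", "HI", True),
    ("Xanthomonas campestris", "85-10", "HII", False),
    ("Xanthomonas campestris", "8004", "HII", True),
    ("Xanthomonas campestris", "ATCC 33913", "HII", True),
]

def species_level_genes(calls):
    """Return {species: set of genes with an intact copy in at least one strain}."""
    present = defaultdict(set)
    for species, _strain, gene, is_intact in calls:
        present[species]  # register the species even if no copy is intact
        if is_intact:
            present[species].add(gene)
    return dict(present)

def percent_with_gene(species_genes, gene):
    """Percentage of species carrying the gene, rounded to a whole number."""
    n = sum(1 for genes in species_genes.values() if gene in genes)
    return round(100 * n / len(species_genes))

genes_by_species = species_level_genes(strain_calls)
```

Applied to a full table of strain calls, percent_with_gene would give the kind of per-gene percentages reported above (for example, 210 of 235 bacterial species rounds to 89%).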
Alteration of RNase H combinations in closely related species
Contrary to the situation in eukaryotes, in which both RNase H1 and RNase H2 tend to coexist, various combinations of RNase H genes have been found in prokaryotes . To compare the presence and absence of RNase H genes among prokaryotes, we examined the combinations of RNase H genes in individual genomes. Three types of RNase H genes can theoretically produce eight combinations of genes, as shown in the Venn diagram in Figure 1. Because, in practice, we found no species that lacked all three genes (Group H), all species were classified on the basis of the remaining seven RNase H combinations (Table 2). No prokaryotic genome contained the combination of only RNase HI and HIII (Group D); this supports the results of a previous study  at the genome-wide level. Although many archaeal species contained only the RNase HII gene (Group F) or a combination of RNase HI and HII genes (Group B) – a finding in agreement with previous reports [22, 23] – one of the euryarchaeota, Methanosphaera stadtmanae DSM 3091, combined RNase HII with RNase HIII (Group C) instead. On the other hand, 189 of the 235 bacterial species (80%) had combinations of the RNase HI and HII genes (Group B) and 16 of the 235 species (7%) had combinations of the RNase HII and HIII genes (Group C). At the same time, the RNase H combinations in bacteria exhibited more variety than those in the archaea and seemed to differ even among related species, especially in the firmicutes. Interestingly, species that had all three RNase H genes (Group A) were limited to the firmicutes.
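The eight theoretical combinations can be written out as a small lookup table. The sketch below is illustrative only; the group letters A, B, C, D, F, G, and H follow the definitions given in the text, whereas Group E (RNase HI only) is inferred by elimination and should be treated as an assumption. The function name classify is hypothetical.

```python
# Keys are (has_HI, has_HII, has_HIII). Letters follow the text, except
# Group E (RNase HI only), which is inferred by elimination (assumption).
GROUPS = {
    (True, True, True): "A",     # all three genes
    (True, True, False): "B",    # RNase HI and HII
    (False, True, True): "C",    # RNase HII and HIII
    (True, False, True): "D",    # RNase HI and HIII only (not observed)
    (True, False, False): "E",   # RNase HI only (assumed letter)
    (False, True, False): "F",   # RNase HII only
    (False, False, True): "G",   # RNase HIII only
    (False, False, False): "H",  # no RNase H gene (not observed)
}

def classify(genes):
    """Map a set of gene names, e.g. {"HI", "HII"}, to its group letter."""
    key = ("HI" in genes, "HII" in genes, "HIII" in genes)
    return GROUPS[key]
```

For instance, classify({"HII", "HIII"}) returns "C", the group assigned to Methanosphaera stadtmanae in the text.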
To elucidate the relationship between RNase HI and HIII, the evolutionary genomic constitution of the RNase H genes was examined in 49 species of firmicutes, because RNase HIII is especially common in this group (Groups A, C, D, and G in Table 2). First, we constructed a Bayesian tree based on the nucleotide sequences of the DNA gyrase subunit B (gyrB) genes of the firmicutes, which have been used to infer phylogenetic relationships among prokaryotes, and displayed the RNase H combinations of each species (Figure 2). The results of our phylogenetic analysis indicate that RNase H combinations differed even among closely related species. For example, the species in the mollicutes were classified into Groups B (RNase HI and HII), C (RNase HII and HIII), and G (only RNase HIII), showing that the RNase HIII gene is not found in the mollicutes that retain the RNase HI gene. In addition, species with all three RNase H genes (Group A) were found only in the bacillales and lactobacillales; this combination was not found in any species outside the firmicutes (see Table 2).
More noteworthy is the fact that the RNase HI genes of the species that also have RNase HII genes (Group B) often encode additional conserved protein domains, as represented by the presence of Group B' in Figure 2. This non-RNase H domain was first identified in the N-terminal portion of eukaryotic RNase H1  and was designated as a double-stranded RNA (dsRNA) and an RNA-DNA hybrid-binding domain (dsRHbd) because of its ability to bind to dsRNA as well as RNA-DNA hybrids [26–28]. In prokaryotes, it has been reported that RNases HI of Bacillus halodurans  and of Shewanella sp. SIB1  that have dsRHbd in the N-terminus possess RNase H activity. In contrast, no such domain was identified in RNase HI of the species that had all three types of RNase H (Group A). Interestingly, RNase HI of B. subtilis [REFSEQ: NP_390082], a member of Group A, exhibited neither RNase H activity nor other nuclease activity, even though RNase HII and HIII possess RNase H activity [22, 31]. This may indicate a difference of RNase H activity between RNase HI without dsRHbd (Group A) and RNase HI with dsRHbd (Group B').
To identify differences in the primary and secondary structures between RNase HI without dsRHbd (Group A) and RNase HI with dsRHbd (Group B'), multiple alignments were performed using the amino acid sequences of each RNase HI domain in the bacillales and lactobacillales and the E. coli RNase HI domain. If the species had multiple RNase HI genes, one gene that was more similar to E. coli RNase HI than to any other gene was selected; these are described in Additional file 2. As a result, the RNase HI sequences were divided into three groups (Figure 3). The amino acid sequences of RNase HI in Group A were similar to that of B. subtilis RNase HI, which exhibited no nuclease activity. In contrast, the primary structures of RNase HI with dsRHbd formed two groups: Group B'1, in which the primary structures of lactobacillales RNase HI were similar to that of E. coli RNase HI, whose nuclease activity has been demonstrated , and Group B'2, in which B. halodurans and B. clausii RNase HI had little similarity to other RNase HI but showed RNase H activity . There is also a marked difference in the secondary structure. RNase HI in Group A lacked the basic protrusion handle (alpha-helix 3) involved in substrate binding of E. coli RNase HI [33, 34]. On the other hand, all of the lactobacillales RNase HI with dsRHbd in Group B'1 had the basic protrusion handle. Although the basic protrusion handle is not observed in B. halodurans and B. clausii (Group B'2), it has been proposed that dsRHbd could functionally compensate for this basic protrusion . From the relationship between structural similarity and RNase H activity, it can be inferred that RNase HI with dsRHbd in Group B' exhibits RNase H activity but it is unclear whether RNase HI in Group A exhibits RNase H activity or not, because the archaeal RNase HI of Halobacterium sp. NRC-1  and Sulfolobus tokodaii 7  exhibited weak RNase H activity despite the absence of the basic protrusion handle. 
However, the fact that a double knockout of RNase HII and HIII genes in B. subtilis yields a lethal phenotype  indicates that Group A RNase HI genes encoded in the B. subtilis genome do not have the ability to compensate for functions of RNase HII and HIII. Therefore, our results (that RNase HIII is not present in Group B but is present in Group A) suggest that there is some sort of relationship between protein functions and gene constitutions.
Phylogenetic distribution of dsRHbd sequences
Our results (Table 2) clearly showed that the combination of RNase HI and HIII genes (Group D) was not found in the prokaryotes and that most bacterial species had combinations of RNase HI and HII (Group B) or RNase HII and HIII (Group C). Moreover, the combination of RNase H genes has been altered even among closely related species in such a way that functional RNase HI and HIII genes do not coexist in a single genome; in other words, our results provide evidence that RNase HI and HIII tend to evolve in a mutually exclusive manner. This avoidance of simultaneous inheritance of the RNase HI and HIII genes is particularly prominent when RNase HI contains dsRHbd in the firmicutes, because dsRHbd sequences were found in 15 out of 18 species that combined the RNase HI and HII genes (Group B) and were not found in any of the 15 species that had all three RNase H genes (Group A) (see Figure 2). Therefore, dsRHbd appears to be a key domain in the evolutionary process that has led to the current distribution of RNase H genes. Although the characteristics of dsRHbd, such as its enzymatic features [25, 27] and its secondary structure, have been compared with those of eukaryotic RNase H1, little is known about the number and types of dsRHbd in prokaryotes. Therefore, we searched for dsRHbd sequences in the complete genomes of 326 strains from 235 bacterial species and 27 strains from 27 archaeal species in the same way that we searched for the RNase H sequences (see Methods).
The results revealed that the genomes of 30 bacterial species (one of which had two strains) and 1 archaeal species encoded dsRHbd (Table 3), and that the distribution pattern of dsRHbd in prokaryotes did not appear to be correlated with the phylogenetic pattern. Most dsRHbds are fused with the RNase HI domain, but Lactobacillus delbrueckii has two genes encoding dsRHbd; one is associated with the RNase HI domain and the other is associated with the resolvase domain. In addition, it is interesting that the dsRHbds of Gloeobacter violaceus, Bdellovibrio bacteriovorus, and Myxococcus xanthus were identified in the C-terminus of RNase HI even though many dsRHbds were in the N-terminus, as in the eukaryotes. Multiple alignments of the amino acid sequences of prokaryotic dsRHbds showed that the sequences of dsRHbd located in the C-terminus were similar (Figure 4). The process of dsRHbd acquisition can be inferred from the fact that almost half of the RNase HI with dsRHbd was found in firmicutes that have the abilities to acquire new genes through lateral gene transfer . In addition, RNase HIII genes were not found in any genomes of the 31 species that encoded RNase HI with dsRHbd (Additional file 3), supporting the hypothesis of mutually exclusive evolution of RNase HI and HIII.
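As a rough, back-of-the-envelope illustration of how unlikely a complete absence of overlap would be by chance, one can compute a hypergeometric probability from the species counts reported above (262 species in total, 41 with RNase HIII, 31 with RNase HI carrying dsRHbd). This calculation is not part of the original analysis. It assumes that species are independent draws, which ignores shared ancestry and therefore overstates the strength of the evidence; the function name is hypothetical.

```python
from math import comb

def prob_no_overlap(total, n_a, n_b):
    """Probability that none of n_b randomly chosen species is among the n_a
    species carrying the other trait (hypergeometric, zero overlap)."""
    return comb(total - n_a, n_b) / comb(total, n_b)

# Counts taken from the text; independence of species is an assumption.
total_species = 235 + 27   # bacterial + archaeal species
hiii_species = 40 + 1      # species with RNase HIII
dsrhbd_species = 30 + 1    # species with RNase HI carrying dsRHbd

p = prob_no_overlap(total_species, hiii_species, dsrhbd_species)
```

A phylogenetically informed test would be needed before drawing any statistical conclusion from such a number; the sketch only makes the counting argument explicit.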
Redundant RNase HI genes in a single genome
We also found that 10 of the 31 species listed in Table 3 had multiple RNase HI genes (see Additional file 3). If RNase HI with a dsRHbd gene influences the existence of the RNase HIII gene in a genome, how is the effect exerted on other RNase HI genes? To address this question, we examined the amino acid sequences of RNase HI without dsRHbd in these 10 species. The RNase HI without dsRHbd that were found in five species in the firmicutes and one species in the deltaproteobacteria, with the exception of B. bacteriovorus, were similar in structure (e.g., lacked the basic protrusion) to that of the Group A RNase HI (see Figure 3). On the other hand, the primary structures of RNase HI without dsRHbd in three species of gammaproteobacteria resembled that of E. coli, and there were few differences in their amino acid sequences. Because the primary structures of RNase HI with dsRHbd in the same species in the gammaproteobacteria were also similar to that of E. coli, it is difficult to distinguish redundant RNase HI genes on the basis of their amino acid similarities.
To identify the differences among redundant RNase HI sequences of the gammaproteobacteria (see Additional file 4), we constructed a Bayesian tree based on the nucleotide sequences of the RNase HI domains from 12 species in the gammaproteobacteria (Figure 5). This analysis divided the RNase HI domains into four gene clusters: orthologous RNase HI, including E. coli RNase HI (Group I); RNase HI with dsRHbd (Group II); and other two groups of additional RNase HI (Groups III and IV). Because RNase HI genes in Group I appear to have been inherited by vertical descent from a common ancestor, we defined them as orthologous RNase HI genes. On the other hand, RNase HI genes of Group II to IV seem to have been provided by gene duplication or lateral gene transfer in addition to the original RNase HI genes. Interestingly, orthologous RNase HI was not found in Saccharophagus degradans that contains RNase HI with dsRHbd (Group II). In contrast, Pseudoalteromonas atlantica contains orthologous RNase HI (Group I) instead of RNase HI with dsRHbd, though the presence of Group III RNase HI is common to S. degradans and P. atlantica. In addition, orthologous RNase HI was not found in the genome of Colwellia psychrerythraea, which contains only RNase HI with a dsRHbd gene (Group II). The same statement applies to 21 other prokaryotic species that have only RNase HI with dsRHbd (see Additional file 3). On the other hand, we also found that orthologous RNase HI (Group I) and RNase HI with dsRHbd (Group II) had both been retained in two genomes of Photobacterium profundum and Shewanella denitrificans. These results suggest that RNase HI with dsRHbd may be capable of replacing the original RNase HI. A lineage-specific characterization such as the mapping of gene trees onto species trees using a soft parsimony algorithm  is necessary for more precise analysis of the transition of RNase HI genes during the course of evolution.
Using genome-wide and phylogenetic analyses of RNase H genes, we obtained the following findings: (1) most bacterial species had combinations of RNase HI and HII (80%, 189 out of 235 species) or RNase HII and HIII (7%, 16 out of 235 species); (2) the combination of RNase HI and HIII genes was not found in any species (0% in Group D) unless RNase HII was also present (Group A; 15 species in the firmicutes); (3) the combination of RNase H genes has been altered, even in closely related species, in such a way that the functional RNase HI and HIII genes do not coexist in a single genome; (4) dsRHbd was found in RNase HI in 31 out of 189 species (16%) that contain the RNase HI and HII genes; (5) dsRHbd was not found in the RNase HI of any of the 15 species that contained all three types of RNase H genes; and (6) RNase HI with dsRHbd may have replaced the orthologous RNase HI without dsRHbd in 21 out of 31 species (68%) that have RNase HI with dsRHbd.
To ascertain the cause of the mutually exclusive evolution of RNase HI and RNase HIII, we focused on their enzymatic properties. Previous reports have indicated that RNase HI and HIII digest the RNA moiety of RNA-DNA hybrids such as Okazaki fragments more effectively than is the case for RNase HII , whereas only RNase HII is capable of removing a single ribonucleotide of DNA-RNA-DNA/DNA hybrids such as an RNA that has been misincorporated into DNA [15, 16]. In addition, mutagenesis analyses of B. subtilis RNase H genes have shown that single-gene knockout mutants targeting the RNase HII or HIII genes exhibit normal growth, but that double-knockout mutants for both genes are unable to form viable colonies; this suggests that a functional overlap exists between RNase HII and HIII . On the other hand, the existence of functional redundancy between RNase HI and HII is not clear, although double-knockout mutants of E. coli RNase HI and HII exhibit a temperature-sensitive phenotype (Dr. Mitsuhiro Itaya, Keio University, personal communication). We hypothesize that the functional similarities and differences among the three RNase H genes may explain this evolutionary process, because a theoretical model of genetic redundancy suggests that the fates of redundant genes are likely to depend on the extent of their functional redundancy .
According to computer simulations using a genetic redundancy model , redundant genes do not persist when they are equally effective at performing their functions (Model 1). On the other hand, redundant genes are evolutionarily stable in two situations: when both genes perform the same function but one is less efficient than the other gene (Model 2), and when the main functions of the two genes differ but one of the genes functions similarly to the other gene, but with lower efficiency (Model 3). The insights from this simulation can be applied to the molecular evolution of RNase H genes in prokaryotes. At first glance, the reason why most bacterial species have combinations of RNase HI with RNase HII or RNase HII with RNase HIII can be explained by Model 3; that is, these combinations are evolutionarily stable because both genes in each combination have independent functions but with an unknown degree of functional overlap. Likewise, the combination of RNase HI and HIII genes is evolutionarily unstable owing to their functional redundancy (Model 1), and this may explain why no species has both functional genes in its genome and why the combination of RNase H genes has been altered even in closely related species in such a way that RNase HI and HIII genes will not coexist in a single genome.
It seems that the effect of functional redundancy is more severe for RNase HI with dsRHbd in firmicutes, because RNase HIII was found in all 15 species whose genomes encode RNase HI without dsRHbd but was not found in any of the 31 species containing RNase HI with dsRHbd. Given the distribution pattern of RNase HI with dsRHbd in prokaryotes, we proposed the following process: once RNase HI with dsRHbd is acquired (for example, by lateral gene transfer ), the combination of RNase HI with dsRHbd and RNase HIII may become evolutionarily unstable owing to their functional redundancy, and one of them is subsequently removed during the course of evolution. We propose this evolutionary process as Model A in Figure 6A. In particular, it is interesting that the RNase H combinations of two species that are regarded as the deepest branching organisms are different: Thermotoga maritima has a combination of RNase HI with dsRHbd and RNase HII, whereas Aquifex aeolicus retains a combination of RNase HII and HIII . This may reflect the ancient status of these RNase H combinations in bacteria, and RNase HIII might have been altered along with RNase HI with dsRHbd in T. maritima in accordance with our model. Also, the fact that RNase HIII genes are less abundant than those of RNase HI and HII (Table 1) suggests the possibility that RNase HIII genes have been replaced in genomes by other RNase H genes during the course of evolution.
A similar scenario might have occurred in the case of redundant RNase HI genes, because our findings also suggest that RNase HI with dsRHbd may have replaced the existing RNase HI without dsRHbd in 21 species. As shown in Model B (Figure 6B), once RNase HI with dsRHbd is obtained, it competes with RNase HI without dsRHbd because of their functional redundancy, and one of them is excluded. It is also noteworthy that RNase HI with dsRHbd is encoded as a single-copy gene in prokaryotic genomes (Table 3). Interestingly, the human genome contains one RNase H1 with dsRHbd and at least three pseudogenes related to RNase H1 with dsRHbd. We previously showed that four RNase H1-encoding genes in Caenorhabditis elegans (one encoding RNase H1 with dsRHbd and the other three encoding RNase H1 without dsRHbd) exhibited gene-specific expression patterns during development; however, we also found that most eukaryotic genomes contained a single-copy gene encoding RNase H1 with dsRHbd and that RNase H1 without dsRHbd had rarely been identified in the same genome (data not shown). The functional characteristics of RNase H1 with dsRHbd seem to depend on the eukaryotic species, because disruption of RNase H1 with dsRHbd resulted in a lethal phenotype in fly and mice but in normal growth in yeast and trypanosoma. Nevertheless, both prokaryotic and eukaryotic genomes appear to have a single-copy RNase H1 with dsRHbd, and this tendency might be explained by functional redundancy within individual genomes. In future work, a more detailed analysis of eukaryotic RNase H1 genes will be required to show the effect of functional redundancy on the evolution of redundant genes in eukaryotes.
We also discovered that several RNase HI genes could exist in a single genome (Table 1) and that there are some cases in which orthologous RNase HI is retained in the presence of RNase HI with dsRHbd (Figure 5). This raises the question of how multiple RNase HI genes can be retained in a single genome. This is difficult to explain using the genetic evolution models described in Figure 6, because multiple RNase HI genes should also have the same function and should be subject to the same mechanisms that govern the fate of redundant genes. In the case of duplicated genes, the usual fate of redundant genes is that one is silenced through strong purifying selection after a brief period of relaxed selection. The number of RNase HI genes differed among species even within the same lineages, suggesting that gene duplication or gene transfer might have occurred relatively recently and that redundant genes may have arisen during a period of relaxed selection. An alternative explanation for the presence of multiple RNase HI genes is neofunctionalization or subfunctionalization, both of which have been shown by computer simulation to increase the retention rate of duplicated genes [46, 47]. Although it is not known whether the retention of multiple RNase HI genes resulted from subfunctionalization or neofunctionalization, the example of Streptomyces coelicolor A3(2), which encodes a bifunctional enzyme consisting of an RNase H domain and an acid phosphatase domain, suggests that RNase HI can acquire a new function. In addition to subfunctionalization and neofunctionalization, it is also possible that some of the RNase HI genes identified in this study are in the process of nonfunctionalization (pseudogenization) and can be expected to become pseudogenes.
Indeed, genomic sequences encoding truncated RNase HI domains were found in some species during the genome-wide identification of RNase H genes (data not shown), suggesting the existence of one or more nonfunctionalized RNase HI genes in our dataset. Moreover, even if the coding sequences do not appear to have been nonfunctionalized, the regulatory regions might carry mutations, because duplicated genes are considered to be under active selection pressure owing to energy constraints on gene expression. Further investigations will be necessary to reveal the effect of each type of functionalization on multiple RNase HI genes in prokaryotes.
In this study, two possible models were provided to explain the evolution of RNase H combinations in prokaryotic genomes. We believe that our models are the first example of the effects of functional redundancy on changes in gene constitution during the course of gene evolution. Experimental evolution of bacterial strains engineered to carry mutually exclusive genes may be an effective way to verify our models. For example, RNase HI and RNase HIII genes tagged with different drug-resistance markers could be inserted into RNase HI-knockout mutants of E. coli; repeated subculturing of the recombinants would then allow the mutated RNase H gene to be detected by using the specific drug resistances as markers. This experimental approach would be worthwhile for exploring the fate of redundant RNase H genes in future research.
We identified three genes that encode RNase H enzymes and examined the combinations of these genes in 353 prokaryotic genomes. Our results showed that RNase H combinations might have evolved in such a way that the RNase HI and HIII genes will not be inherited together within an individual genome and that this tendency is prominent when RNase HI contains dsRHbd. This mutually exclusive evolution of RNase H genes seems to be related to functional redundancy, because previous reports have suggested that the substrate preferences of RNase HI and HIII are similar. Taken together, these results suggest possible evolutionary models for the RNase H genes in which functional redundancy contributes to the exclusion of redundant genes. Our findings thus provide a good example of the effects of functional redundancy on gene evolution, confirming certain theoretical predictions.
Genome-wide identification of genes encoding RNase H and dsRHbd
Complete genomes of 326 strains from 235 bacterial species and 27 strains from 27 archaeal species and the corresponding GenBank files were downloaded from the National Center for Biotechnology Information (NCBI) GenBank FTP site ; their accession numbers are summarized in Additional file 1. Two strategies were applied to identify sequences of RNase H and double-stranded RNA and RNA-DNA hybrid-binding domains (dsRHbd) in the complete genomes. One was a remote homology search with the PSI-BLAST software  and the other was a protein domain search based on Hidden Markov Model (HMM) profiles .
For the PSI-BLAST search, a non-redundant peptide sequence database was downloaded from the NCBI BLAST FTP site. From this database, peptide sequences of prokaryotes and eukaryotes were extracted by using taxonomy information obtained from the NCBI Taxonomy FTP site. To construct a position-specific scoring matrix, a PGP-BLAST search was carried out against 3 506 454 extracted peptide sequences, with an E-value threshold of 0.002 and four iterations. The amino acid and nucleotide sequences corresponding to the RNase HI domain of E. coli K12 [GenBank: AAC73319], the RNase HII domain of E. coli K12 [GenBank: AAC73294], the RNase HIII domain of B. subtilis subsp. subtilis str. 168 [Swissprot: P94541], and the dsRHbd of B. halodurans C-125 [Swissprot: Q9KEI9] were used as queries. Using the resulting matrix, PSI-TBLASTN searches were conducted against the 353 complete genomes by using an E-value threshold of 0.2.
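The E-value cutoff applied to the genome searches can be illustrated with a minimal sketch. It assumes the hits were exported in the standard 12-column tab-separated BLAST layout (E-value in the eleventh column); the article does not state the output format, and the function name, query label, and example rows below are hypothetical.

```python
import csv
import io

def filter_hits(tabular_text, max_evalue):
    """Keep hits whose E-value is at or below max_evalue.

    Assumes 12-column tabular BLAST output: query, subject, % identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start,
    s.end, E-value, bit score.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append({
                "query": row[0],
                "subject": row[1],
                "subject_start": int(row[8]),
                "subject_end": int(row[9]),
                "evalue": evalue,
            })
    return hits

# Invented example rows, not results from the study.
EXAMPLE = (
    "rnhA_query\tgenome_A\t45.2\t150\t80\t2\t1\t150\t1200\t1649\t1e-30\t120\n"
    "rnhA_query\tgenome_B\t28.0\t90\t60\t3\t10\t99\t500\t769\t0.15\t32\n"
    "rnhA_query\tgenome_C\t25.0\t40\t30\t0\t5\t44\t10\t129\t0.5\t25\n"
)

kept = filter_hits(EXAMPLE, max_evalue=0.2)
print([hit["subject"] for hit in kept])
```

A permissive cutoff such as 0.2 keeps weak candidates like the second row, which is why the hits still need to be checked against annotated coding sequences afterwards.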
For the HMM profile analysis, the profiles of RNase HI and RNase HII were downloaded from the Sanger Institute's Pfam Web site and the HMM profile of dsRHbd was newly built by using the hmmbuild module of the HMMER 2.3.2 software on the basis of the results of the PSI-BLAST search. The 353 complete genomes were translated into six-frame amino acid sequences. Using these HMM profiles as queries, protein domain searches were performed with the hmmpfam module of the HMMER 2.3.2 software against translated complete genomes with an E-value threshold of 1 × 10⁻⁶.
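Six-frame translation can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes the standard codon assignments (which genetic code table was used is not stated in the article), and the function names are hypothetical.

```python
from itertools import product

# Standard codon assignments, ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translation(seq):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

frames = six_frame_translation("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Stop codons are kept as `*` so that a downstream domain search can still match regions that are unannotated or interrupted.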
On the basis of the outputs of the PSI-BLAST and HMM searches, coding sequences including homologous regions of RNase H or dsRHbd were obtained from GenBank files by using G-language Perl modules. When the search revealed unannotated genomic regions, we manually checked for the existence of an open reading frame (ORF) near the genomic region. In order to distinguish genes encoding RNase HII and RNase HIII in the datasets, a PGP-BLAST search was conducted against the Conserved Domain Database (a subset of domains from SMART, Pfam, COG, and CD) downloaded from the NCBI CDD FTP site.
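Once every identified gene has been assigned to RNase HI, HII, or HIII, the per-genome combinations examined in this study can be tallied. The sketch below shows one way to do that; the genome labels and gene sets are invented for illustration and are not results from the 353 genomes, and the function names are hypothetical.

```python
from collections import Counter

def tally_combinations(genome_to_genes):
    """Count how many genomes carry each distinct set of RNase H genes."""
    return Counter(frozenset(genes) for genes in genome_to_genes.values())

def count_coexistence(genome_to_genes, gene_a, gene_b):
    """Classify genomes by presence or absence of two genes."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for genes in genome_to_genes.values():
        has_a, has_b = gene_a in genes, gene_b in genes
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["only_a"] += 1
        elif has_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts

# Toy input: labels and gene content are invented, not taken from the study.
toy_genomes = {
    "genome_1": {"HI", "HII"},
    "genome_2": {"HI", "HII"},
    "genome_3": {"HII", "HIII"},
    "genome_4": {"HII"},
}

combos = tally_combinations(toy_genomes)
coexistence = count_coexistence(toy_genomes, "HI", "HIII")
print(combos[frozenset({"HI", "HII"})], coexistence)
```

A mutually exclusive pattern of inheritance would show up as a `both` count that is low relative to `only_a` and `only_b`.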
The amino acid and nucleotide sequences of the DNA gyrase subunit B gene (gyrB) were retrieved in a similar way. The CodonAlign 2.0 software (Barry G. Hall, Rochester, NY, USA) was used to align the nucleotide sequences on the basis of alignments of the corresponding amino acid sequences performed with the ClustalW 1.8.3 software. The Modeltest 3.7 software was applied to select an appropriate model from the output of the PAUP* Version 4.0 software by using hierarchical likelihood-ratio tests and the Akaike Information Criterion. Phylogenetic trees were estimated by Bayesian methods with MRBAYES Version 3.1.2 software under the General Time Reversible model with gamma correction and a proportion of invariable sites. In the Bayesian analysis, the Markov chain Monte Carlo search was run for 1 000 000 generations with four chains, with trees sampled every 100 generations; a consensus tree was estimated after discarding the first 2500 trees as burn-in. TreeView software for Power Macintosh was used for viewing and editing the tree.
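The Markov chain Monte Carlo settings above determine how many trees contribute to the consensus. The short calculation below works this out per run; it ignores the starting tree that some programs also record at generation 0, and the function name is hypothetical.

```python
def mcmc_sample_summary(generations, sample_every, burn_in_trees):
    """Summarise how many sampled trees remain after burn-in."""
    sampled = generations // sample_every
    retained = sampled - burn_in_trees
    return {
        "sampled_trees": sampled,
        "retained_trees": retained,
        "burn_in_fraction": burn_in_trees / sampled,
    }

summary = mcmc_sample_summary(
    generations=1_000_000, sample_every=100, burn_in_trees=2500
)
print(summary)
```

Under these assumptions, a burn-in of 2500 trees corresponds to discarding the first quarter of the samples.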
- RNase H: ribonuclease H
- dsRHbd: double-stranded RNA and RNA-DNA hybrid-binding domains
- gyrB: DNA gyrase subunit B gene
- HMM: Hidden Markov Model
- ORF: open reading frame
Gevers D, Vandepoele K, Simillon C, Van de Peer Y: Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 2004, 12 (4): 148-154. 10.1016/j.tim.2004.02.007.
Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006, 2: 2006.0008. 10.1038/msb4100050.
Kobayashi K, Ehrlich SD, Albertini A, Amati G, Andersen KK, Arnaud M, Asai K, Ashikaga S, Aymerich S, Bessieres P, Boland F, Brignell SC, Bron S, Bunai K, Chapuis J, Christiansen LC, Danchin A, Debarbouille M, Dervyn E, Deuerling E, Devine K, Devine SK, Dreesen O, Errington J, Fillinger S, Foster SJ, Fujita Y, Galizzi A, Gardan R, Eschevins C, Fukushima T, Haga K, Harwood CR, Hecker M, Hosoya D, Hullo MF, Kakeshita H, Karamata D, Kasahara Y, Kawamura F, Koga K, Koski P, Kuwana R, Imamura D, Ishimaru M, Ishikawa S, Ishio I, Le Coq D, Masson A, Mauel C, Meima R, Mellado RP, Moir A, Moriya S, Nagakawa E, Nanamiya H, Nakai S, Nygaard P, Ogura M, Ohanan T, O'Reilly M, O'Rourke M, Pragai Z, Pooley HM, Rapoport G, Rawlins JP, Rivas LA, Rivolta C, Sadaie A, Sadaie Y, Sarvas M, Sato T, Saxild HH, Scanlan E, Schumann W, Seegers JF, Sekiguchi J, Sekowska A, Seror SJ, Simon M, Stragier P, Studer R, Takamatsu H, Tanaka T, Takeuchi M, Thomaides HB, Vagner V, van Dijl JM, Watabe K, Wipat A, Yamamoto H, Yamamoto M, Yamamoto Y, Yamane K, Yata K, Yoshida K, Yoshikawa H, Zuber U, Ogasawara N: Essential Bacillus subtilis genes. Proc Natl Acad Sci U S A. 2003, 100 (8): 4678-4683. 10.1073/pnas.0730515100.
Wagner A: The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol. 2001, 18 (7): 1283-1292.
Nowak MA, Boerlijst MC, Cooke J, Smith JM: Evolution of genetic redundancy. Nature. 1997, 388 (6638): 167-171. 10.1038/40618.
Zhang J, Zhang YP, Rosenberg HF: Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey. Nat Genet. 2002, 30 (4): 411-415. 10.1038/ng852.
Zhang J: Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat Genet. 2006, 38 (7): 819-823. 10.1038/ng1812.
Zhang J, Rosenberg HF, Nei M: Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci U S A. 1998, 95 (7): 3708-3713. 10.1073/pnas.95.7.3708.
Malik HS, Eickbush TH: Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 2001, 11 (7): 1187-1197. 10.1101/gr.185101.
Stein H, Hausen P: Enzyme from calf thymus degrading the RNA moiety of DNA-RNA Hybrids: effect on DNA-dependent RNA polymerase. Science. 1969, 166 (3903): 393-395. 10.1126/science.166.3903.393.
Ohtani N, Haruki M, Morikawa M, Kanaya S: Molecular diversities of RNases H. J Biosci Bioeng. 1999, 88 (1): 12-19. 10.1016/S1389-1723(99)80168-6.
Ogawa T, Okazaki T: Function of RNase H in DNA replication revealed by RNase H defective mutants of Escherichia coli. Mol Gen Genet. 1984, 193 (2): 231-237. 10.1007/BF00330673.
Cerritelli SM, Frolova EG, Feng C, Grinberg A, Love PE, Crouch RJ: Failure to produce mitochondrial DNA results in embryonic lethality in Rnaseh1 null mice. Mol Cell. 2003, 11 (3): 807-815. 10.1016/S1097-2765(03)00088-1.
Sato A, Kanai A, Itaya M, Tomita M: Cooperative regulation for Okazaki fragment processing by RNase HII and FEN-1 purified from a hyperthermophilic archaeon, Pyrococcus furiosus. Biochem Biophys Res Commun. 2003, 309 (1): 247-252. 10.1016/j.bbrc.2003.08.003.
Rydberg B, Game J: Excision of misincorporated ribonucleotides in DNA by RNase H (type 2) and FEN-1 in cell-free extracts. Proc Natl Acad Sci U S A. 2002, 99 (26): 16654-16659. 10.1073/pnas.262591699.
Haruki M, Tsunaka Y, Morikawa M, Kanaya S: Cleavage of a DNA-RNA-DNA/DNA chimeric substrate containing a single ribonucleotide at the DNA-RNA junction with prokaryotic RNases HII. FEBS Lett. 2002, 531 (2): 204-208. 10.1016/S0014-5793(02)03503-2.
Drolet M, Phoenix P, Menzel R, Masse E, Liu LF, Crouch RJ: Overexpression of RNase H partially complements the growth defect of an Escherichia coli delta topA mutant: R-loop formation is a major problem in the absence of DNA topoisomerase I. Proc Natl Acad Sci U S A. 1995, 92 (8): 3526-3530. 10.1073/pnas.92.8.3526.
Cheng B, Rui S, Ji C, Gong VW, Van Dyk TK, Drolet M, Tse-Dinh YC: RNase H overproduction allows the expression of stress-induced genes in the absence of topoisomerase I. FEMS Microbiol Lett. 2003, 221 (2): 237-242. 10.1016/S0378-1097(03)00209-X.
Li TK, Barbieri CM, Lin HC, Rabson AB, Yang G, Fan Y, Gaffney BL, Jones RA, Pilch DS: Drug targeting of HIV-1 RNA.DNA hybrid structures: thermodynamics of recognition and impact on reverse transcriptase-mediated ribonuclease H activity and viral replication. Biochemistry. 2004, 43 (30): 9732-9742. 10.1021/bi0497345.
Crow YJ, Leitch A, Hayward BE, Garner A, Parmar R, Griffith E, Ali M, Semple C, Aicardi J, Babul-Hirji R, Baumann C, Baxter P, Bertini E, Chandler KE, Chitayat D, Cau D, Dery C, Fazzi E, Goizet C, King MD, Klepper J, Lacombe D, Lanzi G, Lyall H, Martinez-Frias ML, Mathieu M, McKeown C, Monier A, Oade Y, Quarrell OW, Rittey CD, Rogers RC, Sanchis A, Stephenson JB, Tacke U, Till M, Tolmie JL, Tomlin P, Voit T, Weschke B, Woods CG, Lebon P, Bonthron DT, Ponting CP, Jackson AP: Mutations in genes encoding ribonuclease H2 subunits cause Aicardi-Goutieres syndrome and mimic congenital viral brain infection. Nat Genet. 2006, 38 (8): 910-916. 10.1038/ng1842.
Crouch RJ, Arudchandran A, Cerritelli SM: RNase H1 of Saccharomyces cerevisiae: methods and nomenclature. Methods Enzymol. 2001, 341: 395-413.
Ohtani N, Yanagawa H, Tomita M, Itaya M: Identification of the first archaeal Type 1 RNase H gene from Halobacterium sp. NRC-1: archaeal RNase HI can cleave an RNA-DNA junction. Biochem J. 2004, 381 (Pt 3): 795-802.
Ohtani N, Yanagawa H, Tomita M, Itaya M: Cleavage of double-stranded RNA by RNase HI from a thermoacidophilic archaeon, Sulfolobus tokodaii 7. Nucleic Acids Res. 2004, 32 (19): 5809-5819. 10.1093/nar/gkh917.
Yamamoto S, Harayama S: Phylogenetic analysis of Acinetobacter strains based on the nucleotide sequences of gyrB genes and on the amino acid sequences of their products. Int J Syst Bacteriol. 1996, 46 (2): 506-511.
Mushegian AR, Edskes HK, Koonin EV: Eukaryotic RNAse H shares a conserved domain with caulimovirus proteins that facilitate translation of polycistronic RNA. Nucleic Acids Res. 1994, 22 (20): 4163-4166. 10.1093/nar/22.20.4163.
Gaidamakov SA, Gorshkova, Schuck P, Steinbach PJ, Yamada H, Crouch RJ, Cerritelli SM: Eukaryotic RNases H1 act processively by interactions through the duplex RNA-binding domain. Nucleic Acids Res. 2005, 33 (7): 2166-2175. 10.1093/nar/gki510.
Cerritelli SM, Fedoroff OY, Reid BR, Crouch RJ: A common 40 amino acid motif in eukaryotic RNases H1 and caulimovirus ORF VI proteins binds to duplex RNAs. Nucleic Acids Res. 1998, 26 (7): 1834-1840. 10.1093/nar/26.7.1834.
Cerritelli SM, Crouch RJ: The non-RNase H domain of Saccharomyces cerevisiae RNase H1 binds double-stranded RNA: magnesium modulates the switch between double-stranded RNA binding and RNase H activity. RNA. 1995, 1 (3): 246-259.
Nowotny M, Gaidamakov SA, Crouch RJ, Yang W: Crystal structures of RNase H bound to an RNA/DNA hybrid: substrate specificity and metal-dependent catalysis. Cell. 2005, 121 (7): 1005-1016. 10.1016/j.cell.2005.04.024.
Chon H, Tadokoro T, Ohtani N, Koga Y, Takano K, Kanaya S: Identification of RNase HII from psychrotrophic bacterium, Shewanella sp. SIB1 as a high-activity type RNase H. Febs J. 2006, 273 (10): 2264-2275. 10.1111/j.1742-4658.2006.05241.x.
Itaya M, Omori A, Kanaya S, Crouch RJ, Tanaka T, Kondo K: Isolation of RNase H genes that are essential for growth of Bacillus subtilis 168. J Bacteriol. 1999, 181 (7): 2118-2123.
Kanaya S, Crouch RJ: DNA sequence of the gene coding for Escherichia coli ribonuclease H. J Biol Chem. 1983, 258 (2): 1276-1281.
Katayanagi K, Miyagawa M, Matsushima M, Ishikawa M, Kanaya S, Ikehara M, Matsuzaki T, Morikawa K: Three-dimensional structure of ribonuclease H from E. coli. Nature. 1990, 347 (6290): 306-309. 10.1038/347306a0.
Haruki M, Noguchi E, Kanaya S, Crouch RJ: Kinetic and stoichiometric analysis for the binding of Escherichia coli ribonuclease HI to RNA-DNA hybrids using surface plasmon resonance. J Biol Chem. 1997, 272 (35): 22015-22022. 10.1074/jbc.272.35.22015.
Evans SP, Bycroft M: NMR structure of the N-terminal domain of Saccharomyces cerevisiae RNase HI reveals a fold with a strong resemblance to the N-terminal domain of ribosomal protein L9. J Mol Biol. 1999, 291 (3): 661-669. 10.1006/jmbi.1999.2971.
Makarova K, Slesarev A, Wolf Y, Sorokin A, Mirkin B, Koonin E, Pavlov A, Pavlova N, Karamychev V, Polouchine N, Shakhova V, Grigoriev I, Lou Y, Rohksar D, Lucas S, Huang K, Goodstein DM, Hawkins T, Plengvidhya V, Welker D, Hughes J, Goh Y, Benson A, Baldwin K, Lee JH, Diaz-Muniz I, Dosti B, Smeianov V, Wechter W, Barabote R, Lorca G, Altermann E, Barrangou R, Ganesan B, Xie Y, Rawsthorne H, Tamir D, Parker C, Breidt F, Broadbent J, Hutkins R, O'Sullivan D, Steele J, Unlu G, Saier M, Klaenhammer T, Richardson P, Kozyavkin S, Weimer B, Mills D: Comparative genomics of the lactic acid bacteria. Proc Natl Acad Sci U S A. 2006, 103 (42): 15611-15616. 10.1073/pnas.0607117103.
Berglund-Sonnhammer AC, Steffansson P, Betts MJ, Liberles DA: Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. J Mol Evol. 2006, 63 (2): 240-250. 10.1007/s00239-005-0096-1.
Ohtani N, Haruki M, Morikawa M, Crouch RJ, Itaya M, Kanaya S: Identification of the genes encoding Mn2+-dependent RNase HII and Mg2+-dependent RNase HIII from Bacillus subtilis: classification of RNases H into three families. Biochemistry. 1999, 38 (2): 605-618. 10.1021/bi982207z.
Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature. 2000, 405 (6784): 299-304. 10.1038/35012500.
ten Asbroek AL, van Groenigen M, Jakobs ME, Koevoets C, Janssen B, Baas F: Ribonuclease H1 maps to chromosome 2 and has at least three pseudogene loci in the human genome. Genomics. 2002, 79 (6): 818-823. 10.1006/geno.2002.6776.
Kochiwa H, Itaya M, Tomita M, Kanai A: Stage-specific expression of Caenorhabditis elegans ribonuclease H1 enzymes with different substrate specificities and bivalent cation requirements. Febs J. 2006, 273 (2): 420-429. 10.1111/j.1742-4658.2005.05082.x.
Filippov V, Filippov M, Gill SS: Drosophila RNase H1 is essential for development but not for proliferation. Mol Genet Genomics. 2001, 265 (5): 771-777. 10.1007/s004380100483.
Frank P, Braunshofer-Reiter C, Wintersberger U: Yeast RNase H(35) is the counterpart of the mammalian RNase HI, and is evolutionarily related to prokaryotic RNase HII. FEBS Lett. 1998, 421 (1): 23-26. 10.1016/S0014-5793(97)01528-7.
Ray DS, Hines JC: Disruption of the Crithidia fasciculata RNH1 gene results in the loss of two active forms of ribonuclease H. Nucleic Acids Res. 1995, 23 (13): 2526-2530. 10.1093/nar/23.13.2526.
Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
Braun FN, Liberles DA: Retention of enzyme gene duplicates by subfunctionalization. Int J Biol Macromol. 2003, 33 (1-3): 19-22. 10.1016/S0141-8130(03)00059-X.
Rastogi S, Liberles DA: Subfunctionalization of duplicated genes as a transition state to neofunctionalization. BMC Evol Biol. 2005, 5 (1): 28. 10.1186/1471-2148-5-28.
Ohtani N, Saito N, Tomita M, Itaya M, Itoh A: The SCO2299 gene from Streptomyces coelicolor A3(2) encodes a bifunctional enzyme consisting of an RNase H domain and an acid phosphatase domain. Febs J. 2005, 272 (11): 2828-2837. 10.1111/j.1742-4658.2005.04704.x.
Petrov DA, Hartl DL: Pseudogene evolution and natural selection for a compact genome. J Hered. 2000, 91 (3): 221-227. 10.1093/jhered/91.3.221.
Wagner A: Energy constraints on the evolution of gene expression. Mol Biol Evol. 2005, 22 (6): 1365-1374. 10.1093/molbev/msi126.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M: G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining. Bioinformatics. 2003, 19 (2): 305-306. 10.1093/bioinformatics/19.2.305.
Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH: CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res. 2005, 33 (Database issue): D192-6. 10.1093/nar/gki069.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Posada D, Crandall KA: MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998, 14 (9): 817-818. 10.1093/bioinformatics/14.9.817.
Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates. 2003, Sunderland, Massachusetts
Akaike H: A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974, 19 (6): 716-723. 10.1109/TAC.1974.1100705.
Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-755. 10.1093/bioinformatics/17.8.754.
Rodriguez F, Oliver JL, Marin A, Medina JR: The general stochastic model of nucleotide substitution. J Theor Biol. 1990, 142 (4): 485-501.
Page RD: TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12 (4): 357-358.
We thank Dr. Mitsuhiro Itaya and Dr. Naoto Ohtani (Keio University, Japan) for their helpful discussion and suggestions. This research was supported by grants from the Japan Society for the Promotion of Science (JSPS) and by research funds from the Yamagata Prefectural Government and Tsuruoka City in Japan.
HK conceived the study. MT and AK supervised this work. All authors read and approved the final version of the manuscript.