Phylogenetic analysis of bacterial and archaeal arsC gene sequences suggests an ancient, common origin for arsenate reductase

Background The ars gene system provides arsenic resistance for a variety of microorganisms and can be chromosomal or plasmid-borne. The arsC gene, which codes for an arsenate reductase, is essential for arsenate resistance and transforms arsenate into arsenite, which is extruded from the cell. A survey of GenBank shows that arsC appears to be phylogenetically widespread both in organisms with known arsenic resistance and in organisms that have been sequenced as part of whole genome projects. Results Phylogenetic analysis of aligned arsC sequences shows broad similarities to the established 16S rRNA phylogeny, with separation of bacterial, archaeal, and subsequently eukaryotic arsC genes. However, inconsistencies between arsC and 16S rRNA are apparent for some taxa. Cyanobacteria and some of the γ-Proteobacteria appear to possess arsC genes that are similar to those of Low GC Gram-positive Bacteria, and other isolated taxa possess arsC genes that would not be expected based on known evolutionary relationships. There is no clear separation of plasmid-borne and chromosomal arsC genes, although a number of the Enterobacteriales (γ-Proteobacteria) possess similar plasmid-encoded arsC sequences. Conclusion The overall phylogeny of the arsenate reductases suggests a single, early origin of the arsC gene and subsequent sequence divergence to give the distinct arsC classes that exist today. Discrepancies between 16S rRNA and arsC phylogenies support the role of horizontal gene transfer (HGT) in the evolution of arsenate reductases, with a number of instances of HGT early in bacterial arsC evolution. Plasmid-borne arsC genes are not monophyletic, suggesting multiple cases of chromosomal-plasmid exchange and subsequent HGT. Overall, arsC phylogeny is complex and is likely the result of a number of evolutionary mechanisms.


Background
Arsenic is a toxic element that is found both in natural environments, such as geothermal springs, and in sites contaminated by a number of industries. Inorganic arsenic exists primarily in two valence states: arsenite (As(III)) and arsenate (As(V), or the arsenate ion AsO₄³⁻). Both forms are toxic to microorganisms, with arsenite disrupting enzyme function, and arsenate behaving as a phosphate analog and interfering with phosphate uptake and utilization [1]. Microorganisms have evolved a variety of mechanisms for coping with arsenic toxicity, including minimizing the amount of arsenic that enters the cell (e.g. through increased specificity of phosphate uptake [2]), arsenite oxidation through the activity of arsenite oxidase [3,4], or peroxidation reactions with membrane lipids [5,6]. Other microorganisms utilize arsenic in metabolism, either as a terminal electron acceptor in dissimilatory arsenate respiration [7-10] or as an electron donor in chemoautotrophic arsenite oxidation [11,12]. However, the best-characterized microbial arsenic detoxification pathway involves the ars operon [2,13].
The ars operon consists of a group of genes coding for a transmembrane pump and an arsenate reductase (arsC). The operon includes a regulatory gene (arsR) and a gene coding for an arsenite-specific pump (arsB) as well as arsC [13]. Arsenite is pumped directly out of the cell by the arsB protein; however, arsenate must first be reduced to arsenite by the cytoplasmic arsenate reductase encoded by arsC. Some bacteria also possess other ars genes: arsA produces an arsenite-stimulated ATPase [14] that results in more efficient arsenite extrusion; arsD encodes a regulatory protein that controls the upper level of ars expression [15]; arsH has been identified but has an uncertain function [16]. The ars operon was initially recognized in plasmids of Staphylococcus aureus and S. xylosus [17,18] but has subsequently been found in other microorganisms (e.g. Escherichia coli [19,20], Acidiphilium multivorum [21], Bacillus subtilis [22], Pseudomonas aeruginosa [23]). The genes can be plasmid-borne or chromosomal, and genome sequencing projects have identified putative ars genes in both Bacteria and Archaea that have not been specifically characterized in terms of arsenic resistance. An analogous genetic system (Arr or ACR) has also been described in the eukaryote Saccharomyces cerevisiae [24]. Thus the ars operon, or similar arsenic resistance systems, appears to be relatively widespread throughout microorganisms.
The arsC gene is of particular interest in that its product, the soluble enzyme arsenate reductase, catalyzes the reduction of arsenate to arsenite. Arsenate is the thermodynamically favorable form of arsenic under aerobic conditions [25,26], so it is likely to be the most common form of arsenic in many environments. Thus, the presence and expression of arsC is likely to be required for microorganisms inhabiting such areas. Furthermore, arsenite is generally more labile and toxic than arsenate [26], so that expression of arsC, and the ensuing reduction of arsenate to arsenite, might increase the toxicity of arsenic in the environment. Despite the potential importance of the arsC gene at both a physiological and environmental level, a thorough study of the phylogenetic distribution of different arsC genes has not been performed.
A simple phylogenetic tree showing the relationships between the arsenate reductases of seven Bacteria (those that had been confirmed to possess the gene at that time) was presented as part of a study that identified the arsenic resistance genes of Thiobacillus ferrooxidans (now Acidithiobacillus ferrooxidans) [27]. Saltikov and Olson [28] used probes/primers based on the E. coli ars operon to detect ars genes in natural environments, and presented a phylogenetic analysis of these genes; however, their analysis was largely confined to enteric bacteria (those that were similar enough to E. coli to be detected) and they emphasized the arsB gene rather than arsC. As part of a review of microbial arsenic transformations, we recently determined a preliminary arsC phylogeny based on 19 sequences [29]. These three studies reported phylogenies that suggested a common origin for the ars genes examined [27-29]. An alternative view suggested by Mukhopadhyay et al. [30] is that three distinct classes of arsenate reductases exist, which have developed similar mechanisms and reaction centers through convergent evolution. Two of these classes are bacterial (one typified by the arsC found in enteric bacteria such as E. coli; the other typified by the arsC found in Staphylococcus plasmids and other Gram-positive Bacteria); the third is the Arr2 gene of S. cerevisiae. Phylogenies of these gene families based on 18 arsC sequences were recently reported [30]. However, in addition to microorganisms with demonstrated arsenate resistance, whole genome analyses of various microorganisms have revealed an increasing number of open reading frames (ORFs) that have homology to arsC. Inclusion of these putative arsC genes in phylogenetic analyses might resolve the basic question: Did arsenate resistance systems evolve from a common origin or develop convergently in multiple taxa?
In this study we attempt to answer this question through a phylogenetic analysis of both confirmed arsC genes and those inferred from whole genome analyses. We compare this phylogeny to one derived from 16S rRNA, and suggest possible implications for the evolutionary history of arsenic resistance in microorganisms.

Results and Discussion
The 16S rRNA-based phylogenetic tree (Figure 1) generally matched the accepted 16S rRNA model with clear separation of the two prokaryotic domains and the subsequent divergence of Eukarya (represented by Saccharomyces cerevisiae) from Archaea [31]. Minor inconsistencies, such as Ralstonia solanacearum (β-Proteobacteria) forming a deep branch within the γ-Proteobacteria, or Fusobacterium nucleatum (Fusobacteria) and Leptospira interrogans (Spirochetes) grouping loosely with the Gram-positive Bacteria, are likely the result of the limited data set used (they were the only representatives of their respective phyla that had listed arsC genes) and do not challenge established relationships. Otherwise the major divisions and subdivisions of the Bacteria (e.g. the various Proteobacteria, Low GC Gram-positives, Actinobacteria) formed expected patterns. An interesting finding was the grouping (albeit weak) of Nostoc muscorum and Synechocystis sp. with Deinococcus radiodurans to form a high level clade (Figure 1).
This is similar to findings from genome trees that suggest a close relationship between the Cyanobacteria and Deinococcales [32], although our 16S rRNA tree does not suggest a relationship between these taxa and the Actinobacteria. However, the Actinobacteria (High GC Gram-positives) did clearly separate from the Low GC Gram-positive Bacteria (Figure 1).
Figure 1. Phylogenetic tree based on 16S rRNA gene sequences.
The arsC phylogenies obtained by evolutionary distance (ED) analysis (Figure 2) and maximum parsimony (MP; Figure 3) showed a number of broad similarities to the 16S rRNA tree. There was clear separation of Archaea and Bacteria (with the sole eukaryote, S. cerevisiae, grouping towards Archaea), and also general grouping of the different bacterial divisions; all the archaeal arsC genes were from the Euryarchaeota so it was not possible to examine division level phylogeny for Archaea. Both treeing methods support the existence of at least three major classes of arsC genes, corresponding to the Archaea/Eukarya, the Enterobacteriales (enteric γ-Proteobacteria) and α-Proteobacteria, and the Low GC Gram-positive Bacteria. At a basic level, these broad groupings of arsC correspond to the three distinct classes of arsenate reductases that others have observed [30], and these three groups had low sequence similarity to each other (less than 33% similarity between the Bacteria and Archaea/Eukarya, and 48% similarity between the two major bacterial groups). However, the analysis reported here includes more diverse arsC sequences and suggests that other deeply branching types of arsC exist. For example, the arsC sequences from major divisions of Bacteria such as the Green Sulfur Bacteria (represented by Chlorobium tepidum) and the Deinococcales (represented by D. radiodurans) are loosely associated with either the Enterobacteriales/α-Proteobacteria (Figure 2) or Low GC Gram-positive Bacteria (Figure 3) depending upon the analysis used, but in either case diverge to form their own deep branches, suggesting that they possess distinct arsenate reductases.
Thus, the suggestion that the three previously reported classes of arsenate reductases developed through convergent evolution [30] seems flawed: not only does arsC phylogeny show broad parallels to the accepted 16S rRNA phylogeny, but deep bacterial divisions appear to possess the distinct arsenate reductases that would be expected from divergence from a common origin. The alternative hypotheses are (1) a common origin followed by sequence divergence, or (2) independent origin of multiple arsC sequence types with broadly similar reaction centers and general mechanisms. Based on the phylogenies observed and the general rarity of convergent evolution at a sequence level [33,34], we suggest that the former is more likely.
A number of taxa do show arsC phylogenies that are inconsistent with the established 16S rRNA evolutionary tree, perhaps the most dramatic being the separation of some of the non-enteric γ-Proteobacteria (Pseudomonas putida, P. aeruginosa, and A. ferrooxidans) from the Enterobacteriales. The non-enteric γ-Proteobacteria actually appear to be paraphyletic with regard to arsC, in that the two Xanthomonas species group with the α-Proteobacteria, whereas the others form a distinct group with the sole representative of the β-Proteobacteria (R. solanacearum). The phylogenetic affiliation of the latter clade is uncertain: ED analysis placed it as a deep branch in the Low GC Gram-positive Bacteria group (Figure 2), whereas MP placed it loosely with the large Enterobacteriales/α-Proteobacteria group (Figure 3). Either placement had poor bootstrap support, and given the relatively low sequence homology of this clade to either of the other major bacterial arsC groups (50% similarity to the Low GC Gram-positives, 40% similarity to the Enterobacteriales/α-Proteobacteria group) we suspect that this group might represent a fourth distinct group of arsenate reductases, making its placement in a binary tree difficult. A second distinct difference between the arsC and 16S rRNA trees is the tight grouping of the Cyanobacteria with the arsC sequences from Low GC Gram-positive Bacteria. Other discrepancies are placement of the three Streptococcus sequences (Low GC Gram-positive Bacteria) with the Actinobacteria and the placement of Aquifex aeolicus (basal in the 16S rRNA-defined bacterial clade) and F. nucleatum within the Low GC Gram-positives (Figures 2 and 3). The latter two sequence types were the only arsC sequences reported for the Aquificales and Fusobacteria, respectively, and as with some of the 16S rRNA tree branches, might represent the limited data set rather than true evolutionary relationships. However, A. aeolicus is known to have exchanged numerous genes with other microorganisms [35], and the phylogeny of its arsenate reductase gene might represent another example of this.
Regardless of the particular phylogenetic methods used, the arsC trees clearly show a number of non-orthologous arsC genes, assuming that the 16S rRNA tree reflects true phylogeny. One explanation for these inconsistencies could be the existence of arsC paralogs that have diverged after arsC gene duplication events. While this might be the case for those organisms with arsC sequences that were not closely related to other sequences (e.g. D. radiodurans or C. tepidum), it is an unreasonable explanation for the similar arsC sequences (e.g. those of the two Cyanobacteria) that group within unexpected larger clades. Rather than multiple gene duplication events or convergent evolution, this suggests horizontal gene transfer (HGT) of arsC genes. Recent studies suggest that HGT events may have been much more widespread during prokaryotic evolution than had been previously thought, with genetic exchange even occurring between Bacteria and Archaea [35-37]. While we see no evidence of cross-domain transfer of arsC genes, a number of HGT events could explain some of the discrepancies between 16S rRNA-based and arsC-based phylogenies. HGT of arsC from an ancestral low GC Gram-positive bacterium to an ancestral cyanobacterium seems likely, and a similar HGT event from ancestral Actinobacteria to the Streptococci would explain the unexpected phylogeny of the Streptococcus arsC sequences. Similar gene transfer events from the low GC Gram-positives to F. nucleatum could also have occurred, but with an arsC sequence only available from a single member of the Fusobacteria it is difficult to draw conclusions. The paraphyletic arsC sequences in the non-enteric γ-Proteobacteria likewise may have arisen from HGT events, although the difficulty in resolving the affiliation of some of these sequences makes speculation about their evolution premature. Indeed, the exact placement of many of the non-orthologous arsC genes was difficult, and poor bootstrap support for some nodes suggests that refinement of arsC phylogeny as more sequences become available should be an ongoing process. Regardless, a number of bacteria possess arsC genes that are inconsistent with established phylogenies, and HGT events are one possible explanation. It should also be noted that some of the non-orthologous sequences represent putative arsC genes identified from ORFs, rather than confirmed arsenate reductases. These ORFs show homology to identified arsC genes, but may not necessarily code for a functional enzyme. Thus, some of the non-orthologous sequences might represent elevated genotypic variation if the organism has no selective pressure to maintain a functional arsC gene.
Figure 2. Evolutionary distance tree based on arsC gene sequences. The tree was constructed by neighbor joining methods using 408 informative positions. Numbers represent percentages of 1000 bootstraps and are only shown for bootstrap values <80%. Plasmid-borne arsC genes are indicated by "plasmid" following the organism name. Names in boxes represent branches that are inconsistent with the 16S rRNA tree.
Figure 3. Maximum parsimony tree based on arsC gene sequences. The tree was constructed using the same 408 informative positions used in the ED analysis. Numbers represent percentages of 1000 bootstraps and are only shown for bootstrap values <80%. Plasmid-borne arsC genes are indicated by "plasmid" following the organism name. Names in boxes represent branches that are inconsistent with the 16S rRNA tree.
We had hoped that patterns in the phylogeny of plasmid-borne arsC genes might indicate whether HGT of arsC genes has occurred (or is still occurring) in recent times. Specifically, if the plasmid-borne arsenate reductases were similar, this would suggest a common origin for these genes, which must ultimately be chromosomal. However, the plasmid-borne arsC genes appear to be paraphyletic and clearly separate into very different arsC types (Figures 2 and 3). The two major groups of plasmid-borne genes are those of the Staphylococci (which were the first arsC genes to be recognized [17,18]) and those of the Enterobacteriales. Both of these groups of plasmid-borne genes are similar to the chromosomal genes found in related organisms. The only exception is the arsC found on the pKW301 plasmid of A. multivorum, which shows high sequence similarity to the plasmid-borne arsC sequences found in the Enterobacteriales, strongly suggesting relatively recent plasmid transfer between a member of this group and A. multivorum. Indeed, the ars operon found in A. multivorum is expressed and confers arsenate resistance when transferred to E. coli [21]. Three other types of plasmid-borne arsC genes are found in R. solanacearum, Clostridium acetobutylicum, and Halobacterium halobium, demonstrating that plasmid-borne arsC genes are phylogenetically widespread and suggesting multiple instances of chromosomal-plasmid transfer. The arsC sequence of C. acetobutylicum is typical of the clostridial arsC genes, and also shows some similarity to the arsenate reductase sequence in L. interrogans (a Spirochete). Thus, it is possible that ancestral clostridial plasmids may have transferred ars genes to the Spirochetes. As well as C. acetobutylicum, two other bacteria were represented by both chromosomal and plasmid-borne arsC sequences. The two arsC genes of E. coli K12 are similar, but they are equally similar to the plasmid-borne genes of other enteric bacteria. The plasmid-borne arsC gene from Salmonella typhimurium shows little similarity to its chromosomal equivalent, and appears to be related to the other arsC genes found in plasmids in the Enterobacteriales. The high similarity (at least 70%) between all the enteric plasmids suggests a common source, which from this analysis appears to be very similar to the E. coli chromosomal arsC gene. However, it is clear that within the enterics, relationships based on arsC do not parallel relationships based on 16S rRNA, and significant genetic exchange within this group (possibly inside a host organism) has likely occurred.

Conclusions
Despite the suggestion of HGT and duplication events, the arsC phylogeny suggests that arsenate reductase is an evolutionarily old enzyme, a hypothesis that was recently suggested for arsenite oxidase [38]. This is the first phylogenetic analysis of an arsenic redox active gene to include all three domains, and the Archaea and Bacteria were clearly separated, with S. cerevisiae subsequently branching from Archaea. Thus, the arsC gene, and by extension, arsenic resistance mechanisms, must have been present in early organisms, either in the last universal ancestor or after the divergence of the two prokaryotic domains (i.e. in an ancestral bacterium or archaeon). In the latter case, early HGT event(s) must have transferred arsenate resistance to the other domain, followed by subsequent divergence to the phylogeny seen today. Such gene exchange between Archaea and primitive Bacteria has certainly taken place [36] and makes absolute statements about the initial origin of genes difficult. Regardless, it seems likely that arsenate resistance developed early in the evolution of microorganisms. Recent ideas on the origin of life suggest that early cellular structures were abiotic iron-sulfur formations at submarine hydrothermal vents in the Hadean ocean, in which basic biochemical and metabolic processes could have developed before the formation of cell membranes [39,40]. As well as iron and sulfur, hydrothermal areas are often characterized by high levels of arsenic [41] and may even contain arsenic redox active microbial communities [42]. Given the integral role of phosphate in early (and current) cellular metabolism [39], an ability to reduce the levels of arsenate (a phosphate analog) might have been an essential biochemical process. There would have been selective pressure to develop an arsenate reductase that could change this arsenate to arsenite, and this early evolution of an arsC-like ancestor coupled with sequence divergence and HGT would result in the diversity and widespread distribution of arsC genes seen in microorganisms today.

Sequence acquisition and alignment
Sequences of arsC genes were obtained from GenBank [43] and were categorized into two broad types: confirmed arsC genes (i.e. those sequences that were identified in studies that explicitly tested arsenate reduction and/or resistance in that given organism), and putative arsC genes (i.e. open reading frames (ORFs) obtained from genome projects that showed homology to known arsC genes). Genes were also noted as being either chromosomal or plasmid-borne. A total of 60 sequences were analyzed, consisting of 54 bacterial, 5 archaeal, and the Arr2 gene of Saccharomyces cerevisiae. Accession numbers for these sequences are given in Table 1 [see Additional file 1]. The Arr2 gene (or ACR2 [24]) has been identified as coding for a protein necessary for arsenate resistance in yeast, and appears to have some homology to prokaryotic arsC genes. Sequences were imported into the ARB software package (distributed by W. Ludwig and O. Strunk, Technical University of Munich, Germany; http://www.arb-home.de/) running on a SuSE Linux 6.2 platform.
In ARB, amino acid sequences were derived from nucleotide sequences using the Translate DNA to Protein function and the standard genetic code. Because some arsC genes were incomplete, all three possible reading frames were examined to ensure correct translation. Amino acid sequences of the confirmed bacterial arsenate reductases (12 total) were initially aligned automatically using the Fast Aligner function of ARB Edit 4.1, and then manually adjusted to ensure that secondary structure considerations of the arsenate reductase protein were met. The secondary structure of E. coli arsenate reductase (17978 in the NCBI Molecular Modeling Database [44]) was visualized using Cn3D 3.0 and One-D Viewer 1.0 (available from http://www.ncbi.nih.gov) and used as a model for arsenate reductase secondary structure. Nucleotide sequences of these confirmed arsC genes were aligned according to the amino acid alignment, and the putative arsC ORFs subsequently aligned to these confirmed sequences. Multiple alignments of all arsC genes used in this study are presented as an additional ARB file [see Additional file 2]. Aligned 16S rRNA sequences of all organisms used in the arsC analysis (with the exception of an arsC sequence obtained from an unknown Proteobacterium) were obtained from the Ribosomal Database Project [45] and imported into ARB. Because three arsC genes have been reported for different strains/plasmids of E. coli, and two arsC genes have been identified for both Salmonella typhimurium and Clostridium acetobutylicum (plasmid and chromosomal for each; Table 1 [see Additional file 1]), only one 16S rRNA sequence was used for each to avoid potential biases and inaccurate attractions due to sequence identity.
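The three-frame translation step described above can be illustrated with a short Python sketch. This is not the ARB implementation: the function names (translate, three_frame_translations) are hypothetical, the codon table is built from the conventional TCAG ordering of the standard genetic code, and only the three forward frames are considered, as in the text.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string in one forward reading frame (0, 1, or 2).

    Codons containing anything other than A, C, G, or T become 'X';
    stop codons become '*'. A trailing partial codon is ignored.
    """
    seq = dna.upper().replace("U", "T")[frame:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translations(dna: str) -> dict[int, str]:
    """Return the translation for each of the three forward frames.

    Hypothetical helper: for an incomplete gene, the frame with no
    internal stop codons ('*') is usually the one to inspect first.
    """
    return {frame: translate(dna, frame) for frame in range(3)}
```

For an incomplete ORF whose first base is not the start of a codon, comparing the three outputs for internal stop codons is one simple way to pick the frame that is then carried into the amino acid alignment.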

Phylogenetic analyses
The aligned 16S rRNA sequences (55) were used to construct a baseline phylogenetic tree to serve as a comparison for arsC sequences. 1326 informative positions (corresponding to E. coli positions 140-1475) were compared using evolutionary distance (ED) methods. An ED tree was constructed in ARB for these 16S rRNA sequences using neighbor joining [46] with 1000 bootstraps. Distance and maximum parsimony (MP) methods were used to construct arsC trees using aligned nucleotide sequences. Both methods used all 60 aligned arsC nucleotide sequences. The ED tree was constructed in ARB using neighbor joining following a simple Jukes and Cantor model with the Olsen correction. Because visual examination of the aligned sequences revealed a number of gaps suggesting insertion/deletion events within different taxa, gaps were treated as a fifth base for tree construction. Thus, ED phylogeny was based on a total of 408 informative positions with 1000 bootstraps performed as part of the procedure. MP phylogeny used the same informative positions and number of bootstraps, and was performed using the PHYLIP package included with ARB.
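As a rough illustration of the distance calculation and bootstrap resampling described above, the following Python sketch computes a Jukes and Cantor style distance between two aligned sequences and draws one bootstrap replicate of an alignment. It is a minimal sketch under stated assumptions, not the ARB or PHYLIP code: the function names are hypothetical, the Olsen correction is not reproduced, and the n_states parameter is only one possible way of accommodating gaps as a fifth character state.

```python
import math
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned columns at which two sequences differ.

    Gaps ('-') are compared like any other character, i.e. treated as a
    fifth state, so a gap opposite a base counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return mismatches / len(seq_a)


def jukes_cantor(p: float, n_states: int = 4) -> float:
    """Equal-rates (Jukes-Cantor style) distance from an observed p-distance.

    d = -((k - 1) / k) * ln(1 - k * p / (k - 1)), with k = n_states.
    Returns infinity when p is at or beyond the saturation point.
    Using n_states=5 for gapped data is an assumption of this sketch.
    """
    k = n_states
    arg = 1.0 - k * p / (k - 1)
    if arg <= 0.0:
        return math.inf
    return -((k - 1) / k) * math.log(arg)


def bootstrap_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]
```

In a full analysis, a distance matrix would be computed for each of the 1000 resampled alignments, a neighbor joining tree built from each matrix, and the bootstrap percentage for a node taken as the share of replicate trees that recover it.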

Authors' contributions
CRJ carried out the phylogenetic analyses, participated in sequence alignment and drafted the manuscript. SLD obtained the sequence information and participated in sequence alignment. Both authors read and approved the final manuscript.