Differences in lateral gene transfer in hypersaline versus thermal environments
© Rhodes et al; licensee BioMed Central Ltd. 2011
Received: 6 March 2011
Accepted: 8 July 2011
Published: 8 July 2011
The role of lateral gene transfer (LGT) in the evolution of microorganisms is only beginning to be understood. While most LGT events occur between closely related individuals, inter-phylum and inter-domain LGT events are not uncommon. These distant transfer events offer potentially greater fitness advantages and it is for this reason that these "long distance" LGT events may have significantly impacted the evolution of microbes. One mechanism driving distant LGT events is microbial transformation. Theoretically, transformative events can occur between any two species provided that the DNA of one enters the habitat of the other. Two categories of microorganisms that are well-known for LGT are the thermophiles and halophiles.
We identified potential inter-class LGT events into both a thermophilic class of Archaea (Thermoprotei) and a halophilic class of Archaea (Halobacteria). We then categorized these LGT genes as originating in thermophiles and halophiles respectively. While more than 68% of transfer events into Thermoprotei taxa originated in other thermophiles, less than 11% of transfer events into Halobacteria taxa originated in other halophiles.
Our results suggest that there is a fundamental difference between LGT in thermophiles and halophiles. We theorize that the difference lies in the different natures of the environments. While DNA degrades rapidly in thermal environments due to temperature-driven denaturation, hypersaline environments are adept at preserving DNA. Furthermore, most hypersaline environments, as topographical minima, are natural collectors of cellular debris. Thus halophiles would in theory be exposed to a greater diversity and quantity of extracellular DNA than thermophiles.
The extent and role of lateral gene transfer (LGT) as a force of evolution have only recently become appreciated. Only in the past couple of decades has the sequencing of genomes such as that of Thermotoga maritima thrust LGT into the limelight. The original estimates suggested that over 20% of Thermotoga maritima's genome was the result of long distance LGT events. This and numerous other results have led to a potential reevaluation of the tree of life and the notion of a Last Universal Common Ancestor [2, 3].
LGT itself is driven by a variety of mechanisms including conjugation, or the transfer of genetic material via direct contact; transduction, or the virus-mediated transfer of DNA; and transformation, or the uptake and incorporation of naked DNA from an environment. Conjugative transfers necessitate the cohabitation of the participants and are generally thought to require the participants to be closely related, although inter-class conjugative events have been shown to occur between members of the Proteobacteria. Similarly, while most transductive phages and phage-like objects are restricted to infecting members of the same species, phages that infect across classes are known to exist. Finally, transformative events present no definitive phylogenetic barrier. Presumably a microorganism can take up virtually any DNA present in its immediate environment. However, the probability of a harvested piece of assembled DNA being incorporated into a genome is partially dependent on sequence similarity between the donor and host DNA and is therefore much greater for closely related individuals. Consequently, the vast majority of LGT events are thought to occur between closely related species. Nevertheless, inter-phylum and inter-domain transfer events can and do occur [1, 10]. These "long range" transfer events are partially the result of transformation events and, while relatively rare, offer a potentially significant evolutionary mechanism.
Species within the domain Archaea and a variety of bacterial phyla are known to be capable of transformation. Preliminary estimates indicate that approximately 1% of bacterial species are naturally able to take up DNA. The frequency of a transformation event is dependent on a number of factors, including but not limited to the quantity of DNA in an environment, the rate of DNA degradation in an environment, the frequency of DNA uptake by the recipients, the likelihood of incorporation into a genome, and natural selection on the incorporated DNA. These factors in turn are highly specific to individual species and environments. Here, we have used genomic and metagenomic techniques to test mechanisms of LGT into two phylogenetically coherent clades from different extreme environments.
Extremophiles, and in particular thermophiles and halophiles, are well-known for participating in rampant LGT [1, 12, 13]. It is theorized that the very nature of their extreme environments encourages the exchange of genetic material. Essentially any advantages gained in overcoming the environmental challenges are highly sought after, rapidly exchanged, and potentially accelerate the rate of evolution. In this regard, thermal and saline environments are quite similar. Both offer considerable environmental obstacles to be overcome before life can persist.
The crenarchaeal class, Thermoprotei, consists solely of obligate thermophiles. Similarly, the euryarchaeal class, Halobacteria, consists solely of obligate halophiles. Any LGT event into a member of the Thermoprotei or the Halobacteria necessarily occurred in either a thermal or hypersaline environment respectively. Thus, together these two distinct archaeal lineages offer a naturally occurring evolutionary experiment by which we can study "long range," inter-class and more distant, LGT events in these specific environments.
However, with regard to transformation, there are some significant differences between these two types of extreme environments. For example, high temperatures rapidly degrade unprotected DNA, both intracellularly and extracellularly, thereby preferentially preserving more thermally protected DNA. Fittingly, certain proteins, enzymes, and specifically salts, such as MgCl₂ and KCl, can help protect DNA from thermal degradation [14, 15]. In contrast, high salinities can preserve even naked DNA for exceptionally long periods of time. Borin et al. demonstrated that the preservation of naked DNA in deep-sea anoxic hypersaline brines did not depend on the species of origin and that DNA was often capable of participating in natural transformation after weeks of exposure. Another fundamental difference between thermal and saline environments is that saline environments almost as a rule are topographical minima. Saline environments such as the Dead Sea are therefore natural collectors of cellular debris and may therefore contain the DNA of a diversity of contaminant species. Thermal environments, however, may or may not be topographical minima and therefore may or may not be natural collectors of cellular debris. Environmental factors would therefore serve to increase the diversity of extracellular DNA in a typical saline environment relative to the typical thermal environment.
Analyses of halophiles have revealed a number of genomic characteristics common to halophilicity. Foremost amongst these characteristics is a propensity for GC richness, possibly to protect against thymine dimerization due to the intense UV radiation often associated with hypersaline environments. The preference for GC nucleotides is present in virtually all known lineages of halophiles and is nearly ubiquitous amongst the Halobacteria, Haloquadratum walsbyi being the sole known exception. However, interspersed among the GC rich genomes of the Halobacteria are many GC poor regions. The varied composition of the Halobacteria genomes combined with the diversity of metabolic functions and the frequent occurrence of insertion sequence elements suggested to Kennedy et al. that the Halobacteria are particularly adept at procuring novel genes and metabolic pathways.
Other DNA level propensities include an increased abundance of the dinucleotides 'CG', 'GA/TC', and 'AC/GT' and preferences for specific codons for the amino acids arginine, cysteine, leucine, threonine, and valine, presumably for secondary and tertiary stability in protein folding. Furthermore, halophiles have developed two distinct strategies to overcome the extreme salinities of their native environments. While the "salt-out" halophiles balance the osmotic pressure of their environments with intra-cellular organic solutes such as betaine, the "salt-in" halophiles use KCl. The presence of often multimolar concentrations of K⁺ ions requires radical alterations of protein chemistry. These alterations in protein chemistry include an overall preference for amino acids with acidic residues relative to amino acids with basic residues. The halophiles of the class Halobacteria are all "salt-in" and they all demonstrate a bias toward amino acids with acidic residues, regardless of their nucleotide composition. Recent metagenomic studies have confirmed this trend on an environmental scale in a number of hypersaline environments.
At a salinity of over 340 g/l, the modern surface waters of the Dead Sea represent one of the most saline naturally occurring bodies of water known to harbor life. Combined with a slightly acidic pH (~6), near-toxic magnesium levels (currently about 2.0 M Mg²⁺), and a dominance of divalent cations over monovalent cations, this salinity makes the Dead Sea a truly unique and inhospitable ecosystem. Current cell counts are well below 5 × 10⁵ mL⁻¹. A number of species of the Halobacteria have been isolated from the Dead Sea, including Haloarcula marismortui, Haloferax volcanii, Halorubrum sodomense, and Halobaculum gomorrense. However, recent metagenomic studies have suggested that the dominant microorganism in the modern Dead Sea is most closely related to a member of the neutrophilic, halophilic, euryarchaeal genus Halobacterium or the alkaliphilic, halophilic, euryarchaeal genus Natronomonas [17, 27].
Putative LGT events are generally identified using two distinct methods: phylogenetic methods attempt to identify genes associated with LGT events by constructing and analyzing phylogenies in an effort to find genes that do not conform to the group's established taxonomy. Compositional methods, on the other hand, identify LGT events by searching for genes whose DNA or amino acid signatures do not match those of their host organism. The methods are essentially complementary, in that they use unrelated data to obtain similar conclusions. For this reason the two methods often identify entirely different classes of LGT events [28, 29]. Here, in an attempt to investigate the drivers of LGT in thermal and hypersaline environments, we have employed a predominantly phylogenetic approach to identify putative LGT events involving a thermophilic class of Archaea, the Thermoprotei, and a halophilic class of Archaea, the Halobacteria. We then seek to confirm our results in a collection of environmental fosmids from the Dead Sea.
Genomes from all fully sequenced genera of the archaeal classes Thermoprotei (Acidilobus, Aeropyrum, Caldivirga, Desulfurococcus, Hyperthermus, Ignicoccus, Ignisphaera, Metallosphaera, Pyrobaculum, Staphylothermus, Sulfolobus, Thermofilum, Thermoproteus, Thermosphaera, and Vulcanisaeta) and Halobacteria (Halalkalicoccus, Haloarcula, Halobacterium, Haloferax, Halomicrobium, Haloquadratum, Halorhabdus, Halorubrum, Haloterrigena, Natrialba, and Natronomonas) were obtained from the NCBI database in November of 2010. These genomes were then compared to the entire collection of fully sequenced microbes using the BLASTP program and default parameters. In cases where the normalized best BLAST score to members of its own class but not within its genus was less than 75% of the normalized best BLAST score to non-members of the Thermoprotei or Halobacteria respectively, the gene was flagged as a probable inter-class LGT event. Overall this method identified 1226 genes from Halobacteria and 1279 genes from Thermoprotei as "long distance" LGT events. To test for the possibility of bias in our downstream analyses associated with the 75% BLAST score cutoff, the procedure was repeated with cutoffs ranging from 90% to 50%.
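The flagging rule and the cutoff sweep described above can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the `GeneScores` record, its field names, the `donor_same_env` label, and the particular cutoff values sampled between 90% and 50% are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class GeneScores:
    gene_id: str
    best_in_class: float   # normalized best score within the class, own genus excluded
    best_out_class: float  # normalized best score to any non-member of the class
    donor_same_env: bool   # hypothetical label: donor is a thermophile/halophile too


def is_putative_lgt(gene, cutoff=0.75):
    """Flag a gene when its in-class score is below cutoff x its out-of-class score."""
    if gene.best_out_class <= 0:
        return False  # no hit outside the class, so nothing to compare against
    return gene.best_in_class < cutoff * gene.best_out_class


def sweep(genes, cutoffs=(0.90, 0.75, 0.60, 0.50)):
    """Return {cutoff: (number flagged, fraction with a same-environment donor)}."""
    results = {}
    for cutoff in cutoffs:
        flagged = [g for g in genes if is_putative_lgt(g, cutoff)]
        n = len(flagged)
        frac = sum(g.donor_same_env for g in flagged) / n if n else None
        results[cutoff] = (n, frac)
    return results
```

If the fraction of same-environment donors stays roughly constant across cutoffs, the choice of 75% is unlikely to be driving the downstream comparison.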
There exists a potential bias in our analysis, however, in that there are many more fully sequenced thermophiles in the databases than there are halophiles. While it seems unlikely, most if not all of the LGT events from non-halophiles into the Halobacteria could actually originate in heretofore unidentified and/or unsequenced halophiles. The complexity and diversity of hypersaline environments, and for that matter the majority of the microbial world, is poorly constrained. Thus, barring a direct observation of a LGT event from a non-halophile to a member of the Halobacteria, it appears impossible to rule out the possibility that we have not identified the correct donor species. Nevertheless, there are a number of tests that can lend support to our assertion that the database bias does not account for the vast majority of the discrepancy between LGT into Thermoprotei and Halobacteria. These include:
Top scoring LGT genes
Alpha amylase catalytic region
Dead Sea Fosmids
We identified 22 instances within the Halobacteria where adjacent genes or genes separated by a single gene were apparently transferred together and showed conservation not only of gene content, but also of gene order. The inter-class conservation of gene order offers concrete proof that these genes have undergone a LGT event. Of the 22 gene pairings, 19 (86%) originated in species with no known halophilic tendencies (Figure 4c), again suggesting that a significant portion of long range LGT events into the Halobacteria did not originate in halophiles. For the Thermoprotei we identified 55 multiple gene transfers; of these, only 7 (13%) originated in non-thermophiles (Figure 4f).
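One way to screen for such linked transfers is sketched below. It assumes each putative LGT gene is described by its gene index in the recipient genome, its assigned donor species, and the gene index of the best hit in the donor genome. This tuple layout and the function names are illustrative assumptions, and the check covers proximity only, not strand or orientation.

```python
def linked_transfers(hits, max_gap=1):
    """Find consecutive LGT genes that are near each other in both genomes.

    hits: iterable of (recipient_index, donor_species, donor_index).
    max_gap: number of intervening genes allowed (1 = "separated by a single gene").
    """
    ordered = sorted(hits)
    linked = []
    for (r1, d1, i1), (r2, d2, i2) in zip(ordered, ordered[1:]):
        close_in_recipient = 0 < r2 - r1 <= max_gap + 1
        close_in_donor = 0 < abs(i2 - i1) <= max_gap + 1
        if d1 == d2 and close_in_recipient and close_in_donor:
            linked.append(((r1, r2), d1))
    return linked


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole
```

With the counts reported above, `percent(19, 22)` is about 86 and `percent(7, 55)` is about 13.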
Both 'GA/TC' and 'AC/GT' also demonstrate a trend toward an increase among the non-LGT genes for the Halobacteria. The trends, while not quite as strong, are still apparent. The Thermoprotei show a similar trend for 'GA/TC' and a reverse trend for 'AC/GT'. Amongst the codon biases, cysteine, leucine, threonine, and valine all show a trend toward an increased preference in the halobacterial non-LGT genes. Only arginine demonstrates a reverse trend. Meanwhile, the Thermoprotei do not appear to exhibit any particular trends. Finally, neither the Halobacteria nor the Thermoprotei show a particularly strong trend between LGT genes and non-LGT genes for the amino acid bias.
Twenty-five 40 kb fosmids from the surface water of the 2006 Dead Sea were sequenced on a quarter plate of a 454FLX sequencer. The sequencing run produced a total of 90,479 reads with an average read length of 237 base pairs, for a total of approximately 21 million base pairs of sequence. The sequences were then assembled and a total of 95 contigs with greater than 2,000 base pairs were produced. These contigs were compared to the collection of fully sequenced genomes using a BLASTX search. The contigs were then scanned for the presence of Halobacteria genes, and all contigs without a majority of Halobacteria genes were discarded. The remaining contigs were searched for the presence of genes whose top normalized BLAST score to a member of the Halobacteria was less than 75% of the top normalized BLAST score to any non-member of the Halobacteria. Twenty-two putative "long distance" LGT genes were identified in this manner of which only two were from known halophiles (SI 3). The top five instances are provided in Table 1.
Using a homology-based approach we identified 1,226 putative inter-class LGT events involving members of the obligatory halophilic archaeal class Halobacteria and 1,269 putative inter-class LGT events involving members of the obligatory thermophilic archaeal class Thermoprotei. The vast majority of these LGT events consisted of gene transfers into the Halobacteria and Thermoprotei. Furthermore, the phylogenetic distance between the donor species and the recipient species suggests that the majority of these LGT events were the result of natural transformation. As the Halobacteria are all obligate halophiles and the Thermoprotei are all obligate thermophiles the transformative events must have occurred in saline and thermal environments respectively.
Conventional thinking would suggest that the Halobacteria would be exposed to naked DNA from predominately other halophiles and that the Thermoprotei would be exposed to naked DNA from predominately other thermophiles. Additionally, genes originating in other halophiles and thermophiles would be preadapted to the particular environmental conditions and would therefore be more likely to be successfully transferred. Thus we would expect the majority of LGT events into the Thermoprotei to originate in other thermophiles and the majority of LGT events into the Halobacteria to originate in other halophiles. However, we found that while the majority of these transformational events into the Thermoprotei did in fact originate in other thermophiles, the majority of these transformational events into the Halobacteria did not originate in other known halophiles. This suggests that there is something fundamentally different between LGT in thermophiles and LGT in halophiles.
Unfortunately, as with all studies relying on genomic databases, there is the potential for distortion from database bias. In our study we face three disparate database issues. First, there is always the possibility that we may have misidentified LGT events. Second, while the fully sequenced Thermoprotei originate from a number of distinct orders, the fully sequenced Halobacteria all belong to a single family, the Halobacteriaceae. It is unclear how the reduced phylogenetic diversity of the Halobacteria would affect our analysis. Finally, the relative paucity of fully sequenced halophiles outside the Halobacteria as compared to thermophiles outside the Thermoprotei may explain our observation that more Halobacteria donors are non-halophiles than Thermoprotei donors are non-thermophiles. Nevertheless, a number of halophiles have been sequenced in non-Halobacterial lineages including Bacillus halodurans of the bacterial class Bacilli, Methanohalophilus mahii and Methanohalobium evestigatum of the euryarchaeal class Methanomicrobia, and Chromohalobacter salexigens of the bacterial class Gammaproteobacteria. In all four of these cases there remain numerous LGT events apparently originating in close non-halophilic relatives. At the same time we also went to great lengths to seek out additional lines of evidence that would help confirm our findings.
We restricted the analysis to LGT genes with especially strong matches and to instances where multiple genes were transferred together and gene order was conserved. Restricting the analysis to especially strong matches increases the likelihood of having identified both an LGT event and the correct donor species. Restricting the analysis to multiple gene transfers virtually guarantees that a LGT event took place. In both cases we achieved similar results to our initial analysis.
We then investigated independent genomic indicators of halophilicity for each genus of the Halobacteria and Thermoprotei. These indicators consisted of GC content, 'CG', 'GA/TC', and 'AC/GT' dinucleotide content, codon preferences, and amino acid preferences. If the LGT genes into the Halobacteria did in fact originate in non-halophiles, some residual signature of non-halophilicity could remain. Essentially this amounts to an independent assessment of LGT events specifically targeting halophiles. Of the ten indicators investigated for the Halobacteria, eight supported our assertion, one did not indicate a clear trend, and only the codon preference for arginine refuted our conclusion.
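The compositional indicators named above can be computed along the following lines. This is a sketch under stated assumptions: dinucleotides are counted together with their reverse complement (so 'GA' also counts 'TC', matching the 'GA/TC' notation), acidic residues are taken to be D and E, and basic residues K and R. The function names are illustrative and are not those of the study's scripts.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def reverse_complement(seq):
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def dinucleotide_fraction(seq, dinuc):
    """Fraction of overlapping dinucleotides equal to dinuc or its reverse complement."""
    seq = seq.upper()
    targets = {dinuc.upper(), reverse_complement(dinuc)}
    total = len(seq) - 1
    if total <= 0:
        return 0.0
    hits = sum(1 for i in range(total) if seq[i:i + 2] in targets)
    return hits / total


def acidic_basic_ratio(protein):
    """Ratio of acidic (D, E) to basic (K, R) residues; an assumed definition."""
    protein = protein.upper()
    acidic = protein.count("D") + protein.count("E")
    basic = protein.count("K") + protein.count("R")
    return acidic / basic if basic else float("inf")
```

Each indicator would be computed separately on the pooled LGT genes and the pooled non-LGT genes of a genus, and the two values compared.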
The strongest support came from genomic GC content and 'CG' dinucleotide content. Both indicators showed a statistically significant increase from LGT genes to non-LGT genes for every genus and represent strong support for the correct identification of LGT genes and for the non-halophile origin of the majority of them. This includes the genus Haloquadratum, which is unique among the Halobacteria for having a relatively low genomic GC content of approximately 48%. It therefore might appear that there is something inherent to genes associated with LGT that accounts for the differences in GC content. However, for the Thermoprotei there was no clear trend for the indicators as a whole, and genomic GC content appeared to decrease from LGT genes to non-LGT genes.
Finally we sought confirmation of our genome based results from within metagenomic samples. We used fully and partially assembled fosmid inserts from the surface waters of the Dead Sea, a highly saline environment, to identify inter-class LGT events in environmental halophiles. Once again the vast majority of the donor species were non-halophiles.
In this study we provide a number of lines of evidence that suggest that the mechanisms and origins of "long distance" LGT events into the Thermoprotei and Halobacteria are different. We theorize that the difference in the origin of LGT genes lies in the differing natures of hypersaline and thermal environments with respect to naked DNA. Hypersaline environments are often adept at preserving both naked DNA and intact microorganisms. There have even been claims of intact DNA and viable bacteria preserved in 200 million year old salt crystals [36–38]. In contrast, thermal environments rapidly degrade DNA. Thermophilic organisms, therefore, must go to great lengths to protect and stabilize their DNA from the environment. Thus intracellular degradation of DNA would be expected to be greater for non-thermophiles, and thermally stable DNA would be in better condition upon release to the environment. Furthermore various mechanisms of DNA protection, such as association with DNA binding proteins, may provide transient protection extracellularly. The net effect of these and other protective methods would lead to an increase in intact thermophilic DNA in a thermal environment relative to non-thermophilic DNA.
In a related vein, hypersaline environments generally occupy topographic minima. This makes hypersaline environments such as the deep Mediterranean basins and the Dead Sea natural collectors of debris, cellular and otherwise [16, 17]. Thus the average halophilic microorganism should be exposed to a much greater diversity of DNA than an average thermophilic microbe. Together these facts suggest that halophilic microorganisms are exposed to a greater proportion of intact non-halophilic DNA than thermophiles are exposed to intact non-thermophilic DNA. This suggestion, combined with the relatively large genome size, diverse genomic composition, and broad range of metabolic capabilities of the Halobacteria, paints a picture of the Halobacteria potentially acting as the consummate opportunists, incorporating and utilizing genes from a great variety of organisms. However, in order to better identify and understand LGT events amongst halophiles, many more halophiles must be sequenced and the dynamics of naked DNA in a variety of naturally occurring settings must be studied.
The Dead Sea environmental sample was collected and processed in 2007 by the Béjà lab group (Technion, Haifa, Israel) according to the protocol of Bodaker et al. The fosmid inserts were then shipped frozen to Penn State. The inserts were run on a 1% low melting point agarose gel to remove residual contamination and the 40 kb band was extracted and digested with the Gelase enzyme (Epicentre). The fosmids were then sequenced on a GS FLX sequencer (454 Life Sciences) on one quarter of a pico-titre plate.
The fosmid sequences were assembled using the 454 assembler program. All contigs of greater than 2,000 base pairs were compared to the collection of fully sequenced Bacteria and Archaea using the BLASTX program, an e-value of 10⁻⁵, and default parameters. The contigs were then spliced according to gene location and another identical BLASTX comparison was conducted on each gene. Each gene whose top hit was not to a member of the Halobacteria, had a normalized bit score (BLAST bit score to homologue divided by BLAST bit score to self) more than 25% greater than the best hit to a Halobacteria gene, and had a bit score greater than 67 was flagged as a putative inter-class LGT event. Then all contigs were scanned for genes belonging to the Halobacteria, and contigs without a majority of genes assigned to Halobacteria species were discarded. Finally, a number of genes demonstrated near perfect, upwards of 95%, identity to likely laboratory contaminants such as Escherichia coli. These genes were also removed from the analysis. The remaining 22 genes were considered LGT events and the donor species were assigned according to the best hit as matched by BLAST. A web search was then conducted to identify whether the donor species was a known halophile.
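The filtering steps in this paragraph can be expressed as a short pipeline. The sketch below assumes the BLASTX results have already been parsed into one record per gene. The `Gene` record, its field names, and the `CONTAMINANTS` set are hypothetical, and a gene with no Halobacteria hit is assumed to carry a Halobacteria bit score of 0.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    contig: str
    top_hit_class: str     # taxonomic class of the best BLAST hit
    top_hit_species: str
    top_bits: float        # bit score of the best hit
    top_identity: float    # percent identity of the best hit
    best_halo_bits: float  # best bit score to any Halobacteria gene (0.0 if none)
    self_bits: float       # bit score of the gene against itself


CONTAMINANTS = {"Escherichia coli"}  # illustrative list of likely lab contaminants


def is_candidate(g, margin=0.25, min_bits=67.0):
    """Apply the three per-gene criteria: non-halobacterial top hit, margin, bit score."""
    if g.top_hit_class == "Halobacteria" or g.top_bits <= min_bits:
        return False
    top_norm = g.top_bits / g.self_bits
    halo_norm = g.best_halo_bits / g.self_bits
    return top_norm > (1.0 + margin) * halo_norm


def halobacterial_contigs(genes):
    """Contigs on which a strict majority of genes have a Halobacteria top hit."""
    counts = {}
    for g in genes:
        total, halo = counts.get(g.contig, (0, 0))
        counts[g.contig] = (total + 1, halo + (g.top_hit_class == "Halobacteria"))
    return {c for c, (total, halo) in counts.items() if halo > total / 2}


def putative_lgt(genes):
    """Genes passing the contig filter, the per-gene criteria, and the contaminant filter."""
    keep = halobacterial_contigs(genes)
    return [
        g for g in genes
        if g.contig in keep
        and is_candidate(g)
        and not (g.top_hit_species in CONTAMINANTS and g.top_identity >= 95.0)
    ]
```

Because both scores are divided by the same self score, the margin test reduces to a comparison of raw bit scores for a single gene. The normalization is kept here only to mirror the wording of the method.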
All fully sequenced Thermoprotei and Halobacteria genomes were compared to the collection of all fully sequenced Bacteria and Archaea using BLASTP, an e-value of 10⁻⁵, and default parameters. Each gene whose top non-identical hit was not to a member of the Halobacteria or Thermoprotei respectively, had a normalized bit score more than 25% greater than the best non-identical hit to a member of the Halobacteria or Thermoprotei, and had a bit score greater than 67 was flagged as an inter-class LGT event. In cases such as Pyrobaculum where more than one species has been sequenced, the analysis was conducted on the species with the most genes and all hits to members of its genus were masked out. The remaining species were subjected to the usual analysis and any additional LGT genes were included in the analysis of the genus. In cases such as Sulfolobus islandicus where multiple strains have been sequenced a similar masking was performed. The donor species were assigned according to the best hit as matched by BLAST and a web search was then conducted to identify whether the donor species was a known halophile or thermophile respectively. Genes were assigned to COGs based upon the NCBI annotation. All additional analysis was performed using home-written scripts in Perl and/or Python. The scripts are available upon request. For the purposes of our statistical analysis the values of the non-LGT gene pool were taken as representative of the taxon as a whole. We then used a chi square test to assess the likelihood of the LGT genes originating in the same population.
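For a two-category indicator such as GC versus AT bases, the chi square comparison against the non-LGT pool can be sketched as a goodness-of-fit test with one degree of freedom. The function name and the choice of a two-category test are assumptions made for illustration; the text does not specify how the categories were defined. The p-value uses the identity that, for one degree of freedom, the chi square survival function equals erfc(sqrt(x/2)).

```python
import math


def chi_square_two_class(obs_a, obs_b, expected_frac_a):
    """Goodness-of-fit test of two observed counts against an expected proportion.

    obs_a, obs_b: counts in the LGT pool (e.g. G+C bases and A+T bases).
    expected_frac_a: proportion of category a in the non-LGT pool, taken as
        representative of the taxon; must lie strictly between 0 and 1.
    Returns (chi-square statistic, p-value) for one degree of freedom.
    """
    n = obs_a + obs_b
    exp_a = n * expected_frac_a
    exp_b = n - exp_a
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A small p-value indicates that the LGT genes are unlikely to have been drawn from the same compositional population as the non-LGT genes. Note that treating individual bases as independent observations is a simplification.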
The phylogenetic trees included in the supplemental material were constructed using the online tools available in association with the KEGG database. For each gene with a BLAST score of over 500, its twenty closest homologues were selected. The CLUSTALW tool was then used to create an alignment, and an unrooted neighbor joining tree was constructed. The phylogenetic trees were inspected manually for indicators of LGT directionality. For the phylogenetic trees depicted in Figures 1 and 2, LGT genes from both the Thermoprotei and Halobacteria were pooled into three pools depending on BLAST bit score. Five genes at random were chosen from each pool and the top 14 homologues from distinct genera were selected from the KEGG database. The CLUSTALW tool was again used to create an alignment. The alignments were then loaded into PHYLIP and trees were constructed with 100 bootstraps and the mean-least-squared method.
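The bootstrap step can be illustrated independently of any tree-building package. The sketch below resamples alignment columns with replacement to produce 100 pseudo-replicate alignments, each of which would then be passed to a tree-building method. It is a generic illustration of the resampling idea, not PHYLIP's implementation, and the function names and fixed seed are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping sequence name to an aligned sequence; all
        sequences must have the same length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n=100, seed=0):
    """Generate n pseudo-replicate alignments using a seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

The same column indices are applied to every sequence, so each resampled column is still a genuine column of the original alignment. Support for a clade is then the fraction of replicate trees in which it appears.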
The following additional data is available with the online version of this paper. Additional file 1 is a collection of phylogenetic trees representing all LGT events with BLAST scores greater than 500 for both the Thermoprotei and the Halobacteria. Additional file 2 is a table listing the percentage of LGT events that are intra-environmental given various BLAST cutoff values. Additional data file 3 is a table listing the 22 "long distance" LGT events identified in the assembled fosmid sequences.
LGT: Lateral Gene Transfer
COG: Clusters of Orthologous Groups of proteins.
We thank I. Bodaker and O. Béjà for their efforts in collecting and processing the fosmid samples, L. Tomsho for sequencing, and S. Fitz-Gibbon for bioinformatic support. This work was supported in part by the National Aeronautics and Space Administration (NASA) Astrobiology Institute (NAI) under NASA-Ames Cooperative Agreement NNA09DA76A (C.H.H.) and by the Agriculture and Food Research Initiative Competitive Grants Program Grant no. 2010-65110-20488 from the USDA National Institute of Food and Agriculture. The 454 facility at the Pennsylvania State University Center for Genome Analysis is funded, in part, by a grant from the Pennsylvania Department of Health using Tobacco Settlement Funds appropriated by the legislature.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.