- Research article
- Open Access
Evolution of a microbial nitrilase gene family: a comparative and environmental genomics study
© Podar et al; licensee BioMed Central Ltd. 2005
- Received: 05 May 2005
- Accepted: 06 August 2005
- Published: 06 August 2005
Completed genomes and environmental genomic sequences are bringing a significant contribution to understanding the evolution of gene families, microbial metabolism and community eco-physiology. Here, we used comparative genomics and phylogenetic analyses in conjunction with enzymatic data to probe the evolution and functions of a microbial nitrilase gene family. Nitrilases are relatively rare in bacterial genomes, their biological function being unclear.
We examined the genetic neighborhood of the different subfamily genes and discovered conserved gene clusters or operons associated with specific nitrilase clades. The inferred evolutionary transitions that separate nitrilases which belong to different gene clusters correlated with changes in their enzymatic properties. We present evidence that Darwinian adaptation acted during one of those transitions and identified sites in the enzyme that may have been under positive selection.
Changes in the observed biochemical properties of the nitrilases associated with the different gene clusters are consistent with a hypothesis that those enzymes have been recruited to a novel metabolic pathway following gene duplication and neofunctionalization. These results demonstrate the benefits of combining environmental genomic sampling and completed genomes data with evolutionary and biochemical analyses in the study of gene families. They also open new directions for studying the functions of nitrilases and the genes they are associated with.
- Horizontal Gene Transfer
- Epoxide Hydrolase
- Environmental Genomic
- Codon Substitution Model
- Nitrilase Gene
Having colonized virtually every environment, bacteria and archaea have evolved enzymatic solutions for a wide range of metabolic biochemical transformations [1, 2]. Studying enzymes derived from organisms inhabiting these environments is important for understanding how microbes adapt to, react to and transform the environment. The overwhelming majority of microbial species, however, remain uncultivated. A variety of functional and sequence-based approaches, collectively referred to as metagenomics or environmental genomics, have been developed for discovering and characterizing genes, operons and even entire genomes directly from the environment. The use of environmental genomics has already led to important discoveries such as genes responsible for novel biological functions, microbial community metabolic traits [6–8] and dramatic increases in the diversity of various enzyme families [9, 10]. Subsequent biochemical and evolutionary analyses can strengthen the biological and ecological inferences even before organisms that carry that genetic information are isolated in culture [11–13]. From a practical perspective, microbial environmental genomics has been a successful approach for the discovery of enzymes for a broad spectrum of biotechnological applications [14–17].
To gain insight into the evolution of function in a gene family that has been extensively sampled by environmental genomic screening and characterized biochemically, we focused on bacterial nitrilases. These enzymes are members of the carbon-nitrogen hydrolase superfamily, whose members catalyze the hydrolysis of a wide range of non-peptide carbon-nitrogen bonds [18–20]. The nitrilase family hydrolyzes nitriles to their corresponding carboxylic acids, releasing ammonia. This reaction is likely involved in detoxification of xenobiotics and of nitriles produced as defense chemicals by other microorganisms and plants, as well as in secondary metabolite biosynthetic pathways. Nitrilases appear to be rare in bacteria (out of over 150 sequenced bacterial genomes only 10 contain nitrilase genes). Recently, over 130 nitrilases were identified by functional screening of hundreds of environmental DNA libraries, for use in industrial biocatalysis applications. Those enzymes were characterized biochemically and classified into six subfamilies, four of them with no representatives in known bacterial species. It was found that a number of enzymatic properties (substrate specificity and enantioselectivity) were specific to subfamilies and, in some cases, correlated with the biogeography and ecology of the environmental samples.
The role of gene duplication, natural selection and functional diversification in the evolution of the nitrilase gene family is unknown. The correlation of distinct enzymatic properties with the different gene subfamilies suggests that nitrilases have diverged functionally to accommodate distinct biological roles in microbial communities that occupy various ecological niches. Functional divergence is the result of changes in selection pressure and is often accompanied by associations with novel gene clusters or operons that encode enzymes with coupled metabolic activities. To begin addressing some of these aspects, we analyzed the genetic neighborhoods of all available nitrilase genes, identified conserved patterns of gene clustering relative to biochemical data and phylogeny, and propose a hypothesis on nitrilase evolution involving gene duplications and Darwinian selection.
The nitrilases from cultivated bacteria belong to clade-specific gene clusters
In the case of subfamily 2, gene neighborhood information was available for only four of the twelve genes from cultivated bacteria. In Bacillus sp. and Pseudomonas syringae, the nitrilase gene is apparently co-transcribed with a downstream phenylacetaldoxime dehydratase gene and preceded by an araC transcription factor transcribed from the other strand. The other nitrilase genes (from Burkholderia, Bradyrhizobium and Ralstonia) are part of unrelated clusters (Figure 1).
In addition to the nitrilases from completed genomes of cultivated bacteria, we used BLASTP to search for such enzymes in two large environmental sequence datasets: the acid-mine drainage microbial mats and the Sargasso Sea. No nitrilases were found in the acid-mine dataset. In the Sargasso Sea dataset we identified 17 nitrilases that were full-length or long enough to be phylogenetically informative. Three of the genes appear to be eukaryotic, while eight bacterial genes are close relatives of nitrilases from Synechococcus or Burkholderia. The remaining six genes do not appear to have close relatives among known nitrilases and belong to subfamilies 2, 4 and 5 [see Additional file 1]. Finding so few nitrilase genes in such a large dataset suggests that for uncovering the sequence space of a gene family, functional screening of a large number of samples from very different environments is more efficient than deep sequence coverage of one or a few environments.
Nitrilases associated with different types of gene clusters have distinct enzymatic properties
For the nitrilase genes identified from environmental DNA, the identity of the host organism is unknown. However, because those libraries were constructed using fragments of genomic DNA several times larger than the average nitrilase gene length (~1 kb), we also analyzed the gene neighborhood of the environmental nitrilases. Because of the highly conserved nature of the Nit1C cluster and its occurrence in distant taxa of bacteria, we first focused on mapping its distribution among the environmental nitrilase clones. We found that the Nit1C cluster is strictly confined to a group of subfamily 1 nitrilases that includes the seven genes identified in completed genomes and 14 of the environmental ones. Four of the subfamily 1 nitrilases from the Sargasso Sea dataset had small flanking sequences, in which we identified Nit1C type genes (ORFs 1 or 3) similar to those of their close relatives from Synechococcus and Burkholderia. However, because of their incomplete length, those sequences were not included in further analyses.
The sister group of subfamily 1 nitrilases, subfamily 3, consists of only three environmental type genes. We had sufficient flanking sequence to determine the nature of the neighboring genes for only one of the genes (3A1), flanked by two hypothetical ORFs with no identifiable homologs. Therefore, the Nit1C cluster appears to have originated with and is restricted to a subset of subfamily 1 nitrilases. The more distantly related nitrilases from subfamilies 4, 5 and 6 have no apparent associations with a conserved gene cluster (data not shown).
In our previous study we uncovered a number of correlations between the biochemical properties of the environmental microbial nitrilases and their phylogenetic classification. Distinct gains or losses of activity or switches in enantioselectivity coincided with the evolutionary events that led to the formation of the main subfamilies. One of the most interesting findings was a reversal in enantioselectivity (R to S) that occurred in subfamily 1, against the model substrate hydroxyglutaronitrile. To correlate the differences in types of gene clusters with the nitrilase biochemical properties, we graphed the available hydroxyglutaronitrile activity data on the side of the phylogenetic tree (Figure 3C). With one exception (1B15), the enzymes that belong to the Nit1C group are R-enantioselective on hydroxyglutaronitrile. The transition event (TE) marks changes in biochemical properties leading to enantioselectivity reversal. The first enzyme not associated with Nit1C (1A21) was inactive on that substrate, while the next diverging ones (1A20, 1A22, 1A16, 1A17) were R-selective or not enantioselective (low bootstrap values do not support a robust branching order). However, the next statistically supported clade (1A14 and above in the Figure 3A tree) shows a reversal of enantioselectivity followed by a steep increase in selectivity to values over 95%.
Analysis of the subfamily 1 nitrilase gene clusters
Having determined that subfamily 1 nitrilases belong to two distinct subgroups based on their associated gene clusters and enzymatic properties, we analyzed the nitrilase neighboring genes for clues to their individual metabolic roles. First in the Nit1C cluster, ORF1 proteins are highly conserved in length (160–163 amino acids) and sequence (>60% identity between any two genes). However, no other homologs were found using standard searching techniques of current databases. Using HMM structural homology modeling (Superfamily 1.63 server), we tentatively assigned the hypothetical protein 1 to the YchN1-like superfamily and fold, whose biochemical activity is unknown. Next in the cluster is the nitrilase gene. The third gene encodes a member of the radical SAM superfamily (Pfam 04055), enzymes that catalyze a wide variety of radical-based reactions through reductive cleavage of S-adenosylmethionine at an iron-sulfur center. The Nit1C SAM genes form a strongly supported clade (~50% average sequence identity), most closely related to bacterial and archaeal genes annotated as biotin synthase-related enzymes (COG2516) [see Additional file 2]. ORF4 in the Nit1C cluster also forms a clade of closely related sequences and belongs to the GCN5-related N-acetyltransferase (GNAT) superfamily (Pfam 00583). These enzymes are involved in antibiotic detoxification as well as in histone acetylation in eukaryotes. The closest homologs to the Nit1C GNAT genes are a number of other acetylases from bacteria like Rhodobacter and Enterococcus [see Additional file 2]. The fifth gene in the cluster encodes members of the large 5'-phosphoribosyl-5-aminoimidazole synthase-related proteins superfamily (AIRS, Pfam 00586). Enzymes in this superfamily are involved in de novo purine biosynthesis, selenophosphate synthesis, or maturation of NiFe hydrogenase.
These genes form a unique clade, most closely related to a group of archaeal genes encoding phosphoribosylformylglycinamide synthases [see Additional file 2]. The last invariant position in the cluster, ORF6, encodes a protein of approximately 100 amino acids. While the sequence identity between the individual genes surpasses 70%, we could not find any other relatives to these genes by any sequence analysis approach. The seventh ORF of Nit1C is located at either end of the cluster, on either coding strand. This gene is a member of the pyridine nucleotide-disulphide oxidoreductases (Pfam 00070, COG2072), that include flavin-containing monooxygenases and flavoproteins involved in K+ transport. The closest relatives to the Nit1C genes are putative monooxygenases found in several species of Pseudomonas [see Additional file 2]. All Nit1C genes form clusters of closely related sequences within their respective superfamilies, suggesting a common function, possibly in a pathway for detoxification of plant or microbial defense compounds.
Members of the nitrilase clade that split after the transition event are exclusively of environmental origin, with no sequence representatives in characterized bacterial species. Approximately two thirds of the nitrilases in this group are associated with genes encoding a MarR transcriptional regulator, epimerases and epoxide hydrolases. MarR genes (Pfam 01047) are transcriptional repressors controlling the expression of the Mar operon, involved in multiple antibiotic resistance. The nitrilase-associated MarR genes form a specific clade, most closely related to genes from Xanthomonas and Desulfitobacterium (30–40% identity) [see Additional file 3], and are always upstream of the nitrilase gene. The location of the epimerase and epoxide hydrolase varies somewhat, the epimerase ORF usually lying between the nitrilase and the epoxide hydrolase ORFs. Epimerases are a large class of enzymes that reversibly catalyze stereochemical inversions of hydroxyl substituents in carbohydrates, participating in numerous metabolic pathways [29, 30]. The nitrilase-associated epimerases form a unique clade in which the relationship between the genes parallels that of their associated nitrilases. Their closest relatives are epimerases from species of Streptomyces (~35% identity) [see Additional file 3]. Epoxide hydrolases belong to the large superfamily of alpha-beta fold hydrolases and hydrate chemically reactive epoxides to more stable dihydrodiols. This reaction is of major importance in detoxification of a large number of endogenous epoxide metabolites and xenobiotic compounds in all organisms. The association of all these genes with nitrilases could indicate the requirement for coupled reactions under the transcriptional control of MarR, perhaps involved in detoxifying sugar-based cyanogenic compounds in soils rich in decaying plant material.
Positive selection as a possible driving force for nitrilase functional diversification
The observed changes in associated gene clusters and in enzymatic properties suggest that the hypothetical gene duplication in subfamily 1 was followed by nitrilase recruitment to novel metabolic functions, possibly under selective constraints. A powerful approach to studying changes in the selective pressure in protein encoding genes involves calculation of the nonsynonymous/synonymous substitution rate ratio (ω = dN/dS) (reviewed in [32, 33]). A ratio below one indicates negative (purifying) selection, restricting amino acid changes that could interfere with a well-established protein function, while ω = 1 suggests that the gene evolves neutrally. On the other hand, a ratio significantly higher than one may indicate a selective advantage for fixation of amino acid changes. This can be considered evidence of positive selection associated with functional divergence after events such as gene duplications or changes in the environment (e.g. [34, 35]).
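The interpretation of ω described above can be illustrated with a minimal sketch. The function names and the tolerance band around 1 are hypothetical choices made only for this illustration; they are not part of the analysis reported here, and a point estimate above one is only a candidate signal until it is supported by a statistical test.

```python
def omega(dn: float, ds: float) -> float:
    """Return the nonsynonymous/synonymous substitution rate ratio dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Label a point estimate of omega.

    `tol` is an arbitrary illustrative band around 1 (an assumption of
    this sketch), not a statistical criterion.
    """
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "candidate positive"
    return "neutral"


# Hypothetical rates chosen only to show the call pattern (dN/dS = 0.04).
print(selection_regime(omega(0.02, 0.5)))
```

Whether an estimate of ω is significantly greater than one still has to be established by comparing nested models, which this sketch does not attempt.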
Using a relative rate test, we first investigated the rate variation between the branches flanking the transition event (1A23/1A25 and 1A21). A likelihood ratio test based on a three-taxon tree (consisting of 1A25 and 1A21 as test sequences and 1A29 as outgroup) compared the null hypothesis (equal rates for both branches following the transition event) with an alternative model with unconstrained rates. The null model was rejected (P = 2 × 10⁻⁶, df = 1), supporting a 5.6 times faster overall rate for the 1A21 lineage than for 1A25, which has maintained the Nit1C association. A rate increase is predicted when gene duplication is followed by functional divergence and could occur because of positive Darwinian selection or an increase in fixation of neutral mutations as a result of relaxation of functional constraints [37–40].
To test if positive selection acted along the nitrilase lineages flanking the cluster transition event, we used a maximum likelihood (ML) approach based on codon substitution models. These models take into account sequence features such as transition-transversion rate biases and codon usage variation, and allow testing hypotheses at specific branches in a phylogeny by employing heterogeneous ω values among sites and lineages. Positive selection can also be investigated using a parsimony-based method, and there is some controversy as to which of the two methods is more reliable [41–43].
Parameter estimates, likelihood scores and identified selected sites under various models. Branch numbers refer to Figure 4A. Parameters indicating positive selection are in bold. A likelihood ratio test (LRT) is used to compare a pair of nested models: one which accounts for sites with ω > 1 and one which does not (the null model). To accept or reject the ω > 1 hypothesis, twice the log-likelihood difference in the scores is compared with a χ² distribution with the degrees of freedom equal to the difference in the numbers of parameters between the two models. When ML detects lineages with ω > 1, an empirical Bayes analysis identifies sites under positive selection and calculates posterior probabilities that provide a measure of confidence for that prediction.
Positively selected sites
Likelihood Ratio Test
ω = 0.0418
M1:neutral (K = 2)
p0 = 0.298, p1 = 0.702
M3:discrete (K = 2)
p0 = 0.6, p1 = 0.4, ω0 = 0.012, ω1 = 0.098
p0 = 0.3, p1 = 0.70, p2+p3 = 0, ω2 = 0
p0 = 0.4, p1 = 0.6, p2+p3 = 0
ω0 = 0.098, ω1 = 0.012, ω2 = 0
p0 = 0.296, p1 = 0.688, p2+p3 = 0.016, ω2 = 129.6
Q157 (P = 0.77), Q203 (P = 0.999), T41, Q157, Y184, N200, Q203, R284 (P > 0.9)
LRT vs. M1 2Δl = 6.8, P = 0.03, df = 2
p0 = 0.356, p1 = 0.59, p2+p3 = 0.05
ω0 = 0.1, ω1 = 0.0125, ω2 = 9.7
LRT vs. M3 (K = 2) 2Δl = 6.2, P = 0.04, df = 2
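The likelihood ratio test described in the table legend can be sketched as follows. This is a minimal illustration using only the Python standard library, so it relies on closed-form χ² tail probabilities and handles df = 1 and df = 2 only; the function name `lrt_pvalue` is hypothetical and this is not the software used in the study.

```python
import math


def lrt_pvalue(two_delta_l: float, df: int) -> float:
    """Upper-tail chi-square probability for the statistic 2*(lnL_alt - lnL_null).

    Closed forms keep the sketch dependency-free, so only df = 1 and
    df = 2 are supported here.
    """
    if two_delta_l < 0:
        raise ValueError("2*delta-lnL should be non-negative for nested models")
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(two_delta_l / 2.0))
    if df == 2:
        # chi-square with 2 df is exponential with mean 2
        return math.exp(-two_delta_l / 2.0)
    raise NotImplementedError("only df = 1 or df = 2 in this sketch")


# Statistics taken from the two LRT entries in the table above.
for stat in (6.8, 6.2):
    print(f"2*delta-lnL = {stat}, df = 2, P = {lrt_pvalue(stat, 2):.3f}")
```

With df = 2 the two statistics give tail probabilities of roughly 0.033 and 0.045, close to the reported P = 0.03 and P = 0.04; any small difference presumably reflects rounding of the reported statistics.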
High-resolution structures are not yet available for nitrilases. However, the structures of two homologs, the C. elegans NitFhit protein and the Agrobacterium radiobacter N-carbamoyl-D-amino acid amidohydrolase (D-NCAase), have been solved [47, 48]. Both proteins form tetramers composed of two dimers and revealed a novel four-layer α-β-β-α fold. It is believed that all members of the nitrilase superfamily share this fold and the catalytic triad Glu-Lys-Cys in the active site. A three-dimensional model of 1A21 (the first nitrilase outside the Nit1C group) was derived based on the D-NCAase structure coordinates and used to map the location of the residues under positive selection at the cluster transition event. Three of those, T41, Q157 and Y184, were found to be buried within the protein, close to the catalytic triad (E44, K126, C160) (Figure 4B). Those residues could be involved in the overall conformation of the active site or may have a direct role in the reaction by interacting with the substrate. The other three positively selected sites, N200, Q203 and R284, cluster on the surface interface between the molecules of the dimer. That interface has been shown in D-NCAase to form a hydrophobic pocket that is responsible for the tight dimer structure. It is known that the quaternary structures of nitrilases and cyanide hydratases can be quite different, ranging in size from monomers and dimers to oligomers containing 10, 14 or more subunits. Substrate binding has also been shown to play a role in the formation of active enzyme oligomers. The three interface residues may play a role in aspects of quaternary structure and substrate specificity associated with the proposed neofunctionalization after the cluster transition event.
In this study, we combined genomic and biochemical analysis of a microbial enzyme family to understand evolutionary events that have shaped the genome organization and metabolism of organisms inhabiting various environments. It has long been known that bacterial genes often cluster based on linked functions. The gene location sometimes correlates with the order of the individual reactions in an enzymatic cascade or facilitates regulatory mechanisms of gene expression. Various models have been proposed to explain the formation and the evolutionary and physiological significance of operons and other gene clusters. Comparative genomic studies have shown that recognition of clusters can assist in functional annotation of novel genes but that clusters often break apart with increasing taxonomic distance [49–53]. The Nit1C cluster that we described is remarkable in that it is highly conserved across several bacterial phyla and is present in organisms that inhabit extremely diverse environments. While limited rearrangements have occurred in Nit1C, the preservation of all seven genes suggests there is selective pressure for maintenance of the entire gene cluster regardless of the genomic dynamics in that neighborhood. The internal rearrangements of Nit1C correlate with high level taxa (cyanobacteria, beta and gamma proteobacteria).
There is no experimental evidence for an involvement of any of the Nit1C genes in a known metabolic transformation. Two of the cluster genes have no close homologs or predictable biochemical activities, while the remaining genes, even though they have a predictable type of biochemical activity, belong to classes of enzymes that are involved in a wide range of transformations. Predicting function for remote homologs in the absence of experimental data is still a major difficulty in genomics [54, 55]. Having a defined cluster of genes such as Nit1C, likely to be functionally connected, sets the ground for future experimental genetic and biochemical investigation in search of its biological function.
Phylogenetically, the nitrilases from the Nit1C cluster appear strictly confined to a basal subset of subfamily 1 genes. More recent diversification of the genes in this subfamily has been accompanied by a change in the type of associated gene clusters and is paralleled by changes in biochemical properties of the nitrilases. While overall, subfamily 1 nitrilases are under strong purifying selection pressure, we detected a significant positive selection signal for the lineage following the transition event and identified several residues under such selection. This supports a hypothesis that a group of nitrilases diverged functionally from the Nit1C-type enzymes, became associated with other metabolic enzymes possibly as part of a novel pathway and advantageous mutations were fixed at specific sites under positive selection. Future studies of bacterial nitrilases and biochemical and genetic characterization of mutations at these residues are needed to better understand the determinants of substrate specificity and the functional differences between the nitrilase subfamilies.
Environmental microbial genomics has demonstrated its utility in studying large scale ecological processes [5, 6, 11], discovering valuable biocatalysts and reassembling the genomic and metabolic blueprint of natural microbial communities through shotgun sequencing [7, 8, 10]. Vast amounts of sequence data could potentially be used to answer a wide range of questions, although there are open questions regarding experimental design, data analysis and breadth of biological significance [4, 56, 57]. A broad environmental sampling from worldwide geographical locations coupled with experimental biochemical validation and comparative genomic analysis allowed us to test metabolic and evolutionary hypotheses difficult to approach by using sequence data from only a few environments.
The nitrilase sequences discovered from environmental DNA libraries are available from GenBank (AY487426-AY487562). Nitrilase sequences from sequenced bacterial genomes and their corresponding flanking genes were also obtained from GenBank, their names and accession numbers being indicated in the corresponding figures. For Verrucomicrobium spinosum DSM 4136, preliminary sequence data was obtained from The Institute for Genomic Research website and for Burkholderia fungorum and Rubrivivax gelatinosus from the DOE Joint Genome Institute website.
The biochemical characterization data used in this study for the environmental nitrilases tested on the non-physiological substrate hydroxyglutaronitrile has been published.
Sequence analysis and annotation
For the analysis of the ORFs flanking the nitrilase genes in known bacterial genomes we used the sequence coordinates available in the corresponding GenBank files. For the environmental DNA clones containing nitrilase genes we identified and annotated the other open reading frames (ORFs) contiguous with the nitrilase in the genomic insert using standard approaches. The inserts varied in size from 1 to 7 kb and in most cases contained enough information to identify at least one ORF in addition to the nitrilase gene. Annotation was derived based on available experimental or predicted function or biochemical activity using information associated with those genes in GenBank, PFAM, COG and KEGG databases.
Amino acid sequences were aligned in BioEdit followed by manual refinement. Sequence alignments are provided [see Additional files 4, 5]. Phylogenetic trees were constructed in PROML (PHYLIP 3.6) using maximum likelihood, the JTT amino acid substitution matrix, five global rearrangements with randomized sequence input order and among-site rate variation modeled with an eight rate category discrete approximation to a gamma distribution. The model parameters were estimated using TREE-PUZZLE 5.1. Branch support was obtained by bootstrapping (100 replicates).
Analysis for positive selection
A DNA sequence alignment for the nitrilase genes was obtained based on the protein alignment and is provided [see Additional file 6]. It was used for phylogenetic reconstructions in PAUP* 4.0 using maximum likelihood. The model of sequence evolution (GTR+I+G) was selected using Modeltest v.3.06. To test specific branches for possible rate changes we used Hy-Phy. The topologies for the DNA tree and the protein tree were identical.
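Deriving a DNA alignment from a protein alignment, as mentioned above, is commonly done by threading each coding sequence onto its gapped protein row, three nucleotides per residue. The text does not state which tool was used for this step, so the following is only a generic sketch; the function name `thread_codons` and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes the next codon from `cds`; each alignment gap
    becomes three gap characters, so codon boundaries stay in frame.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is shorter than 3 x number of residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy example: a four-column protein row with one gap.
print(thread_codons("M-KT", "ATGAAAACC"))  # ATG---AAAACC
```

Keeping gaps in multiples of three is what allows codon substitution models to be applied to the resulting alignment; this sketch does not check that the coding sequence actually translates to the given protein, and any trailing stop codon is simply ignored.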
The tree topology was used in the program codeml (PAML) to estimate dN/dS ratios based on maximum likelihood codon substitution models. Two categories of models were used, site-specific as well as branch-site models. Statistical comparisons between the results from different nested models were done using likelihood ratio tests.
A three-dimensional model for a clade 1 nitrilase (1A21) was obtained based on the structure of the homologous protein N-carbamoyl-D-amino acid amidohydrolase, using the Jackal software. Analysis of the model and mapping of amino acid residues involved in catalysis or subject to positive selection was done in PyMol.
We thank Jay Short and Michiel Noordewier for their support and guidance, the Diversa Research and Development team, especially, Dan Robertson, Jenny Chaplin and Grace Desantis for leading the nitrilase discovery and characterization projects, David Lomelin and Cosmin Deciu for bioinformatics analysis support and Mark Wall for the three dimensional model of the nitrilase. Special thanks also to Melvin Simon and Phil Hugenholtz for stimulating discussions and suggestions.
- Pace NR: A molecular view of microbial diversity and the biosphere. Science. 1997, 276: 734-740. 10.1126/science.276.5313.734.
- Rappe MS, Giovannoni SJ: The uncultured microbial majority. Annu Rev Microbiol. 2003, 57: 369-394. 10.1146/annurev.micro.57.030502.090759.
- Keller M, Zengler K: Tapping into microbial diversity. Nat Rev Microbiol. 2004, 2: 141-150. 10.1038/nrmicro819.
- Handelsman J: Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev. 2004, 68: 669-685. 10.1128/MMBR.68.4.669-685.2004.
- Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich SB, Gates CM, Feldman RA, Spudich JL, Spudich EN, DeLong EF: Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000, 289: 1902-1906. 10.1126/science.289.5486.1902.
- Hallam SJ, Putnam N, Preston CM, Detter JC, Rokhsar D, Richardson PM, DeLong EF: Reverse methanogenesis: testing the hypothesis with environmental genomics. Science. 2004, 305: 1457-1462. 10.1126/science.1100025.
- Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF: Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004, 428: 37-43. 10.1038/nature02340.
- Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM: Comparative metagenomics of microbial communities. Science. 2005, 308: 554-557. 10.1126/science.1107851.
- Robertson DE, Chaplin JA, DeSantis G, Podar M, Madden M, Chi E, Richardson T, Milan A, Miller M, Weiner DP, Wong K, McQuaid J, Farwell B, Preston LA, Tan X, Snead MA, Keller M, Mathur E, Kretz PL, Burk MJ, Short JM: Exploring nitrilase sequence space for enantioselective catalysis. Appl Environ Microbiol. 2004, 70: 2429-2436. 10.1128/AEM.70.4.2429-2436.2004.
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004, 304: 66-74. 10.1126/science.1093857.
- Beja O, Spudich EN, Spudich JL, Leclerc M, DeLong EF: Proteorhodopsin phototrophy in the ocean. Nature. 2001, 411: 786-789. 10.1038/35081051.
- Bielawski JP, Dunn KA, Sabehi G, Beja O: Darwinian adaptation of proteorhodopsin to different light intensities in the marine environment. Proc Natl Acad Sci U S A. 2004, 101: 14824-14829. 10.1073/pnas.0403999101.
- Man D, Wang W, Sabehi G, Aravind L, Post AF, Massana R, Spudich EN, Spudich JL, Beja O: Diversification and spectral tuning in marine proteorhodopsins. EMBO J. 2003, 22: 1725-1731. 10.1093/emboj/cdg183.
- Lorenz P, Eck J: Metagenomics and industrial applications. Nat Rev Microbiol. 2005, 3: 510-516. 10.1038/nrmicro1161.
- Robertson DE, Mathur E, Swanson RV, Marrs BL, Short JM: The discovery of new biocatalysts from microbial diversity. Society for Industrial Microbiology News. 1996, 46: 3-8.
- Schloss PD, Handelsman J: Biotechnological prospects from metagenomics. Curr Opin Biotechnol. 2003, 14: 303-310. 10.1016/S0958-1669(03)00067-3.
- Short JM: Recombinant approaches for accessing biodiversity. Nat Biotechnol. 1997, 15: 1322-1323. 10.1038/nbt1297-1322.
- Brenner C: Catalysis in the nitrilase superfamily. Curr Opin Struct Biol. 2002, 12: 775-782. 10.1016/S0959-440X(02)00387-1.
- O'Reilly C, Turner PD: The nitrilase family of CN hydrolysing enzymes - a comparative study. J Appl Microbiol. 2003, 95: 1161-1174. 10.1046/j.1365-2672.2003.02123.x.View ArticlePubMedGoogle Scholar
- Pace HC, Brenner C: The nitrilase superfamily: classification, structure and function. Genome Biol. 2001, 2: reviews0001.1-0001.9. 10.1186/gb-2001-2-1-reviews0001.
- Lathe WC III, Snel B, Bork P: Gene context conservation of a higher order than operons. Trends Biochem Sci. 2000, 25: 474-479. 10.1016/S0968-0004(00)01663-7.
- Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely LA, Koonin EV: Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 2002, 30: 2212-2223. 10.1093/nar/30.10.2212.
- Lawrence JG: Gene organization: selection, selfishness, and serendipity. Annu Rev Microbiol. 2003, 57: 419-440. 10.1146/annurev.micro.57.030502.090816.
- Price MN, Huang KH, Alm EJ, Arkin AP: A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res. 2005, 33: 880-892. 10.1093/nar/gki232.
- Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol. 2001, 313: 903-919. 10.1006/jmbi.2001.5080.
- Sofia HJ, Chen G, Hetzler BG, Reyes-Spindola JF, Miller NE: Radical SAM, a novel protein superfamily linking unresolved steps in familiar biosynthetic pathways with radical mechanisms: functional characterization using new analysis and information visualization methods. Nucleic Acids Res. 2001, 29: 1097-1106. 10.1093/nar/29.5.1097.
- Wybenga-Groot LE, Draker K, Wright GD, Berghuis AM: Crystal structure of an aminoglycoside 6'-N-acetyltransferase: defining the GCN5-related N-acetyltransferase superfamily fold. Structure Fold Des. 1999, 7: 497-507. 10.1016/S0969-2126(99)80066-5.
- Sulavik MC, Gambino LF, Miller PF: The MarR repressor of the multiple antibiotic resistance (mar) operon in Escherichia coli: prototypic member of a family of bacterial regulatory proteins involved in sensing phenolic compounds. Mol Med. 1995, 1: 436-446.
- Allard ST, Giraud MF, Naismith JH: Epimerases: structure, function and mechanism. Cell Mol Life Sci. 2001, 58: 1650-1665.
- Tanner ME: Understanding nature's strategies for enzyme-catalyzed racemization and epimerization. Acc Chem Res. 2002, 35: 237-246. 10.1021/ar000056y.
- Fretland AJ, Omiecinski CJ: Epoxide hydrolases: biochemistry and molecular biology. Chem Biol Interact. 2000, 129: 41-59. 10.1016/S0009-2797(00)00197-6.
- Yang Z, Bielawski JP: Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000, 15: 496-503. 10.1016/S0169-5347(00)01994-7.
- Yang Z: Inference of selection from multiple species alignments. Curr Opin Genet Dev. 2002, 12: 688-694. 10.1016/S0959-437X(02)00348-9.
- Bielawski JP, Yang Z: Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Struct Funct Genomics. 2003, 3: 201-212. 10.1023/A:1022642807731.
- Zhang J, Rosenberg HF, Nei M: Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci U S A. 1998, 95: 3708-3713. 10.1073/pnas.95.7.3708.
- Muse SV, Gaut BS: A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994, 11: 715-724.
- Ohno S: Evolution by Gene Duplication. 1970, Springer.
- Dykhuizen D, Hartl DL: Selective neutrality of 6PGD allozymes in E. coli and the effects of genetic background. Genetics. 1980, 96: 801-817.
- Rodriguez-Trelles F, Tarrio R, Ayala FJ: Convergent neofunctionalization by positive Darwinian selection after ancient recurrent duplications of the xanthine dehydrogenase gene. Proc Natl Acad Sci U S A. 2003, 100: 13413-13417. 10.1073/pnas.1835646100.
- Zhang J: Evolution by gene duplication: an update. Trends Ecol Evol. 2003, 18: 292-298. 10.1016/S0169-5347(03)00033-8.
- Suzuki Y, Nei M: Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol Biol Evol. 2002, 19: 1865-1869.
- Suzuki Y, Nei M: False positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol Biol Evol. 2004, 21: 914-921. 10.1093/molbev/msh098.
- Wong WS, Yang Z, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004, 168: 1041-1051. 10.1534/genetics.104.031153.
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155: 431-449.
- Endo T, Ikeo K, Gojobori T: Large-scale search for genes on which positive selection may operate. Mol Biol Evol. 1996, 13: 685-690.
- Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002, 19: 908-917.
- Pace HC, Hodawadekar SC, Draganescu A, Huang J, Bieganowski P, Pekarsky Y, Croce CM, Brenner C: Crystal structure of the worm NitFhit Rosetta Stone protein reveals a Nit tetramer binding two Fhit dimers. Curr Biol. 2000, 10: 907-917. 10.1016/S0960-9822(00)00621-7.
- Wang WC, Hsu WH, Chien FT, Chen CY: Crystal structure and site-directed mutagenesis studies of N-carbamoyl-D-amino-acid amidohydrolase from Agrobacterium radiobacter reveals a homotetramer and insight into a catalytic cleft. J Mol Biol. 2001, 306: 251-261. 10.1006/jmbi.2000.4380.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A. 1999, 96: 2896-2901. 10.1073/pnas.96.6.2896.
- Itoh T, Takemoto K, Mori H, Gojobori T: Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol Biol Evol. 1999, 16: 332-346.
- Tan K, Moreno-Hagelsieb G, Collado-Vides J, Stormo GD: A comparative genomics approach to prediction of new members of regulons. Genome Res. 2001, 11: 566-584. 10.1101/gr.149301.
- Osterman A, Overbeek R: Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol. 2003, 7: 238-251. 10.1016/S1367-5931(03)00027-9.
- Tamames J: Evolution of gene order conservation in prokaryotes. Genome Biol. 2001, 2 (6): research0020. 10.1186/gb-2001-2-6-research0020.
- Makarova KS, Koonin EV: Comparative genomics of Archaea: how much have we learned in six years, and what's next?. Genome Biol. 2003, 4: 115. 10.1186/gb-2003-4-8-115.
- Babbitt PC: Definitions of enzyme function for the structural genomics era. Curr Opin Chem Biol. 2003, 7: 230-237. 10.1016/S1367-5931(03)00028-0.
- DeLong EF: Microbial population genomics and ecology: the road ahead. Environ Microbiol. 2004, 6: 875-878. 10.1111/j.1462-2920.2004.00668.x.
- Rodriguez-Valera F: Environmental genomics, the big picture. FEMS Microbiol Lett. 2004, 231: 153-158. 10.1016/S0378-1097(04)00006-0.
- The Institute for Genomic Research. 2005, [http://www.tigr.org]
- DOE Joint Genome Institute. 2005, [http://www.jgi.doe.gov/]
- Hall T: BioEdit. 2005, [http://www.mbio.ncsu.edu/BioEdit/bioedit.html]
- Felsenstein J: PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
- Swofford DL: PAUP*: phylogenetic analysis using parsimony (*and other methods). 1998, Sinauer Associates, Sunderland, Mass., [http://paup.csit.fsu.edu/about.html]
- Posada D, Crandall KA: MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998, 14: 817-818. 10.1093/bioinformatics/14.9.817.
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
- Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15: 568-573.
- Xiang SZ: Jackal: A Protein Structure Modeling Package. 2005, [http://honiglab.cpmc.columbia.edu/programs/jackal]
- DeLano WL: The PyMOL Molecular Graphics System. 2002, DeLano Scientific, San Carlos, CA, USA, [http://www.pymol.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.