Rapid birth-and-death evolution of the xenobiotic metabolizing NAT gene family in vertebrates with evidence of adaptive selection
© Sabbagh et al; licensee BioMed Central Ltd. 2013
Received: 30 November 2012
Accepted: 27 February 2013
Published: 7 March 2013
The arylamine N-acetyltransferases (NATs) are a unique family of enzymes widely distributed in nature that play a crucial role in the detoxification of aromatic amine xenobiotics. Considering the temporal changes in the levels and toxicity of environmentally available chemicals, the metabolic function of NATs is likely to be under adaptive evolution to broaden or change substrate specificity over time, making NATs a promising subject for evolutionary analyses. In this study, we trace the molecular evolutionary history of the NAT gene family during the last ~450 million years of vertebrate evolution and define the likely role of gene duplication, gene conversion and positive selection in the evolutionary dynamics of this family.
A phylogenetic analysis of 77 NAT sequences from 38 vertebrate species retrieved from public genomic databases shows that NATs are phylogenetically unstable genes, characterized by frequent gene duplications and losses even among closely related species, and that concerted evolution only played a minor role in the patterns of sequence divergence. Local signals of positive selection are detected in several lineages, probably reflecting response to changes in xenobiotic exposure. We then put a special emphasis on the study of the last ~85 million years of primate NAT evolution by determining the NAT homologous sequences in 13 additional primate species. Our phylogenetic analysis supports the view that the three human NAT genes emerged from a first duplication event in the common ancestor of Simiiformes, yielding NAT1 and an ancestral NAT gene which in turn, duplicated in the common ancestor of Catarrhini, giving rise to NAT2 and the NATP pseudogene. Our analysis suggests a main role of purifying selection in NAT1 protein evolution, whereas NAT2 was predicted to mostly evolve under positive selection to change its amino acid sequence over time. These findings are consistent with a differential role of the two human isoenzymes and support the involvement of NAT1 in endogenous metabolic pathways.
This study provides unequivocal evidence that the NAT gene family has evolved under a dynamic process of birth-and-death evolution in vertebrates, consistent with previous observations made in fungi.
KeywordsN-acetyltransferases Gene family Vertebrates Gene duplication Adaptive selection
Arylamine N-acetyltransferases (NATs) are xenobiotic metabolizing enzymes found in a wide range of species across all major clades of life (bacteria, archaea, eukaryotes), except plants [1–3]. This unique family of enzymes catalyses the transfer of an acetyl group from acetyl-coenzyme A (acetyl-CoA) to the terminal nitrogen of arylamine, hydrazine and heterocyclic amine compounds. Biochemical, structural and functional studies of the last decades have increased our understanding of the potentially diverse roles of NATs in endogenous and xenobiotic metabolism, establishing their key role in cellular homeostasis as well as in gene-environment interactions [4–6].
Humans have two functional NAT genes (NAT1 and NAT2) and one pseudogene (NATP), located within 200 kilobases (kb) on chromosome 8 [7, 8]. Both NAT1 and NAT2 genes have intronless coding regions with a single coding exon of 870 base pairs (bp) that produces a 290 amino acid (34 kDa) protein. They share 81% nucleotide identity, which translates to 87% identity at the amino acid level. The pseudogene NATP displays high sequence identity to NAT1 and NAT2 (79%) but contains multiple frameshift and premature stop codon mutations leading to loss of function . Despite their high degree of sequence identity, NAT1 and NAT2 encode isoenzymes with distinct substrate specificities, tissue distribution and expression levels during development. NAT1 is widely expressed in most tissues and cell types and preferentially acetylates substrates such as p-aminobenzoic acid (pABA), p-aminosalicylic acid (pASA) and p-aminobenzoylglutamate (pABG). NAT2 has a more restricted expression profile, being predominantly expressed in the liver, small intestine and colon, and prefers bulkier substrates such as sulphamethazine, isoniazid, procainamide, and dapsone (reviewed in ). The widespread tissue distribution of NAT1, its selectivity for pABG as well as its early expression in development (as early as the blastocyst stage ), have suggested that NAT1 may have an endogenous role in addition to the metabolism of xenobiotics. Acetylating the folate breakdown product pABG is now generally accepted to be an endogenous role of this enzyme which might be important in normal embryonic development [12, 13]. The importance of NAT1 and NAT2 in the metabolism of drugs and in the activation of common environmental carcinogens has led to a plethora of molecular epidemiological studies that have shown associations of NAT gene polymorphisms with individual drug response and susceptibility to cancers linked to arylamine exposure [14–16].
Because of their role in the detoxification of exogenous chemicals present in the diet and the environment, human NAT1 and NAT2 genes have long been considered as likely targets of population-specific selective pressures, a hypothesis further fuelled by the intriguing patterns of geographic differentiation of their major alleles [17–24]. In particular, the NAT2 gene, which exhibits a well established acetylation polymorphism leading to a phenotype classification of individuals into rapid and slow acetylators, has been the subject of numerous studies examining its nucleotide sequence variation in a wide range of ethnically diverse populations and determining the role of natural selection in shaping genetic variation at this locus. Several surveys provided clear evidence for a correlation between the prevalence of slow acetylators in human populations and the subsistence strategy adopted by their ancestors in the last 10,000 years, suggesting that a slower rate of acetylation may have provided a selective advantage in populations shifting from foraging to pastoralism/agriculturalism in the Neolithic period [18, 20, 21, 24]. It is thus currently held that the Neolithic transition triggered significant changes in dietary exposure to environmental chemicals that modified the selective regime affecting the NAT2 acetylation pathway. Likewise, and more generally, considering the temporal fluctuations in the kinds and levels of environmentally available xenobiotic compounds over evolutionary time, the metabolic function of NATs is likely to be under adaptive evolution and the NAT genes could be broadening or changing substrate specificity in response to changing environmental conditions. Yet, in contrast to the numerous population genetic studies carried out for human polymorphic NAT genes, no study to date has investigated the role of positive selection in the expansion and functional diversification of the NAT genetic loci at a deeper evolutionary time scale.
The aim of the present work was to thoroughly examine the molecular evolutionary history of the NAT gene family during the last ~450 million years. A phylogenetic analysis of 77 NAT homologous nucleotide sequences from 38 vertebrate species retrieved from genomic databases was used to determine the likely contribution of gene duplication, gene conversion and positive selection to the evolutionary dynamics of this family. A special emphasis was put on the study of the last ~85 million years of primate NAT evolution by determining the NAT coding sequences in 13 additional primate species. The results of this work greatly improved our understanding of the evolutionary regime that has shaped the NAT protein-coding sequence: it allowed us to clarify the paralogous and orthologous relationships among the NAT genes in vertebrate species, date the gene-duplication events giving rise to the human NAT gene family, and identify episodes of adaptive evolution at specific sites and domains of the protein.
Phylogenetic analysis of vertebrate NAT sequences
To study the molecular evolution of the NAT gene family in vertebrates, we retrieved all available NAT-homologous sequences from public genomic databases and compiled a dataset of 77 NAT coding sequences from 38 species representing all major vertebrate taxa (‘vertebrate dataset’). These 77 sequences included 68 full-length open reading frames (ORFs) with sizes ranging from 864 to 870 bp and 9 partial coding sequences containing 15 to 166 missing nucleotides. In most species, we identified one or two distinct NAT sequences but in some others, up to five functional NAT genes were found (7, 2 and 1 species harbored 3, 4 and 5 distinct sequences, respectively). A second NAT sequence was identified in Dipodomys ordii and Vicugna pacos but the corresponding ORFs were missing 300 and 500 nucleotides, respectively, so they were not included in the final dataset. No NAT-like sequences were found in the current assembly of the dog genome (CanFam2.0), confirming the previous findings of Trepanier and colleagues  who were unable to detect NAT homologues in any of 25 domestic dogs and 8 wild canids studied. The proportion of sequence identity to human NAT1 varied from 51.2% (Danio rerio NAT4) to 98.9% (Gorilla gorilla NAT1) at the nucleotide level. The alignment result of the 77 nucleotide sequences is provided in Additional file 1. Three sequences showed a 3-bp insertion at positions 385–387 resulting in an additional amino acid in the protein (either a phenylalanine in Ornithorhynchus anatinus NAT1 or a proline in Gallus gallus NAT2 and Meleagris gallopavo NAT2), while eleven sequences from seven species carried a deletion of either 3, 6 or 9 bp, resulting in a loss of either 1, 2 or 3 contiguous amino acids. In all sequences, the three residues composing the ‘Cys-His-Asp’ catalytic triad (Cys68, His107, Asp122), which is essential to the NAT enzymatic activity, were conserved, indicating that the encoded enzymes are likely to be functional.
Significant recombination events detected among NAT genes using six different methods
NAT1 : NAT2
9.06 E-3–1.00 E-6
GeneConv, RDP, MaxChi, Chimaera, Bootscan, SiScan
NAT1 : NAT2
8.87 E-3–6.39 E-6
Geneconv, RDP, MaxChi, Chimaera, Bootscan
NAT1 : NAT2
NAT1 : NAT2
6.06 E-4–1.39 E-4
NAT1 : NAT2
NAT1 : NAT2
9.32 E-3–2.93 E-4
RDP, Chimaera, Bootscan, SiScan
NAT1 : NAT2
NAT2 : NAT3
Maximum-likelihood analysis of selective pressures in vertebrate NAT sequences
Results of PAML analyses for the vertebrate dataset
Proportion of sites with ω > 1 (average ω for these sites)
Positively selected codonsa
All vertebrates ( n = 77)
M1a vs. M2a
P = 0.0005
0.017 (ω = 2.0)
M7 vs. M8
P < 0.0001
0.050 (ω = 1.3)
97**, 98*, 104*, 286*
M0 vs. free ratios
P < 0.0001
Mammals ( n = 55)
M1a vs. M2a
P < 0.0001
0.027 (ω = 2.3)
M7 vs. M8
P < 0.0001
0.066 (ω = 1.6)
97**, 98*, 104*, 214*, 286**
M0 vs. free ratios
P < 0.0001
Primates ( n = 17)
M1a vs. M2a
P = 1.0
M7 vs. M8
P = 1.0
M0 vs. free ratios
P = 0.13
When restricting the analysis to the 55 mammalian sequences of the vertebrate data set, similar results were obtained. Model M8 provided the best fit to the data with 6.6% of sites having a positive ω of 1.6 (Table 2). The same codons were pinpointed to be under positive selection with PP ≥ 95%, plus an additional site located in the acetyl-CoA binding pocket (codon 214). However, when the analysis was restricted to the 17 primate sequences, no significant evidence of positive selection was detected (P > 0.05), neither at the site nor at the branch level. This suggests either that the selective pressure that recurrently drove positive selection of vertebrate NATs became relaxed in the primate lineage, or that the statistical power provided by the small number of sequences available for analysis may be insufficient to get significant results when performing the LRT tests implemented in PAML. These tests have indeed been shown to lack power when data contain only few numbers of slightly divergent sequences . In such a case, sampling a larger number of lineages is considered as the best way to improve the accuracy and power of LRTs and has been shown to cause a spectacular rise in power, even when sequence divergence is low [30, 31]. Therefore, we sought to gain power by determining the NAT coding sequences in 13 additional primate species, bringing up to 22 the number of species available for analysis in the primate clade.
NAT sequence evolution and analysis of selective pressures in the primate lineage
A more thorough investigation of the evolutionary history of the NAT gene family was conducted in primates by extending the analysis to 13 additional primate species (including 3 hominoids, 8 Old World and 2 New World monkeys) in which NAT nucleotide sequences were determined for the first time. Two functional NAT genes were found in all 13 species, whereas a third NAT sequence, abolished by frameshift mutations and highly similar to the human pseudogene NATP, was found in ten of them. In all, the full primate data set comprised 59 nucleotide sequences from 22 primate species, including 17 database sequences and 42 newly generated ones. Whereas functional sequences were all of the same length (a full NAT ORF of 870 bp), pseudogene sequences displayed considerable length variation due to the significant number of indels disrupting NAT ORF (ranging from 865 bp for Colobus guereza NATP to 881 bp for Gorilla gorilla NATP), resulting in a multiple alignment of 890 sites (Additional file 5). These indels involve both single and multiple nucleotides. All NATP sequences have in common a single 1-bp deletion in position 77, which is likely to be the first frameshift mutation that occurred in the ancestral NATP coding sequence.
The phylogenetic analysis also demonstrates that NAT1 and NAT2 paralogs found in Microcebus murinus and Tarsius syrichta are the result of independent gene duplication events that occurred after the divergence of these species from the simian lineage and after the divergence of Strepsirrhini and Tarsiiformes. Because no gene conversion was detected with any of the six recombination detection methods for any pair of sequences in the primate dataset, we can exclude concerted evolution as a possible explanation for the greater similarity of NAT paralogs within M. murinus and T. syrichta species. Otolemur garnetti is the only primate species with a single NAT gene identified. However, genomic coverage is still limited (2-fold) for this species and we cannot exclude that additional NAT copies may exist in its genome.
Results of PAML analyses for the primate dataset
Proportion of sites with ω > 1 (average ω for these sites)
Positively selected codonsa
All NAT coding sequences ( n = 43)
M1a vs. M2a
P = 1.0
M7 vs. M8
P = 0.17
M0 vs. free ratios
P = 0.006
Orthologous sequences to human NAT1 ( n = 19)
M1a vs. M2a
P = 1.0
M7 vs. M8
P = 1.0
M0 vs. free ratios
P = 0.04
Orthologous sequences to human NAT2 ( n = 19)
M1a vs. M2a
P = 0.01
0.038 (ω = 4.8)
M7 vs. M8
P = 0.004
0.041 (ω = 4.6)
M0 vs. free ratios
P = 0.77
A possible limitation of our analysis is that it relies on the assumption that the different orthologous and paralogous sequences compared have equivalent functions. The inclusion in the alignment of homologous proteins with different biological functions which may have experienced different selective regimes can indeed give rise to incorrect estimations of ω. This could be the case if, as observed for the human NAT1 and NAT2 paralogs, the two ancestral NAT1 and NAT2 paralogs diverged in function soon after the duplication event, before the first speciation leading to Catarrhini and Platyrrhini. Thus, in order to maintain a correct comparative framework and to avoid the inclusion of paralogous sequences where changes in protein function occurred, we conducted separate analyses of the sets of orthologous NAT1 sequences and orthologous NAT2 sequences found in Simiiformes.
For the data set including the 19 sequences orthologous to human NAT1, models incorporating positive selection (M2a and M8) provided no better fit to the data than models including only purifying selection and neutrally evolving sites (M1a, M7) (Table 3). The null model M0, which assumes only one ω for all amino acid sites, could not be rejected (P > 0.05). Under this model, the estimate of ω was 0.19, indicating that NAT1 sequences have been subject to strong purifying selection to maintain protein function over time. The selective regime has been variable among species since different ω values along different branches were significantly different from each other (P = 0.04). Interestingly, despite the dominant role of purifying selection in NAT1 protein evolution, a recent episode of positive selection was detected specifically and exclusively in the human lineage (branch b with ω = ∞; Figure 5). Five codon sites were found to show nonsynonymous substitutions along this branch (E46D, I149V, E195K, A214S, E276Q), whereas no synonymous codon changes occurred.
By contrast, site-specific codeml analysis performed on the set of sequences orthologous to human NAT2 yielded very different results with a rather strong evidence of positive selection (M2a vs. M1a: P = 0.01; M8 vs. M7: P < 0.004, Table 3), indicating that NAT2 has been subject to diversifying selection to change its amino acid sequence over time (Table 3). Parameter estimates under model M8, which provided the best fit to the data, suggested 4.1% of sites to be under positive selection with ω = 4.6. Estimates from posterior probability provided significant support of positive selection for codons 173 and 191 (PP ≥ 95%) under M8. It is noteworthy that these two codons were also identified as positively selected with PP ranging from 77% to 87% in the three NAT2 clades pointed out by multiple branch-site analysis (C7, C8 and C9, Figure 5). Codon site 129, which plays a key role in determining NAT2 substrate selectivity, was identified with lower confidence (PP ≥ 90%). Although the free-ratio model was not a better fit to the data than a single ω ratio for the entire NAT2 phylogeny (2ΔlnL = 27.52, P > 0.05), some branches did exhibit ω > 1 (branches c, d and e; Figure 5).
Variation of selective pressures along the NAT protein sequence
The well-known relationship between the primary sequence of eukaryotic NATs and the structural and functional features of these enzymes prompted us to examine the variation of ω along the NAT protein sequence in the different datasets investigated: (i) the vertebrate dataset, including the full set of 77 vertebrate sequences as well as a subset of 55 mammalian sequences, and (ii) the primate dataset, including the full set of 43 primate sequences as well as the subset of 19 simian NAT2 orthologous sequences. We looked at the variation of the posterior mean of ω as estimated by the codeml program under the site-specific model that best fitted the data (Figure 6). Moreover, we tested for significant differences in the mean ω value between different parts of the protein (domain I, domain II, interdomain, domain III, 17-residue insert, C-terminal tail) and different sets of sites (residues involved either in CoA binding or in substrate binding, Additional file 6: Table S2 and Additional file 7: Table S3). Our results revealed a significant heterogeneity of ω between the four protein domains (domain I, domain II, interdomain, and domain III) in the four datasets analysed (Kruskal-Wallis P values ≤ 0.03). More specifically, the first domain was found to evolve under more selective constraints, with a mean ω value being significantly lower than those observed for domain II, domain III or the rest of the sequence (Mann-Whitney P < 0.05, Additional file 6: Table S2 and Additional file 7: Table S3). By contrast, domains II and III did not show any differences in their mean ω value when compared to the rest of the sequence (Mann-Whitney P > 0.05). These results were consistently observed in the four datasets examined. Besides, the interdomain region, 17-residue insert and C-terminal tail were all found to evolve in vertebrates under more relaxed selective constraints than the rest of the sequence, as reflected by significantly higher ω values (Mann-Whitney P = 0.03, P = 0.02, P = 0.007, respectively). A similar finding was observed for the C-terminal tail in the mammalian (P = 0.06), primate (P = 0.01) and simian NAT2 (P = 0.03) datasets, consistent with the major role of this region in determining arylamine substrate specificity. Interestingly, the mean ω value of the set of sites involved in CoA binding was strikingly high when compared to the mean ω value at the global protein level in vertebrates (0.70 vs. 0.34, Mann-Whitney P = 0.002), mammals (0.77 vs. 0.40, P = 0.003) and primates (0.73 vs. 0.47, P = 0.011). Conversely, adaptive molecular evolution was evidenced in the simian NAT2 dataset for those sites involved in substrate binding compared to the full set of sites in the protein (1.31 vs. 0.66, P = 0.044).
The general role of NATs in the detoxification and metabolic activation of aromatic amine xenobiotics, ranging from dietary components to common environmental toxins and pharmaceuticals, has been well documented (reviewed in [34–36]). The possibility that the NAT enzymes could be broadening or changing their substrate specificity in accordance to the high diversity of xenobiotics compounds environmentally available suggests that their metabolic function could be under adaptive evolution and makes them a promising subject for evolutionary analyses. Numerous studies have indeed identified the signature of different selective pressures in genes involved in the metabolism of exogenous substances [37–44]. However, in contrast to the evolutionary processes affecting the NAT2 gene in humans, which have been the subject of extensive research, little is known about the role of molecular adaptation in the evolution and diversification of the NAT gene family as a whole on a longer evolutionary time scale.
The ongoing sequencing of entire genomes from various organisms is providing unparalleled opportunity to trace the evolutionary history of the NAT gene family by enabling the inference of the phylogenetic relationships among NAT sequences over a wide range of taxa. Several previous surveys of public genomic databases have retrieved NAT-like sequences and documented the distribution of NAT genes across all major clades of life [1, 2, 45]. The last survey by Glenn et al.  provided an exhaustive annotation of NAT-homologous sequences recovered through inspection of 2445 genomes encompassing all major taxa from bacteria and archaea to protists and fungi, to animals. The same investigators also performed phylogenetic analyses of the retrieved NAT-homologous protein sequences, providing a broad perspective of NAT evolution. Another study, focusing specifically on the phylogeny of NATs in fungi, has also been published since then . In the present study, we expanded the previous survey by Glenn et al.  by performing a comprehensive search of NAT-like sequences in the genomes of vertebrate species and compiled an up-to-date dataset of 77 vertebrate NAT sequences from 38 distinct species, among which 26 were identified for the first time (Figure 1). In contrast to the previous dataset of 55 vertebrate sequences in Glenn et al. , we decided to exclude one lizard and three fish sequences from analysis (Anolis carolinensis NAT3, Oryzias latipes NAT3, Fugu rubripes NAT1, Tetraodon nigroviridis NAT1) because they aligned poorly with other vertebrate NATs and introduced too many gaps and alignment ambiguities. The deduced protein sequence of these four genes clustered together in a monophyletic clade with three invertebrate NAT protein sequences at the basal position of the vertebrate NAT phylogeny reconstructed by Glenn et al. . The authors suspected the possibility that this clustering might be an artifact stemming from long branch attraction, supporting our decision to exclude these particular sequences from our phylogenetic analysis of vertebrate NAT genes. It is also important to note that, because the vertebrate genomes we considered are in various stages of draft sequencing, assembly, and annotation, we cannot exclude that the present survey missed additional extant NAT sequences. This warrants some caution in the conclusions that may be drawn from the present phylogenetic analysis. The increased availability of genome sequence data from diversified taxa is likely to continue to improve our understanding of the evolutionary history of the NAT gene family in vertebrates.
Our phylogenetic analysis of vertebrate NAT sequences nevertheless demonstrated that the NAT gene family has evolved under a dynamic process called birth-and-death evolution. This process, which operates through three major mechanisms - neofunctionalization, subfunctionalization, and pseudogenization, is thought to be an important source of genetic diversity and evolutionary change that affords functional diversification over short timescales . Our results are in accordance with the previous observations by Thomas  that genes encoding enzymes that function as xenobiotic detoxifiers are often phylogenetically unstable genes that undergo rapid birth-and-death evolution, possibly in response to changing environmental conditions. It is noteworthy that Martins et al.  found similar patterns of gain and loss in fungi, arguing for a complex dynamics of these genes in a broader range of organisms.
Concerted evolution via interlocus gene conversion is increasingly recognized as a major feature of evolution in small multigene families (e.g., [37, 47–49]). By homogeneizing the sequence of paralogous gene copies, the converted paralogs may come to resemble one another more than they do to orthologous sequences in other species. Therefore, gene conversion may potentially alter the relationships among paralogs and lead to the conclusion of independent duplications instead of multiple gene conversions in multiple species. The work carried on the CYP1A1 subfamily  provides a good example of how gene conversion can obfuscate gene orthology relationships and lead to incorrect conclusions when based solely on the results of traditional phylogenetic analyses. However, identification of gene conversion events remains a challenging task as the methods commonly employed for detecting such events can have a high false-negative rate, particularly when gene conversion is frequent and covers a large portion of the duplicates . It is thus advocated to consider the occurrence of gene conversion using various recombination detection algorithms and to combine information coming from both phylogenetic analysis and fine-scale synteny maps. We have used six different methods to evidence possible recombination events between paralogous pairs. Only four events, each involving a NAT1-NAT2 pair, were detected by at least two methods (one each in the fish Oryzias latipes and rat Rattus norvegicus, and two in the bat Myotis lucifugus, Table 1), thus suggesting that concerted evolution has played only a minor role in the diversification of the vertebrate NAT gene family. This conclusion is further supported by: (i) the identical phylogenetic relationships inferred from NAT sequences of which the regions putatively involved in gene conversion events were truncated (Additional file 2: Figure S1), and (ii) the fine-scale synteny comparisons between the three avian species analyzed (Figure 2), as well as between human and mouse (Additional file 3: Figure S2), which evidenced multiple gene duplications events occurring independently in specific lineages. Thus, our analysis of vertebrate NAT sequence data suggests that gene conversion is unlikely to have played a major role in the patterns of divergence of NAT sequences and rather supports a scenario of multiple independent gene duplications.
In humans, there are two NAT isoenzymes encoded at two polymorphic loci. NAT2 polymorphisms modify individual cancer risk and drug response, or susceptibility to adverse drug reactions [15, 16, 51]. Although less well-established, human NAT1 also exhibits genetic polymorphism and several-albeit as yet inconclusive-studies have suggested that variant NAT1 genotypes are associated with susceptibility to a number of diseases including various cancers  and birth defects [52–56]. Given their high level of polymorphism and their association with variable responses to environmental toxins and drugs, these genes are good candidates to test for positive selection and several studies have investigated the possible role of natural selection in shaping genetic variation at these loci (e.g., [17, 23] etc.). Besides their role in phase II metabolism of xenobiotics, several studies have explored the possible endogenous roles of these enzymes. While no endogenous substrate has been identified to date for NAT2, several lines of evidence support the role of human NAT1 in folate catabolism, thereby providing an explanation for its postulated association with congenital defects linked to a disruption in folate metabolism . The widespread tissue distribution of NAT1 and its early expression in development further support a physiological role of this enzyme which might be essential to normal embryonic development (see  for review). Therefore, since NAT1 and NAT2 appear to have distinct functional roles in humans, it is worthwhile to investigate whether these genes have experienced different selective regimes throughout evolution. The identification of the true orthologous sequences to human NAT1 and human NAT2 in Simiiformes and their determination in 13 additional simian species (Figure 4) provided us the necessary power to enable a separate analysis of the two sets of sequences and evaluate the changes in selective pressures experienced by the two gene copies after their functional divergence. Interestingly, two distinct evolutionary patterns emerged for the two paralogs (Table 3). Our analysis suggested a dominant role of purifying selection in NAT1 protein evolution, acting for a conservation of biochemical functions which is in agreement with the role of NAT1 in endogenous metabolism and homeostasis. Note, however, that a signal of positive selection was detected in the human lineage for this gene and that the pattern of evolution of NAT1 is likely to be mosaic with some evidence of positive selection in certain lineages (in humans but maybe also in several other species not included in the present analysis). By contrast, in most of the species investigated, NAT2 was predicted to evolve under diversifying selection to change its amino acid sequence over time, probably in response to changes in xenobiotic exposure. This finding is consistent with the observations made at a population level within the human species by previously published studies which supported an adaptive evolution of the NAT2 gene through either balancing or directional selection [17, 18, 20, 21, 23, 24]. Similarly, the low level of polymorphism reported at NAT1 within the human species [17, 23] is consistent with the action of purifying selection evidenced at the interspecies level in the present study. Evolutionary analyses thus strongly support a differential role of the two isoenzymes and the involvement of NAT1 in endogenous metabolic pathways. Although the folate catabolite pABG is the only endogenous substrate identified to date, one cannot dismiss the possibility that other as yet unknown endogenous substrates and physiological roles may exist for NAT1.
Besides an evaluation of the selective forces acting on members of the NAT gene family, our ML-based phylogenetic analysis allowed us to estimate the strength of natural selection acting at a codon level and to shed light on episodes of adaptive evolution at specific sites and domains of the protein. Several amino acid sites were predicted to be under positive selection with high PP (PP ≥ 95%) throughout vertebrate evolution: codons 97, 98, 104, 214, 286. Interestingly, four of these codon positions were shown to be involved in CoA binding according to the recent structural studies performed on human NAT1 and NAT2 isoenzymes . The residues 97P and 98 V are involved in hydrophobic interactions with the adenine ring of CoA, whereas the amide nitrogen of residue 104 G and the hydroxyl group of 214 T form hydrogen bonds with the pyrophosphate group of CoA. It is noteworthy that the fifth codon position predicted to be under adaptive evolution is the site of a well-known polymorphism in the human NAT2 protein: G286E (c.857 G > A), which defines the NAT2*7 slow haplotype series, is one of the four major nonsynonymous substitutions encountered in human populations and is particularly common in Asia . Functional studies of the G286E variant in mammalian cells demonstrated reduced affinity to both substrate and cofactor acetyl-CoA, resulting in reduced catalytic activity towards some substrates (such as sulfamethazine and dapsone) but not others (such as 2-aminofluorene and isoniazid) . Codon 286 is located on the C-terminal tail in the third domain directly adjacent to the active site. Because the C-terminal tail has an important role in defining the size and shape of the active site cavity, a significant amino acid change at this position is likely to alter active site access and substrate selectivity, which is consistent with the substrate-dependent activity changes observed experimentally for the G286 variant. Such a significant change to a C-terminal residue adjacent to the active site is also likely to affect acetyl-CoA binding and 68C acetylation . Note that the branch-site tests identified four additional sites (11, 100, 102 and 272) as having evolved under positive selection along particular lineages of the vertebrate phylogeny or in a specific clade (B4, B12 and C2, Figure 3). Interestingly, codon 102 is also known to be involved in CoA binding and codon 272 is located in the C-terminus region of the protein. Moreover, two codon sites (173 and 191) were pinpointed to be under diversifying selection throughout simian NAT2 evolution. While the functional and/or structural significance of these two codons have not yet been explored, they are likely to be important in the function of the NAT2 protein and further investigations are warranted to define their potential relevance. In contrast to these positively selected codons, very low values of ω, suggesting rather strong functional constraints, were observed for the three sites of the catalytic triad: 0.054, 0.051 and 0.050 for Cys68, His107, Asp122, respectively, in the vertebrate dataset. These three sites were all conserved in the entire set of NAT coding sequences considered, thereby confirming the importance of these three residues in the activation of the active site cysteine residue. Interestingly, domain I was found to be more constrained than the other domains of the protein, consistent with the lower variability displayed by this region (mean Shannon entropy (H) = 0.70) as compared to domain II (H = 0.84) and domain III (H = 0.93). Conversely, higher ω values were observed for the interdomain, 17-residue insert and C-terminus regions of the protein. In particular, the C-terminal tail was shown to evolve under more relaxed selective constraints in the four datasets examined and many sites pinpointed to be under adaptive evolution were located in this region. This is consistent with the association of this region with different acetyl-CoA binding properties and substrate specificities.
In this study, we provided unequivocal evidence that the NAT gene family has undergone a complex history of duplications and possibly gene losses, as observed for many multigene families encoding xenobiotic-metabolizing enzymes (e.g., [39–41, 43]). We also found evidence of positive selection for amino acid change in the NAT protein sequence in many lineages throughout vertebrate evolution, suggesting that diversification of the NAT gene family is achieved by a combination of gene duplication and selection-driven divergence in sequence. By then focusing on the evolution of this family in primates, we revealed different selective regimes for the two primate paralogs NAT1 and NAT2, consistent with a differential role of the two isoenzymes. Our evolutionary analyses strongly support the key role of the C-terminal region which should be a primary focus of future functional studies.
Collection of vertebrate NAT sequences from genomic databases
All available vertebrate genomes in various stages of draft sequencing, assembly, and annotation were searched for the presence of putative NAT genes by TBLASTN searches in the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) and Ensembl (http://www.ensembl.org) databases, using the protein sequence of human NAT1 as the query and an E-value cut-off of 10-5. Once we identified the first set of vertebrate sequences using the human probe, subsequent searches were conducted by using the sequences of representative species within a given taxonomic class or order to identify additional sequences in the same group of taxa. More specific searches were also performed using the NAT protein-coding sequence identified in one organism to look for additional NAT paralogs in its genome. We discarded sequences with more than 200 missing nucleotides and only sequences with an identified ORF were kept in the final data set. Sequences from the Syrian hamster (Mesocricetus auratus) and the Chinese hamster (Cricetulus griseus) were obtained from GenBank (accession numbers: U05271 Mesocricetus auratus Nat1, L24912 Mesocricetus auratus Nat2, NM_001244413 Cricetulus griseus Nat2). Fine-scale synteny data around the NAT genes locus were retrieved from Genomicus v63.01  and the UCSC genome browser .
To name the NAT sequences identified in vertebrate species, we followed the official gene nomenclature for non-human NAT genes in eukaryotes and prokaryotes (http://mbg.duth.gr/non-humannatnomenclature/): names were given according to the descending identity between each reported NAT sequence and human NAT1. The symbol NAT1 was thus assigned either to the NAT sequence demonstrating the highest identity to human NAT1 or to the only NAT gene found in a specific genome. In rodents, the nomenclature followed the consensus for mouse Nat genes, with rodent Nat2 representing the sequence bearing higher identity to human NAT1 and rodent Nat1 being more similar to human NAT2.
Amplification, cloning and sequencing of NATs in primates
Based on available DNAs, we selected a representative number of primate specimens belonging to 13 distinct genera, including 3 hominoids, 8 Old World monkeys and 2 New World monkeys, for which the NAT homologous sequences were determined: Pan paniscus (bonobo), Hylobates lar (white-handed gibbon), Nomascus gabriellae (yellow-cheeked gibbon, Macaca sylvanus (barbary macaque), Mandrillus sphinx (mandrill), Cercopithecus diana (diana monkey), Chlorocebus tantalus (tantalus monkey), Erythrocebus patas (patas monkey), Allenopithecus nigroviridis (allen’s swamp monkey), Colobus guereza (guereza), Presbytis cristatus (silvered-leaf monkey), Pithecia pithecia (white-faced saki), and Cebus apella (tufted capuchin).
We designed amplification primers based on conserved sequences in the multiple alignments of database sequences retrieved from the genomes of Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta, and Papio hamadryas. Separate alignments for sequences homologous to human NAT1, NAT2 and NATP were as considered. Each newly obtained sequence was added to the alignments to refine primer specificity. The list of all primers used in the present study is provided in Additional file 8: Table S4. PCR reactions were carried out in 25 μl final volume, containing 25 ng of genomic DNA, 10 mM of each primer, 7.5 mM MgCl2, 2.5 mM dNTP mix, 1U GoTaq® DNA polymerase (Promega), and H2O. PCR conditions were as follows: denaturation at 95°C for 10 min, followed by 38 cycles at 94°C for 1 min (denaturation), 55°C for 50 s (annealing temperature-variable depending on the set of primers used, see Additional file 9: Table S5) and 72°C for 2 min (elongation), with a final 10 min elongation step at 72°C. Depending on the primate species, different primer pairs and annealing temperatures were used to optimize PCR conditions (Additional file 6: Table S2). PCR products were purified from agarose gels with the QIAquick Gel extraction kit and were sequenced in both directions by Millegen company (Labège-France) on an ABI 3730XL capillary sequencer (Applied Biosystems). In the case of non-specific amplifications, PCR products were cloned using the pGEM®-T Easy Vector System (Promega), according to manufacturer’s instructions, and both strands of the cloned fragments were sequenced using the M13F and M13R universal primers.
Protein and codon sequence alignments
After discarding a few very divergent sequences that aligned poorly (Anolis carolinensis NAT3, Oryzias latipes NAT3, Fugu rubripes NAT1, Tetraodon nigroviridis NAT1), the remaining set of NAT ORF sequences were translated using the ExPASy Translate tool (http://web.expasy.org/translate/) and the amino acid sequences were aligned using ClustalW  and Multalin . Both programs yielded similar results. The amino acid alignments were then converted to the corresponding alignments of codons using the PAL2NAL program . These nucleotide alignments were used as inputs for the construction of phylogenetic trees, the analysis of gene conversion and tests of positive selection.
Phylogenetic trees were built from two nucleotide alignments: the first one included the 77 vertebrate NAT sequences retrieved from genomic databases (referred to as the ‘vertebrate dataset’) and the second one was composed of 58 primate sequences, including both the 17 database sequences and the 41 newly generated ones (referred to as the ‘primate dataset’). Phylogenetic relationships of NAT sequences were reconstructed using the maximum likelihood method implemented in PhyML v2.4.4 , after determining the optimal model of sequence substitution with Modeltest 3.04 . For both datasets, a general time reversible model with a proportion of invariant sites and gamma distributed among-site rate variation (GTR + I + G) was used, as selected by the Akaike information criterion. The initial tree was determined by neighbor-joining (BIONJ). Branch supports were estimated from 1,000 PhyML bootstrap replicates using the same settings. Trees were displayed and colored with Bonsai 1.2 (http://depts.washington.edu/jtlab/software/softwareIndex.html).
Detection of gene conversion
Possible recombination and gene conversion events were detected using GeneConv version 1.18 [66, 67]. The program tests for gene conversion by finding identical fragments between pairs of sequences in a nucleotide alignment. Global and pairwise P-values are calculated in order to assess the statistical significance of the observed fragment lengths. Global P-values are more conservative because they are based on the comparison of each fragment with all possible fragments for the entire alignment. GeneConv was run using a group structure defining a distinct group for each set of NAT coding sequences within a species so as to look only for within-species gene conversion events and adjust Bonferroni corrections accordingly. All the default settings were used except for the mismatch penalty (set to 2) and the number of random permutations to compute global and pairwise P-values (set to 1,000,000). Only gene conversion events with global permutation P-values ≤ 0.01 were retained. In order to avoid confounding effects due to strong selection or recent mutation, GeneConv was also run using only silent sites.
Intragenic recombination events were also detected using five other methods implemented in the RDP3 v.3.44 software package : two of these are phylogenetic methods, which infer recombination when different parts of the genome result in discordant topologies (RDP  and Bootscan ), while the other three are nucleotide substitution methods, which examine the sequences either for a significant clustering of substitutions or for a fit to an expected statistical distribution: MaxChi , Chimaera , and SiScan . Common settings for all methods were to consider sequences as linear, to require phylogenetic evidence, to polish breakpoints and to check alignment consistency. RDP was run with default settings modified to use internal and external reference sequences and a window size of 30 bp. Default settings were used for MaxChi and Chimaera analyses. Bootscan default settings were modified to perform 1000 bootstrap replicates with cutoff percentage set to 95%. SiScan was run using a window size of 200 bp and a step size of 50 bp. P-values from individual algorithms were Bonferroni corrected to account for both multiple testing within each method and for multiple tests by different methods. Only events with corrected P-values < 0.01 were considered and the detected events were filtered to eliminate predictions of interspecies recombination.
Maximum likelihood tests of positive selection
The ratio of nonsynonymous (amino acid altering) to synonymous (silent) substitution rates (dN/dS or ω) was used to evaluate the selective pressures acting on the NAT coding region, expecting dN/dS = 1 for neutrality, dN/dS < 1 for purifying selection, and dN/dS >1 for positive selection. The analysis was performed using the codeml tool of the PAML package version 4.2b [26, 74], which allows pairs of nested models with and without positive selection to be tested in a likelihood ratio framework to determine if adaptive evolution has occurred. We used two pairs of site-evolution models to test whether some sites (codons) are under positive selection: (1) the M1a (Nearly Neutral) and M2a (Positive Selection) models and (2) the M7 (beta) and M8 models (beta & ω). M1a allows sites to fall into two categories with ω < 1 (purifying selection) and ω = 1 (neutral evolution), whereas model M2a extends model M1a with a further class with ω > 1 (positive selection) [75, 76]. M7 allows ten classes of ω sites between 0 and 1 according to a beta distribution with parameters p and q, whereas M8 adds an additional class with ω possibly > 1 as M2a does. Candidate sites for positive selection were pinpointed using the BEB approach described by Yang et al. , which calculates the posterior probability that each codon site falls into a site class affected by positive selection (in models M2a and M8) and provides the posterior ω values which represent the strength of natural selection acting on each residue. The heterogeneity of the dN/dS ratio among lineages (branch-based test) was tested by comparing the free-ratio model, which assumes as many ω parameters as the number of branches in the tree, to the one-ratio null model M0 which assumes only one ω value for all branches . Finally, we used the branch-site test of selection developed by Zhang et al.  to detect episodic positive selection along prespecified lineages on a phylogeny that affects only a few sites in the protein-coding gene. By focusing on individual amino acid residues and particular lineages, this test should provide greater power than site-based tests, which average substitution rates over all branches on the phylogeny, or branch-based tests, which average over all codons in the gene [79–81]. In this test, the branch being tested for positive selection is called the foreground branch, and the other branches in the tree are called the background branches. In the null model, all codons in the foreground and background branches are assumed to evolve under negative selection (ω < 1) or neutrally (ω = 1), whereas in the alternative model A, the foreground branch also includes sites evolving under positive selection (ω > 1). As described above, the posterior probability that each site falls into the positively selected class was calculated using the BEB method. Branch-site tests were also carried out at the clade level by performing a multiple branch site analysis where all the branches within a clade are grouped as the foreground lineage and the remaining branches in the tree are grouped as background . This approach allows testing adaptive evolution in multiple branches while avoiding the assumptions of the clades analysis model (CmC, ) which has been shown to have a high false-positive rate and to be highly unreliable when faced with moderate among-site ω variation .
In all PAML analyses, model comparisons were made using LRTs. For each of the LRTs, twice the log-likelihood difference between alternative and null model (2ΔlnL) was compared to critical values from a χ2 distribution with degrees of freedom equal to the difference in the number of estimated parameters between both models . In branch-site tests, we used the conservative distribution to conduct the LRT and the Bonferroni correction was employed to control the type I error rate when comparing multiple foreground lineages. To avoid local optima and ensure that the global peak is found, each model was run three times with different ω starting values. The three runs always produced identical results. To identify codon sites under positive selection along specific branches of particular interest, we inferred ancestral sequences using baseml (model = HKY85, Mgene = 4; cleandata = 0) and identified nonsynonymous changes by pairwise comparison of sequences. In all PAML analyses, codons containing alignment gaps due to insertion-deletion events among species were treated as missing data and not considered in pairwise comparisons where one species contained sequence and another species did not. We used the F3 × 4 codon model of Goldman and Yang  which is very similar to the nucleotide model HKY85 with different substitution rates and base frequencies for the three codon positions. The analyses were applied to several different partitions of the data, as described in the Results section.
Bayes empirical Bayes
Likelihood ratio test
Open reading frame
The authors deeply thank Frederic Magne for helpful discussions and technical assistance, as well as the anonymous reviewers for their valuable comments and suggestions that have greatly improved the paper.
- Vagena E, Fakis G, Boukouvala S: Arylamine N-acetyltransferases in prokaryotic and eukaryotic genomes: a survey of public databases. Curr Drug Metab. 2008, 9: 628-660. 10.2174/138920008785821729.PubMedView ArticleGoogle Scholar
- Glenn AE, Karagianni EP, Ulndreaj A, Boukouvala S: Comparative genomic and phylogenetic investigation of the xenobiotic metabolizing arylamine N-acetyltransferase enzyme family. FEBS Lett. 2010, 584: 3158-3164. 10.1016/j.febslet.2010.05.063.PubMedView ArticleGoogle Scholar
- Martins M, Dairou J, Rodrigues-Lima F, Dupret J, Silar P: Insights into the phylogeny or arylamine N-acetyltransferases in fungi. J Mol Evol. 2010, 71: 141-152. 10.1007/s00239-010-9371-x.PubMedView ArticleGoogle Scholar
- Sim E, Walters K, Boukouvala S: Arylamine N-acetyltransferases: from structure to function. Drug Metab Rev. 2008, 40: 479-510. 10.1080/03602530802186603.PubMedView ArticleGoogle Scholar
- Sim E, Lack N, Wang CJ, Long H, Westwood I, Fullam E, Kawamura A: Arylamine N-acetyltransferases: structural and functional implications of polymorphisms. Toxicology. 2008, 254: 170-183. 10.1016/j.tox.2008.08.022.PubMedView ArticleGoogle Scholar
- Stanley LA, Sim E: Update on the pharmacogenetics of NATs: structural considerations. Pharmacogenomics. 2008, 9: 1673-1693. 10.2217/14622418.104.22.1683.PubMedView ArticleGoogle Scholar
- Hickman D, Risch A, Buckle V, Spurr NK, Jeremiah SJ, McCarthy A, Sim E: Chromosomal localization of human genes for arylamine N-acetyltransferase. Biochem J. 1994, 297: 441-445.PubMed CentralPubMedView ArticleGoogle Scholar
- Matas N, Thygesen P, Stacey M, Risch A, Sim E: Mapping AAC1 AAC2 and AACP the genes for arylamine N-acetyltransferases carcinogen metabolising enzymes on human chromosome 8p22 a region frequently deleted in tumours. Cytogenet Cell Genet. 1997, 77: 290-295. 10.1159/000134601.PubMedView ArticleGoogle Scholar
- Blum M, Grant DM, McBride W, Heim M, Meyer UA: Human arylamine N-acetyltransferase genes: isolation, chromosomal localization, and functional expression. DNA Cell Biol. 1990, 9: 193-203. 10.1089/dna.1990.9.193.PubMedView ArticleGoogle Scholar
- Boukouvala S, Fakis G: Arylamine N-acetyltransferases: what we learn from genes and genomes. Drug Metab Rev. 2005, 37: 511-564. 10.1080/03602530500251204.PubMedView ArticleGoogle Scholar
- Smelt VA, Upton A, Adjaye J, Payton MA, Boukouvala S, Johnson N, Mardon HJ, Sim E: Expression of arylamine N-acetyltransferases in pre-term placentas and in human pre-implantation embryos. Hum Mol Genet. 2000, 9: 1101-1107. 10.1093/hmg/9.7.1101.PubMedView ArticleGoogle Scholar
- Wakefield L, Cornish V, Long H, Griffiths WJ, Sim E: Deletion of a xenobiotic metabolizing gene in mice affects folate metabolism. Biochem Biophys Res Commun. 2007, 364: 556-560. 10.1016/j.bbrc.2007.10.026.PubMed CentralPubMedView ArticleGoogle Scholar
- Wakefield L, Boukouvala S, Sim E: Characterisation of CpG methylation in the upstream control region of mouse Nat2: evidence for a gene-environment interaction in a polymorphic gene implicated in folate metabolism. Gene. 2010, 452: 16-21. 10.1016/j.gene.2009.12.002.PubMedView ArticleGoogle Scholar
- Hein DW: Molecular genetics and function of NAT1 and NAT2: role in aromatic amine metabolism and carcinogenesis. Mutat Res. 2002, 506–507: 65-77.PubMedView ArticleGoogle Scholar
- Agúndez JAG: Polymorphisms of human N-acetyltransferases and cancer risk. Curr Drug Metab. 2008, 9: 520-531. 10.2174/138920008784892083.PubMedView ArticleGoogle Scholar
- Ladero JM: Influence of polymorphic N-acetyltransferases on non-malignant spontaneous disorders and on response to drugs. Curr Drug Metab. 2008, 9: 532-537. 10.2174/138920008784892038.PubMedView ArticleGoogle Scholar
- Patin E, Barreiro LB, Sabeti PC, Austerlitz F, Luca F, Sajantila A, Behar DM, Semino O, Sakuntabhai A, Guiso N: Deciphering the ancient and complex evolutionary history of human arylamine N-acetyltransferase genes. Am J Hum Genet. 2006, 78: 423-436. 10.1086/500614.PubMed CentralPubMedView ArticleGoogle Scholar
- Patin E, Harmant C, Kidd KK, Kidd J, Froment A, Mehdi SQ, Sica L, Heyer E, Quintana-Murci L: Sub-Saharan African coding sequence variation and haplotype diversity at the NAT2 gene. Hum Mutat. 2006, 27: 720-10.1002/humu.9438.PubMedView ArticleGoogle Scholar
- Fuselli S, Gilman RH, Chanock SJ, Bonatto SL, De Stefano G, Evans CA, Labuda D, Luiselli D, Salzano FM, Soto G: Analysis of nucleotide diversity of NAT2 coding region reveals homogeneity across Native American populations and high intra-population diversity. Pharmacogenomics J. 2007, 7: 144-152. 10.1038/sj.tpj.6500407.PubMed CentralPubMedView ArticleGoogle Scholar
- Luca F, Bubba G, Basile M, Brdicka R, Michalodimitrakis E, Rickards O, Vershubsky G, Quintana-Murci L, Kozlov AI, Novelletto A: Multiple advantageous amino acid variants in the NAT2 gene in human populations. PLoS One. 2008, 3: e3136-10.1371/journal.pone.0003136.PubMed CentralPubMedView ArticleGoogle Scholar
- Magalon H, Patin E, Austerlitz F, Hegay T, Aldashev A, Quintana-Murci L, Heyer E: Population genetic diversity of the NAT2 gene supports a role of acetylation in human adaptation to farming in Central Asia. Eur J Hum Genet. 2008, 16: 243-251. 10.1038/sj.ejhg.5201963.PubMedView ArticleGoogle Scholar
- Sabbagh A, Langaney A, Darlu P, Gérard N, Krishnamoorthy R, Poloni ES: Worldwide distribution of NAT2 diversity: implications for NAT2 evolutionary history. BMC Genet. 2008, 9: 21-PubMed CentralPubMedView ArticleGoogle Scholar
- Mortensen HM, Froment A, Lema G, Bodo JM, Ibrahim M, Nyambo TB, Omar SA, Tishkoff SA: Characterization of genetic variation and natural selection at the arylamine N-acetyltransferase genes in global human populations. Pharmacogenomics. 2011, 12: 1545-1558. 10.2217/pgs.11.88.PubMed CentralPubMedView ArticleGoogle Scholar
- Sabbagh A, Darlu P, Crouau-Roy B, Poloni ES: Arylamine N-acetyltransferase 2 (NAT2) genetic diversity and traditional subsistence: a worldwide population survey. PLoS One. 2011, 6: e18507-10.1371/journal.pone.0018507.PubMed CentralPubMedView ArticleGoogle Scholar
- Trepanier LA, Ray K, Winand NJ, Spielberg SP, Cribb AE: Cytosolic arylamine N-acetyltransferase (NAT) deficiency in the dog and other canids due to an absence of NAT genes. Biochem Pharmacol. 1997, 54: 73-80. 10.1016/S0006-2952(97)00140-8.PubMedView ArticleGoogle Scholar
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.PubMedView ArticleGoogle Scholar
- Wu H, Dombrovsky L, Tempel W, Martin F, Loppnau P, Goodfellow GH, Grant DM, Plotnikov AN: Structural basis of substrate-binding specificity of human arylamine N-acetyltransferases. J Biol Chem. 2007, 282: 30189-30197. 10.1074/jbc.M704138200.PubMedView ArticleGoogle Scholar
- Oda A, Kobayashi K, Takahashi O: Computational study of the three-dimensional structure of N-acetyltransferase 2-acetyl coenzyme a complex. Biol Pharm Bull. 2010, 33: 1639-1643. 10.1248/bpb.33.1639.PubMedView ArticleGoogle Scholar
- Dupret JM, Goodfellow GH, Janezic SA, Grant DM: Structure-function studies of human arylamine N-acetyltransferases NAT1 and NAT2; functional analysis of recombinant NAT1/NAT2 chimaeras expressed in Escherichia coli. J Biol Chem. 1994, 269: 26830-26835.PubMedGoogle Scholar
- Anisimova M, Bielawski JP, Yang Z: Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol. 2001, 18: 1585-1592. 10.1093/oxfordjournals.molbev.a003945.PubMedView ArticleGoogle Scholar
- Anisimova M, Bielawski JP, Yang Z: Accuracy and power of bayes prediction of amino acid sites under positive selection. Mol Biol Evol. 2002, 19: 950-958. 10.1093/oxfordjournals.molbev.a004152.PubMedView ArticleGoogle Scholar
- Westwood IM, Kawamura A, Fullam E, Russell AJ, Davies SG, Sim E: Structure and mechanism of arylamine N-acetyltransferases. Curr Top Med Chem. 2006, 6: 1641-1654. 10.2174/156802606778108979.PubMedView ArticleGoogle Scholar
- Walraven JM, Trent JO, Hein DW: Computational and experimental analyses of mammalian arylamine N-acetyltransferase structure and function. Drug Metab Dispos. 2007, 35: 1001-1007. 10.1124/dmd.107.015040.PubMed CentralPubMedView ArticleGoogle Scholar
- Butcher NJ, Boukouvala S, Sim E, Minchin RF: Pharmacogenetics of the arylamine N-acetyltransferases. Pharmacogenomics J. 2002, 2: 30-42. 10.1038/sj.tpj.6500053.PubMedView ArticleGoogle Scholar
- Sim E, Pinter K, Mushtaq A, Upton A, Sandy J, Bhakta S, Noble M: Arylamine N-acetyltransferases: a pharmacogenomic approach to drug metabolism and endogenous function. Biochem Soc Trans. 2003, 31: 615-619.PubMedView ArticleGoogle Scholar
- Sim E, Westwood I, Fullam E: Arylamine N-acetyltransferases. Expert Opin Drug Metab Toxicol. 2007, 3: 169-184. 10.1517/17425255.3.2.169.PubMedView ArticleGoogle Scholar
- Goldstone HM, Stegeman JJ: A revised evolutionary history of the CYP1A subfamily: gene duplication gene conversion and positive selection. J Mol Evol. 2006, 62: 708-717. 10.1007/s00239-005-0134-z.PubMedView ArticleGoogle Scholar
- da Fonseca RR, Antunes A, Melo A, Ramos MJ: Structural divergence and adaptive evolution in mammalian cytochromes P450 2C. Gene. 2007, 387: 58-66. 10.1016/j.gene.2006.08.017.PubMedView ArticleGoogle Scholar
- Goldstone JV, Goldstone HM, Morrison AM, Tarrant A, Kern SE, Woodin BR, Stegeman JJ: Cytochrome P450 1 genes in early deuterostomes (tunicates and sea urchins) and vertebrates (chicken and frog): origin and diversification of the CYP1 gene family. Mol Biol Evol. 2007, 24: 2619-2631. 10.1093/molbev/msm200.PubMedView ArticleGoogle Scholar
- Qiu H, Taudien S, Herlyn H, Schmitz J, Zhou Y, Chen G, Roberto R, Rocchi M, Platzer M, Wojnowski L: CYP3 phylogenomics: evidence for positive selection of CYP3A4 and CYP3A7. Pharmacogenet Genomics. 2008, 18: 53-66. 10.1097/FPC.0b013e3282f313f8.PubMedView ArticleGoogle Scholar
- Thomas JH: Rapid birth-death evolution specific to xenobiotic cytochrome P450 genes in vertebrates. PLoS Genet. 2007, 3: e67-10.1371/journal.pgen.0030067.PubMed CentralPubMedView ArticleGoogle Scholar
- Zawaira A, Matimba A, Masimirembwa C: Prediction of sites under adaptive evolution in cytochrome P450 sequences and their relationship to substrate recognition sites. Pharmacogenet Genomics. 2008, 18: 467-476. 10.1097/FPC.0b013e3282f9b68e.PubMedView ArticleGoogle Scholar
- da Fonseca RR, Johnson WE, O'Brien SJ, Vasconcelos V, Antunes A: Molecular evolution and the role of oxidative stress in the expansion and functional diversification of cytosolic glutathione transferases. BMC Evol Biol. 2010, 10: 281-10.1186/1471-2148-10-281.PubMed CentralPubMedView ArticleGoogle Scholar
- Yan J, Cai Z: Molecular evolution and functional divergence of the cytochrome P450 3 (CYP3) Family in Actinopterygii (ray-finned fish). PLoS One. 2010, 5: e14276-10.1371/journal.pone.0014276.PubMed CentralPubMedView ArticleGoogle Scholar
- Rodriguez-Lima F, Dupret JM: In silico sequence analysis of arylamine N-acetyltransferases: evidence for an absence of lateral gene transfer from bacteria to vertebrates and first description of paralogs in bacteria. Biochem Biophys Res Commun. 2002, 293: 783-792. 10.1016/S0006-291X(02)00299-1.View ArticleGoogle Scholar
- Nei M, Rooney AP: Concerted and Birth-and-Death Evolution of Multigene Families. Annu Rev Genet. 2005, 39: 121-152. 10.1146/annurev.genet.39.073003.112240.PubMed CentralPubMedView ArticleGoogle Scholar
- Michelson AM, Orkin SH: Boundaries of gene conversion within the duplicated human alpha-globin genes. Concerted evolution by segmental recombination. J Biol Chem. 1983, 258: 15245-15254.PubMedGoogle Scholar
- Rudikoff S, Fitch WM, Heller M: Exon-specific gene correction (conversion) during short evolutionary periods: homogenization in a two-gene family encoding the beta-chain constant region of the T-lymphocyte antigen receptor. Mol Biol Evol. 1992, 9: 14-26.PubMedGoogle Scholar
- Aguileta G, Bielawski JP, Yang Z: Gene conversion and functional divergence in the beta-globin gene family. J Mol Evol. 2004, 59: 177-189. 10.1007/s00239-004-2612-0.PubMedView ArticleGoogle Scholar
- Fawcett JA, Innan H: Neutral and non-neutral evolution of duplicated genes with gene conversion. Genes. 2011, 2: 191-209. 10.3390/genes2010191.PubMed CentralPubMedView ArticleGoogle Scholar
- Makarova SI: Human N-acetyltransferases and drug-induced hepatotoxicity. Curr Drug Metab. 2008, 9: 538-545. 10.2174/138920008784892047.PubMedView ArticleGoogle Scholar
- Lammer EJ, Shaw GM, Iovannisci DM, Finnell RH: Periconceptional multivitamin intake during early pregnancy genetic variation of acetyl-N-transferase 1 (NAT1) and risk for orofacial clefts. Birth Defects Res A Clin Mol Teratol. 2004, 70: 846-852. 10.1002/bdra.20081.PubMedView ArticleGoogle Scholar
- Jensen LE, Hoess K, Whitehead AS, Mitchell LE: The NAT1 C1095A polymorphism maternal multivitamin use and smoking and the risk of spina bifida. Birth Defects Res A Clin Mol Teratol. 2005, 73: 512-516. 10.1002/bdra.20143.PubMedView ArticleGoogle Scholar
- Carmichael SL, Shaw GM, Yang W, Iovannisci DM, Lammer E: Risk of limb deficiency defects associated with NAT1, NAT2, GSTT1, GSTM1, and NOS3 genetic variants, maternal smoking, and vitamin supplement intake. Am J Med Genet A. 2006, 140: 1915-1922.PubMedView ArticleGoogle Scholar
- Jensen LE, Hoess K, Mitchell LE, Whitehead AS: Loss of function polymorphisms in NAT1 protect against spina bifida. Hum Genet. 2006, 120: 52-57. 10.1007/s00439-006-0181-6.PubMedView ArticleGoogle Scholar
- Zhu Y, States JC, Wang Y, Hein DW: Functional effects of genetic polymorphisms in the N-acetyltransferase 1 coding and 3' untranslated regions. Birth Defects Res A Clin Mol Teratol. 2011, 91: 77-84. 10.1002/bdra.20763.PubMed CentralPubMedView ArticleGoogle Scholar
- Zang Y, Doll MA, Zhao S, States JC, Hein DW: Functional characterization of single-nucleotide polymorphisms and haplotypes of human N-acetyltransferase 2. Carcinogenesis. 2007, 28: 1665-1671. 10.1093/carcin/bgm085.PubMed CentralPubMedView ArticleGoogle Scholar
- Hein DW: N-acetyltransferase SNPs: emerging concepts serve as a paradigm for understanding complexities of personalized medicine. Expert Opin Drug Metab Toxicol. 2009, 5: 353-366. 10.1517/17425250902877698.PubMed CentralPubMedView ArticleGoogle Scholar
- Muffato M, Louis A, Poisnel C, Roest Crollius H: Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics. 2010, 26: 1119-1121. 10.1093/bioinformatics/btq079.PubMed CentralPubMedView ArticleGoogle Scholar
- Karolchik D, Hinrichs AS, Kent WJ: The UCSC Genome Browser. Curr Protoc Bioinformatics. 2009, Chapter 1: Unit14-Google Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.PubMed CentralPubMedView ArticleGoogle Scholar
- Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988, 16: 10881-10890. 10.1093/nar/16.22.10881.PubMed CentralPubMedView ArticleGoogle Scholar
- Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006, 34: W609-W612. 10.1093/nar/gkl315.PubMed CentralPubMedView ArticleGoogle Scholar
- Guindon S, Gascuel O: A simple fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.PubMedView ArticleGoogle Scholar
- Posada D, Crandall KA: MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998, 14: 817-818. 10.1093/bioinformatics/14.9.817.PubMedView ArticleGoogle Scholar
- Sawyer S: Statistical tests for detecting gene conversion. Mol Biol Evol. 1989, 6: 526-538.PubMedGoogle Scholar
- Sawyer S: GENECONV: a computer package for the statistical detection of gene conversion. 1999, St Louis: Washington University, [http://www.math.wustl.edu/~sawyer/geneconv]Google Scholar
- Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P: RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics. 2010, 26: 2462-2463. 10.1093/bioinformatics/btq467.PubMed CentralPubMedView ArticleGoogle Scholar
- Martin D, Rybicki E: RDP: detection of recombination amongst aligned sequences. Bioinformatics. 2000, 16: 562-563. 10.1093/bioinformatics/16.6.562.PubMedView ArticleGoogle Scholar
- Martin DP, Posada D, Crandall KA, Williamson CA: Modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retroviruses. 2005, 21: 98-102. 10.1089/aid.2005.21.98.PubMedView ArticleGoogle Scholar
- Maynard Smith J: Analyzing the mosaic structure of genes. J Mol Evol. 1992, 34: 126-129.Google Scholar
- Posada D, Crandall KA: Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA. 2001, 98: 13757-13762. 10.1073/pnas.241370698.PubMed CentralPubMedView ArticleGoogle Scholar
- Gibbs MJ, Armstrong JS, Gibbs AJ: Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics. 2000, 16: 573-582. 10.1093/bioinformatics/16.7.573.PubMedView ArticleGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.PubMedGoogle Scholar
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155: 431-449.PubMed CentralPubMedGoogle Scholar
- Wong WSW, Yang Z, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004, 168: 1041-1051. 10.1534/genetics.104.031153.PubMed CentralPubMedView ArticleGoogle Scholar
- Yang Z, Wong WSW, Nielsen R: Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005, 22: 1107-1118. 10.1093/molbev/msi097.PubMedView ArticleGoogle Scholar
- Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15: 568-573. 10.1093/oxfordjournals.molbev.a025957.PubMedView ArticleGoogle Scholar
- Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479. 10.1093/molbev/msi237.PubMedView ArticleGoogle Scholar
- Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002, 19: 908-917. 10.1093/oxfordjournals.molbev.a004148.PubMedView ArticleGoogle Scholar
- Yang Z, dos Reis M: Statistical properties of the branch-site test of positive selection. Mol Biol Evol. 2011, 28: 1217-1228. 10.1093/molbev/msq303.PubMedView ArticleGoogle Scholar
- Smith SA, Jann OC, Haig D, Russell GC, Werling D, Glass EJ, Emes RD: Adaptive evolution of Toll-like receptor 5 in domesticated mammals. BMC Evol Biol. 2012, 12: 122-10.1186/1471-2148-12-122.PubMed CentralPubMedView ArticleGoogle Scholar
- Bielawski JP, Yang Z: A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J Mol Evol. 2004, 59: 121-132.PubMedView ArticleGoogle Scholar
- Weadick CJ, Chang BS: An improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes. Mol Biol Evol. 2012, 29: 1297-1300. 10.1093/molbev/msr311.PubMedView ArticleGoogle Scholar
- Goldman N, Yang ZA: Codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11: 725-736.PubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.