A genomic view of the NOD-like receptor family in teleost fish: identification of a novel NLR subfamily in zebrafish

Background: A large multigene family of NOD-like receptor (NLR) molecules has been described in mammals and implicated in immunity and apoptosis. Little information, however, exists concerning this gene family in non-mammalian taxa. The current study therefore provides an in-depth investigation of this gene family in lower vertebrates, including extensive phylogenetic comparison of zebrafish NLRs with orthologs in tetrapods, and analysis of their tissue-specific expression.
Results: Three distinct NLR subfamilies were identified by mining genome databases of various non-mammalian vertebrates; the first subfamily (NLR-A) resembles mammalian NODs, the second (NLR-B) resembles mammalian NALPs, while the third (NLR-C) appears to be unique to teleost fish. In zebrafish, the NLR-A and NLR-B subfamilies contain five and six genes respectively. The third subfamily is large, containing several hundred NLR-C genes, many of which are predicted to encode a C-terminal B30.2 domain. This subfamily most likely evolved from a NOD3-like molecule. Gene predictions for zebrafish NLRs were verified using sequence derived from ESTs or direct sequencing of cDNA. Reverse-transcriptase (RT)-PCR analysis confirmed expression of representative genes from each subfamily in selected tissues.
Conclusion: Our findings confirm the presence of multiple NLR gene orthologs, which form a large multigene family in teleostei. Although the functional significance of the three major NLR subfamilies is unclear, we speculate that the conservation and abundance of NLR molecules in all teleostei genomes reflect an essential role in cellular control, apoptosis or immunity throughout bony fish.


Background
In recent years, a family of molecules with roles in apoptosis and immune regulation has been discovered in mammalian genomes. This gene family is known under several names, including the CATERPILLER (CLR), NACHT, NOD-LRR or NOD-like receptor (NLR) family, and comprises two major subfamilies of NOD and NALP molecules, along with three divergent members: IPAF, the MHC class II transactivator (CIITA) and neuronal apoptosis inhibitory protein (NAIP) [1,2]. Official names have recently been assigned to many members of this family by the HUGO Gene Nomenclature Committee (HGNC) [3] using the NLR prefix (Table 1). NLRs are recognized by the presence of three specific domains: an effector domain at the N-terminus that is involved in protein:protein interactions, a central NACHT (or nucleotide binding oligomerization/NOD) domain and a C-terminal leucine-rich repeat (LRR) domain. They are, therefore, structurally similar to disease resistance (R) proteins found in plants that are well known for their anti-microbial activities [4]. In humans, 22 NLRs have been described, including 14 NALPs, with a PYRIN effector domain, and 5 NODs, whose effector domain is typically a caspase recruitment domain (CARD).
The functions of the NLRs are presently not well defined. However, based on their structural characteristics, these molecules are thought to be expressed in the cytosol of immune-related cells, and have been implicated in autoimmune diseases and responses to bacterial [5] or viral molecules [6], supporting their importance in host immunity. Some of these molecules activate caspase-1 [7], while others initiate [8,9] or inhibit NF-κB signaling [10]. These two molecular pathways are fundamental to a molecular platform known as the inflammasome [7], which coordinates the production and processing of important inflammatory cytokines such as interleukin (IL)-1, IL-18 and IL-33 in mammals. Proteins that assemble in this caspase-1 inflammasome vary according to the cell type and stimulus [11]. Other molecules (e.g. caspase 11) that are necessary for inflammasome function are thought to be generated or recruited as a result of crosstalk between NLR and toll-like receptor (TLR) signaling [12]. According to current hypotheses, the activation of NLRs occurs following recognition of specific ligands by their LRR domains, similar to the way that TLRs recognize molecules from extracellular pathogens. NLR proteins are, therefore, believed to represent cytosolic pattern recognition receptors (PRRs) that use LRR regions to detect intracellular pathogens. Those NLRs that are better defined functionally include NOD1, NOD2, and NALP3. NOD1 and NOD2 have both been shown to play a role in immunity of the mammalian gut and are highly expressed in epithelial cells or macrophages associated with the intestine. NOD1 recognizes a molecule known as meso-DAP (γ-D-glutamyl-meso-diaminopimelic acid), which is a peptidoglycan (PGN) component found in Gram-negative and Gram-positive bacteria [13], while NOD2 recognizes muramyl dipeptide, a peptidoglycan component found only in Gram-positive bacteria [14].
NALP3 (alias cryopyrin) has been shown to recognize a wide range of molecules, including bacterial RNA and synthetic viral RNA/DNA mimics (R837 and R848) [6]. NALP3 becomes activated in TLR-primed macrophages in response to ATP (adenosine triphosphate) and bacterial toxins that lower cytoplasmic K+ [12], which is thought to be the major mechanism in the NALP3 response to certain Gram-positive bacteria. In a distinct pathway, monosodium urate and calcium pyrophosphate dihydrate crystals have been shown to increase caspase-1 activity in a NALP3-dependent (TLR-independent) manner [1], representing potential 'danger signal' ligands for NALP3 [5] and defining a further role for NLRs in recognizing cellular stress.
Members of the NLR family have not been extensively studied in taxa other than mammals, although recent reports indicate that some members of this family exist in lower vertebrates [15] and in invertebrates [16]. Extending the knowledge of NLRs in ectotherms, this study reports an extensive overview of the NLR family in teleost fish, represented using information derived from the zebrafish Danio rerio. Here, we describe the gene phylogeny and expression of three major subfamilies of NLRs in teleostei, which we designate NLR-A and NLR-B (resembling the mammalian NOD and NALP subfamilies respectively) and NLR-C, a large subfamily (characterized by a NOD3-like NACHT domain and an unusual C-terminal domain) that appears in all teleostei genomes and is unique to bony fish. The implications of all three subfamilies in immune regulation of fish are discussed.

Results
Many NLR-like sequences were identified in the genome and EST databases of non-mammalian vertebrates. These genes were compared by phylogenetic analysis of their deduced NACHT domains (Fig 1). Mammalian NLRs are categorized into NOD and NALP families according to previous publications [1] and as depicted in Table 1. In the zebrafish genome, three distinct subfamilies were identified and highly supported by bootstrap analysis; some resembled mammalian NODs (designated subfamily A; Table 2; Fig 1C), some resembled mammalian NALPs (designated subfamily B; Table 3; Fig 1B) and some formed a unique clade, closely related to NOD3, which was restricted to teleostei (designated subfamily C; Table 4; Fig 1A).

Subfamily A
Teleost fish possess gene orthologs for all five members of the mammalian NOD subfamily (Table 2; Table 5). While the chicken and Xenopus genomes apparently lack the NOD2 gene (Table 5; Fig 1C), both possess the remaining four NODs. The gene predictions for zebrafish NOD sequences were corrected using corresponding ESTs identified in the TIGR database, and missing sequence was found with assistance from other fish NOD-like sequences using the BLAT (BLAST-like alignment tool) program. Following assembly, zebrafish NODs were highly structurally conserved relative to human NODs. Zebrafish NOD1 (NLR-A1) has an N-terminal CARD domain and nine highly conserved leucine-rich repeats (LRRs) (Fig 2). [...] Although the N-terminal domain of NLR-A3 was not identified within the CDD, it shares some similarity with the human NOD3 effector domain, with two conserved sequence signatures, MRK and EAG (amino acids 59-61 and 69-71 of DR-NLR-A3 respectively). The C-terminal end of NLR-A3 possessed 14 LRR motifs, which aligned exactly with the LRR domains of human NOD3 [see Additional file 1], with a similar motif (CxxLxMxxNxF) between the NACHT domain and the first true LRR motif. Both the NOD4 (NLR-A4) and NOD5 (NLR-A5) orthologs in zebrafish have conserved sequences within their LRR domains relative to their human equivalents, although the predicted LRR for zebrafish NLR-A4 is shorter than that of human NOD4, and less conserved relative to other human and zebrafish NLR-A orthologs. The N-terminal domains for NLR-A4 and NLR-A5, as with NLR-A3, were not identified within the CDD database set, but share some conserved features with the corresponding regions of mammalian NOD4 and NOD5.
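The sequence signatures quoted above (for example CxxLxMxxNxF) use the common shorthand in which x stands for any residue. As a minimal sketch, and not the procedure used in this study, such a pattern can be converted into a regular expression and scanned against a protein sequence. The function names and the toy sequence are hypothetical and serve only as an illustration.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Convert shorthand such as 'CxxLxMxxNxF' into a compiled regex.

    A lower-case 'x' means any single residue; every other character
    must match literally.
    """
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def find_motif(motif: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Toy sequence invented for this example; it is NOT a real NLR sequence.
toy = "MKTAACAALWMPPNQFTT"
hits = find_motif("CxxLxMxxNxF", toy)
print(hits)  # [(6, 'CAALWMPPNQF')]
```

Positions are reported 1-based to match the amino acid numbering convention used in the text (e.g. residues 59-61).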
NLR-A4/NOD4 appears to represent the most divergent gene within the NLR-A subfamily, yet zebrafish NLR-A4 groups with high bootstrap support with human NOD4 and its orthologs from chicken and Xenopus during phylogenetic comparisons (Fig 1C). All five NLR-As are located on distinct chromosomes in zebrafish. The NLR-A1 gene is located on chromosome 16 in version 6 of the zebrafish genome (Zv6) (Table 2), but is not mapped to a chromosome in version 7 (Zv7) [see Additional file 2]. NLR-A2 resides on chromosome 7, and NLR-A3, NLR-A4 and NLR-A5 can be found on chromosomes 24, 18 and 15 respectively.

Subfamily B
Six distinct genes encoding NACHT domains were identified in Zv6 that belong to subfamily B and form a separate cluster within the clade of mammalian NALPs. Although zebrafish NLR-B2 and B3 were identified in distinct regions of the zebrafish genome (Table 3), these genes are identical in the region of the NACHT domain used for phylogenetic analysis. Several NALP-like sequences were also identified for Xenopus tropicalis (Table 5) that similarly formed their own cluster distinct from the human and zebrafish NALPs (Fig 1B). Gene predictions encoding putative NALPs in zebrafish are short, with most lacking a recognizable effector domain and C-terminal LRR domain. One exception is NLR-B2, which appears to have an N-terminal region with low similarity to a CARD motif, as identified by searching the CDD. Only one cDNA sequence resembling this subfamily could be identified in the zebrafish EST database [GenBank:AI883819]; although highly similar in sequence, it was not an exact match to any of the predicted NLR-B genes and appeared to encode only a portion of the NACHT domain. NLR subfamily B genes appear to be restricted to small clusters on chromosomes 2 and 15 in zebrafish. Furthermore, NLR-B5 and NLR-B6 reside close (28.3-28.5 m) to NLR-A5 (32 m) on chromosome 15. Later analysis of Zv7 revealed removal of the NLR-B5 gene prediction, and its merger with the prediction for NLR-B6 [see Additional file 2].
Table footnote c: NLR-A4 is also encoded within GENSCAN prediction GENSCAN00000002276.

Subfamily C
Database searches revealed multiple genes that possessed NACHT domains and shared significant homology to human NOD3 yet were distinct from the zebrafish NOD3 molecule (NLR-A3) described above. This large number of highly similar genes clearly arose from several gene(ome) duplication events. Several hundred predicted genes/proteins were observed for this group in the databases for all teleost fish (data not shown). In zebrafish, these genes were found at numerous chromosomal loci, with large clusters evident on (at least) chromosomes 1, 4, 14 and 17. A small selection of these genes was subjected to further analysis (Table 4 and Fig 1A). These molecules divided into three clusters during phylogenetic analysis, which also corresponded to sequence differences identified in the N-terminal region. Representatives from chromosome 14 were identified in all three clusters, while NLRs from some other chromosomes (e.g. chromosomes 12 and 17) were restricted to one cluster, although not all genes were included in the analysis.
Although the NACHT domains of the C-group NLRs are clearly homologous to NOD3, many of these genes were found to encode a conserved PYRIN domain at the N-terminus (Figs 2, 3), showing some analogy to mammalian NALP genes. The presence of this domain was confirmed by identifying an EST sequence containing the PYRIN domain and a partial NACHT domain that resembled C-group NLRs. Other C-group NLRs had N-terminal sequences with no obvious gene ortholog. While some are likely incorrectly predicted domains, EST sequences confirm that at least two of these predicted N-termini are transcribed in association with the NLR C-group NACHT domain (Fig 4). The NLR C-group molecules also possess an LRR region, as with other NLRs. Unexpectedly, a B30.2 [...] 5) [see Additional file 3] although, due to the high number of closely related sequences for this subfamily and the current stage of the zebrafish genome sequence, it was not possible to ascertain whether the overlapping ESTs were generated from the same gene or from distinct genes within the NLR-C family.

Other NLRs
Although CIITA was evident in the genomes of the pufferfishes, frog and chicken, this molecule was not readily identifiable in zebrafish Zv6. However, later analyses identified a CIITA-like gene in Zv7 of the zebrafish genome, which resides on chromosome 3 at approximate position 24.3 m (Zv7_scaffold 244). Sequences for NAIP were not identified in lower vertebrates during this study. IPAF was identified in the frog genome (Table 5), but not in the other non-mammalian genomes. A recently described family of NLR-like genes from the sea urchin was found to cluster with mammalian IPAF and NAIP molecules during phylogenetic analyses (data not shown).

Expression of zebrafish NLRs
The spatial expression of NLR genes was evaluated in selected tissues from naïve zebrafish. NLR-A1, -A2, -A3, -A4 and -A5 were all identified in zebrafish intestine using RT-PCR. All five genes were also expressed in liver, although expression of NLR-A2 was extremely weak. NLR-A3 expression was not detected in the spleen following 35 PCR cycles, but the four other NLR-A genes were expressed in this tissue (with low expression of splenic NLR-A5 in one individual). As a representative of the NLR-B subfamily, NLR-B2 expression was investigated and detected in all three tissues. Similarly, mRNA was detected for an NLR-C gene(s) in all three tissues using primers based on the completely sequenced EST clone. The primers used to detect these genes amplified no products in control reactions whose templates were sterile water (not shown) or cDNA syntheses performed in the absence of reverse transcriptase (RT-). ARP was amplified from all tissues, confirming the integrity of the cDNAs and the success of RT-PCR. Genomic products amplified for all NLR-A genes were larger than the cDNA amplicons, supporting the presence of intron(s) between the primer regions.

Discussion
New insight into the regulation of essential developmental, inflammatory and apoptotic pathways was achieved with the discovery and characterization of the NLR gene family of putative cytoplasmic pattern recognition molecules. While an increasing amount of information exists for these molecules in mammals, this gene family is poorly studied in other vertebrates with little to no information available even at the gene level for birds, amphibians and fish. This study resolves this issue by identifying and characterizing many NLR-like genes from these three classes of animals and uncovering a unique subfamily of NLRs in teleost fish.
Our evidence shows early evolution and high conservation of the NOD (NLR-A) subfamily of NLRs. All species of teleost fish that were analyzed had five distinct members of this subfamily, designated NLR-A1, NLR-A2, NLR-A3, NLR-A4 and NLR-A5, that were clear gene orthologs of human NOD1 to NOD5 [1]. A NOD1 ortholog was also described during an earlier screen of zebrafish ESTs for molecules similar to apoptosis regulators [17]. In addition to the encoded NACHT domains, the effector domains and LRR regions were highly conserved in the fish NLR-A genes relative to their human equivalents, suggesting retained function. NLR-A1 and NLR-A2, the fish orthologs of human NOD1 and NOD2 respectively, both possessed clear CARD domains (one in NLR-A1 and two in NLR-A2) with high amino acid identity to the equivalent regions of the human molecules. In mammalian NOD1 (and presumably NOD2), the CARD domains are necessary for the interaction with RICK kinase, an enzyme that participates in NF-κB activation and, ultimately, the generation of proinflammatory molecules [18]. Since RICK is also present in fish genomes (see zebrafish RIPK2, Q4V958), it would appear that this inflammatory cascade was established prior to the divergence of teleost fish from the tetrapod lineage, assuming that the same interaction occurs between these molecules in fish. The highly conserved sequences in the LRR domains imply that zebrafish NLR-A1 and NLR-A2 may also be able to recognize meso-DAP and muramyl dipeptide, as do mammalian NOD1 and NOD2 respectively [13,14], although this requires formal confirmation. NLR-A1 transcript was detected equally in intestine, liver and spleen, reflecting the widespread distribution observed for murine NOD1 [18], while NLR-A2 was strongly expressed in intestine, with some expression in spleen and barely detectable levels in liver.
Similar to the highest expression of NLR-A2 in zebrafish intestine, human NOD2 has a more restricted expression pattern, with predominant expression in cells of myeloid origin including monocytes [9] and Paneth cells [19] that are associated with the gut, although expression of NOD2 can also be induced in epithelial cells [20]. Zebrafish NLR-A3 is clearly an ortholog of mammalian NOD3, with similarity in the effector and NACHT domains and an equal number of LRR domains. At the genomic level, NOD3 is flanked by RHOT2, SBK1 and PDPK1, GNPTG respectively in zebrafish and fugu, further supporting the orthologous relationship for NOD3 between fish species. Expression of NLR-A3 was strong in zebrafish intestine, with some expression also in liver and little to no expression observed in the spleen. Interestingly, the kidney (bone marrow equivalent) also did not express NLR-A3, suggesting that it is not expressed by lymphocytes (data not shown). In mammals, NOD3 expression occurs primarily in lymphocytes and is attributed to inhibition of T-cell activity [21]. Two other NLR-A subfamily members were also identified in zebrafish and designated NLR-A4 and NLR-A5, with NLR-A4 resembling human NOD4 and NLR-A5 being highly conserved relative to human NOD5. NLR-A4/NOD4 genes represent the most divergent members of this subfamily based on amino acid conservation within the N-terminal and LRR regions between different vertebrate orthologs. Both NLR-A4 and NLR-A5 genes were constitutively expressed in intestine, spleen and liver of naïve zebrafish, although there is clearly some fish-to-fish variation preventing their detection in some individuals under the conditions used for RT-PCR. Currently, there is no information concerning the expression patterns or functions of these latter two NLRs in mammals.

[Figure: Many N-terminal effector domains of the predicted zebrafish NLR-C molecules are recognized as pyrin/PAAD-DAPIN domains based on the HMM logo.]

[Figure, panel A: Schematic representation of the approximate positions of NLR-encoding EST sequences relative to predicted NLR-C proteins. Regions shown: N-term (labels N1, N2, N3), NACHT, LRR, B30.2; scale: length (amino acids). cDNA accession numbers: TC353471, TC326097, DN899146, TC343155, TC318670, TC347049, TC337542, TC343140, TC339592, TC359229, TC363713.]

[Figure: partial cDNA nucleotide sequence with its deduced amino acid translation; sequence listing not reproduced.]
Whereas NOD1, NOD3, NOD4 and NOD5 appear to be conserved in bird and amphibian genomes, the gene for NOD2 was identified in neither the chicken nor the frog genomes. This would suggest that NOD2 has been deleted from the genomes in these species, although the genome of Xenopus tropicalis is, at present, incomplete. This is surprising since NOD2, in mammals, appears to be a highly important sensor for intracellular microbial molecules. However, chickens do possess a NALP3 ortholog (see below) representing another potential PRR for muramyl dipeptide [22] and may functionally replace NOD2 in this species.
Members of the NALP subfamily are also evident in lower vertebrates. Six genes were identified in zebrafish (Zv6) for NALP-like molecules (NLR-B1 to -B6), and ten predicted NALP-like genes (nicknamed NALPa to NALPj) were found for Xenopus. These genes clustered separately for each species, suggesting that recent duplication events formed the NALP subfamilies independently in fish, amphibians and mammals. The closest human ortholog of the amphibian and fish NALPs appears to be NALP6. A single NALP-like sequence predicted in the chicken genome (ENSGALG00000005155) and in the Uniprot database (Q5F3J4) clusters closest to the group of human NALPs 1, 3, 10, and 12 when analyzed phylogenetically (although not with strong bootstrap support) and has recently been given the name NLRP3 (previously designated CIAS1/NALP3). Chicken NALP was identified on chromosome 5, separate from the chicken NOD5 gene (chro- [...]

Figure 5. RT-PCR analysis of the NLR gene family in intestine (1 and 2), liver (3 and 4) and spleen (5 and 6) of two individual naïve zebrafish. ARP expression was amplified to verify cDNA synthesis. Negative controls were performed using templates from cDNA synthesis reactions without reverse transcriptase (7). Genomic DNA (8) was amplified to verify primer efficiency and to show the difference in size of genomic amplicons compared to cDNA amplicons.

In addition to the NOD- and NALP-like subfamilies, a unique subfamily of NLRs was identified in teleost fish, and designated NLR subfamily C (NLR-C). This subfamily is interesting for several reasons. Firstly, all teleostei genome (and EST) databases show numerous NLR-C genes, amounting to several hundred of these genes in a single species.
Secondly, these genes all possess a central NACHT domain that is highly similar to the NACHT domain of NOD3 (NLR-A3), suggesting they evolved from an NLR-A3-like molecule, yet many of these genes possess a PYRIN domain at their N-terminus, making them more structurally similar to mammalian NALP molecules. Finally, following the LRR domain many of these molecules (representatives found in all bony fish) possess a B30.2 (PRY-SPRY) domain, which may allow them to interact with molecules distinct from those bound by standard NLRs and thus perform some novel function. B30.2 domains are also found on some tripartite motif containing (TRIM) proteins [23] and on the PYRIN molecule [24] (Fig 2) and have several roles related to immunity. TRIM5α has been shown to inhibit retroviral activity by directly binding the capsid of the HIV retrovirus [25], and PYRIN has been shown to inhibit the activity of caspase-1 by directly binding to the active site of this enzyme [24], both using their B30.2 domains for these interactions. Each of these functions would fit with the role of NLRs as intracellular PRRs; the ability to bind viruses could be an extension of the pattern detection system attributed to the neighboring LRR domain, while the potential to inhibit caspase-1 activity may make NLR-C molecules important negative regulators of the inflammasome in teleost fish. The latter function would reflect gene families of cell surface receptors such as killer immunoglobulin-like receptors (KIRs) [26] or novel immune-type receptors (NITRs) [27] that possess many inhibitory receptors and a small number of stimulatory receptors for controlling cellular activation. It is also interesting that these molecules all contain a NACHT domain similar to NOD3, since mammalian NOD3 has an inhibitory role in T cells [21]. Importantly, since the predicted N- and C-termini of some NLR-Cs are structurally similar to the two domains of the PYRIN molecule, this would also fit with a potential function of mimicking PYRIN.
However, additional studies are required to determine what, if any, role in the immune system NLR-C molecules may play.

The evolutionary processes generating the vast subfamily of NLR-C genes are not clear and appear very complex. The relationships are further confused by apparent errors in the assembly of the zebrafish genome (Zv6 versus Zv7), as evidenced by clear differences in the mapping of some NLR-C genes to their predicted chromosomes between assembly versions [see Additional file 2]. However, evidence for tandem duplication of individual genes within a chromosomal locus is consistent between Zv6 and Zv7; such duplications result in NLR-C genes adopting new exons that encode distinct N-terminal and/or C-terminal domains via exon shuffling. The clusters of tandem NLR-C genes appear to have undergone en bloc duplication, to generate further clusters in the same locus (cis duplication), or within distinct loci or chromosomes (trans duplication) through translocation. Single genes may also have duplicated independently multiple times, within established loci and to create new loci, prior to and following formation of new gene structures. A large-scale duplication of this gene family may be explained by the teleost-specific genome duplication event (3R) occurring early in the evolution of teleost fish, which followed two rounds of complete genome duplication (2R) observed early in the evolution of the vertebrate lineage [28]. Should this be the case, mutations and deletions of many of the duplicated genes would be expected, to remove redundancy from the genome [29]. Therefore, many NLR-C genes may be non-functional genes or pseudogenes, although a small number have likely established new functions. Clearly, it is too early to assign functionality to these genes, except to note that many are transcribed and are presumably translated into protein products. Transcripts for NLR-C were detected, in this study, in three distinct tissues in naïve zebrafish, and many more can be identified in the EST databases for this fish species.
A CIITA-like gene was identified in the zebrafish genome (Zv7). In mammals, CIITA is an important molecule for controlling the expression of both major histocompatibility complex class I and class II molecules and is therefore significant for antigen presentation to T lymphocytes. Defects in human CIITA gene expression have been linked to several immune disorders [5]. However, alternative molecules have been implicated in the induction of antigen presentation pathways [30], including other members of the NLR family, such as NALP12 [31]. NAIP/IPAF homologs have been identified in the sea urchin [16], implying that the ancestral NLR resembled one of these molecules. However, neither NAIP nor IPAF was identified in the fish genomes at this time, although IPAF was evident in the frog genome, suggesting that the genes for these molecules may have been lost from the fish genomes during the teleost-specific genome duplication event.

Conclusion
In summary, the NLR gene family contains several members in all vertebrates, and at least one prototypical gene must have existed prior to the evolution of vertebrates. Clearly, there are some losses and gains of NLR genes in the genomes of distinct species, thus shaping unique repertoires of these molecules throughout vertebrates and invertebrates. Although there are still many members of the NLR family that require functional characterization, their implication as regulators of immunity is highly intriguing and warrants future investigation.

Identification of NLRs in non-mammalian vertebrates
The amino acid sequences for human NLRs were obtained from UNIPROT (Release 9.0) [32]. These are listed in Table 1, with recently defined nomenclature assigned at HGNC [3]. Predicted genes for non-mammalian NLRs were identified in the UNIPROT database and at ENSEMBL [33] for chicken Gallus gallus, pipid frog Xenopus tropicalis, Japanese pufferfish Fugu rubripes, green spotted pufferfish Tetraodon nigroviridis and zebrafish Danio rerio, by using the BLAST algorithm to search for sequences with similarity to human NLRs [34]. Gene predictions were also identified in ENSEMBL by keyword searches for NACHT, PYRIN or CARD domains. EST sequences for these species were identified by BLAST-based searching of the TIGR gene indices [35] or the "other vertebrate EST" section of GenBank at NCBI, and used to confirm and correct the gene predictions. The unique NACHT-LRR-B30.2 arrangement for NLR-C was determined by completely sequencing zebrafish EST CK126487 [GenBank:EF613347]. The EST was obtained from the American Type Culture Collection (ATCC) (Image number 7049223) and sequencing was carried out in-house using an ABI 3030 automated sequencer, universal (SP6/T7) and gene-specific primers (listed in Table 6) and BigDye V3.1.
Chromosomal locations for the NLRs were deduced by matching the translated NLR sequences against the genomes using BLAT [36] at the UCSC Genome Browser database [37]. Specific domains within the zebrafish NLRs were confirmed by searching the Conserved Domain Database (CDD v 2.09) [38] at NCBI, by comparison to the PFAM hidden Markov model (HMM) logos [39] and by direct comparison to putative mammalian orthologs.

Phylogeny of NLRs
The phylogenetic relationships between zebrafish NLRs and human NLRs were predicted using both the minimum evolution and neighbor-joining methods within the MEGA 3.1 program [40]. Partial amino acid sequences from the NACHT domain (from regions corresponding to the GxxGxGKS motif to the FAAFY sequence signature of human NOD2) were used in the analyses, as this region was clearly identified in all NLRs. Further analyses of the NLR-A and NLR-B subfamilies, including frog and chicken NLRs, were performed using the same methods. All trees were constructed from CLUSTALW-generated alignments [41], using Poisson correction and complete deletion of gaps, and were bootstrapped 1000 times.
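For readers unfamiliar with the distance settings named above, the following sketch illustrates complete deletion of gaps, the Poisson-corrected distance d = -ln(1 - p) (where p is the proportion of differing sites), and a single bootstrap resampling of alignment columns. It illustrates the standard formulas only and is not a reproduction of the MEGA implementation; the function names and the toy alignment are hypothetical.

```python
import math
import random


def complete_deletion(alignment: list[str]) -> list[str]:
    """Drop every alignment column that contains a gap ('-') in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[i] for i in cols) for seq in alignment]


# Toy alignment invented for this example (not real NACHT sequences).
toy = ["GKSTLA-QKL", "GKSTLAAQRL", "GKTTIA-QRL"]
clean = complete_deletion(toy)            # the gapped 7th column is removed
d01 = poisson_distance(clean[0], clean[1])  # p = 1/9, so d = ln(9/8)
rep = bootstrap_replicate(clean, random.Random(0))
print(clean, round(d01, 4))
```

In a full analysis, a distance matrix and tree would be computed for each of the 1000 pseudo-replicates, and the bootstrap support of a clade is the fraction of replicate trees in which it appears.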

Expression of zebrafish NLRs
The specific expression patterns of the NLR gene family in zebrafish tissues were investigated. Zebrafish (Ekwill strain) were obtained from Ekwill Fish Farm, FL and reared in sand-filtered and UV-treated freshwater at a constant temperature of 24°C. Fish were fed a daily ration of adult zebrafish diet (Zeigler). Genomic DNA was extracted from fin tissue using the DNeasy extraction kit (Qiagen) following the manufacturer's instructions. The spleen, liver and intestinal tissues were removed from two individuals and RNA was extracted using the RNeasy RNA extraction kit with in-column DNase treatment (Qiagen) following the manufacturer's instructions. Total RNA was purified and cDNA was synthesized as previously described [42]. A control, containing liver RNA but lacking reverse transcriptase, was also synthesized. The 20 µL cDNA synthesis reactions were diluted to a final volume of 100 µL and stored at -20°C until use. PCR amplifications were performed in a 25 µL final reaction volume containing 2 µL of diluted cDNA, reagents from the Taq core PCR kit (Qiagen) and 12.5 pM of each primer. Primer pairs used to detect transcripts for each NLR gene are listed in Table 6 with their sequences. Cycling conditions for all amplifications consisted of 95°C for 3 min, 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min, followed by a final extension of 10 min at 72°C. Amplified products were subjected to electrophoresis on a 3% agarose gel and visualized by ethidium bromide staining.
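As a quick arithmetic check on the protocol above, the sketch below adds up the programmed hold times of the cycling profile and works through the dilution of the cDNA template. Ramp times between temperatures are ignored, so an actual run takes longer; the function and variable names are hypothetical and used only for illustration.

```python
def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time of a PCR profile, excluding ramping."""
    return initial_s + cycles * sum(step_seconds) + final_s


# 95°C for 3 min; 35 x (94°C 30 s, 55°C 30 s, 72°C 1 min); 72°C for 10 min
total_s = program_seconds(3 * 60, 35, [30, 30, 60], 10 * 60)
print(total_s / 60)  # 83.0 minutes of hold time

# cDNA: a 20 µL synthesis reaction diluted to 100 µL, 2 µL used per 25 µL PCR
dilution_factor = 100 / 20                    # 5-fold dilution
undiluted_equivalent_ul = 2 / dilution_factor  # 0.4 µL of the original reaction
fraction_of_synthesis = 2 / 100               # 2% of each synthesis per PCR
print(dilution_factor, undiluted_equivalent_ul, fraction_of_synthesis)
```

Each 25 µL amplification therefore contains the equivalent of 0.4 µL of the undiluted cDNA synthesis reaction.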