Origin and evolution of the RIG-I like RNA helicase gene family
© Zou et al. 2009
Received: 09 December 2008
Accepted: 28 April 2009
Published: 28 April 2009
The DExD/H domain containing RNA helicases such as retinoic acid-inducible gene I (RIG-I) and melanoma differentiation-associated gene 5 (MDA5) are key cytosolic pattern recognition receptors (PRRs) for detecting nucleic acid pathogen-associated molecular patterns (PAMPs) of invading viruses. The RIG-I and MDA5 proteins differentially recognise conserved PAMPs in double stranded or single stranded viral RNA molecules, leading to activation of the interferon system in vertebrates. They share three core protein domains: an RNA helicase domain near the C terminus (HELICc), one or more caspase activation and recruitment domains (CARDs) and an ATP dependent DExD/H domain. The RIG-I/MDA5-directed interferon response is negatively regulated by laboratory of genetics and physiology 2 (LGP2) and is believed to be controlled by the mitochondrial antiviral signalling protein (MAVS), a CARD containing protein associated with mitochondria.
The DExD/H containing RNA helicases, including RIG-I, MDA5 and LGP2, were analysed in silico in a wide spectrum of invertebrate and vertebrate genomes. The gene synteny of MDA5 and LGP2 is well conserved among vertebrates, whilst conservation of the gene synteny of RIG-I is less apparent. Invertebrate homologues are phylogenetically closer to the vertebrate RIG-Is than to the MDA5/LGP2 molecules, suggesting that RIG-I homologues may have emerged earlier in evolution, possibly before the appearance of vertebrates. Our data suggest that the RIG-I like helicases originated from three distinct genes coding for the core domains (the HELICc, CARD and ATP dependent DExD/H domains) through gene fusion and gene/domain duplication. Furthermore, bioinformatic analysis detected domains similar to a prokaryotic type III restriction enzyme domain (Res III) and to the zinc finger domain of transcription factor (TF) IIS.
The RIG-I/MDA5 viral surveillance system is conserved in vertebrates. The RIG-I like helicase family appears to have evolved from a common ancestor that originated from genes encoding different core functional domains. Diversification of core functional domains might be fundamental to their functional divergence in terms of recognition of different viral PAMPs.
Pattern recognition receptors (PRRs) are crucial to animal surveillance of pathogen invasion. PRRs recognise conserved pathogen-associated molecular patterns (PAMPs), including proteins, lipids and nucleic acids, resulting in activation of host innate defences. The PRRs comprise three major groups: Toll-like receptors (TLRs), RIG-I like receptors and nucleotide-binding oligomerization domain (NOD) containing proteins. These groups sense PAMPs extracellularly or in the cytoplasm.
The RIG-I like receptors are crucial to the interferon response triggered by RNA viruses. The family has three members: retinoic acid-inducible gene I (RIG-I, also named DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 (DDX58)), melanoma differentiation-associated gene 5 (MDA5, also named interferon induced with helicase C domain 1 (IFIH1)) and laboratory of genetics and physiology 2 (LGP2, also named DExH (Asp-Glu-X-His) box polypeptide 58 (DHX58)). All three share a functional RNA helicase domain near the C terminus (HELICc) that specifically binds RNA molecules of viral origin [2–4]. Two tandem caspase activation and recruitment domains (CARDs), which mediate protein-protein interactions, are present in the N terminal region of RIG-I and MDA5 but not in LGP2. These CARDs trigger the interferon response via activation of interferon regulatory factor 3 and NF-κB [3, 5]. A third core domain is the ATP dependent DExD/H domain. It contains the conserved motif Asp-Glu-X-Asp/His (DExD/H), which is involved in ATP-dependent RNA or DNA unwinding. RIG-I/MDA5-directed interferon signalling is now known to be controlled by the mitochondrial antiviral signalling protein (MAVS), a CARD containing protein associated with mitochondria. It is negatively regulated by LGP2, which lacks a CARD domain [4, 6, 7]. LGP2 has been shown to interfere with the binding of RIG-I/MDA5 to viral RNAs.
Both RIG-I and MDA5 appear to have overlapping binding properties with viral PAMPs and share similar signalling pathways leading to activation of the interferon system. However, evidence that RIG-I and MDA5 recognise different viral PAMPs has recently begun to emerge. MDA5 appears to preferentially bind long, capped di- or mono-5' phosphate double stranded (ds) RNAs. By contrast, RIG-I has high binding affinity for short dsRNAs or uncapped 5' triphosphate (5'ppp) single stranded (ss) RNAs [9–11]. Interestingly, neither RIG-I nor MDA5 has a classic RNA binding motif. A zinc-binding domain located in the C terminal region (802–925 aa) of human RIG-I has been shown to specifically bind viral derived 5'ppp RNA [12, 13]. RIG-I and MDA5 respond differently to infection with various viral strains. RIG-I is sensitive to paramyxoviruses, orthomyxoviruses and the rhabdovirus vesicular stomatitis virus, whilst MDA5 reacts to picornaviruses [11, 14]. Some viral proteins, such as the V protein of paramyxoviruses, interact with MDA5, possibly as a means for viruses to escape host surveillance.
Whilst most studies have focused on the RIG-I like PRRs in mammals, little is known about such molecules in other organisms. A recent survey of the purple sea urchin genome revealed multiple putative RIG-I like homologues, indicating that such genes are present in invertebrates. More recently, it has been hypothesised that MDA5 might have emerged before RIG-I. On this view, their domain arrangements evolved independently by domain grafting rather than by a simple gene duplication event. In this study, we took a comparative genomics approach, analysing RIG-I like PRRs in a number of invertebrate and vertebrate genomes, in order to elucidate the origin and evolution of the RIG-I like PRR family. Bioinformatic analysis of the functional domains of RIG-I, MDA5 and LGP2 identified two evolutionarily conserved domains in MDA5 and LGP2 that may be critical to the recognition and processing of viral nucleic acid PAMPs.
Sequence information of homologues of RIG-I, MDA5, LGP2, DICER and eIF4A in vertebrates and invertebrates
ENSEMBL prediction ID
Identity/similarity to human homologue
Sea urchin DICER
Caenorhabditis elegans DRH1
Caenorhabditis elegans DRH2
Caenorhabditis elegans DRH3
Caenorhabditis elegans DCR1
Jewel wasp DICER1
Sclerotinia sclerotiorum eIF4A
Botryotinia fuckeliana eIF4A
Invertebrate RIG-I like genes
Sea urchin RIG-I (LOC591972)
Sea urchin RIG-I (LOC575036)
Sea urchin RIG-I (LOC767124)
Sea urchin RIG-I (LOC574972)
Sea urchin RNA helicase (LOC593153)
Sea urchin RNA helicase (LOC583008)
Sea urchin RIG-I
Sea urchin RNA helicase (LOC578749)
Sea urchin RIG-I (LOC582062)
Sea urchin RIG-I (LOC577076)
Nematostella vectensis RIG-I/MDA5 like gene 1
Nematostella vectensis RIG-I/MDA5 like gene 2
Key structural domains predicted in the Pfam database.
LGP2 is an adaptor protein that lacks CARD domains. It does contain a DExD/H domain and a HELICc domain homologous to the corresponding domains in the RIG-I and MDA5 proteins. LGP2 competes with RIG-I and MDA5 for their ligands, viral derived RNA PAMPs, but cannot interact with downstream signalling proteins because it lacks CARD domains. It therefore acts as a negative regulator of the RIG-I/MDA5-directed antiviral response. LGP2 appears to co-exist with MDA5 in vertebrates as a single copy gene, and it is located on a different chromosome from MDA5 in every species analysed. The putative LGP2 proteins from non-mammalian species contain 588–682 aa, much shorter than the RIG-I and MDA5 proteins. The DExD/H and HELICc domains of LGP2 are more similar to the corresponding regions in MDA5 than to those in RIG-I. The LGP2 DExD/H domains are 33.4–55.3% identical to their MDA5 counterparts, compared with 22.3–39.8% for the RIG-I proteins. Similarly, the LGP2 HELICc domains are 47.6–66.7% identical to the MDA5 HELICc domains, in contrast to 31.7–48.7% for the RIG-I HELICc domains.
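The identity values above are percentages of identical residues over aligned positions. The sketch below shows how such a figure can be computed. It assumes the two domains have already been aligned (for example with ClustalW) and that gapped columns are excluded. This is one of several conventions in use, and it is not necessarily the one applied in this study. The sequence fragments are toy strings, not real LGP2 or MDA5 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned protein sequences.

    Convention assumed here: identical residues divided by the number of
    aligned columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (hypothetical, not real LGP2 or MDA5 sequence).
print(round(percent_identity("GAGKTLVA-DEAH", "GSGKTFIAVDEVH"), 1))  # 66.7
```

Excluding gapped columns can give a higher identity than dividing by the full alignment length. Identity ranges such as those quoted above therefore depend on the convention used.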
Twelve genes coding for RNA helicase proteins homologous to RIG-I/MDA5/LGP2 have been reported in a recent survey of the draft sea urchin genome. Some of the deduced proteins contain CARD domains in addition to DExD/H and HELICc domains. Using the human MDA5 protein sequence as bait, a partial homologous sequence was obtained from the genome database of the sea anemone Nematostella vectensis (http://www.stellabase.org/; http://blast.ncbi.nlm.nih.gov/). This partial sequence was then used to search the NCBI database. Two contigs (NEMVEDRAFT_v1g95706 and NEMVEDRAFT_v1g87071) encoding two putative RIG-I/MDA5/LGP2 homologues were retrieved. The putative proteins are 672 aa and 689 aa in length, similar to LGP2. Further prediction of functional motifs revealed a DExD/H domain and a HELICc domain but no N terminal CARD domain. The proteins share 17.4–26.3% identity with RIG-I, 21.7–32.4% with MDA5 and 25.3–36.3% with LGP2.
The gene synteny of MDA5/LGP2 is well conserved in vertebrates, from fish to humans (Fig. 1B and 1C). Eight genes surrounding MDA5 in stickleback, Fugu and medaka also appear in the genomes of Xenopus, chicken and humans, in the same order and with the same transcriptional orientation. Synteny is less conserved in the zebrafish genome, where only four conserved neighbouring genes are present in the MDA5 locus. Similarly, the gene composition and arrangement of the LGP2 locus show remarkable conservation during vertebrate evolution.
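Once the neighbouring genes of each locus are listed, conservation of "the same order and the same transcriptional orientation" can be checked mechanically. The sketch below is a simplified illustration, not the procedure used in this study. It keeps only the genes shared by two loci and asks whether they appear in the same order with the same strands. It also allows the whole block to be inverted (reversed order with every strand flipped), as happens when a region is assembled on the opposite strand. Apart from IFIH1, the gene names are hypothetical placeholders, and the sketch assumes each gene appears once per locus.

```python
from typing import List, Tuple

# A locus is an ordered list of (gene_symbol, strand) pairs, strand "+" or "-".
Locus = List[Tuple[str, str]]

FLIP = {"+": "-", "-": "+"}


def is_collinear(reference: Locus, query: Locus) -> Tuple[bool, int]:
    """Return (collinear, n_shared) for the genes shared by two loci.

    Collinear means the shared genes occur in the same order with the same
    strands, either directly or after inverting the whole query block.
    """
    shared = {g for g, _ in reference} & {g for g, _ in query}
    ref = [(g, s) for g, s in reference if g in shared]
    qry = [(g, s) for g, s in query if g in shared]
    # Inverting a block reverses gene order and flips every strand.
    inverted = [(g, FLIP[s]) for g, s in reversed(qry)]
    return (ref == qry or ref == inverted), len(shared)


# Hypothetical loci: the "fish" block is inverted and has one extra gene.
human = [("geneA", "+"), ("IFIH1", "-"), ("geneB", "+"), ("geneC", "-")]
fish = [("geneC", "+"), ("geneX", "+"), ("geneB", "-"), ("IFIH1", "+"), ("geneA", "-")]
print(is_collinear(human, fish))  # (True, 4)
```

A stricter analysis would also score partial collinearity, for example the longest run of shared genes in conserved order. Loci like the zebrafish MDA5 region, with only some neighbours conserved, would need that.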
The RIG-I like helicase family members have recently been reported to play pivotal roles in recognising viral nucleic acids in mammals. In this report, RIG-I like homologues have been identified in silico in the nucleotide databases of invertebrates and vertebrates, and their evolutionary origin is discussed.
Double stranded RNA forms the genome of double stranded RNA viruses and also occurs as secondary structure within single stranded RNAs. It can be generated during viral replication and RNA metabolism. These properties make dsRNA a prime target for host PRRs. Some cytosolic PRRs, such as PKR, use classical double stranded RNA binding domains to sense viral presence. DICER proteins also contain two dsRNA binding domains (dsRBDs) for capturing dsRNA molecules. In the present study, a zinc finger domain similar to that of transcription factor (TF) IIS was found in MDA5 but not in LGP2, with moderate homology to the RIG-I C terminal region. Furthermore, a well conserved type III restriction enzyme domain, which is responsible for restriction in prokaryotic organisms, was identified in the middle of MDA5 and in the N terminal region of LGP2. This domain was not detected in RIG-I molecules by the Pfam HMM programme, although RIG-I shares some degree of homology with it. We speculate that these two domains may serve as binding domains that interact with viral PAMPs.
One striking finding is that a well conserved restriction enzyme III (Res III) domain is predicted in all MDA5 and LGP2 proteins (except human LGP2). The Res III domain is structurally similar to the DExD/H domain. Restriction enzymes are important components of the prokaryotic DNA restriction-modification systems that defend against foreign DNA. They function in combination with one or two modification enzymes (DNA methyltransferases), which protect the cell's own DNA from cleavage by the restriction enzymes. Restriction enzymes are classified into four types depending on their recognition sequences and the location of their cleavage sites. Type III enzymes recognise short (5–6 bp) asymmetric DNA sequences and cleave 25–27 bp downstream, generating short single-stranded 5' overhangs. Type III enzymes contain two functional subunits, Res (restriction) and Mod (modification). The Res subunit cleaves unmethylated double stranded foreign DNA, and the Mod subunit protects self DNA by methylation. Classic strand separation helicase activity has not been detected for type III restriction enzymes. The Res III domains predicted in MDA5 and LGP2 have significant homology with bacterial Res III domains, and multiple alignment reveals significant conservation (Fig. 5A). MDA5/LGP2 also show similarity to the RNase III domains of the RNA endonuclease DICER and of DICER like helicases, which process dsRNA into 21–23 nt small RNAs with 2 nt 3' overhangs. They are also similar to ATP-binding domains in bacterial and yeast DNA helicases [20, 21]. The DICER proteins carry integrated nuclease domains with excision activity. Their two ribonuclease III domains cut double stranded RNAs, releasing 21–23 nt RNA molecules with 2 nt 3' overhangs that are essential for specific cleavage of viral RNAs [20, 22].
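The type III cleavage geometry described above can be made concrete. The enzyme recognises a short asymmetric site and cuts 25 nt downstream on one strand and 27 nt on the other, leaving a 2 nt 5' overhang. The sketch below scans only the forward strand of a DNA string for a site and reports both cut coordinates. The default site CAGCAG is an illustrative assumption (it is the site usually cited for EcoP15I). The sketch deliberately ignores the Mod (methylation) subunit and the other conditions real type III enzymes require before cutting.

```python
import re
from typing import List, Tuple


def type3_cut_sites(
    seq: str,
    site: str = "CAGCAG",     # illustrative 6 bp asymmetric site (assumed)
    top_offset: int = 25,     # top-strand cut, nt downstream of the site
    bottom_offset: int = 27,  # bottom-strand cut, nt downstream of the site
) -> List[Tuple[int, int, int]]:
    """Return (site_start, top_cut, bottom_cut) for each forward-strand site.

    Coordinates are 0-based; a cut value k means the bond between bases
    k-1 and k. Sites whose cut would fall beyond the sequence end are skipped.
    """
    seq = seq.upper()
    hits = []
    # A zero-width lookahead reports overlapping occurrences as well.
    for m in re.finditer(f"(?={re.escape(site)})", seq):
        site_end = m.start() + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        if bottom_cut <= len(seq):
            hits.append((m.start(), top_cut, bottom_cut))
    return hits


dna = "TT" + "CAGCAG" + "ACGT" * 10  # toy sequence
for start, top, bottom in type3_cut_sites(dna):
    # The left fragment's bottom strand extends (bottom - top) nt past the
    # top strand at its 5' end, i.e. a short 5' overhang.
    print(start, top, bottom, "overhang:", bottom - top, "nt")
```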
Another putatively important domain, a zinc finger motif similar to that of transcription factor (TF) IIS, was identified by homology analysis in the Pfam database. Zinc finger motifs can bind a range of targets, including DNA, RNA, proteins and even lipids. The zinc finger motif at the C terminus of TFIIS is known to be essential for RNA binding and processing. Integrated TFIIS zinc ribbon C-terminal domains are also found in some viral proteins [24, 25]. The TFIIS motif located near the C terminus of the RIG-I/MDA5 proteins was detectable in the Pfam database, although the E values (0.51–8.6) are only marginal (Table 2). Structural modelling showed remarkable conservation of a C4 type zinc finger pocket and a β-strand structure compared with the C4 type zinc finger nucleic acid binding domain of human TFIIS. An additional β-strand motif is also present within this domain besides the C4 type β-strand zinc finger structure. Whether this domain is involved in recognition of viral RNA PAMPs remains to be determined. Recent studies have demonstrated that a C terminal domain of human RIG-I (792–925 aa) binds dsRNA or 5'ppp RNA, as confirmed by nuclear magnetic resonance and X-ray crystallography [12, 13]. This region was also shown to suppress RIG-I signalling. It is therefore possible that viruses interfere with this host recognition system through their own proteins containing TFIIS C-terminal domains.
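A C4 type zinc finger uses four cysteines to coordinate one zinc ion. The sketch below locates candidate motifs with a regular expression, as a crude sequence-level complement to the Pfam HMM search and structural modelling described above. It assumes a Cys-X2-Cys ... Cys-X2-Cys arrangement with a spacer of 10 to 30 residues between the two cysteine pairs. Both the pattern and the spacer bounds are illustrative assumptions, not values taken from Pfam or from this study. Such a scan produces many false positives and is no substitute for an HMM.

```python
import re
from typing import List, Tuple


def find_c4_motifs(
    protein: str, min_spacer: int = 10, max_spacer: int = 30
) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for C-x2-C-x(n)-C-x2-C candidates.

    The lookahead lets motifs overlap; the lazy spacer picks the shortest
    spacer within [min_spacer, max_spacer] for each starting cysteine.
    """
    pattern = re.compile(
        rf"(?=(C.{{2}}C.{{{min_spacer},{max_spacer}}}?C.{{2}}C))"
    )
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


toy = "AACAAC" + "G" * 15 + "CAACAA"  # hypothetical toy sequence
print(find_c4_motifs(toy))
```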
The RIG-I/MDA5/LGP2 system is an ancient antiviral system well conserved in vertebrates. Our data suggest that these helicase PRRs evolved from an ancient progenitor that originated from genes coding for individual functional domains. The family then expanded through multiple evolutionary events leading to gene and/or domain gain and loss. The present study provides important clues for further elucidation of RIG-I/MDA5-mediated antiviral defence in vertebrates.
To identify MDA5, LGP2 and RIG-I genes in the available teleost, amphibian and avian genomes, tblastn searches were performed using the human MDA5, LGP2 and RIG-I protein sequences as queries. The genomes searched in the Ensembl database (http://www.ensembl.org) were those of zebrafish (Danio rerio), pufferfish (Takifugu rubripes and Tetraodon nigroviridis), medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), western clawed frog (Xenopus tropicalis) and chicken (Gallus gallus). The sequences obtained were reciprocally searched against the other genomes to further verify their identity. The translated proteins from predicted transcripts were verified by BLASTP against the NCBI non-redundant protein sequence database and the SWISSPROT protein database (http://www.ncbi.nlm.nih.gov). In addition, known MDA5, LGP2 and RIG-I genes were retrieved from the NCBI database for analysis.
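The reciprocal search step can be expressed as a reciprocal best hit check. A candidate is kept only if the human query's best hit in the target genome, searched back against the human proteins, returns that same human query as its best hit. The sketch below assumes the searches were written in BLAST's default 12-column tabular format (-outfmt 6), where column 12 is the bit score. It illustrates the idea and does not reproduce the authors' exact filtering. The sequence IDs in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, str]:
    """Best-scoring subject per query from BLAST tabular (-outfmt 6) lines.

    Assumes the default 12-column layout: column 1 is the query ID,
    column 2 the subject ID and column 12 the bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(
    forward: Dict[str, str], reverse: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit in return."""
    return [(a, b) for a, b in forward.items() if reverse.get(b) == a]


# Hypothetical best-hit tables from a human->fish and a fish->human search.
forward = {"hsMDA5": "fishContig7", "hsLGP2": "fishContig2"}
reverse = {"fishContig7": "hsMDA5", "fishContig2": "hsRIGI"}
print(reciprocal_best_hits(forward, reverse))  # [('hsMDA5', 'fishContig7')]
```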
For gene synteny analysis, human MDA5, LGP2 and RIG-I were used as anchor sites. Orthologous genes in regions of approximately 1–10 Mb (million base pairs) flanking the human (NCBI 36) anchor site were compared with the medaka (HdrR), zebrafish (Zv7), stickleback (BROAD S1), pufferfish (FUGU 4.0, TETRAODON 7), western clawed frog (JGI 4.1) and chicken (WASHUC2) genomes. The comparison was made within the Ensembl genome browser using the GeneView and MultiContigView options. Orthologous genes were also annotated manually, using FGENESH+ to predict gene structures based on homology with human genes and applying the program's "fish" specific parameters.
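Selecting the genes that flank an anchor gene within a fixed window is the first step of such a synteny comparison. The sketch below assumes gene coordinates are available as simple records (chromosome, start, end, strand). These could come from a genome browser export, but the Gene record and the coordinates in the example are hypothetical. The function returns every gene on the anchor's chromosome that overlaps the window extending flank_bp on either side of the anchor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    symbol: str
    chrom: str
    start: int   # 1-based start coordinate (bp)
    end: int     # end coordinate (bp)
    strand: str  # "+" or "-"


def flanking_genes(genes: List[Gene], anchor_symbol: str, flank_bp: int) -> List[Gene]:
    """Genes overlapping [anchor.start - flank_bp, anchor.end + flank_bp], sorted by start."""
    anchors = [g for g in genes if g.symbol == anchor_symbol]
    if not anchors:
        raise ValueError(f"anchor gene {anchor_symbol!r} not found")
    anchor = anchors[0]
    lo = max(1, anchor.start - flank_bp)
    hi = anchor.end + flank_bp
    window = [
        g for g in genes
        if g.chrom == anchor.chrom and g.end >= lo and g.start <= hi and g is not anchor
    ]
    return sorted(window, key=lambda g: g.start)


# Hypothetical coordinates for illustration only.
genes = [
    Gene("a", "2", 100, 200, "+"),
    Gene("IFIH1", "2", 1_500_000, 1_550_000, "-"),
    Gene("b", "2", 2_400_000, 2_450_000, "+"),
    Gene("c", "2", 3_000_000, 3_100_000, "-"),
    Gene("d", "5", 1_500_000, 1_510_000, "+"),
]
print([g.symbol for g in flanking_genes(genes, "IFIH1", 1_000_000)])  # ['b']
```

Widening flank_bp towards the 10 Mb end of the range simply pulls more neighbours into the window. The ordered (symbol, strand) list can then be compared across species.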
The conserved domains were predicted using software at the ExPASy Molecular Biology Server and the Pfam database (http://pfam.sanger.ac.uk). Caspase recruitment domains, DExD/H box helicase domains, type III restriction enzyme domains and helicase conserved C-terminal domains were predicted by a Pfam HMM search with a cutoff value of 10.0. The full-length amino acid sequences and the conserved functional domains were used in phylogenetic tree analysis. Multiple protein sequence alignments were performed using ClustalW (version 1.83). A phylogenetic tree was constructed using the neighbour-joining method within the MEGA (4.0) package. Distances were calculated with Poisson correction, and gaps were removed by pairwise deletion. The topological stability of the neighbour-joining trees was evaluated with 10,000 bootstrap replicates. Three dimensional (3D) structures were predicted using the 3D JIGSAW protein comparative modelling programme and compared with those in the MMDB/PDB database by VAST search analysis (http://www.ncbi.nlm.nih.gov/Structure/VAST). The 3D structural images were displayed with Cn3D (version 4.1).
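With Poisson correction, the amino acid distance between two sequences is d = -ln(1 - p), where p is the proportion of compared sites that differ. Under pairwise deletion, a site with a gap in either sequence is dropped only for that particular pair, not for the whole alignment. The sketch below computes such a distance matrix from a set of aligned sequences, which is the input from which a neighbour-joining tree is built. It illustrates the correction only and is not the MEGA implementation. Neighbour joining and bootstrapping are not shown, and the aligned fragments are toy data.

```python
import math
from itertools import combinations
from typing import Dict


def poisson_distance(a: str, b: str, gap: str = "-") -> float:
    """Poisson-corrected distance d = -ln(1 - p) with pairwise deletion."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of Poisson distances between all aligned sequences."""
    names = list(alignment)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for i, j in combinations(names, 2):
        d = poisson_distance(alignment[i], alignment[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


alignment = {  # toy aligned fragments, not real RIG-I/MDA5/LGP2 data
    "seq1": "MKVLA-GT",
    "seq2": "MKVIAQGT",
    "seq3": "MRVIAQGS",
}
for name, row in distance_matrix(alignment).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Because p is computed over a different set of sites for each pair, pairwise deletion retains more data than removing every gapped column. The cost is that distances in the matrix rest on slightly different site sets.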
This work was supported by the Royal Society of Edinburgh and National Natural Science Foundation of China (grant numbers: 30711130225 and 30830083).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.