Unusual duplication of the insulin-like receptor in the crustacean Daphnia pulex
© Boucher et al. 2010
Received: 24 April 2010
Accepted: 12 October 2010
Published: 12 October 2010
The insulin signaling pathway (ISP) has a key role in major physiological events like carbohydrate metabolism and growth regulation. The ISP has been well described in vertebrates and in a few invertebrate model organisms but remains largely unexplored in non-model invertebrates. This is the first detailed genomic study of this pathway in a crustacean species, Daphnia pulex.
The Daphnia pulex draft genome sequence assembly was scanned for major components of the ISP, with special attention to the insulin-like receptor. Twenty-three putative genes are reported. The pathway appears to be generally well conserved, as genes found in other invertebrates are present. Major findings include a lower number of insulin-like peptides in Daphnia than in other invertebrates and the presence of multiple insulin-like receptors (InR): four genes, as opposed to a single one in other invertebrates. The genes encoding the Dappu_InR are likely the result of three duplication events and bear some unusual features. Dappu_InR-4 has undergone extensive evolutionary divergence and lacks the conserved site of the catalytic domain of the receptor tyrosine kinase. Dappu_InR-1 has a large insert and lacks the transmembrane domain in the β-subunit. This domain is also absent in Dappu_InR-3. Dappu_InR-2 is characterized by the absence of the cysteine-rich region. Real-time qPCR confirmed the expression of all four receptors. EST analyses of cDNA libraries revealed that the four receptors were differentially expressed under various conditions.
Duplications of the insulin receptor genes might represent an important evolutionary innovation in Daphnia, as Daphnia are known to exhibit extensive phenotypic plasticity in body size and in the size of defensive structures in response to predation.
The insulin signaling pathway is evolutionarily well conserved among multicellular organisms. In humans and higher vertebrates, signaling through the insulin pathway is critical for the regulation of intracellular and blood glucose levels and is of pivotal importance in metabolic diseases, such as diabetes, and in cellular processes such as ageing [1–3]. It is also well recognized as a major regulator of growth in both vertebrates and invertebrates, and can also trigger diapause and regulate ageing in invertebrates [4–8].
The insulin signaling pathway is initiated by the secretion of insulin-like peptides (ILPs) in response to elevated glucose and amino acid levels. Binding of an ILP initiates a complex cascade of events, starting with the phosphorylation of specific tyrosine residues on the insulin/IGF-like receptors (InR). Once activated, these receptors phosphorylate a number of docking proteins, the best characterized of which are the insulin receptor substrate (IRS/Chico) proteins. IRS proteins interact with other intracellular signaling molecules primarily through src homology 2 (SH2) domains, leading to the activation of several downstream pathways. These in turn coordinate and regulate vesicle trafficking, protein synthesis, and glucose uptake. The InR and its substrates therefore constitute the first critical node in the insulin signaling network and define the full set of proteins that are tyrosine phosphorylated upon ILP stimulation.
The InR belongs to an ancient superfamily of transmembrane receptor tyrosine kinases (RTK). Unlike the other members of the RTK family, the InR forms a tetramer of two extracellular alpha (α) subunits linked by disulphide bonds to two beta (β) subunits, which pass through the cell membrane and carry intracellular tyrosine kinase domains. Both α and β chains are synthesized from a single mRNA with a variable number of exons. Although the function and components of the insulin signaling pathway are exceptionally conserved, major changes have occurred between vertebrates and invertebrates at the beginning of the pathway. In particular, early duplication events in vertebrates gave rise to specialized receptors: the insulin receptor, the type 1 insulin-like growth factor receptor (IGF1R) and the insulin receptor-related receptor (IRR). The primary function of the insulin receptor is to control blood sugar homeostasis, while the IGF1R promotes pre- and post-natal growth. These receptors show relatively high specificity for their respective ligands. The IRR is called an orphan receptor because its ligand and biological function are still unknown.
The insulin signaling pathway has been extensively characterized in the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans, and partially described in a large variety of invertebrates from sponges to insects [17–21]. Recent studies have revealed that invertebrates possess many more insulin-like peptides than expected, up to 37 in C. elegans. Although there are numerous insulin-like peptides in invertebrates, only one insulin/IGF receptor homolog has been described to date in each of them (the exception being the parasitic trematode Schistosoma mansoni, in which two distinct IR homologs have been found). In this study, we report for the first time the presence of four InR homologs in Daphnia pulex (Crustacea, Cladocera). Phylogenetic analyses confirmed that the four receptors belong to the insulin receptor family. We discuss the evolutionary significance of the duplications and their implications for the biology of this species.
Insulin signaling pathway gene annotation based on the scaffolds_assembly_2 database obtained from the Dappu v1.1 draft genome assembly.
Insulin-like receptor 1
Insulin-like receptor 2
Insulin-like receptor 3
Insulin-like receptor 4
Insulin-like peptide 1
Insulin-like peptide 2
Insulin-like peptide 3
Insulin-like peptide 4
Insulin receptor substrate
Protein kinase B
Phosphoinositide-dependent protein kinase 1
Forkhead transcription factor FOXO
Tumor suppressor 1
Tumor suppressor 2
Ras homolog enriched in brain
Target of rapamycin
S6 kinase 1
S6 kinase 2
Pol I transcription factor
The phylogenetic relationships of Dappu_InR1-R4 with various invertebrate and vertebrate InR were investigated. The tyrosine kinase (TK) domains of diverse RTK and InR were aligned using MUSCLE 3.7, and the alignments were refined with Gblocks 0.91b, which eliminates poorly aligned positions and divergent regions in order to focus on unambiguously aligned residues. The RTK and InR alignments used in the phylogenetic analyses included 187 amino acids, which represent 63% of the original 295 positions. Overall, there were 30 conserved positions and 122 parsimony-informative sites.
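The conserved and parsimony-informative counts reported above rest on two standard column definitions: a column is conserved when all sequences share one residue, and parsimony-informative when at least two different residues each occur in at least two sequences. A minimal Python sketch of both counts for a gap-free alignment (such as Gblocks output) follows; the function name and the toy alignment are hypothetical, and skipping '-', '?' and 'X' is an assumption that may differ from how MEGA or other programs treat those symbols.

```python
from collections import Counter
from typing import Sequence, Tuple


def site_stats(alignment: Sequence[str]) -> Tuple[int, int]:
    """Return (conserved, parsimony_informative) column counts for aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    conserved = informative = 0
    for i in range(length):
        # Ignore gap and ambiguity symbols (an assumption of this sketch)
        counts = Counter(seq[i] for seq in alignment if seq[i] not in "-?X")
        if len(counts) == 1:
            conserved += 1
        # Informative: at least two residue states, each shared by at least two sequences
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return conserved, informative


# Hypothetical four-sequence toy alignment
toy = ["MKVL", "MKIL", "MRIL", "MRVL"]
print(site_stats(toy))
# Gblocks retention as reported in the text: 187 of 295 positions
print(f"{100 * 187 / 295:.0f}% of positions retained")
```

In the toy alignment, the first and last columns are conserved, while the second and third are informative because each splits the sequences into two pairs.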
The length of the open reading frames of the Daphnia pulex insulin receptors ranged from 1313 to 1865 amino acids. The A+T composition was very similar among the four receptors. The exon-intron structure was relatively complex and varied among genes: receptors 1 to 4 contained 26, 15, 20 and 19 introns, respectively (Table 1). Two assembly gaps were found, one of 6 kb in Dappu_InR1, located after exon 13, and another of 1 kb in Dappu_InR2. Dappu_InR3 and Dappu_InR4 showed no gaps (Additional Figure 1).
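The A+T composition and intron counts above are straightforward to compute once a gene model is available. The sketch below assumes hypothetical inputs: a nucleotide string in which assembly gaps are written as N (and excluded from the composition), and exon coordinates given as 1-based inclusive (start, end) pairs on one strand. Under this simple model, a gene with n exons has n - 1 introns.

```python
from typing import List, Tuple


def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases; N (e.g. assembly gaps) is ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


def intron_intervals(exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Intervals between consecutive exons (1-based, inclusive, single strand)."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical exon coordinates for a three-exon gene model
exons = [(1, 100), (201, 300), (401, 450)]
print(intron_intervals(exons))  # two introns for three exons
print(at_content("AATTGGCCNNNN"))
```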
The insulin receptor is a transmembrane protein formed by two α and two β subunits linked by disulfide bonds in a β-α-α-β configuration. Dappu_InR1, Dappu_InR2, Dappu_InR3, and Dappu_InR4 each possessed a 20 a.a. transmembrane domain near the beginning of the sequence, in almost the same region as the one identified in Drosophila. Dappu_InR2 had a second 20 a.a. transmembrane domain located further along the sequence, just upstream of the tyrosine protein kinase domain. The extracellular region of the insulin receptor is predicted to be composed of six distinct structural domains: two homologous domains, L1 and L2, flanking a cysteine-rich domain (CR), followed by three fibronectin type III repeats (Fn0, Fn1, and Fn2). All four analysed receptors included two L domains, but Dappu_InR-2 lacked the CR domain; consequently, in this receptor the two L domains are separated by only one a.a. In the extracellular region, we also identified fibronectin type III domains. Dappu_InR1, Dappu_InR2, Dappu_InR3, and Dappu_InR4 contained three highly similar fibronectin regions, as in Homo sapiens and Drosophila (regions I-II-IV in Figure 3). Dappu_InR1 included an additional fibronectin domain that we identified as an insert. The insertion (333 a.a. in length) was located immediately after a predicted basic cleavage site (KRR847) and contained a tyrosine protein kinase domain, which itself included a conserved protein kinase ATP-binding site.
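The text does not name the tool used to predict the 20 a.a. transmembrane segments. As an illustration only, a classic approach is a Kyte-Doolittle hydropathy scan: average the hydropathy index over a sliding window of about 19 residues and flag windows whose mean exceeds roughly 1.6. The sketch below assumes those textbook defaults and hypothetical function names; it is not the authors' method, and dedicated transmembrane predictors are considerably more accurate.

```python
from typing import List, Tuple

# Kyte & Doolittle (1982) hydropathy index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping windows whose mean hydropathy exceeds `threshold`.

    Returns 1-based inclusive (start, end) segments; unknown residues score 0.
    """
    seq = seq.upper()
    segments: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments


# Hypothetical test peptide: a 22-residue hydrophobic core between polar flanks
peptide = "DEKR" * 5 + "L" * 22 + "DEKR" * 5
print(hydrophobic_segments(peptide))
```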
Overall, analysis of the gene arrangements of the four D. pulex InR revealed that the extracellular parts were more variable than the intracellular components. Dappu_InR-3 appeared to be the most typical (conserved) of the four, as Dappu_InR-2 lacked the CR region and Dappu_InR-1 had a long insert including an additional FN type III domain as well as an additional protein kinase domain. Dappu_InR4 lacked the catalytic site of the tyrosine kinase.
The dN:dS ratios did not differ significantly among the species tested or among the four Daphnia receptors, and were mostly lower than one in all pairwise comparisons, consistent with purifying selection.
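The text does not state how dN and dS were estimated. As one common illustration, assuming the numbers of nonsynonymous and synonymous differences and sites have already been counted for a sequence pair (for example by Nei-Gojobori counting), each proportion can be corrected for multiple hits with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) before taking the ratio. The counts in the usage line are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(ns_diffs: float, ns_sites: float, s_diffs: float, s_sites: float) -> float:
    """Ratio of corrected nonsynonymous (dN) to synonymous (dS) distances."""
    dn = jukes_cantor(ns_diffs / ns_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0.0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Hypothetical pair: 10 NS differences over 300 NS sites, 30 S differences over 100 S sites
ratio = dn_ds(10, 300, 30, 100)
print(f"dN/dS = {ratio:.3f}", "(< 1: consistent with purifying selection)" if ratio < 1 else "")
```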
Qualitative expression of the four Daphnia pulex insulin receptors as revealed by cDNA sequencing of condition-specific libraries (columns: type of library and number of clones sequenced; library types include low mixed metals and high mixed metals).
In this study, we provide the first detailed genomic analysis of the insulin signaling pathway in a crustacean species. As expected, all the components of the insulin signaling pathway were present, confirming the conservation of this pathway in metazoans. Few insulin-like peptides were found in Daphnia, in contrast to other invertebrates; for example, D. melanogaster and C. elegans possess 7 and 37 ILPs, respectively. We report here the presence of four putative InR in the D. pulex genome, a unique case in arthropods. A single InR gene has been described in a wide variety of invertebrates, including freshwater mollusks, nematodes and insects. A duplication of the insulin receptor has also been reported in the helminth Schistosoma mansoni, but the structural and functional particularities of one of these receptors (SmIR-1) argued for its involvement in the parasitic lifestyle of this organism. In vertebrates, two duplication events have led to the presence of three InR paralogs (InR, IGF1R and IRR). The presence of three or more paralogs in vertebrates but only one gene in invertebrates is a common pattern in many protein families; our results contrast with this trend. The D. pulex genome is recognized as one of the most gene-rich metazoan genomes, surpassing other well-characterized species in its overall number of duplicated genes. Duplication of the InR is thus consistent with the large numbers of duplicates in other common gene families in D. pulex (http://www.biomedcentral.com/series/Daphnia). For example, 75 cytochrome P450 genes (http://www.biomedcentral.com/1471-2164/10/169), 64 ABC proteins (http://www.biomedcentral.com/1471-2164/10/170) and 11 hemoglobin genes (http://www.biomedcentral.com/1472-6793/9/7) have already been documented in this species.
The phylogeny of the D. pulex InR is consistent with three duplication events: a first duplication of the InR gene could have produced the ancestor of Dappu_InR-1 and Dappu_InR-3 and the ancestor of Dappu_InR-2 and Dappu_InR-4, and each of these two ancestors was subsequently duplicated in a second round. Assembly gaps in two of these InR genes will need to be resolved for a complete understanding, but our analyses of the non-gap regions support the reported biological differences.
Following a duplication event, duplicated genes have three possible fates. Briefly, (1) one copy may accumulate mutations that destroy its incipient function; (2) during the period of relaxed selection following the duplication, one copy may acquire a beneficial mutation resulting in a new function (neofunctionalization); or (3) the original functions of the single-copy gene may be partitioned between the duplicates (subfunctionalization). In our results, three lines of evidence suggest that Dappu_InR-4 may have acquired a new function: 1) the modification of the TK motif involved in phosphotransfer; 2) the longer branch in the phylogeny, suggesting an unusually rapid rate of evolution of Dappu_InR-4 compared with the other InR; and 3) cDNA sequencing results showing that this receptor was sampled in libraries created under various conditions. Expression studies will be needed to determine whether the original function has been partitioned among the four duplicates (subfunctionalization) or whether Dappu_InR4 has acquired a new function (neofunctionalization). The three other InR appear functional, even though their ectodomains revealed some intriguing structural features. In the well-described InR of vertebrates, the β-subunit includes a single α-helical hydrophobic transmembrane domain. Transmembrane domain interactions are thought to be involved in the negative cooperativity of the InR. Among the four Daphnia InR, only Dappu_InR-2 possesses this transmembrane domain in the β-subunit. However, a transmembrane domain was identified in all four InR near the N-terminus of the α-subunit. To our knowledge, this is the first report of a second transmembrane domain located in the α-subunit of the InR. This feature is apparently not restricted to D. pulex, as we found this domain in insects (e.g. Tribolium castaneum, Nasonia vitripennis and Drosophila melanogaster) and vertebrates (e.g. Mus musculus) (data not shown).
Another important difference among the InR is the absence of the furin-like cysteine-rich domain in Dappu_InR-2. The cysteine residues of this domain that form disulfide bonds maintain the quaternary structure of the extracellular portion of the receptor. These cysteines are highly conserved in all paralogs across vertebrates. A similar pattern is found in Drosophila, where the receptor assembles into a quaternary structure, but only 2 of the 6 cysteines described in the human InR are conserved.
Experiments with chimeric InR have confirmed that the cysteine-rich domain contributes to insulin/IGF1 binding specificity. In the nematode C. elegans, the cysteine-rich region is much shorter and contains only two conserved cysteines. It is not yet known whether the absence of this cysteine-rich domain compromises or modifies the activity of the receptor. We also observed a large insert of 333 amino acids in Dappu_InR-1 that included a fourth fibronectin type III domain and a TK domain. As this insert contains additional motifs characteristic of InR, it remains possible that this sequence has a specific function. As ESTs already show that all four InR are expressed, further studies are needed to determine whether they are expressed in different tissues and/or life stages, i.e. whether they have different functions. As Daphnia are known to exhibit extensive phenotypic plasticity in body size and in the size of defensive structures in response to predation [30, 31], duplications of the insulin receptor genes might represent an important evolutionary innovation in this group.
The insulin signaling pathway is of pivotal importance in metabolic diseases, regulates growth and ageing, and can also trigger diapause. This pathway has been well described in vertebrates and in a few invertebrate model organisms but remains largely unexplored in non-model invertebrates. Our study showed the presence of four InR, which are likely the result of three duplication events. All four receptors were found to be expressed. One of these receptors (Dappu_InR-4) has likely changed its function but does not appear to be under positive selection. The three other receptors present some unusual structural features, such as a large insert in Dappu_InR-1, the absence of the transmembrane domain in the β-subunit of Dappu_InR-1 and Dappu_InR-3, and the absence of the cysteine-rich region in Dappu_InR-2. Despite these unusual features, the Daphnia pulex insulin signaling pathway has characteristics (few ILPs and more than one InR) that are more similar to the mammalian ISP, which makes D. pulex an ideal candidate model organism for the study of this pathway. Future studies will examine whether other daphniids also possess four InR and what the evolutionary consequences of these duplications are in Daphnia pulex.
Molecular databases at the National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov and ENSEMBL http://www.ensembl.org/index.html were screened for vertebrate and invertebrate InR using combinations of keywords related to insulin receptors (e.g. insulin receptor, IGF receptor, insulin). BLAST searches were then performed using several well-described vertebrate and invertebrate InR sequences as queries. Other publicly accessible sequence databases were also screened to retrieve unannotated sequences: the Joint Genome Institute (JGI) Genome Portal http://www.jgi.doe.gov/Daphnia/, the Wellcome Trust Sanger Institute http://www.sanger.ac.uk, the Institute for Genome Research http://www.tigr.org/tdb/, Washington University Genome Sequencing Centre http://www.genome.wustl.edu, H-invitational database http://h-invitational.jp, Baylor College of Medicine http://www.hgsc.bcm.tmc.edu, Dictybase http://dictybase.org, and the Ciona intestinalis genome http://genome.jgi-psf.org/ciona/. Full-length or apparently full-length protein sequences (presence of an initial methionine) were included preferentially over partial sequences, although the latter were included for some organisms.
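The preference for apparently full-length sequences can be expressed as a simple ranking rule. In the sketch below, a record counts as apparently full length when it starts with methionine, as stated above; breaking ties by sequence length, the record fields and the function names are assumptions made for illustration, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ProteinRecord:
    organism: str
    accession: str  # hypothetical identifier
    sequence: str


def looks_full_length(seq: str) -> bool:
    """Apparently full length: the sequence begins with an initial methionine."""
    return seq.upper().startswith("M")


def preferred_records(records: Iterable[ProteinRecord]) -> Dict[str, ProteinRecord]:
    """Keep one record per organism, preferring apparent full length, then length."""
    def rank(rec: ProteinRecord) -> Tuple[bool, int]:
        return (looks_full_length(rec.sequence), len(rec.sequence))

    best: Dict[str, ProteinRecord] = {}
    for rec in records:
        current = best.get(rec.organism)
        if current is None or rank(rec) > rank(current):
            best[rec.organism] = rec
    return best


# Hypothetical records: a partial sequence is kept only when nothing better exists
records = [
    ProteinRecord("Homo sapiens", "A", "MATGGRRG"),
    ProteinRecord("Homo sapiens", "B", "ATGGRRGAAAA"),
    ProteinRecord("Sycon raphanus", "C", "KVLAA"),
]
for organism, rec in preferred_records(records).items():
    print(organism, rec.accession)
```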
We searched the Daphnia pulex genome http://genome.jgi-psf.org/Dappu1/Dappu1.home.html for candidate InR genes using a combination of keyword queries and the tblastn program. This database included a number of gene prediction sets as well as a combined prediction data set. Identification of putative InR gene orthologs was completed by multiple protein sequence alignments followed by phylogenetic analysis. In addition to the InR genes, we also annotated major components of the insulin signaling pathway (Table 1) using the same methods. We described all component members (determined based on sequence homology) and annotated them using an abbreviation of the genus and species name followed by the most common designation (for example, Dappu_InR-1 corresponds to the first putative InR in Daphnia pulex). Numbering of sequences does not imply homology with those of other organisms but rather the chronology of discovery or identification. We suggest that all proteins described in this study that have not been functionally characterized be considered putative until confirmed by appropriate functional assays.
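The naming convention above can be written as a small helper. The rule that the prefix is the first three letters of the genus plus the first two of the species epithet is inferred from "Dappu" (Daphnia pulex) and is an assumption; the function names are hypothetical.

```python
from typing import Optional


def species_prefix(binomial: str) -> str:
    """First three letters of the genus plus first two of the species epithet."""
    genus, species = binomial.split()[:2]
    return genus[:3].capitalize() + species[:2].lower()


def gene_label(binomial: str, designation: str, index: Optional[int] = None) -> str:
    """Build labels such as Dappu_InR-1; the index reflects order of identification."""
    label = f"{species_prefix(binomial)}_{designation}"
    return f"{label}-{index}" if index is not None else label


for i in range(1, 5):
    print(gene_label("Daphnia pulex", "InR", i))
```

As the text notes, the numeric suffix records the order in which sequences were identified and does not imply homology with identically numbered genes in other organisms.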
For phylogenetic analyses, alignments included additional InR sequences delimited by the well-conserved tyrosine kinase (TK) subdomain. This portion of the gene was selected so as to include taxa in which the InR was only partially sequenced (e.g. Sycon raphanus), which allowed the genetic relationships of a greater number of taxa from various groups to be assessed than would have been possible with the complete InR sequence. A total of 30 TK domain sequences were included in the phylogenetic analysis. Amino acid sequences were aligned using MUSCLE 3.7, and the alignment was refined with Gblocks 0.91b, which eliminates poorly aligned positions and divergent regions in order to focus on unambiguously aligned residues. The alignment of the ingroup taxa included 187 unambiguously aligned residues with no gaps, which were used to construct phylogenetic trees by two different analytical approaches: maximum likelihood (ML) and Bayesian inference (BI). We used ProtTest version 1.2.6 to select the substitution model that best fit the empirical data set. The ML and Bayesian analyses were then performed using the fixed-rate amino acid replacement model of Whelan and Goldman (WAG) + G (α = 0.82). PHYML version 2.4.4 was used to find the ML tree, and the robustness of the inferred tree was assessed by bootstrapping (500 pseudoreplicates) as implemented in PHYML. Bayesian analysis was performed using MrBAYES version 3.1. Runs of 1,000,000 generations were executed with a sampling frequency of 10 and a burn-in parameter of 25,000. Stability of the likelihood scores was assessed in preliminary trials before setting the burn-in parameter. To confirm that the results converged to the same topology, we repeated the analysis three times. Bayesian posterior probabilities were calculated using the Markov chain Monte Carlo sampling approach implemented in MrBAYES v3.1. For both phylogenetic analyses, we used the Mus musculus epidermal growth factor receptor (EGFR) as the outgroup.
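Nonparametric bootstrap pseudoreplicates are generated by resampling alignment columns with replacement, and bootstrap support is the percentage of replicate trees that contain a clade; Bayesian posterior probabilities are likewise clade frequencies among the trees sampled after burn-in. The Python sketch below illustrates only these two bookkeeping steps. Tree inference itself (PHYML, MrBAYES) is not reproduced, each tree is represented hypothetically as a set of clades (frozensets of taxon names), and the function names are illustrative.

```python
import random
from typing import FrozenSet, Iterable, List, Set


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """One pseudoreplicate: resample alignment columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def pseudoreplicates(alignment: List[str], n: int = 500, seed: int = 1) -> List[List[str]]:
    """Generate n reproducible pseudoreplicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


def clade_support(trees: Iterable[Set[FrozenSet[str]]], clade: FrozenSet[str],
                  burnin: int = 0) -> float:
    """Percentage of trees containing `clade`, after dropping the first `burnin` trees.

    Here burn-in counts sampled trees, not MCMC generations.
    """
    kept = list(trees)[burnin:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    return 100.0 * sum(clade in tree for tree in kept) / len(kept)


# Hypothetical toy alignment; each replicate would be passed to a tree-building program
reps = pseudoreplicates(["MKVLA", "MKILA", "MRILG"], n=500)
print(len(reps), reps[0])
```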
We also performed the phylogenetic analyses on the complete InR sequence (for a restricted number of taxa) to verify whether the genetic relationships would be similar to those obtained with the TK domain.
Positive selection was tested using the Creevey-McInerney analysis implemented in the Crann program. This analysis compares the numbers of synonymous (S) and nonsynonymous (NS) substitutions by building a neighbor-joining tree based on dN values (NS substitutions per NS site) and then determining whether mutations are variable (occur more than once in the tree) or invariable (occur on only one branch of the tree). Four values are compared with those predicted by neutral theory: S variable (SV), S invariable (SI), NS variable (NSV), and NS invariable (NSI) mutations. If an NS value is significantly greater than its S counterpart, positive selection is detected as nondirectional (NSI>SI) or directional (NSV>SV).
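The four categories defined above can be tallied once substitutions have been mapped onto tree branches. The sketch below is a deliberately simplified, hypothetical illustration that classifies, per codon site, synonymous and nonsynonymous changes as variable (changes on more than one branch) or invariable (a change on a single branch). It does not reproduce Crann's tree construction or its statistical comparison with neutral expectations, and the input format is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One mapped change at a codon site: ("S" or "NS", branch identifier)
Change = Tuple[str, str]


def tally_cm_categories(site_changes: Dict[int, List[Change]]) -> Dict[str, int]:
    """Count SV, SI, NSV and NSI sites from per-site lists of mapped changes."""
    counts: Counter = Counter()
    for changes in site_changes.values():
        for kind in ("S", "NS"):
            branches = {branch for k, branch in changes if k == kind}
            if branches:
                # Variable if this kind of change occurred on more than one branch
                counts[kind + ("V" if len(branches) > 1 else "I")] += 1
    return {key: counts[key] for key in ("SV", "SI", "NSV", "NSI")}


# Hypothetical mapped changes for five codon sites
example = {
    1: [("S", "b1")],
    2: [("NS", "b1"), ("NS", "b3")],
    3: [("NS", "b2"), ("S", "b2"), ("S", "b4")],
    4: [],
    5: [("NS", "b5")],
}
print(tally_cm_categories(example))
```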
The different domains and subdomains were compared with consensus domains using BLAST http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi, SMART http://smart.embl-heidelberg.de, ScanProsite http://www.expasy.org/tools/scanprosite/ and MotifScan http://myhits.isb-sib.ch/cgi-bin/motif_scan. Sequences of human IGF-1R and D. melanogaster InR were aligned together with those of Dappu InR-1, Dappu InR-2, Dappu InR-3 and Dappu InR-4 using ClustalW (MEGA version 4.0) with the default parameters.
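ScanProsite matches PROSITE patterns such as the tyrosine kinase active-site signature, which is the kind of motif whose absence marks a missing catalytic site. The sketch below converts a simple PROSITE pattern into a Python regular expression and scans a sequence. The pattern string is PS00109 as we recall it and should be checked against the current PROSITE entry; the test peptides are short illustrative fragments resembling an InR catalytic loop (the second carries a hypothetical D-to-N substitution), and the converter handles only x, [..], {..} and (n) or (n,m) repeats, not the < and > anchors.

```python
import re
from typing import List, Tuple

# Tyrosine kinase active-site signature as we recall PROSITE PS00109 (verify before use)
TK_ACTIVE_SITE = "[LIVMFYC]-{A}-[HY]-x-D-[LIVMFY]-[RSTAC]-{D}-{PF}-N-[LIVMFYC](3)"

_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (no < or > anchors) into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [..] alternatives and single residues are already valid regex syntax
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


print(find_sites("KFVHRDLAARNCMVAHD", TK_ACTIVE_SITE))  # illustrative intact loop
print(find_sites("KFVHRNLAARNCMVAHD", TK_ACTIVE_SITE))  # hypothetical D->N variant
```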
Qualitative expression of the four Daphnia pulex insulin receptors was obtained by cDNA sequencing of condition-specific libraries. Data were extracted from wFleaBase and Gbrowse (http://wfleabase.org/genepage/daphnia/JGI_V11_314199, http://wfleabase.org/genepage/daphnia/JGI_V11_270048, http://wfleabase.org/genepage/daphnia/JGI_V11_268485, http://wfleabase.org/genepage/daphnia/JGI_V11_237791).
Detailed description of cDNA libraries can be found at https://dgc.cgb.indiana.edu/display/daphnia/cDNA+sequencing+project.
RT-qPCR was used to confirm the expression of the four receptors. Total RNA was extracted from 10 pooled adult Daphnia with no eggs in the brood pouch, from a single D. pulex clone, using a Qiagen RNA extraction kit. Reverse transcription was then performed on 2 μg of total RNA in 20 μL of 1 × M-MLV buffer (50 mmol·L-1 Tris-HCl, pH 8.3; 50 mmol·L-1 KCl; 3 mmol·L-1 MgCl2; 5 mmol·L-1 DTT) containing 0.5 mmol·L-1 dNTP mix, 2 μmol·L-1 oligo(dT)23 primer, 200 units of M-MLV RT, and 20 units of RNase inhibitor. Primers specific to each receptor were designed on unique exons (except for Dappu_InR4) of the predicted cDNA sequences from FleaBase. RT-qPCR was performed on a Bio-Rad MyiQ system. Each reaction contained 2 μL of cDNA template, 10 μL of QuantiFast SYBR Green PCR master mix (Qiagen) and 0.5 μM of each primer (DpInR1F: 5'-CAAACACGTCATCCACAAC-3'; DpInR1R: 5'-GCCGCTTCATAAACTCAAGTAAT-3'; DpInR2F: 5'-GCCGCTTCATAAACTCAAGTAAT-3'; DpInR2R: 5'-GCAATTTGACCGCCAGGATT-3'; DpInR3F: 5'-GAGGGTCAACAATGTAGCTGCTAAC-3'; DpInR3R: 5'-CAAAATTGGCTACGGGCACCTCA-3'; DpInR4F: 5'-AAAGAATGCATTCGCCGGAAGGAC-3'; DpInR4R: 5'-TTTCCTGTTCCGGCCAATGAAACCGCTCGAA-3'). PCR was carried out under the following conditions: 5 min at 95°C, then 50 cycles of 20 s at 95°C followed by 30 s at 60°C (InR1), 67°C (InR2 and InR3) or 72°C (InR4). Dissociation curves were performed to test for the presence of primer dimers.
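Primer GC content and an approximate melting temperature are common sanity checks when choosing annealing temperatures such as those listed above. The sketch uses the basic composition formula Tm = 64.9 + 41 × (G + C - 16.4) / N, a rough approximation for oligonucleotides longer than about 13 nt that ignores salt and primer concentrations; it is not necessarily how these primers were designed, and the function names are hypothetical. The primer sequences are copied from the text.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G+C in a primer sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate Tm in degrees C: 64.9 + 41 * (G + C - 16.4) / N (salt-independent)."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


PRIMERS: Dict[str, str] = {
    "DpInR1F": "CAAACACGTCATCCACAAC",
    "DpInR1R": "GCCGCTTCATAAACTCAAGTAAT",
    "DpInR2F": "GCCGCTTCATAAACTCAAGTAAT",
    "DpInR2R": "GCAATTTGACCGCCAGGATT",
    "DpInR3F": "GAGGGTCAACAATGTAGCTGCTAAC",
    "DpInR3R": "CAAAATTGGCTACGGGCACCTCA",
    "DpInR4F": "AAAGAATGCATTCGCCGGAAGGAC",
    "DpInR4R": "TTTCCTGTTCCGGCCAATGAAACCGCTCGAA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, Tm ~{basic_tm(seq):.1f} C")
```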
This work was supported by NSERC grants to FD. The sequencing and portions of the analyses were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by The Center for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Our work benefits from, and contributes to, the Daphnia Genomics Consortium. This manuscript was improved by comments from John Colbourne. We thank JunJian Chen for providing help with the qPCR analyses.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.