Unusual duplication of the insulin-like receptor in the crustacean Daphnia pulex

Background: The insulin signaling pathway (ISP) has a key role in major physiological events such as carbohydrate metabolism and growth regulation. The ISP has been well described in vertebrates and in a few invertebrate model organisms but remains largely unexplored in non-model invertebrates. This study is the first detailed genomic study of this pathway in a crustacean species, Daphnia pulex.
Results: The Daphnia pulex draft genome sequence assembly was scanned for major components of the ISP, with special attention to the insulin-like receptor. Twenty-three putative genes are reported. The pathway appears to be generally well conserved, as genes found in other invertebrates are present. Major findings include a lower number of insulin-like peptides in Daphnia as compared to other invertebrates and the presence of multiple insulin-like receptors (InR), with four genes as opposed to a single one in other invertebrates. The genes encoding the Dappu_InR are likely the result of three duplication events and bear some unusual features. Dappu_InR-4 has undergone extensive evolutionary divergence and lacks the conserved site of the catalytic domain of the receptor tyrosine kinase. Dappu_InR-1 has a large insert and lacks the transmembranal domain in the β-subunit. This domain is also absent in Dappu_InR-3. Dappu_InR-2 is characterized by the absence of the cysteine-rich region. Real-time q-PCR confirmed the expression of all four receptors. EST analyses of cDNA libraries revealed that the four receptors were differently expressed under various conditions.
Conclusions: Duplications of the insulin receptor genes might represent an important evolutionary innovation in Daphnia, as these animals are known to exhibit extensive phenotypic plasticity in body size and in the size of defensive structures in response to predation.


Background
The insulin signaling pathway is evolutionarily well conserved among multicellular organisms. In humans and higher vertebrates, signaling through the insulin pathway is critical for the regulation of intracellular and blood glucose levels and is of pivotal importance in metabolic diseases, such as diabetes, and in cellular processes such as ageing [1][2][3]. It is also well recognized as a major regulator of growth in both vertebrates and invertebrates and can also trigger diapause and regulate ageing in invertebrates [4][5][6][7][8].
The insulin signaling pathway is initiated by the secretion of insulin-like peptides (ILPs) in response to elevated glucose and amino acid levels [9]. Binding of an ILP initiates a complex cascade of events, starting with phosphorylation of specific tyrosine residues on the insulin/IGF-like receptors (InR) [10]. Once activated, these receptors phosphorylate a number of docking proteins; the best characterized are the insulin receptor substrate (IRS/Chico) proteins [11]. IRS proteins interact with other intracellular signaling molecules, primarily through src homology 2 (SH2) domains, leading to the activation of several downstream pathways. These in turn coordinate and regulate vesicle trafficking, protein synthesis, and glucose uptake [12]. The InR and its substrates therefore constitute the first critical node in the insulin signaling network and define the full set of proteins that are tyrosine phosphorylated upon ILP stimulation.
The InR belongs to an ancient transmembrane receptor tyrosine kinase (RTK) superfamily [13]. Unlike the other members of the RTK family, the InR forms a tetramer of two extracellular alpha (α) subunits, linked by disulphide bonds to two beta (β) subunits, which pass through the cell membrane and have intracellular tyrosine kinase domains. Both α and β chains are synthesized from a single mRNA with a variable number of exons [14]. Although the function and components of the insulin signaling pathway are exceptionally well conserved in evolution, major changes have occurred between vertebrates and invertebrates at the beginning of the pathway. In particular, early duplication events in vertebrates gave rise to specialized receptors: the insulin receptor, the type 1 insulin-like growth factor receptor (IGF1R) and the insulin receptor-related receptor (IRR) [15]. The primary function of the insulin receptor is to control blood sugar homeostasis, while the IGF1R promotes pre- and post-natal growth. These receptors show a relatively high specificity for their respective ligands [16]. The IRR is called an orphan receptor because its ligand and biological function are still unknown.
The insulin signaling pathway has been extensively characterized in the fruitfly Drosophila melanogaster and the worm Caenorhabditis elegans and partially described for a large variety of invertebrates, from sponges to insects [17][18][19][20][21]. Recent studies have revealed that invertebrates possess many more insulin-like peptides than expected, up to 37 in C. elegans [22]. Although there are numerous insulin-like peptides in invertebrates, only one insulin/IGF receptor homolog has been described to date in each of them (although two distinct IR homologs have been found in the parasitic trematode Schistosoma mansoni [23]). In this study, we report for the first time the presence of four InR homologs in Daphnia pulex (Crustacea, Cladocera). Phylogenetic analyses confirmed that the four receptors belong to the insulin receptor family. We discuss the evolutionary significance of the duplications and highlight the significance of these results for the biology of this species.

Results
Analyses of the newly sequenced D. pulex genome, the first crustacean draft genome sequence, allowed us to identify key components of the insulin pathway. The D. pulex genome includes 25 putative genes belonging to the insulin signaling pathway, which include all major components of this signaling system. Daphnia pulex appears to have a relatively low number of insulin-like peptides as compared to other invertebrates but has four putative insulin-like receptors (Table 1). The rest of the pathway appears similar to what is known from other invertebrates, with no evidence of duplication (Figure 1).

Phylogenetic analyses of the receptors
Phylogenetic relationships of Dappu_InR1-R4 with various invertebrate and vertebrate InR were investigated.
The tyrosine kinase (TK) domains of diverse RTK and InR were aligned using MUSCLE 3.7, and the alignments were refined with Gblocks 0.91b, which eliminates poorly aligned positions and divergent regions in order to focus on unambiguously aligned residues. The RTK and InR alignments used in the phylogenetic analyses included 187 amino acids, which represent 63% of the original 295 positions. Overall, there were 30 conserved positions and 122 parsimony-informative sites.
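To illustrate how counts of conserved and parsimony-informative positions can be obtained from a filtered alignment, here is a minimal Python sketch. The function name classify_columns and the toy alignment are hypothetical and are not taken from the study; the sketch assumes gap-free columns of equal length, as expected after Gblocks filtering, and uses the standard definition of a parsimony-informative site (at least two character states, each present in at least two sequences).

```python
from collections import Counter


def classify_columns(alignment):
    """Return (conserved, parsimony_informative) column counts for an alignment.

    `alignment` is a list of equal-length, gap-free sequences (hypothetical input).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    conserved = 0
    informative = 0
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        if len(counts) == 1:
            # Every sequence carries the same residue at this position.
            conserved += 1
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            # At least two residues are each shared by two or more sequences.
            informative += 1
    return conserved, informative


# Toy alignment, for illustration only (not real InR data).
toy = ["MKVLA", "MKVLG", "MRILG", "MRIYA"]
conserved, informative = classify_columns(toy)

# Fraction of positions retained after filtering, using the figures in the text.
retained_percent = round(100 * 187 / 295)
```

In the toy alignment, the fourth column is variable but not parsimony-informative, because the alternative residue occurs in a single sequence.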
The ML and Bayesian analyses of the InR, using the mouse RTK EGFR as outgroup, recovered the phylogenetic tree (ML: -logL = 4403.76) shown in Figure 2. The trees were generally reliable, consisting of five major clusters, namely porifera, nematodes, molluscs, deuterostomes, and arthropods. The main differences between the phylogenies produced by maximum likelihood and Bayesian analyses were the branch lengths and the position of Ciona intestinalis, which branched with Drosophila under maximum likelihood but not under Bayesian analysis. The phylogenetic analyses confirmed the monophyly of the four Daphnia insulin receptors, suggesting an origin through duplication events. The fourth InR was found to have undergone extensive evolutionary divergence, as its branch was much longer than those of the other InR. The D. pulex InR were found to be closely related to those of the other arthropods. The D. melanogaster InR showed a surprising grouping with the D. pulex InR, although the bootstrap support for this phylogenetic relationship was moderate. The phylogeny produced with the complete InR sequence confirmed the monophyly of the four D. pulex InR and the longer branch of Dappu_InR4 but placed D. melanogaster with the other insect taxa.

Comparative analyses of the insulin receptors
The length of the open reading frames of the Daphnia pulex insulin receptors ranged from 1313 to 1865 amino acids. The A+T composition was very similar among the four receptors. The exon-intron structure of each gene was relatively complex and variable; receptors 1 to 4 contained 26, 15, 20 and 19 introns, respectively (Table 1). Two assembly gaps were found, one of 6 kb in Dappu_InR1 located after exon 13 and another of 1 kb in Dappu_InR2. Dappu_InR3 and Dappu_InR4 did not show gaps (Additional Figure 1).
The insulin receptor is a modular protein containing several specific functional domains. The structural analysis of the D. pulex insulin receptors revealed the presence of nearly all of the typical domains found in well characterised insulin receptors such as those of human or Drosophila (Figure 3). Surprisingly, the fourth receptor lacked the conserved site of the catalytic domain of the receptor tyrosine kinase. The insulin receptor is transmembranal and formed by two α and two β subunits linked in a disulfide β-α-α-β configuration. Dappu_InR1, Dappu_InR2, Dappu_InR3, and Dappu_InR4 possessed a 20 a.a. transmembrane domain at the beginning of the gene, almost in the same region as the one identified in Drosophila. Dappu_InR2 had a second transmembranal domain of 20 a.a. located just before the tyrosine protein kinase domain. The structure of the extracellular region of the insulin receptor is predicted to be composed of six distinct structural domains as follows: two homologous domains L1 and L2 flanking a cysteine-rich domain CR, followed by three fibronectin type III repeats (Fn0, Fn1, and Fn2). All four analysed receptors included two L domains, but Dappu_InR-2 lacked the cysteine-rich domain CR. Consequently, in this receptor, the two L domains are separated by only one a.a. In the extracellular region, we also identified fibronectin type III domains. Dappu_InR1, Dappu_InR2, Dappu_InR3, and Dappu_InR4 contained three highly similar fibronectin regions, as in Homo sapiens and Drosophila (regions I-II-IV in Figure 3). Dappu_InR1 included an additional fibronectin domain that we identified as an insert. The insertion (333 a.a. in length) was located immediately after a predicted basic cleavage site (KRR847) and had a tyrosine protein kinase domain that itself included a protein kinase ATP binding conserved site.
Figure 1 Daphnia pulex insulin and TOR signalling pathways, adapted from Drosophila melanogaster [39]. The D. pulex insulin/insulin-like growth factors signalling (IIS) comprises a group of insulin-like peptides (ILPs), four insulin receptor (InR1, InR2, InR3 and InR4) genes, an insulin receptor substrate (IRS), the type phosphatidylinositol-3-kinase (PI3K p85/p60 and PI3K p110), the lipid phosphatase PTEN, the protein kinase PKB/AKT, the phosphoinositide-dependent protein kinase 1 (PDK-1) and the transcriptional factor FOXO. The TOR-pathway includes a TSC complex (TSC1 and TSC2), a small GTPase RHEB, the target of rapamycin (TOR), two S6 kinases (S6K1 and S6K2), the 4E-binding protein (4EBP) and the Pol I transcription factor TIF-1A. The AMPK-pathway involves the activation of AMP-dependent kinase (AMPK) by the LKB1 protein kinase.
The intracellular region of the insulin receptor, which carries the cytoplasmic tyrosine kinase activity, is responsible for transducing the ligand-induced signal into metabolic and mitogenic responses. Compared to the extracellular ligand-binding region, this portion of the insulin receptor was much more conserved, both among the four Daphnia insulin receptors and in comparison with the other taxa (Figure 4). Dappu_InR1, Dappu_InR2, and Dappu_InR3 exhibited a protein tyrosine kinase catalytic domain signature which included an ATP binding site, a sequence required for ATP stabilization, a motif implicated in phosphotransfer, a Mg2+ binding site, the consensus "PVRWMAPE" and two juxtaposed autophosphorylation sites characteristic of the InR domain (for details on the conserved peptide motifs that define the 11 protein kinase subdomains see [24]).
Overall, the analyses of the gene arrangements of the four InR of D. pulex revealed that the extracellular parts were more variable than the intracellular components. Dappu_InR-3 appeared to be the most typical or conserved of the four, as Dappu_InR-2 lacked the CR region and Dappu_InR-1 had a long insert including an additional FN type 3 domain as well as an additional protein kinase domain. Dappu_InR4 lacked the catalytic site of the tyrosine kinase.
The dN:dS ratios did not differ significantly among the species tested or among the four Daphnia receptors and were mostly lower than one in all pairwise comparisons, which is consistent with purifying selection.

Expression analyses
Alignment of EST sequences from condition-specific cDNA libraries to the draft genome sequence scaffolds revealed that all four InR were expressed under various conditions (Table 2). RT-qPCR confirmed the expression of all four receptors. Single products were found for each receptor, and amplification efficiencies were close to 95%.
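The text does not state how the amplification efficiency was estimated. A common convention, assumed here purely for illustration, derives it from the slope of a standard curve of quantification cycle against log10 template amount, with E = 10^(-1/slope) - 1. The function name and the example slope in this Python sketch are hypothetical and are not values from the study.

```python
import math


def qpcr_efficiency(slope):
    """Amplification efficiency from a standard-curve slope (Cq vs log10 template).

    Assumes the usual convention E = 10 ** (-1 / slope) - 1, where a slope of
    about -3.32 corresponds to perfect doubling per cycle (E = 1.0, i.e. 100%).
    """
    if slope >= 0:
        raise ValueError("a standard-curve slope is expected to be negative")
    return 10 ** (-1.0 / slope) - 1.0


# Slope for perfect doubling, derived rather than hard-coded.
perfect_slope = -1.0 / math.log10(2)

# Hypothetical slope chosen only to illustrate an efficiency near 95%.
example_efficiency = qpcr_efficiency(-3.45)
```

Under this convention, efficiencies close to 95% correspond to slopes slightly steeper than the perfect-doubling slope.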

Discussion
In this study, we provide the first detailed genomic study of the insulin signaling pathway in a crustacean species. As expected, all the components of the insulin signaling pathway were present, confirming the conservation of this pathway in metazoans. Few insulin-like peptides were found in Daphnia in contrast to other invertebrates. For example, D. melanogaster and C. elegans possess 7 and 37 ILPs respectively [22]. We report here the presence of four putative InR in the D. pulex genome, a unique case in arthropods. A single InR gene has been described in a wide variety of invertebrates including freshwater molluscs, nematodes and insects. A duplication event of the insulin receptor has also been reported in the helminth Schistosoma mansoni, but the structural and functional particularities of one of these receptors (SmIR-1) argue for its involvement in the parasitic lifestyle of this organism [23]. In vertebrates, two duplication events have led to the presence of three InR paralogs (InR, IGF1R and IRR) [15]. The presence of three or more paralogs in vertebrates but only one gene in invertebrates is a common pattern in many protein families. Our results contrast with this trend. The assembly gaps in two of these InR genes will need to be resolved for a complete understanding, but our analyses of the non-gap regions support the reported biological differences.
Following a duplication event, three fates are possible for duplicated genes [25]. Briefly, (1) one duplicate may accumulate mutations that destroy its function; (2) during the period of relaxed selection following the duplication, one copy can acquire a beneficial mutation resulting in a new function (neofunctionalization); and (3) the original functions of the single-copy gene may be partitioned between the duplicates (subfunctionalization). In our results, Dappu_InR-4 may have acquired a new function, based on three lines of evidence: 1) the modification of the TK motif involved in phosphotransfer, 2) the longer branch seen on the phylogeny, suggesting an unusually rapid rate of evolution of Dappu_InR-4 as compared to other InR, and 3) results from cDNA sequencing showing that this receptor was sampled in libraries created under various conditions. Expression studies will need to be performed to determine whether the original function has been partitioned among the four duplicates (subfunctionalization) or whether Dappu_InR4 has acquired a new function by neofunctionalization. The three other InR appear functional even if their ectodomains revealed some intriguing structural features. In the well described InR of vertebrates, the β-subunit includes a single α-helical hydrophobic transmembrane domain [26]. Transmembrane domain interactions are thought to be involved in the negative cooperativity of the InR [27]. Among the four InR, only Dappu_InR-2 possesses this transmembranal domain on the β-subunit. However, a transmembranal domain has been identified in all four InR near the N-terminus of the α-subunit. To our knowledge, this is the first report of a second transmembranal domain located in the α-subunit of the InR. This particularity is apparently not restricted to D. pulex, as we found this domain in insects (e.g. Tribolium castaneum, Nasonia vitripennis and Drosophila melanogaster) and vertebrates (e.g. Mus musculus) (data not shown).
Another important difference observed among the InR is the absence of the furin-like cysteine-rich domain in Dappu_InR-2. The cysteine residues of this domain that are involved in the formation of disulfide bonds maintain the quaternary structure of the extracellular portion of the receptor [26]. These cysteines are highly conserved in all paralogs across vertebrates. A similar pattern is found in Drosophila [28], where the receptor assembles into a quaternary structure but only 2 out of the 6 cysteines described in the human InR are conserved.
Experiments with chimeric InR have confirmed that the cysteine-rich domain contributes to the insulin/IGF1 binding specificity. In the nematode C. elegans, the cysteine-rich region is much shorter and contains only two conserved cysteines [29]. It is currently not known whether the absence of this cysteine-rich domain compromises or modifies the activity of the receptor. We observed the presence of a large insert of 333 amino acids in Dappu_InR-1 that included a fourth fibronectin type-3 domain and a TK domain. As this insert contains additional motifs characteristic of InR, it remains possible that this sequence bears a specific function. As there is already evidence from ESTs that these four InR are expressed, further studies are needed to determine whether they are expressed in different tissues and/or life stages, which would suggest that they have different functions. As Daphnia are known to exhibit extensive phenotypic plasticity in body size and in the size of defensive structures in response to predation [30,31], duplications of the insulin receptor genes might represent an important evolutionary innovation in this group.

Conclusion
The insulin signaling pathway is of pivotal importance in metabolic diseases, regulates growth and ageing, and can also trigger diapause. This pathway has been well described in vertebrates and in a few invertebrate model organisms but remains largely unexplored in non-model invertebrates. Our study showed the presence of four InR, which are likely the result of three duplication events. All four receptors were found to be expressed. One of these receptors (Dappu_InR-4) has likely changed its function but does not appear to be under positive selection. The three other receptors present some unusual structural features, such as the presence of a large insert in Dappu_InR-1, the absence of the transmembranal domain in the β-subunit of Dappu_InR-1 and Dappu_InR-3, and the absence of the cysteine-rich region in Dappu_InR-2. Although these InR have some unusual features, the Daphnia pulex insulin signaling pathway presents some characteristics (few ILPs and more than one InR) that are more similar to those of the mammalian ISP, which makes it an ideal candidate as a model organism for the study of this pathway. Future studies will examine whether other daphniids also possess four InR and what the evolutionary consequences of these duplications are in Daphnia pulex.

Sequence retrieval
Molecular databases at the National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov and ENSEMBL http://www.ensembl.org/index.html were screened for vertebrate and invertebrate InR using any combination of words related to insulin receptors, e.g. insulin receptor, IGF receptor, insulin, etc. Blast searches were then performed using several well described vertebrate and invertebrate InR sequences as the queries. Other sequence databases that are publicly accessible were also screened to retrieve unannotated sequences. The following websites were used: Joint Genome Institute (JGI) Genome Portal http://www.jgi.doe.gov/Daphnia/, the Wellcome Trust Sanger Institute http://www.sanger.ac.uk, The Institute for Genomic Research http://www.tigr.org/tdb/, Washington University Genome Sequencing Centre http://www.genome.wustl.edu, H-invitational database http://h-invitational.jp, Baylor College of Medicine http://www.hgsc.bcm.tmc.edu, Dictybase http://dictybase.org, Ciona intestinalis genome http://genome.jgi-psf.org/ciona/. Protein sequences that were of full length or apparently of full length (presence of initial methionine) were included preferentially over partial sequences, although the latter were included for some organisms.

Annotation of insulin signaling pathway in D. pulex
We searched the Daphnia pulex genome http://genome.jgi-psf.org/Dappu1/Dappu1.home.html for candidate InR genes using a combination of key-word queries and the tblastn program. This database included a number of gene prediction sets as well as a combined prediction data set. Identification of putative InR gene orthologs was completed by multiple protein sequence alignments followed by phylogenetic analysis. In addition to the InR genes, we also annotated major components of the insulin signaling pathway (Table 1) using the same methods as described above. We described all component members (determined based on sequence homology) and annotated them using an abbreviation of the genus and species names followed by the most common designation (for example, Dappu_InR-1 corresponds to the first putative InR in Daphnia pulex). Numbering of sequences does not imply homology with those of other organisms but rather chronology of discovery or identification. We suggest that all proteins described in this study, which have not been functionally characterized, be considered as putative until confirmed by appropriate functional assays.
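The naming convention described above can be sketched in a few lines of Python. The exact abbreviation rule is not spelled out in the text; the rule used here (first three letters of the genus plus first two letters of the species) is an assumption inferred from the single example Dappu_InR-1, and the function name gene_label is hypothetical.

```python
def gene_label(genus, species, designation, copy=None):
    """Build a label such as 'Dappu_InR-1'.

    Assumed rule (inferred from the example in the text, not stated by the
    authors): three letters of the genus followed by two letters of the species.
    """
    prefix = genus[:3].capitalize() + species[:2].lower()
    label = f"{prefix}_{designation}"
    # The copy number reflects order of identification, not homology.
    return f"{label}-{copy}" if copy is not None else label


# Labels for the four putative receptors discussed in the text.
receptor_labels = [gene_label("Daphnia", "pulex", "InR", n) for n in range(1, 5)]
```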

Sequence alignment and phylogenetic analysis
For phylogenetic analyses, alignments included additional InR sequences delimited by the well conserved tyrosine kinase (TK) subdomain. This portion of the gene was selected in order to include taxa in which the InR was only partially sequenced (e.g. Sycon raphanus), and it allowed the assessment of the genetic relationships of a greater number of taxa from various groups than was possible with the complete InR sequence. A total of 30 TK domain sequences were included in the phylogenetic analysis. Amino acid sequences were aligned using MUSCLE 3.7 [32], and the alignment was refined with Gblocks 0.91b [33], which eliminates poorly aligned positions and divergent regions in order to focus on unambiguously aligned residues. The alignment of the ingroup taxa included 187 unambiguously aligned residues with no gaps and was used for the construction of phylogenetic trees by two different analytical approaches: maximum likelihood (ML) and Bayesian inference (BI). We used ProtTest version 1.2.6 to select the substitution model that best fit the empirical data set [34]. The ML and Bayesian analyses were then performed using the fixed-rate amino acid replacement model of Whelan and Goldman (WAG) with gamma-distributed rate variation (+G, α = 0.82). PHYML version 2.4.4 [35] was used to find the ML tree. The robustness of the inferred tree was assessed using bootstrapping (500 pseudoreplicates) as implemented in PHYML. Bayesian analysis was also performed using MrBAYES software, version 3.1 [36]. Runs of 1,000,000 generations were executed, with a sampling frequency of 10 and a burn-in parameter of 25,000. Stability of the likelihood scores was assessed in preliminary trials before setting the burn-in parameter. To confirm that the results converged to the same topology, we repeated the analysis three times. Bayesian posterior probabilities were calculated using a Markov chain Monte Carlo sampling approach implemented in MrBAYES v3.1. For both phylogenetic analyses, we used Mus musculus epidermal growth factor receptor (EGFR) as outgroup.
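As a rough check on what these run settings imply, the following Python sketch works out how many trees are sampled and how many remain after burn-in. The text does not say whether the burn-in of 25,000 is counted in sampled trees or in generations, so both readings are shown as assumptions; the function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_every, burn_in):
    """Return (total_samples, kept_if_burn_in_in_samples, kept_if_burn_in_in_generations)."""
    total = generations // sample_every
    # Reading 1 (assumption): burn-in is a number of sampled trees to discard.
    kept_samples_reading = total - burn_in
    # Reading 2 (assumption): burn-in is a number of generations to discard.
    kept_generations_reading = total - burn_in // sample_every
    return total, kept_samples_reading, kept_generations_reading


# Settings reported in the text: 1,000,000 generations, sampling every 10, burn-in 25,000.
total, kept_a, kept_b = mcmc_sample_counts(1_000_000, 10, 25_000)
```

Under the first reading, one quarter of the 100,000 sampled trees is discarded; under the second, only 2,500 sampled trees are discarded.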
We also performed the phylogenetic analyses on the complete InR sequence (but on a restricted number of taxa) to verify whether the genetic relationships would be similar to those obtained with the TK domain.
Positive selection was tested using the Creevey-McInerney [37] analysis as implemented in the Crann program. This analysis compares the numbers of synonymous (S) and nonsynonymous (NS) substitutions by creating a neighbor-joining tree based on dN values (NS substitutions per NS site) and then determining whether mutations are variable (occur more than once in the tree) or invariable (occur in only one branch of the tree). Four values are compared to those predicted by neutral theory: S variable (SV), S invariable (SI), NS variable (NSV), and NS invariable (NSI) mutations. If the NS value is significantly greater than its S counterpart, then positive selection is detected as nondirectional (NSI>SI) or directional (NSV>SV).
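The comparison of the four substitution classes can be illustrated with a 2x2 contingency table. The Python sketch below computes the counts expected if the variable-to-invariable ratio were the same for S and NS substitutions, together with a G statistic for the departure from that expectation. This is a simplified illustration under that stated assumption; it is not the Crann implementation, whose exact statistical procedure may differ, and the function name and example counts are hypothetical.

```python
import math


def neutral_expectation(sv, si, nsv, nsi):
    """Expected counts and G statistic for a 2x2 table of SV, SI, NSV and NSI.

    Rows are substitution type (S, NS); columns are variable / invariable.
    Expected counts assume the same variable:invariable ratio in both rows.
    """
    observed = [[sv, si], [nsv, nsi]]
    total = sv + si + nsv + nsi
    row_sums = [sv + si, nsv + nsi]
    col_sums = [sv + nsv, si + nsi]
    expected = [[row_sums[r] * col_sums[c] / total for c in range(2)]
                for r in range(2)]
    # G = 2 * sum(observed * ln(observed / expected)); empty cells contribute 0.
    g = 2.0 * sum(
        observed[r][c] * math.log(observed[r][c] / expected[r][c])
        for r in range(2) for c in range(2)
        if observed[r][c] > 0
    )
    return expected, g


# Hypothetical counts with identical ratios in both rows: no departure expected.
expected_equal, g_equal = neutral_expectation(20, 10, 40, 20)

# Hypothetical counts with an excess of variable NS substitutions.
expected_skewed, g_skewed = neutral_expectation(10, 20, 30, 10)
```

With one degree of freedom, a G value above about 3.84 would indicate a departure from the shared ratio at the 5% level; the direction of the excess (NSV or NSI above expectation) then indicates which form of selection is suggested.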