Evolution of secretin family GPCR members in the metazoa

Background: Comparative approaches using protostome and deuterostome data have greatly contributed to understanding gene function and organismal complexity. The family 2 G-protein coupled receptors (GPCRs) are one of the largest and best studied hormone and neuropeptide receptor families. They are suggested to have arisen from a single ancestral gene via duplication events. Despite the recent identification of receptor members in protostome and early deuterostome genomes, relatively little is known about their function or origin during metazoan divergence. In this study a comprehensive description of family 2 GPCR evolution is given based on in silico and expression analyses of the invertebrate receptor genes.
Results: Family 2 GPCR members were identified in the invertebrate genomes of the nematodes C. elegans and C. briggsae, the arthropods D. melanogaster and A. gambiae (mosquito) and the tunicate C. intestinalis. This suggests that they are of ancient origin and have evolved through gene/genome duplication events. Sequence comparisons and phylogenetic analyses demonstrated that the immediate gene environment, with regard to gene content, is conserved between the protostome and deuterostome receptor genomic regions, and that the protostome genes are more like the deuterostome Corticotrophin Releasing Factor (CRF) and Calcitonin/Calcitonin Gene-Related Peptide (CAL/CGRP) receptor members than the other family 2 GPCR members. The evolution of family 2 GPCRs in deuterostomes is characterised by the acquisition of new family members, with SCT (Secretin) receptors only present in tetrapods. Gene structure is characterised by an increase in intron number with organismal complexity, with the exception of the vertebrate CAL/CGRP receptors.
Conclusion: The family 2 GPCR members provide a good example of gene duplication events occurring in tandem with increasing organismal complexity during metazoan evolution. The putative ancestral receptors are proposed to be more like the deuterostome CAL/CGRP and CRF receptors and this may be associated with their fundamental role in calcium regulation and the stress response, both of which are essential for survival.


Background
The G-protein coupled receptor (GPCR) family is one of the largest receptor groups in vertebrates. Members of this family are also present in unicellular eukaryotes such as yeast and in plants, which suggests that they are of ancient origin [1]. In the human genome, GPCRs account for approximately 2% of the coding genes [2,3] and they bind structurally diverse ligands such as protons, odorants, biogenic amines, peptides and glycoproteins [4,5]. Recent analysis of the human genome identified five main GPCR subfamilies collectively known as GRAFS (Glutamate, G; Rhodopsin, R; Adhesion, A; Frizzled, F; and Secretin, S) [1,6]. This grouping was based on protein motifs characterised by the presence of seven highly conserved transmembrane domains (TM). Several authors have proposed the existence of a common ancestral gene in early metazoans that, as a consequence of successive duplication events, generated the full complement of family members in vertebrates [6,7]. Such a proposal is in general agreement with the genome duplication theories of Haldane 1932 [8], Muller 1935 [9] and Ohno 1970 [10], all of which suggest that the existence of gene family members in chordates is a consequence of genome duplication events in the vertebrate lineage during evolution and that gene duplicates are an essential source of organism diversity.
The present work focuses on the secretin family (also known as family B or family 2) of GPCRs, which represents one of the largest receptor families for hormones and neuropeptides involved in several important biological functions. Previous in silico analysis identified a total of 50 family 2 GPCR members in the human genome [6,11,12]. However, only receptor members of the following groups have been functionally characterised and identified in other vertebrates such as birds, amphibians and teleosts [15][16][17][18]: a) Corticotrophin Releasing Factor (CRF); b) Secretin (SCT), Vasoactive Intestinal Peptide (VIP), Pituitary Adenylate Cyclase-Activating Polypeptide (PACAP) and Growth Hormone Releasing Hormone (GHRH); c) Glucagon (GCG), Glucagon-Like Peptide (GLP) and Glucose-dependent Insulinotropic Polypeptide (GIP); d) Parathyroid Hormone (PTR) and e) Calcitonin (CAL) and Calcitonin Gene-Related Peptide (CGRP) [13,14]. So far, putative family 2 GPCRs have been found in invertebrates based upon sequence similarity and phylogenetic studies [1,14,17,19,20], but their evolution and function are poorly described. In this study, comparative analyses using phylogenetically distant organisms have been used to study the evolution of members of the family 2 GPCRs in metazoans (Figure 1). Gene family members were characterised in protostome (nematode and arthropod) and tunicate (Ciona) genomes and compared with their vertebrate homologues (Takifugu and human). Putative family 2 GPCR members were identified and isolated in silico from public genome databases and their expression was analysed by RT-PCR. The gene structures and gene environments vis à vis gene content of the protostome and deuterostome receptors were compared and a model for the evolution of family 2 GPCRs is proposed.

Putative family 2 GPCRs in invertebrates
Initially, three, five and nine putative family 2 GPCR genes were identified, respectively, in the protostome genomes of the nematodes (C. elegans and C. briggsae) and the arthropods (D. melanogaster and Anopheles gambiae) and in the deuterostome genome of the tunicate C. intestinalis. With the exception of the C. briggsae and mosquito receptor genes, the protostome and tunicate predicted gene sequences were edited taking into consideration the available EST data in order to minimise errors derived from in silico gene predictions (Additional file 1). Sequence comparison and other in silico approaches revealed that, of these genes, only two in the nematodes, three in the fruit fly, one in the mosquito and eight in the Ciona genome contained seven TM domains and were considered to be putative family 2 GPCR members (Figure 2, Table 1). Database searches carried out in the prokaryote E. coli and the unicellular eukaryote S. cerevisiae genomes did not produce any significant alignments for family 2 receptor genes.

Phylogenetic analysis
Because of gaps and errors in draft genome data, only four TM domains were used in the full analysis: TM2, TM4, TM5 and TM6, which were found to be common to all metazoan genes analysed (Additional file 2). The amino acid sequences of the TM domains of a total of 52 receptors were concatenated and aligned using the ClustalX programme (Blosum matrix, gap opening penalty 10 and gap extension penalty 0.2). The alignment produced (length 98, with 99 informative sites) did not require the insertion of gaps and was used for phylogenetic analysis; the consensus tree obtained is presented in Figure 2. Family 2 GPCR members are suggested to have evolved via both early and late gene duplication events during metazoan evolution. Examples of specific gene duplication events that are well supported by high bootstrap values at the tree nodes are the Ciona gene pairs CinS5A/CinS5B, CinS372/CinS752 and CinS50/CinS273. In neighbour joining, maximum parsimony and minimum evolution phylogenetic analyses, the protostome receptors and the Ciona CinS93, CinS50 and CinS273 genes tended to cluster with the deuterostome CRF and CAL/CGRP receptor family members. In Ciona, orthologues of the majority of the vertebrate family members are present: CinS752 and CinS372 group with the PTR receptor subfamily, whilst the CinS5A, CinS5B and CinS70 genes seem to be more related in sequence to the GCG/GLP/GIP and PAC/VPAC/GHRH families. However, the position of CinS5A and CinS5B in the tree was not clearly defined.

Sequence comparative analyses
Comparison of the amino acid sequences of the putative protostome and tunicate receptors with the vertebrate homologues revealed the existence of conserved amino acid motifs at the ligand-binding N-terminal region (Figure 3). Large N-terminal regions containing five conserved cysteines were identified in all species analysed, with the exception of the Ciona CinS273 and CinS70 genes, which lacked complete N-terminal regions owing to incomplete genome coverage and EST data, and CinS50, where the putative initial methionine was not identified. Other highly conserved amino acid residues were also identified, such as the aspartate (D) before the motif C-W-P and the amino acid motifs C-W-P, C-P and G-X-W (where X is any amino acid), which are crucial for ligand binding in mammals [21,22]. Moreover, the glycine (G) localised between this latter motif and the C-P motif was also found to be conserved amongst metazoans; however, no functional role has yet been assigned to this residue.
Phylogenetic position of the protostome and deuterostome genomes analysed
Figure 2 Phylogenetic relationship of the metazoan family 2 GPCRs. Consensus phylogenetic tree (neighbour joining method, pairwise gap deletion, Poisson correction distance and 1000 bootstraps) produced with family 2 GPCR TM domains (TM2, TM4, TM5 and TM6). The protostome (nematodes and arthropods) and tunicate (Ciona) receptors are underlined and the bootstrap value for each fork is indicated. Bootstrap values less than 50 were removed. Annotation of the receptor subfamilies was carried out according to Donnelly (1997) [13]. The human (Hsa), Takifugu (Tru), nematode C. elegans (Cel) and C. briggsae (Cbr), arthropod D. melanogaster (Dme) and A. gambiae (Aga) and Ciona (Cin) receptor sequences were obtained as described in the methods section.
To strengthen the previous analysis and to identify novel amino acid and protein motifs that have been conserved within each receptor subfamily and that might have a role in ligand binding, the N-terminal regions of the protostome and deuterostome receptors were compared within the receptor clustering groups previously obtained by phylogenetic analysis. Putative protostome and tunicate members of the CAL/CGRP, CRF, PTH and GCG/GLP/GIP receptor families were aligned with the vertebrate homologue receptor genes (Additional files 3, 4, 5 and 6). Examples of novel conserved N-terminal protein motifs identified are: the W-S/T-N-Y/F motif in the CAL/CGRP receptor alignment (Additional file 3); the motif G-V/I-X-Y (X any amino acid) within the CRF receptor group (Additional file 4), which has previously been reported to be involved in ligand binding [23,24]; the motifs E-W, P-G, Y-I-Y/I-D-F-D/N-H and A-X-R (X any amino acid) in the PTR group, together with amino acid R186 of HsaPTR1 (Additional file 5), which was previously found to be a determinant of PTH binding [25]; and the motif Y-L/I-P/E-W within the GCG/GLP/GIP receptor group (Additional file 6).
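As a minimal sketch of how motifs written in this notation could be located in an N-terminal sequence, the Python snippet below translates a few of them into regular expressions (X becomes any residue; alternatives such as S/T become a character class). The example sequence and the helper name find_motifs are hypothetical and are not taken from the study, in which the motifs were identified from ClustalX alignments and not by pattern search.

```python
import re

# Motifs quoted in the text, written as regular expressions.
# "X" (any amino acid) becomes ".", alternatives such as S/T become "[ST]".
MOTIFS = {
    "C-W-P": "CWP",
    "G-X-W": "G.W",
    "W-S/T-N-Y/F": "W[ST]N[YF]",
    "G-V/I-X-Y": "G[VI].Y",
}

def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for one protein sequence."""
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile("(?=" + pattern + ")")
        hits[name] = [m.start() for m in regex.finditer(sequence)]
    return hits

# Hypothetical N-terminal fragment, invented purely for illustration.
example = "MKDCWPAGTWSNYCPGLW"
result = find_motifs(example)
```

A pattern search of this kind only reports exact matches to the stated consensus; it does not replace the alignment-based comparison, which also shows whether a motif sits at an equivalent position in each receptor.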

Short-range linkage mapping analysis
Short-range linkage analysis was carried out between the protostome (C. elegans and D. melanogaster) and the deuterostome (Takifugu and human) homologous regions that contain family 2 receptor genes. The linked genes identified in both C. elegans and Takifugu were used to identify homologous genes in the Drosophila (Figure 4) and human (Figure 5) genomes. The protostome gene environment was found to be conserved and linked genes were identified between the nematode and insect genomes. The Drosophila X chromosome and chromosome III of C. elegans showed the greatest number of linked genes. Three of the C. elegans linked genes, namely 3H538, Clp-2 and him-4, were also found to be conserved in the homologous deuterostome genome regions analysed (Figure 5). A number of Takifugu scaffolds were found to share a similar gene environment with C. elegans chromosomes II and X, and human chromosomes 2, 3, 6, 7, and 17. This suggests that the nematode chromosome regions may be very similar in terms of gene content to the ancestral chromosomal region that gave rise to this family of receptors in vertebrates (Figure 5).

Gene organisation in protostomes and deuterostomes
The gene organisation of the regions encompassing the seven TM domains of the protostome and tunicate family 2 GPCR genes was characterised and compared with the homologous regions in human and Takifugu (Figure 6). Different gene structures were observed between protostomes and deuterostomes, mainly owing to an increase in intron number in the latter. Comparison of exon/intron boundaries revealed total conservation of splice sites (AG/GT), although intron phases are generally poorly conserved with the exception of the TM1 and TM2 boundary (Figure 6). Amongst protostome family 2 GPCR genes, organisation is poorly conserved. In nematodes, two different gene organisations were identified for the receptors CelC13B9.4/CbrCAE70126 (5 exons) and CelC18B12.2/CbrCAE63268 (6 exons), respectively. In the Drosophila genome, a different gene organisation also exists for each of the three receptor genes analysed. TM1 and TM2 in the Drosophila genes DmeCG32843 and DmeCG8422, in common with the nematode, tunicate and vertebrate receptors, are encoded by individual exons. The DmeCG13758 gene has the most divergent gene structure amongst protostomes and is composed of 3 exons. In mosquito, the gene structure of AgENSANGP00000014164 (AgaP14164) is similar to that of DmeCG8422, with which it shares the greatest sequence similarity.
Names of the protostome and tunicate receptor clones identified by sequence similarity searches using the vertebrate family 2 GPCRs. (*) does not have 7 TM domains; (#) TM4 was not identified; (+) incomplete.
Comparison of the N-terminal end of metazoa family 2 GPCRs. [...], NP_498465) and Caenorhabditis briggsae (CbrCAE70126, CAE70126; CbrCAE63268, CAE63268). EST data was used to obtain the N-terminal region of the incomplete receptor sequences (Additional file 1). The N-terminal region of Ciona CinS752 was predicted by NIX and that of CinS5A by sequence comparison with CinS5B. Only the clones for which a putative N-terminal domain was identified were included in the analysis. For figure simplicity, an arrow indicates the region of the receptor CbrCAE70126 that was eliminated since it did not align with any other sequences present.
In contrast with protostomes, two main gene structures were characterised in deuterostomes, with the TM4, TM5 and TM7 domains shared between two exons. In human and Takifugu, the CAL and CGRP receptor genes form a separate group with a different gene organisation from the other receptor genes. These receptors seem to have lost an intron between the exons that contain part of TM5 and TM6 and part of TM7, respectively, and are composed of 7 exons with identical intron phases. The Ciona CinS5A, CinS5B, CinS50 and CinS273 receptor genes are also shorter and are composed of 6 exons. With the exception of the CAL and CGRP receptors, all human and Takifugu family 2 genes and the Ciona CinS70, CinS93, CinS372 and CinS752 receptors are composed of 8 exons and share identical intron phases.
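The comparison of intron phases and splice sites described above can be illustrated with a short Python sketch. It assumes the usual convention that an intron has phase 0 when it falls between two codons, phase 1 when it follows the first base of a codon and phase 2 when it follows the second, and that the coding length of each exon is known. The function names and the exon lengths in the example are hypothetical and do not come from the receptor genes analysed here.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron given the coding length (nt) of each exon in order."""
    phases = []
    coding_so_far = 0
    # The last exon is not followed by an intron, so it is skipped.
    for length in coding_exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases

def has_canonical_splice_sites(intron_seq):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    s = intron_seq.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")

# Hypothetical gene with four coding exons (lengths in nucleotides).
example_phases = intron_phases([100, 62, 91, 120])
```

Two genes can then be said to share intron phases at equivalent positions when the phase lists agree for the introns flanking the same TM domains.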

Expression analysis
RT-PCR with specific primers for each receptor gene was carried out using total RNA extracted from whole adult C. elegans, Drosophila, mosquito and Ciona (results not shown). The nematode receptor genes CelC18B12.2 and CelC13B9.4 and the Drosophila genes DmeCG13758, DmeCG32843 and DmeCG8422 were successfully amplified. The tissue distribution was refined in Ciona, with expression analyses carried out in the intestine, pharyngeal basket, gonads, endostyle and cerebral ganglion. The putative family 2 GPCR encoded by CinS70 was expressed in all tissues analysed, but was mainly present in the intestine, gonads and cerebral ganglion. The expression of CinS5A had a similar tissue distribution to that of CinS70. However, its duplicate, CinS5B, had a more limited distribution and was only expressed in the intestine. The receptors CinS273 and CinS752 were only expressed in gonad and cerebral ganglion tissue, whilst CinS93 was expressed in the endostyle. CinS50 was restricted to the ovary, where it was weakly expressed. It was not possible to amplify the AgENSANGP00000014164 and CinS372 receptors.
Figure 4 Gene environment comparison between C. elegans and Drosophila family 2 GPCRs genomic regions. Short-range linkage analysis of the region surrounding family 2 GPCRs on the C. elegans chromosome III and X with the Drosophila chromosome regions containing family 2 GPCR receptor genes. Genes are represented by horizontal bars and gene identification and chromosome position is given at the side. Family 2 GPCRs members are highlighted in bold. The lines represent the correspondence between the genes in each species. The dashed lines represent the common genes that were identified in both protostome and deuterostome genomes. For simplicity, only genes that are in common are represented.

Discussion
In total, 16 putative family 2 GPCR members were identified and characterised in the protostome and tunicate genomes using the human and Takifugu TM domain sequences. In the human genome a total of 50 family 2 GPCRs have been described [6,11,12], although ligands have only been assigned for 15 family 2 GPCR members [16,26]. The evolution of this latter receptor group is the major subject of this study.
To avoid the inclusion of potential intronic regions that could bias the analysis, the invertebrate genes were manually edited using the available expression data and sequence similarity to the homologue genes, and the in silico analysis was restricted to conserved receptor motifs in the N-terminal and TM domains. The putative invertebrate receptors identified share the general characteristics of family 2 GPCR members, with the presence of conserved cysteine residues [27,28] and several highly conserved amino acids and protein motifs in the N-terminal domain, which is involved in ligand binding [22]. Analysis of unicellular organism genomes such as the prokaryote E. coli and the eukaryote S. cerevisiae failed to reveal any putative family 2 receptor genes, suggesting that these receptor genes are characteristic of metazoan genomes.
Figure 5 Gene environment comparison between C. elegans and Takifugu and Human family 2 GPCRs genomic regions. Short-range linkage analysis of the region surrounding family 2 GPCRs on the C. elegans chromosome III and Takifugu and human chromosome regions containing family 2 GPCR receptor genes. Genes are represented by horizontal bars and gene identification and chromosome position is given at the side. Family 2 GPCRs members are highlighted in bold. The lines represent the correspondence between the genes in each species. The dashed lines represent the common genes that were identified in both protostome and deuterostome genomes. For simplicity, only genes that are in common are represented.
The protostome genes identified are more like the vertebrate CAL/CGRP and CRF receptor subfamilies, suggesting that ancestral family 2 GPCR members were most like these genes. In vertebrates these receptors are associated with calcium homeostasis and the stress axis, respectively, and their appropriate functioning is essential for survival [29][30][31][32]. No studies are available describing the role of these receptors in protostomes and their classification was based upon their sequence similarity, so they may not be functional orthologues. Recently, however, expression studies showed that the Drosophila DmeCG8422 and DmeCG32843 (equivalent to CG17415) receptors, which are very similar in sequence to the vertebrate CRF and CAL/CGRP receptors, respectively, are indeed functional orthologues. These studies revealed that DmeCG8422 and DmeCG32843 were functional in the presence of insect diuretic hormones (DH) and activated a similar intracellular signalling pathway to the vertebrate receptors [33,34].
It is generally accepted that the complexity of vertebrates and the origin and development of physiological systems are a consequence of the acquisition of new genes by gene or exon duplications that occurred during chordate evolution [10][35][36][37][38]. The absence of sequence homologues of the other members of family 2 in protostomes may be a consequence of the relatively low complexity of nematodes and arthropods when compared with vertebrates. For example, it is known that the VIP, PACAP, SCT, GCG, GLP and GIP peptides and their receptors are mainly associated with the nervous and gastrointestinal systems. In protostomes these biological structures are of very low complexity [39] and an organised gastroenteropancreatic system has only been identified after the divergence of tunicates. In general, a similar evolutionary profile occurs during the development of metazoan nervous systems (with the exception of cephalopod molluscs, which also have a highly developed nervous system). An organised brain and complex central nervous structure are only present in vertebrates. In the majority of protostomes and early deuterostomes the nervous system is mainly characterised by the presence of simple structures, such as cerebral and head ganglia to which several nerve networks are connected, and, like the gastroenteropancreatic system, it has evolved to varying degrees of complexity throughout metazoan evolution [39].

Evolution of family 2 GPCRs in the metazoa
The increasing number of family 2 GPCR genes in the metazoan lineage suggests that the evolution of this gene family results from a series of duplications. Several gene duplication events have been proposed to have occurred in chordate genomes. However, whether they are a consequence of two total genome duplication events (2R theory [10][40][41][42]) or a result of independent single gene duplications [43,44] is still under debate. The 2R theory has been generally accepted to justify the presence of gene family members and novel genes in higher vertebrate genomes when compared with early chordate and invertebrate species. Family 2 GPCR members have been identified in the majority of vertebrate genomes [17,20], and searches carried out on the amphibian and chicken genomes identified a number of gene family members equivalent to that found in the human genome (data not shown). In particular, putative SCT receptor genes, which are lacking in teleost genomes, were identified as previously described by Langerstrom et al. [46]. The absence of an equivalent SCT receptor sequence in teleosts but its identification in amphibian and avian genomes may indicate that this receptor either i) arose after the divergence of the fishes or ii) was lost in the fish lineage; studies on ancient fishes (e.g. Agnatha) should help to clarify this issue.
The protostome receptors contain the most divergent gene structures when compared with those of deuterostomes (intron number, TM domain distribution and intron phases). This is probably a consequence of the higher rate of chromosomal and gene rearrangements in protostomes when compared to early chordates and vertebrates [47][48][49][50]. The reason behind the differences in intron number between protostomes and deuterostomes remains to be established. The precursor gene in Urbilateria (the last common ancestor of protostomes and deuterostomes) remains to be identified and therefore the difference in intron number can be explained either by intron gain during metazoan evolution after the divergence of protostomes and deuterostomes [51][52][53] or by intron loss in the protostome lineage [54,55]. Interestingly, the vertebrate CAL and CGRP receptor gene organisation is more like that of the tunicate receptors than that of other vertebrate family 2 members, and the way in which these receptors function is also different. Accessory single transmembrane proteins (receptor activity modifying proteins, RAMPs [56,57]) interact with both CAL and CGRP receptors and alter their affinity profile for the ligands (CAL, CGRP, adrenomedullin and amylin) [58,59]. In fact, CAL receptor-RAMP heterodimerisation is essential for receptor function, which is not the case for the other family 2 members [58]. It remains to be established whether such functional constraints can influence gene evolution, in particular that of the CAL/CGRP receptors.
Expression analyses indicate that, with the exception of the single mosquito receptor and CinS372, all protostome and tunicate genes are expressed. In general, the tissue distribution of the Ciona receptors mirrors the expression of the vertebrate family 2 GPCR sequence homologues. For example, the duplicate CinS5A and CinS5B receptors and CinS70, which share sequence similarity with the vertebrate brain-gut peptide GCG, GLP and GIP receptors, were found to be expressed in Ciona intestine, gonads and neural ganglion, suggesting that, like the vertebrate homologues, they may also have a role in the gastrointestinal, reproductive and nervous systems [60,61]. The function of the vertebrate receptors in the reproductive system is not clear, but in the nervous and gastrointestinal systems they are involved in carbohydrate, amino acid and lipid metabolism [60,62]. It remains to be established whether the tunicate receptors localised in these tissues have a similar functional role.
The Ciona CAL/CGRP homologue (CinS93) was only expressed in the endostyle, whilst the tunicate PTR-like receptor homologue (CinS752) was present in the neural ganglion and the gonads. In vertebrates these receptors have an important role in the endocrine regulation of calcium mediated by the CAL and PTH hormones [63]. The presence in Ciona of homologues of the vertebrate receptors may suggest that elements of calcium homeostasis are conserved between tunicates and vertebrates. Moreover, the expression of CinS93 in the Ciona endostyle, the homologue of the vertebrate thyroid gland, which is the site of CAL production [64], further supports this hypothesis. CinS50 and CinS273 are the sequence homologues of the vertebrate CRF receptors, which are mainly associated with the stress response [65]. In vertebrates both receptors are expressed in nervous tissue; the CRF1 receptor was also found in the gonads and the CRF2 receptor in the gastrointestinal tract [66,67]. The pattern of expression of the tunicate CinS50 and CinS273 was similar and both receptors were expressed in the gonads, whilst CinS273 was also detected in the neural ganglion, suggesting that they may play a functional role in the nervous and reproductive systems in tunicates. Functional and ligand-binding studies are required to characterise the physiological role of the tunicate receptors. Moreover, the isolation and characterisation of their putative ligand peptides, which have yet to be comprehensively described, will be essential to understanding their function.
The evolution of family 2 GPCR genes in protostomes and deuterostomes is probably the result of a combination of species-specific gene duplications and gene or genome duplication events in ancestral gene precursors (Figure 6). For example, the Drosophila DmeCG8422 and DmeCG32843 genes map to chromosome 2R and share a similar gene organisation. This suggests that they arose by a specific gene duplication event. Based on their sequence similarity, gene organisation and intron phases, the Drosophila DmeCG8422 and DmeCG32843 receptor genes appear to be the orthologues of the nematode CelC13B9.4 and Ciona CinS50/CinS273 and CinS93 receptors, respectively. In tunicates, three different family 2 GPCR ancestral gene precursors probably existed: the gene precursor for CinS50/CinS273, the gene precursor for CinS93 and a common gene precursor for CinS70/CinS5A/CinS5B/CinS372/CinS752. These are proposed to be the origin of the vertebrate CRF genes, the CAL/CGRP genes and the remaining deuterostome family 2 GPCRs, respectively.

Conclusion
Putative family 2 receptor genes were isolated and characterised from a number of different invertebrate genomes. This study provides for the first time a comprehensive description of the gene sequence, structure and expression of family 2 GPCR members in invertebrates, providing important clues about their origin and evolution during metazoan divergence. The CAL/CGRP and CRF receptors are proposed to be the first family 2 members to evolve, in contrast to the SCT receptors, which seem to have evolved much later and are only present in tetrapods. Studies such as this can, via a mixture of in vitro and in silico approaches, contribute to a better understanding of gene regulation in vertebrates.

Sequence database searches
Sequence database searches were carried out on the genomes of the prokaryote Escherichia coli (E. coli), the unicellular eukaryote Saccharomyces cerevisiae (S. cerevisiae), the nematodes Caenorhabditis elegans (C. elegans) and Caenorhabditis briggsae (C. briggsae), the insects Drosophila melanogaster (D. melanogaster) and the mosquito Anopheles gambiae (A. gambiae), and the tunicate Ciona intestinalis (C. intestinalis) (  [68] to interrogate invertebrate genomes. The invertebrate in silico predicted sequences were identified based on their sequence similarity to the vertebrate receptor genes using a cut-off E value higher than 10 and their sequences were manually edited according to their similarity to the homologue genes in vertebrates and the available EST data (Additional file 1). The identity of the invertebrate genes was further confirmed against the GPCR database at CMBI [69] and, in order to substantiate the previous searches (that is, to identify putative receptors that had not been identified), they were further used to search all the genome databases used in this analysis (Table 2).

Gene organisation of invertebrate family 2 GPCRs
The gene organisation of the invertebrate family 2 members was manually characterised. This approach was complemented using the available protostome and Ciona EST data to identify the putative N- and C-terminal ends of the protein (Additional file 1). The presence of TM domain regions was verified using the TMHMM Server v. 2.0 [70] and their positions were subsequently confirmed by mul-  [71]. The gene structures of the human receptors were characterised using the Spidey mRNA-to-genomic alignment programme [72] and the Takifugu receptor gene organisation was characterised as described in Cardoso et al., 2005 [17].

Linkage analysis
The gene environments of the protostome (C. elegans and Drosophila) and deuterostome (human and Takifugu) receptor genes were compared using a sequence similarity approach. The human, C. elegans and Drosophila gene environments were accessed using the NCBI Mapview interfaces [73]. The gene environment of the Takifugu scaffolds (release17/05) was accessed using NIX annotation [74] and the neighbouring genes were used to search for orthologues in human, C. elegans and Drosophila genomes using the TBLASTX algorithm [75].
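The idea of comparing gene environments can be sketched in a few lines of Python: given the genes flanking a receptor in two genomes and a table of best similarity hits between them, the shared neighbours are those genes whose best hit lies in the other region. This is only an illustration of the principle; the function name, the gene identifiers and the best-hit table are hypothetical, and in the study the hits were obtained with TBLASTX searches and map viewers, not with this code.

```python
def shared_neighbours(region_a, region_b, orthologue_of):
    """Genes of region_a whose putative orthologue lies in region_b.

    region_a, region_b: lists of gene identifiers flanking a receptor gene.
    orthologue_of: dict mapping a region_a gene to its best hit in the other genome.
    """
    in_b = set(region_b)
    return [(gene, orthologue_of[gene])
            for gene in region_a
            if orthologue_of.get(gene) in in_b]

# Hypothetical identifiers, for illustration only.
worm_region = ["geneA", "geneB", "receptor1", "geneC"]
fly_region = ["flyX", "flyReceptor", "flyY"]
best_hits = {"geneA": "flyX", "geneB": "flyQ", "receptor1": "flyReceptor"}

pairs = shared_neighbours(worm_region, fly_region, best_hits)
```

Genes without a best hit (geneC here) or whose best hit lies outside the compared region (geneB) are simply left out, which mirrors the figures in which only genes in common are represented.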

Sequence comparison and phylogenetic analysis
Sequence alignments of the predicted protostome and deuterostome receptor protein sequences were carried out using the Clustal X programme [76] (Blosum matrix, gap opening penalty 10, gap extension penalty 0.2) and percentage similarities were calculated using GeneDoc [77]. The evolutionary analysis between the protostome and deuterostome receptor genes was carried out using the TM domains that were complete and common to all receptor genes (TM2, TM4, TM5 and TM6), following a strategy similar to that previously described [17]. Manual editing of the Takifugu family 2 GPCRs did not identify the TM1 domain of TruS012367; the TM5 of TruCRFR2 was found to be incomplete and the TM3 of TruS000381 was frameshifted.
The four TM domain sequences common to all metazoans were concatenated and aligned using the ClustalX programme as described. The alignment produced (length 98, with 99 informative sites) was used for phylogenetic analysis using the neighbour joining, maximum parsimony and minimum evolution methods with 1000 bootstrap replicates in the MEGA 3.1 phylogenetic programme [78]. Multiple sequence alignments were also carried out using the manually edited protostome and deuterostome receptors within each family 2 GPCR group using the ClustalX programme (according to the parameters previously described) in order to further identify conserved protein motifs or amino acid residues in the N-terminal regions that might be involved in ligand binding.
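For readers unfamiliar with the distance measure named in the Figure 2 legend, the sketch below shows how TM segments might be concatenated and how a Poisson-corrected distance with pairwise gap deletion can be computed for two aligned protein sequences, using d = -ln(1 - p), where p is the proportion of differing sites. The tree building itself was done in MEGA 3.1; this code is an independent illustration, and the function names and the dictionary layout of the TM segments are assumptions made for the example.

```python
import math

def concatenate_tm(tm_segments, order=("TM2", "TM4", "TM5", "TM6")):
    """Join the aligned TM segments of one receptor in a fixed order."""
    return "".join(tm_segments[name] for name in order)

def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance d = -ln(1 - p) with pairwise gap deletion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip sites gapped in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 1.0:
        return math.inf  # every compared site differs: the correction diverges
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the input that a neighbour joining method needs; bootstrap support is then obtained by resampling alignment columns and repeating the procedure.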

Expression analysis
In order to investigate the expression of the putative protostome and tunicate receptors, RT-PCR was carried out using cDNA produced from whole individual organisms. Total RNA from adult individuals was extracted from the nematodes, Drosophila and Ciona with TRI reagent (Sigma-Aldrich, Spain) according to the manufacturer's instructions. 1-2 μg of total RNA was used for cDNA synthesis and each reaction was performed as follows: 1x RT-PCR buffer (Invitrogen), 0.25 mM dNTPs (Amersham-Biosciences, UK), 0.05 μg/μl random hexameric oligonucleotides, 1 U MMLV-RT (200 U/μl) (Promega, USA) and 0.2 U RNAguard 36.3 U/μl (Amersham-Biosciences). Specific primers for each receptor gene were designed spanning different exons to detect potential genomic contamination. A control PCR using primers for housekeeping genes was also performed in order to control the amount of cDNA utilised in each reaction. Specific primers for Ciona 18S ribosomal protein were designed, but sea bream 18S and β-actin primers were routinely used for Drosophila and the nematodes. All primers used (including housekeeping control sequences) are described in Table 3. All the PCR reactions were performed with 1x PCR buffer (Euroclone, Italy), 1.5 mM MgCl2 (Euroclone), 0.2 mM dNTPs (Amersham-Biosciences), 1 mM of each primer (forward and reverse), EuroTaq DNA Polymerase 5 U/μl (Euroclone) and DNase-free water (Sigma-Aldrich) in a 25 μl final reaction volume. Amplification of all the genes was carried out using a standard cycle with an initial denaturing step of 93°C for 2 minutes, followed by 35 cycles of 30 s at 93°C, 60 s at the primer annealing temperature and 30 s at 72°C, followed by a final chain extension step of 72°C for 5 minutes. The reaction products were cloned into pGEMT-easy vector (Promega) and sequenced to confirm their identity.
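As a small worked example of the cycling conditions just described, the Python snippet below adds up the programmed hold times of the amplification protocol. It is a sketch only: the function name is hypothetical, and the result ignores the time the thermocycler spends ramping between temperatures, which depends on the instrument.

```python
def pcr_programme_seconds(initial_denature_s, cycles, step_seconds, final_extension_s):
    """Total programmed hold time in seconds, ignoring ramping between temperatures."""
    return initial_denature_s + cycles * sum(step_seconds) + final_extension_s

total = pcr_programme_seconds(
    initial_denature_s=2 * 60,   # 93 C for 2 minutes
    cycles=35,
    step_seconds=(30, 60, 30),   # denaturation, annealing, extension
    final_extension_s=5 * 60,    # 72 C for 5 minutes
)
total_minutes = total / 60
```

With the values given in the text the hold times sum to 4620 s, that is 77 minutes per run before ramping is taken into account.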

Authors' contributions
The majority of the work described here was carried out by JCRC in collaboration with FAV and VCP. MSC and DMP planned the study, and critically revised the manuscript for important intellectual content and data analysis. All authors read and approved the final manuscript.