Amphioxus (Branchiostoma floridae) has orthologs of vertebrate odorant receptors
BMC Evolutionary Biology volume 9, Article number: 242 (2009)
A common feature of chemosensory systems is the involvement of G protein-coupled receptors (GPCRs) in the detection of environmental stimuli. Several lineages of GPCRs are involved in vertebrate olfaction, including trace amine-associated receptors, type 1 and 2 vomeronasal receptors and odorant receptors (ORs). Gene duplication and gene loss in different vertebrate lineages have led to an enormous amount of variation in OR gene repertoire among species; some fish have fewer than 100 OR genes, while some mammals possess more than 1000. Fascinating features of the vertebrate olfactory system include allelic exclusion, where each olfactory neuron expresses only a single OR gene, and axonal guidance, where neurons expressing the same receptor project axons to common glomeruli. By identifying homologous ORs in vertebrate and in non-vertebrate chordates, we hope to expose ancestral features of the chordate olfactory system that will help us to better understand the evolution of the receptors themselves and of the cellular components of the olfactory system.
We have identified 50 full-length and 11 partial ORs in Branchiostoma floridae. No ORs were identified in Ciona intestinalis. Phylogenetic analysis places the B. floridae OR genes in a monophyletic clade with the vertebrate ORs. The majority of OR genes in amphioxus are intronless and many are also tandemly arrayed in the genome. By exposing conserved amino acid motifs and testing the ability of those motifs to discriminate between ORs and non-OR GPCRs, we identified three OR-specific amino acid motifs common to cephalochordate, fish and mammalian ORs.
Here, we show that amphioxus has orthologs of vertebrate ORs. This conclusion demonstrates that the receptors, and perhaps other components of vertebrate olfaction, evolved at least 550 million years ago. We have also identified highly conserved amino acid motifs that may be important for maintaining receptor conformation or regulating receptor activity. We anticipate that the identification of vertebrate OR orthologs in amphioxus will lead to an improved understanding of OR gene family evolution, OR gene function, and the mechanisms that control cell-specific expression, axonal guidance, signal transduction and signal integration.
Genes encoding odorant receptors (ORs) were first identified by Linda Buck and Richard Axel in 1991. Prior to 1991, experiments from several other labs suggested that odorant receptors were seven transmembrane (TM) domain G protein-coupled receptors (GPCRs), so Buck and Axel used PCR with degenerate primers designed from available GPCR sequences to query cDNA isolated from rat olfactory epithelium tissue. The new genes they discovered were then used as probes to search rat cDNA and genomic DNA for additional paralogs. This similarity-based approach, in which query sequences are used to identify orthologs and then paralogs, is a staple of both molecular and bioinformatics research. These and subsequent studies have now uncovered over a thousand rat and mouse odorant receptors [2–5] and have led to the identification of other GPCR families involved in vertebrate olfaction such as the trace amine-associated receptors (TAARs), the type 1 and type 2 vomeronasal receptors [8–10] and the formyl peptide receptor-like proteins.
In mammals, phylogenetic analyses have shown that many of the OR-encoding genes are the products of relatively recent duplication events. There are fewer OR genes in fishes; however, the fish genes are more variable at the sequence level [12, 13]. Despite lineage-specific gene amplification and loss, ORs in vertebrates are members of a single large monophyletic clade. Here we report the results of our search for orthologs of vertebrate ORs in the tunicate, Ciona intestinalis (subphylum Urochordata), and in amphioxus, Branchiostoma floridae (subphylum Cephalochordata).
Recently, phylogenetic analyses have shown that Urochordata is the extant sister of the vertebrates and that Cephalochordata is the sister group to the vertebrate plus urochordate clade, which is called Olfactores. Whole genome sequences are available for C. intestinalis and B. floridae, but similarity-based surveys have not yet identified orthologs of vertebrate ORs in either genome [16, 17]. However, neither study employed the available diversity of vertebrate OR sequences as queries in their survey. Here we used a bioinformatics approach that mimics the molecular strategy of Buck and Axel. Instead of degenerate primers, we used an HMM based upon a broad diversity of full-length fish OR sequences as a probe to survey the C. intestinalis and B. floridae protein predictions. The candidate ORs identified were then used as Blastp query sequences to search within each species for additional ORs. This experiment uncovered a family of 61 OR genes in B. floridae but no ORs in C. intestinalis. Phylogenetic analyses demonstrate that the amphioxus genes we uncovered are orthologs of vertebrate ORs. Many of these new B. floridae sequences lack introns and are linked, as is the case for most vertebrate ORs.
We identified amino acid motifs that can discriminate between ORs and non-OR GPCRs in a regular expression-based survey. These key residues may prove to be useful for identifying formerly unrecognized ORs in vertebrates and for identifying orthologs in even more distantly related taxa, such as echinoderms and hemichordates. Our results provide the foundation for future comparative studies with cephalochordates, urochordates and early vertebrates. The results will also aid in the understanding of OR gene family evolution, OR function, the mechanisms that control single receptor expression, axonal guidance, signal transduction and signal integration.
HMM and Blastp
When we searched the B. floridae protein predictions using the HMM derived from fish odorant receptors with an E-value cut-off of E-10, three B. floridae proteins were identified. No proteins in the C. intestinalis protein predictions database were identified using the same search criteria. Each of the three amphioxus sequences was used as a query in a Blastp search of the B. floridae protein predictions. This Blastp search identified 50 sequences that were at least 40% identical to one or more of the three query sequences over a minimum of 100 amino acids. To uncover additional candidate ORs, a second Blastp search was carried out using the 50 hits from the first search as query sequences. The HMM search combined with two Blastp searches generated a list of 246 candidate ORs from the B. floridae protein predictions. However, 2 of the 50 hits from the first Blastp search (Braf1_106555 and Braf1_92691) had unusually long N termini and these domains alone aligned to 180 of the genes in the second Blastp search. Five more sequences were hits only to the C termini of query sequences Braf1_111311, Braf1_69444 and Braf1_87794. None of these 185 hits to N or C termini contains any of the transmembrane-spanning domains, so they were removed from the dataset, leaving 61 candidate amphioxus ORs (see Additional file 1). Three of these 61 proteins were previously identified as G protein-coupled receptors, but they were not considered to be ORs. One was classified as a basal member of the Rhodopsin amine family (Braf1_69014), and the other two were not classified (Braf1_109264 and Braf1_69037). Of the 61 genes, 50 are considered full-length genes because they contain all seven TM domains; the remaining 11 are partial sequences because they are missing at least one of the seven TM domains.
We aligned the 50 full-length candidate ORs from B. floridae with vertebrate ORs (see Additional file 2 for sequence list), some of which were used in the construction of the HMM. We also included non-OR GPCRs from the Rhodopsin family to root the tree (alignment shown in Additional file 3). The OR and non-OR out-group sequences have several 'anchor' residues common to Rhodopsin family GPCRs. These features include: a conserved cysteine residue in transmembrane domain three, TM3 [18, 19], the conserved E/DRY motif at the junction of TM3 and intracellular loop two (IL2), a tryptophan residue in TM4, and the NPxxY motif in TM7 [20, 21]. These conserved sites were used to obtain a reliable alignment. The results of this analysis (Figure 1) suggest that B. floridae ORs fall into two subfamilies: one contains 40 genes, the other contains 10 genes. The phylogeny also shows that all B. floridae candidate ORs belong to a monophyletic clade and that this clade is the sister group to type 1 vertebrate ORs. This last observation suggests that vertebrate type 2 ORs diverged from type 1 ORs prior to the split between cephalochordates and the Olfactores. Finally, a single gene from Branchiostoma belcheri, believed to be an amphioxus OR based on its expression domain, occurs in the larger subfamily of B. floridae ORs.
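The bookkeeping behind the final candidate list can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the paragraph above; the variable names are ours and are not taken from the original analysis.

```python
# Counts reported in the text for the B. floridae candidate list.
candidates_after_searches = 246  # HMM search plus two Blastp rounds
n_terminus_only_hits = 180       # aligned only to the long N termini
c_terminus_only_hits = 5         # aligned only to query C termini

removed = n_terminus_only_hits + c_terminus_only_hits
retained = candidates_after_searches - removed

full_length = 50  # contain all seven TM domains
partial = 11      # missing at least one TM domain

print(removed, retained)  # 185 61
assert retained == full_length + partial
```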
Regular expression survey
The phylogenetic node separating amphioxus and Olfactores is approximately 550 million years old. By identifying individual amino acid residues or motifs that are conserved in amphioxus and vertebrate ORs we may be able to find those that play an important role in OR function. Four conserved regions were uncovered using WebLogo (Figure 2). Three of these are found in intracellular loops 1-3, and one is found in TM7. For each of these conserved regions, it was possible to derive between 1 and 12 sub-motifs that could be evaluated in terms of their ability to discriminate between ORs and other GPCRs from the Rhodopsin gene family. These motifs were used in regular expression searches of OR and non-OR MySQL databases. From this list we identified one motif (KAxxTxxxH) that is found in more than 73% of ORs and less than 1% of non-ORs, and two motifs (MxxxxYxxxCxPLxY, and LxxPxYxxxxxLxxxDxxxxxxxxP) that are present in more than 44% of ORs and less than 1% of non-ORs (Table 1).
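A motif written in this notation maps directly onto a regular expression in which each x becomes a single-character wildcard. The following is a minimal sketch of that conversion in Python; the function names and the toy sequences are invented for illustration, and the original survey ran its regular expressions against MySQL databases rather than in Python.

```python
import re

def motif_to_regex(motif: str) -> re.Pattern:
    """Convert a motif such as 'KAxxTxxxH' into a compiled regex.

    Lowercase 'x' marks a variable position (any residue); every other
    character is treated as a literal one-letter amino acid code.
    """
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))

def fraction_with_motif(motif: str, sequences: list[str]) -> float:
    """Fraction of sequences that contain the motif at least once."""
    pattern = motif_to_regex(motif)
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return hits / len(sequences)

# Toy sequences, invented for illustration only (not real receptors).
toy = ["MMKAGGTLLLHPP", "MMKAGGSLLLHPP", "AAAKAWWTYYYHA"]

print(motif_to_regex("KAxxTxxxH").pattern)   # KA..T...H
print(fraction_with_motif("KAxxTxxxH", toy))  # two of the three toy sequences match
```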
Amphioxus OR gene structure and location
Of the 61 B. floridae ORs, 50 are considered to be full-length genes. Of these, 35 are predicted to be intronless (see Additional file 1). In the B. floridae assembly version 2.0, the 61 genes are found on 44 scaffolds. Four of these scaffolds contain two ORs, two contain three ORs and three contain four ORs.
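The scaffold counts above imply how many genes sit alone on a scaffold. The short tally below is a sketch that uses only the numbers reported in this paragraph; the variable names are ours.

```python
# Number of scaffolds holding k ORs, as reported in the text: {k: scaffolds}.
shared_scaffolds = {2: 4, 3: 2, 4: 3}
total_genes = 61
total_scaffolds = 44

genes_on_shared = sum(k * n for k, n in shared_scaffolds.items())
n_shared_scaffolds = sum(shared_scaffolds.values())
single_or_scaffolds = total_scaffolds - n_shared_scaffolds

print(genes_on_shared, n_shared_scaffolds, single_or_scaffolds)  # 26 9 35

# Each remaining scaffold carries exactly one OR, so the totals must agree.
assert genes_on_shared + single_or_scaffolds == total_genes
```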
Using a combination of HMM and Blastp searches, we have identified 50 full-length and 11 partial sequences among the B. floridae protein predictions that appear to be odorant receptors (ORs). Similarities between the vertebrate ORs used to generate the HMM and amphioxus hits to this HMM are low. However, the stringent criteria used in our alignment-based searches and the bootstrap support for the key nodes in the phylogenetic tree support the hypothesis that these amphioxus genes are orthologs of vertebrate odorant receptors. Furthermore, the B. floridae candidate ORs have amino acid motifs found in vertebrate ORs that appear not to occur, or occur very rarely, in non-OR genes from the Rhodopsin family. Lastly, evidence has been reported (see below) indicating that these genes are likely to be expressed in B. floridae rostral epithelium.
Vertebrate ORs have recently been divided into two groups, the type 1 and the type 2 ORs; the type 1 genes have been further subdivided into six clades. Genes from only two of these type 1 clades are present in mammals, whereas fish and amphibians have genes from five of the six clades. Type 2 ORs have been subdivided into three clades and appear to be present only in amphibians and fish. Since type 1 ORs have been identified in lamprey, the divergence between these two lineages of paralogous genes occurred at least 450 million years ago. Representatives from all nine type 1 and type 2 vertebrate OR clades were included in a phylogenetic analysis with the candidate ORs from B. floridae identified here. The results of this analysis demonstrate that amphioxus ORs and the vertebrate type 1 ORs form a monophyletic group (Figure 1). In a separate phylogenetic analysis, we added fish and mammalian sequences from the α, β, γ and δ groups of Rhodopsin GPCRs and non-OR Rhodopsin-like GPCRs from B. floridae (see Additional file 4). The addition of more sequences to the phylogeny had no effect on the bootstrap support for the key nodes and did not change the topology of the tree. These observations not only provide strong support for the hypothesis that the amphioxus genes are orthologs of vertebrate ORs, they also indicate that type 1 and type 2 ORs diverged more than 550 million years ago.
Sequence identity among amphioxus ORs ranges from approximately 22% to 95% over the seven transmembrane regions, indicating that these genes were produced by both old and recent duplication events. This pattern can also be observed in fish ORs; sequence identity among the 238 fish ORs used in this study ranges from under 20% to over 90% (data not shown). The range of sequence identity values between B. floridae ORs and the vertebrate ORs derived from the alignment used to reconstruct the phylogeny in Figure 1 was 10% to 31%. All B. floridae ORs are members of a clade that contains no vertebrate sequences, suggesting that a few OR genes have provided the raw material for gene family expansions, just as in several vertebrate lineages.
The number of OR genes identified in B. floridae is smaller than the number of OR genes that are found in most vertebrates. One possible explanation is that the majority of the receptors involved in olfaction in B. floridae are encoded by other gene families, such as the TAARs or the formyl peptide receptor-like proteins. Alternatively, we may not have identified all members of the B. floridae OR gene family. If this is the case, these genes may belong to OR gene families yet to be identified in any chordates; the InterPro database  contains a number of orphan GPCRs in the Rhodopsin family. As genome annotation improves for lamprey, hemichordates and echinoderms, it might be possible to identify additional OR genes in amphioxus and vertebrates that cannot be detected using a search based entirely upon the OR diversity currently described in vertebrates.
As mentioned above, Nordström et al. did not uncover orthologs of vertebrate ORs among the 664 GPCRs identified in their survey of the B. floridae protein predictions. Our search strategy differed from theirs in that it employed a greater diversity of vertebrate OR sequences to query the amphioxus protein predictions. Mammals generally have more ORs than fish, but they have representatives of only two of the nine OR clades, whereas fish have OR genes from eight of these clades. By using fish sequences instead of mammalian sequences in our search, we emphasized residues conserved in a broad diversity of ORs and were able to ignore residues that appear to be diagnostic for ORs only because they are common in recently duplicated genes.
Sequence conservation: GPCRs
The candidate B. floridae ORs identified in this study share several features with other genes in the Rhodopsin family of GPCRs. These include a conserved cysteine residue at the border of TM3 and extracellular loop one (EL1), a conserved tryptophan in TM4, and an NPxxY motif (where x represents a variable amino acid position) in TM7. The cysteine residue is present in most GPCRs and is thought to participate in a disulfide bond between TM3 and EL2 [18, 19]. The tryptophan residue in TM4 plays a role in inter-helix interactions that help to maintain receptor conformation. The NPxxY motif is found in most GPCRs in the Rhodopsin family [18, 20] including vertebrate ORs [13, 27] and is thought to be involved in receptor internalization and desensitization. A DRY motif occurs in TM3 of most Rhodopsin family GPCRs [18, 20]. While this motif is also present in some B. floridae ORs, the majority have a leucine (L) in place of the arginine (R) residue. The consequences of mutations in this motif vary [reviewed in ] and the DLY motif is not inconsistent with OR status. A search of our InterPro OR database uncovered homologous DLY motifs in human, colobus monkey, and dolphin OR proteins.
Sequence conservation: odorant receptors
Having shown that B. floridae ORs share several sequence features with other members of the Rhodopsin family of GPCRs, our next goal was to identify features specific to ORs. The WebLogo analysis of an alignment of 125 ORs revealed four areas that are conserved in vertebrate and amphioxus ORs (Figure 2). From these four regions, we generated a series of 27 motifs which were then tested for their ability to discriminate between ORs and non-ORs. This survey identified three motifs common in ORs but rare in other Rhodopsin family GPCRs: LxxPxYxxxxxLxxxDxxxxxxxxP, MxxxxYxxxCxPLxY and KAxxTxxxH. These three motifs are found in intracellular loops one, two and three respectively, and all three overlap with neighbouring TM domains. The KAxxTxxxH motif was best at discriminating between ORs and non-ORs (as defined by InterPro). This motif occurred in 73.48% of ORs, but only in 0.24% of non-ORs.
Conserved amino acid motifs have previously been noted in alignments of human, mouse and zebrafish OR sequences [13, 27, 30, 31] and these motifs include some of the amino acid residues highlighted above. For example, most mammalian ORs have a conserved motif in IL1 that is similar to the first motif identified in this study. Both motifs include a leucine (L) residue followed by downstream proline (P) and tyrosine (Y) residues. In B. floridae ORs, the L, P and Y residues are conserved, though the Y residue appears to have been lost in many of the recent duplicates. Also, most human odorant receptors have the MAYDRYVAIC motif at the border of TM3 and IL2, and this motif can also be found in mouse and zebrafish ORs. The comparison between the MAYDRYVAIC motif and the second motif identified here suggests that the methionine (M), tyrosine (Y), and cysteine (C) residues are the most important components of this motif and may have OR-specific functions. The alanine (A) and aspartic acid (D) residues are also common in both vertebrate and B. floridae ORs (Table 1). In IL3, the KAFSTC motif is also present in human, mouse and zebrafish ORs [4, 13, 27]; however, the phenylalanine (F) and serine (S) residues are not as common in zebrafish ORs. The comparison between the KAFSTC motif and our third motif suggests that the lysine (K), alanine (A) and threonine (T) residues play the most important roles, and that the downstream histidine (H) would also be a good candidate for site-directed mutagenesis studies. Though the threonine residue is highly conserved between taxa, it appears to have been lost in many of the B. floridae ORs. Finally, our analysis that included B. floridae sequences shows that an NPxxY motif, which is common to Rhodopsin family GPCRs, becomes a good OR marker when an arginine (R) residue is included two amino acid positions downstream (i.e. NPxxYxxR).
The locations of the motifs within the intracellular loops suggest that these loops are important for OR signalling. In other GPCRs, the intracellular loops interact with G proteins and other proteins on the inside of the cell to regulate signal transduction. In mOR-EG, a mouse OR, mutation of conserved positions within the intracellular loops has been shown to inhibit receptor function that is unrelated to the protein's ability to bind ligands. The pattern of conservation observed here suggests that signal transduction in both cephalochordate and vertebrate sensory neurons may be regulated by similar molecular interactions on the inside of the cell. These conserved residues may also be important for maintaining receptor conformation in cephalochordates and vertebrates. Although the precise role of these residues remains purely speculative, their persistence over evolutionary time makes these sites excellent candidates for functional analysis.
Organization in genome
ORs in vertebrates are intronless and have short N and C termini [1, 2, 4]. In B. floridae, 35 out of 50 of the full-length ORs identified in this study are intronless. Like vertebrates, most B. floridae ORs have short N termini but, unlike vertebrates, many B. floridae ORs have long C termini. In mOR-EG, the C terminus plays an important role in maintaining receptor conformation and specificity; mutation of residues within the C terminus can inhibit signalling. In other GPCRs, the C terminus is important for receptor phosphorylation and the internalization of the receptor from the membrane [reviewed in ]. The presence of long C termini in B. floridae ORs should be confirmed experimentally; however, the presence of several clusters of serine and threonine residues in the C termini suggests they may be sites for receptor phosphorylation as seen in other GPCRs [34, 35].
Another common feature of vertebrate odorant receptors is that they are often found tandemly arrayed in the genome [2, 4, 13, 36, 37]. More than half of the full-length and partial OR genes identified here are found on a scaffold with at least one other OR, and over a third of these genes are found on a scaffold with two or more ORs. Since the B. floridae genome assembly is not yet complete, the degree of linkage between B. floridae ORs is likely an underestimate.
Expression in the rostral epithelium
Although our bioinformatics approach tells us only that these amphioxus genes are orthologs of vertebrate ORs, there are also experimental data for a similar gene in B. belcheri suggesting these genes function as ORs. Satoh  sequenced a single gene from B. belcheri that appeared to be related to vertebrate ORs and he showed expression in the rostral epithelium using an in-situ probe for this sequence. This gene was included in our phylogenetic analysis of B. floridae candidate ORs and it occurred nested within the larger group of B. floridae ORs (Figure 1). Interestingly, Satoh also mentioned that the sequence he amplified from cDNA for the in-situ probe was 'nearly identical' to the one derived from genomic DNA suggesting there may be recently duplicated OR genes in the B. belcheri genome that are highly similar in primary sequence. If these duplicate genes are present in the B. belcheri genome as seen in the B. floridae genome, then the primers used to make the in-situ probe may not have been gene-specific resulting in a pool of probes generated from highly similar B. belcheri OR genes. Alternatively, a single probe may have bound to multiple, highly similar mRNAs. These factors may explain the 'ubiquitous' expression pattern in the rostral epithelium that Satoh observed.
In conjunction with the expression data collected by Satoh , the identification of amino acid motifs that are conserved in both amphioxus and vertebrate ORs supports the hypothesis that these amphioxus genes function as ORs. However, GPCRs that are similar in sequence may not have exactly the same function: sequence identities among the formyl peptide receptor-like genes range from 67-96% but in mice, not all of these genes are expressed in the vomeronasal sensory neurons . For this reason, further experimental evidence is required to determine if the amphioxus ORs have the same function as vertebrate ORs.
Using the search strategy employed here, we did not uncover orthologs of vertebrate ORs in the urochordate Ciona intestinalis. Our results are consistent with those obtained in a recent survey of the C. intestinalis protein predictions for GPCRs. Orthologs of vertebrate ORs may be present in other urochordate species but have been lost in C. intestinalis. However, the results of our phylogenetic analysis show that OR families have expanded from a few progenitor genes independently in many lineages, suggesting that the loss of ORs in any one clade (e.g. urochordates) could have been a result of the loss of only one or two ancestral genes.
In this study we have identified orthologs of vertebrate odorant receptor genes in the cephalochordate B. floridae. This discovery supports the hypothesis that vertebrate odorant receptors evolved prior to the split between cephalochordates and the other chordates, which occurred approximately 550 million years ago. By aligning and comparing vertebrate and amphioxus ORs, we have identified amino acid motifs that are conserved only in ORs. These residues may prove useful for uncovering formerly unrecognized ORs in vertebrates and for uncovering orthologs in more distantly related taxa. These sites, which occur in intracellular loops, are also excellent candidates for mutation-based study of OR signal transduction. The expression domains of these genes may be used to identify homologous sensory neurons in vertebrate and invertebrate chordates. Comparative studies that include cephalochordates, urochordates and early vertebrates should help us to understand OR gene family evolution, the mechanisms that control single receptor expression, axonal guidance, and signal transduction and integration.
An HMM and Blastp based search for odorant receptors in B. floridae
Ray-finned fish (Actinopterygii) odorant receptors (n = 238), the majority from zebrafish (Danio rerio), were used to create an odorant receptor (OR) Hidden Markov Model (HMM). We used fish sequences instead of mammalian ORs because fishes have retained members of eight of the nine classes of odorant receptors thought to be present in early vertebrates. Although mammals possess on average more ORs than other vertebrates, only two of the nine OR clades are present in mammals. Fish OR sequences were downloaded from GenBank and translated into proteins. All pseudogenes, and one sequence that could not be aligned (NM_131143.1), were removed and the remaining sequences were aligned with ClustalW. The alignment was edited using BioEdit and used to construct a profile HMM using default settings and the HMM calibrate application [40, 41]. This HMM was used to search the B. floridae protein predictions (N = 50 817, assembly v1.0) that were downloaded from the DOE Joint Genome Institute. The protein predictions for Ciona intestinalis (N = 19 858, assembly version 2 release 53) were downloaded from Ensembl. An E-value cut-off of E-10 and default parameters were used for the HMM searches.
The B. floridae sequences identified in the HMM survey were used as query sequences in a Blastp  search of the B. floridae protein predictions. For a Blastp hit to be considered a candidate OR, it had to be at least 40% identical to the query sequence over a minimum of 100 amino acids. Each of the hit sequences that met this criterion was used in a second Blastp search using the same criteria. In this survey, only hits to at least part of any of the TM-spanning domains of the query sequence were retained. Sequences that spanned all seven TM domains were considered full-length sequences; all others were considered partial sequences.
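The identity and length thresholds described above can be applied mechanically to Blastp results. The sketch below assumes the hits are available as 12-column tab-separated output (query ID, subject ID, percent identity, alignment length, and so on); the article does not state which output format was used, and the IDs and rows shown are invented for illustration.

```python
MIN_IDENTITY = 40.0  # percent identity threshold stated in the text
MIN_LENGTH = 100     # minimum alignment length in amino acids

def candidate_hits(tabular_text: str) -> set[str]:
    """Return subject IDs whose alignment passes both thresholds.

    Assumes tab-separated columns in the order: query ID, subject ID,
    percent identity, alignment length, followed by further columns
    that are ignored here.
    """
    hits = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject, pident, length = fields[1], float(fields[2]), int(fields[3])
        if pident >= MIN_IDENTITY and length >= MIN_LENGTH:
            hits.add(subject)
    return hits

# Invented example rows (hypothetical IDs and values).
toy = "\n".join([
    "q1\tprotA\t45.2\t210\t110\t3\t20\t229\t15\t224\t1e-40\t160",
    "q1\tprotB\t38.0\t250\t150\t4\t10\t259\t12\t261\t1e-30\t120",  # identity too low
    "q1\tprotC\t60.0\t80\t32\t0\t5\t84\t5\t84\t1e-20\t90",         # alignment too short
    "q2\tprotD\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-15\t75",      # passes at both limits
])

print(sorted(candidate_hits(toy)))  # ['protA', 'protD']
```

In a two-round search such as the one described, the IDs returned from the first round would become the queries for the second round. The additional requirement that a hit overlaps a TM-spanning domain of the query is not modelled here.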
All candidate ORs from B. floridae were aligned to 59 vertebrate ORs including sequences from lamprey, tetrapod (Sarcopterygii), and ray-finned fish (Actinopterygii) using ClustalW . A single candidate OR from B. belcheri  was also included in the alignment. N and C termini were removed for phylogenetic analysis as well as EL2 and TM5 because they could not be aligned (transmembrane boundaries defined by Man et al. ). Non-OR GPCRs from the Rhodopsin family included in the alignment were human and fish purinergic [GenBank:NM_002563, GenBank:CAK04925] and melanocortin receptors [GenBank:AAC13541, GenBank:NP_851301]. Although several other Rhodopsin-like genes were used as out-groups in our preliminary analyses (see Additional file 4), the P2Y and melanocortin receptors were chosen because the human P2Y receptor belongs to a subgroup of the Rhodopsin-like GPCRs that includes the human ORs (group δ) and is expected to be more closely related to the vertebrate ORs than the melanocortin receptor which belongs in another subgroup (group α) . An alignment of 200 amino acid positions was used to construct a Neighbor-Joining tree in Mega3.1  based on Poisson-corrected distances. Support for tree topology was estimated using 1000 bootstrap replicates.
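For readers unfamiliar with the Poisson correction, the distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of sites at which the two sequences differ. The sketch below illustrates the calculation on invented sequences. Skipping gapped sites pair by pair is our assumption, since the article does not say how gaps were treated, and the tree itself was built in Mega3.1, not with this code.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped; this
    pairwise-deletion choice is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared

def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)

# Invented toy sequences: 2 differences over 8 aligned sites, so p = 0.25.
seq_a = "MKTAYLLV"
seq_b = "MKSAYLIV"
print(p_distance(seq_a, seq_b))        # 0.25
print(poisson_distance(seq_a, seq_b))  # -ln(0.75), about 0.288
```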
Key amino acid motifs
To identify amino acid motifs common in vertebrate and B. floridae ORs, we constructed an alignment of vertebrate (n = 64) and B. floridae (n = 61) ORs. Sequences from all nine clades of vertebrate ORs  were used in the alignment including representatives from human, mouse, lamprey, fish, chicken and amphibians. The alignment included the same amino acid positions that were used for the phylogeny. Using the alignment, we constructed a WebLogo  from which a list of candidate amino acid motifs was generated. To determine whether these motifs were present in ORs, non-ORs, or both, we downloaded InterPro protein families  IPR000276 (Rhodopsin-like GPCRs) and IPR000725 (Olfactory receptors) and used them to construct two MySQL databases: one containing 5438 odorant receptors and the other containing the Rhodopsin-like sequences with the OR genes from IPR000725 excluded (N = 21 282). We searched these databases for the presence of the motifs using a series of regular expressions. An OR-specific motif was defined as one that is found in a large proportion of ORs but less than 1% of non-ORs.
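The definition of an OR-specific motif can be expressed as a simple rule applied to the two match fractions. In the sketch below the 1% ceiling for non-ORs comes from the text, while `min_or_fraction` is a hypothetical stand-in for "a large proportion of ORs", which the article does not quantify. The class and function names and the toy sequences are ours.

```python
import re
from dataclasses import dataclass

NON_OR_CEILING = 0.01  # "less than 1% of non-ORs", as stated in the text

@dataclass
class MotifScore:
    motif: str
    or_fraction: float
    non_or_fraction: float

    def is_or_specific(self, min_or_fraction: float = 0.40) -> bool:
        # min_or_fraction is an assumed threshold, not a value from the article.
        return (self.or_fraction >= min_or_fraction
                and self.non_or_fraction < NON_OR_CEILING)

def score_motif(motif: str, or_seqs: list[str], non_or_seqs: list[str]) -> MotifScore:
    """Score a motif ('x' = any residue) against OR and non-OR sequence sets."""
    pattern = re.compile(motif.replace("x", "."))

    def fraction(seqs: list[str]) -> float:
        return sum(bool(pattern.search(s)) for s in seqs) / len(seqs)

    return MotifScore(motif, fraction(or_seqs), fraction(non_or_seqs))

# Fractions reported in the article for KAxxTxxxH (73.48% of ORs, 0.24% of non-ORs).
reported = MotifScore("KAxxTxxxH", 0.7348, 0.0024)
print(reported.is_or_specific())  # True

# Invented toy sequences, for illustration only.
toy_ors = ["LLKAFSTCASHLL", "LLKAFSTCASHVV"]
toy_non_ors = ["LLKAFSSCASHLL"]
toy_score = score_motif("KAxxTxxxH", toy_ors, toy_non_ors)
print(toy_score.or_fraction, toy_score.non_or_fraction)  # 1.0 0.0
```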
OR gene structure and scaffold positions
Vertebrate odorant receptors are intronless and are often found in tandem [1, 48, 49]. To determine if B. floridae ORs are also intronless and in tandem, we obtained exon number and gene orientation from the annotation file accompanying genome assembly v1.0. The locations of these genes were obtained from the more recent version of the assembly, v2.0. Our ability to identify single exon genes is limited by the incomplete annotation of the genome. However, as previously stated, we considered a full-length sequence to be one that spans all seven transmembrane domains.
Buck L, Axel R: A novel multigene family may encode odorant receptors: A molecular basis for odor recognition. Cell. 1991, 65: 175-187. 10.1016/0092-8674(91)90418-X.
Quignon P, Giraud M, Rimbault M, Lavigne P, Tacher S, Morin E, Retout E, Valin AS, Lindblad-Toh K, Nicolas J, Galibert F: The dog and rat olfactory receptor repertoires. Genome Biol. 2005, 6: R83-10.1186/gb-2005-6-10-r83.
Godfrey PA, Malnic B, Buck LB: The mouse olfactory receptor gene family. P Natl Acad Sci USA. 2004, 101: 2156-2161. 10.1073/pnas.0308051100.
Zhang X, Firestein S: The olfactory receptor gene superfamily of the mouse. Nat Neurosci. 2002, 5: 124-133.
Gloriam DE, Fredriksson R, Schiöth HB: The G protein-coupled receptor subset of the rat genome. BMC Genomics. 2007, 8: 338-10.1186/1471-2164-8-338.
Liberles SD, Buck LB: A second class of chemosensory receptors in the olfactory epithelium. Nature. 2006, 442: 645-650. 10.1038/nature05066.
Dulac C, Axel R: A novel family of genes encoding putative pheromone receptors in mammals. Cell. 1995, 83: 195-206. 10.1016/0092-8674(95)90161-2.
Herrada G, Dulac C: A novel family of putative pheromone receptors in mammals with a topographically organized and sexually dimorphic distribution. Cell. 1997, 90: 763-773. 10.1016/S0092-8674(00)80536-X.
Matsunami H, Buck LB: A multigene family encoding a diverse array of putative pheromone receptors in mammals. Cell. 1997, 90: 775-784. 10.1016/S0092-8674(00)80537-1.
Ryba NJP, Tirindelli R: A new multigene family of putative pheromone receptors. Neuron. 1997, 19: 371-379. 10.1016/S0896-6273(00)80946-0.
Rivière S, Challet L, Fluegge D, Spehr M, Rodriguez I: Formyl peptide receptor-like proteins are a novel family of vomeronasal chemosensors. Nature. 2009, 459: 574-577. 10.1038/nature08029.
Niimura Y, Nei M: Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. P Natl Acad Sci USA. 2005, 102: 6039-6044. 10.1073/pnas.0501922102.
Alioto TS, Ngai J: The odorant receptor repertoire of teleost fish. BMC Genomics. 2005, 6: 173. 10.1186/1471-2164-6-173.
Delsuc F, Brinkmann H, Chourrout D, Philippe H: Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature. 2006, 439: 965-968. 10.1038/nature04336.
Jefferies RPS: Two types of bilateral symmetry in the Metazoa: chordate and bilaterian. Biological Asymmetry and Handedness. Edited by: Bock GR, Marsh J. 1991, Chichester: Wiley, 94-127.
Nordström KJV, Fredriksson R, Schiöth HB: The amphioxus (Branchiostoma floridae) genome contains a highly diversified set of G protein-coupled receptors. BMC Evol Biol. 2008, 8: 9. 10.1186/1471-2148-8-9.
Kamesh N, Aradhyam GK, Manoj N: The repertoire of G protein-coupled receptors in the sea squirt Ciona intestinalis. BMC Evol Biol. 2008, 8: 129. 10.1186/1471-2148-8-129.
Karnik SS, Gogonea C, Patil S, Saad Y, Takezako T: Activation of G-protein-coupled receptors: a common molecular mechanism. Trends Endocrinol Metab. 2003, 14: 431-437. 10.1016/j.tem.2003.09.007.
Bockaert J, Pin JP: Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J. 1999, 18: 1723-1729. 10.1093/emboj/18.7.1723.
Fredriksson R, Lagerstrom MC, Lundin LG, Schiöth HB: The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharmacol. 2003, 63: 1256-1272. 10.1124/mol.63.6.1256.
Fredriksson R, Schiöth HB: The repertoire of G-protein-coupled receptors in fully sequenced genomes. Mol Pharmacol. 2005, 67: 1414-1425. 10.1124/mol.104.009001.
Satoh G: Characterization of novel GPCR gene coding locus in amphioxus genome: Gene structure, expression, and phylogenetic analysis with implications for its involvement in chemoreception. Genesis. 2005, 41: 47-57. 10.1002/gene.20082.
Putnam NH, Butts T, Ferrier DE, Furlong RF, Hellsten U, Kawashima T, Robinson-Rechavi M, Shoguchi E, Terry A, Yu JK, Benito-Gutiérrez EL, Dubchak I, Garcia-Fernàndez J, Gibson-Brown JJ, Grigoriev IV, Horton AC, de Jong PJ, Jurka J, Kapitonov VV, Kohara Y, Kuroki Y, Lindquist E, Lucas S, Osoegawa K, Pennacchio LA, Salamov AA, Satou Y, Sauka-Spengler T, Schmutz J, Shin-I T, Toyoda A, Bronner-Fraser M, Fujiyama A, Holland LZ, Holland PW, Satoh N, Rokhsar DS: The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008, 453: 1064-71. 10.1038/nature06967.
Freitag J, Beck A, Ludwig G, von Buchholtz L, Breer H: On the origin of the olfactory receptor family: receptor genes of the jawless fish (Lampetra fluviatilis). Gene. 1999, 226: 165-174. 10.1016/S0378-1119(98)00575-7.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37: D211-5. 10.1093/nar/gkn785.
Palczewski K, Kumasaka T, Hori T, Behnke CA, Motoshima H, Fox BA, Le Trong I, Teller DC, Okada T, Stenkamp RE, Yamamoto M, Miyano M: Crystal structure of rhodopsin: A G protein-coupled receptor. Science. 2000, 289: 739-745. 10.1126/science.289.5480.739.
Zozulya S, Echeverri F, Nguyen T: The human olfactory receptor repertoire. Genome Biol. 2001, 2: RESEARCH0018. 10.1186/gb-2001-2-6-research0018.
Gripentrog JM, Jesaitis AJ, Miettinen HM: A single amino acid substitution (N297A) in the conserved NPXXY sequence of the human N-formyl peptide receptor results in inhibition of desensitization and endocytosis, and a dose-dependent shift in p42/44 mitogen-activated protein kinase activation and chemotaxis. Biochem J. 2000, 352: 399-407. 10.1042/0264-6021:3520399.
Rovati GE, Capra V, Neubig RR: The highly conserved DRY motif of class A G protein-coupled receptors: Beyond the ground state. Mol Pharmacol. 2007, 71: 959-964. 10.1124/mol.106.029470.
Pilpel Y, Lancet D: The variable and conserved interfaces of modeled olfactory receptor proteins. Protein Sci. 1999, 8: 969-977. 10.1110/ps.8.5.969.
Liu AH, Zhang XM, Stolovitzky GA, Califano A, Firestein SJ: Motif-based construction of a functional map for mammalian olfactory receptors. Genomics. 2003, 81: 443-456. 10.1016/S0888-7543(03)00022-3.
Kato A, Katada S, Touhara K: Amino acids involved in conformational dynamics and G protein coupling of an odorant receptor: targeting gain-of-function mutation. J Neurochem. 2008, 107: 1261-1270. 10.1111/j.1471-4159.2008.05693.x.
Katada S, Tanaka M, Touhara K: Structural determinants for membrane trafficking and G protein selectivity of a mouse olfactory receptor. J Neurochem. 2004, 90: 1453-1463. 10.1111/j.1471-4159.2004.02619.x.
Hanyaloglu AC, von Zastrow M: Regulation of GPCRs by endocytic membrane trafficking and its potential implications. Annu Rev Pharmacol Toxicol. 2008, 48: 537-568. 10.1146/annurev.pharmtox.48.113006.094830.
Oakley RH, Laporte SA, Holt JA, Barak LS, Caron MG: Molecular determinants underlying the formation of stable intracellular G protein-coupled receptor-β-arrestin complexes after receptor endocytosis. J Biol Chem. 2001, 276: 19452-19460. 10.1074/jbc.M101450200.
Glusman G, Yanai I, Rubin I, Lancet D: The complete human olfactory subgenome. Genome Res. 2001, 11: 685-702. 10.1101/gr.171001.
Niimura Y, Nei M: Evolution of olfactory receptor genes in the human genome. P Natl Acad Sci USA. 2003, 100: 12235-12240. 10.1073/pnas.1635157100.
Thompson JD, Higgins DG, Gibson TJ: ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.
Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
US Department of Energy Joint Genome Institute. [http://www.jgi.doe.gov]
Ensembl FTP server. [http://www.ensembl.org/info/data/ftp/index.html]
Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Man O, Gilad Y, Lancet D: Prediction of the odorant binding site of olfactory receptor proteins by human-mouse comparisons. Protein Sci. 2004, 13: 240-254. 10.1110/ps.03296404.
Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5: 150-163. 10.1093/bib/5.2.150.
Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: A sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
Young JM, Trask BJ: The sense of smell: genomics of vertebrate odorant receptors. Hum Mol Genet. 2002, 11: 1153-1160. 10.1093/hmg/11.10.1153.
Mombaerts P: Seven-transmembrane proteins as odorant and chemosensory receptors. Science. 1999, 286: 707-711. 10.1126/science.286.5440.707.
The authors would like to thank the Department of Energy Joint Genome Institute for making the B. floridae and C. intestinalis genome sequences available. We also thank Angelika Ehlers and Vladamir Kotlovyi for creating the Perl scripts used for MySQL database construction, and Christine Churcher for her help with editing the manuscript. This work was funded by grants from the Canadian Foundation for Innovation, the British Columbia Knowledge Development Fund, and the Natural Sciences and Engineering Research Council of Canada (JST).
AMC and JST conceived and designed this study. AMC collected and analyzed the data and JST assisted with sequence alignments. Both authors contributed to the writing of the manuscript and have read and approved the final manuscript.