Identification of the Otopetrin Domain, a conserved domain in vertebrate otopetrins and invertebrate otopetrin-like family members

Background Otopetrin 1 (Otop1) encodes a multi-transmembrane domain protein with no homology to known transporters, channels, exchangers, or receptors. Otop1 is necessary for the formation of otoconia and otoliths, calcium carbonate biominerals within the inner ear of mammals and teleost fish that are required for the detection of linear acceleration and gravity. Vertebrate Otop1 and its paralogues Otop2 and Otop3 define a new gene family with homology to the invertebrate Domain of Unknown Function 270 genes (DUF270; pfam03189). Results Multi-species comparison of the predicted primary sequences and predicted secondary structures of 62 vertebrate otopetrin, and arthropod and nematode DUF270 proteins, has established that the genes encoding these proteins constitute a single family that we renamed the Otopetrin Domain Protein (ODP) gene family. Signature features of ODP proteins are three "Otopetrin Domains" that are highly conserved between vertebrates, arthropods and nematodes, and a highly constrained predicted loop structure. Conclusion Our studies suggest a refined topologic model for ODP insertion into the lipid bilayer of 12 transmembrane domains, and highlight conserved amino-acid residues that will aid in the biochemical examination of ODP family function. The high degree of sequence and structural similarity of the ODP proteins may suggest a conserved role in the intracellular trafficking of calcium and the formation of biominerals.


Background
Otopetrin1 (Otop1) is the first described member of the otopetrin family, a novel gene family that encodes multi-transmembrane domain proteins. The family was named for the conserved role of Otop1 in the formation of otoconia and otoliths: "oto" (ear) and "petros" (stone). Otoconia are complex calcium carbonate biominerals in the utricle and saccule of the vertebrate inner ear that are required for the normal sensation of linear acceleration and gravity. Degeneration or displacement of otoconia can lead to vertigo and loss of balance [1][2][3][4][5]. Three mouse mutants and one zebrafish mutant with mutations in Otop1 have been described: tilted (tlt) [6]; mergulhador (mlh) [7]; inner ear defect (ied) [8]; and backstroke (bks) [9], respectively. All of these mutants lack otoconia or otoliths, but have normal inner ear development. In zebrafish, the morpholino knockdown of Otop1 phenocopies the tlt mutation, showing otolith agenesis with no disruption of the patterning of the developing inner ear [9,10].
The otopetrin family in most vertebrates studied consists of three genes clustered in two chromosomal locations: Otop1 (e.g., human Chr 4p16, mouse Chr 5B2) and the paralogous tandem genes Otop2 and Otop3 (e.g., human Chr 17q24-25, mouse Chr 11E2). Vertebrate otopetrins share a conserved gene and protein structure, with no homology to other transporters, channels, exchangers, or receptors. A preliminary secondary structure prediction based on the human, mouse, rat, zebrafish, and fugu protein sequences suggested a topology of ten transmembrane (TM) domains with cytosolic amino and carboxy termini. Additionally, tBlastn searches in the EST and genomic databases identified regions of homology with the DUF270 domain in a number of arthropod and nematode proteins. DUF270 (pfam03189) is a 404 amino-acid consensus sequence of unknown function that defines the DUF270 family, with members in C. elegans and D. melanogaster. The two regions of maximum homology with DUF270 found in vertebrate otopetrins correspond to putative TM domains 3-5 and 9-10, respectively, and were initially designated DUF270-I and DUF270-II [7].
Here, we report a comparison of evolutionary constraint and hydropathy profile analysis of 62 vertebrate otopetrins and arthropod and nematode DUF270 proteins, demonstrating that the genes that encode these proteins constitute a single family that we have renamed the Otopetrin Domain Protein (ODP) gene family. The refined topologic model of the ODP proteins includes 12 putative TM domains clustered into three "Otopetrin Domains" (OD-I, -II, and -III, respectively), with a strong degree of sequence conservation across widely divergent groups of metazoa. These regions of highest homology and evolutionary constraint, including the FYR box in the cytoplasmic tail, may represent important functional subdomains. Biochemical studies in transfected cells show that Otop1 modulates the manner in which cells handle intracellular calcium in response to purinergic stimuli [11]. The lack of known functional domains, such as ATP-binding domains, selectivity pores, or G-protein-binding consensus sequences, suggests either that the ODP family has a novel function that differs significantly from the activities of known channels, transporters, or receptors, or that the ODP genes encode novel functional motifs. We hypothesize that these motifs would likely occur within the evolutionarily constrained regions, as has been shown for other well-conserved gene families [12]. The challenge remains to define the functional domains of the ODP family, and the sequence analyses reported here provide a step in that direction.

Results and Discussion
Comparative sequence data set
The annotation of the Otop1, Otop2, and Otop3 genes in the human, mouse, rat, zebrafish, and fugu genomes is described elsewhere [7]. Orthologous otopetrin sequences were generated using a targeted sequencing approach (from dog, cow, armadillo and western clawed frog) (see methods in [13,14]) or identified through tBlastn searches of available whole-genome sequences. The phylogenetic relationships of vertebrate otopetrin and arthropod and nematode DUF270 genes were deduced from a total of 62 complete or nearly complete open reading frames in 25 species (see Table 1 for a listing of the specific species and accession numbers). Fragmentary, but clearly otopetrin-related, sequences were also identified in urochordates (ciona), echinoderms (urchin), and cnidarians (nematostella), but these were not complete enough to include in this analysis.

Phylogenetic relationships and revised nomenclature of vertebrate otopetrins and arthropod and nematode DUF270 genes
A maximum-likelihood phylogenetic tree was created from the multi-sequence alignment of the encoded proteins (Figure 1). The vertebrate, arthropod, and nematode sequences form distinct monophyletic groups, each containing three or more paralogous groups. This arrangement suggests that the ancestral metazoan genome may have contained a single otopetrin-like gene, with subsequent duplications giving rise to the paralogs in the different phyla after the three lineages diverged. Based on the positions in the tree of the named mouse and human sequences, the three vertebrate paralogous groups correspond to Otop1, Otop2, and Otop3. Otop2 and Otop3 are more closely related to each other than either is to Otop1, a clustering that parallels the genomic organization of the Otop genes in the vertebrate genomes. The arthropod and nematode DUF270 sequences, whose encoded proteins cluster independently in the tree from the vertebrate otopetrin sequences, have been renamed as otopetrin-like proteins (OTOPL), and the paralogous groups have been assigned arbitrary letters. This is in agreement with the HUGO gene nomenclature committee guidelines for gene families and grouping [15]. Like vertebrates, arthropods also have three paralogous groups of OTOPLs. The grouping in nematodes is more complex: there appear to be three major groups of OTOPLs, as in vertebrates and arthropods, but each group itself contains two or more paralogous groups as a result of species-specific gene duplications. In summary, vertebrate otopetrins and arthropod and nematode OTOPL genes have been grouped as a single family that we named collectively the Otopetrin Domain Protein (ODP, see below) gene family.
Table 1 footnote: In some instances, the nucleotide accession number corresponds to a † scaffold, †† cosmid, or ††† fosmid record; in those cases, the accession number of the Otop or OTOPL annotation (protein) is indicated in parentheses. * ENSEMBL accession number.

Refined topological model for ODP insertion into the lipid bilayer
Conserved primary sequence is indicative of an underlying conserved tertiary structure, and the evolutionary information contained in an alignment of related sequences can be leveraged to improve predictions of shared structures [16]. We took advantage of the deep multi-sequence alignment and phylogenetic tree of the ODP family to reexamine the predicted topology of the ODPs (Figure 2A). A hydropathy profile was generated that employs phylogenetic averaging [17] on hydropathy scale values for amino acids [18] to improve the detection of conserved hydrophobic regions, which might correspond to TM domains. The hydropathy profile revealed 12 strong hydrophobic regions, ten of which overlap with the originally predicted TM domains [7]. Likewise, the MEMSAT3 [19] and TMAP [20] algorithms, which also leverage evolutionary information, predicted 12 TM helices for ODP family members that overlap well with the constrained regions and hydrophobic regions in our profile (Figure 2A).
The refined topological model for the ODP family thus consists of 12 TM domains, with both the N- and C-termini in the cytosol, and in which the two newly identified TM domains are TM4 and TM10, respectively. As shown in Figure 2B, there are three discrete regions with maximum evolutionary constraint among vertebrates, arthropods and nematodes, which we have designated Otopetrin Domain (OD) -I, -II, and -III, respectively. Among the TM domains, TM2 and TM8 show the poorest conservation and evolutionary constraint across species. On the other hand, the loops connecting the TM domains show little sequence conservation or evolutionary constraint, strongly suggesting that the TM domains are the primary functional regions of the ODP family (Figure 2A and Additional file 1). Despite the poor loop sequence conservation, the number of amino acids in 8 of the 11 loops separating TM domains is highly conserved (

Conclusion
Comparative analyses of vertebrate otopetrins and arthropod and nematode OTOPL proteins revealed that they all share a TM domain structure and significant conservation of amino-acid sequence, suggesting that they constitute a single protein family, here renamed the ODP family. We have expanded the domains of homology to more accurately reflect the extent of sequence conservation between vertebrates, arthropods and nematodes, and have identified three evolutionarily constrained TM domain-rich areas that we have designated as Otopetrin Domains.
OD-I and OD-III are the most highly conserved regions of the ODP family. Tlt mice carry a missense mutation (Ala 151 →Glu), which alters the hydrophobicity of the predicted TM3 domain within OD-I, and leads to a presumed alteration in the membrane insertion or activity of Otop1 and otoconial agenesis [7]. The OD-II evolutionarily constrained region was not identified in the initial modeling, but mutations in Otop1 within this conserved segment of the protein have been shown to cause otolith/otoconial agenesis in bks mutant fish (Glu 429 →Val) [9] and in mlh mutant mice (Leu 408 →Gln) [7] (Figure 2B), suggesting that this region is functionally important.
Initial modeling of the OTOP proteins suggested a 10 TM domain model with cytosolic N- and C-termini [7]. This model had several problems, including that sites consistent with the consensus sequence for N-glycosylation were predicted to be cytosolic. The 12 TM domain model predicted by hydrophobicity and evolutionary constraint analysis places the proposed glycosylation sites in the extracellular space (Figure 2B), suggesting that it may more accurately reflect OTOP insertion into the lipid bilayer. Interestingly, the missense mutations in the tlt, mlh, and bks animal models, which lead to functional loss of OTOP1 activity, each occur within highly conserved transmembrane domains; such mutations often alter the hydrophobicity of the conserved TM domain, which may lead to alterations in the ability of the protein to insert and orient in membranes.

Figure: Predicted secondary structure and topologic model for Otop1 insertion into the lipid bilayer (panel label: D. melanogaster OTOPLa)
Otop1 is required for the formation of vertebrate otoconia, a process that involves calcium carbonate biomineralization and requires the regulation of intracellular calcium. Biochemical studies in transfected cells show that OTOP1 modulates the manner in which cells handle intracellular calcium in response to purinergic stimuli [11]. The mechanisms of calcium carbonate biomineralization are highly conserved in the development of otoconia and otoliths in the vertebrate inner ear, the formation of the avian egg-shell, the mineralization of the arthropod exoskeleton, and the development of other mineralized structures such as the mollusk shell [21][22][23]. There is evidence that some ODP family members are expressed in tissues associated with calcium secretion and calcium carbonate-based mineralization. In particular, ESTs from Callinectes sapidus (Blue crab) reveal strong expression of the D. melanogaster OTOPLb ortholog in hypodermal tissues that are required for calcium mobilization during the mineralization of the chitinous exoskeleton [24]. ODP mRNAs are also expressed in the hemocytes of various invertebrate species, which have been associated with the development of mineralized structures in mollusks [25]. In mammals, Otop1 is expressed in the lactating mammary gland [7], perhaps functioning in the secretion of calcium into milk. Taken together, the sequence homology, structural constraint, and expression pattern suggest a conserved role for members of the ODP family in the formation of mineralized structures. Further examination of ODPs and continued characterization of natural and induced mutations in these proteins through both physiologic and topologic studies may assist in better understanding the mechanisms of establishing and maintaining mineralized structures throughout the animal kingdom.

Methods
Alignment, phylogenetic tree generation, and evolutionary constraint versus hydropathy analysis
The initial protein sequence alignment was performed with ProbCons [29], and a preliminary phylogenetic tree was built with SEMPHY [30] using only the most confidently aligned regions of the multi-sequence alignment. The sequences were then divided into smaller groups based upon their relatedness according to the tree. Each group was re-aligned with ProbCons, and each of these sub-alignments was manually adjusted. ClustalW [31,32] was then used to profile-align these sub-alignments, producing the final, full alignment. The final phylogenetic tree was constructed using SEMPHY, constraining the topology to conform to SEMPHY trees built from the sub-alignments. One thousand bootstrap replicates were generated for each subtree as well as for the final tree. The bootstrap values shown in Figure 1 are from the lowest-level tree in which the given branch occurs.
Evolutionarily constrained regions were detected essentially as described previously [12]. The final alignment and tree were used to calculate single-site evolutionary rates with the empirical Bayesian version of the program Rate4Site [33]. These single-site rate values were smoothed using sliding-window weighted averaging. In each 17-position-wide window, the value at the center position of the window was given the highest relative weight, and the relative weight decreased linearly for the values on either side to the edge of the window. The resulting weighted average was assigned to the position in the protein corresponding to the center of the window. To produce the evolutionary constraint profile, the rates were then converted to relative constraint by normalizing to a range between 0 and 1, inverted by subtracting from 1 (because a region of low evolutionary rate is under high evolutionary constraint), and plotted against the position in the protein.
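The smoothing and inversion steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the edge weight of the triangular window is assumed to be 1 (the text only states that the weight decreases linearly toward the edge), and windows that overhang the ends of the protein are assumed to be truncated and renormalized, which the text does not specify.

```python
def triangular_weights(width=17):
    """Linearly decreasing weights, highest at the window center.

    Assumption: the outermost positions keep a weight of 1 rather than 0.
    """
    if width % 2 == 0:
        raise ValueError("window width must be odd so that it has a center")
    half = width // 2
    return [half + 1 - abs(i - half) for i in range(width)]


def smooth(values, width=17):
    """Assign to each position the weighted average of its window.

    Assumption: near the ends of the sequence the window is truncated
    and the remaining weights are renormalized.
    """
    weights = triangular_weights(width)
    half = width // 2
    n = len(values)
    smoothed = []
    for center in range(n):
        total = 0.0
        weight_sum = 0.0
        for offset in range(-half, half + 1):
            j = center + offset
            if 0 <= j < n:
                w = weights[offset + half]
                total += w * values[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


def rates_to_constraint(rates):
    """Min-max normalize rates to [0, 1], then invert (1 - x).

    A low evolutionary rate therefore maps to a high relative constraint.
    """
    lo, hi = min(rates), max(rates)
    if hi == lo:
        raise ValueError("rates are constant; cannot normalize")
    return [1.0 - (r - lo) / (hi - lo) for r in rates]


if __name__ == "__main__":
    # Toy single-site rates (made-up numbers, for illustration only).
    toy_rates = [0.2, 0.3, 2.5, 2.8, 0.1, 0.1, 0.4, 1.9]
    profile = rates_to_constraint(smooth(toy_rates, width=5))
    print([round(x, 3) for x in profile])
```

In practice the input would be the per-site rates produced by Rate4Site, one value per alignment column, and the output list would be plotted against position in the protein.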
To produce the hydropathy profile, the hydropathy-scale value [18] for each amino acid in a column of the multi-sequence alignment (corresponding to a single position on the profile) was multiplied by a weighting factor that reflects the fractional contribution of the corresponding sequence to the total sequence diversity represented [17]. The hydropathy score at each position is the sum of these values. These single-position values were smoothed using the same sliding-window weighted averaging scheme applied to the rate values above, normalized to vary between 0 and 1, and plotted against the position in the protein.
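As a rough illustration of the per-column calculation, the sketch below computes a sequence-weighted hydropathy score for each alignment column and rescales the result to the 0 to 1 range. Several things here are assumptions rather than details from the text: the Kyte-Doolittle scale is used as a stand-in for the hydropathy scale cited as [18], gap characters are assumed to contribute nothing to the column sum, the function names are hypothetical, and the sequence weights are taken as given inputs (the phylogenetic averaging procedure of [17] that derives them is not reproduced). The smoothing step is omitted for brevity.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in for the scale cited as [18]).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(alignment, seq_weights, scale=KYTE_DOOLITTLE):
    """Return one weighted hydropathy score per alignment column.

    alignment   -- list of equal-length aligned sequences ('-' for gaps)
    seq_weights -- one weight per sequence (fractional contributions,
                   assumed to be supplied by the caller)
    Assumption: gaps and unknown residues add nothing to the column sum.
    """
    if len(alignment) != len(seq_weights):
        raise ValueError("need exactly one weight per sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        score = 0.0
        for seq, weight in zip(alignment, seq_weights):
            residue = seq[col].upper()
            if residue in scale:
                score += weight * scale[residue]
        profile.append(score)
    return profile


def normalize(values):
    """Min-max rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values are constant; cannot normalize")
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Toy alignment and weights (made up, for illustration only).
    toy_alignment = ["ILVK", "IRV-", "LLAK"]
    toy_weights = [0.5, 0.25, 0.25]
    print(normalize(hydropathy_profile(toy_alignment, toy_weights)))
```

Taking the weights as an input keeps the sketch independent of any particular weighting scheme; equal weights reduce it to a plain column average.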