- Research article
- Open Access
Evolutionary analysis of selective constraints identifies ameloblastin (AMBN) as a potential candidate for amelogenesis imperfecta
BMC Evolutionary Biology volume 15, Article number: 148 (2015)
Ameloblastin (AMBN) is a phosphorylated, proline/glutamine-rich protein secreted during enamel formation. Previous studies have revealed that this enamel matrix protein was present early in vertebrate evolution and certainly plays important roles during enamel formation although its precise functions remain unclear. We performed evolutionary analyses of AMBN in order to (i) identify residues and motifs important for the protein function, (ii) predict mutations responsible for genetic diseases, and (iii) understand its molecular evolution in mammals.
In silico searches retrieved 56 complete sequences in public databases that were aligned and analyzed computationally. We showed that AMBN is globally evolving under moderate purifying selection in mammals and contains a strong phylogenetic signal. In addition, our analyses revealed codons evolving under significant positive selection. Evidence for positive selection acting on AMBN was observed in catarrhine primates and the aye-aye. We also found that (i) an additional translation initiation site was recruited in the ancestral placental AMBN, (ii) a short exon was duplicated several times in various species including catarrhine primates, and (iii) several polyadenylation sites are present.
AMBN possesses many positions, which have been subjected to strong selective pressure for 200 million years. These positions correspond to several cleavage sites and hydroxylated, O-glycosylated, and phosphorylated residues. We predict that these conserved positions would be potentially responsible for enamel disorder if substituted. Some motifs that were previously identified as potentially important functionally were confirmed, and we found two, highly conserved, new motifs, the function of which should be tested in the near future. This study illustrates the power of evolutionary analyses for characterizing the functional constraints acting on proteins with yet uncharacterized structure.
Ameloblastin (AMBN), amelogenin (AMEL), enamelin (ENAM), amelotin (AMTN), odontogenic, ameloblast associated (ODAM) and SCPP-PQ1, are ameloblast-secreted proteins involved in various steps of the organization and mineralization of the enamel matrix of mammalian teeth. They belong to the large secretory calcium-binding phosphoprotein (SCPP) family (23 members known in humans), which appeared more than 450 million years ago (Ma) then diversified through gene duplication during vertebrate evolution [1–4]. In mammals, non-AMEL proteins account only for 10–20 % of the total enamel proteins, among which AMBN, also known as sheathlin [5, 6] or amelin , is the most abundant [8–10]. AMBN mRNA was sequenced in pigs , mice , humans [12, 13], cattles and guinea-pigs (direct submission to NCBI database). In non-mammals, AMBN cDNA was published only in a crocodile  and in the clawed toad . Genomic DNA (gDNA) sequences are known in a lizard  and in the coelacanth . These findings indicate that AMBN was present early during vertebrate evolution, and at least in the last common ancestor of sarcopterygians (420 million years ago, Ma ). However, the large evolutionary distance among the few AMBN sequences available in non mammalian tetrapods does not allow accurate comparison and further analysis, including new data in non mammalian sarcopterygians are needed to better understand AMBN evolution through such a large geological period.
AMBN is an intrinsically unstructured, enamel matrix protein (EMP), and its functions remain unclear. It has been suggested either to be a structural component of the enamel matrix playing a role in maintaining the prismatic structure of growing enamel crystals at the rod and interrod boundaries [6, 19, 20], or to be involved in ameloblast adhesion [21–23], or to function as a growth factor [24–27] or as a signaling molecule [26, 28]. AMBN possesses two calcium-binding domains that interact with Ca2+ ions after being liberated by proteolysis [29–31]. The transcription start site (TSS) and the promoter region of AMBN were analyzed in the mouse . This region contains cis-acting elements that function both to enhance and suppress transcription, some of them regulating transcription activity in mesenchymal cells in vitro . Cbfa1, the essential transcription factor for osteoblast differentiation, was shown to play an important role in AMBN transcription .
It is worth noting that (i) AMBN expression was reported during craniofacial bone development in rats , and (ii) in vitro and in vivo experiments have suggested that this protein could also play a role in bone formation and repair [27, 36, 37]. These findings could indicate that AMBN plays a role in early bone formation and modeling, but they have been contradicted by a study showing no implication in bone modeling and repair . Moreover, a possible function in bone development remains elusive as (i) AMBN −/− mice do not exhibit apparent bone phenotype , (ii) AMBN is subjected to pseudogenization in birds and in mammalian species, in which the capability to form either enamel or teeth has been lost, e.g., xenarthrans, pangolins and baleen whales, indicating that AMBN is a tooth specific protein [39–41], and (iii) AMBN expression was found to be restricted to teeth in rodents and in a crocodile [9, 14]. Finally, AMBN −/− mice develop severe enamel hypoplasia only, a phenotype that indicates a crucial role of AMBN in enamel formation .
Because of its important role in mammalian enamel formation, for long AMBN has been considered a candidate gene for amelogenesis imperfecta hereditary type 2 (AIH2), a human genetic disease [42–46]. In humans, AMBN is located on the long arm of chromosome 4 (4q13-21), containing the gene locus for the autosomal dominant hypoplastic form of AIH2, that affects enamel formation and is the most prevalent amelogenesis imperfecta (AI) type (85 % of all inherited AI) [12, 47]. However, it is only recently that the first case of a disease-associated mutation of AMBN (homozygous exon 6 deletion) was reported in a family having hypoplastic AI . The lack of other cases of AMBN-associated AIH2 strongly contrasts with the important role played by AMBN during amelogenesis. This contradiction could be explained by an autosomal recessive pattern of inheritance as demonstrated in AMBN +/− mice that do not exhibit an apparent dental phenotype .
Interestingly, human ameloblastoma, a benign but destructive tumor, expresses AMBN transcripts with tumor-specific mutations [13, 50, 51]. Such expression suggests that AMBN plays a role in epithelial odontogenic tumors. AMBN is also expressed in osteosarcoma cells . In humans and some mammals, amyloidosis has been associated with calcifying epithelial odontogenic tumors (CEOT) and AMBN was found to be highly expressed in these cells . AMBN deficiency was proposed to be the cause of the odontogenic tumors seen in 20 % of the AMBN −/− mice . Also, tumor suppressor genes P21 and P27 are upregulated in in vitro transfected ameloblastoma cells .
In recent years, a series of detailed evolutionary analyses of mammalian SCPPs have been conducted (AMEL ; ENAM ; MEPE ; AMTN ; DMP1 ) with the aim to (i) reveal regions and residues important for the protein function, (ii) predict or validate mutations responsible for genetic diseases, and (iii) understand their mode of evolution, origin and relationships [3, 4]. Here, we perform such an evolutionary analysis of mammalian AMBN sequences in order to predict functionally important sites of the protein and to identify candidate disease-associated mutations responsible for AI in human, an approach called phylomedicine .
Material and methods
AMBN sequences and alignment
A total of 56 mammalian full-length AMBN coding sequences, representative of the main lineages, were extracted from the NCBI  and Ensembl  databases: four published, full-length sequences (human, pig, mouse and rat); two sequences published only in GenBank (cow and guinea-pig), all including the UTRs; 33 computer-predicted sequences, i.e., available from the automatic analysis of sequenced mammalian genomes; and 17 sequences obtained using BLAST from whole genome shotgun data.
The AMBN sequences of armadillo and sloth (Xenarthra), and of aardvark (Afrotheria) are particular because the former is enamel-less in adult and the two others lack enamel, even in juveniles [41, 61]. AMBN being a tooth-specific protein [39, 40] the gene has accumulated mutations after the ability to form enamel was lost, or reduced, in these lineages . Therefore, we performed our evolutionary analyses with two datasets: the complete dataset of 56 sequences and a reduced dataset of 53 functional mammalian AMBN sequences. The published crocodilian sequence (Caiman crocodilus) was also used as outgroup for calculation of the mammalian ancestral sequence. Species names and sequence references are indicated in the Additional file 1.
The coding regions of the AMBN sequences were translated into putative amino acid sequences, aligned to the human sequence using Clustal X 2.0 , and checked by hand using Se-Al v.2.0a11 software [63, 64]. Each of the computer-predicted sequences was carefully checked against the published sequences, taking into account the intron/exon boundaries. When necessary, they were completed and/or corrected using (i) Blast search against the Ensembl genome database, (ii) the NCBI trace archives, (iii) the whole genome shotgun (WGS) repository sequences, and (iv) resequencing by performing classical PCR on DNA extracted from alcohol-preserved tissues. In our final alignment only a few residues were missing, representing less than 1 % of the data. The positions with missing data were included in our analyses and treated as “unknown states”.
5' untranslated region (5' UTR) and signal peptides
Taking the published mRNA sequences as templates, we obtained the putative 5' UTR (i.e., exon 1 and beginning of exon 2) of AMBN sequences in 49 genomes using BLAST. Then, using Clustal X 2.0 and Se-Al v.2.0a11, we aligned these sequences against the published mammalian sequences of exons 1 and 2 with particular attention paid on the exon-intron boundaries. The transcription start site (5' end of exon 1) was defined by default, but this region was not useful for our analysis. The 49 sequences were analyzed to identify the ATGs that could be correct translation initiation sites (TIS). We used the DNA functional site miner  that predicts functional TIS i.e., possessing the highest Kozak consensus score [66, 67]. All identified putative signal peptides (SPs) were analyzed using the Signal P 3.0 server . This software predicts the location of the three characteristic regions (n-, h- and c- regions) of a SP, the putative cleavage site of the protein, and calculates the probability for each SP to be functional .
The 53 functional mammalian AMBN sequences were aligned based on their amino acid translations using MAFFT  within Geneious R6.03 . The alignment was then restricted to the length of the human AMBN sequence by excluding all sites containing gaps in non-human sequences and the final stop codon. This procedure resulted in a dataset containing 1341 nucleotide sites (447 codons/amino acids versus 500 when including gaps). Then the three non-functional sequences (aardvark, armadillo and sloth) were added to this dataset using the program MACSE specifically designed to deal with frameshifts and stop codons . Ambiguously aligned codons were then excluded using Gblocks  leading to a dataset with 1239 nucleotide sites (413 codons) for 56 taxa. Maximum likelihood (ML) phylogenetic reconstructions were performed on the two nucleotide alignments under the GTR + GAMMA model using RAxML 7.3.2 . The statistical support for the ML tree was evaluated using 100 bootstrap replications of the initial ML search. The platypus (Ornithorhynchus anatinus) sequence was used as outgroup in all analyses.
We evaluated the selective constraints acting on AMBN by performing a number of statistical tests based on the ratio of non-synonymous (dN) to synonymous (dS) substitutions (dN/dS or ω) using the codeml program of the PAML 4.7a package . In all analyses, we used the mammalian species tree topology as inferred by Meredith et al. , except for the position of the placental root, which follows Romiguier et al. . We first tested for significant among-site variation in dN/dS ratio along the AMBN molecule by using site-specific codon models for the detection of positive selection. This was achieved by comparing the fit of a nearly neutral model (M7) versus a model (M8) that allows a fraction of positively selected sites [78, 79]. These two models were compared using a hierarchical likelihood ratio test (LRT) . A conservation index (CI) was then calculated for each codon from the Bayes Empirical Bayes (BEB) estimated mean values of ω obtained under the M8 model following the procedure proposed by Burk-Herrick et al. . The CI was defined as 1-ω with values of neutral (ω = 1) and positively selected sites (ω > 1) being set to 0 in order to graphically represent site-specific selective constraints along the AMBN molecule .
The branch specific variation of dN/dS in AMBN across the mammalian species tree was jointly reconstructed from the complete 56-taxa dataset with divergence times while controlling the effect of life-history traits  using the Bayesian framework implemented in the CoEvol program . The dN/dS ratio was estimated under the dsom procedure together with the 24 calibration constraints issued from Benton et al.  that were compatible with our taxon sampling to estimate divergence times. The prior on the root node was set to 220 Ma with a standard deviation of 60 Ma following the expert estimate for the split between Monotremata and Placentalia reported on the TimeTree2 website . The values of the three life-history traits incorporated into the analysis (body mass, longevity, and sexual maturity) were extracted from the PanTheria database . Two independent MCMC were run for a total of 2500 cycles sampling points every cycle. The first 500 points of each MCMC were then excluded as the burnin, and inferences were made from the remaining 2000 sampled points.
To further test for significant variations in dN/dS along specific branches, we used the branch model implemented in codeml allowing ω to vary among specific branches and/or clades . Hierarchical LRTs were calculated between the one-ratio model (M0) and alternative models (M2ω, M4ω, M5ω, M8ω), in which branches with a mean dN/dS > 0.6 as estimated in the previous Bayesian reconstruction were allowed to have their own ω (see Table 1).
Ancestral sequence reconstruction
Ancestral sequence reconstruction of mammalian AMBN was computed with FastML  on the FastML server . The AMBN sequence of the caiman (Caiman crocodilus) was added to the 53-taxa alignment using MAFFT in order to serve as outgroup. FastML was used for inferring ancestral amino acid sequences at each node of the nucleotide-derived ML tree under the LG + G8 + F model using the marginal reconstruction procedure including indels.
For a better understanding of the results presented below, our current knowledge of AMBN structure (exemplified in humans) and the location of cleavage sites of the protein (identified in pigs) are illustrated in Fig. 1.
Alignment and sequence comparisons
The 56 AMBN sequences studied here represent 46 families distributed in 19 orders and are representative of the current mammalian phylogenetic diversity (Additional file 1). The alignment of the 53 functional sequences resulted in a 500 amino acid long sequence, when considering methionine in exon 1 (Additional file 2), and including gaps required for accurate alignment. For convenience of presentation, the duplicated exons 9c, 9e, and 9d were not shown: they were nearly identical and only found in a few species (see below).
Sequence comparison revealed a few insertions, the largest being related to the duplication of small exons in 12 species (see below). These additional exons account for a large part of the length variation observed among the sequences (Additional file 2). In the 10 rodent AMBN analyzed, an insertion of 4 or 5 residues (QG/NMAP) was identified in the region encoded by exon 3. This event likely occurred in the rodent ancestral branch after the divergence with lagomorphs (rabbit and pika), which do not possess this insertion. The insertion is located close to a remarkably conserved region and probably results from the duplication of the neighboring PGMAS residues (Additional file 2). In hedgehog, six residues were inserted in the region encoded by exon 13 at positions 365–370, and in marsupials three amino acids (IKQ) were inserted in the region encoded by the 3′ end of exon 2. In addition, a few insertions and deletions (indels) of one to five residues were found distributed in the whole sequences and no sequence repeats were identified, to the exception of the exon duplication reported below.
Human AMBN possesses two exons, exons 8 and 9, which are absent in rodent sequences. Therefore, we screened the DNA region located between exons 7 and 10 in the mammalian genomic sequences to check whether additional exons were present in this region. We show that a gene structure composed of 11 exons (i.e., exons 1 to 7 and 10 to 13) characterizes 41 mammalian AMBN out of the 53 functional sequences investigated. This structure is also conserved in caiman, which indicates that the ancestral mammalian AMBN most probably possessed 11 exons. In 12 species, however, AMBN displays additional exons in this particular region. They are eight simiiform primates (out of nine), three laurasiatherians (horse, microbat and hedgehog), and one afrotherian (elephant shrew) (Additional file 2): one exon in orangutan and microbat; two in humans, chimpanzee, gibbon, squirrel monkey, marmoset and hedgehog; three in elephant shrew; four in baboon and macaque; and six in horse. These additional exons have the same length (39 bp) and are nearly identical in sequence to exon 7 (Additional file 2). These features strongly suggest that these exons originated all from tandem duplications of a short DNA region containing exon 7. This region is the only AMBN location, in which additional exons were identified. The two exons found between exons 7 and 10 in human AMBN were numbered 8 and 9a; therefore, the extra exons found between exon 9a and 10 were numbered 9b, 9c, etc. In the horse, the last duplicated exon was numbered 9e. It is worth noting that closely related species such as shrew vs. hedgehog, megabat vs. microbat, etc. lacked duplicated exons, a finding which strongly suggests that the duplications occurred independently within the different lineages (Additional file 2). However, in primates, duplication of this short DNA region is found only in the Simiiformes (Platyrrhini + Catarrhini), which could indicate that the first duplication likely occurred in the common ancestor of simiiforms, after this lineage diverged from Tarsiiformes, and was subsequently conserved in all simiiform lineages except in gorilla. The presence of these duplicated exons contributes to a substantial increase in length of the encoded protein: from 421 amino acids (aa) on average in the majority of sequences to e.g., 447 aa in humans, 473 in baboon, and 499 aa in horse, which is the largest AMBN mammalian sequence known to date (Additional file 2). These exons also encode two to four prolines, which contribute to enlarge the proline-rich region of the protein.
Analyses of exon/intron boundaries in representative species indicated that the 21 nucleotides on each side of the exons are rather well conserved (Additional file 3). The unchanged nucleotides in these intronic regions could be important for intron splicing.
Translation initiation site (TIS) and signal peptide (SP) analyses
Alignment of the 5' region of the AMBN sequences revealed that the 3' end of exon 1 and the 5' end of exon 2 are well conserved (Fig. 2). Depending on the species, in exon 1, exon 2 and/or in both, DNAFSMiner identified one to five ATGs in the right context i.e., that does not lead to in-frame downstream stop codons. For each sequence analyzed, the first ATG was always predicted as being the functional TIS. In most placentals the putative TIS was located in exon 1, but with a few exceptions (bushbaby, hedgehog, bats and elephant), in which it is found in exon 2 as in marsupials, and monotremes (Fig. 2). In placentals, the TIS was located at the same position in the 3' region of exon 1, with the exception of Bovinae, in which another TIS, was identified upstream. In humans, for instance, DNAFSMiner predicted that the first ATG (starting at position 76, exon 1) is the correct functional TIS (score = 0.844). However, when exon 1 was not included in the analysis, the first ATG in exon 2 (position 106) was also predicted as the functional TIS (score = 0.627). This means that the second TIS could be used if that encoded by exon 1 was lacking, i.e. by using different promoter and transcription initiation sites.
Interestingly, the predicted TIS in exon 1 exhibits a weak Kozak consensus as it lacks a purine (A/G) in position −3 while possessing a pyrimidine in position +4. However, such ATG could serve as the initiator codon . In contrast, the ATG located in exon 2 has a strong Kozak consensus (purines in −3 and +4). As a consequence, translation could occur not only at the first (weak) TIS but also at the next (strong) TIS, resulting in the synthesis of two proteins . The additional TIS identified in cow AMBN exon 1 displays also a strong Kozak consensus (a purine at positions −3 and +4), and DNAFSMiner predicted this TIS as the functional one. This additional TIS is also present in the water buffalo Bubalus bubalis and the American bison Bison bison (Bovinae) but is absent in the sheep, goat and Tibetan antelope Pantholops hodgsonii (Antilopinae) AMBN sequences (not shown), which suggests that the CTG to ATG substitution occurred along the Bovinae lineage after its divergence from Antilopinae some 20 Mya .
Our alignment indicates that the TIS in exon 2 is conserved in all AMBN analyzed, including that of the caiman, which means an ancient origin for this ATG. In contrast, the ATG in exon 1 was probably recruited in the eutherian ancestor, after an A to T substitution changed the ancestral AAG to ATG (Fig. 2). This ATG appears to have been secondarily lost independently in various lineages or species: ATG to CTG in elephant, ATG to AGG in bats, and ATG to ACA in bushbaby. However, in most primates, glires, most laurasiatherians, and five afrotherians out of six studied, the translated, putative AMBN sequences exhibit two methionines: the methionine encoded by the ATG in exon 1, numbered M1, and the one encoded by the ATG in exon 2, M11. In cow the first methionine is located five residues upstream M1.
The analysis of the putative SPs with SignalP revealed a similar, high probability for the SP starting either at M1 or M11. As a consequence, the SP of most placental AMBN can be either large (26 aa), when starting at M1, or short (16 aa) when beginning at M11. The three typical regions — positively charged n-region, hydrophobic h-region, and polar c-region — were found in both SPs (Additional file 4). The maximal cleavage site probability was always located between Ala26 and Val27. The corresponding predicted cleavage site is strongly supported by the presence of these two residues in all AMBN sequences analyzed, excepted for the platypus, in which Val is replaced with Ile from the same amino acid group (Additional file 2).
Polyadenylation sites (PS)
Several putative alternative polyadenylation signals (PS) were identified in the 3' UTR of the mammalian AMBN sequences (Table 2). This region is large, ranging from 441 to 557 bp in representative species of the main lineages (493 bp in the caiman). The number of PS ranged from two in the platypus to five in several species, including humans, and were numbered PS1 to PS5, from the stop codon onwards. PS were either the highly conserved hexamer AATAAA found at three locations (PS2, PS3, PS5) or the common variant ATTAAA found at two locations (PS1, PS4) (Table 2). In our dataset the latter was less conserved than the canonical signal, and it was not found in the caiman sequence. When both present, PS1 and PS2 are close to one another and can be considered a single PS; PS3 is found in all sequences but the platypus; PS4 is present in rodents and primates, in some laurasiatherians and in marsupials; PS5 is found in all sequences. The caiman AMBN possesses PS2, PS3 and PS5. In marsupials, PS4 and PS5 are close one to each other. Altogether these findings suggest that (i) three PS (PS2, PS3 and PS5) were present in the 3'UTR of the last common ancestral mammalian AMBN and (ii) PS3 was secondarily lost in the platypus lineage.
Amino acid composition
EMPs are characterized by their high amount of proline (P) and glutamine (Q), and are included in the P/Q-rich sub-family of SCPPs . In the seven representative species analyzed for AMBN residue composition, including caiman, proline is well represented (14.09 % in average for the full-length sequence and 23.78 % in average for the proline rich region) compared to the 5.1 % encountered in most proteins (Additional file 5). Glutamine (8.7 and 15.7 %, respectively) is also well represented compared to the overall value of 4.0 %. It is worth noting that the percentage of proline in both the full-length sequence and the proline-rich domain is relatively similar from caiman to humans, while the glutamine percentage is decreasing from 11.79 to 6.49 % and from 20.93 to 11.20 %, respectively. These two amino acids represent an average of ca 23 % of the residues in the full-length AMBN sequence and ca 40 % in the proline-rich domain. In the ancestral mammalian AMBN (see below) these amino acids represent ca 24 % (P = 14.1; Q = 9.8 %) and ca 42 % (22.2 + 20.2), respectively (Additional file 5).
Ancestral AMBN and enamel-less species
In addition to its slightly higher P/Q percentage compared to living species, the ancestral mammalian AMBN, calculated from the 53 functional AMBN and using the caiman sequence as outgroup, displays a structure composed of 11 exons (1–7; 10–13), i.e., without duplicated exons between exons 7 and 10. There is a single TIS located in exon 2. Otherwise, the ancestral AMBN sequence is structurally similar to that of most mammalian AMBN and, in particular, possesses all the important positions and domains reported above.
Sloth, armadillo and aardvark sequences were aligned with the ancestral AMBN. This alignment revealed that selective pressures on the AMBN sequence of these enamel-less species were more relaxed in sloth and aardvark than in armadillo (Fig. 3). Sloth and aardvark AMBN show (i) numerous residue substitutions, some having occurred in conserved positions, (ii) premature stop codons resulting from nucleotide substitution, (iii) frameshifts in the region encoded by the large exon 13 resulting from nucleotide insertion or deletion, and (iv) nucleotide substitutions in the splicing sites (not shown). In addition, a large deletion has occurred in the DNA region of aardvark AMBN between exons 5 and 7, and exon 6 has disappeared. Taken together these features indicate that AMBN is no longer functional in these two species and was subjected to pseudogenization. In contrast, armadillo AMBN does not exhibit stop codons and frameshifts, and residue substitutions are less numerous. However, several amino acid substitutions have occurred on conserved positions that were suggested above to play an important role for the protein function (Fig. 3). These data indicate that armadillo AMBN either was recently inactivated or is still active but the encoded protein is probably defective.
The ML phylogram inferred from the complete 56-taxa nucleotide data set under the GTR + GAMMA model is presented in Fig. 4. This tree illustrates the strong phylogenetic signal contained in AMBN for resolving the phylogeny of mammals. Indeed, the ML topology supports the four major placental clades: Xenarthra, Afrotheria, Laurasiatheria and Euarchontoglires, and the grouping of the latter two in Boreoeutheria. Furthermore, almost all nodes are fully resolved and supported by bootstrap values of more than 90 % with the exception of, for instance, the relationships among laurasiatherian orders or the position of tree shrew within Euarchontoglires (Fig. 4). The ML estimate of the gamma shape parameter (α) was 0.92, which underlines the relative homogeneity of the site-specific substitution rates along the AMBN molecule. Among placentals, strong variations in substitution rates are revealed with contrasted branch lengths observed in the ML tree, with fast evolving taxa such as tenrec and elephant-shrew in Afrotheria, shrew and hedgehog in Laurasiatheria, all Glires except pika and squirrel, and bushbaby in Primates within Euarchontoglires.
Site-specific selection constraints and conserved domains
Our amino acid alignment of the 53 functional AMBN sequences revealed that many residues have been unchanged throughout mammalian evolution, some of them being regrouped into short domains (Additional file 2). Other conserved positions include amino acids that can be replaced with residues possessing the same characteristics (conservative positions). Unchanged and conservative positions are subjected to strong selective pressure, which indicates that they are functionally important for the protein. Most of them are located in the N-terminal region of the protein (encoded from exon 2 to beginning of exon 6). Conversely, more variable positions are located in the region encoded by exon 6, exon 7, and for a large part in exon 13 (Additional file 2).
Selective forces acting along the 53 AMBN codon sequences were inferred through non-synonymous vs. synonymous substitution rate (Fig. 5a). The comparison between the nearly neutral model (M7) and the model allowing a fraction of positively selected sites (M8) revealed significant among-site variation in dN/dS with the occurrence of positive selection (p < 0.001; Table 1). Under the M8 model, five codons out of 447 were identified as evolving under significant positive selection (PP > 0.95) during mammalian evolution. These sites are located in the proline rich region encoded by exon 6 at position 135, and in the exon 13 encoded region at positions 328, 360, 375, and 394 (Table 1, Fig. 5b).
The site conservation index calculated from the site-specific dN/dS values obtained under model M8 illustrated the variation of the selection constraints across the AMBN (Fig. 5a). Most of the N-ter region (positions 1 to 123, encoded from exon 1 to the 5' region of exon 6), several small regions dispersed in the AMBN sequence (notably positions 233–265, exons 10–12; 300–320, beginning of exon 13; 379–404, mid exon 13) and a short C-ter sequence (442–447, encoded by exon 13) are subjected to high functional constraints. Most of the remaining sequence is characterized by an alternation of high and low selective pressures (Fig. 5a). We identified 161 positions (out of 447 residues in human AMBN) subjected to purifying selection during more than 200 Ma of mammalian evolution pointing to biologically significant residues (Fig. 5b). These constrained positions have certainly important functions, which explains they were either unchanged (80 positions) or they displayed only few substitutions (81 positions). Among these residues are: the cleavage sites N31, G232, Y249 and G326/L327; the hydroxylated P11; the O-glycosylated S86; and the phosphorylated S17, S235, S236 and T277. In contrast, our study identified as variable positions, i.e., having probably no important function, the putative cleavage sites Q130 and R196, hydroxylated P350, and O-glycosylated T387 (Fig. 5b). Out of this dozen of important positions, there are 150 other selectively constrained positions that our evolutionary analysis pointed out as being important for the correct functioning of AMBN. Therefore, we predict that these conserved positions in AMBN, and particularly the unchanged ones, would be potentially responsible for enamel disorder if they were substituted.
Three conserved domains (or motifs) were previously identified as possibly playing an important functional or structural role (Fig. 5b). Between the two CD63-interaction domains the position E364 is conserved, while the RPGF motif is not. Half of the residues of the calcium-binding domain (Y217-R230) and eight residues out of 28 in the heparin-binding domain are conserved, which confirms that these domains are important. In addition to these domains, several conserved residues are located close one to the other in a same region of the protein, representing new motifs of unknown but important function. They are housed principally in the N-terminal region (residues 1–124) of the protein encoded by exons 3, 4, 5 and beginning of exon 6 (Fig. 5b). This large, conserved AMBN region contains important residues: the two first residues that are involved in the cleavage of the signal peptide, the hydroxylated P11, the phosphorylated S17, the O-glycosylated S86 and the cleavage site N31. The residues on both sides of these positions are also conserved, but in the same region other highly conserved amino acids constitute motifs of unknown function. We identified notably two highly conserved peptide sequences: the one is encoded by exons 3 and 4 (35 residues, 29 of them being conserved), and the other encoded by the 3' end of exon 5 and 5' extremity of exon 6 (22 residues conserved), that certainly play an important role remaining to be defined. Interestingly, it is known that the 5' end of exon 6 is alternatively spliced in the few human, rodent and porcine AMBN transcripts available. Our alignment revealed that the residue at the splicing site (Q87 = always encoded by CAG, not shown) is well conserved (Additional file 2). Similarly, several, highly conserved motifs encoded by exons 5 and 13 are highlighted as being probably important for the functionality of AMBN.
In mammals, AMBN does not possess a conserved RGD (integrin-binding) motif; however, we found an RGD sequence in the region encoded by the beginning of exon 13 in the cow and in four primates. This sequence was clearly acquired secondarily during mammalian evolution through the substitution RGG to RGD. The VTKG sequence (heparin-binding motif) is only present in rat, mouse, hamster and kangaroo rat, and was clearly acquired secondarily during rodent evolution through the substitution ATKG to VTKG. However, the TKG sequence is conserved in all eutherians and the lysine, K288, is unchanged (Fig. 5b). The DGEA motif previously described in rats and mice as a possible interacting site with integrins is also present in the hamster AMBN but not in other rodents. This motif was acquired secondarily during murid evolution.
Lineage-specific selection constraints
The estimation of the global dN/dS ratio across the whole tree using the one-ratio branch model (M0) resulted in a ω value of 0.46 showing that AMBN is globally evolving under purifying selection in mammals (Table 1). The joint Bayesian reconstruction of the dN/dS ratio along the phylogeny revealed large variations among mammalian lineages with mean ω values of functional sequences ranging from 0.15 in cat to 1.03 in gibbon (Fig. 6). As expected, elevated ω values were found in the non-functional sequences of aardvark (0.65) and sloth (1.04) as a consequence of relaxed selection constraints, whereas the armadillo has a lower value (0.44). Conversely, some lineages with functional AMBN showed high dN/dS values such as elephant (0.74), elephant shrew (0.73), and aye-aye (0.71). However, AMBN has also evolved under strong purifying selection for instance in rodents with low values of ω observed in this group overall (0.19-0.32).
The most striking pattern in this reconstruction is nonetheless the high dN/dS ratios inferred along the whole catarrhine subtree, suggesting the occurrence of positive selection on AMBN in this particular group of Primates (Fig. 6). Hierarchical likelihood ratio tests performed between different branch models allowing ω to vary among branches of the phylogeny revealed that strong positive selection has occurred in Catarrhini (Table 1). Indeed, branch model LRTs for positive selection in the catarrhine subtree were strongly significant (p < 0.001) with both the 53-taxa and 56-taxa data sets, in which the dN/dS ratio in Catarrhini was estimated to be respectively 1.32 and 1.11. Finally, among the candidate branches revealed by the previous Bayesian analysis, two were estimated to have ω values higher than 1: in sloth (ω = 1.10), in which AMBN is pseudogenized, and in aye-aye (ω = 1.13), in which it nevertheless appears fully functional.
A second translation initiation site (TIS) was recruited in an ancestral eutherian
Two TIS were clearly identified in our analysis as being conserved through mammalian evolution: one in exon 1, the other in exon 2. The latter is the ancestral TIS in tetrapod AMBN; it is the only TIS present in reptiles  and frogs . This TIS is homologous to that of other SCPPs . Interestingly, our analysis indicated also that the former TIS was recruited in the last common eutherian ancestor, then conserved in most lineages during more than 100 Ma of evolution . This finding suggests that this additional TIS plays a role in enamel formation in most eutherian species, and contradicts previous studies proposing that the only functional TIS was that in exon 2 [9, 32]. The presence of two (predicted) functional TIS means, however, that they do function through alternative splicing of exon 1 and that this process could play a role in the regulation of the amount of protein synthesized from the mRNA [66, 67]. Such alternative splicing may result in the presence of two AMBN isoforms: one with a short, 16 amino acid long, signal peptide (SP) and the other with a long SP (26 residues), the function of which remains to be tested. The presence of two signal peptides (SP) is quite common in proteins, in which two SP could play a role in the export efficiency of the protein . During amelogenesis, the either use of these SP could regulate AMBN secretion in the forming enamel matrix. For further discussion on this topic we refer the readers to our previous study on ENAM, in which two functional TIS were similarly identified in two exons (exons 3 and 4), resulting in a short and a long SP . The second, short SP of AMBN is homologous to the second SP described in ENAM , and sequence similarity in the N-terminal region of these two EMPs allowed to propose that AMBN was derived from an ancestral ENAM after duplication . In fact, AMBN exon 2 and ENAM exon 4 are homologous exons, and both possess 15 nucleotides upstream the TIS. In contrast, AMBN exon 1 and ENAM exon 3, in which is located the first TIS, are not homologous. They were recruited separately during evolution. In ENAM the first TIS was recruited in a mammalian ancestor more than 200 Ma , while in AMBN we showed here that it was recruited later, in an eutherian ancestor, more than 100 Ma. In mammals, the convergent recruitment of an additional TIS upstream the ancestral one in two related enamel matrix proteins is, however, intriguing. Indeed, one can wonder whether the second event i.e., the recruitment of a large SP in AMBN, was linked to the presence of two SP in ENAM, and what was the benefit of this feature for enamel microstructure. To our knowledge there are no data in the literature showing to what extend the relative amount of the two EMPs, AMBN and ENAM, is important for enamel microstructure, nor whether the two proteins interact. Here, we speculate that the TIS responsible for encoding a larger SP has appeared at random in AMBN exon 1, then was conserved as providing a better fit with the different regulation of ENAM secretion through the presence of its two SP.
A hotspot for DNA duplication has appeared in placental AMBN
As previously reported in the human AMBN, we identified additional exons of same length and similar sequence in the same region of 12 unrelated eutherian AMBNs. We concluded that these additional exons resulted from tandem duplications of the region containing exon 7 and, therefore, are all homologous. Because the duplication of this short DNA region occurred repeatedly and independently in various lineages, we consider that the AMBN region located downstream exon 7 is a hotspot for the occurrence of such duplication. The process of tandem duplication is known as a major mechanism for gene elongation, but is it advantageous for AMBN function or simply neutral? Repeated duplication of exon 7 was found in terminal taxa, except in simiiform primates, in which the first duplication of this region could have occurred in their common ancestor. The duplicated region was conserved (and some copies were added) in all simiiform gDNA studied excepted in the gorilla genome, in which either the duplicated region was secondarily lost or this AMBN region was badly assembled. Given the presence of these duplicated exons in various taxa and their conservation in simiiform lineages one can wonder whether the region they encode could not have some selective advantage. In addition to enlarge the protein with 13 residues, each exon encodes two to four prolines. Because these duplications do not change much the high percentage of proline in this region, this is the increase in length of this proline-rich region that could be advantageous. Indeed, it has been suggested that the length of the proline-rich region of the EMPs could determine the supramolecular enamel matrix assembly .
Alternative polyadenylation signals (PS) were conserved through mammalian evolution
The 3'UTR of a mRNA is important and anomalies in this region of the mRNAs are implicated in a variety of human diseases . Here, we identified two to five PS in the 3' UTR of the sequences analyzed. Three of them were present in the ancestral mammalian AMBN and conserved in all sequences during mammalian evolution. These variant PS could also be selected for regulatory purposes in the concerned sequences. For instance, it is known that mRNAs with multiple PS tend to use noncanonical signals (i.e., ATTAAA) more often than mRNAs with a single PS . The long lasting conservation of the three PS through 200 Ma means that AMBN codes for three mRNAs that differ in their 3' end and play an important role in post-transcriptional regulation. This finding partially confirms previous reports of two AMBN transcripts in rodents and humans: a short sequence ending after PS1 and a large sequence ending after PS5 [9, 11, 13]. In rodents, one of these polyadenylation sites shortens the sequence at the 3' end, which could explain the presence of two transcripts. Here we show that a third transcript sequence ending at PS3 could also exist. The presence of multiple PS is significantly observed in mammalian mRNAs, and alternative polyadenylation may also influence the stability, translation efficiency, or localization of an mRNA [98, 99]. Indeed, mRNAs vary greatly in stability depending on the length of the 3'UTR, which changes a.o., the number of potential binding sites for microRNAs. However, to our knowledge AMBN mRNAs were not found among targets of several miRNAs .
Selection analyses highlighted residues and domains important for AMBN function
Our evolutionary analysis revealed that most cleavage sites and most hydroxylated, phosphorylated and O-glycosylated residues either identified or predicted in previous studies are included into the 161 conservative positions that were found here subjected to purifying selection (i.e., functionally constrained) during 200 Ma of mammalian evolution. They are for instance, the hydroxylated P11, phosphorylated S17 (SxE motif) and S86, and the cleavage site N31. These findings, therefore, confirm that these amino acids are important for the AMBN function and, vice versa, given that these positions were previously identified as probably important they support evolutionary analysis as an efficient method for detecting important residues. Similar studies targeting AMEL or ENAM led to the same findings [53, 54]. Interestingly, most of these remarkable residues are adjacent to unchanged amino acids, which indicate that the latter certainly play a role in stabilizing the environment of the crucial residue. Also, our analysis raised some doubts about the potential function of a few positions that were previously predicted as being important in studies of rat and pig AMBN, i.e., some cleavage sites [5, 6, 101, 102], and other hydroxylated and O-glycosylated residues [29, 103–105]. However, conservation of specific sequences could not be necessary for the cleavages to occur.
The proline-rich and calcium-binding domains do not display a large number of unchanged positions but each possesses some conservative positions that probably play a role in the function of the domain. It is known that such domains do not need high sequence conservation but rather a general context appropriate for their function. Proline-rich sequences are known to bind WW and SH3 domains, two small protein modules that mediate protein-protein interactions and are key parts of signaling cascades [106–108]. The calcium-binding site (i.e., an helix-loop-helix structural domain known as EF-hand motif) is supposed either to be involved in mineral nucleation or in calcium-mediated cell binding [23, 29, 30]. Some motifs reported in rodent AMBN, such as VTKG [7, 49, 109] and DGEA  are not conserved in mammalian evolution. Given that these motifs are known to interact with cell surface proteins (integrins) they are supposed to serve to link the ameloblasts to the forming enamel matrix. However, their absence in most mammalian lineages suggests that these motifs were only acquired in the rodent lineage ancestor and could have a functional role solely in rodents.
Remarkably, conserved positions and the adjacent conserved residues are however few compared to the 150 positions that were highlighted in our evolutionary analysis, the function of which remain to be characterized. Two important motifs were identified by their high number of conserved residues in the region encoded by exons 3 and 4, on the one hand, and by the 3' end of exon 5 and the 5' extremity of exon 6, on the other hand. Both motifs contain important residues; the first one houses hydroxylated P11, phosphorylated S17 (SxE motif) and cleavage site N31. It remains to be determined whether the other conserved positions are only important in order to keep these residues functional or have other functions. The second motif contains a remarkable residue, a O-glycosylated S86. Here also one can wonder what could be the function of the other conserved positions. In addition, previous studies have revealed that the 15 amino acid sequence encoded by the 5' extremity of exon 6 (including the O-glycosylated S86) is subjected to alternative splicing in rodents, human and pig AMBN through the presence of an intraexonic splicing site [6, 11, 12]. Here we showed that the residue at the splicing site (Q87 = CAG) is conserved in mammalian evolution, which strongly suggests that the two isoforms resulting from the splicing play distinct but crucial roles for the correct functioning of AMBN. In particular, it is worth noting that (i) the short isoform does not contain this O-glycosylated position, suggesting avoiding interaction with the cell membrane is important to be able to fulfill its function, and (ii) deletion of exon 6 is associated with amelogenesis imperfecta  with hypoplastic enamel as previously reported in mice lacking AMBN exons 5 and 6 (see below) .
Identification of disease-associated mutations potentially leading to AI
A total of 161 positions were identified in mammalian AMBN as conservative positions, i.e., residues that were unchanged or only substituted with a residue having the same properties during more than 200 Ma of mammalian evolution. This finding leads to the prediction that the amino acids located at these positions play an important role. They cannot be changed otherwise the AMBN function linked to this position will be disturbed and will result in a genetic disease with apparent enamel disorder. In fact, the process of natural selection provides us with tests in nature, and our previous evolutionary analyses performed on various SCPPs, including AMELX, ENAM, MEPE, AMTN, and DMP1, have shown that the patterns of long-term evolutionary conservation are crucial for validating human genetic diseases related to residue substitutions [53–57]. Such patterns were recently used for in silico functional diagnoses of non-synonymous single nucleotide variants found in thousands of disease-associated genes (see review in ). Given these various findings, we predict that AMBN is an excellent candidate for enamel genetic disease, amelogenesis imperfecta (AI). This prediction confirmed the conclusions previously reached by many authors because (i) AMBN was located on chromosome 4 in a region containing the locus for AIH2, the autosomal hypoplastic form of AI [12, 47], and (ii) the important function of AMBN during amelogenesis, as illustrated by AMBN −/− mice that exhibit severe enamel hypoplasia .
During years, mutations of AMBN were never found associated with AI phenotypes until recently when hypoplastic AI was associated with homozygous exon 6 deletion . The reason why other AMBN-associated AI were not identified could be due to an autosomal recessive pattern of inheritance, and this is supported by the fact that AMBN +/− mice do not display any dental phenotype . Another reason could be that AMELX compensate, at least partially, for AMBN deficiency. The two encoding genes are phylogenetically related, they share some similarities, notably in the 5' region, and both proteins possess a proline-rich region [4, 112]. Therefore, in case of homozygous mutation of AMBN compensation by AMELX could contribute to weaken the dental phenotype, which could be hardly detectable.
Deletion of exon 6 leads to the lack of 14 highly conserved and 11 conservative residues, among which the O-glycosylated S86, and a large part of the proline-rich region of the protein. Given this number of sensitive positions predicted by our evolutionary analysis, the mutation is validated as being responsible for the AI and this region as playing an important role during enamel matrix formation.
Lineage-specific positive selection and relaxation of functional constraints
Analyses of lineage-specific selective constraints acting on functional AMBN sequences have revealed a large variation in dN/dS ratio across mammals, even though this gene globally evolved under purifying selection. One of our most prominent results is the strong signal for positive selection detected in catarrhine primates, in which the dN/dS ratio was estimated to be greater than 1. This suggests that episodes of positive selection have occurred throughout the evolutionary history of this sub-clade. Interestingly, this elevated dN/dS ratio correlates with the presence of additional AMBN exons in several of these primate species suggesting that these exon duplications might be adaptive. Apart from Catarrhines, we also found evidence for positive selection in AMBN along the aye-aye (Daubentonia madagascariensis) lineage.
In Primates a correlation between enamel thickness and diet has been reported  and enamel thickness is often used to infer the diet of both extant and fossil primates [114, 115]. Adaptive changes related to diet shifts in Primates have been reported in another enamel protein (ENAM) and it has been hypothesized that differences in tooth enamel thickness were correlated with the adaptive evolution of enamelin . Moreover, the aye-aye differs from other lemurs in possessing rodent-like gnawing and ever-growing incisors, and molars with a particularly thick enamel layer . Our results therefore suggest that adaptive changes might have occurred in AMBN of catarrhines and aye-aye in response to selective constraints imposed by dietary adaptation through changes in enamel thickness.
Our analyses also detected traces of relaxed selection in enamel-less species: aardvark, armadillo, and sloth. In adult aardvarks (Afrotheria), armadillos and sloths (Xenarthra), teeth indeed lack enamel. In the nine-banded armadillo (Dasypus novemcinctus), however, at birth the teeth are covered with a thin layer of enamel that is no longer present in adults [61, 117]. In birds and in various tooth- or enamel-less mammals, selective pressures on enamel-specific proteins were relaxed after the ability to form enamel was lost, and the genes were inactivated, becoming pseudogenes [39–41, 118, 119]. As expected from these previous studies, we showed that AMBN has accumulated random deleterious mutations resulting in stop codon and frameshifts in sloth and aardvark, which confirms that AMBN is an enamel-specific protein. In the nine-banded armadillo, the AMBN sequence has not drastically changed, a condition that could be expected when considering the presence of a thin enamel cover in the young, which strongly suggests that AMBN could be expressed during enamel formation prior to birth. However, we identified several substitutions at AMBN positions that were considered important for the function of the protein. Such mutations could lead to the expression of a defective protein resulting in the deposition of a thin (hypoplastic?) enamel layer at the tooth surface, which subsequently disappears after birth through abrasion. Further structural and molecular studies of enamel layer formation in this armadillo species are however needed in order to confirm or infirm this hypothesis.
By adding 50 new sequences to the six previously published sequences, our study improved considerably our knowledge on the gene structure, protein composition and evolution of AMBN in mammals. In particular, we revealed that (i) an additional TIS was recruited in an ancestral eutherian AMBN, allowing the translation of a second, alternative, large SP, (ii) a short DNA sequence including exon 7 was duplicated several times in various eutherian species, increasing the proline-rich region, (iii) several PS were functional, suggesting a regulation process in the 3'UTR, (iv) numerous residues were conserved during more than 200 Ma of mammalian evolution, which strongly suggests that they are structurally and/or functionally important for the correct function of the protein, (v) several conserved residues constituted new domains of predicted high importance, and (vi) AMBN was invalidated in enamel-less species as previously reported [40, 41]. The putative function of some residues identified in previous studies was not confirmed by our analyses, and cannot be generalized to all mammals. Finally, the presence of highly conserved residues indicates that AMBN is a good candidate gene for amelogenesis imperfecta.
Kawasaki K, Weiss KM. Mineralized tissue and vertebrate evolution: the secretory calcium-binding phosphoprotein gene cluster. Proc Natl Acad Sci U S A. 2003;100:4060–5.
Kawasaki K. The SCPP gene repertoire in bony vertebrates and graded differences in mineralized tissues. Dev Genes Evol. 2009;219:147–57.
Sire J-Y, Delgado S, Girondot M. The amelogenin story: origin and evolution. Eur J Oral Sci. 2006;114 Suppl 1:64–77. discussion 93–95, 379–380.
Sire J-Y, Davit-Béal T, Delgado S, Gu X. The origin and evolution of enamel mineralization genes. Cells Tissues Organs. 2007;186:25–48.
Uchida T, Fukae M, Tanabe T, Yamakoshi Y, Satoda T, Murakami C, et al. Immunochemical and immunocytochemical study of a 15 kDa non-amelogenin and related proteins in the porcine immature enamel: proposal of a new group of enamel proteins “sheath proteins”. Biomed Res. 1995;16:131–40.
Hu CC, Fukae M, Uchida T, Qian Q, Zhang CH, Ryu OH, et al. Sheathlin: cloning, cDNA/polypeptide sequences, and immunolocalization of porcine enamel sheath proteins. J Dent Res. 1997;76:648–57.
Černý R, Slaby I, Hammarström L, Wurtz T. A novel gene expressed in rat ameloblasts codes for proteins with cell binding domains. J Bone Miner Res. 1996;11:883–91.
Termine JD, Belcourt AB, Christner PJ, Conn KM, Nylen MU. Properties of dissociatively extracted fetal tooth matrix proteins. I. Principal molecular species in developing bovine enamel. J Biol Chem. 1980;255:9760–8.
Krebsbach PH, Lee SK, Matsuki Y, Kozak CA, Yamada KM, Yamada Y. Full-length sequence, localization, and chromosomal mapping of ameloblastin a novel tooth-specific gene. J Biol Chem. 1996;271:4431–5.
Fong CD, Cerný R, Hammarström L, Slaby I. Sequential expression of an amelin gene in mesenchymal and epithelial cells during odontogenesis in rats. Eur J Oral Sci. 1998;106 Suppl 1:324–30.
Simmons D, Gu TT, Krebsbach PH, Yamada Y, MacDougall M. Identification and characterization of a cDNA for mouse ameloblastin. Connect Tissue Res. 1998;39:3–12. discussion 63–67.
MacDougall M, Simmons D, Gu TT, Forsman-Semb K, Kärrman Mårdh C, Mesbah M, et al. Cloning, characterization and immunolocalization of human ameloblastin. Eur J Oral Sci. 2000;108:303–10.
Toyosawa S, Fujiwara T, Ooshima T, Shintani S, Sato A, Ogawa Y, et al. Cloning and characterization of the human ameloblastin gene. Gene. 2000;256:1–11.
Shintani S, Kobata M, Toyosawa S, Fujiwara T, Sato A, Ooshima T. Identification and characterization of ameloblastin gene in a reptile. Gene. 2002;283:245–54.
Shintani S, Kobata M, Toyosawa S, Ooshima T. Identification and characterization of ameloblastin gene in an amphibian, Xenopus laevis. Gene. 2003;318:125–36.
Kawasaki K. The SCPP gene family and the complexity of hard tissues in vertebrates. Cells Tissues Organs. 2011;194:108–12.
Kawasaki K, Amemiya CT. SCPP genes in the coelacanth: tissue mineralization genes shared by sarcopterygians. J Exp Zoolog B Mol Dev Evol. 2014;322:390–402.
Hedges SB. Vertebrates (Vertebrata). In: Hedges SB, Kumar S, editors. The TimeTree of Life. New York: Oxford University Press; 2009. p. 309–14.
Nanci A, Zalzal S, Lavoie P, Kunikata M, Chen W-Y, Krebsbach PH, et al. Comparative immunochemical analyses of the developmental expression and distribution of ameloblastin and amelogenin in rat incisors. J Histochem Cytochem. 1998;46:911–34.
Iwata T, Yamakoshi Y, Hu JC-C, Ishikawa I, Bartlett JD, Krebsbach PH, et al. Processing of ameloblastin by MMP-20. J Dent Res. 2007;86:153–7.
Fukumoto S, Kiba T, Hall B, Iehara N, Nakamura T, Longenecker G, et al. Ameloblastin is a cell adhesion molecule required for maintaining the differentiation state of ameloblasts. J Cell Biol. 2004;167:973–83.
Sonoda A, Iwamoto T, Nakamura T, Fukumoto E, Yoshizaki K, Yamada A, et al. Critical role of heparin binding domains of ameloblastin for dental epithelium cell adhesion and ameloblastoma proliferation. J Biol Chem. 2009;284:27176–84.
Zhang Y, Zhang X, Lu X, Atsawasuwan P, Luan X. Ameloblastin regulates cell attachment and proliferation through RhoA and p27. Eur J Oral Sci. 2011;119:280–5.
Zeichner-David M, Chen L-S, Hsu Z, Reyna J, Caton J, Bringas P. Amelogenin and ameloblastin show growth-factor like activity in periodontal ligament cells. Eur J Oral Sci. 2006;114:244–53.
Bartlett JD, Ganss B, Goldberg M, Moradian-Oldak J, Paine ML, Snead ML, et al. Protein–protein interactions of the developing enamel matrix. In: Schatten GP editor. Current Topics in Developmental Biology. Volume 74. Academic Press, Waltham; 2006:57–115.
Nakamura Y, Slaby I, Spahr A, Pezeshki G, Matsumoto K, Lyngstadaas SP. Ameloblastin fusion protein enhances pulpal healing and dentin formation in porcine teeth. Calcif Tissue Int. 2006;78:278–84.
Iizuka S, Kudo Y, Yoshida M, Tsunematsu T, Yoshiko Y, Uchida T, et al. Ameloblastin regulates osteogenic differentiation by inhibiting Src kinase via cross talk between Integrin β1 and CD63. Mol Cell Biol. 2011;31:783–92.
Spahr A, Lyngstadaas SP, Slaby I, Haller B, Boeckh C, Tsoulfidou F, et al. Expression of amelin and trauma-induced dentin formation. Clin Oral Investig. 2002;6:51–7.
Yamakoshi Y, Tanabe T, Oida S, Hu C-C, Simmer JP, Fukae M. Calcium binding of enamel proteins and their derivatives with emphasis on the calcium-binding domain of porcine sheathlin. Arch Oral Biol. 2001;46:1005–14.
Vymětal J, Slabý I, Spahr A, Vondrášek J, Lyngstadaas SP. Bioinformatic analysis and molecular modelling of human ameloblastin suggest a two-domain intrinsically unstructured calcium-binding protein. Eur J Oral Sci. 2008;116:124–34.
Zhang X, Diekwisch TGH, Luan X. Structure and function of ameloblastin as an extracellular matrix protein: adhesion, calcium binding, and CD63 interaction in human and mouse. Eur J Oral Sci. 2011;119:270–9.
Dhamija S, Liu Y, Yamada Y, Snead ML, Krebsbach PH. Cloning and characterization of the murine ameloblastin promoter. J Biol Chem. 1999;274:20738–43.
Tamburstuen MV, Snead ML, Reseland JE, Paine ML, Lyngstadaas SP. Ameloblastin upstream region contains structural elements regulating transcriptional activity in a stromal cell line derived from bone marrow. Eur J Oral Sci. 2011;119:286–92.
Dhamija S, Krebsbach PH. Role of Cbfa1 in ameloblastin gene transcription. J Biol Chem. 2001;276:35159–64.
Spahr A, Lyngstadaas SP, Slaby I, Pezeshki G. Ameloblastin expression during craniofacial bone formation in rats. Eur J Oral Sci. 2006;114:504–11.
Tamburstuen MV, Reseland JE, Spahr A, Brookes SJ, Kvalheim G, Slaby I, et al. Ameloblastin expression and putative autoregulation in mesenchymal cells suggest a role in early bone formation and repair. Bone. 2011;48:406–13.
Tamburstuen MV, Reppe S, Spahr A, Sabetrasekh R, Kvalheim G, Slaby I, et al. Ameloblastin promotes bone growth by enhancing proliferation of progenitor cells and by stimulating immunoregulators. Eur J Oral Sci. 2010;118:451–9.
Kuroda S, Wazen R, Sellin K, Tanaka E, Moffatt P, Nanci A. Ameloblastin is not implicated in bone remodelling and repair. Eur Cell Mater. 2011;22:56–66. discussion 66–67.
Sire J-Y, Delgado SC, Girondot M. Hen’s teeth with enamel cap: from dream to impossibility. BMC Evol Biol. 2008;8:246.
Deméré TA, McGowen MR, Berta A, Gatesy J. Morphological and molecular evidence for a stepwise evolutionary transition from teeth to baleen in mysticete whales. Syst Biol. 2008;57:15–37.
Meredith RW, Gatesy J, Springer MS. Molecular decay of enamel matrix protein genes in turtles and other edentulous amniotes. BMC Evol Biol. 2013;13:20.
Kärrman Mårdh C, Bäckman B, Simmons D, Golovleva I, Gu TT, Holmgren G, et al. Human ameloblastin gene: genomic organization and mutation analysis in amelogenesis imperfecta patients. Eur J Oral Sci. 2001;109:8–13.
Hart PS, Wright JT, Savage M, Kang G, Bensen JT, Gorry MC, et al. Exclusion of candidate genes in two families with autosomal dominant hypocalcified amelogenesis imperfecta. Eur J Oral Sci. 2003;111:326–31.
Santos MC, Hart PS, Ramaswami M, Kanno CM, Hart TC, Line SR. Exclusion of known gene for enamel development in two Brazilian families with amelogenesis imperfecta. Head Face Med. 2007;3:8.
Chan H-C, Estrella NMRP, Milkovich RN, Kim J-W, Simmer JP, Hu JC-C. Target gene analyses of 39 amelogenesis imperfecta kindreds. Eur J Oral Sci. 2011;119:311–23.
MacDougall M. Dental structural diseases mapping to human chromosome 4q21. Connect Tissue Res. 2003;44 Suppl 1:285–91.
MacDougall M, DuPont BR, Simmons D, Reus B, Krebsbach P, Kärrman C, et al. Ameloblastin gene (AMBN) maps within the critical region for autosomal dominant amelogenesis imperfecta at chromosome 4q21. Genomics. 1997;41:115–8.
Poulter JA, Murillo G, Brookes SJ, Smith CEL, Parry DA, Silva S, et al. Deletion of ameloblastin exon 6 is associated with amelogenesis imperfecta. Hum Mol Genet. 2014;23:5317–24.
Fukumoto S, Yamada A, Nonaka K, Yamada Y. Essential roles of ameloblastin in maintaining ameloblast differentiation and enamel formation. Cells Tissues Organs. 2005;181:189–95.
Perdigão PF, Gomez RS, Pimenta FJGS, De Marco L. Ameloblastin gene (AMBN) mutations associated with epithelial odontogenic tumors. Oral Oncol. 2004;40:841–6.
Perdigão PF, Carvalho VM, Marco LD, Gomez RS. Mutation of ameloblastin gene in calcifying epithelial odontogenic tumor. Anticancer Res. 2009;29:3065–7.
Hirayama K, Miyasho T, Ohmachi T, Watanabe T, Yokota H, Taniyama H. Biochemical and immunohistochemical characterization of the amyloid in canine amyloid-producing odontogenic tumor. Vet Pathol Online. 2010;47:915–22.
Delgado S, Girondot M, Sire J-Y. Molecular evolution of amelogenin in mammals. J Mol Evol. 2005;60:12–30.
Al-Hashimi N, Sire J-Y, Delgado S. Evolutionary analysis of mammalian enamelin, the largest enamel protein, supports a crucial role for the 32-kDa peptide and reveals selective adaptation in rodents and primates. J Mol Evol. 2009;69:635–56.
Bardet C, Delgado S, Sire J-Y. MEPE evolution in mammals reveals regions and residues of prime functional importance. Cell Mol Life Sci. 2010;67:305–20.
Gasse B, Silvent J, Sire J-Y. Evolutionary analysis suggests that AMTN is enamel-specific and a candidate for AI. J Dent Res. 2012;91:1085–9.
Silvent J, Sire J-Y, Delgado S. The dentin matrix acidic phosphoprotein 1 (DMP1) in the light of mammalian evolution. J Mol Evol. 2013;76:59–70.
Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet. 2011;27:377–86.
National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
Davit-Béal T, Tucker AS, Sire J-Y. Loss of teeth and enamel in tetrapods: fossil record, genetic data and morphological adaptations. J Anat. 2009;214:477–501.
Higgins DG, Thompson JD, Gibson TJ. Using CLUSTAL for multiple sequence alignments. In: Doolittle RF, editor. Methods in Enzymology. Volume 266. Academic Press, Waltham; 1996:383–402. [Computer Methods for Macromolecular Sequence Analysis]
Rambaut A. Se-Al: Sequence Alignement Editor. Oxford: Department of Zoology. University of Oxford; 1996.
Se-Al software. [http://tree.bio.ed.ac.uk/software/seal/]
DNA Functional Site Miner. [http://dnafsminer.bic.nus.edu.sg]
Kozak M. Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 1984;12:857–72.
Kozak M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell. 1986;44:283–92.
SignalP Server. [www.cbs.dtu.dk/services/SignalP]
Dyrløv Bendtsen J, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004;340:783–95.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–9.
Ranwez V, Harispe S, Delsuc F, Douzery EJP. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PLoS ONE. 2011;6:e22594.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17:540–52.
Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.
Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, et al. Impacts of the Cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science. 2011;334:521–4.
Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol. 2013;30:2134–44.
Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998;148:929–36.
Yang Z, Nielsen R, Goldman N, Pedersen A-MK. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000;155:431–49.
Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998;15:568–73.
Burk-Herrick A, Scally M, Amrine-Madsen H, Stanhope MJ, Springer MS. Natural selection and mammalian BRCA1 sequences: elucidating functionally important sites relevant to breast cancer susceptibility in humans. Mamm Genome. 2006;17:257–70.
Kirwan JD, Bekaert M, Commins JM, Davies KTJ, Rossiter SJ, Teeling EC. A phylomedicine approach to understanding the evolution of auditory sensory perception and disease in mammals. Evol Appl. 2013;6:412–22.
Lartillot N, Delsuc F. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution. 2012;66:1773–87.
Lartillot N, Poujol R. A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol Biol Evol. 2011;28:729–44.
Benton MJ, Donoghue PCJ, Asher RJ. Calibrating and constraining molecular clocks. In: SB Hedges and S Kumar, editors. The Timetree of Life. Oxford University Press, Oxford; 2009.
Kumar S, Hedges SB. TimeTree2: species divergence times on the iPhone. Bioinformatics. 2011;27:2023–4.
Jones KE, Bielby J, Cardillo M, Fritz SA, O’Dell J, Orme CDL, et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology. 2009;90:2648–8.
Yang Z, Nielsen R. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol. 1998;46:409–18.
Pupko T, Pe’er I, Hasegawa M, Graur D, Friedman N. A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites: Application to the evolution of five gene families. Bioinformatics. 2002;18:1116–23.
Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, et al. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res. 2012;40:W580–4.
Hassanin A, Delsuc F, Ropiquet A, Hammer C, van Vuuren BJ, Matthee C, et al. Pattern and timing of diversification of Cetartiodactyla (Mammalia, Laurasiatheria), as revealed by a comprehensive analysis of mitochondrial genomes. C R Biol. 2012;335:32–50.
Kawasaki K, Weiss KM. SCPP gene evolution and the dental mineralization continuum. J Dent Res. 2008;87:520–31.
Davis MJ, Hanson KA, Clark F, Fink JL, Zhang F, Kasukawa T, et al. Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units. PLoS Genet. 2006;2:e46.
Al-Hashimi N, Lafont A-G, Delgado S, Kawasaki K, Sire J-Y. The enamelin genes in lizard, crocodile, and frog and the pseudogene in the chicken provide new insights on enamelin evolution in tetrapods. Mol Biol Evol. 2010;27:2078–94.
Jin T, Ito Y, Luan X, Dangaria S, Walker C, Allen M, et al. Elongated polyproline motifs facilitate enamel evolution through matrix subunit compaction. PLoS Biol. 2009;7:e1000262.
Conne B, Stutz A, Vassalli JD. The 3′ untranslated region of messenger RNA: a molecular “hotspot” for pathology? Nat Med. 2000;6:637–41.
Beaudoing E, Freier S, Wyatt JR, Claverie J-M, Gautheret D. Patterns of variant polyadenylation signal usage in human genes. Genome Res. 2000;10:1001–10.
Edwalds-Gilbert G, Veraldi KL, Milcarek C. Alternative poly(A) site selection in complex transcription units: means to an end? Nucleic Acids Res. 1997;25:2547–61.
Gautheret D, Poirot O, Lopez F, Audic S, Claverie J-M. Alternate polyadenylation in human mRNAs: a large-scale analysis by EST clustering. Genome Res. 1998;8:524–30.
Sehic A, Risnes S, Khuu C, Khan Q-E-S, Osmundsen H. Effects of in vivo transfection with anti-miR-214 on gene expression in murine molar tooth germ. Physiol Genomics. 2011;43:488–98.
Fukae M, Kanazashi M, Nagano T, Tanabe T, Oida S, Gomi K. Porcine sheath proteins show periodontal ligament regeneration activity. Eur J Oral Sci. 2006;114:212–8.
Yamakoshi Y, Hu JC-C, Zhang H, Iwata T, Yamakoshi F, Simmer JP. Proteomic analysis of enamel matrix using a two-dimensional protein fractionation system. Eur J Oral Sci. 2006;114:266–71.
Uchida T, Murakami C, Wakida K, Satoda T, Dohi N, Takahashi O. Synthesis, secretion, degradation, and fate of ameloblastin during the matrix formation stage of the rat incisor as shown by immunocytochemistry and immunochemistry using region-specific antibodies. J Histochem Cytochem. 1997;45:1329–40.
Hu JC-C, Yamakoshi Y, Yamakoshi F, Krebsbach PH, Simmer JP. Proteomics and genetics of dental enamel. Cells Tissues Organs. 2005;181:219–31.
Kobayashi K, Yamakoshi Y, Hu JC-C, Gomi K, Arai T, Fukae M, et al. Splicing determines the glycosylation state of ameloblastin. J Dent Res. 2007;86:962–7.
Bohley P. Surface hydrophobicity and intracellular degradation of proteins. Biol Chem. 1996;377:425–35.
Mayer BJ. SH3 domains: complexity in moderation. J Cell Sci. 2001;114:1253–63.
Ingham RJ, Colwill K, Howard C, Dettwiler S, Lim CSH, Yu J, et al. WW domains provide a platform for the assembly of multiprotein networks. Mol Cell Biol. 2005;25:7092–106.
Fukumoto S, Yamada A, Iwamoto T, Nakamura T. Dental epithelium proliferation and differentiation regulated by ameloblastin. In: Sasano T et al, editors. Interface Oral Health Science. Springer Japan, Tokyo; 2010:33–8.
Yoo SY, Kobayashi M, Lee PP, Lee S-W. Early osteogenic differentiation of mouse preosteoblasts induced by collagen-derived DGEA-peptide on nanofibrous phage tissue matrices. Biomacromolecules. 2011;12:987–96.
Wazen RM, Moffatt P, Zalzal SF, Yamada Y, Nanci A. A mouse model expressing a truncated form of ameloblastin exhibits dental and junctional epithelium defects. Matrix Biol. 2009;28:292–303.
Sire J-Y, Delgado S, Fromentin D, Girondot M. Amelogenin: lessons from evolution. Arch Oral Biol. 2005;50:205–12 [Eighth International Conference on Tooth Morphogenesis and Differentiation Eighth International Conference on Tooth Morphogenesis and Differentiation International Association for Dental Research].
Shellis RP, Beynon AD, Reid DJ, Hiiemae KM. Variations in molar enamel thickness among primates. J Hum Evol. 1998;35:507–22.
Dumont ER. Enamel thickness and dietary adaptation among extant primates and chiropterans. J Mammal. 1995;76:1127–36.
Pampush JD, Duque AC, Burrows BR, Daegling DJ, Kenney WF, McGraw WS. Homoplasy and thick enamel in primates. J Hum Evol. 2013;64:216–24.
Kelley JL, Swanson WJ. Dietary change and adaptive evolution of enamelin in humans and among primates. Genetics. 2008;178:1595–603.
Meredith RW, Gatesy J, Murphy WJ, Ryder OA, Springer MS. Molecular decay of the tooth gene enamelin (ENAM) mirrors the loss of enamel in the fossil record of placental mammals. PLoS Genet. 2009;5:e1000634.
Meredith RW, Zhang G, Gilbert MTP, Jarvis ED, Springer MS. Evidence for a single loss of mineralized teeth in the common avian ancestor. Science. 2014;346:1254390.
Chun Y-HP, Yamakoshi Y, Yamakoshi F, Fukae M, Hu JC-C, Bartlett JD, et al. Cleavage site specificity of MMP-20 for secretory-stage ameloblastin. J Dent Res. 2010;89:785–90.
We are grateful to Géraldine Véron (Muséum national d’Histoire Naturelle, Paris) and François Catzeflis (Institut des Sciences de l’Evolution de Montpellier) who kindly provided ethanol-preserved tissues from some species. We thank sequencing centers for making available assembled genome sequences. The study was financially supported by grants from the Centre National de la Recherche Scientifique (CNRS), the Scientific Council of the Université de Montpellier, and the Université Pierre & Marie Curie, Paris (UMR 7138). Computational resources were provided by the Montpellier Bioinformatics Biodiversity platform of the Labex CeMEB. This is contribution ISEM 2015-148 of the Institut des Sciences de l’Evolution de Montpellier.
The authors declare that they have no competing interests.
FD performed the evolutionary analyses. BG carried out the molecular biology experiments. JYS performed in silico data recovery and sequence alignment. FD and JYS wrote the manuscript.
Names of the 56 mammalian species used in this study. Preferred common names, scientific names, families, orders and references in GenBank are listed in alphabetical order of common names. The AMBN sequence of the crocodilian Caiman crocodilus was used as outgroup. (PDF 95 kb)
Amino acid alignment of the 53 mammalian AMBN used in our evolutionary analysis. The sequences were aligned against the human sequence and are ordered following mammalian relationships. Our alignment led to 500 positions, including gaps. For convenience of presentation our alignment does not include the duplicated exons 9c, 9d and 9e (+9c; +9c-9e) that were found in three species only. The specific length of each AMBN in indicated in brackets at the end of the sequence. The signal peptide (underlined) can start at the methionine located either in the region encoded by exon 1 or exon 2 (M, squared). For a better understanding of SP evolution, the sequences in which the first methionine is lacking are highlighted on grey background. Exon limits are indicated by vertical lines; (.): residue identical to human AMBN residue; (−): indel; (?): unknown amino acid; *: stop codon. (PDF 138 kb)
Comparison of intron/exon boundaries and UTR of twelve mammalian AMBN sequences. For scientific names and references, see Additional file 1. (//): sequence not shown; (?): unknown nucleotide; (.): nucleotide identical to human AMBN nucleotide; (−): indel. (PDF 143 kb)
Prediction of functional signal peptides (SP) in the AMBN sequence using Signal P software. Here the human sequence was used as an example. The high probability for the two sequences (score = 0.998) indicates that the SP encoded either by exon 1 + exon 2 (26 residues) (A) or by exon 2 only (16 residues) (B) are functional. The cleavage site (arrow) does not vary in the two sequences analyzed and exhibits a similar, high probability: 0.971 in A, between positions 26 and 27; 0.965 in B, between positions 16 and 17. The three characteristic regions of SP (n-, h- and c-regions) are indicated. (TIFF 488 kb)
Amino acid composition of AMBN. The amino acid percentages are given for the full length sequence and the proline-rich region for six representative mammalian AMBN, the putative ancestral mammalian sequence, and the crocodile, and are compared to average frequency in most proteins. The percentages of proline and glutamine are in bold characters. (PDF 100 kb)