Adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals
© Machado et al; licensee BioMed Central Ltd. 2011
Received: 31 March 2011
Accepted: 21 November 2011
Published: 21 November 2011
Matrix extracellular phosphoglycoprotein (MEPE) belongs to a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) that play a key role in skeleton development, particularly in mineralization, phosphate regulation and osteogenesis. MEPE associated disorders cause various physiological effects, such as loss of bone mass, tumors and disruption of renal function (hypophosphatemia). The study of this developmental gene from an evolutionary perspective could provide valuable insights on the adaptive diversification of morphological phenotypes in vertebrates.
Here we studied the adaptive evolution of the MEPE gene in 26 Eutherian mammals and three birds. The comparative genomic analyses revealed a high degree of evolutionary conservation of some coding and non-coding regions of the MEPE gene across mammals indicating a possible regulatory or functional role likely related with mineralization and/or phosphate regulation. However, the majority of the coding region had a fast evolutionary rate, particularly within the largest exon (1467 bp). Rodentia and Scandentia had distinct substitution rates with an increased accumulation of both synonymous and non-synonymous mutations compared with other mammalian lineages. Characteristics of the gene (e.g. biochemical, evolutionary rate, and intronic conservation) differed greatly among lineages of the eight mammalian orders. We identified 20 sites with significant positive selection signatures (codon and protein level) outside the main regulatory motifs (dentonin and ASARM) suggestive of an adaptive role. Conversely, we find three sites under selection in the signal peptide and one in the ASARM motif that were supported by at least one selection model. The MEPE protein tends to accumulate amino acids promoting disorder and potential phosphorylation targets.
MEPE shows a high number of selection signatures, revealing the crucial role of positive selection in the evolution of this SIBLING member. The selection signatures were found mainly outside the functional motifs, reinforcing the idea that other regions outside the dentonin and the ASARM might be crucial for the function of the protein and future studies should be undertaken to understand its importance.
Dentin, one of the major mineralized tissues of teeth, is deposited by odontoblasts, which synthesize collagenous and non-collagenous proteins (NCPs) [1, 2]. Among the NCPs, there is a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) consisting of dentin matrix protein 1 (DMP1), dentin sialophosphoprotein (DSPP), integrin-binding sialoprotein (IBSP), matrix extracellular phosphoglycoprotein (MEPE, also known as OF45) and osteopontin (SPP1) . These genes share common genetic and structural features, including a small non-translational first exon, a start codon in the second exon and a large coding segment in the last exon (although exon number varies among the different genes) . The entire SIBLING protein family likely arose from the secretory calcium-binding phosphoprotein (SCPP) family by gene duplication, since this cluster of genes encodes proteins with similar molecular-structural features and functions .
Members of this gene family are encoded by a compact tandem gene cluster (located on chromosome 4 q in Human and 5 q in mouse) characterized by: (i) common exon-intron features, (ii) the presence of the integrin-binding tripeptide Arg-Gly-Asp (RGD) motif that mediates cell attachment/signaling via interaction with cell surface integrins , and (iii) post-translational modifications of conserved phosphorylation and N-glycosylation sites . In humans, the MEPE protein (525 amino acids) is encoded by four exons with a 1960 bp transcript with two N-glycosylation motifs (at residues 477-481), a glycosaminoglycan (SGDG) attachment site at residues 256-259, and the RGD cell attachment motif at residues 247-249 . The RGD motif has a similar function in other members of the SIBLING's (DSPP, DMP1, IBSP, and SPP1) . The protein MEPE has several predicted phosphorylation sites/motifs for protein kinase C, casein kinase II, tyrosine kinase, and cAMP-cGMP-dependent protein kinase and a large number of N-myristoylation sites that appear to be also a feature of the RGD-containing proteins . The acidic serine-aspartate-rich MEPE-associated motif (ASARM motif) occurs at the C-terminus in MEPE (residues 509 to 522)  and when phosphorylated this small peptide can bind to hydroxyapatite and inhibit mineralization .
The basic MEPE protein was first cloned from a tumor resected from a patient with tumor-induced osteomalacia (OHO) [7, 9], which is associated with hypophosphatemia and is caused by a renal phosphate wasting. The MEPE gene is also up-regulated in X-linked hypophosphatemic rickets (XLH or HYP-osteoblasts) and OHO-tumors [7, 10–14]. Under normal conditions it is expressed primarily in osteoblasts, osteocytes, and odontoblasts .
Targeted disruption of the MEPE gene in mouse causes increased bone formation and bone mass, suggesting that MEPE plays an inhibitory role in bone formation and mineralization . In humans, MEPE inhibits mineralization and is also involved in renal phosphate regulation [16, 17]. The inhibition of mineralization and phosphate uptake are related with the protease resistant small peptide ASARM motif located near the end of the protein [7, 17]. However, the MEPE protein has dual functions depending on the proteolytic processing. When the protein is cleaved by cathepsin B or D into several fragments, the small peptide ASARM is released  and when phosphorylated, this small peptide can bind the hydroxyapatite crystal and inhibit mineralization . By contrast, when fragments containing the RGD motif are released and the ASARM is not degraded by proteases, mineralization is accelerated . The influence of MEPE-ASARM peptides in the modulation of mineralization is due to a protein-protein interaction with PHEX, an X-linked phosphate-regulating endopeptidase homolog (also called the minhibin model) . PHEX is also expressed in osteoblasts, osteocytes and odontoblasts and the protein interacts with MEPE, protecting it from the proteolytic process (from cathepsin-B) and preventing ASARM from being released into blood circulation . Most of the disorders associated with MEPE result from a malfunction of this PHEX-MEPE interaction, which in turn leads to an increase of ASARM blood levels.
The majority of mammalian genes are strongly conserved in the coding sequence [20, 21]. Genes carrying signatures of selection may be involved in adaptation and functional innovation, and often have elevated ratios of nonsynonymous/synonymous nucleotide substitutions (dN/dS) in their coding regions . However, evolutionary rates of nuclear and mitochondrial genes are not equal in all the mammalian lineages . For example, while rodents tend to accumulate more mutations in nuclear genes than humans , the differences between the rates in the two lineages seems to be smaller than the generation time difference .
Since MEPE protein has an important role in the regulation of the skeleton mineralization process and since the mineralized tissue is a critical innovation in vertebrate evolution, the evolutionary study of this developmental gene could provide valuable insights on the adaptive diversification of morphological phenotypes in mammals. As the MEPE gene has been suggested to be under selection , our objective was to undergo a thorough analysis to evaluate signatures of positive selection using both a gene-level and protein-level approaches. We assessed the evolution of the MEPE protein-coding gene in 26 mammalian species, from Hyracoidea to Primates, showing that while four regions/motifs in the MEPE gene have a high degree of conservation, the majority of the coding region has a fast evolutionary rate, especially in rodents and tree shrews. Indeed, evidence of strong positive selection (gene and protein-level) was found in 20 amino acids that encompass MEPE protein, highlighting the role of molecular adaptation in the functionality of this gene.
Presence of the MEPE in vertebrates
Twenty-six mammalian MEPE sequences were retrieved from the GenBank and Ensembl databases, comprising eight different mammalian Orders (Additional file 1: Table S1). In addition, sequences of the putative MEPE orthologue, Ovocleidin-116, were obtained from the available bird genome projects (Gallus gallus, Taeniopygia guttata, Meleagris gallopavo) for comparative purposes. For the majority of the mammals considered in this work, the MEPE gene encompasses four exons that encode a transcript that varied from 1272 bp in Ochotona princeps to 2030 bp in Pan troglodytes. Some of the smallest reported transcripts may be incomplete, as in the case of O. princeps, which is missing a stop codon. The absence of the ASARM motif in the MEPE's C-terminal in some species (Equus caballus, Ochotona. princeps, Otolemur garnetti and Pteropus. vampyrus) also suggests that those genes were not fully annotated. Thus, we performed a detailed search in databases for those species using TBLASTN , which led to the identification of the ASARM in E. caballus, but not in O. princeps, O. garnetti and P. vampyrus (in these cases, the missing end portion of the protein corresponds to the end of the contig available in the database). However, several stop codons are present between the end of the present sequence and the putative ASARM motif in the E. caballus sequence and therefore it was not included in subsequent analyses.
The cell attachment region, RGD, situated near the center of the MEPE protein, is fully conserved in 20 of the 26 mammalian species (Figure 2). However, some changes are observed in Tursiops truncatus, Procavia capensis, the bats Pteropus vampyrus and Myotis lucifugus, and in the rodents Dipodomys ordii and Spermophilus tridecemlineatus (Figure 2) and it is likely that such amino acids changes in the RGD motif may have functional relevance. Moreover, the RGD motif is also present in other genes of this gene-cluster family.
The SDGD is completely conserved among all the mammals, reinforcing the premise that this peptide region is, along with RGD, important to the MEPE function. These two motifs constitute the dentonin region, which was not detected in any of the others members of the SIBLING protein family. The chicken and the turkey MEPE orthologues appear to be exceptions, since they do not have the cell-adhesion motif, RGD, but contain the glycosaminoglycan-binding motif, SGDG. In these species we found a HGD near the SGDG motif, suggesting that RGD is replaced by HGD (Additional file 4: Figure S2). A similar change from RGD has been described in other members of the DSPP orthologues (e.g. in rat, Rattus norvegicus, the RGD replaces the HGD) . Nevertheless, in zebra finch (T. guttata) we found the RGD motif but not the SGDG region (Additional file 4: Figure S2). The ASARM motif is highly conserved within the 21 mammals for which ASARM is annotated (average above 85%), although the Bottlenose dolphin (Tursiops truncatus) has a similarity of only 59.1%. Pairwise similarity among birds was 79.9% (among the three avian species), but on average only 27.3% similarity was observed between birds and the mammalian ASARM. Moreover, in birds this motif is capped at the C-terminal by 21 (G. gallus, M. gallopavo) to 24 (T. guttata) amino acids, and this region shows 77.2% similarity between G. gallus and M. gallopavo but less than 40% between these two species and T. guttata, showing that this region in birds is probably less constrained than the ASARM.
Rodentia and Scandentia selection signatures
Results from the RRTree test comparing substitution rates in Rodentia, Scandentia and the other mammals.
Rodentia and Scandentia (n = 6)
Other Mammals (n = 20)
Selection signatures at the codon level
PAML results of MEPE for the 20 mammalian species (excluding ambiguity data).
ω = 0.46086
p0 = 0.63812 p1 = 0.36188
ω0 = 0.27631 ω1 = 1.00000 ω2 = 1.00000
p0 = 0.63812 p1 = 0.27244 p2 = 0.08944
M2 vs M1
p = 1.06299 q = 1.09727
p0 = 0.86974 p = 1.58369 q = 2.27462
(p1 = 0.13026) ω1 = 1.29913
M8 vs M7
The evaluation of positive selection using the model implemented in Single Likelihood Ancestor Counting (SLAC) showed three sites under selection, one of those sites being similar to that retrieved with model M8 in PAML. Since SLAC tends to be quite conservative, we also estimated the selection signatures using the Fixed Effects Likelihood (FEL) model, which is assumed to be more powerful than SLAC [42, 43]. The FEL model revealed a total of 23 sites under selection using this model, including a mutation in the highly conserved small peptide ASARM (from aspartate to glicine) (Figure 2). Such a radical change in ASARM was only observed in a few species and further studies are needed to better document its frequency across mammals. All the sites presented in the model implemented in Datamonkey have a significance value (p < 0.10) in FEL and SLAC, which is an accepted level of significance for the test of those models . When we use a significance threshold of 0.05, the number of positive selected amino acids decreased to 14 in FEL and zero in SLAC, meaning that 9 sites (out of the 23 detected with a significance level of 0.10) had less evidence of being under strong positive selection. However, these positions may still be indicative of selection signatures. Recombination can affect several analyses, including phylogenetic reconstruction and analysis of positive selection . Therefore, we assessed gene recombination using GARD implemented in the Datamonkey web-based server  and repeated the selection analysis including and excluding recombination in the dataset. Partitioning the data did not change the conclusions of the positive selection analyses (data not show), suggesting that recombination is not significantly affecting the MEPE gene evolution.
No additional sites were found using the SLR , but six sites under selection in the previous analysis were also statistically supported. Overall, across MEPE, 32 of the 525 sites (referenced to the length of the human MEPE) were under positive selection; additionally six sites were supported by more than one codon analysis (9 in PAML, 23 in FEL, 3 in SLAC, and 6 in SLR) (Additional file 6: Table S4).
Selection at the amino acid level
Selection models that use dN/dS ratios to detect selection are generally not sensitive enough to detect subtle molecular adaptations . It is therefore necessary to employ alternative criteria within generally conserved protein-coding genes or within proteins with strict motifs intermixed with regions under fast directional evolution. Therefore, we used TreeSAAP , which evaluates destabilizing radical changes at each site, and an empirical threshold of change in three properties was applied as evidence that a site is under positive (or negative) selection.
MEPE properties under positive selection determined in TreeSAAP.
Equilibrium constant (ionization of COOH)
Power to be at the C-terminal
Power to be at the middle of alpha-helix
Power to be at the N-terminal
Solvent accessible reduction ratio
At the amino acid site level, MEPE has 181 sites (33.8%) under positive selection in at least one property. Although applying the empirical threshold of at least three properties showing signatures of positive selection the number of sites is reduced to 41 (7.6%) (Additional file 7: Table S5). The majority of these 41 sites are located in the N-terminal region of the protein and the dentonin region (68% of the positive selected sites). The alternative calculation method was performed using CONTEST and estimates of variation in amino acid charge and volume revealed 79 sites with signatures of positive selection for at least one of the amino acid properties (Additional file 8: Table S6). However, after the Bonferroni and False Discovery Rate (FDR) correction, only one site showed positive selection. This site, located at position 354 in the alignment (position 349 in the human sequence), corresponds either to lysine or glutamate and was not detected by TreeSAAP. The ancestral protein reconstruction in TreeSAAP, based on the baseml implemented in PAML, shows that glutamate is present in the common ancestor of non-Afrotheria mammals, suggesting that the radical change to lysine occurred in Cetartiodactyla, Perissodactyla and in at least one representative of the Lagomorpha.
Based on selection analyses at the protein level across MEPE, 42 of the 525 sites (human MEPE as reference) were under selection at the amino acid level (41 detected with TreeSAAP and 1 with CONTEST).
Selection at codon and amino acid level
Directed evolution analysis (DEPS)
MEPE evolution has disproportionally accumulated serines, threonines (potential phosphorylation target residues), arginines, alanines and valines, as all these amino acids showed directional evolution in the DEPS analysis (with a P-value < 0.01) (Additional file 9: Table S7). The MEPE protein had 14 sites under directional selection (Additional file 10: Table S8), seven of which are amino acids that tend to increase the disorder/unstructured probability of the regions. Additionally, eight of these 14 sites had a tendency to change to amino acids that are potentially phosphorylated residues, particularly at positions 496 and 503 (505 and 512 positions in the alignment), since these sites are relatively near the ASARM motif and the cleavage site by cathepsin-B.
Selection Signatures and the MEPE structure
The MEPE protein belongs to a category of proteins classified as "intrinsically unstructured/natively disordered", with 53.8% and 55.8% of the human and the mouse MEPE constituted by amino acids that are associated with disorder/unstructured regions, respectively. This is reinforced given that most of the protein (around 78.8%) is disordered at a 0.05% false positive rate. Interestingly, the ASARM motif has a high content of amino acids disorder promoters while the other functional motifs (such as RGD and SGDG) incorporate regions that are structured (Additional file 11: Figure S3). The protein has a high percentage of the amino acid aspartate, which characterizes the proteins of the SIBLING family. Given the importance of disorder/order in MEPE, we analyzed the implications of selection signatures relative to the protein structural differences, and found that sites 75-Ser, 127-Glu, and 481-Arg (human MEPE as reference) are under positive selection and have a higher number of non-synonymous mutations towards codons that encode the amino acids disorder promoters.
MEPE in the Tetrapods
Given the absence of the MEPE gene in fishes and amphibians, its origin likely coincides with the divergence of amniotes, when mineralization [11, 48] and phosphate regulation  had a crucial role in species survival and diversification. SPP1 diverged from SPARCL1 (secreted protein acidic cysteine-rich like 1) and both are expressed in bone, participating in the bone formation (as an inhibitor of mineralization in SPP1) . Therefore, the presence of SPP1 in fishes with a broader tissue expression pattern  suggests that SPP1 might also have similar functions to MEPE. Remarkably, after duplication, the genes were conserved during evolution and probably have differentiated to assume various functions related with tissue mineralization specificity. Recently, it was proposed that SPP1 is a more-powerful inhibitor of mineralization than MEPE . This suggests that after the emergence of the complete SIBLING family in vertebrates, some functions were possibly shared among genes, notably because MEPE is absent in fishes.
The MEPE gene has similarities with other SIBLING genes, suggesting that it originated through a duplication event from another member of the gene family , but different dynamics of gene duplication and gene loss have occurred among lineages (e.g. absence of MEPE - Figure 1). The five genes of the SIBLING family are present in therian mammals and reptiles, but birds only have four genes (IBSP, SPP1, DMP1 and MEPE/OC-116), while fish only have two genes (SPP1 and DSPP-like). The DSPP orthology in fishes is controversial . However, despite the low similarity, DSPP starmaker was identified as a functional orthologue  clearly associated with DSPP . The presence/absence of various SIBLING family genes in vertebrates suggests that despite the crucial role of MEPE in mammals, birds and reptiles, its function may have been compensated in other taxa by other genes of the family. For example, in fishes a duplicated copy of SPP1 has not been described, suggesting that the fish SPP1 orthologue may have had a similar function to MEPE since SPP1 and MEPE, interact with PHEX . The release of the ASARM from the MEPE protein and the phosphorylation of this motif lead to an inhibition of mineralization . Similarly, the ASARM from SPP1 inhibit the mineralization . Moreover, the ASARM from SPP1 is potentially phosphorylated and can interact with the hydroxyapatite crystals leading to a negative regulation of mineralization . Although SPP1 has an ASARM motif near the center of the molecule, it does not have the full dentonin region (just the RGD motif). Moreover, the SPP1-ASARM has been described as a more-potent mineralization inhibitor than the MEPE-ASARM . However, the knockouts of SPP1 and MEPE in mice have different phenotypes. MEPE knockouts have increased bone mass and inhibition of age-related bone loss  while SPP1 knockouts cause a resistance to bone loss and trabecular bone mass .
The functional motifs of MEPE (RGD, SGDG and ASARM) are highly conserved among the studied mammals. In the SIBLING proteins the first coding exon encodes the signal peptides [4, 54], as is observed in MEPE. The RGD motif is a common feature of all member of the SIBLING family, remaining functionally preserved after the tandem duplication that gave rise to all the members of this gene family . Surprisingly, birds do not have a complete dentonin region (RGD and SGDG), although the high conservation observed among mammals suggests that this region has an important role in the function of the protein. In fact, in mammals the gene function apparently depends on the full dentonin region, as the RGD motif alone does not enhance an optimal adhesion on biomaterial surfaces in osteoblast . However, when SGDG is close to RGD the mitogenic activity of dentonin increases, while the presence of only the SGDG motif promotes the cell proliferation . In mammals, MEPE is involved in bone formation and osteoblast proliferation [19, 56], while in birds it is involved in egg-shell formation . This functional divergence may explain the sequence differences observed between the two lineages, particularly reinforced by the absence of the full dentonin region in birds. The ASARM motif is also highly conserved among mammals, but shares less than 50% similarity with the avian ASARM. Moreover, we have not detected a similar cathepsin-B cleavage site near avian MEPE-ASARM and this motif is capped at the C-terminal by 21 to 24 amino acids. Amino acids towards the C-terminal after the ASARM motif are also observed in marsupials . Despite the lower similarity with the mammalian ASARM and its different position, the high conservation within birds suggests that this motif continues to have a crucial role. The changes are probably not due a relaxation of selection, but instead may have an adaptive role. In mammals, the cathepsin-B cleavage site is crucial for the function of MEPE, since this small peptide only interferes with hydroxyapaptite crystals when released . Therefore, birds are also expected to have a mechanism for cleavage of ASARM. MEPE has not yet been annotated in a monotremata, no significant matches were found in a representative species of this group, the platypus (Ornithorhynchus anatinus). Nevertheless, the discovery of this gene in egg-laying mammals would be of great relevance to understanding the functional differences between mammals and birds.
The coding region of MEPE that flank the motifs described above is less conserved, but retain considerable phylogenetic signal across species. The human MEPE sequence has a high similarity with the great apes and with the genus Macaca (Cercophitecidae), even in the non-coding regions (M. mulatta). To a lesser degree, human MEPE also has some significant similarities in the non-coding regions with the genes of the lower primates (M. murinus and O. garnetti). MEPE appears to be particularly conserved among primates, in both coding and non-coding regions. The intronic conservation could provide valuable information about the role of non-coding sequences in the regulation/functionality of this gene. Despite the accelerated evolution in rodents, intronic conservation allowed us to reconstruct a well-supported species phylogeny from intronic sequences (even including the rodents sequences) with similar results as those obtained from MEPE coding regions.
Several human diseases increase MEPE expression [7, 14, 58], which may imply functional constraints in the gene even at the intronic level. Previous studies have demonstrated that highly conserved intronic regions are correlated with functional constraints and can be evidence of a hidden class of abundant regulatory elements . Recently, a SNP in the region 7 kb 3' of the gene was associated with osteoporosis, a disease characterized by reduced bone mass and microarchitectural deterioration of bone tissue that reduces bone strength and leads to an increased risk of fracture . These findings suggest that intergenic regions can also be important in gene function and may cause significantly different phenotypes. We hypothesize that intronic regions can also lead to significant differences at the expression level and ultimately to differences in phenotypes. This is consistent with our findings that there are strong evolutionary constraints in the MEPE intronic region.
Selection signatures and conservation
Within mammals, MEPE in rodents is evolving faster, presenting a high amount of transitions and transversions. A similar trend is also observed in the tree shrew T. belangeri. However, since we only had one MEPE sequence from the order Scandentia we were unable to infer if this pattern is species-specific or if it is typical of this order. The increased number of substitutions in rodents was expected as previous studies have shown that rodents tend to accumulate more mutations in the coding regions [24, 61]. We hypothesize that the observed differences in these two orders have resulted from either a divergent functional role or simply a relaxation in Darwinian selection. It is not known if the function of the rodent MEPE is similar to that in humans , but all the functional motifs are conserved and the signatures of positive selection or the differences observed were only detected outside of these important motifs. It is clear that positive selection may have an important role in the functional divergence of homologous proteins during adaptation to different habitats . Indeed, selection may be episodic as positive and negative selection shifts over time across different lineages, reinforcing the importance of comparing sequences that have diverged within appropriate time frames . The branch-site model, using rodents as foreground branches and allowing ω ratio variation not only between the branches but also among sites, identified 12 sites with strong signatures of positive selection. This suggests that the rodents and probably Scandentia may have lineage-specific selection differences in MEPE, not only in the magnitude of the selective pressure found in the branch, but also in the number of sites under selection. The acceleration of the substitution rates in rodents and the tree shrew potentially compromises the assessment of positive selection by increasing the number of synonymous mutations and because this heterogeneous site selection is observed in only two of the eight orders evaluated (i.e. Rodentia and Scandentia). The results may also be biased by the mixing of species with long and short generation times , as well as the related long-branch-attraction effect in phylogenetic reconstruction. Therefore, we did not include the rodents and the tree shrew in the site analysis.
The evolutionary analyses of mammalian MEPE codons (excluding the rodents and the tree shrew) found 32 sites under positive selection at codon level, and remarkably three were in functional regions of the protein, positions 6-Val and 11-Phe (Signal Peptide) and position 517-Gly (ASARM motif) (Figure 2).
Recent methods for investigating selection in protein coding genes have focused on evaluating the type of positive selection detected (directional or nondirectional, stabilizing or destabilizing), determining the presence of purifying selection, and interpreting how selection affects overall protein structure and function. Amino acid substitutions have different effects on a protein depending on differences in physicochemical properties and their position in the protein structure . Here, we performed multiple analyses to differentiate among the different types of selective pressures acting in MEPE at the amino acid level. The evaluation of the amino acid physiochemical properties changes in the mammalian MEPE identified 37 more sites (36 using TreeSAAP and one using CONTEST) with selection signatures compared with the results retrieved using codon models. This shows that total reliance on models based on dN/dS using codon models may not detect some important sites with signatures of selection, often because a single adaptive mutation may occur in a small number of species, resulting in an omega lower than one. By contrast, these could also primarily be amino acid stabilizing rather than destabilizing changes, and a ω > 1 may not always be indicative of adaptive evolution.
Combining all the selection analyses, we found 69 amino acids with evidence of positive selection (20 well-supported by both codon and amino acid level approaches) (Additional file 12: Figure S4). Three clusters of positively selected sites revealed three new motifs that likely have a functional role, SEASEN (75-80), LNXEXS (96-101) and ENT (170-172) (Figure 10), using the human protein as site reference.
Selection analysis of MEPE in TreeSAAP using amino acid destabilizing properties revealed that the structural properties tend to be more affected by positive selection than the chemical properties. This suggests that the flexible and intrinsically unstructured nature of MEPE is linked to its multiple biological roles. The ASARM motif shows a "high tendency" to be a "disordered region and highly acidic", although the conformation of ASARM should be dependent on the phosphorylation level . The ability to bind to hidroxyapatite is also correlated with phosphorylation state and PHEX cleavage of MEPE is dependent on the Serine phosphorylation status . Moreover, our results shows that the protein tends to accumulate numerous residues with potential phosphorylation sites and this can be important to the folding/function of the protein. Proteins fold to minimize their free energy, although the structure also reflects an organization that can allow the recognition of a ligand or a transition state . In fact, there is a balance between protein function and stability, and most of functional sites are non-optimal in terms of stability. If a residue is replaced by another residue, the protein activity will be reduced but the stability will be increased . In MEPE we detected 75 sites with a Γ lower than -2 kcal/mol, indicating that a large number of sites in MEPE are non-optimal and therefore possibly involved in protein function. Moreover, 13 of those sites showed signatures of selection in one analysis, and sites 55, 127 and 276 in both codon and amino acid level analyses. Proteins have different secondary-structures and physicochemical properties and roles that help determine their evolutionary flexibility . Thus, amino acids that comprise disordered regions, such as random coils, are more likely to be under positive selection than expected from their proportion in the proteins, compared with the residues in helices and β-structures which are subjected to less positive selection . Indeed, when we compare the evidence of positive selection with the protein secondary structure in MEPE we observed that the number of sites under selection in the random coils and disordered regions are slightly higher than expected. This suggests that a high number of sites probably have a functional role or are at least relevant to an increase in MEPE protein flexibility.
Presently, most of the research on MEPE has centered on the biological role of the RGD and ASARM regions. However, our comparative study of mammalian MEPE orthologues revealed that the protein has lineage-specific properties (e.g. biochemical, evolutionary rate, intronic conservation), and that outside these two well-described motifs there are 69 sites (20 with high confidence level) under positive selection and of probable functional relevance. As positively selected sites might be either near catalytically important regions of the proteins  or be functionally relevant sites [70, 71], these sites are good candidates for mutagenesis and structural studies to determine the functionality of MEPE relative with the other SIBLING proteins.
MEPE is found in reptiles, birds and mammals (eutheria and metatheria), and to date has not been identified in monotremes. The description and study of MEPE in other taxonomic groups will be crucial to fully understanding the differences reported in avian and mammalian orthologues, and the adaptive significance of these differences. The absence of this gene in some vertebrate lineages suggests that SPP1 might partially cover the functions of MEPE in those groups. MEPE retains a strong phylogenetic signal at both coding and non-coding regions in mammals, probably due to in the functional relevance of these regions. Nevertheless, the gene is highly variable, particularly in the largest exon outside the functional motif, while other regions appear to be under strong positive selection. We found 20 sites with a significant signature of positive selection at both nucleotide and amino acid level complimentary analyses (in addition to other 69 sites with evidence of selection at either the nucleotide or the amino acid level). The analyses identified three motifs (LNXEXS, SEASEN and ENT) with selection signatures suggesting important adaptive functions. We also showed that Rodentia and Scandentia have an accelerated evolutionary rate with a unique evolutionary pattern. Finally, we showed that MEPE tends to accumulate amino acids that promote "disorder" and that present potential phosphorylation targets, supporting the contention that other regions outside the dentonin and ASARM might have crucial functional roles and demonstrating the need for future studies to understand the importance of these regions.
Comparative genomic analyses
MEPE nucleotide sequences were retrieved from GenBank and ENSEMBL. We aligned 26 MEPE sequences representing eight orders of mammalian species and produced two different alignments, one including all species and another excluding rodents and the tree shrew due to its nucleotide saturation bias. Given the low similarity between the avian and the mammalian sequences, the avian sequences were excluded from phylogenetic and selection analyses. BLAST searches were used to retrieve non-annotated sequences from several mammalian genomes. All the alignments were performed after the translation of nucleotides to amino acids and the corresponding alignments were back-translated to nucleotides. The alignment were performed in ClustalW  implemented in BIOEDIT v7.05 , MEGA4  and LAGAN . Sliding-window percent amino acid and nucleotide identity, and % GC content were calculated in Swaap 1.0.3 . Saturation plots (including or excluding the third-coding position) and the estimated pI (excluding indels) were assessed in DAMBE . Conservation in the coding and the non-coding regions was assessed using mVISTA .
We used Modelgenerator version 0.85  to determine the optimal model of sequence substitution for our protein dataset, employing the Jones-Taylor-Thornton (JTT+I+G) substitution model. MrModeltest 2.3  was employed to determine the optimal model of sequence substitution for our coding sequence dataset, employing the General-Time-Reversible (GTR+I+G) substitution model with the invariant site plus gamma options (five categories). Bayesian inference methods with Markov chain Monte Carlo (MCMC) sampling were performed in MrBayes [81, 82]. The analysis was run for 5, 000, 000 generations with a sample frequency of 100 and burn-in was set to correspond to 25% of the sampled trees. The Maximum-Likelihood (ML) phylogenetic tree was constructed in PHYML , under the best-fit model for nucleotides and amino acids, 1000 bootstrap replicates and the NNI branch search algorithm. The parameters used in the tree reconstructions were set to: (i) Nucleotides: GTR+I+G with 6 substitution rate parameters and gamma-distributed rate variation with a proportion of invariant sites; (ii) Amino acids: JTT+I+G. A neighbor-joining tree was conducted in MEGA 4  using the complete deletion of ambiguous data and the maximum composite likelihood option. The topologies were tested in TREE-PUZZLE  to identify the tree that best fits the alignment, using three tests: KH, SH and ELW. A phylogenetic signal test was performed in TREE-PUZZLE  using the implemented methodology .
Detection of positive selection
Positive selection analyses were performed in the Eutherian mammals (the closely-related taxa) to avoid nucleotide saturation and base-compositional bias. We assessed positive selection using primarily a gene-level approach  based on the ratio (ω) of nonsynonymous (dN) to synonymous (dS) substitutions rate (i.e., ω = dN/dS), implemented in PAML v4.3  and in the web-based program SELECTON [87, 88], PAML uses LRT to compare two nested models, a model that does not allow, and a model that allows, sites categories > 1 (null versus positive selection, respectively). Here, we used three LRTs based on site-specific models comparing the nested models: M1a-M2a, M7-M8 and M8a-M8. The first LRT was performed comparing M1a (nearly neutral: p0, p1, ω0 < 1, ω1 = 1, NS sites = 1) against M2a (positive selection: p0, p1, p2, ω0 < 1, ω1 = 1, ω2 < 1, NSsites = 2); the second LRT was comparing M7 (beta: p, q, NS sites = 7) with M8 (beta & ω: p0, p1, p, q, ωs > 1, NS sites = 8). The third LRT was between M8a (beta & ωs = 1: fix omega = 1, omega = 1, NS sites = 8) and M8. However, a significant LRT only demonstrated that the selection model is more suitable than the neutral model; it does not provide any indication of the sites under selection . This can be accomplished through an Empirical Bayes (EB) approach to calculate the posterior probability (PP) that a given site comes from the class with ω > 1. Sites presenting a PP above the defined cut-off value (e.g. p > 95%)  are inferred to be under positive selection. A robust method was used to accommodate the uncertainties in the MLEs of parameters in the ω distribution, designated by BEB . This approach was shown to be reliable in both small and large data sets, and also to have a good resolution power for identifying individual sites under positive selection, especially in large data sets or with strong selective pressure. We also performed an analysis using the branch-site model A  (model = 2 NSsites = 2), including and excluding the rodents and tree shrew as foreground branch, allowing the ω ratio vary both among sites and among lineages. The branch-site test 2 was performed using the null model, ω2 = 1 fixed (using the parameters fix omega = 1 and omega = 1). The sites under selection in the foreground branches were obtained after calculating probabilities of site classes using the BEB procedure.
Although the PAML models  allow for variation in the non-synonymous substitution rate, the synonymous rate is fixed across the sequence. To overcome that specificity, we used SLAC and FEL  for detecting positive selection while allowing variation in synonymous rate. SLAC is a heavily modified and improved derivative of the Suzuki-Gojobori counting approach [42, 93] that maps changes in the phylogeny to estimate selection on a site-by-site basis. SLAC calculates the number of non-synonymous and synonymous substitutions that have occurred at each site using ML reconstructions of ancestral sequences [42, 93]. The FEL model estimates the ratio of nonsynonymous to synonymous not assuming an a priori distribution of rates across sites substitution on a site-by-site analysis . The SLAC and FEL methods were implemented using the web interface Datamonkey . Since recombination in the gene can bias the analysis , we also re-run SLAC and FEL in Datamonkey using the GARD method , allowing each calculated partition to have its own phylogenetic tree.
Additionally, we used the LRT based analysis as implemented in the SLR (Sitewise Likelihood-Ratio) software package . This method assumes that substitutions (both synonymous and non-synonymous) can occur independently with every other site, modulating substitution rates as a continuous-time Markov process. The LRT on a site-wise basis is performed testing a null model (neutrality, ω = 1) against an alternative model ω ≠ 1.
We performed multiple analyses to differentiate the different types of selective pressures acting in MEPE: (i) positive versus negative selection, and (ii) stabilizing (selection that tends to maintain the overall biochemistry of the protein) versus destabilizing selection (selection that results in radical structural or functional shifts in local regions of the protein). These analyses provided insight into the structural and functional consequences of the residues under selection . We used TreeSAAP v3.2  and CONTEST  implemented in IMPACT  to detect selection signatures at the amino acid level. In TreeSAAP positive destabilizing-selection is detected based on the properties changes with significantly greater amino acid replacements than would be expected under neutrality for magnitude categories +7 and +8 (i.e., the two most-radical property-change categories). Within TreeSAAP, 31 amino acid properties were evaluated across the phylogenetic tree to identify the specific amino acid residues within each region that showed evidence of positive destabilization for each property. The baseml implemented PAML  is used in TreeSAAP  to reconstruct ancestral character states at the nodes on the MEPE phylogeny.
To test if evolutionary rates varied between lineages we used the relative-rate test, weighting by the predefined tree topology, as implemented in RRTree . To detect directional selection over the tree or a large number of substitutions towards a particular residue in a maximum likelihood context we used the directional evolution in protein sequences (DEPS) analysis to identify statistically significant directional changes in amino acid residue frequencies .
MEPE Three-Dimensional Structure Modeling
To determine the position of the positive selected amino acids when the protein is folded, we modeled the three-dimensional (3D) structure of MEPE. Protein structure prediction can be approached in three ways: (i) comparative modeling, (ii) threading, and (iii) ab-initio folding. For MEPE, the first two methods, which build a protein model by aligning query sequences onto solved template structures, were not feasible. Thus, the only practical strategy was to run the I-TASSER  to obtain an ab-initio 3D model of MEPE. The model obtained using the Homo sapiens sequence had a TM Score of 0.46 ± 0.15 and a C-Score = -2.18. To accurately infer the correct topology, the model should have a C-score above -1.5, varying from [-5; 2] . A TM score above 0.5 means that the obtained topology is not random . Results using the sequences of the rock hyrax (out-group), the dog (i.e. one of the species showing differences in the pI) and the mouse (which demonstrates accelerated evolution) all had similar C-scores and the 3D structures similar to the results retrieved for the human MEPE, suggesting that the biochemical differences in the composition of the amino acids that constitutes the different orthologues are not imposing significant differences in the folding of the protein.
To assess the surface exposure of the amino acids in the protein structure, we used the GETAREA 1.1  web-based program based on the atom coordinates of the PDB file. This provides an estimate of the solvent exposure based on the ratio of the side-chain surface area to "random coil" value per residue, performing an analytical calculation of solvent accessible surface area residues. These are considered to be solvent exposed if the ratio value exceeds 50% and to be buried if the ratio is less than 20% . Since MEPE has been described as an intrinsic unfolded protein, we also used the Protein DisOrder prediction System (PrDOS) server  to predict natively disordered regions of a protein chain based on the composition of the amino acid sequence. Protein stability was calculated with the PoPMuSiC 2.1 web server  using the MEPE PDB file previously obtained in I-TASSER to calculate the sites Γ considering all the possible mutations in each site. The secondary structure was visualized in POLYVIEW .
The authors acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for financial support to JPM (SFRH/BD/65245/2009) and the project PTDC/BIA-BDE/69144/2006 (FCOMP-01-0124-FEDER-007065) and PTDC/AAC-AMB/104983/2008 (FCOMP-01-0124-FEDER-008610). This work was further supported by a grant from Iceland, Liechtenstein and Norway through the EEA Financial Mechanism and the Norwegian Financial Mechanism. We thank Siby Philip from LEGE/CIIMAR for discussion and helpful suggestions. Comments made by the anonymous reviewers improved a previous version of this manuscript.
- Thesleff I: Epithelial-mesenchymal signalling regulating tooth morphogenesis. J Cell Sci. 2003, 116: 1647-1648. 10.1242/jcs.00410.PubMedGoogle Scholar
- Butler WT: Dentin matrix proteins. Eur J Oral Sci. 1998, 106 (Suppl 1): 204-210.PubMedGoogle Scholar
- Chen S, Chen L, Jahangiri A, Chen B, Wu Y, Chuang HH, Qin C, MacDougall M: Expression and processing of small integrin-binding ligand N-linked glycoproteins in mouse odontoblastic cells. Arch Oral Biol. 2008, 53: 879-889. 10.1016/j.archoralbio.2008.05.005.PubMedPubMed CentralGoogle Scholar
- Fisher LW, Fedarko NS: Six genes expressed in bones and teeth encode the current members of the SIBLING family of proteins. Connect Tissue Res. 2003, 44 (Suppl 1): 33-40.PubMedGoogle Scholar
- Kawasaki K, Weiss KM: Evolutionary genetics of vertebrate tissue mineralization: the origin and evolution of the secretory calcium-binding phosphoprotein family. J Exp Zool B Mol Dev Evol. 2006, 306: 295-316.PubMedGoogle Scholar
- Yamada KM: Adhesive recognition sequences. J Biol Chem. 1991, 266: 12809-12812.PubMedGoogle Scholar
- Rowe PS, de Zoysa PA, Dong R, Wang HR, White KE, Econs MJ, Oudet CL: MEPE, a new gene expressed in bone marrow and tumors causing osteomalacia. Genomics. 2000, 67: 54-68. 10.1006/geno.2000.6235.PubMedGoogle Scholar
- Addison WN, Nakano Y, Loisel T, Crine P, McKee MD: MEPE-ASARM peptides control extracellular matrix mineralization by binding to hydroxyapatite: an inhibition regulated by PHEX cleavage of ASARM. J Bone Miner Res. 2008, 23: 1638-1649. 10.1359/jbmr.080601.PubMedGoogle Scholar
- Ogbureke KU, Fisher LW: SIBLING expression patterns in duct epithelia reflect the degree of metabolic activity. J Histochem Cytochem. 2007, 55: 403-409.PubMedGoogle Scholar
- Argiro L, Desbarats M, Glorieux FH, Ecarot B: Mepe, the gene encoding a tumor-secreted protein in oncogenic hypophosphatemic osteomalacia, is expressed in bone. Genomics. 2001, 74: 342-351. 10.1006/geno.2001.6553.PubMedGoogle Scholar
- Gowen LC, Petersen DN, Mansolf AL, Qi H, Stock JL, Tkalcevic GT, Simmons HA, Crawford DT, Chidsey-Frink KL, Ke HZ, et al: Targeted disruption of the osteoblast/osteocyte factor 45 gene (OF45) results in increased bone formation and bone mass. J Biol Chem. 2003, 278: 1998-2007. 10.1074/jbc.M203250200.PubMedGoogle Scholar
- Petersen DN, Tkalcevic GT, Mansolf AL, Rivera-Gonzalez R, Brown TA: Identification of osteoblast/osteocyte factor 45 (OF45), a bone-specific cDNA encoding an RGD-containing protein that is highly expressed in osteoblasts and osteocytes. J Biol Chem. 2000, 275: 36172-36180. 10.1074/jbc.M003622200.PubMedGoogle Scholar
- Rowe PS: The wrickkened pathways of FGF23, MEPE and PHEX. Crit Rev Oral Biol Med. 2004, 15: 264-281. 10.1177/154411130401500503.PubMedPubMed CentralGoogle Scholar
- De Beur SM, Finnegan RB, Vassiliadis J, Cook B, Barberio D, Estes S, Manavalan P, Petroziello J, Madden SL, Cho JY, et al: Tumors associated with oncogenic osteomalacia express genes important in bone and mineral metabolism. J Bone Miner Res. 2002, 17: 1102-1110. 10.1359/jbmr.2002.17.6.1102.PubMedGoogle Scholar
- Gluhak-Heinrich J, Kotha SP, Bonewald LF, Schaffler MB, Harris SE: In-vivo site-specific correlation of dentin matrix protein 1 (DMPI) and matrix extracellular phosphoglycoprotein (MEPE) gene expression: Effect of overload. J Bone Miner Res. 2004, 19: S73-S73.Google Scholar
- Dobbie H, Shirley DG, Faria NJ, Rowe PS, Slater JM, Unwin RJ: Infusion of the Bone-Derived protein MEPE causes phosphaturia in rats. J Am Soc Nephrol. 2003, 14: 467a-468a.Google Scholar
- Rowe PS, Kumagai Y, Gutierrez G, Garrett IR, Blacher R, Rosen D, Cundy J, Navvab S, Chen D, Drezner MK, et al: MEPE has the properties of an osteoblastic phosphatonin and minhibin. Bone. 2004, 34: 303-319. 10.1016/j.bone.2003.10.005.PubMedPubMed CentralGoogle Scholar
- Rowe PSN, Matsumoto N, Jo OD, Shih RNJ, Roudier M, Harrison J, Yanagawa N: MEPE-ASARM-peptide associated mineralization defects in X-linked hypophosphatemic rickets (hyp) is corrected by protease-inhibitors. J Bone Miner Res. 2005, 20: S42-S42.Google Scholar
- Hayashibara T, Hiraga T, Yi B, Nomizu M, Kumagai Y, Nishimura R, Yoneda T: A synthetic peptide fragment of human MEPE stimulates new bone formation in vitro and in vivo. J Bone Miner Res. 2004, 19: 455-462.PubMedGoogle Scholar
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.PubMedGoogle Scholar
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.PubMedGoogle Scholar
- Ohta T: The Nearly Neutral Theory of Molecular Evolution. Annu Rev Ecol Syst. 1992, 23: 263-286. 10.1146/annurev.es.23.110192.001403.Google Scholar
- Hasegawa M, Thorne JL, Kishino H: Time scale of eutherian evolution estimated without assuming a constant rate of molecular evolution. Genes Genet Syst. 2003, 78: 267-283. 10.1266/ggs.78.267.PubMedGoogle Scholar
- Wu CI, Li WH: Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci USA. 1985, 82: 1741-1745. 10.1073/pnas.82.6.1741.PubMedPubMed CentralGoogle Scholar
- Bardet C, Delgado S, Sire JY: MEPE evolution in mammals reveals regions and residues of prime functional importance. Cell Mol Life Sci. 2010, 67: 305-320. 10.1007/s00018-009-0185-1.PubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedGoogle Scholar
- Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, Bork P, Burt DW, Groenen MAM, Delany ME, et al: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716. 10.1038/nature03154.Google Scholar
- Fisher LW: DMP1 and DSPP: Evidence for Duplication and Convergent Evolution of Two SIBLING Proteins. Cells Tissues Organs. 2011, 194: 113-118. 10.1159/000324254.PubMedPubMed CentralGoogle Scholar
- Kawasaki K: The SCPP gene repertoire in bony vertebrates and graded differences in mineralized tissues. Dev Genes Evol. 2009, 219: 147-157. 10.1007/s00427-009-0276-x.PubMedPubMed CentralGoogle Scholar
- Kawasaki K, Weiss KM: SCPP gene evolution and the dental mineralization continuum. J Dent Res. 2008, 87: 520-531. 10.1177/154405910808700608.PubMedGoogle Scholar
- Sollner C, Burghammer M, Busch-Nentwich E, Berger J, Schwarz H, Riekel C, Nicolson T: Control of crystal size and lattice formation by starmaker in otolith biomineralization. Science. 2003, 302: 282-286. 10.1126/science.1088443.PubMedGoogle Scholar
- Ramialison M, Bajoghli B, Aghaallaei N, Czerny T, Wittbrodt J: Identification of Starmaker-Like in Medaka as a Putative Target Gene of Pax2 in the Otic Vesicle. Dev Dynam. 2009, 238: 2860-2866. 10.1002/dvdy.22093.Google Scholar
- Perelman P, Johnson WE, Roos C, Seuanez HN, Horvath JE, Moreira MAM, Kessing B, Pontius J, Roelke M, Rumpler Y, et al: A Molecular Phylogeny of Living Primates. Plos Genet. 2011, 7:Google Scholar
- Springer MS, Murphy WJ, Eizirik E, O'Brien SJ: Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc Natl Acad Sci USA. 2003, 100: 1056-1061. 10.1073/pnas.0334222100.PubMedPubMed CentralGoogle Scholar
- Nishihara H, Hasegawa M, Okada N: Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc Natl Acad Sci USA. 2006, 103: 9929-9934. 10.1073/pnas.0603797103.PubMedPubMed CentralGoogle Scholar
- Meredith RW, Janecka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simao TL, Stadler T, et al: Impacts of the Cretaceous Terrestrial Revolution and KPg extinction on mammal diversification. Science. 2011, 334: 521-524. 10.1126/science.1211028.PubMedGoogle Scholar
- Majewski J, Ott J: Distribution and characterization of regulatory elements in the human genome. Genome Res. 2002, 12: 1827-1836. 10.1101/gr.606402.PubMedPubMed CentralGoogle Scholar
- McKnight DA, Fisher LW: Molecular evolution of dentin phosphoprotein among toothed and toothless animals. Bmc Evol Biol. 2009, 9:Google Scholar
- Lynn DJ, Lloyd AT, Fares MA, O'Farrelly C: Evidence of positively selected sites in mammalian alpha-defensins. Mol Biol Evol. 2004, 21: 819-827. 10.1093/molbev/msh084.PubMedGoogle Scholar
- Muse SV, Gaut BS: Comparing patterns of nucleotide substitution rates among chloroplast loci using the relative ratio test. Genetics. 1997, 146: 393-399.PubMedPubMed CentralGoogle Scholar
- Swanson WJ, Nielsen R, Yang Q: Pervasive adaptive evolution in mammalian fertilization proteins. Mol Biol Evol. 2003, 20: 18-20.PubMedGoogle Scholar
- Pond SL, Frost SD: Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics. 2005, 21: 2531-2533. 10.1093/bioinformatics/bti320.PubMedGoogle Scholar
- Poon AF, Frost SD, Pond SL: Detecting signatures of selection from DNA sequences using Datamonkey. Methods Mol Biol. 2009, 537: 163-183. 10.1007/978-1-59745-251-9_8.PubMedGoogle Scholar
- Anisimova M, Nielsen R, Yang Z: Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 2003, 164: 1229-1236.PubMedPubMed CentralGoogle Scholar
- Massingham T, Goldman N: Detecting amino acid sites under positive selection and purifying selection. Genetics. 2005, 169: 1753-1762.PubMedPubMed CentralGoogle Scholar
- McClellan DA, Palfreyman EJ, Smith MJ, Moss JL, Christensen RG, Sailsbery AK: Physicochemical evolution and molecular adaptation of the cetacean and artiodactyl cytochrome b proteins. Mol Biol Evol. 2005, 22: 437-455.PubMedGoogle Scholar
- Woolley S, Johnson J, Smith MJ, Crandall KA, McClellan DA: TreeSAAP: selection on amino acid properties using phylogenetic trees. Bioinformatics. 2003, 19: 671-672. 10.1093/bioinformatics/btg043.PubMedGoogle Scholar
- Rowe PSN, Kumagai Y, Gutierrez G, Garrett IR, Blacher R, Rosen D, Chen D, Drezner MK, Quarles LD, Mundy GR: MEPE regulates bone mineralization and phosphate transport: PHEX and the MEPE ASARM-peptide. J Bone Miner Res. 2003, 18: S24-S24. 10.1359/jbmr.2003.18.1.24.Google Scholar
- Quarles LD: FGF23, PHEX, and MEPE regulation of phosphate homeostasis and skeletal mineralization. Am J Physiol Endocrinol Metab. 2003, 285: E1-9.PubMedGoogle Scholar
- Kawasaki K, Suzuki T, Weiss KM: Genetic basis for the evolution of vertebrate mineralized tissue. Proc Natl Acad Sci USA. 2004, 101: 11356-11361. 10.1073/pnas.0404279101.PubMedPubMed CentralGoogle Scholar
- Kawasaki K, Buchanan AV, Weiss KM: Biomineralization in humans: making the hard choices in life. Annu Rev Genet. 2009, 43: 119-142. 10.1146/annurev-genet-102108-134242.PubMedGoogle Scholar
- Addison WN, Masica DL, Gray JJ, McKee MD: Phosphorylation-dependent inhibition of mineralization by osteopontin ASARM peptides is regulated by PHEX cleavage. J Bone Miner Res. 2010, 25: 695-705.PubMedGoogle Scholar
- Yoshitake H, Rittling SR, Denhardt DT, Noda M: Osteopontin-deficient mice are resistant to ovariectomy-induced bone resorption. Proc Natl Acad Sci USA. 1999, 96: 8156-8160. 10.1073/pnas.96.14.8156.PubMedPubMed CentralGoogle Scholar
- Fisher LW, Torchia DA, Fohr B, Young MF, Fedarko NS: Flexible structures of SIBLING proteins, bone sialoprotein, and osteopontin. Biochem Biophys Res Commun. 2001, 280: 460-465. 10.1006/bbrc.2000.4146.PubMedGoogle Scholar
- Dee KC, Andersen TT, Bizios R: Design and function of novel osteoblast-adhesive peptides for chemical modification of biomaterials. J Biomed Mater Res. 1998, 40: 371-377. 10.1002/(SICI)1097-4636(19980605)40:3<371::AID-JBM5>3.0.CO;2-C.PubMedGoogle Scholar
- Liu H, Li W, Gao C, Kumagai Y, Blacher RW, DenBesten PK: Dentonin, a fragment of MEPE, enhanced dental pulp stem cell proliferation. J Dent Res. 2004, 83: 496-499. 10.1177/154405910408300612.PubMedGoogle Scholar
- Hincke MT, Gautron J, Tsang CP, McKee MD, Nys Y: Molecular cloning and ultrastructural localization of the core protein of an eggshell matrix proteoglycan, ovocleidin-116. J Biol Chem. 1999, 274: 32915-32923. 10.1074/jbc.274.46.32915.PubMedGoogle Scholar
- Brame LA, White KE, Econs MJ: Renal phosphate wasting disorders: clinical features and pathogenesis. Semin Nephrol. 2004, 24: 39-47. 10.1053/j.semnephrol.2003.08.016.PubMedGoogle Scholar
- Hare MP, Palumbi SR: High intron sequence conservation across three mammalian orders suggests functional constraints. Mol Biol Evol. 2003, 20: 969-978. 10.1093/molbev/msg111.PubMedGoogle Scholar
- Rivadeneira F, Styrkarsdottir U, Estrada K, Halldorsson BV, Hsu YH, Richards JB, Zillikens MC, Kavvoura FK, Amin N, Aulchenko YS, et al: Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies. Nat Genet. 2009, 41: 1199-1206. 10.1038/ng.446.PubMedPubMed CentralGoogle Scholar
- Li WH, Ellsworth DL, Krushkal J, Chang BH, Hewett-Emmett D: Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylogenet Evol. 1996, 5: 182-187. 10.1006/mpev.1996.0012.PubMedGoogle Scholar
- Liu S, Wang H, Wang X, Lu L, Gao N, Rowe PS, Hu B, Wang Y: MEPE/OF45 protects cells from DNA damage induced killing via stabilizing CHK1. Nucleic Acids Res. 2009, 37: 7447-7454. 10.1093/nar/gkp768.PubMedPubMed CentralGoogle Scholar
- Levasseur A, Gouret P, Lesage-Meessen L, Asther M, Record E, Pontarotti P: Tracking the connection between evolutionary and functional shifts using the fungal lipase/feruloyl esterase A family. Bmc Evol Biol. 2006, 6: 92-10.1186/1471-2148-6-92.PubMedPubMed CentralGoogle Scholar
- Messier W, Stewart CB: Episodic adaptive evolution of primate lysozymes. Nature. 1997, 385: 151-154. 10.1038/385151a0.PubMedGoogle Scholar
- Antunes A, Ramos MJ: Gathering computational genomics and proteomics to unravel adaptive evolution. Evol Bioinform Online. 2007, 3: 207-209.PubMedPubMed CentralGoogle Scholar
- Martin A, David V, Laurence JS, Schwarz PM, Lafer EM, Hedge AM, Rowe PS: Degradation of MEPE, DMP1, and release of SIBLING ASARM-peptides (minhibins): ASARM-peptide(s) are directly responsible for defective mineralization in HYP. Endocrinology. 2008, 149: 1757-1772. 10.1210/en.2007-1205.PubMedPubMed CentralGoogle Scholar
- Shoichet BK, Baase WA, Kuroki R, Matthews BW: A relationship between protein stability and protein function. Proc Natl Acad Sci USA. 1995, 92: 452-456. 10.1073/pnas.92.2.452.PubMedPubMed CentralGoogle Scholar
- Ridout KE, Dixon CJ, Filatov DA: Positive selection differs between protein secondary structure elements in Drosophila. Genome Biol Evol. 2010, 2: 166-179. 10.1093/gbe/evq008.PubMedPubMed CentralGoogle Scholar
- Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O'Connell MJ: Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. Bmc Evol Biol. 2010, 10: 39-10.1186/1471-2148-10-39.PubMedPubMed CentralGoogle Scholar
- Moury B, Simon V: dN/dS-Based Methods Detect Positive Selection Linked to Trade-Offs between Different Fitness Traits in the Coat Protein of Potato virus Y. Mol Biol Evol. 2011, 28: 2707-2717. 10.1093/molbev/msr105.PubMedGoogle Scholar
- Casasoli M, Federici L, Spinelli F, Di Matteo A, Vella N, Scaloni F, Fernandez-Recio J, Cervone F, De Lorenzo G: Integration of evolutionary and desolvation energy analysis identifies functional sites in a plant immunity protein. Proc Natl Acad Sci USA. 2009, 106: 7666-7671.PubMedPubMed CentralGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.PubMedPubMed CentralGoogle Scholar
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.Google Scholar
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.PubMedGoogle Scholar
- Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 2003, 13: 721-731. 10.1101/gr.926603.PubMedPubMed CentralGoogle Scholar
- Pride DT, Blaser MJ: Concerted evolution between duplicated genetic elements in Helicobacter pylori. J Mol Biol. 2002, 316: 629-642. 10.1006/jmbi.2001.5311.PubMedGoogle Scholar
- Xia X, Xie Z: DAMBE: Software package for data analysis in molecular biology and evolution. J Hered. 2001, 92: 371-373. 10.1093/jhered/92.4.371.PubMedGoogle Scholar
- Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I: VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004, 32: W273-W279. 10.1093/nar/gkh458.PubMedPubMed CentralGoogle Scholar
- Keane TM, Creevey CJ, Pentony MM, Naughton TJ, Mclnerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. Bmc Evolutionary Biology. 2006, 6:Google Scholar
- Nylander JAA: MrModeltest v2. 2004, Evolutionary Biology Centre, Uppsala University, Program distributed by the authorGoogle Scholar
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.PubMedGoogle Scholar
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.PubMedGoogle Scholar
- Guindon S, Delsuc F, Dufayard JF, Gascuel O: Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol. 2009, 537: 113-137. 10.1007/978-1-59745-251-9_6.PubMedGoogle Scholar
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.PubMedGoogle Scholar
- Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA. 1997, 94: 6815-6819. 10.1073/pnas.94.13.6815.PubMedPubMed CentralGoogle Scholar
- Yang ZH: PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.PubMedGoogle Scholar
- Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, Pupko T: Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 2007, 35: W506-W511. 10.1093/nar/gkm382.PubMedPubMed CentralGoogle Scholar
- Doron-Faigenboim A, Stern A, Bacharach E, Pupko T: Selecton: a server for detecting evolutionary forces at a single amino-acid site. Bioinformatics. 2005, 21: 2101-2103. 10.1093/bioinformatics/bti259.PubMedGoogle Scholar
- Osorio DS, Antunes A, Ramos MJ: Structural and functional implications of positive selection at the primate angiogenin gene. Bmc Evol Bio. 2007, 7:Google Scholar
- Yang ZH, Wong WSW, Nielsen R: Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005, 22: 1107-1118. 10.1093/molbev/msi097.PubMedGoogle Scholar
- Zhang JZ, Nielsen R, Yang ZH: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479. 10.1093/molbev/msi237.PubMedGoogle Scholar
- Pond SK, Muse SV: Site-to-site variation of synonymous substitution rates. Mol Biol Evol. 2005, 22: 2375-2385. 10.1093/molbev/msi232.PubMedGoogle Scholar
- Kosakovsky Pond SL, Frost SD: Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 2005, 22: 1208-1222. 10.1093/molbev/msi105.PubMedGoogle Scholar
- Pond SL, Frost SD: A genetic algorithm approach to detecting lineage-specific variation in selection pressure. Mol Biol Evol. 2005, 22: 478-485.PubMedGoogle Scholar
- Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SD: GARD: a genetic algorithm for recombination detection. Bioinformatics. 2006, 22: 3096-3098. 10.1093/bioinformatics/btl474.PubMedGoogle Scholar
- Dutheil J: Detecting site-specific biochemical constraints through substitution mapping. J Mol Evol. 2008, 67: 257-265. 10.1007/s00239-008-9139-8.PubMedGoogle Scholar
- Maldonado E, Dutheil JY, da Fonseca RR, Vasconcelos V, Antunes A: IMPACT: integrated multiprogram platform for analyses in ConTest. J Hered. 2011, 102: 366-369. 10.1093/jhered/esr003.PubMedGoogle Scholar
- Robinson-Rechavi M, Huchon D: RRTree: relative-rate tests between groups of sequences on a phylogenetic tree. Bioinformatics. 2000, 16: 296-297. 10.1093/bioinformatics/16.3.296.PubMedGoogle Scholar
- Kosakovsky Pond SL, Poon AF, Leigh Brown AJ, Frost SD: A maximum likelihood method for detecting directional evolution in protein sequences and its application to influenza A virus. Mol Biol Evol. 2008, 25: 1809-1824. 10.1093/molbev/msn123.PubMedPubMed CentralGoogle Scholar
- Zhang Y: I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 2008, 9: 40-10.1186/1471-2105-9-40.PubMedPubMed CentralGoogle Scholar
- Fraczkiewicz R, Braun W: Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J Comput Chem. 1998, 19: 319-333. 10.1002/(SICI)1096-987X(199802)19:3<319::AID-JCC6>3.0.CO;2-W.Google Scholar
- Ishida T, Kinoshita K: PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007, 35: W460-464. 10.1093/nar/gkm363.PubMedPubMed CentralGoogle Scholar
- Dehouck Y, Kwasigroch JM, Gilis D, Rooman M: PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics. 2011, 12: 151-10.1186/1471-2105-12-151.PubMedPubMed CentralGoogle Scholar
- Porollo AA, Adamczak R, Meller J: POLYVIEW: a flexible visualization tool for structural and functional annotations of proteins. Bioinformatics. 2004, 20: 2460-2462. 10.1093/bioinformatics/bth248.PubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.