Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins
© Morgan et al; licensee BioMed Central Ltd. 2010
Received: 20 July 2009
Accepted: 11 February 2010
Published: 11 February 2010
Reproductive proteins are central to the continuation of all mammalian species. The evolution of these proteins has been greatly influenced by environmental pressures induced by pathogens, rival sperm, sexual selection and sexual conflict. Positive selection has been demonstrated in many of these proteins with particular focus on primate lineages. However, the mammalia are a diverse group in terms of mating habits, population sizes and germ line generation times. We have examined the selective pressures at work on a number of novel reproductive proteins across a wide variety of mammalia.
We show that selective pressures on reproductive proteins are highly varied. Of the 10 genes analyzed in detail, all contain signatures of positive selection either across specific sites or in specific lineages or a combination of both. Our analysis of SP56 and Col1a1 are entirely novel and the results show positively selected sites present in each gene. Our findings for the Col1a1 gene are suggestive of a link between positive selection and severe disease type. We find evidence in our dataset to suggest that interacting proteins are evolving in symphony: most likely to maintain interacting functionality.
Our in silico analyses show positively selected sites are occurring near catalytically important regions suggesting selective pressure to maximize efficient fertilization. In those cases where a mechanism of protein function is not fully understood, the sites presented here represent ideal candidates for mutational study. This work has highlighted the widespread rate heterogeneity in mutational rates across the mammalia and specifically has shown that the evolution of reproductive proteins is highly varied depending on the species and interacting partners. We have shown that positive selection and disease are closely linked in the Col1a1 gene.
Reproductive proteins are essential for success of sexually reproducing species and indeed for the emergence of new species. In the past it has been observed that reproductive proteins tend to be under positive selective pressure to change, i.e. adaptive evolution, a trend found in a variety of animal species from abalone to primates [1, 2]. Adaptive evolution or positive selection is a selective pressure placed on a protein by a change in environment in order to improve the fitness of the organism in that environment.
With changes in environment, that can include mating system, there is a subsequent selective pressure on the protein sequences related to those functions to adapt accordingly. This variation can be detected using the well-known measurements of the rate of non-synonymous substitutions per non-synonymous site (Dn) and synonymous substitutions per synonymous site (Ds) and their ratio ω = Dn/Ds. The detection of adaptive evolution, where the ratio exceeds unity, is referred to as positive Darwinian selection. Detecting positive Darwinian selection in a region of a protein, or indeed in a lineage of a phylogeny, indicates that there is a selective advantage in changing the amino acid sequence in this region. These signals are essential for our understanding of functionally important residues in a protein sequence and protein functional shift.
In general, the rate of mutation that a gene undergoes is contingent on a number of factors including; protein structure, presence of gene duplicates, location in the genome, effective population size, germ line generation time, and composition of the sequence (for review see ). It has recently been shown that the number of physical interactions of a particular protein also influences the intrinsic rate of evolution . Evidence for the generation time effect has come from studies on various proteins and species including analyses of substitution rates in higher primates and rodents , substitution rates in higher grasses and in palms , in mammalian genomes  and in chloroplast and sex mutation rate ratios [5, 6]. With recent advances in sequencing we have an opportunity to examine these effects using a wider selection of proteins and species. Documented selective pressures associated with positive selection in reproductive proteins include: (i) intense sperm competition whereby sperm from numerous males, ejaculated into the female reproductive tract, compete with one another for the prized fertilization of the egg ; (ii) evasion of the immune system, whereby surface layer reproductive proteins evolve to evade destruction by the host's immune system ; and finally (iii) selective pressures enforced by mating system, related of course to point (i) above. Species that are more promiscuous have increased levels of selective pressure acting on reproductive proteins than species that are monogamous. This later point is illustrated in the study of SEMG2, where adaptive evolution was found to correlate with mating system in primates .
In order to determine the variation in selective pressure in these proteins, there are a number of criteria that the data must meet. Firstly, the data must have a robust phylogenetic signal. Secondly, systematic biases that may exist in the data must be minimized, these include but are not limited to: long branch attraction (LBA), amino acid composition bias, base composition bias and unqualified ortholog predictions, all of which may lead to inaccurate estimates of phylogeny. Thirdly, sensitivity to taxa number is a known limitation of methods for detecting positive selection, therefore more than 6 taxa are needed to gain accurate estimations of selective pressure using the maximum likelihood (ML) method applied here .
In this study we have selected a subset of proteins that have roles to play in reproduction. Our dataset was composed of three major datatypes, (i) previously published reproductive proteins, (ii) interacting proteins, here we identified proteins shown to interact with (i), and finally (iii), genes identified from microarray experiments as being highly expressed in reproductive tissues. For group (iii) we assume that those proteins highly expressed in reproductive tissues are important for the function of that tissue. The previously untested reproductive proteins analysed here are from data types (ii) and (iii) outlined above. These novel proteins are SP56, Porimin and Col1a1. SP56 is sperm binding protein number 56, this protein is a representative of the interacting protein subset of sequences analysed. SP56 has been shown to interact with ZP3 - a well-studied reproductive protein. Both Porimin and Col1a1 have been identified from published microarray experiments on normal human tissue , and were selected for analysis due to their high levels of expression in reproductive tissues in that study. Porimin is a transmembrane protein that is highly expressed in the uterus, prostate and placenta and Col1a1 is highly expressed in the uterus. Further evidence for the link between Porimin and reproduction was not available in the literature and therefore results from this particular gene are taken with caution until this protein is further characterized. Col1a1 plays an important role during spermatogenesis where it mediates the detachment and migration of germ cells, thus adding further support for its role in reproduction .
We have analyzed these data with an approach sensitive to all the systematic biases and limitations of methods given above. A number of genes in our dataset have been analyzed previously but have not taken these limitations and considerations into account. We have expanded these datasets to include a greater number of taxa, we have analyzed all of these genes for evidence of systematic biases and we have used improved models of codon evolution. In this paper we have included models that allow for rate variation across the sequence and across the phylogeny.
Results and Discussion
Summary of the analysis of quality and bias present in the data
AA Comp Bias
Base Comp Bias
Gene v Species Tree
Summary of SH tests for complete gene datasets
SH - gene
SH - ideal
1. Tests of Data Quality and Bias
(i) Test for amino acid and base composition biases
We tested all multiple sequence alignments (MSAs) for evidence of significant levels of amino acid composition bias and base composition bias in each lineage using the TreePuzzle software . We found that all alignments passed the significance test with p-values < 0.05, see Table 1 for summary. For full set of amino acid and base composition bias test results, see Additional Files 1 and 2 respectively. In summary the discordance between each of the gene trees and the canonical species phylogeny is not a result of amino acid or base composition biases providing evidence of false relationships.
(ii) Test for phylogenetic signal
(iii) Long Branch Attraction (LBA) analysis
We assessed the data for evidence of LBA which would manifest itself in the data by drawing species with a greater number of mutations in the gene of interest together erroneously on the phylogenetic tree. The method applied uses the MSA and the corresponding phylogeny to categorise rates amongst sites, using an approach we have previsouly published for mammalian data , as described in detail the Methods section. In this method of site-stripping we apply the phylogenetic tree (estimated ab initio in this software) and the MSA to classify all sites in the alignment into one of eight categories of mutation rate. These are arbitrary categories from 1-8; with 1 being the most highly conserved sites and 8 being the most highly variable. Essentially, these estimates allow us to select only the most conserved sites for phylogeny reconstruction. Sites are sequentially stripped from the alignments based on their rate of evolution and phylogenies are created based on slower evolving sites. These site-stripped phylogenies are then compared to the species tree. Using two independent methods of comparison we determined whether the resultant stripped trees had topologies significantly similar to the species phylogeny. The "root mean squared deviation", or RMSD, method is restricted to binary trees , see Additional File 4 for full set of results. Therefore we also employed the SH method of comparing phylogenies , see Additional File 5 for full set of results. For a full description of the RMSD statistic used here , see the corresponding section in the Methods. Using this approach we could identify the signature of LBA in the Ph20 dataset alone, see Table 1 for summary.
2. Analysis of selective pressures using codon models of evolution
Following analysis of the phylogenies of these reproductive genes, we determined the selective forces at work on these 10 genes (11 datasets). Only those genes passing the data quality tests were analyzed here (i.e. 10 genes), see Table 1. In the case of Catsper1, we have analyzed the gene at two different evolutionary distances because it contains high levels of insertion and deletion events. The two datasets for Catsper1 are: exon 1 from the primates only, and, the entire gene from only distant mammalian groups. Hence the number of datasets is 11, and the number of genes tested is 10. The alignments in all cases reached significant levels following randomization tests (z-scores > 1000 in all cases, a z-score of greater than 5 is typically taken as significant).
Summary of the results of the site-specific analysis: in each case the most significant model was M8
# Positively selected Sites
p0 = 0.92632 p = 0.37637
q = 0.60688
p1 = 0.07368 ω = 3.94326
Catsper1_Exon1 (primates only)
p0 = 0.82736 p = 0.13661
q = 0.03850
p1 = 0.17264 ω = 3.13071
(non-primate mammals only)
p0 = 0.83315 p = 0.34233
q = 0.51278
p1 = 0.16685 ω = 3.26879
p0 = 0.98023 p = 0.04796
q = 0.32286
p1 = 0.01977 ω = 4.09285
p0 = 0.87658 p = 0.56141
q = 0.83349
p1 = 0.12342 ω = 2.20500
p0 = 0.85067 p = 0.41864
q = 0.32952
p1 = 0.14933 ω = 12.21841
p0 = 0.95102 p = 0.16339
q = 0.98823
p1 = 0.04898 ω = 2.60992
p0 = 0.97236 p = 0.01163
q = 0.00500
p1 = 0.02764 ω = 12.26405
p0 = 0.98807 p = 0.16114
q = 1.12262
p1 = 0.01193 ω = 3.81710
p0 = 0.87339 p = 0.63945
q = 0.75356
p1 = 0.12661 ω = 2.04655
p0 = 0.91489 p = 0.30029
q = 0.77328
p1 = 0.08511 ω = 1.92305
Summary of lineage-specific positive selection detected.
Species tested as Foreground
ModelA v M1
ModelA v M1
ModelA v M1
ModelB v m3Discrtk2
ModelA v M1
ModelB v m3Discrtk2
ModelA v M1
ModelB v m3Discrtk2
ModelA v M1
ModelB v m3Discrtk2
ModelB v m3Discrtk2
ModelB v m3Discrtk2
Possibly the most intriguing result from our entire analysis is that from the Col1a1 protein. According to the microarray study employed here , Col1a1 is highly expressed in the uterus tissue. It is also found in most structural tissues including cartilage, bone, tendon, skin and part of the eye (sclera). It is a member of the group 1 collagen proteins involved in the development of the uterine fibroids . There are two propeptide regions to the Col1a1 gene, denoted N- and C-terminal propeptides. According to studies on Col1a1 function, a role has been established for Col1a1 in spermatogenesis .
Summary of the positively selected sites in the col1a1 gene, their clinical relevance, and, the probability of being located within distance "d" from the nearest disease-implicated site.
Positively selected sites
Human Variant: SNP position
Probability of being d
Genetic code distances between observed character states
A-N = 2
G → C
A-S = 1; S-T = 1; T-A = 1
G → D
A-S = 1; S-T = 1; T-A = 1
G → R
A-S = 1; S-T = 1; T-A = 1
G → S
A-P = 1
G → R
N-S = 1
G → D
N-S = 1
G → S
A-S = 1
G → S
A-S = 1
G → V
A-G = 1; G-S = 1; S-A = 1
G → C
OI-II mild form
A-F = 2; F-Y = 1; Y-A = 2
K-N = 1; N-P = 2; P-K = 2
W → C
C-F = 1; C-L = 2; C-M = 2; F-L = 1; F-M = 2; L-M = 1
P → H
Lineage-specific analysis shows evidence for positive selection in this protein in the rodent ancestor. In total, 2.2% of the sites in the rodent ancestor have ω = 72.73, while the rest of the species are evolving under purifying selection, ω = 0.013. For a summary of site and lineage specific results for Col1a1, see Table 3 and 4. For complete set of results see Additional File 6(d).
Prkar2a (interacts with SEMG2)
Prkar2a is a cAMP dependent protein kinase that is attached to the sperm flagella via regulatory subunit (RII) . Protein tyrosine phosphorylation has been linked with successful fertilization due to hyper-activated sperm motility . This increase in phosphorylation is part of a cAMP dependent pathway that activates protein kinase A .
The PRKA families were previously tested for positive selection using 3 to 4 taxa and site-specific model M8 with no significant results for positive selection reported. With our 17 taxa dataset, we were able to detect that 4.7% of sites were evolving at a rate of ω = 2.60, see Table 3 for summary of details.
Positively selected sites detected in the site-specific analysis of Prkar2a were compared to the human Swiss-Prot sequence (P13861). In total 18 sites were predicted to be positively selected, 17 of these sites occur in the region of the protein associated with dimerization and phosphorylation (2-138), see Figure 5(c). In the Swiss-Prot entry there are a number of residues listed as being modified by phosphoserine. These are positions 58, 78, 80, 99 and phosphothreonine at position 54. The sites estimated to be positively selected from our analysis are: 58, 59, 61, 62, 63, 64, 65, 68, 70, 74, 75, these sites are at or in close proximity to these modified residues.
The regulatory subunit alpha 2 of Prkar2a has been shown in vitro to interact with Semg2. The phosphorylation of Semg2 may lead to its activation into forming a gel matrix in the female reproductive tract. From our analysis it is shown that while Semg2 has positively selected sites dispersed throughout its sequence, whereas the positively selected sites for Prkar2a are localized to the N-terminus region, and the remainder of the gene is under strong purifying selection. Literature has so far not specified an exact phosphorylation site for Semg2, which prevents us from commenting further on its interactions with Prkar2a.
Lineage-specific analysis shows that Prkar2a in the macaque has undergone a greater selective pressure to change when compared with other mammalia in the dataset, with 2.53% of sites evolving at ω = 1.22, see Table 4 for summary of results. For complete set of results for Prkar2a, see Additional File 6(g).
Ph20 (interacts with ZP2 and ZP3)
Ph20 is expressed in the testis and found in the acrosome of the sperm. It is also codes for a receptor that is involved in the sperm to zona pellucida (ZP) adhesion .
Previous analysis conducted on this protein involved 6 taxa . Here we have increased the number of taxa to 11. We have omitted the carnivores from our analysis of Ph20 as the sequences were spurious. We found evidence for LBA in the Ph20 dataset. By removing fast evolving sites a fully resolved gene phylogeny is obtained. This gene tree now is in agreement with the ideal species phylogeny (.
Lineage-specific analysis shows that guinea pig is under positive selection, with 6.1% of sites with ω = 12.57 while all other species in the background are evolving at ω = 0.14 or neutrally, see Table 4. The 39 positively selected sites were then compared to the human Swiss-Prot sequence (P38567), see Figure 5(b) for results. Catalytically important resides 146, 148, 211 284 and 287 when mutated result in a reduction in, or loss of, activity . It has been shown experimentally that mutations in the region of this active site significantly reduce or completely block the function of this protein . Our results show that 3 of the positively selected sites, 155, 272, 273, are in close proximity to these regions. Another 5 positively selected sites: 83, 155, 252, 353 and 391 are close to glycosylation sites, see Figure 5(b). These sites when modified are known to change the structure and function of the Ph20 protein. For complete set of results for Ph20 see Additional File 6(e). These results are of significance as the Ph20 protein changes position in the sperm during the different stages of the fertilization process. In guinea pig Ph20 protein is known to migrate from the post acrosomal membrane to the inner acrosomal membrane . Thus finding these positively selected sites in close proximity to these glycosylation sites in guinea pig suggests that these sites have been selected to modify the Ph20 structure more effectively thus increasing the chance of capacitation.
SP56 (interacts with ZP2 and ZP3)
The binding of sperm to the zona pellucida (ZP) is crucial for gamete formation to take place. The exact mechanisms of this process are still to be uncovered therefore any predictions on important residues will greatly improve knowledge by directing mutational studies. SP56 has been shown through photoaffinity cross-linking experiments to have a specific binding affinity for ZP3 . Therefore it is believed to play an important role in the binding of sperm to the ZP matrix. Experiments have shown that during capacitation SP56 is released from the acrosomal matrix and becomes situated in the sperm head membrane, enabling it to act as a ZP3 binding protein .
Here we have found 8 positions in the SP56 protein that are under positive selection (ω = 3.82) following site-specific analysis. These sites were compared to the human SP56 entry in Swiss-Prot (Q13228) to determine possible links to function. One of these 8 positively selected sites is position 122, regarded as a SNP number (rs35396382) in dbSNP database . Although further experimental work needs to be conducted to decipher the clinical association of this position, it is extremely interesting that our most significant positively selected site also displays variation in the population, especially given the overall high level of conservation in this gene. For summary of results see Tables 3 and 4, and for full set of results for this gene see Additional File 6(i).
Zona pellucida (ZP) proteins form the complex glycoprotein coat that surrounds the oocyte . These ZP proteins have been shown to be under strong pressure to change, and results have been published on both site and lineage analyses . Here we have expanded the analysis of ZP2 to include 18 taxa (maximum previously tested = 8 ). We have also applied more complex models of evolution and have sampled deeper branches on the phylogeny including a representative of the Afrotheria - elephant.
In this case, the results of our larger dataset and more complex models show that the values of ω determined here vary slightly when compared to previous analyses . This previous test showed 4.7% of sites to have ω = 2.5, increasing the size of the dataset in this study results in 52 sites in ZP2 that have an ω value of 2.05. See Additional File 6(j) for complete results.
Positively selected sites were compared to the human Swiss-Prot entry for ZP2 (Q05996) to identify possible function for these sites, see Figure 5(d). ZP2 contains 7 carbohydrate chains situated between sites 87-462, these are important for the sperm to bind to the ZP of the egg coat . Of the 46 sites identified to be under positive selection, 23 fall between positions 66-257, this region contains 5 of the binding domains of the carbohydrate chains. The clustering can be seen more clearly in Figure 5(d). Another cluster of positively selected sites (10 sites in total) occurs in the propeptide region (641-745). It has been suggested that upon the cleavage of the propeptide region, the mature ZP2 protein plays a role in the prevention of polyspermy .
Analysis of site-specific evolution in ZP3 identified 48 positively selected sites. Of specific interest are positively selected positions 329, 330, 332, 336, 338, 339, as these sites were in close proximity to identified sperm binding sites (329-334) , see Table 3. The furin cleavage site is identified at position (350-353), and the propeptide domain at position (351-424). When cleavage takes place the ZP3 undergoes a conformational change that inhibits any further sperm binding to the coat thus preventing polyspermy . Of the 48 positively selected sites identified, 10 fall within the propeptide domain, with an additional 12 occurring close to the vicinity of the furin cleavage and sperm binding sites, thus suggesting that there is a pressure to improve binding and prevent polyspermy. For complete set of results for ZP3, see Additional File 6(k).
Adam2 (Fertilin β)
Adam2 is a cell adhesion molecule that plays a fundamental role in the final binding of sperm to the oocyte membrane . Indirect interactions have been shown with female proteins CD9 . (We have not continued further analysis on CD9, as it failed the likelihood mapping test).
Previous results have been published reporting positive selection using site-specific analysis on 6 taxa . Here we have included 12 taxa for Adam2 and we have investigated the possible functional implications of positively selected sites found. In the site-specific analysis we find 7.3% of sites with ω = 3.94, this corresponds to 45 sites in total, see Table 3. Comparison of these positions to human Swiss-Prot Adam2 sequence (Q99965), we determine that 39/45 positively selected sites are situated in the C-terminus region. On closer investigation of these sites we find that 12/45 positively selected sites occur in the disintegrin domain (position 384-473). The disintegrin domain has been shown to be involved in the binding of Adam2 to the oocyte . A cysteine-rich domain occurs between (477-606), 16/45 positively selected sites fall in this region. It has been suggested for Adam12, (another member of the Adam family of proteins), that the cysteine-rich domain plays a role in mediating the cellular interactions via syndecans and integrin , a similar role for this domain in Adam2 can be postulated. Overall the results for Adam2 suggest a selective pressure for increased binding of Adam2 to the oocyte regardless of species of origin. For a complete set of results and LRTs for Adam2, see Additional File 6(a).
Catsper1 is involved in regulating the calcium cation channel in sperm flagella, the result of which is movement of sperm . Previous studies on Catsper1 exon 1 have been performed . We intended to expand our analysis to span all exons and expand the data set to include a variety of mammalia. However, the exon 1 of non-primate mammalia is so highly variable that an accurate alignment cannot be constructed. The remaining exons were highly conserved across all species. We therefore split our catsper1 dataset into two sections each of which produced a good quality alignment for analysis, (1) exon1 of catsper1 for the primates, and (2) entire catsper1 gene for non-primate mammalia.
(a) Catsper1 Exon 1 primates
Site-specific analysis of this protein identified 17% of the protein under positive selection with ω = 3.13. Previous analysis of this exon showed positive selection on indel substitutions in this gene . The positively selected sites are situated throughout exon1, little is known about the functional significance of these sites. However, it is known that exon 1 has a significant role to play in altering the rate of calcium ion channel inactivation. Different lengths in the N-terminus result in different rates of channel inactivation, where a long terminus results in a longer time to activation than the shorter terminus. This is described most effectively by the ball and chain mechanism described in . See Additional File 6(b) for complete results. These results show the importance of this protein, and specifically the first exon, for reproductive success.
(b) Catsper1 entire gene non-primate mammals
Our site-specific analysis identified 16.7% of the sites under positive selection with an ω = 3.27, see Table 3. These sites all cluster in exon 1. While the rodent ancestor appears to be under positive selection with 4.47% of its sites evolving at ω = 999, see Additional File 6(c) for complete set of results. A previous study of 9 rodent species, including Mus musculus individuals from 4 different populations, has shown that within the rodent order there has been a continued pressure to evolve, with positive selection for indel substitutions in exon1 of the Catsper1 gene .
A member of the family of semenogelin genes, Semg2 is involved in the formation of a postcopulatory plug . Previously, positive selection has been reported for both site-specific and lineage-specific analysis for Semg2 [9, 45]. We have expanded the data set from previous analyses to incorporate more species.
In our site-specific analysis, we found that 2.7% of our sites had an ω value of 12.26, see Table 3.
We have performed a novel functional analysis of these positively selected sites by comparing them to the human Semg2 sequence (Q02383) in the Swiss-Prot database. This is a step not previously taken by other studies of Semg2. A striking pattern emerged - all known domains of this protein have several positively selected sites. There is a probable glycosylation site at position 272, which is located close to a large stretch of positively selected sites (positions 262 to 289). It is so far unknown how significant this glycosylation site is in Semg2 and whether it plays a role in modifying the protein to form a copulatory plug. However, the results indicate that this protein, and in particular the region around the glycosylation site, has been under significant pressure to change.
A complete set of results for Semg2 is given in Additional File 6(h). The lineage-specific results are not described here in detail as lineage analyses have been carried out previously on the primate Semg2 gene [9, 45]. It has been shown recently that the rate of evolution for this protein varies depending on the level of sperm competition . Our results are in agreement with this finding, thus further verifying our approach.
Two isoforms of this protein have been identified; we have focused on isoform 1 in the mammalia, as isoform 2 contains an additional human specific region between residues 34-52. To date the exact mechanisms of this transmembrane receptor are unknown. This protein is not well characterized biochemically and its function cannot be verified as reproduction related, therefore we only discuss the results briefly below.
On site-specific analysis of this protein we determined that 30 of the sites are under positive selection (ω = 12.22), see Table 3. From analysis of the sites on the Swiss-Prot entry for human Porimin (Q8N131), we could determine that two positively selected sites (146 and 147), were found in a highly conserved region and fall in close proximity to the N-linked glycosylation site. For complete set of results for Porimin, see Additional File 6(f).
Testing for phylogenetic signal and biases, such as amino acid composition bias and LBA, indicated that there was adequate phylogenetic signal for 10 of the genes and in general no evidence of systematic biases. On testing for LBA, Ph20 was the only protein in this dataset that displayed the typical signature of this bias with gene and species tree agreement being maximized with the removal of the fastest evolving categories. This would suggest that while germ line generation times vary greatly in the dataset, the effect of the resultant LBA does not impact on the sequence data to any great extent (1/11datasets).
Selective pressures for the reproductive proteins studied here are heterogeneous. All proteins exhibited regions of strong conservation proving the importance of maintaining structural stability and overall function in these proteins. All but 1 protein (Adam2) exhibited evidence of positive selection in specific lineages, and all proteins without exception exhibited positive selection in regions of catalytic/functional importance. For SP56 and Col1a1 the site-specific results are entirely novel. The lineage-specific results described here for Prkar2a and Catsper1 exon 1 in primates, are also novel. We have shown that, in the case of Catsper1, there is a fundamental protein functional shift between new world monkeys and old world monkeys. The Dn/Ds measurement applied here assumes that neutral substitution rate is akin to Ds, therefore no selection on silent sites. There have been many publications of late to the contrary therefore we are mindful of examining the rate of silent substitution in all our analyses [46, 47].
For the reproductive genes in our dataset, we show that lineages evolve at unique rates and at functionally crucial sites, specifically those involved in phosphorylation. We have also shown that a number of these proteins (Col1a1 and Catsper1) show positive selection for example in the ancestral rodent lineage and evidence of purifying selection in the subsequent divergent species.
Overall our analyses of these reproductive proteins show how important it is to carefully examine data for systematic biases prior to testing for lineage and/or site specific positive selection. We have also demonstrated the importance of including large numbers of taxa/lineages in these analyses. This finding was highlighted in our analysis of Prkar2a where previous analysis of this protein had included only 4 taxa and therefore reported a negative result. We do not observe any large-scale effect of germ line generation time in our dataset, with only 1 protein (Ph20) with evidence of long branch attraction. The results of Col1a1 indicate that the positively selected sites may have been of such importance for this protein that neighboring mutated sites may have been maintained in the population despite their propensity for causing disease. The location of positively selected sites determined using this approach and in regions of functional importance in the proteins in this dataset, provides us with further evidence of the link between functional shift and positive selection.
The data analyzed in this study consist of homologous reproductive genes from a variety of mammalian genomes. Genes were identified as being reproduction related from literature searches, analysis of protein interaction networks (iHOP)  and expression (microarray) data . The microarray expression data used is from normal human tissues. We have also included a more in-depth analysis of previously identified cases of positive selection in reproductive proteins. A list of all data used in this study are available in Additional File 7, the total number of genes analyzed was 10. Homologs of all 10 reproduction related genes were identified in mammalian genomes that span the entire phylogeny of mammals, see Figure 1. For each of the reproduction related genes, the alignment of homologs contained between 10 and 18 species, and the alignment length varied between 351 and 4374 base pairs.
Protein coding sequences for the reproductive proteins were retrieved by the combination of two methods; Ensembl and Blast searches. Orthologous coding sequences from all available completed mammalian genomes were retrieved from the Ensembl database . These orthologs had been identified previously by performing a genome-wide reciprocal WUBlastp+SmithWaterman search of each gene across all completed genomes. To include those mammalia that were not present in Ensembl a BlastP search was conducted on all the human amino acid sequences from each gene against the Swiss-Prot database.
Primates: Human (Homo sapiens), Chimp (Pan troglodytes), Bonobo (Pan paniscus), Bornean Orangutan (Pongo pygmaeus), Sumatran Orangutan (Pongo abelii), Gorilla (Gorilla gorilla), Rhesus Macaque (Macaca mulatta), Crab eating Macaque (Macaca fascicularis), Pigtailed Macaque (Macaca nemestrina), Bonnet monkey (Macaca radiata), Baboon (Papio hamadryas), Mantled Guereza (Colobus guereza), Vervet Monkey (Cercopithecus aethiops), Angolan Talapoin (Miopithecus talapoin), Squirrel Monkey (Saimiri sciureus), Cotton top tamarin (Saguinus oedipus), Common Marmoset (Callithrix jacchus), Marmoset/Callithrix (Callithrix-jacchus), Spider Monkey (Ateles geoffroyi), Bushbaby (Otolemur garnettii), Common woolly monkey (Lagothrix lagotricha), Ringtailed lemurs (Lemur catta), Kloss Gibbon (Hylobates klossii), Common/Lar Gibbon (Hylobates lar), Night/owl Monkey (Aotus trivirgatus boliviensis). Scandentia: Treeshrew (Tupaia belangeri). Rodents: Mouse (Mus musculus), Rat (Rattus norvegicus), Guinea pig (Cavia porcellus), Ground Squirrel/Squirrel (Spermophilus tridecemlineatus). Lagomorpha: Rabbit (Oryctolagus cuniculus), Pika (Ochotona princes). Eulipotyphila: Hedgehog (Erinaceus europaeus), Shrew (Sorex araneus). Carnivores: Cat (Felis catus), Dog (Canis familiaris). Artiodactyla: Cow (Bos taurus), Pig (Sus scrofa). Perisodactyla: Horse (Equus caballus). Proboscidea: Elephant (Loxodonta africana). Monotremata: Platypus (Ornithorhynchus anatinus). Didelphimorphia: Opossum (Monodelphis domestica).
Multiple Sequence Alignment (MSA)
All coding sequences were translated into their corresponding amino acid sequences using in-house translation software. Gene family alignments were generated at protein level using ClustalX 1.83.1 using default parameter settings . The corresponding nucleotide gene family datasets were aligning based on their protein alignments using in-house software. Each gene family alignment was manually edited using Se-Al  to remove any ambiguous regions.
Nucleotide composition bias, amino acid composition bias and likelihood mapping tests
TreePuzzle 5.2  performs a chi-square test that compares the amino acid composition of each sequence to the frequency distribution assumed in the General Time Reversible (GTR) and Jones Taylor Thornton (JTT) models . Ideally no species should fail this test, however, where two species fail and are thus drawn together on a tree, these sequences are excluded. Using the likelihood mapping method, each tree is disassembled into its constituent quartets and the support for each possible quartet is assessed. If the data contains phylogenetic signal then the likelihood of all three possible relationships for that quartet will be equally likely, these are represented by the three tips of the triangle, and the majority of the signal will be in these tip regions. Otherwise, the vertices and central region will be most heavily populated by supporting quartets.
Phylogenetic trees were constructed using MrBayes v3.2.1  and the amino acid sequences. Amino acid sequences were used in order to vitiate the effects of base and codon compositional biases. The substitution model was selected following model testing using Modelgenerator version 85 . The selected model was JTT, the GTR rate model was implemented and the first 20000 trees for each gene were discarded as "burnin". A majority rule consensus tree from the remaining trees sampled was constructed for each gene. The parameter settings for each gene phylogeny are summarized in Additional File 8.
Site-stripping for significance
To test for long branch attraction (LBA) we applied the slow-fast approach of Brinkman and Phillipe . We implemented the rate categorisation in a maximum likelihood framework in TreePuzzle 5.2 . This software takes the alignment as input and generates ab initio phylogenetic trees. It then calculates the rate of mutation for each site in the alignment. The software specifies 8 arbitrary categories of site: each one of these categories contains some portion of the alignment. In this manuscript 8 is the most rapidly evolving (for example every lineage has a different character state for that character), and category 1 is the most slowly evolving (for example each lineage has the same/identical character state for that character). Sites are then progressively removed from the protein MSA according to their evolutionary rate, and at each stage a new phylogenetic tree is constructed based on this slightly reduced dataset. The difference between the new topology created on a reduced alignment and the original topology reconstructed based on the entire alignment are then compared in a statistical framework to determine which fits the data best (SH Test 2, see below) or which is most similar to the species phylogeny (RMSD Test 1, see below). At each stage we employ MrBayes  to perform the phylogenetic reconstruction using the aforementioned settings.
Tests of the difference between two trees
Test 1: Nodal distance calculation
TOPD/FMTS v 3.3  calculates the distance between the site-stripped trees and the 'ideal' tree. The 'ideal' tree used for each gene was a pruned version of the canonical species tree as seen in Figure 1. A distance matrix is derived by counting the number of nodes that separate each of the taxa in a tree. A distance matrix is calculated for each site-stripped tree as compared to the ideal species tree. The nodal distance score is obtained by calculating the RMSD of the matrices. If both trees are identical the RMSD value would be 0, indicating no distance between them. This figure increases the more distance there is between the two trees.
Test 2: Shimodaira-Hasegawa (SH) statistical test of two trees
For each gene MSA, complete and site-stripped, a comparison of the likelihood of the estimated Bayesian phylogeny for that alignment with the likelihood of its corresponding 'ideal' species tree was carried out using the SH test  implemented in TreePuzzle 5.2  to determine which tree was significantly the best-fit tree for the alignment.
Selective Pressure Analysis
PAML 4.3 [57, 58] uses a ML method of calculating ω for site-specific and lineage-site specific changes. Codeml, part of the PAML 4.3 package [57, 58], applies a series of models to our data, with each model differing from the previous with the addition of more complex parameters. The simplest model is M0, and it calculates an ω value over the entire alignment. This model assumes that all sites and all lineages are evolving at the same rate. Model M3 is an extension of M0 and allows all ω values to vary freely. There are two variations of the M3 model, m3(k = 2) discrete which allows two variable classes of sites and m3(k = 3) which allows three classes of site. M1 is a neutral model that allows two parameter estimates for proportion of sites where ω = 0 or ω = 1. M2 is the selection model, it allows three parameters where ω = 0 or ω = 1 or ω is estimated and free to be greater than 1. M7, is the beta model, it allows ten different site classes for ω between 0 and 1. M7 is compared against the more parameter rich M8 (beta &omega >1). M8 allows 10 different site classes but contains an additional parameter whereby the 11th ω is free to vary between 0 and >1. M8a(beta &omega = 1) is null hypothesis of model 8. Model A & Model B are models that allow testing of ω variation in lineage-site analyses. Model A is an extension of M1 and Model B is a more parameter rich extension of m3(k = 2). We have also implemented model A null which is denoted as modelA1 elsewhere. Model A null is compared to model A in an LRT as per Additional File 9. Only statistically significant models for the data are taken into account. Statistically significant results were decided by calculating the difference in log likelihood or, lnL, scores between models and their more parameter rich extensions in a likelihood ratio test (LRT) as described previously in [17, 58]. If the likelihood score was exceeded the critical χ2 values, then the result was significant. See Additional File 9 for full set of LRTs performed.
In silico analysis of positively selected sites
Sites under positive selection (ω > 1) were estimated using the empirical Bayes methods in the site-specific and lineage specific analysis performed. The methods used were naúve empirical Bayes (NEB) and Bayes empirical Bayes (BEB) . Swiss-Prot is a protein sequence database that provides description of the function of a protein, the domain structures, post-translational modifications and variants. Significant sites, verified through close examination of the MSAs and codeml output using alignment visualisation software Se-AL , were compared with unaligned human amino acid sequence taken from Swiss-Prot. These sites were examined to see whether or not they lay in catalytically important regions of the protein.
Bayes Empirical Bayes
Coding DNA sequence
Non-synonymous substitution per non-synonymous site
Synonymous substitution per synonymous site
Frequency of amino acids
gamma distributed sites rates across sites
General Time Reversible
Jones, Taylor and Thornton
Long Branch Attraction
Likelihood Ratio Test
Multiple Sequence Alignment
data not available
Naïve Empirical Bayes
No statistical difference
Osteogenesis imperfecta type -2/-3/-4
Root Mean Squared Deviation
Single nucleotide polymorphism.
We would like to thank the Irish Research Council for Science, Engineering and Technology (Embark Initiative Postgraduate Scholarship to NBL and CCM) for financial support and DCU School of Biotechnology scholarship (for TAW). We would like to thank the SFI/HEA Irish Centre for High-End Computing (ICHEC) for processor time and technical support for both phylogeny reconstruction and selection analysis. We would like to thank the SCI-SYM centre for processor time.
- Aagaard JE, et al: Rapidly evolving zona pellucida domain proteins are a major component of the vitelline envelope of abalone eggs. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (46): 1730-17307. 10.1073/pnas.0603125103.View ArticleGoogle Scholar
- Wyckoff GJ, Wang W, Wu CI: Rapid evolution of male reproductive genes in the descent of man. Nature. 2000, 403 (6767): 304-309. 10.1038/35002070.View ArticlePubMedGoogle Scholar
- McInerney JO: The causes of protein evolutionary rate variation. Trends Ecol Evol. 2006, 21 (5): 230-2. 10.1016/j.tree.2006.03.008.View ArticlePubMedGoogle Scholar
- Zhou T, Drummond DA, Wilke CO: Contact density affects protein evolutionary rate from bacteria to animals. J Mol Evol. 2008, 66 (4): 395-404. 10.1007/s00239-008-9094-4.View ArticlePubMedGoogle Scholar
- Li WH, et al: Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylogenet Evol. 1996, 5 (1): 182-7. 10.1006/mpev.1996.0012.View ArticlePubMedGoogle Scholar
- Gaut BS, et al: Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J Mol Evol. 1992, 35 (4): 292-303. 10.1007/BF00161167.View ArticlePubMedGoogle Scholar
- Ohta T: An examination of the generation-time effect on molecular evolution. Proc Natl Acad Sci USA. 1993, 90 (22): 10676-80. 10.1073/pnas.90.22.10676.PubMed CentralView ArticlePubMedGoogle Scholar
- Swanson WJ, Vacquier VD: The rapid evolution of reproductive proteins. Nature reviews Genetics. 2002, 3 (2): 137-144. 10.1038/nrg733.View ArticlePubMedGoogle Scholar
- Dorus S, et al: Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nature genetics. 2004, 36 (12): 1326-1329. 10.1038/ng1471.View ArticlePubMedGoogle Scholar
- Anisimova M, Bielawski JP, Yang Z: Accuracy and power of bayes prediction of amino acid sites under positive selection. Molecular biology and evolution. 2002, 19 (6): 950-958.View ArticlePubMedGoogle Scholar
- Shyamsundar R, et al: A DNA microarray survey of gene expression in normal human tissues. Genome biology. 2005, 6 (3): R22-10.1186/gb-2005-6-3-r22.PubMed CentralView ArticlePubMedGoogle Scholar
- He Z, et al: Expression of Col1a1, Col1a2 and procollagen I in germ cells of immature and adult mouse testis. Reproduction. 2005, 130 (3): 333-41. 10.1530/rep.1.00694.View ArticlePubMedGoogle Scholar
- Murphy WJ, et al: Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001, 294 (5550): 2348-51. 10.1126/science.1067179.View ArticlePubMedGoogle Scholar
- Shimodaira H, Hasegawa M: CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 2001, 17 (12): 1246-7. 10.1093/bioinformatics/17.12.1246.View ArticlePubMedGoogle Scholar
- Schmidt HA, et al: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England). 2002, 18 (3): 502-504. 10.1093/bioinformatics/18.3.502.View ArticleGoogle Scholar
- Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proceedings of the National Academy of Sciences of the United States of America. 1997, 94 (13): 6815-6819. 10.1073/pnas.94.13.6815.PubMed CentralView ArticlePubMedGoogle Scholar
- Loughran NB, et al: The phylogeny of the mammalian heme peroxidases and the evolution of their diverse functions. BMC evolutionary biology. 2008, 8: 101-10.1186/1471-2148-8-101.PubMed CentralView ArticlePubMedGoogle Scholar
- Puigbo P, Garcia-Vallve S, McInerney JO: TOPD/FMTS: a new software to compare phylogenetic trees. Bioinformatics (Oxford, England). 2007, 23 (12): 1556-1558. 10.1093/bioinformatics/btm135.View ArticleGoogle Scholar
- Behera MA, et al: Thrombospondin-1 and thrombospondin-2 mRNA and TSP-1 and TSP-2 protein expression in uterine fibroids and correlation to the genes COL1A1 and COL3A1 and to the collagen cross-link hydroxyproline. Reproductive sciences (Thousand Oaks, Calif). 2007, 14 (8 Suppl): 63-76.View ArticleGoogle Scholar
- Marini JC, et al: Consortium for osteogenesis imperfecta mutations in the helical domain of type I collagen: regions rich in lethal mutations align with collagen binding sites for integrins and proteoglycans. Human mutation. 2007, 28 (3): 209-221. 10.1002/humu.20429.PubMed CentralView ArticlePubMedGoogle Scholar
- Oyen O, et al: Human testis cDNA for the regulatory subunit RII alpha of cAMP-dependent protein kinase encodes an alternate amino-terminal region. FEBS letters. 1989, 246 (1-2): 57-64. 10.1016/0014-5793(89)80253-4.View ArticlePubMedGoogle Scholar
- Leclerc P, de Lamirande E, Gagnon C: Cyclic adenosine 3',5'monophosphate-dependent regulation of protein tyrosine phosphorylation in relation to human sperm capacitation and motility. Biology of reproduction. 1996, 55 (3): 684-692. 10.1095/biolreprod55.3.684.View ArticlePubMedGoogle Scholar
- Hunnicutt GR, Primakoff P, Myles DG: Sperm surface protein PH-20 is bifunctional: one activity is a hyaluronidase and a second, distinct activity is required in secondary sperm-zona binding. Biology of reproduction. 1996, 55 (1): 80-86. 10.1095/biolreprod55.1.80.View ArticlePubMedGoogle Scholar
- Swanson WJ, Nielsen R, Yang Q: Pervasive adaptive evolution in mammalian fertilization proteins. Molecular biology and evolution. 2003, 20 (1): 18-20.View ArticlePubMedGoogle Scholar
- Arming S, et al: In vitro mutagenesis of PH-20 hyaluronidase from human sperm. European journal of biochemistry/FEBS. 1997, 247 (3): 810-814. 10.1111/j.1432-1033.1997.t01-1-00810.x.View ArticlePubMedGoogle Scholar
- Phelps BM, Myles DG: The guinea pig sperm plasma membrane protein, PH-20, reaches the surface via two transport pathways and becomes localized to a domain after an initial uniform distribution. Developmental biology. 1987, 123 (1): 63-72. 10.1016/0012-1606(87)90428-3.View ArticlePubMedGoogle Scholar
- Bleil JD, Wassarman PM: Identification of a ZP3-binding protein on acrosome-intact mouse sperm by photoaffinity crosslinking. Proceedings of the National Academy of Sciences of the United States of America. 1990, 87 (14): 5563-5567. 10.1073/pnas.87.14.5563.PubMed CentralView ArticlePubMedGoogle Scholar
- Kim KS, Cha MC, Gerton GL: Mouse sperm protein sp56 is a component of the acrosomal matrix. Biology of reproduction. 2001, 64 (1): 36-43. 10.1095/biolreprod64.1.36.View ArticlePubMedGoogle Scholar
- Sherry ST, Ward M, Sirotkin K: dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 1999, 9 (8): 677-9.PubMedGoogle Scholar
- Gupta SK, et al: Structural and functional attributes of zona pellucida glycoproteins. Society of Reproduction and Fertility supplement. 2007, 63: 203-216.PubMedGoogle Scholar
- Swanson WJ, et al: Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98 (5): 2509-2514. 10.1073/pnas.051605998.PubMed CentralView ArticlePubMedGoogle Scholar
- Chakravarty S, et al: Relevance of glycosylation of human zona pellucida glycoproteins for their binding to capacitated human spermatozoa and subsequent induction of acrosomal exocytosis. Molecular reproduction and development. 2008, 75 (1): 75-88. 10.1002/mrd.20726.View ArticlePubMedGoogle Scholar
- Shabanowitz RB, O'Rand MG: Characterization of the human zona pellucida from fertilized and unfertilized eggs. Journal of reproduction and fertility. 1988, 82 (1): 151-161.View ArticlePubMedGoogle Scholar
- Wassarman PM: Mammalian fertilization: molecular aspects of gamete adhesion, exocytosis, and fusion. Cell. 1999, 96 (2): 175-183. 10.1016/S0092-8674(00)80558-9.View ArticlePubMedGoogle Scholar
- Patrat C, et al: Zona pellucida from fertilised human oocytes induces a voltage-dependent calcium influx and the acrosome reaction in spermatozoa, but cannot be penetrated by sperm. BMC developmental biology. 2006, 6: 59-10.1186/1471-213X-6-59.PubMed CentralView ArticlePubMedGoogle Scholar
- Primakoff P, Hyatt H, Tredick-Kline J: Identification and purification of a sperm surface protein with a potential role in sperm-egg membrane fusion. The Journal of cell biology. 1987, 104 (1): 141-149. 10.1083/jcb.104.1.141.View ArticlePubMedGoogle Scholar
- Evans JP: The molecular basis of sperm-oocyte membrane interactions during mammalian fertilization. Human reproduction update. 2002, 8 (4): 297-311. 10.1093/humupd/8.4.297.View ArticlePubMedGoogle Scholar
- Wong GE, et al: Analysis of fertilin alpha (ADAM1)-mediated sperm-egg cell adhesion during fertilization and identification of an adhesion-mediating sequence in the disintegrin-like domain. The Journal of biological chemistry. 2001, 276 (27): 24937-24945. 10.1074/jbc.M101637200.View ArticlePubMedGoogle Scholar
- Iba K, et al: The cysteine-rich domain of human ADAM 12 supports cell adhesion through syndecans and triggers signaling events that lead to beta1 integrin-dependent cell spreading. The Journal of cell biology. 2000, 149 (5): 1143-1156. 10.1083/jcb.149.5.1143.PubMed CentralView ArticlePubMedGoogle Scholar
- Carlson AE, et al: CatSper1 required for evoked Ca2+ entry and control of flagellar function in sperm. Proceedings of the National Academy of Sciences of the United States of America. 2003, 100 (25): 14864-14868. 10.1073/pnas.2536658100.PubMed CentralView ArticlePubMedGoogle Scholar
- Podlaha O, Zhang J: Positive selection on protein-length in the evolution of a primate sperm ion channel. Proceedings of the National Academy of Sciences of the United States of America. 2003, 100 (21): 12241-12246. 10.1073/pnas.2033555100.PubMed CentralView ArticlePubMedGoogle Scholar
- Avenarius MR, et al: Human male infertility caused by mutations in the CATSPER1 channel protein. American Journal of Human Genetics. 2009, 84 (4): 505-510. 10.1016/j.ajhg.2009.03.004.PubMed CentralView ArticlePubMedGoogle Scholar
- Podlaha O, et al: Positive selection for indel substitutions in the rodent sperm protein catsper1. Molecular biology and evolution. 2005, 22 (9): 1845-1852. 10.1093/molbev/msi178.PubMed CentralView ArticlePubMedGoogle Scholar
- Peter A, et al: Semenogelin I and semenogelin II, the major gel-forming proteins in human semen, are substrates for transglutaminase. European journal of biochemistry/FEBS. 1998, 252 (2): 216-221. 10.1046/j.1432-1327.1998.2520216.x.View ArticlePubMedGoogle Scholar
- Hurle B, et al: Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage. Genome research. 2007, 17 (3): 276-286. 10.1101/gr.6004607.PubMed CentralView ArticlePubMedGoogle Scholar
- Chamary JV, Hurst LD: The price of silent mutations. Sci Am. 2009, 300 (6): 46-53. 10.1038/scientificamerican0609-46.View ArticlePubMedGoogle Scholar
- Hurst LD, Pal C: Evidence for purifying selection acting on silent sites in BRCA1. Trends Genet. 2001, 17 (2): 62-5. 10.1016/S0168-9525(00)02173-9.View ArticlePubMedGoogle Scholar
- iHOP: The iHOP database.
- Ensembl: Ensembl. cited, [http://www.ensembl.org]
- Chenna R, et al: Multiple sequence alignment with the Clustal series of programs. Nucleic acids research. 2003, 31 (13): 3497-3500. 10.1093/nar/gkg500.PubMed CentralView ArticlePubMedGoogle Scholar
- Rambaut A: Se-AL Sequence alignment editor. Oxford. 1996Google Scholar
- Lanave C, et al: A new method for calculating evolutionary substitution rates. J Mol Evol. 1984, 20 (1): 86-93. 10.1007/BF02101990.View ArticlePubMedGoogle Scholar
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (Oxford, England). 2003, 19 (12): 1572-1574. 10.1093/bioinformatics/btg180.View ArticleGoogle Scholar
- Keane TM, et al: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC evolutionary biology. 2006, 6: 29-10.1186/1471-2148-6-29.PubMed CentralView ArticlePubMedGoogle Scholar
- Brinkmann H, Philippe H: Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol. 1999, 16 (6): 817-25.View ArticlePubMedGoogle Scholar
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-4. 10.1093/bioinformatics/btg180.View ArticlePubMedGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Computer applications in the biosciences: CABIOS. 1997, 13 (5): 555-556.PubMedGoogle Scholar
- Yang ZW, Wong S, Nielsen R: Bayes empirical bayes inference of amino acid sites under positive selection. Molecular biology and evolution. 2005, 22 (4): 1107-1118. 10.1093/molbev/msi097.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.