
Molecular evolution of mammalian genes with epistatic interactions in fertilization

Abstract

Background

Genes that encode proteins associated with sperm competition, fertilization, and sexual conflicts of interest are often among the most rapidly evolving parts of animal genomes. One family of sperm-expressed genes (Zp3r, C4bpa) in the mammalian gene cluster called the regulator of complement activation (RCA) encodes proteins that bind eggs and mediate reproductive success; these genes are therefore expected to show high relative rates of nonsynonymous nucleotide substitution in response to sexual selection in comparison to other genes not involved in gamete binding at fertilization. We tested that working hypothesis by using phylogenetic models of codon evolution to identify episodes of diversifying positive selection. We used a comparative approach to quantify the evidence for episodic diversifying selection acting on RCA genes with known functions in fertilization (and sensitivity to sexual selection), and to contrast them with other RCA genes in the same gene family that function in innate immunity (and are not sensitive to sexual selection).

Results

We expected but did not find evidence for more episodes of positive selection on Zp3r in Glires (the rodents and lagomorphs) or on C4BPA in Primates, in comparison to other paralogous RCA genes in the same taxon, or in comparison to the same orthologous RCA gene in the other taxon. That result was not unique to RCA genes: we also found little evidence for more episodes of diversifying selection on genes that encode selective sperm-binding molecules in the egg coat or zona pellucida (Zp2, Zp3) in comparison to members of the same gene family that encode structural elements of the egg coat (Zp1, Zp4). Similarly, we found little evidence for episodic diversifying selection acting on two other recently discovered genes (Juno, Izumo1) that encode essential molecules for sperm–egg fusion.

Conclusions

These negative results help to illustrate the importance of a comparative context for this type of codon model analysis. The results may also point to other phylogenetic contexts in which the effects of selection acting on these fertilization proteins might be more readily discovered and documented in mammals and other taxa.

Background

Genes that encode molecules expressed on the surfaces of gametes are key to the success of several interactions among males or between males and females, including sperm chemoattraction toward the egg, gamete physiological activation (including the sperm acrosome reaction), sperm binding to the egg coat, and fusion of gametes [26, 52]. Such genes are among the most rapidly evolving parts of animal genomes [34, 80], in part because the gene products are subject to both natural selection associated with fertilization success and sexual selection associated with sperm competition among males or reproductive conflicts of interest between males and females. A frequent outcome of such selection within species is the rapid divergence of protein-coding sequences between closely related species, in part via high relative rates of nonsynonymous nucleotide substitutions that affect the specificity of protein interactions during fertilization [19, 68, 69]. Codon models of nucleotide evolution can be used to identify episodes of diversifying or positive selection associated with specific lineages or specific codons in alignments of protein-coding sequences [4, 5].
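The quantity that these codon models estimate can be summarized compactly. This is a standard summary rather than the exact parameterization used in this study: d_N and d_S are the numbers of nonsynonymous and synonymous substitutions per nonsynonymous and synonymous site, Q(ω) is a codon substitution rate matrix, t_b is the length of branch b, and p_{b,k} is the proportion of codons in ω class k on that branch. The mixture form is the general structure of branch-site random effects models such as aBSREL, in which each branch b may have its own number K_b of ω classes.

$$
\omega = \frac{d_N}{d_S}, \qquad
\omega < 1 \ \text{(purifying)}, \quad
\omega = 1 \ \text{(neutral)}, \quad
\omega > 1 \ \text{(diversifying or positive)}
$$

$$
T_b(t_b) = \sum_{k=1}^{K_b} p_{b,k}\, \exp\!\big(t_b\, Q(\omega_{b,k})\big),
\qquad \sum_{k=1}^{K_b} p_{b,k} = 1
$$

Under this form, an episode of positive selection on branch b corresponds to a class with ω_{b,k} > 1 and p_{b,k} > 0 that fits the data significantly better than a constrained model in which no class on that branch exceeds ω = 1.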

Among mammals, considerable research has focused on genes that encode glycoproteins involved in sperm–egg binding. The mammalian egg coat proteins include two members of the ZP gene family (Zp2 and Zp3) that bind sperm in a selective or species-specific manner (reviewed by [6, 38, 77, 78]). Pairs of ZP2 and ZP3 proteins form heterodimers in antiparallel orientation, with the heterodimers joined to ZP1 polymers that appear to have a structural role in forming the zona pellucida [14, 15, 23, 39, 43]. Several studies have documented high rates of molecular evolution of Zp2 and Zp3 among closely related mammal species, including episodes of diversifying or positive selection on codons in the known sperm-binding domains of rodent genes [66, 67, 69, 72, 73], and population genetic analyses indicate selection on ZP2 and ZP3 in humans ([24, 58]; for counterexamples see [2, 13, 41]).

The identification of the sperm protein(s) responsible for the specificity or selectivity of sperm binding to the egg coat via interactions with ZP2 and ZP3 has been highly contentious (reviewed by [51, 52]). One well-studied candidate gene that was originally identified in mice is the sperm receptor for the zona pellucida called Zp3r (also called sperm protein 56 or Sp56; [76]). In the mouse genome, Zp3r occurs on chromosome 1 in a cluster of protein-coding genes called the regulator of complement activation (RCA; [28]). The mammalian RCA cluster includes two genes that encode the alpha and beta subunits of the C4b-binding protein (C4bpa, C4bpb); both proteins are expressed in plasma and (like many other genes in the RCA cluster) function in the innate immune system [53]. In rodents, these three genes (Zp3r, C4bpa, C4bpb) occur in tandem, and each encodes a series of 3–8 repeated sushi domains (also known as complement control protein or CCP domains). The sushi domains contribute to the formation of folded monomers that associate into functional multimeric proteins via oligomerization of their C-terminal sequences [27, 53]. Functional ZP3R in the sperm acrosome consists of an oligomer of six or more monomers [10], similar to the organization of C4b-binding protein in plasma. The co-occurrence of the three genes together in the RCA cluster, their similar protein-coding domains, and their similar organization into functional protein oligomers suggest that they are descended from a common ancestor within the RCA cluster by a series of gene duplication events (e.g., [35]). In contrast to the innate immune function of its paralogs C4bpa and C4bpb, functional studies show that mouse ZP3R protein binds ZP3 in the egg coat in a species-specific fashion [9, 10, 76]. However, mouse knockout studies showing that Zp3r-null homozygous males are fertile [47, 52] suggest that ZP3R is not essential for gamete binding. One interpretation of those results is that multiple sperm proteins (including ZP3R) contribute to (and have redundant functions in) sperm binding to the zona pellucida.

Two previous studies focused on selection associated with ZP3R-dependent gamete binding, but both analyses misidentified the gene [44, 45, 58]. The RCA cluster in human and other primate genomes includes only two paralogous gene copies that encode sushi domains (C4BPA and C4BPB), and does not include a third gene that is orthologous with the rodent gene Zp3r. Instead, Zp3r is unique to Glires (the rodents and lagomorphs), and descended from C4bpa by a gene duplication event in the common ancestor of mice, rabbits, and their extant relatives [41]. Consequently, Rohlfs et al. [58] in fact documented strong but unexpected evidence for linkage disequilibrium (LD) between the human gene C4BPA in the RCA cluster and the egg coat gene ZP3. This evidence is unexpected because it implies that human C4BPA is expressed in sperm and mediates gamete binding, which is not a known function or mode of expression for human C4BPA. Other complementary evidence has extended that hypothesis to include human ZP2 coevolution with C4BPA, identified a key codon under selection in all three genes, and showed that covariation of pairs of alleles among those genes has a detectable influence on human fertility [24]. Cagliani et al. [11] found many positively selected codons in their analysis of primate C4BPA, which they ascribed to the immunological (rather than the reproductive) function of that gene. By contrast, the evidence for Zp3r expression and function in the sperm acrosomal vesicle of rodents is clear and well documented, but a codon model analysis of the molecular evolution of Zp3r could only have been applied to an alignment of Zp3r orthologs from Glires (in which this gene occurs next to C4bpa within the RCA cluster), and not outside of that clade. Such an analysis appears not to have been carried out.

Here we analyze the evolution of Zp3r and its paralogs in the RCA gene cluster in Glires and in Primates. We used codon models to identify episodes of positive selection on lineages or codons in alignments of RCA genes that are known or suspected to be involved in sperm–egg binding (Zp3r in Glires, C4BPA in Primates) and alignments of genes that encode sperm receptors in the egg coat (Zp2, Zp3). As a negative control for the contribution of other modes of selection to the evolution of those genes, we contrasted those codon model results against evidence for positive selection on two paralogous genes in the RCA cluster (C4bpb in both taxa; C4bpa in Glires) that are not known to be expressed in gametes or sensitive to sexual selection at fertilization, and two genes that encode structural proteins in the zona pellucida (Zp1, Zp4). As a positive control, we compared those results to models of episodic diversifying selection acting on two genes (Izumo1, Juno) that are known to be required for sperm–egg fusion and are expected to be sensitive to sexual selection at fertilization [22]. We use these comparisons among genes and taxa to test the working hypothesis that sexual selection on these interacting gene products causes high relative rates of nonsynonymous substitution differences among species.

We found that some genes involved in sperm–egg binding in Glires or Primates showed episodes of positive selection, but there was little evidence for more episodes of positive selection on such genes in comparison to the other taxon (in which the same orthologous gene is not known to be expressed in gametes) or in comparison to other genes (that are not involved in gamete binding or gamete fusion). We found some modest evidence for coevolution between sperm- and egg-expressed genes that encode interacting gene products, but this evidence was limited to one specific primate lineage. We offer some speculative interpretation of those surprising (and mostly negative) results, and propose some guidelines for future analyses of these or other genes that mediate gamete interactions under sexual selection.

Results

Episodic diversifying selection on RCA genes Zp3r, C4bpa and C4bpb

We used the adaptive branch-site random effects likelihood (aBSREL) model to identify episodes of selection associated with specific lineages (or times in the evolutionary history of the organisms; Fig. 1), and to test the hypothesis of more episodes of positive selection in genes that encode proteins involved in gamete binding or fusion. We found seven episodes of selection acting on the two RCA genes expressed in sperm, comprising three episodes of positive selection on Zp3r in the deer mouse and both species of Castorimorpha (beaver, kangaroo rat), and four episodes of positive selection on C4BPA in the bush baby, tarsier, and two Old World monkeys (vervet, crab-eating macaque) (Table 1; Fig. 2). Episodes of positive selection included about 8% of the total branch length in the phylogeny for each of those genes and taxa, and a mean of about 8% of codons in each alignment were included in the class of codons under positive selection on those branches (Table 1; see Additional file 1: Appendix 1 for a complete summary of the codon model results). By contrast, we found no episodes of positive selection acting on C4bpb in Glires or C4BPB in Primates (Table 1).
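The share of the tree affected by selection can be expressed as the summed length of branches with a significant aBSREL test divided by the total tree length. The text does not give the exact formula, so the sketch below assumes that definition; the function name, branch labels, and branch lengths are hypothetical and are not data from this study.

```python
def fraction_of_tree_length_under_selection(branches):
    """Share of total tree length on branches flagged as under positive selection.

    `branches` maps a branch label to (length, flagged), where length is in
    expected substitutions per codon and flagged marks a significant test.
    """
    total = sum(length for length, _ in branches.values())
    selected = sum(length for length, flagged in branches.values() if flagged)
    return selected / total


# Hypothetical four-branch example; labels and lengths are illustrative only.
example = {
    "branch_A": (0.05, True),
    "branch_B": (0.03, True),
    "branch_C": (0.40, False),
    "branch_D": (0.52, False),
}
print(round(fraction_of_tree_length_under_selection(example), 3))  # prints 0.08
```

Weighting by branch length rather than counting branches means that a single long terminal branch under selection can account for a larger share of the tree than several short internal branches.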

Fig. 1

Species trees used in the codon model analyses, including common names for species and higher taxon names for clades, from Springer et al. [65] and Fabre et al. [16]. Organism icons from phylopic.org

Table 1 Summary of codon model results using the aBSREL and MEME methods
Fig. 2

Episodes of diversifying positive selection on three genes from the RCA cluster, including Zp3r and C4bpa (in Glires), and C4BPA (in Primates). Zp3r and C4BPA are known or expected to be expressed in gametes and sensitive to sexual selection, but C4bpa is not. Red branches in each phylogeny show lineages under positive selection in aBSREL analyses. Numbered red vertical bars in each gene cartoon show the locations of codons under positive selection in MEME analyses relative to the total length of the alignment (the grey bar), and relative to several different protein coding domain types. No episodes of diversifying selection were found in C4bpb (in Glires) or C4BPB (in Primates; not shown)

Those results, especially the difference between genes involved in fertilization (Zp3r, C4BPA) compared to a gene involved only in innate immunity (C4bpb), might be interpreted as evidence pointing toward specific primates, rodents, or lagomorphs that have experienced sexual selection associated with the specificity of sperm–egg binding at fertilization. However, other comparisons between genes and taxa did not support that interpretation. Specifically, we found four episodes of positive selection on C4bpa in Glires including the Chinese hamster, the internal branch leading to the most recent common ancestor of squirrel and marmot, the internal branch leading to the most recent common ancestor of two species of Mus, and the lineage that includes the most recent common ancestor of hamsters, voles, and deer mice (Fig. 2). Like the results for Zp3r in Glires and C4BPA in Primates, the relative rate of nonsynonymous substitution was high (ω = 15–693, mean = 293) at positively selected codons along those four branches of C4bpa in Glires, and a substantial proportion (1–9%, mean = 5%) of codons in the C4bpa alignment was included in that positively selected class (Table 1).

Because C4bpa in Glires is not known to be involved in fertilization, those episodes of positive selection cannot be ascribed to sexual selection. The contrast between Zp3r evolution (including three episodes of positive selection) and C4bpa evolution (four episodes) does not suggest an especially strong effect of sexual selection on Zp3r in Glires. Similarly, because C4BPA is known to be involved in innate immunity in Primates and is suspected to be involved in fertilization in humans and perhaps in other Primates, the contribution of sexual selection to C4BPA evolution is expected to be evident as more episodes of positive selection on C4BPA in Primates compared to C4bpa in Glires. However, we found the same number of episodes of positive selection on C4BPA in Primates and on C4bpa in Glires. Thus our planned comparisons between genes (in Glires) or between taxa (for C4bpa) do not support the hypothesized role for sexual selection in the evolution of RCA genes.

We used the mixed effects model of evolution (MEME) to identify positive selection associated with specific codons as a complementary way to test the hypothesis of more episodes of positive selection in genes that encode proteins involved in gamete binding or fusion. We found little evidence for a strong contribution of sexual selection toward the number of episodes (codons) of selection on RCA genes. We found five codons under selection in Zp3r in Glires and nine codons under selection in C4BPA in Primates (Fig. 2), which represented 0.9–1.5% of codons in each gene alignment (Table 1). Similar to the aBSREL analyses, we found no codons under selection in C4bpb in Glires or C4BPB in Primates (Table 1).

Like the aBSREL results, comparisons of MEME models between genes for a single taxon (Zp3r versus C4bpa in Glires) and between taxa for a single orthologous gene (C4BPA in Primates versus C4bpa in Glires) did not strongly support the hypothesis of more episodes of positive selection in genes that encode proteins involved in gamete binding. We found six codons under selection in the alignment of C4bpa in Glires compared to five codons under selection in Zp3r (Fig. 2). That comparison between MEME results does not suggest an especially strong effect of sexual selection on the evolution of Zp3r (in fertilization) compared to the effect of natural selection on C4bpa (in innate immunity) among Glires. Only one of our MEME model comparisons supported the hypothesized effect of sexual selection on episodes of positive selection: we found slightly more codons (nine) under selection in C4BPA in Primates (involved in both fertilization and innate immunity) compared to the number of codons (six) under selection in C4bpa in Glires (involved in innate immunity alone) (Fig. 2; Table 1).
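The two planned contrasts for the RCA genes can be made explicit as a small tally. The sketch below uses only the episode counts reported in the text (lineages for aBSREL, codons for MEME); the data layout, the function name, and the strict "more than" criterion are illustrative choices, not the authors' analysis code.

```python
# Episode counts reported in the text: aBSREL counts lineages, MEME counts codons.
counts = {
    ("Zp3r", "Glires"): {"aBSREL": 3, "MEME": 5},
    ("C4bpa", "Glires"): {"aBSREL": 4, "MEME": 6},
    ("C4BPA", "Primates"): {"aBSREL": 4, "MEME": 9},
}

# Planned contrasts: (gene expected to respond to sexual selection, comparison gene)
contrasts = [
    (("Zp3r", "Glires"), ("C4bpa", "Glires")),     # paralogs within one taxon
    (("C4BPA", "Primates"), ("C4bpa", "Glires")),  # orthologs between taxa
]


def prediction_supported(method):
    """For each contrast, True if the focal gene has strictly more episodes."""
    return [counts[focal][method] > counts[other][method] for focal, other in contrasts]


for method in ("aBSREL", "MEME"):
    print(method, prediction_supported(method))
# aBSREL [False, False]
# MEME [False, True]
```

Raw counts of this kind ignore differences in alignment length and tree size, which is one reason such comparisons are interpreted cautiously.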

Episodic diversifying selection on four egg coat genes

In aBSREL models, we found 1–3 episodes of positive selection on Zp2 or Zp3 in Glires and ZP2 in Primates (Fig. 3), but we found no episodes of positive selection on ZP3 in Primates (Table 1). We found comparable numbers of episodes (1–3) of positive selection for all alignments of Zp1 (Fig. 4) and Zp4 (Fig. 5) in Glires or ZP1 and ZP4 in Primates. The total number of episodes of positive selection (eight) for these two orthologs that do not encode selective or specific sperm-binding molecules (Zp1, Zp4) was greater than the total number of episodes of positive selection (five) for the two orthologs (Zp2, Zp3) that are known to play a role in selective or specific sperm binding.

In MEME models, we found 1–4 episodes of positive selection at some Zp2 or Zp3 codons in Glires (Fig. 3), but we found no episodes of positive selection at any ZP2 or ZP3 codons in Primates. Like the aBSREL analyses, we found comparable numbers of episodes (codons) of positive selection (2–3) for all four alignments of Zp1 (Fig. 4) and Zp4 (Fig. 5) in Glires or ZP1 and ZP4 in Primates, and the total number of episodes of positive selection (11 codons) was greater for those two orthologs that do not encode sperm-binding molecules (Zp1, Zp4) in comparison to the total number of episodes of positive selection (five codons) for two orthologs that encode selective sperm-binding molecules (Zp2, Zp3) (Table 1).

The discovery of many positively selected codons (on some lineages) in Zp1 is particularly strong evidence against the sexual selection hypothesis because these were the smallest alignments that we analyzed (only 17 species in Glires, and only 22 species in Primates), with fewer lineages in each of those gene trees on which to model rate variation among codons. In spite of that constraint, we found more codons (and more lineages) under positive selection for those two structural genes than for two genes known to encode sperm-binding proteins.

Fig. 3

Episodes of diversifying positive selection on two genes that encode egg coat proteins that bind sperm, including Zp3 and Zp2 (in Glires), and ZP2 (in Primates). Both genes in both taxa are known to be expressed in the egg coat and sensitive to sexual selection. Note that no positively selected sites were identified in ZP2 in Primates. Branches and sites under positive selection, and protein coding domain types, are shown as in Fig. 2. No episodes of diversifying selection were found in ZP3 in Primates (not shown)

Fig. 4

Episodes of diversifying positive selection on an egg coat structural gene (Zp1) that does not bind sperm and is not expected to be sensitive to sexual selection. Branches and sites under positive selection, and protein coding domain types, are shown as in Fig. 2

Fig. 5

Episodes of diversifying positive selection on a second egg coat structural gene (Zp4) that does not bind sperm and is not expected to be sensitive to sexual selection. Branches and sites under positive selection, and protein coding domain types, are shown as in Fig. 2

Episodic diversifying selection on two sperm–egg fusion genes

In contrast to the evidence noted above for episodes of diversifying selection on lineages or codons in two suites of genes that encode sperm–egg binding molecules (including some molecules that are also expressed in innate immunity), we found very little evidence for episodes of selection acting on two gamete fusion genes. In aBSREL models, we found one or two episodes of positive selection on Izumo1 expressed in sperm of Glires or IZUMO1 expressed in sperm of Primates (Fig. 6), but no lineages under selection in either taxon for the gene (Juno) that encodes the cognate molecule that is expressed in eggs and binds IZUMO1 on sperm. In MEME models, we found just one episode of positive selection in Izumo1 and one episode of positive selection in Juno (in Glires) (Fig. 6), but no episodes of positive selection in either gene in Primates (Table 1). Because episodes of selection were not consistent between the two interacting genes (in aBSREL results) or between the two taxa (in MEME results), these results did not support the predicted effect of sexual selection on the evolution of genes that encode gamete fusion molecules. Those results are broadly similar to the analyses by Grayson [22] using many of the same sequences but a different model of codon evolution, in which the evidence for positively selected codons in Juno and Izumo1 was limited to other mammal lineages and was weak or absent in Glires and Primates.

Fig. 6

Episodes of diversifying positive selection on two genes that encode gamete-fusion proteins, including Juno and Izumo1 (in Glires), and IZUMO1 (in Primates). Both genes are known or expected to be expressed in gametes and sensitive to sexual selection. Note that no positively selected lineages could be identified in Juno (in Glires), and no positively selected sites could be identified in IZUMO1 (in Primates). Branches and sites under positive selection, and protein coding domain types, are shown as in Fig. 2. No episodes of diversifying selection were found in JUNO in Primates (not shown)

Coevolution between genes that encode interacting gene products

We used the branch-site unrestricted statistical test for episodic diversification (BUSTED) to test for positive selection in an egg coat gene along the lineages on which a sperm-expressed gene showed episodes of positive selection. We found only one case of apparent coevolution: between ZP2 and C4BPA in Primates. Primate C4BPA evolution included four episodes of positive selection (in the bush baby, tarsier, crab-eating macaque, and vervet lineages; Fig. 2); when we fitted nested codon models of ZP2 evolution with those four lineages in the foreground class and two or three classes of codons with different substitution rates, we obtained a significantly better model fit by the likelihood ratio test [2Δln(L) = 13.2, p = 0.00028] for the unconstrained model that included a third class of positively selected ZP2 codons. That better model included about 7% of codons in the ZP2 alignment with a high relative rate of nonsynonymous substitution (ω = 6.2) along those four foreground branches. The specific source of that signal of coevolution between C4BPA and ZP2 was evident from comparing the aBSREL model results for those two genes: both aBSREL results included an episode of positive selection at some codons along the terminal branch leading to the tarsier (Figs. 2, 3). When we dropped the tarsier lineage from the foreground class in the BUSTED model analysis of ZP2, we obtained a nonsignificant improvement in the model fit [2Δln(L) = 1.0, p = 0.59] for the unconstrained model, which suggested no coevolution between ZP2 and C4BPA along those other three branches of the species tree.
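Because the constrained model (without a positively selected class) is nested within the unconstrained model, the comparison is a likelihood ratio test. The sketch below converts such a statistic into a p-value under an asymptotic χ² reference distribution. The degrees of freedom are an assumption here: the text does not state them, and the HyPhy implementation of BUSTED may use a different null distribution. The helper name is hypothetical and is not part of HyPhy. Under df = 1, the first statistic reported above (13.2) gives p ≈ 0.00028.

```python
import math


def lrt_pvalue(stat, df):
    """Upper-tail chi-square probability for a likelihood ratio statistic.

    stat is 2 * (lnL_unconstrained - lnL_constrained). Only df = 1 and df = 2
    are handled, because both have closed forms in the standard library.
    """
    if stat < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # chi2_2 is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


print(f"{lrt_pvalue(13.2, df=1):.5f}")  # prints 0.00028
```

For other degrees of freedom, a general chi-square survival function (for example from a statistics library) would replace the two closed forms used here.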

No other BUSTED analyses suggested evidence of coevolution between sperm- and egg-expressed gene pairs that encode interacting gene products. Nested models of ZP3 evolution in Primates did not indicate coevolution with C4BPA along the branches of the C4BPA gene tree that showed episodes of diversifying positive selection in Primates (Fig. 2). No models of Zp2 or Zp3 evolution in Glires indicated that either of those genes coevolve with Zp3r. And neither of the BUSTED analyses indicated coevolution between Izumo1 and Juno in Glires or between IZUMO1 and JUNO in Primates.

Exploratory analysis of functional or phenotypic associations with episodes of selection

We did not preregister hypothesis tests about which specific codons in these gene alignments, or which specific lineages in the phylogenies for Glires and for Primates, are expected to be associated with episodes of diversifying positive selection. Here we explore several possible associations that were suggested by our results but not analyzed in hypothesis tests.

The Philippine tarsier (Carlito syrichta) was the only primate lineage associated with multiple episodes of diversifying positive selection in two genes that encode interacting gene products involved in fertilization (C4BPA and ZP2) (Figs. 2, 3). Such episodes of selection could be caused by life history traits that are associated with especially strong competition among males or among sperm, or by conflicts of interest between mates, and could lead to coevolution of the male- and female-expressed genes during coincident episodes of selection. However, the mating systems of tarsiers do not include traits that are usually associated with sperm competition or sexual conflicts of interest in primates. Males are slightly (about 14%) larger than females in C. syrichta [32], individual home ranges overlap slightly, and social groups typically consist of one adult male and one or two adult females plus offspring [50]. In comparison to other primate mating systems that feature pronounced male-biased sexual size dimorphism (e.g., gorilla; [36]), or coercive mating with conflict between the sexes (e.g., chimpanzees; [46]), tarsiers seem to be unlikely hotspots for episodic diversifying selection on fertilization genes. One interpretation of our discovery of coevolution between C4BPA and ZP2 in C. syrichta is that this discovery points toward previously unsuspected strong sexual selection in this lineage, but an alternative interpretation is that our discovery is a false positive (or at least unrelated to the mating system of tarsiers). A follow-up study specifically directed at testing those interpretations is needed.

Other examples of multiple episodes of diversifying positive selection in one lineage seem to argue more strongly against the hypothesis that these sperm- and egg-expressed genes coevolve under selection driven by the interaction of the gene products at fertilization. We found multiple episodes of diversifying positive selection in the kangaroo rat (Dipodomys ordii) including the sperm acrosomal gene Zp3r and the egg coat gene Zp1, but those two gene products are not known to interact at fertilization; instead, ZP1 is thought to form the structural component of the fibrillar protein network in the rodent egg coat. Similarly, we found multiple episodes of diversifying positive selection in the Chinese hamster (Cricetulus griseus) in the sperm-binding gene Zp2 and in C4bpa, but that RCA gene is not known to be expressed in sperm (or involved in fertilization) in rodents; instead, C4bpa is thought to function only in the innate immune system of hamsters and other Glires. Finally, we found multiple episodes of diversifying positive selection in the Mongolian gerbil (Meriones unguiculatus), including the three zona pellucida genes Zp1, Zp2, and Zp3, but not including Zp3r or other sperm-expressed genes; those episodes of selection on all three egg coat genes might point to interesting coevolution among those genes (and interactions among their gene products to form the gerbil egg coat), but they do not seem to point to the effects of sexual selection on molecular evolution.
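The lineage-level pattern discussed in the two preceding paragraphs can be summarized by checking, for each lineage with episodes in more than one gene, whether any pair of those genes encodes products known or hypothesized to interact at fertilization. In this minimal sketch, the gene sets per lineage come from the text, the list of interacting pairs is assembled from the Background (Zp3r with Zp3 in mice; C4BPA with ZP3 and ZP2 in humans; Izumo1 with Juno), and the helper name is hypothetical. It is a descriptive tally, not a test of coevolution.

```python
from itertools import combinations

# Genes with episodes of diversifying selection in lineages discussed in the text
lineage_genes = {
    "Philippine tarsier": {"C4BPA", "ZP2"},
    "kangaroo rat": {"Zp3r", "Zp1"},
    "Chinese hamster": {"Zp2", "C4bpa"},
    "Mongolian gerbil": {"Zp1", "Zp2", "Zp3"},
}

# Sperm-egg pairs known or hypothesized to interact. Primate genes are in capitals
# and Glires genes in lower case, so Glires C4bpa is deliberately not paired with Zp2.
interacting_pairs = {
    frozenset({"Zp3r", "Zp3"}),
    frozenset({"C4BPA", "ZP3"}),
    frozenset({"C4BPA", "ZP2"}),
    frozenset({"Izumo1", "Juno"}),
    frozenset({"IZUMO1", "JUNO"}),
}


def shared_interacting_pairs(genes):
    """Return sorted interacting gene pairs that both show selection in one lineage."""
    return sorted(
        tuple(sorted(pair))
        for pair in combinations(genes, 2)
        if frozenset(pair) in interacting_pairs
    )


for lineage, genes in lineage_genes.items():
    print(lineage, shared_interacting_pairs(genes))
```

Only the tarsier lineage returns a pair (C4BPA with ZP2); the other three lineages combine genes whose products are not known to interact at fertilization.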

The discovery of some codons under selection in egg coat genes could potentially be related to the known function of specific domains in those genes. The only positively selected codon that we found in Glires Zp3 (codon 339 in our trimmed Zp3 alignment; Fig. 3; Additional file 2: Appendix 2) occurred in the portion of the gene that encodes the known sperm-binding site (sometimes called the sperm-combining site), between the ZP domain and the transmembrane region near the carboxyl end of the mature protein. That discovery alone would be consistent with the predicted effects of sexual selection on the evolution of selective sperm binding by the egg coat, and is consistent with previous documentation of high rates of nonsynonymous substitution in the ZP3 sperm-binding site in other analyses of Zp3 evolution in rodents (e.g., [72, 73]). However, other results were not consistent with those predicted effects. First, we found a larger number of positively selected codons (four) in Glires Zp2 (codons 116, 118, 121, 161 in our Zp2 alignment; Additional file 2: Appendix 2) but all of those codons occurred outside of the N-terminal region of ZP2 that both confers specificity of sperm binding in mice [7] and covaries with fertility in humans [24]. Second, we found no positively selected codons in ZP2 or ZP3 of Primates inside or outside of the known sperm-binding sites in those genes, and we found no positively selected codons in Juno from either taxon, including sites inside or outside of the parts of each gene that encode the regions known to mediate protein-protein interactions involved in fertilization. Those additional results seem to considerably weaken the overall strength of evidence for diversifying positive selection specifically on the sperm-binding domains of these genes.

It is less straightforward to assign possible functional significance to codons under positive selection in alignments of RCA genes because substrate binding by those multimeric proteins depends on the number and organization of the monomers (and their possible interaction with a beta subunit protein in the oligomer), and in particular because the egg-binding function of those gene products (ZP3R in rodent sperm, C4BPA in human sperm) has not been studied. Indirect insight into possible functional associations could be based on comparisons between paralogous genes with different functions in the same taxon (Zp3r versus C4bpa) or between taxa in which the same ortholog is thought to differ in function (C4bpa versus C4BPA).

Discussion

We found only limited evidence for diversifying positive selection associated with the gamete-binding function of fertilization genes in both Glires and Primates. We expected to find more episodes of positive selection in a gene with multiple functions including innate immunity and sperm-binding to the egg coat (C4BPA in Primates) compared to the same orthologous gene without a role in fertilization in the other taxon (C4bpa in Glires), and we expected to find more episodes of positive selection in a second gene with a known role in gamete-binding (Zp3r in Glires) compared to a paralogous gene without a fertilization function in the same taxon (C4bpa). Neither of those predictions was supported by the model results. Comparable patterns for genes expressed in the zona pellucida, and for two genes that mediate sperm–egg fusion, reinforced this apparent lack of evidence for many episodes of diversifying positive selection associated with two different modes of sperm–egg interaction. We conclude that these data offer little support for the hypothesis that sexual selection shapes the molecular evolution of those gene products in these two taxa at this taxonomic level of comparison (within crown group taxa that are each about 70 Ma old). Similar comparative approaches that contrasted genes with and without a function in fertilization in the same taxon (e.g., [71]), and approaches that contrast homologous genes with different function or expression patterns in different taxa (e.g., [74]), have provided important insights into the causes of selection at the molecular level, and the processes that mediate the response to such selection.

Our study benefited from many of the advantages that have been proposed for preregistration as a way to avoid false positives in evolutionary ecology [17] and other disciplines [62]. Such false positives can arise from questionable research practices, such as the selective reporting of some model results at the expense of others (sometimes called cherry-picking) or the development of open-ended post hoc hypotheses after the results are known (sometimes called HARKing). Codon model analyses of positive selection seem particularly susceptible to these practices because the models can be fitted to data without specifying in advance the particular species or coding sequence domains that are expected to be the targets of selection. Constraining our analysis and reporting to include all results (and not just those results that might have conformed to our broadly stated expectations) helps to avoid selective reporting, and seems more likely to lead to an unbiased view of the magnitude and targets of selection.

Our study did not address an alternative working hypothesis: that the response to sexual selection acting on fertilization genes may be mediated by the evolution of gene expression differences rather than by the evolution of substitution differences. Mammal species show substantial qualitative differences in the expression of RCA genes and egg coat genes, including the gain of expression of a new paralog (Zp3r) in Glires, and the loss of expression of some ZP family genes as pseudogenes in diverse mammal lineages, including the loss of Zp4 expression in mice [21, 41]. The evolution of those qualitative expression differences suggests that other, quantitative differences in expression might also mediate responses to sexual selection acting at fertilization. The observation that the gain and loss of gene function has not included Zp2 and Zp3 suggests that the evolution of qualitative expression differences (gain or loss of expression) might be constrained by functional requirements for specific gamete-binding functions (such as essential binding sites in both ZP2 and ZP3 protein subunits). However, there might be considerable scope for selection to modulate relative expression levels within the egg coat or within the sperm acrosome. Promoter regions associated with Zp2 and Zp3 coding sequences are known (e.g., [42]), and are reported to be highly conserved between Primates and Glires (e.g., [37]). Analyses of the evolution and functional variation of these regulatory sequences, or of the genes that encode their cognate regulatory molecules (such as repressors), might reveal evidence of responses to sexual selection that were not evident in our analyses of coding sequence evolution.

Our results raise at least two additional questions. First, given the existence of other strong evidence for positive selection acting on genes that mediate fertilization interactions and are sensitive to sexual selection, what is the appropriate genomic scale for this comparative approach? Here we focused on a small number of paralogous CCP-containing genes in the RCA cluster plus a small number of genes from a second gene family (ZP-domain genes), including some pairs of genes in those two gene families that are known or expected to encode interacting gene products. We found that sperm-expressed genes (e.g., Zp3r in Glires) and egg-expressed genes (e.g., Zp2, Zp3) had not experienced more episodes of positive selection in comparisons that were restricted to the most closely-related (and in some ways most directly comparable) parts of the same genomes (C4bpa, C4bpb; Zp1, Zp4). However, it is possible that a broader comparison across the genomes or across the gonad transcriptomes of Glires and Primates might show that these few fertilization genes fall in the far tail of the frequency distribution of strongly positively selected genes. The interpretation of such patterns involving positive selection detected across the genome (e.g., [59, 75]) is complicated by the diverse nature of the structure and function of the genes in the comparison, and by the expectation that they are subject to diverse modes and sources of selection. We find this type of focused comparison (e.g., [25]), limited to a few other genes in the same gene family (ZP-domain) or in the same genomic region (CCP-containing genes), or with a similar function in fertilization (Izumo1, Juno), to be highly informative because such comparisons focus on genes that are expected to have similar functional properties and experience comparable modes of selection. 
Focused comparisons among such genes seem to have the greatest potential to reveal differences in the episodic nature of selection on genes that are or are not expressed in gametes and subject to sexual selection at fertilization. A broader genomic comparison might lead to different insights into the relative importance of the few episodes of positive selection in gamete-recognition genes that were identified in our analyses.

Second, what is the appropriate temporal or phylogenetic scale for comparative analyses of gamete-recognition genes among taxa? Increased taxon sampling improves the scope for identifying some lineages under positive selection (at some codons in aBSREL models), and improves the scope for identifying some codons under positive selection (along some lineages in MEME models). However, broader comparisons among more distantly-related taxa can be confounded by gap-filled alignments due to the accumulation of real insertion-deletion mutations and due to the accumulation of multiple substitutions that lead to convergent similarities or dubious alignment among highly divergent gene copies from distantly related lineages. Both of those constraints will cause multiple sequence alignment algorithms to infer gap-filled alignments with reduced power to detect positive selection, and may cause misaligned codons to differ at nonsynonymous nucleotide sites (leading to false positives in codon model results). Previous analyses of some of the genes analyzed here found strong evidence of positively selected codons that encode sperm-binding sites in zona pellucida genes of mice, but only in analyses focused on congeneric species [72, 73]. In their analysis of the molecular evolution of complement genes among a diverse suite of Primates, Cagliani et al. [11] found 15 codons under selection in C4BPA (using different criteria from those used in our study to identify positively selected codons), but this was not an unusual proportion of codons under selection (15%) in comparison to other complement genes in their analyses (range 5–35% among 18 genes that showed evidence of positive selection). 
Our previous analyses of positive selection on gamete-recognition genes from diverging populations or congeneric species of sea stars [25, 55] used similar combinations of phylogenetic and population genetic approaches to identify codons under selection that may be associated with variation in fertility or gamete compatibility. Possibly the evidence for selection acting on such genes is more likely to be detected when sampling focuses on relatively recent episodes of selection. Planned comparisons of codon model results for fertilization genes sampled on increasingly broad phylogenetic scales (e.g., [1]) are needed to test that possibility.

Conclusions

Codon model analyses of protein-coding sequences provide a powerful method for testing hypotheses of selection acting on codons or lineages associated with specific functional features of genes and organisms. A comparative approach that contrasts taxa with different phenotypic traits or contrasts genes with different functional expression patterns can provide important context for interpreting codon model results. We found both codons and lineages under episodic diversifying selection among mammalian species in two clades in which different RCA genes have been implicated in sperm–egg interactions, and those results alone could be interpreted as evidence for sexual selection associated with variation in fertilization success. However, comparisons of codon model results between paralogous genes (with and without a function in fertilization) and between orthologous genes (in taxa with different expression patterns) did not support that interpretation. We conclude that caution is warranted in ascribing any of those particular results to the effects of sexual selection. We advocate for preregistration of analyses and interpretations in future studies, including comparative analyses of molecular evolution among genes and among taxa that can be used to test a specific hypothesis about the causes of selection acting on molecules and organisms.

Methods

Comparative analysis of genes and taxa

We used a common comparative approach to analyze and interpret evidence for diversifying positive selection in codon models caused by sexual selection acting on some genes that encode sperm–egg binding proteins (ZP3R, C4BPA; ZP2, ZP3) or gamete fusion proteins (IZUMO1, JUNO) in some taxa. We compared those results to the same models fitted to alignments for paralogous genes in the same taxon (C4bpa, C4bpb, C4BPB; Zp1, Zp4) that are not sensitive to sexual selection, or to the same models fitted to an alignment for the orthologous gene in the other taxon (C4bpa) in which the gene is not expressed in gametes. We also looked for coevolution between sperm- and egg-expressed genes by searching for single lineages (internal branches or terminal leaves) in the Glires or Primates phylogeny that showed evidence of positive selection in both members of gene pairs that encode interacting proteins.
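The search for coevolving lineages reduces to a set intersection once each flagged branch can be compared across alignments. The sketch below is illustrative only and rests on assumptions not stated in the text: each branch reported by aBSREL is represented by the set of species descending from it (a hypothetical export format), and, because alignments differ in taxon sampling, clades are compared only after restricting them to species present in both alignments. The function names and species sets are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]  # a branch, identified by the species that descend from it


def project(clades: Iterable[Iterable[str]], common: FrozenSet[str]) -> Set[Clade]:
    """Restrict each clade to the species shared by both alignments,
    dropping clades that lose all of their members."""
    projected = {frozenset(c) & common for c in clades}
    projected.discard(frozenset())
    return projected


def shared_selected_clades(sperm_clades, egg_clades, sperm_taxa, egg_taxa) -> Set[Clade]:
    """Branches flagged as positively selected (e.g., by aBSREL at p < 0.01)
    in both a sperm-expressed gene and its egg-expressed partner."""
    common = frozenset(sperm_taxa) & frozenset(egg_taxa)
    return project(sperm_clades, common) & project(egg_clades, common)


# Hypothetical example: a mouse + rat branch flagged in both genes.
sperm = [{"Mus_musculus", "Rattus_norvegicus"}, {"Mus_musculus"}]
egg = [{"Mus_musculus", "Rattus_norvegicus", "Meriones_unguiculatus"}, {"Cavia_porcellus"}]
sperm_taxa = {"Mus_musculus", "Rattus_norvegicus", "Cavia_porcellus"}
egg_taxa = sperm_taxa | {"Meriones_unguiculatus"}
print(shared_selected_clades(sperm, egg, sperm_taxa, egg_taxa))
```

Restricting clades to shared species is a design choice: when a species is missing from one alignment, two branches of the full tree collapse into one, so a match indicates overlapping rather than strictly identical lineages.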

This approach is similar to the well-known comparative approach used in previous studies (e.g., [69]). The main advantage of this comparative approach is that it can be used to test a specific working hypothesis: more evidence of diversifying positive selection (more codons or lineages) in genes that are sensitive to sexual selection in comparison to other genes (in the same taxon) or other taxa (for the same genes) that do not mediate gamete interactions and are not sensitive to sexual selection. Codon models can be fitted to sequence alignments without a specific hypothesis about which genes or taxa are expected to show evidence of positive diversifying selection, but this unsupervised mode of analysis is more sensitive to false positives when the model results are interpreted post hoc. Unsupervised use of likelihood ratio tests in codon models has been criticized as likely to generate false positives (e.g., [18, 30]). Several solutions to this problem of unconstrained searches for positive selection have been proposed (e.g., [8, 82]), including the specification of hypotheses based on known differences in expression and function between genes and between taxa, such as the contrast between codon model results for candidate genes under selection in comparison to so-called housekeeping genes (e.g., [3]).

We followed a preregistered protocol of codon model analyses to test hypotheses about the influence of sexual selection on the molecular evolution of RCA genes and other sperm-expressed genes and their egg-expressed cognate genes. We preregistered those methods in order to avoid problems associated with the exercise of researcher degrees of freedom in the selection and interpretation of analyses and hypothesis tests (sometimes called the garden of forking paths) [17, 20, 61]. Our preregistered workflow was finalized and deposited in the preregistration database at the Open Science Framework (osf.io/yf9be) before we obtained the sequence data used in our analyses. The preregistration included both our plans for obtaining and handling data and our plans for hypothesis tests. Here we note specific deviations from that workflow that arose during data handling and analysis, including exploratory analyses or tests that were not preplanned.

Taxon choice, data assembly, and sequence alignment

We used mouse genes (for Glires) or human genes (for Primates) as query sequences to search the Ensembl database (release 91; [81]) for coding sequences of orthologous genes in other sequenced mammalian genomes, including 19 Glires and 24 Primates. We used mouse and human orthologs as queries because most of the available experimental annotation for gene function comes from biochemical or genetic analyses of mouse and human genes. Our analysis focused on alignments for genes from each taxon separately (and not analyses of genes for Glires and Primates together in one alignment) because one key gene (Zp3r) is unique to Glires, and our hypothesis testing depended on comparison of results among genes (with different functions in the same taxon) or between taxa (in which one orthologous gene has evolved two different functions). The two taxa are particularly well suited for this kind of comparative approach because they are closely related (and make up the large majority of species in the supertaxon Euarchontoglires), and because the crown group is estimated to be of similar age in each taxon: 71–63 million years ago for Primates [65]; 75–71 million years ago for Glires [64].

We downloaded from Ensembl each 1:1 ortholog that had whole genome alignment coverage (WGA) and gene order conservation (GOC) scores greater than 75. For 1:1 orthologs that included multiple transcripts of different length, we chose the longest isoform. For 1:1 orthologs that failed to pass either of those two filters, we downloaded the Ensembl sequence and confirmed its identity by using the Ensembl sequence as the query in a blastn search against all mouse (or human) sequences in GenBank. For cases in which Ensembl did not identify a 1:1 ortholog of the mouse (or human) gene, or for cases where the 1:1 ortholog with a low WGA or GOC score was not a best blast match to the expected mouse (or human) gene, we used the mouse (or human) ortholog as the query in a blastn search of all GenBank sequences for that species. For those blastn searches we used two search criteria (an expectation value of E < 10⁻⁴⁰ and query coverage greater than 75%) to find a GenBank accession that was orthologous to the mouse (or human) gene. Those blastn searches also identified orthologs from four other species (beaver, marmot, Mongolian gerbil from Glires; red colobus from Primates) for which genome assemblies and gene models were available as sequence accessions in GenBank but not searchable in Ensembl. Cases in which Ensembl contained no 1:1 ortholog and blastn searches did not identify a likely ortholog in GenBank were scored as missing; alignments thus varied in size from a maximum of 22 species (Glires) or 25 species (Primates) to a minimum of 17 species (Zp1, Glires) or 22 species (ZP1, Primates) (see Additional file 3: Appendix 3 for taxon names and accession numbers for each sequence).
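The ortholog-screening rules above amount to a short decision procedure for each species and gene. The Python sketch below encodes those thresholds (WGA and GOC scores greater than 75; blastn E < 10⁻⁴⁰ and query coverage greater than 75%) as we read them. The record types, the `confirmed_by_blastn` flag, the placeholder accessions, and the choice of the lowest E-value among passing GenBank hits are assumptions made for illustration; the sketch does not query Ensembl or GenBank.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds stated in the Methods; comparisons follow the text ("greater than", "<").
MIN_WGA = 75.0      # Ensembl whole genome alignment coverage score
MIN_GOC = 75.0      # Ensembl gene order conservation score
MAX_EVALUE = 1e-40  # blastn expectation value
MIN_QCOV = 75.0     # blastn query coverage (percent)


@dataclass
class EnsemblOrtholog:
    """A 1:1 Ensembl ortholog (pass None when Ensembl reports none)."""
    wga: float
    goc: float
    confirmed_by_blastn: bool  # hypothetical flag: best blastn match is the expected gene


@dataclass
class BlastHit:
    accession: str
    evalue: float
    qcov: float


def choose_source(ortholog: Optional[EnsemblOrtholog], genbank_hits: List[BlastHit]) -> str:
    """Return where the coding sequence for one species would be taken from."""
    if ortholog is not None:
        if ortholog.wga > MIN_WGA and ortholog.goc > MIN_GOC:
            return "ensembl"
        if ortholog.confirmed_by_blastn:
            return "ensembl (blastn-confirmed)"
    passing = [h for h in genbank_hits if h.evalue < MAX_EVALUE and h.qcov > MIN_QCOV]
    if passing:
        best = min(passing, key=lambda h: h.evalue)  # assumption: keep the strongest hit
        return "genbank:" + best.accession
    return "missing"  # scored as missing data in the alignment


print(choose_source(EnsemblOrtholog(90.0, 80.0, False), []))
print(choose_source(None, [BlastHit("ACC1", 1e-50, 90.0), BlastHit("ACC2", 1e-60, 70.0)]))
```

In the second call the hit with the lower E-value is rejected because its query coverage is below 75%, so the placeholder accession ACC1 is chosen.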

We used COBALT [54] to align orthologs within Glires or within Primates. We used the COBALT method because it is sensitive to the organization of protein-coding genes into distinctive functional domains (such as the sushi domains of many RCA genes). We used the default values for COBALT alignment parameters (gap open and extension penalties). Although COBALT successfully conserved the boundaries between coding sequence domains in genes, our preliminary codon model analyses of COBALT alignments included many codons under selection that occurred in parts of the alignments with many gap sites (which may be incorrectly aligned). Alignment errors can cause numerous false positives in codon model analyses [31, 40, 57]. The benefit of removing alignment errors (by deleting gap-filled parts of alignments that may be of dubious homology) is probably greater than the cost of shorter alignments (with fewer sites and reduced power to detect positive selection) [56]. For those reasons, we revised the COBALT alignments using two criteria that were not part of our preregistered workflow. First, we examined each alignment for any amino acid sequence motifs for one species that were obviously misaligned with a nearby region of other species (i.e., COBALT errors caused by a high gap opening penalty), and we manually adjusted those regions of each alignment (e.g., a distinctive and obviously misaligned four-codon motif in the 3′ region of C4BPB in the tarsier). Second, we examined each alignment for short (≤30 codon) motifs in one sequence that were separated from other parts of the alignment by gaps at both the 5′ and 3′ ends of the motif. We assumed that such islands of codons were likely to represent compressed sequences with many possible alignment errors (e.g., part of Izumo1 in the kangaroo rat).
If more than half of the codons in such islands encoded amino acid differences from other sequences in the alignment, then we recoded those islands of codons as missing (replaced with alignment gaps) to represent uncertain homology with other sequences for that region of the alignment. We then used trimAl v1.2 [12] to delete sites in each alignment that were represented by sequence data for < 80% of species. We used the norMD score [70] to assess overall alignment quality with a cutoff value of 0.6 (all alignments passed that filter).
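The two alignment-cleaning rules above (recoding short, gap-bounded islands with mostly divergent residues, then deleting sparsely occupied sites) can be sketched as follows. This is a minimal illustration, not the procedure we applied (island recoding was done by inspection and site deletion with trimAl). It works on the amino acid translation, one character per codon, so the same column indices would be applied to codon triplets. It also assumes that a residue "differs" when no other sequence shares it in that column; the source does not define that term precisely. The occupancy filter is roughly what trimAl's gap-threshold option (`-gt 0.8`) does, but it is written here from scratch.

```python
from typing import Dict, List, Tuple

GAP = "-"
Alignment = Dict[str, str]  # species -> aligned amino acid string (one character per codon)


def find_islands(seq: str, max_len: int = 30) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans of non-gap residues, at most max_len long,
    bounded by a gap on both the 5' (left) and 3' (right) side."""
    islands, i, n = [], 0, len(seq)
    while i < n:
        if seq[i] == GAP:
            i += 1
            continue
        j = i
        while j < n and seq[j] != GAP:
            j += 1
        if i > 0 and j < n and j - i <= max_len:  # gaps at i - 1 and at j
            islands.append((i, j))
        i = j
    return islands


def differs(aln: Alignment, name: str, col: int) -> bool:
    """Assumption: a residue differs if no other sequence has it in that column
    (a column where every other sequence is gapped also counts as a difference)."""
    others = {s[col] for other, s in aln.items() if other != name and s[col] != GAP}
    return aln[name][col] not in others


def mask_islands(aln: Alignment, max_len: int = 30, max_diff_frac: float = 0.5) -> Alignment:
    """Recode an island as gaps when more than max_diff_frac of its residues differ."""
    masked = {}
    for name, seq in aln.items():
        chars = list(seq)
        for start, end in find_islands(seq, max_len):
            n_diff = sum(differs(aln, name, c) for c in range(start, end))
            if n_diff > max_diff_frac * (end - start):
                chars[start:end] = [GAP] * (end - start)
        masked[name] = "".join(chars)
    return masked


def drop_sparse_columns(aln: Alignment, min_occupancy: float = 0.8) -> Tuple[Alignment, List[int]]:
    """Keep only columns with sequence data for at least min_occupancy of species."""
    names = list(aln)
    ncol = len(aln[names[0]])
    keep = [c for c in range(ncol)
            if sum(aln[n][c] != GAP for n in names) / len(names) >= min_occupancy]
    return {n: "".join(aln[n][c] for c in keep) for n in names}, keep


aln = {"A": "MKVLAPQGGT", "B": "MKVLA-W-GT", "C": "MKVLAPQGGT", "D": "MKVLAPQGGT"}
masked = mask_islands(aln)
trimmed, kept = drop_sparse_columns(masked)
print(masked["B"], kept)  # MKVLA---GT [0, 1, 2, 3, 4, 8, 9]
```

In this toy alignment the one-residue island "W" in sequence B does not match the "Q" shared by the other sequences, so it is recoded as a gap; its column then falls below 80% occupancy and is deleted.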

Phylogeny selection for lineage specific analysis

Codon models are used to estimate parameter values associated with episodes of positive selection by mapping synonymous and nonsynonymous nucleotide differences onto a phylogeny (Fig. 1). We used the canonical species tree topology and higher taxon names from Springer et al. ([65]; Fig. 1) for Primates. We used the canonical species tree topology and higher taxon names from Fabre et al. ([16]; Fig. 2) for Glires. We edited the Newick string for each species tree to match the species represented in each alignment for each taxon (17–25 species per alignment), collapsed nodes for missing species accordingly, and added that Newick string to each alignment file as input for codon model analyses (see Additional file 2: Appendix 2 for all sequence alignments and input files for codon model analyses).
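Editing the species tree to match each alignment is a pruning operation: tips absent from the alignment are removed, and any internal node left with a single descendant is collapsed. The minimal sketch below assumes a topology-only Newick string with simple tip names (no branch lengths, quoted labels, or comments). In practice a phylogenetics library would normally handle this, and the toy Glires tree here is for illustration only, not one of our input trees.

```python
from typing import List, Optional, Set, Union

Tree = Union[str, List["Tree"]]  # a tip name, or a list of child subtrees


def parse_newick(newick: str) -> Tree:
    """Parse a topology-only Newick string; internal node labels are skipped."""
    s = newick.strip().rstrip(";")
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            while pos < len(s) and s[pos] not in ",)":  # skip an internal label
                pos += 1
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",)":
            pos += 1
        return s[start:pos].strip()

    return node()


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Drop tips not in `keep` and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else kids


def to_newick(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(c) for c in tree) + ")"


def prune_newick(newick: str, keep: Set[str]) -> str:
    pruned = prune(parse_newick(newick), keep)
    if pruned is None:
        raise ValueError("none of the requested species are in the tree")
    return to_newick(pruned) + ";"


glires = "(((Mus,Rattus),Cavia),(Oryctolagus,Ochotona));"
print(prune_newick(glires, {"Mus", "Cavia", "Oryctolagus", "Ochotona"}))
# ((Mus,Cavia),(Oryctolagus,Ochotona));
```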

We also estimated gene trees for each multiple sequence alignment. An initial empirical protein evolutionary model was determined for each alignment using ModelGenerator v.85 [33]. Phylogenetic reconstruction was performed using MrBayes [29] under the best-fit empirical protein evolutionary model. Two independent MCMC chains were run for 500,000 generations with a print frequency of 1000 and a sample frequency of 10. The consensus tree was estimated following a burnin of 25%. We used the Shimodaira-Hasegawa (SH) test implemented in TreePuzzle [60] to ask whether the canonical species tree was a significantly worse (p < 0.05) fit to the data for each alignment in comparison to the best gene tree estimated by MrBayes. In three cases where the SH test indicated a significantly better fit to the data for the best gene tree (C4BPA in Primates; ZP1 in Primates; Zp2 in Glires), we used both trees in codon model analyses and asked whether our results differed between those two analyses (in all three cases we recovered the same episodes of positive selection that were also identified in analyses using the canonical species tree).
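The MCMC settings above can be written as a MrBayes command block. The sketch below generates one, assuming MrBayes 3.x syntax and reading "two independent MCMC chains" as two independent runs (`nruns=2`). The matrix passed as `aa_model` is a placeholder for whichever empirical model ModelGenerator selected, and the output is not one of our actual input files. With a sample frequency of 10 over 500,000 generations, each run writes about 50,000 trees, of which about 37,500 remain after a 25% burnin.

```python
def mrbayes_block(aa_model: str, ngen: int = 500_000, printfreq: int = 1_000,
                  samplefreq: int = 10, nruns: int = 2, burninfrac: float = 0.25) -> str:
    """Build a MrBayes command block matching the run settings in the text.

    aa_model is a placeholder (e.g. "wag" or "jones") for the empirical
    matrix chosen by ModelGenerator; MrBayes 3.x syntax is assumed.
    """
    return "\n".join([
        "begin mrbayes;",
        f"  prset aamodelpr=fixed({aa_model});",
        f"  mcmc ngen={ngen} printfreq={printfreq} samplefreq={samplefreq} nruns={nruns};",
        f"  sumt relburnin=yes burninfrac={burninfrac};",
        "end;",
    ])


def retained_samples(ngen: int = 500_000, samplefreq: int = 10, burninfrac: float = 0.25) -> int:
    """Approximate number of post-burnin trees per run (ignores the extra
    sample that MrBayes writes at generation 0)."""
    total = ngen // samplefreq
    return total - int(total * burninfrac)


print(mrbayes_block("wag"))
print(retained_samples())  # 37500 trees per run
```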

Codon model analyses

We fitted three models of episodic diversifying selection to coding sequence alignments. We used the MEME method [49] to identify codons in each alignment that were estimated to have high relative rates of nonsynonymous substitution (ω) along some lineages in the species tree. We used the aBSREL method [63] to identify lineages in each species tree that were estimated to have high values of ω for some codons in the alignment. We implemented those models using the datamonkey web interface [79]. In each of those analyses, the primary parameter value was the number of episodes of diversifying or positive selection (codons under selection in the MEME analyses; branches under selection in the aBSREL analyses). We used a relatively stringent criterion (a likelihood ratio test result with p < 0.01) to identify those episodes of positive selection from the analysis of each alignment; we chose this lower critical p value (compared to the proposed critical p value in our preregistration) because we carried out several hypothesis tests for each alignment (a MEME and an aBSREL analysis, plus one or two additional analyses in some cases; see below) and because we were concerned about possible false positives associated with alignment errors. In each analysis we also noted two secondary parameter values: the proportion of the gene tree under positive selection, and the value of β (the nonsynonymous substitution rate along those branches), for each positively selected codon in MEME models; and the proportion of codons under positive selection, and the value of ω (the relative rate of nonsynonymous substitution at those codons), for each positively selected branch in aBSREL models.
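Each episode of positive selection is called from a likelihood ratio test at p < 0.01. The sketch below shows the generic arithmetic under simplifying assumptions: it compares twice the log-likelihood difference to a χ² distribution with one degree of freedom, then counts codons (MEME) or branches (aBSREL) below the cutoff. HyPhy's MEME and aBSREL use their own null distributions, and aBSREL also applies a multiple-testing correction across branches, so in practice the p values reported by datamonkey would be counted. The log-likelihoods and p values in the example are hypothetical.

```python
import math
from typing import Iterable

ALPHA = 0.01  # critical p value used to call an episode of positive selection


def lrt_statistic(lnl_alternative: float, lnl_null: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_df1_pvalue(stat: float) -> float:
    """Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def count_episodes(pvalues: Iterable[float], alpha: float = ALPHA) -> int:
    """Number of codons (MEME) or branches (aBSREL) with p < alpha."""
    return sum(1 for p in pvalues if p < alpha)


stat = lrt_statistic(-1234.2, -1238.0)  # hypothetical log-likelihoods
print(stat, chi2_df1_pvalue(stat))
print(count_episodes([0.001, 0.02, 0.009, 0.5]))  # 2
```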

We also used the BUSTED method [48] to test for coevolution between pairs of male- and female-expressed genes under selection. This model allows the user to assign branches in the gene tree to a class of foreground lineages based on an a priori hypothesis, and then ask whether alignment-wide evidence of positive selection can be detected as a better fit for a model with a high value of ω at some codons on the foreground branches (relative to a null model without positive selection at some codons on foreground branches). We used the aBSREL results for sperm-expressed genes involved in fertilization (Zp3r in Glires; C4BPA in Primates; Izumo1 in each taxon) to identify lineages in the species tree under positive selection. We then used the datamonkey interface to specify those same branches as foreground branches in BUSTED models fitted to data for each alignment of egg-expressed genes (Zp2, Zp3, Juno in both taxa) that encode egg proteins (the egg coat proteins ZP2 and ZP3, and the egg membrane protein JUNO) that interact with sperm. We used those results to ask whether the BUSTED model with an additional class of positively selected Zp2 (or Zp3 or Juno) codons on those foreground branches was a significantly better fit to the Zp2 (or Zp3 or Juno) data, and thus an indication of coevolution of the pair of genes on the same subset of branches in the species tree.
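Specifying foreground branches for BUSTED amounts to labelling those branches in the tree passed to HyPhy. The sketch below tags terminal branches only, assuming HyPhy's curly-brace convention for branch labels in Newick strings (e.g., `Mus_musculus{Foreground}`). Internal branches would additionally need node labels, and our analyses selected branches in the datamonkey interface rather than with code like this; the tree and species names are hypothetical.

```python
import re
from typing import Set

# Tip names: runs of characters that are not Newick punctuation, whitespace, or
# an existing {label}, and that follow "(" or "," (internal-node labels follow ")"
# and branch lengths follow ":", so neither is matched).
TIP = re.compile(r"(?<=[(,])([^(),:;{}\s]+)")


def tag_foreground(newick: str, foreground: Set[str], label: str = "Foreground") -> str:
    """Append a {label} tag to each terminal branch named in `foreground`."""
    def repl(match: "re.Match[str]") -> str:
        name = match.group(1)
        return name + "{" + label + "}" if name in foreground else name
    return TIP.sub(repl, newick)


tree = "((Mus_musculus:0.1,Rattus_norvegicus:0.1):0.05,Cavia_porcellus:0.3);"
print(tag_foreground(tree, {"Mus_musculus"}))
# ((Mus_musculus{Foreground}:0.1,Rattus_norvegicus:0.1):0.05,Cavia_porcellus:0.3);
```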

Our preregistration included one BUSTED analysis that we did not carry out. We had planned a direct comparison of the sexual selection hypothesis for C4BPA evolution in Primates (functioning in innate immunity and expressed in sperm, and hypothesized to coevolve with zona pellucida genes) by aligning those sequences with C4bpa genes from Glires (functioning only in innate immunity), and testing the significance of a BUSTED model with all primate lineages in the foreground class (relative to a null model with both taxa in the same nonselected class of lineages). We did not carry out that analysis because much of the alignment (including the fifth and sixth sushi domains, as well as the 5′ and 3′ nonrepetitive regions) was of doubtful quality with many alignment gaps.

Availability of data and materials

The sequence data analyzed in this study come from public databases (Ensembl; GenBank). All outputs from codon model analyses (Additional file 1: Appendix 1) of sequence alignments (Additional file 2: Appendix 2) based on data from publicly available sequence accessions (Additional file 3: Appendix 3) are available with the full-text of this article.

Abbreviations

aBSREL: Adaptive branch-site random effects likelihood

BUSTED: Branch-site unrestricted statistical test for episodic diversification

C4bpa: Alpha subunit of C4b-binding protein

C4bpb: Beta subunit of C4b-binding protein

CCP: Complement control protein repeat

HARKing: Hypothesizing after results are known

LD: Linkage disequilibrium

Ma: Million years

MEME: Mixed-effects model of evolution

RCA: Regulator of complement activation

SNP: Single nucleotide polymorphism

Sp56: Sperm protein 56

ZP: Zona pellucida

Zp1: Zona pellucida protein 1

Zp2: Zona pellucida protein 2

Zp3: Zona pellucida protein 3

Zp3r: Zona pellucida 3 receptor

Zp4: Zona pellucida protein 4

References

  1. Almeida FC, DeSalle R. Genetic differentiation and adaptive evolution at reproductive loci in incipient Drosophila species. J Evol Biol. 2017;30:524–37.

  2. Amaral AR, Möller LM, Beheregaray LB, Coelho MM. Evolution of 2 reproductive proteins, ZP3 and PKDREJ, in cetaceans. J Hered. 2011;102:275–82.

  3. Andres JA, Maroja LS, Bogdanowicz SM, Swanson WJ, Harrison RG. Molecular evolution of seminal proteins in field crickets. Mol Biol Evol. 2006;23:1574–84.

  4. Anisimova M, Liberles DA. Detecting and understanding natural selection. In: Cannarozzi GM, Schneider A, editors. Codon Evolution. London: Oxford University Press; 2012. p. 73–96.

  5. Arenas M. Trends in substitution models of molecular evolution. Frontiers Genet. 2015;7:319.

  6. Avella MA, Xiong B, Dean J. The molecular basis of gamete recognition in mice and humans. Mol Human Reprod. 2013;19:279–89.

  7. Baibakov B, Boggs NA, Yauger B, Baibakov G, Dean J. Human sperm bind to the N-terminal domain of ZP2 in humanized zonae pellucidae in transgenic mice. J Cell Biol. 2012;197:897–905.

  8. Barrett RDH, Hoekstra HE. Molecular spandrels: tests of adaptation at the genetic level. Nat Rev Genet. 2011;12:767–80.

  9. Buffone MG, Kim KS, Doak BJ, Rodriguez-Miranda E, Gerton GL. Functional consequences of cleavage, dissociation and exocytotic release of ZP3R, a C4BP-related protein, from the mouse sperm acrosomal matrix. J Cell Sci. 2009;122:3153–60.

  10. Buffone MG, Zhuang T, Ord TS, Hui L, Moss SB, Gerton GL. Recombinant mouse sperm ZP3-binding protein (ZP3R/sp56) forms a high order oligomer that binds eggs and inhibits mouse fertilization in vitro. J Biol Chem. 2008;283:12438–45.

  11. Cagliani R, Forni D, Filippi G, Mozzi A, de Gioia L, Pontremoli C, Pozzoli U, Bresolin R, Clerici M, Sironi M. The mammalian complement system as an epitome of host–pathogen genetic conflicts. Mol Ecol. 2016;25:1324–39.

  12. Capella-Gutierrez S, Silla-Martinez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3.

  13. Chen S, Costa V, Beja-Pereira A. Evolutionary patterns of two major reproduction candidate genes (Zp2 and Zp3) reveal no contribution to reproductive isolation between bovine species. BMC Evol Biol. 2011;11:24.

  14. Clark GF. The role of carbohydrate recognition during human sperm–egg binding. Hum Reprod. 2013;28:566–77.

  15. Darie CC, Janssen WG, Litscher ES, Wassarman PM. Purified trout egg vitelline envelope proteins VEβ and VEγ polymerize into homomeric fibrils from dimers in vitro. Biochim Biophys Acta. 2008;1784:385–92.

  16. Fabre P-H, Hautier L, Dimitrov D, Douzery EJP. A glimpse on the pattern of rodent diversification: a phylogenetic approach. BMC Evol Biol. 2012;12:88.

  17. Forstmeier W, Wagenmakers E-J, Parker TH. Detecting and avoiding likely false-positive findings – a practical guide. Biol Rev. 2017;92:1941–68.

  18. Friedman R, Hughes AL. Likelihood-ratio tests for positive selection of human and mouse duplicate genes reveal nonconservative and anomalous properties of widely used methods. Mol Phylogen Evol. 2007;42:388–93.

  19. Gavrilets S. Rapid evolution of reproductive barriers driven by sexual conflict. Nature. 2000;403:886–9.

  20. Gelman A, Loken E. The statistical crisis in science. Am Sci. 2014;102:460–5.

  21. Goudet G, Mugnier S, Callebaut I, Monget P. Phylogenetic analysis and identification of pseudogenes reveal a progressive loss of zona pellucida genes during evolution of vertebrates. Biol Reprod. 2008;78:796–806.

  22. Grayson P. Izumo1 and Juno: the evolutionary origins and coevolution of essential sperm–egg binding partners. R Soc Open Sci. 2015;2:150296.

  23. Han L, Monné M, Okumura H, Schwend T, Cherry AL, Flot D, Matsuda T, Jovine L. Insights into egg coat assembly and egg–sperm interaction from the X-ray structure of full-length ZP3. Cell. 2010;143:404–15.

  24. Hart MW, Stover DA, Guerra V, Mozaffari SV, Ober C, Mugal CF, Kaj I. Positive selection on human gamete-recognition genes. PeerJ. 2018;6:e4259.

  25. Hart MW, Sunday JM, Popovic I, Learning KJ, Konrad CM. Incipient speciation of sea star populations by adaptive gamete recognition coevolution. Evolution. 2014;68:1294–305.

  26. Hirohashi N, Kamei N, Kubo H, Sawada H, Matsumoto M, Hoshi M. Egg and sperm recognition systems during fertilization. Devel Growth Differ. 2008;50:S221–38.

  27. Hofmeyer T, Schmelz S, Degiacomi MT, Dal Peraro M, Daneschdar M, Scrima A, van den Heuvel J, Heinz DW, Kolmar H. Arranged sevenfold: structural insights into the C-terminal oligomerization domain of human C4b-binding protein. J Mol Biol. 2012;425:1302–17.

  28. Hourcade D, Holers VM, Atkinson JP. The regulators of complement activation (RCA) gene cluster. Adv Immunol. 1989;46:381–416.

  29. Huelsenbeck JP, Ronquist F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17:754–5.

  30. Hughes AL. Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. Heredity. 2007;99:364–73.

  31. Jordan G, Goldman N. The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Mol Biol Evol. 2012;29:1125–39.

  32. Kappeler PM. Patterns of sexual dimorphism in body weight among prosimian primates. Fol Primatol. 1991;57:132–46.

  33. Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol. 2006;6:29.

  34. Kober KM, Pogson GH. Genome-wide signals of positive selection in strongylocentrotid sea urchins. BMC Genomics. 2017;18:555.

  35. Krushkal J, Bat O, Gigli I. Evolutionary relationships among proteins encoded by the regulator of complement activation gene cluster. Mol Biol Evol. 2000;17:1718–30.

  36. Leigh SR, Shea BT. Ontogeny and the evolution of adult body size dimorphism in apes. Am J Primatol. 1995;36:37–60.

  37. Liang LF, Dean J. Conservation of mammalian secondary sperm receptor genes enables the promoter of the human gene to function in mouse oocytes. Devel Biol. 1993;156:399–408.

  38. Litscher ES, Williams Z, Wassarman PM. Zona pellucida glycoprotein ZP3 and fertilization in mammals. Mol Reprod Devel. 2009;76:933–41.

  39. Louros NN, Chrysina ED, Baltatzis GE, Patsouris ES, Hamodrakas SJ, Iconomidou VA. A common ‘aggregation-prone’ interface possibly participates in the self-assembly of human zona pellucida proteins. FEBS Lett. 2016;590:619–30.

  40. Markova-Raina P, Petrov D. High sensitivity to aligner and high rates of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Res. 2011;21:863–74.

  41. Meslin C, Mugnier S, Callebaut I, Laurin M, Pascal G, Poupon A, Goudet G, Monget P. Evolution of genes involved in gamete interaction: evidence for positive selection, duplications and losses in vertebrates. PLoS One. 2012;7:e44548.

  42. Mold DE, Dinitz AE, Sambandan DR. Regulation of zebrafish zona pellucida gene activity in developing oocytes. Biol Reprod. 2009;81:101–10.

  43. Monné M, Jovine L. A structural view of egg coat architecture and function in fertilization. Biol Reprod. 2011;85:661–9.

  44. Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O’Connell MJ. Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evol Biol. 2010;10:39.

  45. Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O’Connell MJ. Erratum to: positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evol Biol. 2017;17:170.

  46. Muller MN, Thompson ME, Kahlenberg SM, Wrangham RW. Sexual coercion by male chimpanzees shows that female choice may be more apparent than real. Behav Ecol Sociobiol. 2011;65:921–33.

  47. Muro Y, Buffone MG, Okabe M, Gerton GL. Function of the acrosomal matrix: Zona pellucida 3 receptor (ZP3R/sp56) is not essential for mouse fertilization. Biol Reprod. 2012;86:23.

  48. Murrell B, Weaver S, Smith MD, Wertheim JO, Murrell S, Aylward A, Eren K, Pollner T, Martin DP, Smith DM, Scheffler K, Kosakovsky Pond SL. Gene-wide identification of episodic selection. Mol Biol Evol. 2015;32:1365–71.

  49. Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Kosakovsky Pond SL. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 2012;8:e1002764.

  50. Neri-Arboleda I, Stott P, Arboleda NP. Home ranges, spatial movements and habitat associations of the Philippine tarsier (Tarsius syrichta) in Corella, Bohol. J Zool. 2002;257:387–402.

  51. Okabe M. Beware of memes in the interpretation of your results – lessons from gene-disrupted mice in fertilization research. FEBS Lett. 2018a;592:2673–9.

  52. Okabe M. Sperm–egg interaction and fertilization: past, present, and future. Biol Reprod. 2018b;99:134–46.

  53. Okrój M, Blom AM. C4b-binding protein. In: Barnum S, Schein T, editors. The complement FactsBook. 2nd ed. New York: Elsevier; 2018. p. 251–9.

  54. Papadopoulos JS, Agarwala R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics. 2007;23:1073–9.

  55. Patiño S, Keever CC, Sunday JM, Popovic I, Byrne M, Hart MW. Sperm bindin divergence under sexual selection and concerted evolution in sea stars. Mol Biol Evol. 2016;33:1988–2001.

  56. Privman E, Penn O, Pupko T. Improving the performance of positive selection inference by filtering unreliable alignment regions. Mol Biol Evol. 2012;29:1–5.

  57. Redelings B. Erasing errors due to alignment ambiguity when estimating positive selection. Mol Biol Evol. 2014;31:1979–93.

  58. Rohlfs RV, Swanson WJ, Weir BS. Detecting coevolution through allelic association between physically unlinked loci. Am J Hum Genet. 2010;86:674–85.

  59. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:914–8.

  60. Schmidt HA, Strimmer K, Vingron M, von Haeseler A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002;18:502–4.

  61. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66.

  62. Simmons JP, Nelson LD, Simonsohn U. False-positive citations. Perspect Psychol Sci. 2018;13:255–9.

  63. Smith MD, Wertheim JO, Weaver S, Murrell B, Scheffler K, Kosakovsky Pond SL. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol Biol Evol. 2015;32:1342–53.

  64. Springer MS, Emerling CA, Meredith RW, Janecka JE, Eizirik E, Murphy WJ. Waking the undead: implications of a soft explosive model for the timing of placental mammal diversification. Mol Phylogenet Evol. 2017;106:86–102.

  65. Springer MS, Meredith RW, Gatesy J, Emerling CA, Park J, Rabosky DL, Stadler T, Steiner C, Ryder OA, Janecka JE, Fisher CA, Murphy WJ. Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS One. 2012;7:e49521.

  66. Swann CA, Cooper SJB, Breed WG. Molecular evolution of the carboxy terminal region of the zona pellucida 3 glycoprotein in murine rodents. Reproduction. 2007;133:697–708.

  67. Swann CA, Cooper SJB, Breed WG. The egg coat zona pellucida 3 glycoprotein – evolution of its putative sperm-binding region in Old World murine rodents (Rodentia: Muridae). Reprod Fertil Dev. 2017;29:2376–86.

  68. Swanson WJ, Vacquier VD. Reproductive protein evolution. Annu Rev Ecol Syst. 2002;33:161–79.

  69. Swanson WJ, Zhang ZH, Wolfner MF, Aquadro CF. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc Natl Acad Sci U S A. 2001;98:2509–14.

  70. Thompson JD, Plewniak F, Ripp R, Thierry JC, Poch O. Towards a reliable objective function for multiple sequence alignments. J Mol Biol. 2001;314:937–51.

  71. Torgerson DG, Kulathinal RJ, Singh RS. Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Mol Biol Evol. 2002;19:1973–80.

  72. Turner LM, Hoekstra HE. Adaptive evolution of fertilization proteins within a genus: variation in ZP2 and ZP3 in deer mice (Peromyscus). Mol Biol Evol. 2006;23:1656–69.

  73. Turner LM, Hoekstra HE. Reproductive protein evolution within and between species: maintenance of divergent ZP3 alleles in Peromyscus. Mol Ecol. 2008;17:2616–28.

  74. Van de Peer Y, Taylor JS, Braasch I, Meyer A. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J Mol Evol. 2001;53:436–46.

  75. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006;4:e72.

  76. Wassarman PM. Mammalian fertilization: the strange case of sperm protein 56. BioEssays. 2009;31:153–8.

  77. Wassarman PM, Litscher ES. A bespoke coat for eggs: getting ready for fertilization. Curr Top Dev Biol. 2016;117:539–52.

  78. Wassarman PM, Litscher ES. The mouse egg’s zona pellucida. Curr Top Dev Biol. 2018;130:331–56.

  79. Weaver S, Shank SD, Spielman SJ, Li M, Muse SV, Kosakovsky Pond SL. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes. Mol Biol Evol. 2018;35:773–7.

  80. Wilburn DB, Swanson WJ. From molecules to mating: rapid evolution and biochemical studies of reproductive proteins. J Proteomics. 2016;135:12–25.

  81. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res. 2018;46:D754–61.

  82. Zhai WW, Nielsen R, Goldman N, Yang ZH. Looking for Darwin in genomic sequences – validity and success of statistical methods. Mol Biol Evol. 2012;29:2889–93.

Acknowledgements

Thanks to members of the Crawford Laboratory at Simon Fraser University for constructive criticism of the analyses and interpretation.

Funding

The authors were supported during the analysis and writing by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (no. 05404) to MWH.

Author information

Contributions

MWH conceived the study. CCM and MWH designed the analysis, analyzed the data, and wrote the manuscript. Both authors have read and approved the final manuscript.

Corresponding author

Correspondence to Michael W. Hart.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Appendix S1. aBSREL and MEME output files, and summary of codon model analyses. (ZIP 2616 kb)

Additional file 2:

Appendix S2. Input files for codon model analyses (alignments, Newick strings) in FASTA format (ZIP 91 kb)

Additional file 3:

Appendix S3. Summary of species names, accession numbers, and Newick strings used in codon model analyses (XLSX 51 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Morgan, C.C., Hart, M.W. Molecular evolution of mammalian genes with epistatic interactions in fertilization. BMC Evol Biol 19, 154 (2019). https://doi.org/10.1186/s12862-019-1480-6

  • DOI: https://doi.org/10.1186/s12862-019-1480-6

Keywords