Immune genes undergo more adaptive evolution than non-immune system genes in Daphnia pulex

Background Understanding which parts of the genome have been most influenced by adaptive evolution remains an unsolved puzzle. Some evidence suggests that selection has the greatest impact on regions of the genome that interact with other evolving genomes, including loci that are involved in host-parasite co-evolutionary processes. In this study, we used a population genetic approach to test this hypothesis by comparing DNA sequences of 30 putative immune system genes in the crustacean Daphnia pulex with 24 non-immune system genes. Results In support of the hypothesis, results from a multilocus extension of the McDonald-Kreitman (MK) test indicate that immune system genes as a class have experienced more adaptive evolution than non-immune system genes. However, not all immune system genes show evidence of adaptive evolution. Additionally, we apply single locus MK tests and calculate population genetic parameters at all loci in order to characterize the mode of selection (directional versus balancing) in the genes that show the greatest deviation from neutral evolution. Conclusions Our data are consistent with the hypothesis that immune system genes undergo more adaptive evolution than non-immune system genes, possibly as a result of host-parasite arms races. The results of these analyses highlight several candidate loci undergoing adaptive evolution that could be targeted in future studies.


Background
The antagonistic interaction between host and pathogen results in adaptation and counter-adaptation in both players. The continual need for host immune systems to respond to evolving pathogens has led to the prediction that immune genes will show greater evidence of adaptive evolution than the genome average. This prediction has been supported in a recent study documenting the spatial allele frequency distributions in humans [1] and in studies comparing amino acid divergence between humans and chimpanzees [2], rodents [3], chickens [4] and between Drosophila species [5,6]. However, not all genes within the immune system show a strong signature of adaptive evolution, possibly reflecting the diverse functional roles that immune system genes play. For example, the Drosophila study of Obbard et al. [5] concluded that the global pattern was driven by a small number of genes within a limited portion of the immune system such as antiviral RNAi genes and members of the immune deficiency (IMD) pathway. In addition to the expectation that immune genes as a class will show more adaptive divergence than non-immune system genes, it has also been hypothesized that immune genes involved in the recognition of pathogens will show stronger signatures of positive selection than signal transduction molecules, which are more likely to be the subjects of purifying selection [7]. In order to test the generality of these hypotheses, the analyses of multiple immune genes need to be extended to additional species.
Here, we estimated the proportion of nucleotide sites that have been subjected to positive selection in immune versus non-immune genes in the crustacean Daphnia pulex. We compared a set of 30 putative immune system genes to 24 non-immune system genes. The immunity genes were initially identified by searching the D. pulex genome for homologous sequences to genes known to be involved in the immune system of other arthropods [8]. Coupling a phylogenetic approach such as McTaggart et al. [8] with population genetic data can provide additional support that a gene might have an immunological role, and specifically whether it is co-evolving with pathogens. Additionally, this type of approach can potentially uncover whether homologs play different roles in different taxa. Finally, population genetic theory provides the framework to identify and describe the different types of selection that might be at play in host-pathogen interactions. Our aim was to address the following questions: (1) Do immune system genes in Daphnia undergo greater adaptive protein evolution than non-immune system genes? (2) Which immune system genes show the strongest signature of adaptive change? (3) What type of selective pressure (i.e. directional selection or balancing selection) are the identified genes under?

Results
We amplified 54 loci from Daphnia pulex (mean number of individuals = 9.1, range =2-12) and 49 loci from D. parvula (mean number of individuals = 2.5, range 1-4) ( Table 1). We were not successful in amplifying a PCR product in all individuals at all loci. In particular, clone CC was difficult to amplify. Phylogenetic analysis indicates that CC belongs to an incipient lineage of D. pulex, found in western Oregon, called D. arenata [9], hence it was removed from all analyses, unless we failed entirely to amplify a PCR product from D. parvula (N = 5 loci). In the latter case, D. arenata samples were used as the outgroup. In total, 24 genes were classified as non-immune genes and 30 genes were classified as putative immune genes (Table 1). There was no difference in the mean sequence diversity (π) between the cyclical parthenogenic (sexual) and the obligate asexual D. pulex samples (paired two-sample t-test 44 = 0.97, p = 0.33) (data not shown). Thus, all D. pulex samples were considered as a single group for all other tests.

Sequence diversity and divergence
There was a significant difference in the average pairwise number of synonymous substitutions per synonymous site (K s ) between D. pulex and D. arenata compared to D. pulex and D. parvula (t-test, t 18 = −3.058, p-value = 0.007) ( Table 1), consistent with the much closer relationship between D. pulex and D. arenata. Since including two species in the ingroup would artificially inflate estimates of its sequence diversity, the D. arenata samples were excluded from the McDonald-Kreitman tests, except for 5 single-locus tests in which they comprised the outgroup. Likewise, all D. arenata samples were removed from the diversity and divergence calculations and multilocus analysis. Over the 49 remaining genes, the average pairwise sequence diversity between D. pulex and D. parvula at synonymous sites (π s ) was =0.0329 and at non-synonymous sites (π a ) was =0.0057 (Table 1), and the average pairwise sequence divergence at synonymous sites (K s ) was 0.1260 and at nonsynonymous sites (K a ) was 0.0182 (Table 1). There is no significant difference between the average π s -an estimate of 4N e μ -of non-immune and putative immune genes (Wilcoxon rank sum test W = 296.5, p = 1.0), or in the average π a between the two gene classes (Wilcoxon rank sum test W = 311.0, p = 0.78). Similarly, there is no difference in the average divergence at either synonymous or nonsynonymous sites between the two classes of genes (K s : Wilcoxon rank sum test W = 314.5, p = 0.67; K a : Wilcoxon rank sum test W = 309.5, p = 0.81).
Two loci, Caspase 8 and Chitinase 17 have extremely high diversity (π s =0.1456 and 0.1514, respectively) compared to the average (π s =0.0329), and comparable with the average K s (Table 1), which could result if two paralogous loci were sequenced instead of one, or if two highly divergent alleles were being maintained by balancing selection. BLASTn of the sequences to the D. pulex genome does not suggest that we had amplified two loci. To be conservative they were removed from the multilocus MK analysis. The extremely high pairwise sequence diversity (π s ) seen for these loci also results in significantly positive Tajima's D statistics, reflecting a high proportion of intermediate-frequency alleles (D. pulex Caspase 8: D = 2.29, p = 0.01; D. pulex Chitinase 17: D = 1.81, p = 0.02). Again, this is consistent with either the amplification of multiple loci or balancing selection. Caspase 8 also has the highest K a /K s ratio (0.7347) ( Table 1), which is driven by an extreme excess of nonsynonymous substitutions between the species (K aCASPASE8 = 0.1456, K aAVERAGE = 0.0182). K sCASPASE8 is also higher than average (=0.1994 versus 0.1260 respectively), although to a lesser extent.

McDonald-Kreitman tests
The McDonald-Kreitman test compares the within species nucleotide diversity to between species nucleotide divergence at synonymous and non-synonymous sites [10]. Evolutionary theory predicts that if nucleotide substitutions are neutral, then the ratio of synonymous: non-synonymous changes will be equivalent within and between species. Forty of the 54 loci showed sufficient polymorphism to conduct individual McDonald-Kreitman (MK) tests (Table 2). Of these, 3 loci were significant or marginally non-significant prior to a Benjamini-Hochberg False Discovery Rate (FDR) correction (Mannose phosphate isomerase (MPI), p = 0.03; Scavenger 3, p = 0.04; and Toll 1, p = 0.05). The significance is due to an excess of nonsynonymous divergence at the Toll 1 and Scavenger 3 loci (indicative of positive selection), and an excess of non-synonymous polymorphism in MPI (indicative of weakly deleterious   Forty-seven out of 54 loci were included in the multilocus model (20 control genes and 27 immunity genes). Genes that used D. arenata as the outgroup (N = 5) were excluded from this analysis as their inclusion violates the assumption of the model that the divergence time is equal at all loci. Caspase 8 and Chitinase 17 were removed for the reasons described above. The results of the likelihood ratio tests amongst the four models show that both M2 and M3 are significantly better than the null model M0 (Table 3). Of these two, the Akaike information criterion strongly supports model M3, where a separate α value is estimated for each class of genes (immune versus control), over M2 (Akaike weight (wi) =0.96 versus 0.0 respectively) ( Table 3). This is consistent with a significant permutation test between the two classes of genes (α nonimmune = −0.27 (2.5 and 97.5% bootstrap intervals (BI): -1.003, 0.320), α immune = 0.33 (BI: 0.029,0.607), p = 0.049 by permuting genes between classes).
The rank order between the estimate of the absolute number of adaptive nonsynonymous substitutions/nonsynonymous site (a) and α differed (Table 4), indicating that the level of constraint may differ among the genes. The five loci with the highest a values are all classed as immune system genes (Toll 1, Caspase 3, GNBP 1, MyD88 and Dicer). However, there is not a clear split between immune and control genes; indeed 7 putative immune genes are amongst those with negative a values. The gene with the highest estimated a = 0.058 (Toll 1) was marginally non-significant prior to the FDR correction (p = 0.052) in the single locus MK test. The other gene that had significant MK tests prior to correcting for multiple testing (MPI), and was included in the multilocus MK test, was estimated to have a negative a     Table 3  value, indicating that either that some mildly deleterious alleles may be segregating at this locus, or that it was insufficiently sampled.

Discussion
Immune genes in Daphnia undergo greater adaptive evolution than non-immune genes Consistent with the results obtained from humans [2], rodents [3], chickens [4] and fruit flies [5], our data support the hypothesis that on average immune system genes have experienced more adaptive evolution than non-immune system genes. Specifically, the most likely evolutionary model for our data fits a separate and higher value of proportion of the non-synonymous divergence due to adaptive evolution (α) for immune (=0.33) versus non-immune genes (−0.27) ( Table 3). This is particularly striking since our classification of immune versus non-immune genes is likely to be conservative because, due to lack of functional information, we may have included immune genes in the non-immune gene group, which will lessen the actual the difference between the two classes. It is less likely that we have erroneously included non-immune genes in the immune class, as immune genes were only classified as such if functional information was available at homologous loci in other species. Thus, this study illustrates that the effect of host-parasite co-evolution -which is the most likely driver of adaptive change in immune system genesis strong and consistent across a diverse taxonomic range. As not all immune system genes exhibit evidence of adaptive evolution, our data confirm the general conclusion that immune system genes are subject to a wide range of selective pressures [3,5,7]. One explanation for this variation is that molecules interacting directly with pathogens, either in their recognition or destruction undergo more adaptive changes than molecules that do not (e.g. signal transducers). In support of this hypothesis, a study in Drosophila found that recognition moleculesbut not signalling or effector moleculeshad a significantly higher proportion of positively selected genes than the genome average [7]. Additionally, effector molecules show evidence of positive selection in frogs and termites (reviewed in [11]). In contrast, all members of the IMD pathway in Drosophila, including the signal transducers (except for IMD),  showed evidence of adaptive change, while the recognition molecule that initiates this cascade did not [5]. While our dataset does not contain all members from a pathway, our results do not support the hypothesis; indeed a signal transducer (MyD88) in the Toll pathway is one of the five genes with the highest number of adaptive substitutions/site (a).
Population genetic tests for adaptive evolution reveal likely candidate immune genes under selective pressure In addition to testing if immune system genes as a class undergo more adaptive change than non-immune system genes, the estimates of α and a and/or single-locus MK tests from this study have identified several loci that may be evolving in a non-neutral manner. The first of these, Toll 1, encodes a transmembrane receptor that is activated by recognition proteins after fungal and bacterial invasion. Toll 1 had the highest proportion of nonsynonymous divergence due to adaptive evolution (α=0.91), as well as the highest number of adaptive substitutions per site (a = 0.058). Additionally, the single locus MK test was marginally non-significant because of an elevated number of nonsynonymous fixed differences between the two Daphnia species. Combined with the fact that this locus does not have an elevated synonymous site sequence diversity, indeed it is slightly lower than the average (π s =0.02 versus 0.03), we suggest that Toll 1 may be evolving by directional selection. In contrast, we did not detect a signature of selection in the two additional Toll genes included in this study (Toll 2 and Toll 3). A phylogenetic analysis indicates that Toll 1 is in a clade with Toll 2, while Toll 3 is in a separate clade [8]. From this we speculate that the functionality of the Toll 1 gene has evolved since its duplication from Toll 2.
Similarly, the McDonald-Kreitman test showed that the Scavenger 3 locus had a disproportionate excess of fixed nonsynonymous substitutions between D. pulex and D. arenata (G = 4.17, P = 0.04), consistent with adaptive divergence between the two species. Scavenger receptors are responsible for binding a range of foreign ligands (including bacteria) prior to cellular internalization, and are therefore are candidate subjects of host-pathogen coevolution. At the Scavenger 3 locus, we also observed that the proportion of silent polymorphisms in D. pulex was as high as silent divergence (π s =0.0715 and K s =0.0775 respectively), but that π a was within the normal range. In contrast, Scavenger receptors in Drosophila have a rate of silent polymorphism that conforms to neutral expectations, but a higher rate of non-synonymous diversity compared to the genome average [12]. The observed pattern of variation in the Daphnia Scavenger 3 locus could be due to the amplification of multiple loci, recombination between loci or balancing selection. Similarly, at the Chitinase 17 and Caspase 3 loci, these factors could drive the level of π s to be similar to K s , and Tajima's D to be positive and significant. Thus, these three loci may be interesting targets for future population genetic studies examining the effects of host-pathogen co-evolution.
Finally, the results of the single locus MK test indicates that a non-immune gene, the MPI locus, deviates from neutral expectations due to an excess of segregating nonsynonymous sites, which might be maintained by balancing selection. The negative value of α at this locus supports this hypothesis. However the deviation from neutral expectations could also be caused by weakly deleterious segregating alleles. The present study is unable to distinguish between these two hypotheses. Nonetheless, studies in the acorn barnacle (a crustacean) have shown that MPI variation is differentially selected between distinct intertidal thermal habitats [13]. The Daphnia sampled in this study span a large geographic distance (Table 5) and therefore may experience selective structuring among populations due to some unknown environmental gradient. Indeed, the non-synonymous sites segregating in D. pulex at the MPI locus are only found in the westernmost populations (GU and PH). Since this is the only locus where polymorphisms segregate in this manner, this pattern is not due to population subdivision. Further sampling at this locus in conjunction with an assessment of allelic fitness are needed to test if the MPI locus is under balancing selection in Daphnia, as it appears to be in the acorn barnacle.

Conclusions
We found that, as a class, immune system genes show more adaptive evolution than non-immune system genes, supporting the hypothesis that host-parasite coevolution is an important force in shaping genomes. As in Drosophila, this result appears to be driven by a few genes within the class. Our results also confirm that immune system genes experience a broad gradient of selective pressures and highlight that the targets of selection can differ among taxonomic groups. We have also identified two candidate loci under balancing selection and one under directional selection, demonstrating that the mode of action of selection in host-pathogen coevolution is variable.

Model system
Origin of samples, RNA extraction, cDNA synthesis Daphnia are small aquatic crustaceans. Their primary mode of reproduction is via cyclical parthenogeneis, although some clones have lost the ability to reproduce sexually and only undergo clonal propagation. Individuals isolated from natural populations can be clonally propagated in the lab. For this experiment, a single Daphnia pulex clone was collected from 12 different populations from across North America. Two individual D. parvula clones were collected from each of 2 ponds in Ohio (Table 5). Total RNA was extracted from five D. pulex clones or 10 D. parvula clones (which are considerably smaller in size) per population using the RNeasy Midi Kit (Qiagen), following the manufacturer's protocol, and further purified with RNAase-free DNAase (Promega). cDNA was synthesized in 20ul volumes using the Promega RT system (A3500) and subsequently diluted five-fold in ultrapure water. Two ul of this solution was used in a 20ul PCR reaction.

Primer design, PCR amplification, sequencing, cloning
Specific primers for all genes were designed using Primer 3 [14]. The PCR mix was as follows: 1X buffer (10 mM Tris-HCl, pH 8.3, 50 mM KCl), 1.25 mM MgCl 2 , 10uM dNTPs, 0.2uM primers, 1U Taq polymerase. PCR conditions were as follows: 94°C for 3 minutes followed by 10 cycles of 94°C for 30 seconds, 65°C for 20 seconds (−1°C/ cycle), 72°C for 2 minutes. This was followed by 33 cycles of the following: 94°C for 30 seconds, 55°C for 20 seconds, 72°C for 2 minutes. A final extension step of 72°C for 3 minutes was carried out. The presence of a product of the expected size was confirmed on a 1% agarose gel.
Prior to sequencing, PCR products were cleaned by treatment with shrimp alkaline phosphatase and exonuclease I. Treated PCR products were sequenced directly in both directions using the gene specific primers in a 10 μl mix containing 1.0 μl Big Dye (Amersham), 3.0 μl 5X buffer (Amersham), 0.5 μl primer and 1-3 μl of PCR product. Sequences were aligned to the genome sequence to verify that the correct gene had been amplified.
Sequences for five of the genes (Caspase 8, PPO, Scavenger 4, Toll 1 and Eukaryotic translation initiation factor 2γ) displayed hallmark features of indels (a 'clean' sequence followed by sequence with multiple peaks, which always began at the same nucleotide position, regardless of the sample). All PCR products with such a sequencing profile was cloned into the TOPO TA vector (Invitrogen), following the manufacturer's instructions. Six of the resulting colonies were randomly chosen, added directly to a PCR mix as above, except that plasmid specific primers M13F and M13R were used. Each resulting PCR product was then sequenced as above. All sequences are available under GenBank accession numbers [JQ723152-JQ723164;JQ856341-JQ856992].

Sequence and data analysis
Primer and vector sequences were removed from all sequences, and then aligned and edited by eye using SeqMan (DNAstar). All nucleotide positions were visually inspected to identify heterozygous peaks. We obtained pseudohaplotypes from the heterozygous sequences using the program PHASE v. 2.1 [15]. We calculated the number of synonymous and nonsynonymous polymorphisms and fixed differences using DNAsp (v5.10.01) [16]. Furthermore, we used DNAsp to calculate within species synonymous (π s ) and nonsynonymous (π a ) diversity, and between species synonymous (K s ) and non-synonymous (K a ) divergence.
Directional or balancing selection will alter allele frequencies from neutral expectations. Tajima's D statistic Also using DnaSP, we conducted a McDonald-Kreitman (MK) test for each locus, which compared the observed ratios of polymorphism within a species and divergence between species to those expected under a neutral model [10]. All tests were corrected with a Benjamini-Hochberg False Discovery Rate procedure. We also conducted a multilocus MK test using the software MKtest v3.2 (modified from [17]), which estimates the proportion of non-synonymous divergence due to adaptive evolution (denoted α) for specified gene classes. We constructed the following four models: the null model (M0) assumes a single value of α=0, M1 also assumes a single value of α, but it is estimated from the data, M2 where α is estimated at each locus and finally M3 estimates a separate value of α for each of the two gene classes (immune versus non-immune).
For each model, we only varied the parameter α, and kept the remaining parameters constant as follows: the neutral diversity per site (θ=4N e μ) took a single value at all loci for each of the two Daphnia species, and finally, the proportion of mutations at non-synonymous sites that are under strong purifying, positive or no selection (f ) was estimated at each locus. For each model, we calculated Akaike's Information Criterion corrected for small samples (AICc), the difference in AICc between each model and the model with the minimum AICc (Δι), and the Akaike weight (wi) [18]. We tested for the model with the best fit directly with the likelihood ratio test as well as wi. To calculate the 95% bootstrap intervals of the α estimates for M3, each gene class was bootstrapped 10 000 times and the 2.5 and 97.5% quantiles of the resulting distributions were determined. To test whether the α values estimated in M3 were significantly different from one another, we generated 10 000 permutations in which the loci were randomly assigned to one of two gene classes, and values of their respective α were estimated. We then determined if observed |α IMMUNEα NONIMMUNE | was greater than 95% of the |α IMMUNEα NONIMMUNE | from permuted dataset.
Ten of the loci examined are known housekeeping genes, and thus were automatically classed as nonimmune system genes. Single copy genes that were known to be involved in the immune systems of other invertebrate taxa (i.e. Pelle, MyD88) and genes from multigene families that had no known function other than in immunity (i.e. Gram-negative binding proteins, TEPs, and Toll receptors) were classified as immune system genes. However, some of the genes belong to multigene families (i.e. Caspases, Chitinase and Scavenger receptors), which are known to have roles in nonimmune as well as immune pathways. Thus, we used published phylogenies [8] to guide our classification of these gene family members. Any Daphnia gene that was in the same clade as a gene that had a empirically determined functional role in the immune system were classified as immune, while others were classified as non-immune. As functional data on many of these genes is sparse, this approach is conservative, as many putative immune genes will be classified as non-immune genes. In the case of inducible nitric oxide synthase (iNOS), where Daphnia has two gene copies, both were classified as putative immune system genes.
Since α is calculated as the proportion of nonsynonymous substitutions fixed by selection, it is inherently dependant on the number of nonsynonymous substitutions fixed by random genetic drift. Thus, two loci having an identical number of adaptive amino acid changes could have very different values of α if drift had fixed different numbers of nonsynonymous substitutions, i.e. if constraint also differs between the genes. Therefore, an alternate parameterisation was also used, in which "a" is the estimate of the absolute number of adaptive nonsynonymous substitutions/nonsynonymous site [19]. Thus, for each locus in M2 we estimated a, as described by the MK-Test User Guide. These values are ranked to assess which genes were experiencing the greatest evolution due to positive selection. Positive values of both α and a reflect the amount of positive sequence evolution, while negative numbers will result from sampling error or from the presence of mildly deleterious mutations segregating at low frequency. We attempted to correct for the effect of mildly deleterious alleles by removing all alleles that were segregating at a frequency of less than 10% prior to all analyses [20].