Evidence of balanced diversity at the chicken interleukin 4 receptor alpha chain locus

Background The comparative analysis of genome sequences emerging for several avian species with the fully sequenced chicken genome enables the genome-wide investigation of selective processes in functionally important chicken genes. In particular, because of pathogenic challenges it is expected that genes involved in the chicken immune system are subject to particularly strong adaptive pressure. Signatures of selection detected by inter-species comparison may then be investigated at the population level in global chicken populations to highlight potentially relevant functional polymorphisms. Results Comparative evolutionary analysis of chicken (Gallus gallus) and zebra finch (Taeniopygia guttata) genes identified interleukin 4 receptor alpha-chain (IL-4Rα), a key cytokine receptor as a candidate with a significant excess of substitutions at nonsynonymous sites, suggestive of adaptive evolution. Resequencing and detailed population genetic analysis of this gene in diverse village chickens from Asia and Africa, commercial broilers, and in outgroup species red jungle fowl (JF), grey JF, Ceylon JF, green JF, grey francolin and bamboo partridge, suggested elevated and balanced diversity across all populations at this gene, acting to preserve different high-frequency alleles at two nonsynonymous sites. Conclusion Haplotype networks indicate that red JF is the primary contributor of diversity at chicken IL-4Rα: the signature of variation observed here may be due to the effects of domestication, admixture and introgression, which produce high diversity. However, this gene is a key cytokine-binding receptor in the immune system, so balancing selection related to the host response to pathogens cannot be excluded.


Background
The chicken represents one of our most important sources of food protein worldwide but remains a potential threat to human health as a reservoir for diseases and foodborne pathogens. Emerging diseases such as avian influenza [1] provide a new impetus to investigate chicken immunity -in particular the relationship between population diversity and disease susceptibility.
The geographic distribution, population densities and disease epidemiology of chickens is likely to have changed dramatically since their domestication, undoubtedly shaping their genetic diversity. Novel diseases and increased incidence of infection would have challenged the chicken immune response, necessitating adaptive evolution at key immune genes. Evidence for such adaptation is found in the sequence conservation of immunityrelated genes, the lowest of any functional category [2], and in several studies reporting the association of allelic variation at particular immune genes with susceptibility to infection. For example, different alleles at the chicken MHC-B locus are known to alter susceptibility to a diverse array of diseases [3]. Genes such as the chicken Mx gene, which determines susceptibility to the myxovirus [4], have been shown to be subject to selection [5,6]. Genes involved in the immune system therefore represent appealing candidates for examining the selective processes shaping genetic diversity. Knowledge about the nature of selection acting on a gene can illuminate their evolutionary history and can provide insight into the complex relationship between diseases and genes [7].
New large-scale sequencing projects in several avian species, for instance the zebra finch genome project http:// songbirdgenome.org, now allow the genome-wide comparative analysis of avian genes and the detection of selection on a wider scale. Approximately 20% amino acid changes between chicken and zebra finch have been fixed by positive selection [8], so by comparing coding sequences (CDS) between these birds, chicken genes with signals suggestive of adaptation can be identified.
In this study, we report that the chicken interleukin receptor 4 alpha chain gene (IL-4Rα) showed a relative excess of nonsynonymous substitutions and may be subject to selection. It is associated with disease: for example, its expression is downregulated by the avian influenza virus during infection [1]. The human ortholog of this gene encodes a transmembrane receptor for IL-4 and IL-13, both of which are key immune system cytokines that initiate signalling pathways in the inflammatory response to infection [9]. The IL-4Rα gene was resequenced in 70 Asian and African village chickens, 20 commercial broilers, and in 6 closely related species: red, grey, Ceylon and green jungle fowl (JF), bamboo partridge and grey franco-lin. High allelic variation at this gene appeared to be balanced at two nonsynonymous SNP sites in particular. Although this may enhance immune system variability in response to challenges by pathogens, a consequence of the complex domestication history of the chicken is that introgression, multiple origins and migration are likely to have altered the pattern of diversity at this locus, complicating selection signatures.

Identifying candidate genes subject to selection
As the most extensively sequenced other bird species, all available zebra finch genes were compared with the chicken genome. This was achieved by clustering [10] validated zebra finch mRNAs and expressed sequence tags, then using chicken protein sequences to search this zebra finch database with Blastx, [11] and subsequently implementing T-Coffee [12] to generate 3,653 pairwise CDS alignments from the Blastx best-hit pairs (for details see supplementary methods).
Pairwise d N /d S (ω) was calculated for each CDS alignment using the codeml implementation of the PAML 3.15 package [13]. If synonymous and nonsynonymous mutations are neutral, the relative rates of each are expected to be equal so that ω = 1 [14]. Departures from this, where ω > 1 (d N > d S ) suggest that nonsynonymous mutations are advantageous, and are maintained under directional selection. If ω < 1 (d N <d S ) then the nonsynonymous SNPs may be deleterious since they are not preserved and are likely to be subject to purifying selection. We compared ω by maximum likelihood under two different models: a neutral model where ω was fixed = 1, and a model where ω was free to vary. These models were compared using a likelihood ratio test (LRT) to determine if the variable model was significantly better at explaining the data [13].
As a consequence of this conservative strategy of calculating ω across the entire gene length, genes may be discounted when the signal of directional selection is focused on specific regions or domains and thus obscured by purifying selection operating on the majority of the gene [15]. Many genes known to be subject to positive selection have 0.5 <ω < 1 [16], so using a lower cut-off point to identify candidate genes that may be subject to selection can be effective. Accordingly, chicken-zebra finch alignments with ω > 0.5 where the variable model was significantly favoured (p < 0.05) were identified. The annotation associated with the best human orthologs from the Panther database [17] was used to identify the function of chicken genes with relevance to the immune system.
The chicken IL-4Rα mRNA sequence (Refseq ID: XM_414885), initially determined by Boardman et al. (GenBank accession: CR407301) and Caldwell et al. [18], PCR primers were designed using Primer3 software http:/ /frodo.wi.mit.edu and were constructed by VHBio, UK http://www.vhbio.com. The details of the primer sequences and optimal parameters for their usage are available in Additional file 2 (Table S1). Each amplicon was amplified according to the PCR cycle setup (Table S2 in Additional file 2): 8 were successfully amplified for all 96 samples (Figure 1). The use of this large sample size increases probability of identification of a target of selection [21]. The forward and reverse PCR product sequences were determined by Agowa, Germany http:// www.agowa.de.

Sequence assembly
Sequencing reads were assembled into contigs using the Phred-Phrap-Consed-Polyphred pipeline programs http:/ /www.phrap.org/phredphrapconsed.html Phrap v0.990319 and Phred v0.020425.c [22,23]. Bases were called with Consed [24] using P(base is correct) = 1-10 -S/ 10 , where S the base quality score [25]. Any bases with S < 20 were not included in the analysis, so all bases had at least a 99.0% probability of being correct: most had S ≥ 40 (99.99%). Only SNPs with high probability of being accurate (polyphred ranks 1, 2 or 3) were used in further analyses, and only SNPs in polyphred rank 1 were used for the outgroup samples. Polyphred version 5.0 [26,27] was used to assemble the data for further processing.
A list of the genotypes for each sample was collated and PHASE [28] was used to infer missing haplotypes. These assigned haplotypes were cross-referenced with haplotypes generated by Arlequin [29] to ensure consistencyboth were identical. Any sequence sites with inadequate Gene structure of IL-4Rα coverage across populations or continents, which had sub-standard base quality scores, or had insufficient coverage for either forward or reverse sequences, were removed -leaving a total of 5,298 bp for further analysis.

Data analysis
DnaSP 4.0 [30,31] was used to analyse the polymorphic characteristics of the data and to perform a series of population genetic analyses. The numbers and types of SNPs were assessed. Nucleotide diversity was measured using π, the average number of nucleotide differences between sequences pairs [32]. The haplotype diversity (Hd, the number and frequency of haplotypes in the sample) [33], the number of haplotypes, Kelly's Z nS [34] and θ W = 4N e μ [35] were determined. The four gamete test for the minimum number of recombination events (R M ) [36] and R (the degree of recombination) [37] were calculated, as was the GC content.
A set of summary statistics were used to identify departures from neutrality using coalescent simulations: Fu and Li's D and F [38], Tajima's D [39], Fu's F s [40], Fay and Wu's H [41]. These tests were performed using DnaSP for 10 3 replicates with the following parameters estimated from the resequenced data: numbers of genotypes, segregating sites, total sites, sample sizes and rate of recombination. These simulations generated empirical distributions with which the statistical values were compared to determine the extent of their deviation from neutrality. It is an indication of non-neutral evolution if the observed values lie at the extremes of the distribution.
The McDonald-Kreitman test [44] was implemented using DnaSP to examine the relative ratios of fixed and non-fixed nonsynonymous differences to fixed and nonfixed silent changes, which can indicate the presence of non-neutral evolution. Significance was based on a twotailed Fisher's exact test.

Selection at IL-4Rα among avian species
To investigate for evidence of selection in IL-4Rα between chicken and each of the 6 outgroups, CDS alignments were generated and ω was determined under a variety of models using codeml [13]. For this analysis, a chicken sequence from the most numerous haplotype was used (FJ542575). Although the chicken coding haplotypes observed at IL-4Rα were diverse, substituting this for other chicken genotypes yielded no significant changes to results, except at certain sites with model M8 for a divergent sample (FJ542675). The PAML models implemented here are sensitive to low numbers of sequences [45].
The free-ratio (M1) model was used to calculate tree branch lengths and ω for each species lineage in the sample. To identify specific codon sites with evidence of selection, site-specific models estimated ω for each site across the whole sequence by using a random sites model under Bayes empirical Bayes (BEB) [13,46,47]. fixed ω values are permitted: ω 0 = 0 (conserved) and ω 1 = 1 (neutral). For M2a, a variable model, these two classes are used with an additional class (K = 3) where ω is freely estimated to allow for deviations from neutrality. Similarly, M7 is a neutral model that models K = 4 site classes sampled from a β-distribution, all of which have 0 ≤ ω ≤ 1. Variable model M8 has the same four β-distributed classes as M7 with an additional class where ω > 1 (K = 5).
A LRT was conducted between the paired neutral and variable models (M1a vs M2a, M7 vs M8). BEB was used to determine the posterior Bayesian probability of ω for each amino acid site: a significantly high posterior probability for this variable ω class suggests that a particular site is under selection, if ω > 1 and M8 (or M2a) is significantly favoured by the LRT [48]. Candidate positively selected sites from M8 were examined using PMut to assess the functional impact for each nonsynonymous substitution.

Confirming the signature of selection at IL-4Rα
Tests on 3,653 chicken and zebra finch CDS pairwise alignments identified genes with ω > 0.5 where the variable model was significantly favoured from (Table S3 in Additional file 2; Additional file 3). From these, IL-4Rα was selected for further analysis because of its critical function in the immune response, including an implicated role in the anti-viral response [1]. Interestingly, another chicken immune gene identified by the pairwise comparison method (Progesterone-induced blocking factor) had a human ortholog that binds IL-4Rα.
IL-4Rα was resequenced in 7 closely related bird species: chicken, red JF, grey JF, Ceylon JF, green JF, grey francolin and bamboo partridge. An excess of nonsynonymous compared to synonymous substitutions was observed in all birds except red JF (Table S4 in Additional file 2). Branch-specific models of evolution, implemented in PAML [13] were used to investigate evidence of selection among the sequenced lineages. Using the free-ratio model, the branch leading to the Gallus genus was determined to have a high ω value (0.92) (Additional file 4), though this cannot be taken as strict evidence of positive selection. Consequently, site-specific models were implemented to investigate whether particular codon sites contributed to the evidence of selection. Model M8, one the most conservative models of site-specific evolution was determined to be significantly more favoured in comparison to the neutral M7 model (p = 5 × 10 -23 ; Table 1). Bayes Empirical Bayes (BEB) was used to estimate the proportion of sites under positive selection: 48 (9.8%) of the sites had ω > 9.5, values much greater than that expected under neutrality [47]. Under M8, 28 sites were identified with a BEB posterior probability of at least 95% for ω > 1 (Table 2). There were substitutions between the chicken and red JF sample or genome sequence at 6 of these sites (5, 517, 547, 590, 628 and 665). PMut found 4 substitutions at these sites would have a neutral effect on protein structure (Table S5 in Additional file 2).

SNP and Population diversity
Of the 100 SNPs observed among the chicken populations, 7 were singletons. In protein-coding regions 17 SNPs were observed: 10 were nonsynonymous and 7 were synonymous. Assuming red JF was the primary ancestral origin of diversity at this gene, some replacement mutations between red JF and chicken are potentially associated with the domestication process. In the chicken 7 nonsynonymous substitutions were identified as segregating at high frequencies (55% or more): F5L, L520P, S590G, L594R, M665R, S670Y and T692S (Table 3, Additional file 5).
The generation of median-joining networks ( Figure 2) illustrated a high degree of allele diversity among samples and little geographical structuring among populations. The number of genetically divergent high-frequency haplotypes showed a trend of balanced diversity (Figure 2). When only the nonsynonymous SNPs were examined, an interesting pattern of dominant haplotypes emerged (Figure 3); when silent SNPs were included, recombination obfuscated these groups (Additional files 6 &7). Four haplotypes containing 81% of the 180 genotypes were characterised by substitutions at two sites: F5L and L520P. The 4 alleles possible at these 2 sites (F-L, F-P, L-L and L-P) were present in all 8 populations. No single variant was domi-nant among the samples: 32 were F-L, 38 were F-P, 46 were L-L and 64 were L-P. Both sites 5 and 520 showed evidence for positive selection in the site-specific test in codeml (Table 2, Additional file 8). Here, red JF and chicken both shared L520 and P520 alleles as well as F5, but L5 was unique to chicken.
The feature of high population diversity and low geographic partitioning in the networks was apparent in the analysis of variation using AMOVA with the Arlequin package [29]. This assessed the extent of partitioning of diversity at different levels of population structure. Most variation lay within the populations (94.1%, p < 1 × 10-5), a trend seen in other studies of chicken populations [48,49]; the remainder partitioned between the populations (1.8%, p = 0.060) and the continents (4.1%, p = 0.033).

Summary statistics and tests of neutrality
There was further evidence for the trend of elevated allelic diversity: 115 haplotypes were observed in just 180 genotypes. This was reflected in the high Hd value, a statistical measure of haplotype diversity (Table 4). Fu's F S was highly negative, signifying an excess of rare alleles. Nucleotide, haplotype and SNP diversity were all higher in Asia than in Africa as expected, despite sampling fewer birds in Asia (30) than in Africa (40).
The significantly positive Tajima's D in Asia and Africa (Table 4) and in each of their populations (Table S6 in Additional file 2) was paralleled by a highly negative Fay and Wu's H, an indicator of an excess of derived alleles. Together, these metrics suggest a clear tendency for alleles to rise to mid-or high-frequency levels. Tests on the protein-coding portion of the gene alone indicated a significantly negative Fay and Wu's H (-3.02, p = 0.04) and a less positive Tajima's D (0.61); the latter may be a consequence of stronger conservation in coding regions, which appears to limit diversity, except at sites 5 and 520.
Moderate recombination was detected at IL-4Rα: for the given value of the recombination rate R, coalescent simulations showed the minimum number of recombination
Evidence of non-neutral evolution was evident from the McDonald-Kreitman [44] test results. The McDonald-Kreitman test examines the relative ratios of fixed and nonfixed nonsynonymous differences to fixed and non-fixed silent changes between species. Purifying selection may explain a rate of fixation of nonsynonymous differences much lower than that for silent substitutions. Alternatively, if there is a significant excess of fixation of nonsynonymous changes compared to silent ones, then directional selection may be present. The chicken genotypes were tested against the red JF genome sequence and also against each of the outgroup samples. Both tests showed an excess of fixed nonsynonymous substitutions (p = 0.002 with the genome sequence, p = 0.040 with all the outgroup samples; Table 6), indicating that selection may have affected the evolution of this gene.

Identifying IL-4Rα as a candidate for resequencing
A pairwise comparison of ω = d N /d S in chicken and zebra finch genes identified IL-4Rα as having an elevated rate of nonsynonymous substitutions, suggesting it as a candidate for positive selection [50], though relaxed selective constraint has been observed in other domestic species [51]. Due to an important role in the host immune response and evidence of selection in humans, IL-4Rα was resequenced in 6 closely related birds and subsequently in 70 global village chickens and 20 commercial broilers. An analysis of sequence data from these 6 related species identified a large number of sites likely to be subject to positive selection, supporting the initial detection of IL-4Rα as a candidate gene undergoing adaptive evolution. Probable confounding factors in these results, however, are the complex domestication history of these populations and high rate of recombination identified at this locus.   Median-joining network of chicken haplotypes for all SNPs The identification of chicken IL-4Rα is of particular interest given the vital role played by its human ortholog as a regulator of IgE production and T H 2 cell differentiation [52,53]. The critical role of human IL-4Rα in the immune response is evidenced by its differential expression during particular infections and its association between polymorphism and disease susceptibility; it facilitates gastrointestinal nematode clearance [54] and its expression is upregulated in response to HIV-1 infection [55]. Variation in human IL-4Rα has been shown to affect signal transduction [56] and to modulate T H 1/T H 2 balance in the blood [57], as well as contributing to various allergies [9] and to mumps virus infection susceptibility [58]. Selection at IL-4Rα in human populations may be driven by different T H 1 (viral and bacterial) and T H 2 (parasitical) immune responses to pathogens [52], and the dysregulation of such components of immunity may be associated with atopy [59].

The origin of diversity at IL-4Rα
Although nucleotide diversity at this gene (5.19 per kb) was comparable to that observed between red JF and domestic fowl (5.36 per kb on average) [2], the substantial excess of haplotypes was suggestive of non-neutral evolution. Despite this, the significantly positive Fu and Li's D and F values show that there was a relative deficit of singletons [39]. A deficit of rare alleles in commercial chicken lines has been observed in other studies comparing wild and standard breeds [65]. In this study, the Hd and Fu's F S values highlighted this rare allele deficiency in the commercial broilers, in contrast with the excess of haplotypes in the Asian and African samples. In addition, the significantly high R M values indicated that some recombinant alleles were present in the populations, implying either relaxed selective constraint or adaptive processes favouring allelic diversity.
Median-joining network of chicken haplotypes for nonsynonymous SNPs Figure 3 Median-joining network of chicken haplotypes for nonsynonymous SNPs. Populations are denoted in the legend. Branch lengths are proportional to the number of mutational differences between haplotypes. The outgroup samples are represented by the colourless nodes. V represents the green JF sequences; F the grey francolin; B the bamboo partridge; G the grey JF; C the Ceylon JF; R1 and R2 the red JF; and RJF the genome sequence.
Tajima's D compares the proportions of low-to mediumfrequency alleles and is an indicator of directional selection when negative, and balancing selection when positive [40] (Tajima 1989). Fay and Wu's H measures the relative frequency of derived alleles, which increases when there are more high-frequency haplotypes [42]. The observed surplus of mid-and high-frequency haplotypes at the IL-4Rα locus has generated highly significant D and H values that are more extreme than those observed by other studies of disease-associated chicken genes [6,48] however, D and H are likely to be affected by demographic aspects of chicken history and the samples pooling [66].
The networks were diffused into several divergent highfrequency haplotype clusters with high intra-population diversity. A distinctive set of balanced alleles was apparent when silent substitutions were removed. The signal of balanced diversity in the chicken populations appeared to centre around two nonsynonymous substitutions: F5L and L520P. All four variants at these two sites were segregating in the 8 populations surveyed at similar frequencies. Site-specific models of evolution identified both these sites as likely subject to selection across species.
An alignment of the chicken and human IL-4Rα protein sequences identified the amino acid positions orthologous to sites 5 and 520 in chicken (Additional file 1). The site orthologous to the latter is segregating in humans (C431R, rs1805012) [67,68] at an intermediate frequency of over 10% in the population [69], similar to the chicken polymorphism. Substitution C431R is in the cytoplasmic domain of the receptor and is linked with better survival from gliomas in humans [70]. The human amino acid position orthologous to chicken site 5 is conserved (F10) and is located in the signal peptide of the protein, indicating that the L5 chicken variant might affect activation of the receptor protein.
There is series of shared population genetic properties between chicken and human IL-4Rα that may be the result of equivalent functional roles for each. The genes possess comparable McDonald-Kreitman test results and positive Tajima's D values [49] and share orthologous high-frequency nonsynonymous SNPs (L520P and C431R). And given that several amino acid substitutions in IL-4Rα affect disease susceptibility in humans (see Franjkovic et al. [71]), the variability at nonsynonymous substitution sites in chickens is likely to be of biological importance.
The balanced and elevated variation and possible selective processes at chicken IL-4Rα may be in response to the common pathogens and the range of pleiotropic roles that the receptor plays in facilitating cytokine binding in the innate immune response. The trend of high diversity fuelled by balancing selection is seen in other chicken immune genes including MHC-B [3], Mx [6] and IL1B [48], which initially suggests that immune system genes may maintain high diversity in order to respond to a wide array of pathogens.
Another explanation for the observed elevated balanced diversity is that multiple domestications of red JF and genetic  3 Haplotypes. 4 Haplotype diversity. 5 Mean pairwise differences per kb. 6 Watterson's estimator per kb. Only p values generated by simulations < 0.05 are given; p > 0.05 are denoted "ns". The resequenced region length is 5,298 bp. introgressions of other JF have both enhanced and distorted variation at this locus. The lack of observed geographic structure, which has also been observed at other chicken genes, [48] may be in part a consequence of this. There are likely to have been multiple events of chicken domestication in South and South-East Asia [72][73][74]. And though the red JF is the main source of chicken genetic diversity [2,75], genetic introgressions have come from other wild JF [76]: possibly from Ceylon JF [77] and unambiguously at the yellow skin locus from grey JF [78]. Wild red JF and domestic village strains are closely related [50,79], indicating that introgressions of red JF may have continued after domestication.
Here, networks of IL-4Rα indicate that red JF is the most closely related wild relative to the domestic chicken. This does not exclude the possibility of multiple contributions of different genetic sources of JF. If admixture of different sources occurred sufficiently early through trading and migration [80,81] this may explain the presence of the four alleles at the two nonsynonymous sites in each population. Regardless of whether this signal of high and balanced diversity is from biological pleiotropy or from multiple origins, it is persisting, indicating that it may have an important role in current chicken immunity.

Conclusion
This study shows evidence for high and balanced diversity at the chicken IL-4Rα gene, which was initially identified through the evaluation of the rate of nonsynonymous to synonymous substitutions in pairwise comparisons of chicken and zebra finch orthologs. This strategy incorporated functional and literature information to detect a suitable gene for resequencing in African, Asian and commercial chicken samples, as well as in related JF and bird species. Haplotype networks, tests of neutrality and summary statistics indicated a signal of balanced nonsynonymous polymorphisms at two sites in the IL-4Rα gene. Networks showed that red JF is the primary source of diversity at this gene. The elevated and balanced diversity present in all the populations might be a result of the chicken's history of multiple domestications [72][73][74], introgressions [76][77][78] and subsequent admixture of different types [79][80][81]. However, the identification of two potentially functionally significant SNPs as fulcrums of the balancing signal suggest that the functions of IL-4Rα in the immune system may affected by selective processes for specific allelic variants in response to new pathogenic challenges during domestication.