Autoimmune diseases are common in industrialized societies, collectively reaching a prevalence of 5% in populations with European ancestry . Epidemiological studies have indicated that the incidence of these disorders has been steadily increasing during the last decades in the industrialized world. Therefore, much scientific debate has addressed the role of human evolutionary history and adaptation in shaping the genetic predisposition to the development of autoimmunity . For any single autoimmune condition, more than 50% of the disease risk is heritable  and GWASs have unveiled the role of several common risk variants, possibly reflecting an allelic architecture for autoimmune disease that matches a "common variant/common disease" model more closely than observed for other traits . From the evolutionary perspective, this raises interesting questions on the forces responsible for the maintenance of disease alleles in populations (reviewed in ). Since the evolutionary history of autoimmune-related alleles is only beginning to be investigated, our knowledge is still relatively limited in this field. Specifically, a subset of risk alleles for CD and UC has previously been shown to have evolved in response to pathogen-driven selective pressures, the underlying scenario for some of them being balancing selection . More recently, several disease alleles for autoimmune conditions were reported to display signatures of directional selection in favour of the risk alleles . Finally, two common variants for T1D, an early-onset, potentially lethal disease, have been described as neutrally evolving . Albeit limited to a small number of variants, these data do not support the notion whereby autoimmune phenomena have acted as a selective pressure strong enough to affect the frequency distribution of risk alleles, although some authors have speculated that balancing selection at innate immunity genes might stem from a tuning of response to pathogens and to self . The identification of a set of variants with an opposite risk profile for different autoimmune conditions prompted the speculation that these variants might be targets of balancing selection possibly deriving from antagonistic pleiotropy. Therefore the question is: if neither allele can be considered medically favourable, are they both evolutionary beneficial under specific circumstances? Our data indicate that 4 out of ten gene regions we analysed have been subjected to balancing selection. Additionally, IL18RAP has previously been indicated as a possible selection target . Similarly, a signature of balancing selection has previously been described at the promoter region of IL10 , suggesting that the selected variant might be different from the opposite risk allele which is located downstream the transcription end site. The evidences we report herein for the TAP2, TRIM10/TRIM40, and CDSN/PSORS1C1 gene regions are all consistent with the hypothesis whereby genetic variability is maintained at these loci by long-standing balancing selection. In the case of BTNL2 and TAP2, we addressed the possibility that LD with HLA class II genes influences the results we obtained, as genetic hitch-hiking can potentially affect neutral diversity over long genomic regions. Yet, the sliding window analysis of nucleotide diversity across the region encompassing these two genes and HLA class II loci indicated that peaks at BTNL2 and TAP2 are flanked by regions with lower diversity, suggesting that these two genes represent independent (from class II loci) selection targets.
Conversely, we identified no selection signature for the remaining 6 genes, namely ZSCAN23, PTPN22, HLA-DMB, VARS2, C6orf47, and BAT3. In all these cases nucleotide diversity was within average values and no test significantly deviated from the null hypothesis of neutral evolution. One possibility is that opposite risk SNPs in these regions do not represent causal variants but genetic markers and, therefore, that natural selection might be acting in a region different from the one we resequenced. Yet, this is unlikely to be the case for PTPN22, as the opposite risk variant has been shown to be functional. The derived 602W allele, which segregates at low frequency in Caucasian populations and is extremely rare outside Europe, confers to the phosphatase a stronger ability to inhibit the T cell receptor signalling pathway ; this allele has been associated with T1D, RA and other autoimmune manifestations . Conversely, the ancestral allele has been associated with a higher risk of developing CD. Our analysis of the region carrying the R602W variant did not unveil any molecular signature of natural selection, as all tests were consistent with neutrality. Nonetheless, the power of most tests is strongly influenced by the timing and strength of the selective pressure; it has been suggested that the 602W allele has risen in frequency in some European populations as a result of natural selection, as its frequency seems to increase with latitude . An extremely recent selective event in these populations would not be detected using our approach.
The four targets of balancing selection we identified in this study are all located within the xMHC and have different molecular functions. TAP2 is a central component of the antigen processing pathway; the protein products of TAP1 and TAP2 interact to form a transporter complex (TAP) that translocates antigenic peptides to the endoplasmic reticulum (ER) where loading onto MHC class I molecules occurs. Therefore, inhibition of TAP has been exploited by different viruses as an immune evasion strategy (reviewed in ). Specifically, proteins encoded by herpesviruses (HSV), human cytomegalovirus (HCMV) and Epstein-Barr virus can block TAP function, limit the supply of peptides to the endoplasmic reticulum and therefore inhibit MHC class I maturation. Interestingly, the ability of HSV- and HCMV-encoded proteins to block TAP function is species-specific, suggesting that viral products have co-evolved with the TAP molecules of their host species. This observation also suggests that TAP1 and TAP2 may be subjected to a selective pressure exerted by viruses to avoid binding of inhibitory proteins. The TAP2 gene portion we analysed herein is subjected to a haplotype-specific alternative splicing event that results in two molecular forms differing at the C-terminus; our data show that the two haplotypes associated with alternative splicing are maintained by balancing selection. Whether the two protein products are differentially sensitive to viral inhibitors is presently unknown, but a previous report has indicated that they display marked differences in the translocation efficiency of specific peptides. This difference may have an effect on the susceptibility to specific viral infections and the Ala665Thr variant (rs241447), which is associated with alternative splicing, and located on the major branch leading to clade b (Figure 4A) has been associated to altered susceptibility to HIV-1 infection .
Two genes encoding ER aminopeptidases that trim peptides translocated by TAP have recently been shown to be subjected to long-standing balancing selection with polymorphic variants conferring resistance against HIV-1 . It is therefore tempting to speculate that maintenance of genetic diversity at genes in the antigen processing pathway derives from the differential activity of diverse alleles for specific peptides, eventually leading to distinctive repertoires of antigens presented to CD8+ T cells and, possibly, altered susceptibility to specific infections. Nonetheless, it is worth noting that the allele with opposite risk profile (rs10484565) has a low MAF (frequency of minor allele < 0.10) in all populations and our data suggest that it is not (and is not in linkage with) the balancing selection target.
TRIM10 and TRIM40 code for two members of a large family of tripartite motif-containing (TRIM) proteins. While several TRIM proteins have been shown to act as antiviral factors, the role of TRIM10 and TRIM40 is virtually unknown, although TRIM10 may have a role in erythropoiesis . A recent GWAS for host genetic factors involved in HIV control identified a SNP (rs9468692) located in 3'UTR of TRIM10, within the region we analysed. Our resequencing data indicate that this SNP is triallelic with T/G alleles segregating in CEU and YRI, and A/G in EAS. This is clearly shown in the network analysis: variant 74 defines two different haplotype clusters one containing African and European chromosomes and the other Asian haplotypes only. The variant involves a CpG dinucleotide suggesting that the triallelic status is determined by a two-hit mutation process on different haplotypic backgrounds and involving deamination of 5-methyl cytosine in the case of the A/G alleles. While the description of a triallelic variant might have a relevance for future association studies in populations with different ancestry, its location in the network suggests that the site is neutrally evolving. Conversely, the Thr183Met variant (rs757262, variant 19 in Figure 5A) with opposite risk profile (Table 1) is in strong LD with another nonsynonymous variant (Glu215Lys, rs757259, variant 26 in Figure 5A) and both SNPs separate the two major haplotype clades, suggesting that they may represent the selection target(s). The limited knowledge on the biological function of TRIM10 and TRIM40 does not allow extensive speculation on the selective pressure responsible for maintaining nucleotide diversity at these genes and further functional studies will be required to determine whether it is virus-driven or not.
The third gene region which we found to be subjected to balancing selection covers part of CDSN and PSORS1C1. A previous work using a different dataset has also suggested that CDSN might be a balancing selection target . CDSN and PSORS1C1 the two genes are transcribed in the opposite direction and the whole coding region of corneodesmosin (CDSN) is comprised within the first intron of PSORS1C1 (also known as SEEK1). Both transcripts are widely expressed http://biogps.gnf.org and have been associated with psoriasis in several studies. Specifically, CDSN is up-regulated in psoriatic lesisons  and one psoriasis-associated allele (rs1062470, variant 5 in Figure 6A) affects the binding of an unknown cellular factor, resulting in increased transcript stability . This SNP is located on the network branch separating the two major haplotype clades, suggesting that it may represent the selected variant. Similarly, one of the two SNPs (rs3130981, Asp527Asn) with an opposite effect on autoimmune diseases defines a major haplotype group within clade b, indicating that it may represent (or be in linkage with) a selection target (in a multiallelic selection regime).
The role of the protein product of PSORS1C1 is presently unknown, although one SNP in PSORS1C1 immediately downstream the region we analysed has been associated through GWAS with white blood cell counts  and a second downstream variant with delayed AIDS progression . Conversely the role of CDSN is well studied as the gene encodes corneodesmosin, an adhesive protein which participates in the stabilization of corneodesmosomes in the skin and other cornified squamous epithelia. Loss of coreneodesmosin in humans results in generalised peeling skin disease, a condition characterized by skin barrier defects, pruritus and atopy, with patients also showing Staphylococcus aureus superinfections . Mice lacking CDSN display a severe phenotype and the skin barrier defect is accompanied by a 10-fold increase in transepithelial water loss. These observations highlight the central role played by the epidermal barrier in protection from infections and in water homeostasis, two processes that are though to have been targeted by natural selection during human evolutionary history. Therefore, SNPs in CDSN may have been maintained by balancing selection due to their modulation of the skin barrier properties. Still, the observation that humans with genetic defects in CDSN also display food allergies  suggests that the protein may play a role in processes unrelated to skin function.
Finally, BTNL2 encodes a butrophilin-like, widely expressed protein with still poorly understood function. Experiments in mice have suggested that BTNL2 might function as a negative co-stimulatory molecule with inhibitory activity on T cell activation [44, 45]. In particular, a possible modulatory role for the protein in intestine inflammation has been proposed in this animal model, in line with the observation that a SNP (rs9268480) in the region we analysed has also been associated with UC .
The haplotype genealogy of the BTNL2 region we resequenced is peculiar with one deeply separated clade limited to EAS subjects. Previous studies have suggested that such tree topologies might originate from admixture of anatomically modern humans with archaic hominin populations (reviewed in ) rather than by a selective force; yet, a TMRCA > 5 MY may be unlikely even under ancient admixture scenarios . Therefore, we suggest that the distantly related clade is maintained by a selective process acting on BTNL2 or on the nearby class II MHC loci. As reported above, the coalescence time of the two more closely related clades is also deep and several nonsynonymous variants are located on the major branches (Figure 7A). Specifically, 3D modelling of BTNL2  indicated that the Pro285Leu, Met286Ile and Pro299Gln variants are all located in close physical proximity within the IgC domain and adjacent to cystein residues involved in disulfide bonding. These variants are postulated to be functional  and can be regarded as potential selection targets. Conversely both the opposite risk alleles and the UC susceptibility variant identify haplotype subsets within the major clade (Figure 7A).
Therefore, even in those regions where we did identify balancing selection signatures, the opposite risk alleles do not always represent (or are in tight linkage with) the selected variants. Rather, our data suggest that, with the exclusion of rs757262 and rs3130981 in TRIM40 and CDSN, opposite risk SNPs are likely to have accumulated as neutral variants, possibly maintained through hitch-hiking with the balanced polymorphisms. An alternative possibility is that, as in the case of the gene regions that we described as neutrally evolving, weaker and more recent selective events have been maintaining the opposite risk alleles. In this respect it is worth noting that the haplotype structure of the gene regions we analysed is often complex, raising the possibility that the opposite risk profile identified for some of these SNPs is secondary to association with another causal variant which is not analysed in the association study. In the case of BTNL2, for example, we were able to include two opposite risk alleles in the network analysis; based on the risk profiles of variants 28 and 100 (rs2076530 and rs3129953, respectively, Table 1) we were able to identify a set of haplotypes that should confer increased risk for ATD (they carry 2 predisposing alleles) and a group of haplotypes that predispose to MS (again with two risk alleles) (Figure 7A). This means that rs2076530 and rs3129953 may simply define haplotype subsets that carry causal variants for MS and ATD but when these variants are not typed in the association study, the haplotype structure is such that the derived allele of at position 100 will be under-represented among MS patients for example, resulting in an apparent protective effect. A similar situation might apply to CDSN, as well (Figure 6A).