Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6

Campbell, Michael C.; Ashong, Bryan; Teng, Shaolei; Harvey, Jayla; Cross, Christopher N.

doi:10.1186/s12862-019-1516-y

Research article
Open access
Published: 02 December 2019

Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6

Michael C. Campbell ORCID: orcid.org/0000-0003-0749-3445¹,
Bryan Ashong¹,
Shaolei Teng¹,
Jayla Harvey¹ &
…
Christopher N. Cross²

BMC Evolutionary Biology volume 19, Article number: 218 (2019) Cite this article

3169 Accesses
4 Citations
1 Altmetric
Metrics details

A Correction to this article was published on 16 December 2019

This article has been updated

Abstract

Background

Lymphotoxin-α (LTα), located in the Major Histocompatibility Complex (MHC) class III region on chromosome 6, encodes a cytotoxic protein that mediates a variety of antiviral responses among other biological functions. Furthermore, several genotypes at this gene have been implicated in the onset of a number of complex diseases, including myocardial infarction, autoimmunity, and various types of cancer. However, little is known about levels of nucleotide variation and linkage disequilibrium (LD) in and near LTα, which could also influence phenotypic variance. To address this gap in knowledge, we examined sequence variation across ~ 10 kilobases (kbs), encompassing LTα and the upstream region, in 2039 individuals from the 1000 Genomes Project originating from 21 global populations.

Results

Here, we observed striking patterns of diversity, including an excess of intermediate-frequency alleles, the maintenance of multiple common haplotypes and a deep coalescence time for variation (dating > 1.0 million years ago), in global populations. While these results are generally consistent with a model of balancing selection, we also uncovered a signature of positive selection in the form of long-range LD on chromosomes with derived alleles primarily in Eurasian populations. To reconcile these findings, which appear to support different models of selection, we argue that selective sweeps (particularly, soft sweeps) of multiple derived alleles in and/or near LTα occurred in non-Africans after their ancestors left Africa. Furthermore, these targets of selection were predicted to alter transcription factor binding site affinity and protein stability, suggesting they play a role in gene function. Additionally, our data also showed that a subset of these functional adaptive variants are present in archaic hominin genomes.

Conclusions

Overall, this study identified candidate functional alleles in a biologically-relevant genomic region, and offers new insights into the evolutionary origins of these loci in modern human populations.

Background

Lymphotoxin-α (LTα) in the MHC class III region encodes a potent cytotoxic polypeptide that plays a key role in regulating a number of biological processes, including lipid metabolism, coagulation, neurotransmission, and immunological response [1,2,3,4,5,6,7,8,9,10,11,12]. Studies have also shown that common variants in and/or near LTα contribute to the onset of complex diseases. For example, a recent analysis demonstrated that individuals with the A-allele at rs909253— located in an intronic region of LTα— have a higher risk for nasal NK/T-cell lymphoma relative to individuals with the G-allele in a Chinese population [13]. Data have also indicated that variability at rs2239704, rs909253, rs1041981, and rs2229094 was associated with elevated risk for other types of cancer, such as gastric and breast cancers in East Asians as well as non-Hodgkin lymphoma in individuals of European ancestry [14,15,16,17,18,19,20]. In vivo analyses using a mouse model further showed that upregulation of LTα in hepatocytic cells— infected by the hepatitis B or C virus — contributed to apoptosis and/or cell transformation leading to the development of hepatocellular carcinoma [21]. In addition, variability at rs909253, rs1800683 and rs1041981 has been correlated with increased susceptibility to cardiovascular disease in European, East Asian and/or Brazilian populations [10, 22,23,24]. Lastly, allelic variation at rs909253 and/or at rs2229094 has been implicated in the onset of inflammatory/autoimmune disorders, such as chronic periodontitis [25], ankylosing spondylitis [26], rheumatoid arthritis [27], systemic lupus erythematosus [28], vitiligo [29] and Sjogren’s syndrome [30] in individuals of non-African ancestry.

Despite the important role that LTα plays in complex traits, little is still known about levels of nucleotide variation and LD in and/or near this gene. Equally as important, the evolutionary processes that have shaped patterns of diversity in this region are similarly not known. These pieces of information are critical for identifying additional alleles in the LTα region that might contribute to phenotypes, including disease susceptibility. To address this gap in knowledge, we analyzed sequence variation across ~ 10 kbs on chromosome 6, encompassing the LTα gene and the 5′ region (and more broadly across the entire chromosome in some cases), in 2039 individuals from 21 distinct populations in the 1000 Genomes Project. Here, we observed striking patterns of variation in global populations, including an excess of intermediate-frequency alleles, the maintenance of multiple common haplotypes, and a deep coalescent time for variation (dating > 1 million years ago). We also identified a number of common alleles in or near LTα that are present in Neandertal and Denisovan genomes, further supporting the inferred ancient age of nucleotide variation. While these findings are generally consistent with a model of long-term balancing selection, we also observed extensive haplotype homozygosity on chromosomes carrying derived alleles primarily in non-African populations, suggestive of recent selection. Furthermore, a subset of these adaptive alleles were predicted to alter transcription factor binding site affinity and protein stability, suggesting they play a role in gene function. This latter finding could be highly informative for biomedical studies focused on the development of therapeutic interventions that mitigate, mimic or magnify the effects of these functional sites to combat diseases. Overall, our study identified candidate alleles that contribute to phenotypic variation, and offers additional insights into the evolutionary origins of these loci in modern human populations.

Results

Patterns of nucleotide variation

We identified a total of 183 bi-allelic single nucleotide polymorphisms (SNPs) across ~ 10 kbs of sequence, encompassing LTα (2226 base pairs [bps]) and the adjacent 5′ non-coding region (7411 bps) on chromosome 6, in 21 global populations from the 1000 Genomes Project. Of the 183 polymorphisms, 52 SNPs were located within LTα, while the remaining 131 SNPs were found in the 5′ region (Additional file 2: Table S1). The LTα gene is comprised of four exons (Fig. 1); however, the mature LTα protein is encoded by exons 2 (from bps + 461 to + 559), 3 (from bps + 646 to + 751) and 4 (from bps + 999 to + 1411). In the LTα coding region, we identified 12 polymorphisms in global populations, eight of which were nonsynonymous changes. Two of these nonsynonymous polymorphisms (rs2229094 and rs1041981) occurred at relatively high frequency in all populations. More specifically, the minor allele frequency (MAF) at rs2229094 (C-allele) and rs1041981 (A-allele) ranged from 27.3 to 45.8% and from 20.2 to 33.8%, respectively, in South Asian populations. In Europeans, the rs2229094 minor C-allele and rs1041981 minor A-allele varied from 22.9 to 38.3% and from 24.8 to 30.9%, respectively, while the MAF at these loci ranged from 14.4 to 22.8% and from 38.4 to 54.8%, respectively, in East Asian populations. Likewise, the minor C-allele at rs2229094 and the minor A-allele at rs1041981 varied from 22.4 to 30.8% and from 36.3 to 61.5%, respectively, in populations of African descent (which include indigenous Africans, African Americans and African Caribbeans). Further comparative analysis also showed that allelic variation at rs2229094 was present in the genomes of two closely related archaic species to modern humans—Neanderthals and Denisovans (Additional file 2: Table S1).

In addition, we detected population-specific nonsynonymous variation in our modern human samples. For example, the minor alleles at rs538402044, rs562333039, rs538877791, rs566451995 occurred at < 2.5% frequency in non-Africans, while these alleles were absent in populations of African descent. Furthermore, the minor C-allele at rs2229092 was common in Europeans varying from 5.1 to 7.0%, while it occurred at lower frequency in South Asians (from 1.7 to 5.5%) and in East Asians (from < 1 to 2.4%). In contrast, the rs2229092 C-allele was absent in indigenous Africans, but was observed at relatively low frequency (~ 1–2.5%) in African American and African Caribbean populations. Additionally, we found that synonymous variants were either absent or occurred at very low frequency (< 1.0%) in all populations (Additional file 2: Table S1). Overall, we observed a striking deficit of synonymous SNPs relative to nonsynonymous polymorphisms in global populations.

In the intronic regions of LTα, we identified a total of 31 SNPs (Additional file 2: Table S1). Alleles at four of these polymorphic sites (rs1800683 rs909253, rs2239704, and rs746868) occurred at relatively high frequency in global populations (Additional file 2: Table S1). In particular, the MAF at rs1800683 and rs909253 both varied from ~ 20.2 to 33.8% frequency in South Asians, from 38.5 to 54.7% in East Asians, from 26.6 to 30.9% in Europeans, and from 38.9 to 64.1% in populations of African descent (Additional file 2: Table S1). Furthermore, the MAF at rs2239704 and rs746868 (Additional file 2: Table S1) ranged from ~ 31 to ~ 44.0% in South Asians, from 33.3 to 48.1% in Europeans, and from 24.1 to 47.1% in East Asians. Comparatively, the frequency of minor alleles at these sites was lower in African and African-descended populations, varying from ~ 16.0 to ~ 38.0%. Our analysis also uncovered a moderate level of population-specific variation; specifically, 25.8, 12.9, 9.7 and 22.6% of the intronic variants were private to populations of African, South Asian, European and East Asian descent, respectively (Additional file 2: Table S1; Additional file 1: Figure S1). Additionally, we found that human derived alleles at rs909253 and rs2239704 were shared with Neandertal and Denisovan hominins.

In the adjacent 5′ region, we identified a total of 131 SNPs, and uncovered extensive allele sharing among distinct populations (Additional file 2: Table S1). Notably, a subset of these polymorphisms (specifically, rs2009658, rs2844484 and rs915654) have previously been implicated in the onset of complex diseases [19, 31]. Like in the LTα gene, our analyses also showed that modern and archaic humans shared a number of derived alleles located in the upstream region (Additional file 2: Table S1).

Lastly, we calculated standard measures of nucleotide diversity (θ_π and θ_W) in LTα and the upstream region, separately, for each global population (Table 1). We found that θ_π (the mean number of pairwise differences per nucleotide) in the LTα gene was similar across diverse populations. Specifically, θ_π ranged from 1.18 × 10^− 3 to 1.39 × 10^− 3 in populations of African descent; from 1.19 × 10^− 3 to 1.33 × 10^− 3 in South Asians, from 1.34 × 10^− 3 to 1.48 × 10^− 3 in Europeans and from 1.19 × 10^− 3 to 1.33 × 10^− 3 in East Asians (Table 1). Likewise, estimates of θ_W (nucleotide diversity calculated based on the number of segregating sites) did not vary greatly among populations (Table 1). In the 5′ region, we detected a similar pattern of nucleotide diversity among geographically distinct populations (Additional file 2: Table S2). Overall, we observed little difference in the level of nucleotide diversity between African and non-African populations across this ~ 10-kb region.

Table 1 Summary statistics based on sequence variation in the LTα gene

Full size table

Tests of neutrality

To determine if patterns of variation are consistent with neutral evolution, we calculated Tajima’s D (D_T) and the Fay and Wu’s H (H) statistics for the LTα gene and the upstream region, separately, in each population (Table 1; Additional file 2: Table S2). We also generated expected D_T and H values under varying models of demographic growth using the ms software [32]. We incorporated growth as a parameter in these coalescent simulations given the genetic evidence for past human population expansion in prior studies [33, 34]. Based on these analyses, we found that D_T for LTα was more positive than expected (P < 0.05) in South Asians, Europeans and East Asians (Table 1; Additional file 2: Table S3). Moreover, we observed a significant departure of H values (P < 0.05) in South Asian and European populations (Table 1; Additional file 2: Table S4). Our results also showed a general pattern of positive D_T in populations of African descent, with the largest departures from expected values occurring in the Mende from Sierra Leone and the Luhya from Kenya (Table 1; Additional file 2: Table S3). However, we did not observe significant H values in African, African American and African Caribbean populations (Barbadians) (Table 1; Additional file 2: Table S4).

Additionally, we calculated D_T and H statistics for the adjacent 5′ region in each population (Additional file 2: Tables S5 and S6), and found significantly positive D_T values (P < 0.05) in South Asians, Europeans, and East Asians (Additional file 2: Tables S2 and S5). Furthermore, H statistics were more positive than expected (P < 0.05) in these populations (Additional file 2: Tables S2 and Table S6). In comparison, we observed a mix of positive and slightly negative D_T values (Additional file 2: Tables S2 and S5) among populations of African descent. However, we did detect significantly positive H values (P < 0.05) in both indigenous and recently admixed Africans (Additional file 2: Tables S2 and S6).

To further assess whether or not variation at LTα is evolving neutrally, we applied the McDonald-Kreitman (M-K) test, which compares the ratio of synonymous and non-synonymous sites within and between species, to our sequence data (Table 1). Our results showed a significant excess of nonsynonymous variants (P = 0.047) in the Kinh population from Vietnam and a borderline significant excess of amino acid variation (0.1 ≥ P ≥ 0.05) in other populations of African (African Americans and the Mende), South Asian (Bengali, Gujarati, Telugu, Punjabi and Sri Lankan Tamil), European (Finnish, Great British, and Iberian), and East Asian (Dai and southern Han Chinese) ancestry (Table 1). Thus, we consistently observed a higher proportion of nonsynonymous polymorphisms relative to synonymous changes in all populations. Overall, the excess of nonsynonymous variants and the sharply positive D_T and H statistics in global populations are consistent with the action of long-term balancing selection [35,36,37].

Haplotype variation and inferred relationships

We analyzed phased haplotype data, encompassing the entire 9637 base pair (bp) region, and identified 223 distinct haplotypes in global populations (Additional file 2: Table S7A). Of these variants, 23 haplotypes (H6, H8, H12, H30, H36, H37, H80, H82, H86, H105, H119, H128, H132, H136, H142, H158, H160, H187, H190, H204, H214, H217 and H219) were common, and six of these haplotypes were found in all populations (Additional file 2: Table S7A); 16–18 common haplotypes accounted for 80.4–88.4%, 77.6–83.1% and 77.1–88.9% of the total number of haplotypes in South Asian, European and East Asian populations, respectively (Additional file 2: Table S8). Comparatively, a smaller number of common haplotypes (H6, H30, H37, H82, H86, H119, H128, H142, H187, H190, H217 and H219) comprised 66.7–74.2% of all haplotype lineages in African and African-descended populations (Additional file 2: Table S9). Further analysis of all haplotype variation also showed that Africans and non-Africans exhibited high levels of haplotype diversity (h-diversity). Specifically, h-diversity varied from 90.5 to 95.0% in populations of African descent; from 91.5 to 93.6% in South Asians; from 91.9 to 93.3% in Europeans; and from 85.9 to 89.9% in East Asians (Table 1).

We also performed a median-joining network analysis [38] to explore the phylogenetic relationships among haplotype lineages, and found that haplotype variation grouped into three distinct clusters (arbitrarily labeled I, II, III; Fig. 2). Each cluster consisted of low-frequency haplotypes radiating from common haplotypes in a “star-like” pattern which is the genetic pattern predicted under a model of positive or purifying selection [39]. Moreover, the highest-frequency haplotypes (specifically, H37, H82, H128, H142 and H190) in clusters I, II and III were shared among globally diverse populations, and these haplotype lineages differed from one another at 28 polymorphic sites. Intriguingly, we also found that 22 of the 28 polymorphic sites on these high-frequency haplotypes were present in the Neandertal and Denisovan genomes (Additional file 2: Table S7B).

Pairwise linkage disequilibrium in the LTα region and surrounding genes

To quantify the allelic associations between SNPs across ~ 10 kbs, encompassing LTα and the 5′ region, we estimated pairwise LD using the D′ statistic [40]. Our results generally showed a faster decay of LD between loci (D′ < 100) in populations of African ancestry compared to non-Africans. For example, in the Esan from Nigeria, we observed lower levels of allelic association (D′ < 100) within and between LTα and the upstream region (Fig. 3; Additional file 1: Figure S1; Additional file 2: Table S10), implying a history of recombination in this population. By contrast, our LD plots revealed a higher level of allelic association in South Asians, Europeans, and East Asians. In particular, we observed a greater number of instances of complete LD (D′ = 100) between loci in non-Africans compared to populations of African descent (Fig. 3; Additional file 1: Figure S1; Additional file 2: Table S10). The inference of recombination across LTα and the 5′ region is further supported by our phylogenetic network which displayed a small number of reticulations or loops among haplotype lineages (Fig. 2). It is known that these reticulate (non-bifurcating) relationships among haplotypes can arise due to historical recombination events [41].

To explore possible gene-gene interactions, we also quantified pairwise LD between genetic markers in the LTα region and neighboring genes (specifically, NFKBIL1, TNFα, and LTβ) within a 35,538-bp region on chromosome 6 (Additional file 1: Figure S2). This analysis showed that African and African-descended populations exhibited less LD (D′ < 100) between SNPs in the LTα region and nearby genes (Additional file 1: Figure S2). In comparison, however, we observed more instances of complete LD (D′ = 100) across this ~ 35.5 kilobase (kb) region in Eurasian populations (Additional file 1: Figure S2).

Long-range haplotype structure

We further characterized long-range LD across the entire ∼170 megabase (Mb) region of chromosome 6 for each population using the iHS statistic [42]. In particular, we calculated the absolute standardized |iHS| scores by normalizing raw values with the selscan and norm software [43]. Based on this analysis, we observed outlier |iHS| statistics for multiple SNPs— namely, rs4947326, rs4947327, rs9267497, rs36221456, rs73396237, rs148407582, rs9267499, rs2857709, rs2857708 and rs2229092— in the coding and 5′ regions in South Asians (Fig. 4; Additional file 1: Figure S3; Additional file 2: Table S12). We also identified multiple SNPs with elevated |iHS| values in the upstream, intronic and exonic regions in European populations (Fig. 4; Additional file 1: Figure S3; Additional file 2: Table S12). In addition, we detected several outlier |iHS| scores for loci in the 5′ region only (rs73396237, rs915654, rs2857709) in populations from East Asia (Fig. 4; Additional file 1: Figure S3; Additional file 2: Table S12). We did not, however, observe outlier values for SNPs in the LTα gene or the 5′ region in populations of African descent, except in African Americans where we detected a single extreme |iHS| score for rs73396237 located in the upstream region (Fig. 4; Additional file 1: Figure S3; Additional file 2: Table S12).

To complement our |iHS| scans, we applied the nSL statistic which is a test for detecting both soft and hard sweeps in genomic data [44]. One advantage of this statistic is that it does not depend on the recombination rate, which can lead to false positive signals of haplotype homozygosity in chromosomal regions with low levels of recombination [44]. Using this method, we observed extreme standardized |nSL| scores (defined as |nSL| > 2) for variation in the 5′ region (rs36221456, rs73396237, rs2857709, rs4947326, rs2857601, rs2857708) and in the LTα coding region (rs2229092) in South Asians (Additional file 1: Figure S4; Additional file 2: Table S11), generally overlapping with the |iHS| results. We also detected outlier |nSL| values for a similar set of loci upstream (rs73396237, rs2857709, rs3093540, rs2071590, rs4947326, rs2857601, rs2239704 and rs3093546) and within the LTα gene (rs3093542 and rs2229092) in European populations (Additional file 1: Figure S4; Additional file 2: Table S12). Furthermore, we found a single extreme |nSL| score for rs73396237 in the upstream region in East Asians. By contrast, we did not observe unusual |nSL| scores for loci in any population of African descent (Additional file 1: Figure S4; Additional file 2: Table S12).

We also plotted the decay of haplotype homozygosity from core SNPs (represented by loci with extreme |iHS| or |nSL| scores) using the EHH statistic [45]. This analysis revealed long-distance haplotype homozygosity, spanning up to 1.4 megabases (Mbs), on chromosomes carrying the derived allele relative to chromosomes with the ancestral allele mainly in Eurasian populations (Fig. 4; Additional file 1: Figure S5). For example, we detected unusually long haplotype structure around derived alleles at rs73396237, rs2857709, rs2857708, rs4947326, rs4947327, rs148407582, rs9267497, rs9267499, and rs2229092 in the South Asian Telugu (ITU) population (Fig. 4; Additional file 1: Figure S5). We also observed a similar pattern of EHH on chromosomes carrying derived alleles in other non-African populations. In addition, we detected long-range EHH around the derived G-allele at rs73396237 (extending no more than 80 kbs) in African Americans in agreement with our |iHS| results (Fig. 4; Additional file 1: Figure S5). We did not, however, observe a difference in EHH curves for chromosomes harboring the derived versus the ancestral allele in other populations of African descent (Additional file 1: Figure S5).

Population differentiation and structure

To examine patterns of genetic differentiation, we calculated mean F_ST among geographic regions (namely Africa, South Asia, East Asia, and Europe) using polymorphisms across the ~ 10 kbs encompassing LTα and the upstream region. Mean global F_ST was estimated to be 0.010132, much lower than F_ST estimates for autosomal loci, which typically range from 0.10 to 0.16 [46, 47]. Our analyses also revealed that per-site F_ST values for common SNPs were below 0.10 (Additional file 2: Table S1). For example, F_ST values for rs1800683, rs2239704, rs909253, rs746868, rs2229094, rs3093542, rs2229092 and rs1041981 within the LTα gene were estimated to be 0.069033, 0.020427, 0.068599, 0.021288, 0.029649, 0.026510, 0.034892 and 0.067877, respectively (Additional file 2: Table S1). Similarly, F_ST values for common SNPs in the 5′ region varied from 0.0055 to 0.0745 (Additional file 2: Table S1). Although the above estimates of F_ST were not outliers in the empirical distribution of F_ST, calculated for ~ 7.5 million genome-wide SNPs, they did fall within the 15th percentile of empirical values. Thus, we consistently observed low estimates of F_ST in both the LTα gene and the upstream region.

Inferred ages of variants in LTα and the adjacent 5′ region

Using a coalescent-based method [48], we inferred the mean time to the most recent common ancestor (TMRCA) for variation in both LTα and the upstream region, and calculated the mean ages of individual polymorphisms (in years ago ± standard deviation in years) (Additional file 2: Table S13). Our analysis revealed a deep coalescence time for the origin of variation, estimated to be ~ 1.0 million years ago (ya) ± 57,207.68 years. We also found that the ages of common polymorphisms in the 5′ region ranged from 200,571.46 ya (± 77,877.56 years) to 734,694 ya (± 140,326.55 years) (Additional file 2: Table S13). Furthermore, variants with the oldest inferred ages—namely, rs2857710 (734,694 ya ± 140,326.55 years), rs3131641 (453,306.20 ya ± 69,869.40 years), rs2857602 (718,530.73 ya ± 141,061.25 years), rs2844486 (626,693.98 ya ± 177,795.95 years) and rs1121800 (472,408.24 ya ± 106,530.63 years)—were also found in the Neanderthal and/or Denisovan genomes (Additional file 2: Tables S1 and S13). Additionally, several of the SNPs with deep coalescence times (i.e., rs4947326, rs4947327, rs148407582 and rs9267497) were inferred to be targets of recent selection based on our long-range haplotype analyses (Additional file 2: Tables S11 and S13).

Inferred functional consequences

Given the higher proportion of nonsynonymous polymorphisms relative to synonymous changes in the LTα coding region, we inferred the possible effects of nonsynonymous SNPs on protein function using SIFT [49]. SIFT classifies nonsynonymous substitutions as either “tolerated” or “deleterious” based on sequence homology and the properties of amino acids [49]. This method predicted that seven of the eight nonsynonymous polymorphisms (specifically, rs538402044, rs2229094, rs368539892, rs2229092, rs1041981, rs538877791 and rs566451995) were “tolerated”, while variation at rs562333039 was predicted to be “deleterious”. Moreover, three “tolerated” variants— rs2229094, rs2229092, rs1041981— were common (MAF ≥ 5%) in at least one population, while the other four “tolerated” polymorphisms were either rare (MAF < 5%) or absent in global populations. The single “deleterious” SNP was present only in the Kinh population from Vietnam at very low frequency (MAF < 0.01). We also determined if common missense variation was conserved across mammalian species using GERP++ [50], phyloP [51] and phastCons [52]. Our results showed that the conservation score for each polymorphic site did not reach the “deleterious” threshold level, in agreement with our SIFT results (Table 2).

Table 2 Predictions of amino acid changes on LTα protein function

Full size table

In addition, we analyzed changes in LTα protein stability (i.e. the folded conformation) caused by missense mutations at different common SNPs with two bioinformatic tools: FoldX [53] and I-Mutant3 [54]. Our analyses revealed that the total energy change (ΔΔG) for rs1041981, determined by FoldX, was − 0.802 kcal/mol, suggesting that mutation (the A-allele) at this site slightly destabilizes protein structure (Table 2). By contrast, the FoldX ΔΔG for rs2229092 was 1.180 kcal/mol, implying that variability at this site largely stabilizes the folding conformation of the LTα protein (Table 2). The I-Mutant3 method yielded similar predictions of protein stability changes due to nucleotide variation at the rs1041981 and rs2229092 loci. The folding energy difference caused by mutation at rs2229094 —calculated using I-Mutant3— was 0.07 kcal/mol (Table 2), indicating that variability at this site has little effect on the folded structure of the LTα protein. Regrettably, we were unable to determine the FoldX ΔΔG for rs2229094 because the available template structure, used as input for this bioinformatic tool, did not include residues encoded by this SNP.

Lastly, we examined the potential regulatory effects of variants in or near the LTα gene using the SNP2TFBS database [55]. Based on this analysis, we identified 13 non-coding polymorphisms that map to transcription factor binding sites (TFBSs), and are predicted to alter transcription factor binding site affinity (Table 3) [56]. Our data also showed that the transcription factors (TFs) affected by nucleotide changes in the TFBSs were Pdx1, Prrx2, KLF1, ZNF263, SP2, SP1, KLF5, ELF-1, Zfx, Nr1h3/Rxra, Arnt/Ahr, EGR2, Foxd3, FOXI1, IRF1, SPIB, ARID3A. Furthermore, a subset of these 13 SNPs (namely, rs2857709, rs4947326, rs4947327 and rs3093542, which are predicted to affect the binding of TFs ELF-1, Zfx and ZNF263) had extreme |iHS| and/or |nSL| scores, suggesting these candidate functional alleles are or have been targets of recent selection (Table 3; Additional file 2: Table S12). In addition, derived alleles at these sites had coalescence times that ranged from 56,718.38 ya (± 32,106.13 years) to 668,571.54 ya (± 105,795.94 years) (Additional file 2: Table S13). Our analysis also revealed that one of the 13 candidate functional polymorphisms (specifically, rs909253)— not inferred to be under selection— is a known genetic risk factor for several complex diseases [10, 13, 57, 58] , and is predicted to alter the binding of TF SPIB (Table 3).

Table 3 The effect of allelic variation on transcription factor binding affinity

Full size table

Discussion

Evidence for soft sweeps in global populations

Our analyses uncovered striking patterns of diversity in LTα and the 5′ region in African and non-African populations. In particular, we found significantly positive D_T and H statistics, indicating an excess of intermediate-frequency-derived alleles, within and outside of Africa. Furthermore, we observed the maintenance of multiple high-frequency haplotype lineages in global populations, and inferred a deep coalescent time of variation (> 1.0 million years ago). We also detected low levels of genetic differentiation and extensive sharing of variation among globally diverse populations. A comparative analysis further revealed that a number of polymorphisms were shared between archaic and modern humans (i.e. trans-species polymorphisms). Taken together, these results are largely consistent with a model of long-term balancing selection. However, we also detected long-range EHH on chromosomes carrying derived alleles primarily in non-African populations, and observed a “star-like” phylogeny of haplotypes radiating around common haplotypes in our network. These genetic patterns are consistent with the predictions for positive selection. To reconcile these findings, which appear to support different models of selection, we argue that long-term balancing selection has acted to maintain alleles/haplotypes in African populations, while patterns of diversity and LD in non-Africans were shaped by recent selection.

Within Africa, we found evidence for balancing selection based on the site frequency spectrum and the persistence of multiple common haplotypes in populations. These results are similar to the findings in a prior study that detected signatures of balancing selection at the TAS2R38 locus—which is associated with normal variation in bitter taste perception—in ethnically diverse African populations [59]. In addition, we detected shorter blocks of EHH in most populations of African descent, likely reflecting a history of recombination in and around LTα [60,61,62]. One exception to this general pattern of LD, however, was observed in African Americans where we found unusually long haplotype structure around the derived allele at rs73396237. This SNP is also inferred to be a target of selection and is common in European populations. One possible explanation for this pattern is that the extended LD surrounding this selected SNP could have been introduced into African Americans through gene flow from the ancestors of western Europeans to enslaved West Africans during the Trans-Atlantic slave trade [63]. Indeed, prior studies have shown that contemporary African Americans have varying levels of European ancestry due to this historical event which occurred between the 15th and 19th centuries [63,64,65]. Additionally, the presence of other derived variation in African Americans (for example, the C-allele at rs2229092 in the LTα coding region), common in Europeans but notably absent in West Africans, also supports this model of historical admixture. Alternatively, and more provocatively, the common derived allele at rs73396237 (inferred to be under selection) could have already been present in the enslaved ancestors of African Americans and then became selectively advantageous in the new environments (for example, novel pathogens) to which these individuals were exposed, leading to extended haplotype structure over time. However, further analysis of a larger set of populations will be needed to distinguish: 1) whether the above extended LD in African Americans was due to strong selection on pre-existing common variation at rs73396237 that became beneficial in novel environments, or 2) a selective event had occurred first in the ancestors of Europeans who then admixed with the West African ancestors of present-day African Americans. Interestingly, recent studies have reported instances of populations acquiring selected alleles through past admixture [66,67,68]. Thus, it is highly conceivable that a selected allele could have been introduced into African Americans through admixture between African and non-African parental populations.

After modern humans migrated from Africa ~ 60,000–80,000 ya, we contend that positive selection on pre-existing genetic variation occurred in regions of the world outside of the African continent. Indeed, this model of adaptation has been reported for several genes in non-African populations [69]. In particular, a recent study uncovered evidence for long-term balancing selection at several loci (CLCNKB, PKDREJ, SDR39U1, and ZNF473) in West and East Africans [69]. However, these genes showed an absence of some of the signals of balancing selection in European and/or East Asian populations; data also indicated that one of the two alleles under balancing selection in Africa underwent a soft sweep in non-Africans [69], likely in response to changes in selective pressures outside of Africa [69,70,71,72,73]. In the present study, we propose that a similar phenomenon occurred in the LTα region during the history of non-Africans, leading to the presence of long-range haplotypes in these populations.

Additionally, we suggest that soft sweeps of multiple advantageous mutations occurred— such that pre-existing alleles of similar benefit were selected for and increased in frequency simultaneously, or in short succession of one another (in response to different selective pressures). Under this model of evolution (i.e. a multiple-locus soft sweep model), none of the favored mutations rise rapidly to fixation [74], but rather beneficial alleles at different loci undergo an incomplete selective sweep. In this scenario, multiple mutations can co-exist at intermediate frequency in populations, mimicking balancing selection (resulting in strongly positive D_T and H values) [37, 75,76,77,78]. Furthermore, multiple haplotypes carrying selectively advantageous alleles will become frequent within populations, leading to an increase in haplotype diversity [37, 79, 80]. Explicitly, both of these genetic patterns were consistently observed in Eurasian populations. This proposed model of selection is distinct from a classic selective sweep in which a novel beneficial mutation arises once and quickly rises in frequency until it becomes fixed [7].

Lastly, we detected a significant and marginally significant excess of nonsynonymous polymorphisms in African and non-African populations. This bias towards amino acid variation in the LTα coding region could arise due to several evolutionary forces, such as positive selection, a relaxation of functional constraint, or balancing selection in diverse populations [60, 81,82,83]. Although the precise environmental/external pressures driving the different types of selection in and outside of Africa are currently unknown, it is clear that multiple loci in LTα and the upstream region are or have been functionally important during human evolution.

Identification of functional candidate loci

The human genome contains thousands of experimentally corroborated transcription factor binding sites (TFBSs)[84, 85]; precise knowledge of the genetic variants that disrupt TFBSs is critical for understanding the molecular changes that contribute to phenotypic variation [86]. Using a computational approach, we identified a number of SNPs that were predicted to alter the binding of multiple TFs— most notably Zfx, ELF-1, SPIB and ZNF263. Prior studies have also documented the role that these TFs play in gene regulation and immunity. For example, studies have reported that Zfx is essential for the survival of recirculating mature B-cells and embryonic stem cells. Moreover, in vitro experiments demonstrated that Zfx-deficient peripheral T-cells failed to proliferate and expand after microbial antigen stimulation [87,88,89]. Other studies have also indicated that the ELF-1 binding site is important for the initiation of transcription and that the knockdown of this TF inhibited gene expression [90,91,92]. Data further showed that TF SPIB plays a role in B-cell differentiation (via gene regulation), enabling these lymphocytes to appropriately respond to environmental cues [93, 94]. In addition, analyses focused on ZNF263 found that this TF can have both positive and negative effects on transcriptional regulation of gene targets; more specifically, in cells where ZNF263 levels were low, a subset (∼15–20%) of genes exhibited decreased transcription, while a larger proportion of genes showed elevated expression [95]. As a result, we suggest that the candidate loci, identified in the present study, likely contribute to phenotypic variation through the regulation of gene expression (though it is not completely clear from our data in which direction these variants affect expression; that is to say, we do not know if they inhibit or enhance expression). Intriguingly, we also found that rs909253, which is predicted to alter the binding of SPIB, is associated with increased risk for myocardial infarction, non-Hodgkin lymphoma, and psoriatic arthritis [10, 13, 58], suggesting that the onset of these complex diseases may be due in part to LTα expression levels. However, further studies are needed to elucidate the role that LTα expression plays in complex disease susceptibility.

Our analyses also revealed strong LD among alleles in and between LTα and the 5′ region, raising the possibility that our candidate SNPs are in LD with previously identified variants in the literature. Therefore, it is conceivable that our candidate loci are either causal or act epistatically with previously described alleles in association studies. Furthermore, given the evidence for strong LD between alleles located in different genes (mainly in non-African populations), it is possible that loci in the LTα region and in neighboring genes could also interact epistatically, influencing gene expression. Additionally, a subset of the candidate functional sites (i.e., rs2857709, rs4947326, rs4947327, and rs3093542) were inferred to be under selection, suggesting that adaptive evolution in the LTα region involved changes in the level of gene expression.

Although computational approaches have been beneficial for predicting the impact of nucleotide variation on TF-DNA binding [96,97,98,99,100,101,102], the ability to infer functional consequences of nucleotide changes in TFBSs using such methods has its limits. One important limitation to note is that only a small fraction of all existing TFs in the genome have been characterized in terms of their binding properties to date [86]. As a result, a number of TFBSs in the genome do not have known cognate TFs. Equally as important, current computational methods examine each variant in DNA binding sites independently, neglecting the potential epistatic interactions among distinct TFBSs. Therefore, future in vitro and in vivo TF binding experiments will be important for further clarifying how the novel loci in our study singly or jointly, influence function, leading to differences in gene expression.

Our data also indicated that the high levels of amino acid change in the LTα protein is generally tolerated (i.e. they do not disrupt protein function), suggesting that sequence diversity may have a functional purpose. For example, greater amino acid diversity could expand the ability of the LTα protein to recognize a broad range of antigens, enhancing immune surveillance. Equally as important, the accumulation of amino acid changes could be critical for the other key functions that the LTα protein performs. While the reasons underlying this bias towards amino acid polymorphisms are not clear, this pattern of diversity has been previously reported for a number of other MHC genes [81]. In addition, our ΔΔG results indicated that the “mutant” proline residue at amino acid position 51— encoded by rs2229092— has a stabilizing effect on LTα protein structure relative to the “wild-type” histidine residue at the same site (Table 2). Indeed, it is widely recognized that high stability is important for preserving protein function in a range of conditions over time (for example, temperature and pH) [103]. Furthermore, extra stability is often correlated with protein evolution; specifically, it increases the tolerance of proteins to amino acid substitutions while still folding to its native structure [104]. Thus, highly stable proteins can function across broader physiochemical environments and accept a greater number of amino acid changes, increasing their capacity to perform diverse biochemical tasks [104]. Given that rs2229092 was also inferred to be a target of adaptive evolution in non-Africans, we contend that there has been recent selection for nucleotide variability that enhances protein monomer stability, suggesting that the LTα protein may play functionally important and diverse roles in these populations.

Though our analyses of protein structure indicated that nucleotide changes at rs2229092, rs2229094 and rs1041981 were generally tolerated, prior studies have reported that missense mutations at rs2229094 and rs1041981 are associated with elevated risk for several complex diseases [14, 17, 29]. Given our findings in the present study, we argue that protein instability (leading to dysfunction of the LTα protein) is not likely a major contributor to disease outcome. Further investigation of other possible molecular mechanisms may be more informative for understanding the underlying cause(s) of disease susceptibility (for example, rs2229094 and rs1041981 may be in LD with other loci that alter the expression of LTα proteins leading to increased risk for disease).

Timing of common variants in African and non-African populations

Our GENETREE analysis revealed an ancient TMRCA for variation in and near LTα (dating to > 1.0 million ya), and a deep coalescence time for common alleles that predate the origin of anatomically modern humans ~ 200,000–300,000 ya [63, 105]. The presence of human polymorphisms in Neanderthal and Denisovan genomes further support the inferred ancient age of common nucleotide variation in our dataset. This sharing of alleles among closely related species suggests that variation in and/or near LTα likely arose in modern H. sapiens because it was either: 1) inherited from the last ancestor that modern humans shared with Neandertals and Denisovans more than 500,000 ya [106], or 2) introduced into modern humans through admixture with Neanderthal and Denisovan populations who overlapped in space and time with H. sapiens. Because we observed these ancient polymorphisms in indigenous African populations, which are known to have low levels of Neandertal/Denisovan ancestry [107, 108], we argue that it is likely these variants originated from the last common ancestor of H. sapiens, Neandertals, and Denisovans in Africa rather than through introgression. Regardless, the inferred deep TMRCA of polymorphisms and their presence in closely related hominins suggest that these variants have persisted in the Homo lineage for a relatively long period of time.

Lastly, although classical and soft selective sweeps are expected to increase allele frequency differences among populations [109, 110], we did not observe extreme genetic differentiation at common SNPs in and/or near LTα. One possible explanation for this pattern is that both ancestral and derived alleles were common in populations in Africa. When modern humans migrated out of Africa within the last 100,000 years [63, 111], they carried ancestral and derived alleles at common sites with them. Because these alleles occur at appreciable frequencies in global populations, it is not surprising that we would not observe large among-population differences in allele frequency and thus not detect unusually large F_ST values at common sites [112].

The function and evolution of MHC genes

The MHC is a key component of the adaptive immune system in all jawed vertebrates [113,114,115]. In humans, MHC is a gene-dense region that spans ∼4 Mbs on the short arm of chromosome 6 and contains over 200 genes [113]. The classical MHC class I and class II genes (also called the human leukocyte antigen, HLA, system) encode cell-surface glycoproteins that play a key role in adaptive immunity [114, 115]. In particular, in cells infected by intracellular parasites, MHC class I molecules present parasite-derived peptides to cytotoxic T lymphocytes [114, 115]. Similarly, the MHC class II molecules present antigens (mainly from extracellular pathogens) on the surface of antigen-presenting cells [114, 115]. These exposed peptides are then recognized by helper T lymphocytes, resulting in a series of immune responses [116, 117]. In comparison, proteins encoded by MHC class III genes have somewhat different functions, playing roles in inflammation and complementary immune responses among other immune-related activities [118].

The classical MHC genes are among the most polymorphic in the human genome [114]. Furthermore, alleles at these loci are ancient in age (often predating species divergence events) and can be retained across multiple species. Studies have also indicated that balancing selection has likely acted at these genes to enhance the level of nonsynonymous substitutions in codons and allelic diversity over extremely long time periods [81, 114, 119, 120]. Comparatively, little is known about patterns of variation at genes within the MHC class III region or the microevolutionary forces that have shaped these patterns. However, some studies have identified intergenic sequences in this region that contain cis-acting elements for transcriptional or post-transcriptional regulation of gene expression [121, 122]. For example, non-coding sequences upstream of TNF-α were found to contain transcription factor and enhancer binding sites that influence TNF-α expression in blood serum [122]. In the present study, we similarly identified polymorphisms in the 5′ region of LTα that are predicted to alter LTα expression levels. However, more in-depth studies of the MHC class III region are needed in order to identify variants with a similar regulatory effect on other structural genes in this chromosomal region, and to better understand the evolutionary history of these genes in the human lineage.

Concluding remarks

It has been estimated that more than 90% of the SNPs identified in prior association studies lie outside of protein coding genes and that a substantial fraction of these polymorphisms affect gene regulation [123]. Therefore, understanding whether or not variation within TFBSs disrupts or creates functional sites will be informative for elucidating the molecular basis of gene regulation and complex traits, including disease susceptibility. Our study identified SNPs in TFBSs that potentially influence gene expression and were also shared with archaic hominins. The biological consequences of polymorphisms (in modern H. sapiens) that originate from closely related species have been a focus of intense scientific inquiry in the field of human genomics [124, 125]. For example, a number of studies have indicated that introgressed Neandertal alleles, occurring at relatively high frequency in contemporary Eurasian populations, contribute to immunity, metabolism, height, hair color, and skin tone [124,125,126,127,128,129,130,131,132,133,134,135,136]. Analyses have also shown that introgressed archaic alleles have been targets of positive selection in modern populations. Thus, polymorphisms of non-modern origin are argued to have played important roles in human biology. In the current study, our data suggest that ancient shared alleles in the LTα region— likely inherited from the common ancestor of H. sapiens, Neandertals, and Denisovans— were functional and adaptive in modern humans. These findings raise new and intriguing questions about whether or not archaic and modern humans exhibited similar phenotypic variation due to their shared allelic variation. Indeed, interrogating the genomes of related species— such as Neandertals and Denisovans— can provide insights into the evolutionary origins of alleles associated with traits that may or may not be unique to modern humans.

Methods

Population samples

We analyzed the 9637 bps, encompassing the LTα gene (2226 bps) and the adjacent 5′ region (7411 bps) on chromosome 6 in 2039 unrelated individuals from the 1000 Genomes Project (Phase 3). These individuals originated from four geographic regions: [1] Africa includes 99 Esan (ESN, Nigeria), 113 Mandinka (GWD, The Gambia), 54 Mende (MSL, Sierra Leone), 107 Yoruba (YRI, Nigeria), 99 Luhya (LWK, Kenya), 60 African Americans (ASW, United States), 96 Barbadians (ACB, Barbados); [2] South Asia encompasses 84 Bengali (BEB, Bangladesh), 103 Gurjarati (GIH, India), 99 Indian Telugu (ITU, India), 93 Punjabi (PJL, Pakistan), 99 Tamil (STU, Sri Lanka); [3] East Asia consists of 93 Dai (CDX, China), 103 Han (CHB, Beijing), 105 Han (CHS, southern China), 104 Japanese (JPT, Japan), 99 Kinh (KHV, Vietnam); [4] Europe comprise 90 Great British (GBR, Great Britain), 99 Finnish (FIN, Finland), 107 Iberian (IBS, Spain), 104 Toscani Italians (TSI, Italy) [137]. Although we grouped African Americans and African Caribbeans (Barbadians) with indigenous Africans above, we did not combine these recently admixed population with Africans when we examined the geographic distribution of polymorphisms in Fig. 1b and c. As a result, the total number of variants in Fig. 1b and c does not include the population-specific variation present in African Americans and African Caribbeans.

Nucleotide variation

We extracted variant calls (in vcf files; build GRCh 37 human assembly) from the LTα gene and the adjacent 5′ region in the 1000 Genomes sequencing dataset using vcftools [138]. The start and end positions for this region (build GRCh 37) were obtained from NCBI [139]. The minor allele frequency (MAF) was defined as the second most frequent allele at a given site in pooled populations (i.e., 4078 chromosomes). SNPs were also broadly classified as common or rare based on the MAF at a given site. More explicitly, SNPs with an MAF ≥ 5% were classified as common, while SNPs with an MAF < 5% were categorized as rare. For comparative analysis, we examined sequence contigs for ~ 10 kbs, encompassing the LTα and 5′ region, in Neandertal and Denisovan samples from the UCSC genome browser [140].

Tests of neutrality

We calculated the Tajima’s D (D_T) [141] and Fay and Wu’s H (H) [142] statistics for LTα and the adjacent 5′ region, separately, in each population. Significance of the test statistics was assessed by comparing the observed values to expected values generated from 10,000 neutral coalescent simulations incorporating different models of growth using the ms software [32]. If a sample showed a statistically significant deviation from the expected theoretical distribution, the null hypothesis of no selection was rejected. For Africans, we calculated expected D_T and H statistics under a 2-, 4-, 6-, 8-, and 10-fold increase in population size (starting from 10,000 individuals) beginning 70,000 ya until the present [34, 143,144,145]. For non-Africans, we calculated the expected D_T and H values under a range of demographic scenarios that included a population bottleneck at 60,000 ya (from an initial population size of 10,000 individuals and decreasing to 2000 individuals), followed by 10-, 20-, and 40- fold growth beginning at 50,000 ya until the present [144, 145]. For all simulations, we used a generation time of 20 years. Because LTα and the 5′ region are different sizes, we also incorporated sequence length as a parameter in these simulations.

In addition, we applied the McDonald–Kreitman (M-K) test to the LTα coding region using DnaSP [146]. The M-K test compares the ratio of polymorphism to divergence at replacement and silent sites. Under neutrality, the ratio of replacement to silent substitutions between species (D_N/D_S) is expected to equal the ratio of replacement to silent polymorphisms within species (P_N/P_S).

Haplotype variation and inferred relationships

We extracted fully phased haplotype data from the 1000 Genomes Project for 2039 individuals using vcftools [138], and then applied a custom script to identify unique haplotypes along with the number of times that each haplotype appeared in the pooled global dataset. The genealogical relationships among haplotypes were inferred using the median-joining algorithm implemented in the Network 5.0 program [38]. The resulting phylogeny was a tree with the minimum number of changes among all possible trees [38]. Haplotype diversity (h-diversity) was also calculated for each population using the DnaSP software [146].

Linkage disequilibrium

We examined pairwise LD: 1) in the LTα region spanning ~ 10 kbs, and separately 2) in an ~ 35.5-kb genomic region, encompassing neighboring genes (NFKBIL1, LTα, TNFα, and LTβ), using the Haploview software [40]. LD was quantified using the D′ statistic [147], which indicates the magnitude of LD between SNP loci based on allele frequency. To maximize our power to detect a relationship between SNP loci, we filtered out SNPs with a MAF < 1% in our analyses. In the resulting LD plots, the color intensity of each square represents the strength of the relationship between SNP alleles. Specifically, bright red squares indicate complete LD between SNP pairs (D′ = 100; logarithm of odds (LOD) > 2); shades of pink/white squares signify little evidence of LD (D′ < 100; LOD < 2); purple squares denote high LD but with little statistical support (low LOD) [40]. The bold triangles in the plots also indicate strong blocks of LD between SNP markers.

Extended haplotype homozygosity

We characterized long-range LD on chromosome 6 for each population using the iHS statistic [42], which indicates the amount of haplotype homozygosity on chromosomes carrying the derived allele compared to chromosomes with the ancestral allele. To identify outlier values, the unstandardized scores for > 340,000 SNPs across the ~ 170-Mb region of chromosome 6 were normalized with the norm program implemented in the selscan package [43]. SNPs with a standardized |iHS| > 2 represent the most extreme 5% of scores. We also estimated the length of haplotype homozygosity with another haplotype-based statistic, nSL, using selscan [43]. This statistic measures haplotype lengths based on the number of segregating sites in a sample and does not depend on the recombination rate, unlike the iHS statistic, making it robust to recombination rate variation [44]. The output results were normalized following the same procedure used for iHS. To complement these scans for selection, we quantified the decay of identity of haplotypes with distance by calculating the EHH statistic using loci with extreme iHS and/or nSL scores as core SNPs [45].

Population differentiation and structure

To measure the degree of genetic divergence among global populations, we calculated average F_ST derived from genomic data using the Weir and Cockerham method implemented in vcftools [138, 148]. We also computed among-population F_ST at individual polymorphic sites across the ~ 10-kb region of interest and at ~ 7.5 million randomly selected SNPs from across the genome. The observed F_ST estimates for SNPs in and/or near LTα were then compared with the empirical distribution of F_ST values derived from genome-wide SNPs in order to identify outlier values (≤ 2.5th percentile or ≥ 95th percentile of the distribution). F_ST values at individual sites in or near LTα, and F_ST values at the ~ 7.5 million genome-wide polymorphisms were calculated using the same 21 populations.

Age estimates of mutations

We used a coalescent-based approach to estimate: 1) the expected TMRCA of the gene tree and 2) the expected ages of individual polymorphisms. This method requires an outgroup that provides the ancestral state at each polymorphic site. We applied the GENETREE software [48] to our sequence data to obtain the maximum likelihood estimate (MLE) of θ over 1,000,000 runs [48]. Using the MLE of θ and our estimate of μ, we calculated the effective population size (N_e) parameter, which was determined to be 18,367 based on the formula N_e = θ/4μ. Then, the TMRCA of the gene tree and the ages of individual polymorphisms were estimated from the weighted average of simulated ages over 100,000 independent runs [48]. GENETREE assumes an infinite alleles model and no recombination [48]. We removed haplotypes and/or sites that were not compatible with these underlying assumptions before applying the GENETREE algorithm.

Functional analysis

The effects of nonsynonymous SNPs at LTα on protein function were inferred using the SIFT algorithm implemented in the Ensembl Variant Effect Predictor toolset. SIFT predicts the functional impact of amino acid substitutions (i.e., “tolerated” or “deleterious”) based on sequence homology, the physical properties of amino acids and multiple alignment information [49]. In addition to SIFT, we executed GERP++ [49], phyloP [50] and phastCons [51] to evaluate whether or not common missense variation was conserved across 20 different mammalian species [149]. We also predicted changes in LTα protein stability caused by common missense mutations using two bioinformatic tools: FoldX [53] and I-Mutant3 [54] tools. ΔΔG is the difference in the Gibbs free energy for folding between the final state (the mutant) and the reference state (the wild-type). For each polymorphic site, ΔΔG was computed from the free energy of the wild type protein (encoded by the major allele at a given site) minus the Gibbs free energy of the mutated protein (encoded by the minor allele at a given site). In other words, ΔΔG = ΔG WT – ΔG MUT. Given this definition, mutations resulting in negative ΔΔG values were classified as “destabilizing”, while mutations resulting in positive ΔΔG values were categorized as “stabilizing”. We constructed the protein structure of LTα, which served as the input for FoldX, with the homology modelling server SWISS-MODEL [150]. The resulting LTα protein structure only consisted of amino acids from positions 48 to 80, which encompassed rs2229092 (H51P) and rs1041981 (T60N), but excluded rs2229094 (C13R). Therefore, the ΔΔG could not be calculated for rs2229094 using the FoldX tool.

We also analyzed polymorphisms in and near the LTα gene using the SNP2TFBS tool [55], which predicts if SNPs in TFBSs affect transcription factor binding in the human genome. The effect of a given SNP on transcription factor (TF) binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding TF factor [55, 56]. In addition, using the NCBI ClinVar database [57], we searched for any reports of an association between variants (inferred to be functional in Table 3) and human health outcomes.

Availability of data and materials

The 1000 Genomes Project data (Phase 3) analyzed in the current study are publicly available from an unrestricted online access repository [152].

Change history

16 December 2019
After publication of our article [1] we were notified that a few duplicate sentences were included on Figure 3 and Figure 4 legends.

Abbreviations

| iHS |:: Absolute value of the integrated haplotype score
| nSL |:: Absolute number of segregating sites by length
2N :: Number of gene copies analyzed in each population
ACB:: Barbadians from Barbados
Ahr:: Aryl hydrocarbon receptor
ARID3A:: AT-Rich Interaction Domain 3A
Arnt:: Aryl hydrocarbon receptor nuclear translocator)
ASW:: African Americans from the Southwest United States
BEB:: Bengali from Bangladesh
bp:: base pair
bps:: base pairs
C13R:: Cysteine/Arginine polymorphism at amino acid position 13
CDX:: Chinese Dai from Xishuangbanna, China
CHB:: Han Chinese from Beijing
CHS:: Han Chinese from southern China
CLCNKB :: Chloride Voltage-gated Channel Kb (gene)
D_N :: Replacement divergent sites
DNA:: Deoxyribonucleic Acid
D_S :: Silent divergent sites
D_T :: Tajima’s D
EGR2:: Early Growth Response 2
EHH:: Extended haplotype homozygosity
ELF-1:: E74 like ETS transcription factor 1
ESN:: Esan from Nigeria
FIN:: Finnish from Finland
Foxd3:: Forkhead Box D3
FOXI1:: Forkhead Box I1
F _ST :: Fixation index (subpopulation relative to total population)
GBR:: Great British from the United Kingdom
GIH:: Gujarati Indian from Houston, Texas
GRCh37:: Genome Reference Consortium Human Build 37
GWD:: Gambian in Western Divisions in the Gambia
H :: Fay and Wu’s H
h:: Number of haplotypes
H. sapiens :: Homo sapiens
H51P:: Histidine/Proline polymorphism at amino acid position 51
h-diversity:: Haplotype diversity
HLA:: Human Leukocyte Antigen
IBS:: Iberian population in Spain
iHS:: integrated haplotype score
IRF1:: Interferon Regulatory Factor 1
ITU:: Indian Telugu from the UK
JPT:: Japanese in Tokyo, Japan
kb:: kilobase
kbs:: kilobases
kcal/mol:: one kilocalorie of energy per one mole of substance
KHV:: Kinh in Ho Chi Minh City, Vietnam
KLF1:: Kruppel Like Factor 1
KLF5:: Kruppel Like Factor 5
LD:: Linkage disequilibrium
LOD:: Logarithm of odds
LTα :: Lymphotoxin alpha (gene)
LTβ :: Lymphotoxin beta (gene)
LWK:: Luhya in Webuye, Kenya
MAF:: Minor Allele Frequency
Mb:: Megabase
Mbs:: Megabases
MHC:: Major Histocompatibility Complex
M-K:: McDonald-Kreitman
MLE:: Maximum Likelihood Estimate
MSL:: Mende in Sierra Leone
NCBI:: National Center for Biotechnology Information
N _e :: Effective population size
NFKBIL1 :: NFKB inhibitor like 1 (gene)
NK/T-cell:: Natural killer T-cell
Nr1h3:: Nuclear Receptor Subfamily 1 Group H Member 3
nSL:: number of segregating sites by length
Pdx1:: Pancreas/Duodenum Homeobox Protein 1
pH:: potential Hydrogen
PJL:: Punjabi from Lahore, Pakistan
PKDREJ :: Polycystin family receptor for egg jelly (gene)
P_N :: Replacement polymorphic sites
Prrx2:: Paired Related Homeobox 2
P_S :: Silent polymorphic sites
PWM:: Position Weight Matrix
RxrA:: Retinoid X Receptor Alpha
S :: Number of segregating sites
SDR39U1 :: Short chain dehydrogenase/reductase family 39 U member 1 (gene)
SNP:: Single nucleotide polymorphism
SNPs:: Single nucleotide polymorphisms
SP1:: Specificity proteins 1
SP2:: Specificity proteins 2
SPIB:: Spi-B transcription factor
STU:: Sri Lankan Tamil from the UK
T60N:: Threonine/Asparagine polymorphism at amino acid position 60
TAS2R38 :: Taste 2 receptor member 38 (gene)
TF:: Transcription Factor
TFBSs:: Transcription Factor Binding Sites
TF-DNA:: Transcription Factor-Deoxyribonucleic Acid
TFs:: Transcription Factors
TMRCA:: Time to the most recent common ancestor
TNFα :: Tumor necrosis factor alpha (gene)
TSI:: Toscani in Italy
UCSC:: University of California Santa Cruz
UTR:: Untranslated region
vcf:: variant call format
ya:: years ago
YRI:: Yoruba in Ibadan, Nigeria
Zfx:: Zinc finger X-chromosomal protein
ZNF263:: Zinc Finger Protein 263
ZNF473:: Zinc Finger Protein 473
ΔG MUT:: DeltaG for the mutant allele
ΔG WT:: DeltaG for the wild-type allele
ΔΔG:: DeltaDeltaG (the total energy change)
θ :: Scaled population mutation rate
θ _W :: Watterson’s theta (nucleotide diversity per site)
θ _π :: Mean pairwise number of differences between sequences per site
μ :: Mutation rate per sequence per generation

References

Bauer J, Namineni S, Reisinger F, Zoller J, Yuan D, Heikenwalder M. Lymphotoxin, NF-kB, and cancer: the dark side of cytokines. Dig Dis. 2012;30(5):453–68.
Article PubMed Google Scholar
Daller B, Musch W, Rohrl J, Tumanov AV, Nedospasov SA, Mannel DN, et al. Lymphotoxin-beta receptor activation by lymphotoxin-alpha(1)beta(2) and LIGHT promotes tumor growth in an NFkappaB-dependent manner. Int J Cancer. 2011;128(6):1363–70.
Article CAS PubMed Google Scholar
Etemadi N, Holien JK, Chau D, Dewson G, Murphy JM, Alexander WS, et al. Lymphotoxin alpha induces apoptosis, necroptosis and inflammatory signals with the same potency as tumour necrosis factor. FEBS J. 2013;280(21):5283–97.
Article CAS PubMed Google Scholar
Etemadi N, Webb A, Bankovacki A, Silke J, Nachbur U. Progranulin does not inhibit TNF and lymphotoxin-alpha signalling through TNF receptor 1. Immunol Cell Biol. 2013;91(10):661–4.
Article CAS PubMed Google Scholar
Liu J, Liu J, Song B, Wang T, Liu Y, Hao J, et al. Genetic variations in CTLA-4, TNF-alpha, and LTA and susceptibility to T-cell lymphoma in a Chinese population. Cancer Epidemiol. 2013;37(6):930–4.
Article PubMed Google Scholar
Yu X, Huang Y, Li C, Yang H, Lu C, Duan S. Positive association between lymphotoxin-alpha variation rs909253 and cancer risk: a meta-analysis based on 36 case-control studies. Tumour Biol. 2014;35(3):1973–83.
Article CAS PubMed Google Scholar
Messer G, Spengler U, Jung MC, Honold G, Blomer K, Pape GR, et al. Polymorphic structure of the tumor necrosis factor (TNF) locus: an NcoI polymorphism in the first intron of the human TNF-beta gene correlates with a variant amino acid in position 26 and a reduced level of TNF-beta production. J Exp Med. 1991;173(1):209–19.
Article CAS PubMed Google Scholar
Tan JH, Temple SE, Kee C, Waterer GW, Tan CR, Gut I, et al. Characterisation of TNF block haplotypes affecting the production of TNF and LTA. Tissue Antigens. 2011;77(2):100–6.
Article CAS PubMed Google Scholar
Yokley BH, Selby ST, Posch PE. A stimulation-dependent alternate core promoter links lymphotoxin alpha expression with TGF-beta1 and fibroblast growth factor-7 signaling in primary human T cells. J Immunol. 2013;190(9):4573–84.
Article CAS PubMed Google Scholar
Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, et al. Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet. 2002;32(4):650–4.
Article CAS PubMed Google Scholar
Knight JC, Keating BJ, Kwiatkowski DP. Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1. Nat Genet. 2004;36(4):394–9.
Article CAS PubMed Google Scholar
Taylor JM, Wicks K, Vandiedonck C, Knight JC. Chromatin profiling across the human tumour necrosis factor gene locus reveals a complex, cell type-specific landscape with novel regulatory elements. Nucleic Acids Res. 2008;36(15):4845–62.
Article CAS PubMed PubMed Central Google Scholar
Cheng S, Li J, Liu W, Liu C, Su L, Liu X, et al. LTA + 252A > G polymorphism is associated with risk of nasal NK/T-cell lymphoma in a Chinese population: a case-control study. BMC Cancer. 2015;15:480.
Article PubMed PubMed Central CAS Google Scholar
Huang Y, Yu X, Wang L, Zhou S, Sun J, Feng N, et al. Four genetic polymorphisms of lymphotoxin-alpha gene and cancer risk: a systematic review and meta-analysis. PLoS One. 2013;8(12):e82519.
Article PubMed PubMed Central CAS Google Scholar
Lu R, Dou X, Gao X, Zhang J, Ni J, Guo L. A functional polymorphism of lymphotoxin-alpha (LTA) gene rs909253 is associated with gastric cancer risk in an Asian population. Cancer Epidemiol. 2012;36(6):e380–6.
Article CAS PubMed Google Scholar
Zhou P, Huang W, Chu X, Du LF, Li JP, Zhang C. The lymphotoxin-alpha 252A>G polymorphism and breast cancer: a meta-analysis. Asian Pac J Cancer Prev. 2012;13(5):1949–52.
Article PubMed Google Scholar
Sainz J, Rudolph A, Hoffmeister M, Frank B, Brenner H, Chang-Claude J, et al. Effect of type 2 diabetes predisposing genetic variants on colorectal cancer risk. J Clin Endocrinol Metab. 2012;97(5):E845–51.
Article CAS PubMed Google Scholar
Skibola CF, Bracci PM, Nieters A, Brooks-Wilson A, de Sanjose S, Hughes AM, et al. Tumor necrosis factor (TNF) and lymphotoxin-alpha (LTA) polymorphisms and risk of non-Hodgkin lymphoma in the InterLymph Consortium. Am J Epidemiol. 2010;171(3):267–76.
Article PubMed PubMed Central Google Scholar
Wang SS, Purdue MP, Cerhan JR, Zheng T, Menashe I, Armstrong BK, et al. Common gene variants in the tumor necrosis factor (TNF) and TNF receptor superfamilies and NF-kB transcription factors and non-Hodgkin lymphoma risk. PLoS One. 2009;4(4):e5360.
Article PubMed PubMed Central CAS Google Scholar
Aissani B, Ogwaro KM, Shrestha S, Tang J, Breen EC, Wong HL, et al. The major histocompatibility complex conserved extended haplotype 8.1 in AIDS-related non-Hodgkin lymphoma. J Acquir Immune Defic Syndr. 2009;52(2):170–9.
Article CAS PubMed PubMed Central Google Scholar
Haybaeck J, Zeller N, Wolf MJ, Weber A, Wagner U, Kurrer MO, et al. A lymphotoxin-driven pathway to hepatocellular carcinoma. Cancer Cell. 2009;16(4):295–308.
Article CAS PubMed PubMed Central Google Scholar
Ramasawmy R, Fae KC, Cunha-Neto E, Muller NG, Cavalcanti VL, Ferreira RC, et al. Polymorphisms in the gene for lymphotoxin-alpha predispose to chronic Chagas cardiomyopathy. J Infect Dis. 2007;196(12):1836–43.
Article CAS PubMed Google Scholar
Iwanaga Y, Ono K, Takagi S, Terashima M, Tsutsumi Y, Mannami T, et al. Association analysis between polymorphisms of the lymphotoxin-alpha gene and myocardial infarction in a Japanese population. Atherosclerosis. 2004;172(1):197–8.
Article CAS PubMed Google Scholar
Consortium P. A trio family study showing association of the lymphotoxin-alpha N26 (804A) allele with coronary artery disease. Eur J Hum Genet. 2004;12(9):770–4.
Article CAS Google Scholar
Fassmann A, Holla LI, Buckova D, Vasku A, Znojil V, Vanek J. Polymorphisms in the +252(a/G) lymphotoxin-alpha and the −308(a/G) tumor necrosis factor-alpha genes and susceptibility to chronic periodontitis in a Czech population. J Periodontal Res. 2003;38(4):394–9.
Article CAS PubMed Google Scholar
Jia B, Qi X. The genetic association between polymorphisms in lymphotoxin-alpha gene and ankylosing spondylitis susceptibility in Chinese group: a case-control study. Medicine (Baltimore). 2017;96(19):e6796.
Article CAS Google Scholar
Saad MN, Mabrouk MS, Eldeib AM, Shaker OG. Genetic case-control study for eight polymorphisms associated with rheumatoid arthritis. PLoS One. 2015;10(7):e0131960.
Article PubMed PubMed Central CAS Google Scholar
Zhang C, Zhao MQ, Liu J, Huang Q, Li P, Ni J, et al. Association of lymphotoxin alpha polymorphism with systemic lupus erythematosus and rheumatoid arthritis: a meta-analysis. Int J Rheum Dis. 2015;18(4):398–407.
Article CAS PubMed Google Scholar
Laddha NC, Dwivedi M, Gani AR, Mansuri MS, Begum R. Tumor necrosis factor B (TNFB) genetic variants and its increased expression are associated with vitiligo susceptibility. PLoS One. 2013;8(11):e81736.
Article PubMed PubMed Central CAS Google Scholar
Bolstad AI, Le Hellard S, Kristjansdottir G, Vasaitis L, Kvarnstrom M, Sjowall C, et al. Association between genetic variants in the tumour necrosis factor/lymphotoxin alpha/lymphotoxin beta locus and primary Sjogren's syndrome in Scandinavian samples. Ann Rheum Dis. 2012;71(6):981–8.
Article CAS PubMed Google Scholar
Phillips CM, Goumidi L, Bertrais S, Ferguson JF, Field MR, Kelly ED, et al. Additive effect of polymorphisms in the IL-6, LTA, and TNF-{alpha} genes and plasma fatty acid level modulate risk for the metabolic syndrome and its components. J Clin Endocrinol Metab. 2010;95(3):1386–94.
Article CAS PubMed Google Scholar
Hudson RR. Generating samples under a Wright-fisher neutral model of genetic variation. Bioinformatics. 2002;18(2):337–8.
Article CAS PubMed Google Scholar
Wall JD, Przeworski M. When did the human population size start increasing? Genetics. 2000;155(4):1865–74.
CAS PubMed PubMed Central Google Scholar
Cox MP, Morales DA, Woerner AE, Sozanski J, Wall JD, Hammer MF. Autosomal resequence data reveal late stone age signals of population expansion in sub-Saharan African foraging and farming populations. PLoS One. 2009;4(7):e6366.
Article PubMed PubMed Central CAS Google Scholar
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramirez-Soriano A, Muntasell A, et al. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 2008;181(2):1315–22.
Article CAS PubMed Google Scholar
Hancock AM, Rienzo AD. Detecting the genetic signature of natural selection in human populations: models, methods, and data. Annu Rev Anthropol. 2008;37:197–217.
Article PubMed PubMed Central Google Scholar
Przeworski M, Coop G, Wall JD. The signature of positive selection on standing genetic variation. Evolution. 2005;59(11):2312–23.
Article PubMed Google Scholar
Bandelt HJ, Forster P, Rohl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999;16(1):37–48.
Article CAS PubMed Google Scholar
Bamshad M, Wooding SP. Signatures of natural selection in the human genome. Nat Rev Genet. 2003;4(2):99–111.
Article CAS PubMed Google Scholar
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5.
Article CAS PubMed Google Scholar
Posada D, Crandall KA. Intraspecific gene genealogies: trees grafting into networks. Trends Ecol Evol. 2001;16(1):37–45.
Article CAS PubMed Google Scholar
Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006;4(3):e72.
Article PubMed PubMed Central Google Scholar
Szpiech ZA, Hernandez RD. Selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 2014;31(10):2824–7.
Article CAS PubMed PubMed Central Google Scholar
Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014;31(5):1275–91.
Article CAS PubMed PubMed Central Google Scholar
Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002;419(6909):832–7.
Article CAS PubMed Google Scholar
Campbell MC, Tishkoff SA. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet. 2008;9:403–33.
Article CAS PubMed PubMed Central Google Scholar
Elhaik E. Empirical distributions of F (ST) from large-scale human polymorphism data. PLoS One. 2012;7(11):e49837.
Article CAS PubMed PubMed Central Google Scholar
Griffiths RC, Tavaré S. Ancestral inference in population genetics. Stat Sci. 1994;9(3):307–19.
Article Google Scholar
Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012;40(Web Server issue):W452–7.
Article CAS PubMed PubMed Central Google Scholar
Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010;6(12):e1001025.
Article PubMed PubMed Central CAS Google Scholar
Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20(1):110–21.
Article CAS PubMed PubMed Central Google Scholar
Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15(8):1034–50.
Article CAS PubMed PubMed Central Google Scholar
Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33(Web Server issue):W382–8.
Article CAS PubMed PubMed Central Google Scholar
Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33(Web Server issue):W306–10.
Article CAS PubMed PubMed Central Google Scholar
Kumar S, Ambrosini G, Bucher P. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity. Nucleic Acids Res. 2017;45(D1):D139–D44.
Article CAS PubMed Google Scholar
Mathelier A, Fornes O, Arenillas DJ, Chen CY, Denay G, Lee J, et al. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2016;44(D1):D110–5.
Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–D7.
Balding J, Kane D, Livingstone W, Mynett-Johnson L, Bresnihan B, Smith O, et al. Cytokine gene polymorphisms: association with psoriatic arthritis susceptibility and severity. Arthritis Rheum. 2003;48(5):1408–13.
Campbell MC, Ranciaro A, Froment A, Hirbo J, Omar S, Bodo JM, et al. Evolution of functionally diverse alleles associated with PTC bitter taste sensitivity in Africa. Mol Biol Evol. 2012;29(4):1141–53.
Charlesworth D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2006;2(4):e64.
Charlesworth B, Nordborg M, Charlesworth D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res. 1997;70(2):155–74.
Article CAS PubMed Google Scholar
Messer PW, Petrov DA. Population genomics of rapid adaptation by soft selective sweeps. Trends Ecol Evol. 2013;28(11):659–69.
Article PubMed Google Scholar
Campbell MC, Hirbo JB, Townsend JP, Tishkoff SA. The peopling of the African continent and the diaspora into the new world. Curr Opin Genet Dev. 2014;29:120–32.
Article CAS PubMed PubMed Central Google Scholar
Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, et al. Genome-wide patterns of population structure and admixture in west Africans and African Americans. Proc Natl Acad Sci U S A. 2010;107(2):786–91.
Article CAS PubMed Google Scholar
Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 2015;96(1):37–53.
Article CAS PubMed PubMed Central Google Scholar
Laso-Jadart R, Harmant C, Quach H, Zidane N, Tyler-Smith C, Mehdi Q, et al. The genetic legacy of the Indian Ocean slave trade: recent admixture and post-admixture selection in the Makranis of Pakistan. Am J Hum Genet. 2017;101(6):977–84.
Article CAS PubMed PubMed Central Google Scholar
Pierron D, Heiske M, Razafindrazaka H, Pereda-Loth V, Sanchez J, Alva O, et al. Strong selection during the last millennium for African ancestry in the admixed population of Madagascar. Nat Commun. 2018;9(1):932.
Article PubMed PubMed Central CAS Google Scholar
Ranciaro A, Campbell MC, Hirbo JB, Ko WY, Froment A, Anagnostou P, et al. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet. 2014;94(4):496–510.
Article CAS PubMed PubMed Central Google Scholar
de Filippo C, Key FM, Ghirotto S, Benazzo A, Meneu JR, Weihmann A, et al. Recent selection changes in human genes under long-term balancing selection. Mol Biol Evol. 2016;33(6):1435–47.
Article PubMed PubMed Central CAS Google Scholar
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, et al. The role of geography in human adaptation. PLoS Genet. 2009;5(6):e1000500.
Article PubMed PubMed Central CAS Google Scholar
Fumagalli M, Sironi M, Pozzoli U, Ferrer-Admetlla A, Pattini L, Nielsen R. Signatures of environmental genetic adaptation pinpoint pathogens as the main selective pressure through human evolution. PLoS Genet. 2011;7(11):e1002355.
Article CAS PubMed PubMed Central Google Scholar
Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, et al. Demographic history and rare allele sharing among human populations. Proc Natl Acad Sci U S A. 2011;108(29):11983–8.
Article CAS PubMed PubMed Central Google Scholar
Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009;19(5):826–37.
Article CAS PubMed PubMed Central Google Scholar
Pritchard JK, Pickrell JK, Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 2010;20(4):R208–15.
Article CAS PubMed PubMed Central Google Scholar
Jensen JD. On the unfounded enthusiasm for soft selective sweeps. Nat Commun. 2014;5:5281.
Article CAS PubMed Google Scholar
Hermisson J, Pennings PS. Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid adaptation. Methods Ecol Evol. 2017;8:700–16.
Article Google Scholar
Barrett RD, Schluter D. Adaptation from standing genetic variation. Trends Ecol Evol. 2008;23(1):38–44.
Article PubMed Google Scholar
Croze M, Zivkovic D, Stephan W, Hutter S. Balancing selection on immunity genes: review of the current literature and new analysis in Drosophila melanogaster. Zoology (Jena). 2016;119(4):322–9.
Article Google Scholar
Teshima KM, Coop G, Przeworski M. How reliable are empirical genomic scans for selective sweeps? Genome Res. 2006;16(6):702–12.
Article CAS PubMed PubMed Central Google Scholar
Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(1):23–35.
Article CAS PubMed Google Scholar
Pierini F, Lenz TL. Divergent allele advantage at human MHC genes: signatures of past and ongoing selection. Mol Biol Evol. 2018. https://doi.org/10.1093/molbev/msy116.
Gillespie JH. The causes of molecular evolution. New York: Oxford University Press; 1991.
Google Scholar
Engle EK, Fay JC. ZRT1 Harbors an Excess of Nonsynonymous Polymorphism and Shows Evidence of Balancing Selection in Saccharomyces cerevisiae. G3 (Bethesda). 2013;3(4):665–73.
Article CAS Google Scholar
Levitsky VG, Kulakovskiy IV, Ershov NI, Oshchepkov DY, Makeev VJ, Hodgman TC, et al. Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics. 2014;15:80.
Article PubMed PubMed Central CAS Google Scholar
Vorontsov IE, Fedorova AD, Yevshin IS, Sharipov RN, Kolpakov FA, Makeev VJ, et al. Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data. BMC Res Notes. 2018;11(1):756.
Article CAS PubMed PubMed Central Google Scholar
Deplancke B, Alpern D, Gardeux V. The genetics of transcription factor DNA binding variation. Cell. 2016;166(3):538–54.
Article CAS PubMed Google Scholar
Arenzana TL, Smith-Raska MR, Reizis B. Transcription factor Zfx controls BCR-induced proliferation and survival of B lymphocytes. Blood. 2009;113(23):5857–67.
Article CAS PubMed PubMed Central Google Scholar
Galan-Caridad JM, Harel S, Arenzana TL, Hou ZE, Doetsch FK, Mirny LA, et al. Zfx controls the self-renewal of embryonic and hematopoietic stem cells. Cell. 2007;129(2):345–57.
Article CAS PubMed PubMed Central Google Scholar
Smith-Raska MR, Arenzana TL, D’Cruz LM, Khodadadi-Jamayran A, Tsirigos A, Goldrath AW, et al. The transcription factor Zfx regulates peripheral T cell self-renewal and proliferation. Front Immunol. 2018;9:1482.
Article PubMed PubMed Central CAS Google Scholar
Nishiyama C, Yokota T, Okumura K, Ra C. The transcription factors elf-1 and GATA-1 bind to cell-specific enhancer elements of human high-affinity IgE receptor alpha-chain gene. J Immunol. 1999;163(2):623–30.
CAS PubMed Google Scholar
Tsokos GC, Nambiar MP, Juang YT. Activation of the Ets transcription factor elf-1 requires phosphorylation and glycosylation: defective expression of activated elf-1 is involved in the decreased TCR zeta chain gene expression in patients with systemic lupus erythematosus. Ann N Y Acad Sci. 2003;987:240–5.
Article CAS PubMed Google Scholar
Yang J, Yang W, Hirankarn N, Ye DQ, Zhang Y, Pan HF, et al. ELF1 is associated with systemic lupus erythematosus in Asian populations. Hum Mol Genet. 2011;20(3):601–7.
Article CAS PubMed Google Scholar
Willis SN, Tellier J, Liao Y, Trezise S, Light A, O’Donnell K, et al. Environmental sensing by mature B cells is controlled by the transcription factors PU.1 and SpiB. Nat Commun. 2017;8(1):1426.
Article PubMed PubMed Central CAS Google Scholar
DeKoter RP, Geadah M, Khoosal S, Xu LS, Thillainadesan G, Torchia J, et al. Regulation of follicular B cell differentiation by the related E26 transformation-specific transcription factors PU.1, Spi-B, and Spi-C. J Immunol. 2010;185(12):7374–84.
Article CAS PubMed Google Scholar
Frietze S, Lan X, Jin VX, Farnham PJ. Genomic targets of the KRAB and SCAN domain-containing zinc finger protein 263. J Biol Chem. 2010;285(2):1393–403.
Article CAS PubMed Google Scholar
Barrera LA, Vedenko A, Kurland JV, Rogers JM, Gisselbrecht SS, Rossin EJ, et al. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science. 2016;351(6280):1450–4.
Article CAS PubMed PubMed Central Google Scholar
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7.
Article CAS PubMed PubMed Central Google Scholar
Coetzee SG, Shen HC, Hazelett DJ, Lawrenson K, Kuchenbaecker K, Tyrer J, et al. Cell-type-specific enrichment of risk-associated regulatory elements at ovarian cancer susceptibility loci. Hum Mol Genet. 2015;24(13):3595–607.
Article CAS PubMed PubMed Central Google Scholar
Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, et al. The human transcription factors. Cell. 2018;172(4):650–65.
Article CAS PubMed Google Scholar
Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, et al. A method to predict the impact of regulatory variants from DNA sequence. Nat Genet. 2015;47(8):955–61.
Article CAS PubMed PubMed Central Google Scholar
Zhao J, Li D, Seo J, Allen AS, Gordan R. Quantifying the impact of non-coding variants on transcription factor-DNA binding. Res Comput Mol Biol. 2017;10229:336–52.
Article CAS PubMed PubMed Central Google Scholar
Mathelier A, Wasserman WW. The next generation of transcription factor binding site prediction. PLoS Comput Biol. 2013;9(9):e1003214.
Article CAS PubMed PubMed Central Google Scholar
Deller MC, Kong L, Rupp B. Protein stability: a crystallographer’s perspective. Acta Crystallogr F Struct Biol Commun. 2016;72(Pt 2):72–95.
Article CAS PubMed PubMed Central Google Scholar
Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. Proc Natl Acad Sci U S A. 2006;103(15):5869–74.
Article CAS PubMed PubMed Central Google Scholar
Hublin JJ, Ben-Ncer A, Bailey SE, Freidline SE, Neubauer S, Skinner MM, et al. New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens. Nature. 2017;546(7657):289–92.
Article CAS PubMed Google Scholar
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, et al. A draft sequence of the Neandertal genome. Science. 2010;328(5979):710–22.
Article CAS PubMed PubMed Central Google Scholar
Povysil G, Hochreiter S. IBD sharing between Africans, Neandertals, and Denisovans. Genome Biol Evol. 2016;8(12):3406–16.
Article PubMed Google Scholar
Higham T, Douka K, Wood R, Ramsey CB, Brock F, Basell L, et al. The timing and spatiotemporal patterning of Neanderthal disappearance. Nature. 2014;512(7514):306–9.
Article CAS PubMed Google Scholar
Hohenlohe PA, Phillips PC, Cresko WA. Using population genomics to detect selection in natural populations: Key concepts and methodological considerations. Int J Plant Sci. 2010;171(9):1059–71.
Article CAS PubMed PubMed Central Google Scholar
Peter BM, Huerta-Sanchez E, Nielsen R. Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet. 2012;8(10):e1003011.
Article CAS PubMed PubMed Central Google Scholar
Pagani L, Schiffels S, Gurdasani D, Danecek P, Scally A, Chen Y, et al. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am J Hum Genet. 2015;96(6):986–91.
Article CAS PubMed PubMed Central Google Scholar
Campbell MC, Smith LT, Harvey J. Population genetic evidence for positive and purifying selection acting at the human IFN-γ locus in Africa. Genes Immun. 2019;20:143–57.
Article CAS PubMed Google Scholar
Beck S, Trowsdale J. The human major histocompatability complex: lessons from the DNA sequence. Annu Rev Genomics Hum Genet. 2000;1:117–37.
Article CAS PubMed Google Scholar
Trowsdale J. The MHC, disease and selection. Immunol Lett. 2011;137(1–2):1–8.
Article CAS PubMed Google Scholar
Trowsdale J, Knight JC. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet. 2013;14:301–23.
Article CAS PubMed PubMed Central Google Scholar
Jensen PE. Recent advances in antigen processing and presentation. Nat Immunol. 2007;8(10):1041–8.
Article CAS PubMed Google Scholar
Neefjes J, Jongsma ML, Paul P, Bakke O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat Rev Immunol. 2011;11(12):823–36.
Article CAS PubMed Google Scholar
Deakin JE, Papenfuss AT, Belov K, Cross JG, Coggill P, Palmer S, et al. Evolution and comparative analysis of the MHC class III inflammatory region. BMC Genomics. 2006;7:281.
Article PubMed PubMed Central CAS Google Scholar
Lenz TL. Adaptive value of novel MHC immune gene variants. Proc Natl Acad Sci U S A. 2018;115(7):1414–6.
Article CAS PubMed PubMed Central Google Scholar
Milner CM, Campbell RD. Genetic organization of the human MHC class III region. Front Biosci. 2001;6:D914–26.
Article CAS PubMed Google Scholar
Yung Yu C, Yang Z, Blanchong CA, Miller W. The human and mouse MHC class III region: a parade of 21 genes at the centromeric segment. Immunol Today. 2000;21(7):320–8.
Article CAS PubMed Google Scholar
Qidwai T, Khan F. Tumour necrosis factor gene polymorphism and disease prevalence. Scand J Immunol. 2011;74(6):522–47.
Article CAS PubMed PubMed Central Google Scholar
Hrdlickova B, de Almeida RC, Borek Z, Withoff S. Genetic variation in the non-coding genome: involvement of micro-RNAs and long non-coding RNAs in disease. Biochim Biophys Acta. 2014;1842(10):1910–22.
Article CAS PubMed Google Scholar
Racimo F, Sankararaman S, Nielsen R, Huerta-Sanchez E. Evidence for archaic adaptive introgression in humans. Nat Rev Genet. 2015;16(6):359–71.
Article CAS PubMed PubMed Central Google Scholar
Vattathil S, Akey JM. Small amounts of archaic admixture provide big insights into human history. Cell. 2015;163(2):281–4.
Article CAS PubMed Google Scholar
Dannemann M, Kelso J. The contribution of Neanderthals to phenotypic variation in modern humans. Am J Hum Genet. 2017;101(4):578–89.
Article CAS PubMed PubMed Central Google Scholar
Dolgova O, Lao O. Evolutionary and Medical Consequences of Archaic Introgression into Modern Human Genomes. Genes (Basel). 2018;9(7):358.
Article CAS Google Scholar
Racimo F, Gokhman D, Fumagalli M, Ko A, Hansen T, Moltke I, et al. Archaic adaptive introgression in TBX15/WARS2. Mol Biol Evol. 2017;34(3):509–24.
CAS PubMed Google Scholar
Abi-Rached L, Jobin MJ, Kulkarni S, McWhinnie A, Dalva K, Gragert L, et al. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science. 2011;334(6052):89–94.
Article CAS PubMed PubMed Central Google Scholar
Dannemann M, Andres AM, Kelso J. Introgression of Neandertal- and Denisovan-like haplotypes contributes to adaptive variation in human toll-like receptors. Am J Hum Genet. 2016;98(1):22–33.
Article CAS PubMed PubMed Central Google Scholar
Huerta-Sanchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512(7513):194–7.
Article CAS PubMed PubMed Central Google Scholar
Sankararaman S, Mallick S, Dannemann M, Prufer K, Kelso J, Paabo S, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507(7492):354–7.
Article CAS PubMed PubMed Central Google Scholar
Vernot B, Akey JM. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343(6174):1017–21.
Article CAS PubMed Google Scholar
Deschamps M, Laval G, Fagny M, Itan Y, Abel L, Casanova JL, et al. Genomic signatures of selective pressures and introgression from archaic hominins at human innate immunity genes. Am J Hum Genet. 2016;98(1):5–21.
Article CAS PubMed PubMed Central Google Scholar
Dannemann M, Racimo F. Something old, something borrowed: admixture and adaptation in human evolution. Curr Opin Genet Dev. 2018;53:1–8.
Article CAS PubMed Google Scholar
Marciniak S, Perry GH. Harnessing ancient genomes to study the history of human adaptation. Nat Rev Genet. 2017;18(11):659–74.
Article CAS PubMed Google Scholar
The 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Article CAS PubMed PubMed Central Google Scholar
National Library of Medicine (US). Cited January 26, 2019. Available from: https://www.ncbi.nlm.nih.gov/ (Accessed 27 Jan 2019).
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.
Article CAS PubMed PubMed Central Google Scholar
Tajima F, Nei M. Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol. 1984;1(3):269–85.
CAS PubMed Google Scholar
Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155(3):1405–13.
CAS PubMed PubMed Central Google Scholar
Gay J, Myers S, McVean G. Estimating meiotic gene conversion rates from population genetic data. Genetics. 2007;177(2):881–94.
Article CAS PubMed PubMed Central Google Scholar
Marth GT, Czabarka E, Murvai J, Sherry ST. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics. 2004;166(1):351–72.
Article CAS PubMed PubMed Central Google Scholar
Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di Rienzo A. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci U S A. 2005;102(51):18508–13.
Article CAS PubMed PubMed Central Google Scholar
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–2.
Article CAS PubMed Google Scholar
Lewontin RC. The interaction of selection and linkage. I. General Considerations; Heterotic Models. Genetics. 1964;49(1):49–67.
CAS PubMed PubMed Central Google Scholar
Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38(6):1358–70.
CAS PubMed Google Scholar
The UCSC Genome Browser database. Cited January 27, 2019. Available from: http://www.noncode.org/cgi-bin/hgTables?db=hg38&hgta_group=compGeno&hgta_track=cons20way&hgta_table=multiz20way&hgta_doSchema=describe+table+schema. Accessed 27 Jan 2019.
Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42(Web Server issue):W252–8.
Article CAS PubMed PubMed Central Google Scholar
IGSR: The International Genome Sample Resource. Available from: https://www.internationalgenome.org/sample_collection_principles/. (Accessed 27 Jan 2019)
1000 Genomes Project. Cited January 27, 2019. Available from: ftp://ftptrace.ncbi.nih.gov/1000genomes/ftp/release/20130502 (Accessed 27 Jan 2019).

Download references

Acknowledgements

We thank the Center for Computational Biology and Bioinformatics (CCBB) at Howard University for providing the computational resources for this project. We also thank Faith C. Simmonds for her invaluable help with the haplotype network.

Funding

This work was supported by the Start-up Funds from Howard University to MCC and ST. This research was further supported in part by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number G12MD007597. The funding bodies had no role in study design, analysis and interpretation, nor in the writing of the manuscript.

Author information

Authors and Affiliations

Department of Biology, College of Arts and Sciences, Howard University, Washington, DC, 20059, USA
Michael C. Campbell, Bryan Ashong, Shaolei Teng & Jayla Harvey
Department of Anatomy, College of Medicine, Howard University, Washington, DC, 20059, USA
Christopher N. Cross

Authors

Michael C. Campbell
View author publications
You can also search for this author in PubMed Google Scholar
Bryan Ashong
View author publications
You can also search for this author in PubMed Google Scholar
Shaolei Teng
View author publications
You can also search for this author in PubMed Google Scholar
Jayla Harvey
View author publications
You can also search for this author in PubMed Google Scholar
Christopher N. Cross
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

MCC conceived of and designed the study. MCC, BA, ST and JH performed data analyses with assistance from CNC. The manuscript was written by MCC with assistance from BA. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Michael C. Campbell.

Ethics declarations

Ethics approval and consent to participate

The 1000 Genomes Project created an open and public resource of anonymous data. Written informed consent was obtained from all participants following the ethical guidelines outlined in [151].

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: we were notified that a few duplicate sentences were included on Figure 3 and Figure 4 legends. Also, in the sentence “In humans, MHC is a gene-dense region that spans ∼4 Mbs on the short arm of chromosome 6 and contains over 200 genes [113]”, “Mbs” was originally missing.

Supplementary information

Additional file 1

: Figure S1. contains pairwise LD plots (spanning ~9.64 kbs) for African and non-African populations that were not included in Fig. 3 of the main manuscript. Figure S2. contains pairwise LD plots for loci located across a larger genomic region (spanning ~ 35.5 kbs) for each population. Figure S3. displays Manhattan plots of standardized |iHS| values in populations that were not included in Fig. 4 of the main manuscript. Figure S4. shows Manhattan plots of standardized |nSL| scores for all populations. Figure S5. presents the remaining EHH plots for populations that were not included in Fig. 4 of the main manuscript.

Additional file 2

: Table S1. contains the frequency of alleles at polymorphic sites across LTα and 5′ regulatory region in global populations, and estimates of per site F_ST. Tables S2. gives summary statistics for the 5′ regulatory region only. Table S3. lists the observed Tajima’s D (D_T) statistics for the LTα gene and their corresponding P-values under different scenarios of population growth. Table S4. shows the observed Fay and Wu’s H (H) statistics for the LTα gene and their corresponding P-values under different scenarios of population growth. Table S5. lists the observed D_T statistics for the 5′ regulatory region and their corresponding P-values for different scenarios of population growth. Table S6. contains the observed H statistics for the 5′ regulatory region and their corresponding P-values under different scenarios of population growth. Table S7A. presents haplotype frequencies in global populations. Table S7B. lists the polymorphic sites where the highest-frequency haplotypes in the network differ from one another and also indicates the alleles (on these haplotypes) that are shared with Neandertal and Denisovan genomes. Table S8. presents the frequencies of common haplotypes in non-African populations. Table S9. shows the frequencies of common haplotypes in African populations. Table S10. gives the genomic coordinates for markers in the LD plots (spanning ~9.64 kbs) in Fig. 3 and in Additional file 1: Figure S1. Table S11. gives the genomic coordinates for markers in the LD plots (spanning ~35.5 kbs) in Additional file 1: Figure S2. Table S12. lists outlier standardized |iHS| and |nSL| scores for SNPs in each population. Table S13. presents the inferred ages of mutations across LTα and the 5′ regulatory region.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Campbell, M.C., Ashong, B., Teng, S. et al. Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6. BMC Evol Biol 19, 218 (2019). https://doi.org/10.1186/s12862-019-1516-y

Download citation

Received: 28 January 2019
Accepted: 20 September 2019
Published: 02 December 2019
DOI: https://doi.org/10.1186/s12862-019-1516-y

Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6

Abstract

Background

Results

Conclusions

Background

Results

Patterns of nucleotide variation

Tests of neutrality

Haplotype variation and inferred relationships

Pairwise linkage disequilibrium in the LTα region and surrounding genes

Long-range haplotype structure

Population differentiation and structure

Inferred ages of variants in LTα and the adjacent 5′ region

Inferred functional consequences

Discussion

Evidence for soft sweeps in global populations

Identification of functional candidate loci

Timing of common variants in African and non-African populations

The function and evolution of MHC genes

Concluding remarks

Methods

Population samples

Nucleotide variation

Tests of neutrality

Haplotype variation and inferred relationships

Linkage disequilibrium

Extended haplotype homozygosity

Population differentiation and structure

Age estimates of mutations

Functional analysis

Availability of data and materials

Change history

16 December 2019

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary information

Additional file 1

Additional file 2

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Ecology and Evolution

Contact us