Genetic adaptation of the antibacterial human innate immunity network
BMC Evolutionary Biology volume 11, Article number: 202 (2011)
Pathogens have represented an important selective force during the adaptation of modern human populations to changing social and other environmental conditions. The evolution of the immune system has therefore been influenced by these pressures. Genomic scans have revealed that immune function is one of the functional classes enriched with genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges, through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptors and modulators functional classes, and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population detected by neutrality tests and showed that some functional classes are preferential targets for selection.
We found evidence that the role of each gene in the network conditions its capacity to evolve (its evolvability): genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a counterbalance between the beneficial and deleterious effects of the immune response.
Infectious diseases are among the most important selective agents for any vertebrate species. In humans, they have represented a great challenge in our adaptation to new environments and social practices: over the last ten thousand years, increasing population densities, sustained by improved agricultural technologies and cattle domestication, have favored their emergence and spread. The human immune system seems likely to have played a key role in the adaptation of the different populations to changing conditions and emerging infections. This hypothesis is supported by recent genomic scans for signatures of adaptive selection in human populations, which show that immune function is one of the classes enriched with genes under positive or balancing selection [1–6], the two evolutionary forces underlying adaptation. The literature includes a large collection of human genes related to the host-pathogen interaction with signatures of adaptation [7, 8]. Vertebrate immune function can be divided into the adaptive immune system, which is exclusive to this phylogenetic group, and the ancient innate immune system, which is common to most multicellular organisms. Innate immunity constitutes the first barrier of defense and acts in a semi-specific way by recognizing pathogen-associated molecular patterns (PAMPs), which are essential and conserved components of the pathogens. Substantial evidence of positive or balancing selection acting on some of these genes has been reported [9–15].
In this manuscript, we analyze the footprint of adaptive selection in the innate immune mechanisms involved in (mostly) antibacterial host defense. Bacteria have probably been the most important human pathogens, with a major impact on morbidity and mortality. Although some of the pathways described here are also involved in host defense against human parasites, fungi and viruses, the pattern recognition receptor (PRR) gene families analyzed (TLRs and NLRs) are the most important for non-adaptive recognition of bacteria. The other two major classes of PRR gene families, involved in host defense against fungi (C-type lectins) and viruses (RIG-I helicases), are not included in the present study.
There is an increasing interest in characterizing the evolutionary dynamics of proteins in the context of their functional network. Published data on biological networks, for example in the form of experimentally derived data sets of protein-protein interactions, or curated databases of functional pathways, have facilitated studies relating evolutionary parameters (mainly the footprint of natural selection) to network topology [16–19]. We use molecular signatures in particular genes to explore adaptation of the functional network, asking whether human adaptation at the molecular level of the innate immune system has occurred preferentially in some functional class or classes (with more plasticity to respond to pathogenic pressures) or at specific positions in the network structure. The final goal of this work is to unravel the past selective forces that have shaped human immunological responses to bacterial infections; the approach is based on detecting adaptation through natural selection in resequencing data from two populations (Europeans and Africans). The results are interpreted in the functional context of gene products interacting in well-identified networks.
Selection of genes and DNA sequences
We included in the study as many genes involved in innate immunity as possible for which resequencing data were available. Data for 132 genes were retrieved from different sources. Sixty-two genes were from the Innate Immunity Program in Genomics Applications (IIPGA, http://innateimmunity.com), considering only those genes with available information on the sequencing primers used. In addition, data for seven genes were obtained from the Environmental Genome Project database (NIEHS SNPs, http://egp.gs.washington.edu), and for 54 genes from the SeattleSNPs database (http://pga.gs.washington.edu). All available sequence data, from both coding and noncoding regions, were included in the analysis. We also included previously published results for eight additional genes from the Innate Immunity Program in Genomics Applications and one from the SeattleSNPs database. Genes were classified according to their function into a comprehensive functional network (Figure 1). Chimpanzee sequences (panTro2) were obtained from GenBank (http://www.ncbi.nlm.nih.gov/) and the Ensembl database (http://www.ensembl.org/index.html) to be used as outgroups. For IL1F7, we used the Pongo pygmaeus sequence from the Ensembl database as the outgroup. Sequence alignments were performed with ClustalW. Divergence estimates were retrieved from the BioMart database (http://uswest.ensembl.org/biomart/index.html), only for unequivocal 1-to-1 hits to functional sequences. dN/dS values correspond to the comparison of humans and Pan troglodytes for all genes except TLR5 and IL1F8 (Gorilla gorilla) and IL1F7 (Pongo pygmaeus).
All included genes from the Innate Immunity Program in Genomics Applications, and 44 from SeattleSNPs, were resequenced in the same 23 European-American and 24 African-American samples included in the Coriell CEPH/African American panel (Coriell Institute for Medical Research, Camden, NJ). The remaining 11 genes from SeattleSNPs were resequenced in 23 European (HapMap CEU) and 24 African (HapMap YRI) individuals. For the seven genes from the Environmental Genome Project database, we retrieved resequencing data for 22 European (HapMap CEU) and 15 Coriell African American individuals.
Molecular Data Analysis
The following diversity statistics and neutrality tests were calculated for every gene: heterozygosity or π, Tajima's D, Fu and Li's F*, D*, F and D, and the normalized Fay and Wu's H test, using DnaSP v5 and MANVa. Indels and triallelic positions were not included in the analyses. Genes with fewer than five segregating sites (MYD88 in Africans; MYD88, IL5 and TLR3 in Europeans) were not considered in the neutrality test analysis. The COSI software was used to calculate the significance of the neutrality tests by means of coalescent simulations, using the model that takes into consideration the demographic history of humans. COSI produces data that closely resemble empirical data from African American, West African and European populations. For each region, 1,000 replicates were performed using the local recombination rate estimate obtained from HapMap (http://www.hapmap.org). In the case of C3 and ITGA8, it was not possible to run the simulations because of their length and high recombination rate. The excess of genes under positive selection, balancing selection, or both in a given functional class was tested by means of Fisher's exact test on a two-by-two table containing: the number of genes in this functional class under selection, the number of genes in this functional class without evidence of selection, the number of remaining genes (not in this functional class) under selection, and the number of remaining genes (not in this functional class) without evidence of selection.
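As an illustration of one of the neutrality tests listed above, the following is a minimal sketch of Tajima's D computed from summary values, using the standard constants from Tajima's original formulation. It is not the DnaSP or MANVa implementation; the function name `tajimas_d` and the example numbers are hypothetical, and the sketch assumes biallelic sites with no missing data.

```python
from math import sqrt


def tajimas_d(n, seg_sites, pi_total):
    """Tajima's D for one locus.

    n         : number of sampled chromosomes
    seg_sites : number of segregating sites (S)
    pi_total  : mean number of pairwise differences, summed over sites
    """
    if n < 2 or seg_sites < 1:
        raise ValueError("need at least two sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi_total - seg_sites / a1) / sqrt(variance)


# Hypothetical locus: 46 chromosomes (as sampled from 23 diploid
# individuals) and 10 segregating sites.
d_rare = tajimas_d(46, 10, 1.0)          # few pairwise differences
d_intermediate = tajimas_d(46, 10, 4.0)  # many pairwise differences
```

In this toy case a low π relative to the number of segregating sites gives a negative D (excess of rare variants), and a high π gives a positive D (excess of intermediate-frequency variants).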
We used the MiMI plugin for Cytoscape to retrieve all known interactions for the genes in our dataset. Specifically, we queried the database with the 132 gene identifiers to retrieve all known protein-protein interactions among those gene products and their first neighbors. Network statistics were calculated using the NetworkAnalyzer plugin in Cytoscape.
We have analyzed publicly available resequencing data for 132 genes involved in antibacterial innate immunity in individuals of African and European ancestry (see Methods). Of these, 129 genes are autosomal and only three (IRAK1, TLR7 and TLR8) are located on the X chromosome. All these genes have an unequivocal role in host defense against bacteria, with some of them also related to the immune response against other pathogens, and were classified according to their main function in this system into five categories: receptors, adaptors, modulators, cytokines and effector molecules (Figure 1). We have also defined subclasses in the case of the pattern-recognition receptors (TLR2, TLR4, TLR5, NOD1, NOD2, TLR3 and TLR7-9 modules), which can be differentiated according to their location in the cell membrane (the first three), in the cytoplasm (NOD1 and NOD2), or in endosomes (TLR3, TLR7-9). Cytokines have been subclassified according to their involvement in different immune processes (acute phase, cellular immunity, anti-inflammatory cytokines, neutrophil function, chemokines, and defense against extracellular bacterial parasites). Several genes were included in a given cytokine functional module because of their crucial role in the induction of the respective cytokine (e.g. STAT3 was included in the Th1 module) or in the function of the respective cytokine (e.g. JAK3 in the Th1 module). Five genes were not included in any of the functional classes defined (Additional file 1, Table S1).
Nucleotide diversity within a network approach
Nucleotide diversity levels can provide a measure of the degree of conservation of the different genes and their comparison across functional classes may shed light on different intensities of purifying selection. We estimated the nucleotide diversity measured as the average number of differences between pairs of sequences (π) for the 132 genes within each population, and analyzed this information in the context of the position of the genes in the functional network and through the comparison of diversity levels among the five functional categories defined (see above).
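To make the definition of π concrete, the sketch below computes the average number of differences between all pairs of aligned sequences, optionally divided by the alignment length. It is only an illustration: the function name and the toy alignment are hypothetical, and it assumes equal-length sequences with no gaps or missing data (the study itself excluded indels and triallelic positions).

```python
from itertools import combinations


def nucleotide_diversity(seqs, per_site=True):
    """Average number of differences between all pairs of sequences.

    seqs     : list of aligned sequences of equal length
    per_site : if True, divide by the alignment length
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total_diffs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
    n_pairs = n * (n - 1) // 2
    mean_diffs = total_diffs / n_pairs
    return mean_diffs / length if per_site else mean_diffs


# Hypothetical toy alignment of four chromosomes, eight sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi_total = nucleotide_diversity(toy_alignment, per_site=False)  # 1.0
pi_site = nucleotide_diversity(toy_alignment)                   # 0.125
```

The quadratic loop over pairs is the direct reading of the definition; for large samples the same value can be obtained from per-site allele counts.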
We obtained the network for all the genes included in the study by considering all known interactions for the 132 gene products, retrieved from the MiMI database of molecular interactions (see Methods). Interactions could be retrieved for 126 of the 132 genes, and the final network consisted of 1,561 proteins with a total of 13,002 interactions. A finding emerging from previous studies on the influence of network topology on evolutionary rate is that highly connected proteins in the protein-protein interaction network ("hubs") are more constrained in their evolution than peripheral ones. Furthermore, proteins at the periphery of the interactome have been shown to be more likely to be under positive selection in the human lineage than more centrally located ones. To test whether this trend can also be observed among the innate immunity proteins in our dataset, we investigated the correlations between nucleotide diversity and two measures of the centrality of nodes within a network: the degree centrality (defined as the total number of links incident upon a node) and the betweenness centrality (defined as the fraction of all shortest paths that pass through a node). For these analyses, we considered all known interactions within the entire interactome for each of the 126 genes present in the MiMI database (see Methods), as these reflect a protein's position in the network more accurately than the interactions among the studied genes alone.
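The two centrality measures can be illustrated on a small graph. The sketch below is not the NetworkAnalyzer code; it uses Brandes' algorithm for unweighted, undirected graphs and normalizes betweenness by the number of node pairs that exclude the node, which is one common reading of "fraction of all shortest paths". The node names and the star-shaped toy network are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Number of links incident upon each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Normalized betweenness (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = (n - 1) * (n - 2)         # ordered pairs that exclude the node
    return {v: (b / scale if scale > 0 else 0.0) for v, b in bc.items()}


# Hypothetical bow-tie-like toy network: every node talks to one adaptor.
toy_edges = [("receptor1", "adaptor"), ("receptor2", "adaptor"),
             ("receptor3", "adaptor"), ("adaptor", "effector1"),
             ("adaptor", "effector2"), ("adaptor", "effector3")]
toy_adj = build_adjacency(toy_edges)
toy_degree = degree_centrality(toy_adj)            # adaptor: 6, others: 1
toy_betweenness = betweenness_centrality(toy_adj)  # adaptor: 1.0, others: 0.0
```

In this toy network the single adaptor lies on every shortest path between the other nodes, so it has the maximum value for both measures.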
Degree centrality was significantly and negatively correlated with nucleotide diversity in Africans (τ = -0.2, P = 0.001) and Europeans (τ = -0.19, P = 0.002) (Additional file 1, Figure S2). For betweenness centrality and nucleotide diversity the correlations were also negative and marginally significant (τ = -0.12, P = 0.05 for Africans; τ = -0.11, P = 0.08 for Europeans). When only the coding sequence is considered, these correlations remain significant only in the case of degree centrality in Africans (τ = -0.15, P = 0.015) (Additional file 1, Figure S3), probably as a result of a lack of power given the much smaller number of segregating sites. We repeated these analyses comparing both centrality indexes to divergence, measured as the ratio between nonsynonymous and synonymous changes (dN/dS) between humans and chimpanzees for each gene. Again, the correlations are negative and significant for degree centrality (τ = -0.16, P = 0.01) and marginally significant for betweenness centrality (τ = -0.12, P = 0.05) (Additional file 1, Figure S4), which is consistent with previous results, even though we are restricting our analyses to far fewer genes.
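The τ values above are presumably Kendall rank correlation coefficients; the text does not name the statistic, so this is an assumption. Under that assumption, a minimal sketch of the tie-corrected coefficient (tau-b) is shown below, with hypothetical toy values for degree and diversity. The significance test is omitted.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                  # tied in both variables
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    if denom == 0:
        return float("nan")           # one variable is constant
    return (concordant - discordant) / denom


# Hypothetical toy data: diversity tends to fall as degree rises.
toy_degree_values = [1, 2, 3, 4, 5]
toy_diversity = [0.9, 0.8, 0.85, 0.4, 0.3]
toy_tau = kendall_tau_b(toy_degree_values, toy_diversity)  # -0.8
```

A rank-based coefficient is a natural choice here because node degree is highly skewed in interaction networks, so a linear correlation would be dominated by a few hubs.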
An alternative approach to the network analysis can be applied by considering the hierarchical structure of innate immunity signaling from a functional perspective. Figure 1 shows a "bow-tie" shaped organization in terms of the number of components and the flow of information: a relatively large number of receptor molecules recognizing different classes of pathogens all signal to a limited number of intracellular adaptor proteins which, in turn, can interact with proteins that act as modulators, and subsequently signal to a diverse array of downstream molecules, including cytokines and other effector molecules (Figure 1). This bow-tie structure, characterized by a large number of "inputs", a relatively small number of central control nodes that elaborate the information, and a large number of "outputs", seems to be a topological organization widely adopted by metabolic networks [33–35] and also by the immune system signaling network.
Given this structure, one can hypothesize that the amount of gene variation should follow a pattern similar to the one obtained in the context of the position of the genes in the network, with the adaptors common to all different pathogen responses being more constrained than both up- and downstream proteins, as previously reported. Table 1 shows the average value of nucleotide diversity observed in the different functional classes, using both the whole sequence and only the coding sequence. Adaptors and modulators show the lowest diversity values. When the whole available sequence for the gene is considered, adaptors show lower values of nucleotide diversity, a difference that is significant in Africans (unpaired t-test P = 0.01) although not in Europeans (P = 0.09). Considering only the coding sequence, this difference is significant in Africans for the adaptors (P = 0.006) and the modulators (P = 0.05). We also used a permutation procedure to test this hypothesis, in which the mean nucleotide diversity calculated for each defined functional category in the original data is compared to 10,000 replicates with class labels randomly permuted. Significance is reached in adaptors for Africans (P = 0.004) and Europeans (P = 0.03) when using the whole sequence, and for adaptors in Africans for the coding sequence (P = 0.04) (Additional file 1, Figures S5 and S6). We repeated this analysis for molecular divergence and, although the adaptors and modulators also show the lowest dN/dS values among the five main functional categories, these differences were not statistically significant (Additional file 1, Table S2).
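The permutation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the test is taken to be one-sided (is the class mean as low as, or lower than, the permuted means?), the P value uses the common "add one" correction, and the function name, class labels and diversity values are hypothetical.

```python
import random


def permutation_test_low_mean(values, labels, target, n_perm=10000, seed=0):
    """One-sided label-permutation test for a low class mean.

    Returns the observed mean of the target class and the fraction of
    permutations whose target-class mean is at least as low.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if target not in labels:
        raise ValueError("target class not present in labels")

    def class_mean(current_labels):
        picked = [v for v, lab in zip(values, current_labels) if lab == target]
        return sum(picked) / len(picked)

    observed = class_mean(labels)
    rng = random.Random(seed)         # fixed seed for reproducibility
    shuffled = list(labels)
    as_low = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)         # randomly reassign class labels
        if class_mean(shuffled) <= observed + 1e-12:
            as_low += 1
    p_value = (as_low + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical per-gene diversity values (arbitrary units).
toy_pi = [0.2, 0.3, 0.25, 0.8, 0.9, 1.0, 0.7, 0.85, 0.95, 0.75]
toy_labels = ["adaptor"] * 3 + ["other"] * 7
toy_observed, toy_p = permutation_test_low_mean(
    toy_pi, toy_labels, "adaptor", n_perm=2000)
```

Permuting labels keeps the class sizes fixed and makes no distributional assumption, which is useful when classes contain only a handful of genes.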
Some functional classes are preferential targets for adaptive selection
We evaluated whether some functional categories are also preferential targets for positive or balancing selection. We performed six different neutrality tests (see Methods) to detect genes with an excess of rare or intermediate-frequency variants, which, after correcting for demography, can suggest positive or balancing selection. Tajima's D and Fu and Li's D* and F* use the intraspecific variation data only, whereas Fu and Li's D and F and the normalized Fay and Wu's H test compare this intraspecific information to a sequence from another species (outgroup). Tajima's D and Fu and Li's D* and F* tests are based on the comparison of low-frequency variants (Tajima's D) or singletons (Fu and Li's D* and F*) to intermediate-frequency variants. Fu and Li's D and F tests are based on the comparison of mutations in the external branches of the genealogy to the total number of mutations and to the average number of nucleotide differences between pairs of sequences, respectively. In these five tests, values lower than expected under neutrality are obtained when there is an excess of rare variants (low frequency, singletons, mutations at external branches), and values higher than expected under neutrality are obtained when there is an excess of intermediate-frequency variants. Fay and Wu's H and its normalized version have been shown to have power to detect selective sweeps that increase the frequency of derived alleles (those different from the allele in the outgroup species), produced by positive selection or by the initial stages of balancing selection. The significance of these tests was evaluated by comparing the results for each gene to the rest of the genes included in the study (Figure 2), since all the sequences are supposed to share the same demographic history. We considered a gene to show evidence of positive or balancing selection if it was included in the 95th upper or lower percentiles for two or more neutrality tests in the same population.
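The empirical outlier rule described above can be sketched as follows. The sketch assumes that "95th upper or lower percentiles" means the top or bottom 5% of the distribution of each test across genes, which is one possible reading; the gene names, test names and values are hypothetical.

```python
def tail_candidates(stats, tail_pct=5, min_tests=2):
    """Flag genes lying in the same empirical tail for several tests.

    stats : dict mapping gene -> dict mapping test name -> test value
    Returns (genes in the lower tail, genes in the upper tail), each for
    at least `min_tests` tests.
    """
    genes = list(stats)
    tests = sorted({t for g in genes for t in stats[g]})
    low_hits = {g: 0 for g in genes}
    high_hits = {g: 0 for g in genes}
    for t in tests:
        vals = {g: stats[g][t] for g in genes if t in stats[g]}
        n = len(vals)
        for g, v in vals.items():
            n_below = sum(1 for u in vals.values() if u < v)
            n_above = sum(1 for u in vals.values() if u > v)
            # Integer arithmetic avoids floating-point edge cases.
            if n_below * 100 < tail_pct * n:
                low_hits[g] += 1
            if n_above * 100 < tail_pct * n:
                high_hits[g] += 1
    low = sorted(g for g in genes if low_hits[g] >= min_tests)
    high = sorted(g for g in genes if high_hits[g] >= min_tests)
    return low, high


# Hypothetical values for 20 genes and three tests.
toy_stats = {}
for i in range(20):
    toy_stats["G%02d" % (i + 1)] = {
        "testA": i,              # G01 lowest, G20 highest
        "testB": i,              # same ordering as testA
        "testC": (i + 10) % 20,  # G11 lowest, G10 highest
    }
toy_low, toy_high = tail_candidates(toy_stats)  # (['G01'], ['G20'])
```

Genes G10 and G11 are extreme for only one test and are therefore not flagged, which mirrors the requirement of two or more tests.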
Additionally, for those genes statistical significance was also assessed by means of coalescent simulations that resemble the demographic history of Africans and Europeans (see Methods) (Additional file 1, Table S3). These methods should exclude the possibility of these signals being produced randomly or by demographic events such as population expansions or contractions. Moreover, our aim is to identify different selective pressures acting on the different functional classes, rather than to describe unequivocal signatures of selection in a particular gene. The possible random effects of demography on a particular gene are also minimized by pooling together the genes in the same functional class, as we did for the subsequent analysis.
Table 2 shows the list of candidate genes for positive and balancing selection in each population. Coalescent simulations corroborated the results in all cases, with all the genes showing significant results in at least two of the neutrality tests (Additional file 1, Table S3). Although the neutrality tests used have more power to detect positive rather than purifying selection, we have indicated, among the genes under positive selection, those with significant results in the Fay and Wu's H test as genes with unequivocal signatures of positive selection (Table 2). Of the total of 132 genes, 31 show signs of adaptive selection: 17 of positive selection (8 in Africans and 10 in Europeans, with MPO in the two populations) and 14 of balancing selection (9 in Africans and 10 in Europeans, with LY96, IL18RAP, IL1F5, IL1F7 and C3 in the two populations). Some of them have been previously reported by independent studies using the same or similar datasets: balancing selection in IL1F5, IL1F7, IL1F10 and IL18RAP, and positive selection in the TLR1-TLR6-TLR10 region, finally attributed to TLR1. It should be noted that IL1A, IL1F7, IL1F5 and IL1F10 are located in a region spanning 300 kb on chromosome 2. As discussed in Fumagalli et al., it seems improbable that a hitchhiking effect is producing all these signatures, even between the closest genes (IL1F5 and IL1F10), given the low LD and the presence of recombination hotspots in this region, as well as the fact that balancing selection signatures have been shown to spread only over small regions. Moreover, other genes included in this region (IL1B, IL1F9, IL1F6 and IL1F8) did not show an excess of intermediate-frequency variants.
Beyond signatures of adaptive selection on specific genes, we have analyzed the effect of functional classes. Figure 2 shows the results of the six neutrality tests performed. We tested for an excess of candidate genes under positive or balancing selection (or both) in each functional class by comparing the number of genes with and without significant evidence of adaptive selection in this class against the rest of the genes included in the study (see Methods). Genes with statistical evidence for positive or balancing selection are not randomly distributed among the different functional classes, and tend to accumulate in a small number of classes. Specifically, there is an unexpectedly high number of acute phase proteins showing signatures of balancing selection, both in Africans and Europeans (Fisher's exact test P = 0.03 and 0.008, respectively). This excess of genes under balancing selection in this functional category remains statistically significant even if the IL1F5 and IL1F10 signatures in Europeans are considered to be produced by the same selective event (P = 0.03). Remarkably, all genes in that category with signatures of balancing selection in both Europeans and Africans are included in the IL1 or TNF family (Figure 1). Moreover, MEFV also shows an excess of intermediate-frequency variants in one of the neutrality tests in Europeans, and signatures of balancing selection have been previously reported [12, 13]. The complement proteins also show a significant enrichment for genes under balancing selection in Africans (Fisher's exact test P = 0.01). Finally, the receptors are also overrepresented among the genes showing signatures of positive or balancing selection, suggesting adaptive selection events. In this case, this tendency is only observed in Europeans (Fisher's exact test P = 0.04), and is due to an excess of extracellular receptors (Fisher's exact test P = 0.02).
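The class-enrichment comparison above corresponds to a Fisher's exact test on a two-by-two table. A minimal sketch follows, assuming a one-sided test for an excess of selected genes in the class (the text does not state whether the test was one- or two-sided). The function name and the counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_enrichment(in_class_sel, in_class_not, rest_sel, rest_not):
    """One-sided Fisher's exact test for enrichment.

    Returns the hypergeometric probability of observing at least
    `in_class_sel` selected genes in the class, given the table margins.
    """
    n_class = in_class_sel + in_class_not
    n_selected = in_class_sel + rest_sel
    total = n_class + rest_sel + rest_not
    upper = min(n_class, n_selected)
    favourable = sum(
        comb(n_class, k) * comb(total - n_class, n_selected - k)
        for k in range(in_class_sel, upper + 1)
    )
    return favourable / comb(total, n_selected)


# Hypothetical table: 4 genes in the class (3 under selection),
# 16 remaining genes (2 under selection).
toy_p_enrichment = fisher_exact_enrichment(3, 1, 2, 14)  # approximately 0.032
```

An exact test is appropriate here because several cells of such tables contain very small counts, where a chi-square approximation would be unreliable.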
A high number of the genes analyzed here have shown signs of adaptive selection. Even if the data are not directly comparable to whole-genome scans looking at the footprint of positive selection with SNP data, the five studies published so far [2–4, 6, 39] find from 444 to 1,030 genomic regions as candidates for positive selection, which represents a very small fraction of the genome (< 4%). Thus our results support previous evidence that immunity genes have likely been preferential targets of recent adaptive selection in human populations. The excess of genes under adaptive selection is still more striking when balancing selection is considered. Balancing selection has traditionally been considered to have acted less frequently on human populations, and the number of reported cases is smaller in comparison to positive selection. However, this difference could be at least partially due to the intrinsic difficulties of its detection.
The scientific literature contains several examples of host-pathogen interaction genes with evidence of balancing selection. In addition to some well-known cases such as the human major histocompatibility complex, several more recent works have described the action of balancing selection on additional immunity genes [10–14, 41–43] and on blood group antigen genes that are supposed to act as molecular targets for pathogens [12, 13, 43–45]. The first genome scan for signatures of balancing selection has also confirmed an enrichment of genes related to the immune system and host-pathogen interaction.
The high number of immunity genes that are under balancing selection is presumably due to a beneficial effect in infections that is counterbalanced by deleterious effects in autoinflammatory and autoimmune conditions (Figure 3). The IL-1 family is overrepresented (most abundantly and strongly under balancing selection, with 6 genes represented in both European and African populations). Additionally, signatures of balancing selection have also been reported in the Mediterranean Fever gene, MEFV, another member of this family [12, 13]. There are also members of the complement (three genes) and the TNF family (TNFβ/LTα) showing signatures of balancing selection. While their role in host defense is well documented, these genes encode the main proinflammatory mediators of the two most important classes of sterile inflammation disorders: the autoinflammatory diseases (e.g. Crohn's disease, gout, autoinflammatory syndromes, Behcet's disease), in which inflammation is mediated by the IL-1 family, and the autoimmune diseases (e.g. rheumatoid arthritis, systemic lupus erythematosus, type 1 diabetes), in which the TNF family and complement play a central role. Alternatively, variants found at genes under balancing selection could be conferring susceptibility to different pathogens, and their effect on autoimmunity would be a consequence of past human adaptations, as suggested recently for IFIH1, an innate immunity gene involved in resistance to viruses. It is noteworthy that some of the genes found to be under balancing selection are exactly those encoding the major mediators of human autoinflammatory and autoimmune pathologies.
The dissection of the innate immune system presented here allows other important biological consequences to be inferred. Selection signatures at receptor genes support the greater evolutionary flexibility described in extracellular TLRs compared to the intracellular ones [38, 47]. One important question that may also be asked is which major class of bacteria could have exerted the evolutionary pressures observed here. As the innate immune system is relatively non-specific, the genetic information obtained in the present study cannot refer to specific species or even families, but rather to broad groups of pathogens: both Gram-positive bacteria (e.g. Staphylococci and Streptococci, mycobacteria) and Gram-negative bacteria (Salmonella and other Enterobacteriaceae, Yersinia pestis) contain pathogens that have had a major impact on human populations throughout history. Whether both groups of bacteria exerted evolutionary pressures cannot be determined from the selection profile of effector genes such as defensins or cytokines, as these proteins are generally necessary for host defense against both Gram-positive and Gram-negative bacteria. In contrast, a certain level of specificity of the innate immune response can be seen at the recognition step: Gram-positive bacteria are recognized mainly by the TLR2/TLR1/TLR6/TLR10 cluster, while Gram-negative bacteria recognition is heavily reliant on the TLR4 pathway. In our study we found signatures of selection in genes from both the TLR2 cluster (TLR2, TLR6, TLR10) and the TLR4 pathway (LY64, LY96, CD14), suggesting that both Gram-negative and Gram-positive bacteria have exerted pressure on innate immunity genes. In addition, TLR9, a pathogen recognition receptor recognizing unmethylated DNA from both Gram-negative and Gram-positive bacteria, also showed signatures of selection in Europeans.
The exclusive positive selection of the TLR9 and TLR10 (or TLR1) pathways in Europeans may reflect more recent responses, possibly related to the introduction of agriculture and zoonotic diseases (e.g. Brucella, Coxiella) or to the major epidemics of the medieval period (e.g. Yersinia pestis causing pandemics including pneumonic, septicaemic and bubonic plague). However, one also has to be aware that some of these receptors also have a role in the recognition of other types of pathogens (e.g. TLR9 for recognition of DNA from certain viruses or parasites, or TLR2/TLR4 also recognizing malaria and fungal structures). Although these are secondary recognition pathways for these microorganisms, one cannot exclude evolutionary pressures from non-bacterial pathogens on some of the genes analyzed here.
Overall, our results suggest that the innate immune system has had enough plasticity to play an important role in the adaptation of modern humans to new environments. Analysis of the protein-protein interaction network suggests a structural and functional constraint on genes that occupy a central position in the innate immunity network. This network is organized into a bow-tie structure, consisting of a large number of receptors, a small central number of adaptor genes (the bow-tie knot) and a large number of effectors. The bow-tie structure is already known to characterize many metabolic and system networks, and it is recognized as a topological feature that provides robustness to systems [33, 50]. It also has the advantage of making robustness compatible with a high capacity to evolve (or evolvability): the knot provides robustness to the system by easily accommodating perturbations, while great variability is allowed at the edges of the bow tie, making possible the generation of new functionalities. While the core of the bow-tie structure is highly conserved, signatures of adaptation are found at the bow-tie edges. The amount of selection acting on each gene therefore appears highly dependent on its position in the network structure. This shows the strength of the constraint that network topology imposes on the evolution of individual genes, a constraint that helps to explain the adaptive landscape of innate immunity in humans.
Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, et al: Targets of balancing selection in the human genome. Mol Biol Evol. 2009, 26 (12): 2755-2764. 10.1093/molbev/msp190.
Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al: Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009, 19 (5): 826-837. 10.1101/gr.087577.108.
Tang K, Thornton KR, Stoneking M: A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 2007, 5 (7): e171-10.1371/journal.pbio.0050171.
Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4 (3): e72-10.1371/journal.pbio.0040072.
Wang ET, Kodama G, Baldi P, Moyzis RK: Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA. 2006, 103 (1): 135-140. 10.1073/pnas.0509691102.
Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R: Localizing recent adaptive evolution in the human genome. PLoS Genet. 2007, 3 (6): e90-10.1371/journal.pgen.0030090.
Barreiro LB, Quintana-Murci L: From evolutionary genetics to human immunology: how selection shapes host defence genes. Nat Rev Genet. 2010, 11 (1): 17-30. 10.1038/nrg2698.
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312 (5780): 1614-1620. 10.1126/science.1124309.
Barreiro LB, Patin E, Neyrolles O, Cann HM, Gicquel B, Quintana-Murci L: The heritage of pathogen pressures and ancient demography in the human innate-immunity CD209/CD209L region. Am J Hum Genet. 2005, 77 (5): 869-886. 10.1086/497613.
Cagliani R, Fumagalli M, Riva S, Pozzoli U, Comi GP, Menozzi G, Bresolin N, Sironi M: The signature of long-standing balancing selection at the human defensin beta-1 promoter. Genome Biol. 2008, 9 (9): R143. 10.1186/gb-2008-9-9-r143.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramirez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al: Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 2008, 181 (2): 1315-1322.
Fumagalli M, Cagliani R, Pozzoli U, Riva S, Comi GP, Menozzi G, Bresolin N, Sironi M: A population genetics study of the Familial Mediterranean Fever gene: evidence of balancing selection under an overdominance regime. Genes Immun. 2009
Fumagalli M, Pozzoli U, Cagliani R, Comi GP, Riva S, Clerici M, Bresolin N, Sironi M: Parasites represent a major selective force for interleukin genes and shape the genetic predisposition to autoimmune conditions. J Exp Med. 2009, 206 (6): 1395-1408. 10.1084/jem.20082779.
Cagliani R, Fumagalli M, Biasin M, Piacentini L, Riva S, Pozzoli U, Bonaglia MC, Bresolin N, Clerici M, Sironi M: Long-term balancing selection maintains trans-specific polymorphisms in the human TRIM5 gene. Hum Genet. 2010
Fumagalli M, Cagliani R, Riva S, Pozzoli U, Biasin M, Piacentini L, Comi GP, Bresolin N, Clerici M, Sironi M: Population genetics of IFIH1: ancient population structure, local selection and implications for susceptibility to type 1 diabetes. Mol Biol Evol. 2010
Alvarez-Ponce D, Aguade M, Rozas J: Network-level molecular evolutionary analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome Res. 2009, 19 (2): 234-242.
Kim PM, Korbel JO, Gerstein MB: Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci USA. 2007, 104 (51): 20274-20279. 10.1073/pnas.0710183104.
Fumagalli M, Pozzoli U, Cagliani R, Comi GP, Bresolin N, Clerici M, Sironi M: Genome-wide identification of susceptibility alleles for viral infections through a population genetics approach. PLoS Genet. 2010, 6 (2): e1000849. 10.1371/journal.pgen.1000849.
Pozzoli U, Fumagalli M, Cagliani R, Comi GP, Bresolin N, Clerici M, Sironi M: The role of protozoa-driven selection in shaping human genetic variability. Trends Genet. 2010, 26 (3): 95-99. 10.1016/j.tig.2009.12.010.
Lazarus R, Vercelli D, Palmer LJ, Klimecki WJ, Silverman EK, Richter B, Riva A, Ramoni M, Martinez FD, Weiss ST, et al: Single nucleotide polymorphisms in innate immunity genes: abundant variation and potential role in complex human disease. Immunol Rev. 2002, 190: 9-25. 10.1034/j.1600-065X.2002.19002.x.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105 (2): 437-460.
Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123 (3): 585-595.
Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics. 1993, 133 (3): 693-709.
Zeng K, Fu YX, Shi S, Wu CI: Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics. 2006, 174 (3): 1431-1439. 10.1534/genetics.106.061432.
Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25 (11): 1451-1452. 10.1093/bioinformatics/btp187.
Ramos-Onsins SE, Puerma E, Balana-Alcaide D, Salguero D, Aguade M: Multilocus analysis of variation using a large empirical data set: phenylpropanoid pathway genes in Arabidopsis thaliana. Mol Ecol. 2008, 17 (5): 1211-1223. 10.1111/j.1365-294X.2007.03633.x.
Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D: Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005, 15 (11): 1576-1583. 10.1101/gr.3709305.
Gao J, Ade AS, Tarcea VG, Weymouth TE, Mirel BR, Jagadish HV, States DJ: Integrating and annotating the interactome using the MiMI plugin for cytoscape. Bioinformatics. 2009, 25 (1): 137-138. 10.1093/bioinformatics/btn501.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.
Assenov Y, Ramirez F, Schelhorn SE, Lengauer T, Albrecht M: Computing topological parameters of biological networks. Bioinformatics. 2008, 24 (2): 282-284. 10.1093/bioinformatics/btm554.
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296 (5568): 750-752. 10.1126/science.1068696.
Csete M, Doyle J: Bow ties, metabolism and disease. Trends Biotechnol. 2004, 22 (9): 446-450. 10.1016/j.tibtech.2004.07.007.
Ma H, Zeng AP: Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 2003, 19 (2): 270-277. 10.1093/bioinformatics/19.2.270.
Ma HW, Zeng AP: The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics. 2003, 19 (11): 1423-1430. 10.1093/bioinformatics/btg177.
Oda K, Kitano H: A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol. 2006, 2: 2006 0015-
Zhai W, Nielsen R, Slatkin M: An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 2009, 26 (2): 273-283. 10.1093/molbev/msn231.
Barreiro LB, Ben-Ali M, Quach H, Laval G, Patin E, Pickrell JK, Bouchier C, Tichit M, Neyrolles O, Gicquel B, et al: Evolutionary dynamics of human Toll-like receptors and their different contributions to host defense. PLoS Genet. 2009, 5 (7): e1000562. 10.1371/journal.pgen.1000562.
Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA: Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 2005, 15 (11): 1553-1565. 10.1101/gr.4326505.
Enard D, Depaulis F, Roest Crollius H: Human and non-human primate genomes share hotspots of positive selection. PLoS Genet. 2010, 6 (2): e1000840. 10.1371/journal.pgen.1000840.
Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin SQ, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al: Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 2010, 6 (10): e1001157. 10.1371/journal.pgen.1001157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al: Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 2010, 19 (23): 4705-4714. 10.1093/hmg/ddq401.
Calafell F, Roubinet F, Ramirez-Soriano A, Saitou N, Bertranpetit J, Blancher A: Evolutionary dynamics of the human ABO gene. Hum Genet. 2008, 124 (2): 123-135. 10.1007/s00439-008-0530-8.
Ferrer-Admetlla A, Sikora M, Laayouni H, Esteve A, Roubinet F, Blancher A, Calafell F, Bertranpetit J, Casals F: A natural history of FUT2 polymorphism in humans. Mol Biol Evol. 2009, 26 (9): 1993-2003. 10.1093/molbev/msp108.
Fumagalli M, Cagliani R, Pozzoli U, Riva S, Comi GP, Menozzi G, Bresolin N, Sironi M: Widespread balancing selection and pathogen-driven selection at blood group antigen genes. Genome Res. 2008
Dinarello CA: Immunological and inflammatory functions of the interleukin-1 family. Annu Rev Immunol. 2009, 27: 519-550. 10.1146/annurev.immunol.021908.132612.
Mukherjee S, Sarkar-Roy N, Wagener DK, Majumder PP: Signatures of natural selection are not uniform across genes of innate immune system, but purifying selection is the dominant signature. Proc Natl Acad Sci USA. 2009, 106 (17): 7073-7078. 10.1073/pnas.0811357106.
Akira S, Uematsu S, Takeuchi O: Pathogen recognition and innate immunity. Cell. 2006, 124 (4): 783-801. 10.1016/j.cell.2006.02.015.
Krieg AM: CpG motifs in bacterial DNA and their immune effects. Annu Rev Immunol. 2002, 20: 709-760. 10.1146/annurev.immunol.20.100301.064842.
Kitano H: Biological robustness. Nat Rev Genet. 2004, 5 (11): 826-837.
Acknowledgements and Funding
We thank Giovanni Dall'Olio for his helpful comments. Support for this research came from the Spanish Ministry of Innovation and Research grant SAF-2007-63171 to JB. Additional support came from the Direcció General de Recerca of the Generalitat de Catalunya (Grup de Recerca Consolidat 2005SGR/00608) and the National Institute for Bioinformatics (http://www.inab.org). A.M. is supported by Red HERACLES (Instituto de Salud Carlos III). R.L. was supported by NIH grants HL065899, HL083069, HG004909 and HG003646. M.G.N. was supported by a Vici grant of the Netherlands Organization for Scientific Research. M.S. was supported by a PhD fellowship from the Programa de becas FPU del Ministerio de Educación y Ciencia, Spain (AP2005-3982).
Authors' contributions
FCas and JB designed the study. AM and MGN designed the functional network categories. FCas, MS, HL and LM performed the analyses. RL assisted with the processing of sequencing data. FCas, MS, HL, LM, RL, FCal, PA and JB analyzed the data. FCas, MS, AM, MGN and JB elaborated the discussion. FCas, MS, MGN and JB wrote the manuscript with important contributions from AM. All the authors participated in critically reviewing the paper and approved the final version of the manuscript.