Different level of population differentiation among human genes

Background: During the colonization of the world, after dispersal out of Africa, modern humans encountered changing environments, and substantial phenotypic variation, involving diverse behaviors, lifestyles and cultures, arose among the different modern human populations.
Results: Here, we study the level of population differentiation of human genes among different populations. Intriguingly, genes involved in osteoblast development were identified as being enriched with higher FST SNPs, a result consistent with the proposed role of the skeletal system in accounting for variation among human populations. Genes involved in the development of hair follicles, where hair is produced, were also found to have higher levels of population differentiation, consistent with hair morphology being a distinctive trait among human populations. Other genes that showed higher levels of population differentiation include those involved in pigmentation, spermatid, nervous system and organ development, and some metabolic pathways, but few involved with the immune system. Disease-related genes show an excess of SNPs with lower levels of population differentiation, probably due to purifying selection. Surprisingly, we find that Mendelian-disease genes appear to have a significant excess of SNPs with high levels of population differentiation, possibly because the incidence of and susceptibility to these diseases differ among populations. As expected, microRNA regulated genes show lower levels of population differentiation due to purifying selection.
Conclusion: Our analysis demonstrates different levels of population differentiation among human populations for different gene groups.


Background
After dispersal from Africa, and during the colonization of the world, humans evolved substantial phenotypic variation, including variation in skin, hair, and eye color, body mass, height, diet, drug metabolism, and susceptibility and resistance to disease. Efforts to reveal the genetic bases of these variations should provide important insight into the history of human evolution, gene function, and the mechanisms of disease [1,2]. Indeed, with the advent of large-scale comparative genomic and human polymorphism data, a flood of studies has identified many candidate genes and genomic regions accounting for the observed phenotypic characters [2]. However, the evolutionary forces (positive selection, balancing selection, purifying selection, or neutral evolution) driving the variation of these phenotypic traits remain largely unknown.
In general, population differentiation under neutral evolution is mostly influenced by demographic history; however, adaptation to a local environment, driven by positive selection, will increase the level of population differentiation [3]. In contrast, negative and balancing selection tend to reduce population differentiation [3]. Accordingly, evaluating the level of population differentiation across the human genome should be informative for identifying the genetic basis of the phenotypic differences observed among human populations.

Results and Discussion
Here, we evaluated the level of population differentiation for human genes on autosomal chromosomes among three populations (African, European and East Asian), based on the HapMap data (Phase II) [4], using the parameter FST according to methods described previously [3,5]. A previous study reported a higher level of population differentiation in genic regions than in non-genic regions of the genome [6]. However, in our analysis, several chromosomes, including 5, 6, 8, 11, 13, and 20, did not show this pattern; that is, their genic regions did not have an excess of SNPs with a higher FST (≥0.6) (Figure S1 in Additional file 1).

Functional significance of genes with higher levels of population differentiation
Since an analysis of categories that contain only a few genes will have low statistical power, here we only summarize categories that contain at least 10 genes. Figure 1 summarizes the biological processes that are enriched with higher FST SNPs with a significant P value of 10⁻¹⁰ or lower (see Methods), and their λ values, with λ being the ratio of the proportion of higher FST SNPs (≥0.6) in the analyzed category to the proportion of higher FST SNPs in genome-wide genes (which is 0.0049). The categories listed in Figure 1 include a large number involved with organ development, such as pancreas, lung, and heart development. For example, GO:0021983, pituitary gland development, is enriched with high FST SNPs and has the highest λ value, 19.37. The pituitary gland produces and secretes many hormones, some of which stimulate other glands to produce other types of hormones; this organ thus controls many biochemical processes, e.g. growth, homeostasis, stress response, reproduction, and metabolism [7,8]. Categories related to these processes similarly demonstrate a high level of population differentiation, such as developmental growth (GO:0048589), reproduction-related categories (GO:0030317, GO:0007286, GO:0007276), and several metabolic pathways (GO:0006641, GO:0042593, GO:0042632) (see following text and Figure 1).
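As a worked illustration of the ratio defined above, the reported λ can be converted back into a proportion. The symbols p_cat and p_genome are introduced here only for notation and do not appear in the original text.

```latex
\lambda = \frac{p_{\mathrm{cat}}}{p_{\mathrm{genome}}}, \qquad p_{\mathrm{genome}} = 0.0049
\\[4pt]
\lambda = 19.37 \;\Longrightarrow\; p_{\mathrm{cat}} = 19.37 \times 0.0049 \approx 0.095
```

Under this reading, roughly 9.5% of the SNPs in pituitary gland development genes would have FST ≥ 0.6, compared with about 0.49% in genome-wide genes.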
An intriguing observation is that osteoblast development is significantly enriched in high FST SNPs (λ = 12.28, P = 4.92E-88 after multiple testing correction). Osteoblasts are mononucleate cells that are responsible for bone formation. Modern humans demonstrate substantial phenotypic variation, which to a large extent is manifested in the skeletal system, such as height, body mass, body mineral density, and craniofacial differences. Indeed, evidence indicates that the human skeletal system has evolved rapidly since the advent of agriculture [9], and our recent study concluded that the high levels of population differentiation of skeletal genes among human populations were driven by positive selection [10].
Another interesting category is hair follicle development, which also showed a higher level of population differentiation (GO:0001942, λ = 4.09, P = 2.07E-08 after multiple testing correction). Hair is produced by hair follicles. As with the skeletal system, hair morphology traits, including water swelling diameter and section, fiber shape, mechanical properties, combability and hair moisture, are distinctive among human populations [11]. Previous studies have identified some genes involved in hair follicle development that have undergone recent positive selection, as detected by the long range haplotype homozygosity test, such as EDAR and EDA2R [12,13]. These studies, together with our evidence of higher population differentiation in the genes involved in hair follicle development, support a hypothesis of adaptive evolution accounting for the diversification of human hair.
Immunity-related genes, however, although a common target of positive selection [2,15,16], appear in only a small number of the categories with a higher proportion of higher FST SNPs. This observation is probably attributable to the fact that many genes of the immune system evolve under balancing selection in human populations because of heterozygote advantage, which would reduce the level of population differentiation [17,18].
Table S1 in Additional file 2 and Table S2 in Additional file 3 summarize the GO categories in cellular component and molecular function, respectively, with an enrichment of higher FST SNPs.
In addition, to discern which population(s) contribute more to the pattern, we generated three pairwise data sets. For the genes in the biological processes described in Figure 1, the three data sets demonstrate a consistent pattern of a significantly higher proportion of higher FST SNPs compared with that in genome-wide genes (Figure 2), which suggests that the population differentiation is commonly present between each pair of populations.
Population differentiation under neutral evolution is mostly influenced by demographic history (that is, genetic drift and gene flow), which can generate patterns similar to those produced by biological factors such as natural selection. However, demographic history tends to influence all loci in the genome equally, whereas natural selection acts only on a single gene or a group of functionally related genes. Relative to the proportion of higher FST SNPs in genome-wide genes, we identify several groups of functionally related genes enriched with high FST SNPs; this enrichment is most likely driven by positive natural selection, although the confounding effect of demographic history cannot be excluded entirely.

Population differentiation in disease-related genes
Studies of the pattern of molecular evolution of human disease-related genes will provide insight into the origin, maintenance and mechanism of disease [19]. Previous reports suggested that disease-related genes tend to evolve under purifying selection, based on the comparison of non-synonymous to synonymous substitution rates [19][20][21]. Here, as expected, we found that disease-related genes (including Mendelian disease genes and complex disease genes) demonstrate a significant excess of SNPs with lower FST (≤0.05) relative to other genes (χ² = 23.16, P = 1.49E-06 for the OMIM gene panel; χ² = 193.78, P = 4.76E-44 for the complex disease gene panel; Figure S2 in Additional file 1). These disease genes demonstrate an excess of lower FST SNPs in the lower frequency bins but not in the high frequency bins (Figure 3), suggesting that negative selection, rather than balancing selection, operated on these genes. Surprisingly, higher FST (≥0.6) SNPs are significantly enriched in Mendelian disease genes (OMIM) relative to other genes (χ² = 30.47, P = 3.39E-08), with three MAF bins demonstrating statistical significance (Figure 4). These higher FST SNPs are probably under positive selection. This pattern, however, was not observed in complex disease genes and appears inconsistent with the previous study by Blekhman et al. (2008) [20]. Blekhman et al. (2008) found that Mendelian-disease genes appear to be under widespread purifying selection but that genes that influence complex disease risk show lower levels of evolutionary conservation, as assessed by the ratio of nonsynonymous to synonymous substitutions (Dn/Ds), possibly because they were targeted by both purifying and positive selection. The difference in results is probably attributable to the different methods used to assess sequence evolution: the Dn/Ds method measures changes over a long time scale (i.e., between human and other species), while FST measures recent evolution (i.e., since the separation of modern human populations). The incidence of and susceptibility to some Mendelian diseases might differ more strongly among modern human populations.

Lower levels of population differentiation in microRNA targeted genes
The regulation of gene expression is crucial to the development of an organism, and it has been increasingly recognized that a remarkable fraction of this regulation is carried out by microRNAs (miRNAs) [22,23]. miRNAs are a group of ~23 nt endogenous RNAs, important for a diverse range of biological functions, that direct the posttranscriptional repression of mRNAs by cleavage or translational repression [22,23]. Evidence has shown that negative selection operates on miRNA regulated genes [24]. Here, we observed that microRNA targeted genes present a significant excess of lower FST (≤0.05) SNPs (χ² = 29.76, P = 4.90E-08), and significantly fewer high FST (≥0.6) SNPs (χ² = 37.61, P = 8.63E-10), relative to other genes (Figure S3 in Additional file 1). The lower FST SNPs are mainly restricted to the lower minor allele frequency bins, and are not found in excess in the intermediate frequency bin (Figure 5), suggesting that widespread purifying selection operated on miRNA targeted genes.
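The frequency-bin comparisons used above for disease genes and miRNA targeted genes can be sketched as follows. This is only an illustration: the bin edges, the function names, and the use of a single MAF value per SNP are assumptions made here, since the text does not state how the bins were defined.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical MAF bin edges; the bins actually used are not given in the text.
MAF_EDGES = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)


def maf_bin(maf, edges=MAF_EDGES):
    """Return the index of the half-open bin [edges[i], edges[i+1]) holding maf.

    The upper edge itself (MAF = 0.5) is placed in the last bin.
    """
    index = bisect_right(edges, maf) - 1
    return min(max(index, 0), len(edges) - 2)


def low_fst_fraction_by_bin(snps, low_cutoff=0.05, edges=MAF_EDGES):
    """Fraction of SNPs with FST <= low_cutoff in each occupied MAF bin.

    snps is an iterable of (maf, fst) pairs for one gene set.
    """
    totals = defaultdict(int)
    lows = defaultdict(int)
    for maf, fst in snps:
        b = maf_bin(maf, edges)
        totals[b] += 1
        if fst <= low_cutoff:
            lows[b] += 1
    return {b: lows[b] / totals[b] for b in sorted(totals)}
```

Running `low_fst_fraction_by_bin` separately on a target gene set and on the remaining genes gives per-bin fractions that can then be compared. An excess confined to the low-MAF bins is the pattern the text interprets as purifying selection.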

Conclusions
In this study, we find that genes involved in osteoblast development, hair follicle development, pigmentation, spermatid, nervous system and organ development, and some metabolic pathways have higher levels of population differentiation. Surprisingly, we find that Mendelian-disease genes appear to have a significant excess of SNPs with high levels of population differentiation, possibly because the incidence of and susceptibility to these diseases differ among populations. As expected, microRNA regulated genes show lower levels of population differentiation due to purifying selection. Our analysis demonstrates different levels of population differentiation among human populations for different gene groups.

Methods
Since genes on the sex chromosomes show higher population differentiation than those on the autosomal chromosomes [3], we only analyzed data from the autosomal chromosomes. Allele frequency data for SNPs on autosomes were retrieved from HapMap Phase II (release 24, NCBI36) [4] for three populations: African (YRI panel, comprising 60 Yoruban individuals from Ibadan), European (CEU panel, comprising 60 Utah residents with ancestry from northern and western Europe) and East Asian (EA panels, comprising 45 Han Chinese (CHB) and 45 Japanese from Tokyo (JPT)).
To evaluate the degree of population differentiation, FST values of the polymorphic SNPs with minor allele frequencies ≥0.01 in at least one population were calculated as previously described [3,5]. Since negative values have no biological interpretation, these were set to 0. Protein coding genes on the human autosomal chromosomes, and their corresponding gene ontology (GO) terms in three categories (biological process, cellular component, and molecular function), were downloaded from Ensembl (http://www.ensembl.org version 54) by means of BioMart [25]. Each gene was extended 500 bp upstream of the 5'-terminus and downstream of the 3'-terminus to include all of its SNPs. χ² tests with one degree of freedom were used to test for the significance of the enrichment of SNPs with higher (≥0.6) FST values compared with the empirical data for genome-wide genes, based on 2 × 2 contingency tables constructed from the numbers of SNPs. For these analyses, Bonferroni correction was used for multiple testing. To better understand the enrichment, we calculated the parameter λ, the ratio of the proportion of higher FST SNPs in the analyzed category to that in the genome-wide genes. λ values significantly higher than 1 indicate higher population differentiation of genes in the category among human populations.
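The SNP filtering and FST step can be sketched as below. The estimator actually used is the one described in references [3,5] and is not reproduced in the text, so this sketch substitutes a simple heterozygosity-based FST, (H_T − H_S) / H_T, as a stand-in. The function names are hypothetical, and the sample sizes (60, 60, 90) are taken from the panel descriptions above.

```python
def minor_allele_frequency(p):
    """Minor allele frequency given the frequency p of one allele."""
    return min(p, 1.0 - p)


def passes_maf_filter(freqs, threshold=0.01):
    """True if the SNP has MAF >= threshold in at least one population."""
    return any(minor_allele_frequency(p) >= threshold for p in freqs)


def fst_simple(freqs, sizes):
    """Heterozygosity-based FST for one biallelic SNP (stand-in estimator).

    freqs: frequency of the same allele in each population.
    sizes: number of sampled individuals per population, used as weights.
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic across all populations
    h_within = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))
    fst = (h_total - h_within) / h_total
    # Mirror the rule in the text: negative values are set to 0.
    return max(fst, 0.0)


# Assumed weights: YRI = 60, CEU = 60, East Asian = 45 + 45 individuals.
PANEL_SIZES = (60, 60, 90)
```

With this stand-in, negative values arise only from floating-point rounding, whereas sample-size-corrected estimators can return genuinely negative estimates, which is presumably why the clamp to 0 is needed in the original analysis.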
Complex disease genes were obtained from the Genetic Association Database (GAD) [26]. Human Mendelian disease genes (OMIM) were obtained from the study by Blekhman et al. (2008) [20]. Genes targeted by microRNAs were obtained from TargetScan (http://www.targetscan.org, release 5.1) [27][28][29]. For these genes, χ² tests with one degree of freedom were used to test the significance of an enrichment of SNPs with higher (≥0.6) FST values and lower (≤0.05) FST values, respectively, compared with other genes, based on 2 × 2 contingency tables constructed from the numbers of SNPs.
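The 2 × 2 contingency tests, the Bonferroni correction, and the λ ratio described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: a Pearson χ² statistic without continuity correction (the text does not say whether one was applied), a comparison set that is disjoint from the tested category, and hypothetical function names.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: tested gene set, comparison gene set.
    Columns: SNPs meeting the FST cutoff, SNPs not meeting it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, chi2_sf_1df(stat)


def enrichment(cat_hits, cat_total, ref_hits, ref_total, n_tests=1):
    """Return (lambda, chi-square, Bonferroni-adjusted P) for one category."""
    lam = (cat_hits / cat_total) / (ref_hits / ref_total)
    stat, p = chi2_2x2(cat_hits, cat_total - cat_hits,
                       ref_hits, ref_total - ref_hits)
    return lam, stat, min(1.0, p * n_tests)
```

As a consistency check, `chi2_sf_1df(23.16)` evaluates to roughly 1.5E-06, close to the P value reported above for the OMIM panel. For the GO analysis, `n_tests` would be the number of categories tested, a count the text does not give.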

Additional material
Additional file 1: Word file including Figure S1, Figure S2 and Figure S3.
Additional file 2: Word file including Table S1.
Additional file 3: Word file including Table S2.