Impact of constitutional copy number variants on biological pathway evolution
© Poptsova et al.; licensee BioMed Central Ltd. 2013
Received: 22 June 2012
Accepted: 18 January 2013
Published: 23 January 2013
Inherited Copy Number Variants (CNVs) can modulate the expression levels of individual genes. However, little is known about how CNVs alter biological pathways and how this varies across different populations. To trace potential evolutionary changes of well-described biological pathways, we jointly queried the genomes and the transcriptomes of a collection of individuals with Caucasian, Asian or Yoruban descent combining high-resolution array and sequencing data.
We implemented an enrichment analysis of pathways accounting for CNV and gene sizes and detected significant enrichment not only in signal transduction and extracellular biological processes, but also in metabolism pathways. After estimating CNV population differentiation (CNVs with different polymorphism frequencies across populations), we found that 22% of the pathways contain at least one gene that is proximal to a CNV (CNV-gene pair) that shows significant population differentiation. The majority of these CNV-gene pairs belong to signal transduction pathways and 6% of the CNV-gene pairs show statistical association between the copy number states and the transcript levels.
The analysis suggested possible examples of positive selection within individual populations including NF-kB, MAPK signaling pathways, and Alu/L1 retrotransposition factors. Altogether, our results suggest that constitutional CNVs may modulate subtle pathway changes through specific pathway enzymes, which may become fixed in some populations.
The study of human genome variation over the last several years has expanded from single nucleotide polymorphisms (SNPs) to structural variants (SVs) of which copy number variants (CNVs) constitute a distinct class [1, 2]. CNVs are unbalanced SVs that are present in polymorphic copy number states, and include deletions, duplications, or combinations thereof. Recently, a comprehensive population-wide CNV map of the human genome was published. The authors reported on a large set of common CNVs (Minor Allele Frequency (MAF) > 5%) greater than 1 kilobase (kb) in length and genotyped 4,978 of them on 450 individuals with ancestry in Europe, Africa or Asia (the HapMap collection, http://www.hapmap.org). In addition, a recent report from the 1000 Genomes Project included about 20,000 deletion polymorphisms (MAF > 1%) in the same HapMap populations of European, African and Asian ancestry, and genotypes were inferred for most of the variants. Further, Mills et al. characterized the majority of these deletions at sequence-level resolution and proposed a detailed map of SV hotspots formed by common mechanisms.
CNVs have now been implicated in multiple common human diseases (see [6, 7] for comprehensive reviews) including Crohn’s disease (20-kb deletion upstream of IRGM), osteoporosis (117-kb deletion of UGT2B17), body mass index (45-kb deletion upstream of NEGR1), and decreased susceptibility to HIV (higher copy number of CCL3L1). CNVs have also been shown to be associated with high risk of autism [12, 13], schizophrenia [14, 15] and cancer [16, 17].
The effect of SNPs on gene expression was recently described through extensive expression Quantitative Trait Locus (eQTL) analysis in European and African populations [18, 19]. In 2007, Stranger et al. performed transcript association analysis with both SNPs and CNVs, and highlighted similarities and differences between the two types of variation. Recently, two independent studies focused on the characterization of the potential impact of CNVs on gene transcripts by systematically querying CNV copy number states and paired gene expression levels obtained by RNA sequencing on a common set of 129 individuals. Our group queried more than 5,000 CNVs and showed that short CNVs (< 1 kb) and gains are more likely to have functional impact than larger CNVs and deletions, respectively. Schlattl et al. observed that CNVs exhibit a stronger correlation with expression than nearby SNPs and suggested a frequent causal role of CNVs in expression quantitative trait loci.
Several human polymorphism studies, mainly based on SNPs, have been undertaken to identify human genome regions that are under selection (a summary of genome-wide scans for positive selection can be found in Akey et al.; identification of areas of balancing selection and of classic selective sweeps in Andres et al. and Hernandez et al., respectively). The integrated map of positive selection from Akey et al. suggested that positive selection targets encompass ~14% of the human genome and ~23% of all UCSC RefSeq genes. In the context of SNPs, the signature of positive selection includes a high proportion of function-altering mutations, a site frequency spectrum with high frequency of the derived allele and low genetic diversity (as a signature of a complete or partial sweep), different allele frequencies between populations, and long haplotypes. In the context of CNVs, positive selection could be suggested by allele frequencies that significantly differ between populations and by linkage disequilibrium with SNPs under positive selection [27, 28]. Alternatively, population differentiation could be a result of genetic drift; however, the effect of drift is more pronounced in small populations. For instance, the salivary amylase gene (AMY1) copy number varies considerably across populations and correlates with dietary starch prevalence, which supports the hypothesis of positive selection acting on AMY1 copy number in high-starch diet populations. Another example is the complex evolutionary history of the polymorphic UGT2B17 gene, which shows high population differentiation (see Additional file 1). Recently, RHOXF2 has been reported as a fast-evolving homeobox gene in primates, with rapid evolution and copy number changes driven by Darwinian positive selection acting on the male reproductive system.
Based on the observation of close genetic distances at the nucleotide level between human and chimpanzee, the hypothesis that regulatory mutations account for the majority of biological differences was proposed as early as 1975 . More recent work suggests that gene expression levels can also serve as targets of selection . In the past, few studies focused on individual genes’ substitution rates (non-uniform across the genome) suggested that different types of selection act on genes depending on their position in the pathway [35, 36]. Here, we posit more broadly that pathways can be subjects for selection.
Existing models of pathway evolution, such as the Horowitz retrograde model , the chemistry-driven patchwork model , and others (see  for a review on pathway evolution theories) consider major modifications to a pathway chain, such as recruitment of a new enzyme (new node) or a whole pathway duplication, that eventually lead to creation of a new pathway. In addition to these major changes, we argue that more subtle ones might play an important role. Specifically, we hypothesize that fixed changes in gene product concentration levels - resulting from the changes in gene transcription levels - act as small adjustments while we assume that the number of pathway nodes and the pathway structure remain unchanged.
In this study, we performed an analysis of 491 well-characterized biological pathways seeking to determine how CNVs mapping to pathway genes impact transcript levels and how this effect differs across populations. Our results suggest that CNVs may modulate subtle changes in pathways at specific nodes, which may become fixed in certain populations. We propose a model of pathway evolution where population differences are tuned at finite nodal points. We discuss here the role of CNVs as potential modulators of biological pathways in human genome evolution.
CNVs and HapMap genotype data
The complete set of 11,700 CNV coordinates from  was considered in the pathways enrichment/depletion analysis. Based upon the availability of high-resolution data  at single sample level, we considered genotype calls for 4,978 CNVs and then used this annotated set for population differentiation analysis. The sample set included 180 CEU (Utah residents with ancestry from Northern and Western Europe), 180 YRI (Yoruba from Ibadan, Nigeria), 45 JPT (individuals from Tokyo, Japan), and 45 CHB (individuals from Beijing, China) from the International HapMap Consortium (http://www.hapmap.org). Throughout this study, Japanese and Chinese individuals are collectively referred to as Asian (ASN). All the data were downloaded from Conrad et al. . Classification of CNVs as gains and losses was taken exactly as inferred in .
Gene annotation and pathway information
RefSeq Gene annotation information was downloaded from the University of California-Santa Cruz (UCSC) Web browser  as NCBI build 36 (hg18). The complete list of pathways and corresponding genes was compiled from Kyoto Encyclopedia of Genes and Genomes (KEGG)  and Biocarta (http://www.biocarta.com).
F-statistics and CNV frequencies
Population differentiation was evaluated using the F-statistics that ranges between 0 (completely undifferentiated) and 1 (highly differentiated). F-statistics was calculated for each population pair by considering each CN genotype as an allele for diallelic CNVs as in McCarroll et al. The F-statistics (Fst) was evaluated as follows: $F_{st} = (H_t - H_s) / H_t$, with $H_t = 1 - \sum_i t_i^2$, $t_i = (x_i N_x + y_i N_y) / (N_x + N_y)$, and $H_s = \left[ (1 - \sum_i x_i^2) N_x + (1 - \sum_i y_i^2) N_y \right] / (N_x + N_y)$, where $x_i$ and $y_i$ are the population frequencies of allelic copy number i (i = A0, A1, A2, A3, A4 or >A4) in population X and Y, respectively, $N_x$ and $N_y$ denote the number of individuals in population X and Y, and $t_i$ is a weighted average of $x_i$ and $y_i$. This approach ignores the phase of the haplotype.
F-statistics cutoffs corresponding to the 5% and 1% tails of the distributions built for each population pair were considered. Specifically, the following values were calculated and applied: Fst=0.19 (P<0.05) and Fst=0.32 (P<0.01) for CEU-YRI; Fst=0.2 (P<0.05) and Fst=0.36 (P<0.01) for YRI-ASN; Fst=0.14 (P<0.05) and Fst=0.24 (P<0.01) for CEU-ASN.
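The Fst definition above can be illustrated with a minimal Python sketch. It assumes each population is given as a list of per-individual copy number calls; the function names, the toy genotype lists and the handling of the monomorphic case (Ht = 0) are illustrative assumptions and are not taken from the original analysis code.

```python
from collections import Counter


def state_frequencies(genotypes):
    """Return {copy_number_state: frequency} for one population."""
    n = len(genotypes)
    return {state: count / n for state, count in Counter(genotypes).items()}


def fst(pop_x, pop_y):
    """Pairwise Fst = (Ht - Hs) / Ht, treating each CN state as an allele."""
    nx, ny = len(pop_x), len(pop_y)
    fx, fy = state_frequencies(pop_x), state_frequencies(pop_y)
    states = set(fx) | set(fy)
    # t_i: sample-size-weighted average frequency of state i
    t = {s: (fx.get(s, 0.0) * nx + fy.get(s, 0.0) * ny) / (nx + ny) for s in states}
    ht = 1.0 - sum(v * v for v in t.values())
    hs = ((1.0 - sum(v * v for v in fx.values())) * nx
          + (1.0 - sum(v * v for v in fy.values())) * ny) / (nx + ny)
    if ht == 0.0:
        # Assumption: a site monomorphic in both populations is undifferentiated
        return 0.0
    return (ht - hs) / ht


# Hypothetical CN calls (illustrative values only)
ceu = [2] * 8 + [1] * 2
yri = [2] * 3 + [1] * 7
fst_ceu_yri = fst(ceu, yri)
```

A value computed this way would then be compared with the population-pair cutoffs listed above (for example 0.19 and 0.32 for CEU-YRI).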
Besides the formal assessment of population differentiation by means of the F-statistics, we considered the frequency of polymorphisms, referred to as CNV frequency, as the sum of the frequencies of all CN states that differ from the major CN state. CNV frequencies are utilized in the generation of frequency heatmaps. For consistency, we selected the CEU major CN state for each CNV. This choice does not affect the results as we focus on differences rather than absolute values.
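As a small illustration of the CNV frequency definition, the sketch below takes the CEU major CN state as the reference and reports, for each population, the fraction of individuals carrying any other state. The input format and the toy calls are assumptions made for the example only.

```python
from collections import Counter


def major_state(genotypes):
    """Most common copy number state among per-individual CN calls."""
    return Counter(genotypes).most_common(1)[0][0]


def cnv_frequency(genotypes, reference_state):
    """Summed frequency of all CN states that differ from the reference state."""
    return sum(1 for g in genotypes if g != reference_state) / len(genotypes)


# Hypothetical CN calls for one CNV in three populations (illustrative values only)
calls = {
    "CEU": [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
    "YRI": [2, 1, 1, 2, 1, 1, 2, 1, 1, 1],
    "ASN": [2, 2, 1, 2, 2, 1, 2, 2, 2, 2],
}
ref = major_state(calls["CEU"])  # CEU major state used as reference for all populations
row = {pop: cnv_frequency(g, ref) for pop, g in calls.items()}
```

Each such `row` would correspond to one line of a frequency heatmap, with one value per population.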
Size-dependent CNV enrichment/depletion analysis
The method proceeds as follows. First, for each chromosome, we compose a “pseudo transcriptome” made of concatenated genes (regions from transcription start to transcription end positions) with flanking regions (Figure 1B). For simplicity, for genes with multiple isoforms, the coordinates corresponding to the widest span are considered. We then define a ‘gene area’ as the area that spans the gene and its upstream and downstream flanks. Second, we refine the CNV set by retaining only CNVs that overlap with gene areas. In case of partial overlap, only the CNV segment inside the gene area is retained (Figure 1B). Third, the null distribution of enrichment is constructed by permuting this new set of truncated CNVs on the pseudo transcriptome while maintaining the number and size of CNVs per chromosome. Fourth, we define simple rules to count a CNV-gene overlap, as illustrated in Figure 1C: the gene area is entirely within a CNV, or the gene spans at least 50% of a CNV. Finally, for each pathway, the p-value is calculated by comparing the observed number of CNV-gene overlaps with the empirical null distribution constructed by the permutations.
It is important to mention a few aspects that were not considered in the implementation of the null distribution. First, the available information on the genomic distribution of CNVs is limited by the use of array-based platforms. Second, paralogous genes, likely relevant in the context of segmental duplications where CNVs are significantly enriched, were not considered. More refined approaches should eventually address these limitations.
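The permutation scheme can be sketched in a few lines of Python. This is a simplified illustration and not the released Size Dependent Enrichment Test: it assumes a single chromosome whose pseudo transcriptome is a coordinate range of length `length`, places each CNV uniformly at random while preserving its size, reads the 50% rule as "the overlap covers at least half of the CNV", and uses an add-one empirical p-value. All names and coordinates are hypothetical.

```python
import random


def hits(gene, cnv):
    """Overlap rule: the gene area lies entirely within the CNV,
    or the overlap covers at least 50% of the CNV (assumed reading of the rule)."""
    g_start, g_end = gene
    c_start, c_end = cnv
    overlap = min(g_end, c_end) - max(g_start, c_start)
    if overlap <= 0:
        return False
    inside = c_start <= g_start and g_end <= c_end
    return inside or overlap >= 0.5 * (c_end - c_start)


def count_hit_genes(genes, cnvs):
    """Number of gene areas hit by at least one CNV."""
    return sum(any(hits(g, c) for c in cnvs) for g in genes)


def enrichment_pvalue(pathway_genes, cnvs, length, n_perm=1000, seed=0):
    """Empirical one-sided p-value for enrichment of CNV hits in a pathway."""
    rng = random.Random(seed)
    observed = count_hit_genes(pathway_genes, cnvs)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in cnvs:
            size = end - start
            new_start = rng.randint(0, length - size)  # keep the CNV size fixed
            shuffled.append((new_start, new_start + size))
        if count_hit_genes(pathway_genes, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical gene areas of one pathway and truncated CNVs on a 1 Mb pseudo transcriptome
pathway_genes = [(100_000, 120_000), (500_000, 530_000)]
cnvs = [(105_000, 110_000), (495_000, 540_000), (800_000, 805_000)]
observed, p_value = enrichment_pvalue(pathway_genes, cnvs, length=1_000_000)
```

Testing for depletion would follow the same pattern with the comparison reversed (permuted count less than or equal to the observed count).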
The Size Dependent Enrichment Test executable and source code is downloadable at: http://demichelislab.unitn.it/tools/SizeEnrichmentTest.tar.gz.
Graphical representation of pathway related to CNV frequencies across populations
For each pathway, we considered the subset of genes overlapping with CNVs based on the gene areas (defined above) and represented it with the corresponding set of CNV-gene pairs. Based upon the systematic analyses of short- and long-range effects of CNVs on gene transcript levels [3, 20, 21], we considered 10 kb and 1 Mb flanks and composed two CNV-gene frequency maps. The 10 kb flanks map captures the majority of the significant associations between CNVs and genes (Stranger et al. 2007), and the 1 Mb flanks map includes longer-distance effects. We used 10 kb flanks for size-dependent enrichment/depletion analysis, and 1 Mb flanks for CNV association analysis with gene expression levels. For population differentiation analysis at the level of individual genes and of pathways we used 10 kb flanks to link genes and CNVs. For population differentiation analysis of genes and pathways linked to “functional” CNVs we used 1 Mb flanks. One CNV can overlap multiple genes and, vice versa, one gene can be overlapped (‘hit’) by several CNVs. Therefore the number of CNV-gene pairs is not necessarily bounded by the number of CNVs. Using 10 kb flanks we counted 5,282 genes paired with CNVs and using 1 Mb flanks we counted 21,378 genes paired with CNVs.
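To make the flank-based pairing concrete, the following sketch links CNVs to genes whenever a CNV overlaps the gene area (gene body plus flanks). The gene and CNV identifiers and coordinates are hypothetical, and the simple any-overlap criterion is an assumption chosen for illustration.

```python
def pair_cnvs_with_genes(genes, cnvs, flank):
    """Return (cnv_id, gene_name) pairs where the CNV overlaps the gene area,
    i.e. the transcribed region extended by `flank` bases on both sides."""
    pairs = []
    for name, (chrom, tx_start, tx_end) in genes.items():
        area_start, area_end = max(0, tx_start - flank), tx_end + flank
        for cnv_id, (c_chrom, c_start, c_end) in cnvs.items():
            if c_chrom == chrom and c_start < area_end and c_end > area_start:
                pairs.append((cnv_id, name))
    return pairs


# Hypothetical annotation (illustrative coordinates only)
genes = {
    "GENE_A": ("chr1", 100_000, 150_000),
    "GENE_B": ("chr1", 400_000, 420_000),
}
cnvs = {
    "CNV_1": ("chr1", 155_000, 158_000),
    "CNV_2": ("chr1", 900_000, 905_000),
    "CNV_3": ("chr2", 100_000, 110_000),
}
pairs_10kb = pair_cnvs_with_genes(genes, cnvs, flank=10_000)
pairs_1mb = pair_cnvs_with_genes(genes, cnvs, flank=1_000_000)
```

In this toy case the 10 kb map yields a single pair, whereas the 1 Mb map yields four pairs from only two CNVs, which shows why the number of pairs is not bounded by the number of CNVs.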
Heatmaps were used to graphically represent the CNV polymorphism frequency in each population, with CNV-gene pairs in rows, and populations in columns (YRI, CEU, and ASN). Hierarchical clustering (complete linkage with Euclidean distances) was applied to identify groups of similar CNV-gene frequencies across the three populations.
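A compact sketch of complete-linkage clustering with Euclidean distances is given below for heatmap rows of the form (YRI, CEU, ASN) frequencies. It is a naive standard-library implementation meant only to show the linkage rule; the stopping criterion (a fixed number of clusters) and the toy rows are assumptions, and a real analysis would more likely rely on an established statistics package.

```python
import math


def complete_linkage(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest (complete linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(a, b):
        return max(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical CNV-gene frequency rows: (YRI, CEU, ASN)
rows = [
    (0.90, 0.10, 0.40),
    (0.85, 0.15, 0.45),
    (0.10, 0.10, 0.10),
    (0.12, 0.08, 0.10),
]
groups = complete_linkage(rows, n_clusters=2)
```

Here the two strongly differentiated rows end up in one group and the two weakly differentiated rows in the other.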
Gene expression data and association analysis between CNVs and gene expression levels
RNA-seq data for 60 CEU and 69 YRI HapMap individuals were downloaded from  and . Sequencing data were originally generated using the Illumina Genome Analyzer II with reads of 36 bases and of 35 or 46 bases, respectively. YRI individual data were downloaded from http://eqtl.uchicago.edu; CEU individual raw data were obtained from ArrayExpress under accession number E-MTAB-197 and preprocessed with RSEQtools. Association analysis between gene expression data and CNVs was performed independently for CEU and YRI individuals by cis analysis applying 1 Mb flanks to each variant according to . Dosage effect (linear model) and allelic effect of transcript levels versus the copy number states were tested. Multiple hypothesis testing correction was performed by calculating the False Discovery Rate (Benjamini and Hochberg, 1995) on the p-values; 10% and 5% thresholds were applied. We refer to a CNV-gene pair with significant association between CNV states and gene expression levels as a functional CNV-gene pair.
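The False Discovery Rate step can be illustrated with a short sketch of the Benjamini-Hochberg adjustment. The function name and the example p-values are hypothetical; the sketch only shows how adjusted values would be compared with the 5% and 10% thresholds.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical association p-values for four CNV-gene pairs
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
functional_at_5pct = [i for i, q in enumerate(qvals) if q < 0.05]
```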
Mechanisms of CNV formation
Mechanisms of CNV formation were inferred as described in  and are divided into four major classes: variable number of tandem repeats (VNTR), non-allelic homologous recombination (NAHR), transposable element insertion (TEI) and non-homologous recombination (NHR). RepeatMasker annotation tracks were downloaded from the UCSC browser to infer VNTR and TEI mechanisms. A two-sample test of proportions was applied to assess significant differences in the proportions of formation mechanism classes for CNVs differentiated in populations.
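One common form of the two-sample test of proportions is the pooled z-test, sketched below. The text does not specify which variant was used, so the pooled standard error and the two-sided p-value are assumptions, and the counts in the example are invented for illustration.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample z-test for equality of proportions (two-sided).
    Assumes the pooled proportion is strictly between 0 and 1."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: CNVs of one formation class among differentiated vs. all CNVs
z, p_value = two_proportion_z_test(60, 100, 40, 100)
```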
CNVs enrichment/depletion analysis at the level of pathways
Table 1. List of pathways enriched for CNV-gene pairs through size-dependent analysis (10 kb flanks)
Antigen processing and presentation
Metabolism of xenobiotics by cytochrome P450
Type I diabetes mellitus
Pentose and glucuronate interconversions
The role of FYVE-finger proteins in vesicle transport
Phospholipase C d1 in phospholipid associated cell signaling
Activation of cAMP-dependent protein kinase, PKA
B Cell Receptor Complex
Enriched signaling pathways include Keratinocyte differentiation (16 genes out of 18, size-dependent enrichment test) and Phospholipase C d1 in phospholipid associated cell signaling (4 genes out of 5, size-dependent enrichment test) (see Table 1). The list of depleted pathways and the distribution of KEGG and Biocarta functional classes are given in Additional file 2: Table S1 and Additional file 3: Figure S1. The distribution of the depleted pathway classes is similar to that of all pathway classes in both databases.
Focusing on the pathways enriched for the presence of CNVs, we compared the overall proportions of four CNV formation mechanisms and found evidence for enrichment of NHR (1.12 fold, P-value= 0.00014) and depletion of TEI (0.63 fold, P-value= 0.0019) formation classes. With respect to CNV types, gains or losses, depletion was observed for losses (0.89 fold, P-value= 0.0055) and enrichment for gains (1.19 fold, P-value= 0.0050).
Population differentiation at the level of individual genes and of pathways
List of top ranked CNVs (gene overlap using 10 kb flanks) for population differentiation (at 1% significance level, see Methods for cutoff values in three population pairs)
The analysis of all pathway CNV-gene frequency maps (see Additional file 3: Figure S3) suggested a characteristic pattern. Generally, a small proportion (6% on average) of CNV-gene pairs per pathway shows strong population differentiation (at least one CNV-gene pair in the pathway has significant population differentiation), while the remaining CNV-gene pairs are weakly differentiated or undifferentiated, as can be observed for the Apoptosis, Fructose and Mannose metabolism and NF-kB signaling pathways (Figure 3C). A total of 107 pathways exhibited population differentiation in the sense that at least one CNV-gene pair is differentiated (P<0.05); among them, 41 pathways showed the highest population differentiation (P<0.01). This list includes the Purine metabolism (P<0.01), the Starch and sucrose metabolism (P<0.01), the Pentose and glucuronate interconversions (P<0.01), and other highly differentiated metabolic pathways (see the full list in Additional file 5: Table S3).
Association of CNVs with gene expression at the level of genes and pathways
The proximity between a CNV and a gene can be considered an indicator of an effect on transcription regulation. However, such an effect only occurs for a fraction of variants. To better assess the potential regulatory effect of CNVs on gene transcripts, we considered RNA sequencing gene expression data from 60 CEU and 69 YRI individuals (see Methods) as in . Overall, we detected significant CNV gene expression association (FDR < 5%) involving 54 functional CNV-gene pairs in CEU, of which 16 could be mapped to 24 pathways, and 37 functional CNV-gene pairs in YRI, of which 8 genes could be mapped to 14 pathways. Focusing on the set of functional CNV-gene pairs, we performed CNV enrichment/depletion analysis (using the hypergeometric test) for the set of 24 pathways in CEU and 14 pathways in YRI (Additional file 8: Table S7) and found a significant enrichment of functional CNV-gene pairs for one pathway, the Atrazine degradation pathway, in CEU and for 14, mostly metabolic, pathways in YRI. The latter include Glutathione metabolism, Starch and sucrose metabolism, Cyanoamino acid metabolism, Androgen and estrogen metabolism and others (see the full list in Additional file 8: Table S7).
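The hypergeometric enrichment test mentioned above can be written directly from its definition. In this sketch the parameterization (pathway pairs among all CNV-gene pairs, and functional pairs as the drawn sample) is an assumed framing for illustration, and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n_pathway, n_functional, n_total):
    """Upper-tail probability P(X >= k), where X is the number of pathway
    members among n_functional items drawn without replacement from n_total
    items, n_pathway of which belong to the pathway."""
    upper = min(n_pathway, n_functional)
    total = comb(n_total, n_functional)
    favourable = sum(
        comb(n_pathway, i) * comb(n_total - n_pathway, n_functional - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical example: 10 CNV-gene pairs overall, 3 in the pathway,
# 4 functional pairs, all 3 pathway pairs among the functional ones
p_enrich = hypergeom_enrichment_p(3, n_pathway=3, n_functional=4, n_total=10)
```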
Then we focused on functional CNV-gene pairs that were characterized as significantly differentiated among populations. Out of 81 (in CEU and YRI combined) functional CNV-gene pairs, 13 (16%) showed significant population differentiation (P<0.05) and thus are potential candidates for positive selection. Of the genes linked to functional CNVs that showed population differentiation (P<0.05), 4 genes (UGT2B17, KRT39, APOBEC3B, and TAF4B) are involved in 8 known pathways (Additional file 9: Table S6): the Androgen and estrogen metabolism, the Metabolism of xenobiotics by cytochrome P450, the Pentose and glucuronate interconversions, the Porphyrin and chlorophyll metabolism, the Starch and sucrose metabolism, the Cell Communication, the Basal transcription factors, and the Atrazine degradation. The highest population differentiation (P<0.01) among functional CNVs was detected for 2 CNV-gene pairs: one for the gene FAM128A, and the other for the gene TAF4B from the Basal transcription factors pathway. Illustrations of significant associations between CNV and gene expression are presented for the gene TAF4B belonging to the Basal transcription factors pathway from YRI data (Figure 3A) and for the gene APOBEC3B belonging to the Atrazine degradation pathway from CEU data (Figure 3B). TAF4B encodes a subunit of a transcription initiation factor that has been shown to function as a co-activator of genes from the NF-kB signaling pathway. The functional CNV for TAF4B is 4 kb long and located 58 kb upstream of the gene, which is 165 kb in size. Observed CN states are 2, 3, and 4, and the expression is the highest for CN equal to 2. CNV population frequencies are 89% in YRI, 15% in CEU, and 42% in ASN. APOBEC3B is a gene from the cytidine deaminase family performing C to U RNA-editing. It is known as an antiviral factor that can act against retroviruses, such as HIV, and it has also been found to act as an inhibitor of L1 and Alu retrotransposition. The functional CNV for APOBEC3B is 36 kb long and completely removes the gene, which is 10 kb in size. Population frequencies are 13% for CEU, 7% for YRI, and 54% for ASN.
Illustrations of associations, though not statistically significant, between CNV and gene expression for genes from the Mitogen-activated protein kinase (MAPK) signaling pathway are presented in Additional file 3: Figure S5. Associations were detected for two genes located consecutively in the pathway chain: the first gene is from the CACNG family of gamma subunits, which are important for regulating calcium channels, and the second belongs to the RASGRP family, which activates the MAP kinase cascade.
In this study, we performed a comprehensive size-dependent analysis of the impact of common CNVs on 491 biological pathways across human populations, with the idea that abundance of genetic variation in pathways can be indicative of ongoing evolution. We first showed that CNV enriched pathways include not only signaling pathways and pathways involved in extracellular biological processes , but also metabolic pathways such as amino-acid, carbohydrate, energy and glycan metabolisms. The Glycan biosynthesis pathway is annotated as originated in vertebrates in a recent phylogenetic study of metabolic pathways in the context of evolution  and in the orangutan genome paper . Some of the CNV-gene pairs we report are potential candidates for individual studies intended to investigate pathway-related dysfunctions and metabolic diseases  (see Additional file 1).
In order to characterize the extent of human variation at the level of each pathway, we compiled a comprehensive list of population differentiated CNV-gene pairs. As expected, the majority of pathways exhibiting population differentiation belong to the signaling class, including Calcium, MAPK, Toll-like receptor signaling, and others. Pathways from the metabolic class include Purine metabolism, Arginine and proline metabolism, and Starch and sucrose metabolism, with the well documented example of AMY1 polymorphism and its relation to starch diet. These findings provide evidence that signaling pathways and pathways involved in extracellular activities are less conserved and are potentially under the influence of positive or adaptive selection. Comprehensive characterization of the influence of negative selection is mandatory in future studies. Highly differentiated CNVs overlapping or in proximity of genes indicate recent evolutionary events, and emphasize the importance of improving our understanding of the selection forces that shape the observed population differentiation. Alternatively, genetic drift events might be responsible for the observed population differentiation; however, their effect is considered to be significant mainly in small populations. Pathway based analysis revealed CNV-gene pairs with intermediate Fst in addition to highly differentiated CNVs. We reasoned that this might correspond to ‘CNVs genetic hitchhiking’, similar to what was suggested by Barreiro in the context of SNPs.
Recent studies also indicated the pervasiveness of negative, or purifying, selection acting on CNVs. Conrad et al. reported on purifying selection acting on exonic and intronic deletions. Other studies reported on variants under negative selection [58–60]. The methods used for detection of purifying selection were based on nucleotide-level resolution and restricted solely to genes. While our study focuses on the existence of significantly differentiated patterns, it is relevant to highlight that an appropriate distinction between positive and purifying selection acting on CNVs is an important challenge that requires extensive future work.
We then investigated pathway enrichment considering the set of CNVs deemed functional through the association between the corresponding transcript levels and the copy number states and focused on those which also show differentiation across populations. Despite the fact that the transcript analysis is limited due to the sample size, the differences in the sequencing depth, and the sample type (i.e., lymphoblast cells), we reasoned that our integrated analysis would benefit from this additional layer of information. Making a simplifying assumption, one can consider that selection acting on transcript levels ultimately influences the gene product leading to population specific selective advantage, as it was shown for UGT2B17 and AMY1 in Asians and Europeans, respectively. Similarly, the variant associated with APOBEC3B (Figure 3B) and present at different frequencies across populations may suggest a population specific antiviral effect, and in particular, population specific effect on Alu-L1 retrotranspositional activity. However, one has to consider that the detected potential “functional” CNVs are not necessarily direct causal variants and might simply be linked to the causal variants.
Throughout the study some well-known cancer-related pathways were detected as enriched in population differentiated CNVs (see Additional file 1). Even though most of the detected genes have been extensively studied in relation to cancer, their differential effect across populations, including differential disease susceptibility, is still to be investigated. The gene TAF4B, whose expression levels are significantly associated with a population differentiated CNV (Figure 3A), is a known co-activator of NF-kB genes, further stimulating NF-kB transcriptional activity. It was shown that the functional consequences of the NF-kB signaling pathway are determined by NF-kB oscillation dynamics, and the number, period and amplitude of NF-kB oscillations are regulated, via a negative feedback loop, by transcription levels of IkBα. In a similar way, transcription levels of TAF4B can regulate the dynamics of NF-kB genes, providing different functional outcomes for the pathway. The MAPK signaling pathway is another example of how the tuning of enzyme concentration can affect a signaling pathway. We found population differentiated functional CNVs for CACNG and RASGRP4 located upstream of the MAPK pathway chain (Additional file 1 and Additional file 3: Figure S5). The analysis of the model for this pathway identified the existence of two different dynamical regimes, and depending on parameters, the system can switch from single-state bi-stability to oscillations. This means that the MAPK signaling network can act as a switch and as a clock, and the altered (tuned) element concentration levels can initiate such a transition or prevent cycling due to a shift in the threshold positions.
The effect of gene concentration level changes on the cellular phenotype and the concept of genetic balance has been addressed in , together with the study of dosage-sensitive genes documented in genomes of various species, including yeast and human, and often encoding transcription regulators, signal transduction elements and binding factors . However, we recognized that the complexity of possible downstream changes relies on the non-linear dynamics of biochemical reactions possibly leading to non-proportional change in the concentration of the final component .
Here we provide insights into human pathways enriched for population differentiated functionally active CNVs. Under the assumption that a pathway chain remains intact as a whole (i.e., no new enzymes are added), we hypothesize that evolutionary selected changes in transcription level of some genes constituting the pathway “tune” the pathway into a more favorable state for homeostasis, a process we refer to as the ‘tuning effect’. Last, we suggest that new pathways can stem as long-term potential outcome of the proposed tuning effect (Additional file 1 and Additional file 3: Figure S6).
Upon the characterization of functional CNVs and the concomitant population differentiation of the same variants suggestive of positive selection in different populations, it is challenging to discover the real effect of these changes on a pathway chain and to study the regulatory mechanisms in the cell that control the changes in gene concentration levels. The picture becomes more complicated by acknowledging the multiplicity due to the existence of co-factors concurring to gene regulation, to the presence of other sources of variations, like epigenetic events, and by gene regulation compensatory effects. Our analysis may help to reveal pathway nodes, which have undergone changes (positively, neutrally or negatively) in gene concentrations, or, in other words, pathways that have been tuned. Further studies are required to understand the impact of these and other changes on pathway structure and human diversity.
The authors would like to thank Alex Root, Andrea Sboner, and Alessandro Romanel for fruitful comments on this study and to acknowledge the Department of Defense New Investigator Award (PC094516 to FD) and the Starr Cancer Consortium for funding.
- Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet. 2004, 36 (9): 949-951. 10.1038/ng1416.
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, et al: Large-scale copy number polymorphism in the human genome. Science. 2004, 305 (5683): 525-528. 10.1126/science.1098918.
- Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, et al: Origins and functional impact of copy number variation in the human genome. Nature. 2010, 464 (7289): 704-712. 10.1038/nature08516.
- Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Gibbs RA, Hurles ME, McVean GA: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. 10.1038/nature09534.
- Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, et al: Mapping copy number variation by population-scale genome sequencing. Nature. 2011, 470 (7332): 59-65. 10.1038/nature09708.
- Fanciulli M, Petretto E, Aitman TJ: Gene copy number variation and common human disease. Clin Genet. 2010, 77 (3): 201-213. 10.1111/j.1399-0004.2009.01342.x.
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4 (3): e72. 10.1371/journal.pbio.0040072.
- McCarroll SA, Huett A, Kuballa P, Chilewski SD, Landry A, Goyette P, Zody MC, Hall JL, Brant SR, Cho JH, et al: Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn’s disease. Nat Genet. 2008, 40 (9): 1107-1112. 10.1038/ng.215.
- Yang TL, Chen XD, Guo Y, Lei SF, Wang JT, Zhou Q, Pan F, Chen Y, Zhang ZX, Dong SS, et al: Genome-wide copy-number-variation study identified a susceptibility gene, UGT2B17, for osteoporosis. Am J Hum Genet. 2008, 83 (6): 663-674. 10.1016/j.ajhg.2008.10.006.
- Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, Heid IM, Berndt SI, Elliott AL, Jackson AU, Lamina C, et al: Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet. 2009, 41 (1): 25-34. 10.1038/ng.287.
- Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs RJ, Freedman BI, Quinones MP, Bamshad MJ, et al: The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. 2005, 307 (5714): 1434-1440. 10.1126/science.1101160.
- Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, Regan R, Conroy J, Magalhaes TR, Correia C, Abrahams BS, et al: Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010, 466 (7304): 368-372. 10.1038/nature09146.
- Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh T, Yamrom B, Yoon S, Krasnitz A, Kendall J, et al: Strong association of de novo copy number mutations with autism. Science. 2007, 316 (5823): 445-449. 10.1126/science.1138659.
- McCarthy SE, Makarov V, Kirov G, Addington AM, McClellan J, Yoon S, Perkins DO, Dickel DE, Kusenda M, Krastoshevsky O, et al: Microduplications of 16p11.2 are associated with schizophrenia. Nat Genet. 2009, 41 (11): 1223-1227. 10.1038/ng.474.
- Stefansson H, Rujescu D, Cichon S, Pietilainen OP, Ingason A, Steinberg S, Fossdal R, Sigurdsson E, Sigmundsson T, Buizer-Voskamp JE, et al: Large recurrent microdeletions associated with schizophrenia. Nature. 2008, 455 (7210): 232-236. 10.1038/nature07229.
- Demichelis F, Setlur SR, Banerjee S, Chakravarty D, Chen JY, Chen CX, Huang J, Beltran H, Oldridge DA, Kitabayashi N, et al: Identification of functionally active, low frequency copy number variants at 15q21.3 and 12q21.31 associated with prostate cancer risk. Proc Natl Acad Sci USA. 2012, 109 (17): 6686-6691. 10.1073/pnas.1117405109.
- Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mosse YP, Wood A, Lynch JE, et al: Copy number variation at 1q21.1 associated with neuroblastoma. Nature. 2009, 459 (7249): 987-991. 10.1038/nature08035.
- Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET: Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010, 464 (7289): 773-777. 10.1038/nature08903.
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464 (7289): 768-772. 10.1038/nature08872.
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, et al: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007, 315 (5813): 848-853. 10.1126/science.1136678.
- Banerjee S, Oldridge D, Poptsova M, Hussain W, Chakravarty D, Demichelis F: A computational framework discovers new copy number variants with functional importance. PLoS One. 2011, 6 (3): e17539. 10.1371/journal.pone.0017539.
- Schlattl A, Anders S, Waszak SM, Huber W, Korbel JO: Relating CNVs to transcriptome data at fine-resolution: Assessment of the effect of variant size, type, and overlap with functional regions. Genome Res. 2011, 21 (12): 2004-2013. 10.1101/gr.122614.111.
- Akey JM: Constructing genomic maps of positive selection in humans: where do we go from here?. Genome Res. 2009, 19 (5): 711-722. 10.1101/gr.086652.108.
- Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, et al: Targets of balancing selection in the human genome. Mol Biol Evol. 2009, 26 (12): 2755-2764. 10.1093/molbev/msp190.
- Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G, Przeworski M: Classic selective sweeps were rare in recent human evolution. Science. 2011, 331 (6019): 920-924. 10.1126/science.1198878.
- Kelley JL, Swanson WJ: Positive selection in the human genome: from genome scans to biological significance. Annu Rev Genomics Hum Genet. 2008, 9: 143-160. 10.1146/annurev.genom.9.081307.164411.
- Kato M, Kawaguchi T, Ishikawa S, Umeda T, Nakamichi R, Shapero MH, Jones KW, Nakamura Y, Aburatani H, Tsunoda T: Population-genetic nature of copy number variations in the human genome. Hum Mol Genet. 2010, 19 (5): 761-773. 10.1093/hmg/ddp541.
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, et al: Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008, 451 (7181): 998-1003. 10.1038/nature06742.
- Cavalli-Sforza LL, Menozzi P, Piazza A: The history and geography of human genes. 1996, Princeton, N.J.: Princeton University Press.
- Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, et al: Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007, 39 (10): 1256-1260. 10.1038/ng2123.
- Xue Y, Sun D, Daly A, Yang F, Zhou X, Zhao M, Huang N, Zerjal T, Lee C, Carter NP, et al: Adaptive evolution of UGT2B17 copy-number variation. Am J Hum Genet. 2008, 83 (3): 337-346. 10.1016/j.ajhg.2008.08.004.
- Niu AL, Wang YQ, Zhang H, Liao CH, Wang JK, Zhang R, Che J, Su B: Rapid evolution and copy number variation of primate RHOXF2, an X-linked homeobox gene involved in male reproduction and possibly brain function. BMC Evol Biol. 2011, 11: 298. 10.1186/1471-2148-11-298.
- King MC, Wilson AC: Evolution at two levels in humans and chimpanzees. Science. 1975, 188 (4184): 107-116. 10.1126/science.1090005.
- Blekhman R, Marioni JC, Zumbo P, Stephens M, Gilad Y: Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 2010, 20 (2): 180-189. 10.1101/gr.099226.109.
- Flowers JM, Sezgin E, Kumagai S, Duvernell DD, Matzkin LM, Schmidt PS, Eanes WF: Adaptive evolution of metabolic pathways in Drosophila. Mol Biol Evol. 2007, 24 (6): 1347-1354. 10.1093/molbev/msm057.
- Eanes WF: Molecular population genetics and selection in the glycolytic pathway. J Exp Biol. 2011, 214 (Pt 2): 165-171.
- Horowitz NH: On the evolution of biochemical synthesis. Proc Natl Acad Sci USA. 1945, 31: 153-157. 10.1073/pnas.31.6.153.
- Ycas M: On earlier states of the biochemical system. J Theor Biol. 1974, 44 (1): 145-160. 10.1016/S0022-5193(74)80035-4.
- Lazcano A, Miller SL: On the origin of metabolic pathways. J Mol Evol. 1999, 49 (4): 424-431. 10.1007/PL00006565.
- Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al: The UCSC genome browser database: update 2009. Nucleic Acids Res. 2009, 37 (Database issue): D755-761.
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Weir BS, Hill WG: Estimating F-statistics. Annu Rev Genet. 2002, 36: 721-750. 10.1146/annurev.genet.36.050802.093940.
- McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, et al: Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008, 40 (10): 1166-1174. 10.1038/ng.238.
- Habegger L, Sboner A, Gianoulis TA, Rozowsky J, Agarwal A, Snyder M, Gerstein M: RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries. Bioinformatics. 2011, 27 (2): 281-283. 10.1093/bioinformatics/btq643.
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al: Global variation in copy number in the human genome. Nature. 2006, 444 (7118): 444-454. 10.1038/nature05329.
- Holsinger KE, Weir BS: Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet. 2009, 10 (9): 639-650. 10.1038/nrg2611.
- Baye TM, Wilke RA, Olivier M: Genomic and geographic distribution of private SNPs and pathways in human populations. Per Med. 2009, 6 (6): 623-641. 10.2217/pme.09.54.
- Kudaravalli S, Veyrieras JB, Stranger BE, Dermitzakis ET, Pritchard JK: Gene expression levels are a target of recent natural selection in the human genome. Mol Biol Evol. 2009, 26 (3): 649-658.
- Yamit-Hezi A, Nir S, Wolstein O, Dikstein R: Interaction of TAFII105 with selected p65/RelA dimers is associated with activation of subset of NF-kappa B genes. J Biol Chem. 2000, 275 (24): 18180-18187. 10.1074/jbc.275.24.18180.
- Romani B, Engelbrecht S, Glashoff RH: Antiviral roles of APOBEC proteins against HIV-1 and suppression by Vif. Arch Virol. 2009, 154 (10): 1579-1588. 10.1007/s00705-009-0481-y.
- Bogerd HP, Wiegand HL, Hulme AE, Garcia-Perez JL, O’Shea KS, Moran JV, Cullen BR: Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc Natl Acad Sci USA. 2006, 103 (23): 8780-8785. 10.1073/pnas.0603313103.
- Burgess DL, Gefrides LA, Foreman PJ, Noebels JL: A cluster of three novel Ca2+ channel gamma subunit genes on chromosome 19q13.4: evolution and expression profile of the gamma subunit gene family. Genomics. 2001, 71 (3): 339-350. 10.1006/geno.2000.6440.
- Freilich S, Goldovsky L, Ouzounis CA, Thornton JM: Metabolic innovations towards the human lineage. BMC Evol Biol. 2008, 8: 247. 10.1186/1471-2148-8-247.
- Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, et al: Comparative and demographic analysis of orang-utan genomes. Nature. 2011, 469 (7331): 529-533. 10.1038/nature09687.
- Lanktree M, Hegele RA: Copy number variation in metabolic phenotypes. Cytogenet Genome Res. 2008, 123 (1–4): 169-175.
- Kim PM, Korbel JO, Gerstein MB: Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci USA. 2007, 104 (51): 20274-20279. 10.1073/pnas.0710183104.
- Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312 (5780): 1614-1620. 10.1126/science.1124309.
- Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L: Natural selection has driven population differentiation in modern humans. Nat Genet. 2008, 40 (3): 340-345. 10.1038/ng.78.
- Nguyen DQ, Webber C, Hehir-Kwa J, Pfundt R, Veltman J, Ponting CP: Reduced purifying selection prevails over positive selection in human copy number variant evolution. Genome Res. 2008, 18 (11): 1711-1723. 10.1101/gr.077289.108.
- Schuster-Bockler B, Conrad D, Bateman A: Dosage sensitivity shapes the evolution of copy-number varied regions. PLoS One. 2010, 5 (3): e9474. 10.1371/journal.pone.0009474.
- Yamit-Hezi A, Dikstein R: TAFII105 mediates activation of anti-apoptotic genes by NF-kappaB. EMBO J. 1998, 17 (17): 5161-5169. 10.1093/emboj/17.17.5161.
- Nelson DE, Ihekwaba AE, Elliott M, Johnson JR, Gibney CA, Foreman BE, Nelson G, See V, Horton CA, Spiller DG, et al: Oscillations in NF-kappaB signaling control the dynamics of gene expression. Science. 2004, 306 (5696): 704-708. 10.1126/science.1099962.
- Qiao L, Nachbar RB, Kevrekidis IG, Shvartsman SY: Bistability and oscillations in the Huang-Ferrell model of MAPK signaling. PLoS Comput Biol. 2007, 3 (9): 1819-1826.
- Veitia RA: Gene dosage balance in cellular pathways: implications for dominance and gene duplicability. Genetics. 2004, 168 (1): 569-574. 10.1534/genetics.104.029785.
- Veitia RA: Gene dosage balance: deletions, duplications and dominance. Trends Genet. 2005, 21 (1): 33-35. 10.1016/j.tig.2004.11.002.
- Veitia RA: Nonlinear effects in macromolecular assembly and dosage sensitivity. J Theor Biol. 2003, 220 (1): 19-25. 10.1006/jtbi.2003.3105.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.