Human-specific methylated regions are enriched in schizophrenia

Background Despite the reduced fertility of patients, schizophrenia has a prevalence of ~1% in the general population. One explanation for this persistence is that schizophrenia is a by-product of human evolution. This hypothesis is supported by evidence suggesting that recently-evolved genomic regions in humans are involved in the genetic risk for schizophrenia. We tested if genetic variants located within human-specific differentially methylated regions (DMRs) are enriched for association with schizophrenia. Methods We used analytical tools that evaluate the polygenic enrichment of a set of genomic variants against all genomic variants. SNPs identified in genome-wide association studies (GWAS) of schizophrenia and 11 other phenotypes were classified according to their location in previously defined human-specific DMRs. These DMR SNPs were then tested for enrichment of association with the GWAS trait. Results Schizophrenia was the only trait in which GWAS SNPs within human-specific DMRs showed clear enrichment of association that passed the genome-wide significance threshold. The enrichment was not observed for Neanderthal or Denisovan DMRs. The enrichment seen in human DMRs is comparable to that for genomic regions tagged by Neanderthal Selective Sweep markers, and stronger than that for Human Accelerated Regions. Conclusions Regions where DNA methylation modifications have changed during recent human evolution harbor more schizophrenia-associated genetic variants than expected. Our study further supports the hypothesis that genetic variants conferring risk of schizophrenia co-occur in genomic regions that have changed as the human species evolved. Our results also suggest that interaction with the environment might have contributed to that association.


Introduction
Schizophrenia is a psychiatric disorder that has been reported throughout human history, possibly as far back as 5000 years (1,2). Family, twin and adoption studies estimate that schizophrenia has a high heritability of 60-90% (3-6) Today, schizophrenia is estimated to have a prevalence of 1%. It is associated with reduced fertility and increased mortality (7)(8)(9)(10), and its persistence despite this heavy burden is paradoxical. One explanation is that evolution has indirectly selected the disease instead of eliminating it -the disease may co-segregate with creativity and intellectual prowess, providing selective advantages to the kin of affected individuals (9,11). Crow first argued that language and psychosis may have common origins, which could explain the persistence of schizophrenia in human populations (11,12). This evolutionary hypothesis can now be tested, thanks to the identification of genetic factors implicated in schizophrenia (13)(14)(15) and the availability of datasets that reflect recent genomic evolution in humans (16)(17)(18).
Large genome-wide association studies (GWAS) have identified thousands of variants that are associated with schizophrenia (13-15) but our mechanistic understanding of the candidate variants is poor. One approach to investigating the function of schizophrenia-associated variants is comparative genomics, which investigates the evolutionarily relevance of certain genomic regions (19). This field has introduced new datasets to test disease origins in humans, including Human Accelerated Regions (HARs) and Neanderthal Selective Sweep (NSS) scores (17,18). HARs are genomic regions that are highly conserved in non-human species, but have undergone rapid sequence change in the human lineage (19)(20)(21)(22)(23). Xu et al (17) showed that genes near HARs are enriched for association with schizophrenia. Neanderthals were hominids that co-existed and even bred with modern humans (24,25). Comparison of Neanderthal and human genome sequences (26,27) has revealed genomic regions that have experienced a selective sweep in modern humans, presumably following a favorable mutation (27). Negative NSS scores can be used to pinpoint mutations (usually single nucleotide changes) that were positively selected in humans as they Banerjee et al 4 diverged from Neanderthals. Srinivasan et al (18) found that genomic regions tagged by negative NSS scores show enrichment of association with schizophrenia.
Using specific interpretation of genome sequencing in two recently extinct hominids, Neanderthals and Denisovans, Gokhman et al (28) mapped genome-wide methylation levels (i.e. the methylome) and compared them to modern humans. While 99% of the methylation maps were identical in the three hominids, nearly 2000 differentially methylated regions (DMRs) were identified, which give the first clues about the role of epigenomic evolution in generating anthropometric differences between modern humans and their ancient cousins (28). These DMRs provide a dataset of evolutionary annotations complementary to pre-existing datasets. Unlike HARs and NSS scores, which are based on DNA sequence changes, DMRs provide information on the evolution of epigenomes and hence an insight into how the environment could interact in different ways with the genome in the three hominids. We thus examined if these evolutionary DMRs are enriched for association with schizophrenia or other human traits. Using previously published methodologies (18,29,30) and publicly available GWAS datasets we systematically analyzed twelve diverse phenotypes to investigate the potential role of regions susceptible to epigenetic variation in the emergence of specific traits in the human lineage.

Differentially Methylated Region data
Coordinates for DMRs were obtained from data publicly available in Supplemental Table S2 of Gokhman et al, 2014 (28). This file contained DMRs inferred by comparing fossilized Neanderthal and Denisovan limb samples with osteoblasts from modern humans. DMRs are classified according to the hominid in which the change occurred, i.e. human-specific, Neanderthal-specific and Denisovan-specific DMRs. DMRs that could not be classified reliably (unclassified DMRs) (28) were not used. Full methodological details for assigning DMRs are in the Supplementary File of the original paper (28).

HAR data
Genomic coordinates were obtained from publicly available data (docpollard.com/2x) for three classes of human accelerated region: HARs, in which regions conserved in mammals are accelerated in humans; PARs, in which regions conserved in mammals are accelerated in primates; and pHARs, in which regions conserved in primates (but not other mammals) are accelerated in humans. Conversion to hg19 assembly was performed using the liftOver tool from the UCSC Genome Browser.

NSS data
NSS data was obtained as a list of markers with corresponding NSS values from Srinivasan et al (18). Markers with negative values, indicating positive selection in humans, were filtered out and used for analysis.

SNP assignment to DMRs
SNPs were assigned to DMRs with LDsnpR (39) using positional binning and LD (linkage disequilibrium)-based binning. We used both methods because DMR-localized SNPs that were not genotyped in a specific GWAS would be missed if we used positional binning alone (39) (Supplemental Table S1). The LD file utilized in HDF5 format was constructed on the European reference population of 1KGP and can be publicly downloaded at: http://services.cbu.uib.no/software/ldsnpr/Download.

Quantile-Quantile (QQ) Plots
QQ plots are an effective tool to visualize the spread of data and any deviations from expected null distributions. They are frequently utilized in GWAS to depict enrichment of true signals. When the observed distribution matches the expected distribution, a line of equality is obtained that depicts the null hypothesis. If the observed and expected distributions differ, there will be deviation from this null line. In GWASs, due to the extremely low p-values, it is common to depict p-values by converting them to negative log 10 values so that with smaller p-values, higher negative logarithmic values are obtained. We plotted the negative log 10 of observed p-values against the expected negative log 10 of a normal distribution. Leftwards deflections from the null line represent enrichment (29).

INRICH
INterval EnRICHment Analysis (INRICH) is a robust bioinformatics pipeline to determine enrichment of genomic intervals implicated by LD with predefined or custom gene sets (30). It takes into account several potential biases that can otherwise lead to false positives. It is well suited for testing GWAS-implicated SNPs for association with gene-sets as it controls for variable gene size, SNP density, LD within and between genes, and overlapping genes with similar annotations.
We followed the procedure described in ref 17, with the extended MHC region (chr6:25-35Mb) masked and SNPs with MAF <0.05 excluded. To assess enrichment for SNPs with different disease significance thresholds, we generated a range of LD-implicated regions through LD clumping in PLINK for index SNPs with p-values from 1x10 -2 to 1x10 -8 . LD Clumps were formed at r 2 =0.5 with the clump range limited to 250kb. INRICH was run on all the sets of LD intervals using default parameters described by Lee et al (30). INRICH merges overlapping genes and overlapping intervals to prevent potentially inflated results due to multi-counting of the same genes/intervals. A total of 7252, 2510, 1015, 445, 207, 108 and 68 intervals were analyzed respectively for SNPs with p-values from 1x10 -2 to 1x10 -8 . Enrichment was determined for genes located within 100kb of human DMRs, HARs, PARs and pHARs as described previously (17) and genes in LD blocks containing NSS markers (which are single-base markers, unlike HARs and DMRs). Since INRICH tests for enrichment of genes, we used an LD block with a strict threshold r 2 ≥0.8 to include potential genes of interest near NSS markers. GENCODE v19 gene database (last accessed 5 th February 2016) was used to map the genes to DMRs, NSS markers and HARs.

Pathway analysis
Banerjee et al 8 Pathway analysis was performed using Ingenuity Pathway Analysis (IPA) from QIAGEN (www.qiagen.com/ingenuity, last accessed 26 th August 2016). The DMR SNPs driving the enrichment were determined from QQ plots using the method of Schork et al (29). These SNPs were then mapped to genes using LDsnpR (39). 5338 enriched SNPs were mapped to 349 unique overlapping RefSeq genes and 446 RefSeq genes in LD. Genes in LD blocks containing NSS markers were determined in a similar manner. 4276 enriched SNPs mapped to 648 overlapping RefSeq genes and 1363 RefSeq genes in LD. IPA was run on all these gene sets separately. The reference set was Ingenuity Knowledge Base (Genes). Both direct and indirect relationships were analyzed. All data sources were included with the confidence parameter set to experimentally observed and highly predicted pathways for Human. Under Tissues & Cell Lines, we performed the analysis once with all organ systems and once with only the nervous system.

SNPs in human-specific DMRs are enriched for association with schizophrenia.
We determined enrichment using QQ plots as described by Schork et al (29). In such plots the baseline is the null line of no difference between expected distribution of p-values and observed p-  (34), systolic blood pressure (35), diastolic blood pressure (35), body mass index (36), and height (37). The GWAS datasets are summarised in Supplemental Table   S1. For each trait, we extracted the SNPs that occur in DMRs and compared them with the full list of SNPs. We generated a list of SNPs within DMRs (positional annotation) and a list of SNPs in linkage disequilibrium (LD-based annotation) with markers within DMRs (Supplemental Table S1).
For the schizophrenia GWAS, enrichment was observed for both SNPs in LD with markers in DMRs ( Figure 1, Supplemental Figure S1) and for SNPs located within DMRs (Figure 2). Although there was a slight leftward deflection in the higher p-values (smaller negative log 10 of p-values) in some other traits (e.g. height; Figure 1, Supplemental Figure S1), the observed enrichment only crosses the genome-wide significance level of 5x10 -8 for the schizophrenia SNPs. The enrichment of disease-associated markers in DMRs is thus specific to schizophrenia and is independent of LD.

Human-specific DMR enrichment in schizophrenia is independent of the MHC region, other genomic annotations and total markers genotyped
The MHC harbors several significant schizophrenia markers and could potentially bias our results because of long-range LD. We therefore repeated the analysis with the MHC region masked and found that the enrichment remains ( Figure 1).
The schizophrenia GWAS had the highest density of markers genotyped (~9.4 million) and thus had the most SNPs in DMR regions (Supplemental Table S1), which could artificially inflate the enrichment. We normalized the total number of SNPs within DMRs with the total number of SNPs genotyped in each GWAS and found that the proportion of SNPs in DMRs is nearly identical for all traits (Supplemental Figure S3). To further eliminate the possibility that the enrichment is due to variation in the number of markers analyzed, we extracted ~2.4 million SNPs that were common across the twelve GWAS. Although not as strong as with the full set, the deflection observed for the schizophrenia GWAS remains higher than any other trait, indicating the presence of significant disease markers in DMRs. These validations point to a true enrichment of association of the human DMRs with schizophrenia that is independent of the number of markers in a GWAS. It should be noted that we cannot rule out enrichment in the ADHD and BPD GWAS, because they are lacking in power (Supplemental Figure S1).
Additionally, we considered the distribution of significant SNPs based on genomic annotations of 5'UTRs, Exons, Introns and 3'UTRs (29). Contrary to previously published findings (29), the enrichment was highest for intronic SNPs and lowest for 5'UTR SNPs (Supplemental Figure S4).

This makes sense biologically as one would not expect open chromatin regions, containing exons
and UTRs, to possess methylation marks.

Only human-specific DMRs are enriched for association with schizophrenia
Next, we used QQ plots to test whether markers located in the Neanderthal-and Denisovan-specific DMRs are enriched for association with schizophrenia. No enrichment was observed (Figure 2). It should be noted that this approach may not be appropriate for testing Neanderthal-and Denisovanspecific DMRs since (a) the schizophrenia GWAS was conducted in humans; (b) SNP and LD information is available only for humans; (c) the three hominids had variable number of DMRs, which affected the number of SNPs captured via positional annotation.

Comparison of Human DMRs with other evolutionary annotations
We compared the enrichment observed for the human DMRs with the enrichment previously reported for NSS markers and HARs (17,18). We first compared the enrichment via QQ plots. We find that the enrichment of human DMRs in schizophrenia is comparable to that observed for NSS markers and far greater than that observed for HARs (Figure 3).
To ensure that we were not comparing the same groups of markers across the different evolutionary annotations we analyzed how much overlap exists between the various sets. Reassuringly, the various evolutionary annotations do not share the same group of markers, indicating that we did not test the same regions or SNPs (Supplemental Figure S2). The SNPs in the DMRs thus represent a different group of markers that have not been annotated or analyzed previously from an evolutionary standpoint.

Statistically-significant enrichment exists for Human DMRs
To determine the statistical significance of the DMR enrichment in schizophrenia, we utilized the INRICH software pipeline. This performs permutation and bootstrapping procedures to determine the significance of association of markers in LD intervals while maintaining SNP density and gene density of the original intervals (30). We utilized a similar procedure as reported by Xu et al (17) and tested LD intervals for SNPs with nominal p-values ranging from 1x10 -2 to 1x10 -8 . In their analysis, Xu et al had 893 genes within 100kb of pHARs, 326 genes within 100kb of mHARs (regions conserved in all mammals which are accelerated in humans) and 305 genes within 100kb of PARs. In our study, using GENCODE V19, we had 3700, 1316 and 1268 genes within 100kb of pHARs, mHARs and PARs respectively. INRICH confirmed significant (p<0.05) enrichment of association for human DMRs with schizophrenia at all p-value thresholds of LD intervals. After correcting for multiple testing through bootstrapping, most nominal p-value LD intervals maintained modest enrichment (p<0.1) for human DMRs. Additionally, INRICH independently verified the previously reported enrichment of NSS markers with schizophrenia (18) (Figure 4).

Pathway Analysis
We utilized Ingenuity Pathway Analysis (IPA) to markers that show enrichment of association with schizophrenia and found that 'CREB signalling in neurons' was also amongst the top canonical pathways (Supplemental Table S4). When we repeated the analyses with all organ systems, 'CREB signalling in neurons' and 'Synaptic long term potentiation' emerged amongst the top canonical pathways for genes in LD with enriched DMR SNPs (Supplemental Table S3) and for genes in LD with enriched NSS markers (Supplemental Table S5). determine which biological pathways are overrepresented by genes in LD with DMR SNPs that are enriched for association with schizophrenia. When analyzing pathways overrepresented in the nervous system, we find 'CREB Signalling in Neurons' and 'Synaptic Long Term Potentiation' amongst the top canonical pathways.
Additionally, under physiological systems, 'Nervous system development and function' is enriched (Supplemental Table S2). We repeated the same analysis for genes in LD with NSS This is interesting since there is very little overlap between the DMR and NSS SNPs (Supplemental Figure   S2). Considering all organs, not just the nervous system, genes in LD with enriched DMR SNPs are also overrepresented in 'Hair & Skin development' (Supplemental Table S3). This may suggest potential gene by environment interactions, modulated by methylation variation during human evolution.

Discussion
Our results suggest that SNPs in regions of the human genome that have undergone recent changes in DNA methylation status are enriched for association with schizophrenia. Amongst all the traits analyzed, the enrichment observed in QQ plots was strongest for schizophrenia and passed the genome-wide significance threshold of 5x10 -8 (Figure 1). INRICH analysis demonstrated significant enrichment (p<0.05) in human DMRs for all LD intervals tested (Figure 4a). Our results suggest that DMRs are enriched for genetic variation that confers risk of developing schizophrenia.
Neanderthal-or Denisovan-specific DMRs showed no enrichment of association ( Figure 2). This suggests that SNPs conferring vulnerability to schizophrenia occur in genomic regions whose methylation levels were altered in the modern human lineage but not in the ancestral lineages. It is possible that the evolutionary changes driving the variation in the methylation status could also have made the human lineage more vulnerable to schizophrenia. A caveat to this result is that the LD structure in archaic genomes is unknown, so we cannot test LD-based enrichment in Neanderthal or Denisovan genomes. Our inter-lineage analyses with enrichment plots were thus restricted to SNPs occurring exclusively within DMRs. The other limitation to this comparative approach is that the GWAS data is specific to modern humans.
Xu et al (17) and Srinivasan et al (18) respectively demonstrated that variants located in HARs and in regions containing NSS markers were enriched for association with schizophrenia. In our study, we compared the evolutionary enrichments of schizophrenia risk variants in DMRs, NSS markers and HARs. We validate the results of Srinivasan et al (18) (Figure 3, Figure 4). QQ plots for HARs do not show enrichment of disease markers unlike NSS markers and DMRs (Figure 3). The INRICH analysis showed enrichment for primate HARs only at LD intervals at a p-value threshold of 1x10 -2 with borderline (p>0.1) enrichment at higher thresholds of LD intervals. This difference with the report of Xu et al could be due to a different freeze of the gene database used; it could also be because Xu et al. used a more stringent Hardy-Weinberg equilibrium (HWE) threshold to filter out markers from the schizophrenia GWAS (13), a step we could not replicate as the HWE values are not publicly available. We used the publicly available schizophrenia dataset that has a HWE pvalue >10 −6 in controls and p-value >10 −10 in cases (13). Interestingly, all the evolutionary annotations (DMRs, NSS markers and HARs) cover different sections of the genome with very little overlap between them (Supplemental Figure S2). Between the three evolutionary annotations, nearly 70,000 SNPs occur around regions with evolutionary significance (Supplemental Figure S2).
This provides a wealth of information on genomic regions that are important for the evolution of humans and are also enriched for schizophrenia risk variants (NSS markers and DMRs).
Although we cannot rule out enrichment in the other phenotypes tested, schizophrenia was the only trait for which the enrichment of DMR-localized disease markers exceeded genome-wide significance (Supplemental Figure S1). Furthermore, while the DMRs utilized here were obtained from bone samples, the authors of the original study (28) referred to the report by Hernando-Herraez et al (40) which demonstrated that species-specific DMRs tend to be conserved across tissues and as such should not represent tissue-specific variations. Other studies also showed that neurological systems were enriched for methylation differences even when the tissue samples analyzed were not neurological (41)(42)(43). Therefore, we believe that our results are valid for a 'brain' phenotype even though the DMRs were derived from non-brain tissues. The enrichment seen for schizophrenia also corroborates the results of Gokhman et al (28) who reported that DMRs were more enriched around genes implicated in the nervous system amongst all the organ systems tested for evolutionary changes in methylation patterns. Hernando-Herraez et al (40) also found that methylation differences between humans and great apes were located around genes controlling neurological and developmental features. It is therefore possible that the methylation differences were mediated by evolution of genomic regions controlling neurodevelopmental processes. The results of pathway analysis are consistent with this. Both the DMR and NSS regions that are enriched for association with schizophrenia contain genes that are overrepresented in 'CREB Signalling in Neurons' and 'Synaptic Long Term Potentiation'.
In summary, we have demonstrated that human genomic regions whose methylation status was altered during evolution are enriched for association with schizophrenia. Our results concur with previous genomic studies demonstrating that methylation changes in Homo sapiens have had the greatest impact on the nervous system and provide evidence that epigenomic evolution plays a role in conferring a high risk of schizophrenia on humans.