DNA methylation modulated genetic variant effect on gene transcriptional regulation

Identification of meCpG sites associated with CTCF occupancy

Our previous study demonstrated that a meCpG can modulate the interaction between genetic variant and gene expression by altering CTCF binding [7]. Building on this, we hypothesized that other meCpGs sites, where methylation levels are associated with CTCF occupancy, might exert similar effects. To test this hypothesis, we analyzed the correlation of 444,364 meCpG-CTCF pairs across 26 human cell lines or tissues, including the prostate gland, using matched whole-genome bisulfite sequencing (WGBS) and CTCF chromatin immunoprecipitation sequencing (ChIP-seq) data from ENCODE (Additional file 5: Fig. S1A, B) [12]. Of these, 10,987 meCpG-CTCF pairs showed significant negative correlation, while only 246 pairs displayed a positive correlation (Additional file 5: Fig. S1C and Additional file 1: Table S1).

Although a single CTCF binding site might contain multiple meCpG sites (Additional file 5: Fig. S1D), we only identified meCpG-CTCF pairs showing opposite correlations in five CTCF binding sites, suggesting that meCpG sites located in the same CTCF binding site likely share a similar relationship with CTCF occupancy. Thus, after removing these five CTCF binding sites, we selected the most significantly correlated meCpG-CTCF pair for each CTCF binding site, resulting in 6573 negatively correlated and 215 positively correlated meCpG-CTCF pairs, respectively (Fig. 1A, B). This is consistent with previous findings that methylation levels of meCpGs are primarily negatively associated with CTCF occupancy [13, 14].

Fig. 1figure 1

Correlation between meCpG and CTCF binding. A The correlation coefficients and statistical significance for the most significantly correlated meCpG and CTCF per each CTCF binding site. Neg and Pos refer to negatively correlated and positively correlated meCpG-CTCF pairs, respectively. B Examples of negatively and positively correlated meCpG-CTCF pair across 26 ENCODE samples (SCC: Spearman correlation coefficient; Pval: p-value; CpG to CTCF: the distance from the meCpG site to the center of the CTCF binding site). Comparisons of the average CpG methylation levels (C) and average CTCF binding intensity (D) between Neg and Pos groups (Wilcoxon rank-sum two-sided test: mean CpG methylation level: p < 2.20 × 10−16; mean CTCF intensity: p = 2.20 × 10−3). E Comparison of the distances between the meCpG site and the center of CTCF binding site for Neg and Pos groups (Kolmogorov–Smirnov test: p = 1.05 × 10.−5). **p < 0.01; ***p < 0.001

Our analysis showed that meCpG-CTCF pairs with negative correlations tend to have lower CpG methylation levels and reduced CTCF occupancy compared to those with positive correlation (Fig. 1C, D). Additionally, meCpG sites negatively associated with CTCF binding are more likely to be located closer to the center of the corresponding CTCF binding sites (Fig. 1E and Additional file 5: Fig. S1E). In total, we obtained 6788 significantly associated meCpG-CTCF pairs for further examination of their modulation effects on eQTL (Additional file 1: Table S1).

memo-eQTL mapping reveals hidden relationship between SNP and gene

We introduced an extended eQTL method, named memo-eQTL, to systematically assess the modulation effects of these meCpGs, This method characterizes the modulation effect as the interaction between SNP and meCpG (SNP × meCpG) via a moderate model (M3). Subsequently, it requires comparisons against the covariate model (M2) and the standard eQTL model (M1) to determine the statistical significance of the modulation effect (Fig. 2A and “Methods”).

Fig. 2figure 2

Mapping and characteristics of memo-eQTLs. A The framework of memo-eQTL mapping method and its implementation in the CPGEA cohort (sig: significant; sigDiff: significantly different; relH and relL refer to the subsamples with relatively high and low methylation levels at corresponding meCpG site, respectively). B Four different groups of SNP-meCpG-Gene combinations based on comparisons of M3 versus M1 and M3 versus M2 after requiring that M3 be significant. Note that combinations belonging to the group 1 (G1) are considered as memo-eQTLs. C Canonical eQTL (left) and meQTL (right) analysis for SNP rs28452766 with gene OSR2 and CpG site at chr8:98,471,622, respectively. D Visualization of selected memo-eQTL, depicting the relationship between rs28452766 and OSR2 in subsamples with relatively high (relH: Beta ≥ 0.67) and low (relL: Beta < 0.67) methylation levels at chr8:98,471,622. E The comparisons of the relative variance of gene expression can be explained by SNP × meCpG across groups G1-4 (Wilcoxon rank-sum two-sided test: ****p < 0.0001)

We conducted the memo-eQTL mapping in the Chinese Prostate Cancer Genome and Epigenome Atlas (CPGEA) cohort, which comprised matched whole-genome sequencing (WGS), RNA sequencing (RNA-seq), and WGBS data for 128 benign prostate samples [15] (Fig. 2A). Specifically, we pruned SNPs in high linkage disequilibrium (LD) and focused on 19,895 SNPs located in 14,374 ATAC-seq distal peak regions that were identified in prostate cancer [16] (Additional file 5: Fig. S2A and “Methods”). For meCpG sites, we pinpointed 1187 sites with variable methylation levels (Additional file 5: Fig. S2B and “Methods”), which were significantly correlated with CTCF binding as identified in Fig. 1A. Among the potential target genes, we preserved 14,520 protein-coding and lincRNA genes after filtering out lowly expressed ones (Median FPKM < 1, Additional file 5: Fig. S2C). Lastly, we conducted memo-eQTL analysis for 48,348 SNP-meCpG-Gene combinations, where the linear distance between the SNP and the Gene was up to 1 million base pairs. Importantly, we required the meCpG site to be located in between the paired SNP and Gene to simplify the possible modulating mechanisms (Fig. 2A and “Methods”).

In total, we identified 1063 memo-eQTLs, which not only displayed a statistically significant SNP × meCpG interaction (M3 versus M2), but also surpassed canonical eQTL models in performance (M3 versus M1) (Fig. 2B and Additional file 2: Table S2). Notably, only 93 of these memo-eQTLs were detected as eQTLs for the corresponding genes, and 81 as methylation quantitative trait loci (meQTLs) associated with matched meCpGs, all with p-values less than 0.05 (Additional file 2: Table S2). For instance, the SNP rs28452766 is neither an eQTL for OSR2 nor a meQTL for the CpG site at chr8:98,471,622. However, it is significantly associated with OSR2 expression levels in the subsamples with relatively low (relL) methylation levels at chr8:98,471,622 (Fig. 2B–D and “Methods”). This suggests the potential modulating capability of the meCpG site on the relationship between rs28452766 and OSR2 expression. Notably, none of the 1063 memo-eQTL SNP and gene pairs were reported as eQTL in the prostate samples from GTEx (dbGaP Accession phs000424.v8.p2). In addition, subsampling the GTEx prostate samples to the same sample size as the CPGEA cohort showed that the vast majority of the 40,740 SNP-gene pairs examined in memo-eQTL mapping did not exhibit significant associations (Additional file 5: Fig. S2D).

To further validate the credibility of our identified memo-eQTLs, we conducted permutation tests by randomly splitting 128 GTEx samples into DNA methylation relatively high (relH) and low (relL) groups 1000 times to simulate the modulation effects of meCpG (“Methods”). Remarkably, 350 out of the 375 memo-eQTLs significant in the relH group exhibit significantly lower p-values of eQTL model in comparison to random sample partitions (Additional file 5: Fig. S2E). A similar trend was observed for 329 out of the 344 memo-eQTLs that were significant in the relL group (Additional file 5: Fig. S2E). These results underscore the capability of memo-eQTL to uncover the intricate meCpG modulated interactions between SNPs and Genes that exceed random expectations. Moreover, recognizing that DNA methylation level variations could arise from different cell type compositions, leading to potential misidentification, we accounted for potential confounding effects by estimating the proportions of different cell types within our samples. Our analysis revealed that, on average, about 72% of the cells within our samples are non-immune (Additional file 2: Table S2), with low variation in cell type composition (Additional file 5: Fig. S2F, G and Additional file 2: Table S2). These results suggest that the variation in cell type composition within our sample was minimal, consistent with expectations for healthy samples. Furthermore, our analysis showed that the SNP × meCpG explained the most substantial portion of gene expression variance for memo-eQTLs compared to non-significant groups (Fig. 2B, E). In contrast, the SNP or meCpG alone tends to explain less gene expression variance for memo-eQTLs compared to other groups (Fig. 2B and Additional file 5: Fig. S2H). These findings further emphasize the unique modulation role of meCpG in gene expression captured by memo-eQTL mapping.

In conclusion, these results suggest that memo-eQTL mapping complements canonical eQTL and meQTL analyses, uncovering previously uncaptured relationships between genetic variant and gene expression modulated by DNA methylation.

The characteristics of meCpGs, genes and SNPs involved in memo-eQTLs

Among the 1063 memo-eQTLs, there are 352, 749, and 847 unique meCpGs, genes, and SNPs, respectively. Hereafter, they are termed as eCpGs, eGenes, and eSNPs (Fig. 3A). Notably, an eCpG can modulate the relationships of up to 37 pairs of eSNP and eGene, and an eGene can also be associated with as many as six combinations of eCpG and eSNP, indicating possible dominance or additive modulation effects by eCpGs (Fig. 3A). To explore the biological processes and functions regulated by memo-eQTLs, we performed Gene Ontology (GO) enrichment analysis for all eGenes and found that they are enriched on chr6p21 and over-represented in several immune-related Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Fig. 3B, C), especially in the antigen processing and presentation pathway, crucial for adaptive immunity [17]. Nevertheless, given the high linkage disequilibrium in the MHC region on chr6p21 [18], further investigations are needed to pinpoint which eGene corresponds to the associated eSNPs. It is worth noting that the eGene SRD5A3 was reported as a risk gene for prostate cancer in our transcriptome-wide association study (TWAS) analysis using two distinct prostate cancer GWAS studies (Additional file 5: Fig. S3A) [19,20,21]. These findings suggest that our memo-eQTLs can enhance the comprehension of genetic regulatory mechanisms underlying risk genes associated with specific traits and diseases.

Fig. 3figure 3

Characteristics eCpG, eGene, and eSNP for memo-eQTLs. A The occurrence of eCpG (left), eGene (middle), and eSNP (right) in 1063 memo-eQTLs. B Enrichment of eGenes in various chromosome regions. C Enriched KEGG pathways for eGenes located in chr6p21

Of the eSNPs identified, 63 were also reported as meQTLs, indicating that the same SNP could simultaneously influence both DNA methylation levels and gene expression (Additional file 2: Table S2 and Additional file 5: Fig. S3B). The intricate mechanism underlying this dual regulatory role on gene expression could involve modulation and mediation through changes of DNA methylation at corresponding CTCF binding sites. However, thorough understanding of these simultaneous effects necessitates further in-depth investigation. Moreover, among the 847 unique eSNPs, we found 30 have been previously reported to be associated with 19 types of traits or diseases, such as hypertension and Alzheimer’s disease in GWAS studies (Additional file 3: Table S3) [1]. An additional 206 eSNPs were found to be in high LD with significant risk SNPs in GWAS studies (“Methods”). These results underscore that memo-eQTLs can complement canonical eQTLs, aiding in interpreting associations between SNPs and traits or diseases detected by GWAS.

eCpG-CTCF-based chromatin loop can either insulate or enhance the regulatory interaction between eSNP and eGene

In a prior study, we demonstrated that a high CpG methylation level can impede CTCF binding and the formation of a 3D chromatin loop. This allows for cross-talk between a SNP and its target genes. Conversely, low CpG methylation levels correlate with increased CTCF binding and creation of the 3D loop, which acts as an insulator blocking the interplay between the SNP and target genes [7]. To systematically assess the underlying regulatory mechanism for memo-eQTLs, we categorized them into four groups based on whether significant associations between eSNPs and eGenes were observed in subsamples with either high or low eCpG methylation levels: sigHigh, sigLow, sigBoth, and sigNone memo-eQTLs (“Methods”). To simplify the categorization, we excluded the 32 memo-eQTLs with meCpG sites positively correlated with CTCF binding, resulting in 394 sigHigh, 361 sigLow, 109 sigBoth, and 167 sigNone memo-eQTLs (Fig. 4A).

Fig. 4figure 4

Mechanisms investigation for memo-eQTLs. A Stratified four subgroups of memo-eQTLs based on the p-values of M1 models for relatively high (relH) and low (relL) subsamples of the eCpG site, 0.05 was used as the significance cutoff. B The effect size and direction of eSNPs on eGenes in eCpG relH and relL subsamples for the sigBoth memo-eQTLs. C Comparisons of the Spearman correlation coefficients for meCpG-CTCF pairs across four memo-eQTL groups (Wilcoxon rank-sum two-sided test: ns: not significant). D The number of 22Rv1 HiChIP data derived CTCF loops that overlapped with the eSNP-eCpG-eGene loci. E The illustration plot of the overlapping patterns between eSNP-eCpG-eGene loci and eCpG-CTCF loops. F The overlapping patterns between the eSNP-eCpG-eGene loci and eCpG-CTCF loops derived from 22Rv1 HiChIP data for the four memo-eQTL groups (chi-squared test: p = 1.68 × 10.−3)

Interestingly, we observed that eSNPs tend to have opposite relationships with eGenes in subsamples with relatively high and low eCpG methylation levels, particularly in the sigBoth memo-eQTLs (Fig. 4B and Additional file 5: Fig. S4A). This finding suggests that the meCpG modulation effect not only determines the presence of the interaction between genetic variant and gene expression but also alters the direction of their cross-talk. When delving into the distances among eSNP, eCpG, and eGene across four types of memo-eQTLs, we found no significant differences in the pairwise linear proximities (Additional file 5: Fig. S4B and Additional file 4: Table S4). Additionally, the correlations between meCpG and CTCF binding were consistent across all four groups (Fig. 4C). Together, these results suggest that eCpG primarily modulates the interplay between eSNP and eGene by influencing CTCF-based chromatin organization, thereby altering their spatial proximity.

To validate this hypothesis, we employed the CTCF-based 3D chromatin interaction data from two prostate cancer cell lines, 22Rv1 and VCaP, as well as the prostate epithelial cell line RWPE-1 to examine the spatial relationship among eCpG, eGene, and eSNP (“Methods”). In analyzing the 3D chromatin interaction data from 22RV1, we found that the eSNP-eCpG-eGene loci of 997 memo-eQTLs coincided with CTCF loops. Notably, 99.8% of these regions intersected with more than one loop (Fig. 4D). We further identified 756 eCpGs sites located in the anchor sites of 3879 CTCF loops, which allowed us to directly assess their modulation effects through 3D structure alteration. To simplify the analysis, we focused on 128 eCpGs that overlapped with a single CTCF loop anchor site, which we termed as eCpG-CTCF loop. There were only nine eCpG-CTCF loops fully embedded in corresponding eSNP-eCpG-eGene loci, while the remaining 119 eCpG-CTCF loops partially intersected with the eSNP-eCpG-eGene loci. These two groups were termed as “within” and “cross” eCpG-CTCF loops, respectively, as shown in Fig. 4E. We then extended the analysis using 3D chromatin interaction data from VCaP and RWPE-1. Overall, we identified 36, 16, and 5 sigHigh memo-eQTLs with Cross eCpG-CTCF loops derived from 22Rv1, VCaP, and RWPE-1, respectively (Fig. 4F, Additional file 5: Fig. S4C, D and Additional file 4: Table S4), suggesting the corresponding eCpGs may block the formation of CTCF loops, which could act as insulators for the interplay between eSNP and eGene. In contrast, we identified 3 and 23 sigLow memo-eQTLs with Within eCpG-CTCF loops derived from VCap and RWPE-1 (Additional file 5: Fig. S4C, D and Additional file 4: Table S4), implying the corresponding eCpGs might promote the formation of CTCF loops, which could enhance the interaction between the eSNP and eGene by bringing them physically closer. Although a distinct preference among the different types of memo-eQTLs for intersecting with eCpG-CTCF loops was not apparent (Fig. 4F and Additional file 5: Fig. S4D), we found that meCpG modulated CTCF-based chromatin 3D organization could either insulate or enhance the cross-talk between genetic variant and gene expression. Yet, these intricate mechanisms warrant further exploration.

留言 (0)

沒有登入
gif