Abundant transcriptomic alterations in the human cerebellum of patients with a C9orf72 repeat expansion

Gene expression

We performed RNAseq on 193 subjects. Overall, no outliers were detected, and samples did not separate by group using PCA (Online Resource Fig. 1a). When using a linear regression model that accounted for sex, gene count, RIN, age, plate, and differences in cellular proportions, we identified 6911 differentially expressed genes (FDR < 0.05) among groups (Online Resource Table 2). Of those genes, 12 have previously been associated with ALS and/or FTLD, including C9orf72 (FDR = 3.65E-09; Fig. 1d), CDH22 (FDR = 2.28E-05), TARDBP (FDR = 0.011), CAMTA1 (FDR = 0.034), FUS (FDR = 0.042), and OPTN (FDR = 0.046).

Fig. 1figure 1

Differential expression analyses. ac Volcano plot of differentially expressed genes. a C9orf72 expansion patients compared to controls (c9 vs. control). b C9orf72 expansion patients compared to non-C9orf72 expansion patients (c9 vs. non-c9) c, and non-C9orf72 expansion patients compared to controls (non-c9 vs. control). Significantly differentially expressed (FDR < 0.05) homeobox genes are displayed. The x-axis shows fold-changes and the y-axis -log10 of P Values. Fold-changes are scaled to 0 for visualization purposes. d, e Boxplots of 2 differentially expressed genes including d C9orf72 (FDR = 3.65E-09) and e SHOX2 (FDR = 2.15E-21). Residual expression including markers for cell types are displayed. Boxplots represent the median with interquartile range (IQR)

Additionally, we completed 3 pairwise differential expression analyses, which included patients with a C9orf72 repeat expansion and controls (c9 vs. control), patients with a C9orf72 repeat expansion and patients without a C9orf72 repeat expansion (c9 vs. non-c9), and patients without a C9orf72 repeat expansion and controls (non-c9 vs. control). We found 3383 differentially expressed genes in the c9 vs. control analysis (Fig. 1a, Online Resource Fig. 1b, Online Resource Table 2), 1369 differentially expressed genes in the c9 vs. non-c9 analysis (Fig. 1b, Online Resource Fig. 1b, Online Resource Table 2) and 1037 differentially expressed genes in the non-c9 vs. control analysis (Fig. 1c, Online Resource Fig. 1b, Online Resource Table 2).

Using the top 50 differentially expressed genes from each pairwise comparison, we performed pathway enrichment analysis and found that in the c9 vs. both non-c9 and control, differentially expressed genes were enriched for pattern specification and regionalization (Online Resource Table 3). Genes were not enriched for any biological processes in the non-c9 vs. control comparison (Online Resource Table 3). Our findings align with a previously reported upregulation of HOX genes in the c9 subjects compared to both non-c9 and control subjects [24]. Notably, the gene SHOX2 (FDR = 2.15E-21) was one of the top differentially expressed genes in all models tested (Fig. 1a, b, e).

To identify important similarities and differences between brain regions, we then compared these differential expression results to our previously published frontal cortex study [19]. In total, 1957 of the 6911 (28%) differentially expressed genes overlapped between these two regions (Online Resource Fig. 1d), including C9orf72, TARDBP, CAMTA1, FUS, and OPTN. We did notice that HOX genes were detected only in the cerebellum, such as SHOX2, HOXC12, HOXA5, and HOXC13.

Hereafter, we compared cellular proportions using marker genes selected for 5 major cell types in the brain (neurons, microglia, astrocytes, oligodendrocytes, and endothelial cells) that were included in differential expression analyses. We detected no significant changes in neurons, microglia, astrocytes, or endothelial cells in c9 patients compared to non-c9 patients or controls (Online Resource Fig. 2a). We did, however, find an overall change in the estimated proportion of oligodendrocytes between groups (e.g., c9, non-c9, and control, P Value = 6.74E-06). When examining each of the 3 pairwise comparisons, we noticed that there was a decrease in the proportion of oligodendrocytes in the c9 group compared to both non-c9 (P Value = 1.80E-05; Online Resource Fig. 2a) and control groups (P Value = 7.52E-05; Online Resource Fig. 2a). To confirm this reduction, we performed an exploratory immunostaining experiment using oligodendrocyte lineage cell marker OLIG2 [36, 39, 50, 60] in 12 cerebellar samples (Online Resource Fig. 2b). We selected 6 samples with a relatively high proportion of oligodendrocytes, and 6 with a fairly low proportion of oligodendrocytes, as suggested by our deconvolution analysis. Our exploratory experiment revealed that the number of OLIG2 + cells was significantly increased in the high group compared to the low group (P Value = 0.041; Online Resource Fig. 2c).

Co-expression analysis

Next, we looked to identify modules of correlated genes that may differ between disease groups using co-expression analysis. In the c9 vs. control analysis, we identified 24 total modules (Online Resource Fig. 3a, Online Resource Table 4a). Of these 24 modules, 7 were significantly associated with disease group (c9; Table 2). Pathway analysis of the downregulated disease-associated modules revealed involvement in small molecule metabolic processes (Blue; Online Resource Fig. 4a–b, Table 2) and mitochondrion organization (Yellow), while the upregulated disease-associated modules were involved in pattern specification (DarkRed; Online Resource Fig. 4c), protein localization (Turquoise), protein modification (Brown), nucleosome assembly (DarkGreen), and defense response (MidnightBlue; Table 2). Interestingly, the Blue module, enriched for genes involved in metabolic processes, included C9orf72 as a hub gene, similar to a module we observed in the frontal cortex [19]. Furthermore, the DarkRed module (Online Resource Fig. 4c), enriched for genes in pattern specification (Table 2), contained many of the HOX family genes that were differentially expressed (e.g., SHOX2). We performed PCA using the residual expression of genes belonging to this module and found separation of c9 samples and a subset of non-c9 samples (Online Resource Fig. 4d).

Table 2 Disease-associated modules

In the c9 vs. non-c9 analysis, we identified 25 modules (Online Resource Fig. 3b, Online Resource Table 4b). Of these 25 modules, 9 were significantly associated with the c9 disease group (c9; Table 2). Pathway analysis showed that the downregulated disease-associated modules were involved in RNA localization (Pink), regulation of apoptosis (Yellow), vasculature development (LightGreen), neuron ensheathment (Salmon), actin filament organization (DarkTurquiose), and nervous system process (Green), while the upregulated disease-associated modules were involved in pattern specification (DarkRed), chromatin organization (Orange), and mRNA processing (RoyalBlue; Table 2). The DarkRed module was similar to what was shown in the c9 vs. control analysis, also containing SHOX2 and many other HOX family genes.

In the non-c9 vs. control analysis, we identified 21 modules (Online Resource Fig. 3c, Online Resource Table 4c). Of these 21 modules, 5 were significantly associated with the disease group (non-c9). Pathway analysis showed that the downregulated disease-associated modules were involved in small molecule metabolic processes (Blue) and synaptic signaling (Orange), while the upregulated disease-associated modules were involved in vasculature development (MidnightBlue), RNA binding (Turquoise), and mRNA processing (Brown).

Our enrichment results highlighted the importance of metabolic, mRNA, and protein processing pathways as relevant dysregulated processes. More specifically, when focusing on c9-associated modules (7 c9 vs. control, 9 c9 vs. non-c9), 7 contained multiple known disease-associated genes. For example, in the c9 vs. control analysis, the Blue module (downregulated; Online Resource Fig. 3a–b, Online Resource Table 4a) contained 19 disease-associated genes, including C9orf72, NEK1, FUS, TBK1, and TMEM106B. Additionally, the Brown module (upregulated) contained 12 disease-associated genes, including EPHA4, GRN, MAPT, OPTN, CHMP2B, and UNC13A. Moreover, in the c9 vs. non-c9 analysis, the RoyalBlue module (upregulated) contained 7 disease-associated genes, some of which include MAPT, OPTN, SOD1, and TARDBP, emphasizing the involvement of known disease-associated genes in these modules (Online Resource Tables 4a, b).

Furthermore, we performed a module preservation analysis between the modules identified in the current study and those from our previous frontal cortex study [19]. We considered a module preservation score > 10 an indication of preservation. There was substantial preservation of cerebellar modules in the frontal cortex across all 3 pairwise analyses (Online Resource Fig. 5a–c). In fact, most cerebellar modules were preserved in the frontal cortex. As an example, we noticed several c9-associated modules that were strongly preserved, including the Blue module involved in small molecule metabolic processes and the Brown module involved in protein modification. Notably, the DarkRed module enriched for pattern specification was not preserved (Online Resource Fig. 5a, b).

Differential splicing

We used LeafCutter to perform differential splicing analysis. We found 815 differentially spliced intron clusters (FDR < 0.05, Δ PSI >|10%|) corresponding to 761 unique genes in the c9 vs. control comparison, 101 differentially spliced intron clusters corresponding to 97 unique genes in the c9 vs. non-c9 comparison, and 123 differentially spliced intron clusters corresponding to 121 unique genes in the non-c9 vs. control comparison (Fig. 2a, Online Resource Tables 5a–c). We first looked at differential splicing of disease-associated genes. We observed 4 differential splicing events in these genes (Fig. 2c, d). This includes an exon skipping event in CAMTA1 (c9 vs. control, FDR = 5.50E-06; Fig. 2c, Online Resource Fig. 6a), an exon skipping event in DCTN1 (c9 vs. control, FDR = 1.32E-04; non-c9 vs. control, FDR = 0.002; Fig. 2c, Online Resource Fig. 6b), increased inclusion of an annotated exon in SS18L1 (c9 vs. control, FDR = 6.55E-05; Fig. 2c, Online Resource Fig. 6c), and a complex cryptic splicing event in C9orf72 (c9 vs. non-c9, FDR = 7.31E-10; Fig. 2d, Online Resource Tables 5a–c). We completed pathway enrichment analysis using all genes that contained a differential splicing event from each comparison. In the c9 vs. control analysis, we found enrichment for pathways involved in regulation of RNA splicing (Fig. 2b, Online Resource Table 6). No significant enrichments were present in the other 2 pairwise comparisons (Online Resource Table 6). In general, there did not appear to be a unifying feature among differential splicing events (e.g., based on intron length, gene length, location within gene, and GC content).

Fig. 2figure 2

Differential splicing analysis. a Upset plot to demonstrate number of genes that contain at least one differential splicing event for each comparison. b Volcano plot of differentially spliced intron junctions (FDR < 0.05) for C9orf72 expansion patients (c9) compared to controls (c9 vs. control). The x-axis shows the change in percent spliced in (Δ PSI) and the y-axis shows -log10 of the P Value. Genes that contain differentially spliced clusters in significantly enriched pathways are displayed. c Barplot of PSI for the 3 differentially spliced ALS/FTLD associated genes in the c9 vs. control analysis. d Sashimi plot of the cryptic splicing event in C9orf72 for c9 patients and non-C9orf72 expansion patients (non-c9). Pink lines represent cryptic splice junctions as determined by LeafCutter and solid red lines represent annotated splice junctions. Numbers represent the average inclusion for that splice junction

Next, we looked specifically at alternative splicing of cassette exons, as skipping of these exons has been shown to be dysregulated previously in the cerebellum of c9 patients [67]. We calculated percent inclusion of the cassette exons as described elsewhere [7]. Using our differential splicing criteria (FDR < 0.05, Δ PSI >|10%|), we found 303 differentially spliced cassette exons in the c9 vs. control analysis, 37 differentially spliced cassette exons in the c9 vs. non-c9 analysis, and 81 differentially spliced cassette exons in the non-c9 vs. control analysis (Fig. 3a, Online Resource Table 7a–c). Similar to previous reports, there was a relatively high number of exon skipping events in c9 patients, with 285 skipped cassette exons in the c9 vs. control analysis (94% of differentially spliced cassette exons) and 34 in the c9 vs. non-c9 analysis (92% of differentially spliced cassette exons).

Fig. 3figure 3

Cassette exon differential splicing. a Scatter plots of differentially spliced cassette exons. Left: Scatter plot displaying the percent spliced in (PSI) for control samples on the x-axis and change in PSI (Δ PSI) on the y-axis comparing C9orf72 expansion patients (c9) to controls (c9 vs. control). Middle: Scatter plot displaying the PSI for non-C9orf72 expansion patients (non-c9) on the x-axis and Δ PSI on the y-axis comparing c9 patients vs. non-c9 patients (c9 vs. non-c9). Right: Scatter plot displaying the PSI for control samples on the x-axis and Δ PSI on the y-axis comparing non-c9 patients to controls (non-c9 vs. control). Cryptic cassette exons are colored red and skiptic cassette exons are colored blue. b Boxplots of cassette exon percent inclusion for the 4 cryptic cassette exons identified in the c9 vs. control analysis, which includes ODF2L (FDR = 0.028), AGL (FDR = 0.002), MTX2 (FDR = 0.002), and MAP4K3 (FDR = 1.39E-06). Boxplots represent the median with interquartile range (IQR)

Cryptic splicing

We were particularly interested in cryptic splicing, given evidence of its dysregulation across the ALS/FTLD disease spectrum [28, 41, 51, 52, 56, 68]. We classified cryptic splicing events using 2 different methods (see methods for further details). First, we classified cryptic and skiptic cassette exons considering the PSI. Using these criteria for the cassette exon analysis, we revealed 4 cryptic cassette exons and 157 skiptic cassette exons in c9 patients and 1 cryptic cassette exon and 44 skiptic cassette exons in non-c9 patients (Fig. 3a) in comparison to controls. The 4 cryptic cassette exons in c9 patients were in the genes ODF2L (FDR = 0.028), AGL (FDR = 0.002), MTX2 (FDR = 0.002), and MAP4K3 (FDR = 1.39E-06; Fig. 3b).

Furthermore, we implemented an annotation-based cryptic splicing approach using LeafCutter output. Cryptic splicing events in this analysis were identified regardless of the PSI in a given group. We included “cryptic five prime”, “cryptic three prime”, and “cryptic unanchored” LeafCutter assignments in our count of cryptic splicing events (Fig. 4a). Overall, we identified 98 cryptic splicing events across all 3 analyses. We first looked at the amount of each type of event present in our samples. In total, there were 48 cryptic five prime splice junctions, 40 cryptic three prime splice junctions, and 10 cryptic unanchored events (Fig. 4a). We next assigned cryptic splicing events to the group that had the highest inclusion of the cryptic junction and looked at the number of these events present in each group. We found 77 cryptic splicing events in the c9 group, 4 cryptic splicing events in the non-c9 group, and 17 cryptic splicing events in the control group (Fig. 4b, Online Resource Table 8).

Fig. 4figure 4

Annotation-based cryptic splicing. a Barplot showing number of differential splicing events for the 3 types of splicing events that we considered to be cryptic by annotation identified across all 3 pairwise comparisons. b Barplot showing the number of cryptic exons assigned to each group. c Sashimi plot of AGL (left) and MAP4K3 (right) cryptic exons. These events were identified in both cryptic cassette and cryptic annotation analyses. Pink lines represent cryptic splice junctions as determined by LeafCutter and solid red lines represent annotated splice junctions. Numbers represent the average inclusion for that splice junction. Control = control subjects, non-c9 = patients without a C9orf72 repeat expansion, c9 = patients with a C9orf72 repeat expansion

Two cryptic splicing events were discovered using both the cryptic cassette and annotation-based analyses in c9 patients, which were in the genes AGL and MAP4K3 (Figs. 3b, 4c, d), increasing our confidence that these novel cryptic cassette exons might be real. For AGL, this cassette exon was included approximately 28% of the time in c9 patients, 16% of the time in non-c9 patients, and 6% of the time in controls. In MAP4K3, this cassette exon was included approximately 13% of the time in c9 patients, 8% of the time in non-c9 patients, and 1% of the time in controls. To validate the presence of these cryptic exons, we completed cDNA sequencing. For each event, we selected 3 individuals with high inclusion of the cryptic exon and 3 individuals with low or no inclusion, based on our RNAseq data. Using cDNA sequencing, we confirmed the presence of these two cryptic cassette exons, validating our results for both AGL and MAP4K3 (Online Resource Fig. 7a, b).

In addition, we selected 2 top cryptic splicing events (STK10 and EFR3A; Fig. 5a, b) from the annotation-based analysis for validation given their significance, relatively large effect size, and decent coverage. To validate these events, we leveraged long-read RNAseq (Iso-Seq) data we recently obtained from the cerebellum for a small number of individuals included in our short-read study (n =  2). In STK10 (c9 vs. control, FDR = 5.14E-07), we showed that the PSI of this cryptic exon was 47% in the c9 group, 33% in the non-c9 group, and 10% in controls. For EFR3A (c9 vs. control, FDR = 8.83E-11), the PSI was 18% in the c9 group, 11% in the non-c9 group, and 3% in controls. For the validation, we included 1 c9 patient who had high coverage of both cryptic splicing junctions in our short-read data (63 reads for STK10 and 13 reads for EFR3A) and 1 control subject who had 0 coverage of these splice junctions in our short-read data (Fig. 5a, b). We visualized the splicing events in these 2 individuals using IGV. First, for STK10, we confirmed that this event is an extension of the canonical five prime splice junction in exon 14 (Fig. 5a). Comparing the 2 subjects at this splice junction, we found that the individual with inclusion of this splicing event in our short-read data had 7 reads covering the cryptic splice junction and 1 covering the canonical splice junction, whereas in the individual who did not have the cryptic event in the short-read data, we had 0 reads covering the cryptic splice junction and 11 reads covering the canonical splice junction in the long-read data (Fig. 5a). For EFRA3, in the c9 patient with high coverage, we found 3 reads mapping to a cryptic exon located between exons 1 and 2, which contained the identical five prime splice junction as observed in the short-read data and a three prime splice junction that was not detected in the short-read data. This was a complex cluster, with 7 unique cryptic splicing junctions in our short-read data, so this exemplified how long-read data may help resolve complex events (Fig. 5b). All 7 reads in our Iso-Seq data in this region mapped to the canonical splice junctions for the control subject.

Fig. 5figure 5

Select cryptic splicing events. a Left: Sashimi plot of STK10 cryptic exon. Pink lines represent cryptic splice junctions as determined by LeafCutter and solid red lines represent annotated splice junctions. Numbers represent percent spliced in for that group. Right: Boxplot of intron junction counts for the cryptic splicing event shown in the sashimi plot. Bottom: Long-read sequencing coverage for 2 samples, 1 c9 patient (red) and 1 control subject (blue). The number represents the number of reads spanning that splice junction. b Left: Sashimi plot of EFR3A cryptic exon. Pink lines represent cryptic splice junctions as determined by LeafCutter and solid red lines represent annotated splice junctions. Numbers represent percent spliced in for that group. Right: Boxplot of intron junction counts for the cryptic splicing event shown in the sashimi plot. Bottom: Long-read sequencing coverage for 2 samples, 1 c9 patient (red) and 1 control subject (blue). Control = control subjects, non-c9 = patients without a C9orf72 repeat expansion, c9 = patients with a C9orf72 repeat expansion. Boxplots represent the median with interquartile range (IQR)

To demonstrate that we could validate skiptic splicing events too, we selected top skiptic cassette exons, which were present in the genes CEP170 (c9 vs. control, FDR = 9.02E-12) and HSPH1 (c9 vs. control, FDR = 7.20E-06), based on their significance, effect size, and coverage. For both events, we detected a high coverage of the skipped exon in our c9 patient and low coverage of the exon skipping in our control subject in the short-read data (Online Resource Fig. 8a). Our long-read data also confirmed these skiptic exons, where in CEP170 the c9 patient had 7 reads covering the skipped junction, while the control had 0 reads covering the skipped junction, and in HSPH1 where the c9 patient had 36 reads covering the skipped junction and the control showed 0 reads covering the skipped junction (Online Resource Fig. 8b). This already demonstrates the value of Iso-Seq data, which was able to validate both skiptic and cryptic splicing events that were present in c9 patients, while also helping to unravel some of the complexity that we observed in the short-read splicing data.

Finally, because aggregated TDP-43 is generally not present in the cerebellum, we examined the expression of TARDBP, the gene that encodes TDP-43, and additional genes that encode RNA-binding proteins, including some known to be sequestered by RNA foci [6, 8, 14, 15, 46]. We noticed that 15/32 (47%) were differentially expressed between groups, such as TARDBP (FDR = 0.011), HNRNPA1 (FDR = 0.013), HNRNPF (FDR = 0.007), SRSF1 (FDR = 0.021), and ALYREF (FDR = 0.043; see Online Resource Table 2). We then checked if their expression correlated with the inclusion of any of the cryptic splicing events we identified. We, therefore, performed a correlation analysis between the residual expression of selected genes and the junction ratio of the cryptic splicing events. We identified 335, 53, and 64 significant correlations in the c9 vs. control, c9 vs. non-c9, and non-c9 vs. control analyses, respectively, when including annotation-based cryptic events (Fig. 6a–c). Interestingly, we found a consistent pattern, where a select set of genes, which includes TARDBP, HNRNP2AB1, SRSF1, and PCBP1, were positively correlated with cryptic splicing event inclusion in c9 patients, and another set of genes, which includes HNRNPD, FUS, HNRNPK, and HNRNPA1, were negatively correlated with cryptic splicing event inclusion in c9 patients (Fig. 6a–c). As an example, the STK10 cryptic splice junction was highly correlated with the RNA expression of several RNA-binding proteins, including a positive correlation with TARDBP (c9 vs. control, Bonferroni P Value = 2.34E-07, r = 0.632) and HNRNP2AB1 (c9 vs. control, Bonferroni P Value = 4.85E-07, r = 0.623), and a negative correlation with FUS (c9 vs. control, Bonferroni P Value = 6.24E-04, r = −0.526) and HNRNPD (c9 vs. control, Bonferroni P Value = 1.68E-08, r = −0.660; Online Resource Fig. 9). Other top hits, including cryptic cassette exons as seen in MAP4K3 and MTX2, showed comparable patterns (Online Resource Fig. 10a, b). Notably, these patterns remained similar when not adjusting for cellular composition and using CQN gene expression values (data not shown).

Fig. 6figure 6

Cryptic splicing correlations. ac Correlation heatmaps between cryptic splicing events in the annotation-based analysis and residual gene expression values for select RNA-binding proteins, a when comparing C9orf72 expansion patients to controls (c9 vs. control), b C9orf72 expansion patients to non-C9orf72 expansion patients (c9 vs. non-c9), and c non-C9orf72 expansion patients to controls (non-c9 vs. control). Colors represent Pearson’s correlation coefficient. Stars indicate significant correlations after correction for multiple testing (Bonferroni P Value < 0.05). Differentially expressed genes are bolded

留言 (0)

沒有登入
gif