Exploratory analysis of L1 retrotransposons expression in autism

Expression of evolutionarily young L1 subfamilies is detected in postmortem brains, in in vitro models and in blood.

L1 transcriptional deregulation is observed in multiple human disorders including ASD [46, 47], through mechanisms that are not yet well understood. To explore the potential impact of L1 elements in ASD, we assessed whether and which L1 subfamilies are expressed in samples and experimental models of ASD. We retrieved public bulk RNA-seq data from postmortem ASD brains (Velmeshev et al. dataset [48]). The dataset consists of a total of 31 individuals (from 15 ASD donors and 16 controls) with expression profiling of anterior cingulate cortex (ACC) and prefrontal cortex (PFC). Data from both tissues were available for 10 individuals (7 ASD and 3 controls), while for the remaining 21 individuals (8 ASD and 13 controls) expression data were produced from either ACC or PFC. Therefore, analyses were performed on a total of 41 RNA-seq derived samples, 18 for ACC (9 ASD and 9 controls) and 23 for PFC (13 ASD and 10 controls) (Additional file 1: Tables S1, S2). We also retrieved RNA-seq data from induced pluripotent stem cell (iPSC) and neuronal cells knockout (KO) for ten ASD-relevant genes (AFF2, ANOS1, ASTN2, ATRX, CACNA1C, CHD8, DLGAP2, KCNQ2, SCN2A and TENM1) (Deneault et al. dataset, [49]). The dataset comprised a total of 87 RNA-seq samples, 42 from iPSCs and 45 from neurons. KOs were performed for each ASD-relevant gene through a CRISPR gene editing method to insert premature termination sites (Additional file 1: Tables S1, S3). Furthermore, we used RNA-seq data from whole blood of 73 pairs of discordant ASD siblings for a total of 146 samples (Additional file 1: Tables S1, S4).

FASTQ files from each sample were mapped on the human genome (Additional file 1: Table S5). To quantify L1 expression, two softwares were used: TEspeX [56] and SQuIRE [55]. TEspeX measures TEs expression not counting reads possibly generated from TE fragments embedded in the exons of annotated canonical transcripts. It cumulatively quantifies the expression of TE subfamilies annotated in RepeatMasker [31] by counting the sequencing reads mapped to the consensus sequence of each TE subfamily. SQuIRE [55] allows to count reads mapped to each genomic locus where a TE is annotated. Both tools should enrich for the uniquely mapped reads; however, a portion of multimapping reads is likely taken into account for quantification. We measured the expression of a total of 117 human L1 subfamilies. First, we used TEspeX to determine which L1 subfamilies are more expressed independently from the specific host transcripts and genomic locus. We then exploited SQuIRE to pinpoint which specific L1 fragments belonging to the most expressed subfamilies are transcribed at the locus-specific level. For each dataset, we defined as expressed all L1s fragments with an average of at least 200 reads mapped among all samples. According to our analysis, L1HS was the L1 subfamily with the highest expression level in all datasets, followed by several L1PA subfamilies (Additional file 2: Table S6). Figure 1A shows the sum of the average normalized expression of specific grouping of L1 subfamilies. In ASD brain samples, KO cell lines and all their controls, L1HS and L1PA subfamilies made up ~ 80–90% of the total normalized L1 expression, while the other L1 subfamilies cumulatively showed a modest level of expression (Fig. 1A, Additional file 2: Tables S6, S7). On the other hand, L1HS and L1PA subfamilies constituted only about 35–40% of the total normalized L1 expression in the blood dataset, for both ASD and controls (Fig. 1A).

Evolutionarily young TEs are expected to show a lower number of mutations which might increase their probability to be transcribed compared to older TEs. A measure of the evolutionary age of TEs is measured as base mismatches in parts per thousand (MilliDiv) with respect to its consensus sequence. We retrieved MilliDiv values for each L1 fragment annotated in the human genome from RepeatMasker [31] and calculated the average MilliDiv for each subfamily. As expected, the average MilliDiv for L1HS and L1PA fragments was significantly lower compared to that of the fragments of other L1 subfamilies and, interestingly, the average MilliDiv for the group of expressed L1 elements was lower than the corresponding one of the same subfamily containing also the non-expressed ones (Fig. 1B). We therefore decided to focus our analyses on L1HS and 20 L1PA subfamilies.

L1 elements are upregulated in postmortem brains from a subset of ASD individuals and ASD model samples

To assess whether any of the analyzed ASD sample would show altered expression of young L1 elements, we measured the expression of L1s annotated in the human genome in all samples with SQuIRE [55]. We then performed L1-related differential expression analysis comparing ASD samples versus controls. To reduce the risk of quantifying the expression of small exonized L1 fragments whose expression can be a consequence of the transcription of their host gene, we limited our analysis on likely full-length (> 5 kbp) fragments showing an average of at least 200 reads mapped among samples of each dataset and belonging to L1HS and L1PA subfamilies. A total of 128,506 L1HS and L1PA fragments are annotated in RepeatMasker [31]; of these 9077 are more than 5 kbp long. From SQuIRE analysis, we classified 100 L1s from Velmeshev et al. and 175 L1s from Deneault et al. as expressed (> 200 average reads mapped among all samples) (Additional file 3: Tables S8, S9). Of them 36 were commonly expressed in both datasets.

We then performed L1 expression analyses on each single individual. Briefly, the number of reads mapped to each L1 fragment was first normalized with DEseq2 [58]. To infer for differential expression in each single individual, in the absence of replicates, we calculated a z score for each L1 fragment comparing its expression level in each sample against the distribution of its expression levels in the group of controls. We therefore chose to analyze every single subject independently, avoiding to aggregate the disease subjects and considering them as a group of replicates. The reason behind this choice relies on the fact that ASD cases represent a heterogeneous cohort and aggregate analyses often tends to mask the heterogeneity of the molecular phenotypes. Although this analysis allows the identification of specific patterns in single subjects, results should be considered as exploratory and to be validated on bigger cohorts because z score analysis is more powerful with bigger sample sizes. For each sample, we classified as potentially upregulated those L1 fragments characterized by a z score higher than 3, and as potentially downregulated those L1 fragments characterized by a z score lower than -3. We detected a total of 2196 cases of expressed L1 upregulation and 315 cases of downregulation in the analyzed samples (Additional file 3: Table S10). We then calculated the net number of expressed fragments by subtracting the number of downregulated L1s to the number of upregulated L1s for each sample. Overall, ASD brain and KO samples always showed a positive net number of expressed L1s compared to controls. In most control samples, the net number of expressed L1s was either null or very low but positive, except for two control samples of the Velmeshev et al. ACC dataset, which showed a moderately negative value (Fig. 1C). In the case of the Velmeshev et al. dataset, three ASD samples in the ACC showed a considerably higher amount of L1 net expression compared to all other samples. These samples showed a net number of about 60 upregulated L1s, while all other samples did not present more than 25 net potentially upregulated L1s (Fig. 1C). On the other hand, no PFC sample showed a substantially increased number of upregulated L1s compared to the other samples. Furthermore, the three samples characterized by strong L1 upregulation (SRR9292614, SRR9292620, SRR9292621) when compared to all other samples in ACC did not show the same trend in their PFC counterpart (SRR9292613, SRR9292619, SRR9292623). Therefore, the observed L1 upregulation probably occurs only in specific areas of the brain.

In order to rule out the possibility that these results could arise from quality or technical biases specific of the three significant samples (SRR292614, SRR292620, and SRR292621), we calculated several metrics based on reads mappings (Additional file 1: Table S5) and evaluated the z score for each set of normalized reads in order to detect potential mapping outliers. Samples SRR292614, SRR292620 and SRR292621 did not show any significant z score in any of the metrics suggesting that the results associated with these samples were not resulting from potential biases. These samples showed a relatively high and positive z score related to the number of reads mapping within introns, with the sample SRR9292620 showing a statistically significant positive z score. Rather than representing a technical artefact, in the absence of other outlier measures, we believe this result could indicate a potential alteration of splicing that will be further discussed in the remaining part of this study.

In the Deneault et al. dataset, both iPSC and differentiated neurons KO for ATRX showed a substantially higher average net number of expressed L1s (about 60) compared to the other KOs, which never exhibited a value higher than 30 (Fig. 1C). Of note, in all KO samples the average net number of expressed L1 compared to controls was positive. We interpret this result as an indication of L1 upregulation, suggesting that the loss of function of ASD-related genes might lead to an increase in L1 transcriptional activity.

To determine the magnitude of differential expression related to L1s, we calculated the log2 ratio between the normalized expression of each L1 in each sample and the average normalized expression of controls, separately for each sample of each dataset analyzed (Additional file 3: Table S10). The average log2 fold change of DE L1s is about 1.2 (2.3-fold on linear scale), median 1.0 (twofold on linear scale). We also retrieved the genomic coordinates of three subgroups of L1s from the L1Base2 database [62] and calculated the overlap of the genomic coordinates of expressed L1s with the coordinates of the L1Base2 annotations (see ‘Methods’ section). 97 out of 100 L1s expressed in the Velmeshev et al. dataset and 162 out of 175 expressed L1s in the Deneault et al. dataset matched with the FlnI-L1 (full-length non-intact) set. Only 1 L1 expressed in the Deneault dataset matched with the ORF2-L1 group and 1 from the Velmeshev dataset and 6 from the Deneault dataset matched with the FLI-L1 (full-length intact). Overall, more than 90% of the expressed L1s overlap elements in the L1Base2 database and can be considered full length, albeit only about 5% of the expressed L1s retain its coding potential. These results suggest that the potential cellular effects of L1 dysregulation may occur mostly at the RNA level.

Finally, we performed the same L1 expression analyses on a total of 146 RNA-seq whole blood samples derived from individuals affected by ASD and their unaffected siblings. However, no L1 fragment showed a z score above 3 in any ASD or control sample. We then plotted the cumulative expression of L1 subfamilies quantified with TEspeX. Overall, all blood samples (ASD and controls) showed similar total L1HS/L1PA expression levels (Additional file 4: Fig. S1A). Furthermore, the distribution of the ratio of the total L1HS and L1PA normalized expression between each pair of discordant siblings (ASD/Control) showed no alterations of L1 levels in the blood of discordant siblings (Additional file 4: Fig. S1B).

Taken together, these results suggest that only a subset of ASD subjects may present a strong increase in the transcriptional activity of young full-length L1 elements, probably in specific brain regions.

Evaluation of L1 expression in the context of mutated genes

iPSCs and differentiated neurons KO for ATRX displayed a pervasive upregulation of young FL L1 elements, while the KO of nine other ASD-relevant genes led to a more limited increase of L1 expression. Hence, we speculate that the loss of function of specific genes may differently affect L1 transcriptional regulation. ATRX is a SFARI gene that encodes for a protein which contains an ATPase/helicase domain belonging to the SWI/SNF family of chromatin remodeling proteins [52, 68]. It has been demonstrated that ATRX knockout in mouse ES cells causes increased chromatin accessibility at genomic loci occupied by retrotransposons [68] suggesting that this gene has a direct role in the establishment and maintenance of heterochromatin also at the level of L1s. To validate this hypothesis, we retrieved ChIP-seq data from a recent publication [52], where the authors performed a comprehensive ChIP-sequencing on ATRX in human cell lines. To assess whether ATRX may specifically act on genomic regions occupied by the FL L1s that were upregulated in our analysis, we computed a profile of ATRX ChIP-seq data separately on the genomic coordinates of (1) the expressed and (2) the upregulated L1 loci compared to controls L1HS/L1PA, as well as of (3) one thousand of 6 kbp long genomic coordinates randomly selected within annotated introns as controls. The profile was computed with DeepTools [60] computeMatrix utility and plotted with plotProfiles. In both iPSC and neurons, expressed and upregulated L1HS/L1PA showed a binding enrichment compared to both their up/downstream boundaries and to random intronic segments (Fig. 1D). Interestingly, the upregulated L1s showed a decrease of ATRX binding in the 5’UTR and a corresponding increase in the binding at the 3’UTR when compared to the set of expressed L1s. This binding pattern has been previously observed in correspondence of zinc finger genes and it warrants further investigation [52]. Together with the strong L1 upregulation observed in ATRX KO cells, these results add support to the fact that ATRX loss of function might lead to an increased transcription of young FL L1 elements.

Transcriptome-wide deregulation in ASD postmortem brain and in vitro differentiated neurons correlates with L1 upregulation

Altered gene expression is currently considered as one of the main molecular manifestations of ASD [14,15,16, 18]. Since in our analysis a specific subset of ASD samples showed strong L1 upregulation in independent datasets, we investigated whether altered L1 expression was associated with gene expression alteration. To define differentially expressed (DE) genes, we used the same approach applied to FL L1HS/L1PA fragments, which allowed us to characterize each subject individually in comparison with controls in absence of replicates. We applied this method only to genes associated with an average number of mapped reads above 200 across samples of each dataset. We refer to this subset of genes as expressed genes. Genes were considered DE if associated with a z score higher than 3 or lower than -3. Overall, the number of DE genes was not homogeneous among samples, with a subset of ASD samples showing a number of DE genes very close to controls (Fig. 2A). However, samples showing stronger L1 upregulation were among the samples with the higher number of DE genes, especially in postmortem brain and in vitro differentiated neurons demonstrating a significant correlation between L1 expression and the number of differentially expressed genes in the ACC dataset (Fig. 2A, B). For example, sample SRR9292620 was associated with both the highest number of net expressed L1s and with the highest number of DE genes. Furthermore, the top six samples of the Velmeshev et al.—ACC dataset with the highest L1 net expression, were at the same time the six samples with the highest number of DE genes. Similarly, in vitro differentiated neurons that were KO for ATRX, TENM1, KCNQ2 and DLGAP2 showed the highest L1 net expression and gene dysregulation.

Fig. 2figure 2

Transcriptome-wide level of deregulation in ASD brain and in vitro neurons correlates with L1 upregulation. A Total number of dysregulated genes (|z score|> 3) for all samples. In the case of the gene KO datasets, the average and standard deviation of the number of dysregulated genes among replicates has been plotted. B Correlation between the number of DE genes and the number of upregulated L1 in log2. C Number of DE genes (FDR < 0.05, |LogFoldChange|> 0.5) resulting from DEseq2 differential expression analysis

In order to further explore this result, we carried out gene DE analysis with DEseq2 [58] by exploiting the technical replicates of KO iPSC and neuronal cell lines. To this end, the number of DE genes was plotted for all KOs (FDR < 0.05, |Log2FoldChange|> 0.5). While we observed transcriptional deregulation for all KOs in iPSCs, only KO samples for ATRX showed a considerable number of DE genes in differentiated neurons (Fig. 2C). Given that the very same samples showed the strongest L1 RNA upregulation, consistently with what we observed in the z score analysis, these results suggest that FL L1HS and L1PA elements might be associated with gene deregulation in mature neurons of ASD subjects.

Upregulated L1 elements are mainly expressed within introns of actively transcribed genes

Given the original observation that gene deregulation was generally stronger in postmortem brain tissue and in vitro differentiated neurons showing L1 upregulation, we explored the genomic context of upregulated L1s and assessed the relationship between their transcription and the host gene transcripts. Firstly, we analyzed ChIP-seq data from the NIH Roadmap Epigenomics Mapping Consortium [51], focusing our attention on six major histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K36me3, H3K27me3 and H3K27ac) from postmortem mid-frontal lobe of a control individual. Upon retrieving ChIP-seq data in wig format, and converting them to bigwig using USCS wigToBigWig utility, we computed a profile of histone mark mapping density relatively to (1) the genomic coordinates of a unique list of FL L1s upregulated in all samples and (2) of all FL L1HS/L1PA annotated in the human genome. The profile was computed with DeepTools [60] ⁠ computeMatrix utility and plotted with plotProfiles. Overall, all histone marks profiles showed a substantial drop at the level of L1 genomic coordinates, which may be due to the typically lower mappability of repetitive regions. However, upregulated L1s showed a slightly weaker enrichment for transcriptional repressive marks H3K9me3 and H3K27me3 compared to all annotated FL L1HS/L1PA, both within the boundaries of the L1 fragment and their flanking upstream and downstream regions. Interestingly, upregulated L1s exhibited a considerably stronger enrichment for activator marks (H3K36me3, H3K4me1 and H3K27ac) at the level of their flanking regions (Fig. 3A). While further analysis is needed on samples from different brain regions, these results are consistent with the model of a genomic environment more permissive to transcription for the upregulated L1s with respect to the genomic average.

Fig. 3figure 3

Upregulated L1 elements are expressed within introns of actively transcribed genes. A Histone marks aligned reads density profiles computed for all upregulated L1 elements and total annotated L1 elements for six major histone marks, the profile extends along the whole length of the L1 elements (~ 6 kbp) as well as along 5 kbp upstream and downstream with respect to each L1 element. B Genomic localization of all upregulated L1 elements in all samples. C Volcano plot representing differentially retained introns between controls and samples characterized by a strong L1 upregulation, significance lines: FDR = 0.05, |Log2(FoldChange)|= 1. D Percentage of reads shared among random exons and their closest upregulated L1, exons and their closest exon, random intronic 6 kbp regions and their closest exons

We also measured the percentage of upregulated, expressed and total annotated FL L1PA/L1HS which overlap expressed genes. In both datasets, the percentage of upregulated and expressed L1s found within expressed genes greatly exceeded the percentage of total L1s. However, upregulated L1s were neither enriched nor depleted within expressed genes compared to non-deregulated or expressed L1s (Additional file 4: Fig. S2). We then overlapped the genomic coordinates of upregulated L1s with the genomic coordinates of exons and introns of canonical coding and non-coding genes retrieved from UCSC. All L1s not overlapping with either introns or exons were considered intergenic. For the vast majority of samples in all datasets, the number of upregulated intronic L1s greatly exceeded the number of upregulated exonic and intergenic L1s, especially in the samples with the highest number of upregulated L1s (Fig. 3B). These results suggest that upregulated L1s were enriched within introns of actively transcribed genes. We did not find any enrichment of a specific strand orientation preference between the L1s and the host genes. Further analysis on larger cohorts is needed to properly address the question whether a specific strand orientation combination is important for L1/host gene transcription.

Given that the majority of upregulated L1s are intronic, we wondered whether their upregulation might be a technical artefact caused by an increased intron retention. We therefore performed a differential intron retention analysis focused on the groups of samples characterized by L1 upregulation compared to controls using IRFinder [61]. While we detected only a few differentially retained introns in iPSC and neurons KO for ATRX, we detected 5 downregulated and 806 upregulated introns in the group comprising samples SRR292614, SRR292620 and SRR292621 (Fig. 3C). However, only one of these significantly upregulated retained introns overlapped an upregulated L1, suggesting that intron retention is not at the basis of the observed L1s upregulation. This lack of overlap was also confirmed by a per-sample analysis. In this case, we first identified retained introns in each sample using a specific filter on the IRFinder output and then we overlapped the identified retained introns with the upregulated L1s identified with the z score method. Again, also in this analysis no evidence of overlaps was found between any retained intron and upregulated L1 (Additional file 5: Table S11). Intron retention was therefore strongly increased in the brain of ASD subjects showing upregulation of L1s and general deregulation of canonical genes. This interesting pattern of expression deserves further studies to understand its functional significance and regulatory mechanism.

We then tested whether intronic upregulated L1s would be spliced into novel transcripts together with nearby annotated exons. To this end, we counted, for all groups of samples characterized by L1 upregulation, the percentage of mapped reads whose fragments are shared by pairs of genomic segments comprising: (1) a random expressed exons and its closest expressed exon, (2) an upregulated L1 and its closest exon, (3) a random 6 kbp long sequence sampled from introns of expressed genes and its closest exon. The distribution of the percentage of shared reads between upregulated L1s and their closest exon was similar to the one of random intronic segments, and significantly lower than the average percentage of shared reads between pairs of neighboring exons (Fig. 3D).

Taken together, these results suggest that upregulated L1s are mostly located within introns of expressed genes and might be expressed as independent transcriptional units.

Upregulated L1 elements may be associated with downregulation of a small set of ASD-related host genes

Starting from the model that intronic upregulated L1s might be expressed as independent transcriptional units, we assessed how they relate with the expression of their host genes. We limited our analysis to the samples with the highest L1 upregulation: SRR292614, SRR292620 and SRR292621 from postmortem ACC (Velmeshev et al. dataset), and iPSC and differentiated neurons KO for ATRX (Deneault et al. dataset). To this end, we counted the number of upregulated L1s found within the genomic loci of DE genes for each sample. We considered as DE those genes and L1s featuring a z score above 3 or below -3. Overall, most upregulated L1s were not found within DE genes (Fig. 4A) with the exception of 11 significantly upregulated L1s in postmortem ACC that were located within significantly differentially expressed genes. Of note, 5 of them (MARK1, MAPK8, OPCML, DLGAP1 and ZNF780B) resulted from a single subject and showed an upregulated L1 overlapping a downregulated gene (Fig. 4B, Table 1). Taking into account both the Velmeshev and the Deneault dataset, we identified a total of about 50 overlaps between L1 and genes in which both of them were significantly upregulated. Albeit an interesting topic to be further inspected, the concomitant increased expression of an L1 and its host gene could result from general chromatin relaxation. On the other hand, even if deriving from a single sample, we were intrigued by the 5 upregulated L1s located inside downregulated genes. This relationship led us to speculate that, in some loci, L1 upregulation might have a negative impact on the expression of their host genes. To further explore this possibility, we selected all intronic upregulated L1s (cutoff on z score > 3 for the L1s) in each of the three ACC sample displaying strong L1 upregulation separately. For each of the three samples, we then associated the z scores of each intronic upregulated L1 with the z score of their overlapping genes (without a specific cutoff on the z for the genes). In all the three samples, the z score associated with genes overlapping upregulated L1s was mostly negative, while genes overlapping non-upregulated L1s did not show any trend (Additional file 5: Tables S12–S14, Fig. 4C). This result added support to the possibility that L1 upregulation might have a negative impact on the transcription of host genes in a subset of loci.

Fig. 4figure 4

Upregulated L1 elements may negatively impact the expression of specific ASD-related host genes. A Number of upregulated L1 elements overlapping DE and non-DE genes. B Number of upregulated L1 elements overlapping upregulated and downregulated genes. C z score associated with upregulated L1s (z score > 3) and their overlapping genes in samples, and non-upregulated L1s (|z score < 2|) and their overlapping genes in samples SRR9292614, SRR9292620 and SRR9292621. D Number of genes overlapping upregulated L1s and associated with a negative z score in common with genes nearby genomic loci where the L1 RNA binds genomic DNA in mouse, compared to random distributions computed with expressed genes, **z score > 5. E Example of an upregulated L1 intronic to a gene showing an average negative z score. The sashimi plots represent the normalized expression at the level of the intronic upregulated L1 and the closest exon belonging to the DLGAP1 gene. The upregulated L1 is in red in the RepeatMasker track. Other annotated repeats are in black. DLGAP1 exon is in blue. F Distribution of correlation coefficients between intronic and extragenic L1s and the closest/overlapping gene for all L1/gene couples in different human tissues grouped together and G taken ungrouped

Table 1 Differentially expressed genes overlapping upregulated L1s in specific samples

A recent publication [69] showed that, in mouse ESCs, L1-enriched genes were transcriptionally silenced by a not yet well understood mechanism which involves the transcription of the L1 element and the subsequent binding of the L1 RNA to the DNA at the level of specific loci. Indeed, through chromatin isolation by RNA purification (ChIRP) experiments, the authors identified ~ 25,000 genomic loci where the L1 RNA binds genomic DNA [69]. A subset of them were found in close proximity to a total of 2400 genes (distance < 5 kbp). Interestingly, in all three ACC samples genes overlapping upregulated L1s and associated with a negative z score were enriched within the human orthologs of genes nearby L1-ChIRP peaks when compared to a random distribution computed with equally sized sets of expressed genes (Fig. 4D, z score > 5). This result reinforces the idea that L1 upregulation in the ACC of a subset of ASD subjects might be involved in mechanisms regulating the expression of the host gene.

We then studied whether the genes overlapping upregulated FL L1s (z > 3 for L1s) and associated with a negative z score (without a specific cutoff on the z for the genes) were associated with neural markers, genes found mutated in ASD and genes highly expressed in the adult brain (see ‘Methods’ section). We call these genes negatively correlated genes. There are about 10 negatively correlated genes in each of the three samples (SRR292614, SRR292620 and SRR292621) accounting for a total of 15 unique genes (Additional file 4: Fig. S3). For each sample, negatively correlated genes resulted enriched for neural markers from single cell studies [65] and for SFARI genes. These enrichments, however, appeared to be a general feature of genes overlapping full-length L1s, the subset of L1s analyzed in this study (z score > 25). These genes were not significantly enriched for those highly expressed in the adult brain [64] (Additional file 6: Tables S19–S22). However, five (one-third of the total) negatively correlated genes that are present also in the SFARI database showed the same behavior in all three samples (CADM2, CSMD1, DLG2, DLGAP1, GPHN) (Additional file 4: Fig. S3). The main function of these genes lies in synapse organization and function (Table 2). Examples of mappings for upregulated L1s are provided in Additional file 4: Figs. S4–S6. Figure 4E shows an example of an L1 upregulated in all 3 samples, found within a SFARI gene (DLGAP1) and associated with a negative z score. These results suggest that a subset of genes expressed in adult brain might be negatively influenced by the expression of the overlapping intronic FL L1s.

Table 2 Description of the 5 genes associated with a negative z score and overlapping an upregulated L1 element in all the three samples characterized by pervasive L1 upregulation

We then calculated the correlation between the transcription level of each expressed FL L1 element with the transcription level of the respective host gene in each dataset (Additional file 5: Tables S15–S18). Interestingly, the percentages of FL L1s whose expression is in negative correlation with the expression of the host gene in Velmeshev data were 50% and 60% for ACC and PFC, respectively, while in the Deneault dataset these percentages were much lower with 3.4% and 0% for iPSC and differentiated neurons, respectively. These results raise the interesting question whether cultured cells are a good system to finely study the expression and the activity of transposable elements.

To validate the observed expression patterns in an independent dataset, we exploited data from a recent study [70], where the authors calculated the coefficient of correlation between the expression of intronic and intergenic L1s and the expression of the overlapping genes in 49 human tissues. The authors stratified L1s into elements expressed in all tissues and in those with a particularly high expression (> 1.5-fold higher) in one tissue compared to all others. Initially, to ease data representation, we grouped the 49 tissues into 11 classes. Our analysis showed that the average coefficient of correlation between intronic tissue-specific L1s and their overlapping genes was substantially lower for brain-derived tissues compared to all other tissues (Fig. 4F). This adds support to the existence of an inverse correlation between intronic L1 and the host genes expression in neurons for a subset of genes. When values for the ungrouped tissues were plotted, results showed differences among areas of the brain. The lower values, suggestive of a stronger negative expression correlation between a subset of L1/gene pairs, were derived from putamen basal ganglia, ACC and cerebellar hemispheres (Fig. 4G). These results support both the idea of regional differences in the brain as well as the existence of a possible relationship specific to the ACC.

Taken together, our results suggest that L1 upregulation might, in some instances, be associated with the downregulation of their host genes. This could happen more frequently in genes important for ASD, with a mechanism yet to be characterized. Of note, the strong enrichments of FL L1-containing genes for neural markers and SFARI classification are interesting and will be important, in future studies, to specifically deepen our knowledge on them.

留言 (0)

沒有登入
gif