It is well known that UTRs have comparable genomic footprints similar to coding regions and important regulatory roles in human diseases [10]. So far, a variety of different regulatory elements are known to be distributed in the UTR or upstream/downstream regions of gene coding sequences, such as uAUG sites [33], miRNA binding sites [34], and methylated CpG sites [35]. Regarding the identification of such regulatory elements, some could be predicted with specific algorithms based on characteristic sequences (e.g. the AUG initiation codon sequences or sequences matching to known miRNA seed regions), while some still lack a recognizable pattern and are difficult to identify by in-silico prediction. Therefore, functional assays remain an important way to confirm the potential effects of a given genetic variant on gene regulation. Since UTR variants usually cause disease through gene dosage effects, when looking for pathogenic UTR variants, it is important to focus on sequences that fall into regulatory elements that have well-established or functionally validated links to target genes. Besides, there should be well-documented associations between the target genes and phenotypes of interest. For SUD, the phenotypes could be short/prolonged QT intervals, ventricular fibrillation, conduction block, etc.
Currently, functional assays widely used to test the impact of non-coding variants on gene regulation include RNA sequencing, luciferase reporter assays, and chromosome conformation capture (3C) approaches. However, each assay has its own advantages and disadvantages. For example, RNA sequencing is the most direct way to detect aberrant gene expression, but fresh human tissue from exactly the same anatomical site of different individuals is not always available, not to mention that most forensic samples are highly degraded. Previous studies investigating human post-mortem gene expression have shown that mRNA degradation occurs in a tissue-specific manner and is associated with gene-specific properties [36, 37]. Salzmann et al. also demonstrated in their study on body fluids that some gene transcripts are more prone to degradation than others after deposition [38]. The stability discrepancies among different transcripts in degraded samples will make the comparison of RNA sequencing results even more difficult. In contrast, the luciferase reporter assay can avoid the influence of external factors, such as postmortem interval and individual variation. Nevertheless, the disadvantage of this method is that the assay system is artificial, which might not be able to accurately simulate in-vivo conditions. Regarding cell lines for the luciferase reporter assays, HEK293 and AC16 were used in parallel to increase the validity of our results. Due to the fact that the expression pattern of some regulatory RNAs are tissue specific [39], a representative cell line that shares the common expression regulatory mechanism with the studied tissue/organ is recommended. AC16 is a cell line derived from adult human ventricular cardiomyocytes and thus usually used to study developmental regulation of cardiomyocytes. In addition, HEK293 is a cell line derived from human embryonic kidney cells and currently widely used for functional studies due to its well-performance regarding reproducibility and transfection amenability. Previous studies have demonstrated that the combination of these two cell lines could improve the reliability of functional studies focusing on regulatory elements [40].
Among our candidate variants, only the one in the 5’ UTR of SCO2 induced a reduction in gene expression, which is similar to the effect of LOF variants in the coding region of this gene. Previous studies have shown that uAUG-generating variants are under strong negative selection [33], suggesting a correlation between such variants and aberrant gene expression. As the encoding gene for a metallochaperone involved in the biogenesis of cytochrome c oxidase (COX) subunit, SCO2 dysfunction could cause COX deficiency, resulting in reduced mitochondrial oxidative ATP production capacity [41]. Previous studies have proposed that mutations in SCO2 are associated with infantile encephalocardiomyopathy [42] and hypertrophic cardiomyopathy [43]. However, a comprehensive assessment of this gene’s constraint metrics, intolerance indexes, and dosage sensitivity scores indicated that the aberrant expression of SCO2 will not likely cause fatal disease alone. Thus, we assume that the SCO2 variant identified in this study is at most a contributing factor rather than the main cause of SUD, even though it could lead to a moderate expression change in SCO2. SUDS038 was previously found to also carry a likely pathogenic variant in the coding region of ACADS and a VUS in the coding region of CACNA1C [18]. Therefore, it is possible that the coding variants and UTR variants have all contributed to the death of this individual to varying degrees, as suggested by the multifactorial model of cardiac genetic diseases [44].
The 3’ UTR variants identified in CALM2 and TBX3 could disrupt their binding to certain miRNAs, thereby up-regulating gene expression, leading to a similar effect as GOF variants in coding regions. However, the constraint metrics could only provide evidence that CALM2 and TBX3 are intolerant to LOF variation. Subsequent ncRVIS and ncGERP analysis revealed that CALM2 and TBX3 are intolerant to variation in their regulatory sequences and might have strong dosage sensitivities. The dosage sensitivity scores further confirmed that TBX3 is subject to haploinsufficiency, while it remains questionable whether these two genes also exhibit triplosensitivity. Taken together, these data highlight the strong deleterious effect of aberrant expression of CALM2 and TBX3. As a calmodulin encoding gene, CALM2 is associated with severe early-onset LQTS, which can cause life-threatening ventricular arrhythmias [45]. According to a summary of published pathogenic missense variants in this gene [46], both variants that increase and decrease its affinity for Ca2+ have been identified in LQTS cohorts, suggesting that either LOF or GOF variants in CALM2 are phenotype-related. Therefore, variants that lead to an overexpression of this gene, as shown in this study, are also very likely to be pathogenic. More importantly, our previous study focusing on the coding region did not identify any pathogenic variant or VUS in SUDS026, further strengthening the significance of this single positive finding. However, it remains unclear whether this UTR variant can be considered the sole cause of death until we better understand to what extent the decrease of CALM2 is severe enough to have fatal consequences. The TBX3 gene is a well-known member of the ancient T-box gene family, which is conserved across a wide range of species. It is a critical developmental regulator of several structures, including the heart [47]. Moreover, Frank et al. have shown that TBX3 is required for functional maturation and post-natal homoeostasis of the conduction system in a highly dosage-sensitive manner [48]. Taken together with our findings, it is reasonable to assume that in this case (SUDS068), a joint effect of this variant and another previously identified VUS in the TTN coding region could be responsible for the lethal arrhythmia.
It is noteworthy that the ACMG/AMP guidelines were applied for the initial variant screening of the massive UTR variants identified in our SUD cohort, even though many of these existing rules relate specifically to coding region variants. As proposed by Ellingford et al. [49], some of the rules from Richard et al. [24] can be applied directly to variants in non-coding regions, without the need for additional considerations. These include the use of frequency information (BA1, BS1, BS2, and PM2), up-weighting of confirmed de novo variants (PM6 and PS2), and incorporation of co-segregation evidence (PP1 and BS4). But some other rules from Richard et al. require modifications in terms of strength of evidence and manual curation of triggering conditions (Table 3). In our study, PM2 (absent from controls or at extremely low frequency) was met for the three candidate variants due to their low frequencies in the general population. For computational prediction, PP3 (multiple lines of computational evidence support a deleterious effect on the gene or gene product) was met for SCO2 (NM_001169109.1):c.-135G > C based on in-silico prediction (MetaRNN score = 0.82, which is within the range of pathogenic supportive evidence), while the other two variants were not assigned with a high deleterious score by the prediction tools integrated in Varsome. A possible explanation could be that many pathogenic mutations that act via dominant-negative or GOF mechanisms are likely to be missed by current variant prioritization strategies [50]. In addition to computational prediction, functional evidence is also important in assessing whether a non-coding variant is pathogenic or benign. As our results from the dual-luciferase reporter assay suggest that the three variants all have significant effects on gene expression, manual curation to activate PS3 (well-established in-vitro or in-vivo functional studies supportive of a damaging effect on the gene or gene product) should be acceptable.
Table 3 Applicability of ACMG/AMP rules for non-coding region variants [49]There are also some limitations in our study. First, not all currently available in-silico prediction tools were applied for the functional annotation of the UTR variants, since some are open source scripts that can only run on a specific programming language or environment. Therefore, some regulatory variants with functional effects may have been missed in this study. However, as new prediction algorithms continue to emerge rapidly through iterative improvements, the in-silico tools we used are also likely to be outdated quickly. With this in mind, periodic re-analysis of molecular autopsy results is recommended, as there may be significant updates in variant interpretation in the future [51, 52]. Second, although the candidate variants seem to cause aberrant gene expression, we were unable to confirm their actual contribution to SUD due to the lack of conclusive evidence in terms of gene dosage effect. In future studies, more explicit associations between up/down-regulation of previously proposed SUD susceptibility genes and specific phenotypes deserve to be better clarified to identify the dosage-sensitive genes that are intolerant to variation in their regulatory sequences.
In conclusion, by functional analysis of the UTR variants from the molecular autopsy findings in our SUD cohort, we identified three variants with high confidence of pathogenicity in the UTRs of SCO2, CALM2 and TBX3. Our strategies for UTR variant prioritization and functional annotation could be considered as a practical way of interpreting molecular autopsy findings in the non-coding sequences, which will improve the understanding of the pathogenicity of a significant proportion of VUS identified in SUD cohorts.
Comments (0)