Diagnostic genome sequencing improves diagnostic yield: a prospective single-centre study in 1000 patients with inherited eye diseases

Introduction

Although protein-coding regions represent only 1–2% of the human genome, they harbour an estimated 85% of annotated pathogenic variants.1 2 Despite these numbers, genome sequencing (GS) usually achieves a higher diagnostic yield than sequencing approaches that focus on exonic regions, not least because of its more homogeneous coverage3 4 and higher efficiency in capturing genomic regions that are particularly high or low in guanine or cytosine content.3 5 The main advantage of GS is its ability to detect variants outside the protein-coding regions. In addition, the homogeneous coverage of GS aids the detection of copy number variants (CNVs) by semiquantitative algorithms and by callers based on the discovery of discordant read pairs and split-read alignments. Even more importantly, GS is able to detect copy-neutral rearrangements such as inversions and translocations. Numerous studies have found that the diagnostic yield can be improved by the use of GS, for instance in intellectual disability,6 7 paediatric disease,8 9 neurological disorders10 11 and inherited retinal degeneration (IRD).5 12 13

The latter disease group comprises a range of disorders characterised by progressive degeneration or stationary dysfunction of the outer retina and/or the retinal pigment epithelium. IRD affects approximately 1 in 3000 individuals in North America and Europe.14–17 Genetic diagnostics is hampered by the considerable clinical overlap of disease entities and the complexity in the genetic causes (>270 ‘disease genes’ (RetNet, https://sph.uth.edu/retnet)). Typical for Mendelian disorders, diagnostic rates vary widely by phenotype and are inversely correlated with the level of genetic heterogeneity.18 The genetic diagnostic rate can exceed 90% in certain clinically well-defined phenotypes of IRDs (eg, choroideraemia),19 while the most common subtype of IRD, retinitis pigmentosa, shows an extreme level of genetic heterogeneity, resulting in lower diagnostic rates.20 21

Another group of inherited eye diseases with less pronounced clinical heterogeneity is inherited optic neuropathy (ION). This disease group mainly affects visual acuity, central visual fields and colour vision due to a progressive loss of retinal ganglion cells and their axons that form the optic nerve. Among IONs, Leber hereditary optic neuropathy (LHON) and dominant optic atrophy (DOA) are the two most common disorders seen in clinical practice.22 Three variants in the mitochondrial DNA account for ~95% of all cases of LHON, whereas about 70% of DOA cases harbour pathogenic variants in OPA1.23 Undoubtedly, next-generation sequencing (NGS) approaches have accelerated the identification of the underlying disease-causing variants in IRD and ION.24 25 Nevertheless, 24–52% of IRD cases and up to 78% of ION cases remain genetically undiagnosed despite rather comprehensive work-up such as targeted sequencing applying specific capture panels or exome sequencing (ES).24–29

Since 2019, individuals affected with IRD or ION recruited at the University Eye Hospital Tübingen have received genetic diagnostic testing based on GS. With 1000 datasets now available, the aim of the present study was to describe the mutational spectrum observed in an unselected cohort of IRD and ION cases and, in particular, to evaluate the advantages of GS compared with targeted approaches. Given the notoriously challenging annotation of intronic variants with respect to their functional consequences, the potential added diagnostic value of complementary RNA sequencing was additionally tested in a subgroup.

Materials and methods

Cohort

All individuals in this study have been exclusively examined at a specialised outpatient clinic for IRD and ION established at the University Eye Hospital Tübingen. Enrolment of the entire cohort included consecutive admissions from January 2019 to September 2021. Only individuals who did not have previous genetic confirmation of the cause of their disease were recruited. Sampling was random and solely based on the patient’s interest in genetic testing and consent to a scientific data use of the results. Blood samples were sent to the genetic testing facility along with the patient’s informed consent and clinical diagnosis.

A total of 1000 affected individuals underwent genetic testing with GS. Among them, 921 were tested as singletons (ie, no other family member was sequenced). In 26 families, 2 affected individuals were tested, and in 3 families, 3 affected individuals were tested. Eighteen cases were sequenced as trios (affected child and unaffected parents), but their parents were not included in the 1000 cases.

Each patient underwent a comprehensive ophthalmological examination. The decision as to which examination was performed in each patient was made on an individual basis, considering the patient’s compliance and the course of the disease. Individual examinations were as follows: assessment of best-corrected visual acuity using early treatment diabetic retinopathy study charts, semiautomated 90° kinetic visual field examination using the III4e and I4e isopters (Octopus 900; Haag‐Streit, Köniz, Switzerland), colour vision testing using panel D15 tests and full‐field electroretinography testing according to International Society for Clinical Electrophysiology of Vision standards (Espion; Diagnosys, Lowell, Massachusetts, USA). In addition, fundus and fundus autofluorescence photography, as well as spectral-domain optical coherence tomography (Spectralis HRA+OCT; Heidelberg Engineering, Heidelberg, Germany), were performed.

Prior to genetic testing and depending on their clinical phenotype and self-reported family history, patients were assigned a specific family code, namely ACHM (achromatopsia), ADRP (autosomal dominant retinitis pigmentosa), ARRP (autosomal recessive retinitis pigmentosa), BBS (Bardet-Biedl syndrome), BCM (blue cone monochromacy), BVMD (Best vitelliform macular dystrophy), CACD (central areolar choroidal dystrophy), CD (cone dystrophy), CHM (choroideraemia), CRD (cone‐rod dystrophy), CSNB (congenital stationary night blindness), LCA (Leber congenital amaurosis), LHON, MDS (macular dystrophy), MISC (miscellaneous diagnoses; including, but not limited to: Bietti’s crystalline dystrophy, exudative vitreoretinopathy and Wagner’s disease), NYS (nystagmus), OA (ocular albinism), DOA, SCHI (retinoschisis), SRP (simplex retinitis pigmentosa), STGD (Stargardt disease), UD (unclear diagnosis; for example, conflicting phenotypical features), USH I (Usher syndrome type 1), USH II (Usher syndrome type 2) and XRP (X linked retinitis pigmentosa). For the sake of better legibility, these abbreviations are only used in figures and tables, but not in the main text. Note that family codes were not adjusted after genetic testing (ie, if the molecular diagnosis suggested a change or refinement of the clinical diagnosis).

Two hundred and six individuals had previously undergone first-tier genetic testing without discovering the cause of the disease, either by Sanger sequencing of recurrent genes (eg, RHO in ADRP or CNGB3 in ACHM), targeted sequencing of 108 genes associated with IRD using molecular inversion probes30 or targeted sequencing using a comprehensive diagnostic panel of genes associated with eye diseases.28

Familial co-segregation analysis was performed either by conventional Sanger sequencing or multiplex ligation dependent probe amplification whenever possible.

Genome sequencing

Diagnostic genetic testing based on GS was performed at the Institute of Medical Genetics and Applied Genomics (IMGAG), University Hospital Tübingen, Germany. Clinical GS has been accredited at the IMGAG by the DAkkS according to DIN EN ISO 15189:2014. Accreditation is effective for the scope of activities as defined in the certification annex (D‐ML‐13130‐04‐00).

Genomic DNA was extracted from whole blood using the FlexiGene DNA kit (Qiagen, Hilden, Germany) and quantified using the Qubit Fluorometer (Thermo Fisher Scientific, Dreieich, Germany). One microgram of genomic DNA was further processed using the TruSeq PCR-Free Library Prep kit (Illumina, Berlin, Germany) and generated libraries were sequenced on a NovaSeq6000 System (Illumina) as 2×150 bp paired-end reads to an average 49× coverage.

Mapping and variant calling

The conversion of the sequence data into FASTQ format was done with Illumina bcl2fastq. Adapter sequences were removed using SeqPurge31 and the remaining reads were mapped against the human reference genome (GRCh37, hg19) with the Burrows-Wheeler Aligner (BWA-MEM).32 Optical duplicates were removed with samblaster.33 Insertions and deletions were realigned using ABRA2.34 Variants were detected with freebayes35 and annotated with Ensembl VEP36 and various internal and external databases. CNVs and structural variants (SVs) were called using ClinCNV and Manta, respectively.37 38 For details, refer to the megSAP pipeline (https://github.com/imgag/megSAP) developed at the IMGAG, University Hospital Tübingen, Germany.
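As an orientation aid, the order of processing steps named above can be written down as a simple lookup structure. This is only a sketch of the sequence described in the text: the stage labels and the helper `tool_for` are illustrative names, no tool is actually invoked, and no command-line options of the real programs are implied.

```python
# Ordered summary of the processing steps named in the text.
# This records which tool handles which stage; it does not run anything.
PIPELINE_STAGES = [
    ("FASTQ conversion", "bcl2fastq"),
    ("adapter trimming", "SeqPurge"),
    ("read mapping (GRCh37)", "BWA-MEM"),
    ("duplicate removal", "samblaster"),
    ("indel realignment", "ABRA2"),
    ("small-variant calling", "freebayes"),
    ("annotation", "Ensembl VEP"),
    ("CNV calling", "ClinCNV"),
    ("SV calling", "Manta"),
]


def tool_for(stage: str) -> str:
    """Return the tool recorded for a given stage label."""
    return dict(PIPELINE_STAGES)[stage]


for number, (stage, tool) in enumerate(PIPELINE_STAGES, start=1):
    print(f"{number}. {stage}: {tool}")
```

For the actual implementation and parameters, the megSAP repository cited above remains the authoritative source.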

Variant filtering and interpretation

Various filtering steps were performed to prioritise potentially clinically relevant DNA variants. Filtering was mainly based on the predicted consequences of identified alterations, their listing in disease databases (specifically ClinVar,39 HGMD40 and LOVD41), and their allele frequency (≤1.00%). Allele frequency was estimated using an in-house database, 1000 Genomes,42 dbSNP,43 the Exome Aggregation Consortium browser44 and the Genome Aggregation Database (gnomAD).45 In a phenotype-based prioritisation, variants that have been previously associated with the individual’s disease or phenotypical characteristics were evaluated. Furthermore, variants that have been predescribed as clinically relevant (ClinVar status ‘pathogenic’ or ‘likely pathogenic’, HGMD annotation, ‘pathogenic’ or ‘likely pathogenic’ variants from our in-house database) were prioritised. To ensure an efficient diagnosis with high sensitivity, the search criteria were individually adjusted in the context of the research question based on the available additional information (eg, incidence and inheritance of the disease, ethnicity and family history, pathomechanism of candidate genes to be assessed). Variant nomenclature in this study is in accordance with Human Genome Variation Society recommendations. Variant classification in this manuscript was performed using the classification tool from Franklin (https://franklin.genoox.com—Franklin by Genoox) which is based on American College of Medical Genetics guidelines.46 The following classifications were used: pathogenic (P), likely pathogenic (LP), variant of uncertain significance (VUS), likely benign (LB), benign (B).
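The frequency-based filtering and the prioritisation of predescribed variants can be illustrated with a short sketch. It assumes a hypothetical record layout (`Variant`) and placeholder gene names; it does not reproduce the filters actually used in the diagnostic software, which were adjusted case by case as described above.

```python
from dataclasses import dataclass

MAX_ALLELE_FREQUENCY = 0.01  # the 1.00% threshold stated in the text


@dataclass
class Variant:
    # Hypothetical record layout, chosen for illustration only.
    gene: str
    allele_frequency: float  # highest frequency across population databases
    clinvar_status: str      # e.g. "pathogenic"; empty string if not listed
    in_hgmd: bool


def passes_frequency_filter(variant: Variant) -> bool:
    """Keep variants at or below the allele-frequency threshold."""
    return variant.allele_frequency <= MAX_ALLELE_FREQUENCY


def is_predescribed(variant: Variant) -> bool:
    """True if the variant is already listed as clinically relevant."""
    return variant.in_hgmd or variant.clinvar_status in {
        "pathogenic",
        "likely pathogenic",
    }


def prioritise(variants):
    """Drop common variants, then list predescribed variants first."""
    rare = [v for v in variants if passes_frequency_filter(v)]
    # sorted() is stable, so the input order is kept within each group.
    return sorted(rare, key=lambda v: not is_predescribed(v))


# Invented example records with placeholder gene names.
candidates = [
    Variant("GENE_A", 0.20, "", False),              # common: removed
    Variant("GENE_B", 0.001, "", False),             # rare, not predescribed
    Variant("GENE_C", 0.0001, "pathogenic", False),  # rare, predescribed
]
ranked = prioritise(candidates)
print([v.gene for v in ranked])  # ['GENE_C', 'GENE_B']
```

Sorting rather than discarding the non-predescribed variants reflects that novel variants remain candidates and are only ranked lower for review.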

In this study, a case was considered unsolved if the identified variants were classified as benign or likely benign or if only a single pathogenic or likely pathogenic allele was identified in a gene associated with autosomal recessive inheritance. A case was considered possibly solved if one or more of the identified variants were classified as variants of uncertain significance. A case was considered solved if the identified variants were classified as pathogenic or likely pathogenic and consistent with the patient’s reported phenotype. Some exceptions were made to this rule if expert opinion deviated from the automated variant classification (see also Explanations in the online supplemental table). Note that validation of biallelism was not a prerequisite for the classification as solved. Information on familial co-segregation in individual cases can be found in online supplemental table 1 in the column ‘transmission/phase’.
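The case definitions above can be expressed as a small decision rule. The following is a simplified sketch under stated assumptions: it considers the candidate alleles of a single gene, treats a homozygous variant as two list entries, and leaves out the phenotype-consistency check and the expert-opinion exceptions mentioned in the text. The names `CaseStatus` and `classify_case` are illustrative.

```python
from enum import Enum


class CaseStatus(Enum):
    SOLVED = "solved"
    POSSIBLY_SOLVED = "possibly solved"
    UNSOLVED = "unsolved"


PATHOGENIC_CLASSES = {"P", "LP"}


def classify_case(allele_classes, recessive):
    """Classify a case from the variant classes of its candidate alleles.

    allele_classes: list of "P", "LP", "VUS", "LB" or "B", one per allele.
    recessive: True if the gene is associated with recessive inheritance.
    """
    n_pathogenic = sum(c in PATHOGENIC_CLASSES for c in allele_classes)
    has_vus = "VUS" in allele_classes
    required = 2 if recessive else 1  # alleles needed to explain the disease

    if n_pathogenic >= required:
        return CaseStatus.SOLVED
    if has_vus:
        return CaseStatus.POSSIBLY_SOLVED
    return CaseStatus.UNSOLVED


print(classify_case(["P", "LP"], recessive=True).value)    # solved
print(classify_case(["LP", "VUS"], recessive=True).value)  # possibly solved
print(classify_case(["LP"], recessive=True).value)         # unsolved
```

In line with the text, the rule does not require biallelism to be confirmed by phasing before a case counts as solved.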

Previously undescribed pathogenic and likely pathogenic variants have been submitted to ClinVar. Accession numbers can be found in online supplemental table 1.

Resequencing of exon 15 of RPGR

Due to the highly repetitive sequence of exon 15 of RPGR, which is not sufficiently covered by genome short-read sequencing, all unsolved cases with a clinical diagnosis of IRD were subjected to RPGR exon 15 (ORF15) resequencing. To this end, a long-range PCR (Roche, Expand Long Template PCR) covering exon 15 was performed. The amplicon was subsequently processed using the Nextera XT DNA library preparation kit (Illumina) and sequenced as 2×100 bp paired-end reads on a NovaSeq6000 System (Illumina).

RNA-sequencing and combined RNA/DNA data analysis

RNA-sequencing (RNA-seq) was performed on 74 cases. RNA was extracted from PAXgene Blood RNA Tube (Qiagen) with QIAsymphony PAXgene Blood RNA kits on a QIAsymphony SP with the protocol PAXgen RNA V5. From 500 ng of total RNA, mRNAs were enriched using polyA capture on a NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB). Libraries were prepared on a Biomek i7 (Beckman Coulter) using NEBNext Ultra II Directional RNA Library Prep Kits for Illumina (NEB) and NEBNext Globin & rRNA Depletion Kits according to the manufacturer’s instructions. The fragment sizes were determined with a Fragment Analyzer (High NGS Fragment 1–6000 bp assay (Agilent)) and the library concentration (approximately 5 ng/µL) was analysed with an Infinite 200Pro (Tecan) and the Quant-iT HS Assay Kit (Thermo Fisher Scientific). The 270 pM cDNA libraries were sequenced as 2×100 bp paired-end reads on an Illumina NovaSeq6000 (Illumina, San Diego, California, USA) with approximately 50 million clusters per sample.

Generated RNA sequences were analysed with respect to aberrant expression, aberrant splicing and allelic imbalance using the megSAP pipeline (V.2022_08, https://github.com/imgag/megSAP). In brief, the ngs-bits tool collection (V.2022_07-80, https://github.com/imgag/ngs-bits) was used for quality control (ReadQC) and preprocessing (SeqPurge) of fastq files. STAR (V.2.7.10a, https://www.ncbi.nlm.nih.gov/pubmed/23104886, https://github.com/alexdobin/STAR/) was used for read alignment and detection of splice junctions, which were post-processed with SplicingToBed. After mapping, MappingQC was used for quality control and Subread (V.2.0.3, https://pubmed.ncbi.nlm.nih.gov/30783653/, https://sourceforge.net/projects/subread/) for read counting based on an Ensembl gene annotation file (GRCh38, release 107, http://www.ensembl.org/index.html). Upon normalisation (megSAP) and quality assessment (RnaQC), expression values of genes and exons were compared with an in-house cohort (same tissue and processing system) using NGSDAnnotateRNA.

Clinical interpretation was done with GSvar, enabling filtering for expression of genes and exons by gene, biotype, expression value, read counts and Z-score, compared with the cohort and the splice junctions by gene, type, read count and motif. Integrative Genomics Viewer (IGV, V.2.11.9, https://www.nature.com/articles/nbt.1754, https://software.broadinstitute.org/software/igv/) was used for visual inspection.
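The Z-score used for filtering can be illustrated as the distance of one sample's expression value from the cohort mean, in units of the cohort standard deviation. The sketch below is not the megSAP or GSvar implementation: the log2(x + 1) transform, the use of the sample standard deviation and the function name `expression_z_score` are assumptions made for illustration, and the numbers are invented.

```python
import math
import statistics


def expression_z_score(sample_value, cohort_values):
    """Z-score of one expression value against a reference cohort.

    Values are log2(x + 1) transformed before comparison (an assumption),
    and the sample standard deviation of the cohort is used.
    """
    logged = [math.log2(v + 1.0) for v in cohort_values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # needs at least two cohort values
    if sd == 0:
        raise ValueError("cohort expression has no variance")
    return (math.log2(sample_value + 1.0) - mean) / sd


# Invented example: cohort values whose log2(x + 1) are 2, 3 and 4.
cohort = [3.0, 7.0, 15.0]
z = expression_z_score(0.0, cohort)
print(z)  # -3.0: the sample lies three standard deviations below the mean
```

A strongly negative Z-score of this kind would flag a gene or exon as a candidate for aberrantly low expression, subject to the read-count limitations discussed in the results.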

Results

Cohort characteristics

A total of 1000 individuals were enrolled in this study, of whom 941 were affected by IRD and 59 by ION. Males and females were equally represented (50.4% females, 49.6% males). In X linked disease, the proportion of affected males was naturally higher, for example, 100% in retinoschisis, and 91.7% in choroideraemia.

The median age at time of testing was 39 years (range: 1–85 years), while 10.7% of participants were minors. In early-onset diseases, genetic testing was mostly performed in childhood and young adult age, for example, in achromatopsia (median age: 19 years; range: 3–55 years) and Leber congenital amaurosis (median age: 15.5 years; range: 2–63 years). In contrast, genetic testing was performed later in life in participants with late-onset diseases, for example, in macular dystrophy (mean age: 48 years; range: 9–80 years) and Best vitelliform macular dystrophy (mean age: 46 years; range: 6–75 years). An overview of the age distribution in all clinical subgroups is given in online supplemental figure 1.

The largest phenotypical group within the IRD cohort (n=941) was retinitis pigmentosa (42.4%, n=399), followed by cone-rod dystrophy (9.4%, n=88), Stargardt disease (8.8%, n=83) and macular dystrophy (8.7%, n=82). The smaller ION cohort included 59 cases, 83% of whom were diagnosed with dominant optic atrophy (n=49). Figure 1 shows the number of cases in all clinical subgroups.

Figure 1

Distribution of solved cases, possibly solved cases and unsolved cases among the 25 clinical subgroups. ACHM, achromatopsia; ADRP, autosomal dominant retinitis pigmentosa; ARRP, autosomal recessive retinitis pigmentosa; BBS, Bardet-Biedl syndrome; BCM, blue cone monochromacy; BVMD, Best vitelliform macular dystrophy; CACD, central areolar choroidal dystrophy; CD, cone dystrophy; CHM, choroideraemia; CRD, cone-rod dystrophy; CSNB, congenital stationary night blindness; DOA, dominant optic atrophy; LCA, Leber congenital amaurosis; LHON, Leber hereditary optic neuropathy; MDS, macular dystrophy; MISC, miscellaneous diagnosis; NYS, nystagmus; OA, ocular albinism; SCHI, retinoschisis; SRP, simplex retinitis pigmentosa; STGD, Stargardt disease; UD, unclear diagnosis; USH I, Usher syndrome type 1; USH II, Usher syndrome type 2; XRP, X linked retinitis pigmentosa.

Diagnostic yield

In 57.4% of the participants (n=574), a definite genetic diagnosis could be made. Another 16.7% of participants (n=167) were shown to carry variants of uncertain significance, or variants with unconfirmed biallelism, in genes consistent with their phenotype; these findings require further functional validation. The remaining 25.9% (n=259) of the probands received a negative report. Thirty-one per cent of unsolved cases (8.0% of the entire cohort) carried a single heterozygous likely pathogenic or pathogenic variant in one or more recessive genes, with ABCA4 (n=19), USH2A (n=5) and EYS (n=5) being the most frequently affected.

The overall diagnostic yield (defined as solved and possibly solved cases) was 74.1% for the entire cohort. When considering only IRD cases (n=941), the overall diagnostic yield was 75.1% compared with 59.3% for the smaller ION cohort (n=59). Thirty per cent of participants reported other affected family members. In participants with a positive family history, the overall diagnostic yield increased to 85%.
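The cohort-level figures follow directly from the three case counts reported above. A minimal sketch of the arithmetic, with an illustrative helper name (`percent`):

```python
# Case counts for the entire cohort, as reported in the text.
solved = 574
possibly_solved = 167
unsolved = 259
total = solved + possibly_solved + unsolved


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total)                                     # 1000
print(percent(solved, total))                    # 57.4 (definite diagnoses)
print(percent(solved + possibly_solved, total))  # 74.1 (overall yield)
print(percent(unsolved, total))                  # 25.9 (negative reports)
```

Note that the overall diagnostic yield as defined here includes possibly solved cases, so it is not directly comparable with rates of definite diagnoses.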

The rate of solved cases varied widely among disease entities and was highest in diseases with little or no genetic heterogeneity, such as choroideraemia (100%), X linked retinitis pigmentosa (89.4%) and retinoschisis (84.6%) (figure 1). The rate of solved cases was considerably lower in macular dystrophy (51.2%) and simplex retinitis pigmentosa (36.1%). Overall, disease-causing and possibly disease-causing variants were identified in 190 genes (table 1). The most frequently implicated gene was ABCA4 (16.3%), followed by USH2A (6.3%) and RPGR (4.7%). These three genes accounted for 30.7% of all solved cases. Accordingly, ABCA4 accounted for 5 of the 10 most frequent alleles in the cohort and USH2A for 3 (online supplemental table 2).

Table 1

Distribution of genes with causal variants according to clinical subgroup

Genes and variant types

In total, 1097 different variants were identified. A total of 53.2% of the variants (n=584) have already been described (ie, have an entry in HGMD), while 40.4% of the variants (n=443) were novel (ie, have no entry in HGMD; figure 2A). In addition, 6.5% (n=71) were SVs for which, with one exception, no matching HGMD entries are listed, as most HGMD entries do not specify exact breakpoints.

Figure 2

Characteristics of identified disease-causal genomic variation. (A) Variants with an HGMD entry are shown as previously described; novel variants are those with no HGMD entry so far. Structural variants are shown as a separate fraction because no matching HGMD entries are listed for them, as most HGMD entries do not specify exact breakpoints. (B) Distribution of variant types among the 1097 unique variants identified in the cohort. (C) Subcategories of variants from subgroups ‘other types of variants’ (n=25) and ‘structural variants’ (n=71) from (B). snRNA, small nuclear RNA.

Among the 1097 unique variants, 1026 represent single nucleotide variants or small insertions and deletions, including missense variants (n=548; 49.9%), nonsense variants (n=139; 12.6%), frameshift variants (n=173; 15.7%), in-frame insertion or deletion variants (n=27; 2.4%), canonical splice site variants (n=70; 6.3%) and non-canonical splice site variants (n=44; 4.0%). Less commonly detected variants included intronic variants acting on splicing (n=13), variants in regulatory regions (n=6), small nuclear RNA variants (n=2), start loss (n=1), stop loss (n=1) and tRNA variants (n=2). In addition, 71 unique SVs were identified in 77 individuals (including several members of families with the same SV and individuals with two biallelic SVs), representing 6.5% of variants. Figure 2 shows total numbers of all variant types.
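The variant counts listed above can be cross-checked against the stated totals. The sketch below simply tallies the reported numbers; the dictionary keys are shorthand labels chosen for illustration.

```python
# Counts of unique variants by type, as reported in the text.
common_small_variants = {
    "missense": 548,
    "nonsense": 139,
    "frameshift": 173,
    "in-frame indel": 27,
    "canonical splice site": 70,
    "non-canonical splice site": 44,
}
other_variant_types = {
    "intronic, acting on splicing": 13,
    "regulatory region": 6,
    "small nuclear RNA": 2,
    "start loss": 1,
    "stop loss": 1,
    "tRNA": 2,
}
structural_variants = 71

n_other = sum(other_variant_types.values())
n_small = sum(common_small_variants.values()) + n_other
n_total = n_small + structural_variants

print(n_other)  # 25
print(n_small)  # 1026
print(n_total)  # 1097
```

The subtotal of 25 corresponds to the group labelled 'other types of variants' in figure 2.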

Table 2 provides an overview of SVs. Their size ranged from 118 bp to 2.4 Mb. Multiple exon events were the most frequent (n=35), followed by single exon events (n=25). Larger SVs involving one or more genes were observed in 15 alleles. Few variants involved intronic or upstream regions and one case was shown to carry an unbalanced translocation. In terms of single genes, most SVs were identified in EYS (10 variants) and PRPF31 (9 variants). Most SVs could be characterised with nucleotide resolution of breakpoints. An example of compound heterozygous SVs in EYS is shown in figure 3.

Table 2

Types of structural variants

Figure 3

Illustrative example for biallelic structural variants. Genome sequencing with comprehensive bioinformatic analysis revealed a structural variant with breakpoints in intronic regions and only partial copy number alteration together with a deletion in trans in the EYS gene. The upper part shows the corresponding IGV screenshots, the lower part a schematic representation summarising the structural changes. The HGVS nomenclature is NC_000006.11:g.[65922918_66006755delinsGTTTTCTTTTTA]; [64832337_64839052delins64914341_64945399inv]. Biallelism of the structural variants was confirmed by carrier testing using shallow genome sequencing. HGVS, Human Genome Variation Society; IGV, Integrative Genomics Viewer.

Inheritance patterns

Among cases with a definite genetic diagnosis, pathogenic or likely pathogenic variants were found in autosomal genes (83.5%), X linked genes (15.3%) and mitochondrial genes (1.2%). Most solved cases (56.3%) carried variants in genes in which all variants described so far act exclusively recessively, compared with 13.7% in which variants were found in genes with solely dominant disease-causing alleles described so far. In several genes (in particular RP1, BEST1 and NR2E3), variants can act either dominantly or recessively. Variants in these genes were identified in 13.4% of solved cases. Although 6.4% of cases (n=64) with a definite genetic diagnosis had pathogenic or likely pathogenic variants in more than one gene, the majority of these cases (n=57) had a single molecular diagnosis in addition to being a carrier of a pathogenic or likely pathogenic variant in a gene associated with an autosomal recessive disorder. Note that six cases (namely ARRP 441, ARRP 453, USH II 348, STGD 465 and two affected individuals from family CRD 844) had dual molecular diagnoses. Considering the large overlap of clinical subtypes in IRD, it is difficult to determine the contribution of each gene to the phenotype.

Prescreened cases

We obtained data on the presence or absence of previous genetic testing results in all participants. The majority of the cohort (n=794) had not received any previous genetic testing (cohort A), while in 69 individuals, Sanger sequencing of single or few recurrent genes had been negative (cohort B). Twenty individuals had received a prescreening with a mid-size research gene panel (108 genes)30 with reduced coverage (cohort C), whereas 117 individuals had received genetic testing with a comprehensive diagnostic gene panel (cohort D).28 Diagnostic yield was improved in all prescreened cohorts, with 47 additional definite genetic diagnoses made in cohort B (68.1%), 5 in cohort C (25.0%) and 42 in cohort D (35.9%).
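The proportion of additional definite diagnoses in each prescreened cohort is the number of new diagnoses divided by the cohort size. A brief sketch using the reported counts (the dictionary layout is illustrative):

```python
# Cohort sizes and additional definite diagnoses made by GS, from the text.
untested = 794  # cohort A, no previous genetic testing
prescreened = {
    "B": {"n": 69, "new_diagnoses": 47},   # Sanger sequencing
    "C": {"n": 20, "new_diagnoses": 5},    # 108-gene research panel
    "D": {"n": 117, "new_diagnoses": 42},  # comprehensive diagnostic panel
}

rates = {
    name: round(100 * c["new_diagnoses"] / c["n"], 1)
    for name, c in prescreened.items()
}
n_prescreened = sum(c["n"] for c in prescreened.values())

print(rates)                     # {'B': 68.1, 'C': 25.0, 'D': 35.9}
print(n_prescreened)             # 206
print(untested + n_prescreened)  # 1000
```

The small size of cohort C (n=20) means its rate is correspondingly less precise than those of cohorts B and D.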

RNA-seq in selected cases

We investigated a subset of 74 patients by RNA-seq to evaluate its added diagnostic value in IRD (71) or ION (3) as a genome-wide tool for improved variant interpretation and detection of transcript-deleterious alterations not expected from genome data analysis alone. An average of 11 335.75±2115.13 Mb of sequences was generated per sample with 52.78%±2.26% of the reads mapping to the coding region. The coverage of a set of 2176 housekeeping genes (see online supplemental table 3) was documented as an additional quality control parameter. In 58% of the samples, more than 65% of the housekeeping genes were covered at >20× or more than 80% at >10×. Evaluation of the expression levels of the 28 IRD/ION genes most frequently affected in our cohort indicated that only 5 of them (OPA1, RP2, BEST1, PRPF8 and PRPF31) are stably expressed in blood (defined as transcript per million in >95% of individuals; online supplemental figure 2).

The cohort of 74 individuals included 21 solved cases, 17 unclear cases and 36 unsolved cases. In a first step, we investigated suspected disease-related variants with a predicted effect on splicing. Of the total of 31 prioritised changes, 77.4% (24 of 31) could not be evaluated at the transcript level due to generally low expression of the affected genes, and 16.1% (5 of 31) remained inconclusive because transcript alterations were either not statistically significant in quantitative analyses or the specific regions had insufficient read coverage.

In 6.5% (2 of 31) of cases, the transcriptome analysis provided additional evidence of the functional consequences of identified splice site changes in the genes SCLT1 (individual CRD 671) and SDCCAG8 (individual ARRP 285). Proband CRD 671 carried a heterozygous canonical splice site change predicted to alter the splice acceptor site of intron 15 in trans with a 1.4 kb deletion of the last coding exon, exon 21 (SCLT1 (NM_144643.2): c.[1294–2A>G];[2005–1052_*363del]). While overall SCLT1 expression levels were comparable with controls, there was evidence of multiple aberrant splicing events (online supplemental figure 3), consistent with a functional relevance of the identified SCLT1 variants. Proband ARRP 285 was found to carry a heterozygous predicted frameshift variant in trans with a deep intronic alteration (SDCCAG8 (NM_006642.3): [c.1946_1949del];[740+267C>T]). Transcripts with the frameshift were not observed in the RNA-seq reads, consistent with reduced expression due to nonsense-mediated decay of the mutant transcript. The intronic variant c.740+267C>T is predicted to affect a splice enhancer site, possibly resulting in aberrant splicing comparable with the nearby c.740+356C>T variant.47 In line with this hypothesis, we did not observe any reads supporting the usage of the canonical exon 7–8 splice junction in the patient’s RNA-seq data, but we did observe reads with predominant inclusion of an alternative coding exon as well as other non-canonical transcripts (see online supplemental figure 4). While these findings support a functional relevance of the c.740+267C>T variant, we are aware of the technical limitations associated with the overall rather low SDCCAG8 expression levels and the read length of 2×100 bp, which hinder quantitative assignment of the distribution of specific aberrant transcripts.

The transcript alterations in SDCCAG8 and SCLT1 were subsequently also identified in an unbiased systematic transcriptome-wide prioritisation for aberrant expression, mono-allelic expression or aberrant splicing. Interestingly, the aberrant expression of SDCCAG8 was not significant at the gene level but was evident at the single-exon level, which also allows detection of altered expression of individual isoforms. However, no additional disease-related changes were detected in the untargeted transcriptome-wide analysis.

Discussion

Although many IRD cohort studies have now been published, those involving GS are still scarce. In 2017, Carss and colleagues published a cohort study of 722 individuals with IRD.5 While 605 of the individuals were analysed using GS, 72 had ES, and an additional 45 had both. The composition of their cohort in terms of clinical subgroups and the proportion of singletons, duos and trios is very similar to ours. The main difference is that in their cohort, almost all participants had received some sort of prescreening, whereas in ours, this fraction was only 20.6%. Carss and colleagues achieved a pathogenic variant detection rate of 56%, which almost exactly matches our rate of 57.4%.5 In their study, 31% of cases that remained unsolved after ES could be solved by GS. Another IRD cohort study with 562 individuals was performed by Ellingford and colleagues in 2016, but only 46 cases underwent GS.13 They established a molecular diagnosis in 50% of cases which is consistent with the present study and that of Carss and colleagues. Ellingford and colleagues hypothesised that diagnostic yield could have been increased by 29% if their GS pipeline had been applied to all 562 patients in their cohort, rather than just 46.13 However, they concede that much of this hypothetical uplift could also be achieved by improved variant calling in targeted NGS pipelines and the inclusion of non-coding regions that are known to harbour pathogenic variants. Recently, Biswas and colleagues have analysed 409 individuals from 108 unrelated pedigrees with IRD using GS and achieved a molecular diagnosis rate of 57%.12 In summary, our detection rate is very close or equal to that of other studies. In the following, we will discuss several aspects of our study in more detail.

Prescreening

In our cohort, 206 individuals had received prior genetic testing, either based on Sanger sequencing of single or few recurrent genes (cohort B, n=69), research-based testing using molecular inversion probes targeting 108 IRD genes (cohort C, n=20) or diagnostic-based testing using a comprehensive gene panel for inherited eye disorders (cohort D, n=117). Individuals who had not had any genetic test prior to this study (cohort A) had a notably higher rate of likely molecular diagnoses (76.2%) than those of cohorts B, C and D (66.0%). Nevertheless, GS improved diagnostic yield in all prescreened cohorts: 47 definite diagnoses were made in cohort B, 5 in cohort C and 42 in cohort D. This was achieved primarily through the detection of variants that reside in genes that have not been included in the first-tier testing. However, GS also improved diagnostic yield by the delineation of balanced48 or complex SVs that were undetectable by prior testing strategies (see examples table 2 and figure 3).

Structural variants

We previously examined a cohort of 2158 individuals diagnosed with IRD using a targeted gene panel and found that the genotypes of 91 individuals (4.2%) comprised CNVs.28 These variants were mainly recurrent deletions of single or multiple exons in four different genes (PRPF31, USH2A, EYS and CHM). In the present study, clinically relevant SVs were detected in 77 out of 1000 individuals (7.7%). These variants were found in 37 different genes and comprised not only CNVs but also copy-neutral rearrangements (see table 2). The nearly doubled detection rate of SVs in the present study is most plausibly explained by the improved detection of SVs by GS, especially copy number neutral ones. In addition, GS allows breakpoint resolution to the nucleotide level in many cases. This facilitates segregation analysis in family members, as breakpoint PCR can easily be performed instead of a more laborious qPCR.

CNVs were identified in three genes not previously reported to have large deletions or duplications (CNGA1, CWC27, SCLT1), expanding their mutational spectrum.

Trios

Trio analysis (ie, including the parents of an affected child in the analysis) aids in the filtering of familial benign variants and allows the direct assessment of the inheritance pattern of candidate variants, as well as their phase. It also reliably identifies de novo variants. In this study, we analysed 18 IRD trios. Of these, 13 are considered solved with pathogenic or likely pathogenic variants, while 1 case was shown to be compound heterozygous for a likely pathogenic variant and a variant of uncertain significance. Four trios remained unsolved. Hence, the diagnostic yield among the trios was 72.2%, which is essentially equal to the overall diagnostic yield of the entire cohort (74.1%). Therefore, we cannot conclude that trio analysis has a major benefit in IRD diagnostics, although the number of trios is probably too small to draw a definite conclusion.
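The caveat about the small number of trios can be illustrated with a confidence interval for the proportion 13/18. The sketch below uses the Wilson score interval, which is our choice for illustration and not a method reported in the study; the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 13 solved trios out of 18 analysed.
low, high = wilson_interval(13, 18)
print(f"trio yield 95% interval: {low:.1%} to {high:.1%}")
```

With only 18 trios, the interval spans roughly 49% to 88%, so it includes the overall yield of 74.1% but would also be compatible with a substantially higher or lower yield.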

De novo variants

Since parental DNA was available only in a subset of cases, we could not assess the actual frequency of de novo variation in our cohort. A de novo status of variants could be confirmed in three cases. While de novo variants have previously been found in 3.2–14.3% of cases diagnosed with retinitis pigmentosa,20 49 the proportion is generally unknown in IRD because most cohort studies have not assessed the de novo status. In any case, most variants in IRD can be expected to be inherited, as reproductive fitness is unlikely to be affected, in contrast to developmental disorders where the proportion of de novo variants is high.6

Cases solved by additional measures

The inclusion of difficult-to-sequence genomic regions is essential in IRD, since two clinical subgroups are solely caused by X-chromosomal genes that are characterised by highly repetitive sequences. X linked retinitis pigmentosa is most commonly caused by mutations in RPGR and, to a lesser extent, RP2. The RPGR gene includes a C-terminal exon, termed ORF15, which is characterised by a highly repetitive glutamic acid/glycine-rich domain. While ORF15 is difficult to sequence, it is also a known mutation hot-spot that comprises 60% of disease-causing variants in RPGR.50 51 Based on clinical examination and self-reported family history, 39 cases in our cohort were diagnosed with X linked retinitis pigmentosa. Among these, 35 cases could be solved with pathogenic or likely pathogenic variants in RP2 (n=9) or RPGR (n=26), leading to a diagnostic rate of 89.7%. This rate is only slightly lower than that of our former study (93.7%) based on targeted sequencing.28 To ensure a high diagnostic sensitivity for variants in ORF15, all unsolved IRD cases were resequenced for this region. This was done for all these cases because small core families, which are prevalent in Germany, make it challenging to determine the mode of inheritance accurately. Resequencing led to a molecular diagnosis in only one case, suggesting that GS is capable of detecting most variants in the notoriously difficult-to-sequence region of ORF15.

Another under-represented target region is the red/green cone opsin (OPN1LW/OPN1MW) gene cluster associated with cone dysfunction disorders, such as blue cone monochromacy. Short-read sequencing is not sufficient to reveal the complexity of the opsin gene cluster and to distinguish between copies, since the OPN1LW/OPN1MW genes, which are present in variable number within the cluster, share 98% identity in both coding and non-coding nucleotide sequence.52 Using IGV, the red/green opsin gene cluster (OPN1LW/OPN1MW) was manually inspected in cases diagnosed with blue cone monochromacy and eventually analysed using a customised genotyping strategy in a research set-up.52 With this approach, 8 out of 11 cases with blue cone monochromacy could be solved.

Thirteen different intronic variants were identified in this study, seven of which are already known to cause a splicing defect, in particular the c.4253+43G>A variant in ABCA4 and the c.2991+1655A>G variant in CEP290.53 54 The interpretation of non-coding variants is challenging and needs complementary methods to decipher their functional impact. Of the six novel deep intronic variants, we have functionally characterised one, namely the c.1033-327T>A variant in POC1B.55 In addition, we have analysed a silent exonic variant (c.750A>G) in MFSD8.56 Using in vitro splice assays and direct cDNA analysis, we could demonstrate a pathogenic effect for the c.1033-327T>A variant in POC1B identified in patient MDS 438 and the c.750A>G variant in MFSD8 identified in family MISC 272.55 56

Unsolved cases with monoallelic variants and diagnostic value of RNA-seq

Eighty cases were found to harbour pathogenic or likely pathogenic variants on one allele in genes associated with recessive disease. While it cannot be ruled out that these individuals are just incidental carriers, it is entirely possible that an elusive pathogenic variant resides on the second allele. Searching for these elusive variants and determining their potentially deleterious effects, for example, through a systematic functional evaluation of rare deep-intronic variants, might prove crucial to improving the diagnostic rate. As for the latter, the results of our pilot study on 74 cases suggest that the added diagnostic value of RNA-seq from blood is rather limited in IRD. At first glance, this observation contrasts with RNA-seq studies suggesting improved diagnostic variant interpretation in up to 13–16% of cases, but this is likely to depend largely on the clinical diagnosis and expression of associated genes in accessible tissue.57 58 Apart from the disease phenotypes and tissues studied, the extent of concomitant DNA sequencing studies and bioinformatic analyses may contribute to the differences in the percentage of additional cases solved. For example, most studies were performed on cases that remained unresolved after exome sequencing, so RNA-seq could contribute significantly to the technical accessibility of non-coding regions and additional types of genomic variation. We expect that this benefit will decrease over the next years with improved detection and annotation of intronic genomic variation and diagnostic-grade genome analyses becoming widely available as a first-line diagnostic assay. However, we also agree on the importance of additional biosamples such as fibroblast cell lines and induced pluripotent stem cells, together with organoids or specialised cells derived from them, as a resource for second-line analyses in a research context.

Age at molecular diagnosis

In this study, age at onset was not assessed. However, we used the age at genetic testing as a variable. The mean age at molecular diagnosis for solved and possibly solved individuals was 38.2 years compared with 45.4 years for unsolved cases. This difference in the age distribution is statistically significant (Mann-Whitney U test, p<0.001). Remarkably, these figures are almost identical to those from our previous study, where the average age of participants with a molecular diagnosis was 39.8 years, while it was 46.3 years for unsolved cases.28 The differences in age distribution between solved and unsolved cases are reflected in the differences in age distribution within clinical subgroups. The median age at genetic testing in early-onset diseases such as achromatopsia or Leber congenital amaurosis was significantly lower than that for late-onset diseases such as macular dystrophy or central areolar choroidal dystrophy (online supplemental figure 1). For the latter, the rate of solved cases was significantly lower than for the former. A skewing of diagnostic yield towards an earlier age of onset/testing in IRD cohorts has been reported previously25 and has been attributed to a potential multigenic or multifactorial aetiology in participants with an age of onset >50 years.59

Benefits of GS

Although it is difficult to estimate numbers, a considerable proportion of undiagnosed cases in IRD may be attributed to variant types which can only be detected using GS. One of the major benefits of GS is the more uniform coverage compared with other technologies. This can be seen, for example, in gnomAD which contains both ES and GS datasets: while 89.4% of the exome was covered at ≥20× read depth in the ES datasets, this value was exceeded by GS with 97.1% coverage.60 Since the ES datasets were generated before the GS datasets, improvements in the general sequencing process may account for some of the differences, but the more uniform coverage of GS has been observed repeatedly.3 4 Moreover, GS requires lower average coverage to obtain the same sensitivity and accuracy in variant calling compared with ES.61 62 Along this line, we aimed to estimate the added diagnostic value of genome versus exome sequencing using an in silico approach focused on sequence coverage. From at least 10 representative diagnostic grade exome datasets generated with Agilent SureSelect Human All Exon Kits V6 and V7 as well as a Twist-based custom enrichment kit, the coverage of the genomic positions of the reported variants was calculated. Depending on the testing system investigated, the diagnostic coverage cut-off applied (below 20× or 6× coverage) and the types of variation included (small variants with/without structural variants), the rate of exome failures ranged from 1.7% (<6×, small variants only, Twist) to 8.5% (<20×, including structural variants, SureSelect Human All Exon V6). Depending on the quality of an exome or panel experiment and the extent of the initial diagnostic work-up, we assume that an estimated 5–10% of additional diagnoses is achievable using a bioinformatic pipeline optimised for data analysis beyond the exome and a constantly growing list of clinically annotated DNA variants in the non-coding regions. For inherited eye diseases, this estimate is also supported empirically by a diagnostic yield of 70.8% documented in a cohort recruited in the same clinical setting but investigated by targeted approaches,28 compared with 76.8% in the current study, corresponding to 7.8% of additional diagnoses.
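The coverage-based in silico comparison described above can be sketched as follows. This is an illustrative outline only, not the pipeline used in the study: the data structure, the function name and all depth values are invented, and we assume one summary depth per variant position (for example, a mean across the exome datasets).

```python
from dataclasses import dataclass


@dataclass
class ReportedVariant:
    label: str           # hypothetical description of a reported variant
    exome_depth: float   # assumed summary read depth at this position in exome data
    is_structural: bool  # True for structural variants, False for small variants


def exome_failure_rate(variants, min_depth, include_structural=True):
    """Fraction of reported variants whose exome depth is below the cut-off."""
    considered = [v for v in variants if include_structural or not v.is_structural]
    if not considered:
        raise ValueError("no variants to evaluate")
    missed = [v for v in considered if v.exome_depth < min_depth]
    return len(missed) / len(considered)


# Invented example values, for illustration only.
variants = [
    ReportedVariant("coding SNV", 85.0, False),
    ReportedVariant("indel in GC-rich exon", 12.0, False),
    ReportedVariant("deep-intronic SNV", 0.0, False),
    ReportedVariant("inversion breakpoint", 3.0, True),
]

rate_20x = exome_failure_rate(variants, min_depth=20)
rate_6x_small = exome_failure_rate(variants, min_depth=6, include_structural=False)
print(f"<20x, including SVs: {rate_20x:.1%}")
print(f"<6x, small variants only: {rate_6x_small:.1%}")
```

The toy rates are far higher than the 1.7–8.5% reported, because the invented list deliberately over-represents poorly covered positions. The sketch is meant only to show how the cut-off and the inclusion of structural variants change the estimate.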

In conclusion, GS can provide specific genetic diagnoses for a considerable proportion of patients with IRD and ION. A molecular diagnosis provides individuals with improved genetic counselling, redirection of clinical management, monitoring for potential systemic manifestations, and preparedness for existing and future precision therapies. GS avoids serial testing of unsolved cases and is the only platform interrogating non-coding regions. Whereas ES is currently the favoured first-tier diagnostic tool, an increasing number of studies have used GS as a first-tier test.9 63–65 Given the continuous decrease of sequencing costs and increase in sequencing capacity, the number of GS analyses in a diagnostic setting is expected to increase further. The full extent of the benefits of GS will probably become more apparent when bioinformatics predictions have improved and allow for a better selection of candidate intronic variants. However, the work-up of these intronic variants through functional follow-up studies will likely remain a challenge. While targeted approaches have been shown to be effective for specific variants in selected genes,55 66 they are rather difficult to scale. Generic approaches with putatively broad clinical implementation, such as RNA-seq, currently lack adequate models, and readily accessible tissues such as blood are informative for only a small subset of disease genes underlying IRD or ION. This makes documentation of the results of such studies in publicly available databases all the more important as a community effort to improve the diagnosis of future patients.
