In this proof-of-concept study, we present a workflow to classify pediatric CNS tumors using a DNA methylation assay that was developed for fragmented cfDNA isolated from liquid biopsies, and that works on both DNA from fresh frozen and paraffin-embedded tissues. Tumor type and tumor fraction are estimated using computational deconvolution based on a reference dataset containing published methylation data of brain tumor tissues complemented with healthy cfDNA profiles. We found that the tumor classification correlates well with histopathological diagnosis for good quality cfDNA samples. We identified several pitfalls of our approach related to CSF collection and CSF characteristics, as well as opportunities for improvement which require further validation on larger patient cohorts before clinical implementation.
In summary, 7 out of 20 samples from pediatric CNS cancer patients are classified correctly using cfDNA from CSF. All samples with a high cfDNA/total DNA fraction were classified correctly (6/20). In most misclassified samples, we observe increased fractions of HMW-DNA (length over 700 bp) among the DNA isolated from CSF, with 14 samples showing a cfDNA fraction lower than 0.5. Additionally, more than half of the samples exhibit a cfDNA yield below 5 ng. The scarcity of cfDNA in CSF is not emphasized in similar studies, yet it is notable that published articles using cfDNA from CSF for tumor classification and follow-up almost exclusively focus on higher grade tumors [36, 37, 44, 45, 46, 47, 48, 49, 50]. Although circulating tumoral DNA (ctDNA) levels are not defined by tumor grade alone, grade is an important variable in the release of fragmented DNA [51]. Besides the more urgent clinical need for these more aggressive tumors, the lack of studies on lower grade tumors most probably reflects the difficulty of obtaining sufficient ctDNA material. The Afflerbach study [37], in contrast, does include low- and high-grade tumors, and indeed underscores the low abundance of ctDNA in the CSF. In that study, owing to minimum input requirements of 1 ml CSF and 5 ng of DNA, only 72% of samples were suitable for nanopore sequencing. Of these samples, 17% passed the minimal technical requirements for methylation profiling and correct classification. Although our cohort size prevents direct comparison, the cfRRBS approach is consistent with these results, generating methylation profiles for all included samples and correctly estimating the tumor diagnosis in 30% of the samples. Still, obtaining sufficient ctDNA appears challenging for lower grade tumors.
In the paragraphs below, we discuss the different factors that impact the performance of our classification approach, including HMW-DNA contamination, tumoral cfDNA fractions and reference dataset.
In previous cfRRBS studies, we have already highlighted the importance of assessing the cfDNA/HMW-DNA fraction in each sample [39]. HMW-DNA that is also processed during cfRRBS library preparation dilutes the cfDNA signal and, at high fractions, might lead to misclassification. The HMW-DNA is likely derived from cells that are damaged during ventricular drain placement or from white blood cells in blood-contaminated samples. In this study, we show that centrifugation of a fresh CSF sample before DNA isolation improves sample quality in many cases. Freezing the CSF prior to centrifugation results in lysis of the cells and thus release of HMW-DNA into the fluid. Centrifugation within four hours after collection and before freezing is therefore ideal, but more challenging to implement in clinical practice. In addition, avoiding blood cell contamination will result in better quality samples. Furthermore, although the yield of ctDNA relates to variables such as tumor size and aggressiveness [51], sampling in close proximity to the tumor through a ventricular drain might collect more ctDNA than sampling via lumbar puncture. These observations underscore the need for more dedicated studies investigating the pre-analytical variables that can improve sample quality, as well as more standardized protocols for CSF collection that ensure routine collection of high-quality samples allowing robust tumor classification. This need is highlighted by the wide variation in published collection protocols, which encompass collection through lumbar puncture, ventricular drainage, or mixed cohorts, with volumes ranging from 200 µl to 10 ml. While centrifugation is commonly included, some studies omit this step [30, 31, 33, 34, 35, 37, 45].
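The cfDNA/HMW-DNA assessment discussed above can be illustrated with a minimal sketch. It assumes that a fragment-size profile is available as (fragment length in bp, DNA mass) pairs, for example exported from a capillary electrophoresis run, and it uses the 700 bp boundary mentioned earlier for HMW-DNA. The function name, the input format, and the example values are hypothetical and are not taken from the study's pipeline.

```python
HMW_THRESHOLD_BP = 700  # fragments longer than this are counted as HMW-DNA


def cfdna_fraction(size_profile, threshold_bp=HMW_THRESHOLD_BP):
    """Return the share of total DNA mass at or below threshold_bp.

    size_profile: iterable of (fragment_length_bp, mass) pairs. The mass unit
    is arbitrary as long as it is the same for every pair (assumed format).
    """
    total = 0.0
    short = 0.0
    for length_bp, mass in size_profile:
        if mass < 0:
            raise ValueError("mass must be non-negative")
        total += mass
        if length_bp <= threshold_bp:
            short += mass
    if total == 0:
        raise ValueError("profile contains no DNA")
    return short / total


# Invented profile: short cfDNA-sized fragments plus a large HMW component.
example_profile = [(167, 1.2), (330, 0.4), (1500, 1.6), (8000, 0.8)]
example_fraction = cfdna_fraction(example_profile)
```

For this invented profile, the cfDNA fraction is about 0.4, which would place the sample among those with a cfDNA fraction lower than 0.5.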
The absence of a well-defined profile characterizing the non-tumoral background in cerebrospinal fluid (CSF) limits accurate estimation of the fraction of a specific tumor type. This challenge becomes particularly pronounced when CSF samples contain lower tumor fractions and more non-tumoral cfDNA. In samples with cfDNA fractions below 50%, most are classified as central neurocytoma or infantile hemispheric glioma, and classification accuracy is better in cfDNA samples with higher estimated tumor fractions. In blood samples with lower tumor fraction, the non-tumoral cfDNA mostly originates from white blood cells (WBC) [42]. For CSF, however, the origin of the non-tumoral cfDNA is not clearly defined: it might originate from WBC, but also from brain tissue damaged by increased pressure in patients presenting with hydrocephalus. In the CSF of 4 low-grade cancer patients, central neurocytoma (CN) is the highest estimated tumor signal. This might be explained by ventricular cells that are damaged by the increased intracranial pressure and that resemble the methylation profile of the ventricular CN tumor. Similar effects might be present for infantile hemispheric glioma (IHG), which has the highest estimated tumor fraction in 7/20 patients. To investigate this hypothesis, we would need CSF samples from non-tumoral pediatric patients with increased intracranial pressure, which are very difficult to obtain.
Although the Capper reference data include various healthy brain tissues, pediatric profiles often differ from adult profiles. Additionally, increased intracranial pressure does not correspond to a physiological state, and hydrocephalus background profiles might be interpreted as tumor entities. To allow proper deconvolution of all contributing cell types in the CSF-cfDNA samples, a reference dataset encompassing all those cell types is necessary. However, pediatric CSF collection is only performed in patients with a (suspected) brain-related pathology, so obtaining a reference sample and DNA methylation profile from pediatric hydrocephalus patients without brain pathologies is almost impossible. One option would be to include CSF from patients with hydrocephalus caused by a traumatic brain injury.
The performance of deconvolution algorithms relies heavily on the choice of reference data. The DNA methylation-based assay for CNS tumor diagnosis is used in an increasing number of pathology departments and employs the Infinium HumanMethylation450 BeadChip array [11]. This array covers more than 450,000 methylation sites and performs well in distinguishing different tumor entities. However, a drawback is the recommended input of 500 ng DNA [52], a quantity far exceeding the average cfDNA yield from liquid biopsies. To address this challenge, we successfully employed cell-free reduced representation bisulfite sequencing (cfRRBS), an approach tailored for low quantities of highly fragmented DNA, requiring 10 ng of input DNA or less to generate high-quality DNA methylation profiles [38]. We formatted the published 450 K array methylation data [11] of CNS tumors to align with the cfRRBS workflow and used it as a reference dataset for deconvolution. A limitation of this approach is that we only use the sites covered by both the 450 K array and the cfRRBS assay, which amount to only 13.7% of the methylation sites covered by the cfRRBS assay. With this restricted number of sites, discriminating low-grade glioma tumors became more challenging, as visualized in the UMAP plot (Fig. 1) compared to the published UMAP [11]. Building a (cf)RRBS-based reference dataset would enable the use of all cfRRBS regions in the deconvolution model and thus increase the information available to discriminate different tumor entities; however, this would come with additional effort and costs. This problem highlights the trade-off between maximizing data inclusivity and managing data availability or associated financial constraints.
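As an illustration of reference-based deconvolution restricted to shared sites, the sketch below models a sample's methylation values as a non-negative mixture of reference profiles and estimates the mixture weights by non-negative least squares. This is an assumption made for illustration: the text above does not specify the deconvolution algorithm, and the site identifiers, entity names, and beta values are invented. A simple projected gradient solver keeps the example dependency-free; a library solver such as `scipy.optimize.nnls` would be the more usual choice.

```python
def shared_sites(reference, sample):
    """Sites present in both the reference and the sample, in sorted order."""
    return sorted(set(reference) & set(sample))


def deconvolve(reference, sample, entities, iterations=2000):
    """Estimate non-negative mixture fractions that sum to 1.

    reference: {site: {entity: beta}}; sample: {site: beta}.
    Minimises 0.5 * ||A w - b||^2 subject to w >= 0 by projected gradient
    descent, using only the sites shared by reference and sample.
    """
    sites = shared_sites(reference, sample)
    if not sites:
        raise ValueError("reference and sample share no sites")
    A = [[reference[s][e] for e in entities] for s in sites]
    b = [sample[s] for s in sites]
    n = len(entities)
    # The squared Frobenius norm is an upper bound on the largest eigenvalue
    # of A^T A, so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    w = [1.0 / n] * n
    for _ in range(iterations):
        residual = [sum(a * x for a, x in zip(row, w)) - y
                    for row, y in zip(A, b)]
        grad = [sum(row[j] * r for row, r in zip(A, residual))
                for j in range(n)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, x - step * g) for x, g in zip(w, grad)]
    total = sum(w)
    if total == 0:
        raise ValueError("no reference entity explains the sample")
    return {e: x / total for e, x in zip(entities, w)}


# Invented toy data: "cg06" exists only in the array-based reference and
# "rrbs_only_1" only in the sequencing-based sample, so both are dropped.
entities = ["tumor_A", "tumor_B", "healthy_cfDNA"]
reference = {
    "cg01": {"tumor_A": 0.9, "tumor_B": 0.1, "healthy_cfDNA": 0.1},
    "cg02": {"tumor_A": 0.1, "tumor_B": 0.9, "healthy_cfDNA": 0.1},
    "cg03": {"tumor_A": 0.1, "tumor_B": 0.1, "healthy_cfDNA": 0.9},
    "cg04": {"tumor_A": 0.8, "tumor_B": 0.8, "healthy_cfDNA": 0.1},
    "cg05": {"tumor_A": 0.2, "tumor_B": 0.7, "healthy_cfDNA": 0.9},
    "cg06": {"tumor_A": 0.5, "tumor_B": 0.1, "healthy_cfDNA": 0.6},
}
# Sample simulated as 30% tumor_A, 10% tumor_B, 60% healthy_cfDNA.
sample = {"cg01": 0.34, "cg02": 0.18, "cg03": 0.58, "cg04": 0.38,
          "cg05": 0.67, "rrbs_only_1": 0.5}
fractions = deconvolve(reference, sample, entities)
```

In this toy case the simulated mixture is recovered from the five shared sites. With real data, fewer shared sites mean fewer equations constraining the weights, which is one way to see why restricting the site set can make similar entities harder to separate.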
In addition to the restricted number of sites, the published version of the classifier shows challenges in discriminating low-grade glioma tumors, resulting in less accurate predictions for this particular subtype [53]. Newer versions of the classifier can improve classification for several challenging tumor types including low-grade gliomas; however, the reference data of newer versions are not publicly available [53].
Interestingly, the data produced via cfRRBS can also be used for CNV profiling. Although these data are noisier than those from dedicated CNV profiling assays such as shallow whole-genome sequencing (shWGS), extracting multiple data layers from cfRRBS reads without requiring new input material is an important asset. In contrast to most cfDNA shWGS approaches for CNV analysis, cfRRBS lacks a size separation step, so HMW-DNA is also processed [38], resulting in a dilution of the tumoral signal. Indeed, for samples with a tumoral fraction below 30%, we could not observe any tumor-associated aberrations. For the patients with matched tumor and CSF material and higher estimated tumor fractions, we observed some CSF-specific aberrations suggesting intratumoral heterogeneity, similar to the results described by Chicard et al. [44]. However, it is notable that the lower quality of the CNV profile data limits the number of patients for whom the CNV profile can be analyzed accurately (Fig. 5, Supplementary Fig. 2).
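The dilution effect on CNV detection can be made concrete with a standard mixture calculation. The sketch below assumes a diploid non-tumoral background and a clonal aberration present in every tumor cell, and it treats the tumoral fraction as the share of tumor-derived DNA among all processed DNA. These are simplifying assumptions for illustration, and the function name is hypothetical.

```python
import math


def expected_log2_ratio(tumor_fraction, tumor_copies, normal_copies=2):
    """Expected log2 copy ratio of a segment in a tumor/non-tumor DNA mixture.

    tumor_fraction: share of tumor-derived DNA in the sample (0 to 1).
    tumor_copies: copy number of the segment in tumor cells.
    normal_copies: copy number in the non-tumoral background (assumed 2).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    mixed = (tumor_fraction * tumor_copies
             + (1.0 - tumor_fraction) * normal_copies)
    if mixed <= 0:
        raise ValueError("segment is absent from the mixture")
    return math.log2(mixed / normal_copies)


# Single-copy gain (3 copies) and single-copy loss (1 copy) at 30% tumor DNA.
gain_at_30 = expected_log2_ratio(0.3, 3)
loss_at_30 = expected_log2_ratio(0.3, 1)
```

Under these assumptions, a single-copy gain at a tumoral fraction of 30% shifts the log2 ratio by only about 0.2, and the shift shrinks further as the fraction drops, so noisy profiles can easily mask such aberrations.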
An important factor to consider for clinical implementation of an assay is the time between sampling and reporting of results. For the proposed assay (cfRRBS followed by computational analysis), the turnaround time is roughly 5 days in an optimized setting where samples are processed immediately after collection. This is a reasonable turnaround time for molecular diagnostics and falls well within the median turnaround time of 21 days reported for most targeted NGS and DNA methylation profiling assays [54]. Another important advantage is that the proposed workflow is designed almost completely as a single-tube reaction, which facilitates clinical implementation through a fully automated liquid handling system [38]. Additionally, the cfRRBS protocol is cost-effective compared to other sequencing methods. By targeting particular subsections of the genome, sufficient sequencing coverage is achieved using only 20–25 M reads per sample [38].