A total of n = 155 patients with newly diagnosed AML treated within the AMLSG 12-09 trial and with available pre-treatment bone marrow samples were randomly assigned to a screening (n = 58) and a refinement and validation cohort (n = 97). The median age in the screening cohort was 63 years (range 20 to 78) and 60 years in the refinement and validation cohort (range 19 to 82). 95% and 94% of patients were younger than 75 years in the two cohorts, respectively. Sex distribution was imbalanced between both cohorts with 52% female patients in the screening cohort and 41% in the refinement and validation cohort.
Out of 58 patients of the screening cohort, 18 received standard of care treatment (STD) and 40 patients received experimental treatment (EXP) comprising AZA as substitute for cytarabine (araC) (Fig. 1A).
Fig. 1Analysis overview. A Overview of analysis steps based on DNA isolated from mononuclear cells from each pretreatment bone marrow aspirate from a subset of 155 AML samples derived from the AMLSG 12-09 trial. Global, genome-wide methylation status of a training set was analysed via MCIp followed by NGS-analysis on the HiSeq 2k platform. Differentially methylated regions were derived and ranked according to p-values and effect size. Methylation levels within a set of top regions were validated via 450k analysis at single CpG resolution and used to generate a classifier. B The validation cohort consisted of an independent subset of patients derived from the AMLSG 12-09 collective. Methylation status of the classifier contained CpGs was analysed via MassARRAY assay and used for validation. CR was defined as non-detectability of evidence for disease both cytomorphologically and via immunophenotyping in peripheral blood smear and bone marrow aspirate as well as via molecular genetics. AML acute myeloid leukemia; DMR differentially methylated regions; MCIp methyl-CpG immunoprecipitation; HiSeq 2k the HiSeq next-generation sequencing platform; NGS next generation sequencing; 450k Infinium® HumanMethylation450 Bead Chip; MassARRAY a benchtop multiplex genetic analyzer utilizing Matrix assisted laser desorption/ionization; time-of-flight mass spectrometry; std standard therapy arm; exp experimental therapy arm; CR complete response; RD refractory disease
Within the STD arm, 10 patients achieved complete remission (CR) and 8 patients had incomplete remission/induction failure (referred to as refractory disease, RD). For the EXP arm, CR was achieved in 19 patients and RD in 21 patients, respectively.
Within the refinement and validation cohort, out of 97 patients, 40 were treated in the STD arm and 57 in the EXP arm. CR within STD treatment was achieved in 29 patients, while 11 patients had RD. For 34 patients with EXP therapy, CR was observed, while 23 patients had RD.
In the screening cohort (refinement and validation cohort, correspondingly in brackets), median white blood cell count was 6 G/l with a range of 0.6–155 G/l (6 G/l; range 1–214 G/l), median peripheral blood blast count was 17.5% with a range of 0–97% (23%; range 0–97% and median bone marrow blasts were 60.5% with a range of 15–100 (70%; range 10–100%) (Table 1).
Cytogenetic analysis revealed 21 (35) patients with a normal karyotype (CN), 9 (17) patients with a complex karyotype (CK), 1 (4) patient with a 5q-minus-syndrome or loss of chromosome 5 (del(5q)/-5), 2 (1) with a MECOM rearrangement (inv(3)/t(3;3)), 3 (5) patients with a translocation 11q23 and 15 (22) patients with a karyotype not otherwise specified. In total, cytogenetic information was missing in 7 (13) cases (Table 1). Recurrent aberrations leading to genotype-specific therapeutic approaches (e. g. FLT3-ITD) at the time of inclusion in the study were excluded according to the protocol. Mutational status in a panel of seven genes recurrently mutated in myeloid neoplasia (TP53, ASXL1, DNMT3A, RUNX1, IDH1, IDH2, TET2) including regulators of the epigenotype did not correlate with AZA response. However, a significant difference in the number of DNMT3A mutations was observed between the screening and validation cohort. Moreover, there was no association of cytogenetic subgrouping or mutations in epigenetic modifier genes with therapy response (Additional file 2: Fig. 1). To assess the impact of the mutations on the overall methylation landscape in the screening cohort, we performed unsupervised clustering of the 1.000 and 10.000 most variable 500 bp bins of the MCIp analysis (Additional file 2: Fig. 2). Moderate clustering with discrete methylation patterns of DNMT3A and IDH2 was evident, whereas IDH1 and ASXL1 did not appear to have a significant impact. The major clusters of this unsupervised hierarchical clustering were not primarily driven by the mutations in the epigenetic modifier genes. In addition, we reviewed the distribution of mutations in the epigenetic modifier genes as well as the distribution of cytogenetic aberrations in our potential top DMRs between responding and refractory patients after evaluation for differential methylation (Additional file 2: Fig. 3). There was no segregation with response. Standard and experimental treatment arms within the screening cohort did not differ significantly regarding clinical characteristics except for bone marrow blast counts which were significantly higher in the exp-arm (66.5% versus 50.0%) than in the std-arm (p = 0.02) (Additional file 4: Table 2).
Fig. 2Significant Baseline DNA Methylation Differences reveal less methylation in refractory disease of AZA containing treatment regimens. A Volcano Plot illustrating methylation differences between AZA-sensitive and AZA-resistant (Experimental Therapy) as well as B induction sensitive and induction resistant patients (Standard Therapy). Mean methylation difference between the 2 groups is represented on the x axis and statistical significance (-log10 unadjusted p-value) on the y axis. Negative binomial distribution-based testing with edgeR identified 1755 DMRs, indicated by red and blue dots (FDR < 5% with adjustment for multiple testing)
Fig. 3Technical Validation of Differentially Methylated Regions. A Selection of EdgeR-based testing results for differential methylation between responders and non-responders both in EXT and STD arm, prior to validation. B Validation criteria are exemplarily illustrated for the 500 bp region assigned to WNT10A and its corresponding probe cg22587479. For this probe, a strong and distinct correlation between beta values and RPKM exists (Spearman’s rank correlation coefficient > 0.8). Differences in beta regression levels between resp. and non-resp. patients showed statistical significance and overall methylation differences showed congruency in the change between modalities, i.e. hypermethylation in patients with refractory disease both in the MCIp-seq and 450k assay. CR Complete Response; RD Refractory Disease; RPKM Reads per kilobase per million mapped reads
Genomewide DNA methylation screening within the screening cohortFor the development of a predictive classifier based on genome wide differential DNA methylation patterns, methyl-CpG immunoprecipitation-seq (MCIp-seq) of BM PBMC from AML patients in the screening cohort (n = 58) either treated within the STD or EXP arm was performed (Fig. 1B). Seven samples from the STD and EXP arm with very low read counts (mean read count < 1.0 × 10^6 reads) were flagged as outliers and removed from further analyses (Additional file 2: Fig. 4). A total of 51 samples remained in the screening cohort.
Fig. 4Significantly differentially methylated CpGs in close proximity to CpGs from original classifier define a recomposed classifier within an independent validation cohort. A Box plots for significant differences in methylation levels between responders (blue color) and non-responders (red color) as assessed by non-parametric wilcoxon rank sum tests in the EXP arm. Methylation levels were determined by MassARRAY assay. B Elements of a recomposed classifier based on a penalized likelihood regression model
Informative differential CpG methylation was retrieved for 664,227 (14%) out of more than 6 × 10^6 genomic bins enriched for high methylation by removing all bins with no reads across all or all but one sample.
Principal component analysis did not show formation of sample clusters (Additional file 2: Fig. 4) and components of variance did not display major effects in DNA methylation variance that allowed to reliably separate between treatment groups or response status. Overall, differences in variance distribution across principal components were subtle (data not shown).
Differential methylation analysis between responding and non-responding patients revealed twice as many regions with a significantly differing positive log-fold change (n = 384 vs. n = 157) in patients within the experimental treatment arm (Fig. 2A) indicating a higher fraction of hypermethylated regions in patients with refractory disease [40]. In the standard treatment arm, predominantly negative log-fold changes were observed within the group of responders (factor of 9.5 with n = 1109 vs n = 105) (Fig. 2B).
Overall distribution of differentially methylated regions (DMR) (n = 5.7 × 10^6, comprising both the EXP- and STD-set after filtering for positive read counts across all samples) within the filtered set of genomic bins shows higher read counts in exons while the set of top DMRs shows higher proportions of read counts with an intergenic and intronic location (Additional file 2: Fig. 5).
Fig. 5Quality assessment of the predictive model. A The AUC for the recomposed classifier is 0.924 and significantly improved over the previous version (AUC 0.59) resulting in a sensitivity of 93.3% and a specificity of 42.85% with a corresponding positive predictive (70%) and negative predictive value of 81.8% for the given results (B). C Final assessment via correction for multiple testing with .0632 + bootstrap resampling estimates reveal a misclassification error of 35% (C1) and a bootstrap estimation for the AUC of 0.77 (C2). ROC Receiver Operating Characteristic Curve. AUC Area under the curve. λ lambda, lasso penalty value
Identification of specific response prediction signature for the 5-azacytidine containing treatment arm (EXP)In total, considering both positive and negative log-fold changes, 1755 DMRs were identified at a false discovery rate (FDR) of 5% (541 in EXP and 1214 in STD arm) with adjustment for multiple testing. Identified DMRs were ranked according to q-values, i.e. adjusted p-values after multiple testing, and grouped into a top list. 50 candidates were chosen for validation based on the following criteria: q-value ranking, effect size and consistency of differential methylation in either treatment group. Effect size was set to include a read count difference of at least 2.5-fold in a consistent fraction of at least 50% of samples in either treatment group. Regions found on chromosomes 3 and 11 were excluded from analysis, as patients with inv(3)/t(3;3) and a translocation 11q23 could artificially introduce differential methylation on screening via MCIp-seq. This restriction affected less than 5% of choices for the top list. Because of the slightly uneven gender distribution between screening and validation cohort, sex chromosomes were also excluded from the analysis.
To extract an AZA specific response signature, DMR sets identified both in the EXP and STD arm were checked for overlaps as these were considered to potentially indicate unspecific global or chemotherapy associated effects on differential methylation, rather than AZA specific effects. Within the chosen top list, there were no overlaps between both DMR sets. Additional file 5: Table 3 contains a list of all filtered and significant DMRs, identified within the EXP arm. WNT10A shows an exemplary top hit (Fig. 3A).
Furthermore, enrichment in the vicinity of transcriptional start sites (TSS) of identified top DMRs as compared to the overall, filtered bin set could be observed. Moreover, GC content distribution in the set of top DMRs showed distinct skewing at a GC content level between 60 and 70% but was otherwise comparable to the entire genome, therefore indicating overrepresentation of higher GC content in the set of top hits (Additional file 2: Fig. 6). The first top DMRs included the genes WNT10A, ZNF490, LZTS2, CIZ1, TNK1, PIEZO1, UNC119 and ATOH8. A gene ontology analysis demonstrated a strong enrichment for regulation of phagocytosis and engulfment, cell maturation, regulation of cell activation as well as of cell proliferation and might therefore be involved in crucial regulatory steps in myeloid differentiation and proliferation (data not shown).
Fig. 6Coefficient plots for multivariable analysis of mutations (A) and chromosomal aberrations (B). A Coefficient plot for contained mutations, age and gender in comparison to the 12-CpG-signature. B Coefficient plot for karyotypes, age and gender in comparison to the 12-CpG-signature. Plots include the 95%-confidence interval for each predictor. Values in respective tables are the results from multiple logistic regression modelling. A model containing both mutational and cytogenetic variables could not be fitted because the sample size was too small to estimate all model parameters with sufficient confidence. Due to very small samples sizes for the del(5q)/-5 and t(11q23) groups (n = 2 each), both groups were combined with the group „other “ (n = 11) for the multivariable analysis. Karyotype merged comprises “del(5q)/-5 “, “t(11q23)“, “other “
Confirmation of genomewide MCIp-based DMR screening by 450 k infinium human methylation bead chip assayValidation of the MCIp-derived DMRs was done with HumanMethylation 450k Bead Chip aiming at enabling easy clinical applicability and easy reproducibility of a DNA methylation based predictive signature.
Out of the 50 top hit regions, 80% were represented on the 450 k Bead Chip by at least one CpG probe. In total, remaining top DMRs (n = 40) were represented by a total of 105 CpG probes with a variable number of CpG probes per top region (1–7 probes per region) and 25% of regions being defined by a single probe. For WNT10A, Fig. 3B illustrates a single CpG-site validation with a correlation of 0.816 for 450 k beta-values of a CpG probe and the reads per kilobase per million mapped reads (RPKM) for the corresponding DMR identified via MCIp-seq. The median correlation coefficient rho over all CpGs was 0.69 (95% confidence interval 0.32–0.87). Additionally, as baseline requirement, a correlation coefficient above the median and unidirectional differences in methylation changes between CR and RD were required for MCIp-Seq and 450 k Bead Chip results to meet the confirmation criteria. Based on the small sample size, the significance level for differential methylation within the 450 k dataset was set at 0.2. With application of these criteria, 90% of DMRs could be quantitatively confirmed via 450 k array-based analysis. 65% of top DMRs met all validation criteria for each CpG probe, 25% met all criteria for at least one CpG probe and 10% of DMRs failed technical confirmation due to insufficient significance levels.
In total, 95 out of 105 CpG probes, contained within 36 out of 40 top hit DMRs, could be confirmed and could subsequently be used to create a multivariable signature for therapy response prediction.
Generation and refinement of a methylation based predictive classifier based on single distinct CpGsFor an easy and clinically applicable signature, the MCIp-identified regional differences in methylation were aimed to be transformed and compressed into a classifier that contains individual CpGs. A penalized logistic regression model with automated selection of variables was fitted for predicting response to hypomethylating therapy. Logit transformation of 450k data with transition of beta values to M values was performed. Subsequently, fivefold cross validation was done to find optimal penalty parameters as described in the supplement. The resulting classifier comprised 17 CpG dinucleotides which were associated with 12 different genes and two previously undescribed regions (Additional file 2: Fig. 7A). It allowed to perfectly match response or non-response to HMA therapy with AZA when fit to the screening dataset (Additional file 2: Fig. 7B).
Validation of the DNA methylation based predictive classifierValidation of the identified classifier within a validation cohort, derived from the AMLSG 12-09 study group trial cohort (validation cohort, n = 97) was performed using MALDI-TOF, a targeted approach for the quantification of DNA methylation at single CpG-site resolution as described earlier [41]. For final data analysis, nine samples were removed from the validation cohort (remaining samples n = 88). One sample was removed due to correction of patient response status to early death, another sample was removed due to more than 50% of missing values after generation of mass spectra and the remaining samples were removed due to insufficient amounts of DNA in final quality control before generation of mass spectra.
16 out of 17 classifiers-contained CpG dinucleotides could be addressed with primers suited for mass spectrometry at single CpG-site resolution. Designed primers also encompassed flanking regions with up to 125 bp and included CpGs. The analysis resulted in a total of 152 informative CpG units. After quality control by removal of units with more than 20% of missing values, n = 71 informative CpG units remained. For classifier-contained mass spectra 15 out of 17 profiles generated were informative.
When the previously established classifier was mapped to these 15 CpG units as assessed by MALDI-TOF and applied to the validation cohort, validation failed within this cohort. The resulting receiver operating characteristic (ROC) curve was only slightly above the bisecting line and the area under the curve (AUC) was 0,59 resulting in low performance (Additional file 2: Fig. 8).
Independently validated CpGs, in proximity of the classifier comprised CpGs allow for prediction of therapy response in the validation set (EXP arm), but are not specific for HMA treatment with AZAWe tested, if the signature’s distinction capacity could be preserved with the information from neighboring CpGs by the additionally generated methylation data from flanking regions. Significant differences in methylation were tested for between responders and non-responders by non-parametric Mann–Whitney-U testing both in the EXP and STD arm based on methylation data generated by mass spectrometry. Assessment was performed in the validation cohort and significant differences are visualized in Fig. 4A. There was no overlap with significant hits from the STD arm (Additional file 2: Fig. 9). Significant hits comprised 5 out of 17 target regions from the original classifier.
Based on these results the classifier was recomposed by penalized regression and included 8 out of 15 significantly differentially methylated MassARRAY units, consisting of up to 3 CpGs (Fig. 4B). In total, 12 CpGs were included in the refined classifier. With the refined classifier, prediction of therapy response has an apparent misclassification error of 0.2157, if run without considering subsampling to avoid overfitting. Compared to the original classifier, predictive quality is significantly improved (AUC = 0.924). For the given results, a sensitivity of 93.3% and a specificity of 42.9% can be calculated (Fig. 5A, B; Additional file 2: Fig. 10).
For a final evaluation of classifier quality unbiased from potential overfitting, misclassification error and an AUC were calculated based on 0.632 + bootstrap resampling (Fig. 5C). The refined lasso signature has a bootstrap estimated unbiased misclassification error of about 35%, while the reference error for the null model is about 41%.
In summary, the value of this model for predicting therapy response in new samples is better than the null model. Nevertheless, a substantial error remains. The bootstrap-estimated AUC is about 0.77, which is lower than the AUC computed on the full data set, but better than the AUC for the reference model (Fig. 5C2). Our DNA methylation-based signature which was trained to predict response to therapy was also associated with a trend towards improved OS and a significantly improved EFS (data not shown).
To further assess the classifier’s specificity to HMA treatment it was tested within the STD arm of the validation cohort. With a misclassification error of 0.24 and an AUC of 0.76, the signature unfolds a prediction performance in the STD arm, comparable to the 0.632 + -bootstrap estimates for misclassification error and AUC within the EXP arm (Additional file 2: Fig. 11). Though the recomposed classifier can better discriminate between response and non-response than the null model, it does not reach its genuine goal to discriminate therapy response, specific for AZA.
Multivariable analysis shows the association of 12-CpG-classifier with treatment response to be independent of potential confoundersMultivariable analysis including potential confounding variables showed that both mutational status of epigenetic modifier genes such as DNMT3A and IDH1/2 and cytogenetics had no impact on the significance of the classifier (Fig. 6A, B). The effect of the 12-CpG signature remained statistically significant in all models, indicating that neither of these variables are important confounders for treatment response in our experimental setting. However, small sample size, the limited panel of mutations and protocol restrictions excluding several recurrent mutations in AML and an overall low fraction of patients with mutations restrict this multivariable analysis.
Comments (0)