REPAIR: Reciprocal assistance imputation-representation learning for glioma diagnosis with incomplete MRI sequences

Glioma (GM) is the most common malignant primary brain tumor, accounting for approximately 24% of all primary brain and other central nervous system tumors, and 80.9% of malignant tumors (Ostrom et al., 2022). It has a grim prognosis, with a 5-year survival rate of only 6.9% and a median survival of less than 2 years after diagnosis (Louis et al., 2021; Ostrom et al., 2022; Tan et al., 2020). The overall survival of GM patients is closely correlated with histological grade and molecular markers (e.g., isocitrate dehydrogenase (IDH) mutation and 1p/19q co-deletion) (Weller et al., 2021), which usually require assessment of pathological specimens obtained via invasive surgical resection. Given the potential risks associated with histopathology and the inaccessibility of some deep-seated tumors (Dammers et al., 2008; Riche et al., 2021; Taweesomboonyat et al., 2019), a non-invasive approach is highly desirable in clinical settings to offer a safe and easily accessible alternative for the diagnosis of GM.

Magnetic Resonance Imaging (MRI) is a primary radiological tool for diagnosing GM (Suh et al., 2019; Zhou et al., 2022). Imaging features extracted from multi-sequence MRI, such as T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), contrast-enhanced T1-weighted imaging (CE_T1WI) and T2-weighted fluid attenuated inversion recovery (T2_FLAIR), provide multifaceted insights into the disease. Integrating radiomics information from multi-sequence MRI has shown great promise for clinical tasks such as genomic profiling, grading, and subtype prediction of GM (Gore et al., 2021; Kim et al., 2020; Peng et al., 2021; Su et al., 2019; Wang et al., 2019b). However, the potential absence of MRI sequences poses a practical challenge when using multi-sequence MRI for GM prediction modeling. Sequences can be missing for various reasons, including differences in acquisition protocols between hospitals, patients’ health condition and compliance, scanning expenses, imaging artifacts, and data corruption or loss (Farahani et al., 1990; Sharma and Hamarneh, 2019). For example, patients with renal disease, risk factors for nephrogenic systemic fibrosis, or allergies to gadolinium-based contrast agents cannot undergo contrast-enhanced MRI scans (Fraum et al., 2017). Some sequences may be omitted for pediatric patients to reduce discomfort and avoid prolonged scans. Even when all sequences are scanned, suboptimal image quality or data loss (e.g., during system upgrades) can prevent the use of a complete set of MRI sequences for analysis. Previous studies reported a considerable proportion of lost or unusable MRI sequences in data used for GM prediction (up to 33.8%) (van der Voort et al., 2023), posing substantial obstacles to building predictive models (Kandalgaonkar et al., 2022; Liu et al., 2023d; Nakamoto et al., 2019a) that assume the integrity and availability of complete MRI sequences.

In this paper, we propose a unified reciprocal assistance imputation-representation learning (REPAIR) framework (Fig. 1) for GM diagnosis modeling with incomplete MRI sequences. “Reciprocal assistance” refers to the active cooperation between missing value imputation and multi-sequence fusion. Multi-sequence MRI data from existing samples provide supportive knowledge for imputing missing values; conversely, the imputed data facilitate multi-sequence fusion, which can be framed as a shared latent representation learning problem. Reciprocally, the representation learned from multi-sequence data guides missing value imputation and improves its accuracy. To tailor the learned representation for downstream tasks, we introduce a novel ambiguity-aware intercorrelation regularization in this joint optimization scheme. It uses a fuzzy paradigm to relate imputation ambiguity to its impact on the learned representation, minimizing the learning errors caused by missing data while simultaneously capturing intra-feature redundancy and feature-task relevancy in the learned representation. A multimodal structural calibration constraint is also devised to correct for the structural shift caused by missing data, ensuring structural consistency between the learned representations and the real data.
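The alternation between imputation and shared representation learning can be illustrated with a deliberately simplified sketch. The code below is not the REPAIR algorithm: it substitutes a generic iterative truncated-SVD imputation for the joint optimization, and it omits the ambiguity-aware intercorrelation regularization and the multimodal structural calibration constraint entirely. The function name `alternate_impute_and_represent`, its parameters, and the flat feature-matrix layout (radiomics features from all sequences concatenated column-wise) are assumptions made for illustration only.

```python
import numpy as np


def alternate_impute_and_represent(X, observed, rank=2, n_iter=50):
    """Alternate between a low-rank shared representation and imputation.

    X        : (n_samples, n_features) array; entries at unobserved
               positions are ignored and may be NaN.
    observed : boolean mask of the same shape, True where a value exists.
    rank     : latent dimension; must not exceed min(X.shape).
    Returns (X_filled, H), where H has shape (n_samples, rank).
    """
    X = np.asarray(X, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Initial imputation: per-feature mean over the observed entries.
    counts = np.maximum(observed.sum(axis=0), 1)
    col_means = np.where(observed, X, 0.0).sum(axis=0) / counts
    X_filled = np.where(observed, X, col_means)

    H = None
    for _ in range(n_iter):
        # Representation step: fit a rank-limited latent space to the
        # currently imputed data.
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        H = U[:, :rank] * s[:rank]
        X_hat = H @ Vt[:rank]
        # Imputation step: the representation's reconstruction replaces
        # only the missing entries; observed values are never altered.
        X_filled = np.where(observed, X, X_hat)
    return X_filled, H
```

In this toy setting, each pass lets the current imputation shape the latent representation, and the representation in turn refines the imputation, which is the reciprocal pattern described above in its most basic form.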

The novel contributions of this work are summarized as follows:

• We propose a reciprocal assistance imputation-representation learning paradigm for the effective integration of incomplete MRI sequences for GM diagnosis, extensively validating its effectiveness on several public and private datasets.

• We devise an ambiguity-aware intercorrelation regularization by introducing fuzzy rough set theory to better correlate imputation ambiguity with the learned representation, accounting for the uncertainties inherent in incomplete MRI sequences. This approach allows more legitimate and task-relevant imputations to be estimated within a joint optimization framework.

• We design a multimodal structural calibration constraint to ensure structural consistency in the learned representation by excavating low-rank structural properties from the complete sample set. It ensures the interpretability of the learned representation through relaxed and scalable constraints, where only the fundamental structure of the original domain is preserved.

• The proposed REPAIR adapts to varying scenarios with arbitrary patterns of missing MRI sequences, rendering it a practical tool for handling comprehensive real-world medical data in complex clinical settings.

The rest of this paper is organized as follows. In Section 2, we briefly review related work. In Section 3, we give a detailed description of our method. In Section 4, we describe the multimodal clinical data used for evaluation and present the experimental results. In Section 5, we discuss related issues pertaining to the proposed method. Finally, we conclude our study in Section 6. The source code of the proposed method is available.¹
