HGMSurvNet: A two-stage hypergraph learning network for multimodal cancer survival prediction

Survival prediction is a crucial task in computational pathology, which aims to estimate the expected duration of time until death occurs (Shmatko et al., 2022; Duman et al., 2024). With advanced whole slide image (WSI) scanning technology, computational pathology enables quantitative analysis of pathological diagnosis processes and outcomes (Campanella et al., 2019). Over the past decades, many computer-aided diagnosis (CAD) methods have been developed to alleviate the workload of pathologists and enhance diagnostic accuracy (Vorontsov et al., 2024; Xu et al., 2024). With the development of genomic methods in recent years (Feero, 2020), multimodal representation learning based on multimodal medical data, such as pathological slides, genomic profiles, and clinical records, has been widely used for more comprehensive cancer prognosis (Xu and Chen, 2023; Song et al., 2024; Zhang et al., 2024).

Although multimodal data provide different and complementary views of a patient's health status, there is great heterogeneity among different medical data. For example, compared to clinical records and genomic profiles, WSIs are extremely large and exhibit intricate internal correlations. To handle the enormous size of WSIs, recent methods follow a common processing pipeline: the WSIs are first tokenized into small patches, from which features are extracted and then fed into a regression model to predict survival risk scores (Ilse et al., 2018; Campanella et al., 2019; Lu et al., 2021). However, the structure of WSIs is disrupted by patch sampling, which may limit the representational ability of these methods. In WSIs, tumors are usually located within a complex microenvironment that includes immune cells, stromal cells, blood vessels, and other components (Balkwill et al., 2012). These components work together to influence the growth, invasion, and metastasis of tumors. Thus, the spatial and morphological relationships between cell populations and tissue regions are critical for cancer diagnosis, as they reflect the dynamic interactions and functional states of the tumor microenvironment (Cheng et al., 2017).
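To make this pipeline concrete, the following is a minimal sketch of an attention-based multiple instance learning (MIL) regressor in the spirit of the patch-based methods cited above; the feature dimension, hidden size, and single-score risk head are illustrative assumptions rather than the configuration of any specific published method.

```python
import torch
import torch.nn as nn

class PatchMILRegressor(nn.Module):
    """Toy patch-based survival regressor: per-patch features are pooled with
    an attention mechanism into a slide-level embedding, which yields one risk score."""
    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.risk_head = nn.Linear(feat_dim, 1)  # single survival risk score

    def forward(self, patch_feats):              # patch_feats: (num_patches, feat_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # attention over patches
        slide_feat = (weights * patch_feats).sum(dim=0)         # slide-level embedding
        return self.risk_head(slide_feat)                       # predicted risk score
```

Note that this pooling treats patches as an unordered bag, which is exactly why the spatial structure of the WSI is lost, as discussed above.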

To this end, several works have employed Graph Convolutional Networks (GCNs) to capture the correlation structures among patches (Lee et al., 2022; Chan et al., 2023; Alzoubi et al., 2024). These GCN-based methods can learn low-order representations of WSIs by modeling the interactions between patches (Chen et al., 2021a; Lee et al., 2022). However, traditional graphs are limited to modeling pairwise relationships (Gao et al., 2022; Cai et al., 2024; Zhou et al., 2024). Recently, hypergraph learning has been employed to learn complex correlations in various types of data, such as hyperspectral images (Guan et al., 2024), 3D point clouds (Chen et al., 2024), and biological pathways of cells (Franzese et al., 2019). Unlike the edges in traditional graphs, the hyperedges in a hypergraph can connect more than two nodes. This eliminates the need to decompose higher-order relationships into multiple pairwise edges, thereby preserving the integrity of group interactions. Moreover, a single hyperedge can encompass all relevant nodes, significantly reducing the total number of edges compared to traditional graphs. Therefore, hypergraphs are well suited to capturing high-order (beyond pairwise) representations from complex WSIs, facilitating a more comprehensive exploration of diagnostic information within each patient.
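For readers unfamiliar with hypergraph learning, the sketch below shows a commonly used spectral-style hypergraph convolution, in which an incidence matrix H replaces the adjacency matrix and each hyperedge aggregates all of its member nodes before scattering the message back. This is a generic formulation for illustration only and is not necessarily the exact layer used in HGMSurvNet.

```python
import torch

def hypergraph_conv(X, H, Theta, edge_weights=None):
    # X: (N, d) node features; H: (N, E) incidence matrix, H[v, e] = 1 if node v
    # belongs to hyperedge e; Theta: (d, d') learnable projection.
    N, E = H.shape
    W = edge_weights if edge_weights is not None else torch.ones(E)
    Dv = (H * W).sum(dim=1)                          # (weighted) node degrees
    De = H.sum(dim=0)                                # hyperedge degrees
    Dv_inv_sqrt = torch.diag(Dv.clamp(min=1e-8).pow(-0.5))
    De_inv = torch.diag(De.clamp(min=1e-8).pow(-1.0))
    # Each hyperedge first aggregates its member nodes (H^T X), then scatters the
    # aggregated message back to every node it contains, with degree normalization.
    A = Dv_inv_sqrt @ H @ torch.diag(W) @ De_inv @ H.t() @ Dv_inv_sqrt
    return A @ X @ Theta
```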

On the other hand, multimodal learning with missing modalities is a common issue in clinical practice, which may hinder the effectiveness of current multimodal methods (Haneuse et al., 2021; Zhang et al., 2022; Wu et al., 2024). Previous studies usually discard these incomplete data, which significantly reduces the number of training samples (Chen and Zhang, 2020). Recently, some studies have used padding and generation methods, such as zero padding (Shen et al., 2020), learning factorized representations (Tsai et al., 2018), autoencoders (Tran et al., 2017), and generative adversarial networks (Pan et al., 2022), to address the problem of data incompleteness. However, the heterogeneity of multimodal medical data poses challenges in applying these methods to multimodal survival prediction. For example, clinical records are multi-dimensional text data, WSIs are gigapixel images that contain thousands of patches, and genomic profiles consist of sequences with thousands of dimensions. Training such generative models is therefore difficult owing to the imbalance and significant differences among multimodal features. In addition, the features reconstructed by generative methods may introduce additional noise, which can significantly degrade model performance.

In fact, for the task of survival prediction with missing modalities, a generative model may not be essential, and some alternative solutions have been investigated. Recent studies have shown that similarities among patients can assist clinical analysis (Zhang et al., 2022; Kim et al., 2023; Wu et al., 2024; Han et al., 2025b). That is, if two patients exhibit comparable survival times, they are more likely to share similarities in their clinical data. This motivates us to address the issue of missing modalities by leveraging information from similar patients. For example, some works model patient data as graph structures (Chen and Zhang, 2020; Zhang et al., 2022; Wu et al., 2024), in which patients are regarded as nodes and edges are constructed based on pairwise similarities. However, for the survival prediction task with multiple modalities, the adjacency matrix of a graph is insufficient to capture the complex relationships among patients.

In this work, a novel two-stage hypergraph learning network, named HGMSurvNet, is proposed for cancer survival prediction with multimodal data. It can effectively capture survival-specific diagnostic information at both the intra-patient and inter-patient levels and remains robust even in scenarios with missing modalities. Specifically, in the first stage, HGMSurvNet models WSIs as flexible hypergraph structures using both the phenotype-wise and topology-wise relationships of patches, and then utilizes hypergraph learning to capture complex correlation structures among patches for intra-patient learning. In the second stage, we construct patient sub-hypergraphs for each modality and then merge them into a unified hypergraph for inter-patient learning. Thus, if a patient is missing one modality, the patient can still remain connected through the sub-hypergraphs of the available modalities. More importantly, to mitigate the impact of meaningless hyperedges resulting from missing modalities, we introduce an attention-driven hyperedge dropout mechanism to selectively discard unimportant hyperedges. Extensive experiments on six public cancer cohorts from TCGA demonstrate the effectiveness and robustness of the proposed HGMSurvNet for cancer survival prediction.
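As an illustration of the second stage, the sketch below builds one k-NN sub-hypergraph incidence matrix per modality and concatenates them into a unified patient hypergraph. The Euclidean k-NN rule, the one-hyperedge-per-available-patient construction, and the variable names are assumptions made for illustration, not the exact construction used by HGMSurvNet.

```python
import numpy as np

def knn_incidence(feats, available, k=5):
    """Build one modality's patient sub-hypergraph as an incidence matrix.
    feats: (P, d) patient features for this modality; available: (P,) boolean mask.
    Each available patient spawns a hyperedge containing itself and its k nearest
    available neighbours; patients missing this modality appear in no hyperedge here."""
    P = feats.shape[0]
    idx = np.where(available)[0]
    sub = feats[idx]                                                    # available patients only
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)   # pairwise distances
    H = np.zeros((P, len(idx)))
    for j in range(len(idx)):
        nbrs = idx[np.argsort(dist[j])[:k + 1]]                         # self + k nearest neighbours
        H[nbrs, j] = 1.0
    return H

# Unified patient hypergraph: concatenate the per-modality sub-hypergraphs along the
# hyperedge axis, e.g. H_unified = np.concatenate([H_wsi, H_gene, H_clin], axis=1),
# so a patient missing one modality stays connected through the other sub-hypergraphs.
```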

The main contributions of this work are threefold:

1) We explicitly model WSIs into hypergraph structures by integrating visual appearance and spatial structural information, enabling the WSI hypergraph network to capture complex correlations among patches and generate the global high-order representation for multimodal survival prediction.

2) We propose a novel multimodal hypergraph learning network, whose hyperedges are connected based on the similarity of multimodal features, revealing various types of relationships among patients. The patient hypergraph network can learn the neighborhood information from the topological structure of the multimodal hypergraphs, thereby enhancing the robustness of the survival model even in cases of missing modalities.

3) We develop a new attention-driven hyperedge dropout mechanism in the hypergraph convolution to address meaningless hyperedges generated by incomplete data. This mechanism leverages the scalability of hypergraphs to remove irrelevant hyperedges in the multimodal hypergraphs, thereby mitigating the impact of incomplete data on multimodal hypergraph learning (a minimal sketch of this mechanism follows this list).
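The following is a minimal sketch of such an attention-driven hyperedge dropout, assuming a simple linear attention scorer over the mean feature of each hyperedge's member nodes and a fixed drop ratio; both choices are illustrative assumptions rather than the exact design of HGMSurvNet.

```python
import torch
import torch.nn as nn

class HyperedgeDropout(nn.Module):
    """Attention-driven hyperedge dropout (illustrative sketch). Each hyperedge is
    scored by a linear attention head applied to the mean feature of its member
    nodes; the lowest-scoring fraction of hyperedges (e.g. near-empty hyperedges
    caused by missing modalities) is removed before hypergraph convolution."""
    def __init__(self, feat_dim, drop_ratio=0.2):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)   # hypothetical attention parameterization
        self.drop_ratio = drop_ratio

    def forward(self, X, H):                   # X: (N, d) node features; H: (N, E) incidence matrix
        deg = H.sum(dim=0).clamp(min=1.0)               # number of nodes per hyperedge
        edge_feat = (H.t() @ X) / deg.unsqueeze(1)      # mean feature of each hyperedge
        scores = self.scorer(edge_feat).squeeze(-1)     # one attention score per hyperedge
        num_keep = max(1, int(H.shape[1] * (1 - self.drop_ratio)))
        kept = scores.topk(num_keep).indices            # retain the highest-scoring hyperedges
        return H[:, kept]
```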
