Predicting long-term progression of Alzheimer’s disease using a multimodal deep learning model incorporating interaction effects

Participants

This study included 297 participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database who met the following criteria, summarized in Fig. 1: (a) a baseline diagnosis of MCI or AD, (b) availability of a T1-weighted sMRI scan, clinical assessments, and genetic polymorphism data at baseline, and (c) follow-up visits exceeding the defined durations. We categorized all MCI subjects as progressive MCI (pMCI) or stable MCI (sMCI) according to whether they progressed to AD during follow-up. Participants with a reversed diagnostic status or repeated enrollment in the pMCI or sMCI groups were excluded.

Fig. 1

Flowchart of study design. The study comprised 238 subjects with MCI from ADNI-1 and ADNI-2/GO cohorts for cross-validation, and 14 subjects with MCI from ADNI-3 for an independent test. In addition, 45 subjects diagnosed with AD were included in the model training to address class imbalance

The cross-validation set consisted of 238 MCI subjects from ADNI-1 and ADNI-GO/2 cohorts, with a 48-month follow-up. Additionally, 45 subjects with AD were included in the model training to provide insights into pathological changes and address class imbalance. The independent test set comprised 14 MCI subjects with follow-up durations ranging from 36 to 48 months from the ongoing ADNI-3 cohort, serving as external validation for model generalizability.

Image preprocessing

The acquisition of T1-weighted sMRI scans involved multiple scanners, each with its own customized scanning protocol. The initial preprocessing steps were conducted with FreeSurfer (version 7.1.1), including motion correction, intensity normalization, and skull stripping. This yielded images of 256 × 256 × 256 voxels with a spatial resolution of 1 × 1 × 1 mm³. We further cropped the images to match the largest skull-stripped brain size of 160 × 176 × 200 voxels, a 66.43% reduction in total image volume. To ensure uniformity, image intensities were scaled to the range 0 to 1 using max–min normalization. The above process is detailed in Additional file 1: Fig. S1.
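A minimal sketch of the cropping and intensity-normalization steps described above, assuming the FreeSurfer output has already been loaded as a 256 × 256 × 256 volume; the center-crop box and the file name are illustrative only, as the exact crop placement around the brain mask is not specified here.

```python
import numpy as np
import nibabel as nib  # a common choice for reading FreeSurfer .mgz/.nii volumes


def crop_and_normalize(volume: np.ndarray,
                       target_shape=(160, 176, 200)) -> np.ndarray:
    """Center-crop a skull-stripped volume and rescale intensities to [0, 1]."""
    starts = [(dim - tgt) // 2 for dim, tgt in zip(volume.shape, target_shape)]
    cropped = volume[starts[0]:starts[0] + target_shape[0],
                     starts[1]:starts[1] + target_shape[1],
                     starts[2]:starts[2] + target_shape[2]]
    # Max-min normalization to the [0, 1] range
    vmin, vmax = cropped.min(), cropped.max()
    return (cropped - vmin) / (vmax - vmin + 1e-8)


img = nib.load("subject_brain.mgz")  # hypothetical file name
volume = np.asarray(img.dataobj, dtype=np.float32)
preprocessed = crop_and_normalize(volume)
```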

Preparation of clinical and genetic features

For each participant, we considered 14 clinical features at baseline, including demographic data (age, sex, education) and cognitive assessments, as listed in Table 1. Sex was encoded as a binary variable. Mean imputation was employed for handling missing data. Subsequently, all variables were normalized using the Min–Max scaler.
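A brief sketch of the clinical-feature preparation described above, using scikit-learn for mean imputation and Min–Max scaling; the DataFrame layout and column names are hypothetical, and the actual 14 features follow Table 1.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler


def prepare_clinical(df: pd.DataFrame) -> pd.DataFrame:
    """Encode sex, impute missing values with the mean, and Min-Max scale."""
    df = df.copy()
    df["sex"] = (df["sex"] == "Female").astype(int)        # binary encoding
    imputed = SimpleImputer(strategy="mean").fit_transform(df)
    scaled = MinMaxScaler().fit_transform(imputed)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)
```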

Table 1 Baseline characteristics

Whole genome sequencing (WGS) data of all participants were genotyped using the Human 610-Quad BeadChip. Quality control, based on criteria such as genotype quality, deletion rate, minor allele frequency, and the Hardy–Weinberg equilibrium test, retained 8,326,239 features from 44,535,780 single nucleotide polymorphisms (SNPs). Following data filtering, the genotypes of all SNPs were imputed using Beagle and recoded as the number of alleles. Subsequently, we implemented a two-stage feature selection approach. In the first stage, a knowledge-driven approach selected 1023 AD-related SNPs that had achieved gene-wide significance in the IGAP meta-analysis [28]. In the second stage, a data-driven approach using Lasso regression identified the 49 most important features. The detailed process is summarized in Additional file 1: Fig. S2.
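A minimal sketch of the second, data-driven selection stage, assuming the 1023 IGAP-derived SNPs are already encoded as allele counts (0/1/2) in a matrix X with binary conversion labels y; the regularization strength is an assumption, as it is not stated here.

```python
import numpy as np
from sklearn.linear_model import Lasso


def select_top_snps(X: np.ndarray, y: np.ndarray, k: int = 49) -> np.ndarray:
    """Fit a Lasso model and return the indices of the k strongest SNPs."""
    lasso = Lasso(alpha=0.01).fit(X, y)        # alpha chosen for illustration
    importance = np.abs(lasso.coef_)
    return np.argsort(importance)[::-1][:k]
```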

Deep learning model architecture

We proposed the Dual Interaction Stepwise Fusion Classifier (DISFC), a multimodal deep learning model based on 3D sMRI scans, demographic and neuropsychological assessments, and genetic polymorphism data to predict the risk of MCI progression to AD at baseline. The DISFC framework comprises two steps: multimodal feature extraction and stepwise fusion classification (Fig. 2A). In the multimodal feature extraction step, we employed a parallel three-branch network comprising spatial, clinical, and genetic feature extractors. The network took the trimodal data as inputs and produced 8-dimensional abstract features for each modality. In the subsequent stepwise fusion classification step, these abstract representations were fused gradually, starting with the fusion of the neuroimaging and clinical high-level features, followed by the concatenation of the genetic encoded features. Finally, this process output the probability that the corresponding MCI patient would convert to AD.
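A high-level Keras sketch of the stepwise fusion flow described above, not the full DISFC: each extractor is reduced to a placeholder that emits an 8-dimensional embedding, and the intermediate layer sizes are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

mri_in = keras.Input(shape=(160, 176, 200, 1), name="smri")
cli_in = keras.Input(shape=(14,), name="clinical")
gen_in = keras.Input(shape=(49,), name="genetic")

# Placeholder extractors; the paper's spatial branch uses SepConv and residual
# blocks, and the clinical branch adds intra-modal interactions (see below).
mri_feat = layers.GlobalAveragePooling3D()(layers.Conv3D(8, 3, strides=4)(mri_in))
mri_feat = layers.Dense(8, activation="relu", name="img_embed")(mri_feat)
cli_feat = layers.Dense(8, activation="relu", name="cli_embed")(cli_in)
gen_feat = layers.Dense(8, activation="relu", name="gen_embed")(gen_in)

# Step 1: fuse the neuroimaging and clinical embeddings first
fused = layers.Dense(16, activation="relu")(layers.Concatenate()([mri_feat, cli_feat]))
# Step 2: concatenate the genetic embedding, then classify
fused = layers.Concatenate()([fused, gen_feat])
out = layers.Dense(1, activation="sigmoid", name="fusion")(fused)

model = keras.Model([mri_in, cli_in, gen_in], out)
```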

Fig. 2

Schematic illustration of the deep learning model architecture. A The proposed deep learning model consists of multimodal feature extraction and stepwise fusion classification. B Sequential operations within the SepConv block, residual block, and FC block. C Inter-modal and intra-modal interaction modules

The model backbone consisted of three types of blocks: separable convolution (SepConv) blocks, residual blocks, and fully connected (FC) blocks (Fig. 2B). To prevent overfitting, we implemented batch normalization, dropout, and L2 regularization in each block. In the spatial feature extractor of the DISFC model, SepConv blocks replaced traditional 3D convolution with separable convolution to reduce the number of trainable parameters. A shortcut connection was applied to the group of four residual blocks to improve gradient propagation.
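A sketch of one SepConv block under the assumption that "separable convolution" means a depthwise convolution followed by a 1 × 1 × 1 pointwise convolution; Keras has no built-in SeparableConv3D, so a grouped Conv3D stands in for the depthwise step (grouped 3D convolutions may require a GPU depending on the TensorFlow build). Filter counts, dropout rate, and weight decay are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def sepconv_block(x, filters, kernel_size=3, dropout_rate=0.2, weight_decay=1e-4):
    """Depthwise + pointwise 3D convolution with BN, dropout, and L2 regularization."""
    reg = keras.regularizers.l2(weight_decay)
    channels = x.shape[-1]
    # Depthwise step: one spatial kernel per input channel
    x = layers.Conv3D(channels, kernel_size, padding="same", groups=channels,
                      kernel_regularizer=reg)(x)
    # Pointwise step: 1x1x1 convolution mixes channels
    x = layers.Conv3D(filters, 1, padding="same", kernel_regularizer=reg)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.Dropout(dropout_rate)(x)
```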

Most importantly, our DISFC model introduced two interaction modules: the intra-modal interaction module and the inter-modal interaction module (Fig. 2C). The dual interaction modules played a pivotal role in guiding the model to learn meaningful feature combinations. The intra-modal interaction module was applied to the clinical feature extractor of the DISFC model to integrate clinical variables with their second-order interaction terms. Meanwhile, the inter-modal interaction module was embedded into the stepwise fusion process, explicitly modeling the pairwise interactions between neuroimaging and clinical information using an outer product.
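A sketch of the two interaction ideas, assuming 8-dimensional modality embeddings as in the text; the exact arrangement of the surrounding dense layers is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers


def intra_modal_interaction(clinical):
    # Augment the clinical features with their pairwise (second-order)
    # interaction terms before further encoding.
    pairwise = tf.einsum("bi,bj->bij", clinical, clinical)
    return layers.Concatenate()([clinical, layers.Flatten()(pairwise)])


def inter_modal_interaction(img_embed, cli_embed):
    # Outer product of the imaging and clinical embeddings, flattened so the
    # fusion subnetwork can consume the explicit cross-modal interactions.
    outer = tf.einsum("bi,bj->bij", img_embed, cli_embed)
    return layers.Flatten()(outer)
```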

Model development and evaluation

Sigmoid activation and binary cross-entropy loss were applied to the output layers of the three feature extractors and the stepwise fusion classifier to supervise the learning of our model. The binary cross-entropy loss is defined as follows:

$$\mathcal{L}_{BCE}\left( {y,\hat{y}} \right) = - \frac{1}{N}\sum\limits_{i = 1}^{N} {\left[ {y_{i} \cdot \log \left( {\hat{y}_{i} } \right) + \left( {1 - y_{i} } \right) \cdot \log \left( {1 - \hat{y}_{i} } \right)} \right]} ,$$

where \(N\) is the batch size, \(y_{i}\) represents the ground truth for sample \(i\), and \(\hat{y}_{i}\) is the conversion probability of sample \(i\) predicted by our model.

Based on the described design, our complete loss function \(\mathcal{L}\) for the MCI conversion prediction task is as follows:

$$\mathcal{L} = \alpha \mathcal{L}_{fusion} + \beta_{1} \mathcal{L}_{img} + \beta_{2} \mathcal{L}_{cli} + \beta_{3} \mathcal{L}_{gen} ,$$

where \(\mathcal{L}_{fusion}\) denotes the classification loss for the fusion subnetwork output, and \(\mathcal{L}_{img}\), \(\mathcal{L}_{cli}\), and \(\mathcal{L}_{gen}\) represent the corresponding losses for the spatial, clinical, and genetic feature extractors, respectively. \(\alpha\) and \(\beta_{k}\) \(\left( {k = 1,2,3} \right)\) are the hyperparameters for balancing the losses, set to 1.0, 1.5, 0.5, and 0.5 in our experiments.
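A sketch of how this weighted multi-output supervision can be wired up in Keras, using a toy stand-in model with four named sigmoid heads; the head names are hypothetical, and the mapping of the 1.5/0.5/0.5 weights to the three extractor heads follows the order in which the extractors are listed above and is therefore an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-in with four named sigmoid outputs, only to illustrate the loss wiring
inp = keras.Input(shape=(24,))
hidden = layers.Dense(16, activation="relu")(inp)
heads = ["fusion", "img_head", "cli_head", "gen_head"]
outputs = {name: layers.Dense(1, activation="sigmoid", name=name)(hidden)
           for name in heads}
toy = keras.Model(inp, outputs)

toy.compile(
    optimizer="adam",
    loss={name: "binary_crossentropy" for name in heads},
    loss_weights={"fusion": 1.0, "img_head": 1.5, "cli_head": 0.5, "gen_head": 0.5},
)
```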

All experiments were conducted using Keras with a TensorFlow backend on NVIDIA Tesla V100 GPUs. During model development, we performed stratified tenfold cross-validation to ensure each patient was tested once. Data augmentation was applied to the training data, comprising mirroring, rotation, shifting, and scaling transformations for sMRI, as well as slight random perturbations of the clinical and genetic inputs. The model was trained for 50 epochs with a batch size of 6. We utilized the Adam optimizer with a learning rate that was warmed up to 0.001 over the first 15 epochs and then decayed exponentially. The best-performing model, determined by validation performance, was evaluated on the independent test set to assess its generalizability.
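A minimal sketch of the learning-rate schedule described above: linear warm-up to 0.001 over the first 15 epochs, then exponential decay. The decay factor (0.95 per epoch) is an assumption, as the paper does not state it.

```python
from tensorflow import keras

PEAK_LR, WARMUP_EPOCHS, DECAY = 1e-3, 15, 0.95


def lr_schedule(epoch, lr):
    if epoch < WARMUP_EPOCHS:
        return PEAK_LR * (epoch + 1) / WARMUP_EPOCHS   # linear warm-up
    return PEAK_LR * DECAY ** (epoch - WARMUP_EPOCHS)  # exponential decay


lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
# Passed to model.fit(..., epochs=50, batch_size=6, callbacks=[lr_callback])
```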

The performance of the DISFC model on the cross-validation and the independent test sets was evaluated using metrics including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Notably, we calculated the average results across folds as the overall performance of the tenfold cross-validation.
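A sketch of how the per-fold metrics can be computed with scikit-learn, where y_true are the ground-truth pMCI/sMCI labels and y_prob the predicted conversion probabilities; the 0.5 decision threshold for the binary metrics is an assumption.

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, accuracy_score, recall_score,
                             f1_score, confusion_matrix)


def evaluate_fold(y_true, y_prob, threshold=0.5):
    """Compute AUC, accuracy, sensitivity, specificity, and F1 for one fold."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "Accuracy": accuracy_score(y_true, y_pred),
        "Sensitivity": recall_score(y_true, y_pred),  # tp / (tp + fn)
        "Specificity": tn / (tn + fp),
        "F1": f1_score(y_true, y_pred),
    }
```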

Statistical analysis

All statistical analyses were implemented in R software (version 4.1.2). We compared the performance on the validation and independent test sets using Fisher's exact test for accuracy, sensitivity, and specificity, and the DeLong test for AUC. Similarly, we applied Fisher's exact test to evaluate differences in accuracy, sensitivity, and specificity between subgroups from different centers and scanners. The Wilcoxon signed-rank test with continuity correction was employed to assess the improvements from the interaction modules and multimodality. For comparisons between our model and previous methods, we used the same test for AUC and the one-sample proportions test for accuracy, sensitivity, and specificity. The 95% confidence intervals (CIs) were calculated using the Clopper–Pearson method for accuracy, sensitivity, and specificity, and 2000 stratified bootstrap replicates for AUC. P-values less than 0.05 were considered statistically significant.
