A transformer-based multi-task deep learning model for simultaneous infiltrated brain area identification and segmentation of gliomas

Patient characteristics

After applying the inclusion and exclusion criteria, a total of 354 patients (195 males and 159 females) with a mean age of 47.61 ± 12.99 years (range, 11–82 years) were included in our study. Of these, 270 were allocated to the training set, 30 to the validation set, and 54 to the independent test set. As shown in Table 1, 243 patients (68.64%) were diagnosed with low-grade gliomas (grade II and III), while 111 patients (31.36%) were diagnosed with high-grade gliomas (grade IV). In addition, 239 gliomas (67.51%) were located within a single anatomic area, while 115 gliomas (32.49%) infiltrated two or more areas. By location, gliomas involved the frontal lobe in 56.78% of all patients, the parietal lobe in 21.47%, the occipital lobe in 11.30%, the temporal lobe in 40.11%, and the insular lobe in 7.62%; these percentages sum to more than 100% because a glioma can infiltrate more than one area.

Model performance of glioma-infiltrated brain area identification

Tables 2 and 3 and Fig. 4 show the performance of identifying the glioma-infiltrated brain areas. Specifically, Table 3 reports the performance of the models for gliomas that infiltrated single and multiple regions. Our proposed method achieves better performance than VGG16, ResNet50, EfficientNetb0, COM-Net, and MTTU-Net, with an AUC of 94.95% (95% CI, 91.78–97.58) on the independent test set. Figure 4 illustrates the receiver operating characteristic (ROC) curves for six deep learning-based classification models, along with an enhanced version of our model that incorporates four additional clinical characteristics. Evaluated on the independent test set, these models yielded AUC values of 78.90% (95% CI, 73.07–84.28), 87.10% (95% CI, 81.73–92.04), 85.23% (95% CI, 78.75–90.74), 90.22% (95% CI, 85.12–94.61), 93.51% (95% CI, 90.22–96.37), 94.95% (95% CI, 91.78–97.58), and 95.07% (95% CI, 92.28–97.48) for VGG16, ResNet50, EfficientNetb0, COM-Net, MTTU-Net, our method, and our method with added clinical characteristics, respectively. Tables 2 and 3 also present the results for clinically relevant patient subgroups. Our method outperforms the aforementioned state-of-the-art classification methods in terms of AUC, with an AUC of 95.25% (95% CI, 91.09–98.23) in grade II, 98.26% (95% CI, 95.22–100.00) in grade III, 93.83% (95% CI, 86.57–99.12) in grade IV, 98.90% (95% CI, 97.46–99.94) for single infiltration (one infiltrated brain area), 91.48% (95% CI, 84.27–97.08) for double infiltration, and 100% (95% CI, 100–100) for triple infiltration.
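
For readers who want to reproduce figures of the form "AUC (95% CI)", the following is a minimal sketch of one common approach: the AUC computed as a rank statistic, with a percentile bootstrap interval. The text does not state how the confidence intervals were obtained, so the bootstrap, the function names `auc_score` and `bootstrap_auc_ci`, and the toy data are assumptions for illustration only.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (cases resampled with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels and scores, invented for illustration (not data from the study).
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]
toy_auc = auc_score(toy_labels, toy_scores)
toy_lower, toy_upper = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
```

For a multi-label task such as brain area identification, a function like this would be applied per area (or on pooled labels) before averaging; which of these the study used is not stated here.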

Table 2 Classification performance of the various models on both the validation and independent test set

Table 3 Classification performance of the various models in single and multiple regions on both the validation and independent test set

Fig. 4 ROC curves of six distinct deep learning-based classification models, including VGG16, ResNet50, EfficientNetb0, COM-Net, MTTU-Net and the proposed method, for identifying the glioma-infiltrated brain areas. The seventh subplot evaluates our method enhanced with additional clinical characteristics

We also integrated clinical features into our model, specifically two demographic characteristics (age and sex) and two clinical variables (glioma grade and Karnofsky performance score). The results presented in the table demonstrate improvements of 2.73% in AUC, 6.21% in accuracy, 4.77% in sensitivity, and 6.8% in specificity over our original MR-only model for all grades on the validation set.

Model performance of tumor segmentation

The performance of five tumor segmentation methods (U-Net, nnUnet, COM-Net, MTTU-Net, and our method) was evaluated on the validation and independent test sets. The results are presented in Table 4 and Fig. 5 and indicate that the proposed method outperforms the other four methods in terms of DSC for all grades of tumors on both the validation and independent test sets. As shown in Fig. 5, our method achieves the highest overlap between the ground truth (red curve) and the predicted segmentation (green curve). Table 4 provides the numerical results, showing that the proposed method achieved the best DSC for all tumor grades (II, III, and IV). The DSC of the proposed method was 87.60% for all grades, higher than the corresponding scores of the single-task methods (U-Net and nnUnet) and the multi-task methods (COM-Net and MTTU-Net). Specifically, it achieved a DSC of 88.50% for grade II, 85.44% for grade III, and 88.20% for grade IV on the independent test set.
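
As a reminder of what the DSC measures, the sketch below computes the Dice similarity coefficient, 2|P ∩ G| / (|P| + |G|), for a predicted mask P and a ground-truth mask G. It is a minimal illustration on 2D binary masks; the function name `dice_coefficient`, the handling of two empty masks, and the toy masks are assumptions rather than details taken from the study.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks (lists of rows)."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled as tumor in both the prediction and the ground truth.
    intersection = sum(1 for p, g in zip(pred, true) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in true if g)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy masks, invented for illustration (1 = tumor, 0 = background).
toy_pred = [[0, 1, 1],
            [0, 1, 0]]
toy_true = [[0, 1, 0],
            [0, 1, 1]]
toy_dsc = dice_coefficient(toy_pred, toy_true)  # 2 * 2 / (3 + 3)
```

A DSC of 100% means the predicted and ground-truth regions coincide exactly, and 0% means they do not overlap at all.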

Table 4 Segmentation performance of the compared methods on both the validation and independent test set

Fig. 5 Visual segmentation results obtained from different segmentation methods. The red curve represents the ground truth, while the green curve shows the predicted segmentation

Model performance of glioma-infiltrated brain area identification vs. experts

To compare the accuracy of infiltrated brain lobe classification between human experts and the model, we conducted a comparison experiment on a random sample of 50% of the independent test set. Two experts (X.M.L and M.L, with 10 years and 4 years of experience, respectively) annotated the glioma-infiltrated brain lobes based on enhanced T1 and T2-FLAIR MRI data. They consecutively and independently evaluated the MR data from the independent test set. The outputs of our model were binarized for a fair comparison. Table 5 shows the performance of identifying glioma-infiltrated brain areas, with AUCs of 61.58% (95% CI, 52.38–70.33) and 57.21% (95% CI, 48.10–66.49) for the two experts and 85.30% (95% CI, 77.71–91.70) for our model. The experts spent 50 s and 2 min per case, respectively, far longer than the 0.4 s required by the model. As shown in Fig. 6, the experts' results show large individual differences. Both experts had high sensitivity for the frontal lobe but could not discriminate well in the parietal, occipital, and temporal lobes. In contrast, although our model does not achieve the best performance in every category, it achieved a higher level of discrimination in each brain lobe, with better overall ability.
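
The binarization step can be sketched as follows. Expert annotations are yes/no decisions per lobe, so the model's probabilities are thresholded to the same form before comparison. For such binary predictions the ROC curve has a single operating point, and the area under it equals (sensitivity + specificity) / 2. The 0.5 threshold, the function names, and the toy data below are assumptions for illustration; the threshold actually used is not given in the text.

```python
def binarize(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 decisions (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def sensitivity_specificity(labels, predictions):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def binary_auc(labels, predictions):
    """AUC of a 0/1 predictor: the mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(labels, predictions)
    return (sens + spec) / 2.0


# Toy example for one lobe, invented for illustration (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.6, 0.4, 0.2, 0.55, 0.1, 0.3, 0.8]
toy_preds = binarize(toy_probs)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_preds)
toy_auc = binary_auc(toy_labels, toy_preds)
```

Because thresholding discards the ranking of cases, an AUC computed this way can differ from one computed on the continuous model outputs.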

Table 5 Model performance of glioma-infiltrated brain area identification vs. experts

Fig. 6 ROC curves of two experts and proposed model

Visualization of local and global information of our model

To validate the extraction of specific global and local features, we compared the hybrid CNN and Transformer network with a pure CNN network using guided backpropagation [26] for visual interpretation. The results of this comparison are shown in Fig. 7. Given an input image, we perform the forward pass up to the last convolutional layer of interest, set all activations except one to zero, and propagate back to the image to obtain a reconstruction. As shown in Fig. 7, the guided backpropagation (Guided Backprop) map reflects the pixels on which the network focuses, with white indicating high attention and black indicating low attention.
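
The core of guided backpropagation, as it is commonly described, is a modified backward rule at each ReLU: a gradient is passed back only where the forward input was positive and the incoming gradient is itself positive. The sketch below contrasts this with the ordinary ReLU backward rule on a toy vector. It illustrates the rule only and is not the implementation used in the study; the function names and numbers are hypothetical.

```python
def relu_backward_plain(forward_input, grad_output):
    """Ordinary backprop through ReLU: keep gradients where the input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, grad_output)]


def relu_backward_guided(forward_input, grad_output):
    """Guided backprop through ReLU: also drop negative incoming gradients."""
    return [g if (x > 0 and g > 0) else 0.0
            for x, g in zip(forward_input, grad_output)]


# Toy pre-activation values and incoming gradients, invented for illustration.
toy_input = [1.5, -0.5, 2.0, 0.3]
toy_grad = [0.7, 0.9, -0.4, 0.2]

plain = relu_backward_plain(toy_input, toy_grad)    # [0.7, 0.0, -0.4, 0.2]
guided = relu_backward_guided(toy_input, toy_grad)  # [0.7, 0.0, 0.0, 0.2]
```

Applying this rule at every ReLU while propagating from the selected activation back to the image leaves only positive evidence for that activation, which is why the resulting maps highlight the pixels that drive it.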

Fig. 7 Guided backprop maps of (b) the hybrid CNN and Transformer network and (c) the pure CNN model, illustrating the models' attention to both global and local information

The pure CNN network attends to local tumor information, as evidenced by the high intensity in the tumor region (Fig. 7(c)). In contrast, the hybrid model (Fig. 7(b)) exhibits high-attention regions associated with both tumors and brain edges. This indicates that the hybrid model effectively learns global features, specifically the relative position of tumors with respect to brain edges. The corresponding class activation maps of the hybrid model provide insight into the decision regions used when predicting infiltrated brain regions. By combining global and local features, the hybrid model draws on the advantages of each: the global features provide important information about the relative positions of tumors and brain edges, while the local features focus on specific tumor regions. This combination allows the hybrid model to predict infiltrated brain regions more accurately.
