The potential of an artificial intelligence for diagnosing MRI images in rectal cancer: multicenter collaborative trial

In this study, we verified the usefulness of mrAI, which we developed using a multi-institutional dataset. The performance of mrAI improved when characteristics of the MRI environment, such as the imaging protocol and the manufacturer, were close to those of the ground-truth data used for algorithm development. This result shows how sensitive AI can be to imaging conditions as a diagnostic support technology, and addressing this issue will be mandatory before the technology can be made generally available. The finding that mrAI had higher diagnostic accuracy with thinner slices suggests that both the image quality of each cross-sectional image and the information embedded in the continuity of multiple sections are likely important in enhancing diagnostic accuracy.

The standard approach for assessing the accuracy of preoperative diagnosis is comparison with pathologic findings. While such verification was previously possible, it has become challenging to obtain unmodified specimens now that preoperative treatment has been standardized [15,16,17,18]. Japan has a unique environment in which unmodified pathologic information from relatively recent times is easily accessible, because surgery-first treatment remained the standard for a long time, even after neoadjuvant CRT became prevalent in the West. In this study, to develop an AI-based algorithm, we created ground-truth label data by correlating pathologic sections of circular specimens with MRI images. To assess diagnostic accuracy, we used nationwide, multicenter data to compare pathologic findings with MRI on a 1:1 basis. In Japan too, the importance of preoperative treatment has been recognized in recent years, and the number of cases undergoing preoperative treatment is increasing [20,21]. Conducting similar research will become increasingly challenging in the future. We consider this study to have significant value for having made use of this rare opportunity.

Regarding T staging, MRI is considered to overestimate tumor stage more often than it underestimates it [22]. In the MERCURY study, MRI diagnosis was compared with pathologic diagnosis in 311 patients who underwent surgery first. Of the cases pathologically diagnosed as T1/2, 36% were diagnosed as T3/4 on MRI, while 31% of the cases pathologically diagnosed as T3/4 were diagnosed as T1/2 on MRI [15]. Long-term prognosis analysis of the MERCURY study showed a poor disease-free survival rate in cases with a high risk of positive CRM on preoperative MRI. In that report, 23.2% were diagnosed as Stage I on preoperative MRI, and eventually 26.7% were diagnosed as Stage I pathologically. This suggests that a considerable number of the cases diagnosed pathologically as T2 or lower had been diagnosed as T3/4 at the time of diagnosis [23]. Although the MERCURY study was strictly managed with tight MRI diagnostic criteria, the tendency to overestimate depth is believed to be even more pronounced in actual clinical settings. Other retrospective studies examining the diagnostic accuracy of MRI for upfront surgery cases have also shown a tendency for MRI diagnosis to overestimate tumor stage [16,17,18], and a meta-analysis of MRI diagnostic accuracy reported that the proportion of overestimation was 25% and that of underestimation was 13% [19]. In a considerable number of cases, it is difficult to distinguish between fibrotic and tumor tissues [22]. In radiologic diagnosis made by humans, the intention to avoid under-diagnosis in light of various clinical situations may significantly influence the diagnosis. A similar trend was observed in our study data. In particular, because this study analyzed data from a time when MRI was not widespread in Japan, there was a strong tendency toward over-diagnosis in real-world clinical settings.
In centralized diagnosis involving experienced radiologists, this tendency was reduced compared with local diagnosis, but 63% of the cases pathologically diagnosed as T2 or lower were still over-diagnosed preoperatively as T3 or higher. While a central diagnostic setting is more conducive to neutral judgments, it would be even harder to avoid over-diagnosis in real-world clinical settings [24]. In this regard, mrAI's diagnosis is always neutral, and even in cases that are difficult to judge, the algorithm can diagnose tumors as T2 or below without bias. This may contribute to optimizing each patient's treatment. We do not assert that mrAI surpasses human diagnostic capabilities; rather, we emphasize that AI diagnosis should not be seen as a technology that competes with radiologic diagnosis. A comprehensive understanding of both the advantages and disadvantages of mrAI is imperative, and it should be applied judiciously to augment the diagnostic procedures conducted by radiologists. In addition, when applied to research using MRI, mrAI is expected to make unbiased decisions and has the potential to reduce the workload of radiologists performing central diagnoses.

In this study, to evaluate the performance of mrAI, the depth of invasion was extracted based on segmentation and compared with centralized diagnosis. The depth of invasion is a well-known long-term prognostic factor, and in the MERCURY study pathologic Stage II carried a hazard ratio for DFS of more than five relative to Stage I [23]. Compared with centralized diagnosis, mrAI showed slightly lower sensitivity and higher specificity. To validate the clinical significance of this result, we compared the relapse-free survival of the ≤ T2 and ≥ T3 groups as defined by MRI diagnosis. Patients with pathologic stage ≥ T3 tumors had a significantly poorer prognosis than those with tumors staged ≤ T2, but a significant difference in relapse-free survival was shown only for the diagnosis derived by mrAI, not for the centralized diagnosis. Moreover, no significant difference was found between the two groups in local diagnosis. Although we could not evaluate the performance of mrAI against pathologic assessment for MRF involvement, the observed correlation between centralized diagnosis and mrAI diagnosis suggests that mrAI may be useful for assessing MRF involvement.
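The quantities discussed above (sensitivity, specificity, and the proportion of pathologic ≤ T2 cases over-staged as ≥ T3) can be illustrated with a minimal Python sketch. The counts and the names `StagingCounts`, `reader_a`, and `reader_b` are hypothetical and chosen only for illustration; they are not data from this study and this is not the analysis code used by the authors. Pathology is taken as the reference standard, and "positive" means ≥ T3.

```python
from dataclasses import dataclass


@dataclass
class StagingCounts:
    """2x2 table of MRI-based T staging against pathology (reference standard)."""
    true_pos: int   # pathology >= T3, MRI >= T3
    false_neg: int  # pathology >= T3, MRI <= T2 (under-staged)
    false_pos: int  # pathology <= T2, MRI >= T3 (over-staged)
    true_neg: int   # pathology <= T2, MRI <= T2


def sensitivity(c: StagingCounts) -> float:
    """Fraction of pathologic >= T3 cases that MRI also called >= T3."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: StagingCounts) -> float:
    """Fraction of pathologic <= T2 cases that MRI also called <= T2."""
    return c.true_neg / (c.false_pos + c.true_neg)


def overstaging_rate(c: StagingCounts) -> float:
    """Fraction of pathologic <= T2 cases called >= T3 on MRI (1 - specificity)."""
    return c.false_pos / (c.false_pos + c.true_neg)


# Hypothetical counts for two readers of the same 200 cases (NOT study data).
reader_a = StagingCounts(true_pos=80, false_neg=20, false_pos=30, true_neg=70)
reader_b = StagingCounts(true_pos=70, false_neg=30, false_pos=15, true_neg=85)

if __name__ == "__main__":
    for name, counts in (("reader_a", reader_a), ("reader_b", reader_b)):
        print(
            f"{name}: sensitivity={sensitivity(counts):.2f}, "
            f"specificity={specificity(counts):.2f}, "
            f"over-staging={overstaging_rate(counts):.2f}"
        )
```

In this made-up example, `reader_b` trades some sensitivity for higher specificity, which is the same direction of difference reported above for mrAI relative to centralized diagnosis. The over-staging rate is simply the complement of specificity.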

An essential factor in deciding on preoperative treatment for locally advanced rectal cancer is the risk of recurrence in each case. TNT, which has recently attracted attention, has been shown to improve DFS in locally advanced rectal cancer, and its application is expanding, especially in Western countries [1,2,3]. However, for cases that can be cured without TNT, it represents overtreatment. One of the current challenges is to clearly define the indications for TNT [25]. We expect that mrAI, which can stratify relapse-free survival similarly to pathologic diagnosis based on MRI findings at the time of diagnosis, will play a significant role in rectal cancer treatment as multidisciplinary treatments evolve. As mentioned above, radiologists may be concerned about missing the opportunity for preoperative treatment, which tends to lead to overestimation. A tool that supports neutral judgment is therefore of great importance. Although MRI can evaluate factors such as extramural vascular invasion (EMVI) [26], lymph node metastasis [27], and tumor deposits [28], which also affect long-term prognosis, these are not assessed by the current mrAI. We have not explored them in this study but plan to do so in future research. Furthermore, with recent advances in ctDNA and multiomics analysis, individualization of colorectal cancer treatment is becoming a reality [29,30,31,32]. Together with these technological innovations, establishing a method to comprehensively evaluate the individual risks of advanced rectal cancer will help pave the way to future individualized treatment of rectal cancer.

There are several limitations to this study. First, this study was retrospective, and therefore there is potential for bias. Because the MRIs were not obtained recently, in some cases the image quality was inferior to that of MRI obtained with standardized imaging protocols in Western countries [22,33]. Second, our findings could not prove that the current mrAI is universally applicable across different vendors and imaging environments. However, optimizing the imaging environment for MRI can enhance the applicability of mrAI, and we believe this challenge can be adequately addressed in the future. Third, we did not individually review the segmentation results extracted by mrAI for each case. These segmentation results need to be accumulated prospectively under proper imaging conditions, and their accuracy verified.

In conclusion, we verified the performance of mrAI, which we developed using multi-institutional data. Ensuring proper imaging conditions for MRI can enhance the accuracy of mrAI analysis, and mrAI has the potential to provide feedback to radiologists without leading to overestimation of tumor stage.
