The Fidelity of Artificial Intelligence to Multidisciplinary Tumor Board Recommendations for Patients with Gastric Cancer: A Retrospective Study

This retrospective study was conducted to identify the factors that AI would need to account for in order to assist the MTB with gastric cancer treatment recommendations. Few studies have analyzed the concordance rate between AI and an MTB in gastric cancer treatment recommendations. Choi et al. reported that stage IV gastric cancer was the only significant factor that affected concordance rates [6]. In contrast, Tian et al. reported that HER-2 positivity was a significant factor, while stage IV was not [8]. Interestingly, in the present study, HER-2 positivity was not significant, whereas age > 80 years, performance status, and stage IV gastric cancer were all significant factors affecting the concordance rate between the MTB and AI (Table 3).

Recommendations pertaining to patients aged > 80 years were less likely to be concordant than those pertaining to patients aged < 80 years (OR 0.175, 95% CI 0.069–0.441; p < 0.001). Elderly patients have more comorbidities than younger patients, tend to refuse chemotherapy that requires hospitalization, and have a fluctuating general status [9, 10]. These clinical features could explain the lower concordance rate in patients over 80 years of age.
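An odds ratio of this kind compares the odds of MTB–AI concordance in one patient group with the odds in a reference group. As a rough illustration of how such a figure and its 95% CI are obtained, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from a 2×2 table. The function name and the counts in the usage example are hypothetical and are not data from this study; moreover, the ORs reported here come from a multivariate analysis adjusted for other factors, so a crude 2×2 estimate would not reproduce them.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio of concordance with a Wald confidence interval.

    Hypothetical 2x2 layout (illustration only):
      a = concordant cases in the exposed group (e.g. age > 80)
      b = discordant cases in the exposed group
      c = concordant cases in the reference group
      d = discordant cases in the reference group
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        # The Wald formula is undefined with an empty cell;
        # a continuity correction would be needed first.
        raise ValueError("zero cell count; apply a continuity correction first")
    # Odds of concordance: a/b (exposed) divided by c/d (reference) = ad/bc.
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, not taken from this study.
or_, lower, upper = odds_ratio_wald_ci(10, 10, 160, 20)
print(f"OR {or_:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

An interval lying entirely below 1, as in this hypothetical example, corresponds to significantly lower odds of concordance in the exposed group.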

The higher the performance status score (that is, the poorer the patient's functional status), the lower the probability of concordance between the MTB and AI in gastric cancer recommendations (OR 0.203, 95% CI 0.072–0.574, p = 0.003 for performance status score 1; OR 0.191, 95% CI 0.057–0.639, p = 0.007 for performance status score 2; OR 0.089, 95% CI 0.026–0.301, p < 0.001 for performance status score 3). In other words, as the performance status score increases, the MTB becomes more likely to select a chemotherapy regimen that differs from the AI recommendation. In general, elderly patients have higher performance status scores than younger patients [11].

Age > 80 years and performance status were the main factors underlying discrepancies in stage II and stage III. In stage II, the MTB selected surveillance as the treatment option for four patients, taking age, performance status score, and comorbidities into account. These patients were aged 68, 69, 72, and 85 years, and all had a performance status score of three. They also had comorbidities such as dementia, chronic kidney disease, and chronic heart failure. For the remaining stage II patient, the MTB selected S-1 adjuvant chemotherapy because of the inconvenience of frequent hospitalization given the patient's advanced age (82 years) and a performance status score of two (Table S2). In stage III, the MTB selected S-1 adjuvant chemotherapy for four patients after considering age (all over 80 years), a performance status score of two or three, and the inconvenience of frequent hospitalization. For the remaining two patients, the MTB selected the XELOX regimen after taking into account their age (49 and 64 years), a performance status score of one, and the absence of comorbidities (Table S3).

In this study, there was no significant difference in concordance rates between stage II or stage III and stage I (p = 0.367 for stage II, p = 0.673 for stage III). By contrast, the concordance rate for stage IV was significantly lower than that for stage I because the local guidelines followed by the MTB and those underlying WFO differ in their preferred palliative chemotherapy regimens (OR 0.017, 95% CI 0.005–0.055, p = 0.017). S-1 plus cisplatin is a commonly used chemotherapy regimen in Korea and Japan, following the Japanese guidelines [12, 13], whereas S-1 is an investigational agent in the 2018 National Comprehensive Cancer Network (NCCN) guidelines and is therefore not used at Memorial Sloan Kettering Cancer Center (MSKCC), which follows the NCCN guidelines [14]. Because S-1 plus cisplatin was not included among the WFO chemotherapy regimens, which are based on MSKCC data, the concordance rate between the MTB and AI was significantly lower in stage IV (Table S4). These local guideline differences were a primary cause of non-concordance not only in stage IV but also in stage I. D2 lymph node dissection is commonly performed in East Asia, unlike in the Western world, and the Japanese guidelines recommend observation for pathologic stage I disease after curative gastrectomy [13]. Because the five patients with stage I gastric cancer in the non-concordance group had undergone D2 lymph node dissection and were over 65 years of age, the MTB judged that adjuvant chemotherapy offered little benefit relative to its complications (Table S1).

In summary, the stage-by-stage pattern of discordance between AI and the MTB, together with the results of the multivariate analysis, shows that discrepancies due to local guideline differences occurred primarily in stage I and stage IV, whereas discrepancies related to age > 80 years and performance status were mainly evident in stage II and stage III.

AI is applied in various areas of medicine, such as robotics, medical diagnosis, and medical statistics. WFO, an AI system for clinical decision support, is expected to offer many advantages, such as increased work efficiency and a reduced workload for doctors, decision support for junior oncologists, and treatment selection based on the latest medical research, even in hospitals with few or no experts [15, 16]. However, several factors lowered the concordance rate between the medical AI and experts in gastric cancer, reducing the validity of the medical AI.

First, AI lacks a comprehensive understanding of individual patients. WFO cannot grasp the overall status of patients, such as patient compliance and rapport with doctors, comorbidities that may affect chemotherapy, and whether abnormal biochemical test results are temporary or persistent [5].

However, these shortcomings of AI are expected to be offset as technology advances. For example, a wearable device or sensor that monitors the patient's condition and activity 24 h a day could provide continuous rather than fragmentary patient information. A more accurate individual performance status could thus be derived from a patient's activity history than from performance status scores, which have a limited range [17, 18]. The development and use of health applications on portable computing devices, such as smartphones, are also expanding. If the medical information recorded on a personal device could be easily linked to the medical information database of a country or hospital, the patient's medical history and comorbidities could be readily identified and applied in clinical practice [19, 20]. In addition, AI-based emotional support, such as robot companions for elderly people with limited cognitive function or activity, is anticipated [21].

Second, the local guidelines for gastric cancer differ according to race, country, and region. WFO, the medical AI used in this study, is based on the data of MSKCC in the USA. However, gastric cancer treatment in Korea follows the Korean guidelines, which are closer to the Japanese guidelines than the NCCN guidelines [12]. Local guidelines differ in their preferred surgical methods, the effectiveness of specific chemotherapy regimens or radiation therapy, and the approval status of chemotherapy drugs. In the future, besides resolving these local guideline differences, AI may assist in determining the best treatment plan tailored to an individual cancer patient.

Third, there are several economic factors. Owing to the high cost of cancer medicines, an individual patient’s financial circumstances, such as public or private insurance coverage, can affect the chemotherapy regimen choice. Therefore, lowering the price of chemotherapy drugs and expanding insurance coverage will enhance the affordability and accessibility of cancer medicines [22]. AI increases the efficiency of clinical trials and research, thereby significantly lowering the cost of drug development, which in turn lowers the price of cancer medicine [23]. The improved cost efficiency of chemotherapy drugs is also expected to have a positive impact on social discussions and government approval regarding the insurance coverage of anticancer drugs.

In this study, WFO version 16.4 was used as the medical AI. However, various types of AI are applicable to the medical field, and ChatGPT is a prominent example. ChatGPT, an AI chatbot with remarkable text comprehension capabilities, was released for public use in November 2022 and has become a worldwide sensation for its ability to understand and respond to questions on a wide variety of topics. Research on the capabilities and usefulness of ChatGPT has been conducted across many fields, and medicine is no exception. Schulte compared the systemic therapies suggested by ChatGPT for 51 prompts covering 32 advanced solid tumors with those recommended in the NCCN guidelines; the overall ratio of medications listed by ChatGPT to those suggested in the NCCN guidelines was 0.77 [24]. Georges et al. retrospectively investigated the effectiveness of ChatGPT in assisting healthcare providers with decision-making in the emergency room, focusing on patients with metastatic prostate cancer, and concluded that ChatGPT has the potential to help healthcare providers improve patient triage as well as the efficiency and quality of care in the emergency room [25].

However, significant drawbacks that limit ChatGPT's ability to function as a medical support AI have also been reported. ChatGPT can give different answers to the same question without providing references, and it can generate incorrect answers in a confident-sounding manner [26]. Zhou et al. therefore advised that ChatGPT should be used without being relied upon, always remembering that it is a chatbot, not a person [27].

Despite these drawbacks, given the potential of AI demonstrated by ChatGPT and the high concordance rate between WFO and the MTB, further research and corresponding improvements in capability are expected to enable AI to perform well as a medical assistant.

This study had several limitations. First, it was a retrospective, single-center analysis and may therefore have been subject to bias. Second, the results were based on treatment recommendations made by WFO and the MTB from 2015 to 2018. A comparison of concordance rates between the latest version of WFO and the MTB, based on the latest guidelines, might yield different results. Third, this study used only the concordance rate to evaluate the validity of AI. Factors such as the extent to which AI influences doctors' decisions, whether AI-recommended treatments achieve better outcomes such as longer overall survival or disease-free survival, and the time and cost savings gained by using AI were therefore not examined. These factors should be analyzed not only to validate AI but also to assess its usefulness and economic feasibility. However, investigating them would require a large-scale prospective study, and the ethics and legal responsibility of AI decisions should be discussed before such a study is conducted.

In this study, the factors affecting discordance between AI and the MTB were age, performance status, and stage IV gastric cancer. The effect of stage IV gastric cancer arose from differences in the local guidelines followed by AI and the MTB, whereas the effects of age and performance status were caused by the inability of AI to comprehensively understand individual patients.
