Human-Like Artificial Intelligent System for Predicting Invasion Depth of Esophageal Squamous Cell Carcinoma Using Magnifying Narrow-Band Imaging Endoscopy: A Retrospective Multicenter Study

INTRODUCTION

Esophageal cancer is one of the most prevalent malignancies and a leading cause of cancer-associated mortality worldwide (1). Esophageal squamous cell carcinoma (ESCC) accounts for 84% of new cases of esophageal cancer globally (2). Endoscopic resection (ER) is recommended for treating esophageal neoplasia confined to the epithelium (EP) or with minute submucosal invasion (SM1) because it is associated with fewer complications and shorter hospital stays than surgical treatment (3). However, ESCC invading SM2 or deeper represents a formal contraindication for ER because the risk of lymph node metastasis can reach up to 50% (4–7). Accurately differentiating SM2-3 lesions from other superficial lesions is therefore crucial for determining the optimal treatment strategy for ESCC.

Gastroenterology societies, including the European Society of Gastrointestinal Endoscopy and the Japan Esophageal Society (JES), regard magnifying endoscopy with narrow-band imaging (ME-NBI) as one of the most effective methods for preoperative prediction of ESCC invasion depth (8–10). Multiple endoscopic classification criteria have been proposed for ESCC diagnosis, including mucosal surface features, the JES classification of intrapapillary capillary loops (IPCLs), and avascular area (AVA), among others (11–14). However, the comprehensive application of these criteria in clinical practice remains challenging because of their reliance on personal experience and empirical knowledge (15,16). Numerous microvascular characteristics, including vascular density, dilation, and shape, have been identified as valuable indicators for evaluating ESCC. Despite their potential usefulness, however, these indicators are seldom used in practice because of the inherent limitations of manual measurement, in particular cognitive bias (17,18).

Several preliminary attempts have been made to predict the invasion depth of ESCC with artificial intelligence (AI) (19,20). However, the lack of transparency in the diagnostic process of deep learning (DL) models may reduce trust in AI-assisted decisions and weaken their credibility. A previous observational study found that SM2-3 ESCC accounts for only 6.14%–9.47% of superficial ESCC, which limits the data volume available for DL training and compromises model robustness (21). Whereas endoscopists can learn from rare visual examples by incorporating prior knowledge, developing an AI architecture that is both interpretable and trainable on limited data remains a challenge.

Based on human-like learning principles, we developed and evaluated an interpretable artificial intelligence–based invasion depth prediction system (AI-IDPS) for ESCC. The AI-IDPS is designed to automatically evaluate and integrate prior knowledge for ESCC diagnosis, overcoming the inherent limitations of DL.

METHODS

Patients and cohorts

Multicenter data were retrospectively collected between April 2016 and October 2021 from Renmin Hospital of Wuhan University, Nanjing Drum Tower Hospital of Nanjing University, Shanghai Chest Hospital, and Taizhou Hospital of Zhejiang Province. A total of 5,119 ME-NBI images from 562 patients with superficial ESCC and 33 videos were used to train and validate the system (Table 1; Figure 1). All endoscopic examinations were performed using a standard magnifying endoscope (GIF-H260Z; Olympus Medical Systems, Tokyo, Japan). Two expert endoscopists jointly established criteria for poor-quality images, including bleeding, halation, blur, defocus, or mucus, and excluded such images from all datasets. Among highly similar images, only unique representative images were retained. The invasion depth of each lesion was obtained from pathological examination after ER or esophagectomy with negative margins. This work followed the Standards for Reporting Diagnostic Accuracy Studies (STARD) 2015 reporting guidelines.

Table 1. Baseline characteristics

Baseline characteristic | Dataset for feature extracting model | Train dataset for feature fitting model | Test dataset for AI-IDPS | Video test set for AI-IDPS
Sex, n (%)
  Male | 374 (73.2%) | 125 (71.8%) | 37 (72.6%) | 28 (84.9%)
  Female | 137 (26.8%) | 49 (28.2%) | 14 (27.4%) | 5 (15.1%)
Age (yr), median (range) | 64 (31–88) | 66 (45–81) | 64 (43–80) | 66 (52–81)
Tumor size (mm), median (range) | 15 (3–130) | 15 (3–130) | 15 (3–40) | 19 (5–35)
Depth of lesion, n (%)
  Uncertain | 46 (8.5%) | 0 (0%) | 0 (0%) | 0 (0%)
  pEP-LPM | 349 (64.6%) | 116 (66.7%) | 24 (47.0%) | 15 (45.4%)
  pMM-SM1 | 80 (14.8%) | 28 (16.1%) | 10 (19.6%) | 10 (30.3%)
  pSM2-SM3 | 65 (12.0%) | 30 (17.2%) | 17 (33.3%) | 8 (24.2%)
No. of images for each lesion depth, n (%)
  Uncertain | 1517 (30.8%) | 0 (0%) | 0 (0%) | —
  pEP-LPM | 2083 (42.3%) | 743 (69.5%) | 101 (51.8%) | —
  pMM-SM1 | 754 (15.3%) | 177 (16.6%) | 45 (23.1%) | —
  pSM2-SM3 | 570 (11.5%) | 149 (13.9%) | 49 (25.1%) | —
Case distribution (Hospital 1\Hospital 2\Hospital 3\Hospital 4) | 41\352\24\94 | 22\100\12\40 | 8\10\25\8 | 0\0\33\0
Image distribution (Hospital 1\Hospital 2\Hospital 3\Hospital 4) | 1025\2973\280\646 | 146\646\102\175 | 29\54\85\27 | —

Hospital 1, Renmin Hospital of Wuhan University; Hospital 2, Nanjing Drum Tower Hospital of Nanjing University; Hospital 3, Shanghai Chest Hospital; Hospital 4, Taizhou Hospital of Zhejiang Province.

AI, artificial intelligence; EP, epithelium; IDPS, invasion depth prediction system; LPM, lamina propria; MM, muscularis mucosa; SM, submucosa; uncertain, lesions without definite invasion depth and inflammation.


Figure 1. Flow chart. AI-IDPS, artificial intelligence–based invasion depth predicting system.

Development of feature fitting model

DL and quantitative algorithm models were applied to develop the feature extraction models. Nine invasion depth–related features were included in the system: mucosal flatness (11,22), background color, the JES classification of IPCLs, the diameter, tortuosity, and cyclization of IPCLs, the area size of AVAs, the length of the AVA major axis, and the spectral dominant color. Details of the development and validation of the feature extraction models are provided in the supplementary materials. A summary of typical endoscopic features for ESCC invasion depth reported in previous studies is provided in Table 2.

Table 2. Summary of typical endoscopic classifications for ESCC diagnosis

Classification | Type | Description | Pathology

IPCL classification (27)
  IPCL-I | Brown loops | Normal
  IPCL-II | Dilation and elongation of these capillaries, appearing at the margin of erosions | Inflammatory
  IPCL-III | "Borderline" lesions with less vascular proliferation | Atrophic mucosa or low-grade intraepithelial neoplasia
  IPCL-IV | Increased vascular proliferation in "borderline" lesions | Noninvasive high-grade intraepithelial neoplasia
  IPCL-V1 | Dilation, meandering, irregular caliber, and form variation | EP
  IPCL-V2 | Extension of IPCL type V1 | LPM
  IPCL-V3 | Advanced destruction of IPCLs | MM, SM1 or deeper
  IPCL-VN | Generation of new tumor vessels | SM2 or deeper

Magnifying endoscopic classification of the Japan Esophageal Society (12)
  IPCL-A | Normal IPCLs or abnormal microvessels without severe irregularity | Normal epithelium, inflammation, and LGIN
  IPCL-B1 | Abnormal microvessels with severe irregularity or highly dilated abnormal vessels (type B vessels with a loop-like formation) | EP-LPM
  IPCL-B2 | Abnormal microvessels with severe irregularity or highly dilated abnormal vessels (type B vessels without a loop-like formation) | MM-SM1
  IPCL-B3 | Highly dilated vessels whose calibers appear to be more than 3 times that of usual B2 vessels | SM2 or deeper
  AVA-small | Smaller than 0.5 mm in diameter | EP-LPM
  AVA-middle | Between 0.5 and 3 mm in diameter | MM-SM1
  AVA-large | 3 mm or larger in diameter | SM2 or deeper

Novel endoscopic criteria for mucosal and submucosal cancers (11)
  Criteria for mucosal cancer | Flat lesion, slight elevation, or slight depression of any size with a smooth/even surface; slightly elevated lesion of ≤1 cm with granular or uneven surface; hyperemic flat lesion of ≤3 cm with granular or uneven surface; slightly depressed lesion of ≤2 cm with uneven surface | Mucosal
  Criteria for submucosal cancer | Irregularly (unevenly) nodular or protruded lesion of any size; slightly elevated lesion of >1 cm with granular or uneven surface; hyperemic flat lesion of >3 cm with granular or uneven surface; irregularly (unevenly) depressed lesion of >2 cm; ulcerative lesion of any size | Submucosal

Brownish epithelium in squamous neoplasia of the esophagus (28)
  Negative | Lesion without brownish color change in the areas between vessels | Low-grade intraepithelial neoplasia
  Positive | Brownish color changes in the areas between vessels | High-grade intraepithelial neoplasia or invasive cancer

AVA, avascular area; EP, epithelium; IPCLs, intrapapillary capillary loops; LGIN, low-grade intraepithelial neoplasia; LPM, lamina propria; MM, muscularis mucosa; SM, submucosa.

To achieve optimal prediction efficiency, all visual feature indicators were integrated into the AI-IDPS for invasion depth prediction. The model with the best performance on the testing dataset was chosen for the AI-IDPS (see Supplementary Table S5, Supplementary Digital Content, https://links.lww.com/CTG/A970; Supplementary Figure S2, Supplementary Digital Content, https://links.lww.com/CTG/A958).
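The section above does not specify the algorithm behind the feature fitting model, so the following is an illustrative sketch only: it assumes the nine extracted indicators are assembled into a per-image feature vector and fitted with candidate scikit-learn classifiers (logistic regression and gradient boosting are assumptions, not the authors' stated choice), keeping the candidate with the best test-set AUC.

```python
# Hypothetical sketch: fit candidate "feature fitting" models on the nine extracted
# indicators (mucosal flatness, background color, IPCL type, vessel diameter/
# tortuosity/cyclization, AVA area, AVA major-axis length, dominant color) and
# keep the model with the best test-set AUC. The actual model type is not stated here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def fit_and_select(X_train, y_train, X_test, y_test):
    """X_*: arrays of shape (n_images, 9); y_*: 1 for SM2-3, 0 for EP-SM1."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(),
    }
    best_name, best_model, best_auc = None, None, -np.inf
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        if auc > best_auc:
            best_name, best_model, best_auc = name, model, auc
    return best_name, best_model, best_auc
```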

Development of AI-IDPS

The AI-IDPS was constructed using 13 feature extraction models and 1 feature fitting model. The accuracy of each model was optimized according to the receiver operating characteristic (ROC) curve (23,24). The outputs of the upstream models were provided as input to the feature fitting model for decision-making and presented in real time to illustrate how the AI prediction was reached (Figure 2). For real-time analysis, the AI-IDPS recognizes the frozen screen during ME-NBI and conducts the analysis automatically; this is a fully automatic procedure that requires no additional commands. All captured features and the invasion depth prediction are presented on screen for 5 seconds (Video 1). Once the AI-IDPS is activated, the evidence for its prediction, including invasion depth–associated features and IPCL and AVA mapping, is presented to endoscopists, allowing them to weigh the AI prediction and make informed decisions based on the provided evidence (Video 1). All models run on a server with an NVIDIA RTX3080Ti GPU (with 16 GB GPU memory). The system processes video at 45 frames per second on this GPU, fulfilling the real-time processing requirement of 25 frames per second for endoscopic video.
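As a rough illustration of the real-time behavior described above, the sketch below detects a frozen frame by simple frame differencing (an assumed mechanism; the paper does not describe how freeze detection is implemented), triggers the analysis pipeline on that frame, and reports the achieved frame rate against the 25-fps requirement. The function analyze_frame and the threshold value are hypothetical.

```python
# Hypothetical sketch of the real-time loop: detect a frozen (static) frame by
# frame differencing, run the AI-IDPS pipeline on it, and measure throughput.
import time
import cv2
import numpy as np

FREEZE_THRESHOLD = 1.0  # mean absolute pixel difference below which a frame is "frozen"

def process_stream(capture, analyze_frame):
    """capture: cv2.VideoCapture; analyze_frame(frame): the AI-IDPS pipeline."""
    prev, frames, start = None, 0, time.time()
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        frames += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None and np.mean(cv2.absdiff(gray, prev)) < FREEZE_THRESHOLD:
            analyze_frame(frame)  # frozen screen detected -> run feature extraction + prediction
        prev = gray
    fps = frames / max(time.time() - start, 1e-6)
    return fps  # should exceed 25 fps for real-time endoscopic video
```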

Figure 2. Framework of AI-IDPS. AI-IDPS, artificial intelligence–based invasion depth predicting system; AVA, avascular area; ESCC, esophageal squamous cell carcinoma; IPCLs, intrapapillary capillary loops; JES, Japan Esophageal Society; ME-NBI, magnifying endoscopy with narrow-band imaging.

Video validation and crossover study

To evaluate the efficacy of the AI-IDPS in practice, a crossover study was performed. Thirty-three videos were consecutively collected from Shanghai Chest Hospital between April 2021 and October 2021. Each video contained a representative ME-NBI clip of a unique lesion, focusing on lesion inspection.

The ESCC video dataset was provided to 3 board-certified experts, 2 senior endoscopists, and 2 junior endoscopists to predict invasion depth. First, all endoscopists watched the videos and predicted the invasion depth of each lesion. After a 5-month washout period, all endoscopists were provided with the same videos alongside the AI-IDPS predictions and again asked to predict the invasion depth. All videos were renamed before being re-presented to the endoscopists, who were reminded that they could deliberately adopt or disregard the AI-IDPS prediction based on the evidence for the AI decision and their personal judgment. The performance of the endoscopists was compared before and after AI assistance. For video validation, the overall prediction for each case was taken as the final AI-IDPS result. The gold standard was the pathological result after ER or esophagectomy.

Questionnaire survey

Endoscopists who participated in the crossover study completed a questionnaire survey. A seven-point Likert scale was used to assess the endoscopists' subjective understanding of and satisfaction with the AI-IDPS. Binary options were used to record the endoscopists' preference for the AI-IDPS or pure DL models. All questions in the survey are provided in Supplementary Table S1 (see Supplementary Digital Content, https://links.lww.com/CTG/A970).

Ethical approval

This study was approved by the institutional review boards of Renmin Hospital of Wuhan University (WDRY2019-K094), Nanjing Drum Tower Hospital (2019-204-01), Shanghai Chest Hospital (IS2143), and Taizhou Hospital (K20210719), with a waiver of the requirement for informed consent. The reader study was not an intervention trial and was performed under guidelines approved by the institutional review boards. All data processed for this investigation were anonymized before being transferred to the study investigators.

Statistical analysis

The area under the ROC curve, positive predictive value, negative predictive value, sensitivity, specificity, and accuracy were presented for quantitative data. The χ2 test was used to analyze the relationship between invasion depth and endoscopic features. The McNemar test was applied to compare the endoscopists' performance with that of the AI-IDPS in identifying SM2-3 ESCC in images, and a generalized estimating equation (GEE) model was applied for the man-machine video competition and the crossover study. A 2-sided P value of <0.05 was considered statistically significant. All calculations were performed using SPSS 23 (IBM, Chicago, IL).
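The analyses were performed in SPSS; purely to illustrate the tests named above, the following Python sketch shows a McNemar test on paired image-level predictions and a binomial GEE for the paired reader comparisons using statsmodels. Column names such as correct, assisted, and reader are hypothetical.

```python
# Illustrative sketch (not the authors' SPSS workflow): McNemar test for paired
# correct/incorrect calls on the same images, and a GEE with exchangeable
# correlation for the crossover comparison. Variable names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_vs_ai(endoscopist_correct, ai_correct):
    """Both inputs: 0/1 Series over the same images; returns the McNemar P value."""
    table = pd.crosstab(endoscopist_correct, ai_correct)
    return mcnemar(table, exact=True).pvalue

def gee_crossover(df):
    """df columns (hypothetical): correct (0/1), assisted (0/1), reader (id)."""
    model = smf.gee(
        "correct ~ assisted", groups="reader", data=df,
        family=sm.families.Binomial(), cov_struct=sm.cov_struct.Exchangeable(),
    )
    return model.fit().pvalues["assisted"]
```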

RESULTS

Characteristics of the patient cohort

A total of 5,119 ME-NBI images of esophageal lesions from 562 patients across 4 hospitals were included (Table 1; see Supplementary Table S2, Supplementary Digital Content, https://links.lww.com/CTG/A970), comprising 82 (13.9%) SM2-3 lesions, 463 (78.3%) EP-SM1 lesions, and 46 (7.8%) lesions without a definite pathological result.

To evaluate AI-IDPS performance, an independent testing dataset of 196 ME-NBI images from the 4 hospitals, comprising 34 (66.7%) EP-SM1 lesions and 17 (33.3%) SM2-3 lesions, was used. For real-time performance validation, 33 consecutively collected videos were used, consisting of 25 (75.8%) EP-SM1 lesions and 8 (24.2%) SM2-3 lesions.

Efficiency validation of indices for ESCC prediction

Before AI-IDPS construction, the performance of the quantitative and qualitative features was validated. Vessel diameter differentiated SM2-3 lesions with an accuracy of 71.4% (67.8%–75.0%), tortuosity with 68.7% (65.0%–72.4%), and cyclization with 66.0% (62.2%–69.8%). Mucosal flatness type and background coloration differed significantly between EP-SM1 and SM2-3 lesions (see Supplementary Table S7, Supplementary Digital Content, https://links.lww.com/CTG/A970).

The overall accuracy of IPCL classification was 91.5%, and that of B2-IPCL recognition was 93.8% (91.4%–96.1%) (Table 3; see Supplementary Table S3, Supplementary Digital Content, https://links.lww.com/CTG/A970). AVA size was measured by quantitative analysis of AVA area and major-axis length, which achieved accuracies of 87.3% and 93.6%, respectively, in recognizing large AVAs.
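The exact implementation of the quantitative AVA measurement is described in the supplementary materials; the sketch below only illustrates one plausible approach, measuring AVA area and major-axis length from a binary segmentation mask with OpenCV and flagging AVAs of 3 mm or larger (the JES AVA-large criterion). The pixel-to-millimeter scale factor is an assumption that would depend on the magnification.

```python
# Hypothetical sketch of quantitative AVA measurement from a binary segmentation mask.
import cv2
import numpy as np

def measure_avas(mask, mm_per_pixel):
    """mask: uint8 binary image where AVA pixels are 255; mm_per_pixel: assumed scale."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for c in contours:
        area_mm2 = cv2.contourArea(c) * mm_per_pixel ** 2
        if len(c) >= 5:  # cv2.fitEllipse requires at least 5 contour points
            (_, _), (ax1, ax2), _ = cv2.fitEllipse(c)
            major_axis_mm = max(ax1, ax2) * mm_per_pixel
        else:  # fall back to the equivalent-circle diameter for tiny regions
            major_axis_mm = np.sqrt(4 * cv2.contourArea(c) / np.pi) * mm_per_pixel
        results.append({
            "area_mm2": area_mm2,
            "major_axis_mm": major_axis_mm,
            "large_ava": major_axis_mm >= 3.0,  # JES: AVA-large is 3 mm or larger
        })
    return results
```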

Table 3. Performance of the AI-IDPS and feature extracting models

Model | Accuracy | Sensitivity | Specificity | PPV | NPV
AI-IDPS | 86.2% (81.3%–91.0%) | 85.7% (75.9%–95.5%) | 86.3% (80.7%–91.9%) | 67.7% (55.9%–79.6%) | 94.8% (91.0%–98.6%)
Mucosa flatness | 90.8% (87.8%–93.9%) | 89.9% (84.8%–94.9%) | 91.5% (87.6%–95.3%) | 87.9% (82.6%–93.3%) | 92.9% (89.3%–96.5%)
Classification of A/B type IPCLs | 96.0% (94.4%–97.6%) | 98.5% (97.3%–99.7%) | 90.6% (86.4%–94.9%) | 95.8% (93.9%–97.8%) | 96.5% (93.7%–99.2%)
Classification of B1/B2 type IPCLs | 93.8% (91.4%–96.1%) | 95.7% (92.8%–98.6%) | 92.2% (88.8%–95.7%) | 90.8% (86.8%–94.9%) | 96.4% (93.9%–98.8%)
Classification of B3/non-B3 type IPCLs | 99.7% (99.2%–100.0%) | 83.3% (40.5%–100.0%) | 99.8% (99.6%–100.0%) | 83.3% (40.5%–100.0%) | 99.8% (99.6%–100.0%)
Principal components of color | 75.9% (72.6%–79.3%) | 76.7% (72.8%–80.5%) | 73.8% (66.9%–80.6%) | 89.5% (86.5%–92.5%) | 52.0% (45.5%–58.5%)
Cyclization of vessels | 66.0% (62.2%–69.8%) | 67.1% (59.8%–74.3%) | 65.6% (61.1%–70.1%) | 42.2% (36.1%–48.2%) | 84.2% (80.3%–88.1%)
Background color | 82.9% (77.5%–88.6%) | 84.2% (77.5%–90.9%) | 80.4% (70.0%–90.8%) | 89.7% (84.0%–95.5%) | 71.4% (60.3%–82.6%)
Diameter of vessels | 71.4% (67.8%–75.0%) | 72.1% (65.1%–79.0%) | 71.2% (66.9%–75.4%) | 48.3% (42.0%–54.7%) | 87.2% (83.7%–90.7%)
Tortuosity of vessels | 68.7% (65.0%–72.4%) | 67.1% (59.8%–74.3%) | 69.3% (64.9%–73.7%) | 45.0% (38.7%–51.3%) | 84.9% (81.2%–88.6%)
Length of AVA major axis | 93.6% (91.1%–96.2%) | 93.6% (89.1%–98.1%) | 93.8% (90.7%–96.9%) | 96.9% (93.8%–100.0%) | 87.5% (83.2%–91.8%)
Area size of AVAs | 87.3% (83.8%–90.8%) | 87.5% (81.4%–93.6%) | 87.2% (82.9%–91.5%) | 76.6% (69.2%–83.9%) | 93.6% (90.4%–96.8%)

Data are shown as % (95% CI).

AI-IDPS, artificial intelligence–based invasion depth predicting system; AVA, avascular area; IPCLs, intrapapillary capillary loops; NPV, negative predictive value; PPV, positive predictive value.


AI-IDPS performance

After feature extraction, the quantitative and qualitative features were automatically input into the feature fitting model to predict ESCC invasion depth. The contributing features are summarized in Table 3 in order of contribution. On the independent testing dataset, the AI-IDPS achieved an accuracy of 86.2% (81.3%–91.0%), sensitivity of 85.7% (75.9%–95.5%), specificity of 86.3% (80.7%–91.9%), and area under the ROC curve of 0.867 (Table 3). The ROC curve of the AI-IDPS is shown in Figure 3.

Figure 3. Performance of AI-IDPS. (a) ROC curves of the pure deep learning model and the AI-IDPS; the rectangle, round, and triangle dots denote the performance of expert, senior, and junior endoscopists on the image testing dataset, respectively. (b–d) Comparison of the endoscopists' performance with and without AI-IDPS assistance: changes in accuracy (b), sensitivity (c), and specificity (d) for each endoscopist. Blue dots represent performance without AI-IDPS, and red dots represent performance with AI-IDPS assistance. AI-IDPS, artificial intelligence–based invasion depth predicting system; DCNN, deep convolutional neural network; ROC, receiver operating characteristic.

Video validation

A total of 33 videos consecutively collected at Hospital 3 were used for video validation. The accuracy, sensitivity, and specificity of the AI-IDPS in video validation were 84.9% (71.9%–97.8%), 87.5% (57.9%–100.0%), and 84.0% (68.6%–99.4%), respectively. Representative video identification results are presented in Supplementary Figure S3 (see Supplementary Digital Content, https://links.lww.com/CTG/A959).

Comparison with pure DL model

To explore whether the AI-IDPS was superior to a pure learning architecture, a pure DL model was established using the same training and testing datasets as the AI-IDPS. As summarized in Table 4, at similar sensitivity (85.7% vs 83.7%), the AI-IDPS achieved higher accuracy (86.2% vs 60.0%, P < 0.001), specificity (86.3% vs 52.1%, P < 0.001), and positive predictive value (66.7% vs 37.0%, P < 0.001) than the pure DL model.

Table 4. Comparison between the AI-IDPS and the pure deep learning model

Metric | AI-IDPS | Pure deep learning | P value
Accuracy | 86.2% (81.3%–91.0%) | 60.0% (53.1%–66.9%) | <0.001
Sensitivity | 85.7% (75.9%–95.5%) | 83.7% (73.3%–94.0%) | 0.79
Specificity | 86.3% (80.7%–91.9%) | 52.1% (44.0%–60.2%) | <0.001
PPV | 66.7% (54.8%–78.6%) | 37.0% (28.0%–45.9%) | <0.001
NPV | 93.3% (89.1%–97.5%) | 90.5% (84.2%–96.8%) | 0.442

Data are shown as % (95% confidence interval).

AI-IDPS, artificial intelligence–based invasion depth predicting system; NPV, negative predictive value; PPV, positive predictive value.


AI-IDPS performance compared with that of endoscopists

In the man-machine contest, the AI-IDPS achieved significantly higher accuracy and sensitivity than the endoscopists when analyzed by image (86.2% vs 69.2% and 85.7% vs 32.0%, respectively) and comparable specificity (86.3% vs 81.7%) (Table 5). The AI-IDPS also performed significantly better than the average endoscopist when analyzed by patient (see Supplementary Table S6, Supplementary Digital Content, https://links.lww.com/CTG/A970).

Table 5. Man-machine contest

Reader | Accuracy | Sensitivity | Specificity | PPV | NPV | P (a) | P (b) | P (c) | P (d) | P (e)
AI-IDPS | 86.2% (81.3%–91.0%) | 85.7% (75.9%–95.5%) | 86.3% (80.7%–91.9%) | 67.7% (59.8%–79.7%) | 94.7% (90.9%–98.6%) | — | — | — | — | —
Endoscopist 1 | 63.1% (56.3%–69.9%) | 28.6% (15.9%–41.2%) | 74.7% (67.6%–81.7%) | 27.5% (14.8%–40.1%) | 75.7% (68.6%–82.8%) | <0.005 | <0.001 | <0.005 | <0.001 | <0.001
Endoscopist 2 | 62.1% (55.2%–68.9%) | 61.2% (47.6%–74.9%) | 62.3% (54.5%–70.2%) | 35.3% (24.9%–45.7%) | 82.7% (75.6%–89.9%) | <0.001 | <0.05 | <0.001 | <0.001 | <0.005
Endoscopist 3 | 69.7% (63.3%–76.2%) | 14.3% (4.5%–24.1%) | 88.4% (83.2%–93.6%) | 29.2% (9.6%–48.8%) | 75.4% (68.9%–82.0%) | <0.05 | <0.001 | 0.678 | <0.005 | <0.001
Endoscopist 4 | 73.9% (67.7%–80.0%) | 20.4% (9.1%–31.7%) | 91.8% (87.3%–96.2%) | 45.5% (22.9%–68.1%) | 77.5% (71.2%–83.8%) | 0.123 | <0.001 | 0.152 | 0.064 | <0.001
Endoscopist 5 | 69.7% (63.3%–76.2%) | 30.6% (17.7%–43.5%) | 82.9% (76.8%–89.0%) | 40.9% (28.7%–53.1%) | 83.0% (76.4%–89.5%) | <0.001 | <0.001 | <0.005 | <0.005 | <0.005
Endoscopist 6 | 72.8% (66.6%–79.1%) | 49.0% (35.0%–63.0%) | 80.8% (74.4%–87.2%) | 45.3% (31.4%–59.1%) | 82.4% (76.1%–88.7%) | <0.001 | <0.001 | 0.064 | <0.05 | <0.005
Endoscopist 7 | 71.8% (65.5%–78.1%) | 24.5% (12.5%–36.5%) | 87.7% (82.3%–93.0%) | 40.0% (21.4%–58.6%) | 77.6% (71.1%–84.0%) | <0.001 | <0.001 | 0.791 | <0.05 | <0.001
Endoscopist 8 | 68.7% (62.2%–75.2%) | 26.5% (14.2%–39.0%) | 82.9% (76.8%–89.0%) | 32.5% (17.3%–47.7%) | 76.8% (70.1%–83.5%) | <0.001 | <0.001 | 0.189 | <0.001 | <0.001
Endoscopist 9 | 70.8% (64.4%–77.2%) | 32.7% (19.5%–45.8%) | 83.6% (77.6%–89.6%) | 40.0% (24.1%–55.9%) | 78.7% (72.2%–85.2%) | <0.001 | <0.001 | 0.503 | <0.01 | <0.001

Data are shown as % (95% confidence interval).

Endoscopists 1–3 were junior, 4–5 were senior, and 6–9 were expert endoscopists.

AI-IDPS, artificial intelligence–based invasion depth predicting system; NPV, negative predictive value; PPV, positive predictive value.

(a) Analysis for accuracy.

(b) Analysis for sensitivity.

(c) Analysis for specificity.

(d) Analysis for PPV.

(e) Analysis for NPV.


Evaluation of AI-IDPS effect on endoscopists' prediction

In the objective evaluation, the crossover study showed that the endoscopists had significantly improved accuracy (from 79.7% to 84.9% on average, P = 0.03) and comparable sensitivity (from 37.5% to 55.4% on average, P = 0.27) and specificity (from 93.1% to 94.3% on average, P = 0.75) with AI-IDPS assistance.

In the subjective evaluation, the mean scores for the interpretability of the AI-IDPS were 5.86 ± 1.36, 4.86 ± 1.46, 4.86 ± 0.83, 4.29 ± 1.49, and 4.71 ± 1.83 when evaluating the extent to which users understood the AI decision process. Compared with the pure DL model, the AI-IDPS influenced users' judgment to a greater extent (4.57 ± 1.40 vs 3.43 ± 1.51) and achieved greater user satisfaction (5.71 ± 0.48 vs 3.57 ± 1.13). The visualized features of the AI-IDPS were the main contributor to users' trust in the AI prediction (5.86 ± 0.99). Moreover, all participants in our questionnaire survey preferred the AI-IDPS to the pure DL model.

DISCUSSION

In this study, we developed an interpretable AI-IDPS for ESCC based on prior knowledge. The system automatically quantifies 9 invasion depth–related endoscopic visual features and provides an interpretable prediction of invasion depth. It demonstrated superior performance compared with both a pure DL model and expert endoscopists. To our knowledge, this is the first invasion depth prediction system for ESCC with an interpretable and transparent decision-making process.

Accurate preoperative optical diagnosis is crucial for guiding the optimal treatment approach for ESCC. Predicting the invasion depth of superficial ESCC before resection remains challenging, with overdiagnosis rates ranging from 20.0% to 40.0% and underdiagnosis rates between 4.3% and 60% in previous studies (12,14,16,25). In this study, the AI-IDPS demonstrated reliable performance in predicting ESCC invasion depth, achieving higher sensitivity (85.7% vs 32.0%) and accuracy (86.2% vs 69.2%) than experienced endoscopists. Moreover, AI-IDPS assistance enabled endoscopists to diagnose ESCC invasion depth in real time with greater accuracy, thereby preventing delays in appropriate treatment and minimizing the additional harm caused by underestimating the extent of ESCC invasion.

In addition to established clinical indicators, various microstructural indicators, such as vascular density, diameter, and morphology, have been proposed for assessing ESCC invasion depth (21,26). However, the lack of efficient methods for quantitative measurement in clinical settings has hindered the adoption of these parameters, which remain limited by manual evaluation (17). To optimize the diagnostic process, the AI-IDPS integrates multiple endoscopic features derived from validated theories (11,12,27,28). In this study, we investigated the effectiveness of previously unused parameters and developed quantitative measurement methods to surmount the constraints of human visual perception, thereby enriching the existing knowledge base for ESCC invasion depth prediction.

Despite the existence of effective classification systems, the intricate decision-making involved in mapping multiple indicators to a diagnosis has deterred their widespread implementation (14,29). Owing to its practicality, the simplified JES classification has been recommended for clinical diagnosis and training courses (17). Computer-aided decision-making is an important approach for predicting the outcomes of multivariate events, such as economic trend prediction and protein structure prediction (30,31). In this study, the AI-IDPS streamlined the diagnostic process using a comprehensive and interpretable method: its decision-making integrates all relevant indicators, transcending the biological limitations of the human brain. The AI-IDPS therefore holds promise as a potent tool for incorporating multiple features and facilitating computer-aided decision-making in endoscopic diagnosis.

The “black box” nature of AI systems remains one of the greatest challenges for AI application in medicine, alongside accuracy (32). Tokai et al and Shimamoto et al have made preliminary attempts at ESCC invasion depth prediction (19,33). However, these models performed only comparably with endoscopists, and their opaque decision-making processes could diminish physicians' acceptance of and confidence in AI diagnosis, ultimately undermining the overall benefit (34). A pure DL model provides only a diagnostic result, without any interpretation of the decision-making process or diagnostic basis. By contrast, the AI-IDPS extracts visual features into structured data, with differently weighted feature vectors used to interpret the AI outcome. The final endoscopic features are displayed on a monitor, enabling endoscopists to understand the rationale behind the AI predictions, which significantly influences their decisions. Benefiting from this transparency, endoscopists can identify and correct certain errors, for example, when an artifact on the mucosal surface is incorrectly segmented as vascular area or AVA (see Supplementary Figure S3, Supplementary Digital Content, https://links.lww.com/CTG/A959). This represents an effective attempt to demystify the black-box nature of DL models.

Insufficient data volume, constrained by data privacy and security concerns, remains challenging for DL (35). Unfortunately, large medical datasets are sometimes unavailable due to li
