This study initially considered 414 patients diagnosed with LAGC at two Chinese hospitals, of whom 322 were ultimately included. Both institutions applied identical inclusion and exclusion criteria. Inclusion criteria were: (a) gastric adenocarcinoma confirmed through histopathological examination; (b) LAGC diagnosed on preoperative CT scans or laparoscopic examination according to the American Joint Committee on Cancer (AJCC) TNM staging manual (8th edition), defined as cT2–4/N0–3/M0; (c) gastrectomy and lymph node dissection after NAC, with TRG confirmed on postoperative pathological examination; (d) multi-phase contrast-enhanced CT scans obtained before treatment. Exclusion criteria were: (a) inability to identify the primary tumor on CT, or poor CT image quality (e.g., severe artifacts) preventing accurate measurements; (b) concurrent presence of other malignancies; (c) anticancer treatment received before baseline CT scanning; (d) incomplete clinical or pathologic data. Ethical approval was obtained from the Ethics Review Committees of both participating medical centers. Given the retrospective nature of the study, informed consent was waived.
Patients were divided into three sets. (1) Training set and internal validation set: A total of 284 LAGC patients treated at the Fourth Hospital of Hebei Medical University from January 2013 to June 2023 were initially considered. After the inclusion and exclusion criteria above were applied, 225 patients were ultimately included. They were randomly assigned in a 7:3 ratio to a training cohort (n = 157) and an internal validation cohort (n = 68). (2) External validation set: A group of 130 patients with LAGC treated at the First Hospital of Qinhuangdao from December 2014 to June 2023 was identified. After the same inclusion and exclusion criteria were applied, 97 patients with LAGC were ultimately included. Details of patient inclusion and exclusion are shown in Fig. 1.
Fig. 1 Inclusion and exclusion flowchart for patients in the study. LAGC, locally advanced gastric cancer; NAC, neoadjuvant chemotherapy; CT, computed tomography; GR, good response; PR, poor response
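The 7:3 random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_cohort`, the seed, and the choice to round the validation share upward are all assumptions, chosen only because they reproduce the reported cohort sizes (157 and 68 out of 225).

```python
import random


def split_cohort(patient_ids, val_tenths=3, seed=42):
    """Randomly split patient IDs into training and internal validation lists.

    val_tenths=3 gives a 7:3 split. The seed is a hypothetical value;
    the paper does not report one.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    # Integer ceiling of len(ids) * val_tenths / 10 (assumed rounding rule).
    n_val = (len(ids) * val_tenths + 9) // 10
    return ids[n_val:], ids[:n_val]


# Placeholder IDs 0..224 stand in for the 225 included patients.
train_ids, val_ids = split_cohort(range(225))
print(len(train_ids), len(val_ids))
```

Splitting at the patient level, rather than at the level of individual CT slices, keeps all slices from one patient in the same cohort.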
Baseline characteristics
Baseline clinical features, including age, gender, body mass index (BMI), tumor differentiation, carcinoembryonic antigen (CEA), carbohydrate antigen 19–9 (CA 19–9), as well as clinical T (cT) and clinical N (cN) staging according to the 8th edition of the AJCC TNM staging system, were extracted from medical records.
NAC strategy
All enrolled patients underwent 2–4 cycles of neoadjuvant chemotherapy with the SOX regimen, repeated every 3 weeks: oxaliplatin 130 mg/m² of body surface area administered intravenously on day 1, and S-1 administered orally on days 1–14 (40 mg twice daily for a body surface area less than 1.25 m²; 50 mg twice daily for a body surface area between 1.25 and 1.5 m²; 60 mg twice daily for a body surface area greater than 1.5 m²). NAC was administered to all patients before surgery, and dose or cycle adjustments were made based on treatment efficacy and patient tolerance. Preoperative treatment efficacy was assessed based on improvement in patient symptoms, normalization or continuous decrease in tumor markers, and reduction in the size of the primary tumor and suspected metastatic lymph nodes observed on CT or magnetic resonance imaging (MRI). All patients received at least two cycles of the SOX regimen, and there were no cases of premature termination of the intended NAC regimen or alterations in the treatment agents. Gastric resection surgery was performed within 2 weeks of completion of NAC.
NAC response assessment
The assessment of NAC response was conducted collaboratively by two pathology experts with more than 10 years of experience in diagnosing gastrointestinal tumors. Both were blinded to the imaging and clinical data of the patients. The TRG was categorized into four levels based on the most recent National Comprehensive Cancer Network (NCCN) guidelines (2021, version 4, guideline 26), evaluating the extent of tumor regression after preoperative neoadjuvant treatment for gastric cancer: TRG 0: No viable cancer cells (complete response); TRG 1: Residual cancer cells in single or small clusters (moderate response); TRG 2: Residual cancer with fibrosis in the stroma (mild response); TRG 3: Minimal or no tumor regression, with a significant amount of residual cancer cells (poor response). TRG 0 and TRG 1 were combined into the good response (GR) group, while TRG 2 and TRG 3 were categorized as the poor response (PR) group.
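The grouping of TRG levels into the binary outcome used for modeling can be written as a small lookup. The sketch below only restates the rule given above; the names `TRG_DESCRIPTIONS` and `response_group` are hypothetical.

```python
# TRG levels as described in the text.
TRG_DESCRIPTIONS = {
    0: "no viable cancer cells (complete response)",
    1: "residual cancer cells in single or small clusters (moderate response)",
    2: "residual cancer with fibrosis in the stroma (mild response)",
    3: "minimal or no tumor regression (poor response)",
}


def response_group(trg):
    """Map a TRG level to 'GR' (TRG 0-1) or 'PR' (TRG 2-3)."""
    if trg not in TRG_DESCRIPTIONS:
        raise ValueError(f"unknown TRG level: {trg!r}")
    return "GR" if trg <= 1 else "PR"


labels = [response_group(t) for t in (0, 1, 2, 3)]
print(labels)
```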
CT examination
After an overnight fast, patients ingested a small amount of warm water and swallowed a gas-producing powder before the examination, followed by immediate CT scanning. The CT scans covered the entire gastric region and axial images were acquired during breath-holding. Contrast-enhanced CT scans in the arterial phase, portal venous phase, and delayed phase were obtained 30, 60, and 180 s, respectively, after injection of contrast agent. The CT image acquisition parameters for Centers A and B are detailed in Supplementary Table S1.
Image standardization and segmentation
Image standardization in this study involved two steps to reduce data variability between centers. Firstly, all CT images were resampled using cubic spline interpolation to a pixel size of 1 × 1 mm. Secondly, pixel intensities were normalized, transforming the intensity range to −1024 HU to 1024 HU, and applying a consistent abdominal window with a window level of 50 and a window width of 350.
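The intensity step can be illustrated with a short sketch. A window level of 50 and a width of 350 correspond to a displayed range of −125 HU to 225 HU. The final rescaling to [0, 1] is an assumption added for illustration and is not stated in the text; the function names are hypothetical, and the resampling step is not shown.

```python
def window_bounds(level, width):
    """Return the (lower, upper) HU bounds of a display window."""
    half = width / 2
    return level - half, level + half


def standardize_hu(values, level=50.0, width=350.0,
                   hu_min=-1024.0, hu_max=1024.0):
    """Clip HU values to [hu_min, hu_max], apply the window, rescale to [0, 1].

    The rescaling to [0, 1] is an illustrative assumption.
    """
    lo, hi = window_bounds(level, width)
    out = []
    for v in values:
        v = min(max(v, hu_min), hu_max)   # restrict to the stated HU range
        v = min(max(v, lo), hi)           # apply the abdominal window
        out.append((v - lo) / (hi - lo))  # assumed rescaling
    return out


bounds = window_bounds(50.0, 350.0)
scaled = standardize_hu([-2000.0, -125.0, 50.0, 225.0, 3000.0])
print(bounds, scaled)
```

In practice the same operation would be applied to whole image arrays; a plain list is used here to keep the sketch dependency-free.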
A radiologist (Radiologist A) with 5 years of experience in diagnosing digestive system tumors manually delineated regions of interest (ROIs) on the three-phase contrast-enhanced images. The segmentation encompassed tumor parenchyma, necrosis, hemorrhage, and cystic areas, resulting in multiple ROIs containing tumor regions for each patient. For each patient, segmentation was performed on arterial, portal venous, and delayed phase images, encompassing all slices standardized to 1 mm thickness containing CT findings of interest. To assess intra- and inter-observer consistency, 30 randomly selected patients underwent re-segmentation one month after the initial ROI delineation. This time, both Radiologist A and another radiologist (Radiologist B) with 10 years of experience in diagnosing digestive system tumors performed the segmentation using the same method. Inter-observer and intra-observer intraclass correlation coefficients (ICCs) were calculated for the extracted features. The process of image segmentation is illustrated in Supplementary Fig. S1.
Manual feature extraction
Manual feature extraction was conducted using the PyRadiomics software package, a Python-based tool. The extracted features comprised 8 first-order statistical features, 18 shape features, 75 second-order statistical features, and 1488 transformation features. Detailed definitions of these features are available at http://PyRadiomics.readthedocs.io/en/latest/. The transformations applied included wavelet, Laplacian of Gaussian (LoG) filter, square, square root, logarithm, exponential, gradient, and 2D local binary pattern (LBP2D). Image features were extracted at different spatial scales by setting the sigma parameter of the LoG filter to 3.0 and 5.0. In total, 1589 manual features were extracted from each 2D image. The process of feature extraction is depicted in Supplementary Fig. S1.
Deep learning feature extraction
For all patients, we included all 2D images containing tumor tissue and their corresponding regions of interest (ROIs) as inputs. The images were then uniformly resized to 224 × 224 pixels to match the input size of the model. Compared with selecting only the slice with the maximum tumor area as input, this approach makes fuller use of the medical images and contributes to improved model performance. Effective training of deep learning models involves estimating millions of learnable parameters and therefore requires a large amount of image data, whereas medical image datasets are often limited in size. This study therefore employed transfer learning to address the issue of insufficient image quantity. Specifically, this study employed an EfficientNet V2 model pre-trained on the ImageNet dataset. The final fully connected layer was removed to create a feature extractor, and the resulting output values from the feature extractor were utilized as deep learning features. The images underwent three transformations based on the normalization parameters (z-score) of the red/green/blue channels to adapt them to the format suitable for the ImageNet dataset. Implementation of the EfficientNet V2 model was carried out using the Python timm package (https://github.com/huggingface/pytorch-image-models) in conjunction with the PyTorch library (https://www.pytorch.org/). A total of 1280 features were extracted for each 2D image.
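One plausible reading of the channel-wise z-score step is that each grayscale CT slice is copied into three channels and each channel is normalized with per-channel ImageNet statistics. The sketch below illustrates that reading only. The mean and standard deviation values are the commonly used ImageNet statistics and are an assumption here, since the text does not list the values used; the function name is hypothetical.

```python
# Commonly used ImageNet channel statistics (assumed; not given in the text).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_imagenet_channels(gray):
    """Replicate a 2D grayscale image (values in [0, 1]) into R, G, B planes
    and z-score each plane with the corresponding ImageNet mean and std."""
    return [
        [[(pixel - mean) / std for pixel in row] for row in gray]
        for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD)
    ]


# Tiny 2 x 2 example image standing in for a 224 x 224 slice.
channels = to_imagenet_channels([[0.485, 0.0], [1.0, 0.5]])
print(len(channels), channels[0][0][0])
```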
During model training, a cross-entropy loss function was used, and optimization was performed using the Adam algorithm. The model was trained for 100 epochs with a batch size of 64, an initial learning rate of 0.1, and a learning rate scheduler (ReduceLROnPlateau) that dynamically adjusted the learning rate. The training process was implemented using PyTorch in an environment with an NVIDIA RTX 3090 GPU, an Intel(R) Core(TM) i7-12700F CPU @ 2.10 GHz, and 64 GB of memory. The feature extraction process is illustrated in Supplementary Fig. S1.
Feature selection and signature building
To deal with the imbalance between PR and GR samples, we used the borderline synthetic minority over-sampling technique (Borderline-SMOTE) to oversample the training set, ensuring an equal number of GR and PR samples. Then, the handcrafted and deep learning features from the arterial, portal venous, and delayed phases were combined for feature selection and signature building. Firstly, the z-score method was applied to normalize the handcrafted and deep learning features.
The feature selection in this study comprised five steps. Firstly, the ICC value was calculated for both intra-observer and inter-observer reliability. Features with ICC values > 0.75 were considered to have good repeatability and were retained for further filtering. Subsequently, the Spearman correlation coefficient was used to remove redundant features: when the correlation coefficient between two features exceeded 0.9, one of the two was randomly removed. Next, the variance threshold method was employed to further filter the remaining features. The basic idea was to eliminate features with low variance, as these features exhibit minimal changes in the data and may contribute less information to modeling or classification tasks. In this study, a variance threshold of 0.8 was set, as commonly employed in previous studies [22]. Following that, the ReliefF algorithm, which calculates importance scores by considering the distribution of instance weights based on nearest neighbors, was applied for feature selection. With n_neighbors set to 10 in this study, the ReliefF algorithm selected 10 nearest neighbors of the same class and 10 nearest neighbors of a different class for each sample to estimate the importance scores of features. Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was employed. Through ten-fold cross-validation, the penalty parameter was optimized to select the most useful predictive features with non-zero coefficients. The feature selection process is illustrated in Supplementary Fig. S1.
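The Spearman redundancy filter (the second step) can be sketched in plain Python. This is an illustrative simplification under stated assumptions: the text describes randomly removing one feature of each highly correlated pair, whereas the sketch keeps the first feature encountered so that the result is deterministic. The function names and the toy data are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |rho| <= threshold against every kept feature.

    Deterministic variant: the first feature of a correlated pair is kept.
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],   # monotone in "a", so rho = 1
    "c": [5, 1, 4, 2, 3],
}
selected = drop_correlated(toy)
print(selected)
```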
Because each gastric cancer patient could have multiple CT slices and different slices might yield different predictions, a voting method was used to determine the final patient-level classification. The finally selected features were weighted by their normalized coefficients in a multivariate logistic regression model to calculate the feature signatures predicting GR. The final outcome comprised two feature signatures: a handcrafted feature signature and a deep learning feature signature.
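A majority vote over slice-level predictions might look like the sketch below. The text does not specify how ties are resolved, so the tie rule here (defaulting to PR) is an assumption, as is the function name.

```python
from collections import Counter


def patient_label(slice_labels, tie_label="PR"):
    """Majority vote over per-slice 'GR'/'PR' predictions for one patient.

    tie_label is an assumed tie-breaking rule; the text does not give one.
    """
    if not slice_labels:
        raise ValueError("at least one slice prediction is required")
    counts = Counter(slice_labels)
    gr, pr = counts.get("GR", 0), counts.get("PR", 0)
    if gr == pr:
        return tie_label
    return "GR" if gr > pr else "PR"


majority = patient_label(["GR", "PR", "GR", "GR"])
tie = patient_label(["GR", "PR"])
print(majority, tie)
```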
Establishment of the deep learning radiomics nomogram
In the training set, univariate analysis was conducted to select clinical-pathologic variables with statistical significance (P < 0.05). Subsequently, multivariate logistic regression was performed to integrate the three-phase handcrafted signature, the deep learning signature, and significant clinical-pathologic factors, constructing a fused nomogram model. This model was compared with the clinical model and with the handcrafted and deep learning signatures.
Receiver operating characteristic (ROC) curve analysis was applied to measure the discriminative performance of each model. The DeLong test was used to compare the discriminative abilities of different models. Calibration curve analysis and the Hosmer–Lemeshow test were employed to evaluate the goodness-of-fit of the models. Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) were calculated to compare the performance differences between the fused model and the clinical model. Decision Curve Analysis (DCA) assessed the clinical utility of the models. Kaplan–Meier curves were used to evaluate the association between the radiomics nomogram score and PFS.
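For orientation, the IDI compares how much a new model raises the mean predicted probability among events and lowers it among non-events relative to a reference model. The sketch below implements that standard definition on made-up probabilities; it is not the authors' code, and the function and variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def idi(y_true, p_old, p_new):
    """Integrated Discrimination Improvement of p_new over p_old.

    IDI = (mean gain in predicted probability among events)
          - (mean gain in predicted probability among non-events).
    """
    gain_events = [n - o for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    gain_nonevents = [n - o for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    return mean(gain_events) - mean(gain_nonevents)


# Toy data: 1 = GR, 0 = PR; probabilities are illustrative only.
y = [1, 1, 0, 0]
clinical = [0.6, 0.5, 0.4, 0.5]
fused = [0.8, 0.7, 0.2, 0.3]
value = idi(y, clinical, fused)
print(round(value, 3))
```

A positive value indicates that the new model separates the two groups better on average than the reference model.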
PFS
Patients in the follow-up set (n = 147) underwent follow-up every 3–6 months in the first 2 years after surgery, followed by annual follow-ups. The follow-up duration extended from the time of surgery until March 2023, collecting information on PFS up to the last follow-up. PFS was defined as the duration from the commencement of NAC to the onset of any type of tumor progression or death from any cause. All occurrences of disease progression, encompassing both local recurrence and distant metastasis, were evaluated through clinical examination and imaging modalities such as CT, magnetic resonance imaging, or positron emission tomography-computed tomography scans.
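The PFS definition translates into a time-to-event pair per patient: a duration and an event indicator. The sketch below assumes that patients without progression or death are censored at their last follow-up and that months are approximated as 30.4375 days; neither convention is stated in the text, and the dates shown are invented.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumed conversion (365.25 / 12)


def pfs_months(nac_start, last_follow_up, progression=None, death=None):
    """Return (PFS in months, event flag) for one patient.

    The event is the earlier of progression or death; patients with neither
    are treated as censored at last follow-up (an assumed convention).
    """
    events = [d for d in (progression, death) if d is not None]
    if events:
        end, event = min(events), True
    else:
        end, event = last_follow_up, False
    return (end - nac_start).days / DAYS_PER_MONTH, event


# Invented example dates.
progressed = pfs_months(date(2020, 1, 1), date(2023, 3, 1),
                        progression=date(2021, 1, 1))
censored = pfs_months(date(2020, 1, 1), date(2023, 3, 1))
print(progressed, censored)
```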
Statistical analysis
Differences in clinical characteristics among various groups or cohorts were compared using independent t-tests or Mann–Whitney U tests for continuous variables. For categorical variables, Fisher’s exact test or the chi-square test was employed as appropriate. The Akaike Information Criterion (AIC) served as the stopping criterion for the backward stepwise process aimed at determining the optimal feature combination. Kaplan–Meier survival analysis and log-rank tests were employed to assess the probability of PFS. The optimal cutoff values were determined using X-tile software, and patients were stratified into high-risk and low-risk groups. Univariate and multivariate analyses utilizing Cox proportional hazards regression, with backward stepwise elimination and AIC, were conducted to construct the PFS prediction model. All statistical analyses were carried out using R software (version 3.6.3, http://www.R-project.org). Two-sided P values less than 0.05 were considered statistically significant.