External Validation of Robust Radiomic Signature to Predict 2-Year Overall Survival in Non-Small-Cell Lung Cancer

The study was approved by the Institution Ethics Committee (IEC) (IEC-2) of our hospital as a retrospective study. A consent form waiver is provided by the same IEC as an institutional policy. All the data of the patients were kept confidential.

PatientsTMH Dataset

Two hundred patients of non-small cell lung carcinoma (NSCLC) who underwent treatment with a combination of surgery, chemotherapy, and radiotherapy in our hospital from January 2012 to January 2017 were included in this study. The pre-treatment CT or PET/CT scans of these patients was extracted from the hospital PACS and was included. Similarly, clinical data were extracted from the hospital information system (HIS). Patients’ demographic data are shown in Table 1.

Table 1 Demographic data of patient population used in this studyExternal Validation Set

The Cancer Image Archive (TCIA) open-source data: 100 NSCLC patients with CT images and RT structures (GTV-1) and survival data of NSCLC-radiomics collection were downloaded from the TCIA portal [18, 23]. The CT scans and GTV were used to extract radiomic features.

Pre-Processing of Data

Clinical data extracted from the HIS were cleaned and converted into a form amenable to machine learning. CT or PET/CT scans were checked for completeness, and contrast-enhanced CT series of PET/CT or CT studies were selected for this study.

Based on median overall survival (OS) in both the datasets, 2-year OS was selected as a clinical endpoint (Table 1). For both datasets, OS were binarized based on 2-year OS [(OS < 2 years) = 1 and (OS > 2 years) = 0].

PET/CT Imaging ProcedureTMH Dataset

Pre-treatment PET/CT scans were performed using Gemini TF16 or Gemini TF64 PET/CT scanners (Philips Medical Systems, Netherlands). The CT of PET/CT scans were performed after the injection of 60 to 80 ml of non-ionic contrast using the protocol mentioned in Supplementary Table s1. CT images were reconstructed using the filtered back project (FBP) reconstruction algorithm.

TCIA External Validation Set

Pre-treatment CT scans were performed using a Gemini CT scanner (Philips Medical Systems, Netherlands). The CT scans were performed after the intravenous injection of 80 ml of non-ionic contrast using the protocol was mentioned in Supplementary Table s1. CT images were reconstructed using the filtered back projection (FBP) reconstruction algorithm.

From both cohorts, CT data were extracted in Digital Imaging and Communications in Medicine (DIOCM) format for radiomic extraction.

Radiomic Extraction Internal Dataset

The CT series of PET/CT scans were loaded on Intellispace Discovery Portal (research-only build; Philips Medical System, Eindhoven, The Netherlands) and primary tumor delineation was performed using 3D contouring software by the experienced (more than 15 years) medical physicists and saved as radiotherapy structure (DICOM series: RT structure) by the name of gross tumor volume (GTV). The GTVs were checked and approved by experienced (more than 20 years) nuclear medicine physicians and radiologists. Subsequently, the DICOM images and GTV were transferred to the research computer for radiomic extraction. On a research PC, radiomic features were extracted using in-house developed PyRadGUI software using a combination of Plastimatch [24] and Pyradiomics software [25]. The following pre-processing steps were performed using PyRadGUI software. Image conversion: DICOM images and RT structures were converted into NRRD format using the Plastimatch package. Resampling: Images were resampled using a 2 × 2 × 2 mm cube isotropic voxel. Filtering and transformation of image: Three sets of images were generated applying Laplacian of Gaussian (LoG) filters with sigma values of 1, 2, and 3 mm. We also generated eight sets of wavelet-transformed images using eight combinations of high-pass and low-pass wavelet filters [25]. Finally, a total of 1093 radiomic features were extracted from the 12 imaging sets (1 set of original images, 3 sets of LoG images, and 8 sets of wavelet images) and corresponding GTVs [25].

External Validation Set

The TCIA dataset contains CT Image and RT structure (GTV) in DICOM format. We performed the same operation as described in the earlier section, and 1093 radiomic features were extracted for every patient’s data.

Data Balancing

Usually, it is assumed that balanced endpoints are more appropriate to train most of the machine learning algorithms for prediction model development [26]. The majority of the time clinical endpoints have imbalanced ratios, which do not meet the assumptions of balanced endpoints and require data balancing. Data balancing was performed using synthetic minority oversampling technique (SMOTE).

Prediction Algorithm Used

Several radiomic studies have shown that random forest (RFC), support vector (SVC), and gradient boosting classifier (GBC) algorithms are the most efficient classification algorithms for treatment response and outcome events prediction in radiomics based analysis in several types of cancer (28–30). Hence, in this study, we have used RFC, SVC, and GBC for the overall survival prediction. Additionally, we also developed deep learning (DL) multilayer perceptron model.

Radiomic Feature Selection

We opted for a two-step process to select the best radiomic features for OS prediction out of 1093 radiomic features extracted from CT images. We selected 121 stable radiomic features based on our earlier radiomic stability study [21]. Subsequently, the top 50 features were selected using the Chi-squared test. Finally, the top 10 features were selected by applying recursive feature elimination (RFE) methods using random forest (RFE-RF). Python 3.9.0 software is used for the feature selection process.

Prediction Model Development and Validation

The prediction models were developed using random forest (RF), support vector (SV), and gradient boosting (GB) algorithms in Python 3.9.0 software. Hyperparameters of these prediction algorithms were tuned using nested cross-validation, and the same parameters were used to develop all the prediction models. Subsequently, 10-fold cross-validation was performed to access the model performance on the internal dataset. In the next step, a train-test split (80:20) was performed for model development and validation. Three prediction algorithms were used to develop a total of six prediction models utilizing original and balanced training sets.. RF models (RF-Model-O: on the original training data and RF-Model-B: on the balanced training data); SV models (SV-Model-O: on the original training data and SV-Model-B: on the balanced training data), and GB models (GB- Model-O: on the original training data and GB-Model-B: on the balanced training data) were developed on the internal dataset and validated on the test dataset. Subsequently, these models were also validated using the bootstrap (1000 iterations) method on the test dataset and on the external validation cohort.

Two deep learning models (simple-DL: 7-layer perceptron model without dropout layer and dropout-DL: 7-layer perceptron model with dropout layer) were also developed using an internal train-test dataset. Both the DL models were validated using the internal test dataset and an external dataset.

Using random forest, support vector, and gradient boosting algorithm, three models, i.e., RF-MODEL-V, SV-MODEL-V, and GB_MODELS-V, were also developed for predicting 2-year overall survival with tumor volume as a single feature.

Statistical Tests

For all the statistical tests, different packages of Python 3.9.0 open-source software were used. Descriptive statistical tests were performed to understand the distribution of patients in various categories. The demographic data of the internal and external cohorts were compared using t-test. Hierarchical clustering using Pearson’s correlation test and z-score and Chi-squared tests was performed for feature reduction. Recursive feature elimination using a random forest algorithm was performed to select the most significant features for model development. The features from both cohorts were compared using a t-test and violin plot. Receiver operating characteristics area under the curve (AUC), accuracy, precision, recall, and f1-score were calculated for all prediction models on internal and external validation datasets.

留言 (0)

沒有登入
gif