The practical clinical role of machine learning models with different algorithms in predicting prostate cancer local recurrence after radical prostatectomy

Patients

We searched our institutional electronic database to identify PCa patients who underwent post-operative prostate mpMRI for clinically suspected local recurrence following RP between November 2015 and October 2022. Inclusion criteria were: (1) BCR or PSA persistence following RP (two consecutive serum PSA values > 0.2 ng/mL) [22]; (2) standard prostate mpMRI performed for suspected local recurrence after RP. Exclusion criteria were: (1) androgen deprivation therapy (ADT) or radiotherapy (RT) before post-operative MRI assessment; (2) poor imaging quality or inappropriate MRI protocol; (3) insufficient follow-up data. The study flow chart is shown in Fig. 1.

Fig. 1 Study flow chart. mpMRI = multiparametric magnetic resonance imaging; BCR = biochemical recurrence; RP = radical prostatectomy; ADT = androgen deprivation therapy; RT = radiotherapy

Clinical data, including age, pre-operative PSA, follow-up PET-CT results, and PI-RADS score, were also collected from the electronic database of our institution. Histopathologic data were obtained from the surgical pathology reports, including the International Society of Urological Pathology Gleason scores (GS), pathologic T stage, perineural invasion (PNI), seminal vesicle invasion (SVI), and positive surgical margins (PSM).

The Institutional Ethics Review Board approved this retrospective study and waived the requirement for written informed consent due to the retrospective study design.

MRI acquisition and analysis

The prostate mpMRI examinations were performed using a 3.0T MRI scanner (Skyra; Siemens, Munich, Germany) with a pelvic phased-array surface coil and without an endorectal coil. The prostate mpMRI protocol, including T1-weighted (T1WI), T2-weighted (T2WI) in three planes, diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) T1WI, conformed to the PI-RR recommendations [13]. ADC images were calculated from the DWI images acquired at b-values of 50 and 1000 s/mm² using an extended single exponential fitting model. Next, the early enhancement phase (E2) of the DCE images was selected for radiomics analysis following Nie K’s method [23], which identifies this phase as occurring within 10 s of the appearance of contrast agent in the femoral arteries. The details of the examination protocol are displayed in Table 1.
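
The ADC calculation can be illustrated with a minimal sketch, assuming the plain two-point mono-exponential model S(b) = S0·exp(−b·ADC). The scanner's own "extended" fitting may differ, and the function name and synthetic signal values below are hypothetical.

```python
import math

def adc_two_point(s_low, s_high, b_low=50.0, b_high=1000.0):
    """Mono-exponential ADC (mm^2/s) from signals at two b-values (s/mm^2).

    Assumes S(b) = S0 * exp(-b * ADC), so that
    ADC = ln(S_low / S_high) / (b_high - b_low).
    """
    if s_low <= 0 or s_high <= 0:
        return 0.0  # background or noise voxel: no valid estimate
    return math.log(s_low / s_high) / (b_high - b_low)

# Synthetic voxel with a known ADC of 1.0e-3 mm^2/s (illustrative values only)
true_adc = 1.0e-3
s0 = 800.0
s50 = s0 * math.exp(-50.0 * true_adc)
s1000 = s0 * math.exp(-1000.0 * true_adc)
estimate = adc_two_point(s50, s1000)
```

In practice the same expression would be applied voxel by voxel to the two DWI volumes.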

Table 1 MRI sequences and parameters for radiomics analysis

All post-operative mpMRI examinations were independently assessed by two expert-level radiologists (reader 1 with 10 years and reader 2 with 15 years of professional experience in prostate MRI diagnosis) in compliance with PI-RR criteria [13]. All readers were aware of pre-operative clinical and surgical pathological data, including primary tumor location. Cases with indeterminate lesions or scores were assessed by a third experienced radiologist (reader 3 with more than 20 years of professional experience in prostate cancer imaging). When a case contained multiple lesions, the lesion with the highest PI-RR score on mpMRI was assessed.

According to the PI-RR guidelines [13], the three-dimensional volume of interest (VOI) encompassing the whole suspicious lesion was manually contoured on axial slices of T2WI, DWI, ADC and the early enhancement phase of DCE by reader 2, who participated in the PI-RR evaluation, using ITK-SNAP software (version 3.6.0). For patients with a PI-RR score of 1, in whom both DWI and DCE sequences showed no abnormal signal, we delineated the normal vesicourethral anastomosis. For patients with a PI-RR score of 2, the suspicious lesion was defined as the focus showing diffuse or heterogeneous enhancement on DCE images. For patients scoring 3–5, the lesion with the highest PI-RR score was delineated. The largest lesion was segmented if two or more lesions exhibited equally high PI-RR scores. Reader 3, with more than 20 years of professional experience, reviewed all annotations. The radiologists had access to the operative histopathological and pre-surgical clinical results while segmenting VOIs. To assess the intra-observer consistency of annotations, the segmentation procedure was repeated by reader 2 after an 8-week interval. Reader 1 also segmented all VOIs to evaluate inter-observer repeatability.
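
The lesion-selection rule described above (highest PI-RR score, ties broken by size) can be sketched as follows. The `Lesion` class, its field names, and the use of volume as the measure of "largest" are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    name: str          # hypothetical lesion label
    pirr: int          # PI-RR score, 1 to 5
    volume_mm3: float  # assumed size measure used for tie-breaking

def select_index_lesion(lesions):
    """Return the lesion with the highest PI-RR score; ties go to the largest."""
    return max(lesions, key=lambda lesion: (lesion.pirr, lesion.volume_mm3))

candidates = [
    Lesion("A", pirr=3, volume_mm3=500.0),
    Lesion("B", pirr=4, volume_mm3=120.0),
    Lesion("C", pirr=4, volume_mm3=300.0),
]
index_lesion = select_index_lesion(candidates)
```

Here lesions B and C share the highest score, so the larger of the two is chosen for segmentation.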

Gold standard of reference

Based on previously reported reference standards [14], the criteria to define a post-operative mpMRI assessment as true-positive consisted of (1) a histologically confirmed positive result from biopsy specimens of the prostate or prostatectomy bed; (2) a volume enlargement detected by imaging modalities (including pelvic MRI, choline or gallium PSMA PET/CT) after more than 1 year of follow-up; (3) a volume shrinkage of a previously observed recurrent lesion at various imaging modalities or a reduction of PSA values following treatments (including ADT or salvage therapy) with a follow-up of > 2 years, restricted to patients with no signs of regional or distant metastasis on nuclear imaging (including bone scan, choline or gallium PSMA PET/CT).

The criteria for defining a post-operative mpMRI evaluation as true-negative consisted of [14]: (1) a biopsy-proven negative histopathological result obtained from prostatectomy bed or residual prostate; (2) a negative finding without tumor progression at various imaging modalities (including choline or gallium PSMA PET/CT or pelvic MRI) for more than 1 year of follow-up, accompanied by no rise of PSA levels for > 2 years.

Radiomics feature extraction and selection

We extracted 1781 radiomics features from each sequence (T2WI, DWI, ADC and the early enhancement phase of DCE) using the pyradiomics package in Python [24]. The extracted radiomics features comprised shape, first-order and texture features from original and filtered images. We calculated texture features utilizing the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighboring gray-tone difference matrix (NGTDM). The image transformation types included wavelet, Laplacian of Gaussian (LoG), square, square root, logarithm, exponential, gradient, local binary pattern (2D), and local binary pattern (3D). The intra-observer and inter-observer consistency of lesion delineation were estimated with the intraclass correlation coefficient (ICC), and only radiomics features exhibiting both intra-observer and inter-observer ICC values > 0.80 were retained for subsequent analysis.
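
The ICC-based stability filter can be illustrated with a short sketch. The ICC form is not stated in the text; the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) is assumed here, and the feature names and values are synthetic.

```python
def icc_2_1(ratings):
    """ICC(2,1) for `ratings`: one row per lesion, one column per measurement."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-measurement mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_features(first, repeat, other_reader, threshold=0.80):
    """Keep features whose intra- and inter-observer ICC both exceed threshold."""
    kept = []
    for name in first:
        intra = icc_2_1(list(zip(first[name], repeat[name])))
        inter = icc_2_1(list(zip(first[name], other_reader[name])))
        if intra > threshold and inter > threshold:
            kept.append(name)
    return kept

# Synthetic values for four lesions (illustrative only)
reader2_first = {"stable": [1.0, 2.0, 3.0, 4.0], "unstable": [1.0, 2.0, 3.0, 4.0]}
reader2_repeat = {"stable": [1.1, 2.0, 2.9, 4.0], "unstable": [4.0, 1.0, 3.0, 2.0]}
reader1 = {"stable": [1.0, 2.1, 3.0, 3.9], "unstable": [2.0, 4.0, 1.0, 3.0]}
kept = stable_features(reader2_first, reader2_repeat, reader1)
```

Only the feature that is reproducible across both the repeated and the second-reader segmentations survives the filter.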

We utilized FeAture Explorer (FAE) software (version 0.5.5) [25] to pre-process radiomics features and develop machine learning models. FAE is an open-source platform capable of extracting features, selecting features, constructing models, and visualizing results. First, the synthetic minority oversampling technique (SMOTE) was used to balance positive and negative samples of the training cohort. Second, we standardized the radiomics features by Z-score normalization, subtracting the mean value and dividing by the standard deviation for each feature. Third, Pearson correlation coefficient (PCC) analysis was utilized to reduce dimensionality. If the PCC of a feature pair surpassed 0.9, indicating a high correlation between the two features, one of them was randomly eliminated. Finally, to filter significant radiomics features, we employed recursive feature elimination (RFE), which iteratively fits a machine learning model, ranks the features by their contribution, and discards the weakest until the desired number remains. The feature selection procedure was carried out in the training set, with the number of selected features limited to a range of 1 to 20.
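
The normalization and correlation-pruning steps can be sketched in plain Python. This is an illustration rather than the FAE implementation: the use of the sample standard deviation, the absolute value of the PCC, and the fixed random seed are assumptions, and the feature names and values are synthetic.

```python
import random
import statistics

def z_score(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def prune_correlated(features, threshold=0.9, seed=0):
    """For each pair with |PCC| > threshold, randomly drop one of the two."""
    rng = random.Random(seed)
    kept = dict(features)
    names = list(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept:
                if abs(pearson(kept[a], kept[b])) > threshold:
                    del kept[rng.choice([a, b])]
    return kept

# Synthetic training-set features (illustrative only)
raw = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly collinear with f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
scaled = {name: z_score(values) for name, values in raw.items()}
kept = prune_correlated(scaled)
```

One of the two collinear features is discarded, while the weakly correlated feature is always retained.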

Model development and validation

Employing the selected radiomics features, three prevalent machine learning models, based on support vector machine (SVM), linear discriminant analysis (LDA), and logistic regression with least absolute shrinkage and selection operator (LR-LASSO), were built to identify the classifier with the best prognostic prediction capability. Five-fold cross-validation was employed in the training cohort to determine the hyper-parameters of the radiomics models. The hyper-parameters were adjusted in accordance with the model performance in the validation set. The area under the curve (AUC) obtained from the receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) of the three models were calculated to select the best radiomics model for the following analysis.
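
For reference, the threshold-dependent metrics listed above follow directly from the confusion matrix. The sketch below uses synthetic labels and predictions; the function name is hypothetical.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, PPV and NPV from binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positive predictions)
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Synthetic example: 4 recurrences and 6 non-recurrences
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```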

First, we compared the predictive performance of the best radiomics model with the PI-RR assessment of expert-level radiologists to evaluate their ability to predict PCa local recurrence. Then, clinicopathologic features were entered into univariate and multivariable logistic regression analyses to estimate their predictive capability. The radiomics features obtained through RFE and clinicopathologic features selected through logistic regression analyses were evaluated for correlation. We removed features demonstrating high correlation (PCC > 0.9) to acquire the final features for combined model construction. To uniformly and objectively compare the predictive performance of all models, the machine learning algorithm that performed best in the radiomics models was chosen to construct the combined model by integrating significant clinicopathologic and radiomics features. Finally, we compared the combined model with the PI-RR score assessed by expert-level radiologists to explore if the combined model could further improve the predictive level. Figure 2 displays the entire workflow of this study.

Fig. 2 Imaging analysis and data flow of the research. VOI = volume of interest; PI-RR = Prostate Imaging for Recurrence Reporting system; ICC = intraclass correlation coefficient; PCC = Pearson correlation coefficient; RFE = recursive feature elimination; ROC = receiver operating characteristic; DCA = decision curve analysis

Statistical analysis

SPSS 26.0 software, Python software (version 3.5.6) and R software (version 3.6.3) were used for statistical analysis. Continuous variables were represented as the mean ± standard deviation or median with interquartile range (IQR) in accordance with the normality test, while categorical variables were reported as frequency and proportions. We employed the Shapiro–Wilk test to verify the normality of features. The independent-sample t-test or Mann–Whitney U test was applied to compare quantitative parameters, and the chi-square test was utilized to compare qualitative parameters.

All models were assessed using ROC curves and the corresponding AUC values. In accordance with previous findings [14, 26], PI-RR ≥ 3 was used to define a positive post-operative mpMRI assessment. The best cutoff values of the machine learning models were determined by maximizing the Youden index in the training cohort. The sensitivity, specificity, accuracy, PPV, and NPV of all models were calculated for predictive performance comparison. The DeLong test was employed to compare the AUCs of all models. Decision curve analysis (DCA), which estimates the net benefit at varying threshold probabilities, was used to evaluate the clinical applicability of the PI-RR system, radiomics, and combined models. A calibration curve was plotted to assess the calibration of the combined model. A two-tailed p-value < 0.05 was considered statistically significant.
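
The cutoff selection and the net-benefit quantity underlying DCA can be sketched as below. The Youden index is J = sensitivity + specificity − 1, and the net benefit at threshold probability p_t is TP/n − (FP/n)·p_t/(1 − p_t). Treating a score at or above the cutoff as positive is an assumed convention, and the labels and scores are synthetic.

```python
def youden_cutoff(y_true, scores):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= cutoff)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < cutoff)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def net_benefit(y_true, scores, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Synthetic training-cohort labels and predicted probabilities
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
cutoff, j_stat = youden_cutoff(y_true, scores)
nb_at_half = net_benefit(y_true, scores, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing it against the treat-all and treat-none strategies.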
