PET/CT Radiomics Integrated with Clinical Indexes as a Tool to Predict Ki67 in Breast Cancer: a Pilot Study

Study Population

This retrospective study was conducted at the First Affiliated Hospital of Xi'an Jiaotong University (NCT05826197), and the study protocol was approved by the Ethics Committee of Xi'an Jiaotong University (IRB-SOP-AF-16).

Between November 2016 and June 2020, a total of 129 female patients with BC who underwent 18F-FDG PET/CT examinations were retrospectively studied. Inclusion criteria were: (1) BC confirmed by preoperative biopsy or postoperative pathology; (2) 18F-FDG PET/CT examination performed; and (3) Ki67 expression data available. Exclusion criteria were: (1) a primary lesion too small to be detected by 18F-FDG PET/CT, or occult BC (n = 5); (2) a diffuse unilateral lesion or bilateral multifocal lesions (n = 4); (3) benign breast lesions (n = 1); and (4) neo-adjuvant chemotherapy or other anti-tumor treatment performed before imaging (n = 4).

A flowchart of this process is shown in Fig. 1. A total of 114 patients were enrolled in the study. Patients were randomly assigned to a training group (79 patients) and a test group (35 patients) at a ratio of 7:3 using a stratified sampling method. The patient's age, the side of the BC (right or left), and the menopausal status (premenopausal versus perimenopausal or postmenopausal) were recorded.

Fig. 1

The workflow of the BC patient selection. BC, breast cancer; FDG, fluorodeoxyglucose
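The 7:3 stratified assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which variable was used for stratification, so Ki67 status is assumed here, and the class counts (82 positive, 32 negative), the seed, and the function name `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_percent=30, seed=0):
    """Return (train_idx, test_idx), sampling the test share within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        # Round the per-class test size to the nearest whole patient.
        n_test = round(len(members) * test_percent / 100)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels (1 = Ki67-positive, 0 = Ki67-negative); counts are made up.
labels = [1] * 82 + [0] * 32
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the proportion of positive cases nearly identical in the two groups, which matters for a cohort of this size.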

Immunohistochemistry

Formalin-fixed paraffin-embedded tissue samples from BC cases were used for Ki67 assessment by an experienced pathologist blinded to the PET/CT results. Ki67 levels were divided into the 0.5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% groups according to the degree of expression, and an expression index ≥ 20% was considered positive.

PET/CT Data Acquisition

All examinations were performed using a 64-detector scanner (Gemini TF PET/CT, Philips, Netherlands). 18F-FDG was synthesized using a small cyclotron (GE MINItrace) and an FDG synthesis module. Radiochemical purity was > 99%. Both endotoxin and bacteriological tests were negative, which met the radio-pharmaceutical requirements.

The patients fasted for over six hours before intravenous injection of 18F-FDG (3.7 MBq/kg). The fasting blood glucose level was required to be lower than 12.0 mmol/L. After resting for 60 min, the patients underwent whole-body PET/CT. The scan range extended from the top of the skull or the level of the first thoracic vertebra to the upper femur. PET data were acquired over 6–10 bed positions at 1.5 min per bed position. CT scans (tube voltage, 120 kV; automatic milliampere second; matrix, 512 × 512; layer thickness, 5 mm) were performed for lesion localization and attenuation correction of the PET images. Maximum intensity projection (MIP), PET, CT, and fusion images were displayed on the Extended Brilliance Workstation (EBW).

Image Analysis

Manually defined features, including tumor morphology (regular or irregular), necrosis (with or without), and calcification (with or without), as well as the N (N0 or N1) and M (M0 or M1) stages, were determined in a double-blinded manner by two experts (Dawei Li, reader 1; Cong Shen, reader 2) with more than ten years of PET/CT interpretation experience. Any disagreement between the two readers was resolved by another experienced radiologist.

The longest diameter (mm) was measured at the maximal horizontal position. The volume of interest (VOI) was automatically delineated on the EBW workstation using 40% of the maximum standardized uptake value (SUVmax) as the threshold. The VOIs were reviewed, and manual correction was allowed when the tumor border was unsatisfactory. SUVmax, mean SUV (SUVmean), standard deviation (SD) of SUV, and metabolic tumor volume (MTV) were calculated (Fig. 2).

Fig. 2

The delineation of the VOI of BC. A 56-year-old woman with right-sided breast cancer underwent an 18F-FDG PET/CT scan. The MIP image (A) showed high 18F-FDG uptake in the cancer. The VOI was segmented with a 40% SUVmax threshold on the EBW workstation (B). The longest diameter of the cancer, measured at the maximal horizontal position, was 30.43 mm (C). The SUVmax, SUVmean, SD, and MTV were 6.01, 3.62, 0.89, and 15.680 mm3, respectively (D)
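To make the 40% SUVmax thresholding concrete, a minimal sketch is given below. It is not the EBW workstation's algorithm: it simply thresholds a list of voxel SUVs taken from a box around one lesion and ignores connectivity and manual correction. The voxel values, the 4 × 4 × 4 mm voxel size, the use of the population SD, and the function name `voi_metrics` are all assumptions made for illustration.

```python
import statistics


def voi_metrics(suv_values, voxel_volume_mm3, fraction=0.40):
    """Keep voxels at or above fraction * SUVmax and summarise them."""
    suv_max = max(suv_values)
    threshold = fraction * suv_max
    voi = [v for v in suv_values if v >= threshold]
    return {
        "SUVmax": suv_max,
        "SUVmean": statistics.fmean(voi),
        "SD": statistics.pstdev(voi),  # population SD; an assumption
        "MTV_mm3": len(voi) * voxel_volume_mm3,
    }


# Hypothetical voxel SUVs around one lesion; voxel volume 4 * 4 * 4 = 64 mm3.
suvs = [0.8, 1.1, 2.1, 2.6, 3.4, 4.2, 5.0, 1.9, 0.6, 3.0]
metrics = voi_metrics(suvs, voxel_volume_mm3=64.0)
print(metrics)
```

With these made-up values the threshold is 2.0, so six voxels fall inside the VOI and the MTV is six voxel volumes.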

Feature Extraction

The VOIs were saved as NIfTI files. Radiomics feature extraction was implemented using the Philips Radiomics Tool (Philips Healthcare, China), and the core feature calculation was based on PyRadiomics (3.0.1) [16]. A total of 704 three-dimensional (3D) radiomics features were extracted, comprising original features (n = 83), wavelet-transformed features (n = 552), and logarithm-transformed features (n = 69). The original features consisted of shape-based features (n = 14), first-order statistics features (n = 18), gray-level run length matrix (GLRLM) features (n = 16), neighboring gray-tone difference matrix (NGTDM) features (n = 5), gray-level dependence matrix (GLDM) features (n = 14), and gray-level size zone matrix (GLSZM) features (n = 16). The wavelet transform (8 decompositions per level) and the logarithm transform were applied to each feature in the first-order statistics, GLRLM, NGTDM, GLDM, and GLSZM categories. The categories and names of all 704 radiomics features are summarized in Supplementary Table 1. Details of the feature definitions are available on the PyRadiomics website (https://pyradiomics.readthedocs.io/en/latest/features.html#). There were no missing data for clinical or radiomics features.
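The feature counts reported above can be checked with a few lines of arithmetic. The sketch uses only the numbers stated in the text; the dictionary keys are labels chosen for this example.

```python
# Original feature counts per category, as reported in the text.
original = {
    "shape": 14,
    "first_order": 18,
    "GLRLM": 16,
    "NGTDM": 5,
    "GLDM": 14,
    "GLSZM": 16,
}

n_original = sum(original.values())
# Shape features are not transformed; all other categories are.
n_transformable = n_original - original["shape"]
n_wavelet = 8 * n_transformable  # 8 wavelet decompositions
n_log = n_transformable          # one logarithm transform
total = n_original + n_wavelet + n_log

print(n_original, n_wavelet, n_log, total)
```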

Feature Reduction

In the training group, univariate logistic regression analysis was applied to all thirteen clinical features (general characteristics and manually measured features), and features with a P value < 0.1 were selected for the subsequent analysis. For the radiomics features, maximum relevance minimum redundancy (mRMR; the top 30 features were selected) and the least absolute shrinkage and selection operator (LASSO) with the optimal λ were applied to choose the features most discriminative for Ki67 status. Mean squared error (MSE) was used in the feature reduction. After that, the Spearman test with a threshold of 0.8 was applied to the clinical features and the radiomics features separately to remove collinear features; among collinear features, the one with the highest correlation was retained. The clinical model was built using the selected clinical features. The radiomics model was built using the selected radiomics features and expressed as the Radscore. Finally, the selected clinical and radiomics features were combined, and LASSO regression was applied again to remove collinear features.
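The Spearman collinearity filter with a threshold of 0.8 might look like the sketch below. The text does not specify what "highest correlation" refers to; the code assumes it means the strongest absolute Spearman correlation with the Ki67 outcome. The feature names, values, and helper names are hypothetical, and the mRMR and LASSO steps are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_collinear(features, outcome, threshold=0.8):
    """Visit features by decreasing |rho| with the outcome; keep a feature
    only if |rho| with every already-kept feature is below the threshold."""
    ranked = sorted(
        features,
        key=lambda name: abs(spearman(features[name], outcome)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values for eight patients and their Ki67 status.
features = {
    "feat_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "feat_b": [2, 4, 6, 8, 10, 12, 14, 17],  # monotone in feat_a
    "feat_c": [5, 1, 8, 2, 7, 3, 6, 4],
}
outcome = [0, 0, 0, 1, 0, 1, 1, 1]
kept = drop_collinear(features, outcome)
print(kept)
```

Here `feat_b` is dropped because it is perfectly rank-correlated with `feat_a`, while `feat_c` survives because its correlation with `feat_a` is weak.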

Statistical Analysis, Model Construction, and Evaluation

Data were analyzed using SPSS® v. 25.0 (IBM Corp., New York, NY, USA) and Python v. 3.9 (https://www.python.org/). Continuous variables with non-normal distributions were expressed as median [25th percentile, 75th percentile] and were compared using the Mann–Whitney U test. Categorical variables were compared using the χ2 test or Fisher's exact test. Statistical significance was set at P < 0.05.

Three models (clinical, radiomics, and combined clinical-radiomics) were constructed using logistic regression. The predicted probabilities of the clinical, radiomics, and combined clinical-radiomics models were defined as the clinical score, Radscore, and combined score, respectively. Each score was computed by multiplying the feature values by their weight coefficients, summing the products, and passing the result through a sigmoid transformation. The ability of the models to discriminate Ki67-positive from Ki67-negative cases was assessed using the receiver operating characteristic (ROC) curve. The calculated indexes included area under the curve (AUC), sensitivity, specificity, and accuracy (ACC). The DeLong test was used to compare the differences between ROC curves.
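The score computation and the ROC-based indexes can be illustrated with a short sketch. The weights, intercept, feature values, and the 0.5 cutoff below are hypothetical and are not the fitted model of this study; an intercept term is assumed because logistic regression normally includes one. The AUC is computed in its rank form, as the probability that a positive case receives a higher score than a negative case.

```python
import math


def logistic_score(features, weights, intercept):
    """Weighted sum of the features passed through a sigmoid."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(scores, labels, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a fixed probability cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Hypothetical coefficients and two-feature cases (label 1 = Ki67-positive).
weights, intercept = [1.2, -0.8], -0.5
cases = [
    ([2.0, 0.5], 1), ([1.5, 1.0], 1), ([0.2, 0.1], 1),
    ([0.1, 1.5], 0), ([0.5, 0.2], 0), ([1.0, 2.0], 0),
]
scores = [logistic_score(x, weights, intercept) for x, _ in cases]
labels = [y for _, y in cases]
print(auc(scores, labels), confusion_metrics(scores, labels))
```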

Calibration curves and the Hosmer-Lemeshow test were used to assess the agreement between the predicted and actual probabilities of the models; a P value > 0.05 was taken to indicate good agreement. A nomogram was constructed to visualize the predictive model for Ki67 expression. Decision curve analysis (DCA) was used to determine the nomogram's clinical usefulness by quantifying the net benefit at different threshold probabilities.
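The net benefit plotted in a decision curve is commonly defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. A minimal sketch of that standard definition follows; the scores and labels are made up, and the function names are chosen for this example.

```python
def net_benefit(scores, labels, threshold):
    """Net benefit of treating every case whose score is >= threshold."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every case as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and true Ki67 status (1 = positive).
dca_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
dca_labels = [1, 1, 0, 1, 0, 1, 0, 0]

for pt in (0.2, 0.5, 0.7):
    print(pt, net_benefit(dca_scores, dca_labels, pt),
          net_benefit_treat_all(dca_labels, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, which is the net benefit of treating no one.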
