Leveraging machine learning with dynamic 18F-FDG PET/CT: integrating metabolic and flow features for lung cancer differential diagnosis

Patient demographics

This prospective study was approved by the ethics committee of the Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences (IRB approval number: KYLH2022-1). The study adhered to the ethical standards of the 1964 Declaration of Helsinki and its subsequent amendments. Written informed consent was obtained from all participants. Dynamic 18F-FDG PET/CT scans of the chest were acquired in patients clinically suspected of having lung cancer from May 2021 to December 2023. Patients were excluded if: (1) they had received treatment before the scan; (2) the PET/CT scan range did not fully cover the pulmonary lesions and the aortic arch; (3) obvious motion artifacts or blurring were present at the lesion; or (4) the final diagnosis was not confirmed by surgical or biopsy findings. Data from the 187 patients who remained after applying these criteria were included in the study. In addition, total-body dynamic 18F-FDG PET/CT scans from Henan Provincial People’s Hospital were included retrospectively for testing purposes. The same ethical considerations and procedural standards applied to these scans, with informed consent and approval from the ethics committee (IRB approval number: IRB2020123). A total of 42 patients, scanned from February 2022 to November 2023, were added to the study according to the same exclusion criteria.

Image acquisition and reconstruction

All patients in the training dataset avoided strenuous exercise for 24 h and fasted for at least 6 h prior to the PET/CT scan. Blood glucose levels were below 8.0 mmol/L in all patients. Scans were acquired on a Discovery MI PET/CT (GE Healthcare, Milwaukee, USA) installed at the Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences. The scan protocol was as follows. First, the patients underwent a breath-hold CT of the chest with a tube voltage of 120 kV, a tube current of 10–220 mA, a pitch of 1.375:1, and a noise index of 20. The chest region, covering an axial field-of-view of 20 cm, was then scanned in dynamic list mode, with the acquisition starting immediately after 18F-FDG injection through an intravenous indwelling needle. The total duration of the scan was 60 min, with the data divided and reconstructed into 27 frames (6 × 10 s, 4 × 30 s, 4 × 60 s, 4 × 120 s, and 9 × 300 s). Each frame was reconstructed to a matrix of 256 × 256 × 71 voxels using the block sequential regularized expectation maximization algorithm (25 iterations and 2 subsets, with post-smoothing). Attenuation and scatter corrections were performed during the reconstruction using CT-based attenuation maps. All the resulting dynamic frames were then converted to standardized uptake value (SUV) by normalizing the measured counts to the individual’s injected dose and body weight.
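The SUV normalization mentioned above can be sketched as follows. This is a minimal illustration, assuming the reconstructed activity concentration is given in Bq/mL, the injected dose in Bq (decay-corrected to the scan start), the body weight in kg, and a tissue density of 1 g/mL. The function name `to_suv` and the numbers are hypothetical and are not taken from the study.

```python
def to_suv(activity_bq_per_ml, injected_dose_bq, body_weight_kg):
    """Body-weight SUV: tissue concentration divided by dose per gram of body mass."""
    body_weight_g = body_weight_kg * 1000.0
    return activity_bq_per_ml / (injected_dose_bq / body_weight_g)


# Illustrative numbers only (not from the study).
suv = to_suv(activity_bq_per_ml=5000.0, injected_dose_bq=3.5e8, body_weight_kg=70.0)
print(suv)
```

With these inputs the dose per gram equals the tissue concentration, so the SUV is 1, which corresponds to a uniform distribution of the tracer in the body.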

For the testing dataset, 42 dynamic scans were acquired on a uEXPLORER PET/CT (United Imaging Healthcare, Shanghai, China) at Henan Provincial People’s Hospital. The scan workflow and data formatting were as follows. A CT scan was performed for attenuation correction, followed by a 60-minute list-mode acquisition initiated by an intravenous bolus injection of 18F-FDG at the ankle. List-mode data were binned into 66 frames (24 × 5 s, 6 × 10 s, 6 × 30 s, 6 × 60 s, and 24 × 120 s) and then reconstructed on the scanner workstation into a matrix of 192 × 192 × 673 voxels using a 3D ordered subset expectation–maximization algorithm (3 iterations, 28 subsets, with TOF, PSF, and post-smoothing). The data corrections and activity normalization were performed in the same way as for the training dataset. More details can be found in [25].
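As a quick consistency check of the two framing schemes, the frame boundaries and mid-times can be derived from the (count, duration) pairs quoted above. The sketch below assumes frames are contiguous and start at injection. The helper `frame_times` is a hypothetical name used only for illustration.

```python
import numpy as np


def frame_times(schedule):
    """Return (start, end, mid) times in seconds for a list of (count, duration_s) pairs."""
    durations = np.concatenate([np.full(n, d, dtype=float) for n, d in schedule])
    ends = np.cumsum(durations)
    starts = ends - durations
    return starts, ends, 0.5 * (starts + ends)


# Framing schemes as stated in the text.
TRAIN_SCHEDULE = [(6, 10), (4, 30), (4, 60), (4, 120), (9, 300)]
TEST_SCHEDULE = [(24, 5), (6, 10), (6, 30), (6, 60), (24, 120)]

for name, schedule in [("training", TRAIN_SCHEDULE), ("testing", TEST_SCHEDULE)]:
    starts, ends, mids = frame_times(schedule)
    print(name, len(mids), "frames,", ends[-1] / 60.0, "min")
```

Both schemes add up to 60 min, with 27 and 66 frames respectively, and the mid-times can serve as the time axis for the time-activity curves.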

PET data analysis

Image processing

The contours of the descending aorta and the lesions were outlined for each scan. A 3D mask of the aorta was created on the CT image using an automatic segmentation tool, TotalSegmentator [26]. The lesions were manually delineated slice by slice on the last frame of the dynamic images by two nuclear medicine physicians. For each delineated region, the regional time-activity curve (TAC) was calculated as,

$$C_{\text{T}}(t)=\frac{1}{N}\sum_{i=1}^{N}C_{i}(t)$$

(1)

where \(C_{\text{T}}(t)\) represents the average activity of the region at time point t, \(C_{i}(t)\) is the activity of the i-th voxel in the region at that time point, and N is the number of voxels in the region. For the descending aorta, the TAC was treated as the image-derived input function (IDIF) and denoted as \(C_{\text{IDIF}}(t)\) [27].
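The regional averaging can be sketched in a few lines. This is a minimal illustration, assuming the dynamic series is stored as a 4D array ordered (frame, z, y, x) and the region is a 3D boolean mask on the same grid. The array layout and the name `regional_tac` are assumptions, and the data are synthetic.

```python
import numpy as np


def regional_tac(dynamic_suv, mask):
    """Mean activity inside a 3D mask for every frame of a 4D (frame, z, y, x) array."""
    mask = np.asarray(mask, dtype=bool)
    # Boolean indexing over the three spatial axes yields shape (n_frames, n_voxels).
    return dynamic_suv[:, mask].mean(axis=1)


# Synthetic example: 4 frames of a 3 x 5 x 5 volume with a 3 x 3 in-plane region.
rng = np.random.default_rng(0)
dynamic = rng.random((4, 3, 5, 5))
mask = np.zeros((3, 5, 5), dtype=bool)
mask[1, 1:4, 1:4] = True

tac = regional_tac(dynamic, mask)  # one value per frame
```

The same function would apply to the aorta mask to obtain the input function and to each lesion mask to obtain the lesion curve.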

We proposed to decompose \(C_{\text{T}}(t)\) for each region into compartments with different characteristics by applying kinetic modeling (Fig. 2). The irreversible two-compartment model was used based on Eqs. 2–4, in which \(K_{1}\), \(k_{2}\), and \(k_{3}\) represent the rate constants for 18F-FDG transport into the tissue, its efflux from the tissue back to the blood, and its phosphorylation, respectively. The radioactivity signal in the tissue is attributed to three components,

$$C_{\text{T}}(t)=\left(1-V_{\text{b}}\right)C_{\text{f}}(t)+\left(1-V_{\text{b}}\right)C_{\text{m}}(t)+V_{\text{b}}C_{\text{b}}(t)$$

(2)

where \(V_{\text{b}}\) is the fractional vascular volume, \(C_{\text{b}}(t)\) represents the concentration of 18F-FDG in blood, \(C_{\text{f}}(t)\) represents the concentration of free 18F-FDG, and \(C_{\text{m}}(t)\) is the concentration of metabolized tracer. The radioactivity signals in the compartments \(C_{\text{f}}(t)\) and \(C_{\text{m}}(t)\) can be calculated from the ordinary differential equations,

$$\frac{dC_{\text{f}}(t)}{dt}=K_{1}C_{\text{b}}(t)-\left(k_{2}+k_{3}\right)C_{\text{f}}(t)$$

(3)

$$\frac{dC_{\text{m}}(t)}{dt}=k_{3}C_{\text{f}}(t)$$

(4)
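The decomposition can be illustrated numerically. The sketch below is not the authors' fitting code. It assumes a uniform time grid in minutes, takes the blood curve (for example the IDIF) as the input to the free compartment, solves the free-tracer equation by discrete convolution, and estimates the three rate constants and the vascular fraction by non-linear least squares. The function names, starting values, bounds, and the synthetic input curve are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares


def simulate_compartments(t, c_b, K1, k2, k3, v_b):
    """Solve Eqs. 3-4 on a uniform time grid and combine the parts as in Eq. 2."""
    dt = t[1] - t[0]
    # Eq. 3 has the solution C_f = K1 * exp(-(k2 + k3) t) convolved with C_b
    # (rectangle-rule approximation of the convolution integral).
    kernel = np.exp(-(k2 + k3) * t)
    c_f = K1 * np.convolve(c_b, kernel)[: len(t)] * dt
    # Eq. 4: C_m is k3 times the running (trapezoidal) integral of C_f.
    running = np.concatenate([[0.0], np.cumsum(0.5 * (c_f[1:] + c_f[:-1]) * dt)])
    c_m = k3 * running
    c_t = (1 - v_b) * c_f + (1 - v_b) * c_m + v_b * c_b
    return c_f, c_m, c_t


def fit_lesion_tac(t, c_b, c_t_measured, x0=(0.1, 0.1, 0.05, 0.05)):
    """Least-squares estimate of (K1, k2, k3, V_b) from a lesion TAC."""
    def residuals(p):
        return simulate_compartments(t, c_b, *p)[2] - c_t_measured
    return least_squares(residuals, x0, bounds=([0, 0, 0, 0], [2, 2, 1, 1]))


# Synthetic input function and lesion curve (illustrative values, time in minutes).
t = np.linspace(0.0, 60.0, 1201)
c_b = 20.0 * t * np.exp(-2.0 * t) + 1.5 * (1.0 - np.exp(-2.0 * t)) * np.exp(-0.02 * t)
true_params = (0.3, 0.5, 0.08, 0.1)
c_f, c_m, c_t = simulate_compartments(t, c_b, *true_params)

fit = fit_lesion_tac(t, c_b, c_t)
K1_hat, k2_hat, k3_hat, vb_hat = fit.x
```

Once the parameters are fitted, the free, metabolized, and blood curves returned by `simulate_compartments` are the three components from which curve features can be computed. With measured data, the frame mid-times are non-uniform, so the curves would first need to be interpolated onto a uniform grid or the equations integrated with a general ODE solver.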

Fig. 2 Details of the “Feature extraction” block in Fig. 1. The TAC signal of the lesion was decomposed into blood, free, and metabolism components. Features were calculated, and key features (in red) were identified with the Least Absolute Shrinkage and Selection Operator.

Feature extraction

As illustrated in Fig. 2, for the free activity signal \(C_{\text{f}}(t)\) and the vascular activity signal \(C_{\text{b}}(t)\), we performed feature extraction based on the following definitions, where \(C(t)\) denotes the curve being analyzed and t is in minutes:

(1) Time to peak (\(t_{\text{peak}}\)): The time taken to reach maximum radioactivity.

(2) Maximum uptake (\(C_{\text{peak}}\)): The highest 18F-FDG uptake over the scan period.

(3) Area under the curve (AUC): Reflects the total FDG uptake of the target during the scan period,

$$\text{AUC}=\int_{0}^{60}C(t)\,dt$$

(5)

(4) Slope from start to peak (\(\text{Slope}_{0\text{-peak}}\)): Reflects the rate at which FDG accumulates in the tissue,

$$\text{Slope}_{0\text{-peak}}=\frac{C\left(t_{\text{peak}}\right)-C\left(0\right)}{t_{\text{peak}}}$$

(6)

(5) Slope from peak to end (\(\text{Slope}_{\text{peak-60}}\)): Reflects the rate at which FDG is cleared or metabolized from the tissue,

$$\text{Slope}_{\text{peak-60}}=\frac{C\left(60\right)-C\left(t_{\text{peak}}\right)}{60-t_{\text{peak}}}$$

(7)
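The five curve features can be computed directly from a sampled curve. The sketch below is illustrative only. It assumes the curve is given at increasing time points, uses the trapezoidal rule for the area, and takes the denominator of the peak-to-end slope as the time elapsed between the peak and the last sample. The guards for a peak at the first or last sample are an added assumption, and the name `curve_features` is hypothetical.

```python
import numpy as np


def curve_features(t, c):
    """Time to peak, peak value, AUC, and the two slopes of a sampled curve."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    i_peak = int(np.argmax(c))
    t_peak, c_peak = t[i_peak], c[i_peak]
    auc = float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(t)))  # trapezoidal rule
    # Slopes are set to 0 when the peak coincides with the first or last sample.
    slope_up = (c_peak - c[0]) / (t_peak - t[0]) if t_peak > t[0] else 0.0
    slope_down = (c[-1] - c_peak) / (t[-1] - t_peak) if t[-1] > t_peak else 0.0
    return {
        "t_peak": float(t_peak),
        "c_peak": float(c_peak),
        "auc": auc,
        "slope_0_peak": float(slope_up),
        "slope_peak_end": float(slope_down),
    }


# Toy curve: rises to 4 at t = 1, then falls to 1 at t = 4.
features = curve_features([0.0, 1.0, 2.0, 4.0], [0.0, 4.0, 2.0, 1.0])
print(features)
```

For the toy curve the area is 2 + 3 + 3 = 8, the rising slope is 4 per unit time, and the falling slope is -1 per unit time.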

For the metabolism signal \(C_{\text{m}}(t)\), which increases steadily in contrast to the other curves (Fig. 2), we extracted two parameters, \(\text{Slope}_{0\text{-peak}}\) and \(C_{\text{peak}}\). These parameters contain information similar to SUVmax. In total, 12 features were computed for a given region: 5 for \(C_{\text{f}}(t)\), 5 for \(C_{\text{b}}(t)\), and 2 for \(C_{\text{m}}(t)\). We used the Least Absolute Shrinkage and Selection Operator (LASSO) to evaluate the relationship between each feature and the classification label. This identified the features that most strongly influenced the target variable and guided the selection of the most relevant features for machine learning.
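A LASSO-based selection of this kind can be sketched with scikit-learn. This is a minimal example on synthetic data, not the study's pipeline. It assumes the 12 features are standardized and the benign/malignant label is coded as 0/1, and the regularization strength `alpha=0.1` is an arbitrary illustrative value rather than one reported by the authors.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 12 features, of which only features 0 and 3 drive the label.
rng = np.random.default_rng(0)
n_patients, n_features = 500, 12
X = rng.normal(size=(n_patients, n_features))
y = (X[:, 0] + X[:, 3] + 0.1 * rng.normal(size=n_patients) > 0).astype(float)

# Standardize so that the L1 penalty treats all features on the same scale.
X_std = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_std, y)

# Features with a non-zero coefficient survive the L1 penalty.
selected = np.flatnonzero(lasso.coef_ != 0)
print("selected feature indices:", selected)
```

The L1 penalty shrinks the coefficients of uninformative features to exactly zero, so the surviving indices form the candidate feature set. In practice the penalty strength would be chosen by cross-validation.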

In addition, for each patient scan, conventional uptake parameters were evaluated for comparison. The SUVmax of a lesion was calculated from the delineation on the last PET frame. After kinetic modeling, the lesion Ki value was determined from the fitted rate constants defined in Eqs. 3 and 4 [6, 28],

$$K_{\text{i}}=\frac{K_{1}k_{3}}{k_{2}+k_{3}}$$

(8)
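As a small worked example of Eq. 8, the net influx rate can be computed from a set of fitted rate constants. The values below are illustrative only and are not results from the study. The function name `net_influx_rate` is hypothetical.

```python
def net_influx_rate(K1, k2, k3):
    """Net influx rate constant Ki = K1 * k3 / (k2 + k3) for the irreversible model."""
    return K1 * k3 / (k2 + k3)


# Illustrative rate constants (1/min), not taken from the study.
ki = net_influx_rate(K1=0.3, k2=0.5, k3=0.08)
print(round(ki, 4))
```

Here Ki = 0.024 / 0.58, which is about 0.041 per minute. Ki grows with the phosphorylation rate and shrinks as more tracer returns to the blood.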

Machine learning model

Using the identified key dynamic features, we trained and validated the model with scans from the Cancer Hospital & Shenzhen Hospital, which included 187 patients. The dataset was imbalanced, with a disproportionate number of benign and malignant cases, leading to potential bias in the machine learning models where the minority class was under-represented and prone to misclassification. To address this issue, smooth bootstrap oversampling was applied to augment the data [29]. This method determined the correlation between the extracted features and generated new data points by adding noise to expand the dataset beyond the original points,

$$x_{\text{new}}=x+\varepsilon\left(x\right)$$

(9)

where \(\varepsilon\left(x\right)\) represents a noise matrix, which, when added to the original data point \(x\), yields a synthetic data point \(x_{\text{new}}\). The noise was Gaussian with zero mean and a standard deviation proportional to the intra-class variance to maintain realistic variability. We ensured that the synthetic samples remained within the physiological range of the observed data by constraining the augmentation to predefined percentile bounds. The potential effect of this noise on the synthetic data distribution was analyzed, and additional details are provided in Sect. 1 of the supplementary file.
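The oversampling step can be sketched as follows. This is a simplified illustration rather than the implementation cited in [29]. It resamples minority-class rows with replacement, adds zero-mean Gaussian noise whose per-feature standard deviation is a fraction of the within-class standard deviation, and clips the result to percentile bounds. The noise fraction `scale=0.1`, the 1st to 99th percentile bounds, and the function name `smooth_bootstrap` are assumptions.

```python
import numpy as np


def smooth_bootstrap(X_minority, n_new, scale=0.1, pct=(1, 99), seed=0):
    """Generate n_new synthetic rows by resampling with Gaussian jitter and clipping."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X_minority), size=n_new)  # resample with replacement
    sigma = scale * X_minority.std(axis=0, ddof=1)       # per-feature noise level
    noise = rng.normal(0.0, 1.0, size=(n_new, X_minority.shape[1])) * sigma
    lo, hi = np.percentile(X_minority, pct, axis=0)      # per-feature bounds
    return np.clip(X_minority[idx] + noise, lo, hi)


# Synthetic minority class: 30 cases with 4 features.
X_minority = np.random.default_rng(1).normal(size=(30, 4))
X_synth = smooth_bootstrap(X_minority, n_new=50)
```

To avoid leakage, oversampling of this kind is normally applied to the training portion only, never to held-out test cases.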

After oversampling, we prescreened candidate models by comparing a range of machine learning methods, including linear models, support vector machines, neural networks, and ensemble methods (not shown). Based on the current dataset, we selected bagging because of its superior performance. Multiple independent decision tree models were trained on different subsamples drawn from the original dataset, and their outputs were combined into a predictive model through a voting mechanism [30]. The identified dynamic features were combined to differentiate patients with benign lesions from those with malignant lesions. By weighting the TAC features according to the bagging classifier, a prediction score \(\text{PS}\) was calculated for each subject to quantitatively evaluate their characteristic profile, as shown in Eq. 10,

$$\text{PS}=\frac{1}{N}\sum_{i=1}^{N}w_{i}\cdot \text{PS}_{i}$$

(10)

where i is the index of features, N is the total number of features, \(w_{i}\) is the weight of the i-th feature, and \(\text{PS}_{i}\) is the prediction score for the i-th feature.

For the dataset from the Cancer Hospital & Shenzhen Hospital, we performed a five-fold cross-validation for the trained classifier. The data were divided equally into five parts, one for each fold. Four folds were used for training and one for testing. This process was iterated through all the folds, with each fold having the opportunity to be tested once. The final performance metric was the average of all the folds tested. In addition, the trained model was applied directly to an external dataset from Henan Provincial People’s Hospital with the objective of validating the robustness of the model.
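The bagging classifier and the five-fold procedure can be sketched with scikit-learn. The example uses synthetic data and default settings, so it illustrates the workflow rather than reproducing the trained model. The number of trees (50), the shuffling of folds, and the use of ROC AUC as the per-fold metric are assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the selected dynamic features and benign/malignant labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)

# BaggingClassifier uses decision trees as its default base learner.
model = BaggingClassifier(n_estimators=50, random_state=0)

# Five folds: each fold is held out once while the other four are used for training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
auc_per_fold = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_auc = auc_per_fold.mean()
print(auc_per_fold, mean_auc)

# A model refitted on all training data could then be applied to an external set.
model.fit(X, y)
```

Averaging the per-fold scores gives the internal performance estimate, while the refitted model corresponds to the one that would be applied unchanged to external data.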

Statistical analysis

All the statistical analyses were conducted using Python (version 3.8.0). Continuous variables with a normal distribution were expressed as mean ± standard deviation, with the ranges reported. The model performance for binary classification between benign and malignant cases was evaluated using the ROC curve. The DeLong test was used to compare the ROC curves to determine whether there were significant differences. Kruskal-Wallis tests were conducted to compare the two groups, while Cohen’s effect size was used to measure the separation between the groups. A significance threshold of p < 0.05 was selected. A beeswarm plot and a bar plot based on SHapley Additive exPlanations (SHAP) [31] were used to rank the features by their mean absolute SHAP values.
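Part of this analysis can be sketched with SciPy and scikit-learn. The example below uses synthetic scores and covers the Kruskal-Wallis test, an effect size, and the ROC AUC. It assumes that the effect size is Cohen's d with a pooled standard deviation, which the text does not specify. The DeLong comparison is not included because neither library provides it.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.metrics import roc_auc_score


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


# Synthetic prediction scores for two groups (illustrative only).
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=60)
malignant = rng.normal(1.0, 1.0, size=120)

stat, p_value = kruskal(benign, malignant)   # rank-based group comparison
d = cohens_d(malignant, benign)              # separation between the groups
labels = np.r_[np.zeros(len(benign)), np.ones(len(malignant))]
auc = roc_auc_score(labels, np.r_[benign, malignant])
print(p_value < 0.05, d, auc)
```

With two groups, the Kruskal-Wallis test is equivalent to a two-sided Mann-Whitney U test without continuity correction.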

Table 1 Detailed information on patients - training (testing)
