Radiomics-based automated machine learning for differentiating focal liver lesions on unenhanced computed tomography

This retrospective study was approved by the Institutional Review Board (no. 20240045) of our hospital and was conducted in accordance with the Declaration of Helsinki. Patient consent was waived owing to the retrospective nature of this study.

Patients

We included 423 patients with hepatic malignancy (HM), hepatic hemangioma (HH), hepatic cyst (HC), or hepatic abscess (HA) who underwent CT at our hospital (Medical Center A) between January 2017 and March 2023. Additionally, we included 40 patients with similar conditions who underwent CT at another hospital (Medical Center B) between January 2022 and March 2023.

The inclusion criteria were as follows: (1) upper abdominal unenhanced and enhanced CT performed simultaneously or no more than 30 days apart, (2) no more than 10 space-occupying liver lesions, and (3) image quality sufficient for lesion labeling by radiologists.

The exclusion criteria were as follows: (1) lesions < 1.0 cm, (2) patients with iodide deposits from interventional therapy, (3) absence of standard abdominal unenhanced CT, and (4) hepatic malignancy without pathological confirmation. The study flowchart, from data collection to evaluation, is presented in Fig. 1.

Fig. 1

Flowchart of the study from data collection to evaluation. Pn, Number of patients; Ln, Number of lesions; HC, Hepatic cyst; HH, Hepatic hemangioma; HA, Hepatic abscess; HM, Hepatic malignancy

Hepatic lesion confirmation

Hepatic lesions were confirmed histopathologically through surgery or percutaneous needle biopsy [22]. If pathological results were unavailable, typical MRI or CT findings were used to characterize the lesions [23]. HH, HC, and HA were confirmed by referring to reports from experienced radiologists and applying the following criteria [7, 24]. HH: CT shows a hypodense, well-defined lesion with internal density similar to that of vessels, peripheral nodular enhancement in the arterial phase, and centripetal filling enhancement in the venous phase [15]. HC: CT shows water density (attenuation < 20 HU) with clear edges and no enhancement after contrast administration [20]. HA: imaging findings are closely related to the pathological stage. In the early pyogenic stage, unenhanced CT shows heterogeneous, low-density lesions with unclear boundaries; in the suppuration stage, the density is lower than that of the surrounding normal liver parenchyma, with a thin wall and clear boundary [25, 26].

The focal hepatic lesions were classified into benign and malignant groups: the benign group comprised HH, HC, and HA, whereas HM formed the malignant group [27].

CT image acquisition

CT scans were performed using scanners with 64 or more detector rows from Siemens (Somatom Definition Flash, Somatom Force, or Somatom Drive, Forchheim, Germany) and GE (Revolution CT, Discovery CT750 HD, or 64-slice LightSpeed VCT, GE Medical Systems, Milwaukee, WI). Imaging data were reconstructed at 1 mm using a medium-sharp algorithm. The other scanning parameters were as follows: rotation time, 0.5 s; pitch, 1.2–1.375; matrix, 512 × 512; standard resolution algorithms; and tube voltage, 80–100 kVp (Siemens scanners) or 120 kVp (GE scanners). The tube current was adjusted automatically in noise index mode. Enhanced scanning was performed by injecting a nonionic contrast agent (iodine content, 320 g/L) into the cubital vein at 2.5–4.0 mL/s; the total dose was calculated as 1.5 mL/kg. After contrast injection, the arterial, venous, and delayed phases were acquired at 25–30 s, 60 s, and 180 s, respectively.

Clinical information acquisition

Patient demographic and clinical data, including age, sex, and pathological results, were recorded from the picture archiving and communication system; tumor location, mean CT value, size, and morphology were assessed on unenhanced CT images. Tumor location was evaluated on the basis of liver capsule protrusion, and tumor size was measured as the largest dimension of the lesion's region of interest (ROI), as in clinical practice. Tumor morphology was evaluated for regularity (round shape) and boundary clarity. All measurements were made by the labeling radiologists at the time of labeling.

Radiologists' image evaluation

One board-certified abdominal radiologist and one second-year radiology resident (S.Y.L., seven years of experience in digestive system radiology; L.D.C., two years of experience in digestive system radiology), both blinded to hepatic lesion outcomes, independently reviewed the axial unenhanced CT images of each lesion in the external testing cohort. The radiologists scored the probability of each lesion being a cyst, hemangioma, malignancy, or abscess.

Tumor segmentation

Two radiologists (Y.N., two years of experience in radiology; M.Z.X., six years of experience in radiology) used open-source software (3D Slicer, version 4.13.0; National Institutes of Health; https://www.slicer.org; accessed on August 7, 2021) to manually delineate the volume of interest (VOI) for focal hepatic lesions. At least one type of focal liver lesion was selected for each patient, and the largest lesion of each type was chosen for segmentation. The radiologists delineated the ROI along the lesion edge slice by slice on unenhanced CT images, and the software generated the VOIs automatically. During segmentation, the corresponding enhanced CT images were used to determine tumor boundaries. The results were reviewed by a senior radiologist (L.M., with > 20 years of experience in digestive system radiology).

Radiomic feature extraction

Radiomic features were extracted from the VOIs using Pyradiomics (version 3.0.1), in compliance with the standards of the Image Biomarker Standardisation Initiative. Extraction parameters were set as follows: spatial resampling, 1 mm × 1 mm × 1 mm; intensity rescaling, 500; and intensity discretization, bin width of 25. Feature selection is a key step in AutoML: it identifies the features in the raw dataset that have a significant impact on model performance. Removing redundant features reduces model complexity and improves training speed and generalization ability. MLJAR AutoML automates feature selection to simplify the machine learning workflow. The radiomics features selected by AutoML included shape, first-order, second-order, and higher-order features.
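The extraction parameters reported here could be expressed as PyRadiomics settings roughly as follows. This is a minimal sketch, not the authors' script: mapping "intensity rescaling, 500" to `normalize`/`normalizeScale` and treating wavelet-filtered images as the source of higher-order features are both assumptions, and `extract_features` is a hypothetical helper name.

```python
# Sketch only: the reported parameters mapped onto PyRadiomics settings.
EXTRACTION_SETTINGS = {
    "resampledPixelSpacing": [1, 1, 1],  # spatial resampling to 1 x 1 x 1 mm
    "normalize": True,                   # ASSUMPTION: "rescaling" means normalization
    "normalizeScale": 500,               # ASSUMPTION: scale factor of 500
    "binWidth": 25,                      # intensity discretization bin width
}


def extract_features(image_path, mask_path, settings=EXTRACTION_SETTINGS):
    """Return a dict of radiomic features for one lesion (needs pyradiomics)."""
    # Imported lazily so the settings can be inspected without pyradiomics.
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
    extractor.enableAllFeatures()               # shape, first-order, texture classes
    extractor.enableImageTypeByName("Wavelet")  # ASSUMPTION: higher-order = wavelet
    result = extractor.execute(image_path, mask_path)
    # Drop the "diagnostics_*" bookkeeping entries and keep the feature values.
    return {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
```

Calling `extract_features("lesion.nii.gz", "lesion_mask.nii.gz")` (hypothetical file names) would yield one feature vector per segmented lesion.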

Consistency of segmentation and radiomics features

The intraclass correlation coefficient (ICC) was used to evaluate the reliability of radiomics feature values between the two radiologists. The ICC quantifies interobserver and test–retest reliability. Here, it was calculated using a two-way random-effects model with absolute agreement and a single measurement. Initially, the VOIs of 252 lesions in 227 patients were segmented by the two radiologists. For reliability evaluation, 30 randomly selected CT images were analyzed by another radiologist (J.L., with > 10 years of experience in digestive system radiology).
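For readers unfamiliar with this ICC form, the two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1), can be computed from the ANOVA mean squares. The sketch below is a plain-Python illustration; the function name and the small rating table are illustrative and are not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per subject, with one value per rater.
    """
    n = len(ratings)      # subjects (e.g., lesions)
    k = len(ratings[0])   # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative table: six subjects (rows) rated by four raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example_ratings)  # about 0.29: poor absolute agreement
```

In a radiomics setting, each feature would get its own ICC, with rows being lesions and columns being the readers' feature values.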

Automated machine learning (AutoML) model design

The mljar-supervised (MLJAR) platform is an AutoML Python package for tabular data, designed to save time for data scientists. It provides a common abstraction for preprocessing the data, constructing machine learning models, and tuning hyperparameters to find the best model [28, 29].
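A run of this package might be configured along the following lines. This is a hedged sketch: the source does not report the exact arguments, so the mode, `explain_level`, `random_state`, and results path are assumptions, while the algorithm list and the 10-fold validation follow what is described elsewhere in this section. `train_automl` is a hypothetical helper.

```python
# ASSUMED configuration; the exact arguments used in the study are not reported.
AUTOML_CONFIG = {
    "mode": "Compete",                        # assumption: mode is not stated
    "ml_task": "multiclass_classification",   # four lesion classes
    "algorithms": [
        "Nearest Neighbors", "Linear", "Random Forest", "Extra Trees",
        "LightGBM", "Xgboost", "CatBoost",
    ],
    "golden_features": True,                  # pairwise engineered features
    "features_selection": True,
    "explain_level": 2,                       # assumption: full explanations
    "validation_strategy": {
        "validation_type": "kfold",
        "k_folds": 10,
        "shuffle": True,
        "stratify": True,
    },
    "random_state": 42,                       # assumption: any fixed seed
}


def train_automl(X_train, y_train, results_path="automl_results"):
    """Fit mljar-supervised on a feature table (needs mljar-supervised)."""
    # Imported lazily so the configuration can be read without the package.
    from supervised.automl import AutoML

    automl = AutoML(results_path=results_path, **AUTOML_CONFIG)
    automl.fit(X_train, y_train)
    return automl
```

The same configuration could be reused for the radiomics, clinical, and fusion models by changing only the columns passed as `X_train`.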

The entire dataset from Medical Center A was randomly split into a training set (n = 176; HC = 54, HM = 62, HA = 28, HH = 32) and an independent validation set (n = 76; HC = 23, HM = 29, HA = 12, HH = 12). Additionally, 33 patients from Medical Center B formed the external testing set (n = 33; HC = 23, HM = 29, HA = 12, HH = 12). The radiomics workflow based on the automated learning algorithm is shown in Fig. 2. Three predictive models were developed: a radiomics model trained on radiomics features, a clinical model using only clinical features, and a fusion model incorporating both.
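A seeded random split that reproduces the reported set sizes could look like the sketch below. The 0.7 training fraction is inferred from 176/252 and the seed is arbitrary; neither is stated in the source, and whether the split was stratified by lesion type is not reported.

```python
import random


def random_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and cut it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical lesion identifiers for the 252 lesions from Medical Center A.
lesion_ids = list(range(252))
train_ids, val_ids = random_split(lesion_ids)
```

With 252 lesions this yields 176 training and 76 validation cases, matching the sizes given above.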

Fig. 2

Flowchart of the study. (a) Radiologists performed tumor segmentation on unenhanced CT. (b) Clinical information acquisition and radiomics feature extraction. (c) Automatic machine learning algorithms used to establish clinical, radiomics, and fusion models and complete predictive evaluation

The AutoML algorithm was designed to build the prediction model without human intervention. It automatically screens the features used for training, selects the model, and adjusts model parameters dynamically, thus substantially reducing the time and technical cost of application. During data preprocessing, all features were normalized to zero mean and unit variance, and missing-value imputation and categorical conversion were handled automatically.

The golden feature algorithm was used in the feature selection process. The MLJAR AutoML framework uses numeric features in the golden feature search: from each pair of original features, new features are created using the mathematical operators +, -, and /. A decision-tree algorithm assesses the predictive power of the newly created features, and only the top-ranked new features are included in the training dataset. The golden feature method thereby maximizes the use of lesion information.

For automatic parameter tuning, MLJAR AutoML adopted Bayesian optimization, a probabilistic model-based method that uses past evaluation results to guide subsequent parameter selection: it approximates the objective function with a surrogate function (a probabilistic model) and refines the parameters through repeated iteration. The tuning process in this study comprised the following steps: defining the parameter space, selecting the optimization algorithm, evaluating parameter combinations, iterating the optimization, and training and validating the model.
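The golden feature idea described above can be illustrated with a small sketch. It is not MLJAR's implementation: a one-split decision stump on a binary label stands in for the decision-tree scoring, division by zero is replaced by 0.0 for simplicity, and all function names are hypothetical.

```python
from itertools import combinations


def candidate_features(rows, names):
    """Build new columns from every feature pair using +, - and /."""
    out = {}
    for a, b in combinations(range(len(names)), 2):
        out[f"{names[a]}_sum_{names[b]}"] = [r[a] + r[b] for r in rows]
        out[f"{names[a]}_diff_{names[b]}"] = [r[a] - r[b] for r in rows]
        # Simplification: rows with a zero denominator get 0.0.
        out[f"{names[a]}_ratio_{names[b]}"] = [
            r[a] / r[b] if r[b] != 0 else 0.0 for r in rows
        ]
    return out


def stump_accuracy(values, labels):
    """Best accuracy of a single-threshold split on a binary (0/1) label."""
    best = 0.0
    for threshold in sorted(set(values)):
        pred = [1 if v > threshold else 0 for v in values]
        acc = sum(p == y for p, y in zip(pred, labels)) / len(labels)
        best = max(best, acc, 1 - acc)  # 1 - acc covers the flipped split
    return best


def top_golden_features(rows, names, labels, n_top=2):
    """Rank candidate features by stump accuracy and keep the best n_top."""
    cands = candidate_features(rows, names)
    ranked = sorted(
        cands, key=lambda name: stump_accuracy(cands[name], labels), reverse=True
    )
    return ranked[:n_top]
```

For example, if a lesion label depends on the ratio of two measurements rather than on either one alone, the ratio column scores highest and is kept, which is the situation the pairwise search is meant to catch.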

Feature importance was computed using permutation, and dependence and decision plots were available for every algorithm. Models were trained using various algorithms, including Nearest Neighbors, Linear, Random Forest, Extra Trees, LightGBM, Xgboost, and CatBoost. Hyperparameter optimization combined a random search over defined values, the Optuna framework, and hill-climbing to fine-tune the final models.
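Permutation importance measures how much a model's score drops when one feature column is shuffled. The sketch below shows the general technique in plain Python with accuracy as the score; it is an illustration rather than the AutoML package's internal code, and the names and defaults are assumptions.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    `predict` maps a list of feature rows (lists) to a list of labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores has an importance of zero, because shuffling it leaves every prediction unchanged.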

Statistical analysis

Statistical analyses were performed using R (version 4.2.2) (https://www.r-project.org), Python (version 3.9.7), and SPSS (version 26) with significance set at P < 0.05. The performance of the prediction models was assessed using several indices with 10-fold cross-validation of the training and validation sets. Receiver operating characteristic curves were also used to assess the overall performance of prediction models, and the area under the curve (AUC) was calculated.
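The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way; the one-vs-rest wrapper for the four lesion classes is an assumption about how a multiclass problem might be scored, not a description of the study's exact procedure.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, class_scores, positive_class):
    """AUC for one class against all others (ASSUMED multiclass handling)."""
    binary = [1 if y == positive_class else 0 for y in labels]
    return roc_auc(binary, class_scores)
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four positive-negative pairs are ranked correctly.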
