Assessment of axial spondyloarthritis activity using a magnetic resonance imaging-based multi-region-of-interest fusion model

This study was reviewed and approved by the Institutional Research Ethics Board, which adheres to the tenets prescribed by the Declaration of Helsinki (institutional review board M2022399, National Clinical Trial number MR-11–22-009236). Being retrospective research, the need for signed informed consent was waived. The workflow of the radiomics analysis conducted in this study is illustrated in Fig. 1.

Fig. 1figure 1

Workflow of the study. Manual segmentation and automatic expansions were performed on SIJ-MRIs to generate two ROIs, Circle and Facet. Different features were extracted, then selected by various methods, including cross-validation, LASSO, and MSE. Different predictive models were constructed using these features. Their performance was assessed using the ROC, decision, and calibration curves

Dataset

Approval was obtained for the use of patient data. We identified 1096 consecutive patients who had undergone MRI for SIJs in our institution from April 2019 to September 2021. Inclusion criteria were as follows: (a) patients over 18 years old and (b) seeking medical attention for suspected axSpA, as diagnosed using the 2019 recommendations from the Assessment of SpondyloArthritis International Society (ASAS) working group.

Of the 1096 patients, 987 were excluded from this study for the following reasons: (a) absence of axSpA or lack of a definite clinical axSpA diagnosis, (b) sacroiliac MR scans having incomplete coverage or sequences, and (c) incomplete medical records, including a lack of CRP, ESR, BASDAI score, and ASDAS score.

Disease activity was assessed by two rheumatologists using the ASDAS calculated with CRP (ASDAS-CRP). An ASDAS-CRP score of greater or equal to 2.1 was considered high activity.

The final cohort constituted 109 patients, of which 68 were non-active axSpA patients and 41 were active axSpA patients. The flowchart for the inclusion and exclusion of patients is shown in Fig. 2.

Fig. 2figure 2

Flowchart for the inclusion of patients

We randomly selected 80% of the samples as the training set, and the remaining 20% as the test set. Baseline characteristics of the final cohort can be found in Table 1.

Table 1 Baseline characteristics of patients with active and non-active axSpA in training and testing setsImage acquisition

All MRI examinations were acquired using three 3.0-T scanners, two identical Discovery 750w (GE Healthcare, Milwaukee, WI) and one Discovery 750 (GE Healthcare, Milwaukee, WI). For both models of MR scanners, the same institutional protocol was used for SIJ examinations, including axial and oblique coronal (parallel with the long axis of the sacral bone), fat-suppressed (FS) T2-weighted (T2w) fast spin-echo (FSE) sequences as well as oblique coronal T1-weighted FSE sequences, and oblique coronal proton density-weighted FSE sequences.

We chose to acquire oblique coronal FS T2-weighted FSE images (scanning parameters: repetition time/time to echo = 3200/85 ms, slice thickness/gap = 4/0.5 mm, field of view = 30 × 30 cm, matrix size = 320 × 256, number of excitations = 4). This is because among the sequences included in the institutional protocol, BME is the most prominent in T2w images. Besides, in the most recent suggestion from the ASAS working group [11, 20], only the abovementioned T2w FS sequence is recommended. Images were retrieved from the Picture Archiving and Communication System in the Digital Imaging and Communications in Medicine format.

Segmentation and labeling of ROIs

Two different heterogeneous regions of interest (CircleOriginal and FacetOriginal) were manually segmented and labeled using Research Portal V1.1 (United Imaging Intelligence, Co., Ltd., Shanghai, China). CircleOriginal and FacetOriginal were segmented and labeled back-to-back on oblique coronal MR images by two radiologists with 2 and 11 years of experience, respectively (Reader A and B). Prior to segmentation, both radiologists underwent training to ensure that the ROIs were drawn to the required standard. Boundaries of the ROIs were validated and redrawn by Reader B when necessary to reduce intra-reader bias and to better adhere to the segmentation criteria.

Next, CircleOriginal was automatically expanded by 15 mm in all directions to generate Circle, which represents a pair of circular ROIs of a defined size located within the SIJ and encompasses as much of abnormal findings as possible. Circle was segmented and expanded using a method inspired by a previous study [21].

FacetOriginal was further automatically expanded by 10 mm to generate Facet, a ROI that spans six consecutive slices in the oblique coronal plane, including articular space and subarticular region to 1 cm. Criteria for segmentation and expansion of Facet were performed as done in a previous study [19].

All the expanded ROI boundaries were further validated and corrected by Reader B, to reduce possible bias and to better adhere to the segmentation criteria. An example of SIJ-MRI segmentation and expansion is shown in Fig. 3.

Fig. 3figure 3

SIJ-MRI segmentation and expansion for four ROIs: CircleOriginal (A), Circle (B), Facet (C), and FacetOriginal (D). MR, magnetic resonance; SIJ, sacroiliac joint

Data preprocessing

The voxel value range of MRI images varies significantly across different machines and imaging modalities. To reduce the impact of voxel value outliers, we sorted all voxels’ values in each image and truncated them to the range of 0.5 to 99.5 percentile.

Radiomics featuresFeature extraction

We extracted corresponding radiomics features for each ROI in this study. Prior to extraction, various filters were applied to ROIs, including additive Gaussian white noise, Laplacian of Gaussian, shot noise, wavelet, binomial blur, recursive Gaussian, discrete Gaussian, curvature flow filters, and several simple filters, such as BoxMean and Normalize. All features were extracted using the PyRadiomics package (version 3.0.1) [25], and most of them adhered to the feature definitions provided by the Imaging Biomarker Standardization Initiative [26].

The handcrafted features were classified as (I) geometry, (II) intensity, and (III) texture. Geometry features capture the three-dimensional shape of the tumor. Intensity features represent the first-order statistical distribution of voxel intensities within the tumor. Texture features describe patterns or the second- and higher-order spatial distributions of intensities. Various methods were used to extract texture features, including the gray-level co-occurrence matrix, the gray-level run length matrix, the gray-level size zone matrix, and the neighborhood gray-tone difference matrix.

To evaluate the performance of the multi-ROI model, we fused the features extracted from each ROI to obtain fused features. We compared the performance of modeling using single-ROI features and fused features in our experiments.

Feature selection Intraclass correlation coefficient (ICC)

To assess the robustness of image features against segmentation uncertainties, test–retest and inter-rater analyses were conducted. Test–retest analysis involving two segmentations was performed by one rater for each of the randomly selected 32 patients, while inter-rater analysis involving independent segmentations of ROIs was performed by two raters for another set of 32 randomly chosen patients. In the test–retest analysis, overall ICC was high for both Reader A (an average of 0.965 and 0.942 for Circle- and Facet-based features) and Reader B (an average of 0.963 and 0.956 for Circle- and Facet-based features). For inter-rater analysis, ICC was also high, averaging 0.965 and 0.944 for Circle- and Facet-based features respectively. The ICC was used to evaluate the features extracted from the multiple-segmented subregions. Features with an ICC ≥ 0.85 were considered robust and unaffected by segmentation uncertainties.

Feature pre-fusion

In this study, the performance of features derived from two different MRI ROIs, Circle and Facet, were compared. To assess whether features chosen from multiple ROIs outperformed those from a single ROI, prior further feature screening features extracted from the two ROIs were combined to obtain a fusion feature set. The remaining processes were the same as those for features extracted from one ROI and followed the same parameter configuration.

Standardization screening

Following the initial screening using ICC, all features were standardized using the Z-score method to ensure a normal distribution. Subsequently, p-values were calculated for all imaging features using the t-test. Only radiomic features with a p-value < 0.05 were retained for further analysis.

Correlation screening

Highly repeatable features were further analyzed using Pearson’s correlation coefficient to identify highly correlated features. To avoid redundancy, only one feature was retained when the correlation coefficient between any two features exceeded 0.9. To maximize feature representation while minimizing redundancy, a greedy recursive deletion strategy was used to remove features with the highest redundancy in the current set at each iteration. Additionally, the minimum Redundancy Maximum Relevance algorithm was used to further reduce feature redundancy. For each ROI, only the 64 most important features were retained.

Least absolute shrinkage and selection operator (LASSO) screening

The final features used to construct the radiomics signature (Rad_Sig) were selected using the LASSO regression model. LASSO shrinks regression coefficients, setting many irrelevant features’ coefficients to zero based on the regularization weight lambda (λ). The optimal λ value was determined using tenfold cross-validation with the minimum criterion, and λ with the lowest mean standard error was selected.

Construction of the assessment model

After feature screening using LASSO, various machine learning models, including Logistic Regression, Support Vector Machine, Random Forest, ExtraTrees, and eXtreme Gradient Booting (XGBoost), were used to construct the assessment model.

In the training set, we performed a fivefold cross-validation and utilized the Grid Search algorithm for hyperparameter optimization. The best model parameters were selected based on their performance in the test set. Finally, with fixed hyperparameters, we trained the model using the entire training set (80%) and evaluated its performance in the remaining data set.

Statistical analysis

We used Statistical Package for the Social Sciences 24.0 (IBM SPSS Inc., USA) for statistical analysis. Kolmogorov–Smirnov method was used to test the normality of the measurement data. Quantitative data were expressed as mean ± standard deviation (x̄ ± s). Chi-square test was used for comparison between the two groups. Student t-test was used for quantitative data. p < 0.05 indicated a statistically significant difference. The receiver operating characteristic (ROC) curve was used to analyze the performance of each model in the training set and test set. The efficiency of each model is evaluated by accuracy, the area under the curve (AUC), and its 95% confidence interval (CI), sensitivity, specificity, F1 score, true positive rate (TPR), false positive rate (FPR), and Youden index. Calibration curves were generated to evaluate the calibration performance of the model, and the clinical utility of the model was determined by decision curve analysis (DCA).

留言 (0)

沒有登入
gif