Artificial intelligence-based analyses of varus leg alignment and after high tibial osteotomy show high accuracy and reproducibility

The most important finding of the present study was that the AI-aided software for automated long-leg alignment measurements produced reliable results for varus-malaligned knees, both preoperatively and after high tibial osteotomy (HTO). Although manual readers ranked higher than the AI-driven software, the discrepancies were below 0.5°, which is minor and would not alter clinical decision-making.

A detailed deformity analysis to identify varus malalignment is obligatory in patients with medial knee osteoarthritis or patients who require cartilage or medial meniscal repair. When malalignment is addressed concomitantly, clinical outcomes after cartilage and meniscal repair improve significantly [1]. Analysis of the bony geometry in the coronal plane is usually performed on weight-bearing anteroposterior long-leg radiographs. Manual measurements are time-consuming and can show high inter- and intrareader variability, depending on the experience and fatigue of the observer [3, 25]. Agreement is excellent for HKA and MAD, as indicated by an intraclass correlation coefficient above 0.9 [4]. However, agreement for mLDFA, MPTA and JLCA is reported to be worse [3].

AI can support radiologists and orthopedic surgeons by analyzing radiographs automatically. Various AI models have been introduced and tested in the literature. Because of the heterogeneity of the algorithms applied, the varying experience and background of clinical readers, and the different data sets used for performance testing, comparability between studies is limited. Nevertheless, AI-based analyses offer several advantages, including improved accuracy, time savings, reproducibility and objectivity [22]. At the same time, concerns regarding the application of AI to the analysis of radiographs need to be recognized. First, AI models offer limited transparency and interpretability, and their internal decision-making process might not be easily understandable. Second, AI algorithms require large amounts of high-quality training data [5]. If these data are biased or unrepresentative of a certain group of patients, the models can perform poorly. Hence, it is important to address data bias to ensure equity and avoid under-representation of certain demographics [6]. Furthermore, effective integration of AI software into the clinical workflow is crucial: for practical use, seamless integration into existing systems (e.g., picture archiving and communication systems) is of utmost importance.

The evidence for AI-aided analyses of long-leg radiographs (LLRs) is growing rapidly. Several study groups have developed and validated AI algorithms for automated measurements on LLRs [13, 19, 23].

The software used in this study (LAMA™) is a commercially available FDA- and CE-marked software that has been studied for native LLRs and for implant alignment measurement after TKA [21, 23]. Simon et al. performed a retrospective single-center analysis of 295 native LLRs in which they compared AI measurements with those of manual readers, which constituted the “ground truth” [23]. After exclusion of radiographs with metalwork and postoperative images, the AI achieved an overall accuracy of 89.2% compared with the manual measurements. The mean absolute deviation was 0.39°–2.19° for angles and 1.45–5.00 mm for length measurements. The intraclass correlation coefficient (ICC) showed good reliability for all lengths and angles according to the classification of Koo et al. (ICC ≥ 0.87). The equivalence index (γ) was between 0.54° and 3.03° for angles and between −0.70 and 1.95 mm for lengths.
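The reliability figures quoted above rest on the ICC and on the interpretation bands of Koo et al. The following Python sketch shows one common way to compute such a value. It assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the ICC model actually used in the cited studies is not stated here, and the function names and example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per radiograph, one column per reader.

    Assumption: two-way random effects, absolute agreement, single measurement.
    """
    n = len(ratings)        # number of radiographs (subjects)
    k = len(ratings[0])     # number of readers or repeated reads
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def koo_li_band(icc):
    """Interpretation bands following Koo and Li: <0.5, 0.5-0.75, 0.75-0.9, >0.9."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Invented example: two identical reads of the same angles (degrees),
# as a deterministic algorithm would produce on repeated runs.
repeat_reads = [[4.2, 4.2], [6.8, 6.8], [2.1, 2.1], [9.5, 9.5]]
value = icc_2_1(repeat_reads)
print(round(value, 3), koo_li_band(value))
```

Identical repeated reads leave no residual variance, so the coefficient approaches 1, whereas any disagreement between columns lowers it.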

The evaluation of LLRs by LAMA™ after TKA showed high reproducibility and reliability [21]. Correct detection of femoral and tibial components was achieved in 92.1%. Nevertheless, performance was impaired in cases of constrained implants, where landmark setting failed in 12.5%. Furthermore, Huber et al. used LAMA™ analyses of preoperative LLRs in osteoarthritic knees prior to TKA to classify knees by functional phenotype and by coronal plane alignment of the knee [11]. The authors found significant gender-specific differences for all radiographic parameters.

However, to the best of our knowledge, this is the first study investigating the performance of an AI application for varus malalignment and after HTO. Especially in osteotomies and joint-preserving surgeries, a thorough analysis of leg alignment and detection of bony deformities is indispensable. The measurements are usually performed manually on long-leg radiographs using medical imaging software. However, manual landmark setting has shown high intra- and inter-reader variability and poor reproducibility [3, 10, 20]. Furthermore, studies have demonstrated that reliability is affected by the experience of the observers [25].

Our results show mean absolute differences between LAMA™ and mean manual observer measurements of 0.5° or lower for all measurements. Although SUCRA plots show low probabilities that the AI software ranks better than manual readers, except for the AMA, there were no statistically significant differences between manual measurements and the AI-based analyses, either in native radiographs before surgery or after osteotomy. The detected differences between the AI and manual measurements were minor and would not influence the clinical decision-making process. The advantage of the AI analysis is the immediate availability of measurement data and detailed information about the leg alignment. The data are evaluated automatically in less than half a minute and are immediately available to the treating physician. This instant information, together with the reported accuracy and reproducibility, could offer advantages in clinical practice, especially in determining the indication for osteotomy and controlling the correction after HTO. The potential clinical relevance includes higher reproducibility, irrespective of the observer’s experience level or fatigue, and improved and prompt visualization for patients. The graphical report provided by the AI software highlights important findings and can be used for patient information and education. Furthermore, with the AI-based software, leg alignment measurements could easily be made available for every long-leg radiograph without the need for additional personnel resources. Hence, even patients with mild varus alignment might be identified faster and directed towards potential joint-preserving therapy.
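The comparison metric named above, the mean absolute difference between the AI value and the mean of the manual readers, can be sketched in a few lines of Python. This is a minimal illustration only: the function name is hypothetical and the angles are invented, not study data.

```python
from statistics import mean


def mean_absolute_difference(ai_values, manual_reads):
    """Mean absolute difference between AI values and the per-image mean of manual reads.

    ai_values:    one AI measurement per radiograph
    manual_reads: one list of manual reader measurements per radiograph
    """
    if len(ai_values) != len(manual_reads):
        raise ValueError("need one set of manual reads per AI value")
    diffs = [abs(ai - mean(reads)) for ai, reads in zip(ai_values, manual_reads)]
    return mean(diffs)


# Invented angles in degrees for three radiographs and two manual readers.
ai_angles = [5.5, 2.5, 7.0]
manual_angles = [[5.0, 5.0], [3.0, 3.0], [7.0, 7.0]]
print(round(mean_absolute_difference(ai_angles, manual_angles), 2))
```

Averaging the manual readers first, as done here, treats their mean as the reference value. It does not make that reference infallible.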

Previous studies have demonstrated high accuracy of AI-aided analyses in native LLRs for leg length and alignment [13, 23]. However, this is the first study investigating an AI application for automated measurements specifically in a population of patients with varus malalignment who underwent HTO. Our results confirm the feasibility of fully automated measurements and the high accuracy in native radiographs. In the preoperative measurements, differences between LAMA™ and manual observers were 0.21° or lower. Although our findings indicate a low likelihood of the AI outperforming human readers in terms of absolute deviation from the median, the impact of these deviations on clinical decision-making would be minimal. Consequently, the application for long-leg analysis in patients with varus leg alignment is feasible for clinical practice. Our findings confirm the considerable intra-reader variability for manual measurements that has been reported in the literature. While HKA and MAD showed intra-observer ICCs of 0.98 or higher, repeated measurements for mLDFA and MPTA showed higher variability. In accordance with previous studies, JLCA showed by far the highest variability, with ICCs indicating moderate to excellent reliability. In contrast, repeated AI-aided measurements demonstrated perfect reproducibility, with an ICC of 1.0 for all measurements preoperatively and after HTO. The postoperative measurements after HTO likewise showed no significant differences compared with manual measurements. The analysis of postoperative X-rays after osteotomy might pose a challenge to the software due to altered bone morphology and the presence of osteosynthesis material. In our study, the AI software showed erroneous results in 9/1140 measurements (0.79%). These incorrect measurements were found on two native preoperative and three postoperative radiographs.
The detailed error analysis revealed that incorrect landmark placement was the primary cause of erroneous results, both pre- and postoperatively. In two of the postoperative cases, the incorrect landmark setting occurred in the proximal tibia, where the proximal tibial joint line was placed medially at the plate rather than at the tibial plateau. Naturally, this resulted in significant deviations for JLCA and MPTA. In the third erroneous postoperative case, the distal femoral joint line was incorrectly marked, and JLCA and mLDFA showed implausible values.

Preoperatively, in one case the distal femoral joint line was incorrectly marked due to advanced medial OA, resulting in significant deviations for JLCA and mLDFA. In the second erroneous preoperative image, the calibration ball was incorrectly identified, resulting in an erroneous estimation of the magnification factor and incorrect MAD values, while angle measurements were not altered. It needs to be emphasized that all results provided by AI software in the medical field need to be confirmed by medical professionals. The erroneous measurements in this study are illustrated in the supplementary material and were clearly discernible for physicians with experience in musculoskeletal imaging.
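The calibration error described above affected MAD but not the angles because a wrong magnification factor rescales all landmark coordinates uniformly: distances scale with it, while angles between lines are unchanged. The following Python sketch illustrates this with invented landmark coordinates. The 25 mm ball diameter, the pixel values and all function names are assumptions made for the example and do not describe the internals of the software.

```python
import math

BALL_DIAMETER_MM = 25.0  # assumed true diameter of the calibration ball


def mm_per_pixel(ball_diameter_px):
    """Scale factor derived from the ball's apparent diameter in pixels."""
    return BALL_DIAMETER_MM / ball_diameter_px


def to_mm(point_px, scale):
    """Convert a pixel coordinate to millimetres with a uniform scale."""
    return (point_px[0] * scale, point_px[1] * scale)


def axis_angle_deg(hip, knee, ankle):
    """Unsigned angle between the hip->knee and knee->ankle axes, in degrees."""
    fx, fy = knee[0] - hip[0], knee[1] - hip[1]
    tx, ty = ankle[0] - knee[0], ankle[1] - knee[1]
    cross = fx * ty - fy * tx
    dot = fx * tx + fy * ty
    return abs(math.degrees(math.atan2(cross, dot)))


def mechanical_axis_deviation(hip, knee, ankle):
    """Perpendicular distance of the knee centre from the hip->ankle line."""
    lx, ly = ankle[0] - hip[0], ankle[1] - hip[1]
    cross = lx * (knee[1] - hip[1]) - ly * (knee[0] - hip[0])
    return abs(cross) / math.hypot(lx, ly)


# Invented landmarks in pixels (hip centre, knee centre, ankle centre).
hip_px, knee_px, ankle_px = (0.0, 0.0), (30.0, 400.0), (0.0, 800.0)

results = {}
for label, ball_px in (("correct ball", 100.0), ("misdetected ball", 50.0)):
    scale = mm_per_pixel(ball_px)
    hip, knee, ankle = (to_mm(p, scale) for p in (hip_px, knee_px, ankle_px))
    results[label] = (
        axis_angle_deg(hip, knee, ankle),
        mechanical_axis_deviation(hip, knee, ankle),
    )
    print(label, results[label])
```

In this example, halving the detected ball diameter doubles the reported deviation in millimetres while the angle stays the same, which is why an implausible length alongside plausible angles points towards a calibration problem rather than a landmark problem.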

This study has some limitations that need to be considered when interpreting the results. First, the sample size was relatively small, which may limit the generalizability of the findings to larger populations. Second, the accuracy of the automated analysis depends on the quality of the radiographs, and variations in image quality across different healthcare facilities may affect the algorithm’s performance. Furthermore, the study compared the automated analysis with manual measurements, which are not infallible and may themselves have inherent limitations.

Nevertheless, the findings of this study support the use of AI-based analyses for long-leg radiographs in patients with varus malalignment and after HTO. In clinical practice, the use of AI-based software that offers fully automated measurements could increase the availability of accurate alignment measurements without the need for additional personnel resources. An instant visual report with accurate results accompanying each image could enhance awareness and enable early detection of malalignment. This study demonstrates that the AI-aided software is also applicable to postoperative radiographs following HTO. This enables an accurate analysis of postoperative alignment and can be used for patient education. Furthermore, this technology offers the potential to enhance precision and reproducibility, mitigating the significant deviations seen in manual measurements.
