Therefore, this study aimed to investigate whether a novel approach based on adaptive IoU thresholds would provide better localization and detection accuracy. Our hypotheses were that (1) lower erosion detection thresholds in RA would result in higher detection performance, analogous to previous studies, and (2) detection performance, as well as joint localization accuracy, could be substantially increased with a new adaptive IoU method.
4. Discussion
In this study, we successfully presented a new method to substantially improve the recognition accuracy of small objects by adaptively adjusting the IoU thresholds during the training process.
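To illustrate the idea, the following is a minimal Python sketch of how the IoU thresholds used for anchor assignment could be adjusted from epoch to epoch. It is not the implementation used in this study: the linear schedule, the start values, and the function names `iou_matrix`, `adaptive_thresholds`, and `label_anchors` are assumptions made for illustration. Only the end values (for example, end-Pos-IoU\end-Neg-IoU of 0.4\0.3 reached after 50 adaptive epochs) correspond to a configuration discussed in this section.

```python
import numpy as np


def iou_matrix(anchors: np.ndarray, gt_boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU between anchors (N, 4) and ground-truth boxes (M, 4).

    Boxes are [x1, y1, x2, y2] and are assumed to have positive area.
    """
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    # Clip at zero so that non-overlapping boxes get zero intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / union


def adaptive_thresholds(epoch, adaptive_epochs, start, end):
    """Return (pos_thr, neg_thr) for the given epoch.

    ASSUMPTION: thresholds move linearly from `start` to `end` over
    `adaptive_epochs` and are held at `end` afterwards. The schedule
    shape and the start values are hypothetical.
    """
    frac = min(max(epoch / adaptive_epochs, 0.0), 1.0)
    pos = start[0] + frac * (end[0] - start[0])
    neg = start[1] + frac * (end[1] - start[1])
    return pos, neg


def label_anchors(anchors, gt_boxes, pos_thr, neg_thr):
    """Assign 1 (positive), 0 (background), or -1 (ignored) to each anchor."""
    best_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=int)
    labels[best_iou < neg_thr] = 0
    labels[best_iou >= pos_thr] = 1
    return labels


if __name__ == "__main__":
    anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 5], [20, 20, 30, 30]], dtype=float)
    gt_boxes = np.array([[0, 0, 10, 10]], dtype=float)
    # Hypothetical start values; end values 0.4 / 0.3 after 50 adaptive epochs.
    for epoch in (0, 25, 50, 80):
        pos_thr, neg_thr = adaptive_thresholds(epoch, 50, start=(0.6, 0.4), end=(0.4, 0.3))
        print(epoch, round(pos_thr, 3), round(neg_thr, 3),
              label_anchors(anchors, gt_boxes, pos_thr, neg_thr))
```

In this sketch, an anchor whose best overlap with a ground-truth box lies between the two thresholds is ignored during training, so changing the thresholds over the epochs changes which anchors count as positive or background examples. The direction and shape of the schedule are free parameters here and would need to be matched to the actual training setup.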
In recent years, the research field of artificial intelligence has expanded into all areas of medicine, including radiology and rheumatology [11]. Especially in image processing, innovative techniques have shown great promise in improving and speeding up clinical workflows and reducing the workload of medical staff. Given the clinical experience, the requirements, and the human need to monitor the decisions of a DL pipeline, we decided to implement joint erosion classification using a RetinaNet. Compared with feedforward neural networks (FNN), which only output a vector of classifications after the convolutions, for example, to decide whether or not a CT image shows COVID-19 [14], the RetinaNet provides a visual representation of the region on which the respective decision is based. The radiologist or rheumatologist is thus able to verify and confirm the results.
Although joint damage is increasingly assessed with echography and MRI examinations, radiographs still provide a comprehensive or panoramic view of joints and are the clinical standard for classifying RA stages [16]. Deep-learning algorithms could greatly improve the clinical assessment of radiographs in many cases, e.g., pneumonia detection in chest X-ray images or bone lesion detection in musculoskeletal radiographs [34,35]. In addition, new radiographic findings of joint destruction could be discovered. Recently, numerous studies have reported the use of deep learning or CNNs to assess joints or bones. In this regard, different types of osteoarthritis have been investigated, including osteoarthritis of the hip [36] and osteoporosis of the knees [37], and the assessment of bone age [38] has also been a focus. However, these previous studies considered large joints and therefore have limited applicability to RA, a polyarthritis with central involvement of small joints.
Our study overcomes the difficult task of identifying small joints, thus closing the gap in RA joint classification.
In our study, we observed a dependence between detection accuracy and the IoU thresholds used, analogous to Yan et al. [22]. Furthermore, no RetinaNet trained with fixed IoU values over the training epochs achieved sufficient accuracy for clinical applicability.
In particular, many joints of different sizes are located close to each other in the carpal region. Hirano et al. achieved a localization accuracy of 95.3% for the finger joints using a two-stage approach [16], which is comparable to our study. However, the intercarpal joints tended to be neglected in their study because these areas are complex and the joints lie close together. We observed similar results with the single-stage RetinaNet without adaptive adjustment of the IoU threshold. In contrast, the final model with adaptive IoU adjustment during training was able to capture these complex regions. Therefore, it can be assumed that adaptive adjustment is not only suitable for small objects but also of interest for complex structures with close spatial relationships.
Due to the lack of localization and the low accuracy in erosion detection, none of the models we tested without adaptive IoU values achieved sufficient accuracy for routine clinical use. In comparison, the proposed adaptive approach achieved an accuracy of more than 94% with end-Pos-IoU\end-Neg-IoU values of 0.4\0.3 and 50 adaptive epochs (mAP of 0.81 ± 0.18) or with end-Pos-IoU\end-Neg-IoU values of 0.5\0.3 and 100 adaptive epochs (mAP of 0.79 ± 0.22). These results are comparable to the intra-rater repeatability of an experienced rheumatologist (mAP = 0.79 and accuracy 88.5%).
Similar results were observed by Wang et al. in their study on JSN classification in patients with RA [22]. They achieved a maximum mAP of only 0.71 using the classical You Only Look Once (YOLO) version 4 approach. Their proposed adjustment of error functions based on the distance to the GT box, loss generalization, consideration of aspect ratios, and separation of hand and finger joints increased the performance to mAP = 0.87 for two-hand radiographs. However, the performance was determined using validation data rather than test data, and hands in advanced stages of degeneration were excluded, making a comparison difficult. Nonetheless, our proposed adaptation is straightforward, requires no prior assumptions, and is extensible to any data set. In addition, we could classify both wrists and finger joints simultaneously within the same model without requiring additional computational steps. This significantly reduces the required computational power and thus broadens the range of usable hardware in the clinical setting. On a standard clinical workstation without a GPU, a complete evaluation requires only 5 s, which allows a significant acceleration of the clinical routine.
Our study, as well as the study by Wang et al., shows that recognition accuracy can be significantly improved by adjusting the loss as a function of spatial relationship. Previous RA studies achieved recognition accuracies of only 70.6 to 77.5 [16,39], whereas our study and that of Wang et al. reached mAP values of 0.81 and 0.87, respectively.
In addition, it was notable that the best model we trained had a higher agreement with the rater than the rater had with himself at a delay of six months. This could be because the model generalized the subjective decision-making of the rater for the first time. Nevertheless, the RetinaNet sometimes differed from the rheumatologist by more than one SvH score.
In contrast, at six months, the rheumatologist differed from his previous evaluation by no more than one score. Here, the model tended to classify joints with a score of 1–4 as score 0, which could be due to the class imbalance of the individual scores. While score 0 was present in 75.26% of the joints, scores 1–4 were present in only 3.13–6.9% of the joints.
Furthermore, our study shows that the medical care of RA patients can be optimized in terms of time by using deep learning frameworks. Experienced rheumatologists need 9 ± 13 min for a complete erosion history and documentation. In contrast, the RetinaNet we used took about 5 s for equivalent documentation. This time saving could allow physicians to spend less time on documentation in the years to come. In this way, rheumatologists can spend more time with their patients and perform tasks that cannot be performed equivalently by DL frameworks, such as face-to-face discussions with patients about clinical problems and limitations.
Nevertheless, some limitations have to be mentioned. First, the number of patients was limited, mainly because no freely available data set existed. Consequently, we needed to prepare our own data set, which is time-consuming. Second, we only examined CR from Siemens Healthineers. Compared with MRI measurements, for which numerous different scanner coil configurations are available, X-ray images can be considered comparatively uniform. Nevertheless, this study did not consider the effects of variability between different providers, platforms, and institutions. In addition, the effects of rings or other interfering objects on the accuracy of the assessment were not investigated. Therefore, further studies are needed to validate the applicability of the approach across multiple institutions and other X-ray manufacturers. Third, our proposed approach was only studied for one RetinaNet. Although RetinaNets have been shown in numerous studies to have higher accuracy than other network configurations such as YOLO or single-shot multi-box detectors (SSD) [18,40], further studies are needed to investigate the benefits of adaptive adjustment of IoU thresholds and to evaluate the different model types in assessing erosion values.
Fourth, the approach we proposed was applied exclusively to images in which all objects were small but of comparable size. Its usefulness for detection tasks in which objects of different sizes are to be detected must therefore be investigated in subsequent studies.