Deep Learning-based Diagnosis and Localization of Pneumothorax on Portable Supine Chest X-ray in Intensive and Emergency Medicine: A Retrospective Study

Main Findings

Both the detection- and segmentation-based systems achieved excellent performance, comparable to that of radiology reports or human annotators. As with human readers, the diagnostic and localization performance of the CAD systems may be influenced by pneumothorax size.

Annotation of Pneumothorax on SCXR

Most public datasets [24, 25] consist of chest X-rays with image-level labels for common thoracic diseases that are text-mined from radiology reports and are therefore inherently inaccurate [26, 27]. For example, one study of ChestX-ray14 found that agreement between the image-level label and radiologist review for pneumothorax was only about 60% [28], which may lead to poor model generalizability [29].

In contrast, pixel-level annotation may effectively facilitate the development of pneumothorax-detection algorithms [30]. On standing CXR, a pneumothorax can usually be delineated [31, 32] by the visceral pleural line in the apicolateral space [33]. When patients are supine, however, air collects in different spaces than in the standing position [34]. Using segmentation masks to delineate pneumothorax on SCXR therefore raises the concern that only images with clear pleural lines could be annotated, introducing selection bias.

Consequently, we used bounding boxes for annotation, which allowed pneumothoraces without distinct pleural lines to be localized. Nevertheless, for some lesions, such as those extending from the lung apex to the base, a single bounding box might encompass nearly the entire ipsilateral lung. We addressed this problem by dividing images into 10 × 10 grids, which allowed bounding boxes to accommodate lesions of varying shapes.
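To make the grid idea concrete, here is a minimal sketch. The paper does not specify how grid cells and boxes were combined, so it assumes (hypothetically) that a lesion is encoded as one axis-aligned box per cell of a 10 × 10 grid that the lesion touches; the function `mask_to_grid_boxes` and the toy L-shaped mask are illustrative only.

```python
import numpy as np

def mask_to_grid_boxes(mask: np.ndarray, n: int = 10) -> list[tuple[int, int, int, int]]:
    """Hypothetical encoding: one (x0, y0, x1, y1) box per grid cell touched by the lesion."""
    h, w = mask.shape
    # Pixel edges of the n x n grid; linspace copes with sizes not divisible by n.
    ys = np.linspace(0, h, n + 1).astype(int)
    xs = np.linspace(0, w, n + 1).astype(int)
    boxes = []
    for i in range(n):
        for j in range(n):
            if mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].any():
                boxes.append((int(xs[j]), int(ys[i]), int(xs[j + 1]), int(ys[i + 1])))
    return boxes

def box_area(box: tuple[int, int, int, int]) -> int:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

# Toy 100 x 100 "image" with an L-shaped lesion along the lateral edge and the base.
mask = np.zeros((100, 100), dtype=bool)
mask[0:50, 0:10] = True   # vertical strip
mask[40:50, 0:50] = True  # horizontal strip

boxes = mask_to_grid_boxes(mask)
grid_area = sum(box_area(b) for b in boxes)
ys, xs = np.nonzero(mask)
enclosing = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
print(len(boxes), grid_area, box_area(enclosing))
```

For this toy lesion, nine 10 × 10-pixel cells cover 900 pixels, whereas a single enclosing box would cover 2,500, most of it lesion-free.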

Dataset Selection for Training and Testing Models

Given the low incidence of pneumothorax (0.5–3%) reported in epidemiologic data [35, 36], developing a model on consecutively or randomly sampled SCXRs may result in class imbalance. Such imbalance may bias CAD systems towards learning features of the more common class (i.e., pneumothorax-negative images) and distort various evaluation metrics [37]. We therefore employed a case-control design [38, 39] to achieve better class balance in the training and testing datasets. As shown in Table 1, the higher proportion (31.2%) of images annotated as pneumothorax in the NTUH-1519 dataset may have enabled our CAD systems to better learn pneumothorax-related features, whereas the lower proportion (11.8%) in NTUH-20 allowed performance to be tested at a prevalence closer to that of real-world settings [35, 36].
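The metric distortion mentioned above is easiest to see with positive predictive value (PPV), which, unlike sensitivity and specificity, depends directly on prevalence. The sketch below applies Bayes' rule at the two dataset prevalences in Table 1 and at the bounds of the epidemiologic range; the operating point (90% sensitivity, 90% specificity) is a hypothetical value chosen for illustration, not a result from this study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point, not a result reported in this study.
SENS, SPEC = 0.90, 0.90
settings = [
    ("NTUH-1519", 0.312),
    ("NTUH-20", 0.118),
    ("3% incidence", 0.03),
    ("0.5% incidence", 0.005),
]
for label, prev in settings:
    print(f"{label:>15}: prevalence {prev:.1%} -> PPV {ppv(SENS, SPEC, prev):.2f}")
```

At the same hypothetical operating point, PPV falls from about 0.80 at 31.2% prevalence to about 0.55 at 11.8% and about 0.04 at 0.5%, which is one reason a lower-prevalence test set such as NTUH-20 gives a more realistic picture of how many flagged images are true positives.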

In a previous study [29], the accuracy of DL-based pneumothorax detection declined significantly when the algorithm was tested on an external dataset. Evaluating a model on an independent dataset may mitigate concerns about overestimated accuracy and limited generalizability. However, no datasets dedicated to portable SCXRs were available for our purposes. According to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [40], external validation may use data collected by the same researchers, with the same predictors and outcome definitions and assessments, but typically sampled from a later period (temporal or narrow validation). In our study, the NTUH-20 dataset consisted of SCXRs taken at NTUH during 2020. NTUH-20 was therefore chronologically distinct from NTUH-1519 (2020 vs. 2015–2019), and the two datasets differed significantly (Table 1). Under the TRIPOD framework, this temporally separate testing dataset serves as a temporal (narrow) external validation of the CAD systems' generalizability.
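A temporal (narrow) validation split of the kind TRIPOD describes amounts to partitioning the data by acquisition year. The sketch below is illustrative only; the `Study` record and its fields are hypothetical and do not reflect the study's actual data schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Study:
    image_id: str       # hypothetical identifier
    year: int           # year of image acquisition
    pneumothorax: bool  # reference label

def temporal_split(studies, train_years, test_years):
    """Partition studies by acquisition year; the two periods must not overlap."""
    train_years, test_years = set(train_years), set(test_years)
    if train_years & test_years:
        raise ValueError("training and testing periods overlap")
    train = [s for s in studies if s.year in train_years]
    test = [s for s in studies if s.year in test_years]
    return train, test

studies = [
    Study("a", 2015, True), Study("b", 2018, False),
    Study("c", 2019, False), Study("d", 2020, True), Study("e", 2020, False),
]
# Development period 2015-2019 (as in NTUH-1519), later test period 2020 (as in NTUH-20).
train, test = temporal_split(studies, range(2015, 2020), [2020])
print([s.image_id for s in train], [s.image_id for s in test])
```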

Diagnosis Output Performance

Niehues et al. [41] used portable SCXRs to develop a CAD algorithm that performed excellently in identifying pneumothorax (AUC: 0.92, 95% CI: 0.89–0.95). However, thoracic drains were also present in approximately half of the images with pneumothorax [41], so it is conceivable that the algorithm learned the drains as a feature of pneumothorax [42]. Rueckel et al. [30] collected 3062 SCXRs, including 760 images with pixel-level annotations of pneumothorax and thoracic drains; their model also performed well overall (AUC: 0.877) for unilateral pneumothorax detection.

In the present study, by contrast, we excluded images with thoracic drains and used bounding boxes to annotate lesion locations. Both the detection- and segmentation-based systems delivered excellent performance (AUC > 0.94) in pneumothorax detection. The classification architectures differed between the two CAD systems because the UNet-based model [20] could itself output both classification results and localization information.
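For readers less familiar with the AUC, it equals the probability that a randomly chosen pneumothorax-positive image receives a higher score than a randomly chosen negative one (ties counted as one half), i.e. the normalized Mann-Whitney U statistic. The pairwise sketch below computes it directly; it is O(P × N) and meant for illustration, not as the evaluation code used in this study.

```python
def roc_auc(labels, scores) -> float:
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: one positive (0.4) is outranked by one negative (0.5).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because AUC depends only on how positives are ranked relative to negatives, it does not change with class prevalence, unlike PPV.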

Routine portable SCXR exams are common practice in critical care [43, 44], and their frequent use may partly account for the prolonged turnaround time from image acquisition to interpretation by a radiologist [45]. Our systems could help prioritize portable SCXRs in reading queues, flagging suspicious images for a radiologist to check first or triggering notifications to treating clinicians. As shown in Table 1, a high proportion of patients had undergone tracheal intubation. For these patients, early detection of pneumothorax may allow prompt life-saving procedures that prevent serious complications such as tension pneumothorax.
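Worklist prioritization of this kind can be sketched with a priority queue keyed on the CAD output. The `TriageWorklist` class below is hypothetical; it assumes each SCXR receives a single pneumothorax score and serves images from highest to lowest score, preserving arrival order among equal scores.

```python
import heapq
import itertools

class TriageWorklist:
    """Hypothetical reading worklist ordered by CAD pneumothorax score, highest first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrivals first

    def add(self, image_id: str, score: float) -> None:
        # heapq is a min-heap, so store the negated score to pop the highest score first.
        heapq.heappush(self._heap, (-score, next(self._arrival), image_id))

    def next_to_read(self) -> str:
        _, _, image_id = heapq.heappop(self._heap)
        return image_id

    def __len__(self) -> int:
        return len(self._heap)

worklist = TriageWorklist()
for image_id, score in [("a", 0.10), ("b", 0.97), ("c", 0.60), ("d", 0.97)]:
    worklist.add(image_id, score)
order = [worklist.next_to_read() for _ in range(len(worklist))]
print(order)
```

A binary heap keeps both insertion and retrieval at O(log n) per image, so the queue stays cheap even with many pending studies.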

Localization Output Performance

Using standing chest X-rays, Lee et al. [46] developed a model that achieved a Dice coefficient of 0.798 for pneumothorax localization. Feng et al. [47] also developed a model that localized pneumothorax lesions (Dice coefficient: 0.69). However, although Feng et al. [47] included portable SCXRs in their analysis, they excluded films showing only supine signs of pneumothorax, such as the deep sulcus sign. Another model, by Zhou et al. [48], was based on frontal chest X-rays alone (no portable SCXRs) and localized pneumothorax with a Dice coefficient of 0.827.
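The Dice coefficients cited above compare a predicted lesion region P with a reference region G as 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (identical regions). A minimal NumPy sketch for binary masks follows; treating two empty masks as perfect agreement is a convention assumed here, not one taken from the cited studies.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two binary masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: both masks empty counts as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

pred = np.zeros((4, 4), dtype=bool)
truth = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True   # 8 predicted pixels
truth[1:3, :] = True  # 8 reference pixels, 4 of them shared with pred
print(dice(pred, truth))
```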

Portable SCXR images are generally considered suboptimal for diagnosis. Patients are often unable to cooperate during image acquisition, resulting in poor positioning or inadequate inspiration. Compared with standing chest X-rays, portable SCXRs also have lower resolution and luminance, which hinders the diagnosis of pneumothorax [49]. Furthermore, the classic findings of pneumothorax on standing chest X-rays are often absent on portable SCXRs. Given the greater difficulty of interpreting SCXRs, previous models [46–48, 50] may not be suitable for localizing pneumothorax on these images.

Both of our CAD systems (based on object detection or image segmentation) performed excellently in pneumothorax localization, at a level comparable to that of the annotators (Table 3). To the best of our knowledge, they are the first CAD systems capable of localizing pneumothoraces on portable SCXRs. Although the detection- and segmentation-based systems performed similarly in testing, their computational requirements differed substantially (Supplemental Table 5). The detection-based system outputs only approximate positional information in the form of bounding-box coordinates, whereas the segmentation-based system provides precise pixel-wise lesion information; in principle, the detection-based system should therefore be less computationally demanding. However, because the detection-based system integrates several models in an ensemble, it is in fact the more resource-intensive of the two. Users should therefore weigh their available computational resources when choosing between the systems.
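The paper does not state how the detection ensemble combines its members' boxes. One widely used approach is weighted boxes fusion; the sketch below is a simplified, greedy variant of that idea, with hypothetical function names and an assumed IoU threshold of 0.5. It also illustrates the resource point: every image must pass through each ensemble member before fusion, so inference cost grows roughly in proportion to the number of models.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def weighted_box(members: list[tuple[Box, float]]) -> Box:
    """Score-weighted average of the member boxes' coordinates."""
    total = sum(score for _, score in members)
    return tuple(sum(box[i] * score for box, score in members) / total for i in range(4))

def fuse_boxes(model_outputs: list[list[tuple[Box, float]]], iou_thr: float = 0.5):
    """Greedily cluster (box, score) pairs from all models and fuse each cluster."""
    n_models = len(model_outputs)
    candidates = sorted((bs for out in model_outputs for bs in out), key=lambda bs: -bs[1])
    clusters: list[list[tuple[Box, float]]] = []
    fused: list[Box] = []
    for box, score in candidates:
        for k, fbox in enumerate(fused):
            if iou(box, fbox) >= iou_thr:
                clusters[k].append((box, score))
                fused[k] = weighted_box(clusters[k])
                break
        else:  # no sufficiently overlapping cluster: start a new one
            clusters.append([(box, score)])
            fused.append(box)
    results = []
    for fbox, members in zip(fused, clusters):
        mean_score = sum(s for _, s in members) / len(members)
        # Down-weight boxes that only some of the ensemble members agree on.
        results.append((fbox, mean_score * min(len(members), n_models) / n_models))
    return results

outputs = [
    [((10, 10, 50, 50), 0.9)],      # model A
    [((12, 8, 52, 48), 0.8)],       # model B
    [((200, 200, 240, 240), 0.3)],  # model C
]
for box, score in fuse_boxes(outputs):
    print([round(v, 1) for v in box], round(score, 3))
```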

Influence of Pneumothorax Size

Previous studies [42, 51, 52] have shown that model performance, like human reading, may be influenced by pneumothorax size. A model devised by Taylor et al. [52] correctly identified 100% of large pneumothoraces but only 39% of small ones. Similarly, the performance of our CAD systems declined as pneumothorax size decreased. This is not surprising, because inter-annotator TP-Dice coefficients also fell as pneumothorax size decreased, underscoring the difficulty of learning small lesions. The effect was more pronounced for the detection-based CAD system: its TP-Dice between predictions and ground truth was lower than the inter-annotator TP-Dice (Table 3), which may explain why its diagnostic performance for small pneumothoraces was lower than that of the radiology reports (Table 2).
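Size-dependent figures such as the 100% versus 39% reported by Taylor et al. [52] come from stratifying sensitivity by pneumothorax size. A minimal sketch follows; the size categories and toy cases are hypothetical and show only the computation, not data from this study.

```python
from collections import defaultdict

def sensitivity_by_size(cases) -> dict[str, float]:
    """cases: (size_category, detected) pairs for pneumothorax-positive images only."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for size, detected in cases:
        totals[size] += 1
        hits[size] += int(bool(detected))
    return {size: hits[size] / totals[size] for size in totals}

# Hypothetical toy cases, not study data.
toy_cases = [("large", True), ("large", True), ("small", True), ("small", False), ("small", False)]
print(sensitivity_by_size(toy_cases))
```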

Unlike large pneumothoraces, small pneumothoraces are easily overlooked by clinicians, especially on portable SCXRs, which makes CAD assistance necessary. Because most patients undergoing portable SCXR are susceptible to complications of pneumothorax, particularly those receiving mechanical ventilation, timely detection is critical to prevent a small pneumothorax from progressing to tension pneumothorax [53].

Future Applications

The CAD systems could serve two primary functions: (1) prioritizing SCXRs so that suspicious images are checked first by a radiologist, or (2) issuing notifications to attending clinicians. When clinicians review the diagnostic results, the localization outputs could be displayed to help them verify the results. We report the computational resources required by the two CAD systems (Supplemental Table 5), which can assist healthcare institutions in selecting the most suitable model for deployment. Moreover, future studies should examine the feasibility of adapting these CAD systems for edge computing and integrating them into portable chest X-ray machines, which could broaden their applicability.
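Issuing notifications requires fixing an operating threshold on the CAD score, which the paper does not report. One common approach, sketched below under that assumption, picks the highest threshold that still reaches a target sensitivity on validation data; the function name and the 0.75 target in the example are hypothetical.

```python
import math

def threshold_for_sensitivity(labels, scores, target: float) -> float:
    """Highest threshold t such that flagging score >= t catches at least `target` of positives."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not pos:
        raise ValueError("no positive cases in validation data")
    # Number of positives that must be flagged; the epsilon guards against float round-up.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]

labels = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2]
t = threshold_for_sensitivity(labels, scores, target=0.75)
flagged = [s >= t for s in scores]
print(t, flagged)
```

Raising the target sensitivity lowers the threshold and increases the number of alerts, so the choice trades missed pneumothoraces against alert burden for clinicians.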

Study Limitations

First, because only de-identified images were available for analysis, we could not determine whether patients' clinical comorbidities influenced the performance of the CAD systems. Nonetheless, Table 1 shows a diverse range of findings and diagnoses on the SCXRs, which may partly mitigate this concern. Second, given the low prevalence of pneumothorax [35, 36], we used a case-control design for image collection to ensure a sufficient number of pneumothorax-positive patients. This design may have artificially elevated the pneumothorax prevalence in our datasets relative to real-world settings. We therefore compared CAD system performance against radiology reports and annotators as reference readers. Further prospective studies are warranted to test performance at real-world pneumothorax prevalence by enrolling consecutive patients from EDs or ICUs on a manageable scale [54].
