Automated detection of IVC filters on radiographs with deep convolutional neural networks

This HIPAA-compliant retrospective study was approved by the institutional review boards of both participating institutions, which are academic, tertiary care centers; there were no external funding sources. The key points of the methods are described here; complete methods sufficient for reproducing the work are detailed in the online supplemental methods.

Candidate images from both inpatient and outpatient settings for the primary dataset were identified in our report database using mPower search software (Nuance Inc., Burlington, MA). Two searches were performed: one designed to identify abdominal radiographs in which IVC filters were mentioned in the report (presumed positives), and a second designed simply to identify abdominal radiographs (presumed negative controls). Search terms and date ranges were chosen to create a dataset that would include nearly all of the images with IVC filters in our clinical archive. Corresponding images were extracted from the PACS archive.

DICOM images were annotated using the MD.ai annotation platform (MD.ai, New York, NY). Annotation of the complete dataset was performed by an attending interventional radiologist author with 13 years' experience. The test partition was additionally annotated by two attending abdominal radiologist authors with 11 and 16 years' experience, respectively. For studies with more than one image, only one representative image, selected by the interventional radiology author, was annotated. All images used were frontal views. For each annotated image, annotators either drew a bounding box around the IVC filter or marked the image as “no filter.” For the multiply annotated test set, the final label was determined by majority vote, and final bounding boxes were constructed from the mean center location and mean width and height of the annotators' bounding boxes. The complete primary dataset was randomly divided at the patient level into training, validation, and testing partitions comprising approximately 70%, 15%, and 15% of the data, respectively.
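
As an illustration of the consensus and partitioning logic, a minimal sketch is given below; the `Box` structure and helper names are hypothetical, not part of the annotation platform or the authors' published code:

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Box:
    cx: float  # bounding-box center x
    cy: float  # bounding-box center y
    w: float   # width
    h: float   # height

def consensus_box(annotations: List[Optional[Box]]) -> Optional[Box]:
    """Majority vote on filter presence; mean geometry of the positive boxes.

    Each element is one annotator's box, or None for a "no filter" mark.
    """
    boxes = [b for b in annotations if b is not None]
    if len(boxes) * 2 <= len(annotations):  # no majority drew a box
        return None
    n = len(boxes)
    return Box(
        cx=sum(b.cx for b in boxes) / n,
        cy=sum(b.cy for b in boxes) / n,
        w=sum(b.w for b in boxes) / n,
        h=sum(b.h for b in boxes) / n,
    )

def split_by_patient(patient_ids: List[str], seed: int = 0):
    """Random 70/15/15 train/validation/test split at the patient level."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    train = ids[: int(0.70 * n)]
    val = ids[int(0.70 * n): int(0.85 * n)]
    test = ids[int(0.85 * n):]
    return train, val, test
```

Splitting on patient identifiers rather than on individual images ensures that no patient contributes images to more than one partition.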

A secondary dataset, used for external validation, was constructed from images drawn from the clinical archive of a separate institution. A different instance of the same mPower search software, using the same search terms, was used to identify studies. Annotation of this dataset was performed in a custom web-based tool, but the annotation scheme was otherwise identical to that of the primary dataset. All images in the secondary dataset were annotated by three radiologists: an attending abdominal radiologist with 12 years' experience, an attending neuroradiologist with 7 years' experience, and a fourth-year radiology resident.

Annotated DICOM images were converted to JPEG format using DCMTK v3.6.2 (OFFIS, Oldenburg, Germany). The Cascade R-CNN [11] object detection architecture with a ResNet-50 [12] backbone was employed, as implemented in the MMDetection toolbox 2.4.0 [13] built on PyTorch 1.6.0 [14]. Training and inference were performed using four NVIDIA (Santa Clara, CA) RTX 2080 Ti GPUs.
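
For illustration, inference with a trained MMDetection 2.x model reduces to the toolbox's high-level API; the file names below are placeholders for the trained config and checkpoint, which are not reproduced in this section:

```python
from mmdet.apis import inference_detector, init_detector

# Placeholder paths for the trained Cascade R-CNN config and weights.
config_file = "cascade_rcnn_r50_fpn_ivcf.py"
checkpoint_file = "ivcf_final.pth"

model = init_detector(config_file, checkpoint_file, device="cuda:0")
result = inference_detector(model, "example_abdominal_radiograph.jpg")
# For a single-class detector, `result` is a one-element list holding an
# N x 5 array of [x1, y1, x2, y2, confidence] rows, one per detection.
```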

Augmentation of the training partition of the dataset was performed by randomly applying transformations to the images. The applied transformations were horizontal flip, changes in brightness and contrast, coarse rotation in 90-degree increments, and fine rotation in 1-degree increments.
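
The paper does not name the augmentation implementation; as one possible realization, the transform list above could be expressed with the Albumentations library. The probabilities and limits shown are placeholders, since the tuned values came from the hyperparameter search described next:

```python
import albumentations as A

# Sketch of the described augmentations; p values and limits are
# placeholders standing in for the optimized hyperparameters.
train_transforms = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2,
                                   contrast_limit=0.2, p=0.5),
        A.RandomRotate90(p=0.5),   # coarse rotation in 90-degree steps
        A.Rotate(limit=5, p=0.5),  # fine rotation, here +/- 5 degrees
    ],
    # Keep bounding boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
```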

Hyperparameter optimization was performed using Optuna 2.1.0 [15] to determine the best values for base learning rate, augmentation probabilities, and augmentation extents. One hundred optimization iterations were performed, with the objective function defined as the area under the receiver operating characteristic curve (ROC AUC) of the model on the validation partition of the dataset.
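
A minimal sketch of such a search loop is shown below. It assumes a hypothetical `train_and_evaluate()` helper that trains a model with the sampled values and returns the validation ROC AUC; the parameter names and ranges are illustrative, not the study's search space:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Sampled hyperparameters (names and ranges are illustrative).
    lr = trial.suggest_loguniform("base_lr", 1e-5, 1e-2)
    flip_p = trial.suggest_uniform("flip_prob", 0.0, 1.0)
    rot_limit = trial.suggest_int("fine_rotation_limit", 0, 15)
    # Hypothetical helper: trains with these values, returns validation AUC.
    return train_and_evaluate(base_lr=lr, flip_prob=flip_p,
                              fine_rotation_limit=rot_limit)

study = optuna.create_study(direction="maximize")  # maximize ROC AUC
study.optimize(objective, n_trials=100)
print(study.best_params)
```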

Using the hyperparameter values that produced the best results during optimization, a final model was trained on the combined training and validation partitions of the dataset. Nine additional models were trained with the same hyperparameter values but different random seeds to support uncertainty estimates in the results. Final model performance was calculated on the primary internal and secondary external test sets. Confidence intervals on proportions were calculated from chi-squared statistics in R v4.0.0 [16].
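
For reference, an equivalent proportion interval can be sketched in Python using the Wilson score interval, which is obtained by inverting the one-sample chi-squared (score) test for a proportion, the same family of interval that R's prop.test reports. The counts shown are placeholders, not study results:

```python
from statsmodels.stats.proportion import proportion_confint

# Placeholder counts, NOT study results: e.g., correct detections on a test set.
n_detected, n_total = 140, 150

# Wilson (score) interval: the CI from inverting the chi-squared score test.
lo, hi = proportion_confint(n_detected, n_total, alpha=0.05, method="wilson")
print(f"proportion {n_detected / n_total:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```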
