N-staging is a critical factor in determining treatment options and predicting patient outcomes. The initial step in this process is identifying all LNs, which can be tedious and time-consuming. Our study introduces a deep learning model that accurately detects and segments all pelvic LNs on DWI images. We also validated the model’s performance on an external dataset.
A comprehensive evaluation of size, morphological features, signal intensity, and other imaging parameters is essential when interpreting pelvic LNs in the context of PCa N-staging. The PI-RADS guidelines [7] recommend reporting a short diameter greater than 8 mm as the threshold for suspected metastatic LNs. This approach oversimplifies the complex nature of LN metastasis. Of note, LNs with short diameters greater than 8 mm can exhibit benign characteristics, while those with short diameters less than 8 mm may still harbor metastatic cells [8, 9]. Our study developed a model capable of segmenting all visible LNs on DWI images, whether they are healthy or metastatic. Furthermore, we applied a short-diameter cutoff of 8 mm, which allowed us to assess the model’s performance in detecting suspicious metastatic LNs. In the external validation test dataset, the model achieved a DSC of 0.77 for all LNs and 0.82 for suspicious LNs. At the LN level, the model achieved a sensitivity of 60.1%, a positive predictive value (PPV) of 79.2%, and an FP/vol of 0.56 for detecting suspicious LNs. The results from our external validation dataset confirmed the feasibility of this method, which could aid in LN staging, quantitative measurement of tumor burden, and image-guided treatment of patients with PCa.
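The metrics quoted above can be illustrated with a minimal sketch. It assumes that masks are given as sets of voxel indices and that the lesion-level counts of true positives, false positives, and false negatives have already been obtained by some matching rule. The matching rule is not specified here, and the function names are hypothetical.

```python
def dice_coefficient(pred_voxels, ref_voxels):
    """DSC = 2 * |A and B| / (|A| + |B|) for two sets of voxel indices."""
    pred, ref = set(pred_voxels), set(ref_voxels)
    if not pred and not ref:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def lesion_level_metrics(tp, fp, fn, n_volumes):
    """Sensitivity, PPV, and false positives per volume from lesion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    fp_per_vol = fp / n_volumes
    return sensitivity, ppv, fp_per_vol
```

Note that sensitivity and PPV are computed over detected lesions, whereas FP/vol is normalized by the number of scanned volumes, so the three values are not interchangeable.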
In clinical practice, radiologists commonly measure and record the short diameter and volume of the largest LN, as it correlates with the patient’s N stage. To keep our analysis clinically relevant, we therefore assessed the model’s ability to detect and segment the largest LNs. In the external validation test dataset, the model demonstrated a DSC of 0.82 for the largest LNs. At the patient level, the model exhibited a sensitivity of 81.1%, specificity of 75.6%, and positive predictive value (PPV) of 93.2% in detecting patients with suspicious LNs. Furthermore, we used quantitative measurements of the largest LN’s short diameter and volume to generate the N-staging automatically, which was then included in the structured report on PCa.
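A patient-level evaluation of this kind can be sketched as follows. The sketch assumes that a patient is flagged as suspicious when the largest short diameter among the segmented LNs exceeds the 8 mm PI-RADS cutoff. The exact rule used to derive the N stage is not reproduced here, and all function names are hypothetical.

```python
SHORT_DIAMETER_CUTOFF_MM = 8.0  # PI-RADS threshold cited in the text


def patient_is_suspicious(short_diameters_mm, cutoff=SHORT_DIAMETER_CUTOFF_MM):
    """Flag a patient when the largest LN short diameter exceeds the cutoff."""
    # A patient with no segmented LNs is treated as not suspicious (assumption).
    return max(short_diameters_mm, default=0.0) > cutoff


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")


def patient_level_metrics(predicted, reference):
    """Sensitivity, specificity, and PPV from per-patient boolean labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp), _ratio(tp, tp + fp)
```

For example, `patient_is_suspicious([4.0, 9.5])` flags the patient because the largest node exceeds 8 mm, while a node of exactly 8.0 mm is not flagged under the strict "greater than" reading of the threshold.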
Among neural network structures, fully convolutional networks (FCNs) [15], U-Net [13], 3D U-Net [16], and V-Net [10] are the most widely used architectures. The FCN [15], which adopts an end-to-end convolutional neural network with deconvolution for up-sampling, pioneered the application of deep learning to image segmentation. However, its low sensitivity to image details and its tendency to lose part of the information result in low segmentation accuracy for small structures. Ronneberger et al. proposed the U-Net [13] method, based on FCN [15], which applies a fully convolutional network to medical image segmentation. Unfortunately, FCN [15] and U-Net [13] can only be used for the identification and segmentation of two-dimensional images, whereas 3D U-Net [16] and V-Net [10] can process three-dimensional images. Of the two, V-Net [10] has become the primary method of medical image segmentation due to its high training speed and short completion time. In this study, despite the significant individual variation in the size, pose, and shape of pelvic LNs and their sparsely distributed locations, we demonstrate that V-Net’s outstanding performance can be extended to the challenging task of LN segmentation by utilizing an ensemble strategy.
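One common way to realize an ensemble strategy for segmentation is to average the per-voxel foreground probabilities of several models and then threshold the mean. The sketch below illustrates that idea only. It is an assumption, not a description of the ensemble used in this study, and it represents each probability map as a flat list for simplicity.

```python
def ensemble_average(prob_maps, threshold=0.5):
    """Average per-voxel foreground probabilities across models, then binarize."""
    if not prob_maps:
        raise ValueError("need at least one probability map")
    n_voxels = len(prob_maps[0])
    if any(len(m) != n_voxels for m in prob_maps):
        raise ValueError("all probability maps must have the same length")
    n_models = len(prob_maps)
    # Mean probability per voxel over all models in the ensemble.
    mean_probs = [sum(m[i] for m in prob_maps) / n_models for i in range(n_voxels)]
    return [1 if p >= threshold else 0 for p in mean_probs]
```

Averaging tends to suppress voxels that only a single model marks as foreground, which is one reason ensembles are often used for small, sparse targets.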
Liu et al. [14] developed a 3D U-Net model that can detect and segment all pelvic LNs on DWI images. The model achieved a high recall of 0.98 for identifying suspicious LNs. However, their dataset was limited and lacked external validation. In a similar vein, Zhao et al. [17] presented an innovative autoLNDS model to detect and segment LNs with a short diameter greater than 3 mm on MR examinations (T2-weighted imaging and DWI). Their external testing showed that the model achieved a sensitivity, PPV, and FP/vol of 62.6%, 64.5%, and 8.2, respectively, which is comparable to our results. However, their dataset (293 patients) was small compared with the datasets used for natural-image detection tasks. Their training and internal testing datasets were acquired on scanners from a single MR vendor at one medical center, which limits the variability of the dataset. In contrast, our model development dataset included 1,151 patients scanned on eight MR scanners at a single hospital, while the external validation dataset included 401 patients scanned on seven scanners at four hospitals. This dataset is large and heterogeneous compared with those of similar studies, which enhances the robustness and generalizability of our model.
Radiomics technology holds promise in predicting pelvic LN metastasis in various malignancies, including PCa [18, 19, 20, 21]. Radiomics-based models for predicting pelvic LN metastasis typically involve a multistep process, including segmentation of the region of interest (ROI), extraction of quantitative features, feature selection, and model building. Researchers have several choices of ROI, including the prostate gland, PCa foci, or LNs. Among these options, LNs are the most frequently investigated ROI. A fundamental premise of these studies is that all LNs are segmented first. Our study represents an initial step towards this goal: it provides an automated method for delineating the ROI of LNs and thus addresses the current reliance on manual delineation at this stage.
While the model achieved acceptable accuracy in detecting patients with suspicious metastases, further improvements are needed to increase its sensitivity at the individual LN level. False positives and false negatives are still common. Lymphadenopathies in the pelvis are highly heterogeneous in shape and size, which makes it difficult to accurately distinguish true LN regions from other regions. Furthermore, the relatively small size of LN lesions in comparison to the background volume creates an imbalance that further complicates segmentation. This imbalance also produces a large number of FPs, because the model cannot reliably separate LNs from high-intensity mimics, which ultimately lowers the overall specificity of the segmentation process. While larger LNs tend to produce better segmentation results [17], LNs with obvious swelling and necrosis carry a risk of FN detection. This can be especially problematic in cases of diffuse PCa that occupies most of the pelvic cavity. To address the issue of imbalanced data, we utilized the Dice coefficient as the loss function in the 3D V-Net algorithm. We also manually annotated all visible LNs to capture as many specific voxel details as possible. In analyzing the results, we discovered instances where the model correctly predicted LNs that the reference standard had failed to annotate. In the annotation process of the reference standard, the junior radiologist provided a fresh perspective and attention to detail, while the expert radiologist provided valuable insights and corrections. Despite the limitations of manual annotation, which can vary both within and between operators, it remains the most reliable method for accurate image segmentation, and there is currently no viable substitute. Our findings suggest that V-Net can be an effective tool for LN segmentation despite the challenges posed by the complex nature of these lesions.
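To illustrate why a Dice-based loss helps with class imbalance, the following minimal sketch implements a soft Dice loss on flattened probability and target vectors. It assumes the squared-denominator form introduced with the original V-Net and a small smoothing constant `eps`. The exact variant used in this study is not specified here.

```python
def soft_dice_loss(probs, target, eps=1e-6):
    """1 - soft Dice, with squared terms in the denominator (V-Net style)."""
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    intersection = sum(p * g for p, g in zip(probs, target))
    denominator = sum(p * p for p in probs) + sum(g * g for g in target)
    # eps keeps the ratio defined when both prediction and target are empty.
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)
```

With a target containing a single foreground voxel among 1,000, an all-background prediction is 99.9% accurate voxel-wise but still receives a loss close to 1, because the loss depends only on overlap with the foreground and not on the number of correctly labeled background voxels.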
Several limitations of our study should be acknowledged. First, our study lacked one-to-one confirmation of LNs between MR and surgical pathology. This challenge arises from the selective use of PLND in clinical practice, particularly for patients with low-risk PCa or metastatic disease, in whom PLND may not be routinely recommended. This does not diminish the validity of our current findings. Future studies may benefit from incorporating histopathologically confirmed metastatic LNs for further analysis of model performance. Second, while our reference standards were established by a senior radiologist, inviting reputable senior radiologists from well-known clinical centers could enhance the credibility of our study by establishing a more robust reference standard. Third, we focused on the feasibility of multi-device image segmentation of pelvic LNs. However, we did not address the relative nature of MRI signal intensity, nor did we apply any corrections to minimize discrepancies between different scanners and magnets. Incorporating these measures in future studies may enhance the reliability of the model results.
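As an illustration of the kind of correction mentioned in the third limitation, per-volume z-score normalization is one simple and widely used option for reducing scanner-dependent intensity scaling. The sketch below is a hypothetical example operating on a flat list of voxel intensities. It was not applied in this study.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities):
    """Rescale one volume's intensities to zero mean and unit variance."""
    mean = fmean(intensities)
    std = pstdev(intensities)
    if std == 0:
        # A constant volume carries no contrast; map every voxel to 0.
        return [0.0 for _ in intensities]
    return [(value - mean) / std for value in intensities]
```

Two volumes that differ only by a global scale factor, such as `[100, 200, 300]` and `[1000, 2000, 3000]`, map to the same normalized values, which is the property that makes this a candidate for harmonizing relative MRI intensities.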