CACTUSS: Common Anatomical CT-US Space for US examinations

CACTUSS (cf. Fig. 1) consists of three main phases: (I) a joint anatomical IR generator, (II) an image-to-image translation network, and (III) an aorta segmentation network.

Phase I. Joint anatomical IR generator For the generation of the common anatomical space, we use a hybrid US simulator [9] available in ImFusion Suite (see footnote 2) and originally defined by Burger et al. [10]. The simulator uses a ray-tracing approach to model the behavior of sound waves as they propagate through different anatomical tissues, using a predefined set of parameters specific to each tissue. An ultrasound wave can be approximated as a ray, and its propagation can therefore be computed using the known laws of wave physics. Beginning at the transducer, the ray is traced through the different tissues. During traversal, the ray is partially absorbed, resulting in attenuation of its amplitude, which is modeled using a tissue-specific attenuation coefficient. Additionally, reflections occur at the boundary between two tissues, and the strength of the returned signal depends on the difference in acoustic impedance between the two adjacent tissues. By modeling these phenomena, the intensity at each point along the ray's travel path is computed.
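To make these two effects concrete, the following minimal sketch (not the simulator's actual implementation, which is part of ImFusion Suite) marches a single ray through a stack of tissue labels, attenuating the amplitude with a tissue-specific coefficient and emitting an echo at each impedance boundary. All tissue values here are illustrative placeholders, not the parameters of Table 1.

```python
import numpy as np

# Minimal 1-D ray-marching sketch: exponential amplitude attenuation
# inside a tissue, partial reflection at tissue boundaries.
TISSUES = {
    # name: (acoustic impedance Z [MRayl], attenuation alpha [dB/(cm*MHz)])
    "fat":    (1.38, 0.63),
    "liver":  (1.65, 0.50),
    "vessel": (1.61, 0.18),
}

def march_ray(labels, step_cm=0.05, freq_mhz=3.5):
    """Return the echo intensity recorded at each step along one ray.

    labels: sequence of tissue names, one per spatial step along the ray.
    """
    amplitude = 1.0                      # transmitted amplitude, normalized
    echoes = np.zeros(len(labels))
    for i, name in enumerate(labels):
        z, alpha = TISSUES[name]
        # Attenuation: alpha in dB/(cm*MHz), converted to a linear factor.
        db_loss = alpha * freq_mhz * step_cm
        amplitude *= 10.0 ** (-db_loss / 20.0)
        # Reflection at the boundary between two different tissues.
        if i + 1 < len(labels) and labels[i + 1] != name:
            z_next = TISSUES[labels[i + 1]][0]
            r = ((z_next - z) / (z_next + z)) ** 2  # intensity reflection coeff.
            echoes[i] = (amplitude ** 2) * r        # reflected intensity
            amplitude *= np.sqrt(1.0 - r)           # transmitted remainder
    return echoes

ray = ["fat"] * 40 + ["liver"] * 60 + ["vessel"] * 20
print(march_ray(ray).round(6).max())
```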

Input to the simulator is a three-dimensional labelmap in which each label represents a tissue or an organ and is assigned six acoustic parameters: speed of sound c, acoustic impedance Z, attenuation coefficient \(\alpha \), and three speckle parameters (listed in Table 1). The 3D position of the simulated probe is determined using positional and directional splines in the hybrid US simulator software. In CACTUSS, we set the tissue-specific speckle distribution parameters (\(\mu _0\), \(\mu _1\), \(\sigma _0\)) to zero, rendering the tissues black and highlighting only their boundaries. The parameters in Table 1 are empirically selected to model specular reflection, emphasizing the geometrical boundaries crucial for segmentation. Subwavelength reflections (scattering) are omitted, as they can lead to amplitude variations in the B-mode image, potentially obscuring gross anatomy and adversely affecting segmentation accuracy. Further, Table 2 lists the simulation parameters that describe the characteristics of the simulated US machine and allow the mapping from the CT domain to the ultrasound domain.
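In code, the CACTUSS-specific configuration amounts to zeroing the three speckle parameters of every tissue entry before simulation. A schematic sketch follows; the field names and numeric values are our own placeholders, not the ImFusion API or the values of Table 1.

```python
# Schematic per-tissue parameter table; keys mirror Table 1 conceptually.
tissue_params = {
    "liver": {"c": 1570.0, "Z": 1.65, "alpha": 0.50,
              "mu0": 0.4, "mu1": 0.2, "sigma0": 0.1},
    "blood": {"c": 1570.0, "Z": 1.61, "alpha": 0.18,
              "mu0": 0.1, "mu1": 0.05, "sigma0": 0.01},
}

def to_cactuss(params):
    """Zero the speckle parameters so only tissue interfaces remain visible."""
    out = {}
    for label, p in params.items():
        q = dict(p)
        q["mu0"] = q["mu1"] = q["sigma0"] = 0.0
        out[label] = q
    return out

print(to_cactuss(tissue_params)["liver"])
```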

In this way, we create a virtual imaging modality that provides important characteristics of ultrasound, such as tissue interfaces, from annotated CT. This has the advantage that a large number of IR samples can be generated from a single CT scan. Moreover, using a US simulator ensures that the anatomical IRs have anisotropic properties, thereby preserving the direction-dependent nature of US imaging, a fundamental characteristic of the modality.

Table 1 Ultrasound simulator tissue parameters for CACTUSS: c (speed of sound), Z (acoustic impedance), \(\alpha \) (attenuation coefficient)

Phase II. Image-to-image translation We address the domain shift between the anatomical intermediate representations (IRs) and real ultrasound B-mode images by learning a mapping that retains the patient-specific anatomical characteristics of each image. To translate real ultrasound images into the IR, we utilize CUT, a recent contrastive learning network for unpaired image-to-image translation [11].

The CUT network applies the principle that a patch in the translated image should be strongly associated with its spatially corresponding patch in the source image, compared to all other patches within the source image. The generator function \(G: \mathcal{X} \mapsto \mathcal{Y}\) is responsible for transforming input-domain images \(\mathcal{X}\) to resemble output-domain images \(\mathcal{Y}\). The training samples consist of unpaired source images \(X = \{x \in \mathcal{X}\}\) and target images \(Y = \{y \in \mathcal{Y}\}\).

The generator G comprises an encoder \(G_{\textrm{enc}}\) and a decoder \(G_{\textrm{dec}}\) that are applied sequentially to the input, resulting in the synthesized output \(\widehat{y} = G(x) = G_{\textrm{dec}}(G_{\textrm{enc}}(x))\). The encoder \(G_{\textrm{enc}}\) extracts content characteristics, while the decoder \(G_{\textrm{dec}}\) learns to create the desired appearance by employing a patch contrastive loss [11]. This approach ensures that the generated samples have the appearance of the IR while preserving the underlying anatomical structure of the input ultrasound image.
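For illustration, a minimal PyTorch sketch of such a patch-wise contrastive objective (in the spirit of CUT's PatchNCE loss [11]) is shown below; feature extraction, layer selection, and the projection heads of the full method are simplified away, and random features stand in for encoder activations.

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_q, feat_k, tau=0.07):
    """Patch-wise contrastive loss sketch.

    feat_q: (N, C) features of N patches from the generated image G(x).
    feat_k: (N, C) features of the spatially corresponding patches of x.
    Each query's positive is its own location; all other locations in
    the same image act as negatives.
    """
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1)
    logits = feat_q @ feat_k.t() / tau       # (N, N) similarity matrix
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    return F.cross_entropy(logits, targets)  # diagonal entries = positives

q, k = torch.randn(64, 256), torch.randn(64, 256)
print(patch_nce_loss(q, k).item())
```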

Table 2 Ultrasound simulator machine parameters for CACTUSS

By employing the CUT network, we establish a mapping between real ultrasound B-mode images and the anatomical IR, so that the generated samples possess the visual properties of the IR while retaining the anatomical content derived from the original ultrasound images.

Phase III. Aorta segmentation In the final stage, we train a segmentation network with a U-Net architecture, using only the samples obtained in phase I, to perform aorta segmentation on the intermediate-space images. Notably, the labels for training can be extracted directly from the CT slices, eliminating the need for manual labeling; the proposed CACTUSS method therefore requires no manual ultrasound image annotations.

Experimental setup

Data

We employ two image domains: IR space images and in-vivo images. To generate the IR images, we used eight partially labeled CT volumes from the publicly available Synapse dataset (see footnote 3), to which annotations for bones, fat, skin, and lungs were added to complete the labelmaps. From these CT volumes, 5000 IR samples of 256 \(\times \) 256 pixels were simulated and used as the training set for the segmentation network in step 3 (see Fig. 1). From this simulated IR dataset, a subset of 500 images was randomly selected as domain Y for training the CUT network, with an equal number chosen from each CT to ensure balanced representation.

The second domain comprises in-vivo images obtained from ten volunteers (six males and four females) with an average age of 26 ± 3 years. Using a convex probe (CPCA19234r55) on a cQuest Cicada US scanner (Cephasonics, Santa Clara, CA, USA), ten abdominal US sweeps were acquired. From each sweep, 50 randomly sampled frames of 256 \(\times \) 256 pixels were selected, resulting in 500 samples used as domain X for training the CUT network. Additionally, to ensure data integrity, training and testing datasets were kept strictly separate.

To evaluate the segmentation network, which was trained solely on intermediate representations, a test set of 100 real images manually labeled by a medical expert was used, comprising 10 random frames per volunteer. Additionally, to compare against a fully supervised approach, a network was trained with eightfold cross-validation and a patient-wise split. For this purpose, supplementary images were annotated, with each fold consisting of 50 images from one of eight subjects. We expanded our dataset to 11 subjects, allowing a separate evaluation on three hold-out subjects. Furthermore, additional images were acquired from another ultrasound machine for further evaluation: a set of 23 images from a volunteer not included in the existing datasets, acquired with an ACUSON Juniper ultrasound system (Siemens Healthineers, Erlangen, Germany) and a 5C1 convex probe, and subsequently annotated.

Moreover, in order to validate the applicability of our approach in the context of abdominal aortic aneurysm (AAA), we downloaded 13 random US images with AAA from the internet. To emulate the presence of AAA in the CT labelmaps, we dilated the aorta label to three distinct sizes, guided by the medical literature: small AAA (3–4.5 cm), medium AAA (4.5–5.5 cm), and large AAA (greater than 5.5 cm). Subsequently, we quantified the deformation \(\phi_{\textrm{AAA}}\) between the initial healthy aorta and the simulated AAA by registering the two corresponding meshes. To propagate the effect of the dilated aorta onto the surrounding organs, we extrapolated \(\phi_{\textrm{AAA}}\) across the entire CT image using a radial basis function interpolator. To preserve the anatomical properties of the adjacent organs, we took the rigidity of the spine into account. Consequently, the resulting deformation accurately mimics an AAA-affected aorta and deforms the neighboring soft tissue accordingly.
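A condensed sketch of this extrapolation step, assuming SciPy's RBFInterpolator and sparse surface displacements already obtained from the mesh registration (random stand-ins below), could look as follows. Pinning zero displacement at spine sample points is one simple way to encode the spine's rigidity; the paper's exact scheme may differ.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
# Sparse displacements phi_AAA, known on the aorta surface vertices.
surf_pts = rng.uniform(0, 100, size=(200, 3))   # aorta mesh vertices [mm]
surf_disp = rng.normal(0, 2, size=(200, 3))     # their displacements [mm]

# Zero displacement at spine samples keeps the spine rigid.
spine_pts = rng.uniform(0, 100, size=(50, 3))
pts = np.vstack([surf_pts, spine_pts])
disp = np.vstack([surf_disp, np.zeros((50, 3))])

# Extrapolate the deformation to every voxel centre of the CT grid.
rbf = RBFInterpolator(pts, disp, kernel="thin_plate_spline", smoothing=1.0)
grid = np.stack(np.meshgrid(*[np.arange(0, 100, 10.0)] * 3,
                            indexing="ij"), axis=-1).reshape(-1, 3)
dense_disp = rbf(grid)                          # (n_voxels, 3) field
print(dense_disp.shape)
```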

Training

We train the CUT network for 70 epochs with a learning rate of \(10^\) and otherwise default hyperparameters, and the U-Net [12] for 50 epochs with a learning rate of \(10^\), the Adam optimizer, Dice loss, and a batch size of 64. We apply data augmentations such as rotation, translation, scaling, and noise, and randomly partition the data into an 80/20% ratio for training and validation, respectively. To evaluate the full model's performance, we employ a separate test set of 100 in-vivo images. For evaluation and early stopping of CUT, we use the Fréchet inception distance (FID) [13], a metric that measures the dissimilarity between the feature distributions of two image sets, in our case real and IR images, using feature vectors extracted from the Inception network. We compute FID on the second layer, consider the epochs with the top three FID scores, and perform a qualitative selection among them.
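As a reference for the segmentation objective, a minimal soft Dice loss sketch is shown below; this is one common formulation, and the paper does not specify its exact variant.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    pred:   (B, 1, H, W) sigmoid probabilities from the U-Net.
    target: (B, 1, H, W) binary ground-truth masks.
    """
    pred, target = pred.flatten(1), target.flatten(1)
    inter = (pred * target).sum(dim=1)
    denom = pred.sum(dim=1) + target.sum(dim=1)
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()

# Toy check: a perfect prediction gives a loss near zero.
t = (torch.rand(4, 1, 256, 256) > 0.5).float()
print(dice_loss(t, t).item())
```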

Table 3 Evaluation of DSC in % and MAE in mm for the task of aorta segmentation: our proposed method against a supervised approach on Cephasonics and Juniper samples

Experiments

In order to evaluate the proposed framework, we conducted a series of quantitative experiments:

Segmentation performance We compare against a supervised network to evaluate the accuracy of the proposed method. Specifically, we conduct a standard eightfold cross-validation with a U-Net architecture, where each fold consists of 50 in-vivo images from a single subject. In each round of cross-validation, seven subjects' images are used for training and one subject's images for validation. The best model is then tested on three hold-out subjects, excluded from training and validation, and the average Dice Similarity Coefficient (DSC) from this testing is reported; a sketch of such a patient-wise split follows.
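A patient-wise split of this kind can be expressed, for example, with scikit-learn's LeaveOneGroupOut; the sketch below only illustrates the fold construction, not the training itself.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# 8 cross-validation subjects with 50 frames each; 3 subjects held out.
frames = np.arange(8 * 50)             # indices of the 400 in-vivo images
subject = np.repeat(np.arange(8), 50)  # patient id per frame

for fold, (train_idx, val_idx) in enumerate(
        LeaveOneGroupOut().split(frames, groups=subject)):
    # 7 subjects train, 1 subject validates; no frame-level leakage.
    assert set(subject[train_idx]).isdisjoint(subject[val_idx])
    print(f"fold {fold}: validation subject {subject[val_idx][0]}")
```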

Fig. 2 Example images of alternative IRs compared against the proposed CACTUSS IR. Left to right: edge detection on a CT slice, US simulation, CACTUSS IR image. The accompanying table compares the segmentation DSC obtained with each alternative IR

Clinical applicability We assess the clinical applicability of the proposed method by measuring the anterior–posterior diameter of the aorta in millimeters, following established clinical practice [5]. Mean Absolute Error (MAE) and standard deviation are calculated by comparing the measurements obtained from CACTUSS and from the supervised segmentation against ground-truth labels. For a medical diagnosis of AAA, an error of less than 8 mm is considered acceptable [5].
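One simple way to extract an anterior–posterior diameter from a binary segmentation mask is sketched below, under the assumption that image rows run along the AP axis; the paper's exact measurement procedure may differ.

```python
import numpy as np

def ap_diameter_mm(mask, row_spacing_mm):
    """Anterior-posterior diameter from a binary aorta mask.

    mask: (H, W) binary array; row_spacing_mm: pixel height in mm.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0
    # Vertical extent of the mask in each image column; the maximum
    # extent is taken as the AP diameter.
    extent_px = max(np.ptp(rows[cols == c]) + 1 for c in np.unique(cols))
    return float(extent_px) * row_spacing_mm

# Toy check: a 40-row-tall rectangle with 0.3 mm pixels -> 12 mm.
m = np.zeros((256, 256)); m[100:140, 120:160] = 1
print(ap_diameter_mm(m, 0.3))
```

The MAE then follows by averaging the absolute differences between diameters computed from predicted and ground-truth masks over the test set.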

Robustness To evaluate the robustness of our method against domain shift, we acquire images from a different US machine, as described in Sect. "Experimental setup", and again compare against the supervised network.

Different intermediate representations We evaluate the sensitivity of the proposed method to the choice of intermediate representation (IR) by replacing the common anatomical IR with two alternatives. The first is generated by applying a Canny edge detector, a bilateral filter, and subsequently a convex US probe fan mask to CT slices. The second is a physics-based ultrasound simulation generated from the CT labelmap with default simulator parameters. These alternative IRs are used in the same manner as our proposed IR images to train both the segmentation network and the CUT network. The experiment is evaluated by reporting the Dice Similarity Coefficient (DSC) on 100 in-vivo frames passed first through the trained CUT model and then through the trained segmentation model, comparing the final segmentation result with ground-truth labels annotated by experts.
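A sketch of the first alternative IR is given below, assuming one plausible ordering of the operations (bilateral smoothing before edge detection) and illustrative thresholds and fan geometry; none of these values are taken from the paper.

```python
import cv2
import numpy as np

def edge_ir(ct_slice, fan_mask):
    """Alternative IR: bilateral smoothing + Canny edges + convex fan mask.

    ct_slice: (H, W) uint8 CT slice; fan_mask: (H, W) binary probe fan.
    """
    smooth = cv2.bilateralFilter(ct_slice, d=9, sigmaColor=75, sigmaSpace=75)
    edges = cv2.Canny(smooth, threshold1=50, threshold2=150)
    return edges * fan_mask

# Simple fan-mask sketch: an annular sector below a virtual probe apex.
h, w = 256, 256
yy, xx = np.mgrid[0:h, 0:w]
r = np.hypot(yy + 40, xx - w / 2)                  # apex 40 px above the image
ang = np.degrees(np.arctan2(xx - w / 2, yy + 40))  # 0 deg points straight down
fan = ((r > 40) & (r < 260) & (np.abs(ang) < 27)).astype(np.uint8)

ir = edge_ir(np.random.randint(0, 255, (h, w), np.uint8), fan)
```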

AAA experiment To further validate the applicability of CACTUSS, we conduct an additional experiment focusing on cases with abdominal aortic aneurysm (AAA). As described in Sect. "Experimental setup", the AAA images stem from different ultrasound machines, each with its own set of parameters. For this experiment, we retrained only the segmentation network from phase III, using the newly simulated AAA images. Segmentation accuracy is quantified by comparing the results obtained by CACTUSS with the ground-truth labels, and we again report DSC and MAE as performance metrics to assess the alignment between the predicted segmentations and the actual boundaries of the AAA pathology.

Experiment without IR We also conduct an experiment in which no IR is required. To achieve this, the CUT network is trained to translate CT images directly into US images, learning a direct mapping between the two modalities.
