90Y SPECT scatter estimation and voxel dosimetry in radioembolization using a unified deep learning framework

Figure 1 presents an overview of the framework, encompassing three main stages: scatter estimation, SPECT reconstruction and voxel-level dosimetry.

Fig. 1figure 1

Overview of deep learning framework for 90Y Bremsstrahlung SPECT scatter estimation and voxel dosimetry. Our framework consists of (1) a deep learning model for 90Y-SPECT/CT scatter estimation; (2) SPECT reconstruction with the deep learning-based scatter estimate; (3) a residual learning network for 90Y dose estimation, trained using high-resolution virtual phantoms

Virtual patient phantom generation

90Y SPECT/CT data for 18 patients who underwent 90Y-RE at our clinic were available for generation of virtual patient phantoms. The lesion contours defined manually by a radiologist and healthy liver (liver minus lesions) contours defined using automated tools were also available. The lesions covered a range of lesion-to-background uptake ratios, shapes, and sizes (median: 26.9 mL, range 5.7–932.1 mL) and represented both left and right lobe treatment. The patient SPECT data were reconstructed with in-house developed ordered-subsets expectation–maximization (OS-EM) software with attenuation, scatter correction and resolution recovery to define virtual patient (digital phantom) activity maps. Note that for the scatter estimation, we used the previous CNN described in [15]. The corresponding density maps were generated from the patient's CT using an experimentally derived calibration curve for CT-to-density conversion. All activity maps and density maps used for simulation were registered to CT image space (matrix size: 512 × 512 × 194, voxel size in mm: 0.98 × 0.98 × 2). Phantoms were divided into training/validation/testing sets for both networks to represent a range of activity distributions and patient sizes (see Additional file 1: Figs. S1–S3 and Additional file 1: Table S1).

Stage I: convolutional neural network-based estimation of SPECT scatter projectionsDatasets

We used MC-simulated SPECT projections from 6 virtual patients (6 × 128 projection views) to train and validate the scatter estimation CNN in the first stage, while the rest of 12 virtual patients were used for testing. To train/validate the network, we generated the ground truth (GT) scatter projections by running the SIMIND MC simulation code [21] that couples the digital phantoms with the SPECT/CT camera model based on parameters from our clinic (scanner: Siemens Intevo with HE collimators, crystal size: 5/8″, acquisition window: 105–195 keV, number of projection views: 128, matrix size: 128 × 80, pixel size in mm: 4.8 × 4.8). Approximately one billion photon histories were simulated per projection angle to generate data with low statistical noise. We kept only the central slices of each projection that were within the FOV, reducing the initial projection matrix size of 128 × 128 to 128 × 80 for each view angle. Poisson noise was added after scaling projections to count levels observed in the clinic, ~ 107 counts over all views.

Network

Our scatter estimation CNN (Fig. 2), similar to our previous network [16], took both the projected attenuation map (3D attenuation map projected to SPECT projection space via line integrals) and the scaled SPECT projection measurements individually as inputs and passed them to separate branches. Each of the input branches has three convolutional layers followed by a ReLU activation layer. The outputs of these two branches were then concatenated along the channel dimension and fed through three convolutional layers with ReLU activations. To generate the non-negative estimated scatter projection, a point-wise convolutional layer and an additional ReLU layer were applied at the end. To preserve the spatial dimensions, all convolutions, except for the final one, used a 3 × 3 kernel with zero padding of size 1 for each dimension. We trained the scatter estimation network by minimizing the pixel-wise mean square error (MSE) between the estimated and GT scatter projections from SIMIND, using the Adam optimizer [22] with 800 epochs and initial learning rate of 0.0001. We implemented the network in PyTorch and trained on NVIDIA V100 GPU.

Fig. 2figure 2

Architecture of CNN for 90Y Bremsstrahlung scatter estimation in SPECT projection space. Each blue box corresponds to a multi-channel (the number of channels is listed on the top of each box) feature map. The spatial dimensions are maintained at 128 × 80 through the entire network

Stage II: OS-EM for SPECT reconstruction

OS-EM [23] is a widely used algorithm for performing 3D SPECT reconstruction that aims to estimate the image (emission distribution) \(}\) from the noisy measurements (recordings of emitted particles) \(}\), which are often statistically modeled as:

$$\user2\sim }} + \overline}$$

where \(}\) represents the system model, including ray-dependent factors such as attenuation and detector efficiency, and \(}\) denotes the mean of background events such as scatter events. In this stage, we performed OS-EM reconstruction (number of subsets: 4, number of iterations: 16, matrix size: 128 × 128 × 80, voxel size in mm: 4.8 × 4.8 × 4.8) using our in-house MIRT program [24], with CT-based attenuation correction, CNN estimated scatter correction, as well as collimator-detector response compensation with no post-filtering. All reconstructed SPECT images were linearly interpolated and registered to CT image space afterward.

Stage III: generating the absorbed dose-rate map with deep residual learning networkDatasets

The 12 virtual patient phantoms that had gone through the test phase of the scatter estimation CNN were used for training/testing the dosimetry network (6 for training and 6 for testing). To generate the GT dose-rate maps to use as the training labels, we first ran the in-house Monte Carlo dosimetry code (dose planning method, DPM [25]) with the true activity and density maps corresponding to the phantom as the input (Phantom + MC Dose). For DPM, we simulated approximately one billion histories after considering statistical uncertainty. With this number of histories, the relative uncertainty in absorbed dose-rate was < 0.1% at organ and lesion level and < 1% at the voxel level for voxels within these structures. For use in our residual learning network described below, we also generated dose-rate maps by DVK convolution, where the 90Y kernels in water (1.0 g/cm3) were generated by DPM MC. To include all possible events, the beta particle kernel size was designed to be 23 × 23 × 13 (with voxel size 0.98 × 0.98 × 2 mm3) given the fact that the maximum range of 90Y beta particles in tissue is about 11 mm. SPECT reconstructions obtained from stage II were convolved with DVKs with fast Fourier transform (FFT). Density scaling was performed by dividing each voxel by the corresponding density value (g/cm3). To avoid the unreasonably high dose-rate estimation in extreme low-density regions, the DVK dose-rate with corresponding voxel density lower than 1.0 g/cm3 was set to 0. Such a cutoff was also used within DPM.

Network

The network takes the reconstructed SPECT (from Stage II) and CT images as inputs to generate the dose-rate map with high resolution (deblurred). The network implemented in this stage (Fig. 3) for the proposed 90Y dose-rate estimation was inspired by our previous implementation (DblurDoseNet) for 177Lu [20], and we further extended and fine-tuned the hyperparameters. Here, we use the physics-based DVK convolution approach to obtain a fast initial estimate of the dose-rate map. Our network was designed to have a residual structure that learned only the subtle differences between the GT dose-rate maps from MC and DVK dose-rate maps. The inputs to our residual learning network consist of packs of scaled activity maps (SPECT images reconstructed with CNN scatter correction, scaled so that all voxels sum up to a normalizing constant for faster and more stable convergence during training) and density maps. The input size was 512 × 512 × 11 (with voxel size 0.98 × 0.98 × 2 mm3) with 11 slices packed to capture all possible dose contributions from 90Y beta particles. The input packs were concatenated along the channel dimension and passed to a 3D convolutional layer-based (kernels with spatial size of 7 and depth of 5, 3, 3, respectively) feature extractor and 2D feature maps corresponding to the middle slice of the inputs were generated. The 2D feature maps were then fed to a 2D U-Net. The corresponding DVK dose-rate maps (provided to the stage III as an initial estimation of dose-rate maps for residual learning) were added to the outputs of the 2D U-Net to generate the outputs of the stage III, as well as the outputs of the entire framework: estimated dose-rate maps (density correction and inversely scaling was applied at the last layer). The dose-rate unit in the network output is nGy/MBq-sec, aligning with the unit of the training labels (GT dose-rate maps) up to a scaling factor. The network was trained to minimize the pixel-wise MSE between the GT dose-rate maps (Phantom + MC Dose) and the estimated dose-rate maps (with residual learning) on a single NVIDIA V100 GPU (batch size: 32, optimizer: Adam with initial learning rate of 0.001, epochs: 300). The training/validation curves converged visually after ~ 10 h of training, as illustrated in Additional file 1: Fig. S4. Note that we trained two networks in stage I and stage III separately and sequentially, without backpropagating through the OS-EM.

Fig. 3figure 3

Architecture of our residual learning network (DblurDoseNet) for SPECT-based absorbed dose-rate map generation. Each blue box corresponds to a multi-depth (the depth is listed on top of each box) feature map. The depths of kernels in the three 3D-convolution operations are 5, 5, 3, respectively. The input size was 512 × 512 × 11 with 11 adjacent slices to capture the dose contributions from 90Y beta particles. The spatial dimensions are maintained at 512 × 512 in all layers

Ablation study

As an ablation of our framework, instead of two networks for scatter and dosimetry separately we also investigated training and fine-tuning only the stage III model (DblurDoseNet) to perform both tasks. The same 6 training cases used previously to train the DblurDoseNet were used here. In this case, the single-stage model took SPECT reconstruction with no SC and density maps as inputs and used the DVK dose-rate map (FFT convolved DVK with SPECT w/no SC) for residual learning.

Metrics used for evaluation

Performance was evaluated for the total image and both the lesion and the healthy liver (treated liver minus the lesions) volumes of interest (VOIs) in the phantoms. Qualitatively, results were compared by visual assessment of images and line profiles. Quantitatively, the following metrics were used. Let \(n_\) denotes the total number of voxels in the VOI. The GT and estimated images are denoted by \(}\) and \(\hat}\), respectively.

Normalized mean absolute error (NMAE) is defined as:

$$} \triangleq \frac }}\left( ^ }} }_ - \mathop \sum \nolimits_^ }} \widehat}_}} }}} \right)} \right|}} }}\mathop \sum \nolimits_^ }} }_ }} \times 100\%$$

Normalized root mean squared error (NRMSE) is defined as:

$$} \triangleq \frac }}\mathop \sum \nolimits_^ }} \left( }_ - \widehat}_ }}} \right)^ } }} }}\mathop \sum \nolimits_^ }} }_^ } }} \times 100\%$$

The subscription indicates the ith voxel of the object.

We evaluated the CNN scatter projections using NRMSE relative to the GT scatter projections from SIMIND. The SPECT reconstructions with no SC and the CNN estimated SC were evaluated using NRMSE relative to the reconstructions with the true scatter estimates from SIMIND as the GT. The dose-rate maps were evaluated using both NMAE and NRMSE relative to the GT dose rate maps directly from the phantom (Phantom + MC Dose). We compared performance of dose-rate maps corresponding to (1) SPECT reconstruction with CNN scatter correction and DVK convolution (SPECT w/CNN SC + DVK) (2) SPECT reconstruction with CNN scatter correction and DPM MC dosimetry (SPECT w/CNN SC + MC Dose) and (3) SPECT reconstruction with CNN scatter correction and deep learning dosimetry (SPECT w/CNN + DblurDoseNet). All dosimetry evaluations were performed on dose-rate maps (output of the framework) instead of dose maps. For 90Y-RE, where there is no re-distribution of activity, conversion of dose-rate maps to dose-maps simply requires scaling by a constant factor, which is the time integral of the mono-exponential decay curve with a half-life equal to physical decay.

Torso phantom measurement

The torso phantom comprised a liver section filled with water, lung compartments filled with Styrofoam beads and water (simulating lung tissue density), and a spine insert for bone density. The liver had a volume of 1200 mL and incorporated three lesion inserts: a 29 mL ovoid (insert 1), a 16 mL sphere (insert 2), and an 8 mL sphere (insert 3). The phantom's total Y90 activity during imaging was 2.0 GBq. Activity concentrations were 6.4–7.8 MBq/mL for the liver inserts and 1.3 MBq/mL for the liver excluding inserts, aligning with clinical scenarios for Y90 RE. SPECT/CT acquisition lasted 30 min, ensuring a count level (9 million counts) consistent with standard patient studies. The ground truth activity map was generated by masking the CT image and assigning the true (uniform) activity concentration to each compartment.

Patient studies

90Y SPECT/CT images from 4 clinical patients (distinct from the patients used to generate virtual patient phantoms) were selected to represent clinical application. Patients covered diverse scatter conditions and dose-rate levels with injected activities ranging from 1.3 to 4.0 GBq. 90Y SPECT/CT scans were acquired with a duration of approximately 30 min within a few hours following the RE procedure on the same SPECT/CT system described previously. A total of 6 lesions (volume ranging from 4 to 59 mL) were segmented by the radiologist in these 4 patients. As in the phantom testing, patient measured SPECT projections and CT-based projected attenuation map were used to estimate scatter projections (Stage I) that were needed for OS-EM SPECT reconstruction (Stage II). DVK dose-rate maps (based on SPECT w/CNN SC) were produced, and packs of upstream SPECT images with CNN SC and density maps were subsequently input to the residual learning network to generate dose-rate maps (Stage III). In the absence of GT, to evaluate the dose-rate map of clinical patients, we compute line profiles and the lesion-to-background ratios. Here, the background VOIs were defined nearby in a uniform region of the liver with equivalent sizes and shapes to those of the target lesions.

留言 (0)

沒有登入
gif