Harmonization of brain PET images in multi-center PET studies using Hoffman phantom scan

In this work, we assessed a framework for the harmonization of brain PET images using Hoffman phantom scans and the EIR as the harmonization criterion. Using this method, we showed the feasibility of attaining PET scans of comparable quality, as assessed quantitatively and qualitatively, in multi-site imaging networks. Our complementary results showed that this method can be used prospectively to harmonize scans acquired on state-of-the-art PET systems to a sharper target EIR. The results confirmed that our proposed harmonization protocol is robust against dose-calibrator errors. We were also able to validate the proposed framework across different systems, including PET/MR systems, which pose challenges for harmonization due to their specific attenuation correction methods. Despite this, our method produced comparable quantitative metrics between PET/MR and PET/CT systems.

We showed that EIR is a robust criterion for harmonizing brain PET images acquired in different centers. EIR is a global metric that compares the theoretical activity concentration per voxel, simulated by smoothing the DRO with different Gaussian filters, with the experimental activity concentration in the phantom PET image. In other words, EIR is the FWHM of the three-dimensional Gaussian filter that provides the best fit between the theoretical and experimental activity concentration values. One advantage of EIR is that all voxels are included in its calculation, so it simultaneously accounts for signal degradation, image uniformity, and spill-in and spill-out between the different compartments of the Hoffman phantom (representing GM, WM, and ventricles). This characteristic makes EIR a robust parameter for image harmonization, as it is not sensitive to the presence of small bubbles, slight shape differences between Hoffman phantoms, or dose-calibrator errors. We additionally modeled EIR as a symmetrical Gaussian filter with the same FWHM in the x, y (trans-axial), and z (axial) directions. This is a simplification, as the resolution of a PET system in the axial direction is higher than the radial and tangential resolutions. However, it is reasonable to assume uniform smoothing in brain imaging, as the patient is located in the center of the system, where the field of view is fairly uniform in all directions. As the coarsest estimated EIR was around 8 mm across different PET scans, the target EIR was selected to be 8 mm. All PET images were harmonized to the target EIR by smoothing those with better spatial resolution. This reduced both the standard deviation of COV% across PET images and the COV% itself to the acceptable level (COV% ≤ 15%) (Table 3). It should be noted that data with sharper EIR have higher COV% values, and higher variability is observed across scans with similar EIRs. As a consequence, most of these images do not meet the COV% acceptance criterion for optimal image quality, as COV% exceeds 15% in most of them (Additional file 2: Figure_S5).
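
The EIR definition above (the Gaussian FWHM that best maps the DRO onto the measured phantom image) can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration: the toy profile with GM, WM, and ventricle values of 1.0, 0.25, and 0.0, the grid of candidate FWHMs, the least-squares criterion, and the function names. The helper `harmonization_kernel_fwhm` further assumes that both the system response and the post-filter are Gaussian, so that their FWHMs combine in quadrature; the kernel estimation actually used in the study may differ. A real implementation would smooth the 3D DRO volume, for example with `scipy.ndimage.gaussian_filter`.

```python
import math

# Conversion factor between FWHM and standard deviation of a Gaussian.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel(fwhm_mm, voxel_mm):
    """Normalized 1D Gaussian kernel for a given FWHM (mm) and voxel size (mm)."""
    sigma_vox = fwhm_mm * FWHM_TO_SIGMA / voxel_mm
    radius = max(1, int(math.ceil(4 * sigma_vox)))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, fwhm_mm, voxel_mm):
    """Convolve a 1D profile with a Gaussian, replicating the edge values."""
    kernel = gaussian_kernel(fwhm_mm, voxel_mm)
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * profile[j]
        out.append(acc)
    return out


def estimate_eir(dro, measured, voxel_mm, candidates_mm):
    """Return the candidate FWHM whose smoothed DRO is closest to the
    measured profile in the least-squares sense, using all voxels."""
    def sum_squared_error(fwhm_mm):
        model = smooth(dro, fwhm_mm, voxel_mm)
        return sum((m - p) ** 2 for m, p in zip(model, measured))
    return min(candidates_mm, key=sum_squared_error)


def harmonization_kernel_fwhm(eir_mm, target_mm):
    """Extra Gaussian smoothing needed to go from eir_mm to target_mm,
    assuming Gaussian FWHMs add in quadrature (illustrative assumption)."""
    if eir_mm >= target_mm:
        return 0.0  # image is already at or beyond the target resolution
    return math.sqrt(target_mm ** 2 - eir_mm ** 2)


# Toy 1D "phantom": WM, GM, ventricle, GM, WM (hypothetical values).
dro = [0.25] * 20 + [1.0] * 10 + [0.0] * 10 + [1.0] * 10 + [0.25] * 20
voxel_mm = 1.0
measured = smooth(dro, 5.0, voxel_mm)         # pretend the scanner has a 5 mm EIR
candidates = [0.5 * x for x in range(2, 21)]  # 1.0 mm to 10.0 mm in 0.5 mm steps

eir = estimate_eir(dro, measured, voxel_mm, candidates)
kernel = harmonization_kernel_fwhm(eir, target_mm=8.0)
print(f"estimated EIR: {eir} mm, extra smoothing: {kernel:.2f} mm FWHM")
```

Because every voxel of the profile enters the fit, the estimate reflects spill-in and spill-out at all compartment boundaries at once rather than the value in a single region.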

In this study, contrast, GMRC, COV%, cold-spot RC, and left-to-right GMRC ratio were used as complementary indicators for evaluating the performance of the harmonization method. Our results indicated that matching EIR across different systems produced comparable quantitative image quality metrics irrespective of the system model and reconstruction settings (Table 3 and Fig. 4). Notably, four PET/MR systems were included in the harmonization procedure, which allowed us to validate the feasibility of harmonizing PET/MR systems in multi-center studies. Contrast and GMRC have previously been recommended for harmonizing brain PET images in multi-center studies [18, 19]. While keeping contrast and GMRC within the lower and upper acceptance limits helps to achieve optimal image quality and reduce variability, it does not necessarily result in harmonized images. The main reason for the insufficient performance of contrast and GMRC as harmonization criteria lies in the way these metrics are defined. Contrast is calculated as the ratio of the mean activity concentration in the gray matter VOI to the mean activity concentration in the white matter VOI, so the noise properties are canceled out by this division. GMRC is the mean activity concentration in the GM mask divided by the true activity at the start of the PET scan, where the GM mask is large enough to minimize the effect of noise. In other words, it is possible to obtain PET images with similar contrast and GMRC but different noise levels (Fig. 4). Another limitation of using contrast and GMRC as harmonization metrics is the way the GM and WM VOIs are defined. For example, some studies used eroded WM and GM VOIs for extracting contrast and GMRC. The main disadvantage of that approach is that quantitative metrics extracted from eroded VOIs are based on a limited number of voxels. That is, quantification in small VOIs can be sensitive to noise, and such VOIs normally represent the part of the image that is minimally affected by signal degradation and the partial volume effect. However, the main aim of harmonizing PET images is to achieve the same level of signal degradation across different systems. In our quantification of un-harmonized PET images, fifteen PET scans (44.11%) complied with the contrast and GMRC limits for both eroded and non-eroded VOIs recommended by Verwer et al. However, COV% values ranging from 10.64 to 28.19% (18.09 ± 4.50%) were observed for these PET scans, showing a high level of heterogeneity and confirming that contrast and GMRC are not sufficient for reducing between-scanner variability. After harmonization, although the variability of the quantitative metrics was reduced significantly, none of these metrics complied with the recommended limits [18]. In a previous study, the acceptance limits for the quantitative criteria were defined using optimized reconstruction protocols, meaning that these limits are not applicable to scans reconstructed with protocols deviating from the optimal reconstruction or to historical scans acquired on older systems.
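
The point that two images can share contrast and GMRC while differing in noise can be made concrete with a small sketch. The contrast and GMRC definitions follow the text above. COV% is taken here as the standard deviation divided by the mean of the voxel values in a VOI, which is the usual definition but is an assumption, since this section does not spell it out. The VOI means, noise levels, voxel counts, and function names are all hypothetical.

```python
import random
import statistics


def contrast(gm_voxels, wm_voxels):
    """Ratio of mean GM activity concentration to mean WM activity concentration."""
    return statistics.fmean(gm_voxels) / statistics.fmean(wm_voxels)


def gmrc(gm_voxels, true_activity):
    """Gray matter recovery coefficient: mean GM activity over true activity."""
    return statistics.fmean(gm_voxels) / true_activity


def cov_percent(voxels):
    """Coefficient of variation in percent (assumed definition: SD / mean * 100)."""
    return 100.0 * statistics.pstdev(voxels) / statistics.fmean(voxels)


def simulate_voi(mean, noise_sd, n_voxels, rng):
    """Draw voxel values with Gaussian noise around a given mean (toy model)."""
    return [rng.gauss(mean, noise_sd) for _ in range(n_voxels)]


rng = random.Random(0)   # fixed seed so the toy example is repeatable
true_activity = 1.0      # hypothetical true GM activity concentration

# Two hypothetical scans with the same mean recovery but different noise.
quiet_gm = simulate_voi(0.80, 0.05, 5000, rng)
quiet_wm = simulate_voi(0.30, 0.02, 5000, rng)
noisy_gm = simulate_voi(0.80, 0.20, 5000, rng)
noisy_wm = simulate_voi(0.30, 0.08, 5000, rng)

for name, gm, wm in (("quiet", quiet_gm, quiet_wm), ("noisy", noisy_gm, noisy_wm)):
    print(f"{name}: contrast={contrast(gm, wm):.2f} "
          f"GMRC={gmrc(gm, true_activity):.2f} COV%={cov_percent(gm):.1f}")
```

Both toy scans yield nearly the same contrast and GMRC because these are ratios of means over large VOIs, while COV% differs by roughly the ratio of the simulated noise levels.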

The RC of cold VOIs (regions with low uptake) is an important quantitative metric, and its accuracy depends on the spatial resolution as well as the scatter correction algorithm [27]. Since scatter correction algorithms differ across vendors, they could produce different values in low-uptake VOIs. In this study, we evaluated the accuracy of the cold-spot RC before and after harmonization. As the cold spot was defined in a VOI without uptake, a recovery coefficient of zero was expected. According to our results, cold-spot RCs were very close to zero for the majority of the centers (mean: 0.04, 95% CI 0.04–0.05), confirming similar performance of scatter correction across different vendors and PET system models. However, a cold-spot RC of 0.08 was observed for one PET/MR system, which could be due to errors in the attenuation correction map generated for the PET/MR phantom scan. Uniformity of the PET images across the field of view was measured using the left-to-right GMRC ratio. Given that the system should provide uniform performance across the FOV, a left-to-right GMRC ratio of 1 is expected for all systems. In this study, the left-to-right GMRC ratio was in the expected range and no significant difference was observed among different systems (1.02 ± 0.01, 95% CI 1.01–1.02). Additionally, as expected, harmonizing PET images had minimal effect on the mean cold-spot RC (mean ± SD difference: 0.003 ± 0.002) as well as on the right-to-left hemisphere GMRC ratios (mean ± SD difference: 0.005 ± 0.004) (Fig. 6). Although the highest acceptable value for GMRC is one, we observed GMRC_erod > 1 for five of the un-harmonized PET scans. Similar behavior was observed for contrast_erod, where contrast_erod > 4 was observed for two PET scans. This level of overestimation can be explained by either a high level of noise propagation or the presence of Gibbs artifacts due to the reconstruction settings. However, after harmonizing PET images, GMRC_erod (0.89–0.98) and contrast_erod (2.62–3.53) fell below these upper limits, which is attributable to the post-smoothing filter reducing the noise effect and suppressing the Gibbs artifact (Table 3).

Fig. 6. Bland–Altman plots comparing differences between RCs for A global GM, B left hippocampus, C left cuneus, D left-to-right RC ratio, and E cold-spot RC before and after harmonization. Dashed lines represent 95% confidence intervals, and solid lines represent the line of equality.

Our results on the effect of the harmonization method on quantitative metrics confirmed that the proposed methodology is capable of harmonizing PET data and producing comparable quantitative metrics across different PET images (Fig. 4). Despite the capacity of this methodology to minimize differences in PET data, one potential caveat is that smoothing the PET images to the poorest EIR decreases their sensitivity for detecting small changes. However, the present results indicated that global GMRC was only minimally reduced (by 0.06 ± 0.03) after harmonization, and good recovery coefficients were maintained for both large VOIs such as the cuneus and small VOIs such as the hippocampus (Fig. 6), while the spread of COV% across sites was drastically reduced. The largest decrease in the quantitative metrics after harmonization was observed for a PET image reconstructed with PSF modeling: global RC and left cuneus RC decreased by 0.12, and left hippocampus RC decreased by 0.24. The reason for this substantial decrease is that the reconstruction settings produced a pronounced Gibbs artifact, which caused signal overshoot, especially in small VOIs like the hippocampus. Harmonization of these data improved image quality by removing the Gibbs artifact as well as producing comparable images across different centers.

Different levels of error between the dose-calibrator-estimated activity and the image-derived activity concentration were observed. Our results showed that the ratio of dose-calibrator to image-derived activity concentration ranged between 0.73 and 1.27 across the AMYPAD imaging network, representing overestimation or underestimation of activity concentrations across different centers (Fig. 3). These discrepancies can be explained by different sources of error, such as errors in phantom preparation and in estimating the stock solution volume, calibration errors, or quality control errors of the dose calibrator. Introducing a ± 10% error in the image-derived activity applied to the DRO for estimating the EIR and harmonization kernel had a negligible impact on both the EIR (~ 1 mm) and the estimated harmonization kernel (~ 1.5 mm), confirming that our methodology for estimating the EIR and harmonization kernel is stable and robust to the expected level of error (Fig. 5). Image-derived activity is a suitable substitute for dose-calibrator activity, as it avoids including dose-calibrator-based errors in the harmonization procedure. However, to calculate image-derived activity, it is important to include only gray matter voxels that are far from the edge of the GM mask, to avoid any possible contribution of the Gibbs artifact when point spread function modeling is used in the reconstruction. Additionally, only voxels that are minimally or not affected by the partial volume effect and signal degradation should be used for estimating image-derived activity. In this framework, the DRO was smoothed using a Gaussian filter with an FWHM of 8 mm, and only voxels with intensity above 0.98 were considered pure gray matter for calculating image-derived activity. This filter was selected because the coarsest EIR of a commercial PET system is about 8 mm, and to avoid underestimating image-derived activity, it is necessary to include only voxels that are minimally affected by PVE across all available PET systems. Also, voxels with intensity above 0.98 are far from the edge of the GM VOI, which eliminates the possibility of overestimating activity due to the Gibbs artifact.
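
The voxel selection described above (smooth the DRO with an 8 mm FWHM Gaussian, keep voxels above 0.98, and average the PET signal over them) can be sketched in one dimension as follows. The 8 mm filter and the 0.98 threshold come from the text. The toy profile, the 1 mm voxel size, the simulated 5 mm scanner blur, the activity value, the 20% dose-calibrator bias, and all function names are hypothetical choices for illustration, not the implementation used in the study.

```python
import math


def gaussian_smooth(profile, fwhm_mm, voxel_mm=1.0):
    """Smooth a 1D profile with a Gaussian of the given FWHM (edge values replicated)."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_mm
    radius = int(math.ceil(4 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    n = len(profile)
    return [
        sum(w * profile[min(max(i + k - radius, 0), n - 1)]
            for k, w in enumerate(weights)) / total
        for i in range(n)
    ]


def pure_gm_mask(dro, fwhm_mm=8.0, threshold=0.98):
    """Voxels whose smoothed DRO value stays above the threshold, i.e. GM voxels
    far enough from any edge to be nearly free of partial volume effect."""
    return [value > threshold for value in gaussian_smooth(dro, fwhm_mm)]


def image_derived_activity(pet, mask):
    """Mean PET value over the voxels selected by the mask."""
    selected = [value for value, keep in zip(pet, mask) if keep]
    if not selected:
        raise ValueError("mask selects no voxels")
    return sum(selected) / len(selected)


# Toy 1D DRO normalized so that GM = 1.0 (WM = 0.25, ventricle = 0.0).
dro = [0.25] * 30 + [1.0] * 40 + [0.0] * 30

true_activity = 20.0  # hypothetical GM activity concentration, arbitrary units
pet = [true_activity * v for v in gaussian_smooth(dro, 5.0)]  # simulated 5 mm scanner

mask = pure_gm_mask(dro)
estimate = image_derived_activity(pet, mask)

# A dose calibrator that reads 20% high (hypothetical) shows up in the ratio.
dose_calibrator_value = 1.2 * true_activity
ratio = dose_calibrator_value / estimate
print(f"image-derived activity: {estimate:.2f}, calibrator/image ratio: {ratio:.2f}")
```

Because the mask is built at the coarsest expected resolution, the selected voxels stay nearly unaffected by blur on any sharper system, so the image-derived estimate tracks the true activity and the ratio isolates the dose-calibrator error.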

In this study, the brain PET harmonization framework was implemented in two clinical trials of AMYPAD, and our results confirmed the feasibility of using this method for both PET/CT and PET/MR systems. Most of the imaging centers in the AMYPAD network had EARL accreditation, and a NEMA phantom scan had been conducted at most imaging sites to evaluate the quantitative performance of the system before starting clinical PET scans. As a result, the level of variability in the AMYPAD imaging network is probably lower than in general clinical settings. In our framework, the true activity concentration was calculated using a data-driven metric, without any need for cumbersome preprocessing steps. The EIR of each PET system was estimated using the DRO provided by a software tool developed for the automated analysis of Hoffman phantom PET images. This toolbox takes only the phantom PET image and scan information as inputs and produces many image quality and quantitative metrics as outputs. It enabled us to calculate several quantitative metrics of the phantom PET scans within a consistent framework for evaluating the PET quantitative criteria across different centers before and after harmonization.
