STADNet: Spatial-Temporal Attention-Guided Dual-Path Network for cardiac cine MRI super-resolution

Cardiac magnetic resonance imaging (MRI) is considered the gold standard for both visual and quantitative evaluation of heart function (Blansit et al., 2019). In particular, cine balanced steady-state free precession (bSSFP) is known for providing high myocardium-blood pool contrast, making it ideal for the assessment of left ventricular function (Feng et al., 2013). Nevertheless, MRI suffers from long acquisition times that often require averaging across multiple cardiac cycles (Xue et al., 2013). Balancing spatial resolution, temporal resolution, and scan time is challenging, as radiologists strive to tailor acquisition time for optimal clinical observation (Wang and Ying, 2013). Small acquisition matrices are necessary for real-time imaging (Setser et al., 2000), but conventional up-scaling techniques such as Fourier-domain zero padding and bicubic interpolation (Ashikaga et al., 2014; Bernstein et al., 2001) can result in the loss of fine details and blurred edges, degrading image quality.

To address the aforementioned issues, researchers have utilized deep learning-based methods (Dong et al., 2015; Ledig et al., 2017; Chaudhari et al., 2018) that directly learn end-to-end mappings from large sets of paired low-resolution (LR) and high-resolution (HR) images. However, most existing algorithms are single-image SR methods, which are suboptimal for cardiac cine MRI because they do not effectively leverage the relationship among consecutive frames. Researchers have therefore applied video super-resolution (VSR) approaches to the cine MRI SR problem (Masutani et al., 2020; Steeden et al., 2020; Lin et al., 2020; Upendra and Linte, 2022). Nevertheless, as shown in Fig. 1, these methods have two shortcomings. First, they feed the 2D+1D (time) cine MRI directly into a 3D convolutional neural network (CNN) to learn the mapping between LR and HR images, without further mining the connections between frames. Second, the convolutional kernel has a limited receptive field and cannot capture long-range or non-local features, which are crucial in cardiac regions with complex anatomical structures. To address these limitations, transformer-based VSR methods built on optical flow, such as the video restoration transformer (VRT) (Liang et al., 2022a) and the recurrent video restoration transformer (RVRT) (Liang et al., 2022b), can be considered. Using an optical flow estimator to capture motion information and align adjacent frames is a common video restoration strategy (Makansi et al., 2017; Su et al., 2017; Xue et al., 2019; Pan et al., 2020). Flow-based approaches adopt a pre-warping strategy: an optical flow estimator first generates motion offsets that warp adjacent frames, so that regions corresponding to the same object but misaligned across adjacent images or feature maps are brought into alignment.
Although the pre-warping strategy based on optical flow achieves impressive performance in natural video restoration, it has limitations when applied to cardiac cine MRI. First, pre-warping assumes that the motion between adjacent frames is smooth and can be accurately estimated by optical flow; this assumption may not hold in cardiac cine MRI because of the complex and irregular motion patterns of the heart. Second, pre-warping may introduce additional artifacts into the restored images due to inaccuracies in the optical flow estimation and the interpolation used for warping.
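For intuition, the pre-warping step used by flow-based methods amounts to backward warping: each output pixel is sampled from the adjacent frame at a location displaced by the estimated flow, with bilinear interpolation at sub-pixel positions. The following is a generic NumPy sketch for single-channel frames (illustrative only; not code from any of the cited methods):

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp `frame` (H, W) toward the reference frame using a
    dense optical-flow field `flow` (H, W, 2) holding (dx, dy) offsets.
    Bilinear interpolation handles sub-pixel sample locations."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # Each output pixel samples the source frame at (x + dx, y + dy).
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# A uniform one-pixel rightward flow makes each output pixel sample its
# right-hand neighbor (clipped at the border).
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
warped = warp_with_flow(frame, flow)
```

The interpolation in the last three lines of the function is exactly the step that can smear fine structure when the estimated flow is inaccurate, which is the second limitation noted above.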

To address these problems, we propose a novel approach, the Spatial-Temporal Attention-Guided Dual-Path Network (STADNet), for cardiac cine MRI SR. STADNet consists of two paths: a location-aware spatial path that employs information from neighboring frames to enhance the spatial details of the current frame, and a motion-aware temporal path that uses an optical flow-guided strategy to exploit the correlation between cine MR frames and extract the motion information of the heart. We adopt the Swin Transformer (Liu et al., 2021) as the backbone of our network because of its ability to capture long-range and non-local features, surpassing the constraints of conventional convolutional kernels. Its shifted-window attention mechanism and hierarchical structure make it effective at detailed feature modeling and well suited to tasks that require a nuanced understanding of both local and global image relationships. For the location-aware spatial path, we design a position-weighted cross-frame attention module that improves the reconstruction of the texture and anatomical structures of the current frame by referring to its adjacent frames. For the motion-aware temporal path, we design a recurrent flow-enhanced attention strategy: rather than pre-warping, our alignment method samples key features in consecutive cine frames to compute attention, comprehensively preserving the prior information in cardiac cine MRIs. To capture long-range temporal dependencies more effectively, we make the flow-based attention recurrent, inspired by previous RNN-based methods (Qin et al., 2018; Zhong et al., 2020). Finally, we concatenate the features output by the two paths, which makes full use of the correlation between frames and effectively suppresses artifacts and blurring.
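The core idea of cross-frame attention can be made concrete with a minimal, generic sketch: queries come from the current frame's feature tokens, while keys and values come from a neighboring frame, so the current frame borrows detail from wherever the neighbor matches it best. The position-weighting term `w_pos` below is a hypothetical additive bias standing in for the position weighting described above, not the module's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(cur, ref, w_pos=None):
    """Cross-frame attention: queries from the current frame's tokens
    `cur` (N, d); keys/values from a neighboring frame's tokens `ref`
    (M, d). `w_pos` (N, M) is an optional position-weighting bias added
    to the attention logits (hypothetical form)."""
    d = cur.shape[-1]
    logits = cur @ ref.T / np.sqrt(d)          # (N, M) similarity scores
    if w_pos is not None:
        logits = logits + w_pos
    return softmax(logits, axis=-1) @ ref      # (N, d) aggregated features

rng = np.random.default_rng(0)
cur = rng.standard_normal((4, 8))   # tokens of the current frame
ref = rng.standard_normal((6, 8))   # tokens of an adjacent frame
out = cross_frame_attention(cur, ref)
```

Each output row is a convex combination of neighbor-frame tokens, which is why attention-based alignment avoids the explicit interpolation step of pre-warping.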
Our contributions can be summarized as follows:

We propose a novel framework for cardiac cine MRI super-resolution, named STADNet, in which we highlight:
− Swin-transformer groups to acquire deep features from cine MRIs, essential for modeling the long-range dependencies in such images with complex anatomical patterns.
− A position-weighted cross-frame attention strategy to exploit the correlation between cardiac cine MR frames for restoring the texture and anatomical structures of the current frame.
− A recurrent flow-enhanced attention module to employ the information of neighboring cardiac frames to enhance the fine details of the current cardiac frame.
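As a toy illustration of the recurrent mechanism behind the third point (not the actual recurrent flow-enhanced attention module), a hidden state can be propagated along the cine sequence so that each frame's output mixes its own features with accumulated history from earlier frames:

```python
import numpy as np

def recurrent_aggregate(frame_feats, alpha=0.5):
    """Toy recurrent propagation over a cine sequence: a hidden state
    carries information from earlier frames forward, so each output
    mixes the current frame's features with accumulated history.
    `frame_feats` has shape (T, d); `alpha` weights the history."""
    hidden = np.zeros_like(frame_feats[0])
    outputs = []
    for feat in frame_feats:
        hidden = alpha * hidden + (1 - alpha) * feat
        outputs.append(hidden.copy())
    return np.stack(outputs)

# With constant input, the hidden state converges toward the input,
# showing how later frames accumulate context from earlier ones.
feats = np.ones((3, 2))
out = recurrent_aggregate(feats)
```

This exponential-moving-average update is the simplest stand-in for an RNN-style state; in the actual model the per-frame update is attention-based and flow-guided.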

We extensively evaluate our model, STADNet, on two datasets: the public ACDC dataset and a private in-house dataset. Our results demonstrate that STADNet outperforms state-of-the-art approaches, showing its effectiveness and potential for clinical practice.
