Medical image segmentation is a critical component of modern medical systems: it provides vital anatomical information for accurate diagnosis and effective treatment of disease. In the specific domain of cardiac magnetic resonance imaging (MRI), however, several challenges arise. First, cardiac MRI images exhibit non-uniform intensity distributions and blurry boundaries, which complicate automated segmentation. Second, physiological and anatomical differences among individuals are substantial, making it difficult for segmentation models to generalize across patients. Moreover, although clinicians can perform manual segmentation, this approach is prone to subjective factors and requires significant time investment. Higher-precision automated segmentation methods are therefore needed in intelligent healthcare systems. The current challenge is to develop highly accurate automatic segmentation models for cardiac MRI images, improving the performance of intelligent healthcare systems to better support disease diagnosis and treatment. Continued research and innovation in this field are expected to bring significant breakthroughs in the future.

At the same time, countries worldwide face medical resource shortages of varying degrees. Smart healthcare systems have the potential to alleviate some of the problems caused by this scarcity. These systems integrate advanced technologies such as artificial intelligence, big data analytics, and remote monitoring to enhance the efficiency and accessibility of healthcare services. Through remote medical consultations, automated diagnostic tools, and electronic health records, smart healthcare systems can provide patients with faster and more accurate medical diagnoses and treatments.
This is particularly significant for populations with limited access to traditional medical resources, such as those in remote or under-resourced regions. Additionally, smart healthcare systems improve the management and utilization of medical data, providing more reliable information support for medical decision-making. This helps healthcare institutions allocate resources more effectively, enhance healthcare quality, and reduce medical costs.
Medical image segmentation often faces the challenge of scarce annotated data, primarily due to the complexity, expertise, and high costs associated with the annotation process. Medical images require pixel-level annotations by experienced radiologists, which is time-consuming and labor-intensive. Additionally, inconsistencies in annotation standards across institutions limit the availability of high-quality datasets. Patient privacy protection policies further restrict data sharing and open access, exacerbating data scarcity. In scenarios with limited samples or rare disease cases, insufficient annotations lead to model overfitting or poor generalization, hindering accurate segmentation of complex anatomical structures or lesion regions. Moreover, imbalanced data distribution (e.g., skewed ratios of normal to abnormal samples) weakens model performance, necessitating tailored strategies to optimize segmentation outcomes.
At present, most models used to process medical images are variants or improvements of the Unet [1] framework. Although their improvement strategies differ, they retain the U-shaped structure of Unet, comprising an encoder, a decoder, and the skip connections linking the two. In Unet, the skip connections are crucial: they fuse the simple features captured by shallow layers (such as boundaries and colors) with the intricate semantic features captured by deep layers with larger receptive fields. In recent years, U-shaped algorithms have made considerable progress in medical image segmentation, and a series of outstanding models have emerged. Some, such as Unet++ [2], skunet [3], Unet3+ [4], and Attention Unet [5], focus on modifying the skip connections of Unet; others redesign the encoding structure, such as TransUnet [6], Dense Unet [7], SA Unet [8], and Swin-Unet [9]. These methods all take single 2D slices as input. There are also models that operate on 3D volumes, such as 3D Unet [10], nnFormer [11], 3D U-Net [12], and V-Net [13]. All of these models have demonstrated outstanding performance in medical image segmentation, confirming the unique advantages of the U-shaped architecture in this domain.
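The skip-connection fusion described above can be illustrated with a minimal NumPy sketch: a deep, low-resolution decoder feature map is upsampled and concatenated channel-wise with the shallow encoder feature map at the same resolution. The function names and tensor sizes here are illustrative, not taken from any particular Unet implementation.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(shallow, deep):
    # Fuse a shallow encoder feature map with an upsampled deep
    # decoder feature map by channel-wise concatenation, as done
    # at each skip connection of a U-shaped network.
    up = upsample2x(deep)
    assert up.shape[1:] == shallow.shape[1:], "spatial sizes must match"
    return np.concatenate([shallow, up], axis=0)

# Toy example: 64-channel shallow features at 32x32 fused with
# 128-channel deep features at 16x16.
shallow = np.random.rand(64, 32, 32)
deep = np.random.rand(128, 16, 16)
fused = skip_connect(shallow, deep)
print(fused.shape)  # (192, 32, 32)
```

In a real network, the concatenated tensor would then pass through further convolutions that learn to combine the low-level boundary cues with the high-level semantics.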
Approaches based on the U-shaped architecture have shown very promising results in medical image segmentation. However, they have not yet been widely adopted in clinical medicine because of the strict requirements on segmentation accuracy. While CNN-based approaches have made great progress, they struggle to incorporate global and long-range semantic information. These difficulties arise from the inherent locality of convolution operations, particularly for complex target structures with significant variations in texture, shape, and size. To overcome these limitations, some studies capture global contextual information by introducing attention mechanisms [14], [15], [16], [17], while others introduce feature pyramids that capture multi-scale features through pooling at different scales [18]. However, none of these methods effectively solves the problem of false segmentation caused by fuzzy mask boundaries, which ultimately stems from the misclassification of boundary pixels. In recent years, following the significant breakthroughs of Transformers in computer vision, more and more researchers have adopted Vision Transformer (ViT) methods. Transformers were originally widely used in natural language processing (NLP), but owing to their outstanding performance and excellent portability, they have gradually gained wide application in computer vision. Some researchers have incorporated Transformers into segmentation models to improve accuracy, but the issue of boundary segmentation errors remains unaddressed.
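The global-context property that attention mechanisms contribute, in contrast to the local receptive field of a convolution, can be sketched as plain scaled dot-product self-attention over flattened pixel positions. This is a minimal NumPy illustration with the learned query/key/value projections omitted; it is not the attention design of any of the cited models.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats):
    # feats: (N, d) — N flattened pixel positions with d channels.
    # Each output position aggregates information from ALL positions,
    # giving the global receptive field that plain convolutions lack.
    d = feats.shape[1]
    q = k = v = feats                     # learned projections omitted
    scores = q @ k.T / np.sqrt(d)         # (N, N) pairwise affinities
    return softmax(scores, axis=-1) @ v   # (N, d) globally mixed features

# Toy example: a 16x16 feature map with 32 channels, flattened to 256 tokens.
pixels = np.random.rand(16 * 16, 32)
out = self_attention(pixels)
print(out.shape)  # (256, 32)
```

The quadratic (N, N) affinity matrix is also why ViT-style models are costly at high resolutions, one motivation for the hybrid CNN–Transformer designs mentioned above.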
Inspired by DeepFlux [19], TextField [20], and DFM [21], this paper introduces a novel model called DF-TransUnet for the segmentation of 2D medical images. Unlike previous methods, which extract deeper semantic features in various ways, this paper proposes a pixel-level boundary classification model that classifies each pixel at the boundary of the MR image to be segmented. Specifically, the main contributions of our model can be summarized as follows:
(1) A pixel-level medical image segmentation model, DF-TransUnet, is proposed, which correctly classifies each pixel along the boundaries of medical images.
(2) The proposed classification module has strong portability and can be embedded in various medical image segmentation models.
(3) We conducted extensive experiments on the Automated Cardiac Diagnosis Challenge (ACDC) dataset and the Synapse multi-organ segmentation dataset, comparing against different methods and visualizing the results.
The experimental results show that the model proposed in this paper significantly outperforms existing methods in both accuracy and robustness.