SWINUNET: A HYBRID SWIN TRANSFORMER-CNN ARCHITECTURE WITH ADAPTIVE FEATURE FUSION FOR BREAST ULTRASOUND LESION SEGMENTATION
Keywords:
Breast Cancer, Ultrasound Imaging, Image Segmentation, CNN, ViT, Swin Transformer

Abstract
Breast ultrasound imaging is a safe and cost-effective technique for lesion detection; however, accurate lesion segmentation remains difficult because of low contrast, speckle noise, and unclear boundaries. To address these limitations, this work presents SwinUNet, a novel hybrid model that combines the strong global modeling ability of a Swin Transformer encoder with the strong local modeling ability of a convolutional decoder. The core contribution is a novel Adaptive Feature Fusion (AFF) module embedded in the skip connections, which adaptively weights channel-wise features to highlight lesion-related feature maps while suppressing the background noise inherent to ultrasound images. SwinUNet was evaluated on the publicly available BUSI breast ultrasound dataset using five-fold cross-validation. It achieved a Mean Intersection over Union of 89.93% and a Global Accuracy of 97.68%, outperforming both traditional CNN models and baseline transformer models. Ablation studies further confirmed that the AFF module is crucial to these improvements. These findings demonstrate SwinUNet's capacity for accurate lesion delineation and support its use within computer-aided diagnostic tools to assist medical specialists.
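The abstract does not specify the internals of the AFF module; a common realization of channel-wise adaptive weighting in a skip connection is a squeeze-and-excitation-style gate applied to the concatenated encoder and decoder features. The sketch below, in plain NumPy, illustrates this assumed design: the function name `adaptive_feature_fusion` and the weight shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_feature_fusion(skip, dec, w1, w2):
    """Hypothetical sketch of channel-wise adaptive fusion in a skip
    connection (squeeze-and-excitation style gating, an assumption).

    skip, dec : encoder / decoder feature maps, each of shape (C, H, W)
    w1        : (hidden, 2C) reduction weights
    w2        : (2C, hidden) expansion weights
    Returns the fused, channel-reweighted map of shape (2C, H, W).
    """
    x = np.concatenate([skip, dec], axis=0)        # stack channels: (2C, H, W)
    desc = x.mean(axis=(1, 2))                     # squeeze: global avg pool -> (2C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ desc, 0))  # excitation: ReLU MLP + sigmoid -> (2C,)
    return x * gate[:, None, None]                 # reweight each channel
```

Channels with gate values near zero are effectively suppressed, which is one way the module could attenuate background speckle while passing lesion-related feature maps to the decoder.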
