{"ID":2845195,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04084","arxiv_id":"2511.04084","title":"When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation","abstract":"Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net like architecture that integrates rational-function based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST","short_abstract":"Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on...","url_abs":"https://arxiv.org/abs/2511.04084","url_pdf":"https://arxiv.org/pdf/2511.04084v2","authors":"[\"Nishchal Sapkota\",\"Haoyan Shi\",\"Yejia Zhang\",\"Xianshi Ma\",\"Bofang Zheng\",\"Fabian Vazquez\",\"Pengfei Gu\",\"Danny Z. Chen\"]","published":"2025-11-06T05:44:57Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","has_code":false,"code_links":[{"ID":607352,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2845195,"paper_url":"https://arxiv.org/abs/2511.04084","paper_title":"When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation","repo_url":"https://github.com/nsapkota417/UKAST","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
