{"ID":2823148,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01285","arxiv_id":"2601.01285","title":"S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss","abstract":"Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware a trilemma that existing architectures fail to resolve. Although convolutional networks provide local precision at $\\mathcal{O}(n)$ cost but limited receptive fields, vision transformers achieve global context through $\\mathcal{O}(n^2)$ self-attention at prohibitive computational expense, causing overfitting on small clinical datasets. We propose S2M-Net, a 4.7M-parameter architecture that achieves $\\mathcal{O}(HW \\log HW)$ global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Segmentation Loss (MASL), which automatically analyzes structure characteristics (compactness, tubularity, irregularity, scale) to modulate five complementary loss components through constrained learnable weights, eliminating manual per-dataset tuning. Comprehensive evaluation in 16 medical imaging datasets that span 8 modalities demonstrates state-of-the-art performance: 96.12\\% Dice on polyp segmentation, 83.77\\% on surgical instruments (+17.85\\% over the prior art) and 80.90\\% on brain tumors, with consistent 3-18\\% improvements over specialized baselines while using 3.5--6$\\times$ fewer parameters than transformer-based methods.","short_abstract":"Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware a trilemma that existing architectures fail to resolve. Although convolutional networks provide loc...","url_abs":"https://arxiv.org/abs/2601.01285","url_pdf":"https://arxiv.org/pdf/2601.01285v2","authors":"[\"Md. Sanaullah Chowdhury Lameya Sabrin\"]","published":"2026-01-03T21:03:54Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":false}
