{"ID":2874966,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03188","arxiv_id":"2509.03188","title":"Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images","abstract":"Segmentation of small and irregularly shaped abdominal organs, such as the adrenal glands in CT imaging, remains a persistent challenge due to severe class imbalance, poor spatial context, and limited annotated data. In this work, we propose a unified framework that combines variational reconstruction, supervised segmentation, and adversarial patch-based feedback to address these limitations in a principled and scalable manner. Our architecture is built upon a VAE-UNet backbone that jointly reconstructs input patches and generates voxel-level segmentation masks, allowing the model to learn disentangled representations of anatomical structure and appearance. We introduce a patch-based training pipeline that selectively injects synthetic patches generated from the learned latent space, and systematically study the effects of varying synthetic-to-real patch ratios during training. To further enhance output fidelity, the framework incorporates perceptual reconstruction loss using VGG features, as well as a PatchGAN-style discriminator for adversarial supervision over spatial realism. Comprehensive experiments on the BTCV dataset demonstrate that our approach improves segmentation accuracy, particularly in boundary-sensitive regions, while maintaining strong reconstruction quality. Our findings highlight the effectiveness of hybrid generative-discriminative training regimes for small-organ segmentation and provide new insights into balancing realism, diversity, and anatomical consistency in data-scarce scenarios.","short_abstract":"Segmentation of small and irregularly shaped abdominal organs, such as the adrenal glands in CT imaging, remains a persistent challenge due to severe class imbalance, poor spatial context, and limited annotated data. In this work, we propose a unified framework that combines variational reconstruction, supervised segme...","url_abs":"https://arxiv.org/abs/2509.03188","url_pdf":"https://arxiv.org/pdf/2509.03188v1","authors":"[\"Hania Ghouse\",\"Muzammil Behzad\"]","published":"2025-09-03T10:18:06Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.CV\"]","methods":"[\"Generative Adversarial Network\",\"Variational Autoencoder\"]","has_code":false}