{"ID":2900842,"CreatedAt":"2026-06-01T05:51:17.9442275Z","UpdatedAt":"2026-06-01T06:23:29.641557848Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2605.30893","arxiv_id":"2605.30893","title":"Foundation VAEs for 3D CT Reconstruction, Augmentation, and Generation","abstract":"Variational autoencoders (VAEs) compress high resolution CT volumes into compact latents while preserving clinically relevant structure. However, training CT-specific VAEs from scratch or heavily fine-tuning them incurs substantial computational and engineering cost, and often degrades under heterogeneous scanners, protocols, and diseases. This paper makes a progressive stride toward training-free medical VAEs by leveraging a critical observation: a single Foundation VAE, pretrained at scale on natural images and videos, can serve as a unified interface for CT Reconstruction, Augmentation, and Generation. With both encoder and decoder frozen, the Foundation VAE reconstructs CT volumes with preserved anatomy while suppressing acquisition noise; training segmentation models on these reconstructions improves surface accuracy by 3.9% NSD on average for pancreatic tumor and lung tumor. Within the same Foundation VAE latent space, a conditional latent diffusion model achieves 3.9% lower average FVD with 36.2% higher CT CLIP score, and improves multi-disease generation faithfulness across 18 types by 2.76% AUC. These results demonstrate Foundation VAEs as a practical interface for scalable CT representation reuse and faithful CT generation. Our code and demo are available at https://github.com/qic999/Foundation-VAE.","short_abstract":"Variational autoencoders (VAEs) compress high resolution CT volumes into compact latents while preserving clinically relevant structure. However, training CT-specific VAEs from scratch or heavily fine-tuning them incurs substantial computational and engineering cost, and often degrades under heterogeneous scanners, pro...","url_abs":"https://arxiv.org/abs/2605.30893","url_pdf":"https://arxiv.org/pdf/2605.30893v1","authors":"[\"Qi Chen\",\"Shuhan Ding\",\"Yu Gu\",\"Nan Liu\",\"Jiang Bian\",\"Alan Yuille\",\"Zongwei Zhou\",\"Jingjing Fu\"]","published":"2026-05-29T06:28:57Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Variational Autoencoder\"]","has_code":false,"code_links":[{"ID":612530,"CreatedAt":"2026-06-01T05:51:17.9442275Z","UpdatedAt":"2026-06-01T05:51:17.9442275Z","DeletedAt":null,"paper_id":2900842,"paper_url":"https://arxiv.org/abs/2605.30893","paper_title":"Foundation VAEs for 3D CT Reconstruction, Augmentation, and Generation","repo_url":"https://github.com/qic999/Foundation-VAE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}