{"ID":2852869,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17700","arxiv_id":"2510.17700","title":"Elastic ViTs from Pretrained Models without Retraining","abstract":"Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes, forcing sub-optimal deployment choices under real-world constraints. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers, a new post-pretraining structured pruning method that enables elastic inference across a continuum of compute budgets. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm, does not require labeled data, generalizes to models without a classification head, and is retraining-free. Experiments on DINO, SigLIPv2, DeIT, and AugReg models demonstrate superior performance over state-of-the-art methods across various sparsities, requiring less than five minutes on a single A100 GPU to generate elastic models that can be adjusted to any computational budget. Our key contributions include an efficient pruning strategy for pretrained Vision Transformers, a novel evolutionary approximation of Hessian off-diagonal structures, and a self-supervised importance scoring mechanism that maintains strong performance without requiring retraining or labels. Code and pruned models are available at: https://elastic.ashita.nl/","short_abstract":"Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes, forcing sub-optimal deployment choices under real-world constraints. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers, a new post-pretraining structured pruning...","url_abs":"https://arxiv.org/abs/2510.17700","url_pdf":"https://arxiv.org/pdf/2510.17700v2","authors":"[\"Walter Simoncini\",\"Michael Dorkenwald\",\"Tijmen Blankevoort\",\"Cees G. M. Snoek\",\"Yuki M. Asano\"]","published":"2025-10-20T16:15:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","project_urls":"[\"https://elastic.ashita.nl/\"]","has_code":false,"code_links":[{"ID":608036,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2852869,"paper_url":"https://arxiv.org/abs/2510.17700","paper_title":"Elastic ViTs from Pretrained Models without Retraining","repo_url":"https://github.com/WalterSimoncini/SnapViT","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0},{"ID":608037,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2852869,"paper_url":"https://arxiv.org/abs/2510.17700","paper_title":"Elastic ViTs from Pretrained Models without Retraining","repo_url":"https://github.com/WalterSimoncini/SnapVit","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}