{"ID":2863833,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25136","arxiv_id":"2509.25136","title":"BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression","abstract":"Neural network compression techniques typically require expensive fine-tuning or search procedures, rendering them impractical on commodity hardware. Inspired by recent LLM compression research, we present a general activation-aware factorization framework that can be applied to a broad range of layers. Moreover, we introduce a scalable budgeted rank allocator that allows flexible control over compression targets (e.g., retaining 50% of parameters) with no overhead. Together, these components form BALF, an efficient pipeline for compressing models without fine-tuning. We demonstrate its effectiveness across multiple scales and architectures, from ResNet-20 on CIFAR-10 to ResNeXt-101 and vision transformers on ImageNet, and show that it achieves excellent results in the fine-tuning-free regime. For instance, BALF reduces FLOPs on ResNeXt-101 by 45% with only a 1-percentage-point top-1 accuracy drop.","short_abstract":"Neural network compression techniques typically require expensive fine-tuning or search procedures, rendering them impractical on commodity hardware. Inspired by recent LLM compression research, we present a general activation-aware factorization framework that can be applied to a broad range of layers. Moreover, we in...","url_abs":"https://arxiv.org/abs/2509.25136","url_pdf":"https://arxiv.org/pdf/2509.25136v2","authors":"[\"David González-Martínez\"]","published":"2025-09-29T17:50:29Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Large Language Model\"]","has_code":false}