{"ID":2861202,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03574","arxiv_id":"2510.03574","title":"Efficient Test-Time Scaling for Small Vision-Language Models","abstract":"Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models, at the cost of weaker generalization abilities and downstream task performance. These shortcomings could be addressed by test-time scaling techniques, but existing methods are typically computationally demanding, contradicting the resource-efficient design goals of small models. To address these limitations, we propose two novel and efficient test-time scaling strategies that leverage the model-internal features rather than external supervision: (i) Test-Time Augmentation (TTAug), which generates multiple augmented inputs and aggregates outputs at the token level without parameter updates, and (ii) Test-Time Adaptation (TTAdapt), which adapts model parameters during inference using consensus-based pseudolabels from TTAug. Through extensive experiments across nine benchmarks, we demonstrate consistent performance improvements while maintaining computational efficiency suitable for resource-constrained environments. The generality of our approach is demonstrated both within models at different scales and across different VLMs without additional tuning.","short_abstract":"Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models, at the cost of weaker generalization abilities and downstream task performance. These shortcomings could be addressed by test-time scaling techniques, but existing methods are typically computationally demanding, contr...","url_abs":"https://arxiv.org/abs/2510.03574","url_pdf":"https://arxiv.org/pdf/2510.03574v2","authors":"[\"Mehmet Onurcan Kaya\",\"Desmond Elliott\",\"Dim P. Papadopoulos\"]","published":"2025-10-03T23:49:06Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
