{"ID":2856398,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.11484","arxiv_id":"2510.11484","title":"Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware","abstract":"Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of integer rescaling during inference, which is a hardware-costly operation in integer-only AI inference. This work shows that rescaling cost can be dramatically reduced post-training, by applying a stronger quantization to the rescale multiplicands at no model-quality loss. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with 8x reduced rescaler widths, the full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.","short_abstract":"Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of integer rescaling during inference, which is a hardware-costly operation in integer-...","url_abs":"https://arxiv.org/abs/2510.11484","url_pdf":"https://arxiv.org/pdf/2510.11484v1","authors":"[\"Lion Mueller\",\"Alberto Garcia-Ortiz\",\"Ardalan Najafi\",\"Adam Fuks\",\"Lennart Bamberg\"]","published":"2025-10-13T14:55:34Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AR\"]","methods":"[]","has_code":false}