{"ID":2852344,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18784","arxiv_id":"2510.18784","title":"CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training","abstract":"Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy, for similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights-and-activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation can be found at https://github.com/IST-DASLab/CAGE .","short_abstract":"Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware...","url_abs":"https://arxiv.org/abs/2510.18784","url_pdf":"https://arxiv.org/pdf/2510.18784v2","authors":"[\"Soroush Tabesh\",\"Mher Safaryan\",\"Andrei Panferov\",\"Alexandra Volkova\",\"Dan Alistarh\"]","published":"2025-10-21T16:33:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":607984,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2852344,"paper_url":"https://arxiv.org/abs/2510.18784","paper_title":"CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training","repo_url":"https://github.com/IST-DASLab/CAGE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}