{"ID":2880794,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14004","arxiv_id":"2508.14004","title":"GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks","abstract":"Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with learnable bit-width, noise scale and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.","short_abstract":"Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casti...","url_abs":"https://arxiv.org/abs/2508.14004","url_pdf":"https://arxiv.org/pdf/2508.14004v2","authors":"[\"Sergey Salishev\",\"Ian Akhremchik\"]","published":"2025-08-19T17:05:26Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IT\",\"math.NA\"]","methods":"[]","has_code":false}
