{"ID":2849886,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23926","arxiv_id":"2510.23926","title":"Improving the Straight-Through Estimator with Zeroth-Order Information","abstract":"We study the problem of training neural networks with quantized parameters. Learning low-precision quantized parameters by enabling computation of gradients via the Straight-Through Estimator (STE) can be challenging. While the STE enables back-propagation, which is a first-order method, recent works have explored the use of zeroth-order (ZO) gradient descent for fine-tuning. We note that the STE provides high-quality biased gradients, and ZO gradients are unbiased but can be expensive. We thus propose First-Order-Guided Zeroth-Order Gradient Descent (FOGZO) that reduces STE bias while reducing computations relative to ZO methods. Empirically, we show FOGZO improves the tradeoff between quality and training time in Quantization-Aware Pre-Training. Specifically, versus STE at the same number of iterations, we show a 1-8\\% accuracy improvement for DeiT Tiny/Small, 1-2\\% accuracy improvement on ResNet 18/50, and 1-22 perplexity point improvement for LLaMA models with up to 0.3 billion parameters. For the same loss, FOGZO yields a 796$\\times$ reduction in computation versus n-SPSA for a 2-layer MLP on MNIST. Code is available at https://github.com/1733116199/fogzo.","short_abstract":"We study the problem of training neural networks with quantized parameters. Learning low-precision quantized parameters by enabling computation of gradients via the Straight-Through Estimator (STE) can be challenging. While the STE enables back-propagation, which is a first-order method, recent works have explored the...","url_abs":"https://arxiv.org/abs/2510.23926","url_pdf":"https://arxiv.org/pdf/2510.23926v1","authors":"[\"Ningfeng Yang\",\"Tor M. Aamodt\"]","published":"2025-10-27T23:14:59Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":607747,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2849886,"paper_url":"https://arxiv.org/abs/2510.23926","paper_title":"Improving the Straight-Through Estimator with Zeroth-Order Information","repo_url":"https://github.com/1733116199/fogzo","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
