{"ID":2839688,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15694","arxiv_id":"2511.15694","title":"The Impact of Quantization on Large Reasoning Model Reinforcement Learning","abstract":"Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their quantization-aware RL optimized counterparts. Our findings suggest that quantization-aware RL training negatively impacted the learning process, whereas PTQ and QLoRA led to greater performance.","short_abstract":"Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) re...","url_abs":"https://arxiv.org/abs/2511.15694","url_pdf":"https://arxiv.org/pdf/2511.15694v1","authors":"[\"Medha Kumar\",\"Zifei Xu\",\"Xin Wang\",\"Tristan Webb\"]","published":"2025-11-19T18:50:58Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
