{"ID":2865538,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22637","arxiv_id":"2509.22637","title":"Variational Reasoning for Language Models","abstract":"We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.","short_abstract":"We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabili...","url_abs":"https://arxiv.org/abs/2509.22637","url_pdf":"https://arxiv.org/pdf/2509.22637v2","authors":"[\"Xiangxin Zhou\",\"Zichen Liu\",\"Haonan Wang\",\"Chao Du\",\"Min Lin\",\"Chongxuan Li\",\"Liang Wang\",\"Tianyu Pang\"]","published":"2025-09-26T17:58:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":609282,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2865538,"paper_url":"https://arxiv.org/abs/2509.22637","paper_title":"Variational Reasoning for Language Models","repo_url":"https://github.com/sail-sg/variational-reasoning","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
