{"ID":2854503,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14420","arxiv_id":"2510.14420","title":"Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following","abstract":"Language models often struggle to follow multi-constraint instructions that are crucial for real-world applications. Existing reinforcement learning (RL) approaches suffer from dependency on external supervision and sparse reward signals from multi-constraint tasks. We propose a label-free self-supervised RL framework that eliminates dependency on external supervision by deriving reward signals directly from instructions and generating pseudo-labels for reward model training. Our approach introduces constraint decomposition strategies and efficient constraint-wise binary classification to address sparse reward challenges while maintaining computational efficiency. Experiments show that our approach generalizes well, achieving strong improvements across 3 in-domain and 5 out-of-domain datasets, including challenging agentic and multi-turn instruction following. The data and code are publicly available at https://github.com/Rainier-rq/verl-if","short_abstract":"Language models often struggle to follow multi-constraint instructions that are crucial for real-world applications. Existing reinforcement learning (RL) approaches suffer from dependency on external supervision and sparse reward signals from multi-constraint tasks. \nWe propose a label-free self-supervised RL framework...","url_abs":"https://arxiv.org/abs/2510.14420","url_pdf":"https://arxiv.org/pdf/2510.14420v4","authors":"[\"Qingyu Ren\",\"Qianyu He\",\"Powei Chang\",\"Jie Zeng\",\"Zeye Sun\",\"Fei Yu\",\"Jiaqing Liang\",\"Yanghua Xiao\"]","published":"2025-10-16T08:24:44Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608161,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854503,"paper_url":"https://arxiv.org/abs/2510.14420","paper_title":"Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following","repo_url":"https://github.com/Rainier-rq/verl-if","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
