{"ID":2842382,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10507","arxiv_id":"2511.10507","title":"AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following","abstract":"Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF (we will release this benchmark soon), a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs' ability to follow complex, multi-turn, and system-level instructions. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.","short_abstract":"Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hinder...","url_abs":"https://arxiv.org/abs/2511.10507","url_pdf":"https://arxiv.org/pdf/2511.10507v2","authors":"[\"Yun He\",\"Wenzhe Li\",\"Hejia Zhang\",\"Songlin Li\",\"Karishma Mandyam\",\"Sopan Khosla\",\"Yuanhao Xiong\",\"Nanshu Wang\",\"Xiaoliang Peng\",\"Beibin Li\",\"Shengjie Bi\",\"Shishir G. Patil\",\"Qi Qi\",\"Shengyu Feng\",\"Julian Katz-Samuels\",\"Richard Yuanzhe Pang\",\"Sujan Gonugondla\",\"Hunter Lang\",\"Yue Yu\",\"Yundi Qian\",\"Maryam Fazel-Zarandi\",\"Licheng Yu\",\"Amine Benhalloum\",\"Hany Awadalla\",\"Manaal Faruqui\"]","published":"2025-11-13T17:14:01Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
