{"ID":3004639,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03965","arxiv_id":"2606.03965","title":"Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning","abstract":"Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit. In this paper, we propose Agentic Chain-of-Thought Steering (ACTS), which formulates reasoning steering as a Markov decision process where a controller agent adaptively steers a frozen reasoner during inference. At each step, the controller observes the reasoning trace and remaining thinking budget, then issues a steering action consisting of a reasoning strategy and a steering phrase that initiates the next reasoner step. This enables budget-aware strategy control for efficient reasoning while preserving the reasoner's generation continuity. We initialize the controller agent from our constructed synthetic steering trajectories with multi-budget augmentation, and further optimize it via reinforcement learning with budget-conditioned reward shaping. Experiments across multiple benchmarks show that ACTS matches full-thinking performance with substantial token savings, and enables controllable accuracy-efficiency trade-offs across different reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.","short_abstract":"Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks i...","url_abs":"https://arxiv.org/abs/2606.03965","url_pdf":"https://arxiv.org/pdf/2606.03965v1","authors":"[\"Yu Xia\",\"Zhouhang Xie\",\"Xin Xu\",\"Byungkyu Kang\",\"Prarit Lamba\",\"Xiang Gao\",\"Julian McAuley\"]","published":"2026-06-02T17:51:30Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612688,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-03T03:09:48.883664427Z","DeletedAt":null,"paper_id":3004639,"paper_url":"https://arxiv.org/abs/2606.03965","paper_title":"Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning","repo_url":"https://github.com/Andree-9/ACTS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
