{"ID":2847685,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.27630","arxiv_id":"2510.27630","title":"Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training","abstract":"Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them to succeed on long-horizon, domain-specialized tasks remains challenging. Current methods primarily fall into two categories. The first relies on dense human annotations through behavior cloning, which is prohibitively expensive for long-horizon tasks that can take days or months. The second depends on outcome-driven sampling, which often collapses due to the rarity of valid positive trajectories on domain-specialized tasks. We introduce Apollo, a sampling framework that integrates asynchronous human guidance with action-level data filtering. Instead of requiring annotators to shadow every step, Apollo allows them to intervene only when the agent drifts from a promising trajectory, by providing prior knowledge, strategic advice, etc. This lightweight design makes it possible to sustain interactions for over 30 hours and produces valuable trajectories at a lower cost. Apollo then applies supervision control to filter out sub-optimal actions and prevent error propagation. Together, these components enable reliable and effective data collection in long-horizon environments. To demonstrate the effectiveness of Apollo, we evaluate it using InnovatorBench. Our experiments show that when applied to train the GLM-4.5 model on InnovatorBench, Apollo achieves more than a 50% improvement over the untrained baseline and a 28% improvement over a variant trained without human interaction. These results highlight the critical role of human-in-the-loop sampling and the robustness of Apollo's design in handling long-horizon, domain-specialized tasks.","short_abstract":"Large Language Model (LLM) agents have recently shown strong potential in domains such as automated coding, deep research, and graphical user interface manipulation. However, training them to succeed on long-horizon, domain-specialized tasks remains challenging. Current methods primarily fall into two categories. The f...","url_abs":"https://arxiv.org/abs/2510.27630","url_pdf":"https://arxiv.org/pdf/2510.27630v2","authors":"[\"Dayuan Fu\",\"Yunze Wu\",\"Xiaojie Cai\",\"Lyumanshan Ye\",\"Shijie Xia\",\"Zhen Huang\",\"Weiye Si\",\"Tianze Xu\",\"Jie Sun\",\"Keyu Li\",\"Mohan Jiang\",\"Junfei Wang\",\"Qishuo Hua\",\"Pengrui Lu\",\"Yang Xiao\",\"Pengfei Liu\"]","published":"2025-10-31T17:00:22Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
