{"ID":3083923,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:54:17.966829144Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05922","arxiv_id":"2606.05922","title":"Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts","abstract":"AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.","short_abstract":"AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment s...","url_abs":"https://arxiv.org/abs/2606.05922","url_pdf":"https://arxiv.org/pdf/2606.05922v1","authors":"[\"Wenbo Pan\",\"Shujie Liu\",\"Chin-Yew Lin\",\"Jingying Zeng\",\"Xianfeng Tang\",\"Xiangyang Zhou\",\"Yan Lu\",\"Xiaohua Jia\"]","published":"2026-06-04T09:26:00Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
