{"ID":2859502,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06186","arxiv_id":"2510.06186","title":"RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback","abstract":"Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic scientific research development workflows. To address this gap, we present RECODE-H, a benchmark of 102 tasks from research papers and repositories that evaluates LLM agents through multi-turn interactions with LLM-simulated human feedback. It includes structured instructions, unit tests, and a five-level feedback hierarchy to reflect realistic researcher-agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE-H establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation.","short_abstract":"Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. 
Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic workflows of scientific research devel...","url_abs":"https://arxiv.org/abs/2510.06186","url_pdf":"https://arxiv.org/pdf/2510.06186v2","authors":"[\"Chunyu Miao\",\"Henry Peng Zou\",\"Yangning Li\",\"Yankai Chen\",\"Yibo Wang\",\"Fangxin Wang\",\"Yifan Li\",\"Wooseong Yang\",\"Bowei He\",\"Xinni Zhang\",\"Dianzhi Yu\",\"Hanchen Yang\",\"Hoang H Nguyen\",\"Yue Zhou\",\"Jie Yang\",\"Jizhou Guo\",\"Wenzhe Fan\",\"Chin-Yuan Yeh\",\"Panpan Meng\",\"Liancheng Fang\",\"Jinhu Qi\",\"Wei-Chieh Huang\",\"Zhengyao Gu\",\"Yuwei Han\",\"Langzhou He\",\"Yuyao Yang\",\"Yinghui Li\",\"Hai-Tao Zheng\",\"Xue Liu\",\"Irwin King\",\"Philip S. Yu\"]","published":"2025-10-07T17:45:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
