{"ID":2854650,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14677","arxiv_id":"2510.14677","title":"When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks","abstract":"Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.","short_abstract":"Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilitie...","url_abs":"https://arxiv.org/abs/2510.14677","url_pdf":"https://arxiv.org/pdf/2510.14677v1","authors":"[\"Steffen Hagedorn\",\"Luka Donkov\",\"Aron Distelzweig\",\"Alexandru P. Condurache\"]","published":"2025-10-16T13:34:12Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.LG\",\"cs.MA\"]","methods":"[]","has_code":false,"code_links":[{"ID":608179,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854650,"paper_url":"https://arxiv.org/abs/2510.14677","paper_title":"When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks","repo_url":"https://github.com/shgd95/InteractiveClosedLoop","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
