{"ID":2836976,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20325","arxiv_id":"2511.20325","title":"AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models","abstract":"End-to-end models for autonomous driving hold the promise of learning complex behaviors directly from sensor data, but face critical challenges in safety and handling long-tail events. Reinforcement Learning (RL) offers a promising path to overcome these limitations, yet its success in autonomous driving has been elusive. We identify a fundamental flaw hindering this progress: a deep-seated optimistic bias in the world models used for RL. To address this, we introduce a framework for post-training policy refinement built around an Impartial World Model. Our primary contribution is to teach this model to be honest about danger. We achieve this with a novel data synthesis pipeline, Counterfactual Synthesis, which systematically generates a rich curriculum of plausible collisions and off-road events. This transforms the model from a passive scene completer into a veridical forecaster that remains faithful to the causal link between actions and outcomes. We then integrate this Impartial World Model into our closed-loop RL framework, where it serves as an internal critic. During refinement, the agent queries the critic to \"dream\" of the outcomes for candidate actions. We demonstrate through extensive experiments, including on a new Risk Foreseeing Benchmark, that our model significantly outperforms baselines in predicting failures. 
Consequently, when used as a critic, it enables a substantial reduction in safety violations in challenging simulations, proving that teaching a model to dream of danger is a critical step towards building truly safe and intelligent autonomous agents.","short_abstract":"End-to-end models for autonomous driving hold the promise of learning complex behaviors directly from sensor data, but face critical challenges in safety and handling long-tail events. Reinforcement Learning (RL) offers a promising path to overcome these limitations, yet its success in autonomous driving has been elusi...","url_abs":"https://arxiv.org/abs/2511.20325","url_pdf":"https://arxiv.org/pdf/2511.20325v3","authors":"[\"Tianyi Yan\",\"Tao Tang\",\"Xingtai Gui\",\"Yongkang Li\",\"Jiasen Zhesng\",\"Weiyao Huang\",\"Lingdong Kong\",\"Wencheng Han\",\"Xia Zhou\",\"Xueyang Zhang\",\"Yifei Zhan\",\"Kun Zhan\",\"Cheng-zhong Xu\",\"Jianbing Shen\"]","published":"2025-11-25T13:57:24Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}