{"ID":2828936,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13262","arxiv_id":"2512.13262","title":"Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving","abstract":"Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency and diversity in motion selection. Our Warm-K method-based test-time scaling enhances behavioral consistency and reactivity at test time without retraining, mitigating covariate shift and reducing performance discrepancies. Demo videos are available in the supplementary material.","short_abstract":"Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on o...","url_abs":"https://arxiv.org/abs/2512.13262","url_pdf":"https://arxiv.org/pdf/2512.13262v1","authors":"[\"Hyunki Seong\",\"Jeong-Kyun Lee\",\"Heesoo Myeong\",\"Yongho Shin\",\"Hyun-Mook Cho\",\"Duck Hoon Kim\",\"Pranav Desai\",\"Monu Surana\"]","published":"2025-12-15T12:18:50Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
