{"ID":2827036,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.17445","arxiv_id":"2512.17445","title":"LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents","abstract":"LangDriveCTRL is a natural-language-controllable framework for editing real-world driving videos to synthesize diverse traffic scenarios. It represents each video as an explicit 3D scene graph, decomposing the scene into a static background and dynamic object nodes. To enable fine-grained editing and realism, it introduces a feedback-driven agentic pipeline. An Orchestrator converts user instructions into executable graphs that coordinate specialized multi-modal agents and tools. An Object Grounding Agent aligns free-form text with target object nodes in the scene graph; a Behavior Editing Agent generates multi-object trajectories from language instructions; and a Behavior Reviewer Agent iteratively reviews and refines the generated trajectories. The edited scene graph is rendered and harmonized using a video diffusion tool, and then further refined by a Video Reviewer Agent to ensure photorealism and appearance alignment. LangDriveCTRL supports both object node editing (removal, insertion, and replacement) and multi-object behavior editing from natural-language instructions. Quantitatively, it achieves nearly $2\\times$ higher instruction alignment than the previous SoTA, with superior photorealism, structural preservation, and traffic realism. Project page is available at: https://yunhe24.github.io/langdrivectrl/.","short_abstract":"LangDriveCTRL is a natural-language-controllable framework for editing real-world driving videos to synthesize diverse traffic scenarios. It represents each video as an explicit 3D scene graph, decomposing the scene into a static background and dynamic object nodes. To enable fine-grained editing and realism, it introd...","url_abs":"https://arxiv.org/abs/2512.17445","url_pdf":"https://arxiv.org/pdf/2512.17445v2","authors":"[\"Yun He\",\"Francesco Pittaluga\",\"Ziyu Jiang\",\"Matthias Zwicker\",\"Manmohan Chandraker\",\"Zaid Tasneem\"]","published":"2025-12-19T10:57:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
