{"ID":2871229,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11071","arxiv_id":"2509.11071","title":"The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge","abstract":"This report outlines our approach using vision language model systems for the Driving with Language track of the CVPR 2024 Autonomous Grand Challenge. We have exclusively utilized the DriveLM-nuScenes dataset for training our models. Our systems are built on the LLaVA models, which we enhanced through fine-tuning with the LoRA and DoRA methods. Additionally, we have integrated depth information from open-source depth estimation models to enrich the training and inference processes. For inference, particularly with multiple-choice and yes/no questions, we adopted a Chain-of-Thought reasoning approach to improve the accuracy of the results. This comprehensive methodology enabled us to achieve a top score of 0.7799 on the validation set leaderboard, ranking 1st on the leaderboard.","short_abstract":"This report outlines our approach using vision language model systems for the Driving with Language track of the CVPR 2024 Autonomous Grand Challenge. We have exclusively utilized the DriveLM-nuScenes dataset for training our models. Our systems are built on the LLaVA models, which we enhanced through fine-tuning with...","url_abs":"https://arxiv.org/abs/2509.11071","url_pdf":"https://arxiv.org/pdf/2509.11071v1","authors":"[\"Jinghan Peng\",\"Jingwen Wang\",\"Xing Yu\",\"Dehui Du\"]","published":"2025-09-14T03:37:17Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}