{"ID":2874850,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02966","arxiv_id":"2509.02966","title":"KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models","abstract":"Accurate short-horizon trajectory prediction is crucial for safe and reliable autonomous driving. However, existing vision-language models (VLMs) often fail to accurately understand driving scenes and generate trustworthy trajectories. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames. KEPT integrates a temporal frequency-spatial fusion (TFSF) video encoder, which is trained via self-supervised learning with hard-negative mining, with a k-means \u0026 HNSW retrieval-augmented generation (RAG) pipeline. Retrieved prior knowledge is added into chain-of-thought (CoT) prompts with explicit planning constraints, while a triple-stage fine-tuning paradigm aligns the VLM backbone to enhance spatial perception and trajectory prediction capabilities. Evaluated on nuScenes dataset, KEPT achieves the best open-loop performance compared with baseline methods. Ablation studies on fine-tuning stages, Top-K value of RAG, different retrieval strategies, vision encoders, and VLM backbones are conducted to demonstrate the effectiveness of KEPT. These results indicate that KEPT offers a promising, data-efficient way toward trustworthy trajectory prediction in autonomous driving.","short_abstract":"Accurate short-horizon trajectory prediction is crucial for safe and reliable autonomous driving. However, existing vision-language models (VLMs) often fail to accurately understand driving scenes and generate trustworthy trajectories. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM fram...","url_abs":"https://arxiv.org/abs/2509.02966","url_pdf":"https://arxiv.org/pdf/2509.02966v2","authors":"[\"Yujin Wang\",\"Tianyi Wang\",\"Quanfeng Liu\",\"Wenxian Fan\",\"Junfeng Jiao\",\"Christian Claudel\",\"Yunbing Yan\",\"Bingzhao Gao\",\"Jianqiang Wang\",\"Hong Chen\"]","published":"2025-09-03T03:10:42Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false}
