{"ID":2898188,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04141","arxiv_id":"2507.04141","title":"Pedestrian Intention Prediction via Vision-Language Foundation Models","abstract":"Prediction of pedestrian crossing intention is a critical function in autonomous vehicles. Conventional vision-based methods of crossing intention prediction often struggle with generalizability, context understanding, and causal reasoning. This study explores the potential of vision-language foundation models (VLFMs) for predicting pedestrian crossing intentions by integrating multimodal data through hierarchical prompt templates. The methodology incorporates contextual information, including visual frames, physical cues observations, and ego-vehicle dynamics, into systematically refined prompts to guide VLFMs effectively in intention prediction. Experiments were conducted on three common datasets-JAAD, PIE, and FU-PIP. Results demonstrate that incorporating vehicle speed, its variations over time, and time-conscious prompts significantly enhances the prediction accuracy up to 19.8%. Additionally, optimised prompts generated via an automatic prompt engineering framework yielded 12.5% further accuracy gains. These findings highlight the superior performance of VLFMs compared to conventional vision-based models, offering enhanced generalisation and contextual understanding for autonomous driving applications.","short_abstract":"Prediction of pedestrian crossing intention is a critical function in autonomous vehicles. Conventional vision-based methods of crossing intention prediction often struggle with generalizability, context understanding, and causal reasoning. \nThis study explores the potential of vision-language foundation models (VLFMs)...","url_abs":"https://arxiv.org/abs/2507.04141","url_pdf":"https://arxiv.org/pdf/2507.04141v1","authors":"[\"Mohsen Azarmi\",\"Mahdi Rezaei\",\"He Wang\"]","published":"2025-07-05T19:39:00Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.ET\",\"cs.LG\",\"cs.RO\"]","methods":"[]","has_code":false}
