{"ID":2848429,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25225","arxiv_id":"2510.25225","title":"Hallucination Localization in Video Captioning","abstract":"We propose a novel task, hallucination localization in video captioning, which aims to identify hallucinations in video captions at the span level (i.e., individual words or phrases). This allows for a more detailed analysis of hallucinations than the existing sentence-level hallucination detection task. To establish a benchmark for hallucination localization, we construct HLVC-Dataset, a carefully curated dataset created by manually annotating 1,167 video-caption pairs from VideoLLM-generated captions. We further implement a VideoLLM-based baseline method and conduct quantitative and qualitative evaluations to benchmark current performance on hallucination localization.","short_abstract":"We propose a novel task, hallucination localization in video captioning, which aims to identify hallucinations in video captions at the span level (i.e., individual words or phrases). This allows for a more detailed analysis of hallucinations than the existing sentence-level hallucination detection task. To establish...","url_abs":"https://arxiv.org/abs/2510.25225","url_pdf":"https://arxiv.org/pdf/2510.25225v1","authors":"[\"Shota Nakada\",\"Kazuhiro Saito\",\"Yuchi Ishikawa\",\"Hokuto Munakata\",\"Tatsuya Komatsu\",\"Masayoshi Kondo\"]","published":"2025-10-29T07:00:48Z","proceeding":"cs.MM","tasks":"[\"cs.MM\"]","methods":"[\"Large Language Model\"]","has_code":false}