{"ID":2854440,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14330","arxiv_id":"2510.14330","title":"Ensembling Multiple Hallucination Detectors Trained on VLLM Internal Representations","abstract":"This paper presents the 5th place solution by our team, y3h2, for the Meta CRAG-MM Challenge at KDD Cup 2025. The CRAG-MM benchmark is a visual question answering (VQA) dataset focused on factual questions about images, including egocentric images. The competition was contested based on VQA accuracy, as judged by an LLM-based automatic evaluator. Since incorrect answers result in negative scores, our strategy focused on reducing hallucinations from the internal representations of the VLM. Specifically, we trained logistic regression-based hallucination detection models using both the hidden_state and the outputs of specific attention heads. We then employed an ensemble of these models. As a result, while our method sacrificed some correct answers, it significantly reduced hallucinations and allowed us to place among the top entries on the final leaderboard.","short_abstract":"This paper presents the 5th place solution by our team, y3h2, for the Meta CRAG-MM Challenge at KDD Cup 2025. The CRAG-MM benchmark is a visual question answering (VQA) dataset focused on factual questions about images, including egocentric images. The competition was contested based on VQA accuracy, as judged by an LL...","url_abs":"https://arxiv.org/abs/2510.14330","url_pdf":"https://arxiv.org/pdf/2510.14330v2","authors":"[\"Yuto Nakamizo\",\"Ryuhei Miyazato\",\"Hikaru Tanabe\",\"Ryuta Yamakura\",\"Kiori Hatanaka\"]","published":"2025-10-16T06:09:26Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Large Language Model\"]","has_code":false}
