{"ID":2837615,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19199","arxiv_id":"2511.19199","title":"CLASH: A Benchmark for Cross-Modal Contradiction Detection","abstract":"Contradictory multimodal inputs are common in real-world settings, yet existing benchmarks typically assume input consistency and fail to evaluate cross-modal contradiction detection - a fundamental capability for preventing hallucinations and ensuring reliability. We introduce CLASH, a novel benchmark for multimodal contradiction detection, featuring COCO images paired with contradictory captions containing controlled object-level or attribute-level contradictions. The samples include targeted questions evaluated in both multiple-choice and open-ended formats. The benchmark provides an extensive fine-tuning set filtered through automated quality checks, alongside a smaller human-verified diagnostic set. Our analysis of state-of-the-art models reveals substantial limitations in recognizing cross-modal conflicts, exposing systematic modality biases and category-specific weaknesses. Furthermore, we empirically demonstrate that targeted fine-tuning on CLASH substantially enhances conflict detection capabilities.","short_abstract":"Contradictory multimodal inputs are common in real-world settings, yet existing benchmarks typically assume input consistency and fail to evaluate cross-modal contradiction detection - a fundamental capability for preventing hallucinations and ensuring reliability. We introduce CLASH, a novel benchmark for multimodal c...","url_abs":"https://arxiv.org/abs/2511.19199","url_pdf":"https://arxiv.org/pdf/2511.19199v1","authors":"[\"Teodora Popordanoska\",\"Jiameng Li\",\"Matthew B. Blaschko\"]","published":"2025-11-24T15:09:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}