{"ID":2842784,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09228","arxiv_id":"2511.09228","title":"Taming Object Hallucinations with Verified Atomic Confidence Estimation","abstract":"Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce TACO (Verified Atomic Confidence Estimation), a simple framework that mitigates hallucinations through self-verification and confidence calibration without relying on external vision experts. TACO decomposes responses into atomic queries, paraphrases them to reduce sensitivity to wording, and estimates confidence using self-consistency (black-box) or self-confidence (gray-box) aggregation, before refining answers with a language model. Experiments on five benchmarks (POPE, MME, HallusionBench, AMBER, and MM-Hal Bench) with two MLLMs (\\texttt{LLaVA-1.5-7B} and \\texttt{CogVLM2}) show that TACO consistently outperforms direct prompting and Visual Contrastive Decoding, reduces systematic biases, and improves confidence calibration, demonstrating its effectiveness in enhancing the faithfulness of MLLMs.","short_abstract":"Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce TACO (Verified Atomic Confidence Estimation), a simple framework that mitigates hallucinations through self-verification and confi...","url_abs":"https://arxiv.org/abs/2511.09228","url_pdf":"https://arxiv.org/pdf/2511.09228v1","authors":"[\"Jiarui Liu\",\"Weihao Xuan\",\"Zhijing Jin\",\"Mona Diab\"]","published":"2025-11-12T11:44:46Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}