{"ID":2899724,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00898","arxiv_id":"2507.00898","title":"ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models","abstract":"Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment. Specifically, we enhance textual outputs by selectively amplifying crucial textual information using a text-to-visual entropy ratio for each token. Extensive experimental results demonstrate that our proposed ONLY consistently outperforms state-of-the-art methods across various benchmarks while requiring minimal implementation effort and computational cost. Code is available at https://github.com/zifuwan/ONLY.","short_abstract":"Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical...","url_abs":"https://arxiv.org/abs/2507.00898","url_pdf":"https://arxiv.org/pdf/2507.00898v1","authors":"[\"Zifu Wan\",\"Ce Zhang\",\"Silong Yong\",\"Martin Q. Ma\",\"Simon Stepputtis\",\"Louis-Philippe Morency\",\"Deva Ramanan\",\"Katia Sycara\",\"Yaqi Xie\"]","published":"2025-07-01T16:01:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":612512,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2899724,"paper_url":"https://arxiv.org/abs/2507.00898","paper_title":"ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models","repo_url":"https://github.com/zifuwan/ONLY","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
