{"ID":2842632,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09018","arxiv_id":"2511.09018","title":"Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs","abstract":"Object hallucination remains a critical challenge in Large Vision-Language Models (LVLMs), where models generate content inconsistent with visual inputs. Existing language-decoder based mitigation approaches often regulate visual or textual attention independently, overlooking their interaction as two key causal factors. To address this, we propose Owl (Bi-mOdal attention reWeighting for Layer-wise hallucination mitigation), a causally-grounded framework that models the hallucination process via a structural causal graph, treating decomposed visual and textual attentions as mediators. We introduce VTACR (Visual-to-Textual Attention Contribution Ratio), a novel metric that quantifies the modality contribution imbalance during decoding. Our analysis reveals that hallucinations frequently occur in low-VTACR scenarios, where textual priors dominate and visual grounding is weakened. To mitigate this, we design a fine-grained attention intervention mechanism that dynamically adjusts token- and layer-wise attention guided by VTACR signals. Finally, we propose a dual-path contrastive decoding strategy: one path emphasizes visually grounded predictions, while the other amplifies hallucinated ones -- letting visual truth shine and hallucination collapse. Experimental results on the POPE and CHAIR benchmarks show that Owl achieves significant hallucination reduction, setting a new SOTA in faithfulness while preserving vision-language understanding capability. Our code is available at https://github.com/CikZ2023/OWL","short_abstract":"Object hallucination remains a critical challenge in Large Vision-Language Models (LVLMs), where models generate content inconsistent with visual inputs. Existing language-decoder based mitigation approaches often regulate visual or textual attention independently, overlooking their interaction as two key causal factor...","url_abs":"https://arxiv.org/abs/2511.09018","url_pdf":"https://arxiv.org/pdf/2511.09018v1","authors":"[\"Liu Yu\",\"Zhonghao Chen\",\"Ping Kuang\",\"Zhikun Feng\",\"Fan Zhou\",\"Lan Wang\",\"Gillian Dobbie\"]","published":"2025-11-12T06:13:26Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":607143,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2842632,"paper_url":"https://arxiv.org/abs/2511.09018","paper_title":"Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs","repo_url":"https://github.com/CikZ2023/OWL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
