{"ID":2890146,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.19891","arxiv_id":"2507.19891","title":"Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention","abstract":"We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. We evaluate it on Open Vocabulary Referring Object Detection (OV-RefOD), introducing FitAP, a confidence-free average precision metric based on IoU and box area. RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to $+26.6\\%$. Effectiveness aligns with attention sharpness and fusion timing; while late-fusion models benefit consistently, models like $\\texttt{DeepSeek-VL2}$ also improve, pointing to capacity and disentanglement as key factors. RCA offers both interpretability and performance gains for multimodal transformers. Codes and dataset are available from https://github.com/earl-juanico/rca","short_abstract":"We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. We evalu...","url_abs":"https://arxiv.org/abs/2507.19891","url_pdf":"https://arxiv.org/pdf/2507.19891v2","authors":"[\"Drandreb Earl O. Juanico\",\"Rowel O. Atienza\",\"Jeffrey Kenneth Go\"]","published":"2025-07-26T09:43:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":611745,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2890146,"paper_url":"https://arxiv.org/abs/2507.19891","paper_title":"Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention","repo_url":"https://github.com/earl-juanico/rca","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
