{"ID":2862673,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26039","arxiv_id":"2509.26039","title":"SGS: Segmentation-Guided Scoring for Global Scene Inconsistencies","abstract":"We extend HAMMER, a state-of-the-art model for multimodal manipulation detection, to handle global scene inconsistencies such as foreground-background (FG-BG) mismatch. While HAMMER achieves strong performance on the DGM4 dataset, it consistently fails when the main subject is contextually misplaced into an implausible background. We diagnose this limitation as a combination of label-space bias, local attention focus, and spurious text-foreground alignment. To remedy this without retraining, we propose a lightweight segmentation-guided scoring (SGS) pipeline. SGS uses person/face segmentation masks to separate foreground and background regions, extracts embeddings with a joint vision-language model, and computes region-aware coherence scores. These scores are fused with HAMMER's original prediction to improve binary detection, grounding, and token-level explanations. SGS is inference-only, incurs negligible computational overhead, and significantly enhances robustness to global manipulations. This work demonstrates the importance of region-aware reasoning in multimodal disinformation detection. We release scripts for segmentation and scoring at https://github.com/Gaganx0/HAMMER-sgs","short_abstract":"We extend HAMMER, a state-of-the-art model for multimodal manipulation detection, to handle global scene inconsistencies such as foreground-background (FG-BG) mismatch. While HAMMER achieves strong performance on the DGM4 dataset, it consistently fails when the main subject is contextually misplaced into an implausible...","url_abs":"https://arxiv.org/abs/2509.26039","url_pdf":"https://arxiv.org/pdf/2509.26039v1","authors":"[\"Gagandeep Singh\",\"Samudi Amarsinghe\",\"Urawee Thani\",\"Ki Fung Wong\",\"Priyanka Singh\",\"Xue Li\"]","published":"2025-09-30T10:15:11Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false,"code_links":[{"ID":608921,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862673,"paper_url":"https://arxiv.org/abs/2509.26039","paper_title":"SGS: Segmentation-Guided Scoring for Global Scene Inconsistencies","repo_url":"https://github.com/Gaganx0/HAMMER-sgs","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}