{"ID":2862678,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26047","arxiv_id":"2509.26047","title":"DGM4+: Dataset Extension for Global Scene Inconsistency","abstract":"The rapid advances in generative models have significantly lowered the barrier to producing convincing multimodal disinformation. Fabricated images and manipulated captions increasingly co-occur to create persuasive false narratives. While the Detecting and Grounding Multi-Modal Media Manipulation (DGM4) dataset established a foundation for research in this area, it is restricted to local manipulations such as face swaps, attribute edits, and caption changes. This leaves a critical gap: global inconsistencies, such as mismatched foregrounds and backgrounds, which are now prevalent in real-world forgeries. To address this, we extend DGM4 with 5,000 high-quality samples that introduce Foreground-Background (FG-BG) mismatches and their hybrids with text manipulations. Using OpenAI's gpt-image-1 and carefully designed prompts, we generate human-centric news-style images where authentic figures are placed into absurd or impossible backdrops (e.g., a teacher calmly addressing students on the surface of Mars). Captions are produced under three conditions: literal, text attribute, and text split, yielding three new manipulation categories: FG-BG, FG-BG+TA, and FG-BG+TS. Quality control pipelines enforce one-to-three visible faces, perceptual hash deduplication, OCR-based text scrubbing, and realistic headline length. By introducing global manipulations, our extension complements existing datasets, creating a benchmark DGM4+ that tests detectors on both local and global reasoning. This resource is intended to strengthen evaluation of multimodal models such as HAMMER, which currently struggle with FG-BG inconsistencies. We release our DGM4+ dataset and generation script at https://github.com/Gaganx0/DGM4plus","short_abstract":"The rapid advances in generative models have significantly lowered the barrier to producing convincing multimodal disinformation. Fabricated images and manipulated captions increasingly co-occur to create persuasive false narratives. While the Detecting and Grounding Multi-Modal Media Manipulation (DGM4) dataset establ...","url_abs":"https://arxiv.org/abs/2509.26047","url_pdf":"https://arxiv.org/pdf/2509.26047v1","authors":"[\"Gagandeep Singh\",\"Samudi Amarsinghe\",\"Priyanka Singh\",\"Xue Li\"]","published":"2025-09-30T10:24:21Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false,"code_links":[{"ID":608922,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862678,"paper_url":"https://arxiv.org/abs/2509.26047","paper_title":"DGM4+: Dataset Extension for Global Scene Inconsistency","repo_url":"https://github.com/Gaganx0/DGM4plus","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
