{"ID":2897911,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04410","arxiv_id":"2507.04410","title":"Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models","abstract":"This paper presents our submission to the ACMMM25 - Grand Challenge on Multimedia Verification. We developed a multi-agent verification system that combines Multimodal Large Language Models (MLLMs) with specialized verification tools to detect multimedia misinformation. Our system operates through six stages: raw data processing, planning, information extraction, deep research, evidence collection, and report generation. The core Deep Researcher Agent employs four tools: reverse image search, metadata analysis, fact-checking databases, and verified news processing that extracts spatial, temporal, attribution, and motivational context. We demonstrate our approach on a challenge dataset sample involving complex multimedia content. Our system successfully verified content authenticity, extracted precise geolocation and timing information, and traced source attribution across multiple platforms, effectively addressing real-world multimedia verification scenarios.","short_abstract":"This paper presents our submission to the ACMMM25 - Grand Challenge on Multimedia Verification. We developed a multi-agent verification system that combines Multimodal Large Language Models (MLLMs) with specialized verification tools to detect multimedia misinformation. \nOur system operates through six stages: raw data...","url_abs":"https://arxiv.org/abs/2507.04410","url_pdf":"https://arxiv.org/pdf/2507.04410v1","authors":"[\"Huy Hoan Le\",\"Van Sy Thinh Nguyen\",\"Thi Le Chi Dang\",\"Vo Thanh Khang Nguyen\",\"Truong Thanh Hung Nguyen\",\"Hung Cao\"]","published":"2025-07-06T14:54:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}