{"ID":2922194,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T20:45:12.887694882Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00891","arxiv_id":"2606.00891","title":"MMDG-Bench: A Benchmark for Multimodal Domain Generalization","abstract":"Multi-modal Domain Generalization (MMDG) seeks to leverage complementary modalities to enhance model robustness on unseen domains. Despite extensive progress in Multi-modal Learning (MML) and Domain Generalization (DG) as individual fields, their systematic integration remains under-explored. Current MMDG research is largely confined to action recognition and lacks standardized evaluation protocols. To address this, we introduce MMDG-Bench, a comprehensive benchmark featuring two foundational frameworks: DG then MML (D2M) and MML then DG (M2D). We provide unified experimental protocols across diverse tasks, including video-audio-flow action recognition and RGB-Depth-IR face anti-spoofing. By instantiating ten MMDG baselines through pairing a unified MML configuration with five DG techniques under both D2M and M2D orderings, we demonstrate that these structured combinations frequently outperform existing state-of-the-art methods, underscoring the necessity of a unified benchmarking effort. Our analysis yields three key insights: (1) Integrating DG techniques provides consistent generalization gains across various backbones, whereas non-DG methods are highly sensitive to backbone shifts; (2) The optimal framework choice depends on inter-modal stability: D2M excels when modal relations are stable across domains, while M2D is more robust to cross-domain relational variance; (3) Stronger backbones yield amplified performance dividends when integrated into our structured frameworks. MMDG-Bench provides a principled foundation and actionable design guidelines for future research in multi-modal robustness. Code is released at https://github.com/qszhan/MMDG-Bench.","short_abstract":"Multi-modal Domain Generalization (MMDG) seeks to leverage complementary modalities to enhance model robustness on unseen domains. Despite extensive progress in Multi-modal Learning (MML) and Domain Generalization (DG) as individual fields, their systematic integration remains under-explored. Current MMDG research is l...","url_abs":"https://arxiv.org/abs/2606.00891","url_pdf":"https://arxiv.org/pdf/2606.00891v1","authors":"[\"Qianshan Zhan\",\"Qian Wang\",\"Da Li\",\"Xiao-Jun Zeng\",\"Xiatian Zhu\"]","published":"2026-05-30T20:52:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":612652,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T02:42:49.606572591Z","DeletedAt":null,"paper_id":2922194,"paper_url":"https://arxiv.org/abs/2606.00891","paper_title":"MMDG-Bench: A Benchmark for Multimodal Domain Generalization","repo_url":"https://github.com/qszhan/MMDG-Bench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}