{"ID":2823196,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00561","arxiv_id":"2601.00561","title":"AEGIS: Exploring the Limit of World Knowledge Capabilities for Unified Mulitmodal Models","abstract":"The capability of Unified Multimodal Models (UMMs) to apply world knowledge across diverse tasks remains a critical, unresolved challenge. Existing benchmarks fall short, offering only siloed, single-task evaluations with limited diagnostic power. To bridge this gap, we propose AEGIS (\\emph{i.e.}, \\textbf{A}ssessing \\textbf{E}diting, \\textbf{G}eneration, \\textbf{I}nterpretation-Understanding for \\textbf{S}uper-intelligence), a comprehensive multi-task benchmark covering visual understanding, generation, editing, and interleaved generation. AEGIS comprises 1,050 challenging, manually-annotated questions spanning 21 topics (including STEM, humanities, daily life, etc.) and 6 reasoning types. To concretely evaluate the performance of UMMs in world knowledge scope without ambiguous metrics, we further propose Deterministic Checklist-based Evaluation (DCE), a protocol that replaces ambiguous prompt-based scoring with atomic ``Y/N'' judgments, to enhance evaluation reliability. Our extensive experiments reveal that most UMMs exhibit severe world knowledge deficits and that performance degrades significantly with complex reasoning. Additionally, simple plug-in reasoning modules can partially mitigate these vulnerabilities, highlighting a promising direction for future research. These results highlight the importance of world-knowledge-based reasoning as a critical frontier for UMMs.","short_abstract":"The capability of Unified Multimodal Models (UMMs) to apply world knowledge across diverse tasks remains a critical, unresolved challenge. Existing benchmarks fall short, offering only siloed, single-task evaluations with limited diagnostic power. To bridge this gap, we propose AEGIS (\\emph{i.e.}, \\textbf{A}ssessing \\t...","url_abs":"https://arxiv.org/abs/2601.00561","url_pdf":"https://arxiv.org/pdf/2601.00561v1","authors":"[\"Jintao Lin\",\"Bowen Dong\",\"Weikang Shi\",\"Chenyang Lei\",\"Suiyun Zhang\",\"Rui Liu\",\"Xihui Liu\"]","published":"2026-01-02T04:22:18Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
