{"ID":2896055,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07610","arxiv_id":"2507.07610","title":"SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs","abstract":"Humans can imagine and manipulate visual images mentally, a capability known as spatial visualization. While many multi-modal benchmarks assess reasoning on visible visual information, the ability to infer unseen relationships through spatial visualization remains insufficiently evaluated as a spatial skill. This reliance on publicly sourced problems from IQ tests or math competitions risks data contamination and compromises assessment reliability. To this end, we introduce SpatialViz-Bench, a comprehensive multi-modal benchmark for spatial visualization with 12 tasks across 4 sub-abilities, comprising 1,180 programmatically generated problems, a scalable framework that allows for expansion to ensure fair and continuously reliable evaluations. Our evaluation of 27 Multi-modal Large Language Models (MLLMs) reveals wide performance variations, demonstrates the benchmark's strong discriminative power, and uncovers counter-intuitive findings: Chain-of-Thought (CoT) prompting paradoxically degrades accuracy on open-source models. Through statistical and qualitative analysis of error types, SpatialViz-Bench demonstrates that state-of-the-art MLLMs exhibit deficiencies in spatial visualization tasks, thereby addressing a significant lacuna in the field. The benchmark data and evaluation code are publicly available.","short_abstract":"Humans can imagine and manipulate visual images mentally, a capability known as spatial visualization. While many multi-modal benchmarks assess reasoning on visible visual information, the ability to infer unseen relationships through spatial visualization remains insufficiently evaluated as a spatial skill. This relia...","url_abs":"https://arxiv.org/abs/2507.07610","url_pdf":"https://arxiv.org/pdf/2507.07610v7","authors":"[\"Siting Wang\",\"Minnan Pei\",\"Luoyang Sun\",\"Cheng Deng\",\"Yuchen Li\",\"Kun Shao\",\"Zheng Tian\",\"Haifeng Zhang\",\"Jun Wang\"]","published":"2025-07-10T10:27:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\",\"cs.HC\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
