{"ID":2848232,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.26937","arxiv_id":"2510.26937","title":"MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models","abstract":"Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In this work, we aim to evaluate a fundamental yet underexplored intelligence: association, a cornerstone of human cognition for creative thinking and knowledge integration. Current benchmarks, often limited to closed-ended tasks, fail to capture the complexity of open-ended association reasoning vital for real-world applications. To address this, we present MM-OPERA, a systematic benchmark with 11,497 instances across two open-ended tasks: Remote-Item Association (RIA) and In-Context Association (ICA), aligning association intelligence evaluation with human psychometric principles. It challenges LVLMs to resemble the spirit of divergent thinking and convergent associative reasoning through free-form responses and explicit reasoning paths. We deploy tailored LLM-as-a-Judge strategies to evaluate open-ended outputs, applying process-reward-informed judgment to dissect reasoning with precision. Extensive empirical studies on state-of-the-art LVLMs, including sensitivity analysis of task instances, validity analysis of LLM-as-a-Judge strategies, and diversity analysis across abilities, domains, languages, cultures, etc., provide a comprehensive and nuanced understanding of the limitations of current LVLMs in associative reasoning, paving the way for more human-like and general-purpose AI. The dataset and code are available at https://github.com/MM-OPERA-Bench/MM-OPERA.","short_abstract":"Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In this work, we aim to evaluate a fundamental yet underexplored intelligence: association, a cornerstone of human cognition for creat...","url_abs":"https://arxiv.org/abs/2510.26937","url_pdf":"https://arxiv.org/pdf/2510.26937v1","authors":"[\"Zimeng Huang\",\"Jinxin Ke\",\"Xiaoxuan Fan\",\"Yufeng Yang\",\"Yang Liu\",\"Liu Zhonghan\",\"Zedi Wang\",\"Junteng Dai\",\"Haoyi Jiang\",\"Yuyu Zhou\",\"Keze Wang\",\"Ziliang Chen\"]","published":"2025-10-30T18:49:06Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607602,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2848232,"paper_url":"https://arxiv.org/abs/2510.26937","paper_title":"MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models","repo_url":"https://github.com/MM-OPERA-Bench/MM-OPERA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
