{"ID":2867528,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17421","arxiv_id":"2509.17421","title":"RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios","abstract":"While various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9393 samples and 69910 images. RealBench distinguishes itself by incorporating real user-generated content, ensuring high relevance to real-world applications. Additionally, the dataset covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. Finally, we conduct a comprehensive evaluation on RealBench using 21 multimodal LLMs of different sizes, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still face challenges when handling multi-image Chinese scenarios. Moreover, there remains a noticeable performance gap of around 71.8\\% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context.","short_abstract":"While various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. 
To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9393 samples and 69910 images....","url_abs":"https://arxiv.org/abs/2509.17421","url_pdf":"https://arxiv.org/pdf/2509.17421v1","authors":"[\"Fei Zhao\",\"Chengqiang Lu\",\"Yufan Shen\",\"Qimeng Wang\",\"Yicheng Qian\",\"Haoxin Zhang\",\"Yan Gao\",\"Yi Wu\",\"Yao Hu\",\"Zhen Wu\",\"Shangyu Xing\",\"Xinyu Dai\"]","published":"2025-09-22T07:14:31Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.MM\"]","methods":"[\"Large Language Model\"]","has_code":false}
