{"ID":2868221,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17191","arxiv_id":"2509.17191","title":"VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery","abstract":"Understanding cultural heritage artifacts such as ancient Greek pottery requires expert-level reasoning that remains challenging for current MLLMs due to limited domain-specific data. We introduce VaseVQA, a benchmark of 31,773 images and 67,614 question-answer pairs across seven expert-defined categories, enabling systematic evaluation of expert-level cultural heritage understanding. Using this dataset, we explore effective training strategies for domain-specific reasoning. While supervised fine-tuning improves adaptation to domain knowledge, it struggles with deeper reasoning tasks. We propose VaseVL, which augments SFT with reinforcement learning using verifiable rewards. Experiments show that VaseVL consistently outperforms supervised baselines, especially on reasoning-intensive questions, highlighting the value of targeted reinforcement learning for cultural heritage visual question answering. Our code and dataset will be released at https://github.com/AIGeeksGroup/VaseVQA.","short_abstract":"Understanding cultural heritage artifacts such as ancient Greek pottery requires expert-level reasoning that remains challenging for current MLLMs due to limited domain-specific data. We introduce VaseVQA, a benchmark of 31,773 images and 67,614 question-answer pairs across seven expert-defined categories, enabling sys...","url_abs":"https://arxiv.org/abs/2509.17191","url_pdf":"https://arxiv.org/pdf/2509.17191v2","authors":"[\"Jinchao Ge\",\"Tengfei Cheng\",\"Biao Wu\",\"Zeyu Zhang\",\"Shiya Huang\",\"Judith Bishop\",\"Gillian Shepherd\",\"Meng Fang\",\"Ling Chen\",\"Yang Zhao\"]","published":"2025-09-21T18:36:54Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":609564,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868221,"paper_url":"https://arxiv.org/abs/2509.17191","paper_title":"VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery","repo_url":"https://github.com/AIGeeksGroup/VaseVQA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
