{"ID":2868603,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15596","arxiv_id":"2509.15596","title":"EyePCR: A Comprehensive Benchmark for Fine-Grained Perception, Knowledge Comprehension and Clinical Reasoning in Ophthalmic Surgery","abstract":"MLLMs (Multimodal Large Language Models) have showcased remarkable capabilities, but their performance in high-stakes, domain-specific scenarios like surgical settings, remains largely under-explored. To address this gap, we develop \\textbf{EyePCR}, a large-scale benchmark for ophthalmic surgery analysis, grounded in structured clinical knowledge to evaluate cognition across \\textit{Perception}, \\textit{Comprehension} and \\textit{Reasoning}. EyePCR offers a richly annotated corpus with more than 210k VQAs, which cover 1048 fine-grained attributes for multi-view perception, medical knowledge graph of more than 25k triplets for comprehension, and four clinically grounded reasoning tasks. The rich annotations facilitate in-depth cognitive analysis, simulating how surgeons perceive visual cues and combine them with domain knowledge to make decisions, thus greatly improving models' cognitive ability. In particular, \\textbf{EyePCR-MLLM}, a domain-adapted variant of Qwen2.5-VL-7B, achieves the highest accuracy on MCQs for \\textit{Perception} among compared models and outperforms open-source models in \\textit{Comprehension} and \\textit{Reasoning}, rivalling commercial models like GPT-4.1. EyePCR reveals the limitations of existing MLLMs in surgical cognition and lays the foundation for benchmarking and enhancing clinical reliability of surgical video understanding models.","short_abstract":"MLLMs (Multimodal Large Language Models) have showcased remarkable capabilities, but their performance in high-stakes, domain-specific scenarios like surgical settings, remains largely under-explored. To address this gap, we develop \\textbf{EyePCR}, a large-scale benchmark for ophthalmic surgery analysis, grounded in s...","url_abs":"https://arxiv.org/abs/2509.15596","url_pdf":"https://arxiv.org/pdf/2509.15596v2","authors":"[\"Gui Wang\",\"Yang Wennuo\",\"Xusen Ma\",\"Zehao Zhong\",\"Zhuoru Wu\",\"Ende Wu\",\"Rong Qu\",\"Wooi Ping Cheah\",\"Jianfeng Ren\",\"Linlin Shen\"]","published":"2025-09-19T04:55:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
