{"ID":2853344,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16371","arxiv_id":"2510.16371","title":"Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis","abstract":"Computer-assisted surgery research requires large, deeply annotated video datasets that capture clinical and technical variability. Existing cataract surgery resources lack the diversity and annotation depth required to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoemulsification cataract surgery videos acquired at two surgical centers from surgeons with varying expertise. The dataset provides four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores based on competency rubrics adapted from ICO-OSCAR and GRASIS. We demonstrate the technical utility of the dataset through benchmarking deep learning models across four tasks: workflow recognition, scene segmentation, instrument-tissue interaction tracking, and automated skill assessment. Furthermore, we establish a domain-adaptation baseline for phase recognition and instance segmentation by training on one surgical center and evaluating on a held-out center. Ultimately, these multi-source acquisitions, multi-layer annotations, and paired skill-kinematic labels facilitate the development of generalizable multi-task models for surgical workflow analysis, scene understanding, and competency-based training research.","short_abstract":"Computer-assisted surgery research requires large, deeply annotated video datasets that capture clinical and technical variability. Existing cataract surgery resources lack the diversity and annotation depth required to train generalizable deep-learning models. To address this gap, we present a dataset of 3,000 phacoem...","url_abs":"https://arxiv.org/abs/2510.16371","url_pdf":"https://arxiv.org/pdf/2510.16371v3","authors":"[\"Mohammad Javad Ahmadi\",\"Iman Gandomi\",\"Parisa Abdi\",\"Seyed-Farzad Mohammadi\",\"Amirhossein Taslimi\",\"Mehdi Khodaparast\",\"Hassan Hashemi\",\"Mahdi Tavakoli\",\"Hamid D. Taghirad\"]","published":"2025-10-18T06:48:29Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}
