{"ID":2864309,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23838","arxiv_id":"2509.23838","title":"2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC","abstract":"Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, initialized by a first-frame mask. Previous methods rely heavily on appearance-based pattern matching and thus exhibit limited robustness against challenges such as drastic visual changes, occlusions, and scene shifts. This failure is often attributed to a lack of high-level conceptual understanding of the target. The recently proposed Segment Concept (SeC) framework mitigated this limitation by using a Large Vision-Language Model (LVLM) to establish a deep semantic understanding of the object for more persistent segmentation. In this work, we evaluate its zero-shot performance on the challenging coMplex video Object SEgmentation v2 (MOSEv2) dataset. Without any fine-tuning on the training set, SeC achieved 39.7 J&Ḟ on the test set and ranked 2nd place in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.","short_abstract":"Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, initialized by a first-frame mask. Previous methods rely heavily on appearance-based pattern matching and thus exhibit limited robustness against challenges such as drastic visual changes, occlusions, and scene shi...","url_abs":"https://arxiv.org/abs/2509.23838","url_pdf":"https://arxiv.org/pdf/2509.23838v1","authors":"[\"Zhixiong Zhang\",\"Shuangrui Ding\",\"Xiaoyi Dong\",\"Yuhang Zang\",\"Yuhang Cao\",\"Jiaqi Wang\"]","published":"2025-09-28T12:26:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}