{"ID":2868092,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16972","arxiv_id":"2509.16972","title":"The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA","abstract":"Referring video object segmentation (RVOS) requires segmenting and tracking objects in videos conditioned on natural-language expressions, demanding fine-grained understanding of both appearance and motion. Building on Sa2VA, which couples a Multi-modal Large Language Model (MLLM) with the video segmentation model SAM2, we identify two key bottlenecks that limit segmentation performance: sparse frame sampling and reliance on a single [SEG] token for an entire video. We propose Segmentation Augmented and Selective Averaged Sa2VA (SaSaSa2VA) to address these issues. On the 7th LSVOS Challenge (RVOS track), SaSaSa2VA achieves a $\\mathcal{J\\\u0026F}$ of 67.45, ranking first and surpassing the runner-up by 2.80 points. This result and ablation studies demonstrate that efficient segmentation augmentation and test-time ensembling substantially enhance grounded MLLMs for RVOS. The code is released in Sa2VA repository: https://github.com/bytedance/Sa2VA.","short_abstract":"Referring video object segmentation (RVOS) requires segmenting and tracking objects in videos conditioned on natural-language expressions, demanding fine-grained understanding of both appearance and motion. Building on Sa2VA, which couples a Multi-modal Large Language Model (MLLM) with the video segmentation model SAM2...","url_abs":"https://arxiv.org/abs/2509.16972","url_pdf":"https://arxiv.org/pdf/2509.16972v2","authors":"[\"Quanzhu Niu\",\"Dengxian Gong\",\"Shihao Chen\",\"Tao Zhang\",\"Yikang Zhou\",\"Haobo Yuan\",\"Lu Qi\",\"Xiangtai Li\",\"Shunping Ji\"]","published":"2025-09-21T08:08:17Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609543,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868092,"paper_url":"https://arxiv.org/abs/2509.16972","paper_title":"The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA","repo_url":"https://github.com/bytedance/Sa2VA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
