{"ID":2897639,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.05211","arxiv_id":"2507.05211","title":"All in One: Visual-Description-Guided Unified Point Cloud Segmentation","abstract":"Unified segmentation of 3D point clouds is crucial for scene understanding, but is hindered by its sparse structure, limited annotations, and the challenge of distinguishing fine-grained object classes in complex environments. Existing methods often struggle to capture rich semantic and contextual information due to limited supervision and a lack of diverse multimodal cues, leading to suboptimal differentiation of classes and instances. To address these challenges, we propose VDG-Uni3DSeg, a novel framework that integrates pre-trained vision-language models (e.g., CLIP) and large language models (LLMs) to enhance 3D segmentation. By leveraging LLM-generated textual descriptions and reference images from the internet, our method incorporates rich multimodal cues, facilitating fine-grained class and instance separation. We further design a Semantic-Visual Contrastive Loss to align point features with multimodal queries and a Spatial Enhanced Module to model scene-wide relationships efficiently. Operating within a closed-set paradigm that utilizes multimodal knowledge generated offline, VDG-Uni3DSeg achieves state-of-the-art results in semantic, instance, and panoptic segmentation, offering a scalable and practical solution for 3D understanding. Our code is available at https://github.com/Hanzy1996/VDG-Uni3DSeg.","short_abstract":"Unified segmentation of 3D point clouds is crucial for scene understanding, but is hindered by its sparse structure, limited annotations, and the challenge of distinguishing fine-grained object classes in complex environments. Existing methods often struggle to capture rich semantic and contextual information due to li...","url_abs":"https://arxiv.org/abs/2507.05211","url_pdf":"https://arxiv.org/pdf/2507.05211v2","authors":"[\"Zongyan Han\",\"Mohamed El Amine Boudjoghra\",\"Jiahua Dong\",\"Jinhong Wang\",\"Rao Muhammad Anwer\"]","published":"2025-07-07T17:22:00Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612356,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2897639,"paper_url":"https://arxiv.org/abs/2507.05211","paper_title":"All in One: Visual-Description-Guided Unified Point Cloud Segmentation","repo_url":"https://github.com/Hanzy1996/VDG-Uni3DSeg","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}