{"ID":2822849,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01456","arxiv_id":"2601.01456","title":"Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration","abstract":"In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in \"Fuse-then-Refine\" paradigms: the \"Plasticity-Stability Dilemma.\" In addition, CLIP's inter-class confusion can result in semantic blindness. To address these issues, we present the Decoupled-experts Arbitration Few-Shot SegNet (DA-FSS), a model that effectively distinguishes between semantic and geometric paths and mutually regularizes their gradients to achieve better generalization. DA-FSS employs the same backbone and pre-trained text encoder as MM-FSS to generate text embeddings, which can increase free modalities' utilization rate and better leverage each modality's information space. To achieve this, we propose a Parallel Expert Refinement module to generate each modal correlation. We also propose a Stacked Arbitration Module (SAM) to perform convolutional fusion and arbitrate correlations for each modality pathway. The Parallel Experts decouple two paths: a Geometric Expert maintains plasticity, and a Semantic Expert ensures stability. They are coordinated via a Decoupled Alignment Module (DAM) that transfers knowledge without propagating confusion. Experiments on popular datasets (S3DIS, ScanNet) demonstrate the superiority of DA-FSS over MM-FSS. Meanwhile, geometric boundaries, completeness, and texture differentiation are all superior to the baseline. The code is available at: https://github.com/MoWenQAQ/DA-FSS/.","short_abstract":"In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in \"Fuse-then-Refine\" paradigms: the \"Plasticity-Stability Dilemma.\" In addition, CLIP's inter-class confusion can result in semantic blindness. To address these issues, we present the Decoupled-experts A...","url_abs":"https://arxiv.org/abs/2601.01456","url_pdf":"https://arxiv.org/pdf/2601.01456v2","authors":"[\"Wentao Bian\",\"Fenglei Xu\"]","published":"2026-01-04T09:53:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":605455,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2822849,"paper_url":"https://arxiv.org/abs/2601.01456","paper_title":"Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration","repo_url":"https://github.com/MoWenQAQ/DA-FSS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
