{"ID":2868778,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15871","arxiv_id":"2509.15871","title":"Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval","abstract":"3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on text prompts, which is essential for applications such as robotics. However, existing 3DVG methods encounter two main challenges: first, they struggle to handle the implicit representation of spatial textures in 3D Gaussian Splatting (3DGS), making per-scene training indispensable; second, they typically require large amounts of labeled data for effective training. To this end, we propose \\underline{G}rounding via \\underline{V}iew \\underline{R}etrieval (GVR), a novel zero-shot visual grounding framework for 3DGS to transform 3DVG into a 2D retrieval task that leverages object-level view retrieval to collect grounding clues from multiple views, which not only avoids the costly process of 3D annotation, but also eliminates the need for per-scene training. Extensive experiments demonstrate that our method achieves state-of-the-art visual grounding performance while avoiding per-scene training, providing a solid foundation for zero-shot 3DVG research. Video demos can be found at https://github.com/leviome/GVR_demos.","short_abstract":"3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on text prompts, which is essential for applications such as robotics. However, existing 3DVG methods encounter two main challenges: first, they struggle to handle the implicit representation of spatial textures in 3D Gaussian Splatting (3DGS), making...","url_abs":"https://arxiv.org/abs/2509.15871","url_pdf":"https://arxiv.org/pdf/2509.15871v1","authors":"[\"Liwei Liao\",\"Xufeng Li\",\"Xiaoyun Zheng\",\"Boning Liu\",\"Feng Gao\",\"Ronggang Wang\"]","published":"2025-09-19T11:11:36Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.MM\"]","methods":"[]","has_code":false,"code_links":[{"ID":609619,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868778,"paper_url":"https://arxiv.org/abs/2509.15871","paper_title":"Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval","repo_url":"https://github.com/leviome/GVR_demos","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
