{"ID":2836936,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20253","arxiv_id":"2511.20253","title":"Zoo3D: Zero-Shot 3D Object Detection at Scene Level","abstract":"3D object detection is fundamental for spatial understanding. Real-world environments demand models capable of recognizing diverse, previously unseen objects, which remains a major limitation of closed-set methods. Existing open-vocabulary 3D detectors relax annotation requirements but still depend on training scenes, either as point clouds or images. We take this a step further by introducing Zoo3D, the first training-free 3D object detection framework. Our method constructs 3D bounding boxes via graph clustering of 2D instance masks, then assigns semantic labels using a novel open-vocabulary module with best-view selection and view-consensus mask generation. Zoo3D operates in two modes: the zero-shot Zoo3D$_0$, which requires no training at all, and the self-supervised Zoo3D$_1$, which refines 3D box prediction by training a class-agnostic detector on Zoo3D$_0$-generated pseudo labels. Furthermore, we extend Zoo3D beyond point clouds to work directly with posed and even unposed images. Across ScanNet200 and ARKitScenes benchmarks, both Zoo3D$_0$ and Zoo3D$_1$ achieve state-of-the-art results in open-vocabulary 3D object detection. Remarkably, our zero-shot Zoo3D$_0$ outperforms all existing self-supervised methods, hence demonstrating the power and adaptability of training-free, off-the-shelf approaches for real-world 3D understanding. Code is available at https://github.com/col14m/zoo3d .","short_abstract":"3D object detection is fundamental for spatial understanding. Real-world environments demand models capable of recognizing diverse, previously unseen objects, which remains a major limitation of closed-set methods. Existing open-vocabulary 3D detectors relax annotation requirements but still depend on training scenes,...","url_abs":"https://arxiv.org/abs/2511.20253","url_pdf":"https://arxiv.org/pdf/2511.20253v1","authors":"[\"Andrey Lemeshko\",\"Bulat Gabdullin\",\"Nikita Drozdov\",\"Anton Konushin\",\"Danila Rukhovich\",\"Maksim Kolodiazhnyi\"]","published":"2025-11-25T12:29:06Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":606641,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2836936,"paper_url":"https://arxiv.org/abs/2511.20253","paper_title":"Zoo3D: Zero-Shot 3D Object Detection at Scene Level","repo_url":"https://github.com/col14m/zoo3d","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}