{"ID":2896054,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07605","arxiv_id":"2507.07605","title":"LOSC: LiDAR Open-voc Segmentation Consolidator","abstract":"We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds. Yet, resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consistency and robustness to image-level augmentations. We then train a 3D network based on these refined labels. This simple method, called LOSC, outperforms the SOTA of zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI, with significant margins. Code is available at https://github.com/valeoai/LOSC.","short_abstract":"We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds. Yet, resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consis...","url_abs":"https://arxiv.org/abs/2507.07605","url_pdf":"https://arxiv.org/pdf/2507.07605v2","authors":"[\"Nermin Samet\",\"Gilles Puy\",\"Renaud Marlet\"]","published":"2025-07-10T10:10:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":612238,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2896054,"paper_url":"https://arxiv.org/abs/2507.07605","paper_title":"LOSC: LiDAR Open-voc Segmentation Consolidator","repo_url":"https://github.com/valeoai/LOSC","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}