{"ID":2886488,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03643","arxiv_id":"2508.03643","title":"Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images","abstract":"Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or necessitate costly per-scene optimization, thereby restricting their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that jointly reconstructs a unified 3D scene representation enriched with open-vocabulary semantics, directly from unposed multi-view images. Our approach leverages a Cross-View Transformer to robustly integrate information across arbitrary multi-view inputs, which then regresses a set of 3D Gaussian primitives endowed with semantic feature fields. This unified representation facilitates high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single, feed-forward pass. Extensive experiments demonstrate that Uni3R establishes a new state-of-the-art across multiple benchmarks, including 25.07 PSNR on RE10K and 55.84 mIoU on ScanNet. Our work signifies a novel paradigm towards generalizable, unified 3D scene reconstruction and understanding. The code is available at https://github.com/HorizonRobotics/Uni3R.","short_abstract":"Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or necessitate costly per-scene optimization, thereby restricting their scalability and generalizability. In this...","url_abs":"https://arxiv.org/abs/2508.03643","url_pdf":"https://arxiv.org/pdf/2508.03643v4","authors":"[\"Xiangyu Sun\",\"Haoyi Jiang\",\"Liu Liu\",\"Seungtae Nam\",\"Gyeongjin Kang\",\"Xinjie Wang\",\"Wei Sui\",\"Zhizhong Su\",\"Wenyu Liu\",\"Xinggang Wang\",\"Eunbyung Park\"]","published":"2025-08-05T16:54:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":611313,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886488,"paper_url":"https://arxiv.org/abs/2508.03643","paper_title":"Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images","repo_url":"https://github.com/HorizonRobotics/Uni3R","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}