{"ID":2886767,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02157","arxiv_id":"2508.02157","title":"Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes","abstract":"Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the current state-of-the-art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method exhibits greater robustness compared to single-stage baselines. Our code and models are available at https://github.com/Fischer-Tom/unified-detection-and-pose-estimation.","short_abstract":"Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use...","url_abs":"https://arxiv.org/abs/2508.02157","url_pdf":"https://arxiv.org/pdf/2508.02157v1","authors":"[\"Tom Fischer\",\"Xiaojie Zhang\",\"Eddy Ilg\"]","published":"2025-08-04T07:57:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":611345,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886767,"paper_url":"https://arxiv.org/abs/2508.02157","paper_title":"Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes","repo_url":"https://github.com/Fischer-Tom/unified-detection-and-pose-estimation","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
