{"ID":560765,"CreatedAt":"2026-03-04T20:59:09Z","UpdatedAt":"2026-03-04T20:59:09Z","DeletedAt":null,"paper_url":"https://paperswithcode.com/paper/foundationpose-unified-6d-pose-estimation-and","arxiv_id":"2312.08344","title":"FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects","abstract":"We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicate our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions. 
Project page: https://nvlabs.github.io/FoundationPose/","short_abstract":"We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.","url_abs":"https://arxiv.org/abs/2312.08344v2","url_pdf":"https://arxiv.org/pdf/2312.08344v2.pdf","authors":"[\"Bowen Wen\", \"Wei Yang\", \"Jan Kautz\", \"Stan Birchfield\"]","published":"2023-12-13T00:00:00Z","proceeding":"CVPR 2024 1","tasks":"[\"3D Object Detection\", \"3D Object Tracking\", \"6D Pose Estimation\", \"6D Pose Estimation using RGB\", \"Contrastive Learning\", \"Language Modeling\", \"Language Modelling\", \"Large Language Model\", \"Novel View Synthesis\", \"Pose Estimation\", \"Pose Tracking\"]","methods":"[\"Contrastive Learning\"]","conference_url_abs":"http://openaccess.thecvf.com//content/CVPR2024/html/Wen_FoundationPose_Unified_6D_Pose_Estimation_and_Tracking_of_Novel_Objects_CVPR_2024_paper.html","conference_url_pdf":"http://openaccess.thecvf.com//content/CVPR2024/papers/Wen_FoundationPose_Unified_6D_Pose_Estimation_and_Tracking_of_Novel_Objects_CVPR_2024_paper.pdf","has_code":false,"code_links":[{"ID":301934,"CreatedAt":"2026-03-04T21:00:12Z","UpdatedAt":"2026-03-04T21:00:12Z","DeletedAt":null,"paper_id":560765,"paper_url":"https://paperswithcode.com/paper/foundationpose-unified-6d-pose-estimation-and","paper_title":"FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects","repo_url":"https://github.com/NVlabs/FoundationPose","is_official":true,"mentioned_in_paper":false,"mentioned_in_github":false,"framework":"pytorch","github_stars":0}]}