{"ID":2883395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10934","arxiv_id":"2508.10934","title":"ViPE: Video Pose Engine for 3D Geometric Perception","abstract":"Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a key challenge. In this work, we introduce ViPE, a handy and versatile video processing engine designed to bridge this gap. ViPE efficiently estimates camera intrinsics, camera motion, and dense, near-metric depth maps from unconstrained raw videos. It is robust to diverse scenarios, including dynamic selfie videos, cinematic shots, or dashcams, and supports various camera models such as pinhole, wide-angle, and 360° panoramas. We have benchmarked ViPE on multiple benchmarks. Notably, it outperforms existing uncalibrated pose estimation baselines by 18%/50% on TUM/KITTI sequences, and runs at 3-5FPS on a single GPU for standard input resolutions. We use ViPE to annotate a large-scale collection of videos. This collection includes around 100K real-world internet videos, 1M high-quality AI-generated videos, and 2K panoramic videos, totaling approximately 96M frames -- all annotated with accurate camera poses and dense depth maps. We open-source ViPE and the annotated dataset with the hope of accelerating the development of spatial AI systems.","short_abstract":"Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a key challenge. In this work, we introduce ViPE, a handy and versatil...","url_abs":"https://arxiv.org/abs/2508.10934","url_pdf":"https://arxiv.org/pdf/2508.10934v1","authors":"[\"Jiahui Huang\",\"Qunjie Zhou\",\"Hesam Rabeti\",\"Aleksandr Korovko\",\"Huan Ling\",\"Xuanchi Ren\",\"Tianchang Shen\",\"Jun Gao\",\"Dmitry Slepichev\",\"Chen-Hsuan Lin\",\"Jiawei Ren\",\"Kevin Xie\",\"Joydeep Biswas\",\"Laura Leal-Taixe\",\"Sanja Fidler\"]","published":"2025-08-12T18:39:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.GR\",\"cs.RO\",\"eess.IV\"]","methods":"[]","has_code":false}