{"ID":2855074,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13310","arxiv_id":"2510.13310","title":"InstantSfM: Towards GPU-Native SfM for the Deep Learning Era","abstract":"Structure-from-Motion (SfM) is a fundamental technique for recovering camera poses and scene structure from multi-view imagery, serving as a critical upstream component for applications ranging from 3D reconstruction to modern neural scene representations such as 3D Gaussian Splatting. However, most mature SfM systems remain CPU-centric and built upon traditional optimization toolchains, creating a growing mismatch with modern GPU-based, learning-driven pipelines and limiting scalability in large-scale scenes. While recent advances in GPU-accelerated bundle adjustment (BA) have demonstrated the potential of parallel sparse optimization, extending these techniques to build a complete global SfM system remains challenging due to unresolved issues in metric scale recovery and numerical robustness. In this paper, we implement a fully GPU-based and PyTorch-compatible global SfM system, named InstantSfM, to integrate seamlessly with modern learning pipelines. InstantSfM embeds metric depth priors directly into both global positioning and BA through a depth-constrained Jacobian structure, thereby resolving scale ambiguity within the optimization framework. To ensure numerical stability, we employ explicit filtering of under-constrained variables for the Jacobian matrix in an optimized GPU-friendly manner. Extensive experiments on diverse datasets demonstrate that InstantSfM achieves state-of-the-art efficiency while maintaining reconstruction accuracy comparable to both established classical pipelines and recent learning-based methods, showing up to ${\\sim40\\times}$ speedup over COLMAP on large-scale scenes.","short_abstract":"Structure-from-Motion (SfM) is a fundamental technique for recovering camera poses and scene structure from multi-view imagery, serving as a critical upstream component for applications ranging from 3D reconstruction to modern neural scene representations such as 3D Gaussian Splatting. However, most mature SfM systems...","url_abs":"https://arxiv.org/abs/2510.13310","url_pdf":"https://arxiv.org/pdf/2510.13310v2","authors":"[\"Jiankun Zhong\",\"Zitong Zhan\",\"Quankai Gao\",\"Ziyu Chen\",\"Haozhe Lou\",\"Jiageng Mao\",\"Ulrich Neumann\",\"Chen Wang\",\"Yue Wang\"]","published":"2025-10-15T08:58:05Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
