{"ID":2864221,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23733","arxiv_id":"2509.23733","title":"FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention","abstract":"In this paper, we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce an Alternative Hierarchical Attention (AHA) mechanism that efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs using HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: https://3f7dfc.github.io/FastVidar/","short_abstract":"In this paper, we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce an Alternative Hierarchical Attention (AHA) mechanism that efficiently fuse...","url_abs":"https://arxiv.org/abs/2509.23733","url_pdf":"https://arxiv.org/pdf/2509.23733v1","authors":"[\"Hangtian Zhao\",\"Xiang Chen\",\"Yizhe Li\",\"Qianhao Wang\",\"Haibo Lu\",\"Fei Gao\"]","published":"2025-09-28T08:25:27Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[]","has_code":false}
