{"ID":3083840,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T05:32:54.120957816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05774","arxiv_id":"2606.05774","title":"LiAuto-GeoX: Efficient Grounded Driving Transformer","abstract":"Dense 3D reconstruction has demonstrated immense potential for spatial understanding, yet its viability as a real-time, onboard representation for autonomous driving remains an open challenge. Existing large-scale visual geometry models typically require substantial computational resources and lack the long-range geometric fidelity, surround-view consistency, and real-time efficiency demanded by dynamic driving environments. To bridge this gap, we present \\textbf{LiAuto-GeoX}, an efficient grounded driving transformer designed for deployable, ego-centric 3D scene understanding. Our approach begins by learning a high-capacity driving geometry model from large-scale surround-view data, utilizing sparse LiDAR priors to provide robust geometric grounding in distant, ambiguous, or structure-sparse regions. We then instantiate this capability into a highly compact 155M-parameter onboard model through a novel geometry-preserving distillation framework. This framework employs mask-guided depth-aware distillation to retain fine-grained metric structures by emphasizing geometrically informative regions, and relative-pose relational distillation to enforce cross-view spatial consistency through pose-induced geometric relations. Extensive evaluations reveal that \\textbf{LiAuto-GeoX} runs at 220 FPS on KITTI while maintaining high-fidelity dense reconstruction, enabling real-time deployment. The learned geometry transfers seamlessly to downstream autonomy tasks, achieving 90.6 PDMS in trajectory prediction, 24.63 mIoU in occupancy prediction, and 47.67 IoU in future-frame prediction. These all demonstrate that efficient dense 3D reconstruction can transcend its traditional role as a perception target to serve as a scalable, foundational geometric representation for next-generation autonomous driving.","short_abstract":"Dense 3D reconstruction has demonstrated immense potential for spatial understanding, yet its viability as a real-time, onboard representation for autonomous driving remains an open challenge. Existing large-scale visual geometry models typically require substantial computational resources and lack the long-range geome...","url_abs":"https://arxiv.org/abs/2606.05774","url_pdf":"https://arxiv.org/pdf/2606.05774v1","authors":"[\"Jiawei Lian\",\"Haoyi Sun\",\"Yang Wu\",\"Lifu Mu\",\"Siyuan Wang\",\"Le Hui\",\"Ning Mao\",\"Tao Wei\",\"Pan Zhou\",\"Kun Zhan\",\"Jian Yang\"]","published":"2026-06-04T06:58:44Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}
