{"ID":2888663,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.22412","arxiv_id":"2507.22412","title":"UAVScenes: A Multi-Modal Dataset for UAVs","abstract":"Multi-modal perception is essential for unmanned aerial vehicle (UAV) operations, as it enables a comprehensive understanding of the UAVs' surrounding environment. However, most existing multi-modal UAV datasets are primarily biased toward localization and 3D reconstruction tasks, or only support map-level semantic segmentation due to the lack of frame-wise annotations for both camera images and LiDAR point clouds. This limitation prevents them from being used for high-level scene understanding tasks. To address this gap and advance multi-modal UAV perception, we introduce UAVScenes, a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well-calibrated multi-modal UAV dataset MARS-LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually labeled semantic annotations for both frame-wise images and LiDAR point clouds, along with accurate 6-degree-of-freedom (6-DoF) poses. These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis (NVS). Our dataset is available at https://github.com/sijieaaa/UAVScenes","short_abstract":"Multi-modal perception is essential for unmanned aerial vehicle (UAV) operations, as it enables a comprehensive understanding of the UAVs' surrounding environment. However, most existing multi-modal UAV datasets are primarily biased toward localization and 3D reconstruction tasks, or only support map-level semantic seg...","url_abs":"https://arxiv.org/abs/2507.22412","url_pdf":"https://arxiv.org/pdf/2507.22412v1","authors":"[\"Sijie Wang\",\"Siqi Li\",\"Yawei Zhang\",\"Shangshu Yu\",\"Shenghai Yuan\",\"Rui She\",\"Quanjiang Guo\",\"JinXuan Zheng\",\"Ong Kang Howe\",\"Leonrich Chandra\",\"Shrivarshann Srijeyan\",\"Aditya Sivadas\",\"Toshan Aggarwal\",\"Heyuan Liu\",\"Hongming Zhang\",\"Chujie Chen\",\"Junyu Jiang\",\"Lihua Xie\",\"Wee Peng Tay\"]","published":"2025-07-30T06:29:52Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":611561,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2888663,"paper_url":"https://arxiv.org/abs/2507.22412","paper_title":"UAVScenes: A Multi-Modal Dataset for UAVs","repo_url":"https://github.com/sijieaaa/UAVScenes","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}