{"ID":2833592,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03958","arxiv_id":"2512.03958","title":"MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation","abstract":"Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extended Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present the method of Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose the MDE module generating depth features from RGB images, to assist the decision-maker on multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method successfully increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating the state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.","short_abstract":"Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extended Vision-and-Language Navigation (VLN) to the agricultural domain, enab...","url_abs":"https://arxiv.org/abs/2512.03958","url_pdf":"https://arxiv.org/pdf/2512.03958v3","authors":"[\"Xiaobei Zhao\",\"Xingqi Lyu\",\"Xin Chen\",\"Xiang Li\"]","published":"2025-12-03T16:52:07Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[]","has_code":false,"code_links":[{"ID":606336,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2833592,"paper_url":"https://arxiv.org/abs/2512.03958","paper_title":"MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation","repo_url":"https://github.com/AlexTraveling/MDE-AgriVLN","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}