{"ID":2862943,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26536","arxiv_id":"2509.26536","title":"OceanGym: A Benchmark Environment for Underwater Embodied Agents","abstract":"We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility, dynamic ocean currents, making effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.","short_abstract":"We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility, dynam...","url_abs":"https://arxiv.org/abs/2509.26536","url_pdf":"https://arxiv.org/pdf/2509.26536v2","authors":"[\"Yida Xue\",\"Mingjun Mao\",\"Xiangyuan Ru\",\"Yuqi Zhu\",\"Baochang Ren\",\"Shuofei Qiao\",\"Mengru Wang\",\"Shumin Deng\",\"Xinyu An\",\"Ningyu Zhang\",\"Ying Chen\",\"Huajun Chen\"]","published":"2025-09-30T17:09:32Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CV\",\"cs.LG\",\"cs.RO\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608960,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862943,"paper_url":"https://arxiv.org/abs/2509.26536","paper_title":"OceanGym: A Benchmark Environment for Underwater Embodied Agents","repo_url":"https://github.com/OceanGPT/OceanGym","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}