{"ID":2885583,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06553","arxiv_id":"2508.06553","title":"Static and Plugged: Make Embodied Evaluation Simple","abstract":"Embodied intelligence is advancing rapidly, driving the need for efficient evaluation. Current benchmarks typically rely on interactive simulated environments or real-world setups, which are costly, fragmented, and hard to scale. To address this, we introduce StaticEmbodiedBench, a plug-and-play benchmark that enables unified evaluation using static scene representations. Covering 42 diverse scenarios and 8 core dimensions, it supports scalable and comprehensive assessment through a simple interface. Furthermore, we evaluate 19 Vision-Language Models (VLMs) and 11 Vision-Language-Action models (VLAs), establishing the first unified static leaderboard for embodied intelligence. We also release a subset of 200 samples from our benchmark to accelerate the development of embodied intelligence.","short_abstract":"Embodied intelligence is advancing rapidly, driving the need for efficient evaluation. Current benchmarks typically rely on interactive simulated environments or real-world setups, which are costly, fragmented, and hard to scale. To address this, we introduce StaticEmbodiedBench, a plug-and-play benchmark that enables...","url_abs":"https://arxiv.org/abs/2508.06553","url_pdf":"https://arxiv.org/pdf/2508.06553v1","authors":"[\"Jiahao Xiao\",\"Jianbo Zhang\",\"BoWen Yan\",\"Shengyu Guo\",\"Tongrui Ye\",\"Kaiwei Zhang\",\"Zicheng Zhang\",\"Xiaohong Liu\",\"Zhengxue Cheng\",\"Lei Fan\",\"Chuyi Li\",\"Guangtao Zhai\"]","published":"2025-08-06T06:42:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}