{"ID":2857272,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.08928","arxiv_id":"2510.08928","title":"LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition","abstract":"Existing benchmarks for large multimodal models (LMMs) often fail to capture their performance in real-time, adversarial environments. We introduce LM Fight Arena (Large Model Fight Arena), a novel framework that evaluates LMMs by pitting them against each other in the classic fighting game Mortal Kombat II, a task requiring rapid visual understanding and tactical, sequential decision-making. In a controlled tournament, we test six leading open- and closed-source models, where each agent operates controlling the same character to ensure a fair comparison. The models are prompted to interpret game frames and state data to select their next actions. Unlike static evaluations, LM Fight Arena provides a fully automated, reproducible, and objective assessment of an LMM's strategic reasoning capabilities in a dynamic setting. This work introduces a challenging and engaging benchmark that bridges the gap between AI evaluation and interactive entertainment.","short_abstract":"Existing benchmarks for large multimodal models (LMMs) often fail to capture their performance in real-time, adversarial environments. We introduce LM Fight Arena (Large Model Fight Arena), a novel framework that evaluates LMMs by pitting them against each other in the classic fighting game Mortal Kombat II, a task req...","url_abs":"https://arxiv.org/abs/2510.08928","url_pdf":"https://arxiv.org/pdf/2510.08928v1","authors":"[\"Yushuo Zheng\",\"Zicheng Zhang\",\"Xiongkuo Min\",\"Huiyu Duan\",\"Guangtao Zhai\"]","published":"2025-10-10T02:19:21Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[]","has_code":false}
