{"ID":2857548,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09325","arxiv_id":"2510.09325","title":"Rate optimal learning of equilibria from data","abstract":"We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation concentrability coefficient as the fundamental complexity measure, and we show that Behavior Cloning (BC) is rate-optimal. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. It improves the best previously known sample complexity from $\\mathcal{O}(\\varepsilon^{-8})$ to $\\mathcal{O}(\\varepsilon^{-2}),$ matching the dependence on $\\varepsilon$ implied by our lower bound. Finally, we provide numerical results that support our theory and illustrate, in environments such as grid worlds, where Behavior Cloning fails to learn.","short_abstract":"We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation con...","url_abs":"https://arxiv.org/abs/2510.09325","url_pdf":"https://arxiv.org/pdf/2510.09325v1","authors":"[\"Till Freihaut\",\"Luca Viano\",\"Emanuele Nevali\",\"Volkan Cevher\",\"Matthieu Geist\",\"Giorgia Ramponi\"]","published":"2025-10-10T12:28:35Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
