{"ID":2851612,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19372","arxiv_id":"2510.19372","title":"On the Hardness of Reinforcement Learning with Transition Look-Ahead","abstract":"We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\\ell \\geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.","short_abstract":"We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this infor...","url_abs":"https://arxiv.org/abs/2510.19372","url_pdf":"https://arxiv.org/pdf/2510.19372v2","authors":"[\"Corentin Pla\",\"Hugo Richard\",\"Marc Abeille\",\"Nadav Merlis\",\"Vianney Perchet\"]","published":"2025-10-22T08:47:18Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}