{"ID":2868164,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17092","arxiv_id":"2509.17092","title":"On the Limits of Tabular Hardness Metrics for Deep RL: A Study with the Pharos Benchmark","abstract":"Principled evaluation is critical for progress in deep reinforcement learning (RL), yet it lags behind the theory-driven benchmarks of tabular RL. While tabular settings benefit from well-understood hardness measures like MDP diameter and suboptimality gaps, deep RL benchmarks are often chosen based on intuition and popularity. This raises a critical question: can tabular hardness metrics be adapted to guide non-tabular benchmarking? We investigate this question and reveal a fundamental gap. Our primary contribution is demonstrating that the difficulty of non-tabular environments is dominated by a factor that tabular metrics ignore: representation hardness. The same underlying MDP can pose vastly different challenges depending on whether the agent receives state vectors or pixel-based observations. To enable this analysis, we introduce \\texttt{pharos}, a new open-source library for principled RL benchmarking that allows for systematic control over both environment structure and agent representations. Our extensive case study using \\texttt{pharos} shows that while tabular metrics offer some insight, they are poor predictors of deep RL agent performance on their own. This work highlights the urgent need for new, representation-aware hardness measures and positions \\texttt{pharos} as a key tool for developing them.","short_abstract":"Principled evaluation is critical for progress in deep reinforcement learning (RL), yet it lags behind the theory-driven benchmarks of tabular RL. While tabular settings benefit from well-understood hardness measures like MDP diameter and suboptimality gaps, deep RL benchmarks are often chosen based on intuition and po...","url_abs":"https://arxiv.org/abs/2509.17092","url_pdf":"https://arxiv.org/pdf/2509.17092v1","authors":"[\"Michelangelo Conserva\",\"Remo Sasso\",\"Paulo Rauber\"]","published":"2025-09-21T14:14:10Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
