{"ID":2874400,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03790","arxiv_id":"2509.03790","title":"What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?","abstract":"Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs $Ω(|\\mathcal{S}||\\mathcal{A}|/p)$ samples to recover rewards. We introduce Policy-Aware Matrix Completion (PAMC) as a first concrete step toward a structural reward learning framework. Our key idea is to exploit approximate low-rank + sparse structure in the reward matrix, under policy-biased (MNAR) sampling. We prove recovery guarantees with inverse-propensity weighting, and establish a visitation-weighted error-to-regret bound linking completion error to control performance. Importantly, when assumptions weaken, PAMC degrades gracefully: confidence intervals widen and the algorithm abstains, ensuring safe fallback to exploration. Empirically, PAMC improves sample efficiency across Atari-26 (10M steps), DM Control, MetaWorld MT50, D4RL offline RL, and preference-based RL benchmarks, outperforming DrQ-v2, DreamerV3, Agent57, T-REX/D-REX, and PrefPPO under compute-normalized comparisons. Our results highlight PAMC as a practical and principled tool when structural rewards exist, and as a concrete first instantiation of a broader structural reward learning perspective.","short_abstract":"Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs $Ω(|\\mathcal{S}||\\mathcal{A}|/p)$ samples to recover rewards. We introduce Policy-Aware Matrix Completion (PAMC) as a first concrete step toward a structural reward learning framework. Our key idea is to exploit app...","url_abs":"https://arxiv.org/abs/2509.03790","url_pdf":"https://arxiv.org/pdf/2509.03790v2","authors":"[\"Ibne Farabi Shihab\",\"Sanjeda Akter\",\"Anuj Sharma\"]","published":"2025-09-04T00:53:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
