{"ID":2922164,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T17:44:34.312992241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00838","arxiv_id":"2606.00838","title":"Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications","abstract":"Inductive generalization is a framework for reinforcement learning (RL) generalization in which inductively related task instances admit inductively related policies. Prior work captures this structure via a higher-order policy-evolution function learned directly with RL, but suffers from poor training scalability: as training tasks grow, aggregated reward feedback becomes noisy and conflicting, destabilizing training and weakening generalization. We propose DIBS, a decoupled behavioral cloning approach that separates learning task-specific policies from learning the evolution function. We first learn individual teacher policies per task via standard RL, then fit the evolution function via behavioral cloning on teacher-labeled state-action pairs. This replaces noisy reward aggregation with dense, stable supervision. DIBS achieves significant improvements in both training stability and zero-shot generalization against existing RL and meta-RL algorithms.","short_abstract":"Inductive generalization is a framework for reinforcement learning (RL) generalization in which inductively related task instances admit inductively related policies. Prior work captures this structure via a higher-order policy-evolution function learned directly with RL, but suffers from poor training scalability: as...","url_abs":"https://arxiv.org/abs/2606.00838","url_pdf":"https://arxiv.org/pdf/2606.00838v1","authors":"[\"Vignesh Subramanian\",\"Subhajit Roy\",\"Suguman Bansal\"]","published":"2026-05-30T18:26:59Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
