{"ID":3049897,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T15:44:26.945507316Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05143","arxiv_id":"2606.05143","title":"HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling","abstract":"Scaling robust robot policies requires more than broader randomization, because physical-domain experience must remain organized and learnable throughout training. We study when a policy can benefit from harder physics and identify recoverability as a central constraint in on-policy physical-domain scaling. In on-policy training, new dynamics are useful only insofar as they remain close enough to the current policy to generate corrective on-policy data, rather than collapsing rollouts into unrecoverable failures. Using quadruped locomotion as a physically demanding benchmark for embodied generalization, we introduce HORIZON, a checkpointed frontier curriculum that expands physical domains only within the current policy's recoverable boundary. HORIZON uses rollback and boundary refinement to govern each expansion step, turning fixed randomization into a continual process of physical-domain growth. Experiments reveal three regularities of physical-domain expansion. First, direct domain widening is uneven across physical axes and often unlearnable without staged ordering. Second, domain composition is non-monotonic, and adding more domains beyond a compact core can dilute recoverable joint samples and reduce overall robustness. Third, offline distillation of isolated experts cannot substitute for the joint interaction generated by on-policy curriculum. Together, these results frame physical-domain generalization as a continual growth problem for embodied control, with recoverability as the organizing principle for on-policy expansion.","short_abstract":"Scaling robust robot policies requires more than broader randomization, because physical-domain experience must remain organized and learnable throughout training. We study when a policy can benefit from harder physics and identify recoverability as a central constraint in on-policy physical-domain scaling. In on-polic...","url_abs":"https://arxiv.org/abs/2606.05143","url_pdf":"https://arxiv.org/pdf/2606.05143v1","authors":"[\"Chenhao Bai\",\"Liqin Lu\",\"Kaijun Wang\",\"Hui Chen\",\"Jin-Chuan Shi\",\"Yuyang Liu\",\"Hao Chen\",\"Chunhua Shen\"]","published":"2026-06-03T17:50:02Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
