{"ID":2862477,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25727","arxiv_id":"2509.25727","title":"Boundary-to-Region Supervision for Offline Safe Reinforcement Learning","abstract":"Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry: return-to-go (RTG) serves as a flexible performance target, while cost-to-go (CTG) should represent a rigid safety boundary. This symmetric conditioning leads to unreliable constraint satisfaction, especially when encountering out-of-distribution cost trajectories. To address this, we propose Boundary-to-Region (B2R), a framework that enables asymmetric conditioning through cost signal realignment . B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories while preserving reward structures. Combined with rotary positional embeddings , it enhances exploration within the safe region. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks while achieving superior reward performance over baseline methods. This work highlights the limitations of symmetric token conditioning and establishes a new theoretical and practical approach for applying sequence models to safe RL. Our code is available at https://github.com/HuikangSu/B2R.","short_abstract":"Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry: return-to-go (RTG) serves as a fl...","url_abs":"https://arxiv.org/abs/2509.25727","url_pdf":"https://arxiv.org/pdf/2509.25727v1","authors":"[\"Huikang Su\",\"Dengyun Peng\",\"Zifeng Zhuang\",\"YuHan Liu\",\"Qiguang Chen\",\"Donglin Wang\",\"Qinghe Liu\"]","published":"2025-09-30T03:38:20Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false,"code_links":[{"ID":608899,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862477,"paper_url":"https://arxiv.org/abs/2509.25727","paper_title":"Boundary-to-Region Supervision for Offline Safe Reinforcement Learning","repo_url":"https://github.com/HuikangSu/B2R","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
