{"ID":2845949,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03828","arxiv_id":"2511.03828","title":"From Static Constraints to Dynamic Adaptation: Sample-Level Constraint Relaxation for Offline-to-Online Reinforcement Learning","abstract":"Offline-to-online reinforcement learning (O2O RL) faces a central challenge between retaining offline conservatism and adapting to online feedback under distribution shift. This challenge arises because data behavior evolves during fine-tuning, rendering data origin a misleading basis for constraint handling and thereby leading to objective-data mismatch. We therefore propose Dynamic Alignment for RElaxation (DARE), a distribution-aware framework for sample-level constraint relaxation based on the behavioral consistency with a behavior model. To our knowledge, DARE is the first to condition constraint relaxation on behavioral consistency via a posterior-induced exchange mechanism, moving beyond a binary offline/online data distinction. Importantly, DARE requires only per-sample behavioral alignment, enabling instantiation on top of many offline algorithms with flexible choices of behavior models and fine-tuning objectives. We provide a theoretical analysis showing that behavior-based sample exchange consistently improves the distinction between offline-like and online-like subsets. Experiments on D4RL demonstrate that DARE consistently improves fine-tuning stability and achieves superior final performance over strong offline-to-online baselines. (The code is publicly available at \\url{https://github.com/lpzu/DARE}.)","short_abstract":"Offline-to-online reinforcement learning (O2O RL) faces a central challenge between retaining offline conservatism and adapting to online feedback under distribution shift. This challenge arises because data behavior evolves during fine-tuning, rendering data origin a misleading basis for constraint handling and thereb...","url_abs":"https://arxiv.org/abs/2511.03828","url_pdf":"https://arxiv.org/pdf/2511.03828v3","authors":"[\"Lipeng Zu\",\"Yu Qian\",\"Shayok Chakraborty\",\"Xiaonan Zhang\"]","published":"2025-11-05T19:48:46Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":607401,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2845949,"paper_url":"https://arxiv.org/abs/2511.03828","paper_title":"From Static Constraints to Dynamic Adaptation: Sample-Level Constraint Relaxation for Offline-to-Online Reinforcement Learning","repo_url":"https://github.com/lpzu/DARE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
