{"ID":2832161,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06250","arxiv_id":"2512.06250","title":"Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning","abstract":"Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study, this work demonstrates how an agent can dynamically transition between systematic exploration (coverage) and goal-directed pathfinding (convergence) to improve task performance. Unlike fixed-threshold approaches, the agent uses Q-learning to adapt switching behavior based on coverage percentage and distance to goal, requiring only minimal domain knowledge: maze dimensions and target location. The agent does not require prior knowledge of wall positions, optimal threshold values, or hand-crafted heuristics; instead, it discovers effective switching strategies dynamically during each run. The agent discretizes its state space into coverage and distance buckets, then adapts which coverage threshold (20-60\\%) to apply based on observed progress signals. Experiments across 240 test configurations (4 maze sizes from 16$\\times$16 to 128$\\times$128 $\\times$ 10 unique mazes $\\times$ 6 agent variants) demonstrate that adaptive threshold learning outperforms both single-strategy agents and fixed 40\\% threshold baselines. Results show 23-55\\% improvements in completion time, 83\\% reduction in runtime variance, and 71\\% improvement in worst-case scenarios. The learned switching behavior generalizes within each size class to unseen wall configurations. Performance gains scale with problem complexity: 23\\% improvement for 16$\\times$16 mazes, 34\\% for 32$\\times$32, and 55\\% for 64$\\times$64, demonstrating that as the space of possible maze structures grows, the value of adaptive policy selection over fixed heuristics increases proportionally.","short_abstract":"Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study,...","url_abs":"https://arxiv.org/abs/2512.06250","url_pdf":"https://arxiv.org/pdf/2512.06250v1","authors":"[\"Chris Tava\"]","published":"2025-12-06T02:50:32Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
