{"ID":2879599,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16807","arxiv_id":"2508.16807","title":"Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach","abstract":"Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.","short_abstract":"Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-po...","url_abs":"https://arxiv.org/abs/2508.16807","url_pdf":"https://arxiv.org/pdf/2508.16807v2","authors":"[\"Marco S. Tayar\",\"Lucas K. de Oliveira\",\"Felipe Andrade G. Tommaselli\",\"Juliano D. Negri\",\"Thiago H. Segreto\",\"Ricardo V. Godoy\",\"Marcelo Becker\"]","published":"2025-08-22T21:29:59Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.LG\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}