{"ID":2869688,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18162","arxiv_id":"2509.18162","title":"A Simple and Reproducible Hybrid Solver for a Truck-Drone VRP with Recharge","abstract":"We study last-mile delivery with one truck and one drone under explicit battery management: the drone flies at twice the truck speed; each sortie must satisfy an endurance budget; after every delivery the drone recharges on the truck before the next launch. We introduce a hybrid reinforcement learning (RL) solver that couples an ALNS-based truck tour (with 2/3-opt and Or-opt) with a small pointer/attention policy that schedules drone sorties. The policy decodes launch-serve-rendezvous triplets with hard feasibility masks for endurance and post-delivery recharge; a fast, exact timeline simulator enforces launch/recovery handling and computes the true makespan used by masked greedy/beam decoding. On Euclidean instances with $N{=}50$, $E{=}0.7$, and $R{=}0.1$, the method achieves an average makespan of \\textbf{5.203}$\\pm$0.093, versus \\textbf{5.349}$\\pm$0.038 for ALNS and \\textbf{5.208}$\\pm$0.124 for NN -- i.e., \\textbf{2.73\\%} better than ALNS on average and within \\textbf{0.10\\%} of NN. Per-seed, the RL scheduler never underperforms ALNS on the same instance and ties or beats NN on two of three seeds. A decomposition of the makespan shows the expected truck-wait trade-off across heuristics; the learned scheduler balances both to minimize the total completion time. We provide a config-first implementation with plotting and significance-test utilities to support replication.","short_abstract":"We study last-mile delivery with one truck and one drone under explicit battery management: the drone flies at twice the truck speed; each sortie must satisfy an endurance budget; after every delivery the drone recharges on the truck before the next launch. We introduce a hybrid reinforcement learning (RL) solver that...","url_abs":"https://arxiv.org/abs/2509.18162","url_pdf":"https://arxiv.org/pdf/2509.18162v1","authors":"[\"Meraryslan Meraliyev\",\"Cemil Turan\",\"Shirali Kadyrov\"]","published":"2025-09-17T05:18:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}