Generalized Policy Gradient with History-Aware Decision Transformer for Reliable Routing over Graph Signals

cs.LG arXiv:2508.17218
View PDF arXiv JSON

Abstract

Reliable path planning in stochastic transportation networks requires decisions that account for uncertain and correlated travel times on irregular road graphs, rather than only minimizing expected delay. Such networks exhibit strong spatial-temporal coupling, where link travel times evolve as stochastic processes over graph edges, making the problem inherently sequential under uncertainty. Existing stochastic on-time arrival (SOTA) methods primarily depend on the current node and remaining budget, which limits their ability to exploit trajectory-level temporal structure and history-dependent correlations. This work proposes GPG-HT, a history-aware graph-signal policy framework that integrates a Decision Transformer with generalized policy gradient optimization for reliable routing. By attending to historical node-edge-time observations, GPG-HT captures non-Markovian spatial-temporal dependencies and enables context-aware decision making under uncertainty. Experiments on the Sioux Falls and Anaheim networks demonstrate consistent gains in on-time arrival probability over representative optimization and reinforcement learning baselines.

PDF Viewer