{"ID":2866421,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19853","arxiv_id":"2509.19853","title":"SAGE:State-Aware Guided End-to-End Policy for Multi-Stage Sequential Tasks via Hidden Markov Decision Process","abstract":"Multi-stage sequential (MSS) robotic manipulation tasks are prevalent and crucial in robotics. They often involve state ambiguity, where visually similar observations correspond to different actions. We present SAGE, a state-aware guided imitation learning framework that models tasks as a Hidden Markov Decision Process (HMDP) to explicitly capture latent task stages and resolve ambiguity. We instantiate the HMDP with a state transition network that infers hidden states, and a state-aware action policy that conditions on both observations and hidden states to produce actions, thereby enabling disambiguation across task stages. To reduce manual annotation effort, we propose a semi-automatic labeling pipeline combining active learning and soft label interpolation. In real-world experiments across multiple complex MSS tasks with state ambiguity, SAGE achieved 100% task success under the standard evaluation protocol, markedly surpassing the baselines. Ablation studies further show that such performance can be maintained with manual labeling for only about 13% of the states, indicating its strong effectiveness.","short_abstract":"Multi-stage sequential (MSS) robotic manipulation tasks are prevalent and crucial in robotics. They often involve state ambiguity, where visually similar observations correspond to different actions. We present SAGE, a state-aware guided imitation learning framework that models tasks as a Hidden Markov Decision Process...","url_abs":"https://arxiv.org/abs/2509.19853","url_pdf":"https://arxiv.org/pdf/2509.19853v1","authors":"[\"BinXu Wu\",\"TengFei Zhang\",\"Chen Yang\",\"JiaHao Wen\",\"HaoCheng Li\",\"JingTian Ma\",\"Zhen Chen\",\"JingYuan Wang\"]","published":"2025-09-24T07:52:08Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[]","has_code":false}
