{"ID":2900867,"CreatedAt":"2026-06-01T05:51:17.9442275Z","UpdatedAt":"2026-06-01T06:23:29.641557848Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2605.30928","arxiv_id":"2605.30928","title":"Enhancing Human-Likeness in Reinforcement Learning Agents via Hierarchical Macro Action Quantization","abstract":"Human-like agents are a long-standing goal of artificial intelligence. Despite strong performance, most reinforcement learning (RL) agents remain reward-driven and often exhibit behaviors that differ from humans, limiting interpretability and reliability. In this work, we introduce a novel human-like RL framework that predicts action sequences closely aligned with human behaviors while maximizing rewards. Specifically, we encode human demonstrations into macro actions using a hierarchical macro action quantization approach (termed HiMAQ) consisting of two successive levels of vector quantization. The lower quantization level maps input actions to fine-grained subaction clusters, while the higher quantization level aggregates these subaction clusters into action clusters. Extensive evaluations on the D4RL benchmarks show that our hierarchical approach outperforms the non-hierarchical baseline (MAQ), achieving better human-likeness scores while maintaining comparable or better success rates than previous RL agents. The improvements generalize across integrations with various RL algorithms, namely IQL, SAC, and RLPD.","short_abstract":"Human-like agents are a long-standing goal of artificial intelligence. Despite strong performance, most reinforcement learning (RL) agents remain reward-driven and often exhibit behaviors that differ from humans, limiting interpretability and reliability. In this work, we introduce a novel human-like RL framework that...","url_abs":"https://arxiv.org/abs/2605.30928","url_pdf":"https://arxiv.org/pdf/2605.30928v1","authors":"[\"Usman Nizamani\",\"M. Shaheer Luqman\",\"Fawad Javed Fateh\",\"Ali Shah Ali\",\"Murad Popattia\",\"M. Zeeshan Zia\",\"Quoc-Huy Tran\"]","published":"2026-05-29T07:18:20Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
