{"ID":2888520,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.00106","arxiv_id":"2508.00106","title":"Hyperproperty-Constrained Secure Reinforcement Learning","abstract":"Hyperproperties for Time Window Temporal Logic (HyperTWTL) is a domain-specific formal specification language known for its effectiveness in compactly representing security, opacity, and concurrency properties for robotics applications. This paper focuses on HyperTWTL-constrained secure reinforcement learning (SecRL). Although temporal logic-constrained safe reinforcement learning (SRL) is an evolving research problem with several existing studies, there is a significant research gap in exploring security-aware reinforcement learning (RL) using hyperproperties. Given the dynamics of an agent as a Markov Decision Process (MDP) and opacity/security constraints formalized as HyperTWTL, we propose an approach for learning security-aware optimal policies using dynamic Boltzmann softmax RL while satisfying the HyperTWTL constraints. The effectiveness and scalability of our proposed approach are demonstrated using a pick-up and delivery robotic mission case study. We also compare our results with two other baseline RL algorithms, showing that our proposed method outperforms them.","short_abstract":"Hyperproperties for Time Window Temporal Logic (HyperTWTL) is a domain-specific formal specification language known for its effectiveness in compactly representing security, opacity, and concurrency properties for robotics applications. 
This paper focuses on HyperTWTL-constrained secure reinforcement learning (SecRL)....","url_abs":"https://arxiv.org/abs/2508.00106","url_pdf":"https://arxiv.org/pdf/2508.00106v1","authors":"[\"Ernest Bonnah\",\"Luan Viet Nguyen\",\"Khaza Anuarul Hoque\"]","published":"2025-07-31T18:57:18Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"cs.LO\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}