{"ID":2880808,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14027","arxiv_id":"2508.14027","title":"Learning from Preferences and Mixed Demonstrations in General Settings","abstract":"Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be used instead. However, existing approaches utilising both together are often ad-hoc, rely on domain-specific properties, or won't scale. We develop a new framing for learning from human data, \\emph{reward-rational partial orderings over observations}, designed to be flexible and scalable. Based on this we introduce a practical algorithm, LEOPARD: Learning Estimated Objectives from Preferences And Ranked Demonstrations. LEOPARD can learn from a broad range of data, including negative demonstrations, to efficiently learn reward functions across a wide range of domains. We find that when a limited amount of preference and demonstration feedback is available, LEOPARD outperforms existing baselines by a significant margin. Furthermore, we use LEOPARD to investigate learning from many types of feedback compared to just a single one, and find that combining feedback types is often beneficial.","short_abstract":"Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be used instead. However, existing approaches utilising both together are often ad-ho...","url_abs":"https://arxiv.org/abs/2508.14027","url_pdf":"https://arxiv.org/pdf/2508.14027v1","authors":"[\"Jason R Brown\",\"Carl Henrik Ek\",\"Robert D Mullins\"]","published":"2025-08-19T17:37:35Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
