{"ID":2895866,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09016","arxiv_id":"2507.09016","title":"Enhancing RLHF with Human Gaze Modeling","abstract":"Reinforcement Learning from Human Feedback (RLHF) aligns language models with human preferences but is computationally expensive. We explore two approaches that leverage human gaze modeling to enhance RLHF: (1) gaze-aware reward models and (2) gaze-based distribution of sparse rewards at token level. Our experiments demonstate that gaze-informed RLHF achieves faster convergence while maintaining or slightly improving performance, thus, reducing computational costs during policy optimization. These results show that human gaze provides a valuable and underused signal for policy optimization, pointing to a promising direction for improving RLHF efficiency.","short_abstract":"Reinforcement Learning from Human Feedback (RLHF) aligns language models with human preferences but is computationally expensive. We explore two approaches that leverage human gaze modeling to enhance RLHF: (1) gaze-aware reward models and (2) gaze-based distribution of sparse rewards at token level. Our experiments de...","url_abs":"https://arxiv.org/abs/2507.09016","url_pdf":"https://arxiv.org/pdf/2507.09016v2","authors":"[\"Karim Galliamov\",\"Ivan Titov\",\"Ilya Pershin\"]","published":"2025-07-11T20:49:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Language Model\",\"RLHF\"]","has_code":false}
