{"ID":2894223,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.10960","arxiv_id":"2507.10960","title":"Whom to Respond To? A Transformer-Based Model for Multi-Party Social Robot Interaction","abstract":"Prior human-robot interaction (HRI) research has primarily focused on single-user interactions, where robots do not need to consider the timing or recipient of their responses. However, in multi-party interactions, such as at malls and hospitals, social robots must understand the context and decide both when and to whom they should respond. In this paper, we propose a Transformer-based multi-task learning framework to improve the decision-making process of social robots, particularly in multi-user environments. Considering the characteristics of HRI, we propose two novel loss functions: one that enforces constraints on active speakers to improve scene modeling, and another that guides response selection towards utterances specifically directed at the robot. Additionally, we construct a novel multi-party HRI dataset that captures real-world complexities, such as gaze misalignment. Experimental results demonstrate that our model achieves state-of-the-art performance in respond decisions, outperforming existing heuristic-based and single-task approaches. Our findings contribute to the development of socially intelligent social robots capable of engaging in natural and context-aware multi-party interactions.","short_abstract":"Prior human-robot interaction (HRI) research has primarily focused on single-user interactions, where robots do not need to consider the timing or recipient of their responses. However, in multi-party interactions, such as at malls and hospitals, social robots must understand the context and decide both when and to who...","url_abs":"https://arxiv.org/abs/2507.10960","url_pdf":"https://arxiv.org/pdf/2507.10960v1","authors":"[\"He Zhu\",\"Ryo Miyoshi\",\"Yuki Okafuji\"]","published":"2025-07-15T03:42:14Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}
