{"ID":2867413,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19595","arxiv_id":"2509.19595","title":"Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models","abstract":"The embodiment of emotional reactions from body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily comprising descriptions that focus on the salient body parts involved in emotional reactions. We also employ attention maps and observe that contemporary models exhibit a persistent bias towards the facial region. Despite this limitation, we observe that our employed framework can effectively recognize embodied emotions in face-masked images, outperforming baselines without any fine-tuning. ELENA opens a new trajectory for embodied emotion analysis across the modality of vision and enriches modeling in an affect-aware setting.","short_abstract":"The embodiment of emotional reactions from body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily...","url_abs":"https://arxiv.org/abs/2509.19595","url_pdf":"https://arxiv.org/pdf/2509.19595v1","authors":"[\"Mohammad Saim\",\"Phan Anh Duong\",\"Cat Luong\",\"Aniket Bhanderi\",\"Tianyu Jiang\"]","published":"2025-09-23T21:34:57Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
