{"ID":2874978,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03212","arxiv_id":"2509.03212","title":"AIVA: An AI-based Virtual Companion for Emotion-aware Interaction","abstract":"Recent advances in Large Language Models (LLMs) have significantly improved natural language understanding and generation, enhancing Human-Computer Interaction (HCI). However, LLMs are limited to unimodal text processing and lack the ability to interpret emotional cues from non-verbal signals, hindering more immersive and empathetic interactions. This work explores integrating multimodal sentiment perception into LLMs to create emotion-aware agents. We propose \\ours, an AI-based virtual companion that captures multimodal sentiment cues, enabling emotionally aligned and animated HCI. \\ours introduces a Multimodal Sentiment Perception Network (MSPN) using a cross-modal fusion transformer and supervised contrastive learning to provide emotional cues. Additionally, we develop an emotion-aware prompt engineering strategy for generating empathetic responses and integrate a Text-to-Speech (TTS) system and animated avatar module for expressive interactions. \\ours provides a framework for emotion-aware agents with applications in companion robotics, social care, mental health, and human-centered AI.","short_abstract":"Recent advances in Large Language Models (LLMs) have significantly improved natural language understanding and generation, enhancing Human-Computer Interaction (HCI). However, LLMs are limited to unimodal text processing and lack the ability to interpret emotional cues from non-verbal signals, hindering more immersive...","url_abs":"https://arxiv.org/abs/2509.03212","url_pdf":"https://arxiv.org/pdf/2509.03212v1","authors":"[\"Chenxi Li\"]","published":"2025-09-03T11:00:46Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}