{"ID":2852957,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18036","arxiv_id":"2510.18036","title":"Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware","abstract":"Deploying emotion recognition systems in real-world environments where devices must be small, low-power, and private remains a significant challenge. This is especially relevant for applications such as tension monitoring, conflict de-escalation, and responsive wearables, where cloud-based solutions are impractical. Multimodal emotion recognition has advanced through deep learning, but most systems remain unsuitable for deployment on ultra-constrained edge devices. Prior work typically relies on powerful hardware, lacks real-time performance, or uses unimodal input. This paper addresses that gap by presenting a hardware-aware emotion recognition system that combines acoustic and linguistic features using a late-fusion architecture optimised for Edge TPU. The design integrates a quantised transformer-based acoustic model with frozen keyword embeddings from a DSResNet-SE network, enabling real-time inference within a 1.8MB memory budget and 21-23ms latency. The pipeline ensures spectrogram alignment between training and deployment using MicroFrontend and MLTK. Evaluation on re-recorded, segmented IEMOCAP samples captured through the Coral Dev Board Micro microphone shows a 6.3% macro F1 improvement over unimodal baselines. This work demonstrates that accurate, real-time multimodal emotion inference is achievable on microcontroller-class edge platforms through task-specific fusion and hardware-guided model design.","short_abstract":"Deploying emotion recognition systems in real-world environments where devices must be small, low-power, and private remains a significant challenge. This is especially relevant for applications such as tension monitoring, conflict de-escalation, and responsive wearables, where cloud-based solutions are impractical. Mu...","url_abs":"https://arxiv.org/abs/2510.18036","url_pdf":"https://arxiv.org/pdf/2510.18036v1","authors":"[\"Stavros Mitsis\",\"Ermos Hadjikyriakos\",\"Humaid Ibrahim\",\"Savvas Neofytou\",\"Shashwat Raman\",\"James Myles\",\"Eiman Kanjo\"]","published":"2025-10-20T19:18:22Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.LG\",\"eess.AS\"]","methods":"[\"Transformer\"]","has_code":false}
