{"ID":2897316,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04621","arxiv_id":"2507.04621","title":"Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences","abstract":"6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real-time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps prioritizing semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized, which can utilize the semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.","short_abstract":"6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real-time, which is extremely challenging over resource...","url_abs":"https://arxiv.org/abs/2507.04621","url_pdf":"https://arxiv.org/pdf/2507.04621v1","authors":"[\"Yusong Zhang\",\"Yuxuan Sun\",\"Lei Guo\",\"Wei Chen\",\"Bo Ai\",\"Deniz Gunduz\"]","published":"2025-07-07T02:42:35Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.NI\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false}
