{"ID":2842150,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10074","arxiv_id":"2511.10074","title":"VLF-MSC: Vision-Language Feature-Based Multimodal Semantic Communication System","abstract":"We propose Vision-Language Feature-based Multimodal Semantic Communication (VLF-MSC), a unified system that transmits a single compact vision-language representation to support both image and text generation at the receiver. Unlike existing semantic communication techniques that process each modality separately, VLF-MSC employs a pre-trained vision-language model (VLM) to encode the source image into a vision-language semantic feature (VLF), which is transmitted over the wireless channel. At the receiver, a decoder-based language model and a diffusion-based image generator are both conditioned on the VLF to produce a descriptive text and a semantically aligned image. This unified representation eliminates the need for modality-specific streams or retransmissions, improving spectral efficiency and adaptability. By leveraging foundation models, the system achieves robustness to channel noise while preserving semantic fidelity. Experiments demonstrate that VLF-MSC outperforms text-only and image-only baselines, achieving higher semantic accuracy for both modalities under low SNR with significantly reduced bandwidth.","short_abstract":"We propose Vision-Language Feature-based Multimodal Semantic Communication (VLF-MSC), a unified system that transmits a single compact vision-language representation to support both image and text generation at the receiver. Unlike existing semantic communication techniques that process each modality separately, VLF-MS...","url_abs":"https://arxiv.org/abs/2511.10074","url_pdf":"https://arxiv.org/pdf/2511.10074v1","authors":"[\"Gwangyeon Ahn\",\"Jiwan Seo\",\"Joonhyuk Kang\"]","published":"2025-11-13T08:29:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"eess.SY\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}