{"ID":2825159,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21543","arxiv_id":"2512.21543","title":"CEMG: Collaborative-Enhanced Multimodal Generative Recommendation","abstract":"Generative recommendation models often struggle with two key challenges: (1) the superficial integration of collaborative signals, and (2) the decoupled fusion of multimodal features. These limitations hinder the creation of a truly holistic item representation. To overcome this, we propose CEMG, a novel Collaborative-Enhanced Multimodal Generative Recommendation framework. Our approach features a Multimodal Fusion Layer that dynamically integrates visual and textual features under the guidance of collaborative signals. Subsequently, a Unified Modality Tokenization stage employs a Residual Quantization VAE (RQ-VAE) to convert this fused representation into discrete semantic codes. Finally, in the End-to-End Generative Recommendation stage, a large language model is fine-tuned to autoregressively generate these item codes. Extensive experiments demonstrate that CEMG significantly outperforms state-of-the-art baselines.","short_abstract":"Generative recommendation models often struggle with two key challenges: (1) the superficial integration of collaborative signals, and (2) the decoupled fusion of multimodal features. These limitations hinder the creation of a truly holistic item representation. To overcome this, we propose CEMG, a novel Collaborative-...","url_abs":"https://arxiv.org/abs/2512.21543","url_pdf":"https://arxiv.org/pdf/2512.21543v1","authors":"[\"Yuzhen Lin\",\"Hongyi Chen\",\"Xuanjing Chen\",\"Shaowen Wang\",\"Ivonne Xu\",\"Dongming Jiang\"]","published":"2025-12-25T07:28:35Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Language Model\",\"Variational Autoencoder\"]","has_code":false}