{"ID":2879320,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16147","arxiv_id":"2508.16147","title":"Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction","abstract":"Social Media Popularity Prediction is a complex multimodal task that requires effective integration of images, text, and structured information. However, current approaches suffer from inadequate visual-textual alignment and fail to capture the inherent cross-content correlations and hierarchical patterns in social media data. To overcome these limitations, we establish a multi-class framework, introducing hierarchical prototypes for structural enhancement and contrastive learning for improved vision-text alignment. Furthermore, we propose a feature-enhanced framework integrating dual-grained prompt learning and cross-modal attention mechanisms, achieving precise multimodal representation through fine-grained category modeling. Experimental results demonstrate state-of-the-art performance on benchmark metrics, establishing new reference standards for multimodal social media analysis.","short_abstract":"Social Media Popularity Prediction is a complex multimodal task that requires effective integration of images, text, and structured information. However, current approaches suffer from inadequate visual-textual alignment and fail to capture the inherent cross-content correlations and hierarchical patterns in social med...","url_abs":"https://arxiv.org/abs/2508.16147","url_pdf":"https://arxiv.org/pdf/2508.16147v1","authors":"[\"Ao Zhou\",\"Mingsheng Tu\",\"Luping Wang\",\"Tenghao Sun\",\"Zifeng Cheng\",\"Yafeng Yin\",\"Zhiwei Jiang\",\"Qing Gu\"]","published":"2025-08-22T07:16:47Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[]","has_code":false}