{"ID":2848505,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25348","arxiv_id":"2510.25348","title":"Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction","abstract":"Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation--random cascade-based splits allow models to access future information, yielding unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits more practical applications; (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions--capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions--a practical task critical for real-world applications.","short_abstract":"Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation--random cascade-based splits allow models to access future information, yielding unrealistic res...","url_abs":"https://arxiv.org/abs/2510.25348","url_pdf":"https://arxiv.org/pdf/2510.25348v2","authors":"[\"Jie Peng\",\"Rui Wang\",\"Qiang Wang\",\"Zhewei Wei\",\"Bin Tong\",\"Guan Wang\",\"Bo Zheng\"]","published":"2025-10-29T10:06:08Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.SI\"]","methods":"[\"Diffusion Model\"]","has_code":false}
