{"ID":2871560,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12269","arxiv_id":"2509.12269","title":"Research on Short-Video Platform User Decision-Making via Multimodal Temporal Modeling and Reinforcement Learning","abstract":"This paper proposes the MT-DQN model, which integrates a Transformer, Temporal Graph Neural Network (TGNN), and Deep Q-Network (DQN) to address the challenges of predicting user behavior and optimizing recommendation strategies in short-video environments. Experiments demonstrated that MT-DQN consistently outperforms traditional concatenated models, such as Concat-Modal, achieving an average F1-score improvement of 10.97% and an average NDCG@5 improvement of 8.3%. Compared to the classic reinforcement learning model Vanilla-DQN, MT-DQN reduces MSE by 34.8% and MAE by 26.5%. Nonetheless, we also recognize challenges in deploying MT-DQN in real-world scenarios, such as its computational cost and latency sensitivity during online inference, which will be addressed through future architectural optimization.","short_abstract":"This paper proposes the MT-DQN model, which integrates a Transformer, Temporal Graph Neural Network (TGNN), and Deep Q-Network (DQN) to address the challenges of predicting user behavior and optimizing recommendation strategies in short-video environments. Experiments demonstrated that MT-DQN consistently outperforms t...","url_abs":"https://arxiv.org/abs/2509.12269","url_pdf":"https://arxiv.org/pdf/2509.12269v1","authors":"[\"Jinmeiyang Wang\",\"Jing Dong\",\"Li Zhou\"]","published":"2025-09-13T16:28:14Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IR\"]","methods":"[\"Graph Neural Network\",\"Reinforcement Learning\",\"Transformer\"]","has_code":false}