{"ID":2824064,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.24325","arxiv_id":"2512.24325","title":"MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems","abstract":"Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.","short_abstract":"Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencie...","url_abs":"https://arxiv.org/abs/2512.24325","url_pdf":"https://arxiv.org/pdf/2512.24325v1","authors":"[\"Wan Jiang\",\"Xinyi Zang\",\"Yudong Zhao\",\"Yusi Zou\",\"Yunfei Lu\",\"Junbo Tong\",\"Yang Liu\",\"Ming Li\",\"Jiani Shi\",\"Xin Yang\"]","published":"2025-12-30T16:27:41Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.LG\",\"cs.MA\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
