{"ID":2846691,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01554","arxiv_id":"2511.01554","title":"Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning","abstract":"Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide \\textit{whether} to communicate, not \\textit{how precisely}. Learning to optimize message precision at the bit-level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate \\textit{how} agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the \\enquote{Bitter Lesson} in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.","short_abstract":"Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide \\textit{whether} to communicate, not \\textit{how precisely}. Learning to optimize message precision at the bit-le...","url_abs":"https://arxiv.org/abs/2511.01554","url_pdf":"https://arxiv.org/pdf/2511.01554v1","authors":"[\"Aditya Kapoor\",\"Yash Bhisikar\",\"Benjamin Freed\",\"Jan Peters\",\"Mingfei Sun\"]","published":"2025-11-03T13:16:57Z","proceeding":"cs.MA","tasks":"[\"cs.MA\",\"cs.IT\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Transformer\"]","has_code":false}
