{"ID":2921658,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01128","arxiv_id":"2606.01128","title":"Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning","abstract":"Communication overhead is a crucial bottleneck in scalable distributed learning. While existing methods aim to efficiently utilize data points, such as Local SGD, Minibatch SGD, and their accelerated variants, they still exhibit communication-round complexity that scales with the total number of samples $N$. In this paper, we introduce Local MixVR, a distributed framework that integrates local updates with variance-reduction techniques to mitigate local noise. We show that Local MixVR is the first distributed method to eliminate the dependence of communication complexity on $N$, achieving a complexity that scales only with the number of workers $M$. In common regimes where $M\u003cO\\left(N^{1/4}\\right)$, Local MixVR outperforms the state-of-the-art Minibatch Accelerated SGD baseline, bridging a long-standing gap in distributed optimization and establishing a new paradigm for communication-efficient training.","short_abstract":"Communication overhead is a crucial bottleneck in scalable distributed learning. While existing methods aim to efficiently utilize data points, such as Local SGD, Minibatch SGD, and their accelerated variants, they still exhibit communication-round complexity that scales with the total number of samples $N$. In this pa...","url_abs":"https://arxiv.org/abs/2606.01128","url_pdf":"https://arxiv.org/pdf/2606.01128v1","authors":"[\"Tehila Dahan\",\"Bassel Hamoud\",\"Roie Reshef\",\"Martin Jaggi\",\"Kfir Y. Levy\"]","published":"2026-05-31T10:02:15Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}