{"ID":2867163,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19029","arxiv_id":"2509.19029","title":"Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression","abstract":"Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur sample-size memory overhead. This paper introduces Clapping, a Communication compression algorithm with LAzy samPling for Pipeline-parallel learnING. Clapping adopts a lazy sampling strategy that reuses data samples across steps, breaking sample-wise memory barrier and supporting convergence in few-epoch or online training regimes. Clapping comprises two variants including Clapping-FC and Clapping-FU, both of which achieve convergence without unbiased gradient assumption, effectively addressing compression error propagation in multi-worker settings. Numerical experiments validate the performance of Clapping across different learning tasks.","short_abstract":"Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur sample-siz...","url_abs":"https://arxiv.org/abs/2509.19029","url_pdf":"https://arxiv.org/pdf/2509.19029v1","authors":"[\"Boao Kong\",\"Xu Huang\",\"Yuqi Xu\",\"Yixuan Liang\",\"Bin Wang\",\"Kun Yuan\"]","published":"2025-09-23T14:02:43Z","proceeding":"math.OC","tasks":"[\"math.OC\",\"stat.ML\"]","methods":"[]","has_code":false}
