{"ID":3053359,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-06T03:14:50.67780443Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04399","arxiv_id":"2606.04399","title":"DPDL: Towards Differential Privacy Preservation in Decentralized Stochastic Learning on Non-IID Data","abstract":"In the paradigm of decentralized learning, a group of agents collaborate to train a global model using distributed datasets without a central server. Although the power of collaboration has been verified by many state-of-the-art studies, it entails extensive gradient information exchanging among the agents and thus induces a high risk of privacy leakage for the individual agents. Moreover, in real-world applications, the training data are usually not independently and identically distributed (non-IID) across the agents, which makes privacy-preserving decentralized learning more challenging. To address these issues, we propose DPDL, a privacy-preserving decentralized learning algorithm for non-IID data, which leverages the notion of Differential Privacy (DP) in cross-gradient aggregation through a similarity-based calibration technique. Specifically, in each round, each agent perturbs the cross-gradients (i.e., the derivatives of its neighbors' local models on its private local data) with a Gaussian noise mechanism before sharing them with its neighbors; it then adopts cosine similarity to calibrate the received perturbed cross-gradients such that the aggregation of the calibrated cross-gradients can be utilized to effectively update the local model in a momentum-like manner. Our rigorous theoretical analysis not only reveals the minimum noise level required to achieve a specific level of privacy preservation, but also shows that our algorithm still achieves a linear speedup in training with non-IID data. 
Finally, we conduct extensive experiments on real-world data to validate the effectiveness of our algorithm in defending against privacy attacks and in training accurate models.","short_abstract":"In the paradigm of decentralized learning, a group of agents collaborate to train a global model using distributed datasets without a central server. Although the power of collaboration has been verified by many state-of-the-art studies, it entails extensive gradient information exchanging among the agents and thus ind...","url_abs":"https://arxiv.org/abs/2606.04399","url_pdf":"https://arxiv.org/pdf/2606.04399v1","authors":"[\"Yunsheng Yuan\",\"Xue Xiao\",\"Lina Wang\",\"Feng Li\"]","published":"2026-06-03T03:27:40Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CR\"]","methods":"[]","has_code":false}