{"ID":2896988,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.05876","arxiv_id":"2507.05876","title":"OLAF: Programmable Data Plane Acceleration for Asynchronous Distributed Reinforcement Learning","abstract":"Asynchronous Distributed Reinforcement Learning (DRL) can suffer from degraded convergence when model updates become stale, often the result of network congestion and packet loss during large-scale training. This work introduces a network data-plane acceleration architecture that mitigates such staleness by enabling inline processing of DRL model updates as they traverse the accelerator engine. To this end, we design and prototype a novel queueing mechanism that opportunistically combines compatible updates sharing a network element, reducing redundant traffic and preserving update utility. Complementing this we provide a lightweight transmission control mechanism at the worker nodes that is guided by feedback from the in-network accelerator. To assess model utility at line rate, we introduce the Age-of-Model (AoM) metric as a proxy for staleness and verify global fairness and responsiveness properties using a formal verification method. Our evaluations demonstrate that this architecture significantly reduces update staleness and congestion, ultimately improving the convergence rate in asynchronous DRL workloads.","short_abstract":"Asynchronous Distributed Reinforcement Learning (DRL) can suffer from degraded convergence when model updates become stale, often the result of network congestion and packet loss during large-scale training. \nThis work introduces a network data-plane acceleration architecture that mitigates such staleness by enabling in...","url_abs":"https://arxiv.org/abs/2507.05876","url_pdf":"https://arxiv.org/pdf/2507.05876v1","authors":"[\"Nehal Baganal Krishna\",\"Anam Tahir\",\"Firas Khamis\",\"Mina Tahmasbi Arashloo\",\"Michael Zink\",\"Amr Rizk\"]","published":"2025-07-08T10:59:56Z","proceeding":"cs.NI","tasks":"[\"cs.NI\",\"cs.AR\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
