{"ID":2855225,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13561","arxiv_id":"2510.13561","title":"OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies","abstract":"The escalating complexity of modern software imposes an unsustainable operational burden on Site Reliability Engineering (SRE) teams, demanding AI-driven automation that can emulate expert diagnostic reasoning. Existing solutions, from traditional AI methods to general-purpose multi-agent systems, fall short: they either lack deep causal reasoning or are not tailored for the specialized, investigative workflows unique to SRE. To address this gap, we present OpenDerisk, a specialized, open-source multi-agent framework architected for SRE. OpenDerisk integrates a diagnostic-native collaboration model, a pluggable reasoning engine, a knowledge engine, and a standardized protocol (MCP) to enable specialist agents to collectively solve complex, multi-domain problems. Our comprehensive evaluation demonstrates that OpenDerisk significantly outperforms state-of-the-art baselines in both accuracy and efficiency. This effectiveness is validated by its large-scale production deployment at Ant Group, where it serves over 3,000 daily users across diverse scenarios, confirming its industrial-grade scalability and practical impact. OpenDerisk is open source and available at https://github.com/derisk-ai/OpenDerisk/","short_abstract":"The escalating complexity of modern software imposes an unsustainable operational burden on Site Reliability Engineering (SRE) teams, demanding AI-driven automation that can emulate expert diagnostic reasoning. \nExisting solutions, from traditional AI methods to general-purpose multi-agent systems, fall short: they eith...","url_abs":"https://arxiv.org/abs/2510.13561","url_pdf":"https://arxiv.org/pdf/2510.13561v2","authors":"[\"Peng Di\",\"Faqiang Chen\",\"Xiao Bai\",\"Hongjun Yang\",\"Qingfeng Li\",\"Ganglin Wei\",\"Jian Mou\",\"Feng Shi\",\"Keting Chen\",\"Peng Tang\",\"Zhitao Shen\",\"Zheng Li\",\"Wenhui Shi\",\"Junwei Guo\",\"Hang Yu\"]","published":"2025-10-15T13:59:58Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":608232,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2855225,"paper_url":"https://arxiv.org/abs/2510.13561","paper_title":"OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies","repo_url":"https://github.com/derisk-ai/OpenDerisk","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
