{"ID":2871880,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.10401","arxiv_id":"2509.10401","title":"Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems","abstract":"Failure attribution in multi-agent systems -- pinpointing the exact step where a decisive error occurs -- is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17\\%), which renders them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning: to determine if correcting a single action would have actually averted the task failure. To bridge this \\emph{counterfactual inference gap}, we introduce Abduct-Act-Predict (A2P) Scaffolding, a novel agent framework that transforms failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass: (1) Abduction, to infer the hidden root causes behind an agent's actions; (2) Action, to define a minimal corrective intervention; and (3) Prediction, to simulate the subsequent trajectory and verify if the intervention resolves the failure. This structured approach leverages the holistic context of the entire conversation while imposing a rigorous causal logic on the model's analysis. Our extensive experiments on the Who\\\u0026When benchmark demonstrate its efficacy. On the Algorithm-Generated dataset, A2P achieves 47.46\\% step-level accuracy, a 2.85$\\times$ improvement over the 16.67\\% of the baseline. On the more complex Hand-Crafted dataset, it achieves 29.31\\% step accuracy, a 2.43$\\times$ improvement over the baseline's 12.07\\%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution. Ours code are released at https://github.com/ResearAI/A2P.","short_abstract":"Failure attribution in multi-agent systems -- pinpointing the exact step where a decisive error occurs -- is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17\\%), which renders them impractical...","url_abs":"https://arxiv.org/abs/2509.10401","url_pdf":"https://arxiv.org/pdf/2509.10401v2","authors":"[\"Alva West\",\"Yixuan Weng\",\"Minjun Zhu\",\"Zhen Lin\",\"Zhiyuan Ning\",\"Yue Zhang\"]","published":"2025-09-12T16:51:15Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":609902,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2871880,"paper_url":"https://arxiv.org/abs/2509.10401","paper_title":"Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems","repo_url":"https://github.com/ResearAI/A2P","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
