{"ID":2898236,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.03254","arxiv_id":"2507.03254","title":"CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs","abstract":"Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environments. To address these limitations, we introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, all components of agent interaction - Task, Plan, Feedback, system roles, and external tool invocations - are codified into modular pseudocode enriched with control structures (e.g., loops, conditionals), boolean logic, and typed variables. This design transforms loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. We evaluate the proposed framework across three diverse benchmarks - GAIA, HotpotQA, and VirtualHome - using a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines. On VirtualHome, our method achieves a new state-of-the-art success rate of 56%. In addition, our approach reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics in the development of scalable multi-agent LLM systems. The code and resources are available at: https://anonymous.4open.science/r/CodifyingAgent-5A86","short_abstract":"Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical facto...","url_abs":"https://arxiv.org/abs/2507.03254","url_pdf":"https://arxiv.org/pdf/2507.03254v1","authors":"[\"Bruce Yang\",\"Xinfeng He\",\"Huan Gao\",\"Yifan Cao\",\"Xiaofan Li\",\"David Hsu\"]","published":"2025-07-04T02:20:19Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","project_urls":"[\"https://anonymous.4open.science/r/CodifyingAgent-5A86\"]","has_code":false}