{"ID":2860953,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02995","arxiv_id":"2510.02995","title":"AudioToolAgent: An Agentic Framework for Audio-Language Models","abstract":"Large Audio-Language Models (LALMs) perform well on audio understanding tasks but lack multistep reasoning and tool-calling found in recent Large Language Models (LLMs). This paper presents AudioToolAgent, a framework that coordinates audio-language models as tools via a central LLM agent that accesses tool adapters for audio question answering and speech-to-text. The agent reasons about which tools to invoke, how to formulate follow-up queries, and how to arbitrate conflicting tool outputs, without accessing the audio. Experiments with MMAU, MMAR, and MMAU-Pro show state-of-the-art accuracy: up to 77.50% in MMAU, 77.00% in MMAR, and 61.90% in MMAU-Pro. Shapley-based analysis identifies effective agent-tool combinations. The code and reproduction materials are available at https://github.com/GLJS/AudioToolAgent.","short_abstract":"Large Audio-Language Models (LALMs) perform well on audio understanding tasks but lack multistep reasoning and tool-calling found in recent Large Language Models (LLMs). This paper presents AudioToolAgent, a framework that coordinates audio-language models as tools via a central LLM agent that accesses tool adapters fo...","url_abs":"https://arxiv.org/abs/2510.02995","url_pdf":"https://arxiv.org/pdf/2510.02995v2","authors":"[\"Gijs Wijngaard\",\"Elia Formisano\",\"Michel Dumontier\",\"Jenia Jitsev\"]","published":"2025-10-03T13:35:45Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608779,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2860953,"paper_url":"https://arxiv.org/abs/2510.02995","paper_title":"AudioToolAgent: An Agentic Framework for Audio-Language Models","repo_url":"https://github.com/GLJS/AudioToolAgent","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
