{"ID":2838007,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.18438","arxiv_id":"2511.18438","title":"LLMs as Firmware Experts: A Runtime-Grown Tree-of-Agents Framework","abstract":"Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. However, when applied to large-scale firmware, their performance degrades due to the binary nature of firmware, complex dependency structures, and heterogeneous components. To address this challenge, this paper presents FIRMHIVE, a recursive agent hive that enables LLMs to act as autonomous firmware security analysts. FIRMHIVE introduces two key mechanisms: (1) transforming delegation into a per-agent, executable primitive and (2) constructing a runtime Tree of Agents (ToA) for decentralized coordination. We evaluate FIRMHIVE using real-world firmware images obtained from publicly available datasets, covering five representative security analysis tasks. Compared with existing LLM-agent baselines, FIRMHIVE performs deeper (about 16x more reasoning steps) and broader (about 2.3x more files inspected) cross-file exploration, resulting in about 5.6x more alerts per firmware. Compared to state-of-the-art (SOTA) security tools, FIRMHIVE identifies about 1.5x more vulnerabilities (1,802 total) and achieves 71% precision, representing significant improvements in both yield and fidelity.","short_abstract":"Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. \nHowever, when applied to large-scale firmware, their performance degrades due to the binary nature of firmware, complex dependency structures, and heterogeneous comp...","url_abs":"https://arxiv.org/abs/2511.18438","url_pdf":"https://arxiv.org/pdf/2511.18438v1","authors":"[\"Xiangrui Zhang\",\"Zeyu Chen\",\"Haining Wang\",\"Qiang Li\"]","published":"2025-11-23T13:19:40Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
