{"ID":2827377,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.16149","arxiv_id":"2512.16149","title":"ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs","abstract":"Training LLMs to invoke tools and leverage retrieved information necessitates high-quality, diverse data. However, existing pipelines for synthetic data generation often rely on tens of thousands of real API calls to enhance generalization, incurring prohibitive costs while lacking multi-hop reasoning and self-reflection. To address these limitations, we introduce ToolForge, an automated synthesis framework that achieves strong real-world tool-calling performance by constructing only a small number of virtual tools, eliminating the need for real API calls. ToolForge leverages a (question, golden context, answer) triple to synthesize large-scale tool-learning data specifically designed for multi-hop search scenarios, further enriching the generated data through multi-hop reasoning and self-reflection mechanisms. To ensure data fidelity, we employ a Multi-Layer Validation Framework that integrates both rule-based and model-based assessments. Empirical results show that a model with only 8B parameters, when trained on our synthesized data, outperforms GPT-4o on multiple benchmarks. Our code and dataset are publicly available at https://github.com/Buycar-arb/ToolForge .","short_abstract":"Training LLMs to invoke tools and leverage retrieved information necessitates high-quality, diverse data. \nHowever, existing pipelines for synthetic data generation often rely on tens of thousands of real API calls to enhance generalization, incurring prohibitive costs while lacking multi-hop reasoning and self-reflecti...","url_abs":"https://arxiv.org/abs/2512.16149","url_pdf":"https://arxiv.org/pdf/2512.16149v1","authors":"[\"Hao Chen\",\"Zhexin Hu\",\"Jiajun Chai\",\"Haocheng Yang\",\"Hang He\",\"Xiaohan Wang\",\"Wei Lin\",\"Luhang Wang\",\"Guojun Yin\",\"Zhuofeng zhao\"]","published":"2025-12-18T04:06:26Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605803,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2827377,"paper_url":"https://arxiv.org/abs/2512.16149","paper_title":"ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs","repo_url":"https://github.com/Buycar-arb/ToolForge","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
