{"ID":2886941,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02427","arxiv_id":"2508.02427","title":"CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models","abstract":"Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use well-trained models. However, systematically evaluating methods under this setting remains largely unexplored. In this paper, we introduce CABENCH, the first public benchmark comprising 70 realistic composable AI tasks, along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework to enable end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results illustrate the promise of composable AI in addressing complex real-world problems while highlighting the need for methods that can fully unlock its potential by automatically generating effective execution pipelines.","short_abstract":"Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use well-trained models. However, systematically evaluating methods under this setting remains largely unexplored. In this paper, we introduce CABENCH, the fir...","url_abs":"https://arxiv.org/abs/2508.02427","url_pdf":"https://arxiv.org/pdf/2508.02427v1","authors":"[\"Tung-Thuy Pham\",\"Duy-Quan Luong\",\"Minh-Quan Duong\",\"Trung-Hieu Nguyen\",\"Thu-Trang Nguyen\",\"Son Nguyen\",\"Hieu Dinh Vo\"]","published":"2025-08-04T13:48:32Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.SE\"]","methods":"[\"Large Language Model\"]","has_code":false}
