{"ID":2839355,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15074","arxiv_id":"2511.15074","title":"Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents","abstract":"The performance of machine learning models on tabular data is critically dependent on high-quality feature engineering. While Large Language Models (LLMs) have shown promise in automating feature extraction (AutoFE), existing methods are often limited by monolithic LLM architectures, simplistic quantitative feedback, and a failure to systematically integrate external domain knowledge. This paper introduces Rogue One, a novel, LLM-based multi-agent framework for knowledge-informed automatic feature extraction. Rogue One operationalizes a decentralized system of three specialized agents-Scientist, Extractor, and Tester-that collaborate iteratively to discover, generate, and validate predictive features. Crucially, the framework moves beyond primitive accuracy scores by introducing a rich, qualitative feedback mechanism and a \"flooding-pruning\" strategy, allowing it to dynamically balance feature exploration and exploitation. By actively incorporating external knowledge via an integrated retrieval-augmented (RAG) system, Rogue One generates features that are not only statistically powerful but also semantically meaningful and interpretable. We demonstrate that Rogue One significantly outperforms state-of-the-art methods on a comprehensive suite of 19 classification and 9 regression datasets. Furthermore, we show qualitatively that the system surfaces novel, testable hypotheses, such as identifying a new potential biomarker in the myocardial dataset, underscoring its utility as a tool for scientific discovery.","short_abstract":"The performance of machine learning models on tabular data is critically dependent on high-quality feature engineering. While Large Language Models (LLMs) have shown promise in automating feature extraction (AutoFE), existing methods are often limited by monolithic LLM architectures, simplistic quantitative feedback, a...","url_abs":"https://arxiv.org/abs/2511.15074","url_pdf":"https://arxiv.org/pdf/2511.15074v1","authors":"[\"Henrik Bradland\",\"Morten Goodwin\",\"Vladimir I. Zadorozhny\",\"Per-Arne Andersen\"]","published":"2025-11-19T03:27:14Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}