{"ID":2892991,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.13710","arxiv_id":"2507.13710","title":"SoftPipe: A Soft-Guided Reinforcement Learning Framework for Automated Data Preparation","abstract":"Data preparation is a foundational yet notoriously challenging component of the machine learning lifecycle, characterized by a vast combinatorial search space. While reinforcement learning (RL) offers a promising direction, state-of-the-art methods suffer from a critical limitation: to manage the search space, they rely on rigid ``hard constraints'' that prematurely prune the search space and often preclude optimal solutions. To address this, we introduce SoftPipe, a novel RL framework that replaces these constraints with a flexible ``soft guidance'' paradigm. SoftPipe formulates action selection as a Bayesian inference problem. A high-level strategic prior, generated by a Large Language Model (LLM), probabilistically guides exploration. This prior is combined with empirical estimators from two sources through a collaborative process: a fine-grained quality score from a supervised Learning-to-Rank (LTR) model and a long-term value estimate from the agent's Q-function. Through extensive experiments on 18 diverse datasets, we demonstrate that SoftPipe achieves up to a 13.9\\% improvement in pipeline quality and 2.8$\\times$ faster convergence compared to existing methods.","short_abstract":"Data preparation is a foundational yet notoriously challenging component of the machine learning lifecycle, characterized by a vast combinatorial search space. \nWhile reinforcement learning (RL) offers a promising direction, state-of-the-art methods suffer from a critical limitation: to manage the search space, they rel...","url_abs":"https://arxiv.org/abs/2507.13710","url_pdf":"https://arxiv.org/pdf/2507.13710v2","authors":"[\"Jing Chang\",\"Chang Liu\",\"Jinbin Huang\",\"Shuyuan Zheng\",\"Rui Mao\",\"Jianbin Qin\"]","published":"2025-07-18T07:43:22Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
