{"ID":3083869,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:38:11.424509713Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05836","arxiv_id":"2606.05836","title":"ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL","abstract":"Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-specific SQL syntax, and complex analytical questions that are difficult to solve with a single SQL query. To address these challenges, we propose ProSPy, a Profiling-driven SQL-Python agentic framework for enterprise-scale Text-to-SQL. ProSPy structures the reasoning process into four stages: it first extracts fine-grained data evidence through automatic profiling, progressively prunes large schemas into task-relevant contexts, fetches intermediate views through a dialect-agnostic SQL interface, and finally performs flexible downstream analysis with Python. This design combines the efficiency of SQL over large databases with the flexibility of Python-based analysis, while reducing reliance on unreliable metadata and improving robustness across SQL dialects. Experiments on Spider 2.0-Lite and Spider 2.0-Snow show that ProSPy consistently outperforms strong baselines with both open-source and proprietary models, achieving execution accuracies of 60.15% and 60.51% with Claude-4.5-Opus, without majority voting. Further analysis shows that ProSPy is robust to SQL dialect variations and achieves a favorable trade-off between schema recall and precision.","short_abstract":"Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-specific SQL syntax, and complex analytical questions that are difficult to solve w...","url_abs":"https://arxiv.org/abs/2606.05836","url_pdf":"https://arxiv.org/pdf/2606.05836v1","authors":"[\"Zhaorui Yang\",\"Huawei Zheng\",\"Sen Yang\",\"Yuhui Zhang\",\"Haoxuan Li\",\"Zhizhen Yu\",\"Xuan Yi\",\"Chen Hou\",\"Defeng Xie\",\"Chao Hu\",\"Minfeng Zhu\",\"Dazhen Deng\",\"Haozhe Feng\",\"Danqing Huang\",\"Yingcai Wu\",\"Peng Chen\",\"Wei Chen\"]","published":"2026-06-04T08:13:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
