{"ID":3005076,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03327","arxiv_id":"2606.03327","title":"CAPER: Clause-Aligned Process Supervision for Text-to-SQL","abstract":"Text-to-SQL systems are typically evaluated by query-level execution correctness, but this terminal signal provides little guidance about which intermediate SQL decision caused success or failure. Token-level dense supervision is also ill-suited: SQL tokens do not align with complete semantic decisions, can penalize execution-equivalent queries, and are difficult to label reliably at scale. We therefore propose CAPER, which automatically derives clause-level supervision via counterfactual intervention on the SQL abstract syntax tree, enabling root-cause error localization for reward modeling; the resulting data is used to train CAPER-9B, a lightweight Clause-PRM that provides clause-boundary feedback for policy optimization and candidate verification. Experiments on BIRD and Spider show that clause-aligned supervision not only improves execution accuracy, achieving up to a 15.3% relative EX improvement over GPT-5.4, but also strengthens failure-localization capability, reaching 84.53% accuracy and 90.60% MRR on held-out failures. Our project page is at https://github.com/banrichard/RL-NL2SQL.","short_abstract":"Text-to-SQL systems are typically evaluated by query-level execution correctness, but this terminal signal provides little guidance about which intermediate SQL decision caused success or failure. Token-level dense supervision is also ill-suited: SQL tokens do not align with complete semantic decisions, can penalize ex...","url_abs":"https://arxiv.org/abs/2606.03327","url_pdf":"https://arxiv.org/pdf/2606.03327v1","authors":"[\"Lujie Ban\",\"Jiasheng Shi\",\"Jinyang Li\",\"Xiaolin Han\",\"Tsz Nam Chan\",\"Chenhao Ma\"]","published":"2026-06-02T08:35:40Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":612731,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-03T03:09:48.883664427Z","DeletedAt":null,"paper_id":3005076,"paper_url":"https://arxiv.org/abs/2606.03327","paper_title":"CAPER: Clause-Aligned Process Supervision for Text-to-SQL","repo_url":"https://github.com/banrichard/RL-NL2SQL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
