{"ID":2880917,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09684","arxiv_id":"2509.09684","title":"Text-to-SQL Oriented to the Process Mining Domain: A PT-EN Dataset for Query Translation","abstract":"This paper introduces text-2-SQL-4-PM, a bilingual (Portuguese-English) benchmark dataset designed for the text-to-SQL task in the process mining domain. Text-to-SQL conversion facilitates natural language querying of databases, increasing accessibility for users without SQL expertise and productivity for those that are experts. The text-2-SQL-4-PM dataset is customized to address the unique challenges of process mining, including specialized vocabularies and single-table relational structures derived from event logs. The dataset comprises 1,655 natural language utterances, including human-generated paraphrases, 205 SQL statements, and ten qualifiers. Methods include manual curation by experts, professional translations, and a detailed annotation process to enable nuanced analyses of task complexity. Additionally, a baseline study using GPT-3.5 Turbo demonstrates the feasibility and utility of the dataset for text-to-SQL applications. The results show that text-2-SQL-4-PM supports evaluation of text-to-SQL implementations, offering broader applicability for semantic parsing and other natural language processing tasks.","short_abstract":"This paper introduces text-2-SQL-4-PM, a bilingual (Portuguese-English) benchmark dataset designed for the text-to-SQL task in the process mining domain. Text-to-SQL conversion facilitates natural language querying of databases, increasing accessibility for users without SQL expertise and productivity for those that ar...","url_abs":"https://arxiv.org/abs/2509.09684","url_pdf":"https://arxiv.org/pdf/2509.09684v1","authors":"[\"Bruno Yui Yamate\",\"Thais Rodrigues Neubauer\",\"Marcelo Fantinato\",\"Sarajane Marques Peres\"]","published":"2025-08-18T01:25:41Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\",\"cs.CL\",\"cs.DB\"]","methods":"[]","has_code":false}
