{"ID":2878257,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19372","arxiv_id":"2508.19372","title":"Database Entity Recognition with Data Augmentation and Deep Learning","abstract":"This paper addresses the challenge of Database Entity Recognition (DB-ER) in Natural Language Queries (NLQ). We present several key contributions to advance this field: (1) a human-annotated benchmark for the DB-ER task, derived from popular text-to-SQL benchmarks, (2) a novel data augmentation procedure that leverages automatic annotation of NLQs based on the corresponding SQL queries, which are available in popular text-to-SQL benchmarks, (3) a specialized language-model-based entity recognition model using T5 as a backbone and two downstream DB-ER tasks: sequence tagging and token classification for fine-tuning of the backend and performing DB-ER, respectively. We compared our DB-ER tagger with two state-of-the-art NER taggers, and observed better performance in both precision and recall for our model. The ablation evaluation shows that data augmentation boosts precision and recall by over 10%, while fine-tuning of the T5 backbone boosts these metrics by 5-10%.","short_abstract":"This paper addresses the challenge of Database Entity Recognition (DB-ER) in Natural Language Queries (NLQ). We present several key contributions to advance this field: (1) a human-annotated benchmark for the DB-ER task, derived from popular text-to-SQL benchmarks, (2) a novel data augmentation procedure that leverages aut...","url_abs":"https://arxiv.org/abs/2508.19372","url_pdf":"https://arxiv.org/pdf/2508.19372v1","authors":"[\"Zikun Fu\",\"Chen Yang\",\"Kourosh Davoudi\",\"Ken Q. Pu\"]","published":"2025-08-26T19:05:40Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.DB\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false}
