{"ID":2843663,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.06582","arxiv_id":"2511.06582","title":"TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations","abstract":"Incorporating external knowledge bases in traditional retrieval-augmented generation (RAG) relies on parsing the document, followed by querying a language model with the parsed information via in-context learning. While effective for text-based documents, question answering on tabular documents often fails to generate plausible responses. Standard parsing techniques lose the two-dimensional structural semantics critical for cell interpretation. In this work, we present TabRAG, a parsing-based RAG framework designed to improve tabular document question answering via structured representations. Our framework consists of layout segmentation that decomposes the document inputs into a series of components, enabling fine-grained extraction. Subsequently, a vision language model parses and extracts the document tables into a hierarchically structured representation. In order to cater to various table styles and formats, we integrate a self-generated in-context learning module that guides the table extraction process. Experimental results demonstrate that TabRAG outperforms existing popular parsing techniques across a broad suite of evaluation and ablation benchmarks. Code is available at: https://github.com/jacobyhsi/TabRAG.","short_abstract":"Incorporating external knowledge bases in traditional retrieval-augmented generation (RAG) relies on parsing the document, followed by querying a language model with the parsed information via in-context learning. 
While effective for text-based documents, question answering on tabular documents often fails to generate...","url_abs":"https://arxiv.org/abs/2511.06582","url_pdf":"https://arxiv.org/pdf/2511.06582v2","authors":"[\"Jacob Si\",\"Mike Qu\",\"Michelle Lee\",\"Marek Rei\",\"Yingzhen Li\"]","published":"2025-11-10T00:05:58Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CV\",\"cs.IR\",\"cs.LG\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607239,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2843663,"paper_url":"https://arxiv.org/abs/2511.06582","paper_title":"TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations","repo_url":"https://github.com/jacobyhsi/TabRAG","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
