{"ID":2896615,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.08865","arxiv_id":"2507.08865","title":"Spatial ModernBERT: Spatial-Aware Transformer for Table and Key-Value Extraction in Financial Documents at Scale","abstract":"Extracting tables and key-value pairs from financial documents is essential for business workflows such as auditing, data analytics, and automated invoice processing. In this work, we introduce Spatial ModernBERT, a transformer-based model augmented with spatial embeddings, to accurately detect and extract tabular data and key-value fields from complex financial documents. We cast the extraction task as token classification across three heads: (1) Label Head, classifying each token as a label (e.g., PO Number, PO Date, Item Description, Quantity, Base Cost, MRP, etc.); (2) Column Head, predicting column indices; (3) Row Head, distinguishing the start of item rows and header rows. The model is pretrained on the PubTables-1M dataset, then fine-tuned on a financial document dataset, achieving robust performance through cross-entropy loss on each classification head. We propose a post-processing method to merge tokens using B-I-IB tagging, reconstruct the tabular layout, and extract key-value pairs. Empirical evaluation shows that Spatial ModernBERT effectively leverages both textual and spatial cues, facilitating highly accurate table and key-value extraction in real-world financial documents.","short_abstract":"Extracting tables and key-value pairs from financial documents is essential for business workflows such as auditing, data analytics, and automated invoice processing. In this work, we introduce Spatial ModernBERT, a transformer-based model augmented with spatial embeddings, to accurately detect and extract tabular data a...","url_abs":"https://arxiv.org/abs/2507.08865","url_pdf":"https://arxiv.org/pdf/2507.08865v1","authors":"[\"Javis AI Team\",\"Amrendra Singh\",\"Maulik Shah\",\"Dharshan Sampath\"]","published":"2025-07-09T14:40:40Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Transformer\"]","has_code":false}
