{"ID":2830260,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.10619","arxiv_id":"2512.10619","title":"DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM","abstract":"Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language models (VLMs) have significantly advanced this task, achieving reliable, high-quality parsing in real-world scenarios remains challenging. Common practice often selects the top-performing model on standard benchmarks. However, these benchmarks may carry dataset-specific biases, leading to inconsistent model rankings and limited correlation with real-world performance. Moreover, benchmark metrics typically provide only overall scores, which can obscure distinct error patterns in output. This raises a key challenge: how can we reliably and comprehensively assess document parsing quality in the wild? We address this problem with DOCR-Inspector, which formalizes document parsing assessment as fine-grained error detection and analysis. Leveraging VLM-as-a-Judge, DOCR-Inspector analyzes a document image and its parsed output, identifies all errors, assigns them to one of 28 predefined types, and produces a comprehensive quality assessment. To enable this capability, we construct DOCRcase-200K for training and propose the Chain-of-Checklist reasoning paradigm to enable the hierarchical structure of parsing quality assessment. For empirical validation, we introduce DOCRcaseBench, a set of 882 real-world document parsing cases with manual annotations. On this benchmark, DOCR-Inspector-7B outperforms commercial models like Gemini 2.5 Pro, as well as leading open-source models. Further experiments demonstrate that its quality assessments provide valuable guidance for parsing results refinement, making DOCR-Inspector both a practical evaluator and a driver for advancing document parsing systems at scale. Model and code are released at: https://github.com/ZZZZZQT/DOCR-Inspector.","short_abstract":"Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language models (VLMs) have significantly advanced this task, achieving reliable, high-quality parsing in real-world scenarios remains chall...","url_abs":"https://arxiv.org/abs/2512.10619","url_pdf":"https://arxiv.org/pdf/2512.10619v1","authors":"[\"Qintong Zhang\",\"Junyuan Zhang\",\"Zhifei Ren\",\"Linke Ouyang\",\"Zichen Wen\",\"Junbo Niu\",\"Yuan Qu\",\"Bin Wang\",\"Ka-Ho Chow\",\"Conghui He\",\"Wentao Zhang\"]","published":"2025-12-11T13:16:33Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":606015,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2830260,"paper_url":"https://arxiv.org/abs/2512.10619","paper_title":"DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM","repo_url":"https://github.com/ZZZZZQT/DOCR-Inspector","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
