{"ID":2892670,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.15085","arxiv_id":"2507.15085","title":"OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities","abstract":"Improving visual text synthesis has long been a challenging and evolving frontier for image generation models. While recent state-of-the-art (SOTA) models have made remarkable strides in text generation capabilities, existing benchmarks inadequately assess their true performance due to narrow scope (scene text and posters only), isolated evaluation (T2I generation or editing separately), and insufficient difficulty (lacking challenging scenarios). To bridge this gap, we pioneer the unification of text-centric T2I generation, text editing, and OCR-related image-to-image translation to evaluate a model's holistic visual text synthesis abilities, i.e., OCR generative capabilities. Accordingly, we propose OCRGenBench, the most comprehensive benchmark to date for evaluating these abilities. OCRGenBench covers five common text categories and 33 OCR generative tasks, encompassing T2I generation, text editing, and other image-to-image OCR tasks (e.g., document dewarping and handwriting removal). The benchmark includes 1,060 human-annotated samples consisting of instruction-image-GT triplets, deliberately featuring high text density, diverse generation scales, varied aspect ratios, and bilingual content to capture real-world complexity. Furthermore, we introduce OCRGenScore, a unified metric integrating text accuracy, aesthetic quality, and instruction following. Extensive experiments on 19 cutting-edge generative models reveal that most score below 60/100. Our analysis exposes critical, previously overlooked limitations, including poor text localization, unintended content modifications, and failures with dense or small-scale text. We hope OCRGenBench establishes a robust standard to evaluate OCR generative capabilities, driving the evolution of reliable visual text synthesis. The benchmark and evaluation code are available at https://github.com/NiceRingNode/Awesome-Generative-Models-for-OCR.","short_abstract":"Improving visual text synthesis has long been a challenging and evolving frontier for image generation models. While recent state-of-the-art (SOTA) models have made remarkable strides in text generation capabilities, existing benchmarks inadequately assess their true performance due to narrow scope (scene text and post...","url_abs":"https://arxiv.org/abs/2507.15085","url_pdf":"https://arxiv.org/pdf/2507.15085v4","authors":"[\"Peirong Zhang\",\"Haowei Xu\",\"Jiaxin Zhang\",\"Xuhan Zheng\",\"Guitao Xu\",\"Yuyi Zhang\",\"Junle Liu\",\"Zhenhua Yang\",\"Wei Zhou\",\"Lianwen Jin\"]","published":"2025-07-20T18:43:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":612009,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2892670,"paper_url":"https://arxiv.org/abs/2507.15085","paper_title":"OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities","repo_url":"https://github.com/NiceRingNode/Awesome-Generative-Models-for-OCR","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
