{"ID":2887789,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.00311","arxiv_id":"2508.00311","title":"DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios","abstract":"Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in terms of accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.","short_abstract":"Optical Character Recognition (OCR) for mathematical formula is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. \nIn t...","url_abs":"https://arxiv.org/abs/2508.00311","url_pdf":"https://arxiv.org/pdf/2508.00311v1","authors":"[\"Yufeng Zhong\",\"Zhixiong Zeng\",\"Lei Chen\",\"Longrong Yang\",\"Liming Zheng\",\"Jing Huang\",\"Siqi Yang\",\"Lin Ma\"]","published":"2025-08-01T04:34:17Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
