{"ID":2887477,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01153","arxiv_id":"2508.01153","title":"TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition","abstract":"Scene Text Recognition (STR) remains a challenging task due to complex visual appearances and limited semantic priors. We propose TEACH, a novel training paradigm that injects ground-truth text into the model as auxiliary input and progressively reduces its influence during training. By encoding target labels into the embedding space and applying loss-aware masking, TEACH simulates a curriculum learning process that guides the model from label-dependent learning to fully visual recognition. Unlike language model-based approaches, TEACH requires no external pretraining and introduces no inference overhead. It is model-agnostic and can be seamlessly integrated into existing encoder-decoder frameworks. Extensive experiments across multiple public benchmarks show that models trained with TEACH achieve consistently improved accuracy, especially under challenging conditions, validating its robustness and general applicability.","short_abstract":"Scene Text Recognition (STR) remains a challenging task due to complex visual appearances and limited semantic priors. We propose TEACH, a novel training paradigm that injects ground-truth text into the model as auxiliary input and progressively reduces its influence during training. By encoding target labels into the...","url_abs":"https://arxiv.org/abs/2508.01153","url_pdf":"https://arxiv.org/pdf/2508.01153v1","authors":"[\"Xiahan Yang\",\"Hui Zheng\"]","published":"2025-08-02T02:28:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}