{"ID":2880572,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13606","arxiv_id":"2508.13606","title":"AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings","abstract":"Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments due to context limitations and insufficient training data. This paper presents AdaDocVQA, a unified adaptive framework addressing these challenges through three core innovations: a hybrid text retrieval architecture for effective document segmentation, an intelligent data augmentation pipeline that automatically generates high-quality reasoning question-answer pairs with multi-level verification, and adaptive ensemble inference with dynamic configuration generation and early stopping mechanisms. Experiments on Japanese document VQA benchmarks demonstrate substantial improvements with 83.04\\% accuracy on Yes/No questions, 52.66\\% on factual questions, and 44.12\\% on numerical questions in JDocQA, and 59\\% accuracy on LAVA dataset. Ablation studies confirm meaningful contributions from each component, and our framework establishes new state-of-the-art results for Japanese document VQA while providing a scalable foundation for other low-resource languages and specialized domains. Our code available at: https://github.com/Haoxuanli-Thu/AdaDocVQA.","short_abstract":"Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments due to context limitations and insufficient training data. This paper presents AdaDocVQA, a unified adaptive framework addressing these challenges through three core innovations: a...","url_abs":"https://arxiv.org/abs/2508.13606","url_pdf":"https://arxiv.org/pdf/2508.13606v1","authors":"[\"Haoxuan Li\",\"Wei Song\",\"Aofan Liu\",\"Peiwu Qin\"]","published":"2025-08-19T08:12:45Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":610687,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880572,"paper_url":"https://arxiv.org/abs/2508.13606","paper_title":"AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings","repo_url":"https://github.com/Haoxuanli-Thu/AdaDocVQA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
