{"ID":2889867,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.20145","arxiv_id":"2507.20145","title":"Multi-Agent Interactive Question Generation Framework for Long Document Understanding","abstract":"Document Understanding (DU) in long-contextual scenarios with complex layouts remains a significant challenge in vision-language research. Although Large Vision-Language Models (LVLMs) excel at short-context DU tasks, their performance declines in long-context settings. A key limitation is the scarcity of fine-grained training data, particularly for low-resource languages such as Arabic. Existing state-of-the-art techniques rely heavily on human annotation, which is costly and inefficient. We propose a fully automated, multi-agent interactive framework to generate long-context questions efficiently. Our approach efficiently generates high-quality single- and multi-page questions for extensive English and Arabic documents, covering hundreds of pages across diverse domains. This facilitates the development of LVLMs with enhanced long-context understanding ability. Experimental results in this work have shown that our generated English and Arabic questions (AraEngLongBench) are quite challenging to major open- and closed-source LVLMs. The code and data proposed in this work can be found in https://github.com/wangk0b/Multi_Agentic_QA_Long_Doc.git. Sample Question and Answer (QA) pairs and structured system prompts can be found in the Appendix.","short_abstract":"Document Understanding (DU) in long-contextual scenarios with complex layouts remains a significant challenge in vision-language research. Although Large Vision-Language Models (LVLMs) excel at short-context DU tasks, their performance declines in long-context settings. A key limitation is the scarcity of fine-grained...","url_abs":"https://arxiv.org/abs/2507.20145","url_pdf":"https://arxiv.org/pdf/2507.20145v1","authors":"[\"Kesen Wang\",\"Daulet Toibazar\",\"Abdulrahman Alfulayt\",\"Abdulaziz S. Albadawi\",\"Ranya A. Alkahtani\",\"Asma A. Ibrahim\",\"Haneen A. Alhomoud\",\"Sherif Mohamed\",\"Pedro J. Moreno\"]","published":"2025-07-27T06:44:53Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":611699,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2889867,"paper_url":"https://arxiv.org/abs/2507.20145","paper_title":"Multi-Agent Interactive Question Generation Framework for Long Document Understanding","repo_url":"https://github.com/wangk0b/Multi_Agentic_QA_Long_Doc.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
