{"ID":2870310,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12876","arxiv_id":"2509.12876","title":"Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents","abstract":"The proliferation of multimedia content necessitates the development of effective Multimedia Event Extraction (M2E2) systems. Though Large Vision-Language Models (LVLMs) have shown strong cross-modal capabilities, their utility in the M2E2 task remains underexplored. In this paper, we present the first systematic evaluation of representative LVLMs, including DeepSeek-VL2 and the Qwen-VL series, on the M2E2 dataset. Our evaluations cover text-only, image-only, and cross-media subtasks, assessed under both few-shot prompting and fine-tuning settings. Our key findings highlight the following valuable insights: (1) Few-shot LVLMs perform notably better on visual tasks but struggle significantly with textual tasks; (2) Fine-tuning LVLMs with LoRA substantially enhances model performance; and (3) LVLMs exhibit strong synergy when combining modalities, achieving superior performance in cross-modal settings. We further provide a detailed error analysis to reveal persistent challenges in areas such as semantic precision, localization, and cross-modal grounding, which remain critical obstacles for advancing M2E2 capabilities.","short_abstract":"The proliferation of multimedia content necessitates the development of effective Multimedia Event Extraction (M2E2) systems. Though Large Vision-Language Models (LVLMs) have shown strong cross-modal capabilities, their utility in the M2E2 task remains underexplored. \nIn this paper, we present the first systematic evalu...","url_abs":"https://arxiv.org/abs/2509.12876","url_pdf":"https://arxiv.org/pdf/2509.12876v1","authors":"[\"Fuyu Xing\",\"Zimu Wang\",\"Wei Wang\",\"Haiyang Zhang\"]","published":"2025-09-16T09:29:02Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.MM\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}
