{"ID":2861293,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01659","arxiv_id":"2510.01659","title":"MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization","abstract":"Multimodal Dialogue Summarization (MDS) is a critical task with wide-ranging applications. To support the development of effective MDS models, robust automatic evaluation methods are essential for reducing both cost and human effort. However, such methods require a strong meta-evaluation benchmark grounded in human annotations. In this work, we introduce MDSEval, the first meta-evaluation benchmark for MDS, consisting of image-sharing dialogues, corresponding summaries, and human judgments across eight well-defined quality aspects. To ensure data quality and richness, we propose a novel filtering framework leveraging Mutually Exclusive Key Information (MEKI) across modalities. Our work is the first to identify and formalize key evaluation dimensions specific to MDS. We benchmark state-of-the-art modal evaluation methods, revealing their limitations in distinguishing summaries from advanced MLLMs and their susceptibility to various biases.","short_abstract":"Multimodal Dialogue Summarization (MDS) is a critical task with wide-ranging applications. To support the development of effective MDS models, robust automatic evaluation methods are essential for reducing both cost and human effort. However, such methods require a strong meta-evaluation benchmark grounded in human ann...","url_abs":"https://arxiv.org/abs/2510.01659","url_pdf":"https://arxiv.org/pdf/2510.01659v1","authors":"[\"Yinhong Liu\",\"Jianfeng He\",\"Hang Su\",\"Ruixue Lian\",\"Yi Nian\",\"Jake Vincent\",\"Srikanth Vishnubhotla\",\"Robinson Piramuthu\",\"Saab Mansour\"]","published":"2025-10-02T04:38:27Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}