{"ID":2863305,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24297","arxiv_id":"2509.24297","title":"Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs","abstract":"High-quality, multi-modal benchmarks are crucial for advancing scientific reasoning in large models yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs), which include three parts: 1) Task Definition \u0026 Evaluation Rubric: We develop a TQA-to-MMQA framework and establish a comprehensive, multi-dimensional MMQA quality rubric that provides principles for the transformation. 2) Benchmark Construction: Then we construct two extensive benchmarks to rigorously evaluate state-of-the-art generation \u0026 understanding models on the distinct tasks of MMQA generation \u0026 MMQA quality evaluation. 3) Preliminary Solution: We develop an agentic system (Q-Mirror), which operationalizes our framework by integrating MMQA generation and evaluation into a closed loop for iterative refinement. Our experiments show that while state-of-the-art models can generate MMQAs, their outputs still leave substantial gaps, underscoring the need for reliable evaluation. We further demonstrate that top-tier understanding models align closely with human judgment in MMQA quality assessment. Leveraging both insights, the Q-Mirror agent raises average scores from 78.90 to 85.22 and pass rates from 72% to 95%, offering a practical path to large-scale scientific benchmarks.","short_abstract":"High-quality, multi-modal benchmarks are crucial for advancing scientific reasoning in large models yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs), which include three part...","url_abs":"https://arxiv.org/abs/2509.24297","url_pdf":"https://arxiv.org/pdf/2509.24297v2","authors":"[\"Junying Wang\",\"Zicheng Zhang\",\"Ye Shen\",\"Yalun Wu\",\"Yingji Liang\",\"Yijin Guo\",\"Farong Wen\",\"Wenzhe Li\",\"Xuezhi Zhao\",\"Qi Jia\",\"Guangtao Zhai\"]","published":"2025-09-29T05:22:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[]","has_code":false}
