{"ID":2878651,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.18179","arxiv_id":"2508.18179","title":"SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models","abstract":"Evaluating whether vision-language models (VLMs) reason consistently across representations is challenging because modality comparisons are typically confounded by task differences and asymmetric information. We introduce SEAM, a benchmark that pairs semantically equivalent inputs across four domains that have existing standardized textual and visual notations. By employing distinct notation systems across modalities, in contrast to OCR-based image-text pairing, SEAM provides a rigorous comparative assessment of the textual-symbolic and visual-spatial reasoning capabilities of VLMs. Across 21 contemporary models, we observe systematic modality imbalance: vision frequently lags language in overall performance, despite the problems containing semantically equivalent information, and cross-modal agreement is relatively low. Our error analysis reveals two main drivers: textual perception failures from tokenization in domain notation and visual perception failures that induce hallucinations. We also show that our results are largely robust to visual transformations. SEAM establishes a controlled, semantically equivalent setting for measuring and improving modality-agnostic reasoning.","short_abstract":"Evaluating whether vision-language models (VLMs) reason consistently across representations is challenging because modality comparisons are typically confounded by task differences and asymmetric information. We introduce SEAM, a benchmark that pairs semantically equivalent inputs across four domains that have existing...","url_abs":"https://arxiv.org/abs/2508.18179","url_pdf":"https://arxiv.org/pdf/2508.18179v1","authors":"[\"Zhenwei Tang\",\"Difan Jiao\",\"Blair Yang\",\"Ashton Anderson\"]","published":"2025-08-25T16:33:07Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
