{"ID":3004837,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03604","arxiv_id":"2606.03604","title":"Beyond the Literal: Decomposing Pragmatic Intent in Multimodal Meme Understanding","abstract":"When asked what a meme or sarcastic post means, Large Vision Language Models (LVLMs) tend to describe what the image shows rather than what the author is trying to communicate. Standard instruction tuning entangles a post's literal content with its pragmatic meaning, letting surface-level details contaminate the final response. We reframe meme understanding as a problem of literal-pragmatic decomposition and propose \\textbf{Intent Projection}, a framework that separates the two signals at the representation, output, and objective levels within a single LVLM backbone. At the representation level, an orthogonal projection module removes dominant unimodal directions from the fused image-text representation, retaining only the pragmatic residual, while a surface-real affect classifier anchors the decoder with a discrete tag that names the polarity gap. At the output level, the model externalizes a structured reasoning chain, and at the objective level a contrastive reward explicitly penalizes answers that restate the literal description. Across six multimodal benchmarks, Intent Projection consistently outperforms open-source baselines and narrows the gap to proprietary models, with the largest gains on high-divergence posts where literal collapse is most damaging.","short_abstract":"When asked what a meme or sarcastic post means, Large Vision Language Models (LVLMs) tend to describe what the image shows rather than what the author is trying to communicate. Standard instruction tuning entangles a post's literal content with its pragmatic meaning, letting surface-level details contaminate the final...","url_abs":"https://arxiv.org/abs/2606.03604","url_pdf":"https://arxiv.org/pdf/2606.03604v1","authors":"[\"Zhengyi Zhao\",\"Shubo Zhang\",\"Zezhong Wang\",\"Luyao Ye\",\"Huimin Wang\",\"Hanqi Yan\",\"Binyang Li\",\"Kam-Fai Wong\",\"Yulan He\"]","published":"2026-06-02T13:09:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
