{"ID":2834890,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00882","arxiv_id":"2512.00882","title":"Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints","abstract":"Vision-Language Models (VLMs) exhibit significant performance plateaus in specialized domains like precision agriculture, primarily due to \"Reasoning-Driven Hallucination\" where linguistic priors override visual perception. A key bottleneck is the \"Modality Gap\": visual embeddings fail to reliably activate the fine-grained expert knowledge already encoded in model parameters. We propose \"Look, Recite, Then Answer,\" a parameter-efficient framework that enhances VLMs via self-generated knowledge hints while keeping backbone models frozen. The framework decouples inference into three stages: (1) Look generates objective visual descriptions and candidate sets; (2) Recite employs a lightweight 1.7B router to transform visual cues into targeted queries that trigger candidate-specific parametric knowledge; (3) Answer performs parallel evidence alignment between descriptions and recited knowledge to select the most consistent label. On AgroBench, our method achieves state-of-the-art results, improving Weed Identification accuracy by 23.52% over Qwen2-VL-72B and surpassing GPT-4o without external search overhead. This modular design mitigates hallucinations by transforming passive perception into active, controllable knowledge retrieval","short_abstract":"Vision-Language Models (VLMs) exhibit significant performance plateaus in specialized domains like precision agriculture, primarily due to \"Reasoning-Driven Hallucination\" where linguistic priors override visual perception. A key bottleneck is the \"Modality Gap\": visual embeddings fail to reliably activate the fine-gra...","url_abs":"https://arxiv.org/abs/2512.00882","url_pdf":"https://arxiv.org/pdf/2512.00882v3","authors":"[\"Xisheng Feng\"]","published":"2025-11-30T13:04:43Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
