{"ID":2876376,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.00305","arxiv_id":"2509.00305","title":"Language-Aware Information Maximization for Transductive Few-Shot CLIP","abstract":"Transductive few-shot learning has triggered an abundant literature focusing on vision-only models, but is still at a nascent stage within the recent context of foundational vision-language models (VLMs). Only a few recent methods addressed the problem, pointing to the potential of transduction in VLMs and to the need for VLM-tailored methods. Building on this momentum, we leverage information-theoretic concepts and recent progress in parameter-efficient fine-tuning (PEFT), developing a highly competitive transductive few-shot CLIP method. Specifically, we introduce a novel Language-aware Information MaximizatiOn (LIMO) loss integrating three complementary terms: (i) the mutual information between the vision inputs and the textual class descriptions; (ii) a Kullback-Leibler (KL) divergence penalizing deviation of the network's probabilistic outputs from the text-driven zero-shot predictions; and (iii) a standard cross-entropy loss based on the labeled shots. Furthermore, we challenge the commonly followed fine-tuning practices in the context of transductive few-shot learning, and explore PEFT strategies, completely overlooked in this context. Surprisingly, we observe substantial boosts in performance, which points to the potential of adapting a subset of the model's parameters in the transductive few-shot setting. We report comprehensive evaluations, which show that LIMO outperforms the very recent transductive few-shot CLIP methods by a large margin and yields significant gains over the best-performing inductive methods. 
Our code is publicly available at: https://github.com/ghassenbaklouti/LIMO","short_abstract":"Transductive few-shot learning has triggered an abundant literature focusing on vision-only models, but is still at a nascent stage within the recent context of foundational vision-language models (VLMs). Only a few recent methods addressed the problem, pointing to the potential of transduction in VLMs and to the need f...","url_abs":"https://arxiv.org/abs/2509.00305","url_pdf":"https://arxiv.org/pdf/2509.00305v1","authors":"[\"Ghassen Baklouti\",\"Maxime Zanella\",\"Ismail Ben Ayed\"]","published":"2025-08-30T01:46:31Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":610287,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2876376,"paper_url":"https://arxiv.org/abs/2509.00305","paper_title":"Language-Aware Information Maximization for Transductive Few-Shot CLIP","repo_url":"https://github.com/ghassenbaklouti/LIMO","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
