{"ID":2862670,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26036","arxiv_id":"2509.26036","title":"SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP","abstract":"While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused by a persistent modality gap and CLIP's exclusively inter-modal training objective, leaves the embedding spaces uncalibrated, making direct image-to-image comparisons unreliable. Existing methods attempt to address this by refining similarity logits or by computationally expensive per-sample optimization. To overcome these challenges, we introduce SeMoBridge, a lightweight yet powerful approach that directly addresses the misalignment. Our method maps images into the text modality, while keeping their semantic content intact through what we call a Semantic Modality Bridge. SeMoBridge is closed-form and can optionally be trained through multi-modal supervision, combining image and text-alignment losses to optimize the projection. Experiments show that the trained version, SeMoBridge-T, requires only a fraction of the training time while overall outperforming other methods, particularly in low-data scenarios (1, 2, and 4 shots). The code is available at https://github.com/christti98/semobridge.","short_abstract":"While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused by a persistent modality gap and CLIP's exclusively inter-modal training obje...","url_abs":"https://arxiv.org/abs/2509.26036","url_pdf":"https://arxiv.org/pdf/2509.26036v3","authors":"[\"Christoph Timmermann\",\"Hyunse Lee\",\"Woojin Lee\"]","published":"2025-09-30T10:12:15Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":608920,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862670,"paper_url":"https://arxiv.org/abs/2509.26036","paper_title":"SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP","repo_url":"https://github.com/christti98/semobridge","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
