{"ID":3050005,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T14:07:05.414468951Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04922","arxiv_id":"2606.04922","title":"Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models","abstract":"Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where clinical data sensitivity favors frozen backbones and annotations are limited. However, these methods typically optimize only the ground-truth class, treating all other classes as equally incorrect, ignoring clinically meaningful class relations and yielding unstable decision boundaries in limited-supervision settings. We propose Omni-Geometry Knowledge Distillation (OGKD), a new framework that injects class-relation structure into the teacher to produce directional targets that preserve the ground truth while respecting inter-class geometry. Using these targets, we develop two distillation losses: Global Geometry-Aware Distillation (GAD) operates on the global image token, and Label-Guided Geometry Distillation (LGD) applies the same geometry to attentive patch tokens to improve fine-grained alignment. Across comprehensive experiments and analyses on 11 widely-used medical datasets for base-to-novel and few-shot evaluations, our OGKD achieves substantially better performance, consistently improving accuracy by an average absolute gain of 1.7%-2.8% over all prior state-of-the-art VLM adaptation counterparts. It also robustly generalizes to unseen classes and yields more reliable predictions than other approaches. Our code is available at https://github.com/tientrandinh/OGKD.","short_abstract":"Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where clinical data sensitivity favors frozen backbones and annotations are limited. However, these methods typically optimize only the ground-truth class, treating all other classes as equally incorrect, i...","url_abs":"https://arxiv.org/abs/2606.04922","url_pdf":"https://arxiv.org/pdf/2606.04922v1","authors":"[\"Tran Dinh Tien\",\"Zhiqiang Shen\"]","published":"2026-06-03T14:17:57Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":612773,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-04T02:13:16.786527022Z","DeletedAt":null,"paper_id":3050005,"paper_url":"https://arxiv.org/abs/2606.04922","paper_title":"Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models","repo_url":"https://github.com/tientrandinh/OGKD","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
