{"ID":2922060,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T10:37:16.173077835Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00673","arxiv_id":"2606.00673","title":"T-CLIP: Enabling Thermal Perception for Contrastive Language-Image Pretraining","abstract":"Thermal imaging offers a powerful alternative to visible-spectrum vision under challenging conditions such as low illumination and adverse weather, yet foundational vision-language models like CLIP fail to align thermal images with textual descriptions due to a fundamental thermal perception gap. We identify three major challenges: the lack of captioned thermal datasets, the inability of standard LLMs to reason about thermal phenomena, and a key representational challenge in thermal imaging where global scene context and object-level heat signatures conflict when learned together in a single embedding space. To address these, we introduce IR-Cap, the first physics-aware thermal captioning pipeline and dataset providing complementary global and fine-grained thermal descriptions across three public benchmarks, and T-CLIP, a decoupled dual-LoRA framework that independently adapts CLIP for scene-level and object-level thermal understanding. T-CLIP achieves consistent improvements over all baselines across three thermal benchmarks in cross-modal retrieval, and we provide an exploratory demonstration of its applicability to text-conditioned thermal image generation.","short_abstract":"Thermal imaging offers a powerful alternative to visible-spectrum vision under challenging conditions such as low illumination and adverse weather, yet foundational vision-language models like CLIP fail to align thermal images with textual descriptions due to a fundamental thermal perception gap. \nWe identify three majo...","url_abs":"https://arxiv.org/abs/2606.00673","url_pdf":"https://arxiv.org/pdf/2606.00673v1","authors":"[\"Tayeba Qazi\",\"Ayush Maheshwari\",\"Prerana Mukherjee\",\"Brejesh Lall\"]","published":"2026-05-30T11:03:58Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
