{"ID":2877173,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.20760","arxiv_id":"2508.20760","title":"Occlusion Robustness of CLIP for Military Vehicle Classification","abstract":"Vision-language models (VLMs) like CLIP enable zero-shot classification by aligning images and text in a shared embedding space, offering advantages for defense applications with scarce labeled data. However, CLIP's robustness in challenging military environments, with partial occlusion and degraded signal-to-noise ratio (SNR), remains underexplored. We investigate CLIP variants' robustness to occlusion using a custom dataset of 18 military vehicle classes and evaluate using Normalized Area Under the Curve (NAUC) across occlusion percentages. Four key insights emerge: (1) Transformer-based CLIP models consistently outperform CNNs, (2) fine-grained, dispersed occlusions degrade performance more than larger contiguous occlusions, (3) despite improved accuracy, performance of linear-probed models sharply drops at around 35% occlusion, (4) by finetuning the model's backbone, this performance drop occurs at more than 60% occlusion. These results underscore the importance of occlusion-specific augmentations during training and the need for further exploration into patch-level sensitivity and architectural resilience for real-world deployment of CLIP.","short_abstract":"Vision-language models (VLMs) like CLIP enable zero-shot classification by aligning images and text in a shared embedding space, offering advantages for defense applications with scarce labeled data. However, CLIP's robustness in challenging military environments, with partial occlusion and degraded signal-to-noise rat...","url_abs":"https://arxiv.org/abs/2508.20760","url_pdf":"https://arxiv.org/pdf/2508.20760v2","authors":"[\"Jan Erik van Woerden\",\"Gertjan Burghouts\",\"Lotte Nijskens\",\"Alma M. Liezenga\",\"Sabina van Rooij\",\"Frank Ruis\",\"Hugo J. Kuijf\"]","published":"2025-08-28T13:16:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Transformer\",\"Language Model\",\"LoRA\",\"Convolutional Neural Network\"]","has_code":false}
