{"ID":2883662,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07819","arxiv_id":"2508.07819","title":"ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection","abstract":"Pre-trained Vision-Language Models (VLMs) struggle with Zero-Shot Anomaly Detection (ZSAD) due to a critical adaptation gap: they lack the local inductive biases required for dense prediction and employ inflexible feature fusion paradigms. We address these limitations through an Architectural Co-Design framework that jointly refines feature representation and cross-modal fusion. Our method proposes a parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter to inject local inductive biases for fine-grained representation, and introduces a Dynamic Fusion Gateway (DFG) that leverages visual context to adaptively modulate text prompts, enabling a powerful bidirectional fusion. Extensive experiments on diverse industrial and medical benchmarks demonstrate superior accuracy and robustness, validating that this synergistic co-design is critical for robustly adapting foundation models to dense perception tasks. The source code is available at https://github.com/cockmake/ACD-CLIP.","short_abstract":"Pre-trained Vision-Language Models (VLMs) struggle with Zero-Shot Anomaly Detection (ZSAD) due to a critical adaptation gap: they lack the local inductive biases required for dense prediction and employ inflexible feature fusion paradigms. \nWe address these limitations through an Architectural Co-Design framework that j...","url_abs":"https://arxiv.org/abs/2508.07819","url_pdf":"https://arxiv.org/pdf/2508.07819v6","authors":"[\"Ke Ma\",\"Jun Long\",\"Hongxiao Fei\",\"Liujie Hua\",\"Zhen Dai\",\"Yueyi Luo\"]","published":"2025-08-11T10:03:45Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":611004,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2883662,"paper_url":"https://arxiv.org/abs/2508.07819","paper_title":"ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection","repo_url":"https://github.com/cockmake/ACD-CLIP","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}