{"ID":2855020,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13219","arxiv_id":"2510.13219","title":"Prompt-based Adaptation in Large-scale Vision Models: A Survey","abstract":"In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the \"pretrain-then-finetune\" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred, as VP and VPT are frequently used interchangeably in current research, reflecting a lack of systematic distinction between these techniques and their respective applications. In this survey, we revisit the designs of VP and VPT from first principles and conceptualize them within a unified framework termed Prompt-based Adaptation (PA). Within this framework, we distinguish methods based on their injection granularity: VP operates at the pixel level, while VPT injects prompts at the token level. We further categorize these methods by their generation mechanism into fixed, learnable, and generated prompts. Beyond the core methodologies, we examine PA integrations across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks, as well as its role in test-time adaptation and trustworthy AI. We also summarize current benchmarks and identify key challenges and future directions. To the best of our knowledge, we are the first comprehensive survey dedicated to PA methodologies and applications in light of their distinct characteristics. Our survey aims to provide a clear roadmap for researchers and practitioners in all areas to understand and explore the evolving landscape of PA-related research.","short_abstract":"In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the \"pretrain-then-finetune\" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred, a...","url_abs":"https://arxiv.org/abs/2510.13219","url_pdf":"https://arxiv.org/pdf/2510.13219v2","authors":"[\"Xi Xiao\",\"Yunbei Zhang\",\"Lin Zhao\",\"Yiyang Liu\",\"Xiaoying Liao\",\"Zheda Mai\",\"Xingjian Li\",\"Xiao Wang\",\"Hao Xu\",\"Jihun Hamm\",\"Xue Lin\",\"Min Xu\",\"Qifan Wang\",\"Tianyang Wang\",\"Cheng Han\"]","published":"2025-10-15T07:14:50Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
