{"ID":2899452,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00401","arxiv_id":"2507.00401","title":"Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains","abstract":"We investigate cross-domain few-shot learning under the constraint that fine-tuning of backbones (i.e., feature extractors) is impossible or infeasible -- a scenario that is increasingly common in practical use cases. Handling the low-quality and static embeddings produced by frozen, \"black-box\" backbones leads to a problem representation of few-shot classification as a series of multiple instance verification (MIV) tasks. Inspired by this representation, we introduce a novel approach to few-shot domain adaptation, named the \"MIV-head\", akin to a classification head that is agnostic to any pretrained backbone and computationally efficient. The core components designed for the MIV-head, when trained on few-shot data from a target domain, collectively yield strong performance on test data from that domain. Importantly, it does so without fine-tuning the backbone, and within the \"meta-testing\" phase. Experimenting under various settings and on an extension of the Meta-dataset benchmark for cross-domain few-shot image classification, using representative off-the-shelf convolutional neural network and vision transformer backbones pretrained on ImageNet1K, we show that the MIV-head achieves highly competitive accuracy when compared to state-of-the-art \"adapter\" (or partially fine-tuning) methods applied to the same backbones, while incurring substantially lower adaptation cost. We also find well-known \"classification head\" approaches lag far behind in terms of accuracy. Ablation study empirically justifies the core components of our approach. We share our code at https://github.com/xxweka/MIV-head.","short_abstract":"We investigate cross-domain few-shot learning under the constraint that fine-tuning of backbones (i.e., feature extractors) is impossible or infeasible -- a scenario that is increasingly common in practical use cases. Handling the low-quality and static embeddings produced by frozen, \"black-box\" backbones leads to a pr...","url_abs":"https://arxiv.org/abs/2507.00401","url_pdf":"https://arxiv.org/pdf/2507.00401v1","authors":"[\"Xin Xu\",\"Eibe Frank\",\"Geoffrey Holmes\"]","published":"2025-07-01T03:34:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":false,"code_links":[{"ID":612482,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2899452,"paper_url":"https://arxiv.org/abs/2507.00401","paper_title":"Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains","repo_url":"https://github.com/xxweka/MIV-head","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
