{"ID":2876497,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.00509","arxiv_id":"2509.00509","title":"Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation","abstract":"The rise of Artificial Intelligence as a Service (AIaaS) democratizes access to pre-trained models via Application Programming Interfaces (APIs), but also raises a fundamental question: how can local models be effectively trained using black-box models that do not expose their weights, training data, or logits, a constraint under which current domain adaptation paradigms are impractical? To address this challenge, we introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints: (1) the API model is open-vocabulary and trained on large-scale general-purpose data, and (2) access is limited to one-hot predictions only. We identify that open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales, a limitation termed the \"curse of resolution\". Our method, ATtention-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference. ATGC scores the attention maps with entropy to identify informative scales for pseudo-labelling, enabling effective distillation. Experiments demonstrate substantial improvements under black-box supervision across multiple datasets while requiring only one-hot API predictions. 
Our code is available at https://github.com/yasserben/ATGC.","short_abstract":"The rise of Artificial Intelligence as a Service (AIaaS) democratizes access to pre-trained models via Application Programming Interfaces (APIs), but also raises a fundamental question: how can local models be effectively trained using black-box models that do not expose their weights, training data, or logits, a const...","url_abs":"https://arxiv.org/abs/2509.00509","url_pdf":"https://arxiv.org/pdf/2509.00509v1","authors":"[\"Yasser Benigmim\",\"Subhankar Roy\",\"Khalid Oublal\",\"Imad Eddine Marouf\",\"Slim Essid\",\"Vicky Kalogeiton\",\"Stéphane Lathuilière\"]","published":"2025-08-30T14:03:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":610301,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2876497,"paper_url":"https://arxiv.org/abs/2509.00509","paper_title":"Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation","repo_url":"https://github.com/yasserben/ATGC","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}