{"ID":2879678,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.15229","arxiv_id":"2508.15229","title":"VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models","abstract":"Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge device deployment. A substantial portion of SLMs' memory footprint stems from vocabulary-related components, particularly embeddings and language modeling (LM) heads, due to large vocabulary sizes. Existing static vocabulary pruning, while reducing memory usage, suffers from rigid, one-size-fits-all designs that cause information loss during the prefill stage and lack flexibility. In this work, we identify two key principles underlying the vocabulary reduction challenge: the lexical locality principle, the observation that only a small subset of tokens is required during any single inference, and the asymmetry in computational characteristics between vocabulary-related components of SLM. Based on these insights, we introduce VocabTailor, a novel decoupled dynamic vocabulary selection framework that addresses memory constraints through offloading embedding and implements a hybrid static-dynamic vocabulary selection strategy for LM Head, enabling on-demand loading of vocabulary components. Comprehensive experiments across diverse downstream tasks demonstrate that VocabTailor achieves a reduction of up to 99% in the memory usage of vocabulary-related components with minimal or no degradation in task performance, substantially outperforming existing static vocabulary pruning. \nOur code is available at https://github.com/AwakenedInsects/VocabTailor.","short_abstract":"Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge device deployment. A substantial portion of SLMs' memory footprint stems from vocabulary-related components, particularly embeddings and language modeling (LM)...","url_abs":"https://arxiv.org/abs/2508.15229","url_pdf":"https://arxiv.org/pdf/2508.15229v3","authors":"[\"Hanling Zhang\",\"Yayu Zhou\",\"Tongcheng Fang\",\"Zhihang Yuan\",\"Guohao Dai\",\"Wanli Ouyang\",\"Yu Wang\"]","published":"2025-08-21T04:32:13Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":610613,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2879678,"paper_url":"https://arxiv.org/abs/2508.15229","paper_title":"VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models","repo_url":"https://github.com/AwakenedInsects/VocabTailor","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
