{"ID":2870019,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11574","arxiv_id":"2511.11574","title":"LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora","abstract":"Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM \"teacher\" trains a smaller and more efficient \"student\" model, offers a promising solution to this problem. However, the distillation process itself often remains costly for large datasets, since it requires the teacher to label a vast number of samples while incurring significant token consumption. To alleviate this challenge, in this work we explore active learning (AL) as a way to create efficient student models at a fraction of the cost while preserving the LLM's performance. In particular, we introduce M-RARU (Multi-class Randomized Accept/Reject Uncertainty Sampling), a novel AL algorithm that significantly reduces training costs. M-RARU employs an innovative strategy combining uncertainty with a randomized accept-reject mechanism to select only the most informative data points for the LLM teacher. This focused approach significantly reduces the number of required API calls and the data processing time. We evaluate M-RARU against random sampling across five diverse student models (SVM, LDA, RF, GBDT, and DistilBERT) on multiple benchmark datasets. 
Experiments demonstrate that our proposed method achieves up to an 80% reduction in sample requirements compared to random sampling, substantially improving classification accuracy while reducing financial costs and overall training time.","short_abstract":"Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM \"teacher\" trains a smaller and more efficient \"student\" model, offers a promising solution...","url_abs":"https://arxiv.org/abs/2511.11574","url_pdf":"https://arxiv.org/pdf/2511.11574v1","authors":"[\"Viviana Luccioli\",\"Rithika Iyengar\",\"Ryan Panley\",\"Flora Haberkorn\",\"Xiaoyu Ge\",\"Leland Crane\",\"Nitish Sinha\",\"Seung Jung Lee\"]","published":"2025-09-17T18:38:56Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}