{"ID":2843743,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.06719","arxiv_id":"2511.06719","title":"MobileLLM-Pro Technical Report","abstract":"Efficient on-device language models around 1 billion parameters are essential for powering low-latency AI applications on mobile and wearable devices. However, achieving strong performance in this model class, while supporting long context windows and practical deployment remains a significant challenge. We introduce MobileLLM-Pro, a 1-billion-parameter language model optimized for on-device deployment. MobileLLM-Pro achieves state-of-the-art results across 11 standard benchmarks, significantly outperforming both Gemma 3-1B and Llama 3.2-1B, while supporting context windows of up to 128,000 tokens and showing only minor performance regressions at 4-bit quantization. These improvements are enabled by four core innovations: (1) implicit positional distillation, a novel technique that effectively instills long-context capabilities through knowledge distillation; (2) a specialist model merging framework that fuses multiple domain experts into a compact model without parameter growth; (3) simulation-driven data mixing using utility estimation; and (4) 4-bit quantization-aware training with self-distillation. We release our model weights and code to support future research in efficient on-device language models.","short_abstract":"Efficient on-device language models around 1 billion parameters are essential for powering low-latency AI applications on mobile and wearable devices. However, achieving strong performance in this model class, while supporting long context windows and practical deployment remains a significant challenge. We introduce M...","url_abs":"https://arxiv.org/abs/2511.06719","url_pdf":"https://arxiv.org/pdf/2511.06719v1","authors":"[\"Patrick Huber\",\"Ernie Chang\",\"Wei Wen\",\"Igor Fedorov\",\"Tarek Elgamal\",\"Hanxian Huang\",\"Naveen Suda\",\"Chinnadhurai Sankar\",\"Vish Vogeti\",\"Yanghan Wang\",\"Alex Gladkov\",\"Kai Sheng Tai\",\"Abdelrahman Elogeel\",\"Tarek Hefny\",\"Vikas Chandra\",\"Ahmed Aly\",\"Anuj Kumar\",\"Raghuraman Krishnamoorthi\",\"Adithya Sagar\"]","published":"2025-11-10T05:28:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
