{"ID":3050123,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T10:22:36.014579446Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04694","arxiv_id":"2606.04694","title":"DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer","abstract":"Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales, and teacher-student settings show that DuDi consistently outperforms competitive distillation baselines. Ablations and analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals for multilingual SLMs.","short_abstract":"Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-p...","url_abs":"https://arxiv.org/abs/2606.04694","url_pdf":"https://arxiv.org/pdf/2606.04694v1","authors":"[\"Patomporn Payoungkhamdee\",\"Tinnakit Udsa\",\"Jian Gang Ngui\",\"Sarana Nutanong\",\"Alham Fikri Aji\",\"Peerat Limkonchotiwat\"]","published":"2026-06-03T10:23:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
