{"ID":2874227,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.05056","arxiv_id":"2509.05056","title":"Masked Diffusion Language Models with Frequency-Informed Training","abstract":"We present a masked diffusion language modeling framework for data-efficient training for the BabyLM 2025 Challenge. Our approach applies diffusion training objectives to language modeling under strict data constraints, incorporating frequency-informed masking that prioritizes learning from rare tokens while maintaining theoretical validity. We explore multiple noise scheduling strategies, including two-mode approaches, and investigate different noise weighting schemes within the NELBO objective. We evaluate our method on the BabyLM benchmark suite, measuring linguistic competence, world knowledge, and human-likeness. Results show performance competitive with hybrid autoregressive-masked baselines, demonstrating that diffusion-based training offers a viable alternative for data-restricted language learning.","short_abstract":"We present a masked diffusion language modeling framework for data-efficient training for the BabyLM 2025 Challenge. Our approach applies diffusion training objectives to language modeling under strict data constraints, incorporating frequency-informed masking that prioritizes learning from rare tokens while maintainin...","url_abs":"https://arxiv.org/abs/2509.05056","url_pdf":"https://arxiv.org/pdf/2509.05056v1","authors":"[\"Despoina Kosmopoulou\",\"Efthymios Georgiou\",\"Vaggelis Dorovatas\",\"Georgios Paraskevopoulos\",\"Alexandros Potamianos\"]","published":"2025-09-05T12:35:06Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}
