{"ID":2851076,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20358","arxiv_id":"2510.20358","title":"Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)","abstract":"We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to enforce \"more communicative\" text generations by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adversarial effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.","short_abstract":"We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to enforce \"more communicative\" text generations by our models. Although our models underperform on m...","url_abs":"https://arxiv.org/abs/2510.20358","url_pdf":"https://arxiv.org/pdf/2510.20358v1","authors":"[\"Francesca Padovani\",\"Bastian Bunzeck\",\"Manar Ali\",\"Omar Momen\",\"Arianna Bisazza\",\"Hendrik Buschmeier\",\"Sina Zarrieß\"]","published":"2025-10-23T08:57:56Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
