{"ID":2878555,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.17973","arxiv_id":"2508.17973","title":"German4All -- A Dataset and Model for Readability-Controlled Paraphrasing in German","abstract":"The ability to paraphrase texts across different complexity levels is essential for creating accessible texts that can be tailored toward diverse reader groups. Thus, we introduce German4All, the first large-scale German dataset of aligned readability-controlled, paragraph-level paraphrases. It spans five readability levels and comprises over 25,000 samples. The dataset is automatically synthesized using GPT-4 and rigorously evaluated through both human and LLM-based judgments. Using German4All, we train an open-source, readability-controlled paraphrasing model that achieves state-of-the-art performance in German text simplification, enabling more nuanced and reader-specific adaptations. We open-source both the dataset and the model to encourage further research on multi-level paraphrasing.","short_abstract":"The ability to paraphrase texts across different complexity levels is essential for creating accessible texts that can be tailored toward diverse reader groups. Thus, we introduce German4All, the first large-scale German dataset of aligned readability-controlled, paragraph-level paraphrases. It spans five readability l...","url_abs":"https://arxiv.org/abs/2508.17973","url_pdf":"https://arxiv.org/pdf/2508.17973v2","authors":"[\"Miriam Anschütz\",\"Thanh Mai Pham\",\"Eslam Nasrallah\",\"Maximilian Müller\",\"Cristian-George Craciun\",\"Georg Groh\"]","published":"2025-08-25T12:40:32Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
