{"ID":2842775,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09213","arxiv_id":"2511.09213","title":"Pretraining Finnish ModernBERTs","abstract":"This paper reports on pretraining ModernBERT encoder models in six different sizes, ranging from 51M to 475M parameters, with a focus on limited multilingualism, emphasizing languages relevant to Finland. Our models are competitive with, or superior to, existing multilingual models. They outperform monolingual models on tasks that require a context longer than 512 tokens. We present empirical results on using different data in the final stage of training. The code and models are publicly released.","short_abstract":"This paper reports on pretraining ModernBERT encoder models in six different sizes, ranging from 51M to 475M parameters, with a focus on limited multilingualism, emphasizing languages relevant to Finland. Our models are competitive with, or superior to, existing multilingual models. They outperform monolingual models o...","url_abs":"https://arxiv.org/abs/2511.09213","url_pdf":"https://arxiv.org/pdf/2511.09213v1","authors":"[\"Akseli Reunamo\",\"Laura-Maria Peltonen\",\"Hans Moen\",\"Sampo Pyysalo\"]","published":"2025-11-12T11:21:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
