{"ID":2863691,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24908","arxiv_id":"2509.24908","title":"BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications","abstract":"The ability to summarize long documents succinctly is increasingly important in daily life due to information overload, yet there is a notable lack of such summaries for Spanish documents in general, and in the legal domain in particular. In this work, we present BOE-XSUM, a curated dataset comprising 3,648 concise, plain-language summaries of documents sourced from Spain's \"Boletín Oficial del Estado\" (BOE), the State Official Gazette. Each entry in the dataset includes a short summary, the original text, and its document type label. We evaluate the performance of medium-sized large language models (LLMs) fine-tuned on BOE-XSUM, comparing them to general-purpose generative models in a zero-shot setting. Results show that fine-tuned models significantly outperform their non-specialized counterparts. Notably, the best-performing model, BERTIN GPT-J 6B (32-bit precision), achieves a 24% performance gain over the top zero-shot model, DeepSeek-R1 (accuracies of 41.6% vs. 33.5%).","short_abstract":"The ability to summarize long documents succinctly is increasingly important in daily life due to information overload, yet there is a notable lack of such summaries for Spanish documents in general, and in the legal domain in particular. 
In this work, we present BOE-XSUM, a curated dataset comprising 3,648 concise, pl...","url_abs":"https://arxiv.org/abs/2509.24908","url_pdf":"https://arxiv.org/pdf/2509.24908v1","authors":"[\"Andrés Fernández García\",\"Javier de la Rosa\",\"Julio Gonzalo\",\"Roser Morante\",\"Enrique Amigó\",\"Alejandro Benito-Santos\",\"Jorge Carrillo-de-Albornoz\",\"Víctor Fresno\",\"Adrian Ghajari\",\"Guillermo Marco\",\"Laura Plaza\",\"Eva Sánchez Salido\"]","published":"2025-09-29T15:15:17Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
