{"ID":2851728,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19585","arxiv_id":"2510.19585","title":"Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark","abstract":"This paper presents a novel task of extracting low-resourced and noisy Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary zero-shot models is achievable, yet these models lack a functional comprehension of Latin. This study establishes a comprehensive baseline for processing Latin within mixed-language corpora, supporting quantitative analysis in intellectual history and historical linguistics. Both the dataset and code are available at https://github.com/COMHIS/EACL26-detect-latin.","short_abstract":"This paper presents a novel task of extracting low-resourced and noisy Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detec...","url_abs":"https://arxiv.org/abs/2510.19585","url_pdf":"https://arxiv.org/pdf/2510.19585v3","authors":"[\"Yu Wu\",\"Ke Shu\",\"Jonas Fischer\",\"Lidia Pivovarova\",\"David Rosson\",\"Eetu Mäkelä\",\"Mikko Tolonen\"]","published":"2025-10-22T13:37:52Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CV\",\"cs.DL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":607930,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2851728,"paper_url":"https://arxiv.org/abs/2510.19585","paper_title":"Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark","repo_url":"https://github.com/COMHIS/EACL26-detect-latin","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
