{"ID":2831648,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07452","arxiv_id":"2512.07452","title":"From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models","abstract":"Many heritage institutions hold extensive collections of theatre programmes, which remain largely underused due to their complex layouts and lack of structured metadata. In this paper, we present a workflow for transforming such documents into structured data using a combination of multimodal large language models (LLMs), an ontology-based reasoning model, and a custom extension of the Linked Art framework. We show how vision-language models can accurately parse and transcribe born-digital and digitised programmes, achieving over 98% of correct extraction. To overcome the challenges of semantic annotation, we train a reasoning model (POntAvignon) using reinforcement learning with both formal and semantic rewards. This approach enables automated RDF triple generation and supports alignment with existing knowledge graphs. Through a case study based on the Festival d'Avignon corpus, we demonstrate the potential for large-scale, ontology-driven analysis of performing arts data. Our results open new possibilities for interoperable, explainable, and sustainable computational theatre historiography.","short_abstract":"Many heritage institutions hold extensive collections of theatre programmes, which remain largely underused due to their complex layouts and lack of structured metadata. In this paper, we present a workflow for transforming such documents into structured data using a combination of multimodal large language models (LLM...","url_abs":"https://arxiv.org/abs/2512.07452","url_pdf":"https://arxiv.org/pdf/2512.07452v1","authors":"[\"Clarisse Bardiot\",\"Pierre-Carl Langlais\",\"Bernard Jacquemin\",\"Jacob Hart\",\"Antonios Lagarias\",\"Nicolas Foucault\",\"Aurélie Lemaître-Legargeant\",\"Jeanne Fras\"]","published":"2025-12-08T11:27:10Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}