{"ID":2877558,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19721","arxiv_id":"2508.19721","title":"CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese","abstract":"Existing resources for Automatic Speech Recognition in Portuguese are mostly focused on Brazilian Portuguese, leaving European Portuguese (EP) and other varieties under-explored. To bridge this gap, we introduce CAMÕES, the first open framework for EP and other Portuguese varieties. It consists of (1) a comprehensive evaluation benchmark, including 46h of EP test data spanning multiple domains; and (2) a collection of state-of-the-art models. For the latter, we consider multiple foundation models, evaluating their zero-shot and fine-tuned performances, as well as E-Branchformer models trained from scratch. A curated set of 425h of EP was used for both fine-tuning and training. Our results show comparable performance for EP between fine-tuned foundation models and the E-Branchformer. Furthermore, the best-performing models achieve relative improvements above 35% WER, compared to the strongest zero-shot foundation model, establishing a new state-of-the-art for EP and other varieties.","short_abstract":"Existing resources for Automatic Speech Recognition in Portuguese are mostly focused on Brazilian Portuguese, leaving European Portuguese (EP) and other varieties under-explored. To bridge this gap, we introduce CAMÕES, the first open framework for EP and other Portuguese varieties. \nIt consists of (1) a comprehensive e...","url_abs":"https://arxiv.org/abs/2508.19721","url_pdf":"https://arxiv.org/pdf/2508.19721v1","authors":"[\"Carlos Carvalho\",\"Francisco Teixeira\",\"Catarina Botelho\",\"Anna Pompili\",\"Rubén Solera-Ureña\",\"Sérgio Paulo\",\"Mariana Julião\",\"Thomas Rolland\",\"John Mendonça\",\"Diogo Pereira\",\"Isabel Trancoso\",\"Alberto Abad\"]","published":"2025-08-27T09:30:43Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"eess.AS\"]","methods":"[]","has_code":false}
