{"ID":2861918,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00582","arxiv_id":"2510.00582","title":"SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation","abstract":"In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual awareness, with large-scale pretraining on simulated code-switching data. By jointly leveraging these two components, our method overcomes the limitations of conventional approaches in data scarcity and architecture optimization, and generalizes effectively to real-world multilingual settings across diverse environments. Experimental results demonstrate that our approach achieves state-of-the-art performance on several language diarization benchmarks, with a relative performance improvement of 23% to 52% over previous methods. We believe that this work not only advances research in language diarization but also establishes a foundational framework for code-switching speech technologies.","short_abstract":"In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual awareness, with large-scale pretraining on simulated code-switching data. By jointly leve...","url_abs":"https://arxiv.org/abs/2510.00582","url_pdf":"https://arxiv.org/pdf/2510.00582v1","authors":"[\"Sangmin Lee\",\"Woongjib Choi\",\"Jihyun Kim\",\"Hong-Goo Kang\"]","published":"2025-10-01T07:01:33Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.SD\"]","methods":"[]","has_code":false}
