{"ID":2869216,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14737","arxiv_id":"2509.14737","title":"Pushing the Limits of End-to-End Diarization","abstract":"In this paper, we present state-of-the-art diarization error rates (DERs) on multiple publicly available datasets, including AliMeeting-far, AliMeeting-near, AMI-Mix, AMI-SDM, DIHARD III, and MagicData RAMC. Leveraging EEND-TA, a single unified non-autoregressive model for end-to-end speaker diarization, we achieve new benchmark results, most notably a DER of 14.49% on DIHARD III. Our approach scales pretraining through 8-speaker simulation mixtures, ensuring each generated speaker mixture configuration is sufficiently represented. These experiments highlight that EEND-based architectures possess a greater capacity for learning than previously explored, surpassing many existing diarization solutions while maintaining efficient speeds during inference.","short_abstract":"In this paper, we present state-of-the-art diarization error rates (DERs) on multiple publicly available datasets, including AliMeeting-far, AliMeeting-near, AMI-Mix, AMI-SDM, DIHARD III, and MagicData RAMC. Leveraging EEND-TA, a single unified non-autoregressive model for end-to-end speaker diarization, we achieve new...","url_abs":"https://arxiv.org/abs/2509.14737","url_pdf":"https://arxiv.org/pdf/2509.14737v1","authors":"[\"Samuel J. Broughton\",\"Lahiru Samarakoon\"]","published":"2025-09-18T08:39:33Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[]","has_code":false}