{"ID":2870941,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11957","arxiv_id":"2509.11957","title":"EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors","abstract":"Voice activity detection (VAD) is essential in speech-based systems, but traditional methods detect only speech presence without identifying speakers. Target-speaker VAD (TS-VAD) extends this by detecting the speech of a known speaker using a short enrollment utterance, but this assumption fails in open-domain scenarios such as meetings or customer service calls, where the main speaker is unknown. We propose EEND-SAA, an enrollment-less, streaming-compatible framework for main-speaker VAD, which identifies the primary speaker without prior knowledge. Unlike TS-VAD, our method determines the main speaker as the one who talks more steadily and clearly, based on speech continuity and volume. We build our model on EEND using two self-attention attractors in a Transformer and apply causal masking for real-time use. Experiments on multi-speaker LibriSpeech mixtures show that EEND-SAA reduces main-speaker DER from 6.63% to 3.61% and improves F1 from 0.9667 to 0.9818 over the SA-EEND baseline, achieving state-of-the-art performance under conditions involving speaker overlap and noise.","short_abstract":"Voice activity detection (VAD) is essential in speech-based systems, but traditional methods detect only speech presence without identifying speakers. \nTarget-speaker VAD (TS-VAD) extends this by detecting the speech of a known speaker using a short enrollment utterance, but this assumption fails in open-domain scenario...","url_abs":"https://arxiv.org/abs/2509.11957","url_pdf":"https://arxiv.org/pdf/2509.11957v1","authors":"[\"Wen-Yung Wu\",\"Pei-Chin Hsieh\",\"Tai-Shih Chi\"]","published":"2025-09-15T14:13:26Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[\"Transformer\",\"Large Language Model\"]","has_code":false}
