{"ID":2855948,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.12947","arxiv_id":"2510.12947","title":"HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection","abstract":"Personalized Voice Activity Detection (PVAD) systems activate only in response to a specific target speaker. Speaker-conditioning methods are employed to inject information about the target speaker into a VAD pipeline, to achieve personalization. Existing speaker-conditioning methods typically modify the inputs or activations of a VAD model. We propose an alternative perspective to speaker conditioning. Our approach, HyWA, employs a hypernetwork to generate personalized weights for a few selected layers of a standard VAD model. We evaluate HyWA against multiple baseline speaker-conditioning techniques using a fixed backbone VAD. Our comparison shows consistent improvements in PVAD performance. This new approach improves on current speaker-conditioning techniques in two ways: i) it increases the mean average precision, and ii) it facilitates deployment by reusing the same VAD architecture.","short_abstract":"Personalized Voice Activity Detection (PVAD) systems activate only in response to a specific target speaker. Speaker-conditioning methods are employed to inject information about the target speaker into a VAD pipeline, to achieve personalization. Existing speaker-conditioning methods typically modify the inputs or acti...","url_abs":"https://arxiv.org/abs/2510.12947","url_pdf":"https://arxiv.org/pdf/2510.12947v2","authors":"[\"Mahsa Ghazvini Nejad\",\"Hamed Jafarzadeh Asl\",\"Amin Edraki\",\"Mohammadreza Sadeghi\",\"Masoud Asgharian\",\"Yuanhao Yu\",\"Vahid Partovi Nia\"]","published":"2025-10-14T19:46:40Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.LG\",\"cs.SD\"]","methods":"[]","has_code":false}
