{"ID":2880280,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14732","arxiv_id":"2508.14732","title":"PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding","abstract":"The presence of non-speech segments in utterances often leads to the performance degradation of speaker verification. Existing systems usually use voice activation detection as a preprocessing step to cut off long silence segments. However, short silence segments, particularly those between speech segments, still remain a problem for speaker verification. To address this issue, in this paper, we propose a simple wave-level data augmentation method, \\textit{PadAug}, which aims to enhance the system's robustness to silence segments. The core idea of \\textit{PadAug} is to concatenate silence segments with speech segments at the waveform level for model training. Due to its simplicity, it can be directly applied to the current state-of-the art architectures. Experimental results demonstrate the effectiveness of the proposed \\textit{PadAug}. For example, applying \\textit{PadAug} to ResNet34 achieves a relative equal error rate reduction of 5.0\\% on the voxceleb dataset. Moreover, the \\textit{PadAug} based systems are robust to different lengths and proportions of silence segments in the test data.","short_abstract":"The presence of non-speech segments in utterances often leads to the performance degradation of speaker verification. Existing systems usually use voice activation detection as a preprocessing step to cut off long silence segments. \nHowever, short silence segments, particularly those between speech segments, still remai...","url_abs":"https://arxiv.org/abs/2508.14732","url_pdf":"https://arxiv.org/pdf/2508.14732v1","authors":"[\"Zijun Huang\",\"Chengdong Liang\",\"Jiadi Yao\",\"Xiao-Lei Zhang\"]","published":"2025-08-20T14:28:42Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
