Where Does Speech Enhancement Adapt? A Probing Study Under Controlled Degradation
Abstract
Speech enhancement (SE) models are advancing rapidly, yet how degradation of the input signal affects their internal representations remains underexplored. We introduce a probing procedure that models how the internal representations of SE models behave under controlled input degradations. We apply it to the MUSE SE model by extracting its layer activations under controlled signal-to-noise ratio (SNR) and reverberation (C50) conditions. For each layer, we measure representational similarity to the clean-input reference using Centered Kernel Alignment (CKA) and regress it against the degradation level, yielding a compact per-layer profile of robustness versus adaptation. Encoder layers maintain noise-invariant representations while decoder layers adapt strongly, with sensitivity increasing monotonically within blocks and skip-connection boundaries marking the sharpest transitions. The same structure emerges under reverberation and is reproduced independently by MP-SENet and Demucs, two structurally distinct architectures, suggesting that this robustness-adaptation tradeoff is induced by the enhancement objective rather than by a particular model design. Together, these results characterize where SE models adapt to degradation. We also offer insight into how internal representations correlate with output-level performance metrics such as PESQ.
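The CKA-then-regress step described above can be illustrated with a minimal sketch. The abstract does not say which CKA variant or regression model is used, so the sketch assumes linear CKA and an ordinary least-squares line fit. The function names (linear_cka, cka_slope) are hypothetical, and random matrices stand in for real layer activations; none of this is taken from the study's implementation.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center each feature over the sample axis.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_slope(clean, degraded_by_level):
    """Fit CKA(clean, degraded) as a linear function of the degradation level.

    degraded_by_level maps a level (e.g. SNR in dB) to an activation matrix.
    Returns (slope, intercept, scores), with scores ordered by ascending level.
    """
    levels = sorted(degraded_by_level)
    scores = np.array([linear_cka(clean, degraded_by_level[lv]) for lv in levels])
    slope, intercept = np.polyfit(np.array(levels, dtype=float), scores, deg=1)
    return float(slope), float(intercept), scores


# Synthetic stand-in for one layer: activations for 200 frames, 16 channels.
# (Assumption: real use would replace these with activations hooked from the model.)
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 16))
snr_levels_db = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0]
degraded = {
    snr: clean + 10.0 ** (-snr / 20.0) * rng.standard_normal(clean.shape)
    for snr in snr_levels_db
}

slope, intercept, scores = cka_slope(clean, degraded)
print("CKA per SNR level:", np.round(scores, 3))
print("slope per dB:", slope)
```

Under this reading, a layer whose fitted slope is near zero keeps its representation close to the clean reference regardless of SNR (robust), whereas a large-magnitude slope marks a layer whose representation changes with the degradation level (adaptive). Repeating the fit for every layer would give one slope per layer, which is one plausible form of the per-layer profile the abstract refers to.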