{"ID":2923686,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T12:56:18.750806265Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02212","arxiv_id":"2606.02212","title":"C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification","abstract":"Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision. Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation. Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.","short_abstract":"Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy...","url_abs":"https://arxiv.org/abs/2606.02212","url_pdf":"https://arxiv.org/pdf/2606.02212v1","authors":"[\"Ziqi Ma\",\"Mengyu Han\",\"Anteng Cai\",\"Zhanchong Liu\",\"Bowen Feng\",\"Hang Yu\",\"Sheng Hu\"]","published":"2026-06-01T13:11:14Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Transformer\",\"Generative Adversarial Network\",\"Variational Autoencoder\"]","has_code":false}