{"ID":2833383,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03563","arxiv_id":"2512.03563","title":"State Space Models for Bioacoustics: A Comparative Evaluation with Transformers","abstract":"In this study, we evaluate the efficacy of the Mamba architecture bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train a BioMamba using self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and detection tasks. Compared to the state-of-the-art Transformer-based model (AVES), BioMamba achieves comparable performance while significantly reducing VRAM consumption. Our results demonstrate Mamba's potential as a computationally efficient alternative for real-world environmental monitoring.","short_abstract":"In this study, we evaluate the efficacy of the Mamba architecture bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train a BioMamba using self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and de...","url_abs":"https://arxiv.org/abs/2512.03563","url_pdf":"https://arxiv.org/pdf/2512.03563v2","authors":"[\"Chengyu Tang\",\"Sanjeev Baskiyar\"]","published":"2025-12-03T08:37:09Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}