{"ID":2844903,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.05171","arxiv_id":"2511.05171","title":"Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models","abstract":"Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domain-specific fine-tuning yields strong performance on bioacoustic benchmarks, we observe that it also introduces trade-offs in instruction-following flexibility. For instance, NatureLM achieves high accuracy when prompted for either the common or scientific name individually, but its accuracy drops significantly when both are requested in a single prompt. We address this by applying a simple model merging strategy that interpolates NatureLM with its base language model, recovering instruction-following capabilities with minimal loss of domain expertise. Finally, we show that the merged model exhibits markedly stronger zero-shot generalization, achieving over a 200% relative improvement and setting a new state-of-the-art in closed-set zero-shot classification of unseen species.","short_abstract":"Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domain-specific fine-tuning yields strong performance on bioacoustic benchmarks, we observe that it also introduces trade-offs in instr...","url_abs":"https://arxiv.org/abs/2511.05171","url_pdf":"https://arxiv.org/pdf/2511.05171v2","authors":"[\"Davide Marincione\",\"Donato Crisostomi\",\"Roberto Dessi\",\"Emanuele Rodolà\",\"Emanuele Rossi\"]","published":"2025-11-07T11:40:46Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.SD\"]","methods":"[\"Language Model\"]","has_code":false}
