{"ID":2867335,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19469","arxiv_id":"2509.19469","title":"MusiCRS: Benchmarking Audio-Centric Conversational Recommendation","abstract":"Conversational recommendation has advanced rapidly with large language models (LLMs), yet music remains a uniquely challenging domain in which effective recommendations require reasoning over audio content beyond what text or metadata can capture. We present MusiCRS, the first benchmark for audio-centric conversational recommendation that links authentic user conversations from Reddit with corresponding tracks. MusiCRS includes 477 high-quality conversations spanning diverse genres (classical, hip-hop, electronic, metal, pop, indie, jazz), with 3,589 unique musical entities and audio grounding via YouTube links. MusiCRS supports evaluation under three input modality configurations: audio-only, query-only, and audio+query, allowing systematic comparison of audio-LLMs, retrieval models, and traditional approaches. Our experiments reveal that current systems struggle with cross-modal integration, with optimal performance frequently occurring in single-modality settings rather than multimodal configurations. This highlights fundamental limitations in cross-modal knowledge integration, as models excel at dialogue semantics but struggle when grounding abstract musical concepts in audio. To facilitate progress, we release the MusiCRS dataset (https://huggingface.co/datasets/rohan2810/MusiCRS), evaluation code (https://github.com/rohan2810/musiCRS), and comprehensive baselines.","short_abstract":"Conversational recommendation has advanced rapidly with large language models (LLMs), yet music remains a uniquely challenging domain in which effective recommendations require reasoning over audio content beyond what text or metadata can capture. We present MusiCRS, the first benchmark for audio-centric conversational...","url_abs":"https://arxiv.org/abs/2509.19469","url_pdf":"https://arxiv.org/pdf/2509.19469v2","authors":"[\"Rohan Surana\",\"Amit Namburi\",\"Gagan Mundada\",\"Abhay Lal\",\"Zachary Novack\",\"Julian McAuley\",\"Junda Wu\"]","published":"2025-09-23T18:24:07Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.MM\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609455,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867335,"paper_url":"https://arxiv.org/abs/2509.19469","paper_title":"MusiCRS: Benchmarking Audio-Centric Conversational Recommendation","repo_url":"https://github.com/rohan2810/musiCRS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
