{"ID":2869169,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14651","arxiv_id":"2509.14651","title":"MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models","abstract":"As large language models (LLMs) become widely adopted, ensuring their alignment with human values is crucial to prevent jailbreaks where adversaries manipulate models to produce harmful content. While most defenses target single-turn attacks, real-world usage often involves multi-turn dialogues, exposing models to attacks that exploit conversational context to bypass safety measures. We introduce MUSE, a comprehensive framework tackling multi-turn jailbreaks from both attack and defense angles. For attacks, we propose MUSE-A, a method that uses frame semantics and heuristic tree search to explore diverse semantic trajectories. For defense, we present MUSE-D, a fine-grained safety alignment approach that intervenes early in dialogues to reduce vulnerabilities. Extensive experiments on various models show that MUSE effectively identifies and mitigates multi-turn vulnerabilities. Code is available at https://github.com/yansiyu02/MUSE.","short_abstract":"As large language models (LLMs) become widely adopted, ensuring their alignment with human values is crucial to prevent jailbreaks where adversaries manipulate models to produce harmful content. While most defenses target single-turn attacks, real-world usage often involves multi-turn dialogues, exposing models to atta...","url_abs":"https://arxiv.org/abs/2509.14651","url_pdf":"https://arxiv.org/pdf/2509.14651v1","authors":"[\"Siyu Yan\",\"Long Zeng\",\"Xuecheng Wu\",\"Chengcheng Han\",\"Kongcheng Zhang\",\"Chong Peng\",\"Xuezhi Cao\",\"Xunliang Cai\",\"Chenjuan Guo\"]","published":"2025-09-18T06:12:27Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609654,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2869169,"paper_url":"https://arxiv.org/abs/2509.14651","paper_title":"MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models","repo_url":"https://github.com/yansiyu02/MUSE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
