{"ID":2922139,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T16:37:57.843543731Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00801","arxiv_id":"2606.00801","title":"Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety","abstract":"Current approaches to LLM adversarial testing suffer from coverage gaps: manual red-teaming does not scale, LLM-as-attacker methods exhibit mode collapse, and gradient-based approaches produce uninterpretable gibberish. We introduce a quality-diversity evolutionary framework that operates at the semantic level, evolving interpretable attack strategies rather than token sequences. Using MAP-Elites, we maintain a diverse archive of attacks across behavioral dimensions (strategy type, encoding method, prompt length). In experiments across GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, and an open-weight coding model (Devstral-small-2), we discover distinct vulnerability profiles: GPT-4o-mini is vulnerable to hypothetical and multi-turn framing combined with ROT13 encoding (fitness 0.8), Gemini to direct attacks with ROT13 and multi-turn with Leetspeak (0.8), while Claude shows uniformly ambiguous responses across all strategies (max 0.4). The semantic representation produces interpretable attacks that reveal systematic, model-specific weaknesses, providing actionable insights for improving LLM safety and a reproducible baseline for evaluating future frontier models. Code and experiment artifacts are released at https://github.com/bassrehab/red-queen.","short_abstract":"Current approaches to LLM adversarial testing suffer from coverage gaps: manual red-teaming does not scale, LLM-as-attacker methods exhibit mode collapse, and gradient-based approaches produce uninterpretable gibberish. \nWe introduce a quality-diversity evolutionary framework that operates at the semantic level, evolvin...","url_abs":"https://arxiv.org/abs/2606.00801","url_pdf":"https://arxiv.org/pdf/2606.00801v1","authors":"[\"Subhadip Mitra\"]","published":"2026-05-30T16:40:24Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CL\",\"cs.ET\",\"cs.LG\",\"cs.NE\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":612643,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T02:42:49.606572591Z","DeletedAt":null,"paper_id":2922139,"paper_url":"https://arxiv.org/abs/2606.00801","paper_title":"Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety","repo_url":"https://github.com/bassrehab/red-queen","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
