{"ID":2847749,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.00203","arxiv_id":"2511.00203","title":"Diffusion LLMs are Natural Adversaries for any LLM","abstract":"We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an \\emph{efficient, amortized inference task}. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models, including robustly trained and proprietary LLMs. Beyond adversarial prompting, our framework opens new directions for red teaming, automated prompt optimization, and leveraging emerging Flow- and Diffusion-based LLMs.","short_abstract":"We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an \\emph{efficient, amortized inference task}. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pai...","url_abs":"https://arxiv.org/abs/2511.00203","url_pdf":"https://arxiv.org/pdf/2511.00203v1","authors":"[\"David Lüdke\",\"Tom Wollschläger\",\"Paul Ungermann\",\"Stephan Günnemann\",\"Leo Schwinn\"]","published":"2025-10-31T19:04:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[\"Diffusion Model\",\"Large Language Model\"]","has_code":false}
