{"ID":2857155,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.10265","arxiv_id":"2510.10265","title":"Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models","abstract":"Backdoor attacks are a significant threat to large language models (LLMs), often embedded via public checkpoints, yet existing defenses rely on impractical assumptions about trigger settings. To address this challenge, we propose \\ourmethod, a defense framework that requires no prior knowledge of trigger settings. \\ourmethod is based on the key observation that when deliberately injecting known backdoors into an already-compromised model, both existing unknown and newly injected backdoors aggregate in the representation space. \\ourmethod leverages this through a two-stage process: \\textbf{first}, aggregating backdoor representations by injecting known triggers, and \\textbf{then}, performing recovery fine-tuning to restore benign outputs. Extensive experiments across multiple LLM architectures demonstrate that: (I) \\ourmethod reduces the average Attack Success Rate to 4.41\\% across multiple benchmarks, outperforming existing baselines by 28.1\\%$\\sim$69.3\\%$\\uparrow$. (II) Clean accuracy and utility are preserved within 0.5\\% of the original model, ensuring negligible impact on legitimate tasks. (III) The defense generalizes across different types of backdoors, confirming its robustness in practical deployment scenarios.","short_abstract":"Backdoor attacks are a significant threat to large language models (LLMs), often embedded via public checkpoints, yet existing defenses rely on impractical assumptions about trigger settings. To address this challenge, we propose \\ourmethod, a defense framework that requires no prior knowledge of trigger settings. \\our...","url_abs":"https://arxiv.org/abs/2510.10265","url_pdf":"https://arxiv.org/pdf/2510.10265v2","authors":"[\"Liang Lin\",\"Miao Yu\",\"Moayad Aloqaily\",\"Zhenhong Zhou\",\"Kun Wang\",\"Linsey Pang\",\"Prakhar Mehrotra\",\"Qingsong Wen\"]","published":"2025-10-11T15:47:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
