{"ID":2839950,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14301","arxiv_id":"2511.14301","title":"SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models","abstract":"Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior whenever the trigger appears at inference time. Recent work has emphasized stealthy attacks that stress-test data-curation defenses using stylized artifacts or token-level perturbations as triggers, but this focus leaves a more practically relevant threat model underexplored: backdoors tied to naturally occurring semantic concepts. We introduce SteganoBackdoor, an optimization-based framework that constructs SteganoPoisons, steganographic poisoned training examples in which a backdoor payload is distributed across a fluent sentence while exhibiting no representational overlap with the inference-time semantic trigger. Across diverse model architectures, SteganoBackdoor achieves high attack success under constrained poisoning budgets and remains effective under conservative data-level filtering, highlighting a blind spot in existing data-curation defenses.","short_abstract":"Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior whenever the trigger appears at inference time. Recent work has emphasized stealthy attacks that stress-test data-curat...","url_abs":"https://arxiv.org/abs/2511.14301","url_pdf":"https://arxiv.org/pdf/2511.14301v3","authors":"[\"Eric Xue\",\"Ruiyi Zhang\",\"Pengtao Xie\"]","published":"2025-11-18T09:56:16Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
