{"ID":2856290,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.11288","arxiv_id":"2510.11288","title":"Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs","abstract":"Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (ICL). We therefore ask: does EM emerge in ICL? We find that it does: across four model families (Gemini, Kimi-K2, Grok, and Qwen), narrow in-context examples cause models to produce misaligned responses to benign, unrelated queries. With 16 in-context examples, EM rates range from 1% to 24% depending on model and domain, appearing with as few as 2 examples. Neither larger model scale nor explicit reasoning provides reliable protection, and larger models are typically even more susceptible. Next, we formulate and test a hypothesis, which explains in-context EM as conflict between safety objectives and context-following behavior. Consistent with this, instructing models to prioritize safety reduces EM while prioritizing context-following increases it. These findings establish ICL as a previously underappreciated vector for emergent misalignment that resists simple scaling-based solutions.","short_abstract":"Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (ICL). We therefore ask: does EM emerge in ICL? We find that it does: acr...","url_abs":"https://arxiv.org/abs/2510.11288","url_pdf":"https://arxiv.org/pdf/2510.11288v4","authors":"[\"Nikita Afonin\",\"Nikita Andriianov\",\"Vahagn Hovhannisyan\",\"Nikhil Bageshpura\",\"Kyle Liu\",\"Kevin Zhu\",\"Sunishchal Dev\",\"Ashwinee Panda\",\"Oleg Rogov\",\"Elena Tutubalina\",\"Alexander Panchenko\",\"Mikhail Seleznyov\"]","published":"2025-10-13T11:23:56Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
