{"ID":2864334,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23882","arxiv_id":"2509.23882","title":"Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B","abstract":"OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize an extensive security evaluation of GPT-OSS-20B that probes the model's behavior under different adversarial conditions. Using the Jailbreak Oracle (JO) [1], a systematic LLM evaluation tool, the study uncovers several failure modes including quant fever, reasoning blackholes, Schrodinger's compliance, reasoning procedure mirage, and chain-oriented prompting. Experiments demonstrate how these behaviors can be exploited on the GPT-OSS-20B model, leading to severe consequences.","short_abstract":"OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize an extensive security evaluation of GPT-OSS-20B that probes the model's behavior under different adversarial conditions. Using the Jailbreak Oracle (JO) [1], a systematic...","url_abs":"https://arxiv.org/abs/2509.23882","url_pdf":"https://arxiv.org/pdf/2509.23882v2","authors":"[\"Shuyi Lin\",\"Tian Lu\",\"Zikai Wang\",\"Bo Wen\",\"Yibo Zhao\",\"Cheng Tan\"]","published":"2025-09-28T13:44:37Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
