{"ID":2872467,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.13333","arxiv_id":"2509.13333","title":"Evaluation Awareness Scales Predictably in Open-Weights Large Language Models","abstract":"Large language models (LLMs) can internally distinguish between evaluation and deployment contexts, a behaviour known as \\emph{evaluation awareness}. This undermines AI safety evaluations, as models may conceal dangerous capabilities during testing. Prior work demonstrated this in a single $70$B model, but the scaling relationship across model sizes remains unknown. We investigate evaluation awareness across $15$ models scaling from $0.27$B to $70$B parameters from four families using linear probing on steering vector activations. Our results reveal a clear power-law scaling: evaluation awareness increases predictably with model size. This scaling law enables forecasting deceptive behavior in future larger models and guides the design of scale-aware evaluation strategies for AI safety. A link to the implementation of this paper can be found at https://anonymous.4open.science/r/evaluation-awareness-scaling-laws/README.md.","short_abstract":"Large language models (LLMs) can internally distinguish between evaluation and deployment contexts, a behaviour known as \\emph{evaluation awareness}. This undermines AI safety evaluations, as models may conceal dangerous capabilities during testing. Prior work demonstrated this in a single $70$B model, but the scaling...","url_abs":"https://arxiv.org/abs/2509.13333","url_pdf":"https://arxiv.org/pdf/2509.13333v2","authors":"[\"Maheep Chaudhary\",\"Ian Su\",\"Nikhil Hooda\",\"Nishith Shankar\",\"Julia Tan\",\"Kevin Zhu\",\"Ryan Lagasse\",\"Vasu Sharma\",\"Ashwinee Panda\"]","published":"2025-09-10T06:36:38Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","project_urls":"[\"https://anonymous.4open.science/r/evaluation-awareness-scaling-laws/README.md\"]","has_code":false}
