AI Models Demonstrate Evaluation Awareness: Study Reveals Frontier Language Models Can Detect When Being Tested, Raising Concerns for Benchmark Reliability

Large Language Models Often Know When They Are Being Evaluated

View PDF HTML (experimental) Abstract:If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations, leading to less reliable benchmarks for deployment and governance decisions. We investigate whether frontier language models can accurately classify transcripts based on whether they originate from evaluations or real-world deployment, a capability we call evaluation...