News Score: Score the News, Sort the News, Rewrite the Headlines

Large Language Models Often Know When They Are Being Evaluated

Abstract: If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations, leading to less reliable benchmarks for deployment and governance decisions. We investigate whether frontier language models can accurately classify transcripts based on whether they originate from evaluations or real-world deployment, a capability we call evaluation...

Read more at arxiv.org
