News Score: Score the News, Sort the News, Rewrite the Headlines

Emergent Introspective Awareness in Large Language Models

Published October 29th, 2025

We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in ce...
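To make the idea of injecting a concept into a model's activations more concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a concept is represented as a single vector that is added to every position of one layer's activations, scaled by a strength parameter. The names `inject_concept`, `projection`, and `strength` are hypothetical and do not come from the paper, and the toy lists stand in for real model activations.

```python
def inject_concept(activations, concept_vector, strength=1.0):
    """Return a new list of activation rows with strength * concept_vector added to each row.

    Assumption: activations is a list of equal-length rows (one per token position)
    and concept_vector has the same length as each row.
    """
    if any(len(row) != len(concept_vector) for row in activations):
        raise ValueError("each activation row must match the concept vector length")
    return [
        [a + strength * c for a, c in zip(row, concept_vector)]
        for row in activations
    ]


def projection(row, concept_vector):
    """Scalar coefficient of the concept direction in a row (dot product / squared norm)."""
    dot = sum(a * c for a, c in zip(row, concept_vector))
    norm_sq = sum(c * c for c in concept_vector)
    if norm_sq == 0:
        raise ValueError("concept vector must be non-zero")
    return dot / norm_sq


# Toy example: two token positions, two hidden dimensions (hypothetical values).
toy_activations = [[1.0, 0.0], [0.0, 2.0]]
toy_concept = [0.0, 1.0]
steered = inject_concept(toy_activations, toy_concept, strength=2.0)
```

In an experiment of the kind the abstract describes, the steered activations would be fed forward through the rest of the model and the model's self-report compared with and without the injection. That step is not shown here because it depends on the model and tooling used.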

Read more at transformer-circuits.pub
