Anthropic's 2025 research shows that LLMs perform multi-step reasoning through traceable concepts like "Texas" and "Austin". They are not black boxes but interpretable systems with identifiable thought processes.

LLMs are not the Black Box you were promised

On the Biology of a Large Language Model (Anthropic, 2025)

LLMs are not the "black box" you were promised. Mechanistic interpretability, the practice of peering into a neural network to reverse engineer its inner workings, has made major strides. Anthropic's On the Biology of a Large Language Model (2025) is a landmark in that effort. What follows is a summary of their progress and some related thoughts.

What is an LLM actually "thinking"?

How can we understand what an LLM is "thinking"? It's clearly very val...