Intuitions for Transformer Circuits
In a previous post on language modeling, I implemented a GPT-style transformer. Lately I’ve been learning mechanistic interpretability to go deeper and understand why the transformer works on a mathematical level. This post is a brain dump of what I’ve learned so far after reading A Mathematical Framework for Transformer Circuits (hereafter “Framework”) and working through the Intro to Mech Interp section on ARENA. My goal is to describe my current intuition for the paper, especially parts I was co...
Read more at connorjdavis.com