Understanding Transformers Using A Minimal Example
Introduction
The internal mechanisms of Transformer large language models (LLMs),
particularly the flow of information through the layers and the
operation of the attention mechanism, can be challenging to follow
because of the vast number of values involved, which makes it hard
to form a mental model. This article aims to make these workings
tangible by visualizing a Transformer's internal state. Using a
minimal dataset and a deliberately simplified model, it is possible
to follow...
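As a minimal illustration of the attention mechanism mentioned above (a sketch in plain Python, not the article's own code), scaled dot-product attention over hand-sized vectors can be written so that every intermediate number is small enough to inspect directly:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key/value is a short list of floats, so the scores,
    weights, and output can all be checked by hand.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted mix of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# Tiny, hand-checkable case: the query aligns with the first key,
# so the first value vector dominates the output.
weights, out = attention([1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]],
                         [[10.0, 0.0], [0.0, 10.0]])
```

With inputs this small, the weights and output can be traced step by step, which is the same spirit of inspectability the article pursues with its minimal dataset.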
Read more at rti.github.io