
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

Abstract: Language models are increasingly capable, yet they still fail at the seemingly simple task of multi-digit multiplication. In this work, we study why, by reverse-engineering a model that successfully learns multiplication via implicit chain-of-thought, and report three findings: (1) Evidence of long-range structure: Logit attributions and linear probes indicate that the model encodes the necessary long-range dependencies for multi-digit multiplication. (2) Me...
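To make the abstract's notion of "long-range dependencies" concrete, here is a minimal sketch (not from the paper; the function name and digit convention are illustrative) of schoolbook multi-digit multiplication. Each output digit is a sum of partial products plus a carry that can propagate from every lower position, which is the kind of long-range structure the abstract says the model must encode.

```python
def schoolbook_multiply(a_digits, b_digits):
    """Multiply two numbers given as little-endian digit lists.

    Illustrative sketch: output digit k depends on partial products
    from many (i, j) pairs with i + j <= k, plus a carry that can
    propagate upward from every lower position.
    """
    n, m = len(a_digits), len(b_digits)
    out = [0] * (n + m)
    # Accumulate partial products: digit i of a times digit j of b
    # contributes to output position i + j.
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            out[i + j] += a * b
    # Carry propagation: position k is influenced by all positions < k.
    carry = 0
    for k in range(len(out)):
        total = out[k] + carry
        out[k] = total % 10
        carry = total // 10
    return out

# Example: 97 * 86 = 8342, digits stored least-significant first.
print(schoolbook_multiply([7, 9], [6, 8]))  # -> [2, 4, 3, 8]
```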

Read more at arxiv.org
