Writing an LLM from scratch, part 22 -- finally training our LLM!
This post wraps up my notes on chapter 5 of Sebastian Raschka's book
"Build a Large Language Model (from Scratch)".
Understanding cross-entropy loss and
perplexity was the hard part for
me in this chapter -- the remaining 28 pages were more a case of plugging bits together and
running the code to see what happens.
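The core relationship between those two quantities is simple: perplexity is just the exponential of the cross-entropy loss. A minimal sketch (plain Python, not the book's PyTorch code; the probabilities are made up for illustration):

```python
import math

def cross_entropy(target_probs):
    """Average negative log-probability the model assigned to each
    correct next token, over a sequence of predictions."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

def perplexity(target_probs):
    """Perplexity is the exponential of the cross-entropy loss --
    roughly, the effective number of tokens the model is 'choosing
    between' at each step."""
    return math.exp(cross_entropy(target_probs))

# Hypothetical probabilities a model assigned to the correct tokens:
probs = [0.5, 0.25, 0.125]
loss = cross_entropy(probs)   # -> ln(4) ≈ 1.386
ppl = perplexity(probs)       # -> 4.0
```

A lower loss means the model assigned higher probability to the right tokens, and the perplexity drops accordingly.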
The shortness of this post makes it feel almost like a damp squib. After writing so much
in the last 22 posts, there's really not all that much to say -- but th...
Read more at gilesthomas.com