News Score: Score the News, Sort the News, Rewrite the Headlines

Diffusion Beats Autoregressive in Data-Constrained Settings

TLDR: If you are compute-constrained, use autoregressive models; if you are data-constrained, use diffusion models.

Motivation

Progress in AI over the past decade has largely been driven by scaling compute and data. The recipe from GPT-1 to GPT-5 has appeared straightforward: train a larger model on more data, and the result is a more capable system.

[Figure: Scaling plot from Chinchilla paper]

Yet a central question remains: will this recipe continue to hold from GPT-6 to GPT-N? Many analysts and researc...

Read more at blog.ml.cmu.edu
