CMU Study: Diffusion Models Outperform Autoregressive Models in Data-Limited Regimes; Compute-Data Gap Predicted by 2028

Diffusion Beats Autoregressive in Data-Constrained Settings

TLDR: If you are compute-constrained, use autoregressive models; if you are data-constrained, use diffusion models.

Motivation

Progress in AI over the past decade has largely been driven by scaling compute and data. The recipe from GPT-1 to GPT-5 has appeared straightforward: train a larger model on more data, and the result is a more capable system.

[Figure: Scaling plot from the Chinchilla paper]

Yet a central question remains: will this recipe continue to hold from GPT-6 to GPT-N? Many analysts and researc...