News Score: Score the News, Sort the News, Rewrite the Headlines

Scaling Laws, Carefully

Scaling laws are one of the most critical empirical findings in deep learning. The observation is simple in form: the training loss $L$ decreases predictably as we scale up model size $N$, dataset size $D$, and compute $C$, following a power-law curve, which appears as a straight line on a log-log plot. We can view scaling laws as a framework for describing the relationship between compute, loss, model size and data; at its core, it is about how to allocate precious compute optimally between $N$...

Read more at lilianweng.github.io

© News Score  score the news, sort the news, rewrite the headlines