Q-learning is not yet scalable
Does RL scale?
Over the past few years,
we've seen that next-token prediction scales, denoising diffusion scales, contrastive learning scales,
and so on, all the way to the point where we can train models with billions of parameters
with a scalable objective that can eat up as much data as we can throw at it.
So what about reinforcement learning (RL)?
Does RL scale like these other objectives?
Apparently, it does.
In 2016, RL achieved superhuman-level performance in games like Go and C...
Read more at seohong.me