News Score: Score the News, Sort the News, Rewrite the Headlines

Learning to Reason in 13 Parameters

Abstract: Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train t...
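The claim that conventional LoRA cannot scale below the model dimension follows from a simple parameter count. A rank-r adapter on a weight matrix of shape d_out x d_in trains two factors, B (d_out x r) and A (r x d_in), so even at rank 1 it has d_in + d_out trainable parameters. The sketch below only illustrates that count for standard LoRA; the function name and the 4096 dimension are illustrative assumptions, not taken from the paper, and it does not implement TinyLoRA.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a standard LoRA adapter for one weight matrix.

    The update is B @ A with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size, chosen only for illustration
    for r in (1, 4, 16):
        print(f"rank={r}: {lora_param_count(d, d, r)} trainable parameters")
```

Even the smallest setting, rank 1, scales with the matrix dimensions, which is the floor the abstract says TinyLoRA is designed to go below.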

Read more at arxiv.org
