Researchers train an AI language model to reason by training only 13 parameters with the TinyLoRA method. Using reinforcement learning, the model achieves 91% accuracy on a math benchmark with 1000x fewer parameters than conventional approaches.

Learning to Reason in 13 Parameters

Abstract: Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train t...
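To make the abstract's remark about conventional LoRA concrete, the sketch below counts the trainable parameters a standard LoRA adapter adds to a single weight matrix. It illustrates only the well-known LoRA factorization (an update B @ A with B of shape d_out x r and A of shape r x d_in); it does not implement TinyLoRA, whose parameterization is not described in this excerpt. The function name `lora_param_count` and the layer width 4096 are assumptions chosen for illustration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters standard LoRA adds to one d_out x d_in weight matrix.

    LoRA learns an update B @ A, where B has shape (d_out, rank) and
    A has shape (rank, d_in), so the count is rank * (d_in + d_out).
    """
    if rank < 1:
        raise ValueError("LoRA rank must be a positive integer")
    return rank * (d_in + d_out)


# Hypothetical layer width, chosen only for illustration.
d_model = 4096

# Even at the smallest rank, the adapter size is tied to the layer width.
rank_one = lora_param_count(d_model, d_model, rank=1)
print(rank_one)  # 8192
```

Because the rank must be at least 1, the count for one square layer cannot drop below twice the layer width, which is why an adapter of 13 parameters needs a different parameterization.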