Researchers develop a sleep-like consolidation mechanism that helps AI language models handle long tasks by converting recent context into fast weights during offline processing, improving performance on complex reasoning tasks.

Language Models Need Sleep

Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in i...
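
The consolidation loop described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the abstract is truncated before it states the update rule, so the sketch substitutes a classic delta-rule associative memory as a stand-in. The class name `SleepingMemory`, the learning rate `lr`, and the `observe`/`read`/`sleep` methods are all hypothetical names chosen for illustration. The only parts taken from the abstract are the overall shape: accumulate recent context in a cache, run $N$ offline passes over it to update persistent fast weights, then clear the cache.

```python
# Toy illustration only. The delta-rule update is an ASSUMED stand-in;
# the source abstract does not specify how the fast weights are updated.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class SleepingMemory:
    """Hypothetical fast-weight memory with a sleep-like consolidation step."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # assumed step size for the delta rule
        # Persistent fast weights: a dim x dim matrix, initially zero.
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        # Stand-in for the key-value cache holding recent context.
        self.kv_cache = []

    def observe(self, key, value):
        """Awake phase: store a (key, value) pair in the cache."""
        self.kv_cache.append((list(key), list(value)))

    def read(self, key):
        """Retrieve from the fast weights alone (computes W @ key)."""
        return [dot(row, key) for row in self.fast_weights]

    def sleep(self, n_passes):
        """Run n_passes offline passes over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                prediction = self.read(key)
                error = [v - p for v, p in zip(value, prediction)]
                # Delta rule: W += lr * outer(error, key)
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * error[i] * key[j]
        self.kv_cache.clear()


if __name__ == "__main__":
    memory = SleepingMemory(dim=3)
    memory.observe([1.0, 0.0, 0.0], [1.0, 2.0, 0.0])
    memory.observe([0.0, 1.0, 0.0], [0.0, 0.0, 3.0])
    memory.sleep(n_passes=4)
    # The cache is now empty; recall relies on the fast weights only.
    print(memory.read([1.0, 0.0, 0.0]))
```

In this toy setting, with orthonormal keys and `lr=0.5`, each pass halves the remaining recall error, so more sleep passes give closer recall after the cache is gone. That behaviour is a property of the assumed delta rule, not a claim about the paper's results.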