News Score: Score the News, Sort the News, Rewrite the Headlines

Language Models Need Sleep

Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in i...
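The consolidation loop described in the abstract can be illustrated with a toy sketch. The abstract is truncated before it states the actual update rule, so everything here is an assumption made for illustration: the class name `ToySleeper`, the methods `observe`, `sleep`, and `recall`, the learning rate `lr`, and in particular the delta-rule (error-correcting outer-product) update, which is a stand-in and not the paper's method. The sketch only shows the overall shape: accumulate key-value pairs, make N offline passes that write them into a persistent weight matrix, then clear the cache.

```python
# Toy illustration only. The delta-rule update is an ASSUMPTION standing in
# for the paper's (unstated here) fast-weight update; all names are hypothetical.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleeper:
    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # hypothetical step size for the stand-in update
        # Persistent "fast weights": survive after the cache is cleared.
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        # Stand-in for the key-value cache: a list of (key, value) pairs.
        self.kv_cache = []

    def observe(self, key, value):
        """Awake phase: append a new (key, value) pair to the cache."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        """Make n_passes offline passes over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                predicted = matvec(self.fast_weights, key)
                error = [v - p for v, p in zip(value, predicted)]
                # Delta rule: W += lr * outer(error, key)
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * error[i] * key[j]
        self.kv_cache.clear()

    def recall(self, key):
        """Read from the fast weights alone, without the cache."""
        return matvec(self.fast_weights, key)


if __name__ == "__main__":
    model = ToySleeper(dim=2)
    model.observe([1.0, 0.0], [0.0, 2.0])
    model.observe([0.0, 1.0], [3.0, 0.0])
    model.sleep(n_passes=20)
    print(len(model.kv_cache), model.recall([1.0, 0.0]))
```

In this toy setting, more passes bring the recalled values closer to the stored ones, which is one plausible reading of why the number of passes $N$ would matter. A real model would update learned fast-weight layers rather than a single associative matrix.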

Read more at arxiv.org
