
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons

Abstract: We present a novel non-attention-based architecture for large language models (LLMs) that efficiently handles very long context windows, on the order of hundreds of thousands to potentially millions of tokens. Unlike traditional Transformer designs, which suffer from quadratic memory and computation overhead due to the nature of the self-attention mechanism, our model avoids token-to-token attention entirely. Instead, it combines the following complementary ...
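The abstract is cut off before it describes the architecture itself, so the one technical point that can be illustrated here is the quadratic cost it cites. The sketch below is not the paper's method; it is a minimal NumPy rendering of standard single-head self-attention, with illustrative names and shapes, showing where the O(n^2) term comes from: the score matrix alone has shape (n, n).

    # Minimal sketch (illustrative, not the paper's architecture) of why
    # standard self-attention is quadratic in sequence length n: the
    # score matrix alone has shape (n, n).
    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Single-head self-attention over n token embeddings of width d."""
        q = x @ w_q                                 # (n, d)
        k = x @ w_k                                 # (n, d)
        v = x @ w_v                                 # (n, d)
        scores = q @ k.T / np.sqrt(q.shape[-1])     # (n, n): the quadratic term
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ v                          # (n, d)

    n, d = 4096, 64
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n, d))
    w = [rng.standard_normal((d, d)) for _ in range(3)]
    out = self_attention(x, *w)

At n = 100,000 tokens the (n, n) float32 score matrix alone would occupy roughly 40 GB, which is the barrier a non-attention design of the kind the abstract describes aims to avoid.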

Read more at arxiv.org
