Jet-Nemotron: New AI Model Outperforms Larger Rivals, Boosts Speed 53x Using Novel Architecture Search

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

Abstract: We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation throughput. Jet-Nemotron is developed using Post Neural Architecture Search (PostNAS), a novel neural architecture exploration pipeline that enables efficient model design. Unlike prior approaches, PostNAS begins with a pre-trained full-attention model and freezes its MLP...
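
The one concrete step the abstract states before it is cut off is that PostNAS starts from a pre-trained full-attention model and freezes its MLP components. The sketch below illustrates only that partition of parameters into frozen and trainable sets. It is not the authors' implementation: the `Param` class, the `freeze_mlp` helper, the `.mlp.` name marker, and the parameter names are all hypothetical, chosen so that the example runs without any deep learning framework.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a framework tensor; only the trainable flag matters here."""
    name: str
    requires_grad: bool = True


def freeze_mlp(params, mlp_marker=".mlp."):
    """Mark every parameter whose name contains `mlp_marker` as frozen.

    Returns the names of the parameters that remain trainable.
    The marker is an assumed naming convention, not one taken from the paper.
    """
    trainable = []
    for p in params:
        if mlp_marker in p.name:
            p.requires_grad = False  # frozen: excluded from further updates
        else:
            trainable.append(p.name)
    return trainable


# Hypothetical parameter names for a two-layer transformer.
params = [
    Param("layers.0.attn.q_proj.weight"),
    Param("layers.0.mlp.up_proj.weight"),
    Param("layers.1.attn.q_proj.weight"),
    Param("layers.1.mlp.up_proj.weight"),
]

trainable = freeze_mlp(params)
print(trainable)
```

In a real framework the same idea would usually be expressed by disabling gradients on the selected modules, so that only the remaining blocks are updated during the search.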