News Score: Score the News, Sort the News, Rewrite the Headlines

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude

In brief: Xiaomi and inference partner TileRT have broken 1,000 tokens per second on a 1-trillion-parameter model, a first at that scale, using a standard 8-GPU commodity node rather than custom chips. The speed comes from FP4 quantization on the model's expert layers and DFlash speculative decoding, which proposes a full block of tokens in one pass instead of one at a time. A limited API trial runs from June 9 through June 23, priced at 3× standard MiMo rates for roughly 10× the generation speed. Most peopl...
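To show why proposing a block of tokens can speed up generation, here is a toy sketch of generic greedy draft-then-verify speculative decoding. It is not DFlash or TileRT's implementation, whose internals the article does not describe. The helper names (`draft_block`, `verify_block`, `speculative_generate`) and the counting "models" are hypothetical stand-ins. A cheap drafter proposes `k` tokens, the large target model checks them, and the longest agreeing prefix is kept plus one token from the target. Because a real target scores the whole block in a single forward pass, several tokens can be accepted for roughly the cost of one large-model step.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def draft_block(draft: NextToken, prefix: List[int], k: int) -> List[int]:
    """Propose k tokens. A block drafter would emit all k in one pass;
    this toy loop only produces the same kind of output."""
    block: List[int] = []
    for _ in range(k):
        block.append(draft(prefix + block))
    return block


def verify_block(target: NextToken, prefix: List[int], block: List[int]) -> List[int]:
    """Greedy verification: keep drafted tokens while they match the target's
    own choice, then append one token from the target. In a real system all
    k + 1 target predictions come from one batched forward pass."""
    accepted: List[int] = []
    for tok in block:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # target's token replaces the first mismatch
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # bonus token: whole block accepted
    return accepted


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Return (generated tokens, number of verify rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        block = draft_block(draft, out, k)
        out.extend(verify_block(target, out, block))  # always adds >= 1 token
        rounds += 1
    return out[len(prompt):len(prompt) + max_new], rounds


def toy_target(seq: List[int]) -> int:
    # Toy target model: counts up modulo 10.
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Toy drafter: agrees with the target except when the next token would be 7.
    nxt = (seq[-1] + 1) % 10
    return 0 if nxt == 7 else nxt


if __name__ == "__main__":
    tokens, rounds = speculative_generate(toy_target, toy_draft, [0], max_new=12, k=4)
    print(tokens, rounds)  # prints: [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 3
```

Twelve tokens come out of three verification rounds instead of twelve sequential target steps, and with greedy verification the output is identical to what the target alone would produce. The real speedup depends on how often the drafter's guesses are accepted.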

Read more at decrypt.co
