ByteShape runs 30B Qwen3 AI model on Raspberry Pi 5 at 8 TPS, 94% quality using Shapelearn bitlength optimization; outperforms Unsloth, MagicQuant on speed-accuracy tradeoff

A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time

For this release, we optimize for what people actually experience when they run a model: fast, high-quality responses on a specific target device. We use Shapelearn, our bitlength learning method, to choose weight datatypes for Qwen3-30B-A3B-Instruct-2507 that maximize performance in terms of tokens per second (TPS) and output quality, with one practical constraint: the model must fit comfortably in the available memory. Once it fits, making the file smaller isn't a goal by itself. We only shrink...
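
To make the selection rule concrete, here is a minimal sketch of the idea described above: treat memory as a hard constraint, then rank what fits by speed and quality rather than by file size. This is not Shapelearn itself, which learns bitlengths; it only illustrates the final choice among already-built variants. The `Candidate` type, the `pick_candidate` function, the memory budget, the quality floor, and every number below are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical quantized variant of one model (illustrative only)."""
    name: str
    size_gb: float   # memory footprint on the target device
    tps: float       # measured tokens per second on the target device
    quality: float   # fraction of baseline quality retained, 0.0 to 1.0


def pick_candidate(
    candidates: Sequence[Candidate],
    memory_budget_gb: float,
    min_quality: float,
) -> Optional[Candidate]:
    """Return the fastest variant that fits in memory and meets the quality floor.

    Size is only a filter: among variants that fit, a smaller file earns
    no credit unless it is also faster or more accurate.
    """
    fitting = [
        c for c in candidates
        if c.size_gb <= memory_budget_gb and c.quality >= min_quality
    ]
    if not fitting:
        return None
    # Rank by speed first, then by quality to break ties.
    return max(fitting, key=lambda c: (c.tps, c.quality))


# Placeholder values chosen for illustration, not benchmark results.
candidates = [
    Candidate("variant-a", size_gb=18.0, tps=5.0, quality=0.98),
    Candidate("variant-b", size_gb=12.0, tps=8.0, quality=0.94),
    Candidate("variant-c", size_gb=9.0, tps=8.0, quality=0.90),
    Candidate("variant-d", size_gb=7.0, tps=6.5, quality=0.85),
]

best = pick_candidate(candidates, memory_budget_gb=14.0, min_quality=0.90)
print(best.name if best else "no variant fits")
```

In this toy data, `variant-c` is smaller than `variant-b` and equally fast, but it loses the tie on quality, which mirrors the point that a smaller file is not a goal once the model fits.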