A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time
For this release, we optimize for what people actually experience when they run a model:
fast, high-quality responses on a specific target device.
We use Shapelearn, our bitlength learning method, to choose weight datatypes for
Qwen3-30B-A3B-Instruct-2507 that maximize performance in terms of
tokens per second (TPS) and output quality, with one practical constraint: the model
must fit comfortably in the available memory. Once it fits, making the file smaller
isn't a goal by itself. We only shrink...
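
As a rough illustration of the selection rule described above (fit in memory first, then judge by TPS and quality rather than by file size), here is a minimal Python sketch. It is not Shapelearn itself, whose internals are not described here. The `Candidate` fields, the `headroom_gb` margin, the use of a Pareto front to compare TPS against quality, and every number in `CANDIDATES` are assumptions made up for this example, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical choice of weight datatypes for the model."""
    name: str        # made-up label
    size_gb: float   # memory footprint of the weights
    tps: float       # tokens per second on the target device
    quality: float   # quality score, higher is better


def fits(c: Candidate, budget_gb: float, headroom_gb: float) -> bool:
    # "Fits comfortably": leave some headroom for everything besides weights.
    return c.size_gb + headroom_gb <= budget_gb


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    # Keep candidates that no other candidate beats on both TPS and quality.
    front = []
    for c in cands:
        dominated = any(
            o.tps >= c.tps and o.quality >= c.quality
            and (o.tps > c.tps or o.quality > c.quality)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front


def select(cands: list[Candidate], budget_gb: float,
           headroom_gb: float = 1.0) -> list[Candidate]:
    # Memory is a constraint, not an objective: size is only used to filter.
    feasible = [c for c in cands if fits(c, budget_gb, headroom_gb)]
    return pareto_front(feasible)


# Illustrative values only (not real benchmark results).
CANDIDATES = [
    Candidate("a", size_gb=18.0, tps=4.0, quality=0.97),  # does not fit in 16 GB
    Candidate("b", size_gb=12.0, tps=7.0, quality=0.94),
    Candidate("c", size_gb=10.0, tps=8.0, quality=0.90),
    Candidate("d", size_gb=8.0, tps=7.5, quality=0.85),   # smallest, but slower and worse than c
]

if __name__ == "__main__":
    for c in select(CANDIDATES, budget_gb=16.0):
        print(c.name, c.size_gb, c.tps, c.quality)
```

In this toy data the smallest candidate, `d`, is dropped even though it fits, because `c` is both faster and higher quality. That is the sense in which a smaller file is not a goal by itself once the model fits.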
Read more at byteshape.com