GitHub - cactus-compute/needle: 26M-parameter function-calling model that runs on incredibly small devices
We distilled Gemini 3.1 into a 26M-parameter "Simple Attention Network" that you can even fine-tune locally on your Mac or PC.
In production, Needle runs on Cactus at 6000 tokens/sec for prefill and 1200 tokens/sec for decode.
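To put those throughput figures in perspective, here is a rough back-of-the-envelope latency estimate. It assumes both figures are in tokens per second and that prefill and decode run strictly one after the other. The function name and the example token counts are illustrative and do not come from the Needle repository.

```python
PREFILL_TOKS_PER_SEC = 6000  # prefill figure quoted above
DECODE_TOKS_PER_SEC = 1200   # decode figure quoted above (assumed to be tokens/sec)


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency: prefill time plus decode time, in milliseconds."""
    prefill_s = prompt_tokens / PREFILL_TOKS_PER_SEC
    decode_s = output_tokens / DECODE_TOKS_PER_SEC
    return 1000.0 * (prefill_s + decode_s)


# Hypothetical request: a 600-token prompt (tool schemas plus user query)
# that produces a 60-token tool call.
print(f"{estimate_latency_ms(600, 60):.0f} ms")  # → 150 ms
```

Real latency also depends on model load time, sampling overhead, and the device, none of which this estimate covers.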
The weights and the dataset generation are both fully open on Cactus-Compute/needle.
d=512, 8H/4KV, BPE=8192
┌──────────────┐
│  Tool Call   │
└──────┬───────┘
 ┌─────┴─────┐
 │  Softmax  │
 └─────┬─────┘
 ┌─────┴─────┐
 │ Linear (T)│ ← tied
 └─────┬─────┘
 ┌─────┴─────┐
 │ ZCRMSNorm │
 └─────┬─────┘...
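
The visible end of the diagram (ZCRMSNorm, then a tied linear layer, then softmax) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the repository's implementation. It reads `d=512` as the model width, `BPE=8192` as the vocabulary size, `8H/4KV` as 8 query heads sharing 4 key/value heads, "tied" as reusing the token-embedding matrix for the output projection, and `ZCRMSNorm` as a zero-centered RMSNorm whose gain is stored as an offset from 1. All function names and the random weights are hypothetical.

```python
import numpy as np

# Values read from "d=512, 8H/4KV, BPE=8192" (interpretation is an assumption).
D_MODEL, N_HEADS, N_KV_HEADS, VOCAB = 512, 8, 4, 8192

HEAD_DIM = D_MODEL // N_HEADS            # 64, assuming heads split d evenly
QUERIES_PER_KV = N_HEADS // N_KV_HEADS   # 2 query heads per key/value head
EMBED_PARAMS = VOCAB * D_MODEL           # counted once because the head is tied


def zc_rms_norm(x, gamma, eps=1e-6):
    """Assumed ZCRMSNorm: RMS-normalize, then scale by (1 + gamma)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * (1.0 + gamma)


def output_head(hidden, embedding, gamma):
    """ZCRMSNorm -> tied linear (embedding transpose) -> softmax over the vocabulary."""
    logits = zc_rms_norm(hidden, gamma) @ embedding.T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
embedding = rng.normal(scale=0.02, size=(VOCAB, D_MODEL))  # random stand-in weights
gamma = np.zeros(D_MODEL)                                  # zero gamma means unit gain
hidden = rng.normal(size=(3, D_MODEL))                     # 3 hypothetical positions

probs = output_head(hidden, embedding, gamma)
print(probs.shape, HEAD_DIM, QUERIES_PER_KV, EMBED_PARAMS)
# → (3, 8192) 64 2 4194304
```

Under these assumptions, tying the output projection to the embedding avoids a second 8192 × 512 matrix of about 4.2M parameters, which is a meaningful saving for a model with 26M parameters in total.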