A Year of Fast Apply — The Path to 10k Tokens per Second
A year ago today, we released our first Fast Apply model publicly. Since then,
we’ve learned a lot about how to fine-tune small, specialized models for
code-specific tasks.
Today, we’re open-sourcing what we’ve learned in training this series of models:
the dataset curation, training methods, and inference techniques that led to
Relace Apply 3, our best model yet, capable of running at 10k+ tokens per second
while maintaining state-of-the-art accuracy.
Error rate comparison on 500 randomly sampled ...
Read more at relace.ai