Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
We pointed Claude Code at autoresearch and gave it access to 16 GPUs on a Kubernetes cluster. Over 8 hours it submitted ~910 experiments, found that scaling model width mattered more than any single hyperparameter, taught itself to use H200s for validation while screening ideas on H100s, and drove val_bpb from 1.003 down to 0.974, a 2.87% improvement over the baseline.

Beyond raw speedup, parallelism changed how the agent searched.
With one GPU, it’s stuck doing greedy hill-climbing: try one thing,...
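The contrast between greedy hill-climbing and parallel screening can be sketched in a few lines. This is a toy illustration, not the agent's actual code: `run_experiment` and its val_bpb scoring are hypothetical stand-ins for launching a real training job on one GPU.

```python
import concurrent.futures

def run_experiment(config):
    # Hypothetical stand-in for a training run on one GPU.
    # Toy scoring: wider models get a lower (better) val_bpb.
    val_bpb = round(1.003 - 0.03 * config["width"] / 1024, 4)
    return {"config": config, "val_bpb": val_bpb}

def screen_parallel(configs, max_workers=16):
    # Submit one experiment per available GPU slot and keep the
    # best-scoring config, instead of trying one idea at a time.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_experiment, configs))
    return min(results, key=lambda r: r["val_bpb"])

configs = [{"width": w} for w in (256, 512, 768, 1024)]
best = screen_parallel(configs)
```

With 16 workers, all four candidate widths run concurrently; a single-GPU agent would have to evaluate them sequentially, committing to each result before seeing the next.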
Read more at blog.skypilot.co