Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
We pointed Claude Code at autoresearch and gave it access to 16 GPUs on a Kubernetes cluster. Over 8 hours it submitted ~910 experiments, found that scaling model width mattered more than any single hyperparameter, taught itself to use H200s for validation while screening ideas on H100s, and drove val_bpb from 1.003 down to 0.974, a 2.87% improvement over the baseline.

Beyond raw speedup, parallelism changed how the agent searched.
With one GPU, it’s stuck doing greedy hill-climbing: try one thing,...
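The contrast between greedy hill-climbing and parallel screening can be sketched in a few lines. This is a toy illustration, not the agent's actual code: `run_experiment` and its val_bpb scoring are hypothetical stand-ins for launching a real training job on one GPU.

```python
import concurrent.futures

def run_experiment(config):
    # Hypothetical stand-in for a training run on one GPU.
    # Toy scoring: wider models get a lower (better) val_bpb.
    val_bpb = round(1.003 - 0.03 * config["width"] / 1024, 4)
    return {"config": config, "val_bpb": val_bpb}

def screen_parallel(configs, max_workers=16):
    # Submit one experiment per available GPU slot and keep the
    # best-scoring config, instead of trying one idea at a time.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_experiment, configs))
    return min(results, key=lambda r: r["val_bpb"])

configs = [{"width": w} for w in (256, 512, 768, 1024)]
best = screen_parallel(configs)
```

With 16 workers, all four candidate widths run concurrently; a single-GPU agent would have to evaluate them sequentially, committing to each result before seeing the next.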
Read more at blog.skypilot.co