News Score: Score the News, Sort the News, Rewrite the Headlines

Keeping 20,000 GPUs healthy

Engineering • December 28, 2025 • 8 minute read

Modal runs a globally distributed, autoscaling GPU worker pool by sourcing compute from all cloud giants: AWS, GCP, Azure, OCI. We’ve scaled the worker pool to well over 20,000 concurrent GPUs, and launched over four million cloud instances in the last couple years. At this scale, you see almost every GPU reliability problem there is. Today, we’re sharing our GPU reliability system as both a demonstration of our commitment to Modal customers and as...

Read more at modal.com
