News Score: Score the News, Sort the News, Rewrite the Headlines

Combining NVIDIA DGX Spark + Apple Mac Studio for 4x Faster LLM Inference with EXO 1.0

We recently received early access to 2 NVIDIA DGX Spark™ units. NVIDIA calls it the world's smallest AI supercomputer. It has ~100 TFLOPs of FP16 performance with 128GB of CPU-GPU coherent memory at 273 GB/s. With EXO, we've been running LLMs on clusters of Apple Mac Studios with M3 Ultra chips. The Mac Studio has 512GB of unified memory at 819 GB/s, but its GPU only has ~26 TFLOPs of FP16 performance. The DGX Spark has about 4x the compute, while the Mac Studio has 3x the memory bandwidth. What if we combi...
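The "4x" and "3x" comparisons follow directly from the figures quoted above. A minimal sketch of that arithmetic, using only the approximate numbers from the text (the dictionary keys and the `ratio` helper are illustrative names, not part of EXO or any vendor API):

```python
# Approximate figures as quoted in the text; not independently measured.
SPECS = {
    "DGX Spark": {"fp16_tflops": 100.0, "mem_bw_gbs": 273.0},
    "Mac Studio (M3 Ultra)": {"fp16_tflops": 26.0, "mem_bw_gbs": 819.0},
}


def ratio(numerator: str, denominator: str, key: str) -> float:
    """Return how many times larger `key` is on one device than the other."""
    return SPECS[numerator][key] / SPECS[denominator][key]


# Compute advantage of the DGX Spark over the Mac Studio (FP16 TFLOPs).
compute_ratio = ratio("DGX Spark", "Mac Studio (M3 Ultra)", "fp16_tflops")

# Memory-bandwidth advantage of the Mac Studio over the DGX Spark (GB/s).
bandwidth_ratio = ratio("Mac Studio (M3 Ultra)", "DGX Spark", "mem_bw_gbs")

print(f"DGX Spark compute advantage:    {compute_ratio:.2f}x")
print(f"Mac Studio bandwidth advantage: {bandwidth_ratio:.2f}x")
```

With these inputs the compute ratio comes out slightly under 4 (100 / 26), which the text rounds to 4x, and the bandwidth ratio is 819 / 273 = 3.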

Read more at blog.exolabs.net
