News Score: Score the News, Sort the News, Rewrite the Headlines

Supercomputer networking to accelerate large-scale AI training

Frontier model training depends on reliable supercomputer networks that can quickly move data between GPUs. To make this faster and more efficient, OpenAI has partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA to develop MRC (Multipath Reliable Connection): a novel protocol that improves GPU networking performance and resilience in large training clusters. We released MRC today through the Open Compute Project (OCP) to enable the broader industry to use it. With m...

Read more at openai.com
