Communication Efficient LLM Pre-training with SparseLoCo
Abstract: Communication-efficient distributed training algorithms have received considerable interest recently due to their benefits for training Large Language Models (LLMs) in bandwidth-constrained settings, such as across data centers and over the internet. Despite reducing communication frequency, these methods still typically require communicating a full copy of the model's gradients, resulting in a communication bottleneck even for cross-datacenter links. Further...
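The bottleneck the abstract describes comes from periodically exchanging a dense, model-sized pseudo-gradient between workers. A minimal sketch of one common way to shrink that per-round payload is top-k sparsification, shown below; this is an illustrative assumption, not necessarily the exact mechanism SparseLoCo uses, and the function names are hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): top-k sparsification of a
# locally accumulated pseudo-gradient so that only k values plus their indices
# travel over the slow link instead of the full dense tensor.
import math
import torch

def sparsify_topk(pseudo_grad: torch.Tensor, k: int):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    flat = pseudo_grad.flatten()
    _, idx = torch.topk(flat.abs(), k)   # positions of the largest-magnitude entries
    return idx, flat[idx]                # payload to communicate: k indices + k values

def densify(idx: torch.Tensor, vals: torch.Tensor, shape, dtype):
    """Reconstruct a dense tensor from the sparse payload on the receiving side."""
    flat = torch.zeros(math.prod(shape), dtype=dtype)
    flat[idx] = vals
    return flat.view(shape)

# Example: communicate only 1% of a (4096 x 4096) weight's pseudo-gradient.
g = torch.randn(4096, 4096)
k = int(0.01 * g.numel())
idx, vals = sparsify_topk(g, k)
g_recovered = densify(idx, vals, g.shape, g.dtype)
```

In this sketch the communicated volume scales with k rather than with the parameter count, which is the general idea behind cutting the per-round cost that the abstract identifies.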