NVIDIA Blackwell Achieves New Milestone in MLPerf Training Performance
NVIDIA's Blackwell platform has posted strong results across a range of workloads in the MLPerf Training 4.1 industry benchmarks, according to NVIDIA's blog. The platform delivered up to 2.2x more performance per GPU than the prior-generation Hopper platform on large language model (LLM) benchmarks, including Llama 2 70B fine-tuning and GPT-3 175B pretraining.
Leaps and Bounds With Blackwell
The Blackwell architecture's first submission to the MLCommons Consortium highlights its role in advancing generative AI training performance. Key to the result are new kernels that make more efficient use of Tensor Cores. Kernels are optimized math routines, such as matrix multiplies, that underlie many deep learning algorithms; Tensor Cores are the GPU hardware units that execute them. Together with significantly larger and faster high-bandwidth memory, this optimization allows Blackwell to achieve higher compute throughput per GPU.
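To make the kernel idea concrete, the operation that Tensor Cores accelerate is a matrix multiply-accumulate, D = A x B + C, applied to small tiles. The following is a minimal pure-Python sketch of that arithmetic only. The function name `matmul_accumulate` is hypothetical, and the code says nothing about how NVIDIA's actual kernels are written or scheduled.

```python
def matmul_accumulate(a, b, c):
    """Return D = A x B + C for matrices given as lists of rows.

    Illustrative only: real GPU kernels run this on hardware tiles,
    usually in reduced precision, not in Python loops.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("C must have the shape of A x B")

    d = [row[:] for row in c]  # start from the accumulator C
    for i in range(rows):
        for j in range(cols):
            acc = d[i][j]
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # multiply, then accumulate
            d[i][j] = acc
    return d


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(matmul_accumulate(a, b, c))
```

A faster kernel computes the same result; the gain comes from keeping the hardware units busy, not from changing the math.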
Blackwell also ran the GPT-3 175B benchmark on just 64 GPUs while maintaining high per-GPU performance. The same benchmark required 256 GPUs on the Hopper platform, four times as many.
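The GPU counts above reduce to simple arithmetic, sketched here. The function name `gpu_count_ratio` is hypothetical, and the calculation covers only the number of GPUs. The article gives no time-to-train figures for these two runs, so nothing about speed can be derived from it.

```python
def gpu_count_ratio(baseline_gpus: int, new_gpus: int) -> float:
    """Return how many times more GPUs the baseline platform used."""
    if baseline_gpus <= 0 or new_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return baseline_gpus / new_gpus


if __name__ == "__main__":
    hopper_gpus = 256     # GPT-3 175B run on Hopper, per the article
    blackwell_gpus = 64   # GPT-3 175B run on Blackwell, per the article
    ratio = gpu_count_ratio(hopper_gpus, blackwell_gpus)
    print(f"Hopper used {ratio:.0f}x as many GPUs as Blackwell")
```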
Relentless Optimization
NVIDIA continues to enhance its platforms through ongoing software development, improving performance and features for a wide range of frameworks and applications. In the latest MLPerf Training submissions, Hopper delivered 1.3x higher per-GPU training performance on GPT-3 175B than it did when the benchmark was introduced.
Additionally, large-scale results were achieved using 11,616 Hopper GPUs, connected via NVIDIA NVLink and NVSwitch for high-bandwidth communication, alongside NVIDIA Quantum-2 InfiniBand networking. This setup has more than tripled scale and performance on the GPT-3 175B benchmark compared to the previous year.
Partnering Up
NVIDIA's partners, including major system makers and cloud service providers such as ASUSTek, Azure, Cisco, Dell, Fujitsu, and others, also submitted impressive results to MLPerf. As a founding member of MLCommons, NVIDIA emphasizes the importance of industry-standard benchmarks in AI computing, which give companies the data they need to make informed platform investment decisions.
Through continuous advancements and optimizations, NVIDIA's accelerated computing platforms keep raising AI training performance, offering greater returns on investment for partners and customers alike.