NVIDIA GH200 Superchip Achieves Remarkable Results in MLPerf Inference v4.1

Rebeca Moen   Sep 26, 2024


In the latest round of MLPerf Inference benchmarks, the NVIDIA platform has demonstrated exceptional performance across various tests, according to the NVIDIA Technical Blog. A standout performer in these benchmarks was the NVIDIA GH200 Grace Hopper Superchip, which integrates an NVIDIA Grace CPU with an NVIDIA Hopper GPU using the high-bandwidth, low-latency NVIDIA NVLink-C2C interconnect.

GH200 Superchip's Architectural Innovation

The NVIDIA GH200 Grace Hopper Superchip represents a novel converged CPU and GPU architecture, combining the power-efficient Grace CPU with the high-performance Hopper GPU. This integration is facilitated by NVLink-C2C, which delivers 900 GB/s of bandwidth between CPU and GPU, roughly seven times the bandwidth of the PCIe Gen5 x16 links found in conventional servers. This architecture allows CPU and GPU threads to coherently access all system-allocated memory without explicit data transfers between the CPU and GPU, improving both efficiency and performance.
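To make the coherent-memory claim concrete, here is a minimal CUDA C++ sketch. It assumes a coherent platform such as GH200, where Address Translation Services over NVLink-C2C let the GPU dereference ordinary malloc'd system memory; the kernel and array sizes are illustrative, not taken from any benchmark submission.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Illustrative kernel: scales an array in place.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        // Plain system allocation -- no cudaMalloc, no cudaMallocManaged.
        float *data = static_cast<float *>(malloc(n * sizeof(float)));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;

        // On GH200, the GPU can dereference this pointer directly over
        // NVLink-C2C; no cudaMemcpy is needed. On platforms without such
        // CPU-GPU coherence, this launch would fault.
        scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
        cudaDeviceSynchronize();

        printf("data[0] = %.1f\n", data[0]);  // expect 2.0 on a coherent system
        free(data);
        return 0;
    }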

Performance in MLPerf Inference Benchmarks

The GH200 Superchip excelled in various generative AI benchmarks in MLPerf Inference v4.1. Notably, it delivered up to 1.4 times more performance per accelerator in demanding benchmarks such as Mixtral 8x7B and Llama 2 70B, compared to the H100 Tensor Core GPU. Furthermore, it outperformed the best two-socket, CPU-only submissions by up to 22 times in the GPT-J benchmark.

In MLPerf's latency-constrained server scenario, which models real-time, user-facing services, the GH200 maintained throughput within 5% of its offline results, a stark contrast to the 55% performance degradation observed in the best CPU-only submissions. This makes the GH200 a viable option for production environments requiring real-time AI inference.
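For readers who want to reproduce such comparisons from raw MLPerf submissions, the arithmetic is simple: normalize throughput by accelerator count, then compare server-scenario throughput against offline. The C++ sketch below uses made-up placeholder numbers, not actual v4.1 results.

    #include <cstdio>

    int main() {
        // Per-accelerator normalization: divide system throughput by
        // accelerator count. All figures here are placeholders.
        double throughput_a = 12000.0; int accels_a = 1;  // hypothetical system A
        double throughput_b = 8500.0;  int accels_b = 1;  // hypothetical system B
        double speedup = (throughput_a / accels_a) / (throughput_b / accels_b);
        printf("per-accelerator speedup: %.2fx\n", speedup);

        // Server (latency-constrained) vs. offline throughput retention.
        double offline = 12000.0, server = 11400.0;
        printf("server scenario retains %.1f%% of offline throughput\n",
               100.0 * server / offline);
        return 0;
    }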

GH200 NVL2: Enhanced Capabilities

The GH200 NVL2 builds on the GH200’s capabilities by linking two GH200 Superchips via NVLink within a single node. This configuration provides 8 petaflops of AI performance, 144 Arm Neoverse cores, and 960 GB of LPDDR5X memory. The two Hopper GPUs together offer 288 GB of HBM3e memory and up to 10 TB/s of aggregate memory bandwidth, making the platform well suited to high-performance applications such as large language models (LLMs), graph neural networks (GNNs), and high-performance computing (HPC).
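On an NVL2 node the two Hopper GPUs appear as two CUDA devices, so their memory capacity can be checked programmatically. A minimal CUDA C++ sketch follows; the ~144 GB-per-GPU figure in the comments is an expectation drawn from the spec above, not a measured value.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        printf("visible CUDA devices: %d\n", count);  // expect 2 on an NVL2 node

        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            // On GH200 NVL2, each Hopper GPU should report roughly 144 GB
            // of HBM3e (an assumption here, not a measurement).
            printf("GPU %d: %s, %.0f GB memory, %d SMs\n",
                   d, prop.name, prop.totalGlobalMem / 1e9,
                   prop.multiProcessorCount);
        }
        return 0;
    }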

Industry Adoption and Endorsements

Several industry leaders have adopted the GH200 architecture in their server designs. Hewlett Packard Enterprise (HPE) and Supermicro were among the companies that submitted results using GH200-based designs. Kenneth Leach, Principal AI Performance Engineer at HPE, credited the GH200 NVL2 design’s strong performance to the 144 GB of HBM3e memory per Superchip.

Oracle Cloud Infrastructure (OCI) also validated the GH200’s performance, with Sanjay Basu, Senior Director of Cloud Engineering at OCI, highlighting the architecture’s potential for AI inference and the forthcoming Grace Blackwell Superchips.

Conclusion

The NVIDIA GH200 Grace Hopper Superchip delivered standout results in MLPerf Inference v4.1, pairing strong performance with high efficiency. Its converged architecture and high-bandwidth interconnect make it a robust solution for enterprise AI applications, positioning it as a leading choice for organizations looking to deploy advanced AI workloads.
