Peak Training: Blackwell Delivers Next-Level MLPerf Training Performance

Generative AI applications that use text, computer code, protein chains, summaries, video and even 3D graphics require data-center-scale accelerated computing to efficiently train the large language models (LLMs) that power them.

In MLPerf Training 4.1 industry benchmarks, the NVIDIA Blackwell platform delivered impressive results on workloads across all tests — and up to 2.2x more performance per GPU on LLM benchmarks, including Llama 2 70B fine-tuning and GPT-3 175B pretraining.

In addition, NVIDIA’s submissions on the NVIDIA Hopper platform continued to hold at-scale records on all benchmarks, including a submission with 11,616 Hopper GPUs on the GPT-3 175B benchmark.

Leaps and Bounds With Blackwell

The first Blackwell training submission to the MLCommons Consortium — which creates standardized, unbiased and rigorously peer-reviewed testing for industry participants — highlights how the architecture is advancing generative AI training performance.

For instance, the architecture includes new kernels that make more efficient use of Tensor Cores. Kernels are optimized, purpose-built math operations like matrix-multiplies that are at the heart of many deep learning algorithms.
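To make the idea concrete, below is a minimal, purely illustrative sketch of a low-precision matrix multiply in PyTorch. On recent NVIDIA GPUs, library routines like this are dispatched to Tensor Core kernels for bf16/fp16/fp8 inputs; this is an assumption-laden example, not the optimized kernels referenced above.

```python
# Illustrative sketch only: a plain PyTorch matrix multiply that, on recent
# NVIDIA GPUs, is routed by cuBLAS to Tensor Core kernels when low-precision
# dtypes such as bf16 are used. Not NVIDIA's MLPerf submission code.
import torch

def tensor_core_matmul(m=4096, n=4096, k=4096, dtype=torch.bfloat16):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(m, k, device=device, dtype=dtype)
    b = torch.randn(k, n, device=device, dtype=dtype)
    # torch.matmul dispatches to vendor libraries; on Hopper/Blackwell-class
    # GPUs these use Tensor Cores for low-precision inputs.
    return a @ b

if __name__ == "__main__":
    out = tensor_core_matmul()
    print(out.shape, out.dtype)
```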

Blackwell’s higher per-GPU compute throughput and significantly larger and faster high-bandwidth memory allow it to run the GPT-3 175B benchmark on fewer GPUs while achieving excellent per-GPU performance.

Taking advantage of larger, higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were able to run the GPT-3 LLM benchmark without compromising per-GPU performance. The same benchmark run on Hopper required 256 GPUs — a 4x reduction in GPU count.

The Blackwell training results follow an earlier submission to MLPerf Inference 4.1, where Blackwell delivered up to 4x more LLM inference performance versus the Hopper generation. Taking advantage of the Blackwell architecture’s FP4 precision, along with the NVIDIA QUASAR Quantization System, the submission revealed powerful performance while meeting the benchmark’s accuracy requirements.
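The QUASAR Quantization System itself isn't detailed here, so the sketch below only illustrates the general idea behind low-precision quantization: mapping high-precision values onto a small 4-bit grid with a per-block scale, then dequantizing for compute. It uses an integer-style 4-bit grid as a conceptual stand-in; actual FP4 is a floating-point format and NVIDIA's calibration pipeline is different.

```python
# Conceptual sketch of block-wise 4-bit quantization. NOT the NVIDIA QUASAR
# Quantization System and not FP4 itself; an integer 4-bit grid is used here
# purely to show how a per-block scale preserves accuracy at low precision.
import numpy as np

def quantize_dequantize_int4(x: np.ndarray, block_size: int = 32) -> np.ndarray:
    # Assumes x.size is divisible by block_size for simplicity.
    x = x.reshape(-1, block_size)
    # Per-block scale so the largest magnitude maps into the 4-bit range [-8, 7].
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(x / scale), -8, 7)   # 4-bit signed levels
    return (q * scale).reshape(-1)            # dequantized approximation

if __name__ == "__main__":
    w = np.random.randn(1024).astype(np.float32)
    w_hat = quantize_dequantize_int4(w)
    print("mean abs error:", np.abs(w - w_hat).mean())
```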

Relentless Optimization

NVIDIA platforms undergo continuous software development, racking up performance and feature improvements in training and inference for a wide variety of frameworks, models and applications.

In this round of MLPerf training submissions, Hopper delivered a 1.3x improvement on GPT-3 175B per-GPU training performance since the introduction of the benchmark.

NVIDIA also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with NVIDIA NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and NVIDIA Quantum-2 InfiniBand networking.
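For readers curious what multi-GPU scale-out looks like in code, here is a minimal, hypothetical PyTorch DistributedDataParallel sketch. It uses NCCL, which rides on NVLink/NVSwitch and InfiniBand where available; it is a toy illustration of the data-parallel pattern, not NVIDIA's MLPerf submission code.

```python
# Minimal illustrative DistributedDataParallel setup.
# Launch with: torchrun --nproc_per_node=8 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL handles GPU-to-GPU communication over NVLink/NVSwitch and InfiniBand.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # toy loop; gradients are all-reduced across GPUs
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```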

NVIDIA Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26% using the same number of Hopper GPUs, reflecting continued software enhancements.

NVIDIA’s ongoing work on optimizing its accelerated computing platforms enables continued improvements in MLPerf test results — driving performance up in containerized software, bringing more powerful computing to partners and customers on existing platforms and delivering more return on their platform investment.

Partnering Up

NVIDIA partners, including system makers and cloud service providers such as ASUSTek, Azure, Cisco, Dell, Fujitsu, Giga Computing, Lambda Labs, Lenovo, Oracle Cloud, Quanta Cloud Technology and Supermicro, also submitted impressive results to MLPerf in this latest round.

A founding member of MLCommons, NVIDIA sees the role of industry-standard benchmarks and benchmarking best practices in AI computing as vital. With access to peer-reviewed, streamlined comparisons of AI and HPC platforms, companies can keep pace with the latest AI computing innovations and access crucial data that can help guide important platform investment decisions.

