
MLPerf 4.0 training results show up to 80% performance improvement in AI

Innovations in machine learning and AI training continue to gain momentum, even as increasingly complex generative AI workloads come online.

Today MLCommons released the MLPerf 4.0 training benchmark results, which once again show record performance. The MLPerf training benchmark is a vendor-neutral standard that enjoys broad industry participation, and the suite measures the performance of complete AI training systems across a variety of workloads. Version 4.0 includes over 205 results from 17 organizations. The new update is the first release of MLPerf training results since MLPerf 3.1 training in November 2023.

MLPerf 4.0's training benchmarks include results for image generation using Stable Diffusion and Large Language Model (LLM) training for GPT-3. The suite also features a number of first-time results, including a new LoRA benchmark that fine-tunes the Llama 2 70B language model for document summarization using a parameter-efficient approach (a minimal sketch of what that looks like in practice follows below).
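For readers who want a concrete picture of what parameter-efficient LoRA fine-tuning involves, the snippet below is a minimal illustrative sketch using the Hugging Face peft library, not the MLPerf reference implementation; the model identifier, rank, alpha, dropout, and target-module choices are assumptions made for illustration only.

```python
# A minimal sketch (not the MLPerf reference implementation) of LoRA-style
# parameter-efficient fine-tuning with Hugging Face Transformers + peft.
# The rank, alpha, dropout, and target modules below are illustrative guesses.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-70b-hf"  # gated checkpoint; assumes access
# In practice a 70B model would be loaded sharded and/or quantized across GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

The point the benchmark exercises is that only the small injected adapter matrices are updated during training, which is what makes fine-tuning a 70B-parameter model tractable compared to a full fine-tune.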

As is often the case with MLPerf results, there is a significant improvement even compared to the results from just six months ago.

“Even if you look at it compared to the last cycle, some of our benchmarks have performed almost twice as well, particularly Stable Diffusion,” said David Kanter, founder and executive director of MLCommons, in a press conference. “So that’s pretty impressive in six months.”

The actual gain for Stable Diffusion training is 1.8x faster than in November 2023, while GPT-3 training was up to 1.2x faster.

AI training performance depends on more than just hardware

Many factors play a role in training an AI model.

While the hardware is important, the software and the network that connects the clusters are equally vital.

“In AI training in particular, we have access to many different levers that help us improve performance and efficiency,” Kanter said. “For training, most of these systems use multiple processors or accelerators, and how the work is divided and communicated is absolutely critical.”

Kanter added that vendors are not only benefiting from better silicon, but are also using better algorithms and better scaling to deliver more performance over time.

Nvidia continues to expand training on Hopper

The standout results in the MLPerf 4.0 training benchmarks largely belong to Nvidia.

Across the nine different workloads tested, Nvidia claims to have set new performance records on five of them. Perhaps most impressively, the new records were largely set using the same core hardware platforms Nvidia used a year ago, in June 2023.

In a press conference, David Salvator, AI director at Nvidia, explained that the Nvidia H100 Hopper architecture continues to provide value.

“In Nvidia’s history with deep learning, in each product generation we’ve typically gotten 2 to 2.5 times more performance out of an architecture through software innovations over the lifetime of that product,” Salvator said.

For the H100, Nvidia has used a number of techniques to improve performance for MLPerf 4.0 training. These include full-stack optimization, highly tuned FP8 kernels, an FP8-aware distributed optimizer, optimized cuDNN FlashAttention, improved overlap of math and communication execution, and intelligent GPU power allocation. A rough sketch of what FP8 execution looks like from a framework’s perspective follows below.
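As a loose illustration of one of those levers, the sketch below shows how FP8 execution is typically enabled from PyTorch via NVIDIA's Transformer Engine; it is a toy forward pass with assumed layer sizes and recipe settings, not Nvidia's MLPerf submission code.

```python
# Illustrative toy example of FP8 execution with NVIDIA Transformer Engine;
# layer sizes and recipe settings are assumptions, not Nvidia's MLPerf setup.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: tracks amax history to choose FP8 scale factors.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()  # drop-in replacement for nn.Linear
x = torch.randn(16, 4096, device="cuda")

# Inside the autocast context, supported GEMMs run in FP8 on Hopper-class GPUs.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```

The design idea is that matrix multiplies run in 8-bit floating point while accumulations and sensitive operations stay in higher precision, trading a small amount of numerical headroom for substantially higher throughput on Hopper hardware.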

Why MLPerf training benchmarks matter for enterprises

Beyond giving enterprises standardized benchmarks of training performance, what the actual numbers demonstrate is even more valuable.

While performance is continuously improving, Salvator stressed that it is also improving on the same hardware.

Salvator noted that the results are a quantitative demonstration of how Nvidia is able to create new value on top of existing architectures. As enterprises consider building new deployments, especially on-premises, they are essentially committing to a technology platform. What matters is that once an organization first deploys a technology, that technology can continue to deliver accumulating benefits for years afterward.

“The question of why performance is so important to us is simple: for businesses, it increases the return on investment,” he said.
