MLCommons is out today with its MLPerf 4.0 benchmarks for inference, once more demonstrating the relentless pace of software and hardware improvements.
As generative AI continues to evolve and gain acceptance, there is a clear need for a vendor-neutral set of performance benchmarks, which MLCommons provides with the MLPerf benchmark suite. There are several MLPerf benchmarks, with training and inference being among the most useful. The new MLPerf 4.0 inference results are the first update to the inference benchmarks since the MLPerf 3.1 results were released in September 2023.
Needless to say, a lot has happened in the AI world in the last six months. Major hardware vendors, including Nvidia and Intel, have been busy improving both hardware and software to further optimize inference, and the MLPerf 4.0 inference results show significant improvements for both companies' technologies.
The MLPerf inference benchmark itself has also changed. MLPerf 3.1 included large language models (LLMs) via the GPT-J 6B (billion) parameter model performing text summarization. The new MLPerf 4.0 benchmark tests the popular open model Llama 2, with 70 billion parameters, on question answering (Q&A). MLPerf 4.0 also includes a Stable Diffusion gen AI image-generation benchmark for the first time.
“MLPerf is absolutely something of an industry benchmark for improving the speed, efficiency and accuracy of AI,” said David Kanter, founder and CEO of MLCommons, in a press conference.
Why AI benchmarks are vital
MLCommons' latest benchmark round contains more than 8,500 performance results, testing a wide range of combinations and permutations of hardware, software and AI inference use cases. Kanter emphasized that the MLPerf benchmarking process serves a real purpose.
“To remind people of the principle behind benchmarks, the real goal is to establish good metrics for AI performance,” he said. “The point is that once we can measure these things, we can start to improve them.”
Another goal of MLCommons is to help bring the entire industry together. The benchmarks are all run with the same data sets and configuration parameters across different hardware and software. The results for a given test are visible to all submitters, so questions from other submitters can be raised and answered.
Ultimately, the standardized approach to measuring AI performance is about empowering companies to make informed decisions.
“This helps inform buyers as they make decisions and understand how systems, whether on-premises systems, cloud systems or embedded systems, perform under relevant workloads,” Kanter said. “If you want to buy a system to run large language model inference, you can use benchmarks to guide you on what those systems should look like.”
Nvidia triples AI inference performance on the same hardware
Once again, Nvidia dominates the MLPerf benchmarks with a number of impressive results.
While new hardware is expected to deliver better performance, Nvidia is also able to squeeze more performance out of its existing hardware. Thanks to its open-source TensorRT-LLM inference technology, Nvidia was able to nearly triple inference performance for text summarization with the GPT-J LLM on its H100 Hopper GPU.
In a briefing with press and analysts, Dave Salvator, director of accelerated computing products at Nvidia, emphasized that the performance increase came in just six months.
“We were able to triple the performance that we see, and we’re very, very pleased with this result,” said Salvator. “Our engineering team continues to do great work finding ways to squeeze more performance out of the Hopper architecture.”
Nvidia announced its next-generation Blackwell GPU, the successor to the Hopper architecture, at GTC last week. In response to a question from VentureBeat, Salvator said he doesn't know exactly when Blackwell-based GPUs will be benchmarked for MLPerf, but he hopes it will happen as soon as possible.
Even before Blackwell is benchmarked, the MLPerf 4.0 results mark the debut of H200 GPU results, which build on the H100's inference capabilities. H200 results are up to 45% faster than the H100 when evaluated with Llama 2 for inference.
Intel reminds the industry that CPUs still matter for inference
Intel is a very active participant in the MLPerf 4.0 benchmarks, with both its Habana AI accelerator and its Xeon CPU technologies.
With Gaudi, Intel's raw performance results lag those of the Nvidia H100, although the company claims it offers better value for the money. Perhaps even more interesting are the impressive gains coming from the 5th Gen Intel Xeon processor.
In a briefing with press and analysts, Ronak Shah, AI product director for Xeon at Intel, explained that the 5th Gen Intel Xeon was up to 1.9 times faster than the prior generation.
“We recognize that many enterprise customers deploying their AI solutions will do so in a mixed general-purpose and AI environment,” Shah said. “That’s why we developed CPUs with our AMX engine, combining strong general-purpose capabilities with strong AI capabilities.”