What Nvidia’s new MLPerf AI benchmark results really mean

Sep 8, 2022 | Technology


Nvidia released results today for the new MLPerf industry-standard artificial intelligence (AI) benchmarks on its AI-targeted processors. While the results look impressive, it is important to note that some of the comparisons Nvidia makes with other systems are not really apples-to-apples. For instance, the Qualcomm systems run at a much smaller power footprint than the H100 and are targeted at market segments similar to the A100's, where the test comparisons are much more equitable. 

Nvidia tested its top-of-the-line H100 system, based on its latest Hopper architecture; its now mid-range A100 system, targeted at edge compute; and its smaller Jetson system, targeted at smaller individual and/or edge workloads. This is the first H100 submission, and it shows up to 4.5 times higher performance than the A100. As the chart below indicates, Nvidia has some impressive results for the top-of-the-line H100 platform.

[Chart: Nvidia H100 MLPerf Inference results. Image source: Nvidia.]

Inference workloads for AI inference

Nvidia used the MLPerf Inference v2.1 benchmark to assess its capabilities in various workload scenarios for AI inference. Inference is different from machine learning (ML) training, where models are created and systems “learn.” 

Inference is used to run the learned models on a series of data points and obtain results. Based on conversations with companies and vendors, we at J. Gold Associates, LLC, estimate that the AI inference market is many times larger in volume than the ML training market, so showing good inference benchmarks is critical to success.
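
To make that distinction concrete, the sketch below is a hypothetical illustration (not Nvidia's or MLPerf's code): it trains a tiny linear model with gradient descent, then runs inference by applying the frozen, learned weights to new data points.

```python
import numpy as np

# --- Training (ML): fit model weights from labeled examples ---
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))          # 1,000 labeled examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])           # "ground truth" used to synthesize labels
y_train = X_train @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)                               # learned weights start at zero
for _ in range(500):                          # simple gradient descent loop
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.1 * grad

# --- Inference: apply the frozen, learned model to new data points ---
X_new = rng.normal(size=(5, 3))               # unseen inputs arriving in production
predictions = X_new @ w                       # no weight updates happen here
print("learned weights:", np.round(w, 2))
print("predictions:", np.round(predictions, 2))
```

The training loop is the expensive, one-time step; inference is the lightweight step that runs over and over on new inputs, which is why its aggregate market volume can dwarf training.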

Why Nvidia would run MLPerf

MLPerf is an industry-standard benchmark series that has broad inputs from a variety of companies and models a variety of workloads, including natural language processing, speech recognition, image classification, medical imaging and object detection. 

The benchmark is useful because it works across machines, from high-end data center and cloud systems down to smaller-scale edge computing systems, and offers a consistent benchmark across various vendors’ products, even though not all testers run all of the subtests in the benchmark. 

It can also run offline, single-stream or multistream test scenarios that chain a series of AI functions to simulate a real-world, complete workflow pipeline (e.g., speech recognition, natural language processing, search and recommendations, text-to-speech, etc.), as the sketch below illustrates. 
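
As a rough illustration of those scenario types (a simplified sketch, not the actual MLPerf LoadGen harness), the toy code below times a dummy model in an offline batch pass, which measures throughput, versus a single-stream pass that issues one query at a time and measures per-query latency.

```python
import time

def dummy_model(batch):
    """Stand-in for one inference step (e.g., speech recognition or NLP)."""
    time.sleep(0.001 * len(batch))            # pretend latency scales with batch size
    return [x * 2 for x in batch]

queries = list(range(256))

# Offline scenario: all queries are available up front; report total throughput.
start = time.perf_counter()
dummy_model(queries)
offline_s = time.perf_counter() - start
print(f"offline: {len(queries) / offline_s:.0f} samples/sec")

# Single-stream scenario: one query at a time; report per-query latency.
latencies = []
for q in queries:
    start = time.perf_counter()
    dummy_model([q])
    latencies.append(time.perf_counter() - start)
p90 = sorted(latencies)[int(0.9 * len(latencies))]
print(f"single-stream p90 latency: {p90 * 1000:.2f} ms")
```

The point of the distinction is that a chip can look strong on offline throughput while being less compelling on single-stream latency, which is why the benchmark reports the scenarios separately.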

While MLPerf is accepted broadly, many players feel that running …
