At a time when AI applications are booming, the ability to process and extract useful insights from big data is increasingly important, especially for enterprises wishing to retain their competitive edge. Machine learning analyses rely on enormous datasets for model training and inference. By leveraging these technologies, enterprises can enhance customer experiences and streamline processes, freeing personnel from mundane tasks so they can focus on developing innovative solutions. This year, MLCommons, a San Francisco-based open AI consortium, announced new results for its industry-standard MLPerf Inference v4.0 benchmark suite, which measures machine learning (ML) system performance across vision, speech, language, and commerce workloads. Because the AI domain is changing rapidly, two new model tests were added for inference in the v4.0 release of the MLPerf Data Center Benchmark Suite: the Llama 2 70B model was chosen to represent "larger" LLMs, with 70 billion parameters, while Stable Diffusion XL, with 2.6 billion parameters, was selected to represent text-to-image generative AI models.
As adoption of generative AI grows, so do its benchmarks, and running them requires a different class of hardware. Quanta Cloud Technology (QCT), a global data center solution provider enabling diverse AI workloads, has once again appeared on the MLPerf inference list in the latest results released by MLCommons. For this round of MLPerf Inference v4.0, QCT submitted results to the data center closed division:
- The QCT QuantaGrid S74G-2U with the NVIDIA Grace Hopper™ Superchip was a highlight among the systems listed. Up to 624GB of coherent memory, including 144GB of fast HBM3e, shared between the Grace CPU and Hopper GPU over the NVLink-C2C interconnect can accelerate memory-intensive AI inference. As one of the very few Grace Hopper system submissions, QCT achieved outstanding performance across multiple AI tasks in the data center categories.
- The QuantaGrid D54U-3U is an acceleration server designed for AI. Supporting dual 5th Gen Intel® Xeon® Scalable processors, this 3U system accommodates four dual-width accelerator cards or up to eight single-width accelerator cards, providing a comprehensive and flexible architecture optimized for various AI applications and large language models. QCT validated results for one system with four NVIDIA H100 PCIe GPUs and another with four NVIDIA L40S PCIe cards. This is the first appearance of 5th Gen Intel® Xeon® Scalable CPUs in the MLPerf inference benchmark, and they show clear performance gains over the 4th Gen Xeon® systems from the 2023 MLPerf results.
- Another Intel-based server listed was the QCT QuantaGrid D54X-1U with dual Intel® Xeon® Scalable processors, submitted in CPU-only inference scenarios. QCT's CPU-only configuration demonstrated strong performance on general-purpose AI workloads by leveraging the Intel® AMX instruction set.
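As a practical aside, on a Linux host the AMX capability mentioned above can be confirmed by inspecting the CPU feature flags the kernel reports in /proc/cpuinfo, where AMX support appears as the `amx_tile`, `amx_bf16`, and `amx_int8` flags. A minimal sketch (assuming a Linux system; the helper function and names are illustrative, not part of any QCT or Intel tooling):

```python
# Sketch: detect Intel AMX support from /proc/cpuinfo feature flags.
# The Linux kernel reports AMX as "amx_tile", "amx_bf16", and "amx_int8"
# in the "flags" line; the file path assumes a Linux host.

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def amx_features(flags_line: str) -> set:
    """Return the AMX-related flags present in a cpuinfo 'flags' line."""
    return AMX_FLAGS & set(flags_line.split())

def host_supports_amx(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """True if the host CPU reports all three AMX feature flags."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return amx_features(line) == AMX_FLAGS
    except OSError:
        pass  # e.g. non-Linux host: no /proc/cpuinfo
    return False

if __name__ == "__main__":
    print("AMX supported:", host_supports_amx())
```

Frameworks such as PyTorch and Intel's oneDNN use these instructions automatically on supporting CPUs, which is what enables the CPU-only inference results described above.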
QCT will continue to deliver comprehensive hardware systems and solutions while sharing its MLPerf results with the MLCommons community and its customers. These measurements allow QCT to benchmark its systems against industry peers, deliver best-in-class solutions, and contribute to the advancement of MLPerf inference and training benchmarks.
To view the results for MLPerf Inference v4.0, visit: https://mlcommons.org/benchmarks/inference-datacenter/