MLPerf, developed by MLCommons, an open engineering consortium of AI leaders from academia, research labs, and industry, provides unbiased evaluations of training and inference performance for hardware, software, and services. For MLPerf Training v3.0, QCT, itself a member of MLCommons, submitted two systems in the closed division, showcasing its capabilities in AI training. (Shown below: server, CPU, GPU)
- QCT D54Q-2U; 2 x Intel Xeon Gold 6430 processors; 2 x NVIDIA H100 PCIe-80GB
- QCT D74H-7U; 2 x Intel Xeon Platinum 8490H processors; 8 x NVIDIA H100 SXM5-80GB
In the areas of Vision, Language, and Commerce, QCT’s submissions included benchmarks in Image Classification, Object Detection, Natural Language Processing (NLP), Speech Recognition, and Recommendation, achieving the specified quality targets (see chart below) with its QuantaGrid D54Q-2U and QuantaGrid D74H-7U systems.
Each benchmark measures the wall-clock time required to train a model on the specified dataset to achieve the specified quality target.
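The measurement can be thought of as a simple time-to-train loop: start a clock, train, periodically evaluate, and stop once the quality target is reached. The sketch below is a minimal illustration in Python, not the MLPerf reference harness; `model`, `train_loader`, `eval_fn`, and `quality_target` are placeholder names standing in for whatever benchmark workload is being run.

```python
import time

def time_to_train(model, train_loader, eval_fn, quality_target, max_epochs=100):
    """Illustrative time-to-train loop: train until the evaluation metric
    reaches the quality target and report the elapsed wall-clock time."""
    start = time.perf_counter()
    for epoch in range(max_epochs):
        for batch in train_loader:
            model.train_step(batch)      # placeholder training step
        metric = eval_fn(model)          # e.g. Top-1 accuracy, mAP, or word accuracy
        if metric >= quality_target:     # quality target reached: stop the clock
            return time.perf_counter() - start, epoch + 1
    raise RuntimeError("quality target not reached within max_epochs")
```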
The QuantaGrid D54Q-2U, powered by 4th Gen Intel Xeon Scalable processors, delivers scalability with flexible expansion options, including PCIe 5.0 slots, up to two double-width accelerators, up to 16 TB of DDR5 memory, and 26 drive bays, serving as a compact AI system for computer vision and language processing scenarios. In this round, the QuantaGrid D54Q-2U was configured with two NVIDIA H100 PCIe-80GB accelerator cards and achieved outstanding performance with PyTorch and MXNet.
The QuantaGrid D74H-7U is an 8-way GPU server equipped with the NVIDIA HGX H100 8-GPU SXM5 module; it supports two 4th Gen Intel Xeon Scalable (Sapphire Rapids) sockets, each with 16 DIMMs, and up to ten OCP NIC 3.0 slots. This design makes it ideal for training on massive datasets and AI models such as natural language processing and large language models (LLMs). With its innovative hardware design and software optimization, the QuantaGrid D74H-7U achieved excellent benchmark results across the ML frameworks and primary ML hardware libraries used.
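For context, an 8-GPU node of this kind is typically driven with data-parallel training, where each GPU runs one process and gradients are synchronized via NCCL. The sketch below uses PyTorch DistributedDataParallel purely as an illustration under that assumption; it is not QCT’s submission code, and the small linear model and random data stand in for a real workload.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; launched e.g. with `torchrun --nproc_per_node=8 train.py`
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(10):                                  # stand-in training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        loss.backward()                                     # gradients all-reduced across GPUs
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```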
As a member of MLCommons, QCT will continue to provide comprehensive hardware systems, solutions, and services to encourage innovation as the MLPerf training and inference benchmarks continue to evolve, with new tests held at regular intervals to represent the current state of AI.
To view the QCT submission results, please visit: https://mlcommons.org/en/training-normal-30/