In the rapidly evolving landscape of artificial intelligence (AI), the demand for high-performance computing (HPC) is skyrocketing. Quanta Cloud Technology (QCT), a global leader in data center solutions, is at the forefront of this revolution, consistently enabling HPC and AI workloads with unparalleled efficiency.
A Leap in AI Benchmarking: MLPerf Training v4.0
The latest MLPerf Training v4.0 results are a testament to QCT’s dedication to innovation. QCT’s AI cluster, powered by NVIDIA H100 GPUs, is listed among the benchmark leaders, demonstrating strong performance across a range of AI tasks.
QCT’s submissions in the closed division of MLPerf Training v4.0 included two powerhouse systems: the QuantaGrid D54U-3U and the QuantaGrid D74H-7U. These systems were rigorously tested across various areas, including vision, language, commerce, marketing, art, gaming, and research.
The QuantaGrid D74H-7U: A Beacon of AI Training
The QuantaGrid D74H-7U, equipped with dual Intel® Xeon® processors and an eight-way NVIDIA HGX H100 SXM5 GPU baseboard, is built for compute-intensive AI training. Its design supports non-blocking GPUDirect RDMA and GPUDirect Storage, and its hardware design and software optimization together deliver top-tier performance.
| Organization | System Name | Host Processor Model Name | Host Processors | Accelerator Model Name | Total Accelerators | Accelerators Per Node |
| --- | --- | --- | --- | --- | --- | --- |
| QCT | D74H-7U | Intel® Xeon® Platinum 8480+ | 2 | NVIDIA H100-SXM5-80GB | 8 | 8 |
Fig. 1. QuantaGrid D74H-7U Cluster Set-up
Fig. 2. QuantaGrid D74H-7U
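To verify that an HGX H100 node exposes the NVLink fabric this class of system relies on, the GPU link state can be queried through NVML. The sketch below uses the pynvml bindings and is purely illustrative; it assumes an installed NVIDIA driver and is not part of QCT’s submission.

```python
# Minimal sketch: count active NVLink links per GPU via NVML (pynvml).
# Assumes an NVIDIA driver and the pynvml package; illustrative only,
# not part of QCT's MLPerf submission.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        active = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link):
                    active += 1
            except pynvml.NVMLError:
                break  # no more links on this GPU
        print(f"GPU {i} ({name}): {active} active NVLink links")
finally:
    pynvml.nvmlShutdown()
```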
The QuantaGrid D54U-3U: Versatility Meets Performance
The QuantaGrid D54U-3U’s flexible architecture accommodates a wide range of AI/HPC applications: it supports four dual-width or eight single-width accelerators, along with dual Intel® Xeon® processors and 32 DIMM slots. Configured with four NVIDIA H100 PCIe 80GB accelerators joined by NVLink bridge adapters, it demonstrated outstanding performance in the latest MLPerf benchmarks.
| Organization | System Name | Host Processor Model Name | Host Processors | Accelerator Model Name | Total Accelerators | Accelerators Per Node |
| --- | --- | --- | --- | --- | --- | --- |
| QCT | D54U-3U | Intel® Xeon® Platinum 8470 | 2 | NVIDIA H100-PCIe-80GB | 4 | 4 |
Fig. 3. QuantaGrid D54U-3U Cluster Set-up
Fig. 4. QuantaGrid D54U-3U
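On a PCIe-based system such as this, the NVLink bridge pairs determine which GPUs can reach each other without crossing the PCIe fabric. A quick way to see the resulting peer-to-peer topology is a small PyTorch check; this is a sketch assuming a CUDA build of PyTorch, not QCT’s benchmark code.

```python
# Minimal sketch: report which GPU pairs support direct peer-to-peer access,
# e.g. H100 PCIe cards joined by NVLink bridges. Assumes a CUDA build of
# PyTorch; illustrative only.
import torch

n = torch.cuda.device_count()
for i in range(n):
    peers = [j for j in range(n)
             if j != i and torch.cuda.can_device_access_peer(i, j)]
    print(f"GPU {i} has direct peer access to: {peers}")
```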
QCT’s Commitment to Transparency and Excellence
QCT’s commitment goes beyond providing state-of-the-art hardware. By openly sharing MLPerf results, QCT maintains transparency and fosters trust with both academic and industrial users. The company’s dedication to excellence is evident in its continuous efforts to push the boundaries of what’s possible in AI and HPC.
As AI continues to shape our world, QCT’s results in MLPerf Training v4.0 reaffirm its position as a leader in data center solutions, driving innovation and paving the way for a future where AI’s potential is fully realized.
View the full MLPerf Training v4.0 results: https://mlcommons.org/benchmarks/training/
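The published results can also be explored programmatically. The sketch below assumes a CSV export of the results table; the file name and column names are illustrative assumptions, not an official MLCommons schema.

```python
# Minimal sketch: filter a CSV export of the MLPerf Training v4.0 results
# for QCT submissions. The file name and column names are assumptions,
# not an official MLCommons schema.
import pandas as pd

df = pd.read_csv("mlperf_training_v4.0_results.csv")
qct = df[df["Submitter"] == "QCT"]
print(qct[["System", "Benchmark", "Time to Train (min)"]].to_string(index=False))
```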
Appendix: MLPerf Training v4.0 Benchmark Tests
| Benchmark | Dataset | Quality Target | Reference Implementation Model | Latest Version Available |
| --- | --- | --- | --- | --- |
| Image classification | ImageNet | 75.90% classification | ResNet-50 v1.5 | v4.0 |
| Image segmentation (medical) | KiTS19 | 0.908 Mean DICE score | 3D U-Net | v4.0 |
| Object detection (light weight) | Open Images | 34.0% mAP | RetinaNet | v4.0 |
| NLP | Wikipedia 2020/01/01 | 0.72 Mask-LM accuracy | BERT-large | v4.0 |
| LLM | C4 | 2.69 log perplexity | GPT-3 | v4.0 |
| LLM finetuning | GovRep r1/r2/r3 | ROUGE score | Llama 2 70B | v4.0 |
| Recommendation | Criteo 4TB multi-hot | 0.8032 AUC | DLRM-dcnv2 | v4.0 |
| Image generation | LAION-400M-filtered | FID <= 90 and CLIP >= 0.15 | Stable Diffusion v2 | v4.0 |
| Graph neural network (GNN) | IGBH-Full | 72% classification accuracy | R-GAT | v4.0 |
| Object detection (heavy weight) | COCO | 0.377 Box min AP and 0.339 Mask min AP | Mask R-CNN | v3.1 |
| Speech recognition | LibriSpeech | 0.058 Word Error Rate | RNN-T | v3.1 |
| Recommendation | 1TB Click Logs | 0.8025 AUC | DLRM | v2.1 |
| Reinforcement learning | Go | 50% win rate vs. checkpoint | Mini Go (based on AlphaGo paper) | v2.1 |
| Object detection (light weight) | COCO | 23.0% mAP | SSD | v1.1 |
| Translation (recurrent) | WMT English-German | 24.0 Sacre BLEU | NMT | v0.7 |
| Translation (non-recurrent) | WMT English-German | 25.00 BLEU | Transformer | v0.7 |
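Each benchmark above is scored as time-to-train: the wall-clock time until the model first reaches its quality target. A minimal sketch of that measurement pattern, with placeholder training and evaluation functions:

```python
# Minimal sketch of MLPerf-style time-to-train: run training until the
# evaluation metric first reaches the quality target, then report the
# elapsed wall-clock time. train_epoch() and evaluate() are placeholders.
import time

def time_to_train(train_epoch, evaluate, target, max_epochs=100):
    start = time.time()
    for _ in range(max_epochs):
        train_epoch()
        # For higher-is-better metrics (accuracy, AUC, BLEU); invert the
        # comparison for lower-is-better targets such as word error rate.
        if evaluate() >= target:
            return time.time() - start
    raise RuntimeError("Quality target not reached within max_epochs")
```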