The MLPerf™ Inference v6.0 results further underscore the impact of QCT’s engineering collaboration with NVIDIA. Quanta Cloud Technology (QCT) platforms with NVIDIA technologies are enabling customers to fuel AI at scale with the same industry-leading performance demonstrated in this latest benchmark round.
From next-generation GPU platforms to emerging AI models, QCT delivers a diverse and robust set of submissions that reflect AI deployment needs across generative AI, multimodal processing, speech inference, and graph intelligence workloads.
MLPerf Inference v6.0 Highlighted Systems from QCT:

QuantaGrid D75E-4U (8× NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs): A versatile, power-efficient GPU system based on the NVIDIA MGX architecture, ideal for enterprises scaling inference services across language and vision workloads. Optimized for balanced performance and deployment flexibility, the D75E-4U supports up to eight NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, and its low-power, modular design ensures operational efficiency without compromising acceleration capabilities.

QuantaGrid D75H-10U (8× NVIDIA Blackwell Ultra GPUs): This HGX B300 platform is purpose-built for demanding generative AI workloads and large-scale inference deployments, designed for maximum throughput, memory bandwidth efficiency, and exceptional scaling for high-parameter LLMs. The D75H-10U supports eight NVIDIA Blackwell Ultra GPUs and is ideal for enterprises building large-scale GPU clusters to accelerate LLM training and inference. It delivers breakthrough performance for the most complex workloads, including agentic AI, AI reasoning, and real-time video generation.
Models and workloads featured in MLPerf Inference v6.0 submissions:
From GPU utilization efficiency to inference throughput under MLPerf’s rigorous closed-division rules, QCT systems powered by NVIDIA accelerated computing consistently deliver results aligned with the highest standards of performance. QCT showcased broad coverage across today’s most relevant inference workloads. By running the Llama 3.1-405B, DeepSeek-R1, and Whisper benchmarks, along with other MLCommons benchmarks spanning computer vision, speech, language, and recommendation systems, QCT’s NVIDIA-powered systems demonstrated strong capacity and readiness for real-time inference and the most demanding enterprise AI workloads.
Through continuous participation in MLPerf, the industry standard for AI benchmarking, QCT has built deep expertise across a wide range of AI inference workloads. This foundation enables QCT to consistently deliver best-fit infrastructure tailored to the specific needs of enterprise customers. View the full MLPerf Inference v6.0 results here.

