QCT’s QuantaGrid server portfolio now supports NVIDIA L40S GPUs, announced at SIGGRAPH 2023, giving enterprises and businesses that rely on GPUs for their compute needs a significant performance improvement. These new GPUs deliver incredible AI capabilities, exceeding the inference performance of NVIDIA A100 Tensor Core GPUs. The NVIDIA L40S GPU, which powers the new NVIDIA OVX™ servers, is based on the NVIDIA Ada Lovelace architecture and delivers breakthrough graphics, compute, and AI performance for data center workloads. With fourth-generation Tensor Cores, third-generation RT Cores, high-speed GDDR6 memory, and support for the FP8 Transformer Engine, the NVIDIA L40S GPU delivers more generative AI performance than the previous generation of GPUs based on the NVIDIA Ampere architecture.
NVIDIA L40S GPUs undergo rigorous joint-qualification programs with leading OEMs, encompassing a comprehensive set of system configurations that pass a battery of hardware and software validation tests. This ensures that the final product meets the reliability and quality standards customers expect. Combining these capabilities with QCT’s QuantaGrid servers delivers breakthrough performance and scalability to accelerate the types of workloads listed below:
The AI, graphics, and media acceleration capabilities of the NVIDIA L40S GPU make it the premier platform for multi-modal generative AI pipelines. With powerful inferencing capabilities, combined with NVIDIA RTX-accelerated ray tracing and dedicated encode and decode engines, the NVIDIA L40S GPU accelerates AI-enabled audio, speech, 2D, video, and 3D generative AI applications. For image generative AI inference, the NVIDIA L40S GPU delivers 1.2X more performance than the NVIDIA A100 GPU. This breakthrough performance, combined with 48GB of memory capacity, makes the NVIDIA L40S GPU the ideal generative AI platform for high-quality images and immersive visual content.
Inference and Training
The NVIDIA L40S GPU accelerates training, fine-tuning, and inference workloads with powerful throughput and floating-point performance to build and deploy state-of-the-art AI models. Powerful NVIDIA-Certified Systems with eight NVIDIA L40S GPUs can train foundational models with up to 175 billion parameters to convergence and accelerate fine-tuning and retraining of existing large-scale models to adapt them for new tasks, delivering up to 1.7X the performance of the prior generation NVIDIA A100 GPU.
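As a rough back-of-the-envelope illustration of the figures above, the sketch below compares the memory needed to hold the raw weights of a 175-billion-parameter model against the aggregate memory of eight 48GB GPUs. The function names are hypothetical, 1 GB is treated as 10^9 bytes, and the calculation covers weights only: gradients, optimizer state, and activations add substantially to training memory, so this is not a statement of how the quoted training result was obtained.

```python
# Bytes needed to store one parameter at each precision (standard widths).
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "FP8": 1}

GPU_MEMORY_GB = 48    # per-GPU memory stated in the text
GPUS_PER_SYSTEM = 8   # eight-GPU system mentioned in the text


def weight_footprint_gb(num_params: int, precision: str) -> float:
    """Memory for the raw weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def aggregate_memory_gb(gpus: int = GPUS_PER_SYSTEM,
                        per_gpu_gb: int = GPU_MEMORY_GB) -> int:
    """Total GPU memory across the system, ignoring any per-GPU overhead."""
    return gpus * per_gpu_gb


if __name__ == "__main__":
    params = 175_000_000_000
    total = aggregate_memory_gb()
    for precision in BYTES_PER_PARAM:
        need = weight_footprint_gb(params, precision)
        print(f"{precision}: {need:.0f} GB of weights vs {total} GB aggregate")
```

Under these assumptions, the weights alone fit within the aggregate memory at FP16 and FP8 but not at FP32, which is one reason reduced-precision formats matter at this model scale.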
Combining NVIDIA’s full stack of inference-serving software with the compute capabilities of the NVIDIA L40S GPU provides a powerful platform for deploying trained models for inference. With support for structural sparsity and a broad range of precisions, including TF32, INT8, and FP8, the L40S delivers over 1 petaFLOPS of tensor operation performance, providing actionable insights with speed and precision.
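To make the idea of structural sparsity concrete, here is a minimal pure-Python sketch. It assumes the 2:4 pattern that NVIDIA documents for its sparse Tensor Cores, in which two of every four consecutive weights are zero. The function `prune_2_of_4` is a hypothetical helper for illustration only; production workflows would rely on NVIDIA's own tooling, and pruning normally requires fine-tuning afterward to recover accuracy.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in each group of four.

    Illustrative only: assumes a 2:4 structured-sparsity pattern and a
    flat list of weights whose length is a multiple of four.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.4, 0.05, 1.0, 0.2, -0.3, 0.0]
    print(prune_2_of_4(dense))
```

Because exactly half of each group is zero in a fixed pattern, hardware can skip those multiplications, which is where the throughput gain from sparsity comes from.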
AI-Ready Development Platform
Enterprise adoption of AI is now mainstream and is driving increased demand for skilled AI developers and data scientists. Organizations require a flexible, high-performance platform consisting of optimized hardware and software to maximize productivity and accelerate AI development.
NVIDIA AI Enterprise is end-to-end, enterprise-grade software that powers the NVIDIA AI platform. It offers 100+ frameworks, pretrained models, and libraries to streamline development and deployment of production AI, including generative AI, computer vision, and speech AI. Optimized and certified for reliable performance, NVIDIA AI Enterprise, together with the NVIDIA L40S GPU, provides a unified platform to develop applications once and deploy them anywhere, reducing the risks involved in moving from pilot to production.
Rendering and 3D Graphics
Running professional 3D visualization applications with NVIDIA L40S GPUs enables creative professionals to iterate more, render faster, and unlock tremendous performance advantages that increase productivity and speed up project completion. The NVIDIA L40S GPU’s third-generation RT Cores and industry-leading 48GB of GDDR6 memory deliver more than 3.8X the real-time ray-tracing performance of the previous generation. With these capabilities, artists and designers can work with complex geometry and high-resolution textures in real time to generate photorealistic designs and power full-fidelity creative workflows, from interactive rendering to virtual production.
NVIDIA Omniverse is a software platform for developing unified 3D workflows and OpenUSD applications to enable industrial digitalization. The full-stack platform is based on the OpenUSD framework and NVIDIA RTX technology. As the engine of Omniverse in the data center, the NVIDIA L40S GPU brings powerful capabilities to product design and review workloads. For the most complex Omniverse workloads, the NVIDIA L40S GPU accelerates ray-traced and path-traced rendering of materials. It can be used for generating photorealistic 3D synthetic data and developing physically accurate simulations and digital twins.
Streaming and Video Content
The NVIDIA L40S GPU takes streaming and video content workloads to the next level, delivering breakthrough media acceleration capabilities with three video encode and three video decode engines. With the addition of AV1 encoding, the NVIDIA L40S GPU delivers breakthrough performance and improved TCO for broadcast streaming, video production, and transcription workflows.
When combined with NVIDIA RTX Virtual Workstation (vWS) software, the NVIDIA L40S GPU can be virtualized to deliver high-performance workstation instances to remote users for high-end design, AI, and compute workloads. With 48GB of GPU memory, the NVIDIA L40S GPU with vWS enables flexible, work-from-anywhere solutions for GPU memory-intensive workloads.
To serve these types of workloads, QCT supports the NVIDIA L40S GPU in its QuantaGrid D54Q-2U and QuantaGrid D54U-3U servers. The D54Q-2U is a 2U general-purpose server powered by 4th Gen Intel® Xeon® Scalable processors, with flexible PCIe expansion slot options that all support PCIe 5.0. It can also be thermally optimized for two dual-width NVIDIA L40S GPUs and for generative AI workloads. As part of the 4th generation of the QuantaGrid product lineup, the D54Q-2U offers 24 all-NVMe flash drives in U.2 or E1.S form factors as hot-tier storage, targeting HPC and enterprise workloads. The QuantaGrid D54U-3U is an acceleration server designed for parallel computing, supporting the NVIDIA NVLink bridge, two 4th Gen Intel® Xeon® Scalable processors (up to 350W), and 32 DIMM slots. This 3U system can support four NVIDIA L40S GPUs or up to eight single-width accelerator cards, providing a comprehensive and flexible architecture that can be optimized for various AI, HPC, and deep learning applications.
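For capacity planning, the two configurations above can be captured in a small data structure. The sketch below is illustrative only: it uses the GPU counts and the 48GB per-GPU memory stated in this article, while the class name, field names, and the `smallest_server_for` helper are hypothetical and not part of any QCT or NVIDIA tool.

```python
from dataclasses import dataclass
from typing import Optional

L40S_MEMORY_GB = 48  # per-GPU memory stated in the article


@dataclass(frozen=True)
class ServerConfig:
    model: str
    rack_units: int
    max_l40s_gpus: int  # L40S GPU count quoted in the article

    @property
    def total_gpu_memory_gb(self) -> int:
        """Aggregate GPU memory when fully populated with L40S GPUs."""
        return self.max_l40s_gpus * L40S_MEMORY_GB


SERVERS = [
    ServerConfig("QuantaGrid D54Q-2U", rack_units=2, max_l40s_gpus=2),
    ServerConfig("QuantaGrid D54U-3U", rack_units=3, max_l40s_gpus=4),
]


def smallest_server_for(required_gb: float) -> Optional[ServerConfig]:
    """Return the server with the least GPU memory that still meets the need."""
    for server in sorted(SERVERS, key=lambda s: s.total_gpu_memory_gb):
        if server.total_gpu_memory_gb >= required_gb:
            return server
    return None  # no single listed server has enough GPU memory


if __name__ == "__main__":
    for server in SERVERS:
        print(f"{server.model}: {server.total_gpu_memory_gb} GB GPU memory")
```

A workload needing about 100 GB of GPU memory, for example, would exceed the two-GPU D54Q-2U and land on the four-GPU D54U-3U under these assumptions.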
For more information on QCT products supporting NVIDIA accelerators, visit: https://go.qct.io/nvidia/qct-servers-powered-by-nvidia-gpus/