Artificial intelligence (AI) is transforming every industry more than ever before. From healthcare to education, automotive, retail, and finance, trillions of dollars are being allocated to the AI economy. The reason for this trend is that AI inferencing makes consumers’ lives more convenient through smart, real-time experiences and increases operational efficiency as we enter the age of AI.
As more enterprises and cloud service providers (CSPs) move their deployments to the edge to avoid network latency, they need more accelerators that target inference applications. However, taking an inference solution from concept to deployment is not easy. Many moving pieces must work together in harmony for an inference deployment to succeed. One obstacle for enterprises and CSPs in meeting inference demand is choosing a validated and trusted hardware provider that works with the latest NVIDIA GPUs.
To address this need, QCT offers a full portfolio of NVIDIA-Certified Systems featuring NVIDIA Ampere architecture-based Tensor Core GPUs for inference workloads. By bringing together NVIDIA GPUs and NVIDIA networking, QCT enables enterprises to confidently deploy hardware solutions optimized for modern accelerated workloads and NVIDIA AI.
QCT’s wide range of NVIDIA-Certified Systems powered by NVIDIA A100 and A30 GPUs delivers leading inference performance across cloud, data center, and edge, so that AI-enabled applications deploy with fewer servers and less power, resulting in faster insights at dramatically lower cost. The A100 offers the highest inference performance for the most compute-intensive applications, and the A30 brings optimal inference performance to mainstream servers. With the introduction of the NVIDIA A2 Tensor Core GPU, the NVIDIA AI platform now includes an entry-level inference option in a low-profile form factor, delivering the most comprehensive AI inference offering across edge, data center, and cloud.
Figure 1. NVIDIA A2, A30, and A100 GPUs for inference
Optimized for Any Server
The NVIDIA A2 Tensor Core GPU delivers high performance for NVIDIA AI at the edge in a small footprint. With a low-profile PCIe Gen4 card and a low power envelope, down to 40W, the A2 fits a variety of QCT’s servers, making it ideal for far-edge servers. It also features configurable thermal design power (TDP) capabilities that scale performance per watt for versatile inference acceleration in any server. Compared to CPU-only servers, edge and entry-level servers with NVIDIA A2 Tensor Core GPUs offer up to 20x more inference performance (see Figure 2 below), instantly upgrading any server to handle modern AI.
Figure 2. NVIDIA A2 inference speedups versus CPU
QCT platforms supporting NVIDIA A2 Tensor Core GPUs:
- QuantaGrid D53X-1U – a balanced architecture powered by 3rd Gen Intel® Xeon® Scalable processors with built-in acceleration and advanced security for hybrid cloud infrastructures and data analytics, supporting up to 3 NVIDIA A2 GPUs.
- QuantaGrid D52B-1U – a general-purpose rackmount server designed for optimal performance and power efficiency. It is based on dual 2nd Gen Intel® Xeon® Scalable processors and supports up to 2 NVIDIA A2 GPUs in a 1U chassis.
- QuantaGrid D43K-1U – a general-purpose 1U AMD EPYC™ dual-socket compute server supporting 3 NVIDIA A2 GPUs for optimum performance on demanding workloads. It is ideal for building software-defined and virtualized infrastructures and for supporting HPC applications such as database analytics or all-flash storage arrays.
- QuantaGrid D52BQ-2U – based on dual 2nd Gen Intel® Xeon® Scalable Processors, this server features up to 3 TB memory capacity, and supports 2 NVIDIA A2 Tensor Core GPUs in a 2U chassis.
- QuantaEdge EGX63IS-1U – a single-socket 3rd Gen Intel® Xeon® Scalable Processor short-depth MEC server supporting up to 2 NVIDIA A2 GPUs with high performance, I/O expandability, and power efficiency, providing a best-in-class open platform for 5G Open RAN, private 5G networks, and a broad range of 5G MEC applications.
- QuantaEdge EGX66Y-2U – a 3rd Gen Intel® Xeon® Scalable Processor dual-socket short-depth MEC server, supporting up to 4 NVIDIA A2 Tensor Core GPUs with high performance, I/O expandability, and power efficiency for 5G Open RAN, private 5G networks, and a broad range of 5G MEC applications.
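To compare the platforms listed above at a glance, the short Python sketch below tabulates the A2 GPU count given for each server and estimates the GPU power budget at the 40W low end of the A2 power envelope. The GPU counts and the 40W figure come from this article; the names `MAX_A2_GPUS`, `gpu_power_budget_w`, and `platforms_with_at_least` are illustrative, not part of any QCT or NVIDIA tool, and the result covers GPU power only, not total system power.

```python
# GPU counts are the figures quoted in the platform list above.
# 40 W is the low end of the A2 power envelope mentioned in this article;
# a higher configured TDP would raise the budget accordingly.
A2_MIN_POWER_W = 40

MAX_A2_GPUS = {
    "QuantaGrid D53X-1U": 3,
    "QuantaGrid D52B-1U": 2,
    "QuantaGrid D43K-1U": 3,
    "QuantaGrid D52BQ-2U": 2,
    "QuantaEdge EGX63IS-1U": 2,
    "QuantaEdge EGX66Y-2U": 4,
}


def gpu_power_budget_w(platform, watts_per_gpu=A2_MIN_POWER_W):
    """Return the combined GPU power (in watts) for a fully populated server."""
    return MAX_A2_GPUS[platform] * watts_per_gpu


def platforms_with_at_least(min_gpus):
    """Return the platforms, sorted by name, that hold at least min_gpus A2 cards."""
    return sorted(name for name, count in MAX_A2_GPUS.items() if count >= min_gpus)


if __name__ == "__main__":
    for name, count in MAX_A2_GPUS.items():
        print(f"{name}: {count} x A2, {gpu_power_budget_w(name)} W GPU budget")
```

A call such as `platforms_with_at_least(3)` narrows the list to the servers that can hold three or more A2 cards, which may help when sizing an edge deployment.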
As enterprises and CSPs work to capitalize on the massive amounts of data they collect every day, AI inferencing is emerging as one of the most important enterprise applications. It is key to making consumers’ lives easier while providing automation capabilities to modernize society. Furthermore, QCT platforms with NVIDIA GPUs make AI inference solutions easier than ever to develop, deploy, and manage from anywhere and at any time.
Find out more about QCT platforms at www.qct.io.