In 2018, Quanta Cloud Technology (QCT) said that the “Next Major Upgrade for AI and Supercomputing is Liquid Cooling.” Since then, QCT has developed its liquid-to-air cooling rack, known as QoolRack, which reduces power consumption by up to 66.8%.
This all comes at a time when energy price hikes and compute-intensive workloads are driving up cooling costs for data centers. The thermal design power (TDP) of next-gen CPUs and GPUs continues to increase, demanding more power and posing a greater thermal challenge for IT managers. To help their customers cope, several server manufacturers have started to develop advanced data center cooling technologies.
As a pioneer, QCT created QoolRack in 2022, a rack-level direct-to-chip cooling solution that adopts a cold plate design to meet customers’ thermal demands. This ready-to-ship solution can cut rack-level cooling power consumption by nearly 70%, and because it can be operated without adjusting the heating, ventilation, and air conditioning (HVAC) system, it also reduces overall power consumption. Under such environmental settings, the power usage effectiveness (PUE) can be lowered to 1.07. At the heart of QoolRack is a coolant distribution unit (CDU) with a datacenter-ready secure control module (DC-SCM), which can execute smart power consumption management according to real workloads.
Through each node’s Smart Link, the CDU’s DC-SCM automatically adjusts the rear-door fan speed (using fan zoning) and the CDU pump speed to balance the coolant temperature, while monitoring for leaks and thermal conditions to improve power savings.
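The kind of control loop described above can be illustrated with a minimal sketch. This is not QCT's firmware: the `Telemetry` and `Command` types, the 40°C target, the gain, the speed limits, and the leak-handling policy are all hypothetical values chosen only to show how one controller might scale pump and fan speed with coolant temperature.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    coolant_temp_c: float  # measured coolant temperature (hypothetical sensor)
    leak_detected: bool    # True if a leak sensor has tripped


@dataclass
class Command:
    pump_pct: float  # CDU pump speed, percent of maximum
    fan_pct: float   # rear-door fan speed, percent of maximum
    alarm: bool      # True when an operator should be alerted


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def control_step(
    t: Telemetry,
    target_c: float = 40.0,       # assumed coolant setpoint
    base_pct: float = 40.0,       # assumed speed at the setpoint
    gain_pct_per_c: float = 5.0,  # assumed proportional gain
) -> Command:
    """One proportional control step for pump and fan speed."""
    if t.leak_detected:
        # Assumed policy: stop circulating coolant, run fans at full speed, alert.
        return Command(pump_pct=0.0, fan_pct=100.0, alarm=True)
    error_c = t.coolant_temp_c - target_c
    # Speed up when the coolant is warmer than the target, slow down when cooler.
    speed = clamp(base_pct + gain_pct_per_c * error_c, 20.0, 100.0)
    return Command(pump_pct=speed, fan_pct=speed, alarm=False)


if __name__ == "__main__":
    for temp in (30.0, 40.0, 50.0, 70.0):
        print(temp, control_step(Telemetry(temp, leak_detected=False)))
```

A real controller would likely drive each fan zone separately and smooth its output over time; the single shared speed here keeps the sketch short.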
Between the CDU at the bottom and the top-of-rack management switch, there is 30U of space for high-performance servers. The current version of QoolRack can house a total of 15 QuantaGrid D54Q-2U servers, which support the latest 4th Gen Intel Xeon Scalable processors and are thermally optimized for two dual-width accelerators to run AI workloads and meet sustainability goals.
QCT QoolRack Combines the Benefits of a Direct-to-chip Design, and is Ready for Data Center Deployments
Why did QCT become engaged with liquid cooling product development, and why did it prefer cold plate-based liquid cooling designs? “With many liquid cooling solutions around in the industry, we believe feasible innovative cooling solutions are a must for today’s data centers,” said Jack Luoh, QCT’s senior product manager.
As the thermal design power (TDP) of CPUs and GPUs goes up, the heat output of servers is reaching power levels well above 1.5 to 2 kilowatts for a single node, rendering traditional air-cooled infrastructure inadequate and costly for dissipating the heat.
QCT addresses this challenge with a direct-to-chip liquid cooling design. It supports chips with a per-slot TDP of 700 watts, with a maximum operating temperature of 70°C and a per-watt cooling cost below 0.05 USD. According to QCT’s latest test results, this solution can support chips with a per-slot TDP of 800 watts or higher. Solution readiness is another key consideration for QCT. Luoh noted that the cold plate-based direct-to-chip design is the best liquid cooling choice for customers implementing such solutions today. He then gave a list of evaluations spanning cost, performance and warranty to illustrate its advantages.
First, in terms of cooling costs, cold plate and direct-to-chip cooling uses ordinary coolant, which is cheap (3.5 USD per liter), and only a small volume is needed because the coolant is stored only in finned heat pipes. By comparison, both single-phase and two-phase immersive cooling require much more expensive coolant (150 USD per liter). Furthermore, a large volume of such coolant is needed because it has to fill the server tank. For a 42U tank, approximately 500 liters of coolant are required. In addition, the coolant for some two-phase immersive cooling solutions evaporates and requires constant refilling.
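To put the coolant figures above side by side, here is a small arithmetic sketch. The prices and the 500-liter tank volume come from the comparison above; the cold plate loop volume is not given in the article, so `ASSUMED_COLD_PLATE_VOLUME_L` is a purely hypothetical placeholder, and the calculation ignores refills and evaporation losses.

```python
IMMERSION_PRICE_USD_PER_L = 150.0   # from the article
COLD_PLATE_PRICE_USD_PER_L = 3.5    # from the article
IMMERSION_TANK_VOLUME_L = 500.0     # 42U tank, from the article
ASSUMED_COLD_PLATE_VOLUME_L = 20.0  # hypothetical placeholder, not a QCT figure


def coolant_fill_cost_usd(volume_liters: float, price_usd_per_liter: float) -> float:
    """Cost of a single initial fill; refills are not modeled."""
    return volume_liters * price_usd_per_liter


immersion_cost = coolant_fill_cost_usd(IMMERSION_TANK_VOLUME_L, IMMERSION_PRICE_USD_PER_L)
cold_plate_cost = coolant_fill_cost_usd(ASSUMED_COLD_PLATE_VOLUME_L, COLD_PLATE_PRICE_USD_PER_L)

print(f"Immersion tank fill:  {immersion_cost:,.0f} USD")   # 500 L x 150 USD/L = 75,000 USD
print(f"Cold plate loop fill: {cold_plate_cost:,.0f} USD (assumed volume)")
print(f"Per-liter price ratio: {IMMERSION_PRICE_USD_PER_L / COLD_PLATE_PRICE_USD_PER_L:.1f}x")
```

Even before volume is considered, the per-liter price alone differs by a factor of roughly 43, so the conclusion does not hinge on the assumed cold plate volume.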
Regarding performance, QCT believes that with the number of companies investing in cold plate-based liquid cooling, its effectiveness will improve over time. Two-phase immersive liquid cooling is the most effective approach by far; however, neither single-phase nor two-phase immersive liquid cooling has achieved its optimal performance, because current data center infrastructures are designed for air-cooled architectures rather than liquid-cooled ones. These technologies should be able to deliver higher performance in the future.
When it comes to PUE, two-phase immersive liquid cooling achieves the most satisfactory results, followed closely by single-phase immersive liquid cooling. As for cold plate-based liquid cooling, QCT stated that the PUE with chilled air is 1.3 or 1.6, while the PUE with unchilled air can be lowered to 1.07.
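To make these PUE values concrete, the sketch below uses the common definition of PUE as total facility power divided by IT equipment power, so the non-IT overhead is the IT load multiplied by (PUE - 1). The 100 kW IT load is an assumed example figure, not a number from QCT.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by PUE = facility power / IT power."""
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power spent on cooling and other non-IT overhead."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


ASSUMED_IT_LOAD_KW = 100.0  # hypothetical example load

for label, pue in [
    ("cold plate + unchilled air", 1.07),
    ("cold plate + chilled air (low)", 1.3),
    ("cold plate + chilled air (high)", 1.6),
]:
    print(f"{label}: PUE {pue} -> {overhead_kw(ASSUMED_IT_LOAD_KW, pue):.1f} kW overhead")
```

Under this assumed load, moving from a PUE of 1.6 to 1.07 would reduce overhead from about 60 kW to about 7 kW.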
In terms of the impact of liquid cooling on data center infrastructure, immersive liquid cooling would cause greater changes. Take serviceability, for example: servicing a server’s hardware components requires lifting the server out of the cooling tank and hanging it to dry for 10, 15, or even 20 minutes, often with the assistance of robotic arms, before servicing can be carried out.
Coolant ingredients are another issue. The coolant used for single-phase immersive liquid cooling has a low ignition point, and no existing regulations can eliminate this potential risk. As for two-phase immersive liquid cooling, the coolant used contains fluorocarbon, which can release a toxic gas during phase change and requires additional measures to mitigate its environmental impact.
In terms of capital expenditure, implementing immersive liquid cooling requires large-scale adjustments and renovations of the entire data center environment, whereas implementing cold plate and direct-to-chip liquid cooling incurs expenses only at the rack level and requires no changes at the infrastructure level. In addition, because the design has no infrastructure-level impact, data center IT staff do not need to change their usual practices to implement this solution. From the practical standpoint of protecting existing investments, organizations and enterprises prefer liquid cooling solutions that can co-exist and interoperate with existing equipment and environments. Because most servers in today’s data centers rely on air cooling infrastructure and cannot all be replaced in a short time, QCT provides a liquid cooling design that does not require data center overhauls, ensuring continued use of existing servers.
To meet deployment demands in existing data center environments, QCT offers two variations of its cold plate-based liquid cooling solution: the liquid-to-air (L2A) rack, which is the one primarily being displayed, and the liquid-to-liquid (L2L) rack, which can be deployed in data centers with existing hot and cold manifold pipes.