The Next Major Upgrade for AI and Supercomputing is Liquid Cooling

New and more demanding workloads are pushing power requirements beyond what traditionally designed server processors were built to handle. Chips keep getting more powerful (and more power-hungry, as more cores are packed into a single CPU and more CPUs into a smaller space), while the pace of Moore’s Law is slowing, so accelerators are becoming more popular. A growing number of accelerator processors, such as GPUs and FPGAs, are picking up the slack and pushing systems to their limits for big data analytics, AI, HPC, machine learning, media streaming, and latency-sensitive applications.

With these hardware accelerators and high-performance server CPUs (at times working in tandem) drawing more wattage, individual systems need better cooling. For the most part, though, rack-mounted servers have relied on traditional heat sinks and fans. In this setup, a thermal conductor carries heat away from the CPU to the heat sink, and a fan blows air across the heat sink to dissipate it. And because CPUs are socketed on the motherboard, it’s relatively easy to fit them with a heat sink.

So why liquid instead of air? Air cooling dates back to an era when hardware didn’t run that hot. Server cabinets sat on raised floors with tiny holes that let cold air come up and be drawn into the mainframe. Those rooms may have been cold, but we can’t simply put servers in a freezer: the demand for cooling capacity now exceeds what cold air can offer. Furthermore, rack density is on the rise, hand in hand with the wattage required to keep data centers up and running, all while operators try to hit low PUE levels to stay “green”.
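For readers less familiar with the metric, PUE (power usage effectiveness) is the ratio of total facility power to the power drawn by the IT equipment itself, so a value closer to 1.0 means less overhead spent on cooling and power distribution. Here is a minimal sketch in Python. The function name and all of the kilowatt figures are made up for illustration and do not describe any particular data center:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical figures, chosen only to show how the ratio behaves.
it_load_kw = 1000.0            # servers, storage, networking
overhead_high_kw = 600.0       # assumed cooling + power-distribution overhead
overhead_low_kw = 200.0        # assumed overhead after a cooling upgrade

pue_high = pue(it_load_kw + overhead_high_kw, it_load_kw)
pue_low = pue(it_load_kw + overhead_low_kw, it_load_kw)

print(f"PUE with higher overhead: {pue_high:.2f}")
print(f"PUE with lower overhead:  {pue_low:.2f}")
```

With the IT load held constant, every kilowatt trimmed from cooling overhead moves the ratio closer to 1.0, which is why cooling efficiency features so heavily in “green” targets.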

So we need to start looking at liquid cooling as the answer to the resulting high wattages at both the node and rack level, especially now that heat rejection and immersion setups are flexible and cost-effective enough for cooling hardware components, including both GPUs and CPUs. Liquid cooling operates much like air conditioning; the difference is that we use water or refrigerant to chill, plus “something” to transport that liquid so the hardware stays cool enough to sustain peak performance. That translates to pumps and pipes that deliver cold liquid into the server and carry hot liquid out, with redundant cooling (fans blowing air across a radiator) to avoid downtime.
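To get a rough feel for why pumping liquid is attractive, the standard heat-transport relation Q = ρ · V̇ · c_p · ΔT can be used to estimate how much fluid has to flow past the hardware to carry away a given heat load. The sketch below is a back-of-the-envelope estimate only: the fluid properties are approximate textbook values near room temperature, and the 30 kW rack and 10 K temperature rise are assumed numbers, not figures from any specific product.

```python
# Approximate fluid properties near room temperature (textbook values).
FLUIDS = {
    "air":   {"density_kg_m3": 1.2,   "cp_j_per_kg_k": 1005.0},
    "water": {"density_kg_m3": 997.0, "cp_j_per_kg_k": 4186.0},
}


def volumetric_flow_m3_s(heat_load_w: float, delta_t_k: float, fluid: str) -> float:
    """Flow needed for the fluid to absorb heat_load_w while warming by delta_t_k.

    Rearranged from Q = rho * V_dot * cp * delta_T.
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    props = FLUIDS[fluid]
    return heat_load_w / (props["density_kg_m3"] * props["cp_j_per_kg_k"] * delta_t_k)


rack_heat_w = 30_000.0  # hypothetical 30 kW rack
delta_t_k = 10.0        # hypothetical coolant temperature rise across the rack

air_flow = volumetric_flow_m3_s(rack_heat_w, delta_t_k, "air")
water_flow = volumetric_flow_m3_s(rack_heat_w, delta_t_k, "water")

print(f"air:   {air_flow:.2f} m^3/s")
print(f"water: {water_flow * 1000:.2f} L/s")
print(f"air needs about {air_flow / water_flow:.0f}x the volume of water")
```

Under these assumptions, air has to move a few thousand times the volume that water does for the same heat load and temperature rise, which is the basic reason a modest pump and some piping can stand in for a wall of fans.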

Now this might seem like the days when PC hobbyists and gamers would go to retail outlets like Fry’s Electronics, CompUSA, Micro Center, or Best Buy to secure parts to overclock their rigs. And in some ways that’s exactly what it is. But those were one-size-fits-all solutions that would be difficult for data center servers to adopt. What high performance requires is an architecture and “plumbing” that can be distributed to address the full range of cooling scenarios.

It’s not a new idea for servers; companies were using liquid refrigerant for cooling even before the 1980s. Without it, those systems would have overheated once they ran past their thermal design point (TDP). Even today, legacy cooling equipment in an old data center may be running just fine. But what if you could improve efficiency and recover megawatts of power capacity, all while lowering operating costs? Why wouldn’t you make the upgrade? This conversation is becoming increasingly relevant not just for super high-density racks, but even for common-density, run-of-the-mill solutions placed in data centers all over the world. So go ahead and ask yourself: Should my data center be cooled by liquid or air?

For further questions or inquiries regarding QCT products and/or solutions, leave a comment or visit: https://www.qct.io/contactform/index
