Gigabyte Adopts CoolIT Direct Liquid Cooling for Nvidia A100 Servers

(Image credit: Gigabyte)

Gigabyte has introduced its first AMD EPYC and Nvidia A100-based high-performance computing (HPC) servers with a direct liquid cooling system designed by CoolIT. The new machines pack one or two AMD EPYC 7003-series 'Milan' processors with up to 128 cores in total and four or eight Nvidia A100 80GB SXM4 modules.

Gigabyte's 4-way 2U G262-ZL0 and 8-way 4U G492-ZL2 machines are designed for high-density AI and HPC installations (such as those used in research labs or universities), where operators pack as many servers as possible into a relatively small footprint while still requiring stable operation and predictable performance. In such deployments, thermals become a problem, so direct liquid cooling makes a lot of sense. Because Gigabyte's server designs are unique, CoolIT, a major liquid cooling specialist, had to develop a proprietary direct liquid cooling system that cools the CPU(s) and GPUs separately to ensure peak performance.

Based on up to two 64-core AMD EPYC 7003-series 'Milan' processors, the machines feature 128 or 256 PCIe 4.0 lanes and support 4TB or 8TB of DDR4 memory using RDIMM or LRDIMM modules.
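The memory figures follow from simple slot arithmetic. The sketch below is illustrative only: eight DDR4 channels per socket is a property of the EPYC 7003 series, while two DIMMs per channel and 256GB modules are assumptions made here to reach the quoted maximum, not details confirmed by Gigabyte.

```python
# Illustrative arithmetic only; DIMMS_PER_CHANNEL and MODULE_GB are assumptions.
CHANNELS_PER_SOCKET = 8   # DDR4 channels per EPYC 7003 processor
DIMMS_PER_CHANNEL = 2     # assumed board layout
MODULE_GB = 256           # assumed largest supported module size


def max_memory_tb(sockets: int) -> float:
    """Return the maximum memory capacity in TB for a given socket count."""
    total_gb = sockets * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL * MODULE_GB
    return total_gb / 1024


for sockets in (1, 2):
    print(f"{sockets} socket(s): {max_memory_tb(sockets):.0f} TB")
```

Under these assumptions, a single-socket configuration tops out at 4TB and a dual-socket one at 8TB, matching the figures above.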

The machines also use Nvidia's HGX platform, which fully supports the company's Magnum IO software stack. The stack includes Magnum IO GPUDirect Storage, which provides a direct data path from local and remote storage to GPU memory, and GPUDirect RDMA for direct data exchange between GPUs and third-party devices such as network adapters. Also, the 4-way and 8-way Nvidia A100 GPU complexes fully support Nvidia's NVLink and NVSwitch interconnects to enable GPU peer-to-peer communication at 600 GB/s.
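The 600 GB/s figure can be reconstructed from Nvidia's published NVLink numbers for the A100: 12 third-generation links per GPU at 25 GB/s in each direction. A quick sketch of that arithmetic (the function and constant names are only illustrative):

```python
LINKS_PER_GPU = 12        # third-generation NVLink links on an A100
GBPS_PER_DIRECTION = 25   # GB/s per link, per direction


def nvlink_total_gbps(links: int = LINKS_PER_GPU,
                      per_direction: int = GBPS_PER_DIRECTION) -> int:
    """Aggregate NVLink bandwidth per GPU, counting both directions."""
    # Each link carries traffic in both directions at once.
    return links * per_direction * 2


print(nvlink_total_gbps(), "GB/s")
```

Note that this is an aggregate bidirectional figure, so traffic flowing in one direction tops out at about half of it.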

In addition to formidable processing power, the new machines also offer relatively decent expandability. The smaller G262-ZL0 machine can pack four 2.5-inch drives (U.2 or SATA), three M.2 SSDs with a PCIe 4.0 x4 interface, five low-profile expansion cards, and one PCI 3.0 board. Meanwhile, the larger G492-ZL2 offers six 2.5-inch bays (U.2 or SATA), two M.2 slots, and ten expansion slots for low-profile cards. Expansion slots can be used to install advanced network cards, such as Nvidia's Mellanox ConnectX-7 SmartNIC, which offers four connectivity ports and up to 400Gb/s of throughput.

As for power supply, the 4-way GPU server comes with a 3000W redundant PSU, whereas the 8-way GPU machine comes with 3+1 3000W PSUs.  
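To make the redundancy notation concrete: in an N+1 arrangement, N supplies carry the load and one acts as a spare, so the usable budget is N times the per-unit rating. The sketch below assumes the 4-way server's "redundant PSU" means a 1+1 pair, which is not spelled out above; the function name is hypothetical.

```python
def psu_budget(active: int, spare: int, watts_each: int) -> dict:
    """Usable and installed power for an N+M redundant PSU setup."""
    return {
        "usable_w": active * watts_each,               # load the system may draw
        "installed_w": (active + spare) * watts_each,  # total capacity fitted
    }


four_way = psu_budget(active=1, spare=1, watts_each=3000)   # assumed 1+1
eight_way = psu_budget(active=3, spare=1, watts_each=3000)  # 3+1, as stated
print("4-way:", four_way)
print("8-way:", eight_way)
```

With 3+1 units, the 8-way machine can draw up to 9000W and still ride through the failure of any single supply.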

Gigabyte will sell its 4-way 2U G262-ZL0 and 8-way 4U G492-ZL2 machines directly and will advise customers on how best to integrate the liquid-cooled systems into existing data centers. Meanwhile, CoolIT Systems has validated service providers worldwide that can handle planning, installation, and maintenance of its cooling systems, as well as sell spare parts.

Anton Shilov
Contributing Writer
