Conventional data center designs may not be able to withstand the power and cooling demands of AI processing, according to Schneider Electric, which argues that the technology is forcing a shift in infrastructure design.
One might expect such claims from a company like Schneider, a prominent player in data center power and cooling, but the argument holds: AI processing is fundamentally different from routine server workloads such as database management, and conventional approaches are showing their age.
In a recent publication, Schneider argues that AI performance hinges on three factors: adequate power, cooling, and bandwidth. GPUs, the dominant processors for AI work, are power-hungry. Where standard Intel and AMD CPUs draw around 300 to 400 watts, Nvidia’s latest GPUs draw 700 watts each, and they are typically deployed in groups of eight.
This heightened consumption drives up rack density. Racks in the 10kW to 20kW range are easily managed with air cooling, but once density passes 30kW, liquid cooling becomes the primary option, and it is not easily retrofitted into existing facilities.
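The arithmetic behind those rack densities can be sketched from the figures above: 700W per GPU, eight GPUs per server. The per-server overhead and the server-per-rack counts below are illustrative assumptions, not Schneider's numbers.

```python
# Back-of-envelope rack power estimate using the article's GPU figures.
GPU_WATTS = 700            # Nvidia's latest GPUs, per the article
GPUS_PER_SERVER = 8        # typical grouping cited above
CPU_AND_MISC_WATTS = 1400  # assumed allowance for CPUs, NICs, fans (illustrative)

server_watts = GPU_WATTS * GPUS_PER_SERVER + CPU_AND_MISC_WATTS
print(f"Per server: {server_watts / 1000:.1f} kW")

# How quickly a rack blows past the air-cooling range:
for servers_per_rack in (3, 4, 5):
    rack_kw = servers_per_rack * server_watts / 1000
    cooling = "air" if rack_kw <= 20 else "liquid"
    print(f"{servers_per_rack} servers -> {rack_kw:.1f} kW per rack ({cooling})")
```

Even with conservative assumptions, three such servers already push a rack past the 20kW air-cooling comfort zone.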
The paper’s authors emphasize that stakeholders, from AI startups and large enterprises to data center operators, must assess how these densities will affect their infrastructure.
Schneider forecasts that global data center power usage will reach 54GW this year, rising to 90GW by 2028. AI processing currently accounts for 8% of that total and is expected to grow to 15-20% by 2028.
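Putting those forecast percentages into absolute terms makes the growth clearer; the calculation below uses only the figures reported above.

```python
# Schneider's forecast as reported: 54 GW total this year, 90 GW by 2028,
# with AI's share growing from 8% to 15-20%.
total_now_gw, total_2028_gw = 54, 90
ai_share_now = 0.08
ai_share_2028 = (0.15, 0.20)

ai_now = total_now_gw * ai_share_now                   # AI load today, GW
ai_2028 = [total_2028_gw * s for s in ai_share_2028]   # AI load range in 2028

print(f"AI load today: {ai_now:.1f} GW")
print(f"AI load 2028:  {ai_2028[0]:.1f}-{ai_2028[1]:.1f} GW")
print(f"Growth factor: {ai_2028[0] / ai_now:.1f}x to {ai_2028[1] / ai_now:.1f}x")
```

In other words, AI's absolute power draw would roughly triple to quadruple over five years, even as total data center power less than doubles.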
While power and cooling dominate the discussion, network speed also warrants attention. In AI clusters, every GPU needs a high-throughput network port, but GPU throughput has grown faster than network port speeds. A GPU moving memory data at 900 Gbps paired with a 100 Gbps compute fabric will hit performance bottlenecks. InfiniBand offers a faster alternative, but at a premium.
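The scale of that mismatch is easy to quantify from the figures above. The 10 GB payload in the sketch below is an illustrative assumption, not a number from Schneider's paper.

```python
# Bandwidth mismatch described above: GPU memory traffic at 900 Gbps
# feeding a 100 Gbps network fabric.
gpu_gbps, fabric_gbps = 900, 100
mismatch = gpu_gbps / fabric_gbps
print(f"Fabric is {mismatch:.0f}x slower than the GPU can produce/consume")

# Time to move an assumed 10 GB (80 gigabit) data exchange over each link:
payload_gbits = 10 * 8
print(f"At GPU memory speed: {payload_gbits / gpu_gbps * 1000:.0f} ms")
print(f"Over the fabric:     {payload_gbits / fabric_gbps * 1000:.0f} ms")
```

A 9x gap means the GPU can spend most of a communication-heavy step idle, waiting on the network.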
One potential solution to heat issues is to spatially distribute the hardware. However, this can introduce latency, which hampers performance.
Schneider’s Recommendations

Schneider suggests several measures:
- Transitioning from 120/208V to 240/415V power distribution to reduce the number of circuits within dense racks, and using multiple PDUs to ensure sufficient power.
- Capping air cooling at 20kW per rack; beyond that, liquid cooling is advised. Given that air cooling can handle densities up to around 30kW, Schneider’s limit looks somewhat conservative.
- Among the various liquid cooling approaches, Schneider favors direct liquid cooling, in which a cold plate (typically copper) mounted on the processor absorbs the heat. The company appears skeptical of immersion cooling, citing potential environmental concerns.
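The case for higher-voltage distribution in the first recommendation comes down to current draw: for the same rack power, higher voltage means lower current, so fewer circuits and smaller conductors per rack. The 30kW rack load below is taken from the density discussion earlier; the calculation itself is illustrative, not from Schneider's paper.

```python
import math

RACK_KW = 30  # a liquid-cooling-class rack, per the density figures above

def three_phase_amps(kw, line_to_line_v, power_factor=1.0):
    """Line current for a balanced three-phase load: I = P / (sqrt(3) * V * pf)."""
    return kw * 1000 / (math.sqrt(3) * line_to_line_v * power_factor)

for volts in (208, 415):
    amps = three_phase_amps(RACK_KW, volts)
    print(f"{RACK_KW} kW at {volts} V three-phase -> {amps:.0f} A per phase")
```

Halving the per-phase current roughly doubles how much load each circuit can carry, which is why the 240/415V scheme needs fewer circuits in a dense rack.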
Schneider also underscores the lack of standardization in liquid cooling, which makes a thorough infrastructure evaluation by experienced professionals essential. Liquid cooling is typically installed when a data center is built, rather than added later.