Elon Musk spent roughly $10 billion on AI training hardware in 2024
Enough to be competitive?
Tesla and xAI, Elon Musk's companies, will bring online $10 billion worth of AI training compute by the end of this year, as noted by Sawyer Merritt, a co-founder of TwinBirch and a Tesla investor. Even so, both companies will probably fall somewhat short of the schedule Elon Musk himself set.
Elon Musk and his companies have been making frequent announcements about AI supercomputers lately, and the sums involved are indeed enormous.
In July, xAI began AI training on the Memphis Supercluster, which is set to integrate 100,000 liquid-cooled H100 GPUs. This system requires a gargantuan amount of power, drawing at least 150 MW, as the 100,000 H100 GPUs alone account for around 70 MW. The system's total cost is unknown, though the GPUs alone would cost around $2 billion (if bought at $20,000 per unit), and AI GPUs typically account for about half of the cost of a complete system.
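The rule of thumb above can be turned into a back-of-envelope estimate. Note that both the $20,000 unit price and the 50% GPU share of total system cost are the article's assumptions, not confirmed figures:

```python
# Back-of-envelope cost estimate for the Memphis Supercluster,
# using the article's assumptions: $20,000 per H100 and GPUs
# making up roughly half of a complete system's cost.
gpu_count = 100_000
gpu_unit_price = 20_000                        # USD, assumed price per H100
gpu_cost = gpu_count * gpu_unit_price          # $2.0 billion in GPUs alone
gpu_share_of_system = 0.5                      # GPUs ~half of total system cost
system_cost = gpu_cost / gpu_share_of_system   # ~$4.0 billion for the full system

print(f"GPU cost:    ${gpu_cost / 1e9:.1f}B")
print(f"System cost: ${system_cost / 1e9:.1f}B")
```

Under those assumptions, the complete Memphis system would land in the $4 billion range before accounting for the facility itself.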
In late August, Tesla unveiled its Cortex AI cluster, equipped with an impressive 50,000 Nvidia H100 GPUs and 20,000 of Tesla's own wafer-sized Dojo AI chips. The Dojo cluster is projected to train Tesla's Full Self-Driving (FSD) software, so this machine is strategically vital for the company.
As for costs, we are talking about roughly $2 billion for the H100-based machine and at least $1 billion for the Dojo supercomputer. That billion could well be an underestimate, as Dojo machines are entirely custom-designed. For example, each Dojo D1 cabinet consumes more than 200 kW (for context, each Nvidia GB200 NVL72 rack is expected to consume around 120 kW) and therefore requires a fully custom cooling distribution unit (CDU) and power delivery, which drives up its cost dramatically.
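To illustrate the density gap described above, a quick comparison of per-cabinet power draw (both figures are the ones cited in this article; the ratio is the only derived value, and the rack the article calls "B200 NVL72" is presumably Nvidia's GB200 NVL72):

```python
# Per-cabinet power draw, as cited in the article.
dojo_d1_cabinet_kw = 200   # Tesla Dojo D1 cabinet: more than 200 kW
gb200_nvl72_kw = 120       # Nvidia GB200 NVL72 rack: expected ~120 kW

# A Dojo cabinet draws roughly 1.7x the power of Nvidia's densest
# announced rack, which is why off-the-shelf cooling won't do.
ratio = dojo_d1_cabinet_kw / gb200_nvl72_kw
print(f"Dojo D1 cabinet draws ~{ratio:.1f}x a GB200 NVL72 rack")
```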
Finally, in early September, xAI began operating its Colossus supercomputer, which already integrates 100,000 H100 GPUs and is expected to add 50,000 H100 and 50,000 H200 GPUs in the coming months. This giant AI supercomputer also costs billions.
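Extending the same rough math to Colossus at full build-out gives a sense of what "billions" means here. All prices are assumed, and H200s almost certainly cost more than H100s, so treat this as a floor:

```python
# Rough floor estimate for Colossus at full build-out, reusing the
# article's assumptions: $20,000 per GPU, GPUs ~half of system cost.
h100_installed = 100_000
h100_planned = 50_000
h200_planned = 50_000   # H200s priced as H100s here, so this is a floor

total_gpus = h100_installed + h100_planned + h200_planned   # 200,000 GPUs
gpu_cost = total_gpus * 20_000        # ~$4.0 billion in GPUs alone
system_cost = gpu_cost / 0.5          # ~$8.0 billion for the full system

print(f"{total_gpus:,} GPUs, ~${system_cost / 1e9:.0f}B system cost floor")
```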
In all, xAI and Tesla have announced spending of well over $10 billion on AI hardware this year. Of course, it will take time before all of those AI servers are installed and brought online, so we can only guess at the total cost of the AI hardware the two companies actually put into operation in 2024.
But the most ironic thing about this enormous spending is that it seems to fall somewhat short of the ambitious plan Elon Musk outlined this April, when he said that Tesla alone would spend $10 billion on AI hardware this year.
"Tesla will spend around $10 billion this year on combined training and inference AI, the latter being primarily in car," Musk wrote in an X post. "Any company not spending at this level, and doing so efficiently, cannot compete."
While Tesla's Cortex AI cluster is certainly a costly endeavor that will likely get more expensive over time should the company decide to install more Dojo or Nvidia-based machines, we doubt that it costs significantly more than, say, $5 billion. As for AI inference hardware in cars, we cannot imagine that the AI compute hardware in the vehicles Tesla is set to produce this year costs $5 billion either.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
Giroro: Don't worry, they'll only have to charge their remaining X users a measly $30 per month, and they'll be able to break even on their initial investment in just a few short decades. What could possibly go wrong?

hotaru251: sure is enjoyable seeing rich people (especially musk) throwing $ down drain that they wont ever recover xD

JRStern: Probably all repurposed within three years to generate AI pr0n, but at least that way it will turn a profit.

jed351: I still don't understand why Tesla needs that much compute to train a computer vision model that runs on a 30W ASIC with 8GB RAM designed in 2019