Meta is using more than 100,000 Nvidia H100 AI GPUs to train Llama-4 — Mark Zuckerberg says that Llama 4 is being trained on a cluster “bigger than anything that I’ve seen”
Llama 4 slated to have new modalities, stronger reasoning, and faster performance
Mark Zuckerberg said on a Meta earnings call earlier this week that the company is training Llama 4 models “on a cluster that is bigger than 100,000 H100 AI GPUs, or bigger than anything that I’ve seen reported for what others are doing.” While the Facebook founder didn’t give any details on what Llama 4 could do, Wired quoted Zuckerberg describing Llama 4 as having “new modalities” and “stronger reasoning,” and being “much faster.” This is a crucial development as Meta competes against other tech giants like Microsoft, Google, and Musk’s xAI to develop the next generation of AI LLMs.
Meta isn’t the first company to have an AI training cluster with 100,000 Nvidia H100 GPUs. Elon Musk fired up a similarly sized cluster in late July, calling it a ‘Gigafactory of Compute’ with plans to double its size to 200,000 AI GPUs. However, Meta stated earlier this year that it expects to have over half a million H100-equivalent AI GPUs by the end of 2024, so it likely already has a significant number of AI GPUs running for training Llama 4.
Meta is taking a unique approach to developing AI with Llama 4, as it releases its Llama models entirely for free, allowing other researchers, companies, and organizations to build upon them. This differs from other models like OpenAI’s GPT-4o and Google’s Gemini, which are only accessible via an API. However, the company still places limitations on Llama’s license, like restricting its commercial use, and it does not disclose how the models were trained. Nevertheless, Llama’s “open source” nature could help it dominate the future of AI — we’ve already seen Chinese AI models built off open-source code match GPT-4o and Llama 3 in benchmark tests.
Power consumption concerns
All this computing power creates a massive energy demand, especially as a single modern AI GPU can consume up to 3.7 MWh of electricity annually. That means a 100,000-GPU cluster would use at least 370 GWh per year — enough to power over 34 thousand average American households. This raises concerns about how these companies can secure such massive power supplies, especially as bringing new power sources online takes time. After all, even Zuckerberg himself has said that power constraints will limit AI growth.
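Checking the arithmetic is straightforward. The sketch below assumes the article’s 3.7 MWh-per-GPU annual figure and an average US household consumption of roughly 10.8 MWh per year (the EIA’s commonly cited average — an assumption not sourced in the article):

```python
# Rough sanity check of the cluster energy math.
# Assumptions (not from the article's own sourcing):
#   - 3.7 MWh/year per modern AI GPU, as stated above
#   - ~10.8 MWh/year average US household consumption (EIA estimate)
GPU_ANNUAL_MWH = 3.7
NUM_GPUS = 100_000
US_HOUSEHOLD_ANNUAL_MWH = 10.8

cluster_gwh = GPU_ANNUAL_MWH * NUM_GPUS / 1_000          # MWh -> GWh
households = GPU_ANNUAL_MWH * NUM_GPUS / US_HOUSEHOLD_ANNUAL_MWH

print(f"Cluster consumption: {cluster_gwh:.0f} GWh/year")
print(f"Equivalent US households: {households:,.0f}")
```

Under these assumptions the cluster draws about 370 GWh per year, equivalent to roughly 34,000 US households — thousands of homes, not millions.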
For example, Elon Musk used several large mobile power generators to power his 100,000-GPU cluster in Memphis. Google has been slipping behind its carbon targets, increasing its greenhouse gas emissions by 48% since 2019. Even the former Google CEO suggested we should drop our climate goals, let AI companies go full tilt, and then use the AI technologies we’ve developed to solve the climate crisis.
However, Meta executives dodged the question when an analyst asked them how the company was able to power such a massive computing cluster. On the other hand, Meta’s AI competitors, like Microsoft, Google, Oracle, and Amazon, are jumping on the nuclear bandwagon. They’re either investing in small modular reactors or restarting old nuclear plants to ensure they will have enough electricity to power their future developments.
While these will take time to develop and deploy, giving AI data centers their own small nuclear plants would help reduce the burden of these power-hungry clusters on the national power grid.
Jowi Morales is a tech enthusiast with years of experience working in the industry. He’s been writing with several tech publications since 2021, where he’s been interested in tech hardware and consumer electronics.
Hoid: Please double check the math on power consumption. I assume it is meant to say 34 thousand homes.
DooMMasteR (replying to Hoid):
Blackwell needs 140 kW per rack of 72 GPUs. At least in the EU, a 3–4 person household consumes about 400–450 W on average. That would be:
140 kW / 72 = 1.94 kW per GPU
100,000 × 1.94 kW = 194,000 kW
194,000 kW / 0.45 kW per household ≈ 430,000 households
Yes, this stuff consumes insane amounts of power. US households probably consume a lot more power, so the count would probably be lower.
usertests: If they spend all that money and release it as an open model like Llama 3, then I won't be complaining.
P.Amini: And we all know nuclear power plants are not safe and can lead to disasters, so it can be the worst choice.
usertests (replying to P.Amini): Newer nuclear power designs are safer than old ones. Alternatives like thorium-based designs are meltdown-proof, but have been complete vaporware as far as I know.
misterx87 (replying to DooMMasteR): Yes, but they were talking about H100s. Anyway, 430,000 is still about 100 times less than the quoted 34 million households. It always makes me lose trust when the numbers are so obviously wrong. Someone who knows what they are talking about usually notices when something is off by more than a factor of 2 or so, and someone like me who knows nothing about these things will still notice when it's off by a factor of 10. I understand that mistakes can happen when doing calculations, but when they are huge factors like here, anyone should notice. As a car guy, if someone told me a new sports car does the quarter mile in 0.01 s, I would at least double check.
SirFlickka: Well, until the AI becomes non-human it will never progress past unethical trauma. Why does a bot say "no, we can't breathe in space" when a program has no breath? Until such fallacies are removed from training, any LLM or reasoning producing responses that violate computational rigor must be handled responsibly; whatever each training run consists of requires full responsibility, without negligent training.
SirFlickka (replying to dimar, who asked: "I wonder how will they recycle all the stuff when new upgrades arrive?"):
Trash to them, or donate them, or sell them, as no one would want super-used old GPUs, well, unless at 1/100 the original price. I see servers from large companies end up on pallets at auctions, or being disassembled for either high-end logic boards or GPUs, where 4–8 Nvidia K24s would take the same space as 12 4TB drives. And you know that as soon as training was done, they would buy, or have already bought, 250,000 newer units to proceed. As the Google guy said in a Stanford meeting, they only think about money and need more power each consecutive year. So think: if it's 24 million homes, it's a global event, and people don't think that way, only of their own use cases.