DeepSeek might not be as disruptive as claimed, firm reportedly has 50,000 Nvidia GPUs and spent $1.6 billion on buildouts
The fabled $6 million was just a portion of the total training cost.
Chinese startup DeepSeek recently took center stage in the tech world with the startlingly low compute requirements it reported for its advanced AI model, R1, which is believed to be competitive with OpenAI's o1 despite the company's claim that it took only $6 million and 2,048 GPUs to train. However, industry analyst firm SemiAnalysis reports that the company behind DeepSeek incurred $1.6 billion in hardware costs and operates a fleet of 50,000 Nvidia Hopper GPUs, a finding that undermines the idea that DeepSeek reinvented AI training and inference with dramatically lower investments than the leaders of the AI industry.
DeepSeek operates an extensive computing infrastructure with approximately 50,000 Hopper GPUs, the report claims. This includes 10,000 H800s and 10,000 H100s, with additional purchases of H20 units, according to SemiAnalysis. These resources are distributed across multiple locations and serve purposes such as AI training, research, and financial modeling. The company's total capital investment in servers is around $1.6 billion, with an estimated $944 million spent on operating costs, according to SemiAnalysis.
DeepSeek captured the AI world's attention when it disclosed the minuscule hardware requirements of its DeepSeek-V3 Mixture-of-Experts (MoE) AI model, which are vastly lower than those of U.S.-based models. DeepSeek then shook the high-tech world with its OpenAI-competitive R1 model. However, the reputable market intelligence company SemiAnalysis has revealed findings that indicate the company holds some $1.6 billion worth of hardware investments.
DeepSeek originates from High-Flyer, a Chinese hedge fund that adopted AI early and heavily invested in GPUs. In 2023, High-Flyer launched DeepSeek as a separate venture solely focused on AI. Unlike many competitors, DeepSeek remains self-funded, giving it flexibility and speed in decision-making. Despite claims that it is a minor offshoot, the company has invested over $500 million into its technology, according to SemiAnalysis.
A major differentiator for DeepSeek is its ability to run its own data centers, unlike most other AI startups that rely on external cloud providers. This independence allows for full control over experiments and AI model optimizations. In addition, it enables rapid iteration without external bottlenecks, making DeepSeek highly efficient compared to traditional players in the industry.
Then there is something that one would not expect from a Chinese company: talent acquisition from mainland China, with no poaching from Taiwan or the U.S. DeepSeek exclusively hires from within China, focusing on skills and problem-solving abilities rather than formal credentials, according to SemiAnalysis. Recruitment efforts target institutions like Peking University and Zhejiang University, offering highly competitive salaries. According to the research, some AI researchers at DeepSeek earn over $1.3 million, exceeding compensation at other leading Chinese AI firms such as Moonshot.
Thanks to this talent inflow, DeepSeek has pioneered innovations like Multi-Head Latent Attention (MLA), which required months of development and substantial GPU usage, SemiAnalysis reports. DeepSeek emphasizes efficiency and algorithmic improvements over brute-force scaling, reshaping expectations around AI model development. This approach has led some to believe that rapid advancements may reduce the demand for high-end GPUs, impacting companies like Nvidia.
A recent claim that DeepSeek trained its latest model for just $6 million has fueled much of the hype. However, this figure refers only to a portion of the total training cost, specifically the GPU time required for pre-training. It does not account for research, model refinement, data processing, or overall infrastructure expenses. In reality, DeepSeek has spent well over $500 million on AI development since its inception. Unlike larger firms burdened by bureaucracy, DeepSeek's lean structure enables it to push forward aggressively in AI innovation, SemiAnalysis believes.
DeepSeek's rise underscores how a well-funded, independent AI company can challenge industry leaders. However, much of the public discourse has been driven by hype, and the reality is more complex: SemiAnalysis contends that DeepSeek's success is built on billions of dollars in strategic investment, technical breakthroughs, and a competitive workforce. In other words, there are no miracles. As Elon Musk noted a year or so ago, being competitive in AI requires spending billions of dollars per year, which is roughly the range of what DeepSeek has reportedly spent.
Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
JTWrenn First rule of tech when dealing with Chinese companies: they are part of the state, and the state has a vested interest in making the USA and Europe look bad. Triple-check their numbers. Do the same for Elon.
alrighty_then I'm not shocked, but didn't have enough confidence to buy more NVIDIA stock when I should have. Now Monday morning will be a race to sell airline stocks and buy some big green before everyone else does.
quorm This is just cope aiming to protect the inflated value of "AI" companies. It doesn't really matter how many GPUs they have or their parent company has. The real disruptive part is releasing the source and weights for their models.
JTWrenn
alrighty_then said: I'm not shocked but didn't have enough confidence to buy more NVIDIA stock when I should have. Now Monday morning will be a race to sell airline stocks and buy some big green before everyone else does.

I think any big moves now are just impossible to get right. I am in a holding pattern for new investments, and will just put them into something interest-bearing for probably a few months, and let the rest ride. No way to guess right on this roller coaster.
I do think the reactions really show that people are worried it is a bubble, whether it turns out to be one or not.
phead128 $1.6 billion is still significantly cheaper than the entirety of OpenAI's budget to produce 4o and o1.
The exact dollar amount doesn't really matter; it's still significantly cheaper, so the overall spend for the $500 billion StarGate or the $65 billion Meta mega farm cluster is wayyy overblown.
Plus, the key part is that it's open sourced, and future fancy models will simply be cloned/distilled by DeepSeek and made public. With the "commoditization" of AI LLMs beyond the very top-end models, it really degrades the justification for the super mega farm builds.
palladin9479 Ehh, this is kinda mixing up two different sets of numbers. Those GPUs don't explode once the model is built; they still exist and can be used to build another model. The $6 million number was how much compute/power it took to build just that program. Building another one would be another $6 million and so forth; the capital hardware has already been purchased, and you are now just paying for the compute/power. Most models at places like Google/Amazon/OpenAI cost tens of millions worth of compute to build, and this isn't counting the billions in hardware costs.
purposelycryptic The fact that the hardware requirements to actually run the model are so much lower than current Western models was always the aspect that was most impressive from my perspective, and likely the most important one for China as well, given the restrictions on acquiring GPUs they have to work with.
Being that much more efficient opens up the option for them to license their model directly to companies to use on their own hardware, rather than selling usage time on their own servers, which has the potential to be quite attractive, particularly for those keen on keeping their data and the specifics of their AI model usage as private as possible. And once they invest in running their own hardware, they are likely to be reluctant to waste that investment by going back to a third-party access seller.
I guess it mostly depends on whether they can demonstrate that they can continue to churn out more advanced models in pace with Western companies, especially with the difficulties in acquiring newer generation hardware to build them with; their current model is certainly impressive, but it feels more like it was intended as a way to plant their flag and make themselves known, a demonstration of what can be expected of them in the future, rather than a core product.
So, I guess we'll see whether they can repeat the success they've demonstrated - that would be the point where Western AI developers should start soiling their trousers.
Either way, ever-growing GPU power will continue to be necessary to actually build/train models, so Nvidia should keep rolling without too much issue (and maybe finally start seeing a proper jump in valuation again), and hopefully the market will once again recognize AMD's importance as well. Ideally, AMD's AI systems will finally be able to offer Nvidia some proper competition, since Nvidia has really let itself go in the absence of a proper competitor; but with the advent of lighter-weight, more efficient models, and the status quo of many corporations just automatically going Intel for their servers finally slowly breaking down, AMD really needs to see a more fitting valuation.
phead128
palladin9479 said: Ehh, this is kinda mixing up two different sets of numbers. Those GPUs don't explode once the model is built; they still exist and can be used to build another model. The $6 million number was how much compute/power it took to build just that program. Building another one would be another $6 million and so forth; the capital hardware has already been purchased, you are now just paying for the compute/power. Most models at places like Google/Amazon/OpenAI cost tens of millions worth of compute to build, and this isn't counting the billions in hardware costs.

Well said.
The $6 million is the "variable" cost, whereas the $1.6 billion is the "fixed cost."
One thing to note: it's 50,000 Hoppers (older H20s and H800s) to make DeepSeek, whereas xAI needs 100,000 H100s to make Grok, and Meta needs 100,000 H100s to make Llama 3. So even if you compare fixed costs, DeepSeek needs 50% of the fixed costs (and less efficient GPUs) for 10-20% better performance in their models, which is a hugely impressive feat.
So even if you account for the higher fixed cost, DeepSeek is still cheaper in overall direct costs (variable AND fixed).
One thing that people don't understand is that no matter what model OpenAI publishes, DeepSeek will distill the output and make it free/publicly available (V3 is distilled 4o, R1 is distilled o1, and they are going to clone o3, etc.). So 90% of the AI LLM market will be "commoditized," with the remainder occupied by very top-end models, which inevitably will be distilled as well. OpenAI's only "hail mary" to justify the enormous spend is trying to reach "AGI," but can that be an enduring moat if DeepSeek can also reach AGI and make it open source?
aero1x Look, I'm no genius, nor do I understand all the implications. But when I saw these facts - 1) claims of a hilariously paltry budget, 2) AI performance conveniently similar to that of ChatGPT's o1, and 3) from a rando Chinese financial company turned AI company - the LAST thing I thought was woowww, major breakthrough. Are there innovations? Yes. More like innovations on how to copy and build off others' work, potentially illegally. Oh, and this just so happens to be what the Chinese are historically good at.
I saw the reactions of ppl losing their sht, though.. damn, ppl are really not as smart/informed as I assumed them to be. Then you noticed the CCP bots in droves all over.. so obvious. Also a red flag.
I'm Chinese, raised in North America. My mom LOVES China (and the CCP, lol), but damn, guys, you gotta see things clearly through non-Western eyes. Get it through your heads: how do you know when China's lying? When they're saying gddamn anything. It's just the facts and how they operate.