AMD claims RX 7900 XTX outperforms RTX 4090 in DeepSeek benchmarks
Nvidia usually has better AI performance, but with DeepSeek AI, the tables have turned (according to AMD)
AMD has provided benchmarks of its flagship RX 7900 XTX going head to head against the Nvidia RTX 4090 and RTX 4080 Super with DeepSeek's AI model. According to David McAfee on X, the RDNA3-based GPU outperformed the RTX 4090 by up to 13% and the RTX 4080 Super by up to 34%.
AMD tested the three GPUs using several DeepSeek R1 distilled models of different parameter counts. The RX 7900 XTX saw its biggest victory against the RTX 4090 using DeepSeek R1 Distill Qwen 7B, where it outperformed the Ada Lovelace GPU by 13%. AMD also tested three other LLM configurations against the RTX 4090. The RX 7900 XTX outperformed the RTX 4090 in two of the three: it was 11% faster using Distill Llama 8B and 2% faster using Distill Qwen 14B. The RTX 4090 was 4% faster than the RX 7900 XTX in the remaining configuration, Distill Qwen 32B.
DeepSeek performing very well on @AMDRadeon 7900 XTX. Learn how to run on Radeon GPUs and Ryzen AI APUs here: https://t.co/FVLDLJ18Ov pic.twitter.com/5OKEkyJjh3 (January 29, 2025)
AMD tested three configurations against the RTX 4080 Super. The RX 7900 XTX outperformed the RTX 4080 Super by 34% using DeepSeek R1 Distill Qwen 7B. This lead dropped to 27% using Distill Llama 8B, and 22% using Distill Qwen 14B.
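For readers who want to compare the figures side by side, here is a minimal Python sketch that collects the leads AMD reported and normalizes them so the Nvidia card in each matchup equals 1.0. The names `REPORTED_LEAD_PCT`, `relative_throughput`, and `summarize` are illustrative, not anything AMD published. The sketch assumes each "X% faster" figure is a simple throughput ratio, since no raw tokens-per-second numbers are given here.

```python
# Leads AMD reported for the RX 7900 XTX, in percent, per DeepSeek R1 distill.
# Positive = RX 7900 XTX faster; negative = the Nvidia card faster.
# AMD tested only three configurations against the RTX 4080 Super.
REPORTED_LEAD_PCT = {
    "RTX 4090": {
        "Distill Qwen 7B": 13,
        "Distill Llama 8B": 11,
        "Distill Qwen 14B": 2,
        "Distill Qwen 32B": -4,
    },
    "RTX 4080 Super": {
        "Distill Qwen 7B": 34,
        "Distill Llama 8B": 27,
        "Distill Qwen 14B": 22,
    },
}


def relative_throughput(lead_pct: float) -> float:
    """Return RX 7900 XTX throughput with the Nvidia card normalized to 1.0.

    Assumption: a negative lead means the Nvidia card was that much faster,
    so the XTX runs at 1 / (1 + |lead|) of the Nvidia card's speed.
    """
    if lead_pct >= 0:
        return 1 + lead_pct / 100
    return 1 / (1 + abs(lead_pct) / 100)


def summarize(leads):
    """Flatten the nested dict into (rival, model, ratio) rows."""
    rows = []
    for rival, by_model in leads.items():
        for model, pct in by_model.items():
            rows.append((rival, model, round(relative_throughput(pct), 3)))
    return rows


if __name__ == "__main__":
    for rival, model, ratio in summarize(REPORTED_LEAD_PCT):
        print(f"{model:<18} vs {rival:<15} {ratio:.3f}x")
```

Under that reading, the RTX 4090's 4% win on Distill Qwen 32B puts the RX 7900 XTX at roughly 0.96x of the Nvidia card's speed, a far smaller gap than the AMD card's leads on the smaller models.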
This should all be taken with a pinch of salt, of course, as we can't be sure how the Nvidia GPUs were configured for the tests (which, again, were run by AMD). Not all AI workloads take advantage of a GPU's full computational throughput. We saw this in our Stable Diffusion tests, where Stable Diffusion did not use FP8 calculations or TensorRT code for processing.
It's not common for the RX 7900 XTX to be used as a dedicated AI processor, but the architecture is more than capable of processing AI workloads. The RDNA 3 architecture the RX 7900 XTX is based on is capable of matrix operations, supporting BF16 and INT8. AMD officially added the "AI Accelerator" terminology to RDNA 3 to demonstrate its AI-processing prowess. The RX 7900 XTX features 192 AI accelerators.
AMD recently published a tutorial on how its customers can get DeepSeek R1 to run on compatible AMD consumer-based hardware, including the RX 7900 XTX. DeepSeek R1 is a new AI model that offers performance comparable to Western leading-edge AI models, but at a fraction of the computing cost. DeepSeek R1 uses an assortment of hardware-based optimizations to make its model run 11X faster than its competitors, including using Nvidia's assembly-like PTX programming language.
Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.
Neilbob
Our AI thing AIs more AI than the AI thing of the other AI with this AI.
I'm just so very weary of everything. And AI - I'm weary of that too.
phxrider
In other news, Congress begins talks on embargoing exports of the 7900XTX to China..... 😂
Maybe this illustrates the differences when software is explicitly written for one or the other? I'm pretty sure most games are written for Nvidia seeing as they own something like 85% of the market, except there are a few AMD-sponsored games that do much better on AMD. Could this be the same effect? It's not too hard to figure out the Chinese might try writing this for AMD since the 4090 is embargoed, and 7900XTX is not.
Makaveli
The article said: "This should all be taken with a pinch of salt, of course, as we can't be sure how the Nvidia GPUs were configured for the tests (which, again, were run by AMD)."
I was testing this in LM Studio last week in the LM Studio Discord with a 4090 user. There is no grain of salt needed; it's been verified.
On the 7B, 8B, and 14B models, the XTX is faster. The 4090 is a little faster on the 32B model, by about 4%.
systemBuilder_49
I can hear the howls of anguish as all the NVidia-buyers LOSE THEIR MINDS over this fact ...
bit_user
The article said: "This should all be taken with a pinch of salt, of course, as we can't be sure how the Nvidia GPUs were configured for the tests (which, again, were run by AMD)."
It's plausible, since the 7900 XTX has about the same memory bandwidth as the RTX 4090 and better bandwidth from L2 and L3 caches. So, if inferencing these models is bandwidth-limited and not compute-bound, then I could believe the 7900 XTX is holding its own against that GPU.
I didn't find an official number indicating how many TOPS the 7900 XTX is good for, but the number 123 did pop up. That is only 37% of Nvidia's dense TOPS (and half that ratio for matrices with optimal sparsity).
Source: https://chipsandcheese.com/p/microbenchmarking-nvidias-rtx-4090
Source: https://chipsandcheese.com/p/microbenchmarking-amds-rdna-3-graphics-architecture
The article said: "The RDNA 3 architecture the RX 7900 XTX is based on is capable of matrix operations, supporting BF16 and INT8."
It turns out that the WMMA instructions in RDNA 3 are simply microcoded operations that utilize the same vector pipelines as normal shader arithmetic. So, RDNA 3 does not have something akin to Nvidia's Tensor cores in its client GPUs (the CDNA-based server chips do have dedicated Matrix units, however).
Amdlova
AMD trying to profit... US authority sign to block the GPU from AMD to Chinese market. Good move AMD.