Nvidia gaming GPUs modded with 2X VRAM for AI workloads — RTX 4090D 48GB and RTX 4080 Super 32GB go up for rent at Chinese cloud computing provider

GeForce RTX 4090 (Image credit: Nvidia)

AI enthusiast 青龍聖者 has discovered two fascinating graphics cards in China: the GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB. The mysterious SKUs are clearly modified versions of the GeForce RTX 4090D and GeForce RTX 4080 Super, which contend with the best graphics cards.

The GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB are available for rent at AutoDL, a Chinese cloud computing provider that rents servers for AI work. Pricing is a steal. You can rent a single GeForce RTX 4080 Super 32GB for $0.03 hourly. However, the service is currently restricted to China, as you need a Chinese phone number to sign up.

As the model names indicate, the GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB have double the memory of their regular versions. More VRAM is beneficial for AI workloads. The graphics cards' core specifications should remain unaltered, although the additional memory likely raises their TDP marginally.

We've seen our fair share of mods where users resourcefully upgrade Nvidia's GeForce gaming graphics cards with more memory, such as the GeForce RTX 3070 16GB. Nonetheless, this may be the first time we've seen the same level of work done on Nvidia's latest GeForce RTX 40-series (Ada Lovelace) graphics cards. The procedure sounds easy because you only have to remove the existing GDDR6X memory modules from the graphics card and solder on new, higher-capacity ones. However, it does require some skill to perform.


The regular GeForce RTX 4090D has 24GB of GDDR6X memory distributed across twelve 2GB GDDR6X memory modules, whereas the GeForce RTX 4080 Super wields 16GB of GDDR6X made up of eight chips of the same capacity. To achieve twice the memory with the same number of chips, the vendor would have to replace the 2GB GDDR6X memory modules with 4GB ones. The problem is that 4GB GDDR6X/GDDR6 memory modules don't exist.

Therefore, the GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB are likely utilizing a custom PCB that allows the vendor to place GDDR6X memory modules on both sides of the board. This approach resembles the one Nvidia takes on its more premium Titan and RTX professional graphics cards.
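The module arithmetic behind that reasoning can be sketched in a few lines of Python. This is only an illustration using the capacities quoted above; the function names are hypothetical, and the even front/back split is an assumption about how a double-sided board might be populated, not a confirmed detail of these cards.

```python
# Rough sketch of the module math, using figures quoted in the article.
MODULE_GB = 2  # largest GDDR6X module density mentioned above


def modules_needed(total_gb: int, module_gb: int = MODULE_GB) -> int:
    """Return how many modules of module_gb are needed to reach total_gb."""
    if total_gb % module_gb != 0:
        raise ValueError("capacity is not a whole multiple of the module density")
    return total_gb // module_gb


def doubled_layout(stock_gb: int, module_gb: int = MODULE_GB) -> dict:
    """Compare the stock module count with the count needed for 2x capacity."""
    stock = modules_needed(stock_gb, module_gb)
    doubled = modules_needed(stock_gb * 2, module_gb)
    return {
        "stock_modules": stock,
        "doubled_modules": doubled,
        # Assumption: modules are split evenly between the two PCB sides.
        "per_side": doubled // 2,
    }


if __name__ == "__main__":
    stock_cards = {"GeForce RTX 4090D": 24, "GeForce RTX 4080 Super": 16}
    for name, capacity_gb in stock_cards.items():
        layout = doubled_layout(capacity_gb)
        print(f"{name}: {layout['stock_modules']} modules stock, "
              f"{layout['doubled_modules']} modules for {capacity_gb * 2}GB "
              f"({layout['per_side']} per side)")
```

With only 2GB modules available, doubling capacity means doubling the module count, which is why a single-sided board with twelve or eight memory pads would not be enough.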

How AutoDL obtained the GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB remains a mystery, as does who is performing the modification work. However, 青龍聖者 learned that both graphics cards have been in circulation in China since the end of June. The GeForce RTX 4090D 48GB reportedly sells for around $2,500, $685 more expensive than the vanilla GeForce RTX 4090D, which has a 12,999 yuan ($1,815) MSRP.

The GeForce RTX 4090D 48GB and GeForce RTX 4080 Super 32GB may not be the only unusual graphics cards in AutoDL's arsenal. There's reportedly an A100 upgraded to 96GB of HBM2e, up from the 80GB on the regular A100 PCIe variant that Nvidia sells. However, that's a story for another day.


Zhiye Liu is a news editor and memory reviewer at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.

Comments from the forums

Squishynidas

They need to make GPU ram modular like system ram. We keep buying stuff we could easily reuse.
Reply
hotaru251

Squishynidas said:
They need to make GPU ram modular like system ram.
no.
there is a reason its soldered on now-a-days.
Reply
Notton

Squishynidas said:
They need to make GPU ram modular like system ram. We keep buying stuff we could easily reuse.
okay, how are you going to fit it?

Reply
Makaveli

hotaru251 said:
no.
there is a reason its soldered on now-a-days.
And also giving customers easy way to increase VRAM and not making them buy a new product is bad for the bottom line of AMD, NV and Intel.
Reply
Evildead_666

Makaveli said:
And also giving customers easy way to increase VRAM and not making them buy a new product is bad for the bottom line of AMD, NV and Intel.
Back in the day, when this was possible, they were almost all proprietary, and were not cheap...
I remember getting a memory upgrade for my Matrox Mystique...

Also, the memory type changes every couple of generations.

I could see it happening on High end Pro cards, as a memory doubler thing though.
Reply
jlake3

Notton said:
okay, how are you going to fit it?

In addition to physical size, how about the trace length and signal integrity problems?

Also GPU memory controllers don't work like CPU memory controllers, so if it were possible you'd probably end up with something like a single CAMM socket rather than multiple DIMM slots, and because the bus widths and number of memory modules per GPU aren't constant, you'd need different daughterboards for each configuration.
Reply
toaste

Notton said:
okay, how are you going to fit it?

You could do it with something like CAMM on the back side of the board, but only if you dropped the GDDR speed dramatically. Nobody would like the crippling performance drop the GDDR bandwidth reduction would bring with it.

Makaveli said:
And also giving customers easy way to increase VRAM and not making them buy a new product is bad for the bottom line of AMD, NV and Intel.
Yeah, that's not a concern. If it were possible, then Intel or AMD would do it. They're competing for like 2% and 10% of the market, and increased longevity would be massively offset by the piles of money afforded by taking a chunk of Nvidia's pie.

It's not possible because the GPU relies on increasing memory bandwidth to keep up with increasing shader, vertex, or texture processing.

The most extreme DDR5 overclocking is at around DDR5 7200 or so. So 7.2Gbit/s. It pumps MCLK at a languid 3.6GHz. Pathetic.

Your GDDR6 gpu is pumping the WCK pin at 8 or 9 GHz to hit 16 or 18Gb/s. GDDR6X uses PAM4 so it's "only" pushing around 5GHz.

The PCB routing is kept physically as short as possible, and arranging it so the traces are all EXACTLY the same length and impedance is critical. You simply cannot get a signal through a PCB at that rate with a mechanical connector in the way.
Reply
Lucky_SLS

This is where tiered memory comes into play. Intel already has this in its Lunar lake. Consider 16 gigs of GDDR7 memory and supplement the rest with CAMM2. This will still have performance penalty, but at least you are no longer limited by memory.
Reply
nocturn9x

Lucky_SLS said:
This is where tiered memory comes into play. Intel already has this in its Lunar lake. Consider 16 gigs of GDDR7 memory and supplement the rest with CAMM2. This will still have performance penalty, but at least you are no longer limited by memory.
How is that even remotely worth it? Lots of memory is useless if you can't access it fast enough
Reply
Lucky_SLS

Who said this is for gaming or AI? Productivity applications does not benefit the same from fast memory. But memory capacity improves the capability.
Reply
