GB202 die shot beautifully showcases Blackwell in all its glory — GB202 is 24% larger than AD102
GB202 is massive but is still noticeably smaller than Hopper and Volta's flagship GPU dies.
A diagram of Nvidia's flagship GB202 die, which powers the RTX 5090, was shared on X, revealing the component layout of Nvidia's Blackwell architecture. The image, annotated by Kurnal, shows the layout of GB202's primary components, such as the L2 cache, GPCs, SMs, memory controllers, and more. A second image was also shared comparing GB202 to AD102, the die used in Nvidia's previous-gen RTX 4090 flagship.
Looking at the GB202 die shot, the L2 cache resides directly in the middle of the die, split into several 2 MB chunks, forming two 32 MB layers. Surrounding the entire L2 cache pool are 12 graphics processing clusters (GPCs). Inside each GPC reside several texture processing clusters (TPCs) responsible for vertex shading, texture mapping, and rasterization. Inside each TPC are several streaming multiprocessors (SMs) that contain components such as the GPU's CUDA cores and L1 cache. There are 96 TPCs, each allegedly having up to four SMs.
Bordering the L2 cache are 12 Raster engine/3D FF blocks, which convert a scene's geometric representation, such as polygons, into a viewable image on the screen. Running from top to bottom through the middle of the GB202 die are Nvidia's video encoders and decoders (NVENC and NVDEC), responsible for highly efficient video recording and playback.
GB202 Dieshot/5090 Dieshot. Thanks By @ASUS Tony 俞元麟 by Chip, @万扯淡 by Dieshot, @Kurnalsalts Layout. Photo 1: GB202 Dieshot. Photo 2: AD102 vs GB202. Full Pixel Photo: pls join in Kurnal's Telegram team https://t.co/MI6oCa2yOA pic.twitter.com/pny7bvCs5j (January 25, 2025)
To the left and right of the L2 cache are the eight 64-bit memory controllers, and flanking the top, left, and right edges of the die are the 32-bit GDDR7 physical interfaces that connect the GPU to the GDDR7 memory modules. Finally, the PCIe 5.0 x16 physical interface and display controller are on the bottom.
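The memory layout described above implies a total bus width that is easy to check by hand. The following is a minimal sketch of that arithmetic, assuming the controller count and widths given in the annotated die shot. The function names are illustrative, and the 28 Gbit/s per-pin data rate used in the last line is an assumption that does not come from the die shot.

```python
def bus_width_bits(controllers: int, bits_per_controller: int) -> int:
    """Total memory bus width formed by all memory controllers."""
    return controllers * bits_per_controller


def phy_count(total_bits: int, bits_per_phy: int) -> int:
    """Number of PHYs needed to cover the bus, assuming each PHY carries an equal slice."""
    if total_bits % bits_per_phy != 0:
        raise ValueError("bus width is not a multiple of the PHY width")
    return total_bits // bits_per_phy


def peak_bandwidth_gb_s(total_bits: int, gbit_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given per-pin data rate in Gbit/s."""
    return total_bits * gbit_per_pin / 8


# Figures from the annotated die shot: eight 64-bit controllers, 32-bit GDDR7 PHYs.
width = bus_width_bits(8, 64)
print(width)                 # 512
print(phy_count(width, 32))  # 16

# ASSUMPTION: 28 Gbit/s per pin is not stated in the die shot; swap in another rate as needed.
print(peak_bandwidth_gb_s(width, 28.0))  # 1792.0
```

Keeping the data rate as a parameter makes it simple to try other GDDR7 speeds without touching the bus-width figures taken from the die shot.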
The second image gives a better sense of GB202's immense size relative to Ada Lovelace's AD102. GB202 has a die size of 761.56mm², making it 24% larger than AD102, which measures 616mm². Blackwell leverages TSMC's N4P node, a tuned variant of the N4 node used on Ada Lovelace, so it stands to reason that Nvidia had to increase GB202's physical dimensions significantly over AD102 to fit more hardware. If Nvidia had used TSMC's bleeding-edge N3 node, Blackwell chips might have had similar dimensions to Ada Lovelace chips while offering the same performance characteristics; that said, this has not been confirmed.
Regardless, GB202 is not the largest die Nvidia has had to produce; that distinction still belongs to its older Hopper and Volta architectures. GH100 and GV100 (the latter powering the Titan V) feature die sizes of 814mm² and 815mm², respectively.
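The size comparisons can be reproduced from the quoted die areas. Below is a minimal sketch that uses only the four figures cited in this article; the dictionary and function names are illustrative.

```python
# Die areas in mm², as quoted in the article.
DIE_AREAS_MM2 = {
    "GB202": 761.56,
    "AD102": 616.0,
    "GH100": 814.0,
    "GV100": 815.0,
}


def percent_larger(die: str, baseline: str) -> float:
    """How much larger `die` is than `baseline`, as a percentage of the baseline's area."""
    return (DIE_AREAS_MM2[die] / DIE_AREAS_MM2[baseline] - 1.0) * 100.0


# GB202 vs. AD102: about 23.6%, which rounds to the 24% in the headline.
print(round(percent_larger("GB202", "AD102"), 1))  # 23.6

# Dies sorted from largest to smallest.
for name, area in sorted(DIE_AREAS_MM2.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {area} mm²")
```

Sorting the dictionary puts GV100 and GH100 ahead of GB202, matching the ordering described above.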
Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.
bit_user
Somehow, it feels strange to see the cache in big blocks, like that. I guess I'm so accustomed to seeing LLC being more distributed, but it's certainly better if a SM doesn't have to fetch something from the opposite corner of the die, for instance.
Here's the twitter link:
https://x.com/Kurnalsalts/status/1883153126011892140/photo/1
I'd love to know something about its interconnect topology. I wonder if there's any direct GPC-to-GPC communication path, or if all GPC communication has to go through L2.
Also, does anyone know what "3D FF" is referring to?
Heiro78
I found a Spanish forum saying 3D FF are texture engines/RT Cores. For what it's worth. They were looking at AD102 vs GH100:
https://elchapuzasinformatico.com/2022/09/nvidia-rtx-4090-ad102-area-densidad/
bit_user
Thanks! I can believe it, but I'm just trying to make sense of the name. I guess the "FF" is short for Fixed Function (i.e. not programmable)?