Intel Lunar Lake CPU gets die annotation — four Skymont E-cores slightly bigger than one Lion Cove P-core
Intel's latest mobile chip is packed.
Die shots of Intel’s latest Core Ultra 200V (codenamed Lunar Lake) CPUs have been analyzed and annotated, revealing the size and location of the mobile chip’s components.
Photographed by GeekerWan and 万扯淡 and annotated by hardware expert Nemez, the images dive deep into Lunar Lake’s smallest components across the compute, platform controller, and base tiles. The first two tiles are fabbed on TSMC’s 3nm and 6nm nodes, respectively, while the base tile alone is fabbed by Intel on its 22nm node.
Lunar Lake is oriented towards laptops, and the layout of its compute tile reflects that. The quad-core Lion Cove P-core cluster, the NPU, and the Xe2 integrated GPU are all roughly the same size. The media engine, display engine, and memory controller are smaller but still prominent, unlike in higher-end processors, where they’re almost a footnote in size.
Annotation of #Intel LunarLake!
Not much to highlight imo, though I quite like the layout of the iGPU, way less messy than MeteorLake
LionCove is also quite compact relative to its L2 cache compared to RedwoodCove
Pictures by GeekerWan & 万扯淡, provided to me by @Kurnalsalts
pic.twitter.com/YVqqt8gRts
October 3, 2024
Lunar Lake also has a quad-core cluster of Skymont-based E-cores, and the die annotation indicates that the entire cluster is just a little larger than a single Lion Cove core. This isn’t surprising since Meteor Lake’s Redwood Cove P-core is about the same size as its quad-core Crestmont E-core cluster. However, it is notable that Intel was able to keep E-cores small despite Skymont packing 38% higher integer and 68% higher floating point IPC.
Some parts of the annotation are guesses, as made clear by the handful of question marks accompanying some labels. The amount of cache per neural compute engine (NCE) is also an educated guess: Nemez assumes Lunar Lake has 2MB per NCE, just like Meteor Lake, which works out to 12MB across six NCEs.
The platform controller tile isn’t quite as busy as the compute tile, but the annotation does give us a pretty good idea of how much more extensive PCIe 5.0 circuitry is compared to PCIe 4.0, as Lunar Lake has four lanes of both connections. It seems the four PCIe 5.0 lanes plus their logic take up roughly double the space that the PCIe 4.0 lanes and logic use. In addition to PCIe 5.0 SSDs not quite being laptop friendly yet, the physical size of PCIe 5.0 within silicon could be another reason mobile CPUs have taken so long to upgrade to the most recent PCIe version.
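For context on that roughly 2x area estimate, here is a minimal sketch of the usable bandwidth of each four-lane link. The per-lane transfer rates (16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0) and the 128b/130b encoding come from the PCIe specifications, not from the die annotation, and the function name is hypothetical.

```python
# Assumption: per-lane raw rates and 128b/130b encoding per the PCIe 4.0/5.0 specs.
GT_PER_S = {"PCIe 4.0": 16.0, "PCIe 5.0": 32.0}
ENCODING_EFFICIENCY = 128 / 130  # 128 payload bits per 130 bits on the wire


def link_bandwidth_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Approximate one-direction bandwidth of a link in GB/s."""
    usable_gbit_per_s = gt_per_s * lanes * ENCODING_EFFICIENCY
    return usable_gbit_per_s / 8  # bits -> bytes


gen4_x4 = link_bandwidth_gb_per_s(GT_PER_S["PCIe 4.0"], lanes=4)
gen5_x4 = link_bandwidth_gb_per_s(GT_PER_S["PCIe 5.0"], lanes=4)

print(f"PCIe 4.0 x4: {gen4_x4:.2f} GB/s")
print(f"PCIe 5.0 x4: {gen5_x4:.2f} GB/s")
print(f"Bandwidth ratio: {gen5_x4 / gen4_x4:.1f}x")
```

Under these assumptions, the PCIe 5.0 block delivers twice the bandwidth of the PCIe 4.0 block for the same lane count, which lines up with the annotation's suggestion that it occupies about twice the area.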
Matthew Connatser is a freelancing writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.
edzieba
"While big desktop and server CPUs are expected to be mainly laptop-oriented, Lunar Lake is almost equally composed of its constituents."
ChatGPT strikes again! -
Giroro
Oh boy, the article's content is just an embedded link to X (formerly known as Twitter).
I love interacting with that platform, and that link definitely won't be broken within a week. -
thestryker
I'm surprised they've seemingly managed to keep the same E-core proportions when compared to the P-cores given the performance boost. It'll be interesting to see how they perform on ARL given the direct L3 access. Also hoping that they do a new N series even though I don't need to buy more of them and this would likely make me want to. -
bit_user
The article said: "Lunar Lake also has a quad-core cluster of Skymont-based E-cores, and the die annotation indicates that the entire cluster is just a little larger than a single Lion Cove core. This isn’t surprising since Meteor Lake’s Redwood Cove P-core is about the same size as its quad-core Crestmont E-core cluster. However, it is notable that Intel was able to keep E-cores small despite Skymont packing 38% higher integer and 68% higher floating point IPC."
That's because cache plays a big part of it. Without L2 cache, Skymont is 33.2% as big as Lion Cove. Gracemont was about 29.6% as big as Golden Cove, after excluding L2 cache.
In Lunar Lake, both the P-cores and the E-cores got bigger. It's just that the rate of increase in the E-cores' size & complexity was a little higher than that of the P-cores.
Regarding those IPC figures, Chips & Cheese found that Skymont really can't stretch its legs in Lunar Lake. This is due to the entire cluster being implemented as a low-power island, rather than as a proper peer of the P-cores. Here's how they put it:
"Despite massive architecture improvements, Skymont’s performance is hit or miss compared to Crestmont. Lunar Lake’s different cache hierarchy plays a large role in this, and highlights the difficulties in having one core setup play both the low power and multithreaded performance roles. It also highlights the massive role caches play in CPU performance. Even a dramatically improved core can struggle to deliver gains if the cache subsystem doesn’t keep up. That’s especially important with LPDDR5X, which has high latency and can be a handicap in low core count workloads."
https://chipsandcheese.com/2024/10/03/skymont-intels-e-cores-reach-for-the-sky/ -
bit_user
thestryker said: "I'm surprised they've seemingly managed to keep the same E-core proportions when compared to the P-cores given the performance boost."
Well, not quite. Excluding L2 cache, Golden Cove was 5.37 mm^2. Gracemont was 1.59 mm^2. So, that's a ratio of 29.6% in Alder Lake and Raptor Lake, whereas I think Lunar Lake has a ratio of 33.2%.
Source: https://locuza.substack.com/p/die-walkthrough-alder-lake-sp-and
Given how area-intensive cache is, I think we could expect an even greater disparity if we could compare them after excluding L1 (and L0, in Lion Cove's case).
The other thing to keep in mind is that Lion Cove also increased in IPC and complexity. Because Redwood Cove was so big and complex, the relative improvement wasn't as big. Another way of looking at it is that Redwood Cove is probably a lot further down the path of diminishing returns. So, just to get a ~15% IPC increase they had to increase gate count by more than what it would take to make a comparable improvement to Crestmont.
thestryker said: "It'll be interesting to see how they perform on ARL given the direct L3 access. Also hoping that they do a new N series even though I don't need to buy more of them and this would likely make me want to."
I think Intel would probably rather use its own nodes for the low cost, low margin stuff. So, I'd expect we'll get N-series based on Crestmont and using the Intel 3 node. That's just a guess. -
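The Alder Lake ratio quoted in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch: the two areas are the figures given in the comment (excluding L2 cache), the 33.2% Lunar Lake figure is the commenter's own estimate, and the function name is hypothetical.

```python
def area_ratio_percent(e_core_mm2: float, p_core_mm2: float) -> float:
    """Return the E-core area as a percentage of the P-core area."""
    return 100.0 * e_core_mm2 / p_core_mm2


# Areas quoted in the comment, excluding L2 cache.
golden_cove_mm2 = 5.37
gracemont_mm2 = 1.59

alder_lake_ratio = area_ratio_percent(gracemont_mm2, golden_cove_mm2)
print(f"Gracemont vs. Golden Cove: {alder_lake_ratio:.1f}%")

# Commenter's estimate for Skymont vs. Lion Cove (an estimate, not a measurement).
lunar_lake_ratio = 33.2
cluster_vs_p_core = 4 * lunar_lake_ratio
print(f"Four Skymont cores vs. one Lion Cove: {cluster_vs_p_core:.1f}%")
```

If the 33.2% estimate holds, four Skymont cores without their L2 come to about 133% of one Lion Cove core without its L2. That is in the same ballpark as the article's "a little larger" remark, though the article's comparison may include cache.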
thestryker
bit_user said: "Well, not quite. Excluding L2 cache Golden Cove was 5.37 mm^2. Gracemont was 1.59 mm^2. So, that's a ratio of 29.6% in Alder Lake and Raptor Lake, whereas I think Lunar Lake has a ratio of 33.2%."
The client Lion Cove cores also don't contain HT, which saves some area, comparatively speaking. It's also plausible they don't have AVX512, which definitely increased GC/RC core size. I mostly just expected Skymont to end up being larger than it is since Intel is making the E-cores more full featured. -
bit_user
thestryker said: "The client Lion Cove cores also don't contain HT which saves some on area comparatively speaking. It's also plausible they don't have AVX512 which definitely increased GC/RC core size."
I hope it doesn't have AVX-512, since the server cores are already deviating in areas like hyperthreading.
It seems to me the main reason why client cores have had AVX-512, since Ice Lake, was because Intel actually wanted to support it on them. When you consider how the server P-cores already differed from client P-cores in the number of AVX-512 ports, it does seem kind of pointless to integrate it if you really have no intention of enabling it.
thestryker said: "I mostly just expected Skymont to end up being larger than it is since Intel is making the E-cores more full featured."
TBH, I did actually expect them to be closer in size. If it turned out that Skymont were a little more than half the size of Lion Cove, I think I wouldn't have been surprised.
However, upon reflection, it does seem to me that probably a lot of the size difference is simply due to supporting higher clock frequencies in Lion Cove. If we consider how much smaller AMD's C-cores are, which have the exact same microarchitecture as the full-sized ones, Zen 4C is only 64.6% as big as regular Zen 4 (excluding L3). When you consider the additional differences and longer critical paths of Skymont, I guess I really should've expected it to be more in the range of 30 to 40% as big as Lion Cove. -
thestryker
bit_user said: "However, upon reflection, it does seem to me that probably a lot of the size difference is simply due to supporting higher clock frequencies in Lion Cove. If we consider how much smaller AMD's C-cores are, which have the exact same microarchitecture as the full-sized ones, Zen 4C is only 64.6% as big as regular Zen 4 (excluding L3). When you consider the additional differences and longer critical paths of Skymont, I guess I really should've expected it to be more in the range of 30 to 40% as big as Lion Cove."
The ARL leaks have all shown the E-cores hitting 4.6GHz, which is higher than Gracemont on any of the RPL SKUs. This was somewhat unexpected given that LNL caps out at 3.7GHz on even the 288V (MTL U series went up to 3.8GHz). It makes me even more curious about the Skymont efficiency curve than I was before, especially with what we know about how Intel blew Gracemont's efficiency to gain MT on desktop. -
bit_user
thestryker said: "The ARL leaks have all shown the E-cores hitting 4.6GHz which is higher than Gracemont on any of the RPL SKUs. This was somewhat unexpected given that LNL caps out at 3.7GHz on even the 288V (MTL U series went up to 3.8GHz). It makes me even more curious about the Skymont efficiency curve than I was before especially with what we know about how Intel blew Gracemont's efficiency to gain MT on desktop."
That's interesting. If that were on a higher-density node (20A), it would make more sense to me, as you'd get some additional frequency pretty much for free (design-wise).
In terms of perf/W, it does seem like Intel might be interested in trying to juice the Skymont cores to help offset the loss of hyperthreading, when it comes to MT performance. Too bad we're not going to get the rumored versions with 32 E-cores, as that really would've been something to behold!