AMD-powered El Capitan is now the world's fastest supercomputer with 1.7 exaflops of performance — fastest Intel machine falls to third place on Top500 list
AMDomination.
AMD and the Lawrence Livermore National Laboratory (LLNL) announced today that the AMD-powered El Capitan has taken the top spot on the semi-annual Top500 list as the fastest-known supercomputer on the planet with 1.742 exaflops of performance. El Capitan debuts on the list at the top spot, catapulting over the previous leader, the 1.3 exaflop Frontier. The Intel-powered Aurora system fell to third place on the list—the system didn't submit a new benchmark run, implying that the partially operational system is still experiencing failure issues on numerous fronts (more below).
The sheer scale of El Capitan is mind-boggling: the system has 11,136 nodes packed with 44,544 of AMD's MI300A APUs, 5.4 petabytes of main memory, and an exceptionally performant 'Rabbit' near-node storage subsystem (more on those details below). El Capitan achieved 1.742 quintillion operations per second (exaflops) of performance in the benchmark. Matching a single second of that work at a rate of one calculation per second would take roughly 55 billion years. That's about 29% faster than the second-fastest system on the list, Frontier, which posted an updated result of 1.353 exaflops (and about 46% faster than Frontier's previous 1.194-exaflop result).
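The "one calculation every second" comparison is plain arithmetic, and the sketch below is one way to check it. It assumes a 365.25-day year, and the function and constant names are illustrative only. Depending on whether the rounded 1.7-exaflop headline figure or the full 1.742-exaflop result is used, the answer lands at roughly 54 or 55 billion years.

```python
# Back-of-the-envelope check: how many years would it take, at one
# calculation per second, to match one second of El Capitan's HPL output?
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def years_at_one_op_per_second(flops: float) -> float:
    """Years needed to perform `flops` operations at one operation per second."""
    return flops / SECONDS_PER_YEAR


if __name__ == "__main__":
    cases = [
        ("rounded 1.7 exaflops", 1.7e18),
        ("measured 1.742 exaflops", 1.742e18),
    ]
    for label, flops in cases:
        years = years_at_one_op_per_second(flops)
        print(f"{label}: about {years / 1e9:.1f} billion years")
```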
The National Nuclear Security Administration (NNSA) will use the system to modernize the US nuclear arsenal by simulating explosions to eliminate the need for underground detonations and simulate aging effects, safety, and reliability of the nuclear stockpile. The system will also be used to develop two new ICBM designs. The system will be used for HPC and AI workloads, or a fusion of the two.
System | Cores | Rmax (PFlop/s) | Rpeak (PFlop/s) | Power (kW) |
El Capitan - HPE Cray EX255a, AMD 4th Gen EPYC 24C 1.8GHz, AMD Instinct MI300A, Slingshot-11 | 11,039,616 | 1,742 | 2,746 | 29,581 |
Frontier - HPE Cray EX235a, AMD custom 3rd-Gen EPYC 64C 2GHz, AMD Instinct MI250X | 8,699,904 | 1,353 | 2,055 | 22,786 |
Aurora - HPE Cray EX - Xeon CPU Max 9470 52C 2.4GHz, Intel Data Center GPU | 9,264,128 | 1,012 | 1,980 | 38,698 |
El Capitan boasts a theoretical peak (Rpeak) of 2.746 exaflops of performance. However, that number is calculated with the full performance of all system components operating at peak speeds with perfect linear performance scaling, which simply isn't feasible in the real world.
El Capitan's Rmax, a real-world performance measurement in the High-Performance Linpack (HPL) benchmark that serves as the measuring stick for the top supercomputers, reached 1.742 exaflops in actual use. The Rmax could increase in the future with further system tuning, and the agency says it will do one more full-scale HPL benchmark before El Capitan is moved to a classified network.
It's also important to note that supercomputer system performance in HPL is measured with full double-precision FP64. In contrast, AI-centric supercomputers are measured with smaller data types that enable much higher 'AI exaflop' ratings, but those aren't directly comparable to the listings on the Top500 list.
El Capitan consumes >35 megawatts of power at full utilization and delivers 58.89 Gigaflops/watt, taking the 18th spot on the Green500 ranking of the most efficient supercomputers.
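The efficiency figures above can be reproduced from the table. The sketch below computes HPL efficiency (Rmax divided by Rpeak) and Gigaflops per watt for the three listed systems. The `System` class and its method names are illustrative, not part of any Top500 tooling, and Frontier's Rpeak is taken as 2,055 PFlop/s, consistent with the 2.055 exaflops quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    rmax_pflops: float   # measured HPL result, PFlop/s
    rpeak_pflops: float  # theoretical peak, PFlop/s
    power_kw: float      # power column from the table, kW

    def hpl_efficiency(self) -> float:
        """Fraction of theoretical peak achieved in HPL."""
        return self.rmax_pflops / self.rpeak_pflops

    def gflops_per_watt(self) -> float:
        """Energy efficiency: 1 PFlop/s = 1e6 GFlop/s, 1 kW = 1e3 W."""
        return (self.rmax_pflops * 1e6) / (self.power_kw * 1e3)


# Values copied from the table in this article.
SYSTEMS = [
    System("El Capitan", 1742, 2746, 29581),
    System("Frontier", 1353, 2055, 22786),
    System("Aurora", 1012, 1980, 38698),
]

if __name__ == "__main__":
    for s in SYSTEMS:
        print(f"{s.name}: {s.hpl_efficiency():.1%} of peak, "
              f"{s.gflops_per_watt():.2f} GFlops/W")
```

For El Capitan this works out to roughly 63% of theoretical peak and the 58.89 Gigaflops/watt quoted above.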
El Capitan has an astounding total of 11,039,616 compute cores (CPU+GPU) spread across 44,544 AMD MI300A processors. These APUs blend both CPU and GPU cores into the same physical package. Each MI300A chip has 13 chiplets, many of them 3D-stacked, to create a single chip package with twenty-four Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3 memory totaling 128GB.
Overall, the MI300A chip weighs in with 146 billion transistors, making it the largest chip AMD has pressed into production. The nine compute dies, a mix of 5nm CPUs and GPUs, are 3D-stacked atop four 6nm base dies that serve as active interposers, handling memory and I/O traffic, among other functions. The architecture employs cache-coherent memory to reduce data movement between the CPU and GPU. Because moving data often consumes more power than the computation itself, this reduces latency and improves both performance and power efficiency. It also vastly simplifies both porting over older code and creating new code.
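The hardware counts quoted in this article can be cross-checked against one another. The sketch below is only a consistency check on the article's own numbers, with illustrative constant names. One assumption is flagged loudly: reading "128GB" as 128 GiB gives about 5.44 PiB in total, which matches the quoted 5.4 petabytes, whereas a strictly decimal reading gives about 5.70 PB. Which convention the quoted figure uses is not stated.

```python
# Figures quoted in the article.
NODES = 11_136
APUS = 44_544
HBM_PER_APU_GB = 128
HBM_STACKS_PER_APU = 8
COMPUTE_DIES = 9
BASE_DIES = 4

# APUs divide evenly across the nodes.
apus_per_node, remainder = divmod(APUS, NODES)

# Capacity of each HBM3 stack.
gb_per_stack = HBM_PER_APU_GB // HBM_STACKS_PER_APU

# Nine compute dies plus four base dies give the 13 chiplets per package.
chiplets = COMPUTE_DIES + BASE_DIES

# Total HBM3 capacity under two unit conventions (assumption: see lead-in).
total_gb = APUS * HBM_PER_APU_GB
total_decimal_pb = total_gb / 1e6      # treating GB as 10^9 bytes
total_binary_pib = total_gb / 1024**2  # treating GB as GiB

if __name__ == "__main__":
    print(f"APUs per node: {apus_per_node} (remainder {remainder})")
    print(f"HBM3 per stack: {gb_per_stack} GB, chiplets per APU: {chiplets}")
    print(f"Total memory: {total_decimal_pb:.2f} PB decimal, "
          f"{total_binary_pib:.2f} PiB binary")
```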
HPE builds the El Capitan system with its Shasta architecture, which consists of high-density liquid-cooled EX4000 cabinets and EX255a accelerator blades tied together with the Slingshot-11 networking interconnect. This platform powers the DOE's other two exascale supercomputers: Frontier, the previous fastest supercomputer in the world, and the oft-delayed Aurora, which is powered by Intel silicon. That gives HPE the first, second, and third slots on the Top500 list, and all three are the first and only exascale-class systems on the list.
For comparison, El Capitan is about 29% faster than Frontier, the second-fastest super on the Top500 list, based on Frontier's updated result (and about 46% faster than Frontier's prior submission). The AMD-powered Frontier now occupies the second spot on the Top500 list, giving the company another feather in its cap: AMD's silicon powers the two fastest supercomputers in the world. Interestingly, the Frontier supercomputer also has a new benchmark result for the list of 1.353 exaflops, an increase over the prior submission of 1.194 exaflops. The Rpeak also increased from 1.714 exaflops to 2.055 exaflops.
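The relative gaps follow directly from the Rmax and Rpeak figures quoted in this article. The sketch below uses a hypothetical helper, `percent_faster`, to show that El Capitan's lead over Frontier depends on which Frontier submission is used as the baseline.

```python
def percent_faster(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


# Exaflops figures quoted in the article.
EL_CAPITAN_RMAX = 1.742
FRONTIER_RMAX_NEW = 1.353
FRONTIER_RMAX_OLD = 1.194
FRONTIER_RPEAK_NEW = 2.055
FRONTIER_RPEAK_OLD = 1.714

if __name__ == "__main__":
    print(f"El Capitan vs Frontier (new run): "
          f"{percent_faster(EL_CAPITAN_RMAX, FRONTIER_RMAX_NEW):.1f}%")
    print(f"El Capitan vs Frontier (prior run): "
          f"{percent_faster(EL_CAPITAN_RMAX, FRONTIER_RMAX_OLD):.1f}%")
    print(f"Frontier Rmax gain: "
          f"{percent_faster(FRONTIER_RMAX_NEW, FRONTIER_RMAX_OLD):.1f}%")
    print(f"Frontier Rpeak gain: "
          f"{percent_faster(FRONTIER_RPEAK_NEW, FRONTIER_RPEAK_OLD):.1f}%")
```

Against Frontier's new 1.353-exaflop run the lead is a little under 29%, while against the prior 1.194-exaflop run it is close to 46%.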
While El Capitan is now the fastest known supercomputer in the world, we'd be remiss if we didn't mention that China has several of its own exascale-class machines. These are shrouded in secrecy and not submitted to the Top500 list for fear of reprisal via US sanctions.
The DOE did not submit a new benchmark for Intel's Aurora, which is quite surprising. Six months ago, an Aurora submission cemented the system in the second spot on the Top500, but the system wasn't fully operational: the benchmark run used only 87% of the system. At the time, Intel said Aurora suffered from numerous issues, including hardware and cooling system failures, operational errors, and network instability. The lack of a new submission implies those issues have not yet been fully rectified. Aurora still leads the AI-centric HPL-MxP mixed-precision benchmark, making it the fastest known AI supercomputer in the world with 10.6 AI Exaflops of performance.
In fact, AMD powers five of the top ten fastest supercomputers, while Intel has three, Nvidia has one, and Japan's custom-built Arm Fugaku still holds a spot. LLNL also commented that this system is far and away the most cost-effective system deployed at 'even close' to a similar scale, indicating that not only is El Capitan the world's fastest, but it is also the most economical on the cutting edge of technology.
Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.
DS426 Impressive that AMD has advanced so quickly in datacenter hardware that pure AMD CPU-GPU systems like this and Frontier are #1 and #2. Of course, the icing on the cake is LLNL stating that this was the most affordable way to get to these kinds of numbers. No Intel, no nVidia -- NOT surprised! Lol.
redgarl I don't understand how could Intel still maintain 75% of the market after projects like these.
thestryker It definitely makes sense that the MI300A is delivering on both the design and performance front. For HPC installations the simplicity of a single chip has got to help a lot. It will be really interesting to see what AMD has next in this form.
bit_user Before anyone gets too proud, just know that the AI bros have sailed way past these metrics. The only reason clusters like Musk's Colossus aren't leading the Top 500 is that they just don't bother to submit their systems to it.
SBKch xAI cluster is easily 2-5 times faster than this, they just don't bother testing it at these type of workloads and adding it to top500 list.
SBKch
SBKch said: xAI cluster is easily 2-5 times faster than this, they just don't bother testing it at these type of workloads and adding it to top500 list.
Also Colossus is 5-10 times faster than Aurora at AI workloads.
einheriar
gg83 said: I love this stuff. Not too long ago we were talking about petaflops as the target.
mwahhh we need to move to zetta flops, so we can focus onto reaching yotta flops..