Enter the IBM z17 mainframe with Telum II (more clues for Power11?)
The first Telum strongly emphasized cache. Interestingly, it did so by dropping categorical L3 and L4 altogether: instead, IBM developed a strategy where cores could reach into the L2 of other cores and treat that as L3, and reach into other chips' cache and treat that as L4. Each chip had eight cores and 32MB of L2 per core, giving lots of opportunity for more efficient utilization. The picture of the Telum II die above shows that IBM has not substantially deviated from this plan, using the same 128K/128K L1 but increasing L2 to 36MB per core. IBM's documentation says there are eight cores per chip, but at a cursory glance there appear to be ten on the die, likely for yield reasons (two cores would be fused off). Assuming these dud cores still have usable cache, however, that matches IBM's specs of up to 360MB of effective L3 and a whopping 2.88GB of L4 per system.
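As a quick sanity check on those figures (my own back-of-the-envelope arithmetic, not anything from IBM's documentation), the numbers line up if all ten on-die L2 slices remain visible as virtual L3 and the virtual L4 spans eight chips; the eight-chip count is an assumption that makes the totals work, not a stated spec:

```python
# Back-of-the-envelope check of the quoted cache figures.
# Assumptions (mine): all ten on-die L2 slices contribute to virtual L3,
# and the 2.88GB virtual L4 figure spans eight Telum II chips.
l2_per_core_mb = 36
cores_per_die = 10          # eight active per IBM's spec, two presumed spares/fused off

virtual_l3_mb = l2_per_core_mb * cores_per_die          # 360MB effective L3
chips_sharing_l4 = 8                                     # assumed, to match 2.88GB
virtual_l4_gb = virtual_l3_mb * chips_sharing_l4 / 1000  # 2.88GB effective L4

print(virtual_l3_mb, virtual_l4_gb)                      # 360 2.88
```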
The cores top out at 5.5GHz with various microarchitectural improvements such as better branch prediction and faster store writeback and address translation, all the typical kinds of tweaks that would also likely show up in Power11. Power11 is also expected to remain on 7nm with a "refined" process instead of moving to 5nm. (It's possible that Power12, whenever that arrives, may skip 5nm entirely.)
Of course, the marketing material on z17 is all AI all the time. IBM's claimed AI improvements seem to descend from an enhanced "DPU" ("data processing unit") with its own 64K (32K instruction/32K data) L1 cache capable of 24 trillion INT8 operations per second, the kind of bolt-on hardware that could also be incorporated or scaled down into other products. In fact, such a product exists already, shown above: IBM's Spyre Accelerator, which is basically 32 more DPUs. These attach over PCIe and would be a good alternative to our having to scrabble around with iffy GPU support, assuming that IBM supports this in Linux (but they already do for LinuxONE systems, so it shouldn't be much of a stretch).

If you have the money and a convenient IBM salesdroid who actually answers the phone, you too can horrify your electrical utility starting in June. As for those of us on the small systems side, Power11 in whatever form it ends up taking is not anticipated to emerge until Q3 2025, presumably as what will be the E1100 series starting with the E1180 and going down. This further shrinks the production and sales window for the long-anticipated Raptor S1 systems, however, and there hasn't been a lot of news about those — to say nothing of what the Trump tariffs could mean for rolling out a new system.
Likely not dud cores, but spares. IBM zSeries does continuous parallel (redundant) execution of instructions by a trio of processors, and in case of a glitch uses the majority vote. In case of a severe problem, a spare processor unit is brought in to replace the failed unit.
Possibly, but I didn't think that extended to the core level (just the chip level).
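To make the scheme described above concrete, here is a minimal, purely illustrative sketch of majority voting across redundant units with a flag for sparing; it is a conceptual model only, not how the actual hardware implements its redundancy:

```python
# Purely illustrative: redundant units each produce a result, the majority wins,
# and any unit that disagrees is flagged so a spare could be substituted for it.
from collections import Counter

def majority_vote(results):
    """Return (agreed_value, indices_of_units_that_disagreed)."""
    value, _ = Counter(results).most_common(1)[0]
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects

# Unit 1 glitches on this "instruction"; the other two outvote it.
value, suspects = majority_vote([42, 41, 42])
print(value, suspects)   # 42 [1]
```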
Power11 will be released across the board. Not the typical enterprise trickle down. Nothing new except the chips. No new I/O, etc. For many, in place chip/MCM or CEC swaps.
ReplyDeleteThe "DPU" - silly name, by the way - is actually distinct from the AI accelerator block, which is a separate IP core. The DPU is basically an on-die replacement for off-chip channel controllers like the OSA Express (which, AFAIK, are actually multicore PowerPC 4xx complexes internally.)