- For the first time since revealing the zettascale 2027 target in 2021 (which implies a 1000x improvement in just six years), Intel has detailed the innovations that would be required.
- As Moore’s Law will only yield around a 10x improvement, Intel envisions quite exotic new tech such as ultra-low-voltage CMOS, low-temperature CMOS, and silicon photonics.
- Even if all of these innovations were realized in time, the main issue is likely that while performance per watt may improve sufficiently, performance per dollar may not.
- For example, the reason ultra-low-voltage computing is uncommon is that GPU manufacturers push voltage higher to maximize performance per mm2 of silicon.
- While Intel has presented an exciting engineering challenge, there is currently no investment thesis.
A bit over a year ago, Intel (NASDAQ:INTC) came out of nowhere to state its goal of reaching zettascale supercomputing in 2027, just six years after the world reached exascale in 2021: Nvidia Vs. Intel: Zettascale Is The New Metaverse. Note that the first Intel-based exascale supercomputer has been delayed from late 2021 to early 2023 due to the mid-2020 7nm/Intel 4 delay, so Intel was criticized for talking about its future roadmap before first delivering exascale.
Of course, people also wondered how Intel would be able to reach zettascale so quickly, as this would outpace Moore’s Law by around 10 years. Recently, Intel provided a comprehensive deep dive into its vision for zettascale computing.
However, while the presentation suggests Intel has some interesting technologies in the works, Intel itself admitted it has not yet solved all the problems. This suggests some of these technologies are still in research rather than in development, as would be required to hit the ambitious 2027 target. The presentation itself also suggested the target has already been pushed out to <2030.
In that regard, I have to tone down my initial enthusiasm/hype, and suggest investors take a wait-and-see attitude; the zettascale strategy certainly shouldn’t be a primary thesis for investing in the stock.
The presentation took the upcoming Intel-based two-exaflops Aurora supercomputer, based on its Ponte Vecchio GPU, as its starting point. To reach zettascale (1 zettaflops) in the same power envelope, Intel would need a 500x improvement in performance per watt. In fact, as a first step Intel increased the power budget to 100MW (about 2x higher), so “just” a 250x improvement is required.
Intel then took its existing GPU and process roadmap to extrapolate the further improvements it could reach in the late 2020s simply by following Moore’s Law. Intel has publicly disclosed that its 2025 Falcon Shores was targeting a 5x improvement in performance per watt, so extrapolating another generation or so further suggests that Moore’s Law could yield a 10x improvement in performance per watt. This brings the 250x requirement down to just 25x.
As an aside, Intel said that Ponte Vecchio is capable of 60TFLOPS and that Aurora has over 60k Ponte Vecchios, which suggests Aurora actually has about a 3.6 EFLOPS theoretical peak. This implies that “only” a bit over 250x more performance is required to go from Aurora to zettascale, or around 125x if the power budget is doubled.
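These napkin numbers are easy to sanity-check. The sketch below (Python, using only the 60 TFLOPS and 60k GPU figures from Intel's presentation; everything else is arithmetic) reproduces the ~3.6 EFLOPS peak and the resulting gap:

```python
# Aurora's theoretical peak from Intel's stated figures
ptv_flops = 60e12      # Ponte Vecchio: 60 TFLOPS per GPU
n_gpus = 60_000        # "over 60k" Ponte Vecchios in Aurora
aurora_flops = ptv_flops * n_gpus            # 3.6e18 = 3.6 EFLOPS

zettascale = 1e21      # 1 zettaflops
gap_same_power = zettascale / aurora_flops   # ~278x at the same power budget
gap_double_power = gap_same_power / 2        # ~139x if power is doubled

print(f"{aurora_flops / 1e18:.1f} EFLOPS peak")
print(f"{gap_same_power:.0f}x at iso-power, {gap_double_power:.0f}x at 2x power")
```

The ~278x result is where the "a bit over 250x" figure above comes from.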
Nevertheless, Intel indeed acknowledged that Moore’s Law alone wouldn’t be enough, as it said that Ponte Vecchio has 100B transistors, while a zettaflops GPU would require on the order of 10-100T transistors instead. For comparison, Intel’s official goal is to reach 1T transistors by 2030, falling short of the target by at least 10x.
As such, Intel then turned to some technologies that are in R&D that could bridge the gap.
The first technique is low-voltage design and operation. At first sight this is nothing ground-breaking, as it is a known fact that silicon has a quadratic or even cubic power-performance curve, as evidenced for example by Intel’s marketing that Raptor Lake at 65W delivers the same performance as Alder Lake at 241W. Both CPUs are made on roughly the same manufacturing process, but Raptor Lake simply added more cores so it could deliver the same performance at a lower frequency.
Nevertheless, a more extreme application of this is called near-threshold voltage (NTV) computing, and it is much rarer. As the name implies, it operates the transistors close to their threshold voltage, which is the most energy-efficient regime, so efficiency improves dramatically. The drawback is that while power drops a lot, performance also decreases to some extent, while the manufacturing cost remains the same.
Put differently, since manufacturing cost depends on the size of the chip, the general industry trend is to maximize the clock speed (performance) of the piece of silicon being sold; over the last decade, the clock speed of GPUs has increased from a bit over 1GHz to nearly 3GHz, for example. This naturally comes at the expense of power efficiency.
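This trade-off follows from the classic CMOS dynamic-power relation P ≈ C·V²·f. A toy illustration (the linear frequency-voltage scaling and the specific numbers are illustrative assumptions, not Intel figures):

```python
def dynamic_power(cap, volt, freq):
    """Classic CMOS dynamic power: P ~ C * V^2 * f."""
    return cap * volt**2 * freq

# Toy comparison: one fast chip vs. two slow chips at the same total throughput.
# Assume frequency scales roughly linearly with voltage (illustrative only).
C = 1.0
fast = dynamic_power(C, volt=1.0, freq=3.0)       # one GPU at "3 GHz"
slow = 2 * dynamic_power(C, volt=0.5, freq=1.5)   # two GPUs at "1.5 GHz"

print(fast / slow)  # the two-chip config uses ~4x less power...
# ...but it requires 2x the silicon, i.e. roughly 2x the cost.
```

This is exactly the performance-per-watt versus performance-per-dollar tension the article keeps returning to.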
In general, this is also the main issue I found after going through Intel’s presentation about reaching zettascale. Intel indeed detailed a plausible path to improving energy efficiency by hundreds of times, but even if Intel could get all of these technologies from R&D to production in the next several years, the remaining issue is cost. Cost is ultimately determined by Moore’s Law, which is only expected to deliver at most a 10x improvement over the next decade.
In any case, the research Intel presented was still quite novel, and it actually partly solves the performance problem (when operating at near-threshold voltage) just described. To wit, Intel said that it could overcome the performance issue in two ways. First, through optimized circuit design, for which it referred to its Blockscale bitcoin mining chip, which operates at a respectable 1.6GHz at just 355mV. For comparison, Intel’s earlier Ponte Vecchio target for 44TFLOPS was based on a 1.4GHz frequency.
Secondly, and this is the novel part, Intel claimed it is possible to improve the frequency of circuits by using 3D packaging. By removing the slowest parts of the circuit (the memory, i.e. SRAM) that form the bottleneck, and putting those on a separate chiplet, the clock speed can be improved. Note that this is similar to AMD’s (AMD) 3D V-Cache, but Intel’s proposal is even more extreme, as it would remove all memory from the compute chiplet to eliminate this bottleneck completely.
Overall, Intel argues it could reduce the voltage from 500mV to 350mV without impacting performance, which would improve performance per watt by around 2x. Simply going to low voltage (presumably 500mV) would also yield a 2x improvement, while the circuit techniques also yield a 2x improvement. Overall, it seems Intel expects that by pushing this ultra-low voltage regime, the total benefit could be nearly an order of magnitude.
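As a quick sanity check, the claimed ~2x from dropping 500mV to 350mV at constant performance is consistent with the quadratic voltage term in dynamic power:

```python
# Dynamic power scales with V^2, so at the same frequency, dropping the
# supply voltage from 500 mV to 350 mV yields:
v_high, v_low = 0.500, 0.350
gain = (v_high / v_low) ** 2
print(f"{gain:.2f}x")  # ~2.04x, consistent with Intel's claimed ~2x
```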
Intel had a second idea that could improve performance per watt by another 2x: cool/cryo CMOS. For those who haven’t had a course in semiconductor physics, it turns out that transistors switch faster and leak less when operated at lower temperatures. So for HPC applications, cryo CMOS could be a legitimate way to improve performance per watt. However, the drawback is that reducing the operating temperature requires spending more power on cooling, and some components such as DRAM don’t benefit as much. According to Intel’s research, the optimum trade-off is reached at around -40 to 0 degrees Celsius, providing a 2x benefit. Intel also said that by optimizing circuits for such temperatures, the benefit could be even greater.
The next set of proposals are based on the observation that only 23% of the total power goes towards computing. Intel argues, similar to the NTV chiplet idea above, that by combining advanced packaging with several new technologies, it could remove these bottlenecks.
The first is the general efficiency of power delivery to the chip. To that end, Intel has been researching GaN semiconductors for some time, which could achieve over 90% efficiency, providing an incremental improvement.
The second innovation is silicon photonics. It is known that Intel’s goal is to make silicon photonics a mainstream data center interconnect that could replace PCIe. Whereas copper has at best a few pJ/bit energy consumption, silicon photonics could go well below 1pJ/bit.
Intel discussed Lightmatter as an example, a start-up that uses silicon photonics as the interconnect between chips on a silicon wafer. Basically, this ties into the idea that advanced packaging can blur the line between a package and a system by increasing the number of chips that can be interconnected while maintaining near-monolithic characteristics. Note that the Lightmatter idea is similar to some extent to what Cerebras is doing with its Wafer-Scale Engine or what Tesla (TSLA) is doing with Dojo: the total power consumption increases from hundreds of watts to around 15kW, resulting in much higher performance as well.
Intel claimed that the combination of going wafer-scale using a silicon photonics interconnect could yield another 2x improvement in performance per watt. Intel provided a napkin calculation that a copper interconnect by itself would consume the full 100MW power budget, whereas optical could reduce this to just 3.2MW (although Intel’s calculation was done using 0.1pJ/bit energy efficiency, whereas its first-gen optical interconnect target is actually over 2pJ/bit).
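Working backwards from the stated 3.2MW at 0.1pJ/bit, one can infer the aggregate bandwidth behind Intel's napkin calculation and check the copper figure. Note that the bandwidth and the exact copper energy per bit below are my own inferences chosen to match the stated numbers, not Intel disclosures:

```python
# Reverse-engineering the napkin math: what bandwidth makes 0.1 pJ/bit
# come out at 3.2 MW, and what does copper cost at that same bandwidth?
optical_pj_per_bit = 0.1
optical_power_w = 3.2e6
bandwidth_bps = optical_power_w / (optical_pj_per_bit * 1e-12)  # 3.2e19 bit/s

copper_pj_per_bit = 3.125   # "a few pJ/bit" -- chosen to match the 100 MW claim
copper_power_w = bandwidth_bps * copper_pj_per_bit * 1e-12

print(f"{copper_power_w / 1e6:.0f} MW")  # ~100 MW, i.e. the full power budget
```

In other words, a copper energy cost of roughly 3pJ/bit at this bandwidth is exactly what makes the interconnect alone consume the entire 100MW budget.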
As noted, the Aurora supercomputer delivers about 2 to 4 exaflops, which means a 250 to 500x improvement in performance per watt is required to reach zettascale within the same power budget. However, if the power budget is doubled, only a 125-250x improvement is needed.
Intel estimates that its future architecture could improve performance per watt by around 2x, while its future angstrom-era process would yield another 5x, for a total of about 10x from Moore’s Law. Hence, at that point, zettascale is only 12.5-25x out of reach (which means it would take a nuclear plant to power such a supercomputer).
Next, by going to ultra-low voltage CMOS (the most efficient operating regime of a transistor), another 2x improvement could be eked out. Note that this is before any clocking and circuit optimizations, which supposedly yields another 2x benefit. Then, the technique of separating memory and compute through 3D packaging in order to alleviate further (clocking) bottlenecks yields another 2x benefit. In addition, going to cool CMOS delivers another 2x improvement.
In total, these techniques deliver a 16x improvement. Lastly, using GaN for power delivery and wafer-scale computing with a silicon photonics network deliver another 2x improvement, bringing the non-Moore’s Law improvements to around 32x.
Stacking this (roughly) 32x with a 10x improvement from Moore’s Law yields a 320x improvement in performance per watt. Since Aurora already delivers over 2 exaflops, this means zettascale could become possible with just a small increase in power. QED.
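The multiplier stack from the last few paragraphs can be tallied in a few lines (a sketch; all factors are Intel's round numbers as quoted above):

```python
# Stacking the claimed performance-per-watt multipliers from the presentation
moores_law = 10  # future architecture (~2x) * angstrom-era process (~5x)

beyond_moore_gains = {
    "ultra-low voltage CMOS": 2,
    "circuit/clocking optimizations": 2,
    "3D memory/compute split": 2,
    "cool CMOS": 2,
    "GaN power delivery + photonics wafer-scale": 2,
}

beyond_moore = 1
for factor in beyond_moore_gains.values():
    beyond_moore *= factor   # 2^5 = 32x

total = moores_law * beyond_moore
print(total)  # 320x -- within the 250-500x range needed for zettascale
```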
Many of the techniques discussed pertain to low-voltage operation, which seems so obvious that one wonders why they aren’t already used to any meaningful extent. The answer, as discussed already, is cost: a technique that cuts power by 2x typically also cuts performance per chip, so the organization that buys the supercomputer still needs to buy 2x more chips to get the same total performance.
In other words, Intel’s presentation was about how it could be possible to improve the performance per watt of a supercomputer by up to 1000x. However, the feasibility of such a system comes down to cost in addition to power. To that end, Intel should also have discussed how these techniques impact not just performance per watt, but also the performance per mm2 of silicon.
For reference, 60k Ponte Vecchios at $5k per piece (the exact price is unknown, but Nvidia’s (NVDA) top-end GPUs easily go for $10k) is already $300M worth of GPUs before any of the other components. Of course, perhaps Intel could reduce the price of its chips, since the manufacturing cost of a full 5nm wafer (which yields dozens of chips) is below $10k, but this means less gross margin, impacting profitability.
Ultimately, the presentation Intel gave at the engineering conference about reaching zettascale was indeed that: an engineering challenge, with many hurdles to overcome before commercialization, as on the order of half a dozen innovations were discussed besides Moore’s Law that would be required to make this 2027 zettascale target a reality. The Intel Fellow even frankly answered one question with the statement that if the memory problem wasn’t solved, zettascale could be a late 2030s project.
As discussed throughout, even if some of the techniques such as ultra-low-voltage and low-temperature operation are fairly straightforward, the commercial reality is that NTV (near-threshold voltage) computing has seen little use to date in high-performance applications such as CPUs and GPUs, because it inevitably also results in lower performance despite requiring the same silicon (cost) footprint. While performance per watt may be seen as important, especially in technical discussions, the commercial reality of products is that performance per dollar (performance per mm2) is king.
As such, even though CEO Pat Gelsinger was boldly talking about the zettascale 2027 target at Intel’s Innovation event in October 2021, unless literally all of these innovations are already further along than Intel made them seem (and even then there are still question marks), Gelsinger and Raja Koduri have likely been a bit premature in starting this zettascale hype train.
In short, while Intel has charted an ambitious course for the future of HPC (high-performance computing), so far this remains an engineering challenge, not (yet) an investing opportunity.
Disclosure: I/we have a beneficial long position in the shares of INTC either through stock ownership, options, or other derivatives. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.