AMD Is Coming For Nvidia’s AI Lead
Summary:
- Nvidia’s lead in AI inference is shrinking: for the first time, AMD has demonstrated parity with an Nvidia GPU on inference benchmarks.
- Nvidia remains a Buy due to its engineering expertise, high margins, market positioning, and CUDA software moat, despite AMD’s progress.
- AMD presents a strong long-term investment with an attractive risk-reward profile as it continues to close the gap with Nvidia in the GPU accelerator market.
- Nvidia stock is a Buy, and AMD is a Strong Buy.
NVIDIA Corporation (NASDAQ:NVDA) has long been untouchable in the GPU space, and its lead in AI applications especially has often seemed insurmountable. While that likely isn’t in danger of changing tomorrow, new AI benchmarks from MLCommons, a vendor-led consortium that publishes processor benchmarks under equitable, controlled conditions, show that the race is certainly tightening. The group released MLPerf Inference v4.1 benchmark results a couple of weeks ago, showing Advanced Micro Devices, Inc. (AMD) narrowing the gap to Nvidia’s performance in AI inference workloads. Could AMD be on the verge of catching up? Let’s dive in!
AMD Hot on Nvidia’s Tail
While AMD has stormed back from near bankruptcy to outmatch Intel Corporation (INTC) in the data center CPU business, Nvidia has been able to maintain its lead in GPUs despite AMD’s push into the GPU accelerator market. There are a few reasons for that, but the main one, in my opinion, has just been the sheer performance advantage of Nvidia’s processors. While Intel struggled to scale out its 10 nm (Intel 7) node, Taiwan Semiconductor Manufacturing Company Limited (TSM) stormed past, allowing AMD to design processors on a superior node and eat Intel’s lunch. Nvidia has always been fabless, so it avoided such pitfalls and has been able to focus on what its engineers are good at: designing bleeding-edge processors.
And boy, are they good at it. Whether in gaming or the data center, AMD has consistently trailed Nvidia by a generation or more in performance and power efficiency. However, as AMD has seen success in the data center server market with its EPYC line of CPUs, it has been able to invest that back into GPU research and development initiatives that have begun to bear fruit.
While Nvidia is still far and away the leader in AI training performance, the real golden goose in AI is inference workloads. To quickly recap the difference, training is the process that “teaches” the AI model using a dataset, and inference is the process whereby the taught model makes predictions on previously unseen data. Intel CEO Pat Gelsinger gave the example of creating weather models versus consuming them: only a few organizations predict the weather, but hundreds of millions of people check the forecast every day. It’s easy to see why inference benchmarks are where people’s attention is drawn.
MLPerf Inference v4.1 has fresh performance results from a multitude of chipmakers, but let’s focus on AMD and Nvidia. The headline takeaway, in my opinion, is that Nvidia’s lead is shrinking. For the first time ever, AMD has demonstrated parity with Nvidia’s current generation processor in an inference workload:
Note: The “Genoa” and “Turin” references are to the generation of AMD’s EPYC that the server is running, and these tests were run in 8xGPU configurations.
As we can see, AMD’s MI300X is essentially level with Nvidia’s H100 80 GB GPU in tokens/second in both server and offline inference workloads (server mode more closely matches how a real-world interaction would go). We don’t know exactly what these GPUs retail for, but we know AMD is aggressive at pricing its accelerated server offerings and that Nvidia, with net profit margins greater than 50%, is not. This is the same strategy AMD used to undercut Intel in the CPU market, and the strategy is likely to pull some customers away from Nvidia due to the comparable price-to-performance we see above.
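To make the price-to-performance logic concrete, here is a minimal sketch. The throughput figures are ballpark per-GPU numbers in the spirit of the benchmarks above, and the prices are purely hypothetical assumptions (the article notes that actual retail prices aren’t public); the point is only that at equal throughput, the cheaper accelerator wins on value.

```python
# Hypothetical price-to-performance illustration. Throughput is a ballpark
# per-GPU figure; BOTH prices below are assumptions, not quoted prices.
def tokens_per_second_per_dollar(tokens_per_s: float, price_usd: float) -> float:
    """Simple value metric: inference throughput per dollar of hardware."""
    return tokens_per_s / price_usd

# Parity scenario: roughly equal server-mode throughput, with AMD assumed
# to price aggressively below Nvidia (hypothetical numbers).
h100 = tokens_per_second_per_dollar(tokens_per_s=2700, price_usd=30000)
mi300x = tokens_per_second_per_dollar(tokens_per_s=2700, price_usd=20000)

print(f"H100:   {h100:.3f} tokens/s per $")
print(f"MI300X: {mi300x:.3f} tokens/s per $")
```

Under these assumed prices, the MI300X delivers 50% more throughput per dollar despite identical raw performance, which is exactly the dynamic that pulled customers from Intel to AMD in CPUs.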
Further, I should note that the MI300X actually has 192 GB of HBM3 memory, which is significantly more than the 80 GB of the H100 and the 141 GB of the H200. The model used in these evaluations, Llama 2 70B, is fairly lightweight, so these benchmarks are likely under-representing the performance of AMD’s processor.
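A rough back-of-the-envelope calculation shows why memory capacity matters here. Model weights at FP16 take about two bytes per parameter, so Llama 2 70B needs roughly 140 GB for weights alone, before the KV cache and activations that inference also requires; this sketch only compares weight size against the single-GPU capacities cited above.

```python
# Back-of-the-envelope estimate of GPU memory needed for model weights.
# FP16 stores each parameter in 2 bytes; KV cache and activations add more.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB (decimal) at a given precision."""
    return num_params * bytes_per_param / 1e9

llama2_70b = weight_memory_gb(70e9)  # ~140 GB at FP16
print(f"Llama 2 70B weights: ~{llama2_70b:.0f} GB")

# Single-GPU HBM capacities cited in the article:
for name, hbm in [("H100", 80), ("H200", 141), ("MI300X", 192)]:
    fits = llama2_70b <= hbm
    print(f"{name} ({hbm} GB): {'weights fit' if fits else 'needs sharding'}")
```

By this estimate, an 80 GB H100 must shard even the weights across GPUs, while the 192 GB MI300X holds them with room to spare, which is why larger models would likely flatter AMD’s part more than Llama 2 70B does.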
Also included in the benchmarks is Nvidia’s take on the matchup:
I think it’s apparent from both these images that AMD wants to emphasize the parity with the H100 while Nvidia would like to emphasize the lack thereof with the H200. Both have valid points. AMD would like to call attention to the fact that it has come a long way in providing a viable alternative to Nvidia’s dominance, and Nvidia is showing the performance gap is still sizable.
I’m sure many Nvidia bulls are reading this and wondering why anyone would be concerned when the company is still probably a generation and a half ahead. My response to that would be to look at what AMD did to Intel. Nvidia’s management is significantly more competent, but mistakes and missteps happen. For example, while it ended up being a rather short delay, the release of Nvidia’s next-generation Blackwell line of GPUs was pushed back because of a small design flaw that affected yields. As processors become more complex, design flaws become more likely, and what seemed like a massive technological lead could evaporate.
Regarding Blackwell, Nvidia submitted benchmarks for just the B200 (the more powerful Blackwell chip), which demonstrated impressive performance (though it’s unclear how much is due to hardware improvements versus support for FP4) of 10,755 tokens/second in server mode on the Llama 2 70B model. This would represent a nearly 4x improvement over the H100 and MI300X and a 2.5x improvement over the H200.
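Working those ratios backwards gives a feel for the per-GPU numbers implied by the B200 submission. These are approximations derived from the stated multiples, not measured results:

```python
# Implied per-GPU server-mode throughputs, worked backwards from Nvidia's
# B200 submission and the article's ~4x / ~2.5x multiples (approximate).
b200 = 10_755  # tokens/s, Llama 2 70B, server mode

implied_h100_mi300x = b200 / 4    # ~2,689 tokens/s
implied_h200 = b200 / 2.5         # ~4,302 tokens/s

print(f"Implied H100/MI300X: ~{implied_h100_mi300x:,.0f} tokens/s")
print(f"Implied H200:        ~{implied_h200:,.0f} tokens/s")
```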
While this all sounds impressive, we’d be remiss not to note that the B200 will surely sell for more than double what an MI300X retails for, and its power requirements will be significantly higher. The B100 will be somewhere in the middle. Still, the value proposition will have certainly shifted back in Nvidia’s favor, even at a substantially higher price tag. And because the cycle never ends, AMD is aiming for a fourth-quarter release of its own: the MI325X.
AMD plans to provide more details about the release of this chip at its annual Advancing AI event where it will outline improvements in performance, efficiency, and a massive boost to memory. Specifically, MI325X will sport a hefty 288 GB of HBM3E memory (denser than HBM3), which is also significantly higher than the 192 GB HBM3E of both the B100 and B200, providing AMD a possible value edge in inference workloads for larger AI models.
All that said, it’s important to remember that hardware is only one side of the equation: Nvidia’s true moat lies in its CUDA software layer, which is the gold standard for developers who will actually be creating applications with these models. AMD has built out ROCm into a serviceable alternative, but it still has minimal adoption compared to CUDA and until that changes, Nvidia’s current customer base will remain sticky and reluctant to switch ecosystems.
On a pure compute basis, AMD appears to be closing the gap with Nvidia in AI inference workloads. The latter still offers many advantages that AMD will have to chip away at, but AMD bulls should be excited about the progress being made.
Investor Takeaway
Well, that was a lot of technical talk; what about the stock implications?
I still think the overall AI market will continue to sustain the seemingly rich valuations of AMD and NVDA for years to come. The GPU accelerator market will likely remain strong as cloud providers and other big tech companies scale up for an AI revolution that has just begun. Nvidia has the expertise, margins, and market positioning to continue to succeed despite AMD’s progress.
With a wave of Blackwell processors likely to lift its already remarkable profit margins as the new high-margin chips enter the product mix, a continued lead in AI applications, and a software moat that looks impenetrable, Nvidia is a Buy. I recommend bulls keep an eye on these benchmarks going forward to see if the company is maintaining, losing, or maybe growing its lead over AMD and other competitors.
However, I think AMD presents a significantly more attractive risk-reward profile than NVDA. The GPU accelerator market will be lucrative for a long time, and AMD still holds so little of it that the potential upside is substantial as the company improves its offerings and continues to close the gap with Nvidia. Now, this could take years to play out, if not longer, so anyone hoping to buy AMD shares to benefit from the company eating Nvidia’s lunch with the MI325X or MI350X might want to take a beat. But long-term, AMD is a company that has its hands in all the right pies and whose competitive standing is moving in the right direction. These factors make AMD a Strong Buy.
Thanks for reading!
Editor’s Note: This article discusses one or more securities that do not trade on a major U.S. exchange. Please be aware of the risks associated with these stocks.
Analyst’s Disclosure: I/we have no stock, option or similar derivative position in any of the companies mentioned, and no plans to initiate any such positions within the next 72 hours. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.
Seeking Alpha’s Disclosure: Past performance is no guarantee of future results. No recommendation or advice is being given as to whether any investment is suitable for a particular investor. Any views or opinions expressed above may not reflect those of Seeking Alpha as a whole. Seeking Alpha is not a licensed securities dealer, broker or US investment adviser or investment bank. Our analysts are third party authors that include both professional investors and individual investors who may not be licensed or certified by any institute or regulatory body.