Microsoft (MSFT) has released its latest artificial intelligence accelerator chip, the Maia 200, which is purpose-built for inference and for more efficient token generation.
The new chip is built on Taiwan Semiconductor Manufacturing’s (TSM) 3nm process and features native FP8/FP4 tensor cores, a redesigned memory system pairing 216GB of HBM3e at 7TB per second of bandwidth with 272MB of on-chip SRAM, and dedicated data-movement engines.
Microsoft’s executive vice president of cloud and AI, Scott Guthrie, said the chip outperforms AI chips built by hyperscaler competitors such as Amazon (AMZN) and Google (GOOG, GOOGL).
“This makes Maia 200 the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of the third-generation Amazon Trainium and FP8 performance above Google’s seventh-generation TPU,” Guthrie said. “Maia 200 is also the most efficient inference system Microsoft has ever deployed, with 30% better performance per dollar than the latest generation hardware in our fleet today.”
Microsoft said it has already deployed the Maia 200 in its U.S. Central datacenter region in Iowa. It plans to deploy the chip next in the U.S. West 3 datacenter region near Phoenix, with more regions to follow.
“It is designed for the latest models using low-precision compute, with each Maia 200 chip delivering over 10 petaFLOPS in 4-bit precision (FP4) and over 5 petaFLOPS of 8-bit (FP8) performance, all within a 750W SoC TDP envelope,” Guthrie added.
Although Nvidia’s (NVDA) AI chips remain the industry standard for training and inference, the high demand for and high cost of those chips have prompted the three major hyperscalers to develop custom AI hardware of their own.