Nvidia (NVDA) said leading cloud providers — Amazon’s (AMZN) AWS, Alphabet’s (GOOG) (GOOGL) Google Cloud, Microsoft (MSFT) Azure and Oracle (ORCL) Cloud Infrastructure — are accelerating AI inference for their customers with Nvidia’s software platform Dynamo.
Dynamo is a software platform that enables multi-node inference across graphics processing unit, or GPU, clusters.
A multi-node system is an architecture in which multiple independent computers, or “nodes,” work together as a single unit. AI inference is the process by which a trained AI model applies its learned knowledge to make predictions or draw conclusions from new, unseen data.
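For illustration, here is a minimal Python sketch of both ideas: a toy model whose layers are split across two hypothetical “nodes,” with an unseen input flowing through them to produce a prediction. The weights are random placeholders rather than a real trained model, and a real cluster would pass activations over a network rather than through a function call.

```python
import numpy as np

# Toy two-"node" pipeline illustrating multi-node inference.
# Each node holds a slice of a trained model's layers; an input
# flows through node 1, then node 2, to yield a prediction.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# "Trained" weights, split across two nodes (hypothetical shapes).
node1_layers = [rng.standard_normal((8, 16)), rng.standard_normal((16, 16))]
node2_layers = [rng.standard_normal((16, 4))]

def node1_forward(x):
    # Node 1 computes the early layers and ships activations onward.
    for w in node1_layers:
        x = relu(x @ w)
    return x

def node2_forward(x):
    # Node 2 finishes the forward pass and emits the prediction.
    return x @ node2_layers[0]

new_input = rng.standard_normal((1, 8))   # a new, "unseen" data point
prediction = node2_forward(node1_forward(new_input))
print(prediction)
```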
Nvidia said Dynamo’s integrations with major cloud providers and support for new Kubernetes management capabilities enable multi-node inference for enterprises, boosting performance and efficiency for complex AI models such as large-scale mixture-of-experts models.
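As a rough sketch of what a mixture-of-experts model does, the toy Python below routes each input through only a few “expert” sub-networks chosen by a small gating network, so just a fraction of the model’s weights run per input. All names, shapes and weights here are illustrative placeholders, not Dynamo’s or any real model’s implementation.

```python
import numpy as np

# Toy mixture-of-experts (MoE) routing: a gating network scores the
# experts, the top-k are selected, and only those experts compute.
rng = np.random.default_rng(1)
dim, n_experts, top_k = 16, 4, 2

gate_w = rng.standard_normal((dim, n_experts))                  # gating network
experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ gate_w                       # one score per expert
    top = np.argsort(scores)[-top_k:]         # indices of the top-k experts
    w_exp = np.exp(scores[top])
    weights = w_exp / w_exp.sum()             # softmax over the chosen experts
    # Only the selected experts run; the rest stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.standard_normal(dim)))
```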
Kubernetes management is the process of deploying, monitoring, securing, and scaling applications and clusters using specialized tools and best practices.
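As one concrete example of such a task, the sketch below uses the official Kubernetes Python client to scale a running inference service. It assumes a cluster reachable through a local kubeconfig; the deployment name “inference-server” and the “default” namespace are placeholders, not anything Dynamo-specific.

```python
from kubernetes import client, config

# Illustrative Kubernetes-management task: scaling a deployment.
# Assumes cluster credentials are available in ~/.kube/config.
config.load_kube_config()
apps = client.AppsV1Api()

# Scale the (hypothetical) inference deployment to 4 replicas.
apps.patch_namespaced_deployment_scale(
    name="inference-server",
    namespace="default",
    body={"spec": {"replicas": 4}},
)
```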
Nvidia noted that with Dynamo now integrated into managed Kubernetes services from all major cloud providers, customers can scale multi-node inference across Nvidia’s Blackwell systems, including GB200 and GB300 NVL72, with the performance and reliability that enterprise AI deployments demand.
Nvidia said Amazon Web Services, or AWS, is accelerating generative AI inference for its customers with Dynamo, which is integrated with Amazon Elastic Kubernetes Service, or EKS. Google Cloud is providing a Dynamo recipe to optimize large language model, or LLM, inference at enterprise scale on its AI Hypercomputer.
Microsoft Azure is enabling multi-node LLM inference with Dynamo and its ND GB200 v6 virtual machines on Azure Kubernetes Service. Oracle Cloud Infrastructure is enabling multi-node LLM inference with OCI Superclusters and Dynamo, Nvidia added.