Deploy generative AI faster with NVIDIA NIM on AWS

Unlock the potential of generative AI and LLMs at scale

Overview

NVIDIA NIM inference microservices integrate closely with AWS managed services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker to enable the deployment of generative AI models at scale. As part of NVIDIA AI Enterprise, available in AWS Marketplace, NVIDIA NIM is a set of easy-to-use microservices designed to accelerate the deployment of generative AI. These prebuilt containers support a broad spectrum of generative AI models, from open-source community models to NVIDIA AI Foundation models and custom models. NIM microservices are deployed with a single command for easy integration into generative AI applications using industry-standard APIs and just a few lines of code. Engineered to facilitate seamless generative AI inferencing at scale, NIM ensures generative AI applications can be deployed anywhere.
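As a sketch of what those few lines of code can look like, the example below queries a self-hosted NIM endpoint through its OpenAI-compatible API. The base URL, API key, and model name are illustrative placeholders for a local Llama 3.1 8B Instruct deployment; substitute the values for your own endpoint.

# Minimal sketch: query a self-hosted NIM microservice via its
# OpenAI-compatible API. Assumes the container is already running and
# listening on localhost:8000; the model name is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)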


Benefits

Self-host the latest generative AI models using NIM on AWS and maintain security and control of generative AI applications and data using AWS's industry-leading security capabilities.
Reduce engineering efforts and costs with prebuilt, continuously updated microservices that ensure access to the latest advancements in generative AI inference technology. Quickly deploy specialized out-of-the-box generative AI applications with support for fine-tuned foundation models (FMs).
Minimize the complexities of generative AI model development with access to the latest generative AI models, industry-standard APIs, and tools.
Increase token generation and responsiveness with low-latency and high-throughput generative AI inference that scales seamlessly in the cloud.
NIM includes dedicated feature branches, rigorous validation processes, regular security updates for common vulnerabilities and exposures (CVEs), and support with direct access to NVIDIA AI experts.

Performance

As part of the NVIDIA AI Enterprise suite of software, NIM goes through exhaustive tuning to ensure the highest-performance configuration for each model. With NIM, throughput rises and latency falls significantly. For example, the NVIDIA Llama 3.1 8B Instruct NIM has achieved a 2.5x improvement in throughput, 4x faster time to first token (TTFT), and 2.2x faster inter-token latency (ITL) compared with the best open-source alternatives.

Stats

2.5x improved throughput on Llama 3.1 8B Instruct with NIM On versus NIM Off

4x faster TTFT on Llama 3.1 8B Instruct with NIM On versus NIM Off

2.2x faster ITL on Llama 3.1 8B Instruct with NIM On versus NIM Off

Features

Prebuilt containers

NIM offers a variety of prebuilt containers and Helm charts that include optimized generative AI models, and it integrates seamlessly with Amazon EKS to deliver high-performance, cost-optimized model-serving infrastructure.

Standardized APIs

Simplify the development, deployment, and scaling of generative AI applications with industry-standard APIs for building powerful copilots, chatbots, and generative AI assistants on AWS. These APIs are compatible with standard deployment processes, so teams can update applications quickly and easily.
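As a hedged illustration of those industry-standard APIs, the sketch below streams tokens from a NIM endpoint for a chatbot-style response using the OpenAI Python client; the base URL and model name are assumptions for a local deployment.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Stream the response token by token, as a chatbot or copilot UI would.
stream = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # placeholder model identifier
    messages=[{"role": "user", "content": "Greet a new user of our support chatbot."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)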

Model support

Deploy custom generative AI models that are fine-tuned to specific industries or use cases. NIM supports generative AI use cases across multiple domains, including LLMs, vision language models (VLMs), and models for speech, images, video, 3D, drug discovery, medical imaging, and more.

Domain-specific

NIM includes domain-specific NVIDIA CUDA libraries and specialized code, covering areas such as speech, language, and video processing.

Inference engines

Optimized using Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM maximizes throughput and decreases latency, reducing the cost of running inference workloads as they scale.

How to get started with NVIDIA NIM on AWS

Deploy production-grade NIM microservices with NVIDIA AI Enterprise running on AWS

Fast and easy generative AI deployment

To get started, users can prototype with accelerated generative AI models in NVIDIA's API catalog at ai.nvidia.com. When ready to deploy, organizations can self-host models with NVIDIA NIM and run them securely on AWS, retaining ownership of their customizations and full control of their intellectual property (IP) and generative AI applications.

Customers can purchase an NVIDIA AI Enterprise license from AWS Marketplace, then go to NVIDIA NGC to access the NIM catalog, download the containers, and bring them to AWS. Deploy NIM on Amazon EC2, Amazon EKS, and Amazon SageMaker using AWS Batch, AWS ParallelCluster, Amazon FSx for Lustre, and Amazon Simple Storage Service (Amazon S3).
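For instance, a NIM container image can be hosted on a SageMaker real-time endpoint with a few lines of the SageMaker Python SDK. This is a minimal sketch, assuming the NIM image has been pushed to Amazon ECR and an NGC API key is available; the image URI, instance type, and key value are placeholders.

import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# Placeholder image URI: a NIM container pulled from NVIDIA NGC and
# pushed to your Amazon ECR registry.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/nim/llama-3.1-8b-instruct:latest",
    role=role,
    env={"NGC_API_KEY": "<your NGC API key>"},  # lets the container fetch model assets
    sagemaker_session=session,
)

# Provision a GPU-backed real-time endpoint (instance type is an assumption).
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)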

Learn more about NVIDIA AI Enterprise

NVIDIA NIM FAQs

What is NVIDIA NIM, and how can I access it on AWS?

Access NVIDIA NIM in NVIDIA AI Enterprise in AWS Marketplace for self-hosted deployment on AWS, including Amazon EC2, Amazon EKS, Amazon SageMaker, and more. Use NVIDIA NIM to deploy generative AI models in optimized containers accelerated by NVIDIA GPUs. NIM is an easy way for developers to integrate generative AI into applications such as copilots, chatbots, and more.

How can I try NVIDIA NIM?

Developers can experiment with NVIDIA microservices at ai.nvidia.com at no charge. For customers who want to self-host a NIM, NVIDIA offers a free 90-day evaluation license for NVIDIA AI Enterprise. NVIDIA also offers free hands-on labs through NVIDIA LaunchPad for customers who want to try NIM without existing infrastructure.

How do I purchase NVIDIA AI Enterprise?

Your AWS bill will include NVIDIA AI Enterprise when it is purchased in AWS Marketplace. Customers can also contact NVIDIA directly via the "private offer" button in the NVIDIA AI Enterprise AWS Marketplace listing to purchase an annual license.

What is included with NVIDIA AI Enterprise support?

NVIDIA AI Enterprise support includes access to NVIDIA AI experts; comprehensive software patches, updates, and upgrades; and technical support. Customized support upgrade options are also available via the "private offer" button in the NVIDIA AI Enterprise AWS Marketplace listing.

How do I deploy NIM on AWS?

Customers can combine AWS managed services with NVIDIA NIM to quickly deploy FMs on Amazon EC2, Amazon EKS, and Amazon SageMaker using AWS Batch, AWS ParallelCluster, Amazon FSx for Lustre, and Amazon S3.

Deploy accelerated generative AI inference at scale with NVIDIA NIM on AWS today