
Nvidia introduces NIM to make deploying AI models in production smoother

At its GTC conference today, Nvidia announced Nvidia NIM, a new software platform designed to streamline the deployment of custom and pre-trained AI models in production environments. NIM takes the software work Nvidia has done around inferencing and optimizing models and makes it easily accessible by combining a given model with an optimized inference engine and then packaging the two into a container that is exposed as a microservice.

Normally, Nvidia argues, it would take developers weeks – if not months – to ship similar containers, and that is assuming the company even has in-house AI talent. With NIM, Nvidia is clearly aiming to create an ecosystem of AI-ready containers that use its hardware as the foundational layer and these curated microservices as the core software layer for companies looking to speed up their AI roadmap.

NIM currently supports models from Nvidia, AI21, Adept, Cohere, Getty Images and Shutterstock, as well as open models from Google, Hugging Face, Meta, Microsoft, Mistral AI and Stability AI. Nvidia is already working with Amazon, Google and Microsoft to make these NIM microservices available on SageMaker, Kubernetes Engine and Azure AI, respectively. They are also integrated into frameworks such as Deepset, LangChain and LlamaIndex.


“We believe that the Nvidia GPU is the best place to run inference of these models on (…) and we believe that NVIDIA NIM is the best software package, the best runtime, for developers to build on so that they can focus on the enterprise applications – and just let Nvidia do the work of producing those models for them in the most efficient and enterprise-ready way so that they can just do the rest of their work,” said Manuvir Das, head of enterprise computing at Nvidia, during a press conference ahead of today’s announcements.

Nvidia will use the Triton Inference Server, TensorRT and TensorRT-LLM as the inference engine. Nvidia microservices available through NIM include Riva for customizing speech and translation models, cuOpt for routing optimizations, and the Earth-2 model for weather and climate simulations.

The company plans to add more capabilities over time, including, for example, making the Nvidia RAG LLM operator available as a NIM, which should significantly simplify the development of generative AI chatbots that can retrieve custom data.

This wouldn't be a developer conference without a few customer and partner announcements. Current NIM users include Box, Cloudera, Cohesity, Datastax, Dropbox and NetApp.

“Established enterprise platforms have a goldmine of data that can be turned into generative AI copilots,” said Jensen Huang, founder and CEO of NVIDIA. “These containerized AI microservices, developed with our partner ecosystem, are the building blocks for companies in every industry to become AI companies.”

