Jensen Huang, CEO of Nvidia, gave a keynote speech at the Computex trade show in Taiwan about transforming AI models with Nvidia NIM (Nvidia Inference Microservices) so that AI applications can be deployed in minutes instead of weeks.
He said the world's 28 million developers can now download Nvidia NIM – inference microservices that provide models as optimized containers – and deploy them in clouds, data centers, or workstations, allowing them to easily build generative AI applications for copilots, chatbots, and more in minutes rather than weeks.
These new generative AI applications are becoming increasingly complex and often use multiple models with different capabilities to generate text, images, video, speech, and more. Nvidia NIM significantly increases developer productivity by providing a simple, standardized way to integrate generative AI into applications.
NIM also enables enterprises to maximize their infrastructure investments. For example, running Meta Llama 3-8B in a NIM produces up to 3x more generative AI tokens on accelerated infrastructure than without NIM. This lets enterprises boost efficiency and generate more responses with the same amount of compute infrastructure.
Nearly 200 technology partners – including Cadence, Cloudera, Cohesity, DataStax, NetApp, Scale AI, and Synopsys – are integrating NIM into their platforms to speed up generative AI deployments for domain-specific applications such as copilots, code assistants, digital human avatars, and more. Hugging Face now offers NIM – starting with Meta Llama 3.
“Every company wants to integrate generative AI into its operations, but not every company has a dedicated team of AI researchers,” said Huang. “Integrated into platforms everywhere, accessible to developers everywhere, running anywhere – Nvidia NIM helps the technology industry make generative AI accessible to every organization.”
Enterprises can deploy AI applications in production with NIM through the Nvidia AI Enterprise software platform. Starting next month, Nvidia Developer Program members can access NIM free of charge for research, development, and testing on their preferred infrastructure.
More than 40 microservices power generative AI models
NIM containers are pre-built to speed up model deployment for GPU-accelerated inference and can include Nvidia CUDA software, Nvidia Triton Inference Server, and Nvidia TensorRT-LLM software.
Over 40 Nvidia and community models are available to try as NIM endpoints on ai.nvidia.com, including Databricks DBRX, Google's open model Gemma, Meta Llama 3, Microsoft Phi-3, Mistral Large, Mixtral 8x22B, and Snowflake Arctic.
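These hosted endpoints follow an OpenAI-compatible chat-completions convention, so trying one takes only a few lines of Python. Below is a minimal sketch, assuming the integrate.api.nvidia.com gateway and the meta/llama3-8b-instruct model name used in Nvidia's API catalog, with an API key generated at ai.nvidia.com; exact URLs and model identifiers may differ.

```python
# Minimal sketch: calling a hosted Llama 3 NIM endpoint from Nvidia's API catalog.
# Assumes an OpenAI-compatible chat-completions route; the base URL, model name,
# and NVIDIA_API_KEY environment variable are illustrative.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # API catalog gateway
    api_key=os.environ["NVIDIA_API_KEY"],            # key generated at ai.nvidia.com
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # one of the 40+ hosted models
    messages=[{"role": "user", "content": "Summarize what a NIM microservice is."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the route mirrors the OpenAI API, existing client code can typically be pointed at a NIM endpoint by changing only the base URL and model name.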
Developers can now access Nvidia NIM microservices for Meta Llama 3 models from the Hugging Face AI platform, making it easy to run the Llama 3 NIM with just a few clicks using Hugging Face Inference Endpoints powered by Nvidia GPUs in their preferred cloud.
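What querying such a deployment might look like is sketched below, under two assumptions: a recent huggingface_hub release that includes chat_completion, and a placeholder URL standing in for the endpoint address Hugging Face assigns after deployment.

```python
# Minimal sketch: querying a Llama 3 NIM deployed via Hugging Face Inference
# Endpoints. The endpoint URL and token are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://your-endpoint.endpoints.huggingface.cloud",  # hypothetical URL
    token="hf_...",  # your Hugging Face access token
)

reply = client.chat_completion(
    messages=[{"role": "user", "content": "Hello, Llama 3!"}],
    max_tokens=128,
)
print(reply.choices[0].message.content)
```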
Companies can use NIM to run applications that generate text, images, video, speech, and digital humans. Nvidia BioNeMo NIM microservices for digital biology enable researchers to build novel protein structures to accelerate drug discovery.
Dozens of healthcare companies are using NIM to enable generative AI inference in a range of applications, including surgical planning, digital assistants, drug discovery, and clinical trial optimization.
Hundreds of AI ecosystem partners integrate NIM
Platform providers such as Canonical, Red Hat, Nutanix, and VMware (acquired by Broadcom) support NIM on open-source KServe or enterprise solutions. AI application companies Hippocratic AI, Glean, Kinetica, and Redis also use NIM to enable generative AI inference.
Leading AI tools and MLOps partners – including Amazon SageMaker, Microsoft Azure AI, Dataiku, DataRobot, deepset, Domino Data Lab, LangChain, Llama Index, Replicate, Run.ai, Securiti AI, and Weights & Biases – have also integrated NIM into their platforms, enabling developers to build and deploy domain-specific generative AI applications with optimized inference.
Global systems integrators and service delivery partners Accenture, Deloitte, Infosys, Latentview, Quantiphi, SoftServe, TCS, and Wipro have developed NIM competencies to help organizations worldwide rapidly develop and deploy production AI strategies.
Enterprises can run NIM-enabled applications virtually anywhere, including on Nvidia-certified systems from global infrastructure manufacturers Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro, as well as server manufacturers ASRock Rack, Asus, Gigabyte, Ingrasys, Inventec, Pegatron, QCT, Wistron, and Wiwynn. NIM microservices have also been integrated into Amazon Web Services, Google Cloud, Azure, and Oracle Cloud infrastructure.
Industry leaders Foxconn, Pegatron, Amdocs, Lowe's, and ServiceNow are among the companies using NIM for generative AI applications in manufacturing, healthcare, financial services, retail, customer service, and more.
Foxconn – the world’s largest electronics manufacturer – uses NIM to develop domain-specific LLMs embedded in a wide range of internal systems and processes across its AI factories for smart manufacturing, smart cities, and smart electric vehicles.
Developers can experiment with Nvidia microservices free of charge at ai.nvidia.com. Enterprises can deploy production-ready NIM microservices with Nvidia AI Enterprise on Nvidia-certified systems and leading cloud platforms. Starting next month, Nvidia Developer Program members will get free access to NIM for research and testing.
Nvidia Certified Systems Program
Driven by generative AI, companies around the globe are creating “AI factories” that take in data and output intelligence.
Nvidia is making its technology an important part of enabling companies to deploy validated systems and reference architectures that reduce the risk and time required to deploy specialized infrastructure that can support complex, compute-intensive generative AI workloads.
Nvidia also announced today an expansion of its Nvidia Certified Systems program, which designates leading partner systems as ready for AI and accelerated computing, enabling customers to easily deploy these platforms from the data center to the edge.
Two new certification types are now included: Nvidia-certified Spectrum-X Ready systems for AI in the data center and Nvidia-certified IGX systems for AI at the edge. Each Nvidia-certified system undergoes rigorous testing and is validated to deliver enterprise-grade performance, manageability, security, and scalability for Nvidia AI Enterprise software workloads, including generative AI applications built with Nvidia NIM (Nvidia Inference Microservices). The systems provide a trusted way to design and implement efficient, reliable infrastructure.
The world's first Ethernet fabric designed for AI, the Nvidia Spectrum-X AI Ethernet platform combines the Nvidia Spectrum-4 SN5000 Ethernet switch series, Nvidia BlueField-3 SuperNICs, and network acceleration software to deliver 1.6x the AI network performance of traditional Ethernet fabrics.
Nvidia-certified Spectrum-X Ready servers serve as building blocks for high-performance AI compute clusters, supporting the powerful Nvidia Hopper architecture and Nvidia L40S GPUs.
Nvidia-certified IGX systems
Nvidia IGX Orin is an enterprise-ready AI platform for industrial edge and medical applications that features industrial-grade hardware, a production-grade software stack, and long-term enterprise support.
It includes the latest device security, remote provisioning, and management technologies, along with built-in extensions, to deliver powerful AI and proactive security for real-time, low-latency applications in areas such as medical diagnostics, manufacturing, industrial robotics, agriculture, and more.
Leading Nvidia ecosystem partners will receive the new certifications. Asus, Dell Technologies, Gigabyte, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, and Supermicro will soon offer the certified systems, and certified IGX systems will soon be available from Adlink, Advantech, Aetina, Ahead, Cosmo Intelligent Medical Devices (a division of Cosmo Pharmaceuticals), Dedicated Computing, Leadtek, Onyx, and Yuan.
Nvidia also said that deploying generative AI in the enterprise will be easier than ever. Nvidia NIM, a set of generative AI inference microservices, will work with KServe, open-source software that automates the deployment of AI models at the scale of a cloud computing application.
The combination ensures that generative AI can be deployed like any other large enterprise application. In addition, NIM is available through platforms from dozens of companies such as Canonical, Nutanix, and Red Hat.
Integrating NIM into KServe extends Nvidia's technologies to the open source community, ecosystem partners, and customers. Through NIM, they can all access the power, support, and security of the Nvidia AI Enterprise software platform with an API call, the push button of modern programming.
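As a concrete illustration of the integration, the sketch below registers a NIM container with KServe as an InferenceService using the Kubernetes Python client. The container image path, GPU request, and service name are illustrative, and it assumes the serving.kserve.io/v1beta1 CRD is installed on the cluster.

```python
# Minimal sketch: deploying a NIM container through KServe's InferenceService CRD.
# Image path, resources, and names are illustrative; check the NGC catalog for
# the exact registry path and tag of a given NIM.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama3-nim", "namespace": "default"},
    "spec": {
        "predictor": {
            "containers": [
                {
                    "name": "kserve-container",  # KServe's expected container name
                    "image": "nvcr.io/nim/meta/llama3-8b-instruct:latest",  # hypothetical
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                }
            ]
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)
```

Once the service reports ready, KServe handles routing and autoscaling for the NIM like any other model it manages.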
Meanwhile, Huang said Meta Llama 3, Meta's freely available, cutting-edge large language model – trained and optimized with accelerated computing from Nvidia – is dramatically improving workflows in healthcare and life sciences and helping deliver applications designed to enhance patients' lives.
Now available as a downloadable Nvidia NIM inference microservice at ai.nvidia.com, Llama 3 enables developers, researchers, and healthcare companies to responsibly innovate across a wide range of applications. The NIM has a standard application programming interface that can be deployed anywhere.
For use cases ranging from surgical planning and digital assistants to drug discovery and clinical trial optimization, Llama 3 enables developers to easily deploy optimized generative AI models for copilots, chatbots, and more.
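Because a NIM exposes the same standard API wherever it runs, the hosted example earlier carries over to a self-hosted deployment with only the base URL changed. A minimal sketch, assuming a Llama 3 NIM container is already running locally and serving the OpenAI-compatible route on port 8000 described in Nvidia's NIM documentation:

```python
# Minimal sketch: the same OpenAI-style call pointed at a locally running NIM
# container instead of the hosted catalog. Host and port are assumptions;
# adjust them to match your deployment.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

answer = local.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "List three uses of LLMs in clinical trials."}],
    max_tokens=200,
)
print(answer.choices[0].message.content)
```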