Nvidia is getting into world models: AI models that take inspiration from the mental models of the world that humans naturally develop.
At the Consumer Electronics Show in Las Vegas, the company announced that it is making publicly available a family of world models that can predict and generate “physics-aware” videos. Nvidia calls this family Cosmos World Foundation Models, or Cosmos WFMs for short.
The models, which can be fine-tuned for specific applications, are available from Nvidia's API and NGC catalogs, as well as on the Hugging Face AI developer platform.
“Nvidia is making the first wave of Cosmos WFMs available for physics-based simulation and synthetic data generation,” the company wrote in a blog post provided to TechCrunch. “Researchers and developers can freely use the Cosmos models regardless of their company size under Nvidia’s permissive open model license, which allows commercial use.”
The Cosmos WFM family includes a range of models divided into three categories: Nano for low-latency and real-time applications; Super for “high-performance basic models”; and Ultra for maximum quality and fidelity.
The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra the largest. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.
As part of Cosmos WFM, Nvidia is also releasing an “upsampling model,” a video decoder optimized for augmented reality, guardrail models to ensure responsible use, and fine-tuned models for applications such as generating sensor data for autonomous vehicle development. These, like the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interaction, environmental, industrial, robotics and driving data, Nvidia claimed. (In AI, “tokens” represent pieces of raw data; in this case, video footage.)
Nvidia wouldn't say where this training data came from, but at least one report, and a lawsuit, alleges that the company trained on copyrighted YouTube videos without permission. We've reached out to Nvidia's press team for comment and will update this article once we hear back.
Nvidia claimed that Cosmos WFM models, given text or video frames as input, can generate “controllable, high-quality” synthetic data to help train models for robotics, self-driving cars, and more.
“Nvidia Cosmos' suite of open models means developers can customize the WFMs to meet the needs of their target application using datasets such as video recordings of autonomous vehicle trips or robots navigating a warehouse,” Nvidia wrote in a press release. “Cosmos WFMs are specifically designed for physical AI research and development and can generate physics-based videos from a combination of inputs such as text, image and video, as well as robot sensor or motion data.”
Nvidia said companies including Waabi, Wayve, Fortellix and Uber have already committed to testing Cosmos WFMs for various use cases, from video search and curation to building AI models for self-driving vehicles.
It is important to note that Nvidia's world models are not “open source” in the strict sense. To meet one widely accepted definition of “open source” AI, an AI model must provide enough information about its design for a person to “substantially” re-create it, and must disclose all relevant details about its training data, including its provenance and how the data can be obtained or licensed.
Nvidia has not released details about the Cosmos WFM training data, nor has it provided all the tools needed to re-create the models from scratch. That's probably why the tech giant calls the models “open” rather than “open source.”