Imagine an AI that not only understands commands, but also carries them out like a human across a series of simulated 3D environments.
This is the goal of DeepMind's SIMA (Scalable Instructable Multiworld Agent).
Unlike traditional AI, which can excel at individual tasks such as strategy games or solving specific problems, SIMA's agents are trained to interpret instructions in human language and translate them into actions using a keyboard and mouse, thereby mimicking how a human interacts with a PC.
This means SIMA aims to understand and execute these commands with the same intuition and flexibility a person would show, whether it's navigating a digital landscape, solving puzzles, or interacting with objects in a game.
Introducing SIMA: the first generalist AI agent that follows natural language instructions in a wide variety of 3D virtual environments and video games. 🕹️
It can perform tasks just like a human, outperforming an agent trained in only one environment. 🧵 https://t.co/qz3IxzUpto pic.twitter.com/02Q6AkW4uq
– Google DeepMind (@GoogleDeepMind) March 13, 2024
At the core of this project is a vast and diverse dataset of human gameplay in research environments and commercial video games.
SIMA has been trained and tested on a portfolio of nine video games through collaboration with eight game studios, including well-known titles such as No Man's Sky and Teardown. Each game challenges SIMA with different skills, from simple navigation and resource gathering to more complex activities such as crafting and spaceship piloting.
SIMA's training also included four research environments to evaluate its physical interaction and object manipulation skills.
In terms of architecture, SIMA uses pre-trained vision and video prediction models that are fine-tuned to the specific 3D settings of its gaming portfolio.
Unlike traditional game AIs, SIMA doesn't require access to source code or custom APIs. It receives screen images and user-provided instructions and uses keyboard and mouse actions to perform tasks.
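That interface contract can be illustrated with a minimal sketch. Everything here is hypothetical (the class names, fields, and the toy `act` policy are invented for illustration and do not reflect SIMA's actual implementation); the point is only that the agent's input is pixels plus text and its output is generic keyboard/mouse actions, with no game-specific API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a SIMA-style agent's I/O contract:
# input is a raw screen frame plus a natural-language instruction,
# output is generic keyboard/mouse actions -- no game source code
# or custom API access is assumed.

@dataclass
class Observation:
    screen_pixels: bytes   # raw frame captured from the game window
    instruction: str       # e.g. "chop down the tree"

@dataclass
class Action:
    keys: List[str] = field(default_factory=list)  # keys to press
    mouse_dx: int = 0      # relative horizontal mouse movement
    mouse_dy: int = 0      # relative vertical mouse movement
    click: bool = False    # left-click this frame?

def act(obs: Observation) -> Action:
    # Placeholder policy: a real agent maps pixels + text to actions
    # with a learned model; this stub only demonstrates the interface.
    if "forward" in obs.instruction.lower():
        return Action(keys=["w"])
    return Action()

step = act(Observation(screen_pixels=b"", instruction="move forward"))
print(step.keys)  # -> ['w']
```

The key design point the sketch captures is that the action space (keys, mouse deltas, clicks) is the same across every game, which is what lets a single agent transfer between environments.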
In its evaluation phase, SIMA demonstrated proficiency in 600 basic skills, including navigation, object interaction, and menu usage.
What sets SIMA apart is its universality. This AI is not trained to master a single game or solve a particular set of problems.
Instead, DeepMind teaches it to be adaptable: to understand instructions and act accordingly across different virtual worlds.
DeepMind's Tim Harley explained: "It's still a research project," but in the future "one could imagine agents like SIMA someday playing alongside you and your friends in games."
SIMA only requires the images provided by the 3D environment and natural language instructions provided by the user. 🖱️
Its mouse and keyboard output is assessed across 600 skills, covering areas such as navigation and object interaction – for example "turning left" or "cutting down a tree"…. pic.twitter.com/PEPfLZv2o0
– Google DeepMind (@GoogleDeepMind) March 13, 2024
SIMA masters the art of understanding our instructions and acting accordingly by grounding language in perception and action.
DeepMind has an extensive gaming legacy, most famously AlphaGo, which defeated several high-profile players of the famously complex board game Go.
However, SIMA reaches beyond video games, moving closer to the dream of truly intelligent, instructable AI agents that blur the lines between human and machine understanding.