DeepMind, Google's AI research organization, has unveiled a model that can generate an "endless" variety of playable 3D worlds.
The model, called Genie 2 – the successor to DeepMind's Genie, released earlier this year – can generate an interactive, real-time scene (e.g. "a cute humanoid robot in the forest") from a single image and a text description. In this respect, it is similar to models being developed by Fei-Fei Li's company World Labs and the Israeli startup Decart.
DeepMind claims that Genie 2 can create a "wide range of rich 3D worlds," including worlds in which users can perform actions such as jumping and swimming using the mouse or keyboard. The model, trained on videos, is capable of simulating object interactions, animations, lighting, physics, reflections and the behavior of "NPCs."
Many of Genie 2's simulations look a lot like AAA video games – and the reason may well be that the model's training data includes playthroughs of popular titles. But DeepMind, like many AI labs, declined to reveal much about its data acquisition methods, whether for competitive reasons or otherwise.
One wonders about the impact on intellectual property. As a subsidiary of Google, DeepMind has full access to YouTube, and Google has previously indicated that its terms of service give it permission to use YouTube videos for model training. But does Genie 2 effectively create unauthorized copies of the video games it has "seen"? The courts may have to decide that.
DeepMind says Genie 2 can produce consistent worlds from different perspectives, such as first-person and isometric views, for up to a minute, with most lasting 10 to 20 seconds.
"Genie 2 responds intelligently to actions performed by pressing keys on a keyboard, identifying the character and moving it appropriately," DeepMind wrote in a blog post. "For example, our model (can) figure out that arrow keys should move a robot, not trees or clouds."
Most models like Genie 2 – world models, so to speak – can simulate games and 3D environments, but with artifacts, consistency issues and hallucination problems. Decart's Minecraft simulator Oasis, for instance, runs at low resolution and quickly "forgets" the layout of levels.
Genie 2, however, can remember parts of a simulated scene that are not currently visible and render them accurately when they become visible again. (World Labs' models can do this too.)
That said, games made with Genie 2 wouldn't actually be much fun, as they would wipe your progress every minute or so. For this reason, DeepMind positions the model more as a research and creative tool – a tool for prototyping "interactive experiences" and evaluating AI agents.
"Thanks to Genie 2's out-of-distribution generalization capabilities, concept art and drawings can be transformed into fully interactive environments," DeepMind wrote. "And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate evaluation tasks that agents haven't yet seen during training."
Creatives may have mixed feelings – especially those in the video game industry. A recent Wired investigation found that major players like Activision Blizzard, which has laid off scores of employees, are using AI to cut corners, increase productivity and offset turnover.
Still, Google is pouring more and more resources into its world modeling research, which promises to be the next big thing in AI. In October, DeepMind hired Tim Brooks, who led the development of OpenAI's Sora video generator, to work on video generation technologies and world simulators. And two years ago, the lab poached Tim Rocktäschel, best known for his open-endedness experiments with video games like NetHack, from Meta.