
Google DeepMind introduces a brand new video model that rivals Sora

Google DeepMind, Google's flagship AI research lab, wants to beat OpenAI at video generation – and it may have done so, at least for a while.

On Monday, DeepMind announced Veo 2, a next-generation video-generating AI and the successor to Veo, which powers a growing number of products across Google's portfolio. Veo 2 can create clips longer than two minutes in resolutions up to 4K (4096 x 2160 pixels).

Notably, that's four times the resolution – and more than six times the duration – OpenAI's Sora can achieve.

Admittedly, it's a theoretical advantage for now. In Google's experimental video creation tool VideoFX, where Veo 2 is currently exclusive, videos are capped at 720p and eight seconds in length. (Sora can produce up to 1080p, 20-second clips.)

Veo 2 in VideoFX. Photo credit: Google

VideoFX is waitlisted, but Google says it's expanding the number of users who can access it this week.

Eli Collins, VP of Product at DeepMind, also told TechCrunch that Google will make Veo 2 available through its Vertex AI developer platform "once the model is ready for use at scale."

"In the coming months, we'll continue to iterate based on user feedback," Collins said, "and (we) will look to integrate Veo 2's updated capabilities into compelling use cases across the Google ecosystem… (We) expect to share more updates next year."

More controllable

Like Veo, Veo 2 can create videos from a text prompt (e.g. "A car speeding down a highway") or from text and a reference image.

So what's new in Veo 2? According to DeepMind, the model, which can generate clips in a range of styles, has an improved "understanding" of physics and camera controls, and produces "clearer" footage.

By clearer, DeepMind means that textures and images in clips are sharper – especially in scenes with a lot of movement. The improved camera controls let Veo 2 position the virtual "camera" in the videos it generates more precisely, and move that camera to capture objects and people from different angles.

DeepMind also claims that Veo 2 can more realistically model motion, fluid dynamics (e.g. coffee being poured into a cup), and properties of light (e.g. shadows and reflections). That includes different lenses and cinematic effects, DeepMind says, as well as "nuanced" human expression.

Veo 2 sample. Note that the compression artifacts were introduced when converting the clip to a GIF. Photo credit: Google

DeepMind shared a few select examples from Veo 2 with TechCrunch last week. For AI-generated videos, they looked pretty good – exceptionally good, even. Veo 2 seems to have a keen grasp of light refraction and tricky liquids like maple syrup, and a knack for emulating Pixar-style animation.

But despite DeepMind's insistence that the model is less likely to hallucinate elements like extra fingers or "unexpected objects," Veo 2 can't quite clear the uncanny valley.

Note the lifeless eyes of this dog-like cartoon creature:

Veo 2 sample. Photo credit: Google

And note the strangely slippery road in this footage – plus the pedestrians in the background merging into one another, and the buildings with physically impossible facades:

Veo 2 sample. Photo credit: Google

Collins admitted there's still work to be done.

"Coherence and consistency are areas for growth," he said. "Veo can consistently follow a prompt for a couple of minutes, but not complex prompts over longer horizons. Likewise, character consistency can be a challenge. There's also room to improve in generating intricate details, fast and complex motions, and in further pushing the boundaries of realism."

DeepMind continues to work with artists and producers to refine its models and video generation tools, Collins added.

"Since the beginning of our Veo development, we've been working with creatives like Donald Glover, The Weeknd, d4vd and others to really understand their creative process and how technology can help bring their vision to life," Collins said. "Our work with creators on Veo 1 informed the development of Veo 2, and we look forward to working with trusted testers and creators to get feedback on this new model."

Safety and training

Veo 2 was trained on many examples of videos. That's generally how AI models work: fed example after example of some form of data, the models pick up on patterns in that data, which allows them to generate new data.

DeepMind won't say exactly where it scraped the videos to train Veo 2, but YouTube is one likely source; Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo "may" be trained on some YouTube content.

"Veo has been trained on high-quality video-description pairings," said Collins. "Video-description pairs are a video and an associated description of what happens in that video."

Veo 2 sample. Photo credit: Google

While DeepMind, through Google, offers tools that let webmasters block the lab's bots from scraping training data from their websites, DeepMind doesn't offer a mechanism for creators to remove works from its existing training sets. The lab and its parent company maintain that training models on public data is fair use, meaning DeepMind believes it isn't obligated to ask permission from data owners.

Not all creatives agree – particularly in light of studies estimating that tens of thousands of film and TV jobs could be disrupted by AI in the coming years. Several AI companies, including the eponymous startup behind the popular AI art app Midjourney, are in the crosshairs of lawsuits accusing them of infringing on artists' rights by training on content without consent.

"We're committed to working collaboratively with creators and our partners to achieve common goals," Collins said. "We continue to engage with the creative community and others across the industry, gathering insights and listening to feedback, including those who use VideoFX."

Because of the way today's generative models behave when trained, they carry certain risks, such as regurgitation, where a model generates a mirror copy of its training data. DeepMind's solution is prompt-level filters, including for violent, graphic, and explicit content.

Google's indemnity policy, which provides certain customers a defense against allegations of copyright infringement arising from the use of its products, won't apply to Veo 2 until it's generally available, Collins said.

Veo 2 sample. Photo credit: Google

To mitigate the risk of deepfakes, DeepMind says it's using its proprietary watermarking technology, SynthID, to embed invisible markers into the frames Veo 2 generates. However, like all watermarking tech, SynthID isn't foolproof.

Image upgrades

In addition to Veo 2, Google DeepMind this morning announced upgrades to Imagen 3, its commercial image generation model.

Starting Monday, a new edition of Imagen 3 is rolling out to users of ImageFX, Google's image generation tool. According to DeepMind, it can create "brighter, better composed" images and photos in styles such as photorealism, impressionism, and anime.

"This upgrade (to Imagen 3) also follows prompts more faithfully, and renders richer details and textures," DeepMind wrote in a blog post shared with TechCrunch.

Google ImageFX
Photo credit: Google

UI updates are rolling out to ImageFX alongside the model. Now, when users type prompts, key terms in those prompts become "chiplets" with a drop-down menu of suggested related words. Users can use the chips to iterate on what they've written, or choose from a row of automatically generated descriptors beneath the prompt.
