
Google releases Imagen 2, a video clip generator

Google doesn't have the best track record when it comes to image-generating AI.

In February, the image generator built into Gemini, Google's AI-powered chatbot, was found to randomly insert gender and racial diversity into prompts about people, leading to images of racially diverse Nazis, amongst other offensive inaccuracies.

Google withdrew the generator, promising to improve it and eventually re-release it. While we wait for its return, the company is rolling out an upgraded image generation tool, Imagen 2, inside its Vertex AI developer platform — albeit a much more enterprise-focused one. Google announced Imagen 2 at its annual Cloud Next conference in Las Vegas.

Imagen 2 — actually a family of models launched in December after being unveiled at Google's I/O conference in May 2023 — can create and edit images given a text prompt, like OpenAI's DALL-E and Midjourney. Of interest to corporate types: Imagen 2 can render text and logos in multiple languages, and optionally overlay those elements on existing images — for example, on business cards, clothing and products.

After initially launching in preview, image editing with Imagen 2 is now generally available in Vertex AI, along with two new features: inpainting and outpainting. Inpainting and outpainting — features that other popular image generators, including DALL-E, have offered for a while — can be used to remove unwanted parts of an image, add new elements and expand an image's borders to create a wider field of view.
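The mechanics behind inpainting and outpainting are easy to sketch, independent of any particular model: the user supplies a mask marking the editable region, and the generator synthesizes pixels only there. The NumPy sketch below is illustrative only (the function names are mine, not Vertex AI's); it shows how an inpainting mask confines edits to a region, and how outpainting pads the canvas and marks only the new border as editable.

```python
import numpy as np

def inpaint_mask(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Blank out the masked region; a generative model would then synthesize
    new content only where mask is True, leaving the rest of the image intact."""
    out = image.copy()
    out[mask] = fill  # boolean mask over (H, W) broadcasts across channels
    return out

def outpaint_canvas(image: np.ndarray, pad: int):
    """Place the image on a larger canvas; the border becomes the region the
    model is asked to fill, widening the field of view."""
    h, w = image.shape[:2]
    canvas = np.zeros((h + 2 * pad, w + 2 * pad) + image.shape[2:], dtype=image.dtype)
    canvas[pad:pad + h, pad:pad + w] = image
    mask = np.ones(canvas.shape[:2], dtype=bool)
    mask[pad:pad + h, pad:pad + w] = False  # only the new border is editable
    return canvas, mask
```

In both cases the original pixels outside the mask are passed through unchanged, which is why these edits preserve the rest of the picture exactly.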

But the true meat of the Imagen 2 upgrade is what Google calls “text-to-live images.”

Imagen 2 can now create short, four-second videos from text prompts, similar to AI-powered clip generation tools like Runway, Pika and Irreverent Labs. True to Imagen 2's corporate focus, Google pitches live images as a tool for marketers and creatives — something like a GIF generator for ads depicting nature, food and animals, the subject matter Imagen 2 was designed around.

Google says live images can capture "a variety of camera angles and movements" while "supporting consistency across the full sequence." For now, though, they're low-resolution: 360 x 640 pixels. Google promises this will improve in the future.

To address concerns about the potential for creating deepfakes (or at least attempting to), Google says Imagen 2 will use SynthID, an approach developed by Google DeepMind, to apply invisible, cryptographic watermarks to live images. Of course, detecting these watermarks — which Google claims are resistant to edits such as compression, filters and color adjustments — requires a tool provided by Google that isn't available to third parties.
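To make "invisible watermark" concrete: the simplest classical scheme hides a payload in the least-significant bits of pixel values. The toy below illustrates the invisibility part only — unlike SynthID, whose actual embedding method Google hasn't published, an LSB watermark is exactly the kind that does NOT survive compression or filtering, which is why robustness claims are the interesting part.

```python
import numpy as np

def embed_bits(pixels: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide one payload bit per pixel in the least-significant bit of an
    8-bit channel. Imperceptible to the eye, but destroyed by re-encoding."""
    flat = pixels.flatten().copy()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)

def extract_bits(pixels: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n payload bits from an unmodified watermarked image."""
    return pixels.flatten()[:n] & 1
```

Each pixel changes by at most 1 out of 255 levels, so the watermark is invisible; a single round of JPEG compression, however, would scramble the low bits and erase it — hence the value of edit-resistant schemes like SynthID.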

And Google certainly wants to avoid another controversy over generative media, emphasizing that live image generation is "filtered for safety." A spokesperson told TechCrunch via email: "The Imagen 2 model in Vertex AI has not experienced the same issues as the Gemini app. We continue to conduct extensive testing and engage with our customers."

But assuming Google's watermarking technology, bias mitigations and filters are as effective as it claims, are live images even competitive with the video generation tools already available?

Not really.

Runway can generate 18-second clips at much higher resolutions. Stability AI's video clip tool, Stable Video Diffusion, offers greater customizability (in terms of frame rate). And OpenAI's Sora — which admittedly isn't commercially available yet — seems poised to outdo the competition with the photorealism it can achieve.

So what are the real technical advantages of live images? I'm not really sure. And I don't think I'm being too harsh.

After all, Google is behind some genuinely impressive video generation technologies like Imagen Video and Phenaki. Phenaki, one of Google's more interesting text-to-video experiments, turns long, detailed prompts into two-minute-plus "movies" — with the caveat that the clips are low resolution, low frame rate and only somewhat coherent.

Given recent reporting suggesting that the generative AI revolution caught Google CEO Sundar Pichai by surprise, and that the company is still struggling to keep up with the competition, it's not surprising that a product like live images seems like an also-ran. But it's still disappointing. I can't help but feel that there is, or was, a more impressive product lurking in Google's skunkworks.

Models like Imagen are trained on a vast number of examples, usually drawn from public websites and datasets around the web. Many generative AI vendors view training data as a competitive advantage and so keep the data, and information about it, close to the chest. But details about training data are also a potential source of intellectual property lawsuits — another disincentive to reveal much.

I asked, as I always do with announcements about generative AI models, about the data used to train the updated Imagen 2, and whether creators whose work may have been swept up in the model training process will be able to opt out at some point in the future.

Google would only tell me that its models are "primarily" trained on public web data, collected from "blog posts, media transcripts, and public conversation forums." Which blogs, transcripts and forums? It's anyone's guess.

A spokesperson pointed to Google's web publisher controls, which allow webmasters to prevent the company from scraping data, including photos and artwork, from their sites. But Google wouldn't commit to releasing an opt-out tool or, alternatively, compensating creators for their (unwitting) contributions — a step that many of its competitors, including OpenAI, Stability AI and Adobe, have taken.
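For reference, those publisher controls operate through robots.txt: Google's `Google-Extended` product token governs whether a site's content can be used to train its generative models, separately from Search indexing. A minimal example:

```
# robots.txt — opt out of Google AI training without leaving Search
User-agent: Google-Extended
Disallow: /
```

This is opt-out at the site level, which is precisely the complaint: creators whose work is hosted on sites they don't control get no say.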

Another point worth noting: text-to-live images aren't covered by Google's generative AI indemnification policy, which protects Vertex AI customers from copyright claims related to Google's use of training data and the output of its generative AI models. That's because text-to-live images are technically in preview; the policy only covers generative AI products in general availability (GA).

Regurgitation — where a generative model spits out a mirror copy of an example (e.g. an image) it was trained on — is rightly a concern for enterprise customers. Studies, both informal and academic, have shown that the first-generation Imagen, Imagen 2's predecessor, was not immune to this, spitting out identifiable photos of people, copyrighted works by artists and more when prompted in certain ways.

Barring controversy, technical issues or some other major unexpected setback, text-to-live images will eventually reach GA. But with live images as they exist today, Google is basically saying: use at your own risk.
