As Amazon makes a major move into AI with its new Nova family of foundation models, Google is doubling down on its own multimodal AI capabilities. The tech giant's cloud division has announced that its latest video and image generation models, Veo and Imagen 3, are now available on Vertex AI.
The move lets teams integrate cutting-edge video and image generation into their AI workflows, unlocking a wide range of use cases, particularly in marketing and advertising. It also makes Google Cloud the first hyperscaler to offer its customers a video model.
While Veo is currently in private preview, Imagen 3 will be generally available to all Vertex AI users starting next week. Notably, Imagen 3 also includes editing features that let users refine generated images to meet specific creative needs.
What do Veo and Imagen 3 offer?
First introduced at Google's I/O developer conference, Veo is Google DeepMind's answer to competitors like Runway's Gen-3 and OpenAI's Sora, offering a sophisticated video generation experience. The model converts text or image prompts into cinematic, high-resolution videos in a variety of visual styles, generating clips more than 60 seconds long. What sets it apart is its frame-level consistency, which ensures that subjects move seamlessly within shots.
Imagen 3, also from DeepMind, handles text-to-image generation and creates photorealistic images in a variety of styles. Google claims it outperforms its predecessors in detail, lighting accuracy, and artifact reduction.
Beyond generation, users on Google's allowlist can access advanced customization options with Imagen 3. These include image upscaling, inpainting, outpainting and background replacement, all controlled by text prompts. Users can also provide reference images, allowing Imagen 3 to create content that aligns with specific brand aesthetics, logos, or product features.
Wider impact on industry
Vertex AI has long been Google Cloud's flagship platform for streamlining the development and deployment of AI applications. By integrating Veo and Imagen 3, the platform offers companies an even more comprehensive suite of tools to innovate in marketing, sales and beyond.
For example, Imagen 3 makes it easy to create high-value assets like product images and social media content, while Veo extends this capability by letting teams turn those visuals into polished videos. This speeds up production, reduces costs, and shortens prototyping cycles, allowing teams to quickly evolve their creative strategies.
“Customers like Agoda are leveraging the ability of AI models like Veo, Gemini and Imagen to optimize the production of their video ads, significantly reducing production time,” said Warren Barkley, senior director of product management at Google, in a blog post. He also emphasized that both models include safety measures such as digital watermarking and content moderation guardrails to mitigate the risks associated with generative AI.
Other early adopters include Mondelez International – owner of brands such as Oreo, Cadbury and Milka – and global marketing and communications services provider WPP. As Google's foundation models expand their reach, companies across industries have an opportunity to rethink the way they create and deliver visual content.
Competition continues to intensify
While all major cloud providers, including Google Cloud, Amazon Web Services and Microsoft Azure, offer image generation models on their respective AI orchestration platforms, video generation has so far been a rarity. Google's move to launch Veo in private preview today changes that.
Interestingly, shortly after the Veo announcement, AWS made a splash at re:Invent by announcing Nova Reel, a foundation model that generates six-second, studio-quality videos from text and image prompts.
This model, along with the other models in the Nova family, is expected to be available through Amazon Bedrock, the company's fully managed service designed to simplify building and deploying generative AI applications.
Microsoft, on the other hand, appears to be lagging behind in this category for the moment. Its AI Foundry does not include any video generation models. However, we expect that to change once OpenAI's Sora hits the market.