Google is all in on AI, and it wants you to know it. During the company's keynote at its I/O developer conference on Tuesday, Google mentioned "AI" more than 120 times. That's a lot!
But not all of Google's AI announcements were significant per se. Some were incremental. Others were rehashed. So, to separate the wheat from the chaff, we've rounded up the best new AI products and features unveiled at Google I/O 2024.
Generative AI in search
Google plans to use generative AI to organize entire Google Search results pages.
What will AI-organized pages look like? Well, it depends on the search query. But they could display AI-generated summaries of reviews, discussions from social media sites like Reddit, and AI-generated suggestion lists, Google said.
For now, Google plans to show AI-organized results pages when it detects that a user is searching for inspiration, for example when planning a trip. Soon, these results will also appear when users search for dining options and recipes, with results for movies, books, hotels, e-commerce and more to come.
Project Astra and Gemini Live
Google is improving its AI-powered chatbot Gemini so it can better understand the world around it.
The company introduced a new experience in Gemini called Gemini Live, which lets users have "in-depth" voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot is speaking to ask clarifying questions, and it will adapt to their speech patterns in real time. Gemini can also see and respond to users' surroundings, via photos or video captured by their smartphones' cameras.
Gemini Live, which won't launch until later this year, can answer questions about things that are in (or were recently in) the field of view of a smartphone's camera, such as which neighborhood a user is in or what a part of a broken bicycle is called. The technical innovations driving Live come in part from Project Astra, a new initiative within DeepMind to build AI-powered apps and "agents" for real-time, multimodal understanding.
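To make that interaction model concrete, here is a minimal sketch of how an interruptible, camera-aware assistant loop could be structured. Everything in it, from `HypotheticalAssistant` to the queue-based capture, is an illustrative assumption, not Google's actual Gemini Live API.

```python
# Illustrative sketch only; none of these names are Google's real API.
import queue
import time
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float
    pixels: bytes  # raw camera frame (stubbed)

class HypotheticalAssistant:
    """Stand-in for a Gemini Live-style multimodal model."""
    def respond(self, utterance: str, recent_frames: list) -> str:
        return f"(answer grounded in {len(recent_frames)} recent frames)"

def live_session(mic: queue.Queue, camera: queue.Queue) -> None:
    assistant = HypotheticalAssistant()
    frame_buffer = []  # what was recently "in view"
    while True:
        while not camera.empty():          # drain the latest camera frames
            frame_buffer.append(camera.get())
        frame_buffer = frame_buffer[-30:]  # keep roughly the last second
        utterance = mic.get()              # blocks until the user speaks;
        if utterance == "<end>":           # a new utterance would cut off
            break                          # any in-flight reply
        print(assistant.respond(utterance, frame_buffer))

mic, camera = queue.Queue(), queue.Queue()
camera.put(Frame(time.time(), b"<jpeg bytes>"))
mic.put("What is this part of the bike called?")
mic.put("<end>")
live_session(mic, camera)
```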
Veo
Google is taking on OpenAI's Sora with Veo, an AI model that can create 1080p video clips around a minute long from a text prompt.
Veo can capture a range of visual and cinematic styles, including landscape and time-lapse shots, and can make edits and adjustments to footage it has already generated. The model understands camera movements and VFX reasonably well from prompts (think descriptors like "pan," "zoom," and "explosion"). And Veo has some grasp of physics, things like fluid dynamics and gravity, which adds to the realism of the videos it generates.
Veo also supports masked editing for changes to specific areas of a video, and it can generate video from a still image, a la generative models like Stability AI's Stable Video. Perhaps most intriguingly, given a sequence of prompts that together tell a story, Veo can produce longer videos, ones that run beyond a minute.
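That prompt-chaining idea is easy to picture in code. The sketch below shows one way a story told across several prompts might be stitched into a single longer clip; `generate_clip` and its 25-second default are made-up placeholders, not Veo's real interface.

```python
# Hypothetical sketch; generate_clip is not Veo's actual interface.
from dataclasses import dataclass

@dataclass
class Clip:
    prompt: str
    seconds: int

def generate_clip(prompt: str, seconds: int = 25) -> Clip:
    """Placeholder for a text-to-video call producing 1080p footage."""
    return Clip(prompt=prompt, seconds=seconds)

def generate_story(prompts: list) -> list:
    # Each prompt continues the previous scene, so the clips cut together
    # into one longer video rather than three unrelated ones.
    return [generate_clip(p) for p in prompts]

story = generate_story([
    "A cyclist rides a coastal road at sunrise, slow pan left",
    "Zoom in on the bike's front wheel as waves crash behind it",
    "Time-lapse of the sun climbing over the water",
])
print(sum(clip.seconds for clip in story), "seconds")  # 75: past the minute mark
```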
Ask Photos
Google Photos is getting an AI infusion with the launch of an experimental feature called Ask Photos, powered by Google's Gemini family of generative AI models.
Launching later this summer, Ask Photos will let users search their Google Photos collection using natural language queries that leverage Gemini's understanding of their photos' content and other metadata.
For example, instead of searching for something specific in a photo, such as "One World Trade," users will be able to perform much broader, more complex searches, such as "best photo from each of the national parks I have visited." In that example, Gemini would use signals like lighting, blur, and the absence of background distortion to decide what makes a photo the "best" in a given set, then combine that with an understanding of geolocation data to return the relevant images.
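To illustrate how such signals might combine, here is a toy ranking sketch under those assumptions. The `Photo` fields and the `quality` heuristic are invented for illustration and say nothing about how Gemini actually scores images.

```python
# Toy ranking only; the fields and heuristic are invented for illustration.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Photo:
    park: str          # derived from geolocation metadata
    lighting: float    # 0..1, higher means better exposed
    blur: float        # 0..1, higher means blurrier
    distortion: float  # 0..1, background distortion

def quality(p: Photo) -> float:
    # Reward good lighting; penalize blur and background distortion.
    return p.lighting - p.blur - p.distortion

def best_per_park(photos: list) -> dict:
    by_park = defaultdict(list)
    for p in photos:
        by_park[p.park].append(p)
    return {park: max(shots, key=quality) for park, shots in by_park.items()}

photos = [
    Photo("Yosemite", lighting=0.9, blur=0.1, distortion=0.0),
    Photo("Yosemite", lighting=0.6, blur=0.5, distortion=0.2),
    Photo("Zion", lighting=0.8, blur=0.2, distortion=0.1),
]
for park, shot in best_per_park(photos).items():
    print(park, round(quality(shot), 2))  # Yosemite 0.8, Zion 0.5
```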
Gemini in Gmail
Thanks to Gemini, Gmail users will soon be able to search, summarize and draft emails, as well as take action on emails for more complex tasks, such as helping to process returns.
In a demo at I/O, Google showed how a parent could catch up on what's happening at their child's school by asking Gemini to summarize all of the school's recent emails. Beyond the body text of the emails, Gemini also analyzes attachments, such as PDFs, and spits out a summary with key points and action items.
Through a sidebar in Gmail, users can ask Gemini to help them organize receipts from their emails and even place them in a Google Drive folder, or extract information from the receipts and paste it into a spreadsheet. If you do this regularly, for instance as a business traveler tracking expenses, Gemini can offer to automate the workflow for future use, as in the sketch below.
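A rough sketch of that receipt-to-spreadsheet flow follows. The parsing is deliberately naive and every function is a hypothetical stand-in; it is only meant to show the shape of the extract-then-tabulate workflow, not Gmail's or Drive's actual APIs.

```python
# Hypothetical stand-ins; not Gmail's or Drive's real APIs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Receipt:
    vendor: str
    amount: float
    date: str

def extract_receipt(email_body: str) -> Optional[Receipt]:
    """Placeholder for Gemini pulling structured fields from an email."""
    if " Total: $" not in email_body:
        return None  # not a receipt
    vendor, _, rest = email_body.partition(" Total: $")
    amount, _, date = rest.partition(" on ")
    return Receipt(vendor.strip(), float(amount), date.strip())

def to_spreadsheet_rows(emails: list) -> list:
    # One row per receipt, ready to paste into a sheet.
    rows = []
    for body in emails:
        receipt = extract_receipt(body)
        if receipt:
            rows.append((receipt.vendor, receipt.amount, receipt.date))
    return rows

emails = ["Acme Taxi Total: $23.50 on 2024-05-14", "Lunch next week?"]
print(to_spreadsheet_rows(emails))  # [('Acme Taxi', 23.5, '2024-05-14')]
```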
Detecting fraud during a call
Google has previewed an AI-powered feature to alert users to potential scams during a call.
The feature, set to be integrated into a future version of Android, leverages Gemini Nano, the smallest version of Google's generative AI offering, which can run entirely on-device, to listen in real time for "conversation patterns commonly associated with scams."
No specific release date has been set for the feature, and, as with many of these announcements, Google is projecting how much Gemini Nano will eventually be able to do. We do know the feature will be opt-in, which is a good thing: while using Nano means the system won't automatically upload audio to the cloud, it still effectively listens in on users' conversations, a potential privacy risk.
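The on-device argument is easier to see in code. In the sketch below, the transcript never leaves the handler and only a warning decision comes out; the keyword-matching "model" is a deliberately crude stand-in, since Google hasn't said how Gemini Nano actually detects these patterns.

```python
# Crude stand-in for on-device analysis; Gemini Nano's real behavior is unknown.
SCAM_PATTERNS = ("gift card", "wire transfer", "act now", "verify your account")

def on_device_scam_score(transcript_chunk: str) -> float:
    """Scores a chunk locally; the audio/text never leaves the device."""
    text = transcript_chunk.lower()
    hits = sum(pattern in text for pattern in SCAM_PATTERNS)
    return hits / len(SCAM_PATTERNS)

def handle_call(transcript_chunks: list) -> None:
    for chunk in transcript_chunks:
        # Only the yes/no warning decision leaves this function, which is
        # the privacy argument for running the model on-device.
        if on_device_scam_score(chunk) > 0.25:
            print("Warning: this call shows signs of a scam.")
            return

handle_call([
    "Hi, this is your bank's security team.",
    "To fix it, buy a gift card and wire transfer the balance now.",
])
```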
AI for accessibility
Google is adding a bit of generative AI magic to its TalkBack accessibility feature for Android.
Soon, TalkBack will use Gemini Nano to create audio descriptions of objects for blind and low-vision users. For example, TalkBack might describe an item of clothing like this: "A close-up of a black and white gingham dress. The dress is short, with a collar and long sleeves. It ties at the waist with a big bow."
According to Google, TalkBack users come across around 90 unlabeled images per day. Using Nano, the system will be able to describe that content, potentially without anyone needing to enter the information manually.
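As a final illustration, here is a minimal sketch of that fallback logic: read a developer-provided label when one exists, otherwise ask an on-device captioner. `describe_image_locally` and the `ImageNode` type are assumptions made up for this example, not TalkBack's real internals.

```python
# Assumed types and functions, invented for this example.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageNode:
    pixels: bytes
    alt_text: Optional[str] = None  # developer-provided label, if any

def describe_image_locally(pixels: bytes) -> str:
    """Placeholder for a Gemini Nano-style on-device captioner."""
    return "A close-up of a black and white gingham dress with a bow."

def announce(image: ImageNode) -> str:
    if image.alt_text:
        return image.alt_text  # an existing label always wins
    # Roughly 90 images a day reportedly arrive unlabeled; generate one.
    return describe_image_locally(image.pixels)

print(announce(ImageNode(pixels=b"<jpeg bytes>")))
```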