Google is all in on AI, and it wants you to know it. During the company's keynote at its I/O developer conference on Tuesday, Google mentioned "AI" more than 120 times. That's a lot!
But not all of Google's AI announcements were significant per se. Some were incremental. Others were rehashed. So, to separate the wheat from the chaff, we've rounded up the best new AI products and features unveiled at Google I/O 2024.
Generative AI in search
Google plans to use generative AI to organize entire Google Search results pages.
What will AI-organized pages look like? Well, it depends on the search query. But they could display AI-generated summaries of reviews, discussions from social media sites like Reddit, and AI-generated suggestion lists, Google said.
For now, Google plans to show AI-organized results pages when it detects that a user is searching for inspiration, for example when planning a trip. Soon, these results will also appear when users search for dining options and recipes, with results for movies, books, hotels, e-commerce and more to come.
Project Astra and Gemini Live
Google is improving its AI-powered chatbot Gemini so it can better understand the world around it.
The company introduced a new experience in Gemini called Gemini Live, which lets users have "in-depth" voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot is speaking to ask clarifying questions, and it will adapt to their speech patterns in real time. Gemini can also see and respond to users' surroundings, via photos or video captured by their smartphones' cameras.
Gemini Live, which won't launch until later this year, can answer questions about things that are in (or were recently in) the field of view of a smartphone's camera, such as which neighborhood a user is in or what a part of a broken bicycle is called. The technical innovations driving Live come in part from Project Astra, a new initiative within DeepMind to build AI-powered apps and "agents" for real-time, multimodal understanding.
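To make that interaction model concrete, here is a minimal sketch of how an interruptible, camera-aware assistant loop could be structured. Everything in it, from `HypotheticalAssistant` to the queue-based capture, is an illustrative assumption, not Google's actual Gemini Live API.

```python
# Illustrative sketch only; none of these names are Google's real API.
import queue
import time
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float
    pixels: bytes  # raw camera frame (stubbed)

class HypotheticalAssistant:
    """Stand-in for a Gemini Live-style multimodal model."""
    def respond(self, utterance: str, recent_frames: list) -> str:
        return f"(answer grounded in {len(recent_frames)} recent frames)"

def live_session(mic: queue.Queue, camera: queue.Queue) -> None:
    assistant = HypotheticalAssistant()
    frame_buffer = []  # what was recently "in view"
    while True:
        while not camera.empty():          # drain the latest camera frames
            frame_buffer.append(camera.get())
        frame_buffer = frame_buffer[-30:]  # keep roughly the last second
        utterance = mic.get()              # blocks until the user speaks;
        if utterance == "<end>":           # a new utterance would cut off
            break                          # any in-flight reply
        print(assistant.respond(utterance, frame_buffer))

mic, camera = queue.Queue(), queue.Queue()
camera.put(Frame(time.time(), b"<jpeg bytes>"))
mic.put("What is this part of the bike called?")
mic.put("<end>")
live_session(mic, camera)
```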
Veo
Google is taking on OpenAI's Sora with Veo, an AI model that can create 1080p video clips around a minute long from a text prompt.
Veo can capture a range of visual and cinematic styles, including landscape and time-lapse shots, and can make edits and adjustments to footage it has already generated. The model understands camera movements and VFX reasonably well from prompts (think descriptors like "pan," "zoom," and "explosion"). And Veo has some grasp of physics, things like fluid dynamics and gravity, which adds to the realism of the videos it generates.
Veo also supports masked editing for changes to specific areas of a video, and it can generate video from a still image, a la generative models like Stability AI's Stable Video. Perhaps most intriguingly, given a sequence of prompts that together tell a story, Veo can produce longer videos, ones that run beyond a minute.
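That prompt-chaining idea is easy to picture in code. The sketch below shows one way a story told across several prompts might be stitched into a single longer clip; `generate_clip` and its 25-second default are made-up placeholders, not Veo's real interface.

```python
# Hypothetical sketch; generate_clip is not Veo's actual interface.
from dataclasses import dataclass

@dataclass
class Clip:
    prompt: str
    seconds: int

def generate_clip(prompt: str, seconds: int = 25) -> Clip:
    """Placeholder for a text-to-video call producing 1080p footage."""
    return Clip(prompt=prompt, seconds=seconds)

def generate_story(prompts: list) -> list:
    # Each prompt continues the previous scene, so the clips cut together
    # into one longer video rather than three unrelated ones.
    return [generate_clip(p) for p in prompts]

story = generate_story([
    "A cyclist rides a coastal road at sunrise, slow pan left",
    "Zoom in on the bike's front wheel as waves crash behind it",
    "Time-lapse of the sun climbing over the water",
])
print(sum(clip.seconds for clip in story), "seconds")  # 75: past the minute mark
```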
Ask Photos
Google Photos is getting an AI infusion with the launch of an experimental feature called Ask Photos, powered by Google's Gemini family of generative AI models.
Launching later this summer, Ask Photos will let users search their Google Photos collection using natural language queries that leverage Gemini's understanding of their photos' content and other metadata.
For example, instead of searching for something specific in a photo, such as "One World Trade," users will be able to perform much broader, more complex searches, such as "best photo from each of the national parks I have visited." In that example, Gemini would use signals like lighting, blur, and the absence of background distortion to decide what makes a photo the "best" in a given set, then combine that with an understanding of geolocation data to return the relevant images.
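To illustrate how such signals might combine, here is a toy ranking sketch under those assumptions. The `Photo` fields and the `quality` heuristic are invented for illustration and say nothing about how Gemini actually scores images.

```python
# Toy ranking only; the fields and heuristic are invented for illustration.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Photo:
    park: str          # derived from geolocation metadata
    lighting: float    # 0..1, higher means better exposed
    blur: float        # 0..1, higher means blurrier
    distortion: float  # 0..1, background distortion

def quality(p: Photo) -> float:
    # Reward good lighting; penalize blur and background distortion.
    return p.lighting - p.blur - p.distortion

def best_per_park(photos: list) -> dict:
    by_park = defaultdict(list)
    for p in photos:
        by_park[p.park].append(p)
    return {park: max(shots, key=quality) for park, shots in by_park.items()}

photos = [
    Photo("Yosemite", lighting=0.9, blur=0.1, distortion=0.0),
    Photo("Yosemite", lighting=0.6, blur=0.5, distortion=0.2),
    Photo("Zion", lighting=0.8, blur=0.2, distortion=0.1),
]
for park, shot in best_per_park(photos).items():
    print(park, round(quality(shot), 2))  # Yosemite 0.8, Zion 0.5
```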
Gemini in Gmail
Thanks to Gemini, Gmail users will soon be able to search, summarize and draft emails, as well as take action on emails for more complex tasks, such as helping to process returns.
In a demo at I/O, Google showed how a parent could catch up on what's happening at their child's school by asking Gemini to summarize all of the school's recent emails. Beyond the body text of the emails, Gemini also analyzes attachments, such as PDFs, and spits out a summary with key points and action items.
Through a sidebar in Gmail, users can ask Gemini to help them organize receipts from their emails and even place them in a Google Drive folder, or extract information from the receipts and paste it into a spreadsheet. If you do this regularly, for instance as a business traveler tracking expenses, Gemini can offer to automate the workflow for future use, as in the sketch below.
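A rough sketch of that receipt-to-spreadsheet flow follows. The parsing is deliberately naive and every function is a hypothetical stand-in; it is only meant to show the shape of the extract-then-tabulate workflow, not Gmail's or Drive's actual APIs.

```python
# Hypothetical stand-ins; not Gmail's or Drive's real APIs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Receipt:
    vendor: str
    amount: float
    date: str

def extract_receipt(email_body: str) -> Optional[Receipt]:
    """Placeholder for Gemini pulling structured fields from an email."""
    if " Total: $" not in email_body:
        return None  # not a receipt
    vendor, _, rest = email_body.partition(" Total: $")
    amount, _, date = rest.partition(" on ")
    return Receipt(vendor.strip(), float(amount), date.strip())

def to_spreadsheet_rows(emails: list) -> list:
    # One row per receipt, ready to paste into a sheet.
    rows = []
    for body in emails:
        receipt = extract_receipt(body)
        if receipt:
            rows.append((receipt.vendor, receipt.amount, receipt.date))
    return rows

emails = ["Acme Taxi Total: $23.50 on 2024-05-14", "Lunch next week?"]
print(to_spreadsheet_rows(emails))  # [('Acme Taxi', 23.5, '2024-05-14')]
```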
Detecting fraud during a call
Google has previewed an AI-powered feature to alert users to potential scams during a call.
The feature, set to be integrated into a future version of Android, leverages Gemini Nano, the smallest version of Google's generative AI offering, which can run entirely on-device, to listen in real time for "conversation patterns commonly associated with scams."
No specific release date has been set for the feature, and, as with many of these announcements, Google is projecting how much Gemini Nano will eventually be able to do. We do know the feature will be opt-in, which is a good thing: while using Nano means the system won't automatically upload audio to the cloud, it still effectively listens in on users' conversations, a potential privacy risk.
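The on-device argument is easier to see in code. In the sketch below, the transcript never leaves the handler and only a warning decision comes out; the keyword-matching "model" is a deliberately crude stand-in, since Google hasn't said how Gemini Nano actually detects these patterns.

```python
# Crude stand-in for on-device analysis; Gemini Nano's real behavior is unknown.
SCAM_PATTERNS = ("gift card", "wire transfer", "act now", "verify your account")

def on_device_scam_score(transcript_chunk: str) -> float:
    """Scores a chunk locally; the audio/text never leaves the device."""
    text = transcript_chunk.lower()
    hits = sum(pattern in text for pattern in SCAM_PATTERNS)
    return hits / len(SCAM_PATTERNS)

def handle_call(transcript_chunks: list) -> None:
    for chunk in transcript_chunks:
        # Only the yes/no warning decision leaves this function, which is
        # the privacy argument for running the model on-device.
        if on_device_scam_score(chunk) > 0.25:
            print("Warning: this call shows signs of a scam.")
            return

handle_call([
    "Hi, this is your bank's security team.",
    "To fix it, buy a gift card and wire transfer the balance now.",
])
```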
AI for accessibility
Google is adding a bit of generative AI magic to its TalkBack accessibility feature for Android.
Soon, TalkBack will use Gemini Nano to create audio descriptions of objects for blind and low-vision users. For example, TalkBack might describe an item of clothing like this: "A close-up of a black and white gingham dress. The dress is short, with a collar and long sleeves. It ties at the waist with a big bow."
According to Google, TalkBack users come across around 90 unlabeled images per day. Using Nano, the system will be able to describe that content, potentially without anyone needing to enter the information manually.
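As a final illustration, here is a minimal sketch of that fallback logic: read a developer-provided label when one exists, otherwise ask an on-device captioner. `describe_image_locally` and the `ImageNode` type are assumptions made up for this example, not TalkBack's real internals.

```python
# Assumed types and functions, invented for this example.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageNode:
    pixels: bytes
    alt_text: Optional[str] = None  # developer-provided label, if any

def describe_image_locally(pixels: bytes) -> str:
    """Placeholder for a Gemini Nano-style on-device captioner."""
    return "A close-up of a black and white gingham dress with a bow."

def announce(image: ImageNode) -> str:
    if image.alt_text:
        return image.alt_text  # an existing label always wins
    # Roughly 90 images a day reportedly arrive unlabeled; generate one.
    return describe_image_locally(image.pixels)

print(announce(ImageNode(pixels=b"<jpeg bytes>")))
```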