Gemini Live, Google's answer to ChatGPT's enhanced voice mode, launches

Gemini Live, Google's answer to the recently introduced (in limited alpha) Advanced Voice Mode for OpenAI's ChatGPT, launches on Tuesday, months after it was first announced at Google's I/O 2024 developer conference. The launch was announced at Google's Made by Google 2024 event.

Gemini Live lets users have “in-depth” voice chats with Gemini, Google's generative AI-powered chatbot, on their smartphones. Thanks to an improved speech engine that Google says delivers more consistent, emotionally expressive and realistic multi-turn dialogue, users can interrupt Gemini while it's speaking to ask follow-up questions, and it adapts to their speech patterns in real time.

Here's how Google describes it in a blog post: “With Gemini Live [via the Gemini app], you can talk to Gemini and choose from [10 new] natural-sounding voices it can respond with. You can even speak at your own pace or interrupt mid-response with clarifying questions, just like you would in any conversation.”

Gemini Live is hands-free if you want it to be. You can keep talking with the Gemini app in the background or when the phone is locked, and you can pause and resume conversations at any time.

So how could this be useful? Google gives the example of rehearsing for a job interview, a somewhat ironic scenario, but OK. Gemini Live can practice with you, Google says, offering speaking tips and suggesting skills to highlight when talking to a hiring manager (or an AI, depending on the situation).

One advantage Gemini Live has over ChatGPT's Advanced Voice Mode is its better memory. The architectures of the generative AI models that underlie Live, Gemini 1.5 Pro and Gemini 1.5 Flash, have longer-than-average “context windows,” meaning they can take in and process a lot of data (in theory, hours of back-and-forth conversation) before formulating a response.

“Live uses our Gemini Advanced models, which we've adapted to enable more conversation,” a Google spokesperson told TechCrunch via email. “The model's large context window is leveraged when users have long conversations with Live.”

We'll have to see how well this all works in practice, of course. If OpenAI's setbacks with its Advanced Voice Mode are any indication, demos rarely translate seamlessly to the real world.

On that note, Gemini Live is missing one of the features Google showed off at I/O: multimodal input. Back in May, Google released pre-recorded videos showing Gemini Live seeing and responding to users' surroundings through photos and video captured with their phones' cameras, for example by naming a part on a broken bike or explaining what a piece of code on a computer screen does.

Multimodal input will arrive “later this year,” Google said, but it didn't provide further details. Live will also expand to more languages, and to iOS via the Google app, later this year; for now, it is only available in English.

Gemini Live, like Advanced Voice Mode, isn't free. It is available exclusively in Gemini Advanced, a more sophisticated version of Gemini gated behind the Google One AI Premium plan, which costs $20 per month.

However, other new Gemini features are free.

Android users will soon (in the coming weeks) be able to bring up Gemini's overlay on top of whatever app they're using and ask questions about what's on the screen (like a YouTube video) by holding down their phone's power button or saying “Hey Google.” Gemini will be able to generate images (but still not images of people, unfortunately) directly from the overlay, and those images can be dragged and dropped into apps like Gmail and Google Messages.

Gemini is also getting new integrations with Google services (or “extensions,” as the company prefers to call them) on both mobile and the web. In the coming weeks, Gemini will be able to perform more actions with Google Calendar, Keep, Tasks, YouTube Music, and Utilities, the apps that control device features like timers and alarms, media controls, flashlight, volume, Wi-Fi, Bluetooth, and more.

In a blog post, Google gives a few ideas for how to take advantage of this. They sound useful, assuming everything works reliably:

  • Ask Gemini to “create a playlist of songs that remind me of the late 90s.”
  • Take a photo of a concert flyer and ask Gemini if you're available that day, and even set a reminder to buy tickets.
  • Have Gemini pull up a recipe from Gmail and ask it to add the ingredients to your shopping list in Keep.

Starting later this week, Gemini will also be available on Android tablets.
