Google unveiled Gemini Live during its Made by Google event on Tuesday. The feature lets you have a semi-natural spoken conversation, rather than a typed one, with an AI chatbot powered by Google's latest large language model. TechCrunch was there to check it out firsthand.
Gemini Live is Google's answer to OpenAI's Advanced Voice Mode, a nearly identical feature in ChatGPT that's currently in limited alpha testing. OpenAI beat Google to the punch by demoing the feature first, but Google is the first to roll out the finished feature.
In my experience, these low-latency voice features feel far more natural than texting with ChatGPT, or even talking with Siri or Alexa. I found that Gemini Live responded to questions in under two seconds and was fairly quick to adjust when interrupted. Gemini Live isn't perfect, but it's the best way to use your phone hands-free that I've seen so far.
How Gemini Live works
Before chatting with Gemini Live, you can choose from 10 voices, compared with just three from OpenAI. Google worked with voice actors to create each voice. I appreciated the variety and found each voice to sound very human.
In one example, a Google product manager verbally asked Gemini Live to find family-friendly wineries near Mountain View with outdoor spaces and playgrounds nearby, so that kids could come along. That's a much more complicated task than I'd ask Siri or, frankly, Google Search to do, but Gemini successfully recommended a spot that met the criteria: Cooper-Garrod Vineyards in Saratoga.
However, Gemini Live leaves a lot to be desired. It seemed to hallucinate a nearby playground called the Henry Elementary School Playground, which is supposedly “10 minutes away” from the vineyard. There are other playgrounds nearby in Saratoga, but the closest Henry Elementary School is more than a two-hour drive from there. There is a Henry Ford Elementary School in Redwood City, but it's 30 minutes away.
Google was eager to show off how users can interrupt Gemini Live mid-sentence and have the AI quickly change course. The company says this lets users control the conversation. In practice, however, the feature doesn't work perfectly. Sometimes Google's project managers and Gemini Live would talk over one another, and the AI didn't seem to pick up on what was said.
Notably, according to product manager Leland Rechis, Google doesn't allow Gemini Live to sing or imitate any voices other than the 10 available. The company likely does this to avoid conflicts with copyright law. In addition, Rechis said Google is not focused on making Gemini Live understand the emotional intonation of a user's voice, something OpenAI touted during its demo.
Overall, the feature seems like a great way to dive deeper into a topic more naturally than you could with a simple Google search. Google notes that Gemini Live is a step toward Project Astra, the fully multimodal AI model the company unveiled during Google I/O. For now, Gemini Live only supports voice conversations, but in the future Google hopes to add real-time video understanding as well.