I've been playing around with OpenAI's Advanced Voice Mode for the past week, and it's the most convincing taste of an AI-powered future I've had yet. This week, my phone laughed at jokes, cracked them back at me, asked me how my day was, and told me it was “having a great time.” I was talking with my iPhone, not using it with my hands.
OpenAI's latest feature, currently in limited alpha testing, doesn't make ChatGPT any smarter than before. Instead, Advanced Voice Mode (AVM) makes talking to it friendlier and more natural. It creates a new interface for using AI and your devices that feels fresh and exciting, and that's exactly what unsettles me. The product was glitchy and the whole idea scares me, but I was surprised by how much I genuinely enjoyed using it.
Taking a step back, I think AVM fits into OpenAI CEO Sam Altman's broader vision of changing the way people interact with computers, alongside agents, with AI models at the center.
“At some point, you'll just ask the computer for what you need, and it will do all of those tasks for you,” Altman said during OpenAI's Dev Day in November 2023. “These capabilities are often known as 'agents' within the AI field. The upside of this will be enormous.”
My friend, ChatGPT
On Wednesday, I tested the biggest upside of this advanced technology that I could think of: I asked ChatGPT to order at Taco Bell the way Obama would.
“Uh, to be clear, I'd like a Crunchwrap Supreme and perhaps some tacos to top it off,” said ChatGPT's Advanced Voice Mode. “How do you think he'd handle the drive-thru?” ChatGPT added, then laughed at its own joke.
The impression made me laugh too, because it matched Obama's typical cadence and pauses. However, it stayed within the tone of the ChatGPT voice I had selected, Juniper, so it couldn't really be confused with Obama's voice. It sounded like a friend doing a bad impression who understood exactly what I was going for and even said something funny. I found it surprisingly entertaining to talk with this advanced assistant in my phone.
I also asked ChatGPT for advice on how to handle a problem involving complex human relationships: I wanted to ask my significant other to move in with me. After explaining the complexities of the relationship and the direction of our careers, I received some very detailed advice on how to proceed. These are the kinds of questions you could never ask Siri or Google Search, but now you can with ChatGPT. The chatbot's voice even took on a somewhat serious, gentle tone when responding to these requests, a stark contrast to the joking tone of Obama's Taco Bell order.
ChatGPT's AVM is also great at helping you understand complex topics. I asked it to break down items in an earnings report, such as free cash flow, in a way that a 10-year-old would understand. It used a lemonade stand as an example and explained several financial terms in a way that my younger cousin would totally get. You can even ask ChatGPT's AVM to talk more slowly to meet you at your current level of understanding.
Siri walked so AVM could run
Compared to Siri or Alexa, ChatGPT's AVM is the clear winner, thanks to faster response times, unique answers, and its ability to answer complex questions that the previous generation of virtual assistants couldn't. However, AVM falls short in other ways. ChatGPT's voice feature can't set timers or reminders, browse the web in real time, check the weather, or interact with APIs on your phone. At least for now, it's not an effective replacement for virtual assistants.
Compared to Gemini Live, Google's rival feature, AVM seems to have a slight edge. Gemini Live can't do impressions, doesn't express emotions, can't speed up or slow down, and takes longer to respond. That said, Gemini Live has more voices (ten compared with OpenAI's three) and appears to be more up to date (Gemini Live knew about Google's antitrust ruling). Notably, neither AVM nor Gemini Live will sing, likely an attempt to avoid copyright lawsuits from the record industry.
However, ChatGPT's AVM has a number of glitches (and to be fair, so does Gemini Live). Sometimes it will stop mid-sentence and then start over. It also takes on a weird, grainy voice here and there that's somewhat unpleasant. I'm not sure if it's a problem with the model, the internet connection, or something else, but these technical flaws are to be expected in an alpha test. Still, the problems hardly detracted from the experience of literally talking to my phone.
These examples are, in my view, the beauty of AVM. The feature doesn't make ChatGPT omniscient, but it does let people interact with GPT-4o, the underlying AI model, in a uniquely human way. (I'd understand if you forgot that there isn't a person on the other end of the line.) It almost feels like ChatGPT is socially aware when you talk to it through AVM, but of course that's not the case. It's simply a bunch of neatly packaged predictive algorithms.
Talking about technology
Frankly, this feature worries me. It's not the first time a tech company has offered companionship on your phone. My generation, Generation Z, was the first to grow up with social media, where companies offered connection but instead played on our collective insecurities. Talking to an AI device, as AVM seems to offer, looks like the evolution of social media's “friend in the phone” phenomenon, offering cheap connection that scratches at our human instincts. But this time, it removes humans from the loop entirely.
Artificial human connection has become a surprisingly popular use case for generative AI. Today, people use AI chatbots as friends, mentors, therapists and teachers. When OpenAI launched its GPT Store, it was quickly flooded with “AI friends,” chatbots that specialize in acting as your life partner. Two researchers from the MIT Media Lab issued a warning this month to prepare for “addictive intelligence,” or AI companions with dark patterns that keep humans hooked. We could be opening a Pandora's box of devices that capture our attention in new, tantalizing ways.
Earlier this month, a Harvard dropout rocked the tech world with the announcement of an AI necklace called Friend. The wearable device, if it really works as promised, is always listening, and the chatbot will send you text messages about your life. While the idea seems crazy, innovations like ChatGPT's AVM give me reason to take these use cases seriously.
And while OpenAI is ahead of the game here, Google isn't far behind. I'm confident that Amazon and Apple are also racing to build this feature into their products, and soon it could become the industry standard.
Imagine asking your smart TV for a really specific movie suggestion and getting exactly that. Or telling Alexa exactly what cold symptoms you have and having her order you tissues and cough syrup from Amazon while recommending home remedies. Maybe you could ask your computer to plan a weekend trip for your family instead of manually Googling everything.
Of course, these scenarios require major advances in the world of AI agents. OpenAI's effort on this front, the GPT Store, seems like an overhyped product that is no longer a focus for the company. But AVM at least takes care of the “talking to computers” part of the puzzle. Those concepts are still a long way off, but after using AVM, they seem much closer than they did last week.