Last week, OpenAI introduced Advanced Voice Mode with Vision, which feeds real-time video to ChatGPT and allows the chatbot to “see” beyond the boundaries of its app layer. The premise is that this greater contextual awareness lets the bot respond more naturally and intuitively.
But the first time I tried it, it lied to me.
“The sofa looks comfortable!” ChatGPT said as I held up my phone and asked the bot to describe our living room. It had mistaken an ottoman for a sofa.
“My mistake!” ChatGPT said when I corrected it. “Well, it still looks like a cozy room.”
It's been almost a year since OpenAI first demonstrated Advanced Voice Mode with Vision, which the company pitched as a step toward AI as portrayed in the Spike Jonze film Her. The way OpenAI sold it, Advanced Voice Mode with Vision would give ChatGPT superpowers: the bot could solve sketched-out math problems, read emotions, and respond to affectionate letters.
Did it achieve all of this? More or less. But Advanced Voice Mode with Vision hasn't solved ChatGPT's biggest problem: reliability. If anything, this feature makes the bot's hallucinations more obvious.
At one point, I was curious whether Advanced Voice Mode with Vision could help ChatGPT offer fashion tips. I enabled it and asked ChatGPT to rate an outfit of mine, which it happily did. But while the bot gave its opinion on my jeans and olive shirt combo, it consistently missed the brown jacket I was wearing.
I'm not the only one who has experienced slip-ups.
When OpenAI President Greg Brockman demonstrated Advanced Voice Mode with Vision on “60 Minutes” earlier this month, ChatGPT made a mistake on a geometry problem: in calculating the area of a triangle, it misidentified the triangle's height.
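To see why misreading the height throws off the whole answer, recall the formula (the numbers below are hypothetical; the demo's actual figures weren't published):

\[
A = \frac{1}{2}\,b\,h
\]

For a triangle with base \(b = 6\) and true height \(h = 4\), the area is \(A = 12\). Mistake the slanted side of length 5 for the height, as the bot effectively did, and you get \(A = 15\) instead.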
So my question is: What good is a “Her”-like AI if you can't trust it?
With each ChatGPT flub, I felt less and less inclined to reach into my pocket, unlock my phone, launch ChatGPT, open Advanced Voice Mode, and activate Vision, a cumbersome series of steps even under the best of circumstances. With its bright and cheerful demeanor, Advanced Voice Mode is clearly designed to inspire confidence. When that implicit promise isn't kept, it's jarring and disappointing.
Maybe one day OpenAI will solve the hallucination problem once and for all. Until then, we're stuck with a bot that views the world through crossed wires. And frankly, I'm not sure who would want that.
News
OpenAI’s 12 days of “shipmas” continue: OpenAI is releasing new products every day through December 20. Here's a roundup of all the announcements, which we're updating regularly.
YouTube gives creators an opt-out: YouTube is giving creators more choice over how third parties can use their content to train AI models. Creators and rights holders can flag to YouTube whether they allow specific companies to train models on their clips.
Meta’s smart glasses get upgrades: Meta's Ray-Ban Meta smart glasses have received several new AI-powered updates, including the ability to have an ongoing conversation with Meta's AI and to translate between languages.
DeepMind's response to Sora: Google DeepMind, Google's flagship AI research lab, aims to beat OpenAI at the video generation game. On Monday, DeepMind announced Veo 2, a next-generation video-generating AI that can create clips longer than two minutes in resolutions up to 4K (4,096 x 2,160 pixels).
OpenAI whistleblower found dead: Suchir Balaji, a former OpenAI employee, was recently found dead in his San Francisco apartment, according to the San Francisco Office of the Chief Medical Examiner. In October, the 26-year-old AI researcher had raised concerns in an interview with The New York Times that OpenAI was violating copyright law.
Grammarly acquires Coda: Grammarly, best known for its style and spell-checking tools, has acquired productivity startup Coda for an undisclosed amount. As part of the deal, Shishir Mehrotra, CEO and co-founder of Coda, will become Grammarly's new CEO.
Cohere works with Palantir: TechCrunch exclusively reported that Cohere, the $5.5 billion enterprise-focused AI startup, has partnered with data analytics firm Palantir. Palantir is vocal about its close, and at times controversial, work with US defense and intelligence agencies.
Research paper of the week
Anthropic has pulled back the curtain on Clio (“Claude insights and observations”), a system that helps the company understand how customers use its various AI models. Clio, which Anthropic compares to analytics tools like Google Trends, provides “valuable insights” for improving the safety of Anthropic's AI, the company claims.
Anthropic used Clio to compile anonymized usage data, some of which the company released last week. What do customers use Anthropic's AI for? A range of tasks, but web and mobile app development, content creation, and academic research top the list. Predictably, use cases vary by language; for example, Japanese speakers are more likely than Spanish speakers to ask Anthropic's AI to analyze anime.
Model of the week
AI startup Pika has released its next-generation video generation model, Pika 2, which can create a clip from a user-specified character, object, and location. Pika's platform lets users upload multiple references (e.g., images of a boardroom and office workers), and Pika 2 “guesses” the role of each reference before combining them into a single scene.
Of course, no model is perfect. Check out the “anime” created with Pika 2 below, which shows impressive consistency but suffers from the aesthetic strangeness present in all generative AI footage.
As I said before, anime will be the first genre to be 100% AI generated. It's amazing to see what's already possible with Pika 2.0 pic.twitter.com/3jWCy4659o
— Chubby♨️ (@kimmonismus) December 16, 2024
Still, video generation tools are improving very quickly, attracting the interest and the ire of creative professionals in equal measure.
Grab bag
The Future of Life Institute (FLI), the nonprofit co-founded by MIT cosmologist Max Tegmark, has released an “AI Safety Index” designed to evaluate the safety practices of leading AI companies across five key areas: current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication.
With an overall grade of F, Meta was the worst of the group evaluated in the index. (The index uses a numerical and GPA-based scoring system.) Anthropic was the best, but it didn't manage better than a C, suggesting there's room for improvement.