OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.
Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have discovered evidence of OpenAI's long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system that can autonomously handle tasks like writing code and booking trips.
According to The Information, OpenAI is targeting January as Operator's release month. The code uncovered by Blaho this weekend lends additional credibility to that reporting.
OpenAI's ChatGPT client for macOS has gained hidden options for defining shortcuts to “Toggle Operator” and “Force Quit Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, though those references aren't yet publicly visible.
Confirmed – ChatGPT macOS desktop app has hidden options to define desktop launcher shortcuts to “Toggle Operator” and “Force Quit Operator”. https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS
— Tibor Blaho (@btibor91) January 19, 2025
OpenAI's website also contains not-yet-public tables comparing Operator's performance to that of other computer-using AI systems, according to Blaho. The tables may well be placeholders. But if the numbers are accurate, they suggest Operator isn't 100% reliable, depending on the task.
The OpenAI website already contains references to Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Research Eval Table” and “Operator Refusal Rate Table”.
Including comparison with Claude 3.5 Sonnet Computer Use, Google Mariner, etc.
(Preview of the tables… pic.twitter.com/OOBgC3ddkU
— Tibor Blaho (@btibor91) January 20, 2025
On OSWorld, a benchmark that tries to mimic a real computing environment, “OpenAI Computer Use Agent (CUA)” – possibly the AI model powering Operator – scores 38.1%, ahead of Anthropic's computer-using model but well short of the 72.4% that humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI's ability to navigate and interact with websites. But according to the leaked benchmarks, the model falls short of human-level scores on another web-based benchmark, WebArena.
Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that asked Operator to sign in to a cloud provider and launch a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded in just 10% of cases.
We've reached out to OpenAI for comment and will update this story if we hear back.
OpenAI's impending entry into the AI agent space comes as competitors, including the aforementioned Anthropic, Google, and others, make plays for the emerging segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the AI agent market could be worth $47.1 billion by 2030.
Agents today are fairly primitive, but some experts have raised concerns about their safety should the technology improve rapidly.
One of the leaked tables shows Operator performing well on select safety evaluations, including tests that try to coax the system into performing “illicit activities” and searching for “sensitive personal data.” According to reports, safety testing is one of the reasons for Operator's long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he said lacked safety mitigations.
“I can only imagine the negative reaction if OpenAI made a similar release,” Zaremba wrote.
It's worth noting that OpenAI has been criticized by AI researchers, including former employees, for allegedly deemphasizing safety work in favor of quickly shipping its technology.