Researchers have published this most comprehensive survey previously so-called “Operating system agents” – Artificial intelligence systems that may autonomously control computers, mobile phones and web browsers by interacting directly with their interfaces. The 30-page scientific review, accepted for publication by the renowned institute Association for Computational Linguistics The conference represents a rapidly evolving field that has attracted billions in investment from major technology corporations.
“The dream of making AI assistants as capable and versatile because the fictional JARVIS from Iron Man has long captured the imagination,” the researchers write. “With the event of (multimodal) large language models ((M)LLMs), this dream is moving closer to reality.”
The survey, conducted by researchers from Zhejiang University And OPPO AI Centercomes at a time when major tech corporations are competing to offer AI agents that may perform complex digital tasks. OpenAI was recently launched.operator“Anthropic released”Computer use“Apple has introduced advanced AI features”Apple Intelligence,” and Google revealed “Project Mariner“ – any system designed to automate computer interactions.
Operating system agents observe computer screens and system data, then perform actions similar to clicks and swipes across mobile, desktop and web platforms. The systems must understand interfaces, plan multi-step tasks, and translate those plans into executable code. (Source: GitHub)
Tech giants are rushing to deploy AI to regulate your desktop
The speed at which academic research has transformed into consumer-ready products is unprecedented, even by Silicon Valley standards. The Opinion poll reveals a research explosion: over 60 base models and 50 agent frameworks designed specifically for computer control, with release rates increasing dramatically since 2023.
This is just not only a matter of gradual progress. We are witnessing the emergence of AI systems that may actually understand and manipulate the digital world the way in which humans do. Current systems work by taking screenshots of computer screens, using advanced computer vision to know what’s being displayed, after which performing precise actions similar to clicking buttons, filling out forms, and navigating between applications.
“OS agents can complete tasks autonomously and have the potential to significantly improve the lives of billions of users worldwide,” the researchers note. “Imagine a world where tasks like online shopping, travel bookings and other each day activities may very well be seamlessly accomplished by these agents.”
The most sophisticated systems can handle complex, multi-step workflows that span multiple applications – book a restaurant reservation, then routinely add it to your calendar, then set a reminder to go away early due to traffic. What took humans minutes of clicking and typing can now be done in seconds without human intervention.
Developing AI agents requires a fancy training pipeline that mixes multiple approaches, from initial pre-training using on-screen data to reinforcement learning that optimizes performance through trial and error. (Source: arxiv.org)
Why security experts are sounding the alarm about AI-driven enterprise systems
For enterprise technology leaders, the promise of productivity gains comes with a sobering reality: These systems represent a completely recent attack surface that almost all organizations cannot defend.
Researchers pay close attention to what they diplomatically consult with as “”.Security and privacy“Concerns, however the implications are more alarming than their academic language suggests. “OS agents face these risks, especially given its widespread application on personal devices containing user data,” they write.
The attack methods they documented read like a cybersecurity nightmare. “Indirect web prompt injection” allows malicious actors to embed hidden instructions in web pages that may hijack an AI agent's behavior. Even more worrisome are “environmental injection attacks,” where seemingly innocuous web content can trick agents into stealing user data or performing unauthorized actions.
Consider the implications: An AI agent with access to your organization's email, financial systems, and customer databases may very well be manipulated by a rigorously crafted website to extract sensitive information. Traditional security models that depend on human users who can detect obvious phishing attempts fail when the “user” is an AI system that processes information in another way.
The survey shows a worrying gap in preparation. While general security frameworks for AI agents exist, “studies on specific defenses for operating system agents remain limited.” This is just not just a tutorial problem, but an instantaneous challenge for any organization considering the usage of these systems.
The reality check: Current AI agents still struggle with complex digital tasks
Despite the hype surrounding these systems, evaluation of performance benchmarks within the survey reveals significant limitations that dampen expectations of immediate widespread adoption.
Success rates vary significantly depending on the duty and platform. Some business systems achieve success rates of over 50% on certain benchmarks – impressive for an emerging technology – but struggle with others. Researchers categorize assessment tasks into three types: basic “GUI grounding” (understanding interface elements), “information retrieval” (searching and extracting data), and complicated “agentic tasks” (multi-stage autonomous operations).
The pattern is telling: Current systems excel at easy, well-defined tasks, but falter when confronted with the complex, context-dependent workflows that make up much of recent knowledge work. They can reliably click a selected button or fill out a normal form, but struggle with tasks that require sustained thought or adaptation to unexpected interface changes.
This performance gap explains why early deployments deal with small, high-volume tasks somewhat than general automation. Technology is just not yet ready to interchange human judgment in complex scenarios, however it is increasingly able to handling routine digital tasks.
Operating system agents depend on interconnected systems for perception, planning, memory, and motion execution. The complexity of coordinating these components explains why current systems still struggle with demanding tasks. (Source: arxiv.org)
What happens when AI agents learn to adapt to every user?
Perhaps essentially the most intriguing – and potentially transformative – challenge identified within the survey concerns what researchers call “personalization and self-development.” Unlike today's stateless AI assistants, which treat each interaction as independent, future operating system agents might want to learn from user interactions and adapt to individual preferences over time.
“Developing personalized operating system agents has long been a goal of AI research,” the authors write. “A private assistant is anticipated to repeatedly adapt to individual user preferences and supply enhanced experiences.”
This capability could fundamentally change the way in which we interact with technology. Imagine an AI agent that learns your email writing style, understands your calendar preferences, knows which restaurants you like, and might make increasingly sophisticated decisions in your behalf. The potential productivity gains are enormous, however the privacy implications are also enormous.
The technical challenges are significant. The survey points to the necessity for higher multimodal storage systems that may handle not only text but in addition images and speech, which poses “significant challenges” to current technology. How do you construct a system that remembers your preferences without making a comprehensive surveillance record of your digital life?
For technology managers evaluating these systems, this personalization challenge represents each the best opportunity and best risk. The corporations that solve the issue first will gain significant competitive advantage, however the privacy and security implications may very well be severe if handled poorly.
The race to develop AI assistants that may truly function like human users is rapidly heating up. While fundamental challenges around security, reliability and personalization remain unresolved, the trajectory is evident. The researchers maintain an open source repository that tracks developments and acknowledge that “OS agents are still within the early stages of development” with “rapid advances continuing to introduce recent methods and applications.”
The query is just not whether AI agents will change the way in which we interact with computers – it is whether or not we will probably be prepared for the implications in the event that they do. The window of opportunity for the correct security and privacy frameworks is shrinking as technology advances.

