Apple researchers develop AI that can “see” and understand screen context

April 1, 2024

Apple researchers have developed a new artificial intelligence system that can understand ambiguous references to on-screen entities as well as conversational and background context, enabling more natural interactions with voice assistants, according to a paper published Friday.

The system, called ReALM (Reference Resolution as Language Modeling), uses large language models to convert the complex task of reference resolution, including understanding references to visual elements on a screen, into a pure language modeling problem. This allows ReALM to achieve substantial performance gains compared with existing methods.
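To make the idea of treating reference resolution as a text problem concrete, here is a minimal sketch. It assumes the candidate entities are numbered in a plain-text prompt and that the model answers with the numbers of the relevant ones. The prompt wording, the `build_prompt` and `parse_answer` helpers, and the entity fields are hypothetical illustrations, not the format used in the paper, and no model is called.

```python
import re


def build_prompt(user_request, entities):
    """Serialize candidate entities and the request into one text prompt.

    The layout here is an assumed, illustrative format.
    """
    lines = ["Entities:"]
    for idx, ent in enumerate(entities, start=1):
        lines.append(f"{idx}. [{ent['type']}] {ent['text']}")
    lines.append(f"User request: {user_request}")
    lines.append("Relevant entity numbers:")
    return "\n".join(lines)


def parse_answer(answer, num_entities):
    """Read entity numbers out of the model's text answer.

    Numbers outside the candidate range are discarded.
    """
    ids = [int(tok) for tok in re.findall(r"\d+", answer)]
    return [i for i in ids if 1 <= i <= num_entities]


if __name__ == "__main__":
    candidates = [
        {"type": "phone", "text": "555-0100"},
        {"type": "phone", "text": "555-0199"},
    ]
    print(build_prompt("Call the second one", candidates))
    # A language model would complete the prompt; "2" stands in for its answer.
    print(parse_answer("2", len(candidates)))
```

Framed this way, resolving a reference becomes a matter of generating a short text completion, which is the kind of task a fine-tuned language model is built for.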

“Being able to understand context, including references, is essential for a conversational assistant,” the Apple research team wrote. “Enabling the user to issue queries about what they see on their screen is a crucial step in ensuring a true hands-free experience in voice assistants.”

Enhancing conversational assistants

To handle screen-based references, ReALM's key innovation is to reconstruct the screen from parsed on-screen entities and their locations, generating a textual representation that captures the visual layout. The researchers showed that this approach, combined with fine-tuning language models specifically for reference resolution, could outperform GPT-4 on the task.
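A textual reconstruction of a screen can be sketched roughly as follows. This is a minimal illustration, assuming each parsed element arrives with its text and a bounding box. Elements are ordered top to bottom, those whose vertical centers fall within a small margin are treated as one line, and each line is read left to right. The `ScreenElement` type, the `line_margin` value, the tab separator, and the sample elements other than “260 Sample Sale” are assumptions made for the example, not details confirmed by the article.

```python
from dataclasses import dataclass


@dataclass
class ScreenElement:
    text: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float

    @property
    def center_y(self) -> float:
        return self.y + self.height / 2


def screen_to_text(elements, line_margin=10.0):
    """Render parsed screen elements as text that mirrors their layout."""
    # Order top to bottom, breaking ties left to right.
    ordered = sorted(elements, key=lambda e: (e.center_y, e.x))

    rows = []
    for el in ordered:
        # Join the current row if this element sits at roughly the same
        # height as the row's first element; otherwise start a new row.
        if rows and abs(el.center_y - rows[-1][0].center_y) <= line_margin:
            rows[-1].append(el)
        else:
            rows.append([el])

    lines = []
    for row in rows:
        row.sort(key=lambda e: e.x)  # left to right within a row
        lines.append("\t".join(e.text for e in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical elements, listed out of order on purpose.
    demo = [
        ScreenElement("Call", 140, 152, 60, 18),
        ScreenElement("Open now", 240, 104, 80, 16),
        ScreenElement("260 Sample Sale", 20, 100, 200, 20),
        ScreenElement("Directions", 20, 150, 100, 20),
    ]
    print(screen_to_text(demo))
```

The result is plain text in which position on the page stands in for position on the screen, so a text-only model can reason about what sits next to or below what without seeing any pixels.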

Apple's AI system, ReALM, can understand references to on-screen entities such as the “260 Sample Sale” listing shown in this mockup, enabling more natural interactions with voice assistants. (Image credit: arxiv.org)

“We demonstrate large improvements over an existing system with similar functionality across different reference types, with our smallest model achieving absolute gains of over 5% for screen references,” the researchers wrote. “Our larger models significantly outperform GPT-4.”

Practical applications and limitations

The work highlights the potential of focused language models to handle tasks such as reference resolution in production systems where using massive end-to-end models is not feasible because of latency or compute constraints. By publishing the study, Apple is signaling its continued investment in making Siri and other products more conversant and context-aware.

Still, the researchers caution that relying on automated parsing of screens has limitations. Handling more complex visual references, such as distinguishing between multiple images, would likely require incorporating computer vision and multimodal techniques.

Apple races to close the AI gap as rivals surge

Apple is quietly making significant progress in artificial intelligence research, even as it trails its tech rivals in the race for dominance in the fast-moving AI landscape.

From multimodal models that blend vision and language to AI-powered animation tools to techniques for building high-performing, specialized AI on a budget, a steady stream of breakthroughs from the company's research labs suggests its AI ambitions are escalating rapidly.

But the famously secretive tech giant faces stiff competition from companies like Google, Microsoft, Amazon and OpenAI, which have aggressively built generative AI into search, office software, cloud services and more.

Long a fast follower rather than a first mover, Apple now faces a market that is being transformed at breakneck speed by artificial intelligence. At its closely watched Worldwide Developers Conference in June, the company is expected to unveil a new large language model framework, an “Apple GPT” chatbot and other AI-powered features across its ecosystem.

“We look forward to sharing details of our ongoing work in AI later this year,” CEO Tim Cook recently hinted during a conference call. Despite the company's characteristic opacity, it is clear that Apple's AI efforts are sweeping in scope.

But as the battle for AI supremacy heats up, the iPhone maker's late arrival to the party has put it in an unusual position of weakness. Deep pockets, brand loyalty, world-class engineering and a tightly integrated product portfolio give it a fighting chance, but there are no guarantees in this high-stakes contest.

A new era of ubiquitous, truly intelligent computing is upon us. In June, we'll see whether Apple has done enough to ensure it has a hand in shaping it.
