
How Foundation Agents Can Revolutionize AI Decision-Making in the Real World

Foundation models have revolutionized the fields of computer vision and natural language processing. Now a group of researchers believe the same principles can be applied to create foundation agents: AI systems that can perform open-ended decision-making tasks in the physical world.

In a new position paper, researchers at the University of Chinese Academy of Sciences describe foundation agents as “general-purpose agents in physical and virtual worlds” that “will represent a paradigm shift for decision-making, much like how large language models (LLMs) function as general language models for solving linguistic and knowledge-based tasks.”

Foundation agents would facilitate the creation of versatile AI systems for the real world and could have a major impact on areas that currently depend on brittle, task-specific AI systems.

The challenges of AI decision making

Traditional approaches to AI decision-making have several shortcomings. Expert systems rely heavily on formalized human knowledge and manually created rules. Reinforcement learning (RL) systems, which have become increasingly popular in recent years, must be trained from scratch for each new task, making them sample-inefficient and limiting their ability to generalize to new environments. Imitation learning (IL), where AI learns to make decisions from human demonstrations, also requires extensive human effort to create training examples and action sequences.

In contrast, LLMs and vision language models (VLMs) can quickly adapt to different tasks with minimal fine-tuning or prompting. The researchers believe that, with some adjustments, the same approach can be used to create foundation agents that can handle open-ended decision-making tasks in the physical and virtual worlds.

Several key features of foundation models can help create foundation agents for the real world. First, LLMs can be pre-trained on large, unlabeled datasets from the internet to acquire a vast store of knowledge. Second, the models can use this knowledge to quickly adapt to human preferences and specific tasks.

Properties of foundation agents

The researchers identified three fundamental properties of foundation agents:

1. A unified representation of environmental states, agent actions and feedback signals.

2. A unified policy interface that can be applied to diverse tasks and domains, from robotics and gameplay to healthcare and beyond.

3. A decision-making process informed by world knowledge, the environment, and other factors.
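A minimal sketch of what these three properties might look like as interfaces, assuming a shared token-based representation across modalities. All type and method names here are hypothetical illustrations, not definitions from the paper:

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence

@dataclass
class Observation:
    """Environment state, encoded into a shared token space."""
    tokens: Sequence[int]   # unified representation across modalities
    modality: str           # e.g. "text", "image", "proprioception"

@dataclass
class Action:
    tokens: Sequence[int]   # actions live in the same token space

@dataclass
class Feedback:
    reward: Optional[float]    # may be absent in unlabeled data
    preference: Optional[int]  # e.g. a human preference label

class FoundationAgentPolicy(Protocol):
    """A single policy interface reused across tasks and domains."""
    def act(self, history: Sequence[Observation], task: str) -> Action:
        ...
```

The point of the sketch is that one `act` signature serves robotics, gameplay, or healthcare alike; only the task description and observation history change.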

“These properties make foundation agents unique and challenging by giving them the ability to perceive multimodally, multitask, adapt across domains, and generalize with few or zero shots,” the researchers write.

A roadmap for foundation agents

The researchers propose a roadmap for the development of foundation agents that includes three key components.

First, large amounts of interactive data must be collected from the internet and from physical environments. In settings where real-world interactive data is scarce or risky to collect, simulators and generative models such as Sora can be used.

Second, the foundation agents are pre-trained on the unlabeled data. This step allows the agent to learn decision-related knowledge representations that become useful when adapting the model to specific tasks. For example, the model can be fine-tuned on a small dataset where rewards or outcomes are available, or adapted through prompt engineering. The knowledge gained during the pre-training phase allows the model to adjust to new tasks with far fewer examples during this adaptation phase.
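The pre-train-then-adapt flow can be illustrated with a deliberately simplified sketch: a softmax policy is first trained on a large, reward-free demonstration set (behavior cloning), then fine-tuned on a handful of reward-labeled examples starting from the pre-trained weights rather than from scratch. The data is synthetic and the linear model is only a stand-in for a real foundation agent:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_grad_step(W, X, y, lr):
    """One full-batch gradient step of softmax regression (stand-in policy)."""
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1          # softmax cross-entropy gradient
    return W - lr * X.T @ p / len(y)

# Phase 1: pre-train on a large, reward-free demonstration set
# (behavior cloning: predict the demonstrator's action from the state).
states = rng.normal(size=(5000, 8))
true_w = rng.normal(size=(8, 4))          # hidden demonstrator policy
actions = (states @ true_w).argmax(axis=1)
W = np.zeros((8, 4))
for _ in range(200):
    W = softmax_grad_step(W, states, actions, lr=0.1)

# Phase 2: adapt with only 32 reward-labeled examples,
# continuing from the pre-trained weights instead of re-initializing.
few_states = rng.normal(size=(32, 8))
few_actions = (few_states @ true_w).argmax(axis=1)
for _ in range(50):
    W = softmax_grad_step(W, few_states, few_actions, lr=0.1)

accuracy = ((states @ W).argmax(axis=1) == actions).mean()
```

Because phase 2 starts from weights that already encode the demonstration data, the few labeled examples nudge the policy rather than define it.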

“Self-supervised (unsupervised) pre-training for decision-making allows foundation agents to learn without reward signals and encourages the agent to learn from suboptimal offline datasets,” the researchers write. “This is especially applicable when large, unlabeled data can be easily collected from the internet or from real-world simulators.”
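One common reward-free pretext task in this vein is inverse dynamics: given a state and its successor, predict the action that caused the transition. The sketch below applies it to synthetic offline trajectories gathered by a random (and thus suboptimal) policy; it illustrates the general idea and is not a method taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline trajectories from a random (suboptimal) policy: no rewards at all.
n, d, k = 4000, 6, 3
s_t = rng.normal(size=(n, d))                      # states
a_t = rng.integers(0, k, size=n)                   # random actions
effects = rng.normal(size=(k, d))                  # each action shifts the state
s_next = s_t + effects[a_t] + 0.05 * rng.normal(size=(n, d))

# Inverse-dynamics pretext task: predict a_t from (s_t, s_next).
X = np.hstack([s_t, s_next - s_t])                 # state plus observed transition
W = np.zeros((2 * d, k))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), a_t] -= 1                      # cross-entropy gradient
    W -= 0.2 * X.T @ p / n

accuracy = ((X @ W).argmax(axis=1) == a_t).mean()  # no reward was ever used
```

The model learns how actions change the world purely from logged transitions, which is exactly the kind of knowledge that later transfers to reward-driven tasks.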

Third, foundation agents must be aligned with large language models to integrate world knowledge and human values.

Challenges and opportunities for foundation agents

Developing foundation agents presents several challenges compared to language and image models. Information in the physical world consists of low-level details rather than high-level abstractions, which makes it difficult to create consistent representations of the variables involved in the decision-making process.

In addition, there is a large domain gap between different decision-making scenarios, which makes it difficult to develop a unified policy interface for foundation agents. One solution may be to create a unified foundation model that accounts for all modalities, environments, and possible actions. However, this may result in a model that becomes increasingly complex and uninterpretable.

While language and image models focus on understanding and generating content, foundation agents must engage in the dynamic process of choosing optimal actions based on complex environmental information.

The authors propose several research directions that may help bridge the gap between current foundation models and foundation agents that can perform open-ended tasks and adapt to unpredictable environments and novel situations.

There have already been interesting advances in robotics, where the principles of control systems and foundation models have been brought together to create systems that are more versatile and transfer well to situations and tasks that were not included in the training data. These models use the extensive general knowledge of LLMs and VLMs to reason about the world and choose the right actions in previously unseen situations.

Another important area is self-driving cars, where researchers are exploring how large language models can be used to integrate commonsense knowledge and human cognitive skills into autonomous driving systems. The researchers also suggest other areas, such as healthcare and science, where foundation agents could work alongside human experts.

“Foundation agents have the potential to change the landscape of agent learning for decision-making, much like the revolutionary impact of foundation models in language and vision,” the researchers write. “The enhanced perception, adaptation, and reasoning capabilities of agents not only address the limitations of traditional RL, but are also the key to unlocking the full potential of foundation agents in real-world decision-making.”
