The introduction of ChatGPT has led to the widespread use of large language models (LLMs) in both the technology and non-technology industries. This popularity is mainly due to two factors:
- LLMs as knowledge repositories: LLMs are trained on vast amounts of Internet data and updated periodically (e.g., GPT-3, GPT-3.5, GPT-4, GPT-4o, and others);
- Emergent abilities: As LLMs grow, they exhibit capabilities not found in smaller models.
Does this mean we’ve already achieved human-level intelligence, commonly called artificial general intelligence (AGI)? Gartner defines AGI as a form of AI that can understand, learn, and apply knowledge across a wide range of tasks and domains. The road to AGI is long. A major hurdle is the autoregressive nature of LLM training, in which words are predicted based on previous sequences. Yann LeCun, one of the pioneers of AI research, points out that LLMs can drift away from exact answers because of their autoregressive nature. Consequently, LLMs have several limitations:
- Limited Knowledge: Although LLMs are trained on massive amounts of data, their knowledge is frozen at training time, so they lack up-to-date world knowledge.
- Limited Reasoning: LLMs have limited reasoning ability. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but not good reasoners.
- No Dynamism: LLMs are static and cannot access real-time information.
To overcome these limitations of LLMs, a more sophisticated approach is required. This is where agents play an important role.
Agents to the rescue
The concept of an intelligent agent in AI has evolved over two decades, with implementations changing along the way. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM challenges: it can help us reason, provide a means to acquire current information from the Internet (solving the dynamism problem of LLMs), and complete a task autonomously. With an LLM as the backbone, an agent formally comprises tools, memory, reasoning (or planning), and action components.
Components of AI agents
- Tools enable agents to access external information – be it from the Internet, databases, or APIs – and gather the necessary data.
- Memory can be short-term or long-term. Agents use scratchpad memory to temporarily store results from various sources, while chat history is an example of long-term memory.
- The Reasoner enables agents to think methodically and break complex tasks into manageable sub-tasks for effective processing.
- Actions: Agents act based on their environment and reasoning, iteratively adjusting and solving tasks through feedback. ReAct is one of the most common methods for interleaving reasoning and action.
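The four components above can be sketched as a minimal ReAct-style loop. This is a toy illustration, not a real framework: `fake_llm` stands in for an actual model call, and the `web_search` tool and its answer format are assumptions made for the example.

```python
def web_search(query: str) -> str:
    """Tool: stand-in for fetching current information from the Internet."""
    return f"search results for '{query}'"

TOOLS = {"web_search": web_search}

def fake_llm(prompt: str) -> str:
    """Stub reasoner: decides on an action or a final answer."""
    if "Observation:" not in prompt:
        return "Action: web_search[latest LLM news]"
    return "Final Answer: summary based on observations"

def react_agent(task: str, max_steps: int = 5) -> str:
    scratchpad = [f"Task: {task}"]  # short-term (scratchpad) memory
    for _ in range(max_steps):
        decision = fake_llm("\n".join(scratchpad))     # reasoning step
        if decision.startswith("Final Answer:"):
            return decision.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]" and execute the chosen tool
        name, arg = decision.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = TOOLS[name](arg)                 # action step
        scratchpad.append(decision)
        scratchpad.append(f"Observation: {observation}")
    return "no answer within step budget"
```

The loop alternates reasoning and action until the model emits a final answer, with the scratchpad accumulating observations as feedback.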
What are agents good at?
Agents excel at complex tasks, especially when they work in a role-playing mode that takes advantage of the improved performance of LLMs. For example, if you're writing a blog, one agent might focus on the research while another handles the writing – each tackling one concrete sub-goal. This multi-agent approach can be applied to numerous real-world problems.
Role-playing helps agents stay focused on specific tasks in service of larger goals, thereby significantly reducing hallucinations. It defines the parts of a prompt – such as role, instruction, and context. Since LLM performance depends on well-structured prompts, various frameworks formalize this process. One such framework, CrewAI, provides a structured approach to defining role-playing, as we will discuss next.
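As a plain-Python illustration of the role/instruction/context structure (this is not CrewAI's actual API; the field names and wording are assumptions for the sketch):

```python
def build_prompt(role: str, instruction: str, context: str) -> str:
    """Compose a structured prompt from the three role-playing parts."""
    return (
        f"You are {role}.\n"
        f"Instruction: {instruction}\n"
        f"Context: {context}"
    )

# Hypothetical research agent for the blog-writing example above
researcher_prompt = build_prompt(
    role="a research analyst who gathers facts for blog posts",
    instruction="Collect three key points about multi-agent systems.",
    context="Your notes will be handed to a separate writer agent.",
)
```

Frameworks like CrewAI formalize exactly this kind of decomposition, so each agent's prompt stays narrow and on-task.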
Multi-agents vs. single agents
Take the example of Retrieval-Augmented Generation (RAG) with a single agent. This is an effective way to enable LLMs to handle domain-specific queries by leveraging information from indexed documents. However, single-agent RAG has its own limitations, such as retrieval performance or document ranking. Multi-agent RAG overcomes these limitations by deploying specialized agents for document understanding, retrieval, and ranking.
In a multi-agent scenario, agents collaborate in different ways, similar to distributed computing patterns: sequential, centralized, decentralized, or via shared message pools. Frameworks like CrewAI, AutoGen, and LangGraph + LangChain make it possible to solve complex problems with multi-agent approaches. In this article, I use CrewAI as the frame of reference to explore autonomous workflow management.
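Two of these collaboration patterns can be sketched framework-agnostically, with agents as plain functions (the agent names and outputs are illustrative, not tied to any of the frameworks above):

```python
from typing import Callable

Agent = Callable[[str], str]

def research_agent(task: str) -> str:
    return f"notes on: {task}"

def writer_agent(material: str) -> str:
    return f"draft based on ({material})"

def run_sequential(agents: list[Agent], task: str) -> str:
    """Sequential pattern: each agent consumes the previous agent's output."""
    result = task
    for agent in agents:
        result = agent(result)
    return result

def run_centralized(workers: list[Agent], task: str) -> str:
    """Centralized pattern: a manager fans the task out and merges results."""
    results = [agent(task) for agent in workers]
    return " | ".join(results)
```

In the sequential pattern the writer sees the researcher's notes; in the centralized pattern both work from the raw task and a manager merges their outputs.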
Workflow management: A use case for multi-agent systems
Most industrial processes involve managing workflows, be it loan processing, marketing campaign management, or even DevOps. Achieving a given goal requires sequential or cyclical steps. In a traditional approach, each step (e.g., loan application review) requires a human to perform the tedious, mundane task of manually processing and reviewing each application before moving on to the next step.
Each step requires input from a domain expert. In a multi-agent setup with CrewAI, each step is handled by a crew of multiple agents. For example, when reviewing loan applications, one agent may verify the user's identity through background checks on documents such as a driver's license, while another agent verifies the user's financial details.
This raises the question: can a single crew (with multiple agents in sequence or in a hierarchy) handle all the steps of loan processing? Although possible, this complicates crew operations, requires extensive temporary memory, and increases the risk of goal drift and hallucination. A cleaner approach is to treat each loan-processing step as a separate crew and view the complete workflow as a graph of crew nodes (using tools like LangGraph) operating sequentially or cyclically.
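A toy version of this graph-of-crews idea, with each node as a plain function (the step names, thresholds, and loan fields are invented for illustration and do not reflect a real LangGraph API):

```python
def identity_check(app: dict) -> dict:
    """Crew node 1: background check on identity documents."""
    app["identity_ok"] = bool(app.get("drivers_license"))
    return app

def financial_check(app: dict) -> dict:
    """Crew node 2: verify financial details (hypothetical threshold)."""
    app["finances_ok"] = app.get("income", 0) >= 30_000
    return app

def decision(app: dict) -> dict:
    """Crew node 3: combine the upstream verdicts."""
    app["approved"] = app["identity_ok"] and app["finances_ok"]
    return app

# The workflow is a graph of crew nodes; here the edges are simply sequential.
WORKFLOW = [identity_check, financial_check, decision]

def run_workflow(application: dict) -> dict:
    for crew in WORKFLOW:
        application = crew(application)
    return application

result = run_workflow({"drivers_license": "D123", "income": 45_000})
```

Because each step is isolated, a node can be swapped out or repeated (for cyclical flows) without touching the rest of the graph.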
Because LLMs are still at an early stage of intelligence, full workflow management cannot be completely autonomous. A human-in-the-loop is required at key stages for end-user verification. For example, after the crew completes the loan-application verification step, human oversight is needed to validate the results. Over time, as trust in AI grows, some steps may become fully autonomous. Currently, AI-based workflow management plays a supporting role, streamlining tedious tasks and reducing overall processing time.
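One simple way to sketch such a human-in-the-loop checkpoint is to wrap an automated step with a reviewer callback that must sign off before the workflow continues (the step and field names are illustrative; in production the reviewer would be a real person, not a stub):

```python
from typing import Callable

def verify_application(app: dict) -> dict:
    """Stand-in for a crew's automated verification step."""
    app["auto_verified"] = True
    return app

def with_human_review(step: Callable[[dict], dict],
                      reviewer: Callable[[dict], bool]) -> Callable[[dict], dict]:
    """Wrap a workflow step with a mandatory human approval gate."""
    def gated(app: dict) -> dict:
        app = step(app)
        app["human_approved"] = reviewer(app)  # pause point for oversight
        return app
    return gated

# Stub reviewer that approves everything; a real one would block and ask.
gated_step = with_human_review(verify_application, reviewer=lambda app: True)
```

As trust in specific steps grows, the reviewer can be replaced with an automatic policy, making that node fully autonomous without restructuring the workflow.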
Production challenges
Introducing multi-agent solutions into production can pose several challenges.
- Scaling: As the number of agents grows, collaboration and management become challenging. Various frameworks offer scalable solutions – for example, LlamaIndex uses an event-driven workflow to manage multi-agents at scale.
- Latency: Agentic workflows often introduce latency because tasks are executed iteratively, requiring multiple LLM calls. Managed LLMs (like GPT-4o) can be slow because of implicit guardrails and network delays. Self-hosted LLMs (with GPU control) help address latency issues.
- Performance and Hallucination Issues: Due to the probabilistic nature of LLMs, agent performance may vary with each execution. Techniques such as output templates (e.g., JSON format) and providing ample examples in prompts can help reduce response variability. Hallucination can be further mitigated by training agents.
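The output-template technique can be sketched as follows: require JSON matching a fixed set of keys, and retry when the model returns free text instead (the stubbed model, schema keys, and retry count are assumptions for the example):

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}

def validate(raw: str):
    """Accept the response only if it is JSON with the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if REQUIRED_KEYS <= data.keys() else None

def query_with_retries(llm, prompt: str, max_tries: int = 3) -> dict:
    """Re-ask until the output conforms to the template."""
    for _ in range(max_tries):
        result = validate(llm(prompt))
        if result is not None:
            return result
    raise RuntimeError("no valid structured output")

# Stub model: answers with free text once, then with valid JSON.
responses = iter(['Sure! The answer is 42.',
                  '{"answer": "42", "confidence": 0.9}'])
flaky_llm = lambda prompt: next(responses)
```

Pinning the output to a schema narrows the space of acceptable responses, which reduces run-to-run variability even though the model itself stays probabilistic.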
Final thoughts
As Andrew Ng points out, agents are the future of AI and will continue to evolve alongside LLMs. Multi-agent systems will make progress in processing multimodal data (text, images, video, audio) and tackling increasingly complex tasks. While AGI and fully autonomous systems are still on the horizon, multi-agents will bridge the current gap between LLMs and AGI.