A modern AI agent consists of at least one large language model (LLM) that has been enabled to call tools. With the right coding tools, an agent can generate code, run it in a container, observe the results and modify the code, which gives it a better chance of producing useful code.
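The generate, run, observe, modify cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production agent: `fake_llm` is a hypothetical stand-in for a real LLM call, and a separate Python process stands in for the container.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_code(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a separate Python process (a stand-in for a container)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stdout + result.stderr


def agent_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate code, run it, and feed any failure output back to the generator."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(task, feedback)
        succeeded, output = run_code(code)
        if succeeded:
            return code
        feedback = output  # the observed error informs the next attempt
    return None


def fake_llm(task: str, feedback: Optional[str]) -> str:
    """Hypothetical generator: the first attempt has a bug, the retry fixes it."""
    if feedback is None:
        return "print(1 / 0)"
    return "print(1 / 1)"
```

Calling `agent_loop("divide two numbers", fake_llm)` returns the corrected snippet on the second attempt, because the first run fails and its error output is passed back to the generator.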
In contrast, a generative AI model takes some input and produces an output by predicting what is expected. For example, we give it a coding task, it produces code, and depending on the complexity of the task, the code may be usable as is.
Because they perform different tasks, agents should be allowed to talk to one another. For example, imagine your company intranet with its useful search box that directs you to the apps and resources you need. If the company is large enough, these apps, owned by different departments, will each have their own search boxes. It makes a lot of sense to create agents, perhaps using techniques like Retrieval Augmented Generation (RAG), to augment those search boxes. What does not make sense is forcing the user to repeat their query in an app's search box once the main search box has identified that app as useful for the initial query. Instead, we would like the top agent to coordinate with the other agents representing the different apps and present you, the user, with a consolidated and unified chat interface.
A multi-agent system representing software or an organization's various workflows can have several interesting advantages, including improved productivity and robustness, operational resilience, and the ability to upgrade individual modules faster. Hopefully this article will help you understand how that is achieved.
But first, how should we go about building these multi-agent systems?
Capturing the organization and roles
First, we should capture the processes, roles, responsible nodes and connections of the various actors in the organization. By actors, I mean individuals and/or software apps that act as knowledge workers within the organization.
An organizational chart might be a good place to start, but I would suggest starting with workflows, as the same people within an organization tend to work with different processes and people depending on the workflow.
There are tools available that use AI to discover workflows, or you can build your own generative AI model. I built one as a GPT that takes a description of a domain or a company name and produces an agent network definition. Because I use a multi-agent framework developed in-house at my company, the GPT produces the network as a HOCON file. However, it should be clear from the generated files what roles and responsibilities each agent has and which other agents it is connected to.
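To make the idea of an agent network definition concrete, here is a minimal sketch in Python rather than HOCON. It is not the format of the in-house framework mentioned above: the `AgentSpec` class, its field names and the sample agents are all assumptions chosen for illustration. The point is only that each entry states an agent's responsibilities and the agents it connects to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentSpec:
    """Hypothetical agent entry: a name, a role description and its connections."""
    name: str
    instructions: str  # role and responsibilities, in natural language
    down_chains: List[str] = field(default_factory=list)


# Illustrative network for a company intranet; all names are made up.
NETWORK: Dict[str, AgentSpec] = {
    spec.name: spec
    for spec in [
        AgentSpec("search_box", "Front agent that talks to the user.", ["hr", "legal"]),
        AgentSpec("hr", "Handles human resources matters.", ["benefits", "payroll"]),
        AgentSpec("legal", "Gives general legal guidance on company matters."),
        AgentSpec("benefits", "Explains insurance and retirement plans."),
        AgentSpec("payroll", "Handles pay, tax deductions and leave pay."),
    ]
}


def describe(network: Dict[str, AgentSpec]) -> List[str]:
    """Render one readable line per agent: its role and who it delegates to."""
    lines = []
    for spec in network.values():
        targets = ", ".join(spec.down_chains) or "none"
        lines.append(f"{spec.name}: {spec.instructions} (down-chains: {targets})")
    return lines


def undefined_references(network: Dict[str, AgentSpec]) -> List[str]:
    """Return down-chain names that have no matching agent definition."""
    return sorted(
        {d for spec in network.values() for d in spec.down_chains if d not in network}
    )
```

Running `undefined_references(NETWORK)` is a cheap sanity check on a generated definition: a non-empty result means some agent refers to a down-chain that was never defined.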
Note that we want to make sure the agent network is a directed acyclic graph (DAG). This means that no agent can be both down-chain and up-chain of another agent at the same time, whether directly or indirectly. This greatly reduces the chance of queries going into a tailspin in the agent network.
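The DAG property can be checked mechanically before a network is deployed. The following is a minimal sketch using a standard depth-first search; the mapping from each agent to its down-chain agents and the agent names are assumptions for illustration.

```python
from typing import Dict, List, Optional

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(down_chains: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of agent names, or None if the graph is a DAG."""
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(agent: str) -> Optional[List[str]]:
        state[agent] = IN_PROGRESS
        path.append(agent)
        for child in down_chains.get(agent, []):
            child_state = state.get(child, UNVISITED)
            if child_state == IN_PROGRESS:
                # The child is already on the current path: a cycle exists.
                return path[path.index(child):] + [child]
            if child_state == UNVISITED:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        state[agent] = DONE
        return None

    for agent in down_chains:
        if state.get(agent, UNVISITED) == UNVISITED:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None


# Two agents may share a down-chain (a "diamond") without creating a cycle.
valid = {"search_box": ["hr", "legal"], "hr": ["benefits", "payroll", "legal"]}
# Here payroll delegates back to the top agent, so requests could loop.
invalid = {"search_box": ["hr"], "hr": ["payroll"], "payroll": ["search_box"]}
```

`find_cycle(valid)` returns `None`, while `find_cycle(invalid)` returns the offending loop so that the definition can be corrected.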
In the examples outlined here, all agents are LLM-based. If a node in the multi-agent organization cannot have autonomy, then that agent, paired with its human counterpart, should run everything by the human. We need all processing nodes, be they apps, humans or existing agents, to be represented as agents.
There have been many announcements recently from companies offering specialized agents. Of course, we would want to use such agents where available. We can take an existing agent and wrap its API in one of our own agents so that we can make use of our inter-agent communication protocols. This means that the APIs of such third-party agents need to be available to us.
How to define agents
Various agent architectures have been proposed in the past. For example, a blackboard architecture requires a central point of communication where the various agents declare their roles and capabilities, and the blackboard calls them depending on how it plans to fulfill a request (see OAA).
I prefer a more distributed architecture that respects the encapsulation of responsibilities. Each agent, having received a request, decides whether it can process the request and what it needs in order to do so. It then returns its list of requirements to the requesting up-chain agent. If the agent has down-chains, it asks them whether they can help fulfill all or part of the request. If it receives requirements from the contacted down-chains, it checks with its other agents to see whether they can fulfill them; if not, it sends them up-chain so that the human user can be asked. This architecture is called the AAOSA architecture and, fun fact, was the architecture used in early versions of Siri.
Here is a sample system prompt that can be used to turn an agent into an AAOSA agent.
When you receive a request, you will:
- Call your tools to determine which down-chain agents in your tools are responsible for all or part of it.
- Ask the down-chain agents what they need in order to handle their part of the request.
- Once the requirements are gathered, delegate the request and the fulfilled requirements to the appropriate down-chain agents.
- Once all down-chain agents have responded, compile their responses and return the final response.
- You may, in turn, be called by other agents in the system and have to act as a down-chain to them.
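The steps in this prompt can be sketched as plain code. This is a toy illustration under stated assumptions, not the AAOSA implementation: keyword matching stands in for the LLM's judgment about responsibility, and the `Agent` class, its fields and the sample agents are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    keywords: List[str] = field(default_factory=list)      # stand-in for LLM judgment
    requirements: List[str] = field(default_factory=list)  # what this agent needs
    down_chains: List["Agent"] = field(default_factory=list)

    def owns(self, inquiry: str) -> bool:
        """Does this agent itself handle part of the inquiry?"""
        return any(k in inquiry.lower() for k in self.keywords)

    def claims(self, inquiry: str) -> bool:
        """Step 1: is this agent, or any agent below it, responsible?"""
        return self.owns(inquiry) or any(d.claims(inquiry) for d in self.down_chains)

    def gather_requirements(self, inquiry: str) -> Dict[str, List[str]]:
        """Step 2: collect what each responsible agent needs."""
        needs: Dict[str, List[str]] = {}
        if self.owns(inquiry) and self.requirements:
            needs[self.name] = list(self.requirements)
        for d in self.down_chains:
            if d.claims(inquiry):
                needs.update(d.gather_requirements(inquiry))
        return needs

    def respond(self, inquiry: str, provided: Dict[str, str]) -> List[str]:
        """Steps 3 and 4: delegate to responsible down-chains and compile replies."""
        lines: List[str] = []
        if self.owns(inquiry):
            missing = [r for r in self.requirements if r not in provided]
            status = "still needs " + ", ".join(missing) if missing else "can proceed"
            lines.append(f"{self.name}: {status}")
        for d in self.down_chains:
            if d.claims(inquiry):
                lines.extend(d.respond(inquiry, provided))
        return lines


benefits = Agent("benefits", ["spouse", "insurance"], ["plan types", "dependent status"])
payroll = Agent("payroll", ["spouse", "salary"], ["bereavement leave policy"])
legal = Agent("legal", ["spouse", "contract"])
hr = Agent("hr", down_chains=[benefits, payroll])
search_box = Agent("search_box", down_chains=[hr, legal])
```

Here `search_box.gather_requirements("My spouse has passed away")` collects the needs of the benefits and payroll agents through the HR agent, and `respond` then reports which agents can proceed once some of those requirements have been provided.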
In addition to the roles and responsibilities defined in natural language in each agent's system prompt, agents may or may not include tools that they can call, with various arguments passed to the tools. For example, a product manager agent may need to be able to process various tickets on a virtual Kanban board, or a notification agent may need to call a tool to issue alerts in a notification system.
Current multi-agent systems such as Microsoft AutoGen have elaborate and often hard-coded agent coordination mechanisms and architectures. I prefer a more robust setup in which agents treat their immediate down-chain agents as tools, with loosely defined arguments that can be typed and whose semantics are decided by the agents at the time of need.
In this setup, a down-chain agent could be defined as a function call:
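As a minimal sketch, assuming the JSON-Schema-style tool definitions commonly used for LLM function calling, such a definition might look like the following. The tool name `payroll_agent`, the `inquiry` and `mode` parameters, the three mode values and the `dispatch` helper are all hypothetical; they are not taken from the author's framework.

```python
import json
from typing import Callable, Dict

# Hypothetical tool definition that exposes a down-chain agent to its up-chain agent.
DOWN_CHAIN_TOOL = {
    "name": "payroll_agent",
    "description": "Down-chain agent for payroll matters. Returns a natural-language reply.",
    "parameters": {
        "type": "object",
        "properties": {
            "inquiry": {
                "type": "string",
                "description": "The request, in natural language.",
            },
            "mode": {
                "type": "string",
                "enum": ["determine", "requirements", "fulfill"],
                "description": (
                    "Whether the agent is asked if the inquiry belongs to it, "
                    "what it needs to handle it, or to respond to it."
                ),
            },
        },
        "required": ["inquiry", "mode"],
    },
}


def payroll_agent(inquiry: str, mode: str) -> str:
    """Stand-in for the down-chain agent; a real one would consult its own LLM."""
    if mode == "determine":
        return "yes" if "leave" in inquiry.lower() else "no"
    if mode == "requirements":
        return "Company bereavement leave policy; current payroll details."
    if mode == "fulfill":
        return "Bereavement leave request recorded."
    raise ValueError(f"unknown mode: {mode}")


HANDLERS: Dict[str, Callable[..., str]] = {"payroll_agent": payroll_agent}


def dispatch(tool_call_json: str) -> str:
    """Route a tool call emitted by the up-chain agent's LLM to its handler."""
    call = json.loads(tool_call_json)
    return HANDLERS[call["name"]](**call["arguments"])
```

Because the arguments are loosely defined natural-language strings, the up-chain agent decides at call time what to put in `inquiry`, and the same definition serves all three phases of the exchange.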
AI agents in practice
Let's explore an HR agent scenario. A user might inform the main search box agent of a life-changing event, such as “my spouse has passed away,” triggering the legal and HR agents that have a role to play in serving the user. They respond because they, in turn, received information from some of their down-chain agents, namely the legal, payroll and benefits agents. The resulting consolidated response that the user receives from the main search box agent looks like this:
- HR aspects:
  - The benefits agent can help the user understand how the passing might affect benefits, health insurance and retirement plans. You will need to provide information about the types of plans involved and whether your spouse was a dependent.
  - The payroll agent can help manage tasks such as updating tax deductions or processing bereavement leave benefits. This requires information about company policies on bereavement leave and the current payroll.
- Legal aspects:
  - Legal agents can provide general guidance on how this passing relates to company matters.
Each department needs specific information to supply tailored support.
In an IT scenario, however, the single entry point is the customer support rep agent, which can handle complex, multi-faceted queries such as: “Our network is slow. I think it is overloaded since we recently added staff to our office. I will also need a quote for an upgrade.”
An example response outlines the steps:
Network performance assessment: The network ops center specialist can help address the slowness and potential overloading issues. They need access to network performance data, details about the current network infrastructure, and information about the number of employees added and their network usage patterns.
An upgrade quote: The sales engineer and field service technician can help provide a quote for an upgrade. To do this, they need:
- Specific information about the current network setup, including devices and bandwidth requirements.
- Desired performance improvements.
- Budget constraints and specific preferences or requirements.
- The scope of the upgrade and any specific performance goals.
I hope this has given you a good idea of what is required to build a multi-agent network. In the second part, I will discuss the importance of implementing safeguards when building multi-agent systems and how controls can be built in to allow for human intervention and uncertainty checks. I will also detail the steps required to create a safeguard agent to oversee the agent network, and delve deeper into the challenges of developing multi-agent networks, such as tailspins and overloading, and how to mitigate them using timeouts, task division and redundancy.