As more enterprises look toward the so-called agentic future, one obstacle may lie in how AI models are built. For enterprise AI developer AI21, the answer is clear: the industry must look to other model architectures to enable more efficient AI agents.
Ori Goshen, CEO of AI21, said in an interview with VentureBeat that Transformers, the most popular model architecture, have limitations that would make a multi-agent ecosystem difficult.
“One trend I see is the rise of non-Transformer architectures, and these alternative architectures will likely be more efficient,” Goshen said. “Transformers work by generating so many tokens that it can get very expensive.”
AI21, which focuses on developing AI solutions for enterprises, has previously argued that Transformers should be one option for model architecture, but not the default. It develops foundation models using its Jamba architecture, short for Joint Attention and Mamba architecture. Jamba builds on the Mamba architecture developed by researchers from Princeton University and Carnegie Mellon University, which can offer faster inference times and longer context.
Goshen said alternative architectures like Mamba and Jamba can often make agentic structures more efficient and, most importantly, more affordable. In his view, Mamba-based models have better memory performance, which would help agents, especially agents that connect to other models, work better.
He attributes the fact that AI agents are only now gaining popularity, and that most agents have not yet made it into products, to the reliance on LLMs built with Transformers.
“The main reason agents are not yet in production mode is reliability, or the lack of reliability,” Goshen said. “When you break down a Transformer model, you know it is very stochastic, so any errors will perpetuate.”
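One way to see why perpetuated errors matter for agents is simple arithmetic: if an agent chains several model calls and every call must be right, small per-step error rates multiply. The sketch below is only an illustration. The 95% per-step success rate and the assumption that steps fail independently are hypothetical and do not come from Goshen or AI21.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step of a multi-step task succeeds.

    Assumes (hypothetically) that steps succeed or fail independently
    and that a single failure spoils the whole task.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Hypothetical per-step success rate, chosen only for illustration.
PER_STEP_SUCCESS = 0.95

for steps in (1, 5, 10, 20):
    overall = chain_success_probability(PER_STEP_SUCCESS, steps)
    print(f"{steps:>2} steps -> {overall:.1%} chance the whole task succeeds")
```

Under these assumptions, a step that works 95% of the time yields a 10-step workflow that succeeds only about 60% of the time, which is one reading of why reliability becomes the bottleneck once agents chain many calls.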
Enterprise agents are growing in popularity
AI agents have emerged as one of the biggest trends in enterprise AI this year. Several companies have launched AI agents and platforms that make it easier to build agents.
ServiceNow announced updates to its Now Assist AI platform, including a library of AI agents for customers. Salesforce has its own set of agents called Agentforce, while Slack has begun allowing users to integrate agents from Salesforce, Cohere, Workday, Asana, Adobe and more.
Goshen believes that with the right mix of models and model architectures, this trend will become even more popular.
“Some use cases we’re seeing now, like question and answer from a chatbot, are basically glorified search,” he said. “I think real intelligence is in connecting and retrieving different information from sources.”
Goshen added that AI21 is currently developing offerings around AI agents.
Other architectures are vying for attention
Goshen strongly supports alternative architectures such as Mamba and AI21's Jamba, primarily because he believes Transformer models are too expensive and unwieldy to run.
Instead of the attention mechanism that forms the backbone of Transformer models, Mamba can prioritize different data and assign weights to inputs, optimize memory usage, and make use of a GPU's processing power.
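The memory point can be made concrete with a rough count. During generation, an attention layer typically caches a key vector and a value vector for every token it has seen, so its memory grows with context length. A state-space layer in the Mamba family instead carries a fixed-size recurrent state. The sketch below is a simplified, single-layer comparison. The dimensions are hypothetical, and it ignores details such as Mamba's small convolution buffer and internal expansion factor.

```python
def kv_cache_entries(tokens: int, d_model: int) -> int:
    """Numbers one attention layer caches: a key and a value vector per token."""
    return 2 * tokens * d_model


def recurrent_state_entries(d_model: int, d_state: int) -> int:
    """Numbers one fixed-size recurrent state holds, regardless of tokens seen."""
    return d_model * d_state


# Hypothetical layer sizes, chosen only for illustration.
D_MODEL = 1024
D_STATE = 16

for tokens in (1_000, 10_000, 100_000):
    attention = kv_cache_entries(tokens, D_MODEL)
    recurrent = recurrent_state_entries(D_MODEL, D_STATE)
    print(f"{tokens:>7} tokens: attention cache {attention:>12,} values, "
          f"recurrent state {recurrent:>7,} values")
```

In this toy count the attention cache grows in step with the context while the recurrent state stays the same size, which is the property behind the longer-context and memory claims made for Mamba-style models.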
Mamba is growing in popularity. Other open-source and open-weight AI developers have begun releasing Mamba-based models in recent months. Mistral released Codestral Mamba 7B in July, and in August Falcon released its own Mamba-based model, Falcon Mamba 7B.
However, the Transformer architecture has become the default, if not standard, choice when developing foundation models. OpenAI's GPT is of course a Transformer model (it's literally in the name), but so are most other popular models.
Goshen said enterprises ultimately want whichever approach is more reliable. But organizations should also be wary of flashy demos that promise to solve many of their problems.
“We're at the phase where charismatic demos are easy to do, but we're closer to that than we are to the product phase,” Goshen said. “It’s fine to use enterprise AI for research purposes, but we’re not yet at the point where enterprises can use it to inform decisions.”