
AI21 Labs revamps Gen AI transformers with Jamba

Since the groundbreaking research paper “Attention is All You Need” in 2017, the concept of transformers has dominated the generative AI landscape.

However, transformers aren't the only way forward for generative AI. A new approach from AI21 Labs called "Jamba" aims to go beyond transformers. Jamba combines the Mamba model, which is based on the Structured State Space model (SSM), with a transformer architecture to create an optimized generative AI model. The name stands for Joint Attention and Mamba architecture, and the model aims to bring together the best features of SSMs and transformers. Jamba is released as an open source model under the Apache 2.0 license.

To be clear, Jamba is unlikely to replace current transformer-based large language models (LLMs) today, but it will likely complement them in certain areas. According to AI21 Labs, Jamba can outperform traditional transformer-based models on generative reasoning tasks, as measured by benchmarks such as HellaSwag. However, it currently doesn't outperform transformer-based models on other critical benchmarks such as Massive Multitask Language Understanding (MMLU) for problem solving.

Jamba isn't just a new version of Jurassic from AI21 Labs

AI21 Labs has a specific focus on generative AI for enterprise use cases. The company raised $155 million in August 2023 to support its growing efforts.

The company's enterprise tools include Wordtune, an optimized service that helps enterprises generate content that matches an organization's tone and brand. AI21 Labs told VentureBeat in 2023 that it often competes with, and wins outright against, the generative AI giant OpenAI for enterprise business.

Until now, AI21 Labs' LLM technology, like every other LLM, was based on the transformer architecture. Just over a year ago, the company introduced its Jurassic-2 LLM family, which is part of the Natural Language Processing (NLP)-as-a-Service platform AI21 Studio and is also available via APIs for enterprise integrations.

Jamba is not an evolution of Jurassic, but rather something quite different: a hybrid SSM and transformer model.

Attention alone isn't enough; context is needed too

Transformers have dominated the generative AI landscape to date, but they still have some shortcomings. Most notable is the fact that inference generally becomes slower as context windows grow larger.

As AI21 Labs researchers note, a transformer's attention mechanism scales with sequence length, slowing throughput because each token depends on the entire sequence that preceded it. This places long-context use cases outside the scope of efficient production.
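As a rough illustration of why this matters, the sketch below counts the query-key scores one attention head computes when every token attends to itself and to all earlier tokens. It assumes standard causal self-attention with no caching or sparsity tricks; the function name `causal_attention_pairs` is hypothetical and not taken from AI21 Labs.

```python
def causal_attention_pairs(seq_len: int) -> int:
    """Count query-key scores for one causal attention head (assumed setup)."""
    # Token t (1-indexed) attends to t positions: itself plus t - 1 predecessors,
    # so the total is 1 + 2 + ... + seq_len.
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    # Doubling the sequence length roughly quadruples the work.
    for n in (1_000, 2_000, 4_000):
        print(n, causal_attention_pairs(n))
```

Under these assumptions the total work grows roughly with the square of the sequence length, which is one way to see why long contexts become expensive.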

The other issue highlighted by AI21 Labs is the large memory requirement for scaling transformers. A transformer's memory footprint grows with context length, making it difficult to run long context windows or numerous parallel batches without extensive hardware resources.
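The memory growth can be estimated with a back-of-the-envelope calculation of the key-value cache that a transformer typically keeps during inference. The sketch below assumes a generic decoder that stores one key and one value vector per layer, per head, per token; the configuration values are hypothetical and do not describe Jamba or any specific model.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Estimate key-value cache size in bytes for a generic transformer decoder."""
    # Factor of 2: one key vector and one value vector are cached per token.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


if __name__ == "__main__":
    # Hypothetical configuration: 32 layers, 8 KV heads of dimension 128, 16-bit values.
    for tokens in (1_024, 32_768, 262_144):
        size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=tokens)
        print(f"{tokens:>7} tokens -> {size / 2**20:,.0f} MiB")
```

The estimate is linear in both context length and batch size, which is why long windows and many parallel batches compete for the same GPU memory.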

The context and memory resource issues are the two problems that the SSM approach aims to solve.

Originally proposed by researchers at Carnegie Mellon and Princeton universities, the Mamba SSM architecture has a smaller memory footprint and handles large context windows with a mechanism different from transformer attention. However, the Mamba approach has difficulty delivering the same output quality as a transformer model. Jamba's hybrid SSM-transformer approach is an attempt to combine the resource and context optimization of the SSM architecture with the strong output capabilities of a transformer.
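To give a feel for why a state space model can use less memory, here is a deliberately simplified sketch of a linear state space recurrence. It is not Mamba itself: Mamba uses learned, input-dependent (selective) parameters and vector-valued states, while this toy version assumes fixed scalar parameters `a`, `b` and `c` chosen purely for illustration.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run a toy scalar state space recurrence over a sequence of inputs."""
    h = 0.0   # the entire memory of the past is this single fixed-size state
    ys = []
    for x in xs:
        h = a * h + b * x   # update the state from the previous state and new input
        ys.append(c * h)    # read the output from the current state
    return ys


if __name__ == "__main__":
    print(ssm_scan([1.0, 1.0, 1.0]))
```

The point of the sketch is that the state `h` keeps the same size no matter how long the input is, whereas attention needs access to every earlier token.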

AI21 Labs' Jamba model offers a 256K context window and can deliver three times the throughput on long contexts compared with Mixtral 8x7B. AI21 Labs also claims that Jamba is the only model in its size class that fits up to 140K of context on a single GPU.

Notably, Jamba uses a Mixture of Experts (MoE) model, similar to Mixtral. However, Jamba uses MoE as part of its hybrid SSM-transformer approach, which allows for extreme levels of optimization. Specifically, according to AI21 Labs, Jamba's MoE layers allow it to draw on just 12 billion of its available 52 billion parameters during inference, making those 12 billion active parameters more efficient than a transformer-only model of the same size.
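The idea of using only some parameters per token can be sketched with a generic top-k router. The example below assumes a common MoE pattern (pick the k highest-scoring experts, then weight their outputs with a softmax over the selected scores); it is not AI21 Labs' implementation, and the names `top_k_route` and `moe_forward` and the toy experts are hypothetical.

```python
import math


def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]  # softmax over selected experts
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, scores, k=2):
    """Combine the outputs of only the selected experts; the rest are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))


# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]

# Share of parameters active at inference, using the figures reported above.
active_fraction = 12 / 52

if __name__ == "__main__":
    print(moe_forward(1.0, experts, scores=[0.0, 1.0, 0.0, 1.0]))
    print(f"active share: {active_fraction:.1%}")
```

With the figures AI21 Labs reports, roughly 23% of the parameters are active for any given token, which is where the inference savings come from.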

Jamba is still in its early days and is not yet part of an enterprise offering from AI21 Labs. The company plans to offer an Instruct version as a beta on the AI21 platform soon.

