AI21 Labs' new AI model can handle more context than most

March 29, 2024


The AI industry is increasingly moving towards generative AI models with longer contexts. However, models with large context windows tend to be computationally intensive. Or Dagan, product lead at AI startup AI21 Labs, asserts that this doesn't have to be the case – and his company is releasing a generative model to prove it.

Contexts, or context windows, refer to the input data (e.g. text) that a model considers before generating output (more text). Models with small context windows tend to forget the content of even very recent conversations, while models with larger contexts avoid this pitfall – and, as an added benefit, better grasp the flow of the data they take in.
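
To make the "forgetting" concrete, here is a minimal sketch of what a fixed-size context window does to a conversation. It assumes a simple drop-the-oldest policy and uses whitespace splitting as a stand-in for a real tokenizer; the function name `fit_to_context` is hypothetical and not part of any AI21 Labs API.

```python
def fit_to_context(tokens, window):
    """Keep only the most recent `window` tokens; anything older is dropped."""
    if window <= 0:
        return []
    return tokens[-window:]


# Whitespace splitting is an illustrative assumption, not a real tokenizer.
conversation = "my name is Ada . what is the weather today ?".split()

small = fit_to_context(conversation, 5)   # small window: the name is lost
large = fit_to_context(conversation, 50)  # large window: everything fits

print(small)
print(large)
```

With the five-token window the model never sees the name mentioned at the start, while the larger window retains the whole exchange.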

AI21 Labs' Jamba, a new text generation and analysis model, can perform many of the same tasks as models like OpenAI's ChatGPT and Google's Gemini. Jamba is trained on a combination of public and proprietary data and can write text in English, French, Spanish and Portuguese.

Jamba can process up to 140,000 tokens while running on a single GPU with at least 80GB of memory (like a high-end Nvidia A100). That's about 105,000 words or 210 pages – a novel of decent size.
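
The conversion above can be reproduced with a few lines of arithmetic. The two ratios below (0.75 words per token and 500 words per page) are assumptions back-derived from the article's own figures; real ratios vary by tokenizer, language and page layout.

```python
TOKENS = 140_000
WORDS_PER_TOKEN = 0.75  # assumption implied by 105,000 words / 140,000 tokens
WORDS_PER_PAGE = 500    # assumption implied by 105,000 words / 210 pages


def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Rough word count for a given number of tokens."""
    return round(tokens * words_per_token)


def words_to_pages(words, words_per_page=WORDS_PER_PAGE):
    """Rough page count for a given number of words."""
    return round(words / words_per_page)


words = tokens_to_words(TOKENS)
pages = words_to_pages(words)
print(words, pages)
```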

In comparison, Meta's Llama 2 has a context window of 32,000 tokens – on the smaller side by today's standards – but only requires a GPU with around 12GB of memory to run. (Context windows are typically measured in tokens, which are bits of raw text and other data.)

At first glance, Jamba is unremarkable. There are plenty of freely available, downloadable generative AI models, from Databricks' recently released DBRX to the aforementioned Llama 2.

But what makes Jamba unique is what's under the hood. It uses a mix of two model architectures: transformers and state space models (SSMs).

Transformers are the architecture of choice for complex reasoning tasks, powering models such as GPT-4 and Google's Gemini. They have several unique properties, but the defining feature of transformers is their “attention mechanism.” For each piece of input data (e.g. a sentence), transformers weigh the relevance of every other input (other sentences) and draw on them to generate the output (a new sentence).
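
As a rough illustration of that weighing step, the sketch below implements plain scaled dot-product attention in pure Python. It is a simplified teaching example, not Jamba's code: it assumes the inputs are used directly as queries, keys and values, and it leaves out the learned projections and multiple heads that real transformers use.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """For each query, weigh every key, then mix the values by those weights."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy input vectors standing in for three sentences.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
print(weights)
```

Every input is compared with every other input, so the weight table has one row and one column per input. That pairwise comparison is why the cost of attention grows quickly as the context gets longer.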

SSMs, on the other hand, combine several qualities of older types of AI models, such as recurrent neural networks and convolutional neural networks, to create a more computationally efficient architecture capable of processing long sequences of data.
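
The recurrent side of that idea can be sketched with a toy, scalar state space model. This is an assumption-laden simplification for illustration only: the parameters `a`, `b` and `c` are fixed numbers chosen by hand, whereas models like Mamba learn their parameters and make them depend on the input.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Toy linear state space model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0  # the entire memory of the past lives in this single state
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# A single impulse followed by silence: the state decays step by step.
ys = ssm_scan([1.0, 0.0, 0.0], a=0.5)
print(ys)
```

Each step touches only the current input and a fixed-size state, so the work per token stays constant no matter how long the sequence is.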

Now, SSMs have their limitations. But some of the early incarnations, including an open source model called Mamba from researchers at Princeton and Carnegie Mellon, can handle larger inputs than their transformer-based equivalents while outperforming them at language generation tasks.

In fact, Jamba uses Mamba as part of its core model – and Dagan claims that it delivers three times the throughput on long contexts compared with transformer-based models of comparable size.

“Although there are some early academic examples of SSM models, this is the first commercial-grade, production-scale model,” Dagan said in an interview with TechCrunch. “This architecture is not only innovative and interesting for further research by the community, but also opens up great efficiency and throughput opportunities.”

While Jamba has been released under the Apache 2.0 license, an open source license with relatively few usage restrictions, Dagan emphasizes that it’s a research release and is not intended for commercial use. The model doesn’t have safeguards to prevent it from generating harmful text or mitigations to address potential bias. A fine-tuned, supposedly “safer” version will be made available in the coming weeks.

Still, Dagan claims that Jamba already demonstrates the promise of the SSM architecture, even at this early stage.

“The added value of this model, both because of its size and its innovative architecture, is that it can be easily fitted onto a single GPU,” he said. “We believe performance will continue to improve as Mamba receives further optimizations.”
