MiniMax introduces its own open source LLM with industry-leading 4M token context

MiniMax is probably best known in the U.S. today as the Singaporean company behind Hailuo, a realistic, high-resolution generative AI video model that competes with Runway, OpenAI's Sora, and Luma AI's Dream Machine.

But the company has many more tricks up its sleeve: today, for instance, it announced the release and open sourcing of the MiniMax-01 series, a new family of models designed to handle extremely long contexts and enhance AI agent development.

The series includes MiniMax-Text-01, a foundation large language model (LLM), and MiniMax-VL-01, a visual multimodal model.

An enormous context window

MiniMax-Text-01 is especially notable because it allows up to 4 million tokens in its context window – the equivalent of a small library's worth of books. The context window is how much information the LLM can handle in one input/output exchange, with words and concepts represented as numerical “tokens,” the LLM’s own internal mathematical abstraction of the data it was trained on.
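
To make the token idea concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer as a stand-in; MiniMax-Text-01 ships its own tokenizer, so the exact counts will differ:

```python
# Rough illustration of text -> tokens -> context budget.
# tiktoken's cl100k_base encoding is a stand-in here; MiniMax-Text-01
# uses its own tokenizer, so real counts will differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "MiniMax-01 efficiently processes up to 4 million tokens."
token_ids = enc.encode(text)
print(len(token_ids), "tokens:", token_ids)

# At a rough rule of thumb of ~0.75 English words per token, a 4M-token
# window holds about 3 million words, i.e. dozens of full-length novels.
window = 4_000_000
print(f"~{int(window * 0.75):,} words fit in the window")
```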

And while Google previously led the pack with its Gemini 1.5 Pro model and its 2-million-token context window, MiniMax has remarkably doubled that.

As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4 million tokens – 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated surge in agent-related applications in the coming year, as agents increasingly require extended context handling capabilities and sustained memory.”

The models are available now for download on Hugging Face and GitHub under a custom MiniMax license, so users can try them directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor) and via MiniMax’s application programming interface (API), where third-party developers can link their own unique apps to them.
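
For developers who want to wire the model into an app, a minimal sketch of a chat call is below. The endpoint URL, model identifier, and payload fields follow MiniMax's OpenAI-style chat format but should be treated as assumptions; check MiniMax's API documentation for the authoritative values.

```python
# Hedged sketch of calling MiniMax-Text-01 over HTTPS.
# Endpoint, model name, and payload shape are assumptions based on
# MiniMax's OpenAI-style chat API; consult the official docs.
import os
import requests

API_KEY = os.environ["MINIMAX_API_KEY"]  # your key from the MiniMax platform
URL = "https://api.minimax.chat/v1/text/chatcompletion_v2"  # assumed endpoint

payload = {
    "model": "MiniMax-Text-01",  # assumed model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this 500-page report: ..."},
    ],
}

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```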

MiniMax offers APIs for text and multimodal processing at competitive prices:

  • $0.20 per 1 million input tokens
  • $1.10 per 1 million output tokens

For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens via its API – a staggering 12.5x the cost.
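
For the curious, the arithmetic behind that multiple, and what a maximally stuffed 4-million-token prompt would cost at these list prices, works out as follows (a worked example, not a billing calculator):

```python
# Worked cost comparison using the per-million-token list prices above.
MINIMAX_INPUT = 0.20  # USD per 1M input tokens (MiniMax)
GPT4O_INPUT = 2.50    # USD per 1M input tokens (OpenAI GPT-4o)

print(GPT4O_INPUT / MINIMAX_INPUT)  # 12.5, the price gap on input tokens

# Filling MiniMax-Text-01's entire 4M-token window once:
prompt_tokens = 4_000_000
print(prompt_tokens / 1e6 * MINIMAX_INPUT)  # $0.80 for the full window

# The same volume of input at GPT-4o's rate would be $10.00, though
# GPT-4o's 128K window could not actually accept a 4M-token prompt.
print(prompt_tokens / 1e6 * GPT4O_INPUT)
```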

MiniMax has also integrated a mixture-of-experts (MoE) framework with 32 experts to optimize scalability. This design balances compute and memory efficiency while maintaining competitive performance on key benchmarks.
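
To illustrate the general technique (a toy sketch, not MiniMax’s actual implementation), a top-k MoE layer routes each token through only a couple of its experts and blends their outputs by gate weight, which is why only a fraction of the model’s total parameters is active for any given token:

```python
# Toy top-2 mixture-of-experts layer: a sketch of the general technique,
# not MiniMax's implementation. Each token activates only 2 of n experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=32, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):  # send tokens to their k-th pick
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64])
```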

Breaking new ground with Lightning Attention architecture

At the core of MiniMax-01 is a Lightning Attention mechanism, an innovative alternative to traditional transformer architecture.

This design significantly reduces computational complexity. The models comprise 456 billion parameters, of which 45.9 billion are activated per inference.

Unlike previous architectures, Lightning Attention uses a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax, for those who are new to the concept, converts input numbers into probabilities that sum to 1, so the LLM can approximate the most likely meaning of the input.
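
Two small NumPy sketches make the distinction concrete: SoftMax turns raw scores into probabilities that sum to 1, and a linear-attention-style reordering avoids building the full n x n attention matrix, so cost grows roughly linearly in sequence length. This is a simplified illustration of the idea, not MiniMax’s actual Lightning Attention kernel:

```python
# SoftMax and a simplified linear-attention reordering, in NumPy.
# Illustrative only; Lightning Attention's real kernels are more involved.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1

n, d = 1024, 64                  # sequence length, head dimension
Q, K, V = (np.random.randn(n, d) for _ in range(3))

# Standard SoftMax attention: builds an n x n matrix, O(n^2 * d).
attn = softmax(Q @ K.T / np.sqrt(d)) @ V

# Linear attention with a positive feature map phi: computes
# phi(Q) @ (phi(K).T @ V), a d x d intermediate, O(n * d^2).
phi = lambda x: np.maximum(x, 0) + 1e-6     # toy feature map
num = phi(Q) @ (phi(K).T @ V)               # (n, d)
den = phi(Q) @ phi(K).sum(axis=0)[:, None]  # (n, 1) normalizer
linear_attn = num / den

print(attn.shape, linear_attn.shape)        # both (1024, 64)
```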

MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. Key improvements include:

  • MoE all-to-all communication optimization: Reduces communication overhead between GPUs.
  • Varlen ring attention: Minimizes wasted computation when processing long sequences.
  • Efficient kernel implementations: Tailored CUDA kernels improve Lightning Attention performance.

These advances make MiniMax-01 models practical for real-world applications while keeping them affordable.

Performance and benchmarks

On mainstream text and multimodal benchmarks, MiniMax-01 rivals top models such as GPT-4 and Claude-3.5, with especially strong results in long-context evaluations. Notably, MiniMax-Text-01 achieved 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context.
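
The Needle-In-A-Haystack setup is straightforward to reproduce in miniature: bury one distinctive fact (the “needle”) at an arbitrary depth inside a long filler document, then check whether the model can retrieve it. The helper below is a hypothetical sketch; the filler text, needle, and question are illustrative, not MiniMax’s evaluation harness:

```python
# Miniature Needle-In-A-Haystack prompt builder; illustrative only.
# The filler text, needle, and retrieval question are made up here.
def build_niah_prompt(needle: str, depth: float, filler_chunks: int = 10_000) -> str:
    filler = "The sky was clear and the market opened without incident. "
    insert_at = int(filler_chunks * depth)  # depth in [0, 1]: where to hide it
    chunks = [filler] * filler_chunks
    chunks.insert(insert_at, needle + " ")
    haystack = "".join(chunks)
    question = "What is the magic number mentioned in the document above?"
    return f"{haystack}\n\n{question}"

prompt = build_niah_prompt(
    needle="The magic number is 7481.",
    depth=0.35,                             # bury it 35% of the way in
)
print(len(prompt), "characters; send to the model and check for '7481'")
```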

The models also exhibit minimal performance degradation as input length increases.

MiniMax plans regular updates to expand the capabilities of the models, including code and multimodal improvements.

The company views open sourcing as a step toward constructing foundational AI capabilities for the evolving AI agent landscape.

With 2025 expected to be a transformative year for AI agents, the need for sustained memory and efficient inter-agent communication is growing. MiniMax's innovations are designed to meet these challenges.

Open to collaboration

MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open sourcing, the team welcomes technical suggestions and collaboration requests at model@minimaxi.com.

With its commitment to low-cost and scalable AI, MiniMax is positioning itself as a key player in shaping the era of AI agents. The MiniMax-01 series offers developers an exciting opportunity to push the boundaries of what long-context AI can do.
