Watch out, DeepSeek and Qwen! There is a new king of open-source large language models (LLMs), at least when it comes to a capability enterprises increasingly value: agentic tool use, that is, the ability to go off and use other software capabilities such as web search or bespoke applications without much human guidance.
That model is none other than MiniMax-M2, the latest LLM from the Chinese startup of the same name. In a big win for enterprises globally, the model is released under the permissive, enterprise-friendly MIT License, meaning developers are free to take it, deploy it, retrain it, and use it however they see fit, including for commercial purposes. It is available on Hugging Face, GitHub, and ModelScope, as well as through MiniMax’s API. It also supports the OpenAI and Anthropic API standards, making it easy for customers of those proprietary AI startups to swap their models out for MiniMax’s if they wish.
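Because M2 speaks the OpenAI Chat Completions format, switching an existing integration can be as small as changing the base URL and model name. The sketch below builds such a request using only the standard library; the endpoint URL and model identifier here are illustrative assumptions, so check MiniMax’s API documentation for the exact values:

```python
import json
from urllib import request

BASE_URL = "https://api.minimax.io/v1/chat/completions"  # assumed endpoint for illustration

def build_request(prompt: str, model: str = "MiniMax-M2") -> dict:
    """Standard OpenAI-style Chat Completions body; this shared shape is
    what lets existing OpenAI/Anthropic clients migrate with minimal changes."""
    return {
        "model": model,  # hypothetical model id; see MiniMax's docs
        "messages": [{"role": "user", "content": prompt}],
    }

def post(payload: dict, api_key: str) -> bytes:
    """Send the request; not executed here since it needs a live key and network."""
    req = request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()

payload = build_request("Draft a migration plan to MiniMax-M2.")
```

The point of the sketch is the payload shape: nothing in it is MiniMax-specific, which is what makes the advertised drop-in compatibility plausible.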
According to independent evaluations by Artificial Analysis, a third-party generative AI model benchmarking and research organization, M2 now ranks first among all open-weight systems worldwide on the Intelligence Index, a composite measure of reasoning, coding, and task-execution performance.
In agentic benchmarks that measure how well a model can plan, execute, and use external tools—skills that power coding assistants and autonomous agents—MiniMax’s own reported results, following the Artificial Analysis methodology, show τ²-Bench 77.2, BrowseComp 44.0, and FinSearchComp-global 65.5.
These scores place it at or near the level of top proprietary systems like GPT-5 (thinking) and Claude Sonnet 4.5, making MiniMax-M2 the highest-performing open model yet released for real-world agentic and tool-calling tasks.
What It Means For Enterprises and the AI Race
Built around an efficient Mixture-of-Experts (MoE) architecture, MiniMax-M2 delivers high-end capability for agentic and developer workflows while remaining practical for enterprise deployment.
For technical decision-makers, the release marks an important turning point for open models in business settings. MiniMax-M2 combines frontier-level reasoning with a manageable activation footprint: just 10 billion active parameters out of 230 billion total.
This design enables enterprises to operate advanced reasoning and automation workloads on fewer GPUs, achieving near-state-of-the-art results without the infrastructure demands or licensing costs related to proprietary frontier systems.
Artificial Analysis’ data show that MiniMax-M2’s strengths extend beyond raw intelligence scores. The model leads or closely trails top proprietary systems such as GPT-5 (thinking) and Claude Sonnet 4.5 across benchmarks for end-to-end coding, reasoning, and agentic tool use.
Its performance on τ²-Bench, SWE-Bench, and BrowseComp indicates particular advantages for organizations that rely on AI systems capable of planning, executing, and verifying complex workflows, key functions for agentic and developer tools within enterprise environments.
As LLM engineer Pierre-Carl Langlais aka Alexander Doria posted on X: “MiniMax (is) making a case for mastering the technology end-to-end to get actual agentic automation.”
Compact Design, Scalable Performance
MiniMax-M2’s technical architecture is a sparse Mixture-of-Experts model with 230 billion total parameters, of which only 10 billion are active for any given inference.
This configuration significantly reduces latency and compute requirements while maintaining broad general intelligence.
The design allows for responsive agent loops—compile–run–test or browse–retrieve–cite cycles—that execute faster and more predictably than denser models.
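The efficiency behind those loops comes from sparse expert routing: each token is processed by only a few of the model’s experts. Here is a toy sketch of top-k gating in plain Python, illustrating the general MoE principle rather than MiniMax’s actual implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(x, experts, gate_scores, k=2):
    """Toy top-k Mixture-of-Experts step: of len(experts) experts,
    only the k with the highest gate scores actually run, so per-token
    compute scales with k rather than with the total expert count."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# 8 experts, each a simple scalar function; only 2 execute per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
y = moe_layer(2.0, experts,
              gate_scores=[0, 0, 0, 0, 0, 0, 1.0, 3.0], k=2)
```

Because only k experts execute per token, compute grows with k rather than with the expert count, which is how a 230-billion-parameter model can run with a 10-billion-parameter activation footprint.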
For enterprise technology teams, this means easier scaling, lower cloud costs, and reduced deployment friction. According to Artificial Analysis, the model can be served efficiently on as few as four NVIDIA H100 GPUs at FP8 precision, a setup well within reach for mid-size organizations or departmental AI clusters.
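The four-GPU figure is easy to sanity-check with back-of-the-envelope arithmetic (weights only; the KV cache and activations add further overhead on top):

```python
# Rough memory budget for serving MiniMax-M2 at FP8 on 4x H100.
total_params = 230e9          # all experts must reside in memory
bytes_per_param_fp8 = 1       # FP8 stores one byte per weight
weight_bytes = total_params * bytes_per_param_fp8  # ~230 GB of weights

h100_mem = 80e9               # bytes of HBM per H100
gpus = 4
capacity = gpus * h100_mem    # 320 GB total

headroom = capacity - weight_bytes  # ~90 GB left for KV cache, activations
fits = weight_bytes < capacity
```

So the full 230 GB of FP8 weights fit across four 80 GB cards with roughly 90 GB to spare, which is consistent with the cited deployment footprint.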
Benchmark Leadership Across Agentic and Coding Workflows
MiniMax’s benchmark suite highlights strong real-world performance across developer and agent environments. The figure below, released with the model, compares MiniMax-M2 (in red) with several leading proprietary and open models, including GPT-5 (thinking), Claude Sonnet 4.5, Gemini 2.5 Pro, and DeepSeek-V3.2.
MiniMax-M2 achieves top or near-top performance in many categories:
- SWE-bench Verified: 69.4, near GPT-5’s 74.9
- ArtifactsBench: 66.8, above Claude Sonnet 4.5 and DeepSeek-V3.2
- τ²-Bench: 77.2, approaching GPT-5’s 80.1
- GAIA (text only): 75.7, surpassing DeepSeek-V3.2
- BrowseComp: 44.0, notably stronger than other open models
- FinSearchComp-global: 65.5, best among tested open-weight systems
These results show MiniMax-M2’s capability in executing complex, tool-augmented tasks across multiple languages and environments, skills increasingly relevant for automated support, R&D, and data analysis within enterprises.
Strong Showing in Artificial Analysis’ Intelligence Index
The model’s overall intelligence profile is confirmed in the most recent Artificial Analysis Intelligence Index v3.0, which aggregates performance across ten reasoning benchmarks including MMLU-Pro, GPQA Diamond, AIME 2025, IFBench, and τ²-Bench Telecom.
MiniMax-M2 scored 61 points, ranking as the best open-weight model globally and following closely behind GPT-5 (high) and Grok 4.
Artificial Analysis highlighted the model’s balance between technical accuracy, reasoning depth, and applied intelligence across domains. For enterprise users, this consistency indicates a reliable model foundation suitable for integration into software engineering, customer support, or knowledge automation systems.
Designed for Developers and Agentic Systems
MiniMax engineered M2 for end-to-end developer workflows, enabling multi-file code edits, automated testing, and regression repair directly inside integrated development environments or CI/CD pipelines.
The model also excels at agentic planning, handling tasks that combine web search, command execution, and API calls while maintaining reasoning traceability.
These capabilities make MiniMax-M2 especially valuable for enterprises exploring autonomous developer agents, data analysis assistants, or AI-augmented operational tools.
Benchmarks such as Terminal-Bench and BrowseComp demonstrate the model’s ability to adapt to incomplete data and recover gracefully from intermediate errors, improving reliability in production settings.
Interleaved Thinking and Structured Tool Use
A distinctive aspect of MiniMax-M2 is its interleaved thinking format, which maintains visible reasoning traces between successive responses and tool calls.
This enables the model to plan and verify steps across multiple exchanges, a critical feature for agentic reasoning. MiniMax advises retaining these reasoning segments when passing conversation history back to the model in order to preserve its logic and continuity.
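In practice this means the client must not strip reasoning segments out of the transcript between turns. A minimal sketch, assuming the reasoning is delimited with a `<think>...</think>` marker (the delimiter is an assumption for illustration; use whatever markers MiniMax’s documentation specifies):

```python
def append_turn(history: list, user_msg: str, assistant_reply: str) -> list:
    """Append a full exchange, keeping the assistant's interleaved
    reasoning (marked here with <think>...</think>) intact in history,
    so later turns can build on the earlier plan."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

history = []
append_turn(
    history,
    "Find flaky tests.",
    "<think>Plan: run the suite twice, diff the results.</think>I'll rerun the suite.",
)
# The reasoning block stays in the context sent with the next request.
next_request_messages = history + [{"role": "user", "content": "Proceed."}]
```

The anti-pattern this guards against is sanitizing assistant messages before resending them, which would silently discard the model’s working plan.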
The company also provides a Tool Calling Guide on Hugging Face, detailing how developers can connect external tools and APIs via structured XML-style calls.
This functionality allows MiniMax-M2 to serve as the reasoning core for larger agent frameworks, executing dynamic tasks such as search, retrieval, and computation through external functions.
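On the client side, consuming such a structured call might look like the following sketch; the `<invoke>`/`<parameter>` tag names and the stub tool registry are illustrative assumptions, and the authoritative format is in MiniMax’s Tool Calling Guide:

```python
import xml.etree.ElementTree as ET

# Stub tool registry standing in for real web search, retrieval, etc.
TOOLS = {"search": lambda q: f"results for {q!r}"}

def dispatch(tool_call_xml: str) -> str:
    """Parse an XML-style tool call emitted by the model and route it
    to the matching local function with its named parameters."""
    node = ET.fromstring(tool_call_xml)
    name = node.attrib["name"]
    args = {p.attrib["name"]: p.text for p in node.findall("parameter")}
    return TOOLS[name](**args)

out = dispatch(
    '<invoke name="search">'
    '<parameter name="q">MiniMax-M2 vLLM</parameter>'
    '</invoke>'
)
```

The return value would then be appended to the conversation as a tool result, closing one iteration of the agent loop.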
Open Source Access and Enterprise Deployment Options
Enterprises can access the model through the MiniMax Open Platform API and the MiniMax Agent interface (a web chat similar to ChatGPT), both currently free for a limited time.
MiniMax recommends SGLang and vLLM for efficient serving, both offering day-one support for the model’s interleaved reasoning and tool-calling structure.
Deployment guides and parameter configurations can be found through MiniMax’s documentation.
Cost Efficiency and Token Economics
As Artificial Analysis noted, MiniMax’s API pricing is set at $0.30 per million input tokens and $1.20 per million output tokens, among the most competitive in the open-model ecosystem.
| Provider | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- |
| MiniMax | $0.30 | $1.20 | Listed under “Chat Completion v2” for M2. |
| OpenAI | $1.25 | $10.00 | Flagship model pricing on OpenAI’s API pricing page. |
| OpenAI | $0.25 | $2.00 | Cheaper tier for well-defined tasks. |
| Anthropic | $3.00 | $15.00 | Anthropic’s current per-MTok list; long-context (>200K input) uses a premium tier. |
| Google | $0.30 | $2.50 | Prices include “thinking tokens”; page also lists cheaper Flash-Lite and 2.0 tiers. |
| xAI | $0.20 | $0.50 | “Fast” tier; xAI also lists Grok-4 at $3 / $15. |
| DeepSeek | $0.28 | $0.42 | Cache-hit input is $0.028; pricing table shows per-model details. |
| Qwen (Alibaba) | from $0.022 | from $0.216 | Tiered by input size (≤128K, ≤256K, ≤1M tokens); listed as “Input price / Output price per 1M.” |
| Cohere | $2.50 | $10.00 | First-party pricing page also lists Command R ($0.50 / $1.50) and others. |
Notes & caveats (for readers):
- Prices are USD per million tokens and may change; check the vendors’ pricing pages for updates and region/endpoint nuances (e.g., Anthropic long-context >200K input, Google Live API variants, cache discounts).
- Vendors may bill extra for server-side tools (web search, code execution) or offer batch/context-cache discounts.
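Agent workloads are typically output-heavy, so the pricing gap compounds quickly. A simplified comparison at the list prices above, ignoring caching and batch discounts:

```python
def monthly_cost(in_tokens: float, out_tokens: float,
                 in_price: float, out_price: float) -> float:
    """Cost in USD given token counts and per-1M-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# Hypothetical workload: 2B input and 500M output tokens per month.
minimax = monthly_cost(2e9, 500e6, 0.30, 1.20)   # MiniMax-M2 list price
gpt5    = monthly_cost(2e9, 500e6, 1.25, 10.00)  # flagship-tier list price
```

At these illustrative volumes the M2 bill is $1,200 against $7,500 for the flagship tier, which is the cost-performance argument in concrete terms.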
While the model produces longer, more explicit reasoning traces, its sparse activation and optimized compute design help maintain a favorable cost-performance balance, an advantage for teams deploying interactive agents or high-volume automation systems.
Background on MiniMax — an Emerging Chinese Powerhouse
MiniMax has quickly become one of the most closely watched names in China’s fast-rising AI sector.
Backed by Alibaba and Tencent, the company moved from relative obscurity to international recognition within a year: first through breakthroughs in AI video generation, then through a series of open-weight large language models (LLMs) aimed squarely at developers and enterprises.
The company first captured global attention in late 2024 with its AI video generation tool, “video-01,” which demonstrated the ability to create dynamic, cinematic scenes in seconds. VentureBeat described how the model’s launch sparked widespread interest after online creators began sharing lifelike, AI-generated footage, most memorably a viral clip of a Star Wars lightsaber duel that drew millions of views in under two days.
CEO Yan Junjie emphasized that the system outperformed leading Western tools in generating human movement and expression, an area where video AIs often struggle. The product, later commercialized through MiniMax’s Hailuo platform, showcased the startup’s technical confidence and creative reach, helping to establish China as a serious contender in generative video technology.
By early 2025, MiniMax had turned its attention to long-context language modeling, unveiling the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01. These open-weight models introduced an unprecedented 4-million-token context window, doubling the reach of Google’s Gemini 1.5 Pro and dwarfing OpenAI’s GPT-4o by more than twentyfold.
The company continued its rapid cadence with the MiniMax-M1 release in June 2025, a model focused on long-context reasoning and reinforcement learning efficiency. M1 extended context capability to 1 million tokens and introduced a hybrid Mixture-of-Experts design trained using a custom reinforcement-learning algorithm known as CISPO. Remarkably, VentureBeat reported that MiniMax trained M1 at a total cost of about $534,700, roughly one-tenth of DeepSeek’s R1 and far below the multimillion-dollar budgets typical for frontier-scale models.
For enterprises and technical teams, MiniMax’s trajectory signals the arrival of a new generation of cost-efficient, open-weight models designed for real-world deployment. Its open licensing, ranging from Apache 2.0 to MIT, gives businesses the freedom to customize, self-host, and fine-tune without vendor lock-in or compliance restrictions.
Features such as structured function calling, long-context retention, and high-efficiency attention architectures directly address the needs of engineering groups managing multi-step reasoning systems and data-intensive pipelines.
As MiniMax continues to expand its lineup, the company has emerged as a key global innovator in open-weight AI, combining ambitious research with pragmatic engineering.
Open-Weight Leadership and Industry Context
The release of MiniMax-M2 reinforces the growing leadership of Chinese AI research groups in open-weight model development.
Following earlier contributions from DeepSeek, Alibaba’s Qwen series, and Moonshot AI, MiniMax’s entry continues the trend toward open, efficient systems designed for real-world use.
Artificial Analysis observed that MiniMax-M2 exemplifies a broader shift in focus toward agentic capability and reinforcement-learning refinement, prioritizing controllable reasoning and real utility over raw model size.
For enterprises, this means access to a state-of-the-art open model that can be audited, fine-tuned, and deployed internally with full transparency.
By pairing strong benchmark performance with open licensing and efficient scaling, MiniMaxAI positions MiniMax-M2 as a practical foundation for intelligent systems that think, act, and assist with traceable logic, making it one of the most enterprise-ready open AI models available today.

