
DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

Chinese AI startup DeepSeek has quietly released a new large language model that’s already sending ripples through the artificial intelligence industry — not only for its capabilities, but for how it’s being deployed. The 641-gigabyte model, dubbed DeepSeek-V3-0324, appeared on AI repository Hugging Face today with virtually no announcement, continuing the company’s pattern of low-key but impactful releases.

What makes this launch particularly notable is the model’s MIT license — making it freely available for commercial use — and early reports that it can run directly on consumer-grade hardware, specifically Apple’s Mac Studio with M3 Ultra chip.

“The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!” wrote AI researcher Awni Hannun on social media. While the $9,499 Mac Studio might stretch the definition of “consumer hardware,” the ability to run such a massive model locally is a major departure from the data center requirements typically associated with state-of-the-art AI.

DeepSeek’s stealth launch strategy disrupts AI market expectations

The 685-billion-parameter model arrived with no accompanying whitepaper, blog post, or marketing push — just an empty README file and the model weights themselves. This approach contrasts sharply with the carefully orchestrated product launches typical of Western AI firms, where months of hype often precede actual releases.

Early testers report significant improvements over the previous version. AI researcher Xeophon proclaimed in a post on X.com: “Tested the new DeepSeek V3 on my internal bench and it has a huge jump in all metrics on all tests. It is now the best non-reasoning model, dethroning Sonnet 3.5.”

This claim, if validated by broader testing, would position DeepSeek’s latest model above Claude Sonnet 3.5 from Anthropic, one of the most respected commercial AI systems. And unlike Sonnet, which requires a subscription, DeepSeek-V3-0324’s weights are freely available for anyone to download and use.

How DeepSeek V3-0324’s breakthrough architecture achieves unmatched efficiency

DeepSeek-V3-0324 employs a mixture-of-experts (MoE) architecture that fundamentally reimagines how large language models operate. Traditional models activate their entire parameter count for each task, but DeepSeek’s approach activates only about 37 billion of its 685 billion parameters during specific tasks.

This selective activation represents a paradigm shift in model efficiency. By activating only the most relevant “expert” parameters for each specific task, DeepSeek achieves performance comparable to much larger fully-activated models while drastically reducing computational demands.
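The routing idea can be sketched with a toy mixture-of-experts layer: a gating network scores every expert for each input, but only the top-scoring few actually run. The dimensions, expert count, and random linear "experts" below are purely illustrative, not DeepSeek's actual configuration.

```python
import math
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 8, 16, 2

def linear(dim_out, dim_in):
    """A random linear map, standing in for one expert (or the gate)."""
    w = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

experts = [linear(DIM, DIM) for _ in range(N_EXPERTS)]
gate = linear(N_EXPERTS, DIM)  # maps an input to one score per expert

def moe_forward(x):
    scores = gate(x)
    # pick the TOP_K best-scoring experts; the other experts never execute
    active = sorted(range(N_EXPERTS), key=scores.__getitem__)[-TOP_K:]
    z = sum(math.exp(scores[i]) for i in active)  # softmax over active experts only
    out = [0.0] * DIM
    for i in active:
        w = math.exp(scores[i]) / z
        for j, yj in enumerate(experts[i](x)):
            out[j] += w * yj  # weighted sum of the chosen experts' outputs
    return out, active

x = [random.gauss(0, 1) for _ in range(DIM)]
out, active = moe_forward(x)
print(f"activated {len(active)} of {N_EXPERTS} experts")
```

The compute saving comes entirely from the fact that the 14 unselected experts are never called — scaled up, that is the gap between 685 billion stored parameters and roughly 37 billion active ones.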

The model incorporates two additional breakthrough technologies: Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP). MLA enhances the model’s ability to maintain context across long passages of text, while MTP generates multiple tokens per step instead of the usual one-at-a-time approach. Together, these innovations boost output speed by nearly 80%.
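The arithmetic behind multi-token prediction is straightforward: emitting k tokens per decoding step cuts the number of sequential steps by a factor of k, though real wall-clock gains (the article's roughly 80%) are smaller because each step does more work and speculative tokens can be rejected. A back-of-envelope sketch of the step counts, not DeepSeek's implementation:

```python
import math

def decode_steps(n_tokens: int, tokens_per_step: int = 1) -> int:
    """Sequential decoding steps needed to emit n_tokens."""
    return math.ceil(n_tokens / tokens_per_step)

n = 1000
baseline = decode_steps(n)     # one token per step: 1000 steps
with_mtp = decode_steps(n, 2)  # two tokens per step: 500 steps
print(f"{baseline} -> {with_mtp} steps, {baseline / with_mtp:.1f}x fewer")
```

Because decoding is dominated by sequential memory-bound passes through the model, halving the step count is the lever that matters, even if overheads keep the realized speedup below 2x.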

Simon Willison, a developer tools creator, noted in a blog post that a 4-bit quantized version reduces the storage footprint to 352GB, making it feasible to run on high-end consumer hardware like the Mac Studio with M3 Ultra chip.
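The 352GB figure squares with the parameter count: at 4 bits per weight, 685 billion parameters occupy about 342GB, and the checkpoint's extra ~10GB plausibly comes from quantization scales and tensors kept at higher precision (our reading, not an official breakdown):

```python
PARAMS = 685e9       # reported parameter count
BITS_PER_WEIGHT = 4  # 4-bit quantization

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
print(f"weights alone: ~{weight_bytes / 1e9:.1f} GB")  # ~342.5 GB
# the published 352GB checkpoint is somewhat larger because quantization
# scale factors and some higher-precision tensors add overhead
```

By the same arithmetic, the full-precision 641GB release works out to roughly 8 bits per parameter, consistent with an FP8-style distribution.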

This represents a potentially significant shift in AI deployment. While traditional AI infrastructure typically relies on multiple Nvidia GPUs consuming several kilowatts of power, the Mac Studio draws less than 200 watts during inference. This efficiency gap suggests the AI industry may need to rethink assumptions about infrastructure requirements for top-tier model performance.

China’s open source AI revolution challenges Silicon Valley’s walled garden model

DeepSeek’s release strategy exemplifies a fundamental divergence in AI business philosophy between Chinese and Western firms. While U.S. leaders like OpenAI and Anthropic keep their models behind paywalls, Chinese AI firms increasingly embrace permissive open-source licensing.

This approach is rapidly transforming China’s AI ecosystem. The open availability of cutting-edge models creates a multiplier effect, enabling startups, researchers, and developers to build upon sophisticated AI technology without massive capital expenditure. This has accelerated China’s AI capabilities at a pace that has shocked Western observers.

The business logic behind this strategy reflects market realities in China. With multiple well-funded competitors, maintaining a proprietary approach becomes increasingly difficult when competitors offer similar capabilities for free. Open-sourcing creates alternative value pathways through ecosystem leadership, API services, and enterprise solutions built atop freely available foundation models.

Even established Chinese tech giants have recognized this shift. Baidu announced plans to make its Ernie 4.5 model series open-source by June, while Alibaba and Tencent have released open-source AI models with specialized capabilities. This movement stands in stark contrast to the API-centric strategy employed by Western leaders.

The open-source approach also addresses unique challenges faced by Chinese AI firms. With restrictions on access to cutting-edge Nvidia chips, Chinese firms have emphasized efficiency and optimization to achieve competitive performance with more limited computational resources. This necessity-driven innovation has now become a potential competitive advantage.

DeepSeek V3-0324: The foundation for an AI reasoning revolution

The timing and characteristics of DeepSeek-V3-0324 strongly suggest it will serve as the foundation for DeepSeek-R2, an improved reasoning-focused model expected within the next two months. This follows DeepSeek’s established pattern, where its base models precede specialized reasoning models by several weeks.

“This lines up with how they released V3 around Christmas followed by R1 just a few weeks later. R2 is rumored for April so this might be it,” noted Reddit user mxforest.

The implications of an advanced open-source reasoning model cannot be overstated. Current reasoning models like OpenAI’s o1 and DeepSeek’s R1 represent the cutting edge of AI capabilities, demonstrating unprecedented problem-solving abilities in domains from mathematics to coding. Making this technology freely available would democratize access to AI systems currently limited to those with substantial budgets.

The potential R2 model arrives amid significant revelations about reasoning models’ computational demands. Nvidia CEO Jensen Huang recently noted that DeepSeek’s R1 model “consumes 100 times more compute than a non-reasoning AI,” contradicting earlier industry assumptions about efficiency. This reveals the remarkable achievement behind DeepSeek’s models, which deliver competitive performance while operating under greater resource constraints than their Western counterparts.

If DeepSeek-R2 follows the trajectory set by R1, it could present a direct challenge to GPT-5, OpenAI’s next flagship model rumored for release in coming months. The contrast between OpenAI’s closed, heavily-funded approach and DeepSeek’s open, resource-efficient strategy represents two competing visions for AI’s future.

How to experience DeepSeek V3-0324: A complete guide for developers and users

For those eager to experiment with DeepSeek-V3-0324, several pathways exist depending on technical needs and resources. The complete model weights are available from Hugging Face, though the 641GB size makes direct download practical only for those with substantial storage and computational resources.

For most users, cloud-based options offer the most accessible entry point. OpenRouter provides free API access to the model, with a user-friendly chat interface. Simply select DeepSeek V3 0324 as the model to start experimenting.

DeepSeek’s own chat interface at chat.deepseek.com has likely been updated to the new version as well, though the company hasn’t explicitly confirmed this. Early users report the model is accessible through this platform with improved performance over previous versions.

Developers looking to integrate the model into applications can access it through various inference providers. Hyperbolic Labs announced immediate availability as “the first inference provider serving this model on Hugging Face,” while OpenRouter offers API access compatible with the OpenAI SDK.
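Because OpenRouter exposes an OpenAI-compatible endpoint, integration typically just means pointing the request at OpenRouter's URL with the model's slug. The slug below (`deepseek/deepseek-chat-v3-0324`) and the request shape are illustrative; check OpenRouter's model listing for the exact identifier. This sketch builds the request with the standard library so the payload can be inspected without sending anything:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "deepseek/deepseek-chat-v3-0324"  # illustrative slug; verify on openrouter.ai

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at OpenRouter."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder key goes here
            "Content-Type": "application/json",
        },
    )

req = build_request("Explain mixture-of-experts in one sentence.", api_key="sk-...")
body = json.loads(req.data)
print(body["model"])
```

Sending it is a single `urllib.request.urlopen(req)` call; the official `openai` SDK works the same way once its `base_url` is set to OpenRouter's endpoint, which is what "compatible with the OpenAI SDK" means in practice.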

DeepSeek’s latest model prioritizes technical precision over conversational warmth

Early users have reported a noticeable shift in the model’s communication style. While previous DeepSeek models were praised for their conversational, human-like tone, “V3-0324” presents a more formal, technically oriented persona.

“Is it only me or does this version feel less human like?” asked Reddit user nother_level. “For me the thing that set apart deepseek v3 from others were the fact that it felt more like human. Like the tone the words and such it was not robotic sounding like other llm’s but now with this version its like other llms sounding robotic af.”

Another user, AppearanceHeavy6724, added: “Yeah, it lost its aloof charm of course, it feels too mental for its own good.”

This personality shift likely reflects deliberate design decisions by DeepSeek’s engineers. The move toward a more precise, analytical communication style suggests a strategic repositioning of the model for professional and technical applications rather than casual conversation. This aligns with broader industry trends, as AI developers increasingly recognize that different use cases benefit from different interaction styles.

For developers building specialized applications, this more precise communication style may actually represent an advantage, providing clearer and more consistent outputs for integration into professional workflows. However, it might limit the model’s appeal for customer-facing applications where warmth and approachability are valued.

How DeepSeek’s open source strategy is redrawing the global AI landscape

DeepSeek’s approach to AI development and distribution represents more than a technical achievement — it embodies a fundamentally different vision for how advanced technology should propagate through society. By making cutting-edge AI freely available under permissive licensing, DeepSeek enables exponential innovation that closed models inherently constrain.

This philosophy is rapidly closing the perceived AI gap between China and the United States. Just months ago, most analysts estimated China lagged 1-2 years behind U.S. AI capabilities. Today, that gap has narrowed dramatically to perhaps 3-6 months, with some areas approaching parity and even Chinese leadership.

The parallels to Android’s impact on the mobile ecosystem are striking. Google’s decision to make Android freely available created a platform that ultimately achieved dominant global market share. Similarly, open-source AI models may outcompete closed systems through sheer ubiquity and the collective innovation of thousands of contributors.

The implications extend beyond market competition to fundamental questions about technology access. Western AI leaders increasingly face criticism for concentrating advanced capabilities among well-resourced corporations and individuals. DeepSeek’s approach distributes these capabilities more broadly, potentially accelerating global AI adoption.

As DeepSeek-V3-0324 finds its way into research labs and developer workstations worldwide, the competition is no longer simply about building the most powerful AI, but about enabling the most people to build with AI. In that race, DeepSeek’s quiet release speaks volumes about the future of artificial intelligence. The company that shares its technology most freely may ultimately wield the greatest influence over how AI reshapes our world.
