A Chinese lab has developed one of the most powerful “open” AI models to date.
The model, DeepSeek V3, was developed by AI firm DeepSeek and released Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translating, and writing essays and emails from a descriptive prompt.
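For developers who want to try it, DeepSeek exposes the model through an OpenAI-compatible API. The sketch below shows what a prompt call might look like; the endpoint and model name are assumptions and should be checked against DeepSeek's own documentation.

```python
# Minimal sketch of prompting DeepSeek V3 via an OpenAI-compatible API.
# The base_url and model name are assumptions -- verify them in DeepSeek's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # key issued by DeepSeek, not OpenAI
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                # assumed identifier for DeepSeek V3
    messages=[{"role": "user", "content": "Draft a short, polite email declining a meeting."}],
)
print(response.choices[0].message.content)
```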
According to DeepSeek's internal benchmark tests, DeepSeek V3 outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed through an API. In a subset of coding contests hosted on Codeforces, a competitive programming platform, DeepSeek outperforms other models, including Meta's Llama 3.1 405B, OpenAI's GPT-4o, and Alibaba's Qwen 2.5 72B.
DeepSeek V3 also outperforms the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates with existing code.
DeepSeek V3!
60 tokens/second (3x faster than V2!)
API compatibility intact
Completely open source models and papers
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens
Beats Llama 3.1 405b in almost every benchmark https://t.co/OiHu17hBSI pic.twitter.com/jVwJU07dqf
— Chubby♨️ (@kimmonismus) December 26, 2024
DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data – 1 million tokens equals about 750,000 words.
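To make the token-to-word relationship concrete, here is a small sketch using an off-the-shelf tokenizer (tiktoken's cl100k_base, chosen purely for illustration; DeepSeek V3 ships its own tokenizer, so exact counts will differ).

```python
# Illustrative token counting with tiktoken -- not DeepSeek's own tokenizer,
# so the numbers are only approximate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "DeepSeek V3 was trained on a dataset of 14.8 trillion tokens."
tokens = enc.encode(text)
print(f"{len(text.split())} words -> {len(tokens)} tokens")
# For English prose, roughly 1 million tokens works out to about 750,000 words.
```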
It's not just the training set that's huge. DeepSeek V3 is enormous: 671 billion parameters, or 685 billion on the Hugging Face AI development platform. (Parameters are the internal variables that models use to make predictions or decisions.) That's about 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
The number of parameters often (but not always) correlates with capability; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
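Some back-of-the-envelope arithmetic shows why. At 16-bit precision every parameter takes two bytes, so the weights alone far exceed a single GPU's memory, even though only about 37 billion parameters are activated per token. The figures below are a rough sketch that ignores activations, the KV cache, and quantization.

```python
# Rough memory estimate for serving DeepSeek V3 without optimization.
# Ignores activations, KV cache, and quantization -- a lower bound, not a spec.
total_params = 671e9      # total MoE parameters (all experts must stay in memory)
bytes_per_param = 2       # 16-bit (BF16/FP16) weights
gpu_memory_gb = 80        # a high-end 80 GB data-center GPU

weights_gb = total_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:,.0f} GB")                              # ~1,342 GB
print(f"GPUs needed just to hold them: ~{weights_gb / gpu_memory_gb:.0f}")  # ~17
```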
Although it's not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model in around two months using a data center of Nvidia H800 GPUs – GPUs that Chinese companies have recently been restricted from procuring by the US Department of Commerce. The company also says it spent only $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI's GPT-4.
The downside is that the model's political views are a bit… filtered. For example, if you ask DeepSeek V3 about Tiananmen Square, you won't get an answer.
Because DeepSeek is a Chinese company, it's subject to benchmarking by China's internet regulator to ensure its models' responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.
DeepSeek, which in late November introduced DeepSeek-R1, an answer to OpenAI's o1 “reasoning” model, is a strange organization. It is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer builds its own server clusters for model training; one of the most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek organization.
In an interview earlier this year, Wenfeng called closed-source AI like OpenAI's a “temporary” moat. “[It] hasn't stopped others from catching up,” he noted.
Indeed.