Chinese AI startup DeepSeek, which is known for challenging AI leaders with open-source technologies, has just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.
Based on the recently launched DeepSeek V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI's frontier reasoning LLM, on math, coding and reasoning tasks. The best part? It does so at a far more tempting price, appearing to be 90-95% cheaper than the latter.
The release marks a significant step forward in the open-source arena. It shows that open models continue to close the gap with closed commercial models in the race toward artificial general intelligence (AGI). To demonstrate the power of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed the much larger GPT-4o and Claude 3.5 Sonnet models on select math benchmarks.
These distilled models, along with the main R1, have been open-sourced and are available on Hugging Face under an MIT license.
What does DeepSeek-R1 bring?
There is increasing focus on artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are working hard to improve the reasoning abilities of models. OpenAI took the first notable step in this area with its o1 model, which uses a chain-of-thought process to work through a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine the strategies it uses, ultimately learning to recognize and correct its mistakes, or to try new approaches when the current ones are not working.
Continuing work in this direction, DeepSeek has now released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match o1-level performance.
When tested, DeepSeek-R1 scored 79.8% on the AIME 2024 math test and 97.3% on MATH-500. It also achieved a rating of 2,029 on Codeforces, better than 96.3% of human programmers. In comparison, o1-1217 scored 79.2%, 96.4% and 96.6% on these benchmarks, respectively.
The model also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1's 91.8%.
The training pipeline
DeepSeek-R1's reasoning performance marks a major win for the Chinese startup in the US-dominated AI space, especially since the entire work is open source, including how the company trained the whole thing.
However, the work is not as straightforward as it sounds.
According to the paper describing the research, DeepSeek-R1 was developed as an enhanced version of DeepSeek-R1-Zero – a groundbreaking model trained exclusively through reinforcement learning.
The company started with the DeepSeek V3 base as its foundation model and developed its reasoning capabilities without using supervised data, essentially focusing only on its self-evolution through a pure RL-based trial-and-error process. This intrinsically developed ability means the model can solve increasingly complex reasoning tasks by using longer test-time computation to explore and refine its thought processes in greater depth.
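According to the paper, this trial-and-error loop was driven by simple rule-based rewards: an accuracy reward for producing the correct final answer, plus a format reward for wrapping the chain of thought in designated thinking tags. As a rough, hypothetical sketch of what such a reward could look like (the function name, tag format and weights are illustrative assumptions, not DeepSeek's actual code):

    import re

    def reasoning_reward(completion: str, reference_answer: str) -> float:
        """Illustrative rule-based reward in the spirit of DeepSeek-R1-Zero."""
        reward = 0.0

        # Format reward: the chain of thought should sit inside
        # <think>...</think> tags before the final answer.
        if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
            reward += 0.5

        # Accuracy reward: compare the text after the reasoning block
        # against the reference answer (exact match for simplicity; real
        # checkers normalize math expressions or run code test cases).
        final_answer = re.sub(r"<think>.*?</think>", "", completion,
                              flags=re.DOTALL).strip()
        if final_answer == reference_answer.strip():
            reward += 1.0

        return reward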
“During training, DeepSeek-R1-Zero naturally exhibited numerous powerful and interesting reasoning behaviors,” the researchers note in the paper. “After thousands of RL steps, DeepSeek-R1-Zero shows excellent performance on reasoning benchmarks. For example, on AIME 2024, the pass@1 score increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.”
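For context on the two metrics in that quote: pass@1 is the fraction of problems solved by a single sampled answer, while majority voting samples several answers per problem and keeps the most common one. A small self-contained Python illustration with made-up samples:

    from collections import Counter

    # Made-up sampled answers for one problem (5 attempts).
    samples = ["42", "41", "42", "42", "17"]
    reference = "42"

    # pass@1 estimate: fraction of individual samples that are correct.
    pass_at_1 = sum(s == reference for s in samples) / len(samples)

    # Majority voting: pick the most frequent answer across samples,
    # then check whether that consensus answer is correct.
    consensus, _ = Counter(samples).most_common(1)[0]

    print(f"pass@1 estimate: {pass_at_1:.0%}")   # 60%
    print(f"majority vote: {consensus}, correct: {consensus == reference}")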
Although the original model showed improved performance, with emergent behaviors such as reflection and exploration of alternatives, it had some problems, including poor readability and language mixing. To address these issues, the company built on the work done for R1-Zero and applied a multi-stage approach combining supervised learning and reinforcement learning, producing the improved R1 model.
“Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model,” the researchers explained. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.”
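Put together, the quote describes four alternating stages. A schematic sketch of the stage ordering, using placeholder stub functions rather than real training code:

    # Schematic outline of the multi-stage R1 pipeline quoted above.
    # Every function is a placeholder stub that only labels its stage;
    # this illustrates the ordering of stages, not actual training code.

    def sft(base, data):          return f"sft({base})"
    def reasoning_rl(ckpt):       return f"rl({ckpt})"
    def rejection_sample(ckpt):   return ["rejection-sampled SFT data"]
    def v3_supervised_data():     return ["writing", "factual QA", "self-cognition"]
    def rl_all_scenarios(ckpt):   return f"rl_all({ckpt})"

    def train_r1(v3_base, cold_start_data):
        ckpt = sft(v3_base, cold_start_data)   # Stage 1: cold-start SFT
        ckpt = reasoning_rl(ckpt)              # Stage 2: reasoning-oriented RL
        new_data = rejection_sample(ckpt) + v3_supervised_data()
        ckpt = sft(v3_base, new_data)          # Stage 3: retrain V3-Base on new SFT data
        return rl_all_scenarios(ckpt)          # Stage 4: RL across all scenarios

    print(train_r1("DeepSeek-V3-Base", ["cold-start examples"]))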
Far cheaper than o1
In addition to performance that nearly matches OpenAI's o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, while DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.
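The gap is easy to see with a quick back-of-the-envelope calculation. For a hypothetical job consuming one million input tokens and producing one million output tokens, the list prices above work out as follows:

    # Cost comparison at the list prices quoted above (USD per 1M tokens).
    O1_IN, O1_OUT = 15.00, 60.00
    DS_IN, DS_OUT = 0.55, 2.19

    def job_cost(in_tokens, out_tokens, rate_in, rate_out):
        """Total cost given token counts and per-million-token rates."""
        return (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000

    print(job_cost(1_000_000, 1_000_000, O1_IN, O1_OUT))  # 75.0 (o1)
    print(job_cost(1_000_000, 1_000_000, DS_IN, DS_OUT))  # 2.74 (DeepSeek Reasoner)

At this mix, the job costs $75 with o1 versus $2.74 with DeepSeek Reasoner, consistent with the roughly 90-95% savings cited above.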
The model can be tested as “DeepThink” on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face under an MIT license, or use the API for direct integration.
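For the API route, DeepSeek exposes an OpenAI-compatible endpoint, so integration can look something like the sketch below; the base URL and model name follow DeepSeek's published documentation at the time of writing, but treat them as assumptions to verify:

    from openai import OpenAI

    # DeepSeek's API is OpenAI-compatible; base URL and model name
    # per DeepSeek's docs (verify before use).
    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",
        base_url="https://api.deepseek.com",
    )

    response = client.chat.completions.create(
        model="deepseek-reasoner",  # the R1-based reasoning model
        messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    )

    print(response.choices[0].message.content)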