The Qwen team, a division of Chinese e-commerce giant Alibaba that develops its growing family of open-source Qwen large language models (LLMs), has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL).
The model is available as open weights on Hugging Face and ModelScope under an Apache 2.0 license. This means it can be used for commercial and research purposes, so enterprises can deploy it immediately to power their products and applications (even ones they charge customers to use).
It can also be accessed by individual users via Qwen Chat.
QwQ, or Qwen-with-Questions, was Alibaba's answer to OpenAI's original reasoning model o1
QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source reasoning model designed to compete with OpenAI's o1.
At launch, the model was designed to enhance logical reasoning and planning by reviewing and refining its own answers during inference, a technique that made it particularly effective for math and coding tasks.
The initial version of QwQ featured 32 billion parameters and a 32,000-token context length, with Alibaba highlighting its ability to outperform o1-preview on mathematical benchmarks such as AIME and MATH, as well as scientific reasoning tasks such as GPQA.
Despite its strengths, QwQ's early iterations struggled with programming benchmarks such as LiveCodeBench, where OpenAI's models maintained an edge. In addition, like many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops.
However, Alibaba's decision to release the model under an Apache 2.0 license ensured that developers and enterprises could freely adapt and commercialize it, distinguishing it from proprietary alternatives such as OpenAI's o1.
Since QwQ's initial release, the AI landscape has evolved rapidly. The limitations of traditional LLMs have become more apparent, with scaling laws yielding diminishing returns in performance improvements.
This shift has fueled interest in large reasoning models (LRMs), a new category of AI systems that use inference-time reasoning and self-reflection to improve accuracy. These include OpenAI's o3 series and the massively successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Hong Kong quantitative analysis firm High-Flyer Capital Management.
A new report from web traffic analytics and research firm Similarweb found that since the launch of R1 in January 2025, DeepSeek has climbed the charts to become the most-visited AI model-provider website behind OpenAI.
QwQ-32B, Alibaba's latest iteration, builds on these advances by integrating RL and structured self-questioning, positioning it as a serious competitor in the growing field of reasoning-focused AI.
Scaling up performance with multi-stage reinforcement learning
Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen team's research suggests that RL can significantly improve a model's ability to solve complex problems.
QwQ-32B builds on this idea by implementing a multi-stage RL training approach to enhance mathematical reasoning, coding proficiency, and general problem-solving.
The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini, and DeepSeek-R1-Distill-Qwen-32B, demonstrating competitive results despite having fewer parameters than some of these models.

For example, while DeepSeek-R1 operates with 671 billion parameters (with 37 billion activated), QwQ-32B achieves comparable performance with a much smaller footprint, typically requiring 24 GB of VRAM on a GPU (Nvidia's H100s have 80 GB) compared to more than 1,500 GB of VRAM to run the full DeepSeek R1 (16 Nvidia A100 GPUs), highlighting the efficiency of Qwen's RL approach.
QwQ-32B follows a causal language model architecture and includes several optimizations (a short sketch after this list shows how to read these details back from the published configuration):
- 64 transformer layers with RoPE, SwiGLU, RMSNorm, and attention QKV bias;
- Grouped-query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs;
- Extended context length of 131,072 tokens, allowing better handling of long-sequence inputs;
- Multi-stage training including pretraining, supervised fine-tuning, and RL.
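For readers who want to sanity-check those architectural details, a minimal sketch (assuming the published Hugging Face model ID Qwen/QwQ-32B, the transformers library, and network access) reads the corresponding fields back from the model's configuration:

```python
from transformers import AutoConfig

# Load the published configuration for QwQ-32B from the Hugging Face Hub.
config = AutoConfig.from_pretrained("Qwen/QwQ-32B")

print(config.num_hidden_layers)        # transformer layers (expected: 64)
print(config.num_attention_heads)      # query heads (expected: 40)
print(config.num_key_value_heads)      # key-value heads for GQA (expected: 8)
print(config.max_position_embeddings)  # context window (expected: 131,072)
```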
The RL process for QwQ-32B was implemented in two phases:
- Math and coding focus: The model was trained using an accuracy verifier for mathematical reasoning and a code-execution server for coding tasks. This approach ensured that generated answers were validated for correctness before being reinforced (see the illustrative sketch after this list).
- General capability enhancement: In a second phase, the model received reward-based training using general reward models and rule-based verifiers. This stage improved instruction following, alignment with human preferences, and agent reasoning without compromising its math and coding capabilities.
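To make the verifier idea concrete, here is an illustrative sketch of outcome-based rewards of the kind described above. The function names and binary reward scheme are hypothetical, not the Qwen team's actual pipeline: a math answer is rewarded only if it matches a known-correct solution, and generated code is rewarded only if it passes predefined tests.

```python
import subprocess
import sys
import tempfile

def math_reward(model_answer: str, reference_answer: str) -> float:
    # Accuracy-verifier style: binary reward for a matching final answer.
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_script: str) -> float:
    # Code-execution style: run the candidate code plus its tests in a
    # subprocess; reward only if every assertion passes. (A real execution
    # server would also sandbox and resource-limit the run.)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_script)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```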
What it means for enterprise decision-makers
For enterprise leaders, including CEOs, CTOs, IT executives, team managers, and AI application developers, QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation.
With its RL-driven reasoning capabilities, the model can deliver more accurate, structured, and context-aware insights, making it valuable for use cases such as automated data analysis, strategic planning, software development, and intelligent automation.
Companies looking to deploy AI solutions for complex problem-solving, coding assistance, financial modeling, or customer-service automation may find QwQ-32B's efficiency an attractive option. In addition, its open-weight availability allows organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies.
The fact that it comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the fact that the model is available for download and offline use, as well as for fine-tuning or retraining, suggests these concerns can be fairly easily overcome. And it is a viable alternative to DeepSeek-R1.
Early reactions from AI power users and influencers
The release of QwQ-32B has already attracted attention from the AI research and development community. Several developers and industry experts shared their initial impressions on X (formerly Twitter):
- Hugging Face's Vaibhav Srivastav (@reach_vb) highlighted QwQ-32B's speed at inference thanks to the provider Hyperbolic Labs, calling it "blazingly fast" and comparable to top-tier models. He also noted that the model "beats DeepSeek-R1 and OpenAI o1-mini with Apache 2.0 license."
- AI news and rumor publisher Chubby (@kimmonismus) was impressed by the model's performance, stressing that QwQ-32B sometimes outperforms DeepSeek-R1 despite being 20 times smaller. "Holy moly! Qwen cooked!" they wrote.
- Yuchen Jin (@Yuchenj_UW), co-founder and CTO of Hyperbolic Labs, celebrated the release by pointing to the efficiency gains: "Small models are so powerful! Alibaba Qwen released QwQ-32B, a reasoning model that beats DeepSeek-R1 (671B) and OpenAI o1-mini!"
- Another Hugging Face team member, Erik Kaunismäki (@ErikKaum), emphasized the ease of deployment, sharing that the model is available for one-click deployment on Hugging Face endpoints, making it accessible to developers without extensive setup.
Agentic capabilities
QwQ-32B incorporates agentic capabilities, allowing it to dynamically adjust its reasoning processes based on environmental feedback.
For optimal performance, the Qwen team recommends the following inference settings (a brief generation sketch follows this list):
- Temperature: 0.6
- TopP: 0.95
- TopK: between 20 and 40
- YaRN scaling: recommended for handling sequences longer than 32,768 tokens
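Concretely, those settings map onto standard sampling parameters. A minimal generation sketch with the transformers library, assuming the published checkpoint Qwen/QwQ-32B (the prompt and token budget here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings recommended by the Qwen team.
outputs = model.generate(
    inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,  # anywhere in the recommended 20-40 range
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```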
The model can be deployed with vLLM, a high-throughput inference framework. However, current implementations of vLLM only support static YaRN scaling, which maintains a fixed scaling factor regardless of input length.
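For long inputs under vLLM, static YaRN is enabled via a rope-scaling override. The sketch below follows the pattern Qwen documents for its models; treat the exact rope_scaling keys and factor as assumptions to verify against the model card and your vLLM version:

```python
from vllm import LLM, SamplingParams

# Static YaRN: a fixed factor of 4.0 stretches the native 32,768-token
# window toward 131,072 tokens, applied regardless of actual input length.
llm = LLM(
    model="Qwen/QwQ-32B",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    max_model_len=131072,
)

params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, max_tokens=4096)
result = llm.generate(["Summarize the key ideas of reinforcement learning."], params)
print(result[0].outputs[0].text)
```

Because static scaling is applied even to short prompts, it can slightly hurt performance on them, so it is generally worth enabling only when long sequences are actually needed.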
Future developments
The Qwen team sees QwQ-32B as the first step in scaling RL to enhance reasoning capabilities. Looking ahead, the team plans to:
- Further explore scaling RL to improve model intelligence;
- Integrate agents with RL for long-horizon reasoning;
- Continue developing foundation models optimized for RL;
- Move toward artificial general intelligence (AGI) through more advanced training techniques.
With QwQ-32B, the Qwen team positions RL as a key driver of the next generation of AI models, demonstrating that scaling can produce highly performant and effective reasoning systems.

