
Alibaba releases Qwen with Questions, an open reasoning model that outperforms o1-preview

Chinese e-commerce giant Alibaba has released the latest model in its ever-growing Qwen family. Named Qwen with Questions (QwQ), it is the latest open-source competitor to OpenAI's o1 reasoning model.

Like other large reasoning models (LRMs), QwQ uses extra compute cycles during inference to review its answers and correct its errors, making it more suitable for tasks that require logical reasoning and planning, such as mathematics and coding.

What is Qwen with Questions (QwQ) and can it be used for commercial purposes?

Alibaba has released a 32-billion-parameter version of QwQ with a 32,000-token context window. The model is currently in preview, meaning a more powerful version is likely to follow.

According to Alibaba's tests, QwQ outperforms o1-preview on the AIME and MATH benchmarks, which assess math problem-solving skills. It also outperforms o1-mini on GPQA, a measure of scientific reasoning. QwQ performs worse than o1 on the LiveCodeBench coding benchmark, but still outperforms other frontier models such as GPT-4o and Claude 3.5 Sonnet.

Sample output from Qwen with Questions

QwQ does not come with an accompanying paper describing the data or process used to train the model, which makes it difficult to reproduce its results. However, because the model is open, unlike OpenAI's o1, its "thought process" is not hidden and can be used to understand how the model reasons when solving problems.

Alibaba has also released the model under an Apache 2.0 license, meaning it can be used for commercial purposes.

“We discovered something profound”

According to a blog post published alongside the model's release: "Through thorough exploration and countless experiments, we discovered something profound: when given time to think, to question, and to reflect, the model's understanding of mathematics and programming blossoms like a flower opening to the sun. … This process of careful reflection and self-questioning leads to remarkable breakthroughs in solving complex problems."

This is very similar to what we know about how reasoning models work. By generating more tokens and checking their previous answers, the models are more likely to correct potential errors. Marco-o1, another reasoning model recently released by Alibaba, may also hold clues to how QwQ works. Marco-o1 uses Monte Carlo tree search (MCTS) and self-reflection at inference time to create different branches of reasoning and select the best answers. The model was trained on a combination of chain-of-thought (CoT) examples and synthetic data generated with MCTS algorithms.
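To make the branch-and-select idea concrete, here is a minimal, hypothetical sketch in plain Python: sample several reasoning branches, score each with a self-reflection pass, and keep the best one. It is a simple best-of-N selection rather than full Monte Carlo tree search, and the function names (generate_branch, self_evaluate, answer_with_reflection) are placeholders, not Marco-o1's or QwQ's actual implementation.

```python
import random

def generate_branch(prompt: str) -> str:
    """Placeholder for one sampled chain-of-thought completion from an LRM."""
    return f"candidate reasoning for {prompt!r} (sample #{random.randint(0, 9999)})"

def self_evaluate(prompt: str, branch: str) -> float:
    """Placeholder self-reflection pass: in a real system the model would
    critique its own branch and return a score; here the score is random so
    the sketch runs end to end."""
    return random.random()

def answer_with_reflection(prompt: str, n_branches: int = 8) -> str:
    """Sample several independent reasoning branches, score each with a
    self-reflection pass, and return the highest-scoring branch."""
    branches = [generate_branch(prompt) for _ in range(n_branches)]
    return max(branches, key=lambda branch: self_evaluate(prompt, branch))

print(answer_with_reflection("What is 17 * 24?"))
```

The key point the sketch illustrates is that the extra work happens at inference time: the model spends more tokens exploring and checking before committing to an answer.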

Alibaba notes that QwQ still has limitations, such as mixing languages or getting stuck in circular reasoning loops. The model is available for download on Hugging Face, and an online demo can be found on Hugging Face Spaces.
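Because the weights and the visible reasoning trace are open, the model can be run locally with the Hugging Face transformers library. The following is a minimal sketch, assuming the preview checkpoint is published under the repo id Qwen/QwQ-32B-Preview and that enough GPU memory is available for a 32-billion-parameter model:

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed repo id for the preview checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The generation contains the model's visible chain of thought before its answer.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The decoded output includes the model's step-by-step "thought process" before the final answer, which is what makes the open release useful for studying how it reasons.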

The LLM era gives way to LRMs: Large Reasoning Models

The release of o1 has sparked growing interest in building LRMs, although not much is known about how the model works under the hood other than its use of inference-time scaling to improve responses.

There are now several Chinese competitors to o1. Chinese AI lab DeepSeek recently released R1-Lite-Preview, its o1 competitor, which is currently only available through the company's online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.

Another recently released model is LLaVA-o1, developed by researchers from several universities in China, which brings the inference-time reasoning paradigm to open-source vision-language models (VLMs).

The focus on LRMs comes at a time of uncertainty about the future of model scaling laws. Reports suggest that AI labs such as OpenAI, Google DeepMind, and Anthropic are seeing diminishing returns from training ever-larger models. And gathering larger amounts of high-quality training data is becoming increasingly difficult because models have already been trained on trillions of tokens collected from the web.

Meanwhile, inference-time scaling offers an alternative that might provide the next breakthrough in improving the capabilities of the next generation of AI models. There are reports that OpenAI is using o1 to generate synthetic reasoning data to train the next generation of its LLMs. The release of open reasoning models is expected to spur progress and make the field more competitive.
