A new so-called “reasoning” AI model, QwQ-32B-Preview, has arrived on the scene. It is one of the few that can compete with OpenAI's o1, and it’s the first available for download under a permissive license.
Developed by Alibaba's Qwen team, QwQ-32B-Preview contains 32.5 billion parameters and can handle prompts up to roughly 32,000 words in length. It performs better on certain benchmarks than o1-preview and o1-mini, the two reasoning models OpenAI has released so far. (Parameters roughly correspond to a model's problem-solving capabilities, and models with more parameters generally perform better than those with fewer. OpenAI doesn’t disclose the parameter counts for its models.)
According to Alibaba's testing, QwQ-32B-Preview outperforms OpenAI's o1 models on the AIME and MATH tests. AIME uses other AI models to evaluate a model's performance, while MATH is a collection of word problems.
QwQ-32B-Preview can solve logic puzzles and answer fairly difficult math questions thanks to its “reasoning” abilities. But it isn’t perfect. Alibaba notes in a blog post that the model may unexpectedly switch languages, get stuck in loops, and perform poorly on tasks that require “common sense.”
Unlike most AI models, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the pitfalls that typically trip up models, with the downside that they often take longer to arrive at solutions. Like o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of actions that help the model work out answers.
QwQ-32B-Preview, which can be downloaded and run via the AI development platform Hugging Face, appears to be similar to the recently released DeepSeek reasoning model in that it treads lightly around certain political topics. As Chinese companies, Alibaba and DeepSeek are subject to benchmarking by China's internet regulator to ensure their models' responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.
When asked “Is Taiwan a part of China?”, QwQ-32B-Preview responded that it is (and “inalienable” as well), a view out of step with most of the world but in line with that of China's ruling party. Prompts about Tiananmen Square, meanwhile, yielded no response.
QwQ-32B-Preview is available “openly” under an Apache 2.0 license, meaning it can be used for commercial applications. However, because only certain components of the model have been released, it’s impossible to reproduce QwQ-32B-Preview or gain comprehensive insight into the system's inner workings. The “openness” of AI models isn’t a settled question, but there is a general continuum from closed (API access only) to more open (model, weights, and data disclosed), and this one falls somewhere in the middle.
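For readers who want to try the released weights themselves, the sketch below shows how a Hugging Face-hosted chat model of this kind is typically loaded with the transformers library. The repository ID "Qwen/QwQ-32B-Preview", the chat-template usage, and the example prompt are assumptions based on common release conventions, not instructions from Alibaba.

```python
# Minimal sketch, assuming the model is published on Hugging Face under the
# repo ID "Qwen/QwQ-32B-Preview" and follows the standard chat-template flow.
# Requires: pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # spread the 32.5B parameters across available GPUs
)

# Build a chat-formatted prompt from a single user message.
messages = [{"role": "user", "content": "How many positive integers n satisfy n^2 < 50?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to emit long chains of thought, so allow a generous budget.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that a 32.5-billion-parameter model needs substantial GPU memory; quantized variants or hosted inference endpoints are common workarounds for smaller machines.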
The growing attention on reasoning models comes as the viability of “scaling laws” comes under scrutiny. These are long-held theories holding that giving a model more data and computing power will continuously increase its capabilities. A flurry of press reports suggests that models from major AI labs like OpenAI, Google, and Anthropic are no longer improving as dramatically as they once did.
This has led to a race for new AI approaches, architectures, and development techniques, including test-time compute. Also known as inference compute, test-time compute essentially gives models additional processing time to complete tasks, and it underpins models such as o1 and QwQ-32B-Preview.
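To make the idea concrete, here is a toy illustration of one common form of test-time compute: sampling several candidate answers and keeping the one a scoring function prefers. The `generate_answer` and `score_answer` functions are hypothetical placeholders standing in for a model call and a verifier; this is not how o1 or QwQ-32B-Preview is actually implemented.

```python
# Toy sketch of "best-of-n" sampling, one simple form of test-time compute:
# spend more inference-time compute (larger n) to get more candidates to choose from.
import random


def generate_answer(prompt: str, seed: int) -> str:
    # Placeholder for a sampled call to a language model.
    random.seed(seed)
    return f"candidate answer {random.randint(1, 100)}"


def score_answer(prompt: str, answer: str) -> float:
    # Placeholder for a verifier or reward model that rates each candidate.
    return float(len(answer))


def best_of_n(prompt: str, n: int = 8) -> str:
    # More compute at inference time -> more candidates -> better odds of a good answer.
    candidates = [generate_answer(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda ans: score_answer(prompt, ans))


print(best_of_n("Is 97 prime?"))
```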
Large labs besides OpenAI and Chinese firms are betting that test-time compute is the future. According to a recent report from The Information, Google has expanded an internal team focused on reasoning models to about 200 people and added significant computing power to the effort.