The Chinese e-commerce giant Alibaba has made waves worldwide in the technical and business communities with its "Qwen" family of generative AI large language models.
Why?
Well, its models are not only powerful, scoring highly on third-party benchmark tests of math, science, reasoning, and writing tasks, but most have also been released under permissive open-source license terms that allow organizations and enterprises to adapt them for all manner of uses, including commercial ones. Think of them as an alternative to DeepSeek.
This week, Alibaba's "Qwen Team," as its AI division is known, released the latest updates to its Qwen family, and they are already attracting the attention of AI power users in the West for their top performance, in one case even edging out the new Kimi K2 model released in mid-July 2025 by rival Chinese AI startup Moonshot.
The new Qwen3-235B-A22B-Instruct-2507 model, published on AI code-sharing community Hugging Face alongside a "floating point 8," or FP8, version (covered in more detail below), improves on the original Qwen 3 in reasoning tasks, factual accuracy, and multilingual understanding. It also outperforms the "non-thinking" version of Claude Opus 4.
According to its creators, the new Qwen3 model update also delivers better coding results, closer alignment with user preferences, and improved long-context handling. But that's not all…
Read on for what it offers enterprise users and technical decision-makers.
With the FP8 version, enterprises can run Qwen3 with far less memory and far less compute
Alongside the new Qwen3-235B-A22B-Instruct-2507 model, the Qwen team released an "FP8" version, which stands for 8-bit floating point, a format that compresses the model's numerical operations to use less memory and processing power, without noticeably affecting performance.
In practice, this means enterprises can run a model with Qwen3's capabilities on smaller, cheaper hardware, or more efficiently in the cloud. The result is faster response times, lower energy costs, and the ability to scale deployments without requiring massive infrastructure.
This makes the FP8 model especially attractive for production environments with tight latency or cost constraints. Teams can serve Qwen3's capabilities on single-node GPU instances or local development machines, avoiding the need for massive multi-GPU clusters. It also lowers the barrier to private fine-tuning and on-premises deployments, where infrastructure resources are finite and total cost of ownership matters.
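The memory side of that claim is easy to sanity-check from first principles: weight storage scales with bytes per parameter. Here is a back-of-the-envelope sketch (weights only; real deployments add KV cache, activations, and framework overhead, which is why the full-system figures below run higher):

```python
# Back-of-the-envelope weight-memory estimate for a 235B-parameter model.
# Weights only: KV cache, activations, and framework overhead come on top.

PARAMS = 235e9  # total parameters in Qwen3-235B-A22B

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9

bf16_gb = weight_memory_gb(PARAMS, 2.0)  # BF16: 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1.0)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16_gb:.0f} GB")       # ~470 GB
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")        # ~235 GB
print(f"Savings: {1 - fp8_gb / bf16_gb:.0%}")   # 50%
```

Halving bytes per parameter halves weight memory, which is what lets the FP8 build fit on half as many cards.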
Although the Qwen team has not published official calculations, comparisons with similar FP8-quantized deployments suggest the efficiency savings are substantial. Here is a practical illustration:
| Metric | BF16 build | FP8 quantized build |
|---|---|---|
| GPU memory use* | ≈ 640 GB total (8 × H100-80 GB, TP-8) | ≈ 320 GB total on the recommended 4 × H100-80 GB, TP-4; lowest-footprint community builds: ~143 GB on 2 × H100 with offloading |
| Single-query inference speed† | ~74 tokens/s (batch = 1, context = 2K, 8 × H20-96 GB, TP-8) | ~72 tokens/s (same settings, 4 × H20-96 GB, TP-4) |
| Power/energy | A full node of eight H100s draws ~4-4.5 kW under load (550-600 W per card, plus host)‡ | FP8 needs half the cards and moves half the data; NVIDIA's Hopper FP8 case studies report ≈ 35-40% lower TCO and energy at comparable throughput |
| GPUs required (practical) | 8 × H100-80 GB (TP-8), or 8 × A100-80 GB for parity | 4 × H100-80 GB (TP-4); 2 × H100 possible with aggressive offloading, at the expense of latency |
No more "hybrid reasoning"… instead, Qwen will release separate reasoning and instruct models!
The Qwen team announced that it will no longer pursue the "hybrid" reasoning approach it introduced with Qwen 3 in April, which appeared inspired by an approach pioneered by the sovereign AI collective Nous Research.
This allowed users to toggle on a "reasoning" mode, prompting the AI model to think through its answer and produce "chains of thought" before responding.
In a sense, it was designed to mimic the reasoning capabilities of powerful proprietary models such as OpenAI's "o" series (o1, o3, o4-mini, o4-mini-high), which also produce "chains of thought."
Unlike those rival models, which always engage in such "reasoning" for every prompt, Qwen 3 let users toggle reasoning manually, by clicking a "Thinking Mode" button on the Qwen website, or by typing "/think" before a prompt on a locally or privately run model.
The idea was to give users control: engage the slower, more token-intensive thinking mode for harder prompts and tasks, and use the non-thinking mode for simpler ones. But this placed the burden of that decision on the user. While flexible, it also introduced design complexity and, in some cases, inconsistent behavior.
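In client code, that now-retired soft switch amounted to prefixing the prompt with a mode tag. A minimal sketch of what such a toggle looked like (the helper name is illustrative, not an official API; the "/think" and "/no_think" tags are the switch Qwen 3 documented for local deployments):

```python
# Sketch of Qwen 3's (now-retired) hybrid-mode soft switch: reasoning was
# toggled per request via "/think" and "/no_think" tags in the user prompt.
# The helper function is illustrative, not part of any official SDK.

def tag_prompt(prompt: str, thinking: bool) -> str:
    """Prefix a user prompt with the reasoning-mode switch tag."""
    tag = "/think" if thinking else "/no_think"
    return f"{tag} {prompt}"

hard = tag_prompt("Prove there are infinitely many primes.", thinking=True)
easy = tag_prompt("What is the capital of France?", thinking=False)

print(hard)  # /think Prove there are infinitely many primes.
print(easy)  # /no_think What is the capital of France?
```

The 2507 instruct release removes this decision entirely: there is no thinking mode to toggle.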
Now, as the Qwen team wrote in its announcement post on X:
With the 2507 update, starting with an instruct, or non-reasoning, model, Alibaba is no longer blending both approaches in a single model. Instead, separate model variants will be trained for instruction and reasoning tasks.
The result is a model that follows user instructions more precisely, generates more predictable responses and, as benchmark data show, improves significantly across several evaluation domains.
Performance benchmarks and applications
Compared to its predecessor, the Qwen3-235B-A22B-Instruct-2507 model delivers measurable improvements:
- MMLU-Pro scores rise from 75.2 to 83.0, a notable gain in general knowledge performance.
- GPQA and SuperGPQA benchmarks improve by 15 to 20 percentage points, reflecting stronger factual accuracy.
- Reasoning tasks such as AIME25 and ARC-AGI show more than double the previous performance.
- Code generation improves, with LiveCodeBench scores rising from 32.9 to 51.8.
- Multilingual support expands, aided by improved long-tail language coverage and better alignment across dialects.
The model retains a mixture-of-experts (MoE) architecture, activating 8 of its 128 experts during inference, with 235 billion total parameters, of which 22 billion are active at any time.
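Those MoE numbers can be checked with simple arithmetic. A sketch (figures from above; the explanation of why the two fractions differ is a standard property of MoE architectures, not something the Qwen team has broken down publicly):

```python
# Mixture-of-experts (MoE) arithmetic for Qwen3-235B-A22B: only a subset
# of experts fires per token, so active parameters sit far below the total.

TOTAL_PARAMS = 235e9   # all parameters in the checkpoint
ACTIVE_PARAMS = 22e9   # parameters active per token
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 8

expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Experts active per token:    {expert_fraction:.1%}")  # 6.2%
print(f"Parameters active per token: {active_fraction:.1%}")  # 9.4%
# The parameter share exceeds the expert share because attention layers,
# embeddings, and router weights run for every token regardless of routing.
```

This is the economics behind MoE: inference compute tracks the ~22B active parameters, while capacity tracks the full 235B.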
As mentioned, the FP8 version introduces quantization for faster inference and reduced memory use.
Designed with enterprises in mind
Unlike many open-source LLMs, which are often released under restrictive research-only licenses or require API access for commercial use, Qwen3 is squarely aimed at enterprise deployment.
Released under a permissive Apache 2.0 license, it can be used freely by companies for commercial applications. Teams can also:
- Deploy models locally or via OpenAI-compatible APIs using vLLM and SGLang
- Fine-tune models privately with LoRA or QLoRA on proprietary data
- Log and inspect all prompts and outputs for compliance and auditing
- Scale from prototype to production using dense variants (from 0.6B to 32B) or MoE checkpoints
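Because vLLM and SGLang expose OpenAI-compatible endpoints, a self-hosted Qwen3 instance can be called with a standard chat-completions payload. A minimal sketch (the localhost base URL is a placeholder for wherever your server runs, and the actual HTTP call is left commented out since it requires a live endpoint):

```python
# Sketch of calling a self-hosted Qwen3 server through the OpenAI-compatible
# chat-completions route that vLLM and SGLang expose. BASE_URL is a
# placeholder; adjust it to your own deployment.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # placeholder: your vLLM/SGLang server

payload = {
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
    ],
    "temperature": 0.7,
}

def chat(body: dict) -> dict:
    """POST the payload to the server; requires a running endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# chat(payload)  # uncomment once a server is listening at BASE_URL
```

Because the wire format matches OpenAI's, existing client code can usually be pointed at the self-hosted endpoint with nothing more than a base-URL change.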
Alibaba's team also introduced Qwen-Agent, a lightweight framework that abstracts tool-invocation logic for users building agentic systems.
Benchmarks such as TAU-Retail and BFCL-v3 suggest the instruct model can execute multi-step decision-making tasks in the territory of purpose-built agents.
Community and industry reactions
The release has already been well received by AI power users.
Paul Couvert, AI educator and founder of private LLM chatbot host Blue Shell AI, posted a comparison chart on X showing Qwen3-235B-A22B-Instruct-2507 outperforming Claude Opus 4 and Kimi K2 on benchmarks such as GPQA, AIME25 and Arena-Hard v2.
Influencer Nik (@ns123abc) commented on its rapid impact.
Meanwhile, Jeff Boudier, head of product at Hugging Face, highlighted the deployment benefits.
He praised the availability of an FP8 checkpoint for faster inference, 1-click deployment on Azure ML, and support for local use via MLX on Mac or INT4 builds from Intel.
The overall tone among developers has been enthusiastic, as the model's balance of performance, licensing, and ease of deployment appeals to both hobbyists and professionals.
What's next for the Qwen team?
Alibaba has already laid the groundwork for future updates. A separate reasoning-focused model is in the pipeline, and the Qwen roadmap increasingly points toward agentic systems capable of planning long-horizon tasks.
Multimodal support, seen in the Qwen2.5-Omni and Qwen-VL models, is also expected to continue expanding.
And the rumor mill has already begun churning: Qwen team members teased another update to the model family, with URL strings on their web properties pointing to a new Qwen3-Coder-480B-A35B-Instruct model, likely a 480-billion-parameter mixture-of-experts (MoE) with a 1-million-token context.
Ultimately, what Qwen3-235B-A22B-Instruct-2507 signals is not just another leap in benchmark performance, but a maturation of open models as viable alternatives to proprietary systems.
Its deployment flexibility, strong general performance, and enterprise-friendly licensing give the model a unique edge in a crowded field.
For teams looking to integrate advanced instruction-following models into their AI stack without vendor lock-in or usage-based fees, Qwen3 is a serious contender.

