Anthropic releases Claude 3, which outperforms GPT-4 in benchmarks

San Francisco-based AI startup Anthropic has released its latest LLMs, the Claude 3 family of models.

Claude 3 comes in three variants: Haiku, Sonnet and Opus. For the less poetic among us, that means small, medium and large. Claude 3 Opus is Anthropic's most advanced model and the first in the industry to claim to beat OpenAI's GPT-4 across a wide range of benchmarks.

GPT-4 has long been the gold standard that AI companies benchmark their LLMs against. Words like “close” or “almost” have often been used in these comparisons, but Anthropic can finally claim to outperform GPT-4.

Here are the benchmark numbers for Claude 3 compared with GPT-4, GPT-3.5, Gemini Ultra and Gemini Pro.

Claude 3 benchmark numbers compared with GPT-4, GPT-3.5, Gemini Ultra and Gemini Pro. Source: Anthropic

It is worth noting that the GPT-4 numbers above are the ones OpenAI published in its technical report when GPT-4 was released. The Claude 3 model card acknowledges that higher scores have been reported for GPT-4 Turbo.

Still, the Claude 3 Opus figures are a big deal. Despite the inevitable arguments over how the company arrived at these numbers, Anthropic says that Claude 3 Opus represents “higher intelligence than every other model available.”

Claude 3 Opus API pricing is $15 per million input tokens and $75 per million output tokens. That's a lot compared with GPT-4 Turbo, which costs $10 and $30 respectively. Claude 3 Sonnet ($3 / $15) and Claude 3 Haiku ($0.25 / $1.25) offer really good value for money when you look at the performance specs of these smaller models.
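Because input and output tokens are billed at different rates, the real cost of a request depends on its mix of the two. The short sketch below turns the per-million-token prices quoted above into a per-request cost. The model labels are informal names chosen for this example (not official API model IDs), the 100k-in / 10k-out workload is hypothetical, and the prices are the launch figures from this article, which may since have changed.

```python
from decimal import Decimal

# USD per million tokens as (input, output), using the launch prices quoted
# in this article. Keys are informal labels, not official API model IDs.
PRICES_PER_MTOK = {
    "claude-3-opus": (Decimal("15"), Decimal("75")),
    "claude-3-sonnet": (Decimal("3"), Decimal("15")),
    "claude-3-haiku": (Decimal("0.25"), Decimal("1.25")),
    "gpt-4-turbo": (Decimal("10"), Decimal("30")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request, billing input and output tokens separately."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 tokens in, 10,000 tokens out.
    for model in PRICES_PER_MTOK:
        print(f"{model:16} ${request_cost(model, 100_000, 10_000):.4f}")
```

For that workload, Opus comes to $2.25 versus $1.30 for GPT-4 Turbo, while Haiku costs under four cents. `Decimal` is used instead of `float` so that small per-token prices add up without rounding noise.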

If you want to try Claude 3 for free, you can do so with Anthropic's Claude.ai chatbot once its servers have recovered from the rush of traffic. The free version runs on Claude 3 Sonnet, while paying Pro users get access to Opus.

Claude 3 models are not multimodal, but they have impressive vision capabilities. They can't generate an image for you, but the benchmarks show that Opus is good at analyzing photos, charts, graphs, and technical diagrams.

Claude 3 vision capabilities compared with GPT-4V, Gemini Ultra and Gemini Pro. Source: Anthropic

According to Anthropic, the Claude 3 models are capable of accepting inputs of more than 1 million tokens. However, for most users, the context window is limited to 200,000 tokens for now. That's still a lot more than GPT-4 Turbo's 128k context window.

A large context window is only useful when paired with good recall, and Anthropic claims that Opus provides “near-perfect memory with over 99% accuracy.”

Something interesting happened during Claude 3 Opus's “needle in a haystack” recall test. When asked a question that could only be answered by finding the inserted “needle” sentence, it indicated that it realized it was being tested. Impressive and slightly scary.

Claude 3 Opus realized that it was being tested. Source: X
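The “needle in a haystack” test itself is simple to set up: hide one out-of-place sentence (the needle) at different depths inside a long filler document (the haystack), then ask a question that only that sentence answers. Below is a minimal sketch of such a harness, assuming a hypothetical `ask_model` callable in place of a real API client. The needle, the filler text, the `toy_model` stand-in and the keyword-match scoring are all illustrative assumptions, not Anthropic's actual test setup.

```python
from typing import Callable, Sequence


def build_haystack(filler: Sequence[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    sentences = list(filler[:position]) + [needle] + list(filler[position:])
    return " ".join(sentences)


def needle_recall(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer text out
    filler: Sequence[str],
    needle: str,
    question: str,
    expected: str,
    depths: Sequence[float],
) -> float:
    """Return the fraction of depths at which the answer contains the expected keyword."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(filler, needle, depth) + "\n\n" + question
        if expected.lower() in ask_model(prompt).lower():
            hits += 1
    return hits / len(depths)


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM: returns the first sentence mentioning 'code word'."""
    for sentence in prompt.split(". "):
        if "code word" in sentence.lower():
            return sentence
    return "I don't know."


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    score = needle_recall(
        toy_model,
        filler,
        needle="The secret code word is marigold.",
        question="What is the secret code word?",
        expected="marigold",
        depths=[0.0, 0.25, 0.5, 0.75, 1.0],
    )
    print(f"recall: {score:.0%}")
```

Note that keyword scoring like this only checks whether the answer was retrieved; a remark such as Opus's comment that it suspected it was being tested would only show up by reading the full response.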

Anthropic is a big proponent of what it calls “Constitutional AI,” which aims to improve the safety and transparency of its models. In Claude 2, this focus on safety led to many requests that were actually harmless being refused.

Claude 3 is better at understanding the nuances of prompts, so it can better determine what does and doesn't conflict with Anthropic's guardrails. Claude 3 also achieves significantly better accuracy and fewer hallucinations than Claude 2.1.

An example of a prompt that Claude 2.1 refuses to answer while Claude 3 recognizes it as safe.

Some AI pessimists claim that we’re heading into an AI winter and that LLM performance is plateauing, but Anthropic disagrees. The company doesn’t believe that “model intelligence is anywhere near its limits.”

Anthropic plans to deliver several interesting upgrades to Claude 3 in the future, including enhanced agent capabilities such as tool use and interactive coding (REPL).

Because of its high price, the initial market for Claude 3 Opus may be limited to niche research or professional applications. For now, Sonnet and Haiku, with their lower prices and strong capabilities, are likely to see the widest adoption.

Will we see OpenAI cut its prices? With OpenAI's position at the top of the benchmarks under pressure, we must be getting very close to a GPT-5 announcement.
