xAI previews Grok-1.5 and creates a brand new benchmark called RealWorldQA

April 14, 2024

Elon Musk's xAI has unveiled Grok-1.5, a multimodal AI model designed to beat the competition in understanding real-world scenarios.

Grok-1.5 follows in the footsteps of others, such as GPT-4V, and introduces visual processing to analyze everything from documents and charts to graphs, screenshots, and photos.

Grok-1.5 also gains ground in text, coding, and math tasks, scoring 50.6% on the MATH benchmark, 90% on the GSM8K benchmark, and 74.1% on the HumanEval benchmark.

This puts Grok-1.5 firmly in the LLM heavyweight category, with scores only slightly lower on average than those of Gemini 1.5 Pro, GPT-4, and Claude 3 Opus.

Grok-1.5's competitive text, math, and coding benchmarks. Source: xAI

Grok-1.5 also offers longer contextual understanding with up to 128,000 tokens, a 16x increase over its predecessor, but falls well short of the levels touted by Claude 3 Opus and Gemini 1.5 Pro.

The Needle in a Haystack (NIAH) evaluation showed that Grok-1.5 is capable of locating embedded text in contexts up to 128,000 tokens long.
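For readers unfamiliar with the idea, a NIAH test hides one short fact (the "needle") at varying depths inside filler text of varying lengths, then checks whether the model can retrieve it. The following is only a minimal sketch of that idea, not xAI's evaluation harness: the function names, the filler vocabulary, and the `toy_model` stand-in (a plain string search used in place of a real LLM call) are all assumptions for illustration, and lengths are counted in words rather than tokens.

```python
import random


def build_haystack(filler_words, needle, total_words, depth):
    """Return a word list with `needle` inserted at fractional `depth` (0.0 to 1.0)."""
    rng = random.Random(0)  # fixed seed so the haystack is reproducible
    words = [rng.choice(filler_words) for _ in range(total_words)]
    position = int(depth * total_words)
    return words[:position] + needle.split() + words[position:]


def run_niah(ask_model, context_lengths, depths, needle, question, expected):
    """Score a model callable over a grid of context lengths and needle depths."""
    filler = ["the", "river", "cloud", "stone", "walks", "quietly", "green", "over"]
    results = {}
    for length in context_lengths:
        for depth in depths:
            context = " ".join(build_haystack(filler, needle, length, depth))
            answer = ask_model(context, question)
            # A cell passes if the expected string appears in the model's answer.
            results[(length, depth)] = expected in answer
    return results


def toy_model(context, question):
    """Hypothetical stand-in for an LLM: returns the words around 'passcode'."""
    words = context.split()
    if "passcode" in words:
        i = words.index("passcode")
        return " ".join(words[i - 2 : i + 3])
    return ""


results = run_niah(
    toy_model,
    context_lengths=[100, 1000],
    depths=[0.0, 0.5, 1.0],
    needle="The secret passcode is 7421.",
    question="What is the secret passcode?",
    expected="7421",
)
print(results)
```

In a real run, `toy_model` would be replaced by a call to the model under test, and the grid of pass/fail cells is what is typically drawn as the NIAH heatmap.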

However, it's Grok-1.5's vision capabilities that xAI is emphasizing the most.

Demos show how Grok-1.5 converts block diagrams into Python code, generates bedtime stories inspired by children's drawings, creates CSV records from screenshots, and even “extends” memes.

Grok-1.5 tops the leaderboard in some established benchmarks, such as MathVista and TextVQA, and performs best on xAI's newly established RealWorldQA benchmark.

Grok-1.5's impressive vision benchmarks. Source: xAI

Under the hood, Grok-1.5 is built on a custom distributed training framework that allows the xAI team to prototype ideas and train new architectures at scale with minimal effort.

xAI was founded last year. It includes some of the world's leading AI researchers and has the extremely ambitious goal of “understanding the universe.”

So far, we have the fun and edgy Grok-1, which tells people how to synthesize narcotics and criticizes Musk and Tesla.

Grok is also connected to the post database.

Musk's xAI project challenges the predominantly closed-source generative AI ecosystem and makes its models generally available under true open-source licenses.

Together with Meta, which has a similar intent to go against the grain of the competition, xAI's open approach could become a thorn in the side of monetization efforts from OpenAI, Microsoft, Anthropic, and Google.

RealWorldQA

In the Grok-1.5 preview, xAI also unveiled RealWorldQA, a new benchmark consisting of over 700 images, each accompanied by a question and a verifiable answer.

The dataset mainly consists of anonymized images captured from vehicles and other real-world situations.

The RealWorldQA dataset is used to evaluate the spatial understanding capabilities of Grok-1.5 and other multimodal AI models. xAI felt that other benchmarks were lacking in this area.
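Because each image comes with a question and a verifiable answer, a benchmark of this kind can be scored as simple accuracy. The sketch below shows one way that might look. It is an assumption for illustration, not xAI's scoring code: the `Item` fields, the `normalize` rule, and the `predict` callable are all hypothetical, and the sample file names and answers are made up.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark entry (hypothetical field names)."""
    image_path: str
    question: str
    answer: str


def normalize(text):
    """Compare answers case-insensitively, ignoring outer spaces and a final period."""
    return text.strip().lower().rstrip(".")


def accuracy(items, predict):
    """Fraction of items where the model's answer matches the reference answer."""
    if not items:
        return 0.0
    correct = sum(
        normalize(predict(item.image_path, item.question)) == normalize(item.answer)
        for item in items
    )
    return correct / len(items)


# Made-up sample items and a stand-in model that always answers "Left."
sample = [
    Item("img_001.jpg", "Which way is the arrow pointing?", "left"),
    Item("img_002.jpg", "Which object is closer, the car or the sign?", "car"),
]
score = accuracy(sample, lambda image_path, question: "Left.")
print(score)
```

Exact-match scoring only works when answers are short and unambiguous, which is presumably why the benchmark pairs each question with a verifiable answer.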

Grok-1.5 outperforms the competition in RealWorldQA, and it will be interesting to see whether the benchmark catches on.

Even if Grok-1.5 is unable to understand the universe, it will take its place as another top model in an ever-expanding product range.

This also shows that generative AI in its current form is reaching the peak of its capabilities, although perhaps not for long.
