Will Smith Eats Spaghetti and Other Weird AI Benchmarks That Took Off in 2024

When a company releases a new AI video generator, it doesn't take long before someone uses it to make a video of actor Will Smith eating spaghetti.

It's become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.

Will Smith and pasta is just one of several bizarre "unofficial" benchmarks that took the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AIs play games like Pictionary and Connect 4 against each other.

It's not as if there aren't more academic tests of AI performance. So why did the weirder ones blow up?

Photo credit: Paul Calcraft

For one thing, many of the industry-standard AI benchmarks don't tell the average person very much. Companies often cite their AI's ability to answer questions on math Olympiad exams or to figure out plausible solutions to PhD-level problems. Yet most people – yours truly included – use chatbots for things like answering emails and doing basic research.

Crowdsourced industry measures aren't necessarily better or more informative.

Take Chatbot Arena, for instance, a public benchmark that many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, such as building a web app or generating an image. But raters tend not to be representative (most come from AI and tech industry circles), and they cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface. Photo credit: LMSYS

Ethan Mollick, a professor of management at Wharton, raised a related concern in a recent post:

“The fact that there aren't 30 different benchmarks from different organizations in medicine, law, advice quality, etc. is a real shame, because people use systems for these things regardless,” Mollick wrote.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn't mean it'll generate, say, a burger well.

Mcbench
Note the typo; there's no such model as Claude 3.6 Sonnet. Photo credit: Adonis Singh

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI rather than on its ability in narrow domains. That's sensible. But I have a feeling that weird benchmarks aren't going away anytime soon. Not only are they entertaining – who doesn't like watching AI build Minecraft castles? – but they're easy to understand. And as my colleague Max Zeff recently wrote, the industry continues to struggle to distill a technology as complex as AI into digestible marketing.

The only question on my mind is: what odd new benchmarks will go viral in 2025?
