
People are testing AI by having it code bouncing balls in rotating shapes

The list of informal, strange AI benchmarks continues to grow.

In recent days, some in the AI community on X have become obsessed with a test of how different AI models, especially so-called reasoning models, handle prompts like this one: "Write a Python script for a bouncing yellow ball within a shape. Make the shape slowly rotate, and make sure the ball stays within the shape."

Some models handle this "ball in a rotating shape" benchmark better than others.

Per one X poster, Anthropic's Claude 3.5 Sonnet and Google's Gemini 1.5 Pro models misjudged the physics, causing the ball to escape the shape. Another user reported that Google's Gemini 2.0 Flash Thinking Experimental and even OpenAI's older GPT-4o nailed it in one go.

But what does it prove if an AI can, or can't, code a bouncing ball in a rotating shape?

Well, simulating a bouncing ball is a classic programming challenge. Accurate simulations incorporate collision detection algorithms, which try to detect when two objects (e.g. a ball and the side of a shape) collide. Poorly written algorithms can hurt the simulation's performance or produce obvious physics errors.
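To make the challenge concrete, here is a minimal sketch of the two pieces such a simulation needs: detecting when the ball overlaps one wall of the shape, and reflecting its velocity off that wall. This is an illustrative example, not code from any of the models or users mentioned; the function names and the segment-based approach are this sketch's own assumptions.

```python
import math

def reflect(vx, vy, nx, ny):
    """Reflect velocity (vx, vy) off a wall with unit normal (nx, ny)."""
    dot = vx * nx + vy * ny
    return vx - 2 * dot * nx, vy - 2 * dot * ny

def collide_circle_segment(cx, cy, r, ax, ay, bx, by):
    """Return the unit normal (pointing toward the ball) if a circle at
    (cx, cy) with radius r overlaps segment A-B, else None."""
    abx, aby = bx - ax, by - ay
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = ((cx - ax) * abx + (cy - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))
    px, py = ax + t * abx, ay + t * aby   # closest point on the segment
    dx, dy = cx - px, cy - py
    dist = math.hypot(dx, dy)
    if dist >= r or dist == 0.0:
        return None
    return dx / dist, dy / dist
```

Each frame, the simulation would test the ball against every edge of the polygon and call `reflect` when `collide_circle_segment` reports a hit. A subtle bug AI models often introduce is skipping the position correction after reflection, letting the ball tunnel through a wall between frames.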

X user N8 Programs, a researcher in residence at AI startup Nous Research, said it took him roughly two hours to program a bouncing ball in a rotating heptagon from scratch. "You have to keep track of multiple coordinate systems, how collisions are handled in each system, and design the code to be robust from the start," N8 Programs explained in a post.
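The "multiple coordinate systems" point can be sketched as follows: rather than recomputing every rotated wall each frame, you can rotate the ball into the shape's local frame, collide it against fixed geometry there, and rotate the result back. This is only an illustration of the idea, assuming a regular heptagon; it is not N8 Programs' actual code.

```python
import math

def to_local(x, y, angle):
    """Rotate a world-space point into the frame of a shape rotated by
    `angle` radians (apply the inverse rotation)."""
    c, s = math.cos(-angle), math.sin(-angle)
    return x * c - y * s, x * s + y * c

def to_world(x, y, angle):
    """Rotate a shape-local point back into world space."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c

def heptagon_vertices(radius, angle):
    """World-space vertices of a regular heptagon rotated by `angle`."""
    return [
        (radius * math.cos(angle + 2 * math.pi * k / 7),
         radius * math.sin(angle + 2 * math.pi * k / 7))
        for k in range(7)
    ]
```

One design subtlety the code above glosses over: in the rotating frame, the walls are stationary but the ball picks up an apparent velocity from the rotation, which also has to be accounted for when reflecting, or the bounces will look subtly wrong.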

But while bouncing balls and rotating shapes are a reasonable test of programming skill, they're not a very rigorous AI benchmark. Even slight variations in the prompt can, and do, produce different results. That's why some X users report better luck with o1, while others say R1 falls short.

If anything, these viral tests point to the intractable problem of building useful measurement systems for AI models. It's often difficult to tell what differentiates one model from another, outside of esoteric benchmarks that aren't relevant to most people.

Many efforts are underway to create better tests, such as the ARC-AGI benchmark and Humanity's Last Exam. We'll see how those fare; in the meantime, enjoy the GIFs of balls bouncing in rotating shapes.
