
Alter3 is the latest humanoid robot powered by GPT-4

Researchers at the University of Tokyo and Alternative Machine have developed a humanoid robot system that can translate natural language commands directly into robot actions. The robot, called Alter3, was designed to leverage the extensive knowledge of large language models (LLMs) such as GPT-4 to perform complex tasks such as taking a selfie or pretending to be a ghost.

This is the latest result of a growing body of research that combines the power of foundation models with robotic systems. Although such systems do not yet represent a scalable commercial solution, they have pushed robotics research forward in recent years and remain very promising.

How LLMs control robots

Alter3 uses GPT-4 as its backend model. The model receives a natural language instruction describing either an action to perform or a situation to which the robot must respond.

The LLM uses an “agentic framework” to plan the series of actions the robot must perform to achieve its goal. In the first phase, the model acts as a planner that determines the steps required to carry out the desired motion.

Next, the motion plan is passed to a coding agent, which generates the commands the robot needs to complete each step. Because GPT-4 has not been trained on Alter3's programming commands, the researchers use its in-context learning ability to adapt it to the robot's API. This means the prompt contains a list of commands and a set of examples showing how each command can be used. The model then maps each step to one or more API commands, which are sent to the robot for execution.
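The two-stage pipeline described above can be sketched as a pair of prompt builders. This is a minimal illustration, not Alter3's actual code: the command names (`set_axis`, `look_at`, `wait`) and the few-shot example are hypothetical stand-ins for the robot's real API.

```python
# Hypothetical subset of the robot's command API, as it might appear
# in the prompt so GPT-4 can learn it in context.
ALTER3_COMMANDS = [
    "set_axis(axis_id, value)  # move one of the 43 axes (value 0-255)",
    "look_at(x, y)             # orient the head toward a point",
    "wait(seconds)             # pause between motions",
]

# Few-shot examples showing how instructions map to commands.
FEW_SHOT = [
    ("raise the right hand", ["set_axis(17, 200)", "set_axis(18, 120)"]),
]

def build_planner_prompt(instruction: str) -> str:
    """Phase 1: ask the LLM to break an instruction into motion steps."""
    return (
        "You control a humanoid robot. Break the following instruction "
        f"into a numbered list of body-motion steps.\nInstruction: {instruction}"
    )

def build_coder_prompt(steps: list) -> str:
    """Phase 2: in-context learning — the prompt lists the robot's API and
    worked examples so the model can emit valid commands for each step."""
    api_doc = "\n".join(ALTER3_COMMANDS)
    examples = "\n".join(
        f"Motion: {m}\nCommands: {'; '.join(cmds)}" for m, cmds in FEW_SHOT
    )
    plan = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return (
        f"Available commands:\n{api_doc}\n\n"
        f"Examples:\n{examples}\n\n"
        f"Translate each step into commands:\n{plan}"
    )

# Usage: the planner's output (a list of steps) feeds the coding agent.
prompt = build_coder_prompt(["tilt the head", "raise the right arm"])
```

In the real system, each prompt would be sent to GPT-4 and the returned command strings dispatched to the robot; the sketch only shows how the API documentation and examples are packed into the context window.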

“Before the LLM, we had to control all 43 axes in a specific order to mimic a person's pose or simulate a behavior like serving tea or playing chess,” the researchers write. “Thanks to the LLM, we are now freed from this iterative work.”

Learning from human feedback

Language is not the most precise medium for describing physical postures. As a result, the motion sequence generated by the model may not produce exactly the desired behavior in the robot.

To support corrections, the researchers added a feature that allows humans to provide feedback, such as “lift your arm a bit more.” These instructions are sent to another GPT-4 agent, which goes through the code, makes the necessary corrections, and returns the revised motion sequence to the robot. The refined motion recipe and its code are stored in a database for future use.
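The feedback loop can be sketched as follows. In the real system the correction is performed by a GPT-4 agent that edits the motion code; here a stub stands in for that LLM call, and all command names and the in-memory “database” are illustrative assumptions, not Alter3's implementation.

```python
# Refined motion recipes are cached so a corrected motion can be
# reused without re-querying the LLM. Keyed by task name (illustrative).
motion_memory = {}

def apply_feedback(task: str, commands: list, feedback: str) -> list:
    """Stand-in for the GPT-4 correction agent. The real agent receives the
    feedback string plus the current motion code and returns edited commands;
    this stub fakes one plausible correction for illustration."""
    revised = list(commands)
    if "lift your arm a bit more" in feedback:
        # e.g. the agent raises the arm-axis target value
        revised = [c.replace("value=180", "value=210") for c in revised]
    motion_memory[task] = revised  # store the refined recipe for future use
    return revised

# Usage: a human watches the motion and issues a verbal correction.
out = apply_feedback(
    "wave",
    ["set_axis(axis=17, value=180)"],
    "lift your arm a bit more",
)
```

Caching the corrected recipe means the system accumulates a library of verified motions over time, reducing how often the LLM must be consulted for familiar tasks.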

[Figure: Alter3 refining its motions through human feedback]

The researchers tested Alter3 on a variety of tasks, including everyday actions such as taking a selfie and drinking tea, as well as imitative movements such as pretending to be a ghost or a snake. They also tested the model's ability to respond to scenarios that require elaborate motion planning.

“The LLM's training covers a wide range of linguistic representations of movements. GPT-4 can accurately map these representations onto Alter3's body,” the researchers write.

GPT-4's extensive knowledge of human behavior and actions makes it possible to create more realistic behavior plans for humanoid robots like Alter3. In their experiments, the researchers were also able to mimic emotions such as embarrassment and joy in the robot.

“Even from texts in which emotional expressions are not explicitly mentioned, the LLM can infer corresponding emotions and reflect them in Alter3's physical reactions,” the researchers write.

More advanced models

The use of foundation models is becoming increasingly popular in robotics research. For example, Figure, valued at $2.6 billion, uses OpenAI models behind the scenes to understand human instructions and perform actions in the real world. As multimodality becomes the norm in foundation models, robotic systems will be better able to understand their environment and choose their actions.

Alter3 belongs to a class of projects that use off-the-shelf foundation models as reasoning and planning modules in robot control systems. Alter3 does not use a fine-tuned version of GPT-4, and the researchers note that the code can be used for other humanoid robots.

Other projects, such as RT-2-X and OpenVLA, use specialized foundation models designed to generate robot commands directly. These models tend to produce more stable results and are applicable to more tasks and environments. However, they also require more technical skill and are more expensive to build.

What is often overlooked in these projects are the fundamental challenges of developing robots that can perform simple tasks such as grasping objects, maintaining balance, and moving around. “At the level below that, there's a lot of other work that these models can't do,” AI and robotics researcher Chris Paxton said in an interview with VentureBeat earlier this year. “And those are the things that are hard to do. And in many ways, that's because the data isn't there.”
