Let's say you want to teach a robot how to use tools, so it can quickly learn to make repairs around the house with a hammer, wrench, and screwdriver. To do that, you would need an enormous amount of data demonstrating tool use.
Existing robotic datasets vary widely in modality: some contain color images, for instance, while others consist of tactile imprints. Data may also be collected in different domains, such as simulation or human demonstrations. And each dataset may capture a unique task and environment.
Because it is difficult to efficiently incorporate data from so many sources into one machine-learning model, many methods use just one type of data to train a robot. But robots trained this way, with a relatively small amount of task-specific data, are often unable to perform new tasks in unfamiliar environments.
In an effort to train better multipurpose robots, researchers at MIT developed a technique to combine multiple sources of data across domains, modalities, and tasks using a type of generative AI known as diffusion models.
They train a separate diffusion model to learn a strategy, or policy, for completing one task using one specific dataset. Then they combine the policies learned by the diffusion models into a general policy that enables a robot to perform multiple tasks in various settings.
In simulations and real-world experiments, this training approach enabled a robot to perform multiple tool-use tasks and adapt to new tasks it did not see during training. The method, known as Policy Composition (PoCo), led to a 20 percent improvement in task performance compared with baseline techniques.
“Addressing heterogeneity in robotic datasets is like a chicken-and-egg problem. If we want to use a lot of data to train general robot policies, then we first need deployable robots to get all this data. I think that leveraging all the heterogeneous data available, similar to what researchers have done with ChatGPT, is an important step for the robotics field,” says Lirui Wang, a PhD student in electrical engineering and computer science (EECS) and lead author of a paper on PoCo.
Wang's co-authors include Jialiang Zhao, a mechanical engineering graduate student; Yilun Du, an EECS graduate student; Edward Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of CSAIL. The research will be presented at the Robotics: Science and Systems Conference.
Combining different data sets
A robot policy is a machine-learning model that takes inputs and uses them to perform an action. One way to think about a policy is as a strategy. In the case of a robotic arm, that strategy might be a trajectory, or a series of poses, that moves the arm to pick up a hammer and use it to pound a nail.
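To make that concrete, here is a minimal sketch in Python of what a policy interface might look like. The class names, observation format, and trajectory dimensions are hypothetical, chosen for illustration rather than taken from the paper:

```python
import numpy as np

class Policy:
    """Hypothetical sketch: a policy maps an observation to an action.
    Here the action is a trajectory, a sequence of arm poses over time."""

    def act(self, observation: dict) -> np.ndarray:
        raise NotImplementedError

class HammerPolicy(Policy):
    """Illustrative stand-in for a policy learned for one task."""

    def act(self, observation: dict) -> np.ndarray:
        # A real learned policy would compute this from camera images,
        # joint angles, tactile readings, etc. Here we just return a
        # placeholder trajectory of 50 timesteps x 7 joint positions.
        return np.zeros((50, 7))
```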
The datasets used to learn robot policies are typically small and focused on one particular task and environment, like packing items into boxes in a warehouse.
“Every single robotic warehouse is generating terabytes of data, but that data only belongs to the specific robot installation working on those packages. That's not ideal if you want to use all of that data to train a general machine,” says Wang.
The MIT researchers developed a technique that takes a series of smaller datasets, like those gathered from many robotic warehouses, learns separate policies from each one, and then combines the policies in a way that enables a robot to generalize to many tasks.
They represent each policy using a type of generative AI model known as a diffusion model. Diffusion models, commonly used for image generation, learn to create new data samples that resemble samples in a training dataset by iteratively refining their output.
But rather than teaching a diffusion model to generate images, the researchers teach it to generate a trajectory for a robot. They do this by adding noise to the trajectories in a training dataset. The diffusion model then gradually removes the noise and refines its output into a trajectory.
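As a rough sketch of that training idea, simplified from how diffusion models are actually parameterized, the network learns to predict the noise that was mixed into a clean trajectory so it can later strip it away step by step. The linear noise schedule and the `denoiser` network's signature below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def diffusion_training_step(denoiser: nn.Module,
                            clean_traj: torch.Tensor,
                            num_steps: int = 100) -> torch.Tensor:
    """One simplified training step: corrupt a clean trajectory with
    noise at a random level, then train the network to predict that
    noise. clean_traj has shape (timesteps, num_joints)."""
    t = torch.randint(1, num_steps + 1, (1,))      # random noise level
    alpha = 1.0 - t.float() / num_steps            # toy linear schedule
    noise = torch.randn_like(clean_traj)
    noisy_traj = alpha.sqrt() * clean_traj + (1.0 - alpha).sqrt() * noise
    predicted_noise = denoiser(noisy_traj, t)      # network's estimate
    return F.mse_loss(predicted_noise, noise)      # loss to backpropagate
```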
This technique, known as Diffusion Policy, was previously introduced by researchers at MIT, Columbia University, and the Toyota Research Institute. PoCo builds on this Diffusion Policy work.
The team trains each diffusion model on a different type of dataset, such as one with human video demonstrations and another gathered from teleoperation of a robotic arm.
The researchers then perform a weighted combination of the individual policies learned by all the diffusion models, iteratively refining the output so that the combined policy satisfies the objectives of each individual policy.
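One way to picture that combination step, as a hedged sketch rather than the paper's exact formulation: during sampling, each trained policy contributes its noise estimate with some weight, so every refinement step pulls the trajectory toward all of the objectives at once. The update rule and schedule here are illustrative only:

```python
import torch

def compose_policies(denoisers, weights, traj_shape, num_steps=100):
    """Sketch of weighted policy composition: one shared denoising loop
    in which each policy's noise prediction contributes according to
    its weight."""
    traj = torch.randn(traj_shape)                 # start from pure noise
    for step in reversed(range(1, num_steps + 1)):
        t = torch.tensor([step])
        # Weighted combination of the individual noise estimates.
        noise_estimate = sum(w * d(traj, t)
                             for d, w in zip(denoisers, weights))
        # Crude refinement step for illustration; a real sampler would
        # follow a proper diffusion schedule such as DDPM or DDIM.
        traj = traj - noise_estimate / num_steps
    return traj
```

Under this framing, adding a policy trained on a new dataset amounts to appending another entry to `denoisers` and `weights` rather than retraining everything.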
More than the sum of its parts
“One of the benefits of this approach is that we can combine policies to get the best of both worlds. For instance, a policy trained on real-world data might be able to achieve more dexterity, while a policy trained on simulation might be able to achieve more generalization,” says Wang.
Because the policies are trained separately, one can mix and match diffusion policies to achieve better results for a certain task. A user could also add data in a new modality or domain by training an additional diffusion policy on that dataset, rather than starting the whole process over from scratch.
The researchers tested PoCo in simulation and on real robotic arms performing a variety of tool tasks, such as using a hammer to pound a nail and flipping an object with a spatula. PoCo led to a 20 percent improvement in task performance compared with baseline methods.
“The striking thing was that when we finished fine-tuning and visualized it, we could clearly see that the composed trajectory looks much better than any one of them individually,” says Wang.
In the future, the researchers want to apply this technique to long-horizon tasks where a robot picks up one tool, uses it, and then switches to another tool. They also want to incorporate larger robotics datasets to improve performance.
“We will need all three kinds of data to succeed in robotics: internet data, simulation data, and real robot data. How to combine them effectively will be the million-dollar question. PoCo is a solid step in the right direction,” says Jim Fan, principal scientist at NVIDIA and head of the AI Agents Initiative, who was not involved in this work.
This research is funded, in part, by Amazon, the Singapore Defense Science and Technology Agency, the U.S. National Science Foundation, and the Toyota Research Institute.