If someone advises you to “know your limits,” they’re likely suggesting, for instance, that you exercise in moderation. To a robot, though, the motto means learning the constraints of a particular task within the machine’s environment, so it can complete chores safely and correctly.
For example, imagine asking a robot to clean your kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical multi-step plan to ensure the room is spotless? Large language models (LLMs) can get it close, but if the model is trained only on text, it is likely to miss key specifics about the robot’s physical constraints, such as how far it can reach or whether there are nearby obstacles to avoid. Rely on LLMs alone, and you’ll probably end up scrubbing pasta stains out of your floorboards.
To help robots perform these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what is near the machine and model its constraints. The team’s strategy is to have an LLM sketch out a plan, which is then checked in a simulator to ensure it is safe and realistic. If the sequence of actions is infeasible, the language model generates a new plan, repeating until it arrives at one the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, enabling a robot to perform tasks as diverse as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate tasks in dynamic environments such as homes, where they may be prompted to carry out a general chore composed of many steps (e.g., “make me breakfast”).
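To make the loop concrete, here is a minimal sketch of a generate-and-check cycle of the kind described above, written in Python. The object names and methods (propose_plan, check, the violation message) are illustrative assumptions, not the actual PRoC3S interface.

```python
# Hypothetical sketch of an LLM-plus-simulator planning loop (not the real PRoC3S code).
def plan_with_feedback(llm, simulator, task_description, max_attempts=10):
    """Ask an LLM for a plan, reject it if the simulator finds a constraint
    violation, and feed the failure back so the next proposal can avoid it."""
    feedback = ""
    for attempt in range(max_attempts):
        # The LLM proposes a sequence of parameterized actions for the task.
        plan = llm.propose_plan(task_description, feedback)

        # The simulator replays the plan against a model of the robot and scene,
        # checking reachability, collisions, and other physical constraints.
        result = simulator.check(plan)
        if result.feasible:
            return plan  # safe and realistic: hand it to the real robot

        # Otherwise, record what failed so the next attempt can adjust.
        feedback = f"Attempt {attempt + 1} failed: {result.violation}"

    raise RuntimeError("No feasible plan found within the attempt budget")
```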
“LLMs and classical robotics systems such as task and motion planners can’t perform these kinds of tasks on their own, but together, their synergy enables open-ended problem-solving,” says graduate student Nishanth Kumar SM ’24, co-lead author of a new paper on PRoC3S. “We create a simulation of the robot’s environment on the fly and try out many possible action plans. Vision models help us build a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”
The team’s work was presented last month in a paper at the Conference on Robot Learning (CoRL) in Munich, Germany.
The researchers’ method uses an LLM pre-trained on text from across the internet. Before asking PRoC3S to execute a task, the team provided its language model with a sample task (e.g., drawing a square) related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
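As a rough illustration, the in-context example might look something like the following sketch, which pairs a worked sample task (drawing a square) with the new target task (drawing a star). The field names and action format here are assumptions made for illustration, not the prompt PRoC3S actually uses.

```python
# Illustrative only: a hypothetical prompt pairing a sample task with the target task.
SAMPLE_TASK = """\
Task: draw a square on the table with the marker.
Environment: a robot arm holding a marker; drawable region x in [0.2, 0.6] m, y in [-0.2, 0.2] m.
Plan:
  move_to(0.3, -0.1); pen_down()
  move_to(0.5, -0.1); move_to(0.5, 0.1); move_to(0.3, 0.1); move_to(0.3, -0.1)
  pen_up()
"""

TARGET_TASK = "Task: draw a five-pointed star on the table with the marker."

def build_prompt(sample: str, target: str) -> str:
    """Concatenate the worked sample with the new target task for the LLM."""
    return sample.strip() + "\n\n" + target
```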
But how have these plans fared in practice? In simulations, PRoC3S successfully drew stars and letters eight out of ten times. It could also stack digital blocks in pyramids and lines, and place items with accuracy, like fruits on a plate. In each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies.”
Next, CSAIL engineers took their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to arrange blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks in matching bowls and move all objects near the center of the table.
Kumar and co-lead author Aidan Curtis SM ’23, who is also a PhD student at CSAIL, say these results show how an LLM can develop safer plans that people can trust to work in practice. The researchers envision a home robot that can be given a more general request (e.g., “bring me some chips”) and reliably determine the specific steps needed to execute it. PRoC3S could help a robot test out plans in a similar digital environment to find a working course of action, and, more importantly, bring you a tasty snack.
For future work, the researchers aim to improve these results using a more advanced physics simulator, and to extend to more complex, longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots such as a quadruped, for tasks that include walking and scanning the surroundings.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says Eric Rosen, a researcher at the AI Institute who isn’t involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than is currently possible.”
Kumar and Curtis’ co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT electrical engineering and computer science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and the AI Institute.