Computer-aided design (CAD) systems are proven tools for designing most of the physical objects we use every day. However, mastering CAD software requires extensive expertise, and many tools demand a level of detail that makes them poorly suited to brainstorming or rapid prototyping.
To make design faster and more accessible to laypeople, researchers at MIT and elsewhere have developed an AI-driven robotic assembly system that lets people build physical objects simply by describing them in words.
Their system uses a generative AI model to create a 3D representation of an object’s geometry from the user’s text prompt. A second generative AI model then interprets the desired object and determines where the various components should be placed, consistent with the object’s function and geometry.
The system can then automatically build the object from a set of prefabricated parts using robotic assembly, and the design can be iterated based on user feedback.
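Taken together, those steps suggest a simple three-stage loop: generate a mesh from text, ask a vision-language model to place components, then hand the plan to a robot, with an optional feedback pass in between. The sketch below is only a hypothetical outline of that loop; every function is a stand-in for illustration, not the researchers’ code.

```python
# Hypothetical sketch of the three-stage loop described above:
# text-to-3D generation, VLM-driven panel placement, robotic assembly.
# All functions are stand-ins, not the researchers' actual code.

def generate_mesh(prompt: str) -> dict:
    # Stand-in for the text-to-3D generative model (stage 1).
    return {"prompt": prompt, "surfaces": ["seat", "backrest", "legs"]}

def place_panels(mesh: dict, instructions: str) -> list[str]:
    # Stand-in for the VLM that decides which surfaces get panels (stage 2).
    wanted = ["seat", "backrest"]
    if "not on the seat" in instructions:
        wanted.remove("seat")
    return [s for s in mesh["surfaces"] if s in wanted]

def assemble(paneled_surfaces: list[str]) -> None:
    # Stand-in for the robotic assembly step (stage 3).
    print(f"Assembling object; panels go on: {', '.join(paneled_surfaces)}")

def design_and_build(prompt: str, feedback: str = "") -> None:
    mesh = generate_mesh(prompt)                        # prompt -> 3D mesh
    plan = place_panels(mesh, f"{prompt}. {feedback}")  # VLM-style placement
    assemble(plan)                                      # robot builds from prefab parts

design_and_build("Make me a chair")
design_and_build("Make me a chair", "Only use panels on the backrest, not on the seat.")
```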
The researchers used this end-to-end system to create furniture, including chairs and shelves, from two types of prefabricated components. The components can be disassembled and reassembled at will, reducing the amount of waste created by the manufacturing process.
They evaluated these designs as part of a user study and found that more than 90 percent of participants preferred the objects produced by their AI-driven system over those from other approaches.
Although this work is an initial demonstration, the framework could be particularly useful for rapid prototyping of complex objects such as aerospace components and architectural elements. In the future, it could be used in homes to produce furniture or other items on-site, without the need to ship bulky products from a central facility.
“In the future, we want to be able to talk and collaborate with a robot and an AI system the same way we talk with one another to create things together. Our system is a first step toward making that future possible,” says lead author Alex Kyaw, a graduate student in MIT’s Departments of Electrical Engineering and Computer Science (EECS) and Architecture.
Kyaw is joined on the paper by Richa Gupta, an architecture student at MIT; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and others at Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.
Generating a multi-component design
While generative AI models are good at generating 3D representations, called meshes, from text input, most don’t produce consistent representations of an object’s geometry with the component-level detail required for robotic assembly.
Separating these meshes into components is difficult for a model, since the assignment of components depends on the geometry and functionality of the object and its parts.
The researchers addressed these challenges using a vision-language model (VLM), a powerful generative AI model pretrained to understand images and text. They task the VLM with determining how two types of prefabricated parts, structural components and panels, should fit together to form an object.
“There are many ways to attach panels to a physical object, but the robot must see the geometry and reason about that geometry in order to make a decision about it. By serving as both the robot’s eyes and brain, the VLM allows the robot to do that,” says Kyaw.
A user prompts the system with text, such as by typing “Make me a chair,” and gives it an AI-generated image of a chair as a starting point.
The VLM then analyzes the chair and determines where to place panel components on structural components, drawing on the functionality of the many example objects it has seen before. For instance, the model may specify that the seat and backrest should have panels to create surfaces for someone to sit on and lean against.
It outputs this information as text, for instance “seat” or “backrest.” Each surface of the chair is then labeled with a number, and this information is fed back to the VLM.
The VLM then selects the numbered labels that correspond to the parts of the 3D mesh that should be paneled, completing the design.
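To make the numbered-surface step concrete, here is one way such a query could be posed to an off-the-shelf vision-language model through the OpenAI Python client. The model choice, prompt wording, and label format are illustrative assumptions; the article does not say which VLM or interface the researchers used.

```python
# Hypothetical query to a vision-capable model: given a rendered mesh whose
# candidate surfaces are labeled with numbers, ask which should receive panels.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def choose_paneled_surfaces(image_path: str, user_prompt: str) -> list[int]:
    """Return the surface numbers the model says should get panels."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder choice of vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": (f"The user asked: '{user_prompt}'. Each candidate surface "
                          "in the image is labeled with a number. Reply with only "
                          "the numbers, comma-separated, of the surfaces that should "
                          "receive panels, such as a chair's seat and backrest.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    text = response.choices[0].message.content
    return [int(tok) for tok in text.replace(",", " ").split() if tok.isdigit()]

# Example (hypothetical file name):
# panel_ids = choose_paneled_surfaces("labeled_chair.png", "Make me a chair")
```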
Co-design between humans and AI
The user stays in the loop throughout this process and can refine the design by giving the model a new prompt, such as “Only use panels on the backrest, not on the seat.”
“The design space is very wide, so we narrow it down based on user feedback. We believe this is the best way because people have different preferences, and it would be impossible to create one idealized model for everyone,” says Kyaw.
“The human-in-the-loop process allows users to control the AI-generated designs and develop a sense of ownership over the end result,” adds Gupta.
Once the 3D mesh is complete, a robotic assembly system builds the object from prefabricated parts. These reusable parts can be disassembled and reassembled in different configurations.
The researchers compared the results of their method with an algorithm that places panels on all upward-facing horizontal surfaces and an algorithm that places panels randomly. In a user study, more than 90 percent of participants preferred the designs created by their system.
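The first baseline, paneling every upward-facing horizontal surface, can be approximated directly from triangle normals. The snippet below is one plausible reading of that heuristic using NumPy; the cosine threshold and mesh layout are assumptions rather than details from the paper.

```python
import numpy as np

def upward_facing_faces(vertices: np.ndarray, faces: np.ndarray,
                        cos_threshold: float = 0.95) -> np.ndarray:
    """Return indices of triangles whose normals point (nearly) straight up.

    vertices: (V, 3) array of xyz positions; faces: (F, 3) array of vertex indices.
    A face counts as "upward-facing horizontal" when the cosine between its unit
    normal and the +Z axis exceeds cos_threshold (an assumed cutoff).
    """
    tri = vertices[faces]                               # (F, 3, 3) triangle corners
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12
    return np.where(normals[:, 2] > cos_threshold)[0]   # faces to cover with panels

# Toy example: a single horizontal triangle at z = 0 is selected.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
print(upward_facing_faces(verts, faces))                # -> [0]
```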
They also asked the VLM to explain why it chose to place panels in those areas.
“We learned that the vision-language model is able to understand the functional elements of a chair, such as backing and sitting, to a certain extent, and to explain why it places panels on the seat and back. It doesn’t just spit out these assignments at random,” says Kyaw.
In the future, the researchers want to expand their system so it can handle more complex and nuanced user input, such as a table made of glass and metal. They also want to incorporate other prefabricated components, such as gears, hinges, or other moving parts, to give the objects more functionality.
“Our hope is to dramatically lower the barrier to entry for design tools. We have shown that we can use generative AI and robotics to transform ideas into physical objects quickly, accessibly, and sustainably,” says Davis.

