Generative AI and robotics are bringing us ever closer to the day when we can ask for an object and have it created in minutes. MIT researchers have developed Speech-to-Reality, an AI-driven workflow that lets them send spoken input to a robotic arm and “bring objects to life,” making a piece of furniture in as little as five minutes, for instance.
With the Speech-to-Reality system, a table-mounted robotic arm can receive spoken input from a human, such as “I would like a simple stool,” and then construct the object from modular components. So far, researchers have used the system to create stools, shelves, chairs, a small table, and even decorative items like a dog statue.
“We combine natural language processing, 3D generative AI, and robotic assembly,” says Alexander Htet Kyaw, a graduate student at MIT and a Morningside Academy for Design (MAD) fellow. “These are rapidly advancing areas of research that have never before been brought together in such a way that you can actually make physical objects just from a simple voice prompt.”
Speech to Reality: On-demand production with generative 3D AI and discrete robotic assembly
The idea came about when Kyaw, a graduate student in architecture and in electrical engineering and computer science, took Professor Neil Gershenfeld’s “How to Make Almost Anything” course, where he built the Speech-to-Reality system. He continued working on the project at the MIT Center for Bits and Atoms (CBA) under Gershenfeld’s direction, collaborating with graduate students Se Hwan Jeon of the Department of Mechanical Engineering and Miana Smith of the CBA.
The Speech-to-Reality system begins with speech recognition that processes the user's request using a large language model, followed by 3D generative AI that creates a digital mesh representation of the object, and a voxelization algorithm that breaks the 3D mesh into assembly components.
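To make the voxelization step concrete, here is a minimal Python sketch, not the team's actual code: a hypothetical implicit occupancy test for a toy stool shape stands in for the AI-generated mesh, and sampling it on a grid turns each filled cell into one cube-shaped assembly component.

```python
# Minimal sketch (not the MIT team's code) of the voxelization idea.
# A hypothetical occupancy test stands in for the AI-generated 3D mesh.
def stool_occupancy(x, y, z):
    """Return True if grid cell (x, y, z) lies inside a toy stool shape."""
    seat = 0 <= x < 4 and 0 <= y < 4 and z == 3          # flat seat slab on top
    legs = x in (0, 3) and y in (0, 3) and 0 <= z < 3     # four corner legs
    return seat or legs

def voxelize(occupancy, size=4):
    """Sample the shape on a grid; each filled cell becomes one cube component."""
    return [(x, y, z)
            for x in range(size) for y in range(size) for z in range(size)
            if occupancy(x, y, z)]

voxels = voxelize(stool_occupancy)
print(f"{len(voxels)} cube components to assemble")       # 28 cubes for this shape
```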
Geometric processing then modifies the AI-generated design to account for real-world manufacturing and physical constraints, such as the number of components, overhangs, and the connectivity of the geometry. This is followed by generating a viable assembly sequence and automatic path planning for the robotic arm, which assembles the physical object based on the user's input.
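The sequencing idea can be illustrated with another hedged sketch, an assumption rather than the published pipeline: cubes are ordered layer by layer so each one rests on the table or on an already-placed cube and touches the structure built so far, and the resulting placements would then be handed to the robot's path planner.

```python
# Minimal sketch (an assumption, not the published pipeline): order the cubes so
# every placement is supported from below and connected to the structure so far.
def assembly_sequence(voxels):
    remaining = set(voxels)
    placed, order = set(), []
    while len(order) < len(remaining):
        progress = False
        # try lower cubes first so nothing is set down in mid-air
        for v in sorted(remaining - placed, key=lambda p: p[2]):
            x, y, z = v
            supported = z == 0 or (x, y, z - 1) in placed   # rests on table or a cube
            neighbors = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                         (x, y - 1, z), (x, y, z - 1), (x, y, z + 1)]
            connected = not placed or any(n in placed for n in neighbors)
            if supported and connected:
                placed.add(v)
                order.append(v)
                progress = True
        if not progress:
            raise ValueError("geometry has unsupported overhangs; needs repair")
    return order

# Demo on a small two-layer block; each entry is a placement target for the arm.
demo = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
print(assembly_sequence(demo))   # bottom layer first, then the two upper cubes
```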
By using natural language, the system makes design and manufacturing more accessible to people with no knowledge of 3D modeling or robot programming. And unlike 3D printing, which can take hours or days, objects made this way can be assembled in minutes.
“This project is an interface between humans, AI and robots to co-create the world around us,” says Kyaw. “Imagine a scenario where you say 'I would like a chair' and within five minutes a physical chair appears in front of you.”
The team's immediate plans include improving the furniture's load-bearing capacity by switching the cubes from magnetic connections to more robust ones.
“We have also developed pipelines to convert voxel structures into feasible assembly sequences for small, distributed mobile robots, which could help translate this work into structures of any size scale,” says Smith.
The purpose of using modular components is to eliminate the waste that comes with making physical objects: they can be disassembled and then reassembled, for instance to turn a sofa into a bed when you no longer need the sofa.
Because Kyaw also has experience using gesture recognition and augmented reality to interact with robots in the manufacturing process, he is currently working on integrating both voice and gesture control into the Speech-to-Reality system.
Drawing on his memories of the replicator in the Star Trek series and the robots in the animated film Big Hero 6, Kyaw explains his vision.
“I want to make it easier for people to access physical objects in a fast, accessible and sustainable way,” he says. “I'm working toward a future where the nature of matter is truly under your control. A future where reality can be generated on demand.”
The team presented their paper “Language to Reality: On-Demand Production with Natural Language, Generative 3D AI, and Discrete Robotic Assembly” at the Association for Computing Machinery (ACM) Symposium on Computational Fabrication (SCF '25), held November 21 at MIT.

