This is an automated archive.
The original was posted on /r/singularity by /u/IluvBsissa on 2024-01-21 00:11:36+00:00.
"I just published a story on a new robotics system from Stanford called Mobile ALOHA, which researchers used to get a cheap, off-the-shelf wheeled robot to do some incredibly complex things on its own, such as cooking shrimp, wiping stains off surfaces and moving chairs. They even managed to get it to cook a three-course meal—though that was with human supervision. Read more about it here.
Robotics is at an inflection point, says Chelsea Finn, an assistant professor at Stanford University, who was an advisor for the project. In the past, researchers have been constrained by the amount of data they can train robots on. Now there is a lot more data available, and work like Mobile ALOHA shows that with neural networks and more data, robots can learn complex tasks fairly quickly and easily, she says.
While AI models, such as the large language models that power chatbots, are trained on huge datasets that have been hoovered up from the internet, robots need to be trained on data that has been physically collected. This makes it a lot harder to build vast datasets. A team of researchers at NYU and Meta recently came up with a simple and clever way to work around this problem. They used an iPhone attached to a reacher-grabber stick to record volunteers doing tasks at home. They were then able to train a system called Dobb-E (10 points to Ravenclaw for that name) to complete over 100 household tasks, with each new task taking around 20 minutes to learn. (Read more from Rhiannon Williams here.)
Mobile ALOHA also debunks a belief held in the robotics community that it was primarily hardware shortcomings holding back robots’ ability to do such tasks, says Deepak Pathak, an assistant professor at Carnegie Mellon University, who was not part of the research team.
“The missing piece is AI,” he says.
AI has also shown promise in getting robots to respond to verbal commands, and helping them adapt to the often messy environments in the real world. For example, Google’s RT-2 system combines a vision-language-action model with a robot. This allows the robot to “see” and analyze the world, and respond to verbal instructions to make it move. And a new system called AutoRT from DeepMind uses a similar vision-language model to help robots adapt to unseen environments, and a large language model to come up with instructions for a fleet of robots.
And now for the bad news: even the most cutting-edge robots still cannot do laundry. It’s a chore that is significantly harder for robots than for humans. Crumpled clothes form weird shapes, which makes them hard for robots to process and handle.
But it might just be a matter of time, says Tony Zhao, one of the researchers from Stanford. He is optimistic that even this trickiest of tasks will one day be possible for robots to master using AI. They just need to collect the data first. Maybe there is hope for me and my chair after all!"