Over the past few weeks, we’ve seen a couple of high-profile videos of robotic systems doing really impressive things. And I mean, that’s what we’re all here for, right? Being impressed by the awesomeness of robots! But sometimes the awesomeness of robots is more complicated than what you see in a video making the rounds on social media—any robot has a lot of things going on behind the scenes to make it successful, but if you can’t tell what those things are, what you see at first glance might be deceiving you.
Earlier this month, a group of researchers from Stanford’s IRIS Lab introduced Mobile ALOHA, which the YouTube video description (if you read it) calls “a low-cost and whole-body teleoperation system for data collection.”
And just last week, Elon Musk posted a video of Tesla’s Optimus robot folding a shirt.
Most people who watch these videos without poking around in the descriptions or comments will likely not assume that these robots were being entirely controlled by experienced humans, because why would they? Even for roboticists, it can be tricky to know for sure whether the robot they’re watching has a human in the loop somewhere. This is a problem that’s not unique to the folks behind either of the videos above; it’s a communication issue that the entire robotics community struggles with. But as robots (and robot videos) become more mainstream, it’s important that we get better at it.
Why use teleoperation?
Humans are way, way, way, way, way better than robots at almost everything. We’re fragile and expensive, which is why so many people are trying to get robots to do stuff instead, but with very few exceptions involving speed and precision, humans are the gold standard and are likely to remain so for the foreseeable future. So, if you need a robot to do something complicated or something finicky or something that might require some innovation or creativity, the best solution is to put a human in control.
What about autonomy, though?
Having one-to-one human teleoperation of a robot is a great way of getting things done, but it’s not scalable, and aside from some very specific circumstances, the whole point of robots is to do stuff autonomously at scale so that humans don’t have to. One approach to autonomy is to learn as much as you can from human teleoperation: Many robotics companies are betting that they’ll be able to use humans to gradually train their robotic systems, transitioning from full teleoperation to partial teleoperation to supervisory control to full autonomy. Sanctuary AI is a great example of this: They’ve been teleoperating their humanoid robots through all kinds of tasks, collecting training data as a foundation for later autonomy.
What’s wrong with teleoperation, then?
Nothing! Teleoperation is great. But when people see a robot doing something and it looks autonomous but it’s actually teleoperated, that’s a problem, because it’s a misrepresentation of the state of the technology. Not only do people end up with the wrong idea of how your robot functions and what it’s really capable of, it also means that whenever those people see other robots doing similar tasks autonomously, their frame of reference will be completely wrong, minimizing what otherwise may be a significant contribution to the field by other robotics folks. To be clear, I don’t (usually) think that the roboticists making these videos have any intention of misleading people, but that is unfortunately what often ends up happening.
What can we do about this problem?
Last year, I wrote an article for the IEEE Robotics & Automation Society (RAS) with some tips for making a good robot video, which includes arguably the most important thing: context. This covers teleoperation, along with other common things that can cause robot videos to mislead an unfamiliar audience. Here’s an excerpt from the RAS article:
It’s critical to provide accurate context for videos of robots. It’s not always clear (especially to nonroboticists) what a robot may be doing or not doing on its own, and your video should be as explicit as possible about any assistance that your system is getting. For example, your video should identify:
If the video has been sped up or slowed down
If the video makes multiple experiments look like one continuous experiment
If external power, compute, or localization is being used
How the robot is being controlled (e.g., human in the loop, human supervised, scripted actions, partial autonomy, full autonomy)
These things should be made explicit on the video itself, not in the video description or in captions. Clearly communicating the limitations of your work is the responsible thing to do, and not doing this is detrimental to the robotics community.
I want to emphasize that context should be made explicit on the video itself. That is, when you edit the video together, add captions or callouts or something that describes the context on top of the actual footage. Don’t put it in the description or in the subtitles or in a link, because when videos get popular online, they may be viewed and shared and remixed without any of that stuff being readily available.
So how can I tell if a robot is being teleoperated?
If you run across a video of a robot doing some kind of amazing manipulation task and aren’t sure whether it’s autonomous or not, here are some questions to ask that might help you figure it out.
Can you identify an operator? In both of the videos we mentioned above, if you look very closely, you can tell that there’s a human operator, whether it’s a pair of legs or a wayward hand in a force-sensing glove. This may be the first thing to look for, because sometimes an operator is very obvious, but at the same time, not seeing an operator isn’t particularly meaningful because it’s easy for them to be out of frame.
Is there any more information? The second thing to check is whether the video says anywhere what’s actually going on. Does the video have a description? Is there a link to a project page or paper? Are there credits at the end of the video? What account is publishing the video? Even if you can only narrow down the institution or company or lab, you might be able to get a sense of whether they’re working on autonomy or teleoperation.
What kind of task is it? You’re most likely to see teleoperation in tasks that would be especially difficult for a robot to do autonomously. At the moment, that’s predominantly manipulation tasks that aren’t well structured—for example, getting multiple objects to interact with each other, handling things that are difficult to model (like fabrics), or extended multistep tasks. If you see a robot doing this stuff quickly and well, it’s worth questioning whether it’s autonomous.
Is the robot just too good? I always start asking more questions when a robot demo strikes me as just too impressive. But when does impressive become too impressive? Personally, I think a robot demonstrating human-level performance at just about any complex task is too impressive. Some autonomous robots definitely have reached that benchmark, but not many, and the circumstances of them doing so are usually atypical. Furthermore, it takes a lot of work to reach humanlike performance with an autonomous system, so there’s usually some warning in the form of previous work. If you see an impressive demo that comes out of nowhere, showcasing an autonomous capability without any recent precedents, that’s probably too impressive. Remember that it can be tricky with a video because you have no idea whether you’re watching the first take or the 500th, and that itself is a good thing to be aware of—even if it turns out that a demo is fully autonomous, there are many other ways of obfuscating how successful the system actually is.
Is it too fast? Autonomous robots are well known for being very fast and precise, but only in the context of structured tasks. For complex manipulation tasks, robots need to sense their environment, decide what to do next, and then plan how to move. This takes time. If you see an extended task that consists of multiple parts but the system never stops moving, that suggests it’s not fully autonomous.
Does it move like a human? Robots like to move optimally. Humans might also like to move optimally, but we’re bad at it. Autonomous robots tend to move smoothly and fluidly, while teleoperated robots often display small movements that don’t make sense in the context of the task, but are very humanlike in nature. For example, finger motions that are unrelated to gripping, or returning an arm to a natural rest position for no particular reason, or being just a little bit sloppy in general. If the motions seem humanlike, that’s usually a sign of a human in the loop rather than a robot that’s just so good at doing a task that it looks human.
None of these points make it impossible for an autonomous robot demo to come out of nowhere and blow everyone away. Improbable, perhaps, but not impossible. And the rare moments when that actually happens are part of what makes robotics so exciting. That’s why it’s so important to understand what’s going on when you see a robot doing something amazing, though: knowing how it’s done, and all of the work that went into it, can only make it more impressive.
This article was inspired by Peter Corke’s LinkedIn post, “What’s with all these deceptive teleoperation demos?” And extra thanks to Peter for his feedback on an early draft of this article.